How to Map a Problem to Reinforcement Learning

Can you give us an example of mapping a problem? I understand there are certain things you can't share from your work, but even just a general example: a problem and how you would map it to reinforcement learning.

Sure, I would go back to my speaker example. If the speaker volume goes up while I'm expecting it to be very low, that's punishable. But how much? How do you define a metric of punishment? What if the smart speaker with this reinforcement learning agent increased the volume a little bit more than what I'm expecting in that environment? Now how do you punish that? Do you punish it, or do you basically ignore that setting? So basically it's
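The question of "how much to punish" can be made concrete with a small reward function. The following is only a sketch under stated assumptions, not the speaker's actual method: the function name `volume_reward` and the parameters `tolerance` and `scale` are hypothetical, and the choice to ignore small overshoots and penalize larger ones linearly is one option among many.

```python
def volume_reward(actual: float, expected: float,
                  tolerance: float = 0.0, scale: float = 1.0) -> float:
    """Hypothetical reward for a smart-speaker agent.

    Returns 0.0 when the volume is at most `tolerance` units above the
    expected level, and a negative reward that grows linearly with the
    overshoot beyond that band otherwise.
    """
    overshoot = actual - expected              # how much louder than expected
    excess = max(0.0, overshoot - tolerance)   # part of the overshoot we punish
    return -scale * excess if excess > 0 else 0.0


# A slight overshoot inside the tolerance band is ignored...
small = volume_reward(actual=22, expected=20, tolerance=5)
# ...while a larger one is punished in proportion to the excess.
large = volume_reward(actual=30, expected=20, tolerance=5)
```

Setting `tolerance` to zero corresponds to punishing every overshoot, however small; a larger `tolerance` corresponds to ignoring slight deviations, which is the trade-off raised in the answer above.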

