How to Minimize the Puppy Period

This is this period of time where the reinforcement learning agent learn to a point that the behaviors start becoming acceptable. And so inevitably, in this example with your speaker, you would have to be patient, right? You'd have to go through situations when you're just sitting and the volume goes up and you have to click a button or say something to enact that punishment. But it will need to explore before it can actually come up with the right solution. So how long would it take for that exploration to occur? Exactly. Some applications you don't care about like how long it takes. It's usually it might be an order of minutes or hours and you can basically let the agent learn and

Play episode from 43:57

Transcript

The AI-powered Podcast Player

Save insights by tapping your headphones, chat with episodes, discover the best highlights - and more!

Get the app