
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731
May 13, 2025 Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, dives into the innovative world of reinforcement learning (RL) and its impact on AI agents. He highlights the importance of data curation and evaluation, asserting that RL outperforms traditional prompting methods. The conversation touches on limitations of supervised fine-tuning, reward-shaping strategies, and specialized models like MiniCheck for hallucination detection. Mahesh also discusses tools like Curator and the exciting future of automated AI engineering, promising to make powerful solutions accessible to all.
Leverage Curator for Data Curation
- Use the open-source library Curator to curate data for fine-tuning models.
- Curator supports batch mode data generation, filtering, visualization, and quality checking at scale.
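The filtering and quality-checking steps mentioned above can be sketched conceptually. This is not the Curator API, just a minimal illustration of the kind of length-filtering and deduplication a curation pipeline applies at scale; the record format and thresholds are assumptions.

```python
# Conceptual sketch only -- NOT the Curator library's API.
# Illustrates two common curation passes: filtering out low-quality
# (here: too-short) samples and deduplicating near-identical records.

def curate(records, min_len=20):
    seen = set()
    kept = []
    for rec in records:
        text = rec.get("text", "").strip()
        if len(text) < min_len:   # filter: drop samples that are too short
            continue
        key = text.lower()
        if key in seen:           # dedupe: skip case-insensitive repeats
            continue
        seen.add(key)
        kept.append(rec)
    return kept

data = [
    {"text": "Explain gradient descent step by step for a quadratic loss."},
    {"text": "explain gradient descent step by step for a quadratic loss."},
    {"text": "hi"},
]
curated = curate(data)  # keeps only the first record
```

Real pipelines add model-based quality scoring and visualization on top of checks like these, but the shape of the loop is the same.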
Reasoning Data Unlocks Advanced Models
- Reasoning data is essential for training models that perform multi-step reasoning tasks.
- Bespoke Labs created the Open Thoughts dataset and Open Thinker model to push reasoning capabilities beyond prior open baselines.
RL Connects Full Feedback Loop
- RL embeds feedback into models, enabling them to learn good and bad paths rather than relying on brittle prompt instructions.
- RL closes the feedback loop which prompting alone cannot achieve effectively.
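The "learning good and bad paths from feedback" idea can be made concrete with a toy example. The sketch below is not from the episode or Bespoke Labs: it is a REINFORCE-style two-armed bandit in plain Python, where a reward signal (rather than a prompt instruction) gradually shifts the policy toward the better path; all names and hyperparameters are illustrative assumptions.

```python
import math
import random

# Toy illustration of closing the feedback loop with RL:
# a policy-gradient (REINFORCE-style) learner on a two-armed bandit.
# The policy is never told which path is good -- it learns from reward.

def train(num_steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    true_reward = [0.2, 0.8]   # expected reward per "path"; path 1 is better
    prefs = [0.0, 0.0]         # learnable preference (logit) per path
    for _ in range(num_steps):
        exps = [math.exp(p) for p in prefs]
        probs = [e / sum(exps) for e in exps]
        action = rng.choices([0, 1], weights=probs)[0]
        reward = 1.0 if rng.random() < true_reward[action] else 0.0
        # Policy-gradient update: reinforce actions that beat the baseline
        baseline = 0.5
        for i in range(2):
            grad = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * (reward - baseline) * grad
    exps = [math.exp(p) for p in prefs]
    return [e / sum(exps) for e in exps]

probs = train()  # policy ends up strongly preferring the higher-reward path
```

The same principle, with far richer reward models and rollouts, is what lets RL-trained agents internalize good behavior instead of depending on brittle prompt instructions.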

