
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731
May 13, 2025 Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, dives into the innovative world of reinforcement learning (RL) and its impact on AI agents. He highlights the importance of data curation and evaluation, asserting that RL outperforms traditional prompting methods. The conversation touches on limitations of supervised fine-tuning, reward-shaping strategies, and specialized models like MiniCheck for hallucination detection. Mahesh also discusses tools like Curator and the exciting future of automated AI engineering, promising to make powerful solutions accessible to all.
Leverage Curator for Data Curation
- Use the open-source library Curator to curate data for fine-tuning models.
- Curator supports batch mode data generation, filtering, visualization, and quality checking at scale.
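The filtering and quality-checking steps mentioned above can be sketched conceptually. This is not the Curator API, just a minimal illustration of the kind of length-filtering and deduplication a curation pipeline applies at scale; the record format and thresholds are assumptions.

```python
# Conceptual sketch only -- NOT the Curator library's API.
# Illustrates two common curation passes: filtering out low-quality
# (here: too-short) samples and deduplicating near-identical records.

def curate(records, min_len=20):
    seen = set()
    kept = []
    for rec in records:
        text = rec.get("text", "").strip()
        if len(text) < min_len:   # filter: drop samples that are too short
            continue
        key = text.lower()
        if key in seen:           # dedupe: skip case-insensitive repeats
            continue
        seen.add(key)
        kept.append(rec)
    return kept

data = [
    {"text": "Explain gradient descent step by step for a quadratic loss."},
    {"text": "explain gradient descent step by step for a quadratic loss."},
    {"text": "hi"},
]
curated = curate(data)  # keeps only the first record
```

Real pipelines add model-based quality scoring and visualization on top of checks like these, but the shape of the loop is the same.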
Reasoning Data Unlocks Advanced Models
- Reasoning data is essential for training models that perform multi-step reasoning tasks.
- Bespoke Labs created the Open Thoughts dataset and Open Thinker model to push reasoning capabilities beyond prior open baselines.
RL Connects Full Feedback Loop
- RL embeds feedback into models, enabling them to learn good and bad paths rather than relying on brittle prompt instructions.
- RL closes the feedback loop which prompting alone cannot achieve effectively.
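The "learning good and bad paths from feedback" idea can be made concrete with a toy example. The sketch below is not from the episode or Bespoke Labs: it is a REINFORCE-style two-armed bandit in plain Python, where a reward signal (rather than a prompt instruction) gradually shifts the policy toward the better path; all names and hyperparameters are illustrative assumptions.

```python
import math
import random

# Toy illustration of closing the feedback loop with RL:
# a policy-gradient (REINFORCE-style) learner on a two-armed bandit.
# The policy is never told which path is good -- it learns from reward.

def train(num_steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    true_reward = [0.2, 0.8]   # expected reward per "path"; path 1 is better
    prefs = [0.0, 0.0]         # learnable preference (logit) per path
    for _ in range(num_steps):
        exps = [math.exp(p) for p in prefs]
        probs = [e / sum(exps) for e in exps]
        action = rng.choices([0, 1], weights=probs)[0]
        reward = 1.0 if rng.random() < true_reward[action] else 0.0
        # Policy-gradient update: reinforce actions that beat the baseline
        baseline = 0.5
        for i in range(2):
            grad = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * (reward - baseline) * grad
    exps = [math.exp(p) for p in prefs]
    return [e / sum(exps) for e in exps]

probs = train()  # policy ends up strongly preferring the higher-reward path
```

The same principle, with far richer reward models and rollouts, is what lets RL-trained agents internalize good behavior instead of depending on brittle prompt instructions.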

