The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Google I/O 2025 Special Edition - #733

May 28, 2025
Logan Kilpatrick and Shrestha Basu Mallick from Google DeepMind discuss the announcements from Google I/O 2025. They cover Gemini API features such as thinking budgets and thought summaries, along with native audio output that makes voice AI more expressive. They also share insights on the challenges of building real-time voice applications, including latency and voice detection. They close with a playful wish list for next year's event, hoping for broader language capabilities that foster global inclusivity.
INSIGHT

Live API Requires Commitment

  • Using the live API requires a strong commitment to one provider's bespoke infrastructure.
  • Model-agnostic tools may ease switching and reduce risk for developers in the future.
INSIGHT

Handling Complex Voice Workflows

  • Complex workflows in voice agents demand dynamic system instructions and multi-state management.
  • Longer sessions and agent handoffs require flexible API features.
INSIGHT

Gemini's Capability Integration

  • Gemini aims to merge multiple capabilities into one powerful model rather than splintering them across separate models.
  • This fusion creates unexpected improvements like better video understanding through integrated reasoning.