
Mixture of Experts AI on IBM z17, Meta's Llama 4 and Google Cloud Next 2025
Apr 11, 2025

Join IBM trailblazers Hillery Hunter, IBM Fellow and CTO of IBM Infrastructure; Shobhit Varshney, Head of Data and AI for the Americas; and Kate Soule, Director of Technical Product Management for Granite, as they dive into the launch of IBM z17 and its cutting-edge AI capabilities. They explore the unveiling of Meta's Llama 4, the innovations at Google Cloud Next, and evolving public perceptions of AI as reported by the Pew Research Center, tackling everything from zero-downtime financial transactions to AI's role in entertainment and industry dynamics.
AI Snips
Real-time LLM Integration
- Shobhit Varshney shared an anecdote about a credit card company struggling with fraud detection due to LLM latency.
- The new z-series mainframe addresses this by enabling real-time LLM inference directly within the transaction flow.
Llama 4's Open Source Impact
- Meta's release of Llama 4, including a large mixture-of-experts model, may put pressure on closed AI labs.
- Kate Soule suggests this could lead to wider community support for this architecture.
Mixture-of-Experts Efficiency
- The mixture-of-experts architecture offers inference efficiency, particularly at low batch sizes.
- Kate Soule emphasizes the need for broader community support to mature its tooling and applications.
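The efficiency point comes from sparse routing: a mixture-of-experts layer activates only a few experts per token, so the number of parameters actually computed is a fraction of the total. Below is a minimal illustrative sketch of top-k expert routing with softmax gating (all names, shapes, and the single-matrix "experts" are hypothetical simplifications, not anything from the episode):

```python
import numpy as np

def moe_forward(x, gate_w, experts_w, top_k=2):
    """Route each token to its top_k experts; only those experts run.

    x:         (tokens, d) input activations
    gate_w:    (d, n_experts) router weights
    experts_w: list of n_experts (d, d) expert weight matrices
    """
    logits = x @ gate_w                                # (tokens, n_experts)
    top = np.argsort(logits, axis=1)[:, -top_k:]       # top_k expert ids per token
    sel = np.take_along_axis(logits, top, axis=1)      # their gate logits
    # softmax over only the selected experts
    weights = np.exp(sel - sel.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(top_k):
            e = top[t, slot]
            out[t] += weights[t, slot] * (x[t] @ experts_w[e])
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 3
x = rng.normal(size=(tokens, d))
gate_w = rng.normal(size=(d, n_experts))
experts_w = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts_w, top_k=2)
# Each token ran only 2 of 4 experts, so half the expert parameters were active.
```

With top_k=2 of 4 experts, per-token expert compute is roughly halved relative to a dense layer of the same total size, which is the low-batch-size inference advantage discussed in the snip.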

