Mixture-of-Experts and Trends in Large-Scale Language Modeling with Irwan Bello - #569

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Exploring Sparse Expert Models

This chapter examines the mechanics of sparse expert models in neural networks, which select a different subset of parameters for each input, growing model capacity without a proportional increase in per-input compute. It focuses on the mixture-of-experts approach, in which a router network decides which expert(s) to engage for each input, improving performance and scalability. The chapter also covers challenges around training stability and quantization, along with strategies for improving model efficiency.
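
The routing mechanic described above can be sketched roughly as follows. This is a minimal, illustrative PyTorch example, not code from the episode: the `MoELayer` class, the top-k routing scheme, and all layer sizes are assumptions chosen for exposition.

```python
import torch
import torch.nn as nn


class MoELayer(nn.Module):
    """Hypothetical top-k mixture-of-experts layer (illustrative sketch):
    a learned router picks k experts per token, so only a small slice of
    the model's parameters runs for any given input."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(d_model, d_ff),
                    nn.ReLU(),
                    nn.Linear(d_ff, d_model),
                )
                for _ in range(num_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate_probs = self.router(x).softmax(dim=-1)            # (tokens, experts)
        weights, expert_idx = gate_probs.topk(self.k, dim=-1)  # top-k routing
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Find the tokens routed to expert e and run only those through it.
            token_ids, slot = (expert_idx == e).nonzero(as_tuple=True)
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out


layer = MoELayer(d_model=64, d_ff=256, num_experts=8, k=2)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

Note that per-token compute here scales with k and the expert size, not with num_experts, which is the capacity-without-proportional-compute trade described above; in practice such routers also need auxiliary load-balancing terms, which is one source of the training-stability challenges the chapter mentions.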
