
Mixture-of-Experts and Trends in Large-Scale Language Modeling with Irwan Bello - #569
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Exploring Sparse Expert Models
This chapter examines the mechanics of sparse expert models in neural networks, highlighting how they dynamically select the parameters relevant to each input, increasing model capacity without a proportional increase in per-example compute. It focuses on the mixture-of-experts approach, in which a router network decides which expert(s) to engage, improving performance and scalability. The chapter also addresses challenges with training stability and quantization, along with strategies for improving model efficiency.
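As a rough illustration of the routing idea discussed in this chapter, the sketch below shows a minimal mixture-of-experts layer with top-1 routing in PyTorch. The names (MoELayer, num_experts, d_hidden) are illustrative and not taken from the episode; real systems add load-balancing losses and capacity limits that are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-1 mixture-of-experts layer (illustrative sketch)."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        # Router: a small linear layer producing one logit per expert.
        self.router = nn.Linear(d_model, num_experts)
        # Experts: independent feed-forward networks with identical shapes.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model). Each token is routed to its single best expert.
        gate_probs = F.softmax(self.router(x), dim=-1)   # (num_tokens, num_experts)
        top_prob, top_idx = gate_probs.max(dim=-1)       # top-1 routing decision
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                # Only the tokens routed here pass through this expert, so the
                # compute per token stays roughly constant as experts are added.
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out

# Example usage: 8 tokens of width 16 routed across 4 experts.
layer = MoELayer(d_model=16, d_hidden=64, num_experts=4)
tokens = torch.randn(8, 16)
print(layer(tokens).shape)  # torch.Size([8, 16])
```

Because each token activates only one expert's parameters, total parameter count can grow with the number of experts while the compute applied to any single token stays essentially fixed, which is the capacity-versus-cost trade-off the chapter describes.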