
Mixture of Experts Episode 40: DeepSeek facts vs hype, model distillation, and open source competition
Jan 31, 2025

In this engaging discussion, Kate Soule, Director of Technical Product Management at Granite; Chris Hay, Distinguished Engineer and CTO of Customer Transformation; and Aaron Baughman, IBM Fellow and Master Inventor, dive into the realities behind DeepSeek R1. They debunk myths surrounding its hype and discuss the true implications of model distillation for AI competition. The trio explores the evolving landscape of open-source AI and how recent advancements can reshape industry strategy, shedding light on efficiency and innovation in model training.
AI Snips
Long-Chain-of-Thought's Importance
- Accurate long-chain-of-thought reasoning traces are crucial for model performance.
- Fine-tuning on accurate chain-of-thought data before applying reinforcement learning significantly speeds up learning (see the sketch after this list).
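To make the second point concrete, here is a minimal sketch of that supervised warm-up stage: fine-tune a base model on verified chain-of-thought traces with a standard causal-LM loss before any RL stage. The model name, example trace, and hyperparameters are illustrative assumptions, not details from the episode.

```python
# Minimal SFT-before-RL sketch: warm up a base model on accurate
# long chain-of-thought traces using the standard next-token loss.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # hypothetical small base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Each example pairs a question with a verified, step-by-step reasoning trace.
cot_examples = [
    "Q: What is 17 * 24?\n"
    "Reasoning: 17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.\n"
    "A: 408",
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for text in cot_examples:
    batch = tokenizer(text, return_tensors="pt")
    # Causal-LM loss: the model learns to reproduce the full reasoning trace.
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
# Only after this warm-up would an RL stage (e.g., PPO or GRPO) be applied.
```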
Model Distillation Explained
- Model distillation transfers knowledge from a larger teacher model to a smaller student model (a sketch follows below).
- The smaller student model trains faster and is cheaper to run at inference time.
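Here is a minimal sketch of the classic (Hinton-style) form of distillation, which assumes access to the teacher's logits: the student is trained to match the teacher's softened output distribution. The tensor shapes and temperature below are illustrative assumptions. In practice, when only a teacher's text outputs are available, distillation is often done instead by fine-tuning the student on teacher-generated responses.

```python
# Minimal knowledge-distillation loss sketch (logit matching).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature**2

# Example: a batch of 4 positions over a 32k-token vocabulary.
teacher_logits = torch.randn(4, 32000)
student_logits = torch.randn(4, 32000, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
```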
Open Source and Model Protection
- DeepSeek's open-source model with a permissive license challenges proprietary model protection.
- Because the weights are openly available, anyone can distill from it, potentially eroding the competitive advantage of closed models.

