Deep Papers

A Deep Dive Into Generative AI's Newest Models: Gemini vs Mistral (Mixtral-8x7B) – Part I

Dec 27, 2023
ML Solutions Architect Dat Ngo and Product Manager Aman Khan discuss the new models Gemini and Mixtral-8x7B. They cover Mixtral's background and context, its performance relative to Llama and GPT-3.5, and its optimized fine-tuning. Part II will explore Gemini, developed by DeepMind and Google Research.
INSIGHT

Train Longer To Cut Inference Cost

  • Training on far more tokens than is compute-optimal ("overtraining") lets a model compress the same knowledge into fewer parameters.
  • That compression cuts inference cost and latency, enabling smaller, faster models like the Llama 2 variants.
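The inference-cost argument above can be made concrete with a common rule of thumb (an assumption for illustration, not from the episode): a decoder-only transformer's forward pass costs roughly 2 FLOPs per parameter per generated token, so serving cost scales linearly with parameter count.

```python
def inference_flops_per_token(n_params: float) -> float:
    """Rough estimate: ~2 FLOPs per parameter per token for a forward pass.

    This is a back-of-the-envelope heuristic, not a measured figure.
    """
    return 2 * n_params

# An overtrained 7B model vs. a 70B model of similar quality:
big = inference_flops_per_token(70e9)
small = inference_flops_per_token(7e9)
print(big / small)  # the smaller model is ~10x cheaper per token to serve
```

The exact constant matters less than the linear scaling: halving parameters roughly halves per-token serving cost, which is the economic motivation for overtraining smaller models.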
INSIGHT

Mistral's Founding Insight

  • Mistral AI was founded by researchers behind Llama 2 and the Chinchilla scaling insights.
  • They combined those scaling lessons with architecture optimizations to build smaller models that perform like much larger ones.
ADVICE

Use Grouped Query Attention

  • Use grouped-query attention: several query heads share one key/value head, shrinking the KV cache and saving memory bandwidth.
  • Accuracy stays close to full multi-head attention while per-token memory traffic drops, speeding up inference.
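A minimal sketch of the grouping idea, assuming illustrative shapes and NumPy rather than any specific model's implementation: with 8 query heads sharing 2 key/value heads, the KV cache is 4x smaller, because each group of query heads attends against the same keys and values.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Grouped-query attention sketch (hypothetical shapes, not a library API).

    q:    (n_q_heads,  seq, d)  -- one projection per query head
    k, v: (n_kv_heads, seq, d)  -- shared K/V heads; this is the smaller KV cache
    """
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads  # query heads per shared K/V head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group  # several consecutive query heads reuse the same K/V
        scores = q[h] @ k[kv].T / np.sqrt(d)
        scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)
        out[h] = w @ v[kv]
    return out
```

With `n_kv_heads == n_q_heads` this degenerates to standard multi-head attention, and with `n_kv_heads == 1` to multi-query attention; grouped-query attention sits between the two, trading a small accuracy gap for most of the bandwidth savings.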