
Deep Papers: A Deep Dive Into Generative AI's Newest Models: Gemini vs Mistral (Mixtral-8x7B) – Part I
Dec 27, 2023
ML Solutions Architect Dat Ngo and Product Manager Aman Khan discuss the new models Gemini and Mixtral-8x7B. They cover the background and context of Mixtral, its performance compared to Llama and GPT-3.5, and its optimized fine-tuning. Part II will explore Gemini, developed by DeepMind and Google Research.
AI Snips
Train Longer To Cut Inference Cost
- Training on more tokens than compute-optimal ("overtraining") lets a model compress knowledge into fewer parameters.
- That compression reduces inference cost and latency, enabling smaller, faster models like the Llama 2 variants.
Mistral's Founding Insight
- Mistral AI was founded by researchers behind Llama 2 and the Chinchilla scaling insights.
- They combined scaling and architectural optimizations to build high-performing smaller models.
Use Grouped Query Attention
- Grouped query attention shares each key/value head across a group of query heads, shrinking the KV cache and saving memory bandwidth.
- Accuracy stays close to full multi-head attention while per-token memory traffic drops, giving faster inference.
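To make the sharing concrete, here is a minimal NumPy sketch of grouped query attention. This is an illustrative toy, not Mistral's implementation; the function name, shapes, and head counts are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    """q: (seq, n_q_heads, d); k, v: (seq, n_kv_heads, d).
    Each group of n_q_heads // n_kv_heads query heads shares one K/V head,
    so the KV cache holds n_kv_heads heads instead of n_q_heads."""
    seq, n_q_heads, d = q.shape
    n_kv_heads = k.shape[1]
    group = n_q_heads // n_kv_heads
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group  # index of the shared K/V head for this query head
        scores = (q[:, h, :] @ k[:, kv, :].T) / np.sqrt(d)
        out[:, h, :] = softmax(scores) @ v[:, kv, :]
    return out

# Toy example: 8 query heads share 2 K/V heads (group size 4),
# so K/V memory is 4x smaller than in full multi-head attention.
rng = np.random.default_rng(0)
seq, d = 5, 16
q = rng.standard_normal((seq, 8, d))
k = rng.standard_normal((seq, 2, d))
v = rng.standard_normal((seq, 2, d))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (5, 8, 16)
```

With `n_kv_heads == n_q_heads` this reduces to standard multi-head attention, and with a single K/V head it becomes multi-query attention; GQA is the middle ground between the two.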

