
Deep Papers: A Deep Dive Into Generative AI's Newest Models: Gemini vs Mistral (Mixtral-8x7B) – Part I
Dec 27, 2023
ML Solutions Architect Dat Ngo and Product Manager Aman Khan discuss the new models Gemini and Mixtral-8x7B. They cover the background and context of Mixtral, its performance compared to Llama and GPT-3.5, and its optimized fine-tuning. Part II will explore Gemini, developed by DeepMind and Google Research.
AI Snips
Train Longer To Cut Inference Cost
- Training on more tokens than compute-optimal ("overtraining") lets a model compress knowledge into fewer parameters.
- That compression reduces inference cost and latency, enabling smaller, faster models like the Llama 2 variants.
Mistral's Founding Insight
- Mistral AI was founded by researchers behind Llama 2 and the Chinchilla scaling insights.
- They combined scaling and architectural optimizations to build high-performing smaller models.
Use Grouped Query Attention
- Grouped query attention shares each key/value head across a group of query heads, shrinking the KV cache and saving memory bandwidth.
- Accuracy stays close to full multi-head attention while per-token memory traffic drops, giving faster inference.
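To make the sharing concrete, here is a minimal NumPy sketch of grouped query attention. This is an illustrative toy, not Mistral's implementation; the function name, shapes, and head counts are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    """q: (seq, n_q_heads, d); k, v: (seq, n_kv_heads, d).
    Each group of n_q_heads // n_kv_heads query heads shares one K/V head,
    so the KV cache holds n_kv_heads heads instead of n_q_heads."""
    seq, n_q_heads, d = q.shape
    n_kv_heads = k.shape[1]
    group = n_q_heads // n_kv_heads
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group  # index of the shared K/V head for this query head
        scores = (q[:, h, :] @ k[:, kv, :].T) / np.sqrt(d)
        out[:, h, :] = softmax(scores) @ v[:, kv, :]
    return out

# Toy example: 8 query heads share 2 K/V heads (group size 4),
# so K/V memory is 4x smaller than in full multi-head attention.
rng = np.random.default_rng(0)
seq, d = 5, 16
q = rng.standard_normal((seq, 8, d))
k = rng.standard_normal((seq, 2, d))
v = rng.standard_normal((seq, 2, d))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (5, 8, 16)
```

With `n_kv_heads == n_q_heads` this reduces to standard multi-head attention, and with a single K/V head it becomes multi-query attention; GQA is the middle ground between the two.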

