ThursdAI - The top AI news from the past week

📅 ThursdAI - Live @ NeurIPS, Mixtral, GeminiPro, Phi2.0, StripedHyena, Upstage 10B SoTA & more AI news from last (insane) week

Dec 14, 2023
This podcast covers a range of topics, including open source LLMs, Mixtral MoE, Mistral 0.2 instruct, Upstage Solar 10B, the StripedHyena architecture, the EAGLE decoding method, Deci.ai's new SOTA 7B model, Microsoft's Phi 2.0 weights, QuIP LLM quantization and compression, and Gemini Pro access over API.
INSIGHT

Mixtral Sparse Architecture Explained

  • Mixtral uses a sparse mixture-of-experts architecture: a router network selects two experts per token at inference time.
  • Each expert is roughly a dense 7B feed-forward block, but only the two routed experts activate per token, so inference costs far less than a dense model of the same total parameter count.
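The top-2 routing described above can be sketched in a few lines of NumPy. This is a toy illustration, not Mixtral's actual implementation: the dimensions are tiny, the weights are random, and the ReLU feed-forward experts are hypothetical stand-ins for the real expert blocks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; Mixtral uses 8 experts with top-2 routing per token.
d_model, d_ff, n_experts, top_k = 8, 16, 8, 2

# Router: a linear layer scoring each expert for a given token (random weights here).
W_router = rng.normal(size=(d_model, n_experts))
# Each "expert" is a small two-layer feed-forward net (hypothetical stand-in).
experts = [(rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
           for _ in range(n_experts)]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(token):
    """Route one token through its top-2 experts and mix their outputs."""
    logits = token @ W_router
    top = np.argsort(logits)[-top_k:]    # indices of the two highest-scoring experts
    gates = softmax(logits[top])         # renormalize gate weights over the chosen two
    out = np.zeros(d_model)
    for gate, idx in zip(gates, top):
        w1, w2 = experts[idx]
        out += gate * (np.maximum(token @ w1, 0) @ w2)  # ReLU FFN expert
    return out

y = moe_layer(rng.normal(size=d_model))
```

Only two of the eight expert FFNs run per token, which is why the active-parameter count (and thus inference cost) stays close to a single dense expert even though the full model is much larger.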
ADVICE

Tips on Using Mixtral Models

  • Use Mixtral’s API for better inference speed and efficiency instead of loading the full model locally.
  • Consider fine-tuning tools like Axolotl, which can adapt mixture-of-experts models properly.
INSIGHT

Hyena Architecture Challenges Transformers

  • The new StripedHyena architecture from Together, along with state space models, offers a promising alternative to transformers.
  • These architectures can speed up long-sequence processing by 20x to 100x, significantly improving inference efficiency.
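The speedup claim above comes from replacing attention's quadratic pairwise cost with subquadratic primitives; Hyena-style models lean on long convolutions computed via the FFT in O(L log L) time. A minimal NumPy sketch of that primitive (random filter standing in for the learned implicit filter; not the actual StripedHyena code):

```python
import numpy as np

L = 1024  # sequence length
rng = np.random.default_rng(0)
x = rng.normal(size=L)   # input sequence
h = rng.normal(size=L)   # long filter (learned implicitly in Hyena; random here)

# Direct causal convolution: O(L^2) work, analogous to attention's pairwise cost.
direct = np.array([sum(h[k] * x[t - k] for k in range(t + 1)) for t in range(L)])

# FFT-based convolution: O(L log L). Zero-pad to 2L so circular conv equals linear conv.
n = 2 * L
fast = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)[:L]

# Both paths compute the same output; only the asymptotic cost differs.
assert np.allclose(direct, fast)
```

At L = 1024 the direct path does roughly L²/2 multiply-adds while the FFT path does on the order of L log L, and the gap widens rapidly at the longer contexts these architectures target.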