Interconnects

Interviewing Sebastian Raschka on the state of open LLMs, Llama 3.1, and AI education

Aug 1, 2024
Sebastian Raschka, a staff research engineer at Lightning AI and AI educator, dives into the dynamic landscape of open language models. He discusses the evolution of Llama 3.1 and its implications for AI research. Sebastian shares insights from his experience as an arXiv moderator, shedding light on the challenges of navigating academic papers. The conversation also covers advancements in model training techniques, the importance of ethics in AI, and how open access enhances AI education. Tune in for a fascinating look at the future of AI and language models!
ADVICE

Implement One Model End To End First

  • Start by implementing one clear architecture (GPT-2) end-to-end: data format, tokenization, forward pass, and next-token generation before exploring variants.
  • Then practice instruction fine-tuning and alignment (e.g., DPO) and only after that inspect architectural tweaks like activations or layer norms.
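The "next-token generation" step above can be sketched as a greedy decoding loop. This is a minimal illustration, not Sebastian's code: `toy_model` is a hypothetical stand-in that returns one score per vocabulary entry; in a real end-to-end implementation it would be a trained GPT-2-style transformer's forward pass.

```python
# Greedy next-token generation: repeatedly run the forward pass and
# append the highest-scoring token until an end-of-sequence token appears.

VOCAB = ["<eos>", "hello", "world", "how", "are", "you"]

def toy_model(tokens):
    """Hypothetical stand-in for a transformer forward pass.
    Returns one logit per vocab entry, favoring a fixed phrase."""
    phrase = [1, 2, 3, 4, 5, 0]  # hello world how are you <eos>
    nxt = phrase[len(tokens) % len(phrase)]
    return [1.0 if i == nxt else 0.0 for i in range(len(VOCAB))]

def generate(model, prompt_tokens, max_new_tokens=10, eos_id=0):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = model(tokens)  # forward pass over the sequence so far
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

out = generate(toy_model, [1])  # start from "hello"
print(" ".join(VOCAB[t] for t in out))  # hello world how are you <eos>
```

Swapping the greedy `max` for temperature sampling is a small change; the loop structure (forward pass, pick, append) stays the same.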
ADVICE

Use Small Models To Learn And Scale Up

  • Use tiny models (100M–1.5B) to learn and validate ideas locally; provide larger checkpoints for those with more compute.
  • Sebastian's book includes runnable examples so even a MacBook Air can run the smallest pretraining examples and A100/H100 use is only needed for larger models.
INSIGHT

Distillation Is Becoming Practical Again

  • The term 'distillation' now covers both synthetic-data training and true knowledge distillation; the latter stores the teacher's log-probabilities and uses them to train a smaller student efficiently.
  • Llama 3.1's license now permits using its outputs to improve other models, unlocking broader synthetic-data and distillation workflows.
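The knowledge-distillation idea above (train a student against the teacher's stored log-probabilities rather than hard labels) can be sketched as a KL-divergence loss. This is a minimal pure-Python illustration under my own assumptions, not a specific library's API; real training would compute this per token over batches with gradients.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    logz = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - logz for x in scaled]

def kd_loss(student_logits, teacher_log_probs, temperature=2.0):
    """KL(teacher || student) from the teacher's *stored* log-probabilities,
    so no teacher forward pass is needed while training the student."""
    student_lp = log_softmax(student_logits, temperature)
    return sum(math.exp(lt) * (lt - ls)
               for lt, ls in zip(teacher_log_probs, student_lp))

# A student that matches the teacher's distribution incurs ~zero loss:
teacher_logits = [2.0, 1.0, 0.1]
stored = log_softmax(teacher_logits, temperature=2.0)  # saved once, offline
print(kd_loss(teacher_logits, stored))        # ~0.0
print(kd_loss([0.0, 0.0, 2.0], stored) > 0)   # mismatch gives positive loss
```

Storing `stored` once per training token is what makes distillation cheap: the large teacher runs offline, and only the small student runs during training.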