Vanishing Gradients

Episode 26: Developing and Training LLMs From Scratch

May 15, 2024
Sebastian Raschka discusses developing and training large language models (LLMs) from scratch, covering topics like prompt engineering, fine-tuning, and RAG systems. He and the host explore the skills, resources, and hardware needed, the LLM lifecycle, live coding a spam classifier, and the importance of hands-on experience. They also touch on using PyTorch Lightning and Fabric to manage large models, and share insights on techniques in natural language processing and on evaluating LLMs for classification problems.
INSIGHT

LLMs Are Everyday Productivity Tools

  • Large language models are tools you can use daily for writing, coding, and ideation because they directly speed up many small tasks.
  • Sebastian finds their everyday utility, and watching them progress from gibberish to coherent text during training, a major motivation for working on them.
ADVICE

Learn The Full LLM Lifecycle First

  • Learn the full LLM lifecycle so you can choose the right entry point for your project and avoid unnecessary costs.
  • Consider architecture, pretraining, fine-tuning, RAG, deployment, and monitoring when scoping any LLM work.
INSIGHT

Continued Pretraining Can Use Data Better

  • Continued pretraining can be more data-efficient than instruction fine-tuning because next-token loss exposes more prediction targets per example.
  • Both strategies have advantages, and you can combine continued pretraining with instruction fine-tuning for best results.
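The data-efficiency point above can be sketched in a few lines. This is a minimal illustration with made-up token IDs, not code from the episode: under next-token loss every position in a sequence contributes a prediction target, whereas instruction fine-tuning with prompt masking (a common convention, e.g. the `-100` ignore index in PyTorch) computes loss only on the response tokens.

```python
# Hypothetical token IDs; the point is counting training targets, not the values.

def next_token_targets(tokens):
    """Continued pretraining: every position predicts the following token."""
    return list(zip(tokens[:-1], tokens[1:]))  # (input, target) pairs

def instruction_targets(tokens, prompt_len):
    """Instruction fine-tuning with prompt masking: loss only on response tokens."""
    return [
        (tokens[i], tokens[i + 1])
        for i in range(len(tokens) - 1)
        if i + 1 >= prompt_len  # targets that fall inside the prompt are masked out
    ]

sequence = [101, 7, 42, 9, 3, 55, 2]  # 7 tokens: 4 prompt + 3 response
print(len(next_token_targets(sequence)))      # 6 targets from one example
print(len(instruction_targets(sequence, 4)))  # only 3 targets from the same example
```

The same sequence yields twice as many supervised targets under next-token loss here, which is the sense in which continued pretraining can be more data-efficient per example.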