
Vanishing Gradients Episode 26: Developing and Training LLMs From Scratch
May 15, 2024
Sebastian Raschka discusses developing and training large language models (LLMs) from scratch, covering topics like prompt engineering, fine-tuning, and RAG systems. They explore the skills, resources, and hardware needed, the lifecycle of LLMs, live coding to build a spam classifier, and the importance of hands-on experience. They also touch on using PyTorch Lightning and Fabric for managing large models, and share insights on techniques in natural language processing models and on evaluating LLMs for classification problems.
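The spam classifier built in the live-coding segment follows a common recipe: take a pretrained causal language model, swap its output layer for a small classification head, and fine-tune on labeled messages. A minimal sketch of that idea, using a toy `TinyLM` stand-in for a real pretrained backbone (the class names and sizes here are illustrative assumptions, not the episode's actual code):

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy stand-in for a pretrained causal LM backbone (e.g. GPT-2)."""
    def __init__(self, vocab_size=100, emb_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=emb_dim, nhead=4, batch_first=True),
            num_layers=2,
        )

    def forward(self, token_ids):
        # Returns hidden states of shape (batch, seq_len, emb_dim).
        return self.encoder(self.embed(token_ids))

class SpamClassifier(nn.Module):
    def __init__(self, backbone, emb_dim=32, num_classes=2):
        super().__init__()
        self.backbone = backbone
        # Replace the LM head with a 2-class head (spam / not spam).
        self.head = nn.Linear(emb_dim, num_classes)

    def forward(self, token_ids):
        hidden = self.backbone(token_ids)
        # Use the last token's hidden state as the sequence summary,
        # since a causal LM accumulates context left to right.
        return self.head(hidden[:, -1, :])

model = SpamClassifier(TinyLM())
logits = model(torch.randint(0, 100, (4, 16)))  # 4 messages, 16 tokens each
print(logits.shape)  # torch.Size([4, 2])
```

In practice the backbone would be loaded from pretrained weights and trained with a standard cross-entropy loss on the two classes; often only the head and the last few transformer blocks are unfrozen.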
AI Snips
LLMs Are Everyday Productivity Tools
- Large language models are tools you use daily for writing, coding, and ideation because they directly speed up many small tasks.
- Sebastian finds their everyday utility, and the ability to watch a model's training progress from gibberish to coherent text, a major motivation for working on them.
Learn The Full LLM Lifecycle First
- Learn the full LLM lifecycle so you can choose the right entry point for your project and avoid unnecessary costs.
- Consider architecture, pretraining, fine-tuning, RAG, deployment, and monitoring when scoping any LLM work.
Continued Pretraining Can Use Data Better
- Continued pretraining can be more data-efficient than instruction fine-tuning because next-token loss exposes more prediction targets per example.
- Both strategies have advantages, and continued pretraining can be combined with instruction fine-tuning for best results.
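The data-efficiency point can be made concrete by counting loss targets. Under next-token pretraining, every token after the first is a prediction target; under instruction fine-tuning with prompt masking, only the response tokens contribute to the loss. A small illustrative sketch (the token lists and helper names are hypothetical, chosen just to show the counting):

```python
def pretraining_targets(tokens):
    # Continued pretraining: the model predicts token t+1 from tokens
    # up to t, so every position except the first is a loss target.
    return len(tokens) - 1

def instruction_targets(prompt_tokens, response_tokens):
    # Instruction fine-tuning with prompt masking: the prompt supplies
    # context but is excluded from the loss; only response tokens count.
    return len(response_tokens)

text = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
prompt = ["Classify", "this", "text", ":", "The", "quick", "brown", "fox"]
response = ["not", "spam"]

print(pretraining_targets(text))              # 8 targets from 9 tokens
print(instruction_targets(prompt, response))  # 2 targets from 10 tokens
```

Per example of roughly equal length, continued pretraining here extracts four times as many prediction targets, which is the mechanism behind the snip's data-efficiency claim.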

