
Beyond The Pilot: Enterprise AI in Action Inside LinkedIn’s AI Engineering Playbook
Jan 21, 2026
Erran Berger, VP of Product Engineering at LinkedIn, led the effort to distill large LLMs into ultra-efficient production models. He reveals how LinkedIn distilled 7B models down to 600M-parameter students, the multi-teacher split for policy vs. clicks, the synthetic GPT-4 golden datasets, and the 10x latency savings from pruning, quantization, and context compression. He also explains the organizational shift to eval-first product design.
AI Snips
Why Off-The-Shelf LLMs Failed For Search
- LinkedIn found that off-the-shelf LLMs driven by prompting alone couldn't meet the quality or latency bar of a search recommender serving tens of millions of daily users.
- Erran Berger says search required fine-tuning and distillation because large models were too compute-intensive and slow for production at LinkedIn scale.
From Product Policy To Synthetic Data Cookbook
- LinkedIn turned a 20–30 page product policy document and a small human-labeled golden dataset into a large synthetic dataset, using GPT-4 to apply the scoring rules at scale (see the sketch after this list).
- They trained a ~7B-parameter teacher on that synthetic set, then distilled it further for production.
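
A minimal sketch of that synthetic-labeling loop, assuming the OpenAI chat completions API. The policy excerpt, golden examples, unlabeled pairs, JSON output schema, and model choice below are all hypothetical stand-ins; LinkedIn's actual prompt and scoring rubric are not described in the episode.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical stand-ins for the policy doc and human-labeled golden data.
POLICY_EXCERPT = "Score 0-3. 3 = result directly satisfies the query intent; ..."
golden_examples = [
    {"query": "senior ml engineer bay area", "doc": "ML Engineer, Acme (SF)", "score": 3},
]
unlabeled_pairs = [
    {"query": "remote data analyst", "doc": "Data Analyst, Contoso (remote)"},
]

def build_prompt(query: str, doc: str) -> str:
    """Embed the policy rules plus golden few-shot examples in one prompt."""
    shots = "\n".join(
        f"Query: {ex['query']}\nDoc: {ex['doc']}\nScore: {ex['score']}"
        for ex in golden_examples
    )
    return (
        f"Scoring policy:\n{POLICY_EXCERPT}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Query: {query}\nDoc: {doc}\n"
        'Reply with JSON only: {"score": <0-3>}'
    )

synthetic_labels = []
for pair in unlabeled_pairs:
    resp = client.chat.completions.create(
        model="gpt-4",  # labeling model is an assumption
        messages=[{"role": "user", "content": build_prompt(pair["query"], pair["doc"])}],
        temperature=0.0,  # keep labels as deterministic as possible
    )
    # Assumes the model returns bare JSON as instructed.
    label = json.loads(resp.choices[0].message.content)
    synthetic_labels.append({**pair, "score": label["score"]})

# synthetic_labels becomes the large training set for the ~7B teacher.
```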
Multi-Stage Distillation For Efficiency
- Distillation used staged compression, 7B teacher → 1.7B intermediate → 0.6B student, to balance training efficiency against quality (a sketch of one distillation step follows below).
- Erran explains that the intermediate model speeds up iterative student training while minimizing quality loss.
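
The episode doesn't specify the training objective, so here is a minimal PyTorch sketch of one standard soft-label distillation step. The `teacher` and `student` model objects, batch format, and temperature are hypothetical; the same step would run twice in the staged setup (7B → 1.7B, then 1.7B → 0.6B).

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KL loss: the student mimics the teacher's output distribution."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # The t^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t

def distill_step(teacher, student, batch, optimizer):
    """One optimizer step distilling a frozen teacher into a smaller student."""
    with torch.no_grad():
        teacher_logits = teacher(batch)  # frozen teacher forward pass
    student_logits = student(batch)
    loss = distillation_loss(student_logits, teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Run staged, the 1.7B intermediate is first trained as the student against the 7B teacher, then swapped into the `teacher` slot when training the 0.6B production model, which is what makes iterating on the small student cheap.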
