
Chroma | Context Engineering (with Drew Breunig)
Dec 11, 2025 Drew Breunig, a writer focused on applied AI and context engineering, joins Jeff Huber to discuss writing about AI. They explore the tension between hype and rigorous research in AI narratives. Drew shares insights on the impact of the Gemini 1.5 paper, the ‘Karpathy Effect,’ and the importance of harness design for optimizing model performance. He critiques the problems with 'black box' memory and proposes better approaches to context management. The conversation covers the future of multi-agent systems and how thoughtful UX builds trust in AI.
AI Snips
Long Contexts Aren't A Panacea
- Large-model papers rarely reveal the full harness; disclosed failures (like Gemini's issues beyond 200k tokens) expose the limits.
- Drew Breunig realized long context isn't a panacea and started framing Context Engineering around those failures.
Benchmarks Show Peak, Not Practice
- Benchmarks are a peak-performance metric, not daily reality, and can be gamed by harnesses and tools.
- Omar Khattab's weightlifting metaphor shows benchmarks reveal theoretical maximums, not representative reliability.
Build Personal, Reusable Benchmarks
- Create small personal benchmarks by reusing modular test programs to evaluate new models quickly.
- Use tools like DSPy to swap models into the same tasks and compare performance consistently.
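The idea in this snip can be sketched in a few lines: keep one fixed task set and a scoring loop, and make the model a swappable callable. This is a minimal plain-Python illustration of the pattern, not DSPy's actual API; the task set and the two stand-in "models" are hypothetical placeholders for real LLM calls.

```python
# Sketch of a personal, reusable benchmark harness (hypothetical example,
# not DSPy's API): the same tasks run against any model callable, so a
# new model can be swapped in and compared on identical work.

def run_benchmark(model, tasks):
    """Score `model` (a callable: prompt -> answer) on (prompt, expected) pairs."""
    correct = sum(1 for prompt, expected in tasks
                  if model(prompt).strip() == expected)
    return correct / len(tasks)

# Tiny illustrative task set (a real one would hold your own curated cases).
TASKS = [("2+2=", "4"), ("capital of France?", "Paris")]

def model_a(prompt):  # stand-in model that answers both tasks
    return {"2+2=": "4", "capital of France?": "Paris"}.get(prompt, "")

def model_b(prompt):  # stand-in model that only handles arithmetic
    return "4" if prompt == "2+2=" else "unsure"

# Swap each model into the same harness and compare scores.
scores = {name: run_benchmark(m, TASKS)
          for name, m in [("model_a", model_a), ("model_b", model_b)]}
print(scores)  # model_a scores 1.0, model_b scores 0.5
```

In DSPy the same shape appears as a fixed program plus `dspy.Evaluate` over a devset, with the configured LM swapped out per run; the harness stays constant while only the model changes.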

