
The a16z Show What's Missing Between LLMs and AGI - Vishal Misra & Martin Casado
Mar 17, 2026. Vishal Misra, Columbia professor and AI researcher, digs into how transformers may update their predictions in a Bayesian fashion. He explores why that still falls short of consciousness. The conversation turns to what AGI would really need: continual learning, causal reasoning, new abstractions, and why scaling alone won't get us there.
The Cricket DSL That Pushed GPT-3 Into Production
- Vishal Misra got GPT-3 to translate English cricket-stat questions into a DSL it had never seen and shipped it at ESPN.
- He built semantic retrieval over 1,500 examples, effectively an early RAG system, then set out to understand why it worked.
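The retrieval step described above can be sketched as follows. This is a minimal, hypothetical illustration of the pattern (embed the example bank, rank by similarity, put the nearest examples in the prompt), not the actual ESPN system; the bag-of-words "embedding," the example bank, and all function names are invented stand-ins.

```python
# Toy semantic retrieval over worked examples, in the spirit of an early RAG
# pipeline: embed each English-question -> DSL pair, then pull the nearest
# examples into the prompt. A real system would use learned embeddings.
import math
from collections import Counter

def embed(text):
    """Crude bag-of-words vector; stands in for a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical example bank (the real one had ~1,500 entries).
examples = [
    ("most runs by a batsman in 2019", "STATS(runs, year=2019, order=desc)"),
    ("best bowling figures at Lords", "STATS(bowling, venue=Lords, order=best)"),
    ("highest team total in World Cups", "STATS(team_total, event=WC, order=desc)"),
]

def retrieve(query, k=2):
    """Return the k examples most similar to the query, for prompt assembly."""
    q = embed(query)
    ranked = sorted(examples, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)
    return ranked[:k]

prompt_examples = retrieve("who scored the most runs in 2019")
```

The retrieved pairs would then be prepended to the user's question as in-context demonstrations before calling the model.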
LLMs Work Like a Sparse Prompt-to-Token Matrix
- Misra models an LLM as a gigantic sparse matrix mapping prompt rows to next-token probability distributions.
- The prompt token "protein" shifts probability toward "synthesis" or "shake," and that single next token radically changes the entire continuation distribution.
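The matrix picture above can be made concrete with a toy lookup table. The prompts and probabilities below are invented purely for illustration; the point is that appending one sampled token selects a very different row, and with it a disjoint set of continuations.

```python
# Minimal sketch of the "giant sparse matrix" view: each prompt (a row) maps
# to a next-token probability distribution. A real LLM computes the row on
# the fly rather than storing it; these numbers are made up.
matrix = {
    "the protein": {"synthesis": 0.4, "shake": 0.3, "folds": 0.2, "is": 0.1},
    "the protein shake": {"tastes": 0.5, "is": 0.3, "recipe": 0.2},
    "the protein synthesis": {"pathway": 0.6, "rate": 0.25, "is": 0.15},
}

def next_token_dist(prompt):
    """Look up the next-token distribution for this exact prompt row."""
    return matrix[prompt]

# Sampling "shake" vs "synthesis" after "the protein" jumps to rows whose
# supports barely overlap, radically changing every later continuation.
after_shake = next_token_dist("the protein shake")
after_synthesis = next_token_dist("the protein synthesis")
```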
In-Context Learning Behaves Like Bayesian Updating
- In-context learning looks like Bayesian updating because each example changes the model's next-token beliefs in real time.
- In Misra's cricket DSL tests, English-token probabilities fell while DSL-token probabilities rose with each demonstration until the correct output became nearly certain.
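The dynamic in the bullets above can be sketched as a Bayes update over two hypotheses, plain English versus the cricket DSL. The likelihood values below are invented for illustration; the shape of the result, DSL probability rising toward certainty with each demonstration, is what the snip describes.

```python
# Hedged sketch of in-context learning as Bayesian updating: a prior over two
# "languages" (English vs the cricket DSL) sharpens toward the DSL as each
# in-context demonstration arrives. Likelihoods here are illustrative.
def update(prior_dsl, likelihood_dsl=0.9, likelihood_english=0.2):
    """One Bayes step: posterior is proportional to prior times likelihood."""
    num = prior_dsl * likelihood_dsl
    den = num + (1 - prior_dsl) * likelihood_english
    return num / den

p = 0.1  # prior belief that the continuation should be DSL tokens
history = [p]
for _ in range(4):  # four in-context demonstrations
    p = update(p)
    history.append(p)
# p climbs toward 1: DSL-token probability rises, English-token probability falls
```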

