
Super Data Science: ML & AI Podcast with Jon Krohn A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)
Mar 27, 2026 A hard Sudoku benchmark exposes a major weakness in top transformer models, while a new post-transformer BDH architecture reaches 97.4% accuracy. The discussion covers why Sudoku is a powerful test of constraint solving and reasoning, and what BDH changes about model state, sparsity, and continual learning to achieve far better efficiency and scaling on reasoning tasks.
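As a minimal illustration of why Sudoku is a demanding constraint-solving benchmark (this sketch is not from the episode, and the function name is hypothetical): every cell placement must simultaneously satisfy three overlapping "all-different" constraints, one for its row, one for its column, and one for its 3x3 box.

```python
def is_valid_move(grid, row, col, digit):
    """Return True if placing `digit` at (row, col) breaks no Sudoku constraint.

    `grid` is a 9x9 list of lists; 0 marks an empty cell.
    """
    # Row constraint: digit must not already appear in this row.
    if any(grid[row][c] == digit for c in range(9)):
        return False
    # Column constraint: digit must not already appear in this column.
    if any(grid[r][col] == digit for r in range(9)):
        return False
    # Box constraint: digit must not already appear in the cell's 3x3 box.
    br, bc = 3 * (row // 3), 3 * (col // 3)  # top-left corner of the box
    if any(grid[br + r][bc + c] == digit for r in range(3) for c in range(3)):
        return False
    return True
```

Because the three constraints interact across the whole grid, hard puzzles require propagating many such checks in parallel rather than pattern-matching locally, which is the capability the benchmark probes.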
Attention Head Ceiling Limits Nuance
- Attention head vector dimension has plateaued around ~1,000, capping transformers' representational nuance.
- Adrian Kosowski framed this as a fundamental ceiling on transformers' capacity for nuanced reasoning.
BDH Uses A Latent Reasoning Space
- Pathway's BDH maintains a larger latent reasoning space that doesn't have to verbalize every intermediate thought.
- BDH is described as a native reasoning model enabling internalized parallel reasoning like a chess grandmaster.
Claude Code Debugged A Subtle Data Leak
- Jon Krohn recounts using Claude Code to debug a training pipeline and find a subtle data-leak bug.
- Claude processed the repo, identified the leak, and wrote the fix, illustrating strong internalized code reasoning.
