Super Data Science: ML & AI Podcast with Jon Krohn

A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)

Mar 27, 2026
A hard Sudoku benchmark exposes a major weakness in top transformer models while a new post-transformer BDH architecture nails 97.4% accuracy. Discussion covers why Sudoku is a powerful test of constraint-solving and reasoning. Hear what BDH changes about model state, sparsity, and continual learning to achieve far better efficiency and scaling for reasoning tasks.
AI Snips
INSIGHT

Attention Head Ceiling Limits Nuance

  • Attention head vector dimensions have plateaued around 1,000, capping transformers' representational nuance.
  • Adrian Kosowski framed this as a fundamental ceiling on transformers' capacity for nuanced reasoning.
INSIGHT

BDH Uses A Latent Reasoning Space

  • Pathway's BDH maintains a larger latent reasoning space, so it doesn't have to verbalize every intermediate thought.
  • BDH is described as a native reasoning model that internalizes parallel reasoning, much as a chess grandmaster evaluates positions without spelling out each move.
ANECDOTE

Claude Code Debugged A Subtle Data Leak

  • Jon Krohn recounts using Claude Code to debug a training pipeline and find a subtle data-leak bug.
  • Claude processed the repo, identified the leak, and wrote the fix, illustrating strong internalized code reasoning.