
Super Data Science: ML & AI Podcast with Jon Krohn A Post-Transformer Architecture Crushes Sudoku (Transformers Solve ~0%)
Mar 27, 2026 A hard Sudoku benchmark exposes a major weakness in top transformer models, while a new post-transformer BDH architecture reaches 97.4% accuracy. The discussion covers why Sudoku is a powerful test of constraint solving and reasoning, and what BDH changes about model state, sparsity, and continual learning to achieve far better efficiency and scaling on reasoning tasks.
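As a minimal illustration of why Sudoku is a demanding constraint-solving benchmark (this sketch is not from the episode, and the function name is hypothetical): every cell placement must simultaneously satisfy three overlapping "all-different" constraints, one for its row, one for its column, and one for its 3x3 box.

```python
def is_valid_move(grid, row, col, digit):
    """Return True if placing `digit` at (row, col) breaks no Sudoku constraint.

    `grid` is a 9x9 list of lists; 0 marks an empty cell.
    """
    # Row constraint: digit must not already appear in this row.
    if any(grid[row][c] == digit for c in range(9)):
        return False
    # Column constraint: digit must not already appear in this column.
    if any(grid[r][col] == digit for r in range(9)):
        return False
    # Box constraint: digit must not already appear in the cell's 3x3 box.
    br, bc = 3 * (row // 3), 3 * (col // 3)  # top-left corner of the box
    if any(grid[br + r][bc + c] == digit for r in range(3) for c in range(3)):
        return False
    return True
```

Because the three constraints interact across the whole grid, hard puzzles require propagating many such checks in parallel rather than pattern-matching locally, which is the capability the benchmark probes.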
Attention Head Ceiling Limits Nuance
- Attention head vector dimension has plateaued around ~1,000, capping transformers' representational nuance.
- Adrian Kosowski framed this as a fundamental ceiling on transformers' capacity for nuanced reasoning.
BDH Uses A Latent Reasoning Space
- Pathway's BDH maintains a larger latent reasoning space that doesn't have to verbalize every intermediate thought.
- BDH is described as a native reasoning model enabling internalized parallel reasoning like a chess grandmaster.
Claude Code Debugged A Subtle Data Leak
- Jon Krohn recounts using Claude Code to debug a training pipeline and find a subtle data-leak bug.
- Claude processed the repo, identified the leak, and wrote the fix, illustrating strong internalized code reasoning.
