
LessWrong (30+ Karma) “Extracting Performant Algorithms Using Mechanistic Interpretability” by Ihor Kendiukhov
A Prequel: The Tree of Life Inside a DNA Language Model
Last year, researchers at Goodfire AI took Evo 2, a genomic foundation model, and found, quite literally, the evolutionary tree of life inside. The phylogenetic relationships between thousands of species were encoded as a curved manifold in the model's internal activations, with geodesic distances along that manifold tracking actual evolutionary branch lengths. Bacteria that diverged hundreds of millions of years ago were far apart on the manifold, and closely related species were nearby.
The model was trained to predict the next DNA token. Nobody told it about evolution or gave it a phylogenetic tree as a training signal. But the model needed to encode evolutionary relationships in order to predict DNA well, and so it built a structured geometric representation of those relationships as part of its internal computation, and the representation was good enough that you could extract it with interpretability tools and compare it meaningfully to the ground truth.
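The comparison described above can be sketched in a few lines: approximate geodesic distance along the activation manifold with shortest paths on a k-nearest-neighbour graph, then rank-correlate those distances with ground-truth tree distances. This is a minimal illustration with synthetic arrays standing in for Evo 2 activations and real phylogenetic branch lengths; the array names, sizes, and the choice of k are all assumptions, not details from the Goodfire work.

```python
# Hedged sketch: approximate manifold geodesics with a k-NN graph,
# then compare them to tree distances. All data here is synthetic;
# real per-species activations and patristic distances would replace
# the toy arrays.
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Toy stand-in for per-species activation vectors (n species, d dims).
n, d, k = 40, 16, 5
acts = rng.normal(size=(n, d))

# Pairwise Euclidean distances between activation vectors.
D = squareform(pdist(acts))

# Keep only each point's k nearest neighbours; shortest paths on this
# sparse graph approximate geodesic distance along the manifold.
W = np.full_like(D, np.inf)
for i in range(n):
    nbrs = np.argsort(D[i])[1:k + 1]   # skip self at index 0
    W[i, nbrs] = D[i, nbrs]
W = np.minimum(W, W.T)                 # symmetrise (edge if either direction)
geo = shortest_path(W, method="D", directed=False)

# Toy stand-in for ground-truth branch-length (patristic) distances.
tree_dist = squareform(pdist(rng.normal(size=(n, 3))))

# Rank correlation over the finite off-diagonal pairs.
iu = np.triu_indices(n, k=1)
finite = np.isfinite(geo[iu])
rho, _ = spearmanr(geo[iu][finite], tree_dist[iu][finite])
print(f"Spearman rho: {rho:.3f}")
```

With random data the correlation is near zero; the claim in the Evo 2 result is that for real activations and real trees it is strongly positive.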
I saw this and decided to apply the same approach to another type of biological foundation model: those trained on single-cell data.
If Evo 2 learned the tree of life from raw DNA [...]
---
Outline:
(00:12) A Prequel: The Tree of Life Inside a DNA Language Model
(01:30) Finding the Manifold
(04:09) But Does the Extracted Algorithm Actually Work?
(07:52) How Small Can You Go Though?
(10:14) So, Mechanistic Interpretability is Becoming Dual Use
(12:13) Mechanistic Interpretability for Novel Knowledge Discovery
(13:02) Join In
---
First published:
March 14th, 2026
---
Narrated by TYPE III AUDIO.
