Owl Posting

We don't know what most microbial genes do. Can genomic language models help? (Yunha Hwang, Ep #7)

Dec 8, 2025
Yunha Hwang, an assistant professor at MIT and co-founder of Tatta Bio, discusses the problem of functional annotation in microbial genomes: most microbial genes have no known function, and her lab is applying genomic language models to close that gap. She describes OMG, a machine-learning-ready metagenomic dataset, and GLM2, a multimodal genomic model, and explains how such tools could improve our understanding of genome function and reshape evolutionary perspectives.
INSIGHT

Most Microbial Genes Are Uncharacterized

  • Even in well-studied microbes like E. coli, half to two-thirds of genes lack functional annotation.
  • In environmental microbes, 80–95% of genes often have no functional annotation.
INSIGHT

Context Matters More Than Raw Matching

  • Genomic context (sample, genomic neighborhood, taxonomy) adds crucial signal beyond sequence matching.
  • Context lets you interpret small sequence differences in biologically meaningful ways.
INSIGHT

Raw Metagenomes Aren't ML-Ready

  • Public metagenomic databases exist but are often unusable for ML without heavy quality control (QC) and debiasing.
  • OMG was built by aggregating, filtering, dereplicating, and debiasing public collections for ML readiness.