Grammar Girl: For Writers and Language Lovers.

Why AI loves em dashes, with Sean Goedecke

Feb 5, 2026
Sean Goedecke, a software engineer who wrote about AI and language-model quirks, explains why models favor em dashes and archaic punctuation. He links those habits to training on digitized 19th-century books and to human feedback shaping modern AI style. The conversation also touches on tokenization myths, training-data sources, and how AI patterns can alter writers' punctuation.
INSIGHT

Em Dashes Are An Emergent Training Artifact

  • AI models' punctuation choices emerge from training data rather than deliberate design.
  • Goedecke explains that models are "grown" by tuning billions of parameters on text, which produces emergent behaviors such as heavy dash use.
INSIGHT

Old Books Pushed Em Dashes Into Models

  • Later models used more em dashes after teams added scanned books to training corpora.
  • Goedecke points to public-domain books from the late 1800s and early 1900s, which use em dashes heavily, as a likely source.
ANECDOTE

Moby-Dick Shows Historical Dash Obsession

  • Goedecke cites Moby-Dick as an extreme example of em-dash density.
  • He notes that Melville's pages contain about 20 em dashes each, illustrating the punctuation norms of the period.