
Grammar Girl: For Writers and Language Lovers. Why AI loves em dashes, with Sean Goedecke
Feb 5, 2026
Sean Goedecke, a software engineer who wrote about AI and language-model quirks, explains why models favor em dashes and archaic punctuation. He links those habits to training on digitized 19th-century books and to human feedback shaping modern AI style. The conversation touches on tokenization myths, training-data sources, and how AI patterns can alter writers' punctuation.
AI Snips
Em Dashes Are An Emergent Training Artifact
- AI models' punctuation choices emerge from training data rather than deliberate design.
- Sean Goedecke explains that models are "grown" by tuning billions of parameters on text inputs, which produces emergent behaviors such as heavy dash usage.
Old Books Pushed Em Dashes Into Models
- Later models used more em dashes after teams added scanned books to training corpora.
- Goedecke points to public-domain books from the late 1800s and early 1900s, which use em dashes heavily, as a likely source.
Moby Dick Shows Historical Dash Obsession
- Sean cites Moby Dick as an extreme example of em dash density.
- He notes that Melville's pages contain about 20 em dashes each, illustrating the punctuation norms of the period.




