LessWrong (30+ Karma)

“Paper close reading: ‘Why Language Models Hallucinate’” by LawrenceC

Apr 6, 2026
A step-by-step close reading of a paper that frames hallucinations as plausible guessing under uncertainty. Short checks compare the model's claims against examples and benchmarks. The reading examines the paper's reduction of generation errors to binary classification, debates whether its learning-theory framing supports that view, and explores changing benchmark incentives as a possible mitigation.
INSIGHT

Hallucinations Defined As Guessing Under Uncertainty

  • Hallucinations are framed as plausible but incorrect statements produced when the model is uncertain.
  • LawrenceC notes this definition excludes logical-reasoning errors like incorrect algebra steps, which feel different from uncertainty-driven guessing.
INSIGHT

Training And Evaluation Incentivize Plausible Guesses

  • The paper claims training and evaluation reward guessing over admitting uncertainty, creating statistical pressures that produce hallucinations.
  • LawrenceC connects this to pretraining data patterns and post-training human grading that reward plausible-sounding answers over "I don't know."
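The incentive claim above comes down to a simple expected-value calculation. A minimal sketch (my own toy illustration, not code from the paper): under accuracy-only 0/1 grading, any nonzero chance of being right makes guessing out-score abstaining, while adding a penalty for wrong answers restores an incentive to say "I don't know" when uncertain.

```python
def expected_score(p_correct: float, guess: bool) -> float:
    """Expected grade under 0/1 accuracy-only grading.

    Guessing earns 1 with probability p_correct, else 0.
    Abstaining ("I don't know") always earns 0.
    """
    return p_correct if guess else 0.0

def expected_score_penalized(p_correct: float, guess: bool,
                             penalty: float = 1.0) -> float:
    """Expected grade when wrong answers cost `penalty` points."""
    return p_correct - penalty * (1 - p_correct) if guess else 0.0

# Under accuracy-only grading, even a 10% shot at the right answer
# out-scores honest abstention in expectation.
assert expected_score(0.1, guess=True) > expected_score(0.1, guess=False)

# With a -1 penalty for errors, guessing only pays when p_correct > 0.5,
# so an uncertain model is better off abstaining.
assert expected_score_penalized(0.1, guess=True) < 0.0
assert expected_score_penalized(0.9, guess=True) > 0.0
```

This is why the paper's proposed mitigation targets the grading scheme rather than the model: the optimal policy flips as soon as wrong answers cost more than abstentions.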
INSIGHT

Hallucinations As Binary Classification Errors

  • The authors reduce hallucinations to errors in binary classification: if incorrect statements can't be distinguished from facts, pretraining pressures produce hallucinations.
  • LawrenceC is skeptical because the learning-theory framing may ignore model-specific structure and capacity limits.
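The reduction can be caricatured in a few lines. A toy construction of my own, not the paper's formal argument: treat the model as a distribution over candidate statements and use its own probabilities as an "is-it-valid" classifier (valid iff above-uniform mass). A model whose probabilities fail to separate facts from plausible errors then errs both as a classifier and, when sampled from, as a generator.

```python
def errors(model_prob: dict, truth: dict) -> tuple:
    """Return (classifier error rate, generation error mass) for a toy model.

    Classifier: label a statement valid iff the model assigns it
    above-uniform probability. Generator: sample from model_prob, so
    the generation error is the total mass on invalid statements.
    """
    uniform = 1 / len(truth)
    cls_err = sum((model_prob[s] >= uniform) != valid
                  for s, valid in truth.items()) / len(truth)
    gen_err = sum(p for s, p in model_prob.items() if not truth[s])
    return cls_err, gen_err

truth = {"fact_a": True, "fact_b": True, "err_a": False, "err_b": False}

# A model that separates facts from plausible errors: low on both counts.
sharp = {"fact_a": 0.45, "fact_b": 0.45, "err_a": 0.05, "err_b": 0.05}
# A model that can't tell them apart: errs as classifier AND generator.
confused = {"fact_a": 0.25, "fact_b": 0.25, "err_a": 0.25, "err_b": 0.25}

print(errors(sharp, truth))     # low classification error, low hallucination mass
print(errors(confused, truth))  # both high together
```

The skepticism noted above targets exactly the gap this toy hides: a real model's inability to classify may come from capacity or data limits with structure the worst-case learning-theory bound does not capture.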