Oxide and Friends

Are LLMs Insufficiently Lazy

May 3, 2026
Greg (gregorein), a Polish software engineer and critic of AI-generated code, walks through his audit of a high-profile AI-built site. He explains the HAR (HTTP Archive) findings, the viral fallout, and how he used Claude to automate part of the review. The conversation explores where LLMs should be used, the risks of measuring output by lines of code, and why human review and minimal, elegant solutions still matter.
AI Snips
INSIGHT

Betty Crocker Effect Explains LLM Overclaiming

  • CEOs discovering LLMs often fall into a Betty Crocker/IKEA effect, where minimal input creates a false sense of ownership.
  • Greg compared Garry Tan's enthusiasm to the classic cake-mix story: reintroducing a single step (cracking an egg) made users feel they had “made” the product.
INSIGHT

Lines Of Code And Tokens Create Perverse Incentives

  • Automated metrics like lines of code and token usage create perverse incentives that reward volume over quality.
  • Greg audited Garry Tan's G-Stack and found claims of 37,000 lines per day while commits rewrote roughly 40% of the code, exposing the volume figures as meaningless; a churn-measuring sketch follows below.
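
One way to see past raw volume numbers is to track deletions alongside additions: lines added per day are only impressive if they survive later commits. The sketch below is a hypothetical Python pass, not Greg's actual tooling; it tallies git log --numstat output per day, and days where deletions rival additions suggest the headline line counts were churn rather than progress.

    #!/usr/bin/env python3
    """Rough churn check: how much of each day's added code gets rewritten?

    A minimal sketch under stated assumptions, not Greg's methodology.
    Needs only a local clone of the repository under audit.
    """
    import subprocess
    from collections import defaultdict

    def daily_add_delete(repo_path):
        """Return {YYYY-MM-DD: [lines_added, lines_deleted]} across all commits."""
        out = subprocess.run(
            ["git", "-C", repo_path, "log", "--numstat",
             "--pretty=format:@%ad", "--date=short"],
            capture_output=True, text=True, check=True,
        ).stdout
        totals = defaultdict(lambda: [0, 0])
        day = None
        for line in out.splitlines():
            if line.startswith("@"):        # commit separator we injected via --pretty
                day = line[1:]
            elif line.strip() and day:
                added, deleted, _path = line.split("\t", 2)
                if added != "-":            # "-" marks binary files; skip them
                    totals[day][0] += int(added)
                    totals[day][1] += int(deleted)
        return totals

    if __name__ == "__main__":
        stats = daily_add_delete(".")
        for day in sorted(stats):
            added, deleted = stats[day]
            churn = deleted / added if added else 0.0
            print(f"{day}  +{added:>7} -{deleted:>7}  churn={churn:.0%}")

Run from inside the clone; the per-day ratios make a claim like 37,000 lines per day easy to sanity-check against how much of that code was later thrown away.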
ADVICE

Use LLMs For Audits, Not Whole-Cloth Production

  • Use LLMs as reviewers or auditors rather than sole creators, exposing issues while keeping human judgment in the loop.
  • Greg had Claude parse a HAR capture and saved site output to produce a 500-line audit revealing accessibility and legal problems; a sketch of the mechanical half of such a pass follows below.
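
For a sense of what the mechanical half of such an audit looks like, here is a small Python triage pass over a HAR file (HTTP Archive, the JSON capture format browsers export). It is an illustrative stand-in, not the Claude-driven review from the episode: the specific checks (error statuses, oversized payloads, third-party hosts, missing CSP headers) are assumptions about what is worth flagging, and the accessibility and legal findings Greg describes still require a model or a human reading the saved pages.

    #!/usr/bin/env python3
    """Tiny HAR triage: surface entries worth a closer (human or LLM) look.

    A minimal sketch; HAR is plain JSON, so this needs only the stdlib.
    """
    import json
    import sys
    from urllib.parse import urlparse

    def triage(har_path, first_party):
        """Yield one-line findings from a browser-exported HAR capture."""
        with open(har_path, encoding="utf-8") as f:
            entries = json.load(f)["log"]["entries"]
        for e in entries:
            url = e["request"]["url"]
            resp = e["response"]
            host = urlparse(url).hostname or ""
            headers = {h["name"].lower(): h["value"]
                       for h in resp.get("headers", [])}
            if resp["status"] >= 400:
                yield f"{resp['status']} response: {url}"
            if resp.get("content", {}).get("size", 0) > 1_000_000:
                yield f"payload over 1 MB: {url}"
            if not host.endswith(first_party):
                yield f"third-party request: {host}"
            elif "content-security-policy" not in headers:
                yield f"no CSP header on first-party response: {url}"

    if __name__ == "__main__":
        # usage: python har_triage.py capture.har example.com
        for finding in triage(sys.argv[1], sys.argv[2]):
            print(finding)

Feeding the raw HAR and the saved page sources to a model, as Greg did with Claude, is what turns a mechanical list like this into a full written audit.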