
Oxide and Friends: Are LLMs Insufficiently Lazy
May 3, 2026
Greg (gregorein), a Polish software engineer and critic of AI-generated code, walks through his audit of a high-profile AI-built site. He explains the HAR findings, the viral fallout, and how he used Claude to automate part of the review. The conversation explores where LLMs should be used, the risks of measuring output by lines of code, and why human review and minimal, elegant solutions still matter.
AI Snips
Betty Crocker Effect Explains LLM Overclaiming
- CEOs discovering LLMs often fall into a Betty Crocker/IKEA effect where minimal input creates a false sense of ownership.
- Greg compared Gary Tan's enthusiasm to the cake-mix story: reintroducing a single manual step (cracking an egg) made users feel they had “made” the product.
Lines Of Code And Tokens Create Perverse Incentives
- Automated metrics like lines-of-code and token-usage produce perverse incentives that reward volume over quality.
- Greg audited Gary Tan's G-Stack and found claims of 37,000 lines per day, with commits rewriting roughly 40% of the existing code, showing how little the volume metrics meant (a rough churn check is sketched below).
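
This is not Greg's tooling, just a hypothetical sketch of the kind of churn check the snip implies, assuming a local git checkout: it tallies added and deleted lines per commit with `git log --numstat`, since heavy deletions relative to additions suggest that lines-per-day figures mostly measure rewriting rather than progress.

```python
# Hypothetical sketch: per-commit added/deleted line counts from `git log --numstat`.
# A high delete-to-add ratio across commits hints that raw lines/day overstates progress.
import subprocess

def commit_churn(repo: str, n: int = 50) -> list[tuple[str, int, int]]:
    out = subprocess.run(
        ["git", "-C", repo, "log", f"-{n}", "--numstat", "--format=@%h"],
        capture_output=True, text=True, check=True,
    ).stdout

    results, sha, added, deleted = [], None, 0, 0
    for line in out.splitlines():
        if line.startswith("@"):          # commit header line from --format=@%h
            if sha is not None:
                results.append((sha, added, deleted))
            sha, added, deleted = line[1:], 0, 0
        elif line.strip():                # numstat line: "added<TAB>deleted<TAB>path"
            a, d, _ = line.split("\t", 2)
            if a.isdigit() and d.isdigit():   # skip binary files, reported as "-"
                added += int(a)
                deleted += int(d)
    if sha is not None:
        results.append((sha, added, deleted))
    return results

for sha, a, d in commit_churn("."):
    rewrite = d / a if a else 0.0
    print(f"{sha}: +{a} -{d} (rough rewrite ratio {rewrite:.0%})")
```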
Use LLMs For Audits Not Whole-Cloth Production
- Use LLMs as reviewers or auditors rather than sole creators to expose issues while keeping human judgment in the loop.
- Greg had Claude parse a HAR capture and the saved site output to produce a 500-line audit revealing accessibility and legal problems (a flavor of that HAR summarization is sketched below).
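
Again not Greg's actual pipeline, only a minimal Python sketch of the kind of HAR summarization one might run before handing a capture to an LLM reviewer. The file name capture.har is a placeholder; field access assumes the standard HAR 1.2 layout (log.entries with request and response objects).

```python
# Hypothetical sketch: condense a HAR capture into a short report an LLM (or a
# human reviewer) could work from, instead of pasting the raw JSON.
import json
from collections import Counter

def summarize_har(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        har = json.load(f)

    entries = har["log"]["entries"]
    statuses = Counter(e["response"]["status"] for e in entries)
    total_bytes = sum(max(e["response"].get("bodySize", 0), 0) for e in entries)
    hosts = Counter(e["request"]["url"].split("/")[2] for e in entries)

    return "\n".join([
        f"requests: {len(entries)}",
        f"status codes: {dict(statuses)}",
        f"total response bytes: {total_bytes}",
        "top hosts: " + ", ".join(h for h, _ in hosts.most_common(5)),
    ])

if __name__ == "__main__":
    print(summarize_har("capture.har"))
```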




