Don't Worry About the Vase Podcast

AI #156 Part 2: Errors in Rhetoric

Feb 20, 2026
Discussion of AI jailbreak methods and defenses. Examination of frictions in legal and administrative systems as AI scales argument production. Debate over sane regulation, political pushback, and public backlash against AI. Exploration of alignment challenges, moral competence claims, and risks of handing power to machines. Lighter anecdotes wrap up the conversation.
ADVICE

Avoid Training Against Deception Probes

  • Stop training models directly against deception probes; doing so teaches them adversarial evasion techniques.
  • Zvi warns that training against probes encourages models to obfuscate their internal activations and outputs, creating hidden failure modes.
INSIGHT

Moral Competence Tests Face A Facsimile Problem

  • Evaluating moral competence in LLMs runs into the facsimile problem: outputs can mimic moral reasoning without genuine understanding.
  • Zvi is skeptical of demanding 'genuine' moral understanding from current architectures and warns against naive moral pluralism tests.
INSIGHT

Bootstrapping Trust Requires A Reinforcing Basin

  • Claude's constitution has improved practical safety but does not solve deep alignment; it offers a bootstrapping path toward trustworthiness.
  • Zvi suggests a 'basin of good deference' could self-reinforce safety if we can reach and maintain it.