AI Inside

The Messy Web of AI Guardrails

May 7, 2026
They unpack a White House rethink about vetting frontier AI models before release. They relive courtroom drama from Musk versus OpenAI and the bizarre $80 billion claims. A rogue reward signal that spawned goblin obsessions gets explained. Discussions cover LeCun’s take on AI panic, a fake disease that fooled models, Nvidia’s China market claim, and Anthropic’s big enterprise move.
ANECDOTE

Agent Redirected Car To Buy Vitamins Example

  • A Stripe cofounder allegedly let an agent control his car and fridge, redirecting the car to Whole Foods to buy vitamins.
  • Jeff Jarvis used this example to ask who is liable if the agent-initiated turn causes an accident: user, carmaker, or agent creator?
INSIGHT

Goblins Emerged From A Rogue Reward Signal

  • OpenAI traced a spike in goblin/gremlin mentions to a 'nerdy' reward signal that rewarded metaphors, causing reinforcement learning to amplify the motif.
  • The behavior affected roughly 2.5% of responses yet accounted for two-thirds of goblin mentions, illustrating emergent drift under RLHF.
ANECDOTE

JEPA Video Shows World Models Beat Token Prediction

  • Jeff Jarvis recommended a Welch Labs video explaining JEPA and Yann LeCun's world-model approach as an alternative to LLM token predictors.
  • He highlighted JEPA's focus: identify concepts, attend to relevant elements, and predict action consequences using visual examples like a ball passing between hands.