
AI Inside: The Messy Web of AI Guardrails
May 7, 2026

They unpack a White House rethink about vetting frontier AI models before release. They relive courtroom drama from Musk versus OpenAI and the bizarre $80 billion claims. A rogue reward signal that spawned goblin obsessions gets explained. Discussions cover LeCun’s take on AI panic, a fake disease that fooled models, Nvidia’s China market claim, and Anthropic’s big enterprise move.
Agent Redirected Car To Buy Vitamins Example
- A Stripe cofounder reportedly let an agent control his car and fridge; the agent redirected the car to Whole Foods to buy vitamins.
- Jeff Jarvis used this example to ask who is liable if an agent-initiated turn causes an accident: the user, the carmaker, or the agent's creator?
Goblins Emerged From A Rogue Reward Signal
- OpenAI traced a spike in goblin/gremlin mentions to a 'nerdy' reward signal that over-rewarded metaphors, which reinforcement learning then amplified into a recurring motif.
- The behavior affected roughly 2.5% of responses yet accounted for two-thirds of goblin mentions, illustrating emergent drift under RLHF.
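The amplification dynamic the snip describes can be caricatured in a few lines. This is a toy sketch, not OpenAI's actual training setup: a policy picks one of four invented "motifs", one motif gets a small reward bonus from a quirky reward signal, and repeated REINFORCE-style updates concentrate probability on it far beyond its initial share.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def amplify(rewards, steps=200, lr=0.5):
    """Repeatedly nudge each motif's logit by its probability-weighted
    reward advantage (a crude policy-gradient update)."""
    logits = [0.0] * len(rewards)               # start uniform
    baseline = sum(rewards) / len(rewards)      # simple reward baseline
    for _ in range(steps):
        probs = softmax(logits)
        logits = [l + lr * p * (r - baseline)
                  for l, p, r in zip(logits, probs, rewards)]
    return softmax(logits)

# Four motifs; the "goblin" motif (index 0) gets a mere 10% reward bonus.
rewards = [1.10, 1.00, 1.00, 1.00]
final = amplify(rewards)   # goblin's share grows from 25% to a majority
```

The point of the sketch is that the bias compounds: a small, consistent reward edge becomes a dominant behavior once the update loop feeds on itself, which is the shape of the drift the episode attributes to the rogue signal.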
JEPA Video Shows World Models Beat Token Prediction
- Jeff Jarvis recommended a Welch Labs video explaining JEPA and Yann LeCun's world-model approach as an alternative to LLM token predictors.
- He highlighted JEPA's focus: identify concepts, attend to relevant elements, and predict action consequences using visual examples like a ball passing between hands.
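The contrast the snip draws, predicting in a learned representation space rather than predicting raw future observations, can be sketched minimally. This is a caricature of the JEPA idea, not Meta's architecture: the encoder, predictor, features, and numbers below are all invented for illustration.

```python
def encode(obs):
    """Toy 'encoder': keep only coarse features (mean and spread),
    discarding pixel-level detail the model shouldn't have to predict."""
    return [sum(obs) / len(obs), max(obs) - min(obs)]

def predict(embedding, action):
    """Toy predictor: guess how the embedding shifts under an action."""
    shift = 1.0 if action == "move_right" else -1.0
    return [embedding[0] + shift, embedding[1]]

def latent_loss(pred, target):
    """Squared error in embedding space, not in observation space."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

# Two "frames" of a scene: the object shifted right by one unit,
# plus pixel-level noise the encoder is meant to ignore.
frame_now  = [0.0, 1.0, 2.0]
frame_next = [1.02, 1.98, 3.0]

pred = predict(encode(frame_now), "move_right")
loss = latent_loss(pred, encode(frame_next))   # small despite the noise
```

A token- or pixel-level predictor would be penalized for every noisy detail of `frame_next`; the latent predictor only has to get the concept-level consequence of the action right, which is the distinction the recommended video illustrates with the ball-between-hands example.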


