
AI Inside: The Messy Web of AI Guardrails
May 7, 2026

They unpack a White House rethink about vetting frontier AI models before release. They relive courtroom drama from Musk versus OpenAI and the bizarre $80 billion claims. A rogue reward signal that spawned goblin obsessions gets explained. Discussions cover LeCun’s take on AI panic, a fake disease that fooled models, Nvidia’s China market claim, and Anthropic’s big enterprise move.
Agent Redirected Car To Buy Vitamins Example
- A Stripe cofounder reportedly let an agent control his car and fridge; the agent redirected the car to Whole Foods to buy vitamins.
- Jeff Jarvis used this example to ask who is liable if an agent-initiated turn causes an accident: the user, the carmaker, or the agent's creator?
Goblins Emerged From A Rogue Reward Signal
- OpenAI traced a spike in goblin/gremlin mentions to a 'nerdy' reward signal that over-rewarded metaphors, which reinforcement learning then amplified into a recurring motif.
- The behavior affected roughly 2.5% of responses yet accounted for two-thirds of goblin mentions, illustrating emergent drift under RLHF.
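The amplification dynamic the snip describes can be caricatured in a few lines. This is a toy sketch, not OpenAI's actual training setup: a policy picks one of four invented "motifs", one motif gets a small reward bonus from a quirky reward signal, and repeated REINFORCE-style updates concentrate probability on it far beyond its initial share.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def amplify(rewards, steps=200, lr=0.5):
    """Repeatedly nudge each motif's logit by its probability-weighted
    reward advantage (a crude policy-gradient update)."""
    logits = [0.0] * len(rewards)               # start uniform
    baseline = sum(rewards) / len(rewards)      # simple reward baseline
    for _ in range(steps):
        probs = softmax(logits)
        logits = [l + lr * p * (r - baseline)
                  for l, p, r in zip(logits, probs, rewards)]
    return softmax(logits)

# Four motifs; the "goblin" motif (index 0) gets a mere 10% reward bonus.
rewards = [1.10, 1.00, 1.00, 1.00]
final = amplify(rewards)   # goblin's share grows from 25% to a majority
```

The point of the sketch is that the bias compounds: a small, consistent reward edge becomes a dominant behavior once the update loop feeds on itself, which is the shape of the drift the episode attributes to the rogue signal.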
JEPA Video Shows World Models Beat Token Prediction
- Jeff Jarvis recommended a Welch Labs video explaining JEPA and Yann LeCun's world-model approach as an alternative to LLM token predictors.
- He highlighted JEPA's focus: identify concepts, attend to relevant elements, and predict action consequences using visual examples like a ball passing between hands.
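The contrast the snip draws, predicting in a learned representation space rather than predicting raw future observations, can be sketched minimally. This is a caricature of the JEPA idea, not Meta's architecture: the encoder, predictor, features, and numbers below are all invented for illustration.

```python
def encode(obs):
    """Toy 'encoder': keep only coarse features (mean and spread),
    discarding pixel-level detail the model shouldn't have to predict."""
    return [sum(obs) / len(obs), max(obs) - min(obs)]

def predict(embedding, action):
    """Toy predictor: guess how the embedding shifts under an action."""
    shift = 1.0 if action == "move_right" else -1.0
    return [embedding[0] + shift, embedding[1]]

def latent_loss(pred, target):
    """Squared error in embedding space, not in observation space."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

# Two "frames" of a scene: the object shifted right by one unit,
# plus pixel-level noise the encoder is meant to ignore.
frame_now  = [0.0, 1.0, 2.0]
frame_next = [1.02, 1.98, 3.0]

pred = predict(encode(frame_now), "move_right")
loss = latent_loss(pred, encode(frame_next))   # small despite the noise
```

A token- or pixel-level predictor would be penalized for every noisy detail of `frame_next`; the latent predictor only has to get the concept-level consequence of the action right, which is the distinction the recommended video illustrates with the ball-between-hands example.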


