Vanishing Gradients

Episode 72: Why Agents Solve the Wrong Problem (and What Data Scientists Do Instead)

Mar 20, 2026
Bryan Bischof, Head of AI at Theory Ventures and long-time data scientist, explains how real-world data challenges expose where agents fail. He recounts a hackathon that tested agents on SQL, logs, and 750,000 PDFs. Topics include failure funnels and binary checkpoints, why unlimited submissions encourage hill-climbing, DocETL for document extraction, MCP as a semantic layer, and when simple coding agents beat heavy frameworks.
INSIGHT

Evaluate Agents With Checkpointed Failure Funnels

  • Use checkpointed "failure funnels" of binary questions to evaluate multi-step agent workflows instead of vague 1–5 scales.
  • Bryan built staged evals so each passing checkpoint reveals where reasoning broke, enabling targeted fixes and transition-matrix analysis (see the sketch below).
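
A minimal sketch of the checkpointed failure-funnel pattern in Python, assuming a hypothetical text-to-SQL task: the trace schema, checkpoint names, and helper functions here are illustrative inventions, not Bryan's actual implementation.

```python
# Minimal sketch of a checkpointed "failure funnel": each checkpoint is a
# binary question over an agent's trace, evaluated in order. A run is scored
# by how far it gets before its first failure, pinpointing where the workflow
# broke instead of averaging quality onto a vague 1-5 scale.
# NOTE: checkpoint names and the trace schema are hypothetical illustrations.
from collections import Counter
from dataclasses import dataclass
from typing import Callable

@dataclass
class Checkpoint:
    name: str
    passed: Callable[[dict], bool]  # binary question over the trace

# Hypothetical funnel for a text-to-SQL question.
FUNNEL = [
    Checkpoint("found_right_table", lambda t: t.get("table") == t.get("expected_table")),
    Checkpoint("query_executed", lambda t: t.get("sql_ran", False)),
    Checkpoint("correct_aggregation", lambda t: t.get("agg") == t.get("expected_agg")),
    Checkpoint("final_answer_correct", lambda t: t.get("answer") == t.get("expected_answer")),
]

def run_funnel(trace: dict) -> list[tuple[str, bool]]:
    """Evaluate checkpoints in order, stopping at the first failure."""
    results = []
    for cp in FUNNEL:
        ok = bool(cp.passed(trace))
        results.append((cp.name, ok))
        if not ok:
            break
    return results

def failure_stage_counts(traces: list[dict]) -> Counter:
    """Tally where runs drop out of the funnel; aggregated over many runs,
    this feeds the kind of transition-matrix analysis mentioned above."""
    counts: Counter = Counter()
    for trace in traces:
        name, ok = run_funnel(trace)[-1]
        counts["passed_all" if ok else name] += 1
    return counts
```

Aggregated over many submissions, the stage counts show which checkpoint loses the most runs, which is where targeted fixes pay off.
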
ANECDOTE

America's Next Top Modeler Hackathon Results

  • Bryan ran America's Next Top Modeler: 150 in-person participants solved real data questions with AI across SQL, logs, and 750,000 PDFs.
  • He included a human baseline who wasn't allowed to use any AI, so human and agent workflows could be compared directly.
INSIGHT

Agents Struggle On Real Multimodal Data Tasks

  • The median score was 23/65, showing agents are still "mid" on realistic, multimodal data tasks.
  • Tasks combined CSVs, JSONL logs, and a massive PDF corpus to simulate real data science complexity.