Unsupervised Learning with Jacob Effron

Ep 80: CEO of Surge AI Edwin Chen on Why Frontier Labs Are Diverging, RL Environments & Developing Model Taste

Dec 15, 2025
Edwin Chen, founder and CEO of Surge AI, a data infrastructure company that supports major AI labs such as OpenAI and Meta, discusses the pitfalls of optimizing for clickbait benchmarks and how doing so gives a misleading picture of model quality. He argues for rigorous human evaluations over benchmark gaming and critiques Silicon Valley's pivot culture. The conversation also covers the diversity of AI training approaches, with Chen advocating multiple opinionated models, each tailored to specific needs, in future AI development.
INSIGHT

RL Environments Are The Next Data Frontier

  • RL environments represent the next step beyond SFT and RLHF, requiring realistic simulated worlds and extensive tooling.
  • Building them demands populated worlds, executable tools, diverse prompts, and deep measurement infrastructure.
ADVICE

Track Trajectories, Not Only Final Rewards

  • Track model trajectories and dig into failures to avoid short-term reward hacks masking deeper weaknesses.
  • Analyze why models fail, not just whether they get rewarded, to shore up underlying capabilities.
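
One way to picture the advice above is a small logging sketch in Python. It is not from the episode: the `Step` and `Trajectory` records, the failure labels, and the sample runs are all hypothetical, chosen only to show how per-step records can surface weaknesses that a final reward alone would hide.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One action the model took in the environment (hypothetical schema)."""
    tool: str        # name of the tool the model called
    ok: bool         # whether the call succeeded
    note: str = ""   # short failure label; empty when the step succeeded


@dataclass
class Trajectory:
    """A full episode: every step plus the final reward."""
    task_id: str
    final_reward: float
    steps: List[Step] = field(default_factory=list)


def failure_breakdown(trajectories):
    """Count failure labels across all steps, regardless of final reward."""
    counts = Counter()
    for traj in trajectories:
        for step in traj.steps:
            if not step.ok:
                counts[step.note] += 1
    return counts


def rewarded_but_flawed(trajectories, threshold=1.0):
    """Return IDs of episodes that earned full reward yet contain failed steps."""
    return [
        t.task_id
        for t in trajectories
        if t.final_reward >= threshold and any(not s.ok for s in t.steps)
    ]


# Made-up example data, for illustration only.
runs = [
    Trajectory("t1", 1.0, [Step("search", True), Step("submit", True)]),
    Trajectory("t2", 1.0, [Step("search", False, "bad_query"), Step("submit", True)]),
    Trajectory("t3", 0.0, [Step("search", False, "bad_query"),
                           Step("submit", False, "wrong_format")]),
]

print(failure_breakdown(runs))    # which failure modes recur across episodes
print(rewarded_but_flawed(runs))  # rewarded episodes that still hid a failed step
```

In this sketch, the second helper flags episodes that a reward-only dashboard would count as clean successes, which is the kind of case the advice suggests digging into.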
INSIGHT

Human Data, Not Just Staffing, Powers Environments

  • Creating RL environments is primarily a human-data challenge that requires technology to scale and verify quality.
  • Pure staffing approaches miss the tooling and quality signals needed for rich, creative worlds.