
Unsupervised Learning with Jacob Effron Ep 80: CEO of Surge AI Edwin Chen on Why Frontier Labs Are Diverging, RL Environments & Developing Model Taste
Dec 15, 2025
Edwin Chen, Founder and CEO of Surge AI, shares insights from running a data infrastructure company that supports major AI labs such as OpenAI and Meta. He discusses the pitfalls of optimizing for clickbait benchmarks and how that practice gives a misleading picture of model quality. Chen argues for rigorous human evaluations over benchmark gaming and critiques Silicon Valley's pivot culture. The conversation also covers the diversity of AI training approaches, with Chen advocating for multiple opinionated models tailored to specific needs in future AI development.
AI Snips
RL Environments Are The Next Data Frontier
- RL environments represent the next step beyond SFT and RLHF, requiring realistic simulated worlds and extensive tooling.
- Building them demands populated worlds, executable tools, diverse prompts, and deep measurement infrastructure.
Track Trajectories, Not Only Final Rewards
- Track model trajectories and dig into failures to avoid short-term reward hacks masking deeper weaknesses.
- Analyze why models fail, not just whether they get rewarded, to shore up underlying capabilities.
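To make the trajectory point concrete, here is a minimal Python sketch of logging per-step outcomes alongside the final reward. It is not from the episode: the `Step`, `Trajectory`, `failure_report`, and `suspicious_successes` names, the failure tags, and the reward threshold are all hypothetical, chosen only to illustrate looking at why an episode failed rather than only whether it was rewarded.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Step:
    tool: str        # hypothetical name of the tool the model called
    ok: bool         # whether this call succeeded
    note: str = ""   # hypothetical failure tag, e.g. "wrong_args"


@dataclass
class Trajectory:
    task_id: str
    steps: list[Step] = field(default_factory=list)
    reward: float = 0.0  # final reward assigned to the episode


def failure_report(trajectories):
    """Count failure tags across all episodes, including rewarded ones."""
    tags = Counter()
    for traj in trajectories:
        for step in traj.steps:
            if not step.ok:
                tags[step.note or "untagged"] += 1
    return tags


def suspicious_successes(trajectories, threshold=1.0):
    """Return ids of episodes that reached the reward threshold
    despite at least one failed step (possible reward hacks)."""
    return [
        t.task_id
        for t in trajectories
        if t.reward >= threshold and any(not s.ok for s in t.steps)
    ]
```

A report like this only surfaces candidates for review. Deciding whether a rewarded episode with failed steps is a genuine reward hack still takes a human reading the trajectory.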
Human Data, Not Just Staffing, Powers Environments
- Creating RL environments is primarily a human-data challenge that requires technology to scale and verify quality.
- Pure staffing approaches miss the tooling and quality signals needed for rich, creative worlds.

