Mia Glaese
VP of Research at OpenAI, overseeing the Codex, human data, and alignment teams. Involved in benchmark creation and in research on coding agents and preparedness evaluations.
Best podcasts with Mia Glaese
Ranked by the Snipd community
432 snips • Feb 23, 2026 • 26 min
⚡️The End of SWE-Bench Verified — Mia Glaese & Olivia Watkins, OpenAI Frontier Evals & Human Data
Mia Glaese, VP of Research at OpenAI who oversees Codex and alignment work, and Olivia Watkins, a Frontier Evals evaluator focused on contamination and evaluation design, discuss why SWE‑Bench Verified became saturated and contaminated. They walk through its human curation, show examples of contamination and narrow tests, and explain the move toward tougher, more diverse benchmarks that measure longer‑horizon coding tasks and real‑world product skills.