Mia Glaese

VP of Research at OpenAI, overseeing the Codex, human data, and alignment teams; involved in benchmark creation and in research on coding agents and preparedness evaluations.

Best podcasts with Mia Glaese

Ranked by the Snipd community

432 snips

Feb 23, 2026 • 26min

⚡️The End of SWE-Bench Verified — Mia Glaese & Olivia Watkins, OpenAI Frontier Evals & Human Data

Mia Glaese, VP of Research at OpenAI who oversees Codex and alignment work, and Olivia Watkins, a Frontier Evals evaluator focused on contamination and evaluation design, discuss why SWE‑Bench Verified became saturated and contaminated. They walk through its human curation, show examples of contamination and narrow tests, and explain the move toward tougher, more diverse benchmarks that measure longer‑horizon coding tasks and real‑world product skills.
