
Google Launches Gemini 3.1 and YouTube AI
The AI Podcast
Real-World Leaderboards Matter
Jaeden argues that blind human evaluations and leaderboards are a better indicator of real improvement than self-reported scores.