
706: Large Language Model Leaderboards and Benchmarks
Super Data Science: ML & AI Podcast with Jon Krohn
00:00
Exploring Leaderboards, Chatbot Arena, and User Preferences
Exploring the HELM and Hugging Face Open LLM Leaderboards and the Chatbot Arena, which gathers crucial data through human-driven head-to-head comparisons. The upcoming episode will feature Professor Joey Gonzalez of UC Berkeley explaining how the Chatbot Arena evaluates models using Elo ratings.
Transcript


