706: Large Language Model Leaderboards and Benchmarks

Super Data Science: ML & AI Podcast with Jon Krohn

Exploring Leaderboards, Chatbot Arena, and User Preferences

This segment explores the HELM and Hugging Face Open LLM Leaderboard benchmarks, along with the Chatbot Arena, which gathers user-preference data through human-driven head-to-head comparisons of models. The upcoming episode will feature Professor Joey Gonzalez of UC Berkeley explaining the evaluation process within the Chatbot Arena, which uses Elo ratings to assess models.
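
To make the Elo idea concrete, here is a minimal Python sketch of the textbook Elo update applied to a single head-to-head comparison between two models. It is an illustration only, not the Chatbot Arena's actual implementation: the function names, the starting rating of 1000, the K-factor of 32, and the 400-point scale are all assumptions chosen for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if a human voter preferred model A, 0.0 if they
    preferred model B, and 0.5 for a tie. k (assumed 32 here) controls
    how far a single vote moves the ratings.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two hypothetical models start level; one voter prefers model A.
model_a, model_b = update_elo(1000.0, 1000.0, score_a=1.0)
print(model_a, model_b)
```

With equal starting ratings, the expected score is 0.5 for each side, so the winner gains half of K and the loser gives up the same amount. An upset win over a higher-rated model moves the ratings further than an expected win does.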
