706: Large Language Model Leaderboards and Benchmarks

Super Data Science: ML & AI Podcast with Jon Krohn

Discussing Large Language Model Evaluations and Leaderboards

This chapter explores how users' perceptions of large language models can differ from what benchmark evaluations show, and addresses the problem of evaluation dataset contamination. It also discusses the advantages and disadvantages of popular LLM leaderboards such as HELM, Chatbot Arena, and the Hugging Face Open LLM Leaderboard.

This chapter begins at 31:34 in the episode.