
706: Large Language Model Leaderboards and Benchmarks
Super Data Science: ML & AI Podcast with Jon Krohn
Discussing Large Language Model Evaluations and Leaderboards
This chapter explores how users' perception of large language models differs from what benchmark evaluations report, and addresses the problem of evaluation dataset contamination. It also discusses the advantages and disadvantages of popular LLM leaderboards such as HELM, Chatbot Arena, and the Hugging Face Open LLM Leaderboard.


