Discussing Large Language Model Evaluations and Leaderboards

This chapter explores how users' perceptions of large language models differ from what benchmark evaluations report, and addresses the problem of evaluation dataset contamination. It also discusses the advantages and disadvantages of popular LLM leaderboards such as HELM, Chatbot Arena, and the Hugging Face Open LLM Leaderboard.

Play episode from 31:34
