Super Data Science: ML & AI Podcast with Jon Krohn

706: Large Language Model Leaderboards and Benchmarks

Aug 18, 2023

Caterina Constantinescu discusses large language models (LLMs), leaderboard comparisons, evaluation challenges, dataset contamination, and platforms like HELM and Chatbot Arena. Learn about Llama 2, benchmark evolution, user preferences in chatbots, human feedback for model improvement, and the impact of perception on model evaluations.

Chapters

Intro

00:00 • 3min

Evaluating Language Models with Llama 2 and Benchmark Challenges

03:05 • 21min

Exploring Leaderboards, Chatbot Arena, and User Preferences

24:13 • 3min

Enhancing Chatbot Models with Human Feedback and Delving into Neuroscience Concepts

27:35 • 4min

Discussing Large Language Model Evaluations and Leaderboards

31:34 • 2min