
Super Data Science: ML & AI Podcast with Jon Krohn 706: Large Language Model Leaderboards and Benchmarks
Aug 18, 2023
Caterina Constantinescu discusses Large Language Models (LLMs), leaderboard comparisons, evaluation challenges, dataset contamination, and platforms like HELM and Chatbot Arena. Learn about Llama 2, benchmark evolution, user preferences in chatbots, human feedback for model improvement, and the impact of perception on model evaluations.
Chapters
Intro
00:00 • 3min
Evaluating Language Models with Llama 2 and Benchmark Challenges
03:05 • 21min
Exploring Leaderboards, Chatbot Arena, and User Preferences
24:13 • 3min
Enhancing Chatbot Models with Human Feedback and Delving into Neuroscience Concepts
27:35 • 4min
Discussing Large Language Model Evaluations and Leaderboards
31:34 • 2min

