Super Data Science: ML & AI Podcast with Jon Krohn

706: Large Language Model Leaderboards and Benchmarks

Aug 18, 2023
Caterina Constantinescu discusses Large Language Models (LLMs), leaderboard comparisons, evaluation challenges, dataset contamination, and platforms like HELM and Chatbot Arena. Learn about Llama 2, benchmark evolution, user preferences in chatbots, human feedback for model improvement, and the impact of perception on model evaluations.