AXRP - the AI X-risk Research Podcast

Daniel Filan

AXRP (pronounced axe-urp) is the AI X-risk Research Podcast where I, Daniel Filan, have conversations with researchers about their papers. We discuss the paper, and hopefully get a sense of why it's been written and how it might reduce the risk of AI causing an existential catastrophe: that is, permanently and drastically curtailing humanity's future potential. You can visit the website and read transcripts at axrp.net.

Episodes

Oct 4, 2024 • 1h 44min

37 - Jaime Sevilla on AI Forecasting

Jaime Sevilla, Director of Epoch AI, dives into the intricacies of AI forecasting and compute trends. He discusses the exponential growth in computational power and its implications for AI development. The conversation highlights the tight relationship between algorithmic improvements and scaling, considering whether scaling is the key to achieving AGI. Sevilla also tackles challenges in GPU production and the importance of transparent AI training processes. Get ready for some thought-provoking insights into the future of artificial intelligence!

Sep 29, 2024 • 1h 48min

36 - Adam Shai and Paul Riechers on Computational Mechanics

Adam Shai, co-founder of Simplex AI Safety, dives into the realm of computational mechanics and its application to AI safety. He explores how computational mechanics can improve our understanding of neural network models, especially in predicting outcomes. The discussion covers the intriguing world models that transformers create and how fractals emerge in these networks. Shai also highlights the potential of combining insights from quantum information theory with computational mechanics to enhance AI interpretability.

Sep 28, 2024 • 6min

New Patreon tiers + MATS applications

Patreon: https://www.patreon.com/axrpodcast
MATS: https://www.matsprogram.org
Note: I'm employed by MATS, but they're not paying me to make this video.

Aug 24, 2024 • 2h 17min

35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization

In this discussion, Peter Hase, a researcher specializing in large language models, dives into the intriguing world of AI beliefs. He explores whether LLMs truly have beliefs and how to detect and edit them. A key focus is on the complexities of interpreting neural representations and the implications of belief localization. The conversation also covers the concept of easy-to-hard generalization, revealing insights on how AI tackles different task difficulties. Join Peter as he navigates these thought-provoking topics, blending philosophy with practical AI research.

Jul 28, 2024 • 2h 14min

34 - AI Evaluations with Beth Barnes

Beth Barnes, the founder and head of research at METR, dives into the complexities of evaluating AI systems. She discusses tailored threat models and the unpredictability of AI performance, stressing the need for precise assessment methodologies. Barnes highlights issues like sandbagging and behavior misrepresentation, emphasizing the importance of ethical considerations in AI evaluations. The conversation also touches on the role of policy in shaping effective evaluation science, as well as the disparities between different AI labs in security and monitoring.

Jun 12, 2024 • 1h 41min

33 - RLHF Problems with Scott Emmons

Scott Emmons discusses challenges in Reinforcement Learning from Human Feedback (RLHF), including deceptive inflation, overjustification, and bounded human rationality, along with possible solutions. He also touches on dimensional analysis and his research program, emphasizing the importance of addressing these challenges in AI systems.

May 30, 2024 • 2h 22min

32 - Understanding Agency with Jan Kulveit

Jan Kulveit, who leads the Alignment of Complex Systems research group, dives into the fascinating intersection of AI and human cognition. He discusses active inference, the differences between large language models and the human brain, and how feedback loops influence behavior. The conversation explores hierarchical agency, the complexities of aligning AI with human values, and the philosophical implications of self-awareness in AI. Kulveit also critiques existing frameworks for understanding agency, shedding light on the dynamics of collective behaviors.

May 7, 2024 • 2h 32min

31 - Singular Learning Theory with Daniel Murfet

Daniel Murfet, a researcher specializing in singular learning theory and Bayesian statistics, dives into the intricacies of deep learning models. He explains how singular learning theory enhances our understanding of learning dynamics and phase transitions in neural networks. The conversation explores local learning coefficients, their impact on model accuracy, and how singular learning theory compares with other frameworks. Murfet also discusses the potential for this theory to contribute to AI alignment, emphasizing interpretability and the challenges of integrating AI capabilities with human values.

Apr 30, 2024 • 2h 16min

30 - AI Security with Jeffrey Ladish

AI security expert Jeffrey Ladish discusses the robustness of safety training in AI models, the dangers of open LLMs, securing systems against attackers, and the state of computer security. The conversation explores undoing safety filters, AI phishing, and making AI more legible. Topics include securing model weights, defending against AI exfiltration, and red lines in AI development.

Apr 25, 2024 • 2h 14min

29 - Science of Deep Learning with Vikrant Varma

Vikrant Varma discusses challenges with unsupervised knowledge discovery, grokking in neural networks, circuit efficiency, and the role of complexity in deep learning. The conversation delves into the balance between memorization and generalization, exploring neural circuits, implicit priors, optimization, and alignment projects at DeepMind.
