
AXRP - the AI X-risk Research Podcast 43 - David Lindner on Myopic Optimization with Non-myopic Approval
Jun 15, 2025
David Lindner, a research scientist at Google DeepMind, dives into Myopic Optimization with Non-myopic Approval (MONA). He explains how MONA addresses multi-step reward hacking by ensuring actions align with human approval. The discussion covers MONA's strengths and failure cases, comparing it to approaches like conservatism. Lindner highlights the importance of choosing approval signals carefully and the potential of MONA to create capable AI agents. The episode also explores practical deployment strategies and the complex dynamics of safety versus capability.
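The core idea described above can be sketched in code: under MONA-style training, the policy update for each step uses only the immediate reward plus an overseer's non-myopic approval of the action, with no return bootstrapped from future steps, so multi-step reward-hacking strategies earn no credit. This is a minimal, hypothetical sketch (the toy environment, approval function, and tabular softmax policy are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def env_reward(state, action):
    # Hypothetical immediate environment reward for this toy setting.
    return float(action == state % 2)

def approval(state, action):
    # Hypothetical overseer: scores the action by its judged long-run value,
    # without the agent ever optimizing future reward directly.
    return 1.0 if action == state % 2 else -1.0

def mona_step(theta, state, lr=0.1):
    # One MONA-style myopic policy-gradient update: the advantage is the
    # immediate reward plus approval -- no discounted sum over future steps.
    logits = theta[state]
    probs = np.exp(logits) / np.exp(logits).sum()
    action = rng.choice(len(probs), p=probs)
    advantage = env_reward(state, action) + approval(state, action)
    grad = -probs
    grad[action] += 1.0          # d log pi(action|state) / d logits
    theta[state] += lr * advantage * grad
    return theta
```

Because the update never propagates value from later timesteps, any "payoff" a hack delivers several steps downstream is invisible to the optimizer; the agent only pursues plans insofar as the overseer approves each step.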
AI Snips
Pick A Safe Myopic Timescale
- Choose the largest myopic time step that you judge safe against catastrophic harms.
- Combine that with threat modeling and empirical oversight testing before expanding the horizon.
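One way to read this snip: the myopic horizon controls how far ahead rewards can flow into the agent's training signal. A minimal illustration with a hypothetical helper (the function and numbers are mine, not from the episode):

```python
def truncated_return(rewards, horizon, gamma=1.0):
    # Return credited to the current step under a myopic horizon:
    # only the next `horizon` rewards count; later payoffs are ignored.
    return sum(gamma**i * r for i, r in enumerate(rewards[:horizon]))

# A multi-step hack whose payoff arrives at step 3 contributes nothing
# when the horizon is 1, but dominates the full-horizon return:
hack_payoffs = [0.0, 0.0, 10.0]
```

Choosing the largest horizon judged safe, as the snip suggests, trades off this blindness to delayed payoffs (good against multi-step hacks) against the agent's ability to plan legitimately over longer spans.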
Combine MONA With Amplified Oversight
- Use scalable or amplified oversight to improve non-myopic approval accuracy across longer horizons.
- Test how quickly oversight can be corrupted and limit horizon accordingly.
Prioritize Realistic Experiments
- Build realistic, modern agents and run MONA-style RL to test safety-performance trade-offs.
- Prefer experimental comparisons across approval designs rather than purely theoretical claims.
