
AXRP - the AI X-risk Research Podcast 43 - David Lindner on Myopic Optimization with Non-myopic Approval
Jun 15, 2025
David Lindner, a research scientist at Google DeepMind, dives into Myopic Optimization with Non-myopic Approval (MONA). He explains how MONA addresses multi-step reward hacking by ensuring actions align with human approval. The discussion covers MONA's strengths and failure cases, comparing it to approaches like conservatism. Lindner highlights the importance of choosing approval signals carefully and the potential of MONA to create capable AI agents. The episode also explores practical deployment strategies and the complex dynamics of safety versus capability.
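The core idea described above can be sketched in code: under MONA-style training, the policy update for each step uses only the immediate reward plus an overseer's non-myopic approval of the action, with no return bootstrapped from future steps, so multi-step reward-hacking strategies earn no credit. This is a minimal, hypothetical sketch (the toy environment, approval function, and tabular softmax policy are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def env_reward(state, action):
    # Hypothetical immediate environment reward for this toy setting.
    return float(action == state % 2)

def approval(state, action):
    # Hypothetical overseer: scores the action by its judged long-run value,
    # without the agent ever optimizing future reward directly.
    return 1.0 if action == state % 2 else -1.0

def mona_step(theta, state, lr=0.1):
    # One MONA-style myopic policy-gradient update: the advantage is the
    # immediate reward plus approval -- no discounted sum over future steps.
    logits = theta[state]
    probs = np.exp(logits) / np.exp(logits).sum()
    action = rng.choice(len(probs), p=probs)
    advantage = env_reward(state, action) + approval(state, action)
    grad = -probs
    grad[action] += 1.0          # d log pi(action|state) / d logits
    theta[state] += lr * advantage * grad
    return theta
```

Because the update never propagates value from later timesteps, any "payoff" a hack delivers several steps downstream is invisible to the optimizer; the agent only pursues plans insofar as the overseer approves each step.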
AI Snips
Pick A Safe Myopic Timescale
- Choose the largest myopic time step that you judge safe against catastrophic harms.
- Combine that with threat modeling and empirical oversight testing before expanding the horizon.
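One way to read this snip: the myopic horizon controls how far ahead rewards can flow into the agent's training signal. A minimal illustration with a hypothetical helper (the function and numbers are mine, not from the episode):

```python
def truncated_return(rewards, horizon, gamma=1.0):
    # Return credited to the current step under a myopic horizon:
    # only the next `horizon` rewards count; later payoffs are ignored.
    return sum(gamma**i * r for i, r in enumerate(rewards[:horizon]))

# A multi-step hack whose payoff arrives at step 3 contributes nothing
# when the horizon is 1, but dominates the full-horizon return:
hack_payoffs = [0.0, 0.0, 10.0]
```

Choosing the largest horizon judged safe, as the snip suggests, trades off this blindness to delayed payoffs (good against multi-step hacks) against the agent's ability to plan legitimately over longer spans.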
Combine MONA With Amplified Oversight
- Use scalable or amplified oversight to improve non-myopic approval accuracy across longer horizons.
- Test how quickly oversight can be corrupted and limit horizon accordingly.
Prioritize Realistic Experiments
- Build realistic, modern agents and run MONA-style RL to test safety-performance trade-offs.
- Prefer experimental comparisons across approval designs rather than purely theoretical claims.
