
LessWrong (30+ Karma) [Linkpost] “Metagaming matters for training, evaluation, and oversight” by jenny, Bronson Schoen
Mar 19, 2026
This is a link post.
Following up on our previous work on verbalized eval awareness, we are sharing a post investigating the emergence of metagaming reasoning in a frontier training run.
- Metagaming is a more general, and in our experience more useful, concept than evaluation awareness.
- It arises in frontier training runs and does not require training on honeypot environments.
- Verbalization of metagaming can go down over the course of training.
We also share some quantitative analyses, qualitative examples, and upcoming work.
---
First published:
March 18th, 2026
Linkpost URL:
https://alignment.openai.com/metagaming
---
Narrated by TYPE III AUDIO.
