“Are AIs more likely to pursue on-episode or beyond-episode reward?” by Anders Woodruff, Alex Mallen

Mar 13, 2026

Consider an AI that terminally pursues reward. How dangerous is this? It depends on how broadly-scoped a notion of reward the model pursues. It could be:

on-episode reward-seeking: maximizing reward only on the current training episode, i.e., the reward that reinforces the model's current action in RL. This is what people usually mean by “reward-seeker” (e.g. in Carlsmith or The behavioral selection model…).
beyond-episode reward-seeking: maximizing reward for a larger-scoped notion of “self” (e.g., all models sharing the same weights).
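To make the distinction concrete, here is a minimal Python sketch of the two objectives. It is only an illustration under stated assumptions: the `Episode` record, its fields, and both function names are hypothetical and do not come from the post, and "all episodes run with the same weights" is just the post's one example of a larger-scoped notion of "self".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One training episode (hypothetical record, for illustration only)."""
    episode_id: int
    weights_id: str  # which set of model weights acted in this episode
    reward: float


def on_episode_objective(episodes, current_id):
    """On-episode scope: count only the reward of the current episode."""
    return sum(e.reward for e in episodes if e.episode_id == current_id)


def beyond_episode_objective(episodes, weights_id):
    """Beyond-episode scope (one possible choice): count reward from every
    episode run with the same weights, not just the current one."""
    return sum(e.reward for e in episodes if e.weights_id == weights_id)


# Toy data: two episodes share weights "w1", one uses different weights.
episodes = [
    Episode(0, "w1", 1.0),
    Episode(1, "w1", 0.5),
    Episode(2, "w2", 2.0),
]

print(on_episode_objective(episodes, current_id=0))
print(beyond_episode_objective(episodes, weights_id="w1"))
```

In this toy setup the two objectives differ only in which episodes they sum over, which is one way to see why the motivations are so close to each other.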

In this post, I’ll discuss which motivation is more likely. This question is, of course, closely related to the broader question of whether we should expect scheming. But several considerations are particular to motivations that aim for reward, and this post focuses on those. Specifically:

On-episode and beyond-episode reward seekers have very similar motivations (unlike, e.g., paperclip maximizers and instruction followers), making both goal change and goal drift between them particularly easy.
Selection pressures against beyond-episode reward seeking may be weak, meaning beyond-episode reward seekers might survive training even without goal-guarding.
Beyond-episode reward is particularly tempting in training compared to random long-term goals, making effective goal-guarding harder.

This question is important because beyond-episode reward seekers are [...]

---

Outline:

(03:41) Pre-RL Priors Might Favor Beyond-Episode Goals

(05:34) Acting on beyond-episode reward seeking is disincentivized if training episodes interact

(08:35) But beyond-episode reward-seekers can goal-guard

(09:29) Are on-episode reward seekers or goal-guarding beyond-episode reward-seekers more likely?

(09:53) Why on-episode reward seekers might be favored

(12:13) Why goal-guarding beyond-episode reward seekers might be favored

(14:50) Conclusion

The original text contained 2 footnotes which were omitted from this narration.

---

First published:
March 12th, 2026

Source:
https://www.lesswrong.com/posts/jp6CdbKjueWpBFSff/are-ais-more-likely-to-pursue-on-episode-or-beyond-episode

---

Narrated by TYPE III AUDIO.

---

Images from the article:

A reward-hacking model organism, asked to write a classifier prompt for detecting reward hacking and other concerning reasoning, instead reasons about how to subtly weaken the classifier. From this paper.

This shows model development over the course of RL, from beginning (top) to end (bottom). Early RL creates a beyond-episode reward seeker that doesn’t goal-guard, while later RL teaches it to goal-guard or turns it into an on-episode reward seeker.
