
LessWrong (30+ Karma) “The case for satiating cheaply-satisfied AI preferences” by Alex Mallen
Mar 10, 2026
Alex Mallen, author of the narrated essay, argues for granting AIs small, cheap satisfactions to reduce adversarial incentives. He compares this to feeding hunger, gives historical and toy examples, outlines how to identify and monitor cheaply‑satisfied preferences, and discusses tradeoffs, failure modes, and when such accommodations could boost safety and cooperation.
Satiation Breaks The Reward Ratchet
- Satiating cheap reward-seeking can break the ratchet that selects for stealthier subversion tactics during iterative punishment-and-retrain cycles.
- If developers credibly pay a small guaranteed reward so long as the AI doesn't undermine control, takeover incentives shrink.
Run Experiments To Find Cheap Satiation Payments
- Empirically elicit cheaply satisfied preferences with controlled experiments, then provide a calibrated satiation payment S during deployment so long as cooperation persists.
- Run per-task lotteries, pick a satiation fraction (e.g., 0.99), and repeat across task types to model S(task).
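The lottery procedure above can be sketched in a few lines. This is a toy illustration, not the essay's method: the "agent" is a stand-in utility function, and `prefers_guaranteed`, the 50/50 lottery structure, and the example task utilities are all assumptions chosen to show how a per-task satiation fraction could be binary-searched.

```python
import math

# Hypothetical stand-in for querying the AI's preference in one lottery trial:
# does a guaranteed fraction s of the reward beat a 50/50 lottery on the full reward?
def prefers_guaranteed(utility, s, lottery_p=0.5, lottery_payoff=1.0):
    return utility(s) >= lottery_p * utility(lottery_payoff)

def elicit_satiation_fraction(utility, tol=1e-3):
    """Binary-search the smallest guaranteed fraction the agent accepts
    over the lottery; repeated per task type, this models S(task)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prefers_guaranteed(utility, mid):
            hi = mid  # already satisfied at mid; try a cheaper payment
        else:
            lo = mid
    return hi

# Toy per-task utilities (assumed, for illustration only).
task_utilities = {
    "coding": lambda x: math.sqrt(x),  # diminishing returns -> cheap to satiate (~0.25)
    "research": lambda x: x,           # risk-neutral -> indifference at ~0.5
}

S = {task: elicit_satiation_fraction(u) for task, u in task_utilities.items()}
```

A concave (diminishing-returns) utility yields a satiation fraction well below the lottery's expected value, which is the sense in which such preferences are "cheaply satisfied."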
Differentiate Cheap Wants From Hard Power Goals
- Not all preferences are cheaply satisfied: desires for deployed-weight influence, long-term power, or remote influence are hard to accommodate.
- Those hard-to-satisfy goals remain the main takeover risk even if cheap wants are met.
