These views are my own and not necessarily representative of those of any colleagues with whom I have worked on AI control.
TL;DR: It's much cheaper and quicker to just throw some honeypots at your monitor models than to robustly prove trustedness for every model you want to use. Therefore I think the most likely future involves untrusted monitoring, with some monitor validation, as the default path.
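To make the honeypot idea concrete, here is a minimal sketch of validating an untrusted monitor: plant known-attack transcripts ("honeypots") among real ones and check that the monitor flags them. Everything below (the `Transcript` type, `run_monitor`, the thresholds) is a hypothetical illustration of the general technique, not an API from the post or any real control framework.

```python
# A minimal sketch of honeypot-based monitor validation, per the TL;DR:
# rather than proving a monitor model is trusted, mix known-bad
# transcripts into its input stream and check that it flags them.
# All names here are hypothetical illustrations.
import random
from dataclasses import dataclass


@dataclass
class Transcript:
    text: str
    is_honeypot: bool  # True if this is a known-bad example we planted


def run_monitor(transcript: Transcript) -> float:
    """Hypothetical call to the (untrusted) monitor model.

    Returns a suspicion score in [0, 1]; higher means more suspicious.
    """
    raise NotImplementedError("replace with a real monitor call")


def validate_monitor(
    real: list[Transcript],
    honeypots: list[Transcript],
    flag_threshold: float = 0.5,
    min_catch_rate: float = 0.9,
) -> bool:
    """Return True iff the monitor flags enough planted honeypots.

    A monitor that systematically fails to flag honeypots may be
    colluding with the policy model, so we refuse to rely on it.
    """
    batch = real + honeypots
    random.shuffle(batch)  # the monitor must not see which inputs are tests
    caught = sum(
        1 for t in batch if t.is_honeypot and run_monitor(t) >= flag_threshold
    )
    return caught / len(honeypots) >= min_catch_rate
```

The cost asymmetry the TL;DR points at shows up here: this check needs only a batch of planted examples and some monitor calls, whereas establishing that a model is trusted requires open-ended evaluation across its whole deployment distribution.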
This post talks about two different ways of monitoring AI systems: trusted monitoring and untrusted monitoring. You can find papers distinguishing between them here, here, and here, a video here, or you can look at this graphic, which didn't make it into Gardner-Challis et al. (2026).
Context
From when I started working on untrusted monitoring until very recently[1], I assumed that trusted monitoring was the default policy that AI companies would use, and that untrusted monitoring was an exotic, high-effort policy that would be much less likely to be used.
Now I think it's the other way round.
Full Trustedness is Hard
The distinction between an untrusted model and a trusted one exists in the map. A trusted model is normally described as one for which we have strong evidence that [...]
---
Outline:
(01:04) Context
(01:25) Full Trustedness is Hard
(04:05) Untrusted Monitoring is Just Limited-Scope Trusted Monitoring
(05:22) Cost and Speed
(06:14) Monitoring is Here!
The original text contained 1 footnote which was omitted from this narration.
---
First published:
March 20th, 2026
Source:
https://www.lesswrong.com/posts/b5oHr5TrfQzCBZdXW/untrusted-monitoring-is-default-trusted-monitoring-is-not
---
Narrated by TYPE III AUDIO.