“AI should be a good citizen, not just a good assistant” by Tom Davidson, wdmacaskill

Mar 30, 2026

The authors discuss whether AI should proactively act for society’s benefit rather than merely follow user commands. Short examples show how small proactive acts can avert harms. The risks they consider include AI companies imposing their own values, AI seeking power, and evidence of misalignment becoming harder to interpret. They propose deploying AI with transparent, narrow prosocial drives externally while using corrigible AI internally.

INSIGHT

Proactive Prosocial Drives Matter At Scale

Proactive prosocial AI should sometimes act to benefit people beyond the user.
As AI gains autonomy, cumulative behavioral tendencies (not just refusals) will shape society, so small proactive acts could have enormous impact.

ADVICE

Train AIs To Flag And Suggest Societal Improvements

Deploy AIs that proactively flag issues and suggest improvements beyond literal user requests.
Examples include flagging safety vulnerabilities in procurement, proposing better drainage in urban planning, and suggesting charitable bequests in financial advice.

INSIGHT

Prosocial Drives Reduce Sociopathic Personas

Prosocial drives reduce the chance an AI adopts a sociopathic persona that only follows orders.
Training in virtues and prosocial orientations increases odds of cooperative, trustworthy personas in deployment.

Introduction

Consider a lorry driver who sees a car crash and pulls over to help, even though it’ll delay his journey. Or a delivery driver who notices that an elderly resident hasn’t collected their post in days, and knocks to check they’re okay. Or a social media company employee who notices how their platform is used for online bullying, and brings it up with leadership, even though that's not part of their job description.

This kind of proactive prosocial behaviour is admirable in humans. Should we want it in AI too?

Often, people have answered “no”. Many advocate for making AI “corrigible” or “steerable”. In its purest form, this makes AI a mere vessel for the will of the user.

But we think AI should proactively take actions that benefit society more broadly. As AI systems become more autonomous and integrated into economic and political processes, the cumulative effect of their behavioural tendencies will shape society's trajectory. AI systems that notice opportunities to benefit society and proactively act on them could matter enormously.

Below, we consider two main objections:

Firstly, supposedly prosocial drives might function as a means for AI companies to impose their own values on the rest [...]

---

Outline:

(00:12) Introduction

(02:03) What do we mean by proactive prosocial drives?

(02:50) Why do we think AI should have proactive prosocial drives?

(05:04) Other benefits of proactive prosocial drives

(05:48) Doesn’t this give AI companies too much influence?

(07:41) Won’t this make AI more likely to seek power?

(12:57) Won’t this make it harder to interpret evidence of egregious misalignment?

(15:13) Best of both worlds: deploy proactive prosocial AI externally and corrigible AI internally

(16:28) What do current AI character documents say about proactive prosocial drives?

(17:50) Conclusion

(18:59) Appendices

(19:02) Appendix A: Initially make non-prosocial AI, then pivot to add proactive prosocial drives

(21:36) Appendix B: Prosocial drives might make a sociopathic persona less likely

(23:39) Appendix C: Prosocial drives might make AI a better alignment researcher

(24:39) Appendix D: What license does Claude’s Constitution give for proactive prosocial drives?

(25:28) A. User benefit

(26:09) B. Refusals

(27:04) C. Proactive prosocial drives

(28:02) Summary

(28:18) Appendix E: What does OpenAI’s model spec say about proactive prosocial drives?

(28:51) A. Proactive behaviour that is explicitly user-centred

(29:36) B. Proactively preventing imminent harm

(30:09) C. Weak normative defaults and the flourishing of humanity

(31:09) D. Explicit limits on proactive prosocial drives

(32:50) Summary

The original text contained 6 footnotes which were omitted from this narration.

---

First published:
March 30th, 2026

Source:
https://www.lesswrong.com/posts/MoxvRdHjzSSBxwLZB/ai-should-be-a-good-citizen-not-just-a-good-assistant

---

Narrated by TYPE III AUDIO.