
The Nonlinear Library LW - 'Theories of Values' and 'Theories of Agents': confusions, musings and desiderata by Mateusz Bagiński
Nov 16, 2023
37:06
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 'Theories of Values' and 'Theories of Agents': confusions, musings and desiderata, published by Mateusz Bagiński on November 16, 2023 on LessWrong.
Meta:
Content signposts: we talk about limits to expected utility theory; what values are (and ways in which we're confused about what values are); the need for a "generative"/developmental logic of agents (and their values); types of constraints on the "shape" of agents; relationships to FEP/active inference; and (ir)rational/(il)legitimate value change.
Context: we're basically just chatting about topics of mutual interest, so the conversation is relatively free-wheeling and includes a decent amount of "creative speculation".
Epistemic status: involves a bunch of "creative speculation" that we don't think is true at face value and which may or may not turn out to be useful for making progress on deconfusing our understanding of the respective territory.
Expected utility theory (stated in terms of the VNM axioms or something equivalent) thinks of rational agents as composed of two "parts", i.e., beliefs and preferences. Beliefs are expressed in terms of probabilities that are updated in the process of learning (e.g., Bayesian updating). Preferences can be expressed as an ordering over alternative states of the world, outcomes, or something similar. If we assume that an agent's preferences satisfy the four VNM axioms (or some equivalent desiderata), then those preferences can be represented by some real-valued utility function u, and the agent will behave as if it were maximizing the expected value of that u.
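For concreteness, here is a minimal sketch of the representation result being gestured at; the notation (a set of outcomes X, lotteries L and M, and a preference relation ≽) is ours, added for illustration rather than taken from the conversation.

```latex
% VNM representation theorem (sketch; notation is ours, not from the post).
% If the preference relation \succeq over lotteries on a set of outcomes X
% satisfies completeness, transitivity, continuity, and independence, then
% there exists u : X \to \mathbb{R} such that
L \succeq M
\iff
\sum_{x \in X} p_L(x)\, u(x) \;\ge\; \sum_{x \in X} p_M(x)\, u(x)
% i.e., the agent ranks lotteries as if maximizing expected u.
% Moreover, u is unique up to a positive affine transformation a*u + b, a > 0.
```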
On this account, beliefs change in response to evidence, whereas values/preferences in most cases don't. Rational behavior comes down to (behaving as if one is) ~maximizing one's preference satisfaction/expected utility. Most changes to one's preferences are detrimental to their satisfaction, so rational agents should want to keep their preferences unchanged (i.e., utility function preservation is an instrumentally convergent goal).
Thus, for a preference modification to be rational, it would have to result in higher expected utility than leaving the preferences unchanged. My impression is that the most often discussed setup where this is the case involves interactions between two or more agents. For example, if you and some other agent have somewhat conflicting preferences, you may reach a compromise where each of you makes their preferences somewhat more similar to the other's. This costs both of you a bit of (expected subjective) utility, but less than you would lose (in expectation) if you engaged in destructive conflict.
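To make that trade-off concrete, here is a toy expected-utility calculation; the payoffs and the 0.5 conflict probability are made-up numbers chosen purely for illustration.

```latex
% Toy numbers (purely illustrative): refusing to compromise leads to
% destructive conflict half the time, while shifting one's preferences
% toward the other agent's guarantees a modest but safe payoff.
\mathbb{E}[u \mid \text{no compromise}]
  = 0.5 \cdot u(\text{prevail}) + 0.5 \cdot u(\text{conflict})
  = 0.5 \cdot 10 + 0.5 \cdot (-20) = -5
\qquad
\mathbb{E}[u \mid \text{compromise}] = u(\text{compromise}) = 6 > -5
% So the preference modification yields higher expected utility than
% keeping the original preferences unchanged.
```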
Another scenario justifying modification of one's preferences is when you realize the world is different than you expected on your priors, such that you need to abandon the old ontology and/or readjust it. If your preferences were defined in terms of (or strongly entangled with) concepts from the previous ontology, then you will also need to refactor your preferences.
You think that this is a confused way to think about rationality. For example, you see self-induced/voluntary value change as something that in some cases is legitimate/rational.
I'd like to elicit some of your thoughts about value change in humans. What makes a specific case of value change (il)legitimate? How is that tied to the concepts of rationality, agency, etc? Once we're done with that, we can talk more generally about arguments for why the values of an agent/system should not be fixed.
Sounds good?
On a meta note: I've been using the words "preference" and "value" more or less interchangeably, without giving much thought to it. Do you view them as interchangeable or would you rather first make some conceptual/terminological clarification?
Sounds great!
(And I'm happy to use "preferences" and "values" interc...
