
The Nonlinear Library LW - Reinforcement Via Giving People Cookies by Screwtape
Nov 15, 2023
08:08
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Reinforcement Via Giving People Cookies, published by Screwtape on November 15, 2023 on LessWrong.
I.
Thinking By The Clock is now the most popular thing I've written on LessWrong, so here's another entry in the list of things that significantly changed how I think and operate, and that I learned from a few stray lines of Harry Potter and the Methods of Rationality. It's quite appropriate for this subject to be the followup you all get because the last one got upvoted so much.
As far as I can tell this just straightforwardly works.
I hereby propose giving immediate positive feedback for things you want more of, or in simpler words, give people cookies. In my own experience, this really works, and it works on many levels. There are more ways to go astray ethically with negative reinforcement so I am not here making an argument to use that side of the coin, but offering people positive reinforcement seems pretty unobjectionable to me. Reward your friends, reward your enemies, reward yourself!
II.
Let's start with that last point about rewarding yourself.
There's a particular treat I give myself every time I work out. As soon as I finish the workout, I get the treat. (A fruit smoothie.) This has been going on for years, to the point where my reaction is basically Pavlovian. By the time I finish lacing up my running shoes, I'm already thinking of the reward. Sometimes I've noticed an internal urge to go for a run or pick up the weights, and when I trace the source of the urge it's often that a smoothie sounds good right now.
I seem to be unusually good at holding myself to my own rules (most people remark that they could just make the smoothie and not work out, and predict that they would do that instead) but I'm at least n=1 evidence that you can classically condition yourself. But we can go smaller and faster.
There's this thing I see people do sometimes where they do something and then immediately point out all the flaws with it. It seems like it's usually people with some kind of anxiety, and I can't tell which direction the causation goes.
They'll play some new piece on the guitar and as soon as they finish their face scrunches up like they smelled something bad and they point out how many notes they missed on that third line, and then someone else in the room will say something like "oh yeah, I noticed that" and the player will look even more frustrated with themselves. Some amount of this seems useful for the learning process, but the people who can make mistakes and laugh about it seem happier to play more guitar.
I notice this even more when trying to brainstorm or come up with lots of ideas. I'll watch someone sit silently for whole minutes, and then write one idea down. See, what's going on in my head is that I'm earning points for every idea I come up with, even the bad ones. Another idea, another point. Evaluation of whether it's a good idea is a separate process and has to be. The points can be awarded very fast and entirely mentally and still have a tiny positive ding of reward.
"Hermione," Harry said seriously, as he started to dig down into the red-velvet pouch again, "don't punish yourself when a bright idea doesn't work out. You've got to go through a lot of flawed ideas to find one that might work. And if you send your brain negative feedback by frowning when you think of a flawed idea, instead of realizing that idea-suggesting is good behavior by your brain to be encouraged, pretty soon you won't think of any ideas at all."
Reward yourself. If you punish yourself for trying things and not being perfect, you learn not to try things.
III.
You know what else is fast? Smiling. For a while I was spending a lot of time studying human facial expressions. It felt like every other week I'd run across some news article or another promising positive cheer and e...
