On self-reward shaping

In this post, I’ll share some informal thoughts about “self-reward shaping”, i.e. doing things in order to have more rewarding experiences of behaviour which helps to achieve one’s goals.

Reward shaping is a term I’m borrowing from reinforcement learning, where it means modifying (“shaping”) the reward function in a way that doesn’t change the optimal policy, but makes learning that policy easier.
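To make the reinforcement learning sense concrete, here is a minimal sketch of one well-known form, potential-based shaping, where a term γΦ(s′) − Φ(s) is added to the reward. The five-state chain, the potential function, and every name in the code are my own illustrative assumptions, not part of any library. The idea is that the shaped reward gives feedback on every step, yet the greedy policy comes out the same.

```python
GAMMA = 0.9
N_STATES = 5            # hypothetical chain: states 0..4, state 4 is the goal
GOAL = N_STATES - 1
ACTIONS = (-1, +1)      # step left or step right


def step(state, action):
    """Deterministic transition, clamped to the ends of the chain."""
    return min(max(state + action, 0), GOAL)


def base_reward(state, action, next_state):
    """Sparse reward: 1 only on the step that reaches the goal."""
    return 1.0 if next_state == GOAL else 0.0


def potential(state):
    """Assumed potential: negative distance to the goal (zero at the goal)."""
    return float(state - GOAL)


def shaped_reward(state, action, next_state):
    """Base reward plus the shaping term gamma * phi(s') - phi(s)."""
    return (base_reward(state, action, next_state)
            + GAMMA * potential(next_state) - potential(state))


def q_values(reward_fn, sweeps=200):
    """Q-value iteration over the non-terminal states."""
    q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(GOAL):
            for a in ACTIONS:
                s2 = step(s, a)
                # The goal is terminal, so nothing is earned after reaching it.
                future = 0.0 if s2 == GOAL else max(q[(s2, b)] for b in ACTIONS)
                q[(s, a)] = reward_fn(s, a, s2) + GAMMA * future
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}


base_q = q_values(base_reward)
shaped_q = q_values(shaped_reward)
base_policy = greedy_policy(base_q)
shaped_policy = greedy_policy(shaped_q)
print(base_policy == shaped_policy)
```

In this toy setup the shaped Q-values differ from the original ones only by Φ(s), which is the same for every action in a given state, so the ranking of actions is unchanged. Setting the potential to zero at the terminal state is a choice I made to keep that property in an episodic task.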

Accordingly, the thing I’m calling “self-reward shaping” (SRS) is making changes to yourself or your environment such that you have more rewarding experiences of behaviour which helps to achieve your goals (and less rewarding experiences of behaviour which doesn’t)¹.

For example:

I want to exercise more, so I attend gym classes because I find having someone yelling at me to exercise highly motivating.
I want to spend a lot of effort writing my thesis, so I tap into my desire to be a special snowflake, and convince myself that if I spend a lot of effort, the result will be a thesis that somehow makes me special.
I want to stop picking my fingernails, so I paint them because the paint makes it very tangible that I’m doing something destructive to myself when I pick my fingernails, and I dislike doing destructive things to myself.

It occurs to me that perhaps there’s an interesting distinction between what I’ll call “what” and “how” SRS. “What” SRS changes what I find rewarding. “How” SRS changes the mechanism by which I’m rewarded.

Any attempt at SRS involves both a “what” and a “how” component. In the above examples:

What	How
Exercise	Social motivation
Effort spent on thesis	Desire to be special
Stop picking fingernails	Dislike of doing destructive things to myself

I think this distinction is interesting because there are distinct risks involved in each component of SRS, and it’s helpful to think about these separately.

Risks of “how” SRS

I think the main risk here is using mechanisms that have perverse side-effects. Take the example of motivating myself to write my thesis. I think that my desire to be a special snowflake is pretty darn annoying and will make my life significantly worse in the long run. If using that mechanism to motivate myself to work hard had the side-effect of generally fuelling my desire to be special (which seems pretty plausible), then using this mechanism was probably a net loss (even though I did indeed work very hard on my thesis).

By contrast, my experiments with fingernail painting were better in this respect. My dislike for doing destructive things to myself is something I wholeheartedly endorse, and therefore hacking this dislike to make me behave in ways I endorse seems close to risk-free.

Risks of “what” SRS

The main risk with this component seems to be motivating myself to do something that is actually harmful overall. I’m pointing to Chesterton’s Fence-like dangers: many of my aversions are there to protect me (e.g. my fear of climbing really high in trees), and I’d do well to listen to such aversions.

I’m still confused about how to work through the long-run effects of SRS in light of these concerns. For example, I’ve recently been deliberating about how much programming I’d like to do over my career. I kinda like programming, and I know from experience that I can become highly motivated, even obsessed, by some programming challenge. Yet some part of me dislikes programming. Clarifying what exactly that part of me dislikes is not what I want to write about here, but suffice it to say that the part feels important, and it seems plausible that I’d do some real damage to my wellbeing if I ignored it and proceeded to self-reward shape on my desire to do programming for a significant proportion of my career.

Anyway, this post is already longer than I intended. For now, I’m proceeding cautiously with using SRS to help me achieve goals I endorse, using mechanisms which don’t have perverse side-effects.

¹ A lot of this post echoes CFAR’s goal factoring technique; I should think about what parts of this post, if any, make an original contribution.