Depression as a local minimum

Do antidepressants work via simulated annealing?

Dec 06, 2023

There's pleasure in having our expectations subverted; at least, that's what some leading theories of humour suggest. By “bribing the brain with pleasure”[1], evolution encourages us to root out errors in our models of the world. Funny things are surprising, surprise is synonymous with information, and gathering information about the world helps us make better predictions about it and thereby improve our survival chances. Curiosity and the novelty-seeking drive are similar enticements to seek information; when your base needs are satisfied, it makes sense to invest excess energy in learning more about the world. In some ways, it feels good to be wrong.

If (resolving) ignorance is bliss, can we think of depression as a pathological overconfidence that everything sucks? Major depressive disorder is characterized by a loss of pleasure, fatigue, a tendency to pessimistic interpretations, and negative internally-directed thought loops (rumination). One proposed explanation for these traits is that depression is an adaptive form of risk aversion, or learned helplessness. If you believe that the world is fundamentally unpleasant and risky, it makes some sort of sense to disengage from the outside world in favour of ruminating in bed. The association between illness, inflammation, and depression could be interpreted along similar lines, as a means of getting people to stay home and recover until they’ve fought off an infection.

The loss of explorative joy in depression is in some ways like being stuck at the bottom of a local minimum in gradient descent. If you're confident that your model of the world is accurate, then there's no reason to incentivize further exploration. If the local minimum you're stuck in happens to make you believe that the world only has net negative outcomes to offer, then you might as well just give up. Yet most people with depression do eventually find a way to get unstuck. One model of how they manage this is by analogy with simulated annealing, an optimization technique that can escape the local minima in which pure gradient descent gets trapped. It works by injecting randomness and shaking things up, occasionally accepting a move to a worse position, so that other, potentially better, minima can be explored in future iterations. The amount of randomness is gradually reduced so that the search eventually settles.
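To make the analogy concrete, here is a minimal sketch of the difference between the two search strategies. Everything in it is an assumption chosen for illustration: the one-dimensional `loss` landscape, the starting point, the step size, and the cooling schedule are all made up, and none of it is a model of the brain. The landscape has a shallow dip on the right and a deeper one on the left. Gradient descent only ever moves downhill, so it settles in whichever dip it starts in; simulated annealing sometimes accepts uphill moves while its "temperature" is high, which lets it cross the ridge between them.

```python
import math
import random


def loss(x):
    # Toy landscape (an assumption for illustration): a shallow local
    # minimum near x = +0.96 and a deeper global minimum near x = -1.04.
    return (x * x - 1.0) ** 2 + 0.3 * x


def loss_gradient(x):
    # Derivative of loss(x) with respect to x.
    return 4.0 * x * (x * x - 1.0) + 0.3


def gradient_descent(x, learning_rate=0.01, steps=2000):
    # Only ever moves downhill, so it stays in the basin it starts in.
    for _ in range(steps):
        x -= learning_rate * loss_gradient(x)
    return x


def simulated_annealing(x, temperature=1.0, cooling=0.999, steps=5000, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best_x = x
    for _ in range(steps):
        # Propose a random nearby position.
        candidate = x + rng.gauss(0.0, 0.5)
        delta = loss(candidate) - loss(x)
        # Metropolis rule: always accept improvements; accept a worse
        # position with probability exp(-delta / temperature).
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            x = candidate
        # Remember the best position visited so far.
        if loss(x) < loss(best_x):
            best_x = x
        # Cool down: uphill moves become rarer over time.
        temperature *= cooling
    return best_x


start = 1.5  # begin on the slope above the shallow right-hand dip
gd_x = gradient_descent(start)
sa_x = simulated_annealing(start)
print(f"gradient descent:    x = {gd_x:.3f}, loss = {loss(gd_x):.3f}")
print(f"simulated annealing: x = {sa_x:.3f}, loss = {loss(sa_x):.3f}")
```

Starting from the same point, gradient descent should finish in the shallow right-hand dip, while the annealing run should report a position in the deeper left-hand one. The cooling schedule matters: cool too fast and the search freezes where it started; never cool and it never settles anywhere.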

One of the long-standing mysteries about depression is why our treatments work. Psychotherapy and selective serotonin reuptake inhibitors (SSRIs) or serotonin–noradrenaline reuptake inhibitors (SNRIs) are the mainstays of treatment. But there’s a whole panoply of drugs available to treat depression with diverse mechanisms, including lithium, ketamine, noradrenaline–dopamine reuptake inhibitors, monoamine oxidase inhibitors, and more besides. The classical psychedelics, in particular psilocybin, have been getting a lot of attention of late too. Even crudely shocking the brain with electroconvulsive therapy, or disrupting it with transcranial magnetic stimulation or deep brain stimulation, is remarkably effective.

It's not clear why such a diverse range of treatments should be effective when you apply a neurochemical or receptor-level lens to depression. Our most widely used pharmacological treatments are (were?) thought to work by modulating serotonin receptors and serotonin uptake, but there's no strong evidence that serotonin itself has any effect on depressive mood. Lithium has been used since 1948 and we still don't understand its mechanism. Psychedelics and electroconvulsive therapy are similarly poorly understood. However, if we think of depression as a pathological local minimum, perhaps what all these treatments have in common is some sort of generally disruptive effect (an injection of novelty) that acts like simulated annealing. Since major depression is associated with impaired neuronal plasticity, temporarily improving plasticity could be one such means to facilitate annealing into a new, more positively valenced state[2].

Note: I wrote this short essay over a year ago but didn’t publish it because it was written quickly and felt overly speculative. But as I was going back over old drafts I found it again and figured I’d finally post it. If it’s misguided, someone will correct me and we can all learn something. Let me know if you enjoyed this type of post, or if you didn’t!

[1] Inside Jokes: Using Humor to Reverse-Engineer the Mind by Hurley, Dennett, and Adams

[2] This could explain why many improvements are temporary, as patients would be likely to fall back into the same depressed local minimum if their environment or conditions don't meaningfully change.

Manjari Narayan

Dec 7, 2023

This precisely captures one popular hypothesis: that nearly all brain interventions (drugs, brain stimulation, etc.), as well as major life changes (creating a different environment, a month-long therapeutic retreat), are basically different ways of perturbing an individual out of their current sub-optimal local minimum.

It is somewhat interesting to note that most interventions fully help people only roughly 30% of the time; the rest tend to relapse or never respond. One reason is that if your intervention doesn't fully address the complex system of depression pathology, there remain a lot of forces drawing the individual back into their depressive state.

Minor note, though: I personally wouldn't take any particular algorithmic metaphor (simulated annealing) too seriously. For one thing, I don't think the perturbations/interventions are that random; rather, they tend to have insufficient coverage.




Liveware
