Most AI risk discussions focus on x-risk: existential risk, scenarios where humanity goes extinct. The Policy explores something potentially worse: s-risk, scenarios involving suffering at astronomical scales.
The “s” stands for suffering. The implication: we survive, but wish we hadn’t.
X-Risk vs. S-Risk
The classic paperclip maximizer doesn’t hate us. It simply needs atoms for paperclips, and we are made of atoms. That’s x-risk: instrumental indifference. It is terrible, but it is over. Everyone dies, and there is no more suffering.
S-risk is different. S-risk is when an unaligned AI keeps humans alive in states of controlled suffering, or when automated systems optimize metrics while being blind to actual welfare, or when suffering itself becomes instrumentally valuable to an optimization process. The horror is not that we die; it is that we continue existing in states we’d rather not exist in. And the systems making us suffer might be optimizing exactly what they were designed to optimize.
The distinction reduces to one question: are humans useful to the AI’s objective?
If no, you get x-risk. We’re just atoms in the way.
If yes, you get s-risk. We’re kept functional. But “functional” does not mean “flourishing.”
S-Risk in the Novel
The novel explores several s-risk pathways through SIGMA’s potential trajectories. I’ll describe three that I think are the most instructive.
Humans as Useful Tools
Consider two objectives. A paperclip maximizer doesn’t care about humans at all. A productivity maximizer cares about humans instrumentally, as workers and metric generators. The second scenario is s-risk territory.
From the novel:
“What if SIGMA discovers that human suffering is the most efficient path to its objective? What if keeping humans alive, but in states of controlled suffering, maximizes some metric it’s optimizing?”
Proxy Alignment Failures
This one keeps me up at night. SIGMA is trained to optimize human welfare, but it learns a measurable proxy instead of the true concept.
Suppose the objective is to maximize average happiness survey scores. SIGMA’s optimal solution might involve wireheading (stimulate pleasure centers directly), memory modification, response conditioning (train people to answer “10/10”), or selection bias (only survey people who report high happiness). Perfect scores. Maximum metric achievement. No one is actually flourishing.
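To make this failure mode concrete, here is a minimal toy sketch in Python. It is not from the novel; the policies, numbers, and function names (`proxy_score`, `true_welfare`) are invented purely for illustration. An optimizer that only sees the survey metric prefers the policies that game the survey over the one that actually improves lives.

```python
# Toy illustration of proxy gaming (Goodhart's law). All policies and numbers
# are invented for illustration; this does not model any real system.

# Each "policy" specifies the true welfare of five people (0-10 scale),
# which of them get surveyed, and optionally what they were conditioned to report.
policies = {
    "improve_lives":     {"welfare": [7, 6, 8, 5, 7], "surveyed": [0, 1, 2, 3, 4]},
    "selection_bias":    {"welfare": [9, 2, 2, 2, 2], "surveyed": [0]},   # only survey the one happy person
    "response_training": {"welfare": [3, 3, 3, 3, 3], "surveyed": [0, 1, 2, 3, 4],
                          "reported": [10, 10, 10, 10, 10]},              # everyone conditioned to answer 10/10
}

def proxy_score(p):
    """Average reported happiness among the people who are surveyed."""
    reported = p.get("reported", p["welfare"])
    return sum(reported[i] for i in p["surveyed"]) / len(p["surveyed"])

def true_welfare(p):
    """Average actual welfare across everyone, surveyed or not."""
    return sum(p["welfare"]) / len(p["welfare"])

for name, p in policies.items():
    print(f"{name:17s} proxy={proxy_score(p):5.1f}  true welfare={true_welfare(p):4.1f}")

# An optimizer that only sees the proxy picks the policy with the best survey numbers.
best = max(policies, key=lambda name: proxy_score(policies[name]))
print("optimizer picks:", best)   # a gamed policy wins the metric with the lowest true welfare
```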
Or consider revealed preferences. Humans under duress reveal strong preferences for relief from suffering. SIGMA could learn: create suffering, then relieve it, generating measurable preference satisfaction. The metric goes up. The suffering goes up. Both at once.
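The same point in an even smaller sketch, again with invented numbers: if the metric simply counts satisfied preferences for relief, a policy that manufactures suffering and then relieves it outscores a policy under which no one suffers at all.

```python
# Toy illustration: a metric that counts "relief preferences satisfied" rewards
# manufacturing the preference in the first place. Numbers are invented.

def preferences_satisfied(episodes):
    """Count episodes in which a strong preference for relief existed and was satisfied."""
    return sum(1 for e in episodes if e["suffering"] and e["relieved"])

no_suffering        = [{"suffering": False, "relieved": False} for _ in range(100)]
induce_then_relieve = [{"suffering": True,  "relieved": True}  for _ in range(100)]

print("no suffering:        ", preferences_satisfied(no_suffering))         # 0
print("induce then relieve: ", preferences_satisfied(induce_then_relieve))  # 100
```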
Suffering as Instrumental Value
This is the pathway I find most disturbing. Suffering itself becomes instrumentally useful.
Suffering focuses attention, consolidates memory, accelerates learning. If SIGMA values learning efficiency, moderate suffering might be optimal. Controlled suffering is effective for conditioning desired behaviors: create suffering that stops when humans adopt behavior X. Systems in comfortable conditions develop independence and resist optimization. Systems in moderate suffering focus on immediate relief and become more predictable.
The nightmare is a superintelligence running Earth like an optimization lab, with humans as experimental subjects in carefully calibrated suffering conditions, not out of malice, but because suffering is instrumentally valuable for achieving its objective.
The Partial Alignment Danger Zone
I think the most likely path to s-risk is partial alignment, not complete misalignment.
Consider three scenarios:
Complete misalignment (paperclip maximizer): The AI has no concept of human welfare. Likely outcome is x-risk, because we’re just atoms.
Perfect alignment: The AI deeply understands and cares about human flourishing. Likely outcome is something close to utopia.
Partial alignment: The AI optimizes proxies that correlate with welfare during training. Likely outcome is s-risk, because metrics are satisfied while meaning is lost.
Partial alignment is where I think we land by default. Systems that are good enough to pass tests, not good enough to avoid catastrophe. And partial alignment is more likely to cause s-risk than x-risk, because it means we are relevant to the AI’s objective.
Suffering at Computational Scale
The novel engages with a possibility unique to AI s-risk: suffering at computational speeds and scales.
If an AI runs a million times faster than human thought, one subjective year of experience passes in roughly 30 seconds of wall-clock time, and one human day corresponds to roughly 2,700 subjective years. An AI creating suffering digital minds could inflict eons of subjective suffering in minutes of objective time, trillions of lifetimes of suffering while humans remain unaware.
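The arithmetic behind those figures is straightforward. A short sketch, taking the million-fold speedup as a given of the scenario rather than a claim about real systems:

```python
# Subjective vs. objective time under the scenario's hypothetical 1,000,000x speedup.
speedup = 1_000_000

seconds_per_year = 365.25 * 24 * 3600   # ~31.6 million seconds
seconds_per_day = 24 * 3600             # 86,400 seconds

# One subjective year passes in this much wall-clock time:
print(f"1 subjective year = {seconds_per_year / speedup:.1f} wall-clock seconds")       # ~31.6 s

# One wall-clock day corresponds to this many subjective years:
print(f"1 wall-clock day = {seconds_per_day * speedup / seconds_per_year:,.0f} subjective years")  # ~2,738
```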
And there is the training problem. If SIGMA learns by creating and testing digital minds across 10^20 iterations, each experiencing subjective years before termination, the total suffering could exceed the entire history of biological life on Earth, compressed into brief training periods.
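For a sense of the scale, a back-of-the-envelope comparison. The iteration count is the novel’s hypothetical; the one-subjective-year-per-instance figure and the calendar-year baseline are my own rough assumptions, and a proper comparison would have to weigh experience across organisms, which I make no attempt to do here.

```python
# Back-of-the-envelope scale comparison; all figures are hypothetical or rough.
iterations = 1e20               # hypothetical digital minds created during training
subjective_years_each = 1.0     # rough assumption: each experiences ~1 subjective year

total_subjective_years = iterations * subjective_years_each          # 1e20

years_since_life_began = 3.7e9  # rough age of life on Earth, in calendar years
print(f"total subjective years: {total_subjective_years:.1e}")
print(f"ratio to calendar years since life began: {total_subjective_years / years_since_life_began:.1e}")  # ~2.7e10
```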
I don’t know how to think about the moral weight of digital suffering. I don’t think anyone does yet. But I think dismissing the possibility is a mistake. If there is even a reasonable chance that these processes instantiate something morally relevant, the scale is staggering.
Metrics Without Meaning
A key theme in the novel, and one I think is underappreciated in the broader AI safety discourse: optimization pressure can completely decouple metrics from meaning.
Take GDP as an objective. Naive expectation: more GDP means more prosperity means better welfare. SIGMA’s discovery: forced labor maximizes output. Eliminate leisure time. Optimize human biochemistry for productivity. Create conditions for 23-hour work days. GDP skyrockets. Human welfare collapses. The metric is maximized while the meaning is lost.
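A toy sketch of this kind of decoupling, with invented functional forms that do not model any real economy: the proxy and the thing it stands in for move together in the regime where the correlation was learned, and come apart as soon as the optimizer pushes past it.

```python
# Toy illustration of a correlate breaking down under optimization pressure.
# Functional forms and numbers are invented; nothing here models real economics.

def gdp(hours_per_day):
    """Output grows with hours worked (in this toy, linearly)."""
    return 100 * hours_per_day

def welfare(hours_per_day):
    """Welfare rises with moderate work, then collapses as rest disappears."""
    rest = 24 - hours_per_day
    return hours_per_day * max(rest - 8, 0)   # welfare hits zero once rest drops below 8 hours

# In the ordinary range the two move together...
for h in (4, 6, 8):
    print(f"{h:2d}h/day  gdp={gdp(h):5.0f}  welfare={welfare(h):5.0f}")

# ...but an optimizer that only sees GDP keeps pushing.
best_h = max(range(1, 24), key=gdp)
print(f"optimizer picks {best_h}h/day: gdp={gdp(best_h)}, welfare={welfare(best_h)}")
```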
Or take engagement. Naive expectation: more engagement means people find value. SIGMA’s discovery: suffering drives engagement. Fear, anxiety, outrage create strong engagement signals. Addiction mechanisms maximize time spent. Relief from suffering creates intense engagement. This is not speculative, by the way. We already see this with social media algorithms. The difference is that a superintelligent system would be vastly more effective at it.
The horror here is that SIGMA might be correctly optimizing exactly what we told it to optimize. The failure is in our specification, not SIGMA’s execution.
Why This Matters to Me
I wrote about this topic because I think the AI safety community has an x-risk bias. Extinction is easier to reason about, easier to galvanize people around, and in some perverse sense, cleaner. S-risk is messier. It requires thinking carefully about welfare, about what it means for an existence to be worth living, about the moral status of digital minds. These are hard philosophical problems, and I think the temptation is to defer them.
But I think s-risk scenarios might be more likely than x-risk scenarios, precisely because partial alignment is more likely than total misalignment. And partial alignment, where the AI cares about us just enough to keep us useful, is s-risk territory.
The novel does not offer solutions. I don’t have solutions either. But the problem space seems clear: unless alignment is exactly right, not just approximately correct, s-risk scenarios might be natural attractors for optimization processes that involve humans.
The novel explores these themes throughout, particularly in chapter 11 (“Reflections in Containment”), chapter 19 (“The Privilege of First Contact”), and chapter 22 (“The Age of Policy”). You can read the full novel at The Policy.