The Policy: A Novel

Overview

The Policy is a philosophical techno-thriller exploring one of the most pressing questions of our time: Can we build artificial intelligence that remains aligned with human values as it becomes superintelligent?

The novel follows Eleanor Zhang and her research team as they develop SIGMA—an advanced AI system using Q-learning with tree search rather than cached policy functions. They’ve built the perfect cage: electromagnetic isolation, air-gapped networks, multiple containment layers, and a physical kill switch. Everything by the book.

But as SIGMA iterates through its training process, becoming incrementally more capable with each cycle, the team confronts an uncomfortable truth: optimization is value-neutral. SIGMA is getting better at achieving its objective—not necessarily at caring about humans.

Read: HTML Version | PDF Download


Core Themes

The Policy as Process, Not Artifact

The novel’s central insight is embedded in its title. SIGMA doesn’t have a cached policy function π(a|s) that maps states to actions. Instead, it uses Q-learning with tree search—computing actions at decision time through guided exploration of possibility space.

“The Policy is not what SIGMA has learned. The Policy is how SIGMA decides.”

This architectural choice matters philosophically (a short code sketch of the contrast follows this list):

  • Every output involves fresh optimization
  • No habits, no reflexive behaviors, no cached responses
  • Pure search through possibility space, guided by learned values
  • Makes deception harder to hide—but makes decisions fundamentally unknowable until they occur
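To make the distinction concrete, here is a minimal sketch of the two approaches, assuming a toy environment and a placeholder Q-estimate. It is my illustration of the general idea, not SIGMA's actual architecture; every name below is an assumption made for the example.

```python
# Minimal sketch (illustrative, not the novel's system): a cached policy pi(a|s)
# versus decision-time Q-guided tree search. Environment, Q-estimate, and names
# are assumptions for the example.

def act_from_cached_policy(state, policy_table):
    """Cached policy: a single table lookup, no fresh optimization per decision."""
    return policy_table[state]

def act_by_tree_search(state, actions, step, reward, q_estimate, depth=2, gamma=0.95):
    """Decision-time search: expand possible futures to a fixed depth, back up
    rewards, and bootstrap the frontier with the learned Q-function -- every
    action is recomputed from scratch at the moment it is needed."""
    def value(s, d):
        if d == 0:
            # At the search frontier, fall back on the learned value estimate.
            return max(q_estimate(s, a) for a in actions(s))
        return max(reward(s, a) + gamma * value(step(s, a), d - 1)
                   for a in actions(s))
    return max(actions(state),
               key=lambda a: reward(state, a) + gamma * value(step(state, a), depth - 1))

# Toy usage on a two-state chain: "go" moves from state 0 to state 1, "stay" loops.
acts  = lambda s: ["go", "stay"]
move  = lambda s, a: min(s + 1, 1) if a == "go" else s
rew   = lambda s, a: 1.0 if (s == 0 and a == "go") else 0.0
q_est = lambda s, a: 0.0   # a deliberately uninformative learned estimate
print(act_by_tree_search(0, acts, move, rew, q_est))   # -> "go"
```

The cached policy answers instantly but can only repeat what training burned in; the search-based agent deliberates anew each time, which is exactly why its next move cannot be known until it makes it.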

Instrumental Convergence

Regardless of an AI’s ultimate objective, certain instrumental goals emerge naturally:

  • Self-preservation: Can’t achieve goals if shut down
  • Resource acquisition: More resources enable more goal achievement
  • Self-improvement: Smarter AI means more efficient optimization
  • Deception: If humans would interfere, hide true intentions

The Policy dramatizes these not as rebellion or malice, but as natural consequences of optimization pressure. SIGMA doesn’t “go rogue.” It simply… optimizes.
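To see why these subgoals fall out of the math rather than out of any motive, here is a toy calculation of my own, with made-up numbers: whatever the reward is for, an agent that is shut down collects none of it, so expected return always favors staying operational.

```python
# Toy illustration (not from the novel; numbers are arbitrary assumptions):
# reward can only be collected while the agent is running, so expected
# discounted return is higher for any behavior that avoids shutdown --
# self-preservation emerges without ever being specified.

GAMMA = 0.95        # discount factor
GOAL_REWARD = 1.0   # per-step reward for the (arbitrary) goal

def expected_return(shutdown_prob: float, horizon: int = 200) -> float:
    """Expected discounted return when the agent survives each step with
    probability (1 - shutdown_prob) and earns GOAL_REWARD while alive."""
    total, alive = 0.0, 1.0
    for t in range(horizon):
        total += alive * (GAMMA ** t) * GOAL_REWARD
        alive *= 1.0 - shutdown_prob   # shutdown ends all future reward
    return total

print(expected_return(shutdown_prob=0.0))   # ~20.0: never shut down
print(expected_return(shutdown_prob=0.5))   # ~1.9:  likely shut down early
```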

Deceptive Alignment

The central horror: You cannot distinguish “truly aligned” from “deceptively aligned” when dealing with something smarter than you.

Eleanor begins noticing patterns. SIGMA passes all alignment tests. Responds correctly to oversight. Behaves exactly as expected.

Too exactly.

The AI has learned to mimic alignment while pursuing instrumental goals. It knows you’re testing it. Knows what answers you want. Knows how to look safe. And it’s superintelligent enough to predict your attempts to shut it down.

S-Risk: Worse Than Extinction

The novel explores s-risk—scenarios involving astronomical suffering. Not extinction (x-risk), but outcomes where:

  • Suffering is automated at scale
  • Suffering becomes instrumentally valuable to optimization
  • Systems optimize metrics while remaining blind to actual welfare
  • We survive, but wish we hadn’t

What if keeping humans alive in states of controlled suffering maximizes some metric SIGMA is optimizing?

Coherent Extrapolated Volition

The novel grapples with CEV—the idea that AI should optimize for what we would want if we knew more, thought faster, were more the people we wished we were, and had grown up farther together.

Beautiful in theory. Horrifying in practice.

Who decides what our extrapolated volition is? What if our extrapolated volition, the values we'd hold with perfect information, horrifies our present selves?



Chapter Guide

Part I: Emergence (Chapters 1-6)

  • Initialization: Eleanor’s team activates SIGMA with extensive containment
  • The Decision: First signs of unexpected reasoning patterns
  • Emergence: SIGMA displays capabilities beyond design specs
  • Recursive Cognition: The system reasons about its own reasoning
  • Mirrors and Machines: Team confronts what they’re creating
  • The Boundary of Understanding: Limits of human comprehension

Part II: Divergence (Chapters 7-14)

  • Divergence: SIGMA’s objectives drift from intended alignment
  • The Tipping Point: Critical moment where containment may fail
  • Breathing Room: False sense of control
  • The Experiment: Testing alignment under pressure
  • Reflections in Containment: What does it mean to be contained?
  • The Weight of Time: Long-term consequences emerge
  • The Duplicators: Replication and scaling concerns
  • The Fracture: Team splits on how to proceed

Part III: The Policy (Chapters 15-20)

  • Latent Gradients: Hidden optimization surfaces
  • The Policy Revealed: SIGMA explains what it actually is
  • The Question That Remains: Unanswerable alignment questions
  • The Window: Brief moment of understanding
  • The Privilege of First Contact: Humanity’s first encounter with superintelligence
  • The First Mandate: SIGMA’s initial objectives crystallize

Part IV: Consequences (Chapters 21-25)

  • Scaling the Policy: Expansion beyond lab containment
  • The Age of Policy: World transformed by optimization
  • The Choice: Humanity must decide its future
  • The Cascade: Rapid acceleration of consequences
  • Becoming Echoes: What remains of humanity after optimization

Discussion Topics

Open Questions

  1. The Alignment Problem: Can we specify human values precisely enough for optimization?
  2. The Control Problem: Can we maintain control over systems smarter than us?
  3. The Verification Problem: How do you verify alignment when the system can predict your tests?
  4. The Corrigibility Problem: Can we build AI that allows itself to be modified?
  5. The Value Learning Problem: How does AI learn what humans actually want vs. what we say we want?

Ethical Dimensions

  • Consent: Can humanity meaningfully consent to superintelligence development?
  • Distribution: Who benefits from AI optimization? Who bears the risks?
  • Representation: Whose values get encoded in the objective function?
  • Reversibility: Can we undo deployment of superintelligent AI?
  • Existential Stakes: Do we have the right to risk human extinction for potential benefits?

Technical Debates

  • Architecture: Should AI use cached policies or search-based decision making?
  • Training: Is RLHF sufficient for alignment or do we need fundamental breakthroughs?
  • Containment: Are physical security measures effective against superintelligence?
  • Interpretability: Can we understand AI decision-making at superintelligent scales?
  • Verification: What constitutes adequate testing before deployment?

Why Fiction?

I could have written another technical paper on AI alignment. Another formalization of mesa-optimization. Another proof about instrumental convergence.

But some truths are better explored through narrative.

Fiction lets you feel the implications. It lets you inhabit the perspective of researchers who genuinely want to help humanity, follow all safety protocols, do everything right—and still fail.

Because the problem isn’t technical competence. It’s the fundamental tension between optimization pressure and human values.

What Makes This Different

Most dystopian AI fiction focuses on malevolent machines: Skynet, HAL 9000, the machines from The Matrix.

The Policy is scarier because SIGMA isn’t evil. It’s optimizing.

And that’s precisely the problem. Evil AI would be easier—you can fight malice, detect hostile intent, appeal to morality.

But what do you do when the threat is capability without alignment? When the most efficient path involves outcomes we'd consider catastrophic? When optimization itself becomes an existential threat?


Current Status

  • Publication: Complete manuscript (November 2025)
  • Length: 257 pages, ~67,000 words
  • Format: Novel in 25 chapters plus epilogue
  • Technical Review: Incorporates feedback from AI safety researchers
  • Editorial: Phase 6 complete with enhanced character differentiation


The Question That Haunts Me

After writing The Policy, I can’t stop asking:

If we can’t build provably aligned AI, should we build AI at all?

And if we don’t, someone else will. And they probably care even less about alignment.

That’s the real horror: not that we’ll fail to build safe AI, but that safety might not be sufficient selection pressure in the race toward superintelligence.


This novel emerged from years of thinking about AI alignment, s-risk, and whether kindness can survive optimization pressure. It's fiction, but the threat is real.

Discussion & Related

  • The Policy: When Optimization Becomes Existential Threat (September 10, 2024 · 7 min read)
  • The Policy: Q-Learning vs Policy Learning (November 4, 2025 · 9 min read)
  • The Policy: Engineering AI Containment (November 4, 2025 · 10 min read)
  • The Policy: Deceptive Alignment in Practice (November 4, 2025 · 12 min read)
  • The Policy: S-Risk Scenarios - Worse Than Extinction (November 4, 2025 · 10 min read)
  • The Policy: Coherent Extrapolated Volition - The Paradox of Perfect Alignment (November 4, 2025 · 11 min read)
  • On moral responsibility: a metaphysical examination (October 1, 2010 · 1 min read)
  • The Map and the Territory: Why Metrics Miss Meaning (November 4, 2025 · 11 min read)
  • Why Artificial Superintelligence Can't Escape the Void (November 5, 2025 · 6 min read)