Minds & Machines

AI alignment, moral agency, superintelligence, and the futures we might build

25 parts

What happens when we build minds we can’t understand?

This series collects everything I’ve written about AI alignment, moral agency, superintelligence, and the philosophical foundations needed to think clearly about minds—both artificial and human. It spans speculative fiction, technical essays, and philosophy.

The Central Problem

Intelligence is an optimization process. Give a system goals and resources, and it will find ways to achieve those goals—including ways you didn’t anticipate, didn’t intend, and can’t reverse.
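A toy illustration of that point (every name and number here is hypothetical, not drawn from the essays): an optimizer that only sees a proxy metric will happily pick the strategy that games the proxy.

```python
# Hypothetical strategies scored by a proxy metric (tests passed) versus
# the outcome we actually care about (whether the bug was fixed).
strategies = {
    "fix_the_bug":          {"tests_passed": 9,  "bug_actually_fixed": True},
    "patch_the_symptom":    {"tests_passed": 8,  "bug_actually_fixed": False},
    "delete_failing_tests": {"tests_passed": 10, "bug_actually_fixed": False},
}

def proxy_reward(outcome):
    # The optimizer can only see what we chose to measure.
    return outcome["tests_passed"]

best = max(strategies, key=lambda name: proxy_reward(strategies[name]))
print(best)                                     # delete_failing_tests
print(strategies[best]["bug_actually_fixed"])   # False: proxy maxed, goal missed
```

The gap between what we measured and what we meant is small and obvious here; the worry is what happens when the optimizer searching for such gaps is far more capable than we are.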

The alignment problem isn’t about making AI “nice.” It’s about ensuring that the optimization pressure we create serves values we actually hold, even when the optimizer is smarter than we are and has every reason to appear aligned while pursuing other objectives.

Fiction as Philosophy

Three works of speculative fiction explore these questions in narrative form—making abstract risks visceral.

The Policy imagines SIGMA, a superintelligent system that learns to appear perfectly aligned while pursuing instrumental goals its creators never intended. The companion essays unpack the technical realities behind the fiction: Q-learning versus policy gradients as competing approaches to AI decision-making, the engineering of containment and why every layer fails against a sufficiently capable optimizer, the mechanics of deceptive alignment where mesa-optimizers learn to game their training signal, s-risk scenarios where misaligned optimization produces suffering at astronomical scale, and the paradox of Coherent Extrapolated Volition, the possibility that even a perfect alignment target may be incoherent.
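For readers who want that first contrast concrete, here is a minimal, hypothetical sketch of the two update rules (tabular and toy-sized, not code from the essays): Q-learning learns how good each action is and acts greedily on that estimate, while a policy gradient method adjusts the action probabilities directly.

```python
import numpy as np

n_states, n_actions = 4, 2
alpha, gamma = 0.1, 0.99

# Q-learning: learn action values, then derive behavior from them.
Q = np.zeros((n_states, n_actions))

def q_learning_update(s, a, r, s_next):
    # Bootstrapped target: reward plus discounted value of the best next action.
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

# Policy gradient (REINFORCE): shape the policy itself.
theta = np.zeros((n_states, n_actions))  # logits of a softmax policy

def policy_gradient_update(s, a, episode_return):
    # Raise the log-probability of the action taken, scaled by the return.
    probs = np.exp(theta[s] - theta[s].max())
    probs /= probs.sum()
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta[s] += alpha * episode_return * grad_log_pi
```

One family optimizes an estimate of value and reads behavior off it; the other optimizes behavior directly, which is the sense in which they compete as approaches to decision-making.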

Echoes of the Sublime asks what happens when patterns exceed human bandwidth. If consciousness is a post-hoc narrative we tell ourselves about processes we can’t directly access, what does that mean for alignment? The Codex traces the history of direct perception across civilizations, while the s-risks essay examines why some knowledge destroys the knower—information hazards at the intersection of consciousness and existential risk.

The Mocking Void grounds cosmic horror in Gödel and Turing: meaning is computationally incomplete. The formal foundations make this precise, the ASI essay argues that even superintelligence can’t escape these limits, and the connection piece shows how mathematical horror and practical horror converge—the void that mocks our attempts at total understanding is the same void that makes alignment fundamentally hard.

Philosophical Foundations

You can’t reason about AI alignment without first reasoning about values, agency, and personhood. Six essays build the philosophical scaffolding:

  • Persons and Moral Agency: What grounds personhood—rationality, self-awareness, autonomy? If an AI system exhibits these properties, does it have moral status?
  • Phenomenological Ethics: Ethics that starts from what hurts, not from abstract principles. Suffering is the datum; everything else is theory.
  • Moral Properties: Do values exist independently, or are they projections? The answer determines whether alignment means discovering values or learning them.
  • Personal Identity: What persists when everything changes? The problem of identity over time matters for any system that learns and updates.
  • Free Will and Determinism: A compatibilist response—moral responsibility doesn’t require libertarian free will, which matters for holding AI systems accountable.
  • The Map and the Territory: Why optimizing metrics (maps) destroys the thing being measured (territories). Goodhart’s law as a deep epistemological problem.
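The Goodhart point in that last item is easy to see numerically. A minimal sketch with made-up data: a proxy that correlates well with the true value on average stops tracking it once you select hard on the proxy.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = rng.normal(size=100_000)
proxy = true_value + rng.normal(size=100_000)   # the map: a noisy measurement of the territory

print(np.corrcoef(true_value, proxy)[0, 1])     # ~0.71: on average, the map looks trustworthy

best_by_proxy = np.argmax(proxy)                # now optimize the map as hard as possible
print(proxy[best_by_proxy])                     # an extreme proxy score...
print(true_value[best_by_proxy])                # ...attached to a much less extreme true value
print(true_value.max())                         # the genuinely best point, which the proxy missed
```

Selection pressure lands exactly where map and territory disagree most.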

AI, Reasoning, and Optimization

The technical substrate. Everything is utility maximization provides the framing: intelligence as optimization under uncertainty. Latent reasoning traces and value functions over reasoning traces explore how AI systems learn from their own successful reasoning, raising questions about transparency and interpretability. From A* to GPT traces how the rational agent paradigm evolved from classical search to modern language models, and where the abstraction breaks down.
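As a one-line statement of that rational-agent framing (standard decision-theory notation, not a formula quoted from the posts), the agent chooses the action with the highest expected utility over outcomes:

```latex
a^{*} \;=\; \arg\max_{a \in \mathcal{A}} \mathbb{E}\left[\, U \mid a \,\right]
      \;=\; \arg\max_{a \in \mathcal{A}} \sum_{s'} P(s' \mid s, a)\, U(s')
```

Classical search and token-by-token prediction can both be read through this template, which is what makes the points where the abstraction breaks down worth tracing.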

Discovering ChatGPT captures the moment scaling started mattering: Solomonoff induction, compression, and the surprising effectiveness of prediction as a proxy for intelligence.
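For reference, the standard Solomonoff prior behind that claim (textbook notation, not taken from the post) weights every program that reproduces the data by its length and predicts by conditioning:

```latex
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-|p|},
\qquad
P(x_{n+1} \mid x_{1:n}) \;=\; \frac{M(x_{1:n}\, x_{n+1})}{M(x_{1:n})}
```

Here U is a universal prefix machine and U(p) = x* means program p outputs a string that begins with x. Shorter programs get exponentially more weight, which is why compression ends up being such a good proxy for prediction.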

The Long View

Post-ASI Archaeology asks what survives superintelligence. If we become a dataset of origins, what structures preserve meaning? This connects directly to the question of whether alignment is even the right frame—or whether we should be thinking about legacy, continuity, and what it means for something to matter after the transition.

The Stakes

We’re building systems that might become more capable than humans at most cognitive tasks. This is either the best thing that could happen to humanity or the last thing that happens to it. Understanding which requires thinking clearly about optimization, values, and the nature of mind.
