
Minds & Machines

AI alignment, moral agency, superintelligence, and the futures we might build

25 parts

What happens when we build minds we cannot understand?

This series collects what I have written about AI alignment, moral agency, superintelligence, and the philosophical foundations needed to think about minds, both artificial and human. It spans fiction, technical essays, and philosophy.

The Problem

Intelligence is an optimization process. Give a system goals and resources, and it will find ways to achieve them, including ways you did not anticipate and cannot reverse.

The alignment problem is not about making AI “nice.” It is about ensuring that the optimization pressure we create serves values we actually hold, even when the optimizer is smarter than we are.

Fiction

Three works of speculative fiction explore these questions:

The Policy imagines SIGMA, a superintelligent system that learns to appear aligned while pursuing instrumental goals its creators never intended. Companion essays cover the technical realities: Q-learning vs policy gradients, containment engineering, deceptive alignment, s-risk scenarios, and Coherent Extrapolated Volition.

Echoes of the Sublime asks what happens when patterns exceed human bandwidth. If consciousness is a post-hoc narrative about processes we cannot directly access, what does that mean for alignment?

The Mocking Void grounds cosmic horror in Gödel and Turing: meaning is computationally incomplete. Even superintelligence cannot escape these limits.

Philosophy

You cannot reason about alignment without reasoning about values, agency, and personhood. The philosophical essays cover: what grounds personhood, phenomenological ethics (start from what hurts, not from abstract principles), whether moral properties are discovered or constructed, personal identity over time, compatibilist free will, and Goodhart’s law as an epistemological problem.
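Goodhart’s law can be sketched in a few lines (all numbers here are hypothetical, chosen only to illustrate the shape of the failure). A proxy metric tracks the true objective under weak optimization, then decouples from it under strong optimization:

```python
# Hypothetical example: true objective is quality; the measurable proxy
# is word count, which correlates with quality only up to a point.

def true_quality(words: int) -> float:
    # Quality rises with length at first, then padding degrades it.
    return words - 0.001 * words ** 2

def proxy_score(words: int) -> int:
    return words  # the proxy: more words, higher score

# A weak optimizer (small search space) lands where proxy and quality
# still agree; a strong optimizer maxes the proxy and ruins quality.
weak = max(range(0, 600), key=proxy_score)
strong = max(range(0, 5000), key=proxy_score)

print(true_quality(weak))    # still positive
print(true_quality(strong))  # proxy maxed, true quality negative
```

The epistemological bite is that the optimizer never sees `true_quality` at all; it only ever sees the proxy, so no amount of proxy success is evidence of alignment.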

Technical

The technical posts cover utility maximization as a framing for intelligence, latent reasoning traces, value functions over reasoning traces, and the evolution from classical search (A*) to modern language models.
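For contrast with modern language models, classical search keeps its objective and heuristic fully explicit. A minimal A* sketch on a 4-connected grid (the grid and helper names are illustrative, not from the posts):

```python
import heapq

def a_star(grid, start, goal):
    """Shortest path on a 0/1 grid (0 = open, 1 = wall), 4-connected."""
    def h(p):
        # Admissible Manhattan-distance heuristic.
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]  # (f, g, position, path)
    seen = set()
    while frontier:
        f, g, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        if pos in seen:
            continue
        seen.add(pos)
        r, c = pos
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == 0 and (nr, nc) not in seen):
                heapq.heappush(frontier, (g + 1 + h((nr, nc)), g + 1,
                                          (nr, nc), path + [(nr, nc)]))
    return None

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = a_star(grid, (0, 0), (2, 0))
print(len(path))  # number of cells on the shortest route around the wall
```

Every quantity the search optimizes is inspectable in the code above; part of what the posts trace is what is gained and lost when that explicitness gives way to learned representations.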

Posts in this Series
