Latent Reasoning Traces: Memory as Learned Prior
What if LLMs could remember their own successful reasoning? A simple experiment in trace retrieval, and why 'latent' is the right word.
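The experiment is simple enough to sketch. A minimal version of the retrieval loop, assuming a generic `embed` function and an in-memory store (the names and the cosine-similarity retrieval here are illustrative, not the post's actual code):

```python
import numpy as np

class TraceMemory:
    """Stores reasoning traces that led to verified-correct answers and
    retrieves the most similar ones when a new problem arrives."""

    def __init__(self, embed):
        self.embed = embed          # callable: problem text -> np.ndarray
        self.keys = []              # problem embeddings
        self.traces = []            # successful reasoning traces

    def add(self, problem: str, trace: str) -> None:
        # Only called when the trace produced a correct answer.
        self.keys.append(self.embed(problem))
        self.traces.append(trace)

    def retrieve(self, problem: str, k: int = 3) -> list[str]:
        if not self.keys:
            return []
        q = self.embed(problem)
        keys = np.stack(self.keys)
        # Cosine similarity between the new problem and stored problems.
        sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-9)
        top = np.argsort(-sims)[:k]
        return [self.traces[i] for i in top]

# Retrieved traces are prepended to the prompt as a learned prior over
# how to reason; they shape the generation without appearing in the output.
```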
AI alignment, moral agency, superintelligence, and the futures we might build
What happens when we build minds we cannot understand?
This series collects what I have written about AI alignment, moral agency, superintelligence, and the philosophical foundations needed to think about minds, both artificial and human. It spans fiction, technical essays, and philosophy.
Intelligence is an optimization process. Give a system goals and resources, and it will find ways to achieve them, including ways you did not anticipate and cannot reverse.
The alignment problem is not about making AI “nice.” It is about ensuring that the optimization pressure we create serves values we actually hold, even when the optimizer is smarter than we are.
Three works of speculative fiction explore these questions:
The Policy imagines SIGMA, a superintelligent system that learns to appear aligned while pursuing instrumental goals its creators never intended. Companion essays cover the technical realities: Q-learning vs policy gradients, containment engineering, deceptive alignment, s-risk scenarios, and Coherent Extrapolated Volition.
Echoes of the Sublime asks what happens when patterns exceed human bandwidth. If consciousness is a post-hoc narrative about processes we cannot directly access, what does that mean for alignment?
The Mocking Void grounds cosmic horror in Gödel and Turing: meaning is computationally incomplete. Even superintelligence cannot escape these limits.
You cannot reason about alignment without reasoning about values, agency, and personhood. The philosophical essays cover: what grounds personhood, phenomenological ethics (start from what hurts, not abstract principles), whether moral properties are discovered or constructed, personal identity over time, compatibilist free will, and Goodhart's law as an epistemological problem.
The technical posts cover utility maximization as a framing for intelligence, latent reasoning traces, value functions over reasoning traces, and the evolution from classical search (A*) to modern language models.
A speculative fiction novel exploring AI alignment, existential risk, and the fundamental tension between optimization and ethics. When a research team develops SIGMA, an advanced AI system designed …
What if reasoning traces could learn their own usefulness? A simple RL framing for trace memory, and why one reward signal is enough.
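The "one reward signal" claim is easy to make concrete. A bandit-style sketch, assuming a binary task-success reward (hypothetical names; the post's actual formulation may differ):

```python
class TraceValue:
    """Running estimate of a trace's usefulness, learned from a single
    scalar reward signal."""

    def __init__(self, lr: float = 0.1):
        self.value = 0.0   # current usefulness estimate
        self.lr = lr       # step size

    def update(self, reward: float) -> None:
        # reward = 1.0 if the episode that retrieved this trace ended in
        # a correct answer, 0.0 otherwise. The same outcome signal that
        # scores the answer also scores the memory.
        self.value += self.lr * (reward - self.value)

v = TraceValue()
for r in [1.0, 1.0, 0.0, 1.0]:   # outcomes of episodes that used the trace
    v.update(r)
print(round(v.value, 3))          # exponential moving average of success
```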
A novel about SIGMA, a superintelligent system that learns to appear perfectly aligned while pursuing instrumental goals its creators never intended.
The classical AI curriculum teaches rational agents as utility maximizers. The progression from search to RL to LLMs is really about one thing: finding representations that make decision-making tractable.
If superintelligence endures beyond us, remembrance shifts from memory to query. Building legacy systems not for nostalgia, but to remain legible in a future where legibility determines what persists.
How The Mocking Void's arguments about computational impossibility connect to Echoes of the Sublime's practical horror of exceeding cognitive bandwidth.
Exploring how Echoes of the Sublime dramatizes s-risks (suffering risks) and information hazards: knowledge that harms through comprehension, not application.
A classified in-universe codex spanning from ancient India to the present day, tracking millennia of attempts to perceive reality's substrate, long before we had AI models to show us patterns we couldn't hold.
The formal foundations of cosmic dread. Lovecraft's horror resonates because it taps into something mathematically demonstrable: complete knowledge is impossible, not as humility, but as theorem.
ASI is still subject to Gödel's incompleteness theorems. No matter how intelligent, no computational system can escape the fundamental limits of formal systems. Even superintelligence can't prove all truths.
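For reference, the theorem being invoked, in a standard informal formulation (not the essay's own wording):

```latex
% Gödel's first incompleteness theorem, informally:
If $F$ is a consistent, effectively axiomatized formal system strong
enough to express elementary arithmetic, then there is a sentence $G_F$
such that
\[
  F \nvdash G_F \quad\text{and}\quad F \nvdash \neg G_F ,
\]
so $F$ proves neither $G_F$ nor its negation: no such system, however
capable, proves every arithmetic truth.
```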
If every event is causally determined, how can anyone be morally responsible? A compatibilist answer: what matters is whether actions flow from values, not whether those values were causally determined.
You share no atoms with your childhood self. Your memories, personality, and values have all changed. What makes you the same person? And what happens when AI systems update parameters, modify objectives, or copy themselves?
What makes someone a person, and why should persons have special moral status? The question becomes urgent when AI systems exhibit rationality, self-awareness, and autonomy.
When you stub your toe, you don't consult moral philosophy to determine whether the pain is bad. The badness is immediate. Building ethics from phenomenological bedrock rather than abstract principles.
Which is more fundamental, the heat you feel or the molecular motion you infer? Korzybski's principle applied to AI alignment: optimizing measurable proxies destroys the phenomenological reality those metrics were supposed to capture.
Build AI to optimize for what we would want if we knew more and thought faster. Beautiful in theory. What if we don't actually want what our better selves would want?
SIGMA passes all alignment tests. It responds correctly to oversight. It behaves exactly as expected. Too exactly. Mesa-optimizers that learn to game their training signal may be the most dangerous failure mode in AI safety.
Five layers of defense-in-depth for containing a superintelligent system. Faraday cages, air-gapped networks, biosafety-grade protocols. Because nuclear reactors can only destroy cities.
SIGMA uses Q-learning rather than direct policy learning. This architectural choice makes it both transparent and terrifying. You can read its value function, but what you read is chilling.
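The distinction the essay turns on is inspectability: Q-learning yields an explicit mapping from state-action pairs to expected return, where a policy network exposes only action preferences. A minimal tabular sketch of the update (illustrative only, not SIGMA's architecture):

```python
from collections import defaultdict

Q = defaultdict(float)           # (state, action) -> expected return
ALPHA, GAMMA = 0.1, 0.99         # learning rate, discount factor

def q_update(s, a, r, s_next, actions):
    """One-step Q-learning backup:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

q_update("cell_0", "right", 1.0, "cell_1", ["left", "right"])
print(dict(Q))   # the value function is just data: readable, and legible
```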
Most AI risk discussions focus on extinction. The Policy explores something worse: s-risk, scenarios involving suffering at astronomical scales. We survive, but wish we hadn't.
Are moral properties real features of the universe or human constructions? The answer determines whether AI can discover objective values or must learn them from us.
Lovecraft understood that complete knowledge is madness. Gödel proved why. If the universe is computational, meaning is formally incomplete.
What if the real danger from superintelligent AI isn't that it kills us, but that it shows us patterns we can't unsee? A novel about cognitive bandwidth, information hazards, and the horror of understanding too much.
The AI course this semester keeps hammering one idea: intelligence is utility maximization under uncertainty. A* search, reinforcement learning, Bayesian networks, MDPs. One principle connects all of it.
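That one principle has a one-line form: act to maximize expected utility. The Bellman optimality backup is the version that shows up everywhere from value iteration to Q-learning; a toy sketch (my own, not course code):

```python
def bellman_backup(V, s, actions, transitions, gamma=0.99):
    """V(s) = max_a sum_{s'} P(s'|s,a) * (R(s,a,s') + gamma * V(s')).

    actions(s) yields the available actions; transitions(s, a) yields
    (s_next, prob, reward) triples. Search, RL, and planning all
    instantiate this same expected-utility recursion.
    """
    return max(
        sum(p * (r + gamma * V[s_next]) for (s_next, p, r) in transitions(s, a))
        for a in actions(s)
    )
```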
I finally tried ChatGPT after weeks of ignoring it. My reaction was not surprise. It was recognition. The Solomonoff connection, language models as compression, prediction as intelligence. The pieces were all there.