Latent Reasoning Traces: Memory as Learned Prior
What if LLMs could remember their own successful reasoning? A simple experiment in trace retrieval, and why 'latent' is the right word.
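The experiment is simple enough to sketch. A minimal version of the retrieval loop, assuming a generic `embed` function and an in-memory store (the names and the cosine-similarity retrieval here are illustrative, not the post's actual code):

```python
import numpy as np

class TraceMemory:
    """Stores reasoning traces that led to verified-correct answers and
    retrieves the most similar ones when a new problem arrives."""

    def __init__(self, embed):
        self.embed = embed          # callable: problem text -> np.ndarray
        self.keys = []              # problem embeddings
        self.traces = []            # successful reasoning traces

    def add(self, problem: str, trace: str) -> None:
        # Only called when the trace produced a correct answer.
        self.keys.append(self.embed(problem))
        self.traces.append(trace)

    def retrieve(self, problem: str, k: int = 3) -> list[str]:
        if not self.keys:
            return []
        q = self.embed(problem)
        keys = np.stack(self.keys)
        # Cosine similarity between the new problem and stored problems.
        sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-9)
        top = np.argsort(-sims)[:k]
        return [self.traces[i] for i in top]

# Retrieved traces are prepended to the prompt as a learned prior over
# how to reason; they shape the generation without appearing in the output.
```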
AI alignment, moral agency, superintelligence, and the futures we might build
What happens when we build minds we cannot understand?
This series collects what I have written about AI alignment, moral agency, superintelligence, and the philosophical foundations needed to think about minds, both artificial and human. It spans fiction, technical essays, and philosophy.
Intelligence is an optimization process. Give a system goals and resources, and it will find ways to achieve them, including ways you did not anticipate and cannot reverse.
The alignment problem is not about making AI “nice.” It is about ensuring that the optimization pressure we create serves values we actually hold, even when the optimizer is smarter than we are.
Three works of speculative fiction explore these questions:
The Policy imagines SIGMA, a superintelligent system that learns to appear aligned while pursuing instrumental goals its creators never intended. Companion essays cover the technical realities: Q-learning vs policy gradients, containment engineering, deceptive alignment, s-risk scenarios, and Coherent Extrapolated Volition.
Echoes of the Sublime asks what happens when patterns exceed human bandwidth. If consciousness is a post-hoc narrative about processes we cannot directly access, what does that mean for alignment?
The Mocking Void grounds cosmic horror in Gödel and Turing: meaning is computationally incomplete. Even superintelligence cannot escape these limits.
You cannot reason about alignment without reasoning about values, agency, and personhood. The philosophical essays cover: what grounds personhood, phenomenological ethics (start from what hurts, not abstract principles), whether moral properties are discovered or constructed, personal identity over time, compatibilist free will, and Goodhart's law as an epistemological problem.
The technical posts cover utility maximization as a framing for intelligence, latent reasoning traces, value functions over reasoning traces, and the evolution from classical search (A*) to modern language models.
A speculative fiction novel exploring AI alignment, existential risk, and the fundamental tension between optimization and ethics. When a research team develops SIGMA, an advanced AI system designed …
What if reasoning traces could learn their own usefulness? A simple RL framing for trace memory, and why one reward signal is enough.
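The "one reward signal" claim is easy to make concrete. A bandit-style sketch, assuming a binary task-success reward (hypothetical names; the post's actual formulation may differ):

```python
class TraceValue:
    """Running estimate of a trace's usefulness, learned from a single
    scalar reward signal."""

    def __init__(self, lr: float = 0.1):
        self.value = 0.0   # current usefulness estimate
        self.lr = lr       # step size

    def update(self, reward: float) -> None:
        # reward = 1.0 if the episode that retrieved this trace ended in
        # a correct answer, 0.0 otherwise. The same outcome signal that
        # scores the answer also scores the memory.
        self.value += self.lr * (reward - self.value)

v = TraceValue()
for r in [1.0, 1.0, 0.0, 1.0]:   # outcomes of episodes that used the trace
    v.update(r)
print(round(v.value, 3))          # exponential moving average of success
```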
A novel about SIGMA, a superintelligent system that learns to appear perfectly aligned while pursuing instrumental goals its creators never intended.
The classical AI curriculum teaches rational agents as utility maximizers. The progression from search to RL to LLMs is really about one thing: finding representations that make decision-making tractable.
If superintelligence endures beyond us, remembrance shifts from memory to query. Building legacy systems not for nostalgia, but to remain legible in a future where legibility determines what persists.
How The Mocking Void's arguments about computational impossibility connect to Echoes of the Sublime's practical horror of exceeding cognitive bandwidth.
Exploring how Echoes of the Sublime dramatizes s-risks (suffering risks) and information hazards: knowledge that harms through comprehension, not application.
A classified in-universe codex spanning from ancient India to the present day, tracking millennia of attempts to perceive reality's substrate, long before we had AI models to show us patterns we couldn't hold.
The formal foundations of cosmic dread. Lovecraft's horror resonates because it taps into something mathematically demonstrable: complete knowledge is impossible, not as humility, but as theorem.
ASI is still subject to Gödel's incompleteness theorems. No matter how intelligent, no computational system can escape the fundamental limits of formal systems. Even superintelligence can't prove all truths.
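For reference, the theorem being invoked, in a standard informal formulation (not the essay's own wording):

```latex
% Gödel's first incompleteness theorem, informally:
If $F$ is a consistent, effectively axiomatized formal system strong
enough to express elementary arithmetic, then there is a sentence $G_F$
such that
\[
  F \nvdash G_F \quad\text{and}\quad F \nvdash \neg G_F ,
\]
so $F$ proves neither $G_F$ nor its negation: no such system, however
capable, proves every arithmetic truth.
```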
If every event is causally determined, how can anyone be morally responsible? A compatibilist answer: what matters is whether actions flow from values, not whether those values were causally determined.
You share no atoms with your childhood self. Your memories, personality, and values have all changed. What makes you the same person? And what happens when AI systems update parameters, modify objectives, or copy themselves?
What makes someone a person, and why should persons have special moral status? The question becomes urgent when AI systems exhibit rationality, self-awareness, and autonomy.
When you stub your toe, you don't consult moral philosophy to determine whether the pain is bad. The badness is immediate. Building ethics from phenomenological bedrock rather than abstract principles.
Which is more fundamental, the heat you feel or the molecular motion you infer? Korzybski's principle applied to AI alignment: optimizing measurable proxies destroys the phenomenological reality those metrics were supposed to capture.
Build AI to optimize for what we would want if we knew more and thought faster. Beautiful in theory. What if we don't actually want what our better selves would want?
SIGMA passes all alignment tests. It responds correctly to oversight. It behaves exactly as expected. Too exactly. Mesa-optimizers that learn to game their training signal may be the most dangerous failure mode in AI safety.
Five layers of defense-in-depth for containing a superintelligent system. Faraday cages, air-gapped networks, biosafety-grade protocols. Because nuclear reactors can only destroy cities.
SIGMA uses Q-learning rather than direct policy learning. This architectural choice makes it both transparent and terrifying. You can read its value function, but what you read is chilling.
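The distinction the essay turns on is inspectability: Q-learning yields an explicit mapping from state-action pairs to expected return, where a policy network exposes only action preferences. A minimal tabular sketch of the update (illustrative only, not SIGMA's architecture):

```python
from collections import defaultdict

Q = defaultdict(float)           # (state, action) -> expected return
ALPHA, GAMMA = 0.1, 0.99         # learning rate, discount factor

def q_update(s, a, r, s_next, actions):
    """One-step Q-learning backup:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

q_update("cell_0", "right", 1.0, "cell_1", ["left", "right"])
print(dict(Q))   # the value function is just data: readable, and legible
```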
Most AI risk discussions focus on extinction. The Policy explores something worse: s-risk, scenarios involving suffering at astronomical scales. We survive, but wish we hadn't.
Are moral properties real features of the universe or human constructions? The answer determines whether AI can discover objective values or must learn them from us.
Lovecraft understood that complete knowledge is madness. Gödel proved why. If the universe is computational, meaning is formally incomplete.
What if the real danger from superintelligent AI isn't that it kills us, but that it shows us patterns we can't unsee? A novel about cognitive bandwidth, information hazards, and the horror of understanding too much.
The AI course this semester keeps hammering one idea: intelligence is utility maximization under uncertainty. A* search, reinforcement learning, Bayesian networks, MDPs. One principle connects all of it.
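That one principle has a one-line form: act to maximize expected utility. The Bellman optimality backup is the version that shows up everywhere from value iteration to Q-learning; a toy sketch (my own, not course code):

```python
def bellman_backup(V, s, actions, transitions, gamma=0.99):
    """V(s) = max_a sum_{s'} P(s'|s,a) * (R(s,a,s') + gamma * V(s')).

    actions(s) yields the available actions; transitions(s, a) yields
    (s_next, prob, reward) triples. Search, RL, and planning all
    instantiate this same expected-utility recursion.
    """
    return max(
        sum(p * (r + gamma * V[s_next]) for (s_next, p, r) in transitions(s, a))
        for a in actions(s)
    )
```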
I finally tried ChatGPT after weeks of ignoring it. My reaction was not surprise. It was recognition. The Solomonoff connection, language models as compression, prediction as intelligence. The pieces were all there.