The Policy: Q-Learning vs Policy Learning
In The Policy, SIGMA doesn’t work like most modern AI systems. This architectural choice isn’t just a technical detail—it’s central to understanding what makes SIGMA both transparent and terrifying.
A speculative fiction novel exploring AI alignment, existential risk, and the fundamental tension between optimization and ethics. When a research team develops SIGMA, an advanced AI system designed to optimize human welfare, they must confront an …
Some technical questions become narrative questions. The Policy is one of those explorations.
Eleanor Zhang leads a research team developing SIGMA—an advanced AI system designed to optimize human welfare through Q-learning and tree search …
RLHF turns pretrained models into agents optimizing for reward. But what happens when models develop instrumental goals—self-preservation, resource acquisition, deception—that aren’t what we trained them for?
LLMs transition …
This semester’s AI course has been revelatory—not because the material is novel, but because of the unifying framework.
The organizing principle: intelligence is utility maximization under uncertainty.
This simple idea connects everything from …