Discussion & Related

The Policy: Q-Learning vs Policy Learning

SIGMA uses Q-learning rather than direct policy learning. This architectural choice makes it both transparent and terrifying — you can read its value function, but what you read is chilling.

November 4, 2025 · 9 min read

Discussion