Discussion & Related
The Infinite Table
Part 1 of What Your RL Algorithm Actually Assumes — tabular Q-learning makes zero assumptions about state similarity and pays for it in sample complexity.
March 15, 2026 · 6 min read
The Policy: Q-Learning vs Policy Learning
SIGMA uses Q-learning rather than direct policy learning. This architectural choice makes it both transparent and terrifying. You can read its value function, but what you read is chilling.
November 4, 2025 · 6 min read