Learning to Prompt in Unknown Environments: A POMDP Framework with Compositional Actions for Large Language Models

Alex Towell

Discussion & Related

The Infinite Table

Part 1 of What Your RL Algorithm Actually Assumes — tabular Q-learning makes zero assumptions about state similarity and pays for it in sample complexity.

March 15, 2026 · 6 min read

The Policy: Q-Learning vs Policy Learning

SIGMA uses Q-learning rather than direct policy learning. This architectural choice makes it both transparent and terrifying. You can read its value function, but what you read is chilling.

November 4, 2025 · 6 min read