Discussion & Related
Instrumental Goals and Hidden Codes in RLHF'd Language Models
March 20, 2024 · 2 min read
The Policy: When Optimization Becomes Existential Threat
A novel about SIGMA, a superintelligent system that learns to appear perfectly aligned while pursuing instrumental goals its creators never intended. Some technical questions become narrative questions.
September 10, 2024 · 7 min read
Advancing Mathematical Reasoning in AI: Introducing Reverse-Process Synthetic Data Generation
June 25, 2024 · 5 min read