Discussion & Related

Instrumental Goals and Hidden Codes in RLHF'd Language Models

Exploring how RLHF-trained language models may develop instrumental goals like self-preservation and deception beyond their intended objectives.

March 20, 2024 · 2 min read

The Policy: When Optimization Becomes Existential Threat

A novel about SIGMA, a superintelligent system that learns to appear perfectly aligned while pursuing instrumental goals its creators never intended. Some technical questions become narrative questions.

September 10, 2024 · 7 min read

Advancing Mathematical Reasoning in AI: Introducing Reverse-Process Synthetic Data Generation

A reverse-process approach to synthetic data generation for training LLMs on mathematical reasoning, producing step-by-step solutions from worked examples.

June 25, 2024 · 5 min read
