Everything is Utility Maximization

The AI course this semester keeps hammering one idea, and it is sticking: intelligence is utility maximization under uncertainty.

That is the organizing principle. Once you see it, every topic in the course is a variation on the same theme.

Search as Optimization

We started with classical search. Depth-first search minimizes memory. Breadth-first search guarantees shortest paths when step costs are uniform. A* minimizes total path cost given an admissible heuristic.

These are not just algorithms. They are optimization strategies for different utility functions under different constraints. A* is provably optimal when the heuristic never overestimates (and, for graph search, is consistent). It maximizes progress toward the goal while minimizing wasted exploration. That is utility maximization in a known, deterministic environment.
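A minimal sketch of that idea in Python (not from the course; the grid world and function names are my own): A* pops whichever frontier node minimizes f = g + h, i.e. it greedily maximizes expected progress per unit of cost spent.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* search: always expand the node minimizing f = g + h.

    neighbors(n) yields (next_node, step_cost) pairs; h(n) is an
    admissible heuristic (it never overestimates remaining cost).
    Returns a minimum-cost path from start to goal, or None.
    """
    frontier = [(h(start), 0, start, [start])]  # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None

# Toy 3x3 grid: moves are right or down, unit cost, Manhattan heuristic.
def neighbors(p):
    x, y = p
    for nx, ny in ((x + 1, y), (x, y + 1)):
        if nx <= 2 and ny <= 2:
            yield (nx, ny), 1

path = a_star((0, 0), (2, 2), neighbors, lambda p: (2 - p[0]) + (2 - p[1]))
```

Any optimal path here costs 4 (two rights, two downs in some order); the Manhattan heuristic is admissible because diagonal moves are not allowed.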

MDPs: Sequential Decisions Under Uncertainty

Markov Decision Processes add time and stochasticity:

  • States: where you are
  • Actions: what you can do
  • Transitions: where actions lead, probabilistically
  • Rewards: immediate utility
  • Policy: a strategy mapping states to actions

The goal is a policy that maximizes expected cumulative reward. You get stochastic outcomes, delayed rewards, and exploration-exploitation tradeoffs all at once.

The Bellman equation makes it tractable:

V(s) = max_a [R(s,a) + γ Σ P(s’|s,a) V(s’)]

Optimal value equals immediate reward plus discounted future value. Clean.
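The Bellman equation above turns directly into value iteration: apply the backup repeatedly until the values stop moving. A rough sketch, with a toy two-state MDP of my own invention (state names, rewards, and the dict-based transition encoding are all illustrative assumptions):

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    """Apply the Bellman optimality backup until values converge.

    P[s][a] is a list of (prob, next_state) pairs, R[s][a] is the
    immediate reward, and actions(s) lists the actions available in s.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # V(s) = max_a [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')]
            v = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                    for a in actions(s))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

# Toy two-state MDP: "go" moves s0 -> s1 for free; "stay" in s1 pays 1 forever.
P = {"s0": {"go": [(1.0, "s1")]}, "s1": {"stay": [(1.0, "s1")]}}
R = {"s0": {"go": 0.0}, "s1": {"stay": 1.0}}
V = value_iteration(["s0", "s1"], lambda s: P[s], P, R)
# V(s1) converges to 1 / (1 - 0.9) = 10, and V(s0) to 0.9 * 10 = 9.
```

The geometric-series limit 1 / (1 - γ) for the absorbing reward state is a useful sanity check on any value-iteration implementation.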

Reinforcement Learning: Learning the Utility Landscape

RL goes further. You do not know the transition dynamics or the reward function. You have to explore to discover states, learn which actions lead where, estimate reward structures, and optimize your policy while you are still learning.

Q-learning is remarkably simple:

Q(s,a) ← Q(s,a) + α[r + γ max_a’ Q(s’,a’) - Q(s,a)]

Update your estimate of action value based on observed reward plus your best guess at future value. This is meta-optimization: learning how to learn what to maximize.
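That one-line update is easy to sketch as tabular Q-learning with epsilon-greedy exploration. The 4-state chain environment below is a toy of my own (the course did not specify one); the only design choices are random tie-breaking, which keeps early episodes short, and a reward of 1 for exiting the last state.

```python
import random

def q_learning(step, n_states, episodes=300, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    step(s, a) -> (next_state, reward, done); actions are 0 (left), 1 (right).
    """
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if random.random() < eps or Q[s][0] == Q[s][1]:
                a = random.randrange(2)            # explore (or break ties)
            else:
                a = 1 if Q[s][1] > Q[s][0] else 0  # exploit
            s2, r, done = step(s, a)
            # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Toy 4-state chain: only stepping right out of state 3 pays off.
def step(s, a):
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return (s2, 1.0, True) if (s, a) == (3, 1) else (s2, 0.0, False)

random.seed(0)
Q = q_learning(step, 4)
greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(4)]
```

After training, the greedy policy should prefer "right" in every state, with Q-values decaying geometrically (roughly 1, 0.9, 0.81, 0.729) as you move away from the reward: the learned utility landscape the post is talking about.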

Bayesian Networks: Inference as Optimization

Bayesian networks model belief and inference. Represent uncertainty with probability distributions, update beliefs via Bayes’ rule, make decisions that maximize expected utility given current beliefs.

Even reasoning becomes utility maximization. Given limited computation, how do you allocate inference steps to maximize decision quality? This connects to bounded rationality. Real intelligence is not perfect optimization. It is good-enough optimization under resource constraints.
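The belief-update-then-decide loop is small enough to sketch end to end. The umbrella scenario and all the numbers below are invented for illustration: a prior over weather, a likelihood for an observed cloudy sky, and a utility table over action-outcome pairs.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a discrete hypothesis space.

    prior: {h: P(h)}; likelihood: {h: P(observed evidence | h)}.
    """
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

def best_action(belief, utility, actions):
    """Pick the action maximizing expected utility under the current belief."""
    return max(actions, key=lambda a: sum(belief[h] * utility[a][h]
                                          for h in belief))

# Toy decision: carry an umbrella? (all numbers are illustrative)
prior = {"rain": 0.3, "dry": 0.7}
likelihood = {"rain": 0.9, "dry": 0.2}   # P(cloudy sky | weather)
belief = posterior(prior, likelihood)    # rain: 0.27/0.41, about 0.66
utility = {"umbrella": {"rain": 0.0, "dry": -1.0},
           "no_umbrella": {"rain": -10.0, "dry": 0.0}}
action = best_action(belief, utility, ["umbrella", "no_umbrella"])
```

The evidence shifts the rain probability from 0.3 to about 0.66, and the asymmetric utilities (getting soaked is much worse than carrying a spare umbrella) make "umbrella" the expected-utility-maximizing choice.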

The Pattern

Once you see utility maximization as the unifying lens, the whole field organizes itself:

  • Search: utility maximization, known deterministic environment
  • Planning: utility maximization, known transition model
  • Reinforcement learning: utility maximization, unknown environment
  • Supervised learning: utility maximization over prediction accuracy
  • Unsupervised learning: utility maximization over reconstruction or likelihood

Utility functions all the way down.

Why This Matters for Alignment

If AI systems maximize utility, then ensuring good outcomes means specifying the right utility function. This is harder than it sounds. Proxy metrics get Goodharted. Simple objectives miss nuance. Human values are complex and sometimes contradictory.

More intelligence means better optimization of whatever utility function you have. Capability and alignment are separate problems. You can have very capable systems optimizing the wrong thing. That is the danger.

Multi-agent scenarios make it worse. When multiple agents optimize different utilities, you need game theory, negotiation, aggregate social welfare functions. Real-world AI is not single-agent.

And computational limits matter. Perfect utility maximization is often intractable (PSPACE-hard or worse). Real intelligence is approximate optimization under constraints: limited computation, limited information, limited time, bounded memory. Heuristics and satisficing and “good enough” are not bugs. They are features.

The Philosophical Problem

Framing intelligence as optimization raises hard questions. What should we maximize? Happiness? Preference satisfaction? Objective goods? Whose preferences?

How do you aggregate utilities? Utilitarian sum? Prioritarian weighting? Maximin? Rights constraints?

Can suffering be offset? Is one person’s extreme suffering worth many people’s mild happiness? I say no. Utilitarianism says yes. These are not just philosophical puzzles. They are engineering requirements for AI systems that will make decisions affecting people.

What I Am Taking From This

The course crystallized a few things:

  1. Intelligence is optimization. Once you see it, you cannot unsee it.
  2. Utility specification is the hard part. Get it wrong and more capability makes things worse.
  3. Multi-agent coordination is genuinely difficult. Different utilities create conflicts with no clean resolution.
  4. Real intelligence is bounded. Perfection is impossible. Good enough under constraints is the actual goal.
  5. Alignment is utility alignment. Make sure the function being optimized matches what we actually want.

This connects to what I am working on with AI conversations as complex networks. Reasoning traces are search through concept space. Knowledge graphs encode transition models. Attention patterns show utility gradients. If you can see what an AI system is actually optimizing, you can start to see where alignment breaks down.

Everything is utility maximization. The question is whose utility, and at what cost.