Getting Started with compositional.mle
Alexander Towell
2025-12-17
Source: vignettes/getting-started.Rmd
Introduction
The compositional.mle package provides composable optimization strategies for maximum likelihood estimation (MLE). Following SICP principles, it offers:
- Primitive solvers - gradient_ascent(), newton_raphson(), bfgs(), nelder_mead(), etc.
- Composition operators - %>>% (sequential), %|% (race), with_restarts()
- Closure property - combining solvers yields a solver
Quick Start: Normal Distribution MLE
# Generate sample data
set.seed(123)
data <- rnorm(100, mean = 5, sd = 2)
# Define the problem (separate from solver strategy)
problem <- mle_problem(
  loglike = function(theta) {
    if (theta[2] <= 0) return(-Inf)
    sum(dnorm(data, theta[1], theta[2], log = TRUE))
  },
  score = function(theta) {
    mu <- theta[1]; sigma <- theta[2]; n <- length(data)
    c(sum(data - mu) / sigma^2,
      -n / sigma + sum((data - mu)^2) / sigma^3)
  },
  constraint = mle_constraint(
    support = function(theta) theta[2] > 0,
    project = function(theta) c(theta[1], max(theta[2], 1e-6))
  ),
  theta_names = c("mu", "sigma")
)
# Solve with gradient ascent
result <- gradient_ascent()(problem, theta0 = c(0, 1))
cat("Estimated mean:", result$theta.hat[1], "(true: 5)\n")
#> Estimated mean: 5.180812 (true: 5)
cat("Estimated sd:", result$theta.hat[2], "(true: 2)\n")
#> Estimated sd: 1.816481 (true: 2)
The Problem-Solver Separation
The key design principle is separating what you’re estimating from how you estimate it:
# The problem encapsulates the statistical model
print(problem)
#> MLE Problem
#> Parameters: mu, sigma
#> Score: analytic
#> Fisher: numerical
#> Constraints: yes
# Solvers are independent strategies
solver1 <- gradient_ascent(max_iter = 100)
solver2 <- newton_raphson(max_iter = 50)
solver3 <- bfgs()
# Same problem, different solvers
result1 <- solver1(problem, c(0, 1))
result2 <- solver2(problem, c(0, 1))
result3 <- solver3(problem, c(0, 1))
cat("Gradient ascent:", result1$theta.hat, "\n")
#> Gradient ascent: 5.180812 1.816481
cat("Newton-Raphson:", result2$theta.hat, "\n")
#> Newton-Raphson: 0 1
cat("BFGS:", result3$theta.hat, "\n")
#> BFGS: 100.7711 567.4039
From this poor starting point, Newton-Raphson returns the starting values and BFGS wanders far from the truth; this is exactly the fragility that composing solvers addresses.
Composing Solvers
Sequential Chaining (%>>%)
Chain solvers for coarse-to-fine optimization:
# Grid search finds a good region, then gradient ascent refines
strategy <- grid_search(lower = c(-10, 0.5), upper = c(10, 5), n = 5) %>>%
  gradient_ascent(max_iter = 50)
result <- strategy(problem, theta0 = c(0, 1))
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481
Three-Stage Refinement
# Coarse grid -> gradient ascent -> Newton-Raphson polish
strategy <- grid_search(lower = c(-10, 0.5), upper = c(10, 5), n = 5) %>>%
  gradient_ascent(max_iter = 30) %>>%
  newton_raphson(max_iter = 10)
result <- strategy(problem, theta0 = c(0, 1))
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481
cat("Converged:", result$converged, "\n")
#> Converged: FALSE
Parallel Racing (%|%)
Race multiple methods and keep the best result:
# Try gradient-based and derivative-free methods
strategy <- gradient_ascent() %|% bfgs() %|% nelder_mead()
result <- strategy(problem, theta0 = c(0, 1))
cat("Winner:", result$solver, "\n")
#> Winner: gradient_ascent
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481
Random Restarts
Escape local optima with multiple starting points:
strategy <- with_restarts(
  gradient_ascent(),
  n = 10,
  sampler = uniform_sampler(c(-10, 0.5), c(10, 5))
)
result <- strategy(problem, theta0 = c(0, 1))
cat("Best restart:", result$best_restart, "of", result$n_restarts, "\n")
#> Best restart: 1 of 10
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481
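The API summary at the end of this vignette also lists normal_sampler() for drawing starting points around a centre. Its signature is not shown in this vignette, so the sketch below is hypothetical and assumes it takes per-parameter means and standard deviations, mirroring uniform_sampler()'s lower/upper pair:
# Hypothetical: assumes normal_sampler(mean, sd) mirrors uniform_sampler(lower, upper)
strategy_gauss <- with_restarts(
  gradient_ascent(),
  n = 10,
  sampler = normal_sampler(c(0, 1), c(5, 1))
)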
Conditional Refinement
Only refine if the first solver didn’t converge:
strategy <- unless_converged(
  gradient_ascent(max_iter = 10),  # Quick attempt
  newton_raphson(max_iter = 50)    # Refine if needed
)
result <- strategy(problem, theta0 = c(0, 1))
cat("Result:", result$theta.hat, "\n")
#> Result: 5.180812 1.816481
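Because every composition operator returns an ordinary solver (the closure property noted in the introduction), composed strategies can themselves be chained, raced, or restarted. A sketch reusing only the operators shown above (output not reproduced here):
# A chained pipeline is itself a solver ...
pipeline <- grid_search(lower = c(-10, 0.5), upper = c(10, 5), n = 5) %>>%
  newton_raphson(max_iter = 20)
# ... so it can be raced against another solver and wrapped in restarts
strategy <- with_restarts(
  pipeline %|% nelder_mead(),
  n = 5,
  sampler = uniform_sampler(c(-10, 0.5), c(10, 5))
)
result <- strategy(problem, theta0 = c(0, 1))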
Available Solvers
| Factory | Method | Requires |
|---|---|---|
| gradient_ascent() | Steepest ascent with line search | score (or numerical) |
| newton_raphson() | Second-order Newton | score, fisher (or numerical) |
| bfgs() | Quasi-Newton BFGS | score (or numerical) |
| lbfgsb() | L-BFGS-B with box constraints | score (or numerical) |
| nelder_mead() | Simplex (derivative-free) | - |
| grid_search() | Exhaustive grid | - |
| random_search() | Random sampling | - |
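Every factory returns a solver with the same calling convention, so the derivative-free methods drop into the Quick Start problem without an analytic score. A brief sketch (results not shown; grid_search() arguments follow the earlier examples):
# Simplex search: no derivatives required
result_nm <- nelder_mead()(problem, theta0 = c(0, 1))
# Exhaustive grid over a bounding box
result_grid <- grid_search(lower = c(-10, 0.5), upper = c(10, 5), n = 5)(problem, theta0 = c(0, 1))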
Constraints
Define domain constraints with support checking and projection:
# Positive parameters
pos_constraint <- mle_constraint(
  support = function(theta) all(theta > 0),
  project = function(theta) pmax(theta, 1e-8)
)
# Box constraints [0, 10]
box_constraint <- mle_constraint(
  support = function(theta) all(theta >= 0 & theta <= 10),
  project = function(theta) pmax(0, pmin(10, theta))
)
# Use in problem definition
problem_constrained <- mle_problem(
  loglike = function(theta) -sum((theta - 5)^2),
  constraint = pos_constraint
)
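Constrained problems are solved with the same calling convention as before; the sketch below assumes, as the Quick Start example suggests, that solvers use the projection to keep iterates inside the support (not evaluated here):
# Same calling convention; steps that would leave the positive orthant are
# assumed to be projected back via pmax(theta, 1e-8)
result_pos <- gradient_ascent()(problem_constrained, theta0 = c(1, 1))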
Function Transformers
Stochastic Gradient (Mini-batching)
For large datasets, subsample observations:
# Original log-likelihood uses all data
loglike_full <- function(theta, obs = large_data) {
  sum(dnorm(obs, theta[1], theta[2], log = TRUE))
}
# Stochastic version uses random subsets
loglike_sgd <- with_subsampling(loglike_full, data = large_data, subsample_size = 100)
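The subsampled log-likelihood plugs into mle_problem() like any other, assuming with_subsampling() returns a function of theta alone and that large_data has been generated; a sketch, not evaluated here:
# Assumes e.g. large_data <- rnorm(1e5, mean = 5, sd = 2)
problem_sgd <- mle_problem(
  loglike = loglike_sgd,
  constraint = mle_constraint(
    support = function(theta) theta[2] > 0,
    project = function(theta) c(theta[1], max(theta[2], 1e-6))
  ),
  theta_names = c("mu", "sigma")
)
result_sgd <- gradient_ascent(max_iter = 200)(problem_sgd, theta0 = c(0, 1))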
Regularization
Add penalty terms for regularization:
loglike <- function(theta) -sum(theta^2)
# L1 (Lasso), L2 (Ridge), Elastic Net
loglike_l1 <- with_penalty(loglike, penalty_l1(), lambda = 0.1)
loglike_l2 <- with_penalty(loglike, penalty_l2(), lambda = 0.1)
loglike_enet <- with_penalty(loglike, penalty_elastic_net(alpha = 0.5), lambda = 0.1)
theta <- c(1, 2, 3)
cat("Original:", loglike(theta), "\n")
#> Original: -14
cat("With L1:", loglike_l1(theta), "\n")
#> With L1: -14.6
cat("With L2:", loglike_l2(theta), "\n")
#> With L2: -15.4
Tracing Optimization
Track the optimization path for diagnostics:
trace_config <- mle_trace(values = TRUE, path = TRUE, gradients = TRUE)
result <- gradient_ascent(max_iter = 20)(
  problem,
  theta0 = c(0, 1),
  trace = trace_config
)
if (!is.null(result$trace_data)) {
cat("Iterations:", result$trace_data$total_iterations, "\n")
cat("Final log-likelihood:", tail(result$trace_data$values, 1), "\n")
}
#> Iterations: 20
#> Final log-likelihood: -201.5839
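The recorded trace can be inspected with ordinary base R; for example, assuming trace_data$values is the per-iteration log-likelihood vector implied by the output above:
# Check that the log-likelihood improves monotonically over iterations
plot(result$trace_data$values, type = "b",
     xlab = "Iteration", ylab = "Log-likelihood")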
API Summary
Problem Specification:
- mle_problem() - Define the estimation problem
- mle_constraint() - Domain constraints
Solver Factories:
- gradient_ascent(), newton_raphson(), bfgs(), lbfgsb(), nelder_mead()
- grid_search(), random_search()
Composition:
- %>>% - Sequential chaining
- %|% - Parallel racing
- with_restarts() - Multiple starting points
- unless_converged() - Conditional refinement
- compose() - Compose multiple solvers
Samplers:
- uniform_sampler(), normal_sampler()
Transformers:
- with_subsampling(), with_penalty()
- penalty_l1(), penalty_l2(), penalty_elastic_net()
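compose() is listed above but not demonstrated in this vignette. The sketch below is an assumption that it is the functional form of %>>% chaining, applying solvers left to right:
# Hypothetical: assumes compose(a, b, c) behaves like a %>>% b %>>% c
strategy <- compose(
  grid_search(lower = c(-10, 0.5), upper = c(10, 5), n = 5),
  gradient_ascent(max_iter = 30),
  newton_raphson(max_iter = 10)
)
result <- strategy(problem, theta0 = c(0, 1))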