vignettes/algebraic-mle-integration.Rmd
The likelihood.model package is part of an ecosystem of three interoperable packages:
- algebraic.dist — probability distribution objects with a common interface (params, sampler, expectation, etc.)
- algebraic.mle — maximum likelihood estimation infrastructure, defining the mle class and generics like params, nparams, observed_fim, rmap, marginal, and expectation
- likelihood.model — likelihood model specification and Fisherian inference

When you fit a model with likelihood.model, the result is a fisher_mle object that inherits from algebraic.mle::mle. This means all of the algebraic.mle generics work on it directly: params, nparams, observed_fim, and so on.
Let’s fit a Weibull model and explore the MLE through the
algebraic.mle interface:
set.seed(42)
true_shape <- 2.5
true_scale <- 3.0
n <- 200
df <- data.frame(x = rweibull(n, shape = true_shape, scale = true_scale))
model <- weibull_uncensored("x")
mle_result <- fit(model)(df, par = c(1, 1))
# algebraic.mle generics — these work because fisher_mle inherits from mle
params(mle_result)
#> [1] 2.289 2.961
nparams(mle_result)
#> [1] 2

The MLE estimates (roughly 2.29 for shape, 2.96 for scale) are close to the true values (2.5, 3.0), demonstrating good recovery with n = 200 observations. The params() and nparams() generics work identically to how they work on native algebraic.mle::mle objects — because fisher_mle inherits from that class, the entire algebraic.mle interface is available without any adapter code.
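A quick way to see the inheritance at work is to inspect the fitted object itself. The sketch below only assumes what the text above states (that fisher_mle extends mle); the exact class vector printed by class() may vary by package version.

# Confirm that S3 dispatch will find the algebraic.mle methods
inherits(mle_result, "mle")   # TRUE, so the algebraic.mle generics apply directly
class(mle_result)             # class vector includes both "fisher_mle" and "mle"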
The observed_fim() function returns the observed Fisher
information matrix, computed as the negative Hessian of the
log-likelihood at the MLE:
observed_fim(mle_result)
#> [,1] [,2]
#> [1,] 76.00 -30.96
#> [2,] -30.96 119.58

The nonzero off-diagonal element (roughly -31) shows that the information about shape and scale is entangled: the two parameters partly trade off against each other in explaining the data, a coupling that is typical of shape-scale families like the Weibull.
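If you want to verify the negative-Hessian definition by hand, here is a minimal sketch that uses the numDeriv package (an assumption: it is not otherwise used in this vignette) together with the Weibull log-density directly:

# Recompute the observed FIM as the negative Hessian of the log-likelihood,
# evaluated at the fitted parameters (requires the numDeriv package)
ll <- function(p) sum(dweibull(df$x, shape = p[1], scale = p[2], log = TRUE))
-numDeriv::hessian(ll, params(mle_result))  # should be close to observed_fim(mle_result)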
The observed FIM should be positive definite at the MLE — we can verify:
eigen(observed_fim(mle_result))$values
#> [1] 135.65 59.94

Both eigenvalues are positive, confirming the MLE sits at a proper maximum of the log-likelihood. The ratio of the eigenvalues (roughly 2.3:1) tells us the likelihood surface is moderately elongated — the curvature is steeper in one direction than the other, meaning one linear combination of the parameters is better determined than the other.
For comparison, vcov() is the inverse of the observed
FIM:
vcov(mle_result)
#> [,1] [,2]
#> [1,] 0.014708 0.003807
#> [2,] 0.003807 0.009348
solve(observed_fim(mle_result))
#> [,1] [,2]
#> [1,] 0.014708 0.003807
#> [2,] 0.003807 0.009348

The numerical equality confirms that vcov() is computed as solve(observed_fim()), the classical relationship $\widehat{\operatorname{Var}}(\hat{\theta}) = J(\hat{\theta})^{-1}$, where $J(\hat{\theta})$ is the observed Fisher information.
This is worth noting because the two matrices carry different conceptual
meaning — the FIM measures information content (how much the
data tell us about the parameters), while the variance-covariance matrix
measures estimation uncertainty — yet they are matrix inverses
of each other.
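One practical consequence of this relationship: the standard errors reported by se() are simply the square roots of the diagonal of vcov(), i.e. of the inverse observed FIM. A quick check:

# Standard errors are the square roots of the diagonal of the
# variance-covariance matrix (the inverse observed FIM)
sqrt(diag(vcov(mle_result)))  # should match se(mle_result)
se(mle_result)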
rmap
One powerful feature of algebraic.mle is the
rmap() function, which transforms MLE parameters using the
delta method. This propagates uncertainty through nonlinear
transformations.
For a Weibull distribution with shape $k$ and scale $\lambda$, the mean lifetime is $E[T] = \lambda \, \Gamma(1 + 1/k)$.
We can transform our (shape, scale) MLE to get an MLE for the mean lifetime:
# Transform Weibull (shape, scale) -> mean lifetime
mean_life_mle <- rmap(mle_result, function(p) {
c(mean_lifetime = p[2] * gamma(1 + 1 / p[1]))
})
params(mean_life_mle)
#> mean_lifetime
#> 2.623
se(mean_life_mle)
#> [1] 0.08492

The delta method gives an estimated mean lifetime of roughly 2.62 with an SE of about 0.085. This propagates the joint uncertainty in (shape, scale) through the nonlinear $\Gamma$-function transformation — something not possible with simple error propagation formulas that treat parameters independently. The off-diagonal covariance we saw earlier is correctly accounted for.
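To see what rmap() is doing under the hood, here is a hand-rolled delta-method calculation. This is a sketch; it assumes the numDeriv package for the numerical gradient, which is not loaded elsewhere in this vignette.

# Delta method by hand: Var(g(theta_hat)) ~ grad(g)' %*% Vcov %*% grad(g)
g <- function(p) p[2] * gamma(1 + 1 / p[1])      # mean lifetime as a function of (shape, scale)
grad_g <- numDeriv::grad(g, params(mle_result))  # numerical gradient at the MLE
sqrt(drop(t(grad_g) %*% vcov(mle_result) %*% grad_g))  # should be close to se(mean_life_mle)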
Compare to the true mean lifetime:
true_mean <- true_scale * gamma(1 + 1 / true_shape)
cat("True mean lifetime:", true_mean, "\n")
#> True mean lifetime: 2.662
cat("Estimated mean lifetime:", params(mean_life_mle), "\n")
#> Estimated mean lifetime: 2.623
cat("95% CI:", confint(mean_life_mle), "\n")
#> 95% CI: 2.456 2.789

The true mean lifetime (about 2.66) falls within the 95% confidence interval, as expected. The estimate slightly undershoots because our shape estimate (roughly 2.29) is below the true 2.5 — and the mean lifetime is sensitive to shape through the $\Gamma(1 + 1/k)$ factor.
We can also transform to multiple derived quantities at once:
# Derive mean, variance, and median of the Weibull distribution
derived_mle <- rmap(mle_result, function(p) {
k <- p[1]; lam <- p[2]
c(
mean = lam * gamma(1 + 1/k),
var = lam^2 * (gamma(1 + 2/k) - gamma(1 + 1/k)^2),
median = lam * log(2)^(1/k)
)
})
params(derived_mle)
#> mean var median
#> 2.623 1.475 2.523
se(derived_mle)
#> [1] 0.08758 0.14780 0.09227

Notice that the variance has the largest SE (roughly 0.15) relative to its estimate — second moments are inherently harder to estimate than first moments like the mean and median. This is a general phenomenon: higher-order moments amplify estimation uncertainty because they involve larger powers of the data.
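One way to make this comparison concrete is to look at the relative standard errors (SE divided by the absolute estimate), which puts the three quantities on a common scale:

# Coefficient of variation of each derived quantity: the variance stands out
# as the least precisely estimated of the three
se(derived_mle) / abs(params(derived_mle))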
The marginal() function extracts the sampling
distribution of a single parameter from a multivariate MLE:
# Marginal for shape parameter
shape_mle <- marginal(mle_result, 1)
params(shape_mle)
#> [1] 2.289
se(shape_mle)
#> [1] 0.1213
confint(shape_mle)
#> 2.5% 97.5%
#> param1 2.051 2.527
# Marginal for scale parameter
scale_mle <- marginal(mle_result, 2)
params(scale_mle)
#> [1] 2.961
se(scale_mle)
#> [1] 0.09668
confint(scale_mle)
#> 2.5% 97.5%
#> param1 2.771 3.15

Both 95% CIs contain the true values (shape: 2.5; scale: 3.0). The marginal distributions are useful for reporting single-parameter uncertainties, though they discard the correlation between parameters. For inference that respects the joint structure — such as a confidence region for (shape, scale) simultaneously — you would use the full multivariate MLE object.
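As a sketch of what joint inference looks like, the Wald quadratic form below is asymptotically chi-squared with 2 degrees of freedom, so comparing it to the 95% quantile tests whether a candidate (shape, scale) pair, here the true values, lies inside the joint 95% confidence region:

# Joint Wald check: does the true (shape, scale) fall in the 95% confidence region?
theta0 <- c(true_shape, true_scale)
d <- theta0 - params(mle_result)
w <- drop(t(d) %*% observed_fim(mle_result) %*% d)  # Wald statistic
w < qchisq(0.95, df = 2)                            # TRUE with these data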
The expectation() function computes Monte Carlo
expectations over the MLE’s asymptotic sampling distribution. This is
useful for computing quantities that are nonlinear functions of the
parameters.
set.seed(123)
# E[shape^2] under the asymptotic distribution
e_shape_sq <- expectation(mle_result, function(p) p[1]^2,
control = list(n = 10000L))
cat("E[shape^2]:", e_shape_sq, "\n")
#> E[shape^2]: 5.254
cat("shape^2 at MLE:", params(mle_result)[1]^2, "\n")
#> shape^2 at MLE: 5.24
# Probability that shape > 2 (under asymptotic distribution)
pr_shape_gt_2 <- expectation(mle_result, function(p) as.numeric(p[1] > 2),
control = list(n = 10000L))
cat("P(shape > 2):", pr_shape_gt_2, "\n")
#> P(shape > 2): 0.9917

The Monte Carlo estimate $E[\hat{k}^2] \approx 5.25$ versus the plug-in value $\hat{k}^2 \approx 5.24$. The small difference (roughly 0.01) equals $\operatorname{Var}(\hat{k}) \approx 0.015$, as expected from the identity $E[\hat{k}^2] = (E[\hat{k}])^2 + \operatorname{Var}(\hat{k})$. The probability $P(\hat{k} > 2) \approx 0.99$ gives strong evidence that the true shape exceeds 2, which would be useful in practice to confirm the hazard rate is increasing (a Weibull shape $k > 1$ implies increasing hazard; $k > 2$ implies the hazard rate itself is accelerating).
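Because the asymptotic sampling distribution is normal, the tail probability also has a closed form under the marginal normal approximation, which we can use as a sanity check on the Monte Carlo answer:

# Normal-approximation check of P(shape > 2) using the shape marginal
1 - pnorm(2, mean = params(shape_mle), sd = se(shape_mle))  # roughly 0.99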
Under standard MLE asymptotics, bias is zero and MSE equals the variance-covariance matrix:
mse(mle_result)
#> [,1] [,2]
#> [1,] 0.014708 0.003807
#> [2,] 0.003807 0.009348
all.equal(mse(mle_result), vcov(mle_result))
#> [1] TRUE

The TRUE result confirms that MSE = Vcov under the assumption of zero asymptotic bias. In finite samples (especially small $n$), a bootstrap-based mse() can differ from vcov() because the bias contribution to the MSE ($\mathrm{MSE} = \mathrm{Var} + \mathrm{bias}^2$) is no longer negligible.
The sampler() generic creates a bootstrap sampling
distribution. The resulting fisher_boot object also
satisfies the mle interface:
set.seed(42)
boot_sampler <- sampler(model, df = df, par = c(1, 1))
boot_result <- boot_sampler(n = 200)
# Same algebraic.mle generics work
params(boot_result)
#> [1] 2.289 2.961
nparams(boot_result)
#> [1] 2
se(boot_result)
#> [1] 0.11575 0.09085
bias(boot_result)
#> [1] 0.006446 -0.005939

The bootstrap bias (roughly 0.006 for shape, -0.006 for scale) is negligible compared to the standard errors, consistent with the MLE being asymptotically unbiased. The bootstrap SEs are slightly smaller than the asymptotic SEs — both methods agree well with n = 200 observations, but they need not agree exactly because they estimate the sampling variability in different ways (resampling vs. the curvature of the log-likelihood).
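A side-by-side view of the two variability estimates makes the comparison explicit:

# Asymptotic (curvature-based) vs bootstrap (resampling-based) standard errors
rbind(asymptotic = se(mle_result), bootstrap = se(boot_result))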
The bootstrap provides an alternative (non-parametric) estimate of the sampling distribution. Compare bootstrap and asymptotic confidence intervals:
cat("Asymptotic 95% CI:\n")
#> Asymptotic 95% CI:
confint(mle_result)
#> 2.5% 97.5%
#> [1,] 2.051 2.527
#> [2,] 2.771 3.150
cat("\nBootstrap percentile 95% CI:\n")
#>
#> Bootstrap percentile 95% CI:
confint(boot_result, type = "perc")
#> 2.5% 97.5%
#> [1,] 2.069 2.577
#> [2,] 2.778 3.130

The asymptotic and bootstrap CIs are close but not identical. The bootstrap percentile intervals are slightly asymmetric around the point estimates, capturing mild skewness in the finite-sample distribution that the symmetric Wald intervals miss. For well-behaved likelihoods with moderate $n$, the practical difference is small — but in small-sample or non-regular problems, the bootstrap can be substantially more accurate.
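To quantify the asymmetry, compare the lower and upper half-widths of the percentile intervals around the point estimates (a sketch; it assumes confint() returns the two-column matrix shown above):

# Unequal half-widths indicate asymmetry of the bootstrap percentile intervals
ci_boot <- confint(boot_result, type = "perc")
cbind(lower_halfwidth = params(boot_result) - ci_boot[, 1],
      upper_halfwidth = ci_boot[, 2] - params(boot_result))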
algebraic.dist Distributions
The algebraic.dist package provides distribution objects
with a consistent interface. We can compare the MLE’s asymptotic
sampling distribution to theoretical distribution objects.
For example, the MLE of an exponential rate parameter $\lambda$ is asymptotically normal with mean $\lambda$ and variance $\lambda^2 / n$. We can create this theoretical distribution:
set.seed(42)
lambda_true <- 2.0
n_exp <- 200
df_exp <- data.frame(t = rexp(n_exp, rate = lambda_true))
model_exp <- exponential_lifetime("t")
mle_exp <- fit(model_exp)(df_exp)
cat("MLE:", params(mle_exp), "\n")
#> MLE: 1.727
cat("SE:", se(mle_exp), "\n")
#> SE: 0.1221
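For the exponential model the MLE and its asymptotic SE have closed forms, $\hat{\lambda} = 1/\bar{t}$ and $\hat{\lambda}/\sqrt{n}$, so we can check the fit by hand:

# Closed-form exponential MLE and plug-in asymptotic standard error
1 / mean(df_exp$t)                   # should match params(mle_exp)
(1 / mean(df_exp$t)) / sqrt(n_exp)   # should match se(mle_exp)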
# Theoretical asymptotic distribution of the MLE
asymp_var <- lambda_true^2 / n_exp
asymp_dist <- normal(mu = lambda_true, var = asymp_var)
cat("\nTheoretical asymptotic distribution:\n")
#>
#> Theoretical asymptotic distribution:
cat(" Mean:", params(asymp_dist)[1], "\n")
#> Mean: 2
cat(" Variance:", params(asymp_dist)[2], "\n")
#> Variance: 0.02
# Compare: sample from the MLE's estimated distribution
mle_sampler <- sampler(mle_exp)
set.seed(1)
mle_samples <- mle_sampler(5000)
# vs. sample from the theoretical distribution
dist_sampler <- sampler(asymp_dist)
set.seed(1)
dist_samples <- dist_sampler(5000)
cat("\nMLE sampler mean:", mean(mle_samples), "\n")
#>
#> MLE sampler mean: 1.727
cat("Theoretical sampler mean:", mean(dist_samples), "\n")
#> Theoretical sampler mean: 2
cat("MLE sampler sd:", sd(mle_samples), "\n")
#> MLE sampler sd: 0.1254
cat("Theoretical sampler sd:", sd(dist_samples), "\n")
#> Theoretical sampler sd: 0.1452

The MLE sampler is centered at $\hat{\lambda} \approx 1.73$ (the estimate from our data), while the theoretical distribution is centered at the true $\lambda = 2$. The gap between these centers reflects sampling variability — our particular data set yielded a rate estimate somewhat different from the truth. The standard deviations are comparable, with the MLE-based SE slightly smaller because it uses the estimated (smaller) rate rather than the true rate in the variance formula $\lambda^2 / n$.
Distribution objects from algebraic.dist also support
computing exact expectations via numerical integration, which we can
compare to our Monte Carlo estimates:
# Exact E[X^2] for the asymptotic distribution
exact_e_x2 <- expectation(asymp_dist, function(x) x^2)
cat("Exact E[lambda_hat^2]:", exact_e_x2, "\n")
#> Exact E[lambda_hat^2]: 4.02
# Compare to Monte Carlo estimate from the MLE
set.seed(42)
mc_e_x2 <- expectation(mle_exp, function(p) p[1]^2,
control = list(n = 50000L))
cat("Monte Carlo E[lambda_hat^2]:", mc_e_x2, "\n")
#> Monte Carlo E[lambda_hat^2]: 2.998

The exact $E[\hat{\lambda}^2] \approx 4.02$ (from the theoretical distribution centered at $\lambda = 2$) differs noticeably from the Monte Carlo estimate of roughly 3.0 based on the MLE. This is not an error — they are expectations under different distributions. The theoretical distribution uses the true parameter, while the MLE-based distribution is centered at $\hat{\lambda} \approx 1.73$. This distinction is important in practice: the MLE-based quantities are what you can compute from data alone, while the theoretical ones require knowing the true parameter (which is usually unknown).
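Both numbers can also be checked by hand with the identity $E[X^2] = (E[X])^2 + \operatorname{Var}(X)$:

# Exact value under the theoretical asymptotic distribution (centered at the truth)
lambda_true^2 + asymp_var
# Approximate value under the MLE's estimated distribution (centered at the estimate)
params(mle_exp)^2 + se(mle_exp)^2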
The fisher_mle objects returned by
likelihood.model are fully compatible with the
algebraic.mle interface:
| Generic | What it does on fisher_mle |
|---|---|
| params() | Parameter estimates (same as coef()) |
| nparams() | Number of parameters |
| se() | Standard errors |
| vcov() | Variance-covariance matrix |
| observed_fim() | Observed Fisher information (negative Hessian of the log-likelihood at the MLE) |
| bias() | Asymptotic bias (zero) |
| mse() | Mean squared error (equals vcov asymptotically) |
| aic() | Akaike information criterion |
| confint() | Wald confidence intervals |
| rmap() | Delta method parameter transformations |
| marginal() | Marginal sampling distributions |
| expectation() | Monte Carlo expectations over the sampling distribution |
| sampler() | Asymptotic or bootstrap sampling function |