vignettes/likelihood-contributions.Rmd
You can install the development version of likelihood.model from GitHub with:
if (!require(devtools)) {
install.packages("devtools")
}
devtools::install_github("queelius/likelihood.model")

Consider a series system with $m = 3$ components, where the $j$-th component in the $i$-th system has lifetime $T_{ij}$ with rate $\lambda_j$, and the $T_{ij}$ for all $i, j$ are independent. The system lifetime is the minimum of the component lifetimes, i.e., $T_i = \min(T_{i1}, T_{i2}, T_{i3})$.
Let's draw a sample from this model for $\lambda = (1.1, 1.2, 1.3)$ and a sample size of $n = 150$.
n <- 150
rates <- c(1.1, 1.2, 1.3)
set.seed(1235)
df <- data.frame(
t1 = rexp(n, rates[1]),
t2 = rexp(n, rates[2]),
t3 = rexp(n, rates[3])
)
df$t <- apply(df, 1, min)
# map each observation to the corresponding index (component) that was minimum
df$k <- apply(df, 1, which.min)
for (i in 1:n) {
for (p in 1:3) {
if (p == df$k[i]) next
df[i, p] <- NA
}
}
head(df)
#> t1 t2 t3 t k
#> 1 NA 0.320 NA 0.320 2
#> 2 NA 0.045 NA 0.045 2
#> 3 NA 0.049 NA 0.049 2
#> 4 0.083 NA NA 0.083 1
#> 5 NA NA 0.14 0.141 3
#> 6 0.035 NA NA 0.035 1

Column `tj` is the failure time of the $j$-th component, column `t` is the series system lifetime (the minimum of the component lifetimes), and column `k` is the component that caused the system failure.
We see that the component cause of failure is observed and the other component lifetimes are not observed (NA).
In what follows, we consider two cases of masked, censored, or otherwise incomplete data:

1. We know the system lifetime and the component cause of failure (columns `t` and `k` are observed).
2. We observe the system lifetime subject to a right-censoring mechanism, along with the component cause of failure for uncensored systems.
Suppose we have a series system: whenever a component fails, the system fails, and thus the system lifetime is equal to the minimum of the component lifetimes. Suppose the component lifetimes are exponentially distributed with rate $\lambda_j$ for component $j$. In this case, we assume we can observe the system lifetime and the component that caused the system failure, which means we know a few things about each system in the sample:

- the system lifetime $t_i$,
- the component $k_i$ that caused the system failure, and
- that the remaining components were still functioning at time $t_i$.
We might be tempted to think that we can just consider each observation, where we know the system lifetime and the component cause, and then estimate that component's parameter based on its failure time. However, this is incorrect, because when we condition on the component that caused the system failure, we are not observing a random sample of that component's lifetime, but a conditional sample. For instance, if $k_i = 1$, then we are observing $T_{i1}$ conditional on the event $T_{i1} < \min(T_{i2}, T_{i3})$, i.e., a draw from $T_{i1} \mid K_i = 1$.
What is this distribution? By the memoryless property of the exponential distribution, the failure rates are constant (and thus independent of the time), so the probability that a component is the cause of a system failure is independent of the time of the system failure: $\Pr(K_i = j \mid T_i = t) = \lambda_j / (\lambda_1 + \lambda_2 + \lambda_3)$ for every $t > 0$.
That means $T_{ij} \mid K_i = j$ has the same distribution as $T_i$, which in the exponential case is just the distribution of the series system. Since the minimum of independent exponentially distributed random variables is exponentially distributed with a failure rate equal to the sum of the component failure rates, this naive estimator estimates the sum of the failure rates (the failure rate of the series system), not $\lambda_j$.
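To see this concretely, here is a quick sanity check on the simulated data above (the variable `naive_rate_1` is just illustrative, not part of the package API): fitting an exponential rate to only the failure times where component 1 caused the failure recovers roughly the system rate $\lambda_1 + \lambda_2 + \lambda_3 = 3.6$, not $\lambda_1 = 1.1$.

# Naive per-component estimate: 1 / mean of the failure times observed
# when component 1 was the cause. This targets the *system* rate (~3.6),
# not lambda_1 = 1.1.
naive_rate_1 <- 1 / mean(df$t[df$k == 1])
naive_rate_1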
Instead, we consider this to be a kind of right-censoring problem with $m = 3$ components: when component $k_i$ is the cause of failure at time $t_i$, we observe an exact failure time for component $k_i$ and right-censor the other two components at $t_i$. The log-likelihood contribution for this observation is
$$\ell_i(\lambda) = \log \lambda_{k_i} - \lambda_{k_i} t_i - \sum_{j \ne k_i} \lambda_j t_i,$$
which simplifies to
$$\ell_i(\lambda) = \log \lambda_{k_i} - t_i \sum_{j=1}^{3} \lambda_j,$$
which has score
$$\frac{\partial \ell_i}{\partial \lambda_j} = \frac{\mathbf{1}\{k_i = j\}}{\lambda_j} - t_i$$
and a diagonal Hessian with entries
$$\frac{\partial^2 \ell_i}{\partial \lambda_j^2} = -\frac{\mathbf{1}\{k_i = j\}}{\lambda_j^2}.$$
To get the total log-likelihood, score, and Hessian, we sum the individual contributions over the observations (by independence).
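For this fully observed case, the sum has a simple closed form. Writing $n_j$ for the number of systems whose failure was caused by component $j$ (a notation introduced here only for this derivation), the total log-likelihood is
$$\ell(\lambda) = \sum_{j=1}^{3} n_j \log \lambda_j - \Big(\sum_{j=1}^{3} \lambda_j\Big) \sum_{i=1}^{n} t_i,$$
and setting the score $n_j/\lambda_j - \sum_i t_i$ to zero gives $\hat\lambda_j = n_j / \sum_{i=1}^{n} t_i$. This matches the `loglik.observed` and `score.observed` functions implemented below.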
In fact, that's what we do here. We show how to use the likelihood contribution model and two ways to configure it to dispatch the `loglik` and `score` functions for the observation type.

We implement S3 methods for the `loglik` and `score` functions. This allows dispatching based on the observation type, as defined by `obs_type`.
# Score contribution for fully observed systems: for each rate j, the partial
# derivative is (number of failures caused by component j) / rate_j minus the
# total observed time.
score.observed <- function(df, rates, ...) {
  rep(-sum(df$t), length(rates)) + (df |> dplyr::count(k))$n / rates
}
# Log-likelihood contribution for fully observed systems
loglik.observed <- function(df, rates, ...) {
  sum(log(rates[df$k])) - sum(rates) * sum(df$t)
}
model.observed <- likelihood_contr_model$new(
obs_type = function(df) { "observed" }
)

Now that we have the likelihood model, we can use it to estimate the parameters. We use the `optim` function to do the numerical optimization, passing `control = list(fnscale = -1)` so that `optim` maximizes the log-likelihood. (Standard errors come later, from the Hessian that `fit` computes.)
optim(rates, fn = loglik(model.observed), df=df, control=list(fnscale=-1))
#> $par
#> [1] 1.4 1.3 1.2
#>
#> $value
#> [1] -112
#>
#> $counts
#> function gradient
#> 66 NA
#>
#> $convergence
#> [1] 0
#>
#> $message
#> NULL

The estimates are close to the true rates $\lambda = (1.1, 1.2, 1.3)$, and convergence code 0 indicates successful optimization.
Note that `likelihood_model` objects have a default method for fitting MLEs, `fit`. Internally, this method uses `optim` to maximize the log-likelihood, but it packs in some additional information and returns the result as a `fisher_mle` object.
rates.hat <- fit(model.observed)(df, par = rates)
summary(rates.hat)
#> Maximum Likelihood Estimate (Fisherian)
#> ----------------------------------------
#>
#> Coefficients:
#> Estimate Std. Error 2.5% 97.5%
#> [1,] 1.3669 0.1878 0.9989 1.735
#> [2,] 1.2639 0.1806 0.9100 1.618
#> [3,] 1.2383 0.1787 0.8880 1.589
#>
#> Log-likelihood: -111.7
#> AIC: 229.4
#> Number of observations: 150

The 95% confidence intervals all contain the true parameter values, and the standard errors are small relative to the estimates, reflecting the information in the $n = 150$ observations.
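These standard errors have a simple form in this model: the observed information matrix is diagonal with entries $n_j/\hat\lambda_j^2$ (where $n_j$ is, as before, the number of failures attributed to component $j$), so the asymptotic standard errors are approximately $\hat\lambda_j/\sqrt{n_j}$, which is consistent with the values reported above.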
We implement `hess_loglik.observed` for completeness.
hess_loglik.observed <- function(df, rates, ...) {
  p <- length(rates)
  H <- matrix(0, p, p)
  # failure counts per component cause (assumes each component appears at
  # least once; dplyr::count orders by k = 1, ..., p)
  counts <- df |> dplyr::count(k)
  for (j in 1:p) {
    H[j, j] <- -counts$n[j] / rates[j]^2
  }
  H
}
# Observed information from MLE (negative Hessian)
cat("Observed information matrix (from MLE):\n")
#> Observed information matrix (from MLE):
print(-rates.hat$hessian)
#> [,1] [,2] [,3]
#> [1,] 2.8e+01 -9.9e-11 -1.5e-09
#> [2,] -9.9e-11 3.1e+01 1.0e-09
#> [3,] -1.5e-09 1.0e-09 3.1e+01
# Computed directly via hess_loglik
cat("\nHessian at MLE:\n")
#>
#> Hessian at MLE:
print(hess_loglik(model.observed)(df, coef(rates.hat)))
#> [,1] [,2] [,3]
#> [1,] -28 0 0
#> [2,] 0 -31 0
#> [3,] 0 0 -31

The two matrices agree, confirming that our analytical `hess_loglik.observed` implementation is correct.
Now, we use the Bootstrap method to estimate the sampling distribution of the MLE.
model.samp <- sampler(model.observed, df = df, par = rates)
boot_result <- model.samp(n = 500)
cat("Bootstrap MLE:\n")
#> Bootstrap MLE:
print(coef(boot_result))
#> [1] 1.4 1.3 1.2
cat("\nBootstrap standard errors:\n")
#>
#> Bootstrap standard errors:
print(se(boot_result))
#> [1] 0.19 0.18 0.17
cat("\nBootstrap bias:\n")
#>
#> Bootstrap bias:
print(bias(boot_result))
#> [1] 0.0045 0.0057 0.0039
cat("\nBootstrap covariance:\n")
#>
#> Bootstrap covariance:
print(vcov(boot_result))
#> [,1] [,2] [,3]
#> [1,] 0.0364 0.0003 -0.0025
#> [2,] 0.0003 0.0326 -0.0031
#> [3,] -0.0025 -0.0031 0.0292

The bootstrap standard errors should be close to the asymptotic standard errors from the Fisher information matrix, and the bootstrap bias estimates should be negligible (all under 0.01), confirming that the MLE is nearly unbiased for this sample size.
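As a quick check (not part of the original output above), we can place the two sets of standard errors side by side, reusing the `se` accessor already used in this vignette:

# Compare asymptotic (Fisher) and bootstrap standard errors
cbind(asymptotic = se(rates.hat), bootstrap = se(boot_result))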
We add right-censoring to the model. We assume that the right-censoring times are independent of the component lifetimes and the candidate sets.
In particular, we consider the data described earlier, but with right-censoring at time $\tau = 0.5$. The presence of this right-censoring means that we know the event occurred after time $\tau$ for these censored observations, but we do not know exactly when (and it may still be ongoing).
As a first stab, you might be inclined to just ignore the censored data. However, this is not a good idea, because it will lead to biased estimates: we would then be estimating the parameters of the truncated distribution $T_i \mid T_i < \tau$, whose density is $f_{T_i}(t) / F_{T_i}(\tau)$ for $t < \tau$ and $0$ otherwise, rather than the original distribution of $T_i$. This is a well-known problem in statistics, and it is why we need to use survival analysis techniques to handle right-censored data.
This is similar to the earlier problem we described, where we said it was a mistake to only focus on the component cause of failure and estimate the parameters of the components independently. In that case, we would be estimating component $j$'s parameters from the conditional distribution $T_{ij} \mid K_i = j$ rather than the original distribution of $T_{ij}$. In either case, the censoring or masking mechanism must not be ignored, otherwise we will get biased estimates.
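Handled properly, a right-censored system contributes its survival probability to the likelihood. Under non-informative right-censoring at time $\tau$, a system censored at $\tau$ contributes
$$\ell_i^{\text{cens}}(\lambda) = \log \Pr(T_i > \tau) = -\tau \sum_{j=1}^{3} \lambda_j,$$
so its score contribution is $-\tau$ for every rate and its Hessian contribution is zero. These are exactly the `censored` contribution functions we define below.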
Here's what that right-censored data looks like, with a right-censoring time of $\tau = 0.5$.
# if df$t > .5, then the system was right-censored
df.censored <- df |> dplyr::mutate(
  censored = t > .5,
  t = ifelse(censored, .5, t)
)
df.censored[df.censored[, "censored"], "k"] <- NA
df.censored[df.censored[, "censored"], paste0("t", 1:3)] <- rep(NA, 3)
head(df.censored)
#> t1 t2 t3 t k censored
#> 1 NA 0.320 NA 0.320 2 FALSE
#> 2 NA 0.045 NA 0.045 2 FALSE
#> 3 NA 0.049 NA 0.049 2 FALSE
#> 4 0.083 NA NA 0.083 1 FALSE
#> 5 NA NA 0.14 0.141 3 FALSE
#> 6 0.035 NA NA 0.035 1 FALSE

We construct the contribution functions for the censored observations with:
# Log-likelihood contribution for right-censored systems: the log of the
# series-system survival probability at the censoring time
loglik.censored <- function(df, rates) {
  -sum(rates) * sum(df$t)
}
score.censored <- function(df, rates) {
  rep(-sum(df$t), length(rates))
}
# Hessian contribution is identically zero for censored observations
hess_loglik.censored <- function(df, rates) {
  p <- length(rates)
  matrix(0, p, p)
}
model.censored <- likelihood_contr_model$new(
obs_type = function(df) {
ifelse(df$censored, "censored", "observed")
}
)
mle.censored <- fit(model.censored)(df.censored, par = rates)
summary(mle.censored)
#> Maximum Likelihood Estimate (Fisherian)
#> ----------------------------------------
#>
#> Coefficients:
#> Estimate Std. Error 2.5% 97.5%
#> [1,] 1.3242 0.1996 0.9329 1.715
#> [2,] 1.2639 0.1950 0.8816 1.646
#> [3,] 1.2037 0.1903 0.8306 1.577
#>
#> Log-likelihood: -96.38
#> AIC: 198.8
#> Number of observations: 150

So, recall the true parameters are $\lambda = (1.1, 1.2, 1.3)$. We can see that the estimates are reasonable, particularly given the relatively small sample size.
Let’s examine the censored estimates more closely and compare them to the uncensored case. Censoring reduces information, so we expect wider confidence intervals:
cat("Uncensored estimates:\n")
#> Uncensored estimates:
cat(" Estimates:", coef(rates.hat), "\n")
#> Estimates: 1.4 1.3 1.2
cat(" Std errors:", se(rates.hat), "\n")
#> Std errors: 0.19 0.18 0.18
cat("\nCensored estimates:\n")
#>
#> Censored estimates:
cat(" Estimates:", coef(mle.censored), "\n")
#> Estimates: 1.3 1.3 1.2
cat(" Std errors:", se(mle.censored), "\n")
#> Std errors: 0.2 0.2 0.19
cat("\n95% Confidence Intervals (uncensored):\n")
#>
#> 95% Confidence Intervals (uncensored):
print(confint(rates.hat))
#> 2.5% 97.5%
#> [1,] 1.00 1.7
#> [2,] 0.91 1.6
#> [3,] 0.89 1.6
cat("\n95% Confidence Intervals (censored):\n")
#>
#> 95% Confidence Intervals (censored):
print(confint(mle.censored))
#> 2.5% 97.5%
#> [1,] 0.93 1.7
#> [2,] 0.88 1.6
#> [3,] 0.83 1.6
cat("\nTrue parameters:", rates, "\n")
#>
#> True parameters: 1.1 1.2 1.3

As expected, censoring inflates the standard errors and widens the confidence intervals because the censored observations carry less information than exact ones. Both sets of intervals still contain the true parameter values.
We can also use Fisherian likelihood inference to examine the strength of evidence for the censored estimates:
# Support at MLE (should be 0)
s_at_mle <- support(mle.censored, coef(mle.censored), df.censored,
model.censored)
cat("Support at MLE:", s_at_mle, "\n")
#> Support at MLE: 0
# Evidence for MLE vs true parameters
ev <- evidence(model.censored, df.censored, coef(mle.censored), rates)
cat("Evidence (MLE vs true):", ev, "\n")
#> Evidence (MLE vs true): 0.89
# Likelihood interval for rate 1 (first component)
li <- likelihood_interval(mle.censored, df.censored, model.censored,
k = 8, param = 1)
cat("\n1/8 likelihood interval for lambda_1:\n")
#>
#> 1/8 likelihood interval for lambda_1:
print(li)
#> 1/8 Likelihood Interval (R >= 0.125)
#> -----------------------------------
#> lower upper
#> [1,] 0.96 1.8
#> attr(,"k")
#> [1] 8
#> attr(,"relative_likelihood_cutoff")
#> [1] 0.12

The evidence value near 0 means the MLE and true parameters are about equally well-supported by the data, which is expected when the MLE is close to the truth. The likelihood interval for $\lambda_1$ provides a plausible range for the parameter without making probability statements: it contains all parameter values whose likelihood is at least $1/8$ of the maximum.
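For reference, these quantities follow the usual Fisherian conventions (stated here as commonly defined, which appears consistent with the output above): the support is the log relative likelihood
$$S(\lambda) = \ell(\lambda) - \ell(\hat\lambda),$$
the evidence comparing two parameter vectors is the difference of their supports, and the $1/k$ likelihood interval for a single parameter is the set of values whose relative likelihood $R(\lambda) = L(\lambda)/L(\hat\lambda)$ is at least $1/k$ (here $k = 8$).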
The `likelihood_contr_model` also works well with single-parameter models. Here we estimate a single exponential rate from data where some observations are exact and others are right-censored, using separate contribution types for each.
This example also demonstrates that the `score` function works correctly with scalar parameters (a case that requires careful handling of R's dimension-dropping behavior).
# Define contribution types for a single-rate exponential
loglik.exp_exact <- function(df, par, ...) {
lambda <- par[1]
sum(log(lambda) - lambda * df$t)
}
loglik.exp_right <- function(df, par, ...) {
lambda <- par[1]
-lambda * sum(df$t)
}
score.exp_exact <- function(df, par, ...) {
lambda <- par[1]
nrow(df) / lambda - sum(df$t)
}
score.exp_right <- function(df, par, ...) {
lambda <- par[1]
-sum(df$t)
}
# Simulate right-censored exponential data
set.seed(42)
lambda_true <- 2.0
n_single <- 200
censor_time_single <- 0.4
raw_lifetimes <- rexp(n_single, lambda_true)
df_single <- data.frame(
t = pmin(raw_lifetimes, censor_time_single),
type = ifelse(raw_lifetimes > censor_time_single, "exp_right", "exp_exact")
)
cat("Exact observations:", sum(df_single$type == "exp_exact"), "\n")
#> Exact observations: 106
cat("Right-censored:", sum(df_single$type == "exp_right"), "\n")
#> Right-censored: 94
model_single <- likelihood_contr_model$new(
obs_type = function(df) df$type,
assumptions = c("exponential lifetimes", "non-informative right censoring")
)
# Fit and verify
mle_single <- fit(model_single)(df_single, par = c(lambda = 1))
cat("\nMLE:", coef(mle_single), "(true:", lambda_true, ")\n")
#>
#> MLE: 1.8 (true: 2 )
# Closed-form MLE for comparison: d / T
d <- sum(df_single$type == "exp_exact")
total_T <- sum(df_single$t)
cat("Closed-form MLE:", d / total_T, "\n")
#> Closed-form MLE: 1.8
cat("95% CI:", confint(mle_single)[1, ], "\n")
#> 95% CI: 1.5 2.2

The `optim`-based MLE matches the closed-form estimate (number of exact observations divided by total observation time), and the 95% confidence interval contains the true value $\lambda = 2$.
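The closed-form estimate comes from the censored exponential log-likelihood. With $d$ exact observations and total observed time $T = \sum_i t_i$ (censored observations counted at the censoring time, as in the code above),
$$\ell(\lambda) = d \log \lambda - \lambda T, \qquad \frac{\partial \ell}{\partial \lambda} = \frac{d}{\lambda} - T = 0 \;\Rightarrow\; \hat\lambda = \frac{d}{T}.$$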
| Scenario | Recommended approach |
|---|---|
| Standard distribution, possibly censored | `likelihood_name()` |
| Heterogeneous observation types | `likelihood_contr_model` |
| Known analytical derivatives | Specialized model (e.g., `weibull_uncensored`) |
| Closed-form MLE exists | Custom class with `fit` override (e.g., `exponential_lifetime`) |
| Standard distribution as a contribution | Define `loglik.<type>` using `likelihood_name` internally |
Numerical differentiation (the default fallback) works but is slow: each gradient evaluation requires on the order of $p$ additional log-likelihood evaluations, where $p$ is the number of parameters. When analytical derivatives are available, providing `score.<type>` and `hess_loglik.<type>` functions gives 10–100x speedups:
# Slow (default): numerical differentiation of loglik.my_type
loglik.my_type <- function(df, par, ...) { ... }
# Fast: provide analytical derivatives
score.my_type <- function(df, par, ...) { ... }
hess_loglik.my_type <- function(df, par, ...) { ... }

The `likelihood_contr_model` checks for these functions in order:

1. Functions supplied directly via the `logliks`, `scores`, and `hess_logliks` arguments
2. S3 methods (`loglik.<type>`, etc.)

The S3 wrappers (`loglik(model)`, `score(model)`, `hess_loglik(model)`) cache the data frame split and resolved dispatcher functions. During optimization, `optim` calls the returned function hundreds of times with the same `df` but varying `par`; caching eliminates the repeated `split()` and `obs_type()` overhead. Always use the S3 interface for optimization:
# Good: cached S3 wrapper
ll <- loglik(model)
result <- optim(par, fn = ll, df = df, control = list(fnscale = -1))
# Also good: using fit() which does this internally
result <- fit(model)(df, par = par)
# Slower: calling R6 method directly in a loop (no caching)
# model$loglik(df, par) -- re-splits df every call