Installation

You can install the development version of likelihood.model from GitHub with:

if (!require(devtools)) {
    install.packages("devtools")
}
devtools::install_github("queelius/likelihood.model")

Exponential series system

Consider a series system with $m$ components, where the $p$-th component in the $i$-th system has lifetime $T_{ip}$ with rate $\lambda_p$, and the $T_{ij}$ are independent for all $i$ and $j$. The system lifetime is the minimum of the component lifetimes, i.e., $T_i = \min\{T_{i1}, \ldots, T_{im}\}$.

Let’s draw a sample from this model with $m = 3$ and $\lambda = (1.1, 1.2, 1.3)$ for a sample size of $n = 150$.

n <- 150
rates <- c(1.1, 1.2, 1.3)
set.seed(1235)

df <- data.frame(
    t1 = rexp(n, rates[1]),
    t2 = rexp(n, rates[2]),
    t3 = rexp(n, rates[3])
)
df$t <- apply(df, 1, min)

# map each observation to the corresponding index (component) that was minimum
df$k <- apply(df, 1, which.min)

for (i in 1:n) {
    for (p in 1:3) {
        if (p == df$k[i]) next
        df[i, p] <- NA
    }
}
head(df)
#>      t1    t2   t3     t k
#> 1    NA 0.320   NA 0.320 2
#> 2    NA 0.045   NA 0.045 2
#> 3    NA 0.049   NA 0.049 2
#> 4 0.083    NA   NA 0.083 1
#> 5    NA    NA 0.14 0.141 3
#> 6 0.035    NA   NA 0.035 1

Column $t_k$ is the failure time of the $k$-th component, column $t$ is the series system lifetime (which equals $t_k$), and column $k$ is the component that caused the system failure.

We see that the component cause of failure is observed and the other component lifetimes are not observed (NA).

In what follows, we consider two cases of masking or censoring, i.e., two forms of incomplete data.

  1. We observe the system lifetime and the component cause of failure (columns $t$ and $k$).

  2. We observe a system lifetime with a right-censoring mechanism and the component cause of failure.

Case 1: Observed system failure time and component cause of failure

Suppose we have a series system. As a series system, whenever a component fails, the system fails, and thus the system lifetime is equal to the minimum of the component lifetimes.

Suppose the component lifetimes are exponentially distributed with rate $\lambda_k$ for component $k$.

In this case, we assume we can observe the system lifetime and the component that caused the system failure, which means we know a few things about each system in the sample:

  1. The component cause of failure $K_i$ for each system $i$.
  2. The system failure time $T_i$ for each system $i$.
  3. The component failure time $T_{i K_i}$ for each system $i$.
  4. Each of the non-failed components $p \neq K_i$ of system $i$ survived at least until $T_{i K_i}$, i.e., $T_{ip} \geq T_{i K_i}$.

We might be tempted to think that we can just take each observation, for which we know the system lifetime and the component cause, and estimate that component’s parameter from its failure time. However, this is incorrect: by conditioning on the component that caused the system failure, we are not observing a random sample of that component’s lifetime, but a conditional sample. For instance, if $k = 1$, then we are observing $T_{i1} \mid T_{i1} < T_{i2} \text{ and } T_{i1} < T_{i3}$.

What is this distribution? We can derive it as
$$f_{T_i \mid K_i}(t_i \mid k_i) = f_{T_i, K_i}(t_i, k_i) / f_{K_i}(k_i) = f_{K_i \mid T_i}(k_i \mid t_i)\, f_{T_i}(t_i) / f_{K_i}(k_i).$$
By the memoryless property of the exponential distribution, we have
$$f_{K_i \mid T_i}(k_i \mid t_i) = f_{K_i}(k_i),$$
since the failure rates are constant (and thus independent of time), so the probability that a component is the cause of a system failure is independent of the time of the system failure.

That means that
$$f_{T_i \mid K_i}(t_i \mid k_i) = f_{T_i}(t_i),$$
which in the exponential case is just the pdf of the series system. Since the minimum of exponentially distributed random variables is exponentially distributed with a failure rate equal to the sum of the component failure rates, this naive estimator estimates the sum of the failure rates (the failure rate of the series system), not the component's rate.
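
To see this concretely, here is a quick sanity check (a minimal sketch using the simulated df from above): the naive rate estimate built only from the observations where component 1 caused the failure recovers roughly the system rate, not $\lambda_1$.

# Naive estimate of lambda_1 from only the failures caused by component 1.
# It targets the system rate (lambda_1 + lambda_2 + lambda_3 = 3.6),
# not lambda_1 = 1.1.
1 / mean(df$t[df$k == 1])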

Instead, we treat this as a kind of right-censoring problem: we have $m = 3$ components, so when component 1 is the cause of failure at time $t_i$, we observe an exact failure time $T_{i1} = t_i$ and right-censor the other two components at $t_i$. The log-likelihood contribution for this observation is
$$L_i = \log f_1(t_i \mid \lambda_1) + \log R_2(t_i \mid \lambda_2) + \log R_3(t_i \mid \lambda_3),$$
which simplifies to
$$L_i = \log \lambda_1 - (\lambda_1 + \lambda_2 + \lambda_3) t_i,$$
which has score
$$s_i = \frac{1}{\lambda_1} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} - t_i \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
and Hessian
$$H_i = -\frac{1}{\lambda_1^2} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

To get the total log-likelihood, score, and Hessian, we sum the per-observation contributions (the observations are i.i.d.).

In fact, that’s what we do here, using the likelihood contribution model (likelihood_contr_model). We configure the model to dispatch the loglik and score functions based on the observation type.

We implement methods for the loglik and score functions. This allows dispatching based on the observation type, as defined in obs_type.

score.observed <- function(df, rates, ...) {
  # score for component p: n_p / lambda_p - sum(t), where n_p counts the
  # failures caused by component p
  rep(-sum(df$t), length(rates)) + (df |> dplyr::count(k))$n / rates
}

loglik.observed <- function(df, rates, ...) {
  # sum over observations of log(lambda_{k_i}) - (lambda_1 + ... + lambda_m) t_i
  sum(log(rates[df$k])) - sum(rates) * sum(df$t)
}

model.observed <- likelihood_contr_model$new(
  obs_type = function(df) { "observed" }
)

Now that we have the likelihood model, we can use it to estimate the parameters. We pass the model's log-likelihood to optim for the numerical optimization, with control = list(fnscale = -1) so that optim maximizes rather than minimizes.

optim(rates, fn = loglik(model.observed), df=df, control=list(fnscale=-1))
#> $par
#> [1] 1.4 1.3 1.2
#> 
#> $value
#> [1] -112
#> 
#> $counts
#> function gradient 
#>       66       NA 
#> 
#> $convergence
#> [1] 0
#> 
#> $message
#> NULL

The estimates are close to the true rates $\lambda = (1.1, 1.2, 1.3)$, and convergence code 0 indicates successful optimization.
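
As a further check, this particular model admits a closed-form MLE: setting the score to zero gives $\hat\lambda_p = n_p / \sum_i t_i$, where $n_p$ is the number of failures caused by component $p$. A minimal sketch using the simulated data:

# Closed-form MLE (sketch): failures caused by each component divided by
# the total of the observed system lifetimes
as.numeric(table(df$k)) / sum(df$t)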

Note that likelihood_model objects have a default method for fitting MLEs, fit. Internally, this method uses optim to maximize the log-likelihood, but it packs in some additional information and returns the result as a fisher_mle object.

rates.hat <- fit(model.observed)(df, par = rates)
summary(rates.hat)
#> Maximum Likelihood Estimate (Fisherian)
#> ----------------------------------------
#> 
#> Coefficients:
#>      Estimate Std. Error   2.5% 97.5%
#> [1,]   1.3669     0.1878 0.9989 1.735
#> [2,]   1.2639     0.1806 0.9100 1.618
#> [3,]   1.2383     0.1787 0.8880 1.589
#> 
#> Log-likelihood: -111.7 
#> AIC: 229.4 
#> Number of observations: 150

The 95% confidence intervals all contain the true parameter values, and the standard errors are small relative to the estimates, reflecting the information in $n = 150$ observations.
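
These intervals are consistent with standard Wald intervals, estimate $\pm 1.96 \times$ standard error; a minimal sketch using the accessors shown below:

# Wald-style 95% intervals from the estimates and standard errors (sketch)
cbind(lower = coef(rates.hat) - 1.96 * se(rates.hat),
      upper = coef(rates.hat) + 1.96 * se(rates.hat))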

We implement hess_loglik.observed for completeness.

hess_loglik.observed <- function(df, rates, ...) {
  p <- length(rates)
  H <- matrix(0, p, p)
  counts <- df |> dplyr::count(k)
  for (j in 1:p) {
    H[j, j] <- -counts$n[j] / rates[j]^2
  }
  H
}
# Observed information from MLE (negative Hessian)
cat("Observed information matrix (from MLE):\n")
#> Observed information matrix (from MLE):
print(-rates.hat$hessian)
#>          [,1]     [,2]     [,3]
#> [1,]  2.8e+01 -9.9e-11 -1.5e-09
#> [2,] -9.9e-11  3.1e+01  1.0e-09
#> [3,] -1.5e-09  1.0e-09  3.1e+01

# Computed directly via hess_loglik
cat("\nHessian at MLE:\n")
#> 
#> Hessian at MLE:
print(hess_loglik(model.observed)(df, coef(rates.hat)))
#>      [,1] [,2] [,3]
#> [1,]  -28    0    0
#> [2,]    0  -31    0
#> [3,]    0    0  -31

Up to sign, the two matrices agree: the first is the observed information (the negative Hessian), while the second is the Hessian itself, confirming that our analytical hess_loglik.observed implementation is correct.
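
The standard errors reported in the summary are consistent with the usual asymptotic relationship, the square roots of the diagonal of the inverse observed information; a minimal sketch:

# Standard errors from the inverse observed information (sketch)
I_obs <- -hess_loglik(model.observed)(df, coef(rates.hat))
sqrt(diag(solve(I_obs)))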

Now, we use the Bootstrap method to estimate the sampling distribution of the MLE.

model.samp <- sampler(model.observed, df = df, par = rates)
boot_result <- model.samp(n = 500)

cat("Bootstrap MLE:\n")
#> Bootstrap MLE:
print(coef(boot_result))
#> [1] 1.4 1.3 1.2

cat("\nBootstrap standard errors:\n")
#> 
#> Bootstrap standard errors:
print(se(boot_result))
#> [1] 0.19 0.18 0.17

cat("\nBootstrap bias:\n")
#> 
#> Bootstrap bias:
print(bias(boot_result))
#> [1] 0.0045 0.0057 0.0039

cat("\nBootstrap covariance:\n")
#> 
#> Bootstrap covariance:
print(vcov(boot_result))
#>         [,1]    [,2]    [,3]
#> [1,]  0.0364  0.0003 -0.0025
#> [2,]  0.0003  0.0326 -0.0031
#> [3,] -0.0025 -0.0031  0.0292

The bootstrap standard errors should be close to the asymptotic standard errors from the Fisher information matrix, and the bootstrap bias estimates should be negligible (all under 0.01), confirming that the MLE is nearly unbiased for this sample size.
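
For a direct side-by-side comparison (a small sketch using the objects computed above):

# Bootstrap vs. asymptotic (Fisher) standard errors, side by side
rbind(asymptotic = se(rates.hat), bootstrap = se(boot_result))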

Case 2: Right-censoring

We add right-censoring to the model. We assume that the right-censoring times are independent of the component lifetimes and the candidate sets.

In particular, we consider the data described earlier, but with right-censoring at time $\tau = 0.5$. The presence of this right-censoring means that we know the event occurred after time $\tau$ for these censored observations, but we do not know exactly when (and it may still be ongoing).

As a first stab, you might be inclined to simply ignore the censored data. However, this is not a good idea, because it leads to biased estimates: we would be estimating the parameters of a truncated distribution, $T_i \mid T_i < \tau$, rather than of the observed data $(S_i, \tau_i)$, where $S_i = T_i$ if $T_i < \tau_i$ and $S_i = \tau_i$ otherwise. This is a well-known problem in statistics, and it is why we need survival-analysis techniques to handle right-censored data.

This is similar to the earlier problem, where we said it was a mistake to focus only on the component cause of failure and estimate each component's parameters independently. In that case, we would be estimating component $j$'s parameters from the truncated distribution $T_{ij} \mid K_i = j$, which (as shown earlier) has the same distribution as $T_i = \min\{T_{i1}, \ldots, T_{im}\}$, rather than from the original distribution of $T_{ij}$. In either case, the censoring or masking mechanism must not be ignored, otherwise we will get biased estimates.

Here’s what that right-censored data looks like, with right-censoring time $\tau = 0.5$.

# if df$t > .5, then the system was right-censored
df.censored <- df |> dplyr::mutate(
    censored = t > .5,
    t = ifelse(censored, .5, t)
)

df.censored[df.censored[, "censored"], "k"] <- NA
df.censored[df.censored[, "censored"], paste0("t", 1:3)] <- rep(NA, 3)
head(df.censored)
#>      t1    t2   t3     t k censored
#> 1    NA 0.320   NA 0.320 2    FALSE
#> 2    NA 0.045   NA 0.045 2    FALSE
#> 3    NA 0.049   NA 0.049 2    FALSE
#> 4 0.083    NA   NA 0.083 1    FALSE
#> 5    NA    NA 0.14 0.141 3    FALSE
#> 6 0.035    NA   NA 0.035 1    FALSE

For a system right-censored at time $\tau$, we only know that every component survived past $\tau$, so its log-likelihood contribution is $\log R_1(\tau \mid \lambda_1) + \log R_2(\tau \mid \lambda_2) + \log R_3(\tau \mid \lambda_3) = -(\lambda_1 + \lambda_2 + \lambda_3)\tau$. We define the corresponding contribution functions for censored observations:

loglik.censored <- function(df, rates, ...) {
    -sum(rates) * sum(df$t)
}
score.censored <- function(df, rates, ...) {
    rep(-sum(df$t), length(rates))
}
hess_loglik.censored <- function(df, rates, ...) {
    # the censored contribution is linear in the rates, so its Hessian is zero
    p <- length(rates)
    matrix(0, p, p)
}

model.censored <- likelihood_contr_model$new(
    obs_type = function(df) {
        ifelse(df$censored, "censored", "observed")
    }
)

mle.censored <- fit(model.censored)(df.censored, par = rates)
summary(mle.censored)
#> Maximum Likelihood Estimate (Fisherian)
#> ----------------------------------------
#> 
#> Coefficients:
#>      Estimate Std. Error   2.5% 97.5%
#> [1,]   1.3242     0.1996 0.9329 1.715
#> [2,]   1.2639     0.1950 0.8816 1.646
#> [3,]   1.2037     0.1903 0.8306 1.577
#> 
#> Log-likelihood: -96.38 
#> AIC: 198.8 
#> Number of observations: 150

Recall that the true parameters are $\lambda = (1.1, 1.2, 1.3)$. The estimates are reasonable, especially given the relatively small sample size.

Inference on censored estimates

Let’s examine the censored estimates more closely and compare them to the uncensored case. Censoring reduces information, so we expect wider confidence intervals:

cat("Uncensored estimates:\n")
#> Uncensored estimates:
cat("  Estimates:", coef(rates.hat), "\n")
#>   Estimates: 1.4 1.3 1.2
cat("  Std errors:", se(rates.hat), "\n")
#>   Std errors: 0.19 0.18 0.18

cat("\nCensored estimates:\n")
#> 
#> Censored estimates:
cat("  Estimates:", coef(mle.censored), "\n")
#>   Estimates: 1.3 1.3 1.2
cat("  Std errors:", se(mle.censored), "\n")
#>   Std errors: 0.2 0.2 0.19

cat("\n95% Confidence Intervals (uncensored):\n")
#> 
#> 95% Confidence Intervals (uncensored):
print(confint(rates.hat))
#>      2.5% 97.5%
#> [1,] 1.00   1.7
#> [2,] 0.91   1.6
#> [3,] 0.89   1.6

cat("\n95% Confidence Intervals (censored):\n")
#> 
#> 95% Confidence Intervals (censored):
print(confint(mle.censored))
#>      2.5% 97.5%
#> [1,] 0.93   1.7
#> [2,] 0.88   1.6
#> [3,] 0.83   1.6

cat("\nTrue parameters:", rates, "\n")
#> 
#> True parameters: 1.1 1.2 1.3

As expected, censoring inflates the standard errors and widens the confidence intervals because the censored observations carry less information than exact ones. Both sets of intervals still contain the true parameter values.
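
We can make the widening concrete by comparing the interval widths directly (a small sketch):

# Widths of the 95% intervals: the censored intervals should be wider
ci_unc <- confint(rates.hat)
ci_cen <- confint(mle.censored)
cbind(uncensored = ci_unc[, 2] - ci_unc[, 1],
      censored   = ci_cen[, 2] - ci_cen[, 1])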

We can also use Fisherian likelihood inference to examine the strength of evidence for the censored estimates:

# Support at MLE (should be 0)
s_at_mle <- support(mle.censored, coef(mle.censored), df.censored,
                    model.censored)
cat("Support at MLE:", s_at_mle, "\n")
#> Support at MLE: 0

# Evidence for MLE vs true parameters
ev <- evidence(model.censored, df.censored, coef(mle.censored), rates)
cat("Evidence (MLE vs true):", ev, "\n")
#> Evidence (MLE vs true): 0.89

# Likelihood interval for rate 1 (first component)
li <- likelihood_interval(mle.censored, df.censored, model.censored,
                          k = 8, param = 1)
cat("\n1/8 likelihood interval for lambda_1:\n")
#> 
#> 1/8 likelihood interval for lambda_1:
print(li)
#> 1/8 Likelihood Interval (R >= 0.125)
#> -----------------------------------
#>      lower upper
#> [1,]  0.96   1.8
#> attr(,"k")
#> [1] 8
#> attr(,"relative_likelihood_cutoff")
#> [1] 0.12

The small evidence value (0.89) means the MLE and the true parameters are about equally well supported by the data, which is expected when the MLE is close to the truth. The $1/8$ likelihood interval for $\lambda_1$ provides a plausible range for the parameter without making probability statements: it contains all parameter values whose likelihood is at least $1/8$ of the maximum.
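
The relative likelihood underlying these quantities can also be computed directly from the model's log-likelihood; a minimal sketch, assuming the same (df, par) calling convention used earlier:

# Relative likelihood of the true rates versus the MLE, computed from loglik()
ll <- loglik(model.censored)
exp(ll(df.censored, rates) - ll(df.censored, coef(mle.censored)))
# values of at least 1/8 fall inside the 1/8 likelihood region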

Case 3: Single-parameter model (exponential rate)

The likelihood_contr_model also works well with single-parameter models. Here we estimate a single exponential rate from data where some observations are exact and others are right-censored, using separate contribution types for each.

This example also demonstrates that the score function works correctly with scalar parameters (a case that requires careful handling of R’s dimension-dropping behavior).

# Define contribution types for a single-rate exponential
loglik.exp_exact <- function(df, par, ...) {
  lambda <- par[1]
  sum(log(lambda) - lambda * df$t)
}

loglik.exp_right <- function(df, par, ...) {
  lambda <- par[1]
  -lambda * sum(df$t)
}

score.exp_exact <- function(df, par, ...) {
  lambda <- par[1]
  nrow(df) / lambda - sum(df$t)
}

score.exp_right <- function(df, par, ...) {
  lambda <- par[1]
  -sum(df$t)
}

# Simulate right-censored exponential data
set.seed(42)
lambda_true <- 2.0
n_single <- 200
censor_time_single <- 0.4

raw_lifetimes <- rexp(n_single, lambda_true)
df_single <- data.frame(
  t = pmin(raw_lifetimes, censor_time_single),
  type = ifelse(raw_lifetimes > censor_time_single, "exp_right", "exp_exact")
)

cat("Exact observations:", sum(df_single$type == "exp_exact"), "\n")
#> Exact observations: 106
cat("Right-censored:", sum(df_single$type == "exp_right"), "\n")
#> Right-censored: 94

model_single <- likelihood_contr_model$new(
  obs_type = function(df) df$type,
  assumptions = c("exponential lifetimes", "non-informative right censoring")
)

# Fit and verify
mle_single <- fit(model_single)(df_single, par = c(lambda = 1))
cat("\nMLE:", coef(mle_single), "(true:", lambda_true, ")\n")
#> 
#> MLE: 1.8 (true: 2 )

# Closed-form MLE for comparison: d / T
d <- sum(df_single$type == "exp_exact")
total_T <- sum(df_single$t)
cat("Closed-form MLE:", d / total_T, "\n")
#> Closed-form MLE: 1.8
cat("95% CI:", confint(mle_single)[1, ], "\n")
#> 95% CI: 1.5 2.2

The optim-based MLE matches the closed-form estimate $\hat\lambda = d/T$ (number of exact observations divided by total observation time), which follows from maximizing $\ell(\lambda) = d \log \lambda - \lambda T$, and the 95% confidence interval contains the true value $\lambda = 2$.

Best Practices

When to use which model type

| Scenario | Recommended approach |
|---|---|
| Standard distribution, possibly censored | likelihood_name() |
| Heterogeneous observation types | likelihood_contr_model |
| Known analytical derivatives | Specialized model (e.g., weibull_uncensored) |
| Closed-form MLE exists | Custom class with fit override (e.g., exponential_lifetime) |
| Standard distribution as a contribution | Define loglik.<type> using likelihood_name internally |

Providing analytical derivatives

Numerical differentiation (the default fallback) works but is slow: each gradient evaluation requires $2r \times p$ function evaluations (where $r = 6$ and $p$ is the number of parameters). When analytical derivatives are available, providing score.<type> and hess_loglik.<type> functions gives 10–100x speedups:

# Slow (default): numerical differentiation of loglik.my_type
loglik.my_type <- function(df, par, ...) { ... }

# Fast: provide analytical derivatives
score.my_type <- function(df, par, ...) { ... }
hess_loglik.my_type <- function(df, par, ...) { ... }

The likelihood_contr_model checks for these functions in order:

  1. Functions passed directly via the logliks, scores, and hess_logliks arguments (see the sketch after this list)
  2. Functions found by name in the global environment (loglik.<type>, etc.)
  3. Numerical differentiation of the log-likelihood (last resort)
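
For example, the Case 1 model could have been configured by passing the contribution functions directly; a minimal sketch, assuming the logliks and scores arguments accept named lists keyed by observation type:

# Equivalent configuration (sketch): pass the contributions explicitly rather
# than relying on loglik.observed / score.observed being found by name
model.observed2 <- likelihood_contr_model$new(
  obs_type = function(df) "observed",
  logliks  = list(observed = loglik.observed),
  scores   = list(observed = score.observed)
)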

Performance tip: the S3 cache

The S3 wrappers (loglik(model), score(model), hess_loglik(model)) cache the data frame split and resolved dispatcher functions. During optimization, optim calls the returned function hundreds of times with the same df but varying par — caching eliminates the repeated split() and obs_type() overhead. Always use the S3 interface for optimization:

# Good: cached S3 wrapper
ll <- loglik(model)
result <- optim(par, fn = ll, df = df, control = list(fnscale = -1))

# Also good: using fit() which does this internally
result <- fit(model)(df, par = par)

# Slower: calling R6 method directly in a loop (no caching)
# model$loglik(df, par) -- re-splits df every call