Introduction

In reliability engineering, many systems are designed as series systems: the system functions only when all of its $m$ components function. When the system fails, the failure time is observed, but the component that caused the failure is often unknown or only partially identified. This situation — where the failed component is hidden behind incomplete diagnostic information — is called masked failure cause data.

This vignette develops a self-contained mathematical framework for estimating component lifetime parameters from masked series system data. Everything is derived from first principles; no external paper is required. The framework is then instantiated for three distribution families implemented in maskedcauses:

  1. Exponential (exp_series_md_c1_c2_c3) — constant failure rates
  2. Homogeneous Weibull (wei_series_homogeneous_md_c1_c2_c3) — shared shape, individual scales
  3. Heterogeneous Weibull (wei_series_md_c1_c2_c3) — individual shapes and scales

Each section pairs the mathematical derivation with live R code that demonstrates the corresponding package functions.

Series System Model

Definition

Consider a system of $m$ independent components. The lifetime of component $j$ in the $i$-th system is $T_{ij}$, with reliability (survival) function $R_j(t) = P(T_{ij} > t)$, pdf $f_j(t)$, and hazard function $h_j(t) = f_j(t)/R_j(t)$.

In a series configuration (“weakest link”), the system fails as soon as any component fails: $T_i = \min(T_{i1}, \ldots, T_{im})$.

Assumption (independence). Component lifetimes $T_{i1}, \ldots, T_{im}$ are mutually independent given the parameter vector $\theta$.

System reliability

The system reliability is the product of component reliabilities:

$$R(t \mid \theta) = P(T_i > t \mid \theta) = \prod_{j=1}^m R_j(t \mid \theta_j).$$

Additive hazards

The system hazard is the sum of component hazards:

$$h(t \mid \theta) = \sum_{j=1}^m h_j(t \mid \theta_j).$$

The additive hazard decomposition is the central structural result for series systems. It means that each component contributes independently to the instantaneous system failure risk at every time $t$.

The system pdf follows immediately:

$$f(t \mid \theta) = h(t \mid \theta) \cdot R(t \mid \theta) = \Bigl[\sum_{j=1}^m h_j(t)\Bigr] \prod_{l=1}^m R_l(t).$$
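As a quick base-R sanity check of the decomposition (illustrative rates, not package code): for a two-component exponential system, $h(t)R(t)$ matches dexp at the summed rate.

lam <- c(1.2, 0.8)                           # illustrative component rates
t0  <- 0.7
h_sys <- sum(lam)                            # additive hazards (constant here)
R_sys <- exp(-sum(lam) * t0)                 # product of exponential reliabilities
c(h_sys * R_sys, dexp(t0, rate = sum(lam)))  # identical values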

R code: Exponential example

We demonstrate with a 3-component exponential series system. Each component has a constant hazard (failure rate) $\lambda_j$:

model_exp <- exp_series_md_c1_c2_c3()

# Component failure rates
theta_exp <- c(1.0, 1.1, 0.95)
m <- length(theta_exp)

# Extract hazard closures
h1 <- component_hazard(model_exp, 1)
h2 <- component_hazard(model_exp, 2)
h3 <- component_hazard(model_exp, 3)

t_grid <- seq(0.01, 3, length.out = 200)
h_sys <- h1(t_grid, theta_exp) + h2(t_grid, theta_exp) + h3(t_grid, theta_exp)

cat("System hazard (constant):", h_sys[1], "\n")
#> System hazard (constant): 3.05
cat("Sum of component rates:  ", sum(theta_exp), "\n")
#> Sum of component rates:   3.05

For exponential components, the system hazard is constant and equals the sum of the individual rates — consistent with the additive hazard theorem.

Component Cause of Failure

Definition

Let $K_i$ denote the index of the component that caused the $i$-th system failure: $K_i = \arg\min_{j \in \{1,\ldots,m\}} T_{ij}$.

$K_i$ is a discrete random variable on $\{1, \ldots, m\}$ that, in general, depends on the failure time $T_i$.

Conditional cause probability

The probability that component $j$ caused the system failure, given that the failure occurred at time $t$, is:

$$P(K_i = j \mid T_i = t, \theta) = \frac{h_j(t \mid \theta_j)}{\sum_{l=1}^m h_l(t \mid \theta_l)} = \frac{h_j(t)}{h(t)}.$$

Intuition. At any instant $t$, the component with the highest hazard is the most likely cause. The cause probability is the component’s share of the total instantaneous risk — a proportional allocation.

R code: Exponential vs Weibull cause probabilities

For exponential components, the cause probability is constant in time (because all hazards are constant). For heterogeneous Weibull components, the cause probability varies with time:

# Exponential: constant cause probabilities
ccp_exp <- conditional_cause_probability(model_exp)
probs_exp <- ccp_exp(t_grid, theta_exp)

# Heterogeneous Weibull: time-varying
theta_wei <- c(0.7, 200,   # DFR electronics
               1.0, 150,   # CFR seals
               2.0, 100)   # IFR bearing
model_wei <- wei_series_md_c1_c2_c3(
  shapes = theta_wei[seq(1, 6, 2)],
  scales = theta_wei[seq(2, 6, 2)]
)
ccp_wei <- conditional_cause_probability(model_wei)

t_wei <- seq(0.5, 120, length.out = 200)
probs_wei <- ccp_wei(t_wei, theta_wei)

par(mfrow = c(1, 2), mar = c(4, 4, 3, 1))

# Exponential
plot(t_grid, probs_exp[, 1], type = "l", col = "steelblue", lwd = 2,
     ylim = c(0, 0.6), xlab = "Time", ylab = "P(K = j | T = t)",
     main = "Exponential (constant)")
lines(t_grid, probs_exp[, 2], col = "forestgreen", lwd = 2, lty = 2)
lines(t_grid, probs_exp[, 3], col = "firebrick", lwd = 2, lty = 3)
legend("right", paste("Comp", 1:3),
       col = c("steelblue", "forestgreen", "firebrick"),
       lty = 1:3, lwd = 2, cex = 0.8)

# Heterogeneous Weibull
plot(t_wei, probs_wei[, 1], type = "l", col = "steelblue", lwd = 2,
     ylim = c(0, 1), xlab = "Time", ylab = "P(K = j | T = t)",
     main = "Heterogeneous Weibull (time-varying)")
lines(t_wei, probs_wei[, 2], col = "forestgreen", lwd = 2, lty = 2)
lines(t_wei, probs_wei[, 3], col = "firebrick", lwd = 2, lty = 3)
legend("right", c("DFR (k=0.7)", "CFR (k=1.0)", "IFR (k=2.0)"),
       col = c("steelblue", "forestgreen", "firebrick"),
       lty = 1:3, lwd = 2, cex = 0.8)

In the right panel, the DFR component (high infant-mortality hazard) dominates at early times, but the IFR component (wear-out) takes over as the system ages. This crossover cannot be captured by models that force a common shape parameter.

The Observational Model

In practice, we do not observe the latent component lifetimes $(T_{i1}, \ldots, T_{im})$ directly. Instead, for each of $n$ systems we observe a triple $(s_i, \omega_i, c_i)$:

  • $s_i$: a time value (possibly censored)
  • $\omega_i$: the observation type
  • $c_i \subseteq \{1, \ldots, m\}$: the candidate set of possibly-failed components

Four observation types

The package supports four types, each arising from a different monitoring scheme:

| Type | $\omega_i$ | What we know | Likelihood contribution |
|------|------------|--------------|-------------------------|
| Exact | "exact" | System failed at $s_i$ | $R(s_i)\,h_c(s_i)$ |
| Right-censored | "right" | System survived past $s_i$ | $R(s_i)$ |
| Left-censored | "left" | System failed before $s_i$ | $\int_0^{s_i} h_c(u)\,R(u)\,du$ |
| Interval-censored | "interval" | Failure in $(s_i, s_i^{\mathrm{upper}})$ | $\int_{s_i}^{s_i^{\mathrm{upper}}} h_c(u)\,R(u)\,du$ |

Here $h_c(t) = \sum_{j \in c_i} h_j(t)$ is the candidate-set hazard.
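To make the table concrete, the following base-R sketch evaluates all four contributions for the 3-component exponential system defined above, with an illustrative candidate set $c = \{1, 2\}$ and illustrative times; the exact and right-censored forms are closed-form, while the censored forms use stats::integrate.

# Hand-coded likelihood contributions for the four observation types
lam_sys <- sum(theta_exp)               # system hazard
lam_c   <- sum(theta_exp[c(1, 2)])      # candidate-set hazard for c = {1, 2}
s <- 0.6; s_up <- 0.9                   # illustrative censoring times
c(exact    = exp(-lam_sys * s) * lam_c,
  right    = exp(-lam_sys * s),
  left     = integrate(function(u) lam_c * exp(-lam_sys * u), 0, s)$value,
  interval = integrate(function(u) lam_c * exp(-lam_sys * u), s, s_up)$value)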

Masking

When $|c_i| = 1$, the failed component is known exactly. When $|c_i| > 1$, the cause of failure is masked: the diagnostic identifies a set of possible culprits but not the specific one.

Observe functors

The package provides composable observation functors for generating data under different monitoring schemes:

# Continuous monitoring with right-censoring at tau
obs_right <- observe_right_censor(tau = 5)
obs_right(3.2)   # failure before tau -> exact
#> $t
#> [1] 3.2
#> 
#> $omega
#> [1] "exact"
#> 
#> $t_upper
#> [1] NA
obs_right(7.1)   # survival past tau  -> right-censored
#> $t
#> [1] 5
#> 
#> $omega
#> [1] "right"
#> 
#> $t_upper
#> [1] NA

# Single inspection at tau (left-censoring)
obs_left <- observe_left_censor(tau = 5)
obs_left(3.2)    # failed before inspection -> left-censored
#> $t
#> [1] 5
#> 
#> $omega
#> [1] "left"
#> 
#> $t_upper
#> [1] NA
obs_left(7.1)    # surviving at inspection  -> right-censored
#> $t
#> [1] 5
#> 
#> $omega
#> [1] "right"
#> 
#> $t_upper
#> [1] NA

# Periodic inspections every delta time units
obs_periodic <- observe_periodic(delta = 1, tau = 5)
obs_periodic(3.2)  # failure in (3, 4) -> interval-censored
#> $t
#> [1] 3
#> 
#> $omega
#> [1] "interval"
#> 
#> $t_upper
#> [1] 4

# Mixture: random assignment to schemes
obs_mixed <- observe_mixture(
  observe_right_censor(tau = 5),
  observe_left_censor(tau = 3),
  observe_periodic(delta = 0.5, tau = 5),
  weights = c(0.5, 0.2, 0.3)
)

All rdata() methods accept an observe argument to generate data under a specified monitoring scheme:

gen_exp <- rdata(model_exp)

set.seed(42)
df_demo <- gen_exp(theta_exp, n = 500, p = 0.3,
                   observe = observe_mixture(
                     observe_right_censor(tau = 5),
                     observe_left_censor(tau = 3),
                     weights = c(0.7, 0.3)
                   ))
cat("Observation type counts:\n")
#> Observation type counts:
print(table(df_demo$omega))
#> 
#> exact  left 
#>   344   156

The C1–C2–C3 Likelihood

This is the core contribution: three conditions on the masking process that reduce the full joint likelihood to a tractable form that depends only on the lifetime parameters $\theta$.

The three conditions

C1 (containment). The candidate set always contains the true failed component:

$$P(K_i \in c_i) = 1.$$

This is a minimal accuracy requirement on the diagnostic. Without C1, the candidate set could be misleading, and the likelihood would need to model diagnostic errors explicitly.

Example: An on-board diagnostic (OBD) system reports a set of possibly-failed modules. C1 requires that the truly failed module is always in this set — the OBD may include false positives but never a false negative.

C2 (symmetric masking). The masking probability is the same for all components in the candidate set:

$$P(c_i \mid K_i = j, T_i = t_i, \theta) = P(c_i \mid K_i = j', T_i = t_i, \theta) \quad \text{for all } j, j' \in c_i.$$

Intuitively, the diagnostic cannot distinguish among the candidates — if it could, it would narrow the candidate set.

Example: A line-replaceable unit (LRU) grouping replaces all components in a subsystem together. The grouping is determined by physical layout, not by which component failed, so C2 holds by construction.

C3 (parameter independence). The masking probabilities do not depend on the lifetime parameters $\theta$:

$$P(c_i \mid K_i = j, T_i = t_i) \text{ is not a function of } \theta.$$

Example: Diagnostic accuracy depends on sensor calibration and technician skill, not on component failure rates. If the diagnostic equipment was calibrated before the study, C3 holds.

Deriving the likelihood

We derive the likelihood step by step. Consider observation $i$ with failure time $t_i$ and candidate set $c_i$.

Step 1: Full joint density. The observed-data density includes a sum over the latent cause $K_i$:

$$L_i(\theta) = \sum_{j=1}^m f_{K,T}(j, t_i) \cdot P(c_i \mid K_i = j, T_i = t_i, \theta).$$

Step 2: Apply C1. Since $P(K_i \in c_i) = 1$, contributions from $j \notin c_i$ are zero:

$$L_i(\theta) = \sum_{j \in c_i} f_{K,T}(j, t_i) \cdot P(c_i \mid K_i = j, T_i = t_i, \theta).$$

Step 3: Apply C2. Symmetric masking means $P(c_i \mid K_i = j, T_i = t_i, \theta) = \beta_i$ for all $j \in c_i$:

$$L_i(\theta) = \beta_i \sum_{j \in c_i} f_{K,T}(j, t_i).$$

Step 4: Apply C3. Since $\beta_i$ does not depend on $\theta$, it is a constant with respect to the parameters and drops from the maximization. Using $f_{K,T}(j, t) = h_j(t)\,R(t)$, which follows from the conditional cause probability result above:

$$L_i(\theta) \propto \sum_{j \in c_i} f_{K,T}(j, t_i) = \sum_{j \in c_i} h_j(t_i) \cdot R(t_i) = R(t_i) \cdot h_c(t_i).$$

The exact-failure likelihood contribution is therefore:

$$\ell_i^{\text{exact}}(\theta) = \log R(t_i \mid \theta) + \log h_c(t_i \mid \theta),$$

where $h_c(t) = \sum_{j \in c_i} h_j(t)$ is the candidate-set hazard.
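A quick numeric check of Step 4, using the exponential system from earlier (the time point and candidate set are arbitrary illustrative choices): summing the joint density $f_{K,T}(j, t) = h_j(t)R(t)$ over the candidate set reproduces $R(t)\,h_c(t)$.

lam <- theta_exp; t0 <- 0.8; c_i <- c(1, 3)
R_t  <- exp(-sum(lam) * t0)             # system reliability at t0
f_KT <- lam * R_t                       # f_{K,T}(j, t0) = h_j(t0) R(t0)
c(sum(f_KT[c_i]), R_t * sum(lam[c_i]))  # the two sides agree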

Censored observations

Right-censored. The system survived past $t_i$. No failure was observed, so there is no candidate set and the contribution is simply:

$$\ell_i^{\text{right}}(\theta) = \log R(t_i \mid \theta).$$

Left-censored. The system was found failed at inspection time $\tau_i$, but the exact failure time is unknown. Applying C1–C3 to the integrated likelihood:

$$\ell_i^{\text{left}}(\theta) = \log \int_0^{\tau_i} h_c(u)\, R(u)\, du.$$

Interval-censored. The failure occurred in $(a_i, b_i)$:

$$\ell_i^{\text{interval}}(\theta) = \log \int_{a_i}^{b_i} h_c(u)\, R(u)\, du.$$

Combined log-likelihood

The full log-likelihood decomposes as:

$$\ell(\theta) = \ell_E + \ell_R + \ell_L + \ell_I,$$

where

$$\begin{aligned}
\ell_E &= \sum_{i : \omega_i = \text{exact}} \bigl[\log R(t_i) + \log h_{c_i}(t_i)\bigr], \\
\ell_R &= \sum_{i : \omega_i = \text{right}} \log R(t_i), \\
\ell_L &= \sum_{i : \omega_i = \text{left}} \log \int_0^{\tau_i} h_{c_i}(u)\, R(u)\, du, \\
\ell_I &= \sum_{i : \omega_i = \text{interval}} \log \int_{a_i}^{b_i} h_{c_i}(u)\, R(u)\, du.
\end{aligned}$$

R code: Log-likelihood evaluation

The loglik() generic returns a closure that evaluates this log-likelihood for any model:

ll_exp <- loglik(model_exp)

# Evaluate at true parameters on the mixed-censoring demo data
ll_val <- ll_exp(df_demo, theta_exp)
cat("Log-likelihood at true theta:", round(ll_val, 4), "\n")
#> Log-likelihood at true theta: -345.3

# Perturb parameters and compare
theta_bad <- c(2, 2, 2)
cat("Log-likelihood at wrong theta:", round(ll_exp(df_demo, theta_bad), 4), "\n")
#> Log-likelihood at wrong theta: -470.8

Distribution Families

The C1–C2–C3 framework applies to any component lifetime distribution. The following table lists common families with their hazard, reliability, and pdf:

| Family | Parameters | $h_j(t)$ | $R_j(t)$ |
|--------|------------|----------|----------|
| Exponential | $\lambda_j$ | $\lambda_j$ | $e^{-\lambda_j t}$ |
| Weibull | $k_j, \beta_j$ | $\frac{k_j}{\beta_j}\left(\frac{t}{\beta_j}\right)^{k_j-1}$ | $\exp\!\left(-\left(\frac{t}{\beta_j}\right)^{k_j}\right)$ |
| Pareto | $\alpha_j, x_{m,j}$ | $\frac{\alpha_j}{t}$ | $\left(\frac{x_{m,j}}{t}\right)^{\alpha_j}$ |
| Log-normal | $\mu_j, \sigma_j$ | $\frac{\phi(z_j)}{t\sigma_j\,\Phi(-z_j)}$ | $\Phi(-z_j)$ |
| Gamma | $\alpha_j, \beta_j$ | $f_j(t)/R_j(t)$ | $1 - \gamma(\alpha_j, t/\beta_j)/\Gamma(\alpha_j)$ |

where $z_j = (\log t - \mu_j)/\sigma_j$, and the Pareto forms hold for $t \ge x_{m,j}$. The package currently implements the first two families (Exponential and Weibull). The framework extends to any family by substituting the appropriate $h_j$ and $R_j$ into the general log-likelihood.
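As an illustration of this extensibility, the sketch below hand-codes the exact-observation contribution for log-normal components using only the table’s formulas. This is not a package API; the names and parameter values here are hypothetical.

# Log-normal components: hazard and log-reliability from the table
mu_ln    <- c(4.0, 4.2, 4.5)            # hypothetical log-means
sigma_ln <- c(0.5, 0.6, 0.4)            # hypothetical log-sds
h_ln <- function(t, j) {
  z <- (log(t) - mu_ln[j]) / sigma_ln[j]
  dnorm(z) / (t * sigma_ln[j] * pnorm(-z))
}
logR_ln <- function(t, j) pnorm(-(log(t) - mu_ln[j]) / sigma_ln[j], log.p = TRUE)

# Exact contribution: log h_c(t) + log R(t)
loglik_exact_ln <- function(t, c_set) {
  log(sum(sapply(c_set, function(j) h_ln(t, j)))) +
    sum(sapply(1:3, function(j) logR_ln(t, j)))
}
loglik_exact_ln(60, c(1, 2))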

Worked Example: Exponential Components

The exponential model is the simplest instantiation: all hazards are constant, yielding closed-form expressions for every quantity.

Setup

theta <- c(1.0, 1.1, 0.95, 1.15, 1.1)
m <- length(theta)
model <- exp_series_md_c1_c2_c3()

cat("True rates:", theta, "\n")
#> True rates: 1 1.1 0.95 1.15 1.1
cat("System rate:", sum(theta), "\n")
#> System rate: 5.3

Specializing the likelihood

For exponential components with $h_j(t) = \lambda_j$ and $R_j(t) = e^{-\lambda_j t}$:

$$\begin{aligned}
\text{Exact:} \quad & \log\Bigl(\sum_{j \in c_i} \lambda_j\Bigr) - \Bigl(\sum_{j=1}^m \lambda_j\Bigr) t_i, \\
\text{Right:} \quad & -\lambda_{\text{sys}}\, t_i, \\
\text{Left:} \quad & \log \lambda_c + \log\bigl(1 - e^{-\lambda_{\text{sys}} \tau_i}\bigr) - \log \lambda_{\text{sys}}, \\
\text{Interval:} \quad & \log \lambda_c - \lambda_{\text{sys}} a_i + \log\bigl(1 - e^{-\lambda_{\text{sys}}(b_i - a_i)}\bigr) - \log \lambda_{\text{sys}},
\end{aligned}$$

where $\lambda_c = \sum_{j \in c_i} \lambda_j$ and $\lambda_{\text{sys}} = \sum_j \lambda_j$.
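The left- and interval-censored closed forms can be checked directly against numerical integration (candidate set and censoring times are illustrative):

lam_sys <- sum(theta)                   # system rate, theta from the setup
lam_c   <- sum(theta[c(1, 2)])          # candidate set {1, 2}
tau <- 0.5; a <- 0.2; b <- 0.4
f_int <- function(u) lam_c * exp(-lam_sys * u)
c(log(integrate(f_int, 0, tau)$value),  # left: integral vs closed form
  log(lam_c) - log(lam_sys) + log(1 - exp(-lam_sys * tau)))
c(log(integrate(f_int, a, b)$value),    # interval: integral vs closed form
  log(lam_c) - lam_sys * a + log(1 - exp(-lam_sys * (b - a))) - log(lam_sys))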

All four observation types have closed-form log-likelihood, score, and Hessian — among the implemented families, the exponential model is unique in this respect.

Data generation and fitting

gen <- rdata(model)
set.seed(7231)

df <- gen(theta, n = 2000, p = 0.3,
          observe = observe_right_censor(tau = 3))

cat("Observation types:\n")
#> Observation types:
print(table(df$omega))
#> 
#> exact 
#>  2000
solver <- fit(model)
theta0 <- rep(1, m)
estimate <- solver(df, par = theta0, method = "Nelder-Mead")
recovery <- data.frame(
  Component = 1:m,
  True = theta,
  MLE = estimate$par,
  SE = sqrt(diag(estimate$vcov)),
  Rel_Error_Pct = 100 * (estimate$par - theta) / theta
)

knitr::kable(recovery, digits = 4,
             caption = "Exponential MLE: parameter recovery",
             col.names = c("Component", "True", "MLE", "SE", "Rel. Error %"))
Exponential MLE: parameter recovery

| Component | True | MLE | SE | Rel. Error % |
|-----------|------|-----|----|--------------|
| 1 | 1.00 | 0.9824 | 0.0753 | -1.7590 |
| 2 | 1.10 | 0.9906 | 0.0713 | -9.9493 |
| 3 | 0.95 | 0.9429 | 0.0721 | -0.7451 |
| 4 | 1.15 | 1.1041 | 0.0760 | -3.9901 |
| 5 | 1.10 | 1.1903 | 0.0789 | 8.2058 |

Score and Hessian verification

The analytical score should vanish at the MLE. We also verify it against numerical differentiation:

ll_fn <- loglik(model)
scr_fn <- score(model)
hess_fn <- hess_loglik(model)

scr_at_mle <- scr_fn(df, estimate$par)
cat("Score at MLE:", round(scr_at_mle, 4), "\n")
#> Score at MLE: -0.0494 0.1326 -0.0488 0.0388 0.1303

scr_num <- numDeriv::grad(function(th) ll_fn(df, th), estimate$par)
cat("Max |analytical - numerical| score:",
    formatC(max(abs(scr_at_mle - scr_num)), format = "e", digits = 2), "\n")
#> Max |analytical - numerical| score: 1.14e-06

# Hessian eigenvalues (should all be negative for concavity)
H <- hess_fn(df, estimate$par)
cat("Hessian eigenvalues:", round(eigen(H)$values, 2), "\n")
#> Hessian eigenvalues: -142.6 -153.3 -167.4 -174.5 -370.4

Fisher information and confidence intervals

The observed Fisher information is $I(\hat\theta) = -H(\hat\theta)$. Asymptotic $100(1-\alpha)\%$ Wald confidence intervals are:

$$\hat\theta_j \pm z_{1-\alpha/2} \sqrt{\bigl[I(\hat\theta)^{-1}\bigr]_{jj}}.$$

alpha <- 0.05
z <- qnorm(1 - alpha / 2)
se <- sqrt(diag(estimate$vcov))

ci_df <- data.frame(
  Component = 1:m,
  Lower = estimate$par - z * se,
  MLE = estimate$par,
  Upper = estimate$par + z * se,
  True = theta,
  Covered = (estimate$par - z * se <= theta) & (theta <= estimate$par + z * se)
)

knitr::kable(ci_df, digits = 4,
             caption = "95% Wald confidence intervals",
             col.names = c("Comp.", "Lower", "MLE", "Upper", "True", "Covered?"))
95% Wald confidence intervals

| Comp. | Lower | MLE | Upper | True | Covered? |
|-------|-------|-----|-------|------|----------|
| 1 | 0.8348 | 0.9824 | 1.130 | 1.00 | TRUE |
| 2 | 0.8508 | 0.9906 | 1.130 | 1.10 | TRUE |
| 3 | 0.8016 | 0.9429 | 1.084 | 0.95 | TRUE |
| 4 | 0.9552 | 1.1041 | 1.253 | 1.15 | TRUE |
| 5 | 1.0357 | 1.1903 | 1.345 | 1.10 | TRUE |

Homogeneous Weibull

The homogeneous Weibull model constrains all $m$ components to share a common shape parameter $k$, while allowing individual scale parameters $\beta_1, \ldots, \beta_m$. The parameter vector is $\theta = (k, \beta_1, \ldots, \beta_m) \in \mathbb{R}^{m+1}_{>0}$.

Key properties

Property 1: The system lifetime is Weibull. Because all shapes are equal, the system reliability simplifies:

$$R(t) = \prod_{j=1}^m \exp\!\left(-\left(\frac{t}{\beta_j}\right)^k\right) = \exp\!\left(-\left(\frac{t}{\beta_{\text{sys}}}\right)^k\right),$$

where $\beta_{\text{sys}} = \bigl(\sum_{j=1}^m \beta_j^{-k}\bigr)^{-1/k}$. This closure under the minimum is the structural advantage of the homogeneous model.

Property 2: Constant cause probabilities. The time dependence $t^{k-1}$ cancels in the hazard ratio:

$$P(K = j \mid T = t) = \frac{\beta_j^{-k}}{\sum_l \beta_l^{-k}} \eqqcolon w_j.$$

Just as in the exponential case, the conditional cause probability does not depend on $t$.

Property 3: Closed-form censored contributions. Left- and interval-censored terms factor as $\log w_{c_i} + \log[\text{Weibull CDF difference}]$ — no numerical integration needed.
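A base-R sketch of Property 3 for an interval-censored observation $(a, b)$ with candidate set $c$: the contribution is $\log w_c$ plus a Weibull CDF difference evaluated at the system scale. The interval endpoints and candidate set are illustrative.

k0 <- 1.5; scales0 <- c(100, 150, 200)
beta_sys0 <- sum(scales0^(-k0))^(-1/k0)        # system scale
w0 <- scales0^(-k0) / sum(scales0^(-k0))       # time-invariant cause weights
a <- 40; b <- 60; c_i <- c(1, 2)
log(sum(w0[c_i])) +
  log(pweibull(b, k0, beta_sys0) - pweibull(a, k0, beta_sys0))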

R code: Setup and hazard visualization

theta_hom <- c(k = 1.5, beta1 = 100, beta2 = 150, beta3 = 200)
k <- theta_hom[1]
scales <- theta_hom[-1]
m_hom <- length(scales)

model_hom <- wei_series_homogeneous_md_c1_c2_c3()

# System scale
beta_sys <- wei_series_system_scale(k, scales)
cat("System scale:", round(beta_sys, 2), "\n")
#> System scale: 65.24

# Plot component hazards
t_grid <- seq(0.1, 200, length.out = 300)
cols <- c("steelblue", "forestgreen", "firebrick")

plot(NULL, xlim = c(0, 200), ylim = c(0, 0.06),
     xlab = "Time", ylab = "Hazard h_j(t)",
     main = "Homogeneous Weibull: component hazards (k = 1.5)")
for (j in seq_len(m_hom)) {
  h_j <- component_hazard(model_hom, j)
  lines(t_grid, h_j(t_grid, theta_hom), col = cols[j], lwd = 2)
}
legend("topleft", paste0("Comp ", 1:m_hom, " (beta=", scales, ")"),
       col = cols, lwd = 2, cex = 0.9)
grid()

All hazards are increasing ($k > 1$), sharing the same shape but differing in magnitude. Component 1 (smallest $\beta$) dominates at all times.

Cause probabilities

# Analytical cause weights
w <- scales^(-k) / sum(scales^(-k))
names(w) <- paste0("Component ", 1:m_hom)
cat("Time-invariant cause weights:\n")
#> Time-invariant cause weights:
print(round(w, 4))
#> Component 1 Component 2 Component 3 
#>      0.5269      0.2868      0.1863

# Verify with package function (pass scales so model knows m)
ccp_fn <- conditional_cause_probability(
  wei_series_homogeneous_md_c1_c2_c3(scales = scales)
)
probs <- ccp_fn(c(10, 50, 100, 150), theta_hom)
knitr::kable(probs, digits = 4,
             caption = "P(K=j | T=t) at four time points (should be constant)",
             col.names = paste0("Comp ", 1:m_hom))
P(K=j | T=t) at four time points (should be constant)

| Comp 1 | Comp 2 | Comp 3 |
|--------|--------|--------|
| 0.5269 | 0.2868 | 0.1863 |
| 0.5269 | 0.2868 | 0.1863 |
| 0.5269 | 0.2868 | 0.1863 |
| 0.5269 | 0.2868 | 0.1863 |

Data generation and MLE

gen_hom <- rdata(model_hom)
set.seed(2024)

df_hom <- gen_hom(theta_hom, n = 500, p = 0.3,
                  observe = observe_periodic(delta = 20, tau = 250))

cat("Observation types:\n")
#> Observation types:
print(table(df_hom$omega))
#> 
#> interval 
#>      500
solver_hom <- fit(model_hom)
theta0_hom <- c(1.2, 110, 140, 180)

est_hom <- solver_hom(df_hom, par = theta0_hom, method = "Nelder-Mead")

recovery_hom <- data.frame(
  Parameter = c("k", paste0("beta_", 1:m_hom)),
  True = theta_hom,
  MLE = est_hom$par,
  SE = sqrt(diag(est_hom$vcov)),
  Rel_Error_Pct = 100 * (est_hom$par - theta_hom) / theta_hom
)

knitr::kable(recovery_hom, digits = 3,
             caption = paste0("Homogeneous Weibull MLE (n = 500)"),
             col.names = c("Parameter", "True", "MLE", "SE", "Rel. Error %"))
Homogeneous Weibull MLE (n = 500)

| Parameter | True | MLE | SE | Rel. Error % |
|-----------|------|-----|----|--------------|
| k | 1.5 | 1.48 | 0.054 | -1.353 |
| beta_1 | 100.0 | 100.94 | 4.629 | 0.938 |
| beta_2 | 150.0 | 149.80 | 10.157 | -0.131 |
| beta_3 | 200.0 | 247.39 | 27.542 | 23.696 |

Heterogeneous Weibull

The heterogeneous Weibull model allows each component to have its own shape $k_j$ and scale $\beta_j$, giving a $2m$-dimensional parameter vector $\theta = (k_1, \beta_1, \ldots, k_m, \beta_m)$.

Differences from the homogeneous model

| Property | Homogeneous | Heterogeneous |
|----------|-------------|---------------|
| Parameters | $m + 1$ | $2m$ |
| System lifetime | Weibull($k$, $\beta_{\text{sys}}$) | Not Weibull |
| Cause probabilities | Constant in $t$ | Time-varying |
| Left/interval likelihood | Closed form | Numerical integration |

The additional flexibility comes at a cost: the system lifetime $T = \min(T_1, \ldots, T_m)$ is no longer Weibull when shapes differ, and left-censored or interval-censored likelihood contributions require numerical integration (stats::integrate).
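For a concrete sense of this cost, the following sketch hand-codes one interval-censored contribution for the mixed-regime system used below. The interval and candidate set are illustrative; the hazards and reliability come straight from the Weibull formulas, not from the package.

sh <- c(0.7, 1.0, 2.0); sc <- c(200, 150, 100)   # shapes and scales
h_w   <- function(t, j) (sh[j] / sc[j]) * (t / sc[j])^(sh[j] - 1)
R_sys <- function(t) exp(-Reduce(`+`, lapply(1:3, function(j) (t / sc[j])^sh[j])))
c_i <- c(1, 3)                                   # masked candidate set
integrand <- function(u) (h_w(u, 1) + h_w(u, 3)) * R_sys(u)
log(integrate(integrand, lower = 20, upper = 40)$value)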

R code: Mixed failure regimes

We construct a 3-component system with DFR (infant mortality), CFR (random), and IFR (wear-out) components:

theta_het <- c(0.7, 200,   # DFR electronics
               1.0, 150,   # CFR seals
               2.0, 100)   # IFR bearing
m_het <- length(theta_het) / 2

model_het <- wei_series_md_c1_c2_c3(
  shapes = theta_het[seq(1, 6, 2)],
  scales = theta_het[seq(2, 6, 2)]
)

# Component hazard functions
t_het <- seq(0.5, 120, length.out = 300)

h_fns <- lapply(1:m_het, function(j) component_hazard(model_het, j))
h_vals <- sapply(h_fns, function(hf) hf(t_het, theta_het))

plot(t_het, h_vals[, 1], type = "l", col = "steelblue", lwd = 2,
     ylim = c(0, 0.05),
     xlab = "Time", ylab = "Hazard h_j(t)",
     main = "Heterogeneous Weibull: mixed failure regimes")
lines(t_het, h_vals[, 2], col = "forestgreen", lwd = 2, lty = 2)
lines(t_het, h_vals[, 3], col = "firebrick", lwd = 2, lty = 3)
legend("topright",
       c("Comp 1: DFR (k=0.7)", "Comp 2: CFR (k=1.0)", "Comp 3: IFR (k=2.0)"),
       col = c("steelblue", "forestgreen", "firebrick"),
       lty = 1:3, lwd = 2, cex = 0.85)
grid()

Time-varying cause probabilities

ccp_het <- conditional_cause_probability(model_het)
probs_het <- ccp_het(t_het, theta_het)

plot(t_het, probs_het[, 1], type = "l", col = "steelblue", lwd = 2,
     ylim = c(0, 1), xlab = "Time", ylab = "P(K = j | T = t)",
     main = "Heterogeneous Weibull: time-varying cause probabilities")
lines(t_het, probs_het[, 2], col = "forestgreen", lwd = 2, lty = 2)
lines(t_het, probs_het[, 3], col = "firebrick", lwd = 2, lty = 3)
legend("right",
       c("DFR (k=0.7)", "CFR (k=1.0)", "IFR (k=2.0)"),
       col = c("steelblue", "forestgreen", "firebrick"),
       lty = 1:3, lwd = 2, cex = 0.85)
grid()

The crossover pattern is characteristic of bathtub-curve behavior at the system level: early failures are dominated by infant-mortality mechanisms, while late failures are dominated by wear-out.

Data generation and MLE

gen_het <- rdata(model_het)
set.seed(7231)

df_het <- gen_het(theta_het, n = 300, tau = 120, p = 0.3,
                  observe = observe_right_censor(tau = 120))

cat("Observation types:\n")
#> Observation types:
print(table(df_het$omega))
#> 
#> exact right 
#>   286    14

We use right-censoring for speed — exact and right-censored observations have closed-form likelihood contributions. Left- and interval-censored observations require numerical integration per observation per optimization iteration.

solver_het <- fit(model_het)
theta0_het <- rep(c(1, 150), m_het)

est_het <- solver_het(df_het, par = theta0_het, method = "Nelder-Mead")

par_names <- paste0(
  rep(c("k", "beta"), m_het), "_",
  rep(1:m_het, each = 2)
)

recovery_het <- data.frame(
  Parameter = par_names,
  True = theta_het,
  MLE = est_het$par,
  SE = sqrt(diag(est_het$vcov)),
  Rel_Error_Pct = 100 * (est_het$par - theta_het) / theta_het
)

knitr::kable(recovery_het, digits = 3,
             caption = paste0("Heterogeneous Weibull MLE (n = 300)"),
             col.names = c("Parameter", "True", "MLE", "SE", "Rel. Error %"))
Heterogeneous Weibull MLE (n = 300)

| Parameter | True | MLE | SE | Rel. Error % |
|-----------|------|-----|----|--------------|
| k_1 | 0.7 | 0.744 | 0.062 | 6.233 |
| beta_1 | 200.0 | 145.388 | 19.797 | -27.306 |
| k_2 | 1.0 | 1.137 | 0.126 | 13.668 |
| beta_2 | 150.0 | 150.563 | 21.610 | 0.376 |
| k_3 | 2.0 | 2.341 | 0.220 | 17.071 |
| beta_3 | 100.0 | 102.378 | 5.510 | 2.378 |

Model comparison: heterogeneous vs homogeneous

When the true DGP has heterogeneous shapes, the homogeneous model is misspecified. We compare the two on the same data:

# Refit on larger dataset for sharper comparison
set.seed(999)
df_comp <- gen_het(theta_het, n = 800, tau = 150, p = 0.2,
                   observe = observe_right_censor(tau = 150))

# Heterogeneous (6 params)
fit_het <- solver_het(df_comp, par = theta0_het, method = "Nelder-Mead")

# Homogeneous (4 params)
model_hom2 <- wei_series_homogeneous_md_c1_c2_c3()
solver_hom2 <- fit(model_hom2)
fit_hom2 <- solver_hom2(df_comp, par = c(1, 150, 150, 150),
                        method = "Nelder-Mead")

# Likelihood ratio test
LRT <- 2 * (fit_het$loglik - fit_hom2$loglik)
df_lrt <- 6 - 4
p_val <- pchisq(LRT, df = df_lrt, lower.tail = FALSE)

comp_df <- data.frame(
  Model = c("Heterogeneous (2m = 6)", "Homogeneous (m+1 = 4)"),
  Params = c(6, 4),
  LogLik = c(fit_het$loglik, fit_hom2$loglik),
  AIC = c(-2 * fit_het$loglik + 2 * 6, -2 * fit_hom2$loglik + 2 * 4)
)

knitr::kable(comp_df, digits = 2,
             caption = "Model comparison",
             col.names = c("Model", "# Params", "Log-Lik", "AIC"))
Model comparison

| Model | # Params | Log-Lik | AIC |
|-------|----------|---------|-----|
| Heterogeneous (2m = 6) | 6 | -4373 | 8759 |
| Homogeneous (m+1 = 4) | 4 | -4451 | 8911 |

cat(sprintf("\nLRT statistic: %.2f (df = %d, p = %.4f)\n",
            LRT, df_lrt, p_val))
#> 
#> LRT statistic: 155.88 (df = 2, p = 0.0000)

Both AIC and the likelihood ratio test favor the heterogeneous model when the true DGP has unequal shapes. The common-shape estimate from the homogeneous fit represents a compromise value between 0.7, 1.0, and 2.0, leading to biased estimates of the scale parameters.

Monte Carlo Assessment

We briefly assess finite-sample properties of the MLE for each model family. Full Monte Carlo studies with sensitivity analyses are provided in the dedicated vignettes (exponential_series, weibull_series, weibull_homogeneous_series).

set.seed(42)
B <- 100
alpha <- 0.05

# Exponential: 5 components
theta_mc_exp <- c(1.0, 1.1, 0.95, 1.15, 1.1)
m_mc_exp <- length(theta_mc_exp)
n_mc <- 1000

model_mc_exp <- exp_series_md_c1_c2_c3()
gen_mc_exp <- rdata(model_mc_exp)
solver_mc_exp <- fit(model_mc_exp)

est_exp <- matrix(NA, B, m_mc_exp)
conv_exp <- logical(B)

for (b in seq_len(B)) {
  df_b <- gen_mc_exp(theta_mc_exp, n = n_mc, p = 0.3,
                     observe = observe_right_censor(tau = 3))
  tryCatch({
    fit_b <- solver_mc_exp(df_b, par = rep(1, m_mc_exp),
                           method = "Nelder-Mead")
    est_exp[b, ] <- fit_b$par
    conv_exp[b] <- fit_b$converged
  }, error = function(e) conv_exp[b] <<- FALSE)
}

# Homogeneous Weibull: 3 components
theta_mc_hom <- c(1.5, 100, 150, 200)
m_mc_hom <- 3
model_mc_hom <- wei_series_homogeneous_md_c1_c2_c3()
gen_mc_hom <- rdata(model_mc_hom)
solver_mc_hom <- fit(model_mc_hom)

beta_sys_mc <- wei_series_system_scale(theta_mc_hom[1], theta_mc_hom[-1])
tau_mc_hom <- qweibull(0.75, shape = theta_mc_hom[1], scale = beta_sys_mc)

est_hom <- matrix(NA, B, m_mc_hom + 1)
conv_hom <- logical(B)

for (b in seq_len(B)) {
  df_b <- gen_mc_hom(theta_mc_hom, n = n_mc, p = 0.3,
                     observe = observe_right_censor(tau = tau_mc_hom))
  tryCatch({
    fit_b <- solver_mc_hom(df_b, par = c(1, 120, 120, 120),
                           method = "L-BFGS-B",
                           lower = rep(1e-6, m_mc_hom + 1))
    est_hom[b, ] <- fit_b$par
    conv_hom[b] <- fit_b$converged
  }, error = function(e) conv_hom[b] <<- FALSE)
}

# Heterogeneous Weibull: 3 components
theta_mc_het <- c(0.8, 150, 1.5, 120, 2.0, 100)
m_mc_het <- 3
model_mc_het <- wei_series_md_c1_c2_c3()
gen_mc_het <- rdata(model_mc_het)
solver_mc_het <- fit(model_mc_het)

est_het <- matrix(NA, B, 2 * m_mc_het)
conv_het <- logical(B)

for (b in seq_len(B)) {
  df_b <- gen_mc_het(theta_mc_het, n = n_mc, tau = 200, p = 0.2,
                     observe = observe_right_censor(tau = 200))
  tryCatch({
    fit_b <- solver_mc_het(df_b, par = rep(c(1, 130), m_mc_het),
                           method = "L-BFGS-B",
                           lower = rep(1e-6, 2 * m_mc_het))
    est_het[b, ] <- fit_b$par
    conv_het[b] <- fit_b$converged
  }, error = function(e) conv_het[b] <<- FALSE)
}
mc_summary <- function(estimates, converged, theta, par_names) {
  valid <- converged & !is.na(estimates[, 1])
  ev <- estimates[valid, , drop = FALSE]
  bias <- colMeans(ev) - theta
  rmse <- sqrt(bias^2 + apply(ev, 2, var))
  data.frame(
    Parameter = par_names,
    True = theta,
    Mean_Est = colMeans(ev),
    Bias = bias,
    RMSE = rmse,
    Rel_Bias_Pct = 100 * bias / theta,
    stringsAsFactors = FALSE
  )
}

# Exponential
cat("=== Exponential ===\n")
#> === Exponential ===
cat("Convergence:", mean(conv_exp), "\n\n")
#> Convergence: 0.99
exp_table <- mc_summary(est_exp, conv_exp, theta_mc_exp,
                        paste0("lambda_", 1:m_mc_exp))
knitr::kable(exp_table, digits = 4,
             caption = "Exponential MC (B=100, n=1000)",
             col.names = c("Parameter", "True", "Mean Est.", "Bias",
                          "RMSE", "Rel. Bias %"))
Exponential MC (B=100, n=1000)

| Parameter | True | Mean Est. | Bias | RMSE | Rel. Bias % |
|-----------|------|-----------|------|------|-------------|
| lambda_1 | 1.00 | 0.9906 | -0.0094 | 0.0968 | -0.9366 |
| lambda_2 | 1.10 | 1.1103 | 0.0103 | 0.1183 | 0.9401 |
| lambda_3 | 0.95 | 0.9688 | 0.0188 | 0.1071 | 1.9777 |
| lambda_4 | 1.15 | 1.1424 | -0.0076 | 0.1062 | -0.6650 |
| lambda_5 | 1.10 | 1.1039 | 0.0039 | 0.1065 | 0.3550 |
cat("\n=== Homogeneous Weibull ===\n")
#> 
#> === Homogeneous Weibull ===
cat("Convergence:", mean(conv_hom), "\n\n")
#> Convergence: 0.8
hom_table <- mc_summary(est_hom, conv_hom, theta_mc_hom,
                        c("k", paste0("beta_", 1:m_mc_hom)))
knitr::kable(hom_table, digits = 4,
             caption = "Homogeneous Weibull MC (B=100, n=1000)",
             col.names = c("Parameter", "True", "Mean Est.", "Bias",
                          "RMSE", "Rel. Bias %"))
Homogeneous Weibull MC (B=100, n=1000)

| Parameter | True | Mean Est. | Bias | RMSE | Rel. Bias % |
|-----------|------|-----------|------|------|-------------|
| k | 1.5 | 1.496 | -0.0036 | 0.0507 | -0.2375 |
| beta_1 | 100.0 | 100.242 | 0.2423 | 3.9623 | 0.2423 |
| beta_2 | 150.0 | 151.721 | 1.7208 | 9.2254 | 1.1472 |
| beta_3 | 200.0 | 201.174 | 1.1742 | 17.2010 | 0.5871 |
cat("\n=== Heterogeneous Weibull ===\n")
#> 
#> === Heterogeneous Weibull ===
cat("Convergence:", mean(conv_het), "\n\n")
#> Convergence: 0.78
het_names <- paste0(rep(c("k", "beta"), m_mc_het), "_",
                    rep(1:m_mc_het, each = 2))
het_table <- mc_summary(est_het, conv_het, theta_mc_het, het_names)
knitr::kable(het_table, digits = 4,
             caption = "Heterogeneous Weibull MC (B=100, n=1000)",
             col.names = c("Parameter", "True", "Mean Est.", "Bias",
                          "RMSE", "Rel. Bias %"))
Heterogeneous Weibull MC (B=100, n=1000)

| Parameter | True | Mean Est. | Bias | RMSE | Rel. Bias % |
|-----------|------|-----------|------|------|-------------|
| k_1 | 0.8 | 0.8012 | 0.0012 | 0.0351 | 0.1440 |
| beta_1 | 150.0 | 148.7311 | -1.2689 | 12.2943 | -0.8460 |
| k_2 | 1.5 | 1.5145 | 0.0145 | 0.0821 | 0.9648 |
| beta_2 | 120.0 | 120.9411 | 0.9411 | 6.0519 | 0.7842 |
| k_3 | 2.0 | 1.9917 | -0.0083 | 0.0936 | -0.4159 |
| beta_3 | 100.0 | 100.1777 | 0.1777 | 3.2279 | 0.1777 |

Practical Considerations

Starting values. For the exponential model, starting at $\lambda_j = 1$ for all $j$ generally works. For Weibull models, start with shapes near 1 (the exponential case) and scales estimated from sample quantiles. An exponential fit can provide good initial scale values via $\hat\beta_j = 1/\hat\lambda_j$.
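A minimal sketch of that transformation (the rate values here are hypothetical stand-ins for exponential MLEs obtained on the same data):

lambda_hat <- c(0.012, 0.008, 0.006)    # hypothetical exponential rate MLEs
theta0_wei <- c(1, 1 / lambda_hat)      # shape near 1, scales = 1/rates
theta0_wei
#> [1]   1.0000  83.3333 125.0000 166.6667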

Identifiability. If two components are always masked together (i.e., they appear in every candidate set as a pair), their individual parameters are not identifiable — only their sum (exponential) or combined contribution (Weibull) can be estimated. This is a structural limitation of the observational design, not the estimation method.
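A small hand-coded demonstration for the exponential case, assuming components 1 and 2 are always masked together (illustrative data, not package code): the log-likelihood depends on $(\lambda_1, \lambda_2)$ only through their sum, so it is flat along $\lambda_1 + \lambda_2 = \text{const}$.

set.seed(1)
t_obs <- rexp(100, rate = 3)            # placeholder failure times
ll <- function(lam) {                   # every candidate set is {1, 2}
  sum(log(lam[1] + lam[2]) - sum(lam) * t_obs)
}
ll(c(0.5, 1.5, 1.0))                    # lambda_1 + lambda_2 = 2
ll(c(1.0, 1.0, 1.0))                    # same sum -> identical log-likelihood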

Computational cost. The three models differ substantially in computation time:

  • Exponential: $O(n)$ per log-likelihood evaluation. All four observation types have closed-form loglik, score, and Hessian.
  • Homogeneous Weibull: $O(n)$ for loglik (closed form for all types). Score uses numDeriv::grad for left/interval observations.
  • Heterogeneous Weibull: $O(n_{\text{exact}} + n_{\text{right}})$ for closed-form terms, plus $O(n_{\text{left}} + n_{\text{interval}})$ calls to stats::integrate for the numerical integration terms.

For datasets with many left- or interval-censored observations under the heterogeneous model, each optimization iteration involves hundreds of numerical integrations. Strategies to manage this include using the homogeneous model as a warm start, or extending the study period to convert more observations to exact/right-censored.
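A warm-start sketch under these assumptions, expanding the homogeneous fit from the model-comparison section (fit_hom2, with parameter vector $(k, \beta_1, \beta_2, \beta_3)$) into heterogeneous starting values:

k_hat    <- fit_hom2$par[1]
beta_hat <- fit_hom2$par[-1]
theta0_warm <- as.vector(rbind(k_hat, beta_hat))  # c(k, b1, k, b2, k, b3)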

Bootstrap confidence intervals. As an alternative to asymptotic Wald intervals (based on the observed Fisher information), parametric bootstrap confidence intervals can be constructed by repeatedly simulating from the fitted model and refitting. This is more computationally expensive but avoids reliance on large-sample normality approximations, which can be inaccurate when the sample size is small or the likelihood surface is asymmetric.
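A minimal parametric-bootstrap sketch, reusing gen, solver, and estimate from the exponential worked example; the replicate count and the masking probability p = 0.3 mirror the original generation call and are illustrative choices.

B_boot <- 200                           # modest replicate count for illustration
boot_est <- matrix(NA, B_boot, m)
for (b in seq_len(B_boot)) {
  df_b <- gen(estimate$par, n = nrow(df), p = 0.3,
              observe = observe_right_censor(tau = 3))
  boot_est[b, ] <- solver(df_b, par = estimate$par,
                          method = "Nelder-Mead")$par
}
# Percentile intervals per component
apply(boot_est, 2, quantile, probs = c(0.025, 0.975))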