
The R package likelihood.md.series.system is a framework for estimating the parameters of latent component lifetimes from masked data in a series system.

Exponentially Distributed Component Lifetimes

Consider a series system in which the components have exponentially distributed lifetimes. The $j$-th component of the $i$-th system has a lifetime distribution given by $T_{ij} \sim \operatorname{EXP}(\lambda_j)$ for $j = 1, \ldots, m$. Thus, $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)$. The random variable $T_{ij}$ has a reliability, pdf, and hazard function given respectively by
\begin{align}
R_j(t|\lambda_j) &= \exp(-\lambda_j t),\\
f_j(t|\lambda_j) &= \lambda_j \exp(-\lambda_j t),\\
h_j(\cdot|\lambda_j) &= \lambda_j,
\end{align}
where $t > 0$ is the lifetime and $\lambda_j > 0$ is the failure rate of the $j$-th component.

The lifetime of the series system composed of $m$ components with exponentially distributed lifetimes has a reliability function given by
\begin{equation}
R_{T_i}(t_i|\boldsymbol{\lambda}) = \exp \biggl(-\sum_{j=1}^{m} \lambda_j t_i\biggr),
\end{equation}
where $t_i > 0$. That is, a series system whose components have exponentially distributed lifetimes is itself exponentially distributed, with rate $\sum_{j=1}^{m} \lambda_j$.

The series system’s failure rate function is given by
\begin{equation}
h(\cdot|\boldsymbol{\lambda}) = \sum_{j=1}^{m} \lambda_j,
\end{equation}
which follows from the system reliability function above, since $h(t) = -\frac{d}{dt} \log R_{T_i}(t|\boldsymbol{\lambda})$.

We see that the system failure rate $\sum_{j=1}^{m} \lambda_j$ is constant, consistent with the exponential distribution being the only continuous distribution that has a constant failure rate.

The pdf of the series system is given by
\begin{equation}
f_{T_i}(t_i|\boldsymbol{\lambda}) = \biggl( \sum_{j=1}^{m} \lambda_j \biggr) \exp \biggl(-\sum_{j=1}^{m} \lambda_j t_i\biggr),
\end{equation}
where $t_i > 0$ is the lifetime of the system. The conditional probability that component $k$ is the cause of a system failure at time $t$ is given by
\begin{equation}
f_{K_i|T_i}(k|t,\boldsymbol{\lambda}) = f_{K_i}(k|\boldsymbol{\lambda}) = \frac{\lambda_k}{\sum_{p=1}^{m} \lambda_p},
\end{equation}
where $k \in \{1,\ldots,m\}$ and $t > 0$. Due to the constant failure rates of the components, $K_i$ and $T_i$ are mutually independent. The joint pdf of $K_i$ and $T_i$ is given by
\begin{equation}
f_{K_i,T_i}(k,t|\boldsymbol{\lambda}) = \lambda_k \exp \biggl(-\sum_{j=1}^{m} \lambda_j t\biggr),
\end{equation}
where $k \in \{1,\ldots,m\}$ and $t > 0$.
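As a quick sanity check of these identities (a sketch, independent of the package), we can simulate a small series system and compare the empirical behavior to the formulas: the minimum of the component lifetimes should be exponential with rate $\sum_j \lambda_j$, and component $k$ should be the cause of failure with probability $\lambda_k / \sum_j \lambda_j$.

# Sketch: verify the series-system identities by simulation (generic rates,
# unrelated to the study parameters used later in this vignette).
set.seed(1)
lambda <- c(0.5, 1, 1.5)
sims <- replicate(1e5, {
    x <- rexp(length(lambda), lambda)   # component lifetimes
    c(t = min(x), k = which.min(x))     # system lifetime and failed component
})
mean(sims["t", ])               # should be near 1 / sum(lambda) = 1/3
prop.table(table(sims["k", ]))  # should be near lambda / sum(lambda)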

Likelihood Model

In this study, the system is a series system with $m$ components. The true DGP for the system lifetime is in the exponential series system family, i.e., the component lifetimes are exponentially and independently distributed, and we denote the true parameter value by $\theta$.

The principal object of study is $\theta$, which in the case of the exponential series system family consists of $m$ rate parameters, one for each component lifetime: $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_m)'$.

We are interested in estimating $\theta$ from masked data. The masking comes in two independent forms:

  • Censored system failure times, e.g., right-censoring. The system failure time is the minimum of the component lifetimes, and it is right-censored if the system failure does not occur during the observation period: $T_i = \min\{\tau_i, T_{i1}, \ldots, T_{im}\}$, where $\tau_i$ is the right-censoring time for the $i$-th observation and $T_{i1},\ldots,T_{im}$ are the component lifetimes for the $i$-th system.

  • The cause of failure, the failed component, is masked. This masking comes in the form of a candidate set $\mathcal{C}_i$ that, on average, conveys information about the component cause of failure.

The candidate set $\mathcal{C}_i$ is a random variable that is a subset of $\{1,\ldots,m\}$. The true DGP for the candidate set model has a general form that may be denoted by $\Pr\{\mathcal{C}_i = c_i \mid T_{i1} = t_{i1}, \ldots, T_{im} = t_{im}, \theta, \text{other factors}\}$.

This is a complicated-looking model, and we are not even interested in the DGP for candidate sets, except to the extent that it affects the sampling distribution of the MLE for $\theta$.

In theory, given some candidate set model, we could construct a joint likelihood function for the full model and jointly estimate the parameters of both the candidate set model and θ\theta. In practice, however, this could be a very challenging task unless we make some simplifying assumptions about the DGP for candidate sets.

Candidate set models

In every model we consider, we assume that the candidate set $\mathcal{C}_i$ is a function only of the component lifetimes $T_{i1},\ldots,T_{im}$, $\theta$, and the right-censoring time $\tau_i$. That is, the candidate set $\mathcal{C}_i$ is independent of any other factors (or those factors are held constant for the duration of the experiment), such as ambient temperature, and we assume such factors have a negligible effect on the series system lifetime, so we can ignore them.

Reduced likelihood model

In the reduced likelihood model, we make the following assumptions about how candidate sets are generated:

  • $C_1$: The index of the failed component is in the candidate set, i.e., $\Pr\{K_i \in \mathcal{C}_i\} = 1$, where $K_i = \arg\min_j \{ T_{ij} : j = 1,\ldots,m\}$.

  • $C_2$: Given $K_i$ and $T_i$, the candidate set is equally probable whichever component in the candidate set failed, i.e., $\Pr\{\mathcal{C}_i=c_i \mid K_i=j, T_i=t_i, \theta\} = \Pr\{\mathcal{C}_i=c_i \mid K_i=j', T_i=t_i, \theta\}$ for any $j, j' \in c_i$.

  • $C_3$: The masking probabilities are conditionally independent of $\theta$ given $K_i$ and $T_i$, i.e., $\Pr\{\mathcal{C}_i=c_i \mid K_i=j, T_i=t_i\}$ is not a function of $\theta$.

Using these simplifying assumptions, we arrive at a reduced likelihood function that depends only on $\theta$ and the observed data. As long as the candidate sets satisfy conditions $C_1$, $C_2$, and $C_3$, the reduced likelihood function yields the same MLEs as the full likelihood function.

We see that $\Pr\{\mathcal{C}_i = c_i \mid K_i = j, T_i = t_i\} = g(c_i, t_i)$, since the probability cannot depend on $j$ by condition $C_2$ and cannot depend on $\theta$ by condition $C_3$. Thus, the joint density of an (uncensored) observation factors as
$$ f_{T_i,\mathcal{C}_i}(t_i, c_i \mid \theta) = g(c_i, t_i) \sum_{j \in c_i} f_j(t_i|\theta_j) \prod_{\substack{l=1 \\ l \neq j}}^{m} R_l(t_i|\theta_l), $$
and since $g(c_i, t_i)$ does not depend on $\theta$, the likelihood function is proportional to
$$ L(\theta) \propto \prod_{i=1}^{n} \sum_{j \in c_i} f_j(t_i|\theta_j) \prod_{\substack{l=1 \\ l \neq j}}^{m} R_l(t_i|\theta_l). $$
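To make the reduced likelihood concrete, here is a minimal sketch of the contribution of a single uncensored observation for exponential component lifetimes, with dexp and pexp playing the roles of the generic $f_j$ and $R_j$:

# Sketch: reduced log-likelihood contribution of one uncensored observation,
# with failure time t, candidate set cand (integer indices), and exponential
# component failure rates `rates`.
loglik_contrib <- function(t, cand, rates) {
    terms <- vapply(cand, function(j) {
        dexp(t, rate = rates[j]) *
            prod(pexp(t, rate = rates[-j], lower.tail = FALSE))
    }, numeric(1))
    log(sum(terms))
}
loglik_contrib(t = 0.2, cand = c(1, 3), rates = c(0.5, 1, 1.5))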

Note, however, that different ways in which the conditions are met will yield MLEs with different sampling distributions, e.g., more or less efficient estimators.

Bernoulli candidate set model #1

This is a special case of the reduced likelihood model. In this model, we satisfy conditions $C_1$, $C_2$, and $C_3$ by always including the failed component in the candidate set and including each non-failed component independently with a fixed probability $p$, $0 < p < 1$.

In the simplest case, $p = 0.5$, and candidate set $c_i$ has probability given by
\begin{equation}
\Pr\{\mathcal{C}_i=c_i \mid K_i=j, T_i=t_i\} =
\begin{cases}
(1/2)^{m-1} & \text{if } j \in c_i \text{ and } c_i \subseteq \{1,\ldots,m\}, \\
0 & \text{if } j \notin c_i.
\end{cases}
\end{equation}
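As a sketch of how such candidate sets can be drawn for a single failure (the md_bernoulli_cand_c1_c2_c3() pipeline used later operates on whole masked data frames), the failed component is always included and every other component is included independently with probability $p$:

# Sketch: draw one candidate set under Bernoulli candidate set model #1.
rbernoulli_cand <- function(k, m, p) {
    incl <- runif(m) < p   # independent Bernoulli(p) inclusion draws
    incl[k] <- TRUE        # condition C1: the failed component is always included
    which(incl)
}
rbernoulli_cand(k = 2, m = 5, p = 0.5)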

Bernoulli candidate set model #2

Now, we remove condition $C_2$. We still assume conditions $C_1$ and $C_3$, but we allow $\mathcal{C}_i$ to depend on which component failed, i.e., in general $\Pr\{\mathcal{C}_i=c_i \mid K_i=j, T_i=t_i, \theta\} \neq \Pr\{\mathcal{C}_i=c_i \mid K_i=j', T_i=t_i, \theta\}$ for $j, j' \in c_i$.

In this case, we can write the likelihood function as
$$ L(\theta) = \prod_{i=1}^{n} f_{T_i}(t_i \mid \theta) \sum_{j \in c_i} \Pr\{K_i = j \mid T_i = t_i, \theta\}\, g(c_i, t_i, j), $$
where $g(c_i, t_i, j) = \Pr\{\mathcal{C}_i = c_i \mid K_i = j, T_i = t_i\}$ now depends on the index $j$ of the failed component.

Simulation

The most straightforward series system to estimate is the series system with exponentially distributed component lifetimes.

Suppose an exponential series system with mm components is parameterized by the following R code:

theta <- c(1,     # component 1 failure rate
           1.1,   # component 2 failure rate
           0.95,  # component 3 failure rate
           1.15,  # component 4 failure rate
           1.1)   # component 5 failure rate

m <- length(theta)

So, in our study, $\theta = (1, 1.1, 0.95, 1.15, 1.1)'$. The component assigned to index $j$ has an exponentially distributed lifetime with failure rate $\theta_j$, e.g., $\theta_2 = 1.1$ is the failure rate of the component indexed by $2$.

Let’s simulate generating the lifetimes of the $m = 5$ components for this series system:

set.seed(7231) # set seed for reproducibility
n <- 7500
comp_times <- matrix(nrow=n,ncol=m)
for (j in 1:m)
    comp_times[,j] <- rexp(n,theta[j])
comp_times <- md_encode_matrix(comp_times,"t")
print(comp_times,n=4)
#> # A tibble: 7,500 × 5
#>      t1     t2    t3     t4    t5
#>   <dbl>  <dbl> <dbl>  <dbl> <dbl>
#> 1  2.95 1.67   0.777 0.0476 0.380
#> 2  1.10 2.20   2.59  3.27   0.626
#> 3  1.20 0.0956 3.74  1.80   2.13 
#> 4  1.44 0.0460 0.123 0.522  0.102
#> # ℹ 7,496 more rows

Next, we use the function md_series_lifetime_right_censoring to decorate the masked data with a right-censoring time $\tau$ chosen so that $\Pr\{T_i > \tau\} = 0.25$, i.e., about 25% of systems are right-censored. Since $\Pr\{T_i > \tau\} = \exp(-\tau \sum_{j=1}^{m} \lambda_j)$, setting this probability to $q$ gives $\tau = -\log(q) / \sum_{j=1}^{m} \lambda_j$:

q <- 0.25
tau <- rep(-(1/sum(theta))*log(q),n)
data <- comp_times %>% md_series_lifetime_right_censoring(tau)
print(data,n=4,drop_latent=TRUE)
#> Latent variables:  t1 t2 t3 t4 t5 
#> # A tibble: 7,500 × 2
#>        t delta
#>    <dbl> <lgl>
#> 1 0.0476 TRUE 
#> 2 0.262  FALSE
#> 3 0.0956 TRUE 
#> 4 0.0460 TRUE 
#> # ℹ 7,496 more rows
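We can check that the realized censoring proportion is close to the targeted $\Pr\{T_i > \tau\} = 0.25$ (recall that delta is TRUE for an observed failure, so the censored fraction is the mean of its negation):

mean(!data$delta)  # empirical proportion of right-censored systems, near 0.25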

Masked component cause of failure

We simulate candidate sets using the Bernoulli candidate set model, with parameters chosen to satisfy conditions $C_1$, $C_2$, and $C_3$:

p <- .3
data <- data %>% md_bernoulli_cand_c1_c2_c3(p)
print(data[,paste0("q",1:m)],n=4)
#> Latent variables:  q1 q2 q3 q4 q5 t1 t2 t3 t4 t5 
#> # A tibble: 7,500 × 5
#>      q1    q2    q3    q4    q5
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1   0.3   0.3   0.3   1     0.3
#> 2   0     0     0     0     0  
#> 3   0.3   1     0.3   0.3   0.3
#> 4   0.3   1     0.3   0.3   0.3
#> # ℹ 7,496 more rows

Now, to generate candidate sets, we sample from these probabilities:

data <- data %>% md_cand_sampler()
print(md_boolean_matrix_to_charsets(data,drop_set=TRUE),drop_latent=TRUE,n=6)
#> Latent variables:  q1 q2 q3 q4 q5 t1 t2 t3 t4 t5 
#> # A tibble: 7,500 × 3
#>        t delta x        
#>    <dbl> <lgl> <chr>    
#> 1 0.0476 TRUE  {1, 4}   
#> 2 0.262  FALSE {}       
#> 3 0.0956 TRUE  {1, 2, 5}
#> 4 0.0460 TRUE  {1, 2, 4}
#> 5 0.0290 TRUE  {4}      
#> 6 0.160  TRUE  {3, 5}   
#> # ℹ 7,494 more rows

We see that after dropping latent (unobserved) columns, we only have the observed system lifetime (equal to the censoring time $\tau$ when delta is FALSE), the event indicator delta, and the candidate set. (Note that this time we showed the candidate sets in a more friendly way using md_boolean_matrix_to_charsets.)

Likelihood Model

The likelihood model is a statistical model that describes the distribution of the observed data as a function of the parameters of interest.

We construct a likelihood model object, model, for the masked data with exponentially distributed component lifetimes with the following code:
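The constructor call below is only a hedged sketch: exp_series_md_c1_c2_c3() is a hypothetical name standing in for the package's actual constructor. The essential point is that the resulting model object supports the loglik(), score(), and fit() generics used in the remainder of this vignette.

# Hypothetical constructor name, for illustration only; consult the package
# reference for the actual function that builds the exponential series
# masked-data likelihood model.
model <- exp_series_md_c1_c2_c3(m = m)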

Maximum likelihood estimation

The log-likelihood for our masked data model, when we assume conditions $C_1$, $C_2$, and $C_3$, is given by
\begin{equation}
\ell(\lambda) = \sum_{i=1}^{n} (1-\delta_i) \log \biggl(\sum_{j \in c_i} \lambda_j \biggr) - \biggl( \sum_{i=1}^{n} s_i \biggr) \biggl( \sum_{j=1}^{m} \lambda_j \biggr),
\end{equation}
where $s_i$ is the observed (possibly right-censored) system lifetime and, in this formula, $\delta_i = 1$ indicates that the $i$-th observation is right-censored. (Note that the delta column in the simulated data uses the opposite convention: TRUE for an observed failure.)

The set of solutions to the MLE equations must be stationary points, i.e., points at which the score function (of type $\mathbb{R}^m \mapsto \mathbb{R}^m$) is zero. The $p$-th component of the score function is given by
\begin{equation}
\frac{\partial \ell}{\partial \lambda_p} = \sum_{i=1}^{n} \biggl( \sum_{j \in c_i} \lambda_j \biggr)^{-1} 1_{\{p \in c_i \text{ and } \delta_i = 0\}} - \sum_{i=1}^{n} s_i.
\end{equation}

We may find an MLE by solving the maximum likelihood equations, i.e., finding the (stationary) points satisfying
$$ \frac{\partial \ell}{\partial \lambda_j}\bigg|_{\lambda = \hat\lambda} = 0 $$
for $j = 1, \ldots, m$. We approximate a solution to this problem using the iterative Newton-Raphson method.

The Newton-Raphson method needs the observed information matrix $J(\lambda)$, a function of $\lambda$ of type $\mathbb{R}^m \mapsto \mathbb{R}^{m \times m}$. The $(j,k)$-th element of $J(\lambda)$ is the negative of the corresponding second partial derivative of the log-likelihood,
\begin{equation}
J_{jk}(\lambda) = -\frac{\partial^2 \ell}{\partial \lambda_j \partial \lambda_k} = \sum_{i=1}^{n} \biggl( \sum_{l \in c_i} \lambda_l \biggr)^{-2} 1_{\{j,k \in c_i \text{ and } \delta_i = 0\}}.
\end{equation}
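For concreteness, here is a minimal sketch of one Newton-Raphson update computed directly from the score and observed information formulas above. It assumes (hypothetically) that the sampled candidate sets are stored as Boolean columns x1, ..., x5 of the masked data frame, and it uses the data's convention that delta is TRUE for an observed failure:

# Sketch: one Newton-Raphson step for the exponential series MLE.
newton_step <- function(df, lambda) {
    X <- as.matrix(df[df$delta, paste0("x", seq_along(lambda))])  # uncensored rows
    denom <- as.numeric(X %*% lambda)     # sum of rates over each candidate set
    U <- colSums(X / denom) - sum(df$t)   # score vector (second term uses all rows)
    J <- t(X / denom) %*% (X / denom)     # observed information matrix
    lambda + solve(J, U)                  # Newton-Raphson update
}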

Log-likelihood of θ\theta given masked data

The reduced log-likelihood function (the log of the kernel of the likelihood function) is given by
$$ \ell(\theta|\text{data}) = -\left(\sum_{i=1}^{n} t_i\right) \left(\sum_{j=1}^{m} \theta_j\right) + \sum_{i=1}^{n} (1-\delta_i)\log\left(\sum_{j \in c_i} \theta_j\right). $$

We compute the log-likelihood function using generic dispatch:

ll <- loglik(model)

The returned function ll(df, par) evaluates the log-likelihood at parameter par given data df. For example, at the true parameter value:

ll(data, theta)
#> [1] -1485

Note that the implementation uses minimally sufficient statistics, which improves computational efficiency.
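As a sanity check, the reduced log-likelihood can also be evaluated directly from the formula above. The sketch below assumes (hypothetically) that the candidate sets are stored as Boolean columns x1, ..., x5 and uses the data's convention that delta is TRUE for an observed failure; it should reproduce the value returned by ll(data, theta), up to any additive constant the implementation may drop:

loglik_direct <- function(df, par) {
    X <- as.matrix(df[, paste0("x", seq_along(par))])  # candidate-set indicators
    -sum(df$t) * sum(par) +
        sum(log(X[df$delta, , drop = FALSE] %*% par))
}
loglik_direct(data, theta)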

The log-likelihood function captures all of the information about the parameter $\theta$ contained in the masked data sample, provided the candidate sets satisfy conditions $C_1$, $C_2$, and $C_3$.

Suppose we do not know that $\theta = (1, 1.1, 0.95, 1.15, 1.1)'$. With the log-likelihood, we may estimate $\theta$ by $\hat\theta$, the solution of
$$ \hat{\theta} = \operatorname{argmax}_{\theta \in \Omega} \ell(\theta), $$
i.e., the point that maximizes the log-likelihood on the observed sample data. This is known as maximum likelihood estimation (MLE). We typically find the MLE by solving
$$ \nabla \ell\,\big|_{\theta=\hat{\theta}} = 0. $$

A popular choice is gradient ascent, an iterative method based on the update rule
$$ \theta^{(n+1)} = \theta^{(n)} + \eta \nabla \ell(\theta^{(n)}), $$
where $\eta$ is the learning rate.

We can also obtain the score (gradient) function via generic dispatch:

grad <- score(model)

The score at the true parameter should be close to zero (at the MLE, it is exactly zero):

grad(data, theta)
#> [1] -28.488  12.506  -9.824 -24.504 -12.322
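As a quick illustration of the gradient-ascent rule above, a few fixed-step updates using the score function will move a rough starting guess uphill (a sketch only; the learning rate eta is an arbitrary illustrative choice, and the Newton-type solver used below converges far more quickly):

eta <- 1e-4
th <- rep(1, m)                      # rough starting guess
for (iter in 1:100) {
    th <- th + eta * grad(data, th)  # gradient-ascent update
}
ll(data, th)                         # typically higher than ll(data, rep(1, m))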

The likelihood.model framework provides analytical score and Hessian implementations when available, falling back to numerical differentiation otherwise.

In what follows, we use algebraic.mle to help solve the MLE equations and display various properties of the solution.

To solve the MLE equation, we use the generic fit() function, which dispatches to fit.likelihood_model for any object with "likelihood_model" in its class. The fit() function returns a solver that uses optim internally with the BFGS method by default.

# Get the solver from the model using generic dispatch
solver <- fit(model)

# Solve for MLE with initial guess
theta0 <- rep(1, m)
estimate <- solver(data, par = theta0, method = "Nelder-Mead")

The result is an mle object from the algebraic.mle package with rich accessor methods:

# Print summary with confidence intervals
print(estimate)
#> Maximum likelihood estimator of type mle_likelihood_model is normally distributed.
#> The estimates of the parameters are given by:
#> [1] 0.9545 1.1430 0.9440 1.1085 1.0881
#> The standard error is  0.04314 0.04496 0.04247 0.04481 0.04465 .
#> The asymptotic 95% confidence interval of the parameters are given by:
#>          2.5% 97.5%
#> param1 0.8700 1.039
#> param2 1.0549 1.231
#> param3 0.8607 1.027
#> param4 1.0207 1.196
#> param5 1.0005 1.176
#> The MSE of the individual components in a multivariate estimator is:
#>            [,1]       [,2]       [,3]       [,4]       [,5]
#> [1,]  0.0018608 -0.0002202 -0.0002258 -0.0002585 -0.0002654
#> [2,] -0.0002202  0.0020211 -0.0002331 -0.0002648 -0.0002360
#> [3,] -0.0002258 -0.0002331  0.0018034 -0.0002185 -0.0002453
#> [4,] -0.0002585 -0.0002648 -0.0002185  0.0020078 -0.0002316
#> [5,] -0.0002654 -0.0002360 -0.0002453 -0.0002316  0.0019936
#> The log-likelihood is  -1483 .
#> The AIC is  2976 .

We can access specific components of the MLE:

# Point estimate
theta.hat <- estimate$theta.hat
cat("MLE:", round(theta.hat, 4), "\n")
#> MLE: 0.9545 1.143 0.944 1.109 1.088

# Variance-covariance matrix of the MLE (standard errors are sqrt(diag(estimate$sigma)))
cat("Cov:", round(estimate$sigma, 4), "\n")
#> Cov: 0.0019 -2e-04 -2e-04 -3e-04 -3e-04 -2e-04 0.002 -2e-04 -3e-04 -2e-04 -2e-04 -2e-04 0.0018 -2e-04 -2e-04 -3e-04 -3e-04 -2e-04 0.002 -2e-04 -3e-04 -2e-04 -2e-04 -2e-04 0.002

# Log-likelihood at MLE
cat("Log-likelihood:", round(estimate$loglike, 4), "\n")
#> Log-likelihood: -1483

Recall that the true parameter is $\theta = (1, 1.1, 0.95, 1.15, 1.1)'$.

Due to sampling variability, different runs of the experiment will result in different outcomes, i.e., $\hat{\theta}$ has a sampling distribution. We see that $\hat{\theta} \neq \theta$, but it is reasonably close. We may measure this sampling variability using the variance-covariance matrix, bias, mean squared error (MSE), and confidence intervals.

Monte Carlo simulation study

To understand the sampling properties of the MLE, we conduct a Monte Carlo simulation study. We repeatedly:

  1. Generate masked data from the true model
  2. Fit the likelihood model to obtain $\hat{\theta}$
  3. Compute asymptotic confidence intervals

From the collection of estimates, we can compute:

  • Bias: $\text{E}[\hat{\theta}] - \theta$
  • Variance: $\text{Var}[\hat{\theta}]$
  • MSE: $\text{E}[(\hat{\theta} - \theta)^2] = \text{Bias}^2 + \text{Var}$
  • Coverage probability: Proportion of CIs containing the true value
set.seed(7231)

# Simulation parameters
B <- 200       # Number of Monte Carlo replications
alpha <- 0.05  # Significance level for CIs

# Storage for results
estimates <- matrix(NA, nrow = B, ncol = m)
se_estimates <- matrix(NA, nrow = B, ncol = m)
ci_lower <- matrix(NA, nrow = B, ncol = m)
ci_upper <- matrix(NA, nrow = B, ncol = m)
converged <- logical(B)
for (b in 1:B) {
    # Generate component lifetimes
    comp_times_b <- matrix(nrow = n, ncol = m)
    for (j in 1:m) {
        comp_times_b[, j] <- rexp(n, theta[j])
    }
    comp_times_b <- md_encode_matrix(comp_times_b, "t")

    # Apply masking pipeline
    data_b <- comp_times_b %>%
        md_series_lifetime_right_censoring(tau) %>%
        md_bernoulli_cand_c1_c2_c3(p) %>%
        md_cand_sampler()

    # Fit model
    tryCatch({
        result_b <- solver(data_b, par = theta0, method = "Nelder-Mead")
        estimates[b, ] <- result_b$theta.hat
        se_estimates[b, ] <- sqrt(diag(result_b$sigma))

        # Asymptotic Wald CIs
        z <- qnorm(1 - alpha/2)
        ci_lower[b, ] <- result_b$theta.hat - z * sqrt(diag(result_b$sigma))
        ci_upper[b, ] <- result_b$theta.hat + z * sqrt(diag(result_b$sigma))
        converged[b] <- result_b$converged
    }, error = function(e) {
        converged[b] <<- FALSE
    })
}

cat("Convergence rate:", mean(converged, na.rm = TRUE), "\n")
#> Convergence rate: 0.99

Bias, Variance, and MSE

# Compute summary statistics (only for converged runs)
valid <- converged & !is.na(estimates[, 1])
est_valid <- estimates[valid, , drop = FALSE]

bias <- colMeans(est_valid) - theta
variance <- apply(est_valid, 2, var)
mse <- bias^2 + variance
rmse <- sqrt(mse)

# Create results table
results_df <- data.frame(
    Component = 1:m,
    True = theta,
    Mean_Est = colMeans(est_valid),
    Bias = bias,
    Variance = variance,
    MSE = mse,
    RMSE = rmse,
    Rel_Bias_Pct = 100 * bias / theta
)

knitr::kable(results_df, digits = 4,
             caption = "Monte Carlo Results: Bias, Variance, and MSE",
             col.names = c("Component", "True θ", "Mean θ̂", "Bias",
                          "Variance", "MSE", "RMSE", "Rel. Bias %"))
Monte Carlo Results: Bias, Variance, and MSE
Component True θ Mean θ̂ Bias Variance MSE RMSE Rel. Bias %
1 1.00 1.0026 0.0026 0.0020 0.0020 0.0453 0.2627
2 1.10 1.0972 -0.0028 0.0023 0.0023 0.0484 -0.2516
3 0.95 0.9461 -0.0039 0.0017 0.0017 0.0414 -0.4146
4 1.15 1.1516 0.0016 0.0019 0.0019 0.0437 0.1421
5 1.10 1.0967 -0.0033 0.0023 0.0023 0.0483 -0.3024

Confidence Interval Coverage

The asymptotic $100(1-\alpha)\%$ Wald confidence interval is: $\hat{\theta}_j \pm z_{1-\alpha/2} \cdot \text{SE}(\hat{\theta}_j)$

We assess coverage probability: the proportion of intervals that contain the true parameter value.

# Compute coverage for each component
coverage <- numeric(m)
for (j in 1:m) {
    valid_j <- valid & !is.na(ci_lower[, j]) & !is.na(ci_upper[, j])
    covered <- (ci_lower[valid_j, j] <= theta[j]) & (theta[j] <= ci_upper[valid_j, j])
    coverage[j] <- mean(covered)
}

# Mean CI width
mean_width <- colMeans(ci_upper[valid, ] - ci_lower[valid, ], na.rm = TRUE)

coverage_df <- data.frame(
    Component = 1:m,
    True = theta,
    Coverage = coverage,
    Nominal = 1 - alpha,
    Mean_Width = mean_width
)

knitr::kable(coverage_df, digits = 4,
             caption = paste0("Coverage Probability of ", 100*(1-alpha), "% Confidence Intervals"),
             col.names = c("Component", "True θ", "Coverage", "Nominal", "Mean Width"))
Coverage Probability of 95% Confidence Intervals
Component True θ Coverage Nominal Mean Width
1 1.00 0.9495 0.95 0.1721
2 1.10 0.9343 0.95 0.1770
3 0.95 0.9596 0.95 0.1694
4 1.15 0.9545 0.95 0.1794
5 1.10 0.9444 0.95 0.1769

Sampling Distribution Visualization

# Plot sampling distributions
par(mfrow = c(1, min(m, 5)), mar = c(4, 4, 2, 1))
for (j in 1:min(m, 5)) {
    hist(est_valid[, j], breaks = 20, probability = TRUE,
         main = paste0("Component ", j),
         xlab = expression(hat(theta)[j]),
         col = "lightblue", border = "white")
    abline(v = theta[j], col = "red", lwd = 2, lty = 2)
    abline(v = mean(est_valid[, j]), col = "blue", lwd = 2)
    legend("topright", legend = c("True", "Mean Est."),
           col = c("red", "blue"), lty = c(2, 1), lwd = 2, cex = 0.7)
}

Summary

The Monte Carlo simulation demonstrates that the MLE for the exponential series system with masked data:

  1. Is approximately unbiased - the bias is small relative to the true parameter values
  2. Has well-characterized variance - consistent with asymptotic theory
  3. Achieves nominal coverage - the $100(1-\alpha)\%$ confidence intervals contain the true value approximately $100(1-\alpha)\%$ of the time

These properties validate the use of this likelihood model for inference about component failure rates from masked series system data.

Sensitivity Analysis

The MLE properties depend on several factors: sample size, masking probability, and right-censoring rate. In this section, we study how these factors affect estimation accuracy.

Effect of Masking Probability

The masking probability $p$ controls how much information about the failed component is obscured. When $p = 0$, only the failed component is in the candidate set (perfect information). As $p$ increases, more non-failed components are included, making estimation more difficult.

set.seed(7231)

# Use smaller sample for sensitivity study
n_sens <- 500
B_sens <- 100

# Masking probabilities to test
p_values <- seq(0, 0.5, by = 0.1)

# Fixed right-censoring (25% censored)
q_sens <- 0.25
tau_sens <- rep(-(1/sum(theta))*log(q_sens), n_sens)

# Storage
mask_results <- list()
for (p_idx in seq_along(p_values)) {
    p_curr <- p_values[p_idx]
    est_p <- matrix(NA, nrow = B_sens, ncol = m)

    for (b in 1:B_sens) {
        # Generate data
        comp_b <- matrix(nrow = n_sens, ncol = m)
        for (j in 1:m) comp_b[, j] <- rexp(n_sens, theta[j])
        comp_b <- md_encode_matrix(comp_b, "t")

        data_b <- comp_b %>%
            md_series_lifetime_right_censoring(tau_sens) %>%
            md_bernoulli_cand_c1_c2_c3(p_curr) %>%
            md_cand_sampler()

        tryCatch({
            fit_b <- solver(data_b, par = theta0, method = "Nelder-Mead")
            if (fit_b$converged) est_p[b, ] <- fit_b$theta.hat
        }, error = function(e) NULL)
    }

    # Compute statistics
    valid_p <- !is.na(est_p[, 1])
    mask_results[[p_idx]] <- list(
        p = p_curr,
        bias = colMeans(est_p[valid_p, , drop = FALSE]) - theta,
        variance = apply(est_p[valid_p, , drop = FALSE], 2, var),
        mse = (colMeans(est_p[valid_p, , drop = FALSE]) - theta)^2 +
              apply(est_p[valid_p, , drop = FALSE], 2, var)
    )
}
# Extract bias and MSE for plotting
bias_mat <- sapply(mask_results, function(x) mean(abs(x$bias)))
mse_mat <- sapply(mask_results, function(x) mean(x$mse))

par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

# Mean absolute bias vs masking probability
plot(p_values, bias_mat, type = "b", pch = 19, col = "blue",
     xlab = "Masking Probability (p)",
     ylab = "Mean Absolute Bias",
     main = "Bias vs Masking Probability")
grid()

# Mean MSE vs masking probability
plot(p_values, mse_mat, type = "b", pch = 19, col = "red",
     xlab = "Masking Probability (p)",
     ylab = "Mean MSE",
     main = "MSE vs Masking Probability")
grid()

mask_df <- data.frame(
    p = p_values,
    Mean_Abs_Bias = bias_mat,
    Mean_MSE = mse_mat,
    Mean_RMSE = sqrt(mse_mat)
)

knitr::kable(mask_df, digits = 4,
             caption = "Effect of Masking Probability on Estimation Accuracy",
             col.names = c("Masking Prob.", "Mean |Bias|", "Mean MSE", "Mean RMSE"))
Effect of Masking Probability on Estimation Accuracy
Masking Prob. Mean |Bias| Mean MSE Mean RMSE
0.0 0.0133 0.0143 0.1197
0.1 0.0136 0.0186 0.1365
0.2 0.0065 0.0212 0.1456
0.3 0.0194 0.0303 0.1741
0.4 0.0285 0.0389 0.1973
0.5 0.0240 0.0590 0.2430

As expected, increasing the masking probability generally increases both bias and MSE. With $p = 0$ (no masking of non-failed components), we have the most information and achieve the best estimation accuracy. As $p$ increases toward 0.5, the candidate sets become less informative, leading to less precise estimates.

Effect of Right-Censoring Rate

Right-censoring reduces the number of exact failure times observed. When a system is right-censored, we know the system survived beyond the censoring time, but we don’t observe the actual failure time or failed component.

set.seed(7231)

# Censoring quantiles: q is the probability of surviving past tau,
# i.e., the (approximate) proportion of right-censored systems.
# q = 0.9 means 90% survive past tau (heavy censoring)
# q = 0.1 means 10% survive past tau (light censoring)
q_values <- c(0.9, 0.7, 0.5, 0.3, 0.1)  # Survival probabilities

# Fixed masking probability
p_cens <- 0.2

# Storage
cens_results <- list()
for (q_idx in seq_along(q_values)) {
    q_curr <- q_values[q_idx]
    tau_curr <- rep(-(1/sum(theta))*log(q_curr), n_sens)
    est_q <- matrix(NA, nrow = B_sens, ncol = m)

    for (b in 1:B_sens) {
        # Generate data
        comp_b <- matrix(nrow = n_sens, ncol = m)
        for (j in 1:m) comp_b[, j] <- rexp(n_sens, theta[j])
        comp_b <- md_encode_matrix(comp_b, "t")

        data_b <- comp_b %>%
            md_series_lifetime_right_censoring(tau_curr) %>%
            md_bernoulli_cand_c1_c2_c3(p_cens) %>%
            md_cand_sampler()

        tryCatch({
            fit_b <- solver(data_b, par = theta0, method = "Nelder-Mead")
            if (fit_b$converged) est_q[b, ] <- fit_b$theta.hat
        }, error = function(e) NULL)
    }

    # Compute statistics
    valid_q <- !is.na(est_q[, 1])
    cens_rate <- 1 - mean(data_b$delta)  # Actual censoring rate (from the last replication)
    cens_results[[q_idx]] <- list(
        q = q_curr,
        cens_rate = cens_rate,
        bias = colMeans(est_q[valid_q, , drop = FALSE]) - theta,
        variance = apply(est_q[valid_q, , drop = FALSE], 2, var),
        mse = (colMeans(est_q[valid_q, , drop = FALSE]) - theta)^2 +
              apply(est_q[valid_q, , drop = FALSE], 2, var)
    )
}
# Extract statistics
cens_rates <- sapply(cens_results, function(x) x$cens_rate)
bias_cens <- sapply(cens_results, function(x) mean(abs(x$bias)))
mse_cens <- sapply(cens_results, function(x) mean(x$mse))

par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

# Mean absolute bias vs censoring rate
plot(cens_rates * 100, bias_cens, type = "b", pch = 19, col = "blue",
     xlab = "Censoring Rate (%)",
     ylab = "Mean Absolute Bias",
     main = "Bias vs Censoring Rate")
grid()

# Mean MSE vs censoring rate
plot(cens_rates * 100, mse_cens, type = "b", pch = 19, col = "red",
     xlab = "Censoring Rate (%)",
     ylab = "Mean MSE",
     main = "MSE vs Censoring Rate")
grid()

cens_df <- data.frame(
    Cens_Rate_Pct = round(cens_rates * 100, 1),
    Mean_Abs_Bias = bias_cens,
    Mean_MSE = mse_cens,
    Mean_RMSE = sqrt(mse_cens)
)

knitr::kable(cens_df, digits = 4,
             caption = "Effect of Right-Censoring Rate on Estimation Accuracy",
             col.names = c("Censoring %", "Mean |Bias|", "Mean MSE", "Mean RMSE"))
Effect of Right-Censoring Rate on Estimation Accuracy
Censoring % Mean |Bias| Mean MSE Mean RMSE
91.0 0.0556 0.1745 0.4177
68.2 0.0173 0.0568 0.2383
52.2 0.0118 0.0340 0.1843
28.8 0.0183 0.0245 0.1566
7.8 0.0085 0.0171 0.1308

Higher censoring rates (more systems surviving past the observation period) lead to increased bias and MSE. This is expected because:

  1. Censored observations contribute less information to the likelihood
  2. For censored systems, we only know the system survived beyond $\tau$, not which component would have failed
  3. No candidate set information is available for censored observations

Practical Recommendations

Based on these sensitivity analyses:

  1. Sample size: Larger samples (n > 500) generally provide stable, well-behaved estimates
  2. Masking probability: Keep $p$ as low as practically possible. Even moderate masking ($p \leq 0.3$) produces acceptable results with sufficient sample size
  3. Censoring: Heavy censoring (> 50%) significantly degrades estimation quality. Design experiments with adequate follow-up time when possible
  4. Combined effects: When both masking and censoring are high, consider increasing sample size to compensate for information loss