Statistical Reliability

Maximum likelihood estimation for series systems with masked failure data — from master's thesis to R package ecosystem

21 parts

When a system fails, you want to know which component caused the failure. But in practice you often can’t tell. The failure cause is masked. And some systems haven’t failed yet when testing ends—they’re right-censored. How do you estimate component reliability from this incomplete information?

That’s the problem at the center of my master’s thesis and the ecosystem of R packages, papers, and blog posts in this series.

The Problem

A series system fails as soon as any one of its components fails. The data are system-level failure times in which:

  • Some observations are right-censored (the system was still running when we stopped watching)
  • Some failures are masked (we know the system failed, but not which component caused it)

The task is to estimate the lifetime distribution of each component from these data.

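The mathematical framework combines survival analysis, mixture models, the EM algorithm, and bootstrap methods into a coherent likelihood-based approach. The key insight: even incomplete observations carry statistical information. A right-censored observation still tells you the system survived past the censoring time, and a masked failure still tells you when the system failed. Maximum likelihood estimation uses both.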
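As a sketch of what that likelihood looks like, assume conditions that are standard for this problem but not spelled out above: component lifetimes are independent, each masked failure comes with a candidate set that contains the true cause, and the masking mechanism carries no information about the parameters. The notation is chosen here for illustration: t_i is the observed system time, δ_i is 1 for an observed failure and 0 for a right-censored observation, C_i is the candidate set for observation i, and R_j and h_j are the survival and hazard functions of component j.

```latex
L(\theta) \;\propto\; \prod_{i=1}^{n}
  \Bigl[\, \prod_{j=1}^{m} R_j(t_i;\theta_j) \Bigr]
  \Bigl[\, \sum_{j \in C_i} h_j(t_i;\theta_j) \Bigr]^{\delta_i}
```

A censored observation contributes only the survival factor. A masked failure also contributes the summed hazards of its candidate components, which is where the mixture structure comes from.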

The Journey

2019: I started exploring reliability analysis with censored data, drawn to the elegance of how maximum likelihood handles incomplete observations. This became the seed of a research direction.

2020: I enrolled in a second master’s degree—Mathematics and Statistics at SIUE—because I needed deeper foundations. I could use statistical methods, but I couldn’t derive them. I wanted to understand why MLE works, not just that it works.

2021: Bootstrap methods became central to the work. When analytical confidence intervals are intractable (as they are for Weibull series systems with masked data), the bootstrap provides a computational path to uncertainty quantification.

2022: The Weibull distribution took on personal significance. I was studying the mathematics of failure and survival for my thesis when I was diagnosed with cancer. The survival curves I was analyzing became uncomfortably concrete.

2023: Thesis defended: "Reliability Estimation in Series Systems: Maximum Likelihood Techniques for Right-Censored and Masked Failure Data." Three years of work, completed while navigating treatment.

2024–2026: The thesis spawned an ecosystem of R packages, follow-up papers on model selection and relaxed masking conditions, and closed-form results for exponential special cases.

Key Contributions

EM Algorithm for Masked Series Systems: An expectation-maximization approach that treats the masked failure cause as a latent variable, iteratively estimating component parameters from system-level data.
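To make the latent-variable idea concrete, here is a minimal sketch for the simplest case: exponential components, with each masked failure reported as a candidate set that contains the true cause. The function name, argument layout, and the non-informative masking assumption are choices made for this illustration, not the API of any package in the series. The E-step spreads each failure across its candidates in proportion to their current rates, and the M-step divides the expected failure counts by the total time on test.

```python
def em_exponential_series(times, failed, candidates, m, n_iter=1000, tol=1e-12):
    """EM sketch for exponential component rates in a series system.

    times      -- observed system times (failure or censoring time)
    failed     -- True for an observed failure, False for right-censored
    candidates -- one set of component indices per observation
                  (ignored for censored observations)
    m          -- number of components (hypothetical argument for this sketch)
    """
    total_time = sum(times)
    n_failed = sum(1 for f in failed if f)
    # Start from equal rates that match the overall system failure rate.
    rates = [n_failed / (m * total_time)] * m

    for _ in range(n_iter):
        # E-step: expected number of failures attributed to each component.
        expected = [0.0] * m
        for f, cand in zip(failed, candidates):
            if not f:
                continue
            denom = sum(rates[j] for j in cand)
            for j in cand:
                expected[j] += rates[j] / denom

        # M-step: complete-data MLE with expected counts in place of counts.
        new_rates = [e / total_time for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_rates, rates)) < tol
        rates = new_rates
        if converged:
            break
    return rates


# Usage: four systems, two components, the last system right-censored.
times = [1.0, 2.0, 0.5, 3.0]
failed = [True, True, True, False]
candidates = [{0, 1}, {1}, {0}, set()]
print(em_exponential_series(times, failed, candidates, m=2))
```

The exponential case is used only because its M-step has a closed form. For Weibull components the M-step would need a numerical optimizer.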

Bootstrap Confidence Intervals: When the Fisher information matrix is analytically intractable (Weibull case), parametric bootstrap provides reliable interval estimates.
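The mechanics of a parametric bootstrap can be sketched in a few lines: fit the model, simulate new data sets from the fitted model, refit each one, and read the interval off the spread of the refitted estimates. To keep the example short it swaps in a deliberately simplified model, exponential components with observed failure causes and a fixed censoring time, so the fit has a closed form. The function names and the percentile interval are assumptions for this illustration. The Weibull masked-data case would replace `fit` with a numerical MLE and `simulate` with a generator that also draws candidate sets.

```python
import math
import random


def simulate(rates, n, tau, rng):
    """Draw n series-system observations, right-censored at time tau."""
    data = []
    for _ in range(n):
        # A zero rate means the component never fails.
        lifetimes = [rng.expovariate(r) if r > 0 else math.inf for r in rates]
        t = min(lifetimes)
        if t > tau:
            data.append((tau, None))               # censored: no cause
        else:
            data.append((t, lifetimes.index(t)))   # failure time and cause
    return data


def fit(data, m):
    """Closed-form exponential MLE when every failure cause is observed."""
    total_time = sum(t for t, _ in data)
    counts = [0] * m
    for _, cause in data:
        if cause is not None:
            counts[cause] += 1
    return [c / total_time for c in counts]


def parametric_bootstrap_ci(data, m, tau, n_boot=500, level=0.95, seed=0):
    """Percentile intervals for each component rate."""
    rng = random.Random(seed)
    mle = fit(data, m)
    # Resample from the fitted model, not from the observed data.
    reps = [fit(simulate(mle, len(data), tau, rng), m) for _ in range(n_boot)]
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    intervals = []
    for j in range(m):
        draws = sorted(rep[j] for rep in reps)
        intervals.append((draws[lo_idx], draws[hi_idx]))
    return mle, intervals


# Usage with synthetic data generated from illustrative rates.
data = simulate([0.5, 1.0], n=200, tau=2.0, rng=random.Random(1))
mle, intervals = parametric_bootstrap_ci(data, m=2, tau=2.0)
print(mle, intervals)
```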

Closed-Form Fisher Information: For exponential series systems with masked data, exact analytical expressions for the Fisher information matrix—enabling direct confidence interval computation without simulation.
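To see why the exponential case is tractable, here is a sketch of the log-likelihood and its second derivatives. It assumes independent components with rates λ_j, candidate sets C_i that contain the true cause, masking that is non-informative about the rates, and the same illustrative notation as before: t_i for the observed time and δ_i for the failure indicator. This is the observed information, not the closed-form expected Fisher information itself.

```latex
\ell(\lambda) = \sum_{i=1}^{n} \Bigl[ -t_i \sum_{j=1}^{m} \lambda_j
  + \delta_i \log \sum_{l \in C_i} \lambda_l \Bigr],
\qquad
-\frac{\partial^2 \ell}{\partial \lambda_j \, \partial \lambda_k}
  = \sum_{i=1}^{n} \delta_i \,
    \frac{\mathbf{1}[j \in C_i] \, \mathbf{1}[k \in C_i]}
         {\bigl( \sum_{l \in C_i} \lambda_l \bigr)^{2}}
```

The second derivatives depend on the data only through the candidate sets and failure indicators, not the failure times. The expected Fisher information averages this expression over the candidate-set and censoring distributions, a step not shown here.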

Relaxed Masking Conditions: The mdrelax extension drops the standard assumption that masking probabilities are uniform, handling real-world scenarios where some components are more easily identified as failure causes.

Model Selection: AIC/BIC-based model selection for choosing between exponential, Weibull, and other lifetime distributions in the series system context.
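The criteria themselves are simple to compute once each candidate model has been fitted. The sketch below uses the textbook definitions, AIC = 2k − 2ℓ and BIC = k·ln(n) − 2ℓ, where ℓ is the maximized log-likelihood, k the number of free parameters, and n the sample size. The function names are hypothetical, and the log-likelihood values in the usage example are placeholders, not results from the thesis or papers.

```python
import math


def aic(loglik, k):
    """Akaike information criterion; lower is better."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion; lower is better."""
    return k * math.log(n) - 2 * loglik


def select_model(fits, n):
    """fits maps a model name to (maximized log-likelihood, parameter count)."""
    table = {
        name: {"aic": aic(ll, k), "bic": bic(ll, k, n)}
        for name, (ll, k) in fits.items()
    }
    best_aic = min(table, key=lambda name: table[name]["aic"])
    best_bic = min(table, key=lambda name: table[name]["bic"])
    return table, best_aic, best_bic


# Usage: a three-component system, so the exponential model has 3 rates and
# the Weibull model has 3 shapes plus 3 scales. Placeholder log-likelihoods.
fits = {"exponential": (-412.7, 3), "weibull": (-409.9, 6)}
table, best_aic, best_bic = select_model(fits, n=100)
print(table, best_aic, best_bic)
```

BIC penalizes the extra Weibull shape parameters more heavily than AIC once n exceeds about 7, so the two criteria can disagree when the improvement in fit is small.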

The R Package Ecosystem

The research produced a layered ecosystem of R packages, each handling a distinct concern:

hypothesize              — Consistent API for statistical hypothesis testing
algebraic.mle            — Core MLE algebra: compose, transform, combine likelihoods
algebraic.dist           — Distributions as algebraic objects with composable operations
    ↑                            ↑
dfr.dist                 — Specify distributions via hazard (failure rate) functions
    ↑                            ↑
dfr.dist.series          — Series system distributions from hazard specifications
likelihood.model         — Composable statistical inference framework
    ↑                            ↑
compositional.mle        — Composable MLE pipelines (SICP-inspired)
likelihood.model.series.md  — Series system models with masked/censored data
    ↑                            ↑
wei.series.md.c1.c2.c3      │  dfr.lik.series.md — Arbitrary hazard series models
    ↑                            ↑
nabla                    — Automatic differentiation (exact FIM, skewness, etc.)

The design principle: each package solves one problem well. algebraic.dist treats distributions as algebraic objects. algebraic.mle provides the algebra for combining likelihood contributions. likelihood.model composes these into inference pipelines. The series system packages sit at the top, using all the layers below.
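The layering works partly because series systems compose cleanly at the hazard level: with independent components, the system hazard is the sum of the component hazards, and the survival function follows from the cumulative hazard. The sketch below illustrates that idea in Python. It is not the API of dfr.dist or dfr.dist.series, and all function names are hypothetical.

```python
import math


def weibull_hazard(shape, scale):
    """Return the Weibull hazard function h(t) for the given parameters."""
    def h(t):
        return (shape / scale) * (t / scale) ** (shape - 1)
    return h


def series_hazard(hazards):
    """Hazard of a series system with independent components."""
    return lambda t: sum(h(t) for h in hazards)


def survival(hazard, t, steps=10_000):
    """R(t) = exp(-integral of h from 0 to t), using the midpoint rule."""
    dt = t / steps
    cumulative = sum(hazard((i + 0.5) * dt) for i in range(steps)) * dt
    return math.exp(-cumulative)


# Usage: a wear-out component (shape 2) in series with a constant-hazard one.
system = series_hazard([weibull_hazard(2.0, 1.0), weibull_hazard(1.0, 2.0)])
print(survival(system, 1.0))
```

Specifying components by hazard means a new lifetime model only needs a hazard function. The series-level survival and likelihood pieces are then derived from it rather than written by hand.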

What This Series Contains

Core narrative: The origin of the research, the decision to pursue a second master’s, the thesis itself, and reflections on completing it during cancer treatment.

Technical deep-dives: Closed-form Fisher information for exponential systems, Weibull model selection, numerical MLE optimization, and relaxed masking conditions.

Package announcements: Each R package in the ecosystem—its design, API, and role in the larger architecture.

The Constraint That Clarifies

Completing a mathematics master’s while fighting cancer wasn’t heroic—it was stubborn. But the constraint clarified priorities: finish the work, make it reproducible, package it so others can use it. Every R package, every paper, every blog post in this series exists because the math was worth getting right, and the tools were worth sharing.

The Weibull distribution doesn’t care about your prognosis. It just models time-to-failure, honestly. There’s something clarifying about that.
