algebraic.dist: Treating Distributions as First-Class Algebraic Objects in R
An R package for treating probability distributions as first-class algebraic objects that compose through standard operations.
Maximum likelihood estimation for series systems with masked failure data — from master's thesis to R package ecosystem
When a system fails, you want to know which component caused the failure. But in practice you often can’t tell. The failure cause is masked. And some systems haven’t failed yet when testing ends—they’re right-censored. How do you estimate component reliability from this incomplete information?
That’s the problem at the center of my master’s thesis and the ecosystem of R packages, papers, and blog posts in this series.
A series system fails when any of its components fails. Given system-level failure times where:
…estimate the lifetime distribution of each component.
The mathematical framework combines survival analysis, mixture models, the EM algorithm, and bootstrap methods into a coherent likelihood-based approach. The key insight: even incomplete observations carry statistical information, and maximum likelihood estimation can extract it.
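To make the "incomplete observations carry information" point concrete, here is a minimal base-R sketch, not the thesis code. It assumes exponential components, a fixed censoring time, and a simple masking model in which the true cause is always in the candidate set and every other component is added with probability 0.5. All names (`rates`, `tau`, `cand`, `loglik`) are illustrative. Each system contributes its survival term, and each observed failure adds the log of the summed hazards over its candidate set.

```r
set.seed(1)
rates <- c(0.5, 1.0, 1.5)   # assumed true component failure rates
n <- 500
m <- length(rates)
tau <- 1.0                  # assumed fixed right-censoring time

# Component lifetimes; the system fails at the earliest component failure.
comp <- matrix(rexp(n * m, rate = rep(rates, each = n)), nrow = n)
sys_time <- apply(comp, 1, min)
cause <- apply(comp, 1, which.min)
obs_time <- pmin(sys_time, tau)
failed <- sys_time <= tau

# Candidate sets as a 0/1 matrix: the true cause is always included,
# every other component with probability 0.5 (assumed masking model).
cand <- matrix(as.numeric(runif(n * m) < 0.5), nrow = n)
cand[cbind(seq_len(n), cause)] <- 1

# Log-likelihood in log-rate coordinates so the optimizer stays unconstrained.
loglik <- function(log_rates) {
  lam <- exp(log_rates)
  hazard_in_set <- as.vector(cand %*% lam)   # summed hazard over each candidate set
  sum(-obs_time * sum(lam)) + sum(log(hazard_in_set[failed]))
}

fit <- optim(rep(0, m), function(p) -loglik(p), method = "BFGS")
mle <- exp(fit$par)
print(round(mle, 3))
```

Censored systems contribute only through the survival term, and masked failures only through the summed hazard of their candidate set, yet the component rates remain estimable because the candidate sets vary from system to system.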
2019: I started exploring reliability analysis with censored data, drawn to the elegance of how maximum likelihood handles incomplete observations. This became the seed of a research direction.
2020: I enrolled in a second master’s degree—Mathematics and Statistics at SIUE—because I needed deeper foundations. I could use statistical methods, but I couldn’t derive them. I wanted to understand why MLE works, not just that it works.
2021: Bootstrap methods became central to the work. When analytical confidence intervals are intractable (as they are for Weibull series systems with masked data), the bootstrap provides a computational path to uncertainty quantification.
2022: The Weibull distribution took on personal significance. I was studying the mathematics of failure and survival for my thesis when I was diagnosed with cancer. The survival curves I was analyzing became uncomfortably concrete.
2023: Thesis defended: "Reliability Estimation in Series Systems: Maximum Likelihood Techniques for Right-Censored and Masked Failure Data." Three years of work, completed while navigating treatment.
2024–2026: The thesis spawned an ecosystem of R packages, follow-up papers on model selection and relaxed masking conditions, and closed-form results for exponential special cases.
EM Algorithm for Masked Series Systems: An expectation-maximization approach that treats the masked failure cause as a latent variable, iteratively estimating component parameters from system-level data.
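The latent-variable idea can be sketched in a few lines of base R for the exponential special case. This is an illustrative sketch under assumed conditions (exponential components, fixed censoring time, true cause always in the candidate set with masking that does not depend on the rates), not the algorithm as implemented in the packages. The E-step assigns each observed failure to the components in its candidate set in proportion to their current hazards. The M-step divides the expected failure counts by the total exposure time.

```r
set.seed(2)
rates <- c(0.5, 1.0, 1.5)   # assumed true component failure rates
n <- 1000
m <- length(rates)
tau <- 1                    # assumed fixed right-censoring time

comp <- matrix(rexp(n * m, rate = rep(rates, each = n)), nrow = n)
sys_time <- apply(comp, 1, min)
t_obs <- pmin(sys_time, tau)
failed <- sys_time <= tau

# 0/1 candidate-set matrix; the true cause is always a member.
cand <- matrix(as.numeric(runif(n * m) < 0.5), nrow = n)
cand[cbind(seq_len(n), apply(comp, 1, which.min))] <- 1

lam <- rep(1, m)            # starting values
for (iter in 1:500) {
  # E-step: posterior probability that component j caused failure i,
  # proportional to its hazard and restricted to the candidate set.
  w <- sweep(cand, 2, lam, "*")
  w <- w / rowSums(w)
  w[!failed, ] <- 0         # censored systems have no failure cause
  # M-step: expected failures per component over total exposure time.
  lam_new <- colSums(w) / sum(t_obs)
  converged <- max(abs(lam_new - lam)) < 1e-10
  lam <- lam_new
  if (converged) break
}
em_rates <- lam
print(round(em_rates, 3))
```

In this exponential setting every M-step has a closed form, so the loop needs no inner optimizer. A Weibull version would need a numerical M-step for the shape parameters.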
Bootstrap Confidence Intervals: When the Fisher information matrix is analytically intractable (Weibull case), the parametric bootstrap provides interval estimates by refitting the model to data sets simulated from the fitted parameters.
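The resampling loop itself is short. The sketch below is deliberately simplified and rests on stated assumptions: a single exponential lifetime with right-censoring at an assumed time `tau`, where the MLE is closed-form (failures divided by total exposure), and a plain percentile interval rather than a BCa interval. The helper names `draw` and `mle` are hypothetical. For a Weibull series system, `mle` would be replaced by a numerical fit.

```r
set.seed(3)
n <- 200
tau <- 2            # assumed right-censoring time
true_rate <- 0.8    # assumed true failure rate

# Simulate one right-censored sample at a given rate.
draw <- function(rate) {
  x <- rexp(n, rate)
  list(time = pmin(x, tau), failed = x <= tau)
}

# Closed-form MLE for a censored exponential: failures over exposure.
mle <- function(d) sum(d$failed) / sum(d$time)

obs <- draw(true_rate)
rate_hat <- mle(obs)

# Parametric bootstrap: simulate from the fitted model, refit each time.
boot <- replicate(2000, mle(draw(rate_hat)))
ci <- quantile(boot, c(0.025, 0.975))
print(rate_hat)
print(ci)
```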
Closed-Form Fisher Information: For exponential series systems with masked data, exact analytical expressions for the Fisher information matrix—enabling direct confidence interval computation without simulation.
Relaxed Masking Conditions: The mdrelax extension drops the standard assumption that masking probabilities are uniform, handling real-world scenarios where some components are more easily identified as failure causes.
Model Selection: AIC/BIC-based model selection for choosing between exponential, Weibull, and other lifetime distributions in the series system context.
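As a rough illustration of the comparison, the base-R sketch below fits an exponential and a Weibull model to the same sample and compares AIC and BIC. It is simplified to complete, unmasked lifetimes from a single distribution (an assumption made to keep the example short). The series-system version would swap in the masked-data likelihood but keep the same penalty arithmetic.

```r
set.seed(4)
n <- 300
x <- rweibull(n, shape = 2, scale = 1)   # assumed data: Weibull with shape 2

# Exponential model: closed-form MLE, one parameter.
rate_hat <- 1 / mean(x)
ll_exp <- sum(dexp(x, rate = rate_hat, log = TRUE))

# Weibull model: numerical MLE in log coordinates, two parameters.
nll_wei <- function(p) {
  -sum(dweibull(x, shape = exp(p[1]), scale = exp(p[2]), log = TRUE))
}
fit <- optim(c(0, 0), nll_wei)
ll_wei <- -fit$value

# Lower is better for both criteria.
aic <- c(exponential = 2 * 1 - 2 * ll_exp, weibull = 2 * 2 - 2 * ll_wei)
bic <- c(exponential = log(n) * 1 - 2 * ll_exp, weibull = log(n) * 2 - 2 * ll_wei)
best <- names(which.min(aic))
print(rbind(aic, bic))
print(best)
```

Because the exponential is the Weibull with shape fixed at 1, the Weibull log-likelihood can never be lower at the optimum. The criteria ask whether the improvement justifies the extra parameter.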
The research produced a layered ecosystem of R packages, each handling a distinct concern:
hypothesize — Consistent API for statistical hypothesis testing
↑
algebraic.mle — Core MLE algebra: compose, transform, combine likelihoods
↑
algebraic.dist — Distributions as algebraic objects with composable operations
↑ ↑
dfr.dist — Specify distributions via hazard (failure rate) functions
↑ ↑
dfr.dist.series — Series system distributions from hazard specifications
↑
likelihood.model — Composable statistical inference framework
↑ ↑
compositional.mle — Composable MLE pipelines (SICP-inspired)
↑
likelihood.model.series.md — Series system models with masked/censored data
↑ ↑
wei.series.md.c1.c2.c3 │ dfr.lik.series.md — Arbitrary hazard series models
↑ ↑
nabla — Automatic differentiation (exact FIM, skewness, etc.)
The design principle: each package solves one problem well. algebraic.dist treats distributions as algebraic objects. algebraic.mle provides the algebra for combining likelihood contributions. likelihood.model composes these into inference pipelines. The series system packages sit at the top, using all the layers below.
Core narrative: The origin of the research, the decision to pursue a second master’s, the thesis itself, and reflections on completing it during cancer treatment.
Technical deep-dives: Closed-form Fisher information for exponential systems, Weibull model selection, numerical MLE optimization, and relaxed masking conditions.
Package announcements: Each R package in the ecosystem—its design, API, and role in the larger architecture.
Completing a mathematics master’s while fighting cancer wasn’t heroic—it was stubborn. But the constraint clarified priorities: finish the work, make it reproducible, package it so others can use it. Every R package, every paper, every blog post in this series exists because the math was worth getting right, and the tools were worth sharing.
The Weibull distribution doesn’t care about your prognosis. It just models time-to-failure, honestly. There’s something clarifying about that.
Maximum likelihood estimation for series system reliability with Weibull components under right-censoring and masked failure data
An R package for treating probability distributions as first-class algebraic objects that compose through standard operations.
An R package where solvers are first-class functions that compose through chaining, racing, and restarts.
Announcing the likelihood.model.series.md R package for maximum likelihood estimation in series systems with masked component failures—built on composable likelihood contributions, validated through extensive simulation, and heading to CRAN.
Announcing observation functors in likelihood.model.series.md — composable functions that separate the data-generating process from the observation mechanism, enabling mixed-censoring simulation and verified Monte Carlo studies.
Overview of my master's project on maximum likelihood estimation for series systems with right-censored and masked failure data.
My R package for hypothesis testing, hypothesize, is now available on CRAN.
Extending masked failure data analysis when traditional C1-C2-C3 conditions are violated.
When can reliability engineers safely use simpler models? This paper provides sharp boundaries through likelihood ratio tests on Weibull series systems.
Deriving closed-form maximum likelihood estimators and Fisher information for exponential series systems with masked failure data.
Maximum likelihood estimation of component reliability from masked failure data in series systems, with BCa bootstrap confidence intervals validated through extensive simulation studies.
Post-mortem on completing a mathematics master's degree over three years while navigating cancer treatment—what worked, what didn't, and lessons learned.
Numerical approaches to solving maximum likelihood estimation problems.
A generic R framework for composable likelihood models as first-class objects, designed for seamless maximum likelihood estimation.
The mathematics of Weibull distributions for modeling time-to-failure in both reliability engineering and cancer survival analysis.
An R package providing a unified API for hypothesis testing, so every test returns the same consistent interface.
Bootstrap resampling methods as the intersection of rigorous statistical theory and brute-force computation for approximating sampling distributions.
An R package that lets you specify hazard functions directly instead of choosing from a catalog of named distributions.
An R package treating MLEs as first-class algebraic objects with composable statistical properties.
Building R packages for statistical inference, leveraging R's domain-specific strengths for computational statistics and literate programming.
Why I chose to pursue a second master's in Mathematics and Statistics after my CS degree—seeking deeper foundations for statistical theory and inference.
Introduction to reliability analysis with censored data, where observations are incomplete but statistically informative.