Last week I announced likelihood.model.series.md, the R package that estimates component reliability from masked series system failures. That post covered the three likelihood models and the path to CRAN.
This post is about what happened next: the package now supports four observation types — exact, right-censored, left-censored, and interval-censored — via composable observation functors. Along the way, I wrote four vignettes, removed the md.tools dependency, and developed a verification methodology for keeping prose honest about simulation results.
The Problem: Hardcoded Right-Censoring
The original rdata() method for each model had a simple observation mechanism built in: if a system failed before the study end time \(\tau\), you saw the exact failure time; if it survived past \(\tau\), you got a right-censored observation. Two outcomes, one knob.
# The old way: right-censoring was the only option
gen <- rdata(model)
df <- gen(theta = c(1.0, 1.1, 0.95), n = 500, tau = 3, p = 0.3)
# Every observation is either omega = "exact" or omega = "right"
This is fine for textbook problems, but real monitoring produces other kinds of incomplete data:
Left-censoring: You inspect a system once. If it’s already failed, you know it died before the inspection — but not when. Single-inspection monitoring of remote equipment works this way.
Interval-censoring: You inspect periodically. A system that was running at inspection \(k\) but failed by inspection \(k+1\) gives you a bracket \([t_k, t_{k+1}]\) containing the failure time. This is the norm in industrial maintenance, medical follow-up studies, and field reliability testing.
Mixed monitoring: Different units in the same study may be observed under different schemes. Some get continuous monitoring (exact/right), some get periodic inspections (interval), some get a single check (left).
Hardcoding the observation mechanism into the data-generating process conflates two orthogonal concerns: how components fail (the DGP, governed by the lifetime distribution) and what we see (the observation mechanism, governed by the monitoring protocol). These should be separate.
Observation Functors
The solution is a small algebra of observation functors — closures that transform true failure times into observed data. Each functor has the same signature:
# Signature: function(t_true) -> list(t, omega, t_upper)
Given the true failure time t_true, the functor returns what the observer actually sees: the recorded time t, the observation type omega (one of "exact", "right", "left", "interval"), and an upper bound t_upper for interval observations.
Four primitives:
# Exact observation if t < tau, right-censored otherwise
observe_right_censor(tau = 100)
# Single inspection: left-censored if failed, right-censored if running
observe_left_censor(tau = 50)
# Periodic inspection: interval-censored in [floor, ceiling]
observe_periodic(delta = 10, tau = 100)
# Randomly select a scheme per observation
observe_mixture(
observe_right_censor(tau = 100),
observe_left_censor(tau = 50),
observe_periodic(delta = 10, tau = 100),
weights = c(0.5, 0.2, 0.3)
)
The key design property: functors compose. observe_mixture takes functors as arguments and returns a new functor with the same signature. You can nest mixtures, combine any primitives, or write your own functor — anything with signature function(t_true) -> list(t, omega, t_upper) works.
This draws from the same compositional philosophy behind compositional.mle and the broader ecosystem: define small, well-typed building blocks; compose them into complex behavior. The SICP maxim applies — the means of combination should produce objects of the same type as the things being combined.
Using Functors with rdata()
The rdata() method on every model now accepts an observe parameter:
model <- exp_series_md_c1_c2_c3()
gen <- rdata(model)
# Right-censoring (default, backwards-compatible)
df_right <- gen(theta, n = 500, tau = 5, p = 0.3)
# Periodic inspection every 0.5 time units
df_interval <- gen(theta, n = 500, p = 0.3,
observe = observe_periodic(delta = 0.5, tau = 5))
# Mixed monitoring environment
df_mixed <- gen(theta, n = 500, p = 0.3,
observe = observe_mixture(
observe_right_censor(tau = 5),
observe_left_censor(tau = 3),
observe_periodic(delta = 0.5, tau = 5),
weights = c(0.6, 0.2, 0.2)
))
When observe is NULL (the default), rdata() uses observe_right_censor(tau) for full backwards compatibility. No existing code breaks.
What the Likelihood Sees
Each observation type contributes differently to the log-likelihood. Under the C1-C2-C3 masking conditions:
| Type | Log-likelihood contribution |
|---|---|
| Exact | \(\log\bigl(\sum_{j \in C_i} h_j(t_i)\bigr) - H(t_i)\) |
| Right | \(-H(t_i)\) |
| Left | \(\log\bigl(\sum_{j \in C_i} \int_0^{t_i} h_j(u)R(u)\,du\bigr) - \log(1 - R(t_i))\) |
| Interval | \(\log\bigl(\sum_{j \in C_i} \int_a^b h_j(u)R(u)\,du\bigr) - \log(R(a) - R(b))\) |
Here \(h_j\) is component \(j\)’s hazard, \(H = \sum_j H_j\) is the system cumulative hazard, and \(R = \exp(-H)\) is the system reliability function. Exact observations carry the most information (you see the hazard directly). Right-censored observations say only “survived this long.” Left and interval observations fall between these extremes.
The three models handle these integrals differently:
- Exponential: all four types have closed-form expressions for loglik, score, and Hessian
- Homogeneous Weibull: closed-form loglik for all types via \(w_c\) weights (shared shape simplifies the integral); numerical Hessian via
numDeriv::jacobian - Heterogeneous Weibull: closed-form for exact/right;
stats::integratefor left/interval; numerical Hessian
Four Vignettes
The observation functor system needed validation across all three models and multiple censoring configurations. This produced four vignettes — each a complete Monte Carlo study with precomputed results.
Exponential Series
The exponential series vignette is the tutorial entry point. It walks through the full pipeline: model construction, data generation, MLE fitting, and Monte Carlo simulation under all four observation types.
The exponential case is special because everything is analytical. The score and Hessian have closed-form expressions for all observation types, which means the MLE converges fast and asymptotic standard errors are exact (no numerical differentiation). This makes it the ideal test bed for validating the functor system — any discrepancy between analytical and numerical results points to a bug in the functor, not in a numerical approximation.
The vignette also covers masking probability sensitivity: as the masking probability \(p\) decreases (more masking), RMSE increases and confidence interval coverage degrades. This is the fundamental tradeoff in masked data — information loss from incomplete cause identification.
Homogeneous Weibull
The homogeneous Weibull vignette studies the shared-shape model across three hazard regimes:
- DFR (decreasing failure rate, \(k < 1\)): Early failures dominate. Common in burn-in periods and infant mortality.
- CFR (constant failure rate, \(k = 1\)): Reduces to exponential. Memoryless.
- IFR (increasing failure rate, \(k > 1\)): Wear-out dominates. The typical aging process.
The mathematical insight: when all Weibull components share shape \(k\), the system lifetime is also Weibull with shape \(k\) and system scale \(\beta_s = (\sum_j \beta_j^{-k})^{-1/k}\). This means the left- and interval-censored likelihood contributions have closed-form expressions using weight factors \(w_c = \beta_c^{-k} / \sum_j \beta_j^{-k}\) — no numerical integration needed. The shared shape is a strong assumption, but when it holds, estimation is remarkably efficient.
Heterogeneous Weibull
The heterogeneous Weibull vignette drops the shared-shape assumption. Each component gets its own shape \(k_j\) and scale \(\beta_j\), doubling the parameter count to \(2m\).
With mixed shapes, the system lifetime is not Weibull — it’s a competing-risks minimum of Weibull variates with different shape parameters. The left- and interval-censored contributions require numerical integration via stats::integrate. This is slower but handles the realistic case where components age at different rates (e.g., a capacitor with IFR alongside a solder joint with DFR).
Censoring Comparison
The censoring comparison vignette is the cross-cutting analysis. It runs the exponential model under five monitoring configurations:
| Config | Description |
|---|---|
| A | Exact + right (continuous monitoring) |
| B | Left + right (single inspection) |
| C | Interval + right (periodic, wide intervals) |
| D | Interval + right (periodic, narrow intervals) |
| E | Mixed (all observation types) |
The results confirm the expected information ordering: exact > narrow-interval > wide-interval > left \(\approx\) right. Exact observations carry the most information because you see the failure time precisely. Narrow intervals are close to exact. Left-censored data is nearly as uninformative as right-censored — knowing only “it failed before \(\tau\)” is almost as vague as “it survived past \(\tau\).”
The practical takeaway: if you’re designing a monitoring protocol and can choose between a single end-of-study inspection (left-censoring) and periodic inspections (interval-censoring), periodic inspections are strictly better for parameter estimation. The frequency of inspection trades off against cost — narrower intervals give better estimates but require more inspections.
Verified Prose
Writing prose about simulation results is easy to get wrong. You run a Monte Carlo study, get an RMSE of 0.042, then write “RMSE ranges from 0.035 to 0.045.” Later someone changes the simulation seed or sample size, and the prose is silently wrong.
I developed a verification methodology for this package: every numerical claim in vignette commentary is cross-checked against the precomputed .rds data.
The workflow:
- Extract values:
readRDS("vignettes/precomputed_results.rds") - Compute metrics from the raw simulation output
- Round consistently (3 decimal places for RMSE, 1 for percentages)
- Only then write the prose
This caught several errors during the vignette audit. An RMSE range was off by 0.001. A coverage percentage rounded the wrong direction. A score comparison paragraph claimed “near zero” when the actual values were order \(10^{-6}\) — technically true, but not the same as “precisely zero” which is what holds at the analytical MLE.
The .rds files have different structures across vignettes (the exponential data is a flat list; the homogeneous Weibull data is nested by scenario; the censoring comparison has pre-aggregated summaries), so the verification code isn’t one-size-fits-all. But the principle is: no prose about numbers without first reading the numbers.
md.tools Removal
The md.tools package provided four utility functions for encoding and decoding masked data matrices:
md_decode_matrix: Extract prefixed columns (e.g.,x1, x2, x3) as a matrixmd_encode_matrix: Convert matrix to prefixed columns in a data framemd_mark_latent: Tag columns as latent (for display purposes)md_boolean_matrix_to_charsets: Convert Boolean columns to set notation ({1, 3})
These are small functions — the largest is about 10 lines. Carrying a GitHub remote dependency for 40 lines of base R was unnecessary friction for installation and CRAN submission.
I internalized all four into R/md_compat.R, keeping the same function names and signatures. The package now has three GitHub remotes (down from four):
- algebraic.mle — MLE algebra
- likelihood.model — inference framework
- algebraic.dist — distribution utilities
Zero non-base R dependencies for the core matrix encoding/decoding operations. The generics package for fit is the only remaining external import beyond the ecosystem packages.
Testing
The observation functor system is validated by 798 tests at 99.45% line coverage. The test suite mirrors the source file structure:
test-observe.R: Unit tests for each primitive functor (boundary conditions, output format)test-censoring.R: Integration tests for the full pipeline (generate data → fit model → check estimates)test-exp_series_md_c1_c2_c3.R,test-wei_series_*.R: Model-specific tests for all four observation types
The censoring integration tests are particularly important: they verify that the loglik, score, and Hessian functions handle mixed observation types correctly — a dataset with some exact, some right, some left, and some interval observations should produce a valid log-likelihood that MLE can optimize.
What’s Next
CRAN submission. The package is R CMD check-clean with no notes or warnings. Once likelihood.model is accepted, this package follows immediately.
Arbitrary hazard functions. The current models assume exponential or Weibull components. The dfr.lik.series.md extension will support components specified by arbitrary hazard functions via dfr.dist — define a component by its hazard rate and get likelihood contributions automatically.
Relaxed masking. The C2 condition (uniform masking within candidate sets) is often unrealistic. mdrelax extends the model to handle non-uniform masking probabilities, but its integration with the observation functor system is not yet complete.
observe_mixture as a model for heterogeneous monitoring. The current mixture functor randomly assigns observation schemes. A natural extension is to let the monitoring scheme depend on covariates — different sites, different equipment classes, different maintenance budgets.
Resources:
Discussion