In the likelihood model, we instead assume that $C_i \mid T_i = t, K_i = k$ satisfies certain conditions. For tractability and efficiency, we discard the conditioning on $T_i$ and keep only the conditioning on $K_i$, sampling from the empirical distribution of $C_i \mid K_i = k$.
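The conditional resampling described above can be sketched in R as follows. This is a minimal sketch under stated assumptions, not the package's implementation: the name `sample_candidates_sketch` is hypothetical, candidate sets are assumed to be stored as logical columns named with the `candset` prefix followed by a component number, and sampling is assumed to be with replacement.

```r
# Hypothetical sketch, not the package's implementation.
# Draws n candidate sets from the empirical distribution of C_i | K_i = k,
# or from the empirical distribution of C_i when k is NA.
sample_candidates_sketch <- function(df, n, k = NA, cause = "k", candset = "x") {
  # Assumed layout: logical columns named <candset>1, <candset>2, ...
  cols <- grep(paste0("^", candset, "[0-9]+$"), names(df), value = TRUE)
  # Condition on the component cause only (T_i is ignored)
  pool <- if (is.na(k)) df else df[!is.na(df[[cause]]) & df[[cause]] == k, , drop = FALSE]
  if (nrow(pool) == 0) stop("no rows match the requested component cause")
  # Resampling rows uniformly with replacement draws each observed
  # candidate set with probability equal to its empirical frequency
  idx <- sample.int(nrow(pool), size = n, replace = TRUE)
  out <- pool[idx, cols, drop = FALSE]
  rownames(out) <- NULL
  out
}

# Usage on a toy masked data frame with three components
df <- data.frame(
  k  = c(1, 1, 2, 3, 1),
  x1 = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  x2 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  x3 = c(FALSE, FALSE, FALSE, TRUE, TRUE)
)
set.seed(1)
sample_candidates_sketch(df, n = 4, k = 1)
```

Resampling whole rows, rather than sampling each column independently, keeps each candidate set intact, so the draws follow the joint empirical distribution of the candidate-set columns.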

md_sample_candidates(df, n, k = NA, cause = "k", candset = "x")

Arguments

df

masked data frame

n

number of candidate sets to sample

k

component cause of failure, defaults to NA (unknown)

cause

column name for the component cause of failure, defaults to "k"

candset

column prefix for the candidate-set columns, defaults to "x", e.g., x1, x2, x3.

Value

a data frame of n candidate sets sampled from the empirical distribution of $C_i \mid K_i = k$ (or of $C_i$ if k is NA)
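To make "empirical distribution" concrete, the hypothetical helper `empirical_candset_pmf` below (not part of the package) tabulates the relative frequency of each observed candidate set among the rows with $K_i = k$. It assumes the same column layout as the arguments above (logical columns x1, x2, ... whose suffix is the component number) and labels each set as, e.g., "{1,3}".

```r
# Hypothetical helper, not part of the package: the empirical pmf of
# candidate sets among rows with K_i = k (all rows when k is NA).
empirical_candset_pmf <- function(df, k = NA, cause = "k", candset = "x") {
  cols <- grep(paste0("^", candset, "[0-9]+$"), names(df), value = TRUE)
  comp <- as.integer(sub(paste0("^", candset), "", cols))  # component numbers
  pool <- if (is.na(k)) df else df[!is.na(df[[cause]]) & df[[cause]] == k, , drop = FALSE]
  m <- as.matrix(pool[, cols, drop = FALSE])
  # Encode each row's candidate set as a label such as "{1,3}"
  sets <- apply(m, 1, function(r) paste0("{", paste(comp[r], collapse = ","), "}"))
  tab <- table(sets)
  setNames(as.numeric(tab) / sum(tab), names(tab))
}

df <- data.frame(
  k  = c(1, 1, 2, 3, 1),
  x1 = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  x2 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  x3 = c(FALSE, FALSE, FALSE, TRUE, TRUE)
)
# Rows with k = 1 carry {1,2}, {1} and {1,3}, so each has frequency 1/3
empirical_candset_pmf(df, k = 1)
```

A sampler that conditions only on $K_i$ draws from exactly this pmf; with k = NA, it pools all five rows instead.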

Details

If k is not known, we can instead sample from the empirical distribution of $C_i$ over all rows. This may still be reasonable, since by Condition 2,

$\Pr\{C_i = c_i \mid T_i = t_i, K_i = j\} = \Pr\{C_i = c_i \mid T_i = t_i, K_i = j'\}$ for all $j, j' \in c_i$,

so varying the component cause among the components in the candidate set does not change the probability of that candidate set.
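One rough way to gauge whether pooling over the component cause is reasonable is to compare the estimated $\Pr\{C_i = c \mid K_i = j\}$ across $j \in c$, since Condition 2 says these should agree. The hypothetical diagnostic `condition2_table` below is not part of the package. Like the sampler, it ignores $T_i$, so it checks only a marginal version of Condition 2, and it assumes the same x1, x2, ... column layout.

```r
# Hypothetical diagnostic, not part of the package. For each observed
# candidate set c and each j in c, estimate Pr(C_i = c | K_i = j).
# It ignores T_i, so it only checks a marginal version of Condition 2.
condition2_table <- function(df, cause = "k", candset = "x") {
  cols <- grep(paste0("^", candset, "[0-9]+$"), names(df), value = TRUE)
  comp <- as.integer(sub(paste0("^", candset), "", cols))
  m <- as.matrix(df[, cols, drop = FALSE])
  key <- apply(m, 1, function(r) paste(comp[r], collapse = ","))
  out <- list()
  for (s in unique(key)) {
    if (s == "") next  # skip empty candidate sets
    for (j in as.integer(strsplit(s, ",")[[1]])) {
      n_j  <- sum(df[[cause]] == j, na.rm = TRUE)             # rows with K_i = j
      n_cj <- sum(key == s & df[[cause]] == j, na.rm = TRUE)  # of those, rows with C_i = c
      out[[length(out) + 1]] <- data.frame(
        candset = s, j = j, n_j = n_j,
        pr = if (n_j > 0) n_cj / n_j else NA_real_
      )
    }
  }
  do.call(rbind, out)
}

df <- data.frame(
  k  = c(1, 1, 2, 3, 1),
  x1 = c(TRUE, TRUE, FALSE, FALSE, TRUE),
  x2 = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  x3 = c(FALSE, FALSE, FALSE, TRUE, TRUE)
)
condition2_table(df)
```

In this five-row toy data the estimates for the set {1,2} differ (1/3 for j = 1, 0 for j = 2). With so few rows, that says more about sample size than about the masking mechanism; in practice such estimates are only informative when each $K_i = j$ group is reasonably large.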

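If the masking model is more informative, that is, if the probability of a candidate set depends on which of its components caused the failure, then dropping the conditioning on $K_i$ loses information and may bias the results in complex ways.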