
Introduction

This vignette demonstrates femtograd’s compositional philosophy: rather than providing a pre-built survival analysis function, we build one from primitives. This shows how the log-likelihood functions, parameter transforms, and inference tools compose to solve real statistical problems.

We cover:

  1. Fitting a Weibull survival model
  2. Handling right-censored data
  3. Building a custom censored log-likelihood
  4. Adding covariate effects (accelerated failure time model)
  5. Full inference: standard errors, CIs, hypothesis tests

Parametric Survival: The Weibull Model

The Weibull distribution is a workhorse of survival analysis. Its hazard function can model increasing, decreasing, or constant risk:

  • k > 1: increasing hazard (aging, wear-out)
  • k = 1: constant hazard (exponential, memoryless)
  • k < 1: decreasing hazard (infant mortality, burn-in)
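
As a quick numerical illustration of these three regimes (using base R only, not femtograd), the hazard h(t) = f(t)/S(t) can be evaluated with dweibull() and pweibull():

weibull_hazard <- function(t, shape, scale) {
  dweibull(t, shape, scale) / pweibull(t, shape, scale, lower.tail = FALSE)
}
t_grid <- c(1, 5, 10, 20)

# Decreasing in t for k = 0.5, flat for k = 1, increasing for k = 1.5
weibull_hazard(t_grid, shape = 0.5, scale = 10)
weibull_hazard(t_grid, shape = 1,   scale = 10)
weibull_hazard(t_grid, shape = 1.5, scale = 10)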

Fitting Without Censoring

For complete (uncensored) data, we can use the built-in loglik_weibull():

# Simulate survival times from Weibull(shape=1.5, scale=10)
set.seed(42)
n <- 100
true_shape <- 1.5
true_scale <- 10
times <- rweibull(n, shape = true_shape, scale = true_scale)

# Fit Weibull model (both parameters positive -> log-transform)
weibull_fit <- fit(
  function(log_k, log_lambda) {
    k <- exp(log_k)
    lambda <- exp(log_lambda)
    loglik_weibull(k, lambda, times)
  },
  params = c(log_k = 0, log_lambda = 1)
)

# Estimates on original scale
estimates <- exp(coef(weibull_fit))
names(estimates) <- c("shape", "scale")
estimates
#>    shape    scale 
#> 1.276740 9.838516

The estimates should be close to the true values (shape = 1.5, scale = 10).

summary(weibull_fit)
#> femtofit: Maximum Likelihood Estimation
#> 
#> Call:
#> fit(loglik = function(log_k, log_lambda) {
#>     k <- exp(log_k)
#>     lambda <- exp(log_lambda)
#>     loglik_weibull(k, lambda, times)
#> }, params = c(log_k = 0, log_lambda = 1))
#> 
#> Coefficients:
#>            Estimate Std. Error z value  Pr(>|z|)    
#> log_k      0.244310   0.075491  3.2363  0.001211 ** 
#> log_lambda 2.286305   0.082755 27.6276 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> ---
#> Log-likelihood: -316.1    AIC: 636.2
#> Converged in 13 iterations
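
As an optional cross-check against standard software (this assumes the survival package is installed; it is not part of femtograd), the same uncensored model can be fit with survreg(), whose intercept corresponds to log(scale) and whose reported scale is 1/shape:

library(survival)
sr <- survreg(Surv(times, rep(1, n)) ~ 1, dist = "weibull")
exp(coef(sr))  # should be close to the scale estimate above
1 / sr$scale   # should be close to the shape estimate above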

Handling Right Censoring

In real survival studies, some subjects haven’t experienced the event by the end of observation (right censoring). The log-likelihood for censored data is:

\ell(\theta) = \sum_{i: \delta_i = 1} \log f(t_i; \theta) + \sum_{i: \delta_i = 0} \log S(t_i; \theta)

where δ_i = 1 for observed events and δ_i = 0 for censored observations; f is the density and S is the survival function.

For the Weibull with shape k and scale λ:

  • Density: f(t) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1} \exp\left[-\left(\frac{t}{\lambda}\right)^k\right]
  • Survival: S(t) = \exp\left[-\left(\frac{t}{\lambda}\right)^k\right]
  • Log-survival: \log S(t) = -\left(\frac{t}{\lambda}\right)^k
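
Substituting these into the censored log-likelihood and collecting terms gives the expression implemented below (the event terms appear only for uncensored observations, while the −(t/λ)^k term appears for every observation):

\ell(k, \lambda) = \sum_{i: \delta_i = 1} \left[ \log k - k \log \lambda + (k - 1) \log t_i \right] - \sum_{i=1}^{n} \left(\frac{t_i}{\lambda}\right)^k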

Let’s build a custom censored Weibull log-likelihood:

weibull_censored_loglik <- function(log_k, log_lambda, times, status) {
  k <- exp(log_k)
  lambda <- exp(log_lambda)

  # For observed events: log f(t) = log(k) - k*log(lambda) + (k-1)*log(t)
  #                                 - (t/lambda)^k
  # For censored: log S(t) = -(t/lambda)^k
  #
  # Combined:
  #   event_term: d*log(k) - d*k*log(lambda) + (k-1)*sum_log_t_events
  #   surv_term:  -sum( exp(k * (log(t_i) - log(lambda))) )  [all obs]

  n_events <- sum(status)
  sum_log_t_events <- sum(log(times[status == 1]))

  # Event contribution (only uncensored observations)
  event_term <- log(k) * n_events -
    k * log(lambda) * n_events +
    (k - 1) * sum_log_t_events

  # Survival contribution for ALL observations:
  # (t/lambda)^k = exp(k * (log(t) - log(lambda)))
  # We use this form because log(t) is plain data (numeric),
  # and k, log(lambda) are parameters tracked by AD.
  log_times <- log(times)
  surv_term <- k * 0  # creates a zero of the correct type (value or dual)
  for (i in seq_along(times)) {
    surv_term <- surv_term - exp(k * (log_times[i] - log(lambda)))
  }

  event_term + surv_term
}

Note the use of exp(k * (log(t) - log(lambda))) instead of (t/lambda)^k. This avoids creating temporary val() objects for data, which is important for compatibility with femtograd’s Hessian computation (which uses dual numbers internally).
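
Because the function uses only exp(), log(), and arithmetic, it also works on plain numeric inputs, so we can sanity-check it: with no censoring it should reduce to the sum of Weibull log-densities.

check_t <- rweibull(20, shape = 1.5, scale = 10)
all.equal(
  weibull_censored_loglik(log(1.5), log(10), check_t, rep(1, 20)),
  sum(dweibull(check_t, shape = 1.5, scale = 10, log = TRUE))
)  # should be TRUE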

Simulating Censored Data

set.seed(123)
n <- 150

# True survival times from Weibull
true_times <- rweibull(n, shape = 1.5, scale = 10)

# Censoring times (uniform)
censor_times <- runif(n, min = 5, max = 20)

# Observed data
obs_times <- pmin(true_times, censor_times)
status <- as.integer(true_times <= censor_times)

cat("Events:", sum(status), "/ Censored:", sum(1 - status), "\n")
#> Events: 112 / Censored: 38

Fitting the Censored Model

censored_fit <- fit(
  function(log_k, log_lambda) {
    weibull_censored_loglik(log_k, log_lambda, obs_times, status)
  },
  params = c(log_k = 0, log_lambda = 1)
)

# Estimates on original scale
est <- exp(coef(censored_fit))
names(est) <- c("shape", "scale")
est
#>    shape    scale 
#> 1.600060 9.607953
# CIs on original scale
exp(confint(censored_fit))
#>                2.5%     97.5%
#> log_k      1.371326  1.866947
#> log_lambda 8.557610 10.787214
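
Exponentiating the endpoints is legitimate because exp is monotone: the interval above is just the log-scale Wald interval, exp(estimate ± 1.96 × SE), mapped to the original scale.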

Testing Shape = 1 (Is Exponential Adequate?)

A natural question: is the simpler exponential model sufficient? This corresponds to testing H_0: k = 1 (i.e., log k = 0).

# Wald test
wt <- wald_test(censored_fit, parm = "log_k", null_value = 0)
wt
#> Wald Test
#> ---------
#> Parameter: log_k 
#> Estimate: 0.47 
#> Std. Error: 0.07871 
#> Null value: 0 
#> 
#> z-score: 5.972 
#> Wald statistic: 35.67 
#> P-value: 2.343e-09 
#> 
#> Significance: ***
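
As a quick arithmetic check on the output above, the z-score is (estimate − null value) / SE and the Wald statistic is its square:

(0.47 - 0) / 0.07871      # z-score, about 5.97
((0.47 - 0) / 0.07871)^2  # Wald statistic, about 35.7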

We can also use a likelihood ratio test by fitting the nested exponential model:

# Exponential model (shape fixed at 1, i.e., log_k = 0)
exp_censored_fit <- fit(
  function(log_lambda) {
    weibull_censored_loglik(0, log_lambda, obs_times, status)
  },
  params = c(log_lambda = 1)
)

# LRT
anova(exp_censored_fit, censored_fit)
#> Analysis of Deviance Table (Likelihood Ratio Tests) 
#> 
#>                      Df logLik Df diff Chisq Pr(>Chisq)    
#> exp_censored_fit    1.0 -368.3                             
#> censored_fit        2.0 -353.7       1 29.07   6.98e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
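
The likelihood ratio statistic is twice the improvement in log-likelihood from freeing the shape parameter, on one degree of freedom:

2 * (-353.7 - (-368.3))  # about 29.2, matching the Chisq column up to rounding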

Adding Covariates: Accelerated Failure Time Model

The accelerated failure time (AFT) model links covariates to the scale parameter:

\log(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip}

This is straightforward to build from our primitives:

set.seed(42)
n <- 200

# Covariate: treatment group (0 = control, 1 = treatment)
treatment <- rbinom(n, 1, 0.5)

# True model: treatment increases scale (longer survival)
true_scale <- exp(2 + 0.5 * treatment)  # beta0=2, beta1=0.5
true_shape <- 1.5

# Simulate survival times
true_times <- rweibull(n, shape = true_shape, scale = true_scale)
censor_times <- runif(n, min = 5, max = 30)
obs_times2 <- pmin(true_times, censor_times)
status2 <- as.integer(true_times <= censor_times)

cat("Events:", sum(status2), "/ Censored:", sum(1 - status2), "\n")
#> Events: 152 / Censored: 48
# AFT Weibull with treatment covariate
aft_loglik <- function(log_k, beta0, beta1, times, status, x) {
  k <- exp(log_k)
  log_times <- log(times)

  # Per-subject log-scale: log(lambda_i) = beta0 + beta1 * x_i
  n_obs <- length(times)
  ll <- k * 0  # zero of the correct type

  for (i in seq_len(n_obs)) {
    log_lambda_i <- beta0 + beta1 * x[i]

    # log(t/lambda) = log(t) - log(lambda)
    log_ratio <- log_times[i] - log_lambda_i

    # Survival contribution: -(t/lambda)^k = -exp(k * log(t/lambda))
    ll <- ll - exp(k * log_ratio)

    if (status[i] == 1) {
      # Event contribution: log(k) - log(lambda) + (k-1)*log(t/lambda)
      ll <- ll + log(k) - log_lambda_i + (k - 1) * log_ratio
    }
  }
  ll
}

aft_fit <- fit(
  function(log_k, beta0, beta1) {
    aft_loglik(log_k, beta0, beta1, obs_times2, status2, treatment)
  },
  params = c(log_k = 0, beta0 = 1, beta1 = 0)
)

summary(aft_fit)
#> femtofit: Maximum Likelihood Estimation
#> 
#> Call:
#> fit(loglik = function(log_k, beta0, beta1) {
#>     aft_loglik(log_k, beta0, beta1, obs_times2, status2, treatment)
#> }, params = c(log_k = 0, beta0 = 1, beta1 = 0))
#> 
#> Coefficients:
#>       Estimate Std. Error z value  Pr(>|z|)    
#> log_k 0.419418   0.066589  6.2986 3.004e-10 ***
#> beta0 2.062984   0.073822 27.9453 < 2.2e-16 ***
#> beta1 0.502710   0.107967  4.6562 3.222e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> ---
#> Log-likelihood: -485.4    AIC: 976.8
#> Converged in 19 iterations

Interpreting the Treatment Effect

# Shape parameter
cat("Shape (k):", exp(coef(aft_fit)["log_k"]), "\n")
#> Shape (k): 1.521076

# Treatment effect: exp(beta1) is the scale ratio
cat("Scale ratio (treatment/control):", exp(coef(aft_fit)["beta1"]), "\n")
#> Scale ratio (treatment/control): 1.653195

# 95% CI for treatment effect
beta1_ci <- confint(aft_fit, parm = "beta1")
cat("95% CI for beta1:", beta1_ci, "\n")
#> 95% CI for beta1: 0.291099 0.7143205
cat("95% CI for scale ratio:", exp(beta1_ci), "\n")
#> 95% CI for scale ratio: 1.337897 2.042798
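
To make the effect concrete, we can convert the fitted parameters into predicted median survival times (a small sketch using the closed-form Weibull median, λ · log(2)^(1/k)):

k_hat <- exp(unname(coef(aft_fit)["log_k"]))
scale_control <- exp(unname(coef(aft_fit)["beta0"]))
scale_treat   <- exp(unname(coef(aft_fit)["beta0"] + coef(aft_fit)["beta1"]))

# Weibull median survival: lambda * log(2)^(1/k)
c(control   = scale_control * log(2)^(1 / k_hat),
  treatment = scale_treat * log(2)^(1 / k_hat))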

Is Treatment Significant?

# Wald test
wald_test(aft_fit, parm = "beta1", null_value = 0)
#> Wald Test
#> ---------
#> Parameter: beta1 
#> Estimate: 0.5027 
#> Std. Error: 0.108 
#> Null value: 0 
#> 
#> z-score: 4.656 
#> Wald statistic: 21.68 
#> P-value: 3.222e-06 
#> 
#> Significance: ***

Summary

This vignette demonstrated femtograd’s compositional approach:

  1. Built-in primitives (loglik_weibull, exp, log) handle the uncensored case directly
  2. Custom log-likelihoods are straightforward to write—just compose operations on value objects
  3. Parameter transforms (exp(log_k)) handle constraints naturally
  4. Inference tools (summary, confint, wald_test, anova) work identically on custom models
  5. Covariate models (AFT) are built by linking parameters to covariates inside the likelihood

The same approach extends to other survival models (log-normal, log-logistic, piecewise exponential) and other statistical domains—define a log-likelihood from femtograd’s differentiable operations, and the full inference pipeline follows automatically.
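
For example, a censored log-logistic model follows the same pattern as weibull_censored_loglik above, using S(t) = 1 / (1 + (t/λ)^k). The sketch below assumes, as in the Weibull code, that exp(), log(), and mixed arithmetic on femtograd values are the only operations needed:

loglogistic_censored_loglik <- function(log_k, log_lambda, times, status) {
  k <- exp(log_k)
  log_times <- log(times)
  ll <- k * 0  # zero of the correct type (value or dual)

  for (i in seq_along(times)) {
    log_ratio <- log_times[i] - log_lambda
    z <- exp(k * log_ratio)  # (t / lambda)^k
    ll <- ll - log(1 + z)    # log S(t): contributed by every observation

    if (status[i] == 1) {
      # Event: add the remaining terms so the total contribution is
      # log f(t) = log(k) - log(lambda) + (k-1)*log(t/lambda) - 2*log(1 + z)
      ll <- ll + log(k) - log_lambda + (k - 1) * log_ratio - log(1 + z)
    }
  }
  ll
}

It can then be passed to fit() with params = c(log_k = 0, log_lambda = 1), and summary(), confint(), wald_test(), and anova() apply unchanged.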