
Overview

After fitting a model with fit(), femtograd provides a full suite of inference tools:

  • Standard errors from the observed Fisher information
  • Wald confidence intervals and hypothesis tests
  • Likelihood ratio tests for nested model comparison
  • Profile likelihood confidence intervals
  • Bootstrap inference for non-asymptotic settings
  • Model diagnostics to verify inference reliability

This vignette demonstrates each approach using a running example.

Running Example

We’ll fit a gamma distribution to waiting time data:

set.seed(42)
waiting <- rgamma(150, shape = 3, rate = 0.5)

# Fit gamma model (both parameters positive, so use log-transform)
gamma_fit <- fit(
  function(log_shape, log_rate) {
    shape <- exp(log_shape)
    rate <- exp(log_rate)
    loglik_gamma(shape, rate, waiting)
  },
  params = c(log_shape = 0, log_rate = 0)
)

summary(gamma_fit)
#> femtofit: Maximum Likelihood Estimation
#> 
#> Call:
#> fit(loglik = function(log_shape, log_rate) {
#>     shape <- exp(log_shape)
#>     rate <- exp(log_rate)
#>     loglik_gamma(shape, rate, waiting)
#> }, params = c(log_shape = 0, log_rate = 0))
#> 
#> Coefficients:
#>           Estimate Std. Error z value  Pr(>|z|)    
#> log_shape  1.07908    0.10956  9.8496 < 2.2e-16 ***
#> log_rate  -0.69490    0.11945 -5.8174 5.976e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> ---
#> Log-likelihood: -379.5    AIC: 763
#> Converged in 11 iterations

The estimates are on the log scale. To interpret them on the original scale, exponentiate (the results recover values close to the true shape = 3 and rate = 0.5 used to simulate the data):

exp(coef(gamma_fit))
#> log_shape  log_rate 
#> 2.9419787 0.4991246

Wald Inference

Standard Errors

Standard errors come from the inverse of the observed Fisher information matrix (the negative Hessian of the log-likelihood at the MLE):

\text{SE}(\hat{\theta}_j) = \sqrt{[\mathcal{I}(\hat{\theta})^{-1}]_{jj}}

se(gamma_fit)
#> log_shape  log_rate 
#> 0.1095563 0.1194514
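The same quantity can be reproduced with base R alone: re-simulate the running-example data, maximize the log-likelihood numerically, and invert the Hessian of the negative log-likelihood. This is a sketch of the calculation, not femtograd's implementation; `optim()` approximates the Hessian by finite differences, so the numbers will only agree approximately.

```r
# Base-R sketch of SEs from the observed information (not femtograd's
# implementation; optim()'s Hessian is a finite-difference approximation)
set.seed(42)
waiting <- rgamma(150, shape = 3, rate = 0.5)

negll <- function(p) {
  # p = c(log_shape, log_rate), parameterized on the log scale as above
  -sum(dgamma(waiting, shape = exp(p[1]), rate = exp(p[2]), log = TRUE))
}
opt <- optim(c(0, 0), negll, method = "BFGS", hessian = TRUE)

# opt$hessian is the Hessian of -loglik at the MLE, i.e. the observed
# information; SEs are square roots of the diagonal of its inverse
sqrt(diag(solve(opt$hessian)))
```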

Confidence Intervals

Wald confidence intervals assume asymptotic normality of the MLE:

\hat{\theta}_j \pm z_{\alpha/2} \cdot \text{SE}(\hat{\theta}_j)

# 95% CIs (default)
confint(gamma_fit)
#>                 2.5%      97.5%
#> log_shape  0.8643560  1.2938088
#> log_rate  -0.9290199 -0.4607792

# 99% CIs
confint(gamma_fit, level = 0.99)
#>                 0.5%      99.5%
#> log_shape  0.7968841  1.3612807
#> log_rate  -1.0025859 -0.3872132
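These intervals are just estimate ± normal quantile × SE. Plugging the reported estimate and standard error for log_shape into the formula reproduces the 95% interval (up to rounding):

```r
# Wald 95% CI by hand, using the reported estimate and SE for log_shape
est <- 1.07908
se_hat <- 0.10956
est + c(-1, 1) * qnorm(0.975) * se_hat  # approx. 0.864, 1.294
```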

These are CIs for the log-transformed parameters. For CIs on the original scale, exponentiate the endpoints (this works because exp is monotone):

exp(confint(gamma_fit))
#>                2.5%    97.5%
#> log_shape 2.3734771 3.646649
#> log_rate  0.3949406 0.630792

Wald Test

Test whether a parameter equals a hypothesized value:

# Test H0: log_shape = 0 (i.e., shape = 1, which is exponential)
wt <- wald_test(gamma_fit, parm = "log_shape", null_value = 0)
wt
#> Wald Test
#> ---------
#> Parameter: log_shape 
#> Estimate: 1.079 
#> Std. Error: 0.1096 
#> Null value: 0 
#> 
#> z-score: 9.85 
#> Wald statistic: 97.01 
#> P-value: 6.884e-23 
#> 
#> Significance: ***

Access individual components:

test_stat(wt)
#> [1] 97.01404
pval(wt)
#> [1] 6.883758e-23
is_significant_at(wt, alpha = 0.05)
#> [1] TRUE

Likelihood Ratio Tests

For comparing nested models, the likelihood ratio test (LRT) is often more powerful than the Wald test. The test statistic is:

\Lambda = -2[\ell(\hat{\theta}_0) - \ell(\hat{\theta}_1)] \sim \chi^2_{df}

where df is the difference in the number of parameters between the two models.

# Null model: exponential (shape = 1, equivalent to log_shape = 0)
exp_fit <- fit(
  function(log_rate) {
    rate <- exp(log_rate)
    loglik_exponential(rate, waiting)
  },
  params = c(log_rate = 0)
)

# LRT: Is gamma significantly better than exponential?
lr <- lrt(exp_fit, gamma_fit)
lr
#> Likelihood Ratio Test
#> ---------------------
#> LRT statistic: 73.15 
#> Degrees of freedom: 1 
#> P-value: 1.2e-17 
#> 
#> Null log-likelihood: -416.1 
#> Alt log-likelihood: -379.5 
#> 
#> Significance: ***
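The statistic is easy to verify by hand from the two reported log-likelihoods:

```r
# LRT statistic and p-value from the reported log-likelihoods
lambda <- -2 * (-416.1 - (-379.5))  # = 73.2, matching lrt() up to rounding
pchisq(lambda, df = 1, lower.tail = FALSE)
```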

The anova() function provides the same test in a familiar format:

anova(exp_fit, gamma_fit)
#> Analysis of Deviance Table (Likelihood Ratio Tests) 
#> 
#>               Df logLik Df diff Chisq Pr(>Chisq)    
#> exp_fit      1.0 -416.1                             
#> gamma_fit    2.0 -379.5       1 73.15     <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model Comparison

When models are not nested, use information criteria instead:

# Compare gamma vs exponential
compare(exp_fit, gamma_fit, names = c("Exponential", "Gamma"))
#> Model Comparison (ranked by AIC)
#> 
#>        Model df  logLik    AIC BIC  dAIC weight
#>        Gamma  2 -379.52 763.04  NA  0.00      1
#>  Exponential  1 -416.10 834.19  NA 71.15      0

The dAIC column shows how much worse each model is than the best, in AIC units. AIC weights give the relative probability that each model is the best in the set, assuming one of them is true.
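The weight calculation is simple enough to do by hand: each model gets weight proportional to exp(-dAIC/2), normalized to sum to one. A base-R sketch using the reported AIC values:

```r
# Akaike weights from the reported AIC values (a sketch, not compare()'s code)
aic <- c(Gamma = 763.04, Exponential = 834.19)
daic <- aic - min(aic)                       # dAIC: distance from the best model
weights <- exp(-daic / 2) / sum(exp(-daic / 2))
round(weights, 4)  # Gamma carries essentially all the weight
```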

Profile Likelihood

Profile likelihood provides confidence intervals that don’t rely on the normality assumption. The profile log-likelihood ℓ_P(θ_j) fixes θ_j and maximizes ℓ over all remaining parameters; the confidence interval collects every value of θ_j that a likelihood ratio test would not reject:

\{\theta_j : 2[\ell(\hat{\theta}) - \ell_P(\theta_j)] \leq \chi^2_{1,\alpha}\}

# Profile likelihood for log_shape
prof <- profile_loglik(gamma_fit, parm = "log_shape")
#> Warning in profile_loglik(gamma_fit, parm = "log_shape"): Full profile
#> likelihood requires storing the original objective. Using quadratic
#> approximation based on Hessian.
prof
#> Profile Likelihood
#> ------------------
#> Parameter: log_shape 
#> MLE: 1.079 
#> Max log-likelihood: -379.5 
#> Grid: 20 points from 0.7504 to 1.408 
#> 
#> Note: Using quadratic approximation based on Hessian

# Profile-based confidence intervals
suppressWarnings(confint_profile(gamma_fit))
#>                 2.5%      97.5%
#> log_shape  0.9937349  1.1644299
#> log_rate  -0.7879556 -0.6018435

Profile CIs are generally preferred over Wald CIs for small samples or when the likelihood surface is asymmetric.
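To see what a full (non-quadratic) profile looks like, here is a hand-rolled version in base R for log_shape: at each fixed value, maximize the log-likelihood over log_rate, then invert the likelihood-ratio inequality above. This is a sketch using the re-simulated running-example data, not femtograd's internal algorithm.

```r
# Hand-rolled profile likelihood for log_shape (sketch, base R only)
set.seed(42)
waiting <- rgamma(150, shape = 3, rate = 0.5)

# Profile log-likelihood: maximize over log_rate with log_shape held fixed
prof_ll <- function(ls) {
  optimize(function(lr) sum(dgamma(waiting, shape = exp(ls),
                                   rate = exp(lr), log = TRUE)),
           interval = c(-5, 5), maximum = TRUE)$objective
}
grid <- seq(0.7, 1.5, length.out = 81)
prof <- vapply(grid, prof_ll, numeric(1))

# 95% profile CI: grid values not rejected by the likelihood ratio test
range(grid[2 * (max(prof) - prof) <= qchisq(0.95, 1)])
```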

Bootstrap Inference

For situations where asymptotic theory may not hold (small samples, boundary parameters, complex models), bootstrap inference provides a nonparametric alternative.

The bootstrap_fit() function uses the loglik_maker pattern: a function that takes data and returns a log-likelihood function.

boot <- bootstrap_fit(
  loglik_maker = function(d) {
    function(log_shape, log_rate) {
      shape <- exp(log_shape)
      rate <- exp(log_rate)
      loglik_gamma(shape, rate, d)
    }
  },
  data = waiting,
  params = c(log_shape = 0, log_rate = 0),
  n_boot = 200,
  progress = FALSE
)

boot
#> Bootstrap Results
#> -----------------
#> Bootstrap samples: 200 of 200 (100.0% success)
#> 
#>           Original Boot.SE      Bias
#> log_shape   1.0791 0.09486 -0.005927
#> log_rate   -0.6949 0.09954 -0.007078

Bootstrap confidence intervals come in three flavors:

# Percentile method (most common)
confint(boot, type = "percentile")
#>                 2.5%      97.5%
#> log_shape  0.8928334  1.2699327
#> log_rate  -0.8996152 -0.4769706

# Basic bootstrap
confint(boot, type = "basic")
#>                 2.5%      97.5%
#> log_shape  0.8882321  1.2653314
#> log_rate  -0.9128285 -0.4901839

# Normal approximation (uses bootstrap SE)
confint(boot, type = "normal")
#>                 2.5%      97.5%
#> log_shape  0.8931660  1.2649988
#> log_rate  -0.8899866 -0.4998125
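The percentile interval is nothing more than empirical quantiles of the bootstrap replicates. A base-R sketch of the resampling loop for log_shape (not bootstrap_fit()'s implementation; the refit here uses optim() directly):

```r
# Nonparametric bootstrap of the log_shape MLE, by hand (sketch)
set.seed(42)
waiting <- rgamma(150, shape = 3, rate = 0.5)

boot_log_shape <- replicate(200, {
  d <- sample(waiting, replace = TRUE)            # resample the data
  negll <- function(p) -sum(dgamma(d, shape = exp(p[1]),
                                   rate = exp(p[2]), log = TRUE))
  optim(c(0, 0), negll, method = "BFGS")$par[1]   # refit, keep log_shape
})

# Percentile 95% CI: empirical quantiles of the bootstrap estimates
quantile(boot_log_shape, c(0.025, 0.975))
```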

Model Diagnostics

Before trusting inference results, verify that the model is well-behaved:

# Check Hessian properties
check_hessian(gamma_fit)
#> Hessian Diagnostics
#> -------------------
#> Negative definite: TRUE 
#> Condition number: 2.333274e+01 
#> Rank: 2 of 2 
#> Singular: FALSE 
#> 
#> No issues detected.

# Check optimization convergence
check_convergence(gamma_fit)
#> Convergence Diagnostics
#> -----------------------
#> Converged: TRUE 
#> Iterations: 11 
#> Gradient norm: 9.558e-09 
#> Gradient OK: TRUE (tolerance: 1e-04 )
#> Log-likelihood: -379.5 
#> 
#> No issues detected.

# Full diagnostics
diagnostics(gamma_fit)
#> Model Diagnostics
#> =================
#> 
#> Overall: OK
#> 
#> Hessian Diagnostics
#> -------------------
#> Negative definite: TRUE 
#> Condition number: 2.333274e+01 
#> Rank: 2 of 2 
#> Singular: FALSE 
#> 
#> No issues detected.
#> 
#> Convergence Diagnostics
#> -----------------------
#> Converged: TRUE 
#> Iterations: 11 
#> Gradient norm: 9.558e-09 
#> Gradient OK: TRUE (tolerance: 1e-04 )
#> Log-likelihood: -379.5 
#> 
#> No issues detected.

The quick check:

# Are standard errors reliable?
se_reliable(gamma_fit)
#> [1] TRUE

What Can Go Wrong

  • Hessian not negative definite: The optimum may be a saddle point, or the model may be unidentifiable
  • Large condition number: Parameters are highly correlated, so SEs may be unstable
  • Non-zero gradient: Optimization may not have converged

Inference Summary

Method                  When to use                               Assumptions
Wald CI/test            Large samples, well-behaved likelihood    Asymptotic normality
Likelihood ratio test   Nested models, moderate samples           Regularity conditions
Profile likelihood      Asymmetric likelihood, moderate samples   Smooth likelihood
Bootstrap               Small samples, complex models             Resampling validity

For routine analysis with moderate-to-large samples, Wald inference (the default from confint() and summary()) is usually sufficient. Use profile likelihood or bootstrap when you suspect the asymptotic approximation may be poor.