Introduction

femtograd is an R package for automatic differentiation (AD) designed for classical statistics. Unlike packages focused on deep learning, femtograd emphasizes:

  • Pedagogy: Clear, readable code that demonstrates how AD works
  • Statistics: Built-in support for MLE, Hessians, and statistical inference
  • R integration: Standard R generics (coef, vcov, confint, etc.)

This vignette introduces the core concepts and workflow.

Creating Differentiable Values

The foundation of femtograd is the value object, which represents a node in a computational graph. Create values using val():

library(femtograd)

# Create a differentiable value
x <- val(3)
x
#> value(data =3, grad =0)

Values can be scalars, vectors, or matrices:

# Scalar
a <- val(2.5)

# Vector (stored as column matrix)
v <- val(c(1, 2, 3))

# Matrix
m <- val(matrix(1:4, 2, 2))
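
As the comment above notes, vectors are held internally as single-column matrices; a quick sketch makes this explicit:

# Inspect the stored data: a 3 x 1 column matrix
get_data(v)
dim(get_data(v))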

Building Computations

Standard mathematical operations work with value objects and automatically build a computational graph:

x <- val(3)
y <- val(4)

# Build computation
z <- x^2 + x * y

# Access the result
get_data(z)
#> [1] 21

Femtograd supports many operations (a short example combining several of them follows the list):

  • Arithmetic: +, -, *, /, ^
  • Transcendental: log, exp, sqrt, sin, cos, tanh
  • Special functions: lgamma, digamma, log1p
  • Activations: sigmoid, relu, softplus, logit
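
For instance, these operations compose into a single differentiable expression (a minimal sketch that only evaluates the result; gradients come next):

x <- val(2)
y <- log(x) + exp(x) * sigmoid(x)  # compose transcendental and activation functions
get_data(y)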

Computing Gradients

Call backward() on any value to compute gradients via reverse-mode AD (backpropagation). Then use grad() to access the gradient:

x <- val(3)
y <- val(4)
z <- x^2 + x * y  # z = 9 + 12 = 21

# Compute gradients
backward(z)

# Access gradients
grad(x)  # dz/dx = 2x + y = 10
#> [1] 10
grad(y)  # dz/dy = x = 3
#> [1] 3

Each gradient is the rate at which z changes as the corresponding input changes, holding the other inputs fixed.
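
As a quick sanity check (a base R sketch, not part of femtograd), a central finite difference at x = 3 with y fixed at 4 recovers the same value:

f <- function(x) x^2 + x * 4     # z as a function of x with y = 4
h <- 1e-6
(f(3 + h) - f(3 - h)) / (2 * h)  # approximately 10, matching grad(x)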

Resetting Gradients

When reusing parameters across multiple computations, reset gradients with zero_grad():

x <- val(2)

# First computation
y1 <- x^2
backward(y1)
grad(x)  # 4
#> [1] 4

# Reset and recompute
zero_grad(y1)
y2 <- x^3
backward(y2)
grad(x)  # 12
#> [1] 12

Maximum Likelihood Estimation

The primary use case for femtograd is maximum likelihood estimation. The fit() function combines optimization and inference:

# Generate data from a normal distribution
set.seed(42)
data <- rnorm(100, mean = 5, sd = 2)

# Define log-likelihood with named parameters
# Use log(sigma) to ensure sigma > 0
result <- fit(
  function(mu, log_sigma) {
    sigma <- exp(log_sigma)
    loglik_normal(mu, sigma, data)
  },
  params = c(mu = 0, log_sigma = 0)
)

result
#> femtofit: Maximum Likelihood Estimation
#> 
#> Coefficients:
#>        mu log_sigma 
#> 5.0650296 0.7286466 
#> 
#> Log-likelihood: -214.8
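
For the normal model the MLE has a closed form, which provides a quick check on the fitted values (a base R sketch; note that the MLE of sigma divides by n rather than n - 1):

mu_hat <- mean(data)
sigma_hat <- sqrt(mean((data - mu_hat)^2))  # MLE of sigma uses n, not n - 1
c(mu = mu_hat, log_sigma = log(sigma_hat))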

Using Base R Generics

The fitted model supports standard R generics, along with a few helpers such as se():

# Parameter estimates
coef(result)
#>        mu log_sigma 
#> 5.0650296 0.7286466

# Standard errors
se(result)
#>         mu  log_sigma 
#> 0.20722742 0.07071068

# Variance-covariance matrix
vcov(result)
#>                      mu     log_sigma
#> mu         4.294320e-02 -2.117559e-11
#> log_sigma -2.117559e-11  5.000000e-03

# Confidence intervals
confint(result)
#>                2.5%    97.5%
#> mu        4.6588714 5.471188
#> log_sigma 0.5900563 0.867237

# Log-likelihood (enables AIC/BIC)
logLik(result)
#> 'log Lik.' -214.7585 (df=2)
AIC(result)
#> [1] 433.517
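
AIC is simply -2 logLik + 2 df, so it can be reproduced directly from the logLik object (a base R sketch):

ll <- logLik(result)
-2 * as.numeric(ll) + 2 * attr(ll, "df")  # should match AIC(result)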

For a detailed summary with p-values:

summary(result)
#> femtofit: Maximum Likelihood Estimation
#> 
#> Call:
#> fit(loglik = function(mu, log_sigma) {
#>     sigma <- exp(log_sigma)
#>     loglik_normal(mu, sigma, data)
#> }, params = c(mu = 0, log_sigma = 0))
#> 
#> Coefficients:
#>           Estimate Std. Error z value  Pr(>|z|)    
#> mu        5.065030   0.207227  24.442 < 2.2e-16 ***
#> log_sigma 0.728647   0.070711  10.305 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> ---
#> Log-likelihood: -214.8    AIC: 433.5
#> Converged in 22 iterations

Built-in Distributions

Femtograd includes log-likelihood functions for common distributions:

Distribution  Function                         Parameters
Normal        loglik_normal(mu, sigma, x)      mean, sd
Exponential   loglik_exponential(rate, x)      rate
Poisson       loglik_poisson(lambda, x)        mean
Binomial      loglik_binomial(p, n, x)         probability, trials
Gamma         loglik_gamma(shape, rate, x)     shape, rate
Beta          loglik_beta(alpha, beta, x)      shape1, shape2
Weibull       loglik_weibull(shape, scale, x)  shape, scale
Pareto        loglik_pareto(alpha, x_min, x)   shape, scale

Example: Exponential Distribution

# Generate exponential data
set.seed(123)
waiting_times <- rexp(50, rate = 0.5)

# Fit exponential model
exp_fit <- fit(
  function(rate) loglik_exponential(rate, waiting_times),
  params = c(rate = 1)
)
#> Warning in log(x$data): NaNs produced
#> Warning in log(x$data): NaNs produced
#> Warning in log(x$data): NaNs produced
#> Warning in log(x$data): NaNs produced
#> Warning in log(x$data): NaNs produced
#> Warning in log(x$data): NaNs produced

The warnings arise because the unconstrained optimizer can try negative values of rate, where the log term in the likelihood produces NaNs; reparameterizing with log(rate), as shown in the Parameter Constraints section below, avoids them.

# True rate was 0.5; the MLE equals 1/mean of the data
coef(exp_fit)
#>      rate 
#> 0.4423327
1 / mean(waiting_times)
#> [1] 0.4423327

Parameter Constraints

For parameters with constraints (e.g., variance > 0, probability in (0, 1)), use transformations:

# sigma > 0: use log(sigma), then exp() in likelihood
# 0 < p < 1: use logit(p), then sigmoid() in likelihood

# Example: Beta distribution with shape constraints (both > 0)
set.seed(456)
beta_data <- rbeta(50, shape1 = 2, shape2 = 5)

beta_fit <- fit(
  function(log_alpha, log_beta) {
    alpha <- exp(log_alpha)  # Ensures alpha > 0
    beta <- exp(log_beta)    # Ensures beta > 0
    loglik_beta(alpha, beta, beta_data)
  },
  params = c(log_alpha = 0, log_beta = 0)
)

# Transform back to original scale
exp(coef(beta_fit))
#> log_alpha  log_beta 
#>  2.467469  6.149787
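
The same idea works for probability parameters, using logit/sigmoid in place of log/exp. Here is a minimal sketch (the simulated data and the 20-trial size are assumptions for illustration):

# Probability constrained to (0, 1) via the sigmoid transform
set.seed(789)
successes <- rbinom(40, size = 20, prob = 0.3)

p_fit <- fit(
  function(logit_p) {
    p <- sigmoid(logit_p)  # maps the real line to (0, 1)
    loglik_binomial(p, 20, successes)
  },
  params = c(logit_p = 0)
)

# Back-transform to the probability scale (plogis is the inverse logit)
plogis(coef(p_fit))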

Femtograd also provides helper functions for these common transformations: positive(), probability(), and bounded().

Model Comparison

Compare multiple models using AIC:

# Fit two models: fixed vs estimated sigma
m1 <- fit(function(mu) loglik_normal(mu, 2, data), c(mu = 0))
m2 <- fit(
  function(mu, log_sigma) loglik_normal(mu, exp(log_sigma), data),
  c(mu = 0, log_sigma = 0)
)

compare(m1, m2, names = c("Fixed sigma", "Free sigma"))
#> Model Comparison (ranked by AIC)
#> 
#>        Model df  logLik    AIC BIC dAIC weight
#>  Fixed sigma  1 -214.89 431.78  NA 0.00  0.705
#>   Free sigma  2 -214.76 433.52  NA 1.74  0.295

For nested models, use anova() for likelihood ratio tests:

anova(m1, m2)
#> Analysis of Deviance Table (Likelihood Ratio Tests) 
#> 
#>        Df logLik Df diff Chisq Pr(>Chisq)
#> m1    1.0 -214.9                         
#> m2    2.0 -214.8       1 0.258      0.611
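
The likelihood ratio statistic can also be computed by hand (a base R sketch): it is twice the difference in log-likelihoods, compared against a chi-squared distribution with 1 degree of freedom.

lr <- 2 * (as.numeric(logLik(m2)) - as.numeric(logLik(m1)))
pchisq(lr, df = 1, lower.tail = FALSE)  # should match the anova() p-value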

Next Steps