autograd-stats

Statistical modeling library built on top of autograd-cpp.

Features

  • Maximum Likelihood Estimation (MLE): Fit statistical distributions to data with automatic differentiation
  • Rich Distribution Library: Exponential, Weibull, Normal, Gamma, Poisson, Beta, Lognormal, Gompertz, Multivariate Normal
  • Bootstrap Inference: Parametric and non-parametric bootstrap for confidence intervals
  • Regression Models: Linear and logistic regression with gradient-based optimization
  • Generalized Linear Models (GLM): Gaussian, Binomial, Poisson families with various link functions
  • Conditional Dependencies: Build Bayesian networks with conditional distributions
  • Survival Analysis: Hazard functions, survival functions, cumulative hazard for Gompertz distribution

Installation

Using CMake FetchContent

include(FetchContent)

FetchContent_Declare(
    autograd_stats
    GIT_REPOSITORY https://github.com/queelius/autograd-stats.git
    GIT_TAG main
)

set(BUILD_EXAMPLES OFF CACHE BOOL "" FORCE)
set(BUILD_TESTS OFF CACHE BOOL "" FORCE)

FetchContent_MakeAvailable(autograd_stats)

target_link_libraries(your_app PRIVATE statmodels::statmodels)

Note: This automatically pulls in autograd-cpp as a dependency.

Quick Start

Maximum Likelihood Estimation

#include <statmodels/distributions/normal.hpp>
#include <iostream>
#include <vector>

using namespace statmodels;

int main() {
    // Generate sample data
    std::vector<float> data = {1.2f, 2.3f, 0.8f, 1.5f, 2.1f, 1.8f, 2.0f};

    // Fit normal distribution
    NormalDistribution dist(0.0f, 1.0f);  // Initial guess
    dist.fit_mle(data);

    std::cout << "Fitted mu: " << dist.get_mu() << std::endl;
    std::cout << "Fitted sigma: " << dist.get_sigma() << std::endl;

    return 0;
}

Multivariate Normal with Full Covariance

#include <statmodels/distributions/multivariate_normal.hpp>
#include <vector>

using namespace statmodels;

int main() {
    size_t d = 3;  // 3 dimensions

    // Create MVN with initial parameters
    MultivariateNormal mvn(d);

    // Fit to data (generate_mvn_samples(...) stands in for your own data source)
    std::vector<std::vector<float>> data = generate_mvn_samples(...);
    mvn.fit_mle(data);

    // Get fitted parameters
    auto mu = mvn.get_mean();
    auto cov = mvn.get_covariance();

    return 0;
}
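
A note on the full covariance: the Components table below lists a Cholesky decomposition for MultivariateNormal, which is the usual way to keep a covariance estimate symmetric positive-definite under unconstrained gradient steps, and presumably the role it plays here: parameterize a lower-triangular factor L and form Sigma = L * L^T. A library-free illustration for d = 2:

#include <iostream>

int main() {
    // Lower-triangular Cholesky factor L for d = 2 (diagonal entries kept positive)
    float l11 = 1.0f, l21 = 0.3f, l22 = 0.8f;

    // Sigma = L * L^T is symmetric positive-definite for any such L,
    // so an optimizer can move the entries of L freely.
    float s11 = l11 * l11;
    float s12 = l11 * l21;
    float s22 = l21 * l21 + l22 * l22;

    std::cout << "Sigma = [[" << s11 << ", " << s12 << "], ["
              << s12 << ", " << s22 << "]]" << std::endl;
    return 0;
}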

Conditional Dependencies (Bayesian Networks)

#include <statmodels/graphical/conditional_distribution.hpp>
#include <memory>
#include <string>
#include <vector>

using namespace statmodels;

int main() {
    // Build a simple Bayesian network: X -> Y -> Z

    // X ~ Normal(0, 1)
    auto X = std::make_shared<ConditionalNormal>(
        "X", std::vector<std::string>{},
        [](const auto&) { return 0.0f; },   // mean
        [](const auto&) { return 1.0f; }    // std
    );

    // Y | X ~ Normal(2*X, 0.5)
    auto Y = std::make_shared<ConditionalNormal>(
        "Y", std::vector<std::string>{"X"},
        [](const auto& p) { return 2.0f * p.at("X"); },
        [](const auto&) { return 0.5f; }
    );

    // Z | Y ~ Normal(Y + 1, 0.25)
    auto Z = std::make_shared<ConditionalNormal>(
        "Z", std::vector<std::string>{"Y"},
        [](const auto& p) { return p.at("Y") + 1.0f; },
        [](const auto&) { return 0.25f; }
    );

    // Sample from the network
    AncestralSampler sampler;
    sampler.add_node(X);
    sampler.add_node(Y);
    sampler.add_node(Z);

    auto samples = sampler.sample(1000, 42);

    return 0;
}

Survival Analysis with Gompertz Distribution

#include <statmodels/distributions/gompertz.hpp>
#include <iostream>
#include <vector>

using namespace statmodels;

int main() {
    GompertzDistribution dist(0.1f, 0.05f);  // eta, b

    // Fit to survival data
    std::vector<float> times = {10.0f, 25.0f, 40.0f, 55.0f, 70.0f};
    dist.fit_mle(times);

    // Survival analysis
    float t = 50.0f;
    std::cout << "Survival at t=" << t << ": " << dist.survival(t) << std::endl;
    std::cout << "Hazard rate: " << dist.hazard_rate(t) << std::endl;
    std::cout << "Cumulative hazard: " << dist.cumulative_hazard(t) << std::endl;

    return 0;
}
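
The three quantities printed above are linked by the standard survival-analysis identities S(t) = exp(-H(t)) and h(t) = dH/dt, so any one of them determines the others. The sketch below evaluates them with plain floats for one common Gompertz parameterization, h(t) = eta * exp(b*t); the library's convention for (eta, b) may differ, so use this only to see how the identities fit together.

#include <cmath>
#include <iostream>

int main() {
    // One common Gompertz parameterization (the library's may differ):
    //   hazard            h(t) = eta * exp(b * t)
    //   cumulative hazard H(t) = (eta / b) * (exp(b * t) - 1)
    //   survival          S(t) = exp(-H(t))
    float eta = 0.1f, b = 0.05f, t = 50.0f;

    float hazard     = eta * std::exp(b * t);
    float cum_hazard = (eta / b) * (std::exp(b * t) - 1.0f);
    float survival   = std::exp(-cum_hazard);

    std::cout << "h(t) = " << hazard << ", H(t) = " << cum_hazard
              << ", S(t) = " << survival << std::endl;
    return 0;
}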

Linear Regression

#include <statmodels/regression/linear_regression.hpp>

using namespace statmodels;
using namespace autograd;

int main() {
    // Create feature matrix X [n_samples, n_features] and target y [n_samples, 1]
    auto X = from_vector(X_data, {n_samples, n_features}, false);
    auto y = from_vector(y_data, {n_samples, 1}, false);

    LinearRegression model(true);  // fit_intercept = true
    model.fit(X, y, 1000, 0.01f, 1e-6, true);  // max_iter, lr, tol, verbose

    auto predictions = model.predict(X);
    float r2 = model.score(X, y);

    return 0;
}

Bootstrap Confidence Intervals

#include <statmodels/inference/bootstrap.hpp>
#include <statmodels/distributions/exponential.hpp>
#include <iostream>
#include <vector>

using namespace statmodels;

int main() {
    std::vector<float> data = {1.2f, 2.3f, 0.8f, 1.5f, 2.1f};

    ExponentialDistribution dist(1.0f);
    dist.fit_mle(data);

    // Non-parametric bootstrap
    auto results = Bootstrap::run(dist, data, 100, 0.95f);

    std::cout << "95% CI: [" << results.parameter_ci_lower[0]
              << ", " << results.parameter_ci_upper[0] << "]" << std::endl;

    return 0;
}
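
For intuition about the non-parametric case, here is a library-free sketch of what Bootstrap::run is presumably doing under the hood: resample the data with replacement, recompute the estimate on each resample (the exponential MLE has the closed form n / sum, which keeps the sketch short; the library would refit by gradient-based MLE), and read the confidence interval off the percentiles of the bootstrap estimates.

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

int main() {
    std::vector<float> data = {1.2f, 2.3f, 0.8f, 1.5f, 2.1f};

    std::mt19937 rng(42);
    std::uniform_int_distribution<size_t> pick(0, data.size() - 1);

    // Non-parametric bootstrap of the exponential rate estimate (MLE = n / sum)
    std::vector<float> estimates;
    for (int b = 0; b < 1000; ++b) {
        float sum = 0.0f;
        for (size_t i = 0; i < data.size(); ++i) sum += data[pick(rng)];  // resample with replacement
        estimates.push_back(static_cast<float>(data.size()) / sum);
    }

    // Percentile confidence interval from the bootstrap distribution
    std::sort(estimates.begin(), estimates.end());
    std::cout << "Approximate 95% CI: [" << estimates[25] << ", " << estimates[975] << "]" << std::endl;
    return 0;
}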

Components

Distributions (distributions/)

Distribution            Parameters       MLE   Special Functions
Exponential             lambda           Yes   -
Weibull                 k, lambda        Yes   -
Normal                  mu, sigma        Yes   -
Gamma                   alpha, beta      Yes   -
Poisson                 lambda           Yes   -
Beta                    alpha, beta      Yes   -
Lognormal               mu, sigma        Yes   median, mode
Gompertz                eta, b           Yes   survival, hazard_rate, cumulative_hazard
MultivariateNormal      mu, Sigma        Yes   Cholesky decomposition
MultivariateNormalDiag  mu, diag_sigma   Yes   Diagonal covariance
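
The Special Functions column lists extras beyond the shared fit_mle() interface. For the Lognormal entry, the median and mode have simple closed forms in terms of mu and sigma; the snippet below shows those formulas with plain floats (the table suggests the class exposes them directly, so check the header for the exact member names):

#include <cmath>
#include <iostream>

int main() {
    // Parameters of the underlying normal on the log scale (e.g., as fitted by fit_mle)
    float mu = 0.4f, sigma = 0.8f;

    // Closed-form special functions listed for Lognormal in the table above
    float median = std::exp(mu);                  // exp(mu)
    float mode   = std::exp(mu - sigma * sigma);  // exp(mu - sigma^2)

    std::cout << "median = " << median << ", mode = " << mode << std::endl;
    return 0;
}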

Inference (inference/)

  • Bootstrap: Parametric and non-parametric bootstrap
  • EmpiricalDistribution: Distribution from bootstrap samples with CI, quantiles, correlation

Regression (regression/)

  • LinearRegression: OLS with L1/L2 regularization
  • LogisticRegression: Binary classification with gradient descent
  • GLM: Generalized Linear Models (Gaussian, Binomial, Poisson families); the objective such a model optimizes is sketched after this list
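
As a point of reference for the GLM entry, the sketch below shows, with plain floats and no library calls, the objective a Poisson GLM with a log link minimizes: the negative log-likelihood of counts y under mean mu_i = exp(w0 + w1*x_i), with constant terms dropped. autograd-stats presumably builds this kind of objective out of autograd tensors so that gradient-based fitting comes for free.

#include <cmath>
#include <iostream>
#include <vector>

// Negative log-likelihood of a Poisson GLM with log link (constant terms dropped):
//   eta_i = w0 + w1 * x_i,  mu_i = exp(eta_i),  NLL = sum_i (mu_i - y_i * eta_i)
float poisson_glm_nll(float w0, float w1,
                      const std::vector<float>& x, const std::vector<float>& y) {
    float nll = 0.0f;
    for (size_t i = 0; i < x.size(); ++i) {
        float eta = w0 + w1 * x[i];
        nll += std::exp(eta) - y[i] * eta;
    }
    return nll;
}

int main() {
    std::vector<float> x = {0.0f, 1.0f, 2.0f, 3.0f};
    std::vector<float> y = {1.0f, 2.0f, 4.0f, 8.0f};  // observed counts
    std::cout << "NLL at (w0=0, w1=0.7): " << poisson_glm_nll(0.0f, 0.7f, x, y) << std::endl;
    return 0;
}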

Graphical Models (graphical/)

  • ConditionalDistribution: Base class for conditional distributions
  • ConditionalNormal: Normal with mean/std as functions of parents
  • ConditionalGamma: Gamma with rate as function of parents
  • AncestralSampler: Forward sampling from Bayesian networks

Math Utilities (distributions/math_utils.hpp)

  • lgamma_tensor(): Differentiable log-gamma function (see the sketch after this list for where it enters a log-likelihood)
  • log_beta_function_tensor(): Differentiable log-beta function
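
These functions exist because several log-likelihoods contain log-gamma or log-beta terms that must be differentiated with respect to the parameters. For example, the Gamma log-density is log f(x; alpha, beta) = alpha*log(beta) + (alpha - 1)*log(x) - beta*x - lgamma(alpha), so optimizing alpha needs a differentiable log-gamma; on tensors that is presumably where lgamma_tensor() comes in. The sketch below uses plain floats and std::lgamma purely to show where the term enters the objective.

#include <cmath>
#include <iostream>
#include <vector>

// Negative log-likelihood of a Gamma(alpha, beta) sample, written with plain floats.
// In autograd-stats the same expression would be built from tensors, with
// lgamma_tensor() supplying the differentiable log-gamma term.
float gamma_nll(float alpha, float beta, const std::vector<float>& x) {
    float nll = 0.0f;
    for (float xi : x) {
        nll -= alpha * std::log(beta) + (alpha - 1.0f) * std::log(xi)
               - beta * xi - std::lgamma(alpha);
    }
    return nll;
}

int main() {
    std::vector<float> data = {0.8f, 1.4f, 2.2f, 0.9f};
    std::cout << "NLL at (alpha=2, beta=1): " << gamma_nll(2.0f, 1.0f, data) << std::endl;
    return 0;
}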

Building Examples

git clone https://github.com/queelius/autograd-stats.git
cd autograd-stats
mkdir build && cd build
cmake ..
make -j$(nproc)

# Run distribution examples
./examples/normal_mle
./examples/gamma_mle
./examples/beta_mle
./examples/poisson_mle
./examples/exponential_mle
./examples/weibull_mle_simple
./examples/lognormal_mle
./examples/gompertz_mle
./examples/mvn_diag_mle
./examples/mvn_full_mle

# Run regression examples
./examples/linear_regression
./examples/logistic_regression

# Run bootstrap example
./examples/exponential_bootstrap

# Run conditional sampling example
./examples/conditional_sampling

Running Tests

cd build
ctest --verbose

# Or run specific test suites
./tests/test_distributions
./tests/test_regression
./tests/test_bootstrap
./tests/test_optimizer

Requirements

  • C++17 or later
  • CMake 3.14+
  • autograd-cpp (automatically fetched)
  • Optional: OpenMP for parallelization

Design Philosophy

autograd-stats builds on autograd-cpp to provide:

  • Automatic differentiation for gradient-based MLE optimization
  • Log-parameterization for constrained parameters (e.g., sigma > 0 uses log_sigma internally); see the sketch after this list
  • Type-safe statistical models with compile-time guarantees
  • Composable components for building custom statistical models
  • Broadcasting support for scalar-tensor operations in regression
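
To make the log-parameterization point concrete, the sketch below fits the rate of an exponential distribution by gradient descent on log_lambda rather than lambda: since lambda = exp(log_lambda) is always positive, no optimizer step can leave the valid parameter region. It uses plain floats and hand-written gradients; the library applies the same trick on autograd tensors (e.g., sigma stored as log_sigma), as noted in the bullet above.

#include <cmath>
#include <iostream>
#include <vector>

int main() {
    std::vector<float> data = {1.2f, 2.3f, 0.8f, 1.5f, 2.1f};

    float sum_x = 0.0f;
    for (float x : data) sum_x += x;
    const float n = static_cast<float>(data.size());

    // Optimize the unconstrained parameter log_lambda; lambda = exp(log_lambda) is
    // always positive, so gradient steps cannot produce an invalid rate.
    float log_lambda = 0.0f;
    const float lr = 0.05f;

    for (int iter = 0; iter < 500; ++iter) {
        float lambda = std::exp(log_lambda);
        // Exponential NLL(lambda) = -n*log(lambda) + lambda*sum_x
        // dNLL/dlambda = -n/lambda + sum_x; chain rule: dNLL/dlog_lambda = lambda * dNLL/dlambda
        float grad_log_lambda = (-n / lambda + sum_x) * lambda;
        log_lambda -= lr * grad_log_lambda;
    }

    std::cout << "Gradient-descent MLE: " << std::exp(log_lambda)
              << "  (closed form n / sum_x = " << n / sum_x << ")" << std::endl;
    return 0;
}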

License

MIT License

Contributing

Contributions welcome! This library depends on autograd-cpp for automatic differentiation.
