autograd-stats
Statistical modeling library built on top of autograd-cpp for maximum likelihood estimation, bootstrap inference, and regression.
Features
- Maximum Likelihood Estimation (MLE): Fit statistical distributions to data with automatic differentiation
- Rich Distribution Library: Exponential, Weibull, Normal, Gamma, Poisson, Beta, Lognormal, Gompertz, Multivariate Normal
- Bootstrap Inference: Parametric and non-parametric bootstrap for confidence intervals
- Regression Models: Linear and logistic regression with gradient-based optimization
- Generalized Linear Models (GLM): Gaussian, Binomial, Poisson families with various link functions
- Conditional Dependencies: Build Bayesian networks with conditional distributions
- Survival Analysis: Hazard functions, survival functions, cumulative hazard for Gompertz distribution
Installation
Using CMake FetchContent
include(FetchContent)

FetchContent_Declare(
    autograd_stats
    GIT_REPOSITORY https://github.com/queelius/autograd-stats.git
    GIT_TAG        main
)
set(BUILD_EXAMPLES OFF CACHE BOOL "" FORCE)
set(BUILD_TESTS OFF CACHE BOOL "" FORCE)
FetchContent_MakeAvailable(autograd_stats)

target_link_libraries(your_app PRIVATE statmodels::statmodels)
Note: This automatically pulls in autograd-cpp as a dependency.
Quick Start
Maximum Likelihood Estimation
#include <statmodels/distributions/normal.hpp>

#include <iostream>
#include <vector>

using namespace statmodels;

int main() {
    // Sample data
    std::vector<float> data = {1.2f, 2.3f, 0.8f, 1.5f, 2.1f, 1.8f, 2.0f};

    // Fit a normal distribution, starting from an initial guess
    NormalDistribution dist(0.0f, 1.0f);
    dist.fit_mle(data);

    std::cout << "Fitted mu: " << dist.get_mu() << std::endl;
    std::cout << "Fitted sigma: " << dist.get_sigma() << std::endl;
    return 0;
}
Multivariate Normal with Full Covariance
#include <statmodels/distributions/multivariate_normal.hpp>

#include <vector>

using namespace statmodels;

int main() {
    size_t d = 3; // 3 dimensions

    // Create an MVN with initial parameters
    MultivariateNormal mvn(d);

    // Load or generate your data (generate_mvn_samples is a placeholder), then fit
    std::vector<std::vector<float>> data = generate_mvn_samples(...);
    mvn.fit_mle(data);

    // Retrieve the fitted parameters
    auto mu = mvn.get_mean();
    auto cov = mvn.get_covariance();
    return 0;
}
Conditional Dependencies (Bayesian Networks)
#include <statmodels/graphical/conditional_distribution.hpp>

#include <memory>
#include <string>
#include <vector>

using namespace statmodels;

int main() {
    // Build a simple two-node Bayesian network: X -> Y

    // X ~ Normal(0, 1)
    auto X = std::make_shared<ConditionalNormal>(
        "X", std::vector<std::string>{},
        [](const auto&) { return 0.0f; },  // mean
        [](const auto&) { return 1.0f; }   // std
    );

    // Y | X ~ Normal(2*X, 0.5)
    auto Y = std::make_shared<ConditionalNormal>(
        "Y", std::vector<std::string>{"X"},
        [](const auto& p) { return 2.0f * p.at("X"); },  // mean depends on parent X
        [](const auto&) { return 0.5f; }                 // std
    );

    // Sample from the network via ancestral (forward) sampling
    AncestralSampler sampler;
    sampler.add_node(X);
    sampler.add_node(Y);
    auto samples = sampler.sample(1000, 42);  // 1000 samples, seed 42
    return 0;
}
Survival Analysis with Gompertz Distribution
#include <statmodels/distributions/gompertz.hpp>

#include <iostream>
#include <vector>

using namespace statmodels;

int main() {
    GompertzDistribution dist(0.1f, 0.05f); // eta, b

    // Fit to observed event times
    std::vector<float> times = {10.0f, 25.0f, 40.0f, 55.0f, 70.0f};
    dist.fit_mle(times);

    // Survival quantities at t = 50
    float t = 50.0f;
    std::cout << "Survival at t=" << t << ": " << dist.survival(t) << std::endl;
    std::cout << "Hazard rate: " << dist.hazard_rate(t) << std::endl;
    std::cout << "Cumulative hazard: " << dist.cumulative_hazard(t) << std::endl;
    return 0;
}
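For reference, under the common (eta, b) Gompertz parameterization, with eta as the baseline hazard scale and b as the shape (assumed here to correspond to the constructor's eta and b arguments), the hazard grows exponentially in time:

h(t) = \eta e^{b t}, \qquad H(t) = \frac{\eta}{b}\left(e^{b t} - 1\right), \qquad S(t) = \exp(-H(t))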
Linear Regression
#include <statmodels/regression/linear_regression.hpp>

#include <vector>

using namespace statmodels;
using namespace autograd;

int main() {
    // Example data: feature matrix X [n_samples, n_features] and target y [n_samples, 1],
    // flattened here in row-major order (an assumption about from_vector's layout)
    size_t n_samples = 4, n_features = 2;
    std::vector<float> X_data = {1.0f, 2.0f,  2.0f, 1.0f,  3.0f, 4.0f,  4.0f, 3.0f};
    std::vector<float> y_data = {3.0f, 3.0f, 7.0f, 7.0f};

    auto X = from_vector(X_data, {n_samples, n_features}, false);
    auto y = from_vector(y_data, {n_samples, 1}, false);

    LinearRegression model(true); // fit_intercept = true
    model.fit(X, y, 1000, 0.01f, 1e-6f, true); // max_iter, lr, tol, verbose

    auto predictions = model.predict(X);
    float r2 = model.score(X, y);
    return 0;
}
Bootstrap Confidence Intervals
#include <statmodels/inference/bootstrap.hpp>
#include <statmodels/distributions/exponential.hpp>

#include <iostream>
#include <vector>

using namespace statmodels;

int main() {
    std::vector<float> data = {1.2f, 2.3f, 0.8f, 1.5f, 2.1f};

    // Fit an exponential distribution by MLE
    ExponentialDistribution dist(1.0f);
    dist.fit_mle(data);

    // Non-parametric bootstrap: 100 resamples, 95% confidence level
    auto results = Bootstrap::run(dist, data, 100, 0.95f);
    std::cout << "95% CI: [" << results.parameter_ci_lower[0]
              << ", " << results.parameter_ci_upper[0] << "]" << std::endl;
    return 0;
}
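For context, the non-parametric bootstrap refits the model on B datasets resampled from the data with replacement, producing bootstrap estimates \hat\theta^*_1, \dots, \hat\theta^*_B. One common way to form a 95% interval from these is the percentile method (whether Bootstrap::run uses this or another variant is an assumption here), i.e. the empirical 2.5% and 97.5% quantiles of the bootstrap estimates:

\left[\hat\theta^*_{(0.025)},\; \hat\theta^*_{(0.975)}\right]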
Components
Distributions (distributions/)
| Distribution | Parameters | MLE | Special Functions |
|---|---|---|---|
| Exponential | lambda | Yes | - |
| Weibull | k, lambda | Yes | - |
| Normal | mu, sigma | Yes | - |
| Gamma | alpha, beta | Yes | - |
| Poisson | lambda | Yes | - |
| Beta | alpha, beta | Yes | - |
| Lognormal | mu, sigma | Yes | median, mode |
| Gompertz | eta, b | Yes | survival, hazard_rate, cumulative_hazard |
| MultivariateNormal | mu, Sigma | Yes | Cholesky decomposition |
| MultivariateNormalDiag | mu, diag_sigma | Yes | Diagonal covariance |
Inference (inference/)
- Bootstrap: Parametric and non-parametric bootstrap
- EmpiricalDistribution: Distribution from bootstrap samples with CI, quantiles, correlation
Regression (regression/)
- LinearRegression: OLS with L1/L2 regularization
- LogisticRegression: Binary classification with gradient descent (usage sketch after this list)
- GLM: Generalized Linear Models (Gaussian, Binomial, Poisson families)
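The Quick Start above only shows LinearRegression. As a rough sketch of binary classification, assuming LogisticRegression mirrors the constructor and fit/predict interface shown for LinearRegression and lives under a parallel header path (both are assumptions, not documented signatures):

#include <statmodels/regression/logistic_regression.hpp>  // assumed header path

#include <vector>

using namespace statmodels;
using namespace autograd;

int main() {
    // Tiny 1-D toy problem: the label is 1 when the feature is large
    std::vector<float> X_data = {0.5f, 1.0f, 1.5f, 3.0f, 3.5f, 4.0f};
    std::vector<float> y_data = {0.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f};

    auto X = from_vector(X_data, {6, 1}, false);
    auto y = from_vector(y_data, {6, 1}, false);

    LogisticRegression model(true);             // fit_intercept = true (assumed)
    model.fit(X, y, 2000, 0.1f, 1e-6f, false);  // max_iter, lr, tol, verbose (assumed to match LinearRegression)

    auto predictions = model.predict(X);        // assumed to return predicted probabilities
    return 0;
}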
Graphical Models (graphical/)
- ConditionalDistribution: Base class for conditional distributions
- ConditionalNormal: Normal with mean/std as functions of parents
- ConditionalGamma: Gamma with rate as function of parents (sketch after this list)
- AncestralSampler: Forward sampling from Bayesian networks
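A sketch of mixing node types in a network, assuming ConditionalGamma is declared in the same header and takes arguments parallel to ConditionalNormal (name, parent names, then one lambda per parameter, here shape and rate); the exact constructor is an assumption, not documented API:

#include <statmodels/graphical/conditional_distribution.hpp>  // assumed to also declare ConditionalGamma

#include <cmath>
#include <memory>
#include <string>
#include <vector>

using namespace statmodels;

int main() {
    // X ~ Normal(0, 1)
    auto X = std::make_shared<ConditionalNormal>(
        "X", std::vector<std::string>{},
        [](const auto&) { return 0.0f; },
        [](const auto&) { return 1.0f; }
    );

    // Z | X ~ Gamma with a fixed shape and a rate that depends on X
    auto Z = std::make_shared<ConditionalGamma>(
        "Z", std::vector<std::string>{"X"},
        [](const auto&)   { return 2.0f; },               // shape (assumed parameter order)
        [](const auto& p) { return std::exp(p.at("X")); } // rate, positive by construction
    );

    AncestralSampler sampler;
    sampler.add_node(X);
    sampler.add_node(Z);
    auto samples = sampler.sample(1000, 7);  // 1000 samples, seed 7
    return 0;
}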
Math Utilities (distributions/math_utils.hpp)
- lgamma_tensor(): Differentiable log-gamma function
- log_beta_function_tensor(): Differentiable log-beta function
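These appear in the log-likelihoods that the Gamma and Beta fitters differentiate. For example, assuming the Gamma distribution's alpha is the shape and beta the rate, its log-density contains a log-gamma term that must be differentiable with respect to alpha:

\log f(x; \alpha, \beta) = \alpha \log \beta + (\alpha - 1)\log x - \beta x - \log \Gamma(\alpha)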
Building Examples
git clone https://github.com/queelius/autograd-stats.git
cd autograd-stats
mkdir build && cd build
cmake ..
make -j$(nproc)
# Run distribution examples
./examples/normal_mle
./examples/gamma_mle
./examples/beta_mle
./examples/poisson_mle
./examples/exponential_mle
./examples/weibull_mle_simple
./examples/lognormal_mle
./examples/gompertz_mle
./examples/mvn_diag_mle
./examples/mvn_full_mle
# Run regression examples
./examples/linear_regression
./examples/logistic_regression
# Run bootstrap example
./examples/exponential_bootstrap
# Run conditional sampling example
./examples/conditional_sampling
Running Tests
cd build
ctest --verbose
# Or run specific test suites
./tests/test_distributions
./tests/test_regression
./tests/test_bootstrap
./tests/test_optimizer
Requirements
- C++17 or later
- CMake 3.14+
- autograd-cpp (automatically fetched)
- Optional: OpenMP for parallelization
Design Philosophy
autograd-stats builds on autograd-cpp to provide:
- Automatic differentiation for gradient-based MLE optimization
- Log-parameterization for constrained parameters (e.g., sigma > 0 uses log_sigma internally; see the sketch after this list)
- Type-safe statistical models with compile-time guarantees
- Composable components for building custom statistical models
- Broadcasting support for scalar-tensor operations in regression
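A minimal, self-contained illustration of the log-parameterization idea, written as plain C++ with a hand-written gradient rather than the library's internals: the optimizer updates the unconstrained log_sigma, and sigma = exp(log_sigma) stays positive automatically.

#include <cmath>
#include <cstdio>
#include <vector>

// Gradient descent on the negative log-likelihood of Normal(mu, sigma),
// with sigma = exp(log_sigma) so no positivity constraint is ever needed.
int main() {
    std::vector<double> x = {1.2, 2.3, 0.8, 1.5, 2.1, 1.8, 2.0};
    double mu = 0.0, log_sigma = 0.0;  // both parameters are unconstrained
    const double lr = 0.05;

    for (int step = 0; step < 2000; ++step) {
        double sigma = std::exp(log_sigma);
        double d_mu = 0.0, d_log_sigma = 0.0;
        for (double xi : x) {
            double z = (xi - mu) / sigma;
            d_mu        += -z / sigma;   // d/d(mu) of 0.5*z^2 + log(sigma)
            d_log_sigma += 1.0 - z * z;  // chain rule through sigma = exp(log_sigma)
        }
        mu        -= lr * d_mu / x.size();
        log_sigma -= lr * d_log_sigma / x.size();
    }
    std::printf("mu = %.4f, sigma = %.4f\n", mu, std::exp(log_sigma));
    return 0;
}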
License
MIT License
Contributing
Contributions welcome! This library depends on autograd-cpp for automatic differentiation.