Accumux is a C++ framework for combining statistical accumulators through algebraic composition. Built on mathematical foundations (monoids, homomorphisms), it computes several statistical measures in a single pass using numerically stable algorithms.
The Problem: Multi-Statistic Computation
Computing multiple statistics over large datasets typically forces one of three compromises:
- Multiple passes over the data, or
- Hand-rolled code that interleaves different algorithms, or
- Naive implementations that are numerically unstable
Accumux solves this with compositional accumulators: combine compatible accumulators with +, process the data once, and extract every result.
Quick Example
#include <vector>
#include "accumux/accumulators/kbn_sum.hpp"
#include "accumux/accumulators/welford.hpp"
#include "accumux/core/composition.hpp"
using namespace accumux;
// Compose accumulators with +
auto stats = kbn_sum<double>() + welford_accumulator<double>();
// Single pass through data
std::vector<double> data = {1.0, 2.0, 3.0, 4.0, 5.0};
for (const auto& value : data) {
    stats += value;
}
// Extract all results
auto sum = stats.get_first().eval(); // 15.0
auto mean = stats.get_second().mean(); // 3.0
auto variance = stats.get_second().sample_variance(); // 2.5
Numerically Stable Algorithms
Accumux uses proven algorithms that maintain accuracy even with ill-conditioned data:
Kahan-Babuška-Neumaier Summation
Standard floating-point summation loses precision:
// Naive left-to-right summation fails on this input
std::vector<double> values = {1.0, 1e100, 1.0, -1e100};
// Naive: 0.0 (wrong: each 1.0 is absorbed when it meets 1e100)
// KBN: 2.0 (correct)
auto summer = kbn_sum<double>();
for (auto v : values) summer += v;
std::cout << summer.eval(); // 2.0
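To make the mechanism concrete, here is a minimal stand-alone sketch of Neumaier's variant of compensated summation. It is not accumux's implementation: the function names neumaier_sum and naive_sum are hypothetical and exist only for illustration. The idea is to keep a second variable that collects the rounding error of every addition and to add it back at the end.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper, not part of accumux.
// Neumaier's compensated summation: the rounding error of each addition
// is accumulated in a separate compensation term.
double neumaier_sum(const std::vector<double>& values) {
    double sum = 0.0;
    double compensation = 0.0;
    for (double v : values) {
        const double t = sum + v;
        if (std::abs(sum) >= std::abs(v)) {
            compensation += (sum - t) + v;  // low-order bits of v were lost
        } else {
            compensation += (v - t) + sum;  // low-order bits of sum were lost
        }
        sum = t;
    }
    return sum + compensation;  // apply the correction once, at the end
}

// Plain left-to-right summation, for comparison.
double naive_sum(const std::vector<double>& values) {
    double sum = 0.0;
    for (double v : values) sum += v;
    return sum;
}
```

With IEEE 754 double arithmetic, neumaier_sum({1.0, 1e100, 1.0, -1e100}) recovers both absorbed 1.0 terms from the compensation variable, while naive_sum on the same input discards them.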
Welford’s Online Algorithm
Computes mean and variance in a single pass without catastrophic cancellation:
auto welford = welford_accumulator<double>();
for (auto v : data) welford += v;
welford.count(); // Number of samples
welford.mean(); // Running mean
welford.sample_variance(); // Unbiased variance
welford.sample_std_dev(); // Standard deviation
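The update rule behind this interface can be sketched in a few lines. The struct below is a hypothetical, simplified stand-in (welford_sketch and its members are not accumux names); it assumes double samples and shows only the textbook form of Welford's recurrence, where the running mean and the sum of squared deviations are updated together.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical minimal accumulator, written only to illustrate the update rule.
struct welford_sketch {
    std::size_t n = 0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the current mean

    void add(double x) {
        ++n;
        const double delta = x - mean;           // deviation from the old mean
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);                // old deviation times new deviation
    }

    // Unbiased (n - 1) estimate; returns 0 when fewer than two samples exist.
    double sample_variance() const {
        return n > 1 ? m2 / static_cast<double>(n - 1) : 0.0;
    }

    double sample_std_dev() const { return std::sqrt(sample_variance()); }
};
```

Because the variance is built from deviations rather than from the difference of two large sums (sum of squares minus squared sum), the subtraction that causes catastrophic cancellation never occurs.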
Min/Max Tracking
auto minmax = minmax_accumulator<double>();
for (auto v : data) minmax += v;
minmax.min(); // Minimum value
minmax.max(); // Maximum value
Algebraic Composition
The key insight: accumulators form a monoid under composition.
// Compose arbitrarily many accumulators
auto financial = kbn_sum<double>() +
                 welford_accumulator<double>() +
                 minmax_accumulator<double>();
std::vector<double> returns = {0.05, -0.02, 0.03, 0.01, -0.01, 0.04};
for (auto ret : returns) {
    financial += ret; // All three update simultaneously
}
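// Extract nested results. C++ parses a + b + c as (a + b) + c, so (assuming
// the composite nests the same way) the first two accumulators sit under get_first().
auto total = financial.get_first().get_first().eval();
auto mean = financial.get_first().get_second().mean();
auto volatility = financial.get_first().get_second().sample_std_dev();
auto worst = financial.get_second().min();
auto best = financial.get_second().max();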
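One way such a composite could be built is as a pair that forwards every observation to both of its parts. The sketch below is an assumption about the general technique, not a description of accumux internals; sum_sketch, minmax_sketch, pair_sketch, and compose are all hypothetical names, and a plain function stands in for the + operator.

```cpp
#include <algorithm>
#include <limits>
#include <utility>

// All names here are hypothetical; this is not accumux's implementation.
struct sum_sketch {
    double total = 0.0;
    sum_sketch& operator+=(double x) { total += x; return *this; }
};

struct minmax_sketch {
    double lo = std::numeric_limits<double>::infinity();
    double hi = -std::numeric_limits<double>::infinity();
    minmax_sketch& operator+=(double x) {
        lo = std::min(lo, x);
        hi = std::max(hi, x);
        return *this;
    }
};

// Product of two accumulators: each observation is forwarded to both parts.
template <typename A, typename B>
struct pair_sketch {
    A first;
    B second;
    pair_sketch& operator+=(double x) {
        first += x;
        second += x;
        return *this;
    }
    const A& get_first() const { return first; }
    const B& get_second() const { return second; }
};

// Builds a composite from two parts; a composite can itself be a part.
template <typename A, typename B>
pair_sketch<A, B> compose(A a, B b) {
    return pair_sketch<A, B>{std::move(a), std::move(b)};
}
```

In this sketch, compose(compose(a, b), c) stores a and b under get_first() and c under get_second(), so a three-part composite is just a pair whose first element is another pair.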
Mathematical Foundation
Monoid Structure
Each accumulator type A forms a monoid:
- Identity: Empty accumulator with no observations
- Binary operation: Merge two accumulators (combine their observations); merging is associative up to floating-point rounding
auto a = welford_accumulator<double>();
auto b = welford_accumulator<double>();
// Process different data (data1 and data2 are two separate ranges of samples)
for (auto v : data1) a += v;
for (auto v : data2) b += v;
// Merge results
auto combined = a + b; // Equivalent to processing data1 followed by data2
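For a Welford-style accumulator, the merge step can be written with the pairwise update usually attributed to Chan, Golub, and LeVeque. The sketch below assumes the accumulator state is a count, a mean, and a sum of squared deviations; moments_sketch and merge are hypothetical names, and accumux may organise this differently.

```cpp
#include <cstddef>

// Hypothetical state of a Welford-style accumulator.
struct moments_sketch {
    std::size_t n = 0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the mean
};

// Combine two partial results into the result for the union of their samples.
moments_sketch merge(const moments_sketch& a, const moments_sketch& b) {
    if (a.n == 0) return b;  // the empty accumulator is the identity
    if (b.n == 0) return a;

    moments_sketch out;
    out.n = a.n + b.n;
    const double na = static_cast<double>(a.n);
    const double nb = static_cast<double>(b.n);
    const double n = static_cast<double>(out.n);
    const double delta = b.mean - a.mean;

    out.mean = a.mean + delta * nb / n;
    out.m2 = a.m2 + b.m2 + delta * delta * na * nb / n;
    return out;
}
```

As a hand check: the samples {1, 2} give n = 2, mean = 1.5, m2 = 0.5, and {3, 4, 5} give n = 3, mean = 4, m2 = 2. Merging them yields n = 5, mean = 3, m2 = 10, which matches a single pass over {1, 2, 3, 4, 5}.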
Homomorphism Property
The composition operation preserves structure:
(a + b).process(x) = a.process(x) + b.process(x)
Combined with the merge operation of the monoid structure, this enables parallel processing: split the data, accumulate each part independently, then merge the results.
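The split, accumulate, merge pattern can be sketched generically. The code below is an illustration under stated assumptions: count_sum_sketch and chunked_accumulate are hypothetical names, the accumulator is assumed to expose add and merge, and the chunks are processed sequentially here. Each per-chunk loop touches only its own accumulator, so those loops are the part that could be handed to separate threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical accumulator with an explicit merge step.
struct count_sum_sketch {
    std::size_t count = 0;
    double sum = 0.0;
    void add(double x) { ++count; sum += x; }
    void merge(const count_sum_sketch& other) {
        count += other.count;
        sum += other.sum;
    }
};

// Split `data` into contiguous chunks, accumulate each chunk independently,
// then merge the partial results into one accumulator.
template <typename Acc>
Acc chunked_accumulate(const std::vector<double>& data, std::size_t chunks) {
    chunks = std::max<std::size_t>(chunks, 1);
    std::vector<Acc> partial(chunks);
    const std::size_t step = (data.size() + chunks - 1) / chunks;  // ceiling division

    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = std::min(c * step, data.size());
        const std::size_t end = std::min(begin + step, data.size());
        for (std::size_t i = begin; i < end; ++i) partial[c].add(data[i]);
    }

    Acc result;  // starts as the identity (no observations)
    for (const Acc& p : partial) result.merge(p);
    return result;
}
```

Because merging is associative and the empty accumulator is the identity, the number of chunks changes only how the work is divided, not the statistics that come out (up to floating-point rounding for sums).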
Type Safety with C++20 Concepts
Invalid compositions fail at compile time:
// Compile error: can't add incompatible accumulators
auto invalid = kbn_sum<double>() + kbn_sum<int>(); // Type mismatch!
// OK: compatible types compose
auto valid = kbn_sum<double>() + welford_accumulator<double>();
Use Cases
- Financial Analysis: Track returns, volatility, drawdowns in one pass
- Scientific Computing: Online statistics for streaming sensor data
- Machine Learning: Feature statistics during data preprocessing
- Monitoring Systems: Real-time metrics aggregation
Performance
- O(1) space per accumulator (constant memory regardless of data size)
- O(n) time for n data points (single pass)
- Zero allocations during accumulation
- Header-only: No linking, no dependencies
Installation
The library is header-only, so just include the headers you need:
#include "accumux/accumulators/kbn_sum.hpp"
#include "accumux/accumulators/welford.hpp"
#include "accumux/core/composition.hpp"
Or with CMake:
add_subdirectory(accumux)
target_link_libraries(your_target PRIVATE accumux::accumux)
Resources
- GitHub: github.com/queelius/accumux
- Paper: AccuMux Technical Report
Accumux: Compose your statistics, compute in one pass, trust the math.