Accumux: Compositional Online Statistical Reductions in C++

Accumux is a C++ framework for combining statistical accumulators through algebraic composition. It builds on two mathematical ideas, monoids and homomorphisms, and computes several statistics in a single pass using numerically stable algorithms.

The Problem: Multi-Statistic Computation

Computing several statistics over a large dataset usually forces one of three compromises:

  • Multiple passes over the data, or
  • Hand-rolled code that interleaves different algorithms, or
  • Naive single-pass formulas that are numerically unstable

Accumux solves this with compositional accumulators: combine compatible accumulators with +, process the data once, then extract every result.

Quick Example

#include <vector>

#include "accumux/accumulators/kbn_sum.hpp"
#include "accumux/accumulators/welford.hpp"
#include "accumux/core/composition.hpp"

using namespace accumux;

// Compose accumulators with +
auto stats = kbn_sum<double>() + welford_accumulator<double>();

// Single pass through data
std::vector<double> data = {1.0, 2.0, 3.0, 4.0, 5.0};
for (const auto& value : data) {
    stats += value;
}

// Extract all results
auto sum = stats.get_first().eval();           // 15.0
auto mean = stats.get_second().mean();         // 3.0
auto variance = stats.get_second().sample_variance();  // 2.5

Numerically Stable Algorithms

Accumux uses proven algorithms that maintain accuracy even with ill-conditioned data:

Kahan-Babuška-Neumaier Summation

Naive floating-point summation drops low-order bits when the terms differ widely in magnitude:

// Naive sum fails on this
std::vector<double> values = {1.0, 1e100, 1.0, -1e100};
// Naive: 0.0 (wrong!)
// KBN:   2.0 (correct!)

auto summer = kbn_sum<double>();
for (auto v : values) summer += v;
std::cout << summer.eval();  // 2.0
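
To show where the lost bits go, here is a minimal standalone sketch of Neumaier's variant of compensated summation. The function names `neumaier_sum` and `naive_sum` are hypothetical and are not part of Accumux; the library's `kbn_sum` may be organized differently.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper, not part of Accumux: Neumaier-compensated summation.
double neumaier_sum(const std::vector<double>& values) {
    double sum = 0.0;           // running sum, subject to rounding
    double compensation = 0.0;  // accumulated low-order bits lost so far
    for (double x : values) {
        const double t = sum + x;
        if (std::fabs(sum) >= std::fabs(x)) {
            compensation += (sum - t) + x;  // low-order bits of x were lost
        } else {
            compensation += (x - t) + sum;  // low-order bits of sum were lost
        }
        sum = t;
    }
    return sum + compensation;  // apply the correction once, at the end
}

// Plain left-to-right summation, for comparison.
double naive_sum(const std::vector<double>& values) {
    double sum = 0.0;
    for (double x : values) sum += x;
    return sum;
}
```

On the four values above, `naive_sum` returns 0.0 because each 1.0 is absorbed by the neighbouring 1e100, while `neumaier_sum` recovers both in the compensation term and returns 2.0.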

Welford’s Online Algorithm

Computes mean and variance in a single pass without catastrophic cancellation:

auto welford = welford_accumulator<double>();
for (auto v : data) welford += v;

welford.count();           // Number of samples
welford.mean();            // Running mean
welford.sample_variance(); // Unbiased variance
welford.sample_std_dev();  // Standard deviation
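
The update rule behind those accessors is short. The following is a minimal sketch of Welford's recurrence under the assumption that the state is a count, a running mean, and a sum of squared deviations. The class name `welford_sketch` is hypothetical; it only mirrors the method names used above and is not the library's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for illustration; not the library's welford_accumulator.
class welford_sketch {
    std::size_t n_ = 0;  // number of samples seen
    double mean_ = 0.0;  // running mean
    double m2_ = 0.0;    // sum of squared deviations from the running mean
public:
    welford_sketch& operator+=(double x) {
        ++n_;
        const double delta = x - mean_;  // deviation from the old mean
        mean_ += delta / static_cast<double>(n_);
        m2_ += delta * (x - mean_);      // old deviation times new deviation
        return *this;
    }
    std::size_t count() const { return n_; }
    double mean() const { return mean_; }
    // Unbiased (n - 1) estimator; returns 0 for fewer than two samples.
    double sample_variance() const {
        return n_ > 1 ? m2_ / static_cast<double>(n_ - 1) : 0.0;
    }
    double sample_std_dev() const { return std::sqrt(sample_variance()); }
};
```

Because the recurrence only ever subtracts a value from the current mean, it avoids the large intermediate sums of squares that make the textbook formula E[x²] − E[x]² cancel catastrophically.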

Min/Max Tracking

Tracks the smallest and largest values seen so far:

auto minmax = minmax_accumulator<double>();
for (auto v : data) minmax += v;

minmax.min();  // Minimum value
minmax.max();  // Maximum value

Algebraic Composition

The key insight: accumulators form a monoid under composition.

// Compose arbitrarily many accumulators
auto financial = kbn_sum<double>() +
                 welford_accumulator<double>() +
                 minmax_accumulator<double>();

std::vector<double> returns = {0.05, -0.02, 0.03, 0.01, -0.01, 0.04};
for (auto ret : returns) {
    financial += ret;  // All three update simultaneously
}

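// Extract nested results. C++ parses a + b + c as (a + b) + c, so, assuming
// operator+ nests its operands in that order, the sum/Welford pair sits in
// get_first() and the min/max tracker in get_second().
auto total = financial.get_first().get_first().eval();
auto mean = financial.get_first().get_second().mean();
auto volatility = financial.get_first().get_second().sample_std_dev();
auto worst = financial.get_second().min();
auto best = financial.get_second().max();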
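
One way such a composition could be built is as a pair type that forwards every value to both parts. The sketch below is an assumption about the general technique, not the library's code: the `sketch` namespace, `accumulator_tag`, `toy_sum`, `toy_minmax`, and `composed` are all hypothetical names, and only `get_first()`/`get_second()` echo the accessors used above.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <type_traits>
#include <utility>

namespace sketch {

// Marker base so operator+ below only applies to these toy types.
struct accumulator_tag {};

struct toy_sum : accumulator_tag {
    double total = 0.0;
    toy_sum& operator+=(double x) { total += x; return *this; }
    double eval() const { return total; }
};

struct toy_minmax : accumulator_tag {
    double lo = std::numeric_limits<double>::infinity();
    double hi = -std::numeric_limits<double>::infinity();
    toy_minmax& operator+=(double x) {
        lo = std::min(lo, x);
        hi = std::max(hi, x);
        return *this;
    }
    double min() const { return lo; }
    double max() const { return hi; }
};

// Product of two accumulators: every value is forwarded to both parts.
template <typename A, typename B>
class composed : public accumulator_tag {
    A first_;
    B second_;
public:
    composed(A a, B b) : first_(std::move(a)), second_(std::move(b)) {}
    composed& operator+=(double x) {
        first_ += x;
        second_ += x;
        return *this;
    }
    const A& get_first() const { return first_; }
    const B& get_second() const { return second_; }
};

// Composition operator, enabled only for types derived from accumulator_tag.
template <typename A, typename B,
          typename = typename std::enable_if<
              std::is_base_of<accumulator_tag, A>::value &&
              std::is_base_of<accumulator_tag, B>::value>::type>
composed<A, B> operator+(A a, B b) {
    return composed<A, B>(std::move(a), std::move(b));
}

}  // namespace sketch
```

In this sketch a three-way composition has the type `composed<composed<A, B>, C>`, because `+` groups left to right. The first two accumulators are therefore reached through `get_first()` and the third through `get_second()`.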

Mathematical Foundation

Monoid Structure

Each accumulator type A forms a monoid:

  • Identity: Empty accumulator with no observations
  • Binary operation: Merge two accumulators (combine their observations); the merge is associative up to floating-point rounding, so the grouping of merges does not matter

auto a = welford_accumulator<double>();
auto b = welford_accumulator<double>();

// Process two disjoint datasets (data1 and data2: any ranges of double)
for (auto v : data1) a += v;
for (auto v : data2) b += v;

// Merge results
auto combined = a + b;  // Equivalent to processing data1 followed by data2
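
For mean and variance, the merge step can be written in closed form using the pairwise update usually attributed to Chan, Golub, and LeVeque. The sketch below assumes a plain-struct state; `moments`, `push`, `merge`, and `sample_variance` are hypothetical names chosen for illustration and may not match how the library implements merging.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical plain-struct state for illustration; not the library's type.
struct moments {
    std::size_t n = 0;  // sample count
    double mean = 0.0;  // mean of the samples
    double m2 = 0.0;    // sum of squared deviations from the mean
};

// Fold one observation into the state (Welford's update).
inline void push(moments& s, double x) {
    ++s.n;
    const double delta = x - s.mean;
    s.mean += delta / static_cast<double>(s.n);
    s.m2 += delta * (x - s.mean);
}

// Monoid operation: combine two states as if their samples were concatenated.
inline moments merge(const moments& a, const moments& b) {
    if (a.n == 0) return b;  // the empty state is the identity
    if (b.n == 0) return a;
    moments out;
    out.n = a.n + b.n;
    const double na = static_cast<double>(a.n);
    const double nb = static_cast<double>(b.n);
    const double n = na + nb;
    const double delta = b.mean - a.mean;
    out.mean = a.mean + delta * nb / n;
    out.m2 = a.m2 + b.m2 + delta * delta * na * nb / n;
    return out;
}

inline double sample_variance(const moments& s) {
    return s.n > 1 ? s.m2 / static_cast<double>(s.n - 1) : 0.0;
}
```

Merging the state for {1, 2} with the state for {3, 4, 5} gives the same count, mean, and variance as pushing all five values into one state, which is the monoid property in concrete form.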

Homomorphism Property

Composition preserves structure: feeding a value to a composed accumulator is the same as feeding it to each part and composing the results:

(a + b).process(x) = a.process(x) + b.process(x)

Together with the merge operation, this enables parallel processing: split the data, accumulate each part independently, then merge the partial results.
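
The split, accumulate, and merge pattern can be sketched without any threading library. In the example below, `range_acc` and `chunked_reduce` are hypothetical names and the chunks are processed sequentially. Since no chunk reads another chunk's state, each inner loop could be handed to its own thread or task.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical toy accumulator with a merge operation; not an Accumux type.
struct range_acc {
    std::size_t n = 0;
    double lo = std::numeric_limits<double>::infinity();
    double hi = -std::numeric_limits<double>::infinity();
    void push(double x) {
        ++n;
        lo = std::min(lo, x);
        hi = std::max(hi, x);
    }
    void merge(const range_acc& o) {
        n += o.n;
        lo = std::min(lo, o.lo);
        hi = std::max(hi, o.hi);
    }
};

// Split data into `chunks` contiguous pieces, reduce each piece on its own,
// then fold the partial states together with merge.
inline range_acc chunked_reduce(const std::vector<double>& data,
                                std::size_t chunks) {
    if (chunks == 0) chunks = 1;
    std::vector<range_acc> partial(chunks);
    const std::size_t step = (data.size() + chunks - 1) / chunks;  // ceiling
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = std::min(c * step, data.size());
        const std::size_t end = std::min(begin + step, data.size());
        for (std::size_t i = begin; i < end; ++i) partial[c].push(data[i]);
    }
    range_acc total;  // starts as the identity element
    for (const range_acc& p : partial) total.merge(p);
    return total;
}
```

The result does not depend on the number of chunks: one chunk and four chunks yield the same count, minimum, and maximum.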

Type Safety with C++20 Concepts

Invalid compositions fail at compile time:

// Compile error: can't add incompatible accumulators
auto invalid = kbn_sum<double>() + kbn_sum<int>();  // Type mismatch!

// OK: compatible types compose
auto valid = kbn_sum<double>() + welford_accumulator<double>();
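
A compile-time check of this kind can be sketched as a constraint on the composition operator. The version below assumes that compatibility means "both sides share a value type", which is what the mismatch comment above suggests. It uses `std::enable_if` rather than a C++20 concept so that it also compiles under older standards; a `requires` clause could state the same condition. All names in the `sketch` namespace are hypothetical.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

namespace sketch {

// Hypothetical toy accumulators that expose their value type.
template <typename T>
struct toy_sum {
    using value_type = T;
    T total{};
    toy_sum& operator+=(T x) { total += x; return *this; }
};

template <typename T>
struct toy_count {
    using value_type = T;
    unsigned long n = 0;
    toy_count& operator+=(T) { ++n; return *this; }
};

template <typename A, typename B>
struct composed {
    using value_type = typename A::value_type;
    A first;
    B second;
    composed& operator+=(value_type x) {
        first += x;
        second += x;
        return *this;
    }
};

// Composition only exists when both sides agree on value_type, so a
// mismatch is rejected during overload resolution, i.e. at compile time.
template <typename A, typename B,
          typename = typename std::enable_if<std::is_same<
              typename A::value_type, typename B::value_type>::value>::type>
composed<A, B> operator+(A a, B b) {
    return composed<A, B>{std::move(a), std::move(b)};
}

// Detects whether `A + B` is a well-formed expression.
template <typename A, typename B, typename = void>
struct can_compose : std::false_type {};

template <typename A, typename B>
struct can_compose<A, B,
                   decltype(void(std::declval<A>() + std::declval<B>()))>
    : std::true_type {};

static_assert(can_compose<toy_sum<double>, toy_count<double>>::value,
              "matching value types compose");
static_assert(!can_compose<toy_sum<double>, toy_sum<int>>::value,
              "mismatched value types are rejected");

}  // namespace sketch
```

The two `static_assert` lines double as documentation: the first pairing compiles, while the second never yields a usable `operator+`, so writing it out directly would stop the build.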

Use Cases

  • Financial Analysis: Track returns, volatility, drawdowns in one pass
  • Scientific Computing: Online statistics for streaming sensor data
  • Machine Learning: Feature statistics during data preprocessing
  • Monitoring Systems: Real-time metrics aggregation

Performance

  • O(1) space per accumulator (constant memory regardless of data size)
  • O(n) time for n data points (single pass)
  • Zero allocations during accumulation
  • Header-only: No linking, no dependencies

Installation

Accumux is header-only; include the headers you need:

#include "accumux/accumulators/kbn_sum.hpp"
#include "accumux/accumulators/welford.hpp"
#include "accumux/core/composition.hpp"

Or with CMake:

add_subdirectory(accumux)
target_link_libraries(your_target PRIVATE accumux::accumux)

Accumux: Compose your statistics, compute in one pass, trust the math.
