Accumux: Compositional Online Statistical Reductions in C++

Accumux is a framework for combining statistical accumulators using algebraic composition. The idea is simple: accumulators form a monoid under composition, so you can combine them with +, process data in a single pass, and extract all results.

The Problem

Computing multiple statistics over large datasets usually means multiple passes over the data, hand-rolled code combining different algorithms, or numerical instability from naive implementations. Accumux addresses all three: composed accumulators update together in a single pass, and each one uses a numerically stable algorithm.

Quick Example

#include <vector>

#include "accumux/accumulators/kbn_sum.hpp"
#include "accumux/accumulators/welford.hpp"
#include "accumux/core/composition.hpp"

using namespace accumux;

// Compose accumulators with +
auto stats = kbn_sum<double>() + welford_accumulator<double>();

// Single pass through data
std::vector<double> data = {1.0, 2.0, 3.0, 4.0, 5.0};
for (const auto& value : data) {
    stats += value;
}

// Extract all results
auto sum = stats.get_first().eval();           // 15.0
auto mean = stats.get_second().mean();         // 3.0
auto variance = stats.get_second().sample_variance();  // 2.5

Numerically Stable Algorithms

Accumux uses proven algorithms that maintain accuracy even with ill-conditioned data.

Kahan-Babushka-Neumaier Summation

Standard floating-point summation loses precision:

// Naive summation loses both 1.0 terms: each one is absorbed by 1e100
std::vector<double> values = {1.0, 1e100, 1.0, -1e100};
// Naive: 0.0 (wrong!)
// KBN:   2.0 (correct!)

auto summer = kbn_sum<double>();
for (auto v : values) summer += v;
std::cout << summer.eval();  // 2.0
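
To make the compensation step concrete, here is a minimal standalone sketch of the Neumaier variant of compensated summation. It is not Accumux's kbn_sum; the names neumaier_sum, naive_sum, and compensated_sum are hypothetical and exist only for this illustration.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for illustration; not Accumux's kbn_sum.
struct neumaier_sum {
    double sum = 0.0;           // running, uncompensated total
    double compensation = 0.0;  // low-order bits lost by the additions so far

    neumaier_sum& operator+=(double x) {
        const double t = sum + x;
        if (std::fabs(sum) >= std::fabs(x)) {
            compensation += (sum - t) + x;  // low-order bits of x were lost
        } else {
            compensation += (x - t) + sum;  // low-order bits of sum were lost
        }
        sum = t;
        return *this;
    }

    // The correction is applied once, when the result is read.
    double eval() const { return sum + compensation; }
};

// Plain left-to-right summation, for comparison.
inline double naive_sum(const std::vector<double>& xs) {
    double s = 0.0;
    for (double x : xs) s += x;
    return s;
}

inline double compensated_sum(const std::vector<double>& xs) {
    neumaier_sum acc;
    for (double x : xs) acc += x;
    return acc.eval();
}
```

On the four values above, naive_sum returns 0.0 while compensated_sum returns 2.0, because the two 1.0 terms end up in the compensation term instead of being discarded.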

Welford’s Online Algorithm

Computes mean and variance in a single pass without catastrophic cancellation:

auto welford = welford_accumulator<double>();
for (auto v : data) welford += v;

welford.count();           // Number of samples
welford.mean();            // Running mean
welford.sample_variance(); // Unbiased variance
welford.sample_std_dev();  // Standard deviation
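
The update rule behind this can be sketched in a few lines. The struct below is a hypothetical standalone illustration (welford_state is not an Accumux type) that keeps only a count, the running mean, and the running sum of squared deviations.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for illustration; not Accumux's welford_accumulator.
struct welford_state {
    std::size_t n = 0;  // number of samples seen
    double mean = 0.0;  // running mean
    double m2 = 0.0;    // running sum of squared deviations from the mean

    welford_state& operator+=(double x) {
        ++n;
        const double delta = x - mean;        // deviation from the old mean
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);             // uses old and new mean
        return *this;
    }

    // Unbiased (n - 1) variance; defined as 0 for fewer than two samples.
    double sample_variance() const {
        return n > 1 ? m2 / static_cast<double>(n - 1) : 0.0;
    }

    double sample_std_dev() const { return std::sqrt(sample_variance()); }
};
```

Because each update works with deviations from the current mean, the sketch never subtracts two large, nearly equal sums, which is where the textbook sum-of-squares formula loses precision.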

Min/Max Tracking

auto minmax = minmax_accumulator<double>();
for (auto v : data) minmax += v;

minmax.min();  // Minimum value
minmax.max();  // Maximum value

Algebraic Composition

The key insight is that accumulators form a monoid under composition.

// Compose arbitrarily many accumulators
auto financial = kbn_sum<double>() +
                 welford_accumulator<double>() +
                 minmax_accumulator<double>();

std::vector<double> returns = {0.05, -0.02, 0.03, 0.01, -0.01, 0.04};
for (auto ret : returns) {
    financial += ret;  // All three update simultaneously
}

// Extract nested results: a + b + c groups as (a + b) + c
auto total = financial.get_first().get_first().eval();
auto mean = financial.get_first().get_second().mean();
auto volatility = financial.get_first().get_second().sample_std_dev();
auto worst = financial.get_second().min();
auto best = financial.get_second().max();

Mathematical Foundation

Monoid Structure

Each accumulator type A forms a monoid. The identity is the empty accumulator with no observations. The binary operation merges two accumulators (combining their observations), and it is associative, so the grouping of merges does not affect the result.

auto a = welford_accumulator<double>();
auto b = welford_accumulator<double>();

// Process different data: data1 and data2 are two disjoint chunks of the input
for (auto v : data1) a += v;
for (auto v : data2) b += v;

// Merge results
auto combined = a + b;  // Equivalent to processing data1 ++ data2
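
As an illustration of what such a merge has to compute, the sketch below combines two partial mean/variance states with the standard pairwise update formula. This is an assumption about how a merge can be done, not a description of Accumux's internals; moments, accumulate_all, and merge are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for illustration; not an Accumux type.
struct moments {
    std::size_t n = 0;  // number of samples
    double mean = 0.0;  // mean of the samples
    double m2 = 0.0;    // sum of squared deviations from the mean

    // Add a single observation (Welford update).
    moments& operator+=(double x) {
        ++n;
        const double delta = x - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (x - mean);
        return *this;
    }

    // Merge another partial result into this one.
    moments& operator+=(const moments& other) {
        if (other.n == 0) return *this;  // merging the identity changes nothing
        if (n == 0) { *this = other; return *this; }
        const double na = static_cast<double>(n);
        const double nb = static_cast<double>(other.n);
        const double total = na + nb;
        const double delta = other.mean - mean;
        mean += delta * nb / total;
        m2 += other.m2 + delta * delta * na * nb / total;
        n += other.n;
        return *this;
    }

    double sample_variance() const {
        return n > 1 ? m2 / static_cast<double>(n - 1) : 0.0;
    }
};

// Accumulate one chunk of data into a fresh state.
inline moments accumulate_all(const std::vector<double>& xs) {
    moments m;
    for (double x : xs) m += x;
    return m;
}

// Value-returning merge, the monoid's binary operation in this sketch.
inline moments merge(moments a, const moments& b) {
    a += b;
    return a;
}
```

Merging the states for {1, 2} and {3, 4, 5} gives the same count, mean, and sample variance as accumulating {1, 2, 3, 4, 5} directly, and merging with a default-constructed moments leaves a state unchanged.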

Homomorphism Property

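Composition preserves structure: feeding a value to a composed accumulator is the same as feeding it to each component and then composing the results.

(a + b).process(x) = a.process(x) + b.process(x)

This is why a composed accumulator can update every component in a single pass. Combined with the monoid merge above, it also enables parallel processing: split the data, accumulate each chunk independently, then merge the results.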
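
The property can be checked on a toy composition. The sketch below is hypothetical (sum_acc, count_acc, pair_acc, and compose are illustration-only names, and compose stands in for the library's +); it pairs two accumulators and forwards every value to both.

```cpp
#include <cstddef>

// Two toy accumulators, hypothetical and for illustration only.
struct sum_acc {
    double total = 0.0;
    sum_acc& operator+=(double x) { total += x; return *this; }
};

struct count_acc {
    std::size_t n = 0;
    count_acc& operator+=(double) { ++n; return *this; }
};

// A pair of accumulators that is itself an accumulator:
// every value is forwarded to both components.
template <typename A, typename B>
struct pair_acc {
    A first;
    B second;

    pair_acc(A a, B b) : first(a), second(b) {}

    pair_acc& operator+=(double x) {
        first += x;
        second += x;
        return *this;
    }

    const A& get_first() const { return first; }
    const B& get_second() const { return second; }
};

// Stand-in for the composition operator in this sketch.
template <typename A, typename B>
pair_acc<A, B> compose(A a, B b) {
    return pair_acc<A, B>(a, b);
}
```

Feeding the same values to compose(sum_acc(), count_acc()) and to a separate sum_acc and count_acc leaves the components in identical states, which is the equation above written out for one concrete pair.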

Type Safety with C++20 Concepts

Invalid compositions fail at compile time:

// Compile error: can't add incompatible accumulators
auto invalid = kbn_sum<double>() + kbn_sum<int>();  // Type mismatch!

// OK: compatible types compose
auto valid = kbn_sum<double>() + welford_accumulator<double>();

Use Cases

Financial analysis (track returns, volatility, drawdowns in one pass), scientific computing (online statistics for streaming sensor data), machine learning (feature statistics during data preprocessing), and monitoring systems (real-time metrics aggregation).

Performance

Each accumulator uses O(1) space (constant memory regardless of data size) and processes n data points in O(n) time in a single pass. Accumulation performs zero allocations. The library is header-only: no linking, no dependencies.

Installation

Accumux is header-only, so you only need to include the headers you use:

#include "accumux/accumulators/kbn_sum.hpp"
#include "accumux/accumulators/welford.hpp"
#include "accumux/core/composition.hpp"

Or with CMake:

add_subdirectory(accumux)
target_link_libraries(your_target PRIVATE accumux::accumux)
