
algebraic.dist: Treating Distributions as First-Class Algebraic Objects in R

Most statistical software treats probability distributions as static parameter sets you pass to sampling or density functions. algebraic.dist takes a different approach: distributions are algebraic objects that compose, transform, and combine using standard mathematical operations.

The Core Idea

Instead of writing:

x <- rnorm(1000, mean=5, sd=2)
y <- rnorm(1000, mean=3, sd=1)
z <- x + y  # Just numeric vectors

You write:

X <- Normal(mean=5, sd=2)
Y <- Normal(mean=3, sd=1)
Z <- X + Y  # A new distribution object!
sample(Z, 1000)

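Assuming X and Y are independent, the sum Z knows it’s Normal(mean=8, sd=sqrt(5)): the means add (5 + 3 = 8) and the variances add (4 + 1 = 5), and the algebra works this out for you.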
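A quick way to convince yourself of the closed form stated above is to check it against simulation. This sketch uses only base R (rnorm, mean, sd) and makes no assumptions about the algebraic.dist API; the sample size and seed are arbitrary:

```r
set.seed(42)                      # reproducible draws
n <- 200000

# Independent samples from the two normals in the example
x <- rnorm(n, mean = 5, sd = 2)
y <- rnorm(n, mean = 3, sd = 1)
z <- x + y

# Closed-form parameters of the sum under independence
z_mean <- 5 + 3
z_sd   <- sqrt(2^2 + 1^2)

# Empirical moments should sit close to the closed-form values
c(empirical_mean = mean(z), exact_mean = z_mean)
c(empirical_sd = sd(z), exact_sd = z_sd)
```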

Why This Matters

1. Correctness Through Type Safety

When you add samples drawn from two normal distributions, you get samples from the sum, but you lose the distributional structure: the result is just a numeric vector. With algebraic.dist, the result is still a distribution object with its own parameters.

2. Symbolic Computation Before Sampling

You can build complex distributional expressions and simplify them algebraically before ever drawing a sample:

# StockA and StockB are distribution objects defined elsewhere (assumed independent)
portfolio <- 0.6*StockA + 0.4*StockB
risk <- sd(portfolio)  # Computed symbolically
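To make the "computed symbolically" step concrete, here is a minimal base-R sketch of the variance rule behind it. The parameters for the two stocks are made-up illustrative numbers, the stocks are assumed independent, and the helper weighted_sum_sd is hypothetical rather than part of algebraic.dist:

```r
# Hypothetical return distributions (illustrative numbers only)
stock_a <- list(mean = 0.08, sd = 0.20)
stock_b <- list(mean = 0.05, sd = 0.10)

# sd of w_a*A + w_b*B for independent A and B:
# Var = w_a^2 * Var(A) + w_b^2 * Var(B)
weighted_sum_sd <- function(w_a, a, w_b, b) {
  sqrt(w_a^2 * a$sd^2 + w_b^2 * b$sd^2)
}

risk <- weighted_sum_sd(0.6, stock_a, 0.4, stock_b)
risk
```

For correlated assets the variance picks up an extra term, 2 * w_a * w_b * Cov(A, B), so the independence assumption matters for the result.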

3. Monte Carlo Without the Monte Carlo

For distributions with known closed-form algebra (normal, exponential, certain mixtures), you don’t need simulation: you compute the exact answer directly.
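One well-known closed form of this kind: the sum of n independent Exponential(rate) variables is Gamma(shape = n, rate). The sketch below uses only base R (pgamma, rexp) to compare the exact tail probability with a Monte Carlo estimate. The specific numbers are illustrative, and nothing here is taken from the package itself:

```r
set.seed(1)
n_terms <- 3      # number of exponential terms in the sum
rate    <- 2      # common rate parameter
cutoff  <- 2.5    # we want P(sum > cutoff)

# Exact: the sum is Gamma(shape = n_terms, rate = rate)
exact <- pgamma(cutoff, shape = n_terms, rate = rate, lower.tail = FALSE)

# Monte Carlo: simulate the sum directly and count exceedances
draws <- matrix(rexp(n_terms * 100000, rate = rate), ncol = n_terms)
mc    <- mean(rowSums(draws) > cutoff)

c(exact = exact, monte_carlo = mc)
```

The exact value costs one function call and carries no sampling error, while the simulated estimate only approaches it as the number of draws grows.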

Compositional Statistics

This is functional programming for probability theory. Distributions become composable building blocks:

  • Mixture models: 0.3*Normal(0,1) + 0.7*Normal(5,2), with the weights read as mixing probabilities
  • Transformed distributions: exp(Normal(0,1)) is lognormal
  • Conditional distributions: X | (X > 0) for truncation
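These three constructions can be sketched in base R without relying on any particular package API. Note that a mixture picks one component per draw with the given probabilities, which is different from the weighted sum of two random variables in the portfolio example. The second argument of each Normal is assumed here to be a standard deviation:

```r
set.seed(7)
n <- 100000

# Mixture: choose a component per draw (30% / 70%), then sample from it
from_first <- runif(n) < 0.3
mixture <- ifelse(from_first, rnorm(n, 0, 1), rnorm(n, 5, 2))

# Transformation: exp of a standard normal is lognormal(meanlog = 0, sdlog = 1)
lognormal <- exp(rnorm(n, 0, 1))

# Conditioning: X | (X > 0) by rejection, keeping only positive draws
x <- rnorm(n, 0, 1)
truncated <- x[x > 0]

c(mixture_mean = mean(mixture), exact = 0.3 * 0 + 0.7 * 5)
c(lognormal_median = median(lognormal), exact = qlnorm(0.5, 0, 1))
range(truncated)
```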

Connection to My Research

This package embodies a core theme in my work: computation should mirror mathematical structure.

Just as my oblivious computing research uses type theory to enforce privacy invariants, algebraic.dist uses algebraic types to enforce distributional invariants. The algebra tells you what operations are valid and what the results mean.

Related projects:

  • algebraic.mle: Maximum likelihood estimation with algebraic specification
  • numerical.mle: Numerical optimization for MLE when closed forms don’t exist
  • likelihood.model: Likelihood-based inference with compositional model building

Technical Details

  • Language: R
  • Type system: S3 classes with method dispatch for operations
  • Closed-form operations: Normal, exponential, gamma families
  • Fallback: Monte Carlo for complex compositions
  • Repository: github.com/queelius/algebraic.dist
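As a rough illustration of how S3 dispatch can combine closed forms with a sampling fallback, here is a self-contained toy. All class and function names (toy_normal, toy_exponential, toy_empirical, draw) are hypothetical and are not the package's API; consult the repository for the real interface:

```r
# Constructors for two toy distribution classes
toy_normal <- function(mean, sd) {
  structure(list(mean = mean, sd = sd), class = c("toy_normal", "toy_dist"))
}
toy_exponential <- function(rate) {
  structure(list(rate = rate), class = c("toy_exponential", "toy_dist"))
}

# Generic sampler with one method per class
draw <- function(d, n) UseMethod("draw")
draw.toy_normal      <- function(d, n) rnorm(n, d$mean, d$sd)
draw.toy_exponential <- function(d, n) rexp(n, d$rate)
draw.toy_empirical   <- function(d, n) sample(d$draws, n, replace = TRUE)

# Addition of independent distributions, dispatched on the shared parent class
"+.toy_dist" <- function(e1, e2) {
  if (inherits(e1, "toy_normal") && inherits(e2, "toy_normal")) {
    # Closed form: means add, variances add
    return(toy_normal(e1$mean + e2$mean, sqrt(e1$sd^2 + e2$sd^2)))
  }
  # Fallback: represent the sum by Monte Carlo draws
  draws <- draw(e1, 10000) + draw(e2, 10000)
  structure(list(draws = draws), class = c("toy_empirical", "toy_dist"))
}

z <- toy_normal(5, 2) + toy_normal(3, 1)    # stays a toy_normal
w <- toy_normal(5, 2) + toy_exponential(1)  # falls back to toy_empirical
class(z)[1]
class(w)[1]
```

The design choice worth noting is that the fallback still returns a distribution-like object, so downstream code can keep calling draw() without caring whether the result was exact or simulated.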

The Bigger Picture

Most statistical software is imperative: you tell the computer what to do step-by-step. algebraic.dist is declarative: you describe the distributional relationships, and the computer figures out what to compute.

This mirrors the Unix philosophy: small composable pieces that do one thing well. Here, the “one thing” is: preserve distributional structure through transformations.
