Most statistical software treats probability distributions as static parameter sets you pass to sampling or density functions. algebraic.dist takes a different approach: distributions are algebraic objects that compose, transform, and combine using standard mathematical operations.
The Core Idea
Instead of writing:
x <- rnorm(1000, mean=5, sd=2)
y <- rnorm(1000, mean=3, sd=1)
z <- x + y # Just numeric vectors
You write:
library(algebraic.dist)
X <- Normal(mean=5, sd=2)
Y <- Normal(mean=3, sd=1)
Z <- X + Y # A new distribution object!
sample(Z, 1000)
The sum Z knows it’s Normal(mean=8, sd=sqrt(5)) because the algebra works it out. For independent normals, the means add (5 + 3 = 8) and so do the variances (2² + 1² = 5).
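Here is a minimal sketch of how S3 dispatch could carry out this algebra, assuming independence. The class `normal_rv` and its methods are hypothetical illustrations in base R, not the algebraic.dist API. The `Ops` group generic intercepts `+` and returns a new object. Its mean is the sum of the means, and its standard deviation is the square root of the summed variances.

```r
# Hypothetical S3 class for illustration only; not the algebraic.dist API.
normal_rv <- function(mean, sd) {
  stopifnot(is.numeric(mean), is.numeric(sd), sd > 0)
  structure(list(mean = mean, sd = sd), class = "normal_rv")
}

# Ops group generic: R calls this for +, -, *, ... when an operand is a normal_rv.
Ops.normal_rv <- function(e1, e2) {
  if (.Generic == "+" && !missing(e2) &&
      inherits(e1, "normal_rv") && inherits(e2, "normal_rv")) {
    # Sum of independent normals: means add, variances add.
    return(normal_rv(e1$mean + e2$mean, sqrt(e1$sd^2 + e2$sd^2)))
  }
  stop("operation not supported in this sketch: ", .Generic)
}

print.normal_rv <- function(x, ...) {
  cat(sprintf("Normal(mean=%g, sd=%g)\n", x$mean, x$sd))
  invisible(x)
}

X <- normal_rv(mean = 5, sd = 2)
Y <- normal_rv(mean = 3, sd = 1)
Z <- X + Y
Z  # prints: Normal(mean=8, sd=2.23607)
```

Unsupported operations fail loudly here rather than silently falling back. That is one possible design choice for keeping results exact.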
Why This Matters
1. Correctness Through Type Safety
When you add samples from two normal distributions, you get samples from the sum, but you lose the distributional structure: the result is just a vector of numbers. With algebraic.dist, the result is still a distribution object with its exact parameters.
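To make the loss of structure concrete, here is a plain base-R comparison. The seed and sample size are arbitrary. The simulated sum is an ordinary numeric vector, and its mean and standard deviation only approximate the exact values 8 and sqrt(5).

```r
set.seed(42)  # arbitrary seed, for reproducibility
x <- rnorm(1e5, mean = 5, sd = 2)
y <- rnorm(1e5, mean = 3, sd = 1)
z <- x + y

class(z)           # "numeric": just draws, no parameters attached
c(mean(z), sd(z))  # near 8 and 2.236, with sampling error

# The exact parameters, which the samples can only approximate
# (independence of x and y assumed):
exact_mean <- 5 + 3
exact_sd <- sqrt(2^2 + 1^2)
```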
2. Symbolic Computation Before Sampling
You can build complex distributional expressions and simplify them algebraically before ever drawing a sample:
# StockA and StockB are distribution objects defined earlier
portfolio <- 0.6*StockA + 0.4*StockB
risk <- sd(portfolio) # Computed symbolically
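As a worked example of what "computed symbolically" means here, the sketch below uses hypothetical return models, StockA ~ Normal(0.08, 0.20) and StockB ~ Normal(0.05, 0.10) (mean, sd). These values are not from the original. The portfolio mean is linear in the weights. The variance also depends on the correlation `rho` between the two stocks, so an exact `sd(portfolio)` requires knowing, or assuming, that dependence.

```r
# Hypothetical annual-return models (mean, sd); not from the original post.
mu_A <- 0.08; sd_A <- 0.20
mu_B <- 0.05; sd_B <- 0.10
w_A <- 0.6; w_B <- 0.4

# Mean of a linear combination is linear in the weights.
portfolio_mean <- w_A * mu_A + w_B * mu_B

# Variance is quadratic in the weights and depends on the correlation rho.
portfolio_sd <- function(rho = 0) {
  sqrt(w_A^2 * sd_A^2 + w_B^2 * sd_B^2 + 2 * w_A * w_B * rho * sd_A * sd_B)
}

portfolio_mean      # 0.068
portfolio_sd()      # independent stocks: sqrt(0.016), about 0.1265
portfolio_sd(0.5)   # positively correlated: sqrt(0.0208), about 0.1442
```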
3. Monte Carlo Without the Monte Carlo
For distributions with known closed-form algebra (normal, exponential, certain mixtures), you don’t need simulation: you compute the exact answer directly.
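Here is one example of an exact answer replacing simulation, sketched with base R rather than algebraic.dist. The sum of k independent Exponential(rate) variables is Gamma(shape = k, rate = rate), so a tail probability comes straight from `pgamma`. The Monte Carlo estimate is included only for comparison, and the parameter values are arbitrary.

```r
rate <- 2       # Exponential rate (arbitrary)
k <- 3          # number of independent terms (arbitrary)
threshold <- 2  # tail cutoff (arbitrary)

# Exact: sum of k iid Exponential(rate) is Gamma(shape = k, rate = rate).
p_exact <- pgamma(threshold, shape = k, rate = rate, lower.tail = FALSE)
p_exact  # 13 * exp(-4), about 0.2381

# Monte Carlo estimate of the same tail probability, for comparison only.
set.seed(1)
n_sims <- 1e5
sums <- rowSums(matrix(rexp(n_sims * k, rate = rate), ncol = k))
p_mc <- mean(sums > threshold)
```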
Compositional Statistics
This is functional programming for probability theory. Distributions become composable building blocks:
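- Mixture models: a 0.3/0.7 mixture of Normal(0,1) and Normal(5,2). This is not the same as 0.3*Normal(0,1) + 0.7*Normal(5,2), which under the arithmetic above is a weighted sum of independent normals and therefore itself normal.
- Transformed distributions: exp(Normal(0,1)) is lognormal
- Conditional distributions: X | (X > 0) for truncation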
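These three constructions can be checked numerically in base R. This sketch reads Normal(m, s) as mean m and standard deviation s, which is an assumption. The helper names `rmix` and `dtrunc_pos` are hypothetical. It also shows why a mixture and a weighted sum differ: both have mean 3.5, but the mixture's variance is 8.35 while the weighted sum's is 2.05.

```r
# Normal(m, s) is read here as mean m, standard deviation s (an assumption).

# Mixture: draw a component label with probabilities w, then sample that component.
rmix <- function(n, w = c(0.3, 0.7), mu = c(0, 5), sigma = c(1, 2)) {
  k <- sample(seq_along(w), n, replace = TRUE, prob = w)
  rnorm(n, mean = mu[k], sd = sigma[k])
}
mix_mean <- 0.3 * 0 + 0.7 * 5                               # 3.5
mix_var  <- 0.3 * (1 + 0^2) + 0.7 * (4 + 5^2) - mix_mean^2  # 8.35

# Weighted sum of independent normals: same mean, much smaller variance.
sum_var <- 0.3^2 * 1 + 0.7^2 * 4                             # 2.05

# Transformation: if X ~ Normal(0, 1), P(exp(X) <= 2) = P(X <= log 2).
all.equal(plnorm(2, meanlog = 0, sdlog = 1), pnorm(log(2)))  # TRUE

# Conditioning: density of X | (X > 0) for X ~ Normal(0, 1), renormalized on x > 0.
dtrunc_pos <- function(x) ifelse(x > 0, dnorm(x) / pnorm(0, lower.tail = FALSE), 0)
integrate(dtrunc_pos, 0, Inf)$value                     # integrates to 1 (numerically)
integrate(function(x) x * dtrunc_pos(x), 0, Inf)$value  # mean sqrt(2/pi), about 0.798
```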
Connection to My Research
This package embodies a core theme in my work: computation should mirror mathematical structure.
Just as my oblivious computing research uses type theory to enforce privacy invariants, algebraic.dist uses algebraic types to enforce distributional invariants. The algebra tells you what operations are valid and what the results mean.
Related projects:
- algebraic.mle: Maximum likelihood estimation with algebraic specification
- numerical.mle: Numerical optimization for MLE when closed forms don’t exist
- likelihood.model: Likelihood-based inference with compositional model building
Technical Details
- Language: R
- Type system: S3 classes with method dispatch for operations
- Closed-form operations: Normal, exponential, gamma families
- Fallback: Monte Carlo for complex compositions
- Repository: github.com/queelius/algebraic.dist
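The Monte Carlo fallback can be pictured as a distribution that carries only a sampling function. The sketch below uses a hypothetical `mc_dist` class, which is not part of algebraic.dist. Its `Ops` and `Math` group-generic methods compose samplers, so arbitrary expressions still produce distribution objects, and summaries such as `mean` are estimated from draws. This simple design has one limitation: each operand is sampled independently, so `X + X` behaves like the sum of two independent copies rather than `2 * X`. Tracking shared variables would need more bookkeeping.

```r
# Hypothetical sampler-backed distribution class; not part of algebraic.dist.
mc_dist <- function(sampler) structure(list(sampler = sampler), class = "mc_dist")

# Draw n values from an mc_dist; plain numbers are treated as constants.
draw <- function(e, n) if (inherits(e, "mc_dist")) e$sampler(n) else e

# Ops group generic: +, -, *, /, ^, comparisons, ... build a new sampler.
# Each operand is sampled independently, so X + X acts like two independent copies.
Ops.mc_dist <- function(e1, e2) {
  op <- get(.Generic, mode = "function")
  if (missing(e2)) return(mc_dist(function(n) op(draw(e1, n))))  # unary -, !
  mc_dist(function(n) op(draw(e1, n), draw(e2, n)))
}

# Math group generic: exp, log, sqrt, abs, ... applied to the draws.
Math.mc_dist <- function(x, ...) {
  f <- get(.Generic, mode = "function")
  extra <- list(...)
  mc_dist(function(n) do.call(f, c(list(draw(x, n)), extra)))
}

# Summaries fall back to Monte Carlo estimates.
mean.mc_dist <- function(x, n = 1e5, ...) mean(draw(x, n))

set.seed(7)
X <- mc_dist(function(n) rnorm(n))
Y <- exp(X) + 1   # composed from samplers; no closed form used
mean(Y)           # near exp(0.5) + 1, about 2.649
mean(X > 1.96)    # near 1 - pnorm(1.96), about 0.025
```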
The Bigger Picture
Most statistical software is imperative: you tell the computer what to do step-by-step. algebraic.dist is declarative: you describe the distributional relationships, and the computer figures out what to compute.
This mirrors the Unix philosophy: small composable pieces that do one thing well. Here, the “one thing” is: preserve distributional structure through transformations.