Bernoulli Types: A New Foundation for Approximate and Oblivious Computing

I’ve been working on a series of papers that develop a unified theoretical framework for approximate and oblivious computing, centered around what I call Bernoulli types. These papers explore how we can build rigorous foundations for systems that intentionally introduce controlled approximation—whether for privacy, space efficiency, or computational tractability.

The Core Insight: Latent vs Observed

The fundamental idea is simple but powerful: distinguish between latent (true) values and observed (approximate) values. This duality isn’t just about probabilistic data structures like Bloom filters—it’s a deeper principle that applies to:

  • Algorithms that deliberately sample or approximate (streaming algorithms, sketches)
  • Data structures with false positive guarantees (Bloom filters, Count-Min sketch, HyperLogLog)
  • Privacy mechanisms that add controlled “noise” through lossy observations
  • Encrypted search systems where query results are inherently approximate
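The latent/observed duality can be made concrete with a tiny sketch. Here, a latent Boolean is observed through a noisy Bernoulli channel that flips it with some error rate ε — this is my illustration of the idea, not code from the papers:

```python
import random

def observe(latent: bool, epsilon: float) -> bool:
    """Observe a latent Boolean through a Bernoulli channel:
    with probability epsilon the observation is flipped."""
    return latent if random.random() >= epsilon else not latent

# Empirically check that the channel's error rate matches epsilon.
random.seed(0)
trials = 100_000
errors = sum(observe(True, 0.01) is False for _ in range(trials))
print(errors / trials)  # close to 0.01
```

The point of the type discipline is that the return value of `observe` is *not* the latent value — it merely carries information about it, quantified by ε.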

Four Papers, One Unified Vision

Paper 1: The Latent/Observed Duality (paper1-latent-observed)

This paper establishes the type-theoretic foundations. Key insight: the mapping from latent → observed isn’t just noise—it’s a structural property that we can reason about formally.

Type: Latent → Observed
Property: Many-to-one mapping creates indistinguishability
Result: Privacy and approximation from the same mathematical structure

We show how Bloom filters, streaming algorithms, and privacy mechanisms all arise from the same underlying pattern: functions with rank-deficient codomains.
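To illustrate what a rank-deficient (many-to-one) observation looks like in practice, here is a toy example that maps a large latent domain into a small observed codomain via hashing; the function names and bucket count are my own choices for the sketch:

```python
import hashlib

def observe(x: bytes, buckets: int = 16) -> int:
    """Map a large latent domain into a small observed codomain.
    Many latent values share each observation, so the mapping
    is not invertible."""
    digest = hashlib.sha256(x).digest()
    return int.from_bytes(digest, "big") % buckets

# Group 100 distinct latent inputs by their observation.
groups = {}
for i in range(100):
    groups.setdefault(observe(str(i).encode()), []).append(i)

# By pigeonhole, some observation is shared by several latent values;
# seeing only the observation cannot reveal which one produced it.
print(max(len(v) for v in groups.values()) > 1)  # True
```

The indistinguishability here is purely structural: no noise was added, yet distinct inputs collapse onto the same observation.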

Paper 2: Asymptotic Indistinguishability (paper2-asymptotic-privacy)

Traditional differential privacy adds calibrated noise. Our approach is different: we use lossy compression and many-to-one mappings to achieve privacy through rank-deficient observations.

The key advantage: no noise calibration needed. Privacy emerges naturally from the observation function’s structure. As the compression ratio increases, distinct inputs become computationally indistinguishable.

Why this matters: It provides an alternative privacy framework that’s more natural for certain applications—especially those already using approximate data structures.

Paper 3: Hash-Based Bernoulli Constructions (paper3-hash-constructions)

This paper unifies seemingly disparate probabilistic data structures under one mathematical framework:

  • Bloom filters for set membership
  • Count-Min sketch for frequency estimation
  • HyperLogLog for cardinality estimation
  • MinHash for similarity estimation

All of these arise from the same construction: hash functions + Bernoulli trials. We prove they’re all space-optimal for their respective problems and show how to systematically derive new structures.
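As a flavor of the hash-plus-Bernoulli-trials pattern, here is a minimal Bloom filter — a sketch of the general construction, not the papers' actual API, with arbitrary parameter defaults:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions set bits in an m-bit
    array. Membership queries may return false positives but never
    false negatives."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _indices(self, item: str):
        # Derive k hash functions by salting one hash with an index.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, item: str) -> None:
        for idx in self._indices(item):
            self.bits[idx] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[idx] for idx in self._indices(item))

bf = BloomFilter()
for word in ("alpha", "beta", "gamma"):
    bf.add(word)
print("alpha" in bf)  # True: no false negatives
print("delta" in bf)  # almost certainly False at this load factor
```

Swapping how the hash outputs are aggregated (bits, counters, leading-zero counts, minima) yields the other structures in the list above.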

The framework provides:

  • Rigorous error bounds
  • Composition properties
  • Trade-off analysis between space, time, and accuracy

Paper 4: Bernoulli Sets (paper4-bernoulli-sets)

Sets are fundamental. But what if we can’t afford to store exact sets? This paper develops a complete statistical framework for approximate set operations.

Key contributions:

  • Formal semantics for set operations with error bounds
  • Composition rules (how errors propagate through unions, intersections)
  • Optimal parameter selection
  • Connection to hypothesis testing (set membership as statistical inference)

Practical impact: This gives you principled ways to build systems that trade space for accuracy while maintaining statistical guarantees.
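To give a sense of what error composition looks like, here is the standard independence-based calculation for false positives under union and intersection, for an element outside both latent sets. The independence assumption and function names are mine, used purely as illustration:

```python
def union_fp_rate(eps1: float, eps2: float) -> float:
    """A union query fires falsely if either side fires falsely
    (assuming independent errors)."""
    return eps1 + eps2 - eps1 * eps2

def intersection_fp_rate(eps1: float, eps2: float) -> float:
    """An intersection query fires falsely only if both sides fire
    falsely (again assuming independence): errors multiply."""
    return eps1 * eps2

print(union_fp_rate(0.01, 0.01))         # 0.0199
print(intersection_fp_rate(0.01, 0.01))  # 0.0001
```

Composition rules like these are what let you budget error across a whole pipeline of set operations rather than per structure.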

Why This Matters Now

We’re in an era where:

  • Privacy is non-negotiable, but exact computation leaks information
  • Scale demands approximation—you can’t store everything exactly
  • Streaming data requires constant-space algorithms
  • Encrypted search needs oblivious data structures that hide access patterns

Bernoulli types provide a unified mathematical foundation for all of these. Instead of ad-hoc probabilistic structures, we get:

  1. Composable abstractions with clear error semantics
  2. Type safety that prevents mixing latent and observed values
  3. Formal proofs of privacy and accuracy guarantees
  4. Systematic constructions for new data structures
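Point 2 — keeping latent and observed values apart at the type level — can be sketched with a simple wrapper type. This `Observed` class and its operations are hypothetical, not the papers' interface:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Observed(Generic[T]):
    """Marks a value as observed (approximate), carrying its error
    rate alongside it so errors cannot be silently discarded."""
    value: T
    error_rate: float

def observed_equals(a: "Observed[T]", b: "Observed[T]") -> "Observed[bool]":
    # Comparing two observed values yields an *observed* Boolean;
    # the error rates combine via a simple union bound.
    return Observed(a.value == b.value,
                    min(1.0, a.error_rate + b.error_rate))

x = Observed(True, 0.25)
y = Observed(True, 0.25)
print(observed_equals(x, y))  # Observed(value=True, error_rate=0.5)
```

Because `Observed[bool]` is a different type from `bool`, a type checker flags any code path that treats an approximate answer as an exact one.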

Looking Forward

These foundations enable several exciting research directions:

  • Oblivious algorithms that leak nothing about their inputs
  • Privacy-preserving machine learning with provable guarantees
  • Space-optimal streaming systems with rigorous error bounds
  • Encrypted databases with efficient, private queries

The papers are available now, and I’m actively working on implementations and applications. If you’re interested in approximate computing, privacy, or probabilistic data structures, I think you’ll find this framework valuable.

These papers represent work done throughout 2024-2025, building on my earlier research in encrypted search and oblivious computing. Check out my Master’s thesis (2015) for the original motivation behind this work.
