I’ve been working on a series of papers that develop a unified theoretical framework for approximate and oblivious computing, centered around what I call Bernoulli types. These papers explore how we can build rigorous foundations for systems that intentionally introduce controlled approximation—whether for privacy, space efficiency, or computational tractability.
The Core Insight: Latent vs Observed
The fundamental idea is simple but powerful: distinguish between latent (true) values and observed (approximate) values. This duality isn’t just about probabilistic data structures like Bloom filters—it’s a deeper principle that applies to:
- Algorithms that deliberately sample or approximate (streaming algorithms, sketches)
- Data structures with false positive guarantees (Bloom filters, Count-Min sketch, HyperLogLog)
- Privacy mechanisms that add controlled “noise” through lossy observations
- Encrypted search systems where query results are inherently approximate
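To make the distinction concrete, here is a minimal sketch (my own illustration, not code from the papers) of an observed value that carries its error semantics in the type; the `Observed` wrapper and its field names are assumptions chosen for the example:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Observed(Generic[T]):
    """An approximate view of a latent value of type T, carrying its error
    semantics so callers cannot silently treat it as the exact answer."""
    value: T
    false_positive_rate: float  # P(report True  | latent answer is False)
    false_negative_rate: float  # P(report False | latent answer is True)

# Example: a Bloom-filter-style membership query would return Observed[bool]
# rather than a bare bool, making the approximation visible in the types.
maybe_member = Observed(value=True, false_positive_rate=0.01,
                        false_negative_rate=0.0)
print(maybe_member)
```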
Four Papers, One Unified Vision
Paper 1: The Latent/Observed Duality (paper1-latent-observed)
This paper establishes the type-theoretic foundations. Key insight: the mapping from latent → observed isn’t just noise—it’s a structural property that we can reason about formally.
- Type: Latent → Observed
- Property: Many-to-one mapping creates indistinguishability
- Result: Privacy and approximation from the same mathematical structure
We show how Bloom filters, streaming algorithms, and privacy mechanisms all arise from the same underlying pattern: functions with rank-deficient codomains.
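As a toy illustration of that pattern (mine, not an excerpt from the paper), here is the simplest possible many-to-one observation, a hash into a small codomain; the `observe` function and its parameters are assumed names for the sketch:

```python
import hashlib

def observe(latent: str, codomain_size: int) -> int:
    """A many-to-one observation: hash the latent value into a small codomain.
    Because the map is not injective, every observed value is consistent with
    many latent values; that shared preimage is the source of both the
    approximation error and the indistinguishability."""
    digest = hashlib.sha256(latent.encode()).digest()
    return int.from_bytes(digest, "big") % codomain_size

# With only 4 observable values and 6 inputs, collisions are guaranteed by
# pigeonhole: an adversary who sees observe(x) cannot recover x.
buckets = {}
for name in ["alice", "bob", "carol", "dave", "erin", "frank"]:
    buckets.setdefault(observe(name, 4), []).append(name)
print(buckets)
```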
Paper 2: Asymptotic Indistinguishability (paper2-asymptotic-privacy)
Traditional differential privacy adds calibrated noise. Our approach is different: we use lossy compression and many-to-one mappings to achieve privacy through rank-deficient observations.
The key advantage: no noise calibration needed. Privacy emerges naturally from the observation function’s structure. As the compression ratio increases, distinct inputs become computationally indistinguishable.
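A rough sketch of that effect, with assumed sizes rather than anything from the paper: as the codomain shrinks, the average number of latent inputs consistent with each observation grows in proportion to the compression ratio.

```python
import hashlib
from collections import Counter

def observe(x: int, codomain_size: int) -> int:
    # Lossy, many-to-one observation of the latent value x.
    digest = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(digest, "big") % codomain_size

domain = range(100_000)
for codomain_size in (50_000, 5_000, 500):  # compression ratios ~2x, 20x, 200x
    buckets = Counter(observe(x, codomain_size) for x in domain)
    avg_preimage = sum(buckets.values()) / len(buckets)
    # An observer must guess among all latent inputs sharing the bucket,
    # so larger average preimages mean each observation reveals less.
    print(f"codomain {codomain_size:6d}: ~{avg_preimage:.1f} latent inputs "
          f"per observed value")
```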
Why this matters: It provides an alternative privacy framework that’s more natural for certain applications—especially those already using approximate data structures.
Paper 3: Hash-Based Bernoulli Constructions (paper3-hash-constructions)
This paper unifies seemingly disparate probabilistic data structures under one mathematical framework:
- Bloom filters for set membership
- Count-Min sketch for frequency estimation
- HyperLogLog for cardinality estimation
- MinHash for similarity estimation
All of these arise from the same construction: hash functions + Bernoulli trials. We prove they’re all space-optimal for their respective problems and show how to systematically derive new structures.
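For concreteness, here is a stripped-down Bloom filter in that style; this is the standard textbook construction rather than the paper's formulation, and the parameters are illustrative:

```python
import hashlib

class BloomFilter:
    """A textbook Bloom filter: k hash functions over an m-slot bit array
    (one byte per slot here for simplicity). Each probed slot behaves like
    a Bernoulli trial, which is the shared pattern behind these structures."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _indexes(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def maybe_contains(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[idx] for idx in self._indexes(item))

bf = BloomFilter(m=1024, k=7)
bf.add("alice")
print(bf.maybe_contains("alice"), bf.maybe_contains("mallory"))
```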
The framework provides:
- Rigorous error bounds
- Composition properties
- Trade-off analysis between space, time, and accuracy (a small worked example follows)
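As a worked example of that trade-off, using the standard Bloom-filter sizing formulas rather than results specific to the paper:

```python
from math import ceil, exp, log

def bloom_parameters(n: int, eps: float):
    """Classic sizing rules: for n items and target false-positive rate eps,
    m = -n * ln(eps) / (ln 2)^2 bits and k = (m / n) * ln 2 hash functions,
    which works out to roughly 1.44 * log2(1/eps) bits per item."""
    m = ceil(-n * log(eps) / (log(2) ** 2))
    k = max(1, round((m / n) * log(2)))
    estimated_fp = (1 - exp(-k * n / m)) ** k  # standard error estimate
    return m, k, estimated_fp

for eps in (0.05, 0.01, 0.001):
    m, k, fp = bloom_parameters(n=10_000, eps=eps)
    print(f"eps={eps}: m={m} bits ({m / 10_000:.1f} bits/item), "
          f"k={k}, estimated fp ~ {fp:.4f}")
```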
Paper 4: Bernoulli Sets (paper4-bernoulli-sets)
Sets are fundamental. But what if we can’t afford to store exact sets? This paper develops a complete statistical framework for approximate set operations.
Key contributions:
- Formal semantics for set operations with error bounds
- Composition rules for how errors propagate through unions and intersections (see the sketch after this list)
- Optimal parameter selection
- Connection to hypothesis testing (set membership as statistical inference)
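To give a flavor of the composition rules, here is my sketch using generic false-positive and false-negative rates under an independence assumption; it is illustrative and not the paper's notation:

```python
def union_error_rates(fp_a: float, fn_a: float,
                      fp_b: float, fn_b: float) -> tuple:
    """Composition sketch: test membership in A ∪ B by OR-ing two independent
    approximate membership tests.

    False positive: x is in neither set, yet at least one test fires.
    False negative (worst case, x in exactly one set): the test for that set
    misses x and the other test does not fire spuriously."""
    fp_union = 1 - (1 - fp_a) * (1 - fp_b)
    fn_union = max(fn_a * (1 - fp_b), fn_b * (1 - fp_a))
    return fp_union, fn_union

# Hypothesis-testing view: a membership query is a binary test in which the
# false-positive rate is the Type I error and the false-negative rate is the
# Type II error (so power = 1 - false-negative rate).
print(union_error_rates(fp_a=0.01, fn_a=0.0, fp_b=0.01, fn_b=0.0))
```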
Practical impact: This gives you principled ways to build systems that trade space for accuracy while maintaining statistical guarantees.
Why This Matters Now
We’re in an era where:
- Privacy is non-negotiable, but exact computation leaks information
- Scale demands approximation—you can’t store everything exactly
- Streaming data requires constant-space algorithms
- Encrypted search needs oblivious data structures that hide access patterns
Bernoulli types provide a unified mathematical foundation for all of these. Instead of ad-hoc probabilistic structures, we get:
- Composable abstractions with clear error semantics
- Type safety that prevents mixing latent and observed values
- Formal proofs of privacy and accuracy guarantees
- Systematic constructions for new data structures
Looking Forward
These foundations enable several exciting research directions:
- Oblivious algorithms that leak nothing about their inputs
- Privacy-preserving machine learning with provable guarantees
- Space-optimal streaming systems with rigorous error bounds
- Encrypted databases with efficient, private queries
The papers are available now, and I’m actively working on implementations and applications. If you’re interested in approximate computing, privacy, or probabilistic data structures, I think you’ll find this framework valuable.
Related Papers:
- The Latent/Observed Duality
- Asymptotic Indistinguishability
- Hash-Based Bernoulli Constructions
- Bernoulli Sets
These papers represent work done throughout 2024-2025, building on my earlier research in encrypted search and oblivious computing. Check out my Master’s thesis (2015) for the original motivation behind this work.