Encrypted search has a fundamental problem: you can’t hide what you’re looking for. Even with the best encryption, search patterns leak information. My recent work develops a new approach using oblivious Bernoulli types to achieve information-theoretic privacy in encrypted search systems.
The Core Problem
Traditional encrypted search systems face a seemingly impossible trade-off:
Deterministic schemes are efficient but leak access patterns
- Same query → same encrypted trapdoor
- Adversary learns: frequency, co-occurrence, time series
- Result: significant information leakage over time
Fully oblivious schemes (like ORAM) are secure but impractical
- Orders of magnitude slower
- Requires complete index reshuffling
- Doesn’t scale to real-world databases
We need something in between: practical efficiency with strong privacy guarantees.
Three Papers on Oblivious Encrypted Search
1. Encrypted Search with Oblivious Bernoulli Types (encrypted-search-ob-types)
The key insight: Use approximate data structures (Bloom filters, sketches) not just for space efficiency, but as a privacy mechanism.
When you query with a probabilistic structure:
- Same logical query → different encrypted representation each time
- False positives naturally obfuscate the true query
- Information-theoretic privacy from controlled approximation
How it works:
Traditional: "machine learning" → hash(term) → deterministic trapdoor
Oblivious: "machine learning" → Bloom filter → randomized trapdoor set
Every query generates a fresh representation, statistically indistinguishable from queries for other terms, so the adversary learns almost nothing from observing the query stream.
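To make this concrete, here is a minimal sketch of one way to build such a randomized trapdoor. It is my own simplified illustration, not the paper's construction: the function names, parameters, and the Bernoulli-noise mechanism are all assumptions.

```python
import hashlib
import random

rng = random.SystemRandom()  # OS-seeded randomness for the noise bits


def bloom_positions(term: str, key: bytes, k: int, m: int) -> set[int]:
    """Keyed positions of `term` in an m-bit Bloom filter with k hashes."""
    return {
        int.from_bytes(
            hashlib.blake2b(f"{i}|{term}".encode(), key=key).digest()[:8], "big"
        ) % m
        for i in range(k)
    }


def oblivious_trapdoor(term: str, key: bytes, k: int = 8, m: int = 1024,
                       noise_rate: float = 0.05) -> set[int]:
    """True positions blended with Bernoulli noise: every one of the m bits
    is additionally set with probability noise_rate, so each query for the
    same term yields a different bit set."""
    noise = {i for i in range(m) if rng.random() < noise_rate}
    return bloom_positions(term, key, k, m) | noise


key = b"sixteen-byte-key"
t1 = oblivious_trapdoor("machine learning", key)
t2 = oblivious_trapdoor("machine learning", key)
assert t1 != t2  # same logical query, different wire representation (w.h.p.)
```

In a threshold-scoring variant (see paper 2 below), the server would score each document by the fraction of trapdoor bits present in its index filter and return those above a cutoff; the noise bits are what produce the controlled false positives and make repeated queries unlinkable.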
Practical advantages:
- No ORAM overhead
- Scales to massive databases
- Tunable privacy/accuracy trade-off
- Works with existing encrypted search infrastructure
2. Boolean Search with Oblivious Types (secure-index-bool-search)
Real search isn’t just keyword lookup—it’s Boolean queries: (machine AND learning) OR (deep AND neural).
This paper extends oblivious types to complex Boolean queries:
- Error propagation analysis: How do false positives combine across AND/OR/NOT?
- Query obfuscation: How to hide Boolean structure itself
- Threshold-based retrieval: Using scoring to control false positives
Key result: We can evaluate arbitrary Boolean queries over oblivious indexes while maintaining privacy, with provable error bounds.
Challenge: Error accumulates through operators. Solution: threshold-based ranking that controls false positive rates while preserving privacy.
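To illustrate the error-propagation point, here is a small sketch, again my own illustration rather than the paper's analysis. It computes the false-positive rate of a Boolean query tree under the simplifying assumption that per-term false positives are independent; NOT is omitted because negation turns false positives into false negatives.

```python
def fp_rate(query) -> float:
    """False-positive rate of a Boolean query over approximate sets.

    A query is a nested tuple: ("term", eps) | ("and", ...) | ("or", ...).
    Assumes per-term false positives are independent events.
    """
    op, *args = query
    if op == "term":
        return args[0]                 # per-term false-positive rate eps
    rates = [fp_rate(sub) for sub in args]
    if op == "and":                    # every subquery must falsely pass
        prod = 1.0
        for r in rates:
            prod *= r
        return prod
    if op == "or":                     # at least one subquery falsely passes
        none_pass = 1.0
        for r in rates:
            none_pass *= 1.0 - r
        return 1.0 - none_pass
    raise ValueError(f"unknown operator: {op}")


# (machine AND learning) OR (deep AND neural), each term with eps = 0.01:
q = ("or",
     ("and", ("term", 0.01), ("term", 0.01)),
     ("and", ("term", 0.01), ("term", 0.01)))
print(fp_rate(q))  # ≈ 2e-4: AND shrinks the error, OR accumulates it
```

AND multiplies small rates together while OR compounds them, which is why deep disjunctions are where threshold-based ranking earns its keep.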
3. Maximizing Confidentiality Through Entropy (max-conf-in-encrypted-search)
If we’re going to use approximation for privacy, how much approximation is optimal?
This paper frames it as an optimization problem:
- Maximize entropy of observations (maximizes adversary uncertainty)
- Subject to accuracy constraints
- Find optimal false positive rates for each query term
Information-theoretic result: the entropy-maximizing parameterization of a secure index depends on:
- Query term frequency distribution
- Required retrieval accuracy
- Adversary’s prior knowledge
We derive closed-form solutions for optimal parameterization and show how to construct maximally-confidential indexes.
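To give a flavor of what such a parameterization looks like, here is a toy model of my own, not the paper's derivation. Suppose term i truly matches a fraction f_i of documents; with false-positive rate eps_i its observed hit rate becomes f_i + (1 - f_i) * eps_i. Choosing eps_i so that every term's observed hit rate equals the same target flattens the hit-rate distribution, maximizing the adversary's uncertainty (in this simplified model) about which term was queried:

```python
def flattening_fp_rates(freqs: list[float]) -> list[float]:
    """Per-term false-positive rates that equalize observed hit rates.

    Toy model: term i truly matches a fraction freqs[i] of documents.
    With rate eps_i, its observed hit rate is f_i + (1 - f_i) * eps_i.
    Setting that equal to max(freqs) for every term makes all hit rates
    identical, so hit-rate observations reveal nothing about the term.
    """
    target = max(freqs)
    return [(target - f) / (1.0 - f) if f < 1.0 else 0.0 for f in freqs]


# Rare terms get the most noise; the most frequent term needs none:
print(flattening_fp_rates([0.01, 0.10, 0.25]))  # ≈ [0.242, 0.167, 0.0]
```

The real optimization also has to respect accuracy constraints and the adversary's prior knowledge, which is what makes the closed-form solutions in the paper more involved than this.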
Why This Approach Works
The fundamental insight connecting all three papers:
Privacy through controlled indistinguishability, not through computational hardness.
Traditional cryptography says: “It’s too hard to break the encryption.” Our approach says: “There’s nothing to break—the observations are inherently ambiguous.”
This gives us:
- Information-theoretic privacy: no reliance on computational assumptions, so quantum computers don't help the adversary
- Composable privacy: repeated queries do not accumulate leakage the way deterministic trapdoors do
- Practical efficiency: No ORAM, no secure multi-party computation overhead
- Tunable trade-offs: Application-specific privacy/accuracy balance
Connection to Earlier Work
This builds directly on my Master’s thesis (2015) on encrypted search, where I first examined the access pattern leakage problem and explored early defenses.
The subsequent IEEE paper (2016) on estimating confidentiality degradation showed how quickly deterministic schemes leak information—motivating the need for oblivious approaches.
The known-plaintext attack paper demonstrated time-series forecasting of privacy loss, making the case that deterministic encrypted search is fundamentally broken.
These new papers provide the solution.
Real-World Impact
This isn’t just theory. The techniques enable:
- Privacy-preserving document search in cloud storage
- Confidential database queries on untrusted infrastructure
- Private information retrieval with practical efficiency
- Encrypted log analysis without leaking query patterns
Organizations handling sensitive data—healthcare, legal, financial—can now search encrypted data without exposing what they’re searching for.
Open Challenges
Some questions I’m actively exploring:
- Adaptive attacks: What if the adversary chooses queries based on prior observations?
- Multi-user scenarios: How do different users’ queries interact?
- Range queries: Can we extend this to numerical/temporal queries?
- Dynamic datasets: How to handle insertions/deletions while maintaining privacy?
Try It Yourself
I’m working on open-source implementations of these schemes. The core ideas are surprisingly simple to implement:
- Hash-based oblivious sets (Bloom filter variants)
- Threshold scoring for Boolean queries
- Entropy-optimized parameter selection
If you’re building encrypted search systems or working on privacy-preserving databases, these techniques should be directly applicable.
Related Papers:
- Encrypted Search with Oblivious Bernoulli Types
- Boolean Search with Approximate Sets
- Maximizing Confidentiality Through Entropy
- Master’s Thesis: Encrypted Search (2015)
- Estimating Confidentiality (2016 IEEE)
- Known-Plaintext Attacks on Time Series
This represents several years of work spanning my Master’s thesis through recent research. The combination of Bernoulli types with encrypted search finally gives us practical privacy-preserving search at scale.