Rethinking Encrypted Search: From Access Pattern Leakage to Information-Theoretic Privacy

Encrypted search has a fundamental problem: you can’t hide what you’re looking for. Even with the best encryption, search patterns leak information. My recent work develops a new approach using oblivious Bernoulli types to achieve information-theoretic privacy in encrypted search systems.

The Core Problem

Traditional encrypted search systems face a stark trade-off:

  • Deterministic schemes are efficient but leak access patterns

    • Same query → same encrypted trapdoor
    • Adversary learns: frequency, co-occurrence, time series
    • Result: significant information leakage over time
  • Fully oblivious schemes (like ORAM) are secure but impractical

    • Orders of magnitude slower
    • Requires complete index reshuffling
    • Doesn’t scale to real-world databases

We need something in between: practical efficiency with strong privacy guarantees.

1. Encrypted Search with Oblivious Bernoulli Types (encrypted-search-ob-types)

The key insight: Use approximate data structures (Bloom filters, sketches) not just for space efficiency, but as a privacy mechanism.

When you query with a probabilistic structure:

  • Same logical query → different encrypted representation each time
  • False positives naturally obfuscate the true query
  • Information-theoretic privacy from controlled approximation

How it works:

Traditional:     "machine learning" → hash(term) → deterministic trapdoor
Oblivious:       "machine learning" → Bloom filter → randomized trapdoor set

Every query generates a fresh, statistically indistinguishable representation. The adversary learns almost nothing from observing queries.
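
To make this concrete, here is a minimal Python sketch of a randomized trapdoor built from a Bloom filter. The filter size, hash count, decoy count, and threshold test are all hypothetical choices for illustration, not the exact construction from the paper.

    import hashlib
    import secrets

    # Hypothetical parameters: filter size, keyed-hash count, decoy positions.
    M, K, N_NOISE = 1024, 7, 8

    def _positions(term: str, key: bytes) -> set:
        """Derive K keyed hash positions for a term (deterministic)."""
        return {
            int.from_bytes(
                hashlib.sha256(key + bytes([i]) + term.encode()).digest()[:4],
                "big",
            ) % M
            for i in range(K)
        }

    def make_trapdoor(term: str, key: bytes) -> set:
        """Same logical query -> different trapdoor each call: the true
        positions are padded with fresh random decoy positions."""
        return _positions(term, key) | {secrets.randbelow(M) for _ in range(N_NOISE)}

    def matches(index_bits: set, trapdoor: set, threshold: float = 0.5) -> bool:
        """Server-side test: a document matches if enough trapdoor positions
        are set in its index filter. The threshold must sit between the
        filter's background fill rate and the expected score of a true match."""
        return len(trapdoor & index_bits) / len(trapdoor) >= threshold

    key = secrets.token_bytes(32)
    t1 = make_trapdoor("machine learning", key)
    t2 = make_trapdoor("machine learning", key)
    assert t1 != t2  # fresh decoys each call (equality is overwhelmingly unlikely)

Because the decoys are drawn fresh per query, two observations of the same term look like two different queries, and the server's thresholded membership test absorbs the noise.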

Practical advantages:

  • No ORAM overhead
  • Scales to massive databases
  • Tunable privacy/accuracy trade-off
  • Works with existing encrypted search infrastructure

2. Oblivious Boolean Queries

Real search isn’t just keyword lookup—it’s Boolean queries: (machine AND learning) OR (deep AND neural).

This paper extends oblivious types to complex Boolean queries:

  • Error propagation analysis: How do false positives combine across AND/OR/NOT?
  • Query obfuscation: How to hide Boolean structure itself
  • Threshold-based retrieval: Using scoring to control false positives

Key result: We can evaluate arbitrary Boolean queries over oblivious indexes while maintaining privacy, with provable error bounds.

Challenge: Error accumulates through operators. Solution: threshold-based ranking that controls false positive rates while preserving privacy.
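
Under the simplifying assumption that per-term false positives are independent (my assumption for illustration, not a result from the paper), the propagation rules and the threshold fix look like this in Python:

    def fp_and(p1: float, p2: float) -> float:
        """(A AND B) is falsely satisfied only if both sides are."""
        return p1 * p2

    def fp_or(p1: float, p2: float) -> float:
        """(A OR B) is falsely satisfied if either side is."""
        return p1 + p2 - p1 * p2

    # Example: two terms, each with a 1% false positive rate.
    #   AND: 0.01 * 0.01          = 0.0001  (errors shrink)
    #   OR:  0.01 + 0.01 - 0.0001 = 0.0199  (errors roughly add)
    # OR-heavy queries therefore accumulate error, which threshold-based
    # ranking counters:

    def threshold_retrieve(clause_hits: list, t: float = 0.75) -> bool:
        """Keep a document only if it matches at least a fraction t of the
        query's clauses; raising t trades recall for a lower effective
        false positive rate."""
        return sum(clause_hits) / len(clause_hits) >= t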

3. Optimal Secure Indexes

If we’re going to use approximation for privacy, how much approximation is optimal?

This paper frames it as an optimization problem:

  • Maximize entropy of observations (maximizes adversary uncertainty)
  • Subject to accuracy constraints
  • Find optimal false positive rates for each query term

Information-theoretic result: The optimal secure index has an entropy distribution that depends on:

  1. Query term frequency distribution
  2. Required retrieval accuracy
  3. Adversary’s prior knowledge

We derive closed-form solutions for optimal parameterization and show how to construct maximally confidential indexes.
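
As a toy numerical version of this optimization, the sketch below maximizes the entropy of what the adversary observes subject to an accuracy budget. The model is an assumption I'm adding for illustration: a query for term i is confusable with a uniform mixture over all terms with weight p_i (its false positive rate), and accuracy is a cap on the expected false positive mass per query. The frequencies and budget are made up, and none of this is the paper's exact formulation or its closed form.

    import numpy as np
    from scipy.optimize import minimize

    q = np.array([0.50, 0.25, 0.15, 0.10])  # hypothetical query-term frequencies
    budget = 0.20                           # cap on expected FP mass per query
    n = len(q)

    def observed(p):
        """Adversary's view of a query on term i: the true term with weight
        (1 - p_i), uniform noise with weight p_i. Sums to 1 for any p."""
        return q * (1 - p) + np.dot(q, p) / n

    def neg_entropy(p):
        o = observed(p)
        return float(np.sum(o * np.log(o + 1e-12)))  # negative Shannon entropy

    res = minimize(
        neg_entropy,
        x0=np.full(n, 0.05),
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "ineq", "fun": lambda p: budget - np.dot(q, p)}],
        method="SLSQP",
    )
    print("per-term false positive rates:", res.x.round(3))
    print("adversary entropy (bits):", -neg_entropy(res.x) / np.log(2))

Even this toy version reproduces the qualitative point above: the optimizer spends most of the false positive budget on the most frequent terms, because flattening the head of the frequency distribution buys the most adversary uncertainty.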

Why This Approach Works

The fundamental insight connecting all three papers:

Privacy through controlled indistinguishability, not through computational hardness.

Traditional cryptography says: “It’s too hard to break the encryption.” Our approach says: “There’s nothing to break—the observations are inherently ambiguous.”

This gives us:

  • Information-theoretic privacy: Not broken by future quantum computers
  • Composable privacy: Multiple queries don’t compound leakage linearly
  • Practical efficiency: No ORAM, no secure multi-party computation overhead
  • Tunable trade-offs: Application-specific privacy/accuracy balance

Connection to Earlier Work

This builds directly on my Master’s thesis (2015) on encrypted search, where I first identified the access pattern leakage problem and explored early defenses.

The subsequent IEEE paper (2016) on estimating confidentiality degradation showed how quickly deterministic schemes leak information—motivating the need for oblivious approaches.

The known-plaintext attack paper demonstrated time-series forecasting of privacy loss, making the case that deterministic encrypted search is fundamentally broken.

These new papers provide the solution.

Real-World Impact

This isn’t just theory. The techniques enable:

  • Privacy-preserving document search in cloud storage
  • Confidential database queries on untrusted infrastructure
  • Private information retrieval with practical efficiency
  • Encrypted log analysis without leaking query patterns

Organizations handling sensitive data—healthcare, legal, financial—can now search encrypted data without exposing what they’re searching for.

Open Challenges

Some questions I’m actively exploring:

  1. Adaptive attacks: What if the adversary chooses queries based on prior observations?
  2. Multi-user scenarios: How do different users’ queries interact?
  3. Range queries: Can we extend this to numerical/temporal queries?
  4. Dynamic datasets: How to handle insertions/deletions while maintaining privacy?

Try It Yourself

I’m working on open-source implementations of these schemes. The core ideas are surprisingly simple to implement:

  • Hash-based oblivious sets (Bloom filter variants)
  • Threshold scoring for Boolean queries
  • Entropy-optimized parameter selection

If you’re building encrypted search systems or working on privacy-preserving databases, these techniques should be directly applicable.

This represents several years of work spanning my Master’s thesis through recent research. The combination of Bernoulli types with encrypted search finally gives us practical privacy-preserving search at scale.
