Encrypted Search and Oblivious Types

Encrypted search has a problem that no amount of encryption fixes: the search patterns themselves leak information. You can encrypt every document, every index, every query. The adversary still learns what you’re looking for by watching which records come back.

My recent work uses oblivious Bernoulli types to get information-theoretic privacy in encrypted search. Three papers, one core idea.

The Problem

Traditional encrypted search gives you two bad options:

  • Deterministic schemes are fast but leak access patterns. The same query produces the same encrypted trapdoor every time. An adversary watching the channel learns frequency, co-occurrence, and time-series structure. Over enough queries, they can reconstruct much of the query distribution, and with auxiliary knowledge, the queries themselves.

  • Fully oblivious schemes (ORAM and friends) are secure but don’t scale. Orders of magnitude slower. Complete index reshuffling on every access. Fine for toy datasets, useless for real ones.

We need practical efficiency with real privacy guarantees. That’s what these papers give us.

Three Papers

1. Encrypted Search with Oblivious Bernoulli Types (paper)

The key insight: use approximate data structures (Bloom filters, sketches) not just for space efficiency, but as a privacy mechanism.

When you query with a probabilistic structure, the same logical query produces a different encrypted representation each time. False positives naturally obfuscate the true query. You get information-theoretic privacy from controlled approximation.

Traditional:     "machine learning" → hash(term) → deterministic trapdoor
Oblivious:       "machine learning" → Bloom filter → randomized trapdoor set

Every query generates a fresh, statistically indistinguishable representation. The adversary sees noise.

No ORAM overhead. Scales to large databases. Tunable privacy/accuracy trade-off. Works with existing encrypted search infrastructure.
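To make the idea concrete, here is a minimal sketch of a randomized Bloom-filter trapdoor. Everything in it (the parameters `M_BITS`, `K_HASHES`, `NOISE_BITS`, the salted-SHA-256 hashing, the set representation) is a hypothetical illustration, not the paper's actual construction: the true term positions are deterministic, but each trapdoor mixes in fresh random noise bits, so the same logical query yields a different encrypted representation every time.

```python
import hashlib
import secrets

M_BITS = 256    # filter size (hypothetical parameter)
K_HASHES = 4    # salted hash functions per term
NOISE_BITS = 8  # fresh random bits mixed into every trapdoor

def term_positions(term: str) -> set[int]:
    """Deterministic Bloom-filter positions for a term (k salted hashes)."""
    return {
        int.from_bytes(hashlib.sha256(f"{i}|{term}".encode()).digest(), "big") % M_BITS
        for i in range(K_HASHES)
    }

def make_trapdoor(term: str) -> set[int]:
    """Randomized trapdoor: the term's true positions plus random noise bits."""
    noise = {secrets.randbelow(M_BITS) for _ in range(NOISE_BITS)}
    return term_positions(term) | noise

def matches(trapdoor: set[int], indexed_term: str) -> bool:
    """Server-side test: does the trapdoor cover the indexed term's bits?"""
    return term_positions(indexed_term) <= trapdoor
```

The noise bits are what give two trapdoors for the same term statistically different shapes on the wire, and they are also the source of false positives: occasionally the noise will cover some other term's bits, which is exactly the controlled ambiguity the scheme exploits.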

2. Boolean Search with Oblivious Types (paper)

Real search isn’t keyword lookup. It’s Boolean queries: (machine AND learning) OR (deep AND neural).

This paper extends oblivious types to complex Boolean queries. The main challenge is error propagation: false positives combine across AND/OR/NOT in ways you need to track carefully. We also address how to hide the Boolean structure itself, and use threshold-based scoring to control false positive rates while preserving privacy.

The result: arbitrary Boolean queries over oblivious indexes, with provable error bounds.
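The error-propagation bookkeeping can be sketched with a toy query-tree evaluator. This is an illustrative model only (the `Term`/`And`/`Or` classes and the independence assumption are mine, not the paper's): if each leaf filter has an independent false positive rate, a spurious AND-match needs both sides to fire spuriously, while a spurious OR-match needs either side to.

```python
from dataclasses import dataclass

@dataclass
class Term:
    fp: float  # this leaf filter's false positive rate

@dataclass
class And:
    left: object
    right: object

@dataclass
class Or:
    left: object
    right: object

def fp_rate(q) -> float:
    """Propagated false positive rate, assuming independent leaf filters."""
    if isinstance(q, Term):
        return q.fp
    if isinstance(q, And):
        # both subqueries must fire spuriously
        return fp_rate(q.left) * fp_rate(q.right)
    if isinstance(q, Or):
        # either subquery firing spuriously suffices
        l, r = fp_rate(q.left), fp_rate(q.right)
        return l + r - l * r
    raise TypeError(f"unknown query node: {q!r}")

# (machine AND learning) OR (deep AND neural), each term at 1% fp:
query = Or(And(Term(0.01), Term(0.01)), And(Term(0.01), Term(0.01)))
# fp_rate(query) is roughly 2e-4: AND shrinks error, OR grows it
```

Note that ANDs attenuate false positives while ORs accumulate them, which is why deep disjunctions are where threshold-based scoring earns its keep.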

3. Maximizing Confidentiality Through Entropy (paper)

If we’re using approximation for privacy, how much approximation is optimal?

This paper frames it as an optimization problem. Maximize entropy of observations (maximize adversary uncertainty), subject to accuracy constraints. Find optimal false positive rates for each query term.

The optimal secure index has an entropy distribution that depends on query term frequency, required retrieval accuracy, and the adversary’s prior knowledge. We derive closed-form solutions for optimal parameterization and show how to construct maximally-confidential indexes.
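A toy version of the optimization makes the intuition visible. Assume (my simplification, not the paper's model) that the adversary observes only whether a record matched, so the match probability is `term_freq + (1 - term_freq) * eps` for false positive rate `eps`. Binary entropy peaks at probability 1/2, so the entropy-maximizing `eps` pushes the match probability toward 1/2, capped by the accuracy budget:

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a Bernoulli observation with success probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def optimal_fp_rate(term_freq: float, eps_max: float) -> float:
    """
    Toy model: match probability = term_freq + (1 - term_freq) * eps.
    Entropy of the adversary's observation peaks at 1/2, so choose the
    eps that drives the match probability to 1/2, capped by the
    accuracy budget eps_max.
    """
    if term_freq >= 0.5:
        return 0.0  # already at or past peak entropy; extra noise only costs accuracy
    eps_star = (0.5 - term_freq) / (1 - term_freq)
    return min(eps_star, eps_max)
```

Even in this stripped-down form, the closed-form answer depends on term frequency and the accuracy constraint, which mirrors the shape of the paper's result: rare terms need more injected ambiguity than common ones.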

Why It Works

The thread connecting all three papers: privacy through controlled indistinguishability, not computational hardness.

Traditional cryptography says “it’s too hard to break the encryption.” This approach says “there’s nothing to break. The observations are inherently ambiguous.”

That gives you:

  • Information-theoretic privacy: quantum computers don’t help the adversary
  • Composable privacy: repeated queries don’t accumulate leakage the way deterministic trapdoors do
  • Practical efficiency: no ORAM, no secure multi-party computation
  • Tunable trade-offs: you pick the privacy/accuracy balance for your application

How We Got Here

This builds on my Master’s thesis (2015) on encrypted search, where I first dug into the access pattern leakage problem. The IEEE paper (2016) on estimating confidentiality degradation showed how fast deterministic schemes leak information. The known-plaintext attack paper demonstrated time-series forecasting of privacy loss, making the case that deterministic encrypted search is fundamentally broken.

These new papers provide the fix.

What You Can Do With It

The techniques enable privacy-preserving document search in cloud storage, confidential database queries on untrusted infrastructure, private information retrieval with practical efficiency, and encrypted log analysis without leaking query patterns.

If you handle sensitive data (healthcare, legal, financial) and need to search encrypted records without exposing what you’re searching for, this is directly applicable.

Open Questions

Some things I’m still working on:

  1. Adaptive attacks: what if the adversary chooses queries based on prior observations?
  2. Multi-user scenarios: how do different users’ queries interact?
  3. Range queries: can we extend this to numerical and temporal queries?
  4. Dynamic datasets: how to handle insertions and deletions while maintaining privacy?
