What if a perfect hash function could simultaneously be: (1) cryptographically secure, (2) space-optimal, and (3) maximum-entropy encoded? This paper proves such a construction exists—and analyzes exactly what you sacrifice to get all three.
The Impossible Triangle
Perfect hash functions typically face tradeoffs:
- Space-optimal constructions (CHD, BDZ) sacrifice randomness
- Cryptographic hash functions waste space on collision resistance
- Maximum-entropy encodings require extra bits
We ask: can you have all three?
The Construction
The data type is PH = {0,1} × ℕ* with a constructor ph(X, r) = (n', N) where:
- N = ⌈m/r⌉: Hash table size (where m = |X|)
- β(x,n) = trunc(hash(x’ # n’), k)’ mod N: Hash function parameterized by seed n
- n = min{j ∈ ℕ | β is collision-free on X}: Search for smallest collision-free seed
- n’: Geometric code encoding n (variable-length prefix-free)
The algorithm: Try seeds n = 0, 1, 2, … until β(·,n) has no collisions on X. Each trial is geometrically distributed with success probability p(m,r), so the expected space for encoding n’ achieves the information-theoretic lower bound of ~1.44 bits per element.
Random oracle assumption: hash: {0,1}* → {0,1}^∞ outputs uniform random bits, making the final encoding (n’, N) indistinguishable from a random bit string.
Rate-Distortion Tradeoff
The title’s “rate-distortion” comes from information theory: what’s the rate (bits per key) for a given distortion (lookup time)?
Zero distortion (O(1) lookup): ~n log n bits
Constant distortion (small tables): Practical two-level schemes approach ~1.44n bits
Variable distortion: Trade bits for lookup time continuously
Why Cryptographic Matters
Non-cryptographic perfect hashes are deterministic—adversaries can engineer collision-inducing inputs. A cryptographic perfect hash (random oracle) prevents:
- Adversarial key selection: Can’t craft keys that break the hash
- Side-channel attacks: Encoding reveals no information about keys
- Fingerprinting: Maximum entropy makes encodings look random
The Algebra of Composition
Section 5 proves: composing perfect hash functions preserves injectivity. If h₁: S → T and h₂: T → U are injective, then h₂ ∘ h₁: S → U is injective.
This connects to my algebraic_hashing library—composition of cryptographic hashes (via XOR) preserves both security AND structure.
Connection to Oblivious Computing
This paper is about minimizing information leakage while maintaining functionality—exactly the oblivious computing philosophy. The maximum-entropy encoding hides which permutation you’re using, even though the function is deterministic.
Discussion