Skip to main content

Perfect Hashing: Space Bounds, Entropy, and Cryptographic Security

What if a perfect hash function could simultaneously be: (1) cryptographically secure, (2) space-optimal, and (3) maximum-entropy encoded? This paper proves such a construction exists—and analyzes exactly what you sacrifice to get all three.

The Impossible Triangle

Perfect hash functions typically face tradeoffs:

  • Space-optimal constructions (CHD, BDZ) sacrifice randomness
  • Cryptographic hash functions waste space on collision resistance
  • Maximum-entropy encodings require extra bits

We ask: can you have all three?

The Construction

The data type is PH = {0,1} × ℕ* with a constructor ph(X, r) = (n', N) where:

  • N = ⌈m/r⌉: Hash table size (where m = |X|)
  • β(x,n) = trunc(hash(x’ # n’), k)’ mod N: Hash function parameterized by seed n
  • n = min{j ∈ ℕ | β is collision-free on X}: Search for smallest collision-free seed
  • n’: Geometric code encoding n (variable-length prefix-free)

The algorithm: Try seeds n = 0, 1, 2, … until β(·,n) has no collisions on X. Each trial is geometrically distributed with success probability p(m,r), so the expected space for encoding n’ achieves the information-theoretic lower bound of ~1.44 bits per element.

Random oracle assumption: hash: {0,1}* → {0,1}^∞ outputs uniform random bits, making the final encoding (n’, N) indistinguishable from a random bit string.

Rate-Distortion Tradeoff

The title’s “rate-distortion” comes from information theory: what’s the rate (bits per key) for a given distortion (lookup time)?

Zero distortion (O(1) lookup): ~n log n bits Constant distortion (small tables): Practical two-level schemes approach ~1.44n bits Variable distortion: Trade bits for lookup time continuously

Why Cryptographic Matters

Non-cryptographic perfect hashes are deterministic—adversaries can engineer collision-inducing inputs. A cryptographic perfect hash (random oracle) prevents:

  • Adversarial key selection: Can’t craft keys that break the hash
  • Side-channel attacks: Encoding reveals no information about keys
  • Fingerprinting: Maximum entropy makes encodings look random

The Algebra of Composition

Section 5 proves: composing perfect hash functions preserves injectivity. If h₁: S → T and h₂: T → U are injective, then h₂ ∘ h₁: S → U is injective.

This connects to my algebraic_hashing library—composition of cryptographic hashes (via XOR) preserves both security AND structure.

Connection to Oblivious Computing

This paper is about minimizing information leakage while maintaining functionality—exactly the oblivious computing philosophy. The maximum-entropy encoding hides which permutation you’re using, even though the function is deterministic.

Research paper • 4★ • View PDFGitHub

Discussion