Skip to main content

Perfect Hashing: Space Bounds, Entropy, and Cryptographic Security

Can a perfect hash function be cryptographically secure, space-optimal, and maximum-entropy encoded all at once? This paper proves such a construction exists and analyzes exactly what you sacrifice to get all three.

The Impossible Triangle

Perfect hash functions typically face tradeoffs. Space-optimal constructions (CHD, BDZ) sacrifice randomness. Cryptographic hash functions waste space on collision resistance. Maximum-entropy encodings require extra bits.

The question: can you have all three?

The Construction

The data type is PH = {0,1} x N* with a constructor ph(X, r) = (n', N) where:

  • N = ceil(m/r): Hash table size (where m = |X|)
  • beta(x,n) = trunc(hash(x’ # n’), k)’ mod N: Hash function parameterized by seed n
  • n = min{j in N | beta is collision-free on X}: Search for smallest collision-free seed
  • n’: Geometric code encoding n (variable-length prefix-free)

The algorithm: try seeds n = 0, 1, 2, … until beta(.,n) has no collisions on X. Each trial is geometrically distributed with success probability p(m,r), so the expected space for encoding n’ achieves the information-theoretic lower bound of roughly 1.44 bits per element.

Under the random oracle assumption, hash: {0,1}* -> {0,1}^infinity outputs uniform random bits, making the final encoding (n’, N) indistinguishable from a random bit string.

Rate-Distortion Tradeoff

The “rate-distortion” framing comes from information theory: what’s the rate (bits per key) for a given distortion (lookup time)?

Zero distortion (O(1) lookup): roughly n log n bits. Constant distortion (small tables): practical two-level schemes approach roughly 1.44n bits. Variable distortion: trade bits for lookup time continuously.

Why Cryptographic Matters

Non-cryptographic perfect hashes are deterministic, so adversaries can engineer collision-inducing inputs. A cryptographic perfect hash (random oracle) prevents adversarial key selection (can’t craft keys that break the hash), side-channel attacks (encoding reveals no information about keys), and fingerprinting (maximum entropy makes encodings look random).

The Algebra of Composition

Section 5 proves that composing perfect hash functions preserves injectivity. If h1: S -> T and h2: T -> U are injective, then h2 composed with h1: S -> U is injective.

This connects to my algebraic_hashing library, where composition of cryptographic hashes via XOR preserves both security and structure.

Connection to Oblivious Computing

This paper is fundamentally about minimizing information leakage while maintaining functionality, which is the core of the oblivious computing philosophy. The maximum-entropy encoding hides which permutation you’re using, even though the function is deterministic.

Research paper • 4★ • View PDFGitHub

Discussion