Bernoulli Tuples
This document explores Bernoulli approximations of tuple types (product types), demonstrating two distinct approaches: tuples of Bernoulli values versus Bernoulli tuples. We focus on pairs for concrete examples, but the principles generalize to n-tuples.
Two Approaches to Approximate Pairs
Consider the pair type `(bool, bool)` with values like `(true, false)`. There are two fundamentally different ways to introduce Bernoulli approximation:
1. Pair of Bernoulli Bools: (B_bool, B_bool)
This is a compositional approach where we first construct Bernoulli approximations of each component, then pair them.
```
// Latent pair
(x1, x2) where x1, x2 ∈ bool

// Pair of Bernoulli observations
(B_bool(x1), B_bool(x2))
```
Construction Process:
1. Start with latent values `x1` and `x2`
2. Independently observe `B_bool(x1)` and `B_bool(x2)`
3. Construct the pair `(B_bool(x1), B_bool(x2))`
Error Characteristics:
- Each component has its own error rate: `Pr{B_bool(x1) ≠ x1} = ε1`, `Pr{B_bool(x2) ≠ x2} = ε2`
- Errors are independent between components
- The pair is correct iff both components are correct: `Pr{pair correct} = (1-ε1)(1-ε2)`
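The compositional error arithmetic can be checked by simulation. The sketch below (the function names `bernoulli_bool` and `observe_pair` are illustrative, not from the text) flips each component independently and verifies that the empirical pair-correctness rate matches `(1-ε1)(1-ε2)`:

```python
import random

def bernoulli_bool(x: bool, eps: float) -> bool:
    """Observe a latent bool through a symmetric noisy channel: flip with prob eps."""
    return (not x) if random.random() < eps else x

def observe_pair(x1: bool, x2: bool, eps1: float, eps2: float):
    """Compositional approximation: observe each component independently."""
    return (bernoulli_bool(x1, eps1), bernoulli_bool(x2, eps2))

random.seed(0)
eps1, eps2, trials = 0.1, 0.2, 200_000
latent = (True, False)
correct = sum(observe_pair(*latent, eps1, eps2) == latent for _ in range(trials))
estimate = correct / trials
expected = (1 - eps1) * (1 - eps2)   # 0.9 * 0.8 = 0.72
assert abs(estimate - expected) < 0.01
```

With 200,000 trials the Monte Carlo error is well below the 0.01 tolerance used in the assertion.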
2. Bernoulli Pair of Bools: B_{(bool,bool)}
This is a direct approximation of the pair type itself, treating the pair as a single unit.
```
// Latent pair
p = (x1, x2) where p ∈ (bool, bool)

// Bernoulli approximation of the pair
B_{(bool,bool)}(p)
```
Construction Process:
1. Start with latent pair `p = (x1, x2)`
2. Directly construct the approximation `B_{(bool,bool)}(p)` using some process (e.g., hash-based, channel-based)
3. Extract components via `first(B_{(bool,bool)}(p))` and `second(B_{(bool,bool)}(p))`
Error Characteristics:
- Single error function: `Pr{B_{(bool,bool)}(p) ≠ p} = Error(p)`
- Component errors can be correlated
- Permits error patterns that independent component errors cannot produce
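To make the contrast concrete, here is a minimal sketch of one direct approximation that independent components cannot express: a channel that flips both components together with probability ε. Each component's marginal error rate is still ε, but the pair is correct with probability 1-ε rather than (1-ε)². The function name `correlated_pair` is illustrative:

```python
import random

def correlated_pair(p, eps):
    """Direct approximation of a pair: with prob eps flip BOTH components.
    Each component's marginal error rate is eps, but the errors are
    perfectly correlated rather than independent."""
    x1, x2 = p
    if random.random() < eps:
        return (not x1, not x2)
    return (x1, x2)

random.seed(1)
eps, trials = 0.2, 200_000
latent = (True, False)
obs = [correlated_pair(latent, eps) for _ in range(trials)]
pair_correct = sum(o == latent for o in obs) / trials
comp1_correct = sum(o[0] == latent[0] for o in obs) / trials
# Pair correctness is 1-eps = 0.8, not (1-eps)^2 = 0.64 as independence would give.
assert abs(pair_correct - (1 - eps)) < 0.01
assert abs(comp1_correct - (1 - eps)) < 0.01
```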
Detailed Analysis
Case 1: Pair of Bernoulli Bools
Individual Component Confusion Matrices:
First component: `B_bool^{(1)}` (first-order) with symmetric error ε₁:
| latent \ observed | T | F |
|-------------------|---|---|
| T | 1-ε₁ | ε₁ |
| F | ε₁ | 1-ε₁ |
Second component: `B_bool^{(1)}` (first-order) with symmetric error ε₂:
| latent \ observed | T | F |
|-------------------|---|---|
| T | 1-ε₂ | ε₂ |
| F | ε₂ | 1-ε₂ |
Induced Confusion Matrix for `(B_bool^{(1)}, B_bool^{(1)})`:
| latent \ observed | (T,T) | (T,F) | (F,T) | (F,F) |
|---|---|---|---|---|
| (T,T) | (1-ε₁)(1-ε₂) | (1-ε₁)ε₂ | ε₁(1-ε₂) | ε₁ε₂ |
| (T,F) | (1-ε₁)ε₂ | (1-ε₁)(1-ε₂) | ε₁ε₂ | ε₁(1-ε₂) |
| (F,T) | ε₁(1-ε₂) | ε₁ε₂ | (1-ε₁)(1-ε₂) | (1-ε₁)ε₂ |
| (F,F) | ε₁ε₂ | ε₁(1-ε₂) | (1-ε₁)ε₂ | (1-ε₁)(1-ε₂) |
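The induced matrix is exactly the Kronecker product of the two component confusion matrices. A small sketch (the helper names `symmetric_channel` and `kron` are ours, not from the text) that constructs it and checks a few entries against the table:

```python
from itertools import product

def symmetric_channel(eps):
    """2x2 confusion matrix for B_bool with symmetric error eps (rows: latent T, F)."""
    return [[1 - eps, eps], [eps, 1 - eps]]

def kron(a, b):
    """Kronecker product of two matrices, as nested lists."""
    return [[a[i][j] * b[k][l]
             for j, l in product(range(len(a[0])), range(len(b[0])))]
            for i, k in product(range(len(a)), range(len(b)))]

eps1, eps2 = 0.1, 0.2
induced = kron(symmetric_channel(eps1), symmetric_channel(eps2))
# Row/column order: (T,T), (T,F), (F,T), (F,F).
assert abs(induced[0][0] - (1 - eps1) * (1 - eps2)) < 1e-12
assert abs(induced[0][3] - eps1 * eps2) < 1e-12
for row in induced:
    assert abs(sum(row) - 1.0) < 1e-12   # each row is a probability distribution
```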
Why is this Order 2?
The 4×4 confusion matrix above is fully determined by just 2 parameters: ε₁ and ε₂. Here's why:
- Maximum possible order: a 4×4 matrix with the row-sum constraint gives 4(4-1) = 12 degrees of freedom
- Actual order: only 2 free parameters, due to the independence constraint
- Structure: each entry is a product `P₁ᵢⱼ × P₂ₖₗ`, where the factors come from the component confusion matrices
Special Case - Identical Components (ε₁ = ε₂ = ε):
| latent \ observed | (T,T) | (T,F) | (F,T) | (F,F) |
|---|---|---|---|---|
| (T,T) | (1-ε)² | (1-ε)ε | ε(1-ε) | ε² |
| (T,F) | (1-ε)ε | (1-ε)² | ε² | ε(1-ε) |
| (F,T) | ε(1-ε) | ε² | (1-ε)² | (1-ε)ε |
| (F,F) | ε² | ε(1-ε) | (1-ε)ε | (1-ε)² |
This becomes order 1 with just parameter ε, showing perfect symmetry.
Rank Analysis:
The rank of the confusion matrix captures something more fundamental than parameter count:
For different components (ε₁ ≠ ε₂):
- Parameter count: 2
- Rank: 4 (the Kronecker product of two invertible 2×2 matrices has full rank)
- Inference: full rank → perfect Bayesian inference theoretically possible
For identical components (ε₁ = ε₂ = ε):
- Parameter count: 1
- Rank: 4 when ε ≠ 0.5 (the 2×2 component matrix stays invertible, so its Kronecker square has full rank), rank 1 when ε = 0.5
- Inference: full rank allows inference; rank 1 (maximum entropy) makes inference impossible
Key insight: Parameter count affects estimation difficulty, but rank determines fundamental inferability - whether P(latent|observed) can be computed even with perfect parameter knowledge.
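To illustrate the inferability claim, the posterior P(latent|observed) can be computed by Bayes' rule from the compositional confusion matrix. The sketch below (assuming a uniform prior; all names are illustrative) shows that inference works when the channel is informative, and collapses to the prior at ε = 0.5, the rank-1 case:

```python
PAIRS = [(True, True), (True, False), (False, True), (False, False)]

def likelihood(latent, observed, eps1, eps2):
    """Pr{observed | latent} under independent symmetric component errors."""
    p1 = (1 - eps1) if latent[0] == observed[0] else eps1
    p2 = (1 - eps2) if latent[1] == observed[1] else eps2
    return p1 * p2

def posterior(observed, prior, eps1, eps2):
    """P(latent | observed) by Bayes' rule; prior is a dict over PAIRS."""
    unnorm = {p: prior[p] * likelihood(p, observed, eps1, eps2) for p in PAIRS}
    z = sum(unnorm.values())
    return {p: v / z for p, v in unnorm.items()}

prior = {p: 0.25 for p in PAIRS}                      # uniform prior
post = posterior((True, True), prior, 0.1, 0.1)
assert max(post, key=post.get) == (True, True)        # informative channel: MAP recovers it
post_half = posterior((True, True), prior, 0.5, 0.5)  # maximum-entropy channel
assert all(abs(v - 0.25) < 1e-12 for v in post_half.values())
```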
Information-Theoretic Analysis
Adding entropy as a third fundamental measure alongside order and rank:
Matrix Entropy H(confusion matrix):
- Different components (ε₁ ≠ ε₂): all 16 entries unique → higher entropy
- Identical components (ε₁ = ε₂ = ε): only 3 unique values → lower entropy
- Maximum entropy case (ε = 0.5): uniform confusion → H maximized
Information Hierarchy for Compositional vs Direct:
Compositional (B_bool, B_bool):
1. Unconstrained 4×4 matrix: maximum entropy space
2. Independence constraint: entropy reduced (Kronecker product structure)
3. Parameter knowledge (ε₁, ε₂): further entropy reduction
4. Observations: conditional entropy H(latent|observed)
Direct B_{(bool,bool)}:
1. Unconstrained 4×4 matrix: same maximum entropy space
2. Structural constraints (α, β parameters): different entropy reduction
3. Can achieve entropy patterns impossible under independence
Entropy and Inference Quality:
- Lower matrix entropy → more structured approximation
- The independence constraint creates highly structured (low-entropy) confusion matrices
- Direct approximation allows higher-entropy (less structured) patterns
- Rank-1 cases correspond to maximum conditional entropy H(latent|observed)
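One way to make the entropy comparison concrete is to compute the Shannon entropy of a single row of the confusion matrix, i.e. H(observed | latent = p). A brief sketch under the independent-error model (parameter values are illustrative):

```python
from math import log2

def row_entropy(row):
    """Shannon entropy (bits) of one row Pr{observed | latent}."""
    return -sum(p * log2(p) for p in row if p > 0)

# One row of the compositional confusion matrix: entries factor as products of
# the two component marginals (independent errors, eps1 = 0.1, eps2 = 0.2).
a, b = [0.9, 0.1], [0.8, 0.2]
structured_row = [a[i] * b[k] for i in (0, 1) for k in (0, 1)]  # ≈ [0.72, 0.18, 0.08, 0.02]

h_structured = row_entropy(structured_row)
h_uniform = row_entropy([0.25] * 4)   # the eps = 0.5 (rank-1) case

assert h_uniform == 2.0               # maximum: log2(4) bits
assert h_structured < h_uniform       # independence + small eps -> low-entropy rows
```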
Confusion Matrix Scale Analysis
Small tuples: For `(bool, bool)` we have 4 possible values, so the 4×4 confusion matrix is manageable.
Large tuples: For `(X, Y)` where X and Y are large types, the confusion matrix becomes `|X×Y| × |X×Y|`, which is typically enormous. However:
Compositional structure creates patterns:
- Independence constraint forces Kronecker product structure
- Sparsity determined by component impossibilities
- Rank properties inherited from component ranks
- Block structure enables targeted analysis of sub-problems
Direct approximation allows arbitrary patterns:
- Can break independence to create correlations impossible compositionally
- May achieve different sparsity patterns optimized for specific use cases
- Rank unconstrained by component limitations
This structural perspective explains why compositional and direct approximation lead to fundamentally different inference capabilities even for the same underlying types.
Case 2: Bernoulli Pair of Bools
This allows much richer error patterns. For example, a higher-order model might assign a different error rate to each latent pair:
| Latent Pair | Error Rate |
|---|---|
| (T,T) | ε₁ |
| (T,F) | ε₂ |
| (F,T) | ε₃ |
| (F,F) | ε₄ |
Confusion Matrix for B_{(bool,bool)}:
For the direct Bernoulli approximation of pairs, we have the latent/observed confusion matrix:
| latent \ observed | (T,T) | (T,F) | (F,T) | (F,F) |
|---|---|---|---|---|
| (T,T) | q11 | q12 | q13 | q14 |
| (T,F) | q21 | q22 | q23 | q24 |
| (F,T) | q31 | q32 | q33 | q34 |
| (F,F) | q41 | q42 | q43 | q44 |
where `qij = Pr{observe pair_j | latent pair_i}` and each row sums to 1.
Maximum Order:
- There are 4 possible pairs in `(bool, bool)`
- The maximum confusion matrix is 4×4 with 4(4-1) = 12 degrees of freedom
- So `B_{(bool,bool)}^{(12)}` is the highest-order model
Example: Parameterized Confusion Matrix
A practical second-order model with parameters α (spurious rate) and β (miss rate):
| latent \ observed | (T,T) | (T,F) | (F,T) | (F,F) |
|---|---|---|---|---|
| (T,T) | 1-β | β/3 | β/3 | β/3 |
| (T,F) | α/3 | 1-α | α/3 | α/3 |
| (F,T) | α/3 | α/3 | 1-α | α/3 |
| (F,F) | α/3 | α/3 | α/3 | 1-α |
Where:
- α = spurious rate (probability of observing the wrong pair when the latent pair is a non-target)
- β = miss rate (probability of missing the correct pair)
- Off-diagonal errors are distributed uniformly among the wrong options
This introduces correlations impossible with independent component errors while maintaining tractable parameterization.
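A sketch of this parameterization (the function name `direct_confusion` is ours) that also verifies the claim about correlations: under independent component errors every row factors into two marginals, which forces q11·q14 = q12·q13 within a row, and the α/β model violates that identity:

```python
def direct_confusion(alpha, beta):
    """4x4 confusion matrix for the alpha/beta model above.
    Row/column order: (T,T), (T,F), (F,T), (F,F). The (T,T) row uses the
    miss rate beta; the remaining rows use the spurious rate alpha."""
    m = [[alpha / 3] * 4 for _ in range(4)]
    for i in range(4):
        m[i][i] = 1 - alpha
    m[0] = [1 - beta, beta / 3, beta / 3, beta / 3]
    return m

q = direct_confusion(alpha=0.15, beta=0.05)
for row in q:
    assert abs(sum(row) - 1.0) < 1e-12          # rows are distributions
# Independence would make row entries products of two marginals, implying
# q[0][0]*q[0][3] == q[0][1]*q[0][2]; the alpha/beta model breaks this.
assert abs(q[0][0] * q[0][3] - q[0][1] * q[0][2]) > 1e-6
```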
Set-Theoretic Interpretation
We can view pairs as characteristic functions over the product space:
Pair of Bernoulli Bools
```
χ₁: {0,1} → B_bool   // first component indicator
χ₂: {0,1} → B_bool   // second component indicator
pair_membership(i,j) = (χ₁(i), χ₂(j))
```
Bernoulli Pair
```
Χ: {0,1}×{0,1} → B_bool   // joint indicator over the product space
Χ = B_{({0,1}×{0,1}) → bool}(characteristic_function)
```
Implementation Considerations
Memory Layout
- Pair of Bernoulli Bools: Store two separate Bernoulli values
- Bernoulli Pair: Store single approximation + extraction functions
Error Propagation
- Compositional: Errors compose via probability arithmetic
- Direct: Single error model for the entire structure
Use Cases
Pair of Bernoulli Bools:
- When components come from independent noisy sources
- When you need to operate on the components separately
- Bloom filter pairs, independent sensor readings
Bernoulli Pair:
- When space efficiency requires joint compression
- When error patterns have natural correlation
- Hash-based approximate pairs, encrypted/oblivious pairs
Connection to General Framework
This demonstrates the general pattern for any product type `A × B`:
- Compositional: `(B_A, B_B)` - approximate the components first, then compose
- Direct: `B_{A×B}` - approximate the product type directly
Both are valid Bernoulli types with different error characteristics and use cases. The choice depends on whether you want independent component errors or are willing to allow correlation through joint approximation.
Future Work
Similar patterns apply to:
- Sum types: `B_A + B_B` vs `B_{A+B}`
- Function types: `B_A → B_B` vs `B_{A→B}`
- Higher-order products: `B_A × B_B × B_C` vs `B_{A×B×C}`
The systematic exploration of these compositional vs direct approximation patterns forms the foundation of the complete Bernoulli type system.