Mathematical Foundation¶
This section establishes the mathematical theory underlying the Disjoint Interval Set library. We define intervals formally, establish their Boolean algebra structure, and analyze the complexity properties of our canonical representation.
Intervals¶
Definition¶
An interval over a totally ordered set \((T, \leq)\) is defined by its endpoints and boundary types:
where \(a \leq b\) are the lower and upper bounds, and boundary indicators specify whether endpoints are included (closed) or excluded (open).
Boundary Types¶
The four boundary configurations are:
- Closed: \([a, b]\) includes both endpoints
- Open: \((a, b)\) excludes both endpoints
- Left-open: \((a, b]\) excludes left, includes right
- Right-open: \([a, b)\) includes left, excludes right
Membership¶
Point membership is defined as:
Empty Intervals¶
An interval is empty if:
- \(a > b\), or
- \(a = b\) and at least one boundary is open
Empty intervals are represented canonically as \((0, -1)\) with both boundaries open, regardless of how they were constructed. This ensures unique representation.
Special Intervals¶
For types supporting infinity (e.g., double
):
- Unbounded: \((-\infty, \infty)\) contains all values
- Half-bounded: \([a, \infty)\), \((a, \infty)\), \((-\infty, b]\), \((-\infty, b)\)
- Singleton: \([a, a]\) contains exactly one point
Disjoint Interval Sets¶
Definition¶
A disjoint interval set is a finite collection of pairwise disjoint intervals:
Canonical Form¶
We maintain intervals in sorted order by lower bounds:
where \(I < J\) means the upper bound of \(I\) is less than the lower bound of \(J\), accounting for boundary types.
Adjacency¶
Two intervals are adjacent if they share a boundary point with opposite closure properties:
- \((a,b]\) and \((b,c)\) are adjacent at \(b\)
- \([a,b)\) and \([b,c]\) are adjacent at \(b\)
Adjacent intervals can be merged into a single interval:
- \((a,b] \cup (b,c) = (a,c)\)
- \([a,b) \cup [b,c] = [a,c]\)
The canonical form merges all adjacent intervals, ensuring unique representation.
Boolean Algebra Structure¶
Definition¶
The set of all disjoint interval sets \(\mathcal{D}\) forms a complete Boolean algebra:
where:
- Join (\(\vee\)): Set union \(\cup\)
- Meet (\(\wedge\)): Set intersection \(\cap\)
- Complement (\(\neg\)): Relative to universal set \(U = (-\infty, \infty)\)
- Bottom: Empty set \(\emptyset\)
- Top: Universal set \(U\)
Axioms¶
All Boolean algebra axioms hold:
1. Associativity¶
[(A \cup B) \cup C = A \cup (B \cup C)] [(A \cap B) \cap C = A \cap (B \cap C)]
2. Commutativity¶
[A \cup B = B \cup A] [A \cap B = B \cap A]
3. Distributivity¶
[A \cap (B \cup C) = (A \cap B) \cup (A \cap C)] [A \cup (B \cap C) = (A \cup B) \cap (A \cup C)]
4. Identity¶
[A \cup \emptyset = A] [A \cap U = A]
5. Complement¶
[A \cup \neg A = U] [A \cap \neg A = \emptyset]
6. Idempotence¶
[A \cup A = A] [A \cap A = A]
7. Absorption¶
[A \cup (A \cap B) = A] [A \cap (A \cup B) = A]
8. De Morgan's Laws¶
[\neg(A \cup B) = \neg A \cap \neg B] [\neg(A \cap B) = \neg A \cup \neg B]
Proof Sketch¶
These properties follow from:
- The canonical form ensures unique representation
- Operations are defined in terms of point-wise membership
- Set-theoretic operations preserve the disjoint property
- Normalization after each operation maintains sorted, merged form
A full formal verification would use proof assistants like Coq or Isabelle/HOL (future work).
Measure Theory¶
Interval Measure¶
For intervals over \(\mathbb{R}\), the measure (length) is:
Boundary types don't affect measure for real numbers (they have measure zero).
Set Measure¶
For a disjoint interval set:
This is well-defined because intervals are disjoint.
Additional Metrics¶
Span: The smallest interval containing all intervals in the set:
Density: Ratio of measure to span:
Gap Measure: Total measure of gaps between intervals:
Computational Complexity¶
Canonical Form Properties¶
Maintaining disjoint, sorted intervals provides:
- Unique Representation: Each set has exactly one representation
- Efficient Queries: Binary search for membership
- Linear Operations: Zipper merge for set operations
- Space Efficiency: Minimal interval count
Complexity Analysis¶
Let \(n\) and \(m\) be the number of intervals in sets \(A\) and \(B\).
Operation | Time | Space | Notes |
---|---|---|---|
Construction | \(O(n \log n)\) | \(O(n)\) | Sorting dominates |
Normalization | \(O(n)\) | \(O(n)\) | Single pass merge |
Contains | \(O(\log n)\) | \(O(1)\) | Binary search |
Union | \(O(n + m)\) | \(O(n + m)\) | Linear merge |
Intersection | \(O(nm)\) | \(O(\min(n,m))\) | Worst case; often better |
Complement | \(O(n)\) | \(O(n + 1)\) | Invert boundaries |
Measure | \(O(n)\) | \(O(1)\) | Sum lengths |
Space-Time Tradeoffs¶
Caching: Measure could be cached for O(1) queries at cost of O(1) extra space and update overhead.
Lazy Normalization: Could defer merging until needed, trading space (more intervals) for time (avoid unnecessary merges).
Index Structures: For very large sets (10K+ intervals), interval trees provide O(log n + k) overlap queries where k is result size.
Correctness Properties¶
Invariants¶
The implementation maintains:
- Sorted: Intervals ordered by lower bound
- Disjoint: No overlapping intervals
- Non-adjacent: Adjacent intervals are merged
- Non-empty: Empty intervals are removed
These invariants are:
- Established: By normalization after construction
- Maintained: By all operations
- Verified: Through comprehensive testing (97.46% coverage)
Type Safety¶
The type system ensures:
- Only totally ordered types can be interval boundaries (
Boundary
concept) - Operations preserve interval types
- Compile-time intervals have zero runtime overhead
- Invalid constructions (e.g., reversed bounds) result in empty intervals
Summary¶
The mathematical foundation of DIS provides:
- Formal rigor: Well-defined semantics based on set theory
- Completeness: Full Boolean algebra with all axioms satisfied
- Efficiency: Canonical form enables optimal algorithms
- Correctness: Strong invariants with comprehensive testing
This foundation enables both theoretical reasoning about program correctness and practical efficient implementation.