Design Principles¶
PFC is built on a foundation of rigorous design principles inspired by Alex Stepanov's work on generic programming and the STL. Understanding these principles will help you use the library effectively and extend it correctly.
The Zero-Copy Invariant¶
Principle: In-memory representation equals wire format.
Traditional serialization involves two copies:
PFC eliminates this overhead:
Benefits¶
- Performance: No marshaling overhead
- Memory: Single canonical representation
- Simplicity: What you store is what you send
Implementation¶
All packed types maintain bit-level representation:
class PackedInt {
std::vector<uint8_t> bytes_; // Stored in encoded form
public:
uint32_t get() const {
BitReader reader(bytes_.data(), bytes_.size());
return EliasGamma::decode(reader); // Decode on access
}
};
Prefix-Free Codes¶
Principle: All encodings are self-delimiting.
A prefix-free code ensures no codeword is a prefix of another. This enables:
- Streaming without lookahead
- Concatenation without separators
- Guaranteed unique decoding
Example¶
Elias Gamma encoding of integers:
No codeword is a prefix of another, enabling unambiguous decoding:
Regular Types¶
Principle: All types satisfy Stepanov's concept of regular types.
A regular type supports:
- Default construction
- Copy construction and assignment
- Move construction and assignment
- Equality comparison
- Destruction
Why It Matters¶
Regular types can be:
- Stored in containers
- Passed by value
- Used with algorithms
- Reasoned about mathematically
Implementation¶
template<typename T, typename Codec>
class Packed {
public:
Packed() = default; // Default construct
Packed(const Packed&) = default; // Copy
Packed(Packed&&) noexcept = default; // Move
Packed& operator=(const Packed&) = default; // Copy assign
Packed& operator=(Packed&&) noexcept = default; // Move assign
~Packed() = default; // Destruct
bool operator==(const Packed& other) const; // Equality
};
Separation of Concerns¶
Principle: Codecs, containers, and algorithms are orthogonal.
Three Independent Dimensions¶
- Codecs: How values are encoded
- Containers: How collections are stored
- Algorithms: How data is processed
Composability¶
Any codec works with any container:
// Same container, different codecs
PackedVector<Packed<uint32_t, EliasGamma>> vec1;
PackedVector<Packed<uint32_t, EliasDelta>> vec2;
PackedVector<Packed<uint32_t, Fibonacci>> vec3;
// Same codec, different containers
PackedVector<Packed<uint32_t, EliasGamma>> vec;
PackedList<Packed<uint32_t, EliasGamma>> list;
PackedArray<Packed<uint32_t, EliasGamma>, 10> arr;
Concepts Over Inheritance¶
Principle: Use compile-time concepts instead of runtime polymorphism.
Traditional OOP Approach¶
class Codec {
public:
virtual void encode(uint32_t value, BitSink& sink) = 0;
virtual uint32_t decode(BitSource& source) = 0;
};
class EliasGamma : public Codec { /* ... */ };
Problems:
- Virtual dispatch overhead
- Dynamic allocation required
- Type erasure loses information
PFC Approach¶
template<typename C, typename T>
concept Codec = requires(T value, BitSink& sink, BitSource& source) {
{ C::encode(value, sink) } -> std::same_as<void>;
{ C::decode(source) } -> std::same_as<T>;
};
struct EliasGamma {
template<BitSink S>
static void encode(uint32_t value, S& sink) { /* ... */ }
template<BitSource S>
static uint32_t decode(S& source) { /* ... */ }
};
Benefits:
- Zero runtime overhead
- Better inlining
- Compile-time type checking
- No virtual tables
Generic Programming¶
Principle: Algorithms work on any type satisfying requirements.
Requirements as Concepts¶
template<typename T>
concept PackedValue = requires(T t) {
{ t.encode() } -> std::convertible_to<std::vector<uint8_t>>;
{ T::decode(std::declval<std::span<const uint8_t>>()) } -> std::same_as<T>;
};
// Algorithm works on ANY PackedValue
template<PackedValue T>
size_t compute_size(const std::vector<T>& values) {
return std::accumulate(values.begin(), values.end(), 0,
[](size_t sum, const T& v) { return sum + v.byte_size(); });
}
Value Semantics¶
Principle: Objects behave like mathematical values.
Implications¶
- Copying creates an independent object
- Equality is based on value, not identity
- No aliasing concerns
- Predictable behavior
Example¶
PackedInt a(42);
PackedInt b = a; // Independent copy
a = 100; // Doesn't affect b
assert(b == 42); // b retains original value
Algebraic Types¶
Principle: Types form algebraic structures.
Product Types (AND)¶
Combination of values:
Sum Types (OR)¶
Exclusive choice:
Laws¶
Product and sum types obey algebraic laws:
A × 1 ≅ A (unit)
A × B ≅ B × A (commutativity)
(A × B) × C ≅ A × (B × C) (associativity)
A + 0 ≅ A (identity)
A + B ≅ B + A (commutativity)
(A + B) + C ≅ A + (B + C) (associativity)
Proxy Iterators¶
Principle: Decode on access, never store decoded values.
Why Proxies?¶
Storing decoded values violates the zero-copy invariant:
// BAD: Stores decoded values
std::vector<uint32_t> decoded_values;
// GOOD: Decodes on access
PackedVector<Packed<uint32_t, EliasGamma>> packed_values;
Implementation¶
template<typename T, typename Codec>
class PackedIterator {
const uint8_t* data_;
size_t bit_offset_;
public:
class Reference {
const uint8_t* data_;
size_t bit_offset_;
public:
operator T() const {
BitReader reader(data_, bit_offset_);
return Codec::decode(reader); // Decode on conversion
}
};
Reference operator*() const {
return Reference{data_, bit_offset_};
}
};
Correctness Over Performance¶
Principle: Correctness first, then optimize.
Testing Strategy¶
- Property-based testing: Verify algebraic laws
- Round-trip testing: encode(decode(x)) == x
- Edge cases: Boundaries, empty, maximum values
- Fuzzing: Random inputs
Example Property¶
TEST_CASE("Codec round-trip") {
for (uint32_t value = 0; value < 10000; ++value) {
uint8_t buffer[64];
BitWriter writer(buffer);
EliasGamma::encode(value, writer);
writer.align();
BitReader reader(buffer);
auto decoded = EliasGamma::decode(reader);
REQUIRE(decoded == value);
}
}
Extensibility¶
Principle: Make extension easy, modification unnecessary.
Open/Closed Principle¶
The library is:
- Open for extension: Add new codecs, types, algorithms
- Closed for modification: Core doesn't need changes
Example: Custom Codec¶
struct MyCustomCodec {
static constexpr bool is_fixed_size() { return false; }
static constexpr size_t max_encoded_bits() { return 64; }
template<BitSink S>
static void encode(uint64_t value, S& sink) {
// Your encoding logic
}
template<BitSource S>
static uint64_t decode(S& source) {
// Your decoding logic
}
};
// Immediately works with all PFC infrastructure
PackedVector<Packed<uint64_t, MyCustomCodec>> vec;
auto compressed = compress<MyCustomCodec>(data);
Performance Characteristics¶
Time Complexity¶
- Encoding: O(log n) for universal codes, O(1) for fixed-width
- Decoding: Same as encoding
- Container access: O(n) worst case (sequential scan)
- Iteration: O(n) for sequential, O(1) per element
Space Complexity¶
- Universal codes: O(log n) bits per value
- Fixed-width: O(1) bits per value (constant)
- Containers: O(n) compressed bits, no overhead
Mathematical Foundation¶
PFC is grounded in information theory and coding theory:
Kraft-McMillan Inequality¶
For prefix-free codes:
Where li is the length of the i-th codeword.
Universal Codes¶
Elias codes achieve asymptotically optimal encoding for positive integers:
Elias Gamma: ⌈log₂(n)⌉ + 2⌊log₂(n)⌋ + 1 bits
Elias Delta: ⌈log₂(n)⌉ + 2⌊log₂(log₂(n))⌋ + 1 bits
Elias Omega: O(log n + log log n + log log log n + ...)
Lessons from STL¶
PFC adopts STL design patterns:
- Iterators: Separate algorithms from containers
- Value semantics: Predictable, composable
- Generic algorithms: Work on ranges, not specific types
- Concepts: Express requirements precisely
- No ownership: Containers own, algorithms borrow
Influences¶
- Alex Stepanov: Generic programming, STL design
- Bartosz Milewski: Category theory, algebraic types
- Peter Elias: Universal codes, information theory
- Haskell/ML: Algebraic data types, pattern matching
- Rust: Ownership, sum types, Result
Further Reading¶
- Stepanov & McJones: "Elements of Programming"
- Elias: "Universal Codeword Sets and Representations of the Integers"
- Knuth: "The Art of Computer Programming, Vol 1"
- MacKay: "Information Theory, Inference, and Learning Algorithms"