PFC: Zero-Copy Data Compression Through Prefix-Free Codecs

June 10, 2024 10 min read Updated: October 7, 2025

PFC (Prefix-Free Codecs) is a header-only C++20 library that demonstrates data compression and zero-copy access are compatible goals when built on prefix-free codes and generic programming. The library achieves 3-10× compression on typical integer distributions while maintaining full STL integration and type safety.

The Core Innovation: Zero-Copy Invariant

Traditional compression requires marshaling:

// Traditional approach
std::vector<uint32_t> data = {1, 2, 3, 5, 8, 13};

// Step 1: Compress (marshal to wire format)
auto compressed = compress(data);

// Step 2: Store/transmit compressed data
store_to_disk(compressed);

// Step 3: Decompress (unmarshal back)
auto restored = decompress(compressed);

// Problem: data lives in two worlds
// - Uncompressed in memory (32 bits × n)
// - Compressed on wire/disk (variable bits)

PFC’s approach: The zero-copy invariant

\[ \text{In-memory representation} = \text{Wire representation} \]

// PFC approach
PackedContainer<uint32_t, EliasGamma> data;
data.push_back(1);
data.push_back(2);
data.push_back(3);

// Data is ALREADY compressed in memory
// Access through proxy decodes on-the-fly
uint32_t value = data[0];  // Decodes from compressed form

// Write to disk? Zero copy!
write(fd, data.bytes().data(), data.bytes().size());

The data structure IS the compressed format. No marshaling step exists.

Prefix-Free Codes: The Foundation

PFC is built on prefix-free codes—self-delimiting encodings where you can determine where one value ends and the next begins without external delimiters.

The Prefix-Free Property

A code is prefix-free if no codeword is a prefix of another:

✅ Prefix-free:
  1   → 0
  2   → 10
  3   → 110
  4   → 1110

❌ Not prefix-free:
  1   → 0
  2   → 01    ← "0" is a prefix of "01"
  3   → 011

Why this matters: Prefix-free codes compose naturally:

// Encode multiple values without delimiters
encode(1);  // 0
encode(2);  // 10
encode(3);  // 110
// Result: 010110 (self-delimiting!)

// Decode unambiguously
decode();  // Reads "0" → 1
decode();  // Reads "10" → 2
decode();  // Reads "110" → 3

This enables streaming and composition.

Universal Codes: Optimal Without Prior Knowledge

PFC implements several universal codes—asymptotically optimal for any distribution without knowing the distribution in advance.

Elias Gamma

Encodes positive integer \(n\) in \(2\lfloor\log_2 n\rfloor + 1\) bits:

n     Binary   Elias Gamma
1     1        1
2     10       010
3     11       011
4     100      00100
5     101      00101
8     1000     0001000

Encoding algorithm:

Write \(\lfloor\log_2 n\rfloor\) zeros
Write binary representation of \(n\)

Mathematical property:

\[ L(n) = 2\lfloor\log_2 n\rfloor + 1 \approx 2\log_2 n \]

This is asymptotically optimal for geometric distributions.

Fibonacci Encoding

Uses Fibonacci sequence for encoding:

n   Fibonacci   Encoding
1   1           11
2   2           011
3   3           0011
4   5           1011
5   8           00011

Property: Every positive integer has a unique Zeckendorf representation (sum of non-consecutive Fibonacci numbers).

Encoding: Find representation, mark with terminal “11”.

Rice/Golomb Codes

Parametric codes optimal for geometric distribution with parameter \(p\):

// Rice code with parameter k=3
RiceCodec<3> codec;

// Optimal for geometric distribution with p = 1/2^3

Encoding:

Quotient \(q = \lfloor n/2^k \rfloor\) in unary
Remainder \(r = n \bmod 2^k\) in binary

Excellent for values with exponential decay.

Core Architecture

Bit-Level I/O

// Bit-precise writing
BitWriter writer;
writer.write_bit(1);
writer.write_bits(0b1010, 4);
writer.flush();

// Bit-precise reading
BitReader reader(bytes);
bool bit = reader.read_bit();
uint8_t nibble = reader.read_bits(4);

Key insight: All codecs work at the bit level, enabling optimal packing without byte alignment waste.

Codec Concept

template<typename C>
concept Codec = requires(C codec, uint64_t value, BitWriter& writer, BitReader& reader) {
    // Encoding
    { codec.encode(value, writer) } -> std::same_as<void>;

    // Decoding
    { codec.decode(reader) } -> std::same_as<uint64_t>;

    // Properties
    { C::is_fixed_size() } -> std::same_as<bool>;
    { C::max_encoded_bits() } -> std::same_as<size_t>;
};

C++20 concepts ensure type safety at compile time.

Packed<T, Codec> Template

template<typename T, Codec C>
class Packed {
    std::vector<uint8_t> bytes_;

public:
    void set(T value) {
        BitWriter writer(bytes_);
        C::encode(value, writer);
    }

    T get() const {
        BitReader reader(bytes_);
        return C::decode(reader);
    }
};

Key property: The Packed object stores data in compressed form. Access decodes on-the-fly.

Algebraic Type System

PFC provides a complete algebraic type system with optimal encoding:

Product Types

// Pair with minimal overhead
using Point = PackedPair<
    Packed<int, Signed<EliasGamma>>,
    Packed<int, Signed<EliasGamma>>
>;

Point p{10, 20};  // Compressed in memory
auto [x, y] = p;  // Decode on access

Encoding: Concatenate encodings (no delimiters needed—prefix-free!)

Sum Types (Variants)

// Variant with minimal discriminator
using Value = PackedVariant<
    Packed<int, EliasGamma>,
    Packed<double, FloatingPoint<32>>,
    Packed<std::string, PrefixedString>
>;

Value v = 42;  // Encodes: discriminator + value

Discriminator optimization: For \(n\) types, uses \(\lceil\log_2 n\rceil\) bits.

Optional Types

// Optional with 1-bit overhead
PackedOptional<Packed<int, EliasGamma>> maybe_value = 42;
PackedOptional<Packed<int, EliasGamma>> no_value = std::nullopt;

Encoding: 1 bit for presence + value if present.

\[ L(\text{optional}) = \begin{cases} 1 & \text{if none} \\ 1 + L(\text{value}) & \text{if some} \end{cases} \]

Recursive Types

// Linked list in compressed form
using PackedList = PackedOptional<
    PackedPair<
        Packed<int, EliasGamma>,
        PackedList  // Recursive!
    >
>;

PackedList list = {{1, {{2, {{3, std::nullopt}}}}}};

PFC handles recursive structures through lazy evaluation and content-addressable storage.

STL Integration: Zero-Copy Iteration

Random Access Container

template<typename T, Codec C>
class PackedContainer {
    std::vector<uint8_t> bytes_;
    std::vector<size_t> offsets_;  // O(1) random access

public:
    class Iterator {
        // Proxy value that decodes on access
        T operator*() const {
            BitReader reader(bytes_, offset_);
            return C::decode(reader);
        }
    };

    // STL-compliant interface
    Iterator begin();
    Iterator end();
    T operator[](size_t i);
    size_t size() const;
};

Key innovation: Offset vector enables O(1) random access without decompressing everything.

STL Algorithm Compatibility

PackedContainer<uint32_t, EliasGamma> data;
// ... fill data ...

// Works with STL algorithms!
auto max = *std::max_element(data.begin(), data.end());
auto sum = std::accumulate(data.begin(), data.end(), 0);
auto it = std::find(data.begin(), data.end(), 42);
std::sort(data.begin(), data.end());

Cost model:

Random access: O(1) via offsets + decode time
Sequential access: Amortized O(1) by caching decoder state
Sorting: O(n log n) comparisons, each requiring decode

Numeric Codecs

Configurable Floating Point

// 16-bit float (1 sign + 5 exponent + 10 mantissa)
using Float16 = FloatingPoint<16, 5>;

// BFloat16 (1 sign + 8 exponent + 7 mantissa)
using BFloat16 = FloatingPoint<16, 8>;

// Custom precision
using Float24 = FloatingPoint<24, 7>;  // 17-bit mantissa

Encoding: IEEE-style format with configurable precision.

Precision/size trade-off:

Format	Bits	Relative Error
Float16	16	\(2^{-10} \approx 0.1\%\)
BFloat16	16	\(2^{-7} \approx 0.8\%\)
Float32	32	\(2^{-23} \approx 10^{-7}\)
Float64	64	\(2^{-52} \approx 10^{-16}\)

Rational Numbers

Packed<Rational, RationalCodec<EliasGamma>> ratio{22, 7};  // π approximation

// Encodes numerator and denominator separately
// Optional: continued fraction approximation for compression

Exact representation: No floating-point rounding errors.

Complex Numbers

// Rectangular form
PackedComplex<FloatingPoint<32>> z{3.0, 4.0};  // 3 + 4i

// Polar form (often more compact for unit circle)
PackedComplexPolar<FloatingPoint<16>, EliasGamma> w{1.0, M_PI/4};

Signed Integer Encoding: Zigzag

// Signed integers via zigzag encoding
Packed<int, Signed<EliasGamma>> value = -5;

Zigzag mapping:

\[ \text{zigzag}(n) = \begin{cases} 2n & \text{if } n \geq 0 \\ -2n - 1 & \text{if } n < 0 \end{cases} \]

Maps: \(0 \to 0, -1 \to 1, 1 \to 2, -2 \to 3, 2 \to 4, \ldots\)

Benefit: Small magnitude values (positive or negative) have short encodings.

Type Erasure for Runtime Polymorphism

// Store different packed types in same container
TypeErasedPackedContainer container;

container.push_back(PackedInt(42));
container.push_back(PackedRational(22, 7));
container.push_back(PackedString("hello"));
container.push_back(PackedOptional<PackedInt>(std::nullopt));

// Iterate heterogeneously
for (const auto& item : container) {
    // Dynamic dispatch based on stored type
    std::visit([](const auto& value) {
        std::cout << value << "\n";
    }, item);
}

Implementation: External polymorphism pattern—virtual methods operate on opaque byte buffers.

Key property: Preserves zero-copy semantics internally despite runtime polymorphism.

Algorithms on Packed Data

Zero-Copy Transform

template<Codec C, typename Func>
auto packed_transform(
    PackedContainer<T, C>& input,
    Func f
) -> PackedContainer<decltype(f(T{})), C> {
    PackedContainer<decltype(f(T{})), C> result;
    for (const auto& value : input) {
        result.push_back(f(value));  // Decode, transform, re-encode
    }
    return result;
}

Parallel Processing

#include <execution>

PackedContainer<double, FloatingPoint<16>> data(1000000);
// ... fill with data ...

// Parallel transform with C++17 execution policies
auto squared = packed_transform(
    std::execution::par_unseq,
    data.begin(), data.end(),
    [](double x) { return x * x; }
);

Performance: Decompression can be parallelized trivially—each thread operates on independent segments.

Performance Analysis

Compression Ratios

For typical distributions:

Distribution	Codec	Bits/Value	Compression
Geometric	Elias Gamma	~3.3	9.7×
Uniform [1,100]	Elias Gamma	~10	3.2×
Power law	Fibonacci	~4.5	7.1×
Sorted sequence	Delta + Gamma	~2.8	11.4×

Baseline: 32-bit integers = 32 bits/value

Access Overhead

Operation	Time (ns)	Notes
Decode Elias Gamma	6.9	Single value
Decode Fibonacci	8.3	Single value
Random access	11.3	Offset lookup + decode
Sequential access	7.2	Amortized with decoder state caching

Comparison: Raw memory access ≈ 2-3 ns

Trade-off: 3-4× slower access for 3-10× compression.

Memory Layout

// Traditional: 32 bits × 1000 = 4000 bytes
std::vector<uint32_t> traditional(1000);

// PFC: ~400 bytes (compressed) + 8000 bytes (offsets) = 8400 bytes
PackedContainer<uint32_t, EliasGamma> packed(1000);

Problem: Offset vector overhead can dominate for small containers.

Solution: Sparse offset vectors (only every N-th element) for large containers:

// Store offset every 64 elements
PackedContainer<uint32_t, EliasGamma, 64> packed(1000000);
// Offsets: 1000000/64 × 8 = 125KB
// Data: ~400KB
// Total: 525KB vs 4MB raw

Design Principles: Stepanov’s Generic Programming

PFC follows Alex Stepanov’s principles:

1. Regular Types

All types are:

Copyable: auto x = y;
Assignable: x = y;
Comparable: x == y, x < y
Default constructible: T x;

PackedInt a = 42;
PackedInt b = a;      // Copy
a = 10;               // Assign
bool same = (a == b); // Compare
PackedInt c;          // Default construct

2. Value Semantics

Objects behave like values, not references:

PackedInt a = 5;
PackedInt b = a;
a = 10;
// b is still 5 (not affected by change to a)

3. Separation of Concerns

Orthogonal components:

Codecs: How to encode/decode
Containers: Where to store bytes
Algorithms: What operations to perform

// Mix and match
PackedContainer<int, EliasGamma> gamma_container;
PackedContainer<int, Fibonacci> fib_container;

// Same algorithms work on both
std::sort(gamma_container.begin(), gamma_container.end());
std::sort(fib_container.begin(), fib_container.end());

4. Zero-Cost Abstractions

Compile-time decisions:

if constexpr (Codec::is_fixed_size()) {
    // Optimized path for fixed-size codecs
} else {
    // General path with offset tracking
}

No runtime virtual dispatch unless explicitly requested (type erasure).

5. Mathematical Foundation

Based on algebraic structures:

Monoids: Concatenation of encodings
Functors: map over packed containers
Algebras: Sum and product types

Custom Codec Example

struct DeltaCodec {
    uint64_t last_value = 0;

    template<BitSink S>
    void encode(uint64_t value, S& sink) {
        uint64_t delta = value - last_value;
        EliasGamma::encode(delta, sink);
        last_value = value;
    }

    template<BitSource S>
    uint64_t decode(S& source) {
        uint64_t delta = EliasGamma::decode(source);
        last_value += delta;
        return last_value;
    }

    static constexpr bool is_fixed_size() { return false; }
};

Use case: Sorted sequences (timestamps, IDs) where deltas are small.

Practical Applications

1. Memory-Constrained Embedded Systems

// Store sensor data in compressed form
PackedContainer<float, FloatingPoint<16>> sensor_readings;

// 50% memory savings with acceptable precision

2. Network Transmission

PackedContainer<uint32_t, EliasGamma> data;
// ... fill ...

// Send compressed bytes directly
send(socket, data.bytes().data(), data.bytes().size());

// Receiver decodes on access
uint32_t value = data[i];

Benefit: No serialization/deserialization overhead.

3. Database Storage

// Column store with per-column compression
struct ColumnStore {
    PackedContainer<int, DeltaCodec> ids;         // Sorted
    PackedContainer<double, FloatingPoint<16>> prices;
    PackedContainer<std::string, PrefixedString> names;
};

4. Checkpoint/Resume

// Serialize program state
PackedTuple<
    Packed<int, EliasGamma>,
    PackedList<Packed<double, Float32>>,
    PackedOptional<PackedString>
> checkpoint = make_checkpoint();

// Write directly to disk
write(fd, checkpoint.bytes(), checkpoint.size());

Limitations

1. Random access overhead: Offset vectors can be large

Solution: Sparse offsets for large containers

2. Update performance: Inserting requires re-encoding

Solution: Use for append-heavy or read-heavy workloads

3. Fixed codecs: Can’t adapt to data distribution dynamically

Solution: Choose codec based on expected distribution

Future Work

Huffman coding: Adaptive to actual data distribution
Arithmetic coding: Asymptotically optimal
SIMD acceleration: Parallel decoding of multiple values
Compression-aware algorithms: Exploit encoding structure
Lazy offset construction: Build offsets on-demand

Comparison with Other Systems

System	Approach	Zero-Copy	STL-compatible	Type-safe
PFC	Intrinsic encoding	✅	✅	✅
zlib	External compression	❌	❌	❌
Protocol Buffers	Schema-based	❌	❌	✅
FlatBuffers	Zero-copy access	✅	❌	✅
Cap’n Proto	Zero-copy RPC	✅	❌	✅

PFC’s niche: Zero-copy + STL compatibility + type safety + flexible compression.

Quick Start

# Header-only library
git clone https://github.com/queelius/pfc.git
cd pfc
mkdir build && cd build
cmake .. -DCMAKE_CXX_STANDARD=20
cmake --build .
./pfc_tests

Basic usage:

#include <pfc/pfc.hpp>
using namespace pfc;

// Compress integers
std::vector<uint32_t> data = {1, 2, 3, 5, 8, 13, 21};
auto compressed = compress<EliasGamma>(data);

// Decompress
auto restored = decompress<EliasGamma, uint32_t>(compressed);

// Or use packed container directly
PackedContainer<uint32_t, EliasGamma> container;
for (uint32_t value : data) {
    container.push_back(value);
}

// Access like normal container
uint32_t first = container[0];
std::sort(container.begin(), container.end());

Design Philosophy

🎯 Compression as Type Property: Not an external operation, but intrinsic to the type

0️⃣ Zero-Copy Invariant: In-memory = wire format

🧩 Composability: Prefix-free codes enable natural composition

📐 Mathematical Rigor: Based on universal coding theory

🔧 Generic Programming: Follows Stepanov’s principles

⚡ Zero-Cost Abstractions: Compile-time optimizations

Resources

Repository: github.com/queelius/pfc
Whitepaper: papers/whitepaper.pdf (16 pages, academic format)
Examples: See examples/ directory
Tests: Comprehensive test suite in tests/

License

MIT

PFC: Where compression becomes intrinsic. Zero-copy data structures through prefix-free codes and generic programming.