PFC (Prefix-Free Codecs) is a header-only C++20 library built on a simple observation: data compression and zero-copy access are not contradictory goals, as long as you build on prefix-free codes and generic programming. The library gets 3-10x compression on typical integer distributions while maintaining full STL integration and type safety.
The zero-copy invariant
Traditional compression creates two worlds. Data lives uncompressed in memory (32 bits per integer) and compressed on disk (variable bits). You marshal between them. PFC eliminates that boundary:
\[ \text{In-memory representation} = \text{Wire representation} \]// Traditional approach
std::vector<uint32_t> data = {1, 2, 3, 5, 8, 13};
auto compressed = compress(data); // Marshal to wire format
store_to_disk(compressed);
auto restored = decompress(compressed); // Unmarshal back
// PFC approach
PackedContainer<uint32_t, EliasGamma> data;
data.push_back(1);
data.push_back(2);
data.push_back(3);
// Data is ALREADY compressed in memory
uint32_t value = data[0]; // Decodes from compressed form on access
// Write to disk? Zero copy.
write(fd, data.bytes().data(), data.bytes().size());
The data structure IS the compressed format. There’s no marshaling step.
Prefix-free codes
A code is prefix-free if no codeword is a prefix of another:
Prefix-free: Not prefix-free:
1 -> 0 1 -> 0
2 -> 10 2 -> 01 ("0" is a prefix of "01")
3 -> 110 3 -> 011
4 -> 1110
This matters because prefix-free codes compose naturally. Concatenate encodings without delimiters and decode unambiguously:
encode(1); // 0
encode(2); // 10
encode(3); // 110
// Result: 010110 (self-delimiting)
decode(); // Reads "0" -> 1
decode(); // Reads "10" -> 2
decode(); // Reads "110" -> 3
This enables streaming and composition without any framing overhead.
Universal codes
PFC implements several universal codes, meaning they’re asymptotically optimal for any distribution without knowing the distribution in advance.
Elias Gamma encodes positive integer \(n\) in \(2\lfloor\log_2 n\rfloor + 1\) bits:
n Binary Elias Gamma
1 1 1
2 10 010
3 11 011
4 100 00100
5 101 00101
8 1000 0001000
Write \(\lfloor\log_2 n\rfloor\) zeros, then the binary representation of \(n\). Asymptotically optimal for geometric distributions.
Fibonacci encoding uses Zeckendorf representation (sum of non-consecutive Fibonacci numbers) with a terminal “11” marker. Every positive integer has a unique representation.
Rice/Golomb codes are parametric, optimal for geometric distributions with known parameter. Quotient in unary, remainder in binary. Good for values with exponential decay.
Core architecture
Everything works at the bit level:
BitWriter writer;
writer.write_bit(1);
writer.write_bits(0b1010, 4);
writer.flush();
BitReader reader(bytes);
bool bit = reader.read_bit();
uint8_t nibble = reader.read_bits(4);
Codecs conform to a C++20 concept:
template<typename C>
concept Codec = requires(C codec, uint64_t value, BitWriter& writer, BitReader& reader) {
{ codec.encode(value, writer) } -> std::same_as<void>;
{ codec.decode(reader) } -> std::same_as<uint64_t>;
{ C::is_fixed_size() } -> std::same_as<bool>;
{ C::max_encoded_bits() } -> std::same_as<size_t>;
};
The Packed<T, Codec> template stores data in compressed form and decodes on access:
template<typename T, Codec C>
class Packed {
std::vector<uint8_t> bytes_;
public:
void set(T value) {
BitWriter writer(bytes_);
C::encode(value, writer);
}
T get() const {
BitReader reader(bytes_);
return C::decode(reader);
}
};
Algebraic type system
The prefix-free property means encodings compose without delimiters, so you get algebraic types for free.
Product types (pairs, tuples): concatenate encodings.
using Point = PackedPair<
Packed<int, Signed<EliasGamma>>,
Packed<int, Signed<EliasGamma>>
>;
Point p{10, 20}; // Compressed in memory
auto [x, y] = p; // Decode on access
Sum types (variants): discriminator bits + value encoding. For \(n\) types, uses \(\lceil\log_2 n\rceil\) bits for the discriminator.
using Value = PackedVariant<
Packed<int, EliasGamma>,
Packed<double, FloatingPoint<32>>,
Packed<std::string, PrefixedString>
>;
Optional types: 1-bit overhead.
\[ L(\text{optional}) = \begin{cases} 1 & \text{if none} \\ 1 + L(\text{value}) & \text{if some} \end{cases} \]Recursive types (linked lists, trees): handled through lazy evaluation and content-addressable storage.
using PackedList = PackedOptional<
PackedPair<
Packed<int, EliasGamma>,
PackedList // Recursive
>
>;
STL integration
PackedContainer provides random access via an offset vector:
template<typename T, Codec C>
class PackedContainer {
std::vector<uint8_t> bytes_;
std::vector<size_t> offsets_; // O(1) random access
public:
class Iterator {
T operator*() const {
BitReader reader(bytes_, offset_);
return C::decode(reader);
}
};
Iterator begin();
Iterator end();
T operator[](size_t i);
size_t size() const;
};
Standard algorithms work:
PackedContainer<uint32_t, EliasGamma> data;
auto max = *std::max_element(data.begin(), data.end());
auto sum = std::accumulate(data.begin(), data.end(), 0);
auto it = std::find(data.begin(), data.end(), 42);
std::sort(data.begin(), data.end());
Random access is O(1) via offsets plus decode time. Sequential access amortizes with decoder state caching. Sorting is O(n log n) comparisons, each requiring a decode.
Numeric codecs
Configurable floating point with IEEE-style format:
using Float16 = FloatingPoint<16, 5>; // 1 sign + 5 exponent + 10 mantissa
using BFloat16 = FloatingPoint<16, 8>; // 1 sign + 8 exponent + 7 mantissa
using Float24 = FloatingPoint<24, 7>; // Custom precision
| Format | Bits | Relative Error |
|---|---|---|
| Float16 | 16 | \(2^{-10} \approx 0.1\%\) |
| BFloat16 | 16 | \(2^{-7} \approx 0.8\%\) |
| Float32 | 32 | \(2^{-23} \approx 10^{-7}\) |
| Float64 | 64 | \(2^{-52} \approx 10^{-16}\) |
Signed integers use zigzag encoding to keep small-magnitude values short:
\[ \text{zigzag}(n) = \begin{cases} 2n & \text{if } n \geq 0 \\ -2n - 1 & \text{if } n < 0 \end{cases} \]Maps: \(0 \to 0, -1 \to 1, 1 \to 2, -2 \to 3, 2 \to 4, \ldots\)
Custom codecs
Writing a codec is straightforward. Here’s delta encoding for sorted sequences:
struct DeltaCodec {
uint64_t last_value = 0;
template<BitSink S>
void encode(uint64_t value, S& sink) {
uint64_t delta = value - last_value;
EliasGamma::encode(delta, sink);
last_value = value;
}
template<BitSource S>
uint64_t decode(S& source) {
uint64_t delta = EliasGamma::decode(source);
last_value += delta;
return last_value;
}
static constexpr bool is_fixed_size() { return false; }
};
Good for timestamps, sorted IDs, anything where consecutive deltas are small.
Performance
Compression ratios for typical distributions:
| Distribution | Codec | Bits/Value | Compression |
|---|---|---|---|
| Geometric | Elias Gamma | ~3.3 | 9.7x |
| Uniform [1,100] | Elias Gamma | ~10 | 3.2x |
| Power law | Fibonacci | ~4.5 | 7.1x |
| Sorted sequence | Delta + Gamma | ~2.8 | 11.4x |
Baseline: 32-bit integers = 32 bits/value.
Access overhead:
| Operation | Time (ns) | Notes |
|---|---|---|
| Decode Elias Gamma | 6.9 | Single value |
| Decode Fibonacci | 8.3 | Single value |
| Random access | 11.3 | Offset lookup + decode |
| Sequential access | 7.2 | Amortized with caching |
Raw memory access is about 2-3ns, so you’re paying 3-4x slower access for 3-10x compression. Whether that trade-off is worth it depends entirely on your bottleneck. If you’re memory-bound, smaller data means fewer cache misses, and the net effect can be positive.
One thing to watch: offset vector overhead. A packed container of 1000 elements with Elias Gamma encoding uses about 400 bytes of compressed data plus 8000 bytes of offsets. For small containers, the offsets dominate. For large containers, use sparse offsets (store every Nth offset):
PackedContainer<uint32_t, EliasGamma, 64> packed(1000000);
// Offsets: 1000000/64 * 8 = 125KB
// Data: ~400KB
// Total: 525KB vs 4MB raw
Design philosophy
PFC follows Stepanov’s generic programming principles: regular types (copyable, assignable, comparable, default constructible), value semantics, orthogonal decomposition of codecs, containers, and algorithms. Everything composes.
The mathematical foundation is algebraic: monoids for concatenation of encodings, functors for mapping over packed containers, sum and product types from the algebraic type system.
Zero-cost abstractions throughout. Fixed-size codecs take optimized paths at compile time via if constexpr. No virtual dispatch unless you explicitly opt into type erasure.
Comparison
| System | Approach | Zero-Copy | STL-compatible | Type-safe |
|---|---|---|---|---|
| PFC | Intrinsic encoding | Yes | Yes | Yes |
| zlib | External compression | No | No | No |
| Protocol Buffers | Schema-based | No | No | Yes |
| FlatBuffers | Zero-copy access | Yes | No | Yes |
| Cap’n Proto | Zero-copy RPC | Yes | No | Yes |
PFC occupies a specific niche: zero-copy plus STL compatibility plus type safety plus flexible compression. If you don’t need all four, simpler tools exist.
Quick start
git clone https://github.com/queelius/pfc.git
cd pfc
mkdir build && cd build
cmake .. -DCMAKE_CXX_STANDARD=20
cmake --build .
./pfc_tests
#include <pfc/pfc.hpp>
using namespace pfc;
std::vector<uint32_t> data = {1, 2, 3, 5, 8, 13, 21};
auto compressed = compress<EliasGamma>(data);
auto restored = decompress<EliasGamma, uint32_t>(compressed);
// Or use packed container directly
PackedContainer<uint32_t, EliasGamma> container;
for (uint32_t value : data) {
container.push_back(value);
}
uint32_t first = container[0];
std::sort(container.begin(), container.end());
Resources
- github.com/queelius/pfc
- Whitepaper (16 pages)
- Examples in
examples/, tests intests/
MIT licensed.
Discussion