pfc
A header-only C++20 library for zero-copy, prefix-free data representations with algebraic types and succinct data structures
Resources & Distribution
Source Code
Package Registries
PFC - Prefix-Free Codecs
A header-only C++20 library for zero-copy, prefix-free data representations with full algebraic type support and STL integration.
Documentation | API Reference | Examples
Features
- Zero-copy invariant: In-memory bytes = wire bytes (no marshaling)
- Prefix-free codes: Self-delimiting, streaming-friendly
- Algebraic type system: Sum types, product types, recursive types
- STL integration: Works seamlessly with standard algorithms
- Rich codec library: Universal codes, floating point, rationals, and more
- Compile-time composition: Modern C++ concepts and templates
- Mathematical rigor: Based on Stepanov’s generic programming principles
Quick Start
#include <pfc/pfc.hpp>
using namespace pfc;
// Simple compression
std::vector<uint32_t> data = {1, 2, 3, 5, 8, 13, 21};
auto compressed = compress<EliasGamma>(data);
auto decompressed = decompress<EliasGamma, uint32_t>(compressed);
// Algebraic types
using OptionalInt = PackedOptional<Packed<int, Signed<EliasGamma>>>;
OptionalInt some_value = 42;
OptionalInt no_value = std::nullopt;
// STL integration
PackedContainer<uint32_t, EliasGamma> container;
container.push_back(100);
container.push_back(200);
std::sort(container.begin(), container.end());
Building
mkdir build && cd build
cmake .. -DCMAKE_CXX_STANDARD=20
cmake --build .
# Run tests
./pfc_tests
# Run examples
./tutorial
Test Coverage
The library is thoroughly tested with 86% test suite pass rate (6/7 suites passing, 3477+ assertions):
- Core Tests (
pfc_tests): All codecs, packed values, bit I/O, basic algorithms - Advanced Tests (
test_advanced): Algebraic types, STL integration, coordinate systems - Production Tests (
test_production): Huffman, LZ77, error handling, custom allocators - CRC Tests (
test_crc): CRC8, CRC16, CRC32 checksums - Succinct Tests (
test_succinct): SuccinctBitVector with rank/select (3167 assertions) - New Codecs Tests (
test_new_codecs): VByte, ExpGolomb, EliasOmega (2000 assertions) - Stream/Arithmetic Tests (
test_stream_arithmetic): Stream I/O and arithmetic coding (partial)
Library Structure
Core Components (include/pfc/)
core.hpp: Bit I/O (BitReader/BitWriter), concepts, fundamental abstractionscodecs.hpp: Universal codes (Elias, Fibonacci, Rice, VByte, ExpGolomb, Omega)numeric_codecs.hpp: Floating point, rationals, complex numberspacked.hpp: Packed value types and containersalgebraic.hpp: Sum/product types, Maybe, Either, Resultstl_integration.hpp: STL-compatible containers and iteratorscoordinates.hpp: Vectors, matrices, affine transformationsalgorithms.hpp: Generic algorithms for packed dataerror_handling.hpp: Result types for robust error handlinghuffman.hpp: Huffman coding with frequency analysislz77.hpp: LZ77 compression algorithmcrc.hpp: CRC8, CRC16, CRC32 checksumsstream_io.hpp: Stream-based I/O operationsarithmetic_coding.hpp: Arithmetic coding for compression (production-ready)succinct.hpp: SuccinctBitVector with rank/select supportallocator_support.hpp: Custom allocator supportpfc.hpp: Main header including everything
Codecs Available
Universal Integer Codes
- Unary: Simple unary encoding
- Elias Gamma/Delta/Omega: Asymptotically optimal universal codes
- Fibonacci: Based on Fibonacci sequence
- Rice/Golomb: Parametric codes for geometric distributions
- VByte: Industry-standard varint (Protocol Buffers compatible)
- Exponential-Golomb: Parameterized family for video coding (H.264, HEVC)
Numeric Types
- FloatingPoint: Configurable mantissa/exponent (Float16, BFloat16, Float32, Float64)
- FixedDecimal: Exact decimal representation
- Rational: Fractions with continued fraction approximation
- Complex: Both rectangular and polar forms
Composite Types
- Signed: Zigzag encoding for signed integers
- Optional/Maybe: Nullable values
- Either/Result: Error handling
- Vector/List: Variable-length sequences
- Variant: Type-safe discriminated unions
Design Principles
This library follows Alex Stepanov’s generic programming principles:
- Regular types: All types are copyable, assignable, and comparable
- Value semantics: Objects behave like values, not references
- Separation of concerns: Algorithms, containers, and codecs are orthogonal
- Zero-cost abstractions: Template metaprogramming, no virtual dispatch
- Mathematical foundation: Based on algebraic structures
Advanced Usage
Custom Codecs
struct MyCodec {
static constexpr bool is_fixed_size() { return false; }
static constexpr size_t max_encoded_bits() { return 64; }
template<BitSink S>
static void encode(uint64_t value, S& sink) {
// Your encoding logic
}
template<BitSource S>
static uint64_t decode(S& source) {
// Your decoding logic
}
};
Type-Erased Containers
// Store different packed types in the same container
TypeErasedPackedContainer container;
container.push_back(PackedInt(42));
container.push_back(PackedRational(22, 7));
container.push_back(PackedString("hello"));
Parallel Processing
PackedContainer<double, FloatingPoint<16>> data(1000000);
// ... fill with data ...
// Parallel transform with TBB/execution policies
auto squared = packed_transform(std::execution::par_unseq,
data.begin(), data.end(),
[](double x) { return x * x; });
Performance
The library achieves excellent compression ratios while maintaining zero-copy semantics:
- Geometric distributions: ~1.7 bits/value with Elias Gamma
- Sorted sequences: 65%+ improvement with delta encoding
- Floating point: Configurable precision/size tradeoff
- All operations are zero-copy with decode-on-access via proxies
- Succinct structures: n + 0.2%n bits with O(1) rank/select
Compression Comparison (10,000 geometric values)
| Codec | Compression | Use Case |
|---|---|---|
| Elias Gamma | 94.8% | Small values, geometric distributions |
| VByte | 75% | Moderate values, cache-friendly |
| Elias Omega | 94.7% | Large integers (>1M) |
| ExpGolomb | 91.9% | Tunable for specific distributions |
Documentation
Full documentation is available at https://queelius.github.io/pfc/
Contributing
Contributions are welcome! The library is designed to be extended with:
- New codecs
- Additional algebraic types
- More generic algorithms
- Performance optimizations
See the Contributing Guide for details.
License
MIT License - see LICENSE file for details.
Production Status
Production-Ready Components
- All universal codes (Elias, Fibonacci, Rice, VByte, ExpGolomb)
- Arithmetic coding (fully functional)
- Huffman coding
- LZ77 compression
- SuccinctBitVector with rank/select
- All STL integration features
- All algebraic types
Experimental/Limited Components
- Range coding: Works for short sequences only (2-3 bytes)
- Recommendation: Use arithmetic coding for production workloads
- Technical limitation: Integer division rounding in encode/decode cycle
- Alternative: Consider ANS (Asymmetric Numeral Systems) for future implementation
Known Limitations
- Range Coding: Only reliable for sequences up to 3 bytes due to mathematical rounding issues in the encode/decode cycle. Use arithmetic coding instead, which is fully functional.
- Floating Point Precision: Some edge cases may have precision issues with very small or very large values.
Acknowledgments
Inspired by the work of Alex Stepanov, the design prioritizes mathematical elegance and efficiency. Special thanks to the authors of universal coding theory and the C++ standards committee for concepts and ranges.