Succinct Data Structures¶
The succinct.hpp header provides space-efficient data structures that support powerful query operations in constant or logarithmic time while maintaining the zero-copy invariant.
Overview¶
Succinct data structures achieve near-optimal space complexity (n + o(n) bits) while providing efficient query operations. The PFC library implements these structures with its signature zero-copy architecture: wire format equals in-memory format.
SuccinctBitVector¶
A space-efficient bit vector supporting fast rank and select operations.
Space Complexity¶
- Bit data: n bits (exact)
- Rank support: ~0.2% overhead (n/512 + n/65536 bits)
- Total: n + 0.2%n bits
Time Complexity¶
| Operation | Complexity | Description |
|---|---|---|
operator[] |
O(1) | Access bit at position |
rank(pos) |
O(1) | Count 1s in [0, pos) |
select(k) |
O(log n) | Find position of k-th 1 |
set/reset/flip |
O(1) | Modify individual bits |
build_rank_support() |
O(n) | Build rank/select metadata |
Construction¶
#include <pfc/succinct.hpp>
using namespace pfc::succinct;
// Create bit vector with 1000 bits
SuccinctBitVector bv(1000);
// Set some bits
for (size_t i = 0; i < 1000; i += 7) {
bv.set(i); // Set every 7th bit
}
// Build rank/select support (required for rank/select operations)
bv.build_rank_support();
Query Operations¶
// Access individual bits
bool bit = bv[42]; // O(1) access
// Rank: count number of 1s before position
size_t ones_before_100 = bv.rank(100); // Count 1s in [0, 100)
// Select: find position of k-th 1
size_t tenth_one = bv.select(10); // Find position of 10th 1-bit
// Rank at end gives total count
size_t total_ones = bv.rank(bv.size());
Zero-Copy Serialization¶
The key advantage of PFC's succinct structures is instant serialization/deserialization:
// Serialize to buffer
uint8_t buffer[10000];
BitWriter writer(buffer);
SuccinctBitVector::encode(bv, writer);
writer.align();
// Deserialize from buffer (zero-copy!)
BitReader reader(buffer, 10000);
auto bv2 = SuccinctBitVector::decode(reader);
// Works immediately - no build step needed!
size_t rank = bv2.rank(100); // Instant query
Traditional Approach:
PFC Approach:
Block-Based Rank Support¶
SuccinctBitVector uses a two-level indexing structure for O(1) rank queries:
Architecture¶
Superblock (65536 bits = 8KB)
├── Block (512 bits)
├── Block (512 bits)
├── Block (512 bits)
└── ...
Metadata:
- Superblocks: Absolute rank every 65536 bits (32-bit integer)
- Blocks: Relative rank within superblock (16-bit integer)
Space Overhead Calculation¶
Superblock overhead: 32 bits / 65536 bits = 0.049%
Block overhead: 16 bits / 512 bits = 3.125%
Average: (65536 / 65536) * 0.049% + (512 / 512) * 3.125% / 128 = 0.024%
Total overhead: ~0.2% (including alignment)
Modification Operations¶
SuccinctBitVector bv(100);
// Set bits
bv.set(42); // Set bit 42 to 1
bv.set(43, true); // Explicit set to 1
bv.set(44, false); // Set to 0 (same as reset)
// Reset bits
bv.reset(42); // Set bit 42 to 0
// Flip bits
bv.flip(42); // Toggle bit 42
// Note: After modifications, rebuild rank support
bv.build_rank_support();
Size and Capacity¶
SuccinctBitVector bv(1000);
size_t n = bv.size(); // Number of bits (1000)
size_t bytes = bv.byte_size(); // Memory size in bytes
bool empty = bv.empty(); // Check if size is 0
API Reference¶
Constructor¶
Creates a bit vector with num_bits bits, all initialized to initial_value.
Bit Access¶
Returns the value of the bit at position pos. O(1) time.
Bit Modification¶
set(pos, value): Set bit at position to value (default true)reset(pos): Set bit at position to falseflip(pos): Toggle bit at position
Note: Rank support must be rebuilt after modifications.
Rank Support¶
Builds the rank/select metadata structure. Must be called after construction or modification and before using rank() or select() operations. O(n) time, 0.2% space overhead.
Rank Operation¶
Returns the number of 1-bits in the range [0, pos). O(1) time.
Special cases:
- rank(0) returns 0
- rank(size()) returns total count of 1s
- rank(pos) where pos > size() returns total count of 1s
Select Operation¶
Returns the position of the k-th 1-bit (0-indexed). O(log n) time using binary search.
Returns size_t(-1) if fewer than k+1 ones exist.
Size Operations¶
size(): Number of bits in the vectorbyte_size(): Memory footprint in bytesempty(): True if size is 0
Zero-Copy Encoding/Decoding¶
template<BitSink S>
static void encode(const SuccinctBitVector& bv, S& sink);
template<BitSource S>
static SuccinctBitVector decode(S& source);
Encodes or decodes a SuccinctBitVector to/from a bit stream. The decoded structure is immediately queryable without any build step.
Use Cases¶
1. Sparse Sets¶
Represent sparse sets of integers efficiently:
// Represent set {7, 14, 21, 28, ...} in universe [0, 1000)
SuccinctBitVector set(1000);
for (size_t i = 7; i < 1000; i += 7) {
set.set(i);
}
set.build_rank_support();
// Check membership
bool contains_42 = set[42]; // false
bool contains_49 = set[49]; // true
// Count elements before position
size_t elements_before_100 = set.rank(100); // 14 elements
// Find k-th element
size_t fifth_element = set.select(4); // Position 28 (0-indexed)
2. Bitmap Indices¶
Efficient filtering in databases:
// Index for "active users" in database of 1M users
SuccinctBitVector active_users(1000000);
// Mark active users
for (auto user_id : active_user_ids) {
active_users.set(user_id);
}
active_users.build_rank_support();
// Query: How many active users have ID < 500000?
size_t count = active_users.rank(500000);
// Query: What is the 1000th active user ID?
size_t user_id = active_users.select(999);
3. Compressed Graphs¶
Building block for succinct graph representations:
// Mark which edges exist in a graph
// For n vertices, adjacency represented as n^2 bit vector
size_t n = 1000;
SuccinctBitVector adjacency(n * n);
// Add edges
auto add_edge = [&](size_t u, size_t v) {
adjacency.set(u * n + v);
};
add_edge(0, 1);
add_edge(0, 5);
// ... more edges ...
adjacency.build_rank_support();
// Query: How many edges from vertex 0?
size_t degree = adjacency.rank((0 + 1) * n) - adjacency.rank(0 * n);
4. Zero-Copy Persistence¶
Instant loading of pre-built data structures:
// Server startup: load pre-built bit vector from file
FILE* fp = fopen("index.pfc", "rb");
std::vector<uint8_t> data = read_all(fp);
BitReader reader(data.data(), data.size());
auto index = SuccinctBitVector::decode(reader);
// Immediately usable - no parsing or build time!
size_t count = index.rank(position);
Performance Characteristics¶
Memory Layout¶
┌─────────────────────────────────────────────────────┐
│ Bit Vector Data (n bits) │
├─────────────────────────────────────────────────────┤
│ Superblock Ranks (32-bit integers, n/65536 entries)│
├─────────────────────────────────────────────────────┤
│ Block Ranks (16-bit integers, n/512 entries) │
└─────────────────────────────────────────────────────┘
Total: n + (n/65536)*32 + (n/512)*16 bits
= n + 0.049%n + 3.125%n bits
≈ n + 0.2%n bits (with alignment)
Cache Efficiency¶
- Block size: 512 bits = 64 bytes (one cache line)
- Rank query: At most 2 cache line accesses (1 superblock + 1 block)
- Sequential access: Excellent cache locality
Comparison with Alternatives¶
| Approach | Space | Rank Time | Select Time | Build Time |
|---|---|---|---|---|
| Naive (std::vector |
n bits | O(n) | O(n) | O(1) |
| Dense rank array | 32n bits | O(1) | O(1) | O(n) |
| SuccinctBitVector | n + 0.2%n | O(1) | O(log n) | O(n) |
Integration with PFC¶
SuccinctBitVector implements the PackedValue concept and integrates seamlessly:
// Use in PackedVector
PackedVector<SuccinctBitVector> bitvector_collection;
bitvector_collection.push_back(bv1);
bitvector_collection.push_back(bv2);
// Use in PackedOptional
PackedOptional<SuccinctBitVector> maybe_bv = bv;
// Compose with other codecs
using CompactBitVec = Packed<SuccinctBitVector, codecs::EliasGamma>;
Future Succinct Structures¶
The SuccinctBitVector serves as the foundation for additional succinct structures planned:
- RoaringBitmap: Hybrid compressed integer sets
- WaveletTree: Sequence rank/select on arbitrary alphabets
- InterpolativeCoding: Optimal compression for sorted sequences
- PackedGraph: Succinct graph representations
- Rank-Select Dictionary: For very sparse bit vectors
See SUCCINCT_DESIGN.md for detailed design documentation.
References¶
- Jacobson, G. (1989). "Space-efficient static trees and graphs"
- Clark, D. (1996). "Compact Pat Trees"
- Vigna, S. (2008). "Broadword Implementation of Rank/Select Queries"
See Also¶
- Core API - BitReader/BitWriter fundamentals
- Codecs - Universal integer codes
- Packed Types - PackedValue concept