Zero-Overhead Inter-Process Communication for High-Performance Computing
Overview
This library provides production-ready, lock-free data structures built on POSIX shared memory for high-performance inter-process communication (IPC). It is designed for simulations, real-time systems, and high-throughput applications where nanosecond-level latency matters.
Key Features
- Zero read overhead - Proven identical performance to native arrays
- Lock-free operations - Where algorithmically possible
- Auto-discovery - Named data structures findable across processes
- Cache-efficient - Optimized memory layouts for modern CPUs
- Modern C++23 - Concepts, ranges, string_view, [[nodiscard]]
- Configurable overhead - Template-based table sizes from 904B to 26KB
- Battle-tested - Comprehensive test suite with Catch2
Why Shared Memory?
Traditional IPC methods (sockets, pipes, message queues) require:
- Kernel transitions (~1000ns overhead)
- Data copying (2x memory bandwidth)
- Serialization (CPU cycles + allocations)
Shared memory provides:
- Direct memory access (~0.5ns for L1 hit)
- Zero-copy data sharing
- No serialization for POD types
- Cache coherence handled by hardware
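To make the zero-copy point concrete, here is a minimal sketch that uses the raw POSIX calls (shm_open, ftruncate, mmap) rather than this library's wrappers. The Sample struct and the map_sample and unmap_sample helpers are hypothetical names chosen for illustration, and error handling is kept to a minimum.

```cpp
#include <fcntl.h>      // O_CREAT, O_EXCL, O_RDWR
#include <sys/mman.h>   // shm_open, shm_unlink, mmap, munmap
#include <unistd.h>     // ftruncate, close

// Hypothetical POD payload; any trivially copyable type works the same way.
struct Sample {
    double value;
    long   sequence;
};

// Map a named segment holding one Sample, creating and sizing it on first use.
// Returns nullptr on failure. A real implementation would also make openers
// wait until the creator has finished sizing the segment.
Sample* map_sample(const char* name) {
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    const bool created = (fd != -1);
    if (!created) fd = shm_open(name, O_RDWR, 0600);  // already exists: just open
    if (fd == -1) return nullptr;
    if (created && ftruncate(fd, sizeof(Sample)) == -1) {
        close(fd);
        return nullptr;
    }
    void* p = mmap(nullptr, sizeof(Sample), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : static_cast<Sample*>(p);
}

void unmap_sample(Sample* s) { munmap(s, sizeof(Sample)); }
```

Two callers that pass the same name (for example "/demo_sample") get pointers to the same physical pages, so a store through one pointer is visible through the other without any copy or serialization step. Call shm_unlink(name) once the segment is no longer needed. Older glibc releases need -lrt at link time.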
Performance Guarantees
| Operation     | Time Complexity | Actual Performance   |
|---------------|-----------------|----------------------|
| Array Read    | O(1)            | 0.5-2ns (cache hit)  |
| Array Write   | O(1)            | 2-5ns (atomic CAS)   |
| Queue Enqueue | O(1)            | 5-10ns (lock-free)   |
| Queue Dequeue | O(1)            | 5-10ns (lock-free)   |
| Pool Acquire  | O(1)            | 10-20ns (lock-free)  |
| Atomic Update | O(1)            | 2-5ns (hardware)     |
| Discovery     | O(n)            | ~100ns (one-time)    |
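The O(1) enqueue and dequeue rows can be illustrated with a bounded ring buffer driven by two atomic indices. The sketch below is a single-producer, single-consumer variant chosen for brevity. It is not necessarily the algorithm shm_queue uses, and the name spsc_ring is hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>

// Single-producer / single-consumer ring buffer (illustrative only).
// One slot is kept empty so that "full" and "empty" are distinguishable,
// which means at most Capacity - 1 items are stored at once.
template <typename T, std::size_t Capacity>
class spsc_ring {
    static_assert(Capacity >= 2, "need at least two slots");
    T buf_[Capacity]{};
    std::atomic<std::size_t> head_{0};  // next slot to read  (written by consumer)
    std::atomic<std::size_t> tail_{0};  // next slot to write (written by producer)

public:
    // Producer side: returns false when the ring is full.
    bool enqueue(const T& v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (t + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) return false;
        buf_[t] = v;
        tail_.store(next, std::memory_order_release);  // publish the item
        return true;
    }

    // Consumer side: returns std::nullopt when the ring is empty.
    std::optional<T> dequeue() {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return std::nullopt;
        T v = buf_[h];
        head_.store((h + 1) % Capacity, std::memory_order_release);  // free the slot
        return v;
    }
};
```

Each operation does a fixed amount of work: two atomic loads, one element copy, and one atomic store, with no loops or locks. Supporting several producers or consumers requires a different scheme, typically compare-and-swap on the indices or per-slot sequence numbers.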
Architecture
┌─────────────────────────────────────────────────┐
│              Shared Memory Segment              │
├─────────────────────────────────────────────────┤
│  ┌──────────┐                                   │
│  │ RefCount │  Atomic reference counting        │
│  ├──────────┤                                   │
│  │ Table    │  Metadata for discovery           │
│  │ ├─Entry1 │  "sensor_data" → offset, size     │
│  │ ├─Entry2 │  "event_queue" → offset, size     │
│  │ └─...    │                                   │
│  ├──────────┤                                   │
│  │ Array<T> │  Contiguous data                  │
│  ├──────────┤                                   │
│  │ Queue<T> │  Circular buffer + atomics        │
│  ├──────────┤                                   │
│  │ Pool<T>  │  Free list + object storage       │
│  └──────────┘                                   │
└─────────────────────────────────────────────────┘
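The table in the diagram maps a name to an offset and a size inside the segment. A minimal sketch of that idea follows. The entry layout, the 32-byte name limit, the 16-entry capacity, and the names entry, table, and resolve are all assumptions for illustration, not the actual shm_table definition.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical fixed-size table entry: a name plus where the data lives.
struct entry {
    char        name[32];
    std::size_t offset;   // bytes from the start of the segment
    std::size_t size;     // bytes occupied by the structure
    bool        used;
};

struct table {
    static constexpr std::size_t max_entries = 16;
    entry entries[max_entries]{};

    // O(n) linear scan over the entries; returns nullptr if the name is unknown.
    const entry* find(std::string_view name) const {
        for (const auto& e : entries)
            if (e.used && name == e.name) return &e;
        return nullptr;
    }

    // Register a structure. Fails if the name is too long, already taken,
    // or the table is full.
    bool add(std::string_view name, std::size_t offset, std::size_t size) {
        if (name.size() >= sizeof(entry::name) || find(name)) return false;
        for (auto& e : entries) {
            if (!e.used) {
                std::memcpy(e.name, name.data(), name.size());
                e.name[name.size()] = '\0';
                e.offset = offset;
                e.size   = size;
                e.used   = true;
                return true;
            }
        }
        return false;
    }
};

// Turn an entry into a typed pointer, given this process's mapping address.
template <typename T>
T* resolve(void* base, const entry& e) {
    return reinterpret_cast<T*>(static_cast<std::byte*>(base) + e.offset);
}
```

Storing offsets rather than pointers matters because each process may map the segment at a different virtual address. The lookup is a one-time cost: once resolved, the pointer can be cached and every later access is a plain memory read.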
Core Components
1. Foundation Layer
- posix_shm - POSIX shared memory lifecycle management
- shm_table - Metadata and discovery system
- shm_span - Base class for memory regions
2. Data Structures
- shm_array<T> - Fixed-size contiguous array
- shm_queue<T> - Lock-free FIFO queue
- shm_atomic<T> - Named atomic variables
- shm_object_pool<T> - O(1) object allocation
- shm_ring_buffer<T> - Bulk operations for streaming
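O(1) object allocation is commonly built on a free list threaded through the storage itself. The sketch below is one possible lock-free version: a stack of free slot indices whose head carries a version tag to guard against the ABA problem. The name index_pool and all details are assumptions, not the shm_object_pool implementation.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative lock-free pool: free slots form a stack linked by index.
// Indices are used instead of pointers so the structure stays valid when
// the memory is mapped at different addresses in different processes.
template <typename T, std::uint32_t N>
class index_pool {
    static_assert(N > 0, "pool must have at least one slot");
    struct slot {
        T value;
        std::atomic<std::uint32_t> next;  // index of the next free slot
    };
    slot slots_[N];
    // Low 32 bits: index of the first free slot. High 32 bits: version tag.
    std::atomic<std::uint64_t> head_;

    static std::uint64_t pack(std::uint32_t idx, std::uint32_t tag) {
        return (std::uint64_t{tag} << 32) | idx;
    }

public:
    static constexpr std::uint32_t nil = 0xFFFFFFFFu;  // "no slot" marker

    index_pool() {
        for (std::uint32_t i = 0; i < N; ++i)
            slots_[i].next.store(i + 1 < N ? i + 1 : nil, std::memory_order_relaxed);
        head_.store(pack(0, 0), std::memory_order_release);
    }

    // Pop a free slot; returns nil when the pool is exhausted.
    std::uint32_t acquire() {
        std::uint64_t h = head_.load(std::memory_order_acquire);
        for (;;) {
            const auto idx = static_cast<std::uint32_t>(h);
            if (idx == nil) return nil;
            const auto next = slots_[idx].next.load(std::memory_order_relaxed);
            const auto tag  = static_cast<std::uint32_t>(h >> 32) + 1;
            if (head_.compare_exchange_weak(h, pack(next, tag),
                                            std::memory_order_acq_rel,
                                            std::memory_order_acquire))
                return idx;
        }
    }

    // Push a slot back onto the free list.
    void release(std::uint32_t idx) {
        std::uint64_t h = head_.load(std::memory_order_acquire);
        for (;;) {
            slots_[idx].next.store(static_cast<std::uint32_t>(h),
                                   std::memory_order_relaxed);
            const auto tag = static_cast<std::uint32_t>(h >> 32) + 1;
            if (head_.compare_exchange_weak(h, pack(idx, tag),
                                            std::memory_order_acq_rel,
                                            std::memory_order_acquire))
                return;
        }
    }

    T& operator[](std::uint32_t idx) { return slots_[idx].value; }
};
```

Both operations touch only the head word and one slot, so the cost does not depend on the pool size. Under contention the compare-and-swap loop may retry, which is why lock-free pools are usually described as O(1) per attempt rather than strictly bounded.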
3. Template Configurations
posix_shm is the default shared memory type. It is an alias for posix_shm_impl< shm_table >, the POSIX shared memory wrapper with RAII and reference counting, instantiated with the standard table configuration.
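To show how a template-based table size can work, the following sketch parameterizes a table on its name length and entry count, so the metadata footprint is fixed at compile time. The names basic_table, small_table, large_table, and table_overhead, and the entry layout, are hypothetical. Because the real layout is not shown here, the sizes will not match the 904B and 26KB figures listed under Key Features.

```cpp
#include <cstddef>

// Hypothetical table template: both limits are compile-time parameters,
// so sizeof() gives the exact metadata overhead of each configuration.
template <std::size_t MaxNameLen, std::size_t MaxEntries>
struct basic_table {
    struct entry {
        char        name[MaxNameLen];
        std::size_t offset;
        std::size_t size;
    };
    entry       entries[MaxEntries];
    std::size_t count;
};

// Two illustrative configurations: short names and few entries for
// embedded-style use, long names and many entries for large simulations.
using small_table = basic_table<16, 16>;
using large_table = basic_table<64, 256>;

// Metadata overhead of a configuration, in bytes.
template <typename Table>
constexpr std::size_t table_overhead() {
    return sizeof(Table);
}

static_assert(table_overhead<small_table>() < table_overhead<large_table>(),
              "smaller limits must yield a smaller table");
```

A segment type templated on the table, in the way posix_shm_impl is templated on shm_table, can then trade discovery capacity against fixed overhead without any runtime cost.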
Quick Start
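// Assumes sensors is an shm_array and events an shm_queue, both already
// attached to the same named segment; their construction is not shown here.

// Producer side: publish a reading and an event.
sensors[0] = 3.14159;
events.enqueue({timestamp, data});

// Consumer side (typically another process): read directly, then drain one event.
auto value = sensors[0];
if (auto e = events.dequeue()) {
    process(*e);
}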
Use Cases
High-Frequency Trading
- Market data distribution
- Order book sharing
- Strategy coordination
Scientific Simulation
- Particle systems (10,000+ entities)
- Sensor data aggregation (MHz rates)
- Grid-based computations (CFD, weather)
Robotics & Autonomous Systems
- Sensor fusion pipelines
- Control loop communication
- Perception data sharing
Game Servers
- Entity state replication
- Physics synchronization
- Event broadcasting
Proven Performance
Our benchmarks demonstrate zero overhead for reads:
Sequential Read Performance:
  Heap array:         2.32 ns/op
  Shared array:       2.32 ns/op  ← Identical!
  Shared raw pointer: 2.31 ns/op  ← Direct access
Random Access Performance:
  Heap array:         2.33 ns/op
  Shared array:       2.33 ns/op  ← Same cache behavior
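A comparison of this kind can be reproduced with a small timing harness. The sketch below is not the project's benchmark suite. It uses an anonymous MAP_SHARED mapping as a stand-in for a named segment, and the names result, time_sum, map_shared_doubles, and compare are hypothetical. Absolute numbers depend on the machine, compiler flags, and cache state, so none are shown.

```cpp
#include <sys/mman.h>   // mmap, munmap
#include <algorithm>    // std::fill
#include <chrono>
#include <cstddef>
#include <numeric>      // std::accumulate
#include <vector>

struct result {
    double sum;        // returned so the compiler cannot discard the loop
    double ns_per_op;  // elapsed nanoseconds divided by element count
};

// Sequentially sum n doubles and report the average time per element.
result time_sum(const double* data, std::size_t n) {
    const auto t0 = std::chrono::steady_clock::now();
    const double sum = std::accumulate(data, data + n, 0.0);
    const auto t1 = std::chrono::steady_clock::now();
    const double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    return {sum, ns / static_cast<double>(n)};
}

// Anonymous shared mapping, used here in place of a named POSIX segment.
double* map_shared_doubles(std::size_t n) {
    void* p = mmap(nullptr, n * sizeof(double), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : static_cast<double*>(p);
}

struct comparison {
    result heap;
    result shared;
};

// Time the same read loop over a heap array and a shared mapping.
comparison compare(std::size_t n) {
    std::vector<double> heap(n, 1.0);
    double* shm = map_shared_doubles(n);
    if (!shm) return {};
    std::fill(shm, shm + n, 1.0);  // writing also pre-faults the pages
    comparison c{time_sum(heap.data(), n), time_sum(shm, n)};
    munmap(shm, n * sizeof(double));
    return c;
}
```

Once the pages are mapped and faulted in, both loops run the same load instructions against ordinary memory, which is why the per-element times are expected to match. For stable figures, run each measurement many times and discard the first iterations.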
Safety & Correctness
- Type safety via C++23 concepts
- Bounds checking in debug builds
- RAII memory management
- Atomic operations for thread safety
- Process crash resilience
Getting Started
- Tutorial - Step-by-step guide
- Performance Guide - Optimization tips
- Architecture - Design deep dive
- API Reference - Complete documentation
Requirements
- C++23 compiler (GCC 13+, Clang 16+)
- POSIX-compliant OS (Linux, macOS, BSD)
- CMake 3.20+
License
MIT License - Use freely in commercial projects
Contributing
Contributions welcome! Areas of interest:
- Additional data structures (B-tree, hash map)
- Performance optimizations (huge pages, NUMA)
- Language bindings (Python, Rust)
- Platform ports (Windows shared memory)