System Architecture Overview
┌──────────────────────────────────────────────────────────┐
│                    Application Layer                     │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────┐     │
│  │ Simulation  │  │   Renderer   │  │  Analytics   │     │
│  └─────────────┘  └──────────────┘  └──────────────┘     │
└───────────────────────┬──────────────────────────────────┘
                        │
┌───────────────────────▼──────────────────────────────────┐
│                   Data Structure Layer                   │
│  ┌────────┐  ┌────────┐  ┌──────┐  ┌──────┐  ┌────────┐  │
│  │ Array  │  │ Queue  │  │ Pool │  │ Ring │  │ Atomic │  │
│  └────────┘  └────────┘  └──────┘  └──────┘  └────────┘  │
└───────────────────────┬──────────────────────────────────┘
                        │
┌───────────────────────▼──────────────────────────────────┐
│                     Foundation Layer                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐                │
│  │ posix_shm│  │ shm_table│  │ shm_span │                │
│  └──────────┘  └──────────┘  └──────────┘                │
└───────────────────────┬──────────────────────────────────┘
                        │
┌───────────────────────▼──────────────────────────────────┐
│                     POSIX API Layer                      │
│            shm_open, mmap, shm_unlink, futex             │
└───────────────────────┬──────────────────────────────────┘
                        │
┌───────────────────────▼──────────────────────────────────┐
│                       Kernel Space                       │
│             Virtual Memory, Page Tables, IPC             │
└──────────────────────────────────────────────────────────┘
Memory Layout
Shared Memory Segment Structure
Offset   Size       Component
──────────────────────────────────────────
0x0000   4          Reference Counter (atomic)
0x0004   4-424KB    Metadata Table (configurable)
0xXXXX   Variable   Data Structure 1
0xYYYY   Variable   Data Structure 2
...
Metadata Table Entry
struct entry {
    char name[MAX_NAME_SIZE];  // null-terminated structure name
    size_t offset;             // byte offset from segment base
    size_t size;               // total size in bytes
    size_t elem_size;          // size of one element
    size_t num_elem;           // number of elements
    bool active;               // entry is currently in use
};
Component Design
1. posix_shm - Memory Management
Responsibilities:
- POSIX shared memory lifecycle
- Reference counting
- Memory mapping
- Automatic cleanup
Key Design Decisions:
- RAII Pattern: Automatic cleanup in destructor
- Reference Counting: Last process cleans up
- Header Embedding: Table stored in shared memory
- Template Configuration: Compile-time table sizing
template<typename TableType>
struct header {
    std::atomic<int> ref_count;  // processes currently attached
    TableType table;             // metadata table embedded in the segment
};
The result is a POSIX shared memory wrapper with RAII semantics and reference counting.
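A minimal usage sketch; the constructor arguments and accessor shown here are assumptions about the API, not confirmed by it:
{
    // Hypothetical API: the name, size, and header() accessor are illustrative.
    posix_shm<my_table> shm("/sim_state", 1 << 20);  // shm_open + mmap; ref_count++
    auto* hdr = shm.header();                        // embedded header at offset 0
    // ... use hdr->table to locate structures ...
}   // destructor: ref_count--; the last process unmaps and calls shm_unlink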
2. shm_table - Discovery System
Responsibilities:
- Name-to-offset mapping
- Dynamic structure discovery
- Allocation tracking
- Metadata storage
Design Pattern: Service Locator
table->add("sensor_data", offset, size);   // register a named region
auto* entry = table->find("sensor_data");  // look it up from any process
Trade-offs:
- Fixed maximum entries (compile-time)
- Linear search (simple, cache-friendly)
- No defragmentation (predictable performance)
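A sketch of what that linear lookup looks like (illustrative, assuming the entry layout shown earlier):
// Illustrative: a linear scan over a fixed entry array is simple and cache-friendly.
entry* find(const char* name) {
    for (size_t i = 0; i < MaxEntries; ++i)
        if (entries[i].active && std::strcmp(entries[i].name, name) == 0)
            return &entries[i];
    return nullptr;  // no matching active entry
}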
Memory Management: Stack/Bump Allocation
Our current implementation uses stack allocation (also called bump allocation):
size_t allocate(size_t size) {
    // Bump pointer: the new block starts right after everything allocated so far.
    size_t offset = sizeof(table) + get_total_allocated_size();
    total_allocated += size;
    return offset;
}
This means:
- ✅ Simple: O(1) allocation, no scanning
- ✅ Predictable: No fragmentation delays
- ❌ No reclamation: Erased structures leave permanent gaps
- ❌ Memory leak: Repeated create/delete exhausts memory
Best Practices:
- Initialize once: Create all structures at startup (see the sketch below)
- Never erase: Treat structures as permanent
- Use pools: For dynamic needs, use shm_object_pool
- Size appropriately: Allocate enough memory upfront
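A startup sequence following these rules might look like this (the sizes and names are illustrative):
// Illustrative: allocate and register everything once; nothing is ever erased.
void init_shared_state(shm_table& table) {
    size_t off = table.allocate(1024 * sizeof(float));
    table.add("sensor_data", off, 1024 * sizeof(float));  // permanent
    off = table.allocate(4096);
    table.add("event_queue", off, 4096);                  // permanent
    // From here on, processes only find() existing entries.
}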
Future Improvement: A free-list allocator could reuse gaps, but adds complexity and potential fragmentation issues.
3. Data Structure Implementations
shm_array<T> - Contiguous Storage
Memory Layout:
[T][T][T][T][T][T][T][T]...
Operations:
- operator[]: Direct pointer arithmetic
- Iterators: Raw pointers
- No dynamic growth (fixed size)
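In use (the attach step is elided; only the access pattern is shown):
// Access pattern sketch: indexing and iteration compile down to pointer math.
void demo(shm_array<float>& samples) {
    samples[3] = 1.5f;        // operator[]: base + 3 * sizeof(float)
    float sum = 0.0f;
    for (float v : samples)   // iterators are raw pointers
        sum += v;
}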
shm_queue<T> - Lock-Free FIFO
Memory Layout:
[Header]
    atomic<size_t> head
    atomic<size_t> tail
    size_t capacity
[T][T][T][T]... (circular buffer)
Algorithm: Lock-Free Enqueue
bool enqueue(T value) {
    size_t tail = tail_.load(std::memory_order_acquire);
    size_t next = (tail + 1) % capacity;
    if (next == head_.load(std::memory_order_acquire))
        return false;                                  // queue full
    buffer[tail] = value;                              // write the slot before publishing
    tail_.store(next, std::memory_order_release);      // publish to consumers
    return true;
}
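The matching dequeue is symmetric; a sketch consistent with the header layout above:
// Consumer-side sketch, mirroring enqueue.
std::optional<T> dequeue() {
    size_t head = head_.load(std::memory_order_acquire);
    if (head == tail_.load(std::memory_order_acquire))
        return std::nullopt;                           // queue empty
    T value = buffer[head];                            // read before releasing the slot
    head_.store((head + 1) % capacity, std::memory_order_release);
    return value;
}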
shm_object_pool<T> - Free List
Memory Layout:
[Header]
    atomic<uint32_t> free_head
    uint32_t next[capacity]
[T][T][T][T]... (object storage)
Algorithm: Lock-Free Stack
static constexpr uint32_t invalid_index = UINT32_MAX;  // free-list terminator

uint32_t acquire() {
    uint32_t old_head = free_head.load(std::memory_order_acquire);
    while (old_head != invalid_index) {
        uint32_t new_head = next[old_head];
        // CAS claims old_head unless another process popped it first
        // (a production version also needs an ABA guard, e.g. a tagged counter).
        if (free_head.compare_exchange_weak(old_head, new_head,
                                            std::memory_order_acq_rel))
            return old_head;
    }
    return invalid_index;  // pool exhausted
}
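The release path is the matching lock-free push, sketched here:
// Sketch: push a freed slot back onto the free list (Treiber-stack style).
void release(uint32_t index) {
    uint32_t old_head = free_head.load(std::memory_order_acquire);
    do {
        next[index] = old_head;  // link the slot in front of the current head
    } while (!free_head.compare_exchange_weak(old_head, index,
                                              std::memory_order_acq_rel));
}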
shm_ring_buffer<T> - Bulk Operations
Design Goals:
- Optimize for throughput over latency
- Support non-consuming reads
- Allow overwrite mode for streaming
Key Features:
size_t push_bulk(span<T> values);
size_t pop_bulk(span<T> output);
void push_overwrite(T value);
uint64_t total_written();
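A throughput-oriented usage sketch built on those operations (the sample type and the attach step are assumptions):
// Illustrative: batched writes amortize synchronization; overwrite mode keeps streaming.
void publish(shm_ring_buffer<sample>& ring, span<sample> batch) {
    size_t written = ring.push_bulk(batch);     // one synchronization for the whole batch
    for (size_t i = written; i < batch.size(); ++i)
        ring.push_overwrite(batch[i]);          // consumer is behind: overwrite the oldest
}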
Synchronization Strategies
1. Lock-Free Algorithms
Used In: Queue, Pool, Ring Buffer
Techniques:
- Compare-And-Swap (CAS)
- Fetch-And-Add (FAA)
- Memory ordering (acquire-release)
Benefits:
- No kernel involvement
- Bounded latency
- Progress guarantee
2. Wait-Free Readers
Used In: Array, Atomic
Guarantee: Readers never block
T read(size_t index) {
    return data[index];  // a plain load: never blocks, never retries
}
3. Memory Ordering
// Producer: write the data, then publish with a release store.
data[index] = value;
flag.store(true, std::memory_order_release);

// Consumer: an acquire load makes the producer's write visible.
if (flag.load(std::memory_order_acquire)) {
    auto value = data[index];
}
Template Metaprogramming
Compile-Time Configuration
// Metadata table for managing shared memory data structures.
template<size_t MaxNameSize, size_t MaxEntries>
struct shm_table {
    // Get the actual size of this table in bytes.
    static constexpr size_t size_bytes() {
        return sizeof(entry) * MaxEntries + sizeof(size_t);
    }
};
Benefits:
- Zero runtime overhead
- Custom memory footprints
- Type safety
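For example, differently sized tables are distinct types whose footprints are fixed at compile time (the alias names are illustrative):
// Illustrative aliases: the footprint is decided entirely at compile time.
using tiny_table  = shm_table<16, 8>;     // short names, few entries
using large_table = shm_table<64, 512>;   // long names, many entries

static_assert(tiny_table::size_bytes() < large_table::size_bytes());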
Concept Constraints
// Fixed-size array in shared memory with zero-overhead access.
// Only trivially copyable types are safe to place across process boundaries.
template<typename T>
    requires std::is_trivially_copyable_v<T>
class shm_array;
Performance Optimizations
1. Cache Line Alignment
struct alignas(64) CacheLine {
    std::atomic<uint64_t> value;
    char padding[56];  // pad to a full 64-byte cache line
};
Prevents: False sharing between cores
2. Memory Prefetching
for (size_t i = 0; i < n; i++) {
    __builtin_prefetch(&data[i + 8], 0, 1);  // hint: data[i + 8] will be read soon
    process(data[i]);
}
Benefit: Hides memory latency
3. Bulk Operations
// Instead of one synchronization per element:
for (auto& item : items)
    queue.push(item);

// ...a single bulk call amortizes it over the batch:
queue.push_bulk(items);
4. Branch-Free Code
size_t min = a ^ ((a ^ b) & -(size_t)(a > b));  // selects b when a > b, else a, with no branch
Benefit: No pipeline stalls
Error Handling Philosophy
1. Constructor Failures
if (!initialize())
    throw std::runtime_error("...");
Rationale: RAII requires successful construction
2. Operation Failures
[[nodiscard]] bool enqueue(T value);
[[nodiscard]] std::optional<T> dequeue();
Rationale: Let caller decide on failure handling
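Caller-side handling then looks like this:
// Both failure paths are explicit at the call site.
if (!queue.enqueue(item)) {
    // queue full: drop, retry, or back off
}
if (auto value = queue.dequeue()) {
    process(*value);  // runs only when an element was available
}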
3. Resource Exhaustion
if (pool.full())
return invalid_handle;
Rationale: System continues with reduced capacity
Scalability Considerations
Process Scaling
Strategy: Read-heavy optimization
class MRSW_Array {
    T* data;
    std::atomic<uint64_t> version;  // bumped on every write (see the sketch below)
};
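One common protocol for such a version counter is a seqlock-style optimistic read, sketched here as an assumption about the intended design:
// Assumed protocol, not confirmed by the source: the writer makes version odd
// while it mutates and even when done; readers retry until the snapshot is stable.
T MRSW_Array::read(size_t index) const {
    uint64_t before, after;
    T value;
    do {
        before = version.load(std::memory_order_acquire);
        value  = data[index];
        after  = version.load(std::memory_order_acquire);
    } while (before != after || (before & 1));  // odd or changed: a write intervened
    return value;
}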
Memory Scaling
Strategy: Hierarchical structures
class LargeMap {
    // Sketch: a top-level directory segment that points to
    // independently mapped shard segments.
};
NUMA Awareness
void* addr = mmap(...);
unsigned long nodemask = 1UL << node;  // illustrative: bind to a single NUMA node
mbind(addr, size, MPOL_BIND, &nodemask, sizeof(nodemask) * 8, 0);
Security Considerations
1. Access Control
int fd = shm_open(name, O_CREAT | O_RDWR, 0660);  // owner and group only; no world access
2. Input Validation
if (index >= capacity)
throw std::out_of_range("...");
3. Resource Limits
static constexpr size_t MAX_ALLOCATION = 1ULL << 30;
if (size > MAX_ALLOCATION)
throw std::invalid_argument("...");
Future Directions
1. Persistent Memory Support
void persist() {
    pmem_persist(base_addr, size);  // libpmem: flush the mapped range to persistent media
}
2. GPU Shared Memory
class cuda_shm {
    void* allocate(size_t size) {
        void* ptr = nullptr;
        cudaMallocManaged(&ptr, size);  // unified memory, visible to host and device
        return ptr;
    }
};
3. Distributed Shared Memory
Design Principles Summary
- Zero-Overhead Abstraction: Pay only for what you use
- Lock-Free Where Possible: Minimize contention
- Cache-Conscious: Optimize for memory hierarchy
- Fail-Safe: Graceful degradation over crashes
- Discoverable: Named structures for flexibility
- Composable: Build complex from simple
- Type-Safe: Compile-time constraints
- RAII: Automatic resource management