Projects
Open source repositories spanning machine learning, cryptography, statistical computing, and software engineering
Featured Projects
Highlighted work representing significant research and development contributions
sparse_spatial_hash
CTK - Conversation Toolkit
BTK - Bookmark Toolkit
ollama_data_tools
elasticsearch-lm
femtograd
accumux
algebraic.mle
Symbolic likelihood models in Python. Build, compose, and analyze likelihood functions with automatic differentiation and symbolic manipulation.
A pattern matching and term rewriting library for Python. Define rewrite rules with intuitive DSL syntax and apply them to transform symbolic expressions.
Automatic fuzzy rule discovery through differentiable soft circuits - learn fuzzy logic systems from data without expert knowledge
A header-only C++20 library for zero-copy, prefix-free data representations with algebraic types and succinct data structures
Immutable graph library with 56+ algorithms, transformers, selectors, and lazy views.
Variable-length n-gram language models using suffix arrays.
Topology-aware RAG using complex network analysis. Features community detection, hub/bridge identification, and a YAML DSL for configuring field embeddings and …
Instrumental Goals and Latent Codes in RL-Fine-Tuned Language Models
A comprehensive theoretical and empirical analysis of mesa-optimization risks, deceptive …
Theoretical analysis of cryptographic perfect hash functions with optimal space complexity
Statistical modeling library built on autograd-cpp for MLE, bootstrap, and regression
LaTeX to multiple formats converter with modular themes and components
Modern C++ automatic differentiation library with neural networks and statistical modeling
A novel framework exploring intelligence as boundary maintenance between regions with different computational dynamics
LLMs as Intelligent Priors: Enhancing Classical Algorithms Through Learned Initialization
A virtual POSIX filesystem with content-addressable DAG structure.
Modern LLM web automation agent with model-specific prompt optimization
NFA Tools: Regular Languages and Finite Automata
An elegant, pedagogical implementation of finite automata with NFA to DFA conversion, regex parsing, and …
Computational Basis Transforms - A header-only C++17 library for transformations between computational domains
Monte Carlo Tree Search for LLM-based reasoning with fluent API and advanced sampling strategies
A Claude Code-inspired CLI for local LLMs with MCP support
Seqwise - Sequential Image Analysis with Vision Language Models
A simple, cost-free approach to analyzing sequences of images using local Vision Language Models …
Bernoulli Types
A unified C++ framework for probabilistic data structures based on the fundamental distinction between latent (true but unobservable) values and …
The Dot Ecosystem
“What started as a single, humble function evolved into a complete, coherent ecosystem for manipulating data structures—a journey in …
A network-native functional language.
A powerful relational algebra CLI and library for JSONL data manipulation.
Fisher Flow: A unified information-geometric framework for sequential inference revealing how modern optimizers (Adam, Natural Gradient, K-FAC, EWC) emerge as …
Cognitive MRI of AI Conversations: Conference paper analyzing ChatGPT conversations through network science. Presented at Complex Networks 2025.
Cognitive MRI of AI Conversations: Network analysis of ChatGPT conversation logs using semantic embeddings to reveal knowledge topology, community structure, …
CLI tool for managing and querying your Git repository collection. Tracks events and metadata, and provides powerful queries across all your repos with GitHub, …
Encrypted Search Research Repository
Overview
This repository contains the complete source code, simulation data, related files, and presentations from my …
Logic programming with LLM integration and wake-sleep learning cycles
Convert source code to structured, context-optimized markdown for LLMs with intelligent summarization.
CLI tool for managing ebooks with semantic search, virtual libraries, annotations, and multi-format export. Part of the Long Echo toolkit for personal data …
A streaming data processing system for JSON with lazy evaluation, composable operations, and a fluent API.
Unix-composable fuzzy logic inference with elegant Pythonic API
A powerful symbolic expression toolkit for rule-based term rewriting.
Fuzzy logic search on plain documents and JSON documents.
Reverse-Process Synthetic Data Generation: Automatically Generating Training Language Models for Complex Problem Solving
Abstract:
This paper introduces a …
ZeroIPC - Active Computational Substrate for Shared Memory
Overview
ZeroIPC transforms shared memory from passive storage into an active computational …
DigiStar - High-Performance N-Body Particle Simulation
A massively parallel particle simulation system capable of simulating millions of particles in real-time …
LangCalc: A Calculus for Language Models
An elegant mathematical framework for composing language models through algebraic operations, featuring efficient …
Tree Rewriter
A minimal term rewriting system. 15 lines of code. Infinite possibilities.
The Insight
What if we could express computational rules as simple …
How 256 bits pretend to be infinity: A pedagogical exploration of random oracles and computational randomness
A powerful, immutable-by-default tree manipulation library for Python with functional programming patterns, composable transformations, and advanced pattern …
A consistent API for hypothesis testing in R. Provides generic methods for p-values, test statistics, degrees of freedom, and significance testing. Includes LRT …
SLUUG Talk: Large Language Models
This repository contains the slides and code for the talk:
- Demystifying Large …
mediavault - Universal Media Playlist Manager
Anonymous batch job execution system with Linux namespace/seccomp sandboxing, resource limits, and WebSocket streaming
Scalable lock based on the 2-thread Peterson lock.
Model selection for reliability estimation in series systems with Weibull components: when can engineers safely use simpler models?
A modern C++ header-only library implementing Disjoint Interval Sets as a complete Boolean algebra. Features elegant API, compile-time intervals, and …
Weibull series system estimation from data with censored lifetimes and masked component cause of failure.
Maximum likelihood estimation for series system reliability with Weibull components under right-censoring and masked failure data, including likelihood ratio …
Dynamic failure rate distributions (DFR)
Likelihood model for series systems with masked component cause of failure and other censoring mechanisms
ChatGPT chat search
This was the first Python app I developed in quite some time. I wanted to host ChatGPT logs, experiment with Heroku, and see how easy it …
Likelihood model framework
Composable MLE solvers: a DSL for maximum likelihood estimation where solvers are first-class functions that combine via chaining, racing, and restarts
Seeing how easy it is to convert an old project on Google App Engine to a modern framework with the help of ChatGPT
R package: Algebra over distributions (random elements) with automatic simplification to closed forms
R package: md.tools
A miscellaneous set of tools for working with masked data and common features of masked data. The tool set takes inspiration from …
mdrelax
Relaxed Candidate Set Models for Masked Data in Series Systems
Overview
This R package implements likelihood-based inference for series systems with …
Alga
A mathematically elegant C++20 library for algebraic text processing and compositional parsing with fuzzy matching. Built on rigorous algebraic foundations …
Time series analysis of a confidentiality measure for an encrypted search system
We derive a confidentiality measure against an adversary deploying a …
Algebraic cipher types
Encrypted Search: A Probabilistic Estimator of Confidentiality
Research code and data for the IEEE CloudCom 2016 paper on estimating confidentiality risks in encrypted search systems. The Moving Average Bootstrap (MAB) …
Universal function Bernoulli approximators
Oblivious maps
A set is an unordered collection of distinct elements, typically from some implicitly understood …
Modern C++20 header-only library for algebraic hash function composition with elegant DSL
Space-efficient approximate mappings using perfect hash functions. Supports arbitrary function approximation (X→Y) with configurable storage (8/16/32/64-bit) …
Cipher Trapdoor Sets (CTS)
A modern C++20 header-only library for privacy-preserving set operations using cryptographic trapdoor functions. This library enables …
Bernoulli Data Type
A general framework for understanding and constructing probabilistic data structures with controlled error rates. This framework can also …