Projects
Open source repositories spanning machine learning, cryptography, statistical computing, and software engineering
Featured Projects
Highlighted work representing significant research and development contributions
sparse_spatial_hash
CTK - Conversation Toolkit
BTK - Bookmark Toolkit
ollama_data_tools
elasticsearch-lm
femtograd
accumux
algebraic.mle
Multi-collection curated awesome-lists powered by curalist
Unix-philosophy image manipulation CLI with lazy evaluation, JSON piping, and multi-image composition
A self-describing, TOML-backed personal metadata store. Unix-philosophy CLI for making personal details instantly available to coding agents and scripts.
Arbitrary-order exact derivatives at machine precision
Multi-collection curated awesome-list manager
Live-reloading TeX editor with PDF viewer in the browser
Non-blocking audio playback daemon with named pipe protocol
Functorial framework for secure computation through homomorphic operations on encrypted algebraic structures
Unified terminal graphics library with multiple renderers (braille, quadrants, sextants, ASCII, sixel, kitty)
Soprano: Instant, Ultra-Realistic Text-to-Speech
aptus
Latin: fitted, adapted
REST API and Python client for remote LLM fine-tuning. Run the server on your GPU machine, submit training jobs from anywhere. …
Generate a conversable persona from personal data: conversations, writings, emails, bookmarks, photos, reading notes
A unified framework for sequential decision-making: from classical search to deep RL. All methods are approximations of expectimax with different representation …
Essays on induction, inference, and the search for useful representations
Posts influenced by SICP—on abstraction, composition, and computation as a medium for expressing ideas
Essays on digital legacy, graceful degradation, and designing systems that outlast their creators
ECHO philosophy documentation and compliance validator for durable personal data
Photo Toolkit - CLI for managing personal photo libraries with AI-powered organization, SHA256 deduplication, and semantic search
dual
Forward-mode automatic differentiation via dual numbers for C++20.
Overview
Dual numbers are a simple yet powerful technique for computing exact …
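As a rough illustration of the idea (not the library's C++20 API, which is not shown here), forward-mode differentiation with dual numbers can be sketched in a few lines of Python. The names `Dual`, `sin`, and `derivative` are hypothetical and chosen only for this sketch. Each value carries its derivative alongside it, and arithmetic propagates both using the sum and product rules.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative (hypothetical sketch type)."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (f + g)' = f' + g'
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (f * g)' = f' * g + f * g'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants, i.e. duals with zero derivative."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def sin(x):
    """Chain rule for sine: sin(f)' = cos(f) * f'."""
    x = _lift(x)
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def derivative(f, x):
    """Seed the input with derivative 1 and read the derivative off the result."""
    return f(Dual(x, 1.0)).der
```

For example, `derivative(lambda x: x * x * x + 2 * x, 2.0)` evaluates the derivative of x^3 + 2x at x = 2 in a single forward pass, with no finite-difference step size to tune.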
Pedagogical C++20 automatic differentiation library
Pedagogical C++20 linear algebra library
Paper: Preventing Ransomware Damages using In-Operation Off-Site Backup to Achieve a 10^-8 False-Negative Miss-Detection Rate (IEEE ICCI 2025)
Mail Toolkit - Personal email archive management with semantic search, relationship mapping, and privacy controls
A CLI-first, plaintext-native toolkit for capturing and organizing ideas, plans, tasks, and notes. Designed for the LLM era.
The Call of Asheron: An epic fantasy novel exploring forced migration, consciousness, and transformation through four protagonists on an alien world where …
Echoes of the Sublime - A philosophical horror novel exploring AI safety, consciousness, and cognitive bandwidth limits (~103k words)
The Policy - A literary SF novel exploring AI alignment, consciousness, and emergence
Symbolic likelihood models in Python. Build, compose, and analyze likelihood functions with automatic differentiation and symbolic manipulation.
A pattern matching and term rewriting library for Python. Define rewrite rules with intuitive DSL syntax and apply them to transform symbolic expressions.
Automatic fuzzy rule discovery through differentiable soft circuits - learn fuzzy logic systems from data without expert knowledge
A header-only C++20 library for zero-copy, prefix-free data representations with algebraic types and succinct data structures
Immutable graph library with 56+ algorithms, transformers, selectors, and lazy views.
Variable-length n-gram language models using suffix arrays.
Topology-aware RAG using complex network analysis. Features community detection, hub/bridge identification, and a YAML DSL for configuring field embeddings and …
Instrumental Goals and Latent Codes in RL-Fine-Tuned Language Models
A comprehensive theoretical and empirical analysis of mesa-optimization risks, deceptive …
Theoretical analysis of cryptographic perfect hash functions with optimal space complexity
LaTeX to multiple formats converter with modular themes and components
LLMs as Intelligent Priors: Enhancing Classical Algorithms Through Learned Initialization
A virtual POSIX filesystem with content-addressable DAG structure. Features immutable nodes, Git-style hashing, functional transformations, and an embedded …
Modern LLM web automation agent with model-specific prompt optimization
NFA Tools: Regular Languages and Finite Automata
An elegant, pedagogical implementation of finite automata with NFA to DFA conversion, regex parsing, and …
Computational Basis Transforms - A header-only C++17 library for transformations between computational domains
Monte Carlo Tree Search for LLM-based reasoning with fluent API and advanced sampling strategies
Seqwise - Sequential Image Analysis with Vision Language Models
A simple, cost-free approach to analyzing sequences of images using local Vision Language Models …
The Dot Ecosystem
“What started as a single, humble function evolved into a complete, coherent ecosystem for manipulating data structures—a journey in …
A network-native functional language.
A powerful relational algebra CLI and library for JSONL data manipulation.
Fisher Flow: A unified information-geometric framework for sequential inference revealing how modern optimizers (Adam, Natural Gradient, K-FAC, EWC) emerge as …
Cognitive MRI of AI Conversations: Conference paper analyzing ChatGPT conversations through network science. Presented at Complex Networks 2025.
Cognitive MRI of AI Conversations: Network analysis of ChatGPT conversation logs using semantic embeddings to reveal knowledge topology, community structure, …
CLI tool for managing and querying your git repository collection. Tracks events, metadata, and provides powerful queries across all your repos with GitHub, …
Logic programming with LLM integration and wake-sleep learning cycles
Convert source code to structured, context-optimized markdown for LLMs with intelligent summarization.
CLI tool for managing ebooks with semantic search, virtual libraries, annotations, and multi-format export. Part of the Long Echo toolkit for personal data …
A streaming data processing system for JSON with lazy evaluation, composable operations, and a fluent API.
Unix-composable fuzzy logic inference with an elegant Pythonic API
A powerful symbolic expression toolkit for rule-based term rewriting.
Fuzzy logic search on plain documents and JSON documents.
Reverse-Process Synthetic Data Generation: Automatically Generating Training Language Models for Complex Problem Solving
Abstract:
This paper introduces a …
ZeroIPC - Active Computational Substrate for Shared Memory
Overview
ZeroIPC transforms shared memory from passive storage into an active computational …
LangCalc: A Calculus for Language Models
An elegant mathematical framework for composing language models through algebraic operations, featuring efficient …
Tree Rewriter
A minimal term rewriting system. 15 lines of code. Infinite possibilities.
The Insight
What if we could express computational rules as simple …
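To make the idea of rules-as-data concrete, here is a minimal term rewriting sketch in Python. It is not the project's own 15-line implementation; the term encoding (nested tuples), the `?name` convention for pattern variables, and the function names `match`, `substitute`, and `rewrite` are all assumptions made for illustration.

```python
def match(pattern, term, env):
    """Return bindings extending env if pattern matches term, else None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env:
            # A repeated variable must match the same subterm each time.
            return env if env[pattern] == term else None
        return {**env, pattern: term}
    if (isinstance(pattern, tuple) and isinstance(term, tuple)
            and len(pattern) == len(term)):
        for p, t in zip(pattern, term):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None


def substitute(template, env):
    """Replace pattern variables in template with their bound subterms."""
    if isinstance(template, str) and template in env:
        return env[template]
    if isinstance(template, tuple):
        return tuple(substitute(t, env) for t in template)
    return template


def rewrite(term, rules):
    """Rewrite bottom-up until no rule applies.

    This does not terminate for rule sets that can rewrite forever.
    """
    if isinstance(term, tuple):
        term = tuple(rewrite(t, rules) for t in term)
    for lhs, rhs in rules:
        env = match(lhs, term, {})
        if env is not None:
            return rewrite(substitute(rhs, env), rules)
    return term


# Hypothetical algebraic simplification rules: (left-hand side, right-hand side).
RULES = [
    (("+", "?x", 0), "?x"),
    (("*", "?x", 1), "?x"),
    (("*", "?x", 0), 0),
]
```

With these rules, `rewrite(("+", ("*", "a", 1), ("*", "b", 0)), RULES)` simplifies the inner products first and then the sum, leaving just `"a"`.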
How 256 bits pretend to be infinity: A pedagogical exploration of random oracles and computational randomness
A powerful, immutable-by-default tree manipulation library for Python with functional programming patterns, composable transformations, and advanced pattern …
A consistent API for hypothesis testing in R. Provides generic methods for p-values, test statistics, degrees of freedom, and significance testing. Includes LRT …
SLUUG Talk: Large Language Models
This repository contains the slides and code for the talk:
- Demystifying Large …
Anonymous batch job execution system with Linux namespace/seccomp sandboxing, resource limits, and WebSocket streaming
Scalable lock built on the two-thread Peterson lock.
Model selection for reliability estimation in series systems with Weibull components: when can engineers safely use simpler models?
A modern C++ header-only library implementing Disjoint Interval Sets as a complete Boolean algebra. Features an elegant API, compile-time intervals, and …
Weibull series system estimation from data with censored lifetimes and masked component cause of failure.
Maximum likelihood estimation for series system reliability with Weibull components under right-censoring and masked failure data, including likelihood ratio …
Dynamic failure rate distributions for survival analysis and reliability engineering in R
Likelihood model for series systems with masked component cause of failure and other censoring mechanisms
ChatGPT chat search
This was the first Python app I developed in quite some time. I wanted to host ChatGPT logs, experiment with Heroku, and see how easy it …
R package for specifying and using likelihood models for statistical inference. Provides a flexible framework for independent likelihood contributions across …
Composable MLE solvers: a DSL for maximum likelihood estimation where solvers are first-class functions that combine via chaining, racing, and restarts
Seeing how easy it is to convert an old project on Google App Engine to a modern framework with the help of ChatGPT
Pedagogical blog posts on generic programming in C++, inspired by Alex Stepanov
R package: Algebra over distributions (random elements) with automatic simplification to closed forms
R package: md.tools
A miscellaneous set of tools for working with masked data and common features of masked data. The tool set takes inspiration from …
mdrelax
Relaxed Candidate Set Models for Masked Data in Series Systems
Overview
This R package implements likelihood-based inference for series systems with …
Apertures
A minimal Lisp-like language where “holes” (written ?x or ?ns.x) represent unknown values that can be filled later. This enables pausable, …
Alga
A mathematically elegant C++20 library for algebraic text processing and compositional parsing with fuzzy matching. Built on rigorous algebraic foundations …
Composable calculus expressions for C++20: symbolic differentiation, numerical integration, and algebraic composition
Time series analysis of a confidentiality measure for an encrypted search system
We derive a confidentiality measure against an adversary deploying a …
Algebraic cipher types
Master's thesis on encrypted search: enabling standard IR on encrypted collections. Published via ProQuest (2014). Part of the oblivious-computing research …
Encrypted Search with Oblivious Bernoulli Types: Information-Theoretic Privacy through Controlled Approximation
Probabilistic framework for quantifying confidentiality of encrypted search systems using bootstrap methods and entropy analysis
Research code and data for the IEEE CloudCom 2016 paper on estimating confidentiality risks in encrypted search systems. The Moving Average Bootstrap (MAB) …
A unified theoretical framework for oblivious function approximation through algebraic structures and Bernoulli models
Closed-form MLE and Fisher information for exponential series systems with masked failure data. Includes theoretical results, proofs, and numerical validation.
Modern C++20 header-only library for algebraic hash function composition with an elegant DSL
Space-efficient approximate mappings using perfect hash functions. Supports arbitrary function approximation (X→Y) with configurable storage (8/16/32/64-bit) …
Privacy-preserving set operations using cryptographic trapdoor functions. Minimal Python library implementing Bernoulli types framework with explicit error …
Bernoulli Data Type
A general framework for understanding and constructing probabilistic data structures with controlled error rates. This framework can also …
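A Bloom filter is one widely known probabilistic data structure with a tunable error rate, so it may serve as a concrete picture of what "controlled error rates" means. The sketch below is a generic textbook Bloom filter, not code from this project, and whether the framework models Bloom filters this way is an assumption. The class name `BloomFilter` and its parameters are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Approximate set: no false negatives, false positives near fp_rate."""

    def __init__(self, capacity, fp_rate):
        # Standard sizing formulas for n items and target false-positive rate p:
        #   bits   m = -n * ln(p) / (ln 2)^2
        #   hashes k = (m / n) * ln 2
        self.m = max(1, math.ceil(-capacity * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _indexes(self, item):
        """Derive k bit positions by hashing the item with k different prefixes."""
        data = item.encode("utf-8")
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item):
        # True if every derived bit is set; may be a false positive.
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indexes(item))
```

The error is one-sided: an item that was added always tests as present, while an item that was never added tests as present with probability close to the configured `fp_rate` once the filter holds about `capacity` items.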