Projects

Open source repositories spanning machine learning, cryptography, statistical computing, and software engineering

All Projects

A header-only C++20 library for zero-copy, prefix-free data representations with algebraic types and succinct data structures

Topology-aware RAG using complex network analysis. Features community detection, hub/bridge identification, and a YAML DSL for configuring field embeddings and …

Instrumental Goals and Latent Codes in RL-Fine-Tuned Language Models

A comprehensive theoretical and empirical analysis of mesa-optimization risks, deceptive …

NFA Tools: Regular Languages and Finite Automata

An elegant, pedagogical implementation of finite automata with NFA to DFA conversion, regex parsing, and …
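The NFA to DFA conversion mentioned above is usually done with the subset construction. The following is a minimal sketch of that technique, not code from the repository: the function names (`epsilon_closure`, `nfa_to_dfa`, `dfa_accepts`), the dictionary-based transition table, and the use of the empty string as the epsilon marker are all assumptions made for illustration.

```python
from collections import deque

EPS = ""  # hypothetical marker for epsilon transitions (an assumption of this sketch)


def epsilon_closure(states, delta):
    """Return every state reachable from `states` using only epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def nfa_to_dfa(alphabet, delta, start, accepting):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    dfa_start = epsilon_closure({start}, delta)
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            # Collect every NFA state reachable on `symbol` from the current subset.
            moved = set()
            for s in current:
                moved.update(delta.get((s, symbol), ()))
            target = epsilon_closure(moved, delta)
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A subset is accepting if it contains at least one accepting NFA state.
    dfa_accepting = {subset for subset in seen if subset & set(accepting)}
    return dfa_start, dfa_delta, dfa_accepting


def dfa_accepts(dfa, word):
    """Run the DFA on `word` (symbols must come from the alphabet)."""
    start, delta, accepting = dfa
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in accepting


# Example NFA over {a, b} accepting strings that end in "ab".
nfa_delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
dfa = nfa_to_dfa("ab", nfa_delta, start=0, accepting={2})
```

Representing DFA states as frozensets keeps them hashable, so they can be used directly as dictionary keys in the DFA transition table.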

Computational Basis Transforms - A header-only C++17 library for transformations between computational domains

Seqwise - Sequential Image Analysis with Vision Language Models

A simple, cost-free approach to analyzing sequences of images using local Vision Language Models …

Bernoulli Types

A unified C++ framework for probabilistic data structures based on the fundamental distinction between latent (true but unobservable) values and …

The Dot Ecosystem

“What started as a single, humble function evolved into a complete, coherent ecosystem for manipulating data structures—a journey in …

Fisher Flow: A unified information-geometric framework for sequential inference revealing how modern optimizers (Adam, Natural Gradient, K-FAC, EWC) emerge as …

Cognitive MRI of AI Conversations: Network analysis of ChatGPT conversation logs using semantic embeddings to reveal knowledge topology, community structure, …

Encrypted Search Research Repository

Overview

This repository contains the complete source code, simulation data, related files, and presentations from my …

Reverse-Process Synthetic Data Generation: Automatically Generating Training Language Models for Complex Problem Solving

Abstract:

This paper introduces a …

ZeroIPC - Active Computational Substrate for Shared Memory

Overview

ZeroIPC transforms shared memory from passive storage into an active computational …

DigiStar - High-Performance N-Body Particle Simulation

A massively parallel particle simulation system capable of simulating millions of particles in real-time …

LangCalc: A Calculus for Language Models

An elegant mathematical framework for composing language models through algebraic operations, featuring efficient …

Tree Rewriter

A minimal term rewriting system. 15 lines of code. Infinite possibilities.

The Insight

What if we could express computational rules as simple …
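As a rough illustration of what a small term rewriting system can look like, here is a sketch under stated assumptions. It is not the repository's 15-line implementation: the tuple encoding of terms, the `?`-prefixed variable convention, and the names `match`, `substitute`, and `rewrite` are all hypothetical.

```python
def match(pattern, term, bindings):
    """Match `term` against `pattern`; strings starting with '?' are variables.

    Returns an extended bindings dict on success, or None on failure.
    """
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if (isinstance(pattern, tuple) and isinstance(term, tuple)
            and len(pattern) == len(term)):
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None


def substitute(template, bindings):
    """Replace variables in `template` with their bound values."""
    if isinstance(template, tuple):
        return tuple(substitute(t, bindings) for t in template)
    return bindings.get(template, template)


def rewrite(term, rules):
    """Rewrite bottom-up until no rule applies.

    Does not terminate if the rule set itself is non-terminating.
    """
    if isinstance(term, tuple):
        term = tuple(rewrite(t, rules) for t in term)
    for lhs, rhs in rules:
        bindings = match(lhs, term, {})
        if bindings is not None:
            return rewrite(substitute(rhs, bindings), rules)
    return term


# Hypothetical rule set: a few arithmetic identities.
rules = [
    (("+", "?x", 0), "?x"),
    (("*", "?x", 1), "?x"),
    (("*", "?x", 0), 0),
]
simplified = rewrite(("+", ("*", "y", 1), 0), rules)
```

Rules are plain data here (pairs of pattern and replacement), so extending the system means adding entries to the list rather than changing the rewriting code.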

A powerful, immutable-by-default tree manipulation library for Python with functional programming patterns, composable transformations, and advanced pattern …

A consistent API for hypothesis testing in R. Provides generic methods for p-values, test statistics, degrees of freedom, and significance testing. Includes LRT …



SLUUG Talk: Large Language Models

This repository contains the slides and code for the talk:

  • Demystifying Large …

Anonymous batch job execution system with Linux namespace/seccomp sandboxing, resource limits, and WebSocket streaming

ChatGPT chat search

This was the first Python app I developed in quite some time. I wanted to host ChatGPT logs, experiment with Heroku, and see how easy it …

Seeing how easy it is to convert an old project on Google App Engine to a modern framework with the help of ChatGPT

R package: md.tools

A miscellaneous set of tools for working with masked data and common features of masked data. The tool set takes inspiration from …

mdrelax

Relaxed Candidate Set Models for Masked Data in Series Systems

Overview

This R package implements likelihood-based inference for series systems with …

Alga

A mathematically elegant C++20 library for algebraic text processing and compositional parsing with fuzzy matching. Built on rigorous algebraic foundations …

Universal function Bernoulli approximators

Oblivious maps

A set is an unordered collection of distinct elements, typically from some implicitly understood …

Space-efficient approximate mappings using perfect hash functions. Supports arbitrary function approximation (X→Y) with configurable storage (8/16/32/64-bit) …

Bernoulli Data Type

A general framework for understanding and constructing probabilistic data structures with controlled error rates. This framework can also …