Raw Papers

Auto-generated list of all papers from /static/latex/

52 Papers
Bernoulli Types: Theory and Construction
Alex Towell
2026-01-24 Research Paper 26 pages 455.9 KB

A theoretical framework for reasoning about computation with controlled approximation, distinguishing between latent values (true mathematical objects) and observed values (what computations produce).
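
A minimal sketch, not from the paper, of the latent/observed distinction: a wrapper whose observation disagrees with the latent value at a known error rate (the class and field names are illustrative only).

```python
import random

class BernoulliBool:
    """Toy model of a Bernoulli-observed Boolean: the latent value is the true
    mathematical object; observe() returns a possibly-flipped value."""

    def __init__(self, latent: bool, error_rate: float):
        self.latent = latent          # true value (not directly accessible in practice)
        self.error_rate = error_rate  # probability an observation disagrees with the latent value

    def observe(self) -> bool:
        # With probability error_rate, the observation flips the latent value.
        return (not self.latent) if random.random() < self.error_rate else self.latent

# Example: a membership test with a 1% observation error rate.
b = BernoulliBool(latent=True, error_rate=0.01)
print(b.observe())
```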

probabilistic data structures type theory Bloom filters approximate computing error propagation hash functions oblivious computing
Oblivious Computing: Privacy Through Approximation
Alex Towell
2026-01-24 Research Paper 18 pages 391.4 KB

A framework where privacy emerges from uniform observable distributions, establishing that uniformity equals privacy through the uniform encoding construction.

oblivious computing privacy uniform encoding information theory secure computation access pattern hiding
Private Information Retrieval and Systems
Alex Towell
2026-01-24 Research Paper 14 pages 404.1 KB

Bridging PIR theory and practice through Bernoulli types, introducing tuple encoding for hiding Boolean AND correlations in conjunctive queries.

private information retrieval PIR tuple encoding oblivious data structures encrypted search privacy-preserving systems oblivious computing
Regular Types in the Bernoulli Model: When Equality Isn't Equal
Alex Towell
2026-01-24 Research Paper 9 pages 343.5 KB

Analysis of how regular type axioms (reflexivity, symmetry, transitivity) behave under noisy observation, with applications to distributed systems and fault-tolerant computing.

regular types type theory probabilistic equality distributed systems fault tolerance Byzantine computing oblivious computing
Statistical Foundations and Empirical Validation
Alex Towell
2026-01-24 Research Paper 12 pages 409.4 KB

Statistical foundations for Bernoulli types with empirical validation, introducing entropy maps and the Order-Rank-Entropy trinity.

statistical validation entropy maps binomial distributions confidence intervals empirical analysis probabilistic data structures oblivious computing
Two-Level Threshold Structures for Approximate Membership Testing
Alex Towell
2026-01-24 Research Paper 11 pages 362.9 KB

A novel approach to approximate membership testing achieving constant query time with exact control over false positive rates, requiring only 2 hash operations regardless of error rate.

threshold structures approximate membership Bloom filters probabilistic data structures perfect hashing space-computation trade-offs oblivious computing
Alex Towell, John Matta
2025-12-09 Conference Paper 12 pages 1.5 MB Complex Networks 2025

Through a single-user case study of 449 ChatGPT conversations, we introduce a "cognitive MRI": network analysis that reveals the thought topology hidden in linear conversation logs. We construct semantic similarity networks with user-weighted embeddings to identify knowledge communities and the bridge conversations that enable cross-domain flow. Our analysis reveals a heterogeneous topology: theoretical domains exhibit hub-and-spoke structures while practical domains show tree-like hierarchies. We identify three distinct bridge types that facilitate knowledge integration across communities.
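
A sketch of the network-construction step: cosine-similarity edges over conversation embeddings plus modularity-based community detection. This is illustrative only; the paper's user-weighted embeddings and thresholds are its own.

```python
import numpy as np
import networkx as nx
from networkx.algorithms import community

def similarity_graph(embeddings: np.ndarray, labels: list[str], threshold: float = 0.8) -> nx.Graph:
    """Connect conversations whose embedding cosine similarity exceeds a threshold."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    g = nx.Graph()
    g.add_nodes_from(labels)
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if sims[i, j] >= threshold:
                g.add_edge(labels[i], labels[j], weight=float(sims[i, j]))
    return g

# Toy data: four conversations embedded in three dimensions.
emb = np.array([[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 1.0, 0.1], [0.1, 0.9, 0.2]])
g = similarity_graph(emb, ["c1", "c2", "c3", "c4"])
print([set(c) for c in community.greedy_modularity_communities(g)])  # two communities
```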

complex networks AI conversation semantic embedding knowledge graphs ChatGPT network analysis community detection cognitive science
Infinigram: Corpus-Based Language Modeling via Suffix Arrays with LLM Probability Mixing
Alex Towell
2025-12-03 Technical Report 8 pages 392.7 KB

We present Infinigram, a corpus-based language model that uses suffix arrays for variable-length n-gram pattern matching. Unlike neural language models that require expensive training and fine-tuning, Infinigram provides instant training—the corpus is the model. Given a context, Infinigram finds the longest matching suffix in the training corpus and estimates next-token probabilities from observed continuations. This approach offers O(m log n) query time, complete explainability since every prediction traces to specific corpus evidence, and the ability to ground LLM outputs through probability mixing without retraining. We introduce a theoretical framework that views inductive biases as projections—transformations applied to queries or training data that enable generalization.
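
A minimal sketch of the longest-suffix-match idea with illustrative names; a naive scan stands in for the suffix-array lookup, so this is not the Infinigram implementation.

```python
from collections import Counter

def next_token_probs(corpus: list[str], context: list[str]) -> dict[str, float]:
    """Estimate next-token probabilities from the longest context suffix found in
    the corpus. A naive scan stands in for the suffix-array lookup in the report."""
    for start in range(len(context)):          # try the longest suffix first
        suffix = context[start:]
        continuations = Counter(
            corpus[i + len(suffix)]
            for i in range(len(corpus) - len(suffix))
            if corpus[i:i + len(suffix)] == suffix
        )
        if continuations:
            total = sum(continuations.values())
            return {tok: c / total for tok, c in continuations.items()}
    return {}                                   # no suffix matched

corpus = "the cat sat on the mat the cat ran".split()
print(next_token_probs(corpus, ["the", "cat"]))  # {'sat': 0.5, 'ran': 0.5}
```

A real suffix-array index gives the O(m log n) lookups the report cites; grounding an LLM could then mix this distribution with the model's own next-token probabilities, e.g. as a weighted average.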

language models n-gram suffix arrays NLP LLM grounding probability mixing Python machine learning
Alex Towell
2025-12-03 Technical Report 7 pages 250.6 KB

This technical report introduces the likelihood.model.series.md R package for maximum likelihood estimation in series systems with masked component cause of failure data. The package provides a unified framework for exponential and Weibull series systems, implementing log-likelihood functions, score vectors, and Hessian matrices under specific masking conditions (C1, C2, C3). We describe the mathematical foundation, software architecture, and integration with the broader likelihood.model ecosystem. The package enables practitioners to perform parameter estimation, construct confidence intervals, and conduct hypothesis tests for series system reliability problems where component failure causes are only partially observable.
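
For reference, a sketch of the standard masked-data series-system likelihood from the literature; the package's exact parameterization and the precise statements of C1-C3 may differ.

```latex
% Sketch of the standard masked-data series-system likelihood (hazard h_j, survival S_l,
% m components); for a system failure at t_i with candidate set C_i under the usual
% masking conditions:
\[
  L_i(\theta) \;\propto\; \Bigl[\sum_{j \in C_i} h_j(t_i;\theta_j)\Bigr]
                          \prod_{l=1}^{m} S_l(t_i;\theta_l),
\]
% and for a right-censored observation at time \tau:
\[
  L_i(\theta) \;\propto\; \prod_{l=1}^{m} S_l(\tau;\theta_l).
\]
```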

R statistics reliability series systems masked data maximum likelihood Weibull exponential distribution
Statistical Inference for Series Systems from Masked Failure Time Data: The Exponential Case
Alex Towell
2025-12-02 Research Paper 29 pages 484.5 KB

We consider the problem of estimating component failure rates in series systems when observations consist of system failure times paired with partial information about the failed component. For the case where component lifetimes follow exponential distributions, we derive closed-form expressions for the maximum likelihood estimator, the Fisher information matrix, and establish sufficient statistics. The asymptotic sampling distribution of the estimator is characterized and confidence intervals are provided. A detailed analysis of a three-component system demonstrates the theoretical results.
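
A sketch of how the masked-data likelihood contribution specializes under exponential component lifetimes; this is standard algebra and the paper's notation may differ.

```latex
% With exponential component lifetimes, h_j(t) = \lambda_j and S_j(t) = e^{-\lambda_j t},
% so the contribution for a failure at t_i with candidate set C_i reduces
% (up to masking constants) to
\[
  L_i(\lambda) \;\propto\; \Bigl(\sum_{j \in C_i} \lambda_j\Bigr)
                           \exp\!\Bigl(-t_i \sum_{l=1}^{m} \lambda_l\Bigr),
\]
% reflecting that the system lifetime is exponential with rate \sum_l \lambda_l.
```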

statistics fisher information masked data series systems reliability exponential distribution maximum likelihood sufficient statistics asymptotic theory confidence intervals
MCTS-Reasoning: A Canonical Specification of Monte Carlo Tree Search for LLM Reasoning
Alex Towell
2025-12-01 Technical Report 15 pages 488.8 KB

This technical report provides a rigorous specification of Monte Carlo Tree Search (MCTS) applied to Large Language Model (LLM) reasoning. We present formal definitions of the search tree structure, the four phases of MCTS (Selection, Expansion, Simulation, Backpropagation), and their adaptation for step-by-step reasoning with language models. Key design choices—including tree-building rollouts and terminal-only evaluation—are explicitly documented and justified. The goal is to establish a canonical reference specification that authentically captures the MCTS algorithm while adapting it for the unique characteristics of LLM-based reasoning tasks.
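
A sketch, with assumed class and field names, of the Selection and Backpropagation phases using UCB1; the report's specification also covers Expansion and Simulation.

```python
import math

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # e.g., the partial chain of reasoning steps so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0        # accumulated terminal rewards backed up to this node

def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 score: mean value plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")         # always try unvisited children first
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)

def select(node: Node) -> Node:
    """Selection phase: descend by maximizing UCB1 until reaching a leaf."""
    while node.children:
        node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
    return node

def backpropagate(node: Node, reward: float) -> None:
    """Backpropagation phase: push a terminal-only reward up to the root."""
    while node is not None:
        node.visits += 1
        node.value_sum += reward
        node = node.parent

# Tiny demo: a root question with two candidate next steps.
root = Node("question")
root.children = [Node("step A", parent=root), Node("step B", parent=root)]
backpropagate(root.children[0], reward=1.0)   # pretend a rollout from A reached a correct answer
backpropagate(root.children[1], reward=0.0)
print(select(root).state)                     # "step A": the higher mean value wins here
```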

MCTS Monte Carlo Tree Search LLM reasoning tree search UCB1 machine learning artificial intelligence
Cipher Maps: A Unified Framework for Oblivious Function Approximation Through Algebraic Structures and Bernoulli Models
Alex Towell
2025-10-14 Research Paper 10 pages 246.7 KB

We present cipher maps, a comprehensive theoretical framework unifying oblivious function approximation, algebraic cipher types, and Bernoulli data models. Building on the mathematical foundations of cipher functors that lift monoids into cipher monoids, we develop oblivious Bernoulli maps that prov...

oblivious computing cipher maps algebraic cipher types Bernoulli models function approximation oblivious data types cryptography encrypted search probabilistic data structures category theory
Encrypted Search with Oblivious Bernoulli Types: Information-Theoretic Privacy through Controlled Approximation
Alex Towell
2025-10-14 Research Paper 20 pages 463.3 KB

Problem: Traditional encrypted search systems face a fundamental tension: deterministic schemes leak access patterns enabling inference attacks, while probabilistic structures like Bloom filters provide space efficiency but fail to hide what is being queried.

encrypted search oblivious computing Bernoulli types oblivious data types information-theoretic privacy probabilistic data structures access pattern obfuscation cryptography privacy-preserving search query privacy
Maximizing Confidentiality in Encrypted Search Through Entropy Optimization
Alex Towell
2025-10-14 Research Paper 51 pages 914.6 KB

We present a framework for maximizing confidentiality in encrypted search systems through entropy optimization. By analyzing the information-theoretic properties of query distributions and access patterns, we develop techniques to maximize the adversary's uncertainty about user queries and document contents. Our approach combines secure indexing with entropy maximization to provide quantifiable privacy guarantees.
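
For reference, the basic quantity involved: the Shannon entropy of the query distribution, which the framework relates to the adversary's uncertainty. A small sketch, not the paper's full optimization.

```python
import math

def query_entropy(query_probs: dict[str, float]) -> float:
    """Shannon entropy (bits) of a query distribution; higher entropy means a
    frequency-analysis adversary is less certain which query was issued."""
    return -sum(p * math.log2(p) for p in query_probs.values() if p > 0)

# A skewed distribution leaks more than a near-uniform one.
print(query_entropy({"alpha": 0.7, "beta": 0.2, "gamma": 0.1}))   # ~1.16 bits
print(query_entropy({"alpha": 1/3, "beta": 1/3, "gamma": 1/3}))   # ~1.58 bits, the maximum for 3 queries
```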

encrypted search oblivious computing entropy optimization confidentiality maximization information theory privacy-preserving search cryptography secure indexing query privacy
2025-10-12 10 pages 193.9 KB

We present DagShell, a virtual POSIX-compliant filesystem implemented as a content-addressable directed acyclic graph (DAG). Unlike traditional filesystems that mutate data in place, DagShell treats all filesystem objects as immutable nodes identified by SHA256 content hashes, similar to Git’s objec...
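
A toy sketch of content addressing in Python with assumed names, not DagShell's implementation: objects are immutable and named by the SHA-256 hash of their serialized content, and directories reference children by hash.

```python
import hashlib, json

class ObjectStore:
    """Toy content-addressable store: every object is immutable and identified
    by the SHA-256 hash of its serialized content."""

    def __init__(self):
        self.objects = {}                      # hash -> serialized node

    def put(self, node: dict) -> str:
        data = json.dumps(node, sort_keys=True).encode()
        digest = hashlib.sha256(data).hexdigest()
        self.objects[digest] = data            # writing never mutates existing objects
        return digest

    def get(self, digest: str) -> dict:
        return json.loads(self.objects[digest])

store = ObjectStore()
blob = store.put({"type": "file", "content": "hello"})
root = store.put({"type": "dir", "entries": {"hello.txt": blob}})  # directory refers to the blob by hash
print(store.get(root)["entries"]["hello.txt"] == blob)             # True
```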

2025-10-08 6 pages 215.7 KB

We present JSONL Algebra (ja), a command-line tool and interactive REPL that applies relational algebra operations to semi-structured JSONL (JSON Lines) data. Unlike traditional database systems that require rigid schemas, ja embraces the flexibility of JSON while providing the expressive power of r...
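
A sketch of relational operations applied to JSONL rows in plain Python; this is illustrative and not ja's CLI or REPL syntax.

```python
import json

def select(rows, predicate):
    """Relational selection over parsed JSONL rows."""
    return [r for r in rows if predicate(r)]

def project(rows, keys):
    """Relational projection: keep only the named keys (missing keys are skipped)."""
    return [{k: r[k] for k in keys if k in r} for r in rows]

jsonl = '{"name": "ada", "lang": "py", "stars": 12}\n{"name": "alga", "lang": "cpp", "stars": 7}'
rows = [json.loads(line) for line in jsonl.splitlines()]
print(project(select(rows, lambda r: r["stars"] > 10), ["name"]))   # [{'name': 'ada'}]
```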

Alex Towell
2025-10-07 White Paper 25 pages 307.9 KB

We present alea, a C++20 header-only library that implements probability distributions and Monte Carlo methods using generic programming and type erasure. The library provides: (1) a type-erased interface for probability distributions that enables runtime polymorphism while maintaining type safety; ...

c++ probability Monte Carlo library statistics type erasure
Alex Towell
2025-10-07 White Paper 4 pages 153.3 KB

We present Alga, a header-only C++20 template library that models parsers as composable algebraic structures. By treating parsers as elements of monoids and leveraging monadic composition patterns, Alga provides a mathematically rigorous yet practically efficient framework for text processing. The l...

c++ parsing monads algebra library functional programming
Alex Towell
2025-10-07 Research Paper 8 pages 217.2 KB

We present algebraic cipher types, a functorial framework that lifts monoids and other algebraic structures into cryptographically secure representations while preserving their computational properties. Our cipher functor cA provides a systematic construction that maps a monoid (S,∗,e) to a cipher m...

oblivious computing algebraic cipher types category theory functors cryptography encrypted search homomorphic operations secure computation oblivious data types type theory
Alex Towell
2025-10-07 Research Paper 39 pages 395.3 KB

We present a comprehensive algebraic framework for language model composition that transforms how we build and reason about language systems. Our framework introduces a rich set of operators—mixture (+), scalar (*), maximum (|), minimum (&), exclusive-or (⊕), temperature (**), threshold (>>), tr...
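
A sketch of just the mixture (+) operator on next-token distributions, with distributions assumed to be token-to-probability dicts; the paper's framework defines many more operators and their algebraic laws.

```python
def mixture(p: dict[str, float], q: dict[str, float], w: float = 0.5) -> dict[str, float]:
    """Mixture of two next-token distributions: w*p + (1-w)*q, renormalized."""
    support = set(p) | set(q)
    mixed = {tok: w * p.get(tok, 0.0) + (1 - w) * q.get(tok, 0.0) for tok in support}
    total = sum(mixed.values())
    return {tok: v / total for tok, v in mixed.items()}

p = {"yes": 0.9, "no": 0.1}
q = {"yes": 0.2, "no": 0.8}
print(mixture(p, q, w=0.5))   # {'yes': 0.55, 'no': 0.45} (key order may vary)
```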

large language models algebra composition machine learning natural language processing
Alex Towell
2025-10-07 Research Paper 5 pages 186.8 KB

Fuzzy logic systems have traditionally relied on domain experts to define membership functions and inference rules, creating a significant barrier to deployment in domains where expert knowledge is limited or expensive to obtain. We present a novel approach to fuzzy system design through fuzzy soft ...

fuzzy logic machine learning differentiable programming soft circuits rule discovery
Alex Towell
2025-10-07 Research Paper 10 pages 228.9 KB

We present DreamLog, a neural-symbolic system that integrates logic programming with large language models (LLMs) through a biologically-inspired architecture featuring wake-sleep cycles, compression-based learning, and recursive knowledge generation. DreamLog addresses the fundamental challenge of ...

neural-symbolic logic programming large language models machine learning compression knowledge representation
Alex Towell
2025-10-01 White Paper 8 pages 662.9 KB

Streaming data processing requires algorithms that compute statistical aggregates in a single pass with constant memory—a challenge complicated by floating-point precision loss and the need to compute multiple statistics simultaneously. We present accumux, a C++ library that solves these challenges ...
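
The kind of single-pass, constant-memory accumulation the library targets can be illustrated with Welford's online algorithm, a standard technique; this Python sketch is not accumux's C++ API.

```python
class RunningStats:
    """Single-pass mean and variance via Welford's algorithm: constant memory,
    numerically stable for streaming input."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                 # sum of squared deviations from the running mean

    def push(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

acc = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    acc.push(x)
print(acc.mean, acc.variance())       # 5.0 and ~4.571 (sample variance)
```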

c++ streaming data structures library numerical stability
Alex Towell
2025-10-01 White Paper 18 pages 453.9 KB

We present Algebraic Hashing, a header-only C++20 library that enables systematic composition of hash functions through XOR operations. The library explores algebraic properties of hash function composition, providing practical implementations with compile-time composition via template metaprogrammi...
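
A sketch of XOR composition of hash outputs in Python; the labeled helper h is an illustrative assumption, not the library's C++ interface.

```python
import hashlib

def h(data: bytes, label: bytes) -> int:
    """A labeled hash: SHA-256 of label || data, read as an integer."""
    return int.from_bytes(hashlib.sha256(label + data).digest(), "big")

def xor_compose(*hashes):
    """Compose hash functions by XOR-ing their outputs; the operation is
    commutative and associative, with the all-zero hash as identity."""
    def combined(data: bytes) -> int:
        out = 0
        for fn in hashes:
            out ^= fn(data)
        return out
    return combined

h1 = lambda d: h(d, b"h1")
h2 = lambda d: h(d, b"h2")
combo = xor_compose(h1, h2)
print(combo(b"payload") == h1(b"payload") ^ h2(b"payload"))   # True
```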

cryptography hash functions algebra c++ library
Apertures: Coordinated Partial Evaluation for Distributed Computation
Alex Towell
2025-10-01 White Paper 9 pages 139.0 KB

We present apertures, a coordination mechanism for distributed computation based on partial evaluation with explicit holes. Apertures (denoted ?variable) represent unknown expressions that enable pausable and resumable evaluation across multiple parties. Unlike security-focused approaches, we make n...

oblivious computing encrypted search distributed computation partial evaluation coordination cryptography secure multi-party computation oblivious query processing
Cryptographic perfect hash functions: A theoretical analysis on space efficiency, time complexity, and entropy
Alex Towell
2025-10-01 Research Paper 16 pages 391.3 KB

We analyze a theoretical perfect hash function that has three desirable properties: (1) it is a cryptographic hash function; (2) its in-place encoding obtains the theoretical lower-bound for the expected space complexity; and (3) its in-place encoding is a random bit string with maximum entropy.

cryptography performance hash functions data structures c++ library
Alex Towell
2025-10-01 White Paper 10 pages 218.6 KB

We present maph (Map based on Perfect Hash), a high-performance key-value storage system that achieves sub-microsecond latency through a novel combination of memory-mapped I/O, approximate perfect hashing, and lock-free atomic operations. Unlike traditional key-value stores that suffer from kernel/u...

hash functions data structures c++ library perfect hashing key-value storage
Alex Towell
2025-10-01 White Paper 18 pages 253.4 KB

We present PFC (Prefix-Free Codecs), a header-only C++20 library that achieves full-featured data compression without marshaling overhead through a novel combination of prefix-free codes and generic programming principles. The library implements a zero-copy invariant: in-memory representation equals...

cryptography security c++ data compression prefix-free codes generic programming library
Alex Towell
2025-04-19 Technical Report 21 pages 272.5 KB

We present Xtk (Expression Toolkit), a powerful and extensible system for symbolic expression manipulation through rule-based term rewriting. Xtk provides a simple yet expressive framework for pattern matching, expression transformation, and symbolic computation. The system employs an Abstract Syntax Tree (AST) representation using nested Python lists, enabling intuitive expression construction while maintaining formal rigor. We demonstrate that Xtk’s rule-based approach is Turing-complete and show its applicability to diverse domains including symbolic differentiation, algebraic simplification, theorem proving via tree search algorithms, and expression optimization. The toolkit includes an extensive library of predefined mathematical rules spanning calculus, algebra, trigonometry, and logic, along with an interactive REPL for exploratory computation. We present the theoretical foundations of the system, describe its implementation architecture, analyze its computational complexity, and provide comprehensive examples demonstrating its practical applications.
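
A minimal sketch of rule-based rewriting over nested-list ASTs. The rule representation here, predicates that return a replacement or None, is an assumption for illustration; Xtk's pattern-matching rules are richer.

```python
def rewrite(expr, rules):
    """Apply rewrite rules bottom-up until a fixed point. Expressions are nested
    lists, e.g. ["+", "x", 0]; each rule returns a rewritten expression or None."""
    if isinstance(expr, list):
        expr = [rewrite(sub, rules) for sub in expr]
    changed = True
    while changed:
        changed = False
        for rule in rules:
            result = rule(expr)
            if result is not None and result != expr:
                expr, changed = result, True
    return expr

# Toy simplification rules: x + 0 -> x and x * 1 -> x.
rules = [
    lambda e: e[1] if isinstance(e, list) and e[:1] == ["+"] and e[2:] == [0] else None,
    lambda e: e[1] if isinstance(e, list) and e[:1] == ["*"] and e[2:] == [1] else None,
]
print(rewrite(["*", ["+", "y", 0], 1], rules))   # 'y'
```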

symbolic computation expression rewriting term rewriting pattern matching theorem proving algebraic simplification symbolic differentiation expression optimization python library toolkit
Alex Towell, Hiroshi Fujinoki
2025-04-08 Conference 8 pages 804.0 KB 2025 7th International Conference on Computer Communication and the Internet (ICCCI)

This paper presents a novel approach to preventing ransomware damage through in-operation off-site backup systems designed to achieve an exceptionally low false-negative miss-detection rate of 10⁻⁸.

IEEE ransomware backup systems cybersecurity false-negative detection
Fisher Flow: An Information-Geometric Framework for Sequential Estimation
Alex Towell
2024-06-20 Research Paper 56 pages 456.9 KB

We present Fisher Flow (FF), a framework for sequential statistical inference that propagates Fisher information rather than probability distributions. Fisher Flow provides a computationally efficient alternative to Bayesian updating while maintaining rigorous uncertainty quantification. The key ins...
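
For context, a standard information-geometric fact such a framework can rely on, stated here as a sketch rather than a summary of the paper's full construction: Fisher information is additive over independent observations, and its inverse approximates the estimator's covariance.

```latex
% For independent observations, Fisher information is additive, so a sequential
% procedure can accumulate information instead of carrying full posteriors:
\[
  \mathcal{I}_{1:n}(\theta) \;=\; \sum_{i=1}^{n} \mathcal{I}_i(\theta),
  \qquad
  \widehat{\operatorname{Cov}}(\hat\theta) \;\approx\; \mathcal{I}_{1:n}(\hat\theta)^{-1}.
\]
```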

statistics information geometry sequential estimation fisher information hessian maximum likelihood
Alex Towell
2024-01-10 Technical Paper 28 pages 318.0 KB

How do you store infinity in 256 bits? This paper explores the fundamental deception at the heart of computational cryptography: using finite information to simulate infinite randomness. We prove why true random oracles are impossible, then show how lazy evaluation creates a beautiful lie—a finite a...
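
A toy sketch of the lazy-evaluation idea: memoize outputs on first query so a finite table impersonates an infinite random function. The paper's actual construction, e.g. how it uses a 256-bit seed, may differ.

```python
import secrets

class LazyRandomOracle:
    """Toy lazily-evaluated 'random oracle': outputs are sampled only when a query
    first arrives and memoized so repeated queries stay consistent. Finite state,
    yet indistinguishable from a pre-committed infinite table to any finite observer."""

    def __init__(self, output_bytes: int = 32):
        self.output_bytes = output_bytes
        self.table = {}                        # query -> cached random output

    def query(self, x: bytes) -> bytes:
        if x not in self.table:
            self.table[x] = secrets.token_bytes(self.output_bytes)
        return self.table[x]

oracle = LazyRandomOracle()
print(oracle.query(b"hello") == oracle.query(b"hello"))   # True: consistent on repeats
```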

cryptography philosophy
Estimating Confidentiality of Encrypted Search Systems: A Probabilistic Framework
Alex Towell
2024-01-01 Research Paper 17 pages 337.2 KB

We develop a probabilistic framework for quantifying the confidentiality of encrypted search systems based on the sampling distribution of a confidentiality statistic. Using the Bootstrap method with normal approximation, we efficiently estimate the risk of confidentiality breach, enabling proactive countermeasures. Our analysis shows how entropy of the query distribution relates to the adversary's expected accuracy, providing theoretical grounding for resilience engineering approaches to encrypted search security.
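
A minimal sketch of the bootstrap-with-normal-approximation step. The accuracies below are made-up illustration data, and the paper's confidentiality statistic is more involved than a simple mean.

```python
import random, statistics

def bootstrap_normal_ci(sample, stat, n_boot=1000, z=1.96):
    """Bootstrap with normal approximation: resample with replacement, compute the
    statistic on each resample, and form stat(sample) +/- z * bootstrap standard error."""
    point = stat(sample)
    boot = [stat(random.choices(sample, k=len(sample))) for _ in range(n_boot)]
    se = statistics.stdev(boot)
    return point - z * se, point + z * se

# Hypothetical per-trial adversary accuracies from simulated attacks on an encrypted index.
accuracies = [0.12, 0.08, 0.15, 0.10, 0.09, 0.14, 0.11, 0.13, 0.07, 0.12]
print(bootstrap_normal_ci(accuracies, statistics.mean))
```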

encrypted search confidentiality bootstrap method frequency analysis privacy-preserving computation oblivious computing
Alex Towell
2023-10-07 Research Paper 29 pages 710.9 KB

This paper explores model selection for reliability estimation of components in multi-component series systems using Weibull distributions. We assess the sensitivity of a likelihood model based on right-censored and masked failure data to deviations from well-designed series systems where components...

statistics reliability series systems model selection weibull distribution maximum likelihood
Alex Towell
2023-08-09 Master's Thesis 40 pages 978.9 KB

This paper investigates maximum likelihood techniques to estimate component reliability from masked failure data in series systems. A likelihood model accounts for right-censoring and candidate sets indicative of masked failure causes. Extensive simulation studies assess the accuracy and precision o...

R statistics reliability series systems masked data maximum likelihood
Boolean Algebra over Trapdoor Sets: A Practical Framework for Privacy-Preserving Set Operations with Probabilistic Guarantees
Alex Towell
2022-01-01 Research Paper 11 pages 432.2 KB

We present Hash-Based Oblivious Sets (HBOS), a practical framework for privacy-preserving set operations that combines cryptographic hash functions with probabilistic data structures. Unlike traditional approaches using fully homomorphic encryption or secure multi-party computation, HBOS achieves microsecond-scale performance by embracing approximate operations with explicitly managed error rates. HBOS provides oblivious representation through one-way hash transformations—the underlying data cannot be recovered from hash values. Operations return Bernoulli Booleans: plaintext approximate results with explicit error rates. This layered model trades full obliviousness of query results for practical efficiency, making it suitable for applications where query patterns are not sensitive or results are shared.
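
A toy sketch of the trapdoor-set idea, not HBOS's actual construction: store only keyed, truncated hashes so values cannot be read back, and accept a controlled false-positive rate on membership queries.

```python
import hashlib

class TrapdoorSet:
    """Toy hash-based oblivious set: elements are stored only as truncated keyed
    hashes, so the underlying values cannot be recovered from the structure.
    Membership answers are approximate: a query can collide with a stored hash,
    giving a false-positive probability of roughly len(hashes) * 2**(-8*prefix_bytes)."""

    def __init__(self, secret: bytes, prefix_bytes: int = 4):
        self.secret = secret
        self.prefix_bytes = prefix_bytes
        self.hashes = set()

    def _trapdoor(self, item: bytes) -> bytes:
        return hashlib.sha256(self.secret + item).digest()[: self.prefix_bytes]

    def add(self, item: bytes) -> None:
        self.hashes.add(self._trapdoor(item))

    def __contains__(self, item: bytes) -> bool:
        # A Bernoulli Boolean: "True" may be a false positive on a hash collision.
        return self._trapdoor(item) in self.hashes

s = TrapdoorSet(secret=b"k")
s.add(b"alice")
print(b"alice" in s, b"bob" in s)   # True False (almost surely)
```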

oblivious computing hash-based sets privacy-preserving set operations probabilistic data structures Bloom filters cryptographic hashing encrypted search oblivious data structures trapdoor sets
Time Series Analysis of Confidentiality Degradation in Encrypted Search Systems
Alex Towell
2021-04-01 Research Paper 24 pages 2.5 MB

We present a time series analysis of confidentiality degradation in encrypted search systems subject to known-plaintext attacks. By modeling the adversary's accuracy as a time-dependent confidentiality measure, we develop forecasting models based on ARIMA and dynamic regression techniques. Our analy...

encrypted search oblivious computing known-plaintext attacks time series analysis confidentiality degradation security analysis cryptography privacy leakage statistical forecasting
Alex Towell
2020-10-01 R Package

hypothesize is an R package that provides a simple API for hypothesis testing, with a consistent interface designed to be easy to use. It is built on top of the likelihood.model package, which provides a common interface for likelihood models.

R statistics hypothesis testing

Likelihood Model
Alex Towell
2020-10-01 R Package

The likelihood.model package facilitates building likelihood models. It defines a number of generic methods for working with likelihoods, together with a few functions defined purely in terms of those generics. It is designed to interoperate well with the algebraic.mle R package. It also includes a likelihood contributions model: a likelihood model composed of multiple likelihood contributions assuming i.i.d. data.

R statistics maximum likelihood
Alexander A. Towell, Hiroshi Fujinoki
2016-11-02 Conference Paper 8 pages 524.7 KB 8th International Conference on Computing Communication and Networking Technologies (ICCCNT)

This paper applies a resilience engineering approach to studying how effective encrypted searches will be. One of the concerns with encrypted searches is frequency attacks. In frequency attacks, adversaries guess the meaning of encrypted words by observing a large number of encrypted words in search queries and mapping them to guessed plaintext words using their known histogram. Thus, it is important for defenders to know how many encrypted words adversaries need to observe before they correctly guess them. However, doing so takes a long time for defenders because of the large volume of encrypted words involved. We developed and evaluated the Moving Average Bootstrap (MAB) method for estimating the number of encrypted words (N*) an adversary needs to observe before correctly guessing a certain percentage of the observed words with a certain confidence. Our experiments indicate that the MAB method lets defenders estimate N* in only 5% of the time required without MAB. Because of this significant reduction in the time required to estimate N*, MAB contributes to the safety of encrypted searches.

encrypted search oblivious computing frequency attacks known-plaintext attacks privacy-preserving search confidentiality estimation bootstrap method statistical analysis information retrieval cryptography security analysis access pattern leakage
Alex Towell
2015-05-01 Master's Thesis 107 pages 7.2 MB ProQuest

This thesis studies Encrypted Search, a technique for securely searching documents on untrusted systems. The author designs and compares several secure index types to determine their performance in terms of time complexity, space complexity, and retrieval accuracy. The study also examines the risk to confidentiality posed by storing frequency and proximity information and suggests ways to mitigate that risk. Additionally, the author investigates the impact of false positives and secure index poisoning on performance and confidentiality. Finally, the thesis explores techniques, such as query obfuscation, to protect against adversaries with access to encrypted query histories.

encrypted search secure index information retrieval confidentiality oblivious computing privacy-preserving search query obfuscation false positives secure index poisoning adversarial access query histories frequency attacks access pattern leakage cryptography thesis