Raw Papers

Auto-generated list of all papers from /static/latex/

49 Papers
0 Preprints
11 Top Rated
Cognitive MRI of AI Conversations: Analyzing AI Interactions through Semantic Embedding Networks
Alex Towell, John Matta
2025-12-09 Conference Paper 12 pages 1.5 MB Complex Networks 2025

Through a single-user case study of 449 ChatGPT conversations, we introduce a cognitive MRI applying network analysis to reveal thought topology hidden in linear conversation logs. We construct semantic similarity networks with user-weighted embeddings to identify knowledge communities and bridge conversations that enable cross-domain flow. Our analysis reveals heterogeneous topology: theoretical domains exhibit hub-and-spoke structures while practical domains show tree-like hierarchies. We identify three distinct bridge types that facilitate knowledge integration across communities.

complex networks AI conversation semantic embedding knowledge graphs ChatGPT network analysis community detection cognitive science
Infinigram: Corpus-Based Language Modeling via Suffix Arrays with LLM Probability Mixing
Alex Towell
2025-12-03 Technical Report 8 pages 385.5 KB

We present Infinigram, a corpus-based language model that uses suffix arrays for variable-length n-gram pattern matching. Unlike neural language models that require expensive training and fine-tuning, Infinigram provides instant training—the corpus is the model. Given a context, Infinigram finds the longest matching suffix in the training corpus and estimates next-token probabilities from observed continuations. This approach offers O(m log n) query time, complete explainability since every prediction traces to specific corpus evidence, and the ability to ground LLM outputs through probability mixing without retraining. We introduce a theoretical framework that views inductive biases as projections—transformations applied to queries or training data that enable generalization.

language models n-gram suffix arrays NLP LLM grounding probability mixing Python machine learning
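
The longest-suffix lookup and probability mixing described in the Infinigram abstract can be illustrated with a minimal sketch. It is not the Infinigram implementation: it scans the corpus linearly instead of using a suffix array (so it does not have the O(m log n) query time), and the function names, the unigram fallback, and the linear-interpolation form of `mix` are assumptions made for illustration.

```python
from collections import Counter


def longest_suffix_continuations(corpus, context):
    """Find the longest suffix of `context` that occurs in `corpus` with a
    following token. Returns (match_length, Counter of next tokens).
    `corpus` and `context` are lists of tokens."""
    for start in range(len(context)):
        suffix = context[start:]
        k = len(suffix)
        # Naive scan; a suffix array would replace this with binary search.
        counts = Counter(
            corpus[i + k]
            for i in range(len(corpus) - k)
            if corpus[i:i + k] == suffix
        )
        if counts:
            return k, counts
    # Assumed fallback: no suffix matched, so use unigram counts.
    return 0, Counter(corpus)


def next_token_probs(corpus, context):
    """Relative frequencies of the continuations of the longest match."""
    _, counts = longest_suffix_continuations(corpus, context)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def mix(p_corpus, p_llm, weight):
    """Hypothetical mixing rule: linear interpolation of two distributions."""
    tokens = set(p_corpus) | set(p_llm)
    return {
        t: weight * p_corpus.get(t, 0.0) + (1 - weight) * p_llm.get(t, 0.0)
        for t in tokens
    }
```

Because each probability comes from counted continuations, a prediction can be traced back to the corpus positions that produced it, which is the explainability property the abstract refers to.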
Likelihood Models for Series Systems with Masked Component Failure Data: An R Package for Maximum Likelihood Estimation
Alex Towell
2025-12-03 Technical Report 7 pages 250.6 KB

This technical report introduces the likelihood.model.series.md R package for maximum likelihood estimation in series systems with masked component cause of failure data. The package provides a unified framework for exponential and Weibull series systems, implementing log-likelihood functions, score vectors, and Hessian matrices under specific masking conditions (C1, C2, C3). We describe the mathematical foundation, software architecture, and integration with the broader likelihood.model ecosystem. The package enables practitioners to perform parameter estimation, construct confidence intervals, and conduct hypothesis tests for series system reliability problems where component failure causes are only partially observable.

R statistics reliability series systems masked data maximum likelihood Weibull exponential distribution
Statistical Inference for Series Systems from Masked Failure Time Data: The Exponential Case
Alex Towell
2025-12-02 Research Paper 26 pages 455.8 KB

We consider the problem of estimating component failure rates in series systems when observations consist of system failure times paired with partial information about the failed component. For the case where component lifetimes follow exponential distributions, we derive closed-form expressions for the maximum likelihood estimator and the Fisher information matrix, and we establish sufficient statistics. The asymptotic sampling distribution of the estimator is characterized and confidence intervals are provided. A detailed analysis of a three-component system demonstrates the theoretical results.

statistics fisher information masked data series systems reliability exponential distribution maximum likelihood sufficient statistics asymptotic theory confidence intervals
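
For intuition about the exponential case, the sketch below covers only the simplest special case, in which the failed component is observed exactly (no masking). This is not the paper's masked-data estimator. Under that assumption the log-likelihood is the sum over j of n_j log λ_j minus (Σ_j λ_j)(Σ_i t_i), so the maximum likelihood estimate of each rate is the number of failures attributed to that component divided by the total observed system lifetime. The function name and argument layout are hypothetical.

```python
def mle_rates_unmasked(times, causes, m):
    """MLE of exponential component failure rates in an m-component series
    system when every failure cause is known exactly (an assumption; the
    masked case treated in the paper is more involved).

    times  -- system failure times t_i
    causes -- index (0..m-1) of the component that failed in each observation
    """
    total_time = sum(times)
    counts = [0] * m
    for c in causes:
        counts[c] += 1
    # lambda_hat_j = n_j / sum_i t_i
    return [n / total_time for n in counts]
```

The pair (failure counts per component, total lifetime) is all the estimator uses, which matches the role of sufficient statistics in this unmasked setting.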
MCTS-Reasoning: A Canonical Specification of Monte Carlo Tree Search for LLM Reasoning
Alex Towell
2025-12-01 Technical Report 11 pages 472.0 KB

This technical report provides a rigorous specification of Monte Carlo Tree Search (MCTS) applied to Large Language Model (LLM) reasoning. We present formal definitions of the search tree structure, the four phases of MCTS (Selection, Expansion, Simulation, Backpropagation), and their adaptation for step-by-step reasoning with language models. Key design choices—including tree-building rollouts and terminal-only evaluation—are explicitly documented and justified. The goal is to establish a canonical reference specification that authentically captures the MCTS algorithm while adapting it for the unique characteristics of LLM-based reasoning tasks.

MCTS Monte Carlo Tree Search LLM reasoning tree search UCB1 machine learning artificial intelligence
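
The Selection and Backpropagation phases named in the MCTS-Reasoning abstract can be sketched with the standard UCB1 rule, which appears in the entry's tags. This is a generic illustration, not the report's specification: the `Node` class, the exploration constant, and the choice to visit unvisited children first are assumptions, and Expansion and Simulation (where an LLM would propose and evaluate reasoning steps) are omitted.

```python
import math


class Node:
    """A search-tree node holding visit statistics (hypothetical layout)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_value = 0.0


def ucb1(child, c=math.sqrt(2)):
    """UCB1 score: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # assumed convention: try unvisited children first
    exploit = child.total_value / child.visits
    explore = c * math.sqrt(math.log(child.parent.visits) / child.visits)
    return exploit + explore


def select(node):
    """Selection: descend by highest UCB1 score until reaching a leaf."""
    while node.children:
        node = max(node.children, key=ucb1)
    return node


def backpropagate(node, value):
    """Backpropagation: add the evaluation to every node on the path to the root."""
    while node is not None:
        node.visits += 1
        node.total_value += value
        node = node.parent
```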
Asymptotic Indistinguishability: Privacy Through Rank-Deficient Observations
Alex Towell
2025-10-14 Research Paper 9 pages 228.9 KB

We present a novel privacy framework based on rank-deficient observation functions that create computational indistinguishability between distinct inputs. Unlike differential privacy, which adds calibrated noise, our approach leverages lossy compression and many-to-one mappings to achieve privacy. W...

oblivious computing privacy asymptotic indistinguishability rank-deficient observations information-theoretic privacy lossy compression cryptography differential privacy alternative
Bernoulli Sets: A Comprehensive Statistical Framework for Probabilistic Set Membership
Alex Towell
2025-10-14 Research Paper 20 pages 291.7 KB

We present Bernoulli sets, a comprehensive statistical framework for probabilistic set data structures that provides rigorous foundations for approximating set membership queries under uncertainty. Our framework unifies the treatment of fundamental data structures like Bloom filters, Count-Min sketc...

oblivious computing Bernoulli sets probabilistic data structures set membership statistical framework Bloom filters oblivious data types approximate set operations
Boolean Encrypted Search with Approximate Sets: Error Propagation, Query Obfuscation, and Threshold-Based Retrieval
Alex Towell
2025-10-14 Research Paper 21 pages 558.2 KB


encrypted search oblivious computing Boolean search approximate sets query obfuscation error propagation threshold-based retrieval secure indexing privacy-preserving search
Cipher Maps: A Unified Framework for Oblivious Function Approximation Through Algebraic Structures and Bernoulli Models
Alex Towell
2025-10-14 Research Paper 10 pages 246.7 KB

We present cipher maps, a comprehensive theoretical framework unifying oblivious function approximation, algebraic cipher types, and Bernoulli data models. Building on the mathematical foundations of cipher functors that lift monoids into cipher monoids, we develop oblivious Bernoulli maps that prov...

oblivious computing cipher maps algebraic cipher types Bernoulli models function approximation oblivious data types cryptography encrypted search probabilistic data structures category theory
Encrypted Search with Oblivious Bernoulli Types: Information-Theoretic Privacy through Controlled Approximation
Alex Towell
2025-10-14 Research Paper 16 pages 379.0 KB

Problem: Traditional encrypted search systems face a fundamental tension: deterministic schemes leak access patterns enabling inference attacks, while probabilistic structures like Bloom filters provide space efficiency but fail to hide what is being queried.

encrypted search oblivious computing Bernoulli types oblivious data types information-theoretic privacy probabilistic data structures access pattern obfuscation cryptography privacy-preserving search query privacy
Hash-Based Bernoulli Constructions: Space-Optimal Probabilistic Data Structures
Alex Towell
2025-10-14 Research Paper 12 pages 242.1 KB

We present a universal construction for space-optimal probabilistic data structures based on hash functions and Bernoulli types. Our framework unifies Bloom filters, Count-Min sketches, HyperLogLog, and other probabilistic structures under a common mathematical foundation, proving that they all aris...

oblivious computing probabilistic data structures Bernoulli types hash functions Bloom filters Count-Min sketch HyperLogLog space-optimal oblivious data structures
Hash-Based Oblivious Sets: A Practical Framework for Privacy-Preserving Set Operations with Probabilistic Guarantees
Alex Towell
2025-10-14 Research Paper 7 pages 223.3 KB

We present Hash-Based Oblivious Sets (HBOS), a practical framework for privacy-preserving set operations that combines cryptographic hash functions with probabilistic data structures. Unlike traditional approaches using fully homomorphic encryption or secure multi-party computation, HBOS achieves mi...

oblivious computing hash-based sets privacy-preserving set operations probabilistic data structures Bloom filters cryptographic hashing encrypted search oblivious data structures trapdoor sets
The Latent/Observed Duality: A Unified Theory of Approximate Computing
Alex Towell
2025-10-14 Research Paper 10 pages 249.9 KB

We present a unified type-theoretic framework for approximate computing based on the fundamental distinction between latent (true) and observed (approximate) values. This duality naturally arises not only in probabilistic data structures like Bloom filters, but equally in algorithms themselves—from ...

oblivious computing approximate computing type theory latent/observed duality probabilistic data structures Bernoulli types oblivious data types information theory
DagShell: A Content-Addressable Virtual Filesystem with Multiple Interface Paradigms
2025-10-12 10 pages 193.9 KB

We present DagShell, a virtual POSIX-compliant filesystem implemented as a content-addressable directed acyclic graph (DAG). Unlike traditional filesystems that mutate data in place, DagShell treats all filesystem objects as immutable nodes identified by SHA256 content hashes, similar to Git’s objec...

JSONL Algebra: A Relational Algebra Framework for Semi-Structured Data with Interactive Workspace
2025-10-08 6 pages 215.7 KB

We present JSONL Algebra (ja), a command-line tool and interactive REPL that applies relational algebra operations to semi-structured JSONL (JSON Lines) data. Unlike traditional database systems that require rigid schemas, ja embraces the flexibility of JSON while providing the expressive power of r...

Alea: A Modern C++ Library for Algebraic Random Elements
Alex Towell
2025-10-07 White Paper 25 pages 307.9 KB

We present alea, a C++20 header-only library that implements probability distributions and Monte Carlo methods using generic programming and type erasure. The library provides: (1) a type-erased interface for probability distributions that enables runtime polymorphism while maintaining type safety; ...

c++ probability Monte Carlo library statistics type erasure
Alga: Algebraic Parser Composition through Monadic Design: A C++20 Template Library for Type-Safe Text Processing
Alex Towell
2025-10-07 White Paper 4 pages 153.3 KB

We present Alga, a header-only C++20 template library that models parsers as composable algebraic structures. By treating parsers as elements of monoids and leveraging monadic composition patterns, Alga provides a mathematically rigorous yet practically efficient framework for text processing. The l...

c++ parsing monads algebra library functional programming
Algebraic Cipher Types: A Functorial Framework for Secure Computation
Alex Towell
2025-10-07 Research Paper 8 pages 217.2 KB

We present algebraic cipher types, a functorial framework that lifts monoids and other algebraic structures into cryptographically secure representations while preserving their computational properties. Our cipher functor cA provides a systematic construction that maps a monoid (S,∗,e) to a cipher m...

oblivious computing algebraic cipher types category theory functors cryptography encrypted search homomorphic operations secure computation oblivious data types type theory
An Algebraic Framework for Language Model Composition: Unifying Projections, Mixtures, and Constraints
Alex Towell
2025-10-07 Research Paper 39 pages 395.3 KB

We present a comprehensive algebraic framework for language model composition that transforms how we build and reason about language systems. Our framework introduces a rich set of operators: mixture (+), scalar (*), maximum (|), minimum (&), exclusive-or (⊕), temperature (**), threshold (>>), tr...

large language models algebra composition machine learning natural language processing
Automatic Fuzzy Rule Discovery Through Differentiable Soft Circuits
Alex Towell
2025-10-07 Research Paper 5 pages 186.8 KB

Fuzzy logic systems have traditionally relied on domain experts to define membership functions and inference rules, creating a significant barrier to deployment in domains where expert knowledge is limited or expensive to obtain. We present a novel approach to fuzzy system design through fuzzy soft ...

fuzzy logic machine learning differentiable programming soft circuits rule discovery
DreamLog: Neural-Symbolic Integration through Compression-Based Learning and Wake-Sleep Cycles
Alex Towell
2025-10-07 Research Paper 10 pages 228.9 KB

We present DreamLog, a neural-symbolic system that integrates logic programming with large language models (LLMs) through a biologically-inspired architecture featuring wake-sleep cycles, compression-based learning, and recursive knowledge generation. DreamLog addresses the fundamental challenge of ...

neural-symbolic logic programming large language models machine learning compression knowledge representation
Learning to Prompt in Unknown Environments: A POMDP Framework with Compositional Actions for Large Language Models
Alex Towell
2025-10-07 Research Paper 19 pages 348.3 KB

We present a novel framework that addresses the fundamental challenge of optimizing prompts for large language models (LLMs) whose internal dynamics are unknown and unobservable. Unlike approaches that assume knowledge of the LLM’s behavior, we formulate prompting as a Partially Observable Markov De...

large language models reinforcement learning POMDP prompt engineering AI alignment
Algebraic Composition for Streaming Data Reduction: A Type-Safe Framework with Numerical Stability
Alex Towell
2025-10-01 White Paper 8 pages 662.9 KB

Streaming data processing requires algorithms that compute statistical aggregates in a single pass with constant memory—a challenge complicated by floating-point precision loss and the need to compute multiple statistics simultaneously. We present accumux, a C++ library that solves these challenges ...

c++ streaming data structures library numerical stability
Algebraic Hashing: A Modern C++20 Library for Composable Hash Functions (Version 2.0)
Alex Towell
2025-10-01 White Paper 18 pages 453.9 KB

We present Algebraic Hashing, a header-only C++20 library that enables systematic composition of hash functions through XOR operations. The library explores algebraic properties of hash function composition, providing practical implementations with compile-time composition via template metaprogrammi...

cryptography hash functions algebra c++ library
Apertures: Coordinated Partial Evaluation for Distributed Computation
Alex Towell
2025-10-01 White Paper 9 pages 137.4 KB

We present apertures, a coordination mechanism for distributed computation based on partial evaluation with explicit holes. Apertures (denoted ?variable) represent unknown expressions that enable pausable and resumable evaluation across multiple parties. Unlike security-focused approaches, we make n...

oblivious computing encrypted search distributed computation partial evaluation coordination cryptography secure multi-party computation oblivious query processing
maph: Maps Based on Perfect Hashing for Sub-Microsecond Key-Value Storage
Alex Towell
2025-10-01 White Paper 10 pages 218.6 KB

We present maph (Map based on Perfect Hash), a high-performance key-value storage system that achieves sub-microsecond latency through a novel combination of memory-mapped I/O, approximate perfect hashing, and lock-free atomic operations. Unlike traditional key-value stores that suffer from kernel/u...

hash functions data structures c++ library perfect hashing key-value storage
PFC: Zero-Copy Data Compression Through Prefix-Free Codecs and Generic Programming
Alex Towell
2025-10-01 White Paper 16 pages 243.2 KB

We present PFC (Prefix-Free Codecs), a header-only C++20 library that achieves full-featured data compression without marshaling overhead through a novel combination of prefix-free codes and generic programming principles. The library implements a zero-copy invariant: in-memory representation equals...

cryptography security c++ data compression prefix-free codes generic programming library
Xtk: A Rule-Based Expression Rewriting Toolkit for Symbolic Computation
Alex Towell
2025-04-19 Technical Report 21 pages 272.5 KB

We present Xtk (Expression Toolkit), a powerful and extensible system for symbolic expression manipulation through rule-based term rewriting. Xtk provides a simple yet expressive framework for pattern matching, expression transformation, and symbolic computation. The system employs an Abstract Syntax Tree (AST) representation using nested Python lists, enabling intuitive expression construction while maintaining formal rigor. We demonstrate that Xtk’s rule-based approach is Turing-complete and show its applicability to diverse domains including symbolic differentiation, algebraic simplification, theorem proving via tree search algorithms, and expression optimization. The toolkit includes an extensive library of predefined mathematical rules spanning calculus, algebra, trigonometry, and logic, along with an interactive REPL for exploratory computation. We present the theoretical foundations of the system, describe its implementation architecture, analyze its computational complexity, and provide comprehensive examples demonstrating its practical applications.

symbolic computation expression rewriting term rewriting pattern matching theorem proving algebraic simplification symbolic differentiation expression optimization python library toolkit
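
Rule-based rewriting over nested Python lists, as the Xtk abstract describes, can be sketched as follows. This is not Xtk's API: the `?x` variable syntax, the function names, and the small rule set are hypothetical and chosen only to show pattern matching, substitution, and bottom-up rewriting to a fixed point.

```python
def match(pattern, expr, bindings):
    """Match `expr` against `pattern`; strings starting with '?' are
    pattern variables. Returns updated bindings, or None on failure."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == expr else None
        return {**bindings, pattern: expr}
    if isinstance(pattern, list):
        if not isinstance(expr, list) or len(pattern) != len(expr):
            return None
        for p, e in zip(pattern, expr):
            bindings = match(p, e, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == expr else None


def substitute(template, bindings):
    """Replace pattern variables in `template` with their bound values."""
    if isinstance(template, str) and template.startswith("?"):
        return bindings[template]
    if isinstance(template, list):
        return [substitute(t, bindings) for t in template]
    return template


def rewrite(expr, rules):
    """Rewrite subexpressions first, then apply the first matching rule,
    repeating until no rule applies."""
    if isinstance(expr, list):
        expr = [rewrite(e, rules) for e in expr]
    for pattern, template in rules:
        bindings = match(pattern, expr, {})
        if bindings is not None:
            return rewrite(substitute(template, bindings), rules)
    return expr


# Illustrative simplification rules written as (pattern, replacement) pairs.
RULES = [
    (["+", "?x", 0], "?x"),
    (["+", 0, "?x"], "?x"),
    (["*", "?x", 1], "?x"),
    (["*", 1, "?x"], "?x"),
    (["*", "?x", 0], 0),
    (["*", 0, "?x"], 0),
]
```

With these rules, `rewrite(["+", ["*", "x", 1], ["*", "y", 0]], RULES)` simplifies the inner products first and then the sum. Every rule here shrinks the expression, so rewriting terminates; an arbitrary rule set carries no such guarantee.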
Preventing Ransomware Damages Using in-Operation Off-Site Backup to Achieve a 10⁻⁸ False-Negative Miss-Detection Rate
Alex Towell, Hiroshi Fujinoki
2025-04-08 Conference 8 pages 804.0 KB 2025 7th International Conference on Computer Communication and the Internet (ICCCI)

This paper presents a novel approach to preventing ransomware damages through in-operation off-site backup systems designed to achieve an exceptionally low false-negative miss-detection rate of 10⁻⁸.

IEEE ransomware backup systems cybersecurity false-negative detection
Fisher Flow: An Information-Geometric Framework for Sequential Estimation
Alex Towell
2024-06-20 Research Paper 51 pages 451.5 KB

We present Fisher Flow (FF), a framework for sequential statistical inference that propagates Fisher information rather than probability distributions. Fisher Flow provides a computationally efficient alternative to Bayesian updating while maintaining rigorous uncertainty quantification. The key ins...

statistics information geometry sequential estimation fisher information hessian maximum likelihood
The Stepanov Library: Advancing Generic Programming in C++ Through Mathematical Abstractions and Zero-Cost Composition
Alex Towell
2024-05-07 White Paper 46 pages 604.6 KB

We present the Stepanov library, a header-only C++20/23 library that demonstrates the power of generic programming through mathematical abstractions and efficient implementations. Building on Alex Stepanov’s foundational principles, this library advances the state of generic programming by introduci...

The Beautiful Deception: How 256 Bits Pretend to be Infinity
Alex Towell
2024-01-10 Technical Paper 28 pages 318.0 KB

How do you store infinity in 256 bits? This paper explores the fundamental deception at the heart of computational cryptography: using finite information to simulate infinite randomness. We prove why true random oracles are impossible, then show how lazy evaluation creates a beautiful lie—a finite a...

cryptography philosophy
Model Selection for Reliability Estimation in Series Systems
Alex Towell
2023-10-07 Research Paper 27 pages 685.8 KB

This paper explores model selection for reliability estimation of components in multi-component series systems using Weibull distributions. We assess the sensitivity of a likelihood model based on right-censored and masked failure data to deviations from well-designed series systems where components...

statistics reliability series systems model selection weibull distribution maximum likelihood
Reliability Estimation in Series Systems: Maximum Likelihood Techniques for Right-Censored and Masked Failure Data
Alex Towell
2023-08-09 Master's Thesis 40 pages 978.9 KB

This paper investigates maximum likelihood techniques to estimate component reliability from masked failure data in series systems. A likelihood model accounts for right-censoring and candidate sets indicative of masked failure causes. Extensive simulation studies assess the accuracy and precision o...

R statistics reliability series systems masked data maximum likelihood
Time Series Analysis of Confidentiality Degradation in Encrypted Search Systems
Alex Towell
2021-04-01 Research Paper 20 pages 2.5 MB

We present a time series analysis of confidentiality degradation in encrypted search systems subject to known-plaintext attacks. By modeling the adversary's accuracy as a time-dependent confidentiality measure, we develop forecasting models based on ARIMA and dynamic regression techniques. Our analy...

encrypted search oblivious computing known-plaintext attacks time series analysis confidentiality degradation security analysis cryptography privacy leakage statistical forecasting
hypothesize
Alex Towell
2020-10-01 R Package N/A

hypothesize is an R package that provides a simple, consistent API for hypothesis testing and is designed to be easy to use. It is built on top of the likelihood.model package, which provides a consistent interface for likelihood models.

R statistics hypothesis testing

Likelihood Model

Alex Towell
2020-10-01 R Package N/A

Facilitates building likelihood models. It defines a number of generic methods for working with likelihoods, and a few functions that are strictly functions of these generic methods. It is designed to interoperate well with the algebraic.mle R package. It also comes with a likelihood contributions model, a likelihood model that is composed of multiple likelihood contributions assuming i.i.d. data.

R statistics maximum likelihood
Estimating How Confidential Encrypted Searches Are Using Moving Average Bootstrap Method
Alexander A. Towell, Hiroshi Fujinoki
2016-11-02 8 pages 524.7 KB

This paper applies a resilience engineering approach to studying how effective encrypted searches will be. One concern with encrypted searches is frequency attacks, in which adversaries guess the meaning of encrypted words by observing a large number of them in search queries and mapping them to guessed plaintext words using a known histogram. It is therefore important for defenders to know how many encrypted words adversaries need to observe before they guess the encrypted words correctly. However, determining this takes defenders a long time because of the large volume of encrypted words involved. We developed and evaluated the Moving Average Bootstrap (MAB) method for estimating the number of encrypted words (N*) an adversary needs to observe before correctly guessing a certain percentage of the observed words with a certain confidence. Our experiments indicate that the MAB method lets defenders estimate N* in only 5% of the time required without MAB. Because of this significant reduction in the time required to estimate N*, MAB will contribute to the safety of encrypted searches.

encrypted search oblivious computing frequency attacks known-plaintext attacks privacy-preserving search confidentiality estimation bootstrap method statistical analysis information retrieval cryptography security analysis access pattern leakage
Encrypted Search - Enabling Standard Information Retrieval Techniques for Several New Secure Index Types While Preserving Confidentiality Against an Adversary With Access to Query Histories and Secure Index Contents
Alex Towell
2015-05-01 Master's Thesis 107 pages 7.2 MB ProQuest

This thesis studies Encrypted Search, a technique for securely searching documents on untrusted systems. The author designs and compares several secure index types to determine their performance in terms of time complexity, space complexity, and retrieval accuracy. The study also examines the risk to confidentiality posed by storing frequency and proximity information and suggests ways to mitigate the risk. Additionally, the author investigates the impact of false positives and secure index poisoning on performance and confidentiality. Finally, the thesis explores techniques to protect against adversaries with access to encrypted query histories, such as query obfuscation.

encrypted search secure index information retrieval confidentiality oblivious computing privacy-preserving search query obfuscation false positives secure index poisoning adversarial access query histories frequency attacks access pattern leakage cryptography thesis