Reverse-Process Synthetic Data Generation for Math Reasoning
Training LLMs on mathematical reasoning by inverting easy-to-solve problems: generate derivatives, reverse them into integration exercises with full step-by-step solutions.
What if reasoning traces could learn their own usefulness? A simple RL framing for trace memory, and why one reward signal is enough.
The classical AI curriculum teaches rational agents as utility maximizers. The progression from search to RL to LLMs is really about one thing: finding representations that make decision-making tractable.
Why the simplest forms of learning are incomputable, and what that means for the intelligence we can build.
Modern graduate ML text with causal inference, decision making, and ML foundations. Accessible free textbook with strong conceptual framing.
SIGMA uses Q-learning rather than direct policy learning. This architectural choice makes it both transparent and terrifying. You can read its value function, but what you read is chilling.
A logic programming system that alternates between wake and sleep phases, using LLMs for knowledge generation during wake and compression-based learning during sleep.
Learning fuzzy membership functions and inference rules automatically through gradient descent on soft circuits, instead of hand-crafting them.
Three approaches to computing derivatives: forward-mode AD, reverse-mode AD, and finite differences, each with different trade-offs for numerical computing and machine learning.
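One of those trade-offs can be seen in a few lines: finite differences are easy to write but fight a tug-of-war between truncation error (shrinks with the step size h) and floating-point round-off (grows as h shrinks). A minimal numerical sketch, not code from the post:

```python
import math

def central_diff(f, x, h):
    """Central finite difference: O(h^2) truncation error,
    but round-off error grows roughly as machine_eps / h."""
    return (f(x + h) - f(x - h)) / (2 * h)

exact = math.cos(1.0)  # d/dx sin(x) at x = 1
for h in (1e-1, 1e-5, 1e-12):
    approx = central_diff(math.sin, 1.0, h)
    print(f"h={h:.0e}  error={abs(approx - exact):.2e}")
```

The error bottoms out around h ≈ 1e-5 and then rises again as h shrinks further, which is exactly the failure mode automatic differentiation avoids: AD computes derivatives to machine precision with no step size to tune.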
Science is search through hypothesis space. Intelligence prunes; testing provides signal. Synthetic worlds could accelerate the loop.
Applying Monte Carlo Tree Search to large language model reasoning, with a formal specification of the algorithm.
Using GMM clustering to improve retrieval in topically diverse knowledge bases
What if LLMs could remember their own successful reasoning? A simple experiment in trace retrieval, and why 'latent' is the right word.
Solomonoff induction, MDL, speed priors, and neural networks are all special cases of one Bayesian framework with four knobs.
Gradient descent in Euclidean space ignores the geometry of probability distributions. Natural gradient descent uses the Fisher information metric instead. Fisher Flow makes this continuous.
A tiny autodiff library for understanding how backpropagation actually works.
The AI course this semester keeps hammering one idea: intelligence is utility maximization under uncertainty. A* search, reinforcement learning, Bayesian networks, MDPs. One principle connects all of it.
Abstractions let us reason about complex systems despite our cognitive limits. But some systems resist compression entirely.
How the limited capacity of human working memory acts as regularization, shaping our reasoning and possibly preventing cognitive overfitting.
Reverse-mode automatic differentiation is just the chain rule applied systematically. I built one in C++20 to understand what PyTorch and JAX are actually doing.
I finally tried ChatGPT after weeks of ignoring it. My reaction was not surprise. It was recognition. The Solomonoff connection, language models as compression, prediction as intelligence. The pieces were all there.
Dual numbers extend the reals with an infinitesimal epsilon where epsilon^2 = 0. Evaluate f(x + epsilon) and you get f(x) + f'(x)*epsilon. The derivative falls out of the algebra.
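The dual-number trick can be sketched in a few lines of Python (a hypothetical minimal implementation for illustration, not the post's actual code): carry the eps-coefficient alongside the real part, and the product rule emerges from eps^2 = 0.

```python
class Dual:
    """A dual number a + b*eps, with eps^2 = 0."""
    def __init__(self, real, eps=0.0):
        self.real, self.eps = real, eps

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.real + other.real, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps^2 = 0
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x   # f'(x) = 6x + 2

y = f(Dual(4.0, 1.0))          # seed the eps-coefficient with 1 at x = 4
print(y.real, y.eps)           # 56.0 26.0  -> f(4) and f'(4)
```

Seeding the eps-coefficient with 1.0 makes the algebra track d/dx automatically: no symbolic manipulation, no step size, just overloaded arithmetic. This is forward-mode AD in its simplest form.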
The problem of predicting what comes next, from compression to language models