FemtoGrad is a minimalist automatic differentiation library I built for learning. The goal was to strip autodiff down to its core and see what’s left.
What is Automatic Differentiation?
Automatic differentiation (autodiff) computes derivatives of functions specified by computer programs. It’s distinct from numerical differentiation (approximate, and numerically unstable as the step size shrinks) and symbolic differentiation (derivative expressions can grow exponentially, a problem known as expression swell). Autodiff gives you exact derivatives, up to floating-point precision, at a computational cost proportional to evaluating the function itself.
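To make the contrast with numerical differentiation concrete, here is a small sketch (plain Python, not FemtoGrad code; the function and step sizes are chosen for illustration). Shrinking the step first improves the central-difference estimate, then makes it worse as floating-point round-off dominates:

```python
import math

# Central-difference approximation of f'(x). Illustrative sketch only.
def central_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)

exact = math.cos(1.0)  # true derivative of sin at x = 1
for h in (1e-1, 1e-5, 1e-13):
    approx = central_diff(math.sin, 1.0, h)
    # Truncation error shrinks with h, but round-off error grows as ~eps/h,
    # so a very small h is *less* accurate: the instability mentioned above.
    print(f"h={h:g}  error={abs(approx - exact):.2e}")
```

Autodiff avoids this trade-off entirely because it never approximates: it composes exact local derivatives.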
Reverse Mode AD
FemtoGrad implements reverse-mode AD, the technique known in machine learning as backpropagation. It works in two phases:
- Forward pass: compute the function value, recording operations as you go.
- Backward pass: accumulate gradients by applying the chain rule in reverse.
- The cost per output is a constant multiple of the forward pass, regardless of input dimensionality: one backward pass yields the gradient with respect to every input at once.
This is why backprop scales to millions of parameters: the cost of computing the full gradient is a small constant factor times the cost of computing the function.
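A hand-unrolled example makes the two phases concrete. This is plain Python with illustrative variable names, not the FemtoGrad API: each backward line multiplies the incoming gradient by the op’s local derivative, in reverse order of the forward pass.

```python
x, w, b = 3.0, 2.0, 1.0

# Forward pass: compute and record intermediate values.
u = x * w   # u = 6.0
y = u + b   # y = 7.0

# Backward pass: apply the chain rule in reverse order of the forward pass.
dy_dy = 1.0                # seed: dy/dy = 1
dy_du = dy_dy * 1.0        # local rule for u + b: d/du = 1
dy_db = dy_dy * 1.0        # local rule for u + b: d/db = 1
dy_dx = dy_du * w          # local rule for x * w: d/dx = w
dy_dw = dy_du * x          # local rule for x * w: d/dw = x

print(dy_dx, dy_dw, dy_db)  # 2.0 3.0 1.0
```

Notice that the backward pass touches each recorded operation exactly once, which is where the constant-factor cost comes from.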
Core Abstractions
class Tensor:
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0
        self._backward = lambda: None  # set by the op that created this tensor
        self._prev = set(_children)
        self._op = _op
Each tensor tracks its value, its gradient, how it was computed (the parent nodes and the operation), and how to backpropagate through that operation. That’s it. The whole thing is a DAG with local gradient rules at each node.
What It Demonstrates
- Computational graphs: how operations form a DAG.
- Gradient flow: the chain rule in action.
- Dynamic construction: graphs are built during the forward pass, not declared ahead of time.
- Simplicity: core autodiff in about 100 lines.
Supported Operations
Arithmetic (add, multiply, divide, power), activation functions (ReLU, sigmoid, tanh), and reductions (sum, mean). This is enough to build and train neural networks.
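Each activation’s backward rule multiplies the incoming gradient by a closed-form local derivative. The identities below are standard calculus (sigmoid’ = s(1−s), tanh’ = 1 − t², relu’ = 1 for x > 0 else 0), and checking them against a finite difference is a useful sanity test when adding a new op. This is a plain-Python sketch, not FemtoGrad code.

```python
import math

# Closed-form local derivatives an op's _backward would multiply by.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2

def d_relu(x):
    return 1.0 if x > 0 else 0.0

# Sanity-check each rule against a central difference at a sample point.
h = 1e-6
for f, df, x in [(sigmoid, d_sigmoid, 0.5),
                 (math.tanh, d_tanh, 0.5),
                 (lambda v: max(v, 0.0), d_relu, 0.5)]:
    numeric = (f(x + h) - f(x - h)) / (2 * h)
    print(f"analytic {df(x):.6f} vs numeric {numeric:.6f}")
```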
Example
# Create tensors
a = Tensor(2.0)
b = Tensor(3.0)

# Build the computation: c = a*b + b^2
c = a * b + b**2

# Backward pass populates .grad on every node
c.backward()

print(a.grad)  # dc/da = b = 3.0
print(b.grad)  # dc/db = a + 2b = 8.0
Beyond FemtoGrad
Understanding FemtoGrad gives you insight into PyTorch’s autograd, TensorFlow’s GradientTape, and JAX’s grad function. They all implement the same core ideas with additional optimizations and features. But the basic mechanism is exactly this.
Discussion