FemtoGrad is a minimalist automatic differentiation library designed for education, making the core concepts behind modern deep learning frameworks easy to understand.
What is Automatic Differentiation?
Automatic differentiation (autodiff) computes derivatives of functions expressed as computer programs. It differs from the two classical alternatives:
- Numerical differentiation: approximates derivatives with finite differences, which introduces truncation and round-off error
- Symbolic differentiation: manipulates expressions exactly, but the resulting expressions can grow exponentially ("expression swell") and become inefficient to evaluate
Autodiff provides exact derivatives (up to floating-point precision) at a computational cost proportional to evaluating the function itself.
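To make the contrast concrete, here is a small, FemtoGrad-independent comparison of a central-difference approximation against the exact derivative of f(x) = x^3 (the function f and step size h are arbitrary illustration choices, not part of FemtoGrad):

def f(x):
    return x ** 3

def f_prime_exact(x):
    return 3 * x ** 2              # exact derivative of x^3

def f_prime_numeric(x, h=1e-5):
    # Central difference: accurate only up to truncation and round-off error.
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
print(f_prime_exact(x))    # 12.0
print(f_prime_numeric(x))  # close to 12.0, but not exactly 12.0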
Reverse Mode AD
FemtoGrad implements reverse-mode AD (better known as backpropagation), which proceeds in two phases:
- Forward pass: evaluate the function while recording each operation and its inputs
- Backward pass: traverse the recorded operations in reverse, accumulating gradients via the chain rule
- Efficiency: a single backward pass yields the gradient of a scalar output with respect to every input, at a cost proportional to one forward evaluation and independent of the number of inputs
A hand-worked trace of both passes follows.
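To see both passes in slow motion, here is the chain rule applied by hand (plain Python arithmetic, no autodiff machinery) to c = a*b + b^2, the same expression used in the usage example later on:

# Forward pass: evaluate and keep the intermediate values.
a, b = 2.0, 3.0
t1 = a * b      # 6.0
t2 = b ** 2     # 9.0
c = t1 + t2     # 15.0

# Backward pass: start from dc/dc = 1 and walk the operations in reverse.
dc_dt1 = 1.0                          # c = t1 + t2
dc_dt2 = 1.0
dc_da = dc_dt1 * b                    # t1 = a * b  -> 3.0
dc_db = dc_dt1 * a + dc_dt2 * 2 * b   # b feeds both t1 and t2 -> 2.0 + 6.0 = 8.0
print(dc_da, dc_db)                   # 3.0 8.0

Note how the contribution from each operation that uses b is accumulated; this is exactly what reverse-mode AD automates.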
Core Abstractions
class Tensor:
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0
        self._backward = lambda: None
        self._prev = set(_children)
        self._op = _op
Each tensor tracks:
- Its value (data)
- Its gradient (grad)
- How it was computed (_prev, _op)
- How to backpropagate (_backward)
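The sketch below is not FemtoGrad's actual source; it illustrates, in the common micrograd style, how an operator such as __mul__ could record its parents and attach a _backward closure, and how backward() could replay those closures in reverse topological order. It repeats the __init__ from above so the snippet runs on its own.

class Tensor:
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0
        self._backward = lambda: None
        self._prev = set(_children)
        self._op = _op

    def __mul__(self, other):
        # Forward: compute the value and remember the parents and the op.
        out = Tensor(self.data * other.data, _children=(self, other), _op='*')

        def _backward():
            # Chain rule for z = x*y: dz/dx = y, dz/dy = x.
            # += so gradients accumulate when a tensor feeds several ops.
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Sort the graph topologically, seed d(out)/d(out) = 1,
        # then call each node's _backward closure in reverse order.
        topo, visited = [], set()

        def build(node):
            if node not in visited:
                visited.add(node)
                for child in node._prev:
                    build(child)
                topo.append(node)

        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# Tiny usage check: z = x*y, so dz/dx = y = 3.0 and dz/dy = x = 2.0
x, y = Tensor(2.0), Tensor(3.0)
z = x * y
z.backward()
print(x.grad, y.grad)  # 3.0 2.0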
Educational Value
FemtoGrad demonstrates:
- Computational Graphs: How operations form a DAG
- Gradient Flow: The chain rule in action
- Dynamic Construction: Graphs built during forward pass
- Simplicity: Core autodiff in ~100 lines
Supported Operations
Basic operations:
- Arithmetic: add, multiply, divide, power
- Activation functions: ReLU, sigmoid, tanh
- Reductions: sum, mean
This is enough to build and train neural networks!
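As an example of how an activation function fits into this scheme, a ReLU could be wired the same way as the multiplication sketch earlier. This is an illustrative sketch that assumes the hypothetical Tensor class above, not necessarily FemtoGrad's exact code:

def relu(self):
    # Forward: max(0, x).
    out = Tensor(self.data if self.data > 0 else 0.0, _children=(self,), _op='relu')

    def _backward():
        # Gradient flows only where the input was positive.
        self.grad += (1.0 if self.data > 0 else 0.0) * out.grad

    out._backward = _backward
    return out

# Attach to the Tensor class sketched earlier (illustrative only)
Tensor.relu = relu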
Example Usage
# Create tensors
a = Tensor(2.0)
b = Tensor(3.0)

# Build the computation c = a*b + b^2 and backpropagate
c = a * b + b**2
c.backward()

# Gradients are now populated on the leaf tensors
print(a.grad)  # dc/da = b       -> 3.0
print(b.grad)  # dc/db = a + 2b  -> 8.0
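A parameter update then only needs the .data and .grad fields. The snippet below is a sketch that assumes the Tensor API shown above; the learning rate lr and the manual reset of grad are illustrative choices, not a documented FemtoGrad interface:

# One gradient-descent step on a and b from the example above.
lr = 0.1
for p in (a, b):
    p.data -= lr * p.grad   # move against the gradient to decrease c
    p.grad = 0              # reset the accumulated gradient before the next pass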
Beyond FemtoGrad
Understanding FemtoGrad provides insight into:
- PyTorch’s autograd
- TensorFlow’s GradientTape
- JAX’s grad function
All implement the same core ideas with additional optimizations and features.
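For instance, the usage example above translates almost line for line into PyTorch's autograd, shown here only as a point of comparison; the gradients match the FemtoGrad example:

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

c = a * b + b**2   # the graph is recorded dynamically, as in FemtoGrad
c.backward()       # reverse-mode pass

print(a.grad)  # tensor(3.)  dc/da = b
print(b.grad)  # tensor(8.)  dc/db = a + 2b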