
FemtoGrad: A Minimal Automatic Differentiation Library

FemtoGrad is a minimalist automatic differentiation library designed for education and understanding the core concepts behind modern deep learning frameworks.

What is Automatic Differentiation?

Automatic differentiation (autodiff) computes derivatives of functions specified by computer programs. It differs from the two classical approaches:

  • Numerical differentiation: Approximates derivatives with finite differences, so it is sensitive to step size and round-off error
  • Symbolic differentiation: Exact, but expressions can grow explosively (expression swell), making it inefficient for large programs

Autodiff provides exact derivatives with computational cost proportional to computing the function itself.
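
To see why the numerical approach is only approximate, here is a plain-Python comparison of a central finite difference against the analytic derivative. It does not use FemtoGrad at all; the function f and step size h are arbitrary choices for illustration.

# Exact vs. numerical differentiation for f(x) = x**3 at x = 2 (plain Python)
def f(x):
    return x ** 3

def central_difference(f, x, h=1e-5):
    # Finite-difference approximation; accuracy depends on the step size h
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
exact = 3 * x ** 2                 # analytic derivative: f'(x) = 3x^2 = 12
approx = central_difference(f, x)  # close to 12, but not exact
print(exact, approx)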

Reverse Mode AD

FemtoGrad implements reverse mode AD (backpropagation), which:

  1. Forward Pass: Compute function value, record operations
  2. Backward Pass: Accumulate gradients by chain rule
  3. Efficiency: One backward pass yields the gradient of a scalar output with respect to all inputs, at a cost proportional to the forward pass, regardless of input dimensionality
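
To make the two passes concrete, here is reverse mode worked out by hand for f(x, y) = x*y + y**2 in plain Python (no library). The intermediate names v1, v2 and the *_bar adjoint variables are just notation for this sketch; the same expression reappears with FemtoGrad in the Example Usage section below.

# Hand-worked reverse mode for f(x, y) = x*y + y**2 at (2, 3)
x, y = 2.0, 3.0

# Forward pass: evaluate, remembering the intermediates
v1 = x * y       # 6.0
v2 = y ** 2      # 9.0
f = v1 + v2      # 15.0

# Backward pass: propagate adjoints (df/d...) from the output back to the inputs
f_bar = 1.0                          # df/df
v1_bar = f_bar                       # df/dv1 = 1
v2_bar = f_bar                       # df/dv2 = 1
x_bar = v1_bar * y                   # df/dx = y      = 3.0
y_bar = v1_bar * x + v2_bar * 2 * y  # df/dy = x + 2y = 8.0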

Core Abstractions

class Tensor:
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0
        self._backward = lambda: None
        self._prev = set(_children)
        self._op = _op

Each tensor tracks:

  • Its value (data)
  • Its gradient (grad)
  • How it was computed (_prev, _op)
  • How to backpropagate (_backward)
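
FemtoGrad's actual operator code is not reproduced here, but a micrograd-style sketch (repeating the constructor above for completeness) shows how these pieces typically fit together: each operation builds a new Tensor, records its parents, and attaches a _backward closure; backward() then topologically sorts the graph and applies the chain rule in reverse. Treat the method bodies below as illustrative, not as FemtoGrad's source.

class Tensor:
    def __init__(self, data, _children=(), _op=''):
        self.data = data
        self.grad = 0
        self._backward = lambda: None
        self._prev = set(_children)
        self._op = _op

    def __add__(self, other):
        other = other if isinstance(other, Tensor) else Tensor(other)
        out = Tensor(self.data + other.data, (self, other), '+')
        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Tensor) else Tensor(other)
        out = Tensor(self.data * other.data, (self, other), '*')
        def _backward():
            # d(out)/d(self) = other and d(out)/d(other) = self
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph reachable from this tensor,
        # then run each node's _backward in reverse order
        topo, visited = [], set()
        def build(t):
            if t not in visited:
                visited.add(t)
                for child in t._prev:
                    build(child)
                topo.append(t)
        build(self)
        self.grad = 1
        for t in reversed(topo):
            t._backward()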

Educational Value

FemtoGrad demonstrates:

  • Computational Graphs: How operations form a DAG
  • Gradient Flow: The chain rule in action
  • Dynamic Construction: Graphs built during forward pass
  • Simplicity: Core autodiff in ~100 lines

Supported Operations

Basic operations:

  • Arithmetic: add, multiply, divide, power
  • Activation functions: ReLU, sigmoid, tanh
  • Reductions: sum, mean

This is enough to build and train neural networks!
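
Each activation follows the same pattern as the arithmetic operations: compute the forward value and attach a closure encoding the local derivative. Building on the Tensor sketch above (and again not FemtoGrad's actual source), ReLU and tanh might look like this:

import math

def relu(self):
    out = Tensor(max(0.0, self.data), (self,), 'relu')
    def _backward():
        # ReLU passes gradient through only where the input was positive
        self.grad += (1.0 if self.data > 0 else 0.0) * out.grad
    out._backward = _backward
    return out

def tanh(self):
    t = math.tanh(self.data)
    out = Tensor(t, (self,), 'tanh')
    def _backward():
        # d/dx tanh(x) = 1 - tanh(x)**2
        self.grad += (1.0 - t * t) * out.grad
    out._backward = _backward
    return out

# Attach as methods on the sketched Tensor class
Tensor.relu = relu
Tensor.tanh = tanh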

Example Usage

# Create tensors
a = Tensor(2.0)
b = Tensor(3.0)

# Build computation
c = a * b + b**2
c.backward()

# Gradients computed
print(a.grad)  # dc/da = b = 3.0
print(b.grad)  # dc/db = a + 2*b = 8.0

Beyond FemtoGrad

Understanding FemtoGrad provides insight into:

  • PyTorch’s autograd
  • TensorFlow’s GradientTape
  • JAX’s grad function

All implement the same core ideas with additional optimizations and features.
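
For comparison, the example from above written against PyTorch's autograd is nearly identical; the main visible difference is that gradient tracking is opt-in via requires_grad.

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

c = a * b + b ** 2
c.backward()

print(a.grad)  # tensor(3.)
print(b.grad)  # tensor(8.)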
