
Fisher Flow: Optimization on the Statistical Manifold

Standard gradient descent treats parameter space as flat. It uses Euclidean distance, which means the same step size in parameter space can produce wildly different changes in the distribution depending on where you are. Fisher Flow fixes this by optimizing along the natural geometry of probability distributions.

The Geometry of Distributions

Probability distributions form a Riemannian manifold. The Fisher information matrix provides the natural metric on this manifold:

I(theta) = E_{x ~ p(x|theta)}[grad_theta log p(x|theta) grad_theta log p(x|theta)^T]

This captures how sensitive the distribution is to parameter changes. Under standard regularity conditions it equals the expected negative Hessian of the log-likelihood, i.e. the expected curvature of the likelihood surface. As a metric, it converts distances in parameter space into distances between distributions: locally, KL[p(x|theta) || p(x|theta + d)] is approximately (1/2) d^T I(theta) d.
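As a concrete check on the definition, the Fisher information can be estimated by Monte Carlo as the mean squared score. For a Bernoulli(p) distribution the score is d/dp log p(x|p) = x/p - (1-x)/(1-p) and the analytic Fisher information is 1/(p(1-p)); a sampled average of the squared score recovers it. A minimal sketch (the Bernoulli choice and function name are illustrative, not from the original):

```python
import numpy as np

def fisher_bernoulli_mc(p, n_samples=200_000, seed=0):
    """Monte Carlo estimate of I(p) = E[(d/dp log p(x|p))^2] for Bernoulli(p)."""
    rng = np.random.default_rng(seed)
    x = rng.random(n_samples) < p                     # Bernoulli(p) samples
    # Score: 1/p when x = 1, -1/(1-p) when x = 0.
    score = np.where(x, 1.0 / p, -1.0 / (1.0 - p))
    return np.mean(score ** 2)

p = 0.3
estimate = fisher_bernoulli_mc(p)
analytic = 1.0 / (p * (1.0 - p))
```

With enough samples the estimate matches the closed form, which is a useful sanity test for any Fisher estimator you plug into a natural-gradient method.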

Natural Gradient vs. Standard Gradient

Standard gradient descent updates parameters in Euclidean space:

theta_{t+1} = theta_t - alpha grad L(theta_t)

This is inefficient because the Euclidean metric ignores how the parameters jointly shape the distribution. A step of size epsilon in one direction might barely change the distribution, while the same step in another direction changes it drastically.

Natural gradient descent uses the Fisher metric:

theta_{t+1} = theta_t - alpha I(theta_t)^{-1} grad L(theta_t)

Pre-multiplying by the inverse Fisher information rescales the gradient to account for the local geometry. The key property: the natural gradient is invariant to smooth reparametrization (exactly for the continuous-time flow, and to first order in the step size for discrete updates). It does not matter how you choose to represent your distributions.
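The rescaling is easiest to see where the Fisher information has a closed form. For a Gaussian with known scale, the Fisher information for the mean is 1/sigma^2, so the natural gradient multiplies the Euclidean gradient by sigma^2; with step size 1, a single natural-gradient step lands exactly on the maximum-likelihood mean, whatever sigma is. A sketch under those assumptions (the function name is illustrative):

```python
import numpy as np

def natural_gradient_step_mu(mu, sigma, x, lr=1.0):
    """One natural-gradient step on the Gaussian NLL, updating only the mean."""
    # Euclidean gradient of the average negative log-likelihood w.r.t. mu.
    grad_mu = (mu - x.mean()) / sigma**2
    # Fisher information for mu is 1/sigma^2, so I^{-1} grad = sigma^2 * grad,
    # which simplifies to mu - mean(x): the geometry cancels the 1/sigma^2.
    natural_grad = sigma**2 * grad_mu
    return mu - lr * natural_grad

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=1000)
mu_new = natural_gradient_step_mu(mu=-3.0, sigma=0.5, x=x)
# With lr=1 the step jumps straight to the sample mean.
```

A plain gradient step, by contrast, would move by (x.mean() - mu) * lr / sigma^2, so its effective step size depends on the arbitrary scale sigma.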

Fisher Flow

Fisher Flow makes this continuous:

dtheta/dt = -I(theta)^{-1} grad L(theta)

This defines a flow on the parameter manifold. Loss decreases monotonically along the flow. The trajectories follow the natural geometry. Step size adapts automatically because the curvature scales the updates.
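The flow can be integrated numerically with Euler steps. For a 1-D Gaussian the natural gradients reduce to simple expressions (Fisher is diag(1/sigma^2, 2/sigma^2)), and the trajectory settles at the maximum-likelihood parameters. A sketch, assuming this toy model (names illustrative):

```python
import numpy as np

def fisher_flow_gaussian(x, mu0=0.0, sigma0=3.0, dt=0.05, steps=400):
    """Euler-integrate dtheta/dt = -I(theta)^{-1} grad NLL for a 1-D Gaussian."""
    mu, sigma = mu0, sigma0
    for _ in range(steps):
        m2 = np.mean((x - mu) ** 2)
        # Natural gradients: I(theta) = diag(1/sigma^2, 2/sigma^2), so
        # mu_dot = -sigma^2 * dNLL/dmu and sigma_dot = -(sigma^2/2) * dNLL/dsigma.
        mu_dot = -(mu - x.mean())
        sigma_dot = -(sigma**2 - m2) / (2 * sigma)
        mu += dt * mu_dot
        sigma += dt * sigma_dot
    return mu, sigma

rng = np.random.default_rng(1)
x = rng.normal(loc=1.5, scale=2.0, size=5000)
mu, sigma = fisher_flow_gaussian(x)
# The flow settles at the MLE: mu -> mean(x), sigma^2 -> mean((x - mean(x))^2).
```

Note how the mean evolves as mu_dot = -(mu - mean(x)) regardless of sigma: the metric has absorbed the curvature, which is the "step size adapts automatically" point above.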

The Practical Problem

Computing I(theta)^{-1} is expensive. For a neural network with n parameters, the full Fisher information matrix is n x n, and inverting it is O(n^3). That is not practical for modern networks.

Approximations help. Diagonal Fisher approximation is cheap but crude. Block-diagonal Fisher captures within-layer correlations. K-FAC (Kronecker-Factored Approximate Curvature) approximates each layer's Fisher block as a Kronecker product of two much smaller matrices, so only those small factors need to be inverted: the cost is cubic in the layer width rather than in the total parameter count. That makes it practical for real networks.

def fisher_flow_step(params, loss_fn, data, learning_rate=0.1, epsilon=1e-8):
    # Gradient of the loss at the current parameters
    # (compute_gradient is a placeholder for your autodiff backend).
    grad = compute_gradient(loss_fn, params, data)

    # Diagonal Fisher approximation, e.g. the mean squared per-example
    # score (estimate_fisher_diagonal is likewise a placeholder).
    fisher_diag = estimate_fisher_diagonal(params, data)

    # Elementwise division approximates multiplication by I(theta)^{-1};
    # epsilon guards against division by near-zero curvature.
    natural_grad = grad / (fisher_diag + epsilon)

    # Natural gradient update
    return params - learning_rate * natural_grad

Variational Inference

Fisher Flow gives a natural framework for variational inference. You have a variational distribution q(z|phi) and you want to minimize KL[q(z|phi) || p(z|x)]. Following the Fisher geometry of q means the optimization respects the structure of the distribution family you are searching over.
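A toy version makes the idea concrete: take q = N(m, s^2) and, so the KL has a closed form, assume the target posterior is itself a standard normal. Then KL[q || p] = 0.5*(m^2 + s^2 - 1) - log s, the Fisher information of q with respect to (m, s) is diag(1/s^2, 2/s^2), and natural-gradient descent drives q onto the posterior. A sketch under those assumptions (names illustrative):

```python
def natural_vi(m=2.0, s=0.2, lr=0.1, steps=500):
    """Natural-gradient descent on KL[q || p] with q = N(m, s^2), p = N(0, 1)."""
    for _ in range(steps):
        # Euclidean gradients of the closed-form KL.
        grad_m = m
        grad_s = s - 1.0 / s
        # Rescale by the inverse Fisher diag(1/s^2, 2/s^2) of q.
        m -= lr * (s**2) * grad_m
        s -= lr * (s**2 / 2.0) * grad_s
    return m, s

m, s = natural_vi()
# Converges to the posterior: m -> 0, s -> 1.
```

Notice that when s is small the metric automatically shrinks the steps: a narrow q is very sensitive to parameter changes, and the Fisher geometry accounts for that without any manual learning-rate schedule.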

What the Theory Says

Under standard smoothness and regularity assumptions, Fisher Flow converges to stationary points of the loss. The convergence rate depends on the conditioning of the Fisher information matrix along the trajectory. The flow is stable under small perturbations, and the natural gradient acts as a preconditioner that damps steps in directions where the distribution is highly sensitive, a form of implicit regularization.

The important theoretical property is geometric: the optimization respects the underlying probability manifold structure and is invariant to reparametrization. If you switch from mean/variance to natural parameters, the flow does not change. That is a strong property.
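The invariance can be checked numerically in one dimension. For a Gaussian scale parameter, the Fisher information in sigma is 2/sigma^2; under theta = log sigma it transforms by the squared Jacobian to a constant 2, and the natural-gradient step taken in theta, mapped back to sigma through the chain rule, induces the same change as the step taken directly in sigma (exactly, at first order in the step size). A sketch of that bookkeeping (names and values illustrative):

```python
def natural_steps_match(sigma=0.7, dL_dsigma=1.3, lr=0.01):
    """First-order check that natural-gradient steps in sigma and in
    theta = log sigma induce the same change in sigma."""
    # Parametrization 1: sigma directly. Fisher is 2/sigma^2.
    delta_sigma_direct = -lr * (sigma**2 / 2.0) * dL_dsigma
    # Parametrization 2: theta = log sigma. Chain rule: dL/dtheta = sigma * dL/dsigma.
    # The Fisher transforms with the Jacobian squared: I_theta = sigma^2 * (2/sigma^2) = 2.
    dL_dtheta = sigma * dL_dsigma
    delta_theta = -lr * (1.0 / 2.0) * dL_dtheta
    # Map the theta step back to sigma to first order: delta_sigma ~ sigma * delta_theta.
    delta_sigma_via_theta = sigma * delta_theta
    return delta_sigma_direct, delta_sigma_via_theta

a, b = natural_steps_match()
# Both routes give the same induced change in sigma.
```

A plain gradient step fails this check: its theta-space update would be -lr * sigma * dL/dsigma, inducing a change in sigma that picks up an extra factor of sigma^2 relative to the direct update.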

Open Questions

Higher-order geometry (third-order information) could improve convergence near saddle points. Dynamic FIM approximation quality, adjusting the fidelity of your Fisher approximation as training progresses, is worth exploring. And extending Fisher flow to infinite-dimensional (non-parametric) settings is an open theoretical problem.
