Teaching Linear Algebra with C++20 Concepts

The world has Eigen, Armadillo, Blaze. Why build another linear algebra library?

Because none of them are trying to teach. elementa exists to teach three things at once: linear algebra, modern C++, and numerical computing. Every design choice prioritizes clarity over cleverness. The code reads like a textbook that happens to compile.

The Matrix Concept

C++20 concepts let you express “what a matrix is” as a compile-time contract:

template <typename M>
concept Matrix = requires(M m, const M cm, std::size_t i, std::size_t j) {
    typename M::scalar_type;

    { cm.rows() } -> std::same_as<std::size_t>;
    { cm.cols() } -> std::same_as<std::size_t>;

    { m(i, j) } -> std::same_as<typename M::scalar_type&>;
    { cm(i, j) } -> std::same_as<const typename M::scalar_type&>;

    { cm + cm } -> std::same_as<M>;
    { cm - cm } -> std::same_as<M>;
    { -cm } -> std::same_as<M>;
};

This says: a type M is a Matrix if it has a scalar_type, dimension queries, element access (mutable and const), and basic arithmetic. Notice what’s absent: scalar multiplication. That omission is deliberate. Including it creates circular constraint issues with the operator* overload for matrix multiplication. Instead, there’s a scale() function for generic code.

The point of the concept is that any type satisfying these constraints works with all the algorithms. No inheritance. No virtual functions. You can write:

template <Matrix M>
auto det(const M& A) -> typename M::scalar_type;

and it works for matrix<double>, matrix<float>, or any future type that satisfies Matrix.

API Design

A pedagogical library needs a clean interface:

// Default: empty 0x0
matrix<double> empty;

// Filled with value
matrix<double> zeros(3, 4, 0.0);  // 3x4 of zeros

// Flat initializer list (row-major)
matrix<double> flat(2, 3, {1, 2, 3, 4, 5, 6});

// Nested initializer list (most natural)
matrix<double> natural{{1, 2, 3},
                       {4, 5, 6}};

Value semantics throughout. Operators like + and - return new matrices, marked [[nodiscard]] so you can’t accidentally discard a result.

LU Decomposition

LU decomposition is the workhorse. It factors a square matrix A into a unit lower triangular L and an upper triangular U such that PA = LU, where P is a permutation matrix capturing row swaps. This single factorization gives you determinants, inverses, and linear system solving.

The implementation uses partial pivoting: at each step, find the largest absolute value in the current column and use it as the pivot. This prevents division by small numbers that amplify rounding errors.

template <Arithmetic T>
struct lu_result {
    matrix<T> L;                     // Lower triangular (unit diagonal)
    matrix<T> U;                     // Upper triangular
    std::vector<std::size_t> perm;   // Permutation vector
    int sign;                        // Sign of permutation (+1 or -1)
    bool singular;                   // True if matrix is singular
};
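The factorization step can be sketched as follows. This is a simplified stand-in that works on flat row-major std::vector storage and packs L and U into a single array (U on and above the diagonal, L's multipliers below, unit diagonal implied); elementa's actual lu() operates on its matrix type and returns the lu_result above:

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Packed factorization: illustrative layout, not elementa's lu_result.
struct lu_flat {
    std::vector<double> LU;          // U on/above the diagonal, L's multipliers below
    std::vector<std::size_t> perm;   // perm[i]: original row placed in position i
    int sign = 1;                    // Sign of the permutation
    bool singular = false;
};

lu_flat lu(std::vector<double> A, std::size_t n) {
    lu_flat r;
    r.perm.resize(n);
    std::iota(r.perm.begin(), r.perm.end(), std::size_t{0});

    for (std::size_t k = 0; k < n; ++k) {
        // Partial pivoting: pick the largest |A(i, k)| for i >= k.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::abs(A[i * n + k]) > std::abs(A[p * n + k])) p = i;

        if (A[p * n + k] == 0.0) { r.singular = true; break; }

        if (p != k) {  // Swap rows k and p; each swap flips the sign.
            for (std::size_t j = 0; j < n; ++j)
                std::swap(A[k * n + j], A[p * n + j]);
            std::swap(r.perm[k], r.perm[p]);
            r.sign = -r.sign;
        }

        // Eliminate below the pivot, storing the multipliers in place.
        for (std::size_t i = k + 1; i < n; ++i) {
            const double m = A[i * n + k] / A[k * n + k];
            A[i * n + k] = m;
            for (std::size_t j = k + 1; j < n; ++j)
                A[i * n + j] -= m * A[k * n + j];
        }
    }
    r.LU = std::move(A);
    return r;
}
```

For A = [[0, 1], [2, 3]] the zero in position (0, 0) forces a row swap, so sign comes back as -1 and the determinant works out to sign times the product of the packed diagonal: -1 * 2 * 1 = -2.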

Everything Follows from LU

Once you have the factorization, the rest falls out.

Determinant: det(PA) = det(L)det(U), and L has unit diagonal, so the determinant is just the product of U’s diagonal entries times the permutation sign:

template <Matrix M>
auto det(const M& A) -> typename M::scalar_type {
    using T = typename M::scalar_type;
    auto [L, U, perm, sign, singular] = lu(A);
    if (singular) return T{0};

    T result = static_cast<T>(sign);  // +/- 1
    for (std::size_t i = 0; i < U.rows(); ++i) {
        result *= U(i, i);  // Product of diagonal
    }
    return result;
}

Solve Ax = b: Apply the permutation to b, forward-substitute through L, back-substitute through U.

Inverse: Solve A * X = I column by column. Computing the inverse by solving n linear systems is more numerically stable than explicit formulas.
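Both steps can be sketched against a packed factorization in a flat row-major array (U on and above the diagonal, L's unit-diagonal multipliers below, with perm[i] giving the original row moved into position i). That layout and the function names are illustrative, not elementa's API:

```cpp
#include <cstddef>
#include <vector>

// Solve Ax = b given the packed PA = LU factorization.
std::vector<double> lu_solve(const std::vector<double>& LU,
                             const std::vector<std::size_t>& perm,
                             const std::vector<double>& b) {
    const std::size_t n = perm.size();
    std::vector<double> x(n);

    // 1. Apply the permutation: x = Pb.
    for (std::size_t i = 0; i < n; ++i) x[i] = b[perm[i]];

    // 2. Forward substitution: solve Ly = Pb (L has unit diagonal).
    for (std::size_t i = 1; i < n; ++i)
        for (std::size_t j = 0; j < i; ++j)
            x[i] -= LU[i * n + j] * x[j];

    // 3. Back substitution: solve Ux = y.
    for (std::size_t i = n; i-- > 0; ) {
        for (std::size_t j = i + 1; j < n; ++j)
            x[i] -= LU[i * n + j] * x[j];
        x[i] /= LU[i * n + i];
    }
    return x;
}

// Inverse: solve A * X = I one column of the identity at a time.
std::vector<double> lu_inverse(const std::vector<double>& LU,
                               const std::vector<std::size_t>& perm) {
    const std::size_t n = perm.size();
    std::vector<double> inv(n * n);
    for (std::size_t c = 0; c < n; ++c) {
        std::vector<double> e(n, 0.0);
        e[c] = 1.0;                        // c-th column of I
        auto col = lu_solve(LU, perm, e);  // c-th column of A^-1
        for (std::size_t i = 0; i < n; ++i) inv[i * n + c] = col[i];
    }
    return inv;
}
```

Note that the O(n^3) factorization is done once; each of the n solves is only O(n^2), so the inverse costs the same order as the factorization itself.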

Numerical Realities

For large matrices, computing the determinant directly risks overflow or underflow. The fix is logdet, which works in log space:

template <Matrix M>
auto logdet(const M& A) -> std::pair<int, typename M::scalar_type> {
    using T = typename M::scalar_type;
    auto [L, U, perm, sign, singular] = lu(A);
    if (singular) return {0, -std::numeric_limits<T>::infinity()};

    T log_abs_det{0};
    int result_sign = sign;

    for (std::size_t i = 0; i < U.rows(); ++i) {
        auto diag = U(i, i);
        if (diag < 0) {
            result_sign = -result_sign;
            diag = -diag;
        }
        log_abs_det += std::log(diag);  // Sum of logs = log of product
    }

    return {result_sign, log_abs_det};
}

Cache-Conscious Multiplication

The standard matrix multiplication formula:

C(i,j) = sum over k of A(i,k) * B(k,j)

A naive i-j-k loop order causes cache misses when accessing B(k,j) for varying k with row-major storage. You’re jumping through memory by column.

Reorder to i-k-j:

template <Matrix M1, Matrix M2>
auto matmul(const M1& A, const M2& B) -> matrix<typename M1::scalar_type> {
    using T = typename M1::scalar_type;
    matrix<T> C(A.rows(), B.cols(), T{0});

    for (std::size_t i = 0; i < A.rows(); ++i) {
        for (std::size_t k = 0; k < A.cols(); ++k) {
            auto aik = A(i, k);  // Load once, reuse across j loop
            for (std::size_t j = 0; j < B.cols(); ++j) {
                C(i, j) += aik * B(k, j);
            }
        }
    }
    return C;
}

Now A(i, k) is loaded once and reused for all j, and the innermost loop walks both B and C sequentially in memory. This reordering alone gives 2-10x speedups on real hardware. It's the same computation, just friendlier to the cache hierarchy.

The Bigger Picture

elementa shows that linear algebra code can be readable without being slow. The C++20 concept defines what matrices are, not how they’re stored. LU serves as the algorithmic foundation. Numerical details like logdet acknowledge floating-point realities.

The payoff for generic design shows up downstream. The automatic differentiation library I built (the dual numbers post) works directly with elementa’s Matrix concept, computing gradients for matrix operations without knowing the concrete type. That’s what concept-based generic programming gives you: algorithms that work for any conforming type, now and in the future.
