Building Languages to Solve Problems

January 19, 2026 5 min read Updated: February 24, 2026

Chapter 4 of Structure and Interpretation of Computer Programs opens with one of the most important insights in programming: the most powerful technique for controlling complexity is metalinguistic abstraction, the establishment of new languages.

Not libraries. Not frameworks. Languages.

When you’ve abstracted enough of a problem domain into primitives, combination rules, and naming mechanisms, you haven’t just written code. You’ve created a new way of thinking about the problem. The domain becomes expressible. And once something is expressible, it becomes manipulable, debuggable, and shareable.

What Is Metalinguistic Abstraction?

The key distinction is between using a language and creating one. A library gives you functions to call. A language gives you a grammar for expressing ideas.

Consider the difference:

Library approach: Call db.execute("SELECT * FROM users WHERE age > 21")

Language approach: Write SELECT * FROM users WHERE age > 21

SQL isn’t a library. It’s a language, with primitives (tables, columns), means of combination (joins, unions, subqueries), and means of abstraction (views, CTEs). These three elements (primitives, combination, abstraction) are SICP’s fundamental criteria for any language, and they’re what separates a DSL from a mere API.

Other examples:

Regular expressions: primitives (characters, character classes), combination (concatenation, alternation), abstraction (groups, backreferences)
Make: primitives (targets, prerequisites), combination (dependency chains), abstraction (pattern rules, variables)
CSS selectors: primitives (elements, classes, IDs), combination (descendant, child, sibling), abstraction (custom properties, mixins in preprocessors)

In each case, the language captures the essential structure of the problem domain in a way that raw code cannot.

The Three Requirements

SICP identifies three necessary components for any language:

Primitives: What are the basic elements that cannot be broken down further?
Means of combination: How do you build compound elements from simpler ones?
Means of abstraction: How do you name and reuse patterns?

When designing a DSL, these questions guide everything. Get them wrong and you have a clunky API. Get them right and the domain becomes thinkable in your language.

Consider an expression language for symbolic math:

Primitives: numbers, symbols, operators
Combination: function application (+ x 1), nested expressions (* (+ x 1) 2)
Abstraction: named rules, rulesets, engines

Or a query language for JSON documents:

Primitives: field access, array indexing, literals
Combination: boolean operators, path composition
Abstraction: named queries, parameterized patterns

The Closure Property

Beyond the three requirements, SICP emphasizes a crucial design principle: the closure property. This isn’t about closures in the functional programming sense. It’s about algebraic closure: the result of combining things should be the same kind of thing.

In Scheme, combining procedures yields a procedure. You can pass it, return it, and combine it again without special cases. This enables arbitrary composition.

In a well-designed DSL:

Combining queries yields a query
Combining transformations yields a transformation
Combining rules yields a ruleset

When closure holds, your language has compositional depth. Users can build arbitrarily complex structures from simple pieces, and the cognitive load stays constant because they’re always working with the same kind of thing.

DSLs in My Projects

Several projects embody these principles:

Rerum: Rules as Data

Rerum is a pattern matching and term rewriting library with a declarative rule DSL:

@add-zero[100]: (+ ?x 0) => :x
@mul-one[100]:  (* ?x 1) => :x

Rules are data, loadable, combinable, inspectable. Engines compose:

normalize = expand >> simplify  # Sequence
combined = algebra | calculus   # Union

Combining engines yields an engine. Closure holds.

jsonl-algebra: Relational Algebra for JSON

jsonl-algebra lifts relational algebra concepts to nested JSON:

ja 'select(name, age) | where(.age > 21) | sort(name)' data.jsonl

Each operation takes a relation and returns a relation. Pipes compose operations. The Unix philosophy meets relational algebra, and closure means you can chain operations arbitrarily.

Accumux: Compositional Statistics

Accumux treats running statistics as composable monoids:

stats = count() & mean("x") & variance("x")
result = fold(stats, data_stream)

Combining accumulators yields an accumulator. Parallel streams can be folded independently and merged. The monoid laws guarantee correctness.

dotsuite/JAF: Path Expressions over Documents

The JAF path expression language provides boolean algebra over document structure:

.metadata.author & (.status == "published" | .featured)

Combining predicates yields a predicate. The algebra is closed.

When to Build a DSL

Not every problem needs a language. Here are signs you might need one:

You should consider a DSL when:

The same structural patterns keep appearing across your code
Domain experts need to express logic without writing code
Configuration has grown complex enough to need validation and tooling
You want transformations to be inspectable, serializable, or version-controlled

You probably don’t need a DSL when:

It’s a one-off script with no reuse
Simple CRUD operations suffice
The domain doesn’t have clear compositional structure
A configuration file would be simpler

The litmus test: if you find yourself building a recursive interpreter by accident (parsing strings, handling nested cases, managing state) you’re already building a language. You might as well do it intentionally.

The Payoff

The real benefit of metalinguistic abstraction isn’t cleaner code. It’s that the problem domain becomes thinkable.

When you have a language for expressing transformations, you can reason about transformations. You can ask: what happens if I apply this rule before that one? What’s the minimal set of rules that covers this domain? Are these two rulesets equivalent?

When you have a language for expressing queries, you can optimize queries, explain query plans, and catch errors at parse time rather than runtime.

When you have a language for expressing configuration, you can validate configurations, diff configurations, and generate configurations from higher-level specifications.

The language makes the domain explicit. And explicit domains are controllable domains.

This is why SICP puts metalinguistic abstraction at the pinnacle of the book. Not because it’s technically harder than everything else, but because it’s the technique that makes everything else possible.

This post is part of a series on SICP, exploring how the ideas from Structure and Interpretation of Computer Programs appear in modern programming practice.

What Is Metalinguistic Abstraction?

The Three Requirements

The Closure Property

DSLs in My Projects

Rerum: Rules as Data

jsonl-algebra: Relational Algebra for JSON

Accumux: Compositional Statistics

dotsuite/JAF: Path Expressions over Documents

When to Build a DSL

The Payoff

Related Posts

Rerum: Pattern Matching and Term Rewriting in Python

chop: When Every Command Returns the Same Kind of Thing

compositional.mle: SICP-Inspired Optimization

Accumux: Compositional Online Statistical Reductions in C++

Category Theory for Programmers

Discussion