Building Languages to Solve Problems

Chapter 4 of Structure and Interpretation of Computer Programs begins with one of the most profound insights in all of programming: the most powerful technique for controlling complexity is metalinguistic abstraction, the establishment of new languages.

Not libraries. Not frameworks. Languages.

When you've abstracted enough of a problem domain into primitives, combination rules, and naming mechanisms, you haven't just written code. You've created a new way of thinking about the problem. The domain becomes expressible. And once something is expressible, it becomes manipulable, debuggable, and shareable.

What Is Metalinguistic Abstraction?¶

The key distinction is between using a language and creating one. A library gives you functions to call. A language gives you a grammar for expressing ideas.

Consider the difference:

Library approach: Call db.execute("SELECT * FROM users WHERE age > 21")

Language approach: Write SELECT * FROM users WHERE age > 21

SQL isn't a library. It's a language, with primitives (tables, columns), means of combination (joins, unions, subqueries), and means of abstraction (views, CTEs). These three elements (primitives, combination, abstraction) are SICP's fundamental criteria for any language, and they're what separates a DSL from a mere API.

Other examples:

Regular expressions: primitives (characters, character classes), combination (concatenation, alternation), abstraction (groups, backreferences)
Make: primitives (targets, prerequisites), combination (dependency chains), abstraction (pattern rules, variables)
CSS selectors: primitives (elements, classes, IDs), combination (descendant, child, sibling), abstraction (custom properties, mixins in preprocessors)

In each case, the language captures the essential structure of the problem domain in a way that raw code cannot.

The Three Requirements¶

SICP identifies three necessary components for any language:

Primitives: What are the basic elements that cannot be broken down further?
Means of combination: How do you build compound elements from simpler ones?
Means of abstraction: How do you name and reuse patterns?

When designing a DSL, these questions guide everything. Get them wrong and you have a clunky API. Get them right and the domain becomes thinkable in your language.

Consider an expression language for symbolic math:

Primitives: numbers, symbols, operators
Combination: function application (+ x 1), nested expressions (* (+ x 1) 2)
Abstraction: named rules, rulesets, engines

Or a query language for JSON documents:

Primitives: field access, array indexing, literals
Combination: boolean operators, path composition
Abstraction: named queries, parameterized patterns

The Closure Property¶

Beyond the three requirements, SICP emphasizes a crucial design principle: the closure property. This isn't about closures in the functional programming sense. It's about algebraic closure: the result of combining things should be the same kind of thing.

In Scheme, combining procedures yields a procedure. You can pass it, return it, and combine it again without special cases. This enables arbitrary composition.

In a well-designed DSL:

Combining queries yields a query
Combining transformations yields a transformation
Combining rules yields a ruleset

When closure holds, your language has compositional depth. Users can build arbitrarily complex structures from simple pieces, and the cognitive load stays constant because they're always working with the same kind of thing.

DSLs in My Projects¶

Several of my projects embody these principles:

Rerum: Rules as Data¶

Rerum is a pattern matching and term rewriting library with a declarative rule DSL:

@add-zero[100]: (+ ?x 0) => :x
@mul-one[100]:  (* ?x 1) => :x

Rules are data: loadable, combinable, inspectable. Engines compose:

normalize = expand >> simplify  # Sequence
combined = algebra | calculus   # Union

Combining engines yields an engine. Closure holds.

jsonl-algebra: Relational Algebra for JSON¶

jsonl-algebra implements relational algebra as Unix pipeline stages:

ja join users.jsonl orders.jsonl --on user.id=customer_id \
  | ja select 'amount > 100' \
  | ja groupby user.name --agg sum:amount

Each operation takes a relation and returns a relation. Pipes compose operations. Closure means you can chain operations without limit.

compositional.mle: Composable Solvers¶

compositional.mle treats optimization solvers as first-class composable objects:

strategy <- grid_search(lower, upper, n=5) %>>%
  gradient_ascent(max_iter=50) %>>%
  newton_raphson(max_iter=20)

Combining solvers yields a solver. The %>>% operator sequences them; %|% races them in parallel. Either way, the result has the same interface as the inputs.

chop: Image Pipelines as JSON¶

chop forces every image command through a single type: PipelineState -> PipelineState. The pipeline is JSON, not pixels. Composition is free because the output of any command is valid input to any other.

When to Build a DSL¶

Not every problem needs a language. Signs you might need one:

The same structural patterns keep appearing across your code
Domain experts need to express logic without writing code
Configuration has grown complex enough to need validation and tooling
You want transformations to be inspectable, serializable, or version-controlled

Signs you don't:

It's a one-off script with no reuse
The domain doesn't have clear compositional structure
A configuration file would be simpler

The litmus test: if you find yourself building a recursive interpreter by accident (parsing strings, handling nested cases, managing state) you're already building a language. You might as well do it intentionally.

The Payoff¶

The real benefit of metalinguistic abstraction isn't cleaner code. It's that the problem domain becomes thinkable.

When you have a language for expressing transformations, you can reason about transformations. You can ask: what happens if I apply this rule before that one? What's the minimal set of rules that covers this domain? Are these two rulesets equivalent?

When you have a language for expressing queries, you can optimize queries, explain query plans, and catch errors at parse time rather than runtime.

The language makes the domain explicit. And explicit domains are controllable domains.

This is why SICP puts metalinguistic abstraction at the pinnacle of the book. Not because it's technically harder than everything else, but because it's the technique that makes everything else possible.

This post is part of a series on SICP, exploring how the ideas from Structure and Interpretation of Computer Programs appear in modern programming practice.