Skip to main content

The Dot Ecosystem: From Simple Paths to Data Algebras

dotsuite is a suite of composable tools for working with nested data structures like JSON, YAML, and Python dictionaries. What began as a single helper function evolved into a complete, mathematically grounded ecosystem—a journey in API design guided by purity, pedagogy, and the Unix philosophy.

The Origin Story

It always starts with a simple problem. You have a nested dictionary or JSON payload, and you need to get a value buried deep inside:

# Brittle code that crashes on missing keys
email = data['user']['contacts'][0]['email']  # KeyError? IndexError?

The first, obvious solution is a helper function:

# The essence of dotget - simple enough to copy
def get(data, path, default=None):
    try:
        for segment in path.split('.'):
            data = data[int(segment)] if segment.isdigit() else data[segment]
        return data
    except (KeyError, IndexError, TypeError):
        return default

This is where the story begins. What started as this single, humble function evolved through a series of questions and insights into a complete, coherent ecosystem for data manipulation.

The Three Pillars

The ecosystem is built on three fundamental pillars, each answering a core question about data:

🎯 Depth Pillar — “Where is the data?”

Tools for finding and extracting values from within documents.

ToolPurposeComplexity
dotgetSimple exact pathsget(data, "user.name")
dotstarWildcard patternssearch(data, "users.*.name")
dotselectAdvanced selection with predicatesfind_first(data, "users[role=admin].name")
dotpathExtensible path enginePowers all other tools, Turing-complete

Mathematical foundation: The addressing layer forms a free algebra on selectors, with operators being morphisms in the Kleisli category of the powerset monad. This ensures compositional reasoning: dotstar ∘ dotselect still yields a well-defined set of values.

✅ Truth Pillar — “Is this assertion true?”

Tools for asking boolean questions and validating data.

ToolPurposeLogic
dotexistsPath existencecheck(data, "user.email")
dotanyExistential quantifierany_match(data, "users.*.role", "admin")
dotallUniversal quantifierall_match(data, "users.*.status", "active")
dotqueryCompositional logic engineQuery("any equals role admin").check(data)

Mathematical foundation: Predicates form a Boolean algebra under ∧, ∨, ¬ that is homomorphic to set algebra on result subsets (intersection, union, complement). This enables short-circuit evaluation and distributive laws.

🔄 Shape Pillar — “How should the data be transformed?”

Tools for reshaping and modifying data structures.

ToolPurposeType
dotmodSurgical modificationsset_(data, "user.status", "inactive")
dotbatchAtomic transactionsApply multiple changes safely
dotpipeData transformation pipelinesReshape documents into new forms
dotpluckValue extractionCreate new structures from selections

Mathematical foundation: Transformations are endofunctors on document spaces with monoid composition. dotmod implements lenses with put-get laws, while dotpipe provides Kleisli composition of pure functions.

The Pedagogical Gradient

The ecosystem is designed as a learning journey:

1. Hello World — O(1) mental load

from depth.dotget.core import get
from truth.dotexists.core import check

data = {"users": [{"name": "Alice"}]}
name = get(data, "users.0.name")  # "Alice"
exists = check(data, "users.0.email")  # False

2. Pattern Phase — Wildcards and basic changes

from depth.dotstar.core import search
from shape.dotmod.core import set_

all_names = search(data, "users.*.name")  # ["Alice"]
new_data = set_(data, "users.0.status", "active")  # Immutable!

3. Power User — Complex operations

from truth.dotquery.core import Query

# Compositional boolean logic
query = Query("equals role admin and greater login_count 10")
admins = [user for user in users if query.check(user)]

4. Expert — Extensible engines

from depth.dotpath.core import PathEngine
from collections.dotrelate.core import join

# Custom selector primitives
engine = PathEngine()
engine.register_selector(FuzzyKeySelector())

# Relational operations
enriched = join(users, orders, left_on="id", right_on="user_id")

Each stage builds on the previous, with no tool becoming obsolete. A dotget call is still the right choice when you know the exact path.

The “Steal This Code” Philosophy

Many tools are intentionally simple enough that you can copy their core logic rather than add a dependency:

# The entire essence of dotget
def get(data, path, default=None):
    try:
        for segment in path.split('.'):
            data = data[int(segment)] if segment.isdigit() and isinstance(data, list) else data[segment]
        return data
    except (KeyError, IndexError, TypeError):
        return default

This embodies the Unix philosophy: Make each program do one thing so well that its implementation becomes obvious.

Lifting to Collections

Once you have operations on single documents, you can lift them to collections:

Boolean Wing — Filtering

from collections.dotfilter.core import filter_collection

# Filter documents matching a predicate
admins = filter_collection(
    users,
    Query("equals role admin")
)

# Boolean algebra on collections
active_admins = filter_collection(
    admins,
    Query("equals status active")
)

Mathematical property: Every collection operator is a monoid homomorphism with respect to multiset union, enabling parallelization and streaming.

Relational Wing — Joining and Transforming

from collections.dotrelate.core import join, project, union

# Relational algebra on document collections
user_orders = join(
    users, orders,
    left_on="id",
    right_on="user_id",
    how="left"
)

# Project to specific fields
summary = project(user_orders, ["name", "total_spent"])

This mirrors database theory: single-document logic ≈ tuple calculus; dotrelate ≈ relational algebra.

Command-Line First

Every tool works from the command line, making them perfect for shell scripts and data pipelines:

# Check if any user is an admin
cat users.json | dotquery "any equals role admin" && echo "Admin found"

# Extract all email addresses
cat contacts.json | dotstar "contacts.*.email" > emails.txt

# Join users with their orders
dotrelate join --on="user_id" users.jsonl orders.jsonl

# Filter pipeline
cat books.json \
  | dotquery "less price 10 or equals author 'Tolkien'" \
  | dotquery and "equals in_stock true" \
  | dotquery resolve

Dual APIs: Programmatic and Declarative

Many tools offer both Python APIs and serializable query formats:

# Programmatic (Python API)
from truth.dotquery.primitives import equals, greater
query = Query(equals('role', 'admin') & greater('login_count', 10))

# Declarative (string DSL)
query = Query("equals role admin and greater login_count 10")

# Both produce the same AST
assert query.ast == other_query.ast

This enables:

  • Serialization: Save queries to JSON/YAML
  • Network transmission: Send queries over REST APIs
  • Version control: Track query logic in git
  • CLI usage: Complex logic from command line

Mathematical Foundation

The ecosystem is built on solid mathematical principles:

Type Signatures

Depth Pillar (Addressing):

\[ \text{dotget}: \mathcal{D} \times \text{Path}_{\text{exact}} \to V \cup \{\emptyset\} \]

\[ \text{dotstar}: \mathcal{D} \times \text{Path}_{*} \to V^{*} \]

\[ \text{dotpath}: \text{Free algebra on selectors (Turing-complete)} \]

Truth Pillar (Predicates):

\[ \text{dotexists}: \mathcal{D} \times P \to \mathbb{B} \]

\[ \text{dotquery}: \mathcal{D} \times \mathcal{L}_{\text{BOOL}}(\varphi_i) \to \mathbb{B} \]

Shape Pillar (Transformations):

\[ \text{dotmod}: \mathcal{D} \times \delta \to \mathcal{D} \quad \text{(lens with put-get law)} \]

\[ \text{dotpipe}: \mathcal{D} \times F^{*} \to \mathcal{D} \quad \text{(Kleisli composition)} \]

Functorial Lifting

Operations lift to collections via functors:

\[ \text{map}: (\mathcal{D} \to X) \to (C \to X^{*}) \]

\[ \text{filter}: (\mathcal{D} \to \mathbb{B}) \to (C \to C) \]

Where \(C \subseteq \mathcal{D}\) is a finite multiset (collection).

Key Properties

  1. Compositionality: Operators form algebras (Boolean, monoidal, Kleisli) guaranteeing local reasoning
  2. Purity: Functions are referentially transparent—concurrency is trivial
  3. Totality: Missing paths yield ∅ rather than exceptions
  4. Orthogonality: No operator belongs to multiple pillars
  5. Homomorphism: Collection operators preserve structure under union

Real-World Examples

Data Pipeline Processing

from depth.dotstar.core import search
from truth.dotquery.core import Query
from shape.dotpipe.core import pipe

# Extract → Filter → Transform pipeline
pipeline = pipe(
    lambda doc: search(doc, "transactions.*.amount"),
    lambda amounts: [a for a in amounts if a > 100],
    lambda large: sum(large)
)

total_large_transactions = pipeline(financial_data)

ETL with Relational Operations

from collections.dotrelate.core import join, project

# Join users with orders
user_orders = join(users, orders,
                   left_on="user_id",
                   right_on="customer_id")

# Join with products
full_data = join(user_orders, products,
                 left_on="product_id",
                 right_on="id")

# Project to summary
summary = project(full_data, [
    "user.name",
    "order.date",
    "product.name",
    "order.total"
])

Complex Boolean Logic

from truth.dotquery.core import Query

# Find books that are:
# - Either cheap (<$10) OR by Tolkien
# - AND in stock
# - AND have good reviews (>4 stars)

query = Query("""
    (less price 10 or equals author 'Tolkien')
    and equals in_stock true
    and greater rating 4.0
""")

matching_books = [book for book in catalog if query.check(book)]

Design Principles

🧩 Compositionality: Every tool composes cleanly with others

🔒 Immutability: Original data is never modified

📚 Pedagogical: Simple tools graduate to powerful ones

🎯 Single Purpose: Each tool has one clear responsibility

🔗 Interoperability: Common patterns work across all tools

⚡ Performance: Lazy evaluation and efficient algorithms

🛡️ Safety: Graceful handling of missing paths and malformed data

Comparison: dotsuite vs JAF

For production use cases requiring advanced features, consider JAF (Just Another Flow):

FeaturedotsuiteJAF
PhilosophyPedagogy, simplicityProduction-ready, feature-complete
StreamingPlanned✅ Lazy streaming
Path SystemJSONPath-inspired✅ Regex, fuzzy, wildcards
Query LanguageBoolean DSL✅ S-expressions
Set OperationsPlanned✅ Index-preserving results
Data SourcesFiles✅ Files, directories, stdin, compressed
Target AudienceLearners, scriptersProduction systems

Think of dotsuite as the “learn by building” approach and JAF as the “battle-tested solution”—both valuable for different purposes.

The Unix Philosophy in Action

Each tool embodies the Unix philosophy:

  1. Make each program do one thing well

    • dotget only gets values
    • dotexists only checks existence
    • dotmod only modifies documents
  2. Expect the output of every program to become the input to another

    • All tools accept/produce JSON
    • Composable via pipes: dotstar | dotquery | dotrelate
  3. Write programs to work together

    • Common interfaces (paths, predicates, documents)
    • Shared semantics across tools
  4. Write programs to handle text streams

    • JSONL for collections
    • Streaming evaluation where possible

Category Theory for the Curious

For those interested in the theoretical foundations:

Paths as Optics: The addressing layer implements lenses and prisms from optics theory, providing composable getters and setters.

Predicates as Subobjects: Boolean queries correspond to subobjects in the topos of sets, with ∧/∨/¬ being categorical limits/colimits/exponentials.

Transformations as Endofunctors: Shape operations are endofunctors on the category of documents, with natural transformations providing the composition structure.

Collections as Functorial Lift: The map/filter operations on collections are functors from the category of single-document operations to the category of collection operations, preserving algebraic structure.

This isn’t just mathematical window dressing—these properties guarantee that composition behaves predictably and that parallel evaluation is always safe.

Quick Start

# Install from PyPI (once published)
pip install dotsuite

# Or install from source
git clone https://github.com/queelius/dotsuite.git
cd dotsuite
pip install -e .

Basic usage:

from depth.dotget.core import get
from depth.dotstar.core import search
from truth.dotquery.core import Query
from shape.dotmod.core import set_

# Simple addressing
data = {"users": [{"name": "Alice", "role": "admin"}]}
name = get(data, "users.0.name")  # "Alice"

# Pattern matching
all_names = search(data, "users.*.name")  # ["Alice"]

# Boolean queries
is_admin = Query("equals role admin").check(data["users"][0])  # True

# Immutable modifications
new_data = set_(data, "users.0.status", "active")

The Journey Continues

The dot ecosystem demonstrates that good API design is about more than convenience—it’s about mathematical coherence, pedagogical progression, and compositional guarantees.

What began as a simple dotget function revealed a complete algebra for data manipulation, where:

  • Every operation has a clear mathematical meaning
  • Composition is guaranteed to work
  • Parallelization is safe by construction
  • Learning proceeds from simple to sophisticated without abandoning earlier tools

This is software design as mathematics, pedagogy, and philosophy working together.

Resources

License

MIT


The dot ecosystem: from simple paths to sophisticated data algebras, one tool at a time. A journey in API design guided by purity, pedagogy, and the principle of least power.

Discussion