dotsuite is a suite of composable tools for working with nested data structures like JSON, YAML, and Python dictionaries. It started as a single helper function and grew into something with actual mathematical structure. That growth is the interesting part.
The Origin
It always starts with a simple problem. You have a nested dictionary and you need a value buried deep inside:
# Brittle code that crashes on missing keys
email = data['user']['contacts'][0]['email'] # KeyError? IndexError?
The first solution is a helper function:
# The essence of dotget - simple enough to copy
def get(data, path, default=None):
try:
for segment in path.split('.'):
data = data[int(segment)] if segment.isdigit() else data[segment]
return data
except (KeyError, IndexError, TypeError):
return default
This is where the story begins. That single function, once you start asking questions about what else you need, leads to a complete ecosystem for data manipulation. The trick is that the questions have a natural structure to them.
The Three Pillars
The ecosystem organizes around three fundamental questions about data:
Depth Pillar: “Where is the data?”
Tools for finding and extracting values from within documents.
| Tool | Purpose | Complexity |
|---|---|---|
| dotget | Simple exact paths | get(data, "user.name") |
| dotstar | Wildcard patterns | search(data, "users.*.name") |
| dotselect | Advanced selection with predicates | find_first(data, "users[role=admin].name") |
| dotpath | Extensible path engine | Powers all other tools, Turing-complete |
The addressing layer forms a free algebra on selectors, with operators being morphisms in the Kleisli category of the powerset monad. In practice this means dotstar composed with dotselect still yields a well-defined set of values. You can compose these things without worrying about edge cases blowing up.
Truth Pillar: “Is this assertion true?”
Tools for asking boolean questions and validating data.
| Tool | Purpose | Logic |
|---|---|---|
| dotexists | Path existence | check(data, "user.email") |
| dotany | Existential quantifier | any_match(data, "users.*.role", "admin") |
| dotall | Universal quantifier | all_match(data, "users.*.status", "active") |
| dotquery | Compositional logic engine | Query("any equals role admin").check(data) |
Predicates form a Boolean algebra under conjunction, disjunction, and negation that is homomorphic to set algebra on result subsets. This enables short-circuit evaluation and distributive laws. The math isn’t decoration; it’s what makes the composition reliable.
Shape Pillar: “How should the data be transformed?”
Tools for reshaping and modifying data structures.
| Tool | Purpose | Type |
|---|---|---|
| dotmod | Surgical modifications | set_(data, "user.status", "inactive") |
| dotbatch | Atomic transactions | Apply multiple changes safely |
| dotpipe | Data transformation pipelines | Reshape documents into new forms |
| dotpluck | Value extraction | Create new structures from selections |
Transformations are endofunctors on document spaces with monoid composition. dotmod implements lenses with put-get laws, while dotpipe provides Kleisli composition of pure functions.
The Pedagogical Gradient
The ecosystem is designed as a learning path. Each level builds on the previous, and no tool becomes obsolete.
1. Hello World (minimal mental load):
from depth.dotget.core import get
from truth.dotexists.core import check
data = {"users": [{"name": "Alice"}]}
name = get(data, "users.0.name") # "Alice"
exists = check(data, "users.0.email") # False
2. Pattern Phase (wildcards and basic changes):
from depth.dotstar.core import search
from shape.dotmod.core import set_
all_names = search(data, "users.*.name") # ["Alice"]
new_data = set_(data, "users.0.status", "active") # Immutable!
3. Power User (complex operations):
from truth.dotquery.core import Query
# Compositional boolean logic
query = Query("equals role admin and greater login_count 10")
admins = [user for user in users if query.check(user)]
4. Expert (extensible engines):
from depth.dotpath.core import PathEngine
from collections.dotrelate.core import join
# Custom selector primitives
engine = PathEngine()
engine.register_selector(FuzzyKeySelector())
# Relational operations
enriched = join(users, orders, left_on="id", right_on="user_id")
A dotget call is still the right choice when you know the exact path. You don’t outgrow the simple tools; you add more when you need them.
The “Steal This Code” Philosophy
Many tools are intentionally simple enough that you can copy their core logic rather than add a dependency:
# The entire essence of dotget
def get(data, path, default=None):
try:
for segment in path.split('.'):
data = data[int(segment)] if segment.isdigit() and isinstance(data, list) else data[segment]
return data
except (KeyError, IndexError, TypeError):
return default
Make each program do one thing so well that its implementation becomes obvious. If you don’t need the rest of the suite, just take what you need.
Lifting to Collections
Once you have operations on single documents, you can lift them to collections. This is where the category theory actually pays off.
Boolean Wing (Filtering)
from collections.dotfilter.core import filter_collection
# Filter documents matching a predicate
admins = filter_collection(
users,
Query("equals role admin")
)
# Boolean algebra on collections
active_admins = filter_collection(
admins,
Query("equals status active")
)
Every collection operator is a monoid homomorphism with respect to multiset union, which means you can parallelize and stream without thinking about correctness.
Relational Wing (Joining and Transforming)
from collections.dotrelate.core import join, project, union
# Relational algebra on document collections
user_orders = join(
users, orders,
left_on="id",
right_on="user_id",
how="left"
)
# Project to specific fields
summary = project(user_orders, ["name", "total_spent"])
This mirrors database theory: single-document logic is like tuple calculus; dotrelate is like relational algebra.
Command-Line First
Every tool works from the command line:
# Check if any user is an admin
cat users.json | dotquery "any equals role admin" && echo "Admin found"
# Extract all email addresses
cat contacts.json | dotstar "contacts.*.email" > emails.txt
# Join users with their orders
dotrelate join --on="user_id" users.jsonl orders.jsonl
# Filter pipeline
cat books.json \
| dotquery "less price 10 or equals author 'Tolkien'" \
| dotquery and "equals in_stock true" \
| dotquery resolve
Dual APIs: Programmatic and Declarative
Many tools offer both Python APIs and serializable query formats:
# Programmatic (Python API)
from truth.dotquery.primitives import equals, greater
query = Query(equals('role', 'admin') & greater('login_count', 10))
# Declarative (string DSL)
query = Query("equals role admin and greater login_count 10")
# Both produce the same AST
assert query.ast == other_query.ast
This means you can serialize queries to JSON, send them over APIs, version control them in git, and use them from the command line. Same semantics everywhere.
Mathematical Foundation
For those who care about the formal structure (I do):
Type Signatures
Depth Pillar (Addressing):
\[ \text{dotget}: \mathcal{D} \times \text{Path}_{\text{exact}} \to V \cup \lbrace \emptyset\rbrace \]\[ \text{dotstar}: \mathcal{D} \times \text{Path}_{*} \to V^{*} \]
\[ \text{dotpath}: \text{Free algebra on selectors (Turing-complete)} \]
Truth Pillar (Predicates):
\[ \text{dotexists}: \mathcal{D} \times P \to \mathbb{B} \]\[ \text{dotquery}: \mathcal{D} \times \mathcal{L}_{\text{BOOL}}(\varphi_i) \to \mathbb{B} \]
Shape Pillar (Transformations):
\[ \text{dotmod}: \mathcal{D} \times \delta \to \mathcal{D} \quad \text{(lens with put-get law)} \]\[ \text{dotpipe}: \mathcal{D} \times F^{*} \to \mathcal{D} \quad \text{(Kleisli composition)} \]
Functorial Lifting
Operations lift to collections via functors:
\[ \text{map}: (\mathcal{D} \to X) \to (C \to X^{*}) \]\[ \text{filter}: (\mathcal{D} \to \mathbb{B}) \to (C \to C) \]
Where \(C \subseteq \mathcal{D}\) is a finite multiset (collection).
Key Properties
- Compositionality: Operators form algebras (Boolean, monoidal, Kleisli) guaranteeing local reasoning
- Purity: Functions are referentially transparent, so concurrency is trivial
- Totality: Missing paths yield the empty set rather than exceptions
- Orthogonality: No operator belongs to multiple pillars
- Homomorphism: Collection operators preserve structure under union
Real-World Examples
Data Pipeline Processing
from depth.dotstar.core import search
from truth.dotquery.core import Query
from shape.dotpipe.core import pipe
# Extract -> Filter -> Transform pipeline
pipeline = pipe(
lambda doc: search(doc, "transactions.*.amount"),
lambda amounts: [a for a in amounts if a > 100],
lambda large: sum(large)
)
total_large_transactions = pipeline(financial_data)
ETL with Relational Operations
from collections.dotrelate.core import join, project
# Join users with orders
user_orders = join(users, orders,
left_on="user_id",
right_on="customer_id")
# Join with products
full_data = join(user_orders, products,
left_on="product_id",
right_on="id")
# Project to summary
summary = project(full_data, [
"user.name",
"order.date",
"product.name",
"order.total"
])
Complex Boolean Logic
from truth.dotquery.core import Query
# Find books that are:
# - Either cheap (<$10) OR by Tolkien
# - AND in stock
# - AND have good reviews (>4 stars)
query = Query("""
(less price 10 or equals author 'Tolkien')
and equals in_stock true
and greater rating 4.0
""")
matching_books = [book for book in catalog if query.check(book)]
Design Principles
- Compositionality: Every tool composes cleanly with others
- Immutability: Original data is never modified
- Pedagogical: Simple tools graduate to powerful ones
- Single Purpose: Each tool has one clear responsibility
- Interoperability: Common patterns work across all tools
- Safety: Graceful handling of missing paths and malformed data
Category Theory for the Curious
For those interested in the theoretical foundations:
Paths as Optics: The addressing layer implements lenses and prisms from optics theory, providing composable getters and setters.
Predicates as Subobjects: Boolean queries correspond to subobjects in the topos of sets, with conjunction/disjunction/negation being categorical limits/colimits/exponentials.
Transformations as Endofunctors: Shape operations are endofunctors on the category of documents, with natural transformations providing the composition structure.
Collections as Functorial Lift: The map/filter operations on collections are functors from the category of single-document operations to the category of collection operations, preserving algebraic structure.
This isn’t mathematical window dressing. These properties guarantee that composition behaves predictably and that parallel evaluation is always safe. When I say “this composes,” I mean it in the category-theoretic sense, not the marketing sense.
Quick Start
# Install from PyPI (once published)
pip install dotsuite
# Or install from source
git clone https://github.com/queelius/dotsuite.git
cd dotsuite
pip install -e .
Basic usage:
from depth.dotget.core import get
from depth.dotstar.core import search
from truth.dotquery.core import Query
from shape.dotmod.core import set_
# Simple addressing
data = {"users": [{"name": "Alice", "role": "admin"}]}
name = get(data, "users.0.name") # "Alice"
# Pattern matching
all_names = search(data, "users.*.name") # ["Alice"]
# Boolean queries
is_admin = Query("equals role admin").check(data["users"][0]) # True
# Immutable modifications
new_data = set_(data, "users.0.status", "active")
Resources
- Repository: github.com/queelius/dotsuite
- Philosophy: docs/philosophy.md
- Formal Spec: docs/dot-formalism.md
- Tool Docs: Individual README for each tool
- JAF Comparison: docs/jaf-comparison.md
License
MIT
Discussion