dotsuite is a suite of composable tools for working with nested data structures like JSON, YAML, and Python dictionaries. What began as a single helper function evolved into a complete, mathematically grounded ecosystem—a journey in API design guided by purity, pedagogy, and the Unix philosophy.
The Origin Story
It always starts with a simple problem. You have a nested dictionary or JSON payload, and you need to get a value buried deep inside:
# Brittle code that crashes on missing keys
email = data['user']['contacts'][0]['email'] # KeyError? IndexError?
The first, obvious solution is a helper function:
# The essence of dotget - simple enough to copy
def get(data, path, default=None):
    try:
        for segment in path.split('.'):
            data = data[int(segment)] if segment.isdigit() else data[segment]
        return data
    except (KeyError, IndexError, TypeError):
        return default
This is where the story begins. What started as this single, humble function evolved through a series of questions and insights into a complete, coherent ecosystem for data manipulation.
The Three Pillars
The ecosystem is built on three fundamental pillars, each answering a core question about data:
🎯 Depth Pillar — “Where is the data?”
Tools for finding and extracting values from within documents.
| Tool | Purpose | Example / note |
|---|---|---|
| dotget | Simple exact paths | get(data, "user.name") |
| dotstar | Wildcard patterns | search(data, "users.*.name") |
| dotselect | Advanced selection with predicates | find_first(data, "users[role=admin].name") |
| dotpath | Extensible path engine | Powers all other tools, Turing-complete |
Mathematical foundation: The addressing layer forms a free algebra on selectors, with operators being morphisms in the Kleisli category of the powerset monad. This ensures compositional reasoning: dotstar ∘ dotselect still yields a well-defined set of values.
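To make the Kleisli claim concrete, here is a minimal sketch in which a selector is any function from one value to a list of values, and composition flattens the intermediate results. The names `key`, `wildcard`, and `compose` are hypothetical and are not dotsuite's API; lists stand in for sets.

```python
from typing import Any, Callable, List

# A selector maps one value to zero or more values (a list stands in for a set).
Selector = Callable[[Any], List[Any]]

def key(name: str) -> Selector:
    """Select a single dictionary key; yield nothing if it is absent."""
    def select(value: Any) -> List[Any]:
        if isinstance(value, dict) and name in value:
            return [value[name]]
        return []
    return select

def wildcard() -> Selector:
    """Select every child of a list or dictionary."""
    def select(value: Any) -> List[Any]:
        if isinstance(value, list):
            return list(value)
        if isinstance(value, dict):
            return list(value.values())
        return []
    return select

def compose(*selectors: Selector) -> Selector:
    """Kleisli composition: feed every result of one selector into the next."""
    def select(value: Any) -> List[Any]:
        current = [value]
        for selector in selectors:
            current = [out for item in current for out in selector(item)]
        return current
    return select

data = {"users": [{"name": "Alice"}, {"name": "Bob"}, {"id": 3}]}
names = compose(key("users"), wildcard(), key("name"))
print(names(data))  # ['Alice', 'Bob']
```

Because a missing key yields an empty list instead of raising, the third user simply contributes nothing, and composing with zero selectors returns the input wrapped in a list, which acts as the identity.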
✅ Truth Pillar — “Is this assertion true?”
Tools for asking boolean questions and validating data.
| Tool | Purpose | Example |
|---|---|---|
| dotexists | Path existence | check(data, "user.email") |
| dotany | Existential quantifier | any_match(data, "users.*.role", "admin") |
| dotall | Universal quantifier | all_match(data, "users.*.status", "active") |
| dotquery | Compositional logic engine | Query("any equals role admin").check(data) |
Mathematical foundation: Predicates form a Boolean algebra under ∧, ∨, ¬ that is homomorphic to set algebra on result subsets (intersection, union, complement). This enables short-circuit evaluation and distributive laws.
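The correspondence between predicate logic and set algebra can be checked on a small collection. The sketch below uses plain Python functions as predicates; `both`, `either`, `negate`, and `matching` are hypothetical helper names, not part of dotsuite.

```python
from typing import Any, Callable, Dict, List, Set

Doc = Dict[str, Any]
Predicate = Callable[[Doc], bool]

def both(p: Predicate, q: Predicate) -> Predicate:
    return lambda doc: p(doc) and q(doc)

def either(p: Predicate, q: Predicate) -> Predicate:
    return lambda doc: p(doc) or q(doc)

def negate(p: Predicate) -> Predicate:
    return lambda doc: not p(doc)

def matching(pred: Predicate, docs: List[Doc]) -> Set[int]:
    """Indices of the documents that satisfy pred."""
    return {i for i, doc in enumerate(docs) if pred(doc)}

def is_admin(doc: Doc) -> bool:
    return doc.get("role") == "admin"

def is_active(doc: Doc) -> bool:
    return doc.get("status") == "active"

users = [
    {"role": "admin", "status": "active"},
    {"role": "admin", "status": "inactive"},
    {"role": "guest", "status": "active"},
]
everyone = set(range(len(users)))

# AND maps to intersection, OR to union, NOT to complement.
assert matching(both(is_admin, is_active), users) == matching(is_admin, users) & matching(is_active, users)
assert matching(either(is_admin, is_active), users) == matching(is_admin, users) | matching(is_active, users)
assert matching(negate(is_admin), users) == everyone - matching(is_admin, users)
print(sorted(matching(both(is_admin, is_active), users)))  # [0]
```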
🔄 Shape Pillar — “How should the data be transformed?”
Tools for reshaping and modifying data structures.
| Tool | Purpose | Example / note |
|---|---|---|
| dotmod | Surgical modifications | set_(data, "user.status", "inactive") |
| dotbatch | Atomic transactions | Apply multiple changes safely |
| dotpipe | Data transformation pipelines | Reshape documents into new forms |
| dotpluck | Value extraction | Create new structures from selections |
Mathematical foundation: Transformations are endofunctors on document spaces with monoid composition. dotmod implements lenses with put-get laws, while dotpipe provides Kleisli composition of pure functions.
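The lens laws mentioned here can be stated as executable checks. The following is a minimal sketch, assuming dotted paths over dictionaries and lists; `get_path` and `set_path` are hypothetical stand-ins, not the actual dotmod implementation, and they raise on missing intermediate keys instead of handling them.

```python
from typing import Any, Union

def _step(segment: str, container: Any) -> Union[int, str]:
    """Treat a segment as a list index only when the container is a list."""
    return int(segment) if isinstance(container, list) else segment

def get_path(doc: Any, path: str) -> Any:
    for segment in path.split("."):
        doc = doc[_step(segment, doc)]
    return doc

def set_path(doc: Any, path: str, value: Any) -> Any:
    """Return a copy of doc with path set to value; doc itself is untouched."""
    head, _, rest = path.partition(".")
    index = _step(head, doc)
    # Copy only the containers along the path; siblings are shared.
    copy = list(doc) if isinstance(doc, list) else dict(doc)
    copy[index] = set_path(doc[index], rest, value) if rest else value
    return copy

data = {"user": {"name": "Alice", "tags": ["a", "b"]}}
path = "user.tags.1"
updated = set_path(data, path, "z")

assert get_path(updated, path) == "z"                       # put-get
assert set_path(data, path, get_path(data, path)) == data   # get-put
assert set_path(updated, path, "y") == set_path(data, path, "y")  # put-put
assert data["user"]["tags"] == ["a", "b"]                   # original unchanged
```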
The Pedagogical Gradient
The ecosystem is designed as a learning journey:
1. Hello World — O(1) mental load
from depth.dotget.core import get
from truth.dotexists.core import check
data = {"users": [{"name": "Alice"}]}
name = get(data, "users.0.name") # "Alice"
exists = check(data, "users.0.email") # False
2. Pattern Phase — Wildcards and basic changes
from depth.dotstar.core import search
from shape.dotmod.core import set_
all_names = search(data, "users.*.name") # ["Alice"]
new_data = set_(data, "users.0.status", "active") # Immutable!
3. Power User — Complex operations
from truth.dotquery.core import Query
# Compositional boolean logic
query = Query("equals role admin and greater login_count 10")
admins = [user for user in users if query.check(user)]
4. Expert — Extensible engines
from depth.dotpath.core import PathEngine
from collections.dotrelate.core import join
# Custom selector primitives
engine = PathEngine()
engine.register_selector(FuzzyKeySelector())
# Relational operations
enriched = join(users, orders, left_on="id", right_on="user_id")
Each stage builds on the previous, with no tool becoming obsolete. A dotget call is still the right choice when you know the exact path.
The “Steal This Code” Philosophy
Many tools are intentionally simple enough that you can copy their core logic rather than add a dependency:
# The entire essence of dotget
def get(data, path, default=None):
    try:
        for segment in path.split('.'):
            data = data[int(segment)] if segment.isdigit() and isinstance(data, list) else data[segment]
        return data
    except (KeyError, IndexError, TypeError):
        return default
This embodies the Unix philosophy: Make each program do one thing so well that its implementation becomes obvious.
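One detail worth noticing if you copy the function: this version adds an `isinstance(data, list)` guard that the origin-story version lacks. The sketch below puts the two side by side (renaming the earlier one `get_naive`, a name chosen here for illustration) on a dictionary whose key happens to look like a number.

```python
def get_naive(data, path, default=None):
    # Earlier version: any all-digit segment is treated as a list index.
    try:
        for segment in path.split('.'):
            data = data[int(segment)] if segment.isdigit() else data[segment]
        return data
    except (KeyError, IndexError, TypeError):
        return default

def get(data, path, default=None):
    # Guarded version: digits become an index only when the container is a list.
    try:
        for segment in path.split('.'):
            data = data[int(segment)] if segment.isdigit() and isinstance(data, list) else data[segment]
        return data
    except (KeyError, IndexError, TypeError):
        return default

report = {"sales": {"2024": [10, 20]}}
print(get_naive(report, "sales.2024.0"))  # None: looks up the int 2024 in a dict
print(get(report, "sales.2024.0"))        # 10
print(get(report, "sales.2025.0", default=0))  # 0
```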
Lifting to Collections
Once you have operations on single documents, you can lift them to collections:
Boolean Wing — Filtering
from collections.dotfilter.core import filter_collection
from truth.dotquery.core import Query

# Filter documents matching a predicate
admins = filter_collection(
    users,
    Query("equals role admin")
)

# Boolean algebra on collections: chaining filters acts as AND
active_admins = filter_collection(
    admins,
    Query("equals status active")
)
Mathematical property: Every collection operator is a monoid homomorphism with respect to multiset union, enabling parallelization and streaming.
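For filtering, the homomorphism property says that filtering a combined collection gives the same result as filtering each part and combining afterwards. A minimal sketch, assuming list concatenation as a stand-in for multiset union and a hypothetical `filter_docs` helper:

```python
def filter_docs(docs, predicate):
    """Keep the documents that satisfy predicate, preserving order."""
    return [doc for doc in docs if predicate(doc)]

def is_admin(doc):
    return doc.get("role") == "admin"

chunk_a = [{"name": "Alice", "role": "admin"}, {"name": "Bob", "role": "guest"}]
chunk_b = [{"name": "Cara", "role": "admin"}]

whole = filter_docs(chunk_a + chunk_b, is_admin)
pieces = filter_docs(chunk_a, is_admin) + filter_docs(chunk_b, is_admin)

assert whole == pieces                   # filter distributes over concatenation
assert filter_docs([], is_admin) == []   # the empty collection maps to itself
print([doc["name"] for doc in whole])    # ['Alice', 'Cara']
```

This is what makes chunked or streamed processing safe: each chunk can be filtered independently, in any worker, and the results concatenated.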
Relational Wing — Joining and Transforming
from collections.dotrelate.core import join, project, union
# Relational algebra on document collections
user_orders = join(
    users, orders,
    left_on="id",
    right_on="user_id",
    how="left"
)
# Project to specific fields
summary = project(user_orders, ["name", "total_spent"])
This mirrors database theory: single-document logic ≈ tuple calculus; dotrelate ≈ relational algebra.
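To show what a join over document collections involves, here is a small hash-join sketch over lists of flat dictionaries. It is not the dotrelate implementation: `hash_join` and `project_fields` are hypothetical names, and the merge rule (left fields win on a name clash, unmatched left rows kept unchanged for `how="left"`) is an assumption made for this example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def hash_join(left: List[Row], right: List[Row],
              left_on: str, right_on: str, how: str = "inner") -> List[Row]:
    """Join two lists of flat dicts on left[left_on] == right[right_on]."""
    # Index the right side once so each left row is matched by dict lookup.
    index: Dict[Any, List[Row]] = {}
    for row in right:
        if right_on in row:
            index.setdefault(row[right_on], []).append(row)

    joined: List[Row] = []
    for row in left:
        matches = index.get(row[left_on], []) if left_on in row else []
        for match in matches:
            joined.append({**match, **row})  # left fields win on a name clash
        if not matches and how == "left":
            joined.append(dict(row))  # keep unmatched left rows as they are
    return joined

def project_fields(rows: List[Row], fields: List[str]) -> List[Row]:
    """Keep only the named fields, skipping fields a row does not have."""
    return [{f: row[f] for f in fields if f in row} for row in rows]

users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [{"user_id": 1, "total": 30}, {"user_id": 1, "total": 12}]

user_orders = hash_join(users, orders, left_on="id", right_on="user_id", how="left")
print(project_fields(user_orders, ["name", "total"]))
# [{'name': 'Alice', 'total': 30}, {'name': 'Alice', 'total': 12}, {'name': 'Bob'}]
```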
Command-Line First
Every tool works from the command line, making them perfect for shell scripts and data pipelines:
# Check if any user is an admin
cat users.json | dotquery "any equals role admin" && echo "Admin found"
# Extract all email addresses
cat contacts.json | dotstar "contacts.*.email" > emails.txt
# Join users with their orders
dotrelate join --on="user_id" users.jsonl orders.jsonl
# Filter pipeline
cat books.json \
  | dotquery "less price 10 or equals author 'Tolkien'" \
  | dotquery and "equals in_stock true" \
  | dotquery resolve
Dual APIs: Programmatic and Declarative
Many tools offer both Python APIs and serializable query formats:
# Programmatic (Python API)
from truth.dotquery.core import Query
from truth.dotquery.primitives import equals, greater
query = Query(equals('role', 'admin') & greater('login_count', 10))
# Declarative (string DSL)
other_query = Query("equals role admin and greater login_count 10")
# Both produce the same AST
assert query.ast == other_query.ast
This enables:
- Serialization: Save queries to JSON/YAML
- Network transmission: Send queries over REST APIs
- Version control: Track query logic in git
- CLI usage: Complex logic from command line
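The benefits listed above follow from the query being plain data. As an illustration, the sketch below builds a predicate whose AST is a nested list, sends it through JSON, and evaluates the restored copy. The `Pred` class, the list-based AST layout, and `evaluate` are assumptions made for this example; dotquery's real AST format may differ.

```python
import json
from typing import Any, Dict, List

def evaluate(ast: List[Any], doc: Dict[str, Any]) -> bool:
    """Interpret a nested-list AST against one document."""
    op = ast[0]
    if op == "and":
        return evaluate(ast[1], doc) and evaluate(ast[2], doc)
    if op == "or":
        return evaluate(ast[1], doc) or evaluate(ast[2], doc)
    if op == "not":
        return not evaluate(ast[1], doc)
    if op in ("equals", "greater"):
        field, expected = ast[1], ast[2]
        if field not in doc:
            return False  # a missing field never matches
        return doc[field] == expected if op == "equals" else doc[field] > expected
    raise ValueError(f"unknown operator: {op}")

class Pred:
    """A predicate that carries its own JSON-friendly AST."""
    def __init__(self, ast: List[Any]):
        self.ast = ast
    def __and__(self, other: "Pred") -> "Pred":
        return Pred(["and", self.ast, other.ast])
    def __or__(self, other: "Pred") -> "Pred":
        return Pred(["or", self.ast, other.ast])
    def __invert__(self) -> "Pred":
        return Pred(["not", self.ast])
    def check(self, doc: Dict[str, Any]) -> bool:
        return evaluate(self.ast, doc)

def equals(field: str, value: Any) -> Pred:
    return Pred(["equals", field, value])

def greater(field: str, value: Any) -> Pred:
    return Pred(["greater", field, value])

query = equals("role", "admin") & greater("login_count", 10)
wire = json.dumps(query.ast)        # ready to store, diff, or send over HTTP
restored = Pred(json.loads(wire))   # rebuilt on the other side

assert restored.ast == query.ast
print(restored.check({"role": "admin", "login_count": 42}))  # True
```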
Mathematical Foundation
The ecosystem is built on solid mathematical principles:
Type Signatures
Depth Pillar (Addressing):
\[ \text{dotget}: \mathcal{D} \times \text{Path}_{\text{exact}} \to V \cup \{\emptyset\} \]
\[ \text{dotstar}: \mathcal{D} \times \text{Path}_{*} \to V^{*} \]
\[ \text{dotpath}: \text{Free algebra on selectors (Turing-complete)} \]
Truth Pillar (Predicates):
\[ \text{dotexists}: \mathcal{D} \times P \to \mathbb{B} \]
\[ \text{dotquery}: \mathcal{D} \times \mathcal{L}_{\text{BOOL}}(\varphi_i) \to \mathbb{B} \]
Shape Pillar (Transformations):
\[ \text{dotmod}: \mathcal{D} \times \delta \to \mathcal{D} \quad \text{(lens with put-get law)} \]
\[ \text{dotpipe}: \mathcal{D} \times F^{*} \to \mathcal{D} \quad \text{(Kleisli composition)} \]
Functorial Lifting
Operations lift to collections via functors:
\[ \text{map}: (\mathcal{D} \to X) \to (C \to X^{*}) \]
\[ \text{filter}: (\mathcal{D} \to \mathbb{B}) \to (C \to C) \]
where \(C \subseteq \mathcal{D}\) is a finite multiset (collection).
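In code, the two lifting signatures amount to higher-order functions that turn a single-document function into a collection function. A minimal sketch, with `lift_map` and `lift_filter` as hypothetical names and lists standing in for multisets:

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]

def lift_map(f: Callable[[Doc], Any]) -> Callable[[List[Doc]], List[Any]]:
    """Lift a per-document function to a function on collections."""
    return lambda collection: [f(doc) for doc in collection]

def lift_filter(p: Callable[[Doc], bool]) -> Callable[[List[Doc]], List[Doc]]:
    """Lift a per-document predicate to a filter on collections."""
    return lambda collection: [doc for doc in collection if p(doc)]

def get_name(doc: Doc) -> str:
    return doc["name"]

users = [{"name": "Alice", "age": 31}, {"name": "Bob", "age": 17}]

adults = lift_filter(lambda doc: doc["age"] >= 18)
names = lift_map(get_name)
print(names(adults(users)))  # ['Alice']

# Functor law: lifting a composition equals composing the lifted functions.
composed_then_lifted = lift_map(lambda doc: get_name(doc).upper())(users)
lifted_then_composed = lift_map(str.upper)(lift_map(get_name)(users))
assert composed_then_lifted == lifted_then_composed
```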
Key Properties
- Compositionality: Operators form algebras (Boolean, monoidal, Kleisli) guaranteeing local reasoning
- Purity: Functions are referentially transparent—concurrency is trivial
- Totality: Missing paths yield ∅ rather than exceptions
- Orthogonality: No operator belongs to multiple pillars
- Homomorphism: Collection operators preserve structure under union
Real-World Examples
Data Pipeline Processing
from depth.dotstar.core import search
from truth.dotquery.core import Query
from shape.dotpipe.core import pipe
# Extract → Filter → Transform pipeline
pipeline = pipe(
    lambda doc: search(doc, "transactions.*.amount"),
    lambda amounts: [a for a in amounts if a > 100],
    lambda large: sum(large)
)
total_large_transactions = pipeline(financial_data)
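A left-to-right `pipe` of this kind fits in a few lines, in keeping with the copy-the-core idea. The sketch below is an assumption about its shape, not dotpipe's source; it is named `pipe_sketch` to avoid confusion, and a list comprehension replaces the `search` call so that the example runs on its own.

```python
from functools import reduce
from typing import Any, Callable

def pipe_sketch(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose steps left to right: the output of each feeds the next."""
    def run(value: Any) -> Any:
        return reduce(lambda acc, step: step(acc), steps, value)
    return run

financial_data = {
    "transactions": [{"amount": 50}, {"amount": 150}, {"amount": 300}]
}

pipeline = pipe_sketch(
    lambda doc: [t["amount"] for t in doc["transactions"]],  # extract
    lambda amounts: [a for a in amounts if a > 100],         # filter
    sum,                                                     # transform
)
print(pipeline(financial_data))  # 450
```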
ETL with Relational Operations
from collections.dotrelate.core import join, project
# Join users with orders
user_orders = join(users, orders,
                   left_on="user_id",
                   right_on="customer_id")
# Join with products
full_data = join(user_orders, products,
                 left_on="product_id",
                 right_on="id")
# Project to summary
summary = project(full_data, [
    "user.name",
    "order.date",
    "product.name",
    "order.total"
])
Complex Boolean Logic
from truth.dotquery.core import Query
# Find books that are:
# - Either cheap (<$10) OR by Tolkien
# - AND in stock
# - AND have good reviews (>4 stars)
query = Query("""
    (less price 10 or equals author 'Tolkien')
    and equals in_stock true
    and greater rating 4.0
""")
matching_books = [book for book in catalog if query.check(book)]
Design Principles
🧩 Compositionality: Every tool composes cleanly with others
🔒 Immutability: Original data is never modified
📚 Pedagogical: Simple tools graduate to powerful ones
🎯 Single Purpose: Each tool has one clear responsibility
🔗 Interoperability: Common patterns work across all tools
⚡ Performance: Lazy evaluation and efficient algorithms
🛡️ Safety: Graceful handling of missing paths and malformed data
Comparison: dotsuite vs JAF
For production use cases requiring advanced features, consider JAF (Just Another Flow):
| Feature | dotsuite | JAF |
|---|---|---|
| Philosophy | Pedagogy, simplicity | Production-ready, feature-complete |
| Streaming | Planned | ✅ Lazy streaming |
| Path System | JSONPath-inspired | ✅ Regex, fuzzy, wildcards |
| Query Language | Boolean DSL | ✅ S-expressions |
| Set Operations | Planned | ✅ Index-preserving results |
| Data Sources | Files | ✅ Files, directories, stdin, compressed |
| Target Audience | Learners, scripters | Production systems |
Think of dotsuite as the “learn by building” approach and JAF as the “battle-tested solution”—both valuable for different purposes.
The Unix Philosophy in Action
Each tool embodies the Unix philosophy:
Make each program do one thing well
- dotget only gets values
- dotexists only checks existence
- dotmod only modifies documents
Expect the output of every program to become the input to another
- All tools accept/produce JSON
- Composable via pipes: dotstar | dotquery | dotrelate
Write programs to work together
- Common interfaces (paths, predicates, documents)
- Shared semantics across tools
Write programs to handle text streams
- JSONL for collections
- Streaming evaluation where possible
Category Theory for the Curious
For those interested in the theoretical foundations:
Paths as Optics: The addressing layer implements lenses and prisms from optics theory, providing composable getters and setters.
Predicates as Subobjects: Boolean queries correspond to subobjects in the topos of sets, with ∧/∨/¬ being categorical limits/colimits/exponentials.
Transformations as Endofunctors: Shape operations are endofunctors on the category of documents, with natural transformations providing the composition structure.
Collections as Functorial Lift: The map/filter operations on collections are functors from the category of single-document operations to the category of collection operations, preserving algebraic structure.
This isn’t just mathematical window dressing—these properties guarantee that composition behaves predictably and that parallel evaluation is always safe.
Quick Start
# Install from PyPI (once published)
pip install dotsuite
# Or install from source
git clone https://github.com/queelius/dotsuite.git
cd dotsuite
pip install -e .
Basic usage:
from depth.dotget.core import get
from depth.dotstar.core import search
from truth.dotquery.core import Query
from shape.dotmod.core import set_
# Simple addressing
data = {"users": [{"name": "Alice", "role": "admin"}]}
name = get(data, "users.0.name") # "Alice"
# Pattern matching
all_names = search(data, "users.*.name") # ["Alice"]
# Boolean queries
is_admin = Query("equals role admin").check(data["users"][0]) # True
# Immutable modifications
new_data = set_(data, "users.0.status", "active")
The Journey Continues
The dot ecosystem demonstrates that good API design is about more than convenience—it’s about mathematical coherence, pedagogical progression, and compositional guarantees.
What began as a simple dotget function revealed a complete algebra for data manipulation, where:
- Every operation has a clear mathematical meaning
- Composition is guaranteed to work
- Parallelization is safe by construction
- Learning proceeds from simple to sophisticated without abandoning earlier tools
This is software design as mathematics, pedagogy, and philosophy working together.
Resources
- Repository: github.com/queelius/dotsuite
- Philosophy: docs/philosophy.md
- Formal Spec: docs/dot-formalism.md
- Tool Docs: Individual README for each tool
- JAF Comparison: docs/jaf-comparison.md
License
MIT
The dot ecosystem: from simple paths to sophisticated data algebras, one tool at a time. A journey in API design guided by purity, pedagogy, and the principle of least power.