ja: JSONL Algebra¶
Relational algebra meets JSON streaming. Transform your data with the power of mathematical principles and the simplicity of Unix pipes.
What is ja?¶
ja (JSONL Algebra) is a command-line tool that brings the elegance of relational algebra to JSON data processing. It treats JSONL files as relations (tables) and provides operations that can be composed into powerful data pipelines.
# A taste of ja
cat orders.jsonl \
| ja select 'status == "shipped"' \
| ja join customers.jsonl --on customer_id=id \
| ja groupby region \
| ja agg 'revenue=sum(amount),orders=count'
Why ja?¶
- 🧮 Algebraic Foundation: Based on mathematical principles that guarantee composability
- 🔗 Unix Philosophy: Small, focused tools that do one thing well
- 📊 Streaming Architecture: Process gigabytes without loading into memory
- 🎯 Nested Data Support: First-class support for real-world JSON structures
- ⚡ Zero Dependencies: Pure Python implementation (with optional enhancements)
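To make the streaming idea concrete, here is a minimal Python sketch of row-at-a-time JSONL processing. It is not ja's actual implementation: `read_jsonl` and `select` are hypothetical helper names, and `io.StringIO` stands in for a file or stdin. It only illustrates how generators let each row pass through without the whole input being held in memory.

```python
import io
import json
from typing import Callable, Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed row per non-empty line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def select(rows: Iterable[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Lazily keep only the rows for which predicate(row) is true."""
    return (row for row in rows if predicate(row))


# io.StringIO stands in for an open file or sys.stdin.
source = io.StringIO(
    '{"order_id": 1, "status": "shipped"}\n'
    '{"order_id": 2, "status": "pending"}\n'
)

# Nothing is read until the result is consumed, one row at a time.
shipped = list(select(read_jsonl(source), lambda r: r["status"] == "shipped"))
print(shipped)
```

Operations that need to see every row before answering (sorting, grouping) cannot be fully lazy, so a sketch like this applies mainly to filters and projections.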
Quick Links¶
- Quickstart → Get running in 5 minutes
- Concepts → Understand the theory
- Operations → Learn each operation
- Cookbook → Real-world examples
At a Glance¶
The Operations¶
| Operation | Symbol | Purpose | Example |
|---|---|---|---|
| select | σ | Filter rows | ja select 'age > 30' |
| project | π | Select columns | ja project name,email |
| join | ⋈ | Combine relations | ja join users.jsonl orders.jsonl --on id=user_id |
| groupby | γ | Group rows | ja groupby department |
| union | ∪ | Combine all rows | ja union file1.jsonl file2.jsonl |
| distinct | δ | Remove duplicates | ja distinct |
| window | ω | Window functions | ja window rank --partition-by dept |
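As one way to picture what the join row in the table does, the following Python sketch implements an inner join on `id=user_id` with a hash index. The function name `inner_join`, the sample rows, and the rule that left-hand fields win on a name collision are all assumptions made for illustration; ja's own join may behave differently.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def inner_join(left: Iterable[dict], right: Iterable[dict],
               left_key: str, right_key: str) -> Iterator[dict]:
    """Hash join: index the right relation, then stream the left one."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    for l in left:
        # .get avoids creating empty index entries for unmatched keys.
        for r in index.get(l[left_key], []):
            # Assumption: on a field-name collision the left row wins.
            yield {**r, **l}


users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [
    {"user_id": 1, "amount": 10},
    {"user_id": 1, "amount": 5},
    {"user_id": 3, "amount": 7},
]

joined = list(inner_join(users, orders, "id", "user_id"))
for row in joined:
    print(row)
```

Bob has no orders and the order for user 3 has no matching user, so neither appears in an inner join.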
The Philosophy¶
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌────────┐
│ Data │ --> │ Filter │ --> │ Join │ --> │ Result │
│ (JSONL) │ │(select) │ │ (join) │ │(JSONL) │
└─────────┘ └─────────┘ └─────────┘ └────────┘
↓ ↓ ↓ ↓
Relation --> Relation --> Relation --> Relation
Every operation takes relations and produces relations. Because of this closure property, the output of any operation is a valid input to any other, so pipelines can be extended to any length.
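The closure property can be sketched in a few lines of Python. This is an illustration under assumptions, not ja's internals: `select`, `project`, and `pipeline` are hypothetical names, and a relation is modelled as any iterable of dicts. Each operation maps a relation to a relation, so they compose freely.

```python
from functools import reduce
from typing import Callable, Iterable

Relation = Iterable[dict]
Operation = Callable[[Relation], Relation]


def select(predicate: Callable[[dict], bool]) -> Operation:
    """Build an operation that keeps the rows matching predicate."""
    return lambda rows: (r for r in rows if predicate(r))


def project(*fields: str) -> Operation:
    """Build an operation that keeps only the named fields."""
    return lambda rows: ({f: r[f] for f in fields if f in r} for r in rows)


def pipeline(*ops: Operation) -> Operation:
    """Compose operations left to right; the result is itself an operation."""
    return lambda rows: reduce(lambda acc, op: op(acc), ops, rows)


people = [
    {"name": "Ann", "age": 41, "email": "ann@example.com"},
    {"name": "Bo", "age": 25, "email": "bo@example.com"},
]

over_30_contacts = pipeline(
    select(lambda r: r["age"] > 30),
    project("name", "email"),
)
result = list(over_30_contacts(people))
print(result)
```

Because `pipeline` returns the same kind of function as `select` and `project`, a pipeline can be nested inside another pipeline without special handling.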
Installation¶
That's it! You now have the ja command available.
Your First Pipeline¶
Let's analyze some order data. Create orders.jsonl:
{"order_id": 1, "customer": "Alice", "amount": 99.99, "status": "shipped"}
{"order_id": 2, "customer": "Bob", "amount": 149.99, "status": "pending"}
{"order_id": 3, "customer": "Alice", "amount": 79.99, "status": "shipped"}
{"order_id": 4, "customer": "Charlie", "amount": 199.99, "status": "shipped"}
{"order_id": 5, "customer": "Bob", "amount": 59.99, "status": "cancelled"}
1. Filter Orders¶
Get only shipped orders:
ja select 'status == "shipped"' orders.jsonl
2. Calculate Totals¶
Total revenue from shipped orders:
Output:
3. Group by Customer¶
Revenue per customer (shipped only):
ja select 'status == "shipped"' orders.jsonl \
| ja groupby customer \
| ja agg 'revenue=sum(amount),orders=count'
Output:
{"customer": "Alice", "revenue": 179.98, "orders": 2}
{"customer": "Charlie", "revenue": 199.99, "orders": 1}
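For readers who want to check the numbers, this plain-Python sketch reproduces the same select, group, and aggregate steps on the five sample orders. It is an equivalent calculation, not ja's code. The `round(..., 2)` call is an assumption added here to keep floating-point sums tidy.

```python
from collections import defaultdict

orders = [
    {"order_id": 1, "customer": "Alice", "amount": 99.99, "status": "shipped"},
    {"order_id": 2, "customer": "Bob", "amount": 149.99, "status": "pending"},
    {"order_id": 3, "customer": "Alice", "amount": 79.99, "status": "shipped"},
    {"order_id": 4, "customer": "Charlie", "amount": 199.99, "status": "shipped"},
    {"order_id": 5, "customer": "Bob", "amount": 59.99, "status": "cancelled"},
]

# select: keep shipped orders only
shipped = [o for o in orders if o["status"] == "shipped"]

# groupby: bucket rows by customer (dicts preserve first-seen order)
groups = defaultdict(list)
for o in shipped:
    groups[o["customer"]].append(o)

# agg: one output row per group
summary = [
    {
        "customer": customer,
        "revenue": round(sum(o["amount"] for o in rows), 2),
        "orders": len(rows),
    }
    for customer, rows in groups.items()
]
for row in summary:
    print(row)
```

Bob is absent from the result because neither of his orders has status "shipped".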
4. Multi-Level Grouping¶
Chaining groupby calls groups rows by several keys in turn before a final aggregation:
# Group by region, then by product within each region, then aggregate
cat sales.jsonl \
| ja groupby region \
| ja groupby product \
| ja agg 'total=sum(amount)'
This produces results like:
{"region": "North", "product": "Widget", "total": 1250}
{"region": "North", "product": "Gadget", "total": 850}
{"region": "South", "product": "Widget", "total": 900}
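One way to reason about results of this shape is as a grouping on the composite key (region, product). The Python sketch below shows that reading. The input rows are invented for illustration (sales.jsonl is not shown in this document), and how ja tracks grouping state between chained groupby calls is not covered here.

```python
from collections import defaultdict

# Hypothetical input rows; amounts are made up for this example.
sales = [
    {"region": "North", "product": "Widget", "amount": 750},
    {"region": "North", "product": "Gadget", "amount": 850},
    {"region": "South", "product": "Widget", "amount": 400},
    {"region": "North", "product": "Widget", "amount": 500},
    {"region": "South", "product": "Widget", "amount": 500},
]

# Two grouping levels collapse into one composite key.
totals = defaultdict(int)
for row in sales:
    totals[(row["region"], row["product"])] += row["amount"]

result = [
    {"region": region, "product": product, "total": total}
    for (region, product), total in totals.items()
]
for row in result:
    print(row)
```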
Key Features¶
- Relational Operations: select, project, join, union, intersection, difference, distinct, and more
- Multiple Join Types: inner, left, right, outer, and cross joins via the --how flag
- Window Functions: 10 SQL-style window functions (row_number, rank, dense_rank, lag, lead, first_value, last_value, ntile, percent_rank, cume_dist)
- Chained Grouping: Multi-level grouping that preserves composability
- Nested Data Support: Access and manipulate nested fields using intuitive dot notation
- Streaming Architecture: Process large datasets without loading into memory
- Expression Language: Safe and expressive filtering with ExprEval
- Interactive Shell: Navigate JSON/JSONL like a filesystem with ja-shell
- Interactive REPL: Build data pipelines step by step
- Format Conversion: Import/export CSV, JSON arrays, and directory structures
- Unix Philosophy: Designed for pipes and command composition
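To show what two of the listed window functions compute, here is a small Python sketch of row_number and rank within a partition, following the usual SQL semantics (tied rows share a rank and the next rank is skipped). The function `rank_within`, its parameters, and the sample staff rows are hypothetical; ja's own options and output field names may differ.

```python
from itertools import groupby
from operator import itemgetter


def rank_within(rows, partition_by, order_by):
    """Add row_number and rank per partition, ordered by order_by descending."""
    # Sorting is stable, so tied rows keep their input order.
    ordered = sorted(rows, key=lambda r: (r[partition_by], -r[order_by]))
    out = []
    for _, part in groupby(ordered, key=itemgetter(partition_by)):
        prev = object()  # sentinel that equals no real value
        rank = 0
        for pos, row in enumerate(part, start=1):
            if row[order_by] != prev:
                rank = pos  # a new value starts a new rank
                prev = row[order_by]
            out.append({**row, "row_number": pos, "rank": rank})
    return out


staff = [
    {"name": "a", "dept": "eng", "salary": 120},
    {"name": "b", "dept": "eng", "salary": 120},
    {"name": "c", "dept": "eng", "salary": 100},
    {"name": "d", "dept": "ops", "salary": 90},
]

ranked = rank_within(staff, partition_by="dept", order_by="salary")
for row in ranked:
    print(row["dept"], row["name"], row["row_number"], row["rank"])
```

Unlike a filter, this needs the whole partition before it can emit anything, because it sorts first.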
Working with Nested Data¶
ja makes working with nested JSON objects effortless:
# Project nested fields
ja project user.name,user.email,order.total data.jsonl
# Group by nested values
ja groupby user.region orders.jsonl | ja agg 'revenue=sum(amount)'
# Filter on nested conditions
ja select 'user.age > 30 and order.status == "shipped"' data.jsonl
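Dot notation such as user.name can be read as a path walked through nested objects. The sketch below is a minimal Python illustration of that lookup. `get_path` is a hypothetical helper, and returning a default for a missing path is an assumption, not a statement about how ja treats missing fields.

```python
def get_path(row, path, default=None):
    """Follow a dotted path such as 'user.name' through nested dicts."""
    current = row
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default  # assumption: missing paths yield the default
        current = current[key]
    return current


record = {
    "user": {"name": "Alice", "age": 34},
    "order": {"status": "shipped", "total": 99.99},
}

# Project nested fields into a flat row keyed by the dotted path.
projected = {p: get_path(record, p) for p in ("user.name", "order.total")}
print(projected)

# Filter on nested conditions.
matches = get_path(record, "user.age", 0) > 30 and get_path(record, "order.status") == "shipped"
print(matches)
```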
Interactive Mode¶
Want to explore? Try the REPL:
ja repl
ja> from orders.jsonl
ja> select amount > 100
ja> groupby customer
ja> agg total=sum(amount)
ja> execute
Next Steps¶
- Read the Quickstart - Get hands-on in 5 minutes
- Explore the Concepts - Understand the theory
- Browse the Cookbook - See real examples
- Join the Community - Contribute and get help
Dependencies and Setup¶
ja includes optional dependencies for enhanced functionality:
- jmespath: For safe and expressive filtering (replaces eval)
- jsonschema: For schema validation features
- All other features work without external dependencies
For users (from PyPI)¶
This automatically installs the required dependencies.
For developers (from local repository)¶
Ready to transform your JSON data? Start with the quickstart guide or dive into the concepts to understand the theory behind the tool.