Operations Overview¶

ja provides a complete set of relational algebra operations designed for JSON data. Each operation follows the principle of taking relations (JSONL data) and producing relations, enabling infinite composability.

Core Operations¶

Selection and Filtering¶

Operation	Symbol	Purpose	Example
select	σ	Filter rows based on conditions	`ja select 'age > 30'`
distinct	δ	Remove duplicate rows	`ja distinct`

Projection and Transformation¶

Operation	Symbol	Purpose	Example
project	π	Select/compute columns	`ja project name,total=price*qty`
rename	ρ	Rename fields	`ja rename old_name=new_name`

Set Operations¶

Operation	Symbol	Purpose	Example
union	∪	Combine all rows from relations	`ja union file1.jsonl file2.jsonl`
intersection	∩	Keep only rows in both relations	`ja intersection file1.jsonl file2.jsonl`
difference	-	Keep rows in left but not right	`ja difference file1.jsonl file2.jsonl`

Join Operations¶

Operation	Symbol	Purpose	Example
join	⋈	Inner join on condition	`ja join users.jsonl orders.jsonl --on id=user_id`
product	×	Cartesian product	`ja product features.jsonl options.jsonl`

Grouping and Aggregation¶

Operation	Symbol	Purpose	Example
groupby	γ	Group rows (chainable)	`ja groupby department`
agg	γ	Aggregate data	`ja agg count,total=sum(amount)`

Ordering¶

Operation	Symbol	Purpose	Example
sort	τ	Sort rows	`ja sort name,age --desc`

Quick Reference¶

Selection (Filter Rows)¶

# Basic filtering
ja select 'age > 30' people.jsonl

# Complex conditions
ja select 'status == "active" and score >= 80' users.jsonl

# Nested field filtering
ja select 'user.location.country == "US"' data.jsonl

# Using group metadata
ja groupby department data.jsonl | ja select '_group_size > 10'

Projection (Select/Transform Columns)¶

# Select specific fields
ja project name,email people.jsonl

# Compute new fields
ja project name,total=price*quantity orders.jsonl

# Nested field selection
ja project user.name,user.email,order.total data.jsonl

# Flatten nested structures
ja project user.name,user.email --flatten data.jsonl

Aggregation¶

# Simple aggregation
ja agg count,total=sum(amount) orders.jsonl

# After grouping
ja groupby department employees.jsonl | ja agg avg_salary=avg(salary)

# Chained grouping
ja groupby region sales.jsonl | ja groupby product | ja agg revenue=sum(amount)

# Conditional aggregation
ja agg shipped=count_if(status=="shipped"),total=sum(amount) orders.jsonl

Joins¶

# Inner join
ja join users.jsonl orders.jsonl --on id=user_id

# Join with nested fields
ja join customers.jsonl orders.jsonl --on customer.id=customer_id

# Multiple files in pipeline
cat orders.jsonl | ja join customers.jsonl --on customer_id=id

Set Operations¶

# Combine files
ja union current.jsonl historical.jsonl

# Find common records
ja intersection whitelist.jsonl data.jsonl

# Find differences
ja difference all_users.jsonl inactive_users.jsonl

Advanced Features¶

Nested Data Support¶

All operations support dot notation for nested fields:

# Works with any level of nesting
ja select 'user.profile.preferences.notifications == true'
ja project user.profile.name,user.profile.email
ja groupby user.location.country
ja join file1.jsonl file2.jsonl --on user.id=customer.id

Expression Language¶

Rich expression support in select and project:

# Arithmetic
ja project name,annual_salary=monthly_salary*12

# String operations  
ja select 'name.startswith("A")'

# Comparisons
ja select 'created_date > "2024-01-01"'

# Logical operations
ja select 'age >= 18 and status == "active"'

Aggregation Functions¶

Function	Purpose	Example
`count`	Count rows	`count`
`sum(field)`	Sum numeric values	`sum(amount)`
`avg(field)`	Average of values	`avg(score)`
`min(field)`	Minimum value	`min(date)`
`max(field)`	Maximum value	`max(price)`
`list(field)`	Collect values in list	`list(tag)`
`first(field)`	First value	`first(name)`
`last(field)`	Last value	`last(status)`

Conditional Aggregations¶

Function	Purpose	Example
`count_if(condition)`	Count matching rows	`count_if(status=="paid")`
`sum_if(field, condition)`	Sum matching values	`sum_if(amount, region=="North")`
`avg_if(field, condition)`	Average matching values	`avg_if(score, active==true)`

Composition Examples¶

Building Complex Pipelines¶

# E-commerce analysis pipeline
cat orders.jsonl \
  | ja select 'status == "completed"' \
  | ja join products.jsonl --on product_id=id \
  | ja join customers.jsonl --on customer_id=id \
  | ja project \
      customer.name, \
      product.category, \
      total=quantity*price, \
      month=date[0:7] \
  | ja groupby customer.tier \
  | ja groupby product.category \
  | ja agg \
      revenue=sum(total), \
      orders=count, \
      avg_order=avg(total)

Data Quality Pipeline¶

# Clean and validate data
cat raw_data.jsonl \
  | ja select 'email != null and age > 0' \
  | ja project \
      name, \
      email=email.lower(), \
      age, \
      valid=email.contains("@") \
  | ja select 'valid == true' \
  | ja distinct \
  | ja sort name

Time Series Analysis¶

# Analyze user engagement over time
cat events.jsonl \
  | ja project \
      user_id, \
      event_type, \
      date=timestamp[0:10] \
  | ja groupby date \
  | ja groupby event_type \
  | ja agg \
      events=count, \
      unique_users=count_distinct(user_id) \
  | ja sort date

Performance Considerations¶

Optimization Tips¶

Filter Early: Use select before expensive operations
Project Only Needed Fields: Reduce data size with project
Use Direct Aggregation: Use --agg for simple grouping
Order Operations: Put most selective filters first

# Optimized pipeline
cat large_file.jsonl \
  | ja select 'timestamp > "2024-01-01"' \
  | ja project user_id,amount,category \
  | ja groupby category \
  | ja agg total=sum(amount)

Memory Efficiency¶

All operations are streaming by design:

Process arbitrarily large files
Constant memory usage (except for operations requiring sorting/grouping)
Natural integration with Unix pipes

Error Handling¶

Common Issues¶

Invalid Expressions:

# Check expression syntax
ja select 'age > thirty'  # Error: 'thirty' not defined
ja select 'age > 30'      # Correct

Missing Fields:

# Handle missing fields gracefully
ja select 'field != null'  # Only rows where field exists

Type Mismatches:

# Ensure consistent types
ja select 'amount > 0'     # Assumes amount is numeric

Next Steps¶

Deep Dive: Selection - Advanced filtering techniques
Deep Dive: Grouping - Master chained groupby
Deep Dive: Joins - Complex join patterns
Cookbook - Real-world examples