# Getting Started with JAF
This guide will walk you through installing JAF and using its core features.
## Installation
### From PyPI
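Assuming the distribution is published on PyPI under the same name as its import:

```bash
# Assumption: the PyPI package name matches the import name "jaf"
pip install jaf
```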
### From Source
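Assuming a standard pip-installable layout; the repository URL below is a placeholder, not the project's actual location:

```bash
# Placeholder URL - substitute the real repository location
git clone https://github.com/example/jaf.git
cd jaf
pip install .
```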
### Development Installation
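For hacking on JAF itself, an editable install is the usual pattern; the `dev` extras group is an assumption about this project's packaging:

```bash
# Editable install from a source checkout; the [dev] extra is assumed
pip install -e ".[dev]"
```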
## Your First JAF Pipeline
Let's start with a simple example. Suppose you have a file `users.jsonl` with user data:
{"name": "Alice", "age": 30, "role": "admin", "active": true}
{"name": "Bob", "age": 25, "role": "user", "active": true}
{"name": "Charlie", "age": 35, "role": "user", "active": false}
{"name": "Diana", "age": 28, "role": "admin", "active": true}
### Using the CLI
#### Basic Filtering
```bash
# Find all active users (outputs a stream descriptor)
jaf filter users.jsonl '["eq?", "@active", true]'

# To see the actual data, use --eval
jaf filter users.jsonl '["eq?", "@active", true]' --eval
```
#### Chaining Operations
```bash
# Find active admins and get their names
jaf filter users.jsonl '["and", ["eq?", "@active", true], ["eq?", "@role", "admin"]]' | \
  jaf map - "@name" | \
  jaf eval -
```
#### Using the Stream Command
The `stream` command evaluates by default:
```bash
# Same as above but in one command
jaf stream users.jsonl \
  --filter '["and", ["eq?", "@active", true], ["eq?", "@role", "admin"]]' \
  --map "@name"
```
### Using the Python API
```python
from jaf import stream

# Load the data
users = stream("users.jsonl")

# Find active admins
active_admins = users.filter(["and",
    ["eq?", "@active", True],
    ["eq?", "@role", "admin"]
])

# Get their names
admin_names = active_admins.map("@name")

# Execute the pipeline
for name in admin_names.evaluate():
    print(f"Admin: {name}")
```
## Understanding Lazy Evaluation
JAF uses lazy evaluation, meaning operations don't execute until you explicitly request results:
```python
# This creates a pipeline but doesn't read any data
pipeline = stream("huge_file.jsonl") \
    .filter(["gt?", "@score", 90]) \
    .map(["dict", "id", "@id", "score", "@score"]) \
    .take(10)

# Data is only read when we evaluate
results = list(pipeline.evaluate())  # Reads just enough to get 10 matches
```
## Working with Different Data Sources
### Files
```python
# JSON array file
s1 = stream("data.json")

# JSONL (newline-delimited JSON)
s2 = stream("data.jsonl")

# Gzipped files
s3 = stream("data.jsonl.gz")
```
### Directories
```python
# Process all JSON/JSONL files in a directory
s = stream({
    "type": "directory",
    "path": "/path/to/data",
    "recursive": True  # Include subdirectories
})
```
### In-Memory Data
```python
data = [
    {"id": 1, "value": 100},
    {"id": 2, "value": 200}
]
s = stream({"type": "memory", "data": data})
```
### Standard Input
```bash
# From command line
echo '[{"x": 1}, {"x": 2}]' | jaf filter - '["gt?", "@x", 1]' --eval
```

```python
# In Python
s = stream({"type": "stdin"})
```
## Basic Query Patterns
### Simple Comparisons
```python
# Equality
["eq?", "@status", "active"]

# Greater than
["gt?", "@age", 25]

# Contains (for arrays/strings)
["contains?", "@tags", "python"]

# Exists
["exists?", "@email"]
```
### Boolean Logic
```python
# AND - all conditions must be true
["and",
    ["gt?", "@age", 18],
    ["eq?", "@verified", True]
]

# OR - at least one condition must be true
["or",
    ["eq?", "@role", "admin"],
    ["eq?", "@role", "moderator"]
]

# NOT - negation
["not", ["eq?", "@status", "deleted"]]
```
### Working with Nested Data
```python
# Access nested fields
["eq?", "@address.city", "New York"]

# Check array elements
["contains?", "@skills", "Python"]

# Wildcard access
["any", ["eq?", "@orders.*.status", "pending"]]
```
## Common Operations
### Filtering
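Filtering keeps only the items matching a predicate, using the same query syntax shown above; a minimal sketch reusing the `users` stream from earlier:

```python
# Keep only active users
active = users.filter(["eq?", "@active", True])

# Filters compose: chain calls, or combine predicates with "and"
active_adults = users.filter(["and",
    ["eq?", "@active", True],
    ["gt?", "@age", 18]
])
```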
### Mapping/Transformation
```python
# Extract specific fields
names = users.map("@name")

# Create new structure
summaries = users.map(["dict",
    "name", "@name",
    "age_group", ["if", ["gt?", "@age", 30], "senior", "junior"]
])
```
### Limiting Results
```python
# Take first 10
first_ten = users.take(10)

# Skip first 5, then take 10
paginated = users.skip(5).take(10)

# Take while condition is true
young_users = users.take_while(["lt?", "@age", 30])
```
### Batching
```python
# Process in batches of 100
batches = users.batch(100)
for batch in batches.evaluate():
    # batch is a list of up to 100 items
    process_batch(batch)
```
## Error Handling
JAF distinguishes between two types of errors:

- **Query Errors**: Invalid queries fail immediately
- **Item Errors**: Errors processing individual items are logged but don't stop the stream
```python
from jaf import stream
# Assumption: UnknownOperatorError is importable from the top-level package
from jaf import UnknownOperatorError

try:
    # This fails immediately - invalid operator
    result = stream("data.jsonl").filter(["invalid-op", "@x"])
except UnknownOperatorError as e:
    print(f"Query error: {e}")

# Item errors are handled gracefully
pipeline = stream("mixed_data.jsonl").map(["div", "@value", "@divisor"])
for result in pipeline.evaluate():
    # Items with divisor=0 are skipped with a warning
    print(result)
```
## Next Steps
- Learn about Advanced Filtering with the full query language
- Master the Fluent API for complex pipelines
- Explore Boolean Operations for combining filters
- See practical examples in the Cookbook