Frequently Asked Questions (FAQ)¶
Common questions about jsonl-algebra and their answers.
General Questions¶
What is jsonl-algebra?¶
jsonl-algebra (command: ja) is a command-line tool and Python library for manipulating JSONL (JSON Lines) data using relational algebra operations like select, project, join, and groupby.
What is JSONL?¶
JSONL (JSON Lines) is a format where each line is a valid JSON object:
{"name": "Alice", "age": 30}
{"name": "Bob", "age": 25}
Unlike JSON arrays, JSONL files can be processed line-by-line, making them ideal for streaming and large datasets.
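To make the line-by-line property concrete, here is a minimal Python sketch that uses only the standard library. It is independent of ja, and the sample records and the helper name `iter_jsonl` are made up for illustration.

```python
import io
import json

def iter_jsonl(stream):
    """Yield one parsed object per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)

# Hypothetical sample data: two records, one per line
sample = io.StringIO('{"name": "Alice", "age": 30}\n{"name": "Bob", "age": 25}\n')

records = list(iter_jsonl(sample))
print(records[0]["name"])
```

Because each line parses on its own, a reader never needs the whole file in memory, which a single large JSON array would normally require.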
Do I need to know relational algebra to use ja?¶
No! While ja is based on relational algebra principles, you don't need mathematical knowledge to use it. The commands are intuitive:
- select = filter rows
- project = choose columns
- join = combine datasets
- groupby = aggregate data
What's the difference between ja and jq?¶
| Feature | ja | jq |
|---|---|---|
| Data format | JSONL (one object per line) | JSON (any structure) |
| Operations | Relational algebra | Query language |
| Learning curve | Low (SQL-like) | Medium (custom syntax) |
| Streaming | Built-in | Partial |
| Joins | Native support | Complex |
| Best for | Tabular data, logs, datasets | Tree transformations |
Use ja for: Filtering, joining, and aggregating structured data
Use jq for: Complex JSON transformations and restructuring
Installation & Setup¶
How do I install jsonl-algebra?¶
See the Installation Guide for detailed instructions.
What Python version do I need?¶
Python 3.8 or higher is required.
Can I use ja without installing Python?¶
No, ja is a Python-based tool and requires Python to be installed. However, once Python is installed, setup is just one command: pip install jsonl-algebra.
How do I upgrade to the latest version?¶
Upgrade with pip: pip install --upgrade jsonl-algebra.
Is ja available for Windows?¶
Yes! ja works on Windows, macOS, and Linux. On Windows, we recommend using WSL2 for the best experience, but it also works in PowerShell and Command Prompt.
Usage Questions¶
How do I filter rows in a JSONL file?¶
Use the select command:
ja select 'age > 30' data.jsonl
How do I choose specific fields?¶
Use the project command:
ja project name,email data.jsonl
How do I join two JSONL files?¶
Use the join command:
Can I pipe ja commands together?¶
Yes! That's the recommended way to build complex operations:
ja select 'age > 30' users.jsonl | ja project name,email
How do I save output to a file?¶
Use shell redirection:
ja select 'age > 30' data.jsonl > output.jsonl
Or use the --output flag:
How do I work with nested fields?¶
Use dot notation:
# Access nested field
ja project user.profile.email data.jsonl
# Filter on nested field
ja select 'user.age > 30' data.jsonl
Can I use ja with stdin?¶
Yes! ja reads from stdin when no file is specified:
cat data.jsonl | ja select 'x > 0'
echo '{"name": "Alice"}' | ja project name
curl https://api.example.com/data | ja select 'status == "active"'
Data Format Questions¶
What's the difference between .json and .jsonl files?¶
JSON (.json):
[{"id": 1}, {"id": 2}]
JSONL (.jsonl):
{"id": 1}
{"id": 2}
JSONL is better for:
- Streaming (process line-by-line)
- Appending (just add new lines)
- Large files (constant memory usage)
- Log files
Can ja work with regular JSON files?¶
ja is designed for JSONL, but you can convert:
# JSON array to JSONL
jq -c '.[]' array.json > data.jsonl
# JSONL to JSON array
ja collect data.jsonl > array.json
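If jq is not available, the same round trip can be sketched with Python's standard json module. This is an illustrative stand-in, not how ja implements collect; the function names `array_to_jsonl` and `jsonl_to_array` are hypothetical.

```python
import json

def array_to_jsonl(json_text):
    """Convert the text of a JSON array into JSONL text (one object per line)."""
    items = json.loads(json_text)
    return "".join(json.dumps(item) + "\n" for item in items)

def jsonl_to_array(jsonl_text):
    """Convert JSONL text back into the text of a JSON array."""
    items = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return json.dumps(items)

jsonl_text = array_to_jsonl('[{"id": 1}, {"id": 2}]')
array_text = jsonl_to_array(jsonl_text)
print(jsonl_text, end="")
print(array_text)
```

Note that both directions load the full array into memory, so this sketch suits small and medium files rather than very large ones.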
How do I convert CSV to JSONL?¶
Use the import command:
How do I convert JSONL to CSV?¶
Use the export command:
Can ja handle nested JSON structures?¶
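Yes! Use dot notation to access nested fields, for example ja project user.profile.email data.jsonl (see "How do I work with nested fields?" above).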
Performance Questions¶
Can ja handle large files?¶
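Yes! Streaming operations such as select, project, and rename process one record at a time, so they can handle files larger than available RAM with constant memory. Operations that must see all the data (sort, distinct, groupby, join) buffer records, so their memory use grows with the input.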
How do I make operations faster?¶
- Filter early: reduce data size before expensive operations
- Use specific commands: don't use pipes unnecessarily
- Limit output: use head for sampling
Why is sort/groupby slower than select?¶
Some operations require seeing all data:
- Streaming (fast, constant memory): select, project, rename
- Buffering (slower, grows with data): sort, distinct, groupby, join
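The difference between the two groups can be sketched in plain Python. This is a conceptual illustration only, not ja's implementation; `select_stream`, `sort_buffered`, and the sample data are hypothetical.

```python
def select_stream(records, predicate):
    """Streaming: yields each matching record as soon as it is read."""
    for record in records:
        if predicate(record):
            yield record

def sort_buffered(records, key):
    """Buffering: must read every record before emitting the first one."""
    return sorted(records, key=key)

def numbers():
    """Hypothetical input: a lazy stream of records."""
    for x in [3, -1, 2]:
        yield {"x": x}

positives = select_stream(numbers(), lambda r: r["x"] > 0)
first = next(positives)  # available after reading a single record
ordered = sort_buffered(numbers(), key=lambda r: r["x"])
print(first, ordered)
```

A filter can emit its first result immediately and forget each record afterwards, while a sort cannot know which record comes first until it has seen the last one.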
Can I process multiple files in parallel?¶
Yes, using GNU parallel or xargs:
Expression & Syntax Questions¶
What operators can I use in expressions?¶
| Operator | Purpose | Example |
|---|---|---|
| == | Equal | status == "active" |
| != | Not equal | role != "admin" |
| > | Greater than | age > 30 |
| < | Less than | price < 100 |
| >= | Greater or equal | score >= 90 |
| <= | Less or equal | count <= 10 |
| and | Logical AND | age > 18 and status == "active" |
| or | Logical OR | role == "admin" or role == "owner" |
How do I check for null values?¶
How do I use quotes in expressions?¶
Use different quote types:
# Outer single quotes, inner double quotes
ja select 'name == "Alice"' data.jsonl
# Or escape
ja select "name == \"Alice\"" data.jsonl
Can I use regular expressions?¶
Not directly in expressions, but you can pre-filter lines with grep and pipe the matches into ja, which reads from stdin.
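grep matches against the raw text of the whole line. When the pattern should apply to a single field, a short Python filter is another option. The sketch below uses only the standard library and is not part of ja; the function name `select_matching` and the sample records are hypothetical.

```python
import json
import re

def select_matching(lines, field, pattern):
    """Yield records whose string value at `field` matches `pattern`."""
    regex = re.compile(pattern)
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        value = record.get(field)
        # Only string values can match; missing or null fields are skipped
        if isinstance(value, str) and regex.search(value):
            yield record

lines = [
    '{"email": "alice@example.com"}',
    '{"email": "bob@example.org"}',
    '{"email": null}',
]
matches = list(select_matching(lines, "email", r"@example\.com$"))
print(matches)
```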
How do I compare strings?¶
Strings use lexicographic (dictionary) ordering:
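As a rough illustration, the snippet below shows how lexicographic comparison behaves in Python. It assumes ja follows Python's string comparison rules, which is plausible for a Python-based tool but is not confirmed here.

```python
# Strings are compared character by character, by Unicode code point
checks = {
    'apple < banana': "apple" < "banana",  # 'a' sorts before 'b'
    'Zebra < apple': "Zebra" < "apple",    # uppercase letters sort before lowercase
    '"10" < "9"': "10" < "9",              # as strings, '1' sorts before '9'
    '10 < 9': 10 < 9,                      # numeric comparison, for contrast
}
for label, result in checks.items():
    print(label, result)
```

The third case is the usual surprise: numbers stored as strings do not sort numerically, so keep numeric fields as JSON numbers when order matters.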
Feature Questions¶
Does ja support aggregations?¶
Yes! Use groupby with --agg:
Available aggregations:
- count - Count rows
- sum:field - Sum values
- avg:field - Average
- min:field - Minimum
- max:field - Maximum
- list:field - Collect into array
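To show what these aggregations compute, here is a minimal Python sketch of grouping followed by each aggregation. It illustrates the semantics only: the function `groupby_agg`, the output field names, and the sample data are assumptions, and ja's actual output layout may differ.

```python
from collections import defaultdict

def groupby_agg(records, key, field):
    """Group records by `key` and compute the listed aggregations over `field`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[field])
    result = []
    for group_key, values in groups.items():
        result.append({
            key: group_key,
            "count": len(values),              # count
            "sum": sum(values),                # sum:field
            "avg": sum(values) / len(values),  # avg:field
            "min": min(values),                # min:field
            "max": max(values),                # max:field
            "list": values,                    # list:field
        })
    return result

# Hypothetical sample data
orders = [
    {"customer": "alice", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "alice", "amount": 30},
]
summary = groupby_agg(orders, "customer", "amount")
for row in summary:
    print(row)
```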
Can I do left/right/outer joins?¶
Yes! Use the --left, --right, or --outer flags:
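Leaving the flag syntax aside, the difference between the join types can be sketched in plain Python. This is a conceptual illustration, not ja's implementation: `join_records` and the sample data are hypothetical, and how ja represents missing fields in unmatched rows (omitted or null) is not specified here.

```python
def join_records(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; `how` is 'inner', 'left', 'right', or 'outer'."""
    right_by_key = {}
    for r in right:
        right_by_key.setdefault(r[key], []).append(r)
    matched_keys = set()
    out = []
    for l in left:
        partners = right_by_key.get(l[key], [])
        if partners:
            matched_keys.add(l[key])
            out.extend({**l, **r} for r in partners)
        elif how in ("left", "outer"):
            out.append(dict(l))  # keep unmatched left row
    if how in ("right", "outer"):
        # keep right rows that never found a partner
        out.extend(dict(r) for r in right if r[key] not in matched_keys)
    return out

users = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "total": 50}, {"id": 3, "total": 20}]

inner = join_records(users, orders, "id")
left = join_records(users, orders, "id", how="left")
outer = join_records(users, orders, "id", how="outer")
```

An inner join keeps only rows with a match on both sides, a left join also keeps unmatched rows from the first input, and an outer join keeps unmatched rows from both.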
Is there an interactive mode?¶
Yes! Use the REPL:
Or try ja-shell for filesystem-like navigation:
Can I validate data schemas?¶
Yes! Infer and validate JSON schemas:
# Infer schema
ja schema infer data.jsonl > schema.json
# Validate data
ja schema validate schema.json new_data.jsonl
Troubleshooting¶
Command not found: ja¶
The installation directory may not be in your PATH. Try:
# Check if installed
pip show jsonl-algebra
# Add to PATH (Linux/Mac)
export PATH="$HOME/.local/bin:$PATH"
# Or use full path
python -m ja.cli select 'x > 0' data.jsonl
Invalid expression error¶
Check your expression syntax:
# Wrong - missing quotes around strings
ja select 'name == Alice' data.jsonl
# Right - strings need quotes
ja select 'name == "Alice"' data.jsonl
Memory error with large files¶
Some operations buffer data. Solutions:
- Filter first to reduce size
- Use sampling with head
- Split file into chunks
- Increase system memory
JSON decode error¶
Check that your file is valid JSONL:
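# Validate the whole file (json.tool needs --json-lines for JSONL, Python 3.8+)
python -m json.tool --json-lines data.jsonl > /dev/null
# Find problematic lines: print the line number and content of each invalid line
awk '{ cmd = "python -m json.tool > /dev/null 2>&1"; print | cmd; if (close(cmd) != 0) print NR ": " $0 }' data.jsonl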
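Starting one Python process per line is slow on large files. The same check can be sketched as a single Python pass that reports the line number of every invalid line. The function name `find_invalid_lines` and the sample input are hypothetical; blank lines are skipped here as a design choice.

```python
import json

def find_invalid_lines(lines):
    """Return (line_number, error_message) for each line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
    return problems

# Hypothetical sample: line 2 is truncated, line 4 uses single quotes
sample = ['{"ok": 1}', '{"broken": ', '', "{'single': 'quotes'}"]
problems = find_invalid_lines(sample)
bad_line_numbers = [number for number, _ in problems]
print(bad_line_numbers)
```

To check a real file, pass an open file object instead of the list, for example `find_invalid_lines(open("data.jsonl"))`.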
Output looks wrong¶
ja outputs JSONL by default (one object per line). For pretty printing:
# Pretty print with jq
ja select 'x > 0' data.jsonl | jq '.'
# Or convert to JSON array
ja collect data.jsonl | jq '.'
Advanced Usage¶
Can I use ja in scripts?¶
Yes! ja is designed for scripting:
#!/bin/bash
if ja select 'status == "active"' users.jsonl > active.jsonl; then
    echo "Found $(wc -l < active.jsonl) active users"
else
    echo "Error filtering users" >&2
    exit 1
fi
How do I use ja programmatically in Python?¶
Import the library:
from ja.core import read_jsonl, select, project, join

# Load records, keep users over 30, then keep only two fields
users = read_jsonl("users.jsonl")
filtered = select(users, "age > 30")
projected = project(filtered, ["name", "email"])

for record in projected:
    print(record)
Can I extend ja with custom operations?¶
Yes! You can:
- Use the Python API to create custom functions
- Build integrations (see Integrations)
- Contribute to the project
How do I process streaming data?¶
ja works with streaming inputs:
# Process logs in real-time
tail -f /var/log/app.log | ja select 'level == "ERROR"'
# From API stream
curl -N https://api.example.com/stream | ja project id,timestamp
Integration Questions¶
What is the MCP server?¶
The Model Context Protocol server lets AI assistants use ja operations. See MCP Integration.
Can I use ja with other tools?¶
Yes! ja works great with:
- jq - For complex JSON transformations
- awk/sed - For text processing
- grep - For pattern matching
- parallel - For parallel processing
- curl - For API data
- pandas - For data analysis
Does ja work with databases?¶
Not directly, but you can:
- Export database to JSONL
- Process with ja
- Import results back
Many databases support JSON export.
Contributing & Development¶
How can I contribute?¶
See the Contributing Guide for details:
- Report bugs
- Suggest features
- Submit pull requests
- Improve documentation
- Share use cases
Where is the source code?¶
On GitHub: github.com/queelius/jsonl-algebra
How do I run tests?¶
Is there a roadmap?¶
Check the GitHub issues and project boards for planned features.
Getting Help¶
Where can I get help?¶
- Read the documentation
- Check this FAQ
- Search GitHub issues
- Open a new issue
How do I report a bug?¶
Open an issue on GitHub with:
- ja version (ja --version)
- Python version
- Operating system
- Minimal example to reproduce
- Expected vs actual behavior
Can I request a feature?¶
Yes! Open a feature request on GitHub. Include:
- Use case description
- Example of desired behavior
- Why existing features don't work
Still Have Questions?¶
- Check the Tutorials for examples
- Read the CLI Reference for all commands
- Visit the GitHub Discussions
- Open an issue if you found a bug