# Integrations Overview
jsonl-algebra comes with powerful integrations that extend its capabilities beyond the core CLI and library. These integrations demonstrate real-world applications and provide ready-to-use tools for common data processing tasks.
## Available Integrations

### 1. MCP Server - AI Assistant Integration

*Model Context Protocol server for AI assistants and agentic coders*
The MCP server exposes jsonl-algebra operations as structured tools that AI assistants (like Claude, ChatGPT, etc.) can use to manipulate JSONL files through natural language.
Features:
- 9 specialized tools for JSONL manipulation
- Natural language query interface
- Automatic file discovery as resources
- Multiple output formats (JSONL, JSON, table, summary)
- Complex transformation pipelines
- JMESPath expression support
Use Cases:
- "Show me all users older than 25 from users.jsonl"
- "Calculate average salary by department"
- "Join orders and customers files on customer_id"
- "Get statistics about the sales data"
### 2. Log Analyzer - Real-time Monitoring

*Streaming log analysis with alerts and dashboards*
Analyze log files in real-time with sliding windows, alert systems, and terminal visualization; the sliding-window idea is sketched after the use cases below.
Features:
- Streaming log processing
- Sliding window analysis
- Error rate monitoring
- Performance anomaly detection
- Terminal dashboard
- Customizable alert system
- Pattern detection
Use Cases:
- Monitor application error rates
- Detect performance degradation
- Real-time log filtering
- Alert on specific patterns
- Track request rates
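The sliding-window technique at the heart of the analyzer can be sketched with the standard library alone. This is an illustration of the idea, not the integration's actual API, and the `timestamp` and `level` field names are assumptions about the log schema:

```python
import json
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=60)
THRESHOLD = 10  # errors per window before we alert


def monitor(lines):
    """Alert when the error count inside a sliding window exceeds THRESHOLD."""
    errors = deque()
    for line in lines:
        record = json.loads(line)
        now = datetime.fromisoformat(record["timestamp"])  # assumed ISO-8601
        if record.get("level") == "ERROR":
            errors.append(now)
        # Evict errors that have slid out of the window
        while errors and now - errors[0] > WINDOW:
            errors.popleft()
        if len(errors) > THRESHOLD:
            print(f"ALERT: {len(errors)} errors in the last {WINDOW.seconds}s")
```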
### 3. Data Explorer - Interactive REPL

*SQL-like interactive exploration for JSONL files*
Explore JSONL files interactively with SQL-like syntax, tab completion, and data profiling.
Features:
- SQL-like query syntax
- Tab completion for commands and fields
- Data profiling and statistics
- Export to multiple formats
- Command history and recall
- Visual result formatting
- Schema inference
Example Session:
```text
> SELECT name, age WHERE age > 25
> GROUP BY city AGGREGATE count, avg(age)
> PROFILE
> EXPORT results.csv
```
Use Cases:
- Ad-hoc data exploration
- Quick data quality checks
- Interactive analysis
- Export filtered datasets
- Schema discovery
### 4. ML Pipeline - Machine Learning Integration

*Feature engineering with scikit-learn integration*
Use jsonl-algebra for ML data preprocessing and feature engineering.
```python
from integrations.ml_pipeline import JSONLFeatureEngine

# Derive a new feature, then build the design matrix X and target vector y
engine = JSONLFeatureEngine("training.jsonl")
engine.add_feature("age_squared", lambda r: r["age"] ** 2)
X, y = engine.prepare_features(["age", "age_squared"], "target")
```
Features:
- Feature engineering with ja transformations
- scikit-learn pipeline integration
- Automated preprocessing
- Model evaluation utilities
- Cross-validation support
- Feature selection helpers
Use Cases:
- Clean training data
- Engineer features from JSONL
- Build ML pipelines
- Preprocess for scikit-learn
- Feature transformation
### 5. Composability Module - Functional Pipelines

*Built into the core library: functional programming patterns*
The composability module provides Pipeline classes and functional operators for elegant data transformations.
```python
from ja.compose import Pipeline, Select, Project, Sort

pipeline = (
    Pipeline()
    | Select("age > 25")
    | Project(["name", "email"])
    | Sort("name")
)
results = pipeline.run("data.jsonl")
```
Features:
- Pipeline class for composition
- Unix pipe operator support (`|`)
- Lazy evaluation for large datasets
- Functional helpers (`compose`, `pipe`)
- Operation classes (Select, Project, Sort, etc.)
- Chainable transformations
Use Cases:
- Build reusable pipelines
- Functional data processing
- Lazy evaluation for big data
- Elegant API usage
- Complex transformations
## Comparison Matrix
| Integration | Use Case | Language | Interactive | AI-Ready |
|---|---|---|---|---|
| MCP Server | AI assistant integration | Python | No | ✅ Yes |
| Log Analyzer | Real-time monitoring | Python | Yes (Dashboard) | No |
| Data Explorer | Ad-hoc exploration | Python | Yes (REPL) | No |
| ML Pipeline | Machine learning | Python | No | No |
| Composability | Library usage | Python | No | No |
## Installation

### All Integrations
Install jsonl-algebra with all optional dependencies:
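The exact command depends on the extras defined in the package metadata; assuming an `all` extra that pulls in every optional dependency listed under Dependencies below:

```bash
pip install "jsonl-algebra[all]"
```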
### Individual Integrations
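To install only what a given integration needs, the packages from the Dependencies table below are enough:

```bash
pip install mcp                    # MCP Server
pip install rich                   # Log Analyzer
pip install prompt-toolkit rich    # Data Explorer (add pandas if desired)
pip install scikit-learn pandas    # ML Pipeline
```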
## Common Workflows

### Workflow 1: AI-Powered Data Analysis
Use the MCP server with an AI assistant:
```text
User: "Analyze the sales data and find top products by revenue"

AI: *Uses MCP server*
  1. jsonl_query: "SELECT * FROM sales.jsonl"
  2. jsonl_aggregate: group_by=['product'], agg={'revenue': 'sum'}
  3. jsonl_sort: by='revenue', reverse=true

Result: Top products with revenue displayed
```
### Workflow 2: Real-time Monitoring → Alert → Analysis
```bash
# 1. Monitor logs in real-time
python integrations/log_analyzer.py /var/log/app.log --alert-threshold 10

# 2. When an alert triggers, explore interactively
python integrations/data_explorer.py /var/log/app.log

# 3. Export filtered data for deeper analysis
> SELECT * WHERE level = 'ERROR' AND timestamp > '2025-10-27'
> EXPORT error_logs.jsonl
```
### Workflow 3: ETL with ML Training
```python
from datetime import datetime

from ja.compose import Pipeline, Select, Project
from integrations.ml_pipeline import JSONLFeatureEngine


def extract_hour(ts: str) -> int:
    """Hour-of-day from an ISO-8601 timestamp (helper assumed by this example)."""
    return datetime.fromisoformat(ts).hour


# 1. ETL pipeline: keep completed records, project the fields we need
#    (including the "target" label used in step 3)
pipeline = (
    Pipeline()
    | Select("status == 'complete'")
    | Project(["user_id", "score", "timestamp", "target"])
)
cleaned = pipeline.run("raw_data.jsonl")

# 2. Feature engineering (assuming JSONLFeatureEngine accepts an
#    iterable of records as well as a file path)
engine = JSONLFeatureEngine(cleaned)
engine.add_feature("score_squared", lambda r: r["score"] ** 2)
engine.add_feature("hour", lambda r: extract_hour(r["timestamp"]))

# 3. Prepare for ML
X, y = engine.prepare_features(["score", "score_squared", "hour"], "target")

# 4. Train model
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X, y)
```
## Design Philosophy

All integrations follow these principles:

### 1. Unix Philosophy
Do one thing well, compose easily:
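For example, each operation reads JSONL on stdin and writes JSONL on stdout, so steps chain with ordinary pipes. A sketch; the subcommand names here are illustrative, not confirmed CLI syntax:

```bash
# Filter, trim, and sort in small, independent, composable steps
cat users.jsonl \
  | ja select 'age > 25' \
  | ja project name,email \
  | ja sort name
```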
### 2. Pythonic Patterns
Explicit, simple, readable:
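A sketch using the core functions that also appear in the custom-integration template later on this page:

```python
from ja.core import read_jsonl, select, project

# Plain functions over plain dicts: no hidden state, no magic
records = read_jsonl("users.jsonl")
adults = select(records, "age > 25")
for row in project(adults, ["name", "email"]):
    print(row)
```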
### 3. Streaming First
Handle massive datasets efficiently:
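A sketch assuming the lazy evaluation mode noted for the composability module, so records flow through one at a time instead of being loaded wholesale:

```python
from ja.compose import Pipeline, Select, Project

pipeline = Pipeline() | Select("level == 'ERROR'") | Project(["timestamp", "msg"])

# Assuming run() yields records lazily, memory stays flat
# even on multi-gigabyte inputs.
for record in pipeline.run("huge.jsonl"):
    print(record)
```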
### 4. Real-World Utility
Solve actual problems elegantly:
```bash
# Real monitoring use case
python integrations/log_analyzer.py production.log --alert-email admin@example.com
```
### 5. Comprehensive Documentation
Examples and guides for everything:
- Detailed README for each integration
- Usage examples
- API documentation
- Tutorial walkthroughs
## Creating Your Own Integration
Want to build a custom integration? Here's the pattern:
```python
# integrations/my_custom_tool.py
from typing import Dict, Iterator

from ja.core import read_jsonl, select, project


class MyCustomTool:
    """My custom integration for jsonl-algebra.

    Does something specific and useful.
    """

    def __init__(self, input_file: str):
        self.input_file = input_file

    def process(self) -> Iterator[Dict]:
        """Process data using ja operations."""
        data = read_jsonl(self.input_file)
        # Use core operations
        filtered = select(data, "status == 'active'")
        projected = project(filtered, ["id", "name"])
        return projected

    def run(self):
        """Main entry point."""
        for record in self.process():
            # Do something with each record
            print(record)


if __name__ == "__main__":
    import sys

    tool = MyCustomTool(sys.argv[1])
    tool.run()
```
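Run it against any JSONL file:

```bash
python integrations/my_custom_tool.py data.jsonl
```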
## Integration Checklist
- Clear, focused purpose
- Uses ja core operations
- Handles streaming data
- Comprehensive docstrings
- Usage examples
- Error handling
- Tests
- README documentation
## Testing Integrations
Each integration has tests:
```bash
# Run all integration tests
pytest integrations/

# Run specific integration tests
pytest integrations/test_mcp_minimal.py
pytest integrations/test_log_analyzer.py
```
### Test Coverage
## Dependencies

### Core Dependencies
All integrations require:
- Python 3.8+
- jsonl-algebra
### Optional Dependencies
| Integration | Requires |
|---|---|
| MCP Server | mcp |
| Log Analyzer | rich |
| Data Explorer | prompt-toolkit, rich, pandas (optional) |
| ML Pipeline | scikit-learn, pandas |
### Installing All
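As with the installation section above, assuming the package defines an `all` extra:

```bash
pip install "jsonl-algebra[all]"
```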
## Performance Considerations

### Memory Usage
| Integration | Memory Pattern | Best For |
|---|---|---|
| MCP Server | Per-request | Small to medium datasets |
| Log Analyzer | Sliding window | Streaming data |
| Data Explorer | Buffered | Interactive exploration |
| ML Pipeline | Dataset-sized | Training data preparation |
| Composability | Configurable (lazy/eager) | Any size |
### Optimization Tips
- Use lazy pipelines for large datasets
- Filter early to reduce data size (see the sketch after this list)
- Adjust cache sizes for your workload
- Monitor memory with large aggregations
- Stream when possible instead of buffering
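For instance, with the composability API, placing `Select` before `Project` means later stages only ever see matching records. A sketch; both pipelines produce the same output:

```python
from ja.compose import Pipeline, Select, Project

# Good: filter first, so downstream stages touch only matching records
fast = Pipeline() | Select("status == 'active'") | Project(["id", "name"])

# Slower: projects every record, including the ones Select then discards
slow = Pipeline() | Project(["id", "name", "status"]) | Select("status == 'active'")
```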
## Troubleshooting

### Common Issues
**Import Error**

Solution: `pip install mcp`

**Memory Error with Large Files**

Solution: Use lazy evaluation or streaming mode

**MCP Server Not Starting**

Solution: Check MCP SDK installation and configuration

## Community Integrations
Have you built an integration? Share it!
- Create a PR to add it to `integrations/`
- Follow the integration checklist
- Add documentation
- Include tests
## Next Steps
Explore each integration in detail:
- MCP Server Guide - AI assistant integration
- Log Analyzer Guide - Real-time monitoring
- Data Explorer Guide - Interactive REPL
- ML Pipeline Guide - Machine learning
- Composability API - Functional patterns
Or try a tutorial: