Unix Philosophy & Pipes¶
FuzzyInfer embraces the Unix philosophy: small, composable tools that do one thing well and work together through text streams.
The Unix Way¶
Core Principles¶
- Do One Thing Well: FuzzyInfer does fuzzy inference, nothing more
- Text Streams: JSONL format for universal compatibility
- Composable: Chain with any tool that speaks JSON
- No Surprises: Predictable behavior, standard I/O
Why JSONL?¶
JSON Lines (JSONL) is perfect for streaming:
- One object per line: Easy to process line-by-line
- Streaming-friendly: No need to load entire file
- Self-describing: Each line is valid JSON
- Tool-compatible: Works with jq, grep, awk, etc.
Example JSONL:
{"type":"fact","pred":"is-bird","args":["robin"],"deg":0.9}
{"type":"fact","pred":"is-bird","args":["eagle"],"deg":1.0}
{"type":"rule","name":"birds-fly","cond":[{"pred":"is-bird","args":["?x"]}],"actions":[{"action":"add","fact":{"pred":"can-fly","args":["?x"],"deg":0.9}}]}
Basic Pipelines¶
stdin → fuzzy-infer → stdout¶
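The simplest pipeline reads facts on stdin, applies a rule file, and writes derived facts on stdout; a minimal sketch using the run subcommand shown throughout this page:
# Facts on stdin → derived facts on stdout
cat facts.jsonl | fuzzy-infer run rules.jsonl > results.jsonl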
Multi-stage Pipeline¶
# Facts → Inference → Query → Format
cat facts.jsonl | \
fuzzy-infer run rules.jsonl | \
fuzzy-infer query "can-fly" | \
jq -r '.args[0]'
Parallel Execution¶
# Run inference on multiple files in parallel
ls facts_*.jsonl | \
xargs -P 4 -I {} bash -c "fuzzy-infer run rules.jsonl {} > results_{}"
Integration Patterns¶
With jq (JSON Processing)¶
Filter Results¶
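A jq select keeps only high-confidence results (the 0.8 threshold here is arbitrary):
# Keep only results with degree above 0.8
fuzzy-infer query "is-mammal" < kb.jsonl | \
jq 'select(.deg > 0.8)'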
Transform Data¶
# Convert to different format
fuzzy-infer query "is-mammal" < kb.jsonl | \
jq '{animal: .args[0], confidence: .deg}'
Aggregate Results¶
# Count facts by predicate
fuzzy-infer facts < kb.jsonl | \
jq -r '.pred' | sort | uniq -c | sort -rn
Complex Queries¶
# Get top 5 high-confidence mammals
fuzzy-infer query "is-mammal" < kb.jsonl | \
jq -s 'sort_by(.deg) | reverse | .[0:5]'
With grep (Pattern Matching)¶
Find Facts¶
# Find all bird-related facts
fuzzy-infer facts < kb.jsonl | grep '"pred":"is-bird"'
# Case-insensitive search
fuzzy-infer facts < kb.jsonl | grep -i "mammal"
Filter Rules¶
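Because each record carries a type field (see the example JSONL above), grep can pick out rules directly; the rule name below is the illustrative birds-fly rule:
# Show only rule records, then narrow to a specific rule
grep '"type":"rule"' kb.jsonl | grep '"name":"birds-fly"'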
Exclude Patterns¶
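grep -v inverts the match, which is handy for dropping facts you do not want to feed downstream:
# Everything except bird facts
fuzzy-infer facts < kb.jsonl | grep -v '"pred":"is-bird"'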
With awk (Text Processing)¶
Extract Fields¶
# Extract predicates
fuzzy-infer facts < kb.jsonl | \
awk -F'"pred":"' '{print $2}' | awk -F'"' '{print $1}'
Calculate Statistics¶
# Average degree
fuzzy-infer facts < kb.jsonl | \
awk -F'"deg":' '{sum+=$2} END {print "Average:", sum/NR}'
Conditional Processing¶
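awk can also act as a filter; a rough sketch that prints only facts whose degree exceeds 0.8, relying on the flat JSONL layout shown above:
# Print only facts with degree > 0.8
fuzzy-infer facts < kb.jsonl | \
awk -F'"deg":' '$2+0 > 0.8'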
With sed (Stream Editing)¶
Modify Degrees¶
# Reset every degree to 1.0 (sed only does text substitution and cannot do arithmetic — see the jq sketch below)
fuzzy-infer facts < kb.jsonl | \
sed 's/"deg":[0-9.]*/"deg":1.0/g'
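For real arithmetic on degrees, jq is the better tool; a sketch that boosts each degree by 10% and caps it at 1.0:
# Boost degrees by 10%, capped at 1.0
fuzzy-infer facts < kb.jsonl | \
jq -c '.deg = ([.deg * 1.1, 1.0] | min)'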
Replace Predicates¶
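A plain text substitution works for renaming predicates, since each fact sits on its own line (the old and new names here are illustrative):
# Rename the is-bird predicate to bird
fuzzy-infer facts < kb.jsonl | \
sed 's/"pred":"is-bird"/"pred":"bird"/g'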
With sort & uniq¶
Unique Predicates¶
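To list each predicate once, extract the field with jq and deduplicate with sort:
# List distinct predicates
fuzzy-infer facts < kb.jsonl | jq -r '.pred' | sort -u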
Count Duplicates¶
# Count how many times each animal appears
fuzzy-infer facts < kb.jsonl | \
jq -r '.args[0]' | sort | uniq -c | sort -rn
Sort by Degree¶
# Sort facts by confidence (requires jq)
fuzzy-infer facts < kb.jsonl | \
jq -s 'sort_by(.deg) | reverse | .[]'
Real-World Pipelines¶
IoT Sensor Processing¶
#!/bin/bash
# Process streaming sensor data
# Tail sensor log → Convert to JSONL → Inference → Alerts
tail -f /var/log/sensors.log | \
./log_to_jsonl.py | \
fuzzy-infer run sensor_rules.jsonl | \
fuzzy-infer query "alert" --min-degree 0.8 | \
while IFS= read -r alert; do
  sensor=$(echo "$alert" | jq -r '.args[0]')
  confidence=$(echo "$alert" | jq -r '.deg')
  echo "[$(date)] ALERT: Sensor $sensor (confidence: $confidence)"
  # Send notification
  ./send_alert.sh "$sensor" "$confidence"
done
Log Analysis¶
#!/bin/bash
# Analyze application logs for anomalies
# Parse logs → Extract features → Classify → Report
cat app.log | \
./parse_log.py | \
fuzzy-infer run anomaly_rules.jsonl | \
fuzzy-infer query "anomaly" --min-degree 0.7 | \
jq -r '"[\(.args[1])] Anomaly in \(.args[0]): \(.deg * 100)%"' | \
tee anomalies.txt | \
mail -s "Anomaly Report" ops@example.com
Data Classification¶
#!/bin/bash
# Classify and categorize documents
# For each document
for doc in documents/*.txt; do
  # Extract features → Classify → Format output
  ./extract_features.py "$doc" | \
  fuzzy-infer run classification_rules.jsonl | \
  fuzzy-infer query "category" | \
  jq -s --arg file "$doc" \
    'map({file: $file, category: .args[1], confidence: .deg}) |
     sort_by(.confidence) | reverse | .[0]'
done | jq -s '.' > classification_results.json
# Summarize results
jq 'group_by(.category) |
map({category: .[0].category, count: length})' \
classification_results.json
Knowledge Base Merging¶
#!/bin/bash
# Merge multiple knowledge bases
# Combine sources → Deduplicate → Run inference → Validate
cat kb1.jsonl kb2.jsonl kb3.jsonl | \
jq -s 'unique_by(.pred, .args) | .[]' | \
fuzzy-infer run consolidation_rules.jsonl | \
fuzzy-infer validate --strict | \
tee merged_kb.jsonl | \
fuzzy-infer facts --format json | \
jq -s 'length' | \
xargs -I {} echo "Merged KB contains {} facts"
Streaming Analytics¶
#!/bin/bash
# Real-time analytics pipeline
# Stream data → Window → Aggregate → Inference → Dashboard
mkfifo /tmp/stream
# Producer: Generate events
./event_generator.py > /tmp/stream &
# Consumer: Process stream
cat /tmp/stream | \
./window.py --size 100 | \
./aggregate.py | \
fuzzy-infer run analytics_rules.jsonl | \
fuzzy-infer query "metric" | \
./dashboard_update.py
Advanced Patterns¶
Conditional Execution¶
# Run inference only if conditions met
if fuzzy-infer facts < kb.jsonl | jq 'select(.pred=="ready")' | grep -q "ready"; then
  fuzzy-infer run complex_rules.jsonl kb.jsonl
else
  echo "Prerequisites not met"
fi
Error Handling¶
# Robust pipeline with error handling (pipefail makes a mid-pipeline failure trip the || handler)
set -o pipefail
fuzzy-infer validate input.jsonl && \
fuzzy-infer run rules.jsonl input.jsonl 2>errors.log | \
fuzzy-infer facts --format json > results.json || \
{ echo "Pipeline failed" >&2; cat errors.log >&2; exit 1; }
Parallel Processing¶
# Split → Process in parallel → Merge
split -l 1000 huge_facts.jsonl chunk_
ls chunk_* | \
parallel -j 8 "fuzzy-infer run rules.jsonl {} > {}.result"
cat chunk_*.result > final_results.jsonl
rm chunk_*
Incremental Updates¶
# Process only new facts
comm -13 <(sort old_kb.jsonl) <(sort new_kb.jsonl) | \
fuzzy-infer run rules.jsonl | \
cat old_results.jsonl - | \
jq -s 'unique_by(.pred, .args) | .[]' > updated_results.jsonl
Monitoring & Metrics¶
# Monitor inference performance
while true; do
  start=$(date +%s)
  fuzzy-infer run rules.jsonl facts.jsonl > /dev/null
  end=$(date +%s)
  duration=$((end - start))
  echo "[$(date)] Inference took ${duration}s" >> performance.log
  sleep 60
done
Integration with Other Tools¶
Database Export¶
# PostgreSQL → JSONL → Inference → Import back
psql -t -A -F"," -c "SELECT * FROM observations" | \
./csv_to_jsonl.py | \
fuzzy-infer run inference_rules.jsonl | \
./jsonl_to_csv.py | \
psql -c "COPY inferences FROM STDIN CSV"
API Integration¶
# Fetch from API → Inference → POST results
curl -s https://api.example.com/sensors | \
jq -c '.[]' | \
fuzzy-infer run rules.jsonl | \
fuzzy-infer query "action" --min-degree 0.8 | \
while IFS= read -r action; do
  curl -X POST https://api.example.com/actions \
    -H "Content-Type: application/json" \
    -d "$action"
done
Message Queue Processing¶
# Consume from Kafka → Inference → Produce to output topic
kafka-console-consumer --bootstrap-server localhost:9092 --topic input | \
fuzzy-infer run rules.jsonl | \
kafka-console-producer --bootstrap-server localhost:9092 --topic output
Cloud Storage¶
# S3 → Inference → S3
aws s3 cp s3://bucket/facts.jsonl - | \
fuzzy-infer run rules.jsonl | \
aws s3 cp - s3://bucket/results.jsonl
Performance Optimization¶
Streaming vs. Batch¶
# Streaming: Process as data arrives
tail -f stream.jsonl | fuzzy-infer run rules.jsonl
# Batch: Process all at once
fuzzy-infer run rules.jsonl large_dataset.jsonl
Filtering Early¶
# Good: Filter before inference
grep "high-priority" facts.jsonl | fuzzy-infer run rules.jsonl
# Less efficient: Filter after inference
fuzzy-infer run rules.jsonl facts.jsonl | grep "high-priority"
Compression¶
# Compress large outputs
fuzzy-infer run rules.jsonl facts.jsonl | gzip > results.jsonl.gz
# Process compressed inputs
zcat facts.jsonl.gz | fuzzy-infer run rules.jsonl
Caching¶
# Cache intermediate results
if [ ! -f inference_cache.jsonl ]; then
  fuzzy-infer run expensive_rules.jsonl facts.jsonl | \
  tee inference_cache.jsonl
else
  cat inference_cache.jsonl
fi | fuzzy-infer query "result"
Debugging Pipelines¶
Inspect Intermediate Stages¶
# Tee to inspect intermediate results
cat facts.jsonl | \
tee facts_input.log | \
fuzzy-infer run rules.jsonl | \
tee inference_output.log | \
fuzzy-infer query "result" | \
tee query_output.log
Count Records at Each Stage¶
# Count facts at each stage
echo "Input: $(cat facts.jsonl | wc -l)"
echo "After inference: $(fuzzy-infer run rules.jsonl facts.jsonl | wc -l)"
echo "Query results: $(fuzzy-infer run rules.jsonl facts.jsonl | fuzzy-infer query "result" | wc -l)"
Validate Pipeline¶
# Validate each stage
cat facts.jsonl | fuzzy-infer validate --strict && \
fuzzy-infer validate --strict rules.jsonl && \
fuzzy-infer run rules.jsonl facts.jsonl | fuzzy-infer validate --strict
Best Practices¶
1. Validate Inputs¶
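Validate both rules and facts before running a pipeline, as the scripts on this page do:
# Fail fast on malformed input
fuzzy-infer validate rules.jsonl || exit 1
fuzzy-infer validate facts.jsonl || exit 1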
2. Use Pipelines for Complex Workflows¶
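Prefer small chained stages over one monolithic script; each stage can be tested and inspected on its own. A sketch built from commands used elsewhere on this page:
# Each stage does one job and can be debugged in isolation
fuzzy-infer run rules.jsonl facts.jsonl | \
fuzzy-infer query "result" --min-degree 0.7 | \
jq -r '.args[0]' | sort -u > report.txt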
3. Handle Errors Gracefully¶
# Check exit codes
if ! fuzzy-infer run rules.jsonl facts.jsonl > results.jsonl; then
  echo "Inference failed" >&2
  exit 1
fi
4. Log Important Steps¶
# Log with timestamps
echo "[$(date)] Starting inference" >> pipeline.log
fuzzy-infer run rules.jsonl facts.jsonl 2>> pipeline.log
echo "[$(date)] Inference complete" >> pipeline.log
5. Use Version Control¶
# Track rules and pipelines
git add rules.jsonl pipeline.sh
git commit -m "Update inference pipeline"
Shell Scripts¶
Complete Pipeline Script¶
#!/bin/bash
set -euo pipefail
# Configuration
RULES="rules.jsonl"
INPUT="${1:-facts.jsonl}"
OUTPUT="${2:-results.jsonl}"
LOG="pipeline.log"
# Functions
log() {
  echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG"
}
error() {
  echo "[ERROR] $*" >&2
  exit 1
}
# Validate inputs
log "Validating inputs"
fuzzy-infer validate "$RULES" || error "Invalid rules file"
fuzzy-infer validate "$INPUT" || error "Invalid facts file"
# Run inference
log "Running inference"
if ! fuzzy-infer run "$RULES" "$INPUT" > "$OUTPUT" 2>> "$LOG"; then
  error "Inference failed"
fi
# Generate report
log "Generating report"
TOTAL=$(fuzzy-infer facts < "$OUTPUT" | wc -l)
HIGH_CONF=$(fuzzy-infer facts < "$OUTPUT" | jq -c 'select(.deg > 0.9)' | wc -l)
log "Results: $TOTAL facts ($HIGH_CONF high-confidence)"
log "Pipeline complete"
Summary¶
- FuzzyInfer follows Unix philosophy: simple, composable, text-based
- JSONL format enables streaming and tool integration
- Pipe with jq, grep, awk, and sed for powerful workflows
- Build complex pipelines by chaining simple operations
- Real-world applications: IoT, log analysis, classification
- Best practices: validate inputs, handle errors, log steps
Next: Examples - Practical code examples