Unix Philosophy & Pipes

FuzzyInfer embraces the Unix philosophy: small, composable tools that do one thing well and work together through text streams.

The Unix Way

Core Principles

  1. Do One Thing Well: FuzzyInfer does fuzzy inference, nothing more
  2. Text Streams: JSONL format for universal compatibility
  3. Composable: Chain with any tool that speaks JSON
  4. No Surprises: Predictable behavior, standard I/O

Why JSONL?

JSON Lines (JSONL) is perfect for streaming:

  • One object per line: Easy to process line-by-line
  • Streaming-friendly: No need to load entire file
  • Self-describing: Each line is valid JSON
  • Tool-compatible: Works with jq, grep, awk, etc.

Example JSONL:

{"type":"fact","pred":"is-bird","args":["robin"],"deg":0.9}
{"type":"fact","pred":"is-bird","args":["eagle"],"deg":1.0}
{"type":"rule","name":"birds-fly","cond":[{"pred":"is-bird","args":["?x"]}],"actions":[{"action":"add","fact":{"pred":"can-fly","args":["?x"],"deg":0.9}}]}

Basic Pipelines

stdin → fuzzy-infer → stdout

# Input facts, run inference, output results
cat facts.jsonl | fuzzy-infer run rules.jsonl

Multi-stage Pipeline

# Facts → Inference → Query → Format
cat facts.jsonl | \
  fuzzy-infer run rules.jsonl | \
  fuzzy-infer query "can-fly" | \
  jq -r '.args[0]'

Parallel Execution

# Run inference on multiple files in parallel
ls facts_*.jsonl | \
  xargs -P 4 -I {} bash -c "fuzzy-infer run rules.jsonl {} > results_{}"

Integration Patterns

With jq (JSON Processing)

Filter Results

# Get high-confidence facts
fuzzy-infer facts < kb.jsonl | \
  jq 'select(.deg > 0.9)'

Transform Data

# Convert to different format
fuzzy-infer query "is-mammal" < kb.jsonl | \
  jq '{animal: .args[0], confidence: .deg}'

Aggregate Results

# Count facts by predicate
fuzzy-infer facts < kb.jsonl | \
  jq -r '.pred' | sort | uniq -c | sort -rn

Complex Queries

# Get top 5 high-confidence mammals
fuzzy-infer query "is-mammal" < kb.jsonl | \
  jq -s 'sort_by(.deg) | reverse | .[0:5]'

With grep (Pattern Matching)

Find Facts

# Find all bird-related facts
fuzzy-infer facts < kb.jsonl | grep '"pred":"is-bird"'

# Case-insensitive search
fuzzy-infer facts < kb.jsonl | grep -i "mammal"

Filter Rules

# Find rules mentioning specific predicate
grep '"pred":"temperature"' rules.jsonl

Exclude Patterns

# Remove low-confidence facts
fuzzy-infer facts < kb.jsonl | grep -v '"deg":0\.[0-5]'

With awk (Text Processing)

Extract Fields

# Extract predicates
fuzzy-infer facts < kb.jsonl | \
  awk -F'"pred":"' '{print $2}' | awk -F'"' '{print $1}'

Calculate Statistics

# Average degree
fuzzy-infer facts < kb.jsonl | \
  awk -F'"deg":' '{sum+=$2} END {print "Average:", sum/NR}'

Conditional Processing

# Keep only facts with degree above 0.8 ($2+0 forces a numeric comparison)
fuzzy-infer facts < kb.jsonl | \
  awk -F'"deg":' '$2+0 > 0.8 {print}'

With sed (Stream Editing)

Modify Degrees

# Boost all degrees by 10% (demo only - sed can only edit text, so this emits the
# literal expression "deg":0.9*1.1 rather than a computed value; see the jq sketch below)
fuzzy-infer facts < kb.jsonl | \
  sed 's/"deg":\([0-9.]*\)/"deg":\1*1.1/g'

Replace Predicates

# Rename predicate (be careful!)
sed 's/"pred":"old-name"/"pred":"new-name"/g' kb.jsonl

With sort & uniq

Unique Predicates

# List all predicates
fuzzy-infer facts < kb.jsonl | \
  jq -r '.pred' | sort -u

Count Duplicates

# Count how many times each animal appears
fuzzy-infer facts < kb.jsonl | \
  jq -r '.args[0]' | sort | uniq -c | sort -rn

Sort by Degree

# Sort facts by confidence (requires jq)
fuzzy-infer facts < kb.jsonl | \
  jq -s 'sort_by(.deg) | reverse | .[]'

Real-World Pipelines

IoT Sensor Processing

#!/bin/bash
# Process streaming sensor data

# Tail sensor log → Convert to JSONL → Inference → Alerts
tail -f /var/log/sensors.log | \
  ./log_to_jsonl.py | \
  fuzzy-infer run sensor_rules.jsonl | \
  fuzzy-infer query "alert" --min-degree 0.8 | \
  while IFS= read -r alert; do
    sensor=$(echo "$alert" | jq -r '.args[0]')
    confidence=$(echo "$alert" | jq -r '.deg')
    echo "[$(date)] ALERT: Sensor $sensor (confidence: $confidence)"
    # Send notification
    ./send_alert.sh "$sensor" "$confidence"
  done

Log Analysis

#!/bin/bash
# Analyze application logs for anomalies

# Parse logs → Extract features → Classify → Report
cat app.log | \
  ./parse_log.py | \
  fuzzy-infer run anomaly_rules.jsonl | \
  fuzzy-infer query "anomaly" --min-degree 0.7 | \
  jq -r '"[\(.args[1])] Anomaly in \(.args[0]): \(.deg * 100)%"' | \
  tee anomalies.txt | \
  mail -s "Anomaly Report" ops@example.com

Data Classification

#!/bin/bash
# Classify and categorize documents

# For each document
for doc in documents/*.txt; do
  # Extract features → Classify → Format output
  ./extract_features.py "$doc" | \
    fuzzy-infer run classification_rules.jsonl | \
    fuzzy-infer query "category" | \
    jq -s --arg file "$doc" \
      'map({file: $file, category: .args[1], confidence: .deg}) |
       sort_by(.confidence) | reverse | .[0]'
done | jq -s '.' > classification_results.json

# Summarize results
jq 'group_by(.category) |
    map({category: .[0].category, count: length})' \
  classification_results.json

Knowledge Base Merging

#!/bin/bash
# Merge multiple knowledge bases

# Combine sources → Deduplicate → Run inference → Validate
cat kb1.jsonl kb2.jsonl kb3.jsonl | \
  jq -s 'unique_by(.pred, .args) | .[]' | \
  fuzzy-infer run consolidation_rules.jsonl | \
  fuzzy-infer validate --strict | \
  tee merged_kb.jsonl | \
  fuzzy-infer facts --format json | \
  jq -s 'length' | \
  xargs -I {} echo "Merged KB contains {} facts"

Streaming Analytics

#!/bin/bash
# Real-time analytics pipeline

# Stream data → Window → Aggregate → Inference → Dashboard
mkfifo /tmp/stream

# Producer: Generate events
./event_generator.py > /tmp/stream &

# Consumer: Process stream
cat /tmp/stream | \
  ./window.py --size 100 | \
  ./aggregate.py | \
  fuzzy-infer run analytics_rules.jsonl | \
  fuzzy-infer query "metric" | \
  ./dashboard_update.py

Advanced Patterns

Conditional Execution

# Run inference only if conditions met
if fuzzy-infer facts < kb.jsonl | jq 'select(.pred=="ready")' | grep -q "ready"; then
  fuzzy-infer run complex_rules.jsonl kb.jsonl
else
  echo "Prerequisites not met"
fi

Error Handling

# Robust pipeline with error handling
fuzzy-infer validate input.jsonl && \
  fuzzy-infer run rules.jsonl input.jsonl 2>errors.log | \
  fuzzy-infer facts --format json > results.json || \
  { echo "Pipeline failed" >&2; cat errors.log; exit 1; }

Parallel Processing

# Split → Process in parallel → Merge
split -l 1000 huge_facts.jsonl chunk_
ls chunk_* | \
  parallel -j 8 "fuzzy-infer run rules.jsonl {} > {}.result"
cat chunk_*.result > final_results.jsonl
rm chunk_*

Incremental Updates

# Process only new facts
comm -13 <(sort old_kb.jsonl) <(sort new_kb.jsonl) | \
  fuzzy-infer run rules.jsonl | \
  cat old_results.jsonl - | \
  jq -s 'unique_by(.pred, .args) | .[]' > updated_results.jsonl

Monitoring & Metrics

# Monitor inference performance
while true; do
  start=$(date +%s)
  fuzzy-infer run rules.jsonl facts.jsonl > /dev/null
  end=$(date +%s)
  duration=$((end - start))
  echo "[$(date)] Inference took ${duration}s" >> performance.log
  sleep 60
done

Integration with Other Tools

Database Export

# PostgreSQL → JSONL → Inference → Import back
psql -t -A -F"," -c "SELECT * FROM observations" | \
  ./csv_to_jsonl.py | \
  fuzzy-infer run inference_rules.jsonl | \
  ./jsonl_to_csv.py | \
  psql -c "COPY inferences FROM STDIN CSV"

API Integration

# Fetch from API → Inference → POST results
curl -s https://api.example.com/sensors | \
  jq -c '.[]' | \
  fuzzy-infer run rules.jsonl | \
  fuzzy-infer query "action" --min-degree 0.8 | \
  while IFS= read -r action; do
    curl -X POST https://api.example.com/actions \
      -H "Content-Type: application/json" \
      -d "$action"
  done

Message Queue Processing

# Consume from Kafka → Inference → Produce to output topic
# (--bootstrap-server is required; localhost:9092 is a placeholder for your cluster)
kafka-console-consumer --bootstrap-server localhost:9092 --topic input | \
  fuzzy-infer run rules.jsonl | \
  kafka-console-producer --bootstrap-server localhost:9092 --topic output

Cloud Storage

# S3 → Inference → S3
aws s3 cp s3://bucket/facts.jsonl - | \
  fuzzy-infer run rules.jsonl | \
  aws s3 cp - s3://bucket/results.jsonl

Performance Optimization

Streaming vs. Batch

# Streaming: Process as data arrives
tail -f stream.jsonl | fuzzy-infer run rules.jsonl

# Batch: Process all at once
fuzzy-infer run rules.jsonl large_dataset.jsonl

Filtering Early

# Good: Filter before inference
grep "high-priority" facts.jsonl | fuzzy-infer run rules.jsonl

# Less efficient: Filter after inference
fuzzy-infer run rules.jsonl facts.jsonl | grep "high-priority"

Compression

# Compress large outputs
fuzzy-infer run rules.jsonl facts.jsonl | gzip > results.jsonl.gz

# Process compressed inputs
zcat facts.jsonl.gz | fuzzy-infer run rules.jsonl

Caching

# Cache intermediate results
if [ ! -f inference_cache.jsonl ]; then
  fuzzy-infer run expensive_rules.jsonl facts.jsonl | \
    tee inference_cache.jsonl
else
  cat inference_cache.jsonl
fi | fuzzy-infer query "result"
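
The cache above is rebuilt only when the file is missing. A common refinement is to also rebuild when the inputs are newer than the cache, sketched here with bash's -nt test:

# Rebuild when facts or rules are newer than the cache (or the cache does not exist)
if [ facts.jsonl -nt inference_cache.jsonl ] || [ expensive_rules.jsonl -nt inference_cache.jsonl ]; then
  fuzzy-infer run expensive_rules.jsonl facts.jsonl > inference_cache.jsonl
fi
fuzzy-infer query "result" < inference_cache.jsonl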

Debugging Pipelines

Inspect Intermediate Stages

# Tee to inspect intermediate results
cat facts.jsonl | \
  tee facts_input.log | \
  fuzzy-infer run rules.jsonl | \
  tee inference_output.log | \
  fuzzy-infer query "result" | \
  tee query_output.log

Count Records at Each Stage

# Count facts at each stage
echo "Input: $(cat facts.jsonl | wc -l)"
echo "After inference: $(fuzzy-infer run rules.jsonl facts.jsonl | wc -l)"
echo "Query results: $(fuzzy-infer run rules.jsonl facts.jsonl | fuzzy-infer query "result" | wc -l)"

Validate Pipeline

# Validate each stage
cat facts.jsonl | fuzzy-infer validate --strict && \
  fuzzy-infer validate --strict rules.jsonl && \
  fuzzy-infer run rules.jsonl facts.jsonl | fuzzy-infer validate --strict

Best Practices

1. Validate Inputs

# Always validate before processing
fuzzy-infer validate input.jsonl || exit 1

2. Use Pipelines for Complex Workflows

# Break complex tasks into stages
cat data.jsonl | \
  stage1 | \
  stage2 | \
  stage3

3. Handle Errors Gracefully

# Check exit codes
if ! fuzzy-infer run rules.jsonl facts.jsonl > results.jsonl; then
  echo "Inference failed" >&2
  exit 1
fi

4. Log Important Steps

# Log with timestamps
echo "[$(date)] Starting inference" >> pipeline.log
fuzzy-infer run rules.jsonl facts.jsonl 2>> pipeline.log
echo "[$(date)] Inference complete" >> pipeline.log

5. Use Version Control

# Track rules and pipelines
git add rules.jsonl pipeline.sh
git commit -m "Update inference pipeline"

Shell Scripts

Complete Pipeline Script

#!/bin/bash
set -euo pipefail

# Configuration
RULES="rules.jsonl"
INPUT="${1:-facts.jsonl}"
OUTPUT="${2:-results.jsonl}"
LOG="pipeline.log"

# Functions
log() {
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG"
}

error() {
    echo "[ERROR] $*" >&2
    exit 1
}

# Validate inputs
log "Validating inputs"
fuzzy-infer validate "$RULES" || error "Invalid rules file"
fuzzy-infer validate "$INPUT" || error "Invalid facts file"

# Run inference
log "Running inference"
if ! fuzzy-infer run "$RULES" "$INPUT" > "$OUTPUT" 2>> "$LOG"; then
    error "Inference failed"
fi

# Generate report
log "Generating report"
TOTAL=$(fuzzy-infer facts < "$OUTPUT" | wc -l)
HIGH_CONF=$(fuzzy-infer facts < "$OUTPUT" | jq -c 'select(.deg > 0.9)' | wc -l)

log "Results: $TOTAL facts ($HIGH_CONF high-confidence)"
log "Pipeline complete"

Summary

  • FuzzyInfer follows Unix philosophy: simple, composable, text-based
  • JSONL format enables streaming and tool integration
  • Pipe with jq, grep, awk, sed for powerful workflows
  • Build complex pipelines by chaining simple operations
  • Real-world applications: IoT, log analysis, classification
  • Best practices: validate inputs, handle errors, log steps

Next: Examples - Practical code examples