# Tutorial Quick Reference

Quick command reference for the Distributed Data Pipeline Tutorial.

## Basic Commands

### Submit Job
```bash
tar czf job.tar.gz <files>

curl -X POST http://localhost:8443/submit \
  -F "files=@job.tar.gz" \
  -F "manifest=$(cat job.json)"
```
### Check Status
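A minimal check, assuming the status route follows the `/status/<job_id>` path used in the Useful Aliases section:

```bash
# Replace job-abc123 with the job ID returned by /submit
curl -s http://localhost:8443/status/job-abc123 | jq
```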
### Get Logs
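Assuming the `/logs/<job_id>` path used in the aliases and the streaming URL below:

```bash
curl http://localhost:8443/logs/job-abc123
```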
### Download Output
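No download route appears elsewhere on this page, so the `/outputs/<job_id>` path below is a placeholder; check the API Reference for the real endpoint:

```bash
# Hypothetical route; substitute the actual output endpoint from the API Reference
curl -o output.tar.gz http://localhost:8443/outputs/job-abc123
```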
## Job Manifest Templates
### Basic Python Job
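A minimal manifest sketch, assuming only the fields common to the templates below are required:

```json
{
  "entrypoint": "main.py",
  "interpreter": "python3",
  "outputs": ["results/"]
}
```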
### With Dependencies
```json
{
  "entrypoint": "main.py",
  "interpreter": "python3",
  "requirements": "requirements.txt",
  "outputs": ["results/", "*.png"]
}
```
### With Pre-Built Environment
```json
{
  "entrypoint": "main.py",
  "interpreter": "python3",
  "environment": "ml-basic",
  "outputs": ["report.txt"]
}
```
### Full Configuration
```json
{
  "entrypoint": "main.py",
  "interpreter": "python3",
  "environment": "ml-basic",
  "args": ["--input", "data.csv"],
  "outputs": ["results/", "*.json"],
  "timeout": 600,
  "memory_mb": 1024
}
```
## Pool Commands
### Check Pool Status
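From the `pool-status` alias in the Useful Aliases section:

```bash
curl -s http://localhost:9000/pool | jq
```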
### Submit to Pool
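The coordinator URL comes from the `pool-submit` alias; the form fields below assume the same payload as a single-node submit:

```bash
curl -X POST http://localhost:9000/submit \
  -F "files=@job.tar.gz" \
  -F "manifest=$(cat job.json)"
```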
### Get Pool Job Status
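A pool status route is not shown elsewhere on this page; the path below mirrors the single-node `/status/` route and is only a placeholder:

```bash
# Hypothetical route; confirm against the API Reference
curl -s http://localhost:9000/status/job-abc123 | jq
```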
## WebSocket Streaming

### Python Client
```python
import asyncio
import websockets

async def stream_logs(job_id):
    uri = f"ws://localhost:8443/logs/{job_id}/stream"
    async with websockets.connect(uri) as ws:
        async for message in ws:
            print(message, end='')

asyncio.run(stream_logs("job-abc123"))
```
### JavaScript Client
```javascript
const ws = new WebSocket('ws://localhost:8443/logs/job-abc123/stream');
ws.onmessage = (event) => console.log(event.data);
```
## Worker Setup

### Generate Key

### Start Worker

### Start Pool Coordinator
## Useful Aliases

Add these to your `~/.bashrc`:
```bash
# Sandrun helpers (shell functions are used where an argument is needed,
# since bash aliases cannot take a "$1" parameter)
alias sandrun-submit='curl -X POST http://localhost:8443/submit'
sandrun-status() { curl -s "http://localhost:8443/status/$1"; }
sandrun-logs() { curl "http://localhost:8443/logs/$1"; }

# With jq formatting
ss() { sandrun-status "$1" | jq; }
sl() { sandrun-logs "$1"; }

# Pool helpers
alias pool-status='curl -s http://localhost:9000/pool | jq'
alias pool-submit='curl -X POST http://localhost:9000/submit'
```
## Common Patterns

### Extract Job ID
```bash
RESPONSE=$(curl -s -X POST http://localhost:8443/submit ...)
JOB_ID=$(echo "$RESPONSE" | jq -r '.job_id')
```
### Wait for Completion
```bash
while true; do
  STATUS=$(curl -s "http://localhost:8443/status/$JOB_ID" | jq -r '.status')
  [ "$STATUS" = "completed" ] && break
  sleep 2
done
```
### Batch Submit
```bash
for file in data/*.csv; do
  # Create a job for each file; use the base name so the archive
  # is not written to a non-existent "job_data/" path
  name=$(basename "$file" .csv)
  tar czf "job_${name}.tar.gz" "$file" process.py job.json
  curl -X POST http://localhost:8443/submit \
    -F "files=@job_${name}.tar.gz" \
    -F "manifest=$(cat job.json)"
done
```
## Environments

| Name | Packages | Use Case |
|---|---|---|
| `ml-basic` | NumPy, Pandas, Scikit-learn | Traditional ML |
| `vision` | PyTorch, OpenCV, Pillow | Computer vision |
| `nlp` | PyTorch, Transformers | NLP/LLMs |
| `data-science` | Pandas, Matplotlib, Jupyter | Data analysis |
| `scientific` | NumPy, SciPy, SymPy | Scientific computing |
## Troubleshooting
### Check Server Health
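No health route is documented on this page; the `/health` path below is a placeholder, and any endpoint that returns an HTTP status code works as a reachability check:

```bash
# Hypothetical route; any HTTP response at all means the server is reachable
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8443/health
```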
### Check Quota

### Verify Worker Identity
### List Available Environments
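An environments route is not documented on this page, so the path below is a placeholder; the table in the Environments section lists the names used in this tutorial:

```bash
# Hypothetical route; confirm against the API Reference
curl -s http://localhost:8443/environments | jq
```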
## Performance Tips
- Use pre-built environments - 20x faster than pip install
- Specify outputs - Only download what you need
- Increase memory - For large datasets (up to 2048MB)
- Distribute work - Use pool for parallel processing
- Stream logs - Monitor long-running jobs in real-time
## Security

### Verify Result Signature
```python
import base64
import json

from cryptography.hazmat.primitives.asymmetric import ed25519

def verify(pubkey_b64, signed_data, signature_b64):
    """Raise cryptography.exceptions.InvalidSignature if verification fails."""
    pubkey_bytes = base64.b64decode(pubkey_b64)
    pubkey = ed25519.Ed25519PublicKey.from_public_bytes(pubkey_bytes)
    signature = base64.b64decode(signature_b64)
    # The canonical JSON encoding must match what the worker signed
    message = json.dumps(signed_data, sort_keys=True).encode()
    pubkey.verify(signature, message)
```
## Resources
- Full Tutorial - Complete walkthrough
- API Reference - Endpoint documentation
- Job Manifest - Configuration options
- Troubleshooting - Common issues
Print this page for quick reference during the tutorial!