Getting Started¶
This guide will walk you through installing Sandrun, running your first job, and understanding the core workflow.
Estimated Time
5-10 minutes to install and run your first job
Prerequisites¶
Before installing Sandrun, ensure your system meets these requirements:
System Requirements¶
| Requirement | Specification |
|---|---|
| Operating System | Linux (Ubuntu 20.04+, Debian 11+, or equivalent) |
| Kernel Version | 4.6+ (for namespace support) |
| RAM | 2GB minimum (4GB+ recommended) |
| Disk Space | 500MB for build artifacts |
| Root Access | Required for namespace creation |
Check Your System¶
# Verify kernel version (should be 4.6+)
uname -r
# Check if namespaces are supported
ls /proc/self/ns/
# Verify seccomp support (a "Seccomp:" line is listed when the kernel supports it)
grep Seccomp /proc/self/status
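The checks above can also be scripted. The following is a minimal Python sketch, not part of Sandrun itself: the helper names `kernel_at_least` and `check_host` are hypothetical, and the 4.6 threshold is taken from the requirements table.

```python
import os
import platform
import re


def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Return True if a `uname -r` style string is at least major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


def check_host() -> dict:
    """Collect the kernel, namespace, and root facts for the current host."""
    ns_dir = "/proc/self/ns"
    return {
        # 4.6 is the minimum kernel version listed in the requirements table
        "kernel_ok": kernel_at_least(platform.release(), 4, 6),
        # Namespace types the running kernel exposes (empty list if unavailable)
        "namespaces": sorted(os.listdir(ns_dir)) if os.path.isdir(ns_dir) else [],
        # Effective UID 0 means the script is running as root
        "is_root": hasattr(os, "geteuid") and os.geteuid() == 0,
    }
```

Calling `check_host()` on the target machine returns a dictionary you can print or assert on before starting the build.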
Root Permissions Required
Sandrun requires root privileges to create Linux namespaces. You'll need to run it with sudo or grant the CAP_SYS_ADMIN capability.
Installation¶
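1. Install Dependencies¶
# Ubuntu/Debian: development libraries used by the build
sudo apt-get update
sudo apt-get install -y libseccomp-dev libcap-dev libssl-dev
# The steps below also need git, cmake, and a compiler toolchain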
2. Clone and Build¶
# Clone repository
git clone https://github.com/yourusername/sandrun.git
cd sandrun
# Configure build
cmake -B build -DCMAKE_BUILD_TYPE=Release
# Build (use -j for parallel compilation)
cmake --build build -j$(nproc)
# Verify build
./build/sandrun --help
Build Output
If the build succeeded, ./build/sandrun --help prints the usage text.
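3. Run Server¶
# Start the server on port 8443 (root is required for namespace creation)
sudo ./build/sandrun --port 8443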
Expected Output:
[INFO] Sandrun v1.0.0 starting...
[INFO] Initializing sandbox environment
[INFO] Loading environment templates
[INFO] Server listening on http://0.0.0.0:8443
[INFO] Press Ctrl+C to stop
Running Without Sudo
For production deployments, you can grant the binary specific capabilities (such as CAP_SYS_ADMIN, for example with setcap) instead of running it as full root.
Your First Job¶
Simple Python Script¶
# Create script
cat > hello.py <<'EOF'
print("Hello from Sandrun!")
import sys
print(f"Python version: {sys.version}")
EOF
# Create manifest
cat > job.json <<'EOF'
{
"entrypoint": "hello.py",
"interpreter": "python3"
}
EOF
# Package files
tar czf job.tar.gz hello.py
# Submit job
curl -X POST http://localhost:8443/submit \
-F "files=@job.tar.gz" \
-F "manifest=$(cat job.json)"
# Response:
# {
# "job_id": "job-abc123def456",
# "status": "queued"
# }
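The same submission can be made from Python. This is a minimal standard-library sketch, assuming the `/submit` endpoint accepts the `files` and `manifest` form fields exactly as in the curl command; the function names (`build_archive`, `encode_multipart`, `submit_job`) are hypothetical helpers, not a Sandrun client API.

```python
import io
import json
import tarfile
import urllib.request
import uuid


def build_archive(files: dict) -> bytes:
    """Pack {name: bytes} into an in-memory .tar.gz, like `tar czf`."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def encode_multipart(manifest: dict, archive: bytes, boundary: str) -> bytes:
    """Build a multipart/form-data body with `files` and `manifest` parts."""
    delimiter = b"--" + boundary.encode("ascii")
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="files"; filename="job.tar.gz"',
        b"Content-Type: application/gzip",
        b"",
        archive,
        delimiter,
        b'Content-Disposition: form-data; name="manifest"',
        b"",
        json.dumps(manifest).encode("utf-8"),
        delimiter + b"--",
        b"",
    ]
    return b"\r\n".join(parts)


def submit_job(base_url: str, manifest: dict, files: dict) -> dict:
    """POST the job to /submit and return the decoded JSON response."""
    boundary = uuid.uuid4().hex
    body = encode_multipart(manifest, build_archive(files), boundary)
    request = urllib.request.Request(
        f"{base_url}/submit",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running locally, `submit_job("http://localhost:8443", {"entrypoint": "hello.py", "interpreter": "python3"}, {"hello.py": b'print("Hello from Sandrun!")\n'})` should return the same kind of response as the curl call. Building the archive in memory avoids leaving temporary tarballs on disk.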
Check Job Status¶
# Get status
curl http://localhost:8443/status/job-abc123def456
# Response:
# {
# "job_id": "job-abc123def456",
# "status": "completed",
# "execution_metadata": {
# "cpu_seconds": 0.05,
# "memory_peak_bytes": 12582912,
# "exit_code": 0
# },
# "output_files": {
# ...
# }
# }
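Because a job starts as "queued", a client usually polls the status endpoint until the job finishes. The sketch below is one way to do that in Python. It treats "completed" and "failed" (the two final statuses shown in this guide) as terminal and assumes any other value means the job is still in progress; `fetch_status` and `wait_for_job` are hypothetical helper names.

```python
import json
import time
import urllib.request

# Final statuses that appear in this guide; anything else (for example a
# hypothetical "running") is treated as "still in progress".
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_status(base_url: str, job_id: str) -> dict:
    """GET /status/<job_id> and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/status/{job_id}") as response:
        return json.load(response)


def wait_for_job(job_id, fetch, interval=1.0, timeout=300.0, sleep=time.sleep):
    """Poll `fetch(job_id)` until the job reaches a terminal status.

    `fetch` is any callable returning the decoded status JSON. Raises
    TimeoutError if the job is still unfinished once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(job_id)
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status.get('status')!r}")
        sleep(interval)
```

A typical call is `wait_for_job(job_id, lambda j: fetch_status("http://localhost:8443", j))`. Passing the fetch function in keeps the loop testable without a running server.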
Get Logs¶
# Get stdout
curl http://localhost:8443/logs/job-abc123def456
# Response:
# Hello from Sandrun!
# Python version: 3.10.12 (main, ...)
Multi-File Projects¶
Upload Entire Directory¶
# Create project
mkdir my-project
cd my-project
cat > main.py <<'EOF'
from utils import greet
greet("World")
EOF
cat > utils.py <<'EOF'
def greet(name):
print(f"Hello, {name}!")
EOF
cat > job.json <<'EOF'
{
"entrypoint": "main.py",
"interpreter": "python3",
"outputs": ["*.txt", "results/"]
}
EOF
# Package and submit
tar czf ../my-project.tar.gz .
cd ..
curl -X POST http://localhost:8443/submit \
-F "files=@my-project.tar.gz" \
-F "manifest=$(cat my-project/job.json)"
Using Environments¶
Sandrun supports pre-built environments with common packages:
cat > job.json <<'EOF'
{
"entrypoint": "ml_script.py",
"interpreter": "python3",
"environment": "ml-basic"
}
EOF
Available environments:
- ml-basic - NumPy, SciPy, pandas
- vision - OpenCV, Pillow, scikit-image
- nlp - NLTK, spaCy, transformers
- data-science - Jupyter, matplotlib, seaborn
- scientific - SymPy, NetworkX, statsmodels
List available environments:
Worker Identity (Optional)¶
Generate a worker identity for signed results:
# Generate key
sudo ./build/sandrun --generate-key /etc/sandrun/worker.pem
# Output:
# ✅ Saved worker key to: /etc/sandrun/worker.pem
# Worker ID: <base64-encoded-public-key>
# Start with identity
sudo ./build/sandrun --port 8443 --worker-key /etc/sandrun/worker.pem
Jobs executed by this worker will include:
{
"worker_metadata": {
"worker_id": "base64-encoded-public-key",
"signature": "base64-encoded-signature"
}
}
Rate Limits¶
Sandrun enforces IP-based rate limits:
- 10 CPU-seconds per minute per IP
- 512MB RAM per job
- 5 minute timeout per job
- 2 concurrent jobs per IP
Check your quota:
curl http://localhost:8443/stats
# Response:
# {
# "your_quota": {
# "used": 2.5,
# "limit": 10.0,
# "available": 7.5,
# "active_jobs": 1,
# "can_submit": true
# },
# "system": {
# "queue_length": 3,
# "active_jobs": 5
# }
# }
Understanding Rate Limits¶
Sandrun uses IP-based rate limiting to ensure fair resource sharing:
Default Limits¶
| Limit Type | Value | Window |
|---|---|---|
| CPU Quota | 10 CPU-seconds | Per minute |
| Concurrent Jobs | 2 jobs | At a time |
| Memory per Job | 512MB | Per job |
| Timeout | 5 minutes | Per job |
Check Your Quota¶
curl http://localhost:8443/stats
Response:
{
"your_quota": {
"used": 2.5,
"limit": 10.0,
"available": 7.5,
"active_jobs": 1,
"can_submit": true
},
"system": {
"queue_length": 3,
"active_jobs": 5
}
}
Quota Management
- Quota resets after 1 hour of inactivity
- CPU time is measured in actual CPU seconds, not wall-clock time
- Use different IP addresses for separate quotas (if needed for testing)
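A client can use the /stats response to decide whether a job is likely to fit in the remaining quota before submitting it. The following Python sketch assumes the response has the `your_quota` shape shown above; the helper names and the idea of passing an estimated CPU cost are illustrative assumptions, not Sandrun features.

```python
import json


def seconds_available(stats: dict) -> float:
    """CPU-seconds left in the current window, never negative."""
    quota = stats["your_quota"]
    return max(0.0, quota["limit"] - quota["used"])


def should_submit(stats: dict, estimated_cpu_seconds: float) -> bool:
    """True if the server allows submission and the estimate fits the quota."""
    quota = stats["your_quota"]
    return bool(quota["can_submit"]) and estimated_cpu_seconds <= seconds_available(stats)


# Sample response copied from this guide
sample = json.loads("""
{
  "your_quota": {"used": 2.5, "limit": 10.0, "available": 7.5,
                 "active_jobs": 1, "can_submit": true},
  "system": {"queue_length": 3, "active_jobs": 5}
}
""")
print(seconds_available(sample))   # 7.5
print(should_submit(sample, 5.0))  # True
print(should_submit(sample, 8.0))  # False
```

Recomputing the remaining seconds from `limit` and `used` rather than trusting `available` alone is a design choice that also catches an inconsistent response.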
Troubleshooting¶
Permission Denied¶
Symptom:
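Solutions: run Sandrun with sudo, or grant it the CAP_SYS_ADMIN capability (see Root Permissions Required above).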
Port Already in Use¶
Symptom:
Solutions:
# Option 1: Use a different port
sudo ./build/sandrun --port 9000
# Option 2: Find and kill the process using the port
sudo lsof -i :8443
sudo kill <PID>
# Option 3: Let the OS assign a port
sudo ./build/sandrun --port 0 # Uses random available port
Build Failures¶
Symptom:
Solution:
# Ensure all dependencies are installed
sudo apt-get install -y libseccomp-dev libcap-dev libssl-dev
# Clean and rebuild
rm -rf build
cmake -B build
cmake --build build
Job Failed to Execute¶
Symptom: Job status shows "status": "failed" with non-zero exit code.
Debugging Steps:
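- Check the logs for errors, for example: curl http://localhost:8443/logs/job-abc123def456 (replace with your job ID)
- Verify that the manifest (job.json) is valid JSON.
- Check the permissions of the files in the archive.
- Test the script locally first, outside the sandbox.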
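The manifest and archive checks in the steps above can be automated before submission. This is a minimal Python sketch: the `preflight` helper is hypothetical, and treating `entrypoint` and `interpreter` as required keys is an assumption based on the manifests shown in this guide rather than on a documented schema.

```python
import io
import json
import tarfile

# Assumption: every manifest in this guide sets these two keys
REQUIRED_KEYS = ("entrypoint", "interpreter")


def preflight(manifest_text: str, archive: bytes) -> list:
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    try:
        manifest = json.loads(manifest_text)
    except json.JSONDecodeError as exc:
        return [f"manifest is not valid JSON: {exc}"]
    if not isinstance(manifest, dict):
        return ["manifest must be a JSON object"]

    problems = [
        f"manifest is missing {key!r}" for key in REQUIRED_KEYS if key not in manifest
    ]

    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        # `tar czf x.tar.gz .` stores names as "./main.py", so strip that prefix
        names = {n[2:] if n.startswith("./") else n for n in tar.getnames()}

    entrypoint = manifest.get("entrypoint")
    if entrypoint is not None and entrypoint not in names:
        problems.append(f"entrypoint {entrypoint!r} is not in the archive")
    return problems
```

For example, `preflight(open("job.json").read(), open("job.tar.gz", "rb").read())` returns a list of human-readable problems that you can print before calling the submit endpoint.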
Rate Limit Exceeded¶
Symptom:
{
"error": "Rate limit exceeded",
"reason": "CPU quota exhausted (10.2/10.0 seconds used)",
"retry_after": 45
}
Solutions:
- Wait for quota refresh (shown in the retry_after field)
- Optimize your code to use less CPU time
- Split jobs into smaller chunks
- Use different IP (for testing only)
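Waiting for the quota to refresh can be automated by honoring the `retry_after` value from the error body. The sketch below assumes `submit` is a caller-supplied function that returns the decoded JSON body for both success and rate-limit responses. With `urllib`, that means catching the HTTP error and decoding its body yourself, since the status code Sandrun uses for this error is not stated in this guide. `submit_with_retry` is a hypothetical helper name.

```python
import time


def submit_with_retry(submit, max_attempts=3, sleep=time.sleep):
    """Call `submit()` until the response has no `retry_after` field.

    Returns the last response, which may still be the rate-limit error
    if every attempt was rejected.
    """
    response = submit()
    for _ in range(max_attempts - 1):
        if "retry_after" not in response:
            break
        # Wait for the number of seconds the server asked for, then retry
        sleep(response["retry_after"])
        response = submit()
    return response
```

Capping the number of attempts keeps a client from looping forever when a job can never fit in the quota, in which case splitting the job is the better fix.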
Cannot Access Server¶
Symptom:
Checklist:
- Verify that the server process is running.
- Check that the port (8443 in this guide) is listening.
- Check firewall rules.
- Check the server logs.
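The port check from the list above can be done from any machine with Python. This is a small sketch using only the standard library. `port_is_open` is a hypothetical helper, and a successful TCP connection only shows that something is listening, not that it is Sandrun.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts
        return False
```

Running `port_is_open("localhost", 8443)` on the server itself and then from the client machine helps separate "server not running" from "blocked by a firewall".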
Using Pre-built Environments¶
Sandrun includes pre-built Python environments with common packages for faster execution.
Example: ML Job with ml-basic Environment¶
# Create training script
cat > train.py <<'EOF'
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
print("Packages loaded successfully!")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")
# Create sample data
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, 100)
# Train model
model = RandomForestClassifier()
model.fit(X, y)
print(f"Model trained! Training accuracy: {model.score(X, y):.2f}")
EOF
# Create manifest with environment
cat > job.json <<'EOF'
{
"entrypoint": "train.py",
"interpreter": "python3",
"environment": "ml-basic"
}
EOF
# Package and submit
tar czf ml-job.tar.gz train.py
curl -X POST http://localhost:8443/submit \
-F "files=@ml-job.tar.gz" \
-F "manifest=$(cat job.json)"
Benefits:
- ✅ No pip install time (packages pre-installed)
- ✅ Reproducible versions
- ✅ Cached across jobs for efficiency
Available Environments¶
| Environment | Packages | Use Case |
|---|---|---|
| ml-basic | NumPy, Pandas, Scikit-learn, Matplotlib | Traditional ML |
| vision | PyTorch, Torchvision, OpenCV, Pillow | Computer vision |
| nlp | PyTorch, Transformers, Tokenizers | NLP/language models |
| data-science | NumPy, Pandas, Matplotlib, Seaborn, Jupyter | Data analysis |
| scientific | NumPy, SciPy, SymPy, Matplotlib | Scientific computing |
Check available environments:
Next Steps¶
Ready for More?
Check out our comprehensive Distributed Data Pipeline Tutorial that takes you from basics to building a production-ready distributed processing system with:
- Multi-file projects with dependencies
- Pre-built environments for faster execution
- Distributed computing across worker pools
- Real-time WebSocket monitoring
- Cryptographic result verification
Time: 45-60 minutes | Level: Beginner to Advanced
Additional Resources¶
- API Reference - Complete API documentation
- Job Manifest - Advanced job configuration
- Integrations - Pool coordinator, MCP server
- Architecture - Understand the internals