DagShell implements a complete virtual filesystem using a content-addressable directed acyclic graph (DAG) structure. Unlike traditional filesystems that organize data by location, DagShell organizes by content: identical files automatically share storage because each file's content is keyed by its SHA-256 hash.
The implementation includes 583 tests with 77% code coverage.
Why a Virtual Filesystem?
Sometimes you need filesystem semantics without touching the actual disk:
- Testing: Simulate complex directory structures without cleanup
- Sandboxing: Run untrusted code against a virtual filesystem
- Versioning: Track filesystem state changes over time
- Portability: Serialize entire directory trees to JSON
The DAG Structure
Traditional filesystems use trees: each file has exactly one parent directory. DagShell uses a DAG where content is stored once and referenced by hash:
/project/
├── src/
│   └── main.py ──────┐
├── backup/           │
│   └── main.py ──────┼──▶ [SHA256: abc123...] → "print('hello')"
└── archive/          │
    └── main.py ──────┘
Three files, one storage block. Automatic deduplication.
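To make the deduplication idea concrete, here is a minimal sketch of a content-addressable store built on Python's standard `hashlib`. The `ContentStore` class and the `paths` table are hypothetical names used for illustration only; DagShell's internals may well be organized differently.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: blobs are keyed by their SHA-256 digest."""

    def __init__(self):
        self.blobs = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Storing the same bytes twice is a no-op: the key already exists.
        self.blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]


store = ContentStore()

# Hypothetical path table: three paths that all hold the same content.
paths = {
    "/project/src/main.py": store.put(b"print('hello')"),
    "/project/backup/main.py": store.put(b"print('hello')"),
    "/project/archive/main.py": store.put(b"print('hello')"),
}

unique_hashes = set(paths.values())
print(len(paths), "paths,", len(store.blobs), "stored blob(s)")
# prints: 3 paths, 1 stored blob(s)
```

Because the key is derived from the bytes themselves, no separate deduplication pass is needed: writing identical content simply lands on an existing key.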
Fluent Python API
DagShell provides a chainable API that mirrors shell commands:
from dagshell.dagshell_fluent import DagShell
shell = DagShell()
# Create project structure
(shell
.mkdir("/project/src")
.mkdir("/project/docs")
.cd("/project/src")
.echo("def main(): pass").out("main.py")
.echo("# My Project").out("../docs/README.md"))
# Navigate with directory stack
shell.pushd("/tmp")
shell.touch("scratch.txt")
shell.popd() # Back to /project/src
# Save entire filesystem to JSON
shell.save("project_snapshot.json")
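The chaining and directory-stack behavior above can be sketched in a few lines. `MiniShell` below is a hypothetical stand-in written for illustration, not DagShell's actual class; it assumes POSIX-style paths and tracks only the current directory, with no files behind it.

```python
import posixpath


class MiniShell:
    """Hypothetical stand-in for illustration; not DagShell's actual class."""

    def __init__(self):
        self.cwd = "/"
        self.dir_stack = []

    def _resolve(self, path: str) -> str:
        # Relative paths are resolved against the current directory.
        return posixpath.normpath(posixpath.join(self.cwd, path))

    def cd(self, path: str) -> "MiniShell":
        self.cwd = self._resolve(path)
        return self  # returning self is what makes the calls chainable

    def pushd(self, path: str) -> "MiniShell":
        # Remember where we were, then move.
        self.dir_stack.append(self.cwd)
        return self.cd(path)

    def popd(self) -> "MiniShell":
        # Return to the most recently remembered directory.
        self.cwd = self.dir_stack.pop()
        return self


sh = MiniShell().cd("/project/src").pushd("/tmp")
in_tmp = sh.cwd   # "/tmp"
sh.popd()
back = sh.cwd     # "/project/src"
```

Every method returns `self`, which is the whole trick behind the fluent style: each call hands the same shell object on to the next call in the chain.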
Terminal Emulator
For interactive exploration, DagShell includes a terminal:
python -m dagshell.terminal
dagshell:/$ mkdir /home/user
dagshell:/$ cd /home/user
dagshell:/home/user$ echo "Hello" > greeting.txt
dagshell:/home/user$ cat greeting.txt
Hello
dagshell:/home/user$ ls -la
total 1
drwxr-xr-x 2 user user 4096 Aug 15 10:00 .
drwxr-xr-x 3 user user 4096 Aug 15 10:00 ..
-rw-r--r-- 1 user user 6 Aug 15 10:00 greeting.txt
Virtual Devices
Standard Unix special files work as expected:
shell.echo("garbage").out("/dev/null") # Discarded
random_bytes = shell.cat("/dev/random") # Random data
zeros = shell.head("/dev/zero", 100) # 100 zero bytes
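One way to model such devices is to special-case their paths and compute the result instead of looking anything up in storage. The sketch below is an assumption about how this could work, not a description of DagShell's code; `read_device` and `write_device` are hypothetical helpers, and the explicit `size` argument is a simplification.

```python
import os

DEVICES = {"/dev/null", "/dev/zero", "/dev/random"}


def read_device(path: str, size: int) -> bytes:
    """Return up to `size` bytes from a hypothetical virtual device."""
    if path == "/dev/null":
        return b""               # reads hit end-of-file immediately
    if path == "/dev/zero":
        return b"\x00" * size    # as many zero bytes as requested
    if path == "/dev/random":
        return os.urandom(size)  # random bytes supplied by the OS
    raise FileNotFoundError(path)


def write_device(path: str, data: bytes) -> int:
    """Accept and discard the data, reporting how many bytes were 'written'."""
    if path in DEVICES:
        return len(data)
    raise FileNotFoundError(path)


discarded = write_device("/dev/null", b"garbage")
zeros = read_device("/dev/zero", 100)
random_bytes = read_device("/dev/random", 16)
```

Nothing is stored for these paths, so they cost no space in the content store however much is written to them.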
Import/Export
Move files between real and virtual filesystems:
# Import from real filesystem
shell.import_file("/real/path/data.csv", "/virtual/data.csv")
# Export to real filesystem
shell.export_file("/virtual/results.json", "/real/path/results.json")
# Import entire directory
shell.import_dir("/real/project", "/virtual/project")
Persistence
The entire filesystem state serializes to JSON:
# Save state
shell.save("filesystem.json")
# Load state (creates new shell with same structure)
restored = DagShell.load("filesystem.json")
# Or get JSON directly
state = shell.to_json()
The JSON format is human-readable:
{
"root": {
"type": "directory",
"children": {
"project": {
"type": "directory",
"children": {
"README.md": {
"type": "file",
"content_hash": "abc123..."
}
}
}
}
},
"content_store": {
"abc123...": "# My Project\n..."
}
}
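Since the snapshot is plain JSON, it can be inspected without DagShell at all. The sketch below assumes the layout shown above (a `root` tree plus a `content_store` map). The `read_file` helper is hypothetical, and `abc123` is a placeholder key rather than a real hash.

```python
import json

# A tiny snapshot in the shape shown above; "abc123" is a placeholder key.
snapshot = json.loads("""
{
  "root": {
    "type": "directory",
    "children": {
      "project": {
        "type": "directory",
        "children": {
          "README.md": {"type": "file", "content_hash": "abc123"}
        }
      }
    }
  },
  "content_store": {"abc123": "# My Project\\n"}
}
""")


def read_file(state: dict, path: str) -> str:
    """Walk the directory tree, then look the file's hash up in the store."""
    node = state["root"]
    for part in path.strip("/").split("/"):
        node = node["children"][part]
    if node["type"] != "file":
        raise IsADirectoryError(path)
    return state["content_store"][node["content_hash"]]


readme = read_file(snapshot, "/project/README.md")
print(repr(readme))
# prints: '# My Project\n'
```

The lookup takes two steps by design: the tree only records which hash a path points to, and the content lives once in the store no matter how many paths reference it.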
Scheme DSL
For Lisp enthusiasts, DagShell includes a Scheme interface:
(mkdir "/project")
(cd "/project")
(echo "Hello" "greeting.txt")
(define files (ls))
Use Cases
- Build Systems: Track input/output files without disk I/O
- Testing Frameworks: Create fixture filesystems programmatically
- Documentation: Generate example directory structures
- Backup Tools: Represent filesystem snapshots efficiently
- Educational: Teach filesystem concepts without system access
Installation
pip install dagshell
# Or from source (run inside a clone of the repository)
pip install -e .
Resources
- GitHub: github.com/queelius/dagshell
- Paper: DagShell Technical Report
DagShell: When you need a filesystem, but not the disk.