
DagShell: A Content-Addressable Virtual Filesystem

DagShell implements a virtual filesystem on top of a content-addressable directed acyclic graph (DAG). Traditional filesystems organize data by location; DagShell organizes it by content. Each file's bytes are stored under their SHA256 hash, so identical files automatically share storage.

The implementation includes 583 tests with 77% code coverage.
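The core idea of content addressing fits in a few lines. The sketch below is a hypothetical illustration, not DagShell's code: the class name ContentStore and its put/get methods are assumptions. It shows why deduplication falls out for free, since the storage key is computed from the bytes themselves.

```python
import hashlib


class ContentStore:
    """Minimal content-addressable blob store (illustrative, not DagShell's API)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content, so identical bytes
        # always produce the same key and are stored only once.
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
a = store.put(b"print('hello')")
b = store.put(b"print('hello')")    # same content, same key
c = store.put(b"print('goodbye')")  # different content, new key
print(a == b, len(store))  # True 2
```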

Why a Virtual Filesystem?

Sometimes you need filesystem semantics without touching the actual disk:

  • Testing: Simulate complex directory structures without cleanup
  • Sandboxing: Run untrusted code against a virtual filesystem
  • Versioning: Track filesystem state changes over time
  • Portability: Serialize entire directory trees to JSON

The DAG Structure

Traditional filesystems are conceptually trees: each file has exactly one parent directory and its own copy of the data. DagShell uses a DAG instead, where content is stored once and referenced by hash:

/project/
├── src/
│   └── main.py  ──────┐
├── backup/            │
│   └── main.py  ──────┼──▶ [SHA256: abc123...] → "print('hello')"
└── archive/           │
    └── main.py  ──────┘

Three files, one storage block. Automatic deduplication.
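The diagram can be modeled with plain dictionaries. In this hypothetical sketch, directory nodes map names either to child directories (dicts) or to content hashes (strings), mirroring the /project layout in the diagram; the names store and file_refs are illustrative assumptions, not DagShell internals.

```python
import hashlib

blobs: dict[str, bytes] = {}


def store(data: bytes) -> str:
    """Save data under its SHA256 hash and return the hash."""
    digest = hashlib.sha256(data).hexdigest()
    blobs.setdefault(digest, data)
    return digest


# Directory nodes map names to child dicts (directories) or hashes (files).
source = b"print('hello')"
project = {
    "src": {"main.py": store(source)},
    "backup": {"main.py": store(source)},
    "archive": {"main.py": store(source)},
}


def file_refs(node: dict) -> list[str]:
    """Collect every content hash referenced under a directory node."""
    refs = []
    for child in node.values():
        if isinstance(child, dict):
            refs.extend(file_refs(child))
        else:
            refs.append(child)
    return refs


refs = file_refs(project)
print(len(refs), len(blobs))  # 3 1
```

Three file entries point at a single blob, which is the deduplication the diagram describes.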

Fluent Python API

DagShell provides a chainable API that mirrors shell commands:

from dagshell.dagshell_fluent import DagShell

shell = DagShell()

# Create project structure
(shell
    .mkdir("/project/src")
    .mkdir("/project/docs")
    .cd("/project/src")
    .echo("def main(): pass").out("main.py")
    .echo("# My Project").out("../docs/README.md"))

# Navigate with directory stack
shell.pushd("/tmp")
shell.touch("scratch.txt")
shell.popd()  # Back to /project/src

# Save entire filesystem to JSON
shell.save("project_snapshot.json")
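Chained calls like the ones above only work if each method returns the shell object itself. The toy class below is a hypothetical sketch of that pattern, not DagShell's implementation: the name MiniShell, its in-memory set and dict, and the assumption that mkdir creates missing parents (like mkdir -p) are all illustrative. It also shows pushd/popd as a plain list used as a stack.

```python
import posixpath


class MiniShell:
    """Toy fluent shell: every method returns self so calls can chain."""

    def __init__(self) -> None:
        self.cwd = "/"
        self.dirs = {"/"}
        self.files: dict[str, str] = {}
        self._stack: list[str] = []
        self._pending = ""  # text produced by echo, waiting for out()

    def _abs(self, path: str) -> str:
        # Resolve relative paths (including "..") against the cwd.
        return posixpath.normpath(posixpath.join(self.cwd, path))

    def mkdir(self, path: str) -> "MiniShell":
        # Assumption: create any missing parents, like mkdir -p.
        parts = self._abs(path).strip("/").split("/")
        for i in range(1, len(parts) + 1):
            self.dirs.add("/" + "/".join(parts[:i]))
        return self

    def cd(self, path: str) -> "MiniShell":
        target = self._abs(path)
        if target not in self.dirs:
            raise FileNotFoundError(target)
        self.cwd = target
        return self

    def echo(self, text: str) -> "MiniShell":
        self._pending = text
        return self

    def out(self, path: str) -> "MiniShell":
        self.files[self._abs(path)] = self._pending
        return self

    def pushd(self, path: str) -> "MiniShell":
        self._stack.append(self.cwd)  # remember where we were
        return self.cd(path)

    def popd(self) -> "MiniShell":
        self.cwd = self._stack.pop()  # return to the saved directory
        return self


sh = MiniShell()
(sh.mkdir("/project/src")
   .mkdir("/project/docs")
   .cd("/project/src")
   .echo("def main(): pass").out("main.py")
   .echo("# My Project").out("../docs/README.md"))
sh.mkdir("/tmp").pushd("/tmp").popd()
print(sh.cwd)  # /project/src
```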

Terminal Emulator

For interactive exploration, DagShell includes a terminal:

python -m dagshell.terminal

dagshell:/$ mkdir /home/user
dagshell:/$ cd /home/user
dagshell:/home/user$ echo "Hello" > greeting.txt
dagshell:/home/user$ cat greeting.txt
Hello
dagshell:/home/user$ ls -la
total 1
drwxr-xr-x  2 user user  4096 Aug 15 10:00 .
drwxr-xr-x  3 user user  4096 Aug 15 10:00 ..
-rw-r--r--  1 user user     6 Aug 15 10:00 greeting.txt
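A terminal like this is essentially a loop that tokenizes each line and dispatches to a handler. The sketch below is a hypothetical illustration, not DagShell's terminal: the class MiniTerminal and its cmd_* naming convention are assumptions, and paths are flat names with no working directory. It uses the standard library's shlex to split quoted arguments and peels off a trailing "> file" redirection.

```python
import shlex


class MiniTerminal:
    """Toy line-oriented shell over an in-memory dict of files."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def run(self, line: str) -> str:
        tokens = shlex.split(line)
        if not tokens:
            return ""
        target = None
        # Peel off a trailing "> file" redirection before dispatching.
        if len(tokens) >= 2 and tokens[-2] == ">":
            target = tokens[-1]
            tokens = tokens[:-2]
        cmd, args = tokens[0], tokens[1:]
        handler = getattr(self, f"cmd_{cmd}", None)
        if handler is None:
            return f"{cmd}: command not found"
        output = handler(args)
        if target is not None:
            self.files[target] = output  # redirect instead of printing
            return ""
        return output

    def cmd_echo(self, args: list[str]) -> str:
        return " ".join(args) + "\n"  # echo appends a newline

    def cmd_cat(self, args: list[str]) -> str:
        return "".join(self.files[name] for name in args)


term = MiniTerminal()
term.run('echo "Hello" > greeting.txt')
print(term.run("cat greeting.txt"), end="")  # Hello
```

Because echo appends a newline, "Hello" becomes 6 bytes, which matches the size shown for greeting.txt in the listing above.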

Virtual Devices

DagShell emulates the standard Unix device files:

shell.echo("garbage").out("/dev/null")  # Discarded
random_bytes = shell.cat("/dev/random")  # Random data
zeros = shell.head("/dev/zero", 100)     # 100 zero bytes
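Device files differ from regular files in that their contents are computed on read rather than stored. Since /dev/zero and /dev/random never end, a read has to be bounded by a byte count, which is presumably why head takes a size above. This is a hypothetical sketch with illustrative class names, not DagShell's device layer.

```python
import os


class Device:
    """Base device: writes are accepted and discarded."""

    def write(self, data: bytes) -> int:
        return len(data)  # report every byte as written, keep none

    def read(self, n: int) -> bytes:
        raise NotImplementedError


class NullDevice(Device):
    def read(self, n: int) -> bytes:
        return b""  # /dev/null is always at end-of-file


class ZeroDevice(Device):
    def read(self, n: int) -> bytes:
        return bytes(n)  # n NUL bytes


class RandomDevice(Device):
    def read(self, n: int) -> bytes:
        return os.urandom(n)  # bytes from the OS random source


devices = {
    "/dev/null": NullDevice(),
    "/dev/zero": ZeroDevice(),
    "/dev/random": RandomDevice(),
}

devices["/dev/null"].write(b"garbage")
zeros = devices["/dev/zero"].read(100)
noise = devices["/dev/random"].read(16)
print(zeros == bytes(100), len(noise))  # True 16
```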

Import/Export

Copy files between the real filesystem and the virtual one:

# Import from real filesystem
shell.import_file("/real/path/data.csv", "/virtual/data.csv")

# Export to real filesystem
shell.export_file("/virtual/results.json", "/real/path/results.json")

# Import entire directory
shell.import_dir("/real/project", "/virtual/project")
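Importing a directory amounts to walking the real tree and rebasing each relative path under the virtual root. The sketch below uses pathlib from the standard library; the function names and the flat {virtual_path: bytes} result are illustrative assumptions, not DagShell's import_dir/export_file.

```python
import posixpath
import tempfile
from pathlib import Path


def import_dir(real_root: str, virtual_root: str) -> dict[str, bytes]:
    """Read every regular file under real_root into {virtual_path: bytes}."""
    imported = {}
    root = Path(real_root)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()  # forward slashes
            imported[posixpath.join(virtual_root, rel)] = path.read_bytes()
    return imported


def export_file(files: dict[str, bytes], virtual_path: str, real_path: str) -> None:
    """Write one virtual file's bytes to disk, creating parent dirs."""
    target = Path(real_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(files[virtual_path])


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "project")
    (src / "src").mkdir(parents=True)
    (src / "src" / "main.py").write_bytes(b"print('hello')\n")
    vfs = import_dir(str(src), "/virtual/project")
    out = Path(tmp, "out", "main.py")
    export_file(vfs, "/virtual/project/src/main.py", str(out))
    exported = out.read_bytes()

print(sorted(vfs))  # ['/virtual/project/src/main.py']
```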

Persistence

The entire filesystem state serializes to JSON:

# Save state
shell.save("filesystem.json")

# Load state (creates new shell with same structure)
restored = DagShell.load("filesystem.json")

# Or get JSON directly
state = shell.to_json()

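The JSON format is human-readable. Directories nest their children, file entries store only a content hash, and each distinct blob appears once in content_store: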

{
  "root": {
    "type": "directory",
    "children": {
      "project": {
        "type": "directory",
        "children": {
          "README.md": {
            "type": "file",
            "content_hash": "abc123..."
          }
        }
      }
    }
  },
  "content_store": {
    "abc123...": "# My Project\n..."
  }
}
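A round trip through this layout is straightforward. The hypothetical to_state/from_state functions below are assumptions that follow the JSON structure shown above; DagShell's own save/load may differ in detail. Two paths with identical text produce a single content_store entry.

```python
import hashlib
import json


def to_state(files: dict[str, str]) -> dict:
    """Build a JSON-ready dict in the layout above from {path: text}."""
    root = {"type": "directory", "children": {}}
    store = {}
    for path, text in files.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        store[digest] = text  # identical texts collapse to one entry
        *dirs, name = path.strip("/").split("/")
        node = root
        for d in dirs:
            node = node["children"].setdefault(
                d, {"type": "directory", "children": {}})
        node["children"][name] = {"type": "file", "content_hash": digest}
    return {"root": root, "content_store": store}


def from_state(state: dict) -> dict[str, str]:
    """Invert to_state: walk the tree and resolve hashes back to text."""
    files = {}

    def walk(node: dict, prefix: str) -> None:
        for name, child in node["children"].items():
            path = f"{prefix}/{name}"
            if child["type"] == "directory":
                walk(child, path)
            else:
                files[path] = state["content_store"][child["content_hash"]]

    walk(state["root"], "")
    return files


original = {
    "/project/README.md": "# My Project\n",
    "/project/backup/README.md": "# My Project\n",
}
text = json.dumps(to_state(original), indent=2)
restored = from_state(json.loads(text))
print(restored == original, len(json.loads(text)["content_store"]))  # True 1
```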

Scheme DSL

For Lisp enthusiasts, DagShell includes a Scheme interface:

(mkdir "/project")
(cd "/project")
(echo "Hello" "greeting.txt")
(define files (ls))
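An interface like this needs only a tokenizer, a parser into nested lists, and an evaluator that maps symbols to Python callables. The toy interpreter below is a hypothetical sketch, not DagShell's Scheme implementation: the names Symbol and SchemeShell are assumptions, only define is handled as a special form, and echo takes (text, file) as in the example above.

```python
import posixpath
import re


class Symbol(str):
    """A bare identifier such as mkdir or files."""


def tokenize(src: str) -> list[str]:
    # Parentheses, double-quoted strings, or bare symbols.
    return re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', src)


def parse(tokens: list[str]):
    """Consume one expression from tokens and return it as nested lists."""
    tok = tokens.pop(0)
    if tok == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # discard ")"
        return expr
    if tok.startswith('"'):
        return tok[1:-1]  # string literal
    return Symbol(tok)


class SchemeShell:
    def __init__(self) -> None:
        self.cwd = "/"
        self.dirs = {"/"}
        self.files: dict[str, str] = {}
        self.env = {"mkdir": self.mkdir, "cd": self.cd,
                    "echo": self.echo, "ls": self.ls}

    def _abs(self, p: str) -> str:
        return posixpath.normpath(posixpath.join(self.cwd, p))

    def mkdir(self, p: str) -> None:
        self.dirs.add(self._abs(p))

    def cd(self, p: str) -> None:
        self.cwd = self._abs(p)

    def echo(self, text: str, p: str) -> None:
        self.files[self._abs(p)] = text

    def ls(self) -> list[str]:
        # Names of entries directly inside the current directory.
        prefix = self.cwd.rstrip("/") + "/"
        names = set()
        for path in self.dirs | set(self.files):
            rel = path[len(prefix):]
            if path.startswith(prefix) and rel and "/" not in rel:
                names.add(rel)
        return sorted(names)

    def eval(self, expr):
        if isinstance(expr, Symbol):  # check before str: Symbol is a str
            return self.env[expr]
        if isinstance(expr, str):
            return expr
        head, *rest = expr
        if isinstance(head, Symbol) and head == "define":
            name, value = rest
            self.env[name] = self.eval(value)
            return None
        fn = self.eval(head)
        return fn(*(self.eval(arg) for arg in rest))

    def run(self, src: str):
        tokens = tokenize(src)
        result = None
        while tokens:
            result = self.eval(parse(tokens))
        return result


sh = SchemeShell()
sh.run('''
(mkdir "/project")
(cd "/project")
(echo "Hello" "greeting.txt")
(define files (ls))
''')
print(sh.env["files"])  # ['greeting.txt']
```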

Use Cases

  • Build Systems: Track input/output files without disk I/O
  • Testing Frameworks: Create fixture filesystems programmatically
  • Documentation: Generate example directory structures
  • Backup Tools: Represent filesystem snapshots efficiently
  • Educational: Teach filesystem concepts without system access
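Several of these use cases (build systems, backup tools, tracking state over time) come down to asking what changed between two snapshots. With content hashes, that question becomes a comparison of hash maps: a file changed exactly when its hash differs. The helpers snapshot and diff below are hypothetical illustrations, not part of DagShell.

```python
import hashlib


def snapshot(files: dict[str, bytes]) -> dict[str, str]:
    """Map each path to the SHA256 of its content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def diff(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, removed, or modified by comparing hashes."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }


before = snapshot({"/src/main.py": b"print('hello')", "/README.md": b"# My Project"})
after = snapshot({"/src/main.py": b"print('hi')", "/docs/guide.md": b"# Guide"})
changes = diff(before, after)
print(changes["modified"])  # ['/src/main.py']
```

Only the hashes need to be compared, so unchanged files are skipped without reading their contents again.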

Installation

pip install dagshell
# Or from source
pip install -e .


DagShell: When you need a filesystem, but not the disk.
