dotstar

Pattern matching with wildcards for nested data

Part of the Depth pillar, dotstar extends dotget with wildcard support, enabling you to search for multiple values matching a pattern.

Overview

While dotget retrieves a single value at an exact path, dotstar returns every value whose path matches a wildcard pattern, which makes it well suited to data exploration and bulk operations.

Wildcard Patterns

The * wildcard matches any key or index:

from depth.dotstar import search

data = {
    "users": [
        {"name": "Alice", "age": 30},
        {"name": "Bob", "age": 25},
        {"name": "Charlie", "age": 35}
    ]
}

# Get all user names
names = search(data, "users.*.name")
# ["Alice", "Bob", "Charlie"]

# Get all ages
ages = search(data, "users.*.age")
# [30, 25, 35]
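
To make the matching rule concrete, here is a minimal sketch of how a * segment could be expanded level by level. It is an illustration only, not dotstar's actual implementation; the name search_sketch is hypothetical, and the sketch assumes patterns are dot-separated with no escaping.

```python
def search_sketch(data, pattern):
    """Return every value reachable by `pattern`; '*' matches any key or index.

    Hypothetical helper for illustration, not part of dotstar.
    """
    current = [data]  # nodes matched so far
    for segment in pattern.split("."):
        next_level = []
        for node in current:
            if segment == "*":
                # A wildcard fans out to every child of a dict or list
                if isinstance(node, dict):
                    next_level.extend(node.values())
                elif isinstance(node, list):
                    next_level.extend(node)
            elif isinstance(node, dict) and segment in node:
                next_level.append(node[segment])
            elif isinstance(node, list) and segment.isdigit() and int(segment) < len(node):
                next_level.append(node[int(segment)])
            # Anything else is a dead end and is silently dropped
        current = next_level
    return current


users = {"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]}
names = search_sketch(users, "users.*.name")
no_match = search_sketch(users, "users.*.email")
second_age = search_sketch(users, "users.1.age")
```

In this sketch a pattern that matches nothing yields an empty list rather than raising, which mirrors the "list of values" return type described for dotstar.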

Multiple Wildcards

Use multiple wildcards to search deeper:

data = {
    "departments": {
        "engineering": {
            "teams": {
                "backend": {"members": 5},
                "frontend": {"members": 3}
            }
        },
        "sales": {
            "teams": {
                "inbound": {"members": 4},
                "outbound": {"members": 6}
            }
        }
    }
}

# Get all team member counts
counts = search(data, "departments.*.teams.*.members")
# [5, 3, 4, 6]

Dictionary Wildcards

Wildcards work with dictionary keys:

data = {
    "server1": {"status": "active", "cpu": 45},
    "server2": {"status": "active", "cpu": 67},
    "server3": {"status": "inactive", "cpu": 12}
}

# Get all server statuses
statuses = search(data, "*.status")
# ["active", "active", "inactive"]

# Get all CPU values
cpus = search(data, "*.cpu")
# [45, 67, 12]

Finding Paths with find_all

Get both paths and values:

from depth.dotstar import find_all

data = {
    "users": [
        {"name": "Alice", "role": "admin"},
        {"name": "Bob", "role": "user"}
    ]
}

# Get paths and values
results = find_all(data, "users.*.role")
# [
#   ("users.0.role", "admin"),
#   ("users.1.role", "user")
# ]

# Useful for updates
for path, value in results:
    if value == "admin":
        print(f"Admin found at: {path}")
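
One way to picture how paths and values can be produced together is to carry the path string alongside each node during traversal. The following is a sketch under that assumption; find_all_sketch is a hypothetical name and this is not the library's code.

```python
def find_all_sketch(data, pattern):
    """Return (path, value) pairs for every match of a dot-separated pattern.

    Hypothetical helper for illustration, not part of dotstar.
    """
    matches = [("", data)]  # (path so far, node)
    for segment in pattern.split("."):
        expanded = []
        for path, node in matches:
            if isinstance(node, dict):
                children = node.items()
            elif isinstance(node, list):
                children = enumerate(node)
            else:
                continue  # scalars have no children to descend into
            for key, child in children:
                # Scanning every child keeps the sketch short; an exact
                # segment could use a direct lookup instead.
                if segment == "*" or str(key) == segment:
                    child_path = f"{path}.{key}" if path else str(key)
                    expanded.append((child_path, child))
        matches = expanded
    return matches


people = {"users": [{"name": "Alice", "role": "admin"}, {"name": "Bob", "role": "user"}]}
pairs = find_all_sketch(people, "users.*.role")
admin_paths = [path for path, value in pairs if value == "admin"]
```

Each returned path is a concrete, wildcard-free path, so it can be handed to an exact-path tool afterwards.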

Pattern Class

Build reusable patterns:

from depth.dotstar import Pattern

# Create reusable patterns
user_emails = Pattern("users.*.email")
user_names = Pattern("users.*.name")

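# Apply the same pattern to different datasets
# (dataset1 and dataset2 are illustrative sample data)
dataset1 = {"users": [{"name": "Alice", "email": "alice@example.com"}]}
dataset2 = {"users": [{"name": "Bob", "email": "bob@example.com"}]}

emails1 = user_emails.search(dataset1)
emails2 = user_emails.search(dataset2)

# Inline filters are a planned feature and are not supported yet:
# admins = Pattern("users.*[role=admin]")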
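
The benefit of a reusable pattern object is that the pattern string is parsed once and then applied to many inputs. A minimal sketch of that idea follows; PatternSketch is a hypothetical class written for illustration, and the real Pattern class may be structured differently.

```python
class PatternSketch:
    """Hypothetical stand-in for a reusable pattern; not dotstar's Pattern."""

    def __init__(self, pattern):
        self.pattern = pattern
        self.segments = pattern.split(".")  # parsed once, reused per search

    def search(self, data):
        current = [data]
        for segment in self.segments:
            next_level = []
            for node in current:
                if isinstance(node, dict):
                    children = node.items()
                elif isinstance(node, list):
                    children = enumerate(node)
                else:
                    continue  # scalars cannot be descended into
                next_level.extend(
                    child for key, child in children
                    if segment == "*" or str(key) == segment
                )
            current = next_level
        return current

    def __repr__(self):
        return f"PatternSketch({self.pattern!r})"


email_pattern = PatternSketch("users.*.email")
team_a = {"users": [{"name": "Alice", "email": "alice@example.com"}]}
team_b = {"users": [{"name": "Bob", "email": "bob@example.com"}, {"name": "Eve"}]}
emails_a = email_pattern.search(team_a)
emails_b = email_pattern.search(team_b)
```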

Real-World Examples

Extract All Email Addresses

organization = {
    "departments": [
        {
            "name": "Engineering",
            "employees": [
                {"name": "Alice", "email": "alice@example.com"},
                {"name": "Bob", "email": "bob@example.com"}
            ]
        },
        {
            "name": "Sales",
            "employees": [
                {"name": "Charlie", "email": "charlie@example.com"}
            ]
        }
    ]
}

# Get all employee emails
emails = search(organization, "departments.*.employees.*.email")
# ["alice@example.com", "bob@example.com", "charlie@example.com"]

Aggregate Metrics

metrics = {
    "services": {
        "api": {
            "endpoints": {
                "users": {"requests": 1000, "errors": 5},
                "posts": {"requests": 500, "errors": 2}
            }
        },
        "web": {
            "endpoints": {
                "home": {"requests": 5000, "errors": 10},
                "about": {"requests": 200, "errors": 0}
            }
        }
    }
}

# Get all request counts
total_requests = sum(search(metrics, "services.*.endpoints.*.requests"))
# 6700

# Get all error counts
total_errors = sum(search(metrics, "services.*.endpoints.*.errors"))
# 17

Find Configuration Values

config = {
    "environments": {
        "dev": {
            "database": {"host": "localhost", "port": 5432},
            "cache": {"host": "localhost", "port": 6379}
        },
        "prod": {
            "database": {"host": "db.prod.com", "port": 5432},
            "cache": {"host": "cache.prod.com", "port": 6379}
        }
    }
}

# Find all database hosts
db_hosts = search(config, "environments.*.database.host")
# ["localhost", "db.prod.com"]

# Find all ports
all_ports = search(config, "environments.*.*.port")
# [5432, 6379, 5432, 6379]

Data Validation

# Check if all required fields exist
products = [
    {"id": 1, "name": "Widget", "price": 9.99},
    {"id": 2, "name": "Gadget", "price": 19.99},
    {"id": 3, "name": "Gizmo"}  # Missing price!
]

prices = search({"products": products}, "products.*.price")
if len(prices) != len(products):
    print("Warning: Some products missing prices!")
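
The length comparison shows that a price is missing but not which product lacks it, because entries without the field simply drop out of the result list. As a plain-Python follow-up that does not depend on dotstar, the offending records can be identified directly:

```python
products = [
    {"id": 1, "name": "Widget", "price": 9.99},
    {"id": 2, "name": "Gadget", "price": 19.99},
    {"id": 3, "name": "Gizmo"},  # no price field
]

# Collect the ids of products that lack a "price" key
missing_ids = [product["id"] for product in products if "price" not in product]

if missing_ids:
    print(f"Products missing prices: {missing_ids}")
```

If the paths are needed rather than the ids, find_all on the same pattern returns the paths that do have a price, and the gaps can be derived from those.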

Performance Considerations

  • Linear search: O(n) where n is total number of nodes
  • No indexing: Each search traverses the entire structure
  • Memory efficient: Returns list of values, not full paths
  • Use find_all: When you need paths for later updates
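
When only the first match or an existence check is needed, a lazy traversal can stop early instead of building the full result list. The generator below is a sketch of that alternative technique, not something dotstar is documented to provide; iter_matches is a hypothetical name.

```python
def iter_matches(data, segments):
    """Lazily yield values matching a list of path segments ('*' is a wildcard).

    Hypothetical helper for illustration, not part of dotstar.
    """
    if not segments:
        yield data  # the whole pattern has been consumed: this is a match
        return
    head, rest = segments[0], segments[1:]
    if isinstance(data, dict):
        children = data.items()
    elif isinstance(data, list):
        children = enumerate(data)
    else:
        return  # scalars have no children
    for key, child in children:
        if head == "*" or str(key) == head:
            yield from iter_matches(child, rest)


servers = {
    "server1": {"status": "active", "cpu": 45},
    "server2": {"status": "active", "cpu": 67},
    "server3": {"status": "inactive", "cpu": 12},
}

# next() pulls a single match; the remaining servers are never visited
first_status = next(iter_matches(servers, "*.status".split(".")), None)

# any() stops as soon as one CPU value exceeds the threshold
overloaded = any(cpu > 60 for cpu in iter_matches(servers, "*.cpu".split(".")))
```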

Comparison with dotget

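| Feature     | dotget               | dotstar         |
|-------------|----------------------|-----------------|
| Purpose     | Single value         | Multiple values |
| Wildcards   | No                   | Yes (*)         |
| Return type | Single value or None | List of values  |
| Performance | O(depth)             | O(nodes)        |
| Use case    | Known paths          | Pattern search  |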

Advanced Patterns (Future)

Planned enhancements:

  • ** for recursive descent
  • [n:m] for slice notation
  • [?filter] for inline filtering
  • {key1,key2} for multiple specific keys

Related Tools

  • dotget - Simple exact path access
  • dotselect - Advanced selection with predicates
  • dotpath - Underlying path engine
  • dotmod - Modify values found by patterns