Pattern Matching

AlgoTree v1.0 introduces powerful pattern matching capabilities for finding and manipulating specific structural patterns within trees. The dot notation support is inspired by the dotsuite project, which provides a comprehensive algebra for querying and transforming nested data structures.

Basic Usage

The simplest way to use pattern matching is through the pattern_match function:

from AlgoTree import Node, pattern_match

# Create a tree
root = Node("root")
branch1 = root.add_child("branch1", type="container")
leaf1 = branch1.add_child("leaf1", type="data")
branch2 = root.add_child("branch2", type="container")

# Find all nodes named "branch1"
matches = pattern_match(root, "branch1")

# Find all nodes with type="container"
matches = pattern_match(root, {"attributes": {"type": "container"}})

Pattern Syntax

String Patterns

String patterns provide a concise way to express common matching scenarios:

# Match by name
pattern = "nodename"

# Match any single node (wildcard)
pattern = "*"

# Match any subtree (deep wildcard)
pattern = "**"

# Match with attributes
pattern = "node[type=container,size=10]"

# Match with specific children
pattern = "parent(child1, child2, *)"

Dictionary Patterns

Dictionary patterns provide more control:

pattern = {
    "name": "parent",
    "attributes": {"type": "container"},
    "children": [
        {"name": "child1"},
        {"name": "*"}  # Wildcard child
    ],
    "min_children": 2,
    "max_children": 5
}

Pattern Objects

For maximum flexibility, use Pattern objects directly:

from AlgoTree import Pattern

# Pattern with custom predicate
pattern = Pattern(
    predicate=lambda n: n.name.startswith("test_") and len(n.children) > 2
)

# Pattern with mixed constraints
pattern = Pattern(
    name="container",
    attributes={"type": "branch"},
    min_children=1
)

Advanced Matching

Wildcards

Wildcards allow flexible matching:

# Single wildcard - matches exactly one node
pattern = Pattern.from_string("parent(*, child2, *)")

# Deep wildcard - matches zero or more nodes
pattern = Pattern.from_string("root(**, target)")

# This matches:
# root -> target (direct child)
# root -> a -> target
# root -> a -> b -> c -> target

Match Types

Control how patterns match tree structures:

from AlgoTree import PatternMatcher, MatchType

# Exact matching - children must match exactly
matcher = PatternMatcher(MatchType.EXACT)
pattern = Pattern.from_string("parent(child1, child2)")
# Only matches if parent has exactly child1 and child2 in that order

# Partial matching (default) - pattern children must exist
matcher = PatternMatcher(MatchType.PARTIAL)
# Matches if parent contains child1 and child2 among its children

Custom Predicates

Use custom functions for complex matching logic:

def is_large_branch(node):
    return (node.payload.get("type") == "branch" and
            len(node.children) > 5 and
            sum(1 for c in node.children if c.is_leaf) > 3)

pattern = Pattern(predicate=is_large_branch)
matches = pattern_match(tree, pattern)

Fluent API Integration

Pattern matching integrates seamlessly with the fluent API:

from AlgoTree import TreeBuilder, FluentNode

# Build a tree
tree = (TreeBuilder()
    .root("app")
    .child("models")
        .child("user.py", type="file")
        .sibling("post.py", type="file")
        .up()
    .sibling("views")
        .child("index.py", type="file")
        .sibling("admin.py", type="file")
    .build())

# Find all Python files
(FluentNode(tree)
    .match("*[type=file]")
    .each(lambda n: print(f"Found: {n.name}")))

# Replace all admin files with secured versions
(FluentNode(tree)
    .replace_matches(
        "admin.py",
        lambda n: Node("admin_secured.py", **n.payload)
    ))

Pattern Replacement

Replace nodes matching patterns:

from AlgoTree import PatternMatcher, Pattern

matcher = PatternMatcher()
pattern = Pattern(name="old_node")

# Replace with a fixed node
new_node = Node("new_node", value=42)
count = matcher.replace(tree, pattern, new_node)

# Replace with a transformation function
def transform(old_node):
    return Node(
        name=f"{old_node.name}_v2",
        value=old_node.payload.get("value", 0) * 2
    )

count = matcher.replace(tree, pattern, transform)
print(f"Replaced {count} nodes")

Real-World Examples

Finding Test Files

# Find all test files in a project structure
pattern = Pattern.from_string("*[name.endswith('_test.py')]")
test_files = pattern_match(project_tree, pattern)

Refactoring Components

# Find React components with specific structure
pattern = {
    "attributes": {"type": "component"},
    "children": [
        {"name": "render"},
        {"name": "state"}
    ]
}

# Convert to functional components
def modernize_component(old_component):
    new_comp = Node(
        name=old_component.name,
        type="functional_component",
        hooks=["useState", "useEffect"]
    )
    # Copy non-lifecycle methods
    for child in old_component.children:
        if child.name not in ["render", "componentDidMount"]:
            new_comp.add_child(child.name, **child.payload)
    return new_comp

matcher = PatternMatcher()
matcher.replace(component_tree, Pattern.from_dict(pattern), modernize_component)

Analyzing Tree Structure

# Find all deeply nested nodes
deep_pattern = Pattern(
    predicate=lambda n: len(list(n.get_path())) > 5
)

deep_nodes = pattern_match(tree, deep_pattern)

# Find unbalanced branches
def is_unbalanced(node):
    if len(node.children) < 2:
        return False
    heights = [n.height for n in node.children]
    return max(heights) - min(heights) > 2

unbalanced = pattern_match(tree, Pattern(predicate=is_unbalanced))

Dot Notation (dotsuite-inspired)

AlgoTree implements dot notation for tree navigation inspired by the dotsuite project’s approach to querying nested data structures.

Basic Dot Paths

from AlgoTree import dotmatch, dotexists, dotpluck

# Navigate paths like object properties
users = dotmatch(tree, "app.models.user")

# Check if path exists
if dotexists(tree, "app.config.database"):
    db_config = dotmatch(tree, "app.config.database")[0]

# Extract values from paths
values = dotpluck(tree, "app.version", "app.config.port")

Advanced Path Features

# Wildcards - match any single node
dotmatch(tree, "app.*.config")  # Any child's config

# Deep wildcards - match at any depth
dotmatch(tree, "app.**.test_*")  # All test files anywhere

# Attribute filters
dotmatch(tree, "app.*[type=model]")  # Children with type=model

# Predicate filters (JSONPath-like)
dotmatch(tree, "app.**[?(@.size > 100)]")  # Nodes with size > 100

# Array indexing and slicing
dotmatch(tree, "app.children[0]")  # First child
dotmatch(tree, "app.children[-1]")  # Last child
dotmatch(tree, "app.children[1:3]")  # Slice of children

# Regex matching
dotmatch(tree, "app.~test_.*")  # Regex pattern on names

# Fuzzy matching
dotmatch(tree, "app.%usr:0.8")  # Fuzzy match "usr" with 80% threshold

Filter Expressions

from AlgoTree import dotfilter

# Complex filter expressions
nodes = dotfilter(tree, "@.size > 100 and @.type == 'model'")

# Using functions
nodes = dotfilter(tree, "contains(@.name, 'test') and @.children.length > 0")

# Checking properties
leaves = dotfilter(tree, "@.is_leaf")
branches = dotfilter(tree, "not @.is_leaf and @.children.length > 2")

dotsuite Philosophy

Following dotsuite’s design principles:

  1. Depth - Address data with paths (dotmatch, dotexists)

  2. Truth - Query and validate (dotfilter, dotexists)

  3. Shape - Transform structures (future: dotmod, dotpipe)

Each operation answers exactly one question and composes cleanly with others, providing a minimal yet complete algebra for tree manipulation.

Command-Line Usage

The jt tool supports pattern matching:

# Find nodes matching a pattern
jt tree.json --match "container(*)"

# Filter by pattern and transform
jt tree.json --match "*[type=test]" --map "{'status': 'pending'}"

# Replace matching nodes
jt tree.json --replace-pattern "old_*" --with "new_node"

Performance Considerations

  • Pattern matching is optimized for common cases (name and attribute matching)

  • Deep wildcards can be expensive on large trees - use specific patterns when possible

  • Custom predicates are called for every node - keep them efficient

  • Consider using indices for frequently searched attributes in large trees

See Also