FlatForest Notebook

Introduction

In this notebook, we explore the FlatForest data structure, which is a forest structure with a flat (non-nested) data structure. It is a dict with special methods to access the root nodes and other tree-like operations.

The main implementation detail is the proxy class FlatForestNode, which allows us to access the dict with a node-centric abstraction. This allows us to implement tree-like operations in a flat data structure. As a proxy, it also modifies the dict in place, so it is a mutable data structure.

Creating a FlatForest

Let’s load the required libraries and create a FlatForest data structure using the node interface.

from AlgoTree.flat_forest_node import FlatForestNode
from AlgoTree.flat_forest import FlatForest
from AlgoTree.tree_converter import TreeConverter
from IPython.display import display, Markdown
from AlgoTree.pretty_tree import pretty_tree
import json
#from AlgoTree.treenode import TreeNode
from copy import deepcopy


def monotext(txt):
    display(Markdown(f"<pre>{txt}</pre>"))

data = {
    "1": { "data": 1, "parent": None},
    "2": { "parent": "1", "data": 2},
    "3": { "parent": "1", "data": 3},
    "4": { "parent": "3", "data": 4},
    "5": { "parent": "3", "data": 5},
    "A": { "data": "Data for A", "parent": None },
    "B": { "parent": "A", "data": "Data for B" },
    "C": { "parent": "A", "data": "Data for C" },
    "D": { "parent": "C", "data": "Data for D" },
    "E": { "parent": "C", "data": "Data for E" },
    "F": { "parent": "E", "data": "Data for F" },
    "G": { "parent": "E", "data": "Data for G" },
    "H": { "parent": "B", "data": "Data for H" },
    "I": { "parent": "A", "data": "Data for I" },
    "J": { "parent": "I", "data": "Data for J" },
    "K": { "parent": "G", "data": "Data for K" },
    "L": { "parent": "G", "data": "Data for L" },
    "M": { "parent": "C", "data": "Data for M" },
}

forest = FlatForest()
nodes = []
for key, value in data.items():
    par_key = value.pop("parent", None)
    nodes.append(FlatForestNode(name=key, parent=par_key, forest=forest, data=value["data"]))

for node in nodes:
    try:
        print(node.name, node.payload, node.parent.name if node.parent is not None else None)
    except ValueError as e:
        print(f"ValueError: {e}")
    except KeyError as e:
        print(f"KeyError: {e}")
    print()

Output

1 {'data': 1} None

2 {'data': 2} 1

3 {'data': 3} 1

4 {'data': 4} 3

5 {'data': 5} 3

A {'data': 'Data for A'} None

B {'data': 'Data for B'} A

C {'data': 'Data for C'} A

D {'data': 'Data for D'} C

E {'data': 'Data for E'} C

F {'data': 'Data for F'} E

G {'data': 'Data for G'} E

H {'data': 'Data for H'} B

I {'data': 'Data for I'} A

J {'data': 'Data for J'} I

K {'data': 'Data for K'} G

L {'data': 'Data for L'} G

M {'data': 'Data for M'} C

Storing and Transmitting Trees

It’s easy to regenerate any JSON files that may have been used to generate the FlatForest object. So, JSON is a good format for storing and transmitting trees. And, of course, FlatForest is a dictionary.

Note that when we load a dictionary, the tree is providing a view of it in-place. So, if we modify the dictionary, we modify the tree, and vice versa.

Creating a FlatForest from a JSON

# load a forest from data
forest2 = FlatForest(deepcopy(data))
# load a forest from a forest
forest3 = FlatForest(forest)

print(forest == forest2)

print(json.dumps(dict(forest), indent=2, sort_keys=True) == json.dumps(dict(forest2), indent=2, sort_keys=True))
print(json.dumps(dict(forest), indent=2, sort_keys=True) == json.dumps(dict(forest3), indent=2, sort_keys=True))
print(json.dumps(dict(forest2), indent=2, sort_keys=True) == json.dumps(dict(forest3), indent=2, sort_keys=True))
False
False
True
False
forest.logical_root_names()
[None, None, '__DETACHED__']
print(forest.detached)
FlatForestNode(name=__DETACHED__, parent=None, payload={}, root=__DETACHED__, children=[])
from copy import deepcopy
new_forest = deepcopy(forest)
new_forest.detach("1")
FlatForestNode(name=1, parent=None, payload={'data': 1}, root=1, children=['2', '3'])

If we try to detach a node that was already detached, we get a KeyError. Note that this can happen in two ways:

  1. The node was detached and then we try to detach it again.

  2. The node was detached and then we try to detach a child of it.

Any child of a detached node is also detached, so we can’t detach a child of a detached node.

try:
    new_forest.detach("2")
except KeyError as e:
    print(f"KeyError: {e}")
forest.as_tree()
FlatForestNode(name=__ROOT__, parent=None, payload={}, root=__ROOT__, children=['1', 'A'])
forest.preferred_root = "1"
print(pretty_tree(forest.subtree()))
forest.preferred_root = "A"
print(pretty_tree(forest.subtree()))
print(pretty_tree(forest.subtree("C")))
1
├───── 2
└───── 3
       ├───── 4
       └───── 5

A
├───── B
│      └───── H
├───── C
│      ├───── D
│      ├───── E
│      │      ├───── F
│      │      └───── G
│      │             ├───── K
│      │             └───── L
│      └───── M
└───── I
       └───── J

C
├───── D
├───── E
│      ├───── F
│      └───── G
│             ├───── K
│             └───── L
└───── M

For visualizing trees, we can use the PrettyTree class and the pretty_tree function. The PrettyTree class is a simple tree data structure that can be used to visualize trees in pretty text format, optionally with the ability to mark nodes for highlighting.

monotext(pretty_tree(forest.subtree("A"), mark=["A", "G"], node_details=lambda node: node.payload['data']))
A ◄ Data for A 🔵
├───── B ◄ Data for B
│      └───── H ◄ Data for H
├───── C ◄ Data for C
│      ├───── D ◄ Data for D
│      ├───── E ◄ Data for E
│      │      ├───── F ◄ Data for F
│      │      └───── G ◄ Data for G 🟣
│      │             ├───── K ◄ Data for K
│      │             └───── L ◄ Data for L
│      └───── M ◄ Data for M
└───── I ◄ Data for I
       └───── J ◄ Data for J
monotext(pretty_tree(forest.subtree("A"), mark=["H", "D"]))
A
├───── B
│      └───── H 🟡
├───── C
│      ├───── D ⭕
│      ├───── E
│      │      ├───── F
│      │      └───── G
│      │             ├───── K
│      │             └───── L
│      └───── M
└───── I
       └───── J
from pprint import pprint
from AlgoTree import utils
pprint(utils.node_stats(forest.subtree("C")))
{'node_info': {'ancestors': [],
               'children': ['D', 'E', 'M'],
               'depth': 0,
               'descendants': ['D', 'E', 'F', 'G', 'K', 'L', 'M'],
               'is_internal': True,
               'is_leaf': False,
               'is_root': True,
               'leaves_under': ['D', 'F', 'K', 'L', 'M'],
               'name': 'C',
               'parent': None,
               'path': ['C'],
               'payload': {'data': 'Data for C'},
               'root_distance': 0,
               'siblings': [],
               'type': "<class 'AlgoTree.flat_forest_node.FlatForestNode'>"},
 'subtree_info': {'height': 3,
                  'leaves': ['D', 'F', 'K', 'L', 'M'],
                  'root': 'C',
                  'size': 8}}

The FlatForest class provides a view of a dict object as a forest. We do not modify the dict passed into it (and you can create a dict through the FlatForest API). Since it’s just a view of a dict we have all the normal operations on it that we would have on a dict object.

FlatForest also implements the concept of a node, which is a view of a particular node in our node-centric API. In order to do this, we specify a preferred root node, which by default is the first root node in the forest. This is the node that will be used as the root node in the FlatForestNode API. If you want to change the root node, you can do so by calling FlatForest.preferred_root with the name of the node you want to be the preferred root.

We also provide as as_tree method that unifies any dict object representing a flat forest structure into a flat forest structure with just a single root node, where all the root nodes are children of this root node. This is no longer a view, however, as we return a new dict object.

print(forest["C"])
C = forest.subtree("C")
print(C)
print(C["parent"])
print(C.children)
{'data': 'Data for C', 'parent': 'A'}
FlatForestNode(name=C, parent=None, payload={'data': 'Data for C'}, root=C, children=['D', 'E', 'M'])
A
[FlatForestNode(name=D, parent=C, payload={'data': 'Data for D'}, root=C, children=[]), FlatForestNode(name=E, parent=C, payload={'data': 'Data for E'}, root=C, children=['F', 'G']), FlatForestNode(name=M, parent=C, payload={'data': 'Data for M'}, root=C, children=[])]
N = forest.root.add_child(name="N", data="Data for N")
print(N)
forest.subtree("A").add_child(name="O", data="Data for O")
monotext(pretty_tree(forest.root.node("A"), mark=["O"]))
FlatForestNode(name=N, parent=A, payload={'data': 'Data for N'}, root=A, children=[])
A
├───── B
│      └───── H
├───── C
│      ├───── D
│      ├───── E
│      │      ├───── F
│      │      └───── G
│      │             ├───── K
│      │             └───── L
│      └───── M
├───── I
│      └───── J
├───── N
└───── O 🟡

If we try too add a non-unique node key to the tree, we will get a KeyError.

try:
    forest.subtree("A").add_child(name="B")
except KeyError as e:
    print(e)
'key already exists in the tree: B'

Let’s add some more nodes.

P = N.add_child(name="P", data="Data for P")
N.add_child(name="Q", data="Data for Q")
P.add_child(name="R", data="Data for R").add_child(
    name="S", data="Data for S"
)
monotext(pretty_tree(forest.root, mark=["N", "P", "Q", "R", "S"]))

print(forest.root.node("A"))
print(forest.root.node("A").parent)
A
├───── B
│      └───── H
├───── C
│      ├───── D
│      ├───── E
│      │      ├───── F
│      │      └───── G
│      │             ├───── K
│      │             └───── L
│      └───── M
├───── I
│      └───── J
├───── N ⭕
│      ├───── P 🟤
│      │      └───── R 🔵
│      │             └───── S 🟤
│      └───── Q 🔘
└───── O
FlatForestNode(name=A, parent=None, payload={'data': 'Data for A'}, root=A, children=['B', 'C', 'I', 'N', 'O'])
None
f_nodes = utils.breadth_first_undirected(forest.node("F"), 2)
print([n.name for n in f_nodes])
['F', 'E', 'G', 'C']
monotext(pretty_tree(utils.subtree_rooted_at(forest.node("C"), 2), mark=["C"]))
C ⭕
├───── D
├───── E
│      ├───── F
│      └───── G
└───── M
center_C = utils.subtree_centered_at(forest.node("C"), 2)
monotext(pretty_tree(center_C, mark=["C"]))
monotext(pretty_tree(forest.root, mark=["C"]))
A
├───── B
├───── C ⭕
│      ├───── D
│      ├───── E
│      │      ├───── F
│      │      └───── G
│      └───── M
├───── I
├───── N
└───── O
A
├───── B
│      └───── H
├───── C ⭕
│      ├───── D
│      ├───── E
│      │      ├───── F
│      │      └───── G
│      │             ├───── K
│      │             └───── L
│      └───── M
├───── I
│      └───── J
├───── N
│      ├───── P
│      │      └───── R
│      │             └───── S
│      └───── Q
└───── O

We also support conversions to and from any tree-like structure that supports the node-centric API, including FlatForest and a simple (but far more flexible) TreeNode class that we also provide for illustrative purposes.

The function is called TreeConverter.copy_under which accepts a source and target object, and copies the source object under the target object. The source is normally a node of some kind, and the target is another node, and the result is the tree structure under the source node is copied under the target node. The source node is not modified in any way.

from AlgoTree.treenode import TreeNode
treeNodeMe = TreeNode(name="treenode", payload={"data": "Data for treenode"})
treeNodeMe.add_child(name="child1", payload={"data": "Data for child1"})
treeNodeMe.add_child(name="child2", payload={"data": "Data for child2"})
tree1 = TreeConverter.copy_under(forest.subtree("C"), treeNodeMe.children[0])
print(pretty_tree(tree1.root, mark=["child1"]))
treenode
├───── child1 🟡
│      └───── C
│             ├───── D
│             ├───── E
│             │      ├───── F
│             │      └───── G
│             │             ├───── K
│             │             └───── L
│             └───── M
└───── child2
tree2 = TreeConverter.copy_under(tree1.root, forest.subtree("D"), node_name=lambda n: n.name + "_2")
monotext(pretty_tree(tree2.forest.root))
A
├───── B
│      └───── H
├───── C
│      ├───── D
│      │      └───── treenode_2
│      │             ├───── child1_2
│      │             │      └───── C_2
│      │             │             ├───── D_2
│      │             │             ├───── E_2
│      │             │             │      ├───── F_2
│      │             │             │      └───── G_2
│      │             │             │             ├───── K_2
│      │             │             │             └───── L_2
│      │             │             └───── M_2
│      │             └───── child2_2
│      ├───── E
│      │      ├───── F
│      │      └───── G
│      │             ├───── K
│      │             └───── L
│      └───── M
├───── I
│      └───── J
├───── N
│      ├───── P
│      │      └───── R
│      │             └───── S
│      └───── Q
└───── O

We can iterate over the items of the child and we can modify/delete its data.

for k, v in forest.subtree("N").items():
    print(k, "<--", v)

N["new_data"] = "Some new data for G"
print(N)

del N["new_data"]
N["other_new_data"] = "Some other data for G"
print(N)
data <-- Data for N
parent <-- A
FlatForestNode(name=N, parent=A, payload={'data': 'Data for N', 'new_data': 'Some new data for G'}, root=A, children=['P', 'Q'])
FlatForestNode(name=N, parent=A, payload={'data': 'Data for N', 'other_new_data': 'Some other data for G'}, root=A, children=['P', 'Q'])

Let’s create a tree from a dictionary that refers to a non-existent parent.

try:
    non_existent_parent_tree = FlatForest(
        {
            "A": {
                "parent": "non_existent_parent",
                "data": "Data for A",
            }
        }
    )
    FlatForest.check_valid(non_existent_parent_tree)
except KeyError as e:
    print(e)
"Parent 'non_existent_parent' not in forest for node 'A'"

We see that the node is disconnected from the logical root, since it refers to a non-existent parent.

try:
    cycle_tree = FlatForest(
        {
            "x": {"parent": None, "data": "Data for x"},
            "A": {"parent": "C", "data": "Data for A"},
            "B": {"parent": "A", "data": "Data for B"},
            "C": {"parent": "B", "data": "Data for C"},
            "D": {"parent": "x", "data": "Data for D"},
        }
    )

    monotext(pretty_tree(cycle_tree.root))
    FlatForest.check_valid(cycle_tree)
except ValueError as e:
    print(e)
x
└───── D
Cycle detected: {'C', 'A', 'B'}

We see that the tree was in an invalid state. In particular, nodes 1, 2, and 3 are from any root and in a cycle. We can fix this by breaking the cycle and setting the parent of node 3 to, for instance, to x. However, we can also fix it by setting the parent to None, so that it is a seperate tree in the forest.

cycle_tree["C"]["parent"] = None
FlatForest.check_valid(cycle_tree)
monotext(pretty_tree(cycle_tree.subtree("C"), mark=["C"]))
print(cycle_tree.root_names)
C ⭕
└───── A
       └───── B
['x', 'C', None, None, '__DETACHED__']

Let’s look at the tree again, and see about creating a cycle.

We will make node 1 the parent of node 5, to create a cycle:

try:
    new_tree = deepcopy(forest.root)
    new_tree.node("A")["parent"] = "E"
    FlatForest.check_valid(new_tree)
except ValueError as e:
    print(e)
Data is not a dictionary: data=FlatForestNode(name=A, parent=None, payload={'data': 'Data for A'}, root=A, children=['B', 'C', 'I', 'N', 'O'])

Notice that we use deepcopy to avoid modifying the original tree with these invalid operations. We chose to do it this way so as to not incur the overhead of reverting the tree to a valid state after an invalid operation. This way, we can keep the tree in an invalid state for as long as we want, and only revert it to a valid state when we want to.

Each node is a key-value pair in the FlatForest. We have the FlatForestNode so that we can have an API focused on the nodes and not the underlying dictionary. However, we stiill permit access to the underlying dictionary. When you modify the tree in this way, we still maintain the integrity of the tree.

Since the FlatForest represents nodes as key-value pairs, and the value may have a parent key, along with any other arbitrary data, each value for a node must be a dictionary.

Below, we see that trying to add a test node with a non-dictionary value generates an error.

try:
    error_tree = deepcopy(forest)
    # this will raise a ValueError because the node with key `test` maps to
    # string instead of a dict.
    error_tree["test"] = "Some test data"
    FlatForest.check_valid(error_tree)
except ValueError as e:
    print(e)
Node 'test' does not have a payload dictionary: 'Some test data'

Let’s manipulate the tree a bit more using the dict API. We’re just going to add a new_node with some data.

forest["T"] = {
    "parent": "B",
    "data": "Data for T"
}

print(forest.node("T"))
print(pretty_tree(forest.subtree("B"), mark=["T"]))
FlatForestNode(name=T, parent=B, payload={'data': 'Data for T'}, root=A, children=[])
B
├───── H
└───── T 🔵

Logical roots are not a part of the underlying dictionary, so we can’t access it through the dict API. It’s non-children data are also immutable through the FlatForestNode API. Right now, we use FlatForest.DETACHED_KEY as a logical root for detached nodes.

print(forest.detached)
FlatForestNode(name=__DETACHED__, parent=None, payload={}, root=__DETACHED__, children=[])

We see that there are no detached nodes in the forest right now.

try:
    forest.detached["data"] = "Some new data for root node"
except TypeError as e:
    print(e)

try:
    forest.detached["parent"] = None
except TypeError as e:
    print(e)
__DETACHED__ is an immutable logical root
__DETACHED__ is an immutable logical root

We can detach nodes. Let’s first view the full tree, pre-detachment.

monotext(pretty_tree(forest.root))
A
├───── B
│      ├───── H
│      └───── T
├───── C
│      ├───── D
│      │      └───── treenode_2
│      │             ├───── child1_2
│      │             │      └───── C_2
│      │             │             ├───── D_2
│      │             │             ├───── E_2
│      │             │             │      ├───── F_2
│      │             │             │      └───── G_2
│      │             │             │             ├───── K_2
│      │             │             │             └───── L_2
│      │             │             └───── M_2
│      │             └───── child2_2
│      ├───── E
│      │      ├───── F
│      │      └───── G
│      │             ├───── K
│      │             └───── L
│      └───── M
├───── I
│      └───── J
├───── N
│      ├───── P
│      │      └───── R
│      │             └───── S
│      └───── Q
└───── O
forest.node("D").detach()
forest.detach("G")
monotext(pretty_tree(forest.root))
A
├───── B
│      ├───── H
│      └───── T
├───── C
│      ├───── E
│      │      └───── F
│      └───── M
├───── I
│      └───── J
├───── N
│      ├───── P
│      │      └───── R
│      │             └───── S
│      └───── Q
└───── O

Let’s view the detached tree.

monotext(pretty_tree(forest.detached, mark=["B", "C"]))
__DETACHED__
├───── D
│      └───── treenode_2
│             ├───── child1_2
│             │      └───── C_2
│             │             ├───── D_2
│             │             ├───── E_2
│             │             │      ├───── F_2
│             │             │      └───── G_2
│             │             │             ├───── K_2
│             │             │             └───── L_2
│             │             └───── M_2
│             └───── child2_2
└───── G
       ├───── K
       └───── L

We can purge detached nodes (and their descendants) from the tree with the purge method. Let’s purge the detached nodes. Note that when we do this, through the node-centric API, nothing will seem different (unless we look at the tree rooted at the detached logical root). However, if we look at the underlying dictionary, we will see that the detached nodes are gone.

forest.purge()
print(json.dumps(forest, indent=2))
{
  "1": {
    "data": 1,
    "parent": null
  },
  "2": {
    "data": 2,
    "parent": "1"
  },
  "3": {
    "data": 3,
    "parent": "1"
  },
  "4": {
    "data": 4,
    "parent": "3"
  },
  "5": {
    "data": 5,
    "parent": "3"
  },
  "A": {
    "data": "Data for A",
    "parent": null
  },
  "B": {
    "data": "Data for B",
    "parent": "A"
  },
  "C": {
    "data": "Data for C",
    "parent": "A"
  },
  "E": {
    "data": "Data for E",
    "parent": "C"
  },
  "F": {
    "data": "Data for F",
    "parent": "E"
  },
  "H": {
    "data": "Data for H",
    "parent": "B"
  },
  "I": {
    "data": "Data for I",
    "parent": "A"
  },
  "J": {
    "data": "Data for J",
    "parent": "I"
  },
  "M": {
    "data": "Data for M",
    "parent": "C"
  },
  "N": {
    "data": "Data for N",
    "parent": "A",
    "other_new_data": "Some other data for G"
  },
  "O": {
    "data": "Data for O",
    "parent": "A"
  },
  "P": {
    "data": "Data for P",
    "parent": "N"
  },
  "Q": {
    "data": "Data for Q",
    "parent": "N"
  },
  "R": {
    "data": "Data for R",
    "parent": "P"
  },
  "S": {
    "data": "Data for S",
    "parent": "R"
  },
  "T": {
    "parent": "B",
    "data": "Data for T"
  }
}

We have a fairly complete API for manipulating the forest. Let’s explore some additional methods. Let’s first create a node itrator to node A, and then access or modify the payload data for node A. Since payload data is mutable, and it must be a dictionary, we can access or modify it using the dict API.

forest.node("A").clear()
forest.node("A")["new_data"] = "Some new data for A"
forest.node("A")["other_new_data"] = "Some other data for A"
print(forest["A"])
{'new_data': 'Some new data for A', 'other_new_data': 'Some other data for A'}

This is fairly self-expalanatory.

Let’s add some more nodes without specifying a key name for them, since often we don’t care about the key name and it’s only for bookkeeping purposes.

forest.root.add_child(whatever=3).add_child(
    name="U", whatever=4).add_child(whatever=5)
FlatForestNode(name=779cc759-36a9-4dae-a1c9-e021d65aa1d4, parent=U, payload={'whatever': 5}, root=A, children=[])
FlatForestNode(whatever=1000, parent=forest.root.children[0])
FlatForestNode(name="V", whatever=2000, parent=forest.root.children[0].children[1])
FlatForestNode(whatever=3000, more_data="yes", parent=forest.node("V"))
FlatForestNode(name="W", parent=forest.root, whatever=200)
FlatForestNode(name=W, parent=A, payload={'whatever': 200}, root=A, children=[])
forest.node("V").parent = forest.node("W")
monotext(pretty_tree(forest.root, mark=["U", "V", "W"], node_details=lambda n: n.payload))
A ◄ {'new_data': 'Some new data for A', 'other_new_data': 'Some other data for A'}
├───── B ◄ {'data': 'Data for B'}
│      ├───── H ◄ {'data': 'Data for H'}
│      ├───── T ◄ {'data': 'Data for T'}
│      └───── d433395a-27bc-457d-b6d9-c7034d020978 ◄ {'whatever': 1000}
├───── C ◄ {'data': 'Data for C'}
│      ├───── E ◄ {'data': 'Data for E'}
│      │      └───── F ◄ {'data': 'Data for F'}
│      └───── M ◄ {'data': 'Data for M'}
├───── I ◄ {'data': 'Data for I'}
│      └───── J ◄ {'data': 'Data for J'}
├───── N ◄ {'data': 'Data for N', 'other_new_data': 'Some other data for G'}
│      ├───── P ◄ {'data': 'Data for P'}
│      │      └───── R ◄ {'data': 'Data for R'}
│      │             └───── S ◄ {'data': 'Data for S'}
│      └───── Q ◄ {'data': 'Data for Q'}
├───── O ◄ {'data': 'Data for O'}
├───── 945ef525-fece-45f0-a499-b5449d28ef1e ◄ {'whatever': 3}
│      └───── U ◄ {'whatever': 4} 🔴
│             └───── 779cc759-36a9-4dae-a1c9-e021d65aa1d4 ◄ {'whatever': 5}
└───── W ◄ {'whatever': 200} ⚫
       └───── V ◄ {'whatever': 2000} 🔘
              └───── 14e057df-8004-47c3-9415-b7c7235ea4d8 ◄ {'whatever': 3000, 'more_data': 'yes'}

Let’s look at some tree conversions. We can convert between different tree representations and data structures.

new_tree = TreeConverter.convert(forest.root, TreeNode)
monotext(pretty_tree(new_tree, node_details=lambda n: n.payload))
A ◄ {'new_data': 'Some new data for A', 'other_new_data': 'Some other data for A'}
├───── B ◄ None
│      ├───── H ◄ None
│      ├───── T ◄ None
│      └───── d433395a-27bc-457d-b6d9-c7034d020978 ◄ None
├───── C ◄ None
│      ├───── E ◄ None
│      │      └───── F ◄ None
│      └───── M ◄ None
├───── I ◄ None
│      └───── J ◄ None
├───── N ◄ None
│      ├───── P ◄ None
│      │      └───── R ◄ None
│      │             └───── S ◄ None
│      └───── Q ◄ None
├───── O ◄ None
├───── 945ef525-fece-45f0-a499-b5449d28ef1e ◄ None
│      └───── U ◄ None
│             └───── 779cc759-36a9-4dae-a1c9-e021d65aa1d4 ◄ None
└───── W ◄ None
       └───── V ◄ None
              └───── 14e057df-8004-47c3-9415-b7c7235ea4d8 ◄ None

We see that it’s a different type of tree, a TreeNode, which is a recursive data structure. It models the same tree data, but in a different way. This one is also more flexible, so that it doesn’t require unique names or the payload data to be a dictionary - it can be any object or value. This simplicity comes at the cost of not being a dictionary (or view of a dictionary), as FlatForest does.

We see that it has a very different structure. However, when we pretty-print it using TreeViz, we see that it’s the same tree.

monotext(pretty_tree(forest.root))
monotext(pretty_tree(new_tree))
A
├───── B
│      ├───── H
│      ├───── T
│      └───── d433395a-27bc-457d-b6d9-c7034d020978
├───── C
│      ├───── E
│      │      └───── F
│      └───── M
├───── I
│      └───── J
├───── N
│      ├───── P
│      │      └───── R
│      │             └───── S
│      └───── Q
├───── O
├───── 945ef525-fece-45f0-a499-b5449d28ef1e
│      └───── U
│             └───── 779cc759-36a9-4dae-a1c9-e021d65aa1d4
└───── W
       └───── V
              └───── 14e057df-8004-47c3-9415-b7c7235ea4d8
A
├───── B
│      ├───── H
│      ├───── T
│      └───── d433395a-27bc-457d-b6d9-c7034d020978
├───── C
│      ├───── E
│      │      └───── F
│      └───── M
├───── I
│      └───── J
├───── N
│      ├───── P
│      │      └───── R
│      │             └───── S
│      └───── Q
├───── O
├───── 945ef525-fece-45f0-a499-b5449d28ef1e
│      └───── U
│             └───── 779cc759-36a9-4dae-a1c9-e021d65aa1d4
└───── W
       └───── V
              └───── 14e057df-8004-47c3-9415-b7c7235ea4d8
result = TreeConverter.copy_under(new_tree, FlatForestNode(name="new_root"))
monotext(pretty_tree(result))
result2 = TreeConverter.copy_under(result, new_tree)
monotext(pretty_tree(result2))
A
├───── B
│      ├───── H
│      ├───── T
│      └───── d433395a-27bc-457d-b6d9-c7034d020978
├───── C
│      ├───── E
│      │      └───── F
│      └───── M
├───── I
│      └───── J
├───── N
│      ├───── P
│      │      └───── R
│      │             └───── S
│      └───── Q
├───── O
├───── 945ef525-fece-45f0-a499-b5449d28ef1e
│      └───── U
│             └───── 779cc759-36a9-4dae-a1c9-e021d65aa1d4
└───── W
       └───── V
              └───── 14e057df-8004-47c3-9415-b7c7235ea4d8
A
├───── B
│      ├───── H
│      ├───── T
│      └───── d433395a-27bc-457d-b6d9-c7034d020978
├───── C
│      ├───── E
│      │      └───── F
│      └───── M
├───── I
│      └───── J
├───── N
│      ├───── P
│      │      └───── R
│      │             └───── S
│      └───── Q
├───── O
├───── 945ef525-fece-45f0-a499-b5449d28ef1e
│      └───── U
│             └───── 779cc759-36a9-4dae-a1c9-e021d65aa1d4
└───── W
       └───── V
              └───── 14e057df-8004-47c3-9415-b7c7235ea4d8

The TreeNode is a bit more useful for operations that require recursion, but any tree can support the sae operations. The TreeNode is a bit more specialized for this purpose, and the FlatTree is a bit more specialized for more general storage and manipulation of data that is tree-like, such as configuration data or log data. See TreeNode.md for more information on the TreeNode class.

root = TreeNode(name="root", payload= {"value":0}, parent=None)
A = TreeNode(name="A", payload={"value":1}, parent=root)
print(root.children)
[TreeNode(name=A, parent=root, root=root, payload={'value': 1}, len(children)=0)]
root2 = TreeNode(name="root", payload=0)
A2 = TreeNode(name="A", parent=root2, payload=1)
B2 = TreeNode(name="B", parent=root2, payload=2)
C2 = TreeNode(name="C", parent=root2, payload=3)
D2 = TreeNode(name="D", parent=C2, payload=4)
E2 = TreeNode(name="E", parent=C2, payload=5)
F2 = TreeNode(name="F", parent=C2, payload="test")
G2 = TreeNode(name="G", parent=C2, payload=7)
H2 = TreeNode(name="H", parent=C2, payload=({1: 2}, [3, 4]))
I2 = TreeNode(name="I", parent=F2, payload=9)
monotext(pretty_tree(root2, node_details=lambda n: n.payload))
root ◄ 0
├───── A ◄ 1
├───── B ◄ 2
└───── C ◄ 3
       ├───── D ◄ 4
       ├───── E ◄ 5
       ├───── F ◄ test
       │      └───── I ◄ 9
       ├───── G ◄ 7
       └───── H ◄ ({1: 2}, [3, 4])

Algorithm Examples

Using utility algorithms with FlatTree and FlatTreeNode:

Finding descendants of a node:

from AlgoTree.utils import *
from pprint import pprint
pprint(descendants(C))
[FlatForestNode(name=E, parent=C, payload={'data': 'Data for E'}, root=C, children=['F']),
 FlatForestNode(name=F, parent=E, payload={'data': 'Data for F'}, root=C, children=[]),
 FlatForestNode(name=M, parent=C, payload={'data': 'Data for M'}, root=C, children=[])]

Finding ancestors of a node:

pprint(ancestors(I2))
[TreeNode(name=F, parent=C, root=root, payload=test, len(children)=1),
 TreeNode(name=C, parent=root, root=root, payload=3, len(children)=5),
 TreeNode(name=root, root=root, payload=0, len(children)=3)]

Finding siblings of a node:

pprint(siblings(E2))
[]

Finding leaves of a node:

pprint(leaves(root2))
[TreeNode(name=A, parent=root, root=root, payload=1, len(children)=0),
 TreeNode(name=B, parent=root, root=root, payload=2, len(children)=0),
 TreeNode(name=D, parent=C, root=root, payload=4, len(children)=0),
 TreeNode(name=E, parent=C, root=root, payload=5, len(children)=0),
 TreeNode(name=I, parent=F, root=root, payload=9, len(children)=0),
 TreeNode(name=G, parent=C, root=root, payload=7, len(children)=0),
 TreeNode(name=H, parent=C, root=root, payload=({1: 2}, [3, 4]), len(children)=0)]

Finding the height of a tree:

pprint(height(root2))
3

Finding the depth of a node:

pprint(depth(F2))
2

Breadth-first traversal:

def print_node(node, level):
    print(f"Level {level}: {node.name}")
    return False

breadth_first(root2, print_node)
Level 0: root
Level 1: A
Level 1: B
Level 1: C
Level 2: D
Level 2: E
Level 2: F
Level 2: G
Level 2: H
Level 3: I
False
print(json.dumps(utils.node_stats(forest.node("N")), indent=2))
{
  "node_info": {
    "type": "<class 'AlgoTree.flat_forest_node.FlatForestNode'>",
    "name": "N",
    "payload": {
      "data": "Data for N",
      "other_new_data": "Some other data for G"
    },
    "children": [
      "P",
      "Q"
    ],
    "parent": "A",
    "depth": 1,
    "is_root": false,
    "is_leaf": false,
    "is_internal": true,
    "ancestors": [
      "A"
    ],
    "siblings": [
      "B",
      "C",
      "I",
      "O",
      "945ef525-fece-45f0-a499-b5449d28ef1e",
      "W"
    ],
    "descendants": [
      "P",
      "R",
      "S",
      "Q"
    ],
    "path": [
      "A",
      "N"
    ],
    "root_distance": 1,
    "leaves_under": [
      "S",
      "Q"
    ]
  },
  "subtree_info": {
    "leaves": [
      "H",
      "T",
      "d433395a-27bc-457d-b6d9-c7034d020978",
      "F",
      "M",
      "J",
      "S",
      "Q",
      "O",
      "779cc759-36a9-4dae-a1c9-e021d65aa1d4",
      "14e057df-8004-47c3-9415-b7c7235ea4d8"
    ],
    "height": 3,
    "root": "A",
    "size": 5
  }
}

Mapping a function over the nodes:

def add_prefix(node):
    if node is None:
        return None
    elif node.name == "D":
        # add Q and R as children of D
        node.add_child(name="Q", value=41)
        node.add_child(name="R", value=42)
    elif node.name == "I" or node.name == "W":
        # delete I by returning None (i.e. don't add it to the new tree)
        return None
    elif "U" in [child.name for child in node.children]:
        return None
    return node

root_mapped = map(deepcopy(forest.root), add_prefix)

monotext(pretty_tree(root_mapped))

monotext(pretty_tree(forest.root))
A
├───── B
│      ├───── H
│      ├───── T
│      └───── d433395a-27bc-457d-b6d9-c7034d020978
├───── C
│      ├───── E
│      │      └───── F
│      └───── M
├───── N
│      ├───── P
│      │      └───── R
│      │             └───── S
│      └───── Q
└───── O
A
├───── B
│      ├───── H
│      ├───── T
│      └───── d433395a-27bc-457d-b6d9-c7034d020978
├───── C
│      ├───── E
│      │      └───── F
│      └───── M
├───── I
│      └───── J
├───── N
│      ├───── P
│      │      └───── R
│      │             └───── S
│      └───── Q
├───── O
├───── 945ef525-fece-45f0-a499-b5449d28ef1e
│      └───── U
│             └───── 779cc759-36a9-4dae-a1c9-e021d65aa1d4
└───── W
       └───── V
              └───── 14e057df-8004-47c3-9415-b7c7235ea4d8

Pruning nodes based on a predicate:

def should_prune(node):
    return node.name == "A"

monotext(pretty_tree(root2))
pruned_tree = prune(root2, should_prune)
monotext(pretty_tree(pruned_tree))
root
├───── A
├───── B
└───── C
       ├───── D
       ├───── E
       ├───── F
       │      └───── I
       ├───── G
       └───── H
root
├───── B
└───── C
       ├───── D
       ├───── E
       ├───── F
       │      └───── I
       ├───── G
       └───── H

Finding root-to-leaf paths:

from pprint import pprint
paths = node_to_leaf_paths(root)
# print max path length from root to leaf
pprint(max(paths, key=len))
print(utils.height(root) == len(max(paths, key=len)) - 1)
[TreeNode(name=root, root=root, payload={'value': 0}, len(children)=1),
 TreeNode(name=A, parent=root, root=root, payload={'value': 1}, len(children)=0)]
True

Converting paths to a tree:

rooter = paths_to_tree([["a", "b", "c"], ["a", "b", "d"], ["a", "e", "d"],
                        ["a", "f", "d"], ["a", "e", "g" ], ["a", "e", "h"],
                        ["a", "i", "j", "b"], ["a", "i", "j", "b", "m"],
                        ["a", "i", "j", "l", "b", "b", "b", "b", "b", "b", "t", "u", "v", "w", "x", "y", "b"]],
                        FlatForestNode)
monotext(pretty_tree(rooter))
a
├───── b
│      ├───── c
│      └───── d
├───── e
│      ├───── d_0
│      ├───── g
│      └───── h
├───── f
│      └───── d_1
└───── i
       └───── j
              ├───── b_0
              │      └───── m
              └───── l
                     └───── b_1
                            └───── b_2
                                   └───── b_3
                                          └───── b_4
                                                 └───── b_5
                                                        └───── b_6
                                                               └───── t
                                                                      └───── u
                                                                             └───── v
                                                                                    └───── w
                                                                                           └───── x
                                                                                                  └───── y
                                                                                                         └───── b_7
from AlgoTree.utils import depth, path, ancestors, siblings, is_root
A = rooter.node("i")
pretty_tree(A)
print(depth(A.children[0]))
print([n.name for n in path(A.children[0].children[0])])
print([n.name for n in ancestors(A.children[0].children[0])])
print(siblings(A.children[0]))
print(is_root(A))
2
['a', 'i', 'j', 'b_0']
['j', 'i', 'a']
[]
False
treenode = TreeNode(name="A")
TreeNode(name="B", parent=treenode)
C = TreeNode(name="C", parent=treenode)
TreeNode(name="D", parent=C)
TreeNode(name="E", parent=C)

monotext(pretty_tree(treenode))
A
├───── B
└───── C
       ├───── D
       └───── E
treenode_dict = {
    "name": "A",
    "value": 1,
    "children": [
        {"name": "B"},
        {"name": "C", "children": [
            {"name": "D"},
            {"name": "E"}
        ]}
    ]}

print(json.dumps(treenode_dict, indent=2))
print(json.dumps(treenode.to_dict(), indent=2))
{
  "name": "A",
  "value": 1,
  "children": [
    {
      "name": "B"
    },
    {
      "name": "C",
      "children": [
        {
          "name": "D"
        },
        {
          "name": "E"
        }
      ]
    }
  ]
}
{
  "name": "A",
  "payload": null,
  "children": [
    {
      "name": "B",
      "payload": null,
      "children": []
    },
    {
      "name": "C",
      "payload": null,
      "children": [
        {
          "name": "D",
          "payload": null,
          "children": []
        },
        {
          "name": "E",
          "payload": null,
          "children": []
        }
      ]
    }
  ]
}
treenode_from_dict = TreeNode.from_dict(treenode_dict)
monotext(pretty_tree(treenode_from_dict))
print(treenode_from_dict == treenode)
A
├───── B
└───── C
       ├───── D
       └───── E
True

Conclusion

The FlatForest class provides a powerful and flexible way to work with tree-like and forest-like data structures using a flat dictionary structure. It supports a wide range of operations, including node manipulation, tree traversal, detachment, pruning, and conversion between different tree representations.

Explore the AlgoTree package further to discover more features and utilities for working with trees in Python.