Expression Trees and Evaluation =============================== We are going to explore the idea of expression trees and how they relate to our tree structures, namely ``TreeNode``, and to evaluate the expression trees by rewriting the nodes in post-order traversal. First, let’s define our expression tree. .. code:: ipython3 from AlgoTree.treenode import TreeNode from copy import deepcopy import json # Define the expression tree expr = TreeNode.from_dict( { "value": "+", "type": "op", "children": [ { "value": "max", "type": "op", "children": [ { "value": "+", "type": "op", "children": [ {"type": "var", "value": "x"}, {"type": "const", "value": 1}, ], }, {"type": "const", "value": 0}, ], }, { "type": "op", "value": "+", "children": [ { "type": "op", "value": "max", "children": [ {"type": "var", "value": "x"}, {"type": "var", "value": "y"}, ], }, {"type": "const", "value": 3}, {"type": "var", "value": "y"}, ], }, ], } ) .. code:: ipython3 # Print the expression tree in JSON format print(json.dumps(expr.to_dict(), indent=4)) .. parsed-literal:: { "name": "0a5c451b-cad1-48e9-9852-aec46058acda", "payload": { "value": "+", "type": "op" }, "children": [ { "name": "78baf6c2-afbd-4bb4-90a4-a08e5a3fc4e0", "payload": { "value": "max", "type": "op" }, "children": [ { "name": "fc106112-b614-467f-adff-c0c917ab03e5", "payload": { "value": "+", "type": "op" }, "children": [ { "name": "30a87f4c-b4be-4217-8a24-0f0d258b2a25", "payload": { "type": "var", "value": "x" }, "children": [] }, { "name": "41d8adc8-1f73-409a-ab68-ea98d53a1c56", "payload": { "type": "const", "value": 1 }, "children": [] } ] }, { "name": "5a8df89c-c833-4aae-8749-5ce307424675", "payload": { "type": "const", "value": 0 }, "children": [] } ] }, { "name": "1d223ec4-9daf-45f4-b2a8-9db394571b83", "payload": { "type": "op", "value": "+" }, "children": [ { "name": "dde50326-3af8-4544-9551-9aa8123ac2bd", "payload": { "type": "op", "value": "max" }, "children": [ { "name": "f4057e80-3e19-477d-9b38-137b3a08d31e", "payload": { "type": "var", "value": "x" }, "children": [] }, { "name": "f633d8d3-2a2e-474c-a44d-5da441854dcb", "payload": { "type": "var", "value": "y" }, "children": [] } ] }, { "name": "1549bcb8-2ad7-45eb-a8f1-eef6aa53bbec", "payload": { "type": "const", "value": 3 }, "children": [] }, { "name": "7dc9523b-2ab5-4fef-8424-5f189f2e85f2", "payload": { "type": "var", "value": "y" }, "children": [] } ] } ] } Visualizing the Tree Structure ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We can use the class ``PrettyTree`` (and a standalone ``pretty_tree`` function based on that class) to visualize the tree structure in a more human-readable way. .. code:: ipython3 from AlgoTree.tree_converter import TreeConverter from AlgoTree.pretty_tree import pretty_tree .. code:: ipython3 print(pretty_tree(expr, node_name=lambda x: x.payload["value"])) .. parsed-literal:: + ├───── max │ ├───── + │ │ ├───── x │ │ └───── 1 │ └───── 0 └───── + ├───── max │ ├───── x │ └───── y ├───── 3 └───── y Post-order Traversal ~~~~~~~~~~~~~~~~~~~~ As a tree structure, ``TreeNode`` implements an interface that permits tree traversal algorithms like depth-first pre-order and post-order traversals. We are going to implement a simple post-order traversal algorithm to permit computation of the expression tree we defined earlier, ``expr``. We see that it contains three operator types, ``+``, ``*``, and ``max``, as well as numbers and variables. We will provide a **closure** over all of these types so that when we evaluate the expression in post-order, all of the types are defined for the operations. .. code:: ipython3 def postorder(node, fn, ctx): """ Applies function `fn` to the nodes in the tree using post-order traversal. :param fn: Function to apply to each node. Should accept one argument: the node. :param ctx: Context passed to the function. :return: The tree with the function `fn` applied to its nodes. """ results = [] for child in node.children: result = postorder(child, fn, ctx) if result is not None: results.append(result) node.children = results return fn(node, ctx) The function ``postorder`` takes a tree node ``node``, a function ``fn``, and a context ``ctx``, and returns a rewritten tree. At each node, ``postorder`` recursively calls ``fn`` on its children before applying ``fn`` to the node itself. This is the essence of post-order traversal. Post-order is useful for problems where the children need to be processed before the node itself. For example, evaluating an expression tree, where typically the value of a node can only be computed after the values of its children are known. In contrast, pre-order traversal applies ``fn`` to the node before applying it to the children. Pre-order may be useful for tasks such as rewriting the tree in a different form, like algebraic simplification. Expression Tree Evaluator ~~~~~~~~~~~~~~~~~~~~~~~~~ We will now design a simple expression tree evaluator, ``Eval``. .. code:: ipython3 class Eval: """ An evaluator for expressions defined by operations on types, respectively defined by `Eval.Op` and `Eval.Type`. The operations are a dictionary where the keys are the operation names and the values are functions that take a node and a context and return the value of the operation in that context. """ Op = { "+": lambda x: sum(x), "max": lambda x: max(x), } Type = { "const": lambda node, _: node.payload["value"], "var": lambda node, ctx: ctx[node.payload["value"]], "op": lambda node, _: Eval.Op[node.payload["value"]]( [c.payload["value"] for c in node.children] ), } def __init__(self, debug=True): """ :param debug: If True, print debug information """ self.debug = debug def __call__(self, expr, ctx): NodeType = type(expr) def _eval(node, ctx): expr_type = node.payload["type"] value = Eval.Type[expr_type](node, ctx) result = NodeType(type="const", value=value) if self.debug: print(f"Eval({node.payload}) -> {result.payload}") return result return postorder(deepcopy(expr), _eval, ctx) To evaluate an expression tree, we need the operations to be defined for all of the types during post-order (bottom-up) traversal. We can define a closure over all of the types, and then use that closure to evaluate the expression tree. We call this closure a context. Normally, the operations and other things are also defined in the closure, but for simplicity we will just define the operations and provide closures over the variables. .. code:: ipython3 # Define the context with variable values ctx = {"x": 1, "y": 2, "z": 3} # Evaluate the expression tree with the context result = Eval(debug=True)(expr, ctx) .. parsed-literal:: Eval({'type': 'var', 'value': 'x'}) -> {'type': 'const', 'value': 1} Eval({'type': 'const', 'value': 1}) -> {'type': 'const', 'value': 1} Eval({'value': '+', 'type': 'op'}) -> {'type': 'const', 'value': 2} Eval({'type': 'const', 'value': 0}) -> {'type': 'const', 'value': 0} Eval({'value': 'max', 'type': 'op'}) -> {'type': 'const', 'value': 2} Eval({'type': 'var', 'value': 'x'}) -> {'type': 'const', 'value': 1} Eval({'type': 'var', 'value': 'y'}) -> {'type': 'const', 'value': 2} Eval({'type': 'op', 'value': 'max'}) -> {'type': 'const', 'value': 2} Eval({'type': 'const', 'value': 3}) -> {'type': 'const', 'value': 3} Eval({'type': 'var', 'value': 'y'}) -> {'type': 'const', 'value': 2} Eval({'type': 'op', 'value': '+'}) -> {'type': 'const', 'value': 7} Eval({'value': '+', 'type': 'op'}) -> {'type': 'const', 'value': 9} Let’s print the final result of the evaluation of the expression tree. .. code:: ipython3 # Print the result of the evaluation print(result.payload) .. parsed-literal:: {'type': 'const', 'value': 9} Self-Evaluating Trees --------------------- We see that we get the expected result, ``9``. Note that it is still a tree, but it has been transformed into a so-called self-evaluating tree expression, which in this case is a single node with no children. We can evaluate it again, and we see that it cannot be rewritten further. We call this state a **normal form**. Essentially, we can think of the tree as a program that computes a value, and the normal form is the result of running the program. .. code:: ipython3 # Ensure the evaluated result is in its normal form assert Eval(debug=False)(result, ctx).payload == result.payload Converting to FlatForest ~~~~~~~~~~~~~~~~~~~~~~~~ Let’s convert the tree to a ``FlatForest`` and perform the same evaluation. .. code:: ipython3 from AlgoTree.flat_forest_node import FlatForestNode from AlgoTree.flat_forest import FlatForest flat_expr = TreeConverter.convert(source=expr, target_type=FlatForestNode, extract=lambda x: x.payload) print(pretty_tree(flat_expr, node_name=lambda x: x.payload["value"])) .. parsed-literal:: + ├───── max │ ├───── + │ │ ├───── x │ │ └───── 1 │ └───── 0 └───── + ├───── max │ ├───── x │ └───── y ├───── 3 └───── y Evaluate the flat forest expression .. code:: ipython3 result = Eval(False)(flat_expr, ctx) print(result.payload) .. parsed-literal:: {'type': 'const', 'value': 9} The ``FlatForest`` structure is a different kind of structure that is more convenient for relatively flatter data, like conversation logs. It is a forest structure that is flattened into a dictionary of key-value pairs, where the value is also a dictionary. This value dictionary optionally contains the parent key, and if not then it is a root node. If more than one root node is present, then it is a forest, but by default it exposes a single root node (preferred root) for convenience, which is by default the first root node encountered. Handling Undefined Variables ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ What happens when we change the context so that not every variable is defined? .. code:: ipython3 # Define an incomplete context with missing variable values open_ctx = { "x": 1, # 'y': 2, # 'y' is not defined in this context "z": 3, } # Try evaluating the expression tree with the incomplete context try: Eval(debug=True)(expr, open_ctx) except KeyError as e: print(f"Error: {e}") .. parsed-literal:: Eval({'type': 'var', 'value': 'x'}) -> {'type': 'const', 'value': 1} Eval({'type': 'const', 'value': 1}) -> {'type': 'const', 'value': 1} Eval({'value': '+', 'type': 'op'}) -> {'type': 'const', 'value': 2} Eval({'type': 'const', 'value': 0}) -> {'type': 'const', 'value': 0} Eval({'value': 'max', 'type': 'op'}) -> {'type': 'const', 'value': 2} Eval({'type': 'var', 'value': 'x'}) -> {'type': 'const', 'value': 1} Error: 'y' We see that we get an error. Our operations in ``Eval.Op`` are not defined over undefined variables. We would run into a similar problem if we used pre-order traversal instead of post-order. In pre-order traversal, we would try to evaluate the parent node (say, an operation) before we had evaluated its children, which would result in an error. Our defined operations only work over numbers (type ``const``), so we need to evaluate the non-``const`` expressions first in order for our operations to be defined for them. Post-order vs. Pre-order Traversal ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Post-order traversal is good for things like evaluating expressions, where you need to evaluate the children before you can evaluate the parent. Pre-order traversal is good for things like rewriting trees from the top down, but your rewrite rules need to be defined in terms of sub-expression trees. So, for example, you might have a complex expression and seek to rewrite it into a simpler form. This is an example of a **rewrite system**. A rewrite system is a set of rules that transform expressions into other expressions. For instance, suppose that we add a ``0`` to a variable ``x`` in the expression tree. We know that ``x + 0`` is the same as ``x``, so we could add a rewrite rule that maps the sub-tree ``(+ x 0)`` to ``x``. We could add many rewrite rules to implement, for instance, algebraic simplification (``simplify``), or implement a compiler (``compile``) that translates the tree into a different form that could be evaluated by a different set of rewrite rules. Or, the compiler could be an optimizing compiler that rewrites the tree into a form that is more efficient to evaluate, like replacing a multiplication by a power of two with a shift or getting rid of no-op operations like adding zero. Alternative Way To Construct Expression Trees --------------------------------------------- We imported from a ``dict`` (or JSON) representation of the expression tree. This is a common way to construct trees from data, and it is also a common way to serialize trees to disk or to send them over the network. Howerver, we can also construct the tree directly using the ``TreeNode`` class. .. code:: ipython3 root = TreeNode(name="+", value="+", type="op") root_1 = TreeNode(name="max", value="max", type="op", parent=root) root_2 = TreeNode(name="+", value="+", type="op", parent=root) root_1_1 = TreeNode(name="+", value="+", type="op", parent=root_1) root_1_1_1 = TreeNode(name="var", value="x", type="var", parent=root_1_1) root_1_1_2 = TreeNode(name="const", value=1, type="const", parent=root_1_1) root_2_1 = TreeNode(name="max", value="max", type="op", parent=root_2) root_2_1_1 = TreeNode(name="var", value="x", type="var", parent=root_2_1) root_2_1_2 = TreeNode(name="var", value="y", type="var", parent=root_2_1) root_2_2 = TreeNode(name="const", value=3, type="const", parent=root_2) root_2_3 = TreeNode(name="var", value="y", type="var", parent=root_2) Let’s evaluate this tree to see if it gives the same result as the previous expression tree. .. code:: ipython3 result = Eval(False)(flat_expr, ctx) print(result.payload) .. parsed-literal:: {'type': 'const', 'value': 9} Conclusion ---------- We have explored the idea of expression trees and how they relate to our tree structures, namely ``TreeNode`` and ``FlatForestNode``, and how to evaluate the expression trees by rewriting the nodes in post-order traversal. The ``TreeNode`` structure is a general-purpose tree structure that is fast and efficient for these kinds of operations. The ``FlatForestNode`` structure is a more specialized structure that is more convenient for relatively flatter data, like conversation logs.