Expression Trees and Evaluation¶
We are going to explore the idea of expression trees and how they relate
to our tree structures, namely TreeNode
, and to evaluate the
expression trees by rewriting the nodes in post-order traversal.
First, let’s define our expression tree.
from AlgoTree.treenode import TreeNode
from copy import deepcopy
import json
# Define the expression tree
expr = TreeNode.from_dict(
{
"value": "+",
"type": "op",
"children": [
{
"value": "max",
"type": "op",
"children": [
{
"value": "+",
"type": "op",
"children": [
{"type": "var", "value": "x"},
{"type": "const", "value": 1},
],
},
{"type": "const", "value": 0},
],
},
{
"type": "op",
"value": "+",
"children": [
{
"type": "op",
"value": "max",
"children": [
{"type": "var", "value": "x"},
{"type": "var", "value": "y"},
],
},
{"type": "const", "value": 3},
{"type": "var", "value": "y"},
],
},
],
}
)
# Print the expression tree in JSON format
print(json.dumps(expr.to_dict(), indent=4))
{
"name": "0a5c451b-cad1-48e9-9852-aec46058acda",
"payload": {
"value": "+",
"type": "op"
},
"children": [
{
"name": "78baf6c2-afbd-4bb4-90a4-a08e5a3fc4e0",
"payload": {
"value": "max",
"type": "op"
},
"children": [
{
"name": "fc106112-b614-467f-adff-c0c917ab03e5",
"payload": {
"value": "+",
"type": "op"
},
"children": [
{
"name": "30a87f4c-b4be-4217-8a24-0f0d258b2a25",
"payload": {
"type": "var",
"value": "x"
},
"children": []
},
{
"name": "41d8adc8-1f73-409a-ab68-ea98d53a1c56",
"payload": {
"type": "const",
"value": 1
},
"children": []
}
]
},
{
"name": "5a8df89c-c833-4aae-8749-5ce307424675",
"payload": {
"type": "const",
"value": 0
},
"children": []
}
]
},
{
"name": "1d223ec4-9daf-45f4-b2a8-9db394571b83",
"payload": {
"type": "op",
"value": "+"
},
"children": [
{
"name": "dde50326-3af8-4544-9551-9aa8123ac2bd",
"payload": {
"type": "op",
"value": "max"
},
"children": [
{
"name": "f4057e80-3e19-477d-9b38-137b3a08d31e",
"payload": {
"type": "var",
"value": "x"
},
"children": []
},
{
"name": "f633d8d3-2a2e-474c-a44d-5da441854dcb",
"payload": {
"type": "var",
"value": "y"
},
"children": []
}
]
},
{
"name": "1549bcb8-2ad7-45eb-a8f1-eef6aa53bbec",
"payload": {
"type": "const",
"value": 3
},
"children": []
},
{
"name": "7dc9523b-2ab5-4fef-8424-5f189f2e85f2",
"payload": {
"type": "var",
"value": "y"
},
"children": []
}
]
}
]
}
Visualizing the Tree Structure¶
We can use the class PrettyTree
(and a standalone pretty_tree
function based on that class) to visualize the tree structure in a more
human-readable way.
from AlgoTree.tree_converter import TreeConverter
from AlgoTree.pretty_tree import pretty_tree
print(pretty_tree(expr, node_name=lambda x: x.payload["value"]))
+
├───── max
│ ├───── +
│ │ ├───── x
│ │ └───── 1
│ └───── 0
└───── +
├───── max
│ ├───── x
│ └───── y
├───── 3
└───── y
Post-order Traversal¶
As a tree structure, TreeNode
implements an interface that permits
tree traversal algorithms like depth-first pre-order and post-order
traversals.
We are going to implement a simple post-order traversal algorithm to
permit computation of the expression tree we defined earlier, expr
.
We see that it contains three operator types, +
, *
, and max
,
as well as numbers and variables.
We will provide a closure over all of these types so that when we evaluate the expression in post-order, all of the types are defined for the operations.
def postorder(node, fn, ctx):
"""
Applies function `fn` to the nodes in the tree using post-order traversal.
:param fn: Function to apply to each node. Should accept one argument: the node.
:param ctx: Context passed to the function.
:return: The tree with the function `fn` applied to its nodes.
"""
results = []
for child in node.children:
result = postorder(child, fn, ctx)
if result is not None:
results.append(result)
node.children = results
return fn(node, ctx)
The function postorder
takes a tree node node
, a function
fn
, and a context ctx
, and returns a rewritten tree.
At each node, postorder
recursively calls fn
on its children
before applying fn
to the node itself. This is the essence of
post-order traversal.
Post-order is useful for problems where the children need to be processed before the node itself. For example, evaluating an expression tree, where typically the value of a node can only be computed after the values of its children are known.
In contrast, pre-order traversal applies fn
to the node before
applying it to the children. Pre-order may be useful for tasks such as
rewriting the tree in a different form, like algebraic simplification.
Expression Tree Evaluator¶
We will now design a simple expression tree evaluator, Eval
.
class Eval:
"""
An evaluator for expressions defined by operations on types, respectively
defined by `Eval.Op` and `Eval.Type`. The operations are a
dictionary where the keys are the operation names and the values are
functions that take a node and a context and return the value of the
operation in that context.
"""
Op = {
"+": lambda x: sum(x),
"max": lambda x: max(x),
}
Type = {
"const": lambda node, _: node.payload["value"],
"var": lambda node, ctx: ctx[node.payload["value"]],
"op": lambda node, _: Eval.Op[node.payload["value"]](
[c.payload["value"] for c in node.children]
),
}
def __init__(self, debug=True):
"""
:param debug: If True, print debug information
"""
self.debug = debug
def __call__(self, expr, ctx):
NodeType = type(expr)
def _eval(node, ctx):
expr_type = node.payload["type"]
value = Eval.Type[expr_type](node, ctx)
result = NodeType(type="const", value=value)
if self.debug:
print(f"Eval({node.payload}) -> {result.payload}")
return result
return postorder(deepcopy(expr), _eval, ctx)
To evaluate an expression tree, we need the operations to be defined for all of the types during post-order (bottom-up) traversal. We can define a closure over all of the types, and then use that closure to evaluate the expression tree.
We call this closure a context. Normally, the operations and other things are also defined in the closure, but for simplicity we will just define the operations and provide closures over the variables.
# Define the context with variable values
ctx = {"x": 1, "y": 2, "z": 3}
# Evaluate the expression tree with the context
result = Eval(debug=True)(expr, ctx)
Eval({'type': 'var', 'value': 'x'}) -> {'type': 'const', 'value': 1}
Eval({'type': 'const', 'value': 1}) -> {'type': 'const', 'value': 1}
Eval({'value': '+', 'type': 'op'}) -> {'type': 'const', 'value': 2}
Eval({'type': 'const', 'value': 0}) -> {'type': 'const', 'value': 0}
Eval({'value': 'max', 'type': 'op'}) -> {'type': 'const', 'value': 2}
Eval({'type': 'var', 'value': 'x'}) -> {'type': 'const', 'value': 1}
Eval({'type': 'var', 'value': 'y'}) -> {'type': 'const', 'value': 2}
Eval({'type': 'op', 'value': 'max'}) -> {'type': 'const', 'value': 2}
Eval({'type': 'const', 'value': 3}) -> {'type': 'const', 'value': 3}
Eval({'type': 'var', 'value': 'y'}) -> {'type': 'const', 'value': 2}
Eval({'type': 'op', 'value': '+'}) -> {'type': 'const', 'value': 7}
Eval({'value': '+', 'type': 'op'}) -> {'type': 'const', 'value': 9}
Let’s print the final result of the evaluation of the expression tree.
# Print the result of the evaluation
print(result.payload)
{'type': 'const', 'value': 9}
Self-Evaluating Trees¶
We see that we get the expected result, 9
. Note that it is still a
tree, but it has been transformed into a so-called self-evaluating tree
expression, which in this case is a single node with no children.
We can evaluate it again, and we see that it cannot be rewritten further. We call this state a normal form. Essentially, we can think of the tree as a program that computes a value, and the normal form is the result of running the program.
# Ensure the evaluated result is in its normal form
assert Eval(debug=False)(result, ctx).payload == result.payload
Converting to FlatForest¶
Let’s convert the tree to a FlatForest
and perform the same
evaluation.
from AlgoTree.flat_forest_node import FlatForestNode
from AlgoTree.flat_forest import FlatForest
flat_expr = TreeConverter.convert(source=expr, target_type=FlatForestNode, extract=lambda x: x.payload)
print(pretty_tree(flat_expr, node_name=lambda x: x.payload["value"]))
+
├───── max
│ ├───── +
│ │ ├───── x
│ │ └───── 1
│ └───── 0
└───── +
├───── max
│ ├───── x
│ └───── y
├───── 3
└───── y
Evaluate the flat forest expression
result = Eval(False)(flat_expr, ctx)
print(result.payload)
{'type': 'const', 'value': 9}
The FlatForest
structure is a different kind of structure that is
more convenient for relatively flatter data, like conversation logs. It
is a forest structure that is flattened into a dictionary of key-value
pairs, where the value is also a dictionary. This value dictionary
optionally contains the parent key, and if not then it is a root node.
If more than one root node is present, then it is a forest, but by
default it exposes a single root node (preferred root) for convenience,
which is by default the first root node encountered.
Handling Undefined Variables¶
What happens when we change the context so that not every variable is defined?
# Define an incomplete context with missing variable values
open_ctx = {
"x": 1,
# 'y': 2, # 'y' is not defined in this context
"z": 3,
}
# Try evaluating the expression tree with the incomplete context
try:
Eval(debug=True)(expr, open_ctx)
except KeyError as e:
print(f"Error: {e}")
Eval({'type': 'var', 'value': 'x'}) -> {'type': 'const', 'value': 1}
Eval({'type': 'const', 'value': 1}) -> {'type': 'const', 'value': 1}
Eval({'value': '+', 'type': 'op'}) -> {'type': 'const', 'value': 2}
Eval({'type': 'const', 'value': 0}) -> {'type': 'const', 'value': 0}
Eval({'value': 'max', 'type': 'op'}) -> {'type': 'const', 'value': 2}
Eval({'type': 'var', 'value': 'x'}) -> {'type': 'const', 'value': 1}
Error: 'y'
We see that we get an error. Our operations in Eval.Op
are not
defined over undefined variables.
We would run into a similar problem if we used pre-order traversal
instead of post-order. In pre-order traversal, we would try to evaluate
the parent node (say, an operation) before we had evaluated its
children, which would result in an error. Our defined operations only
work over numbers (type const
), so we need to evaluate the
non-const
expressions first in order for our operations to be
defined for them.
Post-order vs. Pre-order Traversal¶
Post-order traversal is good for things like evaluating expressions, where you need to evaluate the children before you can evaluate the parent.
Pre-order traversal is good for things like rewriting trees from the top
down, but your rewrite rules need to be defined in terms of
sub-expression trees. So, for example, you might have a complex
expression and seek to rewrite it into a simpler form. This is an
example of a rewrite system. A rewrite system is a set of rules that
transform expressions into other expressions. For instance, suppose that
we add a 0
to a variable x
in the expression tree. We know that
x + 0
is the same as x
, so we could add a rewrite rule that maps
the sub-tree (+ x 0)
to x
. We could add many rewrite rules to
implement, for instance, algebraic simplification (simplify
), or
implement a compiler (compile
) that translates the tree into a
different form that could be evaluated by a different set of rewrite
rules. Or, the compiler could be an optimizing compiler that rewrites
the tree into a form that is more efficient to evaluate, like replacing
a multiplication by a power of two with a shift or getting rid of no-op
operations like adding zero.
Alternative Way To Construct Expression Trees¶
We imported from a dict
(or JSON) representation of the expression
tree. This is a common way to construct trees from data, and it is also
a common way to serialize trees to disk or to send them over the
network.
Howerver, we can also construct the tree directly using the TreeNode
class.
root = TreeNode(name="+", value="+", type="op")
root_1 = TreeNode(name="max", value="max", type="op", parent=root)
root_2 = TreeNode(name="+", value="+", type="op", parent=root)
root_1_1 = TreeNode(name="+", value="+", type="op", parent=root_1)
root_1_1_1 = TreeNode(name="var", value="x", type="var", parent=root_1_1)
root_1_1_2 = TreeNode(name="const", value=1, type="const", parent=root_1_1)
root_2_1 = TreeNode(name="max", value="max", type="op", parent=root_2)
root_2_1_1 = TreeNode(name="var", value="x", type="var", parent=root_2_1)
root_2_1_2 = TreeNode(name="var", value="y", type="var", parent=root_2_1)
root_2_2 = TreeNode(name="const", value=3, type="const", parent=root_2)
root_2_3 = TreeNode(name="var", value="y", type="var", parent=root_2)
Let’s evaluate this tree to see if it gives the same result as the previous expression tree.
result = Eval(False)(flat_expr, ctx)
print(result.payload)
{'type': 'const', 'value': 9}
Conclusion¶
We have explored the idea of expression trees and how they relate to our
tree structures, namely TreeNode
and FlatForestNode
, and how to
evaluate the expression trees by rewriting the nodes in post-order
traversal.
The TreeNode
structure is a general-purpose tree structure that is
fast and efficient for these kinds of operations. The FlatForestNode
structure is a more specialized structure that is more convenient for
relatively flatter data, like conversation logs.