Trees
All the trees below are rooted trees and ordered in that the order of children (subtrees) from left to right matters.
1. Strict Binary Trees
Each node in a strict binary tree has either two child nodes and is called a Fork node, or has zero children and is called a Leaf node. If such a tree has 'f≥0' Fork nodes then it has f+1 Leaf nodes (proof by induction) so it has 2f+1 nodes in total.
1.1. Encoding Strict Binary Trees
A strict binary tree of 'n' nodes can be encoded succinctly in an n bit code-word: Perform a prefix traversal of the tree and output a '0' when first visiting a Fork and a '1' on visiting a Leaf. It can be seen that the tree can be recovered from the string of bits. The end of the code-word for the tree is detected upon seeing one more '1' than '0' so this is a prefix code for strict binary trees. It is an efficient code because pr(Fork) and pr(Leaf) tend to 1/2 as n→∞ and −log2(1/2)=1.
2. Binary Trees
Each node in a binary tree has two, one or zero child nodes; in a slight stretch, also call a node with one child a Fork. (There are variations as to whether or not it is significant if a lone child is on the left or right.) The parse-tree of an arithmetic expression is an example of a binary tree: A (node of a) binary operator (+, −, ×, ÷) has two children, a unary operator (+, − sign) has one child, and an operand (number, variable) is a Leaf.
2.1. Encoding Binary Trees
A binary tree of 'f' Forks and 'h' Leaves can be encoded in 2f+h bits: Use the method described for strict binary trees but add an extra bit for each Fork to indicate if it has one or two children. This is efficient if the probabilities of Forks having one and two children are equal.
3. General Trees
Each node in a general or k-ary tree can have any number of child nodes. There is a well-known method of representing such a tree by a multi-level list: Each tree node is represented by a list cell which points to its children (if any) and to its siblings (if any). Such a list is equivalent to a binary tree; see below.
3.1. Encoding General Trees
A general tree of 'n' nodes can be encoded in 2n bits if n is common knowledge and an 2n+1 bits otherwise: Perform a prefix traversal of the tree and output a '0' when going Down an edge and a '1' when going back Up an edge. If n is not common knowledge then we are unable to detect the end of the code-word but if an extra '1' (Up) is appended we can, giving a prefix code for general trees.
4. How Many?
The numbers of (i) strings of 'b' pairs of matched brackets '(' and ')' also known as parentheses, (ii) general trees of b+1 nodes, and (iii) strict binary trees of b Forks, are all the same and equal to the bth Catalan number*, e.g., b=3, Cb=5:
As for general trees, if b is not common knowledge we need to append an extra '1' to an encoding of a string of matched brackets to indicate the end of the code-word((())) (()()) (())() ()(()) ()()() 0001111 0010111 0011011 0100111 0101011
Starting with general trees (R=root)
R R R R R | | / \ / \ /|\ . . . . . . . . . | / \ | | . . . . . | . DDDUUU DDUDUU DDUUDU DUDDUU DUDUDU 0001111 0010111 0011011 0100111 0101011
Convert to multi-level lists
R R R R R | | | | | . . .--. .--. .--.--. | | | | . .--. . . | .
Rotate clockwise slightly and add leaves (not shown) to vacant pointers in Forks giving strict binary trees;
R R R R R / / / / / F F F F F / / / \ \ \ F F F F F F / \ / \ F F F F
Finally, remove (uninteresting) root nodes still leaving strict binary trees (leaves not shown)
F F F F F / / / \ \ \ F F F F F F / \ / \ F F F F FFFLLLL FFLFLLL FFLLFLL FLFFLLL FLFLFLL 0001111 0010111 0011011 0100111 0101011
The tree encodings are exploited in certain universal codes [AKS21] for large integers used in data-compression and A.I..
References
- [AKS21] Lloyd Allison, Arun S. Konagurthu & Daniel F. Schmidt, 'On Universal Codes for Integers: Wallace Tree, Elias Omega and Beyond', Data Compression Conference (DCC), pp.313-322, doi:10.1109/DCC50243.2021.00039, 22-26 March 2021.
And other publications.
*Catalan numbers, Cn, n≥0: 1, 1, 2, 5, 14, 42, ..., C0=1, Cn+1=∑i=0..nCiCn-i