21.3 Disjoint-set forests
21.3-1¶
Redo Exercise 21.2-2 using a disjoint-set forest with union by rank and path compression.
21.3-2¶
Write a nonrecursive version of $\text{FIND-SET}$ with path compression.
To implement $\text{FIND-SET}$ nonrecursively, let $x$ be the element we call the function on. Create a linked list $A$ which contains a pointer to $x$. Each time we most one element up the tree, insert a pointer to that element into $A$. Once the root $r$ has been found, use the linked list to find each node on the path from the root to $x$ and update its parent to $r$.
21.3-3¶
Give a sequence of $m$ $\text{MAKE-SET}$, $\text{UNION}$, and $\text{FIND-SET}$ operations, $n$ of which are $\text{MAKE-SET}$ operations, that takes $\Omega(m\lg n)$ time when we use union by rank only.
Suppose that $n' = 2k$ is the smallest power of two less than $n$. To see that this sequences of operations does take the required amount of time, we'll first note that after each iteration of the for loop indexed by $j$, we have that the elements $x_1, \dots, x_{n'}$ are in trees of depth $i$. So, after we finish the outer for loop, we have that $x_1, \dots, x_{n'}$ all lie in the same set, but are represented by a tree of depth $k \in \Omega(\lg n)$. Then, since we repeatedly call $\text{FIND-SET}$ on an item that is $\lg n$ away from its set representative, we have that each one takes time $\lg n$. So, the last for loop alltogther takes time $\Omega(m \lg n)$.
for i = 1 to n
MAKE-SET(x[i])
for i = 1 to k
for j = 1..n' - 2^{i = 1} by 2^i
UNION(x[i], x[i + 2^{j - 1}])
for i = 1 to m
FIND-SET(x[1])
21.3-4¶
Suppose that we wish to add the operation $\text{PRINT-SET}(x)$, which is given a node $x$ and prints all the members of $x$'s set, in any order. Show how we can add just a single attribute to each node in a disjoint-set forest so that $\text{PRINT-SET}(x)$ takes time linear in the number of members of $x$'s set and the asymptotic running times of the other operations are unchanged. Assume that we can print each member of the set in $O(1)$ time.
In addition to each tree, we'll store a linked list (whose set object contains a single tail pointer) with which keeps track of all the names of elements in the tree. The only additional information we'll store in each node is a pointer $x.l$ to that element's position in the list.
- When we call $\text{MAKE-SET}(x)$, we'll also create a new linked list, insert the label of $x$ into the list, and set $x.l$ to a pointer to that label. This is all done in $O(1)$.
- $\text{FIND-SET}$ will remain unchanged.
- $\text{UNION}(x, y)$ will work as usual, with the additional requirement that we union the linked lists of $x$ and $y$, since we don't need to update pointers to the head, we can link up the lists in constant time, thus preserving the runtime of $\text{UNION}$.
- Finally, $\text{PRINT-SET}(x)$ works as follows: first, set $s = \text{FIND-SET}(x)$. Then print the elements in the linked list, starting with the element pointed to by $x$. (This will be the first element in the list). Since the list contains the same number of elements as the set and printing takes $O(1)$, this operation takes linear time in the number of set members.
21.3-5 $\star$¶
Show that any sequence of $m$ $\text{MAKE-SET}$, $\text{FIND-SET}$, and $\text{LINK}$ operations, where all the $\text{LINK}$ operations appear before any of the $\text{FIND-SET}$ operations, takes only $O(m)$ time if we use both path compression and union by rank. What happens in the same situation if we use only the path-compression heuristic?
Clearly each $\text{MAKE-SET}$ and $\text{LINK}$ operation only takes time $O(1)$, so, supposing that $n$ is the number of $\text{FIND-SET}$ operations occuring after the making and linking, we need to show that all the $\text{FIND-SET}$ operations only take time $O(n)$.
To do this, we will ammortize some of the cost of the $\text{FIND-SET}$ operations into the cost of the $\text{MAKE-SET}$ operations. Imagine paying some constant amount extra for each $\text{MAKE-SET}$ operation. Then, when doing a $\text{FIND-SET}(x)$ operation, we have three possibilities:
- First, we could have that $x$ is the representative of its own set. In this case, it clearly only takes constant time to run.
- Second, we could have that the path from $x$ to its set's representative is already compressed, so it only takes a single step to find the set representative. In this case also, the time required is constant.
- Third, we could have that $x$ is not the representative and it's path has not been compressed. Then, suppose that there are k nodes between $x$ and its representative. The time of this $\text{FIND-SET}$ operation is $O(k)$, but it also ends up compressing the paths of $k$ nodes, so we use that extra amount that we paid during the $\text{MAKE-SET}$ operations for these $k$ nodes whose paths were compressed. Any subsequent call to find set for these nodes will take only a constant amount of time, so we would never try to use the work that amortization amount twice for a given node.