21-3 Tarjan's off-line least-common-ancestors algorithm

The least common ancestor of two nodes $u$ and $v$ in a rooted tree $T$ is the node $w$ that is an ancestor of both $u$ and $v$ and that has the greatest depth in $T$. In the off-line least-common-ancestors problem, we are given a rooted tree $T$ and an arbitrary set $P = \{\{u, v\}\}$ of unordered pairs of nodes in $T$, and we wish to determine the least common ancestor of each pair in $P$.

To solve the off-line least-common-ancestors problem, the following procedure performs a tree walk of $T$ with the initial call $\text{LCA}(T.root)$. We assume that each node is colored $\text{WHITE}$ prior to the walk.
LCA(u)
    MAKE-SET(u)
    FIND-SET(u).ancestor = u
    for each child v of u in T
        LCA(v)
        UNION(u, v)
        FIND-SET(u).ancestor = u
    u.color = BLACK
    for each node v such that {u, v} ∈ P
        if v.color == BLACK
            print "The least common ancestor of" u "and" v "is" FIND-SET(v).ancestor
a. Argue that line 10 executes exactly once for each pair $\{u, v\} \in P$.

b. Argue that at the time of the call $\text{LCA}(u)$, the number of sets in the disjoint-set data structure equals the depth of $u$ in $T$.

c. Prove that $\text{LCA}$ correctly prints the least common ancestor of $u$ and $v$ for each pair $\{u, v\} \in P$.

d. Analyze the running time of $\text{LCA}$, assuming that we use the implementation of the disjoint-set data structure in Section 21.3.

a. Suppose that we let $\le_{LCA}$ to be an ordering on the vertices so that $u \le_{LCA} v$ if we run line 7 of $\text{LCA}(u)$ before line 7 of $\text{LCA}(v)$. Then, when we are running line 7 of $\text{LCA}(u)$, we immediately go on to the for loop on line 8.

So, while we are doing this for loop, we still haven't called line 7 of $\text{LCA}(v)$. This means that $v.color$ is white, and so, the pair $\{u, v\}$ is not considered during the run of $\text{LCA}(u)$. However, during the for loop of $\text{LCA}(v)$, since line 7 of $\text{LCA}(u)$ has already run, $u.color = black$. This means that we will consider the pair $\{v, u\}$ during the running of $\text{LCA}(v)$.

It is not obvious what the ordering $\le_{LCA}$ is, as it will be implementation dependent. It depends on the order in which child vertices are iterated in the for loop on line 3. That is, it doesn't just depend on the graph structure.

b. We suppose that it is true prior to a given call of $\text{LCA}$, and show that this property is preserved throughout a run of the procedure, increasing the number of disjoint sets by one by the end of the procedure. So, supposing that $u$ has depth $d$ and there are $d$ items in the disjoint set data structure before it runs, it increases to $d + 1$ disjoint sets on line 1. So, by the time we get to line 4, and call $\text{LCA}$ of a child of $u$, there are $d + 1$ disjoint sets, this is exactly the depth of the child. After line 4, there are now $d + 2$ disjoint sets, so, line 5 brings it back down to $d + 1$ disjoint sets for the subsequent times through the loop. After the loop, there are no more changes to the number of disjoint sets, so, the algorithm terminates with $\text{d + 1}$ disjoint sets, as desired. Since this holds for any arbitrary run of $\text{LCA}$, it holds for all runs of $\text{LCA}$.

c. Suppose that the pair $u$ and $v$ have the least common ancestor $w$. Then, when running $\text{LCA}(w)$, $u$ will be in the subtree rooted at one of $w$'s children, and $v$ will be in another. WLOG, suppose that the subtree containing $u$ runs first.

So, when we are done with running that subtree, all of their ancestor values will point to $w$ and their colors will be black, and their ancestor values will not change until $\text{LCA}(w)$ returns. However, we run $\text{LCA}(v)$ before $\text{LCA}(w)$ returns, so in the for loop on line 8 of $\text{LCA}(v)$, we will be considering the pair $\{u, v\}$, since $u.color = black$. Since $u.ancestor$ is still $w$, that is what will be output, which is the correct answer for their $\text{LCA}$.

d. The time complexity of lines 1 and 2 are just constant. Then, for each child, we have a call to the same procedure, a $\text{UNION}$ operation which only takes constant time, and a $\text{FIND-SET}$ operation which can take at most amortized inverse Ackerman's time. Since we check each and every thing that is adjacent to $u$ for being black, we are only checking each pair in $P$ at most twice in lines 8-10, among all the runs of $\text{LCA}$. This means that the total runtime is $O(|T|\alpha(|T|) + |P|)$.