20-1 Space requirements for van Emde Boas trees

This problem explores the space requirements for van Emde Boas trees and suggests a way to modify the data structure to make its space requirement depend on the number $n$ of elements actually stored in the tree, rather than on the universe size $u$. For simplicity, assume that $\sqrt u$ is always an integer.

a. Explain why the following recurrence characterizes the space requirement $P(u)$ of a van Emde Boas tree with universe size $u$:

$$P(u) = (\sqrt u + 1) P(\sqrt u) + \Theta(\sqrt u). \tag{20.5}$$

b. Prove that recurrence $\text{(20.5)}$ has the solution $P(u) = O(u)$.

In order to reduce the space requirements, let us define a reduced-space van Emde Boas tree, or RS-vEB tree, as a vEB tree $V$ but with the following changes:

  • The attribute $V.cluster$, rather than being stored as a simple array of pointers to vEB trees with universe size $\sqrt u$, is a hash table (see Chapter 11) stored as a dynamic table (see Section 17.4). Corresponding to the array version of $V.cluster$, the hash table stores pointers to RS-vEB trees with universe size $\sqrt u$. To find the $i$th cluster, we look up the key $i$ in the hash table, so that we can find the $i$th cluster by a single search in the hash table.
  • The hash table stores only pointers to nonempty clusters. A search in the hash table for an empty cluster returns $\text{NIL}$, indicating that the cluster is empty.
  • The attribute $V.summary$ is $\text{NIL}$ if all clusters are empty. Otherwise, $V.summary$ points to an RS-vEB tree with universe size $\sqrt u$.

Because the hash table is implemented with a dynamic table, the space it requires is proportional to the number of nonempty clusters.

When we need to insert an element into an empty RS-vEB tree, we create the RS-vEB tree by calling the following procedure, where the parameter u is the universe size of the RS-vEB tree:

    allocate a new vEB tree V
    V.u = u
    V.min = NIL
    V.max = NIL
    V.summary = NIL
    create V.cluster as an empty dynamic hash table
    return V

c. Modify the $\text{VEB-TREE-INSERT}$ procedure to produce pseudocode for the procedure $\text{RS-VEB-TREE-INSERT}(V, x)$, which inserts $x$ into the RS-vEB tree $V$, calling $\text{CREATE-NEW-RS-VEB-TREE}$ as appropriate.

d. Modify the $\text{VEB-TREE-SUCCESSOR}$ procedure to produce pseudocode for the procedure $\text{RS-VEB-TREE-SUCCESSOR}(V, x)$, which returns the successor of $x$ in RS-vEB tree $V$, or $\text{NIL}$ if $x$ has no successor in $V$.

e. Prove that, under the assumption of simple uniform hashing, your $\text{RS-VEBTREE-INSERT}$ and $\text{RS-VEB-TREE-SUCCESSOR}$ procedures run in $O(\lg\lg u)$ expected time.

f. Assuming that elements are never deleted from a vEB tree, prove that the space requirement for the RS-vEB tree structure is $O(n)$, where $n$ is the number of elements actually stored in the RS-vEB tree.

g. RS-vEB trees have another advantage over vEB trees: they require less time to create. How long does it take to create an empty RS-vEB tree?

a. Lets look at what has to be stored for a vEB tree. Each vEB tree contains one vEB tree of size $\sqrt[+]u$ and $\sqrt[+]u$ vEB trees of size $\sqrt[1]u$. It also is storing three numbers each of order $O(u)$, so they need $\Theta(\lg(u))$ space each. Lastly, it needs to store $\sqrt u$ many pointers to the cluster vEB trees. We'll combine these last two contributions which are $\Theta(\lg(u))$ and $\Theta(\sqrt u)$ respectively into a single term that is $\Theta(\sqrt u)$. This gets us the recurrence

$$P(u) = P(\sqrt[+]u) + \sqrt[+]u P(\sqrt[-]u) + \Theta(\sqrt u).$$

Then, we have that $u = 2^{2m}$ (which follows from the assumption that $\sqrt u$ was an integer), this equation becomes

$$ \begin{aligned} P(u) & = (1 + 2^m)P(2^m) + \Theta(\sqrt u) \\ & = (1 + \sqrt u)P(\sqrt u) + \Theta(\sqrt u) \end{aligned} $$

as desired.

b. We recall from our solution to problem 3-6.e (it seems like so long ago now) that given a number $n$, a bound on the number of times that we need to take the squareroot of a number before it falls below $2$ is $\lg\lg n$. So, if we just unroll out recurrence, we get that

$$P(u) \le \Big(\prod_{i = 1}^{\lg\lg u}(u^{1 / 2^i} + 1) \Big) P(2) + \sum_{i = 1}^{\lg\lg u} \Theta(u^{1 / 2^i})(u^{1 / 2i} + 1).$$

The first product has a highest power of $u$ corresponding to always multiplying the first terms of each binomial. The power in this term is equal to $\sum_{i = 1}^{\lg\lg u}$ which is a partial sum of a geometric series whose sum is $1$. This means that the first term is $o(u)$. The order of the ith term in the summation appearing in the formula is $u^{2 / 2^i}$. In particular, for $i = 1$ is it $O(u)$, and for any $i > 1$, we have that $2 / 2^i < 1$, so those terms will be $o(u)$. Putting it all together, the largest term appearing is $O(u)$, and so, $P(u)$ is $O(u)$.

c. For this problem we just use the version written for normal vEB trees, with minor modifications. That is, since there are entries in cluster that may not exist, and summary may of not yet been initialized, just before we try to access either, we check to see if it's initialized. If it isn't, we do so then.

d. As in the previous problem, we just wait until just before either of the two things that may of not been allocated try to get used then allocate them if need be.

e. Since the initialization performed only take constant time, those modifications don't ruin the the desired runtime bound for the original algorithms already had. So, our responses to parts (c) and (d) are $O(\lg\lg n)$.

f. As mentioned in the errata, this part should instead be changed to $O(n\lg n)$ space. When we are adding an element, we may have to add an entry to a dynamic hash table, which means that a constant amount of extra space would be needed. If we are adding an element to that table, we also have to add an element to the RS-vEB tree in the summary, but the entry that we add in the cluster will be a constant size RS-vEB tree. We can charge the cost of that addition to the summary table to the making the minimum element entry that we added in the cluster table. Since we are always making at least one element be added as a new min entry somewhere, this amortization will mean that it is only a constant amount of time in order to store the new entry.

g. It only takes a constant amount of time to create an empty RS-vEB tree. This is immediate since the only dependence on $u$ in $\text{CREATE-NEW-RSvEB-TREE}(u)$ is on line 2 when $V.u$ is initialized, but this only takes a constant amount of time. Since nothing else in the procedure depends on $u$, it must take a constant amount of time.