20-2 $y$-fast tries

This problem investigates D. Willard's "$y$-fast tries" which, like van Emde Boas trees, perform each of the operations $\text{MEMBER}$, $\text{MINIMUM}$, $\text{MAXIMUM}$, $\text{PREDECESSOR}$, and $\text{SUCCESSOR}$ on elements drawn from a universe with size $u$ in $O(\lg\lg u)$ worst-case time. The $\text{INSERT}$ and $\text{DELETE}$ operations take $O(\lg\lg u)$ amortized time. Like reduced-space van Emde Boas trees (see Problem 20-1), $y$-fast tries use only $O(n)$ space to store $n$ elements. The design of $y$-fast tries relies on perfect hashing (see Section 11.5).

As a preliminary structure, suppose that we create a perfect hash table containing not only every element in the dynamic set, but every prefix of the binary representation of every element in the set. For example, if $u = 16$, so that $\lg u = 4$, and $x = 13$ is in the set, then because the binary representation of $13$ is $1101$, the perfect hash table would contain the strings $1$, $11$, $110$, and $1101$. In addition to the hash table, we create a doubly linked list of the elements currently in the set, in increasing order.

a. How much space does this structure require?

b. Show how to perform the $\text{MINIMUM}$ and $\text{MAXIMUM}$ operations in $O(1)$ time; the $\text{MEMBER}$, $\text{PREDECESSOR}$, and $\text{SUCCESSOR}$ operations in $O(\lg\lg u)$ time; and the $\text{INSERT}$ and $\text{DELETE}$ operations in $O(\lg u)$ time.

To reduce the space requirement to $O(n)$, we make the following changes to the data structure:

  • We cluster the $n$ elements into $n / \lg u$ groups of size $\lg u$. (Assume for now that $\lg u$ divides $n$.) The first group consists of the $\lg u$ smallest elements in the set, the second group consists of the next $\lg u$ smallest elements, and so on.
  • We designate a "representative" value for each group. The representative of the $i$th group is at least as large as the largest element in the $i$th group, and it is smaller than every element of the $(i + 1)$st group. (The representative of the last group can be the maximum possible element $u - 1$.) Note that a representative might be a value not currently in the set.
  • We store the $\lg u$ elements of each group in a balanced binary search tree, such as a red-black tree. Each representative points to the balanced binary search tree for its group, and each balanced binary search tree points to its group's representative.

The perfect hash table stores only the representatives, which are also stored in a doubly linked list in increasing order.

We call this structure a $y$-fast trie.

c. Show that a $y$-fast trie requires only $O(n)$ space to store $n$ elements.

d. Show how to perform the $\text{MINIMUM}$ and $\text{MAXIMUM}$ operations in $O(\lg\lg u)$ time with a $y$-fast trie.

e. Show how to perform the $\text{MEMBER}$ operation in $O(\lg\lg u)$ time.

f. Show how to perform the $\text{PREDECESSOR}$ and $\text{SUCCESSOR}$ operations in $O(\lg\lg u)$ time.

g. Explain why the $\text{INSERT}$ and $\text{DELETE}$ operations take $\Omega(\lg\lg u)$ time.

h. Show how to relax the requirement that each group in a $y$-fast trie has exactly $\lg u$ elements to allow $\text{INSERT}$ and $\text{DELETE}$ to run in $O(\lg\lg u)$ amortized time without affecting the asymptotic running times of the other operations.

a. By 11.5, the perfect hash table uses $O(m)$ space to store m elements. In a universe of size $u$, each element contributes $\lg u$ entries to the hash table, so the requirement is $O(n\lg u)$. Since the linked list requires $O(n)$, the total space requirement is $O(n\lg u)$.

b. $\text{MINIMUM}$ and $\text{MAXIMUM}$ are easy. We just examine the first and last elements of the associated doubly linked list. $\text{MEMBER}$ can actually be performed in $O(1)$, since we are simply checking membership in a perfect hash table. $\text{PREDECESSOR}$ and $\text{SUCCESSOR}$ are a bit more complicated.

Assume that we have a binary tree in which we store all the elements and their prefixes. When we query the hash table for an element, we get a pointer to that element's location in the binary search tree, if the element is in the tree, and $\text{NIL}$ otherwise. Moreover, assume that every leaf node comes with a pointer to its position in the doubly linked list. Let $x$ be the number whose successor we seek. Begin by performing a binary search of the prefixes in the hash table to find the longest hashed prefix $y$ which matches a prefix of $x$. This takes $O(\lg\lg u)$ since we can check if any prefix is in the hash table in $O(1)$.

Observe that $y$ can have at most one child in the BST, because if it had both children then one of these would share a longer prefix with $x$. If the left child is missing, have the left child pointer point to the largest labeled leaf node in the BST which is less than $y$. If the right child is missing, use its pointer to point to the successor of $y$. If $y$ is a leaf node then $y = x$, so we simply follow the pointer to $x$ in the doubly linked list, in $O(1)$, and its successor is the next element on the list. If $y$ is not a leaf node, we follow its predecessor or successor node, depending on which we need. This gives us $O(1)$ access to the proper element, so the total runtime is $O(\lg\lg u)$. $\text{INSERT}$ and $\text{DELETE}$ must take $O(\lg u)$ since we need to insert one entry into the hash table for each of their bits and update the pointers.

c. The doubly linked list has less than $n$ elements, while the binary search trees contains $n$ nodes, thus a $y$-fast trie requires $O(n)$ space.

d. $\text{MINIMUM}$: Find the minimum representative in the doubly linked list in $\Theta(1)$, then find the minimum element in the binary search tree in $O(\lg\lg u)$.

e. Find the smallest representative greater than $k$ with binary searching in $\Theta(\lg\lg u)$, find the element in the binary search tree in $O(\lg\lg u)$.

f. If we can find the largest representative greater than or equal to $x$, we can determine which binary tree contains the predecessor or successor of $x$. To do this, just call $\text{PREDECESSOR}$ or $\text{SUCCESSOR}$ on $x$ to locate the appropriate tree in $O(\lg\lg u)$. Since the tree has height $\lg u$, we can find the predecessor or successor in $O(\lg\lg u)$.

g. Same as e, we need to find the cluster in $\Theta(\lg\lg u)$, then the operations in the binary search tree takes $O(\lg\lg u)$.

h. We can relax the requirements and only impose the condition that each group has at least $\frac{1}{2}\lg u$ elements and at most $2\lg u$ elements.

  • If a red-black tree is too big, we split it in half at the median.
  • If a red-black tree is too small, we merge it with a neighboring tree.
  • If this causes the merged tree to become too large, we split it at the median.
  • If a tree splits, we create a new representative.
  • If two trees merge, we delete the lost representative.

Any split or merge takes $O(\lg u)$ since we have to insert or delete an element in the data structure storing our representatives, which by part (b) takes $O(\lg u)$.

However, we only split a tree after at least $\lg u$ insertions, since the size of one of the red-black trees needs to increase from $\lg u$ to $2\lg u$ and we only merge two trees after at least $(1 / 2)\lg u$ deletions, because the size of the merging tree needs to have decreased from $\lg u$ to $(1 / 2)\lg u$. Thus, the amortized cost of the merges, splits, and updates to representatives is $O(1)$ per insertion or deletion, so the amortized cost is $O(\lg\lg u)$ as desired.