11.2 Hash tables
11.2-1¶
Suppose we use a hash function $h$ to hash $n$ distinct keys into an array $T$ of length $m$. Assuming simple uniform hashing, what is the expected number of collisions? More precisely, what is the expected cardinality of $\{\{k, l\}: k \ne l \text{ and } h(k) = h(l)\}$?
Under the assumption of simple uniform hashing, we will use linearity of expectation to compute this.
Suppose that all the keys are totally ordered $\{k_1, \dots, k_n\}$. Let $X_i$ be the number of $\ell$'s such that $\ell > k_i$ and $h(\ell) = h(k_i)$. So $X_i$ is the (expected) number of times that key $k_i$ is collided by those keys hashed afterward. Note, that this is the same thing as $\sum_{j > i} \Pr(h(k_j) = h(k_i)) = \sum_{j > i} 1 / m = (n - i) / m$. Then, by linearity of expectation, the number of collisions is the sum of the number of collisions for each possible smallest element in the collision. The expected number of collisions is
$$\sum_{i = 1}^n \frac{n - i}{m} = \frac{n^2 - n(n + 1) / 2}{m} = \frac{n^2 - n}{2m}.$$
11.2-2¶
Demonstrate what happens when we insert the keys $5, 28, 19, 15, 20, 33, 12, 17, 10$ into a hash table with collisions resolved by chaining. Let the table have $9$ slots, and let the hash function be $h(k) = k \mod 9$.
Let us number our slots $0, 1, \dots, 8$.
Then our resulting hash table will look like following:
$$ \begin{array}{c|l} h(k) & \text{keys} \\ \hline 0 \mod 9 & \\ 1 \mod 9 & 10 \to 19 \to 28 \\ 2 \mod 9 & 20 \\ 3 \mod 9 & 12 \\ 4 \mod 9 & \\ 5 \mod 9 & 5 \\ 6 \mod 9 & 33 \to 15 \\ 7 \mod 9 & \\ 8 \mod 9 & 17 \end{array} $$
11.2-3¶
Professor Marley hypothesizes that he can obtain substantial performance gains by modifying the chaining scheme to keep each list in sorted order. How does the professor's modification affect the running time for successful searches, unsuccessful searches, insertions, and deletions?
- Successful searches: no difference, $\Theta(1 + \alpha)$.
- Unsuccessful searches: faster but still $\Theta(1 + \alpha)$.
- Insertions: same as successful searches, $\Theta(1 + \alpha)$.
- Deletions: same as before if we use doubly linked lists, $\Theta(1)$.
11.2-4¶
Suggest how to allocate and deallocate storage for elements within the hash table itself by linking all unused slots into a free list. Assume that one slot can store a flag and either one element plus a pointer or two pointers. All dictionary and free-list operations should run in $O(1)$ expected time. Does the free list need to be doubly linked, or does a singly linked free list suffice?
The flag in each slot of the hash table will be $1$ if the element contains a value, and $0$ if it is free. The free list must be doubly linked.
- Search is unmodified, so it has expected time $O(1)$.
- To insert an element $x$, first check if $T[h(x.key)]$ is free. If it is, delete $T[h(x.key)]$ and change the flag of $T[h(x.key)]$ to $1$. If it wasn't free to begin with, simply insert $x.key$ at the start of the list stored there.
- To delete, first check if $x.prev$ and $x.next$ are $\text{NIL}$. If they are, then the list will be empty upon deletion of $x$, so insert $T[h(x.key)]$ into the free list, update the flag of $T[h(x.key)]$ to $0$, and delete $x$ from the list it's stored in. Since deletion of an element from a singly linked list isn't $O(1)$, we must use a doubly linked list.
- All other operations are $O(1)$.
11.2-5¶
Suppose that we are storing a set of $n$ keys into a hash table of size $m$. Show that if the keys are drawn from a universe $U$ with $|U| > nm$, then $U$ has a subset of size $n$ consisting of keys that all hash to the same slot, so that the worst-case searching time for hashing with chaining is $\Theta(n)$.
Suppose the $m - 1$ slots contains at most $n - 1$ elements, then the remaining slot should have
$$|U| - (m - 1)(n - 1) > nm - (m - 1)(n - 1) = n + m - 1 \ge n$$
elements, thus $U$ has a subset of size $n$.
11.2-6¶
Suppose we have stored $n$ keys in a hash table of size $m$, with collisions resolved by chaining, and that we know the length of each chain, including the length $L$ of the longest chain. Describe a procedure that selects a key uniformly at random from among the keys in the hash table and returns it in expected time $O(L \cdot (1 + 1 / \alpha))$.
Choose one of the $m$ spots in the hash table at random. Let $n_k$ denote the number of elements stored at $T[k]$. Next pick a number $x$ from $1$ to $L$ uniformly at random. If $x < n_j$, then return the $x$th element on the list. Otherwise, repeat this process. Any element in the hash table will be selected with probability $1 / mL$, so we return any key with equal probability. Let $X$ be the random variable which counts the number of times we must repeat this process before we stop and $p$ be the probability that we return on a given attempt. Then $E[X] = p(1 + \alpha) + (1 − p)(1 + E[X])$ since we'd expect to take $1 + \alpha$ steps to reach an element on the list, and since we know how many elements are on each list, if the element doesn't exist we'll know right away. Then we have $E[X] = \alpha + 1 / p$. The probability of picking a particular element is $n / mL = \alpha / L$. Therefore, we have
$$ \begin{aligned} E[X] & = \alpha + L / \alpha \\ & = L(\alpha/L + 1/\alpha) \\ & = O(L(1 + 1/\alpha)) \end{aligned} $$
since $\alpha \le L$.