8.4 Bucket sort

8.4-1¶

Using Figure 8.4 as a model, illustrate the operation of $\text{BUCKET-SORT}$ on the array $A = \langle .79, .13, .16, .64, .39, .20, .89, .53, .71, .42 \rangle$.

$$ \begin{array}{cl} R & \\ \hline 0 & \\ 1 & .13 .16 \\ 2 & .20 \\ 3 & .39 \\ 4 & .42 \\ 5 & .53 \\ 6 & .64 \\ 7 & .79 .71 \\ 8 & .89 \\ 9 & \\ \end{array} $$

$$A = \langle.13, .16, .20, .39, .42, .53, .64, .71, .79, .89 \rangle.$$

8.4-2¶

Explain why the worst-case running time for bucket sort is $\Theta(n^2)$. What simple change to the algorithm preserves its linear average-case running time and makes its worst-case running time $O(n\lg n)$?

If all the keys fall in the same bucket and they happen to be in reverse order, we have to sort a single bucket with $n$ items in reversed order with insertion sort. This is $\Theta(n^2)$.

We can use merge sort or heapsort to improve the worst-case running time. Insertion sort was chosen because it operates well on linked lists, which has optimal time and requires only constant extra space for short linked lists. If we use another sorting algorithm, we have to convert each list to an array, which might slow down the algorithm in practice.

8.4-3¶

Let $X$ be a random variable that is equal to the number of heads in two flips of a fair coin. What is $\text E[X^2]$? What is $\text E^2[X]$?

$$ \begin{aligned} \text E[X] & = 2 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 0 \cdot \frac{1}{4} = 1 \\ \text E[X^2] & = 4 \cdot \frac{1}{4} + 1 \cdot \frac{1}{2} + 0 \cdot \frac{1}{4} = 1.5 \\ \text E^2[X] & = \text E[X] \cdot \text E[X] = 1 \cdot 1 = 1. \end{aligned} $$

8.4-4 $\star$¶

We are given $n$ points in the unit circle, $p_i = (x_i, y_i)$, such that $0 < x_i^2 + y_i^2 \le 1$ for $i = 1, 2, \ldots, n$. Suppose that the points are uniformly distributed; that is, the probability of finding a point in any region of the circle is proportional to the area of that region. Design an algorithm with an average-case running time of $\Theta(n)$ to sort the $n$ points by their distances $d_i = \sqrt{x_i^2 + y_i^2}$ from the origin. ($\textit{Hint:}$ Design the bucket sizes in $\text{BUCKET-SORT}$ to reﬂect the uniform distribution of the points in the unit circle.)

Bucket sort by radius,

$$ \begin{aligned} \pi r_i^2 & = \frac{i}{n} \cdot \pi 1^2 \\ r_i & = \sqrt{\frac{i}{n}}. \end{aligned} $$

8.4-5 $\star$¶

A probability distribution function $P(x)$ for a random variable $X$ is defined by $P(x) = \Pr\{X \le x\}$. Suppose that we draw a list of $n$ random variables $X_1, X_2, \ldots, X_n$ from a continuous probability distribution function $P$ that is computable in $O(1)$ time. Give an algorithm that sorts these numbers in linear average-case time.

Bucket sort by $p_i$, so we have $n$ buckets: $[p_0, p_1), [p_1, p_2), \dots, [p_{n - 1}, p_n)$. Note that not all buckets are the same size, which is ok as to ensure linear run time, the inputs should on average be uniformly distributed amongst all buckets, of which the intervals defined with $p_i$ will do so.

$p_i$ is defined as follows:

$$P(p_i) = \frac{i}{n}.$$