7-3 Alternative quicksort analysis

An alternative analysis of the running time of randomized quicksort focuses on the expected running time of each individual recursive call to $\text{RANDOMIZED-QUICKSORT}$, rather than on the number of comparisons performed.

a. Argue that, given an array of size $n$, the probability that any particular element is chosen as the pivot is $1 / n$. Use this to define indicator random variables

$$X_i = I\{i\text{th smallest element is chosen as the pivot}\}.$$

What is $\text E[X_i]$?

b. Let $T(n)$ be a random variable denoting the running time of quicksort on an array of size $n$. Argue that

$$\text E[T(n)] = \text E\bigg[\sum_{q = 1}^n X_q(T(q - 1) + T(n - q) + \Theta(n))\bigg]. \tag{7.5}$$

c. Show that we can rewrite equation $\text{(7.5)}$ as

$$\text E[T(n)] = \frac{2}{n}\sum_{q = 2}^{n - 1}\text E[T(q)] + \Theta(n). \tag{7.6}$$

d. Show that

$$\sum_{k = 2}^{n - 1}k\lg k \le \frac{1}{2}n^2\lg n - \frac{1}{8}n^2. \tag{7.7}$$

($\textit{Hint:}$ Split the summation into two parts, one for $k = 2, 3, \ldots, \lceil n / 2 \rceil - 1$ and one for $k = \lceil n / 2 \rceil, \ldots, n - 1$.)

e. Using the bound from equation $\text{(7.7)}$, show that the recurrence in equation $\text{(7.6)}$ has the solution $\text E[T(n)] = \Theta(n\lg n)$. ($\textit{Hint:}$ Show, by substitution, that $\text E[T(n)] \le an\lg n$ for sufficiently large $n$ and for some positive constant $a$.)

a. Since the pivot is selected as a random element in the array, which has size $n$, the probabilities of any particular element being selected are all equal, and add to one, so, are all $\frac{1}{n}$. As such, $\text E[X_i] = \Pr\{i \text{ smallest is picked}\} = \frac{1}{n}$.

b. We can apply linearity of expectation over all of the events $X_i$. Suppose we have a particular $X_i$ be true, then, we will have one of the sub arrays be length $i - 1$, and the other be $n - i$, and will of course still need linear time to run the partition procedure. This corresponds exactly to the summand in equation $\text{(7.5)}$.

c.

$$ \begin{aligned} & \text E\Bigg[\sum_{q = 1}^n X_q(T(q - 1) + T(n - q) + \Theta(n)) \Bigg] \\ & = \sum_{q = 1}^n \text E[X_q(T(q - 1) + T(n - q) + \Theta(n))] \\ & = \sum_{q = 1}^n(T(q - 1) + T(n - q) + \Theta(n))/n \\ & = \Theta(n) + \frac{1}{n} \sum_{q = 1}^n(T(q - 1)+T(n - 1)) \\ & = \Theta(n) + \frac{1}{n} \Big(\sum_{q = 1}^n T(q - 1) + \sum_{q = 1}^n T (n - q) \Big) \\ & = \Theta(n) + \frac{1}{n} \Big(\sum_{q = 1}^n T(q - 1) + \sum_{q = 1}^n T (q - 1) \Big) \\ & = \Theta(n) + \frac{2}{n} \sum_{q = 1}^n T(q - 1) \\ & = \Theta(n) + \frac{2}{n} \sum_{q = 0}^{n - 1} T(q) \\ & = \Theta(n) + \frac{2}{n} \sum_{q = 2}^{n - 1} T(q). \end{aligned} $$

d. We will prove this inequality in a different way than suggested by the hint. If we let $f(k) = k\lg k$ treated as a continuous function, then $f'(k) = \lg k + 1$. Note now that the summation written out is the left hand approximation of the integral of $f(k)$ from $2$ to $n$ with step size $1$. By integration by parts, the anti-derivative of $k\lg k$ is

$$\frac{1}{\lg 2}(\frac{k^2}{2}\ln k-\frac{k^2}{4}).$$

So, plugging in the bounds and subtracting, we get $\frac{n^2\lg n}{2} - \frac{n^2}{4\ln 2} - 1$. Since $f$ has a positive derivative over the entire interval that the integral is being evaluated over, the left hand rule provides a underapproximation of the integral, so, we have that

$$ \begin{aligned} \sum_{k = 2}^{n - 1} k\lg k & \le \frac{n^2\lg n}{2} - \frac{n^2}{4\ln 2} - 1 \\ & \le \frac{n^2\lg n}{2} - \frac{n^2}{8}, \end{aligned} $$

where the last inequality uses the fact that $\ln 2 > 1 / 2$.

e. Assume by induction that $T(q) \le q \lg(q) + \Theta(n)$. Combining $\text{(7.6)}$ and $\text{(7.7)}$, we have

$$ \begin{aligned} \text E[T(n)] & = \frac{2}{n} \sum_{q = 2}^{n - 1} \text E[T(q)] + \Theta(n) \\ & \le \frac{2}{n} \sum_{q = 2}^{n - 1}(q\lg q + \Theta(n)) + \Theta(n) \\ & \le \frac{2}{n} \sum_{q = 2}^{n - 1}q\lg q + \frac{2}{n}\Theta(n) + \Theta(n) \\ & \le \frac{2}{n}(\frac{1}{2}n^2\lg n - \frac{1}{8}n^2) + \Theta(n) \\ & = n\lg n -\frac{1}{4}n + \Theta(n) \\ & = n\lg n+\Theta(n). \end{aligned} $$