10-3 Searching a sorted compact list

Exercise 10.3-4 asked how we might maintain an nn-element list compactly in the first nn positions of an array. We shall assume that all keys are distinct and that the compact list is also sorted, that is, key[i]<key[next[i]]key[i] < key[next[i]] for all i=1,2,,ni = 1, 2, \ldots, n such that next[i]NILnext[i] \ne \text{NIL}. We will also assume that we have a variable LL that contains the index of the first element on the list. Under these assumptions, you will show that we can use the following randomized algorithm to search the list in O(n)O(\sqrt n) expected time.

COMPACT-LIST-SEARCH(L, n, k)
    i = L
    while i != NIL and key[i] < k
        j = RANDOM(1, n)
        if key[i] < key[j] and key[j]  k
            i = j
            if key[i] == k
                return i
        i = next[i]
    if i == NIL or key[i] > k
        return NIL
    else return i

If we ignore lines 3–7 of the procedure, we have an ordinary algorithm for searching a sorted linked list, in which index ii points to each position of the list in turn. The search terminates once the index ii "falls off" the end of the list or once key[i]kkey[i] \ge k. In the latter case, if key[i]=kkey[i] = k, clearly we have found a key with the value kk. If, however, key[i]>kkey[i] > k, then we will never find a key with the value kk, and so terminating the search was the right thing to do.

Lines 3–7 attempt to skip ahead to a randomly chosen position jj. Such a skip benefits us if key[j]key[j] is larger than key[i]key[i] and no larger than kk; in such a case, jj marks a position in the list that ii would have to reach during an ordinary list search. Because the list is compact, we know that any choice of jj between 11 and nn indexes some object in the list rather than a slot on the free list.

Instead of analyzing the performance of COMPACT-LIST-SEARCH\text{COMPACT-LIST-SEARCH} directly, we shall analyze a related algorithm, COMPACT-LIST-SEARCH\text{COMPACT-LIST-SEARCH}', which executes two separate loops. This algorithm takes an additional parameter tt which determines an upper bound on the number of iterations of the first loop.

COMPACT-LIST-SEARCH'(L, n, k, t)
    i = L
    for q = 1 to t
        j = RANDOM(1, n)
        if key[i] < key[j] and key[j]  k
            i = j
            if key[i] == k
                return i
    while i != NIL and key[i] < k
        i = next[i]
    if i == NIL or key[i] > k
        return NIL
    else return i

To compare the execution of the algorithms COMPACT-LIST-SEARCH(L,n,k)\text{COMPACT-LIST-SEARCH}(L, n, k) and COMPACT-LIST-SEARCH(L,n,k,t)\text{COMPACT-LIST-SEARCH}'(L, n, k, t), assume that the sequence of integers returned by the calls of RANDOM(1,n)\text{RANDOM}(1, n) is the same for both algorithms.

a. Suppose that COMPACT-LIST-SEARCH(L,n,k)\text{COMPACT-LIST-SEARCH}(L, n, k) takes tt iterations of the while loop of lines 2–8. Argue that COMPACT-LIST-SEARCH(L,n,k,t)\text{COMPACT-LIST-SEARCH}'(L, n, k, t) returns the same answer and that the total number of iterations of both the for and while loops within COMPACT-LIST-SEARCH\text{COMPACT-LIST-SEARCH}' is at least tt.

In the call COMPACT-LIST-SEARCH(L,n,k,t)\text{COMPACT-LIST-SEARCH}'(L, n, k, t), let XtX_t be the random variable that describes the distance in the linked list (that is, through the chain of nextnext pointers) from position ii to the desired key kk after tt iterations of the for loop of lines 2–7 have occurred.

b. Argue that the expected running time of COMPACT-LIST-SEARCH(L,n,k,t)\text{COMPACT-LIST-SEARCH}'(L, n, k, t) is O(t+E[Xt])O(t + \text E[X_t]).

c. Show that E[Xt]r=1n(1r/n)t\text E[X_t] \le \sum_{r = 1}^n (1 - r / n)^t. (Hint:\textit{Hint:} Use equation (C.25)\text{(C.25)}.)

d. Show that r=0n1rtnt+1/(t+1)\sum_{r = 0}^{n - 1} r^t \le n^{t + 1} / (t + 1).

e. Prove that E[Xt]n/(t+1)\text E[X_t] \le n / (t + 1).

f. Show that COMPACT-LIST-SEARCH(L,n,k,t)\text{COMPACT-LIST-SEARCH}'(L, n, k, t) runs in O(t+n/t)O(t + n / t) expected time.

g. Conclude that COMPACT-LIST-SEARCH\text{COMPACT-LIST-SEARCH} runs in O(n)O(\sqrt n) expected time.

h. Why do we assume that all keys are distinct in COMPACT-LIST-SEARCH\text{COMPACT-LIST-SEARCH}? Argue that random skips do not necessarily help asymptotically when the list contains repeated key values.

a. If the original version of the algorithm takes only tt iterations, then, we have that it was only at most t random skips though the list to get to the desired value, since each iteration of the original while loop is a possible random jump followed by a normal step through the linked list.

b. The for loop on lines 2–7 will get run exactly tt times, each of which is constant runtime. After that, the while loop on lines 8–9 will be run exactly XtX_t times. So, the total runtime is O(t+E[Xt])O(t + \text E[X_t]).

c. Using equation C.25\text{C.25}, we have that E[Xt]=i=1Pr{Xti}\text E[X_t] = \sum_{i = 1}^\infty \Pr\{X_t \ge i\}. So, we need to show that Pr{Xti}(1i/n)t\Pr\{X_t \ge i\} \le (1 - i / n)^t. This can be seen because having XtX_t being greater than ii means that each random choice will result in an element that is either at least ii steps before the desired element, or is after the desired element. There are nin - i such elements, out of the total nn elements that we were pricking from. So, for a single one of the choices to be from such a range, we have a probability of (ni)/n=(1i/n)(n - i) / n = (1 - i / n). Since each of the selections was independent, the total probability that all of them were is (1i/n)t(1 - i / n)^t, as desired. Lastly, we can note that since the linked list has length nn, the probability that XtX_t is greater than nn is equal to zero.

d. Since we have that t>0t > 0, we know that the function f(x)=xtf(x) = x^t is increasing, so, that means that xtf(x)\lfloor x \rfloor^t \le f(x). So,

r=0n1rt=0nrtdr0nf(r)dr=nt+1t+1.\sum_{r = 0}^{n - 1} r^t = \int_0^n \lfloor r \rfloor^t dr \le \int_0^n f(r)dr = \frac{n^{t + 1}}{t + 1}.

e.

E[Xt]r=1n(1r/n)tfrom part (c)=r=1n(nr)tnt=1ntr=1n(nr)t, \begin{aligned} \text E[X_t] & \le \sum_{r = 1}^n (1 - r / n)^t & \text{from part (c)} \\ & = \sum_{r = 1}^n \frac{(n - r)^t}{n^t} \\ & = \frac{1}{n^t} \sum_{r = 1}^n (n - r)^t, \end{aligned}

and

r=1n(nr)t=(n1)t+(n2)t++1t+0t=r=0n1rt. \begin{aligned} \sum_{r = 1}^n (n - r)^t & = (n - 1)^t + (n - 2)^t + \cdots + 1^t + 0^t \\ & = \sum_{r = 0}^{n - 1} r^t. \end{aligned}

So,

E[Xt]=1ntr=0n1rt1ntnt+1t+1from part (d)=nt+1. \begin{aligned} \text E[X_t] & = \frac{1}{n^t} \sum_{r = 0}^{n - 1} r^t \\ & \le \frac{1}{n^t} \cdot \frac{n^{t + 1}}{t + 1} & \text{from part (d)} \\ & = \frac{n}{t + 1}. \end{aligned}

f. We just put together parts (b) and (e) to get that it runs in time O(t+n/(t+1))O(t + n / (t + 1)). But, this is the same as O(t+n/t)O(t + n / t).

g. Since we have that for any number of iterations tt that the first algorithm takes to find its answer, the second algorithm will return it in time O(t+n/t)O(t + n / t). In particular, if we just have that t=nt = \sqrt n. The second algorithm takes time only O(n)O(\sqrt n). This means that tihe first list search algorithm is O(n)O(\sqrt n) as well.

h. If we don't have distinct key values, then, we may randomly select an element that is further along than we had been before, but not jump to it because it has the same key as what we were currently at. The analysis will break when we try to bound the probability that XtiX_t \ge i.