5-2 Searching an unsorted array

The problem examines three algorithms for searching for a value xx in an unsorted array AA consisting for nn elements.

Consider the following randomized strategy: pick a random index ii into AA. If A[i]=xA[i] = x, then we terminate; otherwise, we continue the search by picking a new random index into AA. We continue picking random indices into AA until we find an index jj such that A[j]=xA[j] = x or until we have checked every element of AA. Note that we pick from the whole set of indices each time, so that we may examine a given element more than once.

a. Write pseudocode for a procedure RANDOM-SEARCH\text{RANDOM-SEARCH} to implement the strategy above. Be sure that your algorithm terminates when all indices into AA have been picked.

b. Suppose that there is exactly one index ii such that A[i]=xA[i] = x. What is the expected number of indices into AA that we must pick before we find xx and RANDOM-SEARCH\text{RANDOM-SEARCH} terminates?

c. Generalizing your solution to part (b), suppose that there are k1k \ge 1 indices ii such that A[i]=xA[i] = x. What is the expected number of indices into AA that we must pick before we find xx and RANDOM-SEARCH\text{RANDOM-SEARCH} terminates? Your answer should be a function of nn and kk.

d. Suppose that there are no indices ii such that A[i]=xA[i] = x. What is the expected number of indices into AA that we must pick before we have checked all elements of AA and RANDOM-SEARCH\text{RANDOM-SEARCH} terminates?

Now consider a deterministic linear search algorithm, which we refer to as DETERMINISTIC-SEARCH\text{DETERMINISTIC-SEARCH}. Specifically, the algorithm searches AA for xx in order, considering A[1],A[2],A[3],,A[n]A[1], A[2], A[3], \ldots, A[n] until either it finds A[i]=xA[i] = x or it reaches the end of the array. Assume that possible permutations of the input array are equally likely.

e. Suppose that there is exactly one index ii such that A[i]=xA[i] = x. What is the average-case running time of DETERMINISTIC-SEARCH\text{DETERMINISTIC-SEARCH}? What is the worst-case running time of DETERMINISTIC-SEARCH\text{DETERMINISTIC-SEARCH}?

f. Generalizing your solution to part (e), suppose that there are k1k \ge 1 indices ii such that A[i]=xA[i] = x. What is the average-case running time of DETERMINISTIC-SEARCH\text{DETERMINISTIC-SEARCH}? What is the worst-case running time of DETERMINISTIC-SEARCH\text{DETERMINISTIC-SEARCH}? Your answer should be a function of nn and kk.

g. Suppose that there are no indices ii such that A[i]=xA[i] = x. What is the average-case running time of DETERMINISTIC-SEARCH\text{DETERMINISTIC-SEARCH}? What is the worst-case running time of DETERMINISTIC-SEARCH\text{DETERMINISTIC-SEARCH}?

Finally, consider a randomized algorithm SCRAMBLE-SEARCH\text{SCRAMBLE-SEARCH} that works by first randomly permuting the input array and then running the deterministic linear search given above on the resulting permuting array.

h. Letting kk be the number of indices ii such that A[i]=xA[i] = x, give the worst-case and expected running time of SCRAMBLE-SEARCH\text{SCRAMBLE-SEARCH} for the cases in which k=0k = 0 and k=1k = 1. Generalizing your solution to handle the case in which k1k \ge 1.

i. Which of the three searching algorithms would you use? Explain your answer.

a.

RANDOM-SEARCH(x, A, n)
    v = Ø
    while |Ø| != n
        i = RANDOM(1, n)
        if A[i] = x
            return i
        else
            v = v  i
    return NIL

vv can be implemented in multiple ways: a hash table, a tree or a bitmap. The last one would probabily perform best and consume the least space.

b. RANDOM-SEARCH\text{RANDOM-SEARCH} is well-modelled by Bernoulli trials. The expected number of picks is nn.

c. In similar fashion, the expected number of picks is n/kn / k.

d. This is modelled by the balls and bins problem, explored in section 5.4.2. The answer is n(lnn+O(1))n(\ln n + O(1)).

e. The worst-case running time is nn. The average-case is (n+1)/2(n + 1) / 2 (obviously).

f. The worst-case running time is nk+1n - k + 1. The average-case running time is (n+1)/(k+1)(n + 1) / (k + 1). Let XiX_i be an indicator random variable that the iith element is a match. Pr{Xi}=1/(k+1)\Pr\{X_i\} = 1 / (k + 1). Let YY be an indicator random variable that we have found a match after the first nk+1n - k + 1 elements (Pr{Y}=1\Pr\{Y\} = 1). Thus,

E[X]=E[X1+X2++Xnk+Y]=1+i=1nkE[Xi]=1+nkk+1=n+1k+1. \begin{aligned} \text E[X] & = \text E[X_1 + X_2 + \ldots + X_{n - k} + Y] \\ & = 1 + \sum_{i = 1}^{n - k}\text E[X_i] = 1 + \frac{n - k}{k + 1} \\ & = \frac{n + 1}{k + 1}. \end{aligned}

g. Both the worst-case and average case is nn.

h. It's the same as DETERMINISTIC-SEARCH\text{DETERMINISTIC-SEARCH}, only we replace "average-case" with "expected".

i. Definitelly DETERMINISTIC-SEARCH\text{DETERMINISTIC-SEARCH}. SCRAMBLE-SEARCH\text{SCRAMBLE-SEARCH} gives better expected results, but for the cost of randomly permuting the array, which is a linear operation. In the same time we could have scanned the full array and reported a result.