5-2 Searching an unsorted array

The problem examines three algorithms for searching for a value $x$ in an unsorted array $A$ consisting for $n$ elements.

Consider the following randomized strategy: pick a random index $i$ into $A$. If $A[i] = x$, then we terminate; otherwise, we continue the search by picking a new random index into $A$. We continue picking random indices into $A$ until we find an index $j$ such that $A[j] = x$ or until we have checked every element of $A$. Note that we pick from the whole set of indices each time, so that we may examine a given element more than once.

a. Write pseudocode for a procedure $\text{RANDOM-SEARCH}$ to implement the strategy above. Be sure that your algorithm terminates when all indices into $A$ have been picked.

b. Suppose that there is exactly one index $i$ such that $A[i] = x$. What is the expected number of indices into $A$ that we must pick before we find $x$ and $\text{RANDOM-SEARCH}$ terminates?

c. Generalizing your solution to part (b), suppose that there are $k \ge 1$ indices $i$ such that $A[i] = x$. What is the expected number of indices into $A$ that we must pick before we find $x$ and $\text{RANDOM-SEARCH}$ terminates? Your answer should be a function of $n$ and $k$.

d. Suppose that there are no indices $i$ such that $A[i] = x$. What is the expected number of indices into $A$ that we must pick before we have checked all elements of $A$ and $\text{RANDOM-SEARCH}$ terminates?

Now consider a deterministic linear search algorithm, which we refer to as $\text{DETERMINISTIC-SEARCH}$. Specifically, the algorithm searches $A$ for $x$ in order, considering $A[1], A[2], A[3], \ldots, A[n]$ until either it finds $A[i] = x$ or it reaches the end of the array. Assume that possible permutations of the input array are equally likely.

e. Suppose that there is exactly one index $i$ such that $A[i] = x$. What is the average-case running time of $\text{DETERMINISTIC-SEARCH}$? What is the worst-case running time of $\text{DETERMINISTIC-SEARCH}$?

f. Generalizing your solution to part (e), suppose that there are $k \ge 1$ indices $i$ such that $A[i] = x$. What is the average-case running time of $\text{DETERMINISTIC-SEARCH}$? What is the worst-case running time of $\text{DETERMINISTIC-SEARCH}$? Your answer should be a function of $n$ and $k$.

g. Suppose that there are no indices $i$ such that $A[i] = x$. What is the average-case running time of $\text{DETERMINISTIC-SEARCH}$? What is the worst-case running time of $\text{DETERMINISTIC-SEARCH}$?

Finally, consider a randomized algorithm $\text{SCRAMBLE-SEARCH}$ that works by first randomly permuting the input array and then running the deterministic linear search given above on the resulting permuting array.

h. Letting $k$ be the number of indices $i$ such that $A[i] = x$, give the worst-case and expected running time of $\text{SCRAMBLE-SEARCH}$ for the cases in which $k = 0$ and $k = 1$. Generalizing your solution to handle the case in which $k \ge 1$.

i. Which of the three searching algorithms would you use? Explain your answer.

a.

RANDOM-SEARCH(x, A, n)
    v = Ø                     // a set (or bitmap, etc.) of visited indices
    while |v| < n:
        i = RANDOM(1, n)
        if i ∉ v:             // only use i if it hasn't been picked before
            if A[i] = x:
                return i
            else:
                v = v ∪ {i}
    return NIL

$v$ can be implemented in multiple ways: a hash table, a tree or a bitmap. The last one would probabily perform best and consume the least space.

b. $\text{RANDOM-SEARCH}$ is well-modelled by Bernoulli trials. The expected number of picks is $n$.

c. In similar fashion, the expected number of picks is $n / k$.

d. This is modelled by the balls and bins problem, explored in section 5.4.2. The answer is $n(\ln n + O(1))$.

e. The worst-case running time is $n$. The average-case is $(n + 1) / 2$ (obviously).

f. The worst-case running time is $n - k + 1$. The average-case running time is $(n + 1) / (k + 1)$. Let $X_i$ be an indicator random variable that the $i$th element is a match. $\Pr\{X_i\} = 1 / (k + 1)$. Let $Y$ be an indicator random variable that we have found a match after the first $n - k + 1$ elements ($\Pr\{Y\} = 1$). Thus,

$$ \begin{aligned} \text E[X] & = \text E[X_1 + X_2 + \ldots + X_{n - k} + Y] \\ & = 1 + \sum_{i = 1}^{n - k}\text E[X_i] = 1 + \frac{n - k}{k + 1} \\ & = \frac{n + 1}{k + 1}. \end{aligned} $$

g. Both the worst-case and average case is $n$.

h. It's the same as $\text{DETERMINISTIC-SEARCH}$, only we replace "average-case" with "expected".

i. Definitelly $\text{DETERMINISTIC-SEARCH}$. $\text{SCRAMBLE-SEARCH}$ gives better expected results, but for the cost of randomly permuting the array, which is a linear operation. In the same time we could have scanned the full array and reported a result.