32-1 String matching based on repetition factors

Let $y^i$ denote the concatenation of string $y$ with itself $i$ times. For example, $(\text{ab})^3 = \text{ababab}$. We say that a string $x \in \Sigma^*$ has repetition factor $r$ if $x = y ^ r$ for some string $y \in \Sigma^*$ and some $r > 0$. Let $\rho(x)$ denote the largest $r$ such that $x$ has repetition factor $r$.

a. Give an efficient algorithm that takes as input a pattern $P[1 \ldots m]$ and computes the value $\rho(P_i)$ for $i = 1, 2, \ldots, m$. What is the running time of your algorithm?

b. For any pattern $P[1 \ldots m]$, let $\rho^*(P)$ be defined as $\max_{1 \le i \le m} \rho(P_i)$. Prove that if the pattern $P$ is chosen randomly from the set of all binary strings of length $m$, then the expected value of $\rho^*(P)$ is $O(1)$.

c. Argue that the following string-matching algorithm correctly finds all occurrences of pattern $P$ in a text $T[1 \ldots n]$ in time $O(\rho^*(P)n + m)$:

REPETITION_MATCHER(P, T)
    m = P.length
    n = T.length
    k = 1 + ρ*(P)
    q = 0
    s = 0
    while s  n - m
        if T[s + q + 1] == P[q + 1]
            q = q + 1
            if q == m
                print "Pattern occurs with shift" s
        if q == m or T[s + q + 1] != P[q + 1]
            s = s + max(1, ceil(q / k))
            q = 0

This algorithm is due to Galil and Seiferas. By extending these ideas greatly, they obtained a linear-time string-matching algorithm that uses only $O(1)$ storage beyond what is required for $P$ and $T$.

a. Compute $\pi$, let $l = m - \pi[m]$, if $m ~\text{mod}~ l = 0$ and for all $p = m - i \cdot l > 0$, $p - \pi[p] = l$, then $\rho(P_i) = m / l$, otherwise $\rho(P_i) = 1$. The running time is $\Theta(n)$.

b.

$$ \begin{aligned} P(\rho^*(P) \ge 2) & = \frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \cdots \approx \frac{2}{3} \\ P(\rho^*(P) \ge 3) & = \frac{1}{4} + \frac{1}{32} + \frac{1}{256} + \cdots \approx \frac{2}{7} \\ P(\rho^*(P) \ge 4) & = \frac{1}{8} + \frac{1}{128} + \frac{1}{2048} + \cdots \approx \frac{2}{15} \\ P(\rho^*(P) = 1) & = \frac{1}{3} \\ P(\rho^*(P) = 2) & = \frac{8}{21} \\ P(\rho^*(P) = 3) & = \frac{16}{105} \\ \text E[\rho^*(P)] & = 1 \cdot \frac{1}{3} + 2 \cdot \frac{8}{21} + 3 \cdot \frac{16}{105} + \ldots \approx 2.21 \end{aligned} $$

c.

(Omit!)