26-6 The Hopcroft-Karp bipartite matching algorithm

In this problem, we describe a faster algorithm, due to Hopcroft and Karp, for $p$ finding a maximum matching in a bipartite graph. The algorithm runs in $O(\sqrt V E)$ time. Given an undirected, bipartite graph $G = (V, E)$, where $V = L \cup R$ and all edges have exactly one endpoint in $L$, let $M$ be a matching in $G$. We say that a simple path $P$ in $G$ is an augmenting path with respect to $M$ if it starts at an unmatched vertex in $L$, ends at an unmatched vertex in $R$, and its edges belong alternately to $M$ and $E - M$. (This definition of an augmenting path is related to, but different from, an augmenting path in a flow network.) In this problem, we treat a path as a sequence of edges, rather than as a sequence of vertices. A shortest augmenting path with respect to a matching $M$ is an augmenting path with a minimum number of edges.

Given two sets $A$ and $B$, the symmetric difference $A \oplus B$ is defined as $(A - B) \cup (B - A)$, that is, the elements that are in exactly one of the two sets.

a. Show that if $M$ is a matching and $P$ is an augmenting path with respect to $M$, then the symmetric difference $M \oplus P$ is a matching and $|M \oplus P| = |M| + 1$. Show that if $P_1, P_2, \ldots, P_k$ are vertex-disjoint augmenting paths with respect to $M$, then the symmetric difference $M \oplus (P_1 \cup P_2 \cup \cdots \cup P_k)$ is a matching with cardinality $|M| + k$.

The general structure of our algorithm is the following:
HOPCROPFT-KARP(G)
    M = Ø
    repeat
        let P = {P[1], P[2], ..., P[k]} be a maximal set of vertex-disjoint shortest augmenting paths with respect to M
        M = M ⨁ (P[1] ∪ P[2] ∪ ... ∪ P[k])
    until P == Ø
    return M
The remainder of this problem asks you to analyze the number of iterations in the algorithm (that is, the number of iterations in the repeat loop) and to describe an implementation of line 3.

b. Given two matchings $M$ and $M^*$ in $G$, show that every vertex in the graph $G' = (V, M \oplus M^*)$ has degree at most $2$. Conclude that $G'$ is a disjoint union of simple paths or cycles. Argue that edges in each such simple path or cycle belong alternately to $M$ or $M^*$. Prove that if $|M| \le |M^*|$, then $M \oplus M^*$ contains at least $|M^*| - |M|$ vertex-disjoint augmenting paths with respect to $M$.

Let $l$ be the length of a shortest augmenting path with respect to a matching $M$, and let $P_1, P_2, \ldots, P_k$ be a maximal set of vertex-disjoint augmenting paths of length $l$ with respect to $M$. Let $M' = M \oplus (P_1 \cup \cdots \cup P_k)$, and suppose that $P$ is a shortest augmenting path with respect to $M'$.

c. Show that if $P$ is vertex-disjoint from $P_1, P_2, \ldots, P_k$ , then $P$ has more than $l$ edges.

d. Now suppose that $P$ is not vertex-disjoint from $P_1, P_2, \ldots, P_k$ . Let $A$ be the set of edges $(M \oplus M') \oplus P$. Show that $A = (P_1 \cup P_2 \cup \cdots \cup P_k) \oplus P$ and that $|A| \ge (k + 1)l$. Conclude that $P$ has more than $l$ edges.

e. Prove that if a shortest augmenting path with respect to $M$ has $l$ edges, the size of the maximum matching is at most $|M| + |V| / (l + 1)$.

f. Show that the number of repeat loop iterations in the algorithm is at most $2\sqrt{|V|}$. ($\textit{Hint:}$ By how much can $M$ grow after iteration number $\sqrt{|V|}$?)

g. Give an algorithm that runs in $O(E)$ time to find a maximal set of vertexdisjoint shortest augmenting paths $P_1, P_2, \ldots, P_k$ for a given matching $M$. Conclude that the total running time of $\text{HOPCROFT-KARP}$ is $O(\sqrt V E)$.

a. Suppose $M$ is a matching and $P$ is an augmenting path with respect to $M$. Then $P$ consists of $k$ edges in $M$, and $k + 1$ edges not in $M$. This is because the first edge of $P$ touches an unmatched vertex in $L$, so it cannot be in $M$. Similarly, the last edge in $P$ touches an unmatched vertex in $R$, so the last edge cannot be in $M$. Since the edges alternate being in or not in $M$, there must be exactly one more edge not in $M$ than in $M$. This implies that

$$|M \oplus P| = |M| + |P| - 2k = |M| + 2k + 1 - 2k = |M| + 1,$$

since we must remove each edge of $M$ which is in $P$ from both $M$ and $P$. Now suppose $P_1, P_2, \ldots, P_k$ are vertex-disjoint augmenting paths with respect to $M$. Let $k_i$ be the number of edges in $P_i$ which are in $M$, so that $|P_i| = 2k + i + 1$. Then we have

$$M \oplus (P_1 \cup P_2 \cup \cdots \cup P_k) = |M| + |P_1| + \cdots + |P_k| - 2k_1 - 2k_2 - \cdots - 2k_k = |M| + k.$$

To see that we in fact get a matching, suppose that there was some vertex $v$ which had at least $2$ incident edges $e$ and $e'$. They cannot both come from $M$, since $M$ is a matching. They cannot both come from $P$ since $P$ is simple and every other edge of $P$ is removed. Thus, $e \in M$ and $e' \in P \backslash M$. However, if $e \in M$ then $e \in P$, so $e \notin M \oplus P$, a contradiction. A similar argument gives the case of $M \oplus (P_1 \cup \cdots \cup P_k)$.

b. Suppose some vertex in $G'$ has degree at least $3$. Since the edges of $G'$ come from $M \oplus M^*$, at least $2$ of these edges come from the same matching. However, a matching never contains two edges with the same endpoint, so this is impossible. Thus every vertex has degree at most $2$, so $G'$ is a disjoint union of simple paths and cycles. If edge $(u, v)$ is followed by edge $(z, w)$ in a simple path or cycle then we must have $v = z$. Since two edges with the same endpoint cannot appear in a matching, they must belong alternately to $M$ and $M^*$. Since edges alternate, every cycle has the same number of edges in each matching and every path has at most one more edge in one matching than in the other. Thus, if $|M| \le |M^*|$ there must be at least $|M^*| - |M|$ vertex-disjoint augmenting paths with respect to $M$.

c. Every vertex matched by $M$ must be incident with some edge in $M'$. Since $P$ is augmenting with respect to $M$′, the left endpoint of the first edge of $P$ isn't incident to a vertex touched by an edge in $M'$. In particular, $P$ starts at a vertex in $L$ which is unmatched by $M$ since every vertex of $M$ is incident with an edge in $M'$. Since $P$ is vertex disjoint from $P_1, P_2, \ldots, P_k$, any edge of $P$ which is in $M'$ must in fact be in $M$ and any edge of $P$ which is not in $M'$ cannot be in $M$. Since $P$ has edges alternately in $M'$ and $E - M'$, $P$ must in fact have edges alternately in $M$ and $E - M$. Finally, the last edge of $P$ must be incident to a vertex in $R$ which is unmatched by $M'$. Any vertex unmatched by $M'$ is also unmatched by $M$, so $P$ is an augmenting path for $M$. $P$ must have length at least $l$ since $l$ is the length of the shortest augmenting path with respect to $M$. If $P$ had length exactly $l$, then this would contradict the fact that $P_1 \cup \cdots \cup P_k$ is a maximal set of vertex disjoint paths of length $l$ because we could add $P$ to the set. Thus $P$ has more than $l$ edges.

d. Any edge in $M \oplus M'$ is in exactly one of $M$ or $M'$. Thus, the only possible contributing edges from $M'$ are from $P_1 \cup \cdots \cup P_k$. An edge from $M$ can contribute if and only if it is not in exactly one of $M$ and $P_1 \cup \cdots \cup P_k$, which means it must be in both. Thus, the edges from $M$ are redundant so $M \oplus M' = (P_1 \cup \cdots \cup P_k)$ which implies $A = (P_1 \cup \cdots \cup P_k) \oplus P$.

Now we'll show that $P$ is edge disjoint from each $P_i$. Suppose that an edge $e$ of $P$ is also an edge of $P_i$ for some $i$. Since $P$ is an augmenting path with respect to $M'$ either $e \in M'$ or $e \in E - M'$. Suppose $e \in M'$. Since $P$ is also augmenting with respect to $M$, we must have $e \in M$. However, if $e$ is in $M$ and $M'$, then $e$ cannot be in any of the $P_i$'s by the definition of $M'$. Now suppose $e \in E - M'$. Then $e \in E - M$ since $P$ is augmenting with respect to $M$. Since $e$ is an edge of $P_i$, $e \in E - M'$ implies that $e \in M$, a contradiction.

Since $P$ has edges alternately in $M'$ and $E - M'$ and is edge disjoint from $P_1 \cup \cdots \cup P_k$, $P$ is also an augmenting path for $M$, which implies $|P| \ge l$. Since every edge in $A$ is disjoint we conclude that $|A| \ge (k + 1)l$.

e. Suppose $M^*$ is a matching with strictly more than $|M| + |V| / (l + 1)$ edges. By part (b) there are strictly more than $|V| / (l + 1)$ vertex-disjoint augmenting paths with respect to $M$. Each one of these contains at least $l$ edges, so it is incident on $l + 1$ vertices. Since the paths are vertex disjoint, there are strictly more than $|V|(l + 1) / (l + 1)$ distinct vertices incident with these paths, a contradiction. Thus, the size of the maximum matching is at most $|M| + |V| / (l + 1)$.

f. Consider what happens after iteration number $\sqrt{|V|}$. Let $M^*$ be a maximal matching in $G$. Then $|M^*| \ge |M|$ so by part (b), $M \oplus M^*$ contains at least $|M^*| - |M|$ vertex disjoint augmenting paths with respect to $M$. By part (c), each of these is also a an augmenting path for $M$. Since each has length $\sqrt{|V|}$, there can be at most $\sqrt{|V|}$ such paths, so $|M^*| - |M| \le \sqrt{|V|}$. Thus, only $\sqrt{|V|}$ additional iterations of the repeat loop can occur, so there are at most $2\sqrt{|V|}$ iterations in total.

g. For each unmatched vertex in $L$ we can perform a modified $\text{BFS}$ to find the length of the shortest path to an unmatched vertex in $R$. Modify the $\text{BFS}$ to ensure that we only traverse an edge if it causes the path to alternate between an edge in $M$ and an edge in $E - M$. The first time an unmatched vertex in $R$ is reached we know the length $k$ of a shortest augmenting path.

We can use this to stop our search early if at any point we have traversed more than that number of edges. To find disjoint paths, start at the vertices of $R$ which were found at distance $k$ in the $\text{BFS}$. Run a $\text{DFS}$ backwards from these, which maintains the property that the next vertex we pick has distance one fewer, and the edges alternate between being in $M$ and $E - M$. As we build up a path, mark the vertices as used so that we never traverse them again. This takes $O(E)$, so by part (f) the total runtime is $O(\sqrt VE)$.