26-6 The Hopcroft-Karp bipartite matching algorithm

In this problem, we describe a faster algorithm, due to Hopcroft and Karp, for pp finding a maximum matching in a bipartite graph. The algorithm runs in O(VE)O(\sqrt V E) time. Given an undirected, bipartite graph G=(V,E)G = (V, E), where V=LRV = L \cup R and all edges have exactly one endpoint in LL, let MM be a matching in GG. We say that a simple path PP in GG is an augmenting path with respect to MM if it starts at an unmatched vertex in LL, ends at an unmatched vertex in RR, and its edges belong alternately to MM and EME - M. (This definition of an augmenting path is related to, but different from, an augmenting path in a flow network.) In this problem, we treat a path as a sequence of edges, rather than as a sequence of vertices. A shortest augmenting path with respect to a matching MM is an augmenting path with a minimum number of edges.

Given two sets AA and BB, the symmetric difference ABA \oplus B is defined as (AB)(BA)(A - B) \cup (B - A), that is, the elements that are in exactly one of the two sets.

a. Show that if MM is a matching and PP is an augmenting path with respect to MM, then the symmetric difference MPM \oplus P is a matching and MP=M+1|M \oplus P| = |M| + 1. Show that if P1,P2,,PkP_1, P_2, \ldots, P_k are vertex-disjoint augmenting paths with respect to MM, then the symmetric difference M(P1P2Pk)M \oplus (P_1 \cup P_2 \cup \cdots \cup P_k) is a matching with cardinality M+k|M| + k.

The general structure of our algorithm is the following:

HOPCROPFT-KARP(G)
    M = Ø
    repeat
        let P = {P[1], P[2], ..., P[k]} be a maximal set of vertex-disjoint shortest augmenting paths with respect to M
        M = M  (P[1]  P[2]  ...  P[k])
    until P == Ø
    return M

The remainder of this problem asks you to analyze the number of iterations in the algorithm (that is, the number of iterations in the repeat loop) and to describe an implementation of line 3.

b. Given two matchings MM and MM^* in GG, show that every vertex in the graph G=(V,MM)G' = (V, M \oplus M^*) has degree at most 22. Conclude that GG' is a disjoint union of simple paths or cycles. Argue that edges in each such simple path or cycle belong alternately to MM or MM^*. Prove that if MM|M| \le |M^*|, then MMM \oplus M^* contains at least MM|M^*| - |M| vertex-disjoint augmenting paths with respect to MM.

Let ll be the length of a shortest augmenting path with respect to a matching MM, and let P1,P2,,PkP_1, P_2, \ldots, P_k be a maximal set of vertex-disjoint augmenting paths of length ll with respect to MM. Let M=M(P1Pk)M' = M \oplus (P_1 \cup \cdots \cup P_k), and suppose that PP is a shortest augmenting path with respect to MM'.

c. Show that if PP is vertex-disjoint from P1,P2,,PkP_1, P_2, \ldots, P_k , then PP has more than ll edges.

d. Now suppose that PP is not vertex-disjoint from P1,P2,,PkP_1, P_2, \ldots, P_k . Let AA be the set of edges (MM)P(M \oplus M') \oplus P. Show that A=(P1P2Pk)PA = (P_1 \cup P_2 \cup \cdots \cup P_k) \oplus P and that A(k+1)l|A| \ge (k + 1)l. Conclude that PP has more than ll edges.

e. Prove that if a shortest augmenting path with respect to MM has ll edges, the size of the maximum matching is at most M+V/(l+1)|M| + |V| / (l + 1).

f. Show that the number of repeat loop iterations in the algorithm is at most 2V2\sqrt{|V|}. (Hint:\textit{Hint:} By how much can MM grow after iteration number V\sqrt{|V|}?)

g. Give an algorithm that runs in O(E)O(E) time to find a maximal set of vertexdisjoint shortest augmenting paths P1,P2,,PkP_1, P_2, \ldots, P_k for a given matching MM. Conclude that the total running time of HOPCROFT-KARP\text{HOPCROFT-KARP} is O(VE)O(\sqrt V E).

a. Suppose MM is a matching and PP is an augmenting path with respect to MM. Then PP consists of kk edges in MM, and k+1k + 1 edges not in MM. This is because the first edge of PP touches an unmatched vertex in LL, so it cannot be in MM. Similarly, the last edge in PP touches an unmatched vertex in RR, so the last edge cannot be in MM. Since the edges alternate being in or not in MM, there must be exactly one more edge not in MM than in MM. This implies that

MP=M+P2k=M+2k+12k=M+1,|M \oplus P| = |M| + |P| - 2k = |M| + 2k + 1 - 2k = |M| + 1,

since we must remove each edge of MM which is in PP from both MM and PP. Now suppose P1,P2,,PkP_1, P_2, \ldots, P_k are vertex-disjoint augmenting paths with respect to MM. Let kik_i be the number of edges in PiP_i which are in MM, so that Pi=2k+i+1|P_i| = 2k + i + 1. Then we have

M(P1P2Pk)=M+P1++Pk2k12k22kk=M+k.M \oplus (P_1 \cup P_2 \cup \cdots \cup P_k) = |M| + |P_1| + \cdots + |P_k| - 2k_1 - 2k_2 - \cdots - 2k_k = |M| + k.

To see that we in fact get a matching, suppose that there was some vertex vv which had at least 22 incident edges ee and ee'. They cannot both come from MM, since MM is a matching. They cannot both come from PP since PP is simple and every other edge of PP is removed. Thus, eMe \in M and eP\Me' \in P \backslash M. However, if eMe \in M then ePe \in P, so eMPe \notin M \oplus P, a contradiction. A similar argument gives the case of M(P1Pk)M \oplus (P_1 \cup \cdots \cup P_k).

b. Suppose some vertex in GG' has degree at least 33. Since the edges of GG' come from MMM \oplus M^*, at least 22 of these edges come from the same matching. However, a matching never contains two edges with the same endpoint, so this is impossible. Thus every vertex has degree at most 22, so GG' is a disjoint union of simple paths and cycles. If edge (u,v)(u, v) is followed by edge (z,w)(z, w) in a simple path or cycle then we must have v=zv = z. Since two edges with the same endpoint cannot appear in a matching, they must belong alternately to MM and MM^*. Since edges alternate, every cycle has the same number of edges in each matching and every path has at most one more edge in one matching than in the other. Thus, if MM|M| \le |M^*| there must be at least MM|M^*| - |M| vertex-disjoint augmenting paths with respect to MM.

c. Every vertex matched by MM must be incident with some edge in MM'. Since PP is augmenting with respect to MM′, the left endpoint of the first edge of PP isn't incident to a vertex touched by an edge in MM'. In particular, PP starts at a vertex in LL which is unmatched by MM since every vertex of MM is incident with an edge in MM'. Since PP is vertex disjoint from P1,P2,,PkP_1, P_2, \ldots, P_k, any edge of PP which is in MM' must in fact be in MM and any edge of PP which is not in MM' cannot be in MM. Since PP has edges alternately in MM' and EME - M', PP must in fact have edges alternately in MM and EME - M. Finally, the last edge of PP must be incident to a vertex in RR which is unmatched by MM'. Any vertex unmatched by MM' is also unmatched by MM, so PP is an augmenting path for MM. PP must have length at least ll since ll is the length of the shortest augmenting path with respect to MM. If PP had length exactly ll, then this would contradict the fact that P1PkP_1 \cup \cdots \cup P_k is a maximal set of vertex disjoint paths of length ll because we could add PP to the set. Thus PP has more than ll edges.

d. Any edge in MMM \oplus M' is in exactly one of MM or MM'. Thus, the only possible contributing edges from MM' are from P1PkP_1 \cup \cdots \cup P_k. An edge from MM can contribute if and only if it is not in exactly one of MM and P1PkP_1 \cup \cdots \cup P_k, which means it must be in both. Thus, the edges from MM are redundant so MM=(P1Pk)M \oplus M' = (P_1 \cup \cdots \cup P_k) which implies A=(P1Pk)PA = (P_1 \cup \cdots \cup P_k) \oplus P.

Now we'll show that PP is edge disjoint from each PiP_i. Suppose that an edge ee of PP is also an edge of PiP_i for some ii. Since PP is an augmenting path with respect to MM' either eMe \in M' or eEMe \in E - M'. Suppose eMe \in M'. Since PP is also augmenting with respect to MM, we must have eMe \in M. However, if ee is in MM and MM', then ee cannot be in any of the PiP_i's by the definition of MM'. Now suppose eEMe \in E - M'. Then eEMe \in E - M since PP is augmenting with respect to MM. Since ee is an edge of PiP_i, eEMe \in E - M' implies that eMe \in M, a contradiction.

Since PP has edges alternately in MM' and EME - M' and is edge disjoint from P1PkP_1 \cup \cdots \cup P_k, PP is also an augmenting path for MM, which implies Pl|P| \ge l. Since every edge in AA is disjoint we conclude that A(k+1)l|A| \ge (k + 1)l.

e. Suppose MM^* is a matching with strictly more than M+V/(l+1)|M| + |V| / (l + 1) edges. By part (b) there are strictly more than V/(l+1)|V| / (l + 1) vertex-disjoint augmenting paths with respect to MM. Each one of these contains at least ll edges, so it is incident on l+1l + 1 vertices. Since the paths are vertex disjoint, there are strictly more than V(l+1)/(l+1)|V|(l + 1) / (l + 1) distinct vertices incident with these paths, a contradiction. Thus, the size of the maximum matching is at most M+V/(l+1)|M| + |V| / (l + 1).

f. Consider what happens after iteration number V\sqrt{|V|}. Let MM^* be a maximal matching in GG. Then MM|M^*| \ge |M| so by part (b), MMM \oplus M^* contains at least MM|M^*| - |M| vertex disjoint augmenting paths with respect to MM. By part (c), each of these is also a an augmenting path for MM. Since each has length V\sqrt{|V|}, there can be at most V\sqrt{|V|} such paths, so MMV|M^*| - |M| \le \sqrt{|V|}. Thus, only V\sqrt{|V|} additional iterations of the repeat loop can occur, so there are at most 2V2\sqrt{|V|} iterations in total.

g. For each unmatched vertex in LL we can perform a modified BFS\text{BFS} to find the length of the shortest path to an unmatched vertex in RR. Modify the BFS\text{BFS} to ensure that we only traverse an edge if it causes the path to alternate between an edge in MM and an edge in EME - M. The first time an unmatched vertex in RR is reached we know the length kk of a shortest augmenting path.

We can use this to stop our search early if at any point we have traversed more than that number of edges. To find disjoint paths, start at the vertices of RR which were found at distance kk in the BFS\text{BFS}. Run a DFS\text{DFS} backwards from these, which maintains the property that the next vertex we pick has distance one fewer, and the edges alternate between being in MM and EME - M. As we build up a path, mark the vertices as used so that we never traverse them again. This takes O(E)O(E), so by part (f) the total runtime is O(VE)O(\sqrt VE).