26-6 The Hopcroft-Karp bipartite matching algorithm
In this problem, we describe a faster algorithm, due to Hopcroft and Karp, for finding a maximum matching in a bipartite graph. The algorithm runs in time. Given an undirected, bipartite graph , where and all edges have exactly one endpoint in , let be a matching in . We say that a simple path in is an augmenting path with respect to if it starts at an unmatched vertex in , ends at an unmatched vertex in , and its edges belong alternately to and . (This definition of an augmenting path is related to, but different from, an augmenting path in a flow network.) In this problem, we treat a path as a sequence of edges, rather than as a sequence of vertices. A shortest augmenting path with respect to a matching is an augmenting path with a minimum number of edges.
Given two sets and , the symmetric difference is defined as , that is, the elements that are in exactly one of the two sets.
a. Show that if is a matching and is an augmenting path with respect to , then the symmetric difference is a matching and . Show that if are vertex-disjoint augmenting paths with respect to , then the symmetric difference is a matching with cardinality .
The general structure of our algorithm is the following:
HOPCROPFT-KARP(G) M = Ø repeat let P = {P[1], P[2], ..., P[k]} be a maximal set of vertex-disjoint shortest augmenting paths with respect to M M = M ⨁ (P[1] ∪ P[2] ∪ ... ∪ P[k]) until P == Ø return M
The remainder of this problem asks you to analyze the number of iterations in the algorithm (that is, the number of iterations in the repeat loop) and to describe an implementation of line 3.
b. Given two matchings and in , show that every vertex in the graph has degree at most . Conclude that is a disjoint union of simple paths or cycles. Argue that edges in each such simple path or cycle belong alternately to or . Prove that if , then contains at least vertex-disjoint augmenting paths with respect to .
Let be the length of a shortest augmenting path with respect to a matching , and let be a maximal set of vertex-disjoint augmenting paths of length with respect to . Let , and suppose that is a shortest augmenting path with respect to .
c. Show that if is vertex-disjoint from , then has more than edges.
d. Now suppose that is not vertex-disjoint from . Let be the set of edges . Show that and that . Conclude that has more than edges.
e. Prove that if a shortest augmenting path with respect to has edges, the size of the maximum matching is at most .
f. Show that the number of repeat loop iterations in the algorithm is at most . ( By how much can grow after iteration number ?)
g. Give an algorithm that runs in time to find a maximal set of vertexdisjoint shortest augmenting paths for a given matching . Conclude that the total running time of is .
a. Suppose is a matching and is an augmenting path with respect to . Then consists of edges in , and edges not in . This is because the first edge of touches an unmatched vertex in , so it cannot be in . Similarly, the last edge in touches an unmatched vertex in , so the last edge cannot be in . Since the edges alternate being in or not in , there must be exactly one more edge not in than in . This implies that
since we must remove each edge of which is in from both and . Now suppose are vertex-disjoint augmenting paths with respect to . Let be the number of edges in which are in , so that . Then we have
To see that we in fact get a matching, suppose that there was some vertex which had at least incident edges and . They cannot both come from , since is a matching. They cannot both come from since is simple and every other edge of is removed. Thus, and . However, if then , so , a contradiction. A similar argument gives the case of .
b. Suppose some vertex in has degree at least . Since the edges of come from , at least of these edges come from the same matching. However, a matching never contains two edges with the same endpoint, so this is impossible. Thus every vertex has degree at most , so is a disjoint union of simple paths and cycles. If edge is followed by edge in a simple path or cycle then we must have . Since two edges with the same endpoint cannot appear in a matching, they must belong alternately to and . Since edges alternate, every cycle has the same number of edges in each matching and every path has at most one more edge in one matching than in the other. Thus, if there must be at least vertex-disjoint augmenting paths with respect to .
c. Every vertex matched by must be incident with some edge in . Since is augmenting with respect to ′, the left endpoint of the first edge of isn't incident to a vertex touched by an edge in . In particular, starts at a vertex in which is unmatched by since every vertex of is incident with an edge in . Since is vertex disjoint from , any edge of which is in must in fact be in and any edge of which is not in cannot be in . Since has edges alternately in and , must in fact have edges alternately in and . Finally, the last edge of must be incident to a vertex in which is unmatched by . Any vertex unmatched by is also unmatched by , so is an augmenting path for . must have length at least since is the length of the shortest augmenting path with respect to . If had length exactly , then this would contradict the fact that is a maximal set of vertex disjoint paths of length because we could add to the set. Thus has more than edges.
d. Any edge in is in exactly one of or . Thus, the only possible contributing edges from are from . An edge from can contribute if and only if it is not in exactly one of and , which means it must be in both. Thus, the edges from are redundant so which implies .
Now we'll show that is edge disjoint from each . Suppose that an edge of is also an edge of for some . Since is an augmenting path with respect to either or . Suppose . Since is also augmenting with respect to , we must have . However, if is in and , then cannot be in any of the 's by the definition of . Now suppose . Then since is augmenting with respect to . Since is an edge of , implies that , a contradiction.
Since has edges alternately in and and is edge disjoint from , is also an augmenting path for , which implies . Since every edge in is disjoint we conclude that .
e. Suppose is a matching with strictly more than edges. By part (b) there are strictly more than vertex-disjoint augmenting paths with respect to . Each one of these contains at least edges, so it is incident on vertices. Since the paths are vertex disjoint, there are strictly more than distinct vertices incident with these paths, a contradiction. Thus, the size of the maximum matching is at most .
f. Consider what happens after iteration number . Let be a maximal matching in . Then so by part (b), contains at least vertex disjoint augmenting paths with respect to . By part (c), each of these is also a an augmenting path for . Since each has length , there can be at most such paths, so . Thus, only additional iterations of the repeat loop can occur, so there are at most iterations in total.
g. For each unmatched vertex in we can perform a modified to find the length of the shortest path to an unmatched vertex in . Modify the to ensure that we only traverse an edge if it causes the path to alternate between an edge in and an edge in . The first time an unmatched vertex in is reached we know the length of a shortest augmenting path.
We can use this to stop our search early if at any point we have traversed more than that number of edges. To find disjoint paths, start at the vertices of which were found at distance in the . Run a backwards from these, which maintains the property that the next vertex we pick has distance one fewer, and the edges alternate between being in and . As we build up a path, mark the vertices as used so that we never traverse them again. This takes , so by part (f) the total runtime is .