Skip to content

33.4 Finding the closest pair of points


Professor Williams comes up with a scheme that allows the closest-pair algorithm to check only $5$ points following each point in array $Y'$. The idea is always to place points on line $l$ into set $P_L$. Then, there cannot be pairs of coincident points on line $l$ with one point in $P_L$ and one in $P_R$. Thus, at most $6$ points can reside in the $\delta \times 2\delta$ rectangle. What is the flaw in the professor's scheme?

In particular, when we select line $l$, we may be unable perform an even split of the vertices. So, we don't neccesarily have that both the left set of points and right set of points have fallen to roughly half. For example, suppose that the points are all arranged on a vertical line, then, when we recurse on the the left set of points, we haven't reduced the problem size at all, let alone by a factor of two. There is also the issue in this setup that you may end up asking about a set of size less than two when looking at the right set of points.


Show that it actually suffices to check only the points in the $5$ array positions following each point in the array $Y'$.

Since we only care about the shortest distance, the distance $\delta'$ must be strictly less than $\delta$. The picture in Figure 33.11(b) only illustrates the case of a nonstrict inequality. If we exclude the possibility of points whose $x$ coordinate differs by exactly $\delta$ from $l$, then it is only possible to place at most $6$ points in the $\delta \times 2\delta$ rectangle, so it suffices to check on the points in the $5$ array positions following each point in the array $Y'$.


We can define the distance between two points in ways other than euclidean. In the plane, the $L_m$-distance between points $p_1$ and $p_2$ is given by the expression $(|x_1 - x_2|^m + |y_1 - y_2|^m)^{1 / m}$. Euclidean distance, therefore, is $L_2$-distance. Modify the closest-pair algorithm to use the $L_1$-distance, which is also known as the Manhattan distance.

In the analysis of the algorithm, most of it goes through just based on the triangle inequality. The only main point of difference is in looking at the number of points that can be fit into a $\delta \times 2\delta$ rectangle. In particular, we can cram in two more points than the eight shown into the rectangle by placing points at the centers of the two squares that the rectangle breaks into. This means that we need to consider points up to $9$ away in $Y'$ instead of $7$ away. This has no impact on the asymptotics of the algorithm and it is the only correction to the algorithm that is needed if we switch from $L_2$ to $L_1$.


Given two points $p_1$ and $p_2$ in the plane, the $L_\infty$-distance between them is given by $\max(|x_1 - x_2|, |y_1 - y_2|)$. Modify the closest-pair algorithm to use the $L_\infty$-distance.

We can simply run the divide and conquer algorithm described in the section, modifying the brute force search for $|P| \le 3$ and the check against the next $7$ points in $Y'$ to use the $L_\infty$ distance. Since the $L_\infty$ distance between two points is always less than the euclidean distance, there can be at most $8$ points in the $\delta \times 2\delta$ rectangle which we need to examine in order to determine whether the closest pair is in that box. Thus, the modified algorithm is still correct and has the same runtime.


Suppose that $\Omega(n)$ of the points given to the closest-pair algorithm are covertical. Show how to determine the sets $P_L$ and $P_R$ and how to determine whether each point of $Y$ is in $P_L$ or $P_R$ so that the running time for the closest-pair algorithm remains $O(n\lg n)$.

We select the line $l$ so that it is roughly equal, and then, we won't run into any issue if we just pick an arbitrary subset of the vertices that are on the line to go to one side or the other.

Since the analysis of the algorithm allowed for both elements from $P_L$ and $P_R$ to be on the line, we still have correctness if we do this. To determine what values of $Y$ belong to which of the set can be made easier if we select our set going to $P_L$ to be the lowest however many points are needed, and the $P_R$ to be the higher points. Then, just knowing the index of $Y$ that we are looking at, we know whether that point belonged to $P_L$ or to $P_R$.


Suggest a change to the closest-pair algorithm that avoids presorting the $Y$ array but leaves the running time as $O(n\lg n)$. ($\textit{Hint:}$ Merge sorted arrays $Y_L$ and $Y_R$ to form the sorted array $Y$.)

In addition to returning the distance of the closest pair, the modify the algorithm to also return the points passed to it, sorted by $y$-coordinate, as $Y$. To do this, merge $Y_L$ and $Y_R$ returned by each of its recursive calls. If we are at the base case, when $n \le 3$, simply use insertion sort to sort the elements by y-coordinate directly. Since each merge takes linear time, this doesn't affect the recursive equation for the runtime.