9-2 Weighted median

For $n$ distinct elements $x_1, x_2, \ldots, x_n$ with positive weights $w_1, w_2, \ldots, w_n$ such that $\sum_{i = 1}^n w_i = 1$, the weighted (lower) median is the element $x_k$ satisfying

$$\sum_{x_i < x_k} w_i < \frac{1}{2}$$

and

$$\sum_{x_i > x_k} w_i \le \frac{1}{2}.$$

For example, if the elements are $0.1, 0.35, 0.05, 0.1, 0.15, 0.05, 0.2$ and each element equals its weight (that is, $w_i = x_i$ for $i = 1, 2, \ldots, 7$), then the median is $0.1$, but the weighted median is $0.2$.

a. Argue that the median of $x_1, x_2, \ldots, x_n$ is the weighted median of the $x_i$ with weights $w_i = 1 / n$ for $i = 1, 2, \ldots, n$.

b. Show how to compute the weighted median of $n$ elements in $O(n\lg n)$ worstcase time using sorting.

c. Show how to compute the weighted median in $\Theta(n)$ worst-case time using a linear-time median algorithm such as $\text{SELECT}$ from Section 9.3.

The post-office location problem is defined as follows. We are given $n$ points $p_1, p_2, \ldots, p_n$ with associated weights $w_1, w_2, \ldots, w_n$. We wish to find a point $p$ (not necessarily one of the input points) that minimizes the sum $\sum_{i = 1}^n w_i d(p, p_i)$, where $d(a, b)$ is the distance between points $a$ and $b$.

d. Argue that the weighted median is a best solution for the $1$-dimensional postoffice location problem, in which points are simply real numbers and the distance between points $a$ and $b$ is $d(a, b) = |a - b|$.

e. Find the best solution for the $2$-dimensional post-office location problem, in which the points are $(x,y)$ coordinate pairs and the distance between points $a = (x_1, y_1)$ and $b = (x_2, y_2)$ is the Manhattan distance given by $d(a, b) = |x_1 - x_2| + |y_1 - y_2|$.

a. Let $m_k$ be the number of $x_i$ smaller than $x_k$. When weights of $1 / n$ are assigned to each $x_i$, we have $\sum_{x_i < x_k} w_i = m_k / n$ and $\sum_{x_i > x_k} w_i = (n - m_k - 1) / 2$. The only value of $m_k$ which makes these sums $< 1 / 2$ and $\le 1 / 2$ respectively is when $\lceil n / 2 \rceil - 1$, and this value $x$ must be the median since it has equal numbers of $x_i's$ which are larger and smaller than it.

b. First use mergesort to sort the $x_i$'s by value in $O(n\lg n)$ time. Let $S_i$ be the sum of the weights of the first $i$ elements of this sorted array and note that it is $O(1)$ to update $S_i$. Compute $S_1, S_2, \dots$ until you reach $k$ such that $S_{k − 1} < 1 / 2$ and $S_k \ge 1 / 2$. The weighted median is $x_k$.

c. We modify $\text{SELECT}$ to do this in linear time. Let $x$ be the median of medians. Compute $\sum_{x_i < x} w_i$ and $\sum_{x_i > x} w_i$ and check if either of these is larger than $1 / 2$. If not, stop. If so, recurse on the collection of smaller or larger elements known to contain the weighted median. This doesn't change the runtime, so it is $\Theta(n)$.

d. Let $p$ be the minimizer, and suppose that $p$ is not the weighted median. Let $\epsilon$ be small enough such that $\epsilon < \min_i(|p − p_i|)$, where we don't include $k$ if $p = p_k$. If $p_m$ is the weighted median and $p < p_m$, choose $\epsilon > 0$. Otherwise choose $\epsilon < 0$. Thus, we have

$$\sum_{i = 1}^n w_id(p + \epsilon, p_i) = \sum_{i = 1}^n w_id(p, p_i) + \epsilon\left(\sum_{p_i < p} w_i - \sum_{p_i > p} w_i \right) < \sum_{i = 1}^n w_id(p, p_i),$$

the difference in sums will take the opposite sign of epsilon.

e. Observe that

$$\sum_{i = 1}^n w_id(p, p_i) = \sum_{i = 1}^n w_i |p_x - (p_i)_x| + \sum_{i = 1}^n w_i|p_y - (p_i)_y|.$$

It will suffice to minimize each sum separately, which we can do since we choose $p_x$ and $p_y$ individually. By part (e), we simply take $p = (p_x, p_y)$ to be such that $p_x$ is the weighted median of the $x$-coordinates of the $p_i$'s and py is the weighted medain of the $y$-coordiantes of the $p_i$'s.