This Haskell cicada was a Google Codejam competition problem way back in 2012 Codejam Round 1B 2012: Safety in Numbers. No, I did not participate. I’m not very good and I’m also not very fast, two things needed in coding competitions. I do like these competitions as a source of problems to tackle. .
Problem
There are $n$ contestants in a reality TV show. Each contestant is assigned a point value by the judges and receives votes from the audience. The point value given by the judges and the audience’s votes are combined to form a final score for the contestant, in the following way:
Let $s$ be the sum of the judge-assigned point values of all contestants. Now suppose a contestant got $s_i$ points from the judges, and that they received a fraction $\alpha_i$ (between $0$ and $1$, inclusive) of the audience’s votes ($\alpha_i$ might be, for example, $0.3$). Then that contestant’s final score is $c_i=s_i+\alpha_i s$. Note that the sum of all contestants’ audience vote fractions must be one.
The contestant with the lowest score is eliminated.
Given the points contestants got from judges, your job is to find out, for each contestant, the minimum percentage of audience votes they must receive in order for them to be guaranteed not to be eliminated, no matter how the rest of the audience’s votes are distributed.
If the lowest score is shared by multiple contestants, no contestants will be eliminated.
Solution
We have the popular vote distributed over the $n$ candidates. Each candidate gets fraction $\alpha_i \in [0, 1]$ of it, and the sum of the fractions add up to one.
$$ \sum^{n-1}_{i=0} \alpha_i = 1 $$
We also have the score from the judges. Each candidate gets score $s_i$ from the judges and
$$ s = \sum^{n-1}_{i=0} s_i $$
is the sum of all the judge’s scores.
The final score of a candidate is $c_i = s_i + \alpha_i s$.
For $c_i$ to be eliminated, it needs to be smaller than every $c_j, j \ne i$.
The interesting thing about this problem is that the final score of each candidate depends on a distribution of the total judge score according to audience votes. It means that the final scores are not independent of each other.
It seems useful to sort the candidates by the score they got from the judges:
$$ s_0 \leq s_1 \leq \ldots \leq s_{n-1} $$
For a given candidate $i$ we need to figure out for each possible value of $\alpha_i$ if there exists an audience vote distribution of the remaining $1 - \alpha_i$ that would eliminate candidate $i$. The possible values of $\alpha_i$ are in the range $[0, 1]$.
We define the following function for each candidate:
$$ f_i(\alpha_i) = \begin{cases} 1 & \text{ candidate safe with } \alpha_i\\ 0 & \text{ candidate not safe with } \alpha_i \end{cases} $$
A candidate $i$ is safe with $\alpha_i$ iff there is no distribution of the remaining audience votes $1 - \alpha_i$ that would eliminate candidate $i$. It seems we should consider the worst case distribution of the remaining audience votes. If the worst case doesn’t eliminate the candidate, then they are safe. The candidate score is $c_i = s_i + \alpha_i s$. Let $k$ be the index such that
$$ s_{k-1} \leq c_i < s_k $$
We define $s_{-1} = - \infty$ and $s_n = \infty$ for the two cases where $c_i$ is smaller than all judge scores or bigger than all judge scores. The worst case then is distributing the remaining $1 - \alpha_i$ audience votes in such a way to lift the candidates $0, 1, \ldots k-1$ over candidate $i$. It is the worst case because the candidates $k, \ldots, n-1$ already have bigger total scores than candidate $i$ no matter the remaining distribution, so using it fully to lift over the others is the worst case. We get $k$ inequalities:
$$ s_j + \alpha_j s > c_i, 0 \leq j < k $$
The worst case is if all remaining audience votes go to these $0, 1, \ldots k-1$ candidates:
$$ \sum_{j=0}^{k-1} \alpha_j = 1 - \alpha_i $$
We add left and right sides of the inequalities and get:
$$ \sum_{j=0}^{k-1}s_j + (1 - \alpha_i) s > k c_i $$
This gives us a criteria to compute $f_i(\alpha_i)$ for each value of $\alpha_i$:
$$ f_i(\alpha_i) = \begin{cases} 1 & \text{ if }\sum_{j=0}^{k-1}s_j + (1 - \alpha_i) s \leq k c_i\\ 0 & \text{ if }\sum_{j=0}^{k-1}s_j + (1 - \alpha_i) s > k c_i \end{cases} $$
$f_i$ is a step function. Once it reaches one, it stays there because increasing $\alpha_i$ decreases the remaining audience votes so there is even less of a chance to lift the losing candidates past candidate $i$ using audience votes.
The step can occur inside or outside interval $[0, 1]$. If it is outside then candidate $i$ is either always eliminated or always safe, no matter the audience votes.
Let’s write some code. Before we can compute $f_i(\alpha_i)$ we need to compute the index $k$ where $c_i$ falls in the sorted sequence of judge scores $s_j$. The sequence is sorted, so we can use binary search. We might as well write a more general binary search, that finds the index at which a given predicate changes from False to True.
binarySearch :: (Int->Bool) -> Int -> Int -> Int
-- Preconditions:
-- predicate p has to be defined on domain [low, high)
-- low < high
-- predicate p has to satisfy: p(i) => (\forall low <= j < i: not p(j))
-- and
-- (\forall i <= j < high: p(j))
-- Return:
-- returns smallest index i with p(i),
-- if all values i in domain result in not p(i), then returns high
binarySearch p low _ | (p low) = low
binarySearch _ low high | (low + 1 == high) = high
binarySearch p low high = if (p m) then binarySearch p low m
else binarySearch p m high
where m = low + (high - low) `div` 2
For the step function itself we need to consider the fact that the search space is continuous because it is the interval of real numbers $[0, 1]$. We can still use binary search but instead of an exact result we have to accept an approximation: we stop when we narrowed the interval where we know the step is to a desired precision.
findStep' :: Double -> (Double -> Bool) -> Double -> Double -> Double
findStep' d _ low high | (high - low) < d = (high - low) / 2
findStep' d p low high = if (p m) then findStep' d p low m
else findStep' d p m high
where m = low + (high - low) / 2
findStep :: Double -> (Double -> Bool) -> Double
-- Preconditions:
-- predicate p has to be defined on domain [0,1]
-- predicate p has to satisfy: p(i) => (\forall low <= j < i: not p(j))
-- and
-- (\forall i <= j < high: p(j))
-- returns x such that y \in (x - d, x + d) with y = infinum{z: p(z)}
findStep _ p | (p 0.0) = 0.0
findStep _ p | (not (p 1.0)) = 1.0
findStep d p = findStep' d p 0.0 1.0