k-approximation of k-hitting set

From HandWiki

In computer science, k-approximation of k-hitting set is an approximation algorithm for weighted hitting set. The input is a collection S of subsets of some universe T and a mapping W from T to non-negative numbers called the weights of the elements of T. In k-hitting set the size of the sets in S cannot be larger than k. That is, [math]\displaystyle{ \forall i \in S: |i| \leq k }[/math]. The problem is now to pick some subset T' of T such that every set in S contains some element of T', and such that the total weight of all elements in T' is as small as possible.

The algorithm

For each set [math]\displaystyle{ j }[/math] in S is maintained a price, [math]\displaystyle{ p_j }[/math], which is initially 0. For an element a in T, let S(a) be the collection of sets from S containing a. During the algorithm the following invariant is kept

[math]\displaystyle{ \forall a \in T: \sum_{j \in S(a)} p_j \leq W(a).\, }[/math]

We say that an element, a, from T is tight if [math]\displaystyle{ \Sigma_{j \in S(a)} p_j = W(a) }[/math]. The main part of the algorithm consists of a loop: As long as there is a set in S that contains no element from T which is tight, the price of this set is increased as much as possible without violating the invariant above. When this loop exits, all sets contain some tight element. Pick all the tight elements to be the hitting set.

Correctness

The algorithm always terminates because in each iteration of the loop the price of some set in S is increased enough to make one more element from T tight. If it cannot increase any price, it exits. It runs in polynomial time because the loop will not make more iterations than the number of elements in the union of all the sets of S. It returns a hitting set, because when the loop exits, all sets in S contain a tight element from T, and the set of these tight elements are returned.

Note that for any hitting set T* and any prices [math]\displaystyle{ p_1, \ldots, p_{|S|} }[/math] where the invariant from the algorithm is true, the total weight of the hitting set is an upper limit to the sum over all members of T* of the sum of the prices of sets containing this element, that is: [math]\displaystyle{ \Sigma_{a \in T^*} \Sigma_{j \in S(a)} p_j \leq \Sigma_{a \in T^*} W(a) }[/math]. This follows from the invariant property. Further, since the price of every set must occur at least once on the left hand side, we get [math]\displaystyle{ \Sigma_{j \in S} p_j \leq \Sigma_{a \in T^*} W(a) }[/math]. Especially, this property is true for the optimal hitting set.

Further, for the hitting set H returned from the algorithm above, we have [math]\displaystyle{ \Sigma_{a \in H} \Sigma_{j \in S(a)} p_j = \Sigma_{a \in H} W(a) }[/math]. Since any price [math]\displaystyle{ p_j }[/math] can appear at most k times in the left-hand side (since subsets of S can contain no more than k element from T) we get [math]\displaystyle{ \Sigma_{a \in H} W(a) \leq k \cdot \Sigma_{j \in S} p_j }[/math] Combined with the previous paragraph we get [math]\displaystyle{ \Sigma_{a \in H} W(a) \leq k \cdot \Sigma_{a \in T^*} W(a) }[/math], where T* is the optimal hitting set. This is exactly the guarantee that it is a k-approximation algorithm.

Relation to linear programming

This algorithm is an instance of the primal-dual method, also called the pricing method. The intuition is that it is dual to a linear programming algorithm. For an explanation see http://algo.inria.fr/seminars/sem00-01/vazirani.html.

References