Set cover problem


The set cover problem is a classical question in combinatorics, computer science, operations research, and complexity theory.

Given a set of elements $\{1, 2, \dots, n\}$ (henceforth referred to as the universe, specifying all possible elements under consideration) and a collection $S$ of $m$ given subsets whose union equals the universe, the set cover problem is to identify a smallest sub-collection of $S$ whose union equals the universe.

For example, consider the universe $U = \{1, 2, 3, 4, 5\}$ and the collection of sets $S = \{\{1, 2, 3\}, \{2, 4\}, \{3, 4\}, \{4, 5\}\}$. In this example, $m$ is equal to 4, as there are four subsets in this collection. The union of $S$ is equal to $U$. However, we can cover all elements with only two sets, $\{\{1, 2, 3\}, \{4, 5\}\}$, but not with only one set. Therefore, the solution to the set cover problem for this $U$ and $S$ has size 2.
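The example can be checked mechanically. The following is a minimal brute-force sketch (the function name `minimum_set_cover` is illustrative, not from any library): it tries sub-collections of increasing size, so it takes exponential time and is only meant for tiny instances like this one.

```python
from itertools import combinations

def minimum_set_cover(universe, sets):
    """Return a smallest list of indices into `sets` whose union is `universe`.

    Exhaustive search over sub-collections of increasing size; exponential in
    len(sets), so only suitable for tiny instances.
    """
    for size in range(len(sets) + 1):
        for combo in combinations(range(len(sets)), size):
            covered = set().union(*(sets[i] for i in combo))
            if covered == universe:
                return list(combo)
    return None  # the sets do not cover the universe

U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
best = minimum_set_cover(U, S)
print(best, [S[i] for i in best])  # [0, 3] [{1, 2, 3}, {4, 5}]
```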

More formally, given a universe $𝒰$ and a family $𝒮$ of subsets of $𝒰$ , a set cover is a subfamily $𝒞 \subseteq 𝒮$ of sets whose union is $𝒰$ .

In the set cover decision problem, the input is a pair $(𝒰, 𝒮)$ and an integer $k$ ; the question is whether there is a set cover of size $k$ or less.
In the set cover optimization problem, the input is a pair $(𝒰, 𝒮)$ , and the task is to find a set cover that uses the fewest sets.

The decision version of set covering is NP-complete; it is one of Karp's 21 NP-complete problems, shown to be NP-complete in 1972. The optimization/search version of set cover is NP-hard.^[1] It is a problem "whose study has led to the development of fundamental techniques for the entire field" of approximation algorithms.^[2]

Variants

In the weighted set cover problem, each set is assigned a positive weight (representing its cost), and the goal is to find a set cover of smallest total weight. The usual (unweighted) set cover corresponds to all sets having a weight of 1.

In the fractional set cover problem, it is allowed to select fractions of sets, rather than entire sets. A fractional set cover is an assignment of a fraction (a number in $[0, 1]$) to each set in $𝒮$, such that for each element $x$ in the universe, the sum of fractions of sets that contain $x$ is at least 1. The goal is to find a fractional set cover in which the sum of fractions is as small as possible. Note that a (usual) set cover is equivalent to a fractional set cover in which all fractions are either 0 or 1; therefore, the size of the smallest fractional cover is at most the size of the smallest cover, but may be smaller. For example, consider the universe $U = \{1, 2, 3\}$ and the collection of sets $S = \{\{1, 2\}, \{2, 3\}, \{3, 1\}\}$. The smallest set cover has a size of 2, e.g. $\{\{1, 2\}, \{2, 3\}\}$. But there is a fractional set cover of size 1.5, in which a 0.5 fraction of each set is taken.
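The value 1.5 is in fact the smallest possible for this instance. Writing $x_a, x_b, x_c$ (names introduced here for illustration) for the fractions taken of $\{1, 2\}$, $\{2, 3\}$ and $\{3, 1\}$, and adding up the three covering constraints, gives:

```latex
\begin{aligned}
\text{element } 1:&\quad x_a + x_c \ge 1\\
\text{element } 2:&\quad x_a + x_b \ge 1\\
\text{element } 3:&\quad x_b + x_c \ge 1\\
\text{sum}:&\quad 2\,(x_a + x_b + x_c) \ge 3 \;\Longrightarrow\; x_a + x_b + x_c \ge 1.5
\end{aligned}
```

Taking $x_a = x_b = x_c = 0.5$ meets this bound with equality.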

In the capacitated set cover problem, each set $s \in 𝒮$ is associated with a capacity $c_{s}$, which denotes the maximum number of elements to which it can supply coverage. The goal is to determine the optimal way to select sets such that each element receives the coverage it requires.

Linear program formulation

The set cover problem can be formulated as the following integer linear program (ILP).^[3]

$$
\begin{aligned}
\text{minimize} \quad & \sum_{s \in 𝒮} x_{s} && \text{(minimize the number of sets)}\\
\text{subject to} \quad & \sum_{s : e \in s} x_{s} \geq 1 && \text{for all } e \in 𝒰 \text{ (cover every element of the universe)}\\
& x_{s} \in \{0, 1\} && \text{for all } s \in 𝒮 \text{ (every set is either in the set cover or not)}
\end{aligned}
$$

For a more compact representation of the covering constraint, one can define an incidence matrix $A$ , where each row corresponds to an element and each column corresponds to a set, and $A_{e, s} = 1$ if element e is in set s, and $A_{e, s} = 0$ otherwise. Then, the covering constraint can be written as $A x ⩾ 1$ .
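As a small sketch of the matrix form (helper names are illustrative), the following builds $A$ for the five-element example from the introduction and checks the constraint $A x \geq 1$ for two candidate 0/1 vectors $x$:

```python
def incidence_matrix(universe, sets):
    """Rows = elements (sorted), columns = sets; A[e][s] = 1 iff element e is in set s."""
    elements = sorted(universe)
    return elements, [[1 if e in s else 0 for s in sets] for e in elements]

def satisfies_cover_constraint(A, x):
    """Check A x >= 1 row by row, i.e. every element is covered at least once."""
    return all(sum(a * xs for a, xs in zip(row, x)) >= 1 for row in A)

U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
elements, A = incidence_matrix(U, S)
print(satisfies_cover_constraint(A, [1, 0, 0, 1]))  # True
print(satisfies_cover_constraint(A, [0, 1, 1, 1]))  # False: element 1 is uncovered
```

The objective value of a feasible $x$ is then simply `sum(x)` (or a weighted sum in the weighted variant).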

Weighted set cover is described by a program identical to the one given above, except that the objective function to minimize is $\sum_{s \in 𝒮} w_{s} x_{s}$ , where $w_{s}$ is the weight of set $s \in 𝒮$ .

Fractional set cover is described by a program identical to the one given above, except that $x_{s}$ can be non-integer, so the last constraint is replaced by $0 \leq x_{s} \leq 1$ .

This linear program belongs to the more general class of LPs for covering problems, as all the coefficients in the objective function and both sides of the constraints are non-negative. The integrality gap of the ILP is at most $\log n$ (where $n$ is the size of the universe), and it has been shown that its relaxation indeed gives a factor-$\log n$ approximation algorithm for the minimum set cover problem.^[4] See randomized rounding for a detailed explanation.

Hitting set formulation

The set cover problem is equivalent to the hitting set problem. A subset $H$ of $U$ is called a hitting set when $H \cap S_{j} \neq \emptyset$ for all $1 \leq j \leq m$ (i.e., $H$ intersects or “hits” all subsets in $S$ ). The hitting set problem is to find a minimum hitting set $H$ for a given $U$ and $S$ .

To show that the problems are equivalent, for a universe $U$ of size $n$ and a collection of sets $S$ of size $m$, construct $U' = \{1, 2, \dots, m\}$ and $S'_{i} = \{j \mid i \in S_{j}\}$ for $i = 1, \dots, n$. Then a set cover $C$ of $S$ is equivalent to a hitting set $H'$ for $S' = \{S'_{1}, \dots, S'_{n}\}$ over the universe $U'$, where $S_{j} \in C \iff j \in H'$, and vice versa.
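A minimal sketch of this construction, assuming the universe is $\{1, \dots, n\}$ and the sets are numbered $S_1, \dots, S_m$ (all function names are illustrative). It builds $S'$ for the introductory example and then checks exhaustively that every index set $J \subseteq \{1, \dots, m\}$ is a set cover exactly when it is a hitting set of $S'$:

```python
from itertools import combinations

def to_hitting_set(n, sets):
    """Given S_1..S_m over U = {1..n}, return [S'_1, ..., S'_n] with S'_i = {j : i in S_j}."""
    return [{j for j, s in enumerate(sets, start=1) if i in s} for i in range(1, n + 1)]

def is_set_cover(universe, sets, J):
    """True if the sets S_j, j in J (1-based), together cover the universe."""
    return set().union(*(sets[j - 1] for j in J)) == universe

def is_hitting_set(H, family):
    """True if H intersects every set in the family."""
    return all(H & s for s in family)

U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]   # S_1, ..., S_4
n, m = len(U), len(S)
S_prime = to_hitting_set(n, S)
print(S_prime)  # [{1}, {1, 2}, {1, 3}, {2, 3, 4}, {4}]

# J is a set cover of S exactly when J is a hitting set of S'.
agree = all(
    is_set_cover(U, S, J) == is_hitting_set(set(J), S_prime)
    for k in range(m + 1)
    for J in combinations(range(1, m + 1), k)
)
print(agree)  # True
```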

This equivalence can also be visualized by representing the problem as a bipartite graph of $n + m$ vertices, with $n$ vertices on the left representing elements of $U$, $m$ vertices on the right representing elements of $S$, and edges representing set membership (i.e., there is an edge between the $i$-th vertex on the left and the $j$-th vertex on the right if and only if $i \in S_{j}$). Then a set cover is a subset $C$ of right vertices such that each left vertex is adjacent to at least one member of $C$, while a hitting set is a subset $H$ of left vertices such that each right vertex is adjacent to at least one member of $H$. These definitions are exactly the same except that left and right are swapped. But there is nothing special about the sides in the bipartite graph; we could have put the elements of $U$ on the right side and the elements of $S$ on the left side, creating a graph that is a mirror image of the one described above. This shows that set covers in the original graph are equivalent to hitting sets in the mirrored graph, and vice versa.

In the field of computational geometry, a hitting set for a collection of geometrical objects is also called a stabbing set or piercing set.^[5]

Greedy algorithm

There is a greedy algorithm for polynomial-time approximation of set cover that chooses sets according to one rule: at each stage, choose the set that contains the largest number of uncovered elements. This method can be implemented in time linear in the sum of the sizes of the input sets, using a bucket queue to prioritize the sets.^[6] It achieves an approximation ratio of $H(n)$, where $n$ is the size of the universe to be covered.^[7] ^[8] ^[9] In other words, it finds a cover that is at most $H(n)$ times as large as the minimum one, where $H(n)$ is the $n$-th harmonic number: $H(n) = \sum_{k = 1}^{n} \frac{1}{k} \leq \ln n + 1$.
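The following is a minimal Python sketch of the greedy rule with a bucket queue, not a reference implementation. It assumes every set is a subset of the universe, and ties between sets with the same number of uncovered elements are broken arbitrarily (here by `set.pop`). The function name is illustrative.

```python
def greedy_set_cover(universe, sets):
    """Greedy set cover: repeatedly take a set with the most uncovered elements.

    Sets live in a bucket queue indexed by their current number of uncovered
    elements, so each (element, set) incidence is handled O(1) times.
    """
    containing = {e: [] for e in universe}     # element -> indices of sets containing it
    for i, s in enumerate(sets):
        for e in s:
            containing[e].append(i)

    count = [len(s) for s in sets]             # uncovered elements per set
    top = max(count, default=0)
    buckets = [set() for _ in range(top + 1)]  # buckets[c] = sets with count c
    for i, c in enumerate(count):
        buckets[c].add(i)

    covered, chosen = set(), []
    while len(covered) < len(universe):
        while top > 0 and not buckets[top]:    # counts never increase, so top only moves down
            top -= 1
        if top == 0:
            raise ValueError("the sets do not cover the universe")
        i = buckets[top].pop()                 # a set with the most uncovered elements
        chosen.append(i)
        for e in sets[i]:
            if e in covered:
                continue
            covered.add(e)
            for j in containing[e]:            # e no longer counts for the other sets
                if j != i:
                    buckets[count[j]].discard(j)
                    count[j] -= 1
                    buckets[count[j]].add(j)
    return chosen

U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(greedy_set_cover(U, S))  # [0, 3]
```

A simpler variant rescans all sets at every step to find the one with the most uncovered elements. It gives the same guarantee but costs more time.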

This greedy algorithm actually achieves an approximation ratio of $H(s')$, where $s'$ is the maximum cardinality of a set in $S$. For $δ$-dense instances, however, there exists a $c \ln m$-approximation algorithm for every $c > 0$.^[10]

Figure: Tight example for the greedy algorithm with $k = 3$.

There is a standard example on which the greedy algorithm achieves an approximation ratio of $\log_{2} (n) / 2$ . The universe consists of $n = 2^{(k + 1)} - 2$ elements. The set system consists of $k$ pairwise disjoint sets $S_{1}, \dots, S_{k}$ with sizes $2, 4, 8, \dots, 2^{k}$ respectively, as well as two additional disjoint sets $T_{0}, T_{1}$ , each of which contains half of the elements from each $S_{i}$ . On this input, the greedy algorithm takes the sets $S_{k}, \dots, S_{1}$ , in that order, while the optimal solution consists only of $T_{0}$ and $T_{1}$ . An example of such an input for $k = 3$ is pictured on the right.
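A minimal sketch that builds this instance for a given $k$ and runs a plain greedy on it. The helper names are illustrative, and breaking ties by lowest index is an assumption, although no ties arise on this instance. Greedy needs $k$ sets, while a brute-force search confirms that $T_0, T_1$ alone suffice:

```python
from itertools import combinations

def greedy(universe, sets):
    """Plain greedy: take the set covering the most uncovered elements (lowest index on ties)."""
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

def tight_instance(k):
    """Sets S_1..S_k of sizes 2, 4, ..., 2^k, followed by T_0, T_1 splitting each S_i in half."""
    S, T0, T1, next_id = [], set(), set(), 0
    for i in range(1, k + 1):
        block = list(range(next_id, next_id + 2 ** i))
        next_id += 2 ** i
        S.append(set(block))
        T0.update(block[: 2 ** (i - 1)])
        T1.update(block[2 ** (i - 1):])
    return set(range(next_id)), S + [T0, T1]

k = 3
U, sets = tight_instance(k)
print(len(U))            # 14 = 2^(k+1) - 2
print(greedy(U, sets))   # [2, 1, 0], i.e. S_3, S_2, S_1
opt = next(c for r in range(1, len(sets) + 1) for c in combinations(range(len(sets)), r)
           if set().union(*(sets[i] for i in c)) == U)
print(opt)               # (3, 4), i.e. T_0 and T_1
```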

Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for set cover up to lower order terms (see Inapproximability results below), under plausible complexity assumptions. A tighter analysis for the greedy algorithm shows that the approximation ratio is exactly $\ln n - \ln \ln n + Θ (1)$ .^[11]

Low-frequency systems

If each element occurs in at most $f$ sets, then a solution can be found in polynomial time that approximates the optimum to within a factor of $f$ using LP relaxation.

If the constraint $x_{s} \in \{0, 1\}$ is replaced by $x_{s} \geq 0$ for all $s \in 𝒮$ in the integer linear program shown above, then it becomes a (non-integer) linear program $L$. The algorithm can be described as follows:

1. Find an optimal solution $O$ for the program $L$ using some polynomial-time method of solving linear programs.
2. Pick all sets $s$ for which the corresponding variable $x_{s}$ has value at least $1/f$ in the solution $O$.^[12]
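The rounding step is easy to sketch once an LP solution is available. Solving the LP itself is outside this sketch: the fractional solution below is assumed to be given, using the three-element example from the Variants section, where every set gets value $1/2$. Each element's constraint sums to at least 1 over at most $f$ sets, so at least one of those sets reaches $1/f$ and the result is always a cover. Function names are illustrative.

```python
from fractions import Fraction

def max_frequency(universe, sets):
    """f: the largest number of sets that any single element belongs to."""
    return max(sum(1 for s in sets if e in s) for e in universe)

def threshold_round(x, f):
    """Keep every set whose LP value x_s is at least 1/f."""
    return [i for i, xi in enumerate(x) if xi >= Fraction(1, f)]

U = {1, 2, 3}
S = [{1, 2}, {2, 3}, {3, 1}]
x = [Fraction(1, 2)] * 3          # assumed: an optimal LP solution obtained elsewhere
f = max_frequency(U, S)           # every element lies in exactly 2 sets
chosen = threshold_round(x, f)
covered = set().union(*(S[i] for i in chosen))
print(f, chosen, covered == U)    # 2 [0, 1, 2] True
```

Here the rounded cover costs 3, which is exactly $f$ times the LP value $1.5$.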

Inapproximability results

When $n$ refers to the size of the universe, Lund & Yannakakis (1994) showed that set covering cannot be approximated in polynomial time to within a factor of $\frac{1}{2} \log_{2} n \approx 0.72 \ln n$, unless NP has quasi-polynomial time algorithms. Feige (1998) improved this lower bound to $(1 - o(1)) \cdot \ln n$ under the same assumption, which essentially matches the approximation ratio achieved by the greedy algorithm. Raz & Safra (1997) established a lower bound of $c \cdot \ln n$, where $c$ is a certain constant, under the weaker assumption that P $\neq$ NP. A similar result with a higher value of $c$ was later proved by Alon, Moshkovitz & Safra (2006). Dinur & Steurer (2013) showed optimal inapproximability by proving that set cover cannot be approximated to within $(1 - o(1)) \cdot \ln n$ unless P $=$ NP.

In low-frequency systems, Dinur et al. (2003) proved that it is NP-hard to approximate set cover to within a factor better than $f - 1 - ϵ$. If the Unique Games Conjecture is true, this can be improved to $f - ϵ$, as proven by Khot & Regev (2008).

Trevisan (2001) proves that set cover instances with sets of size at most $Δ$ cannot be approximated to a factor better than $\ln Δ - O(\ln \ln Δ)$ unless P $=$ NP, thus making the approximation ratio of $\ln Δ + 1$ achieved by the greedy algorithm essentially tight in this case.

Weighted set cover

The greedy algorithm for the weighted set cover problem^[7] directly generalizes the unweighted version. Given a universe $𝒰$ and a family $𝒮$ of subsets of $𝒰$ , where each set $S \in 𝒮$ is assigned a non-negative weight (cost), the algorithm maintains the subset of elements that are not yet covered. Initially, all elements of $𝒰$ are uncovered. At each iteration, the algorithm selects a set $S \in 𝒮$ that minimizes the ratio between its weight and the number of currently uncovered elements it contains. The selected set is added to the solution, and all elements contained in it are marked as covered. This process is repeated until all elements of $𝒰$ are covered. The greedy algorithm is known to produce a solution whose total weight is at most a factor of $H (n)$ times the optimal solution, where $H (n)$ denotes the $n$ -th harmonic number and $n = | 𝒰 |$ .
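A minimal sketch of the weighted greedy rule as described above (function name and example weights are illustrative, not from the text). Costs per newly covered element are compared as exact fractions, and ties go to the lowest index.

```python
from fractions import Fraction

def weighted_greedy_set_cover(universe, sets, weights):
    """Greedy for weighted set cover: minimize weight / (#uncovered elements) at each step."""
    uncovered, chosen = set(universe), []
    while uncovered:
        candidates = [i for i, s in enumerate(sets) if s & uncovered]
        if not candidates:
            raise ValueError("the sets do not cover the universe")
        # Exact rational cost per newly covered element; min() keeps the lowest index on ties.
        best = min(candidates,
                   key=lambda i: Fraction(weights[i]) / len(sets[i] & uncovered))
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
W = [5, 1, 2, 3]                  # illustrative weights
chosen = weighted_greedy_set_cover(U, S, W)
total = sum(W[i] for i in chosen)
print(chosen, total)              # [1, 2, 3, 0] 11
```

On this instance the optimum is $\{1, 2, 3\}, \{4, 5\}$ with weight 8, so greedy pays more than the optimum here but stays within the $H(5) \approx 2.28$ factor.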

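For low-frequency systems, where every element is contained in at most $f$ sets, the deterministic LP rounding algorithm achieves an $f$-approximation.^[13] It starts with an optimal solution to the linear programming relaxation of the problem stated above. Sets whose fractional value is at least $1/f$ are selected to form an integer solution. Every element lies in at most $f$ sets whose values sum to at least 1, so at least one of them has value at least $1/f$, and the selected sets therefore form a cover.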

The primal-dual algorithm for the set cover problem is an iterative method that constructs feasible solutions to both the primal and dual linear programs simultaneously. Starting with all dual variables set to zero, the algorithm repeatedly increases the dual variables corresponding to uncovered elements uniformly, until some set’s dual constraint becomes tight (i.e., the sum of the dual variables for elements in the set equals its cost). This tight set is then added to the primal solution, covering the corresponding elements. The process continues until all elements are covered. The algorithm guarantees an approximation ratio of $f$ , where $f$ is the maximum number of sets that any element belongs to.^[14]
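A minimal sketch of this primal-dual scheme. The names are illustrative, and exact `Fraction` arithmetic is used so that "tight" can be tested with equality. The dual LP here is: maximize $\sum_e y_e$ subject to $\sum_{e \in S} y_e \leq w_S$ and $y \geq 0$. When several sets become tight at once, this sketch adds the lowest-index one, and the others follow in later iterations with a zero increase.

```python
from fractions import Fraction

def primal_dual_set_cover(universe, sets, weights):
    """Raise the duals y_e of uncovered elements uniformly until some set's
    constraint sum_{e in S} y_e <= w_S becomes tight, then add that set."""
    y = {e: Fraction(0) for e in universe}
    slack = [Fraction(w) for w in weights]        # w_S - sum_{e in S} y_e
    uncovered, chosen = set(universe), []
    while uncovered:
        live = [i for i, s in enumerate(sets) if s & uncovered]
        if not live:
            raise ValueError("the sets do not cover the universe")
        # Largest uniform increase that keeps every dual constraint satisfied.
        delta = min(slack[i] / len(sets[i] & uncovered) for i in live)
        for e in uncovered:
            y[e] += delta
        for i in live:
            slack[i] -= delta * len(sets[i] & uncovered)
        tight = min(i for i in live if slack[i] == 0)
        chosen.append(tight)
        uncovered -= sets[tight]
    return chosen, y

U = {1, 2, 3, 4, 5}
S = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
W = [5, 1, 2, 3]                  # illustrative weights
chosen, y = primal_dual_set_cover(U, S, W)
print(chosen, sum(W[i] for i in chosen), sum(y.values()))  # [1, 2, 3, 0] 11 8
```

By weak duality, the dual value 8 is a lower bound on the optimum. The element 4 lies in $f = 3$ sets, so the cost 11 is within the guaranteed factor $f \cdot 8 = 24$.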

Randomized rounding is an approximation technique for the weighted set cover problem that uses the solution of the linear programming relaxation. Let $x_{S}^{*}$ be an optimal fractional solution to the LP relaxation. Each set $S \in 𝒮$ is independently included in the cover with probability $x_{S}^{*}$. By linearity of expectation, the expected cost of the chosen sets equals the LP optimum. Because each element's covering constraint gives $\sum_{S \ni e} x_{S}^{*} \geq 1$, a single round leaves a given element uncovered with probability at most $\prod_{S \ni e} (1 - x_{S}^{*}) \leq e^{-1}$. Repeating the rounding $O(\log n)$ times (or scaling up the probabilities by an $O(\log n)$ factor) therefore makes the probability that any element remains uncovered small. Using standard concentration bounds, this produces with high probability a feasible set cover whose expected cost is within an $O(\log n)$ factor of the optimal solution, where $n$ is the size of the universe.
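A minimal sketch of repeated randomized rounding, under stated assumptions. The fractional solution `x` is taken as given rather than computed, and the number of rounds, `rounds_factor * ln(n)`, is an illustrative choice. The final patch-up step, which adds a cheapest set for any element still uncovered, is a common textbook safeguard added here so that the output is always feasible. All names are hypothetical.

```python
import math
import random

def randomized_rounding(universe, sets, weights, x, rounds_factor=2, seed=0):
    """Include each set independently with probability x[i], repeated for about
    rounds_factor * ln(n) rounds; then patch any element that is still uncovered."""
    rng = random.Random(seed)
    rounds = max(1, math.ceil(rounds_factor * math.log(max(len(universe), 1))))
    chosen = set()
    for _ in range(rounds):
        for i, xi in enumerate(x):
            if rng.random() < xi:
                chosen.add(i)
    covered = set().union(*(sets[i] for i in chosen))
    # Safeguard (assumption of this sketch): cover leftovers with a cheapest set containing them.
    for e in sorted(universe - covered):
        if e not in covered:
            i = min((j for j, s in enumerate(sets) if e in s), key=lambda j: weights[j])
            chosen.add(i)
            covered |= sets[i]
    return sorted(chosen)

U = {1, 2, 3}
S = [{1, 2}, {2, 3}, {3, 1}]
W = [1, 1, 1]
x = [0.5, 0.5, 0.5]               # assumed: an optimal LP solution obtained elsewhere
chosen = randomized_rounding(U, S, W, x)
print(chosen)
```

Each round has expected cost equal to the LP value (here 1.5), so the expected cost before the patch-up step is at most the number of rounds times the LP optimum.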

References can be found in^[15] and^[16].

Fractional set cover

Notes

[1] Korte & Vygen 2012, p. 414.
[2] Vazirani 2001.
[3] Vazirani 2001.
[4] Vazirani 2001.
[5] Nielsen, Frank (2000), "Fast stabbing of boxes in high dimensions", Theoretical Computer Science 246 (1): 53–72, doi:10.1016/S0304-3975(98)00336-3, ISSN 0304-3975, http://www.lix.polytechnique.fr/%7Enielsen/pdf/2000-FastStabbingBoxes-TCS.pdf
[6] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990], "Exercise 35.3-3", Introduction to Algorithms (3rd ed.), MIT Press and McGraw-Hill, p. 1122, ISBN 0-262-03384-4
[7] Chvátal, V. (1979), "A Greedy Heuristic for the Set-Covering Problem", Mathematics of Operations Research 4 (3): 233–235, doi:10.1287/moor.4.3.233, Bibcode: 1979MatOR...4..233C
[8] Johnson, D. S. (1974), "Approximation algorithms for combinatorial problems", Journal of Computer and System Sciences 9 (3): 256–278, doi:10.1016/S0022-0000(74)80044-9, Bibcode: 1974JCoSS...9..256J
[9] Lovász, L. (1975), "On the ratio of optimal integral and fractional covers", Discrete Mathematics 13 (4): 383–390, doi:10.1016/0012-365X(75)90058-2
[10] Karpinski & Zelikovsky 1998.
[11] Slavík, Petr (1996), "A tight analysis of the greedy algorithm for set cover", STOC '96, pp. 435–441, doi:10.1145/237814.237991
[12] Vazirani 2001.
[13] Hochbaum, Dorit S. (1982), "Approximation algorithms for the Set Covering and Vertex Cover problems", SIAM Journal on Computing 11 (3): 555–556, doi:10.1137/0211045, ISSN 0097-5397
[14] Bar-Yehuda, Reuven; Even, Shimon (1981), "A linear-time approximation algorithm for the weighted vertex cover problem", Journal of Algorithms 2 (2): 198–203, doi:10.1016/0196-6774(81)90016-7, ISSN 0196-6774
[15] Young, Neal E. (2016), "Greedy Set-Cover Algorithms", Encyclopedia of Algorithms, Springer, pp. 886–889, ISBN 978-1-4939-2868-7
[16] Williamson, David P.; Shmoys, David B. (2011), The Design of Approximation Algorithms, Cambridge University Press, ISBN 978-0-521-19578-7
[17] Sandia National Laboratories; United States Department of Energy, Office of Scientific and Technical Information (1999), On the Red-Blue Set Cover Problem, United States Dept. of Energy, OCLC 68396743
[18] Gainer-Dewar, Andrew; Vera-Licona, Paola (2017), "The minimal hitting set generation problem: algorithms and computation", SIAM Journal on Discrete Mathematics 31 (1): 63–100, doi:10.1137/15M1055024

References

Alon, Noga; Moshkovitz, Dana; Safra, Shmuel (2006), "Algorithmic construction of sets for k-restrictions", ACM Trans. Algorithms 2 (2): 153–177, doi:10.1145/1150334.1150336, ISSN 1549-6325 .
Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001), Introduction to Algorithms, Cambridge, Mass.: MIT Press and McGraw-Hill, pp. 1033–1038, ISBN 978-0-262-03293-3
Feige, Uriel (1998), "A threshold of ln n for approximating set cover", Journal of the ACM 45 (4): 634–652, doi:10.1145/285055.285059, ISSN 0004-5411 .
Karpinski, Marek; Zelikovsky, Alexander (1998), "Approximating dense cases of covering problems", Proceedings of the DIMACS Workshop on Network Design: Connectivity and Facilities Location, 40, American Mathematical Society, pp. 169–178, ISBN 9780821870846, https://books.google.com/books?id=IMmuF0RZk1MC&q=karpinski+zelikovsky+cover+dense&pg=PA169
Lund, Carsten; Yannakakis, Mihalis (1994), "On the hardness of approximating minimization problems", Journal of the ACM 41 (5): 960–981, doi:10.1145/185675.306789, ISSN 0004-5411 .
Raz, Ran; Safra, Shmuel (1997), "A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP", STOC '97: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, ACM, pp. 475–484, ISBN 978-0-89791-888-6 .
Dinur, Irit; Steurer, David (2013), "Analytical approach to parallel repetition", STOC '14: Proceedings of the forty-sixth annual ACM symposium on Theory of computing, ACM, pp. 624–633 .
Vazirani, Vijay V. (2001), Approximation Algorithms, Springer-Verlag, ISBN 978-3-540-65367-7, https://www.ics.uci.edu/~vazirani/book.pdf
Korte, Bernhard; Vygen, Jens (2012), Combinatorial Optimization: Theory and Algorithms (5 ed.), Springer, ISBN 978-3-642-24487-2
Cardoso, Nuno; Abreu, Rui (2014), "An Efficient Distributed Algorithm for Computing Minimal Hitting Sets", Proceedings of the 25th International Workshop on Principles of Diagnosis, Graz, Austria, doi:10.5281/zenodo.10037, http://dx-2014.ist.tugraz.at/papers/DX14_Mon_PM_S1_paper1.pdf
Dinur, Irit; Guruswami, Venkatesan; Khot, Subhash; Regev, Oded (2003), "A new multilayered PCP and the hardness of hypergraph vertex cover", Association for Computing Machinery, pp. 595–601, doi:10.1145/780542.780629, ISBN 1581136749, https://doi.org/10.1145/780542.780629
Khot, Subhash; Regev, Oded (2008), "Vertex cover might be hard to approximate to within 2 − ε", Journal of Computer and System Sciences, pp. 335–349, doi:10.1016/j.jcss.2007.06.019, https://doi.org/10.1016/j.jcss.2007.06.019
Trevisan, Luca (2001), "Non-approximability results for optimization problems on bounded degree instances", Association for Computing Machinery, pp. 453–461, doi:10.1145/380752.380839, ISBN 1-58113-349-9, https://doi.org/10.1145/380752.380839

External links


Original source: https://en.wikipedia.org/wiki/Set cover problem.
