Partition problem
In number theory and computer science, the partition problem, or number partitioning,[1] is the task of deciding whether a given multiset S of positive integers can be partitioned into two subsets S1 and S2 such that the sum of the numbers in S1 equals the sum of the numbers in S2. Although the partition problem is NP-complete, there is a pseudo-polynomial time dynamic programming solution, and there are heuristics that solve the problem in many instances, either optimally or approximately. For this reason, it has been called "the easiest hard problem".[2] There is an optimization version of the partition problem, which is to partition the multiset S into two subsets S1, S2 such that the difference between the sum of elements in S1 and the sum of elements in S2 is minimized. The optimization version is NP-hard, but can be solved efficiently in practice.[3]
Examples
Given S = {3,1,1,2,2,1}, a valid solution to the partition problem is the two sets S1 = {1,1,1,2} and S2 = {2,3}. Both sets sum to 5, and they partition S. Note that this solution is not unique. S1 = {3,1,1} and S2 = {2,2,1} is another solution.
Not every multiset of positive integers has a partition into two subsets with equal sum. An example of such a set is S = {2,5}.
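The two examples above can be checked mechanically. The following is a minimal brute-force sketch in Python, not an efficient method: it tries every subset as S1 and therefore takes exponential time. The function name `has_equal_partition` is chosen here for illustration only.

```python
from itertools import combinations

def has_equal_partition(numbers):
    """Brute-force check: try every subset as S1 (exponential time)."""
    total = sum(numbers)
    if total % 2:
        return False  # an odd total can never split into two equal sums
    target = total // 2
    for size in range(len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return True
    return False

print(has_equal_partition([3, 1, 1, 2, 2, 1]))  # True
print(has_equal_partition([2, 5]))              # False
```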
Pseudo-polynomial time algorithm
The problem can be solved using dynamic programming when the size of the set and the sum of its integers are small enough that the storage requirements remain feasible.
Suppose the input to the algorithm is a multiset [math]\displaystyle{ S }[/math] of cardinality [math]\displaystyle{ N }[/math]:
- S = {x1, ..., xN}
Let K be the sum of all elements in S. That is: K = x1 + ... + xN. We will build an algorithm that determines whether there is a subset of S that sums to [math]\displaystyle{ \lfloor K/2 \rfloor }[/math]. If there is a subset, then:
- if K is even, the rest of S also sums to [math]\displaystyle{ \lfloor K/2 \rfloor }[/math]
- if K is odd, then the rest of S sums to [math]\displaystyle{ \lceil K/2 \rceil }[/math]. This is as good a solution as possible.
Recurrence relation
We wish to determine if there is a subset of S that sums to [math]\displaystyle{ \lfloor K/2 \rfloor }[/math]. Let:
- p(i, j) be True if a subset of { x1, ..., xj } sums to i and False otherwise.
Then p([math]\displaystyle{ \lfloor K/2 \rfloor }[/math], N) is True if and only if there is a subset of S that sums to [math]\displaystyle{ \lfloor K/2 \rfloor }[/math]. The goal of our algorithm will be to compute p([math]\displaystyle{ \lfloor K/2 \rfloor }[/math], N). In aid of this, we have the following recurrence relation:
- p(i, j) is True if either p(i, j − 1) is True or if p(i − xj, j − 1) is True
- p(i, j) is False otherwise
The reasoning for this is as follows: there is some subset of S that sums to i using numbers
- x1, ..., xj
if and only if either of the following is true:
- There is a subset of { x1, ..., xj−1 } that sums to i;
- there is a subset of { x1, ..., xj−1 } that sums to i − xj, since xj + that subset's sum = i.
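The recurrence can be written almost literally as a memoized recursive function. The sketch below is in Python and makes two assumptions that the recurrence leaves implicit: p(0, j) is True (the empty subset sums to 0) and p(i, 0) is False for i > 0. The name `subset_sums_to` is illustrative.

```python
from functools import lru_cache

def subset_sums_to(xs, target):
    """Return p(target, len(xs)) using the recurrence, with memoization."""
    @lru_cache(maxsize=None)
    def p(i, j):
        if i == 0:
            return True       # assumed base case: the empty subset sums to 0
        if j == 0:
            return False      # assumed base case: no elements, positive sum
        x_j = xs[j - 1]       # x_j in the 1-based notation of the text
        if p(i, j - 1):
            return True       # a subset of the first j - 1 elements suffices
        # otherwise x_j must be used, which is only possible if i - x_j >= 0
        return i >= x_j and p(i - x_j, j - 1)

    return p(target, len(xs))

S = [3, 1, 1, 2, 2, 1]
print(subset_sums_to(S, sum(S) // 2))  # True
```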
The pseudo-polynomial algorithm
The algorithm consists of building up a table P of size ([math]\displaystyle{ \lfloor K/2 \rfloor }[/math] + 1) by ([math]\displaystyle{ N }[/math] + 1) containing the values of the recurrence, with one row for each sum i from 0 to [math]\displaystyle{ \lfloor K/2 \rfloor }[/math] and one column for each prefix length j from 0 to [math]\displaystyle{ N }[/math]. Remember that [math]\displaystyle{ K }[/math] is the sum of all [math]\displaystyle{ N }[/math] elements in [math]\displaystyle{ S }[/math]. Once the entire table is filled in, we return [math]\displaystyle{ P(\lfloor K/2 \rfloor, N) }[/math]. By the recurrence relation, the value of a block P(i, j) can depend only on two blocks of the previous column: P(i, j − 1) and P(i − xj, j − 1).
    function find_partition(S) is
        input: A list of integers S.
        output: True if a subset of S sums to [math]\displaystyle{ \lfloor K/2 \rfloor }[/math]; when K is even, this means S can be partitioned into two subsets that have equal sum.
        n ← |S|
        K ← sum(S)
        P ← empty boolean table of size ([math]\displaystyle{ \lfloor K/2 \rfloor }[/math] + 1) by (n + 1)
        initialize top row (P(0,x)) of P to True
        initialize leftmost column (P(x, 0)) of P, except for P(0, 0) to False
        for i from 1 to [math]\displaystyle{ \lfloor K/2 \rfloor }[/math]
            for j from 1 to n
                if (i-S[j]) >= 0 then
                    P(i, j) ← P(i, j-1) or P(i-S[j], j-1)
                else
                    P(i, j) ← P(i, j-1)
        return P([math]\displaystyle{ \lfloor K/2 \rfloor }[/math], n)
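The pseudocode above translates fairly directly into Python. The following is a sketch under the same conventions (rows indexed by the sum i, columns by the prefix length j); the name `find_partition_dp` is chosen here to avoid clashing with the greedy function later in the article, and the only real change is the shift to 0-based list indexing.

```python
def find_partition_dp(S):
    """Return True if some subset of S sums to floor(sum(S) / 2)."""
    n = len(S)
    half = sum(S) // 2
    # P[i][j] is True if a subset of the first j elements sums to i.
    P = [[False] * (n + 1) for _ in range(half + 1)]
    for j in range(n + 1):
        P[0][j] = True  # top row: the empty subset sums to 0
    for i in range(1, half + 1):
        for j in range(1, n + 1):
            x = S[j - 1]  # S[j] in the 1-based pseudocode
            P[i][j] = P[i][j - 1] or (i >= x and P[i - x][j - 1])
    return P[half][n]

print(find_partition_dp([3, 1, 1, 2, 2, 1]))  # True
print(find_partition_dp([2, 5]))              # False
```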
Example
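Below is the table P for the example set used above, S = {3, 1, 1, 2, 2, 1}, for which K = 10 and [math]\displaystyle{ \lfloor K/2 \rfloor }[/math] = 5. Each row lists P(i, j) for j = 0, ..., 6, with the elements taken in the order given (T = True, F = False):
- i = 0: T T T T T T T
- i = 1: F F T T T T T
- i = 2: F F F T T T T
- i = 3: F T T T T T T
- i = 4: F F T T T T T
- i = 5: F F F T T T T
The final entry P(5, 6) is True, so S can be partitioned into two subsets that each sum to 5.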
Analysis
This algorithm runs in time O(KN), where N is the number of elements in the input set and K is the sum of elements in the input set; more precisely, it fills about (K/2) · N table entries in constant time each.
The algorithm can be extended to the k-way multi-partitioning problem, but then takes [math]\displaystyle{ O(n(k-1)m^{k-1}) }[/math] memory where m is the largest number in the input, making it impractical even for k = 3 unless the inputs are very small numbers.[3]
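Because column j of the table depends only on column j − 1, the whole table does not have to be stored if only the True/False answer is needed. The sketch below is one possible space-saving variant, not part of the algorithm as described above: it keeps a single Python integer as a bit set of reachable sums. The number of bit operations is still proportional to K · N, but the memory used is about K/2 bits. The name `can_reach_half` is illustrative.

```python
def can_reach_half(S):
    """Return True if some subset of S sums to floor(sum(S) / 2)."""
    half = sum(S) // 2
    reachable = 1                    # bit i set: some subset sums to i
    mask = (1 << (half + 1)) - 1     # discard sums larger than half
    for x in S:
        # every reachable sum s also makes s + x reachable
        reachable |= (reachable << x) & mask
    return bool((reachable >> half) & 1)

print(can_reach_half([3, 1, 1, 2, 2, 1]))  # True
print(can_reach_half([2, 5]))              # False
```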
Special case of the subset-sum problem
The partition problem can be viewed as a special case of the subset sum problem and the pseudo-polynomial time dynamic programming solution given above generalizes to a solution for the subset sum problem.
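To illustrate the relationship, the following Python sketch solves subset sum for an arbitrary target and then treats partition as the special case where the target is half of the total. It stores, for each reachable sum, how that sum was first reached, so that an actual subset can be recovered. The names `subset_sum` and `partition` are hypothetical, and the dictionary-based table is an implementation choice made here rather than the table layout used above.

```python
def subset_sum(S, target):
    """Return a list of elements of S summing to target, or None."""
    # parent[s] = (previous sum, element added) for each reachable sum s
    parent = {0: None}
    for x in S:
        # iterate over a snapshot so that x is used at most once
        for s in list(parent):
            t = s + x
            if t <= target and t not in parent:
                parent[t] = (s, x)
    if target not in parent:
        return None
    subset = []
    s = target
    while parent[s] is not None:   # walk back to the empty subset
        s, x = parent[s]
        subset.append(x)
    return subset

def partition(S):
    """Partition as a special case: target is half of the total."""
    total = sum(S)
    if total % 2:
        return None                # odd total: no equal-sum partition
    return subset_sum(S, total // 2)

print(partition([3, 1, 1, 2, 2, 1]))
print(partition([2, 5]))  # None
```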
Approximation algorithm approaches
Several heuristic algorithms exist to produce approximations to the partition optimization problem. These can be extended to linear-space exact algorithms.[3]
The greedy algorithm
One approach to the problem, imitating the way children choose teams for a game, is the greedy algorithm, which iterates through the numbers in descending order, assigning each of them to whichever subset has the smaller sum. This approach has a running time of O(n log n). This heuristic works well in practice when the numbers in the set are of about the same size as its cardinality or less, but it is not guaranteed to produce the best possible partition. For example, given the set S = {4, 5, 6, 7, 8} as input, this greedy algorithm would partition S into subsets {4, 5, 8} and {6, 7}; however, S has an exactly balanced partition into subsets {7, 8} and {4, 5, 6}.
This greedy approach is known to give a 7/6-approximation to the optimal solution of the optimization version; that is, if the greedy algorithm outputs two sets A and B, then max(∑A, ∑B) ≤ 7/6 OPT, where OPT is the sum of the larger set in the best possible partition.[4] Below is an example (written in Python) for the greedy algorithm.
    def find_partition(numbers):
        """Separate given numbers into two series of approximately equal sum.

        Args:
            numbers: a collection of numbers, for example a list of integers.

        Returns:
            Two lists of numbers.
        """
        A = []
        B = []
        sum_A = 0
        sum_B = 0
        # Largest numbers first; each goes to the list with the smaller sum.
        for n in sorted(numbers, reverse=True):
            if sum_A < sum_B:
                A.append(n)
                sum_A = sum_A + n
            else:
                B.append(n)
                sum_B = sum_B + n
        return (A, B)
Example
    >>> find_partition([1, 2, 3, 4, 5])
    ([4, 3], [5, 2, 1])
This algorithm can be extended to the case of k > 2 sets: take the k largest elements and, for each partition of them, extend the partition by adding the remaining elements successively to whichever set is smaller. (The simple version above corresponds to k = 2.) This version runs in time [math]\displaystyle{ O(2^k n^2) }[/math] and is known to give a 4/3 − 1/(3k) approximation.[4] Thus, we have a polynomial-time approximation scheme (PTAS) for the number partition problem, though this is not a fully polynomial time approximation scheme (the running time is exponential in the desired approximation guarantee). However, there are variations of this idea that are fully polynomial-time approximation schemes for the subset-sum problem, and hence for the partition problem as well.[5][6]
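The following Python sketch shows one reading of this extension for two subsets: every assignment of the k largest numbers to the two sides is tried, each is completed greedily, and the completion with the smallest larger sum is kept. The function name `greedy_with_prefix_search` and the tie-breaking rules are assumptions made for illustration, and the sketch makes no attempt to match the stated running time.

```python
from itertools import product

def greedy_with_prefix_search(numbers, k):
    """Try all placements of the k largest numbers, then extend greedily."""
    ordered = sorted(numbers, reverse=True)
    head, tail = ordered[:k], ordered[k:]
    best = None
    for sides in product((0, 1), repeat=len(head)):
        parts = ([], [])
        sums = [0, 0]
        for x, side in zip(head, sides):   # fixed placement of the head
            parts[side].append(x)
            sums[side] += x
        for x in tail:                     # greedy extension
            side = 0 if sums[0] <= sums[1] else 1
            parts[side].append(x)
            sums[side] += x
        if best is None or max(sums) < best[0]:
            best = (max(sums), parts)
    return best[1]

print(greedy_with_prefix_search([4, 5, 6, 7, 8], 2))  # ([8, 7], [6, 5, 4])
```

On the set {4, 5, 6, 7, 8} from the earlier example, trying both placements of 7 relative to 8 is enough to find the exactly balanced partition that the plain greedy algorithm misses.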
Differencing algorithm
Another heuristic is the largest differencing method (LDM),[7] also called the Karmarkar–Karp heuristic[3] after the pair of scientists that published it in 1982.[8] LDM operates in two phases. The first phase of the algorithm takes the two largest numbers from the input and replaces them by their difference; this is repeated until only one number remains. The replacement represents the decision to put the two numbers in different sets, without immediately deciding which one is in which set. At the end of phase one, the single remaining number is the difference of the two subset sums. The second phase reconstructs the actual solution.[2]
The differencing heuristic performs better than the greedy one, but is still bad for instances where the numbers are exponential in the size of the set.[2]
The following Java code implements the first phase of Karmarkar–Karp. It uses a heap to efficiently find the pair of largest remaining numbers.
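    import java.util.Collections;
    import java.util.PriorityQueue;

    int karmarkarKarpPartition(int[] baseArr) {
        // Create a max-heap; PriorityQueue is a min-heap by default.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (int value : baseArr) {
            heap.add(value);
        }
        // Repeatedly replace the two largest numbers by their difference.
        while (heap.size() > 1) {
            int val1 = heap.poll();
            int val2 = heap.poll();
            heap.add(val1 - val2);
        }
        // The remaining number is the difference of the two subset sums.
        return heap.isEmpty() ? 0 : heap.poll();
    }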
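The second phase can be folded into the first by carrying, with every number on the heap, the two partial sets whose sums differ by that number. The Python sketch below takes this approach; it is one way to reconstruct the solution, not necessarily the one used in the original report. The name `karmarkar_karp` and the tie-breaking counter are choices made here.

```python
import heapq
from itertools import count

def karmarkar_karp(numbers):
    """Return (difference, A, B) found by the largest differencing method."""
    tie = count()  # tie-breaker so that the heap never compares lists
    # heapq is a min-heap, so differences are stored negated.
    # Each entry represents sum(a) - sum(b) = difference.
    heap = [(-x, next(tie), [x], []) for x in numbers]
    heapq.heapify(heap)
    while len(heap) > 1:
        d1, _, a1, b1 = heapq.heappop(heap)   # largest difference
        d2, _, a2, b2 = heapq.heappop(heap)   # second largest
        # Put the two "heavy" sides in different sets: a1 joins b2, b1 joins a2.
        heapq.heappush(heap, (d1 - d2, next(tie), a1 + b2, b1 + a2))
    if not heap:
        return 0, [], []
    d, _, a, b = heap[0]
    return -d, a, b

diff, A, B = karmarkar_karp([4, 5, 6, 7, 8])
print(diff, A, B)  # difference 2, although a difference of 0 is possible
```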
Other approaches
There are also anytime algorithms, based on the differencing heuristic, that first find the solution returned by the differencing heuristic, then find progressively better solutions as time allows (possibly requiring exponential time to reach optimality, for the worst instances).[9]
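A simplified sketch of such an anytime search is given below in Python. At each step it branches on the two largest numbers: first replacing them by their difference (the move made by the differencing heuristic), then by their sum (placing them in the same set). It is written as a generator that yields each improved difference as soon as it is found. This is an illustration of the idea only: the name `ckk_differences` is hypothetical, the subsets themselves are not reconstructed, and the pruning used in published versions is reduced to two simple rules.

```python
def ckk_differences(numbers):
    """Yield successively smaller differences; the last one is optimal."""
    perfect = sum(numbers) % 2   # smallest conceivable difference: 0 or 1
    best = [None]

    def search(vals):            # vals is sorted in descending order
        if best[0] == perfect:
            return               # already optimal, stop exploring
        rest = sum(vals[1:])
        if len(vals) <= 1 or vals[0] >= rest:
            # The largest value dominates: best is largest against the rest.
            diff = (vals[0] - rest) if vals else 0
            if best[0] is None or diff < best[0]:
                best[0] = diff
                yield diff
            return
        a, b, others = vals[0], vals[1], vals[2:]
        # Left branch: a and b go to different sets.
        yield from search(sorted(others + [a - b], reverse=True))
        # Right branch: a and b go to the same set.
        yield from search(sorted(others + [a + b], reverse=True))

    yield from search(sorted(numbers, reverse=True))

print(list(ckk_differences([4, 5, 6, 7, 8])))  # [2, 0]
```

The first value yielded is the result of the differencing heuristic; stopping the generator early gives the best difference found so far.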
Hard instances
Sets with only one partition, or none, tend to be the hardest (or most expensive) to solve relative to their input sizes. When the values are small compared to the size of the set, perfect partitions are more likely. The problem is known to undergo a "phase transition": a perfect partition is likely for some sets and unlikely for others. If m is the number of bits needed to express any number in the set and n is the size of the set, then instances with [math]\displaystyle{ m/n \lt 1 }[/math] tend to have many solutions and instances with [math]\displaystyle{ m/n \gt 1 }[/math] tend to have few or no solutions. As n and m get larger, the probability of a perfect partition goes to 1 or 0 respectively. This was originally argued based on empirical evidence by Gent and Walsh,[10] then using methods from statistical physics by Mertens,[11] and later proved by Borgs, Chayes, and Pittel.[12]
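The transition can be observed in a small experiment. The Python sketch below draws random instances with n numbers of at most m bits each and estimates the fraction that have a perfect partition (subset sums differing by at most 1). The sampling scheme (uniform integers from 1 to 2^m − 1), the trial count and the function names are assumptions made for this illustration, and the estimates for such small n are only a rough indication of the limiting behaviour.

```python
import random

def has_perfect_partition(numbers):
    """True if the two subset sums can differ by at most 1."""
    half = sum(numbers) // 2
    reachable = {0}                      # all subset sums found so far
    for x in numbers:
        reachable |= {s + x for s in reachable}
    return half in reachable

def estimate_perfect_fraction(n, m, trials=100, seed=0):
    """Estimate the fraction of random n-element, m-bit instances
    that admit a perfect partition."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        numbers = [rng.randrange(1, 2 ** m) for _ in range(n)]
        hits += has_perfect_partition(numbers)
    return hits / trials

for m in (4, 12, 24):
    print("m/n =", m / 12, "fraction:", estimate_perfect_fraction(12, m))
```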
Variants and generalizations
The restriction of requiring the partition to have equal size, or that all input integers be distinct, is also NP-hard.[citation needed]
There is a problem called the 3-partition problem which is to partition the set S into |S|/3 triples each with the same sum. This problem is quite different to the partition problem and has no pseudo-polynomial time algorithm unless P = NP.[13]
The multi-way partition problem generalizes the optimization version of the partition problem. Here, the goal is to divide a set or multiset of n integers into a given number k of subsets, minimizing the difference between the smallest and the largest subset sums.[3]
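As a simple baseline for the multi-way problem, the greedy idea carries over directly: process the numbers in descending order and always add the next number to the subset with the smallest current sum. The Python sketch below does this with a heap; it is a heuristic only and does not in general minimize the difference between the smallest and largest subset sums. The name `greedy_multiway` is illustrative.

```python
import heapq

def greedy_multiway(numbers, k):
    """Greedy k-way partition: each number goes to the lightest subset."""
    # Heap entries are (current sum, subset index, members);
    # the index breaks ties so that lists are never compared.
    heap = [(0, i, []) for i in range(k)]
    for x in sorted(numbers, reverse=True):
        total, i, members = heapq.heappop(heap)   # lightest subset
        members.append(x)
        heapq.heappush(heap, (total + x, i, members))
    # Return the subsets in index order.
    return [members for _, _, members in sorted(heap, key=lambda t: t[1])]

print(greedy_multiway([4, 5, 6, 7, 8], 3))  # [[8], [7, 4], [6, 5]]
```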
Probabilistic version
A related problem, somewhat similar to the birthday paradox, is that of determining the size of the input set for which there is a probability of one half that a solution exists, under the assumption that each element of the set is chosen uniformly at random between 1 and some given value.
The solution to this problem can be counter-intuitive, like the birthday paradox.
Notes
- [1] Korf 1998
- [2] Hayes 2002
- [3] Korf, Richard E. (2009). "Multi-Way Number Partitioning". IJCAI. http://ijcai.org/papers09/Papers/IJCAI09-096.pdf.
- [4] Ron L. Graham (1969). "Bounds on multiprocessor timing anomalies". SIAM J. Appl. Math. 17 (2): 416–429.
- [5] Hans Kellerer; Ulrich Pferschy; David Pisinger (2004), Knapsack problems, Springer, p. 97, ISBN 9783540402862, https://books.google.com/books?id=u5DB7gck08YC&pg=PA97
- [6] Martello, Silvano; Toth, Paolo (1990). "4 Subset-sum problem". Knapsack problems: Algorithms and computer interpretations. Wiley-Interscience. pp. 105–136. ISBN 978-0-471-92420-3. https://archive.org/details/knapsackproblems0000mart/page/105.
- [7] Michiels, Wil; Korst, Jan; Aarts, Emile (2003). "Performance ratios for the Karmarkar–Karp differencing method". Electronic Notes in Discrete Mathematics 13: 71–75. doi:10.1016/S1571-0653(04)00442-1.
- [8] Karmarkar & Karp 1982
- [9] Korf 1998, Mertens 1999
- [10] Gent & Walsh 1996
- [11] Mertens 1998, Mertens 2001
- [12] Borgs, Chayes & Pittel 2001
- [13] Garey, Michael; Johnson, David (1979). Computers and Intractability; A Guide to the Theory of NP-Completeness. pp. 96–105. ISBN 978-0-7167-1045-5. https://archive.org/details/computersintract0000gare.
References
- Hayes, Brian (March–April 2002), "The Easiest Hard Problem", American Scientist (Sigma Xi, The Scientific Research Society) 90 (2): 113–117
- Karmarkar, Narendra; Karp, Richard M (1982), "The Differencing Method of Set Partitioning", Technical Report UCB/CSD 82/113 (University of California at Berkeley: Computer Science Division (EECS)), http://digitalassets.lib.berkeley.edu/techreports/ucb/text/CSD-83-113.pdf
- Gent, Ian; Walsh, Toby (August 1996), Wolfgang Wahlster, ed., Phase Transitions and Annealed Theories: Number Partitioning as a Case Study, John Wiley and Sons, pp. 170–174
- Gent, Ian; Walsh, Toby (1998), "Analysis of Heuristics for Number Partitioning", Computational Intelligence 14 (3): 430–451, doi:10.1111/0824-7935.00069
- Mertens, Stephan (November 1998), "Phase Transition in the Number Partitioning Problem", Physical Review Letters 81 (20): 4281–4284, doi:10.1103/PhysRevLett.81.4281, Bibcode: 1998PhRvL..81.4281M
- Mertens, Stephan (2001), "A physicist's approach to number partitioning", Theoretical Computer Science 265 (1–2): 79–108, doi:10.1016/S0304-3975(01)00153-0
- Mertens, Stephan (2006), "The Easiest Hard Problem: Number Partitioning", in Allon Percus; Gabriel Istrate; Cristopher Moore, Computational complexity and statistical physics, Oxford University Press US, p. 125, ISBN 9780195177374, Bibcode: 2003cond.mat.10317M, https://books.google.com/books?id=4YD6AxV95zEC&pg=PA125
- Borgs, Christian; Chayes, Jennifer; Pittel, Boris (2001), "Phase transition and finite-size scaling for the integer partitioning problem", Random Structures and Algorithms 19 (3–4): 247–288, doi:10.1002/rsa.10004
- Korf, Richard E. (1998), "A complete anytime algorithm for number partitioning", Artificial Intelligence 106 (2): 181–203, doi:10.1016/S0004-3702(98)00086-1, ISSN 0004-3702
- Mertens, Stephan (1999), A complete anytime algorithm for balanced number partitioning, arXiv:cs/9903011, Bibcode: 1999cs........3011M