Interval scheduling

Interval scheduling is a class of problems in computer science, particularly in the area of algorithm design. The problems consider a set of tasks. Each task is represented by an interval describing the time in which it needs to be executed. For instance, task A might run from 2:00 to 5:00, task B might run from 4:00 to 10:00 and task C might run from 9:00 to 11:00. A subset of intervals is compatible if no two intervals overlap. For example, the subset {A,C} is compatible, as is the subset {B}; but neither {A,B} nor {B,C} is compatible, because the two intervals in each of these subsets overlap. The interval scheduling maximization problem (ISMP) is to find a largest compatible set, that is, a set of non-overlapping intervals of maximum size. The goal here is to execute as many tasks as possible.
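The compatibility test on the example tasks A, B and C can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, clock times are written as plain hours, and intervals are assumed to be half-open pairs (start, end), so two intervals that merely touch at an endpoint do not overlap.

```python
from itertools import combinations

def overlaps(a, b):
    """True if the half-open intervals a = (start, end) and b share some time."""
    return a[0] < b[1] and b[0] < a[1]

def is_compatible(intervals):
    """A subset is compatible if no two of its intervals overlap."""
    return not any(overlaps(a, b) for a, b in combinations(intervals, 2))

# The three tasks from the text, with times given in hours.
tasks = {"A": (2, 5), "B": (4, 10), "C": (9, 11)}

print(is_compatible([tasks["A"], tasks["C"]]))  # True
print(is_compatible([tasks["A"], tasks["B"]]))  # False
print(is_compatible([tasks["B"], tasks["C"]]))  # False
```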

In a generalized version of the problem, the intervals are partitioned into groups. A subset of intervals is compatible if no two intervals overlap and, moreover, no two intervals belong to the same group (i.e. the subset contains at most one representative interval of each group).

The group interval scheduling decision problem (GISDP) is to decide whether there exists a compatible set in which all groups are represented. The goal here is to execute a single representative task from each group. GISDPk is a restricted version of GISDP in which the number of intervals in each group is at most k.

The group interval scheduling maximization problem (GISMP) is to find a largest compatible set, that is, a set of non-overlapping representatives of maximum size. The goal here is to execute a representative task from as many groups as possible. GISMPk is a restricted version of GISMP in which the number of intervals in each group is at most k. This problem is often called JISPk, where J stands for Job.

GISMP is the most general problem; the other two problems can be seen as special cases of it:

  • ISMP is the special case in which each task belongs to its own group (i.e. it is equal to GISMP1).
  • GISDP is the problem of deciding whether the maximum size of a compatible set equals the number of groups.

Interval Scheduling Maximization

[Figure: IntervalSelection.svg]

Several algorithms that may look promising at first sight do not actually find the optimal solution:[1]

  • Selecting the interval that starts earliest is not optimal: if that interval happens to be very long, accepting it forces the rejection of many shorter intervals.
  • Selecting the shortest interval, or the interval with the fewest conflicts, is also not optimal.
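The failure of the first two rules can be reproduced on small instances. The sketch below is only an illustration: the two instances and the helper name `greedy_by` are made up for this example, intervals are assumed to be half-open (start, end) pairs of positive length, and the earliest-finish rule is included purely for comparison.

```python
def overlaps(a, b):
    """True if the half-open intervals a and b share some time."""
    return a[0] < b[1] and b[0] < a[1]

def greedy_by(intervals, key):
    """Repeatedly pick the candidate that minimises `key`, then discard
    every candidate overlapping it (including the pick itself)."""
    candidates = list(intervals)
    chosen = []
    while candidates:
        x = min(candidates, key=key)
        chosen.append(x)
        candidates = [c for c in candidates if not overlaps(c, x)]
    return chosen

# One long early interval blocks two short ones.
long_first = [(0, 10), (1, 2), (3, 4)]
print(greedy_by(long_first, key=lambda iv: iv[0]))  # [(0, 10)]
print(greedy_by(long_first, key=lambda iv: iv[1]))  # [(1, 2), (3, 4)]

# The shortest interval straddles two longer, mutually compatible ones.
short_middle = [(0, 5), (4, 7), (6, 11)]
print(greedy_by(short_middle, key=lambda iv: iv[1] - iv[0]))  # [(4, 7)]
print(greedy_by(short_middle, key=lambda iv: iv[1]))          # [(0, 5), (6, 11)]
```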

Greedy polynomial solution

The following greedy algorithm does find the optimal solution:

  1. Select the interval, x, with the earliest finishing time.
  2. Remove x, and all intervals intersecting x, from the set of candidate intervals.
  3. Repeat until the set of candidate intervals is empty.

Whenever an interval x is selected in step 1, many intervals may have to be removed in step 2. However, all of the removed intervals necessarily cross the finishing time of x, and thus they all cross each other (see figure). Hence at most one of them can belong to any optimal solution. Each interval of an optimal solution can therefore be charged to a distinct interval of the greedy solution, so the greedy solution is at least as large as an optimal one. This proves that the greedy algorithm indeed finds an optimal solution.

A more formal explanation is given by a charging argument.

The greedy algorithm can be executed in time O(n log n), where n is the number of tasks, using a preprocessing step in which the tasks are sorted by their finishing times.
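A minimal Python sketch of the greedy algorithm with the sorting step is given below. It assumes half-open (start, end) intervals, and the function name is hypothetical. After sorting by finishing time, a single pass suffices: an interval is still a candidate exactly when it starts no earlier than the finishing time of the last selected interval.

```python
def max_compatible(intervals):
    """Return a maximum-size list of pairwise non-overlapping intervals."""
    selected = []
    last_end = float("-inf")
    # Sorting by finishing time dominates the running time: O(n log n).
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Intervals starting before last_end intersect the last selected
        # interval and are skipped, which corresponds to step 2.
        if start >= last_end:
            selected.append((start, end))
            last_end = end
    return selected

# Tasks A, B and C from the introduction, in hours.
print(max_compatible([(2, 5), (4, 10), (9, 11)]))  # [(2, 5), (9, 11)]
```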

Group Interval Scheduling Decision

NP-complete when some groups contain 3 or more intervals

GISDPk is NP-complete when [math]\displaystyle{ k\geq 3 }[/math],[2] even when all intervals have the same length.[3] This can be shown by a reduction from the following version of the Boolean satisfiability problem:

Let [math]\displaystyle{ X = \{x_1, x_2,..., x_p\} }[/math] be a set of Boolean variables. Let [math]\displaystyle{ C = \{c_1, c_2,..., c_q\} }[/math] be a set of clauses over X such that (1) each clause in C has at most three literals and (2) each variable is restricted to appear once or twice positively and once negatively overall in C. Decide whether there is an assignment to the variables of X such that each clause in C has at least one true literal.

This version was shown to be NP-complete,[4] just like the unrestricted version.

Given an instance of this satisfiability problem, construct the following instance of GISDP. All intervals have a length of 3, so it is sufficient to represent each interval by its starting time:

  • For every variable [math]\displaystyle{ x_i }[/math] (for i=1,...,p), create a group with two intervals: one starting at [math]\displaystyle{ 50i-10 }[/math] (representing the assignment [math]\displaystyle{ x_i=false }[/math]) and another starting at [math]\displaystyle{ 50i+10 }[/math] (representing the assignment [math]\displaystyle{ x_i=true }[/math]).
  • For every clause [math]\displaystyle{ c_j }[/math] (for j=1,...,q), create a group with one interval for each literal of [math]\displaystyle{ c_j }[/math]:
    • If the literal is the first positive occurrence of a variable [math]\displaystyle{ x_i }[/math] in C: an interval starting at [math]\displaystyle{ 50i-12 }[/math].
    • If the literal is the second positive occurrence of a variable [math]\displaystyle{ x_i }[/math] in C: an interval starting at [math]\displaystyle{ 50i-8 }[/math]. Note that both these intervals intersect the interval starting at [math]\displaystyle{ 50i-10 }[/math], associated with [math]\displaystyle{ x_i=false }[/math].
    • If the literal is the negative occurrence of a variable [math]\displaystyle{ x_i }[/math]: an interval starting at [math]\displaystyle{ 50i+8 }[/math]. This interval intersects the interval starting at [math]\displaystyle{ 50i+10 }[/math], associated with [math]\displaystyle{ x_i=true }[/math].

Note that there is no overlap between intervals in groups associated with different clauses. This is ensured since a variable appears at most twice positively and once negatively.

The constructed GISDP instance has a feasible solution (i.e. a schedule in which each group is represented) if and only if the given set of Boolean clauses has a satisfying assignment. Hence GISDP3 is NP-complete, and so is GISDPk for every [math]\displaystyle{ k\geq 3 }[/math].
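The construction can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the cited proofs: clauses are encoded as lists of signed integers (+i for a positive occurrence of x_i, -i for a negative one), the input is assumed to obey the occurrence restriction without being checked, and the function names are hypothetical. The exhaustive feasibility check is exponential and only serves to confirm the equivalence on tiny formulas.

```python
from itertools import product

def build_gisdp_instance(num_vars, clauses):
    """Return a list of groups; each group is a list of interval start times.
    Every interval has length 3, so the start time identifies it."""
    groups = []
    for i in range(1, num_vars + 1):
        # Variable group: 50i-10 encodes x_i = false, 50i+10 encodes x_i = true.
        groups.append([50 * i - 10, 50 * i + 10])
    positive_seen = {}  # variable index -> positive occurrences seen so far
    for clause in clauses:
        group = []
        for literal in clause:
            i = abs(literal)
            if literal > 0:
                seen = positive_seen.get(i, 0)
                # First positive occurrence at 50i-12, second at 50i-8.
                group.append(50 * i - 12 if seen == 0 else 50 * i - 8)
                positive_seen[i] = seen + 1
            else:
                group.append(50 * i + 8)
        groups.append(group)
    return groups

def feasible(groups, length=3):
    """Brute force: try every choice of one interval per group."""
    for choice in product(*groups):
        starts = sorted(choice)
        # Equal-length intervals are disjoint iff consecutive starts are far enough apart.
        if all(b - a >= length for a, b in zip(starts, starts[1:])):
            return True
    return False

# (x1 or x2) and (not x1 or not x2) is satisfiable.
print(feasible(build_gisdp_instance(2, [[1, 2], [-1, -2]])))  # True
# (x1) and (not x1) is not.
print(feasible(build_gisdp_instance(1, [[1], [-1]])))         # False
```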

Polynomial when all groups contain at most 2 intervals

GISDP2 can be solved in polynomial time by the following reduction to the 2-satisfiability problem:[3]

  • For every group i create two variables, representing its two intervals: [math]\displaystyle{ x_i }[/math] and [math]\displaystyle{ y_i }[/math].
  • For every group i, create the clauses: [math]\displaystyle{ x_i \lor y_i }[/math] and [math]\displaystyle{ \neg{x_i} \lor \neg{y_i} }[/math], which together represent the assertion that exactly one of these two intervals should be selected.
  • For every two intersecting intervals (e.g. [math]\displaystyle{ x_i }[/math] and [math]\displaystyle{ y_j }[/math]) create the clause: [math]\displaystyle{ \neg{x_i} \lor \neg{y_j} }[/math], which represents the assertion that at most one of these two intervals should be selected.

This construction contains [math]\displaystyle{ O(n^2) }[/math] clauses, where n is the number of intervals (one clause for each pair of intersecting intervals, plus two for each group). Each clause contains 2 literals. The satisfiability of such formulas can be decided in time linear in the number of clauses (see 2-SAT). Therefore, GISDP2 can be solved in polynomial time.
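The reduction can be sketched in Python as below. The sketch makes several assumptions of its own: intervals are half-open (start, end) pairs, a group with a single interval is handled by the unit clause "this interval is selected", all names are hypothetical, and satisfiability is decided with a standard strongly-connected-components test on the implication graph rather than any specific procedure from the cited paper.

```python
from itertools import combinations

def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

def gisdp2_clauses(groups):
    """One Boolean variable per interval; a literal is (variable, wanted value)."""
    flat, group_of = [], []
    for g, group in enumerate(groups):
        for interval in group:
            flat.append(interval)
            group_of.append(g)
    clauses = []
    for g in range(len(groups)):
        members = [v for v in range(len(flat)) if group_of[v] == g]
        # At least one interval of the group is selected.
        clauses.append(((members[0], True), (members[-1], True)))
    for u, v in combinations(range(len(flat)), 2):
        # At most one of two intervals that share a group or intersect.
        if group_of[u] == group_of[v] or overlaps(flat[u], flat[v]):
            clauses.append(((u, False), (v, False)))
    return len(flat), clauses

def two_sat(num_vars, clauses):
    """Satisfiable iff no variable shares a strongly connected component
    with its negation in the implication graph (Kosaraju's algorithm)."""
    def node(var, value):
        return 2 * var + (0 if value else 1)
    n = 2 * num_vars
    graph = [[] for _ in range(n)]
    reverse = [[] for _ in range(n)]
    for (a, a_val), (b, b_val) in clauses:
        # (A or B) yields the implications not A -> B and not B -> A.
        for src, dst in ((node(a, not a_val), node(b, b_val)),
                         (node(b, not b_val), node(a, a_val))):
            graph[src].append(dst)
            reverse[dst].append(src)
    order, seen = [], [False] * n
    for s in range(n):  # first pass: record finishing order
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(graph[s]))]
        while stack:
            u, successors = stack[-1]
            for w in successors:
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, iter(graph[w])))
                    break
            else:
                order.append(u)
                stack.pop()
    component = [-1] * n
    label = 0
    for s in reversed(order):  # second pass: label components on the reverse graph
        if component[s] != -1:
            continue
        component[s] = label
        stack = [s]
        while stack:
            u = stack.pop()
            for w in reverse[u]:
                if component[w] == -1:
                    component[w] = label
                    stack.append(w)
        label += 1
    return all(component[2 * v] != component[2 * v + 1] for v in range(num_vars))

def gisdp2_feasible(groups):
    return two_sat(*gisdp2_clauses(groups))

print(gisdp2_feasible([[(0, 2), (4, 6)], [(1, 3)]]))  # True
print(gisdp2_feasible([[(0, 3)], [(1, 4)]]))          # False
```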

Group Interval Scheduling Maximization

MaxSNP-complete when some groups contain 2 or more intervals

GISMPk is NP-complete whenever [math]\displaystyle{ k\geq 2 }[/math].[5]

Moreover, GISMPk is MaxSNP-complete, which implies that it does not have a PTAS unless P=NP. This can be proved by showing an approximation-preserving reduction from MAX 3-SAT-3 to GISMP2.[5]

Polynomial 2-approximation

The following greedy algorithm finds a solution that contains at least 1/2 of the optimal number of intervals:[5]

  1. Select the interval, x, with the earliest finishing time.
  2. Remove x, and all intervals intersecting x, and all intervals in the same group of x, from the set of candidate intervals.
  3. Continue until the set of candidate intervals is empty.

A formal explanation is given by a charging argument.

The approximation factor of 2 is tight. For example, in the following instance of GISMP2:

  • Group #1: {[0..2], [4..6]}
  • Group #2: {[1..3]}

The greedy algorithm selects only one interval, [0..2] from group #1: this choice removes [1..3], which intersects it, and [4..6], which belongs to the same group. An optimal schedule instead selects [1..3] from group #2 and then [4..6] from group #1.
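The greedy procedure and the tight instance can be checked with a short Python sketch. The names are hypothetical and intervals are assumed to be half-open (start, end) pairs. The exhaustive `optimum_gismp` is exponential and is included only to confirm the optimum on this tiny instance.

```python
from itertools import product

def greedy_gismp(groups):
    """Earliest-finishing-time greedy; returns a list of (group index, interval)."""
    candidates = sorted(
        ((interval, g) for g, group in enumerate(groups) for interval in group),
        key=lambda item: item[0][1],
    )
    chosen, used_groups = [], set()
    last_end = float("-inf")
    for (start, end), g in candidates:
        # Skip intervals that intersect a chosen interval or whose group is taken.
        if g not in used_groups and start >= last_end:
            chosen.append((g, (start, end)))
            used_groups.add(g)
            last_end = end
    return chosen

def optimum_gismp(groups):
    """Brute force over every choice of at most one interval per group."""
    best = 0
    for choice in product(*[[None] + list(group) for group in groups]):
        picked = sorted(iv for iv in choice if iv is not None)
        if all(a[1] <= b[0] for a, b in zip(picked, picked[1:])):
            best = max(best, len(picked))
    return best

groups = [[(0, 2), (4, 6)], [(1, 3)]]  # group #1 and group #2 from the text
print(greedy_gismp(groups))   # [(0, (0, 2))]
print(optimum_gismp(groups))  # 2
```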

LP based approximation algorithms

Using the technique of linear programming relaxation, it is possible to approximate the optimal schedule with slightly better approximation factors. The approximation ratio of the first such algorithm is asymptotically 2 when k is large, but when k=2 the algorithm achieves an approximation ratio of 5/3.[5] The approximation factor for arbitrary k was later improved to 1.582.[6]

Graph representations

An interval scheduling problem can be described by an intersection graph, where each vertex is an interval, and there is an edge between two vertices if and only if their intervals overlap. In this representation, the interval scheduling problem is equivalent to finding a maximum independent set in this intersection graph. In general graphs, finding a maximum independent set is NP-hard. It is therefore notable that in interval intersection graphs it can be done exactly in polynomial time.
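The correspondence can be illustrated on tasks A, B and C from the introduction. The following Python sketch uses hypothetical names and assumes half-open (start, end) intervals; the independent-set search is deliberately exhaustive (exponential time), to show the general-graph formulation rather than an efficient method.

```python
from itertools import combinations

def intersection_graph(intervals):
    """intervals: dict name -> (start, end). Returns the set of edges,
    one frozenset {u, v} for every pair of overlapping intervals."""
    return {
        frozenset((u, v))
        for u, v in combinations(intervals, 2)
        if intervals[u][0] < intervals[v][1] and intervals[v][0] < intervals[u][1]
    }

def max_independent_set(vertices, edges):
    """Exhaustive search from the largest subsets down; tiny graphs only."""
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if not any(frozenset(pair) in edges for pair in combinations(subset, 2)):
                return set(subset)
    return set()

tasks = {"A": (2, 5), "B": (4, 10), "C": (9, 11)}
edges = intersection_graph(tasks)
print(edges == {frozenset("AB"), frozenset("BC")})      # True
print(max_independent_set(list(tasks), edges) == {"A", "C"})  # True
```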

A group-interval scheduling problem, i.e. GISMPk, can be described by a similar interval-intersection graph, with additional edges between every two intervals of the same group. In other words, the graph is the edge union of an interval graph and a graph consisting of disjoint cliques of size at most k, one clique per group.

Variations

An important class of scheduling algorithms is the class of dynamic priority algorithms. When none of the intervals overlap, the optimum solution is trivial. The optimum for the non-weighted version can be found with earliest deadline first scheduling. Weighted interval scheduling is a generalization in which a value is assigned to each executed task and the goal is to maximize the total value. The solution need not be unique.
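The text does not describe an algorithm for the weighted version. One standard approach, sketched here as an assumption rather than as the method of any cited source, is dynamic programming over the tasks sorted by finishing time. Tasks are assumed to be (start, end, value) triples with half-open intervals, and the function name is hypothetical.

```python
from bisect import bisect_right

def max_weight_schedule(tasks):
    """Return the maximum total value of a compatible subset of tasks."""
    tasks = sorted(tasks, key=lambda t: t[1])
    ends = [t[1] for t in tasks]
    best = [0] * (len(tasks) + 1)  # best[i]: optimum using only the first i tasks
    for i, (start, end, value) in enumerate(tasks, 1):
        # p = number of earlier tasks that finish no later than this one starts.
        p = bisect_right(ends, start, 0, i - 1)
        # Either skip task i, or take it together with the optimum of the first p tasks.
        best[i] = max(best[i - 1], best[p] + value)
    return best[-1]

# Tasks A, B, C from the introduction with made-up values 3, 10 and 4.
print(max_weight_schedule([(2, 5, 3), (4, 10, 10), (9, 11, 4)]))  # 10
```

With these illustrative values the best schedule runs only task B, whereas the unweighted optimum is {A, C}.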

The interval scheduling problem is 1-dimensional – only the time dimension is relevant. The Maximum disjoint set problem is a generalization to 2 or more dimensions. This generalization, too, is NP-complete.

Another variation is resource allocation, in which all the intervals must be scheduled, overlapping intervals cannot share a resource, and the objective is to minimize the number k of resources used.
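A common greedy approach to this variation, given here as a sketch under assumptions and not taken from the cited sources, processes the intervals by starting time and reuses whichever resource becomes free first. Intervals are assumed to be half-open (start, end) pairs and the function name is hypothetical.

```python
import heapq

def min_resources(intervals):
    """Smallest number of resources such that overlapping intervals never share one."""
    end_times = []  # min-heap holding the finishing time of each resource in use
    for start, end in sorted(intervals):
        if end_times and end_times[0] <= start:
            # The resource that frees up first is available: reuse it.
            heapq.heapreplace(end_times, end)
        else:
            # Every resource is busy at `start`: open a new one.
            heapq.heappush(end_times, end)
    return len(end_times)

# Tasks A, B and C from the introduction need two resources.
print(min_resources([(2, 5), (4, 10), (9, 11)]))  # 2
```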

In another variation there are m processors instead of a single processor, i.e. up to m different tasks can run in parallel.[2]

Sources

  1. Kleinberg, Jon; Tardos, Éva (2006). Algorithm Design. ISBN 978-0-321-29535-4. https://archive.org/details/algorithmdesign0000klei. 
  2. Nakajima, K.; Hakimi, S. L. (1982). "Complexity results for scheduling tasks with discrete starting times". Journal of Algorithms 3 (4): 344. doi:10.1016/0196-6774(82)90030-X. 
  3. Keil, J. Mark (1992). "On the complexity of scheduling tasks with discrete starting times". Operations Research Letters 12 (5): 293–295. doi:10.1016/0167-6377(92)90087-j. 
  4. Papadimitriou, Christos H.; Steiglitz, Kenneth (July 1998). Combinatorial Optimization : Algorithms and Complexity. Dover. ISBN 978-0-486-40258-1. 
  5. Spieksma, F. C. R. (1999). "On the approximability of an interval scheduling problem". Journal of Scheduling 2 (5): 215–227. doi:10.1002/(sici)1099-1425(199909/10)2:5<215::aid-jos27>3.0.co;2-y. Citing Kolen in personal communication.
  6. Chuzhoy, J.; Ostrovsky, R.; Rabani, Y. (2006). "Approximation Algorithms for the Job Interval Selection Problem and Related Scheduling Problems". Mathematics of Operations Research 31 (4): 730. doi:10.1287/moor.1060.0218.