Separation oracle

From HandWiki
Short description: Concept in the mathematical theory of convex optimization

A separation oracle (also called a cutting-plane oracle) is a concept in the mathematical theory of convex optimization. It is a method to describe a convex set that is given as an input to an optimization algorithm. Separation oracles are used as input to ellipsoid methods.[1]:87,96,98

Definition

Let K be a convex and compact set in Rn. A strong separation oracle for K is an oracle (black box) that, given a vector y in Rn, returns one of the following:[1]:48

  • Assert that y is in K.
  • Find a hyperplane that separates y from K: a vector a in Rn, such that [math]\displaystyle{ a\cdot y \gt a\cdot x }[/math] for all x in K.

A strong separation oracle is completely accurate, and thus may be hard to construct. For practical reasons, a weaker version is considered, which allows for small errors in the boundary of K and the inequalities. Given a small error tolerance d>0, we say that:

  • A vector y is d-near K if its Euclidean distance from K is at most d;
  • A vector y is d-deep in K if it is in K, and its Euclidean distance from any point in outside K is at least d.

The weak version also considers rational numbers, which have a representation of finite length, rather than arbitrary real numbers. A weak separation oracle for K is an oracle that, given a vector y in Qn and a rational number d>0, returns one of the following::[1]:51

  • Assert that y is d-near K;
  • Find a vector a in Qn, normalized such that its maximum element is 1, such that [math]\displaystyle{ a\cdot y +d\geq a\cdot x }[/math] for all x that are d-deep in K.

Implementation

A special case of a convex set is a set represented by linear inequalities: [math]\displaystyle{ K = \{x | Ax \leq b \} }[/math]. Such a set is called a convex polytope. A strong separation oracle for a convex polytope can be implemented, but its run-time depends on the input format.

Representation by inequalities

If the matrix A and the vector b are given as input, so that [math]\displaystyle{ K = \{x | Ax \leq b \} }[/math], then a strong separation oracle can be implemented as follows.[2] Given a point y, compute [math]\displaystyle{ Ay }[/math]:

  • If the outcome is at most [math]\displaystyle{ b }[/math], then y is in K by definition;
  • Otherwise, there is at least one row [math]\displaystyle{ c }[/math] of A, such that [math]\displaystyle{ c\cdot y }[/math] is larger than the corresponding value in [math]\displaystyle{ b }[/math]; this row [math]\displaystyle{ c }[/math] gives us the separating hyperplane, as [math]\displaystyle{ c\cdot y \gt b \geq c\cdot x }[/math] for all x in K.

This oracle runs in polynomial time as long as the number of constraints is polynomial.

Representation by vertices

Suppose the set of vertices of K is given as an input, so that [math]\displaystyle{ K = \text{conv}(v_1,\ldots,v_k) = }[/math] the convex hull of its vertices. Then, deciding whether y is in K requires to check whether y is a convex combination of the input vectors, that is, whether there exist coefficients z1,...,zk such that: [1]:49

  • [math]\displaystyle{ z_1\cdot v_1 + \cdots + z_k\cdot v_k = y }[/math];
  • [math]\displaystyle{ 0 \leq z_i\leq 1 }[/math] for all i in 1,...,k.

This is a linear program with k variables and n equality constraints (one for each element of y). If y is not in K, then the above program has no solution, and the separation oracle needs to find a vector c such that

  • [math]\displaystyle{ c\cdot y \gt c\cdot v_i }[/math] for all i in 1,...,k.

Note that the two above representations can be very different in size: it is possible that a polytope can be represented by a small number of inequalities, but has exponentially many vertices (for example, an n-dimensional cube). Conversely, it is possible that a polytope has a small number of vertices, but requires exponentially many inequalities (for example, the convex hull of the 2n vectors of the form (0,...,±1,...,0).

Problem-specific representation

In some linear optimization problems, even though the number of constraints is exponential, one can still write a custom separation oracle that works in polynomial time. Some examples are:

  • The minimum-cost arborescence problem: given a weighted directed graph and a vertex r in it, find a subgraph of minimum cost that contains a directed path from r to any other vertex. The problem can be presented as an LP with a constraint for each subset of vertices, which is an exponential number of constraints. However, a separation oracle can be implemented using n-1 applications of the minimum cut procedure.[3]
  • The maximum independent set problem. It can be approximated by an LP with a constraint for every odd-length cycle. While there are exponentially-many such cycles, a separation oracle that works in polynomial time can be implemented by just finding an odd cycle of minimum length, which can be done in polynomial time.[3]
  • The dual of the configuration linear program for the bin packing problem. It can be approximated by an LP with a constraint for each feasible configuration. While there are exponentially-many such cycles, a separation oracle that works in pseudopolynomial time can be implemented by solving a knapsack problem. This is used by the Karmarkar-Karp bin packing algorithms.

Non-linear sets

Let f be a convex function on Rn. The set [math]\displaystyle{ K = \{(x, t) | f(x)\leq t \} }[/math] is a convex set in Rn+1. Given an evaluation oracle for f (a black box that returns the value of f for every given point), one can easily check whether a vector (y, t) is in K. In order to get a separation oracle, we need also an oracle to evaluate the subgradient of f.[1]:49 Suppose some vector (y, s) is not in K, so f(y) > s. Let g be the subgradient of f at y (g is a vector in Rn). Denote [math]\displaystyle{ c := (g, -1) }[/math].Then, [math]\displaystyle{ c\cdot (y,s) = g\cdot y - s \gt g\cdot y - f(y) }[/math], and for all (x, t) in K: [math]\displaystyle{ c\cdot (x,t) = g\cdot x - t \leq g\cdot x - f(x) }[/math]. By definition of a subgradient: [math]\displaystyle{ f(x)\geq f(y) + g\cdot (x-y) }[/math] for all x in Rn. Therefore, [math]\displaystyle{ g\cdot y - f(y) \geq g\cdot x-f(x) }[/math], so [math]\displaystyle{ c\cdot(y,s) \gt c\cdot(x,t) }[/math] , and c represents a separating hyperplane.

Usage

A strong separation oracle can be given as an input to the ellipsoid method for solving a linear program. Consider the linear program [math]\displaystyle{ \text{maximize}~~ c\cdot x ~~\text{subject to}~~ Ax \leq b, x\geq 0 }[/math]. The ellipsoid method maintains an ellipsoid that initially contains the entire feasible domain [math]\displaystyle{ A x \leq b }[/math]. At each iteration t, it takes the center [math]\displaystyle{ x_t }[/math] of the current ellipsoid, and sends it to the separation oracle:

  • If the oracle says that [math]\displaystyle{ x_t }[/math] is feasible (that is, contained in the set [math]\displaystyle{ Ax \leq b }[/math]), then we do an "optimality cut" at [math]\displaystyle{ x_t }[/math]: we cut from the ellipsoid all points x for which [math]\displaystyle{ c \cdot x \lt c \cdot x_t }[/math]. These points are definitely not optimal.
  • If the oracle says that [math]\displaystyle{ x_t }[/math] is infeasible, then it typically returns a specific constraint that is violated by [math]\displaystyle{ x_t }[/math], that is, a row [math]\displaystyle{ a_j }[/math] in the matrix A, such that [math]\displaystyle{ a_j\cdot x_t \gt b_j }[/math]. Since [math]\displaystyle{ a_j \cdot x \leq b_j }[/math] for all feasible x, this implies that [math]\displaystyle{ a_j\cdot x_t \gt a_j\cdot x }[/math] for all feasible x. Then, we do a "feasibility cut" at [math]\displaystyle{ x_t }[/math]: we cut from the ellipsoid all points y for which [math]\displaystyle{ a_j\cdot y \gt a_j\cdot x_t }[/math]. These points are definitely not feasible.

After making a cut, we construct a new, smaller ellipsoid, that contains the remaining region. It can be shown that this process converges to an approximate solution, in time polynomial in the required accuracy.

Converting a weak oracle to a strong oracle

Given a weak separation oracle for a polyhedron, it is possible to construct a strong separation oracle by a careful method of rounding, or by diophantine approximations.[1]:159

See also

References