MRF optimization via dual decomposition

From HandWiki

In dual decomposition a complex problem is broken into smaller solvable subproblems, and a solution to the relaxed problem is assembled from their solutions. This method can be employed for MRF optimization.[1] Dual decomposition has also been applied as an inference technique for Markov logic programs.[2]

Background

Discrete MRF optimization (inference) is an important problem in machine learning and computer vision, and has been implemented efficiently on, for example, CUDA graphics processing units.[3] Consider a graph [math]\displaystyle{ G = (V,E) }[/math] with nodes [math]\displaystyle{ V }[/math] and edges [math]\displaystyle{ E }[/math]. The goal is to assign a label [math]\displaystyle{ l_p }[/math] to each node [math]\displaystyle{ p \in V }[/math] so that the MRF energy is minimized:

(1) [math]\displaystyle{ \min \sum_{p\in V} \theta_p(l_p) + \sum_{pq\in \varepsilon} \theta_{pq}(l_p, l_q) }[/math]

Major MRF optimization methods are based on graph cuts or message passing. They rely on the following integer linear programming formulation:

(2) [math]\displaystyle{ \min_x E(\theta, x)= \theta \cdot x = \sum_{p \in V} \theta_p \cdot x_p+\sum_{pq \in \varepsilon} \theta_{pq} \cdot x_{pq} }[/math]

In many applications, the MRF variables are {0,1}-variables that satisfy: [math]\displaystyle{ x_p(l)=1 }[/math] [math]\displaystyle{ \Leftrightarrow }[/math] label [math]\displaystyle{ l }[/math] is assigned to [math]\displaystyle{ p }[/math], and [math]\displaystyle{ x_{pq}(l,l^\prime) =1 }[/math] [math]\displaystyle{ \Leftrightarrow }[/math] labels [math]\displaystyle{ l,l^\prime }[/math] are assigned to [math]\displaystyle{ p,q }[/math].
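To make the connection between (1) and (2) concrete, the following sketch evaluates the energy of one labeling both ways: directly via (1), and through the {0,1} indicator variables of (2). The chain topology and random integer potentials are illustrative assumptions, not from the source:

```python
import numpy as np

# Toy MRF: 3 nodes on a chain (edges 0-1 and 1-2), 2 labels per node.
# theta_p[p][l]        : unary cost of assigning label l to node p
# theta_pq[(p,q)][l,l'] : pairwise cost of labels (l, l') on edge (p, q)
rng = np.random.default_rng(0)
n_nodes, n_labels = 3, 2
edges = [(0, 1), (1, 2)]
theta_p = rng.integers(0, 5, size=(n_nodes, n_labels)).astype(float)
theta_pq = {e: rng.integers(0, 5, size=(n_labels, n_labels)).astype(float)
            for e in edges}

labeling = [1, 0, 1]

# Energy (1): evaluate the potentials at the chosen labels directly
e1 = sum(theta_p[p][labeling[p]] for p in range(n_nodes)) \
   + sum(theta_pq[(p, q)][labeling[p], labeling[q]] for p, q in edges)

# Energy (2): build the {0,1} indicators x_p(l) and x_pq(l, l'),
# then take the inner product theta . x
x_p = np.zeros((n_nodes, n_labels))
for p in range(n_nodes):
    x_p[p, labeling[p]] = 1.0
x_pq = {e: np.outer(x_p[e[0]], x_p[e[1]]) for e in edges}

e2 = sum((theta_p[p] * x_p[p]).sum() for p in range(n_nodes)) \
   + sum((theta_pq[e] * x_pq[e]).sum() for e in edges)

print(e1, e2)  # identical by construction
```

The indicator [math]\displaystyle{ x_{pq} }[/math] is simply the outer product of the two node indicators, so both formulas pick out exactly the same potential entries.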

Dual Decomposition

The main idea behind decomposition is surprisingly simple:

  1. decompose your original complex problem into smaller solvable subproblems,
  2. extract a solution by cleverly combining the solutions from these subproblems.

A sample problem to decompose:

[math]\displaystyle{ \min_x \sum_{i} f^i(x) }[/math] where [math]\displaystyle{ x \in C }[/math]

In this problem, separately minimizing every single [math]\displaystyle{ f^i(x) }[/math] over [math]\displaystyle{ x }[/math] is easy, but minimizing their sum is hard. The problem is therefore decomposed using auxiliary variables [math]\displaystyle{ \{x^i\} }[/math] (one copy of [math]\displaystyle{ x }[/math] per term), giving the equivalent problem:

[math]\displaystyle{ \min_{\{x^i\},x} \sum_i f^i(x^i) }[/math] where [math]\displaystyle{ x^i \in C, x^i=x }[/math]

Now we can relax the coupling constraints [math]\displaystyle{ x^i = x }[/math] with multipliers [math]\displaystyle{ \{\lambda^i\} }[/math], which gives us the following Lagrangian dual function:

[math]\displaystyle{ g(\{\lambda^i\}) =\min_{\{x^i \in C\},x} \sum_i f^i(x^i) + \sum_i \lambda^i \cdot (x^i-x)=\min_{\{x^i \in C\},x} \sum_i [f^i(x^i)+\lambda^i \cdot x^i]-\left(\sum_i \lambda^i\right) \cdot x }[/math]

Minimizing over the unconstrained [math]\displaystyle{ x }[/math] yields [math]\displaystyle{ -\infty }[/math] unless [math]\displaystyle{ \sum_i \lambda^i = 0 }[/math], so we restrict the multipliers to the set [math]\displaystyle{ \Lambda = \{\{\lambda^i\} : \sum_i \lambda^i = 0\} }[/math]; the term in [math]\displaystyle{ x }[/math] then vanishes and the dual function becomes:

[math]\displaystyle{ g(\{\lambda^i\})=\min_{\{x^i \in C\}}\sum_i[f^i(x^i) + \lambda^i \cdot x^i] }[/math]

We can now set up the Lagrangian dual problem:

(3) [math]\displaystyle{ \max_{\{\lambda^i\}\in \Lambda} g(\{\lambda^i\})=\sum_i g^i(\lambda^i), }[/math] The Master problem

(4) [math]\displaystyle{ g^i(\lambda^i)=\min_{x^i} f^i(x^i) +\lambda^i \cdot x^i }[/math] where [math]\displaystyle{ x^i \in C }[/math] The Slave problems
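The master problem (3) can be maximized by projected subgradient ascent: at each iteration every slave (4) is solved independently, and the multipliers are moved along the subgradient while staying in [math]\displaystyle{ \Lambda }[/math]. A minimal sketch in Python; the functions [math]\displaystyle{ f^i }[/math], the finite set [math]\displaystyle{ C }[/math], and the step-size rule are illustrative assumptions, not from the source:

```python
import numpy as np

# Toy instance of (3)-(4): minimize f1(x) + f2(x) over the finite set C,
# where each slave minimum is trivial but we pretend the sum is hard.
C = np.arange(5, dtype=float)           # C = {0, 1, 2, 3, 4}
f = [lambda x: (x - 0.0) ** 2,          # f^1, minimized alone at x = 0
     lambda x: (x - 4.0) ** 2]          # f^2, minimized alone at x = 4
                                        # joint optimum of the sum: x = 2

lam = np.zeros(2)                       # multipliers, kept in Lambda: sum = 0
for t in range(200):
    # Slave problems (4): min_{x^i in C} f^i(x^i) + lam^i * x^i
    x_star = np.array([C[np.argmin(f[i](C) + lam[i] * C)] for i in range(2)])
    # Projected subgradient step on the master (3); subtracting the mean
    # projects the update onto Lambda, so sum(lam) stays zero.
    alpha = 1.0 / (t + 1)
    lam += alpha * (x_star - x_star.mean())

print(x_star)  # slaves agree on the consensus solution x = 2
```

With the multipliers initialized to zero, the two slaves first disagree (returning 0 and 4); as the multipliers are adjusted they settle on the consensus minimizer [math]\displaystyle{ x = 2 }[/math] of the sum.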

MRF optimization via Dual Decomposition

The original MRF optimization problem (2) is NP-hard in general, so we need to transform it into something easier.

Let [math]\displaystyle{ \tau }[/math] be a set of sub-trees of the graph [math]\displaystyle{ G }[/math] whose trees together cover all nodes and edges of the main graph. An MRF is defined for every tree [math]\displaystyle{ T }[/math] in [math]\displaystyle{ \tau }[/math], and these MRFs are smaller: the vector of MRF parameters is [math]\displaystyle{ \theta^T }[/math] and the vector of MRF variables is [math]\displaystyle{ x^T }[/math], both smaller in comparison with the original MRF vectors [math]\displaystyle{ \theta, x }[/math]. For all vectors [math]\displaystyle{ \theta^T }[/math] we require the following:

(5) [math]\displaystyle{ \sum_{T \in \tau(p)} \theta_p^T= \theta_p, \sum_{T \in \tau(pq)} \theta_{pq}^T= \theta_{pq}. }[/math]

where [math]\displaystyle{ \tau(p) }[/math] and [math]\displaystyle{ \tau(pq) }[/math] denote all trees of [math]\displaystyle{ \tau }[/math] that contain node [math]\displaystyle{ p }[/math] and edge [math]\displaystyle{ pq }[/math], respectively. We can then simply write:

(6) [math]\displaystyle{ E(\theta,x)=\sum_{T \in \tau} E(\theta^T, x^T) }[/math]
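Condition (5) is easy to satisfy in practice: one common choice is to split each potential evenly among the trees that contain the corresponding node or edge. A small sketch verifying that the per-tree potentials sum back to the originals; the 4-cycle graph, its two covering trees, and the random potentials are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
L = 3                                            # number of labels
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]         # a 4-cycle (contains a loop)
theta_p  = {p: rng.standard_normal(L)      for p in nodes}
theta_pq = {e: rng.standard_normal((L, L)) for e in edges}

# Two trees (chains) that together cover every node and edge of the cycle
trees = {'A': [(0, 1), (1, 2), (2, 3)],
         'B': [(1, 2), (2, 3), (3, 0)]}
tree_nodes = {T: sorted({v for e in es for v in e}) for T, es in trees.items()}

# Even split: theta^T = theta / (number of trees containing the node/edge)
n_p  = {p: sum(p in tree_nodes[T] for T in trees) for p in nodes}
n_pq = {e: sum(e in trees[T]      for T in trees) for e in edges}
theta_p_T  = {(T, p): theta_p[p]  / n_p[p]
              for T in trees for p in tree_nodes[T]}
theta_pq_T = {(T, e): theta_pq[e] / n_pq[e]
              for T in trees for e in trees[T]}

# Condition (5): the per-tree potentials sum back to the original ones
ok_p  = all(np.allclose(sum(theta_p_T[(T, p)] for T in trees
                            if p in tree_nodes[T]), theta_p[p]) for p in nodes)
ok_pq = all(np.allclose(sum(theta_pq_T[(T, e)] for T in trees
                            if e in trees[T]), theta_pq[e]) for e in edges)
print(ok_p and ok_pq)
```

Edge (1,2) belongs to both trees, so its potential is halved in each; edge (0,1) belongs only to tree A and keeps its full potential there.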

And our constraints will be:

(7) [math]\displaystyle{ x^T \in \chi^T, x^T=x_{|T},\forall T \in \tau }[/math]

Our original MRF problem will become:

(8) [math]\displaystyle{ \min_{\{x^T\},x} \sum_{T \in \tau} E(\theta^T, x^T) }[/math] where [math]\displaystyle{ x^T \in \chi^T, \forall T \in \tau }[/math] and [math]\displaystyle{ x^T = x_{|T}, \forall T \in \tau }[/math]

And we'll have the dual problem we were seeking:

(9) [math]\displaystyle{ \max_{\{\lambda^T\} \in \Lambda} g(\{\lambda^T\})= \sum_{T \in \tau} g^T(\lambda^T), }[/math] The Master problem

where each function [math]\displaystyle{ g^T(\cdot) }[/math] is defined as:

(10) [math]\displaystyle{ g^T(\lambda^T)=\min_{x^T} E(\theta^T+ \lambda^T, x^T) }[/math] where [math]\displaystyle{ x^T \in \chi^T }[/math] The Slave problems
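Each slave problem (10) is an MRF on a tree, so it can be solved exactly and efficiently by min-sum dynamic programming (message passing on a tree). A minimal sketch for the special case of a chain, with a brute-force check; the random potentials and problem sizes are illustrative assumptions:

```python
import itertools
import numpy as np

def solve_chain_slave(unary, pairwise):
    """Exact minimizer of a chain MRF energy, as needed for a slave (10).
    unary:    (n, L) array, unary[i, l] = unary cost of label l at node i
    pairwise: list of (L, L) arrays, pairwise[i][l, l'] = cost on edge (i, i+1)
    Returns (optimal labeling, optimal energy)."""
    n, L = unary.shape
    cost = unary[0].copy()          # best cost of any prefix ending at node 0
    back = []                       # backpointers for recovering the argmin
    for i in range(1, n):
        # total[l, l'] = best prefix cost with label l at node i-1,
        # plus the edge cost and the unary cost of label l' at node i
        total = cost[:, None] + pairwise[i - 1] + unary[i][None, :]
        back.append(total.argmin(axis=0))
        cost = total.min(axis=0)
    labels = [int(cost.argmin())]   # best label at the last node
    for bp in reversed(back):       # walk the backpointers to the front
        labels.append(int(bp[labels[-1]]))
    labels.reverse()
    return labels, float(cost.min())

rng = np.random.default_rng(2)
n, L = 5, 3
unary = rng.standard_normal((n, L))
pairwise = [rng.standard_normal((L, L)) for _ in range(n - 1)]
labels, energy = solve_chain_slave(unary, pairwise)

# Brute force over all 3^5 labelings confirms global optimality on the chain
best = min(sum(unary[i, x[i]] for i in range(n)) +
           sum(pairwise[i][x[i], x[i + 1]] for i in range(n - 1))
           for x in itertools.product(range(L), repeat=n))
print(np.isclose(energy, best))
```

The forward pass costs [math]\displaystyle{ O(n L^2) }[/math], which is what makes the per-tree slaves cheap compared with the original loopy MRF.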

Theoretical Properties

Theorem 1. Lagrangian relaxation (9) is equivalent to the LP relaxation of (2).

[math]\displaystyle{ \min_{\{x^T\},x} \{E(\theta, x) \mid x^T=x_{|T}, x^T \in \text{CONVEXHULL}(\chi^T)\} }[/math]

Theorem 2. If the sequence of step sizes [math]\displaystyle{ \{\alpha_t\} }[/math] satisfies [math]\displaystyle{ \alpha_t \geq 0, \lim_{t \to \infty} \alpha_t = 0, \sum_{t=0}^\infty \alpha_t=\infty }[/math] (for example, [math]\displaystyle{ \alpha_t = 1/(t+1) }[/math]), then the subgradient algorithm converges to the optimal solution of (9).

Theorem 3. The distance of the current solution [math]\displaystyle{ \{\theta^T\} }[/math] to the optimal solution [math]\displaystyle{ \{\bar \theta^T\} }[/math] decreases at every iteration.

Theorem 4. Any solution obtained by the method satisfies the WTA (weak tree agreement) condition.

Theorem 5. For binary MRFs with sub-modular energies, the method computes a globally optimal solution.

References

  1. Komodakis, Nikos; Paragios, Nikos; Tziritas, Georgios (2007). "MRF Optimization via Dual Decomposition: Message-Passing Revisited". 2007 IEEE 11th International Conference on Computer Vision (ICCV). http://www.csd.uoc.gr/~komod/publications/docs/ICCV07_Dual_Decomposition.pdf. 
  2. Feng Niu and Ce Zhang and Christopher Re and Jude Shavlik (2012). "Scaling Inference for Markov Logic via Dual Decomposition". 2012 IEEE 12th International Conference on Data Mining. IEEE. doi:10.1109/icdm.2012.96. 
  3. Shervin Rahimzadeh Arashloo and Josef Kittler (2013). "Efficient processing of MRFs for unconstrained-pose face recognition". 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS). IEEE. doi:10.1109/btas.2013.6712721.