MRF optimization via dual decomposition
In dual decomposition, a complex optimization problem is broken into smaller solvable subproblems, and a solution to the relaxed (dual) problem is recovered by combining their solutions. This method can be employed for MRF optimization.[1] Dual decomposition is also applied to Markov logic networks as an inference technique.[2]
Background
Discrete MRF optimization (inference) is very important in machine learning and computer vision, and can be accelerated on CUDA graphics processing units.[3] Consider a graph [math]\displaystyle{ G = (V,E) }[/math] with nodes [math]\displaystyle{ V }[/math] and edges [math]\displaystyle{ E }[/math]. The goal is to assign a label [math]\displaystyle{ l_p }[/math] to each [math]\displaystyle{ p \in V }[/math] so that the MRF energy is minimized:
(1) [math]\displaystyle{ \min \Sigma_{p\in V} \theta_p(l_p) + \Sigma_{pq\in \varepsilon} \theta_{pq}(l_p, l_q) }[/math]
Major MRF optimization methods are based on graph cuts or message passing. They rely on the following integer linear programming formulation:
(2) [math]\displaystyle{ \min_x E(\theta, x)= \theta.x = \sum_{p \in V} \theta_p.x_p+\sum_{pq \in \varepsilon} \theta_{pq}.x_{pq} }[/math]
In many applications, the MRF-variables are {0,1}-variables that satisfy: [math]\displaystyle{ x_p(l)=1 }[/math] [math]\displaystyle{ \Leftrightarrow }[/math] label [math]\displaystyle{ l }[/math] is assigned to [math]\displaystyle{ p }[/math], while [math]\displaystyle{ x_{pq}(l,l^\prime) =1 }[/math] [math]\displaystyle{ \Leftrightarrow }[/math] labels [math]\displaystyle{ l,l^\prime }[/math] are assigned to [math]\displaystyle{ p,q }[/math].
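To make the objective in (1)/(2) concrete, the following toy sketch (illustrative only; the graph, potentials, and names are not from any referenced paper) evaluates the MRF energy for a tiny 3-node chain with 2 labels, and minimizes it by brute force:

```python
# Hypothetical toy MRF: 3 nodes, 2 edges, 2 labels.
from itertools import product

# Unary potentials theta_p(l): one list of label costs per node.
theta_unary = [[0.0, 1.0],
               [2.0, 0.5],
               [1.0, 0.0]]

# Pairwise potentials theta_pq(l, l'): one 2x2 table per edge (Potts-like).
edges = [(0, 1), (1, 2)]
theta_pair = {e: [[0.0, 1.0],
                  [1.0, 0.0]] for e in edges}

def mrf_energy(labels):
    """E = sum_p theta_p(l_p) + sum_pq theta_pq(l_p, l_q), as in (1)."""
    e = sum(theta_unary[p][labels[p]] for p in range(len(labels)))
    e += sum(theta_pair[(p, q)][labels[p]][labels[q]] for p, q in edges)
    return e

# Brute-force minimization over all labelings. This is only feasible for
# tiny MRFs; avoiding this exponential blow-up is the point of the
# decomposition methods below.
best = min(product([0, 1], repeat=3), key=mrf_energy)
```

Here `best` is the labeling `(0, 1, 1)` with energy 1.5, found by checking all 2³ assignments.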
Dual Decomposition
The main idea behind decomposition is surprisingly simple:
- decompose your original complex problem into smaller solvable subproblems,
- extract a solution by cleverly combining the solutions from these subproblems.
A sample problem to decompose:
[math]\displaystyle{ \min_x \Sigma_{i} f^i(x) }[/math] where [math]\displaystyle{ x \in C }[/math]
In this problem, separately minimizing every single [math]\displaystyle{ f^i(x) }[/math] over [math]\displaystyle{ x }[/math] is easy, but minimizing their sum is a complex problem. So the problem is decomposed using auxiliary variables [math]\displaystyle{ \{x^i\} }[/math] (one copy per term), and becomes:
[math]\displaystyle{ \min_{\{x^i\},x} \Sigma_i f^i(x^i) }[/math] where [math]\displaystyle{ x^i \in C, x^i=x }[/math]
Now we can relax the coupling constraints [math]\displaystyle{ x^i=x }[/math] via multipliers [math]\displaystyle{ \{\lambda^i\} }[/math], which gives us the following Lagrangian dual function:
[math]\displaystyle{ g(\{\lambda^i\}) =\min_{\{x^i \in C\},x} \Sigma_i f^i(x^i) + \Sigma_i \lambda^i.(x^i-x)=\min_{\{x^i \in C\},x} \Sigma_i [f^i(x^i)+\lambda^i.x^i]-(\Sigma_i \lambda^i)x }[/math]
Now we eliminate [math]\displaystyle{ x }[/math] from the dual function by minimizing over [math]\displaystyle{ x }[/math]. The term [math]\displaystyle{ -(\Sigma_i \lambda^i)x }[/math] is unbounded below unless [math]\displaystyle{ \Sigma_i \lambda^i = 0 }[/math], so we restrict the multipliers to the set [math]\displaystyle{ \Lambda = \{\{\lambda^i\} : \Sigma_i \lambda^i = 0\} }[/math], and the dual function becomes:
[math]\displaystyle{ g(\{\lambda^i\})=\min_{\{x^i \in C\}}\Sigma_i[f^i(x^i) + \lambda^i.x^i] }[/math]
We can set up a Lagrangian dual problem:
(3) [math]\displaystyle{ \max_{\{\lambda^i\}\in \Lambda} g(\{\lambda^i\})=\Sigma_i g^i(\lambda^i), }[/math] The Master problem
(4) [math]\displaystyle{ g^i(\lambda^i)=\min_{x^i} f^i(x^i) +\lambda^i.x^i }[/math] where [math]\displaystyle{ x^i \in C }[/math], The Slave problems
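The master/slave scheme in (3)-(4) can be sketched on a toy problem (my own illustrative example, not from the cited papers): minimize [math]\displaystyle{ f^1(x)+f^2(x) }[/math] with [math]\displaystyle{ f^i(x)=(x-a_i)^2 }[/math] over [math]\displaystyle{ C=[0,4] }[/math]. Each slave has a closed-form solution, and the master performs projected subgradient ascent keeping the multipliers on the hyperplane [math]\displaystyle{ \Sigma_i \lambda^i = 0 }[/math] (the set [math]\displaystyle{ \Lambda }[/math]):

```python
def slave(a, lam):
    """Slave problem (4): argmin_{x in [0,4]} (x - a)^2 + lam * x.
    Unconstrained argmin is a - lam/2; clip it to C = [0, 4]."""
    return min(max(a - lam / 2.0, 0.0), 4.0)

a = [1.0, 3.0]            # each slave pulls toward its own a_i
lam = [0.0, 0.0]          # multipliers, kept summing to zero
for t in range(1, 1001):
    x = [slave(a[i], lam[i]) for i in range(2)]
    x_bar = sum(x) / len(x)          # consensus estimate of the shared x
    alpha = 1.0 / t                  # diminishing, non-summable step sizes
    # Projected subgradient ascent on the master problem (3):
    # the subgradient w.r.t. lambda^i is x^i - x_bar.
    lam = [lam[i] + alpha * (x[i] - x_bar) for i in range(2)]
```

After the loop both copies agree (to within the step-size tolerance) on the global minimizer [math]\displaystyle{ x=2 }[/math] of [math]\displaystyle{ (x-1)^2+(x-3)^2 }[/math].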
MRF optimization via Dual Decomposition
The original MRF optimization problem is NP-hard and we need to transform it into something easier.
Let [math]\displaystyle{ \tau }[/math] be a set of sub-trees of the graph [math]\displaystyle{ G }[/math] whose trees together cover all nodes and edges of the main graph. For every tree [math]\displaystyle{ T }[/math] in [math]\displaystyle{ \tau }[/math] a smaller MRF is defined, with parameter vector [math]\displaystyle{ \theta^T }[/math] and variable vector [math]\displaystyle{ x^T }[/math]; these are simply smaller than the original MRF vectors [math]\displaystyle{ \theta, x }[/math]. For all vectors [math]\displaystyle{ \theta^T }[/math] we'll have the following:
(5) [math]\displaystyle{ \sum_{T \in \tau(p)} \theta_p^T= \theta_p, \sum_{T \in \tau(pq)} \theta_{pq}^T= \theta_{pq}. }[/math]
Where [math]\displaystyle{ \tau(p) }[/math] and [math]\displaystyle{ \tau(pq) }[/math] denote all trees of [math]\displaystyle{ \tau }[/math] that contain node [math]\displaystyle{ p }[/math] and edge [math]\displaystyle{ pq }[/math] respectively. We can simply write:
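A hypothetical splitting satisfying (5), on a 2x2 grid MRF (my own minimal example): take [math]\displaystyle{ \tau }[/math] to be the grid's rows and columns, here single-edge chains. Each node lies in exactly 2 trees and each edge in exactly 1, so setting [math]\displaystyle{ \theta_p^T = \theta_p/2 }[/math] and assigning each edge potential to its one tree reproduces the original parameters when summed:

```python
# 2x2 grid: 4 nodes, 2 row edges, 2 column edges, 2 labels per node.
nodes = [(0, 0), (0, 1), (1, 0), (1, 1)]
row_edges = [((0, 0), (0, 1)), ((1, 0), (1, 1))]
col_edges = [((0, 0), (1, 0)), ((0, 1), (1, 1))]

theta_p = {p: [1.0, 2.0] for p in nodes}   # illustrative unary potentials

# One tree per row/column; here each is a single-edge chain.
trees = [{"nodes": [a, b], "edge": (a, b)} for a, b in row_edges + col_edges]
for T in trees:
    # Each node appears in one row tree and one column tree -> divide by 2.
    T["theta_p"] = {p: [v / 2 for v in theta_p[p]] for p in T["nodes"]}

# Verify condition (5): the per-tree unaries sum back to the originals.
recovered = {
    p: [sum(T["theta_p"][p][l] for T in trees if p in T["theta_p"])
        for l in range(2)]
    for p in nodes
}
```

On a larger grid the same rule applies with [math]\displaystyle{ \theta_p/|\tau(p)| }[/math], i.e. dividing each unary by the number of trees containing that node.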
(6) [math]\displaystyle{ E(\theta,x)=\sum_{T \in \tau} E(\theta^T, x^T) }[/math]
And our constraints will be:
(7) [math]\displaystyle{ x^T \in \chi^T, x^T=x_{|T},\forall T \in \tau }[/math]
Our original MRF problem will become:
(8) [math]\displaystyle{ \min_{\{x^T\},x} \Sigma_{T \in \tau} E(\theta^T, x^T) }[/math] where [math]\displaystyle{ x^T \in \chi^T, \forall T \in \tau }[/math] and [math]\displaystyle{ x^T = x_{|T}, \forall T \in \tau }[/math]
And we'll have the dual problem we were seeking:
(9) [math]\displaystyle{ \max_{\{\lambda^T\} \in \Lambda} g(\{\lambda^T\})= \sum_{T \in \tau} g^T(\lambda^T), }[/math] The Master problem
where each function [math]\displaystyle{ g^T(\cdot) }[/math] is defined as:
(10) [math]\displaystyle{ g^T(\lambda^T)=\min_{x^T} E(\theta^T+ \lambda^T, x^T) }[/math] where [math]\displaystyle{ x^T \in \chi^T }[/math], The Slave problems
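Each slave problem (10) is an exact energy minimization over a tree, which is tractable. The sketch below (illustrative potentials, not from the paper) does this for the simplest tree, a chain, using min-sum dynamic programming (Viterbi-style):

```python
def chain_min_sum(unary, pairwise):
    """Exact min-sum DP on a chain MRF.
    unary[i][l]: cost of label l at node i.
    pairwise[i][l][m]: cost of labels (l, m) on edge (i, i+1).
    Returns (minimum energy, optimal labeling)."""
    n, L = len(unary), len(unary[0])
    cost = list(unary[0])               # best cost of prefixes ending at node 0
    backptr = []
    for i in range(1, n):
        new_cost, bp = [], []
        for l in range(L):
            cands = [cost[m] + pairwise[i - 1][m][l] for m in range(L)]
            best_m = min(range(L), key=lambda m: cands[m])
            bp.append(best_m)
            new_cost.append(cands[best_m] + unary[i][l])
        cost = new_cost
        backptr.append(bp)
    best = min(range(L), key=lambda l: cost[l])
    labels = [best]
    for bp in reversed(backptr):        # trace the argmins backwards
        labels.append(bp[labels[-1]])
    labels.reverse()
    return cost[best], labels

# Illustrative 3-node chain with 2 labels and Potts-like edge costs.
unary = [[0.0, 2.0], [1.0, 0.0], [0.0, 3.0]]
potts = [[0.0, 1.0], [1.0, 0.0]]
energy, labels = chain_min_sum(unary, [potts, potts])
```

For this instance the optimum is the all-zeros labeling with energy 1.0. For general trees the same min-sum recursion runs leaf-to-root instead of left-to-right.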
Theoretical Properties
Theorem 1. Lagrangian relaxation (9) is equivalent to the LP relaxation of (2), obtained by replacing each feasible set [math]\displaystyle{ \chi^T }[/math] with its convex hull:
[math]\displaystyle{ \min_{\{x^T\},x} \Sigma_{T \in \tau} E(\theta^T, x^T) }[/math] where [math]\displaystyle{ x^T = x_{|T}, x^T \in \text{CONVEXHULL}(\chi^T), \forall T \in \tau }[/math]
Theorem 2. If the sequence of step sizes [math]\displaystyle{ \{\alpha_t\} }[/math] satisfies [math]\displaystyle{ \alpha_t \geq 0, \lim_{t \to \infty} \alpha_t = 0, \sum_{t=0}^\infty \alpha_t=\infty }[/math], then the algorithm converges to the optimal solution of (9).
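A quick numeric illustration (my own, not from the paper) of a schedule meeting the conditions of Theorem 2: [math]\displaystyle{ \alpha_t = 1/(t+1) }[/math] is nonnegative, tends to 0, and its partial sums (the harmonic series) are unbounded:

```python
# Harmonic step-size schedule alpha_t = 1 / (t + 1).
T = 100_000
alphas = [1.0 / (t + 1) for t in range(T)]
partial_sum = sum(alphas)   # ~ ln(T) + 0.577..., grows without bound as T -> inf
```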
Theorem 3. The distance of the current solution [math]\displaystyle{ \{\theta^T\} }[/math] to the optimal solution [math]\displaystyle{ \{\bar \theta^T\} }[/math] decreases at every iteration.
Theorem 4. Any solution obtained by the method satisfies the WTA (weak tree agreement) condition.
Theorem 5. For binary MRFs with sub-modular energies, the method computes a globally optimal solution.
References
- ↑ MRF Optimization via Dual Decomposition. http://www.csd.uoc.gr/~komod/publications/docs/ICCV07_Dual_Decomposition.pdf.
- ↑ Feng Niu and Ce Zhang and Christopher Re and Jude Shavlik (2012). "Scaling Inference for Markov Logic via Dual Decomposition". 2012 IEEE 12th International Conference on Data Mining. IEEE. doi:10.1109/icdm.2012.96.
- ↑ Shervin Rahimzadeh Arashloo and Josef Kittler (2013). "Efficient processing of MRFs for unconstrained-pose face recognition". 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS). IEEE. doi:10.1109/btas.2013.6712721.
Original source: https://en.wikipedia.org/wiki/MRF optimization via dual decomposition.