# MRF optimization via dual decomposition

In dual decomposition a problem is broken into smaller subproblems and a solution to the relaxed problem is found. This method can be employed for MRF optimization. Dual decomposition is applied to markov logic programs as an inference technique.

## Background

Discrete MRF Optimization (inference) is very important in Machine Learning and Computer vision, which is realized on CUDA graphical processing units. Consider a graph $\displaystyle{ G = (V,E) }$with nodes $\displaystyle{ V }$and Edges $\displaystyle{ E }$. The goal is to assign a label $\displaystyle{ l_p }$to each $\displaystyle{ p \in V }$so that the MRF Energy is minimized:

(1) $\displaystyle{ \min \Sigma_{p\in V} \theta_p(l_p) + \Sigma_{pq\in \varepsilon} \theta_{pq}(l_p)(l_q) }$

Major MRF Optimization methods are based on Graph cuts or Message passing. They rely on the following integer linear programming formulation

(2) $\displaystyle{ \min_x E(\theta, x)= \theta.x = \sum_{p \in V} \theta_p.x_p+\sum_{pq \in \varepsilon} \theta_{pq}.x_{pq} }$

In many applications, the MRF-variables are {0,1}-variables that satisfy: $\displaystyle{ x_p(l)=1 }$$\displaystyle{ \Leftrightarrow }$label $\displaystyle{ l }$ is assigned to $\displaystyle{ p }$, while $\displaystyle{ x_{pq}(l,l^\prime) =1 }$ , labels $\displaystyle{ l,l^\prime }$ are assigned to $\displaystyle{ p,q }$.

## Dual Decomposition

The main idea behind decomposition is surprisingly simple:

1. decompose your original complex problem into smaller solvable subproblems,
2. extract a solution by cleverly combining the solutions from these subproblems.

A sample problem to decompose:

$\displaystyle{ \min_x \Sigma_{i} f^i(x) }$where $\displaystyle{ x \in C }$

In this problem, separately minimizing every single $\displaystyle{ f^i(x) }$over $\displaystyle{ x }$ is easy; but minimizing their sum is a complex problem. So the problem needs to get decomposed using auxiliary variables $\displaystyle{ \{x^i\} }$and the problem will be as follows:

$\displaystyle{ \min_{\{x^i\},x} \Sigma_i f^i(x^i) }$where $\displaystyle{ x^i \in C, x^i=x }$

Now we can relax the constraints by multipliers $\displaystyle{ \{\lambda^i\} }$ which gives us the following Lagrangian dual function:

$\displaystyle{ g(\{\lambda^i\}) =\min_{\{x^i \in C\},x} \Sigma_i f^i(x^i) + \Sigma_i \lambda^i.(x^i-x)=\min_{\{x^i \in C\},x} \Sigma_i [f^i(x^i)+\lambda^i.x^i]-(\Sigma_i \lambda^i)x }$

Now we eliminate $\displaystyle{ x }$ from the dual function by minimizing over $\displaystyle{ x }$ and dual function becomes:

$\displaystyle{ g(\{\lambda^i\})=\min_{\{x^i \in C\}}\Sigma_i[f^i(x^i) + \lambda^i.x^i] }$

We can setup a Lagrangian dual problem:

(3) $\displaystyle{ \max_{\{\lambda^i\}\in \Lambda} g({\lambda^i})=\Sigma_i g^i(x^i), }$ The Master problem

(4) $\displaystyle{ g^i(x^i)=min_{x^i} f^i(x^i) +\lambda^i.x^i }$where $\displaystyle{ x^i \in C }$The Slave problems

## MRF optimization via Dual Decomposition

The original MRF optimization problem is NP-hard and we need to transform it into something easier.

$\displaystyle{ \tau }$is a set of sub-trees of graph $\displaystyle{ G }$where its trees cover all nodes and edges of the main graph. And MRFs defined for every tree $\displaystyle{ T }$ in $\displaystyle{ \tau }$will be smaller. The vector of MRF parameters is $\displaystyle{ \theta^T }$and the vector of MRF variables is $\displaystyle{ x^T }$, these two are just smaller in comparison with original MRF vectors $\displaystyle{ \theta, x }$. For all vectors $\displaystyle{ \theta^T }$we'll have the following:

(5) $\displaystyle{ \sum_{T \in \tau(p)} \theta_p^T= \theta_p, \sum_{T \in \tau(pq)} \theta_{pq}^T= \theta_{pq}. }$

Where $\displaystyle{ \tau(p) }$and $\displaystyle{ \tau(pq) }$denote all trees of $\displaystyle{ \tau }$than contain node $\displaystyle{ p }$and edge $\displaystyle{ pq }$respectively. We simply can write:

(6) $\displaystyle{ E(\theta,x)=\sum_{T \in \tau} E(\theta^T, x^T) }$

And our constraints will be:

(7) $\displaystyle{ x^T \in \chi^T, x^T=x_{|T},\forall T \in \tau }$

Our original MRF problem will become:

(8) $\displaystyle{ \min_{\{x^T\},x} \Sigma_{T \in \tau} E(\theta^T, x^T) }$where $\displaystyle{ x^T \in \chi^T, \forall T \in \tau }$and $\displaystyle{ x^T \in x_{|T}, \forall T \in \tau }$

And we'll have the dual problem we were seeking:

(9) $\displaystyle{ \max_{\{\lambda^T\} \in \Lambda} g(\{\lambda^T\})= \sum_{T \in \tau} g^T(\lambda^T), }$The Master problem

where each function $\displaystyle{ g^T(.) }$is defined as:

(10) $\displaystyle{ g^T(\lambda^T)=\min_{x^T} E(\theta^T+ \lambda^T, x^T) }$where $\displaystyle{ x^T \in \chi^T }$The Slave problems

## Theoretical Properties

Theorem 1. Lagrangian relaxation (9) is equivalent to the LP relaxation of (2).

$\displaystyle{ \min_{\{x^T\},x} \{E(x, \theta)|x_p^T=s_p,x^T \in \text{CONVEXHULL}(\chi^T)\} }$

Theorem 2. If the sequence of multipliers $\displaystyle{ \{\alpha_t\} }$satisfies $\displaystyle{ \alpha_t \geq 0, \lim_{t \to \infin} \alpha_t = 0, \sum_{t=0}^\infin \alpha_t=\infin }$then the algorithm converges to the optimal solution of (9).

Theorem 3. The distance of the current solution $\displaystyle{ \{\theta^T\} }$to the optimal solution $\displaystyle{ \{\bar \theta^T\} }$, which decreases at every iteration.

Theorem 4. Any solution obtained by the method satisfies the WTA (weak tree agreement) condition.

Theorem 5. For binary MRFs with sub-modular energies, the method computes a globally optimal solution.