Short description: Markov Chain Monte Carlo algorithm

In computational statistics, the Metropolis-adjusted Langevin algorithm (MALA) or Langevin Monte Carlo (LMC) is a Markov chain Monte Carlo (MCMC) method for obtaining random samples – sequences of random observations – from a probability distribution for which direct sampling is difficult. As the name suggests, MALA uses a combination of two mechanisms to generate the states of a random walk that has the target probability distribution as an invariant measure:

Informally, the Langevin dynamics drive the random walk towards regions of high probability in the manner of a gradient flow, while the Metropolis–Hastings accept/reject mechanism improves the mixing and convergence properties of this random walk. MALA was originally proposed by Julian Besag in 1994,[1] (although the method Smart Monte Carlo was already introduced in 1978 [2]) and its properties were examined in detail by Gareth Roberts together with Richard Tweedie[3] and Jeff Rosenthal.[4] Many variations and refinements have been introduced since then, e.g. the manifold variant of Girolami and Calderhead (2011).[5] The method is equivalent to using the Hamiltonian Monte Carlo (hybrid Monte Carlo) algorithm with only a single discrete time step.[5]

## Further details

Let $\displaystyle{ \pi }$ denote a probability density function on $\displaystyle{ \mathbb{R}^{d} }$, one from which it is desired to draw an ensemble of independent and identically distributed samples. We consider the overdamped Langevin Itô diffusion

$\displaystyle{ \dot{X} = \nabla \log \pi(X) + \sqrt{2} \dot{W} }$

driven by the time derivative of a standard Brownian motion $\displaystyle{ W }$. (Note that another commonly-used normalization for this diffusion is

$\displaystyle{ \dot{X} = \frac{1}{2} \nabla \log \pi(X) + \dot{W}, }$

which generates the same dynamics.) In the limit as $\displaystyle{ t \to \infty }$, this probability distribution $\displaystyle{ \rho(t) }$ of $\displaystyle{ X(t) }$ approaches a stationary distribution, which is also invariant under the diffusion, which we denote $\displaystyle{ \rho_\infty }$. It turns out that, in fact, $\displaystyle{ \rho_\infty = \pi }$.

Approximate sample paths of the Langevin diffusion can be generated by many discrete-time methods. One of the simplest is the Euler–Maruyama method with a fixed time step $\displaystyle{ \tau \gt 0 }$. We set $\displaystyle{ X_0 := x_0 }$ and then recursively define an approximation $\displaystyle{ X_k }$ to the true solution $\displaystyle{ X(k \tau) }$ by

$\displaystyle{ X_{k + 1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \xi_k, }$

where each $\displaystyle{ \xi_{k} }$ is an independent draw from a multivariate normal distribution on $\displaystyle{ \mathbb{R}^{d} }$ with mean 0 and covariance matrix equal to the $\displaystyle{ d \times d }$ identity matrix. Note that $\displaystyle{ X_{k + 1} }$ is normally distributed with mean $\displaystyle{ X_k + \tau \nabla \log \pi(X_k) }$ and covariance equal to $\displaystyle{ 2 \tau }$ times the $\displaystyle{ d \times d }$ identity matrix.

In contrast to the Euler–Maruyama method for simulating the Langevin diffusion, which always updates $\displaystyle{ X_k }$ according to the update rule

$\displaystyle{ X_{k + 1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \xi_k, }$

MALA incorporates an additional step. We consider the above update rule as defining a proposal $\displaystyle{ \tilde{X}_{k + 1} }$ for a new state,

$\displaystyle{ \tilde{X}_{k + 1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \xi_k. }$

This proposal is accepted or rejected according to the Metropolis-Hastings algorithm: set

$\displaystyle{ \alpha := \min \left\{ 1 , \frac{\pi(\tilde{X}_{k + 1}) q(X_{k}\mid\tilde{X}_{k + 1})}{\pi({X}_{k}) q(\tilde{X}_{k + 1}\mid X_k)} \right\}, }$

where

$\displaystyle{ q(x'\mid x) \propto \exp \left( - \frac{1}{4 \tau} \| x' - x - \tau \nabla \log \pi(x) \|_2^2 \right) }$

is the transition probability density from $\displaystyle{ x }$ to $\displaystyle{ x' }$ (note that, in general $\displaystyle{ q(x'\mid x) \neq q(x\mid x') }$). Let $\displaystyle{ u }$ be drawn from the continuous uniform distribution on the interval $\displaystyle{ [0, 1] }$. If $\displaystyle{ u \leq \alpha }$, then the proposal is accepted, and we set $\displaystyle{ X_{k + 1} := \tilde{X}_{k + 1} }$; otherwise, the proposal is rejected, and we set $\displaystyle{ X_{k + 1} := X_k }$.

The combined dynamics of the Langevin diffusion and the Metropolis–Hastings algorithm satisfy the detailed balance conditions necessary for the existence of a unique, invariant, stationary distribution $\displaystyle{ \rho_{\infty} = \pi }$. Compared to naive Metropolis–Hastings, MALA has the advantage that it usually proposes moves into regions of higher $\displaystyle{ \pi }$ probability, which are then more likely to be accepted. On the other hand, when $\displaystyle{ \pi }$ is strongly anisotropic (i.e. it varies much more quickly in some directions than others), it is necessary to take $\displaystyle{ 0 \lt \tau \ll 1 }$ in order to properly capture the Langevin dynamics; the use of a positive-definite preconditioning matrix $\displaystyle{ A \in \mathbb{R}^{d \times d} }$ can help to alleviate this problem, by generating proposals according to

$\displaystyle{ \tilde{X}_{k + 1} := X_k + \tau A \nabla \log \pi(X_k) + \sqrt{2 \tau A} \xi_k, }$

so that $\displaystyle{ \tilde{X}_{k + 1} }$ has mean $\displaystyle{ X_k + \tau A \nabla \log \pi(X_k) }$ and covariance $\displaystyle{ 2 \tau A }$.

For limited classes of target distributions, the optimal acceptance rate for this algorithm can be shown to be $\displaystyle{ 0.574 }$; if it is discovered to be substantially different in practice, $\displaystyle{ \tau }$ should be modified accordingly.[4]

## References

1. J. Besag (1994). "Comments on "Representations of knowledge in complex systems" by U. Grenander and MI Miller". Journal of the Royal Statistical Society, Series B 56: 591–592.
2. Rossky, P.J.; Doll, J.D.; Friedman, H.L. (1978). "Brownian Dynamics as smart Monte Carlo simulation". J. Chem. Physics 69 (10): 4628.
3. G. O. Roberts and R. L. Tweedie (1996). "Exponential convergence of Langevin distributions and their discrete approximations". Bernoulli 2 (4): 341–363. doi:10.2307/3318418.
4. G. O. Roberts and J. S. Rosenthal (1998). "Optimal scaling of discrete approximations to Langevin diffusions". Journal of the Royal Statistical Society, Series B 60 (1): 255–268. doi:10.1111/1467-9868.00123.
5. M. Girolami and B. Calderhead (2011). "Riemann manifold Langevin and Hamiltonian Monte Carlo methods". Journal of the Royal Statistical Society, Series B 73 (2): 123–214. doi:10.1111/j.1467-9868.2010.00765.x.