# Metropolis-adjusted Langevin algorithm


In computational statistics, the **Metropolis-adjusted Langevin algorithm (MALA)** or **Langevin Monte Carlo (LMC)** is a Markov chain Monte Carlo (MCMC) method for obtaining random samples – sequences of random observations – from a probability distribution for which direct sampling is difficult. As the name suggests, MALA uses a combination of two mechanisms to generate the states of a random walk that has the target probability distribution as an invariant measure:

- new states are proposed using (overdamped) **Langevin dynamics**, which use evaluations of the gradient of the target probability density function;
- these proposals are accepted or rejected using the **Metropolis–Hastings algorithm**, which uses evaluations of the target probability density (but not its gradient).

Informally, the Langevin dynamics drive the random walk towards regions of high probability in the manner of a gradient flow, while the Metropolis–Hastings accept/reject mechanism improves the mixing and convergence properties of this random walk. MALA was originally proposed by Julian Besag in 1994^{[1]} (although the method, under the name Smart Monte Carlo, was already introduced in 1978^{[2]}), and its properties were examined in detail by Gareth Roberts together with Richard Tweedie^{[3]} and Jeff Rosenthal.^{[4]} Many variations and refinements have been introduced since then, e.g. the manifold variant of Girolami and Calderhead (2011).^{[5]} The method is equivalent to using the Hamiltonian Monte Carlo (hybrid Monte Carlo) algorithm with only a single discrete time step.^{[5]}

## Further details

Let [math]\displaystyle{ \pi }[/math] denote a probability density function on [math]\displaystyle{ \mathbb{R}^{d} }[/math], one from which it is desired to draw an ensemble of independent and identically distributed samples. We consider the overdamped Langevin Itô diffusion

- [math]\displaystyle{ \dot{X} = \nabla \log \pi(X) + \sqrt{2} \dot{W} }[/math]

driven by the time derivative of a standard Brownian motion [math]\displaystyle{ W }[/math]. (Note that another commonly-used normalization for this diffusion is

- [math]\displaystyle{ \dot{X} = \frac{1}{2} \nabla \log \pi(X) + \dot{W}, }[/math]

which generates the same dynamics.) In the limit as [math]\displaystyle{ t \to \infty }[/math], the probability distribution [math]\displaystyle{ \rho(t) }[/math] of [math]\displaystyle{ X(t) }[/math] approaches a stationary distribution [math]\displaystyle{ \rho_\infty }[/math], which is also invariant under the diffusion. It turns out that, in fact, [math]\displaystyle{ \rho_\infty = \pi }[/math].
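This fact can be checked from the Fokker–Planck equation associated with the diffusion,

- [math]\displaystyle{ \frac{\partial \rho}{\partial t} = \nabla \cdot \left( - \rho \, \nabla \log \pi + \nabla \rho \right). }[/math]

Substituting [math]\displaystyle{ \rho = \pi }[/math] and using [math]\displaystyle{ \pi \, \nabla \log \pi = \nabla \pi }[/math], the two terms inside the divergence cancel, so the right-hand side vanishes and [math]\displaystyle{ \pi }[/math] is indeed stationary.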

Approximate sample paths of the Langevin diffusion can be generated by many discrete-time methods. One of the simplest is the Euler–Maruyama method with a fixed time step [math]\displaystyle{ \tau \gt 0 }[/math]. We set [math]\displaystyle{ X_0 := x_0 }[/math] and then recursively define an approximation [math]\displaystyle{ X_k }[/math] to the true solution [math]\displaystyle{ X(k \tau) }[/math] by

- [math]\displaystyle{ X_{k + 1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \xi_k, }[/math]

where each [math]\displaystyle{ \xi_{k} }[/math] is an independent draw from a multivariate normal distribution on [math]\displaystyle{ \mathbb{R}^{d} }[/math] with mean 0 and covariance matrix equal to the [math]\displaystyle{ d \times d }[/math] identity matrix. Note that [math]\displaystyle{ X_{k + 1} }[/math] is normally distributed with mean [math]\displaystyle{ X_k + \tau \nabla \log \pi(X_k) }[/math] and covariance equal to [math]\displaystyle{ 2 \tau }[/math] times the [math]\displaystyle{ d \times d }[/math] identity matrix.
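The Euler–Maruyama recursion above can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy; the standard normal target (for which [math]\displaystyle{ \nabla \log \pi(x) = -x }[/math]) and all function names are chosen only for the example:

```python
import numpy as np

def ula_step(x, grad_log_pi, tau, rng):
    """One Euler-Maruyama step of the overdamped Langevin diffusion:
    X_{k+1} = X_k + tau * grad log pi(X_k) + sqrt(2 tau) * xi_k."""
    xi = rng.standard_normal(x.shape)  # xi_k ~ N(0, I)
    return x + tau * grad_log_pi(x) + np.sqrt(2.0 * tau) * xi

# Illustration: standard normal target pi(x) ~ exp(-||x||^2 / 2),
# so grad log pi(x) = -x.
rng = np.random.default_rng(0)
x = np.zeros(2)
samples = []
for _ in range(5000):
    x = ula_step(x, lambda y: -y, 0.1, rng)
    samples.append(x.copy())
samples = np.array(samples)
```

Without the Metropolis correction described next, this "unadjusted" chain samples only approximately from [math]\displaystyle{ \pi }[/math], with a bias that shrinks as [math]\displaystyle{ \tau \to 0 }[/math].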

In contrast to the Euler–Maruyama method for simulating the Langevin diffusion, which always updates [math]\displaystyle{ X_k }[/math] according to the update rule

- [math]\displaystyle{ X_{k + 1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \xi_k, }[/math]

MALA incorporates an additional step. We consider the above update rule as defining a *proposal* [math]\displaystyle{ \tilde{X}_{k + 1} }[/math] for a new state,

- [math]\displaystyle{ \tilde{X}_{k + 1} := X_k + \tau \nabla \log \pi(X_k) + \sqrt{2 \tau} \xi_k. }[/math]

This proposal is accepted or rejected according to the Metropolis–Hastings algorithm: set

- [math]\displaystyle{ \alpha := \min \left\{ 1 , \frac{\pi(\tilde{X}_{k + 1}) q(X_{k}\mid\tilde{X}_{k + 1})}{\pi({X}_{k}) q(\tilde{X}_{k + 1}\mid X_k)} \right\}, }[/math]

where

- [math]\displaystyle{ q(x'\mid x) \propto \exp \left( - \frac{1}{4 \tau} \| x' - x - \tau \nabla \log \pi(x) \|_2^2 \right) }[/math]

is the transition probability density from [math]\displaystyle{ x }[/math] to [math]\displaystyle{ x' }[/math] (note that, in general [math]\displaystyle{ q(x'\mid x) \neq q(x\mid x') }[/math]). Let [math]\displaystyle{ u }[/math] be drawn from the continuous uniform distribution on the interval [math]\displaystyle{ [0, 1] }[/math]. If [math]\displaystyle{ u \leq \alpha }[/math], then the proposal is accepted, and we set [math]\displaystyle{ X_{k + 1} := \tilde{X}_{k + 1} }[/math]; otherwise, the proposal is rejected, and we set [math]\displaystyle{ X_{k + 1} := X_k }[/math].
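The full proposal/accept/reject loop described above can be sketched as follows. This is a minimal illustration assuming NumPy; the acceptance test works with [math]\displaystyle{ \log \alpha }[/math] for numerical stability, and the standard normal target used in the demo is an arbitrary choice:

```python
import numpy as np

def mala(log_pi, grad_log_pi, x0, tau, n_steps, rng):
    """Metropolis-adjusted Langevin algorithm, as described above."""
    x = np.asarray(x0, dtype=float)
    d = len(x)
    out = np.empty((n_steps, d))
    n_accept = 0

    def log_q(x_to, x_from):
        # log q(x_to | x_from) up to the common normalizing constant
        diff = x_to - x_from - tau * grad_log_pi(x_from)
        return -np.dot(diff, diff) / (4.0 * tau)

    for k in range(n_steps):
        # Langevin proposal: mean x + tau * grad log pi(x), covariance 2 tau I
        xi = rng.standard_normal(d)
        x_prop = x + tau * grad_log_pi(x) + np.sqrt(2.0 * tau) * xi
        # Metropolis-Hastings log acceptance ratio
        log_alpha = (log_pi(x_prop) + log_q(x, x_prop)
                     - log_pi(x) - log_q(x_prop, x))
        if np.log(rng.uniform()) <= log_alpha:
            x = x_prop
            n_accept += 1
        out[k] = x
    return out, n_accept / n_steps

# Demo on a standard normal target: log pi(x) = -||x||^2 / 2 (up to a constant).
rng = np.random.default_rng(1)
samples, acc_rate = mala(lambda y: -0.5 * y @ y, lambda y: -y,
                         np.zeros(2), 0.5, 5000, rng)
```

Note that [math]\displaystyle{ \pi }[/math] and [math]\displaystyle{ q }[/math] only need to be known up to normalizing constants, since these cancel in the ratio [math]\displaystyle{ \alpha }[/math].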

The combined dynamics of the Langevin diffusion and the Metropolis–Hastings algorithm satisfy the detailed balance conditions necessary for the existence of a unique, invariant, stationary distribution [math]\displaystyle{ \rho_{\infty} = \pi }[/math]. Compared to naive Metropolis–Hastings, MALA has the advantage that it usually proposes moves into regions of higher [math]\displaystyle{ \pi }[/math] probability, which are then more likely to be accepted. On the other hand, when [math]\displaystyle{ \pi }[/math] is strongly anisotropic (i.e. it varies much more quickly in some directions than others), it is necessary to take [math]\displaystyle{ 0 \lt \tau \ll 1 }[/math] in order to properly capture the Langevin dynamics; the use of a positive-definite preconditioning matrix [math]\displaystyle{ A \in \mathbb{R}^{d \times d} }[/math] can help to alleviate this problem, by generating proposals according to

- [math]\displaystyle{ \tilde{X}_{k + 1} := X_k + \tau A \nabla \log \pi(X_k) + \sqrt{2 \tau A} \xi_k, }[/math]

so that [math]\displaystyle{ \tilde{X}_{k + 1} }[/math] has mean [math]\displaystyle{ X_k + \tau A \nabla \log \pi(X_k) }[/math] and covariance [math]\displaystyle{ 2 \tau A }[/math].
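A sketch of the preconditioned proposal, assuming NumPy: the Cholesky factor [math]\displaystyle{ L }[/math] of [math]\displaystyle{ A = L L^{\mathsf{T}} }[/math] is one standard way to realize noise with covariance [math]\displaystyle{ 2 \tau A }[/math] from standard normal draws (the function name and demo values are illustrative only):

```python
import numpy as np

def preconditioned_proposal(x, grad_log_pi, tau, A, rng):
    """Proposal with symmetric positive-definite preconditioner A:
    mean X_k + tau * A * grad log pi(X_k), covariance 2 * tau * A.
    If A = L L^T (Cholesky), then L @ xi ~ N(0, A) for xi ~ N(0, I)."""
    L = np.linalg.cholesky(A)
    xi = rng.standard_normal(len(x))
    return x + tau * A @ grad_log_pi(x) + np.sqrt(2.0 * tau) * (L @ xi)

# Illustration: with a zero gradient the proposals are pure noise
# with covariance 2 * tau * A.
rng = np.random.default_rng(2)
A = np.diag([4.0, 1.0])
tau = 0.1
props = np.array([preconditioned_proposal(np.zeros(2), lambda y: np.zeros(2),
                                          tau, A, rng)
                  for _ in range(50000)])
```

The acceptance ratio must then use the correspondingly modified proposal density [math]\displaystyle{ q }[/math].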

For limited classes of target distributions, the optimal acceptance rate for this algorithm can be shown to be [math]\displaystyle{ 0.574 }[/math]; if the observed acceptance rate differs substantially from this value in practice, [math]\displaystyle{ \tau }[/math] should be adjusted accordingly.^{[4]}
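One common tuning heuristic (an illustrative sketch, not part of MALA itself or prescribed by the references) is a multiplicative update that nudges [math]\displaystyle{ \tau }[/math] toward the target acceptance rate:

```python
import numpy as np

TARGET_RATE = 0.574  # asymptotically optimal acceptance rate cited above

def adapt_tau(tau, observed_rate, gain=0.5):
    """Illustrative multiplicative update: grow tau when the chain accepts
    more often than the target rate (steps are too timid), shrink it when
    it accepts less often (steps are too bold)."""
    return tau * np.exp(gain * (observed_rate - TARGET_RATE))
```

Such adaptation is typically applied only during a warm-up phase, since continually changing the step size breaks the stationarity of the chain.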

## References

- ↑ J. Besag (1994). "Comments on "Representations of knowledge in complex systems" by U. Grenander and M. I. Miller". *Journal of the Royal Statistical Society, Series B* **56**: 591–592.
- ↑ Rossky, P. J.; Doll, J. D.; Friedman, H. L. (1978). "Brownian Dynamics as smart Monte Carlo simulation". *J. Chem. Phys.* **69** (10): 4628.
- ↑ G. O. Roberts and R. L. Tweedie (1996). "Exponential convergence of Langevin distributions and their discrete approximations". *Bernoulli* **2** (4): 341–363. doi:10.2307/3318418.
- ↑ ^{4.0} ^{4.1} G. O. Roberts and J. S. Rosenthal (1998). "Optimal scaling of discrete approximations to Langevin diffusions". *Journal of the Royal Statistical Society, Series B* **60** (1): 255–268. doi:10.1111/1467-9868.00123.
- ↑ ^{5.0} ^{5.1} M. Girolami and B. Calderhead (2011). "Riemann manifold Langevin and Hamiltonian Monte Carlo methods". *Journal of the Royal Statistical Society, Series B* **73** (2): 123–214. doi:10.1111/j.1467-9868.2010.00765.x.

Original source: https://en.wikipedia.org/wiki/Metropolis-adjusted_Langevin_algorithm