Preconditioned Crank–Nicolson algorithm

In computational statistics, the preconditioned Crank–Nicolson algorithm (pCN) is a Markov chain Monte Carlo (MCMC) method for obtaining random samples – sequences of random observations – from a target probability distribution for which direct sampling is difficult. The most significant feature of the pCN algorithm is its dimension robustness, which makes it well-suited for high-dimensional sampling problems. The pCN algorithm is well-defined, with non-degenerate acceptance probability, even for target distributions on infinite-dimensional Hilbert spaces. As a consequence, when pCN is implemented on a real-world computer in large but finite dimension N, i.e. on an N-dimensional subspace of the original Hilbert space, the convergence properties (such as ergodicity) of the algorithm are independent of N. This is in strong contrast to schemes such as Gaussian random walk Metropolis–Hastings and the Metropolis-adjusted Langevin algorithm, whose acceptance probability degenerates to zero as N tends to infinity.

The algorithm as named was highlighted in 2013 by Cotter, Roberts, Stuart and White,[1] and its ergodicity properties were proved a year later by Hairer, Stuart and Vollmer.[2] In the specific context of sampling diffusion bridges, the method was introduced in 2008.[3]

Description of the algorithm

Overview

The pCN algorithm generates a Markov chain [math]\displaystyle{ (X_{n})_{n \in \mathbb{N}} }[/math] on a Hilbert space [math]\displaystyle{ \mathcal{H} }[/math] whose invariant measure is a probability measure [math]\displaystyle{ \mu }[/math] of the form

[math]\displaystyle{ \mu(E) = \frac{1}{Z} \int_{E} \exp(- \Phi(x)) \, \mu_{0} (\mathrm{d} x) }[/math]

for each measurable set [math]\displaystyle{ E \subseteq \mathcal{H} }[/math], with normalising constant [math]\displaystyle{ Z }[/math] given by

[math]\displaystyle{ Z = \int_{\mathcal{H}} \exp(- \Phi(x)) \, \mu_{0} (\mathrm{d} x) , }[/math]

where [math]\displaystyle{ \mu_{0} = \mathcal{N}(0, C_{0}) }[/math] is a Gaussian measure on [math]\displaystyle{ \mathcal{H} }[/math] with covariance operator [math]\displaystyle{ C_{0} }[/math] and [math]\displaystyle{ \Phi \colon \mathcal{H} \to \mathbb{R} }[/math] is some function. Thus, the pCN method applies to target probability measures that are re-weightings of a reference Gaussian measure.
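
A typical example of this structure arises in the Bayesian inverse problems that motivate much of the pCN literature:[1] if [math]\displaystyle{ \mu_{0} }[/math] is a Gaussian prior and data [math]\displaystyle{ y }[/math] are observed through a forward map [math]\displaystyle{ \mathcal{G} }[/math] with additive Gaussian observational noise of variance [math]\displaystyle{ \sigma^{2} }[/math], then the posterior distribution takes the form above with

[math]\displaystyle{ \Phi(x) = \frac{1}{2 \sigma^{2}} \| y - \mathcal{G}(x) \|^{2} . }[/math]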

Metropolis–Hastings methods form a general class of algorithms that produce such Markov chains [math]\displaystyle{ (X_{n})_{n \in \mathbb{N}} }[/math] by a two-step procedure: first a new state [math]\displaystyle{ X'_{n + 1} }[/math] is proposed given the current state [math]\displaystyle{ X_{n} }[/math], and then the proposal is accepted or rejected, according to a particular acceptance probability, to define the next state [math]\displaystyle{ X_{n + 1} }[/math]. The idea of the pCN algorithm is that a clever choice of (non-symmetric) proposal for the new state [math]\displaystyle{ X'_{n + 1} }[/math] given [math]\displaystyle{ X_{n} }[/math] can have an acceptance probability with very desirable properties.

The pCN proposal

The special form of this pCN proposal is to take

[math]\displaystyle{ X'_{n + 1} = \sqrt{ 1 - \beta^{2} } X_{n} + \beta \Xi_{n + 1} , }[/math]
[math]\displaystyle{ \Xi_{n + 1} \sim \mu_{0} \text{ i.i.d.} }[/math]

or, equivalently,

[math]\displaystyle{ X'_{n + 1} | X_{n} \sim \mathcal{N} \left( \sqrt{ 1 - \beta^{2} } X_{n} , \beta^2 C_{0} \right) . }[/math]

The parameter [math]\displaystyle{ 0 \lt \beta \lt 1 }[/math] is a step size that can be chosen freely (and even optimised for statistical efficiency). One then generates [math]\displaystyle{ Z_{n + 1} \sim \mathrm{Unif}([0, 1]) }[/math] and sets

[math]\displaystyle{ X_{n + 1} = X'_{n + 1} \text{ if } Z_{n + 1} \leq \alpha(X_{n}, X'_{n + 1}) , }[/math]
[math]\displaystyle{ X_{n + 1} = X_{n} \text{ if } Z_{n + 1} \gt \alpha(X_{n}, X'_{n + 1}) . }[/math]

The acceptance probability takes the simple form

[math]\displaystyle{ \alpha(x, x') = \min ( 1 , \exp ( \Phi(x) - \Phi(x') ) ). }[/math]
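
As a concrete illustration, the following sketch implements pCN on a finite-dimensional discretisation in Python with NumPy (an implementation choice made here for exposition; the names pcn_sample, phi and C0_chol are illustrative and do not come from the references). The Gaussian increment [math]\displaystyle{ \Xi_{n + 1} \sim \mathcal{N}(0, C_{0}) }[/math] is drawn via a Cholesky factor of [math]\displaystyle{ C_{0} }[/math].

import numpy as np

def pcn_sample(phi, C0_chol, x0, beta=0.2, n_steps=10_000, rng=None):
    """Minimal pCN sketch in finite dimension (illustrative, not from the references).

    phi     : callable returning Phi(x), the potential in the target density
    C0_chol : lower-triangular Cholesky factor L of C0, so L @ z ~ N(0, C0) for z ~ N(0, I)
    x0      : starting state, a NumPy array of length N
    beta    : step-size parameter in (0, 1)
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    phi_x = phi(x)
    chain = np.empty((n_steps, x.size))
    accepted = 0
    for n in range(n_steps):
        # pCN proposal: x' = sqrt(1 - beta^2) * x + beta * Xi, with Xi ~ N(0, C0)
        xi = C0_chol @ rng.standard_normal(x.size)
        x_prop = np.sqrt(1.0 - beta**2) * x + beta * xi
        phi_prop = phi(x_prop)
        # accept with probability alpha(x, x') = min(1, exp(Phi(x) - Phi(x')))
        if rng.random() <= np.exp(min(0.0, phi_x - phi_prop)):
            x, phi_x = x_prop, phi_prop
            accepted += 1
        chain[n] = x
    return chain, accepted / n_steps

# example usage: reference measure N(0, I_2) tilted by Phi(x) = (x_1 - 1)^2
samples, rate = pcn_sample(lambda x: (x[0] - 1.0) ** 2, np.eye(2), np.zeros(2))

Note that only [math]\displaystyle{ \Phi }[/math] is evaluated in the acceptance step; the Gaussian reference measure enters solely through the proposal, which is what keeps the acceptance probability non-degenerate as the discretisation dimension grows.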

It can be shown[2] that the Markov chain defined by this method not only satisfies detailed balance with respect to the target distribution [math]\displaystyle{ \mu }[/math], and hence has [math]\displaystyle{ \mu }[/math] as an invariant measure, but also possesses a spectral gap that is independent of the dimension of [math]\displaystyle{ \mathcal{H} }[/math], so that the law of [math]\displaystyle{ X_{n} }[/math] converges to [math]\displaystyle{ \mu }[/math] as [math]\displaystyle{ n \to \infty }[/math]. Thus, although one may still have to tune the step size parameter [math]\displaystyle{ \beta }[/math] to achieve a desired level of statistical efficiency, the performance of the pCN method is robust to the dimension of the sampling problem being considered.

Contrast with symmetric proposals

This behaviour of pCN is in stark contrast to the Gaussian random walk proposal

[math]\displaystyle{ X'_{n + 1} \mid X_n \sim \mathcal{N} \left( X_n, \beta \Gamma \right) }[/math]

with any choice of proposal covariance [math]\displaystyle{ \Gamma }[/math], or indeed any symmetric proposal mechanism. It can be shown using the Cameron–Martin theorem that for infinite-dimensional [math]\displaystyle{ \mathcal{H} }[/math] this proposal has acceptance probability zero for [math]\displaystyle{ \mu }[/math]-almost all [math]\displaystyle{ X'_{n+1} }[/math] and [math]\displaystyle{ X_n }[/math]. In practice, when one implements the Gaussian random walk proposal in dimension [math]\displaystyle{ N }[/math], this phenomenon can be seen (as the sketch after the following list illustrates) in the way that

  • for fixed [math]\displaystyle{ \beta }[/math], the acceptance probability tends to zero as [math]\displaystyle{ N \to \infty }[/math], and
  • for a fixed desired positive acceptance probability, [math]\displaystyle{ \beta \to 0 }[/math] as [math]\displaystyle{ N \to \infty }[/math].
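
The degeneration described in the two points above can be observed numerically. The following sketch (again Python with NumPy, written here purely for illustration) targets the simplest case [math]\displaystyle{ \Phi \equiv 0 }[/math] with [math]\displaystyle{ C_{0} = \Gamma = I }[/math], so that the target is a standard Gaussian; the random-walk acceptance rate collapses as the dimension [math]\displaystyle{ N }[/math] grows, whereas pCN would accept every proposal in this setting regardless of [math]\displaystyle{ N }[/math].

import numpy as np

def rw_acceptance_rate(N, beta=0.25, n_steps=2_000, seed=0):
    """Empirical acceptance rate of Gaussian random-walk Metropolis
    targeting N(0, I_N), i.e. Phi = 0 and C0 = I (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(N)
    accepted = 0
    for _ in range(n_steps):
        # proposal N(x, beta * I), matching N(X_n, beta * Gamma) with Gamma = I
        x_prop = x + np.sqrt(beta) * rng.standard_normal(N)
        # log acceptance ratio: difference of standard Gaussian log-densities
        log_alpha = 0.5 * (x @ x - x_prop @ x_prop)
        if np.log(rng.random()) <= log_alpha:
            x = x_prop
            accepted += 1
    return accepted / n_steps

for N in (10, 100, 1000):
    print(N, rw_acceptance_rate(N))  # acceptance rate falls towards zero as N grows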

References

  1. Cotter, S. L.; Roberts, G. O.; Stuart, A. M.; White, D. (2013). "MCMC methods for functions: modifying old algorithms to make them faster". Statist. Sci. 28 (3): 424–446. doi:10.1214/13-STS421. ISSN 0883-4237. 
  2. Hairer, M.; Stuart, A. M.; Vollmer, S. J. (2014). "Spectral gaps for a Metropolis–Hastings algorithm in infinite dimensions". Ann. Appl. Probab. 24 (6): 2455–2490. doi:10.1214/13-AAP982. ISSN 1050-5164.
  3. Beskos, A.; Roberts, G. O.; Stuart, A. M.; Voss, J. (2008). "MCMC Methods for Diffusion Bridges". Stochastics and Dynamics 8 (3): 319–350. doi:10.1142/S0219493708002378. https://authors.library.caltech.edu/69496/.