# Coupling from the past


Among Markov chain Monte Carlo (MCMC) algorithms, coupling from the past is a method for sampling from the stationary distribution of a Markov chain. Unlike most MCMC algorithms, which yield only approximate samples, coupling from the past produces, in principle, a perfect (exact) sample from the stationary distribution. It was invented by James Propp and David Wilson in 1996.

## The basic idea

Consider a finite state irreducible aperiodic Markov chain $\displaystyle{ M }$ with state space $\displaystyle{ S }$ and (unique) stationary distribution $\displaystyle{ \pi }$ ($\displaystyle{ \pi }$ is a probability vector). Suppose that we come up with a probability distribution $\displaystyle{ \mu }$ on the set of maps $\displaystyle{ f:S\to S }$ with the property that for every fixed $\displaystyle{ s\in S }$, its image $\displaystyle{ f(s) }$ is distributed according to the transition probability of $\displaystyle{ M }$ from state $\displaystyle{ s }$. An example of such a probability distribution is the one where $\displaystyle{ f(s) }$ is independent from $\displaystyle{ f(s') }$ whenever $\displaystyle{ s\ne s' }$, but it is often worthwhile to consider other distributions. Now let $\displaystyle{ f_j }$ for $\displaystyle{ j\in\mathbb Z }$ be independent samples from $\displaystyle{ \mu }$.
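The simplest choice of $\displaystyle{ \mu }$ mentioned above, where the images $\displaystyle{ f(s) }$ are sampled independently across states, can be sketched as follows. The three-state transition matrix `P` is a hypothetical example, not taken from the text:

```python
import random

# Hypothetical 3-state transition matrix (each row sums to 1).
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]
STATES = range(len(P))

def sample_map(rng):
    """Draw one random map f: S -> S from mu: for each state s,
    f(s) is sampled from the transition row P[s], independently
    of the images of all other states."""
    return [rng.choices(STATES, weights=P[s])[0] for s in STATES]

rng = random.Random(0)
f = sample_map(rng)   # f[s] is the (random) image of state s
```

Repeated calls to `sample_map` give the i.i.d. maps $\displaystyle{ f_j }$ used below; other couplings of the rows of `P` would serve equally well, as long as each marginal $\displaystyle{ f(s) }$ follows row $\displaystyle{ s }$.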

Suppose that $\displaystyle{ x }$ is chosen randomly according to $\displaystyle{ \pi }$ and is independent from the sequence $\displaystyle{ f_j }$. (We do not worry for now about where this $\displaystyle{ x }$ comes from.) Then $\displaystyle{ f_{-1}(x) }$ is also distributed according to $\displaystyle{ \pi }$, because $\displaystyle{ \pi }$ is stationary for $\displaystyle{ M }$ and by our assumption on the law of $\displaystyle{ f_{-1} }$. Define

$\displaystyle{ F_j:= f_{-1}\circ f_{-2}\circ\cdots\circ f_{-j}. }$

Then it follows by induction that $\displaystyle{ F_j(x) }$ is also distributed according to $\displaystyle{ \pi }$ for every $\displaystyle{ j\in\mathbb{N} }$. However, it may happen that for some $\displaystyle{ n\in\mathbb{N} }$ the image of the map $\displaystyle{ F_n }$ is a single element of $\displaystyle{ S }$. In other words, $\displaystyle{ F_n(x)=F_n(y) }$ for each $\displaystyle{ y\in S }$. Therefore, we do not need to have access to $\displaystyle{ x }$ in order to compute $\displaystyle{ F_n(x) }$. The algorithm then involves finding some $\displaystyle{ n\in \mathbb N }$ such that $\displaystyle{ F_n(S) }$ is a singleton, and outputting the element of that singleton. The design of a good distribution $\displaystyle{ \mu }$ for which the task of finding such an $\displaystyle{ n }$ and computing $\displaystyle{ F_n }$ is not too costly is not always obvious, but has been accomplished successfully in several important instances.[1]
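The argument above translates into a short loop: extend the sequence of maps further into the past until the image $\displaystyle{ F_n(S) }$ collapses to a single state, never resampling a map already drawn. The following sketch uses a hypothetical two-state chain `P` for illustration:

```python
import random

P = [[0.5, 0.5], [0.3, 0.7]]   # hypothetical 2-state transition matrix
STATES = range(2)

def sample_map(rng):
    # f(s) ~ row P[s], independently for each s (one valid choice of mu)
    return [rng.choices(STATES, weights=P[s])[0] for s in STATES]

def cftp(rng):
    """Return one exact sample from the stationary distribution of P."""
    maps = []                  # maps[j] holds f_{-(j+1)}; drawn once, reused
    n = 1
    while True:
        while len(maps) < n:   # reach further into the past as needed
            maps.append(sample_map(rng))
        # Compute F_n(S) for F_n = f_{-1} o f_{-2} o ... o f_{-n}.
        image = set(STATES)
        for f in reversed(maps[:n]):   # apply f_{-n} first, f_{-1} last
            image = {f[s] for s in image}
        if len(image) == 1:    # F_n(S) is a singleton: output its element
            return image.pop()
        n *= 2

sample = cftp(random.Random(42))
```

Note that the maps are applied from the past forward ($\displaystyle{ f_{-n} }$ first), and that `maps` is only ever extended, matching the requirement that previously sampled maps are reused rather than redrawn.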

## The monotone case

There is a special class of Markov chains in which there are particularly good choices for $\displaystyle{ \mu }$ and a tool for determining if $\displaystyle{ |F_n(S)|=1 }$. (Here $\displaystyle{ |\cdot| }$ denotes cardinality.) Suppose that $\displaystyle{ S }$ is a partially ordered set with order $\displaystyle{ \le }$, which has a unique minimal element $\displaystyle{ s_0 }$ and a unique maximal element $\displaystyle{ s_1 }$; that is, every $\displaystyle{ s\in S }$ satisfies $\displaystyle{ s_0\le s\le s_1 }$. Also, suppose that $\displaystyle{ \mu }$ may be chosen to be supported on the set of monotone maps $\displaystyle{ f:S\to S }$. Then it is easy to see that $\displaystyle{ |F_n(S)|=1 }$ if and only if $\displaystyle{ F_n(s_0)=F_n(s_1) }$, since $\displaystyle{ F_n }$ is monotone. Thus, checking this becomes rather easy. The algorithm can proceed by choosing $\displaystyle{ n:=n_0 }$ for some constant $\displaystyle{ n_0 }$, sampling the maps $\displaystyle{ f_{-1},\dots,f_{-n} }$, and outputting $\displaystyle{ F_n(s_0) }$ if $\displaystyle{ F_n(s_0)=F_n(s_1) }$. If $\displaystyle{ F_n(s_0)\ne F_n(s_1) }$ the algorithm proceeds by doubling $\displaystyle{ n }$ and repeating as necessary until an output is obtained. (But the algorithm does not resample the maps $\displaystyle{ f_{-j} }$ which were already sampled; it uses the previously sampled maps when needed.)
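In the monotone case only the two extreme trajectories need to be tracked. A minimal sketch, using a hypothetical lazy random walk on $\displaystyle{ \{0,\dots,K\} }$ (with $\displaystyle{ s_0=0 }$, $\displaystyle{ s_1=K }$) whose update is driven by a single shared uniform variable per step:

```python
import random

K = 10                          # states 0..K; 0 is minimal, K is maximal

def step(x, u):
    """Monotone update: the same uniform u drives every state,
    and x <= y implies step(x, u) <= step(y, u)."""
    if u < 0.5:
        return min(x + 1, K)    # step up, reflected at K
    return max(x - 1, 0)        # step down, reflected at 0

def monotone_cftp(rng):
    """Monotone CFTP: track only the bottom and top trajectories;
    F_n(0) == F_n(K) implies |F_n(S)| = 1 by monotonicity."""
    us = []                     # us[j] drives f_{-(j+1)}; reused, not resampled
    n = 1
    while True:
        while len(us) < n:
            us.append(rng.random())
        lo, hi = 0, K
        for u in reversed(us[:n]):   # apply f_{-n} first
            lo, hi = step(lo, u), step(hi, u)
        if lo == hi:            # coalescence: every state maps to this value
            return lo
        n *= 2                  # double and reach further into the past

sample = monotone_cftp(random.Random(1))
```

The doubling of `n` and the reuse of the stored uniforms `us` mirror the algorithm described above; monotonicity is what licenses checking only the pair $\displaystyle{ (F_n(s_0),F_n(s_1)) }$ instead of all of $\displaystyle{ F_n(S) }$.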

## References

• Propp, James Gary; Wilson, David Bruce (1996), "Exact sampling with coupled Markov chains and applications to statistical mechanics", Random Structures &amp; Algorithms, 9 (1–2): 223–252
• Propp, James; Wilson, David (1998), "Coupling from the past: a user's guide", Microsurveys in discrete probability (Princeton, NJ, 1997), DIMACS Ser. Discrete Math. Theoret. Comput. Sci., 41, Providence, R.I.: American Mathematical Society, pp. 181–192, doi:10.1090/dimacs/041/09, ISBN 9780821808276