Coupling from the past

Among Markov chain Monte Carlo (MCMC) algorithms, coupling from the past is a method for sampling from the stationary distribution of a Markov chain. Unlike many MCMC algorithms, coupling from the past gives, in principle, a perfect sample from the stationary distribution rather than an approximate one. It was invented by James Propp and David Wilson in 1996.

The basic idea

Consider a finite-state irreducible aperiodic Markov chain [math]\displaystyle{ M }[/math] with state space [math]\displaystyle{ S }[/math] and (unique) stationary distribution [math]\displaystyle{ \pi }[/math] ([math]\displaystyle{ \pi }[/math] is a probability vector). Suppose that we come up with a probability distribution [math]\displaystyle{ \mu }[/math] on the set of maps [math]\displaystyle{ f:S\to S }[/math] with the property that for every fixed [math]\displaystyle{ s\in S }[/math], its image [math]\displaystyle{ f(s) }[/math] is distributed according to the transition probability of [math]\displaystyle{ M }[/math] from state [math]\displaystyle{ s }[/math]. An example of such a probability distribution is the one where [math]\displaystyle{ f(s) }[/math] is independent of [math]\displaystyle{ f(s') }[/math] whenever [math]\displaystyle{ s\ne s' }[/math], but it is often worthwhile to consider other distributions. Now let [math]\displaystyle{ f_j }[/math] for [math]\displaystyle{ j\in\mathbb Z }[/math] be independent samples from [math]\displaystyle{ \mu }[/math].

Suppose that [math]\displaystyle{ x }[/math] is chosen randomly according to [math]\displaystyle{ \pi }[/math] and is independent of the sequence [math]\displaystyle{ f_j }[/math]. (We do not worry for now about where this [math]\displaystyle{ x }[/math] comes from.) Then [math]\displaystyle{ f_{-1}(x) }[/math] is also distributed according to [math]\displaystyle{ \pi }[/math], since [math]\displaystyle{ \pi }[/math] is [math]\displaystyle{ M }[/math]-stationary and, by our assumption on [math]\displaystyle{ \mu }[/math], [math]\displaystyle{ f_{-1}(x) }[/math] has the law of one step of [math]\displaystyle{ M }[/math] started from [math]\displaystyle{ x }[/math]. Define

[math]\displaystyle{ F_j:= f_{-1}\circ f_{-2}\circ\cdots\circ f_{-j}. }[/math]

Then it follows by induction that [math]\displaystyle{ F_j(x) }[/math] is also distributed according to [math]\displaystyle{ \pi }[/math] for every [math]\displaystyle{ j\in\mathbb{N} }[/math]. However, it may happen that for some [math]\displaystyle{ n\in\mathbb{N} }[/math] the image of the map [math]\displaystyle{ F_n }[/math] is a single element of [math]\displaystyle{ S }[/math]. In other words, [math]\displaystyle{ F_n(x)=F_n(y) }[/math] for each [math]\displaystyle{ y\in S }[/math]. Therefore, we do not need to have access to [math]\displaystyle{ x }[/math] in order to compute [math]\displaystyle{ F_n(x) }[/math]. The algorithm then involves finding some [math]\displaystyle{ n\in \mathbb N }[/math] such that [math]\displaystyle{ F_n(S) }[/math] is a singleton, and outputting the element of that singleton. The design of a good distribution [math]\displaystyle{ \mu }[/math] for which the task of finding such an [math]\displaystyle{ n }[/math] and computing [math]\displaystyle{ F_n }[/math] is not too costly is not always obvious, but has been accomplished successfully in several important instances.[1]
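
To make the scheme concrete, here is a minimal sketch in Python. The chain (a hypothetical lazy walk on four states), the single-variate coupling inside random_map, and the strategy of doubling [math]\displaystyle{ n }[/math] while reusing previously sampled maps (the same device spelled out in the monotone case below) are all illustrative assumptions, not part of the general statement.

```python
import random

# A minimal sketch of coupling from the past for a small finite chain.
# The chain here (a lazy walk on {0, 1, 2, 3}) and the coupling below
# are illustrative assumptions chosen so that coalescence is easy to see.

S = [0, 1, 2, 3]

def random_map(rng):
    """Sample one map f: S -> S from mu.  A single uniform variate u
    drives every state, so the images f(s) are coupled rather than
    independent; for each fixed s, f(s) is one step of the walk."""
    u = rng.random()
    if u < 0.5:
        return lambda s: min(s + 1, 3)   # every state tries to move up
    return lambda s: max(s - 1, 0)       # every state tries to move down

def cftp(seed=0):
    rng = random.Random(seed)
    maps = []   # maps[j] stands for f_{-(j+1)}; reused, never resampled
    n = 1
    while True:
        while len(maps) < n:
            maps.append(random_map(rng))
        # Compute the image F_n(S), where F_n = f_{-1} o ... o f_{-n}:
        # apply f_{-n} first, then f_{-(n-1)}, ..., and f_{-1} last.
        image = set(S)
        for j in reversed(range(n)):
            image = {maps[j](s) for s in image}
        if len(image) == 1:      # F_n(S) is a singleton: coalescence
            return image.pop()   # a perfect sample from pi
        n *= 2                   # search further into the past

print(cftp())
```

For this particular walk the transition matrix is doubly stochastic, so its stationary distribution is uniform on [math]\displaystyle{ S }[/math], and cftp returns a uniform sample. Note that the loop tracks the image of all of [math]\displaystyle{ S }[/math] at once, which is only feasible because the state space here is tiny.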

The monotone case

There is a special class of Markov chains for which there are particularly good choices of [math]\displaystyle{ \mu }[/math] and a tool for determining whether [math]\displaystyle{ |F_n(S)|=1 }[/math]. (Here [math]\displaystyle{ |\cdot| }[/math] denotes cardinality.) Suppose that [math]\displaystyle{ S }[/math] is a partially ordered set with order [math]\displaystyle{ \le }[/math], which has a unique minimal element [math]\displaystyle{ s_0 }[/math] and a unique maximal element [math]\displaystyle{ s_1 }[/math]; that is, every [math]\displaystyle{ s\in S }[/math] satisfies [math]\displaystyle{ s_0\le s\le s_1 }[/math]. Also, suppose that [math]\displaystyle{ \mu }[/math] may be chosen to be supported on the set of monotone maps [math]\displaystyle{ f:S\to S }[/math]. Then, since [math]\displaystyle{ F_n }[/math] is monotone, [math]\displaystyle{ |F_n(S)|=1 }[/math] if and only if [math]\displaystyle{ F_n(s_0)=F_n(s_1) }[/math], so checking for coalescence requires following only two trajectories. The algorithm can proceed by choosing [math]\displaystyle{ n:=n_0 }[/math] for some constant [math]\displaystyle{ n_0 }[/math], sampling the maps [math]\displaystyle{ f_{-1},\dots,f_{-n} }[/math], and outputting [math]\displaystyle{ F_n(s_0) }[/math] if [math]\displaystyle{ F_n(s_0)=F_n(s_1) }[/math]. If [math]\displaystyle{ F_n(s_0)\ne F_n(s_1) }[/math], the algorithm doubles [math]\displaystyle{ n }[/math] and repeats as necessary until an output is obtained. (Crucially, the algorithm does not resample the maps [math]\displaystyle{ f_{-j} }[/math] that were already sampled; it reuses the previously sampled maps when needed, as in the sketch below.)
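
A minimal sketch of the monotone case, using the same hypothetical walk on [math]\displaystyle{ \{0,1,2,3\} }[/math] as above: its state space is totally (hence partially) ordered, with [math]\displaystyle{ s_0=0 }[/math] and [math]\displaystyle{ s_1=3 }[/math], and the single-variate update is monotone, so only the two extreme trajectories need to be simulated.

```python
import random

# A minimal sketch of monotone CFTP for the same illustrative walk on
# {0, 1, 2, 3}, with minimal element s0 = 0 and maximal element s1 = 3.
# Monotonicity means only the two extreme trajectories must be tracked.

S_MIN, S_MAX = 0, 3

def step(s, u):
    """One coupled, monotone step of the walk, driven by the variate u."""
    return min(s + 1, S_MAX) if u < 0.5 else max(s - 1, S_MIN)

def monotone_cftp(seed=0):
    rng = random.Random(seed)
    us = []   # us[j] drives f_{-(j+1)}; reused across rounds, never resampled
    n = 1
    while True:
        while len(us) < n:
            us.append(rng.random())
        lo, hi = S_MIN, S_MAX
        for j in reversed(range(n)):   # apply f_{-n} first, f_{-1} last
            lo, hi = step(lo, us[j]), step(hi, us[j])
        if lo == hi:      # F_n(s0) = F_n(s1), hence |F_n(S)| = 1
            return lo     # a perfect sample from pi
        n *= 2            # double n, keeping the earlier variates

print(monotone_cftp())
```

The payoff of the monotone shortcut is that the work per round no longer grows with [math]\displaystyle{ |S| }[/math]; this is what makes the method practical for very large partially ordered state spaces, such as configurations of the Ising model, a motivating application of Propp and Wilson.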

References

  • Propp, James Gary; Wilson, David Bruce (1996), "Exact sampling with coupled Markov chains and applications to statistical mechanics", Random Structures and Algorithms 9 (1–2): 223–252
  • Propp, James; Wilson, David (1998), "Coupling from the past: a user's guide", Microsurveys in discrete probability (Princeton, NJ, 1997), DIMACS Ser. Discrete Math. Theoret. Comput. Sci., 41, Providence, R.I.: American Mathematical Society, pp. 181–192, doi:10.1090/dimacs/041/09, ISBN 9780821808276