Auxiliary particle filter

From HandWiki

The auxiliary particle filter is a particle filtering algorithm introduced by Pitt and Shephard in 1999 to improve some deficiencies of the sequential importance resampling (SIR) algorithm when dealing with tailed observation densities.

Motivation

Particle filters approximate continuous random variable by [math]\displaystyle{ M }[/math] particles with discrete probability mass [math]\displaystyle{ \pi_t }[/math], say [math]\displaystyle{ 1/M }[/math] for uniform distribution. The random sampled particles can be used to approximate the probability density function of the continuous random variable if the value [math]\displaystyle{ M\rightarrow\infin }[/math].

The empirical prediction density is produced as the weighted summation of these particles:[1]

[math]\displaystyle{ \widehat{f}(\alpha_{t+1}|Y_t)=\sum_{j=1}^Mf(\alpha_{t+1}|\alpha_t^j)\pi_t^j }[/math], and we can view it as the "prior" density. Note that the particles are assumed to have the same weight [math]\displaystyle{ \pi_t^j=\frac{1}{M} }[/math].

Combining the prior density [math]\displaystyle{ \widehat{f}(\alpha_{t+1}|Y_t) }[/math] and the likelihood [math]\displaystyle{ f(y_{t+1}|\alpha_{t+1}) }[/math], the empirical filtering density can be produced as:

[math]\displaystyle{ \widehat{f}(\alpha_{t+1}|Y_{t+1})=\frac {f(y_{t+1}|\alpha_{t+1}) \widehat{f}(\alpha_{t+1}|Y_t)}{f(y_{t+1}|Y_t)} \propto f(y_{t+1}|\alpha_{t+1}) \sum_{j=1}^Mf(\alpha_{t+1}|\alpha_t^j)\pi_t^j }[/math], where [math]\displaystyle{ f(y_{t+1}|Y_t) =\int f(y_{t+1}|\alpha_{t+1})dF(\alpha_{t+1}|Y_t) }[/math].

On the other hand, the true filtering density which we want to estimate is

[math]\displaystyle{ f(\alpha_{t+1}|Y_{t+1})=\frac{f(y_{t+1}|\alpha_{t+1})f(\alpha_{t+1}|Y_t)}{f(y_{t+1}|Y_t)} }[/math].

The prior density [math]\displaystyle{ \widehat{f}(\alpha_{t+1}|Y_t) }[/math] can be used to approximate the true filtering density [math]\displaystyle{ f(\alpha_{t+1}|Y_{t+1}) }[/math]:

  • The particle filters draw [math]\displaystyle{ R }[/math] samples from the prior density [math]\displaystyle{ \widehat{f}(\alpha_{t+1}|Y_t) }[/math]. Each sample are drawn with equal probability.
  • Assign each sample with the weights [math]\displaystyle{ \pi_j=\frac{\omega_j}{\sum_{i=1}^R\omega_i},\omega_j=f(y|\alpha^j) }[/math]. The weights represent the likelihood function [math]\displaystyle{ f(y_{t+1}|\alpha_{t+1}) }[/math].
  • If the number [math]\displaystyle{ R\rightarrow\infin }[/math], than the samples converge to the desired true filtering density.
  • The [math]\displaystyle{ R }[/math] particles are resampled to [math]\displaystyle{ M }[/math] particles with the weight [math]\displaystyle{ \pi_j }[/math].

The weakness of the particle filters includes:

  • If the weight {[math]\displaystyle{ \omega_j }[/math]} has a large variance, the sample amount [math]\displaystyle{ R }[/math] must be large enough for the samples to approximate the empirical filtering density. In other words, while the weight is widely distributed, the SIR method will be imprecise and the adaption is difficult.

Therefore, the auxiliary particle filter is proposed to solve this problem.

Auxiliary particle filter

Auxiliary variable

Comparing with the empirical filtering density which has [math]\displaystyle{ \widehat{f}(\alpha_{t+1}|Y_{t+1})\propto f(y_{t+1}|\alpha_{t+1}) \sum_{j=1}^Mf(\alpha_{t+1}|\alpha_t^j)\pi_t^j }[/math],

we now define [math]\displaystyle{ \widehat{f}(\alpha_{t+1},k|Y_{t+1})\propto f(y_{t+1}|\alpha_{t+1}) f(\alpha_{t+1}|\alpha_t^k)\pi^k }[/math], where [math]\displaystyle{ k=1,...,M }[/math].

Being aware that [math]\displaystyle{ \widehat{f}(\alpha_{t+1}|Y_{t+1}) }[/math] is formed by the summation of [math]\displaystyle{ M }[/math] particles, the auxiliary variable [math]\displaystyle{ k }[/math] represents one specific particle. With the aid of [math]\displaystyle{ k }[/math], we can form a set of samples which has the distribution [math]\displaystyle{ g(\alpha_{t+1},k|Y_{t+1}) }[/math]. Then, we draw from these sample set [math]\displaystyle{ g(\alpha_{t+1},k|Y_{t+1}) }[/math] instead of directly from [math]\displaystyle{ \widehat{f}(\alpha_{t+1}|Y_{t+1}) }[/math]. In other words, the samples are drawn from [math]\displaystyle{ \widehat{f}(\alpha_{t+1}|Y_{t+1}) }[/math] with different probability. The samples are ultimately utilized to approximate [math]\displaystyle{ f(\alpha_{t+1}|Y_{t+1}) }[/math].

Take the SIR method for example:

  • The particle filters draw [math]\displaystyle{ R }[/math] samples from [math]\displaystyle{ g(\alpha_{t+1},k|Y_{t+1}) }[/math].
  • Assign each samples with the weight [math]\displaystyle{ \pi_j=\frac{\omega_j}{\sum_{i=1}^R\omega_i}, \omega_j=\frac{f(y_{t+1}|\alpha_{t+1}^j)f(\alpha_{t+1}^j|\alpha_t^k)}{g(\alpha_{t+1}^j,k^j|Y_{t+1})} }[/math].
  • By controlling [math]\displaystyle{ y_{t+1} }[/math] and [math]\displaystyle{ \alpha_t^k }[/math], the weights are adjusted to be even.
  • Similarly, the [math]\displaystyle{ R }[/math] particles are resampled to [math]\displaystyle{ M }[/math] particles with the weight [math]\displaystyle{ \pi_j }[/math].

The original particle filters draw samples from the prior density, while the auxiliary filters draw from the joint distribution of the prior density and the likelihood. In other words, the auxiliary particle filters avoid the circumstance which the particles are generated in the regions with low likelihood. As a result, the samples can approximate [math]\displaystyle{ f(\alpha_{t+1}|Y_{t+1}) }[/math] more precisely.

Selection of the auxiliary variable

The selection of the auxiliary variable affects [math]\displaystyle{ g(\alpha_{t+1},k|Y_{t+1}) }[/math] and controls the distribution of the samples. A possible selection of [math]\displaystyle{ g(\alpha_{t+1},k|Y_{t+1}) }[/math] can be:
[math]\displaystyle{ g(\alpha_{t+1},k|Y_{t+1})\propto f(y_{t+1}|\mu_{t+1}^k) f(\alpha_{t+1}|\alpha_t^k)\pi^k }[/math], where [math]\displaystyle{ k=1,...,M }[/math] and [math]\displaystyle{ \mu_{t+1}^k }[/math] is the mean.

We sample from [math]\displaystyle{ g(\alpha_{t+1},k|Y_{t+1}) }[/math] to approximate [math]\displaystyle{ f(\alpha_{t+1}|Y_{t+1}) }[/math] by the following procedure:

  • First, we assign probabilities to the indexes of [math]\displaystyle{ f(\alpha_{t+1}|\alpha_t^k) }[/math]. We named these probabilities as the first-stage weights [math]\displaystyle{ \lambda_k }[/math], which are proportional to [math]\displaystyle{ g(k|Y_{t+1})\propto \pi^kf(y_{t+1}|\mu_{t+1}^k) }[/math].
  • Then, we draw [math]\displaystyle{ R }[/math] samples from [math]\displaystyle{ f(\alpha_{t+1}|\alpha_t^k) }[/math] with the weighted indexes. By doing so, we are actually drawing the samples from [math]\displaystyle{ g(\alpha_{t+1},k|Y_{t+1}) }[/math].
  • Moreover, we reassign the second-stage weights [math]\displaystyle{ \pi_j=\frac{\omega_j}{\sum_{i=1}^R\omega_i} }[/math] as the probabilities of the [math]\displaystyle{ R }[/math] samples, where [math]\displaystyle{ \omega_j=\frac{f(y_{t+1}|\alpha_{t+1}^j)}{f(y_{t+1}|\mu_{t+1}^j)} }[/math]. The weights are aim to compensate the effect of [math]\displaystyle{ \mu_{t+1}^k }[/math].
  • Finally, the [math]\displaystyle{ R }[/math] particles are resampled to [math]\displaystyle{ M }[/math] particles with the weights [math]\displaystyle{ \pi_j }[/math].

Following the procedure, we draw the [math]\displaystyle{ R }[/math] samples from [math]\displaystyle{ g(\alpha_{t+1},k|Y_{t+1}) }[/math]. Since [math]\displaystyle{ g(\alpha_{t+1},k|Y_{t+1}) }[/math] is closely related to the mean [math]\displaystyle{ \mu_{t+1}^k }[/math], it has high conditional likelihood. As a result, the sampling procedure is more efficient and the value [math]\displaystyle{ R }[/math] can be reduced.

Other point of view

Assume that the filtered posterior is described by the following M weighted samples:

[math]\displaystyle{ p(x_t|z_{1:t}) \approx \sum_{i=1}^M \omega^{(i)}_t \delta \left( x_t - x^{(i)}_t \right). }[/math]

Then, each step in the algorithm consists of first drawing a sample of the particle index [math]\displaystyle{ k }[/math] which will be propagated from [math]\displaystyle{ t-1 }[/math] into the new step [math]\displaystyle{ t }[/math]. These indexes are auxiliary variables only used as an intermediary step, hence the name of the algorithm. The indexes are drawn according to the likelihood of some reference point [math]\displaystyle{ \mu^{(i)}_t }[/math] which in some way is related to the transition model [math]\displaystyle{ x_t|x_{t-1} }[/math] (for example, the mean, a sample, etc.):

[math]\displaystyle{ k^{(i)} \sim P(i=k|z_t) \propto \omega^{(i)}_t p( z_t | \mu^{(i)}_t ) }[/math]

This is repeated for [math]\displaystyle{ i=1,2,\dots,M }[/math], and using these indexes we can now draw the conditional samples:

[math]\displaystyle{ x_t^{(i)} \sim p( x | x^{k^{(i)}}_{t-1}). }[/math]

Finally, the weights are updated to account for the mismatch between the likelihood at the actual sample and the predicted point [math]\displaystyle{ \mu_t^{k^{(i)}} }[/math]:

[math]\displaystyle{ \omega_t^{(i)} \propto \frac{p( z_t | x^{(i)}_t) } { p( z_t | \mu^{k^{(i)}}_t) }. }[/math]

References

  1. Pitt, Michael K.; Shephard, Neil. "Filtering Via Simulation: Auxiliary Particle Filters". Journal of the American Statistical Association. http://scholar.harvard.edu/files/filterSim.pdf. 

Sources