Integrated nested Laplace approximations

Integrated nested Laplace approximations (INLA) is a method for approximate Bayesian inference based on Laplace's method.[1] It is designed for the class of latent Gaussian models (LGMs), for which it can be a fast and accurate alternative to Markov chain Monte Carlo (MCMC) methods for computing posterior marginal distributions.[2][3][4] Because it remains relatively fast even for large data sets in many problem settings, INLA has become a popular inference method in applied statistics, in particular in spatial statistics, ecology, and epidemiology.[5][6][7] It is also possible to combine INLA with a finite element method solution of a stochastic partial differential equation to study, for example, spatial point processes and species distribution models.[8][9] The INLA method is implemented in the R-INLA R package.[10]

Latent Gaussian models

Let [math]\displaystyle{ \boldsymbol{y}=(y_1,\dots,y_n) }[/math] denote the response variable (that is, the observations), which belongs to an exponential family, with the mean [math]\displaystyle{ \mu_i }[/math] (of [math]\displaystyle{ y_i }[/math]) linked to a linear predictor [math]\displaystyle{ \eta_i }[/math] via an appropriate link function. The linear predictor can take the form of a (Bayesian) additive model. All latent effects (the linear predictor, the intercept, the coefficients of any covariates, and so on) are collectively denoted by the vector [math]\displaystyle{ \boldsymbol{x} }[/math]. The hyperparameters of the model are denoted by [math]\displaystyle{ \boldsymbol{\theta} }[/math]. As usual in Bayesian statistics, [math]\displaystyle{ \boldsymbol{x} }[/math] and [math]\displaystyle{ \boldsymbol{\theta} }[/math] are random variables with prior distributions.

The observations are assumed to be conditionally independent given [math]\displaystyle{ \boldsymbol{x} }[/math] and [math]\displaystyle{ \boldsymbol{\theta} }[/math]: [math]\displaystyle{ \pi(\boldsymbol{y} |\boldsymbol{x}, \boldsymbol{\theta}) = \prod_{i \in \mathcal{I}}\pi(y_i | \eta_i, \boldsymbol{\theta}), }[/math] where [math]\displaystyle{ \mathcal{I} }[/math] is the set of indices for observed elements of [math]\displaystyle{ \boldsymbol{y} }[/math] (some elements may be unobserved, and for these INLA computes a posterior predictive distribution). Note that the linear predictor [math]\displaystyle{ \boldsymbol{\eta} }[/math] is part of [math]\displaystyle{ \boldsymbol{x} }[/math].
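As a concrete toy example (an illustrative assumption, not a model from the references), the following sketch simulates data from a simple latent Gaussian model: a random-walk latent field, a log link, and conditionally independent Poisson counts.

```python
# Toy latent Gaussian model (illustrative only): latent RW1 field x,
# linear predictor eta_i = x_i, and conditionally independent observations
# y_i | eta_i ~ Poisson(exp(eta_i)), i.e. a log link for the Poisson mean.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.cumsum(rng.normal(scale=0.1, size=n))  # RW1 latent field (a GMRF)
eta = x                                       # linear predictor (part of x)
y = rng.poisson(np.exp(eta))                  # independent given eta
```

Here the whole vector [math]\displaystyle{ \boldsymbol{x} }[/math] plays the role of the latent field, and the random-walk standard deviation would be a hyperparameter in [math]\displaystyle{ \boldsymbol{\theta} }[/math].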

For the model to be a latent Gaussian model, it is assumed that [math]\displaystyle{ \boldsymbol{x}|\boldsymbol{\theta} }[/math] is a Gaussian Markov random field (GMRF)[1] (that is, a multivariate Gaussian with additional conditional independence properties) with probability density [math]\displaystyle{ \pi(\boldsymbol{x} | \boldsymbol{\theta}) \propto \left| \boldsymbol{Q_{\theta}} \right|^{1/2} \exp \left( -\frac{1}{2} \boldsymbol{x}^T \boldsymbol{Q_{\theta}} \boldsymbol{x} \right), }[/math] where [math]\displaystyle{ \boldsymbol{Q_{\theta}} }[/math] is a [math]\displaystyle{ \boldsymbol{\theta} }[/math]-dependent precision matrix and [math]\displaystyle{ \left| \boldsymbol{Q_{\theta}} \right| }[/math] is its determinant; the precision matrix is sparse as a consequence of the GMRF assumption. The prior distribution [math]\displaystyle{ \pi(\boldsymbol{\theta}) }[/math] for the hyperparameters need not be Gaussian, but the number of hyperparameters, [math]\displaystyle{ m=\mathrm{dim}(\boldsymbol{\theta}) }[/math], is assumed to be small (say, less than 15).
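Because a GMRF has a sparse precision matrix, [math]\displaystyle{ \boldsymbol{Q_{\theta}} }[/math] can be stored and factorized cheaply. As a hedged sketch (the RW1 prior and the helper name rw1_precision are this example's own choices, not part of any library), the precision matrix of a first-order random walk is tridiagonal:

```python
# Sparse precision matrix of a first-order random walk (RW1) prior:
# Q_theta = tau * D^T D, where D is the (n-1) x n first-difference matrix.
# The result is tridiagonal, illustrating the sparsity of a GMRF precision.
import numpy as np
import scipy.sparse as sp

def rw1_precision(n: int, tau: float) -> sp.csc_matrix:
    D = sp.diags([-np.ones(n - 1), np.ones(n - 1)], [0, 1], shape=(n - 1, n))
    return (tau * (D.T @ D)).tocsc()

Q = rw1_precision(1000, tau=2.0)
print(Q.nnz / Q.shape[0] ** 2)  # fraction of non-zeros is about 3/n
```

Note that the RW1 precision is rank-deficient (an intrinsic GMRF); in practice a linear constraint or a small diagonal perturbation handles this.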

Approximate Bayesian inference with INLA

In Bayesian inference, one wants to solve for the posterior distribution of the latent variables [math]\displaystyle{ \boldsymbol{x} }[/math] and the hyperparameters [math]\displaystyle{ \boldsymbol{\theta} }[/math]. Applying Bayes' theorem, [math]\displaystyle{ \pi(\boldsymbol{x}, \boldsymbol{\theta} | \boldsymbol{y}) = \frac{\pi(\boldsymbol{y} | \boldsymbol{x}, \boldsymbol{\theta})\pi(\boldsymbol{x} | \boldsymbol{\theta}) \pi(\boldsymbol{\theta}) }{\pi(\boldsymbol{y})}, }[/math] the joint posterior distribution of [math]\displaystyle{ \boldsymbol{x} }[/math] and [math]\displaystyle{ \boldsymbol{\theta} }[/math] is given by [math]\displaystyle{ \begin{align} \pi(\boldsymbol{x}, \boldsymbol{\theta} | \boldsymbol{y}) & \propto \pi(\boldsymbol{\theta})\pi(\boldsymbol{x}|\boldsymbol{\theta}) \prod_i \pi(y_i | \eta_i, \boldsymbol{\theta}) \\ & \propto \pi(\boldsymbol{\theta}) \left| \boldsymbol{Q_{\theta}} \right|^{1/2} \exp \left( -\frac{1}{2} \boldsymbol{x}^T \boldsymbol{Q_{\theta}} \boldsymbol{x} + \sum_i \log \left[\pi(y_i | \eta_i, \boldsymbol{\theta}) \right] \right). \end{align} }[/math] Obtaining the exact posterior is generally intractable. In INLA, the main aim is instead to approximate the posterior marginals [math]\displaystyle{ \begin{array}{rcl} \pi(x_i | \boldsymbol{y}) &=& \int \pi(x_i | \boldsymbol{\theta}, \boldsymbol{y}) \pi(\boldsymbol{\theta} | \boldsymbol{y}) d\boldsymbol{\theta} \\ \pi(\theta_j | \boldsymbol{y}) &=& \int \pi(\boldsymbol{\theta} | \boldsymbol{y}) d \boldsymbol{\theta}_{-j} , \end{array} }[/math] where [math]\displaystyle{ \boldsymbol{\theta}_{-j} = \left(\theta_1, \dots, \theta_{j-1}, \theta_{j+1}, \dots, \theta_m \right) }[/math].

A key idea of INLA is to construct nested approximations given by [math]\displaystyle{ \begin{array}{rcl} \widetilde{\pi}(x_i | \boldsymbol{y}) &=& \int \widetilde{\pi}(x_i | \boldsymbol{\theta}, \boldsymbol{y}) \widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y}) d\boldsymbol{\theta} \\ \widetilde{\pi}(\theta_j | \boldsymbol{y}) &=& \int \widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y}) d \boldsymbol{\theta}_{-j} , \end{array} }[/math] where [math]\displaystyle{ \widetilde{\pi}(\cdot | \cdot) }[/math] is an approximated posterior density. The approximation to the marginal density [math]\displaystyle{ \pi(x_i | \boldsymbol{y}) }[/math] is obtained in a nested fashion by first approximating [math]\displaystyle{ \pi(\boldsymbol{\theta} | \boldsymbol{y}) }[/math] and [math]\displaystyle{ \pi(x_i | \boldsymbol{\theta}, \boldsymbol{y}) }[/math], and then numerically integrating out [math]\displaystyle{ \boldsymbol{\theta} }[/math] as [math]\displaystyle{ \begin{align} \widetilde{\pi}(x_i | \boldsymbol{y}) = \sum_k \widetilde{\pi}\left( x_i | \boldsymbol{\theta}_k, \boldsymbol{y} \right) \times \widetilde{\pi}( \boldsymbol{\theta}_k | \boldsymbol{y}) \times \Delta_k, \end{align} }[/math] where the summation is over a set of values [math]\displaystyle{ \boldsymbol{\theta}_k }[/math] of [math]\displaystyle{ \boldsymbol{\theta} }[/math], with integration weights [math]\displaystyle{ \Delta_k }[/math]. The approximation of [math]\displaystyle{ \pi(\theta_j | \boldsymbol{y}) }[/math] is computed by numerically integrating [math]\displaystyle{ \boldsymbol{\theta}_{-j} }[/math] out of [math]\displaystyle{ \widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y}) }[/math].
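The following sketch illustrates this outer integration with stand-in densities (both the shape of [math]\displaystyle{ \widetilde{\pi}(\boldsymbol{\theta}_k | \boldsymbol{y}) }[/math] and the Gaussian conditionals are arbitrary placeholders, not INLA output):

```python
# Outer numerical integration: the marginal of x_i is a finite mixture of the
# conditional approximations pi~(x_i | theta_k, y), weighted by
# pi~(theta_k | y) * Delta_k over a grid of hyperparameter values theta_k.
import numpy as np
from scipy.stats import norm

theta_k = np.linspace(0.5, 4.0, 15)   # grid of hyperparameter values
delta = theta_k[1] - theta_k[0]       # equal integration weights Delta_k
w = np.exp(-((theta_k - 2.0) ** 2))   # stand-in for pi~(theta_k | y)
w /= w.sum() * delta                  # normalize as a density on the grid

xi = np.linspace(-4.0, 4.0, 400)
# Stand-in conditionals pi~(x_i | theta_k, y): Gaussians whose spread depends on theta.
cond = np.array([norm.pdf(xi, scale=1.0 / np.sqrt(t)) for t in theta_k])

marginal = delta * (w[:, None] * cond).sum(axis=0)  # the weighted sum above
print(np.trapz(marginal, xi))                       # integrates to ~1
```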

To get the approximate distribution [math]\displaystyle{ \widetilde{\pi}(\boldsymbol{\theta} | \boldsymbol{y}) }[/math], one can use the identity [math]\displaystyle{ \begin{align} {\pi}( \boldsymbol{\theta} | \boldsymbol{y}) = \frac{\pi\left(\boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{y} \right)}{\pi\left(\boldsymbol{x} | \boldsymbol{\theta}, \boldsymbol{y} \right) \pi(\boldsymbol{y})}, \end{align} }[/math] which holds for any value of [math]\displaystyle{ \boldsymbol{x} }[/math], as the starting point. Then [math]\displaystyle{ \widetilde{\pi}( \boldsymbol{\theta} | \boldsymbol{y}) }[/math] is obtained at a specific value of the hyperparameters [math]\displaystyle{ \boldsymbol{\theta} = \boldsymbol{\theta}_k }[/math] with Laplace's approximation[1] [math]\displaystyle{ \begin{align} \widetilde{\pi}( \boldsymbol{\theta}_k | \boldsymbol{y}) &\propto \left . \frac{\pi\left(\boldsymbol{x}, \boldsymbol{\theta}_k, \boldsymbol{y} \right)}{\widetilde{\pi}_G\left(\boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right)} \right \vert_{\boldsymbol{x} = \boldsymbol{x}^{*}(\boldsymbol{\theta}_k)}, \\ & \propto \left . \frac{\pi(\boldsymbol{y} | \boldsymbol{x}, \boldsymbol{\theta}_k)\pi(\boldsymbol{x} | \boldsymbol{\theta}_k) \pi(\boldsymbol{\theta}_k)}{\widetilde{\pi}_G\left(\boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right)} \right \vert_{\boldsymbol{x} = \boldsymbol{x}^{*}(\boldsymbol{\theta}_k)}, \end{align} }[/math] where [math]\displaystyle{ \widetilde{\pi}_G\left(\boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right) }[/math] is the Gaussian approximation to [math]\displaystyle{ {\pi}\left(\boldsymbol{x} | \boldsymbol{\theta}_k, \boldsymbol{y} \right) }[/math] and [math]\displaystyle{ \boldsymbol{x}^{*}(\boldsymbol{\theta}_k) }[/math] is its mode at the given [math]\displaystyle{ \boldsymbol{\theta}_k }[/math]. The mode can be found numerically, for example with the Newton-Raphson method.
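To make these steps concrete, here is a hedged sketch for a toy Poisson model with an RW1-type prior (all modelling choices are this example's assumptions, and the code is a simplified illustration rather than R-INLA's implementation). A Newton-Raphson iteration finds the mode [math]\displaystyle{ \boldsymbol{x}^{*}(\boldsymbol{\theta}_k) }[/math], the Gaussian approximation has precision equal to the negative Hessian at the mode, and the log of [math]\displaystyle{ \widetilde{\pi}( \boldsymbol{\theta}_k | \boldsymbol{y}) }[/math] is the log joint minus the log density of the Gaussian approximation, both evaluated at the mode.

```python
# Laplace approximation of pi(theta | y) at one value tau (a toy sketch):
# Poisson likelihood y_i ~ Poisson(exp(x_i)) with an RW1-type GMRF prior on x.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def log_post_theta(tau, y, iters=20):
    n = len(y)
    # Sparse precision Q_theta = tau * D^T D, plus a tiny ridge (RW1 is intrinsic).
    D = sp.diags([-np.ones(n - 1), np.ones(n - 1)], [0, 1], shape=(n - 1, n))
    Q = (tau * (D.T @ D) + 1e-6 * sp.eye(n)).tocsc()
    # Newton-Raphson for the mode x*(theta) of pi(x | theta, y).
    x = np.zeros(n)
    for _ in range(iters):
        grad = -Q @ x + (y - np.exp(x))        # gradient of log pi(x | theta, y)
        H = (Q + sp.diags(np.exp(x))).tocsc()  # negative Hessian = precision of pi_G
        x = x + spsolve(H, grad)
    # log pi(x*, theta, y) up to constants (a flat prior on tau is assumed here).
    log_joint = (0.5 * np.linalg.slogdet(Q.toarray())[1]
                 - 0.5 * x @ (Q @ x) + np.sum(y * x - np.exp(x)))
    # log pi_G(x* | theta, y) at its own mode: -(n/2) log(2 pi) + (1/2) log|H|.
    log_gauss = -0.5 * n * np.log(2 * np.pi) + 0.5 * np.linalg.slogdet(H.toarray())[1]
    return log_joint - log_gauss               # log pi~(theta | y), unnormalized

y = np.random.default_rng(1).poisson(2.0, size=50)
print([round(log_post_theta(t, y), 2) for t in (0.5, 1.0, 2.0)])
```

Evaluating this function over a grid of [math]\displaystyle{ \boldsymbol{\theta}_k }[/math] values, exponentiating, and normalizing yields the weights used in the numerical integration above. (For large fields, the dense log-determinants in this sketch would be replaced by sparse Cholesky factorizations.)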

The key to the Laplace approximation above is that the Gaussian approximation is applied to the full conditional of [math]\displaystyle{ \boldsymbol{x} }[/math] in the denominator, which is usually close to Gaussian due to the GMRF property of [math]\displaystyle{ \boldsymbol{x} }[/math]. Applying the approximation here improves the accuracy of the method, since the posterior [math]\displaystyle{ {\pi}( \boldsymbol{\theta} | \boldsymbol{y}) }[/math] itself need not be close to Gaussian; the Gaussian approximation is therefore not applied to [math]\displaystyle{ {\pi}( \boldsymbol{\theta} | \boldsymbol{y}) }[/math] directly. The second important property of a GMRF, the sparsity of the precision matrix [math]\displaystyle{ \boldsymbol{Q}_{\boldsymbol{\theta}_k} }[/math], is what makes the computation of [math]\displaystyle{ \widetilde{\pi}( \boldsymbol{\theta}_k | \boldsymbol{y}) }[/math] efficient for each value [math]\displaystyle{ {\boldsymbol{\theta}_k} }[/math].[1]

Obtaining the approximate distribution [math]\displaystyle{ \widetilde{\pi}\left( x_i | \boldsymbol{\theta}_k, \boldsymbol{y} \right) }[/math] is more involved, and the INLA method provides three options for this: a Gaussian approximation, a full Laplace approximation, or a simplified Laplace approximation.[1] For the numerical integration needed to obtain [math]\displaystyle{ \widetilde{\pi}(x_i | \boldsymbol{y}) }[/math], three options are likewise available: a grid search, a central composite design, or an empirical Bayes approach.[1] The Gaussian option is sketched below.
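Under the simplest option, the Gaussian approximation, [math]\displaystyle{ \widetilde{\pi}\left( x_i | \boldsymbol{\theta}_k, \boldsymbol{y} \right) }[/math] is simply the i-th marginal of [math]\displaystyle{ \widetilde{\pi}_G }[/math]: a Gaussian with mean [math]\displaystyle{ x^{*}_i(\boldsymbol{\theta}_k) }[/math] and variance given by the corresponding diagonal entry of the inverse of the precision of [math]\displaystyle{ \widetilde{\pi}_G }[/math]. A minimal sketch with a toy precision matrix:

```python
# Gaussian-approximation option for pi~(x_i | theta_k, y): the i-th marginal
# of pi_G is Normal(x*_i, [H^{-1}]_{ii}), H being the precision of pi_G.
import numpy as np
from scipy.stats import norm

H = np.array([[ 2.0, -1.0,  0.0],   # toy precision of pi_G(x | theta_k, y)
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
x_star = np.array([0.3, -0.1, 0.5])  # toy mode x*(theta_k)
Sigma = np.linalg.inv(H)             # marginal variances on the diagonal

i = 1
grid = np.linspace(-3.0, 3.0, 200)
density = norm.pdf(grid, loc=x_star[i], scale=np.sqrt(Sigma[i, i]))
```

In real problems the needed diagonal entries of the inverse are obtained from sparse-matrix computations rather than the dense inverse used in this toy sketch.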

References

  1. Rue, Håvard; Martino, Sara; Chopin, Nicolas (2009). "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations". J. R. Statist. Soc. B 71 (2): 319–392. doi:10.1111/j.1467-9868.2008.00700.x. 
  2. Taylor, Benjamin M.; Diggle, Peter J. (2014). "INLA or MCMC? A tutorial and comparative evaluation for spatial prediction in log-Gaussian Cox processes". Journal of Statistical Computation and Simulation 84 (10): 2266–2284. doi:10.1080/00949655.2013.788653. 
  3. Teng, M.; Nathoo, F.; Johnson, T. D. (2017). "Bayesian computation for Log-Gaussian Cox processes: a comparative analysis of methods". Journal of Statistical Computation and Simulation 87 (11): 2227–2252. doi:10.1080/00949655.2017.1326117. PMID 29200537. 
  4. Wang, Xiaofeng; Yue, Yu Ryan; Faraway, Julian J. (2018). Bayesian Regression Modeling with INLA. Chapman and Hall/CRC. ISBN 9781498727259. 
  5. Blangiardo, Marta; Cameletti, Michela (2015). Spatial and Spatio‐temporal Bayesian Models with R‐INLA. John Wiley & Sons, Ltd. ISBN 9781118326558. 
  6. Opitz, T. (2017). "Latent Gaussian modeling and INLA: A review with focus on space-time applications". Journal de la Société Française de Statistique 158: 62–85. 
  7. Moraga, Paula (2019). Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny. Chapman and Hall/CRC. ISBN 9780367357955. 
  8. Lindgren, Finn; Rue, Håvard; Lindström, Johan (2011). "An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach". J. R. Statist. Soc. B 73 (4): 423–498. doi:10.1111/j.1467-9868.2011.00777.x. 
  9. Lezama-Ochoa, N.; Grazia Pennino, M.; Hall, M. A.; Lopez, J.; Murua, H. (2020). "Using a Bayesian modelling approach (INLA-SPDE) to predict the occurrence of the Spinetail Devil Ray (Mobula mobular)". Scientific Reports 10 (1): 18822. doi:10.1038/s41598-020-73879-3. PMID 33139744. 
  10. "R-INLA Project". https://www.r-inla.org. 

Further reading

  • Gomez-Rubio, Virgilio (2021). Bayesian inference with INLA. Chapman and Hall/CRC. ISBN 978-1-03-217453-2.