Normal-inverse-Wishart distribution
Notation | [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) }[/math]
---|---
Parameters | [math]\displaystyle{ \boldsymbol\mu_0\in\mathbb{R}^D }[/math] location (real vector); [math]\displaystyle{ \lambda \gt 0 }[/math] (real); [math]\displaystyle{ \boldsymbol\Psi \in\mathbb{R}^{D\times D} }[/math] inverse scale matrix (pos. def.); [math]\displaystyle{ \nu \gt D-1 }[/math] (real)
Support | [math]\displaystyle{ \boldsymbol\mu\in\mathbb{R}^D ;\ \boldsymbol\Sigma \in\mathbb{R}^{D\times D} }[/math] covariance matrix (pos. def.)
PDF | [math]\displaystyle{ f(\boldsymbol\mu,\boldsymbol\Sigma|\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) = \mathcal{N}(\boldsymbol\mu|\boldsymbol\mu_0,\tfrac{1}{\lambda}\boldsymbol\Sigma)\ \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu) }[/math]
In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).[1]
Definition
Suppose
- [math]\displaystyle{ \boldsymbol\mu|\boldsymbol\mu_0,\lambda,\boldsymbol\Sigma \sim \mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\frac{1}{\lambda}\boldsymbol\Sigma\right) }[/math]
has a multivariate normal distribution with mean [math]\displaystyle{ \boldsymbol\mu_0 }[/math] and covariance matrix [math]\displaystyle{ \tfrac{1}{\lambda}\boldsymbol\Sigma }[/math], where
- [math]\displaystyle{ \boldsymbol\Sigma|\boldsymbol\Psi,\nu \sim \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu) }[/math]
has an inverse Wishart distribution. Then [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma) }[/math] has a normal-inverse-Wishart distribution, denoted as
- [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) . }[/math]
Characterization
Probability density function
- [math]\displaystyle{ f(\boldsymbol\mu,\boldsymbol\Sigma|\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) = \mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\frac{1}{\lambda}\boldsymbol\Sigma\right) \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu) }[/math]
The full version of the PDF is as follows:[2]
[math]\displaystyle{ f(\boldsymbol{\mu},\boldsymbol{\Sigma} | \boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu ) =\frac{\lambda^{D/2}|\boldsymbol{\Psi}|^{\nu / 2}|\boldsymbol{\Sigma}|^{-\frac{\nu + D + 2}{2}}}{(2 \pi)^{D/2}2^{\frac{\nu D}{2}}\Gamma_D(\frac{\nu}{2})}\text{exp}\left\{ -\frac{1}{2}Tr(\boldsymbol{\Psi \Sigma}^{-1})-\frac{\lambda}{2}(\boldsymbol{\mu}-\boldsymbol{\mu}_0)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu} - \boldsymbol{\mu}_0) \right\} }[/math]
Here [math]\displaystyle{ \Gamma_D(\cdot) }[/math] is the multivariate gamma function and [math]\displaystyle{ \operatorname{Tr}(\cdot) }[/math] denotes the trace of a matrix.
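The factorized form and the full formula above can be checked against each other numerically. The sketch below (parameter values are illustrative, not from the article) evaluates the density both ways using SciPy's `multivariate_normal`, `invwishart`, and `multigammaln`:

```python
# Evaluate the NIW density two ways and confirm they agree:
# (1) as N(mu | mu0, Sigma/lambda) * InvWishart(Sigma | Psi, nu),
# (2) via the explicit formula with the multivariate gamma function.
import numpy as np
from scipy.stats import multivariate_normal, invwishart
from scipy.special import multigammaln

D = 2
mu0 = np.zeros(D)          # location
lam = 2.0                  # scaling on the mean's covariance
Psi = np.eye(D)            # inverse scale matrix
nu = 5.0                   # degrees of freedom (> D - 1)

mu = np.array([0.3, -0.1])
Sigma = np.array([[1.0, 0.2], [0.2, 0.8]])

# (1) factorized form
pdf_factored = (multivariate_normal.pdf(mu, mean=mu0, cov=Sigma / lam)
                * invwishart.pdf(Sigma, df=nu, scale=Psi))

# (2) explicit formula, computed in log space for stability
_, logdet_Sigma = np.linalg.slogdet(Sigma)
_, logdet_Psi = np.linalg.slogdet(Psi)
Sinv = np.linalg.inv(Sigma)
diff = mu - mu0
log_pdf = (0.5 * D * np.log(lam) + 0.5 * nu * logdet_Psi
           - 0.5 * (nu + D + 2) * logdet_Sigma
           - 0.5 * D * np.log(2 * np.pi) - 0.5 * nu * D * np.log(2)
           - multigammaln(nu / 2, D)
           - 0.5 * np.trace(Psi @ Sinv)
           - 0.5 * lam * diff @ Sinv @ diff)

assert np.isclose(pdf_factored, np.exp(log_pdf))
```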
Properties
Marginal distributions
By construction, the marginal distribution over [math]\displaystyle{ \boldsymbol\Sigma }[/math] is an inverse Wishart distribution, and the conditional distribution over [math]\displaystyle{ \boldsymbol\mu }[/math] given [math]\displaystyle{ \boldsymbol\Sigma }[/math] is a multivariate normal distribution. The marginal distribution over [math]\displaystyle{ \boldsymbol\mu }[/math] is a multivariate t-distribution.
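Concretely, integrating out [math]\displaystyle{ \boldsymbol\Sigma }[/math] yields (in the standard parameterization of the multivariate t-distribution, as derived e.g. in Murphy's note cited below):

- [math]\displaystyle{ \boldsymbol\mu \sim t_{\nu-D+1}\left(\boldsymbol\mu_0, \frac{\boldsymbol\Psi}{\lambda(\nu-D+1)}\right) . }[/math]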
Posterior distribution of the parameters
Suppose the sampling density is a multivariate normal distribution
- [math]\displaystyle{ \boldsymbol{y_i}|\boldsymbol\mu,\boldsymbol\Sigma \sim \mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma) }[/math]
where [math]\displaystyle{ \boldsymbol{y} }[/math] is an [math]\displaystyle{ n\times p }[/math] matrix and [math]\displaystyle{ \boldsymbol{y_i} }[/math] (of length [math]\displaystyle{ p }[/math]) is row [math]\displaystyle{ i }[/math] of that matrix.
Since the mean and covariance matrix of the sampling distribution are unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly
- [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu). }[/math]
The resulting posterior distribution for the mean and covariance matrix is again normal-inverse-Wishart
- [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma|y) \sim \mathrm{NIW}(\boldsymbol\mu_n,\lambda_n,\boldsymbol\Psi_n,\nu_n), }[/math]
where
- [math]\displaystyle{ \boldsymbol\mu_n = \frac{\lambda\boldsymbol\mu_0 + n \bar{\boldsymbol y}}{\lambda+n} }[/math]
- [math]\displaystyle{ \lambda_n = \lambda + n }[/math]
- [math]\displaystyle{ \nu_n = \nu + n }[/math]
- [math]\displaystyle{ \boldsymbol\Psi_n = \boldsymbol{\Psi + S} +\frac{\lambda n}{\lambda+n} (\boldsymbol{\bar{y}-\mu_0})(\boldsymbol{\bar{y}-\mu_0})^T ~~~\mathrm{ with }~~\boldsymbol{S}= \sum_{i=1}^{n} (\boldsymbol{y_i-\bar{y}})(\boldsymbol{y_i-\bar{y}})^T }[/math].
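The update equations above can be sketched directly in code. The function name `niw_posterior` is ours, introduced for illustration; it maps data and prior hyperparameters to the posterior hyperparameters:

```python
# Conjugate NIW update: given data y (n x p) and prior hyperparameters
# (mu0, lam, Psi, nu), return the posterior hyperparameters.
import numpy as np

def niw_posterior(y, mu0, lam, Psi, nu):
    n, p = y.shape
    ybar = y.mean(axis=0)
    S = (y - ybar).T @ (y - ybar)            # scatter matrix about the sample mean
    mu_n = (lam * mu0 + n * ybar) / (lam + n)
    lam_n = lam + n
    nu_n = nu + n
    d = (ybar - mu0).reshape(-1, 1)
    Psi_n = Psi + S + (lam * n / (lam + n)) * (d @ d.T)
    return mu_n, lam_n, Psi_n, nu_n
```

Note that the posterior mean [math]\displaystyle{ \boldsymbol\mu_n }[/math] is a precision-weighted average of the prior mean and the sample mean, so for large [math]\displaystyle{ n }[/math] the data dominate.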
To sample from the joint posterior of [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma) }[/math], one first draws [math]\displaystyle{ \boldsymbol\Sigma|\boldsymbol y \sim \mathcal{W}^{-1}(\boldsymbol\Psi_n,\nu_n) }[/math] and then draws [math]\displaystyle{ \boldsymbol\mu | \boldsymbol{\Sigma,y} \sim \mathcal{N}_p(\boldsymbol\mu_n,\boldsymbol\Sigma/\lambda_n) }[/math]. To draw from the posterior predictive distribution of a new observation, one draws [math]\displaystyle{ \tilde{\boldsymbol y}|\boldsymbol{\mu,\Sigma,y} \sim \mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma) }[/math], given the already drawn values of [math]\displaystyle{ \boldsymbol\mu }[/math] and [math]\displaystyle{ \boldsymbol\Sigma }[/math].[3]
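This two-step sampling scheme can be sketched with SciPy's `invwishart` and `multivariate_normal`; the posterior hyperparameter values below are illustrative placeholders, not derived from any particular data set:

```python
# Draw one sample from the joint NIW posterior, then one posterior
# predictive observation.
import numpy as np
from scipy.stats import invwishart, multivariate_normal

rng = np.random.default_rng(42)
mu_n = np.array([0.5, -0.2])                 # illustrative posterior location
lam_n, nu_n = 11.0, 14.0                     # illustrative posterior scalars
Psi_n = np.array([[12.0, 1.0], [1.0, 9.0]])  # illustrative posterior scale matrix

# 1. Sigma | y  ~  InvWishart(Psi_n, nu_n)
Sigma = invwishart.rvs(df=nu_n, scale=Psi_n, random_state=rng)
# 2. mu | Sigma, y  ~  N(mu_n, Sigma / lam_n)
mu = multivariate_normal.rvs(mean=mu_n, cov=Sigma / lam_n, random_state=rng)
# 3. Posterior predictive: y_new | mu, Sigma, y  ~  N(mu, Sigma)
y_new = multivariate_normal.rvs(mean=mu, cov=Sigma, random_state=rng)
```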
Generating normal-inverse-Wishart random variates
Generation of random variates is straightforward:
- Sample [math]\displaystyle{ \boldsymbol\Sigma }[/math] from an inverse Wishart distribution with parameters [math]\displaystyle{ \boldsymbol\Psi }[/math] and [math]\displaystyle{ \nu }[/math]
- Sample [math]\displaystyle{ \boldsymbol\mu }[/math] from a multivariate normal distribution with mean [math]\displaystyle{ \boldsymbol\mu_0 }[/math] and covariance matrix [math]\displaystyle{ \tfrac{1}{\lambda} \boldsymbol\Sigma }[/math]
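The two steps above can be vectorized over many draws, which is useful for Monte Carlo checks (parameter values are illustrative):

```python
# Generate N variates (mu_i, Sigma_i) from NIW(mu0, lam, Psi, nu).
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)
D, N = 2, 1000
mu0, lam = np.zeros(D), 2.0
Psi, nu = np.eye(D), 5.0

# Step 1: N inverse-Wishart draws, shape (N, D, D)
Sigmas = invwishart.rvs(df=nu, scale=Psi, size=N, random_state=rng)
# Step 2: mu_i ~ N(mu0, Sigma_i / lam), via a batched Cholesky factor
L = np.linalg.cholesky(Sigmas / lam)
mus = mu0 + np.einsum('nij,nj->ni', L, rng.standard_normal((N, D)))
```

Each `mu_i` is constructed as `mu0 + L_i z` with `z` standard normal, which has exactly the covariance `Sigma_i / lam` required by step 2.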
Related distributions
- The normal-Wishart distribution is essentially the same distribution parameterized in terms of the precision matrix rather than the covariance matrix. If [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) }[/math] then [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma^{-1}) \sim \mathrm{NW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi^{-1},\nu) }[/math].
- The normal-inverse-gamma distribution is the one-dimensional equivalent.
- The multivariate normal distribution and inverse Wishart distribution are the component distributions out of which this distribution is made.
Notes
- ↑ Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution." [1]
- ↑ Simon J.D. Prince (June 2012). Computer Vision: Models, Learning, and Inference. Cambridge University Press. 3.8: "Normal inverse Wishart distribution".
- ↑ Gelman, Andrew, et al. Bayesian data analysis. Vol. 2, p.73. Boca Raton, FL, USA: Chapman & Hall/CRC, 2014.
References
- Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.
- Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution." [2]
Original source: https://en.wikipedia.org/wiki/Normal-inverse-Wishart distribution.