Normal-inverse-Wishart distribution
Notation | [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) }[/math]
---|---
Parameters | [math]\displaystyle{ \boldsymbol\mu_0\in\mathbb{R}^D }[/math] location (real vector); [math]\displaystyle{ \lambda \gt 0 }[/math] (real); [math]\displaystyle{ \boldsymbol\Psi \in\mathbb{R}^{D\times D} }[/math] inverse scale matrix (pos. def.); [math]\displaystyle{ \nu \gt D-1 }[/math] (real)
Support | [math]\displaystyle{ \boldsymbol\mu\in\mathbb{R}^D ;\ \boldsymbol\Sigma \in\mathbb{R}^{D\times D} }[/math] covariance matrix (pos. def.)
PDF | [math]\displaystyle{ f(\boldsymbol\mu,\boldsymbol\Sigma|\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) = \mathcal{N}(\boldsymbol\mu|\boldsymbol\mu_0,\tfrac{1}{\lambda}\boldsymbol\Sigma)\ \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu) }[/math]
In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).[1]
Definition
Suppose
- [math]\displaystyle{ \boldsymbol\mu|\boldsymbol\mu_0,\lambda,\boldsymbol\Sigma \sim \mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\frac{1}{\lambda}\boldsymbol\Sigma\right) }[/math]
has a multivariate normal distribution with mean [math]\displaystyle{ \boldsymbol\mu_0 }[/math] and covariance matrix [math]\displaystyle{ \tfrac{1}{\lambda}\boldsymbol\Sigma }[/math], where
- [math]\displaystyle{ \boldsymbol\Sigma|\boldsymbol\Psi,\nu \sim \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu) }[/math]
has an inverse Wishart distribution. Then [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma) }[/math] has a normal-inverse-Wishart distribution, denoted as
- [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) . }[/math]
Characterization
Probability density function
- [math]\displaystyle{ f(\boldsymbol\mu,\boldsymbol\Sigma|\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) = \mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\frac{1}{\lambda}\boldsymbol\Sigma\right) \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu) }[/math]
The full version of the PDF is as follows:[2]
[math]\displaystyle{ f(\boldsymbol{\mu},\boldsymbol{\Sigma} | \boldsymbol{\mu}_0,\lambda,\boldsymbol{\Psi},\nu ) =\frac{\lambda^{D/2}|\boldsymbol{\Psi}|^{\nu / 2}|\boldsymbol{\Sigma}|^{-\frac{\nu + D + 2}{2}}}{(2 \pi)^{D/2}2^{\frac{\nu D}{2}}\Gamma_D(\frac{\nu}{2})}\text{exp}\left\{ -\frac{1}{2}Tr(\boldsymbol{\Psi \Sigma}^{-1})-\frac{\lambda}{2}(\boldsymbol{\mu}-\boldsymbol{\mu}_0)^T\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu} - \boldsymbol{\mu}_0) \right\} }[/math]
Here [math]\displaystyle{ \Gamma_D(\cdot) }[/math] is the multivariate gamma function and [math]\displaystyle{ \operatorname{Tr}(\cdot) }[/math] denotes the trace of a matrix.
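The factorized form and the full formula above can be checked against each other numerically. The sketch below (parameter values are illustrative, not from the article) evaluates the density both ways using SciPy's `multivariate_normal`, `invwishart`, and `multigammaln`:

```python
# Evaluate the NIW density two ways and confirm they agree:
# (1) as N(mu | mu0, Sigma/lambda) * InvWishart(Sigma | Psi, nu),
# (2) via the explicit formula with the multivariate gamma function.
import numpy as np
from scipy.stats import multivariate_normal, invwishart
from scipy.special import multigammaln

D = 2
mu0 = np.zeros(D)          # location
lam = 2.0                  # scaling on the mean's covariance
Psi = np.eye(D)            # inverse scale matrix
nu = 5.0                   # degrees of freedom (> D - 1)

mu = np.array([0.3, -0.1])
Sigma = np.array([[1.0, 0.2], [0.2, 0.8]])

# (1) factorized form
pdf_factored = (multivariate_normal.pdf(mu, mean=mu0, cov=Sigma / lam)
                * invwishart.pdf(Sigma, df=nu, scale=Psi))

# (2) explicit formula, computed in log space for stability
_, logdet_Sigma = np.linalg.slogdet(Sigma)
_, logdet_Psi = np.linalg.slogdet(Psi)
Sinv = np.linalg.inv(Sigma)
diff = mu - mu0
log_pdf = (0.5 * D * np.log(lam) + 0.5 * nu * logdet_Psi
           - 0.5 * (nu + D + 2) * logdet_Sigma
           - 0.5 * D * np.log(2 * np.pi) - 0.5 * nu * D * np.log(2)
           - multigammaln(nu / 2, D)
           - 0.5 * np.trace(Psi @ Sinv)
           - 0.5 * lam * diff @ Sinv @ diff)

assert np.isclose(pdf_factored, np.exp(log_pdf))
```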
Properties
Marginal distributions
By construction, the marginal distribution over [math]\displaystyle{ \boldsymbol\Sigma }[/math] is an inverse Wishart distribution, and the conditional distribution over [math]\displaystyle{ \boldsymbol\mu }[/math] given [math]\displaystyle{ \boldsymbol\Sigma }[/math] is a multivariate normal distribution. The marginal distribution over [math]\displaystyle{ \boldsymbol\mu }[/math] is a multivariate t-distribution.
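Concretely, integrating out [math]\displaystyle{ \boldsymbol\Sigma }[/math] yields (in the standard parameterization of the multivariate t-distribution, as derived e.g. in Murphy's note cited below):

- [math]\displaystyle{ \boldsymbol\mu \sim t_{\nu-D+1}\left(\boldsymbol\mu_0, \frac{\boldsymbol\Psi}{\lambda(\nu-D+1)}\right) . }[/math]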
Posterior distribution of the parameters
Suppose the sampling density is a multivariate normal distribution
- [math]\displaystyle{ \boldsymbol{y_i}|\boldsymbol\mu,\boldsymbol\Sigma \sim \mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma) }[/math]
where [math]\displaystyle{ \boldsymbol{y} }[/math] is an [math]\displaystyle{ n\times p }[/math] matrix and [math]\displaystyle{ \boldsymbol{y_i} }[/math] (of length [math]\displaystyle{ p }[/math]) is row [math]\displaystyle{ i }[/math] of that matrix.
Since the mean and covariance matrix of the sampling distribution are unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly
- [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu). }[/math]
The resulting posterior distribution for the mean and covariance matrix is again normal-inverse-Wishart
- [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma|y) \sim \mathrm{NIW}(\boldsymbol\mu_n,\lambda_n,\boldsymbol\Psi_n,\nu_n), }[/math]
where
- [math]\displaystyle{ \boldsymbol\mu_n = \frac{\lambda\boldsymbol\mu_0 + n \bar{\boldsymbol y}}{\lambda+n} }[/math]
- [math]\displaystyle{ \lambda_n = \lambda + n }[/math]
- [math]\displaystyle{ \nu_n = \nu + n }[/math]
- [math]\displaystyle{ \boldsymbol\Psi_n = \boldsymbol{\Psi + S} +\frac{\lambda n}{\lambda+n} (\boldsymbol{\bar{y}-\mu_0})(\boldsymbol{\bar{y}-\mu_0})^T ~~~\mathrm{ with }~~\boldsymbol{S}= \sum_{i=1}^{n} (\boldsymbol{y_i-\bar{y}})(\boldsymbol{y_i-\bar{y}})^T }[/math].
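The update equations above can be sketched directly in code. The function name `niw_posterior` is ours, introduced for illustration; it maps data and prior hyperparameters to the posterior hyperparameters:

```python
# Conjugate NIW update: given data y (n x p) and prior hyperparameters
# (mu0, lam, Psi, nu), return the posterior hyperparameters.
import numpy as np

def niw_posterior(y, mu0, lam, Psi, nu):
    n, p = y.shape
    ybar = y.mean(axis=0)
    S = (y - ybar).T @ (y - ybar)            # scatter matrix about the sample mean
    mu_n = (lam * mu0 + n * ybar) / (lam + n)
    lam_n = lam + n
    nu_n = nu + n
    d = (ybar - mu0).reshape(-1, 1)
    Psi_n = Psi + S + (lam * n / (lam + n)) * (d @ d.T)
    return mu_n, lam_n, Psi_n, nu_n
```

Note that the posterior mean [math]\displaystyle{ \boldsymbol\mu_n }[/math] is a precision-weighted average of the prior mean and the sample mean, so for large [math]\displaystyle{ n }[/math] the data dominate.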
To sample from the joint posterior of [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma) }[/math], one first draws [math]\displaystyle{ \boldsymbol\Sigma|\boldsymbol y \sim \mathcal{W}^{-1}(\boldsymbol\Psi_n,\nu_n) }[/math] and then draws [math]\displaystyle{ \boldsymbol\mu | \boldsymbol{\Sigma,y} \sim \mathcal{N}_p(\boldsymbol\mu_n,\boldsymbol\Sigma/\lambda_n) }[/math]. To draw from the posterior predictive distribution of a new observation, one draws [math]\displaystyle{ \tilde{\boldsymbol y}|\boldsymbol{\mu,\Sigma,y} \sim \mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma) }[/math], given the already drawn values of [math]\displaystyle{ \boldsymbol\mu }[/math] and [math]\displaystyle{ \boldsymbol\Sigma }[/math].[3]
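This two-step sampling scheme can be sketched with SciPy's `invwishart` and `multivariate_normal`; the posterior hyperparameter values below are illustrative placeholders, not derived from any particular data set:

```python
# Draw one sample from the joint NIW posterior, then one posterior
# predictive observation.
import numpy as np
from scipy.stats import invwishart, multivariate_normal

rng = np.random.default_rng(42)
mu_n = np.array([0.5, -0.2])                 # illustrative posterior location
lam_n, nu_n = 11.0, 14.0                     # illustrative posterior scalars
Psi_n = np.array([[12.0, 1.0], [1.0, 9.0]])  # illustrative posterior scale matrix

# 1. Sigma | y  ~  InvWishart(Psi_n, nu_n)
Sigma = invwishart.rvs(df=nu_n, scale=Psi_n, random_state=rng)
# 2. mu | Sigma, y  ~  N(mu_n, Sigma / lam_n)
mu = multivariate_normal.rvs(mean=mu_n, cov=Sigma / lam_n, random_state=rng)
# 3. Posterior predictive: y_new | mu, Sigma, y  ~  N(mu, Sigma)
y_new = multivariate_normal.rvs(mean=mu, cov=Sigma, random_state=rng)
```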
Generating normal-inverse-Wishart random variates
Generation of random variates is straightforward:
- Sample [math]\displaystyle{ \boldsymbol\Sigma }[/math] from an inverse Wishart distribution with parameters [math]\displaystyle{ \boldsymbol\Psi }[/math] and [math]\displaystyle{ \nu }[/math]
- Sample [math]\displaystyle{ \boldsymbol\mu }[/math] from a multivariate normal distribution with mean [math]\displaystyle{ \boldsymbol\mu_0 }[/math] and covariance matrix [math]\displaystyle{ \tfrac{1}{\lambda} \boldsymbol\Sigma }[/math]
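The two steps above can be vectorized over many draws, which is useful for Monte Carlo checks (parameter values are illustrative):

```python
# Generate N variates (mu_i, Sigma_i) from NIW(mu0, lam, Psi, nu).
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)
D, N = 2, 1000
mu0, lam = np.zeros(D), 2.0
Psi, nu = np.eye(D), 5.0

# Step 1: N inverse-Wishart draws, shape (N, D, D)
Sigmas = invwishart.rvs(df=nu, scale=Psi, size=N, random_state=rng)
# Step 2: mu_i ~ N(mu0, Sigma_i / lam), via a batched Cholesky factor
L = np.linalg.cholesky(Sigmas / lam)
mus = mu0 + np.einsum('nij,nj->ni', L, rng.standard_normal((N, D)))
```

Each `mu_i` is constructed as `mu0 + L_i z` with `z` standard normal, which has exactly the covariance `Sigma_i / lam` required by step 2.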
Related distributions
- The normal-Wishart distribution is essentially the same distribution parameterized in terms of the precision matrix rather than the covariance matrix. If [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) }[/math] then [math]\displaystyle{ (\boldsymbol\mu,\boldsymbol\Sigma^{-1}) \sim \mathrm{NW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi^{-1},\nu) }[/math].
- The normal-inverse-gamma distribution is the one-dimensional equivalent.
- The multivariate normal distribution and inverse Wishart distribution are the component distributions out of which this distribution is made.
Notes
- ↑ Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution." [1]
- ↑ Simon J.D. Prince (June 2012). Computer Vision: Models, Learning, and Inference. Cambridge University Press. 3.8: "Normal inverse Wishart distribution".
- ↑ Gelman, Andrew, et al. Bayesian data analysis. Vol. 2, p.73. Boca Raton, FL, USA: Chapman & Hall/CRC, 2014.
References
- Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.
- Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution." [2]
Original source: https://en.wikipedia.org/wiki/Normal-inverse-Wishart distribution.