Matrix t-distribution



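Matrix t
Notation: $T_{n,p}(\nu, \mathbf{M}, \boldsymbol\Sigma, \boldsymbol\Omega)$
Parameters:

$\mathbf{M}$: location (real $n \times p$ matrix)
$\boldsymbol\Omega$: scale (positive-definite real $p \times p$ matrix)
$\boldsymbol\Sigma$: scale (positive-definite real $n \times n$ matrix)

$\nu > 0$: degrees of freedom (real)
Support: $\mathbf{X} \in \mathbb{R}^{n \times p}$
PDF:

$$\frac{\Gamma_p\!\left(\frac{\nu+n+p-1}{2}\right)}{\pi^{\frac{np}{2}}\,\Gamma_p\!\left(\frac{\nu+p-1}{2}\right)}\,|\boldsymbol\Omega|^{-\frac{n}{2}}\,|\boldsymbol\Sigma|^{-\frac{p}{2}} \times \left|\mathbf{I}_n + \boldsymbol\Sigma^{-1}(\mathbf{X}-\mathbf{M})\boldsymbol\Omega^{-1}(\mathbf{X}-\mathbf{M})^{T}\right|^{-\frac{\nu+n+p-1}{2}}$$

CDF: No analytic expression
Mean: $\mathbf{M}$ if $\nu > 1$, else undefined
Mode: $\mathbf{M}$
Variance: $\operatorname{cov}(\operatorname{vec}(\mathbf{X})) = \frac{\boldsymbol\Sigma \otimes \boldsymbol\Omega}{\nu-2}$ if $\nu > 2$, else undefined
CF: see below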

In statistics, the matrix t-distribution (or matrix variate t-distribution) is the generalization of the multivariate t-distribution from vectors to matrices.[1][2]

The matrix t-distribution relates to the multivariate t-distribution in the same way that the matrix normal distribution relates to the multivariate normal distribution: if the matrix has only one row or only one column, the distribution reduces to the corresponding multivariate (vector) distribution. The matrix t-distribution is the compound distribution that results from an infinite mixture of a matrix normal distribution with an inverse Wishart distribution placed over either of its covariance matrices,[1] and the multivariate t-distribution can be generated in a similar way.[2]
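The compound construction can be sketched as a sampler. This is a minimal illustration, not a reference implementation: it assumes the convention in which Σ is the n×n row scale and Ω is the p×p column scale, and it assumes the inverse Wishart is placed over the row covariance with ν + n − 1 degrees of freedom and scale Σ, which is the choice that reproduces the density exponent −(ν+n+p−1)/2. The function name `sample_matrix_t` is hypothetical.

```python
import numpy as np
from scipy.stats import invwishart


def sample_matrix_t(nu, M, Sigma, Omega, size=1, seed=None):
    """Draw `size` samples from T_{n,p}(nu, M, Sigma, Omega) by compounding.

    Assumed construction: V ~ InverseWishart(df=nu+n-1, scale=Sigma),
    then X | V ~ MatrixNormal(M, row cov V, column cov Omega).
    """
    M = np.asarray(M, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    Omega = np.asarray(Omega, dtype=float)
    n, p = M.shape
    rng = np.random.default_rng(seed)

    # Random row covariances, one n x n matrix per sample.
    V = invwishart.rvs(df=nu + n - 1, scale=Sigma, size=size, random_state=rng)
    V = np.reshape(V, (size, n, n))

    L_row = np.linalg.cholesky(V)      # batched factor: V = L_row @ L_row^T
    L_col = np.linalg.cholesky(Omega)  # Omega = L_col @ L_col^T

    # Matrix normal draw: M + L_row Z L_col^T with Z standard normal.
    Z = rng.standard_normal((size, n, p))
    return M + L_row @ Z @ L_col.T


# Usage with hypothetical parameter values.
M = np.zeros((2, 3))
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Omega = np.diag([1.0, 2.0, 0.5])
X = sample_matrix_t(10.0, M, Sigma, Omega, size=20000, seed=0)
```

With ν > 2 the sample average of (X − M)(X − M)ᵀ should approach Σ tr(Ω)/(ν − 2), which gives a quick sanity check on the assumed degrees of freedom.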

In a Bayesian analysis of a multivariate linear regression model based on the matrix normal distribution, the matrix t-distribution is the posterior predictive distribution.[3]

Definition

For a matrix t-distribution, the probability density function at the point 𝐗 of an n×p space is

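$$f(\mathbf{X};\nu,\mathbf{M},\boldsymbol\Sigma,\boldsymbol\Omega) = K \times \left|\mathbf{I}_n + \boldsymbol\Sigma^{-1}(\mathbf{X}-\mathbf{M})\boldsymbol\Omega^{-1}(\mathbf{X}-\mathbf{M})^{T}\right|^{-\frac{\nu+n+p-1}{2}},$$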

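where the normalizing constant $K$ is given by

$$K = \frac{\Gamma_p\!\left(\frac{\nu+n+p-1}{2}\right)}{\pi^{\frac{np}{2}}\,\Gamma_p\!\left(\frac{\nu+p-1}{2}\right)}\,|\boldsymbol\Omega|^{-\frac{n}{2}}\,|\boldsymbol\Sigma|^{-\frac{p}{2}}.$$

Here $\Gamma_p$ is the multivariate gamma function.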
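The density can be evaluated numerically on the log scale, which avoids overflow in the gamma functions and determinants. The following is a minimal sketch, assuming Σ is n×n and Ω is p×p as required for the matrix product inside the determinant to be defined; the function name `matrix_t_logpdf` is hypothetical, and `scipy.special.multigammaln` supplies the logarithm of the multivariate gamma function.

```python
import numpy as np
from scipy.special import multigammaln


def matrix_t_logpdf(X, nu, M, Sigma, Omega):
    """Log-density of T_{n,p}(nu, M, Sigma, Omega) at an n x p matrix X."""
    X = np.asarray(X, dtype=float)
    M = np.asarray(M, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    Omega = np.asarray(Omega, dtype=float)
    n, p = X.shape
    R = X - M

    # I_n + Sigma^{-1} R Omega^{-1} R^T, using linear solves instead of inverses.
    inner = np.eye(n) + np.linalg.solve(Sigma, R) @ np.linalg.solve(Omega, R.T)

    _, logdet_inner = np.linalg.slogdet(inner)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_Omega = np.linalg.slogdet(Omega)

    a = 0.5 * (nu + n + p - 1)
    # log K, the logarithm of the normalizing constant.
    log_K = (multigammaln(a, p)
             - multigammaln(0.5 * (nu + p - 1), p)
             - 0.5 * n * p * np.log(np.pi)
             - 0.5 * n * logdet_Omega
             - 0.5 * p * logdet_Sigma)
    return log_K - a * logdet_inner


# Usage with hypothetical values: evaluate at the mode X = M.
M = np.zeros((2, 3))
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Omega = np.diag([1.0, 2.0, 0.5])
log_density_at_mode = matrix_t_logpdf(M, 4.0, M, Sigma, Omega)
```

For n = p = 1 this reduces to a scaled Student t log-density with ν degrees of freedom, which is a convenient check.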

Properties

If $\mathbf{X} \sim \mathcal{T}_{n\times p}(\nu,\mathbf{M},\boldsymbol\Sigma,\boldsymbol\Omega)$, then we have the following properties:[2]

Expected values

If ν>1, the mean (expected value) is:

E[𝐗]=𝐌

and we have the following second-order expectations, if ν>2:

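$$E\!\left[(\mathbf{X}-\mathbf{M})(\mathbf{X}-\mathbf{M})^{T}\right] = \frac{\boldsymbol\Sigma\,\operatorname{tr}(\boldsymbol\Omega)}{\nu-2}$$
$$E\!\left[(\mathbf{X}-\mathbf{M})^{T}(\mathbf{X}-\mathbf{M})\right] = \frac{\boldsymbol\Omega\,\operatorname{tr}(\boldsymbol\Sigma)}{\nu-2}$$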

where tr denotes trace.

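More generally, for appropriately dimensioned matrices A, B, C:

$$E\!\left[(\mathbf{X}-\mathbf{M})\mathbf{A}(\mathbf{X}-\mathbf{M})^{T}\right] = \frac{\boldsymbol\Sigma\,\operatorname{tr}(\mathbf{A}^{T}\boldsymbol\Omega)}{\nu-2}$$
$$E\!\left[(\mathbf{X}-\mathbf{M})^{T}\mathbf{B}(\mathbf{X}-\mathbf{M})\right] = \frac{\boldsymbol\Omega\,\operatorname{tr}(\mathbf{B}^{T}\boldsymbol\Sigma)}{\nu-2}$$
$$E\!\left[(\mathbf{X}-\mathbf{M})\mathbf{C}(\mathbf{X}-\mathbf{M})\right] = \frac{\boldsymbol\Sigma\,\mathbf{C}^{T}\boldsymbol\Omega}{\nu-2}$$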
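As a short worked check, the two simpler second-order expectations follow from the general formulas by choosing identity matrices. For the products to be defined with an n×p matrix X, A must be p×p, B must be n×n, and C must be p×n (these dimensions are inferred from conformability, not stated in the source). Setting $\mathbf{A} = \mathbf{I}_p$ and $\mathbf{B} = \mathbf{I}_n$ gives

$$E\!\left[(\mathbf{X}-\mathbf{M})\mathbf{I}_p(\mathbf{X}-\mathbf{M})^{T}\right] = \frac{\boldsymbol\Sigma\,\operatorname{tr}(\mathbf{I}_p^{T}\boldsymbol\Omega)}{\nu-2} = \frac{\boldsymbol\Sigma\,\operatorname{tr}(\boldsymbol\Omega)}{\nu-2}, \qquad E\!\left[(\mathbf{X}-\mathbf{M})^{T}\mathbf{I}_n(\mathbf{X}-\mathbf{M})\right] = \frac{\boldsymbol\Omega\,\operatorname{tr}(\mathbf{I}_n^{T}\boldsymbol\Sigma)}{\nu-2} = \frac{\boldsymbol\Omega\,\operatorname{tr}(\boldsymbol\Sigma)}{\nu-2}.$$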

Transformation

Transpose transform:

$$\mathbf{X}^{T} \sim \mathcal{T}_{p\times n}(\nu,\mathbf{M}^{T},\boldsymbol\Omega,\boldsymbol\Sigma)$$

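Linear transform: let A (r-by-n) be of full rank r ≤ n and B (p-by-s) be of full rank s ≤ p. Then:

$$\mathbf{A}\mathbf{X}\mathbf{B} \sim \mathcal{T}_{r\times s}(\nu,\mathbf{A}\mathbf{M}\mathbf{B},\mathbf{A}\boldsymbol\Sigma\mathbf{A}^{T},\mathbf{B}^{T}\boldsymbol\Omega\mathbf{B})$$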

The characteristic function and various other properties can be derived from the re-parameterised formulation (see below).

Re-parameterized matrix t-distribution

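Re-parameterized matrix t
Notation: $T_{n,p}(\alpha,\beta,\mathbf{M},\boldsymbol\Sigma,\boldsymbol\Omega)$
Parameters:

$\mathbf{M}$: location (real $n \times p$ matrix)
$\boldsymbol\Omega$: scale (positive-definite real $p \times p$ matrix)
$\boldsymbol\Sigma$: scale (positive-definite real $n \times n$ matrix)
$\alpha > (p-1)/2$: shape parameter

$\beta > 0$: scale parameter
Support: $\mathbf{X} \in \mathbb{R}^{n \times p}$
PDF:

$$\frac{\Gamma_p(\alpha+n/2)}{(2\pi/\beta)^{\frac{np}{2}}\,\Gamma_p(\alpha)}\,|\boldsymbol\Omega|^{-\frac{n}{2}}\,|\boldsymbol\Sigma|^{-\frac{p}{2}} \times \left|\mathbf{I}_n + \frac{\beta}{2}\boldsymbol\Sigma^{-1}(\mathbf{X}-\mathbf{M})\boldsymbol\Omega^{-1}(\mathbf{X}-\mathbf{M})^{T}\right|^{-(\alpha+n/2)}$$

CDF: No analytic expression
Mean: $\mathbf{M}$ if $\alpha > p/2$, else undefined
Variance: $\frac{2(\boldsymbol\Sigma \otimes \boldsymbol\Omega)}{\beta(2\alpha-p-1)}$ if $\alpha > (p+1)/2$, else undefined
CF: see below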

An alternative parameterisation of the matrix t-distribution uses two parameters α and β in place of ν.[3]

This formulation reduces to the standard matrix t-distribution with $\beta = 2$ and $\alpha = \frac{\nu+p-1}{2}$.
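The reduction can be verified by direct substitution. With $\beta = 2$ and $\alpha = \frac{\nu+p-1}{2}$, each ingredient of the re-parameterized density and its moments becomes the corresponding ingredient of the standard form:

$$\alpha + \frac{n}{2} = \frac{\nu+n+p-1}{2}, \qquad \left(\frac{2\pi}{\beta}\right)^{\frac{np}{2}} = \pi^{\frac{np}{2}}, \qquad \frac{\beta}{2} = 1, \qquad \frac{2}{\beta(2\alpha-p-1)} = \frac{1}{\nu-2}.$$

The parameter conditions map in the same way: $\alpha > (p-1)/2$ becomes $\nu > 0$, $\alpha > p/2$ becomes $\nu > 1$ (existence of the mean), and $\alpha > (p+1)/2$ becomes $\nu > 2$ (existence of the variance).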

This formulation of the matrix t-distribution can be derived as the compound distribution that results from an infinite mixture of a matrix normal distribution with an inverse multivariate gamma distribution placed over either of its covariance matrices.

Properties

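If $\mathbf{X} \sim T_{n,p}(\alpha,\beta,\mathbf{M},\boldsymbol\Sigma,\boldsymbol\Omega)$ then[2][3]

$$\mathbf{X}^{T} \sim T_{p,n}(\alpha,\beta,\mathbf{M}^{T},\boldsymbol\Omega,\boldsymbol\Sigma).$$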

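The property above comes from Sylvester's determinant theorem:

$$\det\!\left(\mathbf{I}_n + \frac{\beta}{2}\boldsymbol\Sigma^{-1}(\mathbf{X}-\mathbf{M})\boldsymbol\Omega^{-1}(\mathbf{X}-\mathbf{M})^{T}\right) = \det\!\left(\mathbf{I}_p + \frac{\beta}{2}\boldsymbol\Omega^{-1}(\mathbf{X}^{T}-\mathbf{M}^{T})\boldsymbol\Sigma^{-1}(\mathbf{X}^{T}-\mathbf{M}^{T})^{T}\right).$$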
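The determinant identity is easy to confirm numerically. The sketch below uses arbitrary, randomly generated inputs (all values and the helper `random_spd` are hypothetical) and compares the n×n determinant with the p×p determinant.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, beta = 3, 5, 1.7


def random_spd(k):
    """Return a random symmetric positive-definite k x k matrix."""
    A = rng.standard_normal((k, k))
    return A @ A.T + k * np.eye(k)


X = rng.standard_normal((n, p))
M = rng.standard_normal((n, p))
Sigma = random_spd(n)   # n x n row scale
Omega = random_spd(p)   # p x p column scale
R = X - M

# Left side: n x n determinant, I_n + (beta/2) Sigma^{-1} R Omega^{-1} R^T.
lhs = np.linalg.det(
    np.eye(n) + 0.5 * beta * np.linalg.solve(Sigma, R) @ np.linalg.solve(Omega, R.T)
)

# Right side: p x p determinant, I_p + (beta/2) Omega^{-1} R^T Sigma^{-1} R.
rhs = np.linalg.det(
    np.eye(p) + 0.5 * beta * np.linalg.solve(Omega, R.T) @ np.linalg.solve(Sigma, R)
)

print(np.isclose(lhs, rhs))
```

The two sides agree up to floating-point error because det(I + UV) = det(I + VU) for conformable U and V.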

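If $\mathbf{X} \sim T_{n,p}(\alpha,\beta,\mathbf{M},\boldsymbol\Sigma,\boldsymbol\Omega)$ and $\mathbf{A}$ ($n \times n$) and $\mathbf{B}$ ($p \times p$) are nonsingular matrices then[2][3]

$$\mathbf{A}\mathbf{X}\mathbf{B} \sim T_{n,p}(\alpha,\beta,\mathbf{A}\mathbf{M}\mathbf{B},\mathbf{A}\boldsymbol\Sigma\mathbf{A}^{T},\mathbf{B}^{T}\boldsymbol\Omega\mathbf{B}).$$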

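The characteristic function is[3]

$$\phi_T(\mathbf{Z}) = \frac{\exp\!\left(\operatorname{tr}(i\mathbf{Z}^{T}\mathbf{M})\right)|\boldsymbol\Omega|^{\alpha}}{\Gamma_p(\alpha)\,(2\beta)^{\alpha p}}\,\left|\mathbf{Z}^{T}\boldsymbol\Sigma\mathbf{Z}\right|^{\alpha}\,B_\alpha\!\left(\frac{1}{2\beta}\mathbf{Z}^{T}\boldsymbol\Sigma\mathbf{Z}\boldsymbol\Omega\right),$$

where

$$B_\delta(\mathbf{W}\mathbf{Z}) = |\mathbf{W}|^{-\delta}\int_{\mathbf{S}>0}\exp\!\left(\operatorname{tr}(-\mathbf{S}\mathbf{W}-\mathbf{S}^{-1}\mathbf{Z})\right)|\mathbf{S}|^{-\delta-\frac{1}{2}(p+1)}\,d\mathbf{S},$$

and where $B_\delta$ is the type-two Bessel function of Herz of a matrix argument.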

Notes

  1. Zhu, Shenghuo; Yu, Kai; Gong, Yihong (2007). "Predictive Matrix-Variate t Models". In J. C. Platt, D. Koller, Y. Singer, and S. Roweis (eds.), NIPS '07: Advances in Neural Information Processing Systems 20, pp. 1721–1728. MIT Press, Cambridge, MA, 2008. The notation is changed a bit in this article for consistency with the matrix normal distribution article.
  2. Gupta, Arjun K.; Nagar, Daya K. (1999). Matrix Variate Distributions. CRC Press. Chapter 4.
  3. Iranmanesh, Anis; Arashi, M.; Tabatabaey, S. M. M. (2010). "On Conditional Applications of Matrix Variate Normal Distribution". Iranian Journal of Mathematical Sciences and Informatics, 5:2, pp. 33–43.