Multivariate t-distribution

Multivariate t
Notation: [math]\displaystyle{ t_\nu(\boldsymbol\mu,\boldsymbol\Sigma) }[/math]
Parameters: [math]\displaystyle{ \boldsymbol\mu = [\mu_1, \dots, \mu_p]^T }[/math] location (real [math]\displaystyle{ p\times 1 }[/math] vector)
[math]\displaystyle{ \boldsymbol\Sigma }[/math] scale matrix (positive-definite real [math]\displaystyle{ p\times p }[/math] matrix)
[math]\displaystyle{ \nu \gt 0 }[/math] degrees of freedom (real)
Support: [math]\displaystyle{ \mathbf{x} \in\mathbb{R}^p\! }[/math]
PDF: [math]\displaystyle{ \frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\nu^{p/2}\pi^{p/2}\left|{\boldsymbol\Sigma}\right|^{1/2}}\left[1+\frac{1}{\nu}({\mathbf x}-{\boldsymbol\mu})^{\rm T}{\boldsymbol\Sigma}^{-1}({\mathbf x}-{\boldsymbol\mu})\right]^{-(\nu+p)/2} }[/math]
CDF: no analytic expression, but see text for approximations
Mean: [math]\displaystyle{ \boldsymbol\mu }[/math] if [math]\displaystyle{ \nu \gt 1 }[/math]; else undefined
Median: [math]\displaystyle{ \boldsymbol\mu }[/math]
Mode: [math]\displaystyle{ \boldsymbol\mu }[/math]
Variance: [math]\displaystyle{ \frac{\nu}{\nu-2} \boldsymbol\Sigma }[/math] if [math]\displaystyle{ \nu \gt 2 }[/math]; else undefined
Skewness: 0 if [math]\displaystyle{ \nu \gt 3 }[/math]; else undefined

In statistics, the multivariate t-distribution (or multivariate Student distribution) is a multivariate probability distribution. It is a generalization to random vectors of the Student's t-distribution, which is a distribution applicable to univariate random variables. While the case of a random matrix could be treated within this structure, the matrix t-distribution is distinct and makes particular use of the matrix structure.

Definition

One common method of construction of a multivariate t-distribution, for the case of [math]\displaystyle{ p }[/math] dimensions, is based on the observation that if [math]\displaystyle{ \mathbf y }[/math] and [math]\displaystyle{ u }[/math] are independent and distributed as [math]\displaystyle{ N({\mathbf 0},{\boldsymbol\Sigma}) }[/math] and [math]\displaystyle{ \chi^2_\nu }[/math] (i.e. multivariate normal and chi-squared distributions) respectively, where [math]\displaystyle{ \mathbf{\Sigma} }[/math] is a positive-definite [math]\displaystyle{ p \times p }[/math] matrix and [math]\displaystyle{ {\boldsymbol\mu} }[/math] is a constant [math]\displaystyle{ p \times 1 }[/math] vector, then the random variable [math]\displaystyle{ {\mathbf x}={\mathbf y}/\sqrt{u/\nu} +{\boldsymbol\mu} }[/math] has the density[1]

[math]\displaystyle{ \frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\nu^{p/2}\pi^{p/2}\left|{\boldsymbol\Sigma}\right|^{1/2}}\left[1+\frac{1}{\nu}({\mathbf x}-{\boldsymbol\mu})^T{\boldsymbol\Sigma}^{-1}({\mathbf x}-{\boldsymbol\mu})\right]^{-(\nu+p)/2} }[/math]

and is said to be distributed as a multivariate t-distribution with parameters [math]\displaystyle{ {\boldsymbol\Sigma},{\boldsymbol\mu},\nu }[/math]. Note that [math]\displaystyle{ \mathbf\Sigma }[/math] is not the covariance matrix since the covariance is given by [math]\displaystyle{ \nu/(\nu-2)\mathbf\Sigma }[/math] (for [math]\displaystyle{ \nu\gt 2 }[/math]).

The constructive definition of a multivariate t-distribution simultaneously serves as a sampling algorithm:

  1. Generate [math]\displaystyle{ u \sim \chi^2_\nu }[/math] and [math]\displaystyle{ \mathbf{y} \sim N(\mathbf{0}, \boldsymbol{\Sigma}) }[/math], independently.
  2. Compute [math]\displaystyle{ \mathbf{x} \gets \sqrt{\nu/u}\mathbf{y}+ \boldsymbol{\mu} }[/math].
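
The two steps translate directly into executable code. The following is a minimal sketch in Python with NumPy; the helper name rmvt and its signature are ours, not a library API:

  import numpy as np

  def rmvt(n, mu, Sigma, nu, seed=None):
      # Draw n samples from t_nu(mu, Sigma) via the chi-squared mixture construction.
      rng = np.random.default_rng(seed)
      p = len(mu)
      u = rng.chisquare(nu, size=n)                            # step 1: u ~ chi^2_nu
      y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)  # step 1: y ~ N(0, Sigma)
      return np.asarray(mu) + y * np.sqrt(nu / u)[:, None]     # step 2: x = sqrt(nu/u) y + mu

Sampling [math]\displaystyle{ u \sim \mathrm{Ga}(\nu/2,\nu/2) }[/math] and then [math]\displaystyle{ \mathbf{x}\mid u \sim N(\boldsymbol{\mu},u^{-1}\boldsymbol{\Sigma}) }[/math] is equivalent, per the hierarchical representation below.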

This formulation gives rise to the hierarchical representation of a multivariate t-distribution as a scale-mixture of normals: [math]\displaystyle{ u \sim \mathrm{Ga}(\nu/2,\nu/2) }[/math] where [math]\displaystyle{ \mathrm{Ga}(a,b) }[/math] indicates a gamma distribution with density proportional to [math]\displaystyle{ x^{a-1}e^{-bx} }[/math], and [math]\displaystyle{ \mathbf{x}\mid u }[/math] conditionally follows [math]\displaystyle{ N(\boldsymbol{\mu},u^{-1}\boldsymbol{\Sigma}) }[/math].

In the special case [math]\displaystyle{ \nu=1 }[/math], the distribution is a multivariate Cauchy distribution.

Derivation

There are in fact many candidates for the multivariate generalization of Student's t-distribution. An extensive survey of the field has been given by Kotz and Nadarajah (2004). The essential issue is to define a probability density function of several variables that is the appropriate generalization of the formula for the univariate case. In one dimension ([math]\displaystyle{ p=1 }[/math]), with [math]\displaystyle{ t=x-\mu }[/math] and [math]\displaystyle{ \Sigma=1 }[/math], we have the probability density function

[math]\displaystyle{ f(t) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi\,}\,\Gamma[\nu/2]} (1+t^2/\nu)^{-(\nu+1)/2} }[/math]

and one approach is to write down a corresponding function of several variables. This is the basic idea of elliptical distribution theory, where one writes down a corresponding function of [math]\displaystyle{ p }[/math] variables [math]\displaystyle{ t_i }[/math] that replaces [math]\displaystyle{ t^2 }[/math] by a quadratic function of all the [math]\displaystyle{ t_i }[/math]. It is clear that this only makes sense when all the marginal distributions have the same degrees of freedom [math]\displaystyle{ \nu }[/math]. With [math]\displaystyle{ \mathbf{A} = \boldsymbol\Sigma^{-1} }[/math], one has a simple choice of multivariate density function

[math]\displaystyle{ f(\mathbf t) = \frac{\Gamma((\nu+p)/2)\left|\mathbf{A}\right|^{1/2}}{\sqrt{\nu^p\pi^p\,}\,\Gamma(\nu/2)} \left(1+\sum_{i,j=1}^{p,p} A_{ij} t_i t_j/\nu\right)^{-(\nu+p)/2} }[/math]

which is the standard but not the only choice.

An important special case is the standard bivariate t-distribution, p = 2:

[math]\displaystyle{ f(t_1,t_2) = \frac{\left|\mathbf{A}\right|^{1/2}}{2\pi} \left(1+\sum_{i,j=1}^{2,2} A_{ij} t_i t_j/\nu\right)^{-(\nu+2)/2} }[/math]

Note that [math]\displaystyle{ \frac{\Gamma \left(\frac{\nu +2}{2}\right)}{\pi \ \nu \Gamma \left(\frac{\nu }{2}\right)}= \frac {1} {2\pi} }[/math].

Now, if [math]\displaystyle{ \mathbf{A} }[/math] is the identity matrix, the density is

[math]\displaystyle{ f(t_1,t_2) = \frac{1}{2\pi} \left(1+(t_1^2 + t_2^2)/\nu\right)^{-(\nu+2)/2}. }[/math]

The difficulty with the standard representation is revealed by this formula, which does not factorize into the product of the marginal one-dimensional distributions. When [math]\displaystyle{ \Sigma }[/math] is diagonal the standard representation can be shown to have zero correlation, but the marginal distributions are nevertheless not statistically independent.
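
A quick numerical check makes this concrete. The sketch below (Python with SciPy; the choice [math]\displaystyle{ \nu = 5 }[/math] and the evaluation point are arbitrary choices of ours) compares the standard bivariate density with [math]\displaystyle{ \mathbf{A} = \operatorname{I} }[/math] against the product of the two univariate [math]\displaystyle{ t_\nu }[/math] marginals:

  import numpy as np
  from scipy.stats import t

  nu, t1, t2 = 5.0, 1.0, 2.0
  joint = (1 + (t1**2 + t2**2) / nu) ** (-(nu + 2) / 2) / (2 * np.pi)
  product = t.pdf(t1, df=nu) * t.pdf(t2, df=nu)
  print(joint, product)  # roughly 0.0141 vs 0.0143: close, but not equal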

Cumulative distribution function

The definition of the cumulative distribution function (cdf) in one dimension can be extended to multiple dimensions by defining the following probability (here [math]\displaystyle{ \mathbf{x} }[/math] is a real vector):

[math]\displaystyle{ F(\mathbf{x}) = \mathbb{P}(\mathbf{X}\leq \mathbf{x}), \quad \textrm{where}\;\; \mathbf{X}\sim t_\nu(\boldsymbol\mu,\boldsymbol\Sigma). }[/math]

There is no simple formula for [math]\displaystyle{ F(\mathbf{x}) }[/math], but it can be approximated numerically via Monte Carlo integration.[2][3][4]
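
A crude Monte Carlo estimator follows directly from the sampling construction in the Definition section. The sketch below is our own illustration, not the specialized (and far more efficient) methods of the cited references:

  import numpy as np

  def mvt_cdf_mc(x, mu, Sigma, nu, n=200_000, seed=None):
      # Estimate F(x) = P(X <= x componentwise) for X ~ t_nu(mu, Sigma).
      rng = np.random.default_rng(seed)
      p = len(mu)
      u = rng.chisquare(nu, size=n)
      y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
      samples = np.asarray(mu) + y * np.sqrt(nu / u)[:, None]
      return np.mean(np.all(samples <= np.asarray(x), axis=1))

Recent versions of SciPy also expose scipy.stats.multivariate_t, whose cdf method uses numerical integration.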

Conditional distribution

The conditional distribution was demonstrated by Muirhead,[5] though it was derived earlier, using the simpler ratio representation above, by Cornish.[6] Let the vector [math]\displaystyle{ X }[/math] follow a multivariate t distribution and partition it into two subvectors of [math]\displaystyle{ p_1 }[/math] and [math]\displaystyle{ p_2 }[/math] elements:

[math]\displaystyle{ X_p = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim t_p \left (\mu_p, \Sigma_{p \times p}, \nu \right ) }[/math]

where [math]\displaystyle{ p_1 + p_2 = p }[/math], the known mean vector is [math]\displaystyle{ \mu_p = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} }[/math] and the scale matrix is [math]\displaystyle{ \Sigma_{p \times p} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} }[/math].

Then

[math]\displaystyle{ X_2|X_1 \sim t_{ p_2 }\left( \mu_{2|1},\frac{\nu + d_1}{\nu + p_1} \Sigma_{22|1}, \nu + p_1 \right) }[/math]

where

[math]\displaystyle{ \mu_{2|1} = \mu_2 + \Sigma_{21} \Sigma_{11}^{-1} \left(X_1 - \mu_1 \right ) }[/math] is the conditional mean, where it exists, or the conditional median otherwise.
[math]\displaystyle{ \Sigma_{22|1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} }[/math] is the Schur complement of [math]\displaystyle{ \Sigma_{11} \text{ in } \Sigma. }[/math]
[math]\displaystyle{ d_1 = (X_1 - \mu_1)^T \Sigma_{11}^{-1} (X_1 - \mu_1) }[/math] is the squared Mahalanobis distance of [math]\displaystyle{ X_1 }[/math] from [math]\displaystyle{ \mu_1 }[/math] with scale matrix [math]\displaystyle{ \Sigma_{11} }[/math].

See [7] for a simple proof of the above conditional distribution.
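
In code, the conditional parameters amount to a few lines of linear algebra. A sketch in Python with NumPy (the helper name is ours), following the formulas above:

  import numpy as np

  def conditional_params(x1, mu, Sigma, nu, p1):
      # Parameters of X2 | X1 = x1 when [X1; X2] ~ t_nu(mu, Sigma).
      mu1, mu2 = mu[:p1], mu[p1:]
      S11, S12 = Sigma[:p1, :p1], Sigma[:p1, p1:]
      S21, S22 = Sigma[p1:, :p1], Sigma[p1:, p1:]
      w = np.linalg.solve(S11, x1 - mu1)               # Sigma_11^{-1} (x1 - mu1)
      mu_cond = mu2 + S21 @ w                          # conditional location mu_{2|1}
      d1 = (x1 - mu1) @ w                              # squared Mahalanobis distance
      S_cond = S22 - S21 @ np.linalg.solve(S11, S12)   # Schur complement Sigma_{22|1}
      return mu_cond, (nu + d1) / (nu + p1) * S_cond, nu + p1  # location, scale, dof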

Copulas based on the multivariate t

The use of such distributions is enjoying renewed interest due to applications in mathematical finance, especially through the use of the Student's t copula.[8]

Elliptical representation

To construct the multivariate t as an elliptical distribution,[9] take the simplest centralised case, with spherical symmetry and no scaling ([math]\displaystyle{ \Sigma = \operatorname{I} \, }[/math]); then the multivariate t-PDF takes the form

[math]\displaystyle{ f_X(X)= g(X^T X) = \frac{\Gamma \big ( \frac{1}{2} (\nu + p ) \, \big )}{ ( \nu \pi)^{\,p/2} \Gamma \big( \frac{1}{2} \nu \big)} \bigg( 1 + \nu^{-1} X^T X \bigg)^{-( \nu + p )/2 } }[/math]

where [math]\displaystyle{ X =(x_1, \cdots ,x_p )^T\text { is a } p\text{-vector} }[/math] and [math]\displaystyle{ \nu }[/math] is the degrees of freedom, as defined in Muirhead, section 1.5. The covariance of [math]\displaystyle{ X }[/math] is

[math]\displaystyle{ \operatorname{E} \left( XX^T \right) = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty f_X(x_1,\dots, x_p) XX^T \, dx_1 \dots dx_p = \frac{ \nu }{ \nu - 2 } \operatorname{I} }[/math]

The aim is to convert the Cartesian PDF to a radial one. Kibria and Joarder,[10] in a tutorial-style paper, define the radial measure [math]\displaystyle{ r_2 = \frac{X^TX}{p} }[/math] and, noting that the density depends only on [math]\displaystyle{ r_2 }[/math], we get

[math]\displaystyle{ \operatorname{E} [ r_2 ] = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty f_X(x_1,\dots, x_p) \frac {X^TX}{p}\, dx_1 \dots dx_p = \frac{\nu}{ \nu -2} }[/math]

which is equivalent to the variance of the [math]\displaystyle{ p }[/math]-element vector [math]\displaystyle{ X }[/math] treated as a univariate heavy-tailed, zero-mean random sequence with uncorrelated, yet statistically dependent, elements.
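
As a sanity check, the mean value [math]\displaystyle{ \operatorname{E} [ r_2 ] = \nu/(\nu-2) }[/math] can be verified by simulation, e.g. with the rmvt sketch from the Definition section (our helper, not a library function):

  import numpy as np

  p, nu = 3, 7.0
  X = rmvt(500_000, np.zeros(p), np.eye(p), nu, seed=1)   # rmvt: sketch above
  print(np.mean(np.sum(X**2, axis=1) / p))                # approx. nu/(nu - 2) = 1.4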

Radial distribution

[math]\displaystyle{ r_2 }[/math] follows the Fisher-Snedecor or [math]\displaystyle{ F }[/math] distribution:

[math]\displaystyle{ r_2 \sim F_{F}( p,\nu) = B \bigg( \frac {p}{2}, \frac {\nu}{2} \bigg ) ^{-1} \bigg (\frac{p}{\nu} \bigg )^{ p/2 } r_2^ { p/2 -1 } \bigg( 1 + \frac{p}{\nu} r_2 \bigg) ^{-(p + \nu)/2 } }[/math]

having mean value [math]\displaystyle{ \operatorname{E} [ r_2 ] = \frac { \nu }{ \nu - 2 } }[/math]. [math]\displaystyle{ F }[/math]-distributions arise naturally in tests of sums of squares of sampled data after normalization by the sample standard deviation.

By a change of random variable to [math]\displaystyle{ y = \frac{p}{\nu} r_2 = \frac {X^T X}{\nu} }[/math] in the equation above, retaining [math]\displaystyle{ p }[/math]-vector [math]\displaystyle{ X }[/math], we have [math]\displaystyle{ \operatorname{E} [ y ] = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty f_X(X) \frac {X^TX}{ \nu}\, dx_1 \dots dx_p = \frac { p }{ \nu - 2 } }[/math] and probability distribution

[math]\displaystyle{ \begin{align} f_Y(y| \,p,\nu) & = \frac {\nu}{p} B \bigg( \frac {p}{2}, \frac {\nu}{2} \bigg )^{-1} \big (\frac{p}{\nu} \big )^{ \,p/2 } \big (\frac{p}{\nu} \big )^{ -p/2 +1} y^ {\, p/2 -1 } \big( 1 + y \big) ^{-(p + \nu)/2 } \\ \\ & = B \bigg ( \frac {p}{2}, \frac {\nu}{2} \bigg )^{-1} y^{ \,p/2 -1 }(1+ y )^{-(\nu + p)/2} \end{align} }[/math]

which is a regular Beta-prime distribution [math]\displaystyle{ y \sim \beta \, ' \bigg(y; \frac {p}{2}, \frac {\nu}{2} \bigg ) }[/math] having mean value [math]\displaystyle{ \frac { \frac{1}{2} p }{ \frac{1}{2}\nu - 1 } = \frac { p }{ \nu - 2 } }[/math].

Cumulative radial distribution

Given the Beta-prime distribution, the radial cumulative distribution function of [math]\displaystyle{ y }[/math] is known:

[math]\displaystyle{ F_Y(y) = I \, \bigg(\frac {y}{1+y}; \, \frac {p}{2}, \frac {\nu}{2} \bigg ) B\bigg( \frac {p}{2}, \frac {\nu}{2} \bigg )^{-1} }[/math]

where [math]\displaystyle{ I }[/math] is the incomplete beta function; the result applies under the spherical [math]\displaystyle{ \Sigma }[/math] assumption.
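
Numerically this is the regularized incomplete beta function; in SciPy, scipy.special.betainc already includes the division by [math]\displaystyle{ B }[/math]:

  from scipy.special import betainc

  def radial_cdf(y, p, nu):
      # F_Y(y) for y = X'X / nu under spherical Sigma: I(y/(1+y); p/2, nu/2) / B(p/2, nu/2)
      return betainc(p / 2, nu / 2, y / (1 + y))

The same value is returned by scipy.stats.betaprime.cdf(y, p/2, nu/2).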

In the scalar case, [math]\displaystyle{ p = 1 }[/math], the distribution is equivalent to Student-t with the equivalence [math]\displaystyle{ t^2 = y^2 \sigma^{-1} }[/math] and the variable t has double-sided tails for CDF purposes.

The radial distribution can also be derived via a straightforward coordinate transformation from Cartesian to spherical. A constant radius surface at [math]\displaystyle{ R = (X^TX)^{1/2} }[/math] with PDF [math]\displaystyle{ p_X(X) \propto \bigg( 1 + \nu^{-1} R^2 \bigg)^{-(\nu+p)/2} }[/math] is an iso-density surface. Given this density value, the quantum of probability on a shell of surface area [math]\displaystyle{ A_R }[/math] and thickness [math]\displaystyle{ \delta R }[/math] at [math]\displaystyle{ R }[/math] is [math]\displaystyle{ \delta P = p_X(R) \, A_R \delta R }[/math].

The enclosed [math]\displaystyle{ p }[/math]-sphere of radius [math]\displaystyle{ R }[/math] has surface area [math]\displaystyle{ A_R = \frac { 2\pi^{p/2 } R^{ \, p-1 } }{ \Gamma (p/2)} }[/math], and substitution into [math]\displaystyle{ \delta P }[/math] shows that the shell has element of probability [math]\displaystyle{ \delta P = p_X(R) \frac { 2\pi^{p/2 } R^{ p-1 } }{ \Gamma (p/2)} \delta R }[/math] which is equivalent to radial density function

[math]\displaystyle{ f_R(R) = \frac{\Gamma \big ( \frac{1}{2} (\nu + p ) \, \big )}{\nu^{\,p/2} \pi^{\,p/2} \Gamma \big( \frac{1}{2} \nu \big)} \frac { 2 \pi^{p/2 } R^{ p-1 } }{ \Gamma (p/2)} \bigg( 1 + \frac{ R^2 }{\nu} \bigg)^{-( \nu + p )/2 } }[/math]

which further simplifies to [math]\displaystyle{ f_R(R) = \frac { 2}{ \nu ^{1/2} B \big( \frac{1}{2} p, \frac{1}{2} \nu \big)} \bigg( \frac {R^2}{ \nu } \bigg)^{ (p-1)/2 } \bigg( 1 + \frac{ R^2 }{\nu} \bigg)^{-( \nu + p )/2 } }[/math] where [math]\displaystyle{ B(*,*) }[/math] is the Beta function.

Changing the radial variable to [math]\displaystyle{ y=R^2 / \nu }[/math] returns the previous Beta Prime distribution [math]\displaystyle{ f_Y(y) = \frac { 1}{ B \big( \frac{1}{2} p, \frac{1}{2} \nu \big)} y^{\, p/2 - 1 } \bigg( 1 + y \bigg)^{-( \nu + p )/2 } }[/math]

To scale the radial variables without changing the radial shape function, define the scale matrix [math]\displaystyle{ \Sigma = \alpha \operatorname{I} }[/math], yielding a 3-parameter Cartesian density function; i.e. the probability [math]\displaystyle{ \Delta_P }[/math] in the volume element [math]\displaystyle{ dx_1 \dots dx_p }[/math] is

[math]\displaystyle{ \Delta_P \big (f_X(X \,|\alpha, p, \nu) \big ) = \frac{\Gamma \big ( \frac{1}{2} (\nu + p ) \, \big )}{ ( \nu \pi)^{\,p/2} \alpha^{\,p/2} \Gamma \big( \frac{1}{2} \nu \big)} \bigg( 1 + \frac{X^T X }{ \alpha \nu} \bigg)^{-( \nu + p )/2 } \; dx_1 \dots dx_p }[/math]

or, in terms of scalar radial variable [math]\displaystyle{ R }[/math],

[math]\displaystyle{ f_R(R \,|\alpha, p, \nu) = \frac { 2}{\alpha^{1/2} \; \nu ^{1/2} B \big( \frac{1}{2} p, \frac{1}{2} \nu \big)} \bigg( \frac {R^2}{ \alpha \, \nu } \bigg)^{ (p-1)/2 } \bigg( 1 + \frac{ R^2 }{ \alpha \, \nu} \bigg)^{-( \nu + p )/2 } }[/math]

Radial moments

The moments of all the radial variables, under the spherical distribution assumption, can be derived from the Beta prime distribution. If [math]\displaystyle{ Z \sim \beta'(a,b) }[/math] then [math]\displaystyle{ \operatorname{E} (Z^m) = {\frac {B(a + m, b - m)}{B( a ,b )}} }[/math], a known result. Thus, for the variable [math]\displaystyle{ y = \frac {p}{\nu} r_2 }[/math] we have

[math]\displaystyle{ \operatorname{E} (y^m) = {\frac {B(\frac{1}{2}p + m, \frac{1}{2} \nu - m)}{B( \frac{1}{2} p ,\frac{1}{2} \nu )}} = \frac{\Gamma \big(\frac{1}{2} p + m \big)\; \Gamma \big(\frac{1}{2} \nu - m \big) }{ \Gamma \big( \frac{1}{2} p \big) \; \Gamma \big( \frac{1}{2} \nu \big) }, \; \nu/2 \gt m }[/math]

The moments of [math]\displaystyle{ r_2 = \frac{\nu}{p} \, y }[/math] are

[math]\displaystyle{ \operatorname{E} (r_2^m) = \bigg(\frac{\nu}{p}\bigg)^m \operatorname{E} (y^m) }[/math]

while introducing the scale matrix [math]\displaystyle{ \alpha \operatorname{I} }[/math] yields

[math]\displaystyle{ \operatorname{E} (r_2^m | \alpha) = \alpha^m \bigg(\frac{\nu}{p}\bigg)^m \operatorname{E} (y^m) }[/math]

Moments relating to the Cartesian radial variable [math]\displaystyle{ R = (X^TX)^{1/2} }[/math] are found by setting [math]\displaystyle{ R =(\alpha\nu y)^{1/2} }[/math] and [math]\displaystyle{ M=2m }[/math] whereupon

[math]\displaystyle{ \operatorname{E} (R^M ) =\operatorname{E} \big((\alpha \nu y)^{1/2} \big)^{2 m } = (\alpha \nu )^{M/2} \operatorname{E} (y^{M/2})= (\alpha \nu )^{M/2} {\frac {B \big(\frac{1}{2} (p + M), \frac{1}{2} (\nu - M) \big )}{B( \frac{1}{2} p ,\frac{1}{2} \nu )}} }[/math]
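
These moment formulas are straightforward to evaluate numerically; a sketch (helper name ours), valid for [math]\displaystyle{ M \lt \nu }[/math]:

  from scipy.special import beta

  def radial_moment(M, p, nu, alpha=1.0):
      # E[R^M] with R^2 = alpha * nu * y and y ~ BetaPrime(p/2, nu/2); requires M < nu.
      return (alpha * nu) ** (M / 2) * beta((p + M) / 2, (nu - M) / 2) / beta(p / 2, nu / 2)

For example, radial_moment(2, p, nu) returns [math]\displaystyle{ \nu p/(\nu-2) }[/math], the expected squared radius [math]\displaystyle{ \operatorname{E}[X^TX] }[/math] under [math]\displaystyle{ \Sigma = \operatorname{I} }[/math].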

Linear combinations and affine transformation

This closely relates to the multivariate normal method and is described in Kotz and Nadarajah, Kibria and Joarder, Roth, and Cornish. Starting from a somewhat simplified version of the central MV-t pdf: [math]\displaystyle{ f_X(X) = \frac {\Kappa }{ \left|\Sigma \right|^{1/2} } \left( 1+ \nu^{-1} X^T \Sigma^{-1} X \right) ^ { -\left(\nu + p \right)/2} }[/math], where [math]\displaystyle{ \Kappa }[/math] is a normalizing constant and [math]\displaystyle{ \nu }[/math] is arbitrary but fixed, let [math]\displaystyle{ \Theta \in \mathbb{R}^{p \times p} }[/math] be a non-singular matrix and form the vector [math]\displaystyle{ Y = \Theta X }[/math]. Then, by a straightforward change of variables, we get

[math]\displaystyle{ f_Y(Y) = \frac {\Kappa }{ \left|\Sigma \right|^{1/2} } \left( 1+ \nu^{-1}Y^T \Theta^{-T} \Sigma^{-1} \Theta^{-1} Y \right) ^ { -\left(\nu + p \right)/2} \left| \frac{\partial Y }{\partial X} \right| ^{-1} }[/math]

The matrix of partial derivatives is [math]\displaystyle{ \frac{\partial Y_i }{\partial X_j} = \Theta_{i,j} }[/math] and the Jacobian becomes [math]\displaystyle{ \left| \frac{\partial Y }{\partial X} \right| = \left| \Theta \right| }[/math]. Thus

[math]\displaystyle{ f_Y(Y) = \frac {\Kappa }{ \left|\Sigma \right|^{1/2} \left| \Theta \right| } \left( 1 + \nu^{-1} Y^T \Theta^{-T} \Sigma^{-1} \Theta^{-1} Y \right) ^ { -\left(\nu + p \right)/2} }[/math]

The denominator reduces to

[math]\displaystyle{ \left|\Sigma \right|^{1/2} \left| \Theta \right| = \left|\Sigma \right|^{1/2} \left| \Theta \Theta^T \right|^{1/2} = \left| \Theta \Sigma \Theta^T \right|^{1/2} }[/math]

where [math]\displaystyle{ \left| \Theta^T \right| = \left| \Theta \right| }[/math]. Finally

[math]\displaystyle{ f_Y(Y) = \frac {\Kappa }{ \left| \Theta \Sigma \Theta^T \right|^{1/2} } \left( 1 + \nu^{-1} Y^T \left( \Theta \Sigma \Theta^T \right) ^{-1} Y \right) ^ { -\left(\nu + p \right)/2} }[/math]

which is a regular MV-t distribution.

In general if [math]\displaystyle{ X \sim t_\nu ( \mu, \Sigma ) }[/math] then [math]\displaystyle{ \Theta X + c \sim t_\nu ( \Theta \mu +c, \Theta \Sigma \Theta^T ) }[/math]. Roth shows that the transformation remains valid if [math]\displaystyle{ \Theta }[/math] is a rectangular matrix [math]\displaystyle{ \Theta \in \mathbb{R}^{m \times p}, m \lt p }[/math], which results in dimensionality reduction. Here, the Jacobian [math]\displaystyle{ \left| \Theta \right| }[/math] is seemingly rectangular, but the value [math]\displaystyle{ \left| \Theta \Sigma \Theta^T \right|^{1/2} }[/math] in the denominator of the pdf is nevertheless correct; a discussion of rectangular matrix product determinants is given by Aitken.[11] In extremis, if m = 1 and [math]\displaystyle{ \Theta }[/math] becomes a row vector, then the scalar Y follows a univariate double-sided Student-t distribution defined by [math]\displaystyle{ t^2 = Y^2 / \sigma^2 }[/math] with the same [math]\displaystyle{ \nu }[/math] degrees of freedom. Kibria et al. use the affine transformation to find the marginal distributions, which are also MV-t.
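
In code, the affine property is simply a parameter mapping. A sketch (helper name ours), with a rectangular selector [math]\displaystyle{ \Theta }[/math] extracting a marginal:

  import numpy as np

  def affine_params(Theta, c, mu, Sigma, nu):
      # If X ~ t_nu(mu, Sigma), then Theta @ X + c ~ t_nu(Theta @ mu + c, Theta @ Sigma @ Theta.T).
      return Theta @ mu + c, Theta @ Sigma @ Theta.T, nu

  # Marginal of the first m components via the rectangular matrix Theta = [I_m 0]:
  p, m = 4, 2
  mu, Sigma, nu = np.zeros(p), np.eye(p), 5.0
  print(affine_params(np.eye(m, p), np.zeros(m), mu, Sigma, nu))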

During affine transformations of variables with elliptical distributions, all vectors must ultimately derive from one initial isotropic spherical vector [math]\displaystyle{ Z }[/math] whose elements remain 'entangled' and are not statistically independent. A vector of independent Student-t samples is not consistent with the multivariate t distribution. Adding two multivariate t sample vectors generated with independent chi-squared samples and different [math]\displaystyle{ \nu }[/math] values, scaled by [math]\displaystyle{ {1}/\sqrt{u_1/\nu_1}, \; \; {1}/\sqrt{u_2/\nu_2} }[/math] respectively, will not produce internally consistent distributions, though it does yield a Behrens–Fisher problem.[12] Taleb compares many examples of elliptical versus non-elliptical multivariate distributions.

Related concepts

In univariate statistics, the Student's t-test makes use of Student's t-distribution. Hotelling's T-squared distribution is a distribution that arises in multivariate statistics. The matrix t-distribution is a distribution for random variables arranged in a matrix structure.

References

  1. Roth, Michael (17 April 2013). "On the Multivariate t Distribution". http://users.isy.liu.se/en/rt/roth/student.pdf. 
  2. Botev, Z.; Chen, Y.-L. (2022). "Chapter 4: Truncated Multivariate Student Computations via Exponential Tilting.". Advances in Modeling and Simulation: Festschrift for Pierre L'Ecuyer. Springer. pp. 65–87. ISBN 978-3-031-10192-2. https://doi.org/10.1007/978-3-031-10193-9_4. 
  3. Botev, Z. I.; L'Ecuyer, P. (6 December 2015). "Efficient probability estimation and simulation of the truncated multivariate student-t distribution". Huntington Beach, CA, USA: IEEE. pp. 380–391. doi:10.1109/WSC.2015.7408180. 
  4. Genz, Alan (2009). Computation of Multivariate Normal and t Probabilities. Lecture Notes in Statistics. 195. Springer. doi:10.1007/978-3-642-01689-9. ISBN 978-3-642-01689-9. https://www.springer.com/statistics/computational+statistics/book/978-3-642-01688-2. Retrieved 2017-09-05. 
  5. Muirhead, Robb (1982). Aspects of Multivariate Statistical Theory. USA: Wiley. pp. 32–36, Theorem 1.5.4. ISBN 978-0-471-76985-9. 
  6. Cornish, E A (1954). "The Multivariate t-Distribution Associated with a Set of Normal Sample Deviates.". Australian Journal of Physics 7: 531–542. doi:10.1071/PH540531. https://www.publish.csiro.au/PH/pdf/PH540531. 
  7. Ding, Peng (2016). "On the Conditional Distribution of the Multivariate t Distribution". The American Statistician 70 (3): 293–295. doi:10.1080/00031305.2016.1164756. https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1164756. 
  8. Demarta, Stefano; McNeil, Alexander (2004). "The t Copula and Related Copulas". https://www.risknet.de/uploads/tx_bxelibrary/t-Copula-Demarta-ETH.pdf. 
  9. Osiewalski, Jacek; Steele, Mark (1996). "Posterior Moments of Scale Parameters in Elliptical Sampling Models". Bayesian Analysis in Statistics and Econometrics. Wiley. pp. 323–335. ISBN 0-471-11856-7. 
  10. Kibria, B M G; Joarder, A H (Jan 2006). "A short review of multivariate t distribution". Journal of Statistical Research 40 (1): 59–72. 
  11. Aitken, A C (1948). Determinants and Matrices (5th ed.). Edinburgh: Oliver and Boyd. pp. Chapter IV, section 36. 
  12. Girón, Javier; del Castillo, Carmen (2010). "The multivariate Behrens–Fisher distribution". Journal of Multivariate Analysis 101 (9): 2091–2102. doi:10.1016/j.jmva.2010.04.008. 

Literature

  • Kotz, Samuel; Nadarajah, Saralees (2004). Multivariate t Distributions and Their Applications. Cambridge University Press. ISBN 978-0521826549. 
  • Cherubini, Umberto; Luciano, Elisa; Vecchiato, Walter (2004). Copula methods in finance. John Wiley & Sons. ISBN 978-0470863442. 
  • Taleb, Nassim Nicholas (2023). Statistical Consequences of Fat Tails (1st ed.). Academic Press. ISBN 979-8218248031. 
