Projected normal distribution


Projected normal distribution
Notation: $\mathcal{PN}_n(\mu,\Sigma)$
Parameters: $\mu \in \mathbb{R}^n$ (location), $\Sigma \in \mathbb{R}^{n \times n}$ (scale)
Support: unit $(n-1)$-sphere, in angular coordinates $\Theta = [0,\pi]^{n-2} \times [0,2\pi)$ or Cartesian coordinates $\mathbb{S}^{n-1} = \{z \in \mathbb{R}^n : \|z\| = 1\}$
PDF: complicated, see text

In directional statistics, the projected normal distribution (also known as the offset normal distribution, angular normal distribution or angular Gaussian distribution)[1][2] is a probability distribution over directions that describes the radial projection onto the unit $(n-1)$-sphere of a random variable with an $n$-variate normal distribution.

Definition and properties

Given a random variable $X \in \mathbb{R}^n$ that follows a multivariate normal distribution $\mathcal{N}_n(\mu,\Sigma)$, the projected normal distribution $\mathcal{PN}_n(\mu,\Sigma)$ represents the distribution of the random variable $Y = X/\|X\|$ obtained by projecting $X$ radially onto the unit sphere. In the general case, the projected normal distribution can be asymmetric and multimodal. If $\mu$ is parallel to an eigenvector of $\Sigma$, the distribution is symmetric.[3] The first version of such a distribution was introduced in Pukkila and Rao (1988).[4]
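
The defining projection is straightforward to simulate. The following is a minimal sketch in Python with NumPy (the dimension, seed and parameter values are arbitrary choices for illustration): samples of $\mathcal{PN}_n(\mu,\Sigma)$ are obtained by drawing multivariate normal samples and normalizing them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example parameters for the generator normal in n = 3 dimensions.
mu = np.array([1.0, -0.5, 2.0])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.4],
                  [0.0, 0.4, 0.5]])

# Draw X ~ N_n(mu, Sigma) and project radially onto the unit sphere.
X = rng.multivariate_normal(mu, Sigma, size=10_000)
Y = X / np.linalg.norm(X, axis=1, keepdims=True)    # Y ~ PN_n(mu, Sigma)

# Every projected sample lies on the unit sphere.
assert np.allclose(np.linalg.norm(Y, axis=1), 1.0)
```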

Support

The support of this distribution is the unit $(n-1)$-sphere, which can be given either in terms of a set of $(n-1)$-dimensional angular spherical coordinates:

$\Theta = [0,\pi]^{n-2} \times [0,2\pi) \subset \mathbb{R}^{n-1}$

or in terms of $n$-dimensional Cartesian coordinates:

$\mathbb{S}^{n-1} = \{z \in \mathbb{R}^n : \|z\| = 1\} \subset \mathbb{R}^n$

The two are linked via the embedding function $e \colon \Theta \to \mathbb{R}^n$, with range $e(\Theta) = \mathbb{S}^{n-1}$. This function is defined by the formula for spherical coordinates at $r = 1$.

Density function

The density of the projected normal distribution $\mathcal{PN}_n(\mu,\Sigma)$ can be constructed from the density of its generator $n$-variate normal distribution $\mathcal{N}_n(\mu,\Sigma)$ by re-parametrising to $n$-dimensional spherical coordinates and then integrating over the radial coordinate.

In full spherical coordinates with radial component $r \in [0,\infty)$ and angles $\theta = (\theta_1,\dots,\theta_{n-1}) \in \Theta$, a point $x = (x_1,\dots,x_n) \in \mathbb{R}^n$ can be written as $x = rv$, with $v \in \mathbb{S}^{n-1}$. To be clear, $v = e(\theta)$, as given by the above-defined embedding function. The joint density becomes

$p(r,\theta \mid \mu,\Sigma) = r^{n-1}\,\mathcal{N}_n(rv \mid \mu,\Sigma) = \frac{r^{n-1}}{\sqrt{|\Sigma|}\,(2\pi)^{n/2}}\, e^{-\frac{1}{2}(rv-\mu)^\top \Sigma^{-1} (rv-\mu)}$

where the factor $r^{n-1}$ is due to the change of variables $x = rv$. The density of $\mathcal{PN}_n(\mu,\Sigma)$ can then be obtained via marginalization over $r$ as[5]

$p(\theta \mid \mu,\Sigma) = \int_0^{\infty} p(r,\theta \mid \mu,\Sigma)\, dr.$

The same density had been previously obtained in Pukkila and Rao (1988, Eq. (2.4))[4] using a different notation.
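
The marginalization over $r$ can be illustrated numerically. The sketch below (Python with SciPy; the parameter values are arbitrary) evaluates $p(\theta \mid \mu,\Sigma)$ for $n = 2$ by quadrature over the radial coordinate; since for $n = 2$ the angular coordinate directly measures arc length (see the pullback-measure examples below), the resulting density should integrate to one over $[0, 2\pi)$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_normal

# Arbitrary example parameters for n = 2.
mu = np.array([1.0, 0.5])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

def pn2_pdf_numerical(theta):
    """p(theta | mu, Sigma) obtained by quadrature over the radial coordinate r."""
    v = np.array([np.cos(theta), np.sin(theta)])
    integrand = lambda r: r * multivariate_normal.pdf(r * v, mean=mu, cov=Sigma)
    value, _ = quad(integrand, 0.0, np.inf)
    return value

# For n = 2 the angular coordinate measures arc length directly,
# so the marginal density integrates to one over [0, 2*pi).
total, _ = quad(pn2_pdf_numerical, 0.0, 2.0 * np.pi)
print(total)   # ~1.0
```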

Note on density definition

This subsection clarifies how the various forms of probability density used in this article are to be understood. Take for example a random variate $u \in (0,1]$ with uniform density, $p_U(u) = 1$. If $\ell = -\log u$, it has density $p_L(\ell) = e^{-\ell}$. This works if both densities are defined with respect to Lebesgue measure on the real line. By default convention:

  • Density functions are Lebesgue densities, defined with respect to Lebesgue measure, applied in the space where the argument of the density function lives, so that:
  • The Lebesgue densities involved in a change of variables are related by a factor dependent on the derivative(s) of the transformation ($d\ell/du = -e^{\ell}$ in this example; and $r^{n-1}$ for the above change of variables, $x = rv$).

Neither of these conventions applies to the $\mathcal{PN}_n$ densities in this article:

  • For $n \ge 3$ the density $p(\theta \mid \mu,\Sigma)$ is not defined w.r.t. Lebesgue measure in $\mathbb{R}^{n-1}$, where $\theta$ lives, because that measure does not agree with the standard notion of hyperspherical area. Instead, the density is defined w.r.t. a measure that is pulled back (via the embedding function) to angular coordinate space, from Lebesgue measure in the $(n-1)$-dimensional tangent space of the hypersphere. This will be explained below.
  • With the embedding $v = e(\theta)$, a density $\tilde p(v \mid \mu,\Sigma)$ cannot be defined w.r.t. Lebesgue measure, because $\mathbb{S}^{n-1} \subset \mathbb{R}^n$ has Lebesgue measure zero. Instead, $\tilde p$ is defined w.r.t. scaled Hausdorff measure.

The pullback and Hausdorff measures agree, so that:

$p(\theta \mid \mu,\Sigma) = \tilde p(v \mid \mu,\Sigma)$

where there is no change-of-variables factor, because the densities use different measures.

To better understand what is meant by a density being defined w.r.t. a measure (a function that maps subsets of sample space to a non-negative real-valued 'volume'), consider a measurable subset $U \subseteq \Theta$, with embedded image $V = e(U) \subseteq \mathbb{S}^{n-1}$, and let $v = e(\theta) \sim \mathcal{PN}_n$; then the probability of finding the sample in the subset is:

$P(\theta \in U) = \int_U p \, d\pi = P(v \in V) = \int_V \tilde p \, dh$

where $\pi, h$ are respectively the pullback and Hausdorff measures; and the integrals are Lebesgue integrals, which can be rewritten as Riemann integrals thus:

$\int_U p \, d\pi = \int_0^{\infty} \pi\big(\{\theta \in U : p(\theta) > t\}\big)\, dt \qquad (1)$

Pullback measure

The tangent space at $v \in \mathbb{S}^{n-1}$ is the $(n-1)$-dimensional linear subspace perpendicular to $v$, where Lebesgue measure can be used. At very small scale, the tangent space is indistinguishable from the sphere (e.g. Earth looks locally flat), so that Lebesgue measure in tangent space agrees with area on the hypersphere. The tangent-space Lebesgue measure is pulled back via the embedding function, as follows, to define the measure in coordinate space. For $U \subseteq \Theta$, a measurable subset in coordinate space, the pullback measure, written as a Riemann integral, is:

$\pi(U) = \int_U \sqrt{\big|\det\!\big(\mathbf{E}_\theta^\top \mathbf{E}_\theta\big)\big|}\; d\theta_1 \cdots d\theta_{n-1} \qquad (2)$

where the Jacobian of the embedding function $e(\theta)$ is the $n$-by-$(n-1)$ matrix $\mathbf{E}_\theta$, the columns of which span the $(n-1)$-dimensional tangent space where the Lebesgue measure is applied. It can be shown that $\sqrt{\big|\det\!\big(\mathbf{E}_\theta^\top \mathbf{E}_\theta\big)\big|} = \prod_{i=1}^{n-2} \sin^{n-1-i}(\theta_i)$. Plugging the pullback measure (2) into equation (1) and exchanging the order of integration gives:[6]

$P(\theta \in U) = \int_U p \, d\pi = \int_U p(\theta \mid \mu,\Sigma)\, \sqrt{\big|\det\!\big(\mathbf{E}_\theta^\top \mathbf{E}_\theta\big)\big|}\; d\theta_1 \cdots d\theta_{n-1}$

where the first integral is Lebesgue and the second Riemann. Finally, for better geometric understanding of the square-root factor, consider:

  • For $n = 2$, when integrating over the unit circle w.r.t. $\theta_1$, with embedding $e(\theta_1) = (\cos\theta_1, \sin\theta_1)$, the Jacobian is $\mathbf{E}_\theta = (-\sin\theta_1, \cos\theta_1)^\top$, so that $\sqrt{|\det(\mathbf{E}_\theta^\top \mathbf{E}_\theta)|} = 1$. The angular differential $d\theta_1$ directly gives the subtended arc length on the circle.
  • For $n = 3$, when integrating over the unit sphere w.r.t. $\theta_1, \theta_2$, we get $\sqrt{|\det(\mathbf{E}_\theta^\top \mathbf{E}_\theta)|} = \sin\theta_1$, which is the radius of the circle of latitude at $\theta_1$ (compare the equator to a polar circle). The area of the surface patch subtended by the two angular differentials is $\sin\theta_1\, d\theta_1\, d\theta_2$ (a numerical check of this factor is sketched after this list).
  • More generally, for $n \ge 2$, let $\mathbf{T}$ be a square or tall matrix and let $/\mathbf{T}/$ denote the parallelotope spanned by its columns (which represent the edges meeting at a common vertex). The parallelotope volume is $\sqrt{|\det(\mathbf{T}^\top \mathbf{T})|}$, the square root of the absolute value of the Gram determinant. For square $\mathbf{T}$, the volume simplifies to $|\det(\mathbf{T})|$. Now let $\mathbf{R} = \operatorname{diag}(d\theta_1,\dots,d\theta_{n-1})$, so that $/\mathbf{R}/ \subset \Theta$ is a rectangle with infinitesimally small volume $|\det(\mathbf{R})| = \prod_{i=1}^{n-1} d\theta_i$. Since the smooth embedding function is linear at small scale, the embedded image is the parallelotope $e(/\mathbf{R}/) = /\mathbf{E}_\theta \mathbf{R}/$, with volume (the area of the subtended hyperspherical surface patch): $\sqrt{|\det(\mathbf{R}^\top \mathbf{E}_\theta^\top \mathbf{E}_\theta \mathbf{R})|} = \sqrt{|\det(\mathbf{E}_\theta^\top \mathbf{E}_\theta)|}\; d\theta_1 \cdots d\theta_{n-1}$.
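
As a numerical illustration of the $n = 3$ example above, the sketch below approximates $\mathbf{E}_\theta$ by finite differences and compares $\sqrt{|\det(\mathbf{E}_\theta^\top \mathbf{E}_\theta)|}$ with $\sin\theta_1$. The particular embedding $e(\theta_1,\theta_2) = (\sin\theta_1\cos\theta_2,\, \sin\theta_1\sin\theta_2,\, \cos\theta_1)$ is an assumed angle ordering, chosen to be consistent with the factor $\sin\theta_1$ quoted above.

```python
import numpy as np

def embed(theta):
    """Spherical embedding e: Theta -> S^2 for n = 3, with theta1 the polar angle."""
    t1, t2 = theta
    return np.array([np.sin(t1) * np.cos(t2),
                     np.sin(t1) * np.sin(t2),
                     np.cos(t1)])

def embedding_jacobian(theta, eps=1e-6):
    """Finite-difference approximation of the 3-by-2 Jacobian E_theta."""
    E = np.zeros((3, 2))
    for j in range(2):
        step = np.zeros(2)
        step[j] = eps
        E[:, j] = (embed(theta + step) - embed(theta - step)) / (2.0 * eps)
    return E

theta = np.array([0.7, 1.9])                        # arbitrary test point
E = embedding_jacobian(theta)
gram_factor = np.sqrt(abs(np.linalg.det(E.T @ E)))
print(gram_factor, np.sin(theta[0]))                # both ~ sin(theta1)
```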

Circular distribution

For $n = 2$, parametrising the position on the unit circle in polar coordinates as $v = (\cos\theta, \sin\theta)$, the density function can be written with respect to the parameters $\mu$ and $\Sigma$ of the initial normal distribution as

$p(\theta \mid \mu,\Sigma) = \frac{e^{-\frac{1}{2}\mu^\top \Sigma^{-1} \mu}}{2\pi \sqrt{|\Sigma|}\; v^\top \Sigma^{-1} v} \left(1 + T(\theta)\,\frac{\Phi(T(\theta))}{\phi(T(\theta))}\right) I_{[0,2\pi)}(\theta)$

where $\phi$ and $\Phi$ are the density and cumulative distribution function of a standard normal distribution, $T(\theta) = \frac{v^\top \Sigma^{-1} \mu}{\sqrt{v^\top \Sigma^{-1} v}}$, and $I$ is the indicator function.[3]
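
A small numerical sanity check of this closed form (Python with SciPy; the parameter values are arbitrary): the density should integrate to one over $[0, 2\pi)$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Arbitrary example parameters.
mu = np.array([1.0, 0.5])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
Sigma_inv = np.linalg.inv(Sigma)

def pn2_pdf(theta):
    """Closed-form PN_2 density on [0, 2*pi), following the formula above."""
    v = np.array([np.cos(theta), np.sin(theta)])
    a = v @ Sigma_inv @ v
    T = (v @ Sigma_inv @ mu) / np.sqrt(a)
    const = np.exp(-0.5 * mu @ Sigma_inv @ mu) / (2.0 * np.pi * np.sqrt(np.linalg.det(Sigma)) * a)
    return const * (1.0 + T * norm.cdf(T) / norm.pdf(T))

total, _ = quad(pn2_pdf, 0.0, 2.0 * np.pi)
print(total)   # ~1.0
```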

In the circular case, if the mean vector $\mu$ is parallel to the eigenvector associated with the largest eigenvalue of the covariance, the distribution is symmetric and has a mode at $\theta = \alpha$ and either a mode or an antimode at $\theta = \alpha + \pi$, where $\alpha$ is the polar angle of $\mu = (r\cos\alpha, r\sin\alpha)$. If the mean is instead parallel to the eigenvector associated with the smallest eigenvalue, the distribution is also symmetric but has either a mode or an antimode at $\theta = \alpha$ and an antimode at $\theta = \alpha + \pi$.[7]

Spherical distribution

For $n = 3$, parametrising the position on the unit sphere in spherical coordinates as $v = (\cos\theta_1 \sin\theta_2,\, \sin\theta_1 \sin\theta_2,\, \cos\theta_2)$, where $\theta = (\theta_1, \theta_2)$ are the azimuth $\theta_1 \in [0,2\pi)$ and inclination $\theta_2 \in [0,\pi]$ angles respectively, the density function becomes

$p(\theta \mid \mu,\Sigma) = \frac{e^{-\frac{1}{2}\mu^\top \Sigma^{-1} \mu}}{\sqrt{|\Sigma|}\,\left(2\pi\, v^\top \Sigma^{-1} v\right)^{3/2}} \left(\frac{\Phi(T(\theta))}{\phi(T(\theta))} + T(\theta)\left(1 + T(\theta)\,\frac{\Phi(T(\theta))}{\phi(T(\theta))}\right)\right) I_{[0,2\pi)}(\theta_1)\, I_{[0,\pi]}(\theta_2)$

where $\phi$, $\Phi$, $T$, and $I$ have the same meaning as in the circular case.[8]
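
A corresponding sanity check for $n = 3$ (arbitrary example parameters): integrating the closed-form density against the pullback measure, whose factor is $\sin\theta_2$ in this azimuth-inclination parametrization, should give one.

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.stats import norm

# Arbitrary example parameters.
mu = np.array([0.5, -1.0, 1.5])
Sigma = np.diag([1.0, 0.8, 1.3])
Sigma_inv = np.linalg.inv(Sigma)

def pn3_pdf(theta1, theta2):
    """Closed-form PN_3 density in azimuth/inclination coordinates, as above."""
    v = np.array([np.cos(theta1) * np.sin(theta2),
                  np.sin(theta1) * np.sin(theta2),
                  np.cos(theta2)])
    a = v @ Sigma_inv @ v
    T = (v @ Sigma_inv @ mu) / np.sqrt(a)
    const = np.exp(-0.5 * mu @ Sigma_inv @ mu) / (np.sqrt(np.linalg.det(Sigma)) * (2.0 * np.pi * a) ** 1.5)
    ratio = norm.cdf(T) / norm.pdf(T)
    return const * (ratio + T * (1.0 + T * ratio))

# Integrate against the pullback measure (factor sin(theta2)); the result should be ~1.
total, _ = dblquad(lambda t2, t1: pn3_pdf(t1, t2) * np.sin(t2),
                   0.0, 2.0 * np.pi,                   # theta1 range
                   lambda t1: 0.0, lambda t1: np.pi)   # theta2 range
print(total)   # ~1.0
```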

Angular central Gaussian distribution

In the special case $\mu = \mathbf{0}$, the projected normal distribution with $n \ge 2$ is known as the angular central Gaussian (ACG)[9], and in this case the density function can be obtained in closed form as a function of Cartesian coordinates. Let $\mathbf{x} \sim \mathcal{N}_n(\mathbf{0},\Sigma)$ and project radially: $\mathbf{v} = \|\mathbf{x}\|^{-1}\mathbf{x}$, so that $\mathbf{v} \in \mathbb{S}^{n-1} = \{\mathbf{z} \in \mathbb{R}^n : \|\mathbf{z}\| = 1\}$ (the unit hypersphere). We write $\mathbf{v} \sim \mathrm{ACG}(\Sigma)$, which, as explained above, has density at $\mathbf{v} = e(\theta)$:

$\tilde p_{\mathrm{ACG}}(\mathbf{v} \mid \Sigma) = p(\theta \mid \mathbf{0},\Sigma) = \int_0^{\infty} r^{n-1}\, \mathcal{N}_n(r\mathbf{v} \mid \mathbf{0},\Sigma)\, dr = \frac{\Gamma\!\left(\tfrac{n}{2}\right)}{2\pi^{n/2}\, \sqrt{|\Sigma|}\, \left(\mathbf{v}^\top \Sigma^{-1} \mathbf{v}\right)^{n/2}}$

where the integral can be solved by a change of variables and then using the standard definition of the gamma function. Notice that:

  • For any $k > 0$ there is the parameter indeterminacy $\tilde p_{\mathrm{ACG}}(\mathbf{v} \mid k\Sigma) = \tilde p_{\mathrm{ACG}}(\mathbf{v} \mid \Sigma)$.
  • In particular, $\tilde p_{\mathrm{ACG}}(\mathbf{v} \mid k\mathbf{I}_n) = p_{\mathrm{uniform}} = \frac{\Gamma(n/2)}{2\pi^{n/2}}$, the uniform density on the hypersphere. Both properties are checked in the sketch below.
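
A short sketch of the closed-form ACG density and of the two properties just listed (the scale matrix and evaluation point are arbitrary examples):

```python
import numpy as np
from scipy.special import gamma

def acg_pdf(v, Sigma):
    """Closed-form ACG density on S^(n-1), in Cartesian coordinates."""
    n = len(v)
    Sigma_inv = np.linalg.inv(Sigma)
    const = gamma(n / 2.0) / (2.0 * np.pi ** (n / 2.0))
    return const / (np.sqrt(np.linalg.det(Sigma)) * (v @ Sigma_inv @ v) ** (n / 2.0))

rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.2],
                  [0.0, 0.2, 0.7]])                 # arbitrary example scale matrix
v = rng.normal(size=3)
v /= np.linalg.norm(v)                              # arbitrary point on the sphere

# Scale indeterminacy: k*Sigma gives the same density as Sigma.
print(acg_pdf(v, Sigma), acg_pdf(v, 5.0 * Sigma))

# With Sigma = I_n the ACG reduces to the uniform density Gamma(n/2) / (2*pi^(n/2)).
print(acg_pdf(v, np.eye(3)), gamma(1.5) / (2.0 * np.pi ** 1.5))
```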

ACG via transformation of normal or uniform variates

Let $\mathbf{T}$ be any $n$-by-$n$ invertible matrix such that $\mathbf{T}\mathbf{T}^\top = \Sigma$. Let $\mathbf{u} \sim \mathrm{ACG}(\mathbf{I}_n)$ (uniform) and $s \sim \chi(n)$ (chi distribution), independent of $\mathbf{u}$, so that $\mathbf{x} = s\mathbf{T}\mathbf{u} \sim \mathcal{N}_n(\mathbf{0},\Sigma)$ (multivariate normal). Now consider:

$\mathbf{v} = \frac{\mathbf{T}\mathbf{u}}{\|\mathbf{T}\mathbf{u}\|} = \frac{\mathbf{x}}{\|\mathbf{x}\|} \sim \mathrm{ACG}(\Sigma)$

which shows that the ACG distribution also results from applying, to uniform variates, the normalized linear transform:[9]

$f_{\mathbf{T}}(\mathbf{u}) = \frac{\mathbf{T}\mathbf{u}}{\|\mathbf{T}\mathbf{u}\|}$

Some further explanation of these two ways to obtain $\mathbf{v} \sim \mathrm{ACG}(\Sigma)$ may be helpful:

  • If we start with $\mathbf{x} \in \mathbb{R}^n$, sampled from a multivariate normal, we can project radially onto $\mathbb{S}^{n-1}$ to obtain ACG variates. To derive the ACG density, we first do a change of variables, $\mathbf{x} \to (r, \mathbf{v})$, which is still an $n$-dimensional representation; this transformation induces the differential volume change factor $r^{n-1}$, which is proportional to volume in the $(n-1)$-dimensional tangent space perpendicular to $\mathbf{x}$. Then, to finally obtain the ACG density on the $(n-1)$-dimensional unit sphere, we need to marginalize over $r$.
  • If we start with $\mathbf{u} \in \mathbb{S}^{n-1}$, sampled from the uniform distribution, we do not need to marginalize, because we are already in $n-1$ dimensions. Instead, to obtain ACG variates (and the associated density), we can directly do the change of variables $\mathbf{v} = f_{\mathbf{T}}(\mathbf{u})$, for which further details are given in the next subsection. Both routes are compared in the sketch below.
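
A minimal sketch of the two equivalent routes to $\mathbf{v} \sim \mathrm{ACG}(\Sigma)$ described above (the Cholesky factor is used as one admissible choice of $\mathbf{T}$; the scale matrix and seed are arbitrary). With the same $\mathbf{u}$ and $s$, both constructions give exactly the same point, because the positive radius $s$ cancels under normalization.

```python
import numpy as np
from scipy.stats import chi

rng = np.random.default_rng(2)
n = 3
Sigma = np.array([[1.5, 0.4, 0.1],
                  [0.4, 1.0, 0.3],
                  [0.1, 0.3, 0.8]])                 # arbitrary example scale matrix
T = np.linalg.cholesky(Sigma)                       # any T with T @ T.T == Sigma works

# Route 1: uniform u on the sphere and radius s ~ chi(n), giving x = s*T*u ~ N_n(0, Sigma).
u = rng.normal(size=n)
u /= np.linalg.norm(u)                              # u ~ ACG(I_n), i.e. uniform on S^(n-1)
s = chi.rvs(df=n, random_state=rng)
x = s * (T @ u)
v_from_projection = x / np.linalg.norm(x)

# Route 2: normalized linear transform applied directly to the uniform variate u.
v_from_transform = (T @ u) / np.linalg.norm(T @ u)

# The two routes give the same point on the sphere: the positive radius s cancels.
print(np.allclose(v_from_projection, v_from_transform))   # True
```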

Caveat: when $\mu$ is nonzero, although $s\mathbf{T}\mathbf{u} + \mu \sim \mathcal{N}_n(\mu,\Sigma)$, a similar duality does not hold:

$\frac{\mathbf{T}\mathbf{u}+\mu}{\|\mathbf{T}\mathbf{u}+\mu\|} \;\overset{d}{\ne}\; \frac{s\mathbf{T}\mathbf{u}+\mu}{\|s\mathbf{T}\mathbf{u}+\mu\|} \sim \mathcal{PN}_n(\mu,\Sigma)$

Although we can radially project affine-transformed normal variates to get $\mathcal{PN}_n$ variates, this does not work for uniform variates.

Wider application of the normalized linear transform

The normalized linear transform, $\mathbf{v} = f_{\mathbf{T}}(\mathbf{u})$, is a bijection from the unit sphere to itself; the inverse is $\mathbf{u} = f_{\mathbf{T}^{-1}}(\mathbf{v})$. This transform is of independent interest, as it may be applied as a probabilistic flow on the hypersphere (similar to a normalizing flow) to generalize other (non-uniform) distributions on hyperspheres, for example the von Mises-Fisher distribution. The fact that we have a closed form for the ACG density allows us to recover, also in closed form, the differential volume change induced by this transform.

For the change of variables $\mathbf{v} = f_{\mathbf{T}}(\mathbf{u})$ on the manifold $\mathbb{S}^{n-1}$, the uniform and ACG densities are related as:[6]

$\tilde p_{\mathrm{ACG}}(\mathbf{v} \mid \Sigma) = \frac{p_{\mathrm{uniform}}}{R(\mathbf{v},\Sigma)}$

where the (constant) uniform density is $p_{\mathrm{uniform}} = \frac{\Gamma(n/2)}{2\pi^{n/2}}$ and where $R(\mathbf{v},\Sigma)$ is the differential volume change factor from the input to the output of the transformation; specifically, it is given by the absolute value of the determinant of an $(n-1)$-by-$(n-1)$ matrix:

$R(\mathbf{v},\Sigma) = \left|\det\!\big(\mathbf{Q}_{\mathbf{v}}^\top \mathbf{J}_{\mathbf{u}} \mathbf{Q}_{\mathbf{u}}\big)\right|$

where $\mathbf{J}_{\mathbf{u}}$ is the $n$-by-$n$ Jacobian matrix of the transformation in Euclidean space, $f_{\mathbf{T}} \colon \mathbb{R}^n \to \mathbb{R}^n$, evaluated at $\mathbf{u}$. In Euclidean space the transformation and its Jacobian are non-invertible, but when the domain and co-domain are restricted to $\mathbb{S}^{n-1}$, then $f_{\mathbf{T}} \colon \mathbb{S}^{n-1} \to \mathbb{S}^{n-1}$ is a bijection and the induced differential volume ratio, $R(\mathbf{v},\Sigma)$, is obtained by projecting $\mathbf{J}_{\mathbf{u}}$ onto the $(n-1)$-dimensional tangent spaces at the transformation input and output: $\mathbf{Q}_{\mathbf{u}}, \mathbf{Q}_{\mathbf{v}}$ are $n$-by-$(n-1)$ matrices whose orthonormal columns span those tangent spaces. Although the above determinant formula is relatively easy to evaluate numerically on a software platform equipped with linear algebra and automatic differentiation, a simple closed form is hard to derive directly. However, since we already have $\tilde p_{\mathrm{ACG}}$, we can recover:

$R(\mathbf{v},\Sigma) = \sqrt{|\Sigma|}\,\left(\mathbf{v}^\top \Sigma^{-1} \mathbf{v}\right)^{n/2} = \frac{|\det \mathbf{T}|}{\|\mathbf{T}\mathbf{u}\|^{n}}$

where in the final RHS it is understood that $\Sigma = \mathbf{T}\mathbf{T}^\top$ and $\mathbf{u} = f_{\mathbf{T}^{-1}}(\mathbf{v})$.
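
The agreement between the projected-determinant formula and the closed form can be checked numerically. The sketch below uses the analytic Euclidean Jacobian $\mathbf{J}_{\mathbf{u}} = (\mathbf{I} - \mathbf{v}\mathbf{v}^\top)\mathbf{T}/\|\mathbf{T}\mathbf{u}\|$ (obtained by the quotient rule; this explicit expression is an addition, not stated above) together with tangent-space bases computed as null spaces; the matrix $\mathbf{T}$ and the test point are arbitrary.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(3)
n = 4
T = rng.normal(size=(n, n))                         # arbitrary (almost surely invertible) transform
Sigma = T @ T.T
Sigma_inv = np.linalg.inv(Sigma)

u = rng.normal(size=n)
u /= np.linalg.norm(u)                              # arbitrary input point on S^(n-1)
Tu = T @ u
v = Tu / np.linalg.norm(Tu)                         # output of the normalized linear transform

# Euclidean Jacobian of f_T(u) = T u / ||T u|| at u (quotient rule): (I - v v^T) T / ||T u||.
J = (np.eye(n) - np.outer(v, v)) @ T / np.linalg.norm(Tu)

# Orthonormal bases of the tangent spaces at the input u and the output v.
Q_u = null_space(u[None, :])
Q_v = null_space(v[None, :])

R_determinant = abs(np.linalg.det(Q_v.T @ J @ Q_u))

# Closed form: sqrt(|Sigma|) (v' Sigma^{-1} v)^{n/2} = |det T| / ||T u||^n.
R_closed = np.sqrt(np.linalg.det(Sigma)) * (v @ Sigma_inv @ v) ** (n / 2.0)
print(R_determinant, R_closed, abs(np.linalg.det(T)) / np.linalg.norm(Tu) ** n)
```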

The normalized linear transform can now be used, for example, to give a closed-form density for a more flexible distribution on the hypersphere, obtained by generalizing the von Mises-Fisher distribution. Let $\mathbf{x} \sim \mathrm{VMF}(\mu,\kappa)$ and $\mathbf{v} = f_{\mathbf{T}}(\mathbf{x})$; the resulting density is:

$p(\mathbf{v} \mid \mu,\kappa,\mathbf{T}) = \frac{\tilde p_{\mathrm{VMF}}\!\left(f_{\mathbf{T}^{-1}}(\mathbf{v}) \mid \mu,\kappa\right)}{R(\mathbf{v},\,\mathbf{T}\mathbf{T}^\top)}$
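
A sketch of this generalized von Mises-Fisher density, assuming SciPy ≥ 1.11 (which provides scipy.stats.vonmises_fisher); the parameters $\mu$, $\kappa$ and $\mathbf{T}$ are arbitrary examples. The Monte Carlo check estimates the integral of the density over the sphere, which should come out close to one.

```python
import numpy as np
from scipy.stats import vonmises_fisher

rng = np.random.default_rng(4)
n = 3
mu = np.array([0.0, 0.0, 1.0])                      # VMF mean direction (example)
kappa = 4.0                                         # VMF concentration (example)
T = np.array([[1.2, 0.3, 0.0],
              [0.0, 0.9, 0.2],
              [0.1, 0.0, 1.1]])                     # arbitrary invertible transform
T_inv = np.linalg.inv(T)

def normalized_linear(M, x):
    """Apply f_M row-wise: M x / ||M x||."""
    y = x @ M.T
    return y / np.linalg.norm(y, axis=-1, keepdims=True)

def flow_pdf(v):
    """Density of v = f_T(x) with x ~ VMF(mu, kappa), via the change-of-variables formula."""
    u = normalized_linear(T_inv, v)                 # inverse transform f_{T^{-1}}
    Tu = u @ T.T
    R = abs(np.linalg.det(T)) / np.linalg.norm(Tu, axis=-1) ** n
    return vonmises_fisher(mu, kappa).pdf(u) / R

# Monte Carlo normalization check: average over uniform points times sphere area 4*pi ~ 1.
z = rng.normal(size=(100_000, n))
z /= np.linalg.norm(z, axis=1, keepdims=True)
print(flow_pdf(z).mean() * 4.0 * np.pi)             # ~1.0
```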

See also

References

Sources

  • Pukkila, Tarmo M.; Rao, C. Radhakrishna (1988). "Pattern recognition based on scale invariant discriminant functions". Information Sciences 45 (3): 379–389. doi:10.1016/0020-0255(88)90012-6. 
  • Hernandez-Stumpfhauser, Daniel; Breidt, F. Jay; van der Woerd, Mark J. (2017). "The General Projected Normal Distribution of Arbitrary Dimension: Modeling and Bayesian Inference". Bayesian Analysis 12 (1): 113–133. doi:10.1214/15-BA989. 
  • Wang, Fangpo; Gelfand, Alan E (2013). "Directional data analysis under the general projected normal distribution". Statistical Methodology (Elsevier) 10 (1): 113–127. doi:10.1016/j.stamet.2012.07.005. PMID 24046539. 
  • Tyler, David E (1987). "Statistical analysis for the angular central Gaussian distribution on the sphere". Biometrika 74 (3): 579–589. doi:10.2307/2336697. 
  • Sorrenson, Peter; Draxler, Felix; Rousselot, Armand; Hummerich, Sander; Köthe, Ullrich (2024). "Learning Distributions on Manifolds with Free-Form Flows". arXiv:2312.09852 [cs.LG].