Hellinger distance
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909.[1][2]
It is sometimes called the Jeffreys distance.[3][4]
Definition
Measure theory
To define the Hellinger distance in terms of measure theory, let [math]\displaystyle{ P }[/math] and [math]\displaystyle{ Q }[/math] denote two probability measures on a measure space [math]\displaystyle{ \mathcal{X} }[/math] that are absolutely continuous with respect to an auxiliary measure [math]\displaystyle{ \lambda }[/math]. Such a measure always exists; for example, one can take [math]\displaystyle{ \lambda = (P + Q) }[/math]. The square of the Hellinger distance between [math]\displaystyle{ P }[/math] and [math]\displaystyle{ Q }[/math] is defined as the quantity
- [math]\displaystyle{ H^2(P,Q) = \frac{1}{2}\displaystyle \int_{\mathcal{X}} \left(\sqrt{p(x)} - \sqrt{q(x)}\right)^2 \lambda(dx). }[/math]
Here, [math]\displaystyle{ P(dx) = p(x)\lambda(dx) }[/math] and [math]\displaystyle{ Q(dx) = q(x) \lambda(dx) }[/math], i.e. [math]\displaystyle{ p }[/math] and [math]\displaystyle{ q }[/math] are the Radon–Nikodym derivatives of P and Q, respectively, with respect to [math]\displaystyle{ \lambda }[/math]. This definition does not depend on [math]\displaystyle{ \lambda }[/math]: the Hellinger distance between P and Q does not change if [math]\displaystyle{ \lambda }[/math] is replaced with a different measure with respect to which both P and Q are absolutely continuous. For compactness, the above formula is often written as
- [math]\displaystyle{ H^2(P,Q) = \frac{1}{2}\int_{\mathcal{X}} \left(\sqrt{P(dx)} - \sqrt{Q(dx)}\right)^2. }[/math]
Probability theory using Lebesgue measure
To define the Hellinger distance in terms of elementary probability theory, we take λ to be the Lebesgue measure, so that dP / dλ and dQ / dλ are simply probability density functions. If we denote the densities as f and g, respectively, the squared Hellinger distance can be expressed as a standard calculus integral
- [math]\displaystyle{ H^2(f,g) =\frac{1}{2}\int \left(\sqrt{f(x)} - \sqrt{g(x)}\right)^2 \, dx = 1 - \int \sqrt{f(x) g(x)} \, dx, }[/math]
where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals 1.
The Hellinger distance H(P, Q) satisfies the property (derivable from the Cauchy–Schwarz inequality)
- [math]\displaystyle{ 0\le H(P,Q) \le 1. }[/math]
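The density form above lends itself to direct numerical evaluation. Below is a minimal sketch assuming NumPy and SciPy; the helper name hellinger_sq and the choice of a standard normal and a standard Laplace density are purely illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import laplace, norm

def hellinger_sq(f, g, lo=-np.inf, hi=np.inf):
    """Squared Hellinger distance between densities f and g on (lo, hi)."""
    integrand = lambda x: (np.sqrt(f(x)) - np.sqrt(g(x))) ** 2
    value, _ = quad(integrand, lo, hi)
    return 0.5 * value

# Illustrative densities: standard normal vs. standard Laplace.
H2 = hellinger_sq(norm(0, 1).pdf, laplace(0, 1).pdf)
print(H2, np.sqrt(H2))  # the squared distance and the distance, both in [0, 1]
```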
Discrete distributions
For two discrete probability distributions [math]\displaystyle{ P=(p_1, \ldots, p_k) }[/math] and [math]\displaystyle{ Q=(q_1, \ldots, q_k) }[/math], their Hellinger distance is defined as
- [math]\displaystyle{ H(P, Q) = \frac{1}{\sqrt{2}} \; \sqrt{\sum_{i=1}^k (\sqrt{p_i} - \sqrt{q_i})^2}, }[/math]
which is directly related to the Euclidean norm of the difference of the square root vectors, i.e.
- [math]\displaystyle{ H(P, Q) = \frac{1}{\sqrt{2}} \; \bigl\|\sqrt{P} - \sqrt{Q} \bigr\|_2 . }[/math]
Also, [math]\displaystyle{ 1 - H^2(P,Q) = \sum_{i=1}^k \sqrt{p_i q_i}. }[/math]
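As a quick sketch (assuming NumPy; the distributions p and q are arbitrary examples), the discrete formula reduces to a scaled Euclidean norm of the square-root vectors, and 1 − H² recovers the sum of [math]\displaystyle{ \sqrt{p_i q_i} }[/math].

```python
import numpy as np

def hellinger_discrete(p, q):
    """Hellinger distance between two discrete distributions given as 1-D arrays."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

p = [0.2, 0.5, 0.3]
q = [0.1, 0.4, 0.5]
H = hellinger_discrete(p, q)
bc = np.sum(np.sqrt(np.asarray(p) * np.asarray(q)))
print(H, 1 - H**2, bc)  # 1 - H^2 equals the sum of sqrt(p_i * q_i)
```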
Properties
The Hellinger distance forms a bounded metric on the space of probability distributions over a given probability space.
The maximum distance 1 is achieved when P assigns probability zero to every set to which Q assigns a positive probability, and vice versa.
Sometimes the factor [math]\displaystyle{ 1/2 }[/math] in front of the integral is omitted, in which case the Hellinger distance ranges from zero to the square root of two.
The Hellinger distance is related to the Bhattacharyya coefficient [math]\displaystyle{ BC(P,Q) }[/math]: it can be written as
- [math]\displaystyle{ H(P,Q) = \sqrt{1 - BC(P,Q)}. }[/math]
Hellinger distances are used in the theory of sequential and asymptotic statistics.[5][6]
The squared Hellinger distance between two normal distributions [math]\displaystyle{ P \sim \mathcal{N}(\mu_1,\sigma_1^2) }[/math] and [math]\displaystyle{ Q \sim \mathcal{N}(\mu_2,\sigma_2^2) }[/math] is:
- [math]\displaystyle{ H^2(P, Q) = 1 - \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}} \, e^{-\frac{1}{4}\frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}}. }[/math]
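As a hedged check (assuming SciPy; the function name and parameter values are illustrative), the closed form can be compared against direct numerical integration of the density definition:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def hellinger_sq_normal(mu1, s1, mu2, s2):
    """Closed-form squared Hellinger distance between N(mu1, s1^2) and N(mu2, s2^2)."""
    return 1.0 - np.sqrt(2.0 * s1 * s2 / (s1**2 + s2**2)) * np.exp(
        -0.25 * (mu1 - mu2) ** 2 / (s1**2 + s2**2))

mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0
closed = hellinger_sq_normal(mu1, s1, mu2, s2)
numeric, _ = quad(lambda x: 0.5 * (np.sqrt(norm(mu1, s1).pdf(x))
                                   - np.sqrt(norm(mu2, s2).pdf(x))) ** 2,
                  -np.inf, np.inf)
print(closed, numeric)  # the two values agree up to integration tolerance
```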
The squared Hellinger distance between two multivariate normal distributions [math]\displaystyle{ P \sim \mathcal{N}(\mu_1,\Sigma_1) }[/math] and [math]\displaystyle{ Q \sim \mathcal{N}(\mu_2,\Sigma_2) }[/math] is [7]
- [math]\displaystyle{ H^2(P, Q) = 1 - \frac{ \det (\Sigma_1)^{1/4} \det (\Sigma_2) ^{1/4}} { \det \left( \frac{\Sigma_1 + \Sigma_2}{2}\right)^{1/2} } \exp\left\{-\frac{1}{8}(\mu_1 - \mu_2)^T \left(\frac{\Sigma_1 + \Sigma_2}{2}\right)^{-1} (\mu_1 - \mu_2) \right\} }[/math]
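A sketch of the multivariate formula in NumPy (the function name and parameter values are illustrative; log-determinants are used for numerical stability):

```python
import numpy as np

def hellinger_sq_mvn(mu1, S1, mu2, S2):
    """Squared Hellinger distance between N(mu1, S1) and N(mu2, S2)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    S1, S2 = np.asarray(S1, dtype=float), np.asarray(S2, dtype=float)
    S = 0.5 * (S1 + S2)
    diff = mu1 - mu2
    # det(S1)^{1/4} det(S2)^{1/4} / det(S)^{1/2}, computed via log-determinants
    _, ld1 = np.linalg.slogdet(S1)
    _, ld2 = np.linalg.slogdet(S2)
    _, ld = np.linalg.slogdet(S)
    log_ratio = 0.25 * ld1 + 0.25 * ld2 - 0.5 * ld
    quad_form = diff @ np.linalg.solve(S, diff)
    return 1.0 - np.exp(log_ratio - 0.125 * quad_form)

mu1, S1 = [0.0, 0.0], [[1.0, 0.2], [0.2, 1.0]]
mu2, S2 = [1.0, -1.0], [[2.0, 0.0], [0.0, 0.5]]
print(hellinger_sq_mvn(mu1, S1, mu2, S2))
```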
The squared Hellinger distance between two exponential distributions [math]\displaystyle{ P \sim \mathrm{Exp}(\alpha) }[/math] and [math]\displaystyle{ Q \sim \mathrm{Exp}(\beta) }[/math] is:
- [math]\displaystyle{ H^2(P, Q) = 1 - \frac{2 \sqrt{\alpha \beta}}{\alpha + \beta}. }[/math]
The squared Hellinger distance between two Weibull distributions [math]\displaystyle{ P \sim \mathrm{W}(k,\alpha) }[/math] and [math]\displaystyle{ Q \sim \mathrm{W}(k,\beta) }[/math], where [math]\displaystyle{ k }[/math] is a common shape parameter and [math]\displaystyle{ \alpha\, , \beta }[/math] are the respective scale parameters, is:
- [math]\displaystyle{ H^2(P, Q) = 1 - \frac{2 (\alpha \beta)^{k/2}}{\alpha^k + \beta^k}. }[/math]
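A hedged numerical check of the Weibull formula, assuming SciPy's weibull_min and treating [math]\displaystyle{ \alpha, \beta }[/math] as scale parameters in the convention [math]\displaystyle{ f(x) = (k/\alpha)(x/\alpha)^{k-1} e^{-(x/\alpha)^k} }[/math]; setting [math]\displaystyle{ k = 1 }[/math] recovers the exponential case above.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import weibull_min

def hellinger_sq_weibull(k, a, b):
    """Closed-form squared Hellinger distance for Weibull(k, a) vs Weibull(k, b)."""
    return 1.0 - 2.0 * (a * b) ** (k / 2) / (a**k + b**k)

k, a, b = 1.5, 1.0, 2.0
closed = hellinger_sq_weibull(k, a, b)
numeric, _ = quad(lambda x: 0.5 * (np.sqrt(weibull_min(k, scale=a).pdf(x))
                                   - np.sqrt(weibull_min(k, scale=b).pdf(x))) ** 2,
                  0, np.inf)
print(closed, numeric)  # should agree up to integration error
```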
The squared Hellinger distance between two Poisson distributions with rate parameters [math]\displaystyle{ \alpha }[/math] and [math]\displaystyle{ \beta }[/math], so that [math]\displaystyle{ P \sim \mathrm{Poisson}(\alpha) }[/math] and [math]\displaystyle{ Q \sim \mathrm{Poisson}(\beta) }[/math], is:
- [math]\displaystyle{ H^2(P,Q) = 1-e^{-\frac{1}{2} (\sqrt{\alpha} - \sqrt{\beta})^2}. }[/math]
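A short sketch (assuming SciPy; the rate values are arbitrary) verifying the Poisson expression by summing [math]\displaystyle{ \sqrt{p_i q_i} }[/math] over a truncated support:

```python
import numpy as np
from scipy.stats import poisson

alpha, beta = 3.0, 5.0
closed = 1.0 - np.exp(-0.5 * (np.sqrt(alpha) - np.sqrt(beta)) ** 2)

ks = np.arange(0, 200)  # truncation is effectively exact for these rates
bc = np.sum(np.sqrt(poisson.pmf(ks, alpha) * poisson.pmf(ks, beta)))
print(closed, 1.0 - bc)  # both expressions give the squared Hellinger distance
```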
The squared Hellinger distance between two beta distributions [math]\displaystyle{ P \sim \text{Beta}(a_1,b_1) }[/math] and [math]\displaystyle{ Q \sim \text{Beta}(a_2, b_2) }[/math] is:
- [math]\displaystyle{ H^2(P,Q) = 1 - \frac{B\left(\frac{a_1 + a_2}{2}, \frac{b_1 + b_2}{2}\right)}{\sqrt{B(a_1, b_1) B(a_2, b_2)}} }[/math]
where [math]\displaystyle{ B }[/math] is the beta function.
The squared Hellinger distance between two gamma distributions [math]\displaystyle{ P \sim \text{Gamma}(a_1,b_1) }[/math] and [math]\displaystyle{ Q \sim \text{Gamma}(a_2, b_2) }[/math] is:
- [math]\displaystyle{ H^2(P,Q) = 1 - \Gamma\left({\scriptstyle\frac{a_1 + a_2}{2}}\right)\left(\frac{b_1+b_2}{2}\right)^{-(a_1+a_2)/2}\sqrt{\frac{b_1^{a_1}b_2^{a_2}}{\Gamma(a_1)\Gamma(a_2)}} }[/math]
where [math]\displaystyle{ \Gamma }[/math] is the gamma function.
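The beta and gamma expressions can be evaluated in log space with scipy.special for numerical stability. In the sketch below the function names and parameter values are illustrative, and [math]\displaystyle{ b_1, b_2 }[/math] are taken to be rate parameters, the convention consistent with the gamma formula above.

```python
import numpy as np
from scipy.special import betaln, gammaln

def hellinger_sq_beta(a1, b1, a2, b2):
    """Squared Hellinger distance between Beta(a1, b1) and Beta(a2, b2)."""
    log_bc = betaln((a1 + a2) / 2, (b1 + b2) / 2) - 0.5 * (betaln(a1, b1) + betaln(a2, b2))
    return 1.0 - np.exp(log_bc)

def hellinger_sq_gamma(a1, b1, a2, b2):
    """Squared Hellinger distance between Gamma(a1, b1) and Gamma(a2, b2), rates b1, b2."""
    log_bc = (gammaln((a1 + a2) / 2)
              - 0.5 * (gammaln(a1) + gammaln(a2))
              + 0.5 * (a1 * np.log(b1) + a2 * np.log(b2))
              - 0.5 * (a1 + a2) * np.log((b1 + b2) / 2))
    return 1.0 - np.exp(log_bc)

print(hellinger_sq_beta(2.0, 3.0, 4.0, 1.5))
print(hellinger_sq_gamma(2.0, 1.0, 3.0, 2.0))
```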
Connection with total variation distance
The Hellinger distance [math]\displaystyle{ H(P,Q) }[/math] and the total variation distance (or statistical distance) [math]\displaystyle{ \delta(P,Q) }[/math] are related as follows:[8]
- [math]\displaystyle{ H^2(P,Q) \leq \delta(P,Q) \leq \sqrt{2}H(P,Q)\,. }[/math]
The constants in this inequality may change depending on which normalization is chosen ([math]\displaystyle{ 1/2 }[/math] or [math]\displaystyle{ 1/\sqrt{2} }[/math]).
These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
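A small numerical illustration of this sandwich inequality on randomly generated discrete distributions (assuming NumPy; it uses the normalizations defined above, with [math]\displaystyle{ \delta }[/math] computed as half the 1-norm):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    p = rng.dirichlet(np.ones(6))
    q = rng.dirichlet(np.ones(6))
    H = np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)
    tv = 0.5 * np.abs(p - q).sum()  # total variation distance
    assert H**2 <= tv + 1e-12 and tv <= np.sqrt(2) * H + 1e-12
    print(round(H**2, 4), round(tv, 4), round(np.sqrt(2) * H, 4))
```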
See also
- Statistical distance
- Kullback–Leibler divergence
- Bhattacharyya distance
- Total variation distance
- Fisher information metric
Notes
- ↑ Hazewinkel, Michiel, ed. (2001), "Hellinger distance", Encyclopedia of Mathematics, Springer Science+Business Media B.V. / Kluwer Academic Publishers, ISBN 978-1-55608-010-4, https://www.encyclopediaofmath.org/index.php?title=h/h046890
- ↑ Hellinger, Ernst (1909), "Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen" (in de), Journal für die reine und angewandte Mathematik 1909 (136): 210–271, doi:10.1515/crll.1909.136.210, http://resolver.sub.uni-goettingen.de/purl?GDZPPN002166941
- ↑ "Jeffreys distance - Encyclopedia of Mathematics" (in en). https://encyclopediaofmath.org/wiki/Jeffreys_distance.
- ↑ Jeffreys, Harold (1946-09-24). "An invariant form for the prior probability in estimation problems". Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences 186 (1007): 453–461. doi:10.1098/rspa.1946.0056. ISSN 0080-4630. PMID 20998741. Bibcode: 1946RSPSA.186..453J. http://dx.doi.org/10.1098/rspa.1946.0056.
- ↑ Torgerson, Erik (1991). "Comparison of Statistical Experiments". Encyclopedia of Mathematics. 36. Cambridge University Press.
- ↑ Liese, Friedrich; Miescke, Klaus-J. (2008). Statistical Decision Theory: Estimation, Testing, and Selection. Springer. ISBN 978-0-387-73193-3.
- ↑ Pardo, L. (2006). Statistical Inference Based on Divergence Measures. New York: Chapman and Hall/CRC. p. 51. ISBN 1-58488-600-5.
- ↑ Harsha, Prahladh (September 23, 2011). "Lecture notes on communication complexity". https://www.tcs.tifr.res.in/~prahladh/teaching/2011-12/comm/lectures/l12.pdf.