Jensen–Shannon divergence

In probability theory and statistics, the Jensen–Shannon divergence is a method of measuring the similarity between two probability distributions. It is also known as information radius (IRad)[1][2] or total divergence to the average.[3] It is based on the Kullback–Leibler divergence, with some notable (and useful) differences, including that it is symmetric and always has a finite value. The square root of the Jensen–Shannon divergence is a metric often referred to as the Jensen–Shannon distance.[4][5][6]

Definition

Consider the set [math]\displaystyle{ M_+^1(A) }[/math] of probability distributions where [math]\displaystyle{ A }[/math] is a set provided with some σ-algebra of measurable subsets. In particular we can take [math]\displaystyle{ A }[/math] to be a finite or countable set with all subsets being measurable.

The Jensen–Shannon divergence (JSD) is a symmetrized and smoothed version of the Kullback–Leibler divergence [math]\displaystyle{ D(P \parallel Q) }[/math]. It is defined by

[math]\displaystyle{ {\rm JSD}(P \parallel Q)= \frac{1}{2}D(P \parallel M)+\frac{1}{2}D(Q \parallel M), }[/math]

where [math]\displaystyle{ M=\frac{1}{2}(P+Q) }[/math] is a mixture distribution of [math]\displaystyle{ P }[/math] and [math]\displaystyle{ Q }[/math].
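For discrete distributions, the definition translates directly into code. The following sketch (in Python with NumPy; the helper names kl_divergence and js_divergence are ours, not from any particular library) computes the divergence using a base-2 logarithm:

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions (base 2).

    Terms with p[i] == 0 contribute nothing, by the usual convention.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def js_divergence(p, q):
    """Jensen-Shannon divergence; with base-2 logs the value lies in [0, 1]."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)                      # mixture M = (P + Q) / 2
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = [0.10, 0.40, 0.50]
q = [0.80, 0.15, 0.05]
print(js_divergence(p, q))                 # equals js_divergence(q, p) by symmetry
```

Note that SciPy's scipy.spatial.distance.jensenshannon returns the square root of this quantity (the Jensen–Shannon distance) and uses the natural logarithm by default.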

The geometric Jensen–Shannon divergence[7] (or G-Jensen–Shannon divergence) replaces the arithmetic-mean mixture [math]\displaystyle{ M }[/math] with the (normalized) geometric mean of [math]\displaystyle{ P }[/math] and [math]\displaystyle{ Q }[/math]; this yields a closed-form formula for the divergence between two Gaussian distributions.
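For univariate Gaussians this can be made concrete, since both the normalized geometric mean of two Gaussian densities and the KL divergence between Gaussians have closed forms. The sketch below uses equal weights; the exact weighting and orientation conventions of the G-JSD in [7] may differ, so it should be read as an illustration rather than a faithful implementation of that paper:

```python
import numpy as np

def kl_gauss(mu0, s0, mu1, s1):
    """Closed-form KL divergence D(N(mu0, s0^2) || N(mu1, s1^2)), natural log."""
    return np.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2 * s1**2) - 0.5

def geometric_mean_gauss(mu0, s0, mu1, s1):
    """Normalized geometric mean of two Gaussian densities (equal weights).

    sqrt(p * q), once renormalized, is again Gaussian: its precision is the
    average of the two precisions and its mean is the precision-weighted mean.
    """
    var = 1.0 / (0.5 * (1 / s0**2 + 1 / s1**2))
    mu = var * 0.5 * (mu0 / s0**2 + mu1 / s1**2)
    return mu, np.sqrt(var)

def g_jsd(mu0, s0, mu1, s1):
    """Average KL divergence from each Gaussian to their geometric mean."""
    mu_g, s_g = geometric_mean_gauss(mu0, s0, mu1, s1)
    return 0.5 * kl_gauss(mu0, s0, mu_g, s_g) + 0.5 * kl_gauss(mu1, s1, mu_g, s_g)

print(g_jsd(0.0, 1.0, 1.0, 2.0))
```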

A more general definition, allowing for the comparison of more than two probability distributions, is:

[math]\displaystyle{ \begin{align} {\rm JSD}_{\pi_1, \ldots, \pi_n}(P_1, P_2, \ldots, P_n) &= \sum_i \pi_i D( P_i \parallel M ) \\ &= H\left(M\right) - \sum_{i=1}^n \pi_i H(P_i) \end{align} }[/math]

where

[math]\displaystyle{ \begin{align} M &:= \sum_{i=1}^n \pi_i P_i \end{align} }[/math]

and [math]\displaystyle{ \pi_1, \ldots, \pi_n }[/math] are weights that are selected for the probability distributions [math]\displaystyle{ P_1, P_2, \ldots, P_n }[/math], and [math]\displaystyle{ H(P) }[/math] is the Shannon entropy for distribution [math]\displaystyle{ P }[/math]. For the two-distribution case described above,

[math]\displaystyle{ P_1=P, P_2=Q, \pi_1 = \pi_2 = \frac{1}{2}.\ }[/math]

Hence, for those two distributions [math]\displaystyle{ P }[/math] and [math]\displaystyle{ Q }[/math],

[math]\displaystyle{ {\rm JSD}(P \parallel Q) = H(M) - \frac{1}{2}\bigg(H(P) + H(Q)\bigg) }[/math]
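The entropy form is convenient numerically, since it requires one entropy evaluation per distribution plus one for the mixture. A minimal sketch for n weighted discrete distributions (the function names are ours):

```python
import numpy as np

def shannon_entropy(p, base=2):
    """Shannon entropy H(p) of a discrete distribution, ignoring zero entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p)) / np.log(base)

def generalized_jsd(dists, weights, base=2):
    """JSD_{pi_1,...,pi_n}(P_1,...,P_n) = H(sum_i pi_i P_i) - sum_i pi_i H(P_i)."""
    dists = np.asarray(dists, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m = weights @ dists                    # mixture M = sum_i pi_i P_i
    return shannon_entropy(m, base) - sum(
        w * shannon_entropy(p, base) for w, p in zip(weights, dists))

dists = [[0.10, 0.40, 0.50], [0.80, 0.15, 0.05], [1/3, 1/3, 1/3]]
print(generalized_jsd(dists, [0.2, 0.3, 0.5]))
```

For two distributions with weights (1/2, 1/2) this reduces to the two-distribution divergence defined earlier.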

Bounds

The Jensen–Shannon divergence is bounded by 1 for two probability distributions, given that one uses the base 2 logarithm:[8]

[math]\displaystyle{ 0 \leq {\rm JSD}( P \parallel Q ) \leq 1 }[/math].

With this normalization, it is a lower bound on the total variation distance between P and Q:

[math]\displaystyle{ {\rm JSD}(P\parallel Q) \le \frac12\|P-Q\|_1=\frac12\sum_{\omega\in\Omega}|P(\omega)-Q(\omega)| }[/math].

With base-e logarithm, which is commonly used in statistical thermodynamics, the upper bound is [math]\displaystyle{ \ln(2) }[/math]. In general, the bound in base b is [math]\displaystyle{ \log_{b}(2) }[/math]:

[math]\displaystyle{ 0 \leq {\rm JSD}( P \parallel Q ) \leq \log_b(2) }[/math].

More generally, for more than two probability distributions, the Jensen–Shannon divergence is bounded by [math]\displaystyle{ \log_{b}(n) }[/math]:[8]

[math]\displaystyle{ 0 \leq {\rm JSD}_{\pi_1, \ldots, \pi_n}(P_1, P_2, \ldots, P_n) \leq \log_{b}(n) }[/math].
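These bounds are easy to check numerically, for example by drawing random distributions and verifying that the base-2 divergence of two of them stays in [0, 1] and below their total variation distance (this sketch uses SciPy's jensenshannon, which returns the square root of the divergence):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon  # Jensen-Shannon *distance*

rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.dirichlet(np.ones(5))
    q = rng.dirichlet(np.ones(5))
    jsd = jensenshannon(p, q, base=2) ** 2         # divergence, base-2 logarithm
    tv = 0.5 * np.sum(np.abs(p - q))               # total variation distance
    assert 0.0 <= jsd <= 1.0                       # bounded by 1 in base 2
    assert jsd <= tv + 1e-12                       # lower bound on total variation
```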

Relation to mutual information

The Jensen–Shannon divergence is the mutual information between a random variable [math]\displaystyle{ X }[/math] drawn from the mixture distribution of [math]\displaystyle{ P }[/math] and [math]\displaystyle{ Q }[/math] and the binary indicator variable [math]\displaystyle{ Z }[/math] used to switch between [math]\displaystyle{ P }[/math] and [math]\displaystyle{ Q }[/math] in producing the mixture. Let [math]\displaystyle{ X }[/math] be some abstract function on the underlying set of events that discriminates well between events, and choose the value of [math]\displaystyle{ X }[/math] according to [math]\displaystyle{ P }[/math] if [math]\displaystyle{ Z = 0 }[/math] and according to [math]\displaystyle{ Q }[/math] if [math]\displaystyle{ Z = 1 }[/math], where [math]\displaystyle{ Z }[/math] is equiprobable. That is, [math]\displaystyle{ X }[/math] is chosen according to the probability measure [math]\displaystyle{ M=(P+Q)/2 }[/math], and its distribution is the mixture distribution. We compute

[math]\displaystyle{ \begin{align} I(X; Z) &= H(X) - H(X|Z)\\ &= -\sum M \log M + \frac{1}{2} \left[ \sum P \log P + \sum Q \log Q \right] \\ &= -\sum \frac{P}{2} \log M - \sum \frac{Q}{2} \log M + \frac{1}{2} \left[ \sum P \log P + \sum Q \log Q \right] \\ &= \frac{1}{2} \sum P \left( \log P - \log M\right ) + \frac{1}{2} \sum Q \left( \log Q - \log M \right) \\ &= {\rm JSD}(P \parallel Q) \end{align} }[/math]

It follows from the above result that the Jensen–Shannon divergence is bounded by 0 and 1 because mutual information is non-negative and bounded by [math]\displaystyle{ H(Z) = 1 }[/math] in base 2 logarithm.
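The identity can be checked directly on a small example: build the joint distribution of [math]\displaystyle{ (X, Z) }[/math] with [math]\displaystyle{ Z }[/math] equiprobable, compute the mutual information from its definition, and compare it with the divergence (the helper name entropy is ours):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a (possibly multidimensional) distribution."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

P = np.array([0.10, 0.40, 0.50])
Q = np.array([0.80, 0.15, 0.05])

# Joint distribution of (X, Z): the row z = 0 carries P/2, the row z = 1 carries Q/2.
joint = 0.5 * np.vstack([P, Q])
p_x = joint.sum(axis=0)                    # marginal of X: the mixture M
p_z = joint.sum(axis=1)                    # marginal of Z: (1/2, 1/2)

mutual_info = entropy(p_x) + entropy(p_z) - entropy(joint)    # I(X; Z)
jsd = entropy(p_x) - 0.5 * (entropy(P) + entropy(Q))          # H(M) - (H(P) + H(Q))/2
print(mutual_info, jsd)                    # the two values coincide
```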

One can apply the same principle to the joint distribution of two variables and the product of their marginal distributions (in analogy to the Kullback–Leibler divergence and mutual information) to measure how reliably one can decide whether a given response comes from the joint distribution or the product distribution, subject to the assumption that these are the only two possibilities.[9]

Quantum Jensen–Shannon divergence

The generalization from probability distributions to density matrices allows one to define the quantum Jensen–Shannon divergence (QJSD).[10][11] It is defined for a set of density matrices [math]\displaystyle{ (\rho_1,\ldots,\rho_n) }[/math] and a probability distribution [math]\displaystyle{ \pi=(\pi_1,\ldots,\pi_n) }[/math] as

[math]\displaystyle{ {\rm QJSD}(\rho_1,\ldots,\rho_n)= S\left(\sum_{i=1}^n \pi_i \rho_i\right)-\sum_{i=1}^n \pi_i S(\rho_i) }[/math]

where [math]\displaystyle{ S(\rho) }[/math] is the von Neumann entropy of [math]\displaystyle{ \rho }[/math]. This quantity was introduced in quantum information theory, where it is called the Holevo information: it gives an upper bound on the amount of classical information encoded by the quantum states [math]\displaystyle{ (\rho_1,\ldots,\rho_n) }[/math] under the prior distribution [math]\displaystyle{ \pi }[/math] (see Holevo's theorem).[12] The quantum Jensen–Shannon divergence for [math]\displaystyle{ \pi=\left(\frac{1}{2},\frac{1}{2}\right) }[/math] and two density matrices is a symmetric function, everywhere defined, bounded, and equal to zero only if the two density matrices are the same. It is the square of a metric for pure states,[13] and it was recently shown that this metric property holds for mixed states as well.[14][15] The Bures metric is closely related to the quantum JS divergence; it is the quantum analog of the Fisher information metric.
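Since the von Neumann entropy of a density matrix equals the Shannon entropy of its eigenvalues, the quantity above can be sketched in a few lines (equal weights and 2×2 density matrices are chosen purely for illustration):

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]     # discard (numerically) zero eigenvalues
    return -np.sum(eigvals * np.log2(eigvals))

def qjsd(rhos, weights):
    """Holevo-information form: S(sum_i pi_i rho_i) - sum_i pi_i S(rho_i)."""
    mix = sum(w * r for w, r in zip(weights, rhos))
    return von_neumann_entropy(mix) - sum(
        w * von_neumann_entropy(r) for w, r in zip(weights, rhos))

# Two pure states: |0><0| and |+><+|.
rho0 = np.array([[1.0, 0.0], [0.0, 0.0]])
rho1 = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])
print(qjsd([rho0, rho1], [0.5, 0.5]))
```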

Jensen–Shannon centroid

The centroid C* of a finite set of probability distributions can be defined as the minimizer of the sum of the Jensen–Shannon divergences between a candidate distribution [math]\displaystyle{ Q }[/math] and the prescribed set of distributions:

[math]\displaystyle{ C^*=\arg\min_{Q} \sum_{i=1}^n {\rm JSD}(P_i \parallel Q). }[/math]

An efficient algorithm[16] (CCCP), based on a difference of convex functions, has been reported for computing the Jensen–Shannon centroid of a set of discrete distributions (histograms).
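As a rough numerical illustration (this is not the CCCP algorithm of [16]; it simply minimizes the objective with a generic optimizer over a softmax parameterization of the simplex), the centroid of a few histograms can be approximated as follows:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import jensenshannon   # square root of the divergence

def js_centroid(hists):
    """Approximate argmin_Q sum_i JSD(P_i || Q) by direct numerical minimization."""
    hists = np.asarray(hists, dtype=float)

    def objective(theta):
        q = np.exp(theta - theta.max())
        q /= q.sum()                                # softmax keeps Q on the simplex
        return sum(jensenshannon(p, q, base=2) ** 2 for p in hists)

    res = minimize(objective, np.zeros(hists.shape[1]), method="Nelder-Mead")
    q = np.exp(res.x - res.x.max())
    return q / q.sum()

hists = [[0.10, 0.40, 0.50], [0.80, 0.15, 0.05], [0.30, 0.30, 0.40]]
print(js_centroid(hists))
```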

Applications

The Jensen–Shannon divergence has been applied in bioinformatics and genome comparison,[17][18] in protein surface comparison,[19] in the social sciences,[20] in the quantitative study of history,[21] in fire experiments,[22] and in machine learning.[23]

Notes

  1. Frank Nielsen (2021). "On a variational definition for the Jensen-Shannon symmetrization of distances based on the information radius". Entropy (MDPI) 23 (4): 464. doi:10.3390/e23040464. 
  2. Hinrich Schütze; Christopher D. Manning (1999). Foundations of Statistical Natural Language Processing. Cambridge, Mass: MIT Press. p. 304. ISBN 978-0-262-13360-9. http://nlp.stanford.edu/fsnlp/. 
  3. Dagan, Ido; Lillian Lee; Fernando Pereira (1997). "Similarity-Based Methods For Word Sense Disambiguation". Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics: 56–63. doi:10.3115/979617.979625. Bibcode1997cmp.lg....8010D. http://citeseer.ist.psu.edu/dagan97similaritybased.html. Retrieved 2008-03-09. 
  4. Endres, D. M.; J. E. Schindelin (2003). "A new metric for probability distributions". IEEE Trans. Inf. Theory 49 (7): 1858–1860. doi:10.1109/TIT.2003.813506. https://research-repository.st-andrews.ac.uk/bitstream/10023/1591/1/Endres2003-IEEETransInfTheory49-NewMetric.pdf. 
  5. Österreicher, F.; I. Vajda (2003). "A new class of metric divergences on probability spaces and its statistical applications". Ann. Inst. Statist. Math. 55 (3): 639–653. doi:10.1007/BF02517812. 
  6. Fuglede, B.; Topsoe, F. (2004). "Jensen-Shannon divergence and Hilbert space embedding". Proceedings of the International Symposium on Information Theory, 2004. IEEE. pp. 30. doi:10.1109/ISIT.2004.1365067. ISBN 978-0-7803-8280-0. http://www.math.ku.dk/~topsoe/ISIT2004JSD.pdf. 
  7. Frank Nielsen (2019). "On the Jensen-Shannon symmetrization of distances relying on abstract means". Entropy (MDPI) 21 (5): 485. doi:10.3390/e21050485. PMID 33267199. Bibcode2019Entrp..21..485N. 
  8. Lin, J. (1991). "Divergence measures based on the Shannon entropy". IEEE Transactions on Information Theory 37 (1): 145–151. doi:10.1109/18.61115. https://www.cise.ufl.edu/~anand/sp06/jensen-shannon.pdf. 
  9. Schneidman, Elad; Bialek, W; Berry, M.J. 2nd (2003). "Synergy, Redundancy, and Independence in Population Codes". Journal of Neuroscience 23 (37): 11539–11553. doi:10.1523/JNEUROSCI.23-37-11539.2003. PMID 14684857. 
  10. Majtey, A.; Lamberti, P.; Prato, D. (2005). "Jensen-Shannon divergence as a measure of distinguishability between mixed quantum states". Physical Review A 72 (5): 052310. doi:10.1103/PhysRevA.72.052310. Bibcode2005PhRvA..72e2310M. 
  11. Briët, Jop; Harremoës, Peter (2009). "Properties of classical and quantum Jensen-Shannon divergence". Physical Review A 79 (5): 052311. doi:10.1103/PhysRevA.79.052311. Bibcode2009PhRvA..79e2311B. 
  12. Holevo, A. S. (1973). "Bounds for the quantity of information transmitted by a quantum communication channel" (in Russian). Problemy Peredachi Informatsii 9: 3–11. English translation: Probl. Inf. Transm. 9: 177–183 (1975). MR456936.
  13. Braunstein, Samuel; Caves, Carlton (1994). "Statistical distance and the geometry of quantum states". Physical Review Letters 72 (22): 3439–3443. doi:10.1103/PhysRevLett.72.3439. PMID 10056200. Bibcode1994PhRvL..72.3439B. 
  14. Virosztek, Dániel (2021). "The metric property of the quantum Jensen-Shannon divergence". Advances in Mathematics 380: 107595. doi:10.1016/j.aim.2021.107595. 
  15. Sra, Suvrit (2019). "Metrics Induced by Quantum Jensen-Shannon-Renyí and Related Divergences". arXiv:1911.02643 [cs.IT].
  16. Frank Nielsen (2020). "On a generalization of the Jensen–Shannon divergence and the Jensen–Shannon centroid". Entropy (MDPI) 22 (2): 221. doi:10.3390/e22020221. PMID 33285995. 
  17. Sims, GE; Jun, SR; Wu, GA; Kim, SH (2009). "Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions". Proceedings of the National Academy of Sciences of the United States of America 106 (8): 2677–82. doi:10.1073/pnas.0813249106. PMID 19188606. Bibcode2009PNAS..106.2677S. 
  18. Itzkovitz, S; Hodis, E; Segal, E (2010). "Overlapping codes within protein-coding sequences". Genome Research 20 (11): 1582–9. doi:10.1101/gr.105072.110. PMID 20841429. 
  19. Ofran, Y; Rost, B (2003). "Analysing six types of protein-protein interfaces". Journal of Molecular Biology 325 (2): 377–87. doi:10.1016/s0022-2836(02)01223-8. PMID 12488102. 
  20. DeDeo, Simon; Hawkins, Robert X. D.; Klingenstein, Sara; Hitchcock, Tim (2013). "Bootstrap Methods for the Empirical Study of Decision-Making and Information Flows in Social Systems". Entropy 15 (6): 2246–2276. doi:10.3390/e15062246. Bibcode2013Entrp..15.2246D. 
  21. Klingenstein, Sara; Hitchcock, Tim; DeDeo, Simon (2014). "The civilizing process in London's Old Bailey". Proceedings of the National Academy of Sciences of the United States of America 111 (26): 9419–9424. doi:10.1073/pnas.1405984111. PMID 24979792. Bibcode2014PNAS..111.9419K. 
  22. Flavia-Corina Mitroi-Symeonidis; Ion Anghel; Nicuşor Minculete (2020). "Parametric Jensen-Shannon statistical complexity and its applications on full-scale compartment fire data". Symmetry 12 (1): 22. doi:10.3390/sym12010022. 
  23. Goodfellow, Ian J.; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing; Warde-Farley, David; Ozair, Sherjil; Courville, Aaron; Bengio, Yoshua (2014). "Generative Adversarial Networks". NIPS. Bibcode2014arXiv1406.2661G. 
