Data processing inequality

From HandWiki

The data processing inequality is an information-theoretic concept which states that the information content of a signal cannot be increased via a local physical operation. This can be expressed concisely as 'post-processing cannot increase information'.[1]

Statement

Let three random variables form the Markov chain [math]\displaystyle{ X \rightarrow Y \rightarrow Z }[/math], meaning that [math]\displaystyle{ Z }[/math] depends on [math]\displaystyle{ X }[/math] only through [math]\displaystyle{ Y }[/math]: given [math]\displaystyle{ Y }[/math], [math]\displaystyle{ Z }[/math] is conditionally independent of [math]\displaystyle{ X }[/math]. Specifically, we have such a Markov chain if the joint probability mass function can be written as

[math]\displaystyle{ p(x,y,z) = p(x)p(y|x)p(z|y)=p(y)p(x|y)p(z|y) }[/math]
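To make the factorization concrete, here is a minimal Python sketch with assumed, illustrative numbers: binary alphabets, and hypothetical names (`p_x`, `p_y_given_x`, `p_z_given_y`, `markov_joint`, `reverse_factorization`) that do not come from the source. It builds p(x)p(y|x)p(z|y) and then checks that recomputing p(y)p(x|y)p(z|y) from the joint's own marginals gives the same values.

```python
# Hypothetical example: binary X, Y, Z with made-up probabilities.
p_x = {0: 0.3, 1: 0.7}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # p_y_given_x[x][y]
p_z_given_y = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.4, 1: 0.6}}  # p_z_given_y[y][z]


def markov_joint(p_x, p_y_given_x, p_z_given_y):
    """p(x, y, z) = p(x) p(y|x) p(z|y), keyed by (x, y, z)."""
    return {
        (x, y, z): px * py * pz
        for x, px in p_x.items()
        for y, py in p_y_given_x[x].items()
        for z, pz in p_z_given_y[y].items()
    }


def reverse_factorization(joint):
    """Rebuild p(y) p(x|y) p(z|y) using only marginals of the joint pmf."""
    p_y, p_xy, p_yz = {}, {}, {}
    for (x, y, z), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
        p_xy[x, y] = p_xy.get((x, y), 0.0) + p
        p_yz[y, z] = p_yz.get((y, z), 0.0) + p
    return {
        (x, y, z): p_y[y] * (p_xy[x, y] / p_y[y]) * (p_yz[y, z] / p_y[y])
        for (x, y, z) in joint
    }


joint = markov_joint(p_x, p_y_given_x, p_z_given_y)
rebuilt = reverse_factorization(joint)
# Both factorizations describe the same distribution.
assert all(abs(joint[k] - rebuilt[k]) < 1e-12 for k in joint)
```

For a joint that is not a Markov chain, the rebuilt values would in general differ from the originals, which is one way to see that the second factorization is a genuine constraint.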

In this setting, no processing of [math]\displaystyle{ Y }[/math], deterministic or random, can increase the information that [math]\displaystyle{ Y }[/math] contains about [math]\displaystyle{ X }[/math]. Using mutual information, this can be written as:

[math]\displaystyle{ I(X;Y) \geqslant I(X;Z), }[/math]

with equality [math]\displaystyle{ I(X;Y) = I(X;Z) }[/math] if and only if [math]\displaystyle{ I(X;Y\mid Z)=0 }[/math]. That is, [math]\displaystyle{ Z }[/math] and [math]\displaystyle{ Y }[/math] contain the same information about [math]\displaystyle{ X }[/math], and [math]\displaystyle{ X \rightarrow Z \rightarrow Y }[/math] also forms a Markov chain.[2]
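The inequality and its equality condition can be checked numerically. The sketch below is illustrative only: it assumes binary alphabets and made-up channel probabilities, and the helpers `marginal`, `cmi` and `chain_joint` are hypothetical names. `cmi` evaluates I(A;B|C) in bits directly from its definition (and I(A;B) when C is empty). Two choices of the step Y → Z are compared: a noisy channel, and the deterministic relabeling Z = 1 - Y.

```python
import math


def marginal(joint, idx):
    """Marginal pmf over the coordinate indices in idx (keys are tuples)."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0.0) + p
    return out


def cmi(joint, a, b, c=()):
    """I(A;B|C) in bits from the definition
    sum p(a,b,c) log2[p(a,b,c) p(c) / (p(a,c) p(b,c))]; with c=() this is I(A;B)."""
    p_abc, p_ac = marginal(joint, a + b + c), marginal(joint, a + c)
    p_bc, p_c = marginal(joint, b + c), marginal(joint, c)
    na, nb = len(a), len(b)
    total = 0.0
    for key, p in p_abc.items():
        if p > 0:
            ka, kb, kc = key[:na], key[na:na + nb], key[na + nb:]
            total += p * math.log2(p * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


def chain_joint(p_x, p_y_given_x, p_z_given_y):
    """p(x, y, z) = p(x) p(y|x) p(z|y) for a Markov chain X -> Y -> Z."""
    return {(x, y, z): px * py * pz
            for x, px in p_x.items()
            for y, py in p_y_given_x[x].items()
            for z, pz in p_z_given_y[y].items()}


# Hypothetical binary example; coordinates 0, 1, 2 of each key hold x, y, z.
X, Y, Z = (0,), (1,), (2,)
p_x = {0: 0.3, 1: 0.7}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
noisy = chain_joint(p_x, p_y_given_x, {0: {0: 0.75, 1: 0.25}, 1: {0: 0.4, 1: 0.6}})
relabel = chain_joint(p_x, p_y_given_x, {0: {1: 1.0}, 1: {0: 1.0}})  # Z = 1 - Y

for name, joint in (("noisy", noisy), ("relabel", relabel)):
    # Columns: I(X;Y), I(X;Z), I(X;Y|Z)
    print(name, cmi(joint, X, Y), cmi(joint, X, Z), cmi(joint, X, Y, Z))
```

For the noisy channel, I(X;Z) comes out strictly smaller than I(X;Y) and I(X;Y|Z) is positive; for the relabeling, the two mutual informations agree and I(X;Y|Z) is zero up to floating-point rounding, since an invertible map of Y loses nothing about X.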

Proof

One can apply the chain rule for mutual information to obtain two different decompositions of [math]\displaystyle{ I(X;Y,Z) }[/math]:

[math]\displaystyle{ I(X;Z) + I(X;Y\mid Z) = I(X;Y,Z) = I(X;Y) + I(X;Z\mid Y) }[/math]
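The chain rule itself does not depend on the Markov assumption. As an illustrative check (the joint distribution below is made up, and `marginal` and `cmi` are hypothetical helpers that compute conditional mutual information in bits from its definition), this sketch evaluates both decompositions and I(X;Y,Z) directly for a distribution that is deliberately not a Markov chain, so that I(X;Z|Y) is nonzero.

```python
import math


def marginal(joint, idx):
    """Marginal pmf over the coordinate indices in idx (keys are tuples)."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0.0) + p
    return out


def cmi(joint, a, b, c=()):
    """I(A;B|C) in bits from the definition; a, b, c are tuples of indices."""
    p_abc, p_ac = marginal(joint, a + b + c), marginal(joint, a + c)
    p_bc, p_c = marginal(joint, b + c), marginal(joint, c)
    na, nb = len(a), len(b)
    total = 0.0
    for key, p in p_abc.items():
        if p > 0:
            ka, kb, kc = key[:na], key[na:na + nb], key[na + nb:]
            total += p * math.log2(p * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


# Hypothetical joint pmf over (x, y, z) in {0, 1}^3 with weights 1..8 (sum 36).
# It is deliberately NOT a Markov chain X -> Y -> Z.
keys = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
joint = {k: w / 36 for k, w in zip(keys, range(1, 9))}

X, Y, Z = (0,), (1,), (2,)
via_z = cmi(joint, X, Z) + cmi(joint, X, Y, Z)  # I(X;Z) + I(X;Y|Z)
direct = cmi(joint, X, Y + Z)                   # I(X;Y,Z), with (Y, Z) as one variable
via_y = cmi(joint, X, Y) + cmi(joint, X, Z, Y)  # I(X;Y) + I(X;Z|Y)
print(via_z, direct, via_y)
```

All three printed values should agree up to rounding, even though I(X;Z|Y) is positive here; the Markov property is only needed in the next step of the proof.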

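By the Markov property of [math]\displaystyle{ X \rightarrow Y \rightarrow Z }[/math], [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Z }[/math] are conditionally independent given [math]\displaystyle{ Y }[/math], so the conditional mutual information vanishes: [math]\displaystyle{ I(X;Z\mid Y)=0 }[/math]. The identity above then reduces to [math]\displaystyle{ I(X;Y) = I(X;Z) + I(X;Y\mid Z) }[/math], and the data processing inequality follows from the non-negativity of conditional mutual information, [math]\displaystyle{ I(X;Y\mid Z)\ge0 }[/math]. The same identity shows that equality holds exactly when [math]\displaystyle{ I(X;Y\mid Z)=0 }[/math].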
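The two facts used in the proof, I(X;Z|Y) = 0 for a Markov chain and I(X;Y) - I(X;Z) = I(X;Y|Z), can be spot-checked on randomly generated chains. This is a sketch under assumptions: the alphabet sizes, the seed and the helper names (`random_pmf`, `random_chain`, `marginal`, `cmi`) are hypothetical choices, and a numerical check on random examples illustrates the argument rather than proving it.

```python
import math
import random


def marginal(joint, idx):
    """Marginal pmf over the coordinate indices in idx (keys are tuples)."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0.0) + p
    return out


def cmi(joint, a, b, c=()):
    """I(A;B|C) in bits from the definition; a, b, c are tuples of indices."""
    p_abc, p_ac = marginal(joint, a + b + c), marginal(joint, a + c)
    p_bc, p_c = marginal(joint, b + c), marginal(joint, c)
    na, nb = len(a), len(b)
    total = 0.0
    for key, p in p_abc.items():
        if p > 0:
            ka, kb, kc = key[:na], key[na:na + nb], key[na + nb:]
            total += p * math.log2(p * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


def random_pmf(n, rng):
    """A random probability vector of length n with all entries positive."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]


def random_chain(nx, ny, nz, rng):
    """Joint pmf p(x) p(y|x) p(z|y) of a random Markov chain X -> Y -> Z."""
    p_x = random_pmf(nx, rng)
    p_y_given_x = [random_pmf(ny, rng) for _ in range(nx)]
    p_z_given_y = [random_pmf(nz, rng) for _ in range(ny)]
    return {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
            for x in range(nx) for y in range(ny) for z in range(nz)}


X, Y, Z = (0,), (1,), (2,)
rng = random.Random(0)  # fixed seed so the check is reproducible
for _ in range(200):
    joint = random_chain(3, 4, 3, rng)
    assert abs(cmi(joint, X, Z, Y)) < 1e-9        # Markov property: I(X;Z|Y) = 0
    gap = cmi(joint, X, Y) - cmi(joint, X, Z)
    assert abs(gap - cmi(joint, X, Y, Z)) < 1e-9  # the gap equals I(X;Y|Z)
    assert gap >= -1e-12                          # hence I(X;Y) >= I(X;Z)
```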

References

  1. Beaudry, Normand (2012), "An intuitive proof of the data processing inequality", Quantum Information & Computation 12 (5–6): 432–441, doi:10.26421/QIC12.5-6-4, Bibcode: 2011arXiv1107.0740B
  2. Cover, Thomas M.; Thomas, Joy A. (2012), Elements of Information Theory, John Wiley & Sons.
