Log sum inequality

The log sum inequality is used for proving theorems in information theory.

Statement

Let [math]\displaystyle{ a_1,\ldots,a_n }[/math] and [math]\displaystyle{ b_1,\ldots,b_n }[/math] be nonnegative numbers. Denote the sum of all [math]\displaystyle{ a_i }[/math]s by [math]\displaystyle{ a }[/math] and the sum of all [math]\displaystyle{ b_i }[/math]s by [math]\displaystyle{ b }[/math]. The log sum inequality states that

[math]\displaystyle{ \sum_{i=1}^n a_i\log\frac{a_i}{b_i}\geq a\log\frac{a}{b}, }[/math]

with equality if and only if the ratios [math]\displaystyle{ \frac{a_i}{b_i} }[/math] are equal for all [math]\displaystyle{ i }[/math], in other words [math]\displaystyle{ a_i = c b_i }[/math] for some constant [math]\displaystyle{ c }[/math] and all [math]\displaystyle{ i }[/math].[1]

(Take [math]\displaystyle{ a_i\log \frac{a_i}{b_i} }[/math] to be [math]\displaystyle{ 0 }[/math] if [math]\displaystyle{ a_i=0 }[/math] and [math]\displaystyle{ \infty }[/math] if [math]\displaystyle{ a_i\gt 0, b_i=0 }[/math]. These are the limiting values obtained as the relevant number tends to [math]\displaystyle{ 0 }[/math].)[1]
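
As a concrete illustration, the inequality can be spot-checked numerically. The following Python sketch (illustrative helper names, not from the cited sources) evaluates both sides on random nonnegative inputs, using the conventions above for zero values:

  import math
  import random

  def lhs(a, b):
      # sum_i a_i * log(a_i / b_i), with 0 * log(0 / b_i) = 0
      # and a_i * log(a_i / 0) = +infinity for a_i > 0.
      total = 0.0
      for ai, bi in zip(a, b):
          if ai == 0:
              continue
          if bi == 0:
              return math.inf
          total += ai * math.log(ai / bi)
      return total

  def rhs(a, b):
      # a * log(a / b) with a = sum of the a_i, b = sum of the b_i.
      sa, sb = sum(a), sum(b)
      if sa == 0:
          return 0.0
      if sb == 0:
          return math.inf
      return sa * math.log(sa / sb)

  for _ in range(10000):
      n = random.randint(1, 8)
      a = [random.random() for _ in range(n)]
      b = [random.random() for _ in range(n)]
      assert lhs(a, b) >= rhs(a, b) - 1e-9  # tolerance for rounding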

Proof

Notice that after setting [math]\displaystyle{ f(x)=x\log x }[/math] we have

[math]\displaystyle{ \begin{align} \sum_{i=1}^n a_i\log\frac{a_i}{b_i} & {} = \sum_{i=1}^n b_i f\left(\frac{a_i}{b_i}\right) = b\sum_{i=1}^n \frac{b_i}{b} f\left(\frac{a_i}{b_i}\right) \\ & {} \geq b f\left(\sum_{i=1}^n \frac{b_i}{b}\frac{a_i}{b_i}\right) = b f\left(\frac{1}{b}\sum_{i=1}^n a_i\right) = b f\left(\frac{a}{b}\right) \\ & {} = a\log\frac{a}{b}, \end{align} }[/math]

where the inequality follows from Jensen's inequality since [math]\displaystyle{ \frac{b_i}{b}\geq 0 }[/math], [math]\displaystyle{ \sum_{i=1}^n\frac{b_i}{b}= 1 }[/math], and [math]\displaystyle{ f }[/math] is convex ([math]\displaystyle{ f''(x)=\tfrac{1}{x}\gt 0 }[/math] for [math]\displaystyle{ x\gt 0 }[/math]).[1]
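
The equality condition can be illustrated the same way: in this short Python sketch (illustrative values), each [math]\displaystyle{ a_i }[/math] is a fixed multiple of [math]\displaystyle{ b_i }[/math], and the two sides agree up to rounding:

  import math

  c = 2.5
  b = [0.3, 1.2, 0.5]
  a = [c * bi for bi in b]  # a_i = c * b_i for every i

  lhs = sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))
  rhs = sum(a) * math.log(sum(a) / sum(b))
  print(lhs, rhs)  # both sides equal sum(a) * log(c)
  assert abs(lhs - rhs) < 1e-9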

Generalizations

The inequality remains valid for [math]\displaystyle{ n=\infty }[/math] provided that [math]\displaystyle{ a\lt \infty }[/math] and [math]\displaystyle{ b\lt \infty }[/math].[citation needed] The proof above holds for any function [math]\displaystyle{ g }[/math] such that [math]\displaystyle{ f(x)=xg(x) }[/math] is convex, of which the logarithm is one example; another such choice is sketched below. Generalizations to non-decreasing functions other than the logarithm are given in Csiszár, 2004.
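
For example, [math]\displaystyle{ g(x)=\sqrt{x} }[/math] gives [math]\displaystyle{ f(x)=x^{3/2} }[/math], which is convex on [math]\displaystyle{ [0,\infty) }[/math], so the same argument yields [math]\displaystyle{ \sum_{i=1}^n a_i\sqrt{a_i/b_i}\geq a\sqrt{a/b} }[/math]. A Python spot-check of this variant (illustrative only):

  import math
  import random

  # f(x) = x * sqrt(x) = x**1.5 is convex on [0, inf),
  # so sum_i a_i * sqrt(a_i / b_i) >= a * sqrt(a / b).
  for _ in range(10000):
      n = random.randint(1, 8)
      a = [random.uniform(0.01, 1.0) for _ in range(n)]
      b = [random.uniform(0.01, 1.0) for _ in range(n)]
      lhs = sum(ai * math.sqrt(ai / bi) for ai, bi in zip(a, b))
      rhs = sum(a) * math.sqrt(sum(a) / sum(b))
      assert lhs >= rhs - 1e-9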

Another generalization is due to Dannan, Neff and Thiel, who showed that if [math]\displaystyle{ a_1, a_2, \ldots, a_n }[/math] and [math]\displaystyle{ b_1, b_2, \ldots, b_n }[/math] are positive real numbers with [math]\displaystyle{ a_1 + a_2 + \cdots + a_n = a }[/math] and [math]\displaystyle{ b_1 + b_2 + \cdots + b_n = b }[/math], and [math]\displaystyle{ k \geq 0 }[/math], then [math]\displaystyle{ \sum_{i=1}^n a_i \log\left( \frac{a_i}{b_i} + k \right) \geq a \log\left( \frac{a}{b} + k \right) }[/math].[2]
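
Since [math]\displaystyle{ f(x)=x\log(x+k) }[/math] is convex for [math]\displaystyle{ k\geq 0 }[/math] ([math]\displaystyle{ f''(x)=\tfrac{1}{x+k}+\tfrac{k}{(x+k)^2}\gt 0 }[/math]), this variant also follows from the Jensen argument above. A Python spot-check (illustrative only):

  import math
  import random

  # sum_i a_i * log(a_i / b_i + k) >= a * log(a / b + k) for k >= 0.
  for _ in range(10000):
      n = random.randint(1, 8)
      k = random.uniform(0.0, 5.0)
      a = [random.uniform(0.01, 1.0) for _ in range(n)]
      b = [random.uniform(0.01, 1.0) for _ in range(n)]
      lhs = sum(ai * math.log(ai / bi + k) for ai, bi in zip(a, b))
      rhs = sum(a) * math.log(sum(a) / sum(b) + k)
      assert lhs >= rhs - 1e-9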

Applications

The log sum inequality can be used to prove inequalities in information theory. Gibbs' inequality states that the Kullback-Leibler divergence is non-negative, and equal to zero precisely if its arguments are equal.[3] One proof uses the log sum inequality.
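
Concretely, for probability distributions [math]\displaystyle{ p }[/math] and [math]\displaystyle{ q }[/math], taking [math]\displaystyle{ a_i=p_i }[/math] and [math]\displaystyle{ b_i=q_i }[/math] gives [math]\displaystyle{ a=b=1 }[/math], so [math]\displaystyle{ D_{\mathrm{KL}}(p\|q)\geq 1\cdot\log\tfrac{1}{1}=0 }[/math]. A Python sketch of this special case (helper names illustrative):

  import math
  import random

  def kl(p, q):
      # D(p || q) in nats, with the 0 * log(0 / q_i) = 0 convention.
      return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

  def rand_dist(n):
      w = [random.random() + 1e-12 for _ in range(n)]
      s = sum(w)
      return [wi / s for wi in w]

  for _ in range(10000):
      n = random.randint(2, 8)
      p, q = rand_dist(n), rand_dist(n)
      assert kl(p, q) >= -1e-9  # Gibbs' inequality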

The inequality can also be used to prove that the Kullback-Leibler divergence is convex in its pair of arguments.[4]
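
Here convexity means [math]\displaystyle{ D_{\mathrm{KL}}(\lambda p_1+(1-\lambda)p_2\,\|\,\lambda q_1+(1-\lambda)q_2)\leq \lambda D_{\mathrm{KL}}(p_1\|q_1)+(1-\lambda)D_{\mathrm{KL}}(p_2\|q_2) }[/math], obtained by applying the log sum inequality with [math]\displaystyle{ n=2 }[/math] to each coordinate of the mixtures. A Python spot-check (illustrative only):

  import math
  import random

  def kl(p, q):
      return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

  def rand_dist(n):
      w = [random.random() + 1e-12 for _ in range(n)]
      s = sum(w)
      return [wi / s for wi in w]

  for _ in range(10000):
      n = random.randint(2, 8)
      lam = random.random()
      p1, p2, q1, q2 = (rand_dist(n) for _ in range(4))
      pm = [lam * x + (1 - lam) * y for x, y in zip(p1, p2)]
      qm = [lam * x + (1 - lam) * y for x, y in zip(q1, q2)]
      assert kl(pm, qm) <= lam * kl(p1, q1) + (1 - lam) * kl(p2, q2) + 1e-9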

Notes

  1. Cover & Thomas (1991), p. 29.
  2. F. M. Dannan, P. Neff, C. Thiel (2016). "On the sum of squared logarithms inequality and related inequalities". Journal of Mathematical Inequalities 10 (1): 1–17. doi:10.7153/jmi-10-01. http://files.ele-math.com/articles/jmi-10-01.pdf. Retrieved 12 January 2023. 
  3. MacKay (2003), p. 34.
  4. Cover & Thomas (1991), p. 30.

References
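
  Cover, Thomas M.; Thomas, Joy A. (1991). Elements of Information Theory. New York: Wiley.
  Csiszár, I.; Shields, P. (2004). "Information Theory and Statistics: A Tutorial". Foundations and Trends in Communications and Information Theory 1 (4): 417–528. doi:10.1561/0100000004.
  MacKay, David J.C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press.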