# Averaging argument

In computational complexity theory and cryptography, averaging argument is a standard argument for proving theorems. It usually allows us to convert probabilistic polynomial-time algorithms into non-uniform polynomial-size circuits.

## Example

Example: If every person likes at least 1/3 of the books in a library, then the library has a book, which at least 1/3 of people like.

Proof: Suppose there are $\displaystyle{ N }$ people and B books. Each person likes at least $\displaystyle{ B/3 }$ of the books. Let people leave a mark on the book they like. Then, there will be at least $\displaystyle{ M=(NB)/3 }$ marks. The averaging argument claims that there exists a book with at least $\displaystyle{ N/3 }$ marks on it. Assume, to the contradiction, that no such book exists. Then, every book has fewer than $\displaystyle{ N/3 }$ marks. However, since there are $\displaystyle{ B }$ books, the total number of marks will be fewer than $\displaystyle{ (NB)/3 }$, contradicting the fact that there are at least $\displaystyle{ M }$ marks. $\displaystyle{ \scriptstyle\blacksquare }$

## Formalized definition of averaging argument

Let X and Y be sets, let p be a predicate on X × Y and let f be a real number in the interval [0, 1]. If for each x in X and at least f |Y| of the elements y in Y satisfy p(x, y), then there exists a y in Y such that there exist at least f |X| elements x in X that satisfy p(x, y).

There is another definition, defined using the terminology of probability theory.[1]

Let $\displaystyle{ f }$ be some function. The averaging argument is the following claim: if we have a circuit $\displaystyle{ C }$ such that $\displaystyle{ C(x, y) = f(x) }$ with probability at least $\displaystyle{ \rho }$, where $\displaystyle{ x }$ is chosen at random and $\displaystyle{ y }$ is chosen independently from some distribution $\displaystyle{ Y }$ over $\displaystyle{ \{0, 1\}^m }$ (which might not even be efficiently sampleable) then there exists a single string $\displaystyle{ y_0 \in \{0, 1\}^m }$ such that $\displaystyle{ \Pr_x[C(x, y_0) = f(x)] \ge \rho }$.

Indeed, for every $\displaystyle{ y }$ define $\displaystyle{ p_y }$ to be $\displaystyle{ \Pr_x[C(x, y) = f(x)] }$ then

$\displaystyle{ \Pr_{x,y}[C(x, y) = f(x)] = E_y[p_y] \, }$

and then this reduces to the claim that for every random variable $\displaystyle{ Z }$, if $\displaystyle{ E[Z] \ge \rho }$ then $\displaystyle{ \Pr[Z \ge \rho] \gt 0 }$ (this holds since $\displaystyle{ E[Z] }$ is the weighted average of $\displaystyle{ Z }$ and clearly if the average of some values is at least $\displaystyle{ \rho }$ then one of the values must be at least $\displaystyle{ \rho }$).

## Application

This argument has wide use in complexity theory (e.g. proving $\displaystyle{ \mathsf{BPP}\subsetneq\mathsf{P/poly} }$) and cryptography (e.g. proving that indistinguishable encryption results in semantic security). A plethora of such applications can be found in Goldreich's books.[2][3][4]

## References

1. Barak, Boaz (March 2006). "Note on the averaging and hybrid arguments and prediction vs. distinguishing". Princeton University.
2. Oded Goldreich, Foundations of Cryptography, Volume 1: Basic Tools, Cambridge University Press, 2001, ISBN:0-521-79172-3
3. Oded Goldreich, Foundations of Cryptography, Volume 2: Basic Applications, Cambridge University Press, 2004, ISBN:0-521-83084-2
4. Oded Goldreich, Computational Complexity: A Conceptual Perspective, Cambridge University Press, 2008, ISBN:0-521-88473-X