Law of total expectation
The proposition in probability theory known as the law of total expectation,[1] the law of iterated expectations[2] (LIE), Adam's law,[3] the tower rule,[4] and the smoothing theorem,[5] among other names, states that if [math]\displaystyle{ X }[/math] is a random variable whose expected value [math]\displaystyle{ \operatorname{E}(X) }[/math] is defined, and [math]\displaystyle{ Y }[/math] is any random variable on the same probability space, then
- [math]\displaystyle{ \operatorname{E} (X) = \operatorname{E} ( \operatorname{E} ( X \mid Y)), }[/math]
i.e., the expected value of the conditional expected value of [math]\displaystyle{ X }[/math] given [math]\displaystyle{ Y }[/math] is the same as the expected value of [math]\displaystyle{ X }[/math].
One special case states that if [math]\displaystyle{ {\left\{A_i\right\}}_i }[/math] is a finite or countable partition of the sample space, then
- [math]\displaystyle{ \operatorname{E} (X) = \sum_i{\operatorname{E}(X \mid A_i) \operatorname{P}(A_i)}. }[/math]
Note: The conditional expected value E(X | Y), with Y a random variable, is not a single number; it is a random variable whose value depends on the value of Y. That is, the conditional expected value of X given the event Y = y is a number that depends on y. If we write g(y) for the value of E(X | Y = y), then the random variable E(X | Y) is g(Y).
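To make this concrete, here is a minimal numerical sketch in Python (an illustration, not part of the article; the joint distribution is an arbitrary choice). It computes g(y) = E(X | Y = y) for each value y of a small discrete joint distribution and checks that averaging g(Y) over the distribution of Y recovers E(X):

```python
import numpy as np

# Hypothetical joint pmf (arbitrary illustrative numbers):
# p[i, j] = P(X = xs[i], Y = y_j); the actual values of Y are
# irrelevant to the check, only its distribution matters.
xs = np.array([0.0, 1.0, 2.0])
p = np.array([[0.10, 0.05],
              [0.20, 0.25],
              [0.15, 0.25]])
assert np.isclose(p.sum(), 1.0)

p_y = p.sum(axis=0)                      # marginal pmf of Y
g = (xs[:, None] * p).sum(axis=0) / p_y  # g(y) = E(X | Y = y)

lhs = (xs[:, None] * p).sum()            # E(X) computed directly
rhs = (g * p_y).sum()                    # E(E(X | Y)) = sum_y g(y) P(Y = y)
print(lhs, rhs)                          # both print 1.25
assert np.isclose(lhs, rhs)
```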
Example
Suppose that only two factories supply light bulbs to the market. Factory [math]\displaystyle{ X }[/math]'s bulbs work for an average of 5000 hours, whereas factory [math]\displaystyle{ Y }[/math]'s bulbs work for an average of 4000 hours. It is known that factory [math]\displaystyle{ X }[/math] supplies 60% of the total bulbs available. What is the expected lifetime of a purchased bulb?
Applying the law of total expectation, we have:
- [math]\displaystyle{ \begin{align} \operatorname{E} (L) &= \operatorname{E}(L \mid X) \operatorname{P}(X)+\operatorname{E}(L \mid Y) \operatorname{P}(Y) \\[3pt] &= 5000(0.6)+4000(0.4)\\[2pt] &=4600 \end{align} }[/math]
where
- [math]\displaystyle{ \operatorname{E} (L) }[/math] is the expected lifetime of the purchased bulb;
- [math]\displaystyle{ \operatorname{P}(X)={6 \over 10} }[/math] is the probability that the purchased bulb was manufactured by factory [math]\displaystyle{ X }[/math];
- [math]\displaystyle{ \operatorname{P}(Y)={4 \over 10} }[/math] is the probability that the purchased bulb was manufactured by factory [math]\displaystyle{ Y }[/math];
- [math]\displaystyle{ \operatorname{E}(L \mid X)=5000 }[/math] is the expected lifetime of a bulb manufactured by [math]\displaystyle{ X }[/math];
- [math]\displaystyle{ \operatorname{E}(L \mid Y)=4000 }[/math] is the expected lifetime of a bulb manufactured by [math]\displaystyle{ Y }[/math].
Thus each purchased light bulb has an expected lifetime of 4600 hours.
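As a sanity check, the computation can be reproduced by simulation. The following Python sketch is illustrative and not from the article: only the mean lifetimes are given above, so the exponential lifetime distributions below are an assumption; the empirical average nevertheless converges to 4600 hours because the law depends only on the conditional means:

```python
import random

random.seed(0)
N = 1_000_000
total = 0.0
for _ in range(N):
    # Choose the factory: X with probability 0.6, Y with probability 0.4.
    if random.random() < 0.6:
        mean = 5000.0   # factory X
    else:
        mean = 4000.0   # factory Y
    # The lifetime distribution is not specified in the article;
    # an exponential with the given mean is assumed here.
    total += random.expovariate(1.0 / mean)

print(total / N)   # approximately 4600
```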
Informal proof
When a joint probability density function is well defined and the expectations are integrable, we write for the general case [math]\displaystyle{ \begin{align} \operatorname E(X) &= \int x f_X(x) ~dx \\ \operatorname E(X\mid Y=y) &= \int x f_{X\mid Y}(x\mid y) ~dx \\ \operatorname E( \operatorname E(X\mid Y)) &= \int \left(\int x f_{X\mid Y}(x\mid y) ~dx \right) f_Y(y) ~dy \\ &= \int \int x f_{X,Y}(x,y) ~dx ~dy \\ &= \int x \left( \int f_{X,Y}(x,y) ~dy \right) ~dx \\ &= \int x f_X(x) ~dx \\ &= \operatorname E(X)\,, \end{align} }[/math] where [math]\displaystyle{ f_X }[/math], [math]\displaystyle{ f_Y }[/math], [math]\displaystyle{ f_{X,Y} }[/math] and [math]\displaystyle{ f_{X\mid Y} }[/math] denote the marginal, joint and conditional densities; the fourth line uses [math]\displaystyle{ f_{X\mid Y}(x\mid y)\,f_Y(y) = f_{X,Y}(x,y) }[/math], and the interchange of the order of integration in the fifth line is justified by Fubini's theorem, using the assumed integrability. A similar derivation works for discrete distributions using summation instead of integration. For the specific case of a partition, give each cell of the partition a unique label and let the random variable Y be the function of the sample space that assigns a cell's label to each point in that cell.
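The chain of integrals above can be checked numerically. The sketch below (an illustration; the joint density f(x, y) = x + y on the unit square is an assumed example, not from the article) discretizes the integrals with a midpoint rule and verifies that E(E(X | Y)) and E(X) agree:

```python
import numpy as np

# Midpoint grid on [0, 1] x [0, 1]; the joint density f(x, y) = x + y
# is an assumed example (nonnegative, integrates to 1 on the square).
n = 2000
t = (np.arange(n) + 0.5) / n                    # cell midpoints
dx = 1.0 / n                                    # cell width
X, Y = np.meshgrid(t, t, indexing="ij")
f_xy = X + Y

f_y = f_xy.sum(axis=0) * dx                     # marginal density of Y
g = (t[:, None] * f_xy).sum(axis=0) * dx / f_y  # E(X | Y = y) on the grid

lhs = (X * f_xy).sum() * dx * dx                # E(X); exact value is 7/12
rhs = (g * f_y).sum() * dx                      # E(E(X | Y))
print(lhs, rhs)                                 # both approximately 0.58333
```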
Proof in the general case
Let [math]\displaystyle{ (\Omega,\mathcal{F},\operatorname{P}) }[/math] be a probability space on which two sub σ-algebras [math]\displaystyle{ \mathcal{G}_1 \subseteq \mathcal{G}_2 \subseteq \mathcal{F} }[/math] are defined. For a random variable [math]\displaystyle{ X }[/math] on such a space, the smoothing law states that if [math]\displaystyle{ \operatorname{E}[X] }[/math] is defined, i.e. [math]\displaystyle{ \min(\operatorname{E}[X_+], \operatorname{E}[X_-])\lt \infty }[/math], then
- [math]\displaystyle{ \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] = \operatorname{E}[X \mid \mathcal{G}_1]\quad\text{(a.s.)}. }[/math]
Proof. Since a conditional expectation is a Radon–Nikodym derivative, verifying the following two properties establishes the smoothing law:
- [math]\displaystyle{ \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \mbox{ is } \mathcal{G}_1 }[/math]-measurable
- [math]\displaystyle{ \int_{G_1} \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, d\operatorname{P} = \int_{G_1} X \, d\operatorname{P}, }[/math] for all [math]\displaystyle{ G_1 \in \mathcal{G}_1. }[/math]
The first of these properties holds by definition of the conditional expectation. To prove the second one,
- [math]\displaystyle{ \begin{align} \min\left(\int_{G_1}X_+\, d\operatorname{P}, \int_{G_1}X_-\, d\operatorname{P} \right) &\leq \min\left(\int_\Omega X_+\, d\operatorname{P}, \int_\Omega X_-\, d\operatorname{P}\right)\\[4pt] &=\min(\operatorname{E}[X_+], \operatorname{E}[X_-]) \lt \infty, \end{align} }[/math]
so the integral [math]\displaystyle{ \textstyle \int_{G_1}X\, d\operatorname{P} }[/math] is defined (i.e., it is not of the form [math]\displaystyle{ \infty - \infty }[/math]).
The second property thus holds since [math]\displaystyle{ G_1 \in \mathcal{G}_1 \subseteq \mathcal{G}_2 }[/math] implies
- [math]\displaystyle{ \int_{G_1} \operatorname{E}[ \operatorname{E}[X \mid \mathcal{G}_2] \mid \mathcal{G}_1] \, d\operatorname{P} = \int_{G_1} \operatorname{E}[X \mid \mathcal{G}_2] \, d\operatorname{P} = \int_{G_1} X \, d\operatorname{P}, }[/math]
where the first equality is the defining property of [math]\displaystyle{ \operatorname{E}[\,\cdot \mid \mathcal{G}_1] }[/math] (applicable because [math]\displaystyle{ G_1 \in \mathcal{G}_1 }[/math]) and the second is the defining property of [math]\displaystyle{ \operatorname{E}[\,\cdot \mid \mathcal{G}_2] }[/math] (applicable because [math]\displaystyle{ G_1 \in \mathcal{G}_2 }[/math]).
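For a concrete finite illustration of the smoothing law (not part of the original proof; the probabilities, values, and partitions below are arbitrary choices), one can represent σ-algebras generated by nested partitions and compute conditional expectations as probability-weighted block averages:

```python
import numpy as np

# Finite sample space {0, ..., 7} with hypothetical point masses p
# and an arbitrary random variable x defined on it.
p = np.array([0.05, 0.10, 0.15, 0.10, 0.20, 0.10, 0.10, 0.20])
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0])

# Partitions generating the sigma-algebras: G1 is coarser than G2.
G2_blocks = [[0, 1], [2, 3], [4, 5], [6, 7]]
G1_blocks = [[0, 1, 2, 3], [4, 5, 6, 7]]

def cond_exp(values, blocks):
    """Conditional expectation given the partition: on each block,
    replace the values by their probability-weighted block average."""
    out = np.empty_like(values)
    for b in blocks:
        out[b] = np.dot(p[b], values[b]) / p[b].sum()
    return out

e_x_g2 = cond_exp(x, G2_blocks)       # E[X | G2], as a random variable
lhs = cond_exp(e_x_g2, G1_blocks)     # E[E[X | G2] | G1]
rhs = cond_exp(x, G1_blocks)          # E[X | G1]
print(np.allclose(lhs, rhs))          # True: the smoothing law holds
```

Because E[X | G2] is constant on each G2-block, averaging it again over the coarser G1-blocks yields exactly the G1-block averages of X, which is the content of the smoothing law.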
Corollary. In the special case when [math]\displaystyle{ \mathcal{G}_1 = \{\emptyset,\Omega \} }[/math] and [math]\displaystyle{ \mathcal{G}_2 = \sigma(Y) }[/math], the smoothing law reduces to
- [math]\displaystyle{ \operatorname{E}[ \operatorname{E}[X \mid Y]] = \operatorname{E}[X]. }[/math]
Alternative proof for [math]\displaystyle{ \operatorname{E}[ \operatorname{E}[X \mid Y]] = \operatorname{E}[X]. }[/math]
This is a simple consequence of the measure-theoretic definition of conditional expectation. By definition, [math]\displaystyle{ \operatorname{E}[X \mid Y] := \operatorname{E}[X \mid \sigma(Y)] }[/math] is a [math]\displaystyle{ \sigma(Y) }[/math]-measurable random variable that satisfies
- [math]\displaystyle{ \int_A \operatorname{E}[X \mid Y] \, d\operatorname{P} = \int_A X \, d\operatorname{P}, }[/math]
for every measurable set [math]\displaystyle{ A \in \sigma(Y) }[/math]. Taking [math]\displaystyle{ A = \Omega }[/math] proves the claim.
See also
- The fundamental theorem of poker for one practical application.
- Law of total probability
- Law of total variance
- Law of total covariance
- Law of total cumulance
- Product distribution (an application of the law to prove that the expected value of a product of independent random variables is the product of their expected values)
References
- ↑ Weiss, Neil A. (2005). A Course in Probability. Boston: Addison–Wesley. pp. 380–383. ISBN 0-321-18954-X. https://books.google.com/books?id=p-rwJAAACAAJ&pg=PA380.
- ↑ "Law of Iterated Expectation | Brilliant Math & Science Wiki" (in en-us). https://brilliant.org/wiki/law-of-iterated-expectation/.
- ↑ "Adam's and Eve's Laws". https://r.amherst.edu/apps/nhorton/Adam-Eve/.
- ↑ Rhee, Chang-han (Sep 20, 2011). "Probability and Statistics". https://web.stanford.edu/class/cme001/handouts/changhan/Refresher2.pdf.
- ↑ Wolpert, Robert (November 18, 2010). "Conditional Expectation". https://www2.stat.duke.edu/courses/Fall10/sta205/lec/topics/rn.pdf.
- Billingsley, Patrick (1995). Probability and measure. New York: John Wiley & Sons. ISBN 0-471-00710-2. (Theorem 34.4)
- Christopher Sims, "Notes on Random Variables, Expectations, Probability Densities, and Martingales", especially equations (16) through (18)
Original source: https://en.wikipedia.org/wiki/Law_of_total_expectation.