Law of total probability

From HandWiki

In probability theory, the law (or formula) of total probability is a fundamental rule relating marginal probabilities to conditional probabilities. It expresses the total probability of an outcome which can be realized via several distinct events, hence the name.


Statement

In its discrete form, the law of total probability is[1] a theorem stating that if [math]\displaystyle{ \left\{{B_n : n = 1, 2, 3, \ldots}\right\} }[/math] is a finite or countably infinite partition of a sample space (in other words, a set of pairwise disjoint events whose union is the entire sample space) and each event [math]\displaystyle{ B_n }[/math] is measurable, then for any event [math]\displaystyle{ A }[/math] of the same sample space:

[math]\displaystyle{ P(A)=\sum_n P(A\cap B_n) }[/math]

or, alternatively,[1]

[math]\displaystyle{ P(A)=\sum_n P(A\mid B_n)P(B_n), }[/math]

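where, for any [math]\displaystyle{ n }[/math] with [math]\displaystyle{ P(B_n) = 0 }[/math], the corresponding term is simply omitted from the summation: [math]\displaystyle{ P(A\mid B_n) }[/math] is undefined in that case, but [math]\displaystyle{ P(A\cap B_n) \le P(B_n) = 0 }[/math], so the term contributes nothing to the sum.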

The summation can be interpreted as a weighted average, and consequently the marginal probability, [math]\displaystyle{ P(A) }[/math], is sometimes called "average probability";[2] "overall probability" is sometimes used in less formal writings.[3]
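The two forms of the discrete statement can be checked on a small finite sample space. The following is a minimal sketch, assuming a hypothetical experiment of rolling two fair dice; the names `omega`, `prob`, `A`, and `partition` are illustrative and do not come from the cited sources.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair six-sided dice (illustrative assumption).
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(len(event), len(omega))

# Event A: the two dice sum to 8.
A = {w for w in omega if sum(w) == 8}

# Partition B_1, ..., B_6 of the sample space by the value of the first die.
partition = [{w for w in omega if w[0] == n} for n in range(1, 7)]

# First form: P(A) = sum_n P(A ∩ B_n).
total_joint = sum(prob(A & B) for B in partition)

# Second form: P(A) = sum_n P(A | B_n) P(B_n), omitting any B_n with P(B_n) = 0.
total_conditional = sum(
    (prob(A & B) / prob(B)) * prob(B) for B in partition if prob(B) > 0
)

print(prob(A), total_joint, total_conditional)  # 5/36 5/36 5/36
```

Exact rational arithmetic via `Fraction` is used here so that the equality of the three quantities can be checked without floating-point tolerance.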

The law of total probability can also be stated for conditional probabilities:

[math]\displaystyle{ \begin{align} P(A \mid C) & = \frac{P(A, C)}{P(C)} = \frac{\sum_n P(A, B_n, C)}{P(C)} \\[4pt] & = \frac{\sum_n P(A \mid B_n, C) P(B_n \mid C) P(C)}{P(C)} \\[4pt] & = \sum_n P(A \mid B_n, C) P(B_n \mid C) \end{align} }[/math]

Taking the [math]\displaystyle{ B_n }[/math] as above, and assuming [math]\displaystyle{ C }[/math] is an event independent of each of the [math]\displaystyle{ B_n }[/math] (so that [math]\displaystyle{ P(B_n \mid C) = P(B_n) }[/math]):

[math]\displaystyle{ P(A \mid C) = \sum_n P(A \mid C,B_n) P(B_n) }[/math]
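Both conditional forms can be verified numerically in the same spirit. The sketch below again assumes a hypothetical two-dice sample space, with [math]\displaystyle{ B_n }[/math] fixing the first die and [math]\displaystyle{ C }[/math] depending only on the second die, so that [math]\displaystyle{ C }[/math] is independent of each [math]\displaystyle{ B_n }[/math]. All names (`cond`, `general`, `independent`) are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice (illustrative assumption).
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event), len(omega))

def cond(event, given):
    """Conditional probability P(event | given); requires P(given) > 0."""
    return prob(event & given) / prob(given)

A = {w for w in omega if sum(w) >= 9}        # the dice sum to at least 9
C = {w for w in omega if w[1] % 2 == 0}      # the second die is even
partition = [{w for w in omega if w[0] == n} for n in range(1, 7)]

lhs = cond(A, C)

# General conditional form: sum_n P(A | B_n, C) P(B_n | C).
general = sum(cond(A, B & C) * cond(B, C) for B in partition if prob(B & C) > 0)

# Here C is independent of each B_n, so P(B_n | C) may be replaced by P(B_n).
assert all(prob(B & C) == prob(B) * prob(C) for B in partition)
independent = sum(cond(A, B & C) * prob(B) for B in partition if prob(B & C) > 0)

print(lhs, general, independent)  # 1/3 1/3 1/3
```

If [math]\displaystyle{ C }[/math] were not independent of the [math]\displaystyle{ B_n }[/math], only the general form would be expected to match the left-hand side.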

Continuous case

The law of total probability extends to the case of conditioning on events generated by continuous random variables. Let [math]\displaystyle{ (\Omega, \mathcal{F}, P) }[/math] be a probability space. Suppose [math]\displaystyle{ X }[/math] is a random variable with distribution function [math]\displaystyle{ F_X }[/math], and [math]\displaystyle{ A }[/math] an event on [math]\displaystyle{ (\Omega, \mathcal{F}, P) }[/math]. Then the law of total probability states

[math]\displaystyle{ P(A) = \int_{-\infty}^\infty P(A \mid X = x) \, dF_X(x). }[/math]

If [math]\displaystyle{ X }[/math] admits a density function [math]\displaystyle{ f_X }[/math], then the result is

[math]\displaystyle{ P(A) = \int_{-\infty}^\infty P(A \mid X = x) f_X(x) \, dx. }[/math]

Moreover, in the specific case [math]\displaystyle{ A = \{Y \in B \} }[/math], where [math]\displaystyle{ B }[/math] is a Borel set, this yields

[math]\displaystyle{ P(Y \in B) = \int_{-\infty}^\infty P(Y \in B \mid X = x) f_X(x) \, dx. }[/math]
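The density form of the continuous law can be approximated numerically. The following is a minimal sketch under assumed distributions that do not appear in the sources: [math]\displaystyle{ X }[/math] is exponential with rate 1, and given [math]\displaystyle{ X = x }[/math] the variable [math]\displaystyle{ Y }[/math] is exponential with rate [math]\displaystyle{ x }[/math], so that [math]\displaystyle{ P(Y \gt 1 \mid X = x) = e^{-x} }[/math] and the exact answer is [math]\displaystyle{ \int_0^\infty e^{-x} e^{-x} \, dx = 1/2 }[/math]. The helper `total_probability` is a hypothetical name, and the integral is truncated to a finite interval.

```python
import math

def f_X(x):
    """Density of X ~ Exponential(rate 1); an illustrative assumption."""
    return math.exp(-x) if x >= 0 else 0.0

def p_A_given_x(x):
    """P(Y > 1 | X = x), assuming Y | X = x is Exponential with rate x."""
    return math.exp(-x)

def total_probability(cond_prob, density, lower, upper, steps=200_000):
    """Approximate the integral of cond_prob(x) * density(x) by the midpoint rule."""
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += cond_prob(x) * density(x)
    return total * h

# The density is zero for x < 0, and the tail beyond x = 40 is negligible here.
p_A = total_probability(p_A_given_x, f_X, 0.0, 40.0)
print(p_A)  # close to the exact value 0.5
```

The midpoint rule is chosen only for simplicity; any standard quadrature routine could be substituted.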


Example

Suppose that two factories supply light bulbs to the market. Factory X's bulbs work for over 5000 hours in 99% of cases, whereas factory Y's bulbs work for over 5000 hours in 95% of cases. It is known that factory X supplies 60% of the total bulbs available and Y supplies 40% of the total bulbs available. What is the chance that a purchased bulb will work for longer than 5000 hours?

Applying the law of total probability, we have:

[math]\displaystyle{ \begin{align} P(A) & = P(A\mid B_X) \cdot P(B_X) + P(A\mid B_Y) \cdot P(B_Y) \\[4pt] & = {99 \over 100} \cdot {6 \over 10} + {95 \over 100} \cdot {4 \over 10} = {{594 + 380} \over 1000} = {974 \over 1000} \end{align} }[/math]

where

  • [math]\displaystyle{ P(B_X)={6 \over 10} }[/math] is the probability that the purchased bulb was manufactured by factory X;
  • [math]\displaystyle{ P(B_Y)={4 \over 10} }[/math] is the probability that the purchased bulb was manufactured by factory Y;
  • [math]\displaystyle{ P(A\mid B_X)={99 \over 100} }[/math] is the probability that a bulb manufactured by X will work for over 5000 hours;
  • [math]\displaystyle{ P(A\mid B_Y)={95 \over 100} }[/math] is the probability that a bulb manufactured by Y will work for over 5000 hours.

Thus each purchased light bulb has a 97.4% chance of working for more than 5000 hours.
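The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch using exact fractions; the dictionary names `supply_share` and `works_given_factory` are illustrative.

```python
from fractions import Fraction

# P(B_X) and P(B_Y): share of bulbs supplied by each factory.
supply_share = {"X": Fraction(60, 100), "Y": Fraction(40, 100)}

# P(A | B_X) and P(A | B_Y): chance a bulb from each factory lasts over 5000 hours.
works_given_factory = {"X": Fraction(99, 100), "Y": Fraction(95, 100)}

# The factories must form a partition: their shares sum to 1.
assert sum(supply_share.values()) == 1

# Law of total probability: P(A) = sum over factories of P(A | B_f) * P(B_f).
p_works = sum(works_given_factory[f] * supply_share[f] for f in supply_share)

print(p_works, float(p_works))  # 487/500 0.974
```

The fraction 487/500 is the reduced form of 974/1000 obtained above.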

Other names

The term law of total probability is sometimes taken to mean the law of alternatives, which is a special case of the law of total probability applying to discrete random variables.[citation needed] One author uses the terminology of the "Rule of Average Conditional Probabilities",[4] while another refers to it as the "continuous law of alternatives" in the continuous case.[5] This result is given by Grimmett and Welsh[6] as the partition theorem, a name that they also give to the related law of total expectation.

Notes


  1. Zwillinger, D., Kokoska, S. (2000) CRC Standard Probability and Statistics Tables and Formulae, CRC Press. ISBN 1-58488-059-7 page 31.
  2. Paul E. Pfeiffer (1978). Concepts of probability theory. Courier Dover Publications. pp. 47–48. ISBN 978-0-486-63677-1. 
  3. Deborah Rumsey (2006). Probability for dummies. For Dummies. p. 58. ISBN 978-0-471-75141-0. 
  4. Jim Pitman (1993). Probability. Springer. p. 41. ISBN 0-387-97974-3. 
  5. Kenneth Baclawski (2008). Introduction to probability with R. CRC Press. p. 179. ISBN 978-1-4200-6521-3. 
  6. Probability: An Introduction, by Geoffrey Grimmett and Dominic Welsh, Oxford Science Publications, 1986, Theorem 1B.


  • Introduction to Probability and Statistics by Robert J. Beaver, Barbara M. Beaver, Thomson Brooks/Cole, 2005, page 159.
  • Theory of Statistics, by Mark J. Schervish, Springer, 1995.
  • Schaum's Outline of Probability, Second Edition, by John J. Schiller, Seymour Lipschutz, McGraw–Hill Professional, 2010, page 89.
  • A First Course in Stochastic Models, by H. C. Tijms, John Wiley and Sons, 2003, pages 431–432.
  • An Intermediate Course in Probability, by Alan Gut, Springer, 1995, pages 5–6.