# Probability axioms


The **Kolmogorov axioms** are the foundations of probability theory introduced by the Russian mathematician Andrey Kolmogorov in 1933.^{[1]} These axioms remain central to probability theory and underpin its applications in mathematics, the physical sciences, and real-world reasoning about uncertainty.^{[2]} An alternative approach to formalising probability, favoured by some Bayesians, is given by Cox's theorem.^{[3]}

## Axioms

The assumptions underlying the axioms can be summarised as follows: Let [math]\displaystyle{ (\Omega, F, P) }[/math] be a measure space with [math]\displaystyle{ P(E) }[/math] being the probability of some event [math]\displaystyle{ E }[/math], and [math]\displaystyle{ P(\Omega) = 1 }[/math]. Then [math]\displaystyle{ (\Omega, F, P) }[/math] is a probability space, with sample space [math]\displaystyle{ \Omega }[/math], event space [math]\displaystyle{ F }[/math] and probability measure [math]\displaystyle{ P }[/math].^{[1]}

### First axiom

The probability of an event is a non-negative real number:

- [math]\displaystyle{ P(E)\in\mathbb{R}, P(E)\geq 0 \qquad \forall E \in F }[/math]

where [math]\displaystyle{ F }[/math] is the event space. It follows that [math]\displaystyle{ P(E) }[/math] is always finite, in contrast with more general measure theory. Theories which assign negative probability relax the first axiom.

### Second axiom

This is the assumption of unit measure: the probability that at least one of the elementary events in the entire sample space will occur is 1.

- [math]\displaystyle{ P(\Omega) = 1. }[/math]

### Third axiom

This is the assumption of σ-additivity:

- Any countable sequence of disjoint sets (synonymous with *mutually exclusive* events) [math]\displaystyle{ E_1, E_2, \ldots }[/math] satisfies
- [math]\displaystyle{ P\left(\bigcup_{i = 1}^\infty E_i\right) = \sum_{i=1}^\infty P(E_i). }[/math]

Some authors consider merely finitely additive probability spaces, in which case one just needs an algebra of sets, rather than a σ-algebra.^{[4]} Quasiprobability distributions in general relax the third axiom.
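As an illustrative sketch (not part of the formal development), the axioms can be checked mechanically on a small finite space, where σ-additivity reduces to finite additivity. The fair six-sided die below is a hypothetical example:

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite example: a fair six-sided die. On a finite sample
# space the event space F can be the full power set, and sigma-additivity
# reduces to finite additivity.
omega = frozenset(range(1, 7))

def powerset(s):
    """All subsets of s: the event space F for a finite sample space."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def P(event):
    """Uniform probability measure: |E| / |Omega|."""
    return Fraction(len(event), len(omega))

F = powerset(omega)

assert all(P(E) >= 0 for E in F)          # first axiom: non-negativity
assert P(omega) == 1                      # second axiom: unit measure
A, B = frozenset({1, 2}), frozenset({3, 4, 5})
assert A.isdisjoint(B)
assert P(A | B) == P(A) + P(B)            # third axiom (finite additivity)
```

Exact rational arithmetic (`Fraction`) avoids floating-point error, so each axiom is verified exactly rather than approximately.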

## Consequences

From the Kolmogorov axioms, one can deduce other useful rules for studying probabilities. The proofs^{[5]}^{[6]}^{[7]} of these rules illustrate the power of the third axiom and its interaction with the remaining two. Four of the immediate corollaries and their proofs are shown below:

### Monotonicity

- [math]\displaystyle{ \quad\text{if}\quad A\subseteq B\quad\text{then}\quad P(A)\leq P(B). }[/math]

If A is a subset of, or equal to, B, then the probability of A is less than or equal to the probability of B.

*Proof of monotonicity*^{[5]}

In order to verify the monotonicity property, we set [math]\displaystyle{ E_1=A }[/math] and [math]\displaystyle{ E_2=B\setminus A }[/math], where [math]\displaystyle{ A\subseteq B }[/math] and [math]\displaystyle{ E_i=\varnothing }[/math] for [math]\displaystyle{ i\geq 3 }[/math]. From the properties of the empty set ([math]\displaystyle{ \varnothing }[/math]), it is easy to see that the sets [math]\displaystyle{ E_i }[/math] are pairwise disjoint and [math]\displaystyle{ E_1\cup E_2\cup\cdots=B }[/math]. Hence, we obtain from the third axiom that

- [math]\displaystyle{ P(A)+P(B\setminus A)+\sum_{i=3}^\infty P(E_i)=P(B). }[/math]

Since, by the first axiom, the left-hand side of this equation is a series of non-negative numbers, and since it converges to [math]\displaystyle{ P(B) }[/math] which is finite, we obtain both [math]\displaystyle{ P(A)\leq P(B) }[/math] and [math]\displaystyle{ P(\varnothing)=0 }[/math].
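The disjoint decomposition used in this proof can be checked exhaustively on a small finite space; the fair four-sided die below is a hypothetical example:

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical check of monotonicity on a fair four-sided die:
# for every pair of events with A a subset of B, P(A) <= P(B).
omega = frozenset(range(1, 5))
P = lambda E: Fraction(len(E), len(omega))  # uniform measure

events = [frozenset(c) for r in range(len(omega) + 1)
          for c in combinations(omega, r)]
for A in events:
    for B in events:
        if A <= B:                   # A is a subset of B
            assert P(A) <= P(B)      # monotonicity
            # the decomposition from the proof: B = A and B \ A, disjoint
            assert P(B) == P(A) + P(B - A)
```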

### The probability of the empty set

- [math]\displaystyle{ P(\varnothing)=0. }[/math]

In many cases, [math]\displaystyle{ \varnothing }[/math] is not the only event with probability 0.

*Proof of probability of the empty set*

Define [math]\displaystyle{ E_i := \varnothing }[/math] for [math]\displaystyle{ i \in \mathbb{N} }[/math], then these are disjoint, and [math]\displaystyle{ \bigcup_{i = 1}^\infty E_i = \varnothing = E_1 }[/math], hence by the third axiom [math]\displaystyle{ \sum_{i = 1}^\infty P(E_i) = P(E_1) }[/math]; subtracting [math]\displaystyle{ P(E_1) }[/math] (which is finite by the first axiom) yields [math]\displaystyle{ \sum_{i = 2}^\infty P(E_i) = 0 }[/math]. From this together with the first axiom follows [math]\displaystyle{ 0 \leq P(E_2) \leq \sum_{i = 2}^\infty P(E_i) = 0 }[/math], thus [math]\displaystyle{ P(E_2) = P(\varnothing) = 0 }[/math].

### The complement rule

[math]\displaystyle{ P\left(A^{c}\right) = P(\Omega\setminus A) = 1 - P(A) }[/math]

*Proof of the complement rule*

Given [math]\displaystyle{ A }[/math] and [math]\displaystyle{ A^{c} }[/math] are mutually exclusive and that [math]\displaystyle{ A \cup A^c = \Omega }[/math]:

[math]\displaystyle{ P(A \cup A^c)=P(A)+P(A^c) }[/math] ... *(by axiom 3)*

and [math]\displaystyle{ P(A \cup A^c)=P(\Omega)=1 }[/math] ... *(by axiom 2)*

[math]\displaystyle{ \Rightarrow P(A)+P(A^c)=1 }[/math]

[math]\displaystyle{ \therefore P(A^c)=1-P(A) }[/math]
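As a sanity check, the complement rule can be verified exactly on a finite example (a fair six-sided die, chosen purely for illustration):

```python
from fractions import Fraction

# Exact check of the complement rule on a fair six-sided die (hypothetical
# example): A = "even outcome", so A^c = "odd outcome".
omega = frozenset(range(1, 7))
P = lambda E: Fraction(len(E), len(omega))  # uniform measure

A = frozenset({2, 4, 6})
A_c = omega - A                # set complement relative to Omega
assert P(A_c) == 1 - P(A)      # P(A^c) = 1 - P(A)
```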

### The numeric bound

It immediately follows from the monotonicity property that

- [math]\displaystyle{ 0\leq P(E)\leq 1\qquad \forall E\in F. }[/math]

*Proof of the numeric bound*

Given the complement rule [math]\displaystyle{ P(E^c)=1-P(E) }[/math] and *axiom 1* [math]\displaystyle{ P(E^c)\geq 0 }[/math]:

[math]\displaystyle{ 1-P(E) \geq 0 }[/math]

[math]\displaystyle{ \Rightarrow 1 \geq P(E) }[/math]

[math]\displaystyle{ \therefore 0\leq P(E)\leq 1 }[/math]

## Further consequences

Another important property is:

- [math]\displaystyle{ P(A \cup B) = P(A) + P(B) - P(A \cap B). }[/math]

This is called the addition law of probability, or the sum rule.
That is, the probability that an event in *A* *or* *B* will happen is the sum of the probability of an event in *A* and the probability of an event in *B*, minus the probability of an event that is in both *A* *and* *B*. The proof of this is as follows:

Firstly,

- [math]\displaystyle{ P(A\cup B) = P(A) + P(B\setminus A) }[/math] ... *(by Axiom 3)*

So,

- [math]\displaystyle{ P(A \cup B) = P(A) + P(B\setminus (A \cap B)) }[/math] (by [math]\displaystyle{ B \setminus A = B\setminus (A \cap B) }[/math]).

Also,

- [math]\displaystyle{ P(B) = P(B\setminus (A \cap B)) + P(A \cap B) }[/math]

and eliminating [math]\displaystyle{ P(B\setminus (A \cap B)) }[/math] from both equations gives us the desired result.

An extension of the addition law to any number of sets is the inclusion–exclusion principle.
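The addition law, and its three-set extension via inclusion–exclusion, can be checked on a small worked example (a fair six-sided die; the particular events are chosen for illustration):

```python
from fractions import Fraction

# Hypothetical check of the addition law on a fair six-sided die.
omega = frozenset(range(1, 7))
P = lambda E: Fraction(len(E), len(omega))  # uniform measure

A = frozenset({1, 2, 3})   # "at most three"
B = frozenset({2, 4, 6})   # "even"
assert P(A | B) == P(A) + P(B) - P(A & B)   # addition law

# Three-set inclusion-exclusion, the extension mentioned above:
C = frozenset({3, 6})      # "multiple of three"
lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(A & C) - P(B & C)
       + P(A & B & C))
assert lhs == rhs
```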

Setting *B* to the complement *A^{c}* of *A* in the addition law gives

- [math]\displaystyle{ P\left(A^{c}\right) = P(\Omega\setminus A) = 1 - P(A) }[/math]

That is, the probability that any event will *not* happen (or the event's complement) is 1 minus the probability that it will.

## Simple example: coin toss

Consider a single coin-toss, and assume that the coin will either land heads (H) or tails (T) (but not both). No assumption is made as to whether the coin is fair.

We may define:

- [math]\displaystyle{ \Omega = \{H,T\} }[/math]
- [math]\displaystyle{ F = \{\varnothing, \{H\}, \{T\}, \{H,T\}\} }[/math]

Kolmogorov's axioms imply that:

- [math]\displaystyle{ P(\varnothing) = 0 }[/math]

The probability of *neither* heads *nor* tails is 0.

- [math]\displaystyle{ P(\{H,T\}^c) = 0 }[/math]

Since [math]\displaystyle{ \{H,T\}^c = \varnothing }[/math], this is equivalent to [math]\displaystyle{ P(\{H,T\}) = 1 }[/math]: the probability of *either* heads *or* tails is 1.

- [math]\displaystyle{ P(\{H\}) + P(\{T\}) = 1 }[/math]

The sum of the probability of heads and the probability of tails is 1.
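These constraints pin down the whole measure once [math]\displaystyle{ P(\{H\}) }[/math] is fixed. A minimal sketch, assuming a hypothetical bias of 2/3 for heads:

```python
from fractions import Fraction

# A (possibly unfair) coin as a probability space. Choosing P({H}) = 2/3
# (a hypothetical bias), the axioms determine P on every event of F.
p = Fraction(2, 3)
P = {
    frozenset(): Fraction(0),            # probability of the empty set
    frozenset({"H"}): p,
    frozenset({"T"}): 1 - p,             # complement rule
    frozenset({"H", "T"}): Fraction(1),  # second axiom: unit measure
}
assert P[frozenset({"H"})] + P[frozenset({"T"})] == 1
```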

## See also

- Conditional probability – Probability of an event occurring, given that another event has already occurred
- Fully probabilistic design
- Intuitive statistics
- Set theory – Branch of mathematics that studies sets
- σ-algebra

## References

1. Kolmogorov, Andrey (1950). *Foundations of the Theory of Probability*. New York: Chelsea Publishing Company. https://archive.org/details/foundationsofthe00kolm
2. Aldous, David. "What is the significance of the Kolmogorov axioms?". https://www.stat.berkeley.edu/~aldous/Real_World/kolmogorov.html
3. Terenin, Alexander; Draper, David (2015). *Cox's Theorem and the Jaynesian Interpretation of Probability*. arXiv. Bibcode: 2015arXiv150706597T. https://archive.org/details/arxiv-1507.06597
4. Hájek, Alan (August 28, 2019). "Interpretations of Probability". https://plato.stanford.edu/entries/probability-interpret/#KolProCal
5. Ross, Sheldon M. (2014). *A First Course in Probability* (Ninth ed.). Upper Saddle River, New Jersey. pp. 27–28. ISBN 978-0-321-79477-2. OCLC 827003384.
6. Gerard, David (December 9, 2017). "Proofs from axioms". https://dcgerard.github.io/stat234/11_proofs_from_axioms.pdf
7. Jackson, Bill (2010). "Probability (Lecture Notes - Week 3)". http://www.maths.qmul.ac.uk/~bill/MTH4107/notesweek3_10.pdf

## Further reading

- DeGroot, Morris H. (1975). *Probability and Statistics*. Reading: Addison-Wesley. pp. 12–16. ISBN 0-201-01503-X. https://archive.org/details/probabilitystati0000degr/page/12
- McCord, James R.; Moroney, Richard M. (1964). "Axiomatic Probability". *Introduction to Probability Theory*. New York: Macmillan. pp. 13–28. https://archive.org/details/introductiontopr00mcco
- Formal definition of probability in the Mizar system, and the list of theorems formally proved about it.

Original source: https://en.wikipedia.org/wiki/Probability axioms.