Probability axioms

From HandWiki

The Kolmogorov axioms are the foundations of probability theory introduced by Russian mathematician Andrey Kolmogorov in 1933.[1] These axioms remain central to the field and are applied directly in mathematics, the physical sciences, and real-world probability problems.[2] An alternative approach to formalising probability, favoured by some Bayesians, is given by Cox's theorem.[3]


The setting for the axioms can be summarised as follows: Let [math]\displaystyle{ (\Omega, F, P) }[/math] be a measure space, with [math]\displaystyle{ P(E) }[/math] being the probability of some event [math]\displaystyle{ E }[/math], and [math]\displaystyle{ P(\Omega) = 1 }[/math]. Then [math]\displaystyle{ (\Omega, F, P) }[/math] is a probability space, with sample space [math]\displaystyle{ \Omega }[/math], event space [math]\displaystyle{ F }[/math] and probability measure [math]\displaystyle{ P }[/math].[1]

First axiom

The probability of an event is a non-negative real number:

[math]\displaystyle{ P(E)\in\mathbb{R}, P(E)\geq 0 \qquad \forall E \in F }[/math]

where [math]\displaystyle{ F }[/math] is the event space. It follows that [math]\displaystyle{ P(E) }[/math] is always finite, in contrast with more general measure theory. Theories which assign negative probability relax the first axiom.

Second axiom

This is the assumption of unit measure: the probability that at least one of the elementary events in the entire sample space will occur is 1:

[math]\displaystyle{ P(\Omega) = 1. }[/math]

Third axiom

This is the assumption of σ-additivity:

Any countable sequence of disjoint sets (synonymous with mutually exclusive events) [math]\displaystyle{ E_1, E_2, \ldots }[/math] satisfies
[math]\displaystyle{ P\left(\bigcup_{i = 1}^\infty E_i\right) = \sum_{i=1}^\infty P(E_i). }[/math]

Some authors consider merely finitely additive probability spaces, in which case one just needs an algebra of sets, rather than a σ-algebra.[4] Quasiprobability distributions in general relax the third axiom.
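On a finite sample space the three axioms can be checked mechanically. The following is a minimal sketch, assuming the event space is the full power set, so that σ-additivity reduces to additivity over pairs of disjoint events. The helper names `powerset`, `measure_from_weights` and `satisfies_axioms`, as well as the example weights, are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import chain, combinations


def powerset(omega):
    """Return every subset of omega as a frozenset (the event space F)."""
    items = list(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def measure_from_weights(weights):
    """Build P on the power set by summing the weights of the outcomes in each event."""
    return {
        event: sum((weights[w] for w in event), Fraction(0))
        for event in powerset(weights)
    }


def satisfies_axioms(omega, P):
    """Check the three axioms for a set function P given as a dict on the power set."""
    events = powerset(omega)
    # First axiom: every event has a non-negative probability.
    non_negative = all(P[e] >= 0 for e in events)
    # Second axiom: the whole sample space has probability 1.
    unit_measure = P[frozenset(omega)] == 1
    # Third axiom (finite form): additivity over every pair of disjoint events.
    additive = all(
        P[a | b] == P[a] + P[b]
        for a in events
        for b in events
        if a.isdisjoint(b)
    )
    return non_negative and unit_measure and additive


# Illustrative three-outcome experiment (the weights are an assumption).
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
omega = set(weights)
good = measure_from_weights(weights)

# Overwrite one value so that P({a, b}) != P({a}) + P({b}); additivity now fails.
bad = dict(good)
bad[frozenset({"a", "b"})] = Fraction(1)

print(satisfies_axioms(omega, good))
print(satisfies_axioms(omega, bad))
```

Exact rational arithmetic (`Fraction`) is used so that the equality tests are not affected by floating-point rounding.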


From the Kolmogorov axioms, one can deduce other useful rules for studying probabilities. The proofs[5][6][7] of these rules illustrate the power of the third axiom and its interaction with the other two axioms. Four of the immediate corollaries and their proofs are shown below:


Monotonicity

[math]\displaystyle{ \quad\text{if}\quad A\subseteq B\quad\text{then}\quad P(A)\leq P(B). }[/math]

If A is a subset of B (or equal to B), then the probability of A is less than or equal to the probability of B.

Proof of monotonicity[5]

In order to verify the monotonicity property, we set [math]\displaystyle{ E_1=A }[/math] and [math]\displaystyle{ E_2=B\setminus A }[/math], where [math]\displaystyle{ A\subseteq B }[/math] and [math]\displaystyle{ E_i=\varnothing }[/math] for [math]\displaystyle{ i\geq 3 }[/math]. From the properties of the empty set ([math]\displaystyle{ \varnothing }[/math]), it is easy to see that the sets [math]\displaystyle{ E_i }[/math] are pairwise disjoint and [math]\displaystyle{ E_1\cup E_2\cup\cdots=B }[/math]. Hence, we obtain from the third axiom that

[math]\displaystyle{ P(A)+P(B\setminus A)+\sum_{i=3}^\infty P(E_i)=P(B). }[/math]

Since, by the first axiom, the left-hand side of this equation is a series of non-negative numbers, and since it converges to [math]\displaystyle{ P(B) }[/math] which is finite, we obtain both [math]\displaystyle{ P(A)\leq P(B) }[/math] and [math]\displaystyle{ P(\varnothing)=0 }[/math].
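The decomposition used in this proof can be tried out on a small finite example. The sketch below assumes a hypothetical biased four-sided die (the weights are illustrative, not from the source) and checks, for every nested pair of events, both the split [math]\displaystyle{ P(B)=P(A)+P(B\setminus A) }[/math] and the resulting inequality.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical biased four-sided die; the weights are an illustrative assumption.
weights = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}


def P(event):
    """Probability of an event as the sum of its outcome weights."""
    return sum((weights[w] for w in event), Fraction(0))


outcomes = list(weights)
events = [
    frozenset(c)
    for r in range(len(outcomes) + 1)
    for c in combinations(outcomes, r)
]

# Every pair (A, B) with A a subset of B.
nested_pairs = [(a, b) for a in events for b in events if a <= b]

# B splits into the disjoint pieces A and B \ A, so additivity applies.
decomposition_holds = all(P(b) == P(a) + P(b - a) for a, b in nested_pairs)
# Dropping the non-negative term P(B \ A) gives monotonicity.
monotone = all(P(a) <= P(b) for a, b in nested_pairs)

print(decomposition_holds, monotone)
```

A finite check of this kind is only an illustration of the argument, not a substitute for the proof, which covers arbitrary probability spaces.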

The probability of the empty set

[math]\displaystyle{ P(\varnothing)=0. }[/math]

In many cases, [math]\displaystyle{ \varnothing }[/math] is not the only event with probability 0.

Proof of probability of the empty set

Define [math]\displaystyle{ E_i := \varnothing }[/math] for [math]\displaystyle{ i \in \N }[/math]. These sets are disjoint, and [math]\displaystyle{ \bigcup_{i = 1}^\infty E_i = \varnothing = E_1 }[/math]; hence, by the third axiom, [math]\displaystyle{ \sum_{i = 1}^\infty P(E_i) = P(E_1) }[/math]. Subtracting [math]\displaystyle{ P(E_1) }[/math] (which is finite by the first axiom) yields [math]\displaystyle{ \sum_{i = 2}^\infty P(E_i) = 0 }[/math]. Together with the first axiom, this gives [math]\displaystyle{ 0 \leq P(E_2) \leq \sum_{i = 2}^\infty P(E_i) = 0 }[/math], thus [math]\displaystyle{ P(E_2) = P(\varnothing) = 0 }[/math].

The complement rule

[math]\displaystyle{ P\left(A^{c}\right) = P(\Omega\setminus A) = 1 - P(A) }[/math]

Proof of the complement rule

Given [math]\displaystyle{ A }[/math] and [math]\displaystyle{ A^{c} }[/math] are mutually exclusive and that [math]\displaystyle{ A \cup A^c = \Omega }[/math]:

[math]\displaystyle{ P(A \cup A^c)=P(A)+P(A^c) }[/math] ... (by axiom 3)

and, [math]\displaystyle{ P(A \cup A^c)=P(\Omega)=1 }[/math] ... (by axiom 2)

[math]\displaystyle{ \Rightarrow P(A)+P(A^c)=1 }[/math]

[math]\displaystyle{ \therefore P(A^c)=1-P(A) }[/math]

The numeric bound

It immediately follows from the monotonicity property that

[math]\displaystyle{ 0\leq P(E)\leq 1\qquad \forall E\in F. }[/math]

Proof of the numeric bound

Given the complement rule [math]\displaystyle{ P(E^c)=1-P(E) }[/math] and axiom 1 [math]\displaystyle{ P(E^c)\geq0 }[/math]:

[math]\displaystyle{ 1-P(E) \geq 0 }[/math]

[math]\displaystyle{ \Rightarrow 1 \geq P(E) }[/math]

[math]\displaystyle{ \therefore 0\leq P(E)\leq 1 }[/math]
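The complement rule and the numeric bound can both be observed on a finite example. The following sketch assumes a hypothetical loaded six-sided die whose weights sum to 1; the weights and the event `evens` are illustrative choices only.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical loaded six-sided die (weights are an assumption; they sum to 1).
weights = {
    1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 8),
    4: Fraction(1, 8), 5: Fraction(1, 8), 6: Fraction(1, 8),
}
omega = frozenset(weights)


def P(event):
    """Probability of an event as the sum of its outcome weights."""
    return sum((weights[w] for w in event), Fraction(0))


events = [
    frozenset(c)
    for r in range(len(omega) + 1)
    for c in combinations(sorted(omega), r)
]

# Complement rule: P(E^c) = 1 - P(E) for every event E.
complement_rule = all(P(omega - e) == 1 - P(e) for e in events)
# Numeric bound: 0 <= P(E) <= 1 for every event E.
bounded = all(0 <= P(e) <= 1 for e in events)

evens = frozenset({2, 4, 6})
print(P(evens), P(omega - evens))
print(complement_rule, bounded)
```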

Further consequences

Another important property is:

[math]\displaystyle{ P(A \cup B) = P(A) + P(B) - P(A \cap B). }[/math]

This is called the addition law of probability, or the sum rule. That is, the probability that an event in A or B will happen is the sum of the probability of an event in A and the probability of an event in B, minus the probability of an event that is in both A and B. The proof of this is as follows:


[math]\displaystyle{ P(A\cup B) = P(A) + P(B\setminus A) }[/math] ... (by Axiom 3)


[math]\displaystyle{ P(A \cup B) = P(A) + P(B\setminus (A \cap B)) }[/math] (by [math]\displaystyle{ B \setminus A = B\setminus (A \cap B) }[/math]).


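[math]\displaystyle{ P(B) = P(B\setminus (A \cap B)) + P(A \cap B) }[/math] ... (by Axiom 3, since [math]\displaystyle{ B\setminus (A \cap B) }[/math] and [math]\displaystyle{ A \cap B }[/math] are disjoint and their union is [math]\displaystyle{ B }[/math])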

and eliminating [math]\displaystyle{ P(B\setminus (A \cap B)) }[/math] from both equations gives us the desired result.

An extension of the addition law to any number of sets is the inclusion–exclusion principle.
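As a worked illustration, the sketch below checks the addition law for two events and the inclusion–exclusion formula for three events on a fair six-sided die. The die, the events `A`, `B`, `C` and the helper `inclusion_exclusion` are assumptions made for the example; the helper simply sums the probabilities of all intersections with alternating signs.

```python
from fractions import Fraction
from itertools import combinations


def P(event):
    """Probability of an event on a fair six-sided die (an assumed example space)."""
    return Fraction(len(event), 6)


def inclusion_exclusion(sets):
    """P of the union of `sets`, computed from intersections with alternating signs."""
    total = Fraction(0)
    for r in range(1, len(sets) + 1):
        sign = 1 if r % 2 == 1 else -1
        for combo in combinations(sets, r):
            total += sign * P(frozenset.intersection(*combo))
    return total


A = frozenset({2, 4, 6})  # the roll is even
B = frozenset({4, 5, 6})  # the roll is greater than 3
C = frozenset({1, 2})     # the roll is at most 2

# Addition law for two events.
lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)
print(lhs, rhs)

# Inclusion-exclusion for three events, compared with the direct computation.
print(inclusion_exclusion([A, B, C]), P(A | B | C))
```

With two sets, `inclusion_exclusion` reduces to the right-hand side of the addition law.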

Setting [math]\displaystyle{ B }[/math] to the complement [math]\displaystyle{ A^c }[/math] of [math]\displaystyle{ A }[/math] in the addition law gives

[math]\displaystyle{ P\left(A^{c}\right) = P(\Omega\setminus A) = 1 - P(A) }[/math]

That is, the probability that any event will not happen (or the event's complement) is 1 minus the probability that it will.

Simple example: coin toss

Consider a single coin-toss, and assume that the coin will either land heads (H) or tails (T) (but not both). No assumption is made as to whether the coin is fair.

We may define:

[math]\displaystyle{ \Omega = \{H,T\} }[/math]
[math]\displaystyle{ F = \{\varnothing, \{H\}, \{T\}, \{H,T\}\} }[/math]

Kolmogorov's axioms imply that:

[math]\displaystyle{ P(\varnothing) = 0 }[/math]

The probability of neither heads nor tails is 0.

[math]\displaystyle{ P(\{H,T\}^c) = 0 }[/math]

Equivalently, [math]\displaystyle{ P(\{H,T\}) = 1 }[/math]: the probability of either heads or tails is 1.

[math]\displaystyle{ P(\{H\}) + P(\{T\}) = 1 }[/math]

The sum of the probability of heads and the probability of tails is 1.
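The coin-toss space is small enough to write out in full. The sketch below assumes a hypothetical helper `coin_measure` that takes the probability of heads as a parameter, since no fairness assumption is made, and tabulates the measure on all four events of [math]\displaystyle{ F }[/math]. The sample biases are arbitrary illustrative values.

```python
from fractions import Fraction


def coin_measure(p_heads):
    """Probability measure on the power set of {H, T} with P({H}) = p_heads."""
    if not 0 <= p_heads <= 1:
        raise ValueError("p_heads must lie in [0, 1]")
    return {
        frozenset(): Fraction(0),
        frozenset({"H"}): p_heads,
        frozenset({"T"}): 1 - p_heads,
        frozenset({"H", "T"}): Fraction(1),
    }


# Arbitrary illustrative biases, including the degenerate and fair cases.
biases = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)]

for p in biases:
    P = coin_measure(p)
    heads, tails = P[frozenset({"H"})], P[frozenset({"T"})]
    # Whatever the bias, the probabilities of heads and tails sum to 1.
    print(p, heads + tails, P[frozenset()], P[frozenset({"H", "T"})])
```

The bias only changes how the unit mass is split between heads and tails; the values on the empty set and on the whole sample space are the same for every coin.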
