# Disintegration theorem

Short description: Theorem in measure theory

In mathematics, the disintegration theorem is a result in measure theory and probability theory. It rigorously defines the idea of a non-trivial "restriction" of a measure to a measure zero subset of the measure space in question. It is related to the existence of conditional probability measures. In a sense, "disintegration" is the opposite process to the construction of a product measure.

## Motivation

Consider the unit square in the Euclidean plane R², S = [0, 1] × [0, 1]. Consider the probability measure μ defined on S by the restriction of two-dimensional Lebesgue measure λ² to S. That is, the probability of an event E ⊆ S is simply the area of E. We assume E is a measurable subset of S.

Consider a one-dimensional subset of S such as the line segment $\displaystyle{ L_{x} = \{x\} \times [0, 1] }$. This segment has μ-measure zero, and since the Lebesgue measure space is a complete measure space, every subset of it is measurable and μ-null: $\displaystyle{ E \subseteq L_{x} \implies \mu (E) = 0. }$

While true, this is somewhat unsatisfying. It would be nice to say that μ "restricted to" $\displaystyle{ L_{x} }$ is the one-dimensional Lebesgue measure λ¹, rather than the zero measure. The probability of a "two-dimensional" event E could then be obtained as an integral of the one-dimensional probabilities of the vertical "slices" $\displaystyle{ E \cap L_{x} }$: more formally, if $\displaystyle{ \mu_{x} }$ denotes one-dimensional Lebesgue measure on $\displaystyle{ L_{x} }$, then $\displaystyle{ \mu (E) = \int_{[0, 1]} \mu_{x} (E \cap L_{x}) \, \mathrm{d} x }$ for any "nice" E ⊆ S. The disintegration theorem makes this argument rigorous in the context of measures on metric spaces.
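The slicing identity can be checked numerically on a concrete set. The sketch below is an illustration only: the set E = {(x, y) ∈ S : y ≤ x²} and all function names are assumptions chosen for this example. It compares the integral of the slice lengths with a direct grid estimate of the area of E; both should be close to 1/3.

```python
def slice_length(x: float) -> float:
    """One-dimensional Lebesgue measure of the slice of E over x.

    For the assumed set E = {(x, y) in S : y <= x**2}, the slice is
    {x} x [0, x**2], so its length is x**2.
    """
    return x * x


def area_by_slices(n: int = 1000) -> float:
    """Midpoint-rule approximation of the integral of slice_length over [0, 1]."""
    h = 1.0 / n
    return sum(slice_length((i + 0.5) * h) for i in range(n)) * h


def area_by_grid(n: int = 400) -> float:
    """Direct estimate of the area of E: fraction of n x n cell midpoints in E."""
    h = 1.0 / n
    hits = 0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if y <= x * x:
                hits += 1
    return hits / (n * n)


if __name__ == "__main__":
    print(area_by_slices())  # integral of the one-dimensional slice measures
    print(area_by_grid())    # two-dimensional area estimated directly
```

The agreement of the two numbers is only a sanity check for one well-behaved set; it does not replace the theorem, which covers all Borel sets at once.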

## Statement of the theorem

(Hereafter, P(X) will denote the collection of Borel probability measures on a topological space (X, T).) The assumptions of the theorem are as follows:

• Let Y and X be two Radon spaces (i.e. topological spaces on which every Borel probability measure is inner regular, e.g. separable metric spaces on which every probability measure is a Radon measure).
• Let μ ∈ P(Y).
• Let π : Y → X be a Borel-measurable function. Here one should think of π as a function to "disintegrate" Y, in the sense of partitioning Y into $\displaystyle{ \{ \pi^{-1}(x)\ |\ x \in X\} }$. For example, for the motivating example above, one can define $\displaystyle{ \pi((a,b)) = a }$ for $\displaystyle{ (a,b) \in [0,1]\times [0,1] }$, which gives $\displaystyle{ \pi^{-1}(a) = \{a\} \times [0,1] }$, a slice we want to capture.
• Let $\displaystyle{ \nu \in P(X) }$ be the pushforward measure $\displaystyle{ \nu = \pi_{*}(\mu) = \mu \circ \pi^{-1} }$. This measure provides the distribution of x (which corresponds to the events $\displaystyle{ \pi^{-1}(x) }$).

The conclusion of the theorem: There exists a $\displaystyle{ \nu }$-almost everywhere uniquely determined family of probability measures $\displaystyle{ \{\mu_x\}_{x \in X} \subseteq P(Y) }$, which provides a "disintegration" of $\displaystyle{ \mu }$, such that:

• the function $\displaystyle{ x \mapsto \mu_{x} }$ is Borel measurable, in the sense that $\displaystyle{ x \mapsto \mu_{x} (B) }$ is a Borel-measurable function for each Borel-measurable set B ⊆ Y;
• $\displaystyle{ \mu_{x} }$ "lives on" the fiber $\displaystyle{ \pi^{-1}(x) }$: for $\displaystyle{ \nu }$-almost all x ∈ X, $\displaystyle{ \mu_{x} \left( Y \setminus \pi^{-1} (x) \right) = 0, }$ and so $\displaystyle{ \mu_{x}(E) = \mu_{x}\left(E \cap \pi^{-1}(x)\right) }$;
• for every Borel-measurable function f : Y → [0, ∞], $\displaystyle{ \int_{Y} f(y) \, \mathrm{d} \mu (y) = \int_{X} \int_{\pi^{-1} (x)} f(y) \, \mathrm{d} \mu_{x} (y) \mathrm{d} \nu (x). }$ In particular, for any event E ⊆ Y, taking f to be the indicator function of E, $\displaystyle{ \mu (E) = \int_{X} \mu_{x} \left( E \right) \, \mathrm{d} \nu (x). }$
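When Y is finite, every set is measurable and the disintegration reduces to elementary conditional probability, which makes the three conclusions easy to verify by hand or by machine. The sketch below is a finite analogue under that assumption; the six-point space, its weights, and the helper names `pushforward`, `disintegrate`, and `measure` are hypothetical choices made for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(mu, pi):
    """Return nu = mu ∘ pi^{-1} as a dict mapping x to nu({x})."""
    nu = defaultdict(Fraction)  # Fraction() is exactly 0
    for y, mass in mu.items():
        nu[pi(y)] += mass
    return dict(nu)


def disintegrate(mu, pi):
    """Return (nu, family), where family[x] is the measure mu_x on the fiber over x.

    mu_x is only defined where nu({x}) > 0, mirroring the fact that the
    family is determined nu-almost everywhere.
    """
    nu = pushforward(mu, pi)
    family = {x: {} for x, mass in nu.items() if mass > 0}
    for y, mass in mu.items():
        x = pi(y)
        if nu[x] > 0:
            family[x][y] = mass / nu[x]
    return nu, family


def measure(m, event):
    """Measure of a finite event under a measure given as a dict of point masses."""
    return sum(m.get(y, Fraction(0)) for y in event)


# Assumed example: Y = {0, 1} x {0, 1, 2}, with pi the projection to the first coordinate.
mu = {
    (0, 0): Fraction(1, 12), (0, 1): Fraction(2, 12), (0, 2): Fraction(3, 12),
    (1, 0): Fraction(1, 12), (1, 1): Fraction(1, 12), (1, 2): Fraction(4, 12),
}


def pi(y):
    return y[0]


nu, family = disintegrate(mu, pi)

# Each mu_x is a probability measure supported on the fiber pi^{-1}(x).
assert all(sum(m.values()) == 1 for m in family.values())
assert all(pi(y) == x for x, m in family.items() for y in m)

# mu(E) equals the nu-integral (here a finite sum) of mu_x(E).
E = {(0, 1), (1, 2)}
assert measure(mu, E) == sum(measure(family[x], E) * nu[x] for x in family)
```

Exact rational arithmetic is used so that the identities hold with equality rather than up to rounding error. The measurability condition is automatic here, since every function on a finite set is measurable.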

## Applications

### Product spaces

The original example was a special case of the problem of product spaces, to which the disintegration theorem applies.

When Y is written as a Cartesian product $\displaystyle{ Y = X_{1} \times X_{2} }$ and $\displaystyle{ \pi_{i} : Y \to X_{i} }$ is the natural projection, then each fibre $\displaystyle{ \pi_{1}^{-1}(x_{1}) }$ can be canonically identified with $\displaystyle{ X_{2} }$ and there exists a Borel family of probability measures $\displaystyle{ \{ \mu_{x_{1}} \}_{x_{1} \in X_{1}} }$ in $\displaystyle{ P(X_{2}) }$ (which is $\displaystyle{ (\pi_{1})_{*}(\mu) }$-almost everywhere uniquely determined) such that $\displaystyle{ \mu = \int_{X_{1}} \mu_{x_{1}} \, \mu \left(\pi_1^{-1}(\mathrm d x_1) \right)= \int_{X_{1}} \mu_{x_{1}} \, \mathrm{d} (\pi_{1})_{*} (\mu) (x_{1}). }$ Writing $\displaystyle{ \mu(\mathrm d x_2|x_1) }$ for $\displaystyle{ \mu_{x_{1}}(\mathrm d x_2) }$, this means in particular that $\displaystyle{ \int_{X_1\times X_2} f(x_1,x_2)\, \mu(\mathrm d x_1,\mathrm d x_2) = \int_{X_1}\left( \int_{X_2} f(x_1,x_2) \mu(\mathrm d x_2|x_1) \right) \mu\left( \pi_1^{-1}(\mathrm{d} x_{1})\right) }$ and $\displaystyle{ \mu(A \times B) = \int_A \mu\left(B|x_1\right) \, \mu\left( \pi_1^{-1}(\mathrm{d} x_{1})\right). }$

The relation to conditional expectation is given by the identities $\displaystyle{ \operatorname E(f|\pi_1)(x_1)= \int_{X_2} f(x_1,x_2) \mu(\mathrm d x_2|x_1), }$ $\displaystyle{ \mu(A\times B|\pi_1)(x_1)= 1_A(x_1) \cdot \mu(B| x_1). }$
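As a worked illustration, assume (this is an added assumption, not part of the statement above) that X1 = X2 = R and that μ has a joint density p with respect to two-dimensional Lebesgue measure. The pushforward under π1 then has the marginal density p1, and the disintegration is given by the familiar conditional density. The symbols p and p1 are introduced here only for the example.

```latex
\[
  p_1(x_1) = \int_{\mathbb{R}} p(x_1, x_2)\,\mathrm{d}x_2,
  \qquad
  \mu(\mathrm{d}x_2 \mid x_1) = \frac{p(x_1, x_2)}{p_1(x_1)}\,\mathrm{d}x_2
  \quad \text{whenever } 0 < p_1(x_1) < \infty .
\]
% Substituting into the product formula recovers the measure of a rectangle:
\[
  \mu(A \times B)
  = \int_A \left( \int_B \frac{p(x_1, x_2)}{p_1(x_1)}\,\mathrm{d}x_2 \right) p_1(x_1)\,\mathrm{d}x_1
  = \int_A \int_B p(x_1, x_2)\,\mathrm{d}x_2\,\mathrm{d}x_1 .
\]
```

The set of points where p1 vanishes or is infinite has measure zero under the pushforward, so the conditional measure may be chosen arbitrarily there, in line with the almost-everywhere uniqueness in the theorem.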

### Vector calculus

The disintegration theorem can also be seen as justifying the use of a "restricted" measure in vector calculus. For instance, in Stokes' theorem as applied to a vector field flowing through a compact surface Σ ⊂ R³, it is implicit that the "correct" measure on Σ is the disintegration of three-dimensional Lebesgue measure λ³ on Σ, and that the disintegration of this measure on ∂Σ is the same as the disintegration of λ³ on ∂Σ.

### Conditional distributions

The disintegration theorem can be applied to give a rigorous treatment of conditional probability distributions in statistics, while avoiding purely abstract formulations of conditional probability.