Disintegration theorem

From HandWiki

In mathematics, the disintegration theorem is a result in measure theory and probability theory. It gives a rigorous meaning to the idea of a non-trivial "restriction" of a measure to a measure-zero subset of the measure space in question. It is related to the existence of conditional probability measures. In a sense, "disintegration" is the opposite process to the construction of a product measure.

Motivation

Consider the unit square [math]\displaystyle{ S = [0,1]\times[0,1] }[/math] in the Euclidean plane [math]\displaystyle{ \mathbb{R}^2 }[/math], and let [math]\displaystyle{ \mu }[/math] be the probability measure on [math]\displaystyle{ S }[/math] given by the restriction of two-dimensional Lebesgue measure [math]\displaystyle{ \lambda^2 }[/math] to [math]\displaystyle{ S }[/math]. That is, the probability of a (measurable) event [math]\displaystyle{ E\subseteq S }[/math] is simply the area of [math]\displaystyle{ E }[/math].

Consider a one-dimensional subset of [math]\displaystyle{ S }[/math] such as the line segment [math]\displaystyle{ L_x = \{x\}\times[0, 1] }[/math]. This segment has [math]\displaystyle{ \mu }[/math]-measure zero, and since Lebesgue measure is complete, every subset of [math]\displaystyle{ L_x }[/math] is a [math]\displaystyle{ \mu }[/math]-null set: [math]\displaystyle{ E \subseteq L_{x} \implies \mu (E) = 0. }[/math]

While true, this is somewhat unsatisfying. It would be nice to say that [math]\displaystyle{ \mu }[/math] "restricted to" [math]\displaystyle{ L_x }[/math] is the one-dimensional Lebesgue measure [math]\displaystyle{ \lambda^1 }[/math], rather than the zero measure. The probability of a "two-dimensional" event [math]\displaystyle{ E }[/math] could then be obtained as an integral of the one-dimensional probabilities of the vertical "slices" [math]\displaystyle{ E\cap L_x }[/math]: more formally, if [math]\displaystyle{ \mu_x }[/math] denotes one-dimensional Lebesgue measure on [math]\displaystyle{ L_x }[/math], then [math]\displaystyle{ \mu (E) = \int_{[0, 1]} \mu_{x} (E \cap L_{x}) \, \mathrm{d} x }[/math] for any "nice" [math]\displaystyle{ E\subseteq S }[/math]. The disintegration theorem makes this argument rigorous for measures on sufficiently well-behaved (Radon) spaces.
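The slicing formula can be checked numerically on a concrete set. The sketch below assumes, purely for illustration, the set E = {(a, b) in S : b ≤ a²}, whose slice at x is {x} × [0, x²] with length x²; integrating the slice lengths over [0, 1] with a midpoint rule should recover the area 1/3. The function names are hypothetical.

```python
# Sketch of the slicing formula mu(E) = integral over [0, 1] of mu_x(E cap L_x) dx.
# Assumption for illustration: E = {(a, b) in [0,1]^2 : b <= a**2}, so the slice
# of E on the vertical line L_x is {x} x [0, x**2], of one-dimensional length x**2.

def slice_length(x: float) -> float:
    """mu_x(E cap L_x): one-dimensional Lebesgue measure of the slice at x."""
    return min(x * x, 1.0)

def area_by_slices(n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of slice lengths over [0, 1]."""
    h = 1.0 / n
    return h * sum(slice_length((i + 0.5) * h) for i in range(n))

if __name__ == "__main__":
    print(area_by_slices())  # approximately 1/3, the area of E
```

Any other measurable E works the same way as long as its slice lengths can be computed; only the function `slice_length` changes.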

Statement of the theorem

(Hereafter, [math]\displaystyle{ \mathcal{P}(X) }[/math] will denote the collection of Borel probability measures on a topological space [math]\displaystyle{ (X, T) }[/math].) The assumptions of the theorem are as follows:

  • Let [math]\displaystyle{ Y }[/math] and [math]\displaystyle{ X }[/math] be two Radon spaces, i.e. topological spaces on which every Borel probability measure is inner regular, so that every Borel probability measure on them is a Radon measure (Polish spaces, for example, are Radon spaces).
  • Let [math]\displaystyle{ \mu\in\mathcal{P}(Y) }[/math].
  • Let [math]\displaystyle{ \pi : Y\to X }[/math] be a Borel-measurable function. One should think of [math]\displaystyle{ \pi }[/math] as the map along which [math]\displaystyle{ Y }[/math] is "disintegrated", in the sense that it partitions [math]\displaystyle{ Y }[/math] into the fibers [math]\displaystyle{ \{ \pi^{-1}(x)\ |\ x \in X\} }[/math]. In the motivating example above, one can take [math]\displaystyle{ \pi((a,b)) = a }[/math] for [math]\displaystyle{ (a,b) \in [0,1]\times [0,1] }[/math], so that [math]\displaystyle{ \pi^{-1}(a) = \{a\} \times [0,1] = L_a }[/math], exactly the slice we want to capture.
  • Let [math]\displaystyle{ \nu \in\mathcal{P}(X) }[/math] be the pushforward measure [math]\displaystyle{ \nu = \pi_{*}(\mu) = \mu \circ \pi^{-1} }[/math]. This measure gives the distribution of the index [math]\displaystyle{ x }[/math], i.e. how the mass of [math]\displaystyle{ \mu }[/math] is spread across the fibers [math]\displaystyle{ \pi^{-1}(x) }[/math].

The conclusion of the theorem: There exists a [math]\displaystyle{ \nu }[/math]-almost everywhere uniquely determined family of probability measures [math]\displaystyle{ \{\mu_x\}_{x\in X} \subseteq \mathcal{P}(Y) }[/math], which provides a "disintegration" of [math]\displaystyle{ \mu }[/math] into [math]\displaystyle{ \{\mu_x\}_{x \in X} }[/math], such that:

  • the function [math]\displaystyle{ x \mapsto \mu_{x} }[/math] is Borel measurable, in the sense that [math]\displaystyle{ x \mapsto \mu_{x} (B) }[/math] is a Borel-measurable function for each Borel-measurable set [math]\displaystyle{ B\subseteq Y }[/math];
  • [math]\displaystyle{ \mu_x }[/math] "lives on" the fiber [math]\displaystyle{ \pi^{-1}(x) }[/math]: for [math]\displaystyle{ \nu }[/math]-almost all [math]\displaystyle{ x\in X }[/math], [math]\displaystyle{ \mu_{x} \left( Y \setminus \pi^{-1} (x) \right) = 0, }[/math] and so [math]\displaystyle{ \mu_x(E) =\mu_x(E\cap\pi^{-1}(x)) }[/math] for every Borel set [math]\displaystyle{ E\subseteq Y }[/math];
  • for every Borel-measurable function [math]\displaystyle{ f : Y \to [0,\infty] }[/math], [math]\displaystyle{ \int_{Y} f(y) \, \mathrm{d} \mu (y) = \int_{X} \int_{\pi^{-1} (x)} f(y) \, \mathrm{d} \mu_{x} (y) \, \mathrm{d} \nu (x). }[/math] In particular, for any Borel set [math]\displaystyle{ E\subseteq Y }[/math], taking [math]\displaystyle{ f }[/math] to be the indicator function of [math]\displaystyle{ E }[/math] gives[1] [math]\displaystyle{ \mu (E) = \int_{X} \mu_{x} \left( E \right) \, \mathrm{d} \nu (x). }[/math]
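In the simplest setting, where Y is a finite set (an assumption made here only for illustration; the theorem itself concerns Radon spaces), every object in the statement can be computed directly: ν is obtained by summing μ over each fiber, and μ_x is μ restricted to the fiber π⁻¹(x) and renormalized. The sketch below, with hypothetical function names, builds this family and checks the integral identity with exact rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction

# Finite sketch of the disintegration theorem (illustrative assumption: Y is a
# finite set). mu is a dict {point: probability}; pi is any map from Y to X.

def pushforward(mu, pi):
    """nu = pi_*(mu), i.e. nu({x}) = mu(pi^{-1}(x))."""
    nu = defaultdict(Fraction)
    for y, p in mu.items():
        nu[pi(y)] += p
    return dict(nu)

def disintegrate(mu, pi):
    """Return {x: mu_x}, with mu_x = mu restricted to pi^{-1}(x), renormalized.

    Fibers with nu({x}) = 0 are skipped: there mu_x could be chosen arbitrarily,
    the finite analogue of mu_x being unique only nu-almost everywhere.
    """
    nu = pushforward(mu, pi)
    family = {x: {} for x, q in nu.items() if q > 0}
    for y, p in mu.items():
        x = pi(y)
        if x in family:
            family[x][y] = p / nu[x]
    return family

def integrate(m, f):
    """Integral of f against the finite measure m."""
    return sum((f(y) * p for y, p in m.items()), Fraction(0))

if __name__ == "__main__":
    mu = {(0, 0): Fraction(1, 10), (0, 1): Fraction(2, 10),
          (1, 0): Fraction(3, 10), (1, 1): Fraction(4, 10)}
    pi = lambda y: y[0]  # project onto the first coordinate
    nu, family = pushforward(mu, pi), disintegrate(mu, pi)
    E = {(0, 1), (1, 1)}
    indicator = lambda y: 1 if y in E else 0
    lhs = integrate(mu, indicator)  # mu(E)
    rhs = sum(nu[x] * integrate(mx, indicator) for x, mx in family.items())
    print(lhs, rhs)  # 3/5 3/5
```

Each μ_x is supported on its own fiber by construction, so the "lives on the fiber" property holds trivially here; in the general theorem it is part of the conclusion.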

Applications

Product spaces

The original example was a special case of the problem of product spaces, to which the disintegration theorem applies.

When [math]\displaystyle{ Y }[/math] is written as a Cartesian product [math]\displaystyle{ Y = X_1\times X_2 }[/math] and [math]\displaystyle{ \pi_i : Y\to X_i }[/math] is the natural projection, each fiber [math]\displaystyle{ \pi_1^{-1}(x_1) }[/math] can be canonically identified with [math]\displaystyle{ X_2 }[/math], and there exists a Borel family of probability measures [math]\displaystyle{ \{ \mu_{x_{1}} \}_{x_{1} \in X_{1}} }[/math] in [math]\displaystyle{ \mathcal{P}(X_2) }[/math] (uniquely determined [math]\displaystyle{ (\pi_1)_*(\mu) }[/math]-almost everywhere) such that [math]\displaystyle{ \mu = \int_{X_{1}} \mu_{x_{1}} \, \mu \left(\pi_1^{-1}(\mathrm d x_1) \right)= \int_{X_{1}} \mu_{x_{1}} \, \mathrm{d} (\pi_{1})_{*} (\mu) (x_{1}). }[/math] Writing [math]\displaystyle{ \mu(\cdot|x_1) }[/math] for [math]\displaystyle{ \mu_{x_1} }[/math], this means in particular that for every Borel function [math]\displaystyle{ f : X_1\times X_2 \to [0,\infty] }[/math], [math]\displaystyle{ \int_{X_1\times X_2} f(x_1,x_2)\, \mu(\mathrm d x_1,\mathrm d x_2) = \int_{X_1}\left( \int_{X_2} f(x_1,x_2) \mu(\mathrm d x_2|x_1) \right) \mu\left( \pi_1^{-1}(\mathrm{d} x_{1})\right), }[/math] and for Borel sets [math]\displaystyle{ A\subseteq X_1 }[/math] and [math]\displaystyle{ B\subseteq X_2 }[/math], [math]\displaystyle{ \mu(A \times B) = \int_A \mu\left(B|x_1\right) \, \mu\left( \pi_1^{-1}(\mathrm{d} x_{1})\right). }[/math]
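For a finite product, assumed here only to make the formulas concrete, μ is a joint probability table, (π₁)₊(μ) is its first marginal, and μ(·|x₁) is the row of the table at x₁ divided by that marginal. The following sketch (hypothetical names, exact fractions) computes the rectangle formula for μ(A × B) through the conditional measures rather than by summing the table directly.

```python
from fractions import Fraction

# Finite sketch (illustrative assumption) of the product-space case:
# mu is a joint probability table on X1 x X2, given as {(x1, x2): probability}.

def marginal_x1(mu):
    """(pi_1)_*(mu): the first marginal, nu({x1}) = mu({x1} x X2)."""
    nu = {}
    for (x1, _), p in mu.items():
        nu[x1] = nu.get(x1, Fraction(0)) + p
    return nu

def conditional(mu):
    """Return {x1: mu(. | x1)}, each a probability measure on X2 (the fiber over x1)."""
    nu = marginal_x1(mu)
    cond = {x1: {} for x1, q in nu.items() if q > 0}
    for (x1, x2), p in mu.items():
        if x1 in cond:
            cond[x1][x2] = p / nu[x1]
    return cond

def rectangle_measure(mu, A, B):
    """mu(A x B), computed as the integral over A of mu(B | x1) against (pi_1)_*(mu)."""
    nu, cond = marginal_x1(mu), conditional(mu)
    return sum(
        (nu[x1] * sum((q for x2, q in cond[x1].items() if x2 in B), Fraction(0))
         for x1 in A if x1 in cond),
        Fraction(0),
    )

if __name__ == "__main__":
    mu = {("a", "u"): Fraction(1, 8), ("a", "v"): Fraction(3, 8),
          ("b", "u"): Fraction(1, 4), ("b", "v"): Fraction(1, 4)}
    print(conditional(mu)["a"])              # {'u': Fraction(1, 4), 'v': Fraction(3, 4)}
    print(rectangle_measure(mu, {"a"}, {"v"}))  # 3/8, matching the table entry
```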

The relation to conditional expectation is given by the identities [math]\displaystyle{ \operatorname E(f|\pi_1)(x_1)= \int_{X_2} f(x_1,x_2) \mu(\mathrm d x_2|x_1), }[/math] [math]\displaystyle{ \mu(A\times B|\pi_1)(x_1)= 1_A(x_1) \cdot \mu(B| x_1), }[/math] which hold for [math]\displaystyle{ (\pi_1)_*(\mu) }[/math]-almost every [math]\displaystyle{ x_1 \in X_1 }[/math].
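In the same finite setting (again an illustrative assumption), the first identity can be taken as the definition of the conditional expectation, and the second follows by choosing f as the indicator of A × B. The sketch below, whose names are hypothetical, computes E(f|π₁) by integrating f(x₁, ·) against μ(·|x₁) and checks that averaging it against the first marginal returns the full integral of f.

```python
from fractions import Fraction

# Finite sketch (illustrative assumption): mu is a joint table {(x1, x2): prob}.
# E(f | pi_1)(x1) is computed by integrating f(x1, .) against mu(. | x1).

def first_marginal(mu):
    """(pi_1)_*(mu) as a dict {x1: probability}."""
    marginal = {}
    for (x1, _), p in mu.items():
        marginal[x1] = marginal.get(x1, Fraction(0)) + p
    return marginal

def conditional_expectation(mu, f):
    """Return {x1: E(f | pi_1)(x1)} for every x1 of positive marginal mass.

    Points x1 with zero marginal mass are omitted, since E(f | pi_1) is only
    determined (pi_1)_*(mu)-almost everywhere.
    """
    marginal = first_marginal(mu)
    return {
        x1: sum((f(x1, x2) * p for (a, x2), p in mu.items() if a == x1), Fraction(0)) / m
        for x1, m in marginal.items()
        if m > 0
    }

if __name__ == "__main__":
    mu = {("a", "u"): Fraction(1, 8), ("a", "v"): Fraction(3, 8),
          ("b", "u"): Fraction(1, 4), ("b", "v"): Fraction(1, 4)}
    # f = indicator of A x B with A = {"a"}, B = {"v"}: expect 1_A(x1) * mu(B | x1).
    f = lambda x1, x2: 1 if (x1 == "a" and x2 == "v") else 0
    print(conditional_expectation(mu, f))  # {'a': Fraction(3, 4), 'b': Fraction(0, 1)}
```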

Vector calculus

The disintegration theorem can also be seen as justifying the use of a "restricted" measure in vector calculus. For instance, in Stokes' theorem as applied to a vector field flowing through a compact surface [math]\displaystyle{ \Sigma \subset \mathbb{R}^3 }[/math], it is implicit that the "correct" measure on [math]\displaystyle{ \Sigma }[/math] is the disintegration of three-dimensional Lebesgue measure [math]\displaystyle{ \lambda^3 }[/math] on [math]\displaystyle{ \Sigma }[/math], and that the disintegration of this measure on [math]\displaystyle{ \partial\Sigma }[/math] is the same as the disintegration of [math]\displaystyle{ \lambda^3 }[/math] on [math]\displaystyle{ \partial\Sigma }[/math].[2]
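A simple case where a surface measure arises from disintegrating λ³ is the unit ball disintegrated along the assumed map π(p) = |p|: the fibers are spheres of radius r, the conditional measures are uniform surface measures on them, and the pushforward of λ³ has density 4πr² (the area of the fiber) with respect to dr. The sketch below, with hypothetical names, only illustrates this radial example, not Stokes' theorem itself, by recovering the ball's volume from the fiber areas.

```python
import math

# Illustrative assumption: disintegrate Lebesgue measure lambda^3 on a ball along
# pi(p) = |p|. Each fiber pi^{-1}(r) is the sphere of radius r, and the pushforward
# of lambda^3 has density 4*pi*r**2 (the sphere's surface area) with respect to dr.

def sphere_area(r: float) -> float:
    """Surface area of the sphere of radius r in R^3."""
    return 4.0 * math.pi * r * r

def ball_volume_by_shells(radius: float = 1.0, n: int = 100_000) -> float:
    """Recover lambda^3 of the ball by integrating fiber (surface) areas over r."""
    h = radius / n
    return h * sum(sphere_area((i + 0.5) * h) for i in range(n))

if __name__ == "__main__":
    print(ball_volume_by_shells(), 4.0 * math.pi / 3.0)  # the two values agree closely
```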

Conditional distributions

The disintegration theorem can be applied to give a rigorous treatment of conditional probability distributions in statistics, while avoiding purely abstract formulations of conditional probability.[3]
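As a concrete instance, assumed here for illustration, let (X, Y) be standard bivariate normal with correlation ρ (|ρ| < 1) and disintegrate along π(x, y) = x. Then ν is N(0, 1) and each conditional law μ_x is N(ρx, 1 − ρ²) on the vertical line through x, so μ({X ≤ 0, Y ≤ 0}) can be computed as the integral of μ_x({y ≤ 0}) against ν over x ≤ 0. The sketch compares this with the known orthant probability 1/4 + arcsin(ρ)/(2π); the function names and the truncation of the integral at x = −10 are hypothetical choices.

```python
import math

# Illustrative assumption: (X, Y) is standard bivariate normal with correlation rho.
# Disintegrating along pi(x, y) = x gives nu = N(0, 1) and, for each x, the
# conditional law mu_x = N(rho * x, 1 - rho**2) on the line {x} x R.

def std_normal_pdf(t: float) -> float:
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(t: float) -> float:
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def prob_both_nonpositive(rho: float, lower: float = -10.0, n: int = 20_000) -> float:
    """P(X <= 0, Y <= 0) as the integral of mu_x({y <= 0}) d nu(x) over x <= 0.

    Requires |rho| < 1; the integral over (-inf, 0] is truncated at `lower`
    and approximated by the midpoint rule with n subintervals.
    """
    sigma = math.sqrt(1.0 - rho * rho)
    h = -lower / n
    total = 0.0
    for i in range(n):
        x = lower + (i + 0.5) * h
        # mu_x({y <= 0}) = Phi((0 - rho * x) / sigma), weighted by the density of nu.
        total += std_normal_cdf(-rho * x / sigma) * std_normal_pdf(x)
    return total * h

if __name__ == "__main__":
    rho = 0.6
    closed_form = 0.25 + math.asin(rho) / (2.0 * math.pi)  # orthant probability
    print(prob_both_nonpositive(rho), closed_form)
```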


References

  1. Dellacherie, C.; Meyer, P.-A. (1978). Probabilities and Potential. North-Holland Mathematics Studies. Amsterdam: North-Holland. ISBN 0-7204-0701-X. 
  2. Ambrosio, L.; Gigli, N.; Savaré, G. (2005). Gradient Flows in Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics ETH Zürich. Basel: Birkhäuser Verlag. ISBN 978-3-7643-2428-5.
  3. Chang, J.T.; Pollard, D. (1997). "Conditioning as disintegration". Statistica Neerlandica 51 (3): 287. doi:10.1111/1467-9574.00056. http://www.stat.yale.edu/~jtc5/papers/ConditioningAsDisintegration.pdf.