Polar factorization theorem

From HandWiki
Short description: Theorem in Optimal Transport

In optimal transport, a branch of mathematics, polar factorization of vector fields is a basic result due to Brenier (1987),[1] with antecedents of Knott-Smith (1984)[2] and Rachev (1985),[3] that generalizes many existing results among which are the polar decomposition of real matrices, and the rearrangement of real-valued functions.

The theorem

Notation. Denote [math]\displaystyle{ \xi_\# \mu }[/math] the image measure of [math]\displaystyle{ \mu }[/math] through the map [math]\displaystyle{ \xi }[/math].

Definition: Measure preserving map. Let [math]\displaystyle{ (X,\mu) }[/math] and [math]\displaystyle{ (Y,\nu) }[/math] be some probability spaces and [math]\displaystyle{ \sigma :X \rightarrow Y }[/math] a map. Then, [math]\displaystyle{ \sigma }[/math] is said to be measure preserving if for every [math]\displaystyle{ \nu }[/math]-measurable subset [math]\displaystyle{ \Omega }[/math] of [math]\displaystyle{ Y }[/math], [math]\displaystyle{ \sigma^{-1}(\Omega) }[/math] is [math]\displaystyle{ \mu }[/math]-measurable and [math]\displaystyle{ \mu(\sigma^{-1}(\Omega))=\nu(\Omega ) }[/math], that is: [math]\displaystyle{ \textstyle \int_{X}f\circ \sigma d\mu =\int_{Y}fd\nu }[/math] with [math]\displaystyle{ f }[/math] that is [math]\displaystyle{ \nu }[/math]-integrable and [math]\displaystyle{ \sigma \circ f }[/math] that is [math]\displaystyle{ \mu }[/math]-integrable.

Theorem. Consider a map [math]\displaystyle{ \xi :\Omega \rightarrow R^{d} }[/math] where [math]\displaystyle{ \Omega }[/math] is a convex subset of [math]\displaystyle{ R^{d} }[/math], and [math]\displaystyle{ \mu }[/math] a measure on [math]\displaystyle{ \Omega }[/math] which is absolutely continuous. Assume that [math]\displaystyle{ \xi_{\#}\mu }[/math] is absolutely continuous. Then there is a convex function [math]\displaystyle{ \varphi :\Omega \rightarrow R }[/math] and a map [math]\displaystyle{ \sigma :\Omega \rightarrow \Omega }[/math] preserving [math]\displaystyle{ \mu }[/math] such that

[math]\displaystyle{ \xi =\left( \nabla \varphi \right) \circ \sigma }[/math]

In addition, [math]\displaystyle{ \nabla \varphi }[/math] and [math]\displaystyle{ \sigma }[/math] are uniquely defined almost everywhere.[1][4]

Applications and connections

Dimension 1

In dimension 1, and when [math]\displaystyle{ \mu }[/math] is the Lebesgue measure over the unit interval, the result specializes to Ryff's theorem.[5] When [math]\displaystyle{ d=1 }[/math] and [math]\displaystyle{ \mu }[/math] is the uniform distribution over [math]\displaystyle{ \left[0,1\right] }[/math], the polar decomposition boils down to

[math]\displaystyle{ \xi \left( t\right) =F_{X}^{-1}\left( \sigma \left( t\right) \right) }[/math]

where [math]\displaystyle{ F_{X} }[/math] is cumulative distribution function of the random variable [math]\displaystyle{ \xi \left( U\right) }[/math] and [math]\displaystyle{ U }[/math] has a uniform distribution over [math]\displaystyle{ \left[ 0,1\right] }[/math]. [math]\displaystyle{ F_{X} }[/math] is assumed to be continuous, and [math]\displaystyle{ \sigma \left( t\right)=F_{X}\left( \xi \left( t\right) \right) }[/math] preserves the Lebesgue measure on [math]\displaystyle{ \left[ 0,1\right] }[/math].

Polar decomposition of matrices

When [math]\displaystyle{ \xi }[/math] is a linear map and [math]\displaystyle{ \mu }[/math] is the Gaussian normal distribution, the result coincides with the polar decomposition of matrices. Assuming [math]\displaystyle{ \xi \left( x\right) =Mx }[/math] where [math]\displaystyle{ M }[/math] is an invertible [math]\displaystyle{ d\times d }[/math] matrix and considering [math]\displaystyle{ \mu }[/math] the [math]\displaystyle{ \mathcal{N}\left( 0,I_{d}\right) }[/math] probability measure, the polar decomposition boils down to

[math]\displaystyle{ M=SO }[/math]

where [math]\displaystyle{ S }[/math] is a symmetric positive definite matrix, and [math]\displaystyle{ O }[/math] an orthogonal matrix. The connection with the polar factorization is [math]\displaystyle{ \varphi \left(x\right) =x^{\top }Sx/2 }[/math] which is convex, and [math]\displaystyle{ \sigma \left( x\right) =Ox }[/math] which preserves the [math]\displaystyle{ \mathcal{N}\left( 0,I_{d}\right) }[/math] measure.

Helmholtz decomposition

The results also allow to recover Helmholtz decomposition. Letting [math]\displaystyle{ x\rightarrow V\left( x\right) }[/math] be a smooth vector field it can then be written in a unique way as

[math]\displaystyle{ V=w+\nabla p }[/math]

where [math]\displaystyle{ p }[/math] is a smooth real function defined on [math]\displaystyle{ \Omega }[/math], unique up to an additive constant, and [math]\displaystyle{ w }[/math] is a smooth divergence free vector field, parallel to the boundary of [math]\displaystyle{ \Omega }[/math].

The connection can be seen by assuming [math]\displaystyle{ \mu }[/math] is the Lebesgue measure on a compact set [math]\displaystyle{ \Omega \subset R^{n} }[/math] and by writing [math]\displaystyle{ \xi }[/math] as a perturbation of the identity map

[math]\displaystyle{ \xi _{\epsilon }(x)=x+\epsilon V(x) }[/math]

where [math]\displaystyle{ \epsilon }[/math] is small. The polar decomposition of [math]\displaystyle{ \xi _{\epsilon } }[/math] is given by [math]\displaystyle{ \xi _{\epsilon }=(\nabla \varphi_{\epsilon })\circ \sigma_{\epsilon } }[/math]. Then, for any test function [math]\displaystyle{ f:R^{n}\rightarrow R }[/math] the following holds:

[math]\displaystyle{ \int_{\Omega }f(x+\epsilon V(x))dx=\int_{\Omega }f((\nabla \varphi _{\epsilon })\circ \sigma _{\epsilon }\left( x\right) )dx=\int_{\Omega }f(\nabla \varphi _{\epsilon }\left( x\right) )dx }[/math]

where the fact that [math]\displaystyle{ \sigma _{\epsilon } }[/math] was preserving the Lebesgue measure was used in the second equality.

In fact, as [math]\displaystyle{ \textstyle \varphi _{0}(x)=\frac{1}{2}\Vert x\Vert ^{2} }[/math], one can expand [math]\displaystyle{ \textstyle \varphi _{\epsilon }(x)=\frac{1}{2}\Vert x\Vert ^{2}+\epsilon p(x)+O(\epsilon ^{2}) }[/math], and therefore [math]\displaystyle{ \textstyle \nabla \varphi_{\epsilon }\left( x\right) =x+\epsilon \nabla p(x)+O(\epsilon ^{2}) }[/math]. As a result, [math]\displaystyle{ \textstyle \int_{\Omega }\left( V(x)-\nabla p(x)\right) \nabla f(x))dx }[/math] for any smooth function [math]\displaystyle{ f }[/math], which implies that [math]\displaystyle{ w\left( x\right) =V(x)-\nabla p(x) }[/math] is divergence-free.[1][6]

See also

  • Polar decomposition – Representation of invertible matrices as unitary operator multiplying a Hermitian operator

References

  1. 1.0 1.1 1.2 Brenier, Yann (1991). "Polar factorization and monotone rearrangement of vector‐valued functions". Communications on Pure and Applied Mathematics 44 (4): 375–417. doi:10.1002/cpa.3160440402. http://www.math.toronto.edu/~mccann/assignments/477/Brenier91.pdf. Retrieved 16 April 2021. 
  2. Knott, M.; Smith, C. S. (1984). "On the optimal mapping of distributions". Journal of Optimization Theory and Applications 43: 39–49. doi:10.1007/BF00934745. https://link.springer.com/article/10.1007/BF00934745. Retrieved 16 April 2021. 
  3. Rachev, Svetlozar T. (1985). "The Monge–Kantorovich mass transference problem and its stochastic applications". Theory of Probability & Its Applications 29 (4): 647–676. doi:10.1137/1129093. https://www.math.ucdavis.edu/~saito/data/emd/rachev.pdf. Retrieved 16 April 2021. 
  4. Santambrogio, Filippo (2015). Optimal transport for applied mathematicians. New York: Birkäuser. 
  5. Ryff, John V. (1965). "Orbits of L1-Functions Under Doubly Stochastic Transformation". Transactions of the American Mathematical Society 117: 92–100. doi:10.2307/1994198. https://www.jstor.org/stable/1994198. Retrieved 16 April 2021. 
  6. Villani, Cédric (2003). Topics in optimal transportation. American Mathematical Society.