Mercer's theorem

In mathematics, specifically functional analysis, Mercer's theorem is a representation of a symmetric positive-definite function on a square as a sum of a convergent sequence of product functions. This theorem, presented in (Mercer 1909), is one of the most notable results of the work of James Mercer (1883–1932). It is an important theoretical tool in the theory of integral equations; it is used in the Hilbert space theory of stochastic processes, for example the Karhunen–Loève theorem; and it is also used in the reproducing kernel Hilbert space theory where it characterizes a symmetric positive-definite kernel as a reproducing kernel.^[1]

Introduction

To explain Mercer's theorem, we first consider an important special case; see below for a more general formulation. A kernel, in this context, is a symmetric continuous function

[math]\displaystyle{ K: [a,b] \times [a,b] \rightarrow \mathbb{R} }[/math]

where symmetric means that [math]\displaystyle{ K(x,y) = K(y,x) }[/math] for all [math]\displaystyle{ x,y \in [a,b] }[/math].

K is said to be a positive-definite kernel if and only if

[math]\displaystyle{ \sum_{i=1}^n\sum_{j=1}^n K(x_i, x_j) c_i c_j \geq 0 }[/math]

for all finite sequences of points x₁, ..., x_n of [a, b] and all choices of real numbers c₁, ..., c_n. Note that the term "positive-definite" is well-established in literature despite the weak inequality in the definition.^[2]^[3]
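As an illustration of the definition, the following sketch evaluates the double sum for randomly chosen points and coefficients. The kernel K(x, y) = min(x, y) on [0, 1] is an assumed example that does not come from the text above, and the helper name `quadratic_form` is hypothetical. A finite number of random trials can only fail to find a counterexample; it does not prove positive-definiteness.

```python
import numpy as np

def quadratic_form(kernel, points, coeffs):
    """Evaluate sum_i sum_j K(x_i, x_j) c_i c_j for a vectorized kernel."""
    gram = kernel(points[:, None], points[None, :])  # gram[i, j] = K(x_i, x_j)
    return float(coeffs @ gram @ coeffs)

rng = np.random.default_rng(0)
values = []
for _ in range(200):
    points = rng.uniform(0.0, 1.0, size=8)  # x_1, ..., x_n in [0, 1]
    coeffs = rng.normal(size=8)             # c_1, ..., c_n, arbitrary reals
    values.append(quadratic_form(np.minimum, points, coeffs))

# Smallest value of the double sum seen over all trials.
min_value = min(values)
```

For a positive-definite kernel, `min_value` should be non-negative up to floating-point rounding.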

Associated to K is a linear operator (more specifically a Hilbert–Schmidt integral operator) on functions defined by the integral

[math]\displaystyle{ [T_K \varphi](x) =\int_a^b K(x,s) \varphi(s)\, ds. }[/math]

For technical considerations we assume [math]\displaystyle{ \varphi }[/math] can range through the space L²[a, b] (see Lp space) of square-integrable real-valued functions. Since T_K is a linear operator, we can talk about eigenvalues and eigenfunctions of T_K.
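The eigenvalues of T_K can be approximated numerically by replacing the integral with a quadrature rule. The sketch below uses the midpoint rule, which is one possible choice and is not prescribed by the text. The kernel K(x, y) = min(x, y) on [0, 1] is an assumed example, chosen because its eigenvalues are known in closed form as 1/((j − 1/2)²π²); the function name `nystrom_eigenvalues` is hypothetical.

```python
import numpy as np

def nystrom_eigenvalues(kernel, a, b, n):
    """Approximate the eigenvalues of T_K on [a, b] with an n-point midpoint rule."""
    h = (b - a) / n
    x = a + (np.arange(n) + 0.5) * h            # midpoints of n equal subintervals
    matrix = h * kernel(x[:, None], x[None, :])  # discretized integral operator
    return np.linalg.eigvalsh(matrix)[::-1]      # eigenvalues, largest first

approx = nystrom_eigenvalues(np.minimum, 0.0, 1.0, 200)

# Closed-form eigenvalues of the min kernel on [0, 1] for j = 1, 2, 3.
exact = 1.0 / (((np.arange(1, 4) - 0.5) * np.pi) ** 2)
```

The leading entries of `approx` should be close to `exact`, whose first entry is 4/π².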

Theorem. Suppose K is a continuous symmetric positive-definite kernel. Then there is an orthonormal basis {e_i}_i of L²[a, b] consisting of eigenfunctions of T_K such that the corresponding sequence of eigenvalues {λ_i}_i is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on [a, b] and K has the representation

[math]\displaystyle{ K(s,t) = \sum_{j=1}^\infty \lambda_j \, e_j(s) \, e_j(t) }[/math]

where the convergence is absolute and uniform.
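The representation can be checked numerically on an example where everything is known in closed form. The sketch below assumes the kernel K(s, t) = min(s, t) on [0, 1], which does not appear in the text above; its eigenvalues are λ_j = 1/((j − 1/2)²π²) with eigenfunctions e_j(t) = √2 sin((j − 1/2)πt). The function name `mercer_partial_sum` is hypothetical.

```python
import numpy as np

def mercer_partial_sum(s, t, n_terms):
    """Sum of lambda_j e_j(s) e_j(t) for j = 1..n_terms, for the min kernel on [0, 1]."""
    j = np.arange(1, n_terms + 1)
    freq = (j - 0.5) * np.pi
    lam = 1.0 / freq**2                              # eigenvalues lambda_j
    e_s = np.sqrt(2.0) * np.sin(np.outer(s, freq))   # e_s[p, j] = e_j(s_p)
    e_t = np.sqrt(2.0) * np.sin(np.outer(t, freq))   # e_t[q, j] = e_j(t_q)
    return (e_s * lam) @ e_t.T                       # entry [p, q] of the partial sum

grid = np.linspace(0.0, 1.0, 41)
exact = np.minimum.outer(grid, grid)

# Largest absolute error on the grid for increasing truncation levels.
errors = {
    n: float(np.max(np.abs(mercer_partial_sum(grid, grid, n) - exact)))
    for n in (10, 100, 1000)
}
```

The maximum error over the grid shrinks as more terms are included, which is consistent with uniform convergence, although a finite grid cannot establish it.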

Details

We now explain in greater detail the structure of the proof of Mercer's theorem, particularly how it relates to spectral theory of compact operators.

The map K ↦ T_K is injective.
T_K is a non-negative symmetric compact operator on L²[a,b]; moreover K(x, x) ≥ 0.

To show compactness, show that the image of the unit ball of L²[a,b] under T_K is equicontinuous and apply Ascoli's theorem to conclude that this image is relatively compact in C([a,b]) with the uniform norm and a fortiori in L²[a,b].

Now apply the spectral theorem for compact operators on Hilbert spaces to T_K to show the existence of an orthonormal basis {e_i}_i of L²[a,b] consisting of eigenfunctions of T_K:

[math]\displaystyle{ \lambda_i e_i(t)= [T_K e_i](t) = \int_a^b K(t,s) e_i(s)\, ds. }[/math]

If λ_i ≠ 0, the eigenvector (eigenfunction) e_i is seen to be continuous on [a,b]. Now

[math]\displaystyle{ \sum_{i=1}^\infty \lambda_i |e_i(t) e_i(s)| \leq \sup_{x \in [a,b]} |K(x,x)|, }[/math]

which shows that the series

[math]\displaystyle{ \sum_{i=1}^\infty \lambda_i e_i(t) e_i(s) }[/math]

converges absolutely and uniformly to a kernel K₀ which is easily seen to define the same operator as the kernel K. Hence K=K₀ from which Mercer's theorem follows.
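The bound used above can be justified as follows; this is a sketch of the standard argument under the assumptions of the theorem. For each n, the remainder

[math]\displaystyle{ K_n(s,t) = K(s,t) - \sum_{i=1}^n \lambda_i e_i(s) e_i(t) }[/math]

is a continuous kernel whose integral operator is non-negative, so K_n(t, t) ≥ 0 and therefore

[math]\displaystyle{ \sum_{i=1}^n \lambda_i e_i(t)^2 \leq K(t,t) \leq \sup_{x \in [a,b]} |K(x,x)|. }[/math]

The Cauchy–Schwarz inequality then gives

[math]\displaystyle{ \sum_{i=1}^n \lambda_i |e_i(t) e_i(s)| \leq \left(\sum_{i=1}^n \lambda_i e_i(t)^2\right)^{1/2} \left(\sum_{i=1}^n \lambda_i e_i(s)^2\right)^{1/2} \leq \sup_{x \in [a,b]} |K(x,x)|, }[/math]

and letting n → ∞ yields the stated inequality.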

Finally, to show non-negativity of the eigenvalues, one can write [math]\displaystyle{ \lambda \langle f,f \rangle= \langle f, T_{K}f \rangle }[/math] for an eigenfunction f with eigenvalue λ and express the right-hand side as an integral well approximated by its Riemann sums, which are non-negative by positive-definiteness of K. This implies [math]\displaystyle{ \lambda \langle f,f \rangle \geq 0 }[/math] and hence [math]\displaystyle{ \lambda \geq 0 }[/math].

Trace

The following is immediate:

Theorem. Suppose K is a continuous symmetric positive-definite kernel; T_K has a sequence of nonnegative eigenvalues {λ_i}_i. Then

[math]\displaystyle{ \int_a^b K(t,t)\, dt = \sum_i \lambda_i. }[/math]

This shows that the operator T_K is a trace class operator and

[math]\displaystyle{ \operatorname{trace}(T_K) = \int_a^b K(t,t)\, dt. }[/math]
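The trace identity can be checked on a concrete case. The sketch below assumes the kernel K(s, t) = min(s, t) on [0, 1], an example not taken from the text, whose eigenvalues are 1/((j − 1/2)²π²). Here K(t, t) = t, so the integral of the diagonal is 1/2, and the partial sums of the eigenvalues should approach the same value.

```python
import numpy as np

# Partial sum of the closed-form eigenvalues of the min kernel on [0, 1].
j = np.arange(1, 100001)
eigen_sum = float(np.sum(1.0 / (((j - 0.5) * np.pi) ** 2)))

# Midpoint rule for the integral of K(t, t) = min(t, t) = t over [0, 1].
n = 1000
t = (np.arange(n) + 0.5) / n
diag_integral = float(np.mean(np.minimum(t, t)))
```

Both `eigen_sum` and `diag_integral` should be close to 0.5; the eigenvalue sum approaches it from below because every omitted term is positive.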

Generalizations

Mercer's theorem itself is a generalization of the result that any symmetric positive-semidefinite matrix is the Gramian matrix of a set of vectors.
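The finite-dimensional statement can be made concrete with an eigendecomposition: scaling each eigenvector by the square root of its eigenvalue produces vectors whose Gramian matrix is the original matrix. This is a minimal sketch with a hypothetical helper name `gram_vectors` and an arbitrarily chosen example matrix; other factorizations, such as Cholesky, would serve equally well.

```python
import numpy as np

def gram_vectors(a, tol=1e-12):
    """Return a matrix whose rows v_i satisfy <v_i, v_j> = a[i, j] for PSD a."""
    eigenvalues, eigenvectors = np.linalg.eigh(a)
    if eigenvalues.min() < -tol * max(1.0, np.abs(eigenvalues).max()):
        raise ValueError("matrix is not positive-semidefinite")
    eigenvalues = np.clip(eigenvalues, 0.0, None)  # remove tiny negative rounding
    return eigenvectors * np.sqrt(eigenvalues)     # scale column k by sqrt(lambda_k)

# Example symmetric positive-semidefinite matrix (an arbitrary choice).
a = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
vectors = gram_vectors(a)
reconstructed = vectors @ vectors.T  # Gramian matrix of the rows of `vectors`
```

The entry `vectors[i, k]` plays the role of √λ_k e_k(x_i) in Mercer's expansion, so `reconstructed` is the matrix analogue of the sum of λ_k e_k(s) e_k(t).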

The first generalization^{[citation needed]} replaces the interval [a, b] with any compact Hausdorff space X, and Lebesgue measure on [a, b] with a finite countably additive measure μ on the Borel algebra of X whose support is X. This means that μ(U) > 0 for any nonempty open subset U of X.

A recent generalization^{[citation needed]} replaces these conditions by the following: the set X is a first-countable topological space endowed with a Borel (complete) measure μ. X is the support of μ and, for all x in X, there is an open set U containing x and having finite measure. Then essentially the same result holds:

Theorem. Suppose K is a continuous symmetric positive-definite kernel on X. If the function κ defined by κ(x)=K(x,x) for all x in X belongs to L¹_μ(X), then there is an orthonormal set {e_i}_i of L²_μ(X) consisting of eigenfunctions of T_K such that the corresponding sequence of eigenvalues {λ_i}_i is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on X and K has the representation

[math]\displaystyle{ K(s,t) = \sum_{j=1}^\infty \lambda_j \, e_j(s) \, e_j(t) }[/math]

where the convergence is absolute and uniform on compact subsets of X.

The next generalization^{[citation needed]} deals with representations of measurable kernels.

Let (X, M, μ) be a σ-finite measure space. An L² (or square-integrable) kernel on X is a function

[math]\displaystyle{ K \in L^2_{\mu \otimes \mu}(X \times X). }[/math]

L² kernels define a bounded operator T_K by the formula

[math]\displaystyle{ \langle T_K \varphi, \psi \rangle = \int_{X \times X} K(y,x) \varphi(y) \psi(x) \,d[\mu \otimes \mu](y,x). }[/math]

T_K is a compact operator (actually it is even a Hilbert–Schmidt operator). If the kernel K is symmetric, by the spectral theorem, T_K has an orthonormal basis of eigenvectors. Those eigenvectors that correspond to non-zero eigenvalues can be arranged in a sequence {e_i}_i (regardless of separability).

Theorem. If K is a symmetric positive-definite kernel on (X, M, μ), then

[math]\displaystyle{ K(y,x) = \sum_{i \in \mathbb{N}} \lambda_i e_i(y) e_i(x) }[/math]

where the convergence is in the L² norm. Note that when continuity of the kernel is not assumed, the expansion need not converge uniformly.

Mercer's condition

In mathematics, a real-valued function K(x,y) is said to fulfill Mercer's condition if for all square-integrable functions g(x) one has

[math]\displaystyle{ \iint g(x)K(x,y)g(y)\,dx\,dy \geq 0. }[/math]

Discrete analog

Mercer's condition is analogous to the definition of a positive-semidefinite matrix, that is, a matrix [math]\displaystyle{ K }[/math] of dimension [math]\displaystyle{ N }[/math] which satisfies, for all vectors [math]\displaystyle{ g }[/math], the property

[math]\displaystyle{ (g,Kg)=g^{T}{\cdot}Kg=\sum_{i=1}^N\sum_{j=1}^N\,g_i\,K_{ij}\,g_j\geq0 }[/math].
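In the discrete setting the property can be tested through the smallest eigenvalue of the symmetric matrix, since the quadratic form is non-negative for all g exactly when no eigenvalue is negative. The sketch below samples two assumed example kernels on a grid: a Gaussian kernel exp(−(x − y)²), which is positive-definite, and |x − y|, which is not. Neither kernel comes from the text above, and the helper name `min_eigenvalue` is hypothetical.

```python
import numpy as np

def min_eigenvalue(kernel, n=60, a=0.0, b=1.0):
    """Smallest eigenvalue of the matrix K_ij = kernel(x_i, x_j) on a uniform grid."""
    x = np.linspace(a, b, n)
    k = kernel(x[:, None], x[None, :])
    return float(np.linalg.eigvalsh(k).min())

# Positive-definite example: no eigenvalue is negative beyond rounding error.
gauss_min = min_eigenvalue(lambda x, y: np.exp(-(x - y) ** 2))

# Counterexample: zero diagonal means zero trace, so some eigenvalue is negative.
dist_min = min_eigenvalue(lambda x, y: np.abs(x - y))
```

A clearly negative `dist_min` exhibits a vector g with negative quadratic form, while a value of `gauss_min` near zero is only consistent with the condition on this grid and does not prove it.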

Examples

A positive constant function

[math]\displaystyle{ K(x, y)=c\, }[/math]

satisfies Mercer's condition, as then the integral becomes by Fubini's theorem

[math]\displaystyle{ \iint g(x)\,c\,g(y)\,dx dy = c\int\! g(x) \,dx \int\! g(y) \,dy = c\left(\int\! g(x) \,dx\right)^2 }[/math]

which is indeed non-negative.

Notes

[1] Bartlett, Peter (2008). "Reproducing Kernel Hilbert Spaces". Lecture notes of CS281B/Stat241B Statistical Learning Theory. University of California at Berkeley. https://people.eecs.berkeley.edu/~bartlett/courses/281b-sp08/7.pdf.
[2] Mohri, Mehryar (2018). Foundations of machine learning. Afshin Rostamizadeh, Ameet Talwalkar (Second ed.). Cambridge, Massachusetts. ISBN 978-0-262-03940-6. OCLC 1041560990. https://www.worldcat.org/oclc/1041560990.
[3] Berlinet, A. (2004). Reproducing kernel Hilbert spaces in probability and statistics. Christine Thomas-Agnan. New York: Springer Science+Business Media. ISBN 1-4419-9096-8. OCLC 844346520. https://www.worldcat.org/oclc/844346520.

References

Adriaan Zaanen, Linear Analysis, North Holland Publishing Co., 1960.
Ferreira, J. C., Menegatto, V. A., Eigenvalues of integral operators defined by smooth positive definite kernels, Integral Equations and Operator Theory, 64 (2009), no. 1, 61–81. (Gives the generalization of Mercer's theorem for metric spaces. The result is easily adapted to first countable topological spaces.)
Konrad Jörgens, Linear integral operators, Pitman, Boston, 1982.
Richard Courant and David Hilbert, Methods of Mathematical Physics, vol 1, Interscience 1953.
Robert Ash, Information Theory, Dover Publications, 1990.
Mercer, J. (1909), "Functions of positive and negative type and their connection with the theory of integral equations", Philosophical Transactions of the Royal Society A 209 (441–458): 415–446, doi:10.1098/rsta.1909.0016, Bibcode: 1909RSPTA.209..415M.
Hazewinkel, Michiel, ed. (2001), "Mercer theorem", Encyclopedia of Mathematics, Springer Science+Business Media B.V. / Kluwer Academic Publishers, ISBN 978-1-55608-010-4, https://www.encyclopediaofmath.org/index.php?title=p/m063440
H. König, Eigenvalue distribution of compact operators, Birkhäuser Verlag, 1986. (Gives the generalization of Mercer's theorem for finite measures μ.)

Original source: https://en.wikipedia.org/wiki/Mercer's theorem.

