Kernel-independent component analysis

In statistics, kernel-independent component analysis (kernel ICA) is an efficient algorithm for independent component analysis that estimates source components by optimizing a generalized-variance contrast function based on representations in a reproducing kernel Hilbert space.[1][2] Such contrast functions approximate mutual information, which serves as the underlying measure of statistical independence.

Main idea

Kernel ICA is based on the idea that correlations between two random variables can be represented in a reproducing kernel Hilbert space (RKHS), denoted by [math]\displaystyle{ \mathcal{F} }[/math], associated with a feature map [math]\displaystyle{ L_x: \mathcal{F} \to \mathbb{R} }[/math] defined for a fixed [math]\displaystyle{ x \in \mathbb{R} }[/math]. The [math]\displaystyle{ \mathcal{F} }[/math]-correlation between two random variables [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] is defined as

[math]\displaystyle{ \rho_{\mathcal{F}}(X,Y) = \max_{f, g \in \mathcal{F}} \operatorname{corr}( \langle L_X,f \rangle, \langle L_Y,g \rangle) }[/math]

where the functions [math]\displaystyle{ f,g: \mathbb{R} \to \mathbb{R} }[/math] range over [math]\displaystyle{ \mathcal{F} }[/math] and

[math]\displaystyle{ \operatorname{corr}( \langle L_X,f \rangle, \langle L_Y,g \rangle) := \frac{\operatorname{cov}(f(X), g(Y)) }{\operatorname{var}(f(X))^{1/2} \operatorname{var}(g(Y))^{1/2} } }[/math]

for fixed [math]\displaystyle{ f,g \in \mathcal{F} }[/math].[1] Note that the reproducing property implies that [math]\displaystyle{ f(x) = \langle L_x, f \rangle }[/math] for fixed [math]\displaystyle{ x \in \mathbb{R} }[/math] and [math]\displaystyle{ f \in \mathcal{F} }[/math].[3] It follows that the [math]\displaystyle{ \mathcal{F} }[/math]-correlation between two independent random variables is zero. Conversely, when [math]\displaystyle{ \mathcal{F} }[/math] is the RKHS of a Gaussian kernel, a vanishing [math]\displaystyle{ \mathcal{F} }[/math]-correlation implies that the two variables are independent, so the [math]\displaystyle{ \mathcal{F} }[/math]-correlation can serve as a contrast function for measuring independence.[1]
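For illustration, the following Python sketch (not part of the cited papers) estimates the [math]\displaystyle{ \mathcal{F} }[/math]-correlation of two one-dimensional samples with a Gaussian kernel. Writing [math]\displaystyle{ f }[/math] and [math]\displaystyle{ g }[/math] as kernel expansions over the sample and regularizing the variance terms, as in [1], reduces the maximization to a singular-value computation; the bandwidth sigma and regularization constant kappa below are illustrative choices, not values from the paper.

```python
import numpy as np

def gram_gaussian(x, sigma=1.0):
    """Gram matrix of a Gaussian kernel for a 1-D sample x."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def center(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 11'."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def f_correlation(x, y, sigma=1.0, kappa=1e-2):
    """Regularized empirical F-correlation (first kernel canonical
    correlation) between two 1-D samples x and y."""
    n = len(x)
    Kx = center(gram_gaussian(x, sigma))
    Ky = center(gram_gaussian(y, sigma))
    Rx = Kx + (n * kappa / 2) * np.eye(n)  # regularized variance terms
    Ry = Ky + (n * kappa / 2) * np.eye(n)
    # rho is the largest singular value of Rx^{-1} Kx Ky Ry^{-1}
    M = np.linalg.solve(Rx, Kx) @ Ky @ np.linalg.inv(Ry)
    return np.linalg.svd(M, compute_uv=False)[0]

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
print(f_correlation(x, rng.standard_normal(200)))  # near zero: independent (up to sampling noise)
print(f_correlation(x, x ** 2))                    # clearly positive: dependent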

This notion of [math]\displaystyle{ \mathcal{F} }[/math]-correlation is used to define the contrast function that the kernel ICA algorithm optimizes. Specifically, let [math]\displaystyle{ \mathbf{X} := (x_{ij}) \in \mathbb{R}^{n \times m} }[/math] be a prewhitened data matrix, that is, one whose columns each have zero sample mean and whose rows have the [math]\displaystyle{ m \times m }[/math] identity matrix as their sample covariance. Kernel ICA estimates an [math]\displaystyle{ m \times m }[/math] orthogonal matrix [math]\displaystyle{ \mathbf{A} }[/math] that minimizes the finite-sample [math]\displaystyle{ \mathcal{F} }[/math]-correlations between the columns of [math]\displaystyle{ \mathbf{S} := \mathbf{X} \mathbf{A}^{\prime} }[/math], whose columns are then taken as the estimated source components.
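A minimal sketch of this estimation step for [math]\displaystyle{ m = 2 }[/math], reusing f_correlation and rng from the sketch above: the paper optimizes over orthogonal matrices with gradient methods, but in two dimensions a simple grid search over rotation angles suffices to illustrate the idea. The source distributions and mixing matrix below are arbitrary examples.

```python
def whiten(X):
    """Center the columns of X and transform it so that the sample
    covariance of the rows is the identity matrix."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / len(Xc)
    evals, evecs = np.linalg.eigh(C)
    return Xc @ evecs @ np.diag(evals ** -0.5) @ evecs.T

def kernel_ica_2d(X, n_angles=90):
    """Recover two components by scanning 2-D rotations of the whitened
    data for the one minimizing the empirical F-correlation."""
    Xw = whiten(X)
    best_rho, best_A = np.inf, np.eye(2)
    # rotations beyond pi/2 only permute / sign-flip the components
    for theta in np.linspace(0.0, np.pi / 2, n_angles):
        c, s = np.cos(theta), np.sin(theta)
        A = np.array([[c, -s], [s, c]])
        S = Xw @ A.T
        rho = f_correlation(S[:, 0], S[:, 1])
        if rho < best_rho:
            best_rho, best_A = rho, A
    return Xw @ best_A.T, best_A

# two independent non-Gaussian sources, linearly mixed
s = np.column_stack([rng.uniform(-1, 1, 200),
                     rng.laplace(size=200)])
X = s @ np.array([[1.0, 0.5],
                  [0.3, 1.0]]).T
S_hat, A_hat = kernel_ica_2d(X)
```

Restricting the search to rotations is enough here because any orthogonal matrix is a rotation up to a sign flip of one column, which leaves the [math]\displaystyle{ \mathcal{F} }[/math]-correlation unchanged.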

References

  1. Bach, Francis R.; Jordan, Michael I. (2003). "Kernel independent component analysis". The Journal of Machine Learning Research 3: 1–48. doi:10.1162/153244303768966085. https://www.di.ens.fr/~fbach/kernelICA-jmlr.pdf. 
  2. Bach, Francis R.; Jordan, Michael I. (2003). "Kernel independent component analysis". 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 4. pp. IV-876-9. doi:10.1109/icassp.2003.1202783. ISBN 978-0-7803-7663-2. https://www.di.ens.fr/~fbach/kernelICA-icassp03.pdf. 
  3. Saitoh, Saburou (1988). Theory of Reproducing Kernels and Its Applications. Longman. ISBN 978-0582035645.