FastICA
FastICA is an efficient and popular algorithm for independent component analysis invented by Aapo Hyvärinen at Helsinki University of Technology.[1][2] Like most ICA algorithms, FastICA seeks an orthogonal rotation of prewhitened data, through a fixed-point iteration scheme, that maximizes a measure of non-Gaussianity of the rotated components. Non-Gaussianity serves as a proxy for statistical independence, which is a very strong condition and requires infinite data to verify. FastICA can also be derived as an approximate Newton iteration.
Algorithm
Prewhitening the data
Let [math]\displaystyle{ \mathbf{X} := (x_{ij}) \in \mathbb{R}^{N \times M} }[/math] denote the input data matrix, where [math]\displaystyle{ M }[/math] is the number of columns, corresponding to the number of samples of the mixed signals, and [math]\displaystyle{ N }[/math] is the number of rows, corresponding to the number of independent source signals. The input data matrix [math]\displaystyle{ \mathbf{X} }[/math] must be prewhitened, or centered and whitened, before applying the FastICA algorithm to it.
- Centering the data entails demeaning each component of the input data [math]\displaystyle{ \mathbf{X} }[/math], that is, [math]\displaystyle{ x_{ij} \leftarrow x_{ij} - \frac{1}{M} \sum_{j^{\prime}=1}^{M} x_{ij^{\prime}} }[/math] for each [math]\displaystyle{ i =1,\ldots,N }[/math] and [math]\displaystyle{ j = 1, \ldots, M }[/math]. After centering, each row of [math]\displaystyle{ \mathbf{X} }[/math] has an expected value of [math]\displaystyle{ 0 }[/math].
- Whitening the data requires a linear transformation [math]\displaystyle{ \mathbf{L}: \mathbb{R}^{N \times M} \to \mathbb{R}^{N \times M} }[/math] of the centered data so that the components of [math]\displaystyle{ \mathbf{L}(\mathbf{X}) }[/math] are uncorrelated and have variance one. More precisely, if [math]\displaystyle{ \mathbf{X} }[/math] is a centered data matrix, the covariance of [math]\displaystyle{ \mathbf{L}_{\mathbf{x}} := \mathbf{L}(\mathbf{X}) }[/math] is the [math]\displaystyle{ (N \times N) }[/math]-dimensional identity matrix, that is, [math]\displaystyle{ E\left\{ \mathbf{L}_{\mathbf{x}} \mathbf{L}_{\mathbf{x}}^{T} \right\} = \mathbf{I}_N }[/math].
- A common method for whitening is by performing an eigenvalue decomposition on the covariance matrix of the centered data [math]\displaystyle{ \mathbf{X} }[/math], [math]\displaystyle{ E\left \{ \mathbf{X} \mathbf{X}^{T} \right \} = \mathbf{E}\mathbf{D}\mathbf{E}^T }[/math], where [math]\displaystyle{ \mathbf{E} }[/math] is the matrix of eigenvectors and [math]\displaystyle{ \mathbf{D} }[/math] is the diagonal matrix of eigenvalues. The whitened data matrix is thus defined by [math]\displaystyle{ \mathbf{X} \leftarrow \mathbf{E}\mathbf{D}^{-1/2}\mathbf{E}^T\mathbf{X} }[/math].
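A minimal prewhitening sketch in Python with NumPy, assuming the rows of X are the mixed signals and the columns are samples; the helper name prewhiten and the use of numpy.linalg.eigh are illustrative choices, not part of the original description:

import numpy as np

def prewhiten(X):
    # Illustrative helper: center and whiten X (rows = mixed signals, columns = samples).
    # Centering: subtract the mean of each row so every signal has zero mean.
    X = X - X.mean(axis=1, keepdims=True)
    # Eigenvalue decomposition of the sample covariance E{X X^T}.
    cov = X @ X.T / X.shape[1]
    d, E = np.linalg.eigh(cov)
    # Apply the whitening transform E D^{-1/2} E^T to the centered data.
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ X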
Single component extraction
The iterative algorithm finds the direction for the weight vector [math]\displaystyle{ \mathbf{w} \in \mathbb{R}^N }[/math] that maximizes a measure of non-Gaussianity of the projection [math]\displaystyle{ \mathbf{w}^T \mathbf{X} }[/math], with [math]\displaystyle{ \mathbf{X} \in \mathbb{R}^{N \times M} }[/math] denoting a prewhitened data matrix as described above. Note that [math]\displaystyle{ \mathbf{w} }[/math] is a column vector. To measure non-Gaussianity, FastICA relies on a nonquadratic nonlinear function [math]\displaystyle{ f(u) }[/math], its first derivative [math]\displaystyle{ g(u) }[/math], and its second derivative [math]\displaystyle{ g^{\prime}(u) }[/math]. Hyvärinen states that the functions
[math]\displaystyle{ f(u) = \log \cosh (u), \quad g(u) = \tanh (u), \quad g^{\prime}(u) = 1-\tanh^2(u) }[/math] are useful for general purposes, while
[math]\displaystyle{ f(u) = -e^{-u^2/2}, \quad g(u) = u \, e^{-u^2/2}, \quad g^{\prime}(u) = (1-u^2) \, e^{-u^2/2} }[/math] may be highly robust.[1] The steps for extracting the weight vector [math]\displaystyle{ \mathbf{w} }[/math] for a single component in FastICA are the following:
1. Randomize the initial weight vector [math]\displaystyle{ \mathbf{w} }[/math]
2. Let [math]\displaystyle{ \mathbf{w}^+ \leftarrow E\left\{\mathbf{X} g(\mathbf{w}^T \mathbf{X})^T\right\} - E\left\{g'(\mathbf{w}^T \mathbf{X})\right\}\mathbf{w} }[/math], where [math]\displaystyle{ E\left\{...\right\} }[/math] means averaging over all column vectors of the matrix [math]\displaystyle{ \mathbf{X} }[/math]
3. Let [math]\displaystyle{ \mathbf{w} \leftarrow \mathbf{w}^+ / \|\mathbf{w}^+\| }[/math]
4. If not converged, go back to step 2
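A minimal sketch of this single-unit iteration in Python with NumPy, using the log cosh nonlinearity (g = tanh, g' = 1 - tanh^2); the function name, tolerance, and iteration cap are illustrative assumptions rather than part of the algorithm as stated above:

import numpy as np

def extract_one_component(X, max_iter=200, tol=1e-6):
    # X is the prewhitened data matrix (rows = signals, columns = samples).
    g = np.tanh                                   # g(u) = tanh(u)
    g_prime = lambda u: 1.0 - np.tanh(u) ** 2     # g'(u) = 1 - tanh^2(u)
    N, M = X.shape
    w = np.random.randn(N)                        # step 1: random initial weight vector
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = w @ X                                # projections w^T X, one per sample
        # step 2: w+ = E{X g(w^T X)} - E{g'(w^T X)} w, averaging over samples
        w_new = (X * g(wx)).mean(axis=1) - g_prime(wx).mean() * w
        w_new /= np.linalg.norm(w_new)            # step 3: renormalize
        if abs(abs(w_new @ w) - 1.0) < tol:       # step 4: stop when the direction no longer changes
            return w_new
        w = w_new
    return w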
Multiple component extraction
The single-unit iterative algorithm estimates only one weight vector, which extracts a single component. Estimating additional components that are mutually "independent" requires repeating the algorithm to obtain linearly independent projection vectors; note that the notion of independence here refers to maximizing non-Gaussianity in the estimated components. Hyvärinen provides several ways of extracting multiple components, with the simplest being the following. Here, [math]\displaystyle{ \mathbf{1_{M}} }[/math] is a column vector of 1's of dimension [math]\displaystyle{ M }[/math].
Algorithm FastICA
- Input: [math]\displaystyle{ C }[/math] Number of desired components
- Input: [math]\displaystyle{ \mathbf{X} \in \mathbb{R}^{N \times M} }[/math] Prewhitened matrix, where each column represents an [math]\displaystyle{ N }[/math]-dimensional sample, with [math]\displaystyle{ C \leq N }[/math]
- Output: [math]\displaystyle{ \mathbf{W} \in \mathbb{R}^{N \times C} }[/math] Un-mixing matrix, where each column projects [math]\displaystyle{ \mathbf{X} }[/math] onto an independent component.
- Output: [math]\displaystyle{ \mathbf{S} \in \mathbb{R}^{C \times M} }[/math] Independent components matrix, where each of the [math]\displaystyle{ M }[/math] columns represents a sample with [math]\displaystyle{ C }[/math] dimensions.
for p in 1 to C:
    [math]\displaystyle{ \mathbf{w_p} \leftarrow }[/math] Random vector of length N
    while [math]\displaystyle{ \mathbf{w_p} }[/math] changes
        [math]\displaystyle{ \mathbf{w_p} \leftarrow \frac{1}{M}\mathbf{X} g(\mathbf{w_p}^T \mathbf{X})^T - \frac{1}{M}g'(\mathbf{w_p}^T\mathbf{X})\mathbf{1_{M}} \mathbf{w_p} }[/math]
        [math]\displaystyle{ \mathbf{w_p} \leftarrow \mathbf{w_p} - \sum_{j = 1}^{p-1} (\mathbf{w_p}^T\mathbf{w_j})\mathbf{w_j} }[/math]
        [math]\displaystyle{ \mathbf{w_p} \leftarrow \frac{\mathbf{w_p}}{\|\mathbf{w_p}\|} }[/math]
output [math]\displaystyle{ \mathbf{W} \leftarrow \begin{bmatrix} \mathbf{w_1}, \dots, \mathbf{w_C} \end{bmatrix} }[/math]
output [math]\displaystyle{ \mathbf{S} \leftarrow \mathbf{W^T}\mathbf{X} }[/math]
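A compact sketch of this deflationary scheme in Python with NumPy; it assumes the input X has already been prewhitened, and the function name, stopping criterion, and iteration parameters are illustrative assumptions:

import numpy as np

def fast_ica(X, C, max_iter=200, tol=1e-6):
    # X: prewhitened (N x M) data matrix; C: number of desired components, C <= N.
    g = np.tanh
    g_prime = lambda u: 1.0 - np.tanh(u) ** 2
    N, M = X.shape
    W = np.zeros((N, C))
    for p in range(C):
        w = np.random.randn(N)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            wx = w @ X
            w_new = (X * g(wx)).mean(axis=1) - g_prime(wx).mean() * w
            # Gram-Schmidt decorrelation against the components already extracted.
            w_new -= W[:, :p] @ (W[:, :p].T @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[:, p] = w
    S = W.T @ X    # estimated independent components, one row per component
    return W, S

With the hypothetical prewhiten helper sketched earlier, a call such as W, S = fast_ica(prewhiten(X), C=3) would return the un-mixing matrix and the estimated source signals.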
See also
- Unsupervised learning
- Machine learning
- The IT++ library features a FastICA implementation in C++
- Infomax
References
1. Hyvärinen, A.; Oja, E. (2000). "Independent component analysis: Algorithms and applications". Neural Networks 13 (4–5): 411–430. doi:10.1016/S0893-6080(00)00026-5. PMID 10946390. http://www.cs.helsinki.fi/u/ahyvarin/papers/NN00new.pdf.
2. Hyvärinen, A. (1999). "Fast and robust fixed-point algorithms for independent component analysis". IEEE Transactions on Neural Networks 10 (3): 626–634. doi:10.1109/72.761722. PMID 18252563. http://www.cs.helsinki.fi/u/ahyvarin/papers/TNN99new.pdf.
External links
- FastICA in Python
- FastICA package for Matlab or Octave
- fastICA package in R programming language
- FastICA in Java on SourceForge
- FastICA in Java in RapidMiner
- FastICA in Matlab
- FastICA in MDP
- FastICA in Julia
Original source: https://en.wikipedia.org/wiki/FastICA.