FastICA

FastICA is an efficient and popular algorithm for independent component analysis invented by Aapo Hyvärinen at Helsinki University of Technology.[1][2] Like most ICA algorithms, FastICA seeks an orthogonal rotation of prewhitened data, through a fixed-point iteration scheme, that maximizes a measure of non-Gaussianity of the rotated components. Non-Gaussianity serves as a proxy for statistical independence, which is a very strong condition and requires infinite data to verify. FastICA can alternatively be derived as an approximate Newton iteration.

Algorithm

Prewhitening the data

Let [math]\displaystyle{ \mathbf{X} := (x_{ij}) \in \mathbb{R}^{N \times M} }[/math] denote the input data matrix, where [math]\displaystyle{ M }[/math] is the number of columns, corresponding to the number of samples of the mixed signals, and [math]\displaystyle{ N }[/math] is the number of rows, corresponding to the number of independent source signals. The input data matrix [math]\displaystyle{ \mathbf{X} }[/math] must be prewhitened, that is, centered and whitened, before applying the FastICA algorithm to it.

  • Centering the data entails demeaning each component of the input data [math]\displaystyle{ \mathbf{X} }[/math], that is,
[math]\displaystyle{ x_{ij} \leftarrow x_{ij} - \frac{1}{M} \sum_{j^{\prime}} x_{ij^{\prime}} }[/math]
for each [math]\displaystyle{ i =1,\ldots,N }[/math] and [math]\displaystyle{ j = 1, \ldots, M }[/math]. After centering, each row of [math]\displaystyle{ \mathbf{X} }[/math] has an expected value of [math]\displaystyle{ 0 }[/math].
  • Whitening the data requires a linear transformation [math]\displaystyle{ \mathbf{L}: \mathbb{R}^{N \times M} \to \mathbb{R}^{N \times M} }[/math] of the centered data so that the components of [math]\displaystyle{ \mathbf{L}(\mathbf{X}) }[/math] are uncorrelated and have variance one. More precisely, if [math]\displaystyle{ \mathbf{X} }[/math] is a centered data matrix, the covariance of [math]\displaystyle{ \mathbf{L}_{\mathbf{x}} := \mathbf{L}(\mathbf{X}) }[/math] is the [math]\displaystyle{ (N \times N) }[/math]-dimensional identity matrix, that is,
[math]\displaystyle{ \mathrm{E}\left \{ \mathbf{L}_{\mathbf{x}} \mathbf{L}_{\mathbf{x}}^{T} \right \} = \mathbf{I}_N }[/math]
A common method for whitening is to perform an eigenvalue decomposition on the covariance matrix of the centered data [math]\displaystyle{ \mathbf{X} }[/math], [math]\displaystyle{ E\left \{ \mathbf{X} \mathbf{X}^{T} \right \} = \mathbf{E}\mathbf{D}\mathbf{E}^T }[/math], where [math]\displaystyle{ \mathbf{E} }[/math] is the matrix of eigenvectors and [math]\displaystyle{ \mathbf{D} }[/math] is the diagonal matrix of eigenvalues (both centering and whitening are sketched in code after this list). The whitened data matrix is thus defined by
[math]\displaystyle{ \mathbf{X} \leftarrow \mathbf{D}^{-1/2}\mathbf{E}^T\mathbf{X}. }[/math]
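
A minimal prewhitening sketch in Python with NumPy, assuming the [math]\displaystyle{ N \times M }[/math] layout above (signals in rows, samples in columns); the function name prewhiten is illustrative and not from the original article:

```python
import numpy as np

def prewhiten(X):
    """Center and whiten an (N x M) data matrix: N mixed signals, M samples."""
    # Centering: subtract the mean of each row (each mixed signal).
    X = X - X.mean(axis=1, keepdims=True)
    # Sample covariance of the centered data, E{X X^T} estimated as (1/M) X X^T.
    cov = X @ X.T / X.shape[1]
    # Eigenvalue decomposition: cov = E D E^T.
    d, E = np.linalg.eigh(cov)
    # Whitening transform: X <- D^{-1/2} E^T X.
    return np.diag(d ** -0.5) @ E.T @ X
```

After this transform the rows of the returned matrix are uncorrelated and have unit variance, which is exactly the condition [math]\displaystyle{ \mathrm{E}\left \{ \mathbf{L}_{\mathbf{x}} \mathbf{L}_{\mathbf{x}}^{T} \right \} = \mathbf{I}_N }[/math] stated above.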

Single component extraction

The iterative algorithm finds the direction for the weight vector [math]\displaystyle{ \mathbf{w} \in \mathbb{R}^N }[/math] that maximizes a measure of non-Gaussianity of the projection [math]\displaystyle{ \mathbf{w}^T \mathbf{X} }[/math], with [math]\displaystyle{ \mathbf{X} \in \mathbb{R}^{N \times M} }[/math] denoting a prewhitened data matrix as described above. Note that [math]\displaystyle{ \mathbf{w} }[/math] is a column vector. To measure non-Gaussianity, FastICA relies on a nonquadratic nonlinear function [math]\displaystyle{ f(u) }[/math], its first derivative [math]\displaystyle{ g(u) }[/math], and its second derivative [math]\displaystyle{ g^{\prime}(u) }[/math]. Hyvärinen states that the functions

[math]\displaystyle{ f(u) = \log \cosh (u), \quad g(u) = \tanh (u), \quad \text{and} \quad {g}'(u) = 1-\tanh^2(u), }[/math]

are useful for general purposes, while

[math]\displaystyle{ f(u) = -e^{-u^2/2}, \quad g(u) = u e^{-u^2/2}, \quad \text{and} \quad {g}'(u) = (1-u^2) e^{-u^2/2} }[/math]

may be highly robust.[1] The steps for extracting the weight vector [math]\displaystyle{ \mathbf{w} }[/math] for a single component in FastICA are the following (a code sketch follows the list):

  1. Randomize the initial weight vector [math]\displaystyle{ \mathbf{w} }[/math]
  2. Let [math]\displaystyle{ \mathbf{w}^+ \leftarrow E\left\{\mathbf{X} g(\mathbf{w}^T \mathbf{X})^T\right\} - E\left\{g'(\mathbf{w}^T \mathbf{X})\right\}\mathbf{w} }[/math], where [math]\displaystyle{ E\left\{...\right\} }[/math] means averaging over all column-vectors of matrix [math]\displaystyle{ \mathbf{X} }[/math]
  3. Let [math]\displaystyle{ \mathbf{w} \leftarrow \mathbf{w}^+ / \|\mathbf{w}^+\| }[/math]
  4. If not converged, go back to 2
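
A minimal NumPy sketch of this single-unit iteration, assuming prewhitened data in the [math]\displaystyle{ N \times M }[/math] layout above and the general-purpose nonlinearity [math]\displaystyle{ g(u) = \tanh(u) }[/math]; the function name fastica_one_unit, the tolerance, and the iteration limit are illustrative choices, not part of the original algorithm statement:

```python
import numpy as np

def fastica_one_unit(X, tol=1e-6, max_iter=200):
    """Extract one weight vector w from the prewhitened (N x M) data matrix X."""
    g = np.tanh                                  # general-purpose nonlinearity g(u)
    g_prime = lambda u: 1.0 - np.tanh(u) ** 2    # its derivative g'(u)
    N, M = X.shape
    w = np.random.randn(N)                       # step 1: random initial weight vector
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = w @ X                               # projections w^T X, one value per sample
        # Step 2: w+ <- E{X g(w^T X)} - E{g'(w^T X)} w, averaging over the M samples.
        w_new = (X * g(wx)).mean(axis=1) - g_prime(wx).mean() * w
        # Step 3: renormalize.
        w_new /= np.linalg.norm(w_new)
        # Step 4: stop when the direction no longer changes (up to sign).
        if abs(abs(w @ w_new) - 1.0) < tol:
            return w_new
        w = w_new
    return w
```

Swapping in the Gaussian-based pair above, [math]\displaystyle{ g(u) = u e^{-u^2/2} }[/math] and its derivative, is a drop-in change when robustness to outliers is a concern.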

Multiple component extraction

The single-unit iterative algorithm estimates only one weight vector, which extracts a single component. Estimating additional components that are mutually "independent" requires repeating the algorithm to obtain linearly independent projection vectors; note that the notion of independence here refers to maximizing non-Gaussianity in the estimated components. Hyvärinen provides several ways of extracting multiple components, the simplest of which is the following. Here, [math]\displaystyle{ \mathbf{1_{M}} }[/math] is a column vector of 1's of dimension [math]\displaystyle{ M }[/math].

Algorithm FastICA

Input: [math]\displaystyle{ C }[/math] Number of desired components
Input: [math]\displaystyle{ \mathbf{X} \in \mathbb{R}^{N \times M} }[/math] Prewhitened data matrix, where each column represents an [math]\displaystyle{ N }[/math]-dimensional sample and [math]\displaystyle{ C \leq N }[/math]
Output: [math]\displaystyle{ \mathbf{W} \in \mathbb{R}^{N \times C} }[/math] Un-mixing matrix, where each column projects [math]\displaystyle{ \mathbf{X} }[/math] onto an independent component.
Output: [math]\displaystyle{ \mathbf{S} \in \mathbb{R}^{C \times M} }[/math] Independent components matrix, where each of the [math]\displaystyle{ M }[/math] columns represents a [math]\displaystyle{ C }[/math]-dimensional sample.
 for p in 1 to C:
    [math]\displaystyle{ \mathbf{w_p} \leftarrow }[/math] Random vector of length N
    while [math]\displaystyle{ \mathbf{w_p} }[/math] changes
        [math]\displaystyle{ \mathbf{w_p} \leftarrow \frac{1}{M}\mathbf{X} g(\mathbf{w_p}^T \mathbf{X})^T - \frac{1}{M}g'(\mathbf{w_p}^T\mathbf{X})\mathbf{1_{M}} \mathbf{w_p} }[/math]
        [math]\displaystyle{ \mathbf{w_p} \leftarrow \mathbf{w_p} - \sum_{j = 1}^{p-1} (\mathbf{w_p}^T\mathbf{w_j})\mathbf{w_j} }[/math]
        [math]\displaystyle{ \mathbf{w_p} \leftarrow \frac{\mathbf{w_p}}{\|\mathbf{w_p}\|} }[/math]
output [math]\displaystyle{ \mathbf{W} \leftarrow \begin{bmatrix} \mathbf{w_1}, \dots, \mathbf{w_C} \end{bmatrix} }[/math]
output [math]\displaystyle{ \mathbf{S} \leftarrow \mathbf{W^T}\mathbf{X} }[/math]
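
A NumPy sketch of this deflationary scheme, extending the single-unit update with the Gram-Schmidt decorrelation step from the pseudocode above; as before, the function name fastica_deflation, the tolerance, and the iteration limit are illustrative assumptions rather than part of the pseudocode:

```python
import numpy as np

def fastica_deflation(X, C, tol=1e-6, max_iter=200):
    """Extract C components from the prewhitened (N x M) data X; returns (W, S)."""
    g = np.tanh
    g_prime = lambda u: 1.0 - np.tanh(u) ** 2
    N, M = X.shape
    W = np.zeros((N, C))
    for p in range(C):
        w = np.random.randn(N)
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            wx = w @ X
            # Fixed-point update: (1/M) X g(w^T X)^T - mean(g'(w^T X)) w
            w_new = (X * g(wx)).mean(axis=1) - g_prime(wx).mean() * w
            # Gram-Schmidt decorrelation against the previously found w_1, ..., w_{p-1}.
            w_new -= W[:, :p] @ (W[:, :p].T @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w @ w_new) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[:, p] = w
    S = W.T @ X        # each row of S is one estimated independent component
    return W, S
```

A typical call on a mixed-signal matrix X, reusing the prewhitening sketch above, would be W, S = fastica_deflation(prewhiten(X), C=3).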

References
