Dimensionality reduction

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.^[1]

Methods are commonly divided into linear and nonlinear approaches.^[1] Approaches can also be divided into feature selection and feature extraction.^[2] Dimensionality reduction can be used for noise reduction, data visualization, cluster analysis, or as an intermediate step to facilitate other analyses.

Feature selection

The process of feature selection aims to find a suitable subset of the input variables (features, or attributes) for the task at hand. The three strategies are: the filter strategy (e.g., information gain), the wrapper strategy (e.g., accuracy-guided search), and the embedded strategy (features are added or removed while building the model based on prediction errors).
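The filter strategy can be sketched in a few lines. This sketch scores each feature independently by information gain and keeps the k best. It assumes discrete feature values and class labels stored as plain Python lists. The function names (`entropy`, `information_gain`, `filter_select`) are illustrative and do not come from any library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Drop in label entropy after splitting the samples on a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def filter_select(columns, labels, k):
    """Filter strategy: score every feature on its own, keep the k highest-scoring indices."""
    scores = [information_gain(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores

# Usage: three features (stored column-wise) for four samples.
columns = [[0, 0, 1, 1],   # identical to the label
           [0, 1, 0, 1],   # unrelated to the label
           [1, 1, 1, 1]]   # constant
labels = [0, 0, 1, 1]
selected, scores = filter_select(columns, labels, k=1)
print(selected, scores)  # [0] [1.0, 0.0, 0.0]
```

No model is trained at any point. A wrapper strategy would instead refit a model on candidate subsets. That approach costs more, but it can account for interactions between features.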

Data analysis such as regression or classification can often be done more accurately in the reduced space than in the original space.^[3]

Feature projection

Feature projection (also called feature extraction) transforms the data from the high-dimensional space to a space of fewer dimensions. The data transformation may be linear, as in principal component analysis (PCA), but many nonlinear dimensionality reduction techniques also exist.^[4]^[5] For multidimensional data, tensor representation can be used in dimensionality reduction through multilinear subspace learning.^[6]

Principal component analysis (PCA)
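PCA, the linear projection mentioned above, finds orthogonal directions along which the centered data have maximal variance and projects the data onto the leading ones. The following is a plain-Python sketch under stated assumptions. It estimates the top eigenvectors of the sample covariance matrix by power iteration with deflation, a simple stand-in for the eigendecomposition or SVD that a numerical library would use. The function name `pca` and its parameters are hypothetical.

```python
import math
import random

def pca(X, k, iters=500, seed=0):
    """Project the rows of X onto their first k principal components."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mu[j] for j in range(d)] for row in X]          # center the data
    # Sample covariance matrix of the centered data.
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components, variances = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):                                        # power iteration
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))  # variance along v
        components.append(v)
        variances.append(lam)
        # Deflation: remove the found direction so the next pass finds the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(r[j] * v[j] for j in range(d)) for v in components] for r in Xc]
    return scores, components, variances

# Usage: points on the line y = x; the first component points along (1, 1) / sqrt(2), up to sign.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
scores, components, variances = pca(X, k=1)
print(components[0], variances[0])
```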

Non-negative matrix factorization (NMF)

NMF decomposes a non-negative matrix into the product of two non-negative ones, which has made it a promising tool in fields where only non-negative signals exist,^[7]^[8] such as astronomy.^[9]^[10] NMF has been well known since Lee and Seung introduced its multiplicative update rule,^[7] which has since been extended in several directions: the inclusion of uncertainties,^[9] the consideration of missing data and parallel computation,^[11] sequential construction,^[11] which leads to the stability and linearity of NMF,^[10] and other updates, including the handling of missing data in digital image processing.^[12]
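The multiplicative update rule for the squared Frobenius error ‖V − WH‖², in the form given by Lee and Seung,^[8] can be sketched as follows. This is an illustration, not a reference implementation. The matrices are plain nested lists, the initialization is random, and the small `eps` added to each denominator is an assumption made here to avoid division by zero. All function names are hypothetical.

```python
import random

def matmul(A, B):
    """Matrix product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm ||V - W H||^2."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, rank, iters=200, eps=1e-12, seed=0):
    """Factor non-negative V (m x n) as W (m x rank) times H (rank x n) with multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), elementwise; the ratio is non-negative.
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * p / (q + eps) for h, p, q in zip(rh, rp, rq)] for rh, rp, rq in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * p / (q + eps) for w, p, q in zip(rw, rp, rq)] for rw, rp, rq in zip(W, num, den)]
    return W, H

# Usage: a 3 x 3 non-negative matrix of rank 2.
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0]]
W, H = nmf(V, rank=2)
print(frobenius_error(V, W, H))
```

Each update multiplies the current entries by a non-negative ratio, so W and H never become negative. This property lets NMF model physical fluxes without the mean subtraction that PCA performs.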

With a stable component basis during construction and a linear modeling process, sequential NMF^[11] is able to preserve the flux in direct imaging of circumstellar structures in astronomy.^[10] Direct imaging is one of the methods used to detect exoplanets, and this advantage matters especially for imaging circumstellar discs. Unlike PCA, NMF does not remove the mean of the matrices, so the modeled fluxes remain physical and non-negative. As a result, NMF is able to preserve more information than PCA, as demonstrated by Ren et al.^[10]

Kernel PCA

Principal component analysis can be employed in a nonlinear way by means of the kernel trick. The resulting technique, called kernel PCA, is capable of constructing nonlinear mappings that maximize the variance in the data.
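The following compact sketch of kernel PCA uses a Gaussian (RBF) kernel and assumes small data held in Python lists. The steps are:

1. Build the Gram matrix.
2. Center the Gram matrix in feature space.
3. Take its leading eigenvectors by power iteration.
4. Scale those eigenvectors to obtain the embedding of the training points.

The kernel choice, the `gamma` parameter, and all function names are illustrative assumptions.

```python
import math
import random

def rbf_kernel(X, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X] for xi in X]

def center_kernel(K):
    """Center the data implicitly in feature space: K - 1K - K1 + 1K1."""
    n = len(K)
    means = [sum(row) / n for row in K]      # row means (equal to column means, K is symmetric)
    total = sum(means) / n
    return [[K[i][j] - means[i] - means[j] + total for j in range(n)] for i in range(n)]

def leading_eigenpairs(A, k, iters=1000, seed=0):
    """Top-k eigenpairs of a symmetric positive semidefinite matrix by power iteration with deflation."""
    rng = random.Random(seed)
    n = len(A)
    A = [row[:] for row in A]
    vals, vecs = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(a * b for a, b in zip(row, v)) for row in A]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(vi * sum(a * b for a, b in zip(row, v)) for vi, row in zip(v, A))
        vals.append(lam)
        vecs.append(v)
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vals, vecs

def kernel_pca(X, k, gamma=1.0):
    """Embed the training points X in k dimensions using RBF-kernel PCA."""
    Kc = center_kernel(rbf_kernel(X, gamma))
    vals, vecs = leading_eigenpairs(Kc, k)
    # For a unit eigenvector u with eigenvalue lam, point i projects to sqrt(lam) * u[i].
    return [[math.sqrt(max(lam, 0.0)) * u[i] for lam, u in zip(vals, vecs)] for i in range(len(X))]

# Usage: two well-separated 1-D clusters land on opposite sides of the first component.
X = [[0.0], [0.1], [0.2], [3.0], [3.1], [3.2]]
embedding = kernel_pca(X, k=1, gamma=1.0)
print([round(e[0], 3) for e in embedding])
```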

Graph-based kernel PCA

Other prominent nonlinear techniques include manifold learning techniques such as Isomap, locally linear embedding (LLE),^[13] Hessian LLE, Laplacian eigenmaps, and methods based on tangent space analysis.^[14] These techniques assume that the high-dimensional input data lie near a low-dimensional manifold embedded in the ambient space, and they construct a low-dimensional representation using a cost function that retains local properties of the data. They can be viewed as defining a graph-based kernel for kernel PCA.^[15]

More recently, techniques have been proposed that, instead of defining a fixed kernel, try to learn the kernel using semidefinite programming. The most prominent example of such a technique is maximum variance unfolding (MVU). The central idea of MVU is to exactly preserve all pairwise distances between nearest neighbors (in the inner product space) while maximizing the distances between points that are not nearest neighbors.

An alternative approach to neighborhood preservation is through the minimization of a cost function that measures differences between distances in the input and output spaces. Important examples of such techniques include: classical multidimensional scaling, which is identical to PCA; Isomap, which uses geodesic distances in the data space; diffusion maps, which use diffusion distances in the data space; t-distributed stochastic neighbor embedding (t-SNE), which minimizes the divergence between distributions over pairs of points; and curvilinear component analysis.
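Isomap can be sketched in three steps: build a k-nearest-neighbor graph, compute shortest-path (geodesic) distances on that graph, and apply classical multidimensional scaling to the resulting distance matrix. The sketch makes several simplifying assumptions:

- The neighbor graph is connected.
- Floyd-Warshall computes the shortest paths.
- Power iteration is shifted by a Gershgorin bound so that it finds the largest eigenvalues even when the matrix is not positive semidefinite.

Because of these choices, the sketch only suits a few hundred points. The names are illustrative, not a library API.

```python
import math
import random

def knn_geodesics(X, n_neighbors):
    """Approximate geodesic distances: k-NN graph plus Floyd-Warshall shortest paths."""
    n = len(X)
    D = [[math.dist(a, b) for b in X] for a in X]
    G = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        G[i][i] = 0.0
        # Skip the first entry of the ranking (normally i itself), keep the next n_neighbors.
        for j in sorted(range(n), key=lambda p: D[i][p])[1:n_neighbors + 1]:
            G[i][j] = G[j][i] = D[i][j]          # symmetric edge
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if G[i][m] + G[m][j] < G[i][j]:
                    G[i][j] = G[i][m] + G[m][j]
    return G

def top_eigenpairs(A, k, iters=2000, seed=0):
    """Largest eigenpairs of a symmetric matrix via shifted power iteration with deflation."""
    rng = random.Random(seed)
    n = len(A)
    shift = max(sum(abs(x) for x in row) for row in A)   # Gershgorin bound: A + shift*I is PSD
    M = [[A[i][j] + (shift if i == j else 0.0) for j in range(n)] for i in range(n)]
    vals, vecs = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(a * b for a, b in zip(row, v)) for row in M]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        mu = sum(vi * sum(a * b for a, b in zip(row, v)) for vi, row in zip(v, M))
        vals.append(mu - shift)
        vecs.append(v)
        M = [[M[i][j] - mu * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vals, vecs

def classical_mds(D, k):
    """Classical MDS: double-center the squared distances, keep the top-k eigenvectors."""
    n = len(D)
    S = [[d * d for d in row] for row in D]
    means = [sum(row) / n for row in S]
    total = sum(means) / n
    B = [[-0.5 * (S[i][j] - means[i] - means[j] + total) for j in range(n)] for i in range(n)]
    vals, vecs = top_eigenpairs(B, k)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in zip(vals, vecs)] for i in range(n)]

def isomap(X, n_neighbors, k):
    """Isomap sketch: classical MDS applied to k-NN graph geodesic distances."""
    return classical_mds(knn_geodesics(X, n_neighbors), k)

# Usage 1: for collinear points, geodesic and Euclidean distances coincide.
line = [[t, 2.0 * t] for t in range(6)]
print([round(e[0], 3) for e in isomap(line, n_neighbors=2, k=1)])
# Usage 2: on a semicircle, the end-to-end graph distance follows the arc, not the chord.
arc = [[math.cos(math.pi * i / 9), math.sin(math.pi * i / 9)] for i in range(10)]
print(knn_geodesics(arc, n_neighbors=2)[0][9], math.dist(arc[0], arc[9]))
```

Applied to plain Euclidean distances, `classical_mds` reproduces a PCA embedding. The graph geodesics are what let Isomap unroll curved manifolds.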

A different approach to nonlinear dimensionality reduction is through the use of autoencoders, a special kind of feedforward neural network with a bottleneck hidden layer.^[16] Deep autoencoders have often been trained using greedy layer-wise pre-training (e.g., using a stack of restricted Boltzmann machines), followed by a fine-tuning stage based on backpropagation.

Linear discriminant analysis (LDA)

Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events.
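For two classes, Fisher's linear discriminant has a closed form. The projection direction is proportional to S_W^{-1}(m_1 − m_0), where m_0 and m_1 are the class means and S_W is the within-class scatter matrix. The plain-Python sketch below uses illustrative function names, and a small Gaussian-elimination solver stands in for a linear-algebra library.

```python
import math

def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def within_class_scatter(classes):
    """S_W: sum over classes of outer products of deviations from each class mean."""
    d = len(classes[0][0])
    S = [[0.0] * d for _ in range(d)]
    for rows in classes:
        m = mean(rows)
        for r in rows:
            diff = [a - b for a, b in zip(r, m)]
            for i in range(d):
                for j in range(d):
                    S[i][j] += diff[i] * diff[j]
    return S

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fisher_direction(class0, class1):
    """Unit vector proportional to S_W^{-1} (m1 - m0)."""
    m0, m1 = mean(class0), mean(class1)
    w = solve(within_class_scatter([class0, class1]), [a - b for a, b in zip(m1, m0)])
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w]

# Usage: both classes share a scatter stretched along the diagonal.
class0 = [[1.0, 1.0], [-1.0, -1.0], [0.5, -0.5], [-0.5, 0.5]]
class1 = [[x + 2.0, y] for x, y in class0]
w = fisher_direction(class0, class1)
print([round(x, 3) for x in w])  # [0.857, -0.514]
```

In this example the class means differ only along (1, 0), so the mean difference alone would point along that axis. The discriminant tilts away from it because the within-class scatter is correlated.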

Generalized discriminant analysis (GDA)

GDA deals with nonlinear discriminant analysis using a kernel function operator. The underlying theory is close to that of support-vector machines (SVM) insofar as the GDA method provides a mapping of the input vectors into a high-dimensional feature space.^[17]^[18] Similar to LDA, the objective of GDA is to find a projection of the features into a lower-dimensional space by maximizing the ratio of between-class scatter to within-class scatter.

Autoencoder

Autoencoders can be used to learn nonlinear dimension reduction functions and codings together with an inverse function from the coding to the original representation.
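The following is a deliberately tiny autoencoder sketch. A one-unit tanh bottleneck serves as the nonlinear encoder, and a small tanh hidden layer followed by a linear output serves as the nonlinear decoder (the inverse map). Training uses plain stochastic gradient descent with hand-derived backpropagation. The architecture, learning rate, and epoch count are assumptions chosen for illustration. The sketch does not use layer-wise pre-training.

```python
import math
import random

def train_autoencoder(X, hidden=8, epochs=300, lr=0.05, seed=0):
    """Train a tiny autoencoder with a 1-D bottleneck.

    Encoder: z = tanh(w . x + b)                       (nonlinear 1-D code)
    Decoder: h = tanh(a * z + c), x_hat = V h + b_out  (nonlinear inverse map)
    """
    rng = random.Random(seed)
    d = len(X[0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(d)]
    b = 0.0
    a = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    c = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(d)]
    b_out = [0.0] * d

    def encode(x):
        return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)

    def decode(z):
        h = [math.tanh(aj * z + cj) for aj, cj in zip(a, c)]
        x_hat = [sum(vkj * hj for vkj, hj in zip(Vk, h)) + bk for Vk, bk in zip(V, b_out)]
        return x_hat, h

    def mean_loss():
        total = 0.0
        for x in X:
            x_hat, _ = decode(encode(x))
            total += sum((p - q) ** 2 for p, q in zip(x_hat, x))
        return total / len(X)

    history = [mean_loss()]
    for _ in range(epochs):
        for x in X:
            z = encode(x)
            x_hat, h = decode(z)
            r = [p - q for p, q in zip(x_hat, x)]        # residual x_hat - x
            # Backpropagate ||x_hat - x||^2 to the pre-activations, before any update.
            g_pre_h = [sum(2 * r[k] * V[k][j] for k in range(d)) * (1 - h[j] ** 2)
                       for j in range(hidden)]
            g_pre_z = sum(g_pre_h[j] * a[j] for j in range(hidden)) * (1 - z * z)
            # Gradient-descent updates of every parameter.
            V = [[V[k][j] - lr * 2 * r[k] * h[j] for j in range(hidden)] for k in range(d)]
            b_out = [b_out[k] - lr * 2 * r[k] for k in range(d)]
            a = [a[j] - lr * g_pre_h[j] * z for j in range(hidden)]
            c = [c[j] - lr * g_pre_h[j] for j in range(hidden)]
            w = [w[i] - lr * g_pre_z * x[i] for i in range(d)]
            b -= lr * g_pre_z
        history.append(mean_loss())
    return encode, decode, history

# Usage: points on a parabola, each compressed to a single number.
X = [[t / 10, (t / 10) ** 2] for t in range(-10, 11)]
encode, decode, history = train_autoencoder(X)
print(round(history[0], 4), round(history[-1], 4))   # mean loss before and after training
```

After training, `encode` maps each 2-D point to a single coordinate, which is the reduced representation. `decode` maps that coordinate back to an approximate point.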

t-SNE

T-distributed Stochastic Neighbor Embedding (t-SNE) is a nonlinear dimensionality reduction technique useful for the visualization of high-dimensional datasets. It is not recommended for analyses such as clustering or outlier detection, since it does not necessarily preserve densities or distances well.^[19]
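For reference, the usual t-SNE objective has four ingredients:

- Gaussian affinities p_{j|i} in the input space, where the bandwidth σ_i is chosen per point to match a user-set perplexity.
- Symmetrized affinities p_{ij} built from the p_{j|i}.
- Heavy-tailed Student-t affinities q_{ij} between the embedded points y_i.
- A Kullback-Leibler cost, minimized by gradient descent over the y_i.

```latex
p_{j\mid i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                   {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n},

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.
```

The cost is dominated by pairs with large p_{ij}, so t-SNE mainly preserves local neighborhoods. Cluster sizes and the distances between well-separated clusters in the output carry little meaning, which is consistent with the caution above.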

UMAP

Uniform manifold approximation and projection (UMAP) is a nonlinear dimensionality reduction technique. Visually, it is similar to t-SNE, but it assumes that the data is uniformly distributed on a locally connected Riemannian manifold and that the Riemannian metric is locally constant or approximately locally constant.

Dimension reduction

For high-dimensional datasets, dimension reduction is usually performed prior to applying a k-nearest neighbors (k-NN) algorithm in order to mitigate the curse of dimensionality.^[20]
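A small experiment illustrates the effect that motivates reducing dimensions before k-NN.^[20] It assumes points drawn uniformly from the unit hypercube. As the dimension grows, the gap between the nearest and farthest neighbor shrinks relative to the nearest distance, so being "nearest" carries less information. The function name and its parameters are illustrative.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min distance from one random query to n_points random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)]) for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```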

Feature extraction and dimension reduction can be combined in one step, using principal component analysis (PCA), linear discriminant analysis (LDA), canonical correlation analysis (CCA), or non-negative matrix factorization (NMF) techniques to pre-process the data, followed by k-NN on the feature vectors in the reduced-dimension space. In machine learning, this process is also called low-dimensional embedding.^[21]

For very high-dimensional datasets (e.g., when performing similarity search on live video streams, DNA data, or high-dimensional time series), running a fast approximate k-NN search may be the only feasible option. Suitable techniques include locality-sensitive hashing, random projection,^[22] "sketches",^[23] and other high-dimensional similarity search techniques developed in the database research community, such as work presented at the VLDB conference.
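The random-projection route can be sketched as follows. The sketch applies a Gaussian random projection once, scaling the entries so that squared norms are preserved in expectation, and then runs an exact brute-force k-NN search in the reduced space. It is an illustration under these assumptions, not an approximate-search index. The data sizes and function names are hypothetical.

```python
import math
import random

def gaussian_random_projection(d, k, seed=0):
    """k x d matrix with i.i.d. N(0, 1/k) entries; squared norms are preserved in expectation."""
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(k)]

def project(R, x):
    """Map a d-dimensional point x to k dimensions (the matrix-vector product R x)."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in R]

def knn(points, query, n_neighbors):
    """Exact brute-force k-NN: indices of the n_neighbors points closest to query."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:n_neighbors]

# Usage: 200 points in 300 dimensions, searched in a 30-dimensional projection.
rng = random.Random(1)
data = [[rng.random() for _ in range(300)] for _ in range(200)]
R = gaussian_random_projection(300, 30)
reduced = [project(R, x) for x in data]
query = [v + 0.01 * rng.gauss(0.0, 1.0) for v in data[42]]
print(knn(reduced, project(R, query), 5))
```

The projection is computed once and reused for every query, so each distance evaluation costs O(k) instead of O(d). A production system would typically replace the linear scan with an approximate index, for example one based on locality-sensitive hashing.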

Applications

A dimensionality reduction technique that is sometimes used in neuroscience is maximally informative dimensions,^[24] which finds a lower-dimensional representation of a dataset such that as much information as possible about the original data is preserved.

Notes

[1] van der Maaten, Laurens; Postma, Eric; van den Herik, Jaap (October 26, 2009). "Dimensionality Reduction: A Comparative Review". J Mach Learn Res 10: 66–71. https://members.loria.fr/moberger/Enseignement/AVR/Exposes/TR_Dimensiereductie.pdf.
[2] Pudil, P.; Novovičová, J. (1998). "Novel Methods for Feature Subset Selection with Respect to Problem Knowledge". In Liu, Huan; Motoda, Hiroshi (eds.). Feature Extraction, Construction and Selection. pp. 101. doi:10.1007/978-1-4615-5725-8_7. ISBN 978-1-4613-7622-4.
[3] Rico-Sulayes, Antonio (2017). "Reducing Vector Space Dimensionality in Automatic Classification for Authorship Attribution". Revista Ingeniería Electrónica, Automática y Comunicaciones 38 (3): 26–35. ISSN 1815-5928. https://rielac.cujae.edu.cu/index.php/rieac/article/view/478.
[4] Samet, H. (2006). Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann. ISBN 0-12-369446-9.
[5] Ding, C.; He, X.; Zha, H.; Simon, H. D. (2002). "Adaptive Dimension Reduction for Clustering High Dimensional Data". Proceedings of the International Conference on Data Mining.
[6] Lu, Haiping; Plataniotis, K.N.; Venetsanopoulos, A.N. (2011). "A Survey of Multilinear Subspace Learning for Tensor Data". Pattern Recognition 44 (7): 1540–1551. doi:10.1016/j.patcog.2011.01.004. Bibcode: 2011PatRe..44.1540L. https://www.dsp.utoronto.ca/~haiping/Publication/SurveyMSL_PR2011.pdf.
[7] Lee, Daniel D.; Seung, H. Sebastian (1999). "Learning the parts of objects by non-negative matrix factorization". Nature 401 (6755): 788–791. doi:10.1038/44565. PMID 10548103. Bibcode: 1999Natur.401..788L.
[8] Lee, Daniel D.; Seung, H. Sebastian (2001). "Algorithms for Non-negative Matrix Factorization". Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference. MIT Press. pp. 556–562. https://proceedings.neurips.cc/paper/2000/file/f9d1152547c0bde01830b7e8bd60024c-Paper.pdf.
[9] Blanton, Michael R.; Roweis, Sam (2007). "K-corrections and filter transformations in the ultraviolet, optical, and near infrared". The Astronomical Journal 133 (2): 734–754. doi:10.1086/510127. Bibcode: 2007AJ....133..734B.
[10] Ren, Bin; Pueyo, Laurent; Zhu, Guangtun B.; Duchêne, Gaspard (2018). "Non-negative Matrix Factorization: Robust Extraction of Extended Structures". The Astrophysical Journal 852 (2): 104. doi:10.3847/1538-4357/aaa1f2. Bibcode: 2018ApJ...852..104R.
[11] Zhu, Guangtun B. (2016-12-19). "Nonnegative Matrix Factorization (NMF) with Heteroscedastic Uncertainties and Missing data". arXiv:1612.06037 [astro-ph.IM].
[12] Ren, Bin; Pueyo, Laurent; Chen, Christine; Choquet, Elodie; Debes, John H.; Duchêne, Gaspard; Menard, Francois; Perrin, Marshall D. (2020). "Using Data Imputation for Signal Separation in High Contrast Imaging". The Astrophysical Journal 892 (2): 74. doi:10.3847/1538-4357/ab7024. Bibcode: 2020ApJ...892...74R.
[13] Roweis, S. T.; Saul, L. K. (2000). "Nonlinear Dimensionality Reduction by Locally Linear Embedding". Science 290 (5500): 2323–2326. doi:10.1126/science.290.5500.2323. PMID 11125150. Bibcode: 2000Sci...290.2323R.
[14] Zhang, Zhenyue; Zha, Hongyuan (2004). "Principal Manifolds and Nonlinear Dimensionality Reduction via Tangent Space Alignment". SIAM Journal on Scientific Computing 26 (1): 313–338. doi:10.1137/s1064827502419154. Bibcode: 2004SJSC...26..313Z.
[15] Ham, Jihun; Lee, Daniel D.; Mika, Sebastian; Schölkopf, Bernhard (2004). "A kernel view of the dimensionality reduction of manifolds". pp. 47. doi:10.1145/1015330.1015417.
[16] Hu, Hongbing; Zahorian, Stephen A. (2010). "Dimensionality Reduction Methods for HMM Phonetic Recognition". ICASSP 2010, Dallas, TX.
[17] Baudat, G.; Anouar, F. (2000). "Generalized Discriminant Analysis Using a Kernel Approach". Neural Computation 12 (10): 2385–2404. doi:10.1162/089976600300014980. PMID 11032039.
[18] Haghighat, Mohammad; Zonouz, Saman; Abdel-Mottaleb, Mohamed (2015). "CloudID: Trustworthy cloud-based and cross-enterprise biometric identification". Expert Systems with Applications 42 (21): 7905–7916. doi:10.1016/j.eswa.2015.06.025.
[19] Schubert, Erich; Gertz, Michael (2017). "Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier Detection". In Beecks, Christian; Borutta, Felix; Kröger, Peer et al. (eds.). Similarity Search and Applications. Lecture Notes in Computer Science. 10609. Cham: Springer International Publishing. pp. 188–203. doi:10.1007/978-3-319-68474-1_13. ISBN 978-3-319-68474-1. https://link.springer.com/chapter/10.1007/978-3-319-68474-1_13.
[20] Beyer, Kevin; Goldstein, Jonathan; Ramakrishnan, Raghu; Shaft, Uri (1999). "When is 'nearest neighbor' meaningful?". Database Theory – ICDT'99. pp. 217–235.
[21] Shaw, B.; Jebara, T. (2009). "Structure preserving embedding". Proceedings of the 26th Annual International Conference on Machine Learning – ICML '09. pp. 1. doi:10.1145/1553374.1553494. ISBN 9781605585161. http://www.cs.columbia.edu/~jebara/papers/spe-icml09.pdf.
[22] Bingham, E.; Mannila, H. (2001). "Random projection in dimensionality reduction". Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining – KDD '01. pp. 245. doi:10.1145/502512.502546. ISBN 978-1581133912.
[23] Shasha, D. (2004). High Performance Discovery in Time Series. Berlin: Springer. ISBN 0-387-00857-8.
[24] Schütt, Heiko H. (2024-11-13). Bayesian Comparisons Between Representations.

References

Boehmke, Brad; Greenwell, Brandon M. (2019). "Dimension Reduction". Hands-On Machine Learning with R. Chapman & Hall. pp. 343–396. ISBN 978-1-138-49568-5. https://books.google.com/books?id=aXC9DwAAQBAJ&pg=PA343.
Cunningham, P. (2007). Dimension Reduction (Technical report). University College Dublin. UCD-CSI-2007-7.
Fodor, I. (2002). A survey of dimension reduction techniques (Technical report). Center for Applied Scientific Computing, Lawrence Livermore National Laboratory. UCRL-ID-148494.
Lakshmi Padmaja, Dhyaram; Vishnuvardhan, B (2016). "Comparative Study of Feature Subset Selection Methods for Dimensionality Reduction on Scientific Data". 2016 IEEE 6th International Conference on Advanced Computing (IACC). pp. 31–34. doi:10.1109/IACC.2016.16. ISBN 978-1-4673-8286-1.

External links

Original source: https://en.wikipedia.org/wiki/Dimensionality reduction.