Multiway data analysis

From HandWiki
Short description: Method of analyzing large data sets

Multiway data analysis is a method of analyzing large data sets by representing a collection of observations as a multiway array, [math]\displaystyle{ {\mathcal A}\in{\mathbb C}^{I_0\times I_1\times \dots \times I_c\times \dots \times I_C} }[/math]. The proper choice of how to organize the data into a (C+1)-way array, and of the analysis techniques applied to it, can reveal patterns in the underlying data that other methods leave undetected.[1]

History

The study of multiway data analysis was first formalized at a conference held in 1988. That conference produced the first text specifically addressed to the field, Coppi and Bolasco's Multiway Data Analysis.[1] At that time, the application areas for multiway analysis included statistics, econometrics and psychometrics. Applications have since expanded to include chemometrics, agriculture, social network analysis and the food industry.[2]

Composition of multiway data analysis

Multiway data

Multiway data analysts use the term way to refer to the number of sources of data variation, while reserving the word mode for the methods or models used to analyze the data.[3]:xviii

In this sense, data can be classified by their number of ways:

  • One-way data: A data point with [math]\displaystyle{ I_0 }[/math] dimensions, [math]\displaystyle{ {\bf a}\in {\mathbb C}^{I_0} }[/math], is a vector that is stored in a one-way array data structure.
  • Two-way data: A collection of [math]\displaystyle{ I_1 }[/math] data points [math]\displaystyle{ {\bf a}\in {\mathbb C}^{I_0} }[/math] is stored in a two-way array, [math]\displaystyle{ {\bf A}\in {\mathbb C}^{I_0\times I_1} }[/math]. A spreadsheet can be used to visualize such data in the case of discrete dimensions.
  • Three-way data: A collection of data [math]\displaystyle{ {\bf a}\in {\mathbb C}^{I_0} }[/math] that has two modes of variation is stored in a three-way array, [math]\displaystyle{ {\bf A}\in {\mathbb C}^{I_0\times I_1\times I_2} }[/math]. Such data might represent the temperature at different locations (two-way data) sampled at different times (leading to three-way data).
  • Four-way data, using the same spreadsheet analogy, can be represented as a file folder full of separate workbooks.
  • Five-way data and six-way data can be represented by similarly higher levels of data aggregation.

In general, multiway data are stored in a multiway array. They may be measured at different times, in different places or with different methodologies, and may contain inconsistencies such as missing data or discrepancies in data representation.
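The distinction between one-way, two-way and three-way data can be illustrated with a minimal sketch. It assumes NumPy as the array library, and every value in it is made up for illustration: the measurements, the 2 × 3 grid of locations, the five sampling times and the position of the missing reading are all hypothetical. A missing value is represented here by NaN, which is one common convention rather than the only one.

```python
import numpy as np

# One-way data: a single data point with I0 = 4 measurements.
a = np.array([1.0, 2.0, 3.0, 4.0])

# Two-way data: I1 = 3 such data points stored as the columns
# of an I0 x I1 array (the "spreadsheet" case).
A2 = np.stack([a, 2 * a, 3 * a], axis=1)

# Three-way data: hypothetical temperatures on a 2 x 3 grid of
# locations, sampled at 5 different times.
rng = np.random.default_rng(0)
T = 20.0 + rng.standard_normal((2, 3, 5))

# Mark one reading as missing (assumed convention: NaN).
T[0, 1, 2] = np.nan

for name, arr in [("one-way", a), ("two-way", A2), ("three-way", T)]:
    print(name, "ways =", arr.ndim, "shape =", arr.shape)

# Average over the time way while skipping the missing reading.
mean_over_time = np.nanmean(T, axis=2)
print(mean_over_time.shape)
```

The number of ways corresponds to `ndim`, and each entry of `shape` is the size of one way. Functions such as `np.nanmean` are one way to keep a missing entry from propagating into summaries of the array.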

Multiway model

Multiway application

Multiway data analysis can be applied to the problem of finding hidden multilinear structure in multiway datasets, and applications have been reported in a range of fields.[4]
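As a rough illustration of what hidden multilinear structure can mean, the sketch below builds a synthetic three-way array as the outer product of three vectors and compares it with an unstructured random array. It is an assumption-laden toy example, not a method taken from the cited survey: NumPy, the helper name `unfold`, the array sizes and the random data are all chosen here for illustration. The helper rearranges a multiway array into a matrix whose rows index one chosen way.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize `tensor`: rows index the chosen way, columns index all other ways."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

rng = np.random.default_rng(0)
u = rng.standard_normal(4)
v = rng.standard_normal(5)
w = rng.standard_normal(6)

# A 4 x 5 x 6 array with the simplest multilinear structure:
# every entry is X[i, j, k] = u[i] * v[j] * w[k].
X = np.einsum("i,j,k->ijk", u, v, w)

# An unstructured random array of the same shape, for comparison.
N = rng.standard_normal((4, 5, 6))

# The structure in X shows up as a low rank in every unfolding,
# while the unfoldings of N have full row rank.
ranks_X = [int(np.linalg.matrix_rank(unfold(X, m))) for m in range(3)]
ranks_N = [int(np.linalg.matrix_rank(unfold(N, m))) for m in range(3)]
print("structured:  ", ranks_X)
print("unstructured:", ranks_N)
```

Real datasets are rarely exactly low rank, so practical multiway models fit such structure approximately instead of testing for it exactly.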

Multiway processing

Multiway processing is the execution of one or more chosen multiway models to transform multiway data into the form required by a particular multiway application. Data generated with a potentiometric electronic tongue provide a typical example of such processing.[9]

References

  1. Coppi, R.; Bolasco, S., eds (1989). Multiway Data Analysis. Amsterdam: North-Holland. ISBN 9780444874108. 
  2. Bro, Rasmus (20 November 1998). Multi-way Analysis in the Food Industry: Models, Algorithms, and Applications (PDF) (Ph.D. thesis). University of Amsterdam.
  3. Kroonenberg, Pieter M. (2008). Applied Multiway Data Analysis. Wiley Series in Probability and Statistics. 702. John Wiley & Sons. p. xv. ISBN 9780470237991. 
  4. Acar, Evrim; Yener, Bulent. Unsupervised Multiway Data Analysis: A Literature Survey (PDF) (Thesis). Rensselaer Polytechnic Institute.
  5. Vasilescu, M.A.O.; Terzopoulos, D. (2002). "Multilinear Analysis of Image Ensembles: TensorFaces". Proceedings of the 7th European Conference on Computer Vision (ECCV'02), Copenhagen, Denmark. Lecture Notes in Computer Science 2350. Springer, Berlin, Heidelberg. doi:10.1007/3-540-47969-4_30. ISBN 978-3-540-43745-1. http://www.cs.toronto.edu/~maov/tensorfaces/Springer%20ECCV%202002_files/eccv02proceeding_23500447.pdf. 
  6. Vasilescu, M.A.O.; Terzopoulos, D. (June 2005). "Multilinear Independent Component Analysis". Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05). 1. San Diego, CA. pp. 547–553.
  7. Vasilescu, M.A.O. (August 2002). "Human Motion Signatures: Analysis, Synthesis, Recognition". Proceedings of the International Conference on Pattern Recognition (ICPR 2002). 3. Quebec City, Canada. pp. 456–460.
  8. Vasilescu, M.A.O.; Kim, Eric; Zeng, Xiao (2021). "CausalX: Causal eXplanations and Block Multilinear Factor Analysis". Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR 2020). Milan, Italy. pp. 10736–10743. http://www.cs.toronto.edu/~maov/tensorfaces/Springer%20ECCV%202002_files/eccv02proceeding_23500447.pdf. 
  9. Cartas, Raul; Mimendia, Aitor; Legin, Andrey; del Valle, Manel (2011). "Multiway Processing of Data Generated with a Potentiometric Electronic Tongue in a SIA System". Electroanalysis 23 (4): 953–961. doi:10.1002/elan.201000642.