Parallel coordinates

Short description: Chart displaying multivariate data

Parallel coordinates are a common way of visualizing and analyzing high-dimensional datasets.

To show a set of points in an n-dimensional space, a backdrop is drawn consisting of n parallel lines, typically vertical and equally spaced. A point in n-dimensional space is represented as a polyline with vertices on the parallel axes; the position of the vertex on the i-th axis corresponds to the i-th coordinate of the point.
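The construction above can be sketched in a few lines of Python. This is a minimal illustration, not a plotting routine: the function name `polyline_vertices` and the `axis_spacing` parameter are assumptions made for this example, and the axes are assumed to be the vertical lines x = 0, 1, 2, ... in the drawing plane.

```python
def polyline_vertices(point, axis_spacing=1.0):
    """Return the (x, y) vertices of the polyline representing one n-D point.

    Assumption for this sketch: the i-th axis (counting from 0) is the
    vertical line x = i * axis_spacing, and the vertex on it sits at
    height point[i].
    """
    return [(i * axis_spacing, value) for i, value in enumerate(point)]


# A single point in 4-dimensional space becomes a polyline with 4 vertices.
vertices = polyline_vertices([2.0, 5.0, 1.0, 3.5])
print(vertices)  # [(0.0, 2.0), (1.0, 5.0), (2.0, 1.0), (3.0, 3.5)]
```

Joining consecutive vertices with straight segments gives the polyline for that point; a data set is drawn by repeating this for every observation.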

This visualization is closely related to time series visualization, except that the axes do not correspond to points in time and so have no natural order. Different axis arrangements may therefore be of interest.

History

The idea of parallel coordinates is often said to originate in 1885 with the French mathematician Philbert Maurice d'Ocagne,[1] who sought a way to compute mathematical functions graphically with alignment diagrams, or nomograms, using parallel axes with different scales. A three-variable equation, for example, could be solved using three parallel axes: known values are marked on two of the scales, a line is drawn between them, and the unknown is read off the third scale at the point where the line intersects it.

The use of parallel coordinates as a visualization technique to show data is also often said to have originated earlier with Henry Gannett, in work preceding the Statistical Atlas of the United States for the 1890 Census. For example, his "General Summary, Showing the Rank of States, by Ratios, 1880"[2] shows the rank of 10 measures (population, occupations, wealth, manufacturing, agriculture, and so forth) on parallel axes connected by lines for each state.

However, both d'Ocagne and Gannett were far preceded in this by André-Michel Guerry,[3] whose Plate IV, "Influence de l'Age", showed rankings of crimes against persons by age along parallel axes, connecting the same crime across age groups.[4]

Parallel coordinates were later popularised by Alfred Inselberg, who developed them systematically as a coordinate system starting in 1977 and published "The Plane with Parallel Coordinates" in 1985.[5] Some important applications are in collision avoidance algorithms for air traffic control (1987; three US patents), data mining (US patent), computer vision (US patent), optimization, process control, and, more recently, intrusion detection and elsewhere.

Higher dimensions

On the plane with an xy Cartesian coordinate system, adding more dimensions in parallel coordinates (often abbreviated ||-coords or PCP) involves adding more axes. The value of parallel coordinates is that certain geometrical properties in high dimensions transform into easily seen 2D patterns. For example, a set of points on a line in n-space transforms to a set of polylines in parallel coordinates whose segments, extended to full lines, all intersect at n − 1 points, one for each pair of adjacent axes. For n = 2 this yields a point-line duality, which is why the mathematical foundations of parallel coordinates are developed in projective rather than Euclidean space.

A pair of lines intersects at a unique point, which has two coordinates and can therefore correspond to a unique line, which is also specified by two parameters (or two points). By contrast, more than two points are required to specify a curve, and a pair of curves may not have a unique intersection. Hence, using curves in parallel coordinates instead of lines loses the point-line duality together with all the other properties of projective geometry, as well as the known higher-dimensional patterns corresponding to (hyper)planes, curves, several smooth (hyper)surfaces, proximities, convexity and, recently, non-orientability.[6]

The goal is to map n-dimensional relations into 2D patterns. Parallel coordinates is therefore not a point-to-point mapping but a mapping from subsets of nD to subsets of 2D, and there is no loss of information. Even a point in nD is mapped not to a point in 2D but to a polygonal line, which is a subset of 2D.
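The point-line duality for n = 2 can be checked numerically. The sketch below assumes the two axes are drawn as the vertical lines x = 0 and x = d; under that assumption, every point on the line x2 = m·x1 + b (with m ≠ 1) is drawn as a line through the single point (d/(1 − m), b/(1 − m)). The function names are illustrative only, and exact fractions are used so the comparison involves no rounding.

```python
from fractions import Fraction


def segment_line(y_left, y_right, d=1):
    """Line through (0, y_left) and (d, y_right), as (slope, intercept)."""
    return Fraction(y_right - y_left, d), Fraction(y_left)


def intersection(line1, line2):
    """Intersection of two non-parallel lines given as (slope, intercept)."""
    (s1, c1), (s2, c2) = line1, line2
    x = (c2 - c1) / (s1 - s2)
    return x, s1 * x + c1


def dual_point(m, b, d=1):
    """Point shared by the lines of all points on x2 = m*x1 + b (m != 1)."""
    m, b = Fraction(m), Fraction(b)
    return d / (1 - m), b / (1 - m)


# Three points on the line x2 = -2*x1 + 3 in the original 2-D space.
m, b = -2, 3
points = [(x1, m * x1 + b) for x1 in (0, 1, 4)]
lines = [segment_line(x1, x2) for x1, x2 in points]

# Every pair of their parallel-coordinate lines meets at the same dual point.
print(intersection(lines[0], lines[1]) == dual_point(m, b))  # True
print(intersection(lines[1], lines[2]) == dual_point(m, b))  # True
```

For m = 1 the drawn lines are parallel and meet only at a point at infinity, which is one way to see why the projective plane is the natural setting.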

Statistical considerations

Representative sample for parallel coordinates.

When used for statistical data visualisation there are three important considerations: the order, the rotation, and the scaling of the axes.

The order of the axes is critical for finding features, and in typical data analysis many reorderings will need to be tried. Some authors have come up with ordering heuristics which may create illuminating orderings.[7]

Rotating the axes corresponds to a translation in parallel coordinates: if the lines intersect outside the pair of parallel axes, a rotation can move the intersection point between them. The simplest example is rotating an axis by 180 degrees.[8]
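A small numerical sketch of the 180-degree case follows. It rests on two assumptions that are not spelled out in the text: the rotation is read as reversing the direction of one axis (replacing x2 by −x2), and the two axes are drawn at x = 0 and x = d, so that the lines for points on x2 = m·x1 + b meet at horizontal position d/(1 − m). The helper names are hypothetical.

```python
from fractions import Fraction


def dual_x(slope, d=1):
    """Horizontal position where the lines for x2 = slope*x1 + b meet (slope != 1)."""
    return d / (1 - Fraction(slope))


def between_axes(x, d=1):
    """True if x lies strictly between the two axes at x = 0 and x = d."""
    return 0 < x < d


slope = Fraction(1, 2)
before = dual_x(slope)   # second axis in its original direction
after = dual_x(-slope)   # second axis reversed, so the slope changes sign

print(before, between_axes(before))  # 2 False
print(after, between_axes(after))    # 2/3 True
```

With a positive slope of 1/2 the intersection lies to the right of both axes; after reversing the second axis it falls between them, where it is visible in the plot.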

Scaling is necessary because the plot is based on interpolation (linear combination) of consecutive pairs of variables.[8] The variables must therefore be on a common scale, and there are many scaling methods to consider as part of the data preparation process, some of which can reveal more informative views.
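As one possible scaling method, the sketch below applies min-max scaling to each variable so that every axis spans the interval [0, 1]. Min-max scaling is chosen here only for illustration; it is not singled out by the cited sources, and the function names and the handling of constant columns are assumptions of this example.

```python
def min_max_scale(column):
    """Rescale a sequence of numbers linearly onto [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant variable: place it mid-axis (a convention chosen here).
        return [0.5 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def scale_columns(rows):
    """Apply min-max scaling to every variable (column) of a row-wise data set."""
    columns = list(zip(*rows))
    scaled = [min_max_scale(col) for col in columns]
    return [list(row) for row in zip(*scaled)]


# Two variables on very different scales end up on a common [0, 1] axis range.
data = [[1, 100], [2, 300], [3, 200]]
print(scale_columns(data))  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

Other choices, such as standardising to zero mean and unit variance, change which features stand out, which is why scaling is treated as part of data preparation.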

A smooth parallel coordinate plot is achieved with splines.[9] In the smooth plot, every observation is mapped into a parametric line (or curve), which is smooth, continuous on the axes, and orthogonal to each parallel axis. This design emphasizes the quantization level for each data attribute.[8]

Reading

Inselberg (1997) gave a full review of how to read relational patterns in parallel coordinates visually.[10] When most lines between two parallel axes are roughly parallel to each other, this suggests a positive relationship between the two dimensions. When the lines cross in a kind of superposition of X-shapes, the relationship is negative. When the lines cross randomly, with no consistent pattern, there is no particular relationship.
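These reading rules can be made concrete by counting crossings. Assuming both axes are drawn with values increasing in the same direction, the segments of two observations cross between a pair of adjacent axes exactly when the observations are ordered differently on the two axes. The sketch below, with the hypothetical helper `crossing_fraction`, reports the share of observation pairs whose segments cross; it is an illustration of the idea, not a method taken from the cited review.

```python
from itertools import combinations


def crossing_fraction(left, right):
    """Fraction of observation pairs whose segments cross between two axes.

    left[i] and right[i] are the values of observation i on the two axes.
    Needs at least two observations; tied values are not counted as crossings.
    """
    pairs = list(combinations(range(len(left)), 2))
    crossings = sum(
        1 for i, j in pairs
        if (left[i] - left[j]) * (right[i] - right[j]) < 0
    )
    return crossings / len(pairs)


left = [1, 2, 3, 4]
print(crossing_fraction(left, [10, 20, 30, 40]))  # 0.0  (no crossings, positive relationship)
print(crossing_fraction(left, [40, 30, 20, 10]))  # 1.0  (every pair crosses, negative relationship)
```

A value near 0 matches the "roughly parallel" pattern, a value near 1 matches the X-shaped pattern, and values around one half correspond to lines crossing with no consistent pattern.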

Limitations

In parallel coordinates, each axis can have at most two neighboring axes (one on the left, and one on the right). For a d-dimensional data set, at most d-1 relationships can be shown at a time. In time series visualization, there exists a natural predecessor and successor; therefore in this special case, there exists a preferred arrangement. However, when the axes do not have a unique order, finding a good axis arrangement requires the use of heuristics and experimentation. In order to explore more complex relationships, axes must be reordered.
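The effect of this limit can be illustrated by listing which pairs of variables are adjacent in a given axis order. The sketch below uses hypothetical helper names and a hand-picked second ordering; it only shows the bookkeeping and is not an ordering heuristic.

```python
from itertools import combinations


def adjacent_pairs(order):
    """Pairs of variables drawn on neighbouring axes for one axis order."""
    return [(order[i], order[i + 1]) for i in range(len(order) - 1)]


def uncovered_pairs(axes, orders):
    """Variable pairs that are not neighbours in any of the given orders."""
    covered = {frozenset(p) for order in orders for p in adjacent_pairs(order)}
    return [p for p in combinations(axes, 2) if frozenset(p) not in covered]


axes = ["A", "B", "C", "D"]

# One order of d = 4 axes shows d - 1 = 3 of the 6 possible pairs.
print(adjacent_pairs(axes))             # [('A', 'B'), ('B', 'C'), ('C', 'D')]
print(uncovered_pairs(axes, [axes]))    # [('A', 'C'), ('A', 'D'), ('B', 'D')]

# A second, reordered plot can show the remaining pairs.
second = ["B", "D", "A", "C"]
print(uncovered_pairs(axes, [axes, second]))  # []
```

The number of pairs grows quadratically with the number of variables while each plot shows only d − 1 of them, so several reorderings are generally needed.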

By arranging the axes in 3-dimensional space (still in parallel, like nails in a nail bed), an axis can have more than two neighbors, arranged in a circle around the central attribute, and the arrangement problem gets easier (for example, by using a minimum spanning tree).[11] A prototype of this visualization is available as an extension to the data mining software ELKI. However, the visualization is harder to interpret and interact with than a linear order.

Software

While there are a large number of papers about parallel coordinates, only a few notable software packages are publicly available to convert databases into parallel coordinates graphics.[12] Notable software includes ELKI, GGobi, Mondrian, Orange and ROOT. Libraries include Protovis.js, and D3.js provides basic examples. D3.Parcoords.js, a D3-based library dedicated specifically to creating parallel coordinates graphics, has also been published. The Python data structure and analysis library Pandas implements parallel coordinates plotting, using the plotting library matplotlib.[13]

Other visualizations for multivariate data

  • Radar chart – a visualization with coordinate axes arranged radially
  • Andrews plot – the Fourier transform of a parallel coordinates graph

References

  1. Ocagne, M. (1885). Coordonnées Parallèles et Axiales: Méthode de transformation géométrique et procédé nouveau de calcul graphique déduits de la considération des coordonnées parallèles. Gauthier-Villars. https://archive.org/details/coordonnesparal00ocaggoog
  2. Gannett, Henry. General Summary Showing the Rank of States by Ratios 1880. https://www.davidrumsey.com/luna/servlet/detail/RUMSEY~8~1~32803~1152181. 
  3. Guerry, A.-M. (1833). Essai sur la Statistique Morale de la France. Paris: Crochard.
  4. Friendly, M. (2022). The life and works of André-Michel Guerry, revisited. Sociological Spectrum, 42(4-6), 233–259. https://doi.org/10.1080/02732173.2022.2078450
  5. Inselberg, Alfred (1985). "The Plane with Parallel Coordinates". Visual Computer 1 (4): 69–91. doi:10.1007/BF01898350. 
  6. Inselberg, Alfred (2009). Parallel Coordinates: VISUAL Multidimensional Geometry and its Applications. Springer. ISBN 978-0387215075. 
  7. Yang, Jing; Peng, Wei; Ward, Matthew O.; Rundensteiner, Elke A. (2003). "Interactive Hierarchical Dimension Ordering Spacing and Filtering for Exploration of High Dimensional Datasets". IEEE Symposium on Information Visualization (INFOVIS 2003): 3–4. http://davis.wpi.edu/~xmdv/docs/tr0313_osf.pdf. 
  8. Moustafa, Rida; Wegman, Edward J. (2006). "Multivariate continuous data – Parallel Coordinates". Graphics of Large Datasets: Visualizing a Million. Springer. pp. 143–156. ISBN 978-0387329062. 
  9. Moustafa, Rida; Wegman, Edward J. (2002). "On Some Generalizations of Parallel Coordinate Plots". Seeing a Million, A Data Visualization Workshop, Rain Am Lech (Nr.), Germany. http://herakles.zcu.cz/seminars/docs/infovis/papers/Moustafa_generalized_parallel_coordinates.pdf. 
  10. Inselberg, A. (1997), "Multidimensional detective", Information Visualization, 1997. Proceedings., IEEE Symposium on, pp. 100–107, doi:10.1109/INFVIS.1997.636793, ISBN 0-8186-8189-6 
  11. Elke Achtert, Hans-Peter Kriegel, Erich Schubert, Arthur Zimek (2013). "Interactive data mining with 3D-parallel-coordinate-trees". Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. New York City, NY. pp. 1009–1012. doi:10.1145/2463676.2463696. ISBN 9781450320375. 
  12. Kosara, Robert (2010). "Parallel Coordinates". http://eagereyes.org/techniques/parallel-coordinates. 
  13. Parallel Coordinates in Pandas

Further reading

  • Heinrich, Julian and Weiskopf, Daniel (2013) State of the Art of Parallel Coordinates, Eurographics 2013 - State of the Art Reports, pp. 95–116
  • Moustafa, Rida (2011) Parallel coordinate and parallel coordinate density plots, Wiley Interdisciplinary Reviews: Computational Statistics, Vol 3(2), pp. 134–148.
  • Weidele, Daniel Karl I. (2019) Conditional Parallel Coordinates, IEEE Visualization Conference (VIS) 2019, pp. 221–225

External links