Stress majorization

From HandWiki
Short description: Geometric placement based on ideal distances

Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of [math]\displaystyle{ n }[/math] [math]\displaystyle{ m }[/math]-dimensional data items, a configuration [math]\displaystyle{ X }[/math] of [math]\displaystyle{ n }[/math] points in [math]\displaystyle{ r }[/math] [math]\displaystyle{ (\ll m) }[/math]-dimensional space is sought that minimizes the so-called stress function [math]\displaystyle{ \sigma(X) }[/math]. Usually [math]\displaystyle{ r }[/math] is [math]\displaystyle{ 2 }[/math] or [math]\displaystyle{ 3 }[/math], i.e. the [math]\displaystyle{ (n\times r) }[/math] matrix [math]\displaystyle{ X }[/math] lists points in [math]\displaystyle{ 2- }[/math] or [math]\displaystyle{ 3- }[/math]dimensional Euclidean space so that the result may be visualised (i.e. an MDS plot). The function [math]\displaystyle{ \sigma }[/math] is a cost or loss function that measures the squared differences between ideal ([math]\displaystyle{ m }[/math]-dimensional) distances and actual distances in r-dimensional space. It is defined as:

[math]\displaystyle{ \sigma(X)=\sum_{i\lt j\le n}w_{ij}(d_{ij}(X)-\delta_{ij})^2 }[/math]

where [math]\displaystyle{ w_{ij}\ge 0 }[/math] is a weight for the measurement between a pair of points [math]\displaystyle{ (i,j) }[/math], [math]\displaystyle{ d_{ij}(X) }[/math] is the euclidean distance between [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math] and [math]\displaystyle{ \delta_{ij} }[/math] is the ideal distance between the points (their separation) in the [math]\displaystyle{ m }[/math]-dimensional data space. Note that [math]\displaystyle{ w_{ij} }[/math] can be used to specify a degree of confidence in the similarity between points (e.g. 0 can be specified if there is no information for a particular pair).

A configuration [math]\displaystyle{ X }[/math] which minimizes [math]\displaystyle{ \sigma(X) }[/math] gives a plot in which points that are close together correspond to points that are also close together in the original [math]\displaystyle{ m }[/math]-dimensional data space.

There are many ways that [math]\displaystyle{ \sigma(X) }[/math] could be minimized. For example, Kruskal[1] recommended an iterative steepest descent approach. However, a significantly better (in terms of guarantees on, and rate of, convergence) method for minimizing stress was introduced by Jan de Leeuw.[2] De Leeuw's iterative majorization method at each step minimizes a simple convex function which both bounds [math]\displaystyle{ \sigma }[/math] from above and touches the surface of [math]\displaystyle{ \sigma }[/math] at a point [math]\displaystyle{ Z }[/math], called the supporting point. In convex analysis such a function is called a majorizing function. This iterative majorization process is also referred to as the SMACOF algorithm ("Scaling by MAjorizing a COmplicated Function").

The SMACOF algorithm

The stress function [math]\displaystyle{ \sigma }[/math] can be expanded as follows:

[math]\displaystyle{ \sigma(X)=\sum_{i\lt j\le n}w_{ij}(d_{ij}(X)-\delta_{ij})^2 =\sum_{i\lt j}w_{ij}\delta_{ij}^2 + \sum_{i\lt j}w_{ij}d_{ij}^2(X)-2\sum_{i\lt j}w_{ij}\delta_{ij}d_{ij}(X) }[/math]

Note that the first term is a constant [math]\displaystyle{ C }[/math] and the second term is quadratic in [math]\displaystyle{ X }[/math] (i.e. for the Hessian matrix [math]\displaystyle{ V }[/math] the second term is equivalent to tr[math]\displaystyle{ X'VX }[/math]) and therefore relatively easily solved. The third term is bounded by:

[math]\displaystyle{ \sum_{i\lt j}w_{ij}\delta_{ij}d_{ij}(X)=\,\operatorname{tr}\, X'B(X)X \ge \,\operatorname{tr}\, X'B(Z)Z }[/math]

where [math]\displaystyle{ B(Z) }[/math] has:

[math]\displaystyle{ b_{ij}=-\frac{w_{ij}\delta_{ij}}{d_{ij}(Z)} }[/math] for [math]\displaystyle{ d_{ij}(Z)\ne 0, i \ne j }[/math]

and [math]\displaystyle{ b_{ij}=0 }[/math] for [math]\displaystyle{ d_{ij}(Z)=0, i\ne j }[/math]

and [math]\displaystyle{ b_{ii}=-\sum_{j=1,j\ne i}^n b_{ij} }[/math].

Proof of this inequality is by the Cauchy-Schwarz inequality, see Borg[3] (pp. 152–153).

Thus, we have a simple quadratic function [math]\displaystyle{ \tau(X,Z) }[/math] that majorizes stress:

[math]\displaystyle{ \sigma(X)=C+\,\operatorname{tr}\, X'VX - 2 \,\operatorname{tr}\, X'B(X)X }[/math]
[math]\displaystyle{ \le C+\,\operatorname{tr}\, X' V X - 2 \,\operatorname{tr}\, X'B(Z)Z = \tau(X,Z) }[/math]


The iterative minimization procedure is then:

  • at the [math]\displaystyle{ k^{th} }[/math] step we set [math]\displaystyle{ Z\leftarrow X^{k-1} }[/math]
  • [math]\displaystyle{ X^k\leftarrow \min_X \tau(X,Z) }[/math]
  • stop if [math]\displaystyle{ \sigma(X^{k-1})-\sigma(X^{k})\lt \epsilon }[/math] otherwise repeat.

This algorithm has been shown to decrease stress monotonically (see de Leeuw[2]).

Use in graph drawing

Stress majorization and algorithms similar to SMACOF also have application in the field of graph drawing.[4][5] That is, one can find a reasonably aesthetically appealing layout for a network or graph by minimizing a stress function over the positions of the nodes in the graph. In this case, the [math]\displaystyle{ \delta_{ij} }[/math] are usually set to the graph-theoretic distances between nodes [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math] and the weights [math]\displaystyle{ w_{ij} }[/math] are taken to be [math]\displaystyle{ \delta_{ij}^{-\alpha} }[/math]. Here, [math]\displaystyle{ \alpha }[/math] is chosen as a trade-off between preserving long- or short-range ideal distances. Good results have been shown for [math]\displaystyle{ \alpha=2 }[/math].[6]

References

  1. Kruskal, J. B. (1964), "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis", Psychometrika 29 (1): 1–27, doi:10.1007/BF02289565 .
  2. 2.0 2.1 de Leeuw, J. (1977), "Applications of convex analysis to multidimensional scaling", in Barra, J. R.; Brodeau, F.; Romie, G. et al., Recent developments in statistics, pp. 133–145 .
  3. Borg, I.; Groenen, P. (1997), Modern Multidimensional Scaling: theory and applications, New York: Springer-Verlag .
  4. Michailidis, G.; de Leeuw, J. (2001), "Data visualization through graph drawing", Computation Stat. 16 (3): 435–450, doi:10.1007/s001800100077 .
  5. Gansner, E.; Koren, Y.; North, S. (2004), "Graph Drawing by Stress Majorization", Proceedings of 12th Int. Symp. Graph Drawing (GD'04), Lecture Notes in Computer Science, 3383, Springer-Verlag, pp. 239–250 .
  6. Cohen, J. (1997), "Drawing graphs to convey proximity: an incremental arrangement method", ACM Transactions on Computer-Human Interaction 4 (3): 197–229, doi:10.1145/264645.264657 .