Distance geometry problem

From HandWiki

The distance geometry problem is that of characterization and study of sets of points based only on given values of the distances between member pairs.[1][2][3] Therefore distance geometry has immediate relevance where distance values are determined or considered, such as biology,[4] sensor network,[5] surveying, cartography, and physics.

Applications

The distance geometry problem (DGP) is that of finding the coordinates of a set of points by using the distances between some pairs of such points.[3] There exists nowadays a large community that is actively working on this problem, because there are several real-life applications that can lead to the formulation of a DGP. As an example, an interesting application is that of locating sensors in telecommunication networks.[5] In such a case, the positions of some sensors are known (which are called anchors) and some of the distances between sensors (which can be anchors or not) are also known: the problem is to identify the positions in space for all sensors.

An interesting application arises in biology.[4][6] Experimental techniques are able to estimate distances between pairs of atoms of a given molecule, and the problem becomes the one of identifying the three-dimensional conformation of the molecule, i.e. the positions of all its atoms. In this field, the main interest is on proteins, because discovering their three-dimensional conformation allows us to get clues about the function they are able to perform. The implications in related fields, such as biomedicine and drug design, are evident. When dealing with biological molecules, the DGP is generally referred to as molecular DGP (MDGP).

In the following, even if the article considers in general the DGP, the MDGP will be used as an example.

Basic issues

A straight line is the shortest path between two points. Therefore the distance from A to B is no bigger than the length of the straight-line path from A to C plus the length of the straight-line path from C to B. This fact is called the triangle inequality. If that sum happens to be equal to the distance from A to B, then the three points A, B, and C lie on a straight line, with C between A and B.

Similarly, suppose one knows

  • the distance from A to B;
  • the distance from A to C;
  • the distance from A to D;
  • the distance from B to C;
  • the distance from B to D; and
  • the distance from C to D.

Knowing only these six numbers, one would like to figure out

  • whether A, B, C, and D lie on a common straight line;
  • whether A, B, and C lie on a common line but D is not on that line (and similarly for any of A, B, and C in the role of the one exceptional point);
  • whether all four points lie in a common plane (whether they are coplanar);
  • if they lie in a common plane, whether one of them is in the interior of the triangle formed by the other three, and if so, which one.

Distance geometry includes the solution of such problems.

Cayley–Menger determinants

Of particular utility and importance are classifications by means of Cayley–Menger determinants, named after Arthur Cayley and Karl Menger:

  • a set Λ (with at least three distinct elements) is called straight if and only if, for any three elements A, B, and C of Λ,
[math]\displaystyle{ \det \begin{bmatrix} 0 & d(AB)^2 & d(AC)^2 & 1 \\ d(AB)^2 & 0 & d(BC)^2 & 1 \\ d(AC)^2 & d(BC)^2 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} = 0, }[/math]
  • a set Π (with at least four distinct elements) is called plane if and only if, for any four elements A, B, C and D of Π,
[math]\displaystyle{ \det \begin{bmatrix} 0 & d(AB)^2 & d(AC)^2 & d(AD)^2 & 1 \\ d(AB)^2 & 0 & d(BC)^2 & d(BD)^2 & 1 \\ d(AC)^2 & d(BC)^2 & 0 & d(CD)^2 & 1 \\ d(AD)^2 & d(BD)^2 & d(CD)^2 & 0 & 1 \\ 1 & 1 & 1 & 1 & 0 \end{bmatrix} = 0, }[/math]
but not all triples of elements of Π are straight to each other;
  • a set Φ (with at least five distinct elements) is called flat if and only if, for any five elements A, B, C, D and E of Φ,
[math]\displaystyle{ \det \begin{bmatrix} 0 & d(AB)^2 & d(AC)^2 & d(AD)^2 & d(AE)^2 & 1 \\ d(AB)^2 & 0 & d(BC)^2 & d(BD)^2 & d(BE)^2 & 1 \\ d(AC)^2 & d(BC)^2 & 0 & d(CD)^2 & d(CE)^2 & 1 \\ d(AD)^2 & d(BD)^2 & d(CD)^2 & 0 & d(DE)^2 & 1 \\ d(AE)^2 & d(BE)^2 & d(CE)^2 & d(DE)^2 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 & 0 \end{bmatrix} = 0, }[/math]
but not all quadruples of elements of Φ are plane to each other; and so on.

Discretization and orders

The DGP is, by definition, a constraint satisfaction problem. It is however generally reformulated as an optimization problem in a continuous space, and its solution is then attempted by using techniques for global optimization (see for example [7]).

Under certain assumptions, however, the problem can be discretized, in the sense that the search domain of the optimization problem can be reduced to a discrete domain. When all distances are supposed to be exact (no experimental errors), the search domain becomes a binary tree, where the candidate positions for the same atom of the molecule are given on a common layer of the tree.[8][9] The discretization allows us to enumerate the entire solution set (this is not possible in general when using global optimization methods).

The discretization assumptions[2] are strongly based on the order in which the atoms of the molecule are considered. When considering the atoms of the molecule in their natural ordering, such assumptions are generally not satisfied. An interesting and fundamental pre-processing step for the discretization of DGPs is, therefore, the problem of identifying an order for the atoms that allows for the discretization. This problem can be solved in polynomial time, when all distances are supposed to be exact, as well as when some available distance is represented by a suitable interval.[10]

Software for distance geometry

  • DGSOL. It is based on the idea of approximating the penalty function with a sequence of smoother functions converging to the original objective function. It is usually used for performing comparisons to other newly proposed techniques, whose code is often not released. DGSOL solves distance geometry problems where a lower and an upper bound on the distances are available.
  • MD-jeep. This software is based on the discretization of the distance geometry problem. A Branch & Prune algorithm is implemented for the solution of the problem.
  • Xplor-NIH. It has been particularly designed for solving instances of the problem in which the data come from NMR experiments, and it includes different functionalities. In particular, for the solution of the distance geometry problems, it makes use of heuristic methods (such as Simulated Annealing) and local search methods (such as Conjugate Gradient Minimization).
  • TINKER. This is a package for molecular modeling and design. It includes many force fields for attempting the prediction of protein conformations from their chemical structure only. One of its functionalities, however, is to solve distance geometry problems.
  • SNLSDPclique. This is a MATLAB code for solving the Sensor Network Localization problem using the semidefinite facial reduction method.

Books and conferences

Crippen and Havel are two pioneers of DGP, and they co-authored the book "Distance Geometry and Molecular Conformation",[4] 1988. Much more recently, an edited book, collecting the most recent efforts from the scientific community for solving the DGP, was published by Springer.[3] See this web page for the list of contributions.

Various conferences and workshops are held every year, where the focus is on DGP-related topics. However, the very first workshop completely devoted to DGP and its applications was held in 2013 in Manaus, Brazil: DGA13.

See also

References

  1. Yemini, Y. (1978). "The positioning problem — a draft of an intermediate summary". Conference on Distributed Sensor Networks, Pittsburgh. 
  2. 2.0 2.1 Liberti, L.; Lavor, C.; Maculan, N.; Mucherino, A.. "Euclidean Distance Geometry and Applications". SIAM Review, to appear. 
  3. 3.0 3.1 3.2 Mucherino, A.; Lavor, C.; Liberti, L.; Maculan, N. (2013). Distance Geometry: Theory, Methods and Applications. 
  4. 4.0 4.1 4.2 Crippen, G.M.; Havel, T.F. (1988). "Distance Geometry and Molecular Conformation". John Wiley & Sons. 
  5. 5.0 5.1 Biswas, P.; Lian, T.; Wang, T.; Ye, Y. (2006). "Semidefinite programming based algorithms for sensor network localization". ACM Transactions in Sensor Networks 2: 188–220. doi:10.1145/1149283.1149286. 
  6. Blumenthal, L.M. (1970). Theory and applications of distance geometry (2nd ed.). Bronx, New York: Chelsea Publishing Company. pp. 347. LCCN 79113117. ISBN 0-8284-0242-6. 
  7. More, J.J.; Wu, Z. (1999). "Distance Geometry Optimization for Protein Structures". Journal of Global Optimization 15: 219–223. 
  8. Liberti, L.; Lavor, C.; Maculan, N. (2008). "A Branch-and-Prune Algorithm for the Molecular Distance Geometry Problem". International Transactions in Operational Research 15: 1–17. doi:10.1111/j.1475-3995.2007.00622.x. 
  9. Mucherino, A.; Liberti, L.; Lavor, C.; Maculan, N. (2009). "Comparisons between an Exact and a MetaHeuristic Algorithm for the Molecular Distance Geometry Problem". ACM Conference Proceedings, Genetic and Evolutionary Computation Conference (GECCO09): 333–340. 
  10. Mucherino, A. (2013). "On the Identification of Discretization Orders for Distance Geometry with Intervals". Lecture Notes in Computer Science 8085: 231–238. doi:10.1007/978-3-642-40020-9_24.