Metric tree

A metric tree is any tree data structure specialized to index data in metric spaces. Metric trees exploit properties of metric spaces such as the triangle inequality to make accesses to the data more efficient. Examples include the M-tree, vp-trees, cover trees, MVP trees, and BK-trees.[1]

Multidimensional search

Most algorithms and data structures for searching a dataset are based on the classical binary search algorithm. Generalizations such as the k-d tree and the range tree interleave binary search over the separate coordinates, treating each spatial coordinate as an independent search constraint. These data structures are well suited to range query problems that ask for every point [math]\displaystyle{ (x,y) }[/math] satisfying [math]\displaystyle{ \mbox{min}_x \leq x \leq \mbox{max}_x }[/math] and [math]\displaystyle{ \mbox{min}_y \leq y \leq \mbox{max}_y }[/math].
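As a rough illustration of coordinate-wise searching, the sketch below answers the rectangular range query by binary-searching a list sorted on x and then filtering the surviving points on y. It is a deliberately simplified stand-in, not a k-d tree or a range tree, and the class name XSortedIndex and its query method are hypothetical names chosen for this example.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


class XSortedIndex:
    """Hypothetical helper: points kept sorted by x for binary search."""

    def __init__(self, points: Sequence[Point]) -> None:
        self.points: List[Point] = sorted(points)
        self.xs: List[float] = [x for x, _ in self.points]

    def query(self, min_x: float, max_x: float,
              min_y: float, max_y: float) -> List[Point]:
        # Binary search narrows the candidates to min_x <= x <= max_x.
        lo = bisect_left(self.xs, min_x)
        hi = bisect_right(self.xs, max_x)
        # The y constraint is then checked independently on that slice.
        return [p for p in self.points[lo:hi] if min_y <= p[1] <= max_y]


index = XSortedIndex([(1, 5), (2, 2), (3, 8), (4, 4), (6, 1), (7, 7)])
print(index.query(2, 6, 2, 5))  # [(2, 2), (4, 4)]
```

Both steps rely on the points having coordinates that can be ordered separately, which is exactly the assumption that fails for objects known only through a distance function.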

A limitation of these multidimensional search structures is that they are defined only for objects that can be treated as vectors. They do not apply to the more general case in which the algorithm is given only a collection of objects and a function that measures the distance or similarity between two objects. If, for example, someone were to create a function that returns a value indicating how similar one image is to another, a natural algorithmic problem would be to take a dataset of images and find those that the function rates as similar to a given query image.
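The general setting can be sketched as a search that sees the objects only through a black-box distance function. The example below is an assumption-laden illustration: it uses strings with Levenshtein edit distance in place of images, and the function names edit_distance and brute_force_range are hypothetical.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (a metric on strings)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def brute_force_range(items: Iterable[T], query: T, radius: float,
                      dist: Callable[[T, T], float]) -> List[T]:
    """Return every item within `radius` of `query`, one distance call per item."""
    return [x for x in items if dist(query, x) <= radius]


words = ["book", "books", "cake", "boo", "cape"]
print(brute_force_range(words, "book", 1, edit_distance))
# ['book', 'books', 'boo']
```

The search never inspects coordinates; it only calls dist, so the same function works for any object type for which a distance can be computed.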

Metric data structures

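If the similarity measure has no structure, then a brute-force search that compares the query image to every image in the dataset is the best that can be done[citation needed]. If, however, the similarity function satisfies the triangle inequality, then the result of each comparison can be used to prune the set of candidates to be examined. For a distance function [math]\displaystyle{ d }[/math], a query [math]\displaystyle{ q }[/math], and a reference object [math]\displaystyle{ p }[/math], the triangle inequality gives [math]\displaystyle{ |d(q,p) - d(p,x)| \leq d(q,x) }[/math], so an object [math]\displaystyle{ x }[/math] for which this lower bound exceeds the search radius can be discarded without ever computing [math]\displaystyle{ d(q,x) }[/math].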
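The pruning idea can be sketched with a small vantage-point style tree. This is a minimal illustration under stated assumptions, not the algorithm of any particular paper: each node stores one object and the median distance from it to the remaining objects, and the names VPNode, build, and range_search are hypothetical.

```python
import random
from typing import Any, Callable, List, Optional, Sequence

Distance = Callable[[Any, Any], float]


class VPNode:
    """One node of a vantage-point style metric tree (hypothetical layout)."""

    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # objects x with dist(point, x) <= radius
        self.outside = outside  # objects x with dist(point, x) > radius


def build(points: Sequence[Any], dist: Distance,
          rng: Optional[random.Random] = None) -> Optional[VPNode]:
    if rng is None:
        rng = random.Random(0)
    rest = list(points)
    if not rest:
        return None
    vp = rest.pop(rng.randrange(len(rest)))  # pick a random vantage point
    if not rest:
        return VPNode(vp, 0.0, None, None)
    dists = [dist(vp, p) for p in rest]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(rest, dists) if d <= radius]
    outside = [p for p, d in zip(rest, dists) if d > radius]
    return VPNode(vp, radius, build(inside, dist, rng), build(outside, dist, rng))


def range_search(node: Optional[VPNode], query: Any, r: float,
                 dist: Distance, out: Optional[List[Any]] = None) -> List[Any]:
    """Collect every stored object within distance r of query."""
    if out is None:
        out = []
    if node is None:
        return out
    d = dist(query, node.point)
    if d <= r:
        out.append(node.point)
    # Inside objects satisfy dist(query, x) >= d - radius,
    # so the subtree is skipped when d - radius > r.
    if d - r <= node.radius:
        range_search(node.inside, query, r, dist, out)
    # Outside objects satisfy dist(query, x) > radius - d,
    # so the subtree is skipped when radius - d > r.
    if d + r >= node.radius:
        range_search(node.outside, query, r, dist, out)
    return out


def absolute_difference(a, b):
    return abs(a - b)


tree = build(range(100), absolute_difference)
print(sorted(range_search(tree, 42, 3, absolute_difference)))
# [39, 40, 41, 42, 43, 44, 45]
```

Each skipped subtree is justified solely by the triangle inequality, so the same code applies to any distance function that is a metric. With a function that violates the triangle inequality, the pruning tests could discard valid results.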

The first article on metric trees published in the open literature, which also introduced the term "metric tree", was written by Jeffrey Uhlmann in 1991.[2] Other researchers were working independently on similar data structures. In particular, Peter Yianilos claimed to have independently discovered the same method, which he called a vantage point tree (VP-tree).[3] Research on metric tree data structures blossomed in the late 1990s and included an examination by Google co-founder Sergey Brin of their use for very large databases.[4] The first textbook on metric data structures was published in 2006.[1]

Open source implementations

  • MATLAB: Metric trees are implemented in the metricTree class, which is part of the United States Naval Research Laboratory's free Tracker Component Library.[5]

References

  1. Samet, Hanan (2006). Foundations of multidimensional and metric data structures. Morgan Kaufmann. ISBN 978-0-12-369446-1. https://books.google.com/books?id=vO-NRRKHG84C. 
  2. Uhlmann, Jeffrey (1991). "Satisfying General Proximity/Similarity Queries with Metric Trees". Information Processing Letters 40 (4): 175–179. doi:10.1016/0020-0190(91)90074-r. 
  3. Yianilos, Peter N. (1993). "Data structures and algorithms for nearest neighbor search in general metric spaces". Society for Industrial and Applied Mathematics, Philadelphia, PA, USA. pp. 311–321. http://pnylab.com/papers/vptree/main.html. Retrieved 2019-03-07. 
  4. Brin, Sergey (1995). "Near Neighbor Search in Large Metric Spaces". http://www.vldb.org/conf/1995/P574.PDF. 
  5. "Tracker Component Library". Matlab Repository. https://github.com/USNavalResearchLaboratory/TrackerComponentLibrary.