Distributed tree search

From HandWiki

Distributed tree search (DTS) algorithm is a class of algorithms for searching values in an efficient and distributed manner. Their purpose is to iterate through a tree by working along multiple branches in parallel and merging the results of each branch into one common solution, in order to minimize time spent searching for a value in a tree-like data structure.

The original paper was written in 1988 by Chris Ferguson and Richard E. Korf, from the University of California's Computer Science Department. They used multiple other chess AIs to develop this wider range algorithm.

Overview

The Distributed Tree Search Algorithm (also known as Korf–Ferguson algorithm) was created to solve the following problem: "Given a tree with non-uniform branching factor and depth, search it in parallel with an arbitrary number of processors as fast as possible."

The top-level part of this algorithm is general and does not use a particular existing type of tree-search, but it can be easily specialized to fit any type of non-distributed tree-search.

DTS consists of using multiple processes, each with a node and a set of processors attached, with the goal of searching the sub-tree below the said node. Each process then divides itself into multiple coordinated sub-processes which recursively divide themselves again until an optimal way to search the tree has been found based on the number of processors available to each process. Once a process finishes, DTS dynamically reassigns the processors to other processes as to keep the efficiency to a maximum through good load-balancing, especially in irregular trees.

Once a process finishes searching, it recursively sends and merges a resulting signal to its parent-process, until all the different sub-answers have been merged and the entire problem has been solved.[1]

Applications

DTS is only applicable under two major conditions: the data structure to search through is a tree, and the algorithm can make use of at least one computation unit (Although it cannot be considered as distributed if there is only one).

One major example of the everyday use of DTS is network routing. The Internet can be seen as a tree of IP addresses, and an analogy to a routing protocol could be how post offices work in the real world. Since there are over 4.3 billion IP addresses currently, society heavily depends on the time the data takes to find its way to its destination. As such, IP-routing divides the work into multiple sub-units which each have different scales of calculation capabilities and use each other's result to find the route in a very efficient manner. This is an instance of DTS that affects over 43% of the world's population, for reasons going from entertainment to national security.[2]

Alternatives

Although DTS is currently one of the most widely used algorithms, many of its applications have alternatives to them which could potentially develop into more efficient, less resource-demanding solutions, were they more researched.

One of the more controversial examples is Big-Data processing. In applications like Google Search Engine, Facebook, YouTube, search needs to be optimized to keep waiting time inside a reasonable window. This could be achieved through the plain use of DTS, but other algorithms are used in place (for example data-hashing in SQL databases), or in conjunction (Facebook's Haystack algorithm groups parallel tree-search, data-hashing and memory-ordering/sorting).[3]

One of the more important limits of DTS is the fact that it requires a tree as input. Trees are a sub-instance of a data structure known as Graphs, which means every Graph can be converted into a tree. Although there currently exists no better way to search through trees than Korf-Ferguson's algorithm, each task has different particularities and in most cases, there will exist more efficient data structures to represent the problem and solve it than through tree-search. And so there exist instances of tree structures with cycles that cannot possibly be faster than a graph-search on the same structure with the same processing power.

Controversy

There are few controversies around Korf-Ferguson's DTS algorithm, since it is recognized as very complete, but simple. It is very often used as a stepping stone for students to discover the fundamentals and key concepts of distributed problem-solving.

The most important challenge to this algorithmic concept was an article by Kröll B, « Balanced Distributed Search Trees Do Not Exist », which does not attack the veracity or current efficiency of the algorithm, but rather the fact that DTS itself, no matter how many improvements are made to it (for example balancing the input tree before-hand), will never be able to reach optimal resolution-time.[4] This opens a new view point: are too many resources used into the completion of DTS, which blocks new algorithms with higher efficiency-potential from getting researched and developed? Another limit of DTS is the fact that no matter how efficient the division, coordination and merging of the solutions is, it will always be limited by the material number or processors and their processing power.

See also

  • Colbrook A., Brewer E., Dellarocas C., Weihl W., "Algorithms for Search Trees on Message-Passing Architectures" (1996)
  • Colbrook A., Smythe C., Efficient implementations of search trees on parallel distributed memory architectures" (1990)
  • Bayer R., McCreight E., Organization and Maintenance of Large Ordered Indices. Acta Informatica 1 (1972)
  • Comer D., The Ubiquitous B-Tree (1979)

References

  1. KORF Richard E., FERGUSON Chris (1988). "Distributed Tree Search and its Application to Alpha-Beta Pruning". University of California, Los Angeles. http://aaaipress.org/Papers/AAAI/1988/AAAI88-023.pdf. 
  2. "What is IP Routing ?". MetaSwitch Networks. 2016. http://www.metaswitch.com/resources/what-is-ip-routing. 
  3. Vajgel, Peter (April 30, 2009). "Needle in a Haystack : Efficient Storage of Billions of Photos". Facebook (company). https://code.facebook.com/posts/685565858139515/needle-in-a-haystack-efficient-storage-of-billions-of-photos/. 
  4. Kröll, Brigitte; Widmayer, Peter (1995-08-16). Akl, Selim G.. ed (in en). Balanced distributed search trees do not exist. Lecture Notes in Computer Science. Springer Berlin Heidelberg. pp. 50–61. doi:10.1007/3-540-60220-8_50. ISBN 9783540602200.