B*

From HandWiki
Short description: Algorithm

In computer science, B* (pronounced "B star") is a best-first graph search algorithm that finds the least-cost path from a given initial node to any goal node (out of one or more possible goals). First published by Hans Berliner in 1979, it is related to the A* search algorithm.

Summary

The algorithm stores intervals for nodes of the tree as opposed to single point-valued estimates. Then, leaf nodes of the tree can be searched until one of the top level nodes has an interval which is clearly "best."

Details

Interval evaluations rather than estimates

Leaf nodes of a B*-tree are given evaluations that are intervals rather than single numbers. The interval is supposed to contain the true value of that node. If all intervals attached to leaf nodes satisfy this property, then B* will identify an optimal path to the goal state.

Backup process

To back up the intervals within the tree, a parent's upper bound is set to the maximum of the upper bounds of the children. A parent's lower bound is set to the maximum of the lower bound of the children. Note that different children might supply these bounds.

Termination of search

B* systematically expands nodes in order to create "separation," which occurs when the lower bound of a direct child of the root is at least as large as the upper bound of any other direct child of the root. A tree that creates separation at the root contains a proof that the best child is at least as good as any other child.

In practice, complex searches might not terminate within practical resource limits. So the algorithm is normally augmented with artificial termination criteria such as time or memory limits. When an artificial limit is hit, then you must make a heuristic judgment about which move to select. Normally, the tree would supply you with extensive evidence, like the intervals of root nodes.

Expansion

B* is a best-first process, which means that it is very efficient to traverse the tree, repeatedly descending to find a leaf to expand. This section describes how to choose the node to expand. (Note: Whether or not the tree is memory-resident, is a function of the overall implementation efficiency, including how it may be mapped and/or managed via real or virtual memory.)

At the root of the tree, the algorithm applies one of two strategies, called prove-best and disprove-rest. In the prove-best strategy, the algorithm selects the node associated with the highest upper bound. The hope is that expanding that node will raise its lower bound higher than any other node's upper bound.

The disprove-rest strategy selects the child of the root that has the second-highest upper bound. The hope is that by expanding that node you might be able to reduce the upper bound to less than the lower bound of the best child.

Strategy selection

Note that applying the disprove-rest strategy is pointless until the lower bound of the child node that has the highest upper bound is the highest among all lower bounds.

The original algorithm description did not give any further guidance on which strategy to select. There are several reasonable alternatives, such as expanding the choice that has the smaller tree.

Strategy selection at non-root nodes

Once a child of the root has been selected (using prove-best or disprove-rest) then the algorithm descends to a leaf node by repeatedly selecting the child that has the highest upper bound.

When a leaf node is reached, the algorithm generates all successor nodes and assigns intervals to them using the evaluation function. Then the intervals of all nodes have to be backed up using the backup operation.

When transpositions are possible, then the back-up operation might need to alter the values of nodes that did not lie on the selection path. In this case, the algorithm needs pointers from children to all parents so that changes can be propagated. Note that propagation can cease when a backup operation does not change the interval associated with a node.

Robustness

If intervals are incorrect (in the sense that the game-theoretic value of the node is not contained within the interval), then B* might not be able to identify the correct path. However, the algorithm is fairly robust to errors in practice.

The Maven (Scrabble) program has an innovation that improves the robustness of B* when evaluation errors are possible. If a search terminates due to separation then Maven restarts the search after widening all of the evaluation intervals by a small amount. This policy progressively widens the tree, eventually erasing all errors.

Extension to two-player games

The B* algorithm applies to two-player deterministic zero-sum games. In fact, the only change is to interpret "best" with respect to the side moving in that node. So you would take the maximum if your side is moving, and the minimum if the opponent is moving. Equivalently, you can represent all intervals from the perspective of the side to move, and then negate the values during the back-up operation.

Applications

Andrew Palay applied B* to chess. Endpoint evaluations were assigned by performing null-move searches. There is no report of how well this system performed compared to alpha–beta pruning search engines running on the same hardware.

The Maven (Scrabble) program applied B* search to endgames. Endpoint evaluations were assigned using a heuristic planning system.

The B* search algorithm has been used to compute optimal strategy in a sum game of a set of combinatorial games.

See also

References