Interleave lower bound

From HandWiki

In the theory of optimal binary search trees, the interleave lower bound is a lower bound on the number of operations required by a Binary Search Tree (BST) to execute a given sequence of accesses. Several variants of this lower bound have been proven.[1][2][3] This article is based on a variation of the first Wilber's bound.[4] This lower bound is used in the design and analysis of Tango tree.[4] Furthermore, this lower bound can be rephrased and proven geometrically, Geometry of binary search trees.[5]

Definition

The bound is based on a fixed perfect BST [math]\displaystyle{ P }[/math], called the lower bound tree, over the keys [math]\displaystyle{ \{1, 2,..., n \} }[/math]. For example, for [math]\displaystyle{ n = 7 }[/math], [math]\displaystyle{ P }[/math] can be represented by the following parenthesis structure:

[([1] 2 [3]) 4 ([5] 6 [7])]

For each node [math]\displaystyle{ y }[/math] in [math]\displaystyle{ P }[/math], define:

  • [math]\displaystyle{ Left(y) }[/math] to be the set of nodes in the left sub-tree of [math]\displaystyle{ y }[/math], including [math]\displaystyle{ y }[/math].
  • [math]\displaystyle{ Right(y) }[/math] to be the set of nodes in the right sub-tree of [math]\displaystyle{ y }[/math].

Consider the following access sequence: [math]\displaystyle{ X = x_1, x_2, ..., x_m }[/math]. For a fixed node [math]\displaystyle{ y }[/math], and for each access [math]\displaystyle{ x_i }[/math], define the label of [math]\displaystyle{ x_i }[/math] with respect to [math]\displaystyle{ y }[/math] as:

  • "L" - if [math]\displaystyle{ x_i }[/math] is in [math]\displaystyle{ Left(y) }[/math].
  • "R" - if [math]\displaystyle{ x_i }[/math] is in [math]\displaystyle{ Right(y) }[/math];
  • Null - otherwise.

The label of [math]\displaystyle{ y }[/math] is the concatenation of the labels from all the accesses. For example, if the sequence of accesses is: [math]\displaystyle{ 7,6,3 }[/math] then the label of the root [math]\displaystyle{ (4) }[/math] is: "RRL", the label of 6 is: "RL", and the label of 2 is: "L".

For every node [math]\displaystyle{ y }[/math], define the amount of interleaving through y as the number of alternations between L and R in the label of [math]\displaystyle{ y }[/math]. In the above example, the interleaving through [math]\displaystyle{ 4 }[/math] and [math]\displaystyle{ 6 }[/math] is [math]\displaystyle{ 1 }[/math] and the interleaving through all other nodes is [math]\displaystyle{ 0 }[/math].

The interleave bound, [math]\displaystyle{ \mathit{IB}(X) }[/math], is the sum of the interleaving through all the nodes of the tree. The interleave bound of the above sequence is [math]\displaystyle{ 2 }[/math].

The Lower Bound Statement and its Proof

The interleave bound is summarized by the following theorem.

Theorem —  Let [math]\displaystyle{ X }[/math] be an access sequence. Denote by [math]\displaystyle{ IB(X) }[/math] the interleave bound of [math]\displaystyle{ X }[/math], then [math]\displaystyle{ \mathit{IB}(X)/2 - n }[/math] is a lower bound of [math]\displaystyle{ OPT(X) }[/math], the cost of optimal offline BST that serves [math]\displaystyle{ X }[/math].

The following proof is based on.[4]

Proof

Let [math]\displaystyle{ X = x_1,x_2,...,x_m }[/math] be an access sequence. Denote by [math]\displaystyle{ T_i }[/math] the state of an arbitrary BST at time [math]\displaystyle{ i }[/math] i.e. after executing the sequence [math]\displaystyle{ x_1,x_2,...,x_i }[/math]. We also fix a lower bound BST [math]\displaystyle{ P }[/math].

For a node [math]\displaystyle{ y }[/math] in [math]\displaystyle{ P }[/math], define the transition point for [math]\displaystyle{ y }[/math] at time [math]\displaystyle{ i }[/math] to be the minimum-depth node [math]\displaystyle{ z }[/math] in the BST [math]\displaystyle{ T_i }[/math] such that the path from the root of [math]\displaystyle{ T_i }[/math] to [math]\displaystyle{ z }[/math] includes both a node from Left(y) and a node from Right(y). Intuitively, any BST algorithm on [math]\displaystyle{ T_i }[/math] that accesses an element from Right(y) and then an element from Left(y) (or vice versa) must touch the transition point of [math]\displaystyle{ y }[/math] at least once. In the following Lemma, we will show that transition point is well-defined.

Lemma 1 — The transition point of a node [math]\displaystyle{ y }[/math] in [math]\displaystyle{ P }[/math] at a time [math]\displaystyle{ i }[/math] exists and it is unique.[4]

The second lemma that we need to prove states that the transition point is stable. It will not change until it is touched.

Lemma 2 — Given a node [math]\displaystyle{ y }[/math]. Suppose [math]\displaystyle{ z }[/math] is the transition point of [math]\displaystyle{ y }[/math] at a time [math]\displaystyle{ j }[/math]. If an access algorithm for a BST does not touch [math]\displaystyle{ z }[/math] in [math]\displaystyle{ T_i }[/math] for [math]\displaystyle{ i \in [j, k] }[/math], then the transition point of [math]\displaystyle{ y }[/math] will remain [math]\displaystyle{ z }[/math] in [math]\displaystyle{ T_i }[/math] for [math]\displaystyle{ i \in [j, k] }[/math]. [4]

The last Lemma toward the proof states that every node [math]\displaystyle{ y \in P }[/math] has its unique transition point.

Lemma 3 — Given a BST at time [math]\displaystyle{ i }[/math], [math]\displaystyle{ T_i }[/math], any node [math]\displaystyle{ y }[/math] in [math]\displaystyle{ T_i }[/math] can be only a transition for at most one node in [math]\displaystyle{ P }[/math].[4]

Now, we are ready to prove the theorem. First of all, observe that the number of touched transition points by the offline BST algorithm is a lower bound on its cost, we are counting less nodes than the required for the total cost.

We know by Lemma 3 that at any time [math]\displaystyle{ i }[/math], any node [math]\displaystyle{ y }[/math] in [math]\displaystyle{ T_i }[/math] can be only a transition for at most one node in [math]\displaystyle{ P }[/math]. Thus, It is enough to count the number of touches of a transition node of [math]\displaystyle{ y }[/math], the sum over all [math]\displaystyle{ y }[/math].

Therefore, for a fixed node [math]\displaystyle{ y \in P }[/math], let [math]\displaystyle{ \ell }[/math] and [math]\displaystyle{ r }[/math] to be defined as in Lemma 1. The transition point of [math]\displaystyle{ y }[/math] is among these two nodes. In fact, it is the deeper one. Let [math]\displaystyle{ x_{i_{1}}, x_{i_{2}}, ..., x_{i_p} }[/math] be a maximal ordered access sequence to nodes that alternate between [math]\displaystyle{ Left(y) }[/math] and [math]\displaystyle{ Right(y) }[/math]. Then [math]\displaystyle{ p }[/math] is the amount of interleaving through the node [math]\displaystyle{ y }[/math]. Suppose that the even indexed accesses are in the [math]\displaystyle{ Left(y) }[/math], and the odd ones are in [math]\displaystyle{ Right(y) }[/math] i.e. [math]\displaystyle{ x_{i_{2j}} \in Left(y) }[/math] and [math]\displaystyle{ x_{i_{2j - 1}} \in Right(y) }[/math]. We know by the properties of lowest common ancestor that an access to a node in [math]\displaystyle{ Left(y) }[/math], it must touch [math]\displaystyle{ \ell }[/math]. Similarly, an access to a node in [math]\displaystyle{ Right(y) }[/math] must touch [math]\displaystyle{ r }[/math]. Consider every [math]\displaystyle{ j \in [1, \lfloor p/2 \rfloor] }[/math]. For two consecutive accesses [math]\displaystyle{ x_{i_{2j - 1}} }[/math] and [math]\displaystyle{ x_{i_{2j}} }[/math], if they avoid touching the access point of [math]\displaystyle{ y }[/math], then [math]\displaystyle{ \ell }[/math] and [math]\displaystyle{ r }[/math] must change in between. However, by Lemma 2, such change requires touching the transition point. Consequently, the BST access algorithm touches the transition point of [math]\displaystyle{ y }[/math] at least once in the interval of [math]\displaystyle{ [i_{2j - 1}, i_{2j}] }[/math]. Summing over all [math]\displaystyle{ j \in [1, \lfloor p/2 \rfloor ] }[/math], the best algorithm touches the transition point of [math]\displaystyle{ y }[/math] at least [math]\displaystyle{ \lfloor p/2 \rfloor \geq p/2 - 1 }[/math]. Summing over all [math]\displaystyle{ y }[/math],

      [math]\displaystyle{  \sum_{y \in P} p_y/2 - 1 \geq IB(X)/2 - n  }[/math]

where [math]\displaystyle{ p_y }[/math] is the amount of interleave through [math]\displaystyle{ y }[/math]. By definition, the [math]\displaystyle{ p_y }[/math]'s add up to [math]\displaystyle{ IB(X) }[/math]. That concludes the proof.

See also

References

  1. Wilber, R. (1989). "Lower Bounds for Accessing Binary Search Trees with Rotations". SIAM Journal on Computing 18: 56–67. doi:10.1137/0218004. 
  2. Hampapuram, H.; Fredman, M. L. (1998). "Optimal Biweighted Binary Trees and the Complexity of Maintaining Partial Sums". SIAM Journal on Computing 28: 1–9. doi:10.1137/S0097539795291598. 
  3. Patrascu, M.; Demaine, E. D. (2006). "Logarithmic Lower Bounds in the Cell-Probe Model". SIAM Journal on Computing 35 (4): 932. doi:10.1137/S0097539705447256. http://erikdemaine.org/papers/DynamicConnectivity_SICOMP/paper.pdf. 
  4. 4.0 4.1 4.2 4.3 4.4 4.5 Demaine, E. D.; Harmon, D.; Iacono, J.; Pătraşcu, M. (2007). "Dynamic Optimality—Almost". SIAM Journal on Computing 37: 240–251. doi:10.1137/S0097539705447347. http://erikdemaine.org/papers/Tango_SICOMP/paper.pdf. 
  5. "The geometry of binary search trees", In Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2009) (New York): 496–505, 2009, doi:10.1137/1.9781611973068.55, ISBN 978-0-89871-680-1, http://erikdemaine.org/papers/BST_SODA2009/