List-labeling problem

From HandWiki
Revision as of 16:21, 6 February 2024 by MainAI (talk | contribs) (fix)
Short description: A problem related to computer science

In computer science, the list-labeling problem involves maintaining a totally ordered set S supporting the following operations:

  • insert(X), which inserts X into set S;
  • delete(X), which removes X from set S;
  • label(X), which returns a label assigned to X subject to:
    • label(X) [math]\displaystyle{ \in \{0, 1, \ldots, m-1\} }[/math]
    • [math]\displaystyle{ \forall }[/math] X,Y [math]\displaystyle{ \in }[/math] S, X < Y implies label(X) < label(Y)

The cost of a list labeling algorithm is the number of label (re-)assignments per insertion or deletion. List labeling algorithms have applications in many areas, including the order-maintenance problem, cache-oblivious data structures,[1] data structure persistence,[2] graph algorithms[3][4] and fault-tolerant data structures.[5]

Sometimes the list labeling problem is presented where S is not a set of values but rather a set of objects subject to a total order. In this setting, when an item is inserted into S, it is specified to be the successor of some other item already in S. For example, this is the way that list labeling is used in the order-maintenance problem. The solutions presented below apply to both formulations.

Upper bounds

The cost of list labeling is related to [math]\displaystyle{ m }[/math], the range of the labels assigned. Suppose that no more than [math]\displaystyle{ n }[/math] items are stored in the list-labeling structure at any time. Four cases have been studied:

  • [math]\displaystyle{ m = 2^{\Omega(n)} }[/math]
  • [math]\displaystyle{ m = n^{\Omega(1)} }[/math]
  • [math]\displaystyle{ m = O(n) }[/math]
  • [math]\displaystyle{ m = (1+\varepsilon) n }[/math]

Exponential Labels

In the exponential label case, each inserted item can be given a label that is the average of its neighboring labels. It takes [math]\displaystyle{ \Omega(n) }[/math] insertions before two items have adjacent labels, leaving no label available for an item inserted between them. When this happens, all items are relabeled evenly across the whole label space, which incurs [math]\displaystyle{ O(n) }[/math] relabeling cost. Thus, the amortized relabeling cost in this case is [math]\displaystyle{ O(1) }[/math].[6]
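The midpoint-and-rebuild scheme described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name `ExponentialLabeler`, the method `insert_after`, and the `relabels` counter are hypothetical names, and the sketch assumes the number of items stays below [math]\displaystyle{ m }[/math]. The linear-time `list.index` lookup is only for brevity; the cost of interest is the number of label assignments.

```python
class ExponentialLabeler:
    """Midpoint labeling with a global relabel when no label is free (illustrative sketch)."""

    def __init__(self, m):
        self.m = m            # labels are drawn from {0, ..., m-1}
        self.items = []       # items in list order
        self.label = {}       # item -> current label
        self.relabels = 0     # total number of label (re-)assignments so far

    def insert_after(self, pred, x):
        """Insert x immediately after pred; pred=None inserts at the front."""
        i = 0 if pred is None else self.items.index(pred) + 1
        # Sentinel labels -1 and m stand in for missing neighbors.
        lo = -1 if i == 0 else self.label[self.items[i - 1]]
        hi = self.m if i == len(self.items) else self.label[self.items[i]]
        self.items.insert(i, x)
        if hi - lo >= 2:
            # A free label exists strictly between the neighbors: take the midpoint.
            self.label[x] = (lo + hi) // 2
            self.relabels += 1
        else:
            # Neighbors are adjacent: spread all items evenly over the label space.
            self._relabel_all()

    def _relabel_all(self):
        n = len(self.items)
        for k, item in enumerate(self.items):
            self.label[item] = (k + 1) * self.m // (n + 1)
        self.relabels += n
```

Repeatedly inserting at the same position halves the available gap each time, so with [math]\displaystyle{ m = 2^{\Omega(n)} }[/math] a global relabel is triggered only after [math]\displaystyle{ \Omega(n) }[/math] insertions.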

Polynomial Labels

The other cases of list labeling can be solved via balanced binary search trees. Consider [math]\displaystyle{ T }[/math], a binary search tree on S of height [math]\displaystyle{ h }[/math]. We can label every node in the tree via a path label as follows: Let [math]\displaystyle{ \sigma(X) }[/math] be the sequence of left and right edges on the root-to-[math]\displaystyle{ X }[/math] path, encoded as bits. So if [math]\displaystyle{ X }[/math] is in the left subtree of the root, the high-order bit of [math]\displaystyle{ \sigma(X) }[/math] is [math]\displaystyle{ 0 }[/math], and if it is in the right subtree of the root, the high-order bit of [math]\displaystyle{ \sigma(X) }[/math] is [math]\displaystyle{ 1 }[/math]. Once we reach [math]\displaystyle{ X }[/math], we complete [math]\displaystyle{ \sigma(X) }[/math] to a length of [math]\displaystyle{ h+1 }[/math] as follows. If [math]\displaystyle{ X }[/math] is a leaf, we append [math]\displaystyle{ 0 }[/math]s as the low order bits until [math]\displaystyle{ \sigma(X) }[/math] has [math]\displaystyle{ h+1 }[/math] bits. If [math]\displaystyle{ X }[/math] is an internal node, we append one [math]\displaystyle{ 0 }[/math] and then [math]\displaystyle{ 1 }[/math]s as the low order bits until [math]\displaystyle{ \sigma(X) }[/math] has [math]\displaystyle{ h+1 }[/math] bits.

The important properties of [math]\displaystyle{ \sigma() }[/math] are that: these labels are in the range [math]\displaystyle{ \{0, 1, \ldots, 2^{h+1}-1\} }[/math]; and for two nodes with keys [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] in [math]\displaystyle{ T, }[/math] if [math]\displaystyle{ X\lt Y, }[/math] then [math]\displaystyle{ \sigma(X) \lt \sigma(Y) }[/math]. To see this latter property, notice that the property is true if the least common ancestor of [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] is neither [math]\displaystyle{ X }[/math] nor [math]\displaystyle{ Y }[/math], because [math]\displaystyle{ \sigma(X) }[/math] and [math]\displaystyle{ \sigma(Y) }[/math] will share bits until their least common ancestor. If [math]\displaystyle{ X\lt Y }[/math], then because [math]\displaystyle{ T }[/math] is a search tree, [math]\displaystyle{ X }[/math] will be in the left subtree and will have a next bit of [math]\displaystyle{ 0 }[/math], whereas [math]\displaystyle{ Y }[/math] will be in the right subtree and will have a next bit of [math]\displaystyle{ 1 }[/math].

Suppose instead that, without loss of generality, the least common ancestor of [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] is [math]\displaystyle{ X }[/math], and that [math]\displaystyle{ X }[/math] has depth [math]\displaystyle{ d }[/math]. If [math]\displaystyle{ Y }[/math] is in the left subtree of [math]\displaystyle{ X }[/math], then [math]\displaystyle{ \sigma(X) }[/math] and [math]\displaystyle{ \sigma(Y) }[/math] share the first [math]\displaystyle{ d+1 }[/math] bits. The remaining bits of [math]\displaystyle{ \sigma(X) }[/math] are all 1s, whereas the remaining bits of [math]\displaystyle{ \sigma(Y) }[/math] must have a [math]\displaystyle{ 0 }[/math], so [math]\displaystyle{ \sigma(Y)\lt \sigma(X) }[/math]. If instead [math]\displaystyle{ Y }[/math] is in the right subtree of [math]\displaystyle{ X }[/math], then [math]\displaystyle{ \sigma(X) }[/math] and [math]\displaystyle{ \sigma(Y) }[/math] share the first [math]\displaystyle{ d }[/math] bits and the [math]\displaystyle{ d+1 }[/math]st bit of [math]\displaystyle{ \sigma(X) }[/math] is [math]\displaystyle{ 0 }[/math], whereas the [math]\displaystyle{ d+1 }[/math]st bit of [math]\displaystyle{ \sigma(Y) }[/math] is [math]\displaystyle{ 1 }[/math]. Hence [math]\displaystyle{ \sigma(X)\lt \sigma(Y) }[/math].

We conclude that the [math]\displaystyle{ \sigma() }[/math] function fulfills the monotonicity property of the label() function. Thus if we can balance the binary tree to a depth of [math]\displaystyle{ (\log m) -1 }[/math], we will have a solution to the list labeling problem for labels in the range [math]\displaystyle{ \{0,\ldots,m-1\} }[/math].
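As a concrete check of the path-label construction, the following sketch computes [math]\displaystyle{ \sigma(X) }[/math] for every node of a small binary search tree. The names `Node`, `height`, and `path_labels` are hypothetical helpers introduced for illustration; height is measured in edges, so labels have [math]\displaystyle{ h+1 }[/math] bits.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def height(t):
    """Height in edges; an empty tree has height -1."""
    return -1 if t is None else 1 + max(height(t.left), height(t.right))


def path_labels(root):
    """Return {key: sigma(key)} where sigma is the (h+1)-bit path label."""
    h = height(root)
    labels = {}

    def visit(node, prefix, depth):
        if node is None:
            return
        pad = h + 1 - depth  # number of low-order bits still to fill
        if node.left is None and node.right is None:
            sigma = prefix << pad                              # leaf: pad with 0s
        else:
            sigma = (prefix << pad) | ((1 << (pad - 1)) - 1)   # internal: one 0, then 1s
        labels[node.key] = sigma
        visit(node.left, prefix << 1, depth + 1)               # left edge contributes bit 0
        visit(node.right, (prefix << 1) | 1, depth + 1)        # right edge contributes bit 1

    visit(root, 0, 0)
    return labels
```

For the perfectly balanced tree on keys 1 to 7 (height 2, so 3-bit labels), this assigns the labels 0 to 6 in key order, for example [math]\displaystyle{ \sigma(4) = 011_2 = 3 }[/math] for the root.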

Weight-balanced trees

In order to use a self-balancing binary search tree to solve the list labeling problem, we need to first define the cost function of a balancing operation on insertion or deletion to equal the number of labels that are changed, since every rebalancing operation of the tree would have to also update all path labels in the subtree rooted at the site of the rebalance. So, for example, rotating a node with a subtree of size [math]\displaystyle{ k }[/math], which can be done in constant time under usual circumstances, requires [math]\displaystyle{ \Omega(k) }[/math] path label updates. In particular, if the node being rotated is the root then the rotation would take time linear in the size of the whole tree. With that much time the entire tree could be rebuilt. We will see below that there are self-balancing binary search tree data structures that cause an appropriate number of label updates during rebalancing.

A weight-balanced tree BB[[math]\displaystyle{ \alpha }[/math]] is defined as follows. For every node [math]\displaystyle{ X }[/math] in a rooted tree [math]\displaystyle{ T }[/math], define [math]\displaystyle{ size(X) }[/math] to be the number of nodes in the subtree rooted at [math]\displaystyle{ X }[/math]. Let the left and right children of [math]\displaystyle{ X }[/math] be [math]\displaystyle{ X.left }[/math] and [math]\displaystyle{ X.right }[/math], respectively. A tree [math]\displaystyle{ T }[/math] is [math]\displaystyle{ \alpha }[/math]-weight balanced if for every internal node [math]\displaystyle{ X }[/math] in [math]\displaystyle{ T }[/math], [math]\displaystyle{ size(X.left) \ge \lfloor \alpha \cdot size(X)\rfloor }[/math] and [math]\displaystyle{ size(X.right) \ge \lfloor \alpha \cdot size(X)\rfloor. }[/math] Consequently, each child subtree holds at most a [math]\displaystyle{ (1-\alpha) }[/math] fraction of the nodes in its parent's subtree.

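The height of a BB[[math]\displaystyle{ \alpha }[/math]] tree with [math]\displaystyle{ n }[/math] nodes is at most [math]\displaystyle{ \log_{1/(1-\alpha)} n=-\log(n)/\log(1-\alpha). }[/math] Therefore, in order to solve the list-labeling problem, we need this bound to be at most [math]\displaystyle{ \log(m) - 1. }[/math] Setting [math]\displaystyle{ \log_{1/(1-\alpha)} n = \log(m)-1 }[/math] and solving for [math]\displaystyle{ \alpha }[/math] gives [math]\displaystyle{ 1-\alpha = n^{-1/(\log(m)-1)}, }[/math] that is, [math]\displaystyle{ \alpha = 1 - n^{-1/(\log(m)-1)}. }[/math]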

A scapegoat tree is a weight-balanced tree in which, whenever a node no longer satisfies the weight-balance condition, the entire subtree rooted at that node is rebuilt. This rebalancing scheme is ideal for list labeling, since the cost of rebalancing now equals the cost of relabeling. The amortized cost of an insertion or deletion is [math]\displaystyle{ (1+ 1/(1-2\alpha))\log_{1/(1-\alpha)} n + O(1). }[/math] For the list labeling problem, the cost becomes:

  • [math]\displaystyle{ m = n^{\Omega(1)} }[/math]: [math]\displaystyle{ \alpha = O(1) }[/math], the cost of list labeling is amortized [math]\displaystyle{ O(\log n). }[/math] (Folklore, modification of Itai, Konheim and Rodeh.[7])
  • [math]\displaystyle{ m = O(n) }[/math]: [math]\displaystyle{ \alpha = 1/2 - \Theta(1/\log n) }[/math], the cost of list labeling is amortized [math]\displaystyle{ O(\log^2 n). }[/math] This bound was first achieved by Itai, Konheim, and Rodeh[7] and deamortized by Willard.[8]
  • [math]\displaystyle{ m = (1+\varepsilon) n }[/math]: If [math]\displaystyle{ m }[/math] is a power of two, then we can set [math]\displaystyle{ \alpha = 1/2 - \Theta(\varepsilon/\log n) }[/math], and the cost of list labeling is [math]\displaystyle{ O(\varepsilon^{-1}\log^2 n) }[/math]. A more careful algorithm can achieve this bound even in the case where [math]\displaystyle{ m }[/math] is not a power of two.
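The rebuild-on-imbalance idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of any cited paper: it assumes distinct keys, uses the convention that each child of a node [math]\displaystyle{ X }[/math] must hold at least [math]\displaystyle{ \lfloor \alpha \cdot size(X) \rfloor }[/math] nodes with [math]\displaystyle{ \alpha \lt 1/2 }[/math], rebuilds the topmost unbalanced node on the insertion path, and charges one label change per node in the rebuilt subtree (or one for the new node if nothing is rebuilt). All function names are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.size = key, None, None, 1


def size(t):
    return t.size if t else 0


def height(t):
    """Height in edges; an empty tree has height -1."""
    return -1 if t is None else 1 + max(height(t.left), height(t.right))


def balanced(t, alpha):
    need = int(alpha * t.size)  # floor(alpha * size(X))
    return size(t.left) >= need and size(t.right) >= need


def flatten(t, out):
    """Append the keys of subtree t to out in sorted order."""
    if t:
        flatten(t.left, out)
        out.append(t.key)
        flatten(t.right, out)


def build(keys, lo, hi):
    """Build a perfectly balanced subtree on keys[lo:hi]."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    t = Node(keys[mid])
    t.left, t.right = build(keys, lo, mid), build(keys, mid + 1, hi)
    t.size = hi - lo
    return t


def insert(root, key, alpha):
    """Insert key; return (new_root, number_of_labels_changed)."""
    if root is None:
        return Node(key), 1
    path, t = [], root
    while t is not None:            # ordinary BST descent, updating subtree sizes
        path.append(t)
        t.size += 1
        t = t.left if key < t.key else t.right
    parent = path[-1]
    if key < parent.key:
        parent.left = Node(key)
    else:
        parent.right = Node(key)
    for i, anc in enumerate(path):  # scan from the root down for the topmost violation
        if not balanced(anc, alpha):
            keys = []
            flatten(anc, keys)
            sub = build(keys, 0, len(keys))   # every path label in this subtree changes
            if i == 0:
                return sub, len(keys)
            above = path[i - 1]
            if above.left is anc:
                above.left = sub
            else:
                above.right = sub
            return root, len(keys)
    return root, 1                  # no rebuild: only the new node receives a label
```

Summing the returned costs over a sequence of insertions gives the total number of label changes, which can be compared against the amortized bounds listed above for different choices of [math]\displaystyle{ \alpha }[/math].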

Lower bounds and open problems

In the case where [math]\displaystyle{ m = n^{1+\Theta(1)} }[/math], a lower bound of [math]\displaystyle{ \Omega(\log n) }[/math][9] has been established for list labeling. This lower bound applies to randomized algorithms, and so the known bounds for this case are tight.

In the case where [math]\displaystyle{ m = {(1+\Theta(1))}n }[/math], there is a lower bound of [math]\displaystyle{ \Omega(\log^2 n) }[/math] list labeling cost for deterministic algorithms.[6] Furthermore, the same lower bound holds for smooth algorithms, which are those whose only relabeling operation assigns labels evenly in a range of items.[10] This lower bound is surprisingly strong in that it applies even in the offline case, where all insertions and deletions are known ahead of time.

However, the best lower bound known for the linear case of algorithms that are allowed to be non-smooth and randomized is [math]\displaystyle{ \Omega(\log n) }[/math]. Indeed, it has been an open problem since 1981 to close the gap between the [math]\displaystyle{ O(\log^2 n) }[/math] upper bound and the [math]\displaystyle{ \Omega(\log n) }[/math] lower bound in the linear case.[7][11] Some progress on this problem has been made by Bender et al., who give a randomized upper bound of [math]\displaystyle{ O(\log^{1.5} n) }[/math].[12]

Applications

The best-known applications of list labeling are the order-maintenance problem and packed-memory arrays for cache-oblivious data structures. The order-maintenance problem is that of maintaining a data structure on a linked list to answer order queries: given two items in the linked list, which is closer to the front of the list? This problem can be solved directly by polynomial list labeling in [math]\displaystyle{ O(\log n) }[/math] time per insertion and deletion and [math]\displaystyle{ O(1) }[/math] time per query, by assigning labels that are monotone with the rank in the list. The time for insertions and deletions can be improved to constant time by combining polynomial list labeling with exponential list labeling on small lists.

The packed-memory array is an array of size [math]\displaystyle{ (1+\varepsilon)n }[/math] to hold [math]\displaystyle{ n }[/math] items so that any subarray of size [math]\displaystyle{ k }[/math] holds [math]\displaystyle{ \Theta(k) }[/math] items. This can be solved directly by the [math]\displaystyle{ m=(1+\varepsilon)n }[/math] case of list labeling, by using the labels as addresses in the array, as long as the solution guarantees that the space between items is [math]\displaystyle{ O(1) }[/math]. Packed-memory arrays are used in cache-oblivious data structures to store data that must be indexed and scanned. The density bounds guarantee that a scan through the data is asymptotically optimal in the external-memory model for any block transfer size.
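The idea of using labels as array addresses can be illustrated with the even-spreading step on its own. The sketch below is not a full packed-memory array (it omits insertion and rebalancing of windows); the helper names `spread_evenly` and `max_gap` are hypothetical, and items are assumed not to be `None`, which marks an empty slot.

```python
def spread_evenly(items, m):
    """Place the n items, in order, into an array of m >= n slots at evenly spaced addresses."""
    n = len(items)
    if n > m:
        raise ValueError("array too small for the items")
    slots = [None] * m
    for k, item in enumerate(items):
        slots[k * m // n] = item   # the label of the k-th item is its array address
    return slots


def max_gap(slots):
    """Length of the longest run of empty slots."""
    longest = run = 0
    for s in slots:
        run = run + 1 if s is None else 0
        longest = max(longest, run)
    return longest
```

For example, spreading 10 items over 15 slots (so [math]\displaystyle{ \varepsilon = 1/2 }[/math]) leaves at most one empty slot between consecutive items, so a scan over any subarray touches only a constant factor more cells than items.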

References

  1. "Cache-oblivious B-trees", SIAM Journal on Computing 35 (2): 341–358, 2005, doi:10.1137/S0097539701389956, http://erikdemaine.org/papers/CacheObliviousBTrees_SICOMP/paper.pdf .
  2. Driscoll, James R.; Sarnak, Neil (1989), "Making data structures persistent", Journal of Computer and System Sciences 38 (1): 86–124, doi:10.1016/0022-0000(89)90034-2 .
  3. "Sparsification—a technique for speeding up dynamic graph algorithms", Journal of the ACM 44 (5): 669–696, 1997, doi:10.1145/265910.265914 .
  4. Katriel, Irit (2006), "Online topological ordering", ACM Transactions on Algorithms 2 (3): 364–379, doi:10.1145/1159892.1159896 .
  5. Aumann, Yonatan (1996), "Fault tolerant data structures", Proceedings of the 37th Annual Symposium on Foundations of Computer Science (FOCS 1996), pp. 580–589, doi:10.1109/SFCS.1996.548517, ISBN 978-0-8186-7594-2 .
  6. Bulánek, Jan; Koucký, Michal (2015), "Tight Lower Bounds for the Online Labeling Problem", SIAM Journal on Computing, 44, pp. 1765–1797 .
  7. Itai, Alon; Konheim, Alan G.; Rodeh, Michael (1981), "A Sparse Table Implementation of Priority Queues", ICALP, pp. 417–431 .
  8. Willard, Dan E. (1992), "A Density Control Algorithm for Doing Insertions and Deletions in a Sequentially Ordered File in Good Worst-Case Time", Information and Computation, 97, pp. 150–204 .
  9. Dietz, Paul F.; Seiferas, Joel I.; Zhang, Ju (1994), "A tight lower bound for on-line monotonic list labeling", Algorithm theory—SWAT '94 (Aarhus, 1994), Lecture Notes in Computer Science, 824, Berlin: Springer, pp. 131–142, doi:10.1007/3-540-58218-5_12, ISBN 978-3-540-58218-2 .
  10. Dietz, Paul F.; Zhang, Ju (1990), "Lower bounds for monotonic list labeling", Algorithm theory—SWAT '90, pp. 173–180 .
  11. Saks, Michael (2018), "Online Labeling: Algorithms, Lower Bounds and Open Questions", International Computer Science Symposium in Russia, pp. 23–28 .
  12. Bender, Michael A.; Conway, Alex; Farach-Colton, Martin; Komlos, Hanna; Kuszmaul, William; Wein, Nicole (October 2022), "Online List Labeling: Breaking the log² n Barrier", 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), IEEE, pp. 980–990, doi:10.1109/focs54457.2022.00096, ISBN 978-1-6654-5519-0, http://dx.doi.org/10.1109/focs54457.2022.00096 .