Augmented map

From HandWiki

In computer science, the augmented map[1] is an abstract data type (ADT) based on ordered maps, which associates each ordered map an augmented value. For an ordered map [math]\displaystyle{ m }[/math] with key type [math]\displaystyle{ K }[/math], comparison function [math]\displaystyle{ \lt _K }[/math] on [math]\displaystyle{ K }[/math] and value type [math]\displaystyle{ V }[/math], the augmented value is defined based on two functions: a base function [math]\displaystyle{ g: K\times V \mapsto A }[/math] and a combine function [math]\displaystyle{ f: A\times A\mapsto A }[/math], where [math]\displaystyle{ A }[/math] is the type of the augmented value. The base function [math]\displaystyle{ g }[/math] converts a single entry in [math]\displaystyle{ m }[/math] to an augmented value, and the combine function [math]\displaystyle{ f }[/math] combines multiple augmented values. The combine function [math]\displaystyle{ f }[/math] is required to be associative and have an identity [math]\displaystyle{ I }[/math] (i.e., [math]\displaystyle{ A,f,I }[/math] forms a monoid). We extend the definition of the associative function [math]\displaystyle{ f }[/math] as follows:

[math]\displaystyle{ f(\emptyset)=I }[/math]
[math]\displaystyle{ f(a)=a }[/math]
[math]\displaystyle{ f(a_1,a_2,\dots,a_n)=f(f(a_1,a_2,\dots,a_{n-1}),a_n) }[/math]

Then the augmented value of an ordered map [math]\displaystyle{ m=\{(k_1,v_1),(k_2,v_2),\dots (k_n,v_n)\} }[/math] is defined as follows:

[math]\displaystyle{ A(m)=f(g(k_1,v_1),g(k_2,v_2),\dots,g(k_n,v_n)) }[/math]

Accordingly, an augmented map can be formally defined as a seven-tuple [math]\displaystyle{ \mathbb{AM}(K,\lt _K,V,A,g,f,I) }[/math]. For example, an augmented map with integral keys and values, on which the augmented value is defined as the sum of all values in the map, is defined as:

[math]\displaystyle{ M_1 = \mathbb{AM}(\mathbb{Z},\lt _\mathbb{Z},\mathbb{Z},\mathbb{Z},(k,v)\mapsto v, +_\mathbb{Z},0) }[/math]

As an abstract data type, the augmented map is often used to model problems and serves as an abstraction with a useful interface. It is designed for supporting fast range sums, which means to quickly return the augmented value of all entries in a certain key range.

Interface

In addition to the interface for a standard ordered map, the augmented map should also support functions for range sums. In particular:

  • aug_left[math]\displaystyle{ (m,k) }[/math]: returns the augmented value of all entries in [math]\displaystyle{ m }[/math] with keys no more than [math]\displaystyle{ k }[/math].
  • aug_right[math]\displaystyle{ (m,k) }[/math]: returns the augmented value of all entries in [math]\displaystyle{ m }[/math] with keys no less than [math]\displaystyle{ k }[/math].
  • aug_range[math]\displaystyle{ (m,k_1,k_2) }[/math]: returns the augmented value of all entries in [math]\displaystyle{ m }[/math] with keys in range [math]\displaystyle{ [k_1,k_2] }[/math].
  • aug_val[math]\displaystyle{ (m) }[/math]: returns the augmented value of all entries in [math]\displaystyle{ m }[/math].

Some functions, although are not defined based on augmented values, can make use of augmented values to accelerate their algorithms. Usually, they would require some certain representation of augmented maps, and certain conditions for input parameters.

  • aug_filter[math]\displaystyle{ (m,h) }[/math]: returns all entries in [math]\displaystyle{ m }[/math] satisfying the indicator [math]\displaystyle{ h: A\mapsto \text{bool} }[/math]. It is only applicable when [math]\displaystyle{ h(a)\vee h(b)\Leftrightarrow h(f(a,b)) }[/math]. In this case, the aug_filter function is equivalent to filter[math]\displaystyle{ m,h' }[/math], where [math]\displaystyle{ h':K\times V\mapsto \text{bool} }[/math] and [math]\displaystyle{ h(g(k,v))\Leftrightarrow h'(k,v) }[/math]. When the augmented map is implemented using augmented trees, this function can be implemented asymptotically more efficient than the naive implementation.
  • aug_project[math]\displaystyle{ (g',f',m,k_1,k_2) }[/math]: returns [math]\displaystyle{ g'(aug\_range(m,k_1,k_2)) }[/math]. Here [math]\displaystyle{ g':A\mapsto B }[/math], [math]\displaystyle{ f':B\times B\mapsto B }[/math]. It requires [math]\displaystyle{ (B,f',g'(I)) }[/math] to be a monoid and [math]\displaystyle{ f'(g'(a),g'(b))=g'(f(a,b)) }[/math]. This function is useful when the augmented values

are themselves maps or other large data structures. It allows projecting the augmented values down onto another type by [math]\displaystyle{ g' }[/math] (e.g. project augmented values with complicated structures to values like their sizes) then summing them by [math]\displaystyle{ f' }[/math], and is much more efficient when applicable.

Implementation

Augmented trees

The augmented map can be supported efficiently by augmented trees, where each tree node is augmented by the augmented value of all entries in its subtree. Because of the associativity of the combine function [math]\displaystyle{ f }[/math], the augmented value of a certain tree is fixed, and is independent of the shape of the tree, regardless of rotations. In this case, by combining the partial sums in the tree nodes, any range sum can be returned in [math]\displaystyle{ O(\log n) }[/math] time on an augmented map of size [math]\displaystyle{ n }[/math], assuming both [math]\displaystyle{ f }[/math] and [math]\displaystyle{ g }[/math] have constant cost. For aug_filter, the tree structure takes the advantage that if the augmented value of a subtree does not satisfy the indicator [math]\displaystyle{ h }[/math], the whole tree gets discarded. In this case, the cost of aug_filter is [math]\displaystyle{ O(k\log(1+\frac{n}{k}) }[/math] where [math]\displaystyle{ k }[/math] is the output size. For aug_project, whenever a whole subtree falls in the range [math]\displaystyle{ [k_1,k_2] }[/math], its augmented value can be directly combined to the result, instead of requiring traversing the tree.

Prefix structures

Another implementation is to use prefix structures,[2] which stores the augmented value of all prefixes of the map. For the above-defined augmented map [math]\displaystyle{ M_1 }[/math], the prefix structure is an array stored the prefix sum of the values, sorted by their keys. Prefix structures are particularly useful for aug_left, but can be inefficient to implement other functions on the augmented map interface. It can be useful in some geometric algorithms.

Example application

1D stabbing query

A 1D stabbing query takes as input a set of intervals on 1D number line. The query asks for if a given point is covered by any input interval, or all intervals covering this point. An augmented map can be defined for this problem, where the keys are the left endpoints of all intervals, values are the corresponding right endpoints, and the augmented value is the maximum value of all right endpoints in the map. More formally:

[math]\displaystyle{ M_{interval}=(\mathbb{R},\lt _\mathbb{R},\mathbb{R},\mathbb{R},(k,v)\mapsto v, \text{max}_\mathbb{R},-\infty) }[/math]

To report if any interval covers a given point [math]\displaystyle{ p }[/math], the query algorithm simply determines if aug_left[math]\displaystyle{ (p)\gt p }[/math]. This is because aug_left looks at all intervals that start before [math]\displaystyle{ p }[/math], and if the maximum right endpoint among them exceeds [math]\displaystyle{ p }[/math], then [math]\displaystyle{ p }[/math] must be covered. When implemented with augmented trees, the augmented map [math]\displaystyle{ M_\text{interval} }[/math] is exactly an interval tree. Constructing such an augmented tree structure costs work [math]\displaystyle{ O(n\log n) }[/math], [math]\displaystyle{ O(\log n) }[/math] parallel depth and [math]\displaystyle{ O(n) }[/math] space. Each stabbing query can be done in time [math]\displaystyle{ O(\log n) }[/math].

2D range query

A 2D range query takes as input a set of points on 2 dimensions. The query asks for all points (or the number) in a certain rectangle, whose edges are parallel to the axis. An augmented map can be defined for this problem, which is a two-level nested map structure. The outer map stores all points and sort them by their x-coordinates. The augmented value of the outer map is an inner augmented map structure, which stores the same set of points as the outer map, but sorts them by their y-coordinates. The augmentation of the inner trees accumulate the count of points, i.e., the augmented value of an inner map is its size. Accordingly, the combine function of the outer map is to take a union of the two inner augmented maps. More formally:

[math]\displaystyle{ M_{inner}=(Y,\lt _Y,X,\mathbb{Z},(k,v)\mapsto 1,+_\mathbb{Z},0) }[/math]
[math]\displaystyle{ M_{outer}=(X,\lt _X,Y,M_\text{inner},(k,v)\mapsto M_\text{inner}.singleton(v,k),M_\text{inner}.\text{union},M_\text{inner}.\text{empty}) }[/math]

Here for simplicity, assume all coordinates are unique. [math]\displaystyle{ X }[/math] and [math]\displaystyle{ Y }[/math] are the types of the x- and y-coordinates. To answer the range query in rectangle [math]\displaystyle{ (x_1,x_2,y_1,y_2) }[/math], the query algorithm extracts the augmented value of the outer map in the key range [math]\displaystyle{ [x_1,x_2] }[/math], which is an inner map of all desired points sorted by y-coordinates. Therefore, the algorithm takes another aug_range on this inner map and gets the result. For counting queries, the algorithm can make use of the aug_project function.

[math]\displaystyle{ query(m,x_1,x_2,y_1,y_2)=M_\text{outer}.\text{aug}\_\text{project}(g',+_\mathbb{Z},m,x_1,x_2) }[/math]
[math]\displaystyle{ \text{where }g':M_\text{inner}\mapsto \mathbb{Z}, g'(m')=M_\text{inner}.\text{aug}\_\text{range}(m,y_1,y_2) }[/math]

If both the inner and the outer maps are implemented by augmented trees, then the whole two-level map structure becomes a range tree structure. If the inner map is supported by the augmented tree structure, and the outer tree is supported as the prefix structure, then the algorithm becomes a sweepline algorithm.

Other examples

Other examples[1][2] include segment queries, inverted index searching, rectangle queries, etc.

Library support

A parallel implementation of the augmented map interface is provided in a library PAM.

Notes

References

  1. 1.0 1.1 Blelloch, Guy E.; Ferizovic, Daniel; Sun, Yihan (2018), "PAM: parallel augmented maps", Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ACM, pp. 290-304 
  2. 2.0 2.1 Blelloch, Guy E.; Sun, Yihan (2018), "Parallel Range, Segment and Rectangle Queries with Augmented Maps", Parallel Range, Segment and Rectangle Queries with Augmented Maps