Palindrome tree

From HandWiki
Short description: Data structure for processing palindromes
Palindrome tree
TypeTree
Invented2015
Invented byMikhail Rubinchik, Arseny M. Shur
Time complexity in big O notation
Algorithm Average Worst case
Space O(n) O(n)
Search O(n*log σ) O(n*log σ)
Insert O(log σ) O(n)

In computer science a palindrome tree, also called an EerTree,[1] is a type of search tree, that allows for fast access to all palindromes contained in a string. They can be used to solve the longest palindromic substring, the k-factorization problem[2] (can a given string be divided into exactly k palindromes), palindromic length of a string[3] (what is the minimum number of palindromes needed to construct the string), and finding and counting all distinct sub-palindromes. Palindrome trees do this in an online manner, that is it does not require the entire string at the start and can be added to character by character.

Description

Palindrome Tree example for TACOCAT, where solid lines are character edges and dashed lines are suffix edges

Like most trees, a palindrome tree consists of vertices and directed edges. Each vertex in the tree represents a palindrome (e.g. 'tacocat') but only stores the length of the palindrome, and each edge represents either a character or a suffix. The character edges represent that when the character is appended to both ends of the palindrome represented by the source vertex, the palindrome in the destination vertex is created (e.g. an edge labeled 't' would connect the source vertex 'acoca' to the destination vertex 'tacocat'). The suffix edge connects each palindrome to the largest palindrome suffix it possesses (in the previous example 'tacocat' would have a suffix edge to 't', and 'atacocata' would have a suffix link to 'ata'). Where palindrome trees differ from regular trees, is that they have two roots (as they are in fact two separate trees). The two roots represent palindromes of length −1, and 0. That is, if the character 'a' is appended to both roots the tree will produce 'a' and 'aa' respectively. Since each edge adds (or removes) an even number of characters, the two trees are only ever connected by suffix edges.

Operations

Add

Since a palindrome tree follows an online construction, it maintains a pointer to the last palindrome added to the tree. To add the next character to the palindrome tree, add(x) first checks if the first character before the palindrome matches the character being added, if it does not, the suffix links are followed until a palindrome can be added to the tree. Once a palindrome has been found, if it already existed in the tree, there is no work to do. Otherwise, a new vertex is added with a link from the suffix to the new vertex, and a suffix link for the new vertex is added. If the length of the new palindrome is 1, the suffix link points to the root of the palindrome tree that represents a length of −1.

# S -> Input String
# x -> position in the string of the character being added
def add(x: int) -> bool:
    """Add character to the palindrome tree."""
    while True:
        if x - 1 - current.length >= 0 and S[x - 1 - current.length] == S[x]:
            break
        current = current.suffix

    if current.add[S[x]] is not None:
        return False

    suffix = current
    current = Palindrome_Vertex()
    current.length = suffix.length + 2
    suffix.add[S[x]] = current

    if current.length == 1:
        current.suffix = root
        return True

    while True:
        suffix = suffix.suffix
        if x - 1 - suffix.length >= 0 and S[x - 1 - suffix.length] == S[x]:
            current.suffix = suffix.add[S[x]]
            return True

Joint trees

Finding palindromes that are common to multiple strings or unique to a single string can be done with [math]\displaystyle{ O(n*i) }[/math] additional space where [math]\displaystyle{ i }[/math] is the number of strings being compared. This is accomplished by adding an array of length [math]\displaystyle{ i }[/math] to each vertex, and setting the flag to 1 at index [math]\displaystyle{ i }[/math] if that vertex was reached when adding string [math]\displaystyle{ i }[/math]. The only other modification needed is to reset the current pointer to the root at the end of each string. By joining trees in such a manner the following problems can be solved:

  • Number of palindromes common to all strings
  • Number of unique palindromes in a string
  • Longest palindrome common to all strings
  • The number of palindromes that occur more often in one string than others

Complexity

Time

Constructing a palindrome tree takes [math]\displaystyle{ O(n \log{\sigma}) }[/math] time, where [math]\displaystyle{ n }[/math] is the length of the string and [math]\displaystyle{ \sigma }[/math] is the size of the alphabet. With [math]\displaystyle{ n }[/math] calls to add(x), each call takes [math]\displaystyle{ O(\log{\sigma}) }[/math] amortized time. This is a result of each call to add(x) increases the depth of the current vertex (the last palindrome in the tree) by at most one, and searching all possible character edges of a vertex takes [math]\displaystyle{ O(\log{\sigma}) }[/math] time. By assigning the cost of moving up and down the tree to each call to add(x), the cost of moving up the tree more than once is 'paid for' by an equal number of calls to add(x) when moving up the tree did not occur.

Space

A palindrome tree takes [math]\displaystyle{ O(n) }[/math] space: At most [math]\displaystyle{ n+2 }[/math] vertices to store the sub-palindromes and two roots, [math]\displaystyle{ n }[/math] edges, linking the vertices and [math]\displaystyle{ n+2 }[/math] suffix edges.

Space–time tradeoff

If instead of storing only the add edges that exist for each palindrome an array of length [math]\displaystyle{ \sigma }[/math] edges is stored, finding the correct edge can be done in constant time reducing construction time to [math]\displaystyle{ O(n + p*\sigma) }[/math] while increasing space to [math]\displaystyle{ O(p*\sigma) }[/math], where [math]\displaystyle{ p }[/math] is the number of palindromes.

References

  1. Rubinchik, Mikhail; Shur, Arseny M. (2015). "Eertree: An Efficient Data Structure for Processing Palindromes in Strings". European Journal of Combinatorics. 
  2. Galil, Zvi; Seiferas, Joel (1978). "A Linear-Time On-Line Recognition Algorithm for Palstar". Journal of the ACM 25 (1): 102–111. doi:10.1145/322047.322056. 
  3. Fici, Gabriele; Gagie, Travis; Kärkkäinen, Juha; Kempa, Dominik (2014). "A subquadratic algorithm for minimum palindromic factorization". Journal of Discrete Algorithms 28: 41–48. doi:10.1016/j.jda.2014.08.001. https://www.sciencedirect.com/science/article/pii/S1570866714000525.