Maximal pair
In computer science, a maximal pair within a string is a pair of matching substrings that are maximal, where "maximal" means that it is not possible to make a longer matching pair by extending the range of both substrings to the left or right.
Example
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
Character | x | a | b | c | y | a | b | c | w | a | b | c | y | z |
For example, in this table, the substrings at indices 2 to 4 (in red) and indices 6 to 8 (in blue) are a maximal pair, because they contain identical characters (abc
), and they have different characters to the left (x
at index 1 and y
at index 5) and different characters to the right (y
at index 5 and w
at index 9). Similarly, the substrings at indices 6 to 8 (in blue) and indices 10 to 12 (in green) are a maximal pair.
However, the substrings at indices 2 to 4 (in red) and indices 10 to 12 (in green) are not a maximal pair, as the character y
follows both substrings, and so they can be extended to the right to make a longer pair.
Formal definition
Formally, a maximal pair of substrings with starting positions [math]\displaystyle{ p_1 }[/math] and [math]\displaystyle{ p_2 }[/math] respectively, and both of length [math]\displaystyle{ l }[/math], is specified by a triple [math]\displaystyle{ (p_1, p_2, l) }[/math], such that, given a string [math]\displaystyle{ S }[/math] of length [math]\displaystyle{ n }[/math], [math]\displaystyle{ S[p_1..p_1+l-1]=S[p_2..p_2+l-1] }[/math] (meaning that the substrings have identical contents), but [math]\displaystyle{ S[p_1-1] \neq S[p_2-1] }[/math] (they have different characters to their left) and [math]\displaystyle{ S[p_1+l] \neq S[p_2+l] }[/math] (they also have different characters to their right; together, these two inequalities are the condition for being maximal). Thus, in the example above, the maximal pairs are [math]\displaystyle{ (2,6,3) }[/math] (the red and blue substrings) and [math]\displaystyle{ (6,10,3) }[/math] (the green and blue substrings), and [math]\displaystyle{ (2,10,3) }[/math] is not a maximal pair.
Related concepts and time complexity
A maximal repeat is the string represented by a maximal pair. A supermaximal repeat is a maximal repeat never occurring as a proper substring of another maximal repeat. In the above example, abc
and abcy
are both maximal repeats, but only abcy
is a supermaximal repeat.
Maximal pairs, maximal repeats and supermaximal repeats can each be found in [math]\displaystyle{ \Theta(n+z) }[/math] time using a suffix tree,[1] if there are [math]\displaystyle{ z }[/math] such structures.
References
- ↑ Gusfield, Dan (1999) [1997]. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. USA: Cambridge University Press. p. 143. ISBN 0-521-58519-8. https://archive.org/details/algorithmsonstri00gusf_891.
External links
- Project for the computation of all maximal repeats in one ore more strings in Python, using suffix array.
Original source: https://en.wikipedia.org/wiki/Maximal pair.
Read more |