Maximal pair

From HandWiki

In computer science, a maximal pair within a string is a pair of matching substrings that are maximal, where "maximal" means that it is not possible to make a longer matching pair by extending the range of both substrings to the left or right.

Example

Index 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Character x a b c y a b c w a b c y z

For example, in this table, the substrings at indices 2 to 4 (in red) and indices 6 to 8 (in blue) are a maximal pair, because they contain identical characters (abc), and they have different characters to the left (x at index 1 and y at index 5) and different characters to the right (y at index 5 and w at index 9). Similarly, the substrings at indices 6 to 8 (in blue) and indices 10 to 12 (in green) are a maximal pair.

However, the substrings at indices 2 to 4 (in red) and indices 10 to 12 (in green) are not a maximal pair, as the character y follows both substrings, and so they can be extended to the right to make a longer pair.

Formal definition

Formally, a maximal pair of substrings with starting positions [math]\displaystyle{ p_1 }[/math] and [math]\displaystyle{ p_2 }[/math] respectively, and both of length [math]\displaystyle{ l }[/math], is specified by a triple [math]\displaystyle{ (p_1, p_2, l) }[/math], such that, given a string [math]\displaystyle{ S }[/math] of length [math]\displaystyle{ n }[/math], [math]\displaystyle{ S[p_1..p_1+l-1]=S[p_2..p_2+l-1] }[/math] (meaning that the substrings have identical contents), but [math]\displaystyle{ S[p_1-1] \neq S[p_2-1] }[/math] (they have different characters to their left) and [math]\displaystyle{ S[p_1+l] \neq S[p_2+l] }[/math] (they also have different characters to their right; together, these two inequalities are the condition for being maximal). Thus, in the example above, the maximal pairs are [math]\displaystyle{ (2,6,3) }[/math] (the red and blue substrings) and [math]\displaystyle{ (6,10,3) }[/math] (the green and blue substrings), and [math]\displaystyle{ (2,10,3) }[/math] is not a maximal pair.

Related concepts and time complexity

A maximal repeat is the string represented by a maximal pair. A supermaximal repeat is a maximal repeat never occurring as a proper substring of another maximal repeat. In the above example, abc and abcy are both maximal repeats, but only abcy is a supermaximal repeat.

Maximal pairs, maximal repeats and supermaximal repeats can each be found in [math]\displaystyle{ \Theta(n+z) }[/math] time using a suffix tree,[1] if there are [math]\displaystyle{ z }[/math] such structures.

References

External links