Square-free word

From HandWiki

In combinatorics, a squarefree word is a word (a sequence of symbols) that does not contain any squares. A square is a word of the form XX, where X is not empty. Thus, a squarefree word can also be defined as a word that avoids the pattern XX.

Finite squarefree words

Binary alphabet

Over a binary alphabet [math]\displaystyle{ \{0,1\} }[/math], the only squarefree words are the empty word [math]\displaystyle{ \epsilon,0,1,01,10,010 }[/math], and [math]\displaystyle{ 101 }[/math].

Ternary alphabet

Over a ternary alphabet [math]\displaystyle{ \{0,1,2\} }[/math], there are infinitely many squarefree words. It is possible to count the number [math]\displaystyle{ c(n) }[/math] of ternary squarefree words of length n.

The number of ternary squarefree words of length n[1]
n 0 1 2 3 4 5 6 7 8 9 10 11 12
[math]\displaystyle{ c(n) }[/math] 1 3 6 12 18 30 42 60 78 108 144 204 264

This number is bounded by [math]\displaystyle{ c(n) = \Theta(\alpha^n) }[/math], where [math]\displaystyle{ 1.3017597 \lt \alpha \lt 1.3017619 }[/math].[2] The upper bound on [math]\displaystyle{ \alpha }[/math] can be found via Fekete's Lemma and approximation by automata. The lower bound can be found by finding a substitution that preserves squarefreeness.[2]

Alphabet with more than three letters

Since there are infinitely many squarefree words over three-letter alphabets, this implies there are also infinitely many squarefree words over an alphabet with more than three letters.

The following table shows the exact growth rate of the k-ary squarefree words:

The growth rate of the k-ary squarefree words[2]
alphabet size (k) 4 5 6 7 8 9
growth rate 2.6215080 3.7325386 4.7914069 5.8284661 6.8541173 7.8729902
alphabet size (k) 10 11 12 13 14 15
growth rate 8.8874856 9.8989813 10.9083279 11.9160804 12.9226167 13.9282035

2-dimensional words

Consider a map [math]\displaystyle{ \textbf{w} }[/math] from [math]\displaystyle{ \mathbb{N}^2 }[/math] to A, where A is an alphabet and [math]\displaystyle{ \textbf{w} }[/math] is called a 2-dimensional word. Let [math]\displaystyle{ w_{m,n} }[/math] be the entry [math]\displaystyle{ \textbf{w}(m,n) }[/math]. A word [math]\displaystyle{ \textbf{x} }[/math] is a line of [math]\displaystyle{ \textbf{w} }[/math] if there exists [math]\displaystyle{ i_1,i_2,j_1, j_2 }[/math]such that [math]\displaystyle{ \text{gcd}(j_1, j_2) = 1 }[/math], and for [math]\displaystyle{ t \ge 0, x_t = w_{{i_1}+{j_1t},{i_2}+{j_2t}} }[/math].[3]

Carpi[4] proves that there exists a 2-dimensional word [math]\displaystyle{ \textbf{w} }[/math] over a 16-letter alphabet such that every line of [math]\displaystyle{ \textbf{w} }[/math] is squarefree. A computer search shows that there are no 2-dimensional words [math]\displaystyle{ \textbf{w} }[/math]over a 7-letter alphabet, such that every line of [math]\displaystyle{ \textbf{w} }[/math] is squarefree.

Generating finite squarefree words

Shur[5] proposes an algorithm called R2F (random-t(w)o-free) that can generate a squarefree word of length n over any alphabet with three or more letters. This algorithm is based on a modification of entropy compression: it randomly selects letters from a k-letter alphabet to generate a [math]\displaystyle{ (k+1) }[/math]-ary squarefree word.

algorithm R2F is
    input: alphabet size [math]\displaystyle{ k \ge 2 }[/math],
           word length [math]\displaystyle{ n \gt  1 }[/math]
    output: a [math]\displaystyle{ (k+1) }[/math]-ary squarefree word wof length n.

    (Note that [math]\displaystyle{ \color{gray}\Sigma_{k+1} }[/math] is the alphabet with letters [math]\displaystyle{ \color{gray}\{1,...,k+1\} }[/math].)
    (For a word [math]\displaystyle{ \color{gray}w \in \Sigma_k }[/math], [math]\displaystyle{ \color{gray}\chi_w }[/math] is the permutation of [math]\displaystyle{ \color{gray}\Sigma_k }[/math] such that a precedes b in [math]\displaystyle{ \color{gray}\chi_w }[/math] if the 
     right most position of a in w is to the right of the rightmost position of b in w.
     For example,  [math]\displaystyle{ \color{gray}w=136263163\in \Sigma_6 }[/math] has [math]\displaystyle{ \color{gray}\chi_w=361245 }[/math].)

    choose [math]\displaystyle{ w[1] }[/math] in [math]\displaystyle{ \Sigma_{k+1} }[/math] uniformly at random 
    set [math]\displaystyle{ \chi_w }[/math] to [math]\displaystyle{ w[1] }[/math] followed by all other letters of [math]\displaystyle{ \Sigma_{k+1} }[/math] in increasing order
    set the number N of iterations to 0

    while [math]\displaystyle{ |w| \lt  n }[/math] do
        choose j in [math]\displaystyle{ \Sigma_{k} }[/math] uniformly at random
        append [math]\displaystyle{ a = \chi_w[j+1] }[/math] to the end of w
        update [math]\displaystyle{ \chi_w }[/math] shifting the first j elements to the right and setting [math]\displaystyle{ \chi_w[1] = a }[/math]
        increment N by 1
        if w ends with a square of rank r then
            delete the last r letters of w

    return w

Every (k+1)-ary squarefree word can be the output of Algorithm R2F, because on each iteration it can append any letter except for the last letter of w.

The expected number of random k-ary letters used by Algorithm R2F to construct a [math]\displaystyle{ (k+1) }[/math]-ary squarefree word of length n is[math]\displaystyle{ N=n\left(1 + \frac 2 {k^2} + \frac 1 {k^3} + \frac 4 {k^4} + O\left(\frac 1 {k^5}\right)\right)+O(1). }[/math]Note that there exists an algorithm that can verify the squarefreeness of a word of length n in [math]\displaystyle{ O(n \log n) }[/math] time. Apostolico and Preparata[6] give an algorithm using suffix trees. Crochemore[7] uses partitioning in his algorithm. Main and Lorentz[8] provide an algorithm based on the divide-and-conquer method. A naive implementation may require [math]\displaystyle{ O(n^2) }[/math] time to verify the squarefreeness of a word of length n.

Infinite squarefree words

There exist arbitrarily long squarefree words in any alphabet with three or more letters, as proved by Axel Thue.[9]

Examples

First difference of the Thue–Morse sequence

One example of an infinite squarefree word over an alphabet of size 3 is the word over the alphabet [math]\displaystyle{ \{-1,0,+1\} }[/math] obtained by taking the first difference of the Thue–Morse sequence.[9] That is, from the Thue–Morse sequence

[math]\displaystyle{ 0, 1 ,1, 0 ,1 ,0 ,0 ,1, 1 ,0 ,0 ,1, 0 ,1 ,1, 0 ... }[/math]

one forms a new sequence in which each term is the difference of two consecutive terms of the Thue–Morse sequence. The resulting squarefree word is

[math]\displaystyle{ 1,0,-1,1,-1,0,1,0,-1,0,1,-1,1,0,-1,... }[/math](sequence A029883 in the OEIS).

Leech's morphism

Another example found by John Leech[10] is defined recursively over the alphabet [math]\displaystyle{ \{0,1,2\} }[/math]. Let [math]\displaystyle{ w_1 }[/math] be any squarefree word starting with the letter 0. Define the words [math]\displaystyle{ \{w_i \mid i \in \mathbb{N} \} }[/math] recursively as follows: the word [math]\displaystyle{ w_{i+1} }[/math] is obtained from [math]\displaystyle{ w_i }[/math] by replacing each 0 in [math]\displaystyle{ w_i }[/math] with 0121021201210, each 1 with 1202102012021, and each 2 with 2010210120102. It is possible to prove that the sequence converges to the infinite squarefree word

0121021201210120210201202120102101201021202102012021...

Generating infinite squarefree words

Infinite squarefree words can be generated by squarefree morphism. A morphism is called squarefree if the image of every squarefree word is squarefree. A morphism is called k–squarefree if the image of every squarefree word of length k is squarefree.

Crochemore[11] proves that a uniform morphism h is squarefree if and only if it is 3-squarefree. In other words, h is squarefree if and only if [math]\displaystyle{ h(w) }[/math] is squarefree for all squarefree w of length 3. It is possible to find a squarefree morphism by brute-force search.

algorithm squarefree_morphism is
    output: a squarefree morphism with the lowest possible rank k.

    set [math]\displaystyle{ k = 3 }[/math]
    while True do
        set k_sf_words to the list of all squarefree words of length k over a ternary alphabet
        for each [math]\displaystyle{ h(0) }[/math] in k_sf_words do
            for each [math]\displaystyle{ h(1) }[/math] in k_sf_words do
                for each [math]\displaystyle{ h(2) }[/math] in k_sf_words do
                    if [math]\displaystyle{ h(1) = h(2) }[/math] then
                        break from the current loop (advance to next [math]\displaystyle{ h(1) }[/math])
                    if [math]\displaystyle{ h(0) \ne h(1) }[/math] and [math]\displaystyle{ h(2) \ne h(0) }[/math] then
                        if [math]\displaystyle{ h(w) }[/math] is squarefree for all squarefree w of length 3 then
                            return [math]\displaystyle{ h(0), h(1), h(2) }[/math]
        increment k by 1

Over a ternary alphabet, there are exactly 144 uniform squarefree morphisms of rank 11 and no uniform squarefree morphisms with a lower rank than 11.

To obtain an infinite squarefree words, start with any squarefree word such as 0, and successively apply a squarefree morphism h to it. The resulting words preserve the property of squarefreeness. For example, let h be a squarefree morphism, then as [math]\displaystyle{ w \to \infty }[/math], [math]\displaystyle{ h^{w}(0) }[/math] is an infinite squarefree word.

Note that, if a morphism over a ternary alphabet is not uniform, then this morphism is squarefree if and only if it is 5-squarefree.[11]

Letter combinations in squarefree words

Extending a squarefree word to avoid ab.

Avoid two-letter combinations

Over a ternary alphabet, a squarefree word of length more than 13 contains all the squarefree two-letter combinations.[12]

This can be proved by constructing a squarefree word without the two-letter combination ab. As a result, bcbacbcacbaca is the longest squarefree word without the combination ab and its length is equal to 13.

Note that over a more than three-letter alphabet there are squarefree words of any length without an arbitrary two-letter combination.

Avoid three-letter combinations

Over a ternary alphabet, a squarefree word of length more than 36 contains all the squarefree three-letter combinations.[12]

However, there are squarefree words of any length without the three-letter combination aba.

Note that over a more than three-letter alphabet there are squarefree words of any length without an arbitrary three-letter combination.

Density of a letter

The density of a letter a in a finite word w is defined as [math]\displaystyle{ \frac{|w|_a}{|w|} }[/math] where [math]\displaystyle{ |w|_a }[/math] is the number of occurrences of a in [math]\displaystyle{ w }[/math] and [math]\displaystyle{ |w| }[/math] is the length of the word. The density of a letter a in an infinite word is [math]\displaystyle{ \liminf_{l\to \infty}\frac{|w_l|_a}{|w_l|} }[/math] where [math]\displaystyle{ w_l }[/math] is the prefix of the word w of length l.[13]

The minimal density of a letter a in an infinite ternary squarefree word is equal to [math]\displaystyle{ \frac{883}{3215} }[/math].[13]

The maximum density of a letter a in an infinite ternary squarefree word is equal to [math]\displaystyle{ \frac{255}{653} }[/math].[14]

Notes

  1. "A006156 - OEIS". https://oeis.org/A006156. 
  2. 2.0 2.1 2.2 Shur, Arseny (2011). "Growth properties of power-free languages". Computer Science Review 6 (5–6): 28–43. doi:10.1016/j.cosrev.2012.09.001. 
  3. Berthe, Valerie; Rigo, Michel, eds. (2016), "Preface", Combinatorics, Words and Symbolic Dynamics, Cambridge University Press, pp. xi–xviii, doi:10.1017/cbo9781139924733.001, ISBN 9781139924733 
  4. Carpi, Arturo (1988). "Multidimensional unrepetitive configurations". Theoretical Computer Science 56 (2): 233–241. doi:10.1016/0304-3975(88)90080-1. ISSN 0304-3975. 
  5. Shur, Arseny (2015). "Generating square-free words efficiently". Theoretical Computer Science 601: 67–72. doi:10.1016/j.tcs.2015.07.027. 
  6. Apostolico, A.; Preparata, F.P. (Feb 1983). "Optimal off-line detection of repetitions in a string". Theoretical Computer Science 22 (3): 297–315. doi:10.1016/0304-3975(83)90109-3. ISSN 0304-3975. 
  7. Crochemore, Max (Oct 1981). "An optimal algorithm for computing the repetitions in a word". Information Processing Letters 12 (5): 244–250. doi:10.1016/0020-0190(81)90024-7. ISSN 0020-0190. 
  8. Main, Michael G; Lorentz, Richard J (Sep 1984). "An O(n log n) algorithm for finding all repetitions in a string". Journal of Algorithms 5 (3): 422–432. doi:10.1016/0196-6774(84)90021-x. ISSN 0196-6774. 
  9. 9.0 9.1 Berstel, Jean (1994). Axel Thue's papers on repetitions in words a translation. Départements de mathématiques et d'informatique, Université du Québec à Montréal. ISBN 978-2892761405. OCLC 494791187. 
  10. Leech, J. (1957). "A problem on strings of beads". Math. Gaz. 41: 277–278. doi:10.1017/S0025557200236115. 
  11. 11.0 11.1 Berstel, Jean (April 1984). "Some Recent Results on Squarefree Words". Annual Symposium on Theoretical Aspects of Computer Science. Lecture Notes in Computer Science 166: 14–25. doi:10.1007/3-540-12920-0_2. ISBN 978-3-540-12920-2. http://www.numdam.org/item/PDML_1985___2B_21_0/. 
  12. 12.0 12.1 Zolotov, Boris (2015). "Another Solution to the Thue Problem of Non-Repeating Words". arXiv:1505.00019 [math.CO].
  13. 13.0 13.1 Khalyavin, Andrey (2007). "The minimal density of a letter in an infinite ternary square-free word is 883/3215". Journal of Integer Sequences 10 (2): 3. Bibcode2007JIntS..10...65K. http://emis.de/journals/JIS/VOL10/Khalyavin/khalyavin13.pdf. 
  14. Ochem, Pascal (2007). "Letter frequency in infinite repetition-free words". Theoretical Computer Science 380 (3): 388–392. doi:10.1016/j.tcs.2007.03.027. ISSN 0304-3975. 

References