Unavoidable pattern

From HandWiki

In mathematics and theoretical computer science, a pattern is an unavoidable pattern if it is unavoidable on any finite alphabet.

Definitions

Pattern

Like a word, a pattern (also called term) is a sequence of symbols over some alphabet.

The minimum multiplicity of the pattern [math]\displaystyle{ p }[/math] is [math]\displaystyle{ m(p)=\min(\mathrm{count_p}(x):x \in p) }[/math] where [math]\displaystyle{ \mathrm{count_p}(x) }[/math] is the number of occurrence of symbol [math]\displaystyle{ x }[/math] in pattern [math]\displaystyle{ p }[/math]. In other words, it is the number of occurrences in [math]\displaystyle{ p }[/math] of the least frequently occurring symbol in [math]\displaystyle{ p }[/math].

Instance

Given finite alphabets [math]\displaystyle{ \Sigma }[/math] and [math]\displaystyle{ \Delta }[/math], a word [math]\displaystyle{ x\in\Sigma^* }[/math] is an instance of the pattern [math]\displaystyle{ p\in\Delta^* }[/math] if there exists a non-erasing semigroup morphism [math]\displaystyle{ f:\Delta^*\rightarrow\Sigma^* }[/math] such that [math]\displaystyle{ f(p)=x }[/math], where [math]\displaystyle{ \Sigma^* }[/math] denotes the Kleene star of [math]\displaystyle{ \Sigma }[/math]. Non-erasing means that [math]\displaystyle{ f(a)\neq\varepsilon }[/math] for all [math]\displaystyle{ a\in\Delta }[/math], where [math]\displaystyle{ \varepsilon }[/math] denotes the empty string.

Avoidance / Matching

A word [math]\displaystyle{ w }[/math] is said to match, or encounter, a pattern [math]\displaystyle{ p }[/math] if a factor (also called subword or substring) of [math]\displaystyle{ w }[/math] is an instance of [math]\displaystyle{ p }[/math]. Otherwise, [math]\displaystyle{ w }[/math] is said to avoid [math]\displaystyle{ p }[/math], or to be [math]\displaystyle{ p }[/math]-free. This definition can be generalized to the case of an infinite [math]\displaystyle{ w }[/math], based on a generalized definition of "substring".

Avoidability / Unavoidability on a specific alphabet

A pattern [math]\displaystyle{ p }[/math] is unavoidable on a finite alphabet [math]\displaystyle{ \Sigma }[/math] if each sufficiently long word [math]\displaystyle{ x\in\Sigma^* }[/math] must match [math]\displaystyle{ p }[/math]; formally: if [math]\displaystyle{ \exists n\in \Nu. \ \forall x\in\Sigma^*. \ (|x|\geq n \implies x \text{ matches } p) }[/math]. Otherwise, [math]\displaystyle{ p }[/math] is avoidable on [math]\displaystyle{ \Sigma }[/math], which implies there exist infinitely many words over the alphabet [math]\displaystyle{ \Sigma }[/math] that avoid [math]\displaystyle{ p }[/math].

By Kőnig's lemma, pattern [math]\displaystyle{ p }[/math] is avoidable on [math]\displaystyle{ \Sigma }[/math] if and only if there exists an infinite word [math]\displaystyle{ w\in \Sigma^\omega }[/math] that avoids [math]\displaystyle{ p }[/math].[1]

Maximal p-free word

Given a pattern [math]\displaystyle{ p }[/math] and an alphabet [math]\displaystyle{ \Sigma }[/math]. A [math]\displaystyle{ p }[/math]-free word [math]\displaystyle{ w\in\Sigma^* }[/math] is a maximal [math]\displaystyle{ p }[/math]-free word over [math]\displaystyle{ \Sigma }[/math] if [math]\displaystyle{ aw }[/math] and [math]\displaystyle{ wa }[/math] match [math]\displaystyle{ p }[/math] [math]\displaystyle{ \forall a\in\Sigma }[/math].

Avoidable / Unavoidable pattern

A pattern [math]\displaystyle{ p }[/math] is an unavoidable pattern (also called blocking term) if [math]\displaystyle{ p }[/math] is unavoidable on any finite alphabet.

If a pattern is unavoidable and not limited to a specific alphabet, then it is unavoidable for any finite alphabet by default. Conversely, if a pattern is said to be avoidable and not limited to a specific alphabet, then it is avoidable on some finite alphabet by default.

k-avoidable / k-unavoidable

A pattern [math]\displaystyle{ p }[/math] is [math]\displaystyle{ k }[/math]-avoidable if [math]\displaystyle{ p }[/math] is avoidable on an alphabet [math]\displaystyle{ \Sigma }[/math] of size [math]\displaystyle{ k }[/math]. Otherwise, [math]\displaystyle{ p }[/math] is [math]\displaystyle{ k }[/math]-unavoidable, which means [math]\displaystyle{ p }[/math] is unavoidable on every alphabet of size [math]\displaystyle{ k }[/math].[2]

If pattern [math]\displaystyle{ p }[/math] is [math]\displaystyle{ k }[/math]-avoidable, then [math]\displaystyle{ p }[/math] is [math]\displaystyle{ g }[/math]-avoidable for all [math]\displaystyle{ g\geq k }[/math].

Given a finite set of avoidable patterns [math]\displaystyle{ S=\{p_1,p_2,...,p_i \} }[/math], there exists an infinite word [math]\displaystyle{ w\in\Sigma^\omega }[/math] such that [math]\displaystyle{ w }[/math] avoids all patterns of [math]\displaystyle{ S }[/math].[1] Let [math]\displaystyle{ \mu(S) }[/math] denote the size of the minimal alphabet [math]\displaystyle{ \Sigma' }[/math]such that [math]\displaystyle{ \exist w' \in {\Sigma'}^\omega }[/math] avoiding all patterns of [math]\displaystyle{ S }[/math].

Avoidability index

The avoidability index of a pattern [math]\displaystyle{ p }[/math] is the smallest [math]\displaystyle{ k }[/math] such that [math]\displaystyle{ p }[/math] is [math]\displaystyle{ k }[/math]-avoidable, and [math]\displaystyle{ \infin }[/math] if [math]\displaystyle{ p }[/math] is unavoidable.[1]

Properties

  • A pattern [math]\displaystyle{ q }[/math] is avoidable if [math]\displaystyle{ q }[/math] is an instance of an avoidable pattern [math]\displaystyle{ p }[/math].[3]
  • Let avoidable pattern [math]\displaystyle{ p }[/math] be a factor of pattern [math]\displaystyle{ q }[/math], then [math]\displaystyle{ q }[/math] is also avoidable.[3]
  • A pattern [math]\displaystyle{ q }[/math] is unavoidable if and only if [math]\displaystyle{ q }[/math] is a factor of some unavoidable pattern [math]\displaystyle{ p }[/math].
  • Given an unavoidable pattern [math]\displaystyle{ p }[/math] and a symbol [math]\displaystyle{ a }[/math] not in [math]\displaystyle{ p }[/math], then [math]\displaystyle{ pap }[/math] is unavoidable.[3]
  • Given an unavoidable pattern [math]\displaystyle{ p }[/math], then the reversal [math]\displaystyle{ p^R }[/math] is unavoidable.
  • Given an unavoidable pattern [math]\displaystyle{ p }[/math], there exists a symbol [math]\displaystyle{ a }[/math] such that [math]\displaystyle{ a }[/math] occurs exactly once in [math]\displaystyle{ p }[/math].[3]
  • Let [math]\displaystyle{ n\in\Nu }[/math] represent the number of distinct symbols of pattern [math]\displaystyle{ p }[/math]. If [math]\displaystyle{ |p|\geq2^n }[/math], then [math]\displaystyle{ p }[/math] is avoidable.[3]

Zimin words

Main page: Sesquipower

Given alphabet [math]\displaystyle{ \Delta=\{x_1,x_2,...\} }[/math], Zimin words (patterns) are defined recursively [math]\displaystyle{ Z_{n+1}=Z_nx_{n+1}Z_n }[/math] for [math]\displaystyle{ n\in\Zeta^+ }[/math] and [math]\displaystyle{ Z_1=x_1 }[/math].

Unavoidability

All Zimin words are unavoidable.[4]

A word [math]\displaystyle{ w }[/math] is unavoidable if and only if it is a factor of a Zimin word.[4]

Given a finite alphabet [math]\displaystyle{ \Sigma }[/math], let [math]\displaystyle{ f(n,|\Sigma|) }[/math] represent the smallest [math]\displaystyle{ m\in\Zeta^+ }[/math] such that [math]\displaystyle{ w }[/math] matches [math]\displaystyle{ Z_n }[/math] for all [math]\displaystyle{ w\in \Sigma^m }[/math]. We have following properties:[5]

  • [math]\displaystyle{ f(1,q)=1 }[/math]
  • [math]\displaystyle{ f(2,q)=2q+1 }[/math]
  • [math]\displaystyle{ f(3,2)=29 }[/math]
  • [math]\displaystyle{ f(n,q)\leq{^{n-1}q}=\underbrace{q^{q^{\cdot^{\cdot^{q}}}}}_{n-1} }[/math]

[math]\displaystyle{ Z_n }[/math] is the longest unavoidable pattern constructed by alphabet [math]\displaystyle{ \Delta=\{x_1,x_2,...,x_n\} }[/math] since [math]\displaystyle{ |Z_n|=2^n -1 }[/math].

Pattern reduction

Free letter

Given a pattern [math]\displaystyle{ p }[/math] over some alphabet [math]\displaystyle{ \Delta }[/math], we say [math]\displaystyle{ x\in\Delta }[/math] is free for [math]\displaystyle{ p }[/math] if there exist subsets [math]\displaystyle{ A,B }[/math] of [math]\displaystyle{ \Delta }[/math] such that the following hold:

  1. [math]\displaystyle{ uv }[/math] is a factor of [math]\displaystyle{ p }[/math] and [math]\displaystyle{ u\in A }[/math][math]\displaystyle{ uv }[/math] is a factor of [math]\displaystyle{ p }[/math] and [math]\displaystyle{ v\in B }[/math]
  2. [math]\displaystyle{ x\in A\backslash B\cup B\backslash A }[/math]

For example, let [math]\displaystyle{ p=abcbab }[/math], then [math]\displaystyle{ b }[/math] is free for [math]\displaystyle{ p }[/math] since there exist [math]\displaystyle{ A=ac,B=b }[/math] satisfying the conditions above.

Reduce

A pattern [math]\displaystyle{ p\in \Delta^* }[/math] reduces to pattern [math]\displaystyle{ q }[/math] if there exists a symbol [math]\displaystyle{ x\in \Delta }[/math] such that [math]\displaystyle{ x }[/math] is free for [math]\displaystyle{ p }[/math], and [math]\displaystyle{ q }[/math] can be obtained by removing all occurrence of [math]\displaystyle{ x }[/math] from [math]\displaystyle{ p }[/math]. Denote this relation by [math]\displaystyle{ p\stackrel{x}{\rightarrow}q }[/math].

For example, let [math]\displaystyle{ p=abcbab }[/math], then [math]\displaystyle{ p }[/math] can reduce to [math]\displaystyle{ q=aca }[/math] since [math]\displaystyle{ b }[/math] is free for [math]\displaystyle{ p }[/math].

Locked

A word [math]\displaystyle{ w }[/math] is said to be locked if [math]\displaystyle{ w }[/math] has no free letter; hence [math]\displaystyle{ w }[/math] can not be reduced.[6]

Transitivity

Given patterns [math]\displaystyle{ p,q,r }[/math], if [math]\displaystyle{ p }[/math] reduces to [math]\displaystyle{ q }[/math] and [math]\displaystyle{ q }[/math] reduces to [math]\displaystyle{ r }[/math], then [math]\displaystyle{ p }[/math] reduces to [math]\displaystyle{ r }[/math]. Denote this relation by [math]\displaystyle{ p\stackrel{*}{\rightarrow}r }[/math].

Unavoidability

A pattern [math]\displaystyle{ p }[/math] is unavoidable if and only if [math]\displaystyle{ p }[/math] reduces to a word of length one; hence [math]\displaystyle{ \exist w }[/math] such that [math]\displaystyle{ |w|=1 }[/math] and [math]\displaystyle{ p\stackrel{*}{\rightarrow}w }[/math].[7][4]

Graph pattern avoidance[8]

Avoidance / Matching on a specific graph

Given a simple graph [math]\displaystyle{ G=(V,E) }[/math], a edge coloring [math]\displaystyle{ c:E\rightarrow \Delta }[/math] matches pattern [math]\displaystyle{ p }[/math] if there exists a simple path [math]\displaystyle{ P=[e_1,e_2,...,e_r] }[/math] in [math]\displaystyle{ G }[/math] such that the sequence [math]\displaystyle{ c(P)=[c(e_1),c(e_2),...,c(e_r)] }[/math] matches [math]\displaystyle{ p }[/math]. Otherwise, [math]\displaystyle{ c }[/math] is said to avoid [math]\displaystyle{ p }[/math] or be [math]\displaystyle{ p }[/math]-free.

Similarly, a vertex coloring [math]\displaystyle{ c:V\rightarrow \Delta }[/math] matches pattern [math]\displaystyle{ p }[/math] if there exists a simple path [math]\displaystyle{ P=[c_1,c_2,...,c_r] }[/math] in [math]\displaystyle{ G }[/math] such that the sequence [math]\displaystyle{ c(P) }[/math] matches [math]\displaystyle{ p }[/math].

Pattern chromatic number

The pattern chromatic number [math]\displaystyle{ \pi_p(G) }[/math] is the minimal number of distinct colors needed for a [math]\displaystyle{ p }[/math]-free vertex coloring [math]\displaystyle{ c }[/math] over the graph [math]\displaystyle{ G }[/math].

Let [math]\displaystyle{ \pi_p(n)=\max\{\pi_p(G):G\in G_n\} }[/math] where [math]\displaystyle{ G_n }[/math] is the set of all simple graphs with a maximum degree no more than [math]\displaystyle{ n }[/math].

Similarly, [math]\displaystyle{ \pi_p'(G) }[/math] and [math]\displaystyle{ \pi_p'(n) }[/math] are defined for edge colorings.

Avoidability / Unavoidability on graphs

A pattern [math]\displaystyle{ p }[/math] is avoidable on graphs if [math]\displaystyle{ \pi_p(n) }[/math] is bounded by [math]\displaystyle{ c_p }[/math], where [math]\displaystyle{ c_p }[/math] only depends on [math]\displaystyle{ p }[/math].

  • Avoidance on words can be expressed as a specific case of avoidance on graphs; hence a pattern [math]\displaystyle{ p }[/math] is avoidable on any finite alphabet if and only if [math]\displaystyle{ \pi_p(P_n) \leq c_p }[/math] for all [math]\displaystyle{ n \in\Zeta ^+ }[/math], where [math]\displaystyle{ P_n }[/math] is a graph of [math]\displaystyle{ n }[/math] vertices concatenated.

Probabilistic bound on [math]\displaystyle{ \pi_p(n) }[/math]

There exists an absolute constant [math]\displaystyle{ c }[/math], such that [math]\displaystyle{ \pi_p(n)\leq cn^{\frac{m(p)}{m(p)-1}}\leq cn^2 }[/math] for all patterns [math]\displaystyle{ p }[/math] with [math]\displaystyle{ m(p)\geq2 }[/math].[8]

Given a pattern [math]\displaystyle{ p }[/math], let [math]\displaystyle{ n }[/math] represent the number of distinct symbols of [math]\displaystyle{ p }[/math]. If [math]\displaystyle{ |p|\geq2^n }[/math], then [math]\displaystyle{ p }[/math] is avoidable on graphs.

Explicit colorings

Given a pattern [math]\displaystyle{ p }[/math] such that [math]\displaystyle{ count_p(x) }[/math] is even for all [math]\displaystyle{ x\in p }[/math], then [math]\displaystyle{ \pi_p'(K_2^k)\leq2^k-1 }[/math] for all [math]\displaystyle{ k\geq 1 }[/math], where [math]\displaystyle{ K_n }[/math] is the complete graph of [math]\displaystyle{ n }[/math] vertices.[8]

Given a pattern [math]\displaystyle{ p }[/math] such that [math]\displaystyle{ m(p)\geq2 }[/math], and an arbitrary tree [math]\displaystyle{ T }[/math], let [math]\displaystyle{ S }[/math] be the set of all avoidable subpatterns and their reflections of [math]\displaystyle{ p }[/math]. Then [math]\displaystyle{ \pi_p(T)\leq 3\mu(S) }[/math].[8]

Given a pattern [math]\displaystyle{ p }[/math] such that [math]\displaystyle{ m(p)\geq2 }[/math], and a tree [math]\displaystyle{ T }[/math] with degree [math]\displaystyle{ n\geq2 }[/math]. Let [math]\displaystyle{ S }[/math] be the set of all avoidable subpatterns and their reflections of [math]\displaystyle{ p }[/math], then [math]\displaystyle{ \pi_p'(T)\leq 2(n-1)\mu(S) }[/math].[8]

Examples

  • The Thue–Morse sequence is cube-free and overlap-free; hence it avoids the patterns [math]\displaystyle{ xxx }[/math] and [math]\displaystyle{ xyxyx }[/math].[2]
  • A square-free word is one avoiding the pattern [math]\displaystyle{ xx }[/math]. The word over the alphabet [math]\displaystyle{ \{0,\pm1\} }[/math] obtained by taking the first difference of the Thue–Morse sequence is an example of an infinite square-free word.[9][10]
  • The patterns [math]\displaystyle{ x }[/math] and [math]\displaystyle{ xyx }[/math] are unavoidable on any alphabet, since they are factors of the Zimin words.[11][1]
  • The power patterns [math]\displaystyle{ x^n }[/math] for [math]\displaystyle{ n\geq 3 }[/math] are 2-avoidable.[1]
  • All binary patterns can be divided into three categories:[1]
    • [math]\displaystyle{ \varepsilon,x,xyx }[/math] are unavoidable.
    • [math]\displaystyle{ xx,xxy,xyy,xxyx,xxyy,xyxx,xyxy,xyyx,xxyxx,xxyxy,xyxyy }[/math] have avoidability index of 3.
    • others have avoidability index of 2.
  • [math]\displaystyle{ abwbaxbcyaczca }[/math] has avoidability index of 4, as well as other locked words.[6]
  • [math]\displaystyle{ abvacwbaxbcycdazdcd }[/math] has avoidability index of 5.[12]
  • The repetitive threshold [math]\displaystyle{ RT(n) }[/math] is the infimum of exponents [math]\displaystyle{ k }[/math] such that [math]\displaystyle{ x^k }[/math] is avoidable on an alphabet of size [math]\displaystyle{ n }[/math]. Also see Dejean's theorem.

Open problems

  • Is there an avoidable pattern [math]\displaystyle{ p }[/math] such that the avoidability index of [math]\displaystyle{ p }[/math] is 6?
  • Given an arbitrarily pattern [math]\displaystyle{ p }[/math], is there an algorithm to determine the avoidability index of [math]\displaystyle{ p }[/math]?[1]

References

  1. 1.0 1.1 1.2 1.3 1.4 1.5 1.6 Lothaire, M. (2002). Algebraic Combinatorics on Words. Cambridge University Press. ISBN 9780521812207. https://archive.org/details/algebraiccombina0000loth. 
  2. 2.0 2.1 (in en) Combinatorics on Words: Christoffel Words and Repetitions in Words. American Mathematical Soc.. pp. 127. ISBN 978-0-8218-7325-0. https://books.google.com/books?id=3w9eR3u8GN4C&q=Combinatorics+on+words.+Christoffel+words+and+repetitions+in+words+2009. 
  3. 3.0 3.1 3.2 3.3 3.4 Schmidt, Ursula (1987-08-01). "Long unavoidable patterns" (in en). Acta Informatica 24 (4): 433–445. doi:10.1007/BF00292112. ISSN 1432-0525. 
  4. 4.0 4.1 4.2 Zimin, A. I. (1984). "Blocking Sets of Terms" (in en). Mathematics of the USSR-Sbornik 47 (2): 353–364. doi:10.1070/SM1984v047n02ABEH002647. ISSN 0025-5734. Bibcode1984SbMat..47..353Z. 
  5. Joshua, Cooper; Rorabaugh, Danny (2013). Bounds on Zimin Word Avoidance. Bibcode2014arXiv1409.3080C. https://archive.org/details/arxiv-1409.3080. 
  6. 6.0 6.1 Baker, Kirby A.; McNulty, George F.; Taylor, Walter (1989-12-18). "Growth problems for avoidable words" (in en). Theoretical Computer Science 69 (3): 319–345. doi:10.1016/0304-3975(89)90071-6. ISSN 0304-3975. 
  7. Bean, Dwight R.; Ehrenfeucht, Andrzej; McNulty, George F. (1979). "Avoidable patterns in strings of symbols." (in en). Pacific Journal of Mathematics 85 (2): 261–294. doi:10.2140/pjm.1979.85.261. ISSN 0030-8730. https://projecteuclid.org/euclid.pjm/1102783913. 
  8. 8.0 8.1 8.2 8.3 8.4 Grytczuk, Jarosław (2007-05-28). "Pattern avoidance on graphs" (in en). Discrete Mathematics. The Fourth Caracow Conference on Graph Theory 307 (11): 1341–1346. doi:10.1016/j.disc.2005.11.071. ISSN 0012-365X. 
  9. (in en) Combinatorics on Words: Christoffel Words and Repetitions in Words. American Mathematical Soc.. pp. 97. ISBN 978-0-8218-7325-0. https://books.google.com/books?id=3w9eR3u8GN4C&q=Combinatorics+on+words.+Christoffel+words+and+repetitions+in+words+2009. 
  10. Fogg, N. Pytheas (2002-09-23) (in en). Substitutions in Dynamics, Arithmetics and Combinatorics. Springer Science & Business Media. pp. 104. ISBN 978-3-540-44141-0. https://books.google.com/books?id=qBIsuwEACAAJ&q=Substitutions+in+dynamics,+arithmetics+and+combinatorics.. 
  11. Allouche, Jean-Paul; Shallit, Jeffrey; Shallit, Professor Jeffrey (2003-07-21) (in en). Automatic Sequences: Theory, Applications, Generalizations. Cambridge University Press. pp. 24. ISBN 978-0-521-82332-6. https://books.google.com/books?id=2ZsSUStt96sC&dq=Automatic+Sequences%3A+Theory%2C+Applications%2C+Generalizations&pg=PR13. 
  12. Clark, Ronald J. (2006-04-01). "The existence of a pattern which is 5-avoidable but 4-unavoidable". International Journal of Algebra and Computation 16 (2): 351–367. doi:10.1142/S0218196706002950. ISSN 0218-1967. 
  • Allouche, Jean-Paul; Shallit, Jeffrey (2003). Automatic Sequences: Theory, Applications, Generalizations. Cambridge University Press. ISBN 978-0-521-82332-6. 
  • Berstel, Jean; Lauve, Aaron; Reutenauer, Christophe; Saliola, Franco V. (2009). Combinatorics on words. Christoffel words and repetitions in words. CRM Monograph Series. 27. Providence, RI: American Mathematical Society. ISBN 978-0-8218-4480-9. 
  • Lothaire, M. (2011). Algebraic combinatorics on words. Encyclopedia of Mathematics and Its Applications. 90. With preface by Jean Berstel and Dominique Perrin (Reprint of the 2002 hardback ed.). Cambridge University Press. ISBN 978-0-521-18071-9. 
  • Pytheas Fogg, N. (2002). Berthé, Valérie; Ferenczi, Sébastien; Mauduit, Christian et al.. eds. Substitutions in dynamics, arithmetics and combinatorics. Lecture Notes in Mathematics. 1794. Berlin: Springer-Verlag. ISBN 3-540-44141-7.