Partial word

From HandWiki

In computer science and the study of combinatorics on words, a partial word is a string that may contain a number of "do not know" or "do not care" symbols, i.e., placeholders in the string where the symbol value is not known or not specified. More formally, a partial word is a partial function [math]\displaystyle{ u: \{ 0, \ldots, n-1 \} \rightarrow A }[/math] where [math]\displaystyle{ A }[/math] is some finite alphabet. If u(k) is not defined for some [math]\displaystyle{ k \in \{ 0, \ldots, n-1 \} }[/math], then the unknown element at place k in the string is called a "hole". In regular expressions (following the POSIX standard), a hole corresponds to the metacharacter ".", which matches any single character. For example, aab.ab.b is a partial word of length 8 over the alphabet A = {a, b} in which the fourth and seventh characters are holes.[1]
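
The definition above can be sketched in a few lines of Python. This is a minimal illustration under a hypothetical encoding: a partial word is a list whose entries are letters, with None marking a hole (a position k where u(k) is undefined). The helper names parse, holes and completions are illustrative only and do not come from any library. Filling every hole independently with a letter of A lists all the full words that the partial word matches.

```python
from itertools import product
from typing import Iterator, List, Optional

# Hypothetical encoding: a partial word is a list of letters, with None for a hole.
PartialWord = List[Optional[str]]


def parse(s: str, hole: str = ".") -> PartialWord:
    """Turn a string such as "aab.ab.b" into a partial word; `hole` marks undefined positions."""
    return [None if c == hole else c for c in s]


def holes(u: PartialWord) -> List[int]:
    """0-based positions k where u(k) is undefined."""
    return [k for k, c in enumerate(u) if c is None]


def completions(u: PartialWord, alphabet: str) -> Iterator[str]:
    """Yield every full word obtained by filling each hole independently with a letter."""
    positions = holes(u)
    for fill in product(sorted(alphabet), repeat=len(positions)):
        w = list(u)
        for k, c in zip(positions, fill):
            w[k] = c
        yield "".join(w)


u = parse("aab.ab.b")
print(len(u), holes(u))            # 8 [3, 6]
print(list(completions(u, "ab")))  # ['aabaabab', 'aabaabbb', 'aabbabab', 'aabbabbb']
```

Because the domain in the formal definition starts at 0, the fourth and seventh characters of aab.ab.b are the undefined positions 3 and 6, and over A = {a, b} the word has 2^2 = 4 completions.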

Algorithms

Several algorithms have been developed for the problem of "string matching with don't cares", in which the input is a long text and a shorter partial word, and the goal is to find all positions in the text at which a substring of the text matches the given partial word.[2][3][4]
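
As a point of reference, the sketch below gives the straightforward O(nm) matcher for a text of length n and a pattern of length m, treating "." as a hole in either string. It is not one of the cited algorithms. The second function shows a different, swapped-in technique: a squared-difference identity in which letters are encoded as positive integers and holes as 0, so that a position matches exactly when a sum of non-negative terms is zero. That sum splits into three cross-correlations, which FFT-based convolution could compute faster; here they are computed directly for clarity. Both function names are hypothetical.

```python
from typing import Dict, List

HOLE = "."  # wildcard symbol, following the article's notation


def naive_match(text: str, pattern: str) -> List[int]:
    """0-based start positions i where pattern matches text[i:i+len(pattern)].
    A hole in either string matches any character. O(n*m) time."""
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if all(p == HOLE or t == HOLE or p == t
               for p, t in zip(pattern, text[i:i + m]))
    ]


def squared_difference_match(text: str, pattern: str) -> List[int]:
    """Same result via sum_j p_j * t_{i+j} * (p_j - t_{i+j})^2 == 0.
    Letters map to positive integers and holes to 0, so every term is >= 0
    and the sum vanishes exactly when each position agrees or has a hole."""
    codes: Dict[str, int] = {}

    def enc(c: str) -> int:
        return 0 if c == HOLE else codes.setdefault(c, len(codes) + 1)

    t = [enc(c) for c in text]
    p = [enc(c) for c in pattern]
    n, m = len(t), len(p)

    def corr(a: List[int], b: List[int]) -> List[int]:
        # Cross-correlation sum_j a[j] * b[i + j]; computed directly here (O(n*m)),
        # an FFT-based convolution could replace this loop.
        return [sum(a[j] * b[i + j] for j in range(m)) for i in range(n - m + 1)]

    # Expand p*t*(p - t)^2 = p^3*t - 2*p^2*t^2 + p*t^3 into three correlations.
    s3 = corr([x ** 3 for x in p], t)
    s2 = corr([x ** 2 for x in p], [x ** 2 for x in t])
    s1 = corr(p, [x ** 3 for x in t])
    return [i for i, (a, b, c) in enumerate(zip(s3, s2, s1)) if a - 2 * b + c == 0]


print(naive_match("abaabbab", "a.b"))               # [2, 3]
print(squared_difference_match("abaabbab", "a.b"))  # [2, 3]
```

Python integers do not overflow, so the exact-zero test is safe here; a floating-point FFT version would instead need rounding or a tolerance.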

Applications

[Figure: A compatibility graph of partial words]

Two partial words are said to be compatible when they have the same length and every position that holds a non-wildcard character in both of them holds the same character in both. If one forms an undirected graph with a vertex for each partial word in a collection of partial words and an edge for each compatible pair, then the cliques of this graph are exactly the sets of partial words that all match at least one common string. This graph-theoretical interpretation of compatibility plays a key role in the proof of hardness of approximation of the clique problem, in which a collection of partial words representing accepting runs of a probabilistically checkable proof verifier has a large clique when a valid proof exists for an instance of an underlying NP-complete problem, and only small cliques otherwise.[5]
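
A small Python sketch of the compatibility graph, assuming partial words are plain strings with "." for a hole. The function names are hypothetical. Because compatibility constrains each position separately, a set of pairwise-compatible words can always be merged position by position into one partial word whose completions are exactly the strings matched by every member; this is why cliques correspond to sets with a common match.

```python
from itertools import combinations
from typing import List, Optional, Tuple

HOLE = "."  # hole symbol, as in the article's example aab.ab.b


def compatible(u: str, v: str) -> bool:
    """Same length, and the words agree wherever both have a letter."""
    return len(u) == len(v) and all(
        a == HOLE or b == HOLE or a == b for a, b in zip(u, v)
    )


def compatibility_edges(words: List[str]) -> List[Tuple[int, int]]:
    """Edges (i, j), i < j, of the compatibility graph; vertices are indices into words."""
    return [(i, j) for i, j in combinations(range(len(words)), 2)
            if compatible(words[i], words[j])]


def common_refinement(words: List[str]) -> Optional[str]:
    """Merge a non-empty set of partial words position by position.
    Returns a partial word whose completions are exactly the strings matched by
    every word in the set, or None if the set is empty or not pairwise compatible."""
    if not words or any(len(w) != len(words[0]) for w in words):
        return None
    merged = []
    for k in range(len(words[0])):
        letters = {w[k] for w in words if w[k] != HOLE}
        if len(letters) > 1:  # two words disagree at position k
            return None
        merged.append(letters.pop() if letters else HOLE)
    return "".join(merged)


words = ["a.b", ".ab", "aa.", "b.."]
print(compatibility_edges(words))                # [(0, 1), (0, 2), (1, 2), (1, 3)]
print(common_refinement(words[:3]))              # aab
print(common_refinement([words[0], words[3]]))   # None
```

Here {a.b, .ab, aa.} is a triangle in the graph, and its members all match the single string aab.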

The faces (subcubes) of an [math]\displaystyle{ n }[/math]-dimensional hypercube can be described by partial words of length [math]\displaystyle{ n }[/math] over a binary alphabet, whose symbols are the Cartesian coordinates of the hypercube vertices (e.g., 0 or 1 for a unit cube). The dimension of a subcube, in this representation, equals the number of don't-care symbols it contains. The same representation may also be used to describe the implicants of Boolean functions.[6]
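
The correspondence between faces and binary partial words can be checked by brute force. In this sketch (helper names hypothetical) "." stands for the don't-care symbol, following the article's hole notation. Enumerating all 3^n partial words of length n and grouping them by the number of holes should reproduce the known face counts of the cube, C(n, k)·2^(n−k) faces of dimension k: choose which k coordinates are free, then fix the other n − k. The implicant test assumes an example Boolean function x0 OR x1 chosen only for illustration.

```python
from collections import Counter
from itertools import product
from math import comb
from typing import Iterator

HOLE = "."  # don't-care symbol (the article's hole notation)


def vertices(face: str) -> Iterator[str]:
    """Yield the cube vertices (full 0/1 words) covered by a face such as "1.0"."""
    positions = [k for k, c in enumerate(face) if c == HOLE]
    for bits in product("01", repeat=len(positions)):
        v = list(face)
        for k, b in zip(positions, bits):
            v[k] = b
        yield "".join(v)


def dimension(face: str) -> int:
    """Dimension of the subcube = number of don't-care symbols."""
    return face.count(HOLE)


def is_implicant(face: str, f) -> bool:
    """True if the Boolean function f (taking a 0/1 string) is 1 on every vertex of the face."""
    return all(f(v) for v in vertices(face))


def x0_or_x1(v: str) -> bool:
    """Example Boolean function (hypothetical): x0 OR x1."""
    return v[0] == "1" or v[1] == "1"


# Enumerate all 3^n partial words of length n = 3 over {0, 1}, grouped by dimension.
n = 3
by_dim = Counter(dimension("".join(w)) for w in product("01" + HOLE, repeat=n))
print(sorted(by_dim.items()))  # [(0, 8), (1, 12), (2, 6), (3, 1)]
assert all(by_dim[k] == comb(n, k) * 2 ** (n - k) for k in range(n + 1))

print(is_implicant("1..", x0_or_x1), is_implicant("..1", x0_or_x1))  # True False
```

For n = 3 this gives the 8 vertices, 12 edges, 6 square faces and the whole cube; "1.." is an implicant of x0 OR x1 because every vertex it covers has x0 = 1, while "..1" is not, since it covers the vertex 001.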

Related concepts

Partial words may be generalized to parameter words, in which some of the "do not know" symbols are marked as being equal to each other, so that they must all be replaced by the same character. A partial word is the special case of a parameter word in which each "do not know" symbol may be replaced by a character independently of all the other ones.[7]
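
The substitution rule described above can be illustrated with a hypothetical encoding: a str token is a letter of the alphabet and an int token is a parameter label, with all positions sharing a label receiving the same letter. This sketch models only that rule and ignores any further conditions a particular theory may place on parameter words; the function names are illustrative. Giving every hole of a partial word its own fresh label recovers the partial word as a special case.

```python
from itertools import product
from typing import List, Set, Union

# Hypothetical encoding: a str token is a letter of the alphabet, an int token is a
# parameter label; all positions sharing a label must receive the same letter.
Token = Union[str, int]


def instances(pw: List[Token], alphabet: str) -> Set[str]:
    """All full words obtained by substituting one letter per parameter label."""
    labels = sorted({t for t in pw if isinstance(t, int)})
    result = set()
    for choice in product(sorted(alphabet), repeat=len(labels)):
        sub = dict(zip(labels, choice))
        result.add("".join(sub[t] if isinstance(t, int) else t for t in pw))
    return result


def from_partial_word(word: str, hole: str = ".") -> List[Token]:
    """A partial word as a parameter word: every hole gets its own fresh label."""
    tokens: List[Token] = []
    fresh = 0
    for c in word:
        if c == hole:
            tokens.append(fresh)
            fresh += 1
        else:
            tokens.append(c)
    return tokens


pw = ["a", 0, 0, "b", 1]  # like a..b., but the first two holes are tied together
print(sorted(instances(pw, "ab")))                       # ['aaaba', 'aaabb', 'abbba', 'abbbb']
print(len(instances(from_partial_word("a..b."), "ab")))  # 8
```

Tying two of the three holes together halves the number of instances from 8 to 4, and every instance of the tied version is also an instance of the partial word a..b.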

References

  1. Blanchet-Sadri, Francine (2008), Algorithmic Combinatorics on Partial Words, Discrete Mathematics and its Applications, Boca Raton, Florida: Chapman & Hall/CRC, ISBN 978-1-4200-6092-8 
  2. Pinter, Ron Y. (1985), "Efficient string matching with don't-care patterns", Combinatorial Algorithms on Words (Maratea, 1984), NATO Adv. Sci. Inst. Ser. F Comput. Systems Sci., 12, Berlin: Springer, pp. 11–29 
  3. Manber, Udi; Baeza-Yates, Ricardo (1991), "An algorithm for string matching with a sequence of don't cares", Information Processing Letters 37 (3): 133–136, doi:10.1016/0020-0190(91)90032-D 
  4. Kalai, Adam (2002), "Efficient pattern-matching with don't cares", Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 6-8, 2002, San Francisco, CA, USA, ACM and SIAM, pp. 655–656, https://dl.acm.org/citation.cfm?id=545381.545468 
  5. Feige, U.; Goldwasser, S.; Lovász, L.; Safra, S.; Szegedy, M. (1991), "Approximating clique is almost NP-complete", Proc. 32nd IEEE Symp. on Foundations of Computer Science, pp. 2–12, doi:10.1109/SFCS.1991.185341, ISBN 0-8186-2445-0 
  6. Karnaugh, Maurice (1953), "The map method for synthesis of combinational logic circuits", Transactions of the American Institute of Electrical Engineers, Part I: Communication and Electronics 72 (5): 593–599, doi:10.1109/TCE.1953.6371932 
  7. Prömel, Hans Jürgen (2002), "Large numbers, Knuth's arrow notation, and Ramsey theory", Synthese 133 (1–2): 87–105, doi:10.1023/A:1020879709125