OMeta

OMeta is a specialized object-oriented programming language for pattern matching, developed by Alessandro Warth and Ian Piumarta in 2007 at the Viewpoints Research Institute. The language is based on parsing expression grammars (PEGs) rather than context-free grammars, with the intent of providing "a natural and convenient way for programmers to implement tokenizers, parsers, visitors, and tree-transformers".[1] OMeta's main goal is to make techniques such as parsing, which are generally available only to language implementers, accessible to a broader audience.[1] It is also known for its use in rapid prototyping, though programs written in OMeta are noted to be generally less efficient than equivalent implementations written directly in a base ("vanilla") language such as JavaScript.[2][3]

OMeta is noted for its use in creating domain-specific languages, and especially for the maintainability of its implementations (Newcome). OMeta, like other meta languages, requires a host language; it was originally created as a COLA implementation.[1]

Description

OMeta is a meta-language used in the prototyping and creation of domain-specific languages. It was introduced as "an object-oriented language for pattern matching".[1] It uses parsing expression grammars (descriptions of languages "based on recognizing strings instead of generating them"[4]) designed "to handle arbitrary kinds of data", such as characters, numbers, strings, atoms, and lists. This increases its versatility, enabling it to work on both structured and unstructured data.[1]
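As a rough illustration of matching over "arbitrary kinds of data", the following Python sketch applies the same matching style to a string, a flat list of numbers, and a nested list. The names `match_many` and `evaluate` are hypothetical helpers invented for this example; they are not part of any OMeta implementation.

```python
def match_many(predicate, items, pos=0):
    """Greedily match consecutive items that satisfy predicate.

    Works on any indexable sequence (a string, a list, ...) and
    returns the matched slice together with the new position.
    """
    start = pos
    while pos < len(items) and predicate(items[pos]):
        pos += 1
    return items[start:pos], pos


def evaluate(tree):
    """Match nested lists shaped like ['num', n] or ['add', left, right]."""
    if isinstance(tree, list) and len(tree) == 2 and tree[0] == "num":
        return tree[1]
    if isinstance(tree, list) and len(tree) == 3 and tree[0] == "add":
        return evaluate(tree[1]) + evaluate(tree[2])
    raise ValueError(f"match failed on {tree!r}")


# The same matcher consumes characters and numbers alike.
digits, next_pos = match_many(str.isdigit, "123abc")
evens, _ = match_many(lambda n: n % 2 == 0, [2, 4, 6, 7, 8])

# Structured (tree-shaped) input is matched by shape rather than by character.
total = evaluate(["add", ["num", 1], ["add", ["num", 2], ["num", 3]]])
```

The point of the sketch is only that one matching discipline covers both unstructured input (a character stream) and structured input (lists and trees); OMeta expresses this declaratively in its grammar syntax rather than with hand-written functions.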

The language's main advantage over similar languages is that the same code can be used for every step of compilation (e.g. lexing and parsing). OMeta also supports production rules that take arguments; such parameterized rules can be added to OMeta itself as well as to the host language that OMeta is running in. Rules can also take other rules as arguments, creating "higher-order rules", and grammars can inherit production rules from existing code. OMeta can evaluate host-language Boolean expressions (true/false) while pattern matching; these are referred to as "semantic predicates". OMeta uses generalized pattern matching to allow programmers to implement and extend the phases of compilation more easily with a single tool.[1]
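The three rule features mentioned above (rules with arguments, higher-order rules, and semantic predicates) can be approximated in Python with ordinary closures. This is a minimal sketch under the assumption that a rule is a function taking `(text, pos)` and returning `(value, new_pos)` or `None` on failure; the names `char_range`, `list_of`, and `where` are made up for illustration and do not come from OMeta.

```python
def char_range(lo, hi):
    """Parameterized rule: match one character between lo and hi."""
    def rule(text, pos):
        if pos < len(text) and lo <= text[pos] <= hi:
            return text[pos], pos + 1
        return None
    return rule


def list_of(item, separator):
    """Higher-order rule: takes two rules and matches item (separator item)*."""
    def rule(text, pos):
        first = item(text, pos)
        if first is None:
            return None
        values, pos = [first[0]], first[1]
        while True:
            sep = separator(text, pos)
            if sep is None:
                break
            nxt = item(text, sep[1])
            if nxt is None:
                break  # leave the dangling separator unconsumed
            values.append(nxt[0])
            pos = nxt[1]
        return values, pos
    return rule


def where(rule, predicate):
    """Semantic predicate: succeed only if a host-language test is true."""
    def guarded(text, pos):
        result = rule(text, pos)
        if result is not None and predicate(result[0]):
            return result
        return None
    return guarded


digit = char_range("0", "9")
comma = char_range(",", ",")
even_digit = where(digit, lambda ch: int(ch) % 2 == 0)
digits = list_of(digit, comma)
even_digits = list_of(even_digit, comma)
```

For example, `digits("1,2,3", 0)` consumes the whole string, while `even_digits("2,4,5", 0)` stops before the `5` because the predicate written in the host language rejects it.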

OMeta uses grammars to determine the rules by which it operates. A grammar can hold any number of variables, which are set up by an __init__ function called when the grammar is created. Grammars can inherit from as well as call each other (using the "foreign production invocation mechanism", which enables grammars to "borrow" each other's input streams), much like classes in general-purpose programming languages.[1] Unlike most meta-languages, OMeta also prioritizes the options within a given grammar in order to remove ambiguity. After pattern-matching an input against a given grammar, OMeta assigns each component of the pattern to a variable, which it then feeds into the host language.[5]
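Prioritized (ordered) choice is what removes ambiguity: alternatives are tried in the order written and the first one that matches wins. The Python sketch below shows the effect; `literal` and `ordered_choice` are hypothetical helper names, and rules are assumed to be functions from `(text, pos)` to `(value, new_pos)` or `None`.

```python
def literal(word):
    """Rule that matches an exact string at the current position."""
    def rule(text, pos):
        if text.startswith(word, pos):
            return word, pos + len(word)
        return None
    return rule


def ordered_choice(*alternatives):
    """Try each alternative in order and commit to the first success."""
    def rule(text, pos):
        for alternative in alternatives:
            result = alternative(text, pos)
            if result is not None:
                return result
        return None
    return rule


# The order of the alternatives decides the outcome, so there is
# never more than one parse for a given input.
longest_first = ordered_choice(literal("<="), literal("<"))
shortest_first = ordered_choice(literal("<"), literal("<="))
```

With the input `"<=1"`, `longest_first` yields the token `<=`, whereas `shortest_first` yields `<` and never reaches its second alternative. A grammar author therefore lists the more specific alternative first.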

OMeta uses pattern matching in order to accomplish all of the steps of traditional compiling by itself. It first finds patterns in characters to create tokens, then it matches those tokens to its grammar to make syntax trees. Typecheckers then match patterns on the syntax trees to make annotated trees, and visitors do the same to produce other trees. A code generator then pattern-matches the trees to produce the code.[3] In OMeta, it is easy to "traverse through the parse tree since such functionality is natively supported".[3]
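The phases described above can be sketched in plain Python for a toy language of `+` and `*` over integers. This is only an illustration of the pipeline's shape (characters to tokens, tokens to tree, tree to code); the function names and the stack-machine output format are assumptions made for this example, and a real OMeta program would express each phase as a grammar instead.

```python
import re


def tokenize(source):
    """Phase 1: match character patterns to produce tokens.

    Characters that are not digits, '+' or '*' are silently skipped.
    """
    return re.findall(r"\d+|[+*]", source)


def parse(tokens):
    """Phase 2: match token patterns to build a syntax tree.

    Grammar: total = product ('+' product)* ; product = number ('*' number)*
    """
    pos = 0

    def number():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return ("num", int(token))

    def product():
        nonlocal pos
        tree = number()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            tree = ("mul", tree, number())
        return tree

    def total():
        nonlocal pos
        tree = product()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            tree = ("add", tree, product())
        return tree

    return total()


def generate(tree):
    """Phase 3: match tree shapes to emit stack-machine instructions."""
    if tree[0] == "num":
        return [f"push {tree[1]}"]
    operation, left, right = tree
    return generate(left) + generate(right) + [operation]


tokens = tokenize("1+2*3")
tree = parse(tokens)
code = generate(tree)
```

Each phase consumes the previous phase's output by recognizing patterns in it, which is the property that lets a single pattern-matching language cover the whole pipeline.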

The meta-language is noted for its usability in most programming languages, though each implementation is most commonly used from its own host language; OMeta/JS, for example, is used in JavaScript.[5] Because it requires a host language, the creators of OMeta refer to it as a "parasitic language".[6]

Development

Alessandro Warth and Ian Piumarta developed OMeta at the Viewpoints Research Institute, an organization intended to improve research systems and personal computing, in 2007. They first used a Combined Object Lambda Architecture, or COLA (a self-describing language investigated at Viewpoints Research Institute) as OMeta's host language, and later, assisted by Yoshiki Ohshima, ported it to Squeak Smalltalk to verify its usability with multiple host languages. OMeta was also used "to implement a nearly complete subset of…Javascript" as a case study in its introductory paper.[1]

Usage

OMeta, like other meta languages, is primarily used to create domain-specific languages (DSLs for short); specifically, it is used to quickly prototype DSLs, because its slow running speed and unclear error reports limit its usefulness as a full programming language (Heirbaut 73–74). OMeta is useful thanks to its ability to use one syntax for every phase of compiling, allowing it to replace several separate tools in the creation of a compiler.[5] Additionally, OMeta is valued both for the speed at which it can be used to create DSLs and for the significantly smaller amount of code it requires for such a task compared with vanilla implementations, with reports showing around 26% as many lines of functional code as vanilla.[2]

Examples

The following is an example of a basic calculator language in C# using OMeta:

ometa BasicCalc <: Parser
 {
   Digit    = super:d                    -> d.ToDigit(),
   Number   = Number:n Digit:d           -> (n * 10 + d)
            | Digit,
   AddExpr  = AddExpr:x '+' MulExpr:y    -> (x + y)
            | AddExpr:x '-' MulExpr:y    -> (x - y)
            | MulExpr,
   MulExpr  = MulExpr:x '*' PrimExpr:y   -> (x * y)
            | MulExpr:x '/' PrimExpr:y   -> (x / y)
            | PrimExpr,
   PrimExpr = '(' Expr:x ')'             -> x
            | Number,
   Expr     = AddExpr
 }

[5]

It is also possible to create subclasses of languages you have written:

ometa ExponentCalc <: BasicCalc
 {
   MulExpr = MulExpr:x '^' PrimExpr:e -> Math.Pow(x, e)
           | super
 }

[5]
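For readers unfamiliar with grammar inheritance, the relationship between the two grammars above resembles ordinary class inheritance. The Python sketch below is a hand-written analogue, not a translation produced by any OMeta tool: the class and method names mirror the grammar rules, loops stand in for the left-recursive rules, and `mul_operators` and `evaluate` are hypothetical helpers added so the subclass can extend one rule and defer to its parent for the rest. In this sketch `^` shares the precedence of `*` and `/`.

```python
class BasicCalc:
    """Recursive-descent analogue of the BasicCalc grammar."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def number(self):
        start = self.pos
        while self.peek().isdigit():
            self.pos += 1
        if start == self.pos:
            raise SyntaxError(f"expected a digit at position {start}")
        return int(self.text[start:self.pos])

    def prim_expr(self):
        if self.peek() == "(":
            self.pos += 1
            value = self.expr()
            if self.peek() != ")":
                raise SyntaxError(f"expected ')' at position {self.pos}")
            self.pos += 1
            return value
        return self.number()

    def mul_operators(self):
        return {"*": lambda x, y: x * y, "/": lambda x, y: x / y}

    def mul_expr(self):
        operators = self.mul_operators()
        value = self.prim_expr()
        while self.peek() in operators:
            operation = operators[self.peek()]
            self.pos += 1
            value = operation(value, self.prim_expr())
        return value

    def add_expr(self):
        value = self.mul_expr()
        while self.peek() in ("+", "-"):
            sign = self.peek()
            self.pos += 1
            right = self.mul_expr()
            value = value + right if sign == "+" else value - right
        return value

    def expr(self):
        return self.add_expr()


class ExponentCalc(BasicCalc):
    """Adds one alternative and falls back to the parent for the rest."""

    def mul_operators(self):
        operators = super().mul_operators()
        operators["^"] = lambda x, e: x ** e
        return operators


def evaluate(grammar_class, text):
    """Parse the whole input with the given grammar class."""
    parser = grammar_class(text)
    value = parser.expr()
    if parser.pos != len(text):
        raise SyntaxError(f"unexpected input at position {parser.pos}")
    return value
```

Here `evaluate(BasicCalc, "1+2*3")` gives 7 and `evaluate(ExponentCalc, "2^3")` gives 8, while `evaluate(BasicCalc, "2^3")` raises `SyntaxError` because the parent grammar has no `^` alternative.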

Previously written languages can also be called rather than inherited:

ometa ScientificCalc <: Parser
 {
   MathFunc :n = Token(n) Spaces,
   AdvExp      = MathFunc('sqrt') AdvExp:x -> Math.Sqrt(x)
               | FacExp,
   FacExp      = PrimExp:x '!'
                   ->  {
                         var r = 1;
                         for(; x > 1; x--)
                         {
                           r *= x;
                         }
                         return r;
                       }
               | PrimExp,
   PrimExp     = foreign(ExponentCalc.Expr):x -> x,
   Expr        = AdvExp
 }

[5]
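The idea behind calling another grammar instead of inheriting from it is that the caller lends its input stream to the callee for the duration of one rule. A minimal Python sketch of that idea follows; `Stream`, `NumberGrammar`, `FactorialGrammar`, and the `foreign` method are all hypothetical names chosen for this example and are not OMeta's actual mechanism or API.

```python
import math


class Stream:
    """Shared input: text plus a cursor that every grammar advances."""

    def __init__(self, text):
        self.text = text
        self.pos = 0


class NumberGrammar:
    def __init__(self, stream):
        self.stream = stream

    def number(self):
        stream = self.stream
        start = stream.pos
        while stream.pos < len(stream.text) and stream.text[stream.pos].isdigit():
            stream.pos += 1
        if stream.pos == start:
            raise SyntaxError(f"expected a digit at position {start}")
        return int(stream.text[start:stream.pos])


class FactorialGrammar:
    def __init__(self, stream):
        self.stream = stream

    def foreign(self, grammar_class, rule_name):
        """Lend our stream to another grammar and run one of its rules."""
        return getattr(grammar_class(self.stream), rule_name)()

    def fac_exp(self):
        # The number is parsed by the other grammar, on our stream.
        value = self.foreign(NumberGrammar, "number")
        stream = self.stream
        if stream.pos < len(stream.text) and stream.text[stream.pos] == "!":
            stream.pos += 1
            return math.factorial(value)
        return value
```

Because both grammars advance the same cursor, `FactorialGrammar(Stream("5!")).fac_exp()` lets `NumberGrammar` consume `5` and then continues with `!` itself, with no inheritance relationship between the two classes.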

Versions

OMeta can theoretically be implemented in any host language, but it is used most often as OMeta/JS, a JavaScript implementation.[5] Warth has stated that patterns in "OMeta/X---where X is some host language" are better left to be influenced by "X" than standardized within OMeta, because different host languages recognize different types of objects.[6]

MetaCOLA

MetaCOLA was the first implementation of OMeta, used in the language's introductory paper. MetaCOLA ran the first OMeta test programs, and was one of the three forms of the language (the others being OMeta/Squeak and a nearly finished OMeta/JS) made prior to its release.[1]

OMeta/Squeak

OMeta/Squeak was a port of OMeta used during the initial demonstration of the system. OMeta/Squeak is used "to experiment with alternative syntaxes for the Squeak EToys system". OMeta/Squeak requires square brackets and "pointy brackets" (angle brackets) in rule operations, unlike OMeta/JS, which requires only square brackets.[6] OMeta/Squeak 2, however, features syntax more similar to that of OMeta/JS.[7] Unlike the COLA implementation of OMeta, the Squeak version does not memoize intermediate results (that is, cache the results of rule applications for reuse).[1]

OMeta/JS

OMeta/JS is OMeta in the form of a JavaScript implementation. Language implementations using OMeta/JS are noted to be easier to use and more space-efficient than those written using only vanilla JavaScript, but the former have been shown to perform much more slowly. Because of this, OMeta/JS is seen as a highly useful tool for prototyping, but is not preferred for production language implementations.[3]

Vs. JavaScript

Language implementations built with DSL development tools such as OMeta are considered much more maintainable than "vanilla implementations" (i.e. plain JavaScript) due to their low NCLOC (non-comment lines of code) count. This is due in part to the "semantic action code which creates the AST objects or performs limited string operations". OMeta's lack of "context-free syntax" allows it to be used for both parser and lexer creation, at the cost of extra lines of code. Further indicators of OMeta's maintainability include a high maintainability index, "while Halstead Effort indicate[s] that the vanilla parser requires three times more development effort compared to the OMeta parser". Like the JavaScript implementation, the OMeta/JS implementation supports "the complete syntax notation of Waebric".[3]

One major advantage of OMeta responsible for the difference in NCLOC is the reuse of its "tree walking mechanism": the typechecker inherits the mechanism from the parser, so it adapts automatically to changes in the OMeta parser, whereas the JavaScript tree walking mechanism contains more code and must be adapted manually whenever the parser changes. Another is that OMeta's grammars have a "higher abstraction level...than the program code". The difference can also be considered "the result of the semantic action code which creates the AST objects or performs limited string operations", though the grammar needs relatively many lines of code per construct because whitespace must be defined explicitly, a mechanism that allows OMeta to act as a single tool for DSL creation.[3]

In terms of performance, OMeta is found to run slowly compared with vanilla implementations. OMeta's use of backtracking is one potential major cause (OMeta's parser "includes seven look-ahead operators...These operators are necessary to distinguish certain rules from each other and cannot be left out of the grammar"); however, the performance drop is more likely due to OMeta's method of memoization:

"The storage of intermediate parsing steps causes the size of the parsing table to be proportional with the number of terminals and non-terminals (operands) used in the grammar. Since the grammar of the OMeta parser contains 446 operands, it is believed that performance is affected negatively."[3]

Where OMeta gains time on the vanilla implementation, however, is in lexing. JavaScript's vanilla lexer slows down significantly due to a method by which the implementation converts the entire program into a string through Java before the lexer starts. Despite this, the OMeta implementation runs significantly slower overall.[3]
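The memoization cost quoted earlier in this section can be made concrete with a small packrat-style sketch in Python. It is an illustration under stated assumptions rather than OMeta's actual implementation: `MemoParser`, its `apply` method, and the two toy rules are invented for this example. The table stores one entry per (rule, position) pair that is tried, which is why its size grows with the number of rules in the grammar.

```python
import math


class MemoParser:
    """Toy parser that caches every rule application by (rule, position)."""

    def __init__(self, text):
        self.text = text
        self.memo = {}   # (rule name, position) -> result or None
        self.calls = 0   # number of times a rule body actually ran

    def apply(self, rule_name, pos):
        key = (rule_name, pos)
        if key not in self.memo:
            self.calls += 1
            self.memo[key] = getattr(self, rule_name)(pos)
        return self.memo[key]

    def digits(self, pos):
        end = pos
        while end < len(self.text) and self.text[end].isdigit():
            end += 1
        return (int(self.text[pos:end]), end) if end > pos else None

    def value(self, pos):
        # value = digits '!' | digits
        first = self.apply("digits", pos)
        if first and first[1] < len(self.text) and self.text[first[1]] == "!":
            return math.factorial(first[0]), first[1] + 1
        # Backtrack: the second alternative asks for digits at the same
        # position again, and the answer comes from the table.
        return self.apply("digits", pos)


parser = MemoParser("12")
result = parser.apply("value", 0)
```

After this run the `digits` body has executed only once even though two alternatives asked for it, at the price of keeping both table entries alive. The trade-off is time saved on backtracking against memory and bookkeeping per rule and position.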

OMeta also falls behind in terms of error reporting. While vanilla implementations report the correct error location in about "92% of the test cases", OMeta simply returns "Match failed!" for every error. Finding the source of an error through OMeta requires "manually...counting the newline characters in the semantic action code in order to output at least the line number at which parsing fails".[3]
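The newline-counting workaround amounts to converting a failure offset into a line number. A minimal Python sketch of that conversion is shown below, assuming the parser can report the character offset at which matching failed; the function name `line_and_column` is hypothetical.

```python
def line_and_column(source, position):
    """Convert a 0-based character offset into 1-based (line, column)."""
    # Every newline before the offset starts a new line.
    line = source.count("\n", 0, position) + 1
    # rfind returns -1 when there is no earlier newline, which makes
    # the column on the first line come out as position + 1.
    last_newline = source.rfind("\n", 0, position)
    column = position - last_newline
    return line, column


source = "a\nbc\nd"
location = line_and_column(source, 3)  # offset of the character 'c'
```

A host-language wrapper could catch the generic match failure and attach such a location to the message, which is roughly what the manual approach described above achieves.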

OMeta#

OMeta# is a project by Jeff Moser meant to port OMeta/JS to C#; as such, the design of OMeta# is based on Alessandro Warth's OMeta/JS design. The goal of the project is to let users create working languages as simply as possible. Specifically, OMeta# is intended to work as a single tool for .NET language development, reduce the steep learning curve of language development, become a useful teaching resource, and be practical for use in real applications.[5] OMeta# currently uses C# 3.0 as OMeta's host language rather than 4.0; because C# 3.0 is a statically typed language rather than a dynamic one, recognition of the host language within OMeta# is "two to three times uglier and larger than it might have been" in a dynamically typed language.[8]

OMeta# uses .NET classes, or Types, as grammars and methods for the grammars’ internal "rules". OMeta# uses braces ( { and } ) to recognize its host language in grammars. The language has a focus on strong, clean, static typing much like that of its host language, though this adds complexity to the creation of the language. New implementations in C# must also be compatible with the .NET meta-language, making the creation even more complex. Additionally, to prevent users from accidentally misusing the metarules in OMeta#, Moser has opted to implement them as "an explicit interface exposed via a property (e.g. instead of "_apply", I have "MetaRules.Apply")." Later parts of OMeta# are written in the language itself, though the functionality of the language remains fairly tied to C#.[9] The OMeta# source code is posted on Codeplex, and is intended to remain as an open-source project. However, updates have been on indefinite hiatus since shortly after the project's beginnings, with recommits by the server on October 1, 2012.[5]

IronMeta

Gordon Tisher created IronMeta for .NET in 2009. While similar to OMeta#, it is a more actively supported and robust implementation, distributed under a BSD license on GitHub.

Ohm

Ohm is a successor to OMeta that aims to improve on it by, among other things, separating the grammar from the semantic actions.[10]

References

  1. Warth, Alessandro, and Ian Piumarta. "OMeta: An Object-Oriented Language for Pattern Matching." ACM SIGPLAN 2007 Dynamic Languages Symposium (DLS '07). 03rd ed. Vol. TR-2007. Glendale, CA: Viewpoints Research Institute, 2007. VPRI Technical Report. Web. 30 Sept. 2013.
  2. Klint, Paul, Tijs Van Der Storm, and Jurgen Vinju. "On the Impact of DSL Tools on the Maintainability of Language Implementations." LDTA '10 Proceedings of the Tenth Workshop on Language Descriptions, Tools and Applications. New York, NY. N.p., 2010. Web. 30 Sept. 2013.
  3. Heirbaut, Nickolas. "Two Implementation Techniques for Domain Specific Languages Compared: OMeta/JS vs. Javascript." Thesis. University of Amsterdam, 2009. Web. 30 Sept. 2013. <http://dare.uva.nl/document/153293>.
  4. Mascarenhas, Fabio, Sergio Medeiros, and Roberto Ierusalimschy. Parsing Expression Grammars for Structured Data. N.p.: n.p., n.d. Web. <http://www.lbd.dcc.ufmg.br/colecoes/sblp/2011/003.pdf>.
  5. Moser, Jeff. "OMeta#: Who? What? When? Where? Why?" Moserware. Blogger, 24 June 2008. Web. 30 Sept. 2013.
  6. Warth, Alessandro. "[Ometa] On OMeta's Syntax." N.p., 4 July 2008. Web. 16 Oct. 2013. <http://vpri.org/pipermail/ometa/2008-July/000051.html>.
  7. Warth, Alessandro. "OMeta/Squeak 2." N.p., n.d. Web. 16 Oct. 2013. <http://tinlizzie.org/ometa/ometa2.html>.
  8. Moser, Jeff. "Meta-FizzBuzz." Moserware. Blogger, 25 August 2008. Web. 30 Sept. 2013.
  9. Moser, Jeff. "Building an Object-Oriented Parasitic Metalanguage." Moserware. Blogger, 31 July 2008. Web. 30 Sept. 2013.
  10. "Ohm Philosophy". https://github.com/cdglabs/ohm/blob/master/doc/philosophy.md.