E-graph
In computer science, an e-graph is a data structure that stores an equivalence relation over terms of some language.
Definition and operations
Let [math]\displaystyle{ \Sigma }[/math] be a set of uninterpreted functions, where [math]\displaystyle{ \Sigma_n }[/math] is the subset of [math]\displaystyle{ \Sigma }[/math] consisting of functions of arity [math]\displaystyle{ n }[/math]. Let [math]\displaystyle{ \mathbb{id} }[/math] be a countable set of opaque identifiers that may be compared for equality, called e-class IDs. The application of [math]\displaystyle{ f\in\Sigma_n }[/math] to e-class IDs [math]\displaystyle{ i_1, i_2, \ldots, i_n\in\mathbb{id} }[/math] is denoted [math]\displaystyle{ f(i_1, i_2, \ldots, i_n) }[/math] and called an e-node.
The e-graph then represents equivalence classes of e-nodes, using the following data structures:[1]
- A union-find structure [math]\displaystyle{ U }[/math] representing equivalence classes of e-class IDs, with the usual operations [math]\displaystyle{ \mathrm{find} }[/math], [math]\displaystyle{ \mathrm{add} }[/math] and [math]\displaystyle{ \mathrm{merge} }[/math]. An e-class ID [math]\displaystyle{ e }[/math] is canonical if [math]\displaystyle{ \mathrm{find}(U, e) = e }[/math]; an e-node [math]\displaystyle{ f(i_1,\ldots,i_n) }[/math] is canonical if each [math]\displaystyle{ i_j }[/math] is canonical ([math]\displaystyle{ j }[/math] in [math]\displaystyle{ 1,\ldots,n }[/math]).
- An association of e-class IDs with sets of e-nodes, called e-classes. This consists of
- a hashcons [math]\displaystyle{ H }[/math] (i.e. a mapping) from canonical e-nodes to e-class IDs, and
- an e-class map [math]\displaystyle{ M }[/math] that maps e-class IDs to e-classes, such that [math]\displaystyle{ M }[/math] maps equivalent IDs to the same set of e-nodes: [math]\displaystyle{ \forall i, j\in\mathbb{id},M[i]=M[j]\Leftrightarrow \mathrm{find}(U,i)=\mathrm{find}(U,j) }[/math]
Invariants
In addition to the above structure, a valid e-graph conforms to several data structure invariants.[2] Two e-nodes are equivalent if they are in the same e-class. The congruence invariant states that an e-graph must ensure that equivalence is closed under congruence, where two e-nodes [math]\displaystyle{ f(i_1,\ldots,i_n),f(j_1,\ldots,j_n) }[/math] are congruent when [math]\displaystyle{ \mathrm{find}(U, i_k)=\mathrm{find}(U, j_k),k\in \{1,\ldots,n\} }[/math]. The hashcons invariant states that the hashcons maps canonical e-nodes to their e-class ID.
Operations
E-graphs expose wrappers around the [math]\displaystyle{ \mathrm{add} }[/math], [math]\displaystyle{ \mathrm{find} }[/math], and [math]\displaystyle{ \mathrm{merge} }[/math] operations from the union-find that preserve the e-graph invariants. The last operation, e-matching, is described below.
E-matching
Let [math]\displaystyle{ V }[/math] be a set of variables and let [math]\displaystyle{ \mathrm{Term}(\Sigma, V) }[/math] be the smallest set that includes the 0-arity function symbols (also called constants), includes the variables, and is closed under application of the function symbols. In other words, [math]\displaystyle{ \mathrm{Term}(\Sigma, V) }[/math] is the smallest set such that [math]\displaystyle{ V\subset\mathrm{Term}(V, \Sigma) }[/math], [math]\displaystyle{ \Sigma_0\subset\mathrm{Term}(\Sigma, V) }[/math], and when [math]\displaystyle{ x_1, \ldots, x_n\in \mathrm{Term}(\Sigma, V) }[/math] and [math]\displaystyle{ f\in\Sigma_n }[/math], then [math]\displaystyle{ f(x_1,\ldots,x_n)\in\mathrm{Term}(\Sigma, V) }[/math]. A term containing variables is called a pattern, a term without variables is called ground.
An e-graph [math]\displaystyle{ E }[/math] represents a ground term [math]\displaystyle{ t\in\mathrm{Term}(\Sigma, \emptyset) }[/math] if one of its e-classes represents [math]\displaystyle{ t }[/math]. An e-class [math]\displaystyle{ C }[/math] represents [math]\displaystyle{ t }[/math] if some e-node [math]\displaystyle{ f(i_1,\ldots,i_n)\in C }[/math] does. An e-node [math]\displaystyle{ f(i_1,\ldots,i_n)\in C }[/math] represents a term [math]\displaystyle{ g(j_1,\ldots,j_n) }[/math] if [math]\displaystyle{ f=g }[/math] and each e-class [math]\displaystyle{ M[i_k] }[/math] represents the term [math]\displaystyle{ j_k }[/math] ([math]\displaystyle{ k }[/math] in [math]\displaystyle{ 1,\ldots,n }[/math]).
e-matching is an operation that takes a pattern [math]\displaystyle{ p\in\mathrm{Term}(\Sigma, V) }[/math] and an e-graph [math]\displaystyle{ E }[/math], and yields all pairs [math]\displaystyle{ (\sigma, C) }[/math] where [math]\displaystyle{ \sigma\subset V\times\mathbb{id} }[/math] is a substitution mapping the variables in [math]\displaystyle{ p }[/math] to e-class IDs and [math]\displaystyle{ C\in\mathbb{id} }[/math] is an e-class ID such that each term [math]\displaystyle{ \sigma(p) }[/math] is represented by [math]\displaystyle{ C }[/math]. There are several known algorithms for e-matching,[3][4] the relational e-matching algorithm is based on worst-case optimal joins and is worst-case optimal.[5]
Complexity
- An e-graph with n equalities can be constructed in O(n log n) time.[6]
Equality saturation
Equality saturation is a technique for building optimizing compilers using e-graphs.[7] It operates by applying a set of rewrites using e-matching until the e-graph is saturated, a timeout is reached, an e-graph size limit is reached, a fixed number of iterations is exceeded, or some other halting condition is reached. After rewriting, an optimal term is extracted from the e-graph according to some cost function, usually related to AST size or performance considerations.
Applications
E-graphs are used in automated theorem proving. They are a crucial part of modern SMT solvers such as Z3[8] and CVC4, where they are used to decide the empty theory by computing the congruence closure of a set of equalities, and e-matching is used to instantiate quantifiers.[9] In DPLL(T)-based solvers that use conflict-driven clause learning (also known as non-chronological backtracking), e-graphs are extended to produce proof certificates.[10] E-graphs are also used in the Simplify theorem prover of ESC/Java.[11]
Equality saturation is used in specialized optimizing compilers,[12] e.g. for deep learning[13] and linear algebra.[14] Equality saturation has also been used for translation validation applied to the LLVM toolchain.[15]
E-graphs have been applied to several problems in program analysis, including fuzzing,[16] abstract interpretation,[17][18] and library learning.[19]
References
- ↑ (Willsey Nandi)
- ↑ (Willsey Nandi)
- ↑ (de Moura Bjørner)
- ↑ Moskal, Michał; Łopuszański, Jakub; Kiniry, Joseph R. (2008-05-06). "E-matching for Fun and Profit" (in en). Electronic Notes in Theoretical Computer Science. Proceedings of the 5th International Workshop on Satisfiability Modulo Theories (SMT 2007) 198 (2): 19–35. doi:10.1016/j.entcs.2008.04.078. ISSN 1571-0661.
- ↑ Zhang, Yihong; Wang, Yisu Remy; Willsey, Max; Tatlock, Zachary (2022-01-12). "Relational e-matching". Proceedings of the ACM on Programming Languages 6 (POPL): 35:1–35:22. doi:10.1145/3498696.
- ↑ (Flatt Coward)
- ↑ (Tate Stepp)
- ↑ de Moura, Leonardo; Bjørner, Nikolaj (2008). "Z3: An Efficient SMT Solver". in Ramakrishnan, C. R.; Rehof, Jakob (in en). Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science. 4963. Berlin, Heidelberg: Springer. pp. 337–340. doi:10.1007/978-3-540-78800-3_24. ISBN 978-3-540-78800-3.
- ↑ Rümmer, Philipp (2012). Bjørner, Nikolaj; Voronkov, Andrei. eds. "E-Matching with Free Variables" (in en). Logic for Programming, Artificial Intelligence, and Reasoning. Lecture Notes in Computer Science (Berlin, Heidelberg: Springer): 359–374. doi:10.1007/978-3-642-28717-6_28. ISBN 978-3-642-28717-6. https://link.springer.com/chapter/10.1007/978-3-642-28717-6_28.
- ↑ (Flatt Coward)
- ↑ Detlefs, David; Nelson, Greg; Saxe, James B. (2005-05-01). "Simplify: a theorem prover for program checking". Journal of the ACM 52 (3): 365–473. doi:10.1145/1066100.1066102. ISSN 0004-5411. https://doi.org/10.1145/1066100.1066102.
- ↑ Joshi, Rajeev; Nelson, Greg; Randall, Keith (2002-05-17). "Denali: a goal-directed superoptimizer". ACM SIGPLAN Notices 37 (5): 304–314. doi:10.1145/543552.512566. ISSN 0362-1340. https://doi.org/10.1145/543552.512566.
- ↑ Yang, Yichen; Phothilimtha, Phitchaya Mangpo; Wang, Yisu Remy; Willsey, Max; Roy, Sudip; Pienaar, Jacques (2021-03-17). "Equality Saturation for Tensor Graph Superoptimization". arXiv:2101.01332 [cs.AI].
- ↑ Wang, Yisu Remy; Hutchison, Shana; Leang, Jonathan; Howe, Bill; Suciu, Dan (2020-12-22). "SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra". arXiv:2002.07951 [cs.DB].
- ↑ Stepp, Michael; Tate, Ross; Lerner, Sorin (2011). "Equality-Based Translation Validator for LLVM". in Gopalakrishnan, Ganesh; Qadeer, Shaz (in en). Computer Aided Verification. Lecture Notes in Computer Science. 6806. Berlin, Heidelberg: Springer. pp. 737–742. doi:10.1007/978-3-642-22110-1_59. ISBN 978-3-642-22110-1.
- ↑ "Wasm-mutate: Fuzzing WebAssembly Compilers with E-Graphs (EGRAPHS 2022) - PLDI 2022". https://pldi22.sigplan.org/details/egraphs-2022-papers/3/Wasm-mutate-Fuzzing-WebAssembly-Compilers-with-E-Graphs.
- ↑ Coward, Samuel; Constantinides, George A.; Drane, Theo (2022-03-17). Abstract Interpretation on E-Graphs.
- ↑ Coward, Samuel; Constantinides, George A.; Drane, Theo (2022-05-30). Combining E-Graphs with Abstract Interpretation.
- ↑ Cao, David; Kunkel, Rose; Nandi, Chandrakana; Willsey, Max; Tatlock, Zachary; Polikarpova, Nadia (2023-01-09). "babble: Learning Better Abstractions with E-Graphs and Anti-Unification". Proceedings of the ACM on Programming Languages 7 (POPL): 396–424. doi:10.1145/3571207. ISSN 2475-1421.
- de Moura, Leonardo; Bjørner, Nikolaj (2007). "Efficient E-Matching for SMT Solvers". in Pfenning, Frank (in en). Automated Deduction – CADE-21. Lecture Notes in Computer Science. 4603. Berlin, Heidelberg: Springer. pp. 183–198. doi:10.1007/978-3-540-73595-3_13. ISBN 978-3-540-73595-3. https://link.springer.com/chapter/10.1007/978-3-540-73595-3_13.
- Willsey, Max; Nandi, Chandrakana; Wang, Yisu Remy; Flatt, Oliver; Tatlock, Zachary; Panchekha, Pavel (2021-01-04). "egg: Fast and extensible equality saturation". Proceedings of the ACM on Programming Languages 5 (POPL): 23:1–23:29. doi:10.1145/3434304. https://doi.org/10.1145/3434304.
- Tate, Ross; Stepp, Michael; Tatlock, Zachary; Lerner, Sorin (2009-01-21). "Equality saturation". Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '09. Savannah, GA, USA: Association for Computing Machinery. pp. 264–276. doi:10.1145/1480881.1480915. ISBN 978-1-60558-379-2. https://doi.org/10.1145/1480881.1480915.
- Flatt, Oliver; Coward, Samuel; Willsey, Max; Tatlock, Zachary; Panchekha, Pavel (October 2022). "Small Proofs from Congruence Closure" (in en). Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022. TU Wien Academic Press. pp. 75–83. doi:10.34727/2022/isbn.978-3-85448-053-2_13. ISBN 978-3-85448-053-2. https://repositum.tuwien.at/handle/20.500.12708/81325.
External links
Original source: https://en.wikipedia.org/wiki/E-graph.
Read more |