Pointer analysis
In computer science, pointer analysis, or points-to analysis, is a static code analysis technique that establishes which pointers, or heap references, can point to which variables, or storage locations. It is often a component of more complex analyses such as escape analysis. A closely related technique is shape analysis.
This is the most common colloquial use of the term. A secondary use has pointer analysis be the collective name for both points-to analysis, defined as above, and alias analysis. Points-to and alias analysis are closely related but not always equivalent problems.
Example
For the following example program, a points-to analysis would compute that the points-to set of p
is {x
, y
}.
int x; int y; int* p = unknown() ? &x : &y;
Introduction
As a form of static analysis, fully precise pointer analysis can be shown to be undecidable.[1] Most approaches are sound, but range widely in performance and precision. Many design decisions impact both the precision and performance of an analysis; often (but not always) lower precision yields higher performance. These choices include:[2][3]
- Field sensitivity (also known as structure sensitivity): An analysis can either treat each field of a struct or object separately, or merge them.
- Array sensitivity: An array-sensitive pointer analysis models each index in an array separately. Other choices include modelling just the first entry separately and the rest together, or merging all array entries.
- Context sensitivity or polyvariance: Pointer analyses may qualify points-to information with a summary of the control flow leading to each program point.
- Flow sensitivity: An analysis can model the impact of intraprocedural control flow on points-to facts.
- Heap modeling: Run-time allocations may be abstracted by:
- their allocation sites (the statement or instruction that performs the allocation, e.g., a call to
malloc
or an object constructor), - a more complex model based on a shape analysis,
- the type of the allocation, or
- one single allocation (this is called heap-insensitivity).
- their allocation sites (the statement or instruction that performs the allocation, e.g., a call to
- Heap cloning: Heap- and context-sensitive analyses may further qualify each allocation site by a summary of the control flow leading to the instruction or statement performing the allocation.
- Subset constraints or equality constraints: When propagating points-to facts, different program statements may induce different constraints on a variable's points-to sets. Equality constraints (like those used in Steensgaard's algorithm) can be tracked with a union-find data structure, leading to high performance at the expense of the precision of a subset-constraint based analysis (e.g., Andersen's algorithm).
Context-Insensitive, Flow-Insensitive Algorithms
Pointer analysis algorithms are used to convert collected raw pointer usages (assignments of one pointer to another or assigning a pointer to point to another one) to a useful graph of what each pointer can point to.[4]
Steensgaard's algorithm and Andersen's algorithm are common context-insensitive, flow-insensitive algorithms for pointer analysis. They are often used in compilers, and have implementations in SVF [5] and LLVM.
Flow-Insensitive Approaches
Many approaches to flow-insensitive pointer analysis can be understood as forms of abstract interpretation, where heap allocations are abstracted by their allocation site (i.e., a program location).[6]
Many flow-insensitive algorithms are specified in Datalog, including those in the Soot analysis framework for Java.[7]
Context-sensitive, flow-sensitive algorithms achieve higher precision, generally at the cost of some performance, by analyzing each procedure several times, once per context.[8] Most analyses use a "context-string" approach, where contexts consist of a list of entries (common choices of context entry include call sites, allocation sites, and types).[9] To ensure termination (and more generally, scalability), such analyses generally use a k-limiting approach, where the context has a fixed maximum size, and the least recently added elements are removed as needed.[10] Three common variants of context-sensitive, flow-insensitive analysis are:[11]
- Call-site sensitivity
- Object sensitivity
- Type sensitivity
Call-site sensitivity
In call-site sensitivity, the points-to set of each variable (the set of abstract heap allocations each variable could point to) is further qualified by a context consisting of a list of callsites in the program. These contexts abstract the control-flow of the program.
The following program demonstrates how call-site sensitivity can achieve higher precision than a flow-insensitive, context-insensitive analysis.
int *id(int* x) { return x; } int main() { int y; int z; int *y2 = id(&y); // call-site 1 int *z2 = id(&z); // call-site 2 return 0; }
For this program, a context-insensitive analysis would (soundly but imprecisely) conclude that x can point to either the allocation holding y or that of z, so y2 and z2 may alias, and both could point to either allocation. A callsite-sensitive analysis would analyze id twice, once for call-site 1 and once for call-site 2, and the points-to facts for x would be qualified by the call-site, enabling the analysis to deduce that when main returns, y2 can only point to the allocation holding y and z2 can only point to the allocation holding z.
Object sensitivity
In an object sensitive analysis, the points-to set of each variable is qualified by the abstract heap allocation of the receiver object of the method call. Unlike call-site sensitivity, object-sensitivity is non-syntactic or non-local: the context entries are derived during the points-to analysis itself.[12]
Type sensitivity
Type sensitivity is a variant of object sensitivity where the allocation site of the receiver object is replaced by the class/type containing the method containing the allocation site of the receiver object.[13] This results in strictly fewer contexts than would be used in an object-sensitive analysis, which generally means better performance.
References
- ↑ Reps, Thomas (2000-01-01). "Undecidability of context-sensitive data-dependence analysis". ACM Transactions on Programming Languages and Systems 22 (1): 162–186. doi:10.1145/345099.345137. ISSN 0164-0925.
- ↑ Barbara G. Ryder (2003). "Dimensions of Precision in Reference Analysis of Object-Oriented Programming Languages". pp. 126–137. doi:10.1007/3-540-36579-6_10.
- ↑ (Hind {{{2}}})
- ↑ Zyrianov, Vlas; Newman, Christian D.; Guarnera, Drew T.; Collard, Michael L.; Maletic, Jonathan I. (2019). "srcPtr: A Framework for Implementing Static Pointer Analysis Approaches". Montreal, Canada: IEEE. https://www.zyrianov.org/papers/ICPC19.pdf.
- ↑ Sui, Yulei; Xue, Jingling (2016). "SVF: interprocedural static value-flow analysis in LLVM". ACM. https://yuleisui.github.io/publications/cc16.pdf.
- ↑ Smaragdakis, Yannis; Bravenboer, Martin; Lhoták, Ondrej (2011-01-26). "Pick your contexts well". Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '11. Austin, Texas, USA: Association for Computing Machinery. pp. 17–30. doi:10.1145/1926385.1926390. ISBN 978-1-4503-0490-0. https://doi.org/10.1145/1926385.1926390.
- ↑ Antoniadis, Tony; Triantafyllou, Konstantinos; Smaragdakis, Yannis (2017-06-18). "Porting doop to Soufflé". Proceedings of the 6th ACM SIGPLAN International Workshop on State of the Art in Program Analysis. SOAP 2017. Barcelona, Spain: Association for Computing Machinery. pp. 25–30. doi:10.1145/3088515.3088522. ISBN 978-1-4503-5072-3. https://doi.org/10.1145/3088515.3088522.
- ↑ (Smaragdakis Balatsouras)
- ↑ Thiessen, Rei; Lhoták, Ondřej (2017-06-14). "Context transformations for pointer analysis". ACM SIGPLAN Notices 52 (6): 263–277. doi:10.1145/3140587.3062359. ISSN 0362-1340. https://doi.org/10.1145/3140587.3062359.
- ↑ (Li Tan)
- ↑ (Smaragdakis Balatsouras)
- ↑ (Smaragdakis Balatsouras)
- ↑ (Smaragdakis Balatsouras)
Bibliography
- Zyrianov, Vlas; Newman, Christian D.; Guarnera, Drew T.; Collard, Michael L.; Maletic, Jonathan I. (2019). "srcPtr: A Framework for Implementing Static Pointer Analysis Approaches". Montreal, Canada: IEEE. https://www.zyrianov.org/papers/ICPC19.pdf.
- Smaragdakis, Yannis; Balatsouras, George (2015). "Pointer Analysis". Foundations and Trends in Programming Languages 2 (1): 1–69. doi:10.1561/2500000014. https://yanniss.github.io/points-to-tutorial15.pdf. Retrieved May 30, 2019.
- Li, Yue; Tan/, Tian; Møller, Anders; Smaragdakis, Yannis (2020-05-18). "A Principled Approach to Selective Context Sensitivity for Pointer Analysis". ACM Transactions on Programming Languages and Systems 42 (2): 10:1–10:40. doi:10.1145/3381915. ISSN 0164-0925. https://doi.org/10.1145/3381915.
- Michael Hind (2001). "Pointer analysis: haven't we solved this problem yet?". ACM. pp. 54–61. ISBN 1-58113-413-4. http://www.cs.trinity.edu/~mlewis/CSCI3294-F01/Papers/p54-hind.pdf.
- Steensgaard, Bjarne (1996). "Points-to analysis in almost linear time". New York, NY, USA: ACM. pp. 32–41. doi:10.1145/237721.237727. ISBN 0-89791-769-3. http://cms.dc.uba.ar/materias/aap/2008/c2/descargas/pointsTo.pdf.
- Andersen, Lars Ole (1994). Program Analysis and Specialization for the C Programming Language (PDF) (PhD thesis).
Original source: https://en.wikipedia.org/wiki/Pointer analysis.
Read more |