Pointer analysis

Short description: Determining what or where each pointer points to in program code

In computer science, pointer analysis, or points-to analysis, is a static code analysis technique that establishes which pointers, or heap references, can point to which variables, or storage locations. It is often a component of more complex analyses such as escape analysis. A closely related technique is shape analysis.

This is the most common colloquial use of the term. A secondary use has pointer analysis be the collective name for both points-to analysis, defined as above, and alias analysis. Points-to and alias analysis are closely related but not always equivalent problems.

Example

For the following example program, a points-to analysis would compute that the points-to set of p is {x, y}.

int x;
int y;
int* p = unknown() ? &x : &y;

Introduction

As a form of static analysis, fully precise pointer analysis can be shown to be undecidable.^[1] Most approaches are sound, but range widely in performance and precision. Many design decisions impact both the precision and performance of an analysis; often (but not always) lower precision yields higher performance. These choices include:^[2]^[3]

Field sensitivity (also known as structure sensitivity): An analysis can either treat each field of a struct or object separately, or merge them.
Array sensitivity: An array-sensitive pointer analysis models each index in an array separately. Other choices include modelling just the first entry separately and the rest together, or merging all array entries.
Context sensitivity or polyvariance: Pointer analyses may qualify points-to information with a summary of the control flow leading to each program point.
Flow sensitivity: An analysis can model the impact of intraprocedural control flow on points-to facts.
Heap modeling: Run-time allocations may be abstracted by:
- their allocation sites (the statement or instruction that performs the allocation, e.g., a call to malloc or an object constructor),
- a more complex model based on a shape analysis,
- the type of the allocation, or
- one single allocation (this is called heap-insensitivity).
Heap cloning: Heap- and context-sensitive analyses may further qualify each allocation site by a summary of the control flow leading to the instruction or statement performing the allocation.
Subset constraints or equality constraints: When propagating points-to facts, different program statements may induce different constraints on a variable's points-to sets. Equality constraints (like those used in Steensgaard's algorithm) can be tracked with a union-find data structure, leading to high performance at the expense of the precision of a subset-constraint based analysis (e.g., Andersen's algorithm).

Context-Insensitive, Flow-Insensitive Algorithms

Pointer analysis algorithms are used to convert collected raw pointer usages (assignments of one pointer to another or assigning a pointer to point to another one) to a useful graph of what each pointer can point to.^[4]

Steensgaard's algorithm and Andersen's algorithm are common context-insensitive, flow-insensitive algorithms for pointer analysis. They are often used in compilers, and have implementations in SVF ^[5] and LLVM.

Flow-Insensitive Approaches

Many approaches to flow-insensitive pointer analysis can be understood as forms of abstract interpretation, where heap allocations are abstracted by their allocation site (i.e., a program location).^[6]

Flow-insensitive pointer analyses often abstract possible runtime allocations by their allocation site. At runtime, this program creates three separate heap allocations. A flow-insensitive pointer analysis would treat these as a single abstract memory location, leading to a loss of precision.

Many flow-insensitive algorithms are specified in Datalog, including those in the Soot analysis framework for Java.^[7]

Context-sensitive, flow-sensitive algorithms achieve higher precision, generally at the cost of some performance, by analyzing each procedure several times, once per context.^[8] Most analyses use a "context-string" approach, where contexts consist of a list of entries (common choices of context entry include call sites, allocation sites, and types).^[9] To ensure termination (and more generally, scalability), such analyses generally use a k-limiting approach, where the context has a fixed maximum size, and the least recently added elements are removed as needed.^[10] Three common variants of context-sensitive, flow-insensitive analysis are:^[11]

Call-site sensitivity
Object sensitivity
Type sensitivity

Call-site sensitivity

In call-site sensitivity, the points-to set of each variable (the set of abstract heap allocations each variable could point to) is further qualified by a context consisting of a list of callsites in the program. These contexts abstract the control-flow of the program.

The following program demonstrates how call-site sensitivity can achieve higher precision than a flow-insensitive, context-insensitive analysis.

int *id(int* x) {
  return x;
}
int main() {
  int y;
  int z;
  int *y2 = id(&y); // call-site 1
  int *z2 = id(&z); // call-site 2
  return 0;
}

For this program, a context-insensitive analysis would (soundly but imprecisely) conclude that x can point to either the allocation holding y or that of z, so y2 and z2 may alias, and both could point to either allocation. A callsite-sensitive analysis would analyze id twice, once for call-site 1 and once for call-site 2, and the points-to facts for x would be qualified by the call-site, enabling the analysis to deduce that when main returns, y2 can only point to the allocation holding y and z2 can only point to the allocation holding z.

Object sensitivity

In an object sensitive analysis, the points-to set of each variable is qualified by the abstract heap allocation of the receiver object of the method call. Unlike call-site sensitivity, object-sensitivity is non-syntactic or non-local: the context entries are derived during the points-to analysis itself.^[12]

Type sensitivity

Type sensitivity is a variant of object sensitivity where the allocation site of the receiver object is replaced by the class/type containing the method containing the allocation site of the receiver object.^[13] This results in strictly fewer contexts than would be used in an object-sensitive analysis, which generally means better performance.

References

↑ Reps, Thomas (2000-01-01). "Undecidability of context-sensitive data-dependence analysis". ACM Transactions on Programming Languages and Systems 22 (1): 162–186. doi:10.1145/345099.345137. ISSN 0164-0925.
↑ Barbara G. Ryder (2003). "Dimensions of Precision in Reference Analysis of Object-Oriented Programming Languages". pp. 126–137. doi:10.1007/3-540-36579-6_10.
↑ (Hind {{{2}}})
↑ Zyrianov, Vlas; Newman, Christian D.; Guarnera, Drew T.; Collard, Michael L.; Maletic, Jonathan I. (2019). "srcPtr: A Framework for Implementing Static Pointer Analysis Approaches". Montreal, Canada: IEEE. https://www.zyrianov.org/papers/ICPC19.pdf.
↑ Sui, Yulei; Xue, Jingling (2016). "SVF: interprocedural static value-flow analysis in LLVM". ACM. https://yuleisui.github.io/publications/cc16.pdf.
↑ Smaragdakis, Yannis; Bravenboer, Martin; Lhoták, Ondrej (2011-01-26). "Pick your contexts well". Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '11. Austin, Texas, USA: Association for Computing Machinery. pp. 17–30. doi:10.1145/1926385.1926390. ISBN 978-1-4503-0490-0. https://doi.org/10.1145/1926385.1926390.
↑ Antoniadis, Tony; Triantafyllou, Konstantinos; Smaragdakis, Yannis (2017-06-18). "Porting doop to Soufflé". Proceedings of the 6th ACM SIGPLAN International Workshop on State of the Art in Program Analysis. SOAP 2017. Barcelona, Spain: Association for Computing Machinery. pp. 25–30. doi:10.1145/3088515.3088522. ISBN 978-1-4503-5072-3. https://doi.org/10.1145/3088515.3088522.
↑ (Smaragdakis Balatsouras)
↑ Thiessen, Rei; Lhoták, Ondřej (2017-06-14). "Context transformations for pointer analysis". ACM SIGPLAN Notices 52 (6): 263–277. doi:10.1145/3140587.3062359. ISSN 0362-1340. https://doi.org/10.1145/3140587.3062359.
↑ (Li Tan)
↑ (Smaragdakis Balatsouras)
↑ (Smaragdakis Balatsouras)
↑ (Smaragdakis Balatsouras)

Bibliography

Zyrianov, Vlas; Newman, Christian D.; Guarnera, Drew T.; Collard, Michael L.; Maletic, Jonathan I. (2019). "srcPtr: A Framework for Implementing Static Pointer Analysis Approaches". Montreal, Canada: IEEE. https://www.zyrianov.org/papers/ICPC19.pdf.
Smaragdakis, Yannis; Balatsouras, George (2015). "Pointer Analysis". Foundations and Trends in Programming Languages 2 (1): 1–69. doi:10.1561/2500000014. https://yanniss.github.io/points-to-tutorial15.pdf. Retrieved May 30, 2019.
Li, Yue; Tan/, Tian; Møller, Anders; Smaragdakis, Yannis (2020-05-18). "A Principled Approach to Selective Context Sensitivity for Pointer Analysis". ACM Transactions on Programming Languages and Systems 42 (2): 10:1–10:40. doi:10.1145/3381915. ISSN 0164-0925. https://doi.org/10.1145/3381915.
Michael Hind (2001). "Pointer analysis: haven't we solved this problem yet?". ACM. pp. 54–61. ISBN 1-58113-413-4. http://www.cs.trinity.edu/~mlewis/CSCI3294-F01/Papers/p54-hind.pdf.
Steensgaard, Bjarne (1996). "Points-to analysis in almost linear time". New York, NY, USA: ACM. pp. 32–41. doi:10.1145/237721.237727. ISBN 0-89791-769-3. http://cms.dc.uba.ar/materias/aap/2008/c2/descargas/pointsTo.pdf.
Andersen, Lars Ole (1994). Program Analysis and Specialization for the C Programming Language (PDF) (PhD thesis).

0.00

(0 votes)

Original source: https://en.wikipedia.org/wiki/Pointer analysis. Read more

[1] Reps, Thomas (2000-01-01). "Undecidability of context-sensitive data-dependence analysis". ACM Transactions on Programming Languages and Systems 22 (1): 162–186. doi:10.1145/345099.345137. ISSN 0164-0925.

[2] Barbara G. Ryder (2003). "Dimensions of Precision in Reference Analysis of Object-Oriented Programming Languages". pp. 126–137. doi:10.1007/3-540-36579-6_10.

[3] (Hind {{{2}}})

[4] Zyrianov, Vlas; Newman, Christian D.; Guarnera, Drew T.; Collard, Michael L.; Maletic, Jonathan I. (2019). "srcPtr: A Framework for Implementing Static Pointer Analysis Approaches". Montreal, Canada: IEEE. https://www.zyrianov.org/papers/ICPC19.pdf.

[5] Sui, Yulei; Xue, Jingling (2016). "SVF: interprocedural static value-flow analysis in LLVM". ACM. https://yuleisui.github.io/publications/cc16.pdf.

[6] Smaragdakis, Yannis; Bravenboer, Martin; Lhoták, Ondrej (2011-01-26). "Pick your contexts well". Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. POPL '11. Austin, Texas, USA: Association for Computing Machinery. pp. 17–30. doi:10.1145/1926385.1926390. ISBN 978-1-4503-0490-0. https://doi.org/10.1145/1926385.1926390.

[7] Antoniadis, Tony; Triantafyllou, Konstantinos; Smaragdakis, Yannis (2017-06-18). "Porting doop to Soufflé". Proceedings of the 6th ACM SIGPLAN International Workshop on State of the Art in Program Analysis. SOAP 2017. Barcelona, Spain: Association for Computing Machinery. pp. 25–30. doi:10.1145/3088515.3088522. ISBN 978-1-4503-5072-3. https://doi.org/10.1145/3088515.3088522.

[8] (Smaragdakis Balatsouras)

[9] Thiessen, Rei; Lhoták, Ondřej (2017-06-14). "Context transformations for pointer analysis". ACM SIGPLAN Notices 52 (6): 263–277. doi:10.1145/3140587.3062359. ISSN 0362-1340. https://doi.org/10.1145/3140587.3062359.

[10] (Li Tan)

[11] (Smaragdakis Balatsouras)

[12] (Smaragdakis Balatsouras)

[13] (Smaragdakis Balatsouras)

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

Collapse v t e Compiler optimizations
Basic block	Peephole optimization
Loop optimization	Induction variable Strength reduction Loop fusion Loop inversion Loop interchange Loop-invariant code motion Loop nest optimization Loop unrolling Loop splitting Loop unswitching Software pipelining Automatic parallelization
Data-flow analysis	Common subexpression elimination Constant folding Induction variable recognition and elimination Dead store elimination Use-define chain Live variable analysis Available expression
SSA-based	Global value numbering Sparse conditional constant propagation
Code generation	Register allocation Instruction selection Instruction scheduling Rematerialization
Functional	Tail call elimination Deforestation
Global	Interprocedural optimization
Other	Bounds-checking elimination Compile-time function execution Dead code elimination Inline expansion Jump threading
Static analysis	Alias analysis Pointer analysis Shape analysis Escape analysis Array access analysis Dependence analysis Control flow analysis Data-flow analysis

Anonymous

Search

Pointer analysis

Namespaces

More

Page actions

Contents

Example

Introduction

Context-Insensitive, Flow-Insensitive Algorithms

Flow-Insensitive Approaches

Call-site sensitivity

Object sensitivity

Type sensitivity

References

Bibliography

Navigation

Navigation

Help

Translate

Wiki tools

Wiki tools

Anonymous

Search

Pointer analysis

Example

Introduction

Context-Insensitive, Flow-Insensitive Algorithms

Flow-Insensitive Approaches

Call-site sensitivity

Object sensitivity

Type sensitivity

References

Bibliography

Navigation

Wiki tools

Page tools

Other projects

Categories