Live variable analysis

From HandWiki

In compilers, live variable analysis (or simply liveness analysis) is a classic data-flow analysis to calculate the variables that are live at each point in the program. A variable is live at some point if it holds a value that may be needed in the future, or equivalently if its value may be read before the next time the variable is written to.


Consider the following program:

b = 3
c = 5
a = f(b * c)

The set of live variables between lines 2 and 3 is {b, c} because both are used in the multiplication on line 3. But the set of live variables after line 1 is only {b}, since variable c is updated later, on line 2. The value of variable a is not used in this code.

Note that the assignment to a may be eliminated as a is not used later, but there is insufficient information to justify removing all of line 3 as f may have side effects (printing b * c, perhaps).

Expression in terms of dataflow equations

Liveness analysis is a "backwards may" analysis. The analysis is done in a backwards order, and the dataflow confluence operator is set union. In other words, if applying liveness analysis to a function with a particular number of logical branches within it, the analysis is performed starting from the end of the function working towards the beginning (hence "backwards"), and a variable is considered live if any of the branches moving forward within the function might potentially (hence "may") need the variable's current value. This is in contrast to a "backwards must" analysis which would instead enforce this condition on all branches moving forward.

The dataflow equations used for a given basic block s and exiting block f in live variable analysis are the following:

[math]\displaystyle{ {\mbox{GEN}}[s] }[/math]: The set of variables that are used in s before any assignment.
[math]\displaystyle{ {\mbox{KILL}}[s] }[/math]: The set of variables that are assigned a value in s (in many books[clarification needed], KILL (s) is also defined as the set of variables assigned a value in s before any use, but this does not change the solution of the dataflow equation):

[math]\displaystyle{ {\mbox{LIVE}}_{in}[s] = {\mbox{GEN}}[s] \cup ({\mbox{LIVE}}_{out}[s] - {\mbox{KILL}}[s]) }[/math]
[math]\displaystyle{ {\mbox{LIVE}}_{out}[final] = {\emptyset} }[/math]
[math]\displaystyle{ {\mbox{LIVE}}_{out}[s] = \bigcup_{p \in succ[s]} {\mbox{LIVE}}_{in}[p] }[/math]
[math]\displaystyle{ {\mbox{GEN}}[d : y \leftarrow f(x_1,\cdots,x_n)] = \{x_1,...,x_n\} }[/math]
[math]\displaystyle{ {\mbox{KILL}}[d : y \leftarrow f(x_1,\cdots,x_n)] = \{y\} }[/math]

The in-state of a block is the set of variables that are live at the start of the block. Its out-state is the set of variables that are live at the end of it. The out-state is the union of the in-states of the block's successors. The transfer function of a statement is applied by making the variables that are written dead, then making the variables that are read live.

Second example

// in: {}
b1: a = 3; 
    b = 5;
    d = 4;
    x = 100; //x is never being used later thus not in the out set {a,b,d}
    if a > b then
// out: {a,b,d}    //union of all (in) successors of b1 => b2: {a,b}, and b3:{b,d}  

// in: {a,b}
b2: c = a + b;
    d = 2;
// out: {b,d}

// in: {b,d}
b3: endif
    c = 4;
    return b * d + c;
// out:{}

The in-state of b3 only contains b and d, since c has been written. The out-state of b1 is the union of the in-states of b2 and b3. The definition of c in b2 can be removed, since c is not live immediately after the statement.

Solving the data flow equations starts with initializing all in-states and out-states to the empty set. The work list is initialized by inserting the exit point (b3) in the work list (typical for backward flow). Its computed in-state differs from the previous one, so its predecessors b1 and b2 are inserted and the process continues. The progress is summarized in the table below.

processing out-state old in-state new in-state work list
b3 {} {} {b,d} (b1,b2)
b1 {b,d} {} {} (b2)
b2 {b,d} {} {a,b} (b1)
b1 {a,b,d} {} {} ()

Note that b1 was entered in the list before b2, which forced processing b1 twice (b1 was re-entered as predecessor of b2). Inserting b2 before b1 would have allowed earlier completion.

Initializing with the empty set is an optimistic initialization: all variables start out as dead. Note that the out-states cannot shrink from one iteration to the next, although the out-state can be smaller than the in-state. This can be seen from the fact that after the first iteration the out-state can only change by a change of the in-state. Since the in-state starts as the empty set, it can only grow in further iterations.