# Live variable analysis

In compilers, **live variable analysis** (or simply **liveness analysis**) is a classic data-flow analysis to calculate the variables that are *live* at each point in the program. A variable is *live* at some point if it holds a value that may be needed in the future, or equivalently if its value may be read before the next time the variable is written to.

## Example

Consider the following program:

b = 3 c = 5 a = f(b * c)

The set of live variables between lines 2 and 3 is {`b`

, `c`

} because both are used in the multiplication on line 3. But the set of live variables after line 1 is only {`b`

}, since variable `c`

is updated later, on line 2. The value of variable `a`

is not used in this code.

Note that the assignment to `a`

may be eliminated as `a`

is not used later, but there is insufficient information to justify removing all of line 3 as `f`

may have side effects (printing `b * c`

, perhaps).

## Expression in terms of dataflow equations

Liveness analysis is a "backwards may" analysis. The analysis is done in a backwards order, and the dataflow confluence operator is set union. In other words, if applying liveness analysis to a function with a particular number of logical branches within it, the analysis is performed starting from the end of the function working towards the beginning (hence "backwards"), and a variable is considered live if any of the branches moving forward within the function might potentially (hence "may") need the variable's current value. This is in contrast to a "backwards must" analysis which would instead enforce this condition on all branches moving forward.

The dataflow equations used for a given basic block *s* and exiting block *f* in live variable analysis are the following:

- [math]\displaystyle{ {\mbox{GEN}}[s] }[/math]: The set of variables that are used in s before any assignment.

- [math]\displaystyle{
{\mbox{KILL}}[s] }[/math]: The set of variables that are assigned a value in s (in many books
^{[clarification needed]}, KILL (s) is also defined as the set of variables assigned a value in s*before any use*, but this does not change the solution of the dataflow equation):

- [math]\displaystyle{ {\mbox{LIVE}}_{in}[s] = {\mbox{GEN}}[s] \cup ({\mbox{LIVE}}_{out}[s] - {\mbox{KILL}}[s]) }[/math]
- [math]\displaystyle{ {\mbox{LIVE}}_{out}[final] = {\emptyset} }[/math]
- [math]\displaystyle{ {\mbox{LIVE}}_{out}[s] = \bigcup_{p \in succ[s]} {\mbox{LIVE}}_{in}[p] }[/math]

- [math]\displaystyle{ {\mbox{GEN}}[d : y \leftarrow f(x_1,\cdots,x_n)] = \{x_1,...,x_n\} }[/math]
- [math]\displaystyle{ {\mbox{KILL}}[d : y \leftarrow f(x_1,\cdots,x_n)] = \{y\} }[/math]

The in-state of a block is the set of variables that are live at the start of the block. Its out-state is the set of variables that are live at the end of it. The out-state is the union of the in-states of the block's successors. The transfer function of a statement is applied by making the variables that are written dead, then making the variables that are read live.

## Second example

// in: {} b1: a = 3; b = 5; d = 4; x = 100; //x is never being used later thus not in the out set {a,b,d} if a > b then // out: {a,b,d} //union of all (in) successors of b1 => b2: {a,b}, and b3:{b,d} // in: {a,b} b2: c = a + b; d = 2; // out: {b,d} // in: {b,d} b3: endif c = 4; return b * d + c; // out:{} |

The in-state of b3 only contains *b* and *d*, since *c* has been written. The out-state of b1 is the union of the in-states of b2 and b3. The definition of *c* in b2 can be removed, since *c* is not live immediately after the statement.

Solving the data flow equations starts with initializing all in-states and out-states to the empty set. The work list is initialized by inserting the exit point (b3) in the work list (typical for backward flow). Its computed in-state differs from the previous one, so its predecessors b1 and b2 are inserted and the process continues. The progress is summarized in the table below.

processing | out-state | old in-state | new in-state | work list |
---|---|---|---|---|

b3 | {} | {} | {b,d} | (b1,b2) |

b1 | {b,d} | {} | {} | (b2) |

b2 | {b,d} | {} | {a,b} | (b1) |

b1 | {a,b,d} | {} | {} | () |

Note that b1 was entered in the list before b2, which forced processing b1 twice (b1 was re-entered as predecessor of b2). Inserting b2 before b1 would have allowed earlier completion.

Initializing with the empty set is an optimistic initialization: all variables start out as dead. Note that the out-states cannot shrink from one iteration to the next, although the out-state can be smaller than the in-state. This can be seen from the fact that after the first iteration the out-state can only change by a change of the in-state. Since the in-state starts as the empty set, it can only grow in further iterations.

## References