COMS W4117
Compilers and Translators:
Software Verification Tools
Lecture 7: Data-Flow Analysis Frameworks
September 25, 2007
Lecture Outline
- Review
- Data-flow analysis frameworks
- Iterative algorithm for general frameworks
- Meaning of a data-flow solution
- Framework for constant propagation
- Reading
1. Review
- Available expressions
- Basic questions about the iterative algorithm
- Lattices and semilattices
- Transfer functions
2. Data-Flow Analysis Frameworks
- Basic questions
- When is the iterative algorithm correct?
- How precise is the solution?
- Under what conditions does the iterative algorithm converge?
- What is the meaning of the solution to the data-flow equations?
- Definition of a data-flow analysis framework
- Data-flow graph with an ENTRY and EXIT node.
- Direction: forwards or backwards.
- Semilattice with values V for IN and OUT and a meet operator ∧.
- A family of transfer functions for the blocks of the data-flow graph.
- A constant value for the boundary condition.
- Monotone frameworks
- A framework is monotone (order preserving) if for all x and y in V and f in F,
x ≤ y implies f(x) ≤ f(y).
- Or, equivalently, f(x ∧ y) ≤ f(x) ∧ f(y).
- Fixedpoints
- Let f be a function on a set V. An element x in V is a fixedpoint if f(x) = x.
- Let f be a monotone function. A maximum (least) fixedpoint of f is the greatest
(least) element x such that f(x) = x.
- Knaster-Tarski theorem: If f is a monotone function on a complete lattice L,
then the set of fixedpoints of f in L is also a complete lattice.
- If f is a monotone function on a complete lattice L, then the maximum fixedpoint of f
is the limit of the sequence
- f0 = TOP
- fn+1 = f(fn)
-
- Distributive frameworks
- A framework is distributive if for all x and y in V and f in F,
f(x ∧ y) = f(x) ∧ f(y).
- Every distributive framework is monotone (but not conversely).
- Reaching definitions, live variables, and available expressions are all
distributive frameworks.
3. The Iterative Algorithm for General Frameworks
- Input: a data-flow analysis framework.
- Output: the sets IN[B] and OUT[B]
for each block B in the data-flow graph.
- Method: The algorithm is given in Fig. 9.23, p. 627.
- Important properties of the algorithm:
- If the algorithm converges, the result is a solution to the data-flow equations.
- If the framework is monotone, the solution is the MFP of the data-flow equations.
- If the semilattice of the framework is monotone and of finite height, the
algorithm is guaranteed to converge.
4. Meaning of a Data-Flow Solution
- For a forward data-flow framework, the IDEAL[B] solution for a block B is the meet over all
feasible paths P of fP(vENTRY).
- Any answer greater than IDEAL is incorrect.
- Any answer smaller than or equal to IDEAL is safe.
- MOP[B], the meet over all paths solution, assumes that every path in the data-flow graph
can be taken, but some of these paths may not be feasible.
Thus, MOP[B] ≤ IDEAL[B] for all blocks B.
- The MFP solution is always smaller than or equal to the MOP solution.
Therefore, MFP ≤ MOP ≤ IDEAL.
5. Framework for Constant Propagation
- Constant propagation, often called "constant folding", replaces expressions that always evaluate
to the same constant by that constant.
- Constant-propagation framework
- The constant semilattice for a single variable has
- All constants appropriate for the type of the variable.
- The value NAC for not-a-constant. NAC is the bottom element.
- The value UNDEF for undefined. UNDEF is the top element.
- The following properties hold:
- UNDEF ∧ v = v for all values v.
- NAC ∧ v = NAC for all values v.
- c ∧ c = c for all constants c.
- c ∧ d = NAC for any two distinct constants c and d.
- Transfer functions
- A data-flow value for the framework is a map from each variable in the program
to one of the values in the constant semilattice.
- We use m(v) to denote the lattice value of program variable v.
- The set F consists of transfer functions that accept a map of variables to values
in the constant semilattice and return return another such map.
- Let fs be the transfer function for statement s, and let m and m' be
data-flow values such that m' = fs(m).
- Forms for statement s:
- If s is x = c, then m'(x) = c.
- If s is x = y + z, then m'(x) =
- m(y) + m(z) if m(y) and m(z) are constant values
- NAC if either m(y) or m(z) is NAC
- UNDEF otherwise
- If the right side of statement s is anything else (e.g., function call, pointer expression),
then m'(x) = NAC.
- This framework is monotone but not distributive.
6. Reading
aho@cs.columbia.edu