This note is a review of the book "Principles of Program Analysis," written to help understand its narrative. The book uses a method of description that is needlessly complex. The basic approach is:
1. Translate statements into a graph, with expressions and sub-expressions as nodes
2. Form a base set of attributes for each node
3. Form the complete attributes of each node
4. Answer questions about procedures and other objects of the program.
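A minimal sketch of steps 1-2 (the representation and names here are mine, not the book's): flatten an expression so each node holds exactly one operation, give each node a unique numeric label, and attach an empty attribute set to be filled in later.

```python
# Hypothetical sketch: flatten 'a + b * c' so each labeled node
# performs exactly one operation (one node per sub-expression).
def flatten(expr, nodes, counter):
    """expr is a nested tuple like ('+', 'a', ('*', 'b', 'c'))."""
    if not isinstance(expr, tuple):
        return expr                       # variable or constant leaf
    op, lhs, rhs = expr
    l = flatten(lhs, nodes, counter)
    r = flatten(rhs, nodes, counter)
    counter[0] += 1                       # fresh numeric label for this node
    label = counter[0]
    nodes.append({'label': label, 'op': op, 'args': (l, r), 'attrs': set()})
    return label                          # the parent refers to the child by label

nodes = []
flatten(('+', 'a', ('*', 'b', 'c')), nodes, [0])
print([(n['label'], n['op'], n['args']) for n in nodes])
# [(1, '*', ('b', 'c')), (2, '+', ('a', 1))]
```

The `attrs` field is a placeholder for step 2's base attribute set; steps 3-4 would then compute over these nodes.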
Compilers use these techniques (or similar ones) to:
1. Remove superfluous computations (dead-code elimination, constant propagation)
2. Merge redundant computations
3. Schedule computations and other operations
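As a toy illustration of use (1) (my example, not from the book): a backward pass over a straight-line block can drop assignments whose target is never read afterward.

```python
# Toy dead-code removal: delete assignments whose target is never
# read later (single block, no control flow -- illustrative only).
prog = [('x', '1'), ('y', 'x'), ('z', '2'), ('out', 'y')]  # (target, source)

used = set()
kept = []
for target, source in reversed(prog):
    if target in used or target == 'out':   # 'out' is the observable result
        kept.append((target, source))
        used.discard(target)
        used.add(source)
    # else: dead assignment, dropped (here: z is never read)
kept.reverse()
print(kept)  # [('x', '1'), ('y', 'x'), ('out', 'y')]
```

A real compiler does this over a control-flow graph with liveness information; the single-block version only shows the shape of the idea.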
Analysis tools may employ these techniques to flag likely implementation mistakes. Such tools are slowly improving in industry, but remain decades behind compiler tooling. That said, this book is interested in the academic analysis, and is several steps removed from what it takes to produce good tools.
I. Notation used in the book.
The text prefers to use a small number of "abstractions" for a foundation:
* "Lattices" are used for structures
* The process of applying rules, broadly, uses the concept of fixed point
* Working thru constraints is handled by work-lists
I'll go into depth on the notation below, since parts of the book use these terms in a needlessly academic (and poorly explained) way.
A. Syntax: The language is broken down by syntax into nodes. Implicitly there is only one operation per node; expressions are decomposed into a separate sub-expression for each action.
Note: To look at an analysis, the book often defines a small grammar; some massaging is often needed to make the analysis work.
B. Semantics: (1) A set of values, state, variables and their types, and sets of variables (closures). (2) Specifies how a program transforms one value into another.
C. Program analysis examines (1) sets of properties and (2) specifies how a program transforms one property into another.
D. Labels. Nodes are assigned a unique numerical identifier. A node could instead be identified by an internal pointer. Using a file-line-column span (i.e. mapping back to the source file) is not recommended: constant folding and the merging of duplicate code make it possible for several different source-file locations to map to the same node.
E. "Fixed point" is a term that the book uses, but no one should ever use -- it's arrogant, and there are easier ways to say any statement that uses it. "Fixed point" is used, idiomatically, to mean repeatedly resolving references -- e.g. expressions into values -- until no more can be resolved. Specific examples of use include: producing a trace, constant folding, dead-code elimination, abstract interpretation.
The technical meaning is a value that a function returns unchanged: x is a fixed point of f when f(x) = x. In this case, the "value" is the set of variables and their values (or unresolved expressions, as the case may be), and the function is the process of resolving expressions into values. This is repeated until nothing more can be resolved.
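A minimal illustration of iterating to a fixed point (my example, not the book's): repeatedly substitute known constants into expressions, and stop when a pass changes nothing -- at that point the environment is a fixed point of the resolution step.

```python
# Repeatedly resolve expressions to constants; when step(env) == env,
# env is a fixed point of the resolution function.
env = {'a': 2, 'b': ('+', 'a', 1), 'c': ('*', 'b', 'a')}

def step(env):
    new = {}
    for var, val in env.items():
        if isinstance(val, tuple):
            op, x, y = val
            x = env.get(x, x) if isinstance(x, str) else x
            y = env.get(y, y) if isinstance(y, str) else y
            if isinstance(x, int) and isinstance(y, int):
                val = x + y if op == '+' else x * y   # resolved to a constant
        new[var] = val
    return new

while (nxt := step(env)) != env:   # iterate until nothing more resolves
    env = nxt
print(env)  # {'a': 2, 'b': 3, 'c': 6}
```

Note that `c` cannot resolve until `b` has, which is exactly why the process must be repeated rather than run once.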
F. "Lattice". The text prefers to make its structures into complete lattices for analysis. Again, this term is arrogant, and there is always a clearer, easier way to make any statement that employs it. Lattices are essentially tree-like structures: the sets of children of any two nodes don't partially overlap -- one is a subset of the other, they are the same, or they share no common elements. In complete lattices, every subset has a greatest lower bound and a least upper bound, and the lattice has a least and a greatest element. The right-most child of a node is often the left-most child of a sibling.
The book indicates that lattices can be handled with bit vectors, although this is not clearly defined in the book.
G. Work-lists build a set of items that satisfies constraints. The constraints live in a graph structure and are numbered. These algorithms amount to repeatedly applying the rules until solved (see fixed point).
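A generic worklist loop (a sketch; the graph, facts, and constraint shapes here are mine): pop a node, re-evaluate its constraint, and when its fact changes, re-queue the nodes that depend on it. This terminates once no re-evaluation changes anything -- the fixed point again.

```python
from collections import deque

# Each constraint says fact[n] must include the union of the facts of
# its predecessors (a simple forward-propagation example).
preds = {'B': ['A'], 'C': ['A', 'B'], 'D': ['C']}
succs = {'A': ['B', 'C'], 'B': ['C'], 'C': ['D']}
fact = {'A': {'defA'}, 'B': set(), 'C': set(), 'D': set()}

work = deque(preds)                      # start with every constrained node
while work:
    n = work.popleft()
    new = set().union(fact[n], *(fact[p] for p in preds.get(n, [])))
    if new != fact[n]:                   # the fact grew:
        fact[n] = new
        work.extend(succs.get(n, []))    # revisit the nodes it feeds
print(fact['D'])  # {'defA'}
```

The point of the worklist is efficiency: only nodes whose inputs actually changed are revisited, instead of sweeping the whole graph each pass.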
The techniques described should be sound and complete, and the book includes a brief discussion of how to tell whether they are: start with a restricted class of analysis, define correctness relations for each type of analysis, and begin simple before expanding to more intermediate steps. This leads to what the elements of analysis are: Values, Heap, Property, Expression, Pointer, Selector, Type, State, Location, Variable, Label, Constraint
Types of analysis by pairs of these elements:
* Going from one state to another in the program uses "Constant propagation analysis" techniques
* Going from one environment to another in the program uses "Control flow analysis" techniques
* Where (the label) variables got their value uses "Data flow" techniques
And so on.
This is where the book does much better. It describes how to perform a variety of analysis techniques (not just the ones indicated above): value and data flow (with equational and constraint-based approaches), variable analysis, type analysis, utility (liveness) of variables and expressions, reference and shape analysis, control flow analysis, constraint-based analysis, working with object-oriented languages, intraprocedural and interprocedural analysis, and abstract interpretation.
The book also includes the use of types in a program -- something academic texts occasionally like to pretend doesn't exist, rendering them useless.