A Survey of Rewriting Strategies in Program Transformation Systems Part 1 Author: Eelco Visser Speaker: Wei Zhao
Dec 30, 2015
2/29
Outline
Section 1: Introduction
Section 2: A Taxonomy of Program Transformation
Section 3: Program Representation
Section 4: Implementation of Program Transformation
Section 5: Program Transformation Paradigms
Section 6: Conclusions
4/29
Definitions
Syntax allows us to transform a program
Semantics (intensional and extensional) lets us reason about the validity of a transformation
Program transformation (PT) is the act of changing one program into another
Source & target languages
5/29
Section 1: Introduction
Definition of program transformation (PT)
The aim of PT: programmer productivity, maintainability, re-usability
The aim of PT research:
  Existing PT systems are specialized
  Programming languages have structural and semantic similarities
  To capture generic, language-independent schemas of transformation; to develop efficient implementations that scale up to large programs
The aim of the paper: understand the similarities and differences among PT systems; concentrate on transformation strategies
6/29
Section 2: Taxonomy of PT
Two main scenarios:
  Translation: source and target languages are different
  Rephrasing: source and target languages are the same
Sub-scenarios are distinguished by the level of abstraction of a program and by the extent to which they preserve the semantics of a program
7/29
2.1 Translation scenarios
Sub-scenarios are based on the level of abstraction of a program
Translation aims at preserving the extensional semantics, but this is usually not possible
Sub-scenarios: Synthesis, migration, reverse engineering, analysis
8/29
Synthesis
Translates from a higher to a lower level of abstraction
Design is traded for increased efficiency
Examples:
  Refinement: deriving an implementation from a high-level specification
  Compilation
  Parser/pretty-printer generation from a context-free grammar
9/29
Migration
Translates a program into another language at the same level of abstraction
Examples: transformation between dialects (Fortran 77 to Fortran 90); porting from one language to another (Pascal to C)
10/29
Reverse Engineering
Extracts from a lower-level program a higher-level program, or some higher-level aspects of it
The dual of synthesis
Examples:
  Decompilation of an object program into a high-level program
  Architecture (design) extraction
  Documentation generation
  Software visualization (some aspects are depicted in an abstract way)
11/29
Analysis
Reduces a program to one aspect, such as its control flow or data flow
Can be viewed as a transformation to a sub-language or to an aspect language
12/29
2.2 Rephrasing scenarios
Rephrasing says the same thing in different words
Aims at improving some aspect of the program
The semantics of the program are changed
Sub-scenarios: normalization, optimization, refactoring, and renovation
13/29
Normalization
Reduces a program to a program in a sub-language, decreasing its syntactic complexity
Examples
  Desugaring: syntactic sugar of a language is transformed into more fundamental constructs
    Desugar Haskell to its kernel language
    EBNF to BNF
  Simplification: a program is reduced to a normal (standard) form
    Canonical form of an intermediate representation
    Algebraic simplification of expressions
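A minimal sketch of desugaring as normalization into a sub-language. The tuple encoding of terms and the constructor names (And, Or, Not, If, Bool) are assumptions made for illustration; they are not taken from the paper.

```python
# Terms are nested tuples such as ("And", a, b) or ("Var", "a").
# The "sugar" constructors And, Or and Not are rewritten into the
# kernel constructor If, so the result uses a smaller language.

def desugar(term):
    """Rewrite And/Or/Not into If, bottom-up."""
    if not isinstance(term, tuple):
        return term  # atoms: names, booleans
    tag, *args = term
    args = [desugar(a) for a in args]  # desugar subterms first
    if tag == "And":
        return ("If", args[0], args[1], ("Bool", False))
    if tag == "Or":
        return ("If", args[0], ("Bool", True), args[1])
    if tag == "Not":
        return ("If", args[0], ("Bool", False), ("Bool", True))
    return (tag, *args)


sugared = ("And", ("Var", "a"), ("Not", ("Var", "b")))
kernel = desugar(sugared)
```

After desugaring, later transformations only need to handle If, which is the practical benefit of reducing syntactic complexity.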
14/29
Optimization
Improves the run-time/space performance
Examples:
  Fusion
  Inlining
  Constant propagation
  Constant folding
  Common-subexpression elimination
  Dead code elimination
15/29
Refactoring
Refactoring improves the design of a program by restructuring it while preserving its functionality
Obfuscation makes a program harder to understand, for example by renaming variables and inserting dead code, in order to prevent reverse engineering
16/29
Renovation
Renovation repairs an error or brings the program up to date with changed requirements
The extensional semantics is changed
Example: Y2K
17/29
Section 3: Program Representation
Some PT systems work directly on text
Most systems use a structured representation, so a parser and an unparser are needed for conversion between text and structure
In some programming frameworks (e.g. Intentional Programming, IP), programs are stored, edited and processed as source graphs
What representation form should be used?
18/29
3.1 Parse Tree or AST
The internal representation of the program to be transformed
Parse trees contain syntactic information such as layout (white space and comments), parentheses, and extra nodes introduced by disambiguating grammar transformations
Normally an AST is used, since this information is irrelevant for the transformation
Layout
  Some applications (renovation, refactoring) need to preserve the layout through the transformation, so a parse tree is used
  Where should comments be inserted in a generic manner? (origin tracking)
Types
  Optimization and compilation might need type information to be stored in an extension of the tree format
19/29
3.2 Concrete or Abstract Syntax
How program fragments are represented in the specification of transformation rules
ASTs are usually represented using data structures: records, objects, algebraic data types, terms
When ASTs are easy to compose and decompose with the data structure facility:
  As in compilers, where only small fragments are manipulated at a time
  The AST is used directly in the specification
  The AST can then serve as an intermediate language in which multiple languages can be expressed
Otherwise (when the conceptual distance between the concrete program and the data structure access operations on the AST is too large):
  Transformation languages support concrete object syntax, so the programmer defines transformations in concrete syntax while the AST is used internally
20/29
3.3 Trees or Graphs
Program structures can be represented by trees, DAGs, or full-fledged graphs with cycles
Copying
  A tree requires a complete copy
  For a DAG, only a pointer to the tree gets copied
    Sub-trees are shared by multiple contexts; memory usage is reduced; testing equality is cheap
    But the context needs to be rebuilt when a tree is transformed, because two occurrences of a shared tree that are syntactically the same can have different meanings depending on their context
Annotation
  DAG: an annotation on a shared tree is associated with every occurrence of that tree, which is not desirable
Full-fledged graphs allow back links (loops, links to declarations)
  More problematic: a sub-graph can have links into the entire graph, so the entire graph may have to be reconstructed after a transformation if the original graph must be kept
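The sharing behaviour of DAGs described above can be sketched with a small hash-consing table. This is an illustrative sketch only: the class names Node and SharedTerms and the method make are hypothetical and do not correspond to any particular library's API.

```python
class Node:
    """An immutable term node; children are Nodes or atoms (str, int)."""
    __slots__ = ("tag", "children")

    def __init__(self, tag, children):
        self.tag = tag
        self.children = children


class SharedTerms:
    """Builds terms so that structurally equal terms are one object."""

    def __init__(self):
        self._table = {}

    def make(self, tag, *children):
        # Child nodes are already shared, so their identity is enough
        # to describe them; atoms are compared by value.
        key = (tag,) + tuple(
            ("node", id(c)) if isinstance(c, Node) else ("atom", c)
            for c in children
        )
        node = self._table.get(key)
        if node is None:
            node = Node(tag, children)
            self._table[key] = node  # the table keeps the node alive
        return node


terms = SharedTerms()
x1 = terms.make("Var", "x")
x2 = terms.make("Var", "x")
total = terms.make("Add", x1, x2)
same_again = terms.make("Add", terms.make("Var", "x"), terms.make("Var", "x"))
```

With this construction equality testing reduces to an identity check (`total is same_again`), and copying a term means copying a reference. The drawback noted on the slide also shows up: both occurrences of x are the same object, so information that differs per occurrence cannot be attached to the node itself.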
21/29
3.4 Variable binding
Transformation systems usually have no special knowledge of the variables in the syntax tree
Transformations need to be made aware of variables by means of extra conditions, to avoid free variable capture and lifting variable occurrences out of their binding scope
Higher-order abstract syntax (HOAS): transparent handling of variable binding by encoding variable bindings as lambda abstractions
FreshML offers a weaker mechanism that transparently refreshes variable names
All approaches that rename variables conflict with the requirement to preserve original names (refactoring, renovation)
Alternative: associate declaration information with usages (a symbol table, or annotating the usage occurrences of a symbol with its declaration)
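The free variable capture problem can be made concrete with a small capture-avoiding substitution. This is a minimal sketch over an assumed lambda-term encoding (Var, Lam, App as tuples); the helper names free_vars, fresh and subst are hypothetical.

```python
import itertools


def free_vars(term):
    tag = term[0]
    if tag == "Var":
        return {term[1]}
    if tag == "App":
        return free_vars(term[1]) | free_vars(term[2])
    if tag == "Lam":
        return free_vars(term[2]) - {term[1]}
    raise ValueError(f"unknown term: {term!r}")


def fresh(base, avoid):
    """Return a name derived from base that is not in avoid."""
    for i in itertools.count():
        candidate = f"{base}_{i}"
        if candidate not in avoid:
            return candidate


def subst(term, name, value):
    """Replace free occurrences of name in term by value, renaming
    binders where needed so that no free variable of value is captured."""
    tag = term[0]
    if tag == "Var":
        return value if term[1] == name else term
    if tag == "App":
        return ("App", subst(term[1], name, value), subst(term[2], name, value))
    if tag == "Lam":
        _, param, body = term
        if param == name:
            return term  # name is shadowed; nothing to replace
        if param in free_vars(value):
            # The binder would capture a free variable of value: rename it.
            new_param = fresh(param, free_vars(value) | free_vars(body))
            body = subst(body, param, ("Var", new_param))
            param = new_param
        return ("Lam", param, subst(body, name, value))
    raise ValueError(f"unknown term: {term!r}")


# (\y. x y)[x := y] must not turn the free y into a bound one.
result = subst(("Lam", "y", ("App", ("Var", "x"), ("Var", "y"))), "x", ("Var", "y"))
```

The binder y is renamed to y_0 here, which is exactly the kind of renaming that conflicts with preserving original names in refactoring and renovation.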
22/29
3.5 Exchange Format
The program representation should be exchangeable among transformation components
Examples
  XML: supports exchange of tree-shaped data
  Annotated Term Format: supports exchange of DAGs, maintaining maximal sharing
23/29
Section 4: Implementation of PT
A complex program transformation is achieved through a number of consecutive modifications of a program
A rule defines a basic step in the transformation of a program
A strategy is a plan for achieving a complex transformation using a set of rules
Strategies make transformations reproducible and automated, in contrast to interactive transformation
Example
  let x=3 in x+y -> let x=3 in 3+y -> 3+y (inline, dead variable)
  let x=3 in x+y -> let x=3 in let newfun(z)=z+y in newfun(x) -> let newfun(z)=z+y in let x=3 in newfun(x) (extract function, hoist)
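The first example (inline, then remove the dead variable) can be sketched as two rules over a small term encoding. The tuple encoding and the function names are assumptions for illustration. Inlining is restricted to integer constants here so that variable capture cannot occur.

```python
# Terms: ("Let", name, bound, body), ("Add", l, r), ("Var", name), ("Int", n)

def substitute(term, name, value):
    """Replace free occurrences of Var(name) in term by value."""
    tag = term[0]
    if tag == "Var":
        return value if term[1] == name else term
    if tag == "Int":
        return term
    if tag == "Add":
        return ("Add", substitute(term[1], name, value),
                substitute(term[2], name, value))
    if tag == "Let":
        _, bound_name, bound, body = term
        new_bound = substitute(bound, name, value)
        # An inner binding of the same name shadows the outer one.
        new_body = body if bound_name == name else substitute(body, name, value)
        return ("Let", bound_name, new_bound, new_body)
    raise ValueError(f"unknown term: {term!r}")


def free_vars(term):
    tag = term[0]
    if tag == "Var":
        return {term[1]}
    if tag == "Int":
        return set()
    if tag == "Add":
        return free_vars(term[1]) | free_vars(term[2])
    if tag == "Let":
        return free_vars(term[2]) | (free_vars(term[3]) - {term[1]})
    raise ValueError(f"unknown term: {term!r}")


def inline(term):
    """let x = c in e  ->  let x = c in e[x := c], for an integer constant c."""
    if term[0] == "Let" and term[2][0] == "Int":
        _, name, bound, body = term
        return ("Let", name, bound, substitute(body, name, bound))
    return None  # the rule does not apply


def dead_variable(term):
    """let x = e1 in e2  ->  e2, if x does not occur free in e2."""
    if term[0] == "Let" and term[1] not in free_vars(term[3]):
        return term[3]
    return None


program = ("Let", "x", ("Int", 3), ("Add", ("Var", "x"), ("Var", "y")))
step1 = inline(program)      # let x=3 in 3+y
step2 = dead_variable(step1)  # 3+y
```

Each rule is a basic step that either produces a new term or reports that it does not apply; the order in which the two rules are tried is the (very small) strategy.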
24/29
4.1 Transformation Rules
Rules preserve the extensional semantics
Some other aspects change (possibly the intensional semantics): time/space resource usage
A rule consists of:
  Syntactic recognition
  Verification of some semantic conditions
  A replacement, which can be:
    A term pattern
    A function constructing a new tree or graph
    A semantic action with arbitrary side-effects
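The three parts of a rule (syntactic recognition, semantic condition, replacement) can be sketched with a tiny pattern matcher. This is an illustrative sketch: the "?x" convention for pattern variables and the helpers match, instantiate and make_rule are assumptions, not the notation of any particular system.

```python
def match(pattern, term, bindings=None):
    """Match term against pattern. Strings starting with '?' are pattern
    variables. Returns a dict of bindings, or None if there is no match."""
    bindings = dict(bindings or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings and bindings[pattern] != term:
            return None  # non-linear pattern: both occurrences must agree
        bindings[pattern] = term
        return bindings
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None


def instantiate(template, bindings):
    """Build the replacement term from a term pattern."""
    if isinstance(template, str) and template.startswith("?"):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(instantiate(t, bindings) for t in template)
    return template


def make_rule(lhs, rhs, condition=lambda bindings: True):
    def rule(term):
        bindings = match(lhs, term)           # syntactic recognition
        if bindings is None or not condition(bindings):  # semantic condition
            return None
        return instantiate(rhs, bindings)     # replacement by term pattern
    return rule


# x + 0 -> x
add_zero = make_rule(("Add", "?x", ("Int", 0)), "?x")

# x / x -> 1, only when x is known to be a nonzero integer literal
div_self = make_rule(
    ("Div", "?x", "?x"), ("Int", 1),
    condition=lambda b: b["?x"][0] == "Int" and b["?x"][1] != 0,
)
```

Here the replacement is a term pattern; a replacement given as a constructing function would simply be a Python function of the bindings instead of the rhs template.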
25/29
4.2 Transformation Strategies
A set of transformation rules for a programming language induces a rewrite relation on programs
If the relation is confluent and terminating, there is a unique normal form for every program; the only question is how to apply the rules most efficiently to reach the normal form
In PT this is usually not the case:
  A set of transformation rules can give rise to infinite branches (e.g. by inlining a recursive function)
  Inverses, in which a transformation is undone (e.g. by distribution or commutativity rules)
  Non-confluence, in which a program can be transformed into different programs
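Why exhaustive application is not always an option can be seen with a commutativity rule, which is its own inverse. The sketch below applies a rule repeatedly at the root under a step budget; the encoding and the names commute, fold and rewrite_at_root are assumptions for illustration.

```python
def commute(term):
    """a + b -> b + a. Always applicable to an addition, so it never stops."""
    if term[0] == "Add":
        return ("Add", term[2], term[1])
    return None


def fold(term):
    """m + n -> (m+n) for integer literals. Terminates."""
    if term[0] == "Add" and term[1][0] == "Int" and term[2][0] == "Int":
        return ("Int", term[1][1] + term[2][1])
    return None


def rewrite_at_root(rule, term, max_steps):
    """Apply rule at the root until it fails or the budget is spent.
    Returns (term, steps_used, terminated)."""
    steps = 0
    while steps < max_steps:
        result = rule(term)
        if result is None:
            return term, steps, True
        term = result
        steps += 1
    return term, steps, False


a_plus_b = ("Add", ("Var", "a"), ("Var", "b"))
looping = rewrite_at_root(commute, a_plus_b, max_steps=10)
folded = rewrite_at_root(fold, ("Add", ("Int", 1), ("Int", 2)), max_steps=10)
```

The folding rule reaches a normal form after one step, while the commutativity rule uses up the whole budget and is back at the original term, which is why such rules need a strategy that controls when they are applied.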
26/29
4.2 cont'd
For a specific program it is always possible to find the shortest path to the optimal solution for a specific transformation task
A strategy is an algorithm for choosing a path in the rewrite relation
Given one set of rules, there can be many strategies, each achieving a different goal
A strategy can be provided by the engine or by the user:
  Fixed application order: the engine applies rules exhaustively according to a built-in strategy
  Automatic dependency analysis: based on an analysis of the rules
  Goal driven: rules are applied to achieve a user-defined goal
  Strategy menu
  Completely programmable in a strategy language
The strategy needs to be expressed and implemented formally
27/29
4.2 cont'd: ingredients of strategy languages
Sequential composition
  Compose rules consecutively, conditionally, iteratively or recursively
Non-deterministic programming
  Speculatively explore paths until an acceptable solution is found
  Explore all paths in parallel and choose the best (using a cost function to compare solutions)
  Goal-based exploration: discard paths that are inconsistent with a set of constraints
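Exploring all paths and choosing the best one by a cost function can be sketched as a bounded breadth-first search over the rewrite relation. This is a sketch under assumptions: the tuple encoding, the rules distribute and factor, the term size as cost function, and the depth bound are all chosen for illustration.

```python
def distribute(term):
    """a * (b + c) -> a*b + a*c"""
    if term[0] == "Mul" and term[2][0] == "Add":
        a, (_, b, c) = term[1], term[2]
        return ("Add", ("Mul", a, b), ("Mul", a, c))
    return None


def factor(term):
    """a*b + a*c -> a * (b + c), the inverse of distribute"""
    if (term[0] == "Add" and term[1][0] == "Mul" and term[2][0] == "Mul"
            and term[1][1] == term[2][1]):
        return ("Mul", term[1][1], ("Add", term[1][2], term[2][2]))
    return None


def one_step(term, rules):
    """All terms reachable by applying one rule at one position."""
    results = [out for out in (rule(term) for rule in rules) if out is not None]
    if term[0] in ("Add", "Mul"):
        tag, left, right = term
        results += [(tag, l, right) for l in one_step(left, rules)]
        results += [(tag, left, r) for r in one_step(right, rules)]
    return results


def size(term):
    """Cost function: number of nodes in the term."""
    if term[0] in ("Add", "Mul"):
        return 1 + size(term[1]) + size(term[2])
    return 1


def best_reachable(term, rules, cost, max_depth):
    """Explore every path up to max_depth steps; return the cheapest term."""
    seen = {term}
    frontier = [term]
    for _ in range(max_depth):
        next_frontier = []
        for current in frontier:
            for successor in one_step(current, rules):
                if successor not in seen:
                    seen.add(successor)
                    next_frontier.append(successor)
        frontier = next_frontier
    return min(seen, key=cost)


a, b, c = ("Var", "a"), ("Var", "b"), ("Var", "c")
start = ("Add", ("Mul", a, b), ("Mul", a, c))
best = best_reachable(start, [distribute, factor], cost=size, max_depth=3)
```

Because distribute and factor undo each other, exhaustive application would loop; the set of visited terms keeps the search finite, and the cost function decides which of the reachable terms to keep.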
28/29
4.2 cont'd: ingredients of strategy languages
Structural traversal:
  A rewrite relation includes application of rules in any context
  The strategy should determine the location where a rule is applied
    Find the location in the program representation structure, apply the rule, rebuild the context
  It is inefficient to use tree paths (contexts) directly
  Some mechanism is needed to traverse syntax trees and to apply transformation rules at subtrees
    Language-specific or generic traversal mechanisms
    Top-down / bottom-up traversal
    Primitive and composite traversals
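Primitive and composite traversals can be sketched as follows: one primitive applies a strategy to the direct subterms, and top-down and bottom-up traversals are composed from it. The names attempt, map_children, bottomup and topdown are assumptions for illustration, loosely modelled on the combinator style of strategy languages.

```python
def attempt(rule):
    """Turn a rule that may fail (return None) into one that never fails."""
    def strategy(term):
        result = rule(term)
        return term if result is None else result
    return strategy


def map_children(strategy, term):
    """Primitive one-layer traversal: apply strategy to each direct subterm
    and rebuild the node (the context) around the results."""
    tag = term[0]
    if tag in ("Add", "Mul"):
        return (tag, strategy(term[1]), strategy(term[2]))
    return term  # leaves such as ("Int", n) or ("Var", name)


def bottomup(strategy):
    """Composite traversal: children first, then the node itself."""
    def visit(term):
        return strategy(map_children(visit, term))
    return visit


def topdown(strategy):
    """Composite traversal: the node first, then its children."""
    def visit(term):
        return map_children(visit, strategy(term))
    return visit


def fold_constants(term):
    if term[0] == "Add" and term[1][0] == "Int" and term[2][0] == "Int":
        return ("Int", term[1][1] + term[2][1])
    return None


expr = ("Add", ("Add", ("Int", 1), ("Int", 2)), ("Int", 3))
folded_bottomup = bottomup(attempt(fold_constants))(expr)
folded_topdown = topdown(attempt(fold_constants))(expr)
```

The same rule gives different results under the two traversals: bottom-up folds the whole expression to a single constant, while a single top-down pass visits the root before its left operand has been folded and so leaves an addition at the root.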
29/29
4.2 cont'd: ingredients of strategy languages
Information-carrying strategies:
  Strategies may carry information that can be used in making decisions about which paths to take and in passing context-sensitive information to rules
  An important aspect: the scope in which the information is valid
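A traversal that carries information can be sketched with constant propagation, where an environment of known constants is passed down and is only valid inside the scope of the binding that introduced it. The encoding and the function name propagate are assumptions for illustration.

```python
# Terms: ("Let", name, bound, body), ("Add", l, r), ("Var", name), ("Int", n)

def propagate(term, env=None):
    """Replace variables by their known constant values.
    env maps variable names to ("Int", n) terms valid at this point."""
    env = env or {}
    tag = term[0]
    if tag == "Var":
        return env.get(term[1], term)
    if tag == "Int":
        return term
    if tag == "Add":
        return ("Add", propagate(term[1], env), propagate(term[2], env))
    if tag == "Let":
        _, name, bound, body = term
        bound = propagate(bound, env)
        inner = dict(env)  # copy: changes are scoped to this let body
        if bound[0] == "Int":
            inner[name] = bound
        else:
            # The name is rebound to something unknown, so an outer
            # constant for the same name is no longer valid here.
            inner.pop(name, None)
        return ("Let", name, bound, propagate(body, inner))
    raise ValueError(f"unknown term: {term!r}")


shadowed = ("Let", "x", ("Int", 3),
            ("Add", ("Var", "x"),
             ("Let", "x", ("Var", "y"), ("Var", "x"))))
propagated = propagate(shadowed)

out_of_scope = propagate(("Add", ("Let", "x", ("Int", 3), ("Var", "x")), ("Var", "x")))
```

Copying the environment at each binding is one simple way to get the scoping right: the outer x is replaced by 3, the inner x (rebound to y) is left alone, and an x outside the let is never touched.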
30/29
4.2 cont'd: ingredients of strategy languages
Separation of rules and strategies
  In an implementation, rules can be hardwired into the definition of the strategies
  Alternatively, strategies are parameterized with a set of rules
    Clearer specifications that allow reasoning about smaller entities (rules, strategies) separately
    Separate definition enables the reuse of rules and strategies, and the generic implementation of aspects of a transformation system for a class of languages
  Intertwining may sometimes be required for efficiency reasons, but should be done by a compiler rather than by the specifier
Abstraction
  The strategy language should offer proper abstraction over rules and strategies, i.e. it should be possible to name and parameterize rules and strategies
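Separation and abstraction can be sketched with a few named, parameterized strategy combinators that know nothing about the rules they are given. The combinator names choice, sequence and repeat and the two rules are assumptions for illustration; a rule or strategy returns None when it fails.

```python
def choice(*strategies):
    """Apply the first strategy that succeeds."""
    def strategy(term):
        for candidate in strategies:
            result = candidate(term)
            if result is not None:
                return result
        return None
    return strategy


def sequence(first, second):
    """Apply first, then second; fail if either fails."""
    def strategy(term):
        intermediate = first(term)
        return None if intermediate is None else second(intermediate)
    return strategy


def repeat(inner):
    """Apply inner as long as it succeeds; never fails itself."""
    def strategy(term):
        while True:
            result = inner(term)
            if result is None:
                return term
            term = result
    return strategy


# Rules, defined independently of any strategy.
def add_zero(term):
    """x + 0 -> x"""
    if term[0] == "Add" and term[2] == ("Int", 0):
        return term[1]
    return None


def mul_one(term):
    """x * 1 -> x"""
    if term[0] == "Mul" and term[2] == ("Int", 1):
        return term[1]
    return None


term = ("Mul", ("Add", ("Var", "a"), ("Int", 0)), ("Int", 1))

simplify = repeat(choice(add_zero, mul_one))   # exhaustive, any order
mul_then_add = sequence(mul_one, add_zero)     # fixed order
add_then_mul = sequence(add_zero, mul_one)     # fixed order, other way round
```

The same two rules are reused by three strategies with different outcomes on the same term: the first two reach the variable a, while the third fails because add_zero does not apply at the root.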