Towards a recommender system for generalizing and refining code templates
Coen De Roover, Tim Molderez
Jul 13, 2015
Life without templates?
manual implementations
tools taking exotic specifications
[Imperative Program Transformation by Rewriting]
It says that an assignment x := v (where v is a variable) can be replaced by x := c if the “last assignment” to v was v := c (where c is a constant). The side condition formalises the notion of “last assignment”, and will be explained later in the paper:
n : (x := v) ⟹ x := c
    if n ⊢ A△(¬def(v) U def(v) ∧ stmt(v := c))
       conlit(c)
The rewrite language has several important properties:
– The specification is in the form of a rewrite system with the advantages of succinctness and intuitiveness mentioned above.
– The rewrite system works over a control flow graph representation of the program. It does this by identifying and manipulating graph blocks which are based on the idea of basic blocks but with finer granularity.
– The rewrites are executable. An implementation exists to automatically determine when the rewrite applies and to perform the transformation just from the specification.
– The relation between the conditions on the control flow graph and the operational semantics of the program seems to lend itself to formal reasoning about the transformation.
The paper is organised as follows. §2 covers earlier work in the area and provides the motivation for this work. §3 describes our method of rewriting over control graphs. §4 describes the form of side conditions for those rewrites. §5 gives three examples of common transformations and their application when given as rewrites. §6 discusses what has been achieved and possible applications of this work.
2 Background
Implementing optimising transformations is hard: building a good optimising compiler is a major effort. If a programmer wishes to adapt a compiler to a particular task, for example to improve the optimisation of certain library calls, intricate knowledge of the compiler internals is necessary. This contrasts with the description of such optimisations in textbooks [1,3,26], where they are often described in a few lines of informal English. It is not surprising, therefore, that the program transformation community has sought declarative ways of programming transformations, to enable experimentation without excessive implementation effort. The idea to describe program transformations by rewriting is almost as old as the subject itself. One early implementation can be found in the TAMPR system by Boyle, which has been under development since the early ’70s [8,9]. TAMPR starts with a specification, which is translated to pure lambda calculus, and rewriting is performed on the pure lambda expressions. Because programs
But specifying templates is still hard…
often requires multiple iterations
no unwanted matches (refinement)
no required matches are missed (generalization)
no support for editing process
code template = source code + meta-variables + matching directives
no disciplined methods for generalizing/refining templates
no automated support in the form of recommender system
1/ advanced code templates in Ekeko/X
2/ formal operators for template mutation
3/ genetic search for mutation recommendations
I’m a “structural search and replace” on steroids
1/ advanced code templates in Ekeko/X
template
matches
[<component>]@[<directive>]
directive (applicable component): meaning
match (any): match is in source code, with matching type and properties
orsimple (qualified name): match is any name resolving to name in template
(equals ?var) (any): exposes match
child, child+, child* (any): match is corresponding child of parent match, nested within that child (+), or either (*)
match|set (list): match has at least given elements, in any order
match|regexp (list): match has elements described by regexp
(type ?type) (type/variable declaration/reference, expression): match resolves to, is of, or declares the type of its argument
(subtype ?type), (subtype+ ?type), (subtype* ?type) (type/variable declaration/reference): match resolves to a (transitive +, reflexive *) subtype of the given argument
(refers-to ?var) (expression): match lexically refers to local, parameter or field denoted by its argument
(invokes ?method) (invocation expression): match invokes given argument
[<component>]@[<directive>]
[….acceptVisitor(…)]@[(invokes ?method)]
[public void acceptVisitor(ComponentVisitor v)…]@[(equals ?method)]
constraining syntax, structure, data flow, control flow of matches
grouping of templates
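The way a shared meta-variable ties the two templates of a group together can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the toy program model (`Call`, `Decl`) and the `matchGroup` helper are hypothetical and are not part of Ekeko/X, which matches against real Java ASTs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model: a call site and the declaration it resolves to.
class Call {
    final String text;    // source text of the invocation
    final String target;  // identifier of the declaration it invokes
    Call(String text, String target) { this.text = text; this.target = target; }
}

class Decl {
    final String id;         // unique identifier of the declaration
    final String signature;  // source text of the declaration
    Decl(String id, String signature) { this.id = id; this.signature = signature; }
}

class TemplateGroupSketch {
    // Emulates the template group
    //   [….acceptVisitor(…)]@[(invokes ?method)]
    //   [public void acceptVisitor(ComponentVisitor v)…]@[(equals ?method)]
    // Each result pairs a call with the declaration bound to ?method.
    static List<String[]> matchGroup(List<Call> calls, List<Decl> decls) {
        List<String[]> matches = new ArrayList<>();
        for (Decl d : decls) {
            // (equals ?method): a matching declaration is exposed as ?method
            if (!d.signature.startsWith("public void acceptVisitor(ComponentVisitor v)")) continue;
            for (Call c : calls) {
                // syntactic part of the first template
                if (!c.text.contains(".acceptVisitor(")) continue;
                // (invokes ?method): the call must resolve to the same binding
                if (c.target.equals(d.id)) matches.add(new String[] { c.text, d.id });
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Decl> decls = Arrays.asList(
            new Decl("Component.acceptVisitor", "public void acceptVisitor(ComponentVisitor v) { }"),
            new Decl("Other.acceptVisitor", "public void acceptVisitor(OtherVisitor v) { }"));
        List<Call> calls = Arrays.asList(
            new Call("comp.acceptVisitor(v)", "Component.acceptVisitor"),
            new Call("other.acceptVisitor(w)", "Other.acceptVisitor"));
        for (String[] m : matchGroup(calls, decls)) {
            System.out.println(m[0] + " invokes " + m[1]);
        }
    }
}
```

Only the call that resolves to the declaration bound to `?method` is reported; the syntactically similar call on `other` is rejected by the `invokes` check.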
2/ formal operators for template mutation
atomic mutation (introduce-variable):
    return age;   becomes   return ?v;

composite mutation (generalize-aliases):
    public class Book {
        private Integer count;
        public Integer getCount() {
            return count;
        }
    }
becomes
    public class Book {
        private Integer ?v1;
        public Integer getCount() {
            return [?v2]@[(refers-to ?v1)];
        }
    }
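The two mutations above can be sketched on a deliberately simplified template representation. The sketch assumes a template is a flat list of tokens and that the first occurrence of a name is its declaration; both are simplifications for illustration, and `MutationSketch` is a hypothetical class, not the operators' actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MutationSketch {
    // Atomic mutation: replace the token at `index` with a meta-variable.
    static List<String> introduceVariable(List<String> template, int index, String metaVar) {
        List<String> result = new ArrayList<>(template);
        result.set(index, metaVar);
        return result;
    }

    // Composite mutation: the first occurrence of `name` (assumed to be its
    // declaration) becomes a meta-variable; every later occurrence becomes a
    // fresh meta-variable constrained to refer to the first one.
    static List<String> generalizeAliases(List<String> template, String name) {
        List<String> result = new ArrayList<>(template);
        int next = 1;
        String declVar = null;
        for (int i = 0; i < result.size(); i++) {
            if (!result.get(i).equals(name)) continue;
            String fresh = "?v" + next++;
            if (declVar == null) {
                declVar = fresh;
                result = introduceVariable(result, i, fresh);
            } else {
                result = introduceVariable(result, i,
                    "[" + fresh + "]@[(refers-to " + declVar + ")]");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> book = Arrays.asList(
            "private", "Integer", "count", ";",
            "public", "Integer", "getCount", "(", ")", "{", "return", "count", ";", "}");
        System.out.println(String.join(" ", generalizeAliases(book, "count")));
    }
}
```

Building the composite mutation out of repeated calls to the atomic one mirrors the atomic/composite distinction on the slide.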
(Operator.
  "add-directive-invokes"
  operators/add-directive-invokes
  :refinement
  "Add directive invokes."
  opscope-subject
  applicability|methodinvocation
  "Requires matches to invoke the binding for the meta-variable."
  [(make-operand "Meta-variable (e.g., ?v)" opscope-variable validity|variable)])
constraints on their subject
constraints on their operands
constraints enable checking applicability of operator, validity of its operands + generating possible values!
(Operator.
  "remove-node"
  operators/remove-node
  :destructive
  "Remove from template."
  opscope-subject
  applicability|deleteable
  "Removes its selection from the template."
  [])
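How such constraints allow checking applicability and generating operand values can be sketched in Java. Everything here is an assumption made for illustration: the `Kind` enum, the rule that any non-root node is deleteable, and the rule that a valid variable operand starts with `?` are hypothetical stand-ins for the actual `applicability|…` and `validity|…` predicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class OperatorSketch {
    // Hypothetical node kinds; the real templates are Java ASTs.
    enum Kind { ROOT, METHOD_INVOCATION, SIMPLE_NAME }

    static class Operator {
        final String name;
        final Predicate<Kind> applicability;      // constraint on the subject
        final Predicate<String> operandValidity;  // constraint on the operand; null if no operand
        Operator(String name, Predicate<Kind> applicability, Predicate<String> operandValidity) {
            this.name = name;
            this.applicability = applicability;
            this.operandValidity = operandValidity;
        }
    }

    static final List<Operator> OPERATORS = Arrays.asList(
        // Assumption: a valid meta-variable operand is any string starting with '?'.
        new Operator("add-directive-invokes", k -> k == Kind.METHOD_INVOCATION, v -> v.startsWith("?")),
        // Assumption: every node except the template root counts as deleteable.
        new Operator("remove-node", k -> k != Kind.ROOT, null));

    // Names of all operators whose subject constraint accepts the given node kind.
    static List<String> applicableTo(Kind subject) {
        List<String> names = new ArrayList<>();
        for (Operator op : OPERATORS) {
            if (op.applicability.test(subject)) names.add(op.name);
        }
        return names;
    }

    // Filters candidate operand values down to those the operator accepts.
    static List<String> validOperands(Operator op, List<String> candidates) {
        List<String> valid = new ArrayList<>();
        if (op.operandValidity == null) return valid;
        for (String c : candidates) {
            if (op.operandValidity.test(c)) valid.add(c);
        }
        return valid;
    }

    public static void main(String[] args) {
        System.out.println(applicableTo(Kind.METHOD_INVOCATION));
        System.out.println(validOperands(OPERATORS.get(0), Arrays.asList("?method", "count")));
    }
}
```

Keeping the constraints as first-class predicates is what lets a search procedure enumerate only legal mutations instead of trying arbitrary edits.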
3/ genetic search for mutation recommendations
given enumeration of desired matches
a) populate
b) determine fitness
c) best individual good enough? if so: yay!
d) otherwise: select parents, crossover & mutation, back to b)
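The loop above can be sketched generically in Java. The slides do not say how parents are selected or whether the best individual is carried over, so the binary tournament selection and the elitism used here are assumptions, as are all names (`GeneticSearchSketch`, `search`, `tournament`) and the integer toy problem in `main`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.function.BinaryOperator;
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

class GeneticSearchSketch {
    static <T> T search(List<T> initial, ToDoubleFunction<T> fitness,
                        BinaryOperator<T> crossover, UnaryOperator<T> mutate,
                        double goodEnough, int maxIterations, Random rng) {
        List<T> population = new ArrayList<>(initial);            // populate
        Comparator<T> byFitness = Comparator.comparingDouble(fitness);
        for (int i = 0; ; i++) {
            population.sort(byFitness.reversed());                // determine fitness
            T best = population.get(0);
            if (fitness.applyAsDouble(best) >= goodEnough || i >= maxIterations) {
                return best;                                      // good enough (or out of budget)
            }
            List<T> next = new ArrayList<>();
            next.add(best);                                       // assumption: elitism
            while (next.size() < population.size()) {
                T a = tournament(population, fitness, rng);       // select parents
                T b = tournament(population, fitness, rng);
                next.add(mutate.apply(crossover.apply(a, b)));    // crossover & mutation
            }
            population = next;
        }
    }

    // Assumption: binary tournament selection; the fitter of two random picks wins.
    static <T> T tournament(List<T> population, ToDoubleFunction<T> fitness, Random rng) {
        T a = population.get(rng.nextInt(population.size()));
        T b = population.get(rng.nextInt(population.size()));
        return fitness.applyAsDouble(a) >= fitness.applyAsDouble(b) ? a : b;
    }

    public static void main(String[] args) {
        // Toy problem: evolve an integer towards 42.
        Random rng = new Random(1);
        ToDoubleFunction<Integer> fitness = x -> 1.0 / (1 + Math.abs(x - 42));
        Integer best = search(Arrays.asList(0, 10, 90, 100), fitness,
            (a, b) -> (a + b) / 2, x -> x + rng.nextInt(3) - 1, 1.0, 50, rng);
        System.out.println(best + " with fitness " + fitness.applyAsDouble(best));
    }
}
```

In the setting of the slides, `T` would be a template group, `mutate` would apply one of the formal operators, and `fitness` would compare a group's matches against the desired ones.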
populate
individual = group of templates
- copies of existing group (from editor state)
- one template group per row, consisting of a template per column in desired matches (from scratch)
determine fitness
a) match each template group in population against program
b) determine precision and recall w.r.t. desired matches
c) penalize excess use of directives (KISS)
concurrently!
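Steps b) and c) can be sketched as a single scoring function. The slides do not state how precision, recall and the directive penalty are combined, so the F1 score and the linear per-directive penalty below are assumptions, as are the names `FitnessSketch` and `penaltyPerDirective`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class FitnessSketch {
    // Assumption: fitness = F1(precision, recall) - penaltyPerDirective * directives,
    // clamped at 0. Matches are represented by hypothetical string identifiers.
    static double fitness(Set<String> found, Set<String> desired,
                          int directives, double penaltyPerDirective) {
        Set<String> hits = new HashSet<>(found);
        hits.retainAll(desired);                       // matches that were actually wanted
        if (hits.isEmpty()) return 0.0;
        double precision = (double) hits.size() / found.size();
        double recall = (double) hits.size() / desired.size();
        double f1 = 2 * precision * recall / (precision + recall);
        return Math.max(0.0, f1 - penaltyPerDirective * directives);
    }

    public static void main(String[] args) {
        Set<String> found = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> desired = new HashSet<>(Arrays.asList("a", "b"));
        // precision 2/3, recall 1, so F1 is 0.8 before the penalty is subtracted
        System.out.println(fitness(found, desired, 2, 0.05));
    }
}
```

A template group with one unwanted match (precision 2/3) and no missed matches (recall 1) scores an F1 of 0.8, from which each directive subtracts a small amount.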
[Figure 2.5: parents (x+y)+3 and (y+1)*(x/2), each with a marked crossover point; offspring (x/2)+3; the unused material is discarded as garbage]
Figure 2.5: Example of subtree crossover. Note that the trees on the left are actually copies of the parents. So, their genetic material can freely be used without altering the original individuals.
to crossover operations frequently exchanging only very small amounts of genetic material (i.e., small subtrees); many crossovers may in fact reduce to simply swapping two leaves. To counter this, Koza (1992) suggested the widely used approach of choosing functions 90% of the time and leaves 10% of the time. Many other types of crossover and mutation of GP trees are possible. They will be described in Sections 5.2 and 5.3, pages 42–46.
The most commonly used form of mutation in GP (which we will call subtree mutation) randomly selects a mutation point in a tree and substitutes the subtree rooted there with a randomly generated subtree. This is illustrated in Figure 2.6. Subtree mutation is sometimes implemented as crossover between a program and a newly generated random program; this operation is also known as “headless chicken” crossover (Angeline, 1997).
Another common form of mutation is point mutation, which is GP’s rough equivalent of the bit-flip mutation used in genetic algorithms (Goldberg, 1989). In point mutation, a random node is selected and the primitive stored there is replaced with a different random primitive of the same arity taken from the primitive set. If no other primitives with that arity exist, nothing happens to that node (but other nodes may still be mutated). When subtree mutation is applied, this involves the modification of exactly one subtree. Point mutation, on the other hand, is typically applied on a
[Genetic programming, a field guide]
select parents
mutation crossover
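Subtree crossover as in the Figure 2.5 example can be sketched on immutable binary expression trees. The `Tree` class, the `L`/`R` path encoding of crossover points, and the method names are assumptions for illustration; crossover points are passed in explicitly rather than chosen at random.

```java
class Tree {
    final String label;
    final Tree left, right;   // both null for a leaf
    Tree(String label, Tree left, Tree right) {
        this.label = label; this.left = left; this.right = right;
    }
    static Tree leaf(String label) { return new Tree(label, null, null); }
}

class CrossoverSketch {
    // Follows a path of 'L'/'R' steps from the root to a subtree.
    static Tree subtreeAt(Tree t, String path) {
        for (char step : path.toCharArray()) {
            t = (step == 'L') ? t.left : t.right;
        }
        return t;
    }

    // Returns a new tree in which the subtree at `path` is replaced by `donor`.
    // The receiver is never modified, so parents stay intact.
    static Tree replaceAt(Tree receiver, String path, Tree donor) {
        if (path.isEmpty()) return donor;
        String rest = path.substring(1);
        if (path.charAt(0) == 'L') {
            return new Tree(receiver.label, replaceAt(receiver.left, rest, donor), receiver.right);
        }
        return new Tree(receiver.label, receiver.left, replaceAt(receiver.right, rest, donor));
    }

    // Offspring = first parent with the subtree at point1 replaced by
    // the second parent's subtree at point2.
    static Tree crossover(Tree parent1, String point1, Tree parent2, String point2) {
        return replaceAt(parent1, point1, subtreeAt(parent2, point2));
    }

    static String show(Tree t) {
        if (t.left == null) return t.label;
        return "(" + show(t.left) + t.label + show(t.right) + ")";
    }

    public static void main(String[] args) {
        Tree p1 = new Tree("+", new Tree("+", Tree.leaf("x"), Tree.leaf("y")), Tree.leaf("3"));
        Tree p2 = new Tree("*", new Tree("+", Tree.leaf("y"), Tree.leaf("1")),
                                new Tree("/", Tree.leaf("x"), Tree.leaf("2")));
        System.out.println(show(crossover(p1, "L", p2, "R")));  // prints ((x/2)+3)
    }
}
```

Because the trees are immutable, the offspring can share subtrees with its parents, which plays the role of the copies mentioned in the figure caption.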
best templates after 30 iterations
[public void acceptVisitor(ComponentVisitor v);]@[(invoked-by ?v17892744)]
comp.acceptVisitor(v)
0.91

[public void acceptVisitor(ComponentVisitor v){...}]@[(invoked-by ?v20420073)]
comp.acceptVisitor(v)
0.91

[public void acceptVisitor(ComponentVisitor v) ??v23406365]@[(invoked-by ??v23499077)]
?v23184877(v)
0.90

desired:
[….acceptVisitor(…)]@[(invokes ?method)]
[public void acceptVisitor(ComponentVisitor v)…]@[(equals ?method)]
Ongoing Experiment
RQ1: how effective is the search in finding template changes?
RQ2: do users find the recommended changes helpful?
RQ3: do composite, template-specific mutations converge more quickly to a solution than generic code mutations?
fine-tuning: very sensitive to probabilities of crossover and mutation, quality of RNG, diversity in population, …