
Post on 23-May-2020

Programming with “Big Code”: Lessons, Techniques, Applications

Pavol Bielik, Veselin Raychev, Martin Vechev
Department of Computer Science
ETH Zurich

Work @ ETH Zurich

Work on “Big Code” started a few years ago

Code Completion with Statistical Language Models, PLDI 2014
Machine Translation for Programming Languages, Onward 2014
Predicting Program Properties from “Big Code”, POPL 2015
Fast and Precise Statistical Code Completion, ETH TR
Statistical Feedback Generation for Programs, ETH TR
Programming with Big Code: Lessons, Techniques and Applications, SNAPL 2015

Prof. Martin Vechev
Prof. Andreas Krause
Veselin Raychev
Pavol Bielik
Svetoslav Karaivanov
Christine Zeller
Pascal Roos

Applications

[PLDI 14] SLANG: Code Completion

Intent i = new Intent();
?
ctx.sendBroadcast(i);

[Onward 14] Programming Language Translation

P( Java | C# ) ∝ P( C# | Java ) P( Java )

[POPL 15] JSNice: Deobfuscation, Type Prediction

[submitted] Statistical Feedback Generation

...
for x in range(a):
    print a[x]    (likely error)

All of these benefit from “Big Code” and lead to applications not possible with previous techniques

Probabilistic Programming Systems: Dimensions

Applications

Intermediate Representation

Analyze Program (PL)

Train Model (ML)

Query Model (ML)

What is a generic metric for code?

✔ Cross Entropy → ✗ Code Completion
✔ BLEU Score → ✗ Program Translation

Traditional metrics might not be indicative of client performance
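A small sketch of why a generic metric can disagree with a client metric. The two models and their probabilities below are hypothetical, chosen only to illustrate the point; they are not results from the talk. Model A has the better (lower) cross entropy, yet model B has the better top-1 completion accuracy.

```python
import math

def cross_entropy(predictions, truths):
    """Average negative log2-probability assigned to the true token."""
    total = sum(math.log2(dist[t]) for dist, t in zip(predictions, truths))
    return -total / len(truths)

def top1_accuracy(predictions, truths):
    """Fraction of positions where the most probable token is the true one."""
    hits = sum(max(dist, key=dist.get) == t for dist, t in zip(predictions, truths))
    return hits / len(truths)

truths = ["open", "send", "open", "send"]

# Hypothetical model A: gives the true token 0.4 every time,
# but always ranks a wrong token ("close") above it.
model_a = [{t: 0.4, "close": 0.5, "other": 0.1} for t in truths]

# Hypothetical model B: ranks the true token first 3 times out of 4,
# and is confidently wrong once.
model_b = [{t: 0.6, "close": 0.3, "other": 0.1} for t in truths[:3]]
model_b.append({truths[3]: 0.001, "close": 0.9, "other": 0.099})

ce_a, ce_b = cross_entropy(model_a, truths), cross_entropy(model_b, truths)
acc_a, acc_b = top1_accuracy(model_a, truths), top1_accuracy(model_b, truths)

print(f"A: cross entropy {ce_a:.2f} bits, top-1 accuracy {acc_a:.2f}")
print(f"B: cross entropy {ce_b:.2f} bits, top-1 accuracy {acc_b:.2f}")
```

One confident mistake dominates the cross entropy of model B, while the completion client only cares whether the right token is ranked first.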

What is the best program representation?

Sequences
req → {<open, 0>, <send, 0>}
source → {..., <open, 2>}

Trees
    =
   / \
  a   +
     / \
    x   y

Graphical Models

Feature Vectors
req → (0,0,1,1,0)
source → (1,0,0,0,0)
...

Choosing the right representation is crucial
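To make the difference between a sequence and a tree representation concrete, here is a minimal sketch for the statement a = x + y. It uses Python's standard ast module as a stand-in parser; the to_tuple helper and the nested-tuple encoding are illustrative assumptions, not the representation used in the systems above.

```python
import ast

def to_tuple(node):
    """Convert a small subset of Python AST nodes to nested tuples."""
    if isinstance(node, ast.Assign):
        return ("=", to_tuple(node.targets[0]), to_tuple(node.value))
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        return ("+", to_tuple(node.left), to_tuple(node.right))
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError(f"unsupported node: {type(node).__name__}")

# Sequence view: a flat list of tokens, structure is implicit.
tokens = ["a", "=", "x", "+", "y"]

# Tree view: the operator nesting is explicit.
tree = to_tuple(ast.parse("a = x + y").body[0])

print(tokens)
print(tree)  # ('=', 'a', ('+', 'x', 'y'))
```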

Feedback Generation: Sequence representations

Allamanis et al. [2013] 46.4%

Hsiao et al. [2014] 50.8%

Incorporate semantic information 75.3%

Incorporate dataflow analysis 86.3%

How to extract program representation?

SLANG (APIs): alias and typestate analysis
JSNice (Variable Names): scope and alias analysis
Feedback Generation: alias, control-flow and typestate analysis

req.open("GET", source, false);

req → {<open, 0>, <send, 0>}
source → {..., <open, 2>}
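The mapping above can be read as: for every call, the receiver gets the event <method, 0> and the i-th argument gets <method, i>. A minimal sketch of that extraction follows. It is an assumption-laden illustration: it parses Python syntax with the standard ast module as a stand-in for the JavaScript snippet, it handles only calls of the form obj.method(...), and it does no alias or typestate analysis. The function name extract_events is hypothetical.

```python
import ast
from collections import defaultdict

def extract_events(source_code):
    """Map each variable name to its ordered list of (method, position) events."""
    events = defaultdict(list)
    # Collect calls and sort them in source order so each history is a sequence.
    calls = [n for n in ast.walk(ast.parse(source_code)) if isinstance(n, ast.Call)]
    calls.sort(key=lambda n: (n.lineno, n.col_offset))
    for call in calls:
        func = call.func
        if not (isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name)):
            continue  # only the simple form `obj.method(...)` is handled
        events[func.value.id].append((func.attr, 0))  # position 0: the receiver
        for i, arg in enumerate(call.args, start=1):
            if isinstance(arg, ast.Name):  # literals such as "GET" are skipped
                events[arg.id].append((func.attr, i))
    return dict(events)

# Python-syntax stand-in for the JavaScript statements on the slide.
snippet = 'req.open("GET", source, False)\nreq.send()'
histories = extract_events(snippet)
print(histories)
# {'req': [('open', 0), ('send', 0)], 'source': [('open', 2)]}
```

Without alias analysis, a second name bound to the same object would get its own separate history, which is one reason the analyses listed above matter.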

Design scalable yet precise enough algorithms

[Figure: precision (0 to 1) vs. % of data used (1%, 10%, 100%), comparing no alias analysis and with alias analysis]

What is the suitable probabilistic model?

N-gram language model
Probabilistic context-free grammars
Neural networks
(Structured) Support vector machine
Conditional Random Fields
...
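As an illustration of the simplest entry in this list, here is a minimal bigram (2-gram) model over API event sequences, used to rank candidates for the next call. The training histories are hypothetical, and the sketch uses plain maximum-likelihood counts with no smoothing; it is not the model from any of the systems above.

```python
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count how often each event follows each previous event."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = ["<s>"] + list(seq)  # "<s>" marks the start of a history
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    return counts

def complete(counts, prev):
    """Rank candidate next events by the estimate of P(cur | prev)."""
    total = sum(counts[prev].values())
    return [(event, n / total) for event, n in counts[prev].most_common()]

# Hypothetical training histories (not taken from the talk).
corpus = [
    ["open", "send"],
    ["open", "setRequestHeader", "send"],
    ["open", "send"],
]

model = train_bigram(corpus)
ranking = complete(model, "open")
print(ranking[0][0])  # send
```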

Baseline 25.3%

Independent 54.1%

Structured 63.4%

Structured prediction is critical

Programming with “Big Code”

Applications: code completion, deobfuscation, program synthesis, feedback generation, translation

Intermediate Representation: sequences (sentences), trees, translation table, feature vectors, graphical models

Analyze Program (PL): alias analysis, typestate analysis, control-flow analysis, scope analysis

Train Model (ML): N-gram language model, SVM, structured SVM, neural networks

Query Model: argmax_{y ∈ Ω} P(y | x)

Greedy MAP Inference
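A rough sketch of what greedy MAP inference for argmax P(y | x) over y ∈ Ω can look like, under stated assumptions: the score of an assignment is a sum of per-variable (unary) and pairwise terms, a higher score means a higher P(y | x), and the search changes one label at a time while the score improves. All names and numbers are hypothetical, and this is not claimed to be the algorithm implemented in the open-sourced engine. The starting point is the independent prediction, so the example also shows how a structured prediction can differ from it.

```python
def total_score(assign, unary, pairwise, edges):
    """Score of a full assignment; higher score means higher P(y | x)."""
    score = sum(unary[node][assign[node]] for node in assign)
    score += sum(pairwise.get((assign[a], assign[b]), 0.0) for a, b in edges)
    return score

def greedy_map(unary, pairwise, edges):
    """Change one label at a time for as long as the total score improves."""
    # Independent prediction: the best label for each node in isolation.
    independent = {node: max(scores, key=scores.get) for node, scores in unary.items()}
    assign = dict(independent)
    improved = True
    while improved:
        improved = False
        for node, scores in unary.items():
            for label in scores:
                candidate = {**assign, node: label}
                if (total_score(candidate, unary, pairwise, edges)
                        > total_score(assign, unary, pairwise, edges)):
                    assign, improved = candidate, True
    return independent, assign

# Hypothetical scores for naming two related variables v1 and v2.
unary = {"v1": {"i": 1.0, "req": 0.8}, "v2": {"url": 1.0, "j": 0.9}}
pairwise = {("req", "url"): 2.0, ("i", "j"): 0.5}
edges = [("v1", "v2")]

independent, best = greedy_map(unary, pairwise, edges)
print(independent)  # {'v1': 'i', 'v2': 'url'}
print(best)         # {'v1': 'req', 'v2': 'url'}
```

Being greedy, this search can stop at a local optimum; it trades exactness for speed.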

More information and tutorials at:
http://www.nice2predict.org/
http://www.srl.inf.ethz.ch/spas.php

General framework

http://www.nice2predict.org/

We have open-sourced our prediction engine and we are extending it with new capabilities

Upcoming PLDI’15 tutorial
