Microsoft Research Cambridge Joint work with Earl T. Barr, Marc Brockschmidt, Santanu Dash, Mahmoud Khademi


Aug 17, 2021

Page 1

Microsoft Research Cambridge

Joint work with Earl T. Barr, Marc Brockschmidt, Santanu Dash, Mahmoud Khademi

Page 2

Deep Learning
✓ Understands images/language/speech
✓ Finds patterns in noisy data
- Requires many samples
- Handling structured data is hard

Program Structure
✓ Interpretable
✓ Generalisation verifiable
- Manual effort
- Limited to specialists

DPU

Page 3
Page 4

Machine Learning (ML) component → Artificial Intelligence (AI) Tool

Page 5

Research in ML4Code

Page 6
Page 7

https://visualstudio.microsoft.com/services/intellicode/

http://www.eclipse.org/recommenders/

Page 8

public class TextRunnerTest extends TestCase {
    void execTest(String testClass, boolean success) throws Exception {
        ...
        InputStream i = p.getInputStream();
        while ((i.read()) != -1);
        ...
    }
    ...
}

Suggested Name: input (81.9%)

Page 9
Page 10

http://jsnice.org/

Deep Learning Type Inference. V. Hellendoorn, C. Bird, E.T. Barr, M. Allamanis. 2018

Predicting Program Properties from "Big Code". V. Raychev, M. Vechev, A. Krause. 2015

Page 11
Page 12

Variable Misuse

Allamanis et al. “Learning to Represent Programs with Graphs”. 2018

Page 13

Defined Types

string

string

Page 14

Allamanis, Brockschmidt, Khademi. ICLR 2018

Page 15

[Figure: a method with its variable slots left blank; the surrounding "int" declarations and the "return" statement are not recoverable]

for (int i = 0; _ < _; _++)
    if (_[_] > 0)
        _ += _[_];

[Figure: the same method with the slots filled in]

for (int i = 0; i < lim; i++)
    if (arr[i] > 0)
        sum += arr[i];
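
The loop above can be read as a set of slots, each of which must be filled with one of the variables in scope that has a compatible type. The following is a minimal sketch of how the candidate set for one slot might be formed. The `Variable` record, the `SCOPE` list and the types in it are assumptions made for illustration (they are read off the listing, not given on the slide), and the model that ranks the candidates is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    type: str


# Variables assumed to be in scope for the loop; types are guesses
# based on how each name is used in the listing.
SCOPE = [
    Variable("i", "int"),
    Variable("lim", "int"),
    Variable("sum", "int"),
    Variable("arr", "int[]"),
]


def candidates(slot_type, scope):
    """Return the names of in-scope variables whose type fits the slot."""
    return [v.name for v in scope if v.type == slot_type]


# Slot for the loop bound in `i < ?`, which expects an int.
bound_candidates = candidates("int", SCOPE)
# Slot for the indexed expression in `?[i]`, which expects an int array.
array_candidates = candidates("int[]", SCOPE)

print(bound_candidates)  # ['i', 'lim', 'sum']
print(array_candidates)  # ['arr']
```

Type information alone leaves three candidates for the loop bound, which is why a learned ranking over the candidates is needed.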

Page 16

Assert.NotNull(clazz);

Tokens: Assert . NotNull ( …

AST nodes: ExpressionStatement, InvocationExpression, MemberAccessExpression, ArgumentList

Edge types: Next Token, AST Child
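
A rough sketch of how the two syntactic edge types could be materialised for `Assert.NotNull(clazz);`. The tree below is a simplified, hand-written assumption (a real parser would produce additional intermediate nodes), and the `tokN:` node naming is hypothetical.

```python
# Tokens of `Assert.NotNull(clazz);` in source order.
tokens = ["Assert", ".", "NotNull", "(", "clazz", ")", ";"]

# Simplified AST: each nonterminal maps to its children, which are
# either other nonterminals (strings) or token indices (ints).
ast_children = {
    "ExpressionStatement": ["InvocationExpression", 6],
    "InvocationExpression": ["MemberAccessExpression", "ArgumentList"],
    "MemberAccessExpression": [0, 1, 2],
    "ArgumentList": [3, 4, 5],
}


def node_name(node):
    """Nonterminals keep their label; tokens get an index-qualified name."""
    return node if isinstance(node, str) else f"tok{node}:{tokens[node]}"


edges = []
# Next Token edges link each token to its successor in the source text.
for i in range(len(tokens) - 1):
    edges.append((node_name(i), "NextToken", node_name(i + 1)))
# AST Child edges link each nonterminal to its children.
for parent, children in ast_children.items():
    for child in children:
        edges.append((parent, "Child", node_name(child)))

for edge in edges:
    print(edge)
```

Index-qualified token names keep repeated tokens distinct, which matters once the same identifier appears more than once in a file.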

Page 17

(x, y) = Foo();
while (x > 0)
    x = x + y;

Edge types: Last Write, Last Use, Computed From
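
A minimal sketch of how Last Write and Computed From edges might be derived for the snippet above, using a small fixed-point pass over a hand-encoded control-flow graph. The statement ids (`s0`, `s1`, `s2`), the `/r` and `/w` suffixes for read and write occurrences, and the edge direction are all assumptions for illustration; Last Use edges are omitted and could be computed analogously by tracking reads instead of writes.

```python
# Hand-encoded statements of the snippet, with the variables each reads/writes.
STATEMENTS = {
    "s0": {"text": "(x, y) = Foo()", "reads": [], "writes": ["x", "y"]},
    "s1": {"text": "x > 0", "reads": ["x"], "writes": []},
    "s2": {"text": "x = x + y", "reads": ["x", "y"], "writes": ["x"]},
}
# Control flow: s0 enters the loop test s1; the body s2 jumps back to s1.
PREDECESSORS = {"s0": [], "s1": ["s0", "s2"], "s2": ["s1"]}


def last_writes(statements, predecessors):
    """For each statement, map every variable to the set of statements
    that may have written it most recently before the statement runs."""
    reach_in = {s: {} for s in statements}
    reach_out = {s: {} for s in statements}
    changed = True
    while changed:
        changed = False
        for s, info in statements.items():
            merged = {}
            for p in predecessors[s]:
                for var, writers in reach_out[p].items():
                    merged.setdefault(var, set()).update(writers)
            out = {var: set(w) for var, w in merged.items()}
            for var in info["writes"]:
                out[var] = {s}  # a write hides all earlier writes
            if merged != reach_in[s] or out != reach_out[s]:
                reach_in[s], reach_out[s] = merged, out
                changed = True
    return reach_in


reach_in = last_writes(STATEMENTS, PREDECESSORS)
edges = set()
for s, info in STATEMENTS.items():
    for var in info["reads"]:
        for writer in reach_in[s].get(var, ()):
            edges.add((f"{var}@{s}/r", "LastWrite", f"{var}@{writer}/w"))
    # Simplification: every written variable is computed from every
    # variable read in the same statement.
    for target in info["writes"]:
        for source in info["reads"]:
            edges.add((f"{target}@{s}/w", "ComputedFrom", f"{source}@{s}/r"))

for edge in sorted(edges):
    print(edge)
```

Because of the loop, the read of `x` in the condition has two possible last writes: the initial assignment and the assignment in the loop body.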

Page 18

[Figure: a method with its variable slots left blank, shown as a graph]

for (int i = 0; _ < _; _++)
    if (_[_] > 0)
        _ += _[_];

~900 nodes/graph, ~8k edges/graph

Page 19

[Figure: example graph with nodes A, B, C, D, E, F, G, shown twice]

Li et al. (2015). Gated Graph Sequence Neural Networks.

Gilmer et al. (2017). Neural Message Passing for Quantum Chemistry.

Page 20

[Figure: graph fragment with nodes D, E and F; edge labels "E F" and "D F"]

Page 21

[Figure: graph fragment with nodes D, E and F; edge labels "E F" and "D F"]

Li et al. (2015). Gated Graph Sequence Neural Networks.

Page 22
Page 23
Page 24

Li et al. (2015). Gated Graph Sequence Neural Networks.
Gilmer et al. (2017). Neural Message Passing for Quantum Chemistry.

• node selection
• node classification
• graph classification

https://github.com/Microsoft/gated-graph-neural-network-samples
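
A pure-Python sketch of one gated message-passing step in the style of Li et al. (2015): each node sums messages from its incoming edges, with one weight matrix per edge type, and then updates its state with a GRU cell. The toy graph, the hidden size, the random weights and the number of steps are all assumptions for illustration; biases, backward edges and training are left out, and this is not the code in the repository linked above.

```python
import math
import random

random.seed(0)
DIM = 4  # hidden size, arbitrary for this sketch


def rand_matrix():
    return [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


# Hypothetical toy graph: (source, edge type, target).
NODES = ["D", "E", "F"]
EDGES = [("D", "Child", "F"), ("E", "NextToken", "F"), ("F", "Child", "D")]

# One message matrix per edge type, plus the six GRU matrices.
W_edge = {"Child": rand_matrix(), "NextToken": rand_matrix()}
Wz, Uz, Wr, Ur, Wh, Uh = (rand_matrix() for _ in range(6))


def gru(h, m):
    """GRU cell with message m as input and h as previous state."""
    z = sigmoid(add(matvec(Wz, m), matvec(Uz, h)))  # update gate
    r = sigmoid(add(matvec(Wr, m), matvec(Ur, h)))  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(x) for x in add(matvec(Wh, m), matvec(Uh, gated))]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def propagate(states):
    """One synchronous step: sum typed messages per target, then update."""
    messages = {n: [0.0] * DIM for n in NODES}
    for src, etype, dst in EDGES:
        messages[dst] = add(messages[dst], matvec(W_edge[etype], states[src]))
    return {n: gru(states[n], messages[n]) for n in NODES}


states = {n: [random.uniform(-1, 1) for _ in range(DIM)] for n in NODES}
for _ in range(8):  # the step count is a hyperparameter; 8 is an assumption
    states = propagate(states)

for node, state in states.items():
    print(node, [round(x, 3) for x in state])
```

The final node states are what an output layer would consume for the three task types listed above: scoring a set of candidate nodes, classifying each node, or pooling all nodes into one graph-level prediction.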

Page 25
Page 26
Page 27

[Figure: example graph with nodes A, B, C, D, E, F, G]

Page 28

[Figure: example graph with nodes A, B, C, D, E, F, G]

Page 29
Page 30

Seen Projects: 24 F/OSS C# projects (2060 kLOC), used for training and testing

3.8 type-correct alternative variables per slot (median 3, σ = 2.6)

Accuracy (%)      BiGRU   BiGRU+Dataflow   GGNN
Seen Projects     50.0    73.7             85.5

Page 31

Accuracy (%)      BiGRU   BiGRU+Dataflow   GGNN
Seen Projects     50.0    73.7             85.5
Unseen Projects   28.9    60.2             78.2

Seen Projects: 24 F/OSS C# projects (2060 kLOC), used for training and testing

Unseen Projects: 3 F/OSS C# projects (228 kLOC), used only for testing

3.8 type-correct alternative variables per slot (median 3, σ = 2.6)
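
As a quick reading aid for the table, the snippet below computes how many accuracy points each model loses when moving from seen to unseen projects. The numbers are copied from the table; the `accuracy` and `drop` names are just for this illustration.

```python
# Accuracy (%) per model, copied from the results table.
accuracy = {
    "BiGRU": {"seen": 50.0, "unseen": 28.9},
    "BiGRU+Dataflow": {"seen": 73.7, "unseen": 60.2},
    "GGNN": {"seen": 85.5, "unseen": 78.2},
}

# Percentage points lost on projects never seen during training.
drop = {
    model: round(acc["seen"] - acc["unseen"], 1)
    for model, acc in accuracy.items()
}

print(drop)  # {'BiGRU': 21.1, 'BiGRU+Dataflow': 13.5, 'GGNN': 7.3}
```

On these figures the GGNN has both the highest accuracy and the smallest gap between seen and unseen projects.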

Page 32
Page 33

Dash, Allamanis, Barr. FSE 2018

Page 34

def addToCart(productId, providerId, cartId)

username := password

temperature + numOfOranges
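
The fragments above presumably all type-check when the values involved share a primitive type, even though each mixes values that play different roles. A minimal sketch of that idea, using hypothetical wrapper classes (`Username`, `Password`) and a hypothetical `assign` helper to make the role part of the type. It illustrates nominal refinement in general and is not the inference technique of Dash, Allamanis and Barr (FSE 2018).

```python
class Refined(str):
    """A string that also carries a role; subclasses name the role."""


class Username(Refined):
    pass


class Password(Refined):
    pass


def assign(target_role, value):
    """Mimic an assignment that is only allowed between identical roles."""
    if type(value) is not target_role:
        raise TypeError(
            f"cannot assign {type(value).__name__} to {target_role.__name__}"
        )
    return value


password = Password("hunter2")
try:
    username = assign(Username, password)  # the `username := password` case
    outcome = "accepted"
except TypeError as err:
    outcome = str(err)

print(outcome)  # cannot assign Password to Username
```

With plain strings the same assignment would go through silently, which is the kind of mistake the slide's examples point at.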

Page 35

Defined Types

string

string

Page 36

string EncryptAndSignCookie(string cookieValue, FormsAuthenticationConfiguration config) {
    string encryptedCookie = config.CryptographyConfiguration.EncryptionProvider.Encrypt(cookieValue);
    var hmacBytes = GenerateHmac(encryptedCookie, config);
    string hmacString = Convert.ToBase64String(hmacBytes);
    return hmacString + encryptedCookie;
}

Page 37
Page 38
Page 39

def

return

Page 40
Page 41

def string

Page 42

def

def string

Page 43
Page 44
Page 45
Page 46
Page 47
Page 48
Page 49
Page 50
Page 51
Page 52
Page 53
Page 54
Page 55
Page 56
Page 57
Page 58
Page 59

Full name of node or constant value in bepuphysics

damping, SuspensionDamping, starchDamping, dampingConstant, angularDamping, LinearDamping

currentDistance, distance3, candidateDistance, pointDistance, distanceFromMaximum, grabDistance, VariableLinearSpeedCurve::GetDistance, tempDistance

goalVelocity, driveSpeed, GoalSpeed

minRadius, MinimumRadius, Radius, minimumRadiusA, WrappedShape::ComputeMinimumRadius, topRadius, MaximumRadius, graphicalRadius, TransformableShape::ComputeMaximumRadius

blendedCoefficient, KineticFriction, dynamicCoefficient, KineticBreakingFrictionCoefficient

angle, myMaximumAngle, MinimumAngle, currentAngle, MaximumAngle, steeringAngle, MathHelper::WrapAngle

targetHeight, Height, ProneHeight, crouchingHeight, standingHeight

Mass, effectiveMass, newMassA, newMass

M22, m11, M44, resultM44, M43, intermediate, m31, X, Y, Z

Page 60
Page 61

UI/UX

ML Capabilities

Metrics

Low resources

Page 62

Learning Signals

[Diagram: input data $x$ → model of problem $f_\theta(x)$ → prediction, compared against the target]

• Given dataset $(x_1, y_1), \ldots, (x_N, y_N)$
• Minimize loss $\mathcal{L}(\theta) = \frac{1}{N} \sum_i L\big(f_\theta(x_i), y_i\big)$
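
The loss above is the per-example loss averaged over the dataset. A small worked example, assuming a hypothetical one-parameter linear model and squared error as the per-example loss (the slide fixes neither choice):

```python
def mean_loss(f, loss, dataset):
    """Average per-example loss of model f over (x, y) pairs."""
    return sum(loss(f(x), y) for x, y in dataset) / len(dataset)


# Hypothetical model f_theta(x) = theta * x and squared-error loss.
theta = 2.0


def f_theta(x):
    return theta * x


def squared_error(prediction, target):
    return (prediction - target) ** 2


dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 7.0)]
total = mean_loss(f_theta, squared_error, dataset)

print(total)  # 0.3333333333333333
```

Only the third example is mispredicted (6 instead of 7), so the summed loss is 1 and the mean over three examples is 1/3.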

Page 63

Slides at:

http://web.cs.ucdavis.edu/~su/SteeleSplash2016.pdf

Page 64

Deep Program Understanding

Cambridge, UK

Learning from Human Aspects of Code

Reasoning over Rich Structures

Towards Learned Program Analyses with Machine Learning

miltos1

https://miltos.allamanis.com