
Microsoft Research Cambridge

Joint work with Earl T. Barr, Marc Brockschmidt, Santanu Dash, Mahmoud Khademi

Deep Learning

✓ Understands images/language/speech

✓ Finds patterns in noisy data

- Requires many samples

- Handling structured data is hard

Program Structure

✓ Interpretable

✓ Generalisation verifiable

- Manual effort

- Limited to specialists

DPU

Machine Learning (ML) component → Artificial Intelligence (AI) Tool

Research in ML4Code

https://visualstudio.microsoft.com/services/intellicode/

http://www.eclipse.org/recommenders/

public class TextRunnerTest extends TestCase {
    void execTest(String testClass, boolean success) throws Exception {
        ...
        InputStream i = p.getInputStream();
        while ((i.read()) != -1);
        ...
    }
    ...
}

Suggested Name

input (81.9%)

http://jsnice.org/

Deep Learning Type Inference

V. Hellendoorn, C. Bird, E.T. Barr, M. Allamanis. 2018

Predicting Program Properties from Code

V. Raychev, M. Vechev, A. Krause. 2015

Variable Misuse

Allamanis et al. “Learning to Represent Programs with Graphs”. 2018

Defined Types

string

string

Allamanis, Brockschmidt, Khademi. ICLR 2018

[Figure: code snippet over int-typed variables, with its variable slots blanked out]

for (int i = 0; ___ < ___; ___++)
    if (___[___] > 0)
        ___ += ___[___];

for (int i = 0; i < lim; i++)
    if (arr[i] > 0)
        sum += arr[i];
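One way to read the variable misuse task is as ranking the type-correct candidate variables for a slot. The sketch below is only an illustration under stated assumptions: the slot and each candidate are assumed to already have vector representations, the vectors are hand-written toy values, and `rank_candidates` is a hypothetical helper rather than code from the cited work.

```python
import numpy as np

def rank_candidates(slot_vec, candidate_vecs):
    """Rank candidate variables for one slot by a softmax over dot-product scores."""
    names = list(candidate_vecs)
    scores = np.array([slot_vec @ candidate_vecs[name] for name in names])
    # Numerically stable softmax over the candidates.
    exp = np.exp(scores - scores.max())
    probs = exp / exp.sum()
    order = np.argsort(-probs)
    return [(names[k], float(probs[k])) for k in order]

# Toy, hand-written vectors (assumption): in practice these would come from a learned model.
slot_vec = np.array([1.0, 0.0])
candidate_vecs = {
    "i": np.array([2.0, 0.0]),
    "lim": np.array([0.5, 1.0]),
    "sum": np.array([-1.0, 0.0]),
}
ranking = rank_candidates(slot_vec, candidate_vecs)
```

The highest-probability candidate is the suggested variable for the slot; a mismatch with the variable actually written would flag a potential misuse.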

Assert.NotNull(clazz);

Tokens: Assert . NotNull ( …

AST nodes: ExpressionStatement, InvocationExpression, MemberAccessExpression, ArgumentList

Edge types: Next Token, AST Child

(x, y) = Foo();
while (x > 0)
    x = x + y;

Edge types: Last Write, Last Use, Computed From
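The data-flow edge types can be illustrated on the snippet above with a small sketch. It makes simplifying assumptions: the statements are encoded by hand as (writes, reads) pairs, they are processed in source order, and the loop back edge is ignored, so a real program graph would contain additional Last Write and Last Use edges. `build_edges` is a hypothetical helper, not the extraction code used in the cited work.

```python
from collections import defaultdict

# Hand-written encoding (assumption) of:
#   (x, y) = Foo();   while (x > 0)   x = x + y;
# Each statement lists the variables it writes and the variables it reads.
statements = [
    (["x", "y"], []),      # (x, y) = Foo();
    ([], ["x"]),           # while (x > 0)
    (["x"], ["x", "y"]),   # x = x + y;
]

def build_edges(statements):
    """Return one node per variable occurrence plus typed (source, target) edges."""
    node_names = []
    edges = defaultdict(list)
    last_write = {}  # variable -> node id of its most recent write
    last_use = {}    # variable -> node id of its most recent read
    for writes, reads in statements:
        read_ids = []
        # Reads come first: the right-hand side is evaluated before the assignment.
        for var in reads:
            nid = len(node_names)
            node_names.append(var)
            if var in last_write:
                edges["LastWrite"].append((nid, last_write[var]))
            if var in last_use:
                edges["LastUse"].append((nid, last_use[var]))
            last_use[var] = nid
            read_ids.append(nid)
        for var in writes:
            nid = len(node_names)
            node_names.append(var)
            # The written variable is computed from every variable read in the statement.
            for rid in read_ids:
                edges["ComputedFrom"].append((nid, rid))
            last_write[var] = nid
    return node_names, dict(edges)

node_names, edges = build_edges(statements)
```

Each edge type gets its own list so that a graph model can later treat the types differently.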


~900 nodes/graph ~8k edges/graph

[Figure: example graph with nodes A to G]

Li et al (2015). Gated Graph Sequence Neural Networks.


Gilmer et al (2017). Neural Message Passing for Quantum Chemistry.

[Figure: message passing illustrated on nodes D, E and F]


• node selection

• node classification

• graph classification

https://github.com/Microsoft/gated-graph-neural-network-samples
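A single propagation step of a gated graph neural network can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the code in the repository above: weights are random and untrained, biases are omitted, and the names `ggnn_step`, `W_msg` and `gru` are hypothetical. Each node sums messages from its incoming edges (one weight matrix per edge type) and then updates its state with a GRU-style gate, following the general scheme of Li et al. (2015).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ggnn_step(h, edges, W_msg, gru):
    """One message-passing step.

    h:     (num_nodes, d) current node states
    edges: dict edge_type -> list of (source, target) node ids
    W_msg: dict edge_type -> (d, d) message weight matrix
    gru:   dict with (d, d) matrices Wz, Uz, Wr, Ur, Wh, Uh (biases omitted)
    """
    m = np.zeros_like(h)
    for etype, pairs in edges.items():
        for src, dst in pairs:
            m[dst] += h[src] @ W_msg[etype]  # message along one typed edge
    z = sigmoid(m @ gru["Wz"] + h @ gru["Uz"])               # update gate
    r = sigmoid(m @ gru["Wr"] + h @ gru["Ur"])               # reset gate
    h_tilde = np.tanh(m @ gru["Wh"] + (r * h) @ gru["Uh"])   # candidate state
    return (1.0 - z) * h + z * h_tilde

# Toy graph (assumption): three nodes chained by NextToken edges.
d = 4
rng = np.random.default_rng(0)
h = rng.normal(size=(3, d))
edges = {"NextToken": [(0, 1), (1, 2)]}
W_msg = {"NextToken": rng.normal(size=(d, d))}
gru = {k: rng.normal(size=(d, d)) for k in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
h_next = ggnn_step(h, edges, W_msg, gru)
```

Repeating the step several times lets information travel further along the graph; the final node states can then feed a node selection, node classification or graph classification output layer.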


Accuracy (%)       BiGRU   BiGRU+Dataflow   GGNN
Seen Projects      50.0    73.7             85.5
Unseen Projects    28.9    60.2             78.2

Seen Projects: 24 F/OSS C# projects (2060 kLOC): Used for train and test

Unseen Projects: 3 F/OSS C# projects (228 kLOC): Used only for test

3.8 type-correct alternative variables per slot (median 3, σ = 2.6)

Dash, Allamanis, Barr. FSE 2018

def

addToCart(productId, providerId, cartId)

username := password

temperature + numOfOranges

Defined Types

string

string

string EncryptAndSignCookie(string cookieValue, FormsAuthenticationConfiguration config) {
    string encryptedCookie = config.CryptographyConfiguration.EncryptionProvider.Encrypt(cookieValue);
    var hmacBytes = GenerateHmac(encryptedCookie, config);
    string hmacString = Convert.ToBase64String(hmacBytes);
    return hmacString + encryptedCookie;
}

[Figure: graph over def, string and return nodes]

Full name of node or constant value in bepuphysics

damping, SuspensionDamping, starchDamping, dampingConstant, angularDamping, LinearDamping

currentDistance, distance3, candidateDistance, pointDistance, distanceFromMaximum, grabDistance, VariableLinearSpeedCurve::GetDistance, tempDistance

goalVelocity, driveSpeed, GoalSpeed

minRadius, MinimumRadius, Radius, minimumRadiusA, WrappedShape::ComputeMinimumRadius, topRadius, MaximumRadius, graphicalRadius, TransformableShape::ComputeMaximumRadius

blendedCoefficient, KineticFriction, dynamicCoefficient, KineticBreakingFrictionCoefficient

angle, myMaximumAngle, MinimumAngle, currentAngle, MaximumAngle, steeringAngle, MathHelper::WrapAngle

targetHeight, Height, ProneHeight, crouchingHeight, standingHeight

Mass, effectiveMass, newMassA, newMass

M22, m11, M44, resultM44, M43, intermediate, m31, X, Y, Z

UI/UX

ML Capabilities

Metrics

Low resources

Learning Signals

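[Figure: input data $x$ → model of problem $f_\theta(x)$ → prediction, compared against the target]

• Given dataset $(x_1, y_1), \ldots, (x_N, y_N)$

• Minimize Loss $\mathcal{L}(\theta) = \frac{1}{N} \sum_i L(f_\theta(x_i), y_i)$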
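As a worked illustration of minimizing the average loss over a dataset, the sketch below fits a toy model with plain gradient descent. Everything specific here is an assumption chosen for illustration: the linear model `f`, the squared-error per-example loss, the four data points, the learning rate and the step count.

```python
import numpy as np

def f(theta, x):
    """Toy model (assumption): a line with slope theta[0] and intercept theta[1]."""
    return theta[0] * x + theta[1]

def empirical_loss(theta, xs, ys):
    """Mean of the per-example squared error L(f(x_i), y_i) over the dataset."""
    return np.mean((f(theta, xs) - ys) ** 2)

def gradient(theta, xs, ys):
    """Gradient of the mean squared error with respect to theta."""
    err = f(theta, xs) - ys
    return np.array([np.mean(2 * err * xs), np.mean(2 * err)])

# Toy dataset (assumption): points on the line y = 2x + 1.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2.0 * xs + 1.0

theta = np.zeros(2)
initial_loss = empirical_loss(theta, xs, ys)
for _ in range(2000):
    theta -= 0.05 * gradient(theta, xs, ys)  # gradient descent step
final_loss = empirical_loss(theta, xs, ys)
```

Deep learning follows the same recipe with a far larger model and stochastic gradient estimates in place of the exact gradient.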

Slides at:

http://web.cs.ucdavis.edu/~su/SteeleSplash2016.pdf

Deep Program Understanding

Cambridge, UK

Learning from Human Aspects of Code

Reasoning over Rich Structures

Towards Learned Program Analyses with Machine Learning

miltos1

https://miltos.allamanis.com
