Microsoft Research Cambridge Joint work with Earl T. Barr, Marc Brockschmidt, Santanu Dash, Mahmoud Khademi
Deep Learning
✓ Understands images/language/speech
✓ Finds patterns in noisy data
- Requires many samples
- Handling structured data is hard
Program Structure
✓ Interpretable
✓ Generalisation verifiable
- Manual effort
- Limited to specialists
DPU
Machine Learning (ML) component → Artificial Intelligence (AI) Tool
Research in ML4Code
https://visualstudio.microsoft.com/services/intellicode/
http://www.eclipse.org/recommenders/
public class TextRunnerTest extends TestCase {
  void execTest(String testClass, boolean success) throws Exception {
    ...
    InputStream i = p.getInputStream();
    while ((i.read()) != -1);
    ...
  }
  ...
}
Suggested Name
input (81.9%)
http://jsnice.org/
Deep Learning Type Inference
V. Hellendoorn, C. Bird, E.T. Barr, M. Allamanis. 2018
Predicting Program Properties from Code
V. Raychev, M. Vechev, A. Krause. 2015
Variable Misuse
Allamanis et al. “Learning to Represent Programs with Graphs”. 2018
Defined Types
string
string
Allamanis, Brockschmidt, Khademi. ICLR 2018
[Figure: variable misuse example. The snippet below is shown first with its variable slots blanked out, then filled in.]
for (int i = 0; i < lim; i++)
  if (arr[i] > 0)
    sum += arr[i];
Assert.NotNull(clazz);
[Figure: program graph of the statement above. Token nodes: Assert, ., NotNull, (, … Syntax nodes: ExpressionStatement, InvocationExpression, MemberAccessExpression, ArgumentList. Edge types: Next Token, AST Child.]
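The two edge types can be sketched with Python's standard `ast` and `tokenize` modules. This is only an illustration, assuming Python's parser as a stand-in for the C# tooling behind the slide: the node type names differ (`Expr`, `Call`, `Attribute` rather than ExpressionStatement, InvocationExpression, MemberAccessExpression), and the helper names `ast_child_edges` and `next_token_edges` are hypothetical.

```python
import ast
import io
import tokenize

# The slide's statement without the C# semicolon; it also parses as Python.
SNIPPET = "Assert.NotNull(clazz)"

def ast_child_edges(source):
    """Return (parent_type, child_type) pairs, one per AST Child edge."""
    edges = []
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            edges.append((type(parent).__name__, type(child).__name__))
    return edges

def next_token_edges(source):
    """Return (token, following_token) pairs, one per Next Token edge."""
    keep = (tokenize.NAME, tokenize.OP, tokenize.NUMBER, tokenize.STRING)
    tokens = [tok.string
              for tok in tokenize.generate_tokens(io.StringIO(source).readline)
              if tok.type in keep]
    return list(zip(tokens, tokens[1:]))
```

Calling `next_token_edges(SNIPPET)` chains the six tokens of the statement, while `ast_child_edges(SNIPPET)` contains, among others, an edge from the expression statement to the call and from the call to the member access.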
(x, y) = Foo();
while (x > 0)
  x = x + y;
Edge types: Last Write, Last Use, Computed From
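A minimal sketch of the Computed From edges for the snippet above, assuming a Python transliteration of the code and Python's `ast` module in place of the original tooling. The helper name `computed_from_edges` is hypothetical. Last Write and Last Use edges need control-flow reasoning over the loop and are deliberately left out.

```python
import ast

# Python transliteration of the slide's snippet (an assumption for illustration).
SOURCE = """
(x, y) = Foo()
while x > 0:
    x = x + y
"""

def computed_from_edges(source):
    """Return (assigned_variable, variable_read_on_rhs) pairs for each assignment."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        targets = [n.id for t in node.targets
                   for n in ast.walk(t) if isinstance(n, ast.Name)]
        # Names that are merely being called (e.g. Foo) are not data sources.
        callees = {id(c.func) for c in ast.walk(node.value)
                   if isinstance(c, ast.Call)}
        sources = [n.id for n in ast.walk(node.value)
                   if isinstance(n, ast.Name) and id(n) not in callees]
        edges.extend((t, s) for t in targets for s in sources)
    return edges

EDGES = computed_from_edges(SOURCE)
```

Only the assignment inside the loop reads variables, so `EDGES` links `x` to the `x` and `y` it is computed from.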
~900 nodes/graph ~8k edges/graph
[Figure: example graph with nodes A, B, C, D, E, F, G]
Li et al (2015). Gated Graph Sequence Neural Networks.
Gilmer et al (2017). Neural Message Passing for Quantum Chemistry.
[Figure: message passing between nodes D, E and F]
• node selection
• node classification
• graph classification
https://github.com/Microsoft/gated-graph-neural-network-samples
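One round of neural message passing can be sketched in plain Python. This is a simplified illustration, not the code in the repository above: it assumes a single weight matrix for all edges and a tanh update, whereas the gated graph networks of Li et al. use per-edge-type weights and a GRU cell. The names `message_passing_step`, `w_msg` and `w_self` are hypothetical.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def message_passing_step(states, edges, w_msg, w_self):
    """One synchronous round: each node sums messages from its in-neighbours."""
    dim = len(next(iter(states.values())))
    incoming = {node: [0.0] * dim for node in states}
    for src, dst in edges:
        message = matvec(w_msg, states[src])
        incoming[dst] = [a + b for a, b in zip(incoming[dst], message)]
    new_states = {}
    for node, h in states.items():
        own = matvec(w_self, h)
        # Simplified update; a GGNN would feed (incoming, h) to a GRU cell here.
        new_states[node] = [math.tanh(a + b) for a, b in zip(own, incoming[node])]
    return new_states

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
STATES = {"D": [1.0, 0.0], "E": [0.0, 1.0], "F": [0.0, 0.0]}
EDGES = [("D", "F"), ("E", "F")]
UPDATED = message_passing_step(STATES, EDGES, IDENTITY, IDENTITY)
```

After one round, node F, which started at zero, holds a mix of the states of D and E. Repeating the step lets information travel along longer paths in the graph.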
Seen Projects: 24 F/OSS C# projects (2060 kLOC): Used for train and test
Unseen Projects: 3 F/OSS C# projects (228 kLOC): Used only for test
3.8 type-correct alternative variables per slot (median 3, σ = 2.6)
Accuracy (%)      BiGRU   BiGRU+Dataflow   GGNN
Seen Projects     50.0    73.7             85.5
Unseen Projects   28.9    60.2             78.2
Dash, Allamanis, Barr. FSE 2018
addToCart(productId, providerId, cartId)
username := password
temperature + numOfOranges
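All three examples above type-check when every value is a plain string or number, yet the names suggest the values should not be mixed. A minimal sketch of what finer-grained types would catch, assuming hand-written wrapper classes (the slides are about inferring such refinements from names, which this does not attempt). `Username`, `Password` and `assign_username` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Username:
    value: str

@dataclass(frozen=True)
class Password:
    value: str

def assign_username(new_name):
    """Accept only a Username; reject any other wrapper, even if it holds a str."""
    if not isinstance(new_name, Username):
        raise TypeError(f"expected Username, got {type(new_name).__name__}")
    return new_name
```

With these wrappers, `assign_username(Password("..."))` raises a `TypeError`, whereas the untyped `username := password` would pass silently.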
Defined Types
string
string
string EncryptAndSignCookie(string cookieValue, FormsAuthenticationConfiguration config) {
  string encryptedCookie = config.CryptographyConfiguration.EncryptionProvider.Encrypt(cookieValue);
  var hmacBytes = GenerateHmac(encryptedCookie, config);
  string hmacString = Convert.ToBase64String(hmacBytes);
  return hmacString + encryptedCookie;
}
[Figure: flow graph of the method above, with def and return nodes annotated with type string]
Full name of node or constant value in bepuphysics
damping, SuspensionDamping, starchDamping, dampingConstant, angularDamping, LinearDamping
currentDistance, distance3, candidateDistance, pointDistance, distanceFromMaximum, grabDistance, VariableLinearSpeedCurve::GetDistance, tempDistance
goalVelocity, driveSpeed, GoalSpeed
minRadius, MinimumRadius, Radius, minimumRadiusA, WrappedShape::ComputeMinimumRadius, topRadius, MaximumRadius, graphicalRadius, TransformableShape::ComputeMaximumRadius
blendedCoefficient, KineticFriction, dynamicCoefficient, KineticBreakingFrictionCoefficient
angle, myMaximumAngle, MinimumAngle, currentAngle, MaximumAngle, steeringAngle, MathHelper::WrapAngle
targetHeight, Height, ProneHeight, crouchingHeight, standingHeight
Mass, effectiveMass, newMassA, newMass
M22, m11, M44, resultM44, M43, intermediate, m31, X, Y, Z
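Many names within a row above share a subtoken (damping, distance, radius). A small sketch of splitting identifiers into subtokens, assuming a camelCase regular expression. It is only a way to inspect the rows, not the method that produced them, and the helper name `subtokens` is hypothetical.

```python
import re

def subtokens(identifier):
    """Split an identifier into lowercase subtokens, dropping any Class:: prefix."""
    name = identifier.split("::")[-1]
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]

# First row of the table above.
DAMPING_ROW = ["damping", "SuspensionDamping", "starchDamping",
               "dampingConstant", "angularDamping", "LinearDamping"]
SHARED = all("damping" in subtokens(name) for name in DAMPING_ROW)
```

Every name in the first row contains the subtoken `damping`, so `SHARED` is true. Rows such as the one with goalVelocity and driveSpeed do not share a subtoken, which a purely lexical split cannot explain.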
UI/UX
ML Capabilities
Metrics
Low resources
Learning Signals
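[Figure: supervised learning setup. Input data $x$ goes into the model of the problem $f_\theta(x)$, which outputs a prediction; the prediction is shown alongside the target.]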
• Given dataset $(x_1, y_1), \ldots, (x_N, y_N)$
• Minimize loss $\mathcal{L}(\theta) = \frac{1}{N} \sum_{i=1}^{N} L(f_\theta(x_i), y_i)$
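The objective above can be made concrete with a toy case. This sketch assumes a one-parameter model f_θ(x) = θ·x, squared error as the per-example loss L, and plain gradient descent. The dataset, learning rate and function names are all illustrative choices, not taken from the talk.

```python
def mean_loss(theta, dataset):
    """Average squared error of f_theta(x) = theta * x over the dataset."""
    return sum((theta * x - y) ** 2 for x, y in dataset) / len(dataset)

def gradient(theta, dataset):
    """Derivative of mean_loss with respect to theta."""
    return sum(2 * (theta * x - y) * x for x, y in dataset) / len(dataset)

def fit(dataset, theta=0.0, lr=0.05, steps=200):
    """Minimize mean_loss by repeatedly stepping against the gradient."""
    for _ in range(steps):
        theta -= lr * gradient(theta, dataset)
    return theta

# Toy dataset generated by y = 2 * x (an assumption for the example).
DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
THETA = fit(DATA)
```

On this data the fitted `THETA` approaches 2 and the mean loss approaches zero. Deep models follow the same recipe with many more parameters and gradients computed automatically.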
Slides at:
http://web.cs.ucdavis.edu/~su/SteeleSplash2016.pdf
Deep Program Understanding
Cambridge, UK
Learning from Human Aspects of Code
Reasoning over Rich Structures
Towards Learned Program Analyses with Machine Learning
miltos1
https://miltos.allamanis.com