Meaningful Variable Names for Decompiled Code: A Machine Translation Approach

Alan Jaffe, Jeremy Lacomis, Edward J. Schwartz*, Claire Le Goues, and Bogdan Vasilescu

Jul 04, 2020

Problem: Obfuscated Variable Names in Code

Minified JavaScript:

Original:

    function callback(error, response, body) {
      if (!error && response.statusCode == 200) {
        var info = JSON.parse(body);
        …

Minified:

    function callback(o, s, a) {
      if (!o && s.statusCode == 200) {
        var c = JSON.parse(a);
        …

• Software is “natural” [Hindle et al., 2011].
• Use large corpora + machine learning to predict better identifier names.
• Corpora are easy to generate!
• Bavishi et al., Context2Name, 2017
• Vasilescu et al., JSNaughty, 2017
• Raychev et al., JSNice, 2015

Problem: Obfuscated Variable Names in Code

Decompiled C Code:

Original Source:

    cp = buf;
    (void)asxTab(level + 1);
    for (n = asnContents(asn, buf, 512); n > 0; n--) {
      printf(" %02X ", *(cp++));
    }

Decompiled Code:

    v14 = &v15;
    asxTab(a2 + 1);
    for (v13 = asnContents(a1, &v15, 512LL); v13 > 0; --v13) {
      v9 = (unsigned char *)(v14++);
      printf(" %02X ", *v9);
    }

Can we use similar strategies for decompiled code?

Statistical Machine Translation (SMT)

• Noisy channel model
• English → French:

    e (English): Go do some research!
    f (French):  Va faire de la recherche!

$$\arg\max_e P(e \mid f) = \arg\max_e \frac{P(f \mid e)\,P(e)}{P(f)} = \arg\max_e P(f \mid e)\,P(e)$$

MOSES SMT:
• P(f | e): Translation Model. Probability that f is a translation of e.
• P(e): Language Model. “Fluency” of e.
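To make the argmax concrete, here is a minimal sketch that applies it to a single identifier instead of a sentence. Everything in it is an assumption for illustration: the probability tables are invented, and `TRANSLATION`, `LANGUAGE`, and `decode` are hypothetical names. A real SMT system such as Moses estimates these models from corpora and searches over whole sentences.

```python
import math

# Hypothetical toy numbers, invented for illustration only.
# P(f | e): probability that original name e appears as token f after decompilation.
TRANSLATION = {
    ("v13", "n"): 0.2,
    ("v13", "len"): 0.2,
    ("v13", "tmp"): 0.3,
}
# P(e): how "fluent" the name e is in this context.
LANGUAGE = {"n": 0.5, "len": 0.3, "tmp": 0.1}

FLOOR = 1e-9  # small probability for unseen events, avoids log(0)


def decode(f, candidates):
    """Return argmax_e P(f | e) * P(e), computed in log space."""
    def score(e):
        translation = TRANSLATION.get((f, e), FLOOR)
        fluency = LANGUAGE.get(e, FLOOR)
        return math.log(translation) + math.log(fluency)
    return max(candidates, key=score)


best = decode("v13", ["n", "len", "tmp"])
```

With these made-up numbers the translation model alone would prefer `tmp`, but the language model term shifts the choice to `n`. This is the reason the product of the two models is maximized rather than either one alone.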

SMT Model for Natural Language

Aligned French/English corpus

English corpus

SMT Model for Minified JavaScript

Aligned original/minified source corpus

Original source corpus

Problem: Obfuscated Identifiers in Code

Decompiled C Code:

Original Source:

    cp = buf;
    (void)asxTab(level + 1);
    for (n = asnContents(asn, buf, 512); n > 0; n--) {
      printf(" %02X ", *(cp++));
    }

Decompiled Code:

    v14 = &v15;
    asxTab(a2 + 1);
    for (v13 = asnContents(a1, &v15, 512LL); v13 > 0; --v13) {
      v9 = (unsigned char *)(v14++);
      printf(" %02X ", *v9);
    }

Can we use SMT for decompiled code?

SMT Model for Decompiled Code?

Aligned original/decompiled source corpus

Original source corpus

Nontrivial

Difficulty: Decompilation Changes Structure

• Different line count.
• Different numbers of variables.
• Different types of loops.

Original Source (9 lines):

    #include <stdio.h>
    int main() {
      int cur = 0;
      while (cur <= 9) {
        printf("%d\n", cur);
        ++cur;
      }
      return 0;
    }

Decompiled Code (8 lines):

    #include <stdio.h>
    int main() {
      int v1 = 0;
      int v2;
      for (v2 = 0; v2 < 10; ++v2)
        printf("%d\n", v2);
      return v1;
    }
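The three differences can be checked mechanically on this example. The following is a rough sketch, assuming that regular expressions are good enough for such a tiny snippet (a real tool would use a C parser). `profile` is a hypothetical helper, not part of the presented system.

```python
import re

ORIGINAL = r"""#include <stdio.h>
int main() {
  int cur = 0;
  while (cur <= 9) {
    printf("%d\n", cur);
    ++cur;
  }
  return 0;
}"""

DECOMPILED = r"""#include <stdio.h>
int main() {
  int v1 = 0;
  int v2;
  for (v2 = 0; v2 < 10; ++v2)
    printf("%d\n", v2);
  return v1;
}"""


def profile(source):
    """Summarize line count, declared int variables, and loop keywords."""
    lines = [line for line in source.splitlines() if line.strip()]
    # Matches "int name =" or "int name;" but not "int main()".
    variables = re.findall(r"\bint\s+([A-Za-z_]\w*)\s*[=;]", source)
    loops = re.findall(r"\b(for|while|do)\b", source)
    return {"lines": len(lines), "variables": variables, "loops": loops}


original_profile = profile(ORIGINAL)
decompiled_profile = profile(DECOMPILED)
```

Because the two profiles disagree on all three fields, a line-by-line alignment of original and decompiled code has no obvious correspondence to exploit.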

Decompiled Code Corpus Generation

Original Code:

    #include <stdio.h>
    int main() {
      int cur = 0;
      while (cur <= 9) {
        printf("%d\n", cur);
        ++cur;
      }
      return 0;
    }

Decompiled Code:

    #include <stdio.h>
    int main() {
      int v1 = 0;
      int v2;
      for (v2 = 0; v2 < 10; ++v2)
        printf("%d\n", v2);
      return v1;
    }

Decompiled Code with v2 blanked out:

    #include <stdio.h>
    int main() {
      int v1 = 0;
      int __;
      for (__ = 0; __ < 10; ++__)
        printf("%d\n", __);
      return v1;
    }

Renamed Decompiled Code:

    #include <stdio.h>
    int main() {
      int v1 = 0;
      int cur;
      for (cur = 0; cur < 10; ++cur)
        printf("%d\n", cur);
      return v1;
    }
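The renaming step can be sketched as a whole-word substitution over the decompiled text, which keeps the decompiled and renamed versions aligned line by line. This is a simplified illustration only: `rename` and `aligned_corpus` are hypothetical names, the mapping `{"v2": "cur"}` is supplied by hand, and a regex-based rename ignores scoping, which a real implementation would have to respect.

```python
import re

DECOMPILED = r"""#include <stdio.h>
int main() {
  int v1 = 0;
  int v2;
  for (v2 = 0; v2 < 10; ++v2)
    printf("%d\n", v2);
  return v1;
}"""


def rename(source, renamings):
    """Replace whole-word occurrences of each key with its mapped name."""
    if not renamings:
        return source
    names = "|".join(map(re.escape, renamings))
    pattern = re.compile(r"\b(?:" + names + r")\b")
    return pattern.sub(lambda match: renamings[match.group(0)], source)


def aligned_corpus(decompiled, renamings):
    """Pair each decompiled line with the same line after renaming."""
    renamed = rename(decompiled, renamings)
    return list(zip(decompiled.splitlines(), renamed.splitlines()))


pairs = aligned_corpus(DECOMPILED, {"v2": "cur"})
```

Since only identifiers change, every pair has the same structure on both sides, unlike pairs built from the original source.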

Better SMT Model for Decompiled Code

Aligned renamed/decompiled source corpus

Renamed source corpus

Choosing Renamings

Original Code:

    #include <stdio.h>
    int main() {
      int cur = 0;
      while (cur <= 9) {
        printf("%d\n", cur);
        ++cur;
      }
      return 0;
    }

Decompiled Code:

    #include <stdio.h>
    int main() {
      int v1 = 0;
      int v2;
      for (v2 = 0; v2 < 10; ++v2)
        printf("%d\n", v2);
      return v1;
    }

Properties shared by cur and v2:

• Not used as the return value.
• Used inside of a loop.
• Used in a function call.
• Same operations.

Decompiled Code with v2 blanked out:

    #include <stdio.h>
    int main() {
      int v1 = 0;
      int __;
      for (__ = 0; __ < 10; ++__)
        printf("%d\n", __);
      return v1;
    }

Decompiled Code with v2 renamed to cur:

    #include <stdio.h>
    int main() {
      int v1 = 0;
      int cur;
      for (cur = 0; cur < 10; ++cur)
        printf("%d\n", cur);
      return v1;
    }
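One way to picture the matching is to describe each variable by how it is used and to rename a decompiled variable only when its usage closely matches an original variable. The sketch below is an assumption-laden illustration, not the presented system: the `Usage` records are written by hand rather than extracted from code, and the scoring function and the 3.5 threshold are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hand-written usage summary of one variable (hypothetical format)."""
    returned: bool         # used as the return value?
    in_loop: bool          # used inside of a loop?
    in_call: bool          # used in a function call?
    operations: frozenset  # kinds of operations applied to it


def similarity(a, b):
    """Count matching flags (0 to 3) plus Jaccard overlap of operations (0 to 1)."""
    flags = sum([a.returned == b.returned,
                 a.in_loop == b.in_loop,
                 a.in_call == b.in_call])
    union = a.operations | b.operations
    overlap = len(a.operations & b.operations) / len(union) if union else 1.0
    return flags + overlap


def choose_renamings(original, decompiled, threshold=3.5):
    """Map each decompiled name to its best original match, if good enough."""
    renamings = {}
    for dec_name, dec_usage in decompiled.items():
        best = max(original,
                   key=lambda name: similarity(original[name], dec_usage),
                   default=None)
        if best is not None and similarity(original[best], dec_usage) >= threshold:
            renamings[dec_name] = best
    return renamings


loop_ops = frozenset({"assign", "compare", "increment"})
original = {"cur": Usage(False, True, True, loop_ops)}
decompiled = {
    "v1": Usage(True, False, False, frozenset({"assign"})),
    "v2": Usage(False, True, True, loop_ops),
}
renamings = choose_renamings(original, decompiled)
```

Here `v2` matches `cur` on every property, while `v1` differs on all three flags and is left with its decompiler name. This toy version does not prevent two decompiled variables from mapping to the same original name.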

System Architecture

Results and Evaluation

Original:

    my_rc base2_string(base2_handle base2_h, char* buffer, size_t buffer_size)

Decompiled:

    my_rc base2_string(base2_handle a1, char* a2, size_t a3)

Renamed Decompiled:

    my_rc base2_string(base2_handle base2_h, char* buf, size_t len)

Each renamed variable is compared with the original name:

• Exact: base2_h
• Approx: buf (original: buffer)
• Not a match: len (original: buffer_size)

Results:

• 12.7% Exact
• 16.2% Exact + Approx
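The exact and approximate match rates can be sketched as a simple classification over (predicted, original) name pairs. The slides do not define "Approx", so the rule used here (one name is a prefix of the other, at least three characters long) is purely an assumption chosen to reproduce the three labels in the example. `classify` and `summarize` are hypothetical names, and the numbers this produces are for the three-parameter example only, not the reported results.

```python
def classify(predicted, original):
    """Label a predicted name as 'exact', 'approx', or 'none'."""
    if predicted == original:
        return "exact"
    shorter, longer = sorted((predicted.lower(), original.lower()), key=len)
    # Assumed rule: an abbreviation counts as approximate if it is a prefix.
    if len(shorter) >= 3 and longer.startswith(shorter):
        return "approx"
    return "none"


def summarize(pairs):
    """Return the exact rate and the exact-or-approx rate over all pairs."""
    if not pairs:
        raise ValueError("no name pairs to evaluate")
    labels = [classify(predicted, original) for predicted, original in pairs]
    exact = labels.count("exact")
    approx = labels.count("approx")
    return {
        "exact": exact / len(labels),
        "exact_or_approx": (exact + approx) / len(labels),
    }


example = [("base2_h", "base2_h"), ("buf", "buffer"), ("len", "buffer_size")]
rates = summarize(example)
```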

Preliminary Investigation: Human Study

• Users were shown short snippets (<50 lines) of decompiled code and asked to perform various maintenance tasks. Answers were graded and timed:

    1 int x = 1;
    2 int y = 0;
    3 while (x <= 5) {
    4   y += 2;
    5   x += 1;
    6 }
    7 printf("%d", y);

  - What is the value of the variable y on line 7?

• For correct answers, the time to answer using our renamings was statistically significantly lower than when using the decompiler names.

Conclusion

• Questions?
• Suggestions?