/* iComment: Bugs or Bad Comments? */ Lin Tan, Ding Yuan, Gopal Krishna, Yuanyuan Zhou. Published in SOSP 2007. Presented by Kevin Boos.

Feb 26, 2016


Page 1: /*  iComment : Bugs or Bad Comments?   */

/* iComment: Bugs or Bad Comments? */
Lin Tan, Ding Yuan, Gopal Krishna, Yuanyuan Zhou
Published in SOSP 2007
Presented by Kevin Boos

Page 2: /*  iComment : Bugs or Bad Comments?   */

In a Nutshell
• iComment: static analysis + NLP
• Detects code-comment mismatches
• Uses both source code and comments

Page 3: /*  iComment : Bugs or Bad Comments?   */

Roadmap
• iComment Paper
  o Motivation
  o Challenges
  o Contributions
  o Approach & Methodology
  o Results
  o Related Work
• Complexity
• Authors’ other works

Page 4: /*  iComment : Bugs or Bad Comments?   */

Motivation
• Software bugs affect reliability.
  o Mismatches between code and developer assumptions

// Caller must acquire lock.
static int reset_hardware(...) {
    // access shared data.
}

static int in2000_bus_reset(...) {
    // Mismatch: no lock is acquired before this call.
    reset_hardware(...);
}

Page 5: /*  iComment : Bugs or Bad Comments?   */

Prevalence of Comments
• Comments = developer assumptions
  o Must hold locks, interrupts must be disabled, etc.
• Other tools do not utilize comments!
  o Ignore valuable information (dev. intentions)

Software   Lines of Code   Lines of Comments
Linux      5 million       1 million
Mozilla    3.3 million     0.5 million

Page 6: /*  iComment : Bugs or Bad Comments?   */

Code vs. Comments

Code                   Comment                Implication
Precise                Imprecise              Comments are harder to analyze.
Can be tested          Can NOT be tested      Software evolution makes comments less reliable.
Harder to understand   Easier to understand   Developers read comments before code. Wrong comments mislead programmers.

• Developer assumptions can’t always be inferred from source code
• Comments and code are redundant
  o or should be…

Page 7: /*  iComment : Bugs or Bad Comments?   */

Inconsistencies
• What’s wrong: comments or code?
  o Developer mistake
  o Out of date
  o Copy and paste error (clone detection)
• Bad code might be bugs
• Bad comments cause future bugs

Page 8: /*  iComment : Bugs or Bad Comments?   */

Challenges
• Parsing and understanding comments
  o Natural language is ambiguous and varying

/* We need to acquire the IRQ lock before calling … */
/* Lock must be acquired on entry to this function. */
/* Caller must hold instance lock! */

• NLP only captures sentence structure
  o No concept of understanding
  o Decent accuracy
  o Comments may be grammar disasters…

Page 9: /*  iComment : Bugs or Bad Comments?   */

Contributions
• First step towards automatically analyzing comments
  o Combines NLP, machine learning, static analysis
• Identifies inconsistent code & comments
• Real-world applicability
  o Discovered 60 new bugs or bad comments
• Only two topics: locks & calls

Page 10: /*  iComment : Bugs or Bad Comments?   */

Approach
• Two types of comments
  o Explanatory: /* set the access flags */
  o Assumptions/Rules: /* don’t call with lock held */
• Check comment rules topic-by-topic
  o General framework
  o Users choose the hot topics

Page 11: /*  iComment : Bugs or Bad Comments?   */

Rule Templates
• <Lock L> must be held before entering <Function F>.
• <Lock L> must NOT be held before entering <Function F>.
• <Lock L> must be held in <Function F>.
• <Lock L> must NOT be held in <Function F>.
• <Function A> must be called from <Function B>
• <Function A> must NOT be called from <Function B>

• Other templates exist (see paper)
• User can add more templates
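One way to picture these templates is as a small tagged record: a template kind plus the two names that fill its slots. The sketch below is a hypothetical representation, not iComment's actual data structures; `enum rule_kind`, `struct rule`, `RULE_PHRASE`, and `rule_to_string` are all names invented here for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical encoding of the six templates listed on the slide. */
enum rule_kind {
    LOCK_HELD_BEFORE_ENTRY,      /* <Lock L> must be held before entering <Function F> */
    LOCK_NOT_HELD_BEFORE_ENTRY,  /* <Lock L> must NOT be held before entering <Function F> */
    LOCK_HELD_IN,                /* <Lock L> must be held in <Function F> */
    LOCK_NOT_HELD_IN,            /* <Lock L> must NOT be held in <Function F> */
    CALLED_FROM,                 /* <Function A> must be called from <Function B> */
    NOT_CALLED_FROM,             /* <Function A> must NOT be called from <Function B> */
    RULE_KIND_COUNT
};

/* A rule is a template plus the names that fill its two slots.
 * Lock templates: subject = lock, target = function.
 * Call templates: subject = callee (A), target = caller (B). */
struct rule {
    enum rule_kind kind;
    const char *subject;
    const char *target;
};

/* The fixed wording of each template, indexed by rule_kind. */
static const char *const RULE_PHRASE[RULE_KIND_COUNT] = {
    "must be held before entering",
    "must NOT be held before entering",
    "must be held in",
    "must NOT be held in",
    "must be called from",
    "must NOT be called from"
};

/* Render a rule as the sentence its template stands for.
 * Returns what snprintf returns: the length the full sentence needs. */
static int rule_to_string(const struct rule *r, char *buf, size_t len)
{
    assert(r != NULL && buf != NULL);
    assert(r->kind < RULE_KIND_COUNT);
    return snprintf(buf, len, "%s %s %s.",
                    r->subject, RULE_PHRASE[r->kind], r->target);
}
```

For instance, a rule of kind `LOCK_HELD_BEFORE_ENTRY` with subject `instance_lock` (a made-up lock name) and target `reset_hardware` renders as "instance_lock must be held before entering reset_hardware." Adding a user-defined template would then amount to adding one enum value and one phrase.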

Page 12: /*  iComment : Bugs or Bad Comments?   */

Handling Comments
• Extract comments
  o NLP, keyword filters, correlated word filters
• Classify comments (rule generation)
  o Manually label small subset
  o Create decision tree with machine learning
  o Decision tree matches comments to templates
  o Fill template parameters with actual variables
• Training is optional for users
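The extract-then-classify steps above can be sketched in miniature. This is a toy stand-in under loud assumptions: the real system uses NLP tooling and a decision tree learned from labeled comments, whereas the tree here is written by hand over plain substring features, and it does not fill in the lock or function names. `contains`, `classify_comment`, and `enum comment_class` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum comment_class {
    NOT_A_RULE,             /* explanatory comment, or off-topic */
    LOCK_MUST_BE_HELD,      /* e.g. "Caller must hold instance lock!" */
    LOCK_MUST_NOT_BE_HELD   /* e.g. "don't call with lock held" */
};

/* Case-insensitive substring test, written out because strcasestr
 * is not part of standard C. */
static int contains(const char *text, const char *word)
{
    size_t n = strlen(word);
    for (; *text != '\0'; text++) {
        size_t i = 0;
        while (i < n && text[i] != '\0' &&
               tolower((unsigned char)text[i]) ==
               tolower((unsigned char)word[i]))
            i++;
        if (i == n)
            return 1;
    }
    return 0;
}

static enum comment_class classify_comment(const char *comment)
{
    assert(comment != NULL);

    /* Step 1: keyword filter. Keep only comments on the "lock" topic. */
    if (!contains(comment, "lock"))
        return NOT_A_RULE;

    /* Step 2: hand-written stand-in for the learned decision tree.
     * Each test is a yes/no feature of the comment text. */
    int imperative = contains(comment, "must") ||
                     contains(comment, "need to") ||
                     contains(comment, "don't") ||
                     contains(comment, "do not");
    if (!imperative)
        return NOT_A_RULE;

    int negated = contains(comment, "must not") ||
                  contains(comment, "don't") ||
                  contains(comment, "do not");
    return negated ? LOCK_MUST_NOT_BE_HELD : LOCK_MUST_BE_HELD;
}
```

On the comments quoted in these slides, "Caller must hold instance lock!" and "Lock must be acquired on entry to this function." both come out as `LOCK_MUST_BE_HELD`, "don't call with lock held" (with an ASCII apostrophe) as `LOCK_MUST_NOT_BE_HELD`, and "set the access flags" as `NOT_A_RULE`. Substring features are crude (for example, "unlock" also matches "lock"), which is part of why a trained classifier is attractive.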

Page 13: /*  iComment : Bugs or Bad Comments?   */

Rule Checker
• Static analysis
  o Flow sensitive and context sensitive
  o Scope of comments
• Display the inconsistencies
  o Sorted by ranking (support probability)
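To make the checking step concrete, here is a deliberately simplified sketch of checking one rule, "<Lock L> must be held before entering <Function F>". It assumes a caller's body has already been reduced to a straight-line list of lock operations and calls; the real checker works on actual code and is flow and context sensitive, which this toy is not. `struct event`, `count_violations`, and the lock name `instance_lock` used below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum event_kind { ACQUIRE, RELEASE, CALL };

/* One step in a caller's body: a lock operation or a function call. */
struct event {
    enum event_kind kind;
    const char *name;   /* lock name for ACQUIRE/RELEASE, callee for CALL */
};

/* Count calls to `func` made while `lock` is not held.
 * Rule checked: "<lock> must be held before entering <func>". */
static size_t count_violations(const struct event *body, size_t n,
                               const char *lock, const char *func)
{
    int depth = 0;            /* > 0 means the lock is currently held */
    size_t violations = 0;

    assert(body != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct event *e = &body[i];
        if (e->kind == ACQUIRE && strcmp(e->name, lock) == 0) {
            depth++;
        } else if (e->kind == RELEASE && strcmp(e->name, lock) == 0) {
            if (depth > 0)
                depth--;
        } else if (e->kind == CALL && strcmp(e->name, func) == 0) {
            if (depth == 0)
                violations++;   /* entered func without the lock */
        }
    }
    return violations;
}
```

A body consisting only of a call to `reset_hardware`, as in the `in2000_bus_reset` example from the Motivation slide, yields one violation; wrapping the call in an acquire and release of `instance_lock` yields none. Ranking the reported inconsistencies is not modeled here.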

Page 14: /*  iComment : Bugs or Bad Comments?   */

Evaluation

Software   SLOC     #Cmts.   Language   Description
Linux      5.0 M    1.0 M    C          OS
Mozilla    3.3 M    0.51 M   C, C++     Browser Suite
Wine       1.5 M    0.22 M   C          Runs Windows Apps in Linux
Apache     0.27 M   0.06 M   C          Web Server

• Four large software projects
• Two topics: locks and function calls
• Average training data: 18%

Page 15: /*  iComment : Bugs or Bad Comments?   */

Results

Software   Mismatch   Bugs      Bad Cmts.   FP   Rules
Linux      51 (14)    30 (11)   21 (3)      32   1209
Mozilla    6 (5)      2 (1)     4 (4)       3    410
Wine       2          1         1           3    149
Apache     1          0         1           0    64
Total      60 (19)    33 (12)   27 (7)      38   1832

• Automatically detected 60 new bugs and bad comments
  o 19 new bugs and bad comments already confirmed by developers

• False positives exist (38 reports, about 38.8% of the 98 reported)
  o Incorrectly generated rules
  o Inaccurate rule checking
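The false-positive figure can be reproduced from the Mismatch and FP columns of the results table, assuming it means the share of false positives among all reports (true mismatches plus false positives): 38 / (60 + 38), roughly 38.8%. The sketch below does that arithmetic; `struct result_row`, `RESULTS`, and `false_positive_rate` are names made up for this example, while the numbers are copied from the table.

```c
#include <assert.h>
#include <stddef.h>

struct result_row {
    const char *software;
    int mismatches;        /* true mismatches: bugs + bad comments */
    int false_positives;   /* the FP column */
};

/* Per-project counts from the results table. */
static const struct result_row RESULTS[] = {
    { "Linux",   51, 32 },
    { "Mozilla",  6,  3 },
    { "Wine",     2,  3 },
    { "Apache",   1,  0 },
};

/* Share of all reports (true mismatches + false positives) that were
 * false positives. Returns 0.0 when there are no reports at all. */
static double false_positive_rate(const struct result_row *rows, size_t n)
{
    int true_reports = 0;
    int false_reports = 0;

    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        true_reports += rows[i].mismatches;
        false_reports += rows[i].false_positives;
    }
    int total = true_reports + false_reports;
    return total == 0 ? 0.0 : (double)false_reports / total;
}
```

Summing the four rows gives 60 true mismatches and 38 false positives, matching the Total row.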

Page 16: /*  iComment : Bugs or Bad Comments?   */

Training Accuracy

Software-specific training:
Linux   Mozilla   Wine    Apache
90.8%   91.3%     96.4%   100%

• Accuracy: % of correct mismatches

Cross-software training:
Training SW     Mozilla   Wine    Apache
Linux           81.5%     78.6%   83.3%
Linux+Mozilla   n/a       89.3%   88.9%

Page 17: /*  iComment : Bugs or Bad Comments?   */

Related Work
• Extracting rules from source code
  o iComment employs static analysis but not dynamic traces
• Annotations
  o Poor adoption rates
  o Requires manual effort per comment
• Documentation generation
  o No usage of NLP
  o iComment also analyzes unstructured comments

Page 18: /*  iComment : Bugs or Bad Comments?   */

Complexity
• Detecting inconsistencies
  o NLP
    • Abstracted away by tools
  o Machine learning
    • Simple manual training rules
• Code maintenance
  o Developers may forget to be thorough
• Automatic bug detection
  o Locking errors are extremely complex

Page 19: /*  iComment : Bugs or Bad Comments?   */

Author Bio
• Primary author: Lin Tan
• Improving software reliability
  o Comments
  o Source code
  o Execution traces
  o Manual input

• HotComments – prior ideas paper

Page 20: /*  iComment : Bugs or Bad Comments?   */

Author Bio
• Secondary author: Ding Yuan
• Reliability of large software systems
• Better logging
  o Enhanced output

Page 21: /*  iComment : Bugs or Bad Comments?   */

Author Bio
• Professor: Yuanyuan Zhou
• Better debuggers, software reliability
• Founded PatternInsight

Page 22: /*  iComment : Bugs or Bad Comments?   */

PatternInsight Startup

• http://patterninsight.com/

Page 23: /*  iComment : Bugs or Bad Comments?   */

Conclusion
• Comment-code inconsistencies are bad
  o Poorer software quality and reliability
• First work to automatically analyze comments
  o Uses NLP and static code analysis
• Detected real bugs in Linux/Mozilla
• Manages complexity of code consistency and maintenance