Using Natural Language Program Analysis to Locate and Understand Action-Oriented Concerns. David Shepherd, Zachary P. Fry, Emily Hill, Lori Pollock, and K. Vijay-Shanker. Presented By: Paul Heintzelman

Dec 19, 2015

Transcript
Page 1

Using Natural Language Program Analysis to Locate and Understand Action-Oriented Concerns

David Shepherd, Zachary P. Fry, Emily Hill, Lori Pollock, and K. Vijay-Shanker

Presented By: Paul Heintzelman

Page 2

Global Concepts

• Concept assignment problem

• Hybrid of structural and natural language information

• Concern Comprehension

• Action-oriented relations between identifiers
  – Represented by the action-oriented identifier graph model (AOIG)
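
As a rough illustration of the idea, an AOIG can be pictured as an index from verb-direct object pairs to the methods in which they appear. The sketch below is a simplification under stated assumptions: it derives pairs only from camelCase method names (first word as verb, last word as direct object), which stands in for the natural language analysis the approach relies on. The `split_identifier` and `build_aoig` helpers and the sample method names are hypothetical, not taken from the paper.

```python
import re
from collections import defaultdict


def split_identifier(name):
    """Split a camelCase identifier into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", name)]


def build_aoig(methods):
    """Map (verb, direct_object) pairs to the methods that mention them.

    Assumption: the first word of a method name is the verb and the last
    word is the direct object. Real extraction is more involved.
    """
    aoig = defaultdict(set)
    for qualified_name in methods:
        words = split_identifier(qualified_name.split(".")[-1])
        if len(words) >= 2:
            aoig[(words[0], words[-1])].add(qualified_name)
    return aoig


# Hypothetical method names, used only for illustration.
methods = ["Editor.completeWord", "Buffer.removeLine", "AutoComplete.completeWord"]
aoig = build_aoig(methods)
print(sorted(aoig[("complete", "word")]))
```

The point of the index is that one action ("complete word") scattered across several classes becomes a single lookup.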

Page 3

Why Action-Oriented Concerns

• In OOP
  – Code is organized by objects
    • Objects are nouns

• Objects and actions conflict
  – Code organized by objects causes actions to be scattered

• Therefore, in OOP, action-oriented concerns tend to be scattered and are more difficult to locate

Page 4

Paper Contributions

• AOIG
  – Interactive query expansion algorithm
  – A result graph construction algorithm
  – An Eclipse plug-in

• Evaluation
  – Comparison of search effectiveness of tools
  – Per-task analysis
  – Comparison of user effort

Page 5

AOIG

Page 6

State of the Art

• Search-based approaches
  – Lexical searches
    • Lead to over-generalized searches
  – Information retrieval
    • Does not separate verbs and objects
    • Uses word frequency

• Program navigation
  – Uses structural information, e.g. call graphs, inheritance graphs...
  – Accurate but difficult to seed

• Dynamic approaches
  – Require a test case to enact the concept

Page 7

Challenges

• Map high-level concepts to queries
  – Aid user in mapping concepts

• Inability to search with high precision and recall
  – Search NLP representation of concern

• Understanding large result sets
  – Return results in an explorable graph

Page 8

Overview of Approach

• User formulates a query
  – Query must include verb-direct object pairings

• User expands query
  – Recommendations based on query words and source code

• Searches the AOIG
  – Interact with result graph

Page 9

Independent Variables

• Search Tools
  – Find-Concept
  – ELex: built-in Eclipse search
  – GES: Google Eclipse Search (modified)

• Search Tasks
  – Application-concept pairing

• Human Subjects
  – 13 professional programmers
  – 5 grad students

Page 10

Application Concept Pairing

• Applications
  – 4 large open source Java projects
    • 9 concepts taken from bug reports
  – 1 training application
    • 2 concepts

Page 11

Forming the Initial Query

• User generates abstract initial query
  – e.g. “automatically finish the word”

• User decomposes abstract query into verb-direct object pairs
  – e.g. “finish” and “word”

• Find-Concept maintains both a verb query and a direct object query

• Initial query expansion
  – User is presented with alternative forms of words in both queries

Page 12

Query Expansion

• Iterative steps
  – Generate recommended list
    • Similar semantics is weighted more heavily than similar use
    • 10 ranked recommendations
  – User examines recommendations
    • User selects words to add to queries
    • User can view a list of methods fitting the current queries

• Stop when user is satisfied
  – Augment user query with get, set, execute, construct

• Use AOIG to map verb-direct object pairs to source code
  – Generate result graph
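
The last two steps of this slide (augmenting the query, then mapping verb-direct object pairs to code) might look roughly like the sketch below. It assumes the AOIG is available as a plain dictionary keyed by (verb, direct object) pairs and that augmentation means adding the four generic verbs to the verb query. Both are readings of the slide, not details confirmed by it, and `search_aoig` and the sample entries are hypothetical.

```python
from itertools import product

# The four generic verbs named on the slide.
GENERIC_VERBS = {"get", "set", "execute", "construct"}


def search_aoig(aoig, verb_query, object_query, augment=True):
    """Return methods for every (verb, direct object) pair found in the AOIG.

    Assumption: augmentation adds the generic verbs to the verb query.
    """
    verbs = set(verb_query) | (GENERIC_VERBS if augment else set())
    hits = {}
    for verb, obj in product(sorted(verbs), sorted(object_query)):
        methods = aoig.get((verb, obj))
        if methods:
            hits[(verb, obj)] = sorted(methods)
    return hits


# Hypothetical AOIG contents, used only for illustration.
aoig = {
    ("complete", "word"): {"AutoComplete.completeWord"},
    ("get", "word"): {"Buffer.getWord"},
    ("remove", "line"): {"Buffer.removeLine"},
}
results = search_aoig(aoig, {"finish", "complete"}, {"word"})
print(results)
```

In this toy run the augmented verb "get" pulls in `Buffer.getWord`, which the user's own verbs would have missed; whether such extra hits help or hurt precision is exactly the kind of question the evaluation addresses.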

Page 13

Word Recommendation

• Similar semantics
  – Stemming
    • Recommends different forms of words in either list
    • e.g. If “finish” is in the verb query, “finished” will be recommended
  – Synonyms
    • Recommend a word if a synonym exists in either list
    • e.g. Recommend “complete” if “finish” is in the list

• Similar use
  – Recommend words that occur near words in either query
  – e.g. Recommend “word” if “complete” is in the verb query and “complete word” is in the AOIG
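
The three recommendation sources on this slide can be sketched as a small scoring function. Everything concrete here is an assumption made for illustration: the suffix-stripping `stem` is a stand-in for a real stemmer, `SYNONYMS` is a hand-written table rather than a real thesaurus, and the weights (2.0 for similar semantics, 1.0 for similar use) only reflect the statement that semantics is weighted more heavily, not the actual values.

```python
# Hypothetical synonym table; a real tool would consult a thesaurus.
SYNONYMS = {"finish": {"complete", "end"}, "complete": {"finish"}}


def stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def recommend(verb_query, object_query, aoig_pairs, vocabulary, limit=10):
    """Rank candidate words; weights are assumed, not taken from the paper."""
    query = set(verb_query) | set(object_query)
    scores = {}
    for word in vocabulary:
        if word in query:
            continue
        score = 0.0
        # Similar semantics: shared stem or listed synonym (weighted higher).
        if any(stem(word) == stem(q) for q in query):
            score += 2.0
        if any(word in SYNONYMS.get(q, ()) for q in query):
            score += 2.0
        # Similar use: appears with a query word in a verb-direct object pair.
        if any((word == v and o in query) or (word == o and v in query)
               for v, o in aoig_pairs):
            score += 1.0
        if score > 0:
            scores[word] = score
    return sorted(scores, key=lambda w: (-scores[w], w))[:limit]


pairs = [("complete", "word"), ("remove", "line")]
vocabulary = ["finished", "complete", "word", "line"]
first = recommend({"finish"}, set(), pairs, vocabulary)
second = recommend({"finish", "complete"}, set(), pairs, vocabulary)
print(first, second)
```

The two calls mimic two rounds of expansion: "word" is only recommended in the second round, after the user has accepted "complete" into the verb query, which matches the similar-use example on the slide.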

Page 14

Evolution of a Query

Page 15

Result Graph

Page 16

Find-Concept Process

Page 17

Research Questions

• Which search tool is most effective at locating concerns by forming and executing a query?

• Which search tool requires the least amount of human effort to form an effective query?

Page 18

Evaluation

• Effectiveness
  – Use the harmonic mean of precision and recall (F-measure)
    • (2 * precision * recall) / (precision + recall)
  – Result set is compared to evaluation set
    • Evaluation set is 90% generated by a member unfamiliar with the work of this paper

• Effort
  – Measured amount of time required to form each query
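
The effectiveness measure above can be computed directly from a result set and an evaluation set. The sketch below is a minimal illustration; the function name `f_measure` and the method labels in the example are made up, and returning 0.0 when nothing overlaps is a convention chosen here to avoid dividing by zero.

```python
def f_measure(result_set, evaluation_set):
    """Harmonic mean of precision and recall for one query's results."""
    result_set, evaluation_set = set(result_set), set(evaluation_set)
    true_positives = len(result_set & evaluation_set)
    if true_positives == 0:
        return 0.0  # convention chosen here; avoids division by zero
    precision = true_positives / len(result_set)
    recall = true_positives / len(evaluation_set)
    return (2 * precision * recall) / (precision + recall)


# Hypothetical example: 4 methods returned, 3 relevant, 2 in common.
score = f_measure({"a", "b", "c", "d"}, {"a", "b", "e"})
print(round(score, 3))
```

Here precision is 2/4 and recall is 2/3, so the F-measure is 4/7. Because it is a harmonic mean, a tool cannot score well by returning everything (high recall, low precision) or by returning almost nothing.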

Page 19

Experimental Setup

• Training
  – Subjects are guided through the use of each tool on the two training tasks

• Task setup
  – Users are presented concepts in a visual form
  – Users confirm that they understood each task

Page 20

Experimental Procedure

• 9 tasks

• 18 programmers

• 6 groups

• 6 of every task-tool combination
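
The numbers on this slide are mutually consistent: 18 programmers each doing 9 tasks gives 162 task runs, and spreading those over 9 tasks × 3 tools gives 6 runs per combination. The sketch below checks this with one possible assignment. The slide does not say how tools were assigned, so the rotation in `tool_for` and the equal group size of 3 are assumptions made for illustration.

```python
from collections import Counter

TOOLS = ["Find-Concept", "ELex", "GES"]
NUM_TASKS, NUM_GROUPS = 9, 6
GROUP_SIZE = 18 // NUM_GROUPS  # assumes equally sized groups of 3


def tool_for(group, task):
    """Hypothetical rotation: each group cycles through the tools by task."""
    return TOOLS[(group + task) % len(TOOLS)]


# Count how many programmers perform each (task, tool) combination.
counts = Counter()
for group in range(NUM_GROUPS):
    for task in range(NUM_TASKS):
        counts[(task, tool_for(group, task))] += GROUP_SIZE

print(len(counts), set(counts.values()))
```

Under this rotation every programmer uses each tool on three tasks, and each of the 27 task-tool combinations is covered by exactly 6 programmers.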

Page 21

Results

• Find-Concept vs. ELex
  – Consistently outperformed ELex

• Find-Concept vs. GES
  – Outperformed GES on 4 tasks
  – Outperformed by GES on 2 tasks
    • AOIG to blame?
  – Performed equally to GES on 3 tasks

Page 22

Effectiveness

Page 23

Effort

• Human Effort was very similar with all tools

Page 24

Threats to Validity

• The selected tasks favored one tool
  – Concerns selected from bug reports

• Evaluation sets created for evaluation
  – 90% generated by member unfamiliar with work

• Results may not generalize to all Java applications
  – Tested on reasonably-sized applications

• Results may not generalize to all types of concepts

Page 25

Conclusion

• Interactive query expansion algorithm

• Graph construction algorithm

• Find-Concept performs well against state-of-the-art tools

• All evaluated tools required similar human effort

Page 26

Future Work

• Create a more effective AOIGBuilder

• Evaluate the effect of application’s quality and size on results

• Evaluate the effect of incorporating naming conventions

• Perform a study on how many tasks focus on actions

• Automate query expansion

Page 27

Additional threats to validity

• Effort and Effectiveness are not really independent

• Relies heavily on unjustified heuristic
  – Augmenting query

• Search tools are often used in conjunction with structural tools