Modeling the Evolution of Topics in Source Code Histories Stephen W. Thomas Bram Adams Ahmed E. Hassan Dorothea Blostein

Sthomas slides

Apr 12, 2017


SAIL_QU
Page 1: Sthomas slides

Modeling the Evolution of Topics in Source Code Histories

Stephen W. Thomas

Bram Adams

Ahmed E. Hassan

Dorothea Blostein

Page 2: Sthomas slides

[2]

Page 3: Sthomas slides

[3]

Microsoft manager: "What have the Skype developers been interested in?"

[Figure: topic popularity over time, with curves for "Linux Development" and "Audio Codecs"]

Page 4: Sthomas slides

[4]

What are developers working on?

Option 1: Speak with every developer

Option 2: Use automated tool

[Figure: topic popularity over time, with curves for "Linux Development" and "Audio Codecs"]

Page 5: Sthomas slides

[5]

Tool: Topic Evolution Models

Applied to Source Code Histories

[Figure: topic popularity over time across versions V1.0, V1.1, V1.2, V2.0, V4.0, with curves for Topic “Linux”, Topic “codec”, Topic “GUI”, …]

Page 6: Sthomas slides

[6]

Success in Other Domains

Email Archives

Conference Proceedings Newspaper Articles

Page 7: Sthomas slides

[7]

Topic Evolution on Source Code

Page 8: Sthomas slides

Background: The Hall Model

[8]

[Figure: the Hall model applied to versions V1.0, V1.1, V1.2, V1.3, with boxes labeled Topic Model and Mapping Topics Over Time]

Page 9: Sthomas slides

[9]

[Figure: File ID by version grid (V1 to V5), with cells labeled XML + File I/O, XML + GUI, and GUI + File I/O]

Expect: Topic 1: XML; Topic 2: GUI; Topic 3: File I/O

Get: Topic 1: XML + File I/O; Topic 2: XML + GUI; Topic 3: GUI + File I/O

Problem: Topics are muddled, not distinct

Page 10: Sthomas slides

[10]

[Figure: File ID by version grid (V1 to V5), with cells labeled XML + File I/O, XML + GUI, and GUI + File I/O]

Expect: [Figure: popularity curves for File I/O, XML, and GUI]

Get: [Figure: popularity curves for Topic 1, Topic 2, and Topic 3]

Problem: Evolutions not sensitive or accurate

Page 11: Sthomas slides

[11]

Problems due to duplication

Topics are muddled, not distinct

Evolutions are not accurate

Found in Source Code Histories

Page 12: Sthomas slides

63% files don’t change

84% files don’t change

99.8% words don’t change

99.8% words don’t change

[12]

JHotDraw

Real-World Duplication
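
One way to measure this kind of duplication is to compare two consecutive versions file by file. The sketch below is only an illustration: the slides do not say how the percentages were computed, and the function name `unchanged_fraction`, the dictionary representation of a version, and the toy file contents are all assumptions.

```python
def unchanged_fraction(prev, curr):
    """Fraction of files in `curr` whose text is identical in `prev`.

    Each version is a dict mapping a file path to its full text
    (a hypothetical representation chosen for this sketch).
    """
    if not curr:
        return 0.0
    same = sum(1 for path, text in curr.items() if prev.get(path) == text)
    return same / len(curr)


# Toy versions, invented for illustration only.
v1 = {"A.java": "class A {}", "B.java": "class B {}", "C.java": "class C {}"}
v2 = {
    "A.java": "class A {}",
    "B.java": "class B { int x; }",  # changed
    "C.java": "class C {}",
    "D.java": "class D {}",          # new file
}

fraction = unchanged_fraction(v1, v2)
print(f"{fraction:.0%} of files did not change")
```

A word-level figure could be computed the same way after tokenizing each file, but the choice of denominator (previous or current version) is a design decision the slides leave open.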

Page 13: Sthomas slides

The Diff Model

[13]

Page 14: Sthomas slides

The Diff Model

[14]

[Figure: the Diff model applied to versions V1.0, V1.1, V1.2, V1.3, with boxes labeled Diff Step, Topic Model, Reconstruction, and Mapping Topics Over Time]

Page 15: Sthomas slides

Diff Step

[15]

Version 5.3.7:

    ...
    if (vacstmt->options & VACOPT_VACUUM)
    {
        PreventTransactionChain(isTopLevel, stmttype);
        in_outer_xact = false;
    }
    ...

Version 5.3.8:

    ...
    // Don’t run VACUUM in user transition block!
    if (vacstmt->options & VACOPT_VACUUM)
    {
        PreventTransactionChain(isTopLevel, stmttype);
        in_inner_xact = false;
    }
    ...

Diff

Deleted lines:

    in_outer_xact = false;

Added lines:

    // Don’t run VACUUM in user transition block!
    in_inner_xact = false;
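
The diff step on this slide can be sketched with Python's standard `difflib` module. This is a minimal illustration rather than the authors' tooling: the function name `diff_step` is hypothetical, and line-level matching with `SequenceMatcher` is an assumption about how the deleted and added lines are obtained.

```python
import difflib


def diff_step(old_lines, new_lines):
    """Return (deleted, added) lines between two versions of one file."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    deleted, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(old_lines[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    return deleted, added


old_version = [
    "if (vacstmt->options & VACOPT_VACUUM)",
    "{",
    "    PreventTransactionChain(isTopLevel, stmttype);",
    "    in_outer_xact = false;",
    "}",
]
new_version = [
    "// Don't run VACUUM in user transition block!",
    "if (vacstmt->options & VACOPT_VACUUM)",
    "{",
    "    PreventTransactionChain(isTopLevel, stmttype);",
    "    in_inner_xact = false;",
    "}",
]

deleted, added = diff_step(old_version, new_version)
print("Deleted lines:", deleted)
print("Added lines:", added)
```

Only the deleted and added lines would then be passed to the topic model, so unchanged text is not counted again for every version.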

Page 16: Sthomas slides

[16]

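Reconstructing Topic Memberships

Second Version = First Version - Deleted Lines + Added Lines

First Version (1000 lines): GUI (90%), XML (10%)

Deleted Lines (200 lines), from Topic Model: GUI (100%), XML (0%)

Added Lines (150 lines), from Topic Model: GUI (20%), XML (80%)

Second Version (950 lines): ?

GUI: (1000 * 90%) - (200 * 100%) + (150 * 20%) = 730

XML: (1000 * 10%) - (200 * 0%) + (150 * 80%) = 220

Infer: Second Version is GUI (77%), XML (23%), since 730 of 950 lines is about 77% and 220 of 950 lines is about 23%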
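
The arithmetic on this slide can be reproduced in a few lines. The sketch below uses the slide's numbers; the function name `reconstruct` and the dictionary layout are assumptions made for illustration, and normalizing by the reconstructed line total is inferred from the 77% and 23% shown for the second version.

```python
def reconstruct(prev_lines, prev_mix, del_lines, del_mix, add_lines, add_mix):
    """Infer the topic mix of the next version from the previous mix
    and the mixes of the deleted and added lines."""
    counts = {
        topic: prev_lines * prev_mix[topic]
        - del_lines * del_mix.get(topic, 0.0)
        + add_lines * add_mix.get(topic, 0.0)
        for topic in prev_mix
    }
    total = sum(counts.values())
    return counts, {topic: count / total for topic, count in counts.items()}


counts, mix = reconstruct(
    prev_lines=1000, prev_mix={"GUI": 0.90, "XML": 0.10},
    del_lines=200, del_mix={"GUI": 1.00, "XML": 0.00},
    add_lines=150, add_mix={"GUI": 0.20, "XML": 0.80},
)

for topic in mix:
    print(f"{topic}: {counts[topic]:.0f} lines, {mix[topic]:.0%}")
```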

Page 17: Sthomas slides

Case Studies

[17]

JHotDraw

Drawing Application Framework (Java)

13 releases (5.2.0 – 7.5.1), 613 files, 84K SLOC

PostgreSQL (PSQL)

Database Management System (C)

46 releases (7.0.0 – 8.3.5), 844 files, 501K SLOC

Page 18: Sthomas slides

I bet the Diff model discovers topics that are more distinct!

[18]

Page 19: Sthomas slides

[19]

Measuring Distinctness With KL-Divergence

[Figure: word-probability bar charts over the words xml, fopen, button, element, menu, fclose, attribute]

XML topic vs. GUI topic: High KL divergence, High distinctness

XML + File IO topic vs. XML + GUI topic: Low KL divergence, Low distinctness
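
As a rough illustration of the idea, the sketch below computes the KL divergence between pairs of topics, each represented as a probability distribution over the seven words on the slide. The probabilities are invented for this example, and the plain (asymmetric) form of KL divergence is an assumption, since the slides do not say which variant is used.

```python
import math

WORDS = ["xml", "fopen", "button", "element", "menu", "fclose", "attribute"]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same words.

    Assumes q is nonzero wherever p is nonzero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Invented word probabilities, in the order given by WORDS.
xml_topic = [0.40, 0.02, 0.02, 0.30, 0.02, 0.02, 0.22]
gui_topic = [0.02, 0.02, 0.45, 0.02, 0.45, 0.02, 0.02]
xml_file_io_topic = [0.20, 0.20, 0.02, 0.18, 0.02, 0.20, 0.18]
xml_gui_topic = [0.20, 0.02, 0.20, 0.18, 0.20, 0.02, 0.18]

distinct = kl_divergence(xml_topic, gui_topic)
muddled = kl_divergence(xml_file_io_topic, xml_gui_topic)
print(f"XML vs. GUI: {distinct:.2f}")
print(f"XML + File IO vs. XML + GUI: {muddled:.2f}")
```

The two muddled topics share the XML words, so their distributions overlap and the divergence between them is lower.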

Page 20: Sthomas slides

[20]

Average Topic Distinctness

[Figure: for each model (Hall Topics, Diff Topics), two copies of the topic list Topic 1, Topic 2, Topic 3, Topic 4, Topic 5, …, Topic K shown side by side]
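
One plausible reading of "average topic distinctness" is the mean KL divergence over all ordered pairs of topics within a model. The sketch below follows that reading, which is an assumption rather than something the slides state; the function names and the three-word toy topics are also invented for illustration.

```python
import math
from itertools import permutations


def kl(p, q):
    """KL(p || q); assumes q is nonzero wherever p is nonzero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def average_distinctness(topics):
    """Mean KL divergence over every ordered pair of distinct topics."""
    pairs = list(permutations(topics, 2))
    return sum(kl(p, q) for p, q in pairs) / len(pairs)


# Toy topics over three words; each row is one topic's word distribution.
muddled_topics = [
    [0.45, 0.10, 0.45],
    [0.45, 0.45, 0.10],
    [0.10, 0.45, 0.45],
]
distinct_topics = [
    [0.80, 0.10, 0.10],
    [0.10, 0.80, 0.10],
    [0.10, 0.10, 0.80],
]

print(f"muddled:  {average_distinctness(muddled_topics):.2f}")
print(f"distinct: {average_distinctness(distinct_topics):.2f}")
```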

Page 21: Sthomas slides

+32% +38%

Diff makes more distinct topics

[21]

JHotDraw

Page 22: Sthomas slides

[22]

Topics are muddled, not distinct

Evolutions are not accurate

Diff makes more distinct topics

Problems due to duplication

Found in Source Code Histories

Page 23: Sthomas slides

I bet the Diff model discovers more accurate topic evolutions

[23]

Page 24: Sthomas slides

[24]

Measuring Accuracy

No oracle dataset

1. Create simulated scenario by hand (Truth known)

2. Manually investigate evolutions in JHotDraw and PSQL (Truth learned)

Page 25: Sthomas slides

Simulated Project

[25]

[Figure: versions v1 through v10 of PSQL backend.access, with each version a copy of the one before it]

Page 26: Sthomas slides

Simulated Scenario 1

[26]

[Figure: timeline v1 through v10, annotated "3 files from PSQL timezone" and "timezone topic"]

Page 27: Sthomas slides

Manual Investigation

[27]

[Figure: evolution of Topic 1]

1. Select change events

2. Validate against project documentation (commit logs, release notes, etc.)

Page 28: Sthomas slides

Diff makes more accurate topics

[28]

+25% precision

Simulated Project

+33% precision

JHotDraw

+47% precision

+100% recall
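
Precision and recall for detected change events can be computed as in the sketch below. It assumes that a change event is identified by a (topic, version) pair and that a detected event counts as correct only if it exactly matches a known one; both conventions, along with the function name and the example events, are assumptions and do not come from the slides.

```python
def precision_recall(detected, actual):
    """Precision and recall of detected change events against known ones."""
    detected, actual = set(detected), set(actual)
    true_positives = len(detected & actual)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Invented events: each is a (topic, version) pair where a change is claimed.
actual_events = {("timezone", "v5")}
detected_events = {("timezone", "v5"), ("timezone", "v7")}

precision, recall = precision_recall(detected_events, actual_events)
print(f"precision = {precision:.0%}, recall = {recall:.0%}")
```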

Page 29: Sthomas slides

[29]

Topics are muddled, not distinct

Evolutions are not accurate

Diff makes more distinct topics

Diff makes more accurate evolutions

Problems due to duplication

Found in Source Code Histories

Page 30: Sthomas slides

[30]

Summary