

Commit Quality in Five High Performance Computing Projects

Kapil Agrawal, Sadika Amreen, and Audris Mockus

University of Tennessee, Knoxville

{kagrawa1@vols,samreen@vols,audris@}utk.edu

International Workshop on Software Engineering for High Performance Computing in Science, Firenze, Italy

May 27, 2015

Kapil Agrawal, Sadika Amreen, and Audris Mockus (UTK), SE4HPC’15, International Workshop on Software Engineering for High Performance Computing in Science, Firenze, Italy, May 27, 2015

Outline

1 Motivation and Goals

2 Context

3 Method
    Approach
    Measures

4 Results
    Fraction of Unique Commit Comments
        Discussion
    Number of Delta and Comment Length
        Discussion

5 Conclusion

6 Questions


Motivation and Goals

What are software development practices in High Performance Computing (HPC)?

Objective
    Measure and compare HPC and non-HPC practices

Method
    Create code commit quality measures
        Derive from the version control systems (VCS)
    Conduct a case study
        Five key HPC infrastructure frameworks
        Three highly diverse non-HPC open source projects


Context - Why/Who/What Changed

VCS tracks code (why/who/what changed), allows shared development, and supports complex workflows
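
A minimal sketch of how the why/who/what of each change might be represented once it has been exported from a VCS. The export layout (an `author<TAB>message` header line followed by one line per touched file, with blank lines between commits), the `Commit` class, and `parse_log` are all assumptions made for illustration; they are not the tooling used in the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Commit:
    author: str       # who changed the code
    message: str      # why it changed
    files: List[str]  # what changed


def parse_log(text: str) -> List[Commit]:
    """Parse a hypothetical export: 'author<TAB>message', then one file per line."""
    commits = []
    for block in text.strip().split("\n\n"):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        author, _, message = lines[0].partition("\t")
        commits.append(Commit(author, message, lines[1:]))
    return commits


# Hypothetical sample data, not taken from any of the studied repositories.
SAMPLE_LOG = (
    "alice\tFix deadlock in barrier\n"
    "src/barrier.c\n"
    "src/barrier.h\n"
    "\n"
    "bob\tinitial commit\n"
    "README\n"
)
commits = parse_log(SAMPLE_LOG)
for c in commits:
    print(c.author, "|", c.message, "|", len(c.files), "file(s)")
```

Keeping the three facets in one record makes the later measures (message uniqueness, message size, files per commit) simple list operations.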


Sample of Projects

HPC middleware from ICL at UTK
Diverse non-HPC projects hosted on Bitbucket

HPC
    1 OpenMPI
    2 OpenSHMEM
    3 PaRSEC
    4 PLASMA
    5 MAGMA

Non-HPC
    1 Bitbucket Tutorial
    2 Django-piston
    3 Linux kernel


Approach - Multiple Exploratory Case Study

Literal Replication
    Five similar, widely used parallel computing frameworks

Theoretical Replication
    Three extreme non-HPC projects from Bitbucket

Projects
    1 Bitbucket Tutorial: most forked
    2 Django-piston: most watched
    3 Linux kernel: most commits


Project Summaries: HPC

Different VCS systems:
    Git, Mercurial (hg), and SVN for HPC
    Git and Mercurial for non-HPC

Repo        Authors   Time    Cmts / UCmts   VCS
OpenMPI     116       2003-   20K / 20K      GH-hg
OpenSHMEM   20        2010-   1K / 1K        GH-hg
PaRSEC      33        2009-   8K / 7K        BB-hg
PLASMA      20        2008-   4K / 4K        SVN
MAGMA       21        2013-   4+K / 4-K      SVN

Table: Overview of HPC projects (Cmts / UCmts = total / unique commit comments)


Project Summaries: non-HPC

Selection criteria:
    1 The most unique commit comments
    2 The most forked
    3 The most watched

Repo                  Authors   Time        Cmts / UCmts   VCS
eniliolopez/linux     15k       2005-       446K / 442K    BB-git
tutorials.bitbucket   2.6k      2012-       6K / 5.5K      BB-hg
django-piston         33        2009-2012   254 / 252      BB-hg

Table: Overview of non-HPC projects


Measures: Basic

Total Number of Commits
    Effort that went into creating and maintaining the project
    A normalizing factor in commit quality measures

Number of Authors in a Project
    A social characteristic of a project
    E.g., commercial projects → fewer, more equal contributors
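
As a hedged illustration, both basic measures can be read off a list of (author, message) pairs. The record layout, the sample data, and the function name `basic_measures` are hypothetical and not taken from the study.

```python
from collections import Counter


def basic_measures(commits):
    """Return (total commits, number of distinct authors, commits per author).

    `commits` is a list of (author, message) pairs; this layout is an
    assumption made for illustration only.
    """
    per_author = Counter(author for author, _ in commits)
    return len(commits), len(per_author), per_author


# Hypothetical sample data.
sample = [
    ("alice", "Fix deadlock in barrier"),
    ("bob", "Add tiled QR kernel"),
    ("alice", "Update build scripts for new compiler"),
]
total, n_authors, per_author = basic_measures(sample)
print(total, n_authors, per_author.most_common(1))
```

The per-author counts give a rough view of how evenly contributions are spread, which is the social characteristic the slide refers to.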


Measures: Commit Quality

Number of Unique Commit Messages
    Each commit message should be specific
    No generic commit messages: "fix," "fixed bug," or "initial commit"
    Mature → each commit message is unique

The Size of Commit Comments
    A specific format and detail
    Very small commit messages may indicate immaturity
    Mature → larger commit messages
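
A small sketch of these two measures, assuming messages are compared by exact string match and size is counted in characters. Whether the study normalized case or whitespace before comparing is not stated, so that choice, the sample messages, and the name `message_measures` are assumptions.

```python
from collections import Counter


def message_measures(messages):
    """Count unique messages, list the reused ones, and measure message sizes."""
    counts = Counter(messages)          # exact-match comparison (an assumption)
    reused = sorted(m for m, c in counts.items() if c > 1)
    sizes = [len(m) for m in messages]  # size in characters
    return len(counts), reused, sizes


# Hypothetical sample data.
sample_messages = [
    "fix",
    "Fix race in collective barrier when ranks exit early",
    "fix",
    "initial commit",
]
n_unique, reused, sizes = message_measures(sample_messages)
print("unique messages:", n_unique)
print("reused messages:", reused)
print("sizes:", sizes)
```

Listing the reused messages, not only counting them, makes it easy to see whether the repeats are generic placeholders such as "fix".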


Measures: Commit Quality 2

Number of Delta
    The number of files modified or added in a commit
    Although convenient, combining several tasks in a single commit is bad practice
    Commits with more delta → less mature projects

Fraction of Unique Commit Comments
    A high ratio → commit comments are tailored to each commit
    A lower ratio indicates that the same comments were reused for new commits or were generic
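
The two measures on this slide can be sketched as follows, assuming each commit is available as a (message, files) pair. The layout, the sample data, and the name `delta_and_unique_fraction` are hypothetical; averaging delta with a plain mean is also an assumption.

```python
def delta_and_unique_fraction(commits):
    """commits: list of (message, files) pairs (hypothetical layout).

    Returns (mean number of files per commit, fraction of unique messages).
    """
    if not commits:
        raise ValueError("need at least one commit")
    deltas = [len(files) for _, files in commits]
    mean_delta = sum(deltas) / len(commits)
    messages = [msg for msg, _ in commits]
    fraction_unique = len(set(messages)) / len(messages)
    return mean_delta, fraction_unique


# Hypothetical sample data.
sample = [
    ("Fix deadlock in barrier", ["barrier.c", "barrier.h"]),
    ("fixed bug", ["a.c"]),
    ("fixed bug", ["b.c", "c.c", "d.c"]),
    ("Add tiled QR kernel", ["qr.c", "qr.h"]),
]
mean_delta, fraction_unique = delta_and_unique_fraction(sample)
print(f"mean delta = {mean_delta:.2f}, unique fraction = {fraction_unique:.2f}")
```

Here the reused "fixed bug" message lowers the unique fraction to 3 out of 4, while the mean delta is 8 files over 4 commits.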


Results

Fraction of Unique Commit Comments

Figure: Trend in commit quality of HPC projects

Results Contd.

Fraction of Unique Commit Comments

Figure: Trend in commit quality of HPC projects

nUC - Number of Unique Commit Comments
nTC - Total Number of Commit Comments

nUC / nTC - Comment Quality
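
As a worked illustration of the ratio, the rounded whole-project counts from the two overview tables can be plugged in directly. Because those counts are rounded, the results are approximate and are not the per-period values plotted in the figures; the dictionary layout is an assumption.

```python
# (nTC, nUC) per project, rounded as in the overview tables.
# MAGMA is omitted because its entry ("4+K / 4-K") gives no usable numbers.
counts = {
    "OpenMPI": (20_000, 20_000),
    "OpenSHMEM": (1_000, 1_000),
    "PaRSEC": (8_000, 7_000),
    "PLASMA": (4_000, 4_000),
    "eniliolopez/linux": (446_000, 442_000),
    "tutorials.bitbucket": (6_000, 5_500),
    "django-piston": (254, 252),
}

# Comment quality = nUC / nTC
ratios = {name: n_uc / n_tc for name, (n_tc, n_uc) in counts.items()}
for name, ratio in ratios.items():
    print(f"{name:20s} {ratio:.3f}")
```

For example, PaRSEC gives 7K / 8K = 0.875 and django-piston gives 252 / 254, roughly 0.992.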


Implications: Fraction of Unique Commit Comments

Number of Commits - HPC vs Non-HPC
    Initial spike in number of commits (HPC) → the starting activities of the project
    Linux kernel (non-HPC) shows very steady development (no sharp peaks)

Commit Quality - HPC vs Non-HPC
    nUC/nTC ∈ [0.9, 1.0] → effort to document the changes
    nUC/nTC ratio for non-HPC projects is less consistent than for HPC
    Average life of five years, average nUC/nTC ≥ 0.9
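
One way a trend in nUC/nTC over a project's life could be tabulated is to group commits by year. The slides do not say how the trend was binned or whether uniqueness was judged within each period, so the yearly bins, the within-year uniqueness, the sample data, and the name `yearly_unique_fraction` are all assumptions.

```python
from collections import defaultdict


def yearly_unique_fraction(commits):
    """commits: iterable of (year, message) pairs. Returns {year: nUC / nTC}.

    Uniqueness is judged within each year, which is an assumption.
    """
    by_year = defaultdict(list)
    for year, message in commits:
        by_year[year].append(message)
    return {
        year: len(set(msgs)) / len(msgs)
        for year, msgs in sorted(by_year.items())
    }


# Hypothetical sample data.
sample = [
    (2010, "initial commit"),
    (2010, "fix"),
    (2010, "fix"),
    (2011, "Fix overflow in message size computation"),
    (2011, "Document the new scheduler interface"),
]
trend = yearly_unique_fraction(sample)
for year, ratio in trend.items():
    print(year, f"{ratio:.2f}")
```

A ratio that stays in [0.9, 1.0] across years would correspond to the consistency reported for the HPC projects.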


Results Contd.

Number of Delta and Comment Length

Figure: Size of commits for HPC projects

Results Contd.

Number of Delta and Comment Length

Figure: Size of commits for non-HPC projects

Delta - Total number of files modified or added in a single commit.

Comment Size - Number of characters in each commit comment.

Discussion

Commit Comment Size - HPC vs Non-HPC
    HPC projects: 200-1300 characters
    Non-HPC projects: 50-150 characters
    More effort in HPC community

Delta per Commit - HPC vs Non-HPC
    HPC: 5-6, up to 9 in PLASMA
    Non-HPC: approximately 2
    More delta per commit:
        Tangled changes?
        More complex tasks?


HPC: Higher Quality but More Complex

Observation 1
    HPC middleware projects have higher commit quality:
        Fraction of unique commit messages
        Message size

Observation 2
    HPC middleware projects have more complex commits:
        The number of files modified in a commit


Conclusion

Despite the HPC community being early in embracing code sharing, it has lagged in efficiently using the tools that define open source development.

The results of our investigation of HPC and other projects suggest specific hypotheses that we plan to investigate on a more comprehensive set of projects.


Future Work

Are the typical VCS and issue trackers the most suitable for HPC development practices? If not, what modifications are needed to make HPC development most productive?

We hope that our initial findings will help pose more precise questions in this area and that the methods used will help answer such questions in the future.


Questions?


The End
