
Measuring Copying of Java Archives

Feb 22, 2016


Page 1: Measuring Copying of Java Archives

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Measuring Copying of Java Archives

Tetsuya Kanda1, Daniel M. German2,1, Takashi Ishio1, Katsuro Inoue1

1 Osaka University, Japan
2 University of Victoria, Canada

Page 2: Measuring Copying of Java Archives

Reusing a library
• Reuse existing libraries by copying them into the software development project
• Black-box reuse

[Figure: "The useful library" copied into the software]

Page 3: Measuring Copying of Java Archives

Library in Java
• JAR (Java archive) files are built on the ZIP file format
• A jar file can contain other jar files inside it

[Figure: the Java archive of "The useful library" and the jar files in the library]
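Since a jar file is a ZIP archive, its inner jar files can be listed with any ZIP reader. A minimal Python sketch using the standard `zipfile` module, assuming that an inner jar file is any archive entry whose name ends in `.jar` (a simplification; the slides do not say how inner jar files were detected). The helper names `list_inner_jars` and `make_jar` are hypothetical.

```python
import io
import zipfile


def list_inner_jars(jar_bytes: bytes) -> list[str]:
    """Return the names of entries in a jar (ZIP) file that are themselves jar files.

    Assumption: an inner jar file is recognized only by the '.jar' suffix of its entry name.
    """
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return [
            info.filename
            for info in jar.infolist()
            if not info.is_dir() and info.filename.lower().endswith(".jar")
        ]


def make_jar(entries: dict[str, bytes]) -> bytes:
    """Build an in-memory jar (ZIP) file from a mapping of entry name -> content."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name, data in entries.items():
            jar.writestr(name, data)
    return buf.getvalue()


if __name__ == "__main__":
    inner = make_jar({"com/example/B.class": b"..."})
    outer = make_jar({
        "META-INF/MANIFEST.MF": b"Manifest-Version: 1.0\n",
        "com/example/A.class": b"...",
        "lib/libB-1.0.jar": inner,
    })
    print(list_inner_jars(outer))  # ['lib/libB-1.0.jar']
```

Building the test archives in memory keeps the example free of files on disk; a real scan would pass the bytes of each jar file read from the repository.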

Page 4: Measuring Copying of Java Archives

Duplication of jar files
• Since a jar file can contain other jar files, the same library can end up duplicated
• Jar files nested inside another jar file may cause further duplication

[Figure: a Java archive copied into the software]
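Because nesting can repeat, the same library may be reached at several depths. A recursive walk makes this visible. This Python sketch assumes an `outer.jar!/inner.jar` path notation purely for illustration and skips `.jar` entries that are not valid ZIP data; `walk_nested_jars` and `make_jar` are hypothetical names.

```python
import io
import zipfile
from collections.abc import Iterator


def walk_nested_jars(jar_bytes: bytes, path: str, depth: int = 0) -> Iterator[tuple[str, int]]:
    """Yield (path, depth) for a jar file and every jar file nested inside it.

    Depth 0 is the jar itself, depth 1 its inner jar files, and so on.
    Nested paths use an assumed 'outer.jar!/inner.jar' notation.
    """
    yield path, depth
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for info in jar.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".jar"):
                continue
            inner_bytes = jar.read(info)
            # Skip entries named *.jar that are not actually ZIP archives.
            if zipfile.is_zipfile(io.BytesIO(inner_bytes)):
                yield from walk_nested_jars(inner_bytes, f"{path}!/{info.filename}", depth + 1)


def make_jar(entries: dict[str, bytes]) -> bytes:
    """Build an in-memory jar (ZIP) file from entry name -> content."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name, data in entries.items():
            jar.writestr(name, data)
    return buf.getvalue()


if __name__ == "__main__":
    lib_c = make_jar({"C.class": b"c"})
    lib_b = make_jar({"B.class": b"b", "lib/libC-2.0.jar": lib_c})
    top = make_jar({"A.class": b"a", "lib/libB-1.0.jar": lib_b, "lib/libC-2.0.jar": lib_c})
    for nested_path, depth in walk_nested_jars(top, "A.jar"):
        print(depth, nested_path)
```

In this toy example, libC-2.0.jar is reached twice: once directly inside A.jar and once through libB-1.0.jar.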

Page 5: Measuring Copying of Java Archives

Question
• How many jar files in a large software repository contain jar files inside?
• Is there any duplication among those inner jar files?

[Figure: a Java archive and the jar files in the library]

Page 6: Measuring Copying of Java Archives

Definition: Top-level jar file
• A jar file found directly in the repository
  – A component ready to be reused

[Figure: a top-level jar]

Page 7: Measuring Copying of Java Archives

Definition: Inner jar file
• A jar file that is included in another jar file

[Figure: A.jar (a top-level jar) and the inner jar files of A.jar]

Page 8: Measuring Copying of Java Archives

The experiment
• Objective:
  – Find how many top-level jar files contain duplicate inner jar files
• Target:
  – Maven Central repository
    • The default repository of Apache Maven
    • Contains many popular libraries and projects

Page 9: Measuring Copying of Java Archives

Counting inner jar files
• 599,498 top-level jar files in the Maven Central repository (duplicates excluded)
• 4,747 of them contain jar files inside

Number of inner jar files per top-level jar file (over the 4,747 that contain any):
  Max: 282
  Average: 13.1
  Median: 2
  Min: 1 (in 1,833 of the top-level jar files)
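The figures on this slide are ordinary summary statistics over one inner-jar count per top-level jar file. A minimal Python sketch with the standard `statistics` module; the input list is made-up toy data, not the Maven Central measurements, and `summarize_inner_jar_counts` is a hypothetical name.

```python
import statistics


def summarize_inner_jar_counts(counts: list[int]) -> dict[str, float]:
    """Summarize how many inner jar files each top-level jar file contains.

    `counts` has one entry per top-level jar file that contains at least one
    inner jar file; jar files without inner jars are left out, so the minimum is at least 1.
    """
    smallest = min(counts)
    return {
        "max": max(counts),
        "average": round(statistics.mean(counts), 1),
        "median": statistics.median(counts),
        "min": smallest,
        "files_with_min": counts.count(smallest),
    }


# Hypothetical toy data, not the Maven Central measurements.
print(summarize_inner_jar_counts([1, 1, 2, 3, 40]))
# {'max': 40, 'average': 9.4, 'median': 2, 'min': 1, 'files_with_min': 2}
```

An average well above the median (13.1 versus 2 on the slide) indicates a right-skewed distribution, in which a few top-level jar files embed many inner jars.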

Page 10: Measuring Copying of Java Archives

Reused jar files
• 118,361 distinct inner jar files are contained in other jar files
• 89,054 of them (about 75%) are also found as top-level jar files in the Maven Central repository
  – This may cause further duplication in software projects
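Counting distinct inner jar files and checking which of them also exist as top-level jar files amounts to hashing plus a set intersection. This sketch assumes that two jar files are the same when their SHA-1 content hashes match; the slides do not say which hash was used, and the function names are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Identify a jar file by the SHA-1 of its bytes (an assumed identity criterion)."""
    return hashlib.sha1(data).hexdigest()


def count_reused_inner_jars(inner_jars: list[bytes], top_level_jars: list[bytes]) -> tuple[int, int]:
    """Return (distinct inner jar files, how many of those also exist as top-level jar files)."""
    distinct_inner = {content_hash(data) for data in inner_jars}
    top_level = {content_hash(data) for data in top_level_jars}
    return len(distinct_inner), len(distinct_inner & top_level)


# Toy byte strings stand in for real jar contents.
print(count_reused_inner_jars([b"libB", b"libB", b"libC"], [b"libB", b"libD"]))  # (2, 1)
```

Hashing whole files means a repackaged jar with identical classes but different bytes counts as a different file; that is a consequence of the assumed criterion, not necessarily of the study's.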

Page 11: Measuring Copying of Java Archives

Duplication of inner jar files

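Top-level jar files (#files)
  Contain inner jar files: 4,747
  Having duplication: Same 105, Different 394, Both 30, Total 469
  (Same and Different each include the 30 files in Both, so Total counts each file once: 105 + 394 - 30 = 469)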

• The same version: having the same file name and the same hash of the file contents
  e.g. libA-1.0.jar (hash: 3bf7) and libA-1.0.jar (hash: 3bf7)
• Different versions: having the same file name apart from the version part
  e.g. libB-1.0.jar and libB-1.2.jar
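The two criteria above can be expressed as grouping rules over the inner jar files of one top-level jar file. The Python sketch below is not the study's implementation: it assumes a hypothetical `<name>-<version>.jar` naming rule (the version starts at the first `-` followed by a digit) and takes content hashes as given strings; `library_name` and `classify_duplication` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical naming rule: '<name>-<version>.jar', where the version begins at the
# first '-' that is followed by a digit (e.g. 'commons-lang3-3.4.jar' -> 'commons-lang3').
VERSIONED_JAR = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def library_name(file_name: str) -> str:
    """Drop the directory part and the version from a jar file name."""
    base = file_name.rsplit("/", 1)[-1]
    match = VERSIONED_JAR.match(base)
    return match.group("name") if match else base.removesuffix(".jar")


def classify_duplication(inner_jars: list[tuple[str, str]]) -> Optional[str]:
    """Classify the inner jar files of one top-level jar file.

    inner_jars holds (entry path, content hash) pairs. Returns 'Same' when two entries share
    the file name and content hash, 'Different' when two entries share the library name but
    have different file names (i.e. versions), 'Both' when both hold, and None otherwise.
    """
    seen = set()
    names_per_library = defaultdict(set)
    has_same = False
    for path, digest in inner_jars:
        base = path.rsplit("/", 1)[-1]
        if (base, digest) in seen:
            has_same = True
        seen.add((base, digest))
        names_per_library[library_name(base)].add(base)
    has_different = any(len(names) > 1 for names in names_per_library.values())
    if has_same and has_different:
        return "Both"
    if has_same:
        return "Same"
    if has_different:
        return "Different"
    return None


print(classify_duplication([("lib/libA-1.0.jar", "3bf7"), ("ext/libA-1.0.jar", "3bf7")]))  # Same
print(classify_duplication([("lib/libB-1.0.jar", "aa01"), ("lib/libB-1.2.jar", "bb02")]))  # Different
```

Two jar files that share a file name but not a hash (for example, a rebuilt libA-1.0.jar) fall under neither definition, so this sketch leaves them unclassified.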

Page 12: Measuring Copying of Java Archives

Duplication of inner jar files

Same (105 files): contain the same version of the same library

[Figure: two inner jar files of the same library, both Ver.1]

Page 13: Measuring Copying of Java Archives

Duplication of inner jar files

Different (394 files): contain different versions of the same library

[Figure: inner jar files of the same library, Ver.1 and Ver.2]

Page 14: Measuring Copying of Java Archives

Duplication of inner jar files

Both (30 files): contain both the same version and different versions of the same library

[Figure: inner jar files of the same library, Ver.1, Ver.2, and Ver.1]
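Per-file labels such as Same, Different, and Both can be tallied into the table's columns with set operations. This sketch assumes, consistent with the table's totals, that the Same and Different columns each include the files counted under Both; `tally_duplication` is a hypothetical name.

```python
def tally_duplication(same: set[str], different: set[str]) -> dict[str, int]:
    """Tally top-level jar files into the table's duplication columns.

    same: top-level jar files containing at least one same-version duplicate.
    different: top-level jar files containing at least one different-version duplicate.
    A file in both sets is counted under 'Both', and only once in 'Total'.
    """
    return {
        "Same": len(same),
        "Different": len(different),
        "Both": len(same & different),
        "Total": len(same | different),
    }


# Toy top-level jar names, not real data.
print(tally_duplication({"a.jar", "b.jar", "c.jar"}, {"c.jar", "d.jar"}))
# {'Same': 3, 'Different': 2, 'Both': 1, 'Total': 4}
```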

Page 15: Measuring Copying of Java Archives

Concluding remarks
• About 5,000 (4,747) jar files in the Maven Central repository contain other jar files
• About 470 (469) of them contain duplicate libraries
• Most inner jar files are also found as Maven components
  – So further duplication is still possible

Page 16: Measuring Copying of Java Archives

Future work
• Find duplication of jar files and class files in distributed software applications
  – Eclipse, JBoss, …
• Analyze the behavior of software that contains duplicated libraries
  – Understand the impact of duplication