Overview and Evaluation of Java Component Search System
Experimental evaluation for SPARS-J
Conclusion and Future work
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Experimental Evaluation
1. Comparison of each ranking method in SPARS-J
   We investigate the best ranking method: CR vs. KR vs. CR+KR
2. Comparison with other search engines
   We verify SPARS-J’s effectiveness as a software component search engine: vs. Google, Namazu
3. Application of SPARS-J in an actual development environment
   We confirm that SPARS-J is useful for the management and understanding of software.
Experiment 1: Comparison of ranking methods in SPARS-J
Purpose of Experiment
  We investigate which of the 3 ranking methods in SPARS-J is best.
  1. CR (Based on Use-relation)
  2. KR (Based on TF-IDF)
  3. CR+KR (Integrating 1 & 2)
Preparation
  Database built from publicly available Java source code
    About 140,000 files from JDK, SourceForge, etc.
  Keywords
    10 queries assuming the development of a simple system
Experiment 1: Comparison of ranking methods in SPARS-J
Criterion of Evaluation
  Precision of components in the top 10 results: the percentage of suitable components
  – Users tend to look only at the higher ranked results.
  – High precision means that there are many useful components within the range the user actually looks at.
  Ndpm: the percentage of component pairs whose rank order differs between two ranking methods.
  – We define the user’s ideal ranking in advance, and calculate ndpm against it.
    » A quantitative indicator of the distance from the ideal ranking
  – Ndpm considers all the components in a search result.
    » The distance becomes large when required components are ranked low.
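The two criteria can be illustrated with a small sketch. It is not the evaluation code used in the experiment: the class and method names are hypothetical, precision is divided by the number of results actually inspected, and ndpm is simplified to the share of component pairs that two tie-free rankings of the same components order differently, as the slide describes it.

```java
import java.util.List;
import java.util.Set;

final class RankingMetricsSketch {

    /** Fraction of the first k results that were judged suitable. */
    static double precisionAtK(List<String> ranked, Set<String> suitable, int k) {
        int limit = Math.min(k, ranked.size());
        if (limit == 0) {
            return 0.0;
        }
        int hits = 0;
        for (int i = 0; i < limit; i++) {
            if (suitable.contains(ranked.get(i))) {
                hits++;
            }
        }
        // Assumption: divide by the number of results inspected, not always by k.
        return (double) hits / limit;
    }

    /**
     * Simplified ndpm: share of component pairs whose order differs between
     * the ideal ranking and the actual ranking. Both lists are assumed to
     * contain the same components, without ties.
     */
    static double ndpm(List<String> ideal, List<String> actual) {
        int pairs = 0;
        int disagreements = 0;
        for (int i = 0; i < ideal.size(); i++) {
            for (int j = i + 1; j < ideal.size(); j++) {
                int first = actual.indexOf(ideal.get(i));
                int second = actual.indexOf(ideal.get(j));
                pairs++;
                if (first > second) {
                    disagreements++; // this pair appears in the opposite order
                }
            }
        }
        return pairs == 0 ? 0.0 : (double) disagreements / pairs;
    }
}
```

With this definition an actual ranking identical to the ideal one gives 0, and a fully reversed ranking gives 1, so smaller ndpm values are better.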
Result (Experiment 1)
          Precision               Ndpm
Keyword   CR     KR     CR+KR     CR      KR      CR+KR
A         1      1      1         0.036   0.048   0.037
B         1      1      1         0.194   0.261   0.221
C         0.5    0.5    0.5       0.133   0.117   0.092
D         0.4    0.9    0.8       0.123   0.200   0.189
E         0.4    0.4    0.4       0.208   0.192   0.194
F         0.2    0.2    0.2       0.184   0.184   0.160
G         0.9    1      1         0.081   0.103   0.080
H         1      0.8    1         0.047   0.109   0.052
I         0.6    0.7    0.7       0.210   0.324   0.267
J         0.5    0.7    0.7       0.219   0.243   0.114
Ave.      0.65   0.72   0.73      0.143   0.178   0.141
Consideration (Experiment 1)
By a paired-difference t-test, we have confirmed that the following differences are significant at the 5% level.
  Precision: KR, CR+KR ≫ CR
  Ndpm: CR, CR+KR ≫ KR
Characteristics of each method
  CR
    CR generally ranks components in a desirable order.
    Higher ranked components are important but often have no relevance to the keyword.
  KR
    KR generally favors components which have strong relevance to the keyword.
    In a required component, the keyword doesn’t always appear with high frequency.
  CR+KR
    CR+KR gives good results for both precision and ndpm.
    CR+KR combines the strengths of both rankings.
We use CR+KR as the default ranking method.
Experiment 2: Comparison with other search engines
Purpose of Experiment
  We verify SPARS-J’s effectiveness as a software component search engine.
  1. SPARS-J
     Database from 140,000 files (same as Experiment 1)
     We use CR+KR as the ranking method.
  2. Google
     Famous web search engine
     Input queries to www.google.co.jp
  3. Namazu
     Full-text search system for documents.
     Namazu uses TF-IDF to rank documents.
     Database from 140,000 files (same files as SPARS-J)
Preparation
  Keywords: 10 queries (same as Experiment 1)
  Criterion of Evaluation: Precision of the top 10 results
Result (Experiment 2)
Precision of the top 10 results
Keyword   SPARS-J   Google   Namazu
A         1         0.7      0.9
B         1         0.4      0.6
C         0.5       0.3      0.4
D         0.8       0.3      0.6
E         0.4       0.1      0.3
F         0.2       0        0.1
G         1         0.3      0.4
H         1         0.1      0.2
I         0.7       0.4      0.4
J         0.7       0.4      0.7
Ave.      0.73      0.3      0.46
Consideration (Experiment 2)
By a paired-difference t-test, we have confirmed that the following differences are significant at the 5% level.
Google
  In the results, there are many pages other than explanations of Java source code.
  Performance depends on how much description there is.
Namazu
  Since the dataset consists only of source code, the result is better than Google’s.
  Without using the characteristics of Java programs, we cannot get good results.
For searching software components, SPARS-J is more useful than the other search engines.
Experiment 3: Application of SPARS-J in an actual development environment
Purpose of Experiment
  We confirm that SPARS-J is useful for the management and understanding of software resources.
Criterion of Evaluation
  Qualitative evaluation of SPARS-J
Preparation
  We set up SPARS-J at a company.
  7 employees used SPARS-J for two weeks.
  They are all engaged in software development and maintenance activities.
  We carried out a questionnaire survey about SPARS-J.
Result (Experiment 3)
( [Useful or used repeatedly] 5 4 3 2 1 [Useless or seldom used] )
Questionnaire item                            Scores by examinee (A to G)   Mode
Package Browser                               4 5 5 5 4 3 3                 5
Similar components                            4 5 5 2 4 3                   5, 4
Components used by the class                  5 5 5 5 5 5                   5
Components using the class                    5 1 5 5 5 5                   5
Metrics of the class                          1 4 1 2 4 5                   4, 1
Download of the class                         1 3 5 5 2 5                   5
Contribution to reduction of time cost        3 5 5 3 4 1                   5, 3
View-ability of the component-list view       4 4 5 5 3 3 5                 5
View-ability of the highlighted source code   3 5 5 5 5 5 5                 5
Consideration (Experiment 3)
Highly rated questionnaire items
  Reference by package browser
  Reference by similar components
  Reference by components using (used by) the class
  View-ability of the component list view and source code
Activities realized by using SPARS-J
  Listing of applications which use a certain component
  Impact analysis when re-editing components
Consideration (Experiment 3)
Other comments
  Response speed is very quick, and we felt no stress.
  Since it is not necessary to install anything on a client, sharing of software components is easy.
SPARS-J can support maintenance work effectively.
  Easier grasp of software components
Conclusion and Future works
Conclusion
  We constructed the software component search system SPARS-J.
    Search engine for Java source code
    Ranking of components with consideration of their characteristics
    Provision of useful relevant information
  We verified the validity of SPARS-J based on experimental evaluation.
    SPARS-J is useful for searching software components.
    SPARS-J is very helpful for grasping and managing components.
Future works
  Quantitative evaluation other than ranking performance
  Support for other kinds of software components
Outline
Motivation and research aim
SPARS-J
  Outline
  System architecture
  Ranking method
  Each part
    Analysis part
    Retrieval part
    User Interface
Experiment
Conclusion and Future work
Component analysis part
Extract a component and its information from a Java source file
The process
  Extract a component
  Index the component
  Extract use relations
  Cluster similar components
  Rank components based on use relations (CR method)
Extract and index a component
Extracting a component
  Find a class or interface block in a Java source file
  Record its location in the file (start line number, end line number)
Indexing
  Extract index keys from the component
    Index key: a word and its kind
    No reserved words are extracted
  Count how often each word is used
Example component
    public final class Sort {
        /* quicksort */
        private static void quicksort(…) {
            int pivot;
            :
            quicksort(…);
            quicksort(…);
        }
    }

Index key                     Frequency
word        kind
Sort        Class name        1
quicksort   Comment           1
quicksort   Method name       1
pivot       Variable name     1
quicksort   Method call       2
:           :                 :
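As a minimal sketch of the counting step, the snippet below tallies (word, kind) pairs for the Sort example. It assumes the pairs have already been extracted by a parser, which is not shown; the class name, the method name, and the "word / kind" string key are illustrative choices, not part of SPARS-J.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IndexKeySketch {

    /** Counts occurrences of each index key; each input row is {word, kind}. */
    static Map<String, Integer> countIndexKeys(String[][] occurrences) {
        Map<String, Integer> frequency = new LinkedHashMap<>();
        for (String[] occurrence : occurrences) {
            // Hypothetical key format: the word and its kind joined by " / ".
            String key = occurrence[0] + " / " + occurrence[1];
            frequency.merge(key, 1, Integer::sum);
        }
        return frequency;
    }

    public static void main(String[] args) {
        // Occurrences as they would be found in the Sort example class.
        String[][] occurrences = {
            {"Sort", "Class name"},
            {"quicksort", "Comment"},
            {"quicksort", "Method name"},
            {"pivot", "Variable name"},
            {"quicksort", "Method call"},
            {"quicksort", "Method call"},
        };
        countIndexKeys(occurrences)
            .forEach((key, count) -> System.out.println(key + " -> " + count));
    }
}
```

Keeping the kind in the key matters later, because the same word is weighted differently depending on where it appears.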
Extract use relations
Extract use relations among components using semantic analysis
Make a component graph from the use relations
  Node: component
  Edge: use relation
The kinds of use relation
  Inheritance
  Interface implementation
  Variable type
  Instance creation
  Field access
  Method call
Example
    public class Test extends Data {
        :
        public static void main(…) {
            :
            Sort.quicksort(super.array);
            :
        }
    }
Component graph
  Test → Data: Inheritance, Field access
  Test → Sort: Method call
Similar component
A similar component is a copied component or a component with minor modifications
We merge similar components into a single component
A merged component has all the use relations that the components had before merging
[Figure: a component graph with nodes A to G, and the clustered component graph obtained by merging similar components]
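The merging rule can be sketched as a union of edges. The code below is only an illustration under assumptions: the graph is a map from a component to the components it uses, the cluster assignment is given as input, and use relations between members of the same cluster are dropped. All names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class ClusterMergeSketch {

    /**
     * Builds the clustered graph: every edge u -> v becomes
     * cluster(u) -> cluster(v), so a merged node keeps all use relations
     * of its members. Components without an entry in clusterOf stay as they are.
     */
    static Map<String, Set<String>> merge(Map<String, Set<String>> uses,
                                          Map<String, String> clusterOf) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : uses.entrySet()) {
            String from = clusterOf.getOrDefault(entry.getKey(), entry.getKey());
            merged.computeIfAbsent(from, k -> new LinkedHashSet<>());
            for (String target : entry.getValue()) {
                String to = clusterOf.getOrDefault(target, target);
                merged.computeIfAbsent(to, k -> new LinkedHashSet<>());
                // Assumption of this sketch: edges inside one cluster are dropped.
                if (!from.equals(to)) {
                    merged.get(from).add(to);
                }
            }
        }
        return merged;
    }
}
```

For example, if a component and its copy use two different libraries, the merged node uses both.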
Clustering components
We measure characteristic metrics to decide which components to merge
  Components are compared by the difference ratio of each metric
Metrics
  Complexity
  – The number of methods, cyclomatic complexity, etc.
  – Represents a structural characteristic
  Token-composition
  – The number of appearances of each token
  – Represents a surface characteristic
Ranking based on use relation
Component Rank (CR)
  Reusable components have many use relations
    There are many examples of their use
    General purpose component
    Sophisticated component
  We measure use relations quantitatively, and rank components
    A component used by many components is important
    A component used by an important component is also important
Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, Shinji Kusumoto: "Component Rank: Relative Significance Rank for Software Component Search", ICSE, Portland, OR, May 6, 2003.
Propagating weights
[Figure: component graph with nodes A, B, C. Node weights: A = 0.34, B = 0.33, C = 0.33. Edge weights: 0.17, 0.17, 0.33, 0.33]
Ad-hoc weights are assigned to each node
Propagating weights
[Figure: component graph with nodes A, B, C. Node weights: A = 0.33, B = 0.17, C = 0.5. Edge weights: 0.175, 0.175, 0.17, 0.5]
The node weights are re-defined by the incoming edge weights
Propagating weights
[Figure: component graph with nodes A, B, C. Node weights: A = 0.5, B = 0.175, C = 0.345. Edge weights: 0.25, 0.25, 0.175, 0.345]
We get new node weights
Propagating weights
[Figure: component graph with nodes A, B, C. Node weights: A = 0.4, B = 0.2, C = 0.4. Edge weights: 0.2, 0.2, 0.2, 0.4]
• We get a stable weight assignment: the next-step weights are the same as the previous ones
• Component Rank: order of nodes sorted by the weight
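The propagation can be sketched as a simple iteration. The edges used here (A → B, A → C, B → C, C → A) are an assumption: they are not listed in the transcript, but they are consistent with the node weights shown on the slides. The sketch also assumes every node has at least one outgoing edge, and the class and method names are hypothetical.

```java
final class ComponentRankSketch {

    /**
     * edges[u] lists the nodes that node u uses. In each round a node splits
     * its weight evenly over its outgoing edges, and the new weight of a node
     * is the sum of the weights arriving on its incoming edges.
     */
    static double[] propagate(int[][] edges, double[] initial, int rounds) {
        double[] weights = initial.clone();
        for (int round = 0; round < rounds; round++) {
            double[] next = new double[weights.length];
            for (int u = 0; u < edges.length; u++) {
                for (int v : edges[u]) {
                    next[v] += weights[u] / edges[u].length;
                }
            }
            weights = next;
        }
        return weights;
    }

    public static void main(String[] args) {
        // Assumed graph, with 0 = A, 1 = B, 2 = C: A -> B, A -> C, B -> C, C -> A.
        int[][] edges = { {1, 2}, {2}, {0} };
        double[] weights = propagate(edges, new double[] {0.34, 0.33, 0.33}, 100);
        System.out.println("A=" + weights[0] + " B=" + weights[1] + " C=" + weights[2]);
    }
}
```

Starting from the ad-hoc weights 0.34, 0.33, 0.33, one round gives 0.33, 0.17, 0.5, and repeated rounds approach the stable assignment A = 0.4, B = 0.2, C = 0.4. Sorting the nodes by these weights gives the Component Rank order.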
Component retrieval part
Search components from the database and rank them
The process
  Search components
  Ranking suited to a user request
  Aggregate two ranks (CR and KR)
Search components
Search query
  Words a user inputs
  The kind of an index word, package name
Components containing the given query are searched from the database
Ranking suited to a user request
Keyword Rank (KR)
  Components which contain the words given by a user are searched
  Rank components using the value calculated from index word weights
  An index word gets a high weight when it is
  – A word used with high frequency in a component
  – A word contained only in particular components
  – A word that represents the component’s function, such as a class name
  Sort by the sum of all given word weights
  TF-IDF weighting, as used in full-text search engines
Calculation of KR value
Calculate the weight W_ct of word t in component c

$$W_{ct} = \sum_{i \in \text{all kinds}} \left( kw_i \times TF_i \right) \times IDF$$

  TF_i: the frequency with which kind i of word t occurs in component c
  IDF: the total number of components / the number of components containing word t
  kw_i: weight of kind i
The KR value is the sum of W_ct over all given words

The kind of a word   Weight
Class name           200
Interface name       50
Method name          200
Package name         50
Import               30
Method call          10
Field access         10
Variable type        10
Instance creation    10
Local var access     1
Comment              30
Doc comment          50
Line comment         10
String               1
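A minimal sketch of the weight calculation follows, using the kind weights from the table. The class and method names are hypothetical, unknown kinds are given weight 0 as an assumption, and the component counts in the example are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class KeywordRankSketch {

    /** Kind weights kw_i, copied from the table on the slide. */
    static final Map<String, Integer> KIND_WEIGHT = new HashMap<>();
    static {
        KIND_WEIGHT.put("Class name", 200);
        KIND_WEIGHT.put("Interface name", 50);
        KIND_WEIGHT.put("Method name", 200);
        KIND_WEIGHT.put("Package name", 50);
        KIND_WEIGHT.put("Import", 30);
        KIND_WEIGHT.put("Method call", 10);
        KIND_WEIGHT.put("Field access", 10);
        KIND_WEIGHT.put("Variable type", 10);
        KIND_WEIGHT.put("Instance creation", 10);
        KIND_WEIGHT.put("Local var access", 1);
        KIND_WEIGHT.put("Comment", 30);
        KIND_WEIGHT.put("Doc comment", 50);
        KIND_WEIGHT.put("Line comment", 10);
        KIND_WEIGHT.put("String", 1);
    }

    /**
     * W_ct = (sum over kinds i of kw_i * TF_i) * IDF, where
     * IDF = total components / components containing the word.
     * tfByKind maps a kind to the frequency of the word as that kind.
     */
    static double wordWeight(Map<String, Integer> tfByKind,
                             int totalComponents, int componentsWithWord) {
        double idf = (double) totalComponents / componentsWithWord;
        double sum = 0.0;
        for (Map.Entry<String, Integer> entry : tfByKind.entrySet()) {
            // Assumption: a kind missing from the table contributes nothing.
            sum += KIND_WEIGHT.getOrDefault(entry.getKey(), 0) * entry.getValue();
        }
        return sum * idf;
    }
}
```

For the word quicksort in the Sort example (Comment 1, Method name 1, Method call 2), the weighted frequency is 30 + 200 + 2 × 10 = 250. With a hypothetical database of 1,000 components of which 10 contain the word, IDF is 100 and W_ct is 25,000. The KR value of a component would then be the sum of such weights over all words in the query.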
Aggregate two ranks
Aggregate the two ranks KR and CR
Aggregation method
  Borda Count method, known as a voting system
    Used for single or multiple-seat elections
    This form of voting is extremely popular in determining awards
SPARS-J
  Rank components by both KR and CR
  Using KR and CR, we find components that suit the user’s request and are reusable and sophisticated
Borda Count method
  There are 10 voters and 5 candidates (from A to E)
  Each voter ranks the candidates
  1 point for last place, 2 points for second from last place, …, and N points for first place
  1st = 5 points, 2nd = 4 points, …
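The point scheme can be sketched as follows. This is an illustration, not the SPARS-J implementation: each ranking is treated as one voter, all rankings are assumed to list the same candidates, and ties in total points keep the order in which candidates were first seen. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BordaCountSketch {

    /** Sums Borda points; each ranking lists candidates from first place to last. */
    static Map<String, Integer> points(List<List<String>> rankings) {
        Map<String, Integer> total = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            int n = ranking.size();
            for (int place = 0; place < n; place++) {
                // First place earns n points, last place earns 1 point.
                total.merge(ranking.get(place), n - place, Integer::sum);
            }
        }
        return total;
    }

    /** Candidates ordered by total points, highest first. */
    static List<String> aggregate(List<List<String>> rankings) {
        Map<String, Integer> total = points(rankings);
        List<String> order = new ArrayList<>(total.keySet());
        // The sort is stable, so tied candidates keep their first-seen order.
        order.sort((a, b) -> total.get(b) - total.get(a));
        return order;
    }
}
```

Applied to two rankings of three hypothetical components, a KR order of X, Y, Z and a CR order of Y, Z, X, the totals are X = 4, Y = 5, Z = 3, so the aggregated order is Y, X, Z.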