Overview and Evaluation of Java Component Search System SPARS-J
Reishi Yokomori**, Hideo Nishi**, Fumiaki Umemori**, Tetsuo Yamamoto*, Makoto Matsushita**, Shinji Kusumoto**, Katsuro Inoue**
*Japan Science and Technology Agency
**Osaka University
Experimental evaluation for SPARS-J
Conclusion and Future work
18Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Experimental Evaluation
1. Comparison of each ranking method in SPARS-J
We investigate which ranking method performs best: CR vs. KR vs. CR+KR.
2. Comparison with other search engines
We verify SPARS-J's effectiveness as a software component search engine (vs. Google, Namazu).
3. Application of SPARS-J in an actual development environment
We confirm that SPARS-J is useful for managing and understanding software.
Experiment 1: Comparison of ranking methods in SPARS-J
Purpose of Experiment
We investigate which of the 3 ranking methods in SPARS-J performs best.
1. CR (based on use relations)
2. KR (based on TF-IDF)
3. CR+KR (integrating 1 and 2)
Preparation
Database built from publicly available Java source code
– About 140,000 files from JDK, SourceForge, etc.
Keywords
– 10 queries that assume the development of a simple system
Experiment 1: Comparison of ranking methods in SPARS-J
Criterion of Evaluation
Precision of components in the top 10 results:
The percentage of suitable components
– Users tend to look at only the higher-ranked results.
– High precision means that there are many useful components within the range the user actually looks at.
Ndpm:
The percentage of component pairs whose rank order differs between two ranking methods.
– We define the user's ideal ranking in advance, and calculate ndpm against it.
» A quantitative indicator of the distance from the ideal ranking
– Ndpm considers all the components in a search result.
» The distance becomes large when required components are ranked low.
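The two criteria can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in the experiment: the function names are hypothetical, and the second function implements only the simplified pairwise definition given on the slide, assuming strict rankings with no ties.

```python
from itertools import combinations


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k results that the user judged suitable."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for c in top if c in relevant) / len(top)


def pairwise_disagreement(ideal, system):
    """Fraction of component pairs ordered differently by the two rankings.

    Simplified ndpm-style distance: assumes both rankings are strict
    (no ties) and only compares components present in both lists.
    """
    pos_ideal = {c: i for i, c in enumerate(ideal)}
    pos_sys = {c: i for i, c in enumerate(system)}
    common = [c for c in ideal if c in pos_sys]
    pairs = list(combinations(common, 2))
    if not pairs:
        return 0.0
    discordant = sum(
        1
        for a, b in pairs
        if (pos_ideal[a] - pos_ideal[b]) * (pos_sys[a] - pos_sys[b]) < 0
    )
    return discordant / len(pairs)
```

A value of 0 means the system ranking matches the ideal ranking exactly, and 1 means it is fully reversed.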
The results contain many pages other than explanations of Java source code.
Performance depends on how much description is available.
Namazu
Since the dataset consists only of source code, the results are better than Google's.
Without using the characteristics of Java programs, we cannot get good results.
For searching software components, SPARS-J is more useful than the other search engines.
Experiment 3: Application of SPARS-J in an actual development environment
Purpose of Experiment
We confirm that SPARS-J is useful for managing and understanding software resources.
Criterion of Evaluation
Qualitative evaluation of SPARS-J
Preparation
We set up SPARS-J at a company.
7 employees used SPARS-J for two weeks. They are all engaged in software development and maintenance activities.
We carried out a questionnaire survey about SPARS-J.
Result (Experiment 3)
Scale: 5 = useful or used repeatedly, 1 = useless or seldom used.
Scores are listed in examinee order (A to G). Rows with six scores have one blank response; the transcript does not preserve which examinee left it blank.

Questionnaire item                            Scores (A to G)   Mode
Package Browser                               4 5 5 5 4 3 3     5
Similar components                            4 5 5 2 4 3       5,4
Components used by the class                  5 5 5 5 5 5       5
Components using the class                    5 1 5 5 5 5       5
Metrics of the class                          1 4 1 2 4 5       4,1
Download of the class                         1 3 5 5 2 5       5
Contribution to reduction of time cost        3 5 5 3 4 1       5,3
Improvement for software quality              5 3 3 3 4 1       3
Understanding of software resource            3 1 5 3 5 2 1     5,3,1
View-ability of the component-list view       4 4 5 5 3 3 5     5
View-ability of the highlighted source code   3 5 5 5 5 5 5     5
Consideration (Experiment 3)
Highly rated questionnaire items
– Reference by package browser
– Reference by similar components
– Reference by components using (used by) the class
– View-ability of the component list view and source code
Activities realized by using SPARS-J
– Listing the applications which use a certain component
– Impact analysis when re-editing components
Consideration (Experiment 3)
Other comments
– Response speed is very quick, and we felt no stress.
– Since nothing needs to be installed on the client, sharing software components is easy.
SPARS-J can support maintenance work effectively.
– Easier grasp of software components
Conclusion and Future works
Conclusion
We constructed the software component search system SPARS-J.
– Search engine for Java source code
– Ranking of components with consideration of their characteristics
– Provision of useful relevant information
We verified the validity of SPARS-J based on experimental evaluation.
– SPARS-J is useful for searching software components.
– SPARS-J is very helpful for grasping and managing components.
Future works
– Quantitative evaluation of aspects other than ranking performance
– Support for other kinds of software components
Outline
Motivation and research aim
SPARS-J
– Outline
– System architecture
– Ranking method
– Each part
» Analysis part
» Retrieval part
» User Interface
Experiment
Conclusion and Future work
Component analysis part
Extract components and their information from Java source files
The process
1. Extract a component
2. Index the component
3. Extract use relations
4. Cluster similar components
5. Rank components based on use relations (CR method)
Extract and index a component
Extracting a component
– Find a class or interface block in a Java source file
– Record its location in the file (start line number, end line number)
Indexing
– Extract index keys from the component
» Index key: a word and its kind
» No reserved words are extracted
– Count how often each word is used

Example source:

    public final class Sort {
        /* quicksort */
        private static void quicksort(…) {
            int pivot;
            :
            quicksort(…);
            quicksort(…);
        }
    }

Index keys and frequencies:

    word        kind            frequency
    Sort        Class name      1
    quicksort   Comment         1
    quicksort   Method name     1
    pivot       Variable name   1
    quicksort   Method call     2
    :           :               :
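The indexing step can be sketched as a frequency count over (word, kind) pairs. This is a minimal illustration that assumes a parser has already produced the tagged tokens; the function name, the token format, and the partial reserved-word set are hypothetical, not part of SPARS-J.

```python
from collections import Counter

# Partial list of Java reserved words, for illustration only.
JAVA_RESERVED = {"public", "final", "class", "private", "static", "void", "int"}


def build_index(tokens):
    """Count each (word, kind) index key, skipping reserved words.

    `tokens` is an iterable of (word, kind) pairs that a Java parser is
    assumed to have extracted from one component.
    """
    return Counter(
        (word, kind) for word, kind in tokens if word not in JAVA_RESERVED
    )


# Tokens corresponding to the Sort example on the slide.
sort_tokens = [
    ("public", "Modifier"),
    ("Sort", "Class name"),
    ("quicksort", "Comment"),
    ("quicksort", "Method name"),
    ("pivot", "Variable name"),
    ("quicksort", "Method call"),
    ("quicksort", "Method call"),
]
sort_index = build_index(sort_tokens)
```

Keeping the kind in the key is what later lets a match on a class name count for more than a match in a comment.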
Extract use relations
Extract use relations among components using semantic analysis
Make a component graph from the use relations
– Node: component
– Edge: use relation
The kinds of use relation
– Inheritance
– Interface implementation
– Variable type
– Instance creation
– Field access
– Method call

Example source:

    public class Test extends Data {
        :
        public static void main(…) {
            :
            Sort.quicksort(super.array);
            :
        }
    }

Component graph:
– Test → Data (Inheritance, Field access)
– Test → Sort (Method call)
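A component graph of this kind can be held as a nested mapping from the using component to the used component to the set of relation kinds. The sketch below is illustrative only; the function names and the triple format are assumptions, and the semantic analysis that produces the triples is not shown.

```python
def build_component_graph(relations):
    """Build {user: {used: {kinds}}} from (user, used, kind) triples."""
    graph = {}
    for user, used, kind in relations:
        graph.setdefault(user, {}).setdefault(used, set()).add(kind)
    return graph


def users_of(graph, component):
    """List the components that have a use relation to `component`."""
    return sorted(user for user, targets in graph.items() if component in targets)


# Use relations from the Test example on the slide.
relations = [
    ("Test", "Data", "Inheritance"),
    ("Test", "Data", "Field access"),
    ("Test", "Sort", "Method call"),
]
graph = build_component_graph(relations)
```

The reverse lookup in `users_of` corresponds to the "components using the class" view mentioned in the questionnaire.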
Similar component
A similar component is a copied component or a component with minor modifications
We merge similar components into a single component
The merged component has all the use relations that the components had before merging
[Figure: a component graph with nodes A to G, and the clustered component graph obtained by merging similar components]
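Merging can be sketched as relabeling every node by its cluster and taking the union of the relabeled edges. This is a hypothetical illustration: the function name, the naming of merged nodes, and the choice to drop edges that fall inside one cluster are all assumptions, and the edge set below is made up for the example rather than taken from the figure.

```python
def merge_similar(edges, clusters):
    """Collapse each cluster of similar components into one node.

    `edges` is a set of (user, used) pairs; `clusters` is a list of sets
    of component names. The merged node inherits every use relation its
    members had. Edges inside one cluster are dropped (an assumption).
    """
    representative = {}
    for cluster in clusters:
        name = "".join(sorted(cluster))  # e.g. {"B", "F"} becomes "BF"
        for member in cluster:
            representative[member] = name
    merged = set()
    for user, used in edges:
        u = representative.get(user, user)
        v = representative.get(used, used)
        if u != v:
            merged.add((u, v))
    return merged


# Made-up example: B and F are judged similar.
example_edges = {("A", "B"), ("A", "F"), ("B", "C"), ("F", "D")}
clustered = merge_similar(example_edges, [{"B", "F"}])
```

After merging, the two edges from A collapse into one, while the merged node keeps the outgoing relations of both B and F.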
Clustering components
We measure characteristic metrics to decide which components to merge
Components are compared by the difference ratio of each metric
Metrics
Complexity
– The number of methods, cyclomatic complexity, etc.
– Represents a structural characteristic
Token-composition
– The number of appearances of each token
– Represents a surface characteristic
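One way to read "difference ratio" is a relative difference per metric, with two components treated as similar when every metric stays under a threshold. The slides give neither the formula nor the threshold, so both are assumptions in the sketch below, as are the function names and metric keys.

```python
def difference_ratio(a, b):
    """Relative difference between two metric values, in [0, 1] for non-negative inputs.

    Assumed definition: |a - b| / max(|a|, |b|), and 0 when both are equal.
    """
    if a == b:
        return 0.0
    return abs(a - b) / max(abs(a), abs(b))


def is_similar(metrics_a, metrics_b, threshold=0.1):
    """True if every metric differs by at most `threshold` (hypothetical value).

    A metric missing from one component is treated as 0.
    """
    keys = metrics_a.keys() | metrics_b.keys()
    return all(
        difference_ratio(metrics_a.get(k, 0), metrics_b.get(k, 0)) <= threshold
        for k in keys
    )
```

For example, a copy with one extra branch (cyclomatic complexity 21 instead of 20) would still count as similar under this threshold, while a class with half as many methods would not.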
Ranking based on use relations
Component Rank (CR)
Reusable components have many use relations
– There are many examples of their use
– General-purpose components
– Sophisticated components
We measure use relations quantitatively, and rank components
– A component used by many components is important
– A component used by an important component is also important
Propagating weights
[Figure: component graph with nodes A, B, C. Node weights: A = 0.34, B = 0.33, C = 0.33. Edge weights: 0.17, 0.17, 0.33, 0.33.]
Ad-hoc weights are assigned to each node
Propagating weights
[Figure: component graph with nodes A, B, C. Node weights: A = 0.33, B = 0.17, C = 0.5. Edge weights: 0.175, 0.175, 0.17, 0.5.]
The node weights are re-defined by the incoming edge weights
Propagating weights
[Figure: component graph with nodes A, B, C. Node weights: A = 0.5, B = 0.175, C = 0.345. Edge weights: 0.25, 0.25, 0.175, 0.345.]
We get new node weights
Propagating weights
[Figure: component graph with nodes A, B, C. Node weights: A = 0.4, B = 0.2, C = 0.4. Edge weights: 0.2, 0.2, 0.2, 0.4.]
• We get a stable weight assignment: the next-step weights are the same as the previous ones
• Component Rank: the order of nodes sorted by weight
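The propagation can be sketched as a simple fixed-point iteration. This is a minimal illustration, not the SPARS-J implementation. The edge set A→B, A→C, B→C, C→A is an assumption inferred from the node weights on the slides (the transcript does not preserve the arrows), the function name is hypothetical, and the sketch assumes every node has at least one outgoing edge.

```python
def component_rank(nodes, edges, max_iterations=1000, tolerance=1e-12):
    """Iterate weight propagation until the node weights stop changing.

    Each node splits its weight evenly over its outgoing edges, and each
    node's new weight is the sum of its incoming edge weights.
    Assumes every node has at least one outgoing edge.
    """
    outgoing = {n: [dst for src, dst in edges if src == n] for n in nodes}
    weights = {n: 1.0 / len(nodes) for n in nodes}  # initial ad-hoc weights
    for _ in range(max_iterations):
        new_weights = {n: 0.0 for n in nodes}
        for n in nodes:
            share = weights[n] / len(outgoing[n])  # weight of each outgoing edge
            for dst in outgoing[n]:
                new_weights[dst] += share
        converged = max(abs(new_weights[n] - weights[n]) for n in nodes) < tolerance
        weights = new_weights
        if converged:
            break
    return weights


# Assumed edges for the three-node example.
example_nodes = ["A", "B", "C"]
example_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
stable = component_rank(example_nodes, example_edges)
ranking = sorted(example_nodes, key=lambda n: (-stable[n], n))
```

Under the assumed edges the iteration settles at A = 0.4, B = 0.2, C = 0.4, which agrees with the stable weights on the last slide. A real component graph would also need a rule for components that use nothing, which this sketch leaves out.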
Component retrieval part
Search components from the database and rank them
The process
1. Search components
2. Ranking suited to a user request
3. Aggregate two ranks (CR and KR)
Search components
Search query
– Words a user inputs
– The kind of index word, package name
Components that contain the given query are retrieved from the database
Ranking suited to a user request
Keyword Rank (KR)
Components which contain the words given by a user are searched
Components are ranked using a value calculated from index word weights
Index word weight is high for:
– A word that occurs frequently in the component
– A word contained only in particular components
– A word that represents the component's function, such as a class name
Components are sorted by the sum of the weights of all given words
TF-IDF weighting, as used in full-text search engines
Calculation of KR value
Calculate the weight $W_{ct}$ of word t in component c

$$W_{ct} = \sum_{i \in \text{all kinds}} \left( kw_i \times TF_i \right) \times IDF$$

$TF_i$: the frequency with which word t occurs as kind i in component c
$IDF$: the total number of components / the number of components containing word t
$kw_i$: weight of kind i
The KR value is the sum of $W_{ct}$ over all given words

    Kind of word        Weight
    Class name          200
    Interface name      50
    Method name         200
    Package name        50
    Import              30
    Method call         10
    Field access        10
    Variable type       10
    Instance creation   10
    Local var access    1
    Comment             30
    Doc comment         50
    Line comment        10
    String              1
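The KR calculation can be sketched directly from the formula and the weight table. The function name and input formats below are hypothetical, and the document counts in the example are made up. IDF is taken as the plain ratio stated on the slide, without a logarithm.

```python
KIND_WEIGHTS = {
    "Class name": 200, "Interface name": 50, "Method name": 200,
    "Package name": 50, "Import": 30, "Method call": 10,
    "Field access": 10, "Variable type": 10, "Instance creation": 10,
    "Local var access": 1, "Comment": 30, "Doc comment": 50,
    "Line comment": 10, "String": 1,
}


def kr_value(query_words, term_freqs, containing_counts, total_components):
    """Sum W_ct over the query words for one component.

    term_freqs: {word: {kind: frequency}} for this component.
    containing_counts: {word: number of components containing the word}.
    """
    total = 0.0
    for word in query_words:
        n_containing = containing_counts.get(word, 0)
        if n_containing == 0:
            continue  # word appears nowhere, so it contributes nothing
        idf = total_components / n_containing
        weighted_tf = sum(
            KIND_WEIGHTS[kind] * freq
            for kind, freq in term_freqs.get(word, {}).items()
        )
        total += weighted_tf * idf
    return total


# Frequencies of "quicksort" in the Sort example; corpus sizes are made up.
sort_freqs = {"quicksort": {"Comment": 1, "Method name": 1, "Method call": 2}}
score = kr_value(["quicksort"], sort_freqs, {"quicksort": 10}, 100)
```

Here the weighted frequency is 30 + 200 + 2 × 10 = 250 and IDF is 100 / 10 = 10, so the score is 2500.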
Aggregate two ranks
Aggregate the two ranks KR and CR
Aggregation method
Borda Count method, known as a voting system
– Used for single- or multiple-seat elections
– This form of voting is extremely popular for determining awards
SPARS-J
– Rank components by both KR and CR
– Using KR and CR together, components that suit the user's request and are reusable and sophisticated are ranked high
Borda Count method
There are 10 voters and 5 candidates (A to E)
Each voter ranks the candidates
1 point for last place, 2 points for second-from-last place, …, and N points for first place
With 5 candidates: 1st = 5 points, 2nd = 4 points, …
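The point scheme above can be sketched as follows, with the KR and CR rankings playing the role of two voters. The function name is hypothetical, the sketch assumes every ranking lists the same candidates, and breaking ties alphabetically is an assumption; the slides do not say how SPARS-J resolves ties.

```python
def borda_aggregate(rankings):
    """Combine rankings with the Borda Count.

    Each ranking lists the same N candidates from best to worst. First
    place earns N points and last place earns 1 point. Ties in total
    points are broken alphabetically (an assumption).
    """
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            points[candidate] = points.get(candidate, 0) + (n - position)
    order = sorted(points, key=lambda c: (-points[c], c))
    return order, points


# Made-up example: one ranking from KR and one from CR.
kr_ranking = ["A", "B", "C"]
cr_ranking = ["B", "C", "A"]
combined, totals = borda_aggregate([kr_ranking, cr_ranking])
```

In this example B is first with 5 points because it ranks well under both methods, while A drops to second place because CR puts it last.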
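In web page search, when users do not find a relevant document on the first page of results (10 items), they tend to change the search keywords rather than look at the second page†
We therefore compute the precision for the top 10 components in the search results
†Amanda Spink, B. J. Jansen, D. Wolfram, T. Saracevic: "From E-Sex to E-Commerce: Web Search Changes", IEEE Computer, Vol. 35, No. 3, pp. 107-109, Mar. 2002.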