Overview of Component Search System SPARS-J
Tetsuo Yamamoto*, Makoto Matsushita**, Katsuro Inoue**
*Japan Science and Technology Agency, **Osaka University
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Outline
• Motivation and research aim
• SPARS-J
  – Outline
  – System architecture
  – Ranking method
  – Each part
    – Analysis part
    – Retrieval part
    – User interface
• Experiment
• Conclusion and future work

Motivation
• Reuse of software components
  – Reuse is a technique for developing new software by using components developed in the past.
  – Examples of reusable components: source code, documents, etc.
  – Reuse improves productivity and quality and, as a result, cuts development cost.
• However, reuse is not practiced effectively.
  – Developers do not know that the components they need exist.
  – Many components are available, but they are not organized.
• To take advantage of reuse, components must be managed, and suitable components must be easy to find.

Research aim
• We have built a system that
  – Collects software components eagerly, without preserving their inherent structures
  – Manages component information automatically
  – Provides components suited to the user's request
• Targets
  – Intranet: closed software development inside a company
  – Internet: large open source software development web sites (SourceForge, Jakarta Project, etc.)

SPARS-J (Software Product Archive, analysis and Retrieval System for Java)
• A system for archiving, analyzing, and retrieving Java software products
  – Many components are analyzed automatically.
  – A search engine is built on the analysis information.
  – Component: the source code of a class or interface
• Features
  – Keyword search
  – Two ranking methods
    – Frequency of use of a word
    – Use relation
  – Analyzed information
    – Components using / used by a component
    – Package hierarchy

Structure of SPARS-J
• Component analysis part
  – Extracts components from a file
  – Stores the analyzed information in the DB
  – Clusters and ranks components using the DB
• Database
  – Stores the analyzed information and the components
• Component retrieval part
  – Searches the DB for components matching the query
  – Ranks components based on the frequency of use of a keyword
  – Aggregates the two rankings
• User interface part
  – Delivers the query to the component retrieval part
  – Shows the search results
[Figure: system architecture. Data-flow labels: Library (Java source files), File, Analyzed information, Component information, Query, Hit components, Result, User]

Ranking search results
• Ranking methods
  1. Components suited to the user's request: Keyword Rank (KR)
     – Ranking based on the frequency of use of a word
  2. Components that are used most: Component Rank (CR)
     – Ranking based on component use relations
• A component that ranks high in both 1 and 2 is given a high rank.
• Search results are shown in the order obtained by aggregating the two ranks.

Component analysis part
• Extracts components and their information from Java source files
• The process
  1. Extract a component
  2. Index the component
  3. Extract use relations
  4. Cluster similar components
  5. Rank components based on use relations (CR method)

Extract and index a component
• Extracting a component
  – Find a class or interface block in a Java source file
  – Record its location in the file (start line number, end line number)
• Indexing
  – Extract index keys from the component
    – Index key: a word and its kind
    – Reserved words are not extracted
  – Count the frequency of use of each word

Example component:

public final class Sort {
    /* quicksort */
    private static void quicksort(…) {
        int pivot;
        :
        quicksort(…);
        quicksort(…);
    }
}

Index keys and frequencies:

word       kind           frequency
Sort       Class name     1
quicksort  Comment        1
quicksort  Method name    1
pivot      Variable name  1
quicksort  Method call    2
:          :              :
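
The counting step of indexing can be illustrated with a minimal Python sketch. It assumes a parser has already produced the (word, kind) occurrences; the list below is written by hand from the Sort example, and the name `build_index` is illustrative rather than part of SPARS-J.

```python
from collections import Counter

def build_index(occurrences):
    """Count how often each (word, kind) index key occurs in one component."""
    return Counter(occurrences)

# Hand-written occurrences for the Sort class (assumption: a real parser
# would emit these while walking the source; reserved words are skipped).
occurrences = [
    ("Sort", "Class name"),
    ("quicksort", "Comment"),
    ("quicksort", "Method name"),
    ("pivot", "Variable name"),
    ("quicksort", "Method call"),
    ("quicksort", "Method call"),
]

index = build_index(occurrences)
for (word, kind), frequency in index.items():
    print(f"{word:10} {kind:14} {frequency}")
```

Keeping the kind in the key matters later, because the keyword ranking weights a class-name hit differently from a method-call hit.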

Extract use relations
• Extract use relations among components using semantic analysis
• Make a component graph from the use relations
  – Node: component
  – Edge: use relation
• Kinds of use relation
  – Inheritance
  – Interface implementation
  – Variable type
  – Instance creation
  – Field access
  – Method call

Example:

public class Test extends Data {
    :
    public static void main(…) {
        :
        Sort.quicksort(super.array);
        :
    }
}

Component graph for the example:
  Test → Data (Inheritance, Field access)
  Test → Sort (Method call)
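
As a rough illustration, the component graph for the Test/Sort/Data example can be held as an adjacency map. The triples are transcribed from the slide; the function name `build_component_graph` and the data layout are assumptions made for this sketch.

```python
# (user, used, kind) triples for the example on the slide.
use_relations = [
    ("Test", "Data", "Inheritance"),
    ("Test", "Data", "Field access"),
    ("Test", "Sort", "Method call"),
]

def build_component_graph(relations):
    """Map each component to the set of components it uses."""
    graph = {}
    for user, used, _kind in relations:
        graph.setdefault(user, set()).add(used)
        graph.setdefault(used, set())  # register nodes that are only used
    return graph

graph = build_component_graph(use_relations)
print(graph)
```

Several kinds of relation between the same pair collapse into one edge here, which is enough for ranking by use relations.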

Similar component
• A similar component is a copied component or a component with minor modifications.
• Similar components are merged into a single component.
• The merged component has all the use relations that its members had before merging.
[Figure: a component graph with nodes A to G, and the clustered component graph with nodes BF, AD, C, E, and G]
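
The merging rule can be sketched as follows. The edges of the example graph are hypothetical (the figure's edges are not recoverable from the text), the clusters are given by hand, and dropping relations between members of the same cluster is an assumption of this sketch.

```python
def merge_similar(graph, clusters):
    """Collapse each cluster into one node that keeps its members' use relations."""
    owner = {component: component for component in graph}
    for members in clusters:
        merged_name = "".join(sorted(members))
        for member in members:
            owner[member] = merged_name

    merged = {}
    for user, used_set in graph.items():
        targets = merged.setdefault(owner[user], set())
        for used in used_set:
            if owner[used] != owner[user]:  # assumption: skip edges inside a cluster
                targets.add(owner[used])
    return merged

# Hypothetical use relations among components A to G.
graph = {
    "A": {"B"}, "D": {"F"}, "B": {"C"}, "F": {"G"},
    "C": set(), "G": set(), "E": {"A"},
}
clustered = merge_similar(graph, [{"A", "D"}, {"B", "F"}])
print(clustered)
```

In this example the merged node BF uses both C and G, the union of what B and F used separately.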

Clustering components
• Components are merged according to characteristic metrics.
  – Similarity is judged by the difference ratio of each metric between components.
• Metrics
  – Complexity
    – The number of methods, cyclomatic complexity, etc.
    – Represents a structural characteristic
  – Token composition
    – The number of appearances of each token
    – Represents a surface characteristic

Ranking based on use relation
• Component Rank (CR)
  – A reusable component has many use relations:
    – It has many examples of use
    – It is a general-purpose component
    – It is a sophisticated component
• We measure use relations quantitatively and rank components by them.
  – A component used by many components is important.
  – A component used by an important component is also important.
Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, Shinji Kusumoto: "Component Rank: Relative Significance Rank for Software Component Search", ICSE, Portland, OR, May 6, 2003.

Propagating weights (1/4)
• Ad-hoc weights are assigned to each node.
[Figure: component graph with edges A→B, A→C, B→C, C→A. Node weights: A = 0.34, B = 0.33, C = 0.33. Edge weights: A→B = 0.17, A→C = 0.17, B→C = 0.33, C→A = 0.33]

Propagating weights (2/4)
• The node weights are redefined as the sum of the incoming edge weights.
[Figure: node weights: A = 0.33, B = 0.17, C = 0.5. Edge weights: A→B = 0.165, A→C = 0.165, B→C = 0.17, C→A = 0.5]

Propagating weights (3/4)
• We get new node weights.
[Figure: node weights: A = 0.5, B = 0.165, C = 0.335. Edge weights: A→B = 0.25, A→C = 0.25, B→C = 0.165, C→A = 0.335]

Propagating weights (4/4)
• We get a stable weight assignment: the next-step weights are the same as the previous ones.
• Component Rank: the order of nodes sorted by weight
[Figure: node weights: A = 0.4, B = 0.2, C = 0.4. Edge weights: A→B = 0.2, A→C = 0.2, B→C = 0.2, C→A = 0.4]
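
The propagation can be sketched as a short fixed-point iteration. This is a minimal sketch, not the SPARS-J implementation: it assumes a uniform starting weight, an equal distribution ratio over each node's outgoing edges, and that every node has at least one outgoing edge. The function name and the `tol` and `max_iter` parameters are illustrative.

```python
def component_rank(graph, tol=1e-9, max_iter=10_000):
    """Propagate node weights along use relations until they stop changing.

    graph maps each node to the list of nodes it uses. Every node must have
    at least one outgoing edge (assumption of this sketch).
    """
    nodes = list(graph)
    weights = {n: 1.0 / len(nodes) for n in nodes}  # arbitrary start, sum = 1
    for _ in range(max_iter):
        new_weights = {n: 0.0 for n in nodes}
        for n in nodes:
            share = weights[n] / len(graph[n])  # weight of each outgoing edge
            for target in graph[n]:
                new_weights[target] += share  # collected at the destination
        if max(abs(new_weights[n] - weights[n]) for n in nodes) < tol:
            return new_weights
        weights = new_weights
    return weights

# The three-node example: A uses B and C, B uses C, C uses A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
weights = component_rank(graph)
ranking = sorted(weights, key=weights.get, reverse=True)
print({n: round(w, 3) for n, w in weights.items()})
```

For this graph the weights settle near A = 0.4, B = 0.2, C = 0.4, so B is ranked last while A and C share the top position.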

Component retrieval part
• Searches the database for components and ranks them
• The process
  1. Search components
  2. Rank components by suitability to the user's request (KR)
  3. Aggregate the two ranks (CR and KR)

Search components
• Search query
  – The words the user inputs
  – The kind of an index word, a package name
• Components that match the given query are retrieved from the database.

Ranking suited to a user request
• Keyword Rank (KR)
  – Components that contain the words given by the user are searched.
  – Components are ranked by a value calculated from index word weights.
• The index word weight is high for
  – A word used frequently in the component
  – A word contained only in particular components
  – A word that represents the component's function, such as a class name
• Components are sorted by the sum of the weights of all given words.
• TF-IDF weighting, as used in full-text search engines

Calculation of KR value
• Calculate the weight Wct of word t in component c
  – TFi: the frequency with which word t occurs as kind i in component c
  – IDF: the total number of components / the number of components containing word t
  – kwi: the weight of kind i
• The KR value is the sum of Wct over all given words.

$$W_{ct} = \sum_{i \,\in\, \text{all kinds}} \left( kw_i \times TF_i \right) \times IDF$$

Kind of word       Weight
Class name         200
Interface name     50
Method name        200
Package name       50
Import             30
Method call        10
Field access       10
Variable type      10
Instance creation  10
Local var access   1
Comment            30
Doc comment        50
Line comment       10
String             1
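
A worked example may help with the formula. The sketch below uses the kind weights from the table and the IDF definition given on the slide (a plain ratio, no logarithm). The corpus size of 1000 components and the document frequency of 10 are made-up numbers, and the function names are illustrative.

```python
# kw_i values copied from the table on the slide.
KIND_WEIGHT = {
    "Class name": 200, "Interface name": 50, "Method name": 200,
    "Package name": 50, "Import": 30, "Method call": 10,
    "Field access": 10, "Variable type": 10, "Instance creation": 10,
    "Local var access": 1, "Comment": 30, "Doc comment": 50,
    "Line comment": 10, "String": 1,
}

def word_weight(tf_by_kind, total_components, components_with_word):
    """W_ct = sum over kinds of (kw_i * TF_i), multiplied by IDF."""
    idf = total_components / components_with_word
    return sum(KIND_WEIGHT[kind] * tf for kind, tf in tf_by_kind.items()) * idf

def kr_value(query_words, component_index, total_components, doc_freq):
    """KR value of one component: the sum of W_ct over the query words."""
    return sum(
        word_weight(component_index.get(word, {}), total_components, doc_freq[word])
        for word in query_words
        if word in doc_freq
    )

# Frequencies of "quicksort" in the Sort component from the indexing slide.
sort_index = {"quicksort": {"Comment": 1, "Method name": 1, "Method call": 2}}
# Hypothetical corpus: 1000 components, 10 of which contain "quicksort".
value = kr_value(["quicksort"], sort_index, 1000, {"quicksort": 10})
print(value)  # (30 + 200 + 2 * 10) * 100 = 25000.0
```

The method-name hit dominates the score, which reflects the intent that words naming a component's function count for more than incidental uses.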

Aggregate two ranks
• The two ranks KR and CR are aggregated.
• Aggregation method
  – The Borda Count method, known as a voting system
  – Used for single-seat and multiple-seat elections
  – This form of voting is extremely popular for determining awards.
• SPARS-J
  – Components are ranked by both KR and CR.
  – Using KR and CR brings to the top the components that suit the user's request and are reusable and sophisticated.

Borda Count method
• Example: there are 10 voters and 5 candidates (A to E).
• Each voter ranks the candidates.
• 1 point for last place, 2 points for second from last, ..., and N points for first place
  – Here: 1st = 5 points, 2nd = 4 points, ..., 5th = 1 point

Voters  1st  2nd  3rd  4th  5th
3       A    B    C    D    E
3       E    B    C    D    A
2       C    B    A    E    D
2       C    D    B    A    E

• A: 15 + 3 + 6 + 4 = 28 points
• B: 12 + 12 + 8 + 6 = 38 points
• C: 9 + 9 + 10 + 10 = 38 points
• D: 6 + 6 + 2 + 8 = 22 points
• E: 3 + 15 + 4 + 2 = 24 points

Aggregation
1st  1st  3rd  4th  5th
B    C    A    E    D
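
The tally can be checked with a few lines of Python. This is a generic Borda count over the four ballot groups of the example, not SPARS-J code; the function name `borda_count` and the ballot encoding are assumptions of the sketch.

```python
from collections import defaultdict

def borda_count(ballots):
    """ballots: list of (voter_count, ranking), ranking listed best first."""
    scores = defaultdict(int)
    for voter_count, ranking in ballots:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            # N points for first place down to 1 point for last place.
            scores[candidate] += voter_count * (n - position)
    return dict(scores)

ballots = [
    (3, "ABCDE"),
    (3, "EBCDA"),
    (2, "CBAED"),
    (2, "CDBAE"),
]
scores = borda_count(ballots)
# Sort by descending score; ties are broken alphabetically for display only.
order = sorted(scores, key=lambda c: (-scores[c], c))
print(scores)  # {'A': 28, 'B': 38, 'C': 38, 'D': 22, 'E': 24}
print(order)   # ['B', 'C', 'A', 'E', 'D']
```

The scores add up to 150, which matches 10 voters each handing out 5 + 4 + 3 + 2 + 1 = 15 points. For search results, the same function could be called with two ballots of weight 1, one listing the components in KR order and one in CR order.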

User interface
• Receives the user's query and provides the search results through a web browser
  – Microsoft Internet Explorer, Mozilla, etc.
• The process
  1. Parse the query words and the search condition
  2. Show the rank-ordered results
  3. Show the analyzed information of a component
     – Components used by / using the component
     – Metrics

Analyzed information
• The information shown for a component is as follows.
• Metrics
  – The number of methods and variables
  – LOC, cyclomatic complexity
  – Etc. (metrics measurable from the component itself)
• Components used by / using the component
  – Shows lists of nodes reached by following use relations
• Components that are similar to the component
  – Shows lists of similar components

Package browsing
• The naming structure of Java packages is hierarchical.
• A user can easily list the components in the same package as a given component.

Screenshots
[Screenshot: top page]
[Screenshot: search results]
[Screenshot: source code]
[Screenshot: similar components]
[Screenshot: using the component]
[Screenshot: used by the component]
[Screenshot: package browsing]

Experiment (1/2)
• Comparison with Google
  – About 130,000 components obtained from the Internet were registered in SPARS-J.
  – Query words: 'calculator applet' and 'chat server client'
• The relevance ratio of the top 10 results is calculated.
  – Relevance: the component is reusable source code
• Google is a web search engine, so
  – The term 'java source' is added to the query words
  – One link is followed from the result web page

Experiment (2/2)
• Example 1: "calculator applet"
  – SPARS-J: 9 hits, 7 suited components
• Example 2: "chat server client"
  – SPARS-J: 69 hits, 57 suited components
• With SPARS-J, suited components appear at high ranks.

$$\text{relevance ratio} = \frac{\text{the number of relevant components up to that rank}}{\text{rank order}}$$

(○: relevant, ×: not relevant)

       Example 1                      Example 2
       SPARS-J        Google          SPARS-J        Google
Order  Rel.  Ratio    Rel.  Ratio     Rel.  Ratio    Rel.  Ratio
1      ○     1        ○     1         ○     1        ×     0
2      ○     1        ×     0.5       ○     1        ×     0
3      ○     1        ○     0.67      ○     1        ×     0
4      ○     1        ×     0.5       ○     1        ×     0
5      ○     1        ○     0.6       ○     1        ×     0
6      ×     0.83     ○     0.67      ○     1        ×     0
7      ○     0.86     ×     0.57      ○     1        ○     0.14
8      ×     0.75     ○     0.63      ○     1        ×     0.13
9      ○     0.78     ×     0.56      ○     1        ○     0.22
10     -     -        ×     0.5       ○     1        ○     0.3
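
The ratio column is a running fraction of relevant results, which the following sketch recomputes for the Google column of Example 1. The judgments are transcribed from the table; the function name `relevance_ratios` is illustrative.

```python
def relevance_ratios(judgments):
    """Return, for each rank k, (relevant results in the top k) / k."""
    ratios = []
    relevant_so_far = 0
    for rank, is_relevant in enumerate(judgments, start=1):
        relevant_so_far += int(is_relevant)
        ratios.append(relevant_so_far / rank)
    return ratios

# Google, Example 1: ○ × ○ × ○ ○ × ○ × ×
google_example1 = [True, False, True, False, True, True, False, True, False, False]
ratios = relevance_ratios(google_example1)
print([f"{r:.3f}" for r in ratios])
```

The values are left unrounded on purpose; the table shows them to two decimal places.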

Conclusion and future work
• We developed the component search engine SPARS-J.
  – With SPARS-J, frequently used components can be retrieved easily.
• Future work
  – Morphological analysis of index keywords
  – Collaborative filtering
  – Investigating the best ranking method
    – The values of the weights
    – The aggregation of ranks
  – Evaluation of SPARS-J
    – Usability
End

Component graph
[Figure: component graph of nodes A to I spanning System X and System Y. Legend: node = component, edge = use relation]

Weight of nodes
• The sum of all node weights = 1 ... (1)
• The weight of a node represents the significance of the node, with 1 ≥ w(x) ≥ 0.
[Figure: the component graph of nodes A to I (System X, System Y) annotated with node weights 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.2, 0.05, 0.05]

Weights of edges
• Node weight is distributed to each outgoing edge.
• Edge weights are collected at the destination node.
• d: distribution ratio
• The sum of all outgoing edge weights = the origin node weight ... (2)
• The sum of all incoming edge weights = the destination node weight ... (3)
[Figure: node A (weight 0.2) has four outgoing edges, each with d = 1/4 and weight 0.05. Node B (weight 0.4) collects incoming edges of weight 0.2, 0.05, and 0.15]

Definition of weights
• Under constraints (1) to (3), we have a system of simultaneous equations:

$$\begin{pmatrix} w(v_1) \\ w(v_2) \\ \vdots \\ w(v_n) \end{pmatrix} = \begin{pmatrix} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n1} & d_{n2} & \cdots & d_{nn} \end{pmatrix}^{t} \begin{pmatrix} w(v_1) \\ w(v_2) \\ \vdots \\ w(v_n) \end{pmatrix}$$

  – D^t: the transposed matrix of distribution ratios
  – W: the node weight vector
• In short, W = D^t W.
• This system can be solved by propagating node weights through the edges of the graph.

Pseudo use relation
• Weight computation does not always converge.
• Add a pseudo edge from a node to another node if there is no 'real' edge between them.
• Distribution ratios: pseudo edges << real edges
[Figure: example graph with nodes A, B, and C]
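
One way to picture pseudo edges is as a small share of each node's weight that is spread over the nodes it does not really use. The sketch below is an assumption-laden illustration, not the published method: `pseudo_share = 0.05` is a hypothetical value, self-loops are not added, the split is uniform, and the example edges A→B and B→C are made up.

```python
def distribution_matrix(nodes, edges, pseudo_share=0.05):
    """Return d[i][j], the ratio of node i's weight sent to node j.

    Real edges share (1 - pseudo_share) equally; pseudo edges to every other
    node without a real edge share pseudo_share (hypothetical parameter).
    """
    n = len(nodes)
    index = {name: k for k, name in enumerate(nodes)}
    d = [[0.0] * n for _ in range(n)]
    for i, src in enumerate(nodes):
        real = {index[dst] for s, dst in edges if s == src}
        pseudo = set(range(n)) - real - {i}
        if not pseudo:
            real_total = 1.0            # nothing left to receive pseudo edges
        elif real:
            real_total = 1.0 - pseudo_share
        else:
            real_total = 0.0            # no real edges: all weight via pseudo edges
        for j in real:
            d[i][j] = real_total / len(real)
        for j in pseudo:
            d[i][j] = (1.0 - real_total) / len(pseudo)
    return d

nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C")]  # C uses nothing, so its weight would be lost
d = distribution_matrix(nodes, edges)
for name, row in zip(nodes, d):
    print(name, [round(x, 3) for x in row])
```

Every row then sums to 1, so weight leaving a node with no real outgoing edges is redistributed instead of disappearing.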

Markov model
• The Component Rank model can be regarded as a Markov chain of the user's focus.
• The user's focus moves from one component to another along a use relation at fixed time intervals.
• A node weight represents the probability that the user's focus is on that node in the infinite future.
[Figure: graph annotated with the values 0.01, 0.02, 0.01, 0.03, 0.05, 0.001, 0.1]

Related works
• Markov models of documentation traversal
  – Influence Weight: impact factor of journal publications through incoming references
  – Page Rank: weight of HTML pages on the Internet through incoming web links
    – Explicit use relations
    – No clustering (important for software products)
• Measurement of the reusability of components or interfaces
  – Use various characteristic metrics
  – Indirect indicators of reusability
  – Our approach directly reflects the usage of components
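
Iterative computation based on the component-group graph
• Procedure
  1. Assign arbitrary weights to the nodes.
     – The weights sum to 1.
  2. Compute the weight of each directed edge.
     – A node's weight is distributed among its outgoing edges.
  3. Recompute the weight of each node.
     – The node's weight is redefined as the sum of the weights of its incoming edges.
  4. Repeat steps 2 and 3 until the node weights converge.
  5. Take the converged weight of each node as the CR value of the corresponding component group.
     – The evaluation value of a component is the CR value of the group it belongs to.
[Figure: calculation of CR values for component groups C1, C2, C3 with edges C1→C2 (v1 × 50%), C1→C3 (v1 × 50%), C2→C3 (v2 × 100%), C3→C1 (v3 × 100%).
 Initial node weights: C1 = 0.334, C2 = 0.333, C3 = 0.333; edge weights 0.167, 0.167, 0.333, 0.333.
 After one step: C1 = 0.333, C2 = 0.167, C3 = 0.500; edge weights 0.1665, 0.1665, 0.167, 0.500.
 After two steps: C1 = 0.500, C2 = 0.1665, C3 = 0.3335.
 Converged: C1 = 0.400, C2 = 0.200, C3 = 0.400; edge weights 0.200, 0.200, 0.200, 0.400]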