SemRank: Ranking Complex Relationship Search Results on the Semantic Web
Kemafor Anyanwu, Angela Maduko, Amit Sheth
LSDIS lab, University of Georgia
Paper presentation at WWW2005, Chiba, Japan
Kemafor Anyanwu, Angela Maduko, and Amit Sheth. SemRank: Ranking Complex Relationship Search Results on the Semantic Web, Proceedings of the 14th International World Wide Web Conference (WWW2005), Chiba, Japan, May 10-14, 2005, pp. 117-127.
This work is funded by NSF-ITR-IDM Award#0325464 titled ‘SemDIS: Discovering Complex Relationships in the Semantic Web’ and NSF-ITR-IDM Award#0219649 titled ‘Semantic Association Identification and Knowledge Discovery for National Security Applications.’
• Refraction Count
– How varied is the result from what is expected from the schema?
• Information Gain
– How much information does a user gain by being informed about a result?
• S-Match
– Best semantic match with user need (if provided)
[Figure: adjustable search mode, ranging from "High Information Gain, High Refraction Count, High S-Match" to "Low Information Gain, Low Refraction Count, High S-Match"]
Modulative Rank Function
• Typical preference or rank function
– $\mathrm{Rank}_i = \sum_j w_{ij} \cdot \mathrm{attr}_{ij}$
• What we want is, given
– µ, the weight function parameter
– attributes $\mathrm{attr}_1, \mathrm{attr}_2, \ldots, \mathrm{attr}_k$, e.g. length
– for each attribute, select an appropriate weight function from $g_1, g_2, \ldots, g_m$, e.g. $g_i(\mu) = \mu$
• Each $g_i$ is some function of µ
• Then
– $\mathrm{Rank}_i(\mu) = \sum_k g_j(\mu) \cdot \mathrm{attr}_{ik}$
• where $g_j$ is the weight function selected for $\mathrm{attr}_k$
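The modulative rank function above can be sketched in a few lines of Python. This is only an illustration: the attribute names `length` and `rarity`, the two weight functions, and the helper `modulated_rank` are hypothetical and do not come from the slides.

```python
from typing import Callable, Dict

WeightFn = Callable[[float], float]


def modulated_rank(attrs: Dict[str, float],
                   weight_fns: Dict[str, WeightFn],
                   mu: float) -> float:
    """Sum of g(mu) * attr over all attributes of one result."""
    return sum(weight_fns[name](mu) * value for name, value in attrs.items())


# Hypothetical weight functions: one grows with mu, the other shrinks.
weight_fns = {
    "length": lambda mu: 1.0 - mu,
    "rarity": lambda mu: mu,
}

# Hypothetical attribute values for a single result.
result = {"length": 3.0, "rarity": 0.5}

print(modulated_rank(result, weight_fns, 0.0))  # 3.0
print(modulated_rank(result, weight_fns, 1.0))  # 0.5
```

Changing µ changes the weights only, so the same attribute values can produce different orderings.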
Refraction as a measure of predictability
Refraction
• The path “enrolled_in → taught_by → married_to” doesn’t exist anywhere at the schema layer
• We say that the path refracts at node 3
• High refraction count in a path ⇒ low predictability
[Figure: schema layer with classes Student, Course, Professor, Spouse and properties enrolled_in, taught_by, married_to; instance path 1 -enrolled_in-> 2 -taught_by-> 3 -married_to-> 4]
Semantic Summary
[Figure: semantic summary shown as a matrix over classes C1-C5 whose cells hold property labels p1-p5, and as the equivalent graph in which each node is a Representative Ontology Class and each arc is labeled with properties]
Semantic Summary & Refraction
• A Semantic Summary is a graph of representative ontology classes with appropriate relations as arcs
• For a path p = r1, p1, r2, p2, r3, there is a refraction at r2 if the semantic summary does not contain
– p1(ROCi, ROCj) and p2(ROCj, ROCk) (or vice versa), where
– ROCi, ROCj, ROCk are the representative ontology classes of r1, r2, r3 respectively
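A minimal sketch of counting refractions along a path, assuming the semantic summary is stored as a set of (class, property, class) arcs. The function `refraction_count`, the class `Person`, and the sample data are hypothetical. For brevity the sketch treats every property as directed left to right along the path and ignores the "or vice versa" case.

```python
from typing import Dict, List, Set, Tuple

Arc = Tuple[str, str, str]  # (source class, property, target class)


def refraction_count(path: List[str],
                     roc_of: Dict[str, str],
                     summary: Set[Arc]) -> int:
    """Count interior nodes where the two adjacent arcs are not both in the summary."""
    nodes = path[0::2]   # r1, r2, r3, ...
    props = path[1::2]   # p1, p2, ...
    count = 0
    for j in range(1, len(nodes) - 1):
        incoming = (roc_of[nodes[j - 1]], props[j - 1], roc_of[nodes[j]])
        outgoing = (roc_of[nodes[j]], props[j], roc_of[nodes[j + 1]])
        if not (incoming in summary and outgoing in summary):
            count += 1
    return count


# Hypothetical summary: married_to is declared on Person, not on Professor.
summary = {
    ("Student", "enrolled_in", "Course"),
    ("Course", "taught_by", "Professor"),
    ("Person", "married_to", "Spouse"),
}
roc_of = {"1": "Student", "2": "Course", "3": "Professor", "4": "Spouse"}
path = ["1", "enrolled_in", "2", "taught_by", "3", "married_to", "4"]

print(refraction_count(path, roc_of, summary))  # 1 (the path refracts at node 3)
```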
Information content and Information gain
Measuring Information Content of a Property
• Content is related to uncertainty removed
• Typically measured as some function of its probability
– High probability -> low information content
• For p ∈ P, where P is the set of property types, its information content ISP can be measured as:
– $\mathrm{ISP}(p_k) = \log_2 \frac{1}{\Pr(p = p_k)} = -\log_2 \frac{[[p_k]]}{[[P]]}$
• ISP(p) is maximum when
– $\Pr_i = 1 / [[P]]$, giving $\mathrm{ISP}(p) = \log_2 [[P]]$
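As a rough illustration of the formula above, the snippet below computes the information content of a property from frequency counts. It assumes that [[pk]] is the number of statements using property pk and that [[P]] is the total number of statements. The counts and the helper name `property_information` are made up for the example.

```python
import math
from collections import Counter


def property_information(counts: Counter, prop: str) -> float:
    """-log2 of the property's relative frequency."""
    total = sum(counts.values())
    return -math.log2(counts[prop] / total)


# Hypothetical frequency distribution of properties in a data set.
counts = Counter({"enrolls": 8, "teaches": 6, "audits": 2})

print(property_information(counts, "enrolls"))  # 1.0 (frequent, low content)
print(property_information(counts, "audits"))   # 3.0 (rare, high content)
```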
Information Content of a Property Sequence – global perspective
• The information content of a sequence of properties p1, p2, p3, …, pk is
– $\max_{1 \le i \le k} \mathrm{ISP}(p_i)$
[Figure: sequence p1, p2, p3 with Prob = high, low, high; the information content depends on p2, the weak point]
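A small sketch of the global measure for a property sequence, using made-up probabilities that follow the high, low, high pattern of the example. The function name `sequence_information` is hypothetical.

```python
import math
from typing import Sequence


def sequence_information(probabilities: Sequence[float]) -> float:
    """Global information content: the maximum -log2(Pr) over the sequence."""
    return max(-math.log2(p) for p in probabilities)


# Hypothetical probabilities for p1, p2, p3 (high, low, high).
probs = [0.5, 0.125, 0.5]

print(sequence_information(probs))  # 3.0, contributed by the rare property p2
```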
Information Content – Local Perspective
• Global high information content but local low information content
• Given (a, p1, b), what is the information content with respect to only the valid possibilities between a and b?
• For (a, p1, b), valid(p1) is
– P = the properties on (ROCi, ROCj), with a ∈ ROCi and b ∈ ROCj, and their superproperties
• Recompute probabilities based on P (local)
– I = min(NI(pi)) + average of the other NI values
Total Information Content
Total information content =
Information content from global perspective +
Information content from local perspective
S-Match
Relevance Specification as keywords
[Figure: the keywords published_in and located_in supplied as the relevance specification]
S-Match
• Uses the “best semantic match” paradigm
• For a keyword ki and a property pj on a path:
– SemMatch(ki, pj) = $(2^d)^{-1}$, with $0 < (2^d)^{-1} \le 1$, where
• d is the minimum distance between the properties in a property hierarchy
• For a path ps, its S-Match value is:
– the sum, over the keywords ki, of $\max_j \mathrm{SemMatch}(k_i, p_j)$
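The S-Match computation can be sketched as follows, reading SemMatch as 1 / 2^d and summing each keyword's best match over the path. The distance table, the property `based_in`, and the helper names are hypothetical. In a real system d would come from the property hierarchy rather than a hand-written table.

```python
from typing import Callable, Sequence


def sem_match(distance: int) -> float:
    """1 / 2**d: equals 1 for an exact match and halves with each extra step."""
    return 1.0 / (2 ** distance)


def s_match(keywords: Sequence[str],
            path_props: Sequence[str],
            distance: Callable[[str, str], int]) -> float:
    """Sum over keywords of the best SemMatch against any property on the path."""
    return sum(
        max(sem_match(distance(k, p)) for p in path_props)
        for k in keywords
    )


# Hypothetical hierarchy distances between keywords and path properties.
DISTANCES = {
    ("published_in", "published_in"): 0,
    ("published_in", "based_in"): 4,
    ("located_in", "published_in"): 4,
    ("located_in", "based_in"): 1,
}


def lookup(keyword: str, prop: str) -> int:
    return DISTANCES[(keyword, prop)]


print(s_match(["published_in", "located_in"],
              ["published_in", "based_in"], lookup))  # 1.5
```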
Putting it all together …
SemRank
• For a search mode µ and a path ps:
• Modulated information gain for ps, $I_\mu(ps)$
– $I_\mu(ps) = (1-\mu)\,(I(ps))^{-1} + \mu\, I(ps)$
• Modulated Refraction Count $RC_\mu(ps)$
– $RC_\mu(ps) = \mu \times RC(ps)$
• $\mathrm{SEMRANK}(ps) = I_\mu(ps) \times (1 + RC_\mu(ps)) \times (1 + \text{S-Match}(ps))$
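A minimal sketch of how the three measures combine, assuming the search-mode parameter is µ in [0, 1] and that the information content I(ps) is strictly positive. The input values and function names are invented for illustration.

```python
def modulated_information(info: float, mu: float) -> float:
    """(1 - mu) / I + mu * I: favors low I at mu = 0 and high I at mu = 1."""
    return (1.0 - mu) / info + mu * info


def modulated_refraction(rc: int, mu: float) -> float:
    """mu * RC: refraction is ignored when mu = 0."""
    return mu * rc


def semrank(info: float, rc: int, s_match: float, mu: float) -> float:
    """Product of the modulated terms, as in the SEMRANK formula."""
    return (modulated_information(info, mu)
            * (1.0 + modulated_refraction(rc, mu))
            * (1.0 + s_match))


# Hypothetical path: I(ps) = 2.0, one refraction, S-Match = 0.5.
print(semrank(2.0, 1, 0.5, mu=0.0))  # 0.75
print(semrank(2.0, 1, 0.5, mu=1.0))  # 6.0
```

With µ = 0 the score rewards predictable, low-information paths, and with µ = 1 it rewards unpredictable, high-information ones. The S-Match factor is applied in both modes.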
Computing SemRank in SSARK
The SSARK system
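[Figure: SSARK architecture. Components shown: RDF Documents; Loader; Preprocessor; Storage Manager; LtStore; UtStore; Index Manager; FDIX; PHIX; ROIX; Query Processor; LAC (Look Ahead Cache); RC (Result Cache); Ranking Engine (pipelined top-k results); User SubSystem; Query & Result Interface, with the query pattern "x ?? ?? ?? y". Processing is divided into a preprocessing phase, a query processing phase and a ranking phase.]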
Approach
[Figure: example graph with nodes 1-5 and edges labeled a-g; the Query Processor builds a tree over the edge labels, annotated with numeric values]
Ranking engine
• Assigns SemRank* values to leaves of the tree, i.e. edges on the path
* without refraction count
The Index Subsystem
• FDIX – Frequency Distribution IndeX
– Stores the frequency distribution of properties
• ROIX – Representative Ontology IndeX
– Maps classes to Representative Ontology Classes
– Stores the semantic summary graph
• PHIX – Property Hierarchy IndeX
– Uses the Dewey Decimal labeling scheme to encode the hierarchical relationships in a property hierarchy
– Used for computing S-Match (match between keywords and properties in a path)
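To illustrate why Dewey-style labels suit a property hierarchy index, the sketch below derives the distance between two properties directly from their labels, without traversing the hierarchy. The dotted label format, the sample labels, and the function `dewey_distance` are assumptions made for the example, not details of PHIX. Distance is taken here to be the number of edges on the tree path between the two properties.

```python
def dewey_distance(label_a: str, label_b: str) -> int:
    """Edges between two nodes of a tree, computed from their Dewey labels."""
    parts_a = label_a.split(".")
    parts_b = label_b.split(".")
    # Length of the shared prefix = depth of the lowest common ancestor.
    common = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        common += 1
    return (len(parts_a) - common) + (len(parts_b) - common)


# Hypothetical labels: "1" is the root property, "1.2" its second child, etc.
print(dewey_distance("1.2.1", "1.2.1"))  # 0 (same property)
print(dewey_distance("1.2.1", "1.2"))    # 1 (child and parent)
print(dewey_distance("1.2.1", "1.3"))    # 3 (via the common ancestor "1")
```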
[Figure (animation frames): a tree whose leaves carry scored edges (a, 3), (b, 2), (c, 4), (d, 1), (e, 2), (f, 5), (g, 3), (h, 1), (i, 6); the scores are combined bottom-up into scored paths such as (ab, 5), (gh, 4), (g.i, 9), (cf, 9), (ce, 6)]
Top-K Evaluation
Final Top_k:
1. g.i, 18
2. c.f, 9
Evaluation Issues
• Data set needs
– Entities described with a variety of relationships
– Richly connected hierarchies
– Realistic frequency distributions
• Synthetically generated realistic small data set using human-defined rules
– e.g. |(p = “audits”)| ≤ 0.1 × |(p = “enrolls”)|
[Figure panels: µ = 0 and µ = 1]
Related Work
• Semantic searching and ranking of entities on the Semantic Web
– Rocha et al. WWW2004, Guha et al. WWW2003, Stojanovic et al. ISWC 2003, Zhuge et al. WWW2003
• Semantic ranking of relationships
– Halaschek VLDB demo 2004, Aleman-Meza et al. SWDB03
Future Work
• Comprehensive evaluation
• Including some measures for importance of nodes in the paths
• Revise the Modulation function
• Optimizing Top-K evaluation
– Decreasing height of tree
– Estimation techniques for a closer approximation to SemRank ordering
Data, demos, more publications at SemDis project web site (Google: semdis)