Ranking Objects by Exploiting Relationships: Computing Top-K over Aggregation
Kaushik Chakrabarti, Venkatesh Ganti, Jiawei Han, Dong Xin
Presented by: Vaidergorn Eitan
Outline: Introduction, System Overview, Scoring Functions, SQL implementation, Early Termination Approach, Experiments
Pruning to the Final Top-K
• UB={t1(2.5), t2(1.8), t3(1.6), t4(1.6)}, K=3
• Exact score of t1 = (1+0.1)+(0.1+1) = 2.2
• Exact scores: t1=2.2, t2=1.6, t3=1.6, t4=1.6
• UB={t1(2.2), t2(1.6), t3(1.6), t4(1.6)}
• The final top-k results are {t1, t2, t3}
      w1    w2
d1    1.0   0.1
d2    0.1   1.0
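The arithmetic and the pruning step on this slide can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `exact_score` and `prune_to_top_k` are hypothetical, t1 is assumed to be related to both d1 and d2 (inferred from the sum on the slide), the exact scores of t2, t3 and t4 are copied from the slide rather than derived, and the stopping rule (stop once the K-th best exact score is at least the next candidate's upper bound) is an assumed reading of "pruning".

```python
# Document scores per keyword, copied from the table on the slide.
doc_scores = {
    "d1": {"w1": 1.0, "w2": 0.1},
    "d2": {"w1": 0.1, "w2": 1.0},
}

def exact_score(related_docs, scores):
    """SUM over keywords within each document, then SUM over documents."""
    return sum(sum(scores[d].values()) for d in related_docs)

def prune_to_top_k(upper_bounds, exact_score_of, k):
    """Visit candidates in decreasing upper-bound order and compute exact
    scores until the k-th best exact score is >= the next upper bound."""
    ranked = sorted(upper_bounds, key=lambda t: -upper_bounds[t])
    exact = {}
    for t in ranked:
        best = sorted(exact.values(), reverse=True)
        if len(best) >= k and best[k - 1] >= upper_bounds[t]:
            break  # no remaining candidate can beat the current top k
        exact[t] = exact_score_of(t)
    return sorted(exact, key=lambda t: -exact[t])[:k]

# Upper bounds from the slide; t1 is assumed to relate to d1 and d2.
upper_bounds = {"t1": 2.5, "t2": 1.8, "t3": 1.6, "t4": 1.6}
slide_exact = {
    "t1": exact_score(["d1", "d2"], doc_scores),
    "t2": 1.6, "t3": 1.6, "t4": 1.6,  # values stated on the slide
}
top_k = prune_to_top_k(upper_bounds, slide_exact.get, k=3)
```

With these inputs the loop never computes the exact score of t4: after t1, t2 and t3 are scored, the third-best exact score already equals t4's upper bound. Using `>=` in the stopping test means a tie is resolved in favour of the candidates already scored.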
Exact Top-K with Approximate Scores
• Exact Top-K with Approximate Scores:
• Crossing Objects: a target object whose rank in LB is more than K and whose rank in UB is K or less.
• Boundary Objects: a pair of target objects (A, B) such that:
  1. The top K in UB and LB are the same.
  2. A is the Kth object in LB and the uth object in UB (u ≤ K).
  3. B is the (K+1)th object in UB and the lth object in LB (l ≥ K+1).
  4. LB_K ≤ UB_(K+1)
Rank   UB       LB
1      A        C
2               A=1.5
3      B=1.6
4      C        B
K=2
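The two definitions above can be checked mechanically once the upper-bound (UB) and lower-bound (LB) scores are known. The following Python sketch is illustrative only: the helper names `ranking`, `crossing_objects` and `boundary_pair` are hypothetical, ties are broken by object name as an assumption, and all example scores other than A=1.5 (LB) and B=1.6 (UB) are made up for the demonstration.

```python
def ranking(scores):
    """Objects in decreasing score order (ties broken by name, an assumption)."""
    return sorted(scores, key=lambda t: (-scores[t], t))

def crossing_objects(ub, lb, k):
    """Objects ranked K or better in UB but worse than K in LB."""
    ub_rank = {t: i + 1 for i, t in enumerate(ranking(ub))}
    lb_rank = {t: i + 1 for i, t in enumerate(ranking(lb))}
    return [t for t in ub if lb_rank[t] > k and ub_rank[t] <= k]

def boundary_pair(ub, lb, k):
    """Return (A, B) if the four boundary conditions hold, else None."""
    ub_order, lb_order = ranking(ub), ranking(lb)
    if len(ub_order) <= k:
        return None
    # Condition 1: the top K in UB and LB contain the same objects.
    if set(ub_order[:k]) != set(lb_order[:k]):
        return None
    a = lb_order[k - 1]  # condition 2: K-th object in LB
    b = ub_order[k]      # condition 3: (K+1)-th object in UB
    # Condition 1 already implies u <= K for A and l >= K+1 for B.
    # Condition 4: LB_K <= UB_(K+1), so the two intervals still overlap.
    return (a, b) if lb[a] <= ub[b] else None

# Hypothetical bounds with K=2; only A=1.5 (LB) and B=1.6 (UB) mirror the slide.
ub = {"A": 2.0, "C": 1.9, "B": 1.6, "D": 1.0}
lb = {"A": 1.5, "C": 1.7, "B": 1.2, "D": 0.8}
pair = boundary_pair(ub, lb, k=2)

# A second hypothetical case in which B is a crossing object.
ub2 = {"A": 2.0, "B": 1.8, "C": 1.0}
lb2 = {"A": 1.5, "B": 0.5, "C": 0.9}
crossing = crossing_objects(ub2, lb2, k=2)
```

In the first case the top-2 sets agree, yet A's lower bound (1.5) does not exceed B's upper bound (1.6), so (A, B) is a boundary pair. In the second case B sits inside the top 2 by upper bound but outside it by lower bound, so it is a crossing object.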
Outline
• Introduction
• System Overview
• Scoring Functions
• SQL implementation
• Early Termination Approach
• Experiments
• Conclusions
Experiment
• Our document collection comprises 714,192 news articles from 2003-2004, obtained from the MSNBC news portal.
• We index these news articles in the SQL Server FTS engine.
• We extract three types of named entities: PersonNames, OrganizationNames, and LocationNames.
Experiment
• To get realistic OF queries, we picked the top 10 sports news queries on Google in 2004.
Experiment
• “PersonNames” is the desired entity type for all the queries. All our measurements are averaged across the 10 queries.
• We implemented all 3 approaches to evaluating OF queries: SQL implementation, GenPrune, and GenOnly.
• We use SUM as the combination function and SUM as the aggregation function.
Conclusions
• We introduced the class of OF queries and defined its semantics.
• We defined two broad classes of scoring functions, which exploit relationships between documents and objects to compute the relevance score of the target objects for a given set of keywords.
• We presented early termination techniques, which make our approach 4-5 times faster than the SQL implementation.