Routing of Structured Queries in Large-Scale Distributed Systems Workshop on Large-Scale Distributed Systems for Information Retrieval (LSDS_IR'08) @ ACM 17th CIKM 2008, Napa Valley, California, USA, Oct 2008. Judith Winter Institute for Informatics / Telematics Group Goethe-University / Frankfurt am Main, Germany
Transcript
Routing of Structured Queries in Large-Scale Distributed Systems
Workshop on Large-Scale Distributed Systems for Information Retrieval (LSDS_IR'08)
@ ACM 17th CIKM 2008,
Napa Valley, California, USA, Oct 2008.
Judith Winter
Institute for Informatics / Telematics Group
Goethe-University / Frankfurt am Main, Germany
Overview
1. Introduction
2. Concept & Architecture
3. Routing
4. Evaluation
5. Questions and Discussion
1. Introduction
• XML Information Retrieval in P2P systems
• Investigate the impact of using structural information when retrieving XML documents in a P2P network
• Challenge: not all information is accessible / scalability issues
Proposed research:
How to perform & improve query routing in a large-scale P2P System
2. Concept & Architecture
• Queries: content-and-structure (CAS)
• Indexing: include structure
• Hybrid indexing: globally or locally (distributing summaries), depending on peer status; index with posting lists (document level) & peer lists (peer level)
• Distributing global information into DHT
• Ranking: extended vector space model (using structure)
• Results/Retrieval units: document or element retrieval
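The indexing and ranking bullets above can be made concrete with a small sketch. This is an illustration under assumptions, not the system presented in the talk: the class `ToyIndex`, the function `responsible_peer` (a hash that stands in for a real DHT lookup), and in particular `path_similarity` are hypothetical, since the slides do not give the actual structural similarity functions or weighting formulas.

```python
import hashlib
from collections import defaultdict


def responsible_peer(term: str, num_peers: int) -> int:
    """Toy stand-in for a DHT lookup: hash the term onto one of num_peers peers."""
    digest = hashlib.sha1(term.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_peers


def path_similarity(query_path: str, doc_path: str) -> float:
    """Assumed structural similarity: the fraction of query path steps that
    occur, in order, in the document path (1.0 if the query has no path)."""
    q = [step for step in query_path.split("/") if step]
    d = [step for step in doc_path.split("/") if step]
    if not q:
        return 1.0
    matched, pos = 0, 0
    for step in q:
        if step in d[pos:]:
            pos = d.index(step, pos) + 1
            matched += 1
    return matched / len(q)


class ToyIndex:
    """Posting lists (document level) and peer lists (peer level), both
    partitioned over peers by hashing the term."""

    def __init__(self, num_peers: int):
        self.num_peers = num_peers
        self.postings = [defaultdict(list) for _ in range(num_peers)]
        self.peer_lists = [defaultdict(list) for _ in range(num_peers)]

    def add_posting(self, term, path, doc_id, weight):
        home = responsible_peer(term, self.num_peers)
        self.postings[home][term].append((doc_id, path, weight))

    def add_peer_entry(self, term, path, peer_id, weight):
        home = responsible_peer(term, self.num_peers)
        self.peer_lists[home][term].append((peer_id, path, weight))

    def _rank(self, store, term, query_path):
        # Scale each entry's weight by how well its path matches the query path.
        scores = defaultdict(float)
        for item_id, path, weight in store.get(term, []):
            scores[item_id] += weight * path_similarity(query_path, path)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    def rank_documents(self, term, query_path):
        home = responsible_peer(term, self.num_peers)
        return self._rank(self.postings[home], term, query_path)

    def rank_peers(self, term, query_path):
        home = responsible_peer(term, self.num_peers)
        return self._rank(self.peer_lists[home], term, query_path)
```

For a CAS-style query such as the term "routing" restricted to the path "/article/sec", `rank_documents("routing", "/article/sec")` prefers documents whose postings carry a matching path, and `rank_peers` does the same at peer level, which is the kind of evidence a router could use to select peers.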
• Evaluation with INEX-Collection of 2007:
  • Wikipedia collection: 660,000 documents (4.6 GB)
  • 80 CAS queries (out of 123 topics)
  • run on 1 peer with a simulated DHT (measurement of #postings)
  • retrieval of the best 1500 results per query
  • PLmax set to indefinite (all HDKs are single XTerms)
  • different structural similarity functions
  • simple version of the proposed formulas (document-based)
• Goal: show the effect of using structural hints for routing
• efficiency (#postings: 100, 500, 2000 postings)
• effectiveness (precision at different recall levels)
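To make "precision at different recall levels" concrete, here is a minimal sketch of the textbook interpolated precision computation for a single query. It is an assumption that this matches the measure used in the evaluation; the official INEX 2007 measures may differ in detail, and the function name and default recall levels are chosen for illustration only.

```python
def interpolated_precision(ranked, relevant, recall_levels=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return {recall level: interpolated precision} for one ranked result list.

    ranked   -- result ids in rank order (best first)
    relevant -- set of ids judged relevant for the query
    """
    if not relevant:
        raise ValueError("at least one relevant item is required")

    # (recall, precision) measured at the rank of each relevant hit.
    points = []
    hits = 0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    # Interpolation: best precision at any recall >= the requested level.
    return {
        level: max((p for r, p in points if r >= level), default=0.0)
        for level in recall_levels
    }
```

The efficiency side can be explored with the same function by truncating the ranked list to what the first 100, 500, or 2000 postings yield and comparing the resulting precision values.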
• Propose to take advantage of XML structure when routing in highly distributed environments such as P2P systems
• Provide an infrastructure for investigation of proposed techniques to perform routing based on evidence from document-, element-, collection-, and peer-level
• For 80 CAS topics of INEX 2007, efficiency and effectiveness could be improved