EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data Guoliang Li et al.

Dec 22, 2015

Transcript
Page 1: EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data Guoliang Li et al.

EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data

Guoliang Li et al.

Page 2

The Problem

Keyword search introduces false positives

e.g.: “Conference 2008 Canada Data Integration”

Page 3

The Problem

Websites are organized through content

“Dr Pain, Math 343, Linear Algebra”

Page 4

The Solution

Combine linked pages for search, ordered by ranking

Page 5

The Solution

r-Radius Steiner Graph Problem

r-Radius Graph

Centric distance of a node: the longest of its shortest paths to the other nodes

Radius of a graph: the minimal centric distance over all its nodes

[Figure: example graph with nodes r, s, t, u, v]
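The two definitions on this slide can be sketched in a few lines. This is a minimal illustration, assuming an undirected, unweighted graph stored as an adjacency dictionary, and assuming the centric distance of a node is the largest shortest-path distance from it to any other node. The function names and the example graph are hypothetical, not taken from the slides.

```python
from collections import deque


def shortest_distances(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def centric_distance(graph, node):
    """Largest shortest-path distance from node to any other node."""
    dist = shortest_distances(graph, node)
    if len(dist) < len(graph):
        return float("inf")  # some node is unreachable
    return max(dist.values())


def radius(graph):
    """Smallest centric distance over all nodes of the graph."""
    return min(centric_distance(graph, node) for node in graph)


# Hypothetical path graph: u - v - r - s - t
example = {
    "u": ["v"],
    "v": ["u", "r"],
    "r": ["v", "s"],
    "s": ["r", "t"],
    "t": ["s"],
}

print(centric_distance(example, "u"))  # end of the path
print(centric_distance(example, "r"))  # middle of the path
print(radius(example))
```

In this example the middle node r has the smallest centric distance, so it determines the radius of the graph.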

Page 6

The Solution

r-Radius Steiner Graph Problem

Content node: contains a keyword

Steiner node: lies on a path between two content nodes

[Figure: example graph with nodes r, s, t, u, v and keyword labels “Dr Pain” and “Math 343”]
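A rough sketch of how content nodes and Steiner nodes could be picked out of a graph for a query. It assumes an undirected graph, a hypothetical `keywords_of` mapping from node to keyword set, and, as a simplification, it treats a node as a Steiner node only if it lies on a shortest path between two content nodes. None of these names or choices come from the slides.

```python
from collections import deque
from itertools import combinations


def bfs(graph, source):
    """Hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def content_and_steiner_nodes(graph, keywords_of, query):
    """Return (content nodes, Steiner nodes) for a set of query keywords."""
    # Content nodes: nodes holding at least one query keyword.
    content = {n for n in graph if keywords_of.get(n, set()) & query}
    dist = {c: bfs(graph, c) for c in content}
    steiner = set()
    for a, b in combinations(sorted(content), 2):
        if b not in dist[a]:
            continue  # a and b are not connected
        for x in graph:
            if x in content or x not in dist[a] or x not in dist[b]:
                continue
            # x is on a shortest a-b path when the two legs add up exactly.
            if dist[a][x] + dist[b][x] == dist[a][b]:
                steiner.add(x)
    return content, steiner


# Hypothetical graph: u - r - v, with a side branch r - s - t
graph = {
    "u": ["r"],
    "v": ["r"],
    "r": ["u", "v", "s"],
    "s": ["r", "t"],
    "t": ["s"],
}
keywords_of = {"u": {"pain"}, "v": {"math"}}

content, steiner = content_and_steiner_nodes(graph, keywords_of, {"pain", "math"})
print(sorted(content), sorted(steiner))
```

Nodes that are neither content nodes nor Steiner nodes (here the side branch s and t) would be dropped from the answer under this reading.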

Page 7

r-Radius Steiner Graph on search

Example:

Page 8

r-Radius Steiner Graph on search

Page 9

r-Radius Steiner Graph on search

The graph model for the publication database

Page 10

Adjacency Matrix

Page 11

Finding r-Radius Graphs

Query: “Shanmugasundaram, Guo, XRANK”
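The slides do not spell out how the adjacency matrix is used, so the following is only one plausible reading: raise the adjacency matrix (with self-loops added) to the r-th power and treat every positive entry in row i as a node within r hops of node i. The function names and the small example matrix are hypothetical, and r is assumed to be at least 1.

```python
def mat_mul(a, b):
    """Plain square-matrix product on nested lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def nodes_within_r_hops(adjacency, r):
    """For each node i, the set of nodes reachable in at most r hops."""
    n = len(adjacency)
    # Self-loops let shorter paths survive in higher powers.
    m = [[1 if i == j else adjacency[i][j] for j in range(n)] for i in range(n)]
    power = m
    for _ in range(r - 1):
        power = mat_mul(power, m)
    return [{j for j in range(n) if power[i][j] > 0} for i in range(n)]


# Hypothetical path graph 0 - 1 - 2 - 3
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

for center, nodes in enumerate(nodes_within_r_hops(adjacency, 2)):
    print(center, sorted(nodes))
```

Each resulting node set, taken with the edges among its nodes, is a candidate subgraph whose radius is at most r, since the center reaches every member in at most r hops.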

Page 12

Avoiding Overlapping

Maximal r-Radius Graph
It is not contained in another r-Radius subgraph

But wait! There is still overlap

No problem:
Graph Clustering
Graph Partitioning
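The maximality condition above can be sketched as a containment filter over candidate node sets. This is an illustration only, assuming each candidate r-radius graph is represented by its set of node ids; `maximal_sets` and the sample sets are hypothetical.

```python
def maximal_sets(node_sets):
    """Keep only the node sets that are not strictly contained in another one."""
    unique = {frozenset(s) for s in node_sets}  # also drops exact duplicates
    kept = [s for s in unique if not any(s < other for other in unique)]
    return sorted(kept, key=lambda s: sorted(s))


candidates = [{0, 1, 2}, {0, 1, 2, 3}, {3, 4, 5}, {4, 5}]
maximal = maximal_sets(candidates)
print([sorted(s) for s in maximal])

# The surviving sets can still share nodes, which is the remaining overlap.
print(sorted(maximal[0] & maximal[1]))
```

The second print shows why the slide follows up with clustering and partitioning: filtering to maximal graphs removes containment but not partial overlap.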

Page 13

Graph Clustering

Page 14

Ranking

TF-IDF-based IR ranking (tf, idf, ndl) is ok

Better yet: structural compactness-based DB ranking (SIM)
More compact, more relevant
Ranking is inversely proportional to path length
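To make "more compact, more relevant" concrete, here is a toy compactness score. The slide only says that ranking falls as path length grows; the specific formula used here, 1 / (d + 1)^2 summed over pairs of content nodes at hop distance d, is an assumption for illustration, as are the function names and example graphs.

```python
from collections import deque
from itertools import combinations


def hop_distance(graph, source, target):
    """Shortest hop count between two nodes, or None if disconnected."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return None


def compactness(graph, content_nodes):
    """Assumed score: shorter paths between content nodes contribute more."""
    score = 0.0
    for a, b in combinations(sorted(content_nodes), 2):
        d = hop_distance(graph, a, b)
        if d is not None:
            score += 1.0 / (d + 1) ** 2
    return score


# Two hypothetical answers containing the same content nodes a and b
tight = {"a": ["b"], "b": ["a"]}
loose = {"a": ["x"], "x": ["a", "y"], "y": ["x", "b"], "b": ["y"]}

print(compactness(tight, {"a", "b"}))
print(compactness(loose, {"a", "b"}))
```

The answer whose content nodes are adjacent scores higher than the one where they are three hops apart, which matches the direction stated on the slide.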

Page 15

Indexing

IR score and Sim score are combined

An inverted index (EI-Index) is created

The inverted index stores keyword pairs and scores
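A small sketch of an inverted index keyed by keyword pairs. The slides do not say how the IR and Sim scores are combined or how the EI-Index is laid out, so the product in `combine`, the record format, and all names below are hypothetical stand-ins.

```python
from collections import defaultdict


def combine(ir_score, sim_score):
    # Hypothetical combination; the slides only say the two scores are combined.
    return ir_score * sim_score


def build_pair_index(records):
    """records: iterable of (graph_id, keyword_a, keyword_b, ir_score, sim_score)."""
    index = defaultdict(list)
    for graph_id, kw_a, kw_b, ir_score, sim_score in records:
        key = tuple(sorted((kw_a, kw_b)))  # pair order should not matter
        index[key].append((graph_id, combine(ir_score, sim_score)))
    for postings in index.values():
        postings.sort(key=lambda entry: entry[1], reverse=True)  # best first
    return dict(index)


def lookup(index, kw_a, kw_b):
    """Postings for a keyword pair, highest combined score first."""
    return index.get(tuple(sorted((kw_a, kw_b))), [])


# Hypothetical records with made-up scores
records = [
    ("g2", "xrank", "guo", 0.9, 0.25),
    ("g1", "guo", "xrank", 0.8, 0.5),
    ("g1", "guo", "shanmugasundaram", 0.6, 0.5),
]
index = build_pair_index(records)

print([graph_id for graph_id, _ in lookup(index, "xrank", "guo")])
```

Precomputing scores per keyword pair means a query can be answered by merging a few posting lists instead of traversing the graph at query time.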

Page 16

Experiments

Page 17

Results

Page 18

Results

Page 19

Results

Page 20

Results

Page 21

Strengths of the Paper

Very well written paper
Deep research on the topic
Mathematically grounded, with proofs
Compared against current methods as baselines
Good results

Page 22

Weakness and Future Work

It might be too complex
Could work on ways to find Steiner graphs faster
It doesn’t consider cases of farming sites or bogus sites

Page 23

Questions?