Effective Keyword Search in Relational Databases Fang Liu (University of Illinois at Chicago) Clement Yu (University of Illinois at Chicago) Weiyi Meng (Binghamton University) Abdur Chowdhury (America Online, Inc.)

Dec 16, 2015



Outline: Introduction; IR ranking in text databases; Our ranking strategy in RDBs; Experiments; Conclusions and future work

SIGMOD 2006: Effective Keyword Search in Relational Databases


Introduction

Why keyword search in relational databases?
We want to search text data in relational databases.
SQL with the “contains” operator is not suitable for non-expert users.
Keyword search has been tremendously successful in text databases, where documents are ranked by their similarity to the query, and it is easy for non-expert users.


Introduction
Text data in relational databases


Introduction
Suppose a user is looking for albums titled “off the wall”.


Introduction
Keyword search has been very successful in text databases, where documents are ranked by their similarity to the query; Google, Yahoo, and MSN Search are examples.

So, let’s do keyword search in relational databases! (DBXplorer, BANKS, DISCOVER and IR-style DISCOVER, ObjectRank, Ranking Objects)


Introduction
Let’s do it, but how?
What are the answers to be ranked? How should we rank these answers?


Introduction: an answer

An answer to a given query Q is a tuple tree in which every leaf node contains at least one keyword of Q.
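The leaf condition in this definition can be checked mechanically. The following is a minimal sketch, assuming a hypothetical representation in which each tuple is an id mapped to its text and the tree's join edges are listed explicitly; lower-cased word tokenization is also an assumption, and the tree shape itself is taken as given rather than verified.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-cased word tokens (a simplifying assumption: no stemming, no stop words)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def is_answer(tuples: dict[str, str], edges: list[tuple[str, str]], query: list[str]) -> bool:
    """Return True if every leaf of the tuple tree contains at least one query keyword.

    tuples: hypothetical tuple id -> text of that tuple
    edges:  join edges (e.g. foreign-key links) between tuple ids
    The tree shape (connected, acyclic) is assumed, not checked here.
    """
    keywords = {k.lower() for k in query}
    degree = {t: 0 for t in tuples}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # In an unrooted tree, leaves are nodes of degree <= 1 (a lone tuple counts as a leaf).
    leaf_ids = [t for t, d in degree.items() if d <= 1]
    return all(tokenize(tuples[t]) & keywords for t in leaf_ids)


tree = {"artist:1": "Michael Jackson", "album:7": "Off the Wall"}
joins = [("artist:1", "album:7")]
print(is_answer(tree, joins, ["michael", "wall"]))  # True: both leaves match a keyword
print(is_answer(tree, joins, ["off", "wall"]))      # False: the artist leaf matches nothing
```

The artist and album tuples are invented for illustration; the second call shows why a tree that merely drags in an unrelated tuple as a leaf is rejected.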


Introduction
We use a slightly modified version of the algorithm in [DISCOVER] to produce all answers for a given query.
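The sketch below is not DISCOVER, which generates candidate networks from the schema and evaluates them as joins. It is a hypothetical brute-force enumerator over a tiny in-memory tuple graph, included only to make the answer space concrete: it lists every connected, acyclic set of up to `max_size` tuples whose leaves all contain a query keyword. The names `enumerate_answers`, `max_size`, and the tuple ids are illustrative assumptions.

```python
import re
from itertools import combinations


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def induced_edges(nodes: set[str], edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    return [(a, b) for a, b in edges if a in nodes and b in nodes]


def is_tree(nodes: set[str], edges: list[tuple[str, str]]) -> bool:
    """A node set induces a tree iff it is connected and has exactly |nodes| - 1 edges."""
    sub = induced_edges(nodes, edges)
    if len(sub) != len(nodes) - 1:
        return False
    adj: dict[str, list[str]] = {n: [] for n in nodes}
    for a, b in sub:
        adj[a].append(b)
        adj[b].append(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:  # depth-first search for connectivity
        for m in adj[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return len(seen) == len(nodes)


def enumerate_answers(tuples: dict[str, str], edges: list[tuple[str, str]],
                      query: list[str], max_size: int = 3) -> list[tuple[str, ...]]:
    """Exponential brute force: all tuple trees up to max_size whose leaves contain a keyword."""
    keywords = {k.lower() for k in query}
    answers = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(tuples), size):
            nodes = set(combo)
            if not is_tree(nodes, edges):
                continue
            degree = {n: 0 for n in nodes}
            for a, b in induced_edges(nodes, edges):
                degree[a] += 1
                degree[b] += 1
            if all(tokenize(tuples[n]) & keywords for n, d in degree.items() if d <= 1):
                answers.append(combo)
    return answers


tuples = {"artist:1": "Michael Jackson", "album:7": "Off the Wall", "song:3": "Rock with You"}
edges = [("artist:1", "album:7"), ("album:7", "song:3")]
print(enumerate_answers(tuples, edges, ["michael", "wall"]))
# [('album:7',), ('artist:1',), ('album:7', 'artist:1')]
```

The three-tuple tree is rejected for this query because its `song:3` leaf contains neither keyword; ranking these answers is the separate problem the following slides address.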


Introduction: Ranking
Our focus is on the effectiveness problem of ranking answers: the more relevant an answer is to the user query, the higher it should be ranked.


Introduction: Contributions
We identify four new factors that are critical to effective ranking, and we propose a new ranking strategy.
We design and conduct comprehensive experiments on the effectiveness problem.
Experimental results show that our strategy is significantly more effective than existing work.


3.3 IR Ranking
$Q = (k_1, k_2, \ldots, k_n)$ is a query, $D$ is a document, and $\mathrm{Sim}(Q, D)$ is the ranking score of $D$:

$$\mathrm{Sim}(Q, D) = \sum_{k \in Q \cap D} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, D), \qquad \mathrm{weight}(k, D) = \frac{ntf \cdot idf}{ndl}$$

$$ntf = 1 + \ln\bigl(1 + \ln(tf)\bigr), \qquad idf = \ln\frac{N}{df + 1}, \qquad ndl = (1 - s) + s \cdot \frac{dl}{avgdl}$$

Examples (s = 0.2, values rounded):
tf = 2: ntf = 1.53; tf = 10: ntf = 2.2
df/N = 1/2: idf = 0.69; df/N = 1/100: idf = 4.6; df/N = 1/200,000: idf = 12
dl/avgdl = 1: ndl = 1; 1/2: ndl = 0.9; 1/10: ndl = 0.8; 2: ndl = 1.2; 10: ndl = 2.8
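The pivoted-normalization formulas above can be checked numerically. This is a minimal sketch that recomputes the example values; it assumes, as the slide does not say, that $\mathrm{weight}(k, Q)$ is simply the count of $k$ in the query.

```python
import math


def ntf(tf: int) -> float:
    """Normalized term frequency 1 + ln(1 + ln(tf)), defined for tf >= 1."""
    return 1.0 + math.log(1.0 + math.log(tf))


def idf(n_docs: int, df: int) -> float:
    """Inverse document frequency ln(N / (df + 1))."""
    return math.log(n_docs / (df + 1))


def ndl(dl: float, avgdl: float, s: float = 0.2) -> float:
    """Pivoted document length normalization (1 - s) + s * dl / avgdl."""
    return (1 - s) + s * dl / avgdl


def sim(query_tf: dict[str, int], doc_tf: dict[str, int], df: dict[str, int],
        n_docs: int, dl: float, avgdl: float, s: float = 0.2) -> float:
    """Sim(Q, D), summed over keywords present in both Q and D.

    weight(k, Q) is taken to be the raw count of k in the query (an assumption).
    """
    score = 0.0
    for k, qtf in query_tf.items():
        tf = doc_tf.get(k, 0)
        if tf > 0:
            score += qtf * ntf(tf) * idf(n_docs, df[k]) / ndl(dl, avgdl, s)
    return score


print(round(ntf(2), 2), round(ntf(10), 2))                     # 1.53 2.19
print([round(ndl(r, 1.0), 2) for r in (1, 0.5, 0.1, 2, 10)])   # [1.0, 0.9, 0.82, 1.2, 2.8]
```

The unrounded values (2.19 and 0.82) match the slide's 2.2 and 0.8 after rounding; the double logarithm in `ntf` is what keeps a fivefold increase in tf from raising the weight by more than about 45%.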


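Our Ranking Strategy
$T = (D_1, D_2, \ldots, D_n)$, so $\mathrm{Sim}(Q, D)$ becomes $\mathrm{Sim}(Q, T)$:

$$\mathrm{Sim}(Q, D) = \sum_{k \in Q \cap D} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, D) \quad\longrightarrow\quad \mathrm{Sim}(Q, T) = \sum_{k \in Q \cap T} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, T)$$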


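Our Ranking Strategy

$$\mathrm{Sim}(Q, T) = \sum_{k \in Q \cap T} \mathrm{weight}(k, Q) \cdot \mathrm{weight}(k, T)$$

$T = (D_1, D_2, \ldots, D_n)$, so $\mathrm{Sim}(Q, D)$ becomes $\mathrm{Sim}(Q, T)$, where

$$\mathrm{weight}(k, T) = \mathrm{Comb}\bigl(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_n)\bigr), \qquad \mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^{g}}{ndl \cdot Nsize(T)}$$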


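Our Ranking Strategy: Tuple Tree Size Normalization

$$\mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^{g}}{ndl \cdot Nsize(T)}, \qquad Nsize(T) = (1 - s) + s \cdot \frac{size(T)}{avgsize}$$

where $size(T)$ is the number of tuples in the tuple tree $T$.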
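A minimal sketch of the size normalization, to show its direction of effect. The slide does not give the value of $s$ for $Nsize(T)$ or define $avgsize$; this sketch assumes $s = 0.2$ (as in the IR ranking slide) and treats `avgsize` as a hypothetical average tuple-tree size supplied by the caller.

```python
def nsize(tree_size: int, avgsize: float, s: float = 0.2) -> float:
    """Tuple tree size normalization (1 - s) + s * size(T) / avgsize."""
    return (1 - s) + s * tree_size / avgsize


def weight_k_di(ntf: float, idf_g: float, ndl: float,
                tree_size: int, avgsize: float, s: float = 0.2) -> float:
    """weight(k, D_i) = ntf * idf^g / (ndl * Nsize(T))."""
    return ntf * idf_g / (ndl * nsize(tree_size, avgsize, s))


# Identical match statistics placed in tuple trees of 1, 2 and 3 tuples (avgsize = 2 assumed).
for size in (1, 2, 3):
    print(size, round(nsize(size, 2.0), 2), round(weight_k_di(1.0, 1.0, 1.0, size, 2.0), 3))
```

With these assumptions, $Nsize(T)$ is 0.9, 1.0 and 1.1 for trees of 1, 2 and 3 tuples, so the same keyword match is worth slightly less inside a larger tree.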


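Our Ranking Strategy: Document Length Normalization Reconsidered

$$\mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^{g}}{ndl \cdot Nsize(T)}, \qquad ndl = \left((1 - s) + s \cdot \frac{dl}{avgdl}\right) \cdot \bigl(1 + \ln(avgdl)\bigr)$$

where $dl$ is the document length of $D_i$ and $avgdl$ is the average document length of the text column of $D_i$.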
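The extra factor $1 + \ln(avgdl)$ depends on the column, not on the individual value. This minimal sketch evaluates the reconsidered $ndl$ for two hypothetical columns (short titles with $avgdl = 3$, long lyrics with $avgdl = 200$), each value having exactly its column's average length; the column names and lengths are illustrative assumptions.

```python
import math


def ndl_column(dl: float, avgdl: float, s: float = 0.2) -> float:
    """ndl = ((1 - s) + s * dl / avgdl) * (1 + ln(avgdl)), with avgdl taken per text column."""
    return ((1 - s) + s * dl / avgdl) * (1 + math.log(avgdl))


title_ndl = ndl_column(dl=3, avgdl=3)        # hypothetical short title column
lyrics_ndl = ndl_column(dl=200, avgdl=200)   # hypothetical long lyrics column
print(round(title_ndl, 2), round(lyrics_ndl, 2))
```

Here the pivoted part is 1 in both cases, so $ndl$ is about 2.10 for the title and about 6.30 for the lyrics; because $ndl$ divides the weight, a match in the short column counts roughly three times as much.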


Our Ranking Strategy: Document Frequency Normalization

$$\mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^{g}}{ndl \cdot Nsize(T)}, \qquad idf^{g} = \ln\frac{N^{g}}{df^{g} + 1}$$
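The slide does not define $N^{g}$ and $df^{g}$. This sketch assumes they are counted over all text-column values taken together as one collection, and contrasts that with an idf computed inside a single column. The column names and contents are hypothetical.

```python
import math
import re


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def column_idf(values: list[str], term: str) -> float:
    """idf computed inside one text column: ln(N / (df + 1))."""
    df = sum(1 for v in values if term in tokenize(v))
    return math.log(len(values) / (df + 1))


def global_idf(columns: dict[str, list[str]], term: str) -> float:
    """idf^g = ln(N^g / (df^g + 1)), pooling every text-column value (an assumption)."""
    docs = [v for values in columns.values() for v in values]
    df_g = sum(1 for d in docs if term in tokenize(d))
    return math.log(len(docs) / (df_g + 1))


columns = {
    "title": ["off the wall", "thriller", "bad", "dangerous", "invincible"],
    "lyrics": ["back against the wall", "wall of sound", "la la la", "hello again"],
}
print(column_idf(columns["title"], "wall"), column_idf(columns["lyrics"], "wall"))
print(global_idf(columns, "wall"))
```

Per column, "wall" gets $\ln(5/2)$ in the title column and $\ln(4/3)$ in the lyrics column; the pooled value $\ln(9/4)$ gives the same term one idf wherever it matches, which makes weights from different columns comparable before they are combined.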


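Our Ranking Strategy
$T = (D_1, D_2, \ldots, D_n)$; $maxWgt$ is the maximum of $\mathrm{weight}(k, D_i)$ and $sumWgt$ is the sum of $\mathrm{weight}(k, D_i)$ over the $D_i$ in $T$.

$$\mathrm{weight}(k, T) = \mathrm{Comb}\bigl(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_n)\bigr)$$

$$\mathrm{Comb}() = maxWgt \cdot \left(1 + \ln\left(1 + \ln\frac{sumWgt}{maxWgt}\right)\right)$$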
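A minimal sketch of $\mathrm{Comb}$, assuming that columns in which the keyword does not occur contribute a weight of 0 and are simply skipped (they change neither the maximum nor the sum).

```python
import math


def comb(weights: list[float]) -> float:
    """Comb = maxWgt * (1 + ln(1 + ln(sumWgt / maxWgt))) over the positive per-document weights."""
    positive = [w for w in weights if w > 0]
    if not positive:
        return 0.0  # the keyword occurs in no D_i of the tuple tree
    max_wgt = max(positive)
    sum_wgt = sum(positive)
    return max_wgt * (1 + math.log(1 + math.log(sum_wgt / max_wgt)))


print(comb([0.8]))                 # 0.8: a single match keeps its weight
print(round(comb([1.0, 1.0]), 2))  # 1.53: two equal matches, less than the plain sum of 2.0
```

Since $sumWgt / maxWgt \ge 1$, the result never falls below the strongest single match, while the double logarithm damps the reward for the same keyword appearing in many tuples of one tree.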


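Our Ranking Strategy
$T = (D_1, D_2, \ldots, D_n)$, so $\mathrm{Sim}(Q, D)$ becomes $\mathrm{Sim}(Q, T)$, with

$$\mathrm{weight}(k, T) = \mathrm{Comb}\bigl(\mathrm{weight}(k, D_1), \ldots, \mathrm{weight}(k, D_n)\bigr)$$

$$\mathrm{weight}(k, D_i) = \frac{ntf \cdot idf^{g}}{ndl \cdot Nsize(T)} \quad \text{in place of the text-database weight} \quad \mathrm{weight}(k, D) = \frac{ntf \cdot idf}{ndl}$$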
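Putting the pieces together, this is a minimal end-to-end sketch of $\mathrm{Sim}(Q, T)$ as summarized on this slide. It is an illustration under stated assumptions, not the authors' implementation: $\mathrm{weight}(k, Q)$ is the raw query count, one $s$ is shared by $ndl$ and $Nsize(T)$, $idf^{g}$ values are precomputed by the caller, and each $D_i$ is passed as its text plus its column's $avgdl$.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def ntf(tf: int) -> float:
    return 1 + math.log(1 + math.log(tf))


def ndl(dl: int, avgdl: float, s: float) -> float:
    return ((1 - s) + s * dl / avgdl) * (1 + math.log(avgdl))


def nsize(tree_size: int, avgsize: float, s: float) -> float:
    return (1 - s) + s * tree_size / avgsize


def comb(weights: list[float]) -> float:
    weights = [w for w in weights if w > 0]
    if not weights:
        return 0.0
    m = max(weights)
    return m * (1 + math.log(1 + math.log(sum(weights) / m)))


def sim_q_t(query: str, docs: list[tuple[str, float]], tree_size: int,
            avgsize: float, idf_g: dict[str, float], s: float = 0.2) -> float:
    """Sim(Q, T) for a tuple tree given as docs = [(text of D_i, avgdl of its column), ...].

    tree_size is the number of tuples in T (a tuple may hold several text columns).
    Assumptions: weight(k, Q) is the raw query count; one s for both ndl and Nsize.
    """
    norm_size = nsize(tree_size, avgsize, s)
    score = 0.0
    for k, qtf in Counter(tokenize(query)).items():
        per_doc = []
        for text, avgdl in docs:
            tokens = tokenize(text)
            tf = tokens.count(k)
            if tf:
                per_doc.append(ntf(tf) * idf_g.get(k, 0.0) / (ndl(len(tokens), avgdl, s) * norm_size))
        score += qtf * comb(per_doc)
    return score


docs = [("Off the Wall", 3.0), ("Michael Jackson", 2.0)]
print(sim_q_t("off the wall", docs, tree_size=2, avgsize=2.0, idf_g={"off": 1.5, "wall": 2.0}))
```

Keywords missing from `idf_g` (here "the") get an idf of 0 and add nothing, which is one simple way to handle stop words in a sketch like this.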


Our Ranking Strategy
Schema terms in queries, e.g. "lyrics for How come by D12", "usher the singer's lyrics to burn"

Phrase-based ranking: using position information to boost phrase matching

Concept-based ranking: can improve effectiveness and can assign semantics to answers
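Phrase matching with position information can be illustrated with a positional index. This is a generic sketch of the technique, not the authors' method: it only counts how often the query terms occur at consecutive positions in a text. How such a count would boost a score (for example a hypothetical factor such as `1 + beta * count`) is left open, as the slide does not say.

```python
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def positional_index(text: str) -> dict[str, set[int]]:
    """Map each token to the set of positions where it occurs."""
    index: dict[str, set[int]] = {}
    for pos, tok in enumerate(tokenize(text)):
        index.setdefault(tok, set()).add(pos)
    return index


def phrase_count(phrase: str, text: str) -> int:
    """Count occurrences of the phrase's terms at consecutive positions in text."""
    terms = tokenize(phrase)
    if not terms:
        return 0
    index = positional_index(text)
    return sum(
        1
        for start in index.get(terms[0], set())
        if all(start + i in index.get(t, set()) for i, t in enumerate(terms))
    )


print(phrase_count("off the wall", "Off the Wall"))     # 1
print(phrase_count("off the wall", "the wall is off"))  # 0: same words, wrong order
```

Both texts contain all three query words, so a bag-of-words score cannot separate them; only the positions reveal that the first one actually contains the title phrase.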


Experiments: data set
A lyrics database
50 queries from an AOL query log
Relevance judgment: pooling + logs


Experiments: some queries
"to me lyrics by lionel richie", "inner smile texas lyrics", "lionel richie lyrics", "lionel richie lyrics you mean more to me", "avril lavigne lyrics for the album under this skin", "avril lavigne lyrics"


Experiments: measures

Reciprocal rank: measures how well the system returns the first relevant answer; it equals 1/r, where r is the rank of the first relevant answer.

MAP (mean average precision): for each query, precision is computed after each relevant answer is retrieved, and these precision values are averaged; averaging over all queries gives a single number that measures overall effectiveness.
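Both measures are short computations over a ranked answer list and a set of relevant answers. This minimal sketch uses hypothetical answer ids. Average precision here divides by the total number of relevant answers, so relevant answers that are never retrieved count as zero; that is the usual TREC convention and an assumption about what the slide's "average all precision values" means.

```python
def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant answer, or 0 if none is returned."""
    for rank, answer in enumerate(ranked, start=1):
        if answer in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Average of the precision values taken at each relevant answer, over all relevant answers."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, answer in enumerate(ranked, start=1):
        if answer in relevant:
            hits += 1
            total += hits / rank  # precision after this relevant answer
    return total / len(relevant)


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


runs = [(["a", "b", "c", "d"], {"b", "d"}), (["a", "b", "c", "d"], {"a", "c"})]
print(mean([reciprocal_rank(r, rel) for r, rel in runs]))    # 0.75
print(mean([average_precision(r, rel) for r, rel in runs]))  # about 0.667
```

In the first run the relevant answers sit at ranks 2 and 4, giving precisions 1/2 and 2/4 and an average precision of 0.5; in the second they sit at ranks 1 and 3, giving 1 and 2/3.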


Experiments: results
Our ranking strategy: the four new factors


Experiments: results
Comparison with related work


Conclusions
Effectiveness is as important as efficiency.
The four new factors are critical to search effectiveness.
Our strategy is significantly more effective than related work.


Future work
Utilize link analysis
Combine non-text columns
The efficiency problem
More real-world data sets


Questions?
