Page 1

Web Query Disambiguation from Short Sessions

Lilyana Mihalkova* and Raymond Mooney

University of Texas at Austin

*Now at University of Maryland College Park

Page 2

Web Query Disambiguation

[Figure: the ambiguous query "scrubs", followed by a question mark: which result did the user intend?]

Page 3

Existing Approaches

• Well-studied problem
  – e.g., [Sugiyama et al. 04, Sun et al. 05, Dou et al. 07]

• Build a user profile from a long history of that user’s interactions with the search engine

Page 4

Concerns

• Privacy concerns
  – AOL data release
  – "Googling Considered Harmful" [Conti 06]

• Pragmatic concerns
  – Storing and protecting data
  – Identifying users across searches

Page 5

Proposed Setting

• Base personalization only on the short glimpses of user search activity captured within brief sessions

• Do not assume across-session user identifiers of any sort

Page 6

How Short is Short-Term?

[Figure: histogram; x-axis: number of queries before the ambiguous query; y-axis: number of sessions with that many queries]

Page 7

Is This Enough Info?

[Figure: two example sessions, each ending with the ambiguous query "scrubs"
  Session 1: 98.7 fm → www.star987.com; kroq → www.kroq.com; scrubs → ???
  Session 2: huntsville hospital → www.huntsvillehospital.com; ebay.com → www.ebay.com; scrubs → ???
  Candidate results: scrubs-tv.com, scrubs.com]

Page 8

More Closely Related Work

• [Almeida & Almeida 04]: similar assumption of short sessions, but better suited to a specialized search engine (e.g., one over computer science literature)

Page 9

Main Challenge

• How to harness this small amount of potentially noisy information available for each user?
  – Exploit the relations among sessions, URLs, and queries

Page 10

• Relationships are established by shared queries or clicks that are themselves related
• Query strings and URLs are related by sharing identical keywords, or by being identical
• The choices users make over the set of possible results are also interdependent

Page 11

Exploiting Relational Information

• To overcome sparsity in individual sessions, need techniques that can effectively combine effects of noisy relations of varying types among entities
  – Use statistical relational learning (SRL) [Getoor & Taskar 07]
  – We used one particular SRL model: Markov logic networks (MLNs) [Richardson & Domingos 06]

Page 12

Rest of This Talk

• Background on MLNs
• Details of our model
  – Ways of relating users
  – Clauses defined in the model
• Experimental results
• Future work

Page 13

Markov Logic Networks (MLNs)

• Set of weighted first-order clauses
• The larger the weight of a clause, the greater the penalty for not satisfying a grounding of this clause
  – Clauses can be viewed as relational features

[Richardson & Domingos 06]

Page 14

MLNs Continued

• Q: set of unknown query atoms
• E: set of evidence atoms
• What is P(Q|E), according to an MLN?

  P(Q = q \mid E = e) = \frac{1}{Z_e} \exp\Big( \sum_i w_i \, n_i(q, e) \Big)

  where the sum runs over all formulas, n_i(q, e) is the number of satisfied groundings of formula i, and Z_e is the normalizing constant.
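To illustrate the formula, the toy Python sketch below computes P(Q|E) by enumerating candidate assignments to two click atoms; the two counting functions and the weights are made up for illustration and merely stand in for the model's real clauses.

import math
from itertools import product

def p_given_e(weights, counters, candidate_assignments, evidence):
    """P(Q = q | E = e) = exp(sum_i w_i * n_i(q, e)) / Z_e, where Z_e sums the
    exponentiated scores over all candidate assignments q."""
    scores = {}
    for q in candidate_assignments:
        scores[q] = math.exp(sum(w * n(q, evidence) for w, n in zip(weights, counters)))
    z = sum(scores.values())                      # normalizing constant Z_e
    return {q: s / z for q, s in scores.items()}

# Toy example: one Boolean click atom per candidate result for the query "scrubs".
results = ("scrubs-tv.com", "scrubs.com")
assignments = list(product([False, True], repeat=len(results)))
evidence = {"related_session_clicked": "scrubs-tv.com"}

counters = [
    # collaborative-style formula: clicked results that a related session also clicked
    lambda q, e: sum(1 for r, clicked in zip(results, q)
                     if clicked and r == e["related_session_clicked"]),
    # balance-style formula: pairs of results that are both clicked
    lambda q, e: sum(1 for i in range(len(q)) for j in range(i + 1, len(q))
                     if q[i] and q[j]),
]
weights = [1.5, -2.0]   # made-up weights; learned from data in the real system

probs = p_given_e(weights, counters, assignments, evidence)
for q, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(dict(zip(results, q)), round(p, 3))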

Page 15

MLN Learning and Inference

• A wide range of algorithms available in the Alchemy software package [Kok et al. 05]

• Weight learning: we used contrastive divergence [Lowd & Domingos 07]
  – Adapted it for streaming data

• Inference: we used MC-SAT [Poon & Domingos 06]
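For intuition, here is a generic contrastive-divergence-style weight update (a sketch of the general idea only, not the Alchemy implementation or the paper's streaming adaptation): each weight moves in the direction of the difference between the clause counts observed in the data and the counts in a sample from the current model.

def cd_weight_update(weights, observed_counts, sampled_counts, lr=0.01):
    """One contrastive-divergence-style step: increase a clause's weight when
    its groundings are satisfied more often in the data than in a sample drawn
    from the current model (e.g., via a few steps of MCMC such as MC-SAT)."""
    return [w + lr * (obs - samp)
            for w, obs, samp in zip(weights, observed_counts, sampled_counts)]

# Toy usage: clause 1 is under-satisfied by the model sample, clause 2 over-satisfied.
w = [0.0, 0.0]
w = cd_weight_update(w, observed_counts=[3, 1], sampled_counts=[1, 2])
print(w)   # [0.02, -0.01]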

Page 16

Re-Ranking of Search Results

[Figure: a list of candidate results with scores 0.1, 0.4, 0.9, 0.5, 0.3, 0.7 is passed through the MLN and re-ranked by those scores]

• Hand-code a set of formulas to capture regularities in the domain
  – Challenge: define an effective set of relations
• Learn weights over these formulas from the data
  – Challenge: noisy data
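A minimal sketch of the re-ranking step pictured above (result names are placeholders): the candidate results are simply reordered by the click probabilities the MLN assigns to them.

def rerank(scored_results):
    """Reorder candidate results by the model's estimated click probability,
    highest first; the input order is the search engine's original ranking."""
    return [r for r, _ in sorted(scored_results, key=lambda rs: rs[1], reverse=True)]

original = [("resultA", 0.1), ("resultB", 0.4), ("resultC", 0.9),
            ("resultD", 0.5), ("resultE", 0.3), ("resultF", 0.7)]
print(rerank(original))   # ['resultC', 'resultF', 'resultD', 'resultB', 'resultE', 'resultA']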

Page 17

Specific Relationships

[Figure: a session containing the queries "huntsville hospital", "ebay", and the ambiguous query "scrubs" (clicks: huntsvillehospital.org, ebay.com, ???) is linked to other sessions (e.g. ones with "huntsville school" and hospitallink.com, or with ebay.com) that issued "scrubs" and clicked scrubs-tv.com or scrubs.com]

Page 18

Collaborative Clauses

• The user will click on the result chosen by sessions related via:
  – a shared click
  – a shared keyword (click-to-click, click-to-search, search-to-click, or search-to-search)

[Figure: the current session (some query → www.someplace1.com, followed by the ambiguous query) is related to a previous session that issued the ambiguous query and clicked www.clickedResult.com, either through the shared click www.someplace1.com or through a keyword shared between the sessions' searches and clicks (e.g. www.aClick.com, some other query)]
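The collaborative clauses can be approximated procedurally as in the sketch below (a simplified illustration with hypothetical session dictionaries, not the actual MLN clauses): previous sessions related to the current one by a shared click or a shared keyword vote for the result they clicked for the ambiguous query.

import re

def _keywords(text):
    """Tokenize a query or URL into keywords (same idea as the earlier sketch)."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower())
            if t not in {"www", "com", "org", "net", ""}}

def collaborative_votes(current, previous_sessions, ambiguous_query):
    """Count, for each candidate result, how many related previous sessions
    clicked it for the ambiguous query. A previous session counts as related
    if it shares a clicked URL with the current session (shared click) or any
    keyword between the two sessions' searches and clicks (shared keyword)."""
    votes = {}
    cur_keys = set()
    for x in current["searches"] + current["clicks"]:
        cur_keys |= _keywords(x)
    for prev in previous_sessions:
        prev_keys = set()
        for x in prev["searches"] + prev["clicks"]:
            prev_keys |= _keywords(x)
        related = (set(current["clicks"]) & set(prev["clicks"])) or (cur_keys & prev_keys)
        if related and ambiguous_query in prev.get("click_for", {}):
            result = prev["click_for"][ambiguous_query]
            votes[result] = votes.get(result, 0) + 1
    return votes

# Toy usage with made-up sessions
current = {"searches": ["98.7 fm", "kroq", "scrubs"],
           "clicks": ["www.star987.com", "www.kroq.com"]}
previous = [{"searches": ["kroq", "scrubs"], "clicks": ["www.kroq.com"],
             "click_for": {"scrubs": "scrubs-tv.com"}}]
print(collaborative_votes(current, previous, "scrubs"))   # {'scrubs-tv.com': 1}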

Page 19

Popularity Clause

• User will choose the result chosen by any previous session, regardless of whether it is related
  – Effect is that the most popular result becomes the most likely pick

Page 20

Local Clauses

• User will choose result that shares a keyword with a previous search or click in the current session

[Figure: within the current session, a keyword shared between a previous query or click (some query, www.someplace1.com) and one of the candidate results (www.someResult.com, www.anotherPossibility.com, www.yetAnother.com) favors that result]

Not effective because of data sparsity

Page 21

Balance Clause

• If the user chooses one of the results, she will not choose another
  – Sets up a competition among possible results
  – Allows the same set of weights to work well for different-size problems
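As a rough illustration (not the actual clause syntax or weights), the balance clause acts like a soft penalty on every pair of results both predicted as clicked, which is what lets a single weight setting work for result sets of different sizes.

def balance_penalty(predicted_clicks, weight=1.0):
    """Count violated groundings of the balance clause, i.e. pairs of candidate
    results both marked as clicked, scaled by the clause weight. Subtracting
    this from the model score pushes the model toward a single chosen result,
    no matter how many candidates there are."""
    clicked = [r for r, c in predicted_clicks.items() if c]
    violated_pairs = len(clicked) * (len(clicked) - 1) // 2
    return weight * violated_pairs

print(balance_penalty({"scrubs-tv.com": True, "scrubs.com": True}))   # 1.0
print(balance_penalty({"scrubs-tv.com": True, "scrubs.com": False}))  # 0.0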

Page 22

Empirical Evaluation: Data

• Provided by MSR
• Collected in May 2006 on MSN Search
• First 25 days used for training, last 6 days for testing
× Does not specify which queries are ambiguous
  • Used the DMOZ hierarchy of pages to establish ambiguity
× Only lists actually clicked results, not all results that were seen
  • Assumed the user saw all results that were clicked at least once for that exact query string
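One plausible way to operationalize the DMOZ-based ambiguity check is sketched below; the category lookup and the more-than-one-top-level-category criterion are assumptions for illustration, and the exact procedure in the paper may differ.

def is_ambiguous(clicked_urls, dmoz_top_category):
    """Flag a query as ambiguous if the URLs clicked for it fall under more
    than one top-level DMOZ category. `dmoz_top_category` is assumed to map a
    URL to its top-level category name, or None if the URL is not listed."""
    categories = {dmoz_top_category(url) for url in clicked_urls}
    categories.discard(None)   # ignore URLs not found in the hierarchy
    return len(categories) > 1

# Toy lookup table standing in for the DMOZ hierarchy
toy_dmoz = {"scrubs-tv.com": "Arts", "scrubs.com": "Shopping"}
print(is_ambiguous(["scrubs-tv.com", "scrubs.com"], toy_dmoz.get))  # True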

Page 23

Empirical Evaluation: Models Tested

• Random re-ranker
• Popularity
• Standard collaborative filtering baselines based on the preferences of a set of the most closely similar sessions
  – Collaborative-Pearson
  – Collaborative-Cosine
• MLNs
  – Collaborative + Popularity + Balance
  – Collaborative + Balance
  – Collaborative + Popularity + Local + Balance

In Paper
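For the collaborative-filtering baselines, session similarity can be computed as in this sketch (a generic cosine similarity over bag-of-events session vectors; the exact feature construction used in the experiments is not specified here).

import math
from collections import Counter

def cosine(session_a, session_b):
    """Cosine similarity between two sessions represented as bags of events
    (queries and clicked URLs). The Collaborative-Cosine baseline would rank a
    candidate result by the preferences of the most similar previous sessions."""
    a, b = Counter(session_a), Counter(session_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

s1 = ["kroq", "www.kroq.com", "scrubs"]
s2 = ["kroq", "www.kroq.com", "98.7 fm"]
print(round(cosine(s1, s2), 3))   # 0.667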

Page 24

Empirical Evaluation: Measure

• Area under the ROC curve (mean average true negative rate)

In Paper: Area under precision-recall curve, aka mean average precision

Page 25

AUC-ROC Intuitive Interpretation

[Figure: a ranked list of candidate results with scores 0.1, 0.4, 0.9, 0.5, 0.3, 0.7]

• Assuming that the user scans results from the top until a relevant one is found,

• AUC-ROC captures what proportion of the irrelevant results are not considered by the user
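That intuition corresponds to the following computation for the single-relevant-result case (ties split evenly, the usual convention; the scores are the ones from the illustration).

def auc_roc_single_relevant(scores, relevant_index):
    """AUC-ROC when exactly one result is relevant: the fraction of irrelevant
    results ranked below the relevant one (ties count as half), i.e. the
    proportion of irrelevant results a top-down scanning user never considers."""
    rel = scores[relevant_index]
    others = [s for i, s in enumerate(scores) if i != relevant_index]
    below = sum(1.0 for s in others if s < rel) + 0.5 * sum(1.0 for s in others if s == rel)
    return below / len(others)

# Suppose the result scored 0.7 is the relevant one.
print(auc_roc_single_relevant([0.1, 0.4, 0.9, 0.5, 0.3, 0.7], relevant_index=5))  # 0.8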

Page 26

AUC-ROC

[Figure: bar chart of AUC-ROC results, y-axis from 0.50 to 0.59, for Random, Collaborative-Pearson, Collaborative-Cosine, Popularity, and the MLN (Collaborative + Popularity + Balance)]

Page 27

Difficulty Levels

[Figure: average-case AUC-ROC (0.45 to 0.65) of the MLN vs. Popularity across eight difficulty levels of increasing easiness; inset: distribution of the proportion clicked over the possible results for a given query (KL)]

Page 28

Difficulty Levels: Worst Case

[Figure: worst-case AUC-ROC (0.3 to 0.65) of the MLN vs. Popularity across eight difficulty levels of increasing easiness]

Page 29

Future Directions

• Incorporate more info into the models
  – How to retrieve relevant information quickly
• Learn more nuanced models
  – Cluster shared clicks/keywords and learn separate weights for clusters
• More advanced measures of user relatedness
  – Incorporate similarity measures
  – Use time spent on a page as an indicator of interest

Page 30

Thank you

• Questions?

Page 31

First-Order Logic

• Relations and attributes are represented as predicates
• Literals are predicates applied to variables or constants, e.g. WorkedFor(A, B) and Actor(A), where WorkedFor and Actor are predicates and A, B are variables
• A ground literal ("grounding" for short) contains only constants, e.g. WorkedFor(brando, coppola)

Page 32

Clauses

• Dependencies and regularities among the predicates are represented as clauses:

  Movie(T, A) ∧ Director(A) ⇒ Actor(A)
  (premises: Movie(T, A), Director(A); conclusion: Actor(A))

• To obtain a grounding of a clause, replace all variables with entity names:

  Movie(godfather, pacino) ∧ Director(pacino) ⇒ Actor(pacino)
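The grounding substitution described above can be sketched as follows (a toy representation; a real MLN engine such as Alchemy performs this internally).

from itertools import product

def groundings(clause_literals, constants):
    """Enumerate all groundings of a clause by substituting every combination
    of entity names for its variables. A literal is (predicate, args), where a
    single uppercase letter is treated as a variable and anything else as a
    constant."""
    variables = sorted({a for _, args in clause_literals for a in args
                        if len(a) == 1 and a.isupper()})
    for combo in product(constants, repeat=len(variables)):
        subst = dict(zip(variables, combo))
        yield [(pred, tuple(subst.get(a, a) for a in args))
               for pred, args in clause_literals]

# Clause: Movie(T, A) ∧ Director(A) ⇒ Actor(A)
clause = [("Movie", ("T", "A")), ("Director", ("A",)), ("Actor", ("A",))]
for g in groundings(clause, ["godfather", "pacino"]):
    print(g)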