Diversifying Search Results
WSDM 2009
Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, Samuel Ieong
Microsoft Research
Presented by Sung Eun, Park
1/25/2011
Intelligent Database Systems Lab.
School of Computer Science & Engineering
Seoul National University
Center for E-Business Technology
Seoul National University
Seoul, Korea
Copyright 2010 by CEBT
Contents
Introduction
Intuition
Preliminaries
Model
Problem Formulation
Complexity
Greedy algorithm
Evaluation
Measure
Empirical analysis
Introduction
Ambiguity and diversification
For ambiguous queries, diversification may help users find at least one relevant document.
Ex) The other day, we were trying to find the meaning of the word “왕건” (wanggeon).
– In the context of “우와 저거 진짜 왕건이다” ("Wow, that is a real 왕건")
– But the search results were all about the king of Goryeo
[Figure: search results for King 왕건 (Wang Geon) versus 왕건 as slang for a big thing]
Preliminaries
Problem Formulation
d fails to satisfy a user who issues query q with the intended category c
Multiple intents
The probability that some document will satisfy category c
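The annotations above name the pieces of the objective P(S | q) without spelling it out. The following is a minimal Python sketch, assuming the formulation from the original paper, P(S | q) = Σ_c P(c | q) (1 − Π_{d ∈ S} (1 − V(d | q, c))). The function name `objective`, the dictionary layout, and the toy numbers are hypothetical and chosen only for illustration.

```python
from math import prod


def objective(selected, intent_prob, quality):
    """Sketch of P(S | q): chance that at least one selected document
    satisfies the user, averaged over the intents of the query.

    intent_prob: {category: P(c | q)}
    quality:     {doc: {category: V(d | q, c)}}  (missing category means 0)
    """
    total = 0.0
    for c, p_c in intent_prob.items():
        # Probability that every selected document fails for intent c.
        all_fail = prod(1.0 - quality[d].get(c, 0.0) for d in selected)
        # Probability that some selected document satisfies intent c.
        total += p_c * (1.0 - all_fail)
    return total


# Hypothetical toy data: two intents for one ambiguous query.
intent_prob = {"king": 0.7, "slang": 0.3}
quality = {
    "doc_a": {"king": 0.9},
    "doc_b": {"king": 0.8},
    "doc_c": {"slang": 0.6},
}

redundant = objective(["doc_a", "doc_b"], intent_prob, quality)
diverse = objective(["doc_a", "doc_c"], intent_prob, quality)
print(redundant, diverse)
```

Under these assumed numbers, the set that covers both intents scores higher than the set with two strong documents for the same intent, which is the effect the objective is meant to reward.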
Complexity
A Greedy Algorithm
Let R(q) be the top k documents selected by some classical ranking algorithm for the target query. The algorithm reorders R(q) to maximize the objective P(S | q).
Input: k, q, C, D, P(c | q), V(d | q, c)
Output: set of documents S
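Worked example (reconstructed from the slide figure; two intents R and B, with U(R | q) = 0.8 and U(B | q) = 0.2; the intent of each row is inferred from g = U × V):

D (in listed order)   Intent   V(d | q, c)   g(d | q, c)
1st                   B        0.4           0.2 × 0.4 = 0.08
2nd                   R        0.9           0.8 × 0.9 = 0.72
3rd                   R        0.5           0.8 × 0.5 = 0.40
4th                   R        0.4           0.8 × 0.4 = 0.32
5th                   B        0.4           0.2 × 0.4 = 0.08

Step 1: select the 2nd document (g = 0.72). U(R | q) becomes 0.8 × (1 − 0.9) = 0.08, so g for the remaining R documents drops to 0.08 × 0.5 = 0.04 and 0.08 × 0.4 ≈ 0.03, while both B documents stay at 0.08.
Step 2: select one of the two B documents (g = 0.08). U(B | q) becomes 0.2 × (1 − 0.4) = 0.12, so g for the other B document is 0.12 × 0.4 ≈ 0.05.
Step 3: select the other B document (g ≈ 0.05). U(B | q) becomes 0.12 × (1 − 0.4) ≈ 0.07.
S: the documents with V = 0.9 (R), 0.4 (B), and 0.4 (B).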
• Produces an ordered set of results
• Results not proportional to intent distribution
• Results not according to (raw) quality
Greedy Algorithm (IA-SELECT)
Input: k, q, C, D, P(c | q), V (d | q, c)
Output : set of documents S
When documents may belong to multiple categories, IA-SELECT is no longer guaranteed to be optimal. (Note that the problem is NP-hard.)
S = ∅
∀c ∈ C, U(c | q) ← P(c | q)
while |S| < k do
    for d ∈ D do
        g(d | q, c) ← Σ_c U(c | q) V(d | q, c)
    end for
    d∗ ← argmax g(d | q, c)
    S ← S ∪ {d∗}
    ∀c ∈ C, U(c | q) ← (1 − V(d∗ | q, c)) U(c | q)
    D ← D \ {d∗}
end while
Marginal Utility
U(c | q): conditional probability of intent c given query q
g(d | q, c): current probability of d satisfying q, c
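The IA-SELECT steps can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation. The function name `ia_select`, the dictionary layout, the document labels d1 to d5, and the rule that ties go to the earlier document are assumptions. The numbers are those of the "A Greedy Algorithm" slide (U(R | q) = 0.8, U(B | q) = 0.2, V = 0.4, 0.9, 0.5, 0.4, 0.4), with each document's intent inferred from the g values shown there.

```python
def ia_select(k, intent_prob, quality):
    """Greedy selection in the spirit of IA-SELECT (illustrative sketch).

    intent_prob: {category: P(c | q)}
    quality:     {doc: {category: V(d | q, c)}}, in ranked order
    Returns the ordered selection S and the final U(c | q) values.
    """
    utility = dict(intent_prob)   # U(c | q) starts at P(c | q)
    candidates = list(quality)    # D, kept in input order
    selected = []

    def gain(d):
        # Marginal utility g(d | q, c), summed over the categories of d.
        return sum(utility.get(c, 0.0) * v for c, v in quality[d].items())

    while len(selected) < k and candidates:
        best = max(candidates, key=gain)   # max() keeps the first of tied items
        selected.append(best)
        # Discount every intent that the chosen document may have satisfied.
        for c, v in quality[best].items():
            if c in utility:
                utility[c] *= 1.0 - v
        candidates.remove(best)
    return selected, utility


# Slide example; labels d1..d5 are hypothetical names for the five rows.
intent_prob = {"R": 0.8, "B": 0.2}
quality = {
    "d1": {"B": 0.4},
    "d2": {"R": 0.9},
    "d3": {"R": 0.5},
    "d4": {"R": 0.4},
    "d5": {"B": 0.4},
}

selected, utility = ia_select(3, intent_prob, quality)
print(selected)
print(utility)
```

With these inputs the sketch picks the strongest R document first and then both B documents, leaving U(R | q) near 0.08 and U(B | q) near 0.07, in line with the slide's remarks that the result is neither proportional to the intent distribution nor ordered by raw quality.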