MULTIPLE INTENTS RE-RANKING
By: Yossi Azar, Iftah Gamzu, Xiaoxin Yin
In Proceedings of STOC 2009, pp. 669-678
Presented By: Bhawana Goel
Mar 22, 2016
WEB SEARCH AND RANKING
Search results are ranked on the basis of:
  Hyperlink structure of the web
  Content of the web page
  User’s location
Not much research has been done on the user’s “intent”
INTENT
Same query, different intents
Example query: “computer science at A&M”
  Information about the computer science department at A&M
  Information about admission to the computer science department at A&M
INTRODUCTION
PROBLEM STATEMENT
20% of web queries are ambiguous
Different user types have different intents
The goal is to minimize the average effort of browsing through the search results
Re-rank the web results
OPTIMAL ORDERING?
[Figure: alternative orderings of results 1, 2 and 3]
Minimize the average effort over all user types
TYPES OF INTENTS
Navigational: the first result is relevant
Informational: all the results are relevant
Complex: the first and third results are relevant
OVERVIEW
Each user type has its own profile vector over its subset of relevant pages, e.g. <1,0,…,0>, <0,0,…,1>, <1,1,…,1>
The elements of the vector correspond to positions, not to particular pages
The order of the result pages is not part of the vector; it is determined by the search engine
The profile vector depicts:
  Intention: the type of need behind the query
  Proportion of users: <1,0,0> stands for one user, <100,0,0> for 100 users
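CALCULATION OF USER EFFORT
In this example the user type's relevant results appear at positions 2, 4 and 5 of the ranking; the effort is the sum of position × weight.
Navigational (<1,0,0>): 2*1 = 2
Informational (<1,1,1>): 2*1 + 4*1 + 5*1 = 11
Complex (<0.4,0.4,0.2>): 2*0.4 + 4*0.4 + 5*0.2 = 3.4
[Figure: example ranking with profile vectors]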
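The three calculations above can be reproduced with a short sketch. The function name `user_effort` and the reading that the relevant results sit at positions 2, 4 and 5 are assumptions made for illustration; they are inferred from the arithmetic on the slide.

```python
def user_effort(positions, profile):
    """Effort of one user type: sorted positions of its relevant results,
    multiplied entry by entry with its profile (weight) vector."""
    if len(positions) != len(profile):
        raise ValueError("need one weight per relevant result")
    return sum(p * w for p, w in zip(sorted(positions), profile))

# Assumed example: relevant results at positions 2, 4 and 5 of the ranking.
positions = [2, 4, 5]
navigational = user_effort(positions, [1, 0, 0])        # 2*1
informational = user_effort(positions, [1, 1, 1])       # 2 + 4 + 5
complex_type = user_effort(positions, [0.4, 0.4, 0.2])  # 0.8 + 1.6 + 1.0
```

The positions are sorted first because the profile vector refers to the first, second, ... relevant result encountered, not to a particular page.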
PROBLEM FORMULATION
Form a weighted hypergraph with
  Vertices = web results
  Hyperedges = user types
  Weights = user profiles
[Figure: example ordering of the vertices with hyperedges e1 and e2]
Overhead per hyperedge:
  e2(1,2,3)*<1,0,0> = 1
  e1(2,4,5)*<15,20,25> = 235
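As a minimal sketch of this formulation, the total overhead of an ordering can be computed by summing position × weight over every hyperedge. The vertex names `a` to `e` and the membership of the two hyperedges are hypothetical; they are chosen only so that e2 lands on positions (1,2,3) and e1 on positions (2,4,5), as in the slide's numbers.

```python
def total_effort(ordering, hyperedges):
    """Sum, over all hyperedges, of (position of its i-th covered vertex) * (i-th weight)."""
    position = {v: i for i, v in enumerate(ordering, start=1)}
    total = 0
    for vertices, weights in hyperedges:
        covered_at = sorted(position[v] for v in vertices)
        total += sum(t * w for t, w in zip(covered_at, weights))
    return total

# Hypothetical instance reproducing the slide's two overheads.
ordering = ["a", "b", "c", "d", "e"]
hyperedges = [
    ({"b", "d", "e"}, [15, 20, 25]),  # e1: positions 2, 4, 5 -> 235
    ({"a", "b", "c"}, [1, 0, 0]),     # e2: positions 1, 2, 3 -> 1
]
overhead = total_effort(ordering, hyperedges)
```

The re-ranking problem is then to find the ordering that minimizes this total.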
SPECIAL CASES
All user profiles are of type <1,0,…,0>
  This is the min-sum set cover problem
  It is NP-hard
  It has an approximation ratio of 4
Greedily pick the element that covers the largest number of uncovered sets.
[Figure: example sets over the elements A, B, C, F, G, I]
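The greedy rule for min-sum set cover can be sketched as follows. The example sets are hypothetical (the slide's figure is not recoverable), and breaking ties alphabetically is an assumption added to make the run deterministic.

```python
def greedy_min_sum_set_cover(sets):
    """Repeatedly pick the element contained in the most uncovered sets.

    Returns the chosen order and the sum of the sets' cover times.
    """
    uncovered = [set(s) for s in sets if s]
    ordering, cost, step = [], 0, 0
    while uncovered:
        step += 1
        counts = {}
        for s in uncovered:
            for v in s:
                counts[v] = counts.get(v, 0) + 1
        # Ties are broken alphabetically (an assumption, not from the slides).
        best = max(sorted(counts), key=counts.get)
        ordering.append(best)
        cost += step * sum(1 for s in uncovered if best in s)
        uncovered = [s for s in uncovered if best not in s]
    return ordering, cost

# Hypothetical instance.
example_sets = [{"A", "B"}, {"A", "C"}, {"A", "F"}, {"B", "G"}, {"C", "I"}]
order, cost = greedy_min_sum_set_cover(example_sets)
```

Here A covers three sets at time 1, B one set at time 2 and C one set at time 3, for a total cost of 3 + 2 + 3 = 8.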
SPECIAL CASES
All user profiles are of type <0,0,…,1>
  This is the minimum-latency set cover problem
  It is NP-hard
  It has an e-approximation algorithm
CASE 1: NON-INCREASING WEIGHT VECTORS
Generalization of the min-sum set cover problem
Greedy weight reduction algorithm
Approximation ratio of 4
[Figure: vertices A to G and hyperedges with weight vectors (4,1,0), (3,0) and (2,2,0); the ordering built so far is A, then A F]
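A minimal sketch of a greedy weight reduction step, under one reading of the slides: the weight of a vertex is the sum, over the hyperedges containing it, of the next unused entry of each hyperedge's weight vector. The instance below is hypothetical; it reuses the weight vectors (4,1,0), (3,0) and (2,2,0), but the membership of the hyperedges is an assumption.

```python
def greedy_weight_reduction(hyperedges):
    """Order all vertices greedily by the weight they remove.

    hyperedges: list of (vertices, weights) with len(weights) == len(vertices).
    Returns (ordering, cost).
    """
    remaining = sorted({v for vertices, _ in hyperedges for v in vertices})
    picked = [0] * len(hyperedges)  # vertices already chosen from each hyperedge
    ordering, cost, step = [], 0, 0

    def gain(v):
        # Next unused weight entry of every hyperedge that contains v.
        return sum(w[picked[i]] for i, (vs, w) in enumerate(hyperedges) if v in vs)

    while remaining:
        step += 1
        best = max(remaining, key=gain)  # ties: alphabetically first vertex
        cost += step * gain(best)        # pay position * weight covered now
        for i, (vs, _) in enumerate(hyperedges):
            if best in vs:
                picked[i] += 1
        remaining.remove(best)
        ordering.append(best)
    return ordering, cost

# Hypothetical instance with non-increasing weight vectors.
edges = [
    ({"A", "B", "C"}, [4, 1, 0]),
    ({"A", "D"}, [3, 0]),
    ({"E", "F", "G"}, [2, 2, 0]),
]
order, cost = greedy_weight_reduction(edges)
```

In this run A is picked first because it removes weight 4 + 3 = 7, more than any other vertex.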
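GREEDY ALGORITHM IN GENERAL CASE
The greedy weight reduction algorithm does not work in the general case
Its approximation ratio is unbounded
Example: k hyperedges with profile <1,0> and one hyperedge with profile <0,w>, where w = k^2
  OPT = 2w + (3 + 4 + … + (k+2)), on the order of k^2
  ALG = (1 + 2 + … + k) + (k+2)w, on the order of k^3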
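The growth of the gap can be checked numerically. This sketch simply evaluates the two cost expressions from the slide for a given k with w = k^2; the function name `greedy_gap` is made up for illustration.

```python
def greedy_gap(k):
    """Evaluate the slide's two cost expressions for w = k^2."""
    w = k * k
    # Heavy hyperedge <0,w> finished at time 2, light hyperedges at times 3..k+2.
    opt = 2 * w + sum(range(3, k + 3))
    # Greedy covers the k light hyperedges first, the heavy one only at time k+2.
    alg = sum(range(1, k + 1)) + (k + 2) * w
    return opt, alg

opt_10, alg_10 = greedy_gap(10)
opt_100, alg_100 = greedy_gap(100)
```

For k = 10 the two costs are 275 and 1255; for k = 100 they are 25250 and 1025050, so the ratio grows roughly linearly in k.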
CASE 2: ARBITRARY WEIGHT VECTORS
HARMONIC INTERPOLATION ALGORITHM
The greedy algorithm takes only local maxima into account
Apply the greedy algorithm to harmonically interpolated weight vectors
The interpolation provides knowledge about the future weight reduction potential of hyperedges
Example: k hyperedges with profile <1,0>; the profile <0,w> is interpolated to <w/2,w>
  ALG = 2w/2 + (3 + 4 + … + (k+2))
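HARMONIC INTERPOLATION
For a hyperedge e with weight vector w(e) = <w_1(e), …, w_r(e)>, the interpolated vector has entries
  \hat{w}_i(e) = \max_{i \le j \le r} \frac{w_j(e)}{j - i + 1}, \qquad i = 1, \dots, r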
Algorithm Phase I:
1. Calculate the harmonic interpolation of the weight vector of every hyperedge e ∈ E
Algorithm Phase II (GREEDY WEIGHT REDUCTION ALGORITHM):
2. Calculate the weight of each vertex according to the changed weight vectors
3. Select the vertex with maximum weight
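Phase I can be sketched as below, assuming the interpolated entry at position i is the maximum over j ≥ i of w_j / (j − i + 1). That form is an assumption; it agrees with the slides' examples <0,w> → <w/2,w> and <0,…,w,…,0> → <w/j,…,w/2,w,…,0>.

```python
def harmonic_interpolation(weights):
    """Return the harmonically interpolated copy of one weight vector.

    Entry i becomes max over j >= i of weights[j] / (j - i + 1)
    (assumed definition, see the lead-in).
    """
    r = len(weights)
    return [max(weights[j] / (j - i + 1) for j in range(i, r)) for i in range(r)]

tail_heavy = harmonic_interpolation([0, 8])      # <0,w> with w = 8
indicator = harmonic_interpolation([0, 0, 6])    # single non-zero entry at j = 3
unchanged = harmonic_interpolation([4, 1, 0])    # already non-increasing
```

Under this definition a non-increasing vector is left as it is, so Phase II then behaves like the plain greedy weight reduction algorithm on such inputs.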
ANALYSIS OF HARMONIC INTERPOLATION ALGORITHM
Use indicator vectors: <0,0,…,w,…,0,0>
  Only one entry is non-zero
  Harmonic interpolation: <w/j,…,w/2,w,…,0>
Notation
  (e,i): a potential pair
  w(e,i): weight of the potential pair
  t(e,i): the time when (e,i) is covered
  Penalty of a step = remaining harmonic weight / weight covered
We have to minimize:
  \sum_{(e,i)} w(e,i) \cdot t(e,i)
OPTIMAL SOLUTION HISTOGRAM
Create a histogram with number of columns = number of potential pairs, width of a column = w(e,i) and height of the column = t(e,i)
It is monotonically increasing
[Figure: histogram of the optimal solution; horizontal axis: potential pairs, vertical axis: time]
HISTOGRAM FOR ALGORITHMIC SOLUTION
Histogram with number of columns = number of potential pairs, width of a column = ŵ(e,i) and height of the column = penalty of the step
It is not monotonic
APPROXIMATION RATIO
o Reduce the width of the ALG histogram by a factor of 2H_r and its height by a factor of 2
o The new histogram, of area ALG/(4H_r), fits completely inside the optimal solution histogram
o ALG/(4H_r) <= OPT
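Spelled out, the containment argument gives the following chain. It assumes that H_r on the slide denotes the r-th harmonic number and that the areas of the two histograms equal ALG and OPT respectively.

```latex
\frac{\mathrm{ALG}}{2H_r \cdot 2} \le \mathrm{OPT}
\;\Longrightarrow\;
\mathrm{ALG} \le 4H_r \cdot \mathrm{OPT},
\qquad
H_r = \sum_{i=1}^{r} \frac{1}{i} \le 1 + \ln r
\;\Longrightarrow\;
\mathrm{ALG} = O(\log r) \cdot \mathrm{OPT}.
```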
CONCLUSION
An O(log r)-approximate solution in the general case using harmonic interpolation and the greedy algorithm
Intents of all user types are taken care of
A better solution exists:
  In the general case, a randomized 485-approximation algorithm by Nikhil Bansal et al.
  Based on a stricter LP relaxation
  Randomized rounding