 Web Search Result Diversification Farhan Ahmad Apr 2011
Apr 08, 2018

Page 1: Web Search Result Diversification

8/6/2019 Web Search Result Diversification

http://slidepdf.com/reader/full/web-search-result-diversification 1/70

 

Web Search Result Diversification

Farhan Ahmad

Apr 2011

Original Paper

Diversifying Search Results

Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, Samuel Ieong

Search Labs, Microsoft Research

Second ACM International Conference on Web Search and Data Mining (WSDM 2009)

Barcelona, Spain, February 9-12, 2009

Abstract

Query terms given by users are often ambiguous.

Search engines should diversify the search results to minimize the risk of dissatisfaction of the average user.

The authors present a systematic approach for measuring the diversity of search results.

They also present an algorithm to maximize the diversity of a subset of the search results.

Introduction: Ambiguous queries

Consider the search term 'FLASH'.

It can have several interpretations:

− Flash player
− Flash floods
− Flash Gordon (an adventure hero)

Suppose Flash player is searched most often.

Most of the top results returned for the query 'FLASH' will belong to this category.

This is because search engines rank on the basis of similarity, and make no explicit attempt to diversify the documents.

Introduction: Relevance of search results

The basic premise is: “The relevance of a set of documents depends not only on the individual relevance of its members, but also on how they relate to each other.”

Jaime G. Carbonell and Jade Goldstein. The use of MMR, diversity-based re-ranking for reordering documents and producing summaries.

Ideally the result set should properly account for the interests of the overall user population.

Introduction: Basic Assumption #1

A taxonomy of information exists at the topical level.

A document can belong to one or more categories, and so can a query.

Introduction: Basic Assumption #2

Usage statistics are available for user intents.

Example: When searching for 'FLASH', 65% of users intended to find 'Flash Player', 15% were looking for 'Recent flash floods' and 5% were looking for 'Flash Gordon'.

Introduction: Defining the objective

To maximize the relevance of a result document set based on the individual relevance of its members and their diversity.

Formalization of Notation

The set of categories to which a document d belongs is denoted by C(d).

The set of categories to which a query q belongs is denoted by C(q).

Example:

− C(q='FLASH') = { 'Flash floods', 'Flash player', 'Flash Gordon' }
− C(d='FLASH') = { 'Flash floods', 'Flash player', 'Flash Gordon', 'Flash village' }

C(d) ∩ C(q) may be empty.

The probability of a given query q belonging to a category c is denoted by P(c|q).

It is called the user intent for query q and category c.

Assumption: Our knowledge is complete.

$$\sum_{c \in C(q)} P(c \mid q) = 1$$

Informally this means that given a query q, we have an exhaustive list of all the categories to which the query could belong.

V(d|q,c) is defined as the relevance value of document d for query q, when the intended category is c.

If we constrain V to [0,1], it represents the probability of document d satisfying user query q that has intended category c.

V can be obtained by multiplying query-document similarity by the probability that the document d belongs to category c.

− V(d|q,c) = Similarity(d,q) * P(c|d)
− Where P(c|d) can be computed by the classifier algorithm, e.g. as a confidence value.
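The product above can be sketched in a few lines of Python. This is only an illustration, assuming that both the similarity score and the classifier confidence are already scaled to [0, 1]; the function name `relevance_value` and all numbers are hypothetical and do not come from the paper.

```python
def relevance_value(similarity: float, category_confidence: float) -> float:
    """V(d|q,c) = Similarity(d,q) * P(c|d), with both factors assumed in [0, 1]."""
    if not (0.0 <= similarity <= 1.0 and 0.0 <= category_confidence <= 1.0):
        raise ValueError("both inputs must lie in [0, 1]")
    return similarity * category_confidence


# Hypothetical scores for one document d and the query 'FLASH'.
similarity = 0.8                                          # Similarity(d, q)
p_c_given_d = {"Flash player": 0.9, "Flash Gordon": 0.1}  # classifier confidences P(c|d)

# One relevance value per category the document may belong to.
V = {c: relevance_value(similarity, p) for c, p in p_c_given_d.items()}
```

Because both factors lie in [0, 1], so does V, which is what lets it be read as a probability on the earlier slide.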

Assumption: Given a query q and a category of intent c, the relevance of two documents is independent.

− V(d1|q,c) and V(d2|q,c) are independent.

Formalizing the objective fn

Suppose users only consider the top k results of a search engine.

 –  We can rephrase the objective, “To maximize the relevance of a result document set based on individual relevance of the members and their diversity”, as:

 –  The objective is to maximize the probability that an average user finds at least one relevant result among the top k.

Formally, given

− A query q,
− A set of documents D,
− A distribution of category of intent P(c|q),
− The relevance of each document d ∈ D, V(d|q,c),

we want to find the set S of top k results (|S| = k), S ⊆ D, such that

$$P(S \mid q) = \sum_{c} P(c \mid q) \left( 1 - \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) \right)$$

is maximized
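The objective can be evaluated directly for any candidate set S. The following is a minimal sketch, assuming `intent` maps each category c to P(c|q) and `V` maps each document to a dictionary of its per-category relevance values (a missing category is treated as relevance 0). The names and numbers are hypothetical.

```python
from math import prod


def p_s_given_q(S, intent, V):
    """P(S|q) = sum_c P(c|q) * (1 - prod_{d in S} (1 - V(d|q,c)))."""
    total = 0.0
    for c, p_c in intent.items():
        # Probability that no document in S satisfies a user with intent c.
        miss = prod(1.0 - V[d].get(c, 0.0) for d in S)
        total += p_c * (1.0 - miss)
    return total


# Hypothetical intent distribution and relevance values for the query 'FLASH'.
intent = {"player": 0.7, "floods": 0.2, "gordon": 0.1}
V = {
    "d1": {"player": 0.9},
    "d2": {"player": 0.8},
    "d3": {"floods": 0.6},
}

same_topic = p_s_given_q(["d1", "d2"], intent, V)  # both results about the player
mixed = p_s_given_q(["d1", "d3"], intent, V)       # one player result, one floods result
```

With these particular numbers the mixed pair scores higher than the same-topic pair, because the second player document adds little once the first one is present.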

Origin of the objective function

For d ∈ S, V(d|q,c) is the probability that a document d from our result subset S satisfies a user query q having intended category c.

For d ∈ S, 1 − V(d|q,c) is the probability that a document d, from our result subset S, does not satisfy a user query q having intended category c.

$\prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr)$ is the probability that no document d, from our result subset S, satisfies user query q having intended category c.

$1 - \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr)$ is the probability that at least one document d, from our result subset S, satisfies user query q having intended category c.

Therefore,

$$P(c \mid q) \left( 1 - \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) \right)$$

gives the probability that query q has intended category c, and it is satisfied by at least one document from our result subset S.

If we sum up

$$P(c \mid q) \left( 1 - \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) \right)$$

for different categories {c1, c2, ..., cr}, we find the probability that a user query belonging to any of these categories is satisfied by at least one document from our result subset S.

Therefore by defining

$$P(S \mid q) = \sum_{c} P(c \mid q) \left( 1 - \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) \right),$$

and trying to maximize P(S|q), we are trying to maximize the chances that an average user is satisfied.

That is, P(S|q) measures the diversity of result subset S for a query q.

Formalizing the problem

Given a result set D, find a set S ⊆ D, |S| = k, whose diversity

$$P(S \mid q) = \sum_{c} P(c \mid q) \left( 1 - \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) \right)$$

is the maximum over all such possible S.

This problem is formally written as Diversify(k).

Caveat

Diversify(k) does not try to cover all the categories.

− While trying to maximize $\sum_{c} P(c \mid q) \bigl( 1 - \prod_{d \in S} ( 1 - V(d \mid q, c) ) \bigr)$,
− and maintain |S| = k,

we might need to exclude all documents from some category cr.

It might be that by taking all k documents from only a single category c3, we are able to maximize P(S|q)!

All other categories are left out in such a case.
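This caveat can be reproduced on a toy instance by solving Diversify(k) exhaustively. The sketch below tries every subset of size k, which is only feasible for very small D; the helper names, the categories `a` and `b`, and all numbers are hypothetical.

```python
from itertools import combinations
from math import prod


def p_s_given_q(S, intent, V):
    """Diversity objective P(S|q) as defined on the earlier slides."""
    return sum(
        p_c * (1.0 - prod(1.0 - V[d].get(c, 0.0) for d in S))
        for c, p_c in intent.items()
    )


def diversify_brute_force(D, k, intent, V):
    """Return a size-k subset of D with maximum P(S|q); cost grows as C(|D|, k)."""
    return max(combinations(D, k), key=lambda S: p_s_given_q(S, intent, V))


# One dominant intent 'a' and one rare intent 'b' (hypothetical values).
intent = {"a": 0.9, "b": 0.1}
V = {"a1": {"a": 0.5}, "a2": {"a": 0.5}, "b1": {"b": 0.5}}
D = list(V)

best = diversify_brute_force(D, 2, intent, V)
best_value = p_s_given_q(best, intent, V)
```

Here the best pair takes both documents from category `a` and leaves category `b` uncovered, because a second chance at satisfying the dominant intent is worth more than a first chance at the rare one.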

Problems with Diversify(k)

Diversify(k) does not say anything about the ordering of the result subset S.

Diversify(k) is NP-hard (the Max Coverage problem reduces to it).

A greedy algorithm for Diversify(k)

The authors have proposed a greedy algorithm for Diversify(k) that uses the concept of marginal utility to diversify as well as re-rank the search results.

This algorithm maximizes P(S|q) when every document can belong to just one category, and otherwise it optimizes P(S|q) with bounded error.

Notation

U(c|q,S) is the probability that a query q belongs to the category c, given that all documents in the set S fail to satisfy the user.

− Initially S = ∅
− And we define U(c|q,∅) = P(c|q)

We define the marginal utility of a document d as the product of its relevance value V with the conditional distribution of categories U:

$$g(d \mid q, c, S) = \sum_{c \in C(d)} U(c \mid q, S) \, V(d \mid q, c)$$

− It is the probability that document d satisfies the user when all documents that come before it fail to do so.

Greedy algo IA-Select

Inputs: k, q, C(q), D, C(d), P(c|q), V(d|q,c)

Output: S ⊆ D, |S| = k

1: S = ∅
2: ∀c, U(c|q,S) = P(c|q)
3: WHILE |S| < k
4:   FOR d ∈ D DO
5:     g(d|q,c,S) = ∑_{c ∈ C(d)} U(c|q,S) . V(d|q,c)
6:   ENDFOR
7:   d* = argmax_d( g(d|q,c,S) )
8:   ∀c ∈ C(d*), U(c|q,S) = (1 – V(d*|q,c)) . U(c|q,S)
9:   S = S ∪ {d*}
10:  D = D – {d*}
11: ENDWHILE
12: RETURN S
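The numbered steps above translate almost line for line into Python. The following is a sketch rather than the authors' code: it assumes `intent` maps categories to P(c|q) and `V` maps each document to a dictionary of per-category relevance values, so that the keys of `V[d]` play the role of C(d). Function names and the sample data are hypothetical.

```python
def marginal_utility(d, U, V):
    """Step 5: g(d|q,c,S) = sum over c in C(d) of U(c|q,S) * V(d|q,c)."""
    return sum(U.get(c, 0.0) * v for c, v in V[d].items())


def ia_select(k, D, intent, V):
    S = []                       # step 1 (a list, so the selection order is kept)
    U = dict(intent)             # step 2: U(c|q,S) = P(c|q)
    remaining = list(D)
    while len(S) < k and remaining:                                       # step 3
        d_star = max(remaining, key=lambda d: marginal_utility(d, U, V))  # steps 4-7
        for c, v in V[d_star].items():                                    # step 8
            if c in U:
                U[c] *= 1.0 - v
        S.append(d_star)                                                  # step 9
        remaining.remove(d_star)                                          # step 10
    return S                                                              # step 12


# Hypothetical data for the query 'FLASH'.
intent = {"player": 0.7, "floods": 0.2, "gordon": 0.1}
V = {
    "p1": {"player": 0.9},
    "p2": {"player": 0.8},
    "f1": {"floods": 0.6},
    "g1": {"gordon": 0.5},
}

ranking = ia_select(3, list(V), intent, V)
```

On this data the first pick is the strongest player document; the update in step 8 then shrinks the remaining weight of the player intent enough that a floods document is picked before the second player document. The extra `remaining` check only guards against k being larger than |D|.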

Proofs

Earlier we claimed that

 –  IA-Select maximizes the diversity P(S|q) if every document belongs to exactly one category.

 –  IA-Select optimizes P(S|q), with bounded error, otherwise.

Basis for proofs

To prove our claims, we need to understand the concept of submodularity.

• We first define submodularity.

• Then prove that P(S|q) is submodular.

• Then prove our claims.

Submodularity

It is known as the principle of diminishing marginal utilities in economics.

• In our context, the marginal benefit of adding a document to a larger collection is no greater than that of adding the same document to a smaller collection.

• Formal definition follows.

If N is a set and f is a set function $f: 2^N \to \mathbb{R}$, then f is submodular if and only if, for all S ⊆ T ⊆ N and all d ∈ N with d ∉ S, d ∉ T,

$$f(S \cup \{d\}) - f(S) \ge f(T \cup \{d\}) - f(T)$$

We have chosen two subsets of the domain N, S and T. S is contained in T, which in turn is contained in N.

• Then we take a new element d in N which has not yet been added to either S or T.

We evaluate the change in the value of f due to the addition of d, for both S and T.

• f(S ∪ {d}) – f(S) is the marginal utility gained from adding d to the smaller set S.

• f(T ∪ {d}) – f(T) is the marginal utility gained from adding d to the larger set T.

If the inequality

$$f(S \cup \{d\}) - f(S) \ge f(T \cup \{d\}) - f(T)$$

holds, f is said to be submodular.
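For a small ground set the definition can be checked exhaustively. The sketch below tests the inequality for every pair S ⊆ T and every element outside T, and applies it to the diversity objective on a hypothetical instance. A passing check on one instance is only a sanity test, not a proof; `is_submodular`, `diversity`, and the data are illustrative names and values.

```python
from itertools import combinations
from math import prod


def p_s_given_q(S, intent, V):
    """Diversity objective P(S|q) from the earlier slides."""
    return sum(
        p_c * (1.0 - prod(1.0 - V[d].get(c, 0.0) for d in S))
        for c, p_c in intent.items()
    )


def is_submodular(N, f, tol=1e-12):
    """Check f(S+d) - f(S) >= f(T+d) - f(T) for all S <= T <= N and d not in T."""
    subsets = [frozenset(c) for r in range(len(N) + 1) for c in combinations(N, r)]
    for T in subsets:
        for S in subsets:
            if not S <= T:
                continue
            for d in N:
                if d in T:
                    continue
                gain_small = f(S | {d}) - f(S)
                gain_large = f(T | {d}) - f(T)
                if gain_small < gain_large - tol:  # tolerance for float rounding
                    return False
    return True


# Hypothetical instance in which some documents belong to two categories.
intent = {"player": 0.6, "floods": 0.3, "gordon": 0.1}
V = {
    "d1": {"player": 0.9},
    "d2": {"player": 0.5, "gordon": 0.4},
    "d3": {"floods": 0.7},
    "d4": {"floods": 0.2, "player": 0.3},
}
D = list(V)


def diversity(S):
    return p_s_given_q(S, intent, V)


diversity_ok = is_submodular(D, diversity)
square_ok = is_submodular(D, lambda S: len(S) ** 2)  # gains grow with set size
```

The second call is a counterexample: for f(S) = |S|², adding an element to a larger set yields a larger gain, so the check fails.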

P(S|q) is submodular 

Let S, T with S ⊆ T ⊆ D be two sets of documents.

• And let e ∈ D be a document such that e ∉ T.

• Let S' = S ∪ {e} and T' = T ∪ {e}.

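• Expanding P(S'|q) – P(S|q) with the definition of P(S|q), and restoring the sum over categories:

$$\begin{aligned}
P(S' \mid q) - P(S \mid q)
&= \sum_{c} P(c \mid q) \left[ \Bigl( 1 - \prod_{d \in S'} \bigl( 1 - V(d \mid q, c) \bigr) \Bigr) - \Bigl( 1 - \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) \Bigr) \right] \\
&= \sum_{c} P(c \mid q) \left[ \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) - \prod_{d \in S'} \bigl( 1 - V(d \mid q, c) \bigr) \right] \\
&= \sum_{c} P(c \mid q) \left[ \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) - \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) \cdot \bigl( 1 - V(e \mid q, c) \bigr) \right] \\
&= \sum_{c} P(c \mid q) \Bigl( \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) \Bigr) V(e \mid q, c)
\end{aligned}$$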

• Similarly,

$$P(T' \mid q) - P(T \mid q) = \sum_{c} P(c \mid q) \Bigl( \prod_{d \in T} \bigl( 1 - V(d \mid q, c) \bigr) \Bigr) V(e \mid q, c)$$

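• Because S ⊆ T (and so |T| ≥ |S|), for every category c

$$\prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) \ge \prod_{d \in T} \bigl( 1 - V(d \mid q, c) \bigr)$$

since both sides are products of fractions in [0,1], and the right-hand product contains all the factors of the left-hand product in addition to its own factors.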

• Therefore

$$P(S' \mid q) - P(S \mid q) \ge P(T' \mid q) - P(T \mid q)$$

• Hence P(S|q) is submodular.

Proof #1

• Formally, we have to prove that

- IA-Select maximizes P(S|q) if ∀d ∈ D, |C(d)| = 1

• In this case step 5 in our algo becomes

 –  g(d|q,c,S) = U(c|q,S) . V(d|q,c)

instead of

g(d|q,c,S) = ∑_{c ∈ C(d)} U(c|q,S) . V(d|q,c)

• Step 8 in our algo becomes

 –  for the single category c = C(d*), U(c|q,S) = (1 – V(d*|q,c)) . U(c|q,S)

instead of

∀c ∈ C(d*), U(c|q,S) = (1 – V(d*|q,c)) . U(c|q,S)

• Because U(c|q,S) = P(c|q) at the outset, as documents {d1, d2, ...} get added to S, in step 8 U is updated as

 –  c = C(d1),
    U(c|q,S) = (1 – V(d1|q,c)) . U(c|q,∅)
             = (1 – V(d1|q,c)) . P(c|q)

 –  c = C(d2),
    U(c|q,S) = (1 – V(d2|q,c)) . (1 – V(d1|q,c)) . U(c|q,∅)
             = (1 – V(d2|q,c)) . (1 – V(d1|q,c)) . P(c|q)

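• Therefore in the kth iteration of IA-Select, in steps 4 and 5, g is updated as

FOR d ∈ D do

$$g(d \mid q, c, S) = \Bigl( \prod_{d' \in S} \bigl( 1 - V(d' \mid q, c) \bigr) \cdot P(c \mid q) \Bigr) V(d \mid q, c) = P(c \mid q) \Bigl( \prod_{d' \in S} \bigl( 1 - V(d' \mid q, c) \bigr) \Bigr) V(d \mid q, c)$$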

• From the result for submodularity of P(S|q) we know that for every document d

$$\sum_{c} g(d \mid q, c, S) = P(S \cup \{d\} \mid q) - P(S \mid q)$$

 –  Because g(d|q,c,S) is non-zero for exactly one category c, the sigma can be removed

$$g(d \mid q, c, S) = P(S \cup \{d\} \mid q) - P(S \mid q)$$

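• At the start S = ∅. When we have added the required k documents to S,

$$\begin{aligned}
g(d_1 \mid q, c, \emptyset) &= P(\{d_1\} \mid q) - P(\emptyset \mid q) \\
g(d_2 \mid q, c, \{d_1\}) &= P(\{d_2, d_1\} \mid q) - P(\{d_1\} \mid q) \\
&\;\;\vdots \\
g(d_k \mid q, c, \{d_1, d_2, \dots, d_{k-1}\}) &= P(S \mid q) - P(\{d_1, d_2, \dots, d_{k-1}\} \mid q)
\end{aligned}$$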

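• The sum is

$$g(d_1 \mid q, c, \emptyset) + g(d_2 \mid q, c, \{d_1\}) + \dots + g(d_k \mid q, c, \{d_1, d_2, \dots, d_{k-1}\}) = P(S \mid q) - P(\emptyset \mid q) = P(S \mid q)$$

since P(∅|q) = 0 (the empty product equals 1).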

Proof #2

• Formally we have to prove that if documents can belong to many categories, their selection based on g(d|q,c,S) still optimizes P(S|q), with an error that is bounded.

• Since our objective P(S|q) is submodular, and our algorithm IA-Select is greedy in the sense mentioned before,

 –  IA-Select is a (1 – 1/e) approximation to Diversify(k).

 –  That is, the value of P(S|q) attained by IA-Select is not less than (1 – 1/e) times the maximum value obtainable for Diversify(k).
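The bound can be observed empirically on small random instances by comparing a greedy selection with the exhaustive optimum. This is a sketch under stated assumptions: the `greedy` helper picks the document with the largest increase in P(S|q), which by the identity used in Proof #1 is the same quantity as g; the instance generator, seed, and sizes are arbitrary choices made here for illustration.

```python
import random
from itertools import combinations
from math import e, prod


def p_s_given_q(S, intent, V):
    return sum(
        p_c * (1.0 - prod(1.0 - V[d].get(c, 0.0) for d in S))
        for c, p_c in intent.items()
    )


def greedy(D, k, intent, V):
    """Greedy selection by marginal gain in P(S|q), as in IA-Select."""
    S = []
    while len(S) < k:
        d_star = max(
            (d for d in D if d not in S),
            key=lambda d: p_s_given_q(S + [d], intent, V),
        )
        S.append(d_star)
    return S


def optimum(D, k, intent, V):
    """Exact maximum of P(S|q) over all size-k subsets (exponential cost)."""
    return max(p_s_given_q(S, intent, V) for S in combinations(D, k))


rng = random.Random(0)                      # fixed seed so the run is repeatable
categories = ["c1", "c2", "c3"]
intent = {"c1": 0.5, "c2": 0.3, "c3": 0.2}  # hypothetical P(c|q)

worst_ratio = 1.0
for _ in range(50):
    # Six documents, each belonging to one or two randomly chosen categories.
    V = {
        f"d{i}": {c: rng.random() for c in rng.sample(categories, rng.randint(1, 2))}
        for i in range(6)
    }
    D = list(V)
    ratio = p_s_given_q(greedy(D, 3, intent, V), intent, V) / optimum(D, 3, intent, V)
    worst_ratio = min(worst_ratio, ratio)
```

On every instance the ratio stays at or above 1 – 1/e (about 0.632), as the bound requires; the guarantee is a worst case, so observed ratios can be much closer to 1.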

Evaluating IA-Select

• To measure the result set diversity of search engines and compare it with the diversity of results obtained using IA-Select, the authors first defined intent-aware counterparts of traditional IR metrics.

• One such metric is reciprocal rank (RR): it is the inverse of the first position at which a relevant document is found in a list.

• If there is a rank threshold T, RR is zero if no relevant document is found among the first T documents.

• The mean reciprocal rank (MRR) of a query set is the average RR of the queries in the set.
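The three definitions above can be sketched as follows, assuming each query comes with a ranked list of document ids and a set of ids judged relevant. The function names and the sample judgments are hypothetical.

```python
def reciprocal_rank(ranked_docs, relevant, threshold=None):
    """1 / position of the first relevant document, or 0 if none is found
    within the first `threshold` results (the whole list if threshold is None)."""
    top = ranked_docs if threshold is None else ranked_docs[:threshold]
    for position, doc in enumerate(top, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs, threshold=None):
    """Average RR over a query set; `runs` is a list of (ranked_docs, relevant) pairs."""
    return sum(reciprocal_rank(r, rel, threshold) for r, rel in runs) / len(runs)


# Three hypothetical queries with their rankings and relevance judgments.
runs = [
    (["a", "b", "c"], {"b"}),  # first relevant result at position 2
    (["x", "y", "z"], {"x"}),  # first relevant result at position 1
    (["m", "n", "o"], {"o"}),  # first relevant result at position 3
]

mrr_all = mean_reciprocal_rank(runs)
mrr_top2 = mean_reciprocal_rank(runs, threshold=2)  # third query now scores 0
```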

• Because IA-Select diversifies the top k results, the authors defined an intent-aware MRR (MRR-IA).

• For a result set D, rank threshold k and query q

$$\text{MRR-IA}(D, k) = \sum_{c} P(c \mid q) \cdot \text{MRR}(D, k \mid c)$$

MRR(D,k|c) gives the average RR for a query set belonging to category c.
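A sketch of the intent-aware weighting for a single query is given below: the reciprocal rank is computed once per category, using only the documents judged relevant for that category, and the results are weighted by P(c|q). Averaging this value over a query set would give the MRR-IA figure. The helper names, the relevance judgments, and the two rankings are hypothetical.

```python
def reciprocal_rank(ranked_docs, relevant, k):
    """1 / position of the first relevant document within the top k, else 0."""
    for position, doc in enumerate(ranked_docs[:k], start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def rr_intent_aware(ranked_docs, intent, relevant_by_category, k):
    """Sum over categories c of P(c|q) * RR(ranking, k | c) for one query."""
    return sum(
        p_c * reciprocal_rank(ranked_docs, relevant_by_category.get(c, set()), k)
        for c, p_c in intent.items()
    )


# Hypothetical intents and per-category relevance judgments for 'FLASH'.
intent = {"player": 0.7, "floods": 0.2, "gordon": 0.1}
relevant_by_category = {
    "player": {"p1", "p2"},
    "floods": {"f1"},
    "gordon": {"g1"},
}

similarity_ranking = ["p1", "p2", "f1"]   # mostly the dominant intent
diversified_ranking = ["p1", "f1", "g1"]  # one result per intent

score_similarity = rr_intent_aware(similarity_ranking, intent, relevant_by_category, k=3)
score_diversified = rr_intent_aware(diversified_ranking, intent, relevant_by_category, k=3)
```

With these judgments the diversified ranking scores higher, since users with the rarer intents find a relevant result earlier or at all.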

• Shown below are the results obtained using three commercial search engines and IA-Select.