Page 1: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IRGIR Group @ UAM


Explicit Relevance Models in Intent-Aware IR Diversification

35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)

Saúl Vargas, Pablo Castells and David Vallet, Universidad Autónoma de Madrid

http://ir.ii.uam.es

Portland, OR, 13 August 2012

Page 2

Outline

Context: IR diversification formulation and algorithms

Proposed approach: relevance-based reformulation of diversification algorithms

Experiments

Adjustable tolerance to redundancy

Conclusion

Page 3

IR diversity – Brief recap

Example aspects of an ambiguous query: Appliance, Golf, Chemical element, Nutrition / Health, Mining / Metallurgy


Diversity as a means to address uncertainty in user queries

– The same query may correspond to different intents, or aspects of the underlying information need

Revision of the document relevance independence assumption

– The marginal utility of additional relevant documents decreases quickly

Trade redundant documents of diminishing marginal utility for increased intent coverage

– I.e. maximize the number of users who obtain at least one useful document

Page 5

IR diversification – Problem statement

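Given a query q on a collection:

Find a subset S of the collection, of given size, maximizing

$$p(\text{some } d \in S \text{ is relevant} \mid q)$$

Agrawal 2009, Santos 2010, Chen 2006, …

The problem is NP-hard. Greedy approximation: at each step, add to S the document

$$\arg\max_{d \in R - S} \varphi(d, S \mid q), \qquad \varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$$

S: diversified ranking

R: baseline ranking, by p(d|q); R − S: baseline documents not yet selected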
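The greedy approximation can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' implementation. `greedy_diversify`, `toy_phi` and the toy data are hypothetical names. The objective φ is passed in as a function, so any of the instantiations discussed in this talk could be plugged in.

```python
from typing import Callable, List, Sequence

# phi(d, S) -> value of appending document d to the already-selected list S
Objective = Callable[[str, Sequence[str]], float]


def greedy_diversify(baseline: Sequence[str], phi: Objective, size: int) -> List[str]:
    """Greedily build S by appending arg max_{d in R - S} phi(d, S) until |S| = size."""
    selected: List[str] = []     # S, the diversified ranking
    remaining = list(baseline)   # R - S, in baseline order (max() keeps the first of tied items)
    while remaining and len(selected) < size:
        best = max(remaining, key=lambda d: phi(d, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: baseline scores and a single aspect label per document.
TOY_SCORE = {"d1": 0.9, "d2": 0.8, "d3": 0.5}
TOY_ASPECT = {"d1": "golf", "d2": "golf", "d3": "metal"}


def toy_phi(d: str, S: Sequence[str]) -> float:
    """Toy objective: baseline score, damped by 0.1 if S already covers d's aspect."""
    covered = any(TOY_ASPECT[s] == TOY_ASPECT[d] for s in S)
    return TOY_SCORE[d] * (0.1 if covered else 1.0)


if __name__ == "__main__":
    print(greedy_diversify(["d1", "d2", "d3"], toy_phi, size=3))
```

With this toy objective, the redundant d2 is pushed below d3, giving `['d1', 'd3', 'd2']`.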

Page 6

IR diversity – Instantiations of objective function

IA-Select scheme (Agrawal 2009)

$$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$$

xQuAD scheme (Santos 2010)

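$$\varphi(d, S \mid q) = (1-\lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q) = (1-\lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$$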

Explicit query aspects: z

State of the art aspect-based approaches

Query aspect coverage

Document “relevance” for query aspect

Redundancy penalization

Mixture with baseline

λ: degree of diversification


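Probability to observe documents: the schemes above are built on p(d|q) and p(d|q,z), whereas the objective they approximate is stated in terms of relevance:

$$\varphi(d, S \mid q) \propto p(d \text{ is \textbf{relevant}} \wedge \text{no } d' \in S \text{ is \textbf{relevant}} \mid q)$$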
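The IA-Select and xQuAD objectives translate directly into code. The sketch below is our own illustration. It assumes all probabilities have already been estimated and are stored in plain dictionaries, using the hypothetical names `p_z_q[z]`, `p_z_d[d][z]`, `p_d_q[d]` and `p_d_qz[z][d]`. Missing entries are treated as zero. Either function can serve as φ in a greedy reranker.

```python
from math import prod
from typing import Mapping, Sequence

Dist = Mapping[str, float]


def ia_select_phi(d: str, S: Sequence[str], p_z_q: Dist,
                  p_z_d: Mapping[str, Dist], p_d_q: Dist) -> float:
    """sum_z p(z|q) p(z|d) p(d|q) prod_{d' in S} (1 - p(z|d') p(d'|q))."""
    return sum(
        pzq * p_z_d[d].get(z, 0.0) * p_d_q[d]
        * prod(1.0 - p_z_d[s].get(z, 0.0) * p_d_q[s] for s in S)
        for z, pzq in p_z_q.items()
    )


def xquad_phi(d: str, S: Sequence[str], p_z_q: Dist,
              p_d_qz: Mapping[str, Dist], p_d_q: Dist, lam: float) -> float:
    """(1 - lam) p(d|q) + lam sum_z p(z|q) p(d|q,z) prod_{d' in S} (1 - p(d'|q,z))."""
    diversity = sum(
        pzq * p_d_qz.get(z, {}).get(d, 0.0)
        * prod(1.0 - p_d_qz.get(z, {}).get(s, 0.0) for s in S)
        for z, pzq in p_z_q.items()
    )
    return (1.0 - lam) * p_d_q[d] + lam * diversity
```

Note that `math.prod` of an empty sequence is 1, so for an empty S both functions reduce to their coverage terms.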

Page 12

IR diversity – Relevance-based instantiation of objective function

IA-Select scheme – relevance-based

$$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$$

xQuAD scheme – relevance-based

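$$\varphi(d, S \mid q) = (1-\lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q) = (1-\lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$$

where r_d denotes “d is relevant” and ¬r_S denotes “no d′ ∈ S is relevant”.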

Probability of relevance

Our proposal

$$\varphi(d, S \mid q) \propto p(d \text{ is \textbf{relevant}} \wedge \text{no } d' \in S \text{ is \textbf{relevant}} \mid q)$$

More literal interpretation of the initial problem statement

For λ = 1, relevance-based xQuAD is equivalent to relevance-based IA-Select
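A sketch of the relevance-based schemes, assuming the probabilities are stored in hypothetical dictionaries: `p_z_q[z]` for p(z|q), `p_r_dqz[d][z]` for p(r|d,q,z) and `p_r_dq[d]` for p(r|d,q). Setting `lam = 1` makes `rel_xquad_phi` coincide with `rel_ia_select_phi`.

The extra function `p_some_relevant` evaluates the original set objective. It relies on an added assumption of ours, not stated in the slides: document relevance events are independent given q and z. Under that assumption, the diversity term equals exactly the increase in the set objective when d is appended to S. This is one way to see why the relevance-based form is a more literal reading of the problem statement.

```python
from math import prod
from typing import Mapping, Sequence

Dist = Mapping[str, float]


def rel_diversity(d: str, S: Sequence[str], p_z_q: Dist,
                  p_r_dqz: Mapping[str, Dist]) -> float:
    """sum_z p(z|q) p(r|d,q,z) prod_{d' in S} (1 - p(r|d',q,z))."""
    return sum(
        pzq * p_r_dqz[d].get(z, 0.0)
        * prod(1.0 - p_r_dqz[s].get(z, 0.0) for s in S)
        for z, pzq in p_z_q.items()
    )


def rel_ia_select_phi(d: str, S: Sequence[str], p_z_q: Dist,
                      p_r_dqz: Mapping[str, Dist]) -> float:
    """Relevance-based IA-Select: the diversity term alone."""
    return rel_diversity(d, S, p_z_q, p_r_dqz)


def rel_xquad_phi(d: str, S: Sequence[str], p_z_q: Dist,
                  p_r_dqz: Mapping[str, Dist], p_r_dq: Dist, lam: float) -> float:
    """(1 - lam) p(r|d,q) + lam * relevance-based diversity term."""
    return (1.0 - lam) * p_r_dq[d] + lam * rel_diversity(d, S, p_z_q, p_r_dqz)


def p_some_relevant(S: Sequence[str], p_z_q: Dist,
                    p_r_dqz: Mapping[str, Dist]) -> float:
    """p(some d in S is relevant | q), assuming relevance independent given q and z."""
    return sum(
        pzq * (1.0 - prod(1.0 - p_r_dqz[s].get(z, 0.0) for s in S))
        for z, pzq in p_z_q.items()
    )
```

Unlike p(d|q,z), the values p(r|d,q,z) are not required to sum to 1 over documents.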

Page 15

Relevance distribution vs. document distribution

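$$(1-\lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$$

$$\sum_d p(r \mid d, q, z) = E[\#\,\text{relevant docs}] \geq 1 \qquad \text{vs.} \qquad \sum_d p(d \mid q, z) = 1$$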

Potentially different behavior, e.g. stronger redundancy penalization

p(r|d,·) vs. p(d|·): the difference does matter (in this context)

Rank equivalences that may hold between the two do not apply here, since the actual values, not just their ordering, enter the objective
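A small numeric illustration of why the difference matters, using made-up values. Suppose a hypothetical aspect z is covered by 50 documents. Each has p(d|q,z) = 1/50, since these values must sum to 1. Each also has, say, p(r|d,q,z) = 0.8, since these values need not sum to 1. After one document of that aspect has been selected, the redundancy factor applied to the next candidate is barely reduced under the document distribution, but strongly reduced under the relevance distribution.

```python
from math import prod
from typing import Sequence


def redundancy_factor(selected_probs: Sequence[float]) -> float:
    """prod over already-selected documents d' of aspect z of (1 - p(. | d', q, z))."""
    return prod(1.0 - p for p in selected_probs)


n_docs = 50                       # hypothetical number of documents covering aspect z
p_doc = [1.0 / n_docs] * n_docs   # p(d|q,z): a distribution over documents, sums to 1
p_rel = [0.8] * n_docs            # p(r|d,q,z): hypothetical values, sum is E[# relevant] = 40

if __name__ == "__main__":
    print(f"sum p(d|q,z) = {sum(p_doc):.2f}, sum p(r|d,q,z) = {sum(p_rel):.2f}")
    # Factor applied to the next candidate once one document of aspect z is in S
    print(f"document-based: {redundancy_factor(p_doc[:1]):.2f}")
    print(f"relevance-based: {redundancy_factor(p_rel[:1]):.2f}")
```

The document-based factor stays at 0.98, while the relevance-based one drops to 0.20.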

Page 16

Relevance-based greedy diversification

Relevance-based reformulation of diversification algorithm

1. Need to estimate p(r|d,q,z)

2. Does it work? Test empirically

3. Further development: parameterized tolerance to redundancy

Page 17

Aspect-based relevance model

Estimate p(r|d,q,z)

Cannot use odds, logs, constant removal… or any other rank-preserving transformation (we need the actual values)

Quantities involved: p(r|d,q), p(r|d,q,z), p(z|d), p(z|q), p(d|q), p(z)

– p(d|q): normalized baseline IR system score (as in e.g. Bache 2009)

– p(r|d,q): positional relevance, p(r|rank(d,q))

– p(z|d) or p(z|q): estimate one of them, depending on the available observations, then derive the other two parameters:
  • z as document classes (e.g. ODP)
  • z as subqueries (e.g. reformulations)
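Several of the estimation steps listed above can be sketched as follows. This is a hedged illustration, not the authors' code, and it rests on three assumptions:

- Baseline scores are non-negative, so they can be normalized to sum to 1 to obtain p(d|q).
- A precomputed positional curve p(r|k) is available, indexed from rank 1.
- To derive p(z|q) from p(z|d), the aspect is conditionally independent of the query given the document, so that p(z|q) = Σ_d p(z|d) p(d|q).

All function names are hypothetical.

```python
from typing import Dict, Mapping, Sequence


def normalize_scores(scores: Mapping[str, float]) -> Dict[str, float]:
    """p(d|q): baseline scores divided by their sum (assumes non-negative scores, positive sum)."""
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}


def aspect_given_query(p_d_q: Mapping[str, float],
                       p_z_d: Mapping[str, Mapping[str, float]]) -> Dict[str, float]:
    """p(z|q) = sum_d p(z|d) p(d|q), under the assumption that z is independent of q given d."""
    p_z_q: Dict[str, float] = {}
    for d, pd in p_d_q.items():
        for z, pz in p_z_d.get(d, {}).items():
            p_z_q[z] = p_z_q.get(z, 0.0) + pz * pd
    return p_z_q


def positional_relevance(ranking: Sequence[str], p_r_k: Sequence[float]) -> Dict[str, float]:
    """p(r|d,q) ~ p(r|k), with k the 1-based rank of d; p_r_k[0] holds p(r|k=1)."""
    if len(p_r_k) < len(ranking):
        raise ValueError("p_r_k must cover every rank in the ranking")
    return {d: p_r_k[k] for k, d in enumerate(ranking)}
```

Normalization is the only transformation applied to the scores here, because later steps need the actual values rather than merely their order.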

Page 18

Positional relevance distribution estimate

$$p(r \mid d, q) \sim p(r \mid \mathrm{rank}(d, q)) = p(r \mid k)$$

Figure: positional relevance p(r|k) as a function of rank k (0 to 200; log-scale y axis from 1E-05 to 1E+00) for pLSA, Lemur and AOL, obtained from precision estimates and from click log statistics.
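One plausible way to obtain a positional relevance curve from relevance judgments is sketched below. For each rank k, it takes the fraction of training queries whose document at rank k is judged relevant. This is our own hedged reading of “precision estimates”, and the exact estimator used in the experiments may differ; a click-based curve would count clicks per rank instead. Names and data layout are hypothetical.

```python
from typing import List, Mapping, Sequence, Set


def positional_relevance_curve(runs: Mapping[str, Sequence[str]],
                               qrels: Mapping[str, Set[str]],
                               depth: int) -> List[float]:
    """Estimate p(r|k), k = 1..depth, as the share of queries whose rank-k document is relevant.

    runs:  query id -> ranked list of document ids (the baseline system's output)
    qrels: query id -> set of document ids judged relevant
    """
    hits = [0] * depth     # relevant documents observed at each rank
    counts = [0] * depth   # queries whose ranking reaches each rank
    for q, ranking in runs.items():
        relevant = qrels.get(q, set())
        for k, d in enumerate(ranking[:depth]):
            counts[k] += 1
            hits[k] += int(d in relevant)
    # Ranks never reached by any query get 0.0 rather than a division by zero
    return [h / c if c else 0.0 for h, c in zip(hits, counts)]
```

In a 2-fold setup such as the 2009/10 split used for the TREC experiments, the curve would be estimated on one fold's queries and applied to the other.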

Page 19

Relevance-based greedy diversification

Relevance-based reformulation of diversification algorithm

1. Need to estimate p(r|d,q,z)

2. Does it work? Test empirically

3. Further development: parameterized tolerance to redundancy

Page 20

Experiments

Search diversity

Collection: ClueWeb09 category B (50M documents)

Query/subtopic set: TREC 2009/10 diversity task (100 queries)

Baseline ranking: Lemur Indri search engine (Web service)

Diversified top n: 100

Query aspect space:

a) ODP categories level 4 (~7K categories)

b) TREC subtopics (oracle for reference)

Specific parameter estimates:

p(z|q): uniform

p(z|d):
– ODP categories: semi-supervised text classification by Textwise
– TREC subtopics: Indri search system run on z as if it were a query

p(r|k):
i. P@k estimates with TREC relevance judgments (2-fold 2009/10 cross-validation)
ii. Click statistics from the AOL log (thus from a different IR system)

Page 21

Experiments – Search diversity on TREC

Figure: ERR-IA as a function of λ for the xQuAD scheme, with p(r|k) estimated from qrels. Panels: ODP categories, TREC subtopics. Curves: based on p(d|q,z) vs. based on p(r|d,q,z).

Page 22

Experiments – Search diversity on TREC

Diversifier            λ     α-nDCG@20   ERR-IA@20   nDCG-IA@20   S-recall@20
Lemur (baseline)       -     0.2587      0.1630      0.2396       0.4636

a) ODP categories
IA-Select              -     0.2651      0.1681      0.2423       0.4483
xQuAD                  0.9   0.2675      0.1656      0.2451       0.4864
Rel-based xQuAD
  i. Qrels             0.1   0.2858△▲    0.1828△▲    0.2655△▲     0.4898▲△
  ii. Clicks           0.4   0.2841▲△    0.1831△△    0.2605△▲     0.4830▲▽

b) TREC subtopics
IA-Select              -     0.3541      0.2346      0.3213       0.5787
xQuAD                  1.0   0.3445      0.2241      0.3127       0.5704
Rel-based xQuAD
  i. Qrels             1.0   0.3543△△    0.2349△△    0.3192▽△     0.5782▽△
  ii. Clicks           1.0   0.3512▽△    0.2320▽△    0.3166▽△     0.5748▽△

λ tuned “informally” for each diversifier, maximizing ERR-IA in steps of 0.1

▲ ▼ p < 0.05

Page 23

Experiments

Recommendation diversity

Adaptation of the IR diversity paradigm (Vargas, Castells & Vallet SIGIR 2011):
– Queries → users
– Documents → items (movies, music artists)
– Subtopics → item features (genres, tags)
– Relevance judgments → test ratings from data split

Dataset 1: MovieLens 1M
– Collection: 6K users, 4K movies, 1M ratings
– Subtopic set: 10 movie genres

Dataset 2: Last.fm crawl
– Collection: 1K users, 175K artists, 20M playcounts
– Subtopic set: 120K social tags on artists by Last.fm users

Baseline rankings:
a) pLSA
b) Popularity-based recommendation

Diversified top n: 100

Specific parameter estimates:

p(z|q): uniform

p(z|d): uniform on d (based on binary aspect/item association)

p(r|k): P@k estimates with 2-fold cross-validation on test users
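The uniform aspect distribution for recommendation can be written directly from a binary item/feature association: each feature of an item gets an equal share, p(z|d) = 1/|Z_d|. Here Z_d is our notation for the item's feature set (e.g. its genres). This is a minimal sketch with hypothetical names and data.

```python
from typing import Dict, Mapping, Set


def uniform_aspect_distribution(
        item_features: Mapping[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """p(z|d) = 1/|Z_d| for every feature z of item d; items without features are skipped."""
    return {
        item: {z: 1.0 / len(features) for z in features}
        for item, features in item_features.items()
        if features
    }
```

For instance, a movie tagged with two genres gets p(z|d) = 0.5 for each of them.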

Page 24

Experiments – Recommendation diversity on MovieLens and Last.fm

Figure: ERR-IA as a function of λ on MovieLens 1M and Last.fm, for the pLSA recommender and for recommendation by item popularity. Curves: based on p(d|q,z) vs. based on p(r|d,q,z).

Page 25

Relevance-based greedy diversification

Relevance-based reformulation of diversification algorithm

1. Need to estimate p(r|d,q,z)

2. Does it work? Test empirically

3. Further development: parameterized tolerance to redundancy

Page 26

Adjustable tolerance to redundancy

Generalization of relevance-based diversification scheme

Formally support adjustable redundancy penalization

Approach: generalize relevance to a browsing model

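$$\varphi(d, S \mid q) = (1-\lambda)\, p(r \mid d, q) + \lambda\, p(r_d, \neg \mathit{stop}_S \mid q) = \cdots = (1-\lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, z, q) \prod_{d' \in S} \bigl(1 - p(r \mid d', z, q)\, p(\mathit{stop} \mid r)\bigr)$$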

Adjustable redundancy tolerance parameter p(stop|r) ∈ [0,1]

– High p(stop|r) for aggressive penalization, low for e.g. high-recall searches

– In this view, the original formulations implicitly assume p(stop|r) = 1, i.e. that a single relevant document is sought

Tolerance to redundancy
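The generalized objective can be sketched by adding one factor to the redundancy term. This is an illustration under our own assumptions, not the authors' code. The probabilities are stored in hypothetical dictionaries: `p_z_q[z]` for p(z|q), `p_r_dqz[d][z]` for p(r|d,q,z) and `p_r_dq[d]` for p(r|d,q). The parameter `p_stop_r` stands for the tolerance parameter p(stop|r). With `p_stop_r = 1` the objective reduces to relevance-based xQuAD, and with `p_stop_r = 0` the redundancy penalty disappears.

```python
from math import prod
from typing import Mapping, Sequence

Dist = Mapping[str, float]


def tolerant_rel_xquad_phi(d: str, S: Sequence[str], p_z_q: Dist,
                           p_r_dqz: Mapping[str, Dist], p_r_dq: Dist,
                           lam: float, p_stop_r: float) -> float:
    """(1 - lam) p(r|d,q) + lam sum_z p(z|q) p(r|d,z,q) prod_{d' in S} (1 - p(r|d',z,q) p(stop|r))."""
    if not 0.0 <= p_stop_r <= 1.0:
        raise ValueError("p_stop_r must lie in [0, 1]")
    diversity = sum(
        pzq * p_r_dqz[d].get(z, 0.0)
        * prod(1.0 - p_r_dqz[s].get(z, 0.0) * p_stop_r for s in S)
        for z, pzq in p_z_q.items()
    )
    return (1.0 - lam) * p_r_dq[d] + lam * diversity
```

Take a one-aspect example with λ = 1, where a document with p(r|d,q,z) = 0.8 is already selected. A candidate with p(r|d,q,z) = 0.6 then scores 0.6, 0.36 or 0.12 for p(stop|r) = 0, 0.5 or 1.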

Page 27

Adjustable tolerance to redundancy

Empirical observation: effect of p(stop|r) on α-nDCG

Figure: α-nDCG for values of p(stop|r) between 0 and 1. Panels: search task (Lemur on TREC / subtopics) and recommendation task (pLSA on MovieLens / genres). Cells are shaded from the best to the worst α-nDCG value of each column.

Page 28

Conclusion

Alternative, relevance-based formulation of greedy aspect-based diversification

– Unifies two previous aspect-based algorithms

– More literal expression of the formal problem statement (and metrics?)

p(r|d,q,z) vs. p(d|q,z)

– Literal value estimates needed (rather than rank-equivalent approximations)

– Estimates based on positional relevance (relevance or click data needed)

Seems to perform well empirically

– Light requirements on relevance or click data for training positional relevance

– Improvement trend, but needs to be tested under further optimizations

Formal support for redundancy tolerance adjustment

