CERI 2016, Granada, Spain
Daniel Valcarce, Javier Parapar, Álvaro Barreiro (@dvalcarce, @jparapar, @AlvaroBarreiroG)
Information Retrieval Lab (@IRLab_UDC), University of A Coruña, Spain

Additive Smoothing for Relevance-Based Language Modelling of Recommender Systems [CERI '16 Slides]

Feb 18, 2017


Daniel Valcarce
Page 2: Additive Smoothing for Relevance-Based Language Modelling of Recommender Systems [CERI '16 Slides]

Outline

1. Recommender Systems

2. Pseudo-Relevance Feedback

3. Relevance-Based Language Modelling of Recommender Systems

4. IDF Effect and Additive Smoothing

5. Experiments

6. Conclusions and Future Directions


RECOMMENDER SYSTEMS


Recommender Systems

Recommender systems generate personalised suggestions for items that may be of interest to the users.

Top-N Recommendation: create a ranking of the N most relevant items for each user.

Collaborative filtering: exploit only user-item interactions (ratings, clicks, etc.).


PSEUDO-RELEVANCE FEEDBACK


Pseudo-Relevance Feedback (I)

In Information Retrieval, Pseudo-Relevance Feedback (PRF) is an automatic query expansion method.

The goal is to expand the original query with new terms to improve the quality of the search results.

These new terms are extracted automatically from a first retrieval using the original query.
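The loop described above can be sketched in a few lines of Python. This is a toy illustration only, not the method of any particular retrieval system: the term-overlap scorer, the frequency-based choice of expansion terms, the sample documents, and all names (`retrieve`, `expand_query`, `top_docs`, `n_terms`) are assumptions made for the example.

```python
from collections import Counter


def retrieve(query, docs, top_docs=2):
    """Rank documents by how many of their words are query terms (toy scorer)."""
    terms = set(query)
    scored = [(sum(1 for w in doc if w in terms), idx) for idx, doc in enumerate(docs)]
    # Highest score first; ties are broken by document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for score, idx in scored[:top_docs] if score > 0]


def expand_query(query, docs, top_docs=2, n_terms=2):
    """Add the most frequent new terms of the first retrieval to the query."""
    feedback = retrieve(query, docs, top_docs)  # pseudo-relevant set
    counts = Counter(w for idx in feedback for w in docs[idx] if w not in query)
    new_terms = [w for w, _ in counts.most_common(n_terms)]
    return list(query) + new_terms


# Made-up collection of tokenised documents.
docs = [
    ["california", "most", "populated", "state", "california"],
    ["texas", "populated", "state", "california"],
    ["weather", "forecast", "rain"],
]
expanded = expand_query(["populated", "state"], docs, n_terms=1)
```

In a full PRF pipeline the expanded query would then be submitted to the retrieval system for a second retrieval.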

Pseudo-Relevance Feedback (II)

[Diagram: Information need → query → Retrieval System → Query Expansion → expanded query]


RELEVANCE-BASED LANGUAGE MODELLING OF RECOMMENDER SYSTEMS


Pseudo-Relevance Feedback for Collaborative Filtering

PRF                             CF
User's query                    User's profile
most^1, populated^2, state^2    Titanic^2, Avatar^3, Matrix^5
Documents                       Neighbours
Terms                           Items


Relevance-Based Language Models (RM)

Relevance-Based Language Models or Relevance Models (RM) are a state-of-the-art PRF technique (Lavrenko & Croft, SIGIR 2001).

# Two models: RM1 and RM2.

# RM1 works better than RM2 in retrieval.

Relevance Models have been recently adapted to collaborative filtering (Parapar et al., IPM 2013).

# For recommendation, RM2 is the preferred method.


Relevance Models for Collaborative Filtering

RM2: $p(i \mid R_u) \propto p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(i \mid v)\, p(v)}{p(i)}\, p(j \mid v)$

# $I_u$ is the set of items rated by the user $u$.

# $V_u$ is the neighbourhood of the user $u$. This is computed using a clustering algorithm.

# $p(i)$ and $p(v)$ are the item and user priors.

# $p(i \mid u)$ is computed by smoothing the maximum likelihood estimate with the probability in the collection.
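The RM2 estimate above can be turned into a small scoring routine. The sketch below is only an illustration under stated assumptions: the ratings are made up, both priors are taken to be uniform, the neighbourhood is passed in by hand instead of coming from a clustering algorithm, and Jelinek-Mercer smoothing with a hypothetical `lam=0.5` stands in for "smoothing with the probability in the collection". The product is accumulated in log space to avoid underflow on long profiles.

```python
import math

# Toy ratings: user -> {item: rating}. All data here is made up for illustration.
ratings = {
    "u": {"a": 5, "b": 3},
    "v1": {"a": 4, "b": 2, "c": 5},
    "v2": {"a": 5, "c": 1, "d": 4},
}
items = sorted({i for prof in ratings.values() for i in prof})
total = sum(r for prof in ratings.values() for r in prof.values())
# Collection model p(i|C): share of the total rating mass received by item i.
p_coll = {i: sum(prof.get(i, 0) for prof in ratings.values()) / total for i in items}


def p_item_given_user(i, v, lam=0.5):
    """Maximum likelihood estimate smoothed with the collection (Jelinek-Mercer)."""
    prof = ratings[v]
    return (1 - lam) * prof.get(i, 0) / sum(prof.values()) + lam * p_coll[i]


def rm2_log_score(i, u, neighbours):
    """log of p(i) * prod_{j in I_u} sum_{v in V_u} p(i|v) p(v) / p(i) * p(j|v)."""
    p_i = 1 / len(items)       # uniform item prior (assumption)
    p_v = 1 / len(neighbours)  # uniform user prior (assumption)
    score = math.log(p_i)
    for j in ratings[u]:
        inner = sum(
            p_item_given_user(i, v) * p_v / p_i * p_item_given_user(j, v)
            for v in neighbours
        )
        score += math.log(inner)
    return score


def recommend(u, neighbours, n=2):
    """Rank the items that u has not rated by their RM2 log score."""
    candidates = [i for i in items if i not in ratings[u]]
    ranked = sorted(candidates, key=lambda i: rm2_log_score(i, u, neighbours), reverse=True)
    return ranked[:n]


top = recommend("u", ["v1", "v2"])
```

Because every smoothed probability is strictly positive, the inner sum never reaches zero and the logarithm is always defined.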


Collection-based Smoothing Techniques (I)

Absolute Discounting (AD)

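$p_\delta(i \mid u) = \frac{\max(r_{u,i} - \delta,\, 0) + \delta\,|I_u|\,p(i \mid C)}{\sum_{j \in I_u} r_{u,j}}$

Jelinek-Mercer (JM)

$p_\lambda(i \mid u) = (1 - \lambda)\,\frac{r_{u,i}}{\sum_{j \in I_u} r_{u,j}} + \lambda\,p(i \mid C)$

Dirichlet Priors (DP)

$p_\mu(i \mid u) = \frac{r_{u,i} + \mu\,p(i \mid C)}{\mu + \sum_{j \in I_u} r_{u,j}}$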
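A minimal sketch of the three collection-based estimates, written directly from the formulas on this slide. The function names, default parameter values, and sample data are assumptions chosen for illustration; they are not taken from the original work.

```python
def absolute_discounting(r_ui, profile_sum, n_rated, p_ic, delta=0.1):
    """(max(r_ui - delta, 0) + delta * |I_u| * p(i|C)) / sum_j r_uj"""
    return (max(r_ui - delta, 0.0) + delta * n_rated * p_ic) / profile_sum


def jelinek_mercer(r_ui, profile_sum, p_ic, lam=0.1):
    """(1 - lambda) * r_ui / sum_j r_uj + lambda * p(i|C)"""
    return (1 - lam) * r_ui / profile_sum + lam * p_ic


def dirichlet_priors(r_ui, profile_sum, p_ic, mu=100.0):
    """(r_ui + mu * p(i|C)) / (mu + sum_j r_uj)"""
    return (r_ui + mu * p_ic) / (mu + profile_sum)


# Hypothetical data: one user's ratings and a collection model that sums to 1.
profile = {"a": 5, "b": 3}
p_coll = {"a": 0.5, "b": 0.2, "c": 0.3}
profile_sum = sum(profile.values())

# Unrated items get r_ui = 0, so their probability comes only from the collection.
ad = {i: absolute_discounting(profile.get(i, 0), profile_sum, len(profile), p)
      for i, p in p_coll.items()}
jm = {i: jelinek_mercer(profile.get(i, 0), profile_sum, p) for i, p in p_coll.items()}
dp = {i: dirichlet_priors(profile.get(i, 0), profile_sum, p) for i, p in p_coll.items()}
```

All three estimates give the unrated item `c` a non-zero probability. In this sketch the Absolute Discounting values sum to one only because every rating in the profile is at least `delta`.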


Collection-based Smoothing Techniques (II)

Absolute Discounting, Jelinek-Mercer and Dirichlet Priors have been studied in the context of:

# Text Retrieval (Zhai & Lafferty, ACM TOIS 2004)

◦ Absolute Discounting performs very poorly.
◦ Dirichlet Priors is the most popular approach.
◦ Jelinek-Mercer is a bit better for long queries.

# Collaborative Filtering (Valcarce et al., ECIR 2015)

◦ Absolute Discounting is the best smoothing method.

Can we do better?


IDF EFFECT AND ADDITIVE SMOOTHING


Axiomatic Analysis of the IDF Effect in IR

A recent work performed an axiomatic analysis of several PRF methods (Hazimeh & Zhai, ICTIR 2015).

# They found out that RM1 with Dirichlet Priors and Jelinek-Mercer smoothing methods demote the IDF effect.

# The IDF effect is a desirable property that, intuitively, promotes documents with very specific terms.

Can we use this result in recommendation?

What is the IDF effect in recommendation? Is it a desirable property?

They studied RM1, what about RM2?


The IDF Effect in Recommendation (I)

This retrieval idea is related to the novelty in recommendation.

Definition (IDF effect)

A recommender system supports the IDF effect if $p(i_1 \mid R_u) > p(i_2 \mid R_u)$ when

# two items $i_1$ and $i_2$

# have the same ratings $r(v, i_1) = r(v, i_2)$ for all $v \in V_u$

# and different popularity $p(i_1 \mid C) < p(i_2 \mid C)$.

In simple words, if we have the same feedback for two items, we should recommend the less popular one.


The IDF Effect in Recommendation (II)

We performed an axiomatic analysis of RM2¹ using the following smoothing methods:

# Dirichlet Priors

# Jelinek-Mercer

# Absolute Discounting

Additive Smoothing

$p_\gamma(i \mid u) = \frac{r(u, i) + \gamma}{\sum_{j \in I_u} r(u, j) + \gamma\,|I|}$

¹ Math proofs in the paper!
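A minimal sketch of the additive estimate on this slide, placed next to a Jelinek-Mercer estimate to show how each reacts to item popularity. The ratings, catalogue size, popularity values, and parameter defaults are all hypothetical; the sketch only compares the smoothed probabilities themselves and is not a substitute for the axiomatic proofs in the paper.

```python
def additive(r_ui, profile_sum, n_items, gamma=0.01):
    """(r(u,i) + gamma) / (sum_j r(u,j) + gamma * |I|)"""
    return (r_ui + gamma) / (profile_sum + gamma * n_items)


def jelinek_mercer(r_ui, profile_sum, p_ic, lam=0.5):
    """(1 - lambda) * r(u,i) / sum_j r(u,j) + lambda * p(i|C)"""
    return (1 - lam) * r_ui / profile_sum + lam * p_ic


# Hypothetical neighbour who gave the same rating to items i1 and i2.
r_v = {"i1": 4, "i2": 4, "i3": 2}
profile_sum = sum(r_v.values())
catalogue = ["i1", "i2", "i3", "i4", "i5"]  # made-up item catalogue, |I| = 5
p_coll = {"i1": 0.05, "i2": 0.40}           # i1 is far less popular than i2

add_i1 = additive(r_v["i1"], profile_sum, len(catalogue))
add_i2 = additive(r_v["i2"], profile_sum, len(catalogue))
jm_i1 = jelinek_mercer(r_v["i1"], profile_sum, p_coll["i1"])
jm_i2 = jelinek_mercer(r_v["i2"], profile_sum, p_coll["i2"])

# The additive estimate is a proper distribution over the whole catalogue.
add_total = sum(additive(r_v.get(i, 0), profile_sum, len(catalogue)) for i in catalogue)
```

The additive estimate never looks at $p(i \mid C)$, so the two equally rated items receive the same probability, whereas the Jelinek-Mercer estimate gives the more popular item the higher value.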


EXPERIMENTS


Experimental settings

Datasets:

# MovieLens 100k

# MovieLens 1M

Metrics:

# Ranking accuracy: nDCG.

# Diversity: the complement of the Gini index.

# Novelty: mean self-information (MSI).
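The three metrics can be sketched as follows. The slides do not spell out the exact formulas, so these are common textbook variants and should be read as assumptions: nDCG uses the rating as gain with a log2 discount, the Gini index is computed over how often each catalogue item is recommended, and self-information is measured as minus the base-2 logarithm of the fraction of users who rated an item. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def ndcg_at_k(ranking, relevance, k=10):
    """nDCG@k with gain = rating and log2 position discount (one common variant)."""
    dcg = sum(relevance.get(i, 0) / math.log2(pos + 2) for pos, i in enumerate(ranking[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def gini_complement(recommendations, catalogue):
    """1 - Gini index of how often each catalogue item is recommended."""
    counts = Counter(i for rec in recommendations for i in rec)
    total = sum(counts.values())
    shares = sorted(counts.get(i, 0) / total for i in catalogue)  # ascending
    n = len(shares)
    gini = sum((2 * k - n - 1) * p for k, p in enumerate(shares, start=1)) / (n - 1)
    return 1 - gini


def msi(ranking, n_raters, n_users):
    """Mean self-information: average of -log2(fraction of users who rated the item)."""
    return sum(-math.log2(n_raters[i] / n_users) for i in ranking) / len(ranking)


# Made-up inputs for illustration.
perfect = ndcg_at_k(["a", "b", "c"], {"a": 3, "b": 2})
worse = ndcg_at_k(["c", "a"], {"a": 3, "b": 2})
spread = gini_complement([["a", "b"], ["c", "d"]], ["a", "b", "c", "d"])
concentrated = gini_complement([["a"], ["a"]], ["a", "b", "c", "d"])
novelty = msi(["a", "b"], {"a": 50, "b": 25}, n_users=100)
```

With these definitions, higher values are better for all three metrics: a complement of the Gini index close to one means recommendations are spread evenly over the catalogue, and a high MSI means the recommended items are rarely rated ones.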


Ranking accuracy

[Two line plots, not reproduced here. y-axis: nDCG@10; x-axes: δ, λ, µ × 10³ (0.1 to 1.0) and γ (0.001 to 10); series: Additive (γ), Absolute Discounting (δ), Jelinek-Mercer (λ), Dirichlet Priors (µ).]

Figure: Values of nDCG@10 on MovieLens 100k (left) and 1M (right).


Diversity

[Two line plots, not reproduced here. y-axis: Gini@10; x-axes: δ, λ, µ × 10³ (0.1 to 1.0) and γ (0.001 to 10); series: Additive (γ), Absolute Discounting (δ), Jelinek-Mercer (λ), Dirichlet Priors (µ).]

Figure: Values of Gini@10 on MovieLens 100k (left) and 1M (right).


Novelty

[Two line plots, not reproduced here. y-axis: MSI@10; x-axes: δ, λ, µ × 10³ (0.1 to 1.0) and γ (0.001 to 10); series: Additive (γ), Absolute Discounting (δ), Jelinek-Mercer (λ), Dirichlet Priors (µ).]

Figure: Values of MSI@10 on MovieLens 100k (left) and 1M (right).


G-measure of nDCG, Gini and MSI

[Two line plots, not reproduced here. y-axis: G(Gini@10, MSI@10, nDCG@10); x-axes: δ, λ, µ × 10³ (0.1 to 1.0) and γ (0.001 to 10); series: Additive (γ), Absolute Discounting (δ), Jelinek-Mercer (λ), Dirichlet Priors (µ).]

Figure: Values of the geometric mean among nDCG@10, Gini@10 and MSI@10 on MovieLens 100k (left) and 1M (right).
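For reference, the G-measure named in the caption is a plain geometric mean of the three metric values. The sketch below uses Python's standard library; the function name and the sample values are made up, and whether the original figures normalise the metrics before combining them is not stated in the slides.

```python
from statistics import geometric_mean


def g_measure(gini, msi, ndcg):
    """Geometric mean of the three metric values (all must be positive)."""
    return geometric_mean([gini, msi, ndcg])


# Hypothetical metric values, not results from the experiments.
example = g_measure(0.05, 10.0, 0.38)
```

A geometric mean rewards balanced behaviour: a method scoring close to zero on any single metric gets a low combined value even if the other two are high.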


CONCLUSIONS AND FUTURE DIRECTIONS


Conclusions

The IDF effect from IR is related to the novelty of the recommendations.

The use of collection-based smoothing methods with RM2 demotes the IDF effect.

Additive smoothing is a simple method that does not demote (nor promote) the IDF effect.

Additive smoothing provides better accuracy, diversity and novelty figures than collection-based smoothing methods.


Future work

Envision new ways of enhancing the IDF effect in RM2:

# Design smoothing methods that actively promote the IDF effect.

# Use non-uniform prior estimates.

Study axiomatically other IR properties that can be useful in recommendation.


THANK YOU!

@DVALCARCE
http://www.dc.fi.udc.es/~dvalcarce