Automatic Selection of Linked Open Data features in Graph-based Recommender Systems Cataldo Musto, Pierpaolo Basile, Marco de Gemmis Pasquale Lops, Giovanni Semeraro, Simone Rutigliano (Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group) CBRecSys 2015 Workshop on New Trends in Content-based Recommender Systems Vienna (Austria) September 20, 2015
Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Apr 14, 2017

Cataldo Musto
Transcript
Page 1: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Cataldo Musto, Pierpaolo Basile, Marco de Gemmis, Pasquale Lops, Giovanni Semeraro, Simone Rutigliano
(Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)

CBRecSys 2015 Workshop on New Trends in Content-based Recommender Systems
Vienna (Austria), September 20, 2015

Page 2: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Outline

• Basics
  • Linked Open Data
  • Graph-based Recommendations
  • PageRank with Priors
• Methodology
  • Introducing LOD-based features
  • Selecting LOD-based features
• Experiments
  • Impact of LOD-based features
  • Comparison to baselines
  • Trade-off F1/Diversity
• Conclusions

Cataldo Musto, Pierpaolo Basile, Marco de Gemmis, Pasquale Lops, Giovanni Semeraro, Simone Rutigliano. Automatic Selection of Linked Open Data features in Graph-based Recommender Systems. CBRecSys 2015 Workshop, Vienna, 20.09.2015

Page 3: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems


Linked Open Data


Page 4: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Linked Open Data

What are you talking about? (*)

(*) basic Italian gesture

Page 5: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Linked Open Data

Definition: a methodology to publish, share and link structured data on the Web

Page 6: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Linked Open Data (cloud)
What is it?

A (large) set of interconnected semantic datasets

Page 7: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Linked Open Data (cloud)
What kind of datasets?

(figure: the LOD cloud diagram)

Each bubble is a dataset!

Page 8: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Linked Open Data (cloud)
How much data?

1,048 datasets and 58 billion triples
source: http://stats.lod2.eu

(slide from CBRecSys 2014 presentation)

Page 9: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Linked Open Data (cloud)
How much data?

3,426 datasets and 86 billion triples today!
source: http://stats.lod2.eu

Page 10: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Linked Open Data (cloud)

DBpedia
http://dbpedia.org

DBpedia is the structured mapping of Wikipedia.
It is the core of the LOD cloud.

Page 11: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Linked Open Data (cloud)
Example: unstructured content from Wikipedia

(figure: example Wikipedia page)

Page 12: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Linked Open Data (cloud)
How are these data represented?

(figure: DBpedia graph centered on "The Matrix", with the following outgoing edges)
• dbpedia-owl:director → The Wachowskis (http://dbpedia.org/resource/The_Wachowskis)
• dbpedia-owl:musicComposer → Don Davis (http://dbpedia.org/resource/Don_Davis_(composer))
• dcterms:subject → Films shot in Australia (http://dbpedia.org/resource/Category:Films_shot_in_Australia)
• dcterms:subject → Dystopian Films (http://dbpedia.org/resource/Dystopian_FIlms)
• dcterms:subject → 1999 Films (http://dbpedia.org/resource/1999_FIlms)
• dcterms:subject → American Action Thriller Films (http://dbpedia.org/resource/American_Action_Thriller_FIlms)
• dcterms:subject → Cyberpunk Films (http://dbpedia.org/resource/Cyberpunk_Films)
• dcterms:subject → Films about Rebellions (http://dbpedia.org/resource/Films_About_Rebellions)

Page 13: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Linked Open Data (cloud)
How are these data represented?

(figure: the same DBpedia graph for "The Matrix" as on the previous slide)

Page 14: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Linked Open Data (cloud)
How are these data represented?

(figure: Semantic Web cake)

Information from the LOD cloud is represented in RDF

Page 15: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Linked Open Data (cloud)
How are these data represented?

(figure: DBpedia graph centered on "The Matrix", with the following outgoing edges)
• dbpedia-owl:director → The Wachowskis (http://dbpedia.org/resource/The_Wachowskis)
• dbpedia-owl:musicComposer → Don Davis (http://dbpedia.org/resource/Don_Davis_(composer))
• dbo:runtime → 136
• dcterms:subject → Films shot in Australia (http://dbpedia.org/resource/Category:Films_shot_in_Australia)
• dcterms:subject → Dystopian Films (http://dbpedia.org/resource/Dystopian_FIlms)
• dcterms:subject → American Action Thriller Films (http://dbpedia.org/resource/American_Action_Thriller_FIlms)
• dcterms:subject → Cyberpunk Films (http://dbpedia.org/resource/Cyberpunk_Films)
• dcterms:subject → Films about Rebellions (http://dbpedia.org/resource/Films_About_Rebellions)

Page 16: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Linked Open Data (cloud)
How are these data represented?

(figure: the same DBpedia graph for "The Matrix", including dbo:runtime → 136)

Several interesting (non-trivial) features come into play!
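The graph for "The Matrix" can be read as a set of (subject, property, value) triples. The sketch below is only an illustration of that reading: it stores the triples from the slide as plain Python tuples and groups the values by property. The function name `features_by_property` is hypothetical, and no RDF library is involved.

```python
from collections import defaultdict

# (subject, property, value) triples transcribed from the slide's graph.
# Prefixed names are kept as plain strings for illustration only.
TRIPLES = [
    ("The Matrix", "dbpedia-owl:director", "The Wachowskis"),
    ("The Matrix", "dbpedia-owl:musicComposer", "Don Davis"),
    ("The Matrix", "dbo:runtime", "136"),
    ("The Matrix", "dcterms:subject", "Films shot in Australia"),
    ("The Matrix", "dcterms:subject", "Dystopian Films"),
    ("The Matrix", "dcterms:subject", "American Action Thriller Films"),
    ("The Matrix", "dcterms:subject", "Cyberpunk Films"),
    ("The Matrix", "dcterms:subject", "Films about Rebellions"),
]


def features_by_property(triples, subject):
    """Group the values attached to `subject` by property name."""
    grouped = defaultdict(list)
    for s, prop, value in triples:
        if s == subject:
            grouped[prop].append(value)
    return dict(grouped)


features = features_by_property(TRIPLES, "The Matrix")
for prop, values in features.items():
    print(prop, "->", values)
```

Each property then becomes a candidate feature for the item, which is the sense in which the slide speaks of non-trivial features.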

Page 17: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems


Research Question

Can we use Linked Open Data for Recommender Systems?

Page 18: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Contribution

We investigate the impact of injecting exogenous knowledge from the LOD cloud into graph-based recommender systems

Page 19: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems


Graphs

Why graphs?

Page 20: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Graphs

First, because the LOD cloud is a graph

Page 21: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems


Graphs

Graphs are the most straightforward representation for LOD-based data points

Page 22: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems


Why graphs?

Second, graphs can easily model the recommendation task

Page 23: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Graph-based RecSys (Bipartite graph)

Users = nodes
Items = nodes
Preferences = edges

Very intuitive representation!

(figure: bipartite graph linking users u1, u2, u3, u4 to items i1, i2, i3, i4)

Page 24: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Graph-based RecSys

(figure: bipartite graph linking users u1, u2, u3, u4 to items i1, i2, i3, i4)

Recommendations are obtained by identifying the most relevant (item) nodes for a target user, according to the graph topology.

Page 25: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Graph-based RecSys

(figure: bipartite graph linking users u1, u2, u3, u4 to items i1, i2, i3, i4)

Recommendations are obtained by identifying the most relevant (item) nodes for a target user, according to the graph topology.

How can we obtain such information?

Page 26: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Graph-based Recommendations

PageRank

PageRank calculates the ‘importance’ of a node according to the quality and the number of its connections
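As a rough illustration of classic PageRank on a user-item graph, the sketch below runs a fixed number of power iterations on an undirected graph. The edge list `EDGES`, the damping factor of 0.85 and the iteration count are assumptions made for this example; the actual preference edges of the slide's figure are not recoverable from the transcript.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Classic PageRank on an undirected graph given as (node, node) pairs."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    nodes = sorted(neighbors)
    n = len(nodes)
    # Every node starts with the same probability, and teleports are uniform.
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] / len(neighbors[u]) for u in neighbors[v])
            new_rank[v] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Hypothetical user-item preferences (not taken from the slides).
EDGES = [
    ("u1", "i1"), ("u1", "i4"),
    ("u2", "i2"), ("u2", "i4"),
    ("u3", "i3"),
    ("u4", "i3"), ("u4", "i4"),
]

scores = pagerank(EDGES)
items = sorted((v for v in scores if v.startswith("i")),
               key=scores.get, reverse=True)
print(items)
```

In this toy graph the best-connected item ends up first in the ranking, and the ranking is the same whichever user asks for it.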

Page 27: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Graph-based RecSys

(figure: bipartite graph linking users u1, u2, u3, u4 to items i1, i2, i3, i4)

It is likely that i4 is suggested to u3, since it is more (and better) connected in the graph

Page 28: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Graph-based RecSys

It is likely that i4 is suggested to u3, since it is more (and better) connected in the graph

Issue: Classic PageRank is not personalized!

Page 29: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Graph-based RecSys

Issue: Classic PageRank is not personalized!

When PageRank is run, all the nodes are provided with an evenly distributed probability (1/8 for each of the eight nodes in the figure)

Page 30: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Graph-based RecSys

When PageRank is run, all the nodes are provided with an evenly distributed probability (1/8 for each of the eight nodes in the figure)

Reference: Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: bringing order to the Web.

Page 31: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Graph-based RecSys

When PageRank is run, all the nodes are provided with an evenly distributed probability (1/8 for each of the eight nodes in the figure)

All the users are provided with the same ranking, which is independent of user preferences

Reference: Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: bringing order to the Web.

Page 32: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Graph-based RecSys

When PageRank is run, all the nodes are provided with an evenly distributed probability (1/8 for each of the eight nodes in the figure)

Solution: PageRank with Priors

Reference: Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). The PageRank citation ranking: bringing order to the Web.

Page 33: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Graph-based Recommendations

Rationale: PageRank with Priors

Random walks have to be influenced by previous user behavior (preferences!). The probability cannot be evenly distributed.

Reference: T. H. Haveliwala. Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search. IEEE Trans. Knowl. Data Eng., 15(4):784–796, 2003.

Page 34: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

PageRank with Priors

A large probability (e.g. 80%) is assigned a priori to specific items

(figure: two nodes receive a prior of 3/8 each; the remaining six nodes receive 0.33/8 each)

Page 35: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

PageRank with Priors

A large probability (e.g. 80%) is assigned a priori to specific items, e.g., the items a user liked!

(figure: two nodes receive a prior of 3/8 each; the remaining six nodes receive 0.33/8 each)

Page 36: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

PageRank with Priors

A large probability (e.g. 80%) is assigned a priori to specific items, e.g., the items a user liked!

(figure: two nodes receive a prior of 3/8 each; the remaining six nodes receive 0.33/8 each)

PageRank is run, and calculations are thus influenced by such a different distribution of a priori probabilities.
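A minimal sketch of the idea, assuming the usual personalized formulation in which the teleport step follows the prior vector instead of a uniform one. The edge list, the damping factor of 0.85 and the helper name `priors_for_user` are assumptions for this example; the 80% share given to liked items follows the "e.g. 80%" on the slide.

```python
def pagerank_with_priors(edges, prior, damping=0.85, iterations=50):
    """PageRank whose teleport step follows `prior` instead of a uniform vector."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    nodes = sorted(neighbors)
    total = sum(prior.get(v, 0.0) for v in nodes)
    p = {v: prior.get(v, 0.0) / total for v in nodes}  # normalise to sum to 1
    rank = dict(p)
    for _ in range(iterations):
        rank = {
            v: (1.0 - damping) * p[v]
            + damping * sum(rank[u] / len(neighbors[u]) for u in neighbors[v])
            for v in nodes
        }
    return rank


def priors_for_user(nodes, liked, liked_mass=0.8):
    """Give `liked_mass` of the prior to the liked items, the rest to all other nodes."""
    liked = set(liked)
    others = [v for v in nodes if v not in liked]
    prior = {v: liked_mass / len(liked) for v in liked}
    prior.update({v: (1.0 - liked_mass) / len(others) for v in others})
    return prior


# Hypothetical user-item preferences (not taken from the slides).
EDGES = [
    ("u1", "i1"), ("u1", "i4"),
    ("u2", "i2"), ("u2", "i4"),
    ("u3", "i3"),
    ("u4", "i3"), ("u4", "i4"),
]
NODES = sorted({v for edge in EDGES for v in edge})

scores_u3 = pagerank_with_priors(EDGES, priors_for_user(NODES, {"i3"}))
scores_u1 = pagerank_with_priors(EDGES, priors_for_user(NODES, {"i1", "i4"}))
```

With different priors the two users obtain different scores over the same graph, which is the personalization that classic PageRank lacks.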

Page 37: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

PageRank with Priors is a very good recommendation model.

Reference: Musto, C., Basile, P., Lops, P., de Gemmis, M., & Semeraro, G. (2014). Linked Open Data-enabled Strategies for Top-N Recommendations. CBRecSys 2014

Page 38: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

PageRank with Priors is a very good recommendation model.

We want to exploit Linked Open Data to further improve it.

Reference: Musto, C., Basile, P., Lops, P., de Gemmis, M., & Semeraro, G. (2014). Linked Open Data-enabled Strategies for Top-N Recommendations. CBRecSys 2014

Page 39: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Methodology

Graphs provide a uniform and solid representation to model collaborative (user preferences) data points as well as LOD-based ones

Page 40: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Methodology

If we are able to map the items in the dataset to the entities in the LOD cloud, our representation can be extended with new data points
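One way to picture this extension, under the assumption that each mapped item comes with a list of (item, property, value) triples: the user-item graph gains one node per LOD value and one edge per triple. The function name `build_graph` and all data below are hypothetical; which item links to which value is not recoverable from the slides.

```python
def build_graph(preferences, lod_triples=()):
    """Return an undirected adjacency map over users, items and LOD values."""
    graph = {}

    def link(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    for user, item in preferences:
        link(user, item)
    items = {item for _user, item in preferences}
    for item, _prop, value in lod_triples:
        if item in items:  # only extend items that occur in the dataset
            link(item, value)
    return graph


# Hypothetical preferences and LOD triples, for illustration only.
PREFERENCES = [("u1", "i1"), ("u2", "i2"), ("u3", "i3"), ("u4", "i4")]
LOD_TRIPLES = [
    ("i1", "dbprop:director", "Quentin Tarantino"),
    ("i2", "dbprop:director", "Quentin Tarantino"),
    ("i3", "dcterms:subject", "1999 films"),
    ("i4", "dcterms:subject", "1999 films"),
    ("i4", "dcterms:subject", "Films about Rebellions"),
]

g = build_graph(PREFERENCES)
g_lod = build_graph(PREFERENCES, LOD_TRIPLES)
```

In the plain graph i1 and i2 are unrelated, while in the extended one they are two hops apart through the shared director node.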

Page 41: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Methodology
example - original representation

(figure: user-item graph with users u1, u2, u3, u4 and their items)

Page 42: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Methodology
example - LOD-boosted representation

(figure: the user-item graph for u1, u2, u3, u4 extended with LOD nodes. Items are linked via dcterms:subject to "Films about Rebellions" (http://dbpedia.org/resource/Films_About_Rebellions) and "1999 films" (http://dbpedia.org/resource/1999_films), and via dbprop:director to "Quentin Tarantino")

Page 43: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Methodology
example - LOD-boosted representation

(figure: the user-item graph for u1, u2, u3, u4 extended with LOD nodes. Items are linked via dcterms:subject to "Films about Rebellions" (http://dbpedia.org/resource/Films_About_Rebellions) and "1999 films" (http://dbpedia.org/resource/1999_films), and via dbprop:director to "Quentin Tarantino")

A lot of new information can be injected into the graph

Page 44: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Methodology
example - LOD-boosted representation

(figure: the LOD-boosted user-item graph, with dcterms:subject and dbprop:director edges)

PageRank with Priors can be run again against this novel representation

Page 45: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Methodology
example - LOD-boosted representation

(figure: the LOD-boosted user-item graph, with dcterms:subject and dbprop:director edges)

How do the recommendations change with such a new topology?

Page 46: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Methodology
example - LOD-boosted representation

(figure: the LOD-boosted user-item graph, with dcterms:subject and dbprop:director edges)

Is there any significant increase in terms of computational load?

Page 47: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Methodology
example - LOD-boosted representation

(figure: the LOD-boosted user-item graph, with dcterms:subject and dbprop:director edges)

Is it necessary to inject all of the properties available in the LOD cloud?

Page 48: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Methodology
example - LOD-boosted representation

(figure: the LOD-boosted user-item graph, with two of the LOD edges crossed out)

Page 49: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Methodology
example - LOD-boosted representation

(figure: the LOD-boosted user-item graph, with three of the LOD edges crossed out)

Page 50: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Methodology
example - LOD-boosted representation

(figure: the LOD-boosted user-item graph, with dcterms:subject and dbprop:director edges)

Is it possible to automatically select the most promising properties?
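Whatever technique produces the relevance scores, the selection step itself can be pictured as keeping the k best-scoring properties and dropping every triple that uses another property. The sketch below assumes the scores are already available; the function names and the score values are hypothetical and do not come from the slides.

```python
def select_properties(scores, k):
    """Return the k property names with the highest score (ties broken by name)."""
    ranked = sorted(scores, key=lambda prop: (-scores[prop], prop))
    return set(ranked[:k])


def filter_triples(triples, keep):
    """Keep only the (item, property, value) triples whose property was selected."""
    return [t for t in triples if t[1] in keep]


# Hypothetical relevance scores, e.g. produced by a feature selection technique.
SCORES = {"dcterms:subject": 0.9, "dbprop:director": 0.7, "dbprop:runtime": 0.1}
TRIPLES = [
    ("i1", "dbprop:director", "Quentin Tarantino"),
    ("i1", "dbprop:runtime", "154"),
    ("i4", "dcterms:subject", "1999 films"),
]

selected = select_properties(SCORES, k=2)
reduced = filter_triples(TRIPLES, selected)
```

Only the reduced list of triples would then be injected into the graph, which keeps the graph smaller than the full LOD-boosted one.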

Page 51: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems


Experiments


Page 52: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Research Questions (1/2)

1. Do graph-based recommender systems benefit from the introduction of LOD-based features?

2. Do graph-based recommender systems exploiting LOD benefit from the adoption of feature selection techniques?

Page 53: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Research Questions (2/2)

3. Is there any correlation between the choice of the FS technique and the behavior of the algorithm (e.g., better diversity or better F1)?

4. How does our methodology perform with respect to state-of-the-art algorithms?

Page 54: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experimental Evaluation
Description of the dataset

MovieLens 100k
• 983 users
• 1,682 movies
• 100,000 ratings
• 55.17% positive ratings
• 84.43 ratings/user (avg.)
• 48.48 ratings/item (avg.)
• 93.7% sparsity

Page 55: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experimental Evaluation
DBpedia mapping

1,600 movies (95%) were successfully mapped to DBpedia by querying a SPARQL endpoint with the title of the movie.
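The slides do not show the query that was used. As one possible shape for such a title lookup, the sketch below builds a SPARQL query string that matches films by their `rdfs:label`. The helper name `build_title_query`, the use of the `dbo:Film` class, the language tag and the `LIMIT` are all assumptions; the function only builds the text and does not contact any endpoint.

```python
def build_title_query(title, lang="en", limit=5):
    """Build a SPARQL query that looks up DBpedia films by exact label."""
    # Escape backslashes and double quotes so the title is a valid literal.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT DISTINCT ?film WHERE {\n"
        "  ?film a dbo:Film ;\n"
        f'        rdfs:label "{escaped}"@{lang} .\n'
        "}\n"
        f"LIMIT {limit}"
    )


query = build_title_query("The Matrix")
print(query)
```

An exact label match is a deliberately simple choice for the sketch; titles that differ from the DBpedia label (for instance because of a year suffix) would need a looser match.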

Page 56: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experimental Evaluation
DBpedia mapping

60 properties were extracted from the LOD cloud

Page 57: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experimental Evaluation
Most Popular LOD properties

Top-10 properties in the Movie domain
dcterms:subject
dbprop:starring
dbprop:producer
dbprop:title
dbprop:writer
dbprop:country
dbprop:distributor
dbprop:music
dbprop:director
dbprop:runtime

Page 58: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experimental Evaluation
Experimental Protocol

Algorithm: PageRank with Priors
Data Split: 5-fold Cross Validation
Graph Representation: G, G_LOD, G_LOD+FS
Feature Selection Techniques: PageRank, Chi-Square, Information Gain, Gain Ratio, mRMR, PCA, SVM
#Selected Features: 10, 30, 50 (out of 60 overall features)
Evaluation Metrics: F1, Intra-List Diversity, Run Time
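To make the base algorithm of the protocol concrete, here is a minimal power-iteration sketch of PageRank with Priors on a tiny undirected user-item graph. It is not the authors' implementation: the damping factor of 0.85, the fixed number of iterations, the uniform priors on the items liked by the target user, and all node names are assumptions chosen for illustration.

```python
def pagerank_with_priors(adj, priors, damping=0.85, iters=100):
    """Personalized PageRank: the random jump follows `priors`, not a uniform law."""
    nodes = list(adj)
    total = sum(priors.get(n, 0.0) for n in nodes)
    prior = {n: priors.get(n, 0.0) / total for n in nodes}
    rank = dict(prior)
    for _ in range(iters):
        # Teleportation mass is distributed according to the prior vector.
        nxt = {n: (1.0 - damping) * prior[n] for n in nodes}
        for n in nodes:
            neighbours = adj[n]
            if not neighbours:
                # Dangling node: send its mass back through the priors.
                for m in nodes:
                    nxt[m] += damping * rank[n] * prior[m]
                continue
            share = damping * rank[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        rank = nxt
    return rank


# Toy graph: u1 liked i1 and i2, u2 liked i2 and i3 (hypothetical data).
adj = {
    "u1": ["i1", "i2"],
    "u2": ["i2", "i3"],
    "i1": ["u1"],
    "i2": ["u1", "u2"],
    "i3": ["u2"],
}
# Priors for target user u1: all prior mass on the items u1 liked.
rank = pagerank_with_priors(adj, {"i1": 0.5, "i2": 0.5})
candidates = sorted((n for n in ("i3",)), key=rank.get, reverse=True)
print(candidates, round(rank["i3"], 4))
```

Recommendations for a user would then be the unseen items ranked by their score; on G_LOD the same routine simply runs over a larger adjacency structure.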

Page 59: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experimental Evaluation
Graph Representations :: Recap

G: basic graph with collaborative data points

Page 60: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experimental Evaluation
Graph Representations :: Recap

G_LOD: graph extended with all the properties gathered from the LOD cloud

Page 61: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experimental Evaluation
Graph Representations :: Recap

G_LOD+FS: graph encoding only the most relevant properties selected by a feature selection technique FS
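The three representations differ only in which edges are included, which the following sketch illustrates. The function `build_graph`, the tuple layouts, and the sample ratings and triples are hypothetical and not taken from the slides.

```python
def build_graph(ratings, triples=(), selected=None):
    """Return (nodes, edges) for G, G_LOD or G_LOD+FS.

    ratings  : iterable of (user, item) pairs        -> collaborative edges
    triples  : iterable of (item, property, value)   -> LOD edges
    selected : set of property names kept by feature selection,
               or None to keep every property
    """
    edges = {(user, item) for user, item in ratings}
    for item, prop, value in triples:
        if selected is None or prop in selected:
            edges.add((item, value))
    nodes = {node for edge in edges for node in edge}
    return nodes, edges


# Illustrative data only.
ratings = [("u1", "The_Matrix"), ("u2", "The_Matrix"), ("u2", "Inception")]
triples = [
    ("The_Matrix", "dbprop:director", "The_Wachowskis"),
    ("The_Matrix", "dbprop:runtime", "136"),
    ("Inception", "dbprop:director", "Christopher_Nolan"),
]

g = build_graph(ratings)                                   # G
g_lod = build_graph(ratings, triples)                      # G_LOD
g_fs = build_graph(ratings, triples, {"dbprop:director"})  # G_LOD+FS
print(len(g[0]), len(g_lod[0]), len(g_fs[0]))
```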

Page 62: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 1
Impact of LOD-based features

Page 63: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 1
Impact of LOD-based features :: F1-measure

         F1@5    F1@10   F1@15
G        53.89   60.23   59.41
G_LOD    54.24   60.83   59.63

Significant improvement in all the metrics (Wilcoxon test)
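For readers unfamiliar with the metric, F1@k can be computed per user as the harmonic mean of precision and recall over the top-k recommendations. This is a generic sketch; the slides do not state how relevant items were defined or how scores were averaged, so the function name and the sample lists are assumptions.

```python
def f1_at_k(recommended, relevant, k):
    """Harmonic mean of precision@k and recall@k for a single user."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ranking and ground truth.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
score = f1_at_k(recommended, relevant, k=5)
print(score)
```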

Page 64: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 1
Impact of LOD-based features :: Run Time

Run Time (min.)
G        72
G_LOD    880

Tremendous increase in the run time, as well

Page 65: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 1
Impact of LOD-based features :: Graph Representation

         Nodes    Edges
G        2,625    100,000
G_LOD    53,794   178,020

Tremendous increase in the run time, depending on the amount of information encoded in the graph

Page 66: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 1
Impact of LOD-based features :: LESSONS LEARNED

1. LOD features can have a good impact on the overall F1
2. Tremendous run time increase

Page 67: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 2
Impact of Feature Selection techniques

Page 68: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 2
Impact of Feature Selection techniques :: F1@5

[Figure: bar chart of F1@5 for each feature selection technique (PageRank, mRMR, Chi-Square, SVM, Gain Ratio, Inf. Gain, PCA) with 10, 30 and 50 selected features; values range from 53.72 to 54.31.]

Typically, the larger the number of features, the better the F1

Page 69: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 2
Impact of Feature Selection techniques :: F1@5

[Figure: bar chart of F1@5 for each feature selection technique with 10, 30 and 50 selected features, annotated with the best-performing number of features per technique: #50 for five techniques and #30 for two.]

Typically, the larger the number of features, the better the F1

Page 70: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 2
Impact of Feature Selection techniques :: F1@5

[Figure: bar chart of F1@5 for each feature selection technique with 10, 30 and 50 selected features, annotated with the best-performing number of features per technique: #50 for five techniques and #30 for two.]

VERY IMPORTANT: we noted a dataset-dependent behavior

Page 71: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 2
Impact of Feature Selection techniques :: F1@5

[Figure: bar chart of F1@5 for each feature selection technique with 10, 30 and 50 selected features.]

How does it perform with respect to the baseline?

Page 72: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 2
Impact of Feature Selection techniques :: F1@5

G_LOD (baseline) = 54.24

[Figure: bar chart of F1@5 for each feature selection technique with 10, 30 and 50 selected features, compared against the G_LOD baseline.]

Page 73: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 2
Impact of Feature Selection techniques :: F1@5

[Figure: bar chart of F1@5 for each feature selection technique with 10, 30 and 50 selected features, compared against the G_LOD baseline.]

Only three out of seven techniques (and only with 50 features) outperform the baseline

Page 74: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 2
Impact of Feature Selection techniques :: F1@5

[Figure: bar chart of F1@5 for each feature selection technique with 10, 30 and 50 selected features.]

Overall, PCA is the best-performing feature selection technique

Page 75: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 2
Impact of Feature Selection techniques :: Run Time

Run Time (min.)
G_LOD        880
G_LOD+PCA    581

33.9% decrease

Page 76: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 2
Impact of Feature Selection techniques :: Run Time

             Nodes    Edges
G_LOD        53,794   178,020
G_LOD+PCA    49,158   169,405

-8.6% nodes and -4.8% edges
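The percentages quoted for Experiment 2 follow directly from the reported counts. The short check below recomputes them; the helper name `relative_change` is illustrative, while the input numbers are the ones given on the slides.

```python
def relative_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


nodes = relative_change(53_794, 49_158)    # G_LOD -> G_LOD+PCA, nodes
edges = relative_change(178_020, 169_405)  # G_LOD -> G_LOD+PCA, edges
runtime = relative_change(880, 581)        # run time in minutes

print(f"nodes {nodes:.1f}%, edges {edges:.1f}%, run time {runtime:.2f}%")
```

A graph that shrinks by less than 10% yields a run time roughly one third shorter, so the saving is clearly more than proportional to the reduction in graph size.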

Page 77: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 2
Recap of the Experiment

                  G        G_LOD    G_LOD+PCA
F1@5              0.5406   0.5424   0.5431
F1@10             0.6068   0.6083   0.6088
F1@15             0.5956   0.5963   0.5970
Run Time (min.)   72       880      581
LOD Properties    0        60       50

The adoption of Feature Selection techniques improves F1 and decreases run time

Page 78: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 2
Impact of Feature Selection techniques :: LESSONS LEARNED

1. Feature Selection techniques are useful, but they do not always improve F1
2. Computational load drops, as expected
3. The optimal number of features is dataset-dependent

Page 79: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 3
Trade-off between F1 and diversity

Page 80: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 3
Trade-off between F1 and diversity

Can the choice of the feature selection technique endogenously induce a higher diversity (or, respectively, a higher F1) of the recommendations?
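Diversity in the protocol is measured as Intra-List Diversity. One common way to compute it is the average pairwise dissimilarity between the items of a recommendation list; the sketch below uses Jaccard distance over item feature sets. The choice of Jaccard distance, the function names, and the sample features are assumptions, since the slides do not specify the dissimilarity function.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |a ∩ b| / |a ∪ b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def intra_list_diversity(rec_list, features) -> float:
    """Average pairwise distance between the items of one recommendation list."""
    pairs = list(combinations(rec_list, 2))
    if not pairs:
        return 0.0
    total = sum(jaccard_distance(features[x], features[y]) for x, y in pairs)
    return total / len(pairs)


# Hypothetical item features (e.g. LOD property values).
features = {"m1": {"a", "b"}, "m2": {"a", "b"}, "m3": {"c"}}
ild = intra_list_diversity(["m1", "m2", "m3"], features)
print(round(ild, 3))
```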

Page 81: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 3
Trade-off between F1 and diversity :: F1@5

Feature Selection techniques can be split into four classes

Page 82: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 3
Trade-off between F1 and diversity :: F1@5

Feature Selection techniques can be split into four classes
- Low Diversity, Low Accuracy

Page 83: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 3
Trade-off between F1 and diversity :: F1@5

Feature Selection techniques can be split into four classes
- Low Diversity, Low Accuracy
- Low Diversity, High Accuracy

Page 84: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 3
Trade-off between F1 and diversity :: F1@5

Feature Selection techniques can be split into four classes
- Low Diversity, Low Accuracy
- Low Diversity, High Accuracy
- High Diversity, Low Accuracy

Page 85: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 3
Trade-off between F1 and diversity :: F1@5

Feature Selection techniques can be split into four classes
- Low Diversity, Low Accuracy
- Low Diversity, High Accuracy
- High Diversity, Low Accuracy
- High Diversity, High Accuracy

Page 86: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 3
Trade-off between F1 and diversity :: F1@5

Feature Selection techniques can be split into four classes
- Low Diversity, Low Accuracy
- Low Diversity, High Accuracy
- High Diversity, Low Accuracy
- High Diversity, High Accuracy

Page 87: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 3
Trade-off between F1 and diversity :: F1@5

mRMR, Information Gain and Chi-Square are not useful

Page 88: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 3
Trade-off between F1 and diversity :: F1@5

PCA maximizes F1, at the expense of a little diversity

Page 89: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 3
Trade-off between F1 and diversity :: F1@5

Gain Ratio and SVM sacrifice F1 to induce a higher diversity

Page 90: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 3
Trade-off between F1 and diversity :: F1@5

PageRank obtains a good compromise between F1 and Diversity

Page 91: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 3
Trade-off between F1 and Diversity :: LESSONS LEARNED

1. Feature Selection techniques can maximize a specific evaluation metric, so a graph-based recommender system can be tuned according to the requirements of a scenario
2. Behavior needs to be generalized by analyzing different datasets

Page 92: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 4
Comparison to State of the Art

BPRMF (Bayesian Personalized Ranking) [+]
U2U-KNN (User to User CF)
I2I-KNN (Item to Item CF)
POPULAR (Popularity-based baseline)

[+] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme: BPR: Bayesian Personalized Ranking from Implicit Feedback. UAI 2009.

Page 93: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 4
Comparison to State of the Art

[Figure: bar chart comparing F1@5 and F1@10 for I2I-KNN, U2U-KNN, BPRMF, POPULAR and PR (G_LOD+PCA). PR (G_LOD+PCA) scores 54.31 (F1@5) and 60.88 (F1@10); the other bars lie between 50.22 and 52.20 (F1@5) and between 58.35 and 59.70 (F1@10).]

Page 94: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 4
Comparison to State of the Art

[Figure: bar chart comparing F1@5 and F1@10 for I2I-KNN, U2U-KNN, BPRMF, POPULAR and PR (G_LOD+PCA). PR (G_LOD+PCA) scores 54.31 (F1@5) and 60.88 (F1@10); the other bars lie between 50.22 and 52.20 (F1@5) and between 58.35 and 59.70 (F1@10).]

PageRank with Priors boosted with LOD is the best-performing approach

Page 95: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Experiment 4
Comparison to State of the Art

[Figure: bar chart comparing F1@5 and F1@10 for I2I-KNN, U2U-KNN, BPRMF, POPULAR and PR (G_LOD+PCA). PR (G_LOD+PCA) scores 54.31 (F1@5) and 60.88 (F1@10); the other bars lie between 50.22 and 52.20 (F1@5) and between 58.35 and 59.70 (F1@10).]

Even state-of-the-art approaches based on Matrix Factorization are outperformed by our methodology

Page 96: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Conclusions


Page 97: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Recap

INVESTIGATION ABOUT THE EFFECTIVENESS OF LINKED OPEN DATA IN GRAPH-BASED RECOMMENDER SYSTEMS

Methodology
1. PageRank with Priors as base algorithm
2. Mapping of the items with nodes in the Linked Open Data Cloud
3. Expansion of the data points and injection of new nodes and edges
4. Use of feature selection to automatically select the most promising properties
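As an illustration of the last methodology step, the sketch below ranks LOD properties by Information Gain (one of the seven techniques evaluated) and keeps the top k. How the authors encoded properties and class labels is not described in these slides, so the data layout (one column of values per property, one binary like/dislike label per row), the function names, and the sample data are all assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels) -> float:
    """Entropy reduction obtained by splitting `labels` on `values`."""
    total = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        gain -= len(subset) / total * entropy(subset)
    return gain


def select_top_k(columns, labels, k):
    """Return the k property names with the highest information gain."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data: four rated items, label 1 = liked, 0 = disliked.
labels = [1, 1, 0, 0]
columns = {
    "dbprop:director": ["x", "x", "y", "y"],      # perfectly informative
    "dbprop:runtime": ["90", "90", "90", "90"],   # constant, no information
    "dbprop:country": ["us", "uk", "us", "uk"],   # independent of the label
}
selected = select_top_k(columns, labels, k=1)
print(selected)
```

Only the edges of the selected properties would then be kept when building G_LOD+FS.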

Page 98: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Lessons Learned

INVESTIGATION ABOUT THE EFFECTIVENESS OF LINKED OPEN DATA IN GRAPH-BASED RECOMMENDER SYSTEMS

Evaluation
1. PageRank with Priors benefits from the injection of data points coming from the LOD cloud.
2. Feature Selection techniques improve the results but need to be properly tuned, since their usage is not always useful.
3. A significant connection exists between the choice of the feature selection technique and the maximization of a specific evaluation metric.
4. PageRank with Priors boosted with LOD significantly outperforms state-of-the-art approaches.

Page 99: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Future Research

- Evaluation against different datasets and stronger baselines
- Further expansion of the graph, by introducing more LOD-based data points
- Evaluation of Novelty and Serendipity on LOD-based Recommendations

Page 100: Automatic Selection of Linked Open Data features in Graph-based Recommender Systems

Questions?

Cataldo Musto, Ph.D.

[email protected]