Evaluation of Collaborative Filtering Algorithms for Recommending Articles on CiteULike June 29th, 2009 HT 2009, Workshop “Web 3.0: Merging Semantic Web and Social Web” Dr. Peter Brusilovsky, Associate Professor Denis Parra, PhD Student School of Information Sciences University of Pittsburgh
Outline
• Motivation
• Methods
  – CCF
  – NwCF
  – BM25
• The Study
• Description of the Data
• Results
• Conclusions
Motivation
Based on information available on CiteULike:
• Develop user-centered recommendations of scientific articles.
• Investigate the potential of users’ tags in collaborative tagging systems to provide recommendations.
• Compare the accuracy of user-based collaborative filtering methods.
Why CiteULike?
• Popular collaborative tagging system, more topic-oriented than delicious: article references.
• Familiarity with the system.
(NwCF): Similar to CCF, yet incorporates the “amount of neighbors rating an item” in the ranking formula of recommended items
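$$
\mathrm{userSim}(u,n) = \frac{\sum_{i \in CR_{u,n}} (r_{ui} - \bar{r}_u)(r_{ni} - \bar{r}_n)}{\sqrt{\sum_{i \in CR_{u,n}} (r_{ui} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in CR_{u,n}} (r_{ni} - \bar{r}_n)^2}}
$$

$$
\mathrm{pred}'(u,i) = \log_{10}\bigl(1 + \mathrm{nbr}(i)\bigr) \cdot \mathrm{pred}(u,i)
$$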
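A minimal Python sketch of the two formulas on this slide: the Pearson-style user similarity over co-rated items and the NwCF re-ranking factor. The function names, the dictionary-based rating profiles, and the choice to take each user's mean rating over the full profile are assumptions made for illustration; the slides do not specify them.

```python
import math


def user_sim(ratings_u, ratings_n):
    """Pearson correlation between users u and n over their co-rated items.

    ratings_u, ratings_n: dicts mapping item id -> rating (hypothetical layout).
    """
    co_rated = set(ratings_u) & set(ratings_n)
    if not co_rated:
        return 0.0
    # Assumption: each user's mean is taken over the full rating profile.
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    mean_n = sum(ratings_n.values()) / len(ratings_n)
    num = sum((ratings_u[i] - mean_u) * (ratings_n[i] - mean_n) for i in co_rated)
    den_u = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in co_rated))
    den_n = math.sqrt(sum((ratings_n[i] - mean_n) ** 2 for i in co_rated))
    if den_u == 0 or den_n == 0:
        return 0.0  # similarity is undefined when a user has no rating variance
    return num / (den_u * den_n)


def nwcf_score(pred, nbr):
    """pred'(u, i) = log10(1 + nbr(i)) * pred(u, i).

    pred: the plain CF prediction for the item.
    nbr:  the number of neighbors who rated the item.
    """
    return math.log10(1 + nbr) * pred
```

Under these assumptions, an item predicted at 4.0 and rated by 9 neighbors keeps its score (log10(10) = 1), while the same prediction backed by a single neighbor is scaled by log10(2), roughly 0.30, so it drops in the ranking.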
Methods: NwCF (2 / 2)
Methods: BM25 (1 / 2)
• BM25: We obtain the similarity between users (neighbors) by treating each user's set of tags as a “document” and computing the Okapi BM25 (probabilistic IR model) Retrieval Status Value [2].
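$$
\mathrm{pred}'(u,i) = \log_{10}\bigl(1 + \mathrm{nbr}(i)\bigr) \cdot \mathrm{pred}(u,i)
$$

$$
\mathrm{RSV}_d = \sum_{t \in q} \mathrm{IDF} \cdot \frac{(k_1 + 1)\,\mathrm{tf}_{td}}{k_1\bigl((1 - b) + b \times (L_d / L_{ave})\bigr) + \mathrm{tf}_{td}} \cdot \frac{(k_3 + 1)\,\mathrm{tf}_{tq}}{k_3 + \mathrm{tf}_{tq}}
$$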
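The Retrieval Status Value on this slide can be sketched in Python as follows, with the center user's tags as the query and a candidate neighbor's tags as the document. The function names, the list-of-tags representation, the IDF definition (log10 of N/df), and the default values for k1, b and k3 are assumptions for illustration; the slides do not state which values were used.

```python
import math
from collections import Counter


def idf_table(tag_docs):
    """IDF per tag, assumed here to be log10(N / df) over all users' tag lists."""
    n_docs = len(tag_docs)
    df = Counter(tag for doc in tag_docs for tag in set(doc))
    return {tag: math.log10(n_docs / count) for tag, count in df.items()}


def bm25_rsv(query_tags, doc_tags, idf, avg_len, k1=1.2, b=0.75, k3=1.2):
    """Okapi BM25 RSV of one user's tags (document) for another user's tags (query).

    query_tags, doc_tags: lists of tags, with repeats counted as term frequency.
    idf:     dict mapping tag -> IDF weight.
    avg_len: average tag-list length over all users (L_ave).
    k1, b, k3: tuning parameters; the defaults are assumed, not from the slides.
    """
    tf_q = Counter(query_tags)
    tf_d = Counter(doc_tags)
    len_d = len(doc_tags)
    score = 0.0
    for tag, tfq in tf_q.items():
        tfd = tf_d.get(tag, 0)
        if tfd == 0:
            continue  # a tag absent from the document contributes nothing
        doc_part = ((k1 + 1) * tfd) / (k1 * ((1 - b) + b * (len_d / avg_len)) + tfd)
        query_part = ((k3 + 1) * tfq) / (k3 + tfq)
        score += idf.get(tag, 0.0) * doc_part * query_part
    return score
```

Ranking all candidate users by `bm25_rsv` against the center user's tags gives one way to pick a neighborhood from tags instead of from rating correlation.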
Methods: BM25 (2 / 2)
Query terms Doc_1 Doc_2 Doc_3
The Study
• 7 subjects
• For each subject, four lists of 10 recommendations each were created (CCF, NwCF, BM25_10, BM25_20)
• The four lists were combined and sorted randomly (because the recommendations overlapped, the combined list had fewer than 40 items)
• Subjects were asked to evaluate relevance (relevant / somewhat relevant / not relevant) and novelty (novel / somewhat novel / not novel)
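The list-building step described above can be sketched in Python as follows. The function name, the dictionary layout, and the optional seed are hypothetical; the slides only state that the four lists were combined and randomly ordered.

```python
import random


def build_evaluation_list(lists_by_method, seed=None):
    """Merge per-method recommendation lists into one shuffled, de-duplicated list.

    lists_by_method: dict mapping method name -> list of recommended item ids.
    Returns (merged_items, provenance), where provenance maps each item to the
    methods that recommended it, so judgments can be credited to every method.
    """
    provenance = {}
    for method, items in lists_by_method.items():
        for item in items:
            provenance.setdefault(item, []).append(method)
    merged = list(provenance)  # each item appears once, even if lists overlap
    random.Random(seed).shuffle(merged)  # hide which method produced which item
    return merged, provenance
```

Keeping the provenance map is a design choice of this sketch: a single relevance or novelty judgment on an overlapping item can then count toward each method that recommended it.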
Description of the Data
Crawled CiteULike (CUL) for 20 “center users” (only 7 were used for the study)
• The rating scale must be considered carefully in a CF approach.
• NwCF, which incorporates the number of raters, decreases the uncertainty produced by items with too few ratings.
• The tag-based user similarity approach shows interesting results, which suggest it may be a valid alternative to Pearson correlation in CF algorithms.
• We will incorporate more users in our future studies to make the results more conclusive.
Questions?
Bibliography
• [1] Schafer, J., Frankowski, D., Herlocker, J. and Sen, S. 2007. Collaborative Filtering Recommender Systems. The Adaptive Web (May 2007), 291-324.
• [2] Manning, C., Raghavan, P. and Schütze, H. 2008. Introduction to Information Retrieval. Cambridge University Press.