Mining Cross-Domain Rating Datasets from Structured Data on Twitter @sidooms Simon Dooms
Aug 23, 2014
Rating Datasets
What are ratings? Explicit user preference information
Why ratings? Recommender systems
Apr. 08, 2014 Simon Dooms - Ghent University - MSM 2014
Ratings Scarcity in Research
Ratings = private data
Public datasets to the rescue?
– MovieLens 100K (1998)
– MovieLens 1M (2000)
– MovieLens 10M (2008)
– More on recsyswiki.com
Old, Synthetic Datasets
Social Sharing = Ratings Goldmine
Previous research: MovieTweetings
– Movie rating dataset built from IMDb ratings shared on Twitter
– https://github.com/sidooms/MovieTweetings
What about other domains? Websites?
Well, let’s try it out!
Target Websites - Goodreads
Twitter user - Rating - Book title - Book author - Goodreads URL - Time
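The Goodreads fields above can be pulled out of a shared tweet with a regular expression. The sketch below assumes a hypothetical share text of the form `<rating> of 5 stars to <title> by <author> <url>`. The real Goodreads tweet layout may differ. The pattern, the `GoodreadsRating` type and the `parse_goodreads_tweet` function are illustrative names and are not taken from the original tool.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical share-text layout (an assumption); adapt it to the real tweets.
GOODREADS_PATTERN = re.compile(
    r"(?P<rating>[1-5]) of 5 stars to (?P<title>.+?) by (?P<author>.+?) "
    r"(?P<url>https?://\S+)"
)


@dataclass
class GoodreadsRating:
    user: str       # Twitter user who shared the rating
    rating: int     # explicit rating on a 1-5 scale
    title: str      # book title
    author: str     # book author
    url: str        # Goodreads URL in the tweet
    time: datetime  # time of the tweet


def parse_goodreads_tweet(user: str, text: str, time: datetime) -> Optional[GoodreadsRating]:
    """Return a rating record, or None if the tweet does not match the assumed layout."""
    match = GOODREADS_PATTERN.search(text)
    if match is None:
        return None
    return GoodreadsRating(
        user=user,
        rating=int(match.group("rating")),
        title=match.group("title"),
        author=match.group("author"),
        url=match.group("url"),
        time=time,
    )
```

The lazy `.+?` groups split at the first " by ". A title that itself contains " by " would therefore be cut short. A stricter, source-specific pattern would be needed in practice.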
Target Websites - Pandora
Twitter user - Song - Pandora URL - Time
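Unlike the Goodreads record, the Pandora record carries no score. A shared song can therefore only be read as a positive, unary signal. The minimal sketch below assumes that each share has already been reduced to a hypothetical `(user, song, url, time)` tuple. It maps every share to a rating of 1, and it collapses repeated shares of the same song by the same user to the earliest share.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# (user, song, url, time): an assumed layout for illustration.
Share = Tuple[str, str, str, datetime]


def shares_to_unary(shares: Iterable[Share]) -> List[Tuple[str, str, int, datetime]]:
    """Turn song shares into (user, song, 1, first_time) unary ratings."""
    first_seen: Dict[Tuple[str, str], datetime] = {}
    for user, song, _url, time in shares:
        key = (user, song)
        # Keep only the earliest share of each song per user.
        if key not in first_seen or time < first_seen[key]:
            first_seen[key] = time
    # Sorted by (user, song) so the output order is deterministic.
    return [(user, song, 1, time) for (user, song), time in sorted(first_seen.items())]
```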
Target Websites - YouTube
Twitter user - (Video uploader) - YouTube URL - Time
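For YouTube, the shared URL is the item identifier. The same video can appear as a `youtube.com/watch?v=<id>` link or as a `youtu.be/<id>` short link. The sketch below uses only the standard library to normalize both forms to the video ID, so that shares of one video map to one item. The function name is illustrative. Other URL variants, such as embed or playlist links, are deliberately not handled.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Extract the video ID from a watch URL or a youtu.be short link."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short link: the ID is the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in ("youtube.com", "www.youtube.com", "m.youtube.com") and parsed.path == "/watch":
        # Watch URL: the ID is the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None  # unsupported host or URL form
```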
Mining Experiment
But words are wind…
– 2-week experiment
– 4 online platforms
Python code + Task Scheduler = Dataset files
https://github.com/sidooms/Twitter-ratings
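A task scheduler re-runs the collection script at fixed intervals, so successive runs can see overlapping tweets. The minimal sketch below shows an append step that stays safe under reruns by skipping lines already present in the dataset file. The `user::item::rating::timestamp` line format is an assumption modeled on the `::`-separated layout of MovieTweetings. It is not taken from the Twitter-ratings repository, and `append_new_records` is a hypothetical helper. Domains without a score (Pandora, YouTube) leave the rating field empty.

```python
import os
from typing import Iterable, Set, Tuple

# (user, item, rating, unix_timestamp); the layout is an assumption.
Record = Tuple[str, str, str, int]


def append_new_records(path: str, records: Iterable[Record]) -> int:
    """Append records not already in the file; return how many were written."""
    seen: Set[str] = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            seen = {line.rstrip("\n") for line in f}
    written = 0
    with open(path, "a", encoding="utf-8") as f:
        for user, item, rating, timestamp in records:
            line = f"{user}::{item}::{rating}::{timestamp}"
            if line in seen:
                continue  # already stored by an earlier scheduled run
            f.write(line + "\n")
            seen.add(line)
            written += 1
    return written
```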
The Numbers
One more thing …
Cross-Domain Rating Dataset
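The Twitter username ties the domains together. A user who shares a Goodreads rating and a YouTube video contributes to both domains. The minimal sketch below assumes that all records have been normalized to a hypothetical `(domain, user, item, rating)` tuple. It groups the records per user and per domain, and it keeps only the users seen in at least `min_domains` domains.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

# (domain, twitter_user, item, rating); the layout is an assumption for illustration.
Rating = Tuple[str, str, str, str]
Profile = Dict[str, List[Tuple[str, str]]]  # domain -> [(item, rating), ...]


def cross_domain_users(ratings: Iterable[Rating], min_domains: int = 2) -> Dict[str, Profile]:
    """Group ratings per Twitter user and domain; keep users active in >= min_domains domains."""
    profiles: DefaultDict[str, DefaultDict[str, List[Tuple[str, str]]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for domain, user, item, rating in ratings:
        profiles[user][domain].append((item, rating))
    return {
        user: dict(domains)
        for user, domains in profiles.items()
        if len(domains) >= min_domains
    }
```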
Applications
Collect ratings for recsys research / input
Cross-domain recsys research
Trend detection, analytics, ...
Applicable to all social sharing websites
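For the analytics use case, a simple starting point for trend detection is to count which items are shared most often within a time window. The sketch below assumes a hypothetical `(domain, item, time)` share tuple. It only ranks raw counts. A real trend detector would also compare those counts against a baseline period.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple

# (domain, item, time); hypothetical layout for illustration.
Share = Tuple[str, str, datetime]


def top_items(
    shares: Iterable[Share], domain: str, start: datetime, end: datetime, n: int = 3
) -> List[Tuple[str, int]]:
    """Return the n most-shared items of one domain within the window [start, end)."""
    counts = Counter(
        item for dom, item, time in shares
        if dom == domain and start <= time < end
    )
    return counts.most_common(n)
```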
Conclusions
Ratings scarcity in research
Public datasets are old and synthetic
Social sharing = ratings goldmine
2-week experiment, 4 major websites
Python code & datasets on GitHub
True cross-domain ratings dataset