Mining Cross-Domain Rating Datasets from Structured Data on Twitter @sidooms Simon Dooms
Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Aug 23, 2014

Simon Dooms

Slides about mining cross-domain ratings presented at the WWW 2014 conference on April 8, in Seoul (Korea) by Simon Dooms.
Transcript
Page 1: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Mining Cross-Domain Rating Datasets from Structured Data on Twitter

@sidooms Simon Dooms

Page 2: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Rating Datasets

What are ratings? Explicit user preference information

Why ratings? Recommender systems

Outline: Intro - Social Sharing - Results - Cross-Domain - Conclusion

Apr. 08, 2014 Simon Dooms - Ghent University - MSM 2014 2

Page 4: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Ratings Scarcity in Research

Ratings = private data
Public datasets to the rescue?
– MovieLens 100K (1998)
– MovieLens 1M (2000)
– MovieLens 10M (2008)
– More on recsyswiki.com

Old, Synthetic Datasets

Page 6: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Social Sharing = Ratings Goldmine

Previous research: MovieTweetings
– Movie rating dataset mined from IMDb ratings shared on Twitter
– https://github.com/sidooms/MovieTweetings
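To make the idea of a structured rating tweet concrete, here is a minimal Python sketch of how such a tweet could be parsed. The tweet shape ("I rated <title> <n>/10 <IMDb URL> #IMDb"), the regular expression, and the function name `parse_imdb_tweet` are assumptions for illustration, not the code used to build MovieTweetings. The sketch also assumes the link has already been expanded to its full IMDb URL.

```python
import re
from typing import Optional, Tuple

# Assumed tweet shape: "I rated <title> <n>/10 <IMDb URL> #IMDb"
IMDB_TWEET = re.compile(
    r"I rated (?P<title>.+?) (?P<rating>\d{1,2})/10\s+"
    r"https?://(?:www\.)?imdb\.com/title/tt(?P<imdb_id>\d+)",
    re.IGNORECASE,
)


def parse_imdb_tweet(text: str) -> Optional[Tuple[str, str, int]]:
    """Return (imdb_id, title, rating), or None if the tweet does not match."""
    match = IMDB_TWEET.search(text)
    if match is None:
        return None
    rating = int(match.group("rating"))
    # Discard values outside the 1-10 scale (e.g. "0/10" or "11/10").
    if not 1 <= rating <= 10:
        return None
    return match.group("imdb_id"), match.group("title"), rating
```

Tweets that do not follow the template return `None`, so free-form mentions of a movie are ignored rather than guessed at.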

What about other domains? Websites?

Well, let’s try it out!

Page 7: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Target Websites - Goodreads

Twitter user - Rating - Book title - Book author - Goodreads URL - Time
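The six fields listed above can be sketched as one record type plus a parser. The tweet template assumed here ("<n> of 5 stars to <title> by <author> <Goodreads URL>"), the `BookRating` class, and `parse_goodreads_tweet` are illustrative assumptions, not taken from the slides or the accompanying repository.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BookRating:
    """One mined rating, mirroring the fields named on the slide."""
    user: str
    rating: int
    title: str
    author: str
    url: str
    time: str


# Assumed template: "<n> of 5 stars to <title> by <author> <Goodreads URL>"
GOODREADS_TWEET = re.compile(
    r"(?P<rating>[1-5]) of 5 stars to (?P<title>.+) by (?P<author>.+?)\s+"
    r"(?P<url>https?://(?:www\.)?goodreads\.com/\S+)"
)


def parse_goodreads_tweet(user: str, text: str, time: str) -> Optional[BookRating]:
    """Build a BookRating from a tweet, or return None if it does not match."""
    match = GOODREADS_TWEET.search(text)
    if match is None:
        return None
    return BookRating(
        user=user,
        rating=int(match.group("rating")),
        title=match.group("title"),
        author=match.group("author"),
        url=match.group("url"),
        time=time,
    )
```

The title group is greedy, so the split happens at the last " by " before the URL. This keeps titles that themselves contain "by" intact.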

Page 8: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Target Websites - Pandora

Twitter user - Song - Pandora URL - Time

Page 9: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Target Websites - YouTube

Twitter user - (Video uploader) - YouTube URL - Time
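Because the YouTube URL is what identifies the item, one plausible preprocessing step is to reduce each URL to a video ID, so that the same video shared through different link forms maps to one item. The sketch below is an assumption about how that could be done with the standard library. The function name `youtube_video_id` and the set of accepted hosts are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts treated as YouTube in this sketch (an assumption, not an exhaustive list).
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID from a youtu.be or youtube.com/watch URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.strip("/").split("/")[0]
        return video_id or None
    if host in WATCH_HOSTS and parsed.path == "/watch":
        # Long links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None
```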

Page 10: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Mining Experiment

But words are wind…
– 2-week experiment
– 4 online platforms

Page 11: Mining Cross-Domain Rating Datasets from Structured Data on Twitter
Page 12: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Python code + Task Scheduler = Dataset files
https://github.com/sidooms/Twitter-ratings
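A script that a scheduler runs repeatedly needs to add only records it has not stored yet. The following is a minimal sketch of that step under stated assumptions: the CSV layout `(tweet_id, user, item, time)` and the function `append_new_records` are hypothetical and do not describe the file format used in the Twitter-ratings repository.

```python
import csv
from pathlib import Path
from typing import Iterable, Tuple

# Hypothetical record layout: (tweet_id, user, item, time)
Record = Tuple[str, str, str, str]


def append_new_records(path: Path, records: Iterable[Record]) -> int:
    """Append records whose tweet_id is not yet in the file; return how many were added."""
    seen = set()
    if path.exists():
        with path.open(newline="", encoding="utf-8") as handle:
            seen = {row[0] for row in csv.reader(handle) if row}
    added = 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for record in records:
            if record[0] in seen:
                continue  # already stored by an earlier run
            writer.writerow(record)
            seen.add(record[0])
            added += 1
    return added
```

Keying on the tweet ID makes each run idempotent, so overlapping search windows between scheduled runs do not create duplicate rows.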

Page 13: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

The Numbers

One more thing …

Page 14: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Cross-Domain Rating Dataset
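Since every mined record carries the Twitter user, one way to obtain a cross-domain dataset is to link the per-domain records on that user and keep people who appear in more than one domain. The sketch below assumes this linking strategy. The function `cross_domain_users` and its input shape are hypothetical and do not describe the authors' exact procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def cross_domain_users(
    datasets: Dict[str, Iterable[Tuple[str, str]]],
    min_domains: int = 2,
) -> Dict[str, Dict[str, List[str]]]:
    """Map each user active in at least `min_domains` domains to their items per domain.

    `datasets` maps a domain name to (twitter_user, item) pairs.
    """
    profiles: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for domain, records in datasets.items():
        for user, item in records:
            profiles[user][domain].append(item)
    return {
        user: dict(domains)
        for user, domains in profiles.items()
        if len(domains) >= min_domains
    }
```

With `min_domains=2`, a user who only shared books is dropped, while a user who shared both a book and a video is kept with both item lists.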

Page 15: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Applications

Collect ratings for recsys research / input
Cross-domain recsys research
Trend detection, analytics, ...
Applicable to all social sharing websites

Page 16: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

Conclusions

Ratings scarcity in research
Public datasets are old and synthetic
Social sharing = ratings goldmine
2-week experiment, 4 major websites
Python code & datasets on GitHub
True cross-domain ratings dataset

Page 17: Mining Cross-Domain Rating Datasets from Structured Data on Twitter

@sidooms Simon Dooms

Mining Cross-Domain Rating Datasets from Structured Data on Twitter