X-Post: Creating a Cross Posting Facilitator for Technology Communities. Hacker News & StackOverflow. WS3 Group 3: Anca Dumitrache, Fabio Benedetti, Seyi Feyisetan

WS3 2014 group project: X-Post

Jul 01, 2015


Anca Dumitrache

slides for the group project of team 3 (Anca Dumitrache, Fabio Benedetti, Seyi Feyisetan), Web Science summer school in Southampton, July 2014
Transcript
Page 1: WS3 2014 group project: X-Post

X-Post: Creating a Cross Posting Facilitator for Technology Communities

Hacker News & StackOverflow

WS3 Group 3: Anca Dumitrache, Fabio Benedetti, Seyi Feyisetan

Page 2: WS3 2014 group project: X-Post

Introduction

● Stack Overflow: questions and answers on technology
● Hacker News: news for technology enthusiasts

● similar to Hacker News: Reddit, Slashdot
● similar to Stack Overflow: Quora

Page 3: WS3 2014 group project: X-Post

Goals

1. develop a methodology to compare online technology communities

2. use the vocabulary of one social community (e.g. StackOverflow) to describe the other (e.g. Hacker News)

3. topic recommendation: newsworthy cross posting across communities

Page 4: WS3 2014 group project: X-Post

Topic recommendation

Page 5: WS3 2014 group project: X-Post

Pipeline

Page 6: WS3 2014 group project: X-Post

Pipeline

Page 7: WS3 2014 group project: X-Post

Approach

1. data gathering:
○ sources: Hacker News + StackOverflow
○ fixed timeframe: September 2013
○ method: web scraping with Python, R

2. data processing:
○ linking: named entity extraction with term matching using the tags vocabulary from Stack Overflow
○ cleanup: only keep posts with tech-related topics
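The linking and cleanup steps can be illustrated with a minimal Python sketch. It is not the team's actual code: the function name `extract_tags`, the sample vocabulary, and the sample titles are all hypothetical, and the matching is plain whole-token lookup against a set of tags.

```python
import re

def extract_tags(text, vocabulary):
    """Return the vocabulary tags that appear in text as whole tokens."""
    # Keep characters that are common in tags such as "c++", "c#", "node.js".
    tokens = re.findall(r"[a-z0-9+#.\-]+", text.lower())
    # Drop trailing sentence punctuation that the pattern also captures.
    tokens = {token.rstrip(".-") for token in tokens}
    return tokens & vocabulary

# Hypothetical sample vocabulary and post titles, for illustration only.
tag_vocabulary = {"python", "javascript", "c++", "node.js", "git"}
titles = [
    "Show HN: A Git client written in Python.",
    "Why C++ is still relevant",
    "The best coffee shops in Berlin",
]

# Linking: attach the matched tags to each post.
linked = [(title, extract_tags(title, tag_vocabulary)) for title in titles]

# Cleanup: keep only posts that matched at least one tag.
tech_posts = [(title, tags) for title, tags in linked if tags]
```

Single-token matching of this kind would miss multi-word tags and cannot tell apart ambiguous terms, which is the sort of gap that disambiguation is meant to close.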

Page 8: WS3 2014 group project: X-Post

Future development

1. data processing:
○ crowdsourced disambiguation of entities

2. training:
○ use a priori observations of cross posting as training data
○ possible features:
i. co-occurring tags
ii. frequency of tags
iii. number of points in a post
iv. number of comments in a post
v. time...

3. evaluation:
○ crowdsourced ranking of recommendation relevance
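As a rough illustration of how some of the listed features might be computed, here is a small Python sketch. The post records, field names (`tags`, `points`, `comments`), and the helper `post_features` are assumptions made for this example, not part of the project.

```python
from collections import Counter
from itertools import combinations

# Hypothetical posts; the field names are assumptions for this sketch.
posts = [
    {"tags": {"python", "git"}, "points": 120, "comments": 45},
    {"tags": {"python", "javascript"}, "points": 30, "comments": 4},
    {"tags": {"git"}, "points": 8, "comments": 0},
]

# Frequency of tags: number of posts in which each tag appears.
tag_frequency = Counter(tag for post in posts for tag in post["tags"])

# Co-occurring tags: count each unordered pair of tags seen in the same post.
cooccurrence = Counter(
    pair
    for post in posts
    for pair in combinations(sorted(post["tags"]), 2)
)

def post_features(post):
    """Build one feature dict per post (assumes the post has at least one tag)."""
    tags = sorted(post["tags"])
    return {
        "mean_tag_frequency": sum(tag_frequency[t] for t in tags) / len(tags),
        "cooccurring_pairs": sum(cooccurrence[p] for p in combinations(tags, 2)),
        "points": post["points"],
        "comments": post["comments"],
    }

features = [post_features(post) for post in posts]
```

Feature dicts of this shape could then be paired with observed cross posts as labels for training.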

Page 9: WS3 2014 group project: X-Post

Results

Page 10: WS3 2014 group project: X-Post

Topic overlap

Page 11: WS3 2014 group project: X-Post

Trending topics

Page 12: WS3 2014 group project: X-Post

Trending topics

Page 13: WS3 2014 group project: X-Post

Frequency overlap

Page 14: WS3 2014 group project: X-Post

Frequency overlap

zoomed in

Page 15: WS3 2014 group project: X-Post

Findings

1. small set of overlapping topics over the two social machines (but better NER could identify more links)

2. StackOverflow has a more diverse range of topics than HackerNews (although the vocabulary likely introduces bias)

3. different frequently discussed topics on both social machines (although a set of outliers does exist)
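One way to quantify findings like these is sketched below in Python. The counts are invented for illustration, and the Jaccard index is a measure chosen for this example; the slides do not say which overlap measure was used.

```python
from collections import Counter

# Hypothetical per-community topic counts; the numbers are invented.
hn_counts = Counter({"python": 5, "git": 3, "bitcoin": 9})
so_counts = Counter({"python": 40, "git": 12, "java": 55, "css": 20, "sql": 31})

hn_topics, so_topics = set(hn_counts), set(so_counts)

# Topic overlap: shared topics, and the Jaccard index of the two topic sets.
shared = hn_topics & so_topics
jaccard = len(shared) / len(hn_topics | so_topics)

# Diversity: number of distinct topics seen in each community.
diversity = {"hacker_news": len(hn_topics), "stack_overflow": len(so_topics)}

def relative_frequency(counts):
    """Convert raw counts to shares so communities of different size compare."""
    total = sum(counts.values())
    return {topic: count / total for topic, count in counts.items()}

# Frequency overlap: difference in share for each shared topic.
hn_share = relative_frequency(hn_counts)
so_share = relative_frequency(so_counts)
frequency_gap = {topic: hn_share[topic] - so_share[topic] for topic in shared}
```

Because both topic sets come from the same tag vocabulary, the diversity comparison inherits whatever bias that vocabulary carries.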

Page 16: WS3 2014 group project: X-Post

Future Work

● add more data sources such as Reddit, Slashdot

● gather data over a larger timeframe

● fine tune our Named Entity Recogniser

● expand the vocabulary used to describe the communities (and publish as Linked Data)

● use crowdsourcing for tag disambiguation and output evaluation

Page 17: WS3 2014 group project: X-Post

Conclusion

Preliminary studies show that:

● we can use StackOverflow tags as a vocabulary to understand online technology communities

● we can identify a feature set to compare these communities

● there is enough gap between trending topics in the two communities to allow for the use case of a topic recommendation system
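The recommendation use case could start from something as simple as the following Python sketch, which ranks topics by how much larger their share is in one community than in the other. The function `recommend_topics`, its `top_n` parameter, and the counts are hypothetical; this is a baseline heuristic under those assumptions, not the system proposed in the slides.

```python
def recommend_topics(source_counts, target_counts, top_n=3):
    """Rank topics whose share is larger in the source than in the target.

    Assumes both count dicts are non-empty.
    """
    source_total = sum(source_counts.values())
    target_total = sum(target_counts.values())
    gaps = {}
    for topic, count in source_counts.items():
        source_share = count / source_total
        target_share = target_counts.get(topic, 0) / target_total
        gaps[topic] = source_share - target_share
    # Largest gap first; ties are broken alphabetically for stable output.
    ranked = sorted(gaps, key=lambda topic: (-gaps[topic], topic))
    return [topic for topic in ranked if gaps[topic] > 0][:top_n]

# Hypothetical topic counts for the two communities.
so_counts = {"java": 50, "python": 30, "git": 20}
hn_counts = {"python": 60, "git": 30, "java": 10}

# Topics active on Stack Overflow that are comparatively quiet on Hacker News.
for_hacker_news = recommend_topics(so_counts, hn_counts)
# The reverse direction.
for_stack_overflow = recommend_topics(hn_counts, so_counts)
```

A trained model using features such as points, comments, and time would replace the share difference used here.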