Twitterpedia Visualization Lab By: Thomas Kraft

Twitterpedia

Feb 24, 2016

Transcript
Page 1: Twitterpedia

Twitterpedia

Visualization Lab

By: Thomas Kraft

Page 2: Twitterpedia

Problem

What is being talked about, and where?

Twitter has massive amounts of data

Tweets are unstructured

Goal: Quickly identify current events / topics on a large scale

Overview, Current State, Future

Page 3: Twitterpedia

What Needs To Be Done

Data Collection
◦ Database
◦ Web Crawler

Analyze Data
◦ Topic Modeling

Get trends and topics!

Page 4: Twitterpedia

Hadoop

Processes large datasets
◦ Splits data into chunks
◦ Data processed on multiple machines

Very scalable
◦ Add/remove computers easily
◦ As the dataset grows, so can the number of machines
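The split-then-process idea on this slide can be sketched in plain Python. This is only an illustration of the map/reduce pattern, not Hadoop itself and not the project's code: the function names and the sample tweets are made up, and each chunk is processed in the same process rather than on a separate machine.

```python
from collections import Counter
from functools import reduce

def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous, roughly equal chunks."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def map_chunk(tweets):
    """Map step: count words in one chunk (one machine's share of the work)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.lower().split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one total."""
    return reduce(lambda a, b: a + b, partials, Counter())

# Hypothetical sample data, for illustration only.
tweets = [
    "hadoop splits data",
    "data on many machines",
    "more data more machines",
]
chunks = split_into_chunks(tweets, 2)
total = reduce_counts(map_chunk(c) for c in chunks)
```

Because the reduce step only adds counts together, the result does not depend on how many chunks are used, which is what lets the number of machines grow with the dataset.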

Page 5: Twitterpedia

Computer Cluster

Page 6: Twitterpedia

Topic Modeling

Latent Dirichlet Allocation (LDA)
◦ Correlations between words in topics

Topics are composed of keyword groups

A tweet's topic can effectively be inferred
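To make "topics composed of keyword groups" concrete, here is a minimal sketch of inferring a tweet's topic once the keyword groups are known. It is a simplification, not LDA proper: full LDA learns the keyword groups from the data and lets a document mix several topics, whereas this sketch assumes the groups are given and picks a single best topic. The topic names, probabilities, and smoothing value are all made up for illustration.

```python
import math

# Hypothetical topic -> keyword probabilities (made-up numbers, not from the talk).
TOPICS = {
    "music_awards": {"award": 0.3, "performance": 0.3, "love": 0.2, "stage": 0.2},
    "sports": {"game": 0.4, "score": 0.3, "team": 0.3},
}
SMOOTHING = 1e-3  # assumed probability for words a topic has no entry for

def infer_topic(tweet, topics=TOPICS):
    """Return the topic with the highest log-likelihood for the tweet's words."""
    words = tweet.lower().split()
    scores = {}
    for name, word_probs in topics.items():
        # Sum of log-probabilities; unknown words fall back to SMOOTHING.
        scores[name] = sum(math.log(word_probs.get(w, SMOOTHING)) for w in words)
    return max(scores, key=scores.get), scores

best, scores = infer_topic("love that performance on stage")
```

Words that appear in a topic's keyword group pull the score toward that topic, so a tweet sharing several keywords with one group is assigned to it even though the tweet itself carries no label.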

Page 7: Twitterpedia

“Can Rick Ross Please put his clothes on?”

“Bruno & alicia! I love it!”

June 26, 2011

Page 8: Twitterpedia

Challenge

Topic modeling is resource intensive
◦ Iterates over the data

A single computer can't handle a large dataset

Solution: Parallelize the process

Page 9: Twitterpedia

Parallel LDA (PLDA)

Write an algorithm to split up tweets and join the output

Improves scalability for LDA
◦ Shows near-linear improvements

PLDA will take Twitterpedia to the next level
◦ Larger datasets with quicker processing
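A rough sketch of the split-and-join bookkeeping might look like the following. It is not the project's algorithm or the PLDA library's API: the function names and sample data are hypothetical, threads stand in for separate machines, and the topic ids are given up front, whereas a real sampler would resample them inside each worker on every iteration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def shard(tweets, n_workers):
    """Split: deal tweets round-robin into n_workers shards."""
    return [tweets[i::n_workers] for i in range(n_workers)]

def local_counts(shard_tweets):
    """Worker step: count (topic, word) pairs for one shard.

    Each tweet is (topic_id, text). The topic ids are fixed here only to
    keep the sketch short and deterministic.
    """
    counts = Counter()
    for topic_id, text in shard_tweets:
        for word in text.lower().split():
            counts[(topic_id, word)] += 1
    return counts

def join(partials):
    """Join: sum the per-shard tables into one global topic-word table."""
    total = Counter()
    for part in partials:
        total.update(part)
    return total

# Hypothetical sample data, for illustration only.
tweets = [
    (0, "award show tonight"),
    (1, "big game tonight"),
    (0, "award performance"),
    (1, "game score"),
]

with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(local_counts, shard(tweets, 2)))
global_counts = join(partials)
```

The join is a plain sum, so the merged table matches what a single machine would have counted; the per-shard work is what gets spread across machines.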

Page 10: Twitterpedia

Future

Write an algorithm to parallelize tweet distribution and aggregation

Create a website implementing topics

Page 11: Twitterpedia

Conclusion

Working on this project has been a great learning experience
◦ Designed and managed a large database
◦ Efficiency was a high priority
◦ Learned cool tricks along the way…

Page 12: Twitterpedia

Thanks

A special thanks to my advisor Xiaoyu Wang, Wenwen Dou, and to the Visualization Center

Thomas Kraft: [email protected]

Page 13: Twitterpedia

Questions?