Big data meets journalism: automatic news detection Sandjai Bhulai [email protected]
Page 1: Slides data donderdag #6

Big data meets journalism: automatic news detection

Sandjai Bhulai ([email protected])

Page 2: Slides data donderdag #6

The world of today

Page 3: Slides data donderdag #6

The world of today: social media

Page 4: Slides data donderdag #6

Social media in our daily life

Page 5: Slides data donderdag #6

Social media in our daily life

Page 6: Slides data donderdag #6

Social media in our daily life

Page 7: Slides data donderdag #6

Social media in our daily life

Page 8: Slides data donderdag #6

Twitter to fame

Page 9: Slides data donderdag #6

The old newsroom

Page 10: Slides data donderdag #6

The modern newsroom

Page 11: Slides data donderdag #6

Information overload

Page 12: Slides data donderdag #6
Page 13: Slides data donderdag #6
Page 14: Slides data donderdag #6
Page 15: Slides data donderdag #6

Forecasting news for nu.nl?

Page 16: Slides data donderdag #6

Challenge #1

• How do you get (all) Dutch tweets?

• Twitter has a streaming API
• Fair use policy delivers random 1% of the Twitter stream
• Following keywords is allowed

• How much data do you need?
• How much data can you get?
• How much data can you deal with?
• How much data can you store?
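
As a rough illustration of keyword following and of reasoning about the 1% sample, the sketch below filters tweets on a small keyword list and scales a sampled count up to the full stream. It assumes tweets arrive as plain dictionaries with a `text` field; the keyword list, `SAMPLE_RATE`, and both helper functions are hypothetical and are not taken from the talk or from Twitter's API.

```python
from typing import Iterable, Iterator

# Assumption: the fair-use sample is a uniform random 1% of all tweets.
SAMPLE_RATE = 0.01

# Hypothetical list of frequent Dutch words used as "followed" keywords.
DUTCH_KEYWORDS = ["de", "het", "een", "niet", "ik"]


def follow_keywords(tweets: Iterable[dict], keywords: Iterable[str]) -> Iterator[dict]:
    """Yield only tweets whose text contains at least one followed keyword."""
    wanted = {k.lower() for k in keywords}
    for tweet in tweets:
        words = set(tweet["text"].lower().split())
        if words & wanted:
            yield tweet


def estimate_full_volume(sampled_count: int, sample_rate: float = SAMPLE_RATE) -> int:
    """Scale a count seen in the random sample up to the full stream."""
    return round(sampled_count / sample_rate)


# Made-up example tweets, not real data.
stream = [
    {"text": "Ik zie brand in Amsterdam"},
    {"text": "Earthquake in Mexico"},
    {"text": "Het regent niet meer"},
]
dutch = list(follow_keywords(stream, DUTCH_KEYWORDS))
print(len(dutch))                 # tweets that matched a followed keyword
print(estimate_full_volume(250))  # 250 sampled tweets scaled to the full stream
```

Following common Dutch words is only one possible way to approximate "all Dutch tweets"; it will also match some non-Dutch text, so a language check would likely be needed on top.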

Page 17: Slides data donderdag #6

Twitter - popularity

Page 18: Slides data donderdag #6

Challenge #2

• How do you detect trends on Twitter?

• Absolute frequencies of tweets
• Relative frequencies of tweets
• Speed of tweets
• Acceleration of tweets
• Seasonal patterns

• We need a real-time algorithm
• We need to efficiently handle memory
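
One way to make these signals concrete is to keep a short sliding window of per-interval counts for each term and take differences. The sketch below is a minimal illustration under that assumption, not the algorithm used in the talk: the class name, the window size, and the thresholds in `is_trending` are hypothetical, and seasonal patterns are left out.

```python
from collections import deque


class TrendTracker:
    """Track per-interval tweet counts for one term in a fixed-size window."""

    def __init__(self, window: int = 5):
        # A deque with maxlen drops the oldest count automatically, so memory
        # per term stays bounded however long the stream runs.
        self.counts = deque(maxlen=window)

    def add_interval(self, count: int) -> None:
        """Record the number of tweets seen in the latest time interval."""
        self.counts.append(count)

    def speed(self) -> int:
        """Change in count between the last two intervals (first difference)."""
        if len(self.counts) < 2:
            return 0
        return self.counts[-1] - self.counts[-2]

    def acceleration(self) -> int:
        """Change in speed over the last three intervals (second difference)."""
        if len(self.counts) < 3:
            return 0
        return self.counts[-1] - 2 * self.counts[-2] + self.counts[-3]

    def relative_frequency(self, total_tweets: int) -> float:
        """Share of all tweets in the latest interval that mention the term."""
        if not self.counts or total_tweets == 0:
            return 0.0
        return self.counts[-1] / total_tweets

    def is_trending(self, min_speed: int = 50, min_acceleration: int = 0) -> bool:
        """Hypothetical rule: volume grows fast and growth is not slowing."""
        return self.speed() >= min_speed and self.acceleration() >= min_acceleration


tracker = TrendTracker(window=5)
for count in [10, 12, 15, 80, 300]:  # made-up counts per minute
    tracker.add_interval(count)

print(tracker.speed())         # 300 - 80
print(tracker.acceleration())  # 300 - 2*80 + 15
print(tracker.is_trending())
```

Each update is constant-time and the window never grows, which is the property the real-time and memory requirements ask for.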

Page 19: Slides data donderdag #6

Trending topics

1. #PrayforMexico
2. #SocialMovies
3. #temblor
4. Sismo de 7.8
5. Earthquake in Mexico
6. John Elway
7. Pat Bowlen
8. Marcelo Lagos
9. Azcapotzalco
10. Niñas de 13 y 14

20 March 2012, Twitter.com

Page 20: Slides data donderdag #6

Trending topics

Page 21: Slides data donderdag #6

Number of tweets #PrayforMexico

Page 22: Slides data donderdag #6

Challenge #3

• How do you deal with the following tweets?

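• “Brand in Amsterdam” (“Fire in Amsterdam”)
• “Vuur in 020” (“Fire in 020”, Amsterdam's area code)
• “Fikkie in A’dam” (slang for “Blaze in A'dam”)

• “Ik heb brand gezien” (“I have seen a fire”)
• “Ik zag brand” (“I saw a fire”)
• “Ik zie brand” (“I see a fire”)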
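
The example tweets vary in two ways: different words for the same thing, and different forms of the same verb. A minimal sketch of one possible treatment is to map every token to a canonical form before counting. The lookup tables below are hypothetical and cover only the six example tweets; a real system would need far larger dictionaries or a proper Dutch lemmatizer.

```python
# Hypothetical lookup tables built only from the example tweets.
SYNONYMS = {
    "vuur": "brand",
    "fikkie": "brand",
    "020": "amsterdam",
    "a'dam": "amsterdam",
}
VERB_FORMS = {
    "zie": "zien",
    "zag": "zien",
    "gezien": "zien",
}
# Assumption: these words carry no topical meaning for trend detection.
STOPWORDS = {"in", "ik", "heb"}


def normalize(tweet: str) -> frozenset:
    """Map a tweet to a set of canonical tokens."""
    # Lowercase and unify the curly apostrophe so "A’dam" matches "a'dam".
    tokens = tweet.lower().replace("’", "'").split()
    canonical = set()
    for token in tokens:
        if token in STOPWORDS:
            continue
        token = SYNONYMS.get(token, token)
        token = VERB_FORMS.get(token, token)
        canonical.add(token)
    return frozenset(canonical)


print(normalize("Brand in Amsterdam") == normalize("Fikkie in A’dam"))
print(normalize("Ik heb brand gezien") == normalize("Ik zag brand"))
```

With this mapping the three location tweets collapse to one token set and the three verb variants to another, so they can be counted as the same signal.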

Page 23: Slides data donderdag #6

Visualization

Page 24: Slides data donderdag #6

Visualization

Page 25: Slides data donderdag #6
Page 26: Slides data donderdag #6

The final system

Page 27: Slides data donderdag #6

From nu.nl to straks.nl

Page 28: Slides data donderdag #6

Page 29: Slides data donderdag #6

The future

• Many challenges ahead:

• How to deal with retweets?
• Integration of reputation scores?
• Use of profile information?
• Advantages of semantic research?
• Add feeds of other social media?
• Generalize to other languages?
• Dependencies of GPS information?
• …

Page 30: Slides data donderdag #6

Real-time driver information

Page 31: Slides data donderdag #6

Questions?