Real-Time Top-R Topic Detection on Twitter with Topic Hijack Filtering
Kohei Hayashi (National Institute of Informatics)
August 11, 2015
Joint work with
• Takanori Maehara (Shizuoka Univ)
• Masashi Toyoda (Univ Tokyo)
• Ken-ichi Kawarabayashi (National Institute of Informatics)
Twitter: The Rapid Stream of Texts
SNS with short messages (tweets)
• Volume: 41M users
• Diversity: covers any topic (news, politics, TV, ...)
• Speed: 270K tweets/min
A promising data source for topic detection
• May discover breaking news and events even faster than news media
Two Challenges
1. Topic Detection in Real Time
2. Noise Filtering
Noise Filtering
Many spam tweets are not generated by humans
• e.g., tweets posted via “tweet buttons”
Such tweets exaggerate word co-occurrences and “hijack” important topics
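To illustrate how machine-generated tweets can hijack a topic, here is a minimal sketch that assumes topics are found from word co-occurrence counts. The tweets, the share-button template text and the `cooccurrence_counts` helper are all hypothetical; this is not the filtering method of the talk, only a demonstration of the problem it addresses.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tweets):
    """Count how often each unordered pair of distinct words shares a tweet."""
    counts = Counter()
    for tweet in tweets:
        words = sorted(set(tweet.lower().split()))  # dedupe words within a tweet
        counts.update(combinations(words, 2))        # pairs come out in sorted order
    return counts


# Hypothetical organic tweets about one real event.
organic = [
    "earthquake in tokyo right now",
    "strong earthquake felt in tokyo",
    "tokyo earthquake trains stopped",
]
# Hypothetical template text posted 50 times through a share button.
button = ["check out this quiz tokyo earthquake app"] * 50

counts = cooccurrence_counts(organic + button)
print(counts[("earthquake", "tokyo")])  # 53: real pair, inflated by the template
print(counts[("app", "quiz")])          # 50: pair that exists only in the template
print(counts[("felt", "tokyo")])        # 1: genuine pair from a human tweet
```

Fifty copies of one template give the spam pair ("app", "quiz") a count of 50, while genuine pairs such as ("felt", "tokyo") appear once, so a detector driven by co-occurrence would attach the template words to the real event or rank them above it.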
Contributions
A streaming topic detection algorithm based on non-negative matrix factorization (NMF)
1. Highly scalable: able to handle a 20M×1M sparse matrix per second