Feature Extraction for Effective Microblog Search and Adaptive Clustering Algorithms for TTG
PKUICST at TREC 2014 Microblog Track
Chao Lv, Feifan Fan, Runwei Qiang, Yue Fei, Jianwu Yang
Institute of Computer Science & Technology (北京大学计算机科学技术研究所), Peking University
• Use different queries to better understand the user’s search intent
  • Original Query
  • Top Tweet Based Query
  • Web Based Query
  • Freebase Based Query
• Whether to use PRF-based query expansion?
Feature Extraction: Query Example

Ron Weasley birthday

1. Ronald Weasley - Harry Potter Wiki
   Ronald Bilius Weasley was the sixth of seven children born to Arthur and Molly Weasley (née Prewett), and got his middle name from his uncle. He was born at…
2. Ronald Weasley's seventeenth birthday - Harry Potter Wiki
   Ronald Weasley's seventeenth birthday took place on 1 March, 1997. He received many gifts from
3. Drunk Ron Weasley Sings Happy Birthday To Harry Potter - YouTube
   Jul 31, 2013 Drunk Ron Weasley (played by Simon Pegg) visits Jimmy Fallon to wish Harry Potter a happy birthday. Subscribe NOW to The Tonight Show…
4. …
5. …

It s Ron Weasley s birthday The ginger who vomited slugs out from his mouth happy birthday Ron
• Plain Tweet Text (Origin)
  Say HappyBirthdayRonWeasley and share your creativity by submitting a drawing of Ron to celebrate
• Topic Information from URL (Title)
  Pottermore Insider Happy birthday Ron Weasley
• Merged Text (DocEx)
  Say HappyBirthdayRonWeasley and share your creativity by submitting a drawing of Ron to celebrate PottermoreInsider Happy birthday Ron Weasley
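The three document representations above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function names `normalize` and `build_docex` are hypothetical, and the punctuation stripping is an assumption inferred from the look of the example texts.

```python
import re


def normalize(text):
    """Replace anything that is not a letter, digit or space, then collapse whitespace.

    Assumption: the slide examples look punctuation-free, so we mimic that here.
    """
    cleaned = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    return " ".join(cleaned.split())


def build_docex(tweet_text, url_title=None):
    """Return the merged representation: tweet text followed by the linked page title.

    Falls back to the plain tweet text when the tweet carries no URL title.
    """
    origin = normalize(tweet_text)
    if not url_title:
        return origin
    return f"{origin} {normalize(url_title)}"
```

A tweet without a linked page simply keeps its plain text, so every document can be indexed under the same field regardless of whether a title was available.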
Feature Extraction: Document
API
• Get tweets with common API
• Save time for crawling
• Use general term statistics
• Statistical Index with Lucene
Local
• Local copy of the API corpus
• Preprocessing before indexing
  • Non-English tweets removal with
• TTGPKUICST1 [auto]
  • star clustering with tuned parameter 𝜎 = 0.7 and uniform tweet number 𝑁 = 200
• TTGPKUICST2 [auto]
  • hierarchical clustering method with distance threshold 𝛽 = 0.3 and score threshold 𝛼 = 4.5
• TTGPKUICST3 [manual]
  • hierarchical clustering method with distance threshold 𝛽 = 0.3 and manually selected 𝑁
• TTGPKUICST4 [manual]
  • star clustering with tuned parameter 𝜎 = 0.7 and manually selected 𝑁
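To make the role of the threshold 𝜎 concrete, here is a minimal sketch of a greedy star clustering pass over a list of tweets. It is not the authors' implementation: cosine similarity over raw term counts, the non-overlapping assignment of satellites, and the function names are all assumptions chosen for illustration. The caller is assumed to pass in only the top 𝑁 retrieved tweets.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def star_clustering(texts, sigma=0.7):
    """Greedy star clustering; returns a list of (center_index, member_indices)."""
    vectors = [Counter(t.lower().split()) for t in texts]
    n = len(vectors)

    # Thresholded similarity graph: connect tweets whose similarity is >= sigma.
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= sigma:
                neighbors[i].add(j)
                neighbors[j].add(i)

    # Visit vertices by descending degree (ties broken by input order).
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    unmarked = set(range(n))
    clusters = []
    for center in order:
        if center not in unmarked:
            continue
        # The center absorbs its still-unassigned neighbors as satellites.
        members = {center} | (neighbors[center] & unmarked)
        unmarked -= members
        clusters.append((center, sorted(members)))
    return clusters
```

Under this sketch, a timeline summary could take one tweet per cluster (for example the star center), so a larger 𝜎 yields more, tighter clusters and a longer summary.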
Run          Recall  RecallW  Precision  F1      F1W
TTGPKUICST1  0.5221  0.7016   0.2682     0.3544  0.3881
TTGPKUICST2  0.3698  0.5840   0.4571     0.4088  0.5128
TTGPKUICST3  0.4849  0.6583   0.3635     0.4156  0.4684
TTGPKUICST4  0.5174  0.6615   0.3664     0.4290  0.4716
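The F1 and F1W columns appear consistent with the harmonic mean of precision and (weighted) recall. The short check below recomputes them from the reported values; the helper name `f1` and the tolerance are illustrative choices, and small differences are expected because the inputs are already rounded to four decimals.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, weighted recall, precision, reported F1, reported F1W) from the results table
RUNS = {
    "TTGPKUICST1": (0.5221, 0.7016, 0.2682, 0.3544, 0.3881),
    "TTGPKUICST2": (0.3698, 0.5840, 0.4571, 0.4088, 0.5128),
    "TTGPKUICST3": (0.4849, 0.6583, 0.3635, 0.4156, 0.4684),
    "TTGPKUICST4": (0.5174, 0.6615, 0.3664, 0.4290, 0.4716),
}


def max_deviation(runs):
    """Largest absolute gap between recomputed and reported F1 / F1W values."""
    gaps = []
    for recall, recall_w, precision, f1_rep, f1w_rep in runs.values():
        gaps.append(abs(f1(precision, recall) - f1_rep))
        gaps.append(abs(f1(precision, recall_w) - f1w_rep))
    return max(gaps)
```

Read this way, the table shows the trade-off between the two clustering methods: the hierarchical run TTGPKUICST2 gives up recall for the highest precision and the best F1W, while the star clustering runs favor recall.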