Detecting, Modeling, & Predicting User Temporal Intention in Social Media Hany SalahEldeen & Michael Nelson Doctoral Consortium Hany M. SalahEldeen TPDL ‘12 Doctoral Consortium Old Dominion University Department of Computer Science Advisor: Dr. Michael L. Nelson
Detecting, Modeling, & Predicting User Temporal Intention
in Social Media
Hany SalahEldeen & Michael Nelson Doctoral Consortium
Hany M. SalahEldeen
TPDL '12 Doctoral Consortium
Old Dominion University
Department of Computer Science
Advisor: Dr. Michael L. Nelson
Let’s break down the title first…
Detecting, Modeling, & Predicting User Temporal Intention
in Social Media
Scenario 1: Jenny reading Jeff’s tweets
Michael Jackson Dies
Snapshot on: June 25th, 2009
http://web.archive.org/web/20090625232522/http://www.cnn.com/
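The snapshot URI above carries its own capture time: the 14-digit segment after /web/ is a YYYYMMDDhhmmss timestamp, followed by the original URI. A minimal sketch of pulling the two apart is shown below. The function name parse_wayback_uri is hypothetical, and treating the timestamp as UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone

# Prefix of the snapshot URI shown on the slide.
WAYBACK_PREFIX = "http://web.archive.org/web/"


def parse_wayback_uri(uri_m: str):
    """Split a web.archive.org snapshot URI into (capture time, original URI).

    Hypothetical helper for illustration; assumes the 14-digit
    timestamp is expressed in UTC.
    """
    if not uri_m.startswith(WAYBACK_PREFIX):
        raise ValueError("not a web.archive.org snapshot URI")
    rest = uri_m[len(WAYBACK_PREFIX):]
    # Everything before the first "/" is the timestamp; the rest is the original URI.
    stamp, _, original = rest.partition("/")
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, original


if __name__ == "__main__":
    captured, original = parse_wayback_uri(
        "http://web.archive.org/web/20090625232522/http://www.cnn.com/"
    )
    print(captured.isoformat(), original)
```

For the slide's URI, this separates the June 25th, 2009 capture time from http://www.cnn.com/, which is the page Jeff's tweet pointed to.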
Dissertation Plan
• Create Intention-based dataset
• Extract Intention Features
• Train a Parametric Model to predict intention
• Evaluate, test, cross-validate the model
• Create a mockup application
• Extend the model to induce preservation
• Finish Writing the Dissertation
Current State
Estimating Web Archiving Coverage
• Goal: Estimate how much of the public web is present in the public archives, and how many copies are available.
• Action:
– Get 4 different datasets from 4 different sources:
  • Search engine indices
  • Bit.ly
  • DMOZ
  • Delicious
• Results: (table omitted in transcript) *
• Publications:
– How much of the web is archived? JCDL '11
* Table courtesy of Ahmed AlSum, JCDL 2011
[Figure: Dissertation Plan diagram. Labels: Prototype Application; Analyze Shared Resources Persistence and Coverage; Analyze Contextual Intention; Analyze Shortened URIs]
Shortened URI Analysis
• Goal: Better understand URI shortening and resolving, the effect of time on this process, and the correlation between a page’s features and characteristics and its resolution.
• Action:
– Fresh Bit.lys: get hourly click logs, rate of change, social networking spread, and other contextual information
– Longitudinal study
• Evaluation:
– Compare results with the frequency-of-change analysis of Cho and Garcia-Molina.
– Compare results with Antoniades et al., WWW 2011.
Estimating Loss of Shared Resources in Social Media
• Goal: Estimate how much of the public web is present in the public archives, and how many copies are available.
• Action:
– Sampling from 6 public events
– Events spanning 3 years
– Existence in the current web
– Existence in the public archives
– Find relation with time
• Results:
– After the first year, ~11% of shared resources will be lost
– After that, we continue to lose ~0.02% daily
• Publications:
– A year after the Egyptian revolution, 10% of the social media documentation is gone. http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
– Losing my revolution: How Many Resources Shared on Social Media Have Been Lost? TPDL '12
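The two reported figures (about 11% lost after the first year, then about 0.02% more per day) can be read as a simple linear estimate. The sketch below is one such reading, not the authors' fitted model. The function name, the cap at 100%, and the refusal to estimate before day 365 (the slide gives no figure for that period) are all assumptions of this sketch.

```python
# Figures taken from the slide; everything else here is an illustrative assumption.
FIRST_YEAR_LOSS_PERCENT = 11.0   # ~11% lost after the first year
DAILY_LOSS_PERCENT = 0.02        # ~0.02% lost per day after that
DAYS_IN_FIRST_YEAR = 365


def estimated_loss_percent(days_since_shared: int) -> float:
    """Estimate the percentage of shared resources lost after a number of days.

    Hypothetical helper: only defined from one year onward, because the
    slide does not say how loss accumulates during the first year.
    """
    if days_since_shared < DAYS_IN_FIRST_YEAR:
        raise ValueError("the slide gives no estimate for the first year")
    extra_days = days_since_shared - DAYS_IN_FIRST_YEAR
    estimate = FIRST_YEAR_LOSS_PERCENT + DAILY_LOSS_PERCENT * extra_days
    # A loss percentage cannot exceed 100 (assumed cap).
    return min(100.0, estimate)


if __name__ == "__main__":
    for years in (1, 2, 3):
        days = years * 365
        print(f"{years} year(s): ~{estimated_loss_percent(days):.1f}% lost")
```

Under this reading, each additional year adds roughly 7.3 percentage points (365 × 0.02) on top of the first-year loss.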
[Figure: Dissertation Plan diagram. Labels: Prototype Application; Analyze Shared Resources Persistence and Coverage; User Intention Analysis; Analyze Shortened URIs]
User Intention Analysis
• Goal: Better understand user intention and the factors that affect it. Also create a new testing and training set.
• Action:
– Get a sample set of tweets selected at random
– Extract the URIs
– Get the closest Memento
– Download the snapshot & the current version
– Use Amazon’s Mechanical Turk to choose the best version
• Evaluation:
– Measure cross-rater agreement and confidence.
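The slide does not say how cross-rater agreement and confidence would be measured. As one possibility, the sketch below computes Fleiss' kappa over worker votes and a per-item majority share as a confidence proxy. The choice of Fleiss' kappa, the labels "archived" and "current", and both function names are assumptions made for illustration.

```python
from collections import Counter


def fleiss_kappa(votes, categories):
    """Fleiss' kappa for items that were each rated by the same number of raters.

    votes: list of per-item label lists, e.g. [["archived", "current", "archived"], ...]
    categories: all labels a rater may choose.
    """
    n_items = len(votes)
    n_raters = len(votes[0])
    if n_raters < 2 or any(len(v) != n_raters for v in votes):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    counts = [Counter(v) for v in votes]
    # Observed agreement: share of agreeing rater pairs, averaged over items.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(p_items) / n_items
    # Expected agreement from the overall label proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)
    if p_expected == 1.0:
        return float("nan")  # kappa is undefined when only one label is ever used
    return (p_observed - p_expected) / (1.0 - p_expected)


def majority_with_confidence(item_votes):
    """Return (winning label, share of raters who chose it) for one item."""
    label, count = Counter(item_votes).most_common(1)[0]
    return label, count / len(item_votes)


if __name__ == "__main__":
    # Made-up votes from three hypothetical workers on four tweets.
    votes = [
        ["archived", "archived", "archived"],
        ["current", "current", "archived"],
        ["current", "current", "current"],
        ["archived", "archived", "current"],
    ]
    print(fleiss_kappa(votes, ["archived", "current"]))
    print([majority_with_confidence(v) for v in votes])
```

Items with a low majority share could be re-sent for more ratings or excluded from the training set, depending on how strict the dataset needs to be.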
Proposed Work
• Data Gathering
• Feature Extraction
• Modeling the intention engine
• Evaluation
• Application: Prediction and Preservation
Possible Solution for Jenny
The resource has changed since the last time it was shared.
Do you wish to see the version the author intended or the current version?
[Current Version] [Intended Version]
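One way to back the "Intended Version" choice in the prompt above is to serve the archived copy captured closest to the moment the link was shared. The sketch below picks that copy from a list of capture times. The function name and the sample timestamps are hypothetical, and equating "closest capture" with "intended version" is a simplifying assumption.

```python
from datetime import datetime


def closest_memento(shared_at, memento_times):
    """Return the capture time nearest to the moment the resource was shared.

    Hypothetical helper: memento_times is a plain list of datetimes, such as
    the capture times one might read from a TimeMap.
    """
    if not memento_times:
        raise ValueError("no archived copies available")
    # Smallest absolute time difference wins, whether before or after sharing.
    return min(memento_times, key=lambda captured: abs(captured - shared_at))


if __name__ == "__main__":
    # Made-up sharing time and capture times for illustration.
    shared_at = datetime(2009, 6, 25, 23, 30, 0)
    captures = [
        datetime(2009, 6, 20, 12, 0, 0),
        datetime(2009, 6, 25, 23, 25, 22),
        datetime(2009, 7, 1, 8, 0, 0),
    ]
    print(closest_memento(shared_at, captures))
```

If no capture exists near the sharing time, an application might fall back to the current version and tell the reader so, instead of silently showing a distant snapshot.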
Current Version
Archived Version
Proposed Framework
Components: Feature Extraction, Classifier
Example Features:
– Tweet Content
– Click Logs
– Other Tweets
– Shared Resource
– Timemaps
Extra Slides
Archive Shortener Application
Estimating Shared Resources Loss in Social Media
My Publications
• S. G. Ainsworth, A. AlSum, H. SalahEldeen, M. C. Weigle, and M. L. Nelson. How much of the web is archived? In Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries, JCDL '11, pages 133-136, 2011.
• H. SalahEldeen and M. L. Nelson. Losing my revolution: How many resources shared on social media have been lost? Accepted in TPDL 2012.
• H. SalahEldeen and M. L. Nelson. Losing my revolution: A year after the Egyptian revolution, 10% of the social media documentation is gone. http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
References
• D. Antoniades, I. Polakis, G. Kontaxis, E. Athanasopoulos, S. Ioannidis, E. P. Markatos, and T. Karagiannis. we.b: the web of short URLs. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 715-724, New York, NY, USA, 2011. ACM.
• A. Ashkan, C. L. Clarke, E. Agichtein, and Q. Guo. Classifying and characterizing query intent. In Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, ECIR '09, pages 578-586, Berlin, Heidelberg, 2009. Springer-Verlag.
• L. Azzopardi and M. de Rijke. Query intention acquisition: A case study on automatically inferring structured queries. In Proceedings DIR-2006, 2006.
• R. Baeza-Yates, L. Calderon-Benavides, and C. Gonzalez-Caro. The intention behind web queries. In F. Crestani, P. Ferragina, and M. Sanderson, editors, String Processing and Information Retrieval, volume 4209 of Lecture Notes in Computer Science, pages 98-109. Springer Berlin / Heidelberg, 2006. 10.1007/11880561_9.
• A. Benczúr, I. Bíró, K. Csalogány, and T. Sarlós. Web spam detection via commercial intent analysis. In Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, AIRWeb '07, pages 89-92, New York, NY, USA, 2007. ACM.
• J. Bollen, H. Mao, and X.-J. Zeng. Twitter mood predicts the stock market. CoRR, abs/1010.3003, 2010.
• N. Dai, X. Qi, and B. D. Davison. Bridging link and query intent to enhance web search. In Proceedings of the 22nd ACM conference on Hypertext and hypermedia, HT '11, pages 17-26, New York, NY, USA, 2011. ACM.
• N. Dai, X. Qi, and B. D. Davison. Enhancing web search with entity intent. In Proceedings of the 20th international conference companion on World wide web, WWW '11, pages 29-30, New York, NY, USA, 2011. ACM.
• K. Durant and M. Smith. Predicting the political sentiment of web log posts using supervised machine learning techniques coupled with feature selection. In O. Nasraoui, M. Spiliopoulou, J. Srivastava, B. Mobasher, and B. Masand, editors, Advances in Web Mining and Web Usage Analysis, volume 4811 of Lecture Notes in Computer Science, pages 187-206. Springer Berlin / Heidelberg, 2007. 10.1007/978-3-540-77485-3_11.
• Q. Guo and E. Agichtein. Ready to buy or just browsing?: detecting web searcher goals from interaction data. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, SIGIR '10, pages 130-137, New York, NY, USA, 2010. ACM.
• A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, WebKDD/SNA-KDD '07, pages 56-65, New York, NY, USA, 2007. ACM.
• H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web, WWW '10, pages 591-600, New York, NY, USA, 2010. ACM.
• C.-H. L. Lee and A. Liu. Modeling the query intention with goals. In Proceedings of the 19th International Conference on Advanced Information Networking and Applications - Volume 2, AINA '05, pages 535-540, Washington, DC, USA, 2005. IEEE Computer Society.
• A. Löser, W. M. Barczynski, and F. Brauer. What's the intention behind your query? A few observations from a large developer community. In IRSW, 2008.
• F. McCown, N. Diawara, and M. L. Nelson. Factors affecting website reconstruction from the web infrastructure. In JCDL '07: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 39-48, 2007.
• B. Meeder, B. Karrer, A. Sayedi, R. Ravi, C. Borgs, and J. Chayes. We know who you followed last summer: inferring social link creation times in twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 517-526, New York, NY, USA, 2011. ACM.
• G. Mishne. Predicting movie sales from blogger sentiment. In AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), 2006.
• M. L. Nelson and B. D. Allen. Object persistence and availability in digital libraries. D-Lib Magazine, 8(1), 2002.
• R. Sanderson, M. Phillips, and H. Van de Sompel. Analyzing the persistence of referenced web resources with memento. CoRR, abs/1105.3459, 2011.
• H. Van de Sompel, M. L. Nelson, R. Sanderson, L. Balakireva, S. Ainsworth, and H. Shankar. Memento: Time travel for the web. CoRR, abs/0911.1112, 2009.
• S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts. Who says what to whom on twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 705-714, New York, NY, USA, 2011. ACM.