Big Data and Automated Content Analysis
Week 7 – Monday: »Word co-occurrences, Gephi, and some suggestions«
Damian Trilling, [email protected], @damian0604, www.damiantrilling.net
Afdeling Communicatiewetenschap, Universiteit van Amsterdam
11 May 2015
Today
1 Integrating word counts and network analysis: Word co-occurrences
    The idea
    A real-life example
2 Some suggestions on where to look further
    Useful packages
    Some more tips
3 Next meetings, & final project
Integrating word counts and network analysis: Word co-occurrences
The idea
Simple word count
We already know this.

    from collections import Counter

    # tekst = text, woord = word, aantal = count
    tekst = ("this is a test where many test words occur several times this is "
             "because it is a test yes indeed it is")
    c = Counter(tekst.split())  # count how often each word occurs
    print("The top 5 are: ")
    for woord, aantal in c.most_common(5):
        print(aantal, woord)
The output:

    The top 5 are:
    4 is
    3 test
    2 a
    2 this
    2 it

(Words with the same count may appear in a different order, depending on the Python version.)
What if we could...
... count the frequency of combinations of words?
As in: Which words typically occur together in the same tweet (or paragraph, or sentence, ...)?
We can, with the combinations() function:

    >>> from itertools import combinations
    >>> words = "Hoi this is a test test test a test it is".split()
    >>> print([e for e in combinations(words, 2)])
    [('Hoi', 'this'), ('Hoi', 'is'), ('Hoi', 'a'), ('Hoi', 'test'), ('Hoi', ...
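Combining the two ideas above, Counter and combinations(), gives a co-occurrence count. The following is only a minimal sketch under stated assumptions: the example documents are made up for illustration, and counting each word pair once per document (via a sorted set of unique words) is one possible design choice, not necessarily the one used in the course.

```python
from collections import Counter
from itertools import combinations

# Hypothetical example documents (e.g. tweets); not taken from the slides
documents = [
    "merkel talks about taxes",
    "steinbrueck talks about taxes and wages",
    "merkel and steinbrueck debate",
]

cooccurrences = Counter()
for doc in documents:
    # Sorted set of unique words: each pair is counted at most once per
    # document and always appears in the same (alphabetical) order
    unique_words = sorted(set(doc.split()))
    cooccurrences.update(combinations(unique_words, 2))

# Show the three most frequent word pairs
for (word1, word2), count in cooccurrences.most_common(3):
    print(count, word1, word2)
```

Without the sorted set, a pair such as ('test', 'a') and ('a', 'test') would be counted separately, and repeated words within one document would inflate the counts.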
A real-life example
Trilling, D. (2014). Two different debates? Investigating the relationship between a political debate on TV and simultaneous comments on Twitter. Social Science Computer Review, Advance online publication. doi: 10.1177/0894439314537886
Commenting on the TV debate on Twitter
The debating politicians
• issues largely set by the interviewers
• but candidates actively try to highlight the issues (⇒ agenda setting) and aspects of the issues (⇒ framing).
The viewers
• Commenting on television programs on social networks has become a regular pattern of behavior (Courtois & d'Heer, 2012)
• User comments have been shown to reflect the structure of the debate (Shamma, Churchill, & Kennedy, 2010; Shamma, Kennedy, & Churchill, 2009)
• Topic and speaker effects are more influential than, e.g., rhetorical skills (Nagel, Maurer, & Reinemann, 2012; De Mooy & Maier, 2014)
Research Questions
To what extent are the statements politicians make during a TV debate reflected in online live discussions of the debate?
RQ1 Which topics are emphasized by the candidates?
RQ2 Which topics are emphasized by the Twitter users?
RQ3 With which topics are the two candidates associated on Twitter?
Method
The data
• debate transcript
• tweets containing #tvduell
• N = 120,557 tweets by N = 24,796 users
• 22-9-2013, 20.30-22.00
The analysis
• Series of self-written Python scripts:
    1 preprocessing (stemming, stopword removal)
    2 word counts
    3 word log likelihood (corpus comparison)
• Stata: regression analysis
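The slides do not say which formula the scripts used for the corpus comparison. As a rough illustration, the following sketch assumes the log-likelihood (G2) statistic that is commonly used to compare word frequencies between two corpora. The function name, the toy corpora, and the stopword list are all hypothetical, and stemming is left out to keep the example free of external packages.

```python
import math
from collections import Counter

def log_likelihood(count_a, count_b, total_a, total_b):
    """Log-likelihood (G2) for one word's frequency in corpus A vs. corpus B."""
    # Expected counts if the word were equally frequent in both corpora
    expected_a = total_a * (count_a + count_b) / (total_a + total_b)
    expected_b = total_b * (count_a + count_b) / (total_a + total_b)
    ll = 0.0
    # An observed count of 0 contributes nothing to the sum
    if count_a > 0:
        ll += count_a * math.log(count_a / expected_a)
    if count_b > 0:
        ll += count_b * math.log(count_b / expected_b)
    return 2 * ll

# Hypothetical toy corpora and stopword list, for illustration only
stopwords = {"the", "a", "is", "on", "and"}
transcript = "the economy is growing and the economy is strong".split()
tweets = "merkel is boring and the necklace is on twitter".split()

# Preprocessing (stopword removal only) and word counts
counts_a = Counter(w for w in transcript if w not in stopwords)
counts_b = Counter(w for w in tweets if w not in stopwords)
total_a, total_b = sum(counts_a.values()), sum(counts_b.values())

# Score every word that occurs in at least one of the two corpora
scores = {
    word: log_likelihood(counts_a[word], counts_b[word], total_a, total_b)
    for word in set(counts_a) | set(counts_b)
}
for word, score in sorted(scores.items(), key=lambda item: -item[1])[:3]:
    print(round(score, 2), word)
```

A high score means a word's frequency differs strongly between the two corpora; the score alone does not say in which corpus the word is overrepresented, so the raw counts need to be inspected as well.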
Next meetings, & final project
Final project
On 29–5, you have to hand in your final project
• Details and rules: ⇒ course manual
• Similar to take-home exam
• But: Much more advanced, and now, the result counts as well
• And: Be creative! You can use code from class, but you need to extend it
• Start working on it!
Next meeting
Wednesday, 13–5
Lab session, focus on INDIVIDUAL PROJECTS! Prepare!
(No common exercise)