There is No Deadline - Time Evolution of Wikipedia Discussions
Andreas Kaltenbrunner, David Laniado
Social Media Research Group, Barcelona Media, Barcelona, Spain
August 28th, 2012, WikiSym ’12, Linz, Austria
Outline
1 Introduction: Motivation, Dataset
2 Peaks: Peak Detection Algorithm, Peak Statistics
3 Growth measure: Discussion Complexity, Growth in Complexity
4 Conclusions
Motivation
Wiki means quick in Hawaiian.
How to study the speed with which an article changes?
First choice would be the number of edits per time unit.
But the larger an article becomes, the more of its generative process happens in talk pages.
⇒ Looking at the associated discussion is often the most effective way to understand the editing process.

Research questions
What is the relationship between discussion and edits?
How frequent are spikes of activity?
How fast do discussions grow, and for how long?
Which are the fastest discussions?
Dataset
Dump of March 12th, 2010
Co-evolution of comments and edits
[Figure: number of comments and edits per day, 2003 to 2010, with a zoom on the year 2007. Axes: number of edits, number of comments.]
6 comments per 100 edits
Example for a single article
Activity is less synchronised
[Figure: Peaks in the discussion and edit activity of the article "Barack Obama", 2007 to 2010. One panel per year; x-axis: Jan-01 to Dec-31; y-axis: #comments and #edits per day.]
How to detect peaks?
How to detect peaks?
Compare with median activity
[Figure: Peaks in the discussion and edit activity of the article "Barack Obama", 2007 to 2010. Series: #comments per day, median #comments during ± 2 weeks, #edits per day, median #edits during ± 2 weeks.]
Peak if activity > c · max(m(t), n_min), adapted from [Lehmann 2012]
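The rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: it assumes that m(t) is the median daily activity within ± 2 weeks of day t (as suggested by the preceding figure), and the function name `detect_peaks`, the parameter names, and the default values `c=5.0` and `n_min=3` are hypothetical choices, since the slides do not state the values used.

```python
from statistics import median


def detect_peaks(counts, c=5.0, n_min=3, half_window=14):
    """Return the indices of days whose activity exceeds c * max(m(t), n_min).

    counts      -- daily activity (comments or edits), one entry per day
    c, n_min    -- illustrative values only; the slides do not give them
    half_window -- m(t) is assumed to be the median over +/- 14 days
    """
    peaks = []
    for t, activity in enumerate(counts):
        # Clip the window at the start and end of the series.
        lo = max(0, t - half_window)
        hi = min(len(counts), t + half_window + 1)
        m_t = median(counts[lo:hi])
        # n_min acts as a floor so that quiet articles do not trigger
        # peaks on very small absolute counts.
        if activity > c * max(m_t, n_min):
            peaks.append(t)
    return peaks


# A flat series of 2 comments per day with one burst of 40 on day 20.
daily_comments = [2] * 20 + [40] + [2] * 20
print(detect_peaks(daily_comments))  # prints [20]
```

The floor n_min matters for articles with almost no activity: with a median of 0, any single comment would otherwise count as a peak.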
Uxbridge, Massachusetts                              19  0
Voodoo (D’Angelo album)                              17  0
List of World Wrestling Entertainment employees      16  3
Super Smash Bros. Brawl                              16  2
Michael Jackson                                      16  1
The Biggest Loser: Couples 2                         16  0
Roger Federer                                        15  0
Rafael Nadal                                         15  0
List of Barney & Friends episodes and videos         15  0
Total Drama Action                                   15  0
How to measure the complexity of a Discussion?
Discussion tree for article “Presidency of Barack Obama”
red → root (the article)
blue → structural nodes
green → anonymous comments
grey → registered comments
Using the h-index of a discussion introduced in [Gómez 2008]

The h-index ...
is a balanced depth measure.
is the maximal number h such that there are at least h comments at level (depth) h, but not h + 1 comments at level h + 1.
In other words there are h sub-threads of depth at least h.

Example: h-index = 3
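As a rough sketch of this definition, the following Python snippet computes the h-index of a small discussion tree. The representation is an assumption made for illustration: each comment is a list of its replies, the thread is the list of top-level comments (level 1), and structural nodes are ignored. The helper names `level_counts` and `h_index` are hypothetical, and the loop stops at the first level h + 1 that has fewer than h + 1 comments, which is one reading of the "but not h + 1 comments at level h + 1" condition.

```python
from collections import Counter


def level_counts(replies, depth=1, counts=None):
    """Count the comments at each depth of a nested-list thread.

    `replies` is a list of comments; each comment is itself the list of
    its replies. Top-level comments sit at depth 1.
    """
    if counts is None:
        counts = Counter()
    for comment in replies:
        counts[depth] += 1
        level_counts(comment, depth + 1, counts)
    return counts


def h_index(thread):
    """Largest h with at least h comments at level h, stopping at the
    first level h + 1 that has fewer than h + 1 comments."""
    counts = level_counts(thread)
    h = 0
    while counts[h + 1] >= h + 1:
        h += 1
    return h


# Levels 1, 2 and 3 hold three comments each; level 4 holds only one.
thread = [
    [[[], []], [[[]]]],  # comment A: two replies, one of them nested deeper
    [[]],                # comment B: one reply
    [],                  # comment C: no replies
]
print(h_index(thread))  # prints 3
```

A long chain of single replies keeps an h-index of 1 under this sketch, which reflects the "balanced" character of the measure: both breadth and depth are needed to raise it.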
Conclusions and future work

Conclusions
Discussion and edit peaks occur mostly independently of each other.
Both endogenous (Wikipedia internal) and exogenous (offline world) events can be the cause of such peaks.
We have introduced a simple growth measure.
Some discussions need only a few days to evolve, while the slowest go on over years.

Future work
Use metrics for early detection of controversies.
Apply metrics on sub-threads to detect hot spots.
Assess discussion maturity.
Questions?
Bibliography I
[Gómez 2008] Vicenç Gómez, Andreas Kaltenbrunner & Vicente López. Statistical analysis of the social network and discussion threads in Slashdot. In WWW ’08: Proceeding of the 17th international conference on World Wide Web, pages 645–654, New York, NY, USA, 2008. ACM.
[Hirsch 2005] J. E. Hirsch. An index to quantify an individual’s scientific research output. PNAS, vol. 102, no. 46, pages 16569–16572, 2005.
[Lehmann 2012] J. Lehmann, B. Gonçalves, J.J. Ramasco & C. Cattuto. Dynamical Classes of Collective Attention in Twitter. In Proc. of WWW, 2012.