Page 1: There is No Deadline - Time Evolution of Wikipedia Discussions

There is No Deadline - Time Evolution of Wikipedia Discussions

Andreas Kaltenbrunner, David Laniado

Social Media Research Group, Barcelona Media, Barcelona, Spain

August 28th, 2012, WikiSym ’12, Linz, Austria

Andreas Kaltenbrunner & David Laniado Time Evolution of Wikipedia Discussions

Page 2: There is No Deadline - Time Evolution of Wikipedia Discussions

Outline

1 Introduction: Motivation, Dataset

2 Peaks: Peak Detection Algorithm, Peak Statistics

3 Growth measure: Discussion Complexity, Growth in Complexity

4 Conclusions

Page 4: There is No Deadline - Time Evolution of Wikipedia Discussions

Motivation

"Wiki" means "quick" in Hawaiian.
How can we study the speed at which an article changes?
The first choice would be the number of edits per time unit.

But the larger an article becomes, the more of its generative process happens on its talk page.
⇒ Looking at the associated discussion is often the most effective way to understand the editing process.

Research questions
What is the relationship between discussion and edits?
How frequent are spikes of activity?
How fast do discussions grow, and for how long?
Which are the fastest discussions?

Page 6: There is No Deadline - Time Evolution of Wikipedia Discussions

Dataset: dump of March 12th, 2010

[Figure: Co-evolution of comments and edits. Left panel: number of comments and edits per day, 2003 to 2010 (left axis: number of edits, 0 to 15 × 10^4; right axis: number of comments, 0 to 9000). Right panel: zoom on the year 2007, Jan-1 to Dec-31 (edits 0.5 to 1.5 × 10^5; comments 3000 to 9000).]

6 comments per 100 edits

Page 7: There is No Deadline - Time Evolution of Wikipedia Discussions

Example for a single article: activity is less synchronised

[Figure: Peaks in the discussion and edit activity of the article "Barack Obama"; one panel per year (2007, 2008, 2009, 2010), January to December; y-axis: #comments and #edits per day.]

How to detect peaks?

Page 9: There is No Deadline - Time Evolution of Wikipedia Discussions

How to detect peaks? Compare with median activity

[Figure: Peaks in the discussion and edit activity of the article "Barack Obama", one panel per year (2007 to 2010); series: #comments per day, median #comments during ± 2 weeks, #edits per day, median #edits during ± 2 weeks.]

Page 10: There is No Deadline - Time Evolution of Wikipedia Discussions

Peak if activity $> c \cdot \max(m(t), n_{\min})$, adapted from [Lehmann 2012]

$m(t)$: 4-week median, $n_{\min}$: activity minimum, $c$: peak factor

[Figure: Peaks in the discussion and edit activity of the article "Barack Obama", one panel per year (2007 to 2010); series: #comments per day, median #comments during ± 2 weeks, comment peaks, #edits per day, median #edits during ± 2 weeks, edit peaks.]
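The peak rule above can be sketched in a few lines. This is a minimal sketch, not the authors' code: it assumes activity is given as a list of daily counts, that m(t) is the median over a centred window of ±14 days including day t (clipped at the start and end of the series), and it uses as defaults the values c = 5 and n_min = 10 that the peak statistics below are based on. The function name `detect_peak_days` is hypothetical.

```python
from statistics import median


def detect_peak_days(counts, c=5.0, n_min=10, half_window=14):
    """Return the indices of days whose activity exceeds c * max(m(t), n_min).

    counts: daily activity (number of comments or edits), one entry per day.
    m(t) is the median of counts over days t - half_window .. t + half_window,
    clipped at the ends of the series (an assumption of this sketch).
    """
    peaks = []
    for t, activity in enumerate(counts):
        lo = max(0, t - half_window)
        hi = min(len(counts), t + half_window + 1)
        m_t = median(counts[lo:hi])  # local "4 weeks" median around day t
        if activity > c * max(m_t, n_min):
            peaks.append(t)
    return peaks
```

For example, `detect_peak_days([5] * 30 + [100] + [5] * 30)` returns `[30]`: the local median is 5, so the threshold is 5 · max(5, 10) = 50. The floor n_min keeps quiet pages, where a median of 1 or 2 would turn almost any small burst into a peak, from producing spurious peaks.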

Page 11: There is No Deadline - Time Evolution of Wikipedia Discussions

Edit and comment peaks do not always coincide, and they can be caused by endogenous or exogenous events

[Figure: Peaks in the discussion and edit activity of the article "Barack Obama" (2007 to 2010), with insets on individual peaks:
2007: 15-Feb-2007 endogenous peak; 11 and 12-Mar-2007 endogenous peak.
2008: 17-Mar-2008 endogenous peak; 04-Jun-2008 nomination win; 29-Aug-2008 official nomination; 10-Oct-2008 endogenous peak; 05-Nov-2008 presidential elections.
2009: 20-Jan-2009 presidential inauguration; 09 and 10-Mar-2009 endogenous peak; 09-Oct-2009 Nobel Prize win.]

Page 13: There is No Deadline - Time Evolution of Wikipedia Discussions

Peak statistics with $c = 5$, $n_{\min} = 10$: 2 580 discussion and 32 853 edit peaks

[Figure, three log-log panels:
Number of peaks per article vs. number of articles: 1198 articles with comment peaks, y ~ x^−2.57; 20681 articles with edit peaks, y ~ x^−3.50.
Peak length (days) vs. number of peaks: 2580 comment peaks, y ~ x^−3.52; 32853 edit peaks, y ~ x^−3.97.
Time between consecutive peaks (days) vs. number of time intervals: comment peaks, y ~ x^−1.41; edit peaks, y ~ x^−1.36.]
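Peak length and time between consecutive peaks both require grouping individual peak days into peaks. A minimal sketch under our own assumptions (the slides do not spell this out): consecutive peak days form a single peak, and the time between peaks is measured from the start of one peak to the start of the next. The function names are hypothetical.

```python
def group_peak_days(peak_days):
    """Merge peak days (integer day indices) into (start_day, length_in_days) runs.

    Assumption: consecutive peak days belong to one and the same peak.
    """
    runs = []
    for day in sorted(set(peak_days)):
        if runs and day == runs[-1][0] + runs[-1][1]:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)  # extend the current peak by one day
        else:
            runs.append((day, 1))  # start a new peak
    return runs


def times_between_peaks(runs):
    """Days between the starts of consecutive peaks (start-to-start, assumed)."""
    return [nxt[0] - cur[0] for cur, nxt in zip(runs, runs[1:])]
```

For peak days `[3, 4, 5, 10, 20, 21]` this yields the peaks `[(3, 3), (10, 1), (20, 2)]` with lengths 3, 1 and 2 days and inter-peak times `[7, 10]`.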

In the entire dataset, only 12% of all comment peaks coincide with an edit peak; 27% do when allowing one day of difference, and 33.8% when allowing two.

Peaks in the discussion activity do not necessarily lead to peaks in the editing activity.
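The coincidence percentages above can be computed along these lines. A minimal sketch under our own assumptions: each peak is represented by a single day index, peaks are compared only within the same article, and a comment peak counts as coinciding if some edit peak of that article lies within `tolerance` days. The function name and the dict-of-lists input format are hypothetical.

```python
def coincidence_rate(comment_peaks, edit_peaks, tolerance=0):
    """Fraction of comment peaks with an edit peak at most `tolerance` days away.

    comment_peaks, edit_peaks: dict mapping article title -> iterable of peak days.
    """
    total = hits = 0
    for article, days in comment_peaks.items():
        edit_days = set(edit_peaks.get(article, ()))
        for day in days:
            total += 1
            # check every offset in [-tolerance, +tolerance]
            if any(day + k in edit_days for k in range(-tolerance, tolerance + 1)):
                hits += 1
    return hits / total if total else 0.0
```

Calling it with `tolerance=0`, `1` and `2` gives the three kinds of percentage quoted above; with toy data `{"A": [10, 20], "B": [30, 40]}` for comments and `{"A": [10, 21], "B": [33]}` for edits, the rates are 0.25, 0.5 and 0.5.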

Page 14: There is No Deadline - Time Evolution of Wikipedia Discussions

Top 10 articles with most comment and edit peaks

Title | #comment peaks | #edit peaks
Intelligent design | 15 | 2
September 11 attacks | 15 | 3
Race and intelligence | 14 | 5
British Isles | 11 | 0
Main page | 11 | 0
Anarchism | 10 | 12
Catholic church | 10 | 0
Canada | 10 | 0
Transnistria | 9 | 3
New Anti-Semitism | 9 | 0

Title | #edit peaks | #comment peaks
Uxbridge, Massachusetts | 19 | 0
Voodoo (D’Angelo album) | 17 | 0
List of World Wrestling Entertainment employees | 16 | 3
Super Smash Bros. Brawl | 16 | 2
Michael Jackson | 16 | 1
The Biggest Loser: Couples 2 | 16 | 0
Roger Federer | 15 | 0
Rafael Nadal | 15 | 0
List of Barney & Friends episodes and videos | 15 | 0
Total Drama Action | 15 | 0

Page 16: There is No Deadline - Time Evolution of Wikipedia Discussions

How to measure the complexity of a discussion?
Discussion tree for the article “Presidency of Barack Obama”

[Figure legend: red: root (the article); blue: structural nodes; green: comments by anonymous users; grey: comments by registered users]
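Depth-based measures such as the h-index work on the comment levels of a tree like this one. A minimal sketch, assuming a hypothetical `Node` type whose `kind` is "root", "structural" or "comment", and assuming that only comments count as levels, so that structural nodes (for example section headings) do not add to a comment's depth; the slides do not specify this convention.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: kind is "root", "structural" or "comment"."""
    kind: str
    children: list[Node] = field(default_factory=list)


def comment_depths(node: Node, depth: int = 0) -> list[int]:
    """Return the depth (level) of every comment below `node`.

    Assumption: only comments count as levels; the root and structural
    nodes pass their current depth on to their children unchanged.
    """
    depths = []
    for child in node.children:
        if child.kind == "comment":
            depths.append(depth + 1)
            depths.extend(comment_depths(child, depth + 1))
        else:
            depths.extend(comment_depths(child, depth))
    return depths
```

A direct reply to the article (or to a section heading) is at level 1, a reply to that reply at level 2, and so on.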

Page 17: There is No Deadline - Time Evolution of Wikipedia Discussions

Using the h-index of a discussion introduced in [Gómez 2008]

The h-index ...
... is a balanced depth measure.
... is the maximal number h such that there are at least h comments at level (depth) h, but not h + 1 comments at level h + 1.
In other words, there are h sub-threads of depth at least h.

Example

h-index=3
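Given the levels of all comments, the definition above translates almost directly into code. A minimal sketch: the input is a plain list of comment depths (for instance from a traversal of the discussion tree), and the function name `discussion_h_index` is hypothetical. Taking the largest qualifying level also guarantees that there are not h + 1 comments at level h + 1.

```python
from collections import Counter


def discussion_h_index(depths):
    """h-index of a discussion from the depths (levels) of its comments.

    Returns the largest h >= 1 such that at least h comments sit at level h,
    or 0 if no level qualifies.
    """
    per_level = Counter(depths)  # level -> number of comments at that level
    return max(
        (level for level, n in per_level.items() if level >= 1 and n >= level),
        default=0,
    )
```

For instance, `discussion_h_index([1, 1, 1, 2, 2, 3, 3, 3, 4])` returns 3: levels 1, 2 and 3 each hold at least as many comments as their level number, while level 4 holds only one.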

Page 19: There is No Deadline - Time Evolution of Wikipedia Discussions

Example for growth of discussions

[Figure: Number of comments over time (Jan-2002 to Jan-2010, logarithmic y-axis) in the discussions of George W. Bush, Barack Obama and Bill Clinton. Left panel: with smoothed trends (10^0 to 10^3); right panel: 10^0 to 10^5.]

Can we use the h-index to measure this growth?

Page 20: There is No Deadline - Time Evolution of Wikipedia Discussions

Example for growth rate ∆h

[Figure: h-index over time (Jan-2003 to Jan-2009, h-index 0 to 14) for the discussions of George W. Bush, Barack Obama and Bill Clinton.]

George W. Bush: ∆h = 70.7 days; Barack Obama: ∆h = 90.2 days; Bill Clinton: ∆h = 331.9 days

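We define the growth rate ∆h as the average time a discussion needs to increase its h-index by one:

$$\Delta h = \frac{t_h - t_1}{h - 1}$$

where $t_k$ is the time at which the discussion first reaches h-index $k$, and $h$ is its final h-index.

This is related to the inverse of the m-index proposed in [Hirsch 2005].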
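The growth rate can be computed from the timestamped comments of a discussion by tracking the running h-index. A minimal sketch under stated assumptions: comments arrive as hypothetical `(timestamp, depth)` pairs, t_k is the first time the running h-index reaches k, and when a single comment makes the h-index jump by more than one, all skipped values get the same time. The function name `growth_rate_days` is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def growth_rate_days(comments):
    """Growth rate (t_h - t_1) / (h - 1) in days, h being the final h-index.

    comments: iterable of (timestamp: datetime, depth: int) pairs.
    Returns None when the final h-index is below 2 (the ratio is undefined).
    """
    per_level = Counter()
    h = 0
    first_reached = {}  # k -> time at which the running h-index first reached k
    for ts, depth in sorted(comments):
        per_level[depth] += 1
        # Only this comment's level changed, so h can only rise to `depth`.
        if depth > h and per_level[depth] >= depth:
            for k in range(h + 1, depth + 1):
                first_reached[k] = ts  # skipped values share the same time
            h = depth
    if h < 2:
        return None
    elapsed = first_reached[h] - first_reached[1]
    return elapsed.total_seconds() / 86400 / (h - 1)


if __name__ == "__main__":
    t0 = datetime(2008, 1, 1)
    # Toy discussion: (day offset, comment level) pairs
    toy = [(t0 + timedelta(days=d), lvl)
           for d, lvl in [(0, 1), (1, 2), (2, 2), (3, 3), (4, 3), (5, 3)]]
    print(growth_rate_days(toy))  # 2.5: h goes from 1 to 3 between day 0 and day 5
```

Because comment counts per level only grow, the h-index never decreases, and checking only the level of the newest comment avoids recomputing the h-index from scratch after every comment.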

Page 21: There is No Deadline - Time Evolution of Wikipedia Discussions

Distribution of growth rates ∆h of all discussions with more than 1000 comments

[Figure: histogram of ∆h (days, logarithmic x-axis with ticks at 1, 10, 100, 1000) against the number of discussions (0 to 60).]

Different growth rates

The rate at which discussions increase in complexity varies over several orders of magnitude.

Page 22: There is No Deadline - Time Evolution of Wikipedia Discussions

The 15 fastest and slowest discussions (∆h and duration in days)

Title | ∆h (days) | start date | end date | duration (days) | final h-index

Fastest:
Virginia Tech massacre | 0.5 | 15-Apr-2007 | 20-Apr-2007 | 5 | 9
2009 flu pandemic | 0.9 | 25-Apr-2009 | 30-Apr-2009 | 5 | 7
Bronze Soldier of Tallinn | 0.9 | 26-Apr-2007 | 02-May-2007 | 6 | 7
2009 Honduran constitutional crisis | 1.0 | 27-Jun-2009 | 05-Jul-2009 | 8 | 8
Seung-Hui Cho | 1.0 | 16-Apr-2007 | 24-Apr-2007 | 8 | 8
2008 Mumbai attacks | 1.0 | 26-Nov-2008 | 01-Dec-2008 | 5 | 6
Israeli-occupied territories | 1.2 | 22-Sep-2005 | 03-Oct-2005 | 11 | 10
International status of Abkhazia and South | 1.3 | 25-Aug-2008 | 04-Sep-2008 | 10 | 8
Air France Flight 447 | 1.4 | 01-Jun-2009 | 08-Jun-2009 | 7 | 6
7 July 2005 London bombings | 1.7 | 10-Jul-2005 | 15-Jul-2005 | 5 | 5
State terrorism and the United States | 1.7 | 15-Feb-2008 | 06-Mar-2008 | 20 | 13
July 2009 Ürümqi riots | 1.9 | 06-Jul-2009 | 21-Jul-2009 | 15 | 9
Henry Louis Gates arrest controversy | 2.0 | 24-Jul-2009 | 09-Aug-2009 | 16 | 9
Teach the Controversy | 2.6 | 11-Apr-2005 | 29-Apr-2005 | 18 | 8

Slowest:
Shakespeare authorship question | 485.6 | 02-Jun-2003 | 24-Jan-2010 | 2428 | 7
Karl Marx | 487.6 | 19-Sep-2004 | 21-Jan-2010 | 1950 | 6
Led Zeppelin | 511.9 | 31-Jan-2003 | 03-Feb-2010 | 2560 | 6
Vampire | 517.1 | 19-Nov-2002 | 18-Jul-2008 | 2068 | 6
World War II casualties | 523.0 | 13-Sep-2004 | 29-Dec-2008 | 1568 | 4
War on Terrorism | 533.1 | 07-Oct-2005 | 22-Feb-2010 | 1599 | 6
Fathers’ rights movement | 546.0 | 07-Mar-2004 | 01-Sep-2008 | 1639 | 4
Instant-runoff voting | 546.3 | 09-Jul-2003 | 03-Jan-2008 | 1639 | 5
Scientific method | 553.5 | 15-Jun-2003 | 08-Jul-2009 | 2215 | 6
France | 566.5 | 13-Nov-2003 | 26-Jan-2010 | 2266 | 6
Harry Potter | 589.9 | 27-Nov-2002 | 02-Oct-2007 | 1770 | 6
Anna Anderson | 604.5 | 17-Mar-2004 | 09-Jul-2007 | 1209 | 3
New York City | 617.3 | 09-Dec-2003 | 03-Jan-2009 | 1852 | 5
Pi | 627.3 | 07-Dec-2002 | 20-Oct-2009 | 2509 | 6
Christopher Columbus | 1159.0 | 24-Oct-2003 | 27-Feb-2010 | 2318 | 5
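The two extremes of the table give a rough sense of the spread in growth rates (a back-of-the-envelope check on the listed values only):

$$\frac{1159.0}{0.5} = 2318, \qquad \log_{10} 2318 \approx 3.37$$

That is, the slowest listed discussion needs roughly 2300 times longer per h-index increment than the fastest, a spread of more than three orders of magnitude.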

Page 23: There is No Deadline - Time Evolution of Wikipedia Discussions

Conclusions and future work

Conclusions
Discussion and edit peaks occur mostly independently of each other.
Both endogenous (Wikipedia-internal) and exogenous (offline world) events can cause such peaks.
We have introduced a simple growth measure.
Some discussions need only a few days to evolve, while the slowest go on for years.

Future work
Use the metrics for early detection of controversies.
Apply the metrics to sub-threads to detect hot spots.
Assess discussion maturity.

Page 24: There is No Deadline - Time Evolution of Wikipedia Discussions

Questions?

Page 25: There is No Deadline - Time Evolution of Wikipedia Discussions

Bibliography

Vicenç Gómez, Andreas Kaltenbrunner & Vicente López. Statistical analysis of the social network and discussion threads in Slashdot. In WWW ’08: Proceeding of the 17th international conference on World Wide Web, pages 645–654, New York, NY, USA, 2008. ACM.

J. E. Hirsch. An index to quantify an individual’s scientific research output. PNAS, vol. 102, no. 46, pages 16569–16572, 2005.

J. Lehmann, B. Gonçalves, J. J. Ramasco & C. Cattuto. Dynamical Classes of Collective Attention in Twitter. In Proc. of WWW, 2012.
