1 School of Information University of Michigan Information diffusion in online communities Lada Adamic ICOS Sept. 29, 2006

Information diffusion in online communities

Jan 03, 2016


Information diffusion in online communities. Lada Adamic ICOS Sept. 29, 2006. Talk outline. discussions in the political blogosphere. online person-to-person product recommendations. The political blogosphere and the 2004 election: Divided they blog. - PowerPoint PPT Presentation
Transcript
Page 1: Information diffusion in online communities

1

School of Information, University of Michigan

Information diffusion in online communities

Lada Adamic

ICOS Sept. 29, 2006

Page 2: Information diffusion in online communities

2

Talk outline

discussions in the political blogosphere

online person-to-person product recommendations

Page 3: Information diffusion in online communities

3

joint work with Natalie Glance @ Nielsen/Buzzmetrics

The political blogosphere and the 2004 election: Divided they blog

Page 4: Information diffusion in online communities

4

Political blogs gaining in importance

Pew Internet & American Life Project Report, January 2005, reports:
- 63 million U.S. citizens use the Internet to stay informed about politics (mid-2004, Pew Internet Study)
- 9% of Internet users read political blogs preceding the 2004 U.S. Presidential Election

2004 Presidential Campaign Firsts
- Candidate blogs: e.g. Dean’s blogforamerica.com
- Successful grassroots campaign conducted via websites & blogs
- Bloggers credentialed as journalists & invited to nominating conventions

Page 5: Information diffusion in online communities

5

Related research on political blogs
- 10 most popular political blogs account for half the blogs read by surveyed journalists (Drezner and Farrell 2004)
- The most popular blogs also receive the majority of citation links (Shirky 2003)
- Citation link structure reveals topical subcommunities: Catholicism, homeschooling, A-list bloggers (Herring et al. 2005)
- Comparison of network neighborhoods of Atrios and Instapundit: no overlap in linking behavior (Welsch 2005)

Research question: Are we witnessing cyberbalkanization of the Internet?

Page 6: Information diffusion in online communities

6

Calling all political blogs
- Collected self-identified liberal and conservative blogs from online directories (eTalkingHead, BlogCatalog, CampaignLine, Blogorama)
- Crawled home page of each blog in February 2005
  - found 30 more well-cited political blogs (manually categorized)
  - biases toward sidebar/blogroll links
- Did not include libertarian, independent or moderate blogs (fewer in number and lesser in popularity)
- Identified: 676 liberal and 659 conservative blogs

Page 7: Information diffusion in online communities

7

The larger political blogosphere

Results
- 91% of links point to a blog of the same persuasion
- Conservative blogs show greater tendency to link
  - 82% of conservative blogs are linked to at least once; 84% link to at least one other blog
  - 67% of liberal blogs are linked to at least once; 74% link to at least one other blog
- Average # of links per blog is similar: 13.6 for liberal; 15.1 for conservative
- Higher proportion of liberal blogs that are not linked to at all

Page 8: Information diffusion in online communities

8

Indegree distributions for political blogs

Page 9: Information diffusion in online communities

9

Different rankings produce similar A-lists

Page 10: Information diffusion in online communities

10

Top 20 liberal blogs

Page 11: Information diffusion in online communities

11

Top 20 conservative blogs

Page 12: Information diffusion in online communities

12

Methodology for detailed study of A-list blogs

Harvested posts for top 20 lists from BlogPulse
- BlogPulse stores individual posts: date, permalink, and content
- Date range: late August 2004 to mid-November 2004
- Collected: 12,470 liberal posts; 10,414 conservative posts

Identifying citation links (weblog post -> blog OR post)
- For each post, extract all links (hrefs)
- Exclude self-links
- Blogroll/sidebar links not included
- 1511 L-L citations; 2110 R-R citations; 247 L-R; 312 R-L

Result: Conservatives had 16% fewer posts but cited each other 40% more often
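The percentages in the result line can be rechecked from the counts on this slide. The sketch below is only an arithmetic check; the dictionary layout and the helper name `pct_change` are illustrative choices, not part of the original study.

```python
# Counts as reported on the slide (A-list blogs, late August to mid-November 2004).
posts = {"L": 12_470, "R": 10_414}
citations = {("L", "L"): 1_511, ("R", "R"): 2_110, ("L", "R"): 247, ("R", "L"): 312}


def pct_change(new, base):
    """Relative change of `new` with respect to `base`, in percent."""
    return 100.0 * (new - base) / base


# How many fewer posts the conservative A-list wrote, relative to the liberal A-list.
fewer_posts = -pct_change(posts["R"], posts["L"])

# How many more within-community citations conservatives made.
more_citations = pct_change(citations[("R", "R")], citations[("L", "L")])

# Share of all A-list citations that cross from one community to the other.
total = sum(citations.values())
cross = citations[("L", "R")] + citations[("R", "L")]
cross_share = 100.0 * cross / total

print(f"conservative posts: {fewer_posts:.1f}% fewer")
print(f"R-R vs L-L citations: {more_citations:.1f}% more")
print(f"cross-community share of citations: {cross_share:.1f}%")
```

With these counts the first two figures round to the 16% and 40% quoted on the slide, and the cross-community share from these four counts alone comes out at about 13%.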

Page 13: Information diffusion in online communities

13

Citations between blogs in their posts (Aug 29th – Nov 15th, 2004)

[Figure: three citation-network panels over the 40 A-list blogs, nodes numbered 1 to 40]

(A) all citations between A-list blogs in the 2 months preceding the 2004 election
(B) citations between A-list blogs with at least 5 citations in both directions
(C) edges further limited to those exceeding 25 combined citations

Liberal blogs: 1 Digby’s Blog, 2 James Walcott, 3 Pandagon, 4 blog.johnkerry.com, 5 Oliver Willis, 6 America Blog, 7 Crooked Timber, 8 Daily Kos, 9 American Prospect, 10 Eschaton, 11 Wonkette, 12 Talk Left, 13 Political Wire, 14 Talking Points Memo, 15 Matthew Yglesias, 16 Washington Monthly, 17 MyDD, 18 Juan Cole, 19 Left Coaster, 20 Bradford DeLong

Conservative blogs: 21 JawaReport, 22 Vodka Pundit, 23 Roger L Simon, 24 Tim Blair, 25 Andrew Sullivan, 26 Instapundit, 27 Blogs for Bush, 28 LittleGreenFootballs, 29 Belmont Club, 30 Captain’s Quarters, 31 Powerline, 32 Hugh Hewitt, 33 INDC Journal, 34 Real Clear Politics, 35 Winds of Change, 36 Allahpundit, 37 Michelle Malkin, 38 Wizbang, 39 Dean’s World, 40 Volokh

only 15% of the citations bridge communities

Page 14: Information diffusion in online communities

14

[Figure: the three citation-network panels shown enlarged, nodes numbered 1 to 40]

Conservative blogs: 21 JawaReport, 22 Vodka Pundit, 23 Roger L Simon, 24 Tim Blair, 25 Andrew Sullivan, 26 Instapundit, 27 Blogs for Bush, 28 LittleGreenFootballs, 29 Belmont Club, 30 Captain’s Quarters, 31 Powerline, 32 Hugh Hewitt, 33 INDC Journal, 34 Real Clear Politics, 35 Winds of Change, 36 Allahpundit, 37 Michelle Malkin, 38 Wizbang, 39 Dean’s World, 40 Volokh

Liberal blogs: 1 Digby’s Blog, 2 James Walcott, 3 Pandagon, 4 blog.johnkerry.com, 5 Oliver Willis, 6 America Blog, 7 Crooked Timber, 8 Daily Kos, 9 American Prospect, 10 Eschaton, 11 Wonkette, 12 Talk Left, 13 Political Wire, 14 Talking Points Memo, 15 Matthew Yglesias, 16 Washington Monthly, 17 MyDD, 18 Juan Cole, 19 Left Coaster, 20 Bradford DeLong

Page 15: Information diffusion in online communities

15

Notable examples of blogs breaking a story

1. Swiftvets.com anti-Kerry video
- Bloggers linked to this in late July, keeping accusations alive
- Kerry responded in late August, bringing mainstream media coverage

2. CBS memos alleging preferential treatment of Pres. Bush during the Vietnam War
- Powerline broke the story on Sep. 9th, launching a flurry of discussion
- Dan Rather apologized later in the month

3. “Was Bush Wired?”
- Salon.com asked the question first on Oct. 8th, echoed by Wonkette & PoliticalWire.com
- MSM follows up the next day

Page 16: Information diffusion in online communities

16

Liberals and conservatives differ in the topics they discuss

[Figure: Discussion of “forged documents”. x-axis: Date, weekly ticks from 8/29/2004 to 11/7/2004; y-axis: # weblog posts (0 to 35); two series: Right and Left.]

Page 17: Information diffusion in online communities

17

Political blogs as echo chambers

Pairwise comparison of URLs and phrases posted by each blog

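Each blog A is represented by a vector of weights, one per URL:

$v_A = (w_{U_1}, w_{U_2}, \dots, w_{U_N})$

tf*idf weight:

$w_{U} \sim (\text{number of times the blog mentions URL } U) \times \log\left[\frac{\text{total number of blogs monitored by BlogPulse}}{\text{number of those blogs citing } U}\right]$

Similarity of two blogs is given by the cosine of their vectors:

$\cos(A,B) = \frac{v_A \cdot v_B}{\lVert v_A \rVert \, \lVert v_B \rVert}$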

Similarity in URLs was higher between blogs of the same persuasion (0.08 for liberal blogs and 0.09 for conservative ones) than between mixed pairs (0.03)

Same trend for phrases. We can even invert the analysis, and see what phrases are similar…
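The pairwise comparison described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the toy URL counts, the names `tfidf_vector` and `cosine`, and the choice of the natural logarithm are all assumptions (the slide does not state the base of the log).

```python
import math
from collections import Counter


def tfidf_vector(url_counts, n_blogs, blogs_citing):
    """Weight each URL by (mentions by this blog) * log(N / number of blogs citing it)."""
    return {
        url: count * math.log(n_blogs / blogs_citing[url])
        for url, count in url_counts.items()
    }


def cosine(va, vb):
    """Cosine similarity of two sparse vectors stored as dicts (URL -> weight)."""
    dot = sum(w * vb.get(url, 0.0) for url, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical data: how often each blog mentions each URL.
blog_a = Counter({"u1": 3, "u2": 1})
blog_b = Counter({"u1": 1, "u3": 2})
blog_c = Counter({"u3": 4})

# Hypothetical corpus statistics: total blogs monitored, and blogs citing each URL.
n_blogs = 100
blogs_citing = {"u1": 10, "u2": 2, "u3": 5}

va = tfidf_vector(blog_a, n_blogs, blogs_citing)
vb = tfidf_vector(blog_b, n_blogs, blogs_citing)
vc = tfidf_vector(blog_c, n_blogs, blogs_citing)

sim_ab = cosine(va, vb)  # blogs A and B share one URL
sim_ac = cosine(va, vc)  # blogs A and C share no URLs
print(sim_ab, sim_ac)
```

Blogs with no URLs in common score exactly zero, and rarely cited URLs contribute more to the similarity than URLs that many blogs cite. Averaging such scores over same-persuasion pairs and over mixed pairs would give numbers comparable in kind to those on the slide.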

Page 18: Information diffusion in online communities

18

Network of phrases found on the same blogs

Page 19: Information diffusion in online communities

19

Political figures being discussed

59% of the mentions of Kerry are by right-leaning blogs
53% of the mentions of Bush are by left-leaning blogs

Page 20: Information diffusion in online communities

20

Mainstream media bias (links from 1,400 blog set)

Page 21: Information diffusion in online communities

21

Mainstream media cited about once every other post from the A-list bloggers

(6,762 times from the left, 6,364 from the right)

Page 22: Information diffusion in online communities

22

Insights from the political blogosphere

Liberal and conservative blogs are balanced in numbers and tend to link primarily to their own communities

But is 10% cross-linking really too little? Or is it a sign of significant discussion?

Conservative blogs are more likely to include links to other blogs on their pages, and their A-list blogs reference one another more frequently

Liberal and conservative blogs tend to discuss different things, but one is not more ‘coherent’ than the other

Page 23: Information diffusion in online communities

23

Trying to bridge the divide
Opposition to the bankruptcy bill (March 2005)

[Figure: network of posts and pages. Legend: conservative blog post; liberal blog post; uncategorized blog post; news article; government website; link between posts/pages; posts/pages belonging to same blog/site]

but the bill was passed nevertheless: Senate 74-25, House 302-126

Page 24: Information diffusion in online communities

24

School of Information, University of Michigan

The dynamics of viral marketing

Jure Leskovec, Carnegie Mellon University

Lada Adamic, University of Michigan

Bernardo Huberman, HP Labs

Page 25: Information diffusion in online communities

25

Using online networks for viral marketing

Burger King’s subservient chicken

Page 26: Information diffusion in online communities

26

The dynamics of Viral Marketing

Outline
- prior work on viral marketing & information diffusion
- incentivised viral marketing program
- cascades and stars
- network effects
- product and social network characteristics

Page 27: Information diffusion in online communities

27

Information diffusion

Studies of innovation adoption
- hybrid corn (Ryan and Gross, 1943)
- prescription drugs (Coleman et al. 1957)

Models (very many)
- Rogers, ‘Diffusion of Innovations’
- Watts, Information cascades (2003)
- Kempe, Kleinberg, Tardos, Maximizing the spread of influence (2005)

Page 28: Information diffusion in online communities

28

Motivation for viral marketing

viral marketing successfully utilizes social networks for adoption of some services
- hotmail gains 18 million users in 12 months, spending only $50,000 on traditional advertising
- gmail rapidly gains users although referrals are the only way to sign up

customers becoming less susceptible to mass marketing

mass marketing impractical for unprecedented variety of products online

Page 29: Information diffusion in online communities

29

The web savvy consumer and personalized recommendations

> 50% of people do research online before purchasing electronics

personalized recommendations based on prior purchase patterns and ratings
- Amazon, “people who bought x also bought y”
- MovieLens, “based on ratings of users like you…”

Is there still room for viral marketing?

Page 30: Information diffusion in online communities

30

Is there still room for viral marketing next to personalized recommendations?

We are more influenced by our friends than strangers

68% of consumers consult friends and family before purchasing home electronics (Burke 2003)

Page 31: Information diffusion in online communities

31

Incentivised viral marketing (our problem setting)

Senders and followers of recommendations receive discounts on products: 10% credit for the sender, 10% off for the follower

Recommendations are made to any number of people at the time of purchase

Only the recipient who buys first gets a discount

Page 32: Information diffusion in online communities

32

Product recommendation network

[Figure legend: purchase following a recommendation; customer recommending a product; customer not buying a recommended product]

Page 33: Information diffusion in online communities

33

the data

large anonymous online retailer (June 2001 to May 2003)

- 15,646,121 recommendations
- 3,943,084 distinct customers
- 548,523 products recommended
- Products belonging to 4 product groups: books, DVDs, music, VHS

Page 34: Information diffusion in online communities

34

summary statistics by product group

Group | products | customers | recommendations | edges | buy + get discount | buy + no discount
Book | 103,161 | 2,863,977 | 5,741,611 | 2,097,809 | 65,344 | 17,769
DVD | 19,829 | 805,285 | 8,180,393 | 962,341 | 17,232 | 58,189
Music | 393,598 | 794,148 | 1,443,847 | 585,738 | 7,837 | 2,739
Video | 26,131 | 239,583 | 280,270 | 160,683 | 909 | 467
Full | 542,719 | 3,943,084 | 15,646,121 | 3,153,676 | 91,322 | 79,164

Annotations on the slide: “high”, “low”, “people / recommendations”

Page 35: Information diffusion in online communities

35

viral marketing program not spreading virally

94% of users make first recommendation without having received one previously

size of giant connected component increases from 1% to 2.5% of the network (100,420 users) – small!

some sub-communities are better connected
- 24% out of 18,000 users for westerns on DVD
- 26% of 25,000 for classics on DVD
- 19% of 47,000 for anime (Japanese animated film) on DVD

others are just as disconnected
- 3% of 180,000 home and gardening
- 2-7% for children’s and fitness DVDs
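The percentages on this slide are sizes of the largest connected component relative to the number of users. The slide does not say how the components were computed; the sketch below assumes recommendation edges are treated as undirected and uses a union-find structure. The function name `largest_component_fraction` and the toy edge list are hypothetical.

```python
from collections import Counter


def largest_component_fraction(n_nodes, edges):
    """Fraction of nodes in the largest connected component (edges taken as undirected)."""
    if n_nodes == 0:
        return 0.0
    parent = list(range(n_nodes))

    def find(x):
        # Follow parent pointers to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = Counter(find(node) for node in range(n_nodes))
    return max(sizes.values()) / n_nodes


# Toy network: 10 customers, most of whom never exchange a recommendation.
toy_edges = [(0, 1), (1, 2), (3, 4)]
fraction = largest_component_fraction(10, toy_edges)
print(f"largest component covers {fraction:.0%} of the toy network")
```

In the toy network the largest component is {0, 1, 2}, so it covers three of the ten nodes; a sparse, fragmented network like this is the situation the slide describes for the retailer as a whole.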

Page 36: Information diffusion in online communities

36

medical study guide recommendation network

973

938

Page 37: Information diffusion in online communities

37

measuring cascade sizes

- delete late recommendations
- count how many people are in a single cascade
- exclude nodes that did not buy

steep drop-off

very few large cascades

[Figure: books. Log-log plot of cascade sizes with fitted line $1.8 \times 10^{6}\, x^{-4.98}$]

Page 38: Information diffusion in online communities

38

cascades for DVDs

shallow drop off – fat tail

a number of large cascades

DVD cascades can grow large, possibly as a result of websites where people sign up to exchange recommendations

[Figure: DVDs. Log-log plot of cascade sizes with fitted line $\sim x^{-1.56}$]

Page 39: Information diffusion in online communities

39

simple model of propagating recommendations (ignoring for the moment the specific mechanics of the recommendation program of the retailer)

Each individual will have $p_t$ successful recommendations. We model $p_t$ as a random variable.

At time $t+1$, the total number of people in the cascade is

$N_{t+1} = N_t (1 + p_t)$

Subtracting $N_t$ from both sides and dividing by $N_t$, we have

$\frac{N_{t+1} - N_t}{N_t} = p_t$

Page 40: Information diffusion in online communities

40

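simple model of propagating recommendations (continued)

Summing over long time periods:

$\sum_{t=0}^{T-1} \frac{N_{t+1} - N_t}{N_t} = \sum_{t=0}^{T-1} p_t$

The right-hand side is a sum of random variables and hence approximately normally distributed.

Integrating both sides (approximating the left-hand sum by an integral), we find that N is lognormally distributed:

$\int_{N_0}^{N_T} \frac{dN}{N} = \ln \frac{N_T}{N_0} = \sum_{t=0}^{T-1} p_t$

If the variance is large, the lognormal resembles a power law.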
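The argument can be checked numerically with a small simulation. This is a sketch under stated assumptions, not the authors' model fit: $p_t$ is drawn uniformly from $[0, p_{max}]$, and the step count, sample size, seed and the name `simulate_cascade` are arbitrary illustrative choices. The simulation multiplies exactly, so $\ln N_T$ is a sum of independent terms $\ln(1 + p_t)$, which is the discrete counterpart of the integral step above.

```python
import math
import random
import statistics


def simulate_cascade(steps, rng, n0=1.0, p_max=0.2):
    """Grow N by N_{t+1} = N_t * (1 + p_t), with p_t uniform on [0, p_max] (an assumption)."""
    n = n0
    for _ in range(steps):
        n *= 1.0 + rng.uniform(0.0, p_max)
    return n


rng = random.Random(0)  # fixed seed so the run is repeatable
sizes = [simulate_cascade(200, rng) for _ in range(2000)]
log_sizes = [math.log(n) for n in sizes]

# If ln N is roughly normal, its mean and median should nearly coincide,
# while N itself should be right-skewed (mean well above median).
mean_log = statistics.mean(log_sizes)
median_log = statistics.median(log_sizes)
mean_size = statistics.mean(sizes)
median_size = statistics.median(sizes)

print(f"ln N: mean={mean_log:.2f}, median={median_log:.2f}")
print(f"N: mean/median ratio = {mean_size / median_size:.2f}")
```

A near-symmetric distribution of ln N together with a strongly right-skewed distribution of N is what a lognormal looks like; widening the spread of $p_t$ or adding steps stretches the tail further.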

Page 41: Information diffusion in online communities

41

participation level by individual

[Figure: log-log plot. x-axis: Number of recommendations; y-axis: Count; fitted line $3.4 \times 10^{6}\, x^{-2.30}$, $R^2 = 0.96$]

very high variance

The most active customer made 83,729 recommendations and purchased 4,416 different items!
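A power-law fit with an $R^2$ value, like the one quoted for this plot, is commonly obtained by fitting a straight line to the data on log-log axes. The slide does not say how the fit was made, so the sketch below is only one plausible approach; `fit_power_law` is a hypothetical helper and the data points are synthetic, generated from the quoted curve rather than taken from the retailer's data.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares line through (ln x, ln y); returns (prefactor, exponent, r_squared)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return math.exp(intercept), slope, r_squared


# Synthetic points lying exactly on the curve quoted on the slide.
xs = [1, 2, 4, 8, 16, 32]
ys = [3.4e6 * x ** -2.30 for x in xs]

prefactor, exponent, r_squared = fit_power_law(xs, ys)
print(f"count ~ {prefactor:.3g} * x^{exponent:.2f} (R^2 = {r_squared:.2f})")
```

Because the synthetic points lie exactly on the curve, the fit recovers the prefactor and exponent and $R^2$ is 1; real, noisy counts would give a lower value such as the 0.96 reported.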

Page 42: Information diffusion in online communities

42

Network effects

Page 43: Information diffusion in online communities

43

does receiving more recommendations increase the likelihood of buying?

[Figure: two panels, BOOKS and DVDs. x-axis: Incoming Recommendations; y-axis: Probability of Buying]

Page 44: Information diffusion in online communities

44

does sending more recommendations influence more purchases?

[Figure: two panels, BOOKS and DVDs. x-axis: Outgoing Recommendations; y-axis: Number of Purchases]

Page 45: Information diffusion in online communities

45

the probability that the sender gets a credit with increasing numbers of recommendations

- consider whether sender has at least one successful recommendation
- controls for sender getting credit for purchase that resulted from others recommending the same product to the same person

[Figure: x-axis: Outgoing Recommendations; y-axis: Probability of Credit]

probability of receiving a credit levels off for DVDs

Page 46: Information diffusion in online communities

46

Multiple recommendations between two individuals weaken the impact of the bond on purchases

[Figure: two panels, BOOKS and DVDs. x-axis: Exchanged recommendations; y-axis: Probability of buying]

Page 47: Information diffusion in online communities

47

product and social network characteristics influencing recommendation effectiveness

Page 48: Information diffusion in online communities

48

recommendation success by book category

consider successful recommendations in terms of
- av. # senders of recommendations per book category
- av. # of recommendations accepted

books overall have a 3% success rate (2% with discount, 1% without)

lower than average success rate (significant at p=0.01 level)
- fiction: romance (1.78), horror (1.81), teen (1.94), children’s books (2.06), comics (2.30), sci-fi (2.34), mystery and thrillers (2.40)
- nonfiction: sports (2.26), home & garden (2.26), travel (2.39)

higher than average success rate (statistically significant)
- professional & technical: medicine (5.68), professional & technical (4.54), engineering (4.10), science (3.90), computers & internet (3.61), law (3.66), business & investing (3.62)

Page 49: Information diffusion in online communities

49

professional and organized contexts

In general, professional & technical book recommendations are more often accepted (probably in part due to book cost)

Some organized contexts other than professional also have higher success rate, e.g. religion
- overall success rate 3.13%
- Christian themed books: Christian living and theology (4.7%), Bibles (4.8%)
- not-as-organized religion: new age (2.5%), occult spirituality (2.2%)

Well organized hobbies
- books on orchids recommended successfully twice as often as books on tomato growing

Page 50: Information diffusion in online communities

50

examples of communities

use a community finding algorithm to identify groups of people sending recommendations to one another

# nodes | # senders | topics
735 | 74 | books: American literature, poetry
710 | 179 | sci-fi books, TV series DVDs, alternative rock music
667 | 181 | music: dance, indie
653 | 121 | discounted DVDs
541 | 112 | books: art & photography, web development, graphical design, sci-fi
502 | 104 | books: sci-fi and other
388 | 77 | books: Christianity and Catholicism
309 | 81 | books: business and investing, computers, Harry Potter
192 | 30 | books: parenting, women’s health, pregnancy
163 | 48 | books: comparative religion, Egypt’s history, new age, role playing games

Page 51: Information diffusion in online communities

51

Page 52: Information diffusion in online communities

52

telling the world vs. telling your friends

consider
- # reviewers per book
- # recommenders per book

what we observe (ratio of recommendations to reviews given in parentheses)
- tell the world but not your friends: literature & fiction (0.57), mystery & thrillers (0.36), horror (0.44)
- tell the world and your friends: biographies (0.90), children’s books (1.12), religion (1.73), history (1.27), nonfiction (1.89)
- tell just your friends about personal pursuits: health, mind & body (2.39), home & garden (3.48), arts & photography (3.85), cooking, food & wine (3.49)
- tell your colleagues about professional interests: professional & technical (3.22), computers & internet (3.10), medicine (4.19), engineering (3.85), law (4.25)

Page 53: Information diffusion in online communities

53

anime DVDs

47,000 customers are responsible for 2.5 million of the 16 million recommendations in the system

29% success rate per recommender of an anime DVD

giant component covers 19% of the nodes

Overall, recommendations for DVDs are more likely to result in a purchase (7%), but the anime community stands out

Page 54: Information diffusion in online communities

54

regressing on product characteristics

Variable | transformation | Coefficient
const | | -0.940 ***
# recommendations | ln(r) | 0.426 ***
# senders | ln(ns) | -0.782 ***
# recipients | ln(nr) | -1.307 ***
product price | ln(p) | 0.128 ***
# reviews | ln(v) | -0.011 ***
avg. rating | ln(t) | -0.027 *
R2 | | 0.74

significance at the 0.01 (***), 0.05 (**) and 0.1 (*) levels

small tightly knit communities purchasing expensive products

Page 55: Information diffusion in online communities

55

Marketing in the long tail

Chris Anderson, ‘The Long Tail’, Wired, Issue 12.10 - October 2004

Page 56: Information diffusion in online communities

56

The long tail of successful recommendations

20% of the products account for 50% of the purchases

67 % of the products have only a single successful recommendation (30% of all purchases)

Page 57: Information diffusion in online communities

57

Conclusions

Overall
- incentivized viral marketing contributes marginally to total sales
- occasionally large cascades occur

Observations for future diffusion models
- purchase decision more complex than threshold or simple infection
- influence saturates as the number of contacts expands
- links lose effectiveness if they are overused

Conditions for successful recommendations
- professional and organizational contexts
- discounts on expensive items
- small, tightly knit communities

Page 58: Information diffusion in online communities

58

Overall observations

Interests and social connections co-evolve
- What we are exposed to depends on the active topics in our community
- Who we communicate with depends on our interests

Page 59: Information diffusion in online communities

59

Thanks!

http://www-personal.umich.edu/~ladamic