1 School of Information University of Michigan Information diffusion in online communities Lada Adamic ICOS Sept. 29, 2006
1
School of Information, University of Michigan
Information diffusion in online communities
Lada Adamic
ICOS Sept. 29, 2006
2
Talk outline
discussions in the political blogosphere
online person-to-person product recommendations
3
joint work with Natalie Glance @ Nielsen/Buzzmetrics
The political blogosphere and the 2004 election: Divided they blog
4
Political blogs gaining in importance
Pew Internet & American Life Project Report, January 2005, reports:
  63 million U.S. citizens use the Internet to stay informed about politics (mid-2004, Pew Internet Study)
  9% of Internet users read political blogs preceding the 2004 U.S. Presidential Election
2004 Presidential Campaign Firsts
  Candidate blogs: e.g. Dean’s blogforamerica.com
  Successful grassroots campaign conducted via websites & blogs
  Bloggers credentialed as journalists & invited to nominating conventions
5
Related research on political blogs
  10 most popular political blogs account for half the blogs read by surveyed journalists (Drezner and Farrell 2004)
  The most popular blogs also receive the majority of citation links (Shirky 2003)
  Citation link structure reveals topical subcommunities: Catholicism, homeschooling, A-list bloggers (Herring et al. 2005)
  Comparison of network neighborhoods of Atrios and Instapundit: no overlap in linking behavior (Welsch 2005)
Research question: Are we witnessing cyberbalkanization of the Internet?
6
Calling all political blogs
Collected self-identified liberal and conservative blogs from online directories (eTalkingHead, BlogCatalog, CampaignLine, Blogorama)
Crawled home page of each blog in February 2005
  found 30 more well-cited political blogs (manually categorized)
  biases toward sidebar/blogroll links
Did not include libertarian, independent or moderate blogs (fewer in number and lesser in popularity)
Identified: 676 liberal and 659 conservative blogs
7
The larger political blogosphere
Results
  91% of links point to a blog of the same persuasion
  Conservative blogs show a greater tendency to link
    82% of conservative blogs are linked to at least once; 84% link to at least one other blog
    67% of liberal blogs are linked to at least once; 74% link to at least one other blog
  Average # of links per blog is similar: 13.6 for liberal; 15.1 for conservative
  Higher proportion of liberal blogs that are not linked to at all
8
Indegree distributions for political blogs
9
Different rankings produce similar A-lists
10
Top 20 liberal blogs
11
Top 20 conservative blogs
12
Methodology for detailed study of A-list blogs
Harvested posts for top 20 lists from BlogPulse
  BlogPulse stores individual posts: date, permalink, and content
  Date range: late August 2004 to mid-November 2004
  Collected: 12,470 liberal posts; 10,414 conservative posts
Identifying citation links (weblog post -> blog OR post)
  For each post, extract all links (hrefs)
  Exclude self-links
  Blogroll/sidebar links not included
  1511 L-L citations; 2110 R-R citations; 247 L-R; 312 R-L
Result: Conservatives had 16% fewer posts but cited each other 40% more often
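The two percentages in the result line can be re-derived from the post and citation counts on this slide. The sketch below is only an arithmetic check; the variable and function names are illustrative and not from the original study. It also computes the share of citations that cross communities from these four counts, which comes out near 13%, close to but not identical to the 15% quoted for the citation network (that figure may rest on a slightly different count).

```python
# Counts copied from the slide; names are illustrative only.
liberal_posts, conservative_posts = 12_470, 10_414
citations = {"L-L": 1511, "R-R": 2110, "L-R": 247, "R-L": 312}


def relative_difference(a, b):
    """Return (a - b) / b, i.e. how much larger a is than b, as a fraction of b."""
    return (a - b) / b


# Conservatives posted less: express the gap relative to the liberal count.
fewer_posts = -relative_difference(conservative_posts, liberal_posts)

# Within-community citations: right-to-right relative to left-to-left.
more_citations = relative_difference(citations["R-R"], citations["L-L"])

# Share of all post-level citations that cross the left/right divide.
cross_share = (citations["L-R"] + citations["R-L"]) / sum(citations.values())

print(f"conservative posts: {fewer_posts:.1%} fewer")        # 16.5% fewer
print(f"R-R vs L-L citations: {more_citations:.1%} more")    # 39.6% more
print(f"cross-community citations: {cross_share:.1%}")       # 13.4%
```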
13
Citations between blogs in their posts (Aug 29th – Nov 15th, 2004)
[Figure: citation network among the 40 A-list blogs, shown in three panels]
(A) all citations between A-list blogs in the 2 months preceding the 2004 election
(B) citations between A-list blogs with at least 5 citations in both directions
(C) edges further limited to those exceeding 25 combined citations
Nodes 1-20 (liberal): 1 Digby’s Blog, 2 James Walcott, 3 Pandagon, 4 blog.johnkerry.com, 5 Oliver Willis, 6 America Blog, 7 Crooked Timber, 8 Daily Kos, 9 American Prospect, 10 Eschaton, 11 Wonkette, 12 Talk Left, 13 Political Wire, 14 Talking Points Memo, 15 Matthew Yglesias, 16 Washington Monthly, 17 MyDD, 18 Juan Cole, 19 Left Coaster, 20 Bradford DeLong
Nodes 21-40 (conservative): 21 JawaReport, 22 Vodka Pundit, 23 Roger L Simon, 24 Tim Blair, 25 Andrew Sullivan, 26 Instapundit, 27 Blogs for Bush, 28 LittleGreenFootballs, 29 Belmont Club, 30 Captain’s Quarters, 31 Powerline, 32 Hugh Hewitt, 33 INDC Journal, 34 Real Clear Politics, 35 Winds of Change, 36 Allahpundit, 37 Michelle Malkin, 38 Wizbang, 39 Dean’s World, 40 Volokh
only 15% of the citations bridge communities
14
[Figure: three-panel citation network among the 40 A-list blogs; nodes are numbered 1-40]
Nodes 1-20: 1 Digby’s Blog, 2 James Walcott, 3 Pandagon, 4 blog.johnkerry.com, 5 Oliver Willis, 6 America Blog, 7 Crooked Timber, 8 Daily Kos, 9 American Prospect, 10 Eschaton, 11 Wonkette, 12 Talk Left, 13 Political Wire, 14 Talking Points Memo, 15 Matthew Yglesias, 16 Washington Monthly, 17 MyDD, 18 Juan Cole, 19 Left Coaster, 20 Bradford DeLong
Nodes 21-40: 21 JawaReport, 22 Vodka Pundit, 23 Roger L Simon, 24 Tim Blair, 25 Andrew Sullivan, 26 Instapundit, 27 Blogs for Bush, 28 LittleGreenFootballs, 29 Belmont Club, 30 Captain’s Quarters, 31 Powerline, 32 Hugh Hewitt, 33 INDC Journal, 34 Real Clear Politics, 35 Winds of Change, 36 Allahpundit, 37 Michelle Malkin, 38 Wizbang, 39 Dean’s World, 40 Volokh
15
Notable examples of blogs breaking a story
1. Swiftvets.com anti-Kerry video
   Bloggers linked to this in late July, keeping accusations alive
   Kerry responded in late August, bringing mainstream media coverage
2. CBS memos alleging preferential treatment of Pres. Bush during the Vietnam War
   Powerline broke the story on Sep. 9th, launching a flurry of discussion
   Dan Rather apologized later in the month
3. “Was Bush Wired?”
   Salon.com asked the question first on Oct. 8th, echoed by Wonkette & PoliticalWire.com
   MSM follows up the next day
16
Discussion of “forged documents”
Liberals and conservatives differ in the topics they discuss
[Figure: # weblog posts (y-axis, 0 to 35) by Date (x-axis, weekly ticks from 8/29/2004 to 11/7/2004); two series: Right, Left]
17
Political blogs as echo chambers
Pairwise comparison of URLs and phrases posted by each blog
$v_A = (w_{U_1}, w_{U_2}, \ldots, w_{U_N})$
tf*idf weight: $w_U \approx$ (number of times the blog mentions URL $U$) $\times \log$[(total number of blogs monitored by BlogPulse) / (number of those blogs citing $U$)]
Similarity of two blogs is given by the cosine of their vectors:
$\cos(A, B) = v_A \cdot v_B / (\|v_A\| \, \|v_B\|)$
Similarity in URLs between blogs of the same persuasion was higher (0.08 for liberal blogs and 0.09 for conservative ones), than between mixed pairs (0.03)
Same trend for phrases. We can even invert the analysis, and see what phrases are similar…
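The pairwise comparison described on this slide can be sketched in a few lines. The code below is a minimal illustration of tf*idf weighting followed by cosine similarity; the function names, the toy URL counts and the total of 100 monitored blogs are made up for the example and are not data from the study.

```python
import math


def tfidf_vector(url_counts, blogs_citing, total_blogs):
    """Weight each URL by (times mentioned) * log(total blogs / blogs citing it)."""
    return {
        url: count * math.log(total_blogs / blogs_citing[url])
        for url, count in url_counts.items()
    }


def cosine(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * vec_b.get(url, 0.0) for url, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a blog with no weighted URLs is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy data: how many monitored blogs cite each URL.
TOTAL_BLOGS = 100
blogs_citing = {"u1": 10, "u2": 50, "u3": 5, "u4": 20}

blog_a = tfidf_vector({"u1": 2, "u2": 1}, blogs_citing, TOTAL_BLOGS)
blog_b = tfidf_vector({"u1": 1, "u3": 3}, blogs_citing, TOTAL_BLOGS)
blog_c = tfidf_vector({"u4": 4}, blogs_citing, TOTAL_BLOGS)

print(cosine(blog_a, blog_b))  # shares u1 only: strictly between 0 and 1
print(cosine(blog_a, blog_c))  # no shared URLs: 0.0
```

Averaging this score over same-persuasion pairs and over mixed pairs would give numbers comparable in kind to the 0.08, 0.09 and 0.03 reported above.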
18
Network of phrases found on the same blogs
19
Political figures being discussed
59% of the mentions of Kerry are by right-leaning blogs
53% of the mentions of Bush are by left-leaning blogs
20
Mainstream media bias (links from 1,400 blog set)
21
Mainstream media cited about once every other post from the A-list bloggers
(6,762 times from the left, 6,364 from the right)
22
Insights from the political blogosphere
Liberal and conservative blogs are balanced in numbers and tend to link primarily to their own communities
But is 10% cross-linking really too little? Or is it a sign of significant discussion?
Conservative blogs are more likely to include links to other blogs on their pages, and their A-list blogs reference one another more frequently
Liberal and conservative blogs tend to discuss different things, but one is not more ‘coherent’ than the other
23
Trying to bridge the divide
Opposition to the bankruptcy bill (March 2005)
[Figure: network of posts and pages discussing the bill. Node types: conservative blog post, liberal blog post, uncategorized blog post, news article, government website. Edge types: link between posts/pages; posts/pages belonging to same blog/site]
but the bill was passed nevertheless: Senate 74-25, House 302-126
24
School of Information, University of Michigan
The dynamics of viral marketing
Jure Leskovec, Carnegie Mellon University
Lada Adamic, University of Michigan
Bernardo Huberman, HP Labs
25
Using online networks for viral marketing
Burger King’s subservient chicken
26
The dynamics of Viral Marketing
Outline
  prior work on viral marketing & information diffusion
  incentivised viral marketing program
  cascades and stars
  network effects
  product and social network characteristics
27
Information diffusion
Studies of innovation adoption
  hybrid corn (Ryan and Gross, 1943)
  prescription drugs (Coleman et al. 1957)
Models (very many)
  Rogers, ‘Diffusion of Innovations’
  Watts, Information cascades (2003)
  Kempe, Kleinberg, Tardos, Maximizing the spread of Influence (2005)
28
Motivation for viral marketing
viral marketing successfully utilizes social networks for adoption of some services
  hotmail gains 18 million users in 12 months, spending only $50,000 on traditional advertising
  gmail rapidly gains users although referrals are the only way to sign up
customers becoming less susceptible to mass marketing
mass marketing impractical for unprecedented variety of products online
29
The web savvy consumer and personalized recommendations
> 50% of people do research online before purchasing electronics
personalized recommendations based on prior purchase patterns and ratings
  Amazon, “people who bought x also bought y”
  MovieLens, “based on ratings of users like you…”
Is there still room for viral marketing?
30
Is there still room for viral marketing next to personalized recommendations?
We are more influenced by our friends than strangers
68% of consumers consult friends and family before purchasing home electronics (Burke 2003)
31
Incentivised viral marketing (our problem setting)
Senders and followers of recommendations receive discounts on products
sender: 10% credit; recipient who follows the recommendation: 10% off
Recommendations are made to any number of people at the time of purchase
Only the recipient who buys first gets a discount
32
Product recommendation network
[Figure legend: purchase following a recommendation; customer recommending a product; customer not buying a recommended product]
33
the data
large anonymous online retailer (June 2001 to May 2003)
15,646,121 recommendations
3,943,084 distinct customers
548,523 products recommended
Products belonging to 4 product groups: books, DVDs, music, VHS
34
summary statistics by product group
Group   products   customers   recommendations   edges       buy + get discount   buy + no discount
Book    103,161    2,863,977   5,741,611         2,097,809   65,344               17,769
DVD     19,829     805,285     8,180,393         962,341     17,232               58,189
Music   393,598    794,148     1,443,847         585,738     7,837                2,739
Video   26,131     239,583     280,270           160,683     909                  467
Full    542,719    3,943,084   15,646,121        3,153,676   91,322               79,164
35
viral marketing program not spreading virally
94% of users make first recommendation without having received one previously
size of giant connected component increases from 1% to 2.5% of the network (100,420 users) – small!
some sub-communities are better connected
  24% out of 18,000 users for westerns on DVD
  26% of 25,000 for classics on DVD
  19% of 47,000 for anime (Japanese animated film) on DVD
others are just as disconnected
  3% of 180,000 home and gardening
  2-7% for children’s and fitness DVDs
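The percentages on this slide are fractions of users that fall inside the largest connected component of a recommendation network. A minimal sketch of that measurement, using a union-find structure over undirected edges, is given below. The function name and the toy edge list are illustrative assumptions; users who never appear in any edge are not counted here, which may differ from how the original study defined the network.

```python
from collections import Counter


def largest_component_fraction(edges):
    """Fraction of nodes (those appearing in `edges`) in the largest connected component."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    if not parent:
        return 0.0
    component_sizes = Counter(find(node) for node in list(parent))
    return max(component_sizes.values()) / len(parent)


# Toy example: sender/recipient pairs treated as undirected edges.
toy_edges = [(1, 2), (2, 3), (4, 5)]
print(largest_component_fraction(toy_edges))  # 0.6: three of five users are connected
```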
36
medical study guide recommendation network
[Figure: recommendation network for the medical study guide; node labels 973 and 938]
37
measuring cascade sizes
delete late recommendations
count how many people are in a single cascade
exclude nodes that did not buy
steep drop-off
very few large cascades
[Figure: cascade size distribution for books, log-log axes; fitted line = 1.8e6 x^-4.98]
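The three steps listed on this slide can be sketched as follows. This is only one plausible reading: it assumes that a "late" recommendation is one that arrives after its recipient has already bought the product, and that a cascade is a connected group of buyers linked by the remaining recommendations. The function name, the data layout and the toy records are hypothetical.

```python
from collections import defaultdict, deque


def cascade_sizes(recommendations, purchase_time):
    """Return cascade sizes (largest first) for one product.

    recommendations: iterable of (sender, recipient, time) tuples.
    purchase_time:   dict mapping each buyer to the time of purchase.
    """
    neighbors = defaultdict(set)
    for sender, recipient, when in recommendations:
        # Exclude nodes that did not buy.
        if sender not in purchase_time or recipient not in purchase_time:
            continue
        # Assumed meaning of "late": the recipient had already bought.
        if when > purchase_time[recipient]:
            continue
        neighbors[sender].add(recipient)
        neighbors[recipient].add(sender)

    seen, sizes = set(), []
    for start in purchase_time:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first walk over one cascade
            node = queue.popleft()
            size += 1
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        sizes.append(size)
    return sorted(sizes, reverse=True)


toy_recs = [("a", "b", 1), ("b", "c", 2), ("c", "d", 9), ("e", "f", 1)]
toy_purchases = {"a": 0, "b": 2, "c": 3, "d": 5, "e": 1}
print(cascade_sizes(toy_recs, toy_purchases))  # [3, 1, 1]
```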
38
cascades for DVDs
shallow drop off – fat tail
a number of large cascades
DVD cascades can grow large, possibly as a result of websites where people sign up to exchange recommendations
[Figure: cascade size distribution for DVDs, log-log axes; fitted line ~ x^-1.56]
39
simple model of propagating recommendations
(ignoring for the moment the specific mechanics of the retailer's recommendation program)
Each individual will have $p_t$ successful recommendations. We model $p_t$ as a random variable.
At time $t+1$, the total number of people in the cascade is
$N_{t+1} = N_t (1 + p_t)$
Subtracting $N_t$ from both sides and dividing by $N_t$, we have
$(N_{t+1} - N_t) / N_t = p_t$
40
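simple model of propagating recommendations (continued)
Summing over long time periods:
$\sum_t (N_{t+1} - N_t) / N_t = \sum_t p_t$
The right-hand side is a sum of random variables and hence, by the central limit theorem, approximately normally distributed.
Integrating both sides (approximating the left-hand sum by $\int dN/N = \ln N$), we find that $N$ is lognormally distributed.
If the variance is large, the lognormal resembles a power law.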
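A small simulation can make the lognormal claim concrete. The sketch below grows many independent cascades by the multiplicative rule from the model and checks that the logarithm of the final size is roughly symmetric while the raw size is right-skewed. The uniform distribution for the per-step growth rate, the number of steps and all function names are assumptions chosen for illustration; the slides do not specify them.

```python
import math
import random
import statistics


def simulate_cascade_sizes(n_cascades=5000, steps=50, p_max=0.2, seed=0):
    """Grow each cascade by N <- N * (1 + p), with p drawn uniformly from [0, p_max]."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_cascades):
        n = 1.0
        for _ in range(steps):
            n *= 1.0 + rng.uniform(0.0, p_max)
        sizes.append(n)
    return sizes


def skewness(values):
    """Population skewness: mean of the cubed standardized values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


sizes = simulate_cascade_sizes()
log_sizes = [math.log(s) for s in sizes]

# ln(N) is a sum of many small random terms, so it should look close to normal,
# while N itself should have a long right tail.
print("skewness of ln(N):", round(skewness(log_sizes), 3))
print("skewness of N:    ", round(skewness(sizes), 3))
print("mean > median:    ", statistics.fmean(sizes) > statistics.median(sizes))
```

Raising `p_max` or `steps` widens the distribution of ln(N), and the tail of N then looks increasingly like a straight line on log-log axes over a limited range.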
41
participation level by individual
[Figure: Count vs. Number of recommendations, log-log axes; fitted line = 3.4e6 x^-2.30, R^2 = 0.96]
very high variance
The most active customer made 83,729 recommendations and purchased 4,416 different items!
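Fits of the form count = c x^alpha with an R^2 value, like the one quoted for this distribution, are commonly obtained by ordinary least squares on log-transformed data. The slides do not say how the fit was produced, so the sketch below is an assumption about the method rather than a reproduction of it. The check at the end uses synthetic points generated from the quoted curve, not the retailer's data.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of ln(y) = ln(c) + alpha * ln(x); returns (c, alpha, r_squared)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x, mean_y = sum(log_x) / n, sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    syy = sum((b - mean_y) ** 2 for b in log_y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    alpha = sxy / sxx                       # slope on log-log axes
    c = math.exp(mean_y - alpha * mean_x)   # intercept mapped back from log space
    r_squared = (sxy * sxy) / (sxx * syy)
    return c, alpha, r_squared


# Synthetic points lying exactly on the quoted curve, to show the fit recovers it.
xs = [1, 10, 100, 1000]
ys = [3.4e6 * x ** -2.30 for x in xs]
c, alpha, r_squared = fit_power_law(xs, ys)
print(f"c = {c:.3g}, alpha = {alpha:.2f}, R^2 = {r_squared:.2f}")
```

Least squares on log-log data is simple but can be biased for heavy-tailed counts; maximum-likelihood estimators are a common alternative.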
42
Network effects
43
does receiving more recommendations increase the likelihood of buying?
[Figure: Probability of Buying vs. Incoming Recommendations; panels: BOOKS, DVDs]
44
does sending more recommendations influence more purchases?
[Figure: Number of Purchases vs. Outgoing Recommendations; panels: BOOKS, DVDs]
45
the probability that the sender gets a credit with increasing numbers of recommendations
consider whether sender has at least one successful recommendation
controls for sender getting credit for purchase that resulted from others recommending the same product to the same person
[Figure: Probability of Credit vs. Outgoing Recommendations]
probability of receiving a credit levels off for DVDs
46
Multiple recommendations between two individuals weaken the impact of the bond on purchases
[Figure: Probability of buying vs. Exchanged recommendations; panels: BOOKS, DVDs]
47
product and social network characteristics influencing
recommendation effectiveness
48
recommendation success by book category
consider successful recommendations in terms of
  av. # senders of recommendations per book category
  av. # of recommendations accepted
books overall have a 3% success rate (2% with discount, 1% without)
lower than average success rate (significant at p=0.01 level)
  fiction: romance (1.78), horror (1.81), teen (1.94), children’s books (2.06), comics (2.30), sci-fi (2.34), mystery and thrillers (2.40)
  nonfiction: sports (2.26), home & garden (2.26), travel (2.39)
higher than average success rate (statistically significant)
  professional & technical: medicine (5.68), professional & technical (4.54), engineering (4.10), science (3.90), computers & internet (3.61), law (3.66), business & investing (3.62)
49
professional and organized contexts
In general, professional & technical book recommendations are more often accepted (probably in part due to book cost)
Some organized contexts other than professional also have a higher success rate, e.g. religion
  overall success rate 3.13%
  Christian themed books: Christian living and theology (4.7%), Bibles (4.8%)
  not-as-organized religion: new age (2.5%), occult spirituality (2.2%)
Well organized hobbies
  books on orchids recommended successfully twice as often as books on tomato growing
50
examples of communities
use a community finding algorithm to identify groups of people sending recommendations to one another
# nodes   # senders   topics
735       74          books: American literature, poetry
710       179         sci-fi books, TV series DVDs, alternative rock music
667       181         music: dance, indie
653       121         discounted DVDs
541       112         books: art & photography, web development, graphical design, sci-fi
502       104         books: sci-fi and other
388       77          books: Christianity and Catholicism
309       81          books: business and investing, computers, Harry Potter
192       30          books: parenting, women’s health, pregnancy
163       48          books: comparative religion, Egypt’s history, new age, role playing games
51
52
telling the world vs. telling your friends
consider
  # reviewers per book
  # recommenders per book
what we observe (ratio of recommendations to reviews given in parentheses)
  tell the world but not your friends: literature & fiction (0.57), mystery & thrillers (0.36), horror (0.44)
  tell the world and your friends: biographies (0.90), children’s books (1.12), religion (1.73), history (1.27), nonfiction (1.89)
  tell just your friends about personal pursuits: health, mind & body (2.39), home & garden (3.48), arts & photography (3.85), cooking, food & wine (3.49)
  tell your colleagues about professional interests: professional & technical (3.22), computers & internet (3.10), medicine (4.19), engineering (3.85), law (4.25)
53
anime DVDs
47,000 customers are responsible for 2.5 million of the 16 million recommendations in the system
29% success rate per recommender of an anime DVD
giant component covers 19% of the nodes
Overall, recommendations for DVDs are more likely to result in a purchase (7%), but the anime community stands out
54
regressing on product characteristics
Variable            transformation   Coefficient
const                                -0.940 ***
# recommendations   ln(r)            0.426 ***
# senders           ln(ns)           -0.782 ***
# recipients        ln(nr)           -1.307 ***
product price       ln(p)            0.128 ***
# reviews           ln(v)            -0.011 ***
avg. rating         ln(t)            -0.027 *
R2                                   0.74
Significance at the 0.01 (***), 0.05 (**) and 0.1 (*) levels
small tightly knit communities purchasing expensive products
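Because every predictor enters as a natural logarithm, each coefficient can be read as an elasticity, provided the dependent variable is also log-transformed. The slide does not name the dependent variable, so that is an assumption here (recommendation success rate would be a natural candidate). Under that assumption, doubling a predictor multiplies the predicted outcome by 2 raised to its coefficient. The dictionary keys and the function name below are illustrative.

```python
# Coefficients copied from the regression table; keys are illustrative labels.
coefficients = {
    "recommendations": 0.426,
    "senders": -0.782,
    "recipients": -1.307,
    "price": 0.128,
    "reviews": -0.011,
    "avg_rating": -0.027,
}


def doubling_effect(beta):
    """Multiplicative change in the outcome when a log-transformed predictor doubles.

    Assumes a log-log model: ln(outcome) = const + beta * ln(x) + other terms.
    """
    return 2.0 ** beta


for name, beta in coefficients.items():
    print(f"doubling {name:<15} multiplies the outcome by {doubling_effect(beta):.3f}")
```

Read this way, doubling the number of recipients cuts the predicted outcome to roughly 0.40 of its value, while doubling the price raises it by about 9%, which matches the summary line about small communities and expensive products.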
55
Marketing in the long tail
Chris Anderson, ‘The Long Tail’, Wired, Issue 12.10 - October 2004
56
The long tail of successful recommendations
20% of the products account for 50% of the purchases
67 % of the products have only a single successful recommendation (30% of all purchases)
57
Conclusions
Overall
  incentivized viral marketing contributes marginally to total sales
  occasionally large cascades occur
Observations for future diffusion models
  purchase decision more complex than threshold or simple infection
  influence saturates as the number of contacts expands
  links lose effectiveness if they are overused
Conditions for successful recommendations
  professional and organizational contexts
  discounts on expensive items
  small, tightly knit communities
58
Overall observations
Interests and social connections co-evolve
  What we are exposed to depends on the active topics in our community
  Who we communicate with depends on our interests
59
Thanks!
http://www-personal.umich.edu/~ladamic