Pagerank Increase under Different Collusion Topologies
Ricardo Baeza-Yates, Carlos Castillo and Vicente López
ICREA Professor / Dept. of Technology / Cátedra Telefónica
Universitat Pompeu Fabra – Barcelona, Spain
May 10th, 2005
Outline
Introduction
Collusion and Pagerank
Experiments in a synthetic graph
Experiments in a real Web graph
Conclusions
PageRank Increase under Different Collusion Topologies (AIRWEB 2005)
Goal
Study collusion
Nepotistic linking in a Web graph
This can be done by bad sites (spam) but also good sites
Colluding groups could use different topologies
Colluding groups could have different original rankings
How much would their ranking increase if ... ?
Framework
We use Pagerank as the ranking function [Page et al., 1998]
Pagerank
Let $L$ be the $N \times N$ row-wise normalized link matrix
Let $U$ be a matrix such that $U_{i,j} = 1/N$
Let $P = (1-\varepsilon)L + \varepsilon U$
Pagerank scores are given by $v$ such that $P^T v = v$
Pagerank scores are the probabilities of visiting a page using a process of random browsing, with a “reset” probability of $\varepsilon \approx 0.15$.
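The definition above can be sketched as a power iteration over a small graph. This is only an illustration under assumptions that are not on the slide: the function name `pagerank`, the adjacency-dict input, and the choice to treat a page without out-links as linking to every page are all hypothetical.

```python
def pagerank(links, eps=0.15, tol=1e-12, max_iter=1000):
    """Power iteration for v = P^T v with P = (1 - eps) L + eps U.

    links: dict mapping every node to the list of nodes it links to.
    Assumption: a node without out-links is treated as linking to all nodes.
    """
    nodes = list(links)
    n = len(nodes)
    v = {a: 1.0 / n for a in nodes}
    for _ in range(max_iter):
        # Random-jump share eps/N, plus the mass of dangling nodes spread evenly.
        dangling = sum(v[a] for a in nodes if not links[a])
        base = eps / n + (1.0 - eps) * dangling / n
        new = {a: base for a in nodes}
        for a in nodes:
            if links[a]:
                # Each out-link of a carries an equal part of (1 - eps) * v[a].
                share = (1.0 - eps) * v[a] / len(links[a])
                for b in links[a]:
                    new[b] += share
        delta = sum(abs(new[a] - v[a]) for a in nodes)
        v = new
        if delta < tol:
            break
    return v


# Toy graph: a -> b, a -> c, b -> c, c -> a
scores = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In this toy graph the scores sum to 1 and page `c`, which receives links from both other pages, ends up with the highest score.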
Gain from collusion
Maximum gain [Zhang et al., 2004]:
$$\frac{\text{New Pagerank}}{\text{Old Pagerank}} \le \frac{1}{\varepsilon}$$
As $\varepsilon \approx 0.15$, maximum gain $\approx 7$.
First task: improve this bound.
Impact of collusion in Pagerank
[Figure: the Web graph G with N pages, split into the colluding set G' with M pages and the remaining N-M pages.]
Grouping nodes for Pagerank calculation
Links for Pagerank can be “lumped” together [Clausen, 2004]:
[Figure: two lumped nodes, one for the M pages and one for the N-M pages, with random jumps.]
Links for Pagerank calculation
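$$\text{Pagerank}_{\text{colluding nodes}} = P_{\text{jump}} + P_{\text{in}} + P_{\text{self}}$$
[Figure: two lumped nodes, the M colluding nodes with Pagerank $x$ and the other N-M nodes with Pagerank $1-x$; arrows labeled $P_{\text{in}}$, $P_{\text{jump}}$ and $P_{\text{self}}$.]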
Pagerank calculation: random jumps
There are N nodes in total, M in the colluding set:
Pjump = ε(M/N)
Pagerank calculation: incoming links
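$$P_{\text{in}} = (1-\varepsilon) \sum_{(a,b) \in E_{\text{in}}} \frac{\text{Pagerank}(a)}{\deg(a)} = (1-\varepsilon) \sum_{a \in G - G'} \text{Pagerank}(a)\, p(a)$$
Where $p(a)$ is the fraction of links from node $a$ pointing to the colluding set, possibly 0 for some nodes. The factor $(1-\varepsilon)$ is the probability of following a link instead of jumping.
$$\bar{p} = \frac{\sum_{a \in G - G'} \text{Pagerank}(a)\, p(a)}{\sum_{a \in G - G'} \text{Pagerank}(a)} = \frac{\sum_{a \in G - G'} \text{Pagerank}(a)\, p(a)}{1 - x}$$
$\bar{p}$ is a weighted average of $p(a)$; it reflects how “important” the pages in the colluding set are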
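As an illustration of the weighted average of p(a), the sketch below computes it for a toy graph. The function name `weighted_inlink_fraction`, the adjacency-dict input and the hand-picked Pagerank values are hypothetical; pages without out-links are assumed to contribute p(a) = 0.

```python
def weighted_inlink_fraction(links, scores, colluding):
    """Return (p_bar, x) for a colluding set.

    links: dict node -> list of nodes it links to.
    scores: dict node -> Pagerank, assumed to sum to 1.
    colluding: iterable with the nodes of the colluding set.
    """
    colluding = set(colluding)
    # x is the total Pagerank held by the colluding set.
    x = sum(scores[a] for a in colluding)
    weighted_sum = 0.0
    for a, targets in links.items():
        if a in colluding or not targets:
            continue  # only outside nodes with out-links contribute
        # p(a): fraction of a's links that point into the colluding set.
        p_a = sum(1 for b in targets if b in colluding) / len(targets)
        weighted_sum += scores[a] * p_a
    return weighted_sum / (1.0 - x), x


# Hypothetical toy data: page "c" is the whole colluding set.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
toy_scores = {"a": 0.4, "b": 0.2, "c": 0.4}
p_bar, x = weighted_inlink_fraction(toy_links, toy_scores, {"c"})
```

Here page `a` sends half of its links to the colluding set and page `b` sends all of them, so the weighted average is (0.4 · 0.5 + 0.2 · 1) / 0.6 = 2/3.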
Pagerank calculation: incoming and self links
$P_{\text{in}}$ can be rewritten as:
$$P_{\text{in}} = (1-\varepsilon)(1-x)\,\bar{p}$$
Using the same trick for $P_{\text{self}}$, we can take $s$ as the weighted average of the fraction of self-links of each page in the colluding set, and write:
$$P_{\text{self}} = (1-\varepsilon)\,x\,s$$
Pagerank calculation summary
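[Figure: two lumped nodes, the M colluding nodes with Pagerank $x$ and the other N-M nodes with Pagerank $1-x$, with the flows into the colluding node:]
$P_{\text{in}} = (1-\varepsilon)(1-x)\,\bar{p}$
$P_{\text{jump}} = \varepsilon (M/N)$
$P_{\text{self}} = (1-\varepsilon)\,x\,s$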
Solving
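Solving the stationary state $P_{\text{in}} + P_{\text{jump}} + P_{\text{self}} = x$ yields:
$$x_{\text{normal}} = \frac{\varepsilon \frac{M}{N} + (1-\varepsilon)\,\bar{p}}{(\bar{p} - s)(1-\varepsilon) + 1}$$
What happens when colluding?
Colluding means pointing more links to the inside
This means $s \to s'$, with $s' > s$, yielding $x_{\text{colluding}}$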
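The algebra behind the stationary solution is short enough to spell out. The steps below use only the three flows defined on the earlier slides, writing $\bar{p}$ for the weighted average of $p(a)$; they are a worked restatement, not additional results.

```latex
\begin{align*}
x &= P_{\text{jump}} + P_{\text{in}} + P_{\text{self}} \\
  &= \varepsilon \frac{M}{N} + (1-\varepsilon)(1-x)\,\bar{p} + (1-\varepsilon)\,x\,s \\
x \left[ 1 + (1-\varepsilon)\,\bar{p} - (1-\varepsilon)\,s \right]
  &= \varepsilon \frac{M}{N} + (1-\varepsilon)\,\bar{p} \\
x_{\text{normal}}
  &= \frac{\varepsilon \frac{M}{N} + (1-\varepsilon)\,\bar{p}}
          {(\bar{p} - s)(1-\varepsilon) + 1}
\end{align*}
```

Replacing $s$ by a larger $s'$ only shrinks the denominator, so $x_{\text{colluding}} > x_{\text{normal}}$.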
Pagerank increase due to collusion
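Making $s' = 1$, all links from the colluding set go inside now:
$$\frac{x_{\text{colluding}}}{x_{\text{normal}}} = 1 + \frac{1 - s}{\bar{p} + \frac{\varepsilon}{1-\varepsilon}}$$
Making $s = \bar{p}$, originally the set was not colluding:
$$\frac{x_{\text{colluding}}}{x_{\text{normal}}} = \frac{1}{\bar{p}(1-\varepsilon) + \varepsilon}$$
This is inversely correlated to $\bar{p}$, the original weighted fraction of links going to the colluding set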
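A small numeric sketch can make the behaviour of this ratio concrete. The function names and the default value of M/N are hypothetical; the formulas are the stationary solution and the s' = 1 case from the slides, with `p` standing for the weighted average of p(a).

```python
def stationary_share(p, s, eps=0.15, m_over_n=0.001):
    """Pagerank share x of the colluding set in the lumped two-node model."""
    return (eps * m_over_n + (1.0 - eps) * p) / ((p - s) * (1.0 - eps) + 1.0)


def collusion_gain(p, s, eps=0.15, m_over_n=0.001):
    """Ratio x_colluding / x_normal when s is raised to s' = 1."""
    colluding = stationary_share(p, 1.0, eps, m_over_n)
    normal = stationary_share(p, s, eps, m_over_n)
    return colluding / normal


eps = 0.15
for p in (0.001, 0.01, 0.1, 1.0):
    # s = p: the set was not colluding originally.
    gain = collusion_gain(p, s=p, eps=eps)
    closed_form = 1.0 / (p * (1.0 - eps) + eps)
    print(f"p={p:<6} gain={gain:.3f} closed form={closed_form:.3f} bound={1 / eps:.3f}")
```

The ratio does not depend on M/N because the numerator of the stationary solution is the same before and after colluding. The gain approaches 1/ε as the weighted fraction goes to 0 and falls to 1 when it reaches 1.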
Expected Pagerank change
$x_{\text{colluding}}/x_{\text{normal}}$ as a function of $\bar{p}$
[Figure: maximum Pagerank change (y-axis, 1 to 7) against the weighted average of the fraction of links to colluding nodes (x-axis, log scale from 10^-3 to 10^0), with the bound 1/ε marked.]
Experiments in a synthetic graph
Created using the generative model [Kumar et al., 2000]
Power-law distribution with parameter −2.1 for in-degree and Pagerank, and −2.7 for out-degree, using parameters from [Pandurangan et al., 2002]
100,000-node scale-free graph
Sampling by Pagerank
Divided in deciles, each decile has 1/10th of the Pagerank
Picked a group of 100 nodes at random from each decile
Group 1 are low-ranked nodes, group 10 are high-ranked nodes
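The sampling step can be sketched as follows. This reads "each decile has 1/10th of the Pagerank" as ten groups of equal total Pagerank mass, which is one possible interpretation of the slide; the function name, the fixed seed and the fallback for deciles with fewer nodes than the group size are assumptions.

```python
import random


def sample_by_pagerank_decile(scores, group_size=100, seed=0):
    """Split nodes into 10 groups of roughly equal total Pagerank, then sample each.

    scores: dict node -> Pagerank.
    Returns a list of 10 lists; index 0 holds the lowest-ranked nodes.
    """
    rng = random.Random(seed)
    ordered = sorted(scores, key=scores.get)  # ascending Pagerank
    total = sum(scores.values())
    deciles = [[] for _ in range(10)]
    cumulative = 0.0
    for node in ordered:
        # Assign by the cumulative mass seen before this node is added.
        index = min(int(10 * cumulative / total), 9)
        deciles[index].append(node)
        cumulative += scores[node]
    # Take the whole decile if it has fewer nodes than group_size.
    return [rng.sample(d, min(group_size, len(d))) for d in deciles]
```

Under this reading, the high-ranked deciles contain far fewer nodes than the low-ranked ones, because a handful of top pages hold as much Pagerank as many low-ranked pages together.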
Original Pagerank of the nodes
These are the original Pagerank values for each group
[Figure: Pagerank values (y-axis, log scale from 10^-6 to 10^-2) for groups 1 to 10 (x-axis); labels: "Originally very bad", "Originally very good", "Average".]
Modified Pagerank of the nodes
These are the modified Pagerank values when colluding.
[Figure: Pagerank values (y-axis, log scale from 10^-6 to 10^-2) for groups 1 to 10 (x-axis); series: Original, Clique.]
Distribution of Pagerank
But Pagerank values follow a power law distribution ...
[Figure: frequency (y-axis, log scale from 10^-5 to 10^0) against Pagerank value (x-axis, log scale from 10^-6 to 10^-2), with the reference curve x^-2.1.]
Modified Pagerank position of the nodes
These are the modified Pagerank positions (rankings) when colluding.
[Figure: Pagerank ranking (y-axis, 0.0 to 1.0) for groups 1 to 10 (x-axis); series: Original, Clique.]
Variation of Pagerank when colluding
These are the ratios $x_{\text{colluding}}/x_{\text{original}}$
[Figure: new value / original value (y-axis, 0 to 7) for groups 1 to 10 (x-axis); series: 1/ε, Change in Pagerank value, Change in ranking.]
It is not necessary to create a clique
Spammers can use a fraction of the links to try to avoid detection
Some of the nodes are already colluding [Fetterly et al., 2004]
We study a set of Web sites
We picked a set of 242 reputable sites with rank ≈ 0.75, and we modify their links.
Disconnect the group
Create a ring
Add a central page linking to all of them
Add a central page linking to and from all of them (star)
Fully connect the group (clique)
Now we measure the new ranking
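The link modifications listed above can be sketched as edge-set builders. The function names and the `hub` argument for the added central page are hypothetical, and "disconnect" is assumed here to mean removing the existing links between members of the group.

```python
def ring(nodes):
    """Each node links to the next one, and the last links back to the first."""
    return {(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))}


def central(nodes, hub):
    """A new central page links to every node in the group."""
    return {(hub, a) for a in nodes}


def star(nodes, hub):
    """A new central page links to and from every node in the group."""
    return central(nodes, hub) | {(a, hub) for a in nodes}


def clique(nodes):
    """Every node links to every other node in the group."""
    return {(a, b) for a in nodes for b in nodes if a != b}


def disconnect(edges, nodes):
    """Drop every existing link whose two endpoints are both in the group."""
    group = set(nodes)
    return {(a, b) for (a, b) in edges if not (a in group and b in group)}


group = ["s1", "s2", "s3", "s4"]
new_links = clique(group)  # 4 * 3 = 12 directed links
```

Each builder returns a set of directed `(source, target)` pairs that would be added to (or, for `disconnect`, kept in) the graph before recomputing the ranking.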
New rankings under graph modifications
[Figure: rankings (y-axis, 0 to 1) of the group under each strategy (x-axis): Disconnected, Normal, Central, Ring, Inv. Ring, Star, Clique.]
Adding 5%-50% of complete subgraph
[Figure: rankings (y-axis, 0.980 to 1.000) against the percent of links of a complete subgraph (x-axis, 0% to 55%); series: Average ranking.]
The best sites also increase their ranking
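Adding only part of a complete subgraph can be sketched as below. The slides do not say how the subset of links was chosen, so picking the links uniformly at random is an assumption, as are the function name and the fixed seed.

```python
import random


def partial_clique(nodes, fraction, seed=0):
    """Pick a random `fraction` of the directed links of the complete subgraph."""
    all_links = [(a, b) for a in nodes for b in nodes if a != b]
    k = round(fraction * len(all_links))
    return set(random.Random(seed).sample(all_links, k))


sites = [f"site{i}" for i in range(10)]  # hypothetical group of 10 sites
links_10_percent = partial_clique(sites, 0.10)  # 9 of the 90 possible links
```

For a group of 242 sites the complete subgraph has 242 × 241 = 58,322 directed links, so 5% of them is already close to 3,000 links.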
Conclusions
✓ Any group of nodes can increase its Pagerank
✓ Nodes with high Pagerank gain less by colluding
Ideas for link spam detection
✗ Only detecting regularities can fail to detect randomized structures
✗ Only detecting nepotistic links can give false positives
✓ Use evidence from multiple sources
Thank you
Clausen, A. (2004). The cost of attack of PageRank. In Proceedings of the international conference on agents, Web technologies and Internet commerce (IAWTIC), Gold Coast, Australia.
Fetterly, D., Manasse, M., and Najork, M. (2004). Spam, damn spam, and statistics: Using statistical analysis to locate spam Web pages. In Proceedings of the seventh workshop on the Web and databases (WebDB), Paris, France.
Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A., and Upfal, E. (2000). Stochastic models for the web graph. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS), pages 57–65, Redondo Beach, CA, USA. IEEE CS Press.
Page, L., Brin, S., Motwani, R., and Winograd, T. (1998). The PageRank citation ranking: bringing order to the Web. Technical report, Stanford Digital Library Technologies Project.
Pandurangan, G., Raghavan, P., and Upfal, E. (2002). Using Pagerank to characterize Web structure. In Proceedings of the 8th Annual International Computing and Combinatorics Conference (COCOON), volume 2387 of Lecture Notes in Computer Science, pages 330–390, Singapore. Springer.
Zhang, H., Goel, A., Govindan, R., Mason, K., and Roy, B. V. (2004). Making eigenvector-based reputation systems robust to collusion. In Proceedings of the third Workshop on Web Graphs (WAW), volume 3243 of Lecture Notes in Computer Science, pages 92–104, Rome, Italy. Springer.