Pagerank Increase under Different Collusion Topologies. Ricardo Baeza-Yates, Carlos Castillo and Vicente López. ICREA Professor / Dept. of Technology / Cátedra Telefónica, Universitat Pompeu Fabra – Barcelona, Spain. May 10th, 2005
Transcript
Page 1: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Pagerank Increase under Collusion

R. Baeza-Yates, C. Castillo and V. López

Outline

Introduction

Collusion and Pagerank

Experiments in a synthetic graph

Experiments in a real Web graph

Conclusions

Pagerank Increase under Different Collusion Topologies

Ricardo Baeza-Yates, Carlos Castillo and Vicente López

ICREA Professor / Dept. of Technology / Cátedra Telefónica
Universitat Pompeu Fabra – Barcelona, Spain

May 10th, 2005

Page 2: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

1 Introduction

2 Collusion and Pagerank

3 Experiments in a synthetic graph

4 Experiments in a real Web graph

5 Conclusions

Page 3: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Goal

Study collusion

Nepotistic linking in a Web graph

This can be done by bad sites (spam) but also good sites

Colluding groups could use different topologies

Colluding groups could have different original rankings

How much would their ranking increase if ... ?

Page 8: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Framework

We use Pagerank as the ranking function [Page et al., 1998]

Pagerank

Let $L$ be the $N \times N$ row-wise normalized link matrix.
Let $U$ be a matrix such that $U_{i,j} = 1/N$.
Let $P = (1 - \epsilon) L + \epsilon U$.
Pagerank scores are given by $v$ such that $P^T v = v$.

Pagerank scores are the probabilities of visiting a page using a process of random browsing, with a “reset” probability of $\epsilon \approx 0.15$.
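
The definitions on this slide can be sketched with a simple power iteration. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pagerank`, the dense adjacency-matrix input, the tolerance, and the treatment of pages without out-links (assumed here to link uniformly to every page) are all choices made for this example.

```python
import numpy as np


def pagerank(adj, eps=0.15, tol=1e-12, max_iter=1000):
    """Power iteration for v = P^T v, with P = (1 - eps) * L + eps * U."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    # L: row-wise normalized link matrix.
    # Assumption: a row with no out-links is replaced by a uniform row.
    L = np.where(out_deg[:, None] > 0,
                 adj / np.maximum(out_deg, 1.0)[:, None],
                 1.0 / n)
    # U has every entry equal to 1/N, so eps * U is the scalar eps / n.
    P = (1.0 - eps) * L + eps / n
    v = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        nxt = P.T @ v
        if np.abs(nxt - v).sum() < tol:
            return nxt
        v = nxt
    return v
```

For example, `pagerank([[0, 1, 0], [1, 0, 0], [1, 0, 0]])` scores a three-page graph in which the third page has no in-links, so its score comes only from random jumps.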

Page 11: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Gain from collusion

Maximum gain [Zhang et al., 2004]:

$\frac{\text{New Pagerank}}{\text{Old Pagerank}} \leq \frac{1}{\epsilon}$

As $\epsilon \approx 0.15$, maximum gain $\approx 7$.

First task: improve this bound.

Page 13: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Impact of collusion in Pagerank

[Figure: the Web graph G of N pages, divided into a colluding set G′ of M pages and the remaining N − M pages.]

Page 14: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Grouping nodes for Pagerank calculation

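For the Pagerank calculation, links can be “lumped” together [Clausen, 2004]:

[Figure: the M colluding pages and the other N − M pages drawn as two lumped nodes, with the links between them and the random jumps.]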

Page 15: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Links for Pagerank calculation

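$\text{Pagerank}_{\text{colluding nodes}} = P_{jump} + P_{in} + P_{self}$

[Figure: the M colluding nodes have total Pagerank x and the other N − M nodes have total Pagerank 1 − x; the flows P_in, P_jump and P_self are labeled.]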

Page 16: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Pagerank calculation: random jumps

There are N nodes in total, M in the colluding set:

$P_{jump} = \epsilon (M/N)$

Page 17: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Pagerank calculation: incoming links

$P_{in} = \sum_{(a,b) : (a,b) \in E_{in}} \frac{\text{Pagerank}(a)}{\deg(a)} = \sum_{a : a \in G - G'} \text{Pagerank}(a)\, p(a)$

Where $p(a)$ is the fraction of links from node $a$ pointing to the colluding set, possibly 0 for some nodes.

$p = \frac{\sum_{a : a \in G - G'} \text{Pagerank}(a)\, p(a)}{\sum_{a : a \in G - G'} \text{Pagerank}(a)} = \frac{\sum_{a : a \in G - G'} \text{Pagerank}(a)\, p(a)}{1 - x}$

$p$ is a weighted average of $p(a)$; it reflects how “important” the pages in the colluding set are.
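
The weighted average p can be computed directly from a link list and a vector of Pagerank scores. The sketch below is illustrative only: the function name `weighted_inlink_fraction`, the dict-based graph representation, and the assumption that the scores sum to 1 (so the outside nodes hold 1 − x) are choices made for this example.

```python
def weighted_inlink_fraction(out_links, pagerank, colluding):
    """Return (p, x) for a colluding set.

    out_links: dict mapping each node to the list of nodes it links to.
    pagerank:  dict mapping each node to its score (assumed to sum to 1).
    colluding: iterable with the nodes of the colluding set G'.
    """
    colluding = set(colluding)
    # x is the total Pagerank held by the colluding set.
    x = sum(pagerank[a] for a in colluding)
    weighted = 0.0
    for a, targets in out_links.items():
        if a in colluding or not targets:
            continue
        # p(a): fraction of a's links that point into the colluding set.
        p_a = sum(1 for b in targets if b in colluding) / len(targets)
        weighted += pagerank[a] * p_a
    # The nodes outside the colluding set hold total Pagerank 1 - x.
    return weighted / (1.0 - x), x
```

A node that never links to the colluding set contributes 0 to the numerator, so p stays small when only unimportant pages link to the group.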

Page 22: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Pagerank calculation: incoming and self links

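Using the definition of $p$, and including the factor $(1 - \epsilon)$ for the probability of following a link instead of jumping, $P_{in}$ can be rewritten as:

$P_{in} = (1 - \epsilon)(1 - x)\, p$

Using the same trick for $P_{self}$, we can take $s$ as the weighted average of the fraction of self-links of each page in the colluding set, and write:

$P_{self} = (1 - \epsilon)\, x\, s$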

Page 24: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Pagerank calculation summary

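[Figure: the M colluding nodes have total Pagerank x and the other N − M nodes have total Pagerank 1 − x, with the three flows labeled as follows.]

$P_{in} = (1 - \epsilon)(1 - x)\, p$

$P_{jump} = \epsilon (M/N)$

$P_{self} = (1 - \epsilon)\, x\, s$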

Page 25: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Solving

Solving the stationary state $P_{in} + P_{jump} + P_{self} = x$ yields:

$x_{normal} = \frac{\epsilon \frac{M}{N} + (1 - \epsilon)\, p}{(p - s)(1 - \epsilon) + 1}$

What happens when colluding?

Colluding means pointing more links to the inside.
This means $s \to s'$, with $s' > s$, yielding $x_{colluding}$.
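
The intermediate algebra is not shown on the slide. A short derivation, using only the three terms defined on the previous slides ($P_{jump} = \epsilon M/N$, $P_{in} = (1-\epsilon)(1-x)p$, $P_{self} = (1-\epsilon)xs$), would run as follows:

```latex
\begin{align*}
x &= \epsilon \frac{M}{N} + (1-\epsilon)(1-x)\,p + (1-\epsilon)\,x\,s \\
x \left[ 1 + (1-\epsilon)\,p - (1-\epsilon)\,s \right] &= \epsilon \frac{M}{N} + (1-\epsilon)\,p \\
x &= \frac{\epsilon \frac{M}{N} + (1-\epsilon)\,p}{(p-s)(1-\epsilon) + 1}
\end{align*}
```

The second line collects every term containing $x$ on the left-hand side; dividing by the bracket gives the expression for $x_{normal}$.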

Page 27: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Pagerank increase due to collusion

Making $s' = 1$, all links from the colluding set go inside now:

$\frac{x_{colluding}}{x_{normal}} = 1 + \frac{1 - s}{p + \frac{\epsilon}{1 - \epsilon}}$

Making $s = p$, originally the set was not colluding:

$\frac{x_{colluding}}{x_{normal}} = \frac{1}{p(1 - \epsilon) + \epsilon}$

This is inversely correlated to $p$, the original weighted fraction of links going to the colluding set.
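
These two ratios can be checked numerically against the stationary-state formula. The sketch below is a minimal illustration; the function names `x_stationary` and `collusion_gain` and the default value of M/N are hypothetical and chosen only for this example.

```python
def x_stationary(p, s, eps=0.15, m_over_n=0.001):
    """Stationary Pagerank of the colluding set for given p and s."""
    return (eps * m_over_n + (1.0 - eps) * p) / ((p - s) * (1.0 - eps) + 1.0)


def collusion_gain(p, s, eps=0.15):
    """Ratio x_colluding / x_normal when the set moves from s to s' = 1."""
    return 1.0 + (1.0 - s) / (p + eps / (1.0 - eps))


if __name__ == "__main__":
    # With s = p the gain reduces to 1 / (p * (1 - eps) + eps).
    for p in (0.001, 0.01, 0.1, 1.0):
        print(p, collusion_gain(p, p))
```

As p approaches 0 with s = p, the gain approaches 1/ε, the bound quoted from [Zhang et al., 2004]; at p = 1 the gain is 1, meaning no increase.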

Page 30: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Expected Pagerank change

[Figure: $x_{colluding}/x_{normal}$ as a function of $p$. X-axis: weighted average of fraction of links to colluding nodes (log scale, $10^{-3}$ to $10^{0}$). Y-axis: maximum Pagerank change (1 to 7). The level $1/\epsilon$ is marked.]

Page 31: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Experiments in a synthetic graph

Created using the generative model [Kumar et al., 2000]

Power-law distribution with parameter −2.1 for in-degree and Pagerank, and −2.7 for out-degree, using parameters from [Pandurangan et al., 2002]

100,000-node scale-free graph

Sampling by Pagerank

Divided in deciles, each decile has 1/10th of the Pagerank
Picked a group of 100 nodes at random from each decile
Group 1 are low-ranked nodes, group 10 are high-ranked nodes
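
The sampling step can be sketched as below. The slides do not give the exact procedure, so this is one plausible reading under stated assumptions: nodes are sorted by ascending score and cut wherever the accumulated Pagerank crosses a multiple of 1/10 of the total. The function names `pagerank_deciles` and `sample_groups`, the seed, and the handling of deciles with fewer than 100 nodes are hypothetical.

```python
import random


def pagerank_deciles(scores, groups=10):
    """Split nodes into `groups` buckets holding about 1/groups of the Pagerank each.

    scores: dict mapping each node to its Pagerank score.
    Bucket 0 holds the lowest-ranked nodes, the last bucket the highest-ranked.
    """
    order = sorted(scores, key=scores.get)  # ascending by score
    total = sum(scores.values())
    buckets = [[] for _ in range(groups)]
    cumulative = 0.0
    for node in order:
        # The bucket is decided by the mass accumulated before this node.
        idx = min(int(groups * cumulative / total), groups - 1)
        buckets[idx].append(node)
        cumulative += scores[node]
    return buckets


def sample_groups(buckets, size=100, seed=0):
    """Pick up to `size` nodes at random from each bucket."""
    rng = random.Random(seed)
    return [rng.sample(b, min(size, len(b))) for b in buckets]
```

Because the scores follow a power law, the low deciles contain many nodes and the top decile very few, which is why the sketch caps the sample at the bucket size.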

Page 36: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Original Pagerank of the nodes

These are the original Pagerank values for each group

[Figure: Pagerank values (log scale, $10^{-6}$ to $10^{-2}$) for groups 1 to 10. Annotations: "Originally very bad", "Originally very good". Legend: Average.]

Page 37: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Modified Pagerank of the nodes

These are the modified Pagerank values when colluding.

[Figure: Pagerank values (log scale, $10^{-6}$ to $10^{-2}$) for groups 1 to 10. Series: Original, Clique.]

Page 38: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Distribution of Pagerank

But Pagerank values follow a power law distribution ...

[Figure: frequency (log scale, $10^{-5}$ to $10^{0}$) against Pagerank value (log scale, $10^{-6}$ to $10^{-2}$), with the reference line $x^{-2.1}$.]

Page 39: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Modified Pagerank position of the nodes

These are the modified Pagerank positions (rankings) when colluding.

[Figure: Pagerank ranking (0.0 to 1.0) for groups 1 to 10. Series: Original, Clique.]

Page 40: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Variation of Pagerank when colluding

These are the ratios $x_{colluding}/x_{original}$

[Figure: new value / original value (0 to 7) for groups 1 to 10. Series: change in Pagerank value, change in ranking. The level $1/\epsilon$ is marked.]

Page 41: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

It is not necessary to create a clique

Spammers can use a fraction of the links to try to avoid detection

[Figure: new Pagerank / original Pagerank (1 to 7) for groups 1 to 10. Series: full clique, and 95% down to 5% of the clique links in steps of 5%. The level $1/\epsilon$ is marked.]

In the paper, other topologies: star and ring
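
One way to build such a partial clique is to keep a random fraction of the directed links of the full clique. The slides do not say how the links were chosen, so uniform random sampling is an assumption here, and the function name `partial_clique_edges` and the seed are hypothetical.

```python
import random
from itertools import permutations


def partial_clique_edges(nodes, fraction, seed=0):
    """Return a random subset of the directed clique links among `nodes`.

    fraction: share of the full clique to keep, between 0 and 1.
    """
    # Every ordered pair of distinct nodes is one link of the full clique.
    all_edges = list(permutations(nodes, 2))
    k = round(fraction * len(all_edges))
    return set(random.Random(seed).sample(all_edges, k))
```

With 100 nodes the full clique has 9,900 directed links, so `partial_clique_edges(range(100), 0.05)` keeps 495 of them.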

Page 43: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Experiments in a real Web graph

Hostgraph of 310,486 Websites from Spain

[Figure: frequency (log scale, $10^{-6}$ to $10^{0}$) against Pagerank value (log scale, $10^{-6}$ to $10^{-2}$), with the reference line $x^{-2.1}$.]

Page 44: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Experiments in a real Web graph

Some of the nodes are already colluding [Fetterly et al., 2004]

Page 45: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

We study a set of Web sites

We picked a set of 242 reputable sites with rank ≈ 0.75, and we modify their links.

Disconnect the group

Create a ring

Add a central page linking to all of them

Add a central page linking to and from all of them (star)

Fully connect the group (clique)

Now we measure the new ranking
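
The link modifications listed above can be sketched as sets of directed edges among the selected sites. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the ring direction follows list order, and `center` stands for the added central page.

```python
from itertools import permutations


def ring_edges(nodes):
    """Each node links to the next one, and the last links back to the first."""
    nodes = list(nodes)
    return {(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))}


def central_edges(nodes, center):
    """A central page links to every node of the group."""
    return {(center, n) for n in nodes}


def star_edges(nodes, center):
    """A central page links to and from every node of the group."""
    return central_edges(nodes, center) | {(n, center) for n in nodes}


def clique_edges(nodes):
    """Every node links to every other node of the group."""
    return set(permutations(nodes, 2))
```

Disconnecting the group corresponds to removing, rather than adding, the links among its members. For a group of M sites the ring adds M links, the star 2M, and the clique M(M − 1).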

Page 53: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

New rankings under graph modifications

[Figure: rankings (0 to 1) by strategy. Strategies: Disconnected, Normal, Central, Ring, Inv. Ring, Star, Clique.]

Page 54: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Adding 5%-50% of complete subgraph

[Figure: rankings (0.980 to 1.000) against percent of links of a complete subgraph (0% to 55%). Series: average ranking.]

The best sites also increase their ranking

Page 56: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Conclusions

✓ Any group of nodes can increase its Pagerank

✓ Nodes with high Pagerank gain less by colluding

Ideas for link spam detection

✗ Only detecting regularities can fail to detect randomized structures

✗ Only detecting nepotistic links can give false positives

✓ Use evidence from multiple sources
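
The first conclusion above can be checked on a toy example. The sketch below is not the authors' experimental setup: the 9-page directed ring, the choice of colluding pages, the damping factor 0.85, and the helper names `pagerank` and `add_clique` are all assumptions made for illustration. It computes Pagerank by power iteration, then links three pages into a clique and compares the group's total Pagerank before and after.

```python
def pagerank(out_links, damping=0.85, iterations=200):
    """Power-iteration Pagerank; out_links maps node -> set of successors."""
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


def add_clique(out_links, group):
    """Return a copy of the graph with every pair in `group` linked both ways."""
    colluded = {v: set(targets) for v, targets in out_links.items()}
    for v in group:
        colluded[v] |= {w for w in group if w != v}
    return colluded


# Toy graph (an assumption for illustration): a directed ring of 9 pages,
# so every page starts with the same Pagerank of 1/9.
ring = {v: {(v + 1) % 9} for v in range(9)}
group = [0, 3, 6]  # hypothetical colluding pages

before = sum(pagerank(ring)[v] for v in group)
after = sum(pagerank(add_clique(ring, group))[v] for v in group)
print(f"group Pagerank before: {before:.4f}, after: {after:.4f}")
```

In this toy graph the clique keeps two thirds of each colluding page's outgoing rank inside the group, so the group total rises. The example says nothing about the second conclusion, which would need a graph with unevenly distributed Pagerank.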

Page 62: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

Thank you

Page 63: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

PagerankIncrease under

Collusion

R. Baeza-Yates,C. Castillo and

V. Lopez

Outline

Introduction

Collusion andPagerank

Experiments in asynthetic graph

Experiments in areal Web graph

Conclusions

Clausen, A. (2004).The cost of attack of PageRank.In Proceedings of the international conference on agents, Webtechnologies and Internet commerce (IAWTIC), Gold Coast, Australia.

Fetterly, D., Manasse, M., and Najork, M. (2004).Spam, damn spam, and statistics: Using statistical analysis to locate spamWeb pages.In Proceedings of the seventh workshop on the Web and databases(WebDB), Paris, France.

Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A.,and Upfal, E. (2000).Stochastic models for the web graph.In Proceedings of the 41st Annual Symposium on Foundations ofComputer Science (FOCS), pages 57–65, Redondo Beach, CA, USA. IEEECS Press.

Page, L., Brin, S., Motwani, R., and Winograd, T. (1998).The Pagerank citation algorithm: bringing order to the web.Technical report, Stanford Digital Library Technologies Project.

Page 64: PageRank Increase under Different Collusion Topologies (AIRWEB 2005)

PagerankIncrease under

Collusion

R. Baeza-Yates,C. Castillo and

V. Lopez

Outline

Introduction

Collusion andPagerank

Experiments in asynthetic graph

Experiments in areal Web graph

Conclusions

Pandurangan, G., Raghavan, P., and Upfal, E. (2002).Using Pagerank to characterize Web structure.In Proceedings of the 8th Annual International Computing andCombinatorics Conference (COCOON), volume 2387 of Lecture Notes inComputer Science, pages 330–390, Singapore. Springer.

Zhang, H., Goel, A., Govindan, R., Mason, K., and Roy, B. V. (2004).Making eigenvector-based reputation systems robust to collusion.In Proceedings of the third Workshop on Web Graphs (WAW), volume3243 of Lecture Notes in Computer Science, pages 92–104, Rome, Italy.Springer.