
So, You Think You Have a Power Law, Do You? Well Isn't That Special?

Cosma Shalizi

Statistics Department, Carnegie Mellon University

Santa Fe Institute

18 October 2010, NY Machine Learning Meetup


Power Laws: What? So What? / Bad Practices / Better Practices / No Really, So What? / References

Summary

1. Everything good in the talk I owe to my co-authors, Aaron Clauset and Mark Newman

2. Power laws, $p(x) \propto x^{-\alpha}$, are cool, but not that cool

3. Most of the studies claiming to find them use unreliable 19th-century methods, and have no value as evidence either way

4. Reliable methods exist, and need only very straightforward mid-20th-century statistics

5. Using reliable methods, lots of the claimed power laws disappear, or are at best "not proven"

You are now free to tune me out and turn on social media

Cosma Shalizi So, You Think You Have a Power Law?


Definitions and Examples

What Are Power Law Distributions? Why Care?

$p(x) \propto x^{-\alpha}$ (continuous)

$P(X = x) \propto x^{-\alpha}$ (discrete)

∴ $P(X \ge x) \propto x^{-(\alpha-1)}$

and $\log p(x) = \log C - \alpha \log x$

"Pareto" (continuous), "Zipf" or "zeta" (discrete)

Explicitly:

$$p(x) = \frac{\alpha - 1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha}$$

(the discrete version involves the Hurwitz zeta function)
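Integrating the explicit density from $x$ to $\infty$ gives $P(X \ge x) = (x/x_{\min})^{-(\alpha-1)}$, which is where the tail exponent $\alpha - 1$ comes from. A minimal Python sketch of the continuous case, assuming $\alpha > 1$; the inverse-CDF sampler is a standard technique added here for illustration, not something taken from the talk, and all function names are hypothetical:

```python
import random

def pareto_pdf(x, alpha, x_min):
    """Continuous power-law density: (alpha-1)/x_min * (x/x_min)**(-alpha) for x >= x_min, else 0."""
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for the density to be normalizable")
    if x < x_min:
        return 0.0
    return (alpha - 1) / x_min * (x / x_min) ** (-alpha)

def pareto_ccdf(x, alpha, x_min):
    """Upper-tail probability P(X >= x) = (x/x_min)**(-(alpha-1)): a power law with exponent alpha-1."""
    if x < x_min:
        return 1.0
    return (x / x_min) ** (-(alpha - 1))

def pareto_sample(n, alpha, x_min, rng=random):
    """Inverse-CDF sampling: setting u = P(X >= x) and solving gives x = x_min * u**(-1/(alpha-1))."""
    # 1.0 - rng.random() lies in (0, 1], so u is never zero
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1)) for _ in range(n)]

rng = random.Random(0)
xs = pareto_sample(100_000, alpha=2.5, x_min=1.0, rng=rng)
# The empirical fraction at or above 2 should be close to pareto_ccdf(2.0, 2.5, 1.0) = 2**-1.5
print(sum(x >= 2.0 for x in xs) / len(xs), pareto_ccdf(2.0, 2.5, 1.0))
```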


Money, Words, Cities

The three classic power law distributions

Pareto's law: wealth (richest 400 in US, 2003)

[Figure: "Wealth", survival function against net worth (US$), log-log axes]


Zipf's law: word frequencies (Moby Dick)

[Figure: "Word Frequencies", survival function against number of occurrences, log-log axes]


Zipf's law: city populations

[Figure: "City Sizes", survival function against population (US Census, 2000), log-log axes]
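The three plots above show, for each value, the fraction of observations at or above it, on log-log axes. A minimal sketch of computing those points, assuming "survival function" has its usual meaning $S(x) = P(X \ge x)$; the function names are hypothetical:

```python
import math

def empirical_survival(data):
    """Points (x, S(x)) for each distinct observed x, where S(x) is the fraction of observations >= x."""
    xs = sorted(data)
    n = len(xs)
    points = []
    i = 0
    while i < n:
        x = xs[i]
        points.append((x, (n - i) / n))  # xs[i:] are exactly the observations >= x
        while i < n and xs[i] == x:      # skip tied values
            i += 1
    return points

def loglog_coordinates(points):
    """Convert (x, S(x)) pairs to (log10 x, log10 S(x)) for plotting; requires x > 0."""
    return [(math.log10(x), math.log10(s)) for x, s in points]

print(empirical_survival([1, 2, 2, 3]))  # [(1, 1.0), (2, 0.75), (3, 0.25)]
```

Plots like these are a reasonable way to look at the data; the trouble discussed in the following slides starts when a straight line is fitted to them and treated as evidence.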


Properties

Highly right skewed

Heavy (fat, long, ...) tails: sub-exponential decay of $p(x)$

Extreme inequality ("80/20"): a high proportion of the summed values comes from a small fraction of the samples/population

"Scale-free":

$$p(x \mid X \ge s) = \frac{\alpha - 1}{s} \left(\frac{x}{s}\right)^{-\alpha}$$

i.e., another power law, with the same $\alpha$

∴ no "typical scale"

though $x_{\min}$ is the typical value
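Both the "80/20" and the "scale-free" bullets can be made concrete with the explicit density. For $\alpha > 2$, integrating $x\,p(x)$ above the top-$p$ quantile and dividing by the mean gives the share of the total held by the top fraction $p$ as $p^{(\alpha-2)/(\alpha-1)}$; conditioning on $X \ge s$ simply replaces $x_{\min}$ by $s$. A small sketch, with hypothetical function names:

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p of a continuous Pareto(alpha) population.
    Equals (x_p/x_min)**(2-alpha) with x_p = x_min * p**(-1/(alpha-1)); needs alpha > 2."""
    if alpha <= 2:
        raise ValueError("the mean is infinite for alpha <= 2")
    return p ** ((alpha - 2) / (alpha - 1))

def alpha_for_top_share(p, share):
    """Invert top_share: the alpha at which the top fraction p holds the given share of the total."""
    r = math.log(share) / math.log(p)  # this is the exponent (alpha-2)/(alpha-1)
    return (2 - r) / (1 - r)

def pareto_ccdf(x, alpha, x_min):
    """P(X >= x) for a continuous Pareto distribution."""
    return (x / x_min) ** (-(alpha - 1)) if x >= x_min else 1.0

def conditional_ccdf(x, s, alpha, x_min):
    """P(X >= x | X >= s) = P(X >= x) / P(X >= s); for x >= s >= x_min it equals (x/s)**(-(alpha-1)),
    so x_min drops out: the scale-free property."""
    return pareto_ccdf(x, alpha, x_min) / pareto_ccdf(s, alpha, x_min)

print(alpha_for_top_share(0.2, 0.8))
print(conditional_ccdf(50.0, 10.0, 2.5, 1.0), conditional_ccdf(50.0, 10.0, 2.5, 3.0))
```

With these formulas `alpha_for_top_share(0.2, 0.8)` comes out at about 2.16, so the textbook 80/20 split corresponds to one particular exponent; other values of $\alpha$ give other splits.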


Origin Myths

Catchy and mysterious origin myth from physics:

Distinct phases co-exist at phase transitions

∴ Each phase can appear by fluctuation inside the other, and vice versa

∴ Infinite-range correlations in space and time

∴ Central limit theorem breaks down

but macroscopic physical quantities are still averages

∴ they must have a scale-free distribution

So critical phenomena ⇒ power laws


Origin Myths (cont.)

Deflating origin myths:

Piles of papers on my office floor [1, 2, 3]

I start new piles at rate $\lambda$, so age of piles $\sim \mathrm{Exponential}(\lambda)$

All piles start with size $x_{\min}$

Once a pile starts, on average it grows exponentially at rate $\mu$

$X \sim \mathrm{Pareto}(\lambda/\mu + 1, x_{\min})$

Mixtures of exponentials work too [4]
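The arithmetic behind the paper-pile story: if a pile's age $T$ is Exponential($\lambda$) and its size is $x_{\min} e^{\mu T}$, then $P(X \ge x) = P(T \ge \ln(x/x_{\min})/\mu) = (x/x_{\min})^{-\lambda/\mu}$, a Pareto tail with $\alpha - 1 = \lambda/\mu$. The simulation below assumes the simplest case, deterministic growth at exactly rate $\mu$ (the slide only says "on average"), uses arbitrary parameter values, and checks the exponent with the standard maximum-likelihood estimator for a continuous power law with known $x_{\min}$. Function names are hypothetical.

```python
import math
import random

def simulate_piles(n, lam, mu, x_min, rng):
    """Hypothetical pile sizes: age ~ Exponential(lam), size = x_min * exp(mu * age)."""
    return [x_min * math.exp(mu * rng.expovariate(lam)) for _ in range(n)]

def pareto_alpha_mle(xs, x_min):
    """MLE of alpha for a continuous power law above a known x_min: 1 + n / sum(ln(x / x_min))."""
    return 1.0 + len(xs) / sum(math.log(x / x_min) for x in xs)

rng = random.Random(1)
sizes = simulate_piles(100_000, lam=2.0, mu=1.0, x_min=1.0, rng=rng)
print(pareto_alpha_mle(sizes, x_min=1.0))  # close to lam/mu + 1 = 3
```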


There are lots of claims that things follow power laws, especially in the last ≈ 20 years, especially from physicists:

word frequency, protein interaction degree (yeast), metabolic network degree (E. coli), Internet autonomous system network, calls received, intensity of wars, terrorist attack fatalities, bytes per HTTP request, species per genus, # sightings per bird species, population affected by blackouts, sales of best-sellers, population of US cities, area of wildfires, solar flare intensity, earthquake magnitude, religious sect size, surname frequency, individual net worth, citation counts, # papers authored, # hits per URL, in-degree per URL, # entries in e-mail address books, ...


⇒ Mason Porter's Power Law Shop


You Can Do Everything with Least Squares, Right? / Actually, No / Alternative Distributions

How do physicists come up with their power laws?

Remember $\log p(x) = \log C - \alpha \log x$

& similarly for the upper-tail CDF: $\log P(X \ge x) = \log C' - (\alpha - 1) \log x$

Suggests:

Take a log-log plot of the histogram, or of the CDF, and

Fit an ordinary regression line, then

Use the fitted slope as a guess for $\alpha$, and check goodness of fit by $R^2$

This is a clever idea for the 1890s

Fun fact: "statistical physics" involves no actual statistics
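For concreteness, here is the recipe above applied to the upper-tail CDF, as a sketch of the practice being criticized rather than a recommendation. It regresses $\log_{10}$ of the empirical $P(X \ge x)$ on $\log_{10} x$; since that tail falls off like $x^{-(\alpha-1)}$, the slope is read as $-(\alpha - 1)$. Ties are ignored, all values are assumed positive, and the function name is hypothetical.

```python
import math

def least_squares_loglog(data):
    """Fit a least-squares line to (log10 x, log10 empirical P(X >= x)).
    Returns (alpha_hat, r_squared), with alpha_hat = 1 - slope."""
    xs = sorted(data)
    n = len(xs)
    u = [math.log10(x) for x in xs]
    # (n - i)/n is the empirical P(X >= xs[i]) when there are no ties
    v = [math.log10((n - i) / n) for i in range(n)]
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    slope = s_uv / s_uu
    r_squared = s_uv ** 2 / (s_uu * s_vv)
    return 1.0 - slope, r_squared

# Exact power-law quantiles with alpha = 2.5: the points fall on a line of slope -1.5
exact = [(100 / (100 - i)) ** (1 / 1.5) for i in range(100)]
print(least_squares_loglog(exact))
```

Nothing in this function can tell a power law from any other distribution whose tail happens to look straight on these axes.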


Why Is This Bad?

Histograms: binning always throws away information and adds lots of error; log-sized bins are only infinitesimally better

CDF or rank-size plot: values are not independent; inefficient

Least-squares line:

  Not a normalized distribution

  All the inferential assumptions for regression fail

  Always has avoidable error as an estimate of $\alpha$

  Easily gets a large $R^2$ for non-power-law distributions
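The last point is easy to check. The sketch below draws from a log-normal distribution ($\ln X$ normal), keeps only the values above a cut-off, and computes the $R^2$ of a least-squares line through the log-log upper-tail CDF. The parameters ($\mu = 0$, $\sigma = 2$, cut-off 10, 20,000 draws) are arbitrary illustrative choices, not values from the talk, and the function name is hypothetical.

```python
import math
import random

def loglog_ccdf_r2(data):
    """R^2 of a least-squares line through (ln x, ln empirical P(X >= x)); assumes positive, untied data."""
    xs = sorted(data)
    n = len(xs)
    u = [math.log(x) for x in xs]
    v = [math.log((n - i) / n) for i in range(n)]
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    return s_uv ** 2 / (s_uu * s_vv)

rng = random.Random(42)
# Illustrative assumptions: ln X ~ N(0, 2^2), keep only the tail X >= 10
draws = (rng.lognormvariate(0.0, 2.0) for _ in range(20_000))
lognormal_tail = [x for x in draws if x >= 10.0]
print(len(lognormal_tail), loglog_ccdf_r2(lognormal_tail))
```

In settings like this one the $R^2$ comes out above 0.9, even though the data are log-normal by construction: $R^2$ measures how straight the points look, not which distribution generated them.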


Some Distributions Which Are Not Power Laws

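Log-normal: $\ln X \sim \mathcal{N}(\mu, \sigma^2)$:

$$p(x) = \frac{1}{\left(1 - \Phi\left(\frac{\ln x_{\min} - \mu}{\sigma}\right)\right) x \sqrt{2\pi\sigma^2}} \, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}$$

Stretched exponential/Weibull: $X^\beta \sim \mathrm{Exponential}(\lambda)$:

$$p(x) = \beta\lambda e^{\lambda x_{\min}^\beta} x^{\beta - 1} e^{-\lambda x^\beta}$$

Power law with exponential cut-off ("negative gamma"):

$$p(x) = \frac{1/L}{\Gamma(1 - \alpha, x_{\min}/L)} \left(\frac{x}{L}\right)^{-\alpha} e^{-x/L}$$

like a power law for $x \ll L$, like an exponential for $x \gg L$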
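A dependency-free Python sketch of the three densities, assuming $x_{\min} > 0$ and, for the cut-off model, $L > 0$; function names are hypothetical. The stretched-exponential line rearranges the slide's formula algebraically, and the cut-off model's normalizing constant $L\,\Gamma(1-\alpha, x_{\min}/L)$ is obtained here by numerical integration (Simpson's rule on a log scale, a choice made for simplicity) rather than through an incomplete-gamma routine, since $1 - \alpha$ is negative whenever $\alpha > 1$.

```python
import math

def lognormal_pdf(x, mu, sigma, x_min):
    """Log-normal density restricted to x >= x_min, renormalized by 1 - Phi((ln x_min - mu)/sigma)."""
    if x < x_min:
        return 0.0
    tail = 0.5 * math.erfc((math.log(x_min) - mu) / (sigma * math.sqrt(2)))  # 1 - Phi(z)
    kernel = math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2))
    return kernel / (tail * x * math.sqrt(2 * math.pi * sigma ** 2))

def stretched_exp_pdf(x, beta, lam, x_min):
    """Weibull / stretched-exponential density above x_min: the slide's formula with
    exp(lam * x_min**beta) folded into the last exponential to avoid overflow."""
    if x < x_min:
        return 0.0
    return beta * lam * x ** (beta - 1) * math.exp(-lam * (x ** beta - x_min ** beta))

def integrate_log_scale(f, a, b, n=20_000):
    """Composite Simpson's rule for the integral of f over [a, b], via the substitution x = e**t."""
    n += n % 2  # Simpson's rule needs an even number of intervals
    ta, tb = math.log(a), math.log(b)
    h = (tb - ta) / n
    total = 0.0
    for k in range(n + 1):
        x = min(max(math.exp(ta + k * h), a), b)  # guard against rounding just outside [a, b]
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * f(x) * x  # dx = x dt
    return total * h / 3

def make_cutoff_pdf(alpha, L, x_min):
    """Power law with exponential cut-off; the normalizer L * Gamma(1 - alpha, x_min / L)
    is computed numerically, truncating the integral at x_min + 60 L where e^(-x/L) is negligible."""
    def unnormalized(x):
        return (x / L) ** (-alpha) * math.exp(-x / L)
    z = integrate_log_scale(unnormalized, x_min, x_min + 60 * L)
    def pdf(x):
        return unnormalized(x) / z if x >= x_min else 0.0
    return pdf
```

Each density should integrate to about 1 over $[x_{\min}, \infty)$; for example, `integrate_log_scale(lambda x: lognormal_pdf(x, 0.0, 1.0, 1.0), 1.0, 1e6)` is a quick sanity check.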


[Figure: "R^2 values from samples": R² against sample size (1 to 10,000, log axis); black = Pareto, blue = log-normal; 500 replicates at each sample size.]

R² for a log-normal (limiting value > 0.9)
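The point of the R² figure can be reproduced in spirit with a small simulation: regress the log empirical survival function on log x for a Pareto sample and for a log-normal sample, then compare R². This is a sketch under assumptions chosen for illustration (sample size 5000, Pareto density ∝ x^-2.5, log-normal with μ = 0 and σ = 1, and fitting only the upper half of each sample as a stand-in for "the part that looks straight"), so it is not expected to reproduce the figure's exact numbers.

```python
import math
import random


def loglog_ccdf_r2(sample, tail_fraction=0.5):
    """R^2 of an OLS line through (log x, log empirical CCDF) for the top tail_fraction of a sample."""
    xs = sorted(sample)
    n = len(xs)
    start = int(n * (1.0 - tail_fraction))
    u = [math.log(xs[i]) for i in range(start, n)]
    # Empirical P(X >= x_(i)) = (n - i) / n for 0-based rank i, which is never 0
    v = [math.log((n - i) / n) for i in range(start, n)]
    m = len(u)
    mean_u, mean_v = sum(u) / m, sum(v) / m
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    return suv * suv / (suu * svv)  # squared correlation = R^2 of the simple OLS fit


rng = random.Random(42)
n = 5000
# random.paretovariate(a) has density a * x**(-a - 1) on x >= 1,
# so a = 1.5 corresponds to p(x) proportional to x**-2.5.
pareto = [rng.paretovariate(1.5) for _ in range(n)]
lognormal = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]

r2_pareto = loglog_ccdf_r2(pareto)
r2_lognormal = loglog_ccdf_r2(lognormal)
print(f"Pareto R^2 = {r2_pareto:.3f}, log-normal R^2 = {r2_lognormal:.3f}")
```

Both R² values come out high even though only one sample follows a power law, so a high R² on a log-log plot does not discriminate between them.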


Abusing linear regression makes the baby Gauss cry


Blogospheric Navel-Gazing

Shirky [5]: in-degree of weblogs follows a power law, with many consequences for media ecology, etc., etc.

Data via [6]


[Figure: In-degree distribution of weblogs, late 2003: survival function against in-degree on log-log axes (in-degree 1 to 1000; survival function 5e-04 to 5e-01).]


Estimating the Exponent | Estimating the Scaling Region | Goodness-of-Fit | Testing Against Alternatives | Visualization

Estimating the Exponent

Use maximum likelihood

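$$L(\alpha, x_{\min}) = n \log\frac{\alpha - 1}{x_{\min}} - \alpha \sum_{i=1}^{n} \log\frac{x_i}{x_{\min}}$$

$$\partial_\alpha L = \frac{n}{\alpha - 1} - \sum_{i=1}^{n} \log\frac{x_i}{x_{\min}}$$

$$\hat{\alpha} = 1 + \frac{n}{\sum_{i=1}^{n} \log (x_i / x_{\min})}$$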
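The closed-form estimator takes a single line of Python. The sketch below, with hypothetical helper names, also evaluates L(α, x_min) so one can check numerically that α̂ sits at the maximum; the synthetic data come from inverse-CDF sampling of a Pareto with α = 2.5 and x_min = 1, values chosen only for illustration.

```python
import math
import random


def pareto_loglik(alpha, xs, xmin):
    """L(alpha, xmin) = n log((alpha - 1)/xmin) - alpha * sum_i log(x_i / xmin), over x_i >= xmin."""
    tail = [x for x in xs if x >= xmin]
    s = sum(math.log(x / xmin) for x in tail)
    return len(tail) * math.log((alpha - 1.0) / xmin) - alpha * s


def pareto_alpha_mle(xs, xmin):
    """Root of dL/dalpha = 0: alpha_hat = 1 + n / sum_i log(x_i / xmin)."""
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def sample_pareto(n, alpha, xmin, rng):
    """Inverse-CDF sampling from P(X >= x) = (x / xmin)^(1 - alpha)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


rng = random.Random(1)
xs = sample_pareto(20_000, 2.5, 1.0, rng)
a_hat = pareto_alpha_mle(xs, 1.0)
print(f"alpha_hat = {a_hat:.3f}")
```

Since ∂²L/∂α² = −n/(α − 1)² < 0, the log-likelihood is concave in α, so nudging α̂ in either direction can only lower L.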


Properties of the MLE

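Consistent: $\hat{\alpha} \to \alpha$

Standard error: $\operatorname{Var}[\hat{\alpha}] = n^{-1}(\alpha - 1)^2 + O(n^{-2})$, so the standard error is about $(\alpha - 1)/\sqrt{n}$

Efficient: asymptotically, no consistent alternative has less variance

In particular, dominates regression

Asymptotically Gaussian: $\hat{\alpha} \rightsquigarrow \mathcal{N}\left(\alpha, (\alpha - 1)^2/n\right)$

Ancient: worked out in the 1950s [7, 8]

Computationally trivial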
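A quick Monte Carlo check of the variance and asymptotic-normality claims, as a sketch: simulate many Pareto samples, compare the spread of α̂ with (α − 1)/√n, and count how often the plug-in interval α̂ ± 1.96(α̂ − 1)/√n covers the true α. The true α, sample size, number of replicates and seed are arbitrary assumptions.

```python
import math
import random


def alpha_mle(xs, xmin):
    """Closed-form MLE for a continuous power law above xmin."""
    return 1.0 + len(xs) / sum(math.log(x / xmin) for x in xs)


def wald_interval(a_hat, n, z=1.96):
    """Approximate 95% interval from alpha_hat ~ N(alpha, (alpha - 1)^2 / n), alpha_hat plugged in."""
    se = (a_hat - 1.0) / math.sqrt(n)
    return a_hat - z * se, a_hat + z * se


alpha, xmin, n, reps = 2.5, 1.0, 500, 400
rng = random.Random(7)
estimates, covered = [], 0
for _ in range(reps):
    # Inverse-CDF draws from P(X >= x) = (x / xmin)^(1 - alpha)
    xs = [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]
    a_hat = alpha_mle(xs, xmin)
    estimates.append(a_hat)
    lo, hi = wald_interval(a_hat, n)
    covered += lo <= alpha <= hi

mean = sum(estimates) / reps
sd = math.sqrt(sum((a - mean) ** 2 for a in estimates) / (reps - 1))
theory_sd = (alpha - 1.0) / math.sqrt(n)
coverage = covered / reps
print(f"simulated sd = {sd:.4f}, (alpha-1)/sqrt(n) = {theory_sd:.4f}, coverage = {coverage:.3f}")
```

With these settings the simulated standard deviation should sit close to the theoretical one and coverage near 95%; with much smaller n the Gaussian approximation gets rougher.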


α̂ depends on x_min; "Hill" plot [9]

[Figure: Hill plot for weblog in-degree: α̂ (0 to 35) against x_min (1 to 1000, log axis).]
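A Hill-style curve is just the closed-form α̂ recomputed for a range of candidate x_min values. The sketch below uses hypothetical synthetic data (a uniform body on [1, 10) joined to a Pareto tail with α = 2.5 above 10) rather than the weblog data, which are not reproduced here, so the estimate should drift until x_min reaches the start of the power-law tail.

```python
import math
import random


def hill_curve(data, xmins):
    """For each candidate xmin, return (xmin, tail size, alpha_hat) using only x >= xmin."""
    out = []
    for xmin in xmins:
        tail = [x for x in data if x >= xmin]
        s = sum(math.log(x / xmin) for x in tail)
        alpha_hat = 1.0 + len(tail) / s if s > 0 else float("nan")
        out.append((xmin, len(tail), alpha_hat))
    return out


rng = random.Random(11)
# Hypothetical data: uniform body on [1, 10) plus a Pareto tail (alpha = 2.5) above 10
body = [rng.uniform(1.0, 10.0) for _ in range(2000)]
tail = [10.0 * (1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(2000)]
curve = hill_curve(body + tail, [1, 2, 5, 10, 20, 50, 100])
for xmin, n_tail, alpha_hat in curve:
    print(f"xmin = {xmin:>3}  tail size = {n_tail:>4}  alpha_hat = {alpha_hat:.2f}")
```

Plotting the third column against the first gives the Hill plot; note how the tail size, and hence the precision of α̂, shrinks as x_min grows.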


Estimating the Scaling Region

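Maximizing likelihood over x_min leads to trouble (try it and see)

Only want the scaling region in the tail anyway

Minimize discrepancy between fitted and empirical distributions [10]:

$$\hat{x}_{\min} = \mathop{\mathrm{argmin}}_{x_{\min}} \max_{x \ge x_{\min}} \left|\hat{P}_n(x) - P(x; \hat{\alpha}, x_{\min})\right| = \mathop{\mathrm{argmin}}_{x_{\min}} d_{KS}\left(\hat{P}_n, P(\hat{\alpha}, x_{\min})\right)$$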
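A sketch of the KS-minimization recipe for x̂_min: for each candidate x_min, fit α̂ by maximum likelihood on the data above it, measure d_KS between the empirical and fitted CDFs of that tail, and keep the candidate with the smallest distance. To stay fast in pure Python, this version only tries every `step`-th order statistic and requires at least `min_tail` points in the tail; both knobs are assumptions of this sketch rather than part of the method as stated. The test data are a hypothetical uniform-body, Pareto-tail mixture whose true tail starts at 10.

```python
import math
import random


def fit_alpha(tail, xmin):
    """Closed-form MLE alpha_hat = 1 + n / sum log(x / xmin) on the tail."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(tail, xmin, alpha):
    """max |empirical CDF - fitted power-law CDF| over a sorted tail sample."""
    m = len(tail)
    d = 0.0
    for j, x in enumerate(tail):
        fitted = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from j/m to (j+1)/m at x, so check both sides
        d = max(d, abs((j + 1) / m - fitted), abs(j / m - fitted))
    return d


def choose_xmin(data, step=10, min_tail=100):
    """Return (d_KS, xmin_hat, alpha_hat) minimizing d_KS over candidate xmin values."""
    xs = sorted(data)
    best = None
    for i in range(0, len(xs) - min_tail + 1, step):
        xmin = xs[i]
        tail = xs[i:]
        alpha = fit_alpha(tail, xmin)
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[0]:
            best = (d, xmin, alpha)
    return best


rng = random.Random(3)
# Hypothetical data: uniform body on [1, 10) plus a Pareto tail (alpha = 2.5) above 10
body = [rng.uniform(1.0, 10.0) for _ in range(2000)]
tail = [10.0 * (1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(2000)]
d_ks, xmin_hat, alpha_hat = choose_xmin(body + tail)
print(f"xmin_hat = {xmin_hat:.2f}, alpha_hat = {alpha_hat:.3f}, d_KS = {d_ks:.4f}")
```

Candidates well inside the uniform body give a visibly worse d_KS, so the minimum lands near the start of the tail without anyone eyeballing a plot.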


[Figure: d_KS against x_min (0 to 2000) for the weblog in-degree data, annotated "top 2.8%" and "268".]


Goodness-of-Fit

How can we tell if it's a good fit or not, if we can't use R²?

You shouldn't use R² that way even for a regression

Use a goodness-of-fit test!

Kolmogorov-Smirnov statistic is nice: for CDFs P, Q

$$d_{KS}(P, Q) = \max_x |P(x) - Q(x)|$$

Compare empirical CDF to theoretical one

Tabulated p-values assume the theoretical CDF isn't estimated from the data

Analytic corrections via heroic probability theory [11, pp. 99ff.]

or, use the bootstrap, like a civilized person
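A sketch of the bootstrap route to a goodness-of-fit p-value: fit α̂, compute the observed d_KS, then repeatedly simulate from the fitted power law, refit α on each synthetic sample, and count how often the synthetic d_KS is at least as large as the observed one. This simplified version holds x_min fixed; a fuller version would also re-estimate x_min in every replicate. The example data (1 plus an exponential) are hypothetical and deliberately not a power law, and the function names are illustrative.

```python
import math
import random


def fit_alpha(xs, xmin):
    """Closed-form MLE for a continuous power law above xmin."""
    return 1.0 + len(xs) / sum(math.log(x / xmin) for x in xs)


def ks_pareto(xs, xmin, alpha):
    """d_KS between the empirical CDF of xs and the power-law CDF 1 - (x/xmin)^(1-alpha)."""
    xs = sorted(xs)
    m = len(xs)
    d = 0.0
    for j, x in enumerate(xs):
        fitted = 1.0 - (x / xmin) ** (1.0 - alpha)
        d = max(d, abs((j + 1) / m - fitted), abs(j / m - fitted))
    return d


def bootstrap_gof_pvalue(xs, xmin, reps=100, seed=0):
    """Parametric-bootstrap p-value for the KS statistic, with xmin held fixed."""
    rng = random.Random(seed)
    tail = [x for x in xs if x >= xmin]
    alpha = fit_alpha(tail, xmin)
    observed = ks_pareto(tail, xmin, alpha)
    exceed = 0
    for _ in range(reps):
        sim = [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in tail]
        # Refit alpha on each synthetic sample so estimation error is part of the null
        if ks_pareto(sim, xmin, fit_alpha(sim, xmin)) >= observed:
            exceed += 1
    return alpha, observed, exceed / reps


rng = random.Random(5)
expo = [1.0 + rng.expovariate(1.0) for _ in range(1000)]
alpha, d, p = bootstrap_gof_pvalue(expo, 1.0)
print(f"alpha_hat = {alpha:.3f}, d_KS = {d:.4f}, bootstrap p = {p:.2f}")
```

A small p-value says the power law describes the data poorly; a large one says only that the data are consistent with a power law, not that the alternatives are ruled out.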


Given: n data points $x_{1:n}$

1 Estimate $\alpha$ and $x_{\min}$; $n_{\text{tail}}$ = # of data points $\geq x_{\min}$

2 Calculate $d_{KS}$ for the data and the best-fit power law $= d^*$

3 Draw n random values $b_1, \ldots, b_n$ as follows:
   (a) with probability $n_{\text{tail}}/n$, draw from the fitted power law
   (b) otherwise, pick one of the $x_i < x_{\min}$ uniformly at random

4 Find $\hat\alpha$, $\hat{x}_{\min}$, $d_{KS}$ for $b_{1:n}$

5 Repeat many times to get the distribution of $d_{KS}$ values

6 p-value = fraction of simulations where $d \geq d^*$

For the blogs: $p = 6.6 \times 10^{-2}$
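The bootstrap procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. It treats the data as continuous (in-degrees are really discrete). It estimates α by the continuous maximum-likelihood formula $\hat\alpha = 1 + n_{\text{tail}} / \sum_i \log(x_i / x_{\min})$. It picks $x_{\min}$ as the candidate that minimizes $d_{KS}$. The names `fit`, `ks_sorted`, `bootstrap_p_value` and the `min_tail` cutoff are hypothetical.

```python
import math
import random


def ks_sorted(tail, alpha, xmin):
    """d_KS between a sorted tail sample and the continuous power-law CDF."""
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail, start=1):
        f = 1.0 - (x / xmin) ** (1.0 - alpha)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def fit(data, min_tail=10):
    """Return (alpha_hat, xmin_hat, d_KS), choosing xmin to minimize d_KS."""
    xs = sorted(data)
    best = None
    for k in range(len(xs) - min_tail + 1):
        xmin = xs[k]
        if k > 0 and xs[k - 1] == xmin:
            continue  # this xmin was already tried at its first occurrence
        tail = xs[k:]  # all points >= xmin, already sorted
        s = sum(math.log(x / xmin) for x in tail)
        if s == 0.0:
            continue
        alpha = 1.0 + len(tail) / s  # continuous power-law MLE
        d = ks_sorted(tail, alpha, xmin)
        if best is None or d < best[2]:
            best = (alpha, xmin, d)
    return best


def bootstrap_p_value(data, reps=100, seed=0):
    """Semi-parametric bootstrap p-value for the power-law fit (steps 1-6)."""
    rng = random.Random(seed)
    alpha, xmin, d_star = fit(data)  # steps 1-2
    n = len(data)
    body = [x for x in data if x < xmin]
    p_tail = (n - len(body)) / n  # n_tail / n
    exceed = 0
    for _ in range(reps):  # step 5
        sample = []
        for _ in range(n):  # step 3
            if rng.random() < p_tail:
                # (a) inverse-transform draw from the fitted power law
                u = rng.random()
                sample.append(xmin * (1.0 - u) ** (-1.0 / (alpha - 1.0)))
            else:
                # (b) resample one of the observed points below xmin
                sample.append(rng.choice(body))
        if fit(sample)[2] >= d_star:  # step 4: refit, including xmin
            exceed += 1
    return exceed / reps  # step 6
```

Refitting $x_{\min}$ inside every replicate makes each fit O(n²), so this sketch is only practical for modest n. A small p-value is evidence against the power law. A large one only means this test cannot reject it; it does not show that the power law is correct.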


Testing Against Alternatives

Compare against alternatives: more statistical power, more substantive information

∗IC (AIC, BIC, and other information criteria) is sub-optimal here

Better: Vuong's normalized log-likelihood-ratio test [12]

Two models, θ, ψ

$$R(\psi,\theta) = \log p_\psi(x_{1:n}) - \log p_\theta(x_{1:n})$$

$R(\psi,\theta) > 0$ means the data were more likely under ψ than under θ

How much more likely do they need to be?


Distribution of Likelihood Ratios: Fixed Models

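Assume $X_1, X_2, \ldots$ all IID, with true distribution ν

Fix θ and ψ; what is the distribution of $n^{-1}R(\psi,\theta)$?

$$n^{-1}R(\psi,\theta) = \frac{\log p_\psi(x_{1:n}) - \log p_\theta(x_{1:n})}{n} = \frac{1}{n}\sum_{i=1}^{n}\log\frac{p_\psi(x_i)}{p_\theta(x_i)}$$

This is a mean of IID terms, so use the law of large numbers:

$$\frac{1}{n}R(\psi,\theta) \to \mathbb{E}_\nu\left[\log\frac{p_\psi(X)}{p_\theta(X)}\right] = D(\nu\|\theta) - D(\nu\|\psi)$$

$R(\psi,\theta) > 0$ ≈ ψ diverges less from ν than θ does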
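To see the law-of-large-numbers limit numerically, here is a small Python check with Gaussian stand-ins. These are an assumption, chosen only because their KL divergences have a closed form. The setup is ν = N(0, 1), ψ = N(0.5, 1), θ = N(1, 1), so $D(\nu\|\theta) - D(\nu\|\psi) = 0.5 - 0.125 = 0.375$.

```python
import math
import random


def normal_logpdf(x, mu, sigma=1.0):
    """log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)


def kl_normal(mu0, mu1, sigma=1.0):
    """D(N(mu0, sigma^2) || N(mu1, sigma^2)) = (mu0 - mu1)^2 / (2 sigma^2)."""
    return (mu0 - mu1) ** 2 / (2.0 * sigma**2)


def mean_log_ratio(xs, mu_psi, mu_theta):
    """n^{-1} R(psi, theta) for two fixed Gaussian models."""
    total = sum(normal_logpdf(x, mu_psi) - normal_logpdf(x, mu_theta) for x in xs)
    return total / len(xs)


rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # draws from nu = N(0, 1)
estimate = mean_log_ratio(xs, mu_psi=0.5, mu_theta=1.0)
limit = kl_normal(0.0, 1.0) - kl_normal(0.0, 0.5)  # D(nu||theta) - D(nu||psi)
```

Here `estimate` is positive and close to `limit`, matching the reading that ψ diverges less from ν than θ does.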


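Use the CLT:

$$\frac{1}{\sqrt{n}}R(\psi,\theta) \rightsquigarrow \mathcal{N}\left(\sqrt{n}\,\big(D(\nu\|\theta) - D(\nu\|\psi)\big),\ \omega^2_{\psi,\theta}\right)$$

where

$$\omega^2_{\psi,\theta} = \mathrm{Var}\left[\log\frac{p_\psi(X)}{p_\theta(X)}\right]$$

so if the models are equally good, we get a mean-zero Gaussian, but if one is better, $R(\psi,\theta) \to \pm\infty$, depending on which one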
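A Python simulation of the CLT statement, again using hypothetical Gaussian models. With ν = N(0, 1), ψ = N(0.5, 1) and θ = N(−0.5, 1), both models are equally far from ν. The pointwise log-ratio is then exactly x, so $R/\sqrt{n}$ should behave like N(0, 1) (ω² = 1). With θ = N(1, 1) instead, ψ is better, and the centre grows like $\sqrt{n} \times 0.375$.

```python
import math
import random
import statistics


def normal_logpdf(x, mu, sigma=1.0):
    """log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)


def scaled_R(n, mu_psi, mu_theta, rng):
    """Draw x_1..x_n from nu = N(0, 1); return R(psi, theta) / sqrt(n)."""
    r = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        r += normal_logpdf(x, mu_psi) - normal_logpdf(x, mu_theta)
    return r / math.sqrt(n)


rng = random.Random(2)
# Equally good: D(nu||psi) = D(nu||theta) = 0.125, and omega^2 = Var[X] = 1.
z = [scaled_R(100, 0.5, -0.5, rng) for _ in range(1000)]
z_mean, z_sd = statistics.mean(z), statistics.stdev(z)  # theory: 0 and 1
# psi better: centre sqrt(n) * 0.375 = 37.5 at n = 10,000, with omega = 0.5.
z_better = scaled_R(10_000, 0.5, 1.0, rng)
```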


Distribution of R with Estimated Models

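Two classes of models Ψ, Θ; $\hat\psi$, $\hat\theta$ = ML estimated models

$\hat\psi \to \psi^*$, $\hat\theta \to \theta^*$: converging to the pseudo-truth; $\psi^* \neq \theta^*$

Some regularity assumptions

Everything works out as if there were no estimation:

$$\frac{1}{\sqrt{n}}R(\hat\psi,\hat\theta) \rightsquigarrow \mathcal{N}\left(\sqrt{n}\,\big(D(\nu\|\theta^*) - D(\nu\|\psi^*)\big),\ \omega^2_{\psi^*,\theta^*}\right)$$

$$\frac{1}{n}R(\hat\psi,\hat\theta) \to D(\nu\|\theta^*) - D(\nu\|\psi^*)$$

$$\hat\omega^2 \equiv \mathrm{Var}_{\text{sample}}\left[\log\frac{p_{\hat\psi}(X)}{p_{\hat\theta}(X)}\right] \to \omega^2_{\psi^*,\theta^*}$$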


Vuong's Test for Non-Nested Model Classes

Assume all conditions from before

If the two models are really equally close to the truth,

$$\frac{R}{\sqrt{n\hat\omega^2}} \rightsquigarrow \mathcal{N}(0, 1)$$

but if one is better, the normalized log-likelihood ratio goes to ±∞, telling you which is better

Don't need to adjust for the number of parameters, but any o(n) adjustment is fine; [13] is probably better than ∗IC

Does not assume that the truth is in either Ψ or Θ

Does assume $\psi^* \neq \theta^*$
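A sketch of the test statistic in Python. It assumes you already have the pointwise log-likelihoods $\log p_{\hat\psi}(x_i)$ and $\log p_{\hat\theta}(x_i)$ of the same observations under the two fitted models; fitting them is not shown. The function name `vuong_test` is hypothetical. The two-sided p-value uses the N(0, 1) limit and applies no o(n) penalty.

```python
import math
import statistics


def vuong_test(loglik_psi, loglik_theta):
    """Vuong's normalized log-likelihood-ratio test.

    Inputs are pointwise log-likelihoods of the same n observations under
    two fitted models. Returns (R, z, p): R = total log-likelihood ratio,
    z = R / sqrt(n * omega_hat^2), p = two-sided p-value from N(0, 1).
    z > 0 favours psi, z < 0 favours theta.
    """
    if len(loglik_psi) != len(loglik_theta):
        raise ValueError("log-likelihoods must refer to the same observations")
    diffs = [a - b for a, b in zip(loglik_psi, loglik_theta)]
    n = len(diffs)
    R = sum(diffs)
    omega2 = statistics.pvariance(diffs)  # sample variance of the log-ratios
    if omega2 == 0.0:
        raise ValueError("log-ratios are constant; the models coincide on the data")
    z = R / math.sqrt(n * omega2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| >= |z|) for Z ~ N(0, 1)
    return R, z, p
```

For a power law against a log-normal, both models would be fitted to the same tail, and n would be the number of tail observations.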


Back to Blogs

Fit a log-normal to the same tail (to give the advantage to the power law)

$R(\text{power law}, \text{log-normal}) = -0.85$, $\hat\omega = 0.098$

$$\frac{R}{\sqrt{n\hat\omega^2}} = -0.83$$

so the log-normal fits better, but not by much: we'd see fluctuations at least that big 41% of the time if they were equally good
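The 41% figure is the two-sided standard-normal tail probability of the reported statistic. A quick Python check:

```python
import math

z = -0.83  # normalized log-likelihood ratio reported for the blog tail
p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided: P(|Z| >= 0.83), Z ~ N(0, 1)
print(f"{p:.2f}")  # prints 0.41
```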


Fitting a log-normal to the complete data

[Figure: In-degree distribution of weblogs, late 2003 (survival function vs. in-degree, log-log axes)]


Visualization

Beyond the log-log plot: Handcock and Morris's relative distribution [14, 15]

Compare two whole distributions, not just mean, variance, etc.

Have a reference distribution, CDF $F_0$ (or just a reference sample), and a comparison sample $y_1, \ldots, y_n$ with CDF F and density f

Construct relative data

$$r_i = F_0(y_i)$$

Relative CDF:

$$G(r) = F(F_0^{-1}(r))$$

Relative density:

$$g(r) = \frac{f(F_0^{-1}(r))}{f_0(F_0^{-1}(r))}$$


Relative data are uniform ⇔ distributions are the same

g(r) tells us where and how the distributions differ

Can estimate G(r) by the empirical CDF of the $r_i$

Can estimate g(r) by non-parametric density estimation on the $r_i$

Invariant under any monotone increasing transformation of the data (multiplication by a positive constant, taking logs, etc.)

Related to Neyman's smooth test of goodness-of-fit

Can adjust for covariates flexibly [15]

R package: reldist, from CRAN
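A minimal Python sketch of these estimates. It computes the relative data $r_i = F_0(y_i)$, uses the empirical CDF of the $r_i$ as an estimate of G, and uses a histogram as an estimate of g; the histogram stands in for a smoother nonparametric density estimator. This is not the `reldist` API. The function names and the Exponential(1) reference in the example are hypothetical.

```python
import bisect
import math
import random


def relative_data(comparison, ref_cdf):
    """Sorted relative data r_i = F0(y_i)."""
    return sorted(ref_cdf(y) for y in comparison)


def relative_cdf(r_sorted, grid):
    """Empirical CDF of the r_i at each grid point: an estimate of G(r)."""
    n = len(r_sorted)
    return [bisect.bisect_right(r_sorted, t) / n for t in grid]


def relative_density(r_sorted, bins=10):
    """Histogram estimate of g(r) on [0, 1]; flat at 1 when F = F0."""
    counts = [0] * bins
    for r in r_sorted:
        counts[min(int(r * bins), bins - 1)] += 1
    return [c * bins / len(r_sorted) for c in counts]  # count / (n * bin width)


def exp1_cdf(y):
    """Hypothetical reference distribution: Exponential(1)."""
    return 1.0 - math.exp(-y)


rng = random.Random(3)
same_sample = [rng.expovariate(1.0) for _ in range(20_000)]
heavier_sample = [rng.expovariate(0.5) for _ in range(20_000)]
same = relative_density(relative_data(same_sample, exp1_cdf))
heavier = relative_density(relative_data(heavier_sample, exp1_cdf))
```

When the comparison sample matches the reference, `same` is close to 1 in every bin. For the heavier-tailed sample, `heavier` is low near r = 0 and high near r = 1, which shows where the two distributions differ.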


Relative Distribution with Power Laws

1 Estimate a power-law distribution from the data

2 Use that as the reference distribution
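The two steps can be sketched in Python. The sketch assumes a continuous power law, with $x_{\min}$ already chosen and α fitted by the continuous maximum-likelihood formula; the helper names are hypothetical. If the power law is right, the relative data should be roughly uniform, with histogram heights near 1. Systematic departures show where in the tail the fit fails.

```python
import math
import random


def fit_alpha(tail, xmin):
    """Step 1: continuous power-law MLE, alpha_hat = 1 + n / sum(log(x / xmin))."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def power_law_relative_data(data, xmin):
    """Step 2: fitted power law as reference F0; return (alpha_hat, sorted r_i)."""
    tail = [x for x in data if x >= xmin]
    alpha = fit_alpha(tail, xmin)
    r = sorted(1.0 - (x / xmin) ** (1.0 - alpha) for x in tail)
    return alpha, r


def histogram_density(r, bins=5):
    """Histogram estimate of the relative density on [0, 1]."""
    counts = [0] * bins
    for v in r:
        counts[min(int(v * bins), bins - 1)] += 1
    return [c * bins / len(r) for c in counts]


rng = random.Random(5)
# Synthetic data from an exact power law with alpha = 2.5, xmin = 1.
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(20_000)]
alpha_hat, r = power_law_relative_data(data, xmin=1.0)
heights = histogram_density(r)
```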


[Figure: relative density (vertical axis) vs. reference proportion (horizontal axis, 0 to 1), with the fitted power law as reference; top axis marks the corresponding data values, 270 to 2100]


How Bad Is the Literature?

[10] looked at 24 claimed power laws:

word frequency, protein interaction degree (yeast), metabolic network degree (E. coli), Internet autonomous system network, calls received, intensity of wars, terrorist attack fatalities, bytes per HTTP request, species per genus, # sightings per bird species, population affected by blackouts, sales of best-sellers, population of US cities, area of wildfires, solar flare intensity, earthquake magnitude, religious sect size, surname frequency, individual net worth, citation counts, # papers authored, # hits per URL, in-degree per URL, # entries in e-mail address books

Of these, the only clear power law is word frequency

The rest: indistinguishable from log-normal and/or stretched exponential; and/or a power law with a cut-off fits significantly better than a pure power law; and/or goodness-of-fit is just horrible


What's Bad About Hallucinating Power Laws?

Scientists should not try to explain things which don't happen, e.g., a dozen years of theorizing about why animal foraging patterns should follow a power law, after [16], when they don't [17]

Decision-makers waste resources planning for power laws which don't exist


Does It Really Matter Whether It's a Power Law?

Maybe all that matters is that the distribution has a heavy tail (probably true for Shirky)

Then don't say that it's a power law

Do look at density estimation methods for heavy-tailed distributions [18, 19]:

   Data-independent transformation from [0, ∞) to [0, 1]

   Nonparametric density estimate on [0, 1]

   Inverse-transform the estimate back to [0, ∞)
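A Python sketch of this three-step recipe. The transformation $T(x) = x/(1+x)$ and the plain histogram on [0, 1] are simplifying assumptions for illustration, not the specific choices of [18, 19]. The back-transform is the change-of-variables formula $\hat f_X(x) = \hat f_Y(T(x))\,T'(x)$, with $T'(x) = 1/(1+x)^2$.

```python
import random


def transform(x):
    """Hypothetical data-independent map T from [0, inf) onto [0, 1)."""
    return x / (1.0 + x)


def transform_deriv(x):
    """T'(x) = 1 / (1 + x)^2."""
    return 1.0 / (1.0 + x) ** 2


def heavy_tail_density(data, bins=50):
    """Transform, estimate the density on [0, 1] by a histogram, transform back."""
    counts = [0] * bins
    for x in data:
        counts[min(int(transform(x) * bins), bins - 1)] += 1
    heights = [c * bins / len(data) for c in counts]  # estimate of f_Y on [0, 1]

    def density(x):
        # Change of variables: f_X(x) = f_Y(T(x)) * T'(x).
        y = transform(x)
        return heights[min(int(y * bins), bins - 1)] * transform_deriv(x)

    return density


rng = random.Random(11)
# Example data with heavy-tailed density 1 / (1 + x)^2, for which T(X) is uniform.
data = []
for _ in range(50_000):
    u = rng.random()
    data.append(u / (1.0 - u))
f_hat = heavy_tail_density(data)
```

Because the transformation does not depend on the data, the tail behaviour of the estimate comes from the fixed $T'(x)$ factor, while the histogram only has to handle a bounded interval.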


The Correct Line

1 Lots of distributions give straightish log-log plots

2 Regression on log-log plots is bad; don't do it, and don't believe those who do it.

3 Use maximum likelihood to estimate the scaling exponent

4 Use goodness of fit to estimate the scaling region

5 Use goodness-of-fit tests to check goodness of fit

6 Use Vuong's test to check alternatives

7 Ask yourself whether you really care
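Steps 3, 4 and 6 can be sketched in standard-library Python for continuous data, loosely following the procedure in [10]:

- Step 3: the maximum-likelihood estimate α̂ = 1 + n / Σ ln(x_i / x_min) over the tail x_i ≥ x_min.
- Step 4: x_min chosen to minimise the Kolmogorov-Smirnov distance between the tail and the fitted power law.
- Step 6: Vuong's normalised log-likelihood ratio [12] against one alternative.

The function names, the `min_tail` guard and the use of an exponential as the only alternative are illustrative assumptions. Step 5, a bootstrap goodness-of-fit test, is not shown. Python's `random.paretovariate(a)` gives P(X > x) = x^(-a) for x ≥ 1, so the simulated data below have scaling exponent α = a + 1 = 2.5.

```python
import math
import random


def fit_alpha(tail, xmin):
    """Continuous power-law MLE for p(x) proportional to x^-alpha on x >= xmin."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(tail, xmin, alpha):
    """Largest gap between the tail's empirical CDF and the fitted power-law CDF."""
    xs = sorted(tail)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst


def choose_xmin(data, min_tail=50):
    """Try each observed value as xmin and keep the one whose fitted tail is
    closest in KS distance. min_tail is a hypothetical guard against tiny
    tails. Returns (xmin, alpha, ks), or None if the data are too few."""
    xs = sorted(data)
    best = None
    for i, xmin in enumerate(xs):
        if i > 0 and xs[i - 1] == xmin:
            continue  # repeated value: its first occurrence already used the full tail
        tail = xs[i:]
        if len(tail) < min_tail:
            break
        alpha = fit_alpha(tail, xmin)
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (xmin, alpha, d)
    return best


def vuong_power_vs_exponential(tail, xmin, alpha):
    """Vuong's normalised log-likelihood ratio: power law vs. an exponential
    fitted to the same tail (an assumed alternative). z > 0 leans toward the
    power law, z < 0 toward the exponential; p is the two-sided normal p-value."""
    lam = 1.0 / (sum(tail) / len(tail) - xmin)  # exponential MLE above xmin
    diffs = [
        (math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin))
        - (math.log(lam) - lam * (x - xmin))
        for x in tail
    ]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    z = math.sqrt(n) * mean / sd  # equals R / (sd * sqrt(n)) with R = sum(diffs)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Usage on simulated data with scaling exponent 2.5.
random.seed(2)
data = [random.paretovariate(1.5) for _ in range(500)]
xmin, alpha, ks = choose_xmin(data)
tail = [x for x in data if x >= xmin]
z, p = vuong_power_vs_exponential(tail, xmin, alpha)
print(f"xmin={xmin:.3f} alpha={alpha:.3f} KS={ks:.3f} z={z:.2f} p={p:.3f}")
```

A small |z| (large p) means the data cannot tell the two models apart, which is itself worth reporting. A fuller analysis would compare against several alternatives, not just one.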

References

[1] Herbert A. Simon. On a class of skew distribution functions. Biometrika, 42:425–440, 1955. URL http://www.jstor.org/pss/2333389.

[2] Yuji Ijiri and Herbert A. Simon. Skew Distributions and the Sizes of Business Firms. North-Holland, Amsterdam, 1977. With Charles P. Bonini and Theodore A. van Wormer.

[3] William J. Reed and Barry D. Hughes. From gene families and genera to incomes and Internet file sizes: Why power laws are so common in nature. Physical Review E, 66:067103, 2002. doi: 10.1103/PhysRevE.66.067103.

[4] B. A. Maguire, E. S. Pearson, and A. H. A. Wynn. The time intervals between industrial accidents. Biometrika, 39:168–180, 1952. URL http://www.jstor.org/pss/2332475.

[5] Clay Shirky. Power laws, weblogs, and inequality. In Mitch Ratcliffe and Jon Lebkowsky, editors, Extreme Democracy, forthcoming. URL http://www.shirky.com.writings/powerlaw_weblog.html.

[6] Henry Farrell and Daniel Drezner. The power and politics of blogs. Public Choice, 134:15–30, 2008. URL http://www.utsc.utoronto.ca/~farrell/blogpaperfinal.pdf.

[7] A. N. M. Muniruzzaman. On measures of location and dispersion and tests of hypotheses in a Pareto population. Bulletin of the Calcutta Statistical Association, 7:115–123, 1957.

[8] H. L. Seal. The maximum likelihood fitting of the discrete Pareto law. Journal of the Institute of Actuaries, 78:115–121, 1952. URL http://www.actuaries.org.uk/files/pdf/library/JIA-078/0115-0121.pdf.

[9] B. M. Hill. A simple general approach to inference about the tail of a distribution. Annals of Statistics, 3:1163–1174, 1975. URL http://projecteuclid.org/euclid.aos/1176343247.

[10] Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Review, 51:661–703, 2009. URL http://arxiv.org/abs/0706.1062.

[11] David Pollard. Convergence of Stochastic Processes. Springer Series in Statistics. Springer-Verlag, New York, 1984. URL http://www.stat.yale.edu/~pollard/1984book/.

[12] Quang H. Vuong. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57:307–333, 1989. URL http://www.jstor.org/pss/1912557.

[13] Hwan-sik Choi and Nicholas M. Kiefer. Differential geometry and bias correction in nonnested hypothesis testing. Online preprint, 2006. URL http://www.arts.cornell.edu/econ/kiefer/GeometryMS6.pdf.

[14] Mark S. Handcock and Martina Morris. Relative distribution methods. Sociological Methodology, 28:53–97, 1998. URL http://www.jstor.org/pss/270964.

[15] Mark S. Handcock and Martina Morris. Relative Distribution Methods in the Social Sciences. Springer-Verlag, Berlin, 1999.

[16] G. M. Viswanathan, V. Afanasyev, S. V. Buldyrev, E. J. Murphy, P. A. Prince, and H. E. Stanley. Lévy flight search patterns of wandering albatrosses. Nature, 381:413–415, 1996. URL http://polymer.bu.edu/hes/articles/vabmps96.pdf.

[17] Andrew M. Edwards, Richard A. Phillips, Nicholas W. Watkins, Mervyn P. Freeman, Eugene J. Murphy, Vsevolod Afanasyev, Sergey V. Buldyrev, M. G. E. da Luz, E. P. Raposo, H. Eugene Stanley, and Gandhimohan M. Viswanathan. Revisiting Lévy flight search patterns of wandering albatrosses, bumblebees and deer. Nature, 449:1044–1048, 2007. doi: 10.1038/nature06199. URL http://polymer.bu.edu/hes/articles/epwfmabdrsgv07.pdf.

[18] Natalia M. Markovitch and Udo R. Krieger. Nonparametric estimation of long-tailed density functions and its application to the analysis of World Wide Web traffic. Performance Evaluation, 42:205–222, 2000. doi: 10.1016/S0166-5316(00)00031-6.

[19] Natalia Markovich. Nonparametric Analysis of Univariate Heavy-Tailed Data: Research and Practice. John Wiley, New York, 2007.
