An introduction to power laws
Colin Gillespie, November 15, 2012
Transcript
1. An introduction to power laws. Colin Gillespie, November 15, 2012

2. Talk outline
   1 Introduction to power laws
   2 Distributional properties
   3 Parameter inference
   4 Power law generating mechanisms
   http://xkcd.com/

3. Classic example: distribution of US cities
   Some data sets vary over an enormous range. US towns and cities run from Duffield (pop 52) to New York City (pop 8 million).
   The data is highly right-skewed.
   When the data is plotted on a logarithmic scale, it seems to follow a straight line.
   This observation is attributed to Zipf.
   [Figure: histogram of the number of cities by city population, and the cumulative number of cities vs city population on log-log axes]

4. Distribution of world cities
   World city populations for 8 countries, log size vs log rank.
   [Figure: log population vs log rank for world cities, from New York, Mumbai, Sao Paulo and Delhi at the top down to cities near 10^6; approximately a straight line on log-log axes]
   http://brenocon.com/blog/2009/05/zipfs-law-and-world-city-populations/

5-6. What does it mean?
   Let p(x)dx be the fraction of cities with a population between x and x + dx.
   If this histogram is a straight line on log-log scales, then
       ln p(x) = -α ln x + c
   where α and c are constants. Hence
       p(x) = C x^(-α), where C = e^c
   Distributions of this form are said to follow a power law. The constant α is called the exponent of the power law. We typically don't care about c.

7. The power law distribution
   Name                f(x)                                   Notes
   Power law           x^(-α)                                 Pareto distribution
   Exponential         e^(-λx)
   Log-normal          x^(-1) exp(-(ln x - μ)^2 / (2σ^2))
   Zeta distribution   x^(-α)                                 Power law
   Zipf's dist         x^(-α), x = 1, ..., n                  Power law
   Yule                Γ(x) / Γ(x + α)
   Poisson             λ^x / x!
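The straight-line-on-log-log claim above is easy to check numerically. A minimal Python sketch; the values of α and C are illustrative, not taken from the talk:

```python
import math

# Hypothetical constants for p(x) = C * x**(-alpha)
alpha, C = 2.5, 1.0

def log_p(x):
    # ln p(x) = ln C - alpha * ln x, i.e. linear in ln x with slope -alpha
    return math.log(C) - alpha * math.log(x)

# The slope between any two points on log-log axes is exactly -alpha
x1, x2 = 10.0, 1000.0
slope = (log_p(x2) - log_p(x1)) / (math.log(x2) - math.log(x1))
```

The intercept recovers ln C = c, the constant that, as the slide notes, we typically don't care about.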
8-9. Alleged power-law phenomena
   The frequency of occurrence of unique words in the novel Moby Dick by Herman Melville.
   The numbers of customers affected in electrical blackouts in the United States between 1984 and 2002.
   The number of links to web sites found in a 1997 web crawl of about 200 million web pages.
   The number of hits on web pages.
   The number of papers scientists write.
   The number of citations received by papers.
   Annual incomes.
   Sales of books, music; in fact anything that can be sold.

10. Zipf plots
   [Figure: empirical tail probabilities 1 - P(x) on log-log axes for six data sets: blackouts, fires, flares, Moby Dick, terrorism, and web links]

11. Distributional properties

12. The power law distribution
   The power-law distribution is
       p(x) ∝ x^(-α)
   where α, the scaling parameter, is a constant.
   The scaling parameter typically lies in the range 2 < α < 3, although there are some occasional exceptions.
   Typically, the entire process doesn't obey a power law. Instead, the power law applies only for values greater than some minimum x_min.

13. Power law: PDF & CDF
   For the continuous power law, the pdf is
       p(x) = ((α - 1) / x_min) (x / x_min)^(-α)
   where α > 1 and x_min > 0.
   The CDF is:
       P(x) = 1 - (x / x_min)^(-α + 1)
   [Figure: PDF and CDF for α = 1.50, 1.75, 2.00, 2.25, 2.50]

14. Power law: PDF & CDF
   For the discrete power law, the pmf is
       p(x) = x^(-α) / ζ(α, x_min)
   where
       ζ(α, x_min) = Σ_{n=0}^∞ (n + x_min)^(-α)
   is the generalised zeta function. When x_min = 1, ζ(α, 1) is the standard zeta function.

15. Moments
   Moments:
       <x^m> = E[X^m] = ∫_{x_min}^∞ x^m p(x) dx = ((α - 1) / (α - 1 - m)) x_min^m
   Hence, when m ≥ α - 1, we have diverging moments.
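The closed-form moment above can be cross-checked by numerical integration. A sketch with illustrative values of α and x_min; the grid size and upper cut-off are arbitrary choices:

```python
import math

alpha, xmin = 2.5, 1.0  # illustrative; need m < alpha - 1 for convergence

def pdf(x):
    # continuous power-law pdf: (alpha - 1)/xmin * (x/xmin)**(-alpha)
    return (alpha - 1) / xmin * (x / xmin) ** (-alpha)

def moment_closed(m):
    # <x^m> = (alpha - 1)/(alpha - 1 - m) * xmin**m
    return (alpha - 1) / (alpha - 1 - m) * xmin ** m

def moment_numeric(m, upper=1e8, n=200_000):
    # trapezoid rule on a log-spaced grid (the integrand has a heavy tail)
    lo, hi = math.log(xmin), math.log(upper)
    h = (hi - lo) / n
    total, prev = 0.0, None
    for i in range(n + 1):
        x = math.exp(lo + i * h)
        f = x ** m * pdf(x) * x  # extra factor x from dx = x d(ln x)
        if prev is not None:
            total += 0.5 * (prev + f) * h
        prev = f
    return total
```

For m = 1 and α = 2.5 both routes give a mean of 3 x_min; for m ≥ α - 1 the closed form breaks down, matching the diverging-moments remark.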
16. Moments
   Moments:
       <x^m> = E[X^m] = ∫_{x_min}^∞ x^m p(x) dx = ((α - 1) / (α - 1 - m)) x_min^m
   Hence, when m ≥ α - 1, we have diverging moments. So when
       α < 2, all moments are infinite;
       α < 3, all second- and higher-order moments are infinite;
       α < 4, all third- and higher-order moments are infinite;
       ...

17-19. Distributional properties
   For any power law with exponent α > 1, the median is defined:
       x_1/2 = 2^(1/(α - 1)) x_min
   If we use a power law to model wealth distribution, then we might be interested in the fraction of wealth in the richer half:
       ∫_{x_1/2}^∞ x p(x) dx / ∫_{x_min}^∞ x p(x) dx = (x_1/2 / x_min)^(-α + 2) = 2^(-(α - 2)/(α - 1))
   provided α > 2, so the integrals converge.
   When the wealth distribution was modelled using a power law, α was estimated to be 2.1, so 2^(-0.1/1.1) ≈ 94% of the wealth is in the hands of the richer 50% of the population.

20-21. Top-heavy distributions & the 80/20 rule
   Pareto principle, aka the 80/20 rule: the law of the vital few, and the principle of factor sparsity, states that for many events roughly 80% of the effects come from 20% of the causes.
   For example, the distribution of world GDP:
       Population quantile   Income
       Richest 20%           82.70%
       Second 20%            11.75%
       Third 20%              2.30%
       Fourth 20%             1.85%
       Poorest 20%            1.40%
   Other examples are:
       80% of your profits come from 20% of your customers;
       80% of your complaints come from 20% of your customers;
       80% of your profits come from 20% of the time you spend.
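The richer-half calculation can be verified directly from the closed-form tail integral. A sketch using α = 2.1 from the wealth example; x_min = 1 is an arbitrary scale:

```python
import math

alpha, xmin = 2.1, 1.0  # alpha = 2.1 as in the wealth example

# median of the power law: x_1/2 = 2**(1/(alpha - 1)) * xmin
x_half = 2 ** (1 / (alpha - 1)) * xmin

def tail_wealth(a):
    # closed form of integral_a^inf x p(x) dx for the continuous
    # power-law pdf; valid only for alpha > 2
    return (alpha - 1) / (alpha - 2) * xmin ** (alpha - 1) * a ** (2 - alpha)

# fraction of total wealth held above the median
frac = tail_wealth(x_half) / tail_wealth(xmin)
```

Here frac equals 2^(-(α - 2)/(α - 1)) ≈ 0.94, the "about 94%" figure, and the CDF evaluated at x_half is exactly 0.5, confirming the median formula.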
22-23. Scale-free distributions
   The power law distribution is often referred to as a scale-free distribution. A power law is the only distribution that is the same regardless of the scale.
   For any b, we have
       p(bx) = g(b) p(x)
   That is, if we increase the scale by which we measure x by a factor of b, the shape of the distribution p(x) is unchanged, except for a multiplicative constant. The power-law distribution is the only distribution with this property.

24. Random numbers
   For the continuous case, we can generate random numbers using the standard inversion method:
       x = x_min (1 - u)^(-1/(α - 1))
   where u ~ U(0, 1).

25-26. Random numbers
   The discrete case is a bit more tricky. Instead, we have to solve the CDF numerically, by doubling up and a binary search.
   So for a given u, we first bound the solution to the equation via:
       1: x2 := x_min
       2: repeat
       3:     x1 := x2
       4:     x2 := 2 x1
       5: until P(x2) < 1 - u
   Basically, the algorithm tests whether the solution lies in [x, 2x), starting with x = x_min. Once we have the region, we use a binary search.

27. Fitting power law distributions

28. Fitting power law distributions
   Suppose we know x_min and wish to estimate the exponent α.

29. Method 1
   1 Bin your data: [x_min, x_min + Δx), [x_min + Δx, x_min + 2Δx), ...
   2 Plot your data on a log-log plot.
   3 Use least squares to estimate α.
   [Figure: binned CDFs for bin sizes 0.01, 0.1, and 1.0]
   You could also use logarithmic binning, which is better, or should I say not as bad?

30. Method 2
   Similar to method 1, but:
       don't bin, just plot the data CDF;
       then use least squares to estimate α.
   Using linear regression is a bad idea.
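The two random-number recipes above (inversion for the continuous case; doubling then binary search against the discrete CDF) can be sketched in Python. The Hurwitz zeta routine and the parameter values are my own illustrative choices, not from the talk:

```python
import math

alpha, xmin = 2.5, 1  # illustrative parameters

def hurwitz_zeta(a, q, terms=20_000):
    # zeta(a, q) = sum_{k>=0} (q + k)**-a, truncated with a simple
    # integral-plus-half-term tail correction
    s = sum((q + k) ** -a for k in range(terms))
    return s + (q + terms) ** (1 - a) / (a - 1) + 0.5 * (q + terms) ** -a

Z = hurwitz_zeta(alpha, xmin)

def cdf(x):
    # P(X <= x) for the discrete power law on {xmin, xmin+1, ...}
    return 1 - hurwitz_zeta(alpha, x + 1) / Z

def rpower_continuous(u):
    # inversion method: x = xmin * (1 - u)**(-1/(alpha - 1))
    return xmin * (1 - u) ** (-1 / (alpha - 1))

def rpower_discrete(u):
    # double x until the CDF reaches u (brackets the solution) ...
    x2 = xmin
    while cdf(x2) < u:
        x2 *= 2
    lo, hi = x2 // 2, x2
    # ... then binary search for the smallest x with cdf(x) >= u
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return hi
```

For example, with α = 2.5 and x_min = 1, about 75% of the discrete mass sits on x = 1, so u = 0.5 maps to 1.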
31-32. Method 2
   Similar to method 1, but: don't bin, just plot the data CDF, then use least squares to estimate α.
   Using linear regression is a bad idea:
       error estimates are completely off;
       it doesn't even provide a good point estimate of α.
   On the bright side, you do get a good R^2 value.

33. Method 3: log-likelihood
   The log-likelihood isn't that hard to derive.
   Continuous:
       l(α | x, x_min) = n log(α - 1) - n log(x_min) - α Σ_{i=1}^n log(x_i / x_min)
   Discrete:
       l(α | x, x_min) = -n log[ζ(α, x_min)] - α Σ_{i=1}^n log(x_i)
                       = -n log[ζ(α) - Σ_{x=1}^{x_min - 1} x^(-α)] - α Σ_{i=1}^n log(x_i)

34-35. MLEs
   Maximising the log-likelihood gives
       α̂ = 1 + n [Σ_{i=1}^n ln(x_i / x_min)]^(-1)
   An estimate of the associated error is
       σ = (α̂ - 1) / √n
   The discrete case is a bit more tricky and involves ignoring higher-order terms, to get:
       α̂ ≈ 1 + n [Σ_{i=1}^n ln(x_i / (x_min - 0.5))]^(-1)

36. Estimating x_min
   Recall that the power-law pdf is
       p(x) = ((α - 1) / x_min) (x / x_min)^(-α)
   where α > 1 and x_min > 0.
   x_min isn't a parameter in the usual sense - it's a cut-off in the state space. Typically power laws are only present in the distributional tails. So how much of the data should we discard so our distribution fits a power law?

37. Estimating x_min: method 1
   The most common way is to just look at the log-log plot. What could be easier!
   [Figure: the Zipf plots from slide 10 again: blackouts, fires, flares, Moby Dick, terrorism, web links]

38. Estimating x_min: method 2
   Use a "Bayesian approach" - the BIC:
       -2 l + k ln n = -2 l + x_min ln n
   Increasing x_min increases the number of parameters. Only suitable for discrete distributions.
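The continuous MLE and its error estimate can be checked on synthetic data generated by the inversion method. The sample size, seed, and true parameters below are arbitrary choices for the sketch:

```python
import math
import random

def rpl(n, alpha, xmin, rng):
    # draw n continuous power-law variates by inversion:
    # x = xmin * (1 - u)**(-1/(alpha - 1)),  u ~ U(0, 1)
    return [xmin * (1 - rng.random()) ** (-1 / (alpha - 1)) for _ in range(n)]

def alpha_mle(xs, xmin):
    # hat(alpha) = 1 + n * (sum_i ln(x_i / xmin))**-1
    return 1 + len(xs) / sum(math.log(x / xmin) for x in xs)

def alpha_se(a_hat, n):
    # estimated standard error: (hat(alpha) - 1) / sqrt(n)
    return (a_hat - 1) / math.sqrt(n)

rng = random.Random(2012)
xs = rpl(50_000, alpha=2.5, xmin=1.0, rng=rng)
a_hat = alpha_mle(xs, 1.0)
se = alpha_se(a_hat, len(xs))
```

With 50,000 samples the estimate lands within a few standard errors of the true α = 2.5 (σ ≈ 0.007 here). Replacing x_min by x_min - 0.5 in the sum gives the approximate discrete estimator from the slide.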
39. Estimating x_min: method 3
   Minimise the distance between the data and the fitted model CDFs:
       D = max_{x ≥ x_min} |S(x) - P(x)|
   where S(x) is the CDF of the data and P(x) is the theoretical CDF (the Kolmogorov-Smirnov statistic).
   Our estimate x̂_min is then the value of x_min that minimises D.
   Use some form of bootstrapping to get a handle on the uncertainty of x̂_min.

40. Mechanisms for generating power-law distributions

41-42. Word distributions
   Suppose we type randomly on a typewriter. We hit the space bar with probability q_s and a letter with probability q_l. If there are m letters in the alphabet, then q_l = (1 - q_s)/m.
   The distribution of word frequency has the form p(x) ∝ x^(-α).
   http://activerain.com/

43. Relationship between the α value and Zipf's principle of least effort
   α value        Examples in literature                        Least effort for
   α < 1.6        Advanced schizophrenia                        Author
   1.6 ≤ α < 2    Military combat texts, Wikipedia, web pages
                  listed on the Open Directory Project
   α = 2          Single-author texts                           Equal effort levels
   2 < α ≤ 2.4    Multi-author texts
   α > 2.4        Fragmented discourse schizophrenia            Audience

44-45. Random walks
   Suppose we have a 1-d random walk. At each unit of time, we move ±1.
   [Figure: a sample path of a 1-d random walk over 30 time steps]
   If we start at n = 0, what is the probability of the first return time being at time t?

46. Random walks
   With a bit of algebra, we get:
       f_2n = (2n choose n) / ((2n - 1) 2^(2n))
   For large n (by Stirling's approximation), we get
       f_2n ≈ 1 / ((2n - 1) √(πn))
   So as n → ∞, we get f_2n ~ n^(-3/2).
   So the distribution of return times follows a power law with exponent α = 3/2!
47. Random walks
   So the distribution of first return times follows a power law with exponent α = 3/2!
   Tenuous link to phylogenetics.

48-49. Phase transitions and critical phenomena
   Suppose we have a simple lattice. Each square is coloured with probability p = 0.5.
   We can look at the clusters of coloured squares. For example, the mean cluster area <s> of a randomly chosen square:
       if a square is white, then zero;
       if a square is coloured, but surrounded by white, then one;
       etc.
   When p is small, <s> is independent of the lattice size. When p is large, <s> depends on the lattice size.

50. Phase transitions and critical phenomena
   As we increase p, the value of <s> also increases. For some p, <s> starts to increase with the lattice size.
   This is known as the critical value, p = p_c = 0.5927462...
   If we calculate the distribution p(s), then when p = p_c, p(s) follows a power-law distribution.
   [Figure: lattices at p = 0.3, p = 0.5927..., and p = 0.9]

51-52. Forest fire
   This simple model has been used as a primitive model of forest fires. We start with an empty lattice, and trees grow at random. Every so often, a forest fire strikes at random. If the forest is too connected, i.e. large p, then the forest burns down. So (it is argued) the forest size oscillates around p = p_c.
   This is an example of self-organised criticality.

53. Future work
   There isn't even an R package for power-law estimation; writing this talk, I have (more or less) written one.
   Use a Bayesian change point model to estimate x_min in a vaguely sensible way.
   RJMCMC to change between the power law and other heavy-tailed distributions.

References
   A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data. http://arxiv.org/abs/0706.1062
   M.E.J. Newman. Power laws, Pareto distributions and Zipf's law. http://arxiv.org/abs/cond-mat/0412004