PHYSICS AND THE CITY. Carvalho and Batty: Scaling in the Geography of US Computer Science
Scaling in the Geography of US Computer Science
Rui Carvalho and Michael Batty
University College London
http://www.casa.ucl.ac.uk/
Thanks: Michael Gastner (SFI), Isaac Councill (PSU), Chris Brunsdon (Leicester), Ben Gimpert (UCL)
Motivation
• Why Geography?
– Scientists: who can I collaborate with in my city/country?
– Funding Agencies: where are new research centres emerging? Is the regional distribution of funds optimal?
– Scientometrics: distinguish between J. Smith (PSU) and J. Smith (UCL).
• Preprint server challenges:
– [USA] NIH-funded investigators are required to submit their papers to PubMed within 1 year of publication (effective May 2, 2005);
– [UK] Wellcome Trust-funded papers will in future have to be placed in a central public archive within six months of publication.
• Data mining challenges:
– Processing large databases promises to uncover knowledge hidden behind the mass of available data;
– It can dramatically speed up achievements formerly reached solely by human effort and provide new results that humans could not have reached unaided.
• Statistical Challenges:
– Conventional wisdom holds that (geographical) spatial point processes have characteristic scales...
– Yet “real world” phenomena are often far from equilibrium.
PNAS, 6 April 2004
Plan
• Open Archives Datasets:
– Citeseer (Computer Science);
– arXiv.org (mainly Physics, but also Maths and CS).
• Geographical Datasets:
– The US Census Bureau makes available on the web datasets for geocoding, but Europe lacks a unified ‘open-access’ database.
• Plan:
– Extract ZIP codes from authors’ addresses;
– Map research centres geographically.
• Questions about the research centres:
– How productive are they?
– Are there non-trivial spatial structures at a geographical scale?
Can Statistical Physics Help?
What is Citeseer?
• Founded by Steve Lawrence and C. Lee Giles in 1997 (NEC);
• Now at Penn State: http://citeseer.ist.psu.edu/
• Archive of computer science research papers harvested from the web and submitted by users;
• Currently (Dec 2005) contains over 730,000 documents;
• Citeseer was developed as a model for Autonomous Citation Indexing, i.e. citation indexes are created automatically;
• Can search content in PostScript and PDF files.
Data Collecting and Parsing
• Citeseer metadata:
– 525,055 computer science research papers;
– 399,757 (76.14%) of which are unique;
– 103,172 (25.81%) of the unique papers have one or more US authors;
– 2,975 different ZIP codes in the unique papers belong to the US conterminous states (48 states, plus the District of Columbia).
• 5 most productive ZIP codes:
1. Count: 3950 Zip: 15213 Carnegie Mellon Univ, Pittsburgh, PA;
2. Count: 3403 Zip: 02139 MIT, Cambridge, MA;
3. Count: 2954 Zip: 94305 Stanford Univ, CA;
4. Count: 2691 Zip: 94720 Univ California at Berkeley, CA;
5. Count: 2309 Zip: 61801 Univ Illinois at Urbana Champaign, IL.
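The parsing step (pull a ZIP code out of each author address, then count papers per ZIP code) might look roughly like the sketch below. It is only an illustration: the regular expression, the function names `extract_zip` and `count_papers_by_zip`, the sample addresses, and the rule that a paper counts once per distinct ZIP code are all assumptions, not the pipeline actually used on the Citeseer metadata.

```python
import re
from collections import Counter

# Assumption: a US address contains a two-letter state code followed by a
# 5-digit ZIP code (optionally ZIP+4). Real address strings are messier.
ZIP_RE = re.compile(r"\b[A-Z]{2}\s+(\d{5})(?:-\d{4})?\b")

def extract_zip(address):
    """Return the 5-digit ZIP code found in an address string, or None."""
    match = ZIP_RE.search(address)
    return match.group(1) if match else None

def count_papers_by_zip(papers):
    """Count papers per ZIP code; each paper counts once per distinct ZIP."""
    counts = Counter()
    for addresses in papers:
        zips = {z for z in map(extract_zip, addresses) if z is not None}
        counts.update(zips)
    return counts

# Hypothetical records: one list of author addresses per paper.
papers = [
    ["Carnegie Mellon Univ, Pittsburgh, PA 15213", "MIT, Cambridge, MA 02139"],
    ["Carnegie Mellon Univ, Pittsburgh, PA 15213-3890"],
    ["University College London, London WC1E 6BT, UK"],
]
counts = count_papers_by_zip(papers)
print(counts.most_common(2))
```

Keeping the ZIP code as a string preserves leading zeros such as the one in 02139.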
Q1: How productive are the research centres?
Q2: Non-trivial spatial structures?
The Geography of Citeseer
Cartograms
Diffusion-based method for producing density-equalizing maps, Michael T. Gastner and M. E. J. Newman, Proc. Nat. Acad. Sci. USA, 101, 7499-7504 (2004)
Density-equalizing map projections: Diffusion-based algorithm and applications, Michael T. Gastner and M. E. J. Newman, Geocomputation 2005 (to appear)
Spatial Point Processes
• Moments:
– First moment: ρ, the expected number of points per unit area;
– Second moment: Ripley’s K function. ρK(r) is the expected number of other points within distance r of a point.
• For a Poisson process, $K(r) = \pi r^2$;
• But neither the first nor the second moment gives a feel for the way in which the spatial distribution changes within an area.
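The Poisson benchmark K(r) = πr² can be checked numerically. The sketch below is a naive estimator that ignores edge effects, applied to uniformly random points in a unit square. The function name `ripley_k`, the point count, and the radius are illustrative assumptions, not values from the study.

```python
import numpy as np

def ripley_k(points, r, area):
    """Naive estimate of Ripley's K(r), with no edge correction."""
    n = len(points)
    # Matrix of pairwise distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Ordered pairs (i != j) closer than r; subtract the n self-distances.
    pairs = (dist < r).sum() - n
    rho = n / area  # first moment: points per unit area
    return pairs / (n * rho)

rng = np.random.default_rng(0)
points = rng.random((1000, 2))  # assumed test pattern: uniform in unit square
r = 0.05
k_hat = ripley_k(points, r, area=1.0)
print(k_hat, np.pi * r ** 2)
```

Because points near the boundary have part of their neighbourhood outside the square, this uncorrected estimate tends to fall slightly below πr².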
The Two-Point Correlation Function
• The two-point correlation function g(r), defined by
$$\rho_2(x_1, x_2)\, dV(x_1)\, dV(x_2) = \rho^2\, g(r)\, dV(x_1)\, dV(x_2),$$
describes the probability of finding a point in volume dV(x1) and another point in dV(x2) at distance r = |x1 - x2|;
• For a Poisson process g(r) = 1;
• Edge corrections (Ripley’s weights): take a circle centred on point x passing through another point y. If the circle lies entirely within the domain, D, the point is counted once. If only a proportion p(x,y) of the circle lies within D, the point is counted as 1/p(x,y) points.
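As an illustration of the edge-correction rule, the sketch below approximates the weight 1/p(x,y) by sampling points on the circle. It assumes a rectangular domain D purely for simplicity; a real study region such as a national border is an irregular polygon and would need a polygon-circle intersection instead. The function name `ripley_weight` and its parameters are hypothetical.

```python
import numpy as np

def ripley_weight(x, y, width=1.0, height=1.0, n_samples=3600):
    """Approximate weight 1/p for the pair (x, y).

    p is the proportion of the circle centred on x and passing through y
    that lies inside the rectangle [0, width] x [0, height] (an assumed,
    simplified domain).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    radius = np.linalg.norm(y - x)
    # Sample the circle at evenly spaced angles.
    angles = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    cx = x[0] + radius * np.cos(angles)
    cy = x[1] + radius * np.sin(angles)
    inside = (cx >= 0) & (cx <= width) & (cy >= 0) & (cy <= height)
    p = inside.mean()  # fraction of sampled circle points inside D
    return 1.0 / p

w_interior = ripley_weight((0.5, 0.5), (0.6, 0.5))  # circle fully inside
w_edge = ripley_weight((0.0, 0.5), (0.1, 0.5))      # centre on an edge
w_corner = ripley_weight((0.0, 0.0), (0.1, 0.0))    # centre on a corner
print(w_interior, w_edge, w_corner)
```

A point in the interior keeps weight 1, while a point on an edge or corner of the rectangle is weighted by roughly 2 or 4, compensating for the neighbours that would lie outside the domain.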
Computation of the Two-Point Correlation Function
Intersection with border gives more than one polygon
Geographical range at which the two-point correlation function can be approximated by a power-law
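One common way to test whether g(r) follows a power law over a range of distances is a straight-line fit in log-log coordinates. The sketch below does this on synthetic data: the exponent 0.8, the prefactor 5.0, and the fitting range are made up for the example and are not results from the slides, and `fit_power_law` is a hypothetical helper.

```python
import numpy as np

def fit_power_law(r, g, r_min, r_max):
    """Least-squares fit of log g = log a - gamma * log r on [r_min, r_max].

    Returns (gamma, a) for the model g(r) = a * r**(-gamma).
    """
    mask = (r >= r_min) & (r <= r_max)
    slope, intercept = np.polyfit(np.log(r[mask]), np.log(g[mask]), 1)
    return -slope, np.exp(intercept)

# Synthetic, noise-free example data (assumed values, arbitrary units).
r = np.logspace(0, 3, 50)
g = 5.0 * r ** -0.8
gamma, a = fit_power_law(r, g, r_min=10, r_max=500)
print(gamma, a)
```

Restricting the fit to [r_min, r_max] mirrors the idea of a limited geographical range over which the power-law approximation holds.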
Two-Point Correlation Function
Speculation: knowledge diffusion?
Speculation: Universality?