PHYSICS AND THE CITY. Carvalho and Batty: Scaling in the Geography of US Computer Science 1 Scaling in the Geography of US Computer Science Rui Carvalho and Michael Batty University College London [email protected] [email protected] http://www.casa.ucl.ac.uk/ Thanks: Michael Gastner (SFI), Isaac Councill (PSU), Chris Brunsdon (Leicester), Ben Gimpert (UCL)
Feb 02, 2016

Page 1: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

PHYSICS AND THE CITY. Carvalho and Batty: Scaling in the Geography of US Computer Science

Scaling in the Geography of US Computer Science

Rui Carvalho and Michael Batty
University College London

[email protected] [email protected]
http://www.casa.ucl.ac.uk/

Thanks: Michael Gastner (SFI), Isaac Councill (PSU), Chris Brunsdon (Leicester), Ben Gimpert (UCL)

Page 2: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Motivation

• Why Geography?
– Scientists: who can I collaborate with in my city/country?
– Funding Agencies: where are new research centres emerging? Is the regional distribution of funds optimal?
– Scientometrics: distinguish between J. Smith (PSU) and J. Smith (UCL).
• Preprint server challenges:
– [USA] NIH-funded investigators are required to submit their papers to PubMed within 1 year of publication (effective May 2, 2005);
– [UK] Wellcome Trust-funded papers will in future have to be placed in a central public archive within six months of publication.
• Data mining challenges:
– Processing large databases promises to uncover knowledge hidden behind the mass of available data;
– It can dramatically speed up achievements formerly reached solely by human effort and provide new results that could not have been reached by humans unaided.
• Statistical challenges:
– Conventional wisdom holds that (geographical) spatial point processes have characteristic scales...
– Yet most “real world” phenomena are far from equilibrium.

PNAS, 6 April 2004

Page 3: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Plan

• Open Archives Datasets:
– Citeseer (Computer Science);
– arXiv.org (mainly Physics, but also Maths and CS).
• Geographical Datasets:
– The US Census Bureau makes datasets for geocoding available on the web, but Europe lacks a unified ‘open-access’ database.
• Plan:
– Extract ZIP codes from authors’ addresses;
– Map research centres geographically.
• Questions about the research centres:
– How productive are they?
– Are there non-trivial spatial structures at a geographical scale?
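The ZIP-code extraction step in the plan could look roughly like the sketch below. It is only an illustration, not the parser used for the talk: the pattern `ZIP_PATTERN`, the helper `extract_zip` and the sample address strings are all assumptions, and real affiliation strings harvested from papers are considerably messier.

```python
import re

# Hypothetical pattern: a two-letter state code followed by a 5-digit ZIP,
# optionally with a ZIP+4 suffix that is matched but not captured.
ZIP_PATTERN = re.compile(r"\b([A-Z]{2})[ ,]+(\d{5})(?:-\d{4})?\b")

def extract_zip(address):
    """Return (state, zip) for the last state/ZIP pair in address, or None."""
    matches = ZIP_PATTERN.findall(address)
    return matches[-1] if matches else None

# Made-up sample addresses, for illustration only.
addresses = [
    "School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213",
    "MIT Laboratory for Computer Science, Cambridge, MA 02139-4307",
    "CASA, University College London, London WC1E 7HB, UK",
]
zips = [extract_zip(a) for a in addresses]
for address, result in zip(addresses, zips):
    print(result, "<-", address)
```

Non-US addresses simply yield `None` here; mapping each extracted ZIP code to coordinates would then rely on a geocoding table such as the US Census Bureau datasets mentioned above.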

Page 4: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Can Statistical Physics Help?

Page 5: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

What is Citeseer?

• Founded by Steve Lawrence and C. Lee Giles in 1997 (NEC);
• Now at Penn State: http://citeseer.ist.psu.edu/
• Archive of computer science research papers harvested from the web and submitted by users;
• Currently (Dec 2005) contains over 730,000 documents;
• Citeseer was developed as a model for Autonomous Citation Indexing, i.e. citation indexes are created automatically;
• Can search content in PostScript and PDF files.

Page 6: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Data Collecting and Parsing

• Citeseer metadata:
– 525,055 computer science research papers;
– 399,757 (76.14%) of which are unique;
– 103,172 (25.81%) of the unique papers have one or more US authors;
– 2,975 different ZIP codes in the unique papers belong to the conterminous US (48 states, plus the District of Columbia).
• 5 most productive ZIP codes:
1. Count: 3950 Zip: 15213 Carnegie Mellon Univ, Pittsburgh, PA;
2. Count: 3403 Zip: 02139 MIT, Cambridge, MA;
3. Count: 2954 Zip: 94305 Stanford Univ, CA;
4. Count: 2691 Zip: 94720 Univ California at Berkeley, CA;
5. Count: 2309 Zip: 61801 Univ Illinois at Urbana-Champaign, IL.
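A ranking like the one above can be produced by tallying one ZIP code per record and sorting by count. The sketch below uses a made-up list `records` rather than the Citeseer data, and it assumes each record contributes a single ZIP code, which may differ from how the counts on the slide were obtained.

```python
from collections import Counter

# Toy input: one ZIP code per (paper, affiliation) record. Made-up sample.
records = ["15213", "02139", "15213", "94305", "15213", "02139"]

counts = Counter(records)
ranked = counts.most_common()  # [(zip_code, count), ...] by decreasing count

for rank, (zip_code, n) in enumerate(ranked, start=1):
    print(f"{rank}. Count: {n} Zip: {zip_code}")
```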

Page 7: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Q1: How productive are the research centres?

Page 8: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Q2: Non-trivial spatial structures?

Page 9: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

The Geography of Citeseer

Page 10: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Cartograms

Diffusion-based method for producing density-equalizing maps, Michael T. Gastner and M. E. J. Newman, Proc. Nat. Acad. Sci. USA, 101, 7499-7504 (2004)

Density-equalizing map projections: Diffusion-based algorithm and applications, Michael T. Gastner and M. E. J. Newman, Geocomputation 2005 (to appear)

Page 11: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Cartograms

Page 12: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Cartograms

Page 13: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Cartograms

Page 14: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Spatial Point Processes

• Moments:
– First moment: ρ, the expected number of points per unit area;
– Second moment: Ripley’s K function. ρK(r) is the expected number of points within distance r of a point.
• For a Poisson process, $K(r) = \pi r^2$;
• But neither the first nor the second moment gives a feel for the way in which the spatial distribution changes within an area.
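As a rough illustration of the second moment, the sketch below estimates K(r) for uniformly scattered points and compares it with the Poisson value πr². It is a naive estimator under stated assumptions, not the computation used in the talk: the function name `ripley_k` is hypothetical, the domain is a square, and distances wrap around the edges (a torus) so that no edge correction is needed, which is only valid for r below half the side length.

```python
import math
import random

def ripley_k(points, r, side=1.0):
    """Naive estimate of K(r) for points in a square of the given side.

    Distances are measured on a torus (wrap-around), an assumption made
    here to sidestep edge corrections. Requires r < side / 2.
    """
    n = len(points)
    area = side * side
    close_pairs = 0
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            dx = abs(xi - points[j][0])
            dy = abs(yi - points[j][1])
            dx = min(dx, side - dx)
            dy = min(dy, side - dy)
            if dx * dx + dy * dy <= r * r:
                close_pairs += 1
    # Each unordered pair appears twice in the sum over ordered pairs.
    return area * 2 * close_pairs / (n * (n - 1))

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(400)]
r = 0.1
k_hat = ripley_k(points, r)
k_poisson = math.pi * r * r
print(k_hat, k_poisson)
```

For clustered data such as research centres one would expect the estimate to exceed πr² at short distances; the uniform points here should stay close to it.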

Page 15: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

The Two-Point Correlation Function

• The two-point correlation function g(r), given by

$$\rho_2(x_1, x_2)\, dV(x_1)\, dV(x_2) = \rho^2\, g(r)\, dV(x_1)\, dV(x_2),$$

describes the probability of finding a point in volume dV(x1) and another point in dV(x2) at distance r = |x1-x2|;

• For a Poisson process, g(r) = 1;

• Edge corrections (Ripley’s weights): take a circle centred on point x passing through another point y. If the circle lies entirely within the domain D, the point is counted once. If only a proportion p(x,y) of the circle lies within D, the point is counted as 1/p(x,y) points.
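The edge-correction weight described above can be approximated numerically by sampling the circle and checking which samples fall inside the domain. The sketch below is an assumption-laden illustration: `ripley_weight` and `in_unit_square` are hypothetical names, p(x,y) is taken to be the fraction of the circle's circumference inside D, and the domain is a unit square rather than a real geographical boundary.

```python
import math

def ripley_weight(x, y, inside, samples=3600):
    """Approximate 1/p(x, y): p is the fraction of the circle centred on x
    and passing through y that lies inside the domain (tested by `inside`)."""
    radius = math.dist(x, y)
    hits = 0
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        point = (x[0] + radius * math.cos(angle),
                 x[1] + radius * math.sin(angle))
        if inside(point):
            hits += 1
    # Assumes at least one sample lands inside the domain, which holds
    # in practice when y itself lies in the domain and sampling is fine.
    return samples / hits

def in_unit_square(p):
    """Hypothetical domain D: the unit square [0, 1] x [0, 1]."""
    return 0.0 <= p[0] <= 1.0 and 0.0 <= p[1] <= 1.0

w_interior = ripley_weight((0.5, 0.5), (0.6, 0.5), in_unit_square)
w_edge = ripley_weight((0.0, 0.5), (0.1, 0.5), in_unit_square)
print(w_interior, w_edge)
```

A point in the middle of the square gets weight 1, while a point sitting on an edge sees roughly half of each small circle and gets a weight close to 2.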

Page 16: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Computation of the Two-Point Correlation Function

Intersection with border gives more than one polygon

Geographical range at which the two-point correlation function can be approximated by a power-law
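One simple way to check a power-law approximation over a chosen range is a straight-line fit in log-log coordinates. The sketch below is a generic least-squares fit, not the procedure used in the talk; `fit_power_law` is a hypothetical helper and the input values are synthetic numbers chosen only so the fit can be verified.

```python
import math

def fit_power_law(r_values, g_values):
    """Least-squares fit of log g = log c - gamma * log r; returns (c, gamma)."""
    xs = [math.log(r) for r in r_values]
    ys = [math.log(g) for g in g_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic data following g(r) = 5 * r^(-0.8) exactly (made-up numbers).
r_values = [10, 20, 50, 100, 200, 500]
g_values = [5.0 * r ** -0.8 for r in r_values]
c, gamma = fit_power_law(r_values, g_values)
print(c, gamma)
```

With real estimates of g(r), the fit should be restricted to the range of r where the log-log plot looks linear, and a straight-line fit alone is weak evidence for a power law.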

Page 17: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Two-Point Correlation Function

Page 18: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Speculation: knowledge diffusion?

Page 19: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Speculation: Universality?

Page 20: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

To find out more

• http://www.casa.ucl.ac.uk/

• Spatially Embedded Complex Systems Engineering (SECSE): http://www.secse.net/
  members: UCL, Leeds, Southampton, Sussex

[email protected] [email protected]

Page 21: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Page 22: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Plot of state R&D expenditure (NSF) vs population

Page 23: Scaling in the Geography  of US Computer Science Rui Carvalho and Michael Batty

Poisson Point Process

• We say that a spatial process is completely random iff:
– The number of events in any planar region A with area |A| follows a Poisson distribution with mean λ|A|, where λ is the density of points;
– For any two disjoint regions A and B, the random variables N(A) and N(B) are independent.
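The two conditions above suggest a direct way to simulate a completely random pattern in a rectangle: draw the total count from a Poisson distribution with mean λ|A|, then place that many points uniformly. The sketch below is a minimal illustration using only the standard library; `poisson_sample` and `poisson_process` are hypothetical names, and the Poisson draw uses Knuth's multiplication method, which is adequate only for modest means.

```python
import math
import random

def poisson_sample(mean, rng):
    """Draw a Poisson variate by Knuth's multiplication method."""
    limit = math.exp(-mean)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def poisson_process(lam, width, height, rng):
    """Homogeneous Poisson process of density lam on a width x height rectangle."""
    n = poisson_sample(lam * width * height, rng)
    return [(rng.random() * width, rng.random() * height) for _ in range(n)]

rng = random.Random(42)
points = poisson_process(lam=100.0, width=1.0, height=1.0, rng=rng)

# Counts in the two disjoint halves; each has mean lam * 0.5 = 50.
left = sum(1 for x, _ in points if x < 0.5)
right = len(points) - left
print(len(points), left, right)
```

Repeating the simulation many times and comparing the counts in the two halves would give an empirical check of the independence condition.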