Page 1

Disseminating census microdata: the IPUMS and IECM experiences, 2002-2010 (and plans for beyond)

Robert McCaa and Albert Esteve
Minnesota Population Center and Centre d’Estudis Demogràfics
[email protected]; [email protected]
www.ipums.org/international (Global)
www.iecm-project.org (Europe portal)

“Only used statistics are useful statistics.” -- Joint UNECE/Eurostat Meeting on Population and Housing Censuses inf.1

Page 2

3 goals of presentation: IPUMS/IECM census microdata projects

1. Discuss dissemination statistics from 59,170 extracts downloaded by IPUMS registered users
2. Invite 21 European partners to entrust 2010 round samples as expeditiously as possible
3. Invite non-partners to entrust samples of historical censuses (2000 and earlier rounds) as well as samples from the 2010 round

Page 3

Outline: Integrating census samples and metadata for timely dissemination via the IPUMS-International and IECM initiatives, 2010-2014

1. IPUMS-International: massive, global dissemination (7 slides)
2. IPUMS-International: usage statistics (9 slides)
3. Conclusion (2 slides)

Page 4

1. IPUMS-International: Massive, Global Integration and Dissemination

“…best practice for a data repository of international statistical data” --Dennis Trewin, chair, UNECE task force on Statistical Confidentiality & Microdata Access

See also:
» 2006: "IPUMS-Europe: Confidentiality measures for licensing and disseminating restricted access census microdata extracts to academic users," Monographs of official statistics: Work session on statistical data confidentiality.
» 2009: Entrusting census microdata and metadata for timely integration and dissemination via the IPUMS-EurAsia and IECM initiatives, 2010-2014. ECE/CES/GE.41/2009/23

Page 5

IPUMS-International:

» Begun in 1999, IPUMS-International is the world’s largest integrated demographic database:
  » 159 integrated, anonymized census samples (55 countries)
  » 325 million person records; 3,600 approved researchers
» Database is likely to double over the next five years, by the addition of:
  » 2010 round samples of 17 current Eur-Asian partners: Armenia, Austria, Belarus, Canada, France, Greece, Hungary, Italy, Kyrgyzstan, Netherlands, Portugal, Romania, Slovenia, Spain, Switzerland, UK, USA, etc.
  » Samples for 8 Eur-Asian countries currently in development: Belgium, Czech Republic, Ireland, Germany, Poland, Turkey, Turkmenistan, Ukraine
  » Future partners? Albania? Bulgaria? Croatia? Estonia? Finland? …

Page 6

59,170 extracts (586,643 variables) disseminated; extracts jumped 10% in June, with the 2010 launch

» IPUMS-International NEVER disseminates source microdata!
» 4 IPUMS constructed variables ranked in the top 30:
  » Spouse’s location in household
  » Mother’s location in household
  » Father’s location in household
  » Spouse rule for inferring location in household
» These variables are constructed from household samples
  » 3 countries with person samples are invited to construct household samples:
    » Canada
    » Netherlands
    » UK
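The constructed pointer variables above can only be built when every member of a household is in the sample, which is why person samples cannot supply them. The sketch below is a deliberately simplified illustration of the idea, not the IPUMS procedure: it assumes a hypothetical roster where each person has a line number, a relationship-to-head code, and sex, and it links only the head, the head's spouse, and the head's children. The actual IPUMS rules are far more elaborate (the spouse rule variable records which rule was applied).

```python
from dataclasses import dataclass

# Hypothetical relationship codes for this sketch (not the IPUMS RELATE coding).
HEAD, SPOUSE, CHILD, OTHER = "head", "spouse", "child", "other"


@dataclass
class Person:
    line: int        # position of the person in the household roster
    relate: str      # relationship to the household head
    sex: str         # "M" or "F"
    sploc: int = 0   # line number of spouse in household (0 = none present)
    momloc: int = 0  # line number of mother in household (0 = none present)
    poploc: int = 0  # line number of father in household (0 = none present)


def link_household(members: list[Person]) -> None:
    """Fill pointer variables using only relationships to the head (simplified)."""
    head = next((p for p in members if p.relate == HEAD), None)
    spouse = next((p for p in members if p.relate == SPOUSE), None)
    if head is None:
        return  # no head: this simplified rule cannot link anyone
    if spouse is not None:
        head.sploc, spouse.sploc = spouse.line, head.line
    # Assumption: children of the head are also children of the head's spouse.
    parents = [p for p in (head, spouse) if p is not None]
    for child in (p for p in members if p.relate == CHILD):
        for parent in parents:
            if parent.sex == "F":
                child.momloc = parent.line
            else:
                child.poploc = parent.line


# A four-person household: head, spouse, one child, one unrelated member.
hh = [Person(1, HEAD, "M"), Person(2, SPOUSE, "F"),
      Person(3, CHILD, "F"), Person(4, OTHER, "M")]
link_household(hh)
print([(p.line, p.sploc, p.momloc, p.poploc) for p in hh])
# [(1, 2, 0, 0), (2, 1, 0, 0), (3, 0, 2, 1), (4, 0, 0, 0)]
```

A person sample draws individuals without their co-resident household members, so a complete roster like `hh` is never available and none of these pointers can be filled.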

Page 7

[Figure: IPUMS-International world map, Mollweide projection. Legend (microdata): integrated into IPUMS; entrusted to IPUMS; none entrusted; none inventoried.]

IPUMS-International
» dark green = integrated and disseminating (55 countries, 159 censuses, 325 million person records)
» green = to be integrated (35 countries, 90 censuses, 150 million)

2011: Cambodia 2008, Egypt 2006, France 2006, Germany, Indonesia, Ireland, etc.
2012: why not yours?

Page 8

2011 launch at the 58th Session ISI: Dublin, Aug 21-26, 2011
http://www.isi2011.ie

» European samples to be launched:
  » France, 2006
  » Germany (1970-87; DFR ‘71, ‘81)
  » Ireland (1971-2006)
» Beyond Europe, samples for:
  » Cambodia 2008
  » Egypt 2006
  » Jamaica, 1981-2001
  » Iran 2006
  » Etc.
» Successive annual launches planned for 2012, 2013, 2014.

Page 9

Dissemination of microdata extracts via IPUMS-International

» IPUMS-International NEVER disseminates source microdata!
» Usage is restricted to bona fide researchers who agree to stringent conditions of use to protect statistical confidentiality
» IPUMS disseminates extracts, custom-tailored to researchers’ needs
  » Unlike most statistical agencies, which disseminate an identical entire sample to every user

Page 10

Dissemination of microdata and metadata extracts

» The massive scale of IPUMS requires users to be selective:
  » Select country (or countries)
  » Select samples (census years)
  » Select variables (e.g., age, sex, educational attainment, etc.)
  » Select sub-populations (e.g., nurses)
  » Select sample density
» Once an extract request is submitted, the IPUMS extract engine:
  » Constructs the microdata extract
  » Constructs the metadata
  » Emails the researcher to retrieve the extract (password protected; transmission is encrypted with 128-bit SSL)
» The researcher downloads the extract, unzips it and analyzes it
» The extract system has been validated as usage has soared
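The selection steps above amount to a filter over a very large table: choose samples, keep only the requested variables, restrict to a sub-population, and optionally thin to a lower density. The following is a minimal in-memory sketch of that logic under stated assumptions; the record layout, the `ExtractRequest` fields, the `SAMPLE`/`SERIAL` keys, and the occupation code used for "nurses" are hypothetical and do not describe the actual IPUMS extract engine or its coding.

```python
import random
from dataclasses import dataclass
from typing import Callable

Record = dict[str, object]  # one person record: variable name -> value


@dataclass
class ExtractRequest:
    samples: set[str]     # hypothetical sample IDs, e.g. {"mx2000"}
    variables: list[str]  # variables requested by the researcher
    subpop: Callable[[Record], bool] = lambda r: True  # sub-population filter
    density: float = 1.0  # fraction of households to keep (1.0 = full sample)
    seed: int = 0         # fixed seed so the thinning is reproducible


def build_extract(records: list[Record], req: ExtractRequest) -> list[Record]:
    rng = random.Random(req.seed)
    keep_household: dict[tuple[object, object], bool] = {}
    out: list[Record] = []
    for r in records:
        if r["SAMPLE"] not in req.samples:
            continue
        # Thin whole households, not persons, so household structure survives.
        hh = (r["SAMPLE"], r["SERIAL"])
        if hh not in keep_household:
            keep_household[hh] = rng.random() < req.density
        if not keep_household[hh] or not req.subpop(r):
            continue
        out.append({"SAMPLE": r["SAMPLE"], **{v: r[v] for v in req.variables}})
    return out


records = [
    {"SAMPLE": "mx2000", "SERIAL": 1, "AGE": 34, "SEX": 2, "OCC": 2230},
    {"SAMPLE": "mx2000", "SERIAL": 1, "AGE": 36, "SEX": 1, "OCC": 7120},
    {"SAMPLE": "br2000", "SERIAL": 7, "AGE": 29, "SEX": 2, "OCC": 2230},
]
# Hypothetical: treat OCC == 2230 as "nurses" for this toy example.
req = ExtractRequest(samples={"mx2000"}, variables=["AGE", "SEX"],
                     subpop=lambda r: r["OCC"] == 2230)
print(build_extract(records, req))
# [{'SAMPLE': 'mx2000', 'AGE': 34, 'SEX': 2}]
```

Thinning by household rather than by person is a design choice made here so that household-based constructed variables stay meaningful in a reduced-density extract.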

Page 11

2. IPUMS-International: Usage statistics

See card hand-out for list of current samples and usage statistics

Page 12

Usage Statistics (June 4, 2010)

» 59,170 extracts (jumped 10% in June)
  » Average: roughly 1,000 extracts per country (59,170 / 55 ≈ 1,076)
» Smallest number of extracts: Kyrgyz Republic, 116
  » census of 1999; first year of availability
» Largest number of extracts: Mexico, 7,637
  » 6 censuses, 8 years of availability
  » Mexico 2000: 2,464 extracts
» Usage statistics by country: see Table 2
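The figures above mix very different exposure times: Mexico's total accumulated over 8 years of availability, the Kyrgyz Republic's in its first year. A quick normalization by years of availability, using only the numbers on this slide, puts the two on a more comparable footing; the per-year rate is an illustrative measure, not one reported by IPUMS.

```python
TOTAL_EXTRACTS = 59_170
N_COUNTRIES = 55

# (extracts, years of availability) as reported on this slide
usage = {"Mexico": (7_637, 8), "Kyrgyz Republic": (116, 1)}

avg_per_country = TOTAL_EXTRACTS / N_COUNTRIES
per_year = {country: e / y for country, (e, y) in usage.items()}

print(f"average per country: {avg_per_country:.0f}")  # 1076
for country, rate in per_year.items():
    print(f"{country}: {rate:.0f} extracts per year of availability")
```

On this basis the roughly 66-fold gap in totals (7,637 vs. 116) shrinks to about 8-fold (about 955 vs. 116 per year).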

Page 13

Table 2. Extract Rank and Sample Details for the Top Five and all European Countries

Rank | Country | Sample %* | Variables (n)* | Years of census samples | Extracts
1 | Mexico | 10 | 120 | 1960p, 70, 90, 95, 2000, 05 | 7,637
2 | Brazil | 5 | 106 | 1960, 70, 80, 91, 2000 | 5,191
3 | United States | 5 | 92 | 1960, 70, 80, 90, 2000, 05 | 4,559
4 | Colombia | 10 | 120 | 1964p, 72, 85, 93, 2005 | 3,428
5 | France | 5 | 99 | 1962, 68, 75, 82, 90, 99 | 2,795
10 | Canada | 2.5 | 59 | 1971p, 81p, 91p, 2001p | 1,614
12 | Spain | 5 | 99 | 1981, 91, 2001 | 1,514
13 | Greece | 10 | 89 | 1971, 81, 91, 2001 | 1,496
19 | Hungary | 5 | 74 | 1970, 80, 90, 2001 | 1,132
21 | Austria | 10 | 75 | 1971, 81, 91, 2001 | 1,087
22 | Portugal | 5 | 96 | 1981, 91, 2001 | 1,028
23 | Romania | 10 | 97 | 1976, 92, 2002 | 1,012
29 | UK | 3 | 47 | 1991, 2001p | 657
30 | Netherlands | 1 | 33 | 1960p, 71p, 2001p | 570
32 | Belarus | 10 | 84 | 1999 | 333
38 | Italy | 5 | 81 | 2001 | 209
43 | Slovenia | 10 | 80 | 2002 | 133

Total extracts from the IPUMS-International database for 55 countries (158 samples), Jun 4, 2010: 59,170

* Refers to the 2000 round census; Variables (n) counts all integrated variables, including IPUMS constructed variables.
"p" = person sample; all other samples are of households.
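Table 2 also shows how concentrated demand is. The sketch below re-enters the Extracts column for the top five countries, computes their share of all 59,170 extracts, and divides each country's extracts by the number of census samples listed under "Years of census samples". Only the table's own figures are used; extracts per sample is an illustrative derived measure, not an IPUMS statistic.

```python
TOTAL_EXTRACTS = 59_170

# country: (number of census samples listed in Table 2, extracts)
top_five = {
    "Mexico": (6, 7_637),
    "Brazil": (5, 5_191),
    "United States": (6, 4_559),
    "Colombia": (5, 3_428),
    "France": (6, 2_795),
}

top_five_total = sum(extracts for _, extracts in top_five.values())
share = top_five_total / TOTAL_EXTRACTS
print(f"top five: {top_five_total:,} extracts ({share:.1%} of all extracts)")

for country, (n_samples, extracts) in top_five.items():
    print(f"{country}: {extracts / n_samples:,.0f} extracts per sample")
```

The top five of 55 countries account for 23,610 extracts, just under 40% of the total.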

Page 14

Table 3. Thirty-two most popular variables in IPUMS-International

Rank | Label | Extracts | Mnemonic | Comment
1 | Educational attainment | 19,307 | EDATTAN |
2 | Age (single years to 85+) | 19,009 | AGE | Grouped age n=3,838
3 | Employment status | 18,490 | EMPSTAT |
4 | Marital status | 18,214 | MARST |
5 | Person weight | 17,511 | WTPER | Technical variable
6 | Relationship to head | 15,783 | RELATE |
7 | Sex | 14,595 | SEX |
8 | Class of work | 12,583 | CLASSWK |
9 | Ownership of dwelling | 8,050 | OWNRSHP |
10 | Occupation ISCO recode | 8,004 | OCCISCO |
11 | School attendance | 7,919 | SCHOOL |
12 | Years of schooling | 7,576 | YRSCHL |
13 | Literate | 7,290 | LIT |
14 | Urban/rural | 7,098 | URBAN |
15 | Industry-general code | 7,044 | INDGEN |
16 | Household weight | 6,656 | WTHH | Technical variable
17 | Children ever born | 6,363 | CHBORN |
18 | Nativity (native/foreign born) | 6,332 | NATIVTY |
19 | Occupation | 6,246 | OCC |

Page 15

Table 3. Thirty-two most popular variables in IPUMS-International (cont.)

Rank | Label | Extracts | Mnemonic | Comment
20 | Country of birth | 6,153 | BPLCTRY |
21 | Religion | 6,075 | RELIG |
22 | Industry | 5,670 | IND |
23 | Location of spouse in household | 5,007 | SPLOC | Constructed (household)
24 | Rule for locating spouse | 4,171 | SPRULE | Constructed (household)
25 | Location of mother in household | 4,153 | MOMLOC | Constructed (household)
26 | Number of children surviving | 4,074 | CHSURV |
27 | Place of residence 5 years ago | 4,064 | MGRATE5 |
28 | Location of father in household | 3,983 | POPLOC | Constructed (household)
29 | Total household income | 3,965 | INCTOT | Household variable
30 | Earned income | 3,655 | INCEARN |
31 | Number of rooms | 3,465 | ROOMS |
32 | Consensual union | 3,443 | CONSENS |

Page 16

For uses, see http://bibliography.ipums.org

Page 17

And: scholar.google.com, searching for "IPUMS" together with the name of a country, subject, etc.

Page 18

Minimum Standards for Samples Entrusted to IPUMS for Dissemination

1. Household samples only
2. High precision: 5% minimum, 10% preferred
3. Broad set of variables: omit only those required for statistical confidentiality (low-level geography, low-frequency attributes)
4. Detailed codes
  » Age: single year to 85
  » Occupation, industry: 3-digit ISCO, ISIC
  » Country of birth: detail individual countries, consistent with statistical confidentiality

» Thanks to INSEE France for a sample of the recensement rénové (renovated census), 2004-2008: 20 million person records to be launched next year.
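The first two standards are mechanical enough to check automatically. As an illustration, the sketch below applies them to the 2000-round samples in Table 2, where the "Sample %" column refers to the 2000-round census and "p" marks a person sample. The `SampleSpec` type and the thresholds are a reading of this slide, not an official IPUMS validation tool; standards 3 and 4 need expert review and are not modeled.

```python
from dataclasses import dataclass

MIN_PCT, PREFERRED_PCT = 5.0, 10.0  # standard 2: 5% minimum, 10% preferred


@dataclass(frozen=True)
class SampleSpec:
    country: str
    pct: float       # sample density, percent of the enumerated population
    household: bool  # False for person samples ("p" in Table 2)


def check(spec: SampleSpec) -> list[str]:
    """Return the minimum standards (1 and 2 only) that a sample fails."""
    problems = []
    if not spec.household:
        problems.append("not a household sample")
    if spec.pct < MIN_PCT:
        problems.append(f"density {spec.pct:g}% below {MIN_PCT:g}% minimum")
    return problems


# 2000-round samples as described in Table 2 (Sample %, household vs. person).
table2 = [
    SampleSpec("Mexico", 10, True), SampleSpec("Brazil", 5, True),
    SampleSpec("United States", 5, True), SampleSpec("Colombia", 10, True),
    SampleSpec("France", 5, True), SampleSpec("Canada", 2.5, False),
    SampleSpec("Spain", 5, True), SampleSpec("Greece", 10, True),
    SampleSpec("Hungary", 5, True), SampleSpec("Austria", 10, True),
    SampleSpec("Portugal", 5, True), SampleSpec("Romania", 10, True),
    SampleSpec("UK", 3, False), SampleSpec("Netherlands", 1, False),
    SampleSpec("Belarus", 10, True), SampleSpec("Italy", 5, True),
    SampleSpec("Slovenia", 10, True),
]

failing = {s.country: check(s) for s in table2 if check(s)}
meets_min_only = [s.country for s in table2
                  if not check(s) and s.pct < PREFERRED_PCT]

for country, problems in failing.items():
    print(f"{country}: {'; '.join(problems)}")
print("meets minimum, below preferred 10%:", ", ".join(meets_min_only))
```

The three failures (Canada, the UK, the Netherlands) are the same countries named earlier as invited to construct household samples.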

Page 19

Conclusion: Invitation to continued cooperation

» In 1999, our dream: integrate samples of 21 countries in 10 years
» Thanks to:
  » the generous cooperation of 55 National Statistical Offices
  » undreamed-of technological innovations
» By 2009, integrated samples for 44 countries
  » Number of users and usage far exceeded expectations
» For the 2010 decade, our dream:
  » Double the number of users
  » Double the number of integrated samples
  » Re-draw samples that do not meet minimum standards, where feasible
» Participating statistical agencies: please entrust 2010 samples in due course
» Other statistical agencies: entrust a series of samples for each census for which microdata exist

Page 20

…and to the 58th Session ISI: Dublin, Aug 21-26, 2011
http://www.isi2011.ie

» IPUMS Workshop, Aug 19-20
  » New IPUMS initiatives
  » Reports by IPUMS users
  » Reports by National Statistical Office partners
» IPUMS sponsorship for delegates from participating countries:
  » economy air
  » registration fees
  » 8 nights' accommodation and modest per diem
» Simultaneous interpretation: Russian/French/English

Page 21

Thank you for your cooperation!!

[email protected]@ced.uab.es

www.ipums.org/international
www.iecm-project.org