IPUMS & AICMD Add Value to African Census Microdata
Robert McCaa and Patricia Kelly-Hall
ASSD VII, January 2012, Cape Town, South Africa
ipums.org/international
ecastats.uneca.org/aicmd
For additional details, please see: www.hist.umn.edu/~rmccaa/ipums-africa
“Dissemination [means] opening up the value inherent in our data” -- Walter Radermacher (President, Eurostat) and Pieter Everaers (Director, Eurostat)
IPUMS+AICMD open up the value inherent in microdata for censuses throughout Africa.
The purpose of this talk: “value added by 3rd parties”
1. Encourage National Statistical Offices to entrust census microdata samples to the IPUMS-International project
2. Describe some of the value that IPUMS-International adds to integrated microdata and metadata:
» Free access to the microdata for bona fide researchers
» Extensive analysis of data quality before the samples are released
» Integrated metadata (compare questions in 1, 2, … many censuses)
» Integrated, pooled microdata (multiple censuses, countries)
3. Encourage usage of integrated samples by African researchers
» Usage is relatively low, but increasing quickly as more samples become available
Advantages of IPUMS for Ireland
• Bonus for CSO: as a result of this project, our historic data sets are now in a much more usable format
• IPUMS allows – mix of Census years available in 1 file
• Comparability with other countries
• Ease of access for users
• Positive publicity for Census in Ireland
Central Statistics Office-Ireland Deirdre Cullen, Senior Statistician, testimonial (not in the paper):
Introduction
» When NSOs disseminate microdata, the task is costly, risky and often unsatisfactory
» IPUMS+AICMD partnership offers a solution for African countries
» Invitation to participate, entrust microdata for 2010 and earlier censuses without undue delay
IPUMS+AICMD adds value to population microdata:
1. Statistical confidentiality and security – disclosure controls, restricted access
2. Integration – census microdata and metadata
3. Dissemination – custom tailored extracts: country(ies), census(es),
Why Statistical Offices entrust Responsibility of Disseminating Census Microdata to IPUMS-International
NSO dissemination is costly, risky and often unsatisfactory
» Costly: scarce human resources to prepare sample, assure statistical confidentiality, and manage access for relatively few users (however important they may be!)
» Risky: little experience in anonymizing and managing access to microdata, yet great responsibility
  » US Census Bureau anonymization protocol egregiously corrupted ages for elderly in ACS microdata; it took 5 years to discover the error!
» Unsatisfactory: excessive anonymization, slow to provide access. Troublesome for NSO statisticians who do not wish to risk their job to some academic. Most deny access to all but the most persistent, influential would-be users. Complaints (of a large European NSO):
  » “I haven't used the [microdata]; the bureaucracy was just too slow to get much use out of it.”
  » “[Access] is unbelievably bureaucratic and difficult – this discourages people from using it. It took me 6 months to get the data.”
IPUMS-International assumes responsibilities and risks for integrating & disseminating microdata and metadata
» Uniform Memorandum of Understanding with each NSO:
  » Founding partners (2001): Kenya, South Africa, Ghana, Egypt, France, Spain, China, Vietnam, Colombia, Mexico, USA … now almost 100 countries
  » Specific conditions of access: ownership of data (NSO), use, access, restrictions, confidentiality, security, publication, violations, sharing, jurisdiction, and precedence.
» Almost 100 countries entrust census microdata to IPUMS-I.
» 6 most populous countries NOT entrusting census microdata to IPUMS: India, *Nigeria, Russian Federation, Japan, Algeria, *Korea (RO; may join at the UNSC in New York)
  » * = negotiating
» No data: Congo (DR), Myanmar, Afghanistan, Uzbekistan, Somalia
90+ National Statistics Offices have endorsed the IPUMS-International Memorandum of Understanding
IPUMS-International results posted at http://bibliography.ipums.org
IPUMS Milestones
» 1995: IPUMS-USA first release of integrated microdata
» 1999: IPUMS-International funded by NSF & NIH
» 2002: 1st International launch: 7 countries, 25 samples
» 2007 launch (56th ISI): 32 countries, 89 samples
» 2009 launch (57th ISI): 44 countries, 130 samples
  » ~279 million person records
  » ~3,000 registered users
» 2011 launch (58th ISI): 62 countries, 185 samples
  » 397 million person records
  » 5,000 registered users
» 2013 (ISI Hong Kong!): ~70 countries, ~225 samples
  » ~500 million person records
  » ~7,000 registered users
Cartogram of IPUMS+AICMD partners weighted by population
Legend (microdata): disseminating; integrating; none entrusted; none inventoried
dark green = integrated and disseminating, 2002-2011
Open Invitation to Cooperate, Entrust and Access
The IPUMS-International team (includes National Science Foundation Board)
(Not present: some computer gurus, researchers, research assistants, civil service employees, and others who were not at the NSF Board meeting)
Steven Ruggles, inventor of IPUMS, Professor of History, and Director of the Minnesota Population Center
See pp. 3-5 of “IPUMS and AICMD Add Significant Value to African Census Microdata,” ASSD VII, Cape Town, South Africa, January 2012.
I. Statistical Confidentiality and Security
1. Statistical Confidentiality and Microdata Security
2. Statistical disclosure control protections
3. Restricted access
1. Statistical Confidentiality and security. Trusted researcher receives customized extracts
» NSI 1 … NSI 62+: each NSI entrusts census metadata and anonymized microdata to MPC
» MPC integrates metadata and …
» IPUMS-International manages access and entrusts researchers with custom-tailored <ddi>, SAS, STATA, and SPSS metadata and microdata extracts for any combination of countries, censuses, sub-populations, and variables
» Trusted researchers receive the extracts
» “...the best practice for an international repository of microdata”
» “The security of IPUMS is first class…the standard of the best national statistical offices”
» “...a valuable and trustworthy microdata service. It meets the fundamental principles of good practice with respect to confidentiality and microdata.”
» “in full compliance with the principles and recommendations of the CES [Conference of European Statisticians]”
Dennis Trewin, on-site evaluation. Former Australian Statistician; chair, Conference of European Statisticians Task Force on Microdata and Confidentiality
1. Microdata are anonymized by suppressing any names, addresses, or precise geographic identifiers.
2. Sample is drawn so that researchers have access to only a minor fraction of the complete dataset.
3. Disclosure protections are imposed on the sample, variable-by-variable and code-by-code.
4. A small fraction of households is swapped across geographic boundaries.
• See case of Switzerland with 5% household samples for four censuses.
• Suppression thresholds are set by each NSO.
• Great satisfaction from NSOs and researchers
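The four protections listed above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure IPUMS-International or any NSO actually runs: the field names (`name`, `address`, `region`, `persons`), the default fractions, and the choice of age top-coding as the example of a code-by-code protection are all assumptions made for the example.

```python
import random

# Hypothetical field names; real census files differ by country.
DIRECT_IDENTIFIERS = {"name", "address"}


def anonymize_sample(households, sample_fraction=0.10, swap_fraction=0.02,
                     top_age=85, seed=0):
    """Return an anonymized household sample (illustrative only)."""
    rng = random.Random(seed)

    # Step 2: draw a household sample so only a fraction is released.
    n_sample = max(1, round(len(households) * sample_fraction))
    sampled = rng.sample(households, n_sample)

    released = []
    for serial, hh in enumerate(sampled, start=1):
        persons = []
        for person in hh["persons"]:
            # Step 1: suppress names and other direct identifiers.
            rec = {k: v for k, v in person.items()
                   if k not in DIRECT_IDENTIFIERS}
            # Step 3 (one assumed example): top-code age.
            rec["age"] = min(rec["age"], top_age)
            persons.append(rec)
        # Keep only coarse geography; the household address is dropped.
        released.append({"serial": serial,
                         "region": hh["region"],
                         "persons": persons})

    # Step 4: swap the region codes of a small number of household pairs.
    n_swap = int(len(released) * swap_fraction) // 2 * 2
    chosen = rng.sample(range(len(released)), n_swap)
    for a, b in zip(chosen[::2], chosen[1::2]):
        released[a]["region"], released[b]["region"] = (
            released[b]["region"], released[a]["region"])
    return released
```

In practice the thresholds would be set by each NSO, as the slide notes; the defaults here are placeholders.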
3. Restricted access: Thwarting intruders by legal and administrative procedures
» Usage is restricted to bona-fide researchers who agree to stringent conditions of use to protect statistical confidentiality
  » 1,100 word application form; <5,300 word Facebook policy
  » Agree to 8 specific conditions of use
  » Supply extensive personal and institution details
  » Identify your employer’s Office for Protection of Human Subjects, IRB, etc.
  » Describe research detailing need for access
» Rogue intruders face legal and institutional sanctions
  » University attorney’s office is obligated to initiate sanctions against both individual and the institution (similar to NIH probationary status)
Links to Partner Statistical Agency Websites
Restricted Access: User Registration and Login
Despite the “P” (Public) in IPUMS, access to the microdata is restricted.
Application form for IPUMS-I requesting information on institutional affiliation
Conditions of use: must agree to each one, no exceptions
Data must not be redistributed without authorization.
All data extracted from the IPUMS-International database are intended solely for the use of the licensee. Under IPUMS-International agreements with collaborating agencies, redistribution of the data to third parties is prohibited. Each member of a research team using the data must apply for access and be licensed individually.
The microdata are intended only for scholarly research and educational purposes. These microdata are provided for the exclusive purposes of teaching and scholarly research, and may not be used for any other purposes without explicit written approval from the relevant official statistical authority.
Commercial use and redistribution of the microdata is strictly prohibited. Users are prohibited from using microdata acquired from the Integrated Public Use Microdata Series International or other authorized distributors in the pursuit of any commercial or income-generating venture either privately, or otherwise.
Use of the microdata must follow strict rules of confidentiality. Users will maintain the confidentiality of persons and households. Any attempt to ascertain the identity of persons or households from the microdata is prohibited. Alleging that a person or household has been identified in these data is also prohibited. Statistical results that might reveal the identity of persons or entities may not be reported or published in any form.
The microdata must always be safely secured. Users will implement security measures to prevent unauthorized access to microdata acquired from Integrated Public Use Microdata Series International, its partners or authorized distributors. Upon the completion of this research, data may be retained only if they can be safely secured. If security cannot be guaranteed, the microdata must be destroyed.
Scholarly publications are permitted, and must be cited appropriately. The publishing of research results based on IPUMS-International microdata is permitted in communications such as scholarly papers, journals and the like. The authors of these communications are required to cite Integrated Public Use Microdata Series-International and the relevant official statistical authority as the source of the microdata, and to indicate that the results and views expressed are those of the author. Users are requested to provide the IPUMS-International staff with a full citation for any publications resulting from their work with these data.
Any violation of this license agreement will result in disciplinary action, including possible loss of employment. Violation of this agreement will lead to revocation of this license, recall of all microdata acquired, a motion of censure to the relevant professional organization(s) and civil prosecution under national or international statutes, at the discretion of the Regents of the University of Minnesota and the official statistical agencies. Sanctions likewise may be taken against the institution with which the violator is affiliated.
User agrees to notify [email protected] regarding errors in the data.
See pp. 6-8 of “IPUMS and AICMD Add Significant Value to African Census Microdata,” ASSD VII, 2012.
Bibliography: view cites, link to publications
5. DDI Compatible Metadata (we share!)
Mapped in DDI; compatible with IHSN Microdata toolkit; copies entered into the NADA catalog and archive
Top 20 institutions using IPUMS-I (Appendix 4)
1. University of Michigan: 742
2. Columbia University: 701
3. Universitat de Barcelona, Spain: 615
4. Harvard University: 589
5. Inter-American Development Bank: 499
6. Arizona State University: 495
7. National University of Singapore, Singapore: 467
8. World Bank: 408
9. University of California - Berkeley: 362
10. Universidade Federal de Minas Gerais, Brazil: 314
11. University of Chicago: 285
12. Universidad del Valle, Colombia: 270
13. Institute for Health Metrics & Evaluation: 260
14. Princeton University: 237
15. University of Wisconsin - Madison: 234
16. Brown University: 229
17. University of Vienna, Austria: 229
18. University of Pittsburgh: 227
19. University of Delaware: 213
20. El Colegio de México, México: 214
Dissemination of microdata and metadata extracts
» The massive scale of IPUMS requires users to be selective:
  » Select country (or countries)
  » Select samples (census years)
  » Select variables (e.g., age, sex, educational attainment, etc.)
  » Select sub-populations (e.g., nurses)
  » Select sample density
» Once an extract request is submitted, the IPUMS extract engine:
  » Constructs the microdata extract
  » Constructs the metadata
  » Emails the researcher to retrieve the extract (password protected; transmission is encrypted with 128 bit SSL)
» The researcher downloads the extract, un-zips and analyzes
» Extract system validated as usage has soared
e. Analyze using own software
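The selection steps above amount to a filter over pooled records. The following is a minimal sketch of that idea, not the IPUMS extract engine itself: `ExtractRequest`, `build_extract`, the record fields, and the `keep_every` stand-in for sample density are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExtractRequest:
    """A researcher's selections (hypothetical structure)."""
    countries: set                 # e.g. {"KE", "GH"}
    years: set                     # census years (samples)
    variables: list                # variable names to include
    case_filter: Optional[Callable[[dict], bool]] = None  # sub-population
    keep_every: int = 1            # sample density: keep every k-th match


def build_extract(records, request):
    """Return the rows and columns that match the request."""
    rows = []
    matched = 0
    for rec in records:
        if rec["country"] not in request.countries:
            continue
        if rec["year"] not in request.years:
            continue
        if request.case_filter is not None and not request.case_filter(rec):
            continue
        # Thin the matching records to the requested density.
        if matched % request.keep_every == 0:
            row = {"country": rec["country"], "year": rec["year"]}
            row.update({v: rec[v] for v in request.variables})
            rows.append(row)
        matched += 1
    return rows
```

A request for nurses in one country and census year would pass `case_filter=lambda r: r["occ"] == "nurse"`, where `occ` is again an assumed field name. For simplicity the sketch thins individual records; the real samples are household samples.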
Use the extract system to “Select Cases”. Example: Disability
Second: Click the box to include the variable
Third: Click “select cases” box
Click here to select every person in households containing an individual with employment disability
Fourth: Scroll down, select “disabled”, then “Continue to next step”
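The household-level option described above (keep every person in any household that contains a matching individual) can be reproduced on a downloaded extract. This is a sketch under assumed field names: `serial` as the household identifier and `disabled` as the flag are illustrative, not the names used in any particular sample.

```python
def select_households_containing(persons, predicate, hh_key="serial"):
    """Keep every person living in a household where someone matches."""
    # First pass: find households with at least one matching person.
    matching = {p[hh_key] for p in persons if predicate(p)}
    # Second pass: keep all members of those households, in original order.
    return [p for p in persons if p[hh_key] in matching]


# Usage: everyone who lives with (or is) a person flagged as disabled.
people = [
    {"serial": 1, "pernum": 1, "disabled": True},
    {"serial": 1, "pernum": 2, "disabled": False},
    {"serial": 2, "pernum": 1, "disabled": False},
]
selected = select_households_containing(people, lambda p: p["disabled"])
```

Selecting at the household level keeps co-resident family members in the file, which matters for any analysis of living arrangements.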
2010 round censuses: Minimum Standards for Samples Entrusted to IPUMS for dissemination
1. Household samples
2. High precision: 5% minimum, 10% preferred
3. Broad set of variables: omit only those required for statistical confidentiality (low-level geography, low frequency attributes)
4. Detailed codes
» Age: single year to 85
» Occupation, industry: 3 digit ISCO, ISIC
» Country of birth: detail individual countries consistent with
» Thanks to INSEE France for sample of recensement rénové, 2004-2008: 20 million person records launched in IPUMS-I
See p. 11 of “IPUMS and AICMD Add Significant Value to African Census Microdata,” ASSD VII, Cape Town, South Africa, January 2012.
IV. Ethics
13. Statistical Transparency
14. Academic Freedom
15. Reduce Research Fraud and Exaggeration of Results
16. Share Research Results
1. Free, easy access to data for many countries and censuses
2. Large sample sizes:
• Make it possible to include many different variables in a regression… multi-level model…
• Produce separate estimates for population sub-groups
• Easy to extract samples with a target sample size (e.g., 50 MB)
• Easy to revise an extract for a larger size or to include more countries, censuses, variables or sub-populations
3. Students show a great deal of creativity in using IPUMS-I
4. Skills acquired have an immediate pay-off when applying for jobs (e.g., World Bank), graduate school, etc.
“IPUMS-I is an excellent resource for teaching…” -- Dr. David Lam, president, Population Association of America
Africa Mirror Site: http://ecastats.uneca.org/aicmd/
IPUMS-International: Free, Worldwide Microdata Access Now for Censuses of 62 Countries--80 by 2015
Robert McCaa, Steven Ruggles, Matt Sobek and Wendy L. Thomas
Session STS065: The Future of Microdata Access
58th International Statistical Institute, Dublin, Ireland, 26 August, 2011
“Dissemination [means] opening up the value inherent in our data” -- Walter Radermacher (President, Eurostat) and Pieter Everaers (Director, Eurostat)
IPUMS opens up the value inherent in census microdata:
» for the 2010 round
» for the 2000, 1990 and earlier rounds (where microdata exist)
» and for many countries
Thank you
To discuss cooperation, please speak with Dr. Patricia Kelly-Hall or email: