Disseminating official statistics with a focus on census microdata
Example: IPUMS-International http://www.ipums.org
* * *
Robert McCaa, Professor of Population History, and Wendy L. Thomas, Archivist, University of Minnesota Population Center
[email protected]
This .ppt, docs, & additional information at: www.hist.umn.edu/~rmccaa/ipums-africa
Roundtable on Archiving and Disseminating official statistics with a focus on census microdata
Our common fate on a crowded planet: new forms of global cooperation are required.
We must engage interdisciplinary research combining theory and practice.
--Jeffrey D. Sachs, Common Wealth (Penguin 2008)
A Census Microdata Revolution
1. Preserve all microdata and documentation (20 slides)
   Product (tables and microdata)
   Process (of conducting census and producing census microdata)
2. Integrate microdata and metadata (8 slides)
3. Disseminate to researchers world-wide (3 slides)
Conclusion: strengths, challenges, 7 golden rules (4 slides)
A Census Microdata Revolution
1. Preserve all census microdata and documentation, product and process: 1960s – present
   ~100 countries (80 have endorsed IPUMS MoU)
   ~400 censuses (219 are entrusted to IPUMS)
2. Integrate: both microdata and metadata
3. Disseminate to researchers world-wide: “extracts” of database (countries, censuses, sub-populations, sample size, variables)
IPUMS-International Today
dark green = already integrated: 35 countries, 111 censuses, 263 million person records
green = to be integrated: 39 countries, 103 censuses, 150 million
Mollweide projection
IPUMS dissemination calendar (see handout)
Samples for 35 countries available now, 74 soon
» Europe 10:4
» Available (10): Austria, Belarus, France, Greece, Hungary, Netherlands, Portugal,
» 2002 - 1st International release: 7 countries, including Colombia and Mexico
» 2008: 35 countries, 111 censuses
» ~263 million person records
» Two thousand users
» 2013: ~70 countries, ~200 censuses
» 214 sets of microdata are already entrusted to MPC
» Coming: Germany (8), Switzerland (4), Bangladesh (2), Cuba (1)...
1. Preserve (Archive)
IPUMS Global workshop, ISI (Lisbon, Aug 2007)
• Comprehensive preservation of both data and documentation (metadata) with easily searchable indices
• Continually updated with technological innovation: hardware, software (doc, pdf, txt, xls, jpg, etc.) and wet-ware
– Disseminating: the web revolution
• The consumer’s perspective (researchers)
– Access: locate and use on the web without obstacles
– Disseminating: free access to anyone, anywhere, anytime (access postponed is access denied)
• What are your interests?
Our perspective:
• “Archiving Census Microdata and Documentation: Preserving Memory, Increasing Stakeholders” (UNSD NYC, 2001) – copy of paper at ~rmccaa/ipums-africa
– Long term, 7 keys: readable, intelligible, identifiable,
– What to preserve: the product and the process
– How to assess future value: stakeholders, future impact, anticipated use, informing the future
Data recovery. Example: Bangladesh Bureau of Statistics (1981 census, 276 tapes, recovery in Aug. ‘08)
Census Microdata: 1950s
few countries archived microdata
(a country in green indicates microdata exist for the decade)
see: www.hist.umn.edu/~rmccaa/IUMSI/country6.htm
Census Microdata: 1960s
The Americas: in the vanguard for preservation of microdata
Census Microdata: 1970s
the preservation of microdata was almost universal in the Americas and was becoming widespread in Europe, Africa and Asia
Mali, 1976: census microdata recovered from old Bernoulli boxes
Census Microdata: 1980s
The preservation of microdata became generalized
Ghana, 1984: census microdata recovered from floppy discs!
Census Microdata: 1990s
many countries preserved microdata (or are disposed to recover them)
Census Microdata: 2000s
many countries have microdata (or are disposed to make them available for research)
Inventory of census microdata archived by region and decade (% of censuses conducted)
Region/continent        Countries  2000s  1990s  1980s  1970s  1960s
Latin America           21         100%   100%   89%    81%    72%
North America           27         91%    72%    64%    24%    8%
Africa                  58         15%    22%    25%    15%    2%
Asia                    44         ?%     54%    31%    30%    13%
Europe                  46         ?%     67%    55%    41%    13%
Pacific (pop. > 0.5m)   7          100%   100%   100%   43%    29%
• Note: cases confirmed by the corresponding official statistical institute. Some datasets remain to be certified. Some countries have not responded to the invitation to inventory their stocks of data.
Source: http://www.hist.umn.edu/~rmccaa/IPUMS/country6.htm
7 Essential Types of Metadata for Each Census
See IPUMS Documentation (“Table 1”)
1. Census Questionnaires (forms): dwellings, households, persons, mortality, migration, etc.
3. Data Dictionaries (layouts)
4. Codebooks
a. Geographic codes
b. Occupation / Industry / Education codes
5. Data processing protocols
6. Official Statistics
7. Official Reports (Analytical, Technical, Methodological)
7 Essential Types of Metadata for Each Census
Example: Ghana
2. Integration: Microdata and Metadata
IPUMS integration of metadata and microdata
» Comprehensive documentation, including
» Data dictionaries and codebooks
» Complete original source documentation in the official language: questionnaires, manuals, etc.
» All translated to English (from the German--thanks again to Martin Podehl!!) and converted into metadatabase for each census
retains not only significant distinctions but also integrates comparable concepts
Goal of integration coding scheme: Assist each researcher in making informed decisions on comparability, not to attempt to make the one best decision for all researchers.
Metadata: Employment Status
Translation Table for Employment Status
Harmonized Codes and Labels; Source Data Codes (selected samples)
Columns: IPUMSI, IPUMSI, Col, Col, Fra, Fra, Ken, Mex, Mex, US, Viet, Viet
Note: In the source data columns: a comma indicates more than one code was coded to the respective IPUMS-International value; an asterisk means programming logic was used; B indicates a blank in the source data.
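The cell conventions in the translation-table note (comma, asterisk, B) can be read mechanically. The following is a minimal Python sketch, not IPUMS code: the names `SourceCell` and `parse_source_cell` and the sample cell strings are hypothetical, and it assumes an asterisk may appear alongside codes in the same cell.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SourceCell:
    codes: List[str] = field(default_factory=list)  # source codes mapped to one harmonized value
    uses_logic: bool = False                        # asterisk: programming logic was used
    includes_blank: bool = False                    # B: blank in the source data


def parse_source_cell(cell: str) -> SourceCell:
    """Parse one source-data cell of a translation table (assumed text format)."""
    parsed = SourceCell()
    text = cell.strip()
    if "*" in text:
        parsed.uses_logic = True
        text = text.replace("*", "")
    # A comma separates several source codes that share one harmonized value.
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        if token == "B":
            parsed.includes_blank = True
        else:
            parsed.codes.append(token)
    return parsed
```

Under these assumptions, a hypothetical cell such as `"1, 2"` yields two source codes, `"3*"` yields one code flagged as derived by programming logic, and `"B"` records only a blank.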
EMPSTAT
Employment status
Description
EMPSTAT indicates whether or not the respondent was part of the labor force -- working or seeking work -- over a specified period of time. Depending on the sample, EMPSTAT can also convey further information.
The first digit of EMPSTAT is fully comparable, and classifies the population into three groups: employed, unemployed, and inactive. The combination of employed and unemployed yields the total labor force. The second and third digits of EMPSTAT preserve additional information available for some countries and census years but not for others.
Employment status is sometimes referred to in other sources as "activity status."
Comparability -- General
The age of persons to whom the question applies varies across the samples (see Universe).
The reference period for the employment status question varies. For most samples, employment status was reported with respect to the day of the census or…
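The first-digit rule described above can be sketched in a few lines of Python. This is an illustration only: the slide names the three groups but not their numeric values, so the mapping `ASSUMED_GROUPS` (1, 2, 3) and the function names are assumptions, as are any sample codes passed in.

```python
# Assumed first-digit labels; the source gives the group names, not the digits.
ASSUMED_GROUPS = {1: "employed", 2: "unemployed", 3: "inactive"}


def general_group(empstat: int) -> str:
    """Collapse a detailed EMPSTAT code to its fully comparable first digit."""
    first_digit = int(str(abs(empstat))[0])
    return ASSUMED_GROUPS.get(first_digit, "other/unknown")


def in_labor_force(empstat: int) -> bool:
    """Employed plus unemployed make up the total labor force."""
    return general_group(empstat) in ("employed", "unemployed")
```

Working from the first digit keeps cross-country comparisons on the fully comparable part of the variable, while the remaining digits stay available for countries and census years that supply the extra detail.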
Metadata: Employment Status, example: Mexico
Translation Table for Employment Status (harmonized codes and labels; source data codes for selected samples)
Comparability -- Mexico
The universe and reference period are fully comparable across the Mexico samples.
The 1970 Census did not provide detail on the inactive population except for "houseworkers," while the later samples have numerous subcategories.
In 1990, the employment status question refers to "Principal Activity" and therefore under-reports secondary economic activity by students, housewives, family-workers, the semi-retired, and others.
The 2000 Census sought to overcome deficiencies in reporting work status for people whose primary activity was not work (students, housewives, retirees, etc.), but who in fact were working according to international definitions. A second question introduced for the first time in 2000 sought to capture this secondary economic activity. For strict comparability with earlier Mexican censuses, this recovered activity (codes 1101-1106) should be considered "inactive." …
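A researcher who wants the strict comparability described in the Mexico note could recode the recovered-activity codes before analysis. The sketch below is one hedged way to do it in Python: the code range 1101-1106 comes from the note, while the function name and the `inactive_code` parameter are hypothetical, since the note does not say which code should stand for "inactive".

```python
# Codes 1101-1106: secondary economic activity recovered by the 2000 census.
RECOVERED_ACTIVITY = range(1101, 1107)


def recode_for_strict_comparability(code: int, inactive_code: int) -> int:
    """Treat recovered activity as inactive; leave every other code unchanged.

    inactive_code is supplied by the caller (an assumption of this sketch).
    """
    return inactive_code if code in RECOVERED_ACTIVITY else code


def recode_all(codes, inactive_code):
    """Apply the recode to a sequence of employment status codes."""
    return [recode_for_strict_comparability(c, inactive_code) for c in codes]
```

Keeping the recode as an explicit, optional step matches the stated goal of the integration scheme: the harmonized data retain the detail, and each researcher decides whether strict comparability with earlier censuses is needed.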
Integrate: retain all significant detail, harmonize everything
Not standardize: force square pegs in round holes
IPUMS integrated metadata: Instantly, compare text &/or image of enumeration forms and instructions for any combination of countries and censuses (example: educational attainment)
In addition…
» Microdata: new high precision samples not only for contemporary censuses but also for historical ones (before the 90s)
» Systematic metadata for all variables
» Universes
» Definitions
» Comparability
» Dynamic System: facilitates comparing the wording of questionnaires and instructions for any combination of countries and censuses
3. Dissemination
- Caution -
• IPUMS microdata are anonymized samples.
– They are for advanced analysis and research.
– Use of statistical software is required.
– Statistical software provides great power.
– “With great power, comes great responsibility.”
• IPUMS samples are for analysis.
• IPUMS samples are not official statistics.
Conclusion: IPUMS Strengths and Challenges plus 7 golden rules for promoting microdata revolution
The IPUMS team (Feb. 2008)
(Not present: computer gurus, some researchers, and others who were too busy for a photo!)
Steven Ruggles, inventor of IPUMS, Professor of History, and Director of the Minnesota Population Center
IPUMS Strengths
1. Uniform legal authorization with national statistical authorities
2. Access restricted to academics with need who agree to abide by stringent confidentiality protections
3. Sanctions against individual and institution: denial of access to all microdata for the entire institution
4. Experienced integration teams
5. Proven web-based distribution system
6. High user satisfaction with microdata & metadata
IPUMS Challenges
1. Microdata to recover (30 countries), integrate (60 countries)
2. 2010 round of censuses (~100 countries)
3. Tabulator (research tool, not official stats)
4. GIS
5. High security laboratory for sensitive, comprehensive microdata
7 golden rules for the global microdata revolution
1. Respect “restricted-access” conditions of use:
» protect confidentiality
» “share” data only with registered users
2. Study both source documentation and metadata:
» Original source: census forms, instructions to enumerators, etc.
3. Construct extracts judiciously:
» extract only needed countries, censuses, variables, sub-pops
» use sample size &/or “subsamp” features to keep samples small
4. Use weights: either households or individuals (geographical strata = power)
5. Analyze carefully: proper statistical techniques, keeping in mind data quality, sample error
6. Cite properly: IPUMS and National Statistical Agencies
7. Share publications: IPUMS and National Statistical Agencies
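Rule 4 (use weights) can be illustrated with a small weighted tabulation. This Python sketch assumes each record is a dictionary carrying a category and a person weight; the field names `group` and `weight` and the sample values are hypothetical and do not reflect actual IPUMS variable names.

```python
from collections import defaultdict


def weighted_shares(records, category_key, weight_key):
    """Return each category's share of the total weight (not of the row count)."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[category_key]] += rec[weight_key]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: w / grand_total for cat, w in totals.items()}


# Hypothetical sample: three person records with unequal weights.
sample = [
    {"group": "employed", "weight": 10.0},
    {"group": "employed", "weight": 30.0},
    {"group": "inactive", "weight": 60.0},
]
shares = weighted_shares(sample, "group", "weight")
```

In this made-up sample, two of the three records are employed, but they carry only 40 of the 100 weight units, so the weighted employed share is 0.4 rather than the unweighted two thirds. Results of this kind remain research tabulations, not official statistics.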