Data Projects at the Minnesota Population Center Resources for Comparative Population and Health Research Seattle, Washington May 22, 2014 Elizabeth Boyle, Miriam King, Matthew Sobek Minnesota Population Center, University of Minnesota [email protected]
Data Projects at the Minnesota Population Center
Resources for Comparative Population and Health Research
Seattle, Washington, May 22, 2014
Elizabeth Boyle, Miriam King, Matthew Sobek Minnesota Population Center, University of Minnesota
We build data infrastructure for the research community. We specialize in data harmonization.
World’s largest collection of individual population and health data, across 9 projects.
50,000 registered users from over 100 countries.
Free
Minnesota Population Center
MPC Data Dissemination, 1993-2012
Gigabytes per week
MPC Data Projects
The Problem
1. Combining data from multiple sources is time consuming
   • Discovery
   • Data management
2. It’s error prone
   • Recoding data
   • Overlooking documentation
3. Hard to replicate results
4. Discourages comparative research
Outline
Harmonization methods
Dissemination system
International projects
   • Integrated DHS
   • Terra Populus
   • IPUMS-International
Terminology
Harmonization:
Combining datasets collected at different times or places into a single, consistent data series.
“Integration”
Metadata:
Data about data. Documentation in broadest sense.
Relation to head
Marital status Education Occupation
Microdata
Summary Data
Harmonization Methods
Metadata
Data
Dissemination
Systematize Metadata (record layout file, pdf)
MPC Data Dictionary

Variable  Start  Width  Value    Variable/Value Label                  Frequency  Universe
SMOKE100  57     1               Ever smoked 100 cigarettes                       All persons
                        1        Yes                                   54,189
                        2        No                                    59,501
                        7        Don't know/Not sure                   205
                        9        Refused                               39
SMOKENOW  58     1               Smoke cigarettes now                             Persons who ever smoked
                        1        Yes                                   25,644
                        2        No                                    28,535
                        7        Don't know/Not sure                   0
                        9        Refused                               10
                        Blank    [no label]                            59,745
SMOKE30   59     2               Number of days smoked in the last 30             Persons who currently smoke
                        1 to 30  Number of days                        25,290
                        77       Don't know/Not sure                   293
                        88       None                                  49
                        99       Refused                               12
                        Blank    [no label]                            88,290
SMOKENUM  61     2               Number of cigarettes smoked per day              Persons who currently smoke
                        0 to 76  Number of cigarettes                  22,292
                        77       Don't know/Not sure                   248
                        99       Refused                               43
                        Blank    [no label]                            91,351
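The Start and Width columns are what make a record layout machine-readable. As a minimal sketch, assuming the start columns are 1-based and that a blank field marks a person outside the variable's universe, the four smoking variables could be read from a fixed-width record like this. The function name `parse_record` and the sample records are made up for illustration.

```python
# Layout copied from the data dictionary: variable -> (start column, width).
# Assumption: start columns are 1-based, as is common in record layout files.
LAYOUT = {
    "SMOKE100": (57, 1),
    "SMOKENOW": (58, 1),
    "SMOKE30": (59, 2),
    "SMOKENUM": (61, 2),
}


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of raw string codes."""
    values = {}
    for name, (start, width) in layout.items():
        raw = line[start - 1 : start - 1 + width]
        # Assumption: a blank field means the person is not in the universe.
        values[name] = raw.strip() or None
    return values


# Made-up record: 56 filler columns, then a current smoker.
smoker = " " * 56 + "1" + "1" + "30" + "20"
# Made-up record: never smoked 100 cigarettes, so follow-up fields are blank.
non_smoker = " " * 56 + "2" + " " * 5

print(parse_record(smoker))
print(parse_record(non_smoker))
```

Keeping the layout in a data structure rather than in hand-written slicing code is the point of systematizing the metadata: the same parser works for any file once its dictionary has been entered.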
WaterAccess
Convert Questionnaires to Metadata (Mexico 2000)
5. Number of Rooms
How many rooms are used for sleeping without counting hallways? _____ Write the number
Without counting the hallways or bathrooms how many total rooms are in this dwelling? Count the kitchen
_____Write the number
6. Access to water
Read all of the options until you get an affirmative answer. Circle only one answer
1 Running water inside the dwelling
2 Running water outside the dwelling but on the land
3 Running water from a public faucet or hydrant
4 Running water that is carried from another dwelling
5 Tanked in by truck
6 Water from a well, river, lake, stream or other
Answers 3, 4, 5, 6 continue with number 8
7. Water supply
How many days of the week is water available? Circle only one answer
1 Daily
2 Every third day
3 Twice a week
4 Once a week
5 Occasionally
Metadata: Questionnaire Text
Water access
Bedrooms
Rooms
XML-Tagged Questionnaire Text
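A rough sketch of what tagging questionnaire text might look like, and how tagged text could then be pulled out per variable. The element name `item`, the attribute `var`, and the variable identifiers are hypothetical; the slides do not show the actual MPC tag set. The question wording comes from the Mexico 2000 questionnaire excerpt.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each passage is tagged with the variable it documents.
TAGGED = """
<questionnaire sample="Mexico 2000">
  <item var="bedrooms">How many rooms are used for sleeping without counting hallways?</item>
  <item var="rooms">Without counting the hallways or bathrooms how many total rooms are in this dwelling? Count the kitchen</item>
  <item var="water_access">Access to water: read all of the options until you get an affirmative answer.</item>
</questionnaire>
"""


def text_for(variable, xml_text=TAGGED):
    """Return every tagged questionnaire passage linked to one variable."""
    root = ET.fromstring(xml_text.strip())
    return [item.text for item in root.iter("item") if item.get("var") == variable]


print(text_for("rooms"))
print(text_for("water_access"))
```

Once the text is tagged this way, a documentation system can show the exact question wording next to each variable without anyone retyping it.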
Data: Variable Harmonization
Marital Status: IPUMS-International
Bangladesh 2011
1 = Unmarried
2 = Married
3 = Widowed
4 = Divorced/separated
Mexico 1970
1 = Married, civil & relig
2 = Married, civil
3 = Married, religious
4 = Consensual union
5 = Widowed
6 = Divorced
7 = Separated
8 = Single
Kenya 1999
1 = Never married
2 = Monogamous
3 = Polygamous
4 = Widowed
5 = Divorced
6 = Separated
Translation Table: Input

Bangladesh 2011              Mexico 1970                  Kenya 1999
1 = Unmarried                1 = Married, civil & relig   1 = Never married
2 = Married                  2 = Married, civil           2 = Monogamous
3 = Widowed                  3 = Married, religious       3 = Polygamous
4 = Divorced or separated    4 = Consensual union         4 = Widowed
                             5 = Widowed                  5 = Divorced
                             6 = Divorced                 6 = Separated
                             7 = Separated
                             8 = Single
Translation Table: Harmonized

Code  Label                     Bangladesh 2011             Mexico 1970                  Kenya 1999
100   Single                    1 = Unmarried               8 = Single                   1 = Never married
200   Married or in union       2 = Married
210     Married, formally
211       Civil                                             2 = Married, civil
212       Religious                                         3 = Married, religious
213       Civil and religious                               1 = Married, civil & relig
214       Monogamous                                                                     2 = Monogamous
215       Polygamous                                                                     3 = Polygamous
220     Consensual union                                    4 = Consensual union
300   Divorced or separated     4 = Divorced or separated
310     Separated                                           7 = Separated                6 = Separated
320     Divorced                                            6 = Divorced                 5 = Divorced
400   Widowed                   3 = Widowed                 5 = Widowed                  4 = Widowed
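The translation table amounts to a lookup from (sample, source code) to a three-digit harmonized code whose first digit carries the general category. The sketch below is illustrative only: the assignment of source codes to harmonized codes is inferred by matching labels on the slide, and the function names are made up.

```python
# Source code -> harmonized code, per sample.
# Assumption: pairings are inferred by matching the slide's labels.
TRANSLATION = {
    "Bangladesh 2011": {1: 100, 2: 200, 3: 400, 4: 300},
    "Mexico 1970": {1: 213, 2: 211, 3: 212, 4: 220, 5: 400, 6: 320, 7: 310, 8: 100},
    "Kenya 1999": {1: 100, 2: 214, 3: 215, 4: 400, 5: 320, 6: 310},
}

# Labels for the general (first-digit) categories.
GENERAL_LABELS = {
    1: "Single",
    2: "Married or in union",
    3: "Divorced or separated",
    4: "Widowed",
}


def harmonize(sample, source_code):
    """Look up the detailed harmonized code for one source code."""
    return TRANSLATION[sample][source_code]


def general_code(detailed_code):
    """Collapse a detailed code to its first digit, comparable across samples."""
    return detailed_code // 100


for sample, code in [("Mexico 1970", 6), ("Kenya 1999", 3), ("Bangladesh 2011", 4)]:
    detailed = harmonize(sample, code)
    print(sample, code, "->", detailed, GENERAL_LABELS[general_code(detailed)])
```

The composite code keeps all the detail each sample offers (Mexico's civil versus religious marriages, Kenya's polygamous unions) while the first digit gives a category every sample can support, such as Bangladesh's combined "divorced or separated".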
Data Dissemination System
Data Dissemination System
Variables Page
Variables Page
238 censuses
Sample Filtering
Variables Page – Filtered
Variable Page: Marital Status
Variable Codes (Marital status)
Variable Codes (Marital status)
Variable Codes (Marital status)
Variable Page: Marital Status
Variable Comparability Discussion (Marital status)
Variable Page: Documentation
Questionnaire Text
Questionnaire Text (Marital status, Cambodia)
Variables Page
Extract Summary
Case Selection
Age of spouse
Employment status of father
Occupation of father
Attached Characteristics
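Attaching a characteristic means copying a value from a related person in the same household (spouse, father) onto a person's own record. A minimal sketch, assuming each person record carries a pointer to the spouse's person number within the household: the field names `pernum`, `sploc`, and `age_sp` and the sample household are illustrative, not taken from the slides.

```python
# Made-up household: sploc holds the spouse's person number, 0 if none.
household = [
    {"pernum": 1, "age": 52, "sploc": 2},
    {"pernum": 2, "age": 49, "sploc": 1},
    {"pernum": 3, "age": 17, "sploc": 0},
]


def attach_spouse_age(persons):
    """Return copies of the person records with the spouse's age attached."""
    by_pernum = {p["pernum"]: p for p in persons}
    attached = []
    for person in persons:
        spouse = by_pernum.get(person["sploc"])
        # Persons without a co-resident spouse get None.
        attached.append({**person, "age_sp": spouse["age"] if spouse else None})
    return attached


for row in attach_spouse_age(household):
    print(row)
```

The same pattern with a father pointer would yield the father's employment status or occupation as a new variable on the child's record.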
Extract Summary
Download or Revise Extract
On-line Analysis
The International Projects
Integrated DHS
Foremost source of health information for the developing world
Funded by USAID
Since 1980s, over 300 surveys, 90 countries
Topics: fertility, nutrition, HIV, malaria, maternal and child health, etc.
Demographic and Health Surveys
5-year NIH grant (end of year 2)
Focus on Africa, with India
Partnership with ICF-International and USAID
IDHS Project
Motivation: DHS is incredibly valuable, but it’s hard to capitalize on its full potential.
Problem:
Data discovery
Dispersed documentation
Data management
Variable changes over time
Not unique to DHS: endemic to any survey that’s persisted over decades.
Why an Integrated DHS?
DHS Research Process Example: Find data on female genital cutting
Survey Search Tool
Recode notes
Data dictionary
Just the woman file – for one survey. 61 to go.
Still need Report (377 page pdf)
• Contains questionnaire and sample design information
• Errata file
DHS “Recode Variables” make it more harmonized than most surveys
• Consistent variable names
• Each DHS phase has a shared model questionnaire
But:
6 phases over 25+ years
Country control over final wording of surveys
Country-specific variables
The recode variables can be a two-edged sword
At least the DHS variables are already harmonized, right?
Harmonized code and label    Input codes in individual surveys
100 Muslim/Islam             4 = Muslim; 7 = Moslem; 1 = Muslim; 2 = Muslim
200 Christian                2 = Christian; 3 = Christian
201 Catholic                 2 = Catholic; 1 = Catholic
202 Protestant               1 = Protestant
203 Anglican                 2 = Anglican
204 Methodist                3 = Methodist
205 Presbyterian             4 = Presbyterian
206 Pentecostal              5 = Pentecostal
208 Other Christian          3 = Other Christian; 6 = Other Christian
300 Other
301 Hindu                    0 = Hindu; 1 = Hindu
302 Sikh                     3 = Sikh; 4 = Sikh
303 Buddhist                 5 = Buddhist
304 Jain                     6 = Jain
305 Jewish                   7 = Jewish
306 Parsi/Zoroastrian        8 = Parsi/Zoroastrian
307 Doni-Polo                10 = Donyi polo
400 Traditional/spiritual    8 = Trad/spiritualist
401 Traditional              5 = Traditional
402 Spiritual
403 Animist
500 No religion              0 = No religion; 9 = No religion; 9 = No religion
600 Other                    96 = Other; 4 = Other; 96 = Other