Part II – Introduction to SILC: Data Structure and Documentation
DwB Training Course on EU-SILC Longitudinal Data
Paris, 19-21 February 2014
Heike Wirth
Mar 22, 2016
2
Aims of this session
• Introduce the rotational design
• Explain the concept of the selected respondent
• Explain the organisation of the data
• Point out some reading: Documents of priority
3
Illustration of the rotational design
4
Rotational design - Illustration
[Figure: rotational design; labels "Initial sample" and "2006"]
5
Rotational design – Illustration cross-sectional
[Figure: rotational design, cross-sectional view; label "2006"]
6
Rotational design – Illustration longitudinal
7
Rotational design – Illustration longitudinal
[Figure: rotational design, longitudinal view, e.g. longitudinal data 2011; label "2006"]
8
Rotational design – empirical
Not equivalent to the number of years of participation
10
Rotational design – empirical
tab DB075 HHYNR
HHYNR (number of household-years)
Note: HHYNR (= number of household-years) is not included in the data; it must be created.
Source: UDB_l11D_ver 2011-1 from 01-08-2013.dta; own calculations
11
Rotational design – empirical
tab HHYNR YEAR
HHYNR (number of household-years)
Note: HHYNR (= number of household-years) is not included in the data; it must be created.
Source: UDB_l11D_ver 2011-1 from 01-08-2013.dta; own calculations
12
Rotational design – empirical
tab HHYCOUNT HHYNR
Note: HHYCOUNT (= count of household-years) is not included in the data; it must be created.
Source: UDB_l11D_ver 2011-1 from 01-08-2013.dta; own calculations
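Since HHYNR and HHYCOUNT have to be created by the user, the following is a minimal sketch of one way to derive them. It assumes that HHYNR is the running number of a household's observation year and HHYCOUNT the total number of years a household is observed, and that a household is identified by country plus household ID. The sample rows and the function name are hypothetical; DB010 (year), DB020 (country) and DB030 (household ID) are taken to be the relevant D-file variables.

```python
from collections import defaultdict

# Hypothetical D-file rows: (DB010 year, DB020 country, DB030 household ID)
d_file = [
    (2008, "A", 100100),
    (2009, "A", 100100),
    (2010, "A", 100100),
    (2010, "A", 200100),
    (2011, "A", 200100),
    (2011, "B", 100100),  # same ID in another country is a different household
]


def household_year_counters(rows):
    """Return {(year, country, hh_id): (HHYNR, HHYCOUNT)}.

    Assumption: HHYNR is the position of the year within the household's
    sorted observation years, HHYCOUNT the number of observation years.
    """
    years_by_household = defaultdict(list)
    for year, country, hh_id in rows:
        years_by_household[(country, hh_id)].append(year)

    counters = {}
    for (country, hh_id), years in years_by_household.items():
        ordered = sorted(years)
        for position, year in enumerate(ordered, start=1):
            counters[(year, country, hh_id)] = (position, len(ordered))
    return counters


counters = household_year_counters(d_file)
for key in sorted(counters, key=lambda k: (k[1], k[2], k[0])):
    print(key, counters[key])
```

Tabulating the two derived values against each other, or HHYNR against the survey year, then corresponds to the `tab` commands shown on the slides.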
13
Observation Units
Concept of the selected respondent
14
Selected respondent
Type of information | Observation unit | Collection unit/data source: survey countries | Collection unit/data source: register countries
Social exclusion, housing, childcare … | Household (HH) | HH-Respondent | Registers/HH-R
Basic demographic personal data | All HH-members | HH-Respondent | Registers/HH-R
Basic personal data on education, labour information, income … | All HH-members aged 16+ | All HH-members aged 16+ | Registers/HH-R
Detailed personal data on health, access to health care, labour market activity … | All HH-members aged 16+ or selected respondent | All HH-members aged 16+ | Selected respondent (one person 16+ per household)
15
16
Example: PH030- Limitation in activities because of health problems (register countries)
Source: UDB_l11P_ver 2011-1 from 01-08-2013.dta
(mainly) not selected respondents (see PH030_F)
17
Organisation of the data
18
Organisation of the data
EU-SILC consists of 4 separate files for the cross-sectional data:
• Household Register file
• Household Data file
• Personal Register file
• Personal Data file
19
Organisation of the data
… and of 4 separate data files for the longitudinal data:
• Household Register file
• Household Data file
• Personal Register file
• Personal Data file
20
Household Files – longitudinal
Household Register (D-File)
• Includes every selected household (also those whose address could not be contacted or which could not be interviewed)
• > 19 variables: household identifier, sampling design information, region
• UDB_l11D_ver 2011-1 from 01-08-2013: N = 542,942 households
Household Data (H-File)
• Only households which have been contacted, have completed a household interview, and have at least one member with complete data in the personal data file
• > 180 variables (incl. flag variables & imputation factors): basic data, social exclusion, income, housing
• UDB_l11H_ver 2011-1 from 01-08-2013: N = 411,189 households
21
Personal Files – longitudinal
Personal Register (R-File)
• Every person currently living in the household or temporarily absent
• Longitudinal file: also persons registered in the R-File of the previous year or living at least 3 months in the household during the income reference period
• > 50 variables (incl. flag variables): basic information, e.g. relationship between household members
• UDB_l11R_ver 2011-1 from 01-08-2013: N = 1,079,261 persons
Personal Data (P-File)
• Only the reference population (persons aged 16 and over), and only persons for whom the information could be completed by interview (personal/proxy) and/or register
• > 190 variables (incl. flag variables & imputation factors): e.g. demographic, income, work and unemployment
• UDB_l11P_ver 2011-1 from 01-08-2013: N = 879,720 persons
22
Depending on the research question: use of separate datasets
• Personal Register
• Personal Data
• Household Register
• Household Data
23
… or a combination of different datasets
• Personal Register
• Personal Data
• Household Register
• Household Data
24
Organisation of the data
Within the cross-sectional data and within the longitudinal data, all 4 files can be linked to each other. Cross-sectional and longitudinal data cannot be linked to each other.
[Diagram: Personal Register, Personal Data, Household Register and Household Data, shown once for the cross-sectional data and once for the longitudinal data]
25
Organisation of the data
… likewise, cross-sectional data cannot be linked over time (the household ID and the related identification variables are randomized)
[Diagram: Personal Register, Personal Data, Household Register and Household Data at t and at t+1]
26
Organisation of the data … combine different datasets – Key Variables
• To link (combine) the four files D, H, R and P with each other, every observation must have a unique link to the other three files
• This link is provided by the following 4 key variables:
  (1) Year of Survey
  (2) Country
  (3) Household ID
  (4) Personal ID
27
Organisation of the data… combine different datasets – Key Variables
Personal Register
Personal Data
Household Register
Household Data
Year of Survey, Country, Household ID, Personal ID
Year of Survey, Country, Household ID
Year of Survey, Country, Household ID
28
Organisation of the data: Household ID – Personal ID
• Household ID
  • Cross-sectional (max. 6 digits) = hh number 1-999999
  • Longitudinal (max. 8 digits) = hh number 1-999999 + split number (default split number = 00)
• Personal ID
  • Cross-sectional = hh-id + personal number (max. 2 digits)
  • Longitudinal = hh number + default split number (00) + personal number
• In the longitudinal survey the Personal ID never changes, even if the person moves to a different household
• In the cross-sectional survey the Household ID and Personal ID may change from year to year
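The composition of the longitudinal IDs can be made concrete with a small sketch. It assumes that the split number and the personal number each occupy exactly two zero-padded digits at the end of the ID, which is how the values in the deck's longitudinal illustration look. The function names are hypothetical.

```python
def split_longitudinal_household_id(hh_id):
    """Split a longitudinal Household ID into (hh number, split number).

    Assumption: the last two digits are the split number.
    """
    digits = str(hh_id)
    return int(digits[:-2]), int(digits[-2:])


def split_longitudinal_personal_id(pers_id):
    """Split a longitudinal Personal ID into
    (hh number, split number, personal number).

    Assumption: the last two digits are the personal number and the two
    digits before them are the split number.
    """
    digits = str(pers_id)
    return int(digits[:-4]), int(digits[-4:-2]), int(digits[-2:])


# Values as they appear in the deck's longitudinal illustration
print(split_longitudinal_household_id(40017101))    # (400171, 1)
print(split_longitudinal_personal_id(4001710003))   # (400171, 0, 3)
```

Person 4001710003 keeps the default split number 00 in the Personal ID even when observed in the split-off household 40017101, which is why the Personal ID, and not the Household ID, identifies a person over time.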
29
The 4 key variables – illustration (longitudinal data)
year  country  hh_id     pers_id     year of birth
2010  A        40017100  4001710001  1937
2010  A        40017100  4001710002  1939
2011  A        40017100  4001710001  1937
2011  A        40017100  4001710002  1939
2009  B        40017100  4001710001  1953
2009  B        40017100  4001710002  1956
2009  B        40017100  4001710003  1982
2009  B        40017100  4001710004  1984
2009  B        40017100  4001710005  1985
2010  B        40017100  4001710001  1953
2010  B        40017100  4001710002  1956
2010  B        40017100  4001710003  1982
2010  B        40017100  4001710004  1984
2010  B        40017100  4001710005  1985
2010  B        40017101  4001710003  1982
2010  B        40017101  4001710004  1984
2011  B        40017100  4001710001  1953
2011  B        40017100  4001710002  1956
2011  B        40017100  4001710005  1985
2011  B        40017101  4001710002  1956
2011  B        40017101  4001710003  1982
2011  B        40017101  4001710004  1984
30
Combining information from two separate files at a 1:1 level
31
Combined data
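A 1:1 combination of the Personal Register and the Personal Data could be sketched as follows. The key is (year, country, personal ID). All rows, the year_of_birth field name and the helper name are hypothetical; PH030 is used only as an example P-file variable. Persons who appear in the R-file but not in the P-file (for instance children under 16) are kept and flagged.

```python
# Hypothetical R-file extract, keyed by (year, country, personal ID)
r_file = {
    (2010, "A", 1001000001): {"year_of_birth": 1970},
    (2010, "A", 1001000002): {"year_of_birth": 1972},
    (2010, "A", 1001000003): {"year_of_birth": 2001},  # child, not in P-file
}

# Hypothetical P-file extract with the same key
p_file = {
    (2010, "A", 1001000001): {"PH030": 3},
    (2010, "A", 1001000002): {"PH030": 1},
}


def merge_one_to_one(register, data):
    """Attach the P-file variables to each R-file row with the same key.

    Dictionaries guarantee that each key occurs at most once per file,
    which is what makes this a 1:1 merge.
    """
    merged = {}
    for key, register_row in register.items():
        row = dict(register_row)
        row.update(data.get(key, {}))
        row["in_p_file"] = key in data
        merged[key] = row
    return merged


merged = merge_one_to_one(r_file, p_file)
for key, row in merged.items():
    print(key, row)
```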
32
Combining information from two separate files at a 1:n level
33
Combined data
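A 1:n combination attaches one household record to every person in that household. The sketch below assumes the key (year, country, household ID) and uses hypothetical rows; HX080 serves only as an example of a household-level variable, and the function name is an illustration, not part of any EU-SILC tooling.

```python
# Hypothetical H-file extract, keyed by (year, country, household ID)
h_file = {
    (2010, "A", 10010000): {"HX080": 0},
    (2011, "A", 10010000): {"HX080": 1},
}

# Hypothetical person-level rows (n persons per household and year)
persons = [
    {"year": 2010, "country": "A", "hh_id": 10010000, "pers_id": 1001000001},
    {"year": 2010, "country": "A", "hh_id": 10010000, "pers_id": 1001000002},
    {"year": 2011, "country": "A", "hh_id": 10010000, "pers_id": 1001000001},
    {"year": 2011, "country": "A", "hh_id": 10010000, "pers_id": 1001000002},
]


def merge_one_to_many(households, person_rows):
    """Copy the household variables onto every person of that household-year."""
    combined = []
    for person in person_rows:
        key = (person["year"], person["country"], person["hh_id"])
        row = dict(person)
        row.update(households.get(key, {}))
        combined.append(row)
    return combined


combined = merge_one_to_many(h_file, persons)
for row in combined:
    print(row)
```

Every person of the same household-year receives the same household value, so the combined data has as many rows as the person-level file.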
34
Use of separate sub datasets
Create household level variables from personal level data, e.g.
• Number of current household members
• Persons < 18 in household
• Age of the youngest child in household
• Number of unemployed hh-members
• Highest educational level in household
• …
35
Create new household level summary variables from person level information, e.g. household size, number of children, age of youngest child (< 18 years)
New hh-level variables added from hh-data:
year  country  hh_id  pers_id  RX010  hhsize  numchild  ychild  HX080
2010  a        6800   680001   36     3       1         17      0
2010  a        6800   680002   35     3       1         17      0
2010  a        6800   680003   17     3       1         17      0
2011  a        6800   680001   36     3       0         .       0
2011  a        6800   680002   36     3       0         .       0
2011  a        6800   680003   18     3       0         .       0
2011  b        6800   680001   69     2       0         .       0
2011  b        6800   680002   73     2       0         .       0
2010  b        7000   700001   80     2       0         .       0
2010  b        7000   700002   80     2       0         .       0
2008  c        7000   700001   42     3       1         2       1
2008  c        7000   700002   34     3       1         2       0
2008  c        7000   700003   2      3       1         2       0
2009  c        7000   700001   43     3       1         3       0
2009  c        7000   700002   35     3       1         3       0
2009  c        7000   700003   3      3       1         3       0
2010  c        7000   700001   44     3       1         4       1
2010  c        7000   700002   36     3       1         4       1
2010  c        7000   700003   4      3       1         4       1
2011  c        7000   700001   45     4       2         0       1
2011  c        7000   700002   37     4       2         0       1
2011  c        7000   700003   5      4       2         0       1
2011  c        7000   700004   0      4       2         0       1
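The household-level summary variables can be derived from person-level rows roughly as follows. This is a sketch under the assumption that RX010 holds the person's age and that a child is anyone younger than 18. The rows are a subset of the slide's example, the function name is hypothetical, and `None` stands in for the missing value shown as "." on the slide.

```python
from collections import defaultdict

# Person-level rows taken from the slide's example:
# (year, country, hh_id, pers_id, RX010 age)
persons = [
    (2010, "a", 6800, 680001, 36),
    (2010, "a", 6800, 680002, 35),
    (2010, "a", 6800, 680003, 17),
    (2011, "a", 6800, 680001, 36),
    (2011, "a", 6800, 680002, 36),
    (2011, "a", 6800, 680003, 18),
    (2011, "c", 7000, 700001, 45),
    (2011, "c", 7000, 700002, 37),
    (2011, "c", 7000, 700003, 5),
    (2011, "c", 7000, 700004, 0),
]


def household_summaries(rows, child_age_limit=18):
    """Return hhsize, numchild and ychild per (year, country, hh_id)."""
    ages_by_household = defaultdict(list)
    for year, country, hh_id, _pers_id, age in rows:
        ages_by_household[(year, country, hh_id)].append(age)

    summaries = {}
    for key, ages in ages_by_household.items():
        child_ages = [age for age in ages if age < child_age_limit]
        summaries[key] = {
            "hhsize": len(ages),
            "numchild": len(child_ages),
            # No child in the household: age of youngest child is missing
            "ychild": min(child_ages) if child_ages else None,
        }
    return summaries


summaries = household_summaries(persons)
for key, values in summaries.items():
    print(key, values)
```

The resulting values can then be written back to every person of the household-year with a 1:n merge on year, country and household ID.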
36
Some reading – Documents of priority
37
Some reading – Documents of priority
Guidelines_Doc65_2011.pdf
• General technical information on sample design, weights, etc.
• List of all variables included in the original EU-SILC database
• Description of (cross-sectional and longitudinal) variables
DIFFERENCES BETWEEN DATA COLLECTED AND UDB.doc
• List of variables removed from or added to the User Database (UDB)
• Methods of anonymisation
SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls
National and EU Quality reports
• http://epp.eurostat.ec.europa.eu/portal/page/portal/income_social_inclusion_living_conditions/quality
38
Some reading – Documents of priority
Guidelines_Doc65_2011.pdf
Source: Guidelines_Doc65_2011.pdf
39
Some reading – Documents of priority
Flag variable HH020_F
Source: Guidelines_Doc65_2011.pdf
40
Some reading – Documents of priority
Flag variable HH021_F
Source: Guidelines_Doc65_2011.pdf
41
Some reading – Documents of priority
Cross-sectional data 2011
Source: UDB_c11H_ver 2011-2 from 01-08-13.dta
42
Some reading – Documents of priority
Longitudinal data 2011
Source: UDB_l11H_ver 2011-1 from 01-08-2013.dta
[Cross-tabulation: New (HH021) by Old (HH020)]
43
Some reading – Documents of priority
Example: variable included in the cross-sectional and longitudinal data
Source: Guidelines_Doc65_2011.pdf
44
Some reading – Documents of priority
Example: variable included in the cross-sectional data only
Source: Guidelines_Doc65_2011.pdf
45
Some reading – Documents of priority
Example: variable included in the longitudinal data only
Source: Guidelines_Doc65_2011.pdf
46
Some reading – Documents of priority
Example: selected respondent
Source: Guidelines_Doc65_2011.pdf
47
Some reading – Documents of priority
Differences between data collected and User Database (cross-sectional file)
48
Some reading – Documents of priority
Differences between data collected and User Database (longitudinal file)
Source: L2011 DIFFERENCES BETWEEN DATA COLLECTED AND UDB.doc
49
Some reading – Documents of priority
Differences between data collected and User Database (cross-sectional file)
50
Some reading – Documents of priority
Differences between data collected and User Database (longitudinal file)
51
Some reading – Documents of priority
SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls
Source: SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls
52
Some reading – Documents of priority
Quality reports
53
Data Structure – Some reading
National quality reports
54
Data Structure – Some reading
e.g. Austria: Final Quality Report Relating to the EU-SILC Operation 2007-2010
55
Source: Austria, Final Quality Report Relating to the EU-SILC Operation 2007-2010, p. 7