Part II – Introduction to SILC Data Structure and Documentation. DwB Training Course on EU-SILC Longitudinal data, Paris, 19-21 February 2014. Heike Wirth

Mar 22, 2016

Transcript
Page 1: Part II –  Introduction to  SILC Data  Structure and Documentation

Part II – Introduction to SILC

Data Structure and Documentation

DwB Training Course on EU-SILC Longitudinal data, Paris, 19-21 February 2014

Heike Wirth

Page 2: Part II –  Introduction to  SILC Data  Structure and Documentation

2

Aims of this session

• Introduce the rotational design
• Explain the concept of the selected respondent
• Explain the organisation of the data
• Point out some reading: Documents of priority

Page 3: Part II –  Introduction to  SILC Data  Structure and Documentation

3

Illustration of the rotational design

Page 4: Part II –  Introduction to  SILC Data  Structure and Documentation

4

Rotational design - Illustration

Initial sample

2006

Page 5: Part II –  Introduction to  SILC Data  Structure and Documentation

5

Rotational design – Illustration cross-sectional

2006

Page 6: Part II –  Introduction to  SILC Data  Structure and Documentation

6

Rotational design – Illustration longitudinal

Page 7: Part II –  Introduction to  SILC Data  Structure and Documentation

7

Rotational design – Illustration longitudinal

e.g. longitudinal data 2011

2006

Page 8: Part II –  Introduction to  SILC Data  Structure and Documentation

8

Rotational design – empirical

Not equivalent to the number of years of participation

Page 9: Part II –  Introduction to  SILC Data  Structure and Documentation

10

Rotational design – empirical

tab DB075 HHYNR

HHYNR (number of hh-year)

HHYNR (= number of household year) is not included in the data and must be created.

Source: UDB_l11D_ver 2011-1 from 01-08-2013.dta; own calculations

Page 10: Part II –  Introduction to  SILC Data  Structure and Documentation

11

Rotational design - empirical

HHYNR (= number of household year) is not included in the data and must be created.

Source: UDB_l11D_ver 2011-1 from 01-08-2013.dta; own calculations

HHYNR (number of hh-year)

tab HHYNR YEAR

Page 11: Part II –  Introduction to  SILC Data  Structure and Documentation

12

Rotational design - empirical

tab HHYCOUNT HHYNR

HHYCOUNT (= count of household-years) is not included in the data and must be created.

Source: UDB_l11D_ver 2011-1 from 01-08-2013.dta; own calculations
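The slides state that HHYNR and HHYCOUNT are not in the data and must be created, but do not show how. The following is a minimal Python sketch (the slides themselves use Stata) under one possible reading: HHYNR is the running number of a household's years in the file, and HHYCOUNT is the total number of years in which the household appears. The records and the function name are hypothetical; only the key fields year, country and household ID are taken from the slides.

```python
from collections import defaultdict

# Hypothetical household register records: (year, country, hh_id).
d_file = [
    (2008, "A", 100), (2009, "A", 100), (2010, "A", 100),
    (2010, "A", 200), (2011, "A", 200),
    (2011, "B", 100),
]


def add_household_year_counters(records):
    """Attach HHYNR (running number) and HHYCOUNT (total) to each household-year.

    Assumption: a household is identified by country + hh_id, and its years
    are numbered in ascending order starting at 1.
    """
    years_by_household = defaultdict(list)
    for year, country, hh_id in records:
        years_by_household[(country, hh_id)].append(year)

    result = {}
    for (country, hh_id), years in years_by_household.items():
        for position, year in enumerate(sorted(years), start=1):
            result[(year, country, hh_id)] = {
                "HHYNR": position,
                "HHYCOUNT": len(years),
            }
    return result


counters = add_household_year_counters(d_file)
```

Cross-tabulating the two derived variables then corresponds to the `tab HHYCOUNT HHYNR` command on the slide.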

Page 12: Part II –  Introduction to  SILC Data  Structure and Documentation

13

Observation Units

Concept of the selected respondent

Page 13: Part II –  Introduction to  SILC Data  Structure and Documentation

14

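Selected respondent

Type of information: Social exclusion, housing, childcare …
  Observation unit: Household (HH)
  Collection unit/data source, survey countries: HH-Respondent
  Collection unit/data source, register countries: Registers/HH-R

Type of information: Basic demographic personal data
  Observation unit: All HH-members
  Collection unit/data source, survey countries: HH-Respondent
  Collection unit/data source, register countries: Registers/HH-R

Type of information: Basic personal data on education, labour information, income …
  Observation unit: All HH-members aged 16+
  Collection unit/data source, survey countries: All HH-members aged 16+
  Collection unit/data source, register countries: Registers/HH-R

Type of information: Detailed personal data on health, access to health care, labour market activity …
  Observation unit: All HH-members aged 16+ or Selected respondent
  Collection unit/data source, survey countries: All HH-members aged 16+
  Collection unit/data source, register countries: Selected respondent (one person 16+ per household)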

Page 14: Part II –  Introduction to  SILC Data  Structure and Documentation

15

Page 15: Part II –  Introduction to  SILC Data  Structure and Documentation

16

Example: PH030- Limitation in activities because of health problems (register countries)

Source: UDB_l11P_ver 2011-1 from 01-08-2013.dta

(mainly) not selected respondents (see PH030_F)

Page 16: Part II –  Introduction to  SILC Data  Structure and Documentation

17

Organisation of the data

Page 17: Part II –  Introduction to  SILC Data  Structure and Documentation

18

EU-SILC consists of 4 separate files for the cross-sectional data

Organisation of the data

Household Register FILE

Household Data FILE

Personal Register FILE

Personal Data FILE

Page 18: Part II –  Introduction to  SILC Data  Structure and Documentation

19

… and of 4 separate data files for the longitudinal data

Organisation of the data

Household Register FILE

Household Data FILE

Personal Register FILE

Personal Data FILE

Page 19: Part II –  Introduction to  SILC Data  Structure and Documentation

20

Household Files - longitudinal

Household Register: D-File

• Includes every selected household (also those where the address could not be contacted or which could not be interviewed)
• > 19 variables: household identifier, sampling design information, region

UDB_l11D_ver 2011-1 from 01-08-2013: N = 542 942 households

Household Data: H-File

• Only households which have been contacted, have completed a hh interview, and in which at least one hh member has complete data in the personal data file
• > 180 variables (incl. flag variables & imputation factors): basic data, social exclusion, income, housing

UDB_l11H_ver 2011-1 from 01-08-2013: N = 411 189 households
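Because the D-File holds every selected household and the H-File only interviewed ones, it can be useful to check which household-years appear in one file but not the other. The sketch below is a hedged Python illustration with hypothetical keys and a hypothetical function name; it assumes both files identify a household-year by year, country and household ID.

```python
# Hypothetical keys (year, country, hh_id) present in each file.
d_keys = {
    (2010, "A", 40017100),
    (2010, "A", 40017200),  # selected, but no completed interview
    (2011, "A", 40017100),
}
h_keys = {
    (2010, "A", 40017100),
    (2011, "A", 40017100),
}


def compare_household_files(register_keys, data_keys):
    """Split household-years into those in both files and those in only one."""
    return {
        "in_both": register_keys & data_keys,
        "register_only": register_keys - data_keys,
        "data_only": data_keys - register_keys,
    }


comparison = compare_household_files(d_keys, h_keys)
```

Under the inclusion rules described on the slide, `data_only` would be expected to stay empty, while `register_only` collects the households that were selected but not interviewed.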

Page 20: Part II –  Introduction to  SILC Data  Structure and Documentation

21

Personal Files - longitudinal

Personal Register: R-File

• Every person currently living in the hh or temporarily absent. Longitudinal file: also persons registered in the R-File of the previous year or living at least 3 months in the hh during the income reference period.
• > 50 variables (incl. flag variables): basic information, e.g. relationship between household members

UDB_l11R_ver 2011-1 from 01-08-2013: N = 1,079,261 persons

Personal Data: P-File

• Only the reference population (persons aged 16 and over), and only persons for whom the information could be completed by interview (personal/proxy) and/or register
• > 190 variables (incl. flag variables & imputation factors): e.g. demographic, income, work and unemployment

UDB_l11P_ver 2011-1 from 01-08-2013: N = 879,720 persons

Page 21: Part II –  Introduction to  SILC Data  Structure and Documentation

22

Personal Register

Personal Data

Household Register

Household Data

Depending on the research question: Use of separate datasets

Page 22: Part II –  Introduction to  SILC Data  Structure and Documentation

23

Personal Register

Personal Data

Household Register

Household Data

…. or a combination of different datasets

Page 23: Part II –  Introduction to  SILC Data  Structure and Documentation

24

Within both the cross-sectional (c-s) and the longitudinal data, all 4 files can be linked to each other, but the cross-sectional data cannot be linked to the longitudinal data

Organisation of the data

[Diagram: the four files (Personal Register, Personal Data, Household Register, Household Data), shown once for the cross-sectional data and once for the longitudinal data]

Page 24: Part II –  Introduction to  SILC Data  Structure and Documentation

25

… and cross-sectional data are not linkable over time either (HH-ID and related identification variables are randomised)

Organisation of the data

[Diagram: the four files (Personal Register, Personal Data, HH Register, HH Data), shown once for time t and once for time t+1]

Page 25: Part II –  Introduction to  SILC Data  Structure and Documentation

26

Organisation of the data … combine different datasets – Key Variables

• In order to link (combine) the four files D, H, R and P with each other, every observation must have a unique link to the respective three other files

• This link is achieved by the following 4 key variables: (1) Year of Survey, (2) Country, (3) Household ID, (4) Personal ID

Page 26: Part II –  Introduction to  SILC Data  Structure and Documentation

27

Organisation of the data… combine different datasets – Key Variables

Personal Register

Personal Data

Household Register

Household Data

Year of Survey, Country, Household ID, Personal ID

Year of Survey, Country, Household ID

Year of Survey, Country, Household ID

Page 27: Part II –  Introduction to  SILC Data  Structure and Documentation

28

Organisation of the data: Household ID – Personal ID

• Household ID
  • Cross-sectional (max. 6 digits) = hh number 1-999999
  • Longitudinal (max. 8 digits) = hh number 1-999999 + split number (default split number = 00)

• Personal ID
  • Cross-sectional = hh-id + personal number (max. 2 digits)
  • Longitudinal = hh number + default split number (00) + personal number

In the longitudinal survey the Personal ID never changes, even if the person moves to a different household.

In the cross-sectional survey, the Household ID and Personal ID may change from year to year.
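The composition of the longitudinal IDs can be illustrated by taking them apart again. This is a minimal Python sketch, assuming that the split number occupies the last two digits of the Household ID and the personal number the last two digits of the Personal ID; the function names are hypothetical.

```python
from typing import Tuple


def split_household_id(hh_id: int) -> Tuple[int, int]:
    """Return (hh_number, split_number) of a longitudinal Household ID."""
    return divmod(hh_id, 100)


def split_personal_id(pers_id: int) -> Tuple[int, int, int]:
    """Return (hh_number, split_number, personal_number) of a longitudinal Personal ID."""
    household_part, personal_number = divmod(pers_id, 100)
    hh_number, split_number = divmod(household_part, 100)
    return hh_number, split_number, personal_number


original_household = split_household_id(40017100)
split_off_household = split_household_id(40017101)
person = split_personal_id(4001710003)
```

Because the Personal ID keeps the household number and default split number it was first given, it still points back to the original household after a move, while the current household is identified by the Household ID.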

Page 28: Part II –  Introduction to  SILC Data  Structure and Documentation

29

The 4 key variables – illustration (longitudinal data)

year  country  hh_id     pers_id     year of birth
2010  A        40017100  4001710001  1937
2010  A        40017100  4001710002  1939
2011  A        40017100  4001710001  1937
2011  A        40017100  4001710002  1939
2009  B        40017100  4001710001  1953
2009  B        40017100  4001710002  1956
2009  B        40017100  4001710003  1982
2009  B        40017100  4001710004  1984
2009  B        40017100  4001710005  1985
2010  B        40017100  4001710001  1953
2010  B        40017100  4001710002  1956
2010  B        40017100  4001710003  1982
2010  B        40017100  4001710004  1984
2010  B        40017100  4001710005  1985
2010  B        40017101  4001710003  1982
2010  B        40017101  4001710004  1984
2011  B        40017100  4001710001  1953
2011  B        40017100  4001710002  1956
2011  B        40017100  4001710005  1985
2011  B        40017101  4001710002  1956
2011  B        40017101  4001710003  1982
2011  B        40017101  4001710004  1984
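The illustration suggests why all four key variables are needed: the same Household ID and Personal ID occur in countries A and B, and a person can appear under two Household IDs in the same year. The Python sketch below checks this on six rows copied from the slide; the helper function is hypothetical.

```python
from collections import Counter

# Rows copied from the slide: (year, country, hh_id, pers_id).
rows = [
    (2010, "A", 40017100, 4001710001),
    (2011, "A", 40017100, 4001710001),
    (2010, "B", 40017100, 4001710001),
    (2010, "B", 40017100, 4001710003),
    (2010, "B", 40017101, 4001710003),
    (2011, "B", 40017101, 4001710003),
]


def count_duplicates(records, positions):
    """Count records whose key (the fields at `positions`) occurs more than once."""
    keys = Counter(tuple(record[i] for i in positions) for record in records)
    return sum(n for n in keys.values() if n > 1)


dup_pers_only = count_duplicates(rows, [3])           # Personal ID alone
dup_year_pers = count_duplicates(rows, [0, 3])        # year + Personal ID
dup_all_keys = count_duplicates(rows, [0, 1, 2, 3])   # all four key variables
```

On these rows only the combination of all four key variables identifies every record uniquely.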

Page 29: Part II –  Introduction to  SILC Data  Structure and Documentation

30

Combining information from two separate files at a 1:1 level

Page 30: Part II –  Introduction to  SILC Data  Structure and Documentation

31

Combined data
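A 1:1 combination can be sketched as follows. This is a hedged Python illustration, not the procedure shown on the slide: it assumes both files carry the four key variables and that each key occurs at most once per file. The records, the field `general_health` and the function name are hypothetical.

```python
# Hypothetical records keyed by (year, country, hh_id, pers_id).
personal_register = {
    (2010, "A", 40017100, 4001710001): {"birth_year": 1937},
    (2010, "A", 40017100, 4001710002): {"birth_year": 1939},
    (2010, "B", 40017100, 4001710005): {"birth_year": 1985},
}
personal_data = {
    (2010, "A", 40017100, 4001710001): {"general_health": 2},
    (2010, "A", 40017100, 4001710002): {"general_health": 3},
}


def merge_one_to_one(left, right):
    """Combine records sharing a key; each key occurs at most once per file."""
    merged = {}
    for key in left.keys() & right.keys():
        merged[key] = {**left[key], **right[key]}
    return merged


combined = merge_one_to_one(personal_register, personal_data)
```

This version keeps only records found in both files; keeping unmatched register records as well would be an equally valid choice, depending on the research question.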

Page 31: Part II –  Introduction to  SILC Data  Structure and Documentation

32

Combining information from two separate files at a 1:n level

Page 32: Part II –  Introduction to  SILC Data  Structure and Documentation

33

Combined data
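A 1:n combination attaches one household record to every person in that household. The Python sketch below assumes the link runs over year, country and Household ID; the records, the field `hh_income` and the function name are hypothetical.

```python
# Hypothetical household-level records keyed by (year, country, hh_id).
household_data = {
    (2010, "B", 40017100): {"hh_income": 30000},
    (2010, "B", 40017101): {"hh_income": 18000},
}
# Hypothetical person-level records.
persons = [
    {"year": 2010, "country": "B", "hh_id": 40017100, "pers_id": 4001710001},
    {"year": 2010, "country": "B", "hh_id": 40017100, "pers_id": 4001710002},
    {"year": 2010, "country": "B", "hh_id": 40017101, "pers_id": 4001710003},
]


def merge_one_to_many(household_records, person_records):
    """Copy each household's fields onto every person living in it."""
    combined = []
    for person in person_records:
        key = (person["year"], person["country"], person["hh_id"])
        household = household_records.get(key)
        if household is not None:
            combined.append({**person, **household})
    return combined


combined = merge_one_to_many(household_data, persons)
```

The household information is repeated for every household member, so household-level totals must not be summed over persons without accounting for this.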

Page 33: Part II –  Introduction to  SILC Data  Structure and Documentation

34

Use of separate sub datasets

Create household level variables from personal level data, e.g.

• number of current household members
• persons < 18 in household
• age of the youngest child in household
• number of unemployed hh-members
• highest educational level in household
• …

Page 34: Part II –  Introduction to  SILC Data  Structure and Documentation

35

new hh-level variables added from hh-data

year  country  hh_id  pers_id  RX010  hhsize  numchild  ychild  HX080
2010  a        6800   680001   36     3       1         17      0
2010  a        6800   680002   35     3       1         17      0
2010  a        6800   680003   17     3       1         17      0
2011  a        6800   680001   36     3       0         .       0
2011  a        6800   680002   36     3       0         .       0
2011  a        6800   680003   18     3       0         .       0
2011  b        6800   680001   69     2       0         .       0
2011  b        6800   680002   73     2       0         .       0
2010  b        7000   700001   80     2       0         .       0
2010  b        7000   700002   80     2       0         .       0
2008  c        7000   700001   42     3       1         2       1
2008  c        7000   700002   34     3       1         2       0
2008  c        7000   700003   2      3       1         2       0
2009  c        7000   700001   43     3       1         3       0
2009  c        7000   700002   35     3       1         3       0
2009  c        7000   700003   3      3       1         3       0
2010  c        7000   700001   44     3       1         4       1
2010  c        7000   700002   36     3       1         4       1
2010  c        7000   700003   4      3       1         4       1
2011  c        7000   700001   45     4       2         0       1
2011  c        7000   700002   37     4       2         0       1
2011  c        7000   700003   5      4       2         0       1
2011  c        7000   700004   0      4       2         0       1

Create new household level summary variables from person level information, e.g. household size, number of children, age of youngest child (< 18 years)
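The summary variables can be derived by grouping persons by year, country and Household ID. The following Python sketch uses rows copied from the slide and assumes that the RX010 column holds the person's age; the function name and the use of `None` for a missing youngest-child age (shown as "." on the slide) are choices made for this illustration.

```python
from collections import defaultdict

# Rows copied from the slide: (year, country, hh_id, pers_id, age).
persons = [
    (2010, "a", 6800, 680001, 36),
    (2010, "a", 6800, 680002, 35),
    (2010, "a", 6800, 680003, 17),
    (2011, "a", 6800, 680001, 36),
    (2011, "a", 6800, 680002, 36),
    (2011, "a", 6800, 680003, 18),
    (2011, "c", 7000, 700001, 45),
    (2011, "c", 7000, 700002, 37),
    (2011, "c", 7000, 700003, 5),
    (2011, "c", 7000, 700004, 0),
]


def household_summary(person_rows, child_age_limit=18):
    """Derive hhsize, numchild and ychild for each household-year."""
    ages_by_household = defaultdict(list)
    for year, country, hh_id, _pers_id, age in person_rows:
        ages_by_household[(year, country, hh_id)].append(age)

    summary = {}
    for key, ages in ages_by_household.items():
        child_ages = [age for age in ages if age < child_age_limit]
        summary[key] = {
            "hhsize": len(ages),
            "numchild": len(child_ages),
            # None marks households without a child under the age limit.
            "ychild": min(child_ages) if child_ages else None,
        }
    return summary


summary = household_summary(persons)
```

The resulting household-level values can then be attached to each person again with a 1:n link on year, country and Household ID.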

Page 35: Part II –  Introduction to  SILC Data  Structure and Documentation

36

Some reading – Documents of priority

Page 36: Part II –  Introduction to  SILC Data  Structure and Documentation

37

Some reading – Documents of priority

Guidelines_Doc65_2011.pdf
• General technical information on sample design, weights, etc.
• List of all variables included in the original EU-SILC data base
• Description of (cross-sectional and longitudinal) variables

DIFFERENCES BETWEEN DATA COLLECTED AND UDB.doc
• List of variables removed or added to Userdata Base (UDB)
• Methods of anonymisation

SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls

National and EU Quality reports
• http://epp.eurostat.ec.europa.eu/portal/page/portal/income_social_inclusion_living_conditions/quality

Page 37: Part II –  Introduction to  SILC Data  Structure and Documentation

38

Some reading – Documents of priority

Guidelines_Doc65_2011.pdf

Source: Guidelines_Doc65_2011.pdf

Page 38: Part II –  Introduction to  SILC Data  Structure and Documentation

39

Some reading – Documents of priority

Flag Variable HH020_F

Source: Guidelines_Doc65_2011.pdf

Page 39: Part II –  Introduction to  SILC Data  Structure and Documentation

40

Some reading – Documents of priority

Flag Variable HH021_F

Source: Guidelines_Doc65_2011.pdf

Page 40: Part II –  Introduction to  SILC Data  Structure and Documentation

41

Some reading – Documents of priority

Cross-sectional data 2011

Source: UDB_c11H_ver 2011-2 from 01-08-13.dta

Page 41: Part II –  Introduction to  SILC Data  Structure and Documentation

42

Some reading – Documents of priority

Longitudinal data 2011

Source: UDB_l11H_ver 2011-1 from 01-08-2013.dta

New (HH021)

Old (HH020)

Page 42: Part II –  Introduction to  SILC Data  Structure and Documentation

43

Some reading – Documents of priority

Example: variable included in the cross-sectional and longitudinal data

Source: Guidelines_Doc65_2011.pdf

Page 43: Part II –  Introduction to  SILC Data  Structure and Documentation

44

Some reading – Documents of priority

Example: variable included in the cross-sectional data only

Source: Guidelines_Doc65_2011.pdf

Page 44: Part II –  Introduction to  SILC Data  Structure and Documentation

45

Some reading – Documents of priority

Example: variable included in longitudinal data only

Source: Guidelines_Doc65_2011.pdf

Page 45: Part II –  Introduction to  SILC Data  Structure and Documentation

46

Some reading – Documents of priority

Example: selected respondent

Source: Guidelines_Doc65_2011.pdf

Page 46: Part II –  Introduction to  SILC Data  Structure and Documentation

47

Some reading – Documents of priority

Differences between data collected and Userdata Base (cross-sectional file)

Page 47: Part II –  Introduction to  SILC Data  Structure and Documentation

48

Some reading – Documents of priority

Differences between data collected and Userdata Base (longitudinal file)

Source: L2011 DIFFERENCES BETWEEN DATA COLLECTED AND UDB.doc

Page 48: Part II –  Introduction to  SILC Data  Structure and Documentation

49

Some reading – Documents of priority

Differences between data collected and Userdata Base (cross-sectional file)

Page 49: Part II –  Introduction to  SILC Data  Structure and Documentation

50

Some reading – Documents of priority

Differences between data collected and Userdata Base (longitudinal file)

Page 50: Part II –  Introduction to  SILC Data  Structure and Documentation

51

Some reading – Documents of priority

SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls

Source: SILC L-2011 UDB PROBLEMS AND MODIFICATIONS.xls

Page 51: Part II –  Introduction to  SILC Data  Structure and Documentation

52

Some reading – Documents of priority

Quality reports

Page 52: Part II –  Introduction to  SILC Data  Structure and Documentation

53

Data Structure – Some reading

National quality reports

Page 53: Part II –  Introduction to  SILC Data  Structure and Documentation

54

Data Structure – Some reading

E.g. Austria: Final Quality Report Relating to the EU-SILC Operation 2007-2010

Page 54: Part II –  Introduction to  SILC Data  Structure and Documentation

55

Source: Austria, Final Quality Report Relating to the EU-SILC Operation 2007-2010, p. 7

Page 55: Part II –  Introduction to  SILC Data  Structure and Documentation

56

THANK YOU
