Provision of access to data for secondary analysis Louise Corti, Jo Wathan and Keith Cole Economic and Social Data Service E-society Programme March 07

Jan 19, 2016


Provision of access to data for secondary analysis

Louise Corti, Jo Wathan and Keith Cole
Economic and Social Data Service

E-society Programme
March 07


Overview of chapter

Why access secondary quantitative data?
brief overview of the potential of secondary data

Finding, accessing and obtaining secondary data
describes the ESDS distributed national on-line data service designed

Case studies – the UK Economic and Social Data Service
practical exemplars of how data can be re-used


Why access secondary quantitative data?

Quantitative methods have an important, longstanding place in social research. They can identify:

typical characteristics and background description
the amount of variation within a population of interest
differences between groups
how possible explanatory factors can account for differences
predictions and forecasts

Kinds of data:

Micro data resemble the sort of data obtained from a survey
Longitudinal data follow the same individuals (or other study unit) over time
Macro or aggregate data contain records for much larger units, e.g. countries or regions
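
The three kinds of data can be pictured as record layouts. The sketch below is illustrative only: the field names (person, region, income, wave) and all values are made up, and the aggregation step is simply one plausible way of deriving macro records from micro records.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical micro data: one record per survey respondent.
micro = [
    {"person": 1, "region": "North", "income": 18000},
    {"person": 2, "region": "North", "income": 22000},
    {"person": 3, "region": "South", "income": 30000},
]

# Hypothetical longitudinal data: the same person observed in several waves.
longitudinal = [
    {"person": 1, "wave": 1991, "income": 18000},
    {"person": 1, "wave": 1992, "income": 19000},
]

def aggregate_by_region(records):
    """Collapse micro records into macro (aggregate) records, one per region."""
    incomes = defaultdict(list)
    for rec in records:
        incomes[rec["region"]].append(rec["income"])
    return {
        region: {"n": len(values), "mean_income": mean(values)}
        for region, values in incomes.items()
    }

# Macro data: one record per much larger unit (here, a region).
macro = aggregate_by_region(micro)
```

Going from micro to macro loses the individual-level detail, which is why the two kinds of data support different analyses.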


Secondary analysis

reduces respondent burden

enables data linkage and the creation of new datasets

informs policy disputes about the interpretation of analyses

provides transparency within research

enables methodologists to learn from each other

allows students to engage with ‘real’ data, to obtain results which relate to the real world and to tackle real problems of data management (substantive social science and research methods teaching)


Data expensive

Collecting good quality, reliable, representative data is expensive and technically demanding

In 2001/2 the British General Household Survey (GHS) sample included all individuals in 8,989 households and cost £1.43 million

In 2001, the American Community Survey collected data from nearly 400,000 interviews in the year at an estimated cost of $131 million
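
To put these figures on a per-unit footing, the arithmetic can be sketched as follows. This is a rough illustration only: it divides the quoted totals by the quoted sample sizes, and since the ACS figure is "nearly 400,000" interviews, the per-interview value is a lower bound.

```python
def cost_per_unit(total_cost, units):
    """Average cost per sampled unit (household or interview)."""
    return total_cost / units

# GHS 2001/2: GBP 1.43 million across 8,989 households.
ghs_per_household = cost_per_unit(1_430_000, 8_989)

# ACS 2001: USD 131 million across (nearly) 400,000 interviews.
acs_per_interview = cost_per_unit(131_000_000, 400_000)

print(f"GHS: about £{ghs_per_household:.0f} per household")
print(f"ACS: at least ${acs_per_interview:.0f} per interview")
```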


Data historical - enabling trend analysis

In the UK the General Household Surveys (GHS) and Labour Force Surveys (LFS) date back to 1971 and 1973

In the United States, the General Social Survey series dates back to 1972 and Current Population Survey data date back to 1964 (ICPSR)

Longitudinal studies:
US Panel Study of Income Dynamics, started in 1968
German Socioeconomic Panel in 1984
British Household Panel Study in 1991


Finding, accessing and obtaining secondary data

The development of secondary analysis has depended on the development and growth of social science data archives:

Inter-university Consortium for Political and Social Research (ICPSR)
the UK Data Archive (UKDA)
Zentralarchiv für Empirische Sozialforschung (ZA)
Norwegian Social Science Data Services (NSD)

Now networked:

Council of European Social Science Data Archives (CESSDA)
International Federation of Data Organisations (IFDO)


Changing provision

early data archives predated e-social science, and the internet as we know it, by decades

the gradual development of online data archives and dissemination services has varied across the world

the more mature archives have reached the point at which most users will interact with the data service wholly through the internet

Internet delivery has broadened the potential role of data services


Functions of the modern archive

acquire - nurture, cajole, plead, evaluate

prepare, document and enhance data – check and add context

store data safely for ever – back up, store and migrate

distribute data - download, explore online

provide support for their use - promote, write, teach

improve resource discovery and data access - R&D


Acquisition and checking

data archives typically select and evaluate potential data collections against criteria designed to ensure that they are appropriate for re-use

assessed for their research value, quality and degree of fit with the existing collection

data are checked and validated by the receiving archive by:
examining the data values or text – validation and consistency checking
ensuring that, where required, the data are anonymous
checking for intellectual property and commercial ownership rights in the data


Documentation and metadata

Documentation which enables users to understand the origins of the data and to correctly interpret outputs

user guides created - how the data were collected

questionnaires, code books, interviewer instructions, technical reports, original and subsequent publications and outputs

catalogue record, and full variable and value labels (standard used - DDI)

a few archives work closely with data creators in the early stages to ensure that good data management practices are adhered to


Online dissemination

first steps towards online data archiving and dissemination came with the development of archive websites and increasingly sophisticated data catalogues

nowadays, searchable online data catalogues enable users to search and browse collections and view documentation freely online

online registration – account management, data download

access data via a web browser


New generation data services

online data exploration with tools: Survey Documentation and Analysis (SDA), Nesstar, Beyond 20/20, interactive (GIS) mapping tools

increasingly necessary to link to data sites, offsite support and related datasets as the complexity of the data infrastructure increases

data services may be distributed services

data need not be co-located

social science increasingly looking to the potential of grid technologies


Economic and Social Data Service (ESDS)

new generation distributed data service that provides a seamless integrated service

offers enhanced support for the secondary use of key economic and social data across the research, learning and teaching communities

value-added service goes far beyond the original role of traditional data archives as data storage and dissemination houses

brings together centres of expertise in data creation, dissemination, preservation and use


UK Data archiving history

Data Archive established in 1968 (as ‘Data Bank’)

funded by (then) SSRC to provide a service to UK HE sector

initial focus on academic surveys then government survey data

new distributed service established 1 January 2003 as the ESDS

core archiving service plus four value-added specialist services


Types of data

ESDS acquires mixed data types and formats

social surveys
aggregate data
administrative data
textual data
images
audio visual data

UKDA hosts specialist Qualidata unit, Census unit, and History Data Service

since 2005 designated as ‘Place of Deposit’ by The National Archives (TNA)

New data types:
online surveys, interviews and focus groups
social transaction data
linked admin data
blogs and so on


Who produces the social science data held by ESDS?

government agencies - increasing tendency for government agencies to contract out survey work to private sector (NatCen)
academic sector
private sector
local Government
Research Council funded - ESRC, MRC, NERC, AHRB, Wellcome, Leverhulme
increasing number of large digitisation projects - JISC, NOF
access to international data via links with other data archives worldwide
IGOs


Core Service

run by UKDA

acquiring, processing, preserving and disseminating data

data creation and deposit support

central registration service operating across the ESDS

central 'first stop' help desk service

front line user support

cataloguing and describing data

maintaining and developing web presence

publicity and training


Specialist data services

• ESDS Government
• ESDS International
• ESDS Longitudinal
• ESDS Qualidata

Greater emphasis on:
• value-added data and documentation
• enhanced resource discovery
• improved delivery services
• support and training for the secondary use of data for research, learning and teaching
• outreach and promotion


Facts and figures: UKDA

4,000+ datasets in the collection

350+ new datasets and editions added each year

30,000+ registered users

15,000+ datasets distributed worldwide p.a.

100,000+ online sessions p.a.

15,000,000+ web hits p.a.


Data In

Data acquisition
offers and proactive scoping of data
formal data evaluation via committee

Data ingest
checking, verifying
converting, formatting, processing
documenting and contextualising

Data preservation
long-term data management
Preservation Policy


Online exploration

Online data browsing, including simple data analysis, visualisation, downloading and subsetting via Nesstar

ESDS Government Vital Statistics online
International macro data via Beyond 20/20 and visualisation interface
ESDS Qualidata Online – interview transcripts
Census data services


1: Using Government microdata to explore health

UK is fortunate in its wealth of available major cross-sectional surveys

government surveys are rich resources:
large micro data files with a large number of detailed variables
series of repeated cross sections which enable comparisons over time
nationally representative of the United Kingdom or constituent countries
sample survey data, which may involve a degree of complexity - structure (hierarchical) and sampling strategy

data holdings and documentation are extensive


1: Government data

General Household Survey/Continuous Household Survey (NI)
Labour Force Survey/NI LFS
Health Survey for England/Wales/Scotland
Family Expenditure Survey/NI FES
British/Scottish Crime Survey
Family Resources Survey
National Food Survey/Expenditure and Food Survey
ONS Omnibus Survey
Survey of English Housing
British Social Attitudes/Scottish Social Attitudes/Young People’s Social Attitudes/NI Life & Times
National Travel Survey
Time Use Survey
Vital Statistics for England and Wales


1: Investigating smoking

ESDS high web presence
Google search
ESDS pages
ESDS catalogue – advanced searching on key words – study and variable level information
browse by subject
major studies lists
Government series pages
theme guides
publications database
software and analysis guides


1: Accessing Data

register with ESDS, using the online authentication system ATHENS (currently moving towards a new system, Shibboleth, which provides a greater degree of differentiation in user types)

ESDS Users must specify the purpose for which they will use each data set

registered users can choose to download the whole file (typically SPSS, Stata and tab delimited) or undertake further analyses, including graphing, within Nesstar

more stringent conditions apply to more sensitive data such as detailed microdata with detailed geography (Special Licence)
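
As a rough illustration of working with a downloaded tab-delimited file, the sketch below reads one with Python's standard csv module. The variable names and values (caseid, sex, smoker) are invented for the example and do not come from any ESDS dataset; an in-memory string stands in for the downloaded file.

```python
import csv
import io

# Hypothetical extract standing in for a downloaded tab-delimited file.
sample = "caseid\tsex\tsmoker\n1\t1\t2\n2\t2\t1\n"

def read_tab_delimited(handle):
    """Return one dict per case, keyed by the variable names in the header row."""
    return list(csv.DictReader(handle, delimiter="\t"))

rows = read_tab_delimited(io.StringIO(sample))
for row in rows:
    print(row["caseid"], row["smoker"])
```

With a real download, the same function could be passed an open file handle; values arrive as strings, so numeric codes need converting and interpreting against the code book.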


1: Online exploration

Nesstar system - allows unregistered users to view metadata and univariate distributions online

based on the DDI standard to describe data

permits users to specify subsets and download in a wide range of formats

ability to quickly browse data useful where particular subsets of cases in the data are of interest

e.g. before using the GHS to analyse people who would like to give up smoking, a user needs to know whether the dataset contains a sufficiently large number of people who smoke but would like to give up
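
The subgroup check described above amounts to counting the cases that meet a condition. A minimal sketch, assuming made-up records and variable names (smokes, wants_to_quit) rather than actual GHS variables, and an arbitrary minimum size chosen by the analyst:

```python
# Hypothetical respondent records; names and values are illustrative only.
records = [
    {"smokes": True, "wants_to_quit": True},
    {"smokes": True, "wants_to_quit": False},
    {"smokes": False, "wants_to_quit": None},
    {"smokes": True, "wants_to_quit": True},
]

def wants_to_quit(row):
    """True for respondents who smoke and say they would like to give up."""
    return bool(row["smokes"]) and row["wants_to_quit"] is True

def subgroup_size(rows, predicate):
    """Count the cases that satisfy the predicate."""
    return sum(1 for row in rows if predicate(row))

def large_enough(rows, predicate, minimum):
    """Check the subgroup against an analyst-chosen minimum number of cases."""
    return subgroup_size(rows, predicate) >= minimum

n = subgroup_size(records, wants_to_quit)
print(n, large_enough(records, wants_to_quit, minimum=2))
```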


1: What can a user do with the data?

multivariate analyses that look within households, and analyses that look at change over time

look at relationships between multiple individual characteristics

depth of many questionnaires allows users to explore the validity of existing means of operationalising concepts, or to use new ones


2: Analysing longitudinal health data

true cohort analysis requires information about the same individuals over time

explore the chronological ordering of behaviours or characteristics

ESDS Longitudinal specializes in supporting five major UK-based longitudinal data sets:
British Household Panel Survey (BHPS)
1970 British Cohort Study (BCS70)
National Child Development Study (NCDS)
Millennium Cohort Study (MCS)
English Longitudinal Study of Ageing (ELSA)

BHPS is a household hierarchical dataset - interviews all members of the households of panel members. Can explore household factors
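
Exploring household factors in a hierarchical dataset usually means deriving household-level variables and attaching them to each individual. A minimal sketch, assuming invented identifiers (pid for person, hid for household) and a made-up employed flag rather than actual BHPS variable names:

```python
from collections import defaultdict

# Hypothetical person-level records, each carrying a household identifier.
people = [
    {"pid": 1, "hid": 10, "employed": True},
    {"pid": 2, "hid": 10, "employed": False},
    {"pid": 3, "hid": 11, "employed": False},
]

def add_household_factors(persons):
    """Attach household-level summaries to every person in that household."""
    by_household = defaultdict(list)
    for person in persons:
        by_household[person["hid"]].append(person)

    enriched = []
    for person in persons:
        members = by_household[person["hid"]]
        enriched.append({
            **person,
            "hh_size": len(members),
            "hh_any_employed": any(m["employed"] for m in members),
        })
    return enriched

enriched = add_household_factors(people)
```

Person 2 is not employed but lives in a household where someone is, a distinction that only becomes visible once household members are linked.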


3: Providing a common user interface to international macro data to support comparative research

researchers now require access to the key international evidence bases in order to contribute and comment on trans-national policy responses to global issues

ESDS International was established to address these needs through the provision of free web-based access to a portfolio of authoritative, high quality international databanks

high quality, regularly updated time series databanks - contain huge range of macro-economic and social indicators aggregated to national or regional level worldwide


the datasets supported are produced by a number of key International Governmental Organisations (IGOs) such as the International Monetary Fund, the United Nations, the World Bank, the Organisation for Economic Cooperation and Development and the International Energy Agency

access via a common user interface to all the international aggregate datasets which makes it easy for users to obtain access to data

Beyond 20/20 Web Data Server (WDS) used to display, subset, visualize, chart and download data

Example: Iraqi exports to the rest of the world 1980-2005 (Source: International Monetary Fund (IMF), Direction of Trade Statistics (DOTS), July 2006)


CommonGIS used to build a web-based data exploration interface to geographically referenced international data

CommonGIS provides standard GIS functionality and can be used as a tool for visualisation and exploratory analysis based on geographically referenced statistical data

CommonGIS visualization shows the relationship between birth and death rates in European countries in 2005 (source: CIA World Factbook)

the cross classification map shows those countries, such as Moldova, which have high birth and death rates


4: Grid-enabling quantitative datasets to support more complex forms of analysis

Data Grids facilitate unimpeded and integrated use of distributed, heterogeneous, autonomous data resources

grid enabling a dataset creates new opportunities for its use:
enables users to integrate it with other datasets
makes it possible to analyse the dataset using techniques that require the kind of computational power that is only feasible using the Grid (e.g. more complex models, more data points)

standardisation of procedures and mechanisms used to access and update the dataset increases its shareability

automated analyses (i.e. analyses can be re-run automatically when databases are updated)


4: ConvertGrid – Key Objectives

a practical demonstration of how the Grid can be used to facilitate data integration and overcome a major barrier to research use of multiple datasets

demonstrates how to build a social science Data Grid by grid enabling a number of key geo-referenced socio-economic data sources

uses Grid technologies to extend the functionality of an existing web based data service (i.e. Convert) to exploit the existence of a Data Grid

demonstrates how Grid technologies can automate complex workflows and enhance the capacity to address substantive social science research questions;

builds a user interface to a Grid based service which is suitable for student/teaching use


4: ConvertGrid – The Research Context

many research questions require the combination of data from multiple geo-referenced datasets, e.g. linking postcoded data to census geography

conversion of data relating to different geographies to a common target geography is a complex, time-consuming task that requires a range of data handling/processing skills

the data conversion process will require users to perform the following generic tasks:
extract and download data in different formats from a number of databases using different interfaces
convert each dataset to the desired target geography using geographical conversion tables
combine the converted sets into a single dataset for analysis

these generic tasks can be automated!
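
The convert and combine steps can be sketched in a few lines. This is not the ConvertGrid implementation: the zone names, the conversion weights and the choice to apportion counts by a simple share are all assumptions made for illustration. Averages such as house prices would need a weighted mean rather than a weighted sum.

```python
from collections import defaultdict

# Hypothetical conversion table: share of each source zone (postcode sector)
# that falls within each target zone (ward). Shares per source zone sum to 1.
conversion = {
    "sectorA": {"ward1": 0.75, "ward2": 0.25},
    "sectorB": {"ward2": 1.0},
}

def convert_counts(source_counts, table):
    """Apportion counts held for source zones to the target geography."""
    target = defaultdict(float)
    for zone, count in source_counts.items():
        for target_zone, weight in table[zone].items():
            target[target_zone] += count * weight
    return dict(target)

def combine(named_datasets):
    """Merge datasets already on the target geography into one record per zone."""
    combined = defaultdict(dict)
    for name, values in named_datasets.items():
        for zone, value in values.items():
            combined[zone][name] = value
    return dict(combined)

# Made-up inputs: applicants by postcode sector, persons already by ward.
applicants = convert_counts({"sectorA": 40, "sectorB": 10}, conversion)
persons = {"ward1": 200, "ward2": 100}
merged = combine({"applicants": applicants, "persons": persons})
```

Once each dataset shares the target geography, the combined records can be analysed as a single table.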


4: ConvertGrid – A Worked Example

what factors explain spatial variations in participation rates in higher education?

study target geography – 1991 Census Ward

data required:
1991 Census - total persons aged 16-17 & 18-19 (1991 Census Ward)
Neighbourhood Statistics - number of applicants aged under 20 entering university (1998 Electoral Ward)
Experian - average house price sales Quarter 2 2000 to Quarter 1 2001 (1999 Postcode Sectors)


4: ConvertGrid – Data Visualisation Interface

relationship between average house price sales (Experian) and percentage of 16-19 year olds entering university (Neighbourhood Statistics & Census aggregate statistics)

High average house price sales but low participation rates

Low average house price sales but high participation rates

Ten minutes from start to finish
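
The comparison shown in the visualisation can be approximated by computing a participation rate per ward and cross-classifying it against house prices. The sketch below uses invented ward values, and splitting each variable at its median is an assumption made for illustration, not necessarily how the ConvertGrid interface defines "high" and "low".

```python
from statistics import median

# Hypothetical ward-level records, already converted to a common geography.
wards = {
    "ward1": {"entrants": 30, "aged_16_19": 200, "avg_price": 250000},
    "ward2": {"entrants": 60, "aged_16_19": 150, "avg_price": 90000},
    "ward3": {"entrants": 80, "aged_16_19": 160, "avg_price": 300000},
    "ward4": {"entrants": 10, "aged_16_19": 100, "avg_price": 80000},
}

def participation_rate(rec):
    """Percentage of 16-19 year olds entering university."""
    return 100.0 * rec["entrants"] / rec["aged_16_19"]

def cross_classify(records):
    """Label each ward (price level, participation level) relative to the medians."""
    price_cut = median(r["avg_price"] for r in records.values())
    rate_cut = median(participation_rate(r) for r in records.values())
    classes = {}
    for ward, rec in records.items():
        price = "high" if rec["avg_price"] > price_cut else "low"
        rate = "high" if participation_rate(rec) > rate_cut else "low"
        classes[ward] = (price, rate)
    return classes

classes = cross_classify(wards)
```

The off-diagonal wards, high prices with low participation or low prices with high participation, are the ones the slide singles out for attention.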


5: Mixed Methods Data

there is an increasing interest in and recognition of the value of re-using qualitative data

in the past few years there has been a significant move to utilise mixed methods strategies in research

ESDS has seen the deposit of multiple methods datasets combining quantitative and qualitative datasets

processed and supported by dedicated unit - ESDS Qualidata


5: ESDS Qualidata

range of qualitative datasets, hosted by the UK Data Archive

data from National Research Council (ESRC) individual and programme research grant awards (Data Policy)

data from ‘classic’ social science studies

other funders/sources

focus on DIGITAL Collections, but also facilitate paper-based archiving


5: Types of qualitative data

diverse data types: in-depth interviews; semi-structured interviews; focus groups; oral histories; mixed methods data; open-ended survey questions; case notes/records of meetings; diaries/research diaries

multimedia: audio, video, photos and text (most common is interview transcriptions)

formats: digital, paper, analogue audio-visual

data structures - differ across different ‘document types’


5: Classic study datasets

Townsend – Poverty, old age and Katherine Buildings

Thompson – oral history and Edwardians

Goldthorpe et al - The Affluent Worker

Jackson and Marsden – Education and the Working class

National Social Policy and Social Change Archive


5: Online access to data


5: schoolchildren’s attitudes towards risk-taking and health

a typical example of a mixed methods study might involve undertaking a sample survey and conducting ethnographic fieldwork (e.g. observation and in-depth interviews) based on the survey sample or on other cases

Incidents and the Health-related Behaviour of Schoolchildren, 1997, M. Denscombe

Studying ‘critical incidents’ in the life of young people which act as crucial flashpoints in the generation of attitudes towards health-related behaviour


5: schoolchildren’s attitudes towards risk-taking and health

the project used a mixture of quantitative and qualitative methodology:
survey of 1648 children
eleven transcripts of focus group interviews
eight transcripts of interviews - two students together

Denscombe's in-depth interviews also cover a lot of detail about the role and pressure of exams at the age of 15/16, and future life ambitions


Secondary use?

qualitative aspect can offer a more detailed explanation of a quantitative analysis and possibly enable a more complex model to be built

sequencing of data collection methods or the selection of cases needs to be carefully considered in re-use

in larger data collections, the data types may have been collected by different teams with differing methodological agendas - researchers tend to prioritise one method because of familiarity with the data type and analytic methods

possibility that each method could show conflicting findings - re-users should be aware how they report findings and be reflexive about how the secondary data were selected, confronted and analysed


Collaboration - UK

Government agencies – work closely
Research Councils on formal data sharing policies
Research Centres and Programmes collecting data
Other funding agencies, e.g. JISC, on technical issues - authentication, digitisation, T&L resources
TNA on records management and preservation practice
E-science on grid enabled data issues, ontologies
Research Methods centres on data quality and secondary analysis


Conclusion

secondary analysis permits a range of valuable analyses to be undertaken quickly, effectively, transparently and with minimal respondent burden

digital formats have enabled users to easily consult full documentation, explore and analyse data online, and to make linkages between appropriate resources in the context of an increasingly complex data infrastructure

data access services themselves may be virtual centres, distributed across multiple sites

anticipate that grid developments will provide increased scope for harmonising access to different data types


Contact

www.esds.ac.uk

[email protected]

[email protected] 872145