Provision of access to data for secondary analysis
Louise Corti, Jo Wathan and Keith Cole
Economic and Social Data Service
E-society Programme
March 07
Overview of chapter
Why access secondary quantitative data? A brief overview of the potential of secondary data
Finding, accessing and obtaining secondary data: describes the ESDS, a distributed national online data service
Case studies from the UK Economic and Social Data Service: practical exemplars of how data can be re-used
Why access secondary quantitative data?
Quantitative methods have an important, long-standing place in social research. They can identify:
typical characteristics and background description
the amount of variation within a population of interest
differences between groups
how possible explanatory factors can account for differences
predictions and forecasts
Kinds of data:
Micro data resemble the sort of data obtained from a survey
Longitudinal data follow the same individuals (or other study unit) over time
Macro or aggregate data contain records for much larger units, e.g. countries or regions
Secondary analysis
reduces respondent burden
enables data linkage and the creation of new datasets
informs policy disputes about the interpretation of analyses
provides transparency within research
enables methodologists to learn from each other
allows students to engage with ‘real’ data, to obtain results which relate to the real world and to tackle real problems of data management (substantive social science and research methods teaching)
Data are expensive
Collecting good quality, reliable, representative data is expensive and technically demanding
In 2001/2 the British General Household Survey (GHS) sample included all individuals in 8,989 households and cost £1.43 million
In 2001, the American Community Survey collected data from nearly 400,000 interviews in the year, at an estimated cost of $131 million
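Dividing the quoted totals by the quoted sample sizes gives a rough sense of scale. The sketch below uses only the two figures on this slide; because the ACS figure covers nearly 400,000 interviews, dividing by 400,000 slightly understates its cost per interview, and neither result is a true per-case cost.

```python
def cost_per_unit(total_cost: float, units: int) -> float:
    """Spread a survey's total cost evenly across its sampled units."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units


# Figures as quoted on the slide.
ghs_per_household = cost_per_unit(1_430_000, 8_989)      # GHS 2001/2, GBP
acs_per_interview = cost_per_unit(131_000_000, 400_000)  # ACS 2001, USD

print(f"GHS: about £{ghs_per_household:,.0f} per household")
print(f"ACS: about ${acs_per_interview:,.2f} per interview")
```

This prints roughly £159 per household for the GHS and $327.50 per interview for the ACS.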
Data are historical, enabling trend analysis
In the UK, the General Household Survey (GHS) and Labour Force Survey (LFS) date back to 1971 and 1973 respectively
In the United States, the General Social Survey series dates back to 1972, and Current Population Survey data to 1964 (ICPSR)
Longitudinal studies: the US Panel Study of Income Dynamics started in 1968, the German Socio-Economic Panel in 1984 and the British Household Panel Survey in 1991
Finding, accessing and obtaining secondary data
The development of secondary analysis has depended on the development and growth of social science data archives:
Inter-university Consortium for Political and Social Research (ICPSR)
the UK Data Archive (UKDA)
Zentralarchiv für Empirische Sozialforschung (ZA)
Norwegian Social Science Data Services (NSD)
Now networked:
Council of European Social Science Data Archives (CESSDA)
International Federation of Data Organisations (IFDO)
Changing provision
early data archives predated e-social science, and the internet as we know it, by decades
the gradual development of online data archives and dissemination services has varied across the world
the more mature archives have reached the point at which most users will interact with the data service wholly through the internet
Internet delivery has broadened the potential role of data services
Functions of the modern archive
acquire - nurture, cajole, plead, evaluate
prepare, document and enhance data – check and add context
store data safely for ever – back up, store and migrate
distribute data - download, explore online
provide support for their use - promote, write, teach
improve resource discovery and data access - R&D
Acquisition and checking
data archives typically select and evaluate potential data collections against criteria designed to ensure that they are appropriate for re-use
collections are assessed for their research value, quality and degree of fit with the existing collection
data are checked and validated by the receiving archive by:
examining the data values or text (validation and consistency checking)
ensuring that, where requested, the data are anonymous
checking, where required, for intellectual property and commercial ownership rights in the data
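The two kinds of data check named above, validation against permitted codes and consistency between variables, can be sketched as follows. This is a minimal illustration assuming a hypothetical codebook and a hypothetical consistency rule (age left school cannot exceed current age); it is not the archive's actual procedure and does not attempt the anonymisation or rights checks.

```python
# Hypothetical codebook: variable name -> the codes that are valid for it.
CODEBOOK = {
    "sex": {1, 2},
    "age": range(0, 120),
}

# Hypothetical deposited records: row 1 has a wild code, row 2 an inconsistency.
records = [
    {"sex": 1, "age": 34, "age_left_school": 16},
    {"sex": 3, "age": 51, "age_left_school": 18},
    {"sex": 2, "age": 15, "age_left_school": 17},
]


def find_wild_codes(records, codebook):
    """Validation: list (row, variable, value) for values outside the codebook."""
    problems = []
    for row, record in enumerate(records):
        for variable, valid_codes in codebook.items():
            value = record.get(variable)  # None if the variable is missing
            if value not in valid_codes:
                problems.append((row, variable, value))
    return problems


def find_inconsistencies(records):
    """Consistency: flag rows where age left school exceeds current age."""
    return [
        row
        for row, record in enumerate(records)
        if record["age_left_school"] > record["age"]
    ]


print(find_wild_codes(records, CODEBOOK))  # [(1, 'sex', 3)]
print(find_inconsistencies(records))       # [2]
```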
Documentation and metadata
Documentation which enables users to understand the origins of the data and to correctly interpret outputs
user guides are created, covering how the data were collected: questionnaires, code books, interviewer instructions, technical reports, and original and subsequent publications and outputs
a catalogue record, and full variable and value labels (using the Data Documentation Initiative (DDI) standard)
a few archives work closely with data creators in the early stages to ensure that good data management practices are adhered to
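To make "full variable and value labels" concrete: in the DDI Codebook (version 2.x) format, a variable's label and its value labels are recorded in var, labl, catgry and catValu elements. The sketch below uses Python's xml.etree.ElementTree and a hypothetical variable; it builds a simplified fragment only, omitting the DDI namespace, the enclosing codeBook/dataDscr elements and the many other attributes a real catalogue record carries.

```python
import xml.etree.ElementTree as ET


def ddi_var(name, label, value_labels):
    """Build a simplified DDI Codebook-style <var> element.

    name: variable name; label: variable label;
    value_labels: mapping of code -> value label.
    """
    var = ET.Element("var", {"name": name})
    ET.SubElement(var, "labl").text = label
    for code, code_label in value_labels.items():
        category = ET.SubElement(var, "catgry")
        ET.SubElement(category, "catValu").text = str(code)
        ET.SubElement(category, "labl").text = code_label
    return var


# Hypothetical variable, for illustration only.
element = ddi_var("smoker", "Whether respondent currently smokes", {1: "Yes", 2: "No"})
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```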
Online dissemination
the first steps towards online data archiving and dissemination came with the development of archive websites and increasingly sophisticated data catalogues
nowadays, searchable online data catalogues enable users to search and browse collections and view documentation freely online
online registration – account management, data download
access data via a web browser
New generation data services
online data exploration with tools such as Survey Documentation and Analysis (SDA), Nesstar, Beyond 20/20 and interactive (GIS) mapping tools
as the complexity of the data infrastructure increases, it is increasingly necessary to link to data sites, offsite support and related datasets
data services may be distributed services: data need not be co-located
social science is increasingly looking to the potential of grid technologies
Economic and Social Data Service (ESDS)
new generation distributed data service that provides a seamless integrated service
offers enhanced support for the secondary use of key economic and social data across the research, learning and teaching communities
value-added service goes far beyond the original role of traditional data archives as data storage and dissemination houses
brings together centres of expertise in data creation, dissemination, preservation and use
UK data archiving history
Data Archive established in 1968 (as ‘Data Bank’)
funded by the (then) SSRC to provide a service to the UK HE sector
initial focus on academic surveys, then government survey data
new distributed service established on 1 January 2003 as the ESDS
core archiving service plus four value-added specialist services
Types of data
ESDS acquires mixed data types and formats
social surveys, aggregate data, administrative data, textual data, images, audio-visual data
UKDA hosts specialist Qualidata unit, Census unit, and History Data Service
since 2005 designated as ‘Place of Deposit’ by The National Archives (TNA)
New data types: online surveys, interviews and focus groups; social transaction data; linked administrative data; blogs; and so on
Who produces the social science data held by ESDS?
government agencies
increasing tendency for government agencies to contract out survey work to the private sector (NatCen)
academic sector
private sector
local government
Research Council and other funders: ESRC, MRC, NERC, AHRB, Wellcome, Leverhulme
increasing number of large digitisation projects: JISC, NOF
access to international data via links with other data archives worldwide and IGOs
Core Service
run by UKDA
acquiring, processing, preserving and disseminating data
data creation and deposit support
central registration service operating across the ESDS
central 'first stop' help desk service
front line user support
cataloguing and describing data
maintaining and developing web presence
publicity and training
Specialist data services
• ESDS Government
• ESDS International
• ESDS Longitudinal
• ESDS Qualidata
Greater emphasis on:
• value-added data and documentation
• enhanced resource discovery
• improved delivery services
• support and training for the secondary use of data for research, learning and teaching
• outreach and promotion
Facts and figures: UKDA
4,000+ datasets in the collection
350+ new datasets and editions added each year
30,000+ registered users
15,000+ datasets distributed worldwide p.a.
100,000+ online sessions p.a.
15,000,000+ web hits p.a.
Data in
Data acquisition: offers and proactive scoping of data; formal data evaluation via committee
Data ingest: checking, verifying, converting, formatting, processing, documenting and contextualising
Data preservation: long-term data management; Preservation Policy
Online exploration
Online data browsing, including simple data analysis, visualisation, downloading and subsetting via Nesstar
ESDS Government: Vital Statistics online
International macro data via Beyond 20/20 and visualisation interface
ESDS Qualidata Online: interview transcripts
Census data services
1: Using Government microdata to explore health
The UK is fortunate in its wealth of available major cross-sectional surveys
government surveys are rich resources:
large microdata files with a large number of detailed variables
series of repeated cross-sections which enable comparisons over time
nationally representative of the United Kingdom or its constituent countries
sample survey data, which may involve a degree of complexity in structure (hierarchical) and sampling strategy
data holdings and documentation are extensive
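Hierarchical survey files typically hold one record per person linked to a household identifier, so a household-level variable can be derived and attached to each individual, for example the number of smokers a person lives with. A minimal sketch, assuming hypothetical variable names (hh_id, person_no, smoker) rather than those of any particular survey:

```python
from collections import defaultdict

# Hypothetical person-level records from a hierarchical household survey.
people = [
    {"hh_id": 101, "person_no": 1, "age": 44, "smoker": True},
    {"hh_id": 101, "person_no": 2, "age": 41, "smoker": False},
    {"hh_id": 101, "person_no": 3, "age": 12, "smoker": False},
    {"hh_id": 102, "person_no": 1, "age": 70, "smoker": False},
]


def add_household_smokers(people):
    """Return copies of the person records with a household-level smoker count."""
    smokers_per_hh = defaultdict(int)
    for person in people:
        if person["smoker"]:
            smokers_per_hh[person["hh_id"]] += 1
    # Households with no smokers fall back to the defaultdict's 0.
    return [{**person, "hh_smokers": smokers_per_hh[person["hh_id"]]} for person in people]


for row in add_household_smokers(people):
    print(row["hh_id"], row["person_no"], row["hh_smokers"])
```

Returning new records rather than modifying the originals keeps the deposited person file unchanged.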
1: Government data
General Household Survey/Continuous Household Survey (NI)
Labour Force Survey/NI LFS
Health Survey for England/Wales/Scotland
Family Expenditure Survey/NI FES
British/Scottish Crime Survey
Family Resources Survey
National Food Survey/Expenditure and Food Survey
ONS Omnibus Survey
Survey of English Housing
British Social Attitudes/Scottish Social Attitudes/Young People’s Social Attitudes/NI Life & Times
National Travel Survey
Time Use Survey
Vital Statistics for England and Wales
1: Investigating smoking
ESDS has a high web presence: a Google search finds ESDS pages
ESDS catalogue: advanced searching on keywords, with study- and variable-level information
browse by subject
major studies lists
Government series pages
theme guides
publications database
software and analysis guides
1: Accessing data
users register with ESDS using the online authentication system ATHENS (currently moving towards a new system, Shibboleth, which provides a greater degree of differentiation between user types)
ESDS users must specify the purpose for which they will use each dataset
registered users can choose to download the whole file (typically in SPSS, Stata or tab-delimited format) or undertake further analyses, including graphing, within Nesstar
more stringent conditions apply to more sensitive data, such as microdata with detailed geography (Special Licence)
1: Online exploration
the Nesstar system allows unregistered users to view metadata and univariate distributions online
it uses the DDI standard to describe data
it permits users to specify subsets and download them in a wide range of formats
the ability to browse data quickly is useful where particular subsets of cases are of interest
for example, before using the GHS to analyse people who would like to give up smoking, a user needs to know whether the dataset contains a sufficiently large number of people who smoke but would like to give up
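The same subset-size check can be run on a downloaded tab-delimited file. A minimal sketch using Python's standard csv module, assuming hypothetical variable names (smokes, wants_to_quit), hypothetical codes (1 = yes) and an arbitrary minimum of 100 cases; it does not use Nesstar itself.

```python
import csv
import io

# Stand-in for a downloaded tab-delimited file (hypothetical variables and codes:
# smokes 1 = yes, 2 = no; wants_to_quit 1 = yes, 2 = no, -9 = not applicable).
EXPORT = (
    "id\tsmokes\twants_to_quit\n"
    "1\t1\t1\n"
    "2\t1\t2\n"
    "3\t2\t-9\n"
    "4\t1\t1\n"
)

MIN_CASES = 100  # hypothetical threshold; the right size depends on the analysis


def count_subset(tsv_text, **conditions):
    """Count rows whose named columns equal the given (string) values."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sum(
        1 for row in reader
        if all(row[column] == value for column, value in conditions.items())
    )


n = count_subset(EXPORT, smokes="1", wants_to_quit="1")
print(n, "cases:", "enough" if n >= MIN_CASES else "probably too few")
```

csv.DictReader reads every value as a string, so the conditions are compared as strings ("1", not 1).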
1: What can a user do with the data?
multivariate analyses, including analyses that look within households and analyses that look at change over time
look at relationships between multiple individual characteristics
the depth of many questionnaires allows users to explore the validity of existing ways of operationalising concepts, or to use new ones
2: Analysing longitudinal health data
true cohort analysis requires information about the same individuals over time
longitudinal data allow users to explore the chronological ordering of behaviours or characteristics
ESDS Longitudinal specialises in supporting five major UK-based longitudinal datasets:
British Household Panel Survey (BHPS)
1970 British Cohort Study (BCS70)
National Child Development Study (NCDS)
Millennium Cohort Study (MCS)
English Longitudinal Study of Ageing (ELSA)
the BHPS is a hierarchical household dataset: it interviews all members of panel members' households, so household factors can be explored
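With person-wave records, the chronological ordering of two characteristics can be checked by finding the first wave in which each is observed. A minimal sketch using hypothetical variables and records (not BHPS variable names). Note that a characteristic already present at the first wave may have begun before the panel started, so "first observed" is not the same as "first occurred".

```python
from typing import Optional

# Hypothetical person-wave records from a panel study.
records = [
    {"pid": "A", "wave": 1, "smokes": False, "condition": False},
    {"pid": "A", "wave": 2, "smokes": True, "condition": False},
    {"pid": "A", "wave": 3, "smokes": True, "condition": True},
    {"pid": "B", "wave": 1, "smokes": False, "condition": True},
    {"pid": "B", "wave": 2, "smokes": True, "condition": True},
    {"pid": "C", "wave": 1, "smokes": False, "condition": False},
    {"pid": "C", "wave": 2, "smokes": False, "condition": False},
]


def first_wave(records, pid: str, field: str) -> Optional[int]:
    """Earliest wave in which `field` is True for person `pid`, or None if never."""
    waves = [r["wave"] for r in records if r["pid"] == pid and r[field]]
    return min(waves) if waves else None


def ordering(records, pid: str) -> str:
    """Which was observed first for this person: smoking or the health condition?"""
    smoke_wave = first_wave(records, pid, "smokes")
    condition_wave = first_wave(records, pid, "condition")
    if smoke_wave is None or condition_wave is None:
        return "not both observed"
    if smoke_wave < condition_wave:
        return "smoking first"
    if condition_wave < smoke_wave:
        return "condition first"
    return "same wave"


for pid in ("A", "B", "C"):
    print(pid, ordering(records, pid))
```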
3: Providing a common user interface to international macro data to support comparative research
researchers now require access to the key international evidence bases in order to contribute to and comment on trans-national policy responses to global issues
ESDS International was established to address these needs by providing free web-based access to a portfolio of authoritative, high-quality international databanks
these high-quality, regularly updated time-series databanks contain a huge range of macro-economic and social indicators, aggregated to national or regional level, worldwide
the datasets supported are produced by a number of key intergovernmental organisations (IGOs) such as the International Monetary Fund, the United Nations, the World Bank, the Organisation for Economic Co-operation and Development and the International Energy Agency
a common user interface to all the international aggregate datasets makes it easy for users to obtain access to the data
the Beyond 20/20 Web Data Server (WDS) is used to display, subset, visualise, chart and download data
Iraqi exports to the rest of the world, 1980-2005 (source: International Monetary Fund (IMF), Direction of Trade Statistics (DOTS), July 2006)
CommonGIS was used to build a web-based data exploration interface to geographically referenced international data
CommonGIS provides standard GIS functionality and can be used as a tool for visualisation and exploratory analysis of geographically referenced statistical data
CommonGIS visualisation showing the relationship between birth and death rates in European countries in 2005 (source: CIA World Factbook)
the cross classification map shows those countries, such as Moldova, which have high birth and death rates
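A cross-classification map of this kind assigns each country to one of four classes (high or low birth rate crossed with high or low death rate). A minimal sketch, assuming a median split as the class break (the breaks used on the original map are not stated) and using placeholder countries and values rather than Factbook data:

```python
from statistics import median

# Placeholder values (per 1,000 population), NOT real CIA World Factbook figures.
rates = {
    "Country A": (9.0, 14.0),   # (birth rate, death rate)
    "Country B": (13.0, 8.0),
    "Country C": (12.5, 13.5),
    "Country D": (8.5, 9.0),
}


def cross_classify(rates):
    """Label each country high/low on each rate, using the median as the break."""
    birth_cut = median(birth for birth, _ in rates.values())
    death_cut = median(death for _, death in rates.values())
    classes = {}
    for country, (birth, death) in rates.items():
        birth_class = "high birth" if birth > birth_cut else "low birth"
        death_class = "high death" if death > death_cut else "low death"
        classes[country] = f"{birth_class} / {death_class}"
    return classes


for country, label in cross_classify(rates).items():
    print(country, label)
```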
4: Grid-enabling quantitative datasets to support more complex forms of analysis
Data Grids facilitate unimpeded and integrated use of distributed, heterogeneous, autonomous data resources
grid-enabling a dataset creates new opportunities for its use:
it enables users to integrate the dataset with other datasets
it makes it possible to analyse the dataset using techniques that require the kind of computational power that is only feasible using the Grid (e.g. more complex models, more data points)
standardising the procedures and mechanisms used to access and update the dataset increases its shareability
automated analyses (i.e. analyses can be re-run automatically when databases are updated)
4: ConvertGrid – Key Objectives
a practical demonstration of how the Grid can be used to facilitate data integration and overcome a major barrier to research use of multiple datasets
demonstrates how to build a social science Data Grid by grid enabling a number of key geo-referenced socio-economic data sources
uses Grid technologies to extend the functionality of an existing web based data service (i.e. Convert) to exploit the existence of a Data Grid
demonstrates how Grid technologies can automate complex workflows and enhance the capacity to address substantive social science research questions
builds a user interface to a Grid based service which is suitable for student/teaching use
4: ConvertGrid – The Research Context
many research questions require combining data from multiple geo-referenced datasets, e.g. linking postcoded data to census geography
converting data relating to different geographies to a common target geography is a complex, time-consuming task that requires a range of data handling and processing skills
the data conversion process requires users to perform the following generic tasks:
extract and download data in different formats from a number of databases using different interfaces
convert each dataset to the desired target geography using geographical conversion tables
combine the converted datasets into a single dataset for analysis
these generic tasks can be automated!
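The convert and combine steps can be sketched as follows, assuming a conversion table that apportions each source zone's count to target zones using fixed weights (for example, the share of the source zone's population falling in each target zone). The zone codes and weights are hypothetical, and this is not ConvertGrid's implementation. Apportioning by summation suits counts such as applicant numbers; an average such as house price would instead need a weighted mean.

```python
from collections import defaultdict


def convert(counts, conversion_table):
    """Re-express counts held for source zones as counts for target zones.

    counts: {source_zone: count}
    conversion_table: iterable of (source_zone, target_zone, weight) rows, where
    each source zone's weights sum to 1.
    """
    converted = defaultdict(float)
    for source, target, weight in conversion_table:
        converted[target] += counts.get(source, 0) * weight
    return dict(converted)


def combine(*datasets):
    """Join {zone: value} datasets into {zone: [value, ...]} for zones present in all."""
    common = set(datasets[0])
    for dataset in datasets[1:]:
        common &= set(dataset)
    return {zone: [dataset[zone] for dataset in datasets] for zone in sorted(common)}


# Hypothetical inputs: one dataset already on the target geography (W1, W2) and
# one held on a different geography (E1, E2) that must be converted first.
persons_16_19 = {"W1": 100, "W2": 200}
applicants = {"E1": 30, "E2": 10}
table = [("E1", "W1", 0.5), ("E1", "W2", 0.5), ("E2", "W2", 1.0)]

print(combine(persons_16_19, convert(applicants, table)))
# {'W1': [100, 15.0], 'W2': [200, 25.0]}
```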
4: ConvertGrid – A Worked Example
research question: what factors explain spatial variations in participation rates in higher education?
study target geography: 1991 Census Ward
data required:
1991 Census: total persons aged 16-17 and 18-19 (1991 Census Ward)
Neighbourhood Statistics: number of applicants aged under 20 entering university (1998 Electoral Ward)
Experian: average house price sales, Quarter 2 2000 to Quarter 1 2001 (1999 Postcode Sectors)
4: ConvertGrid – Data Visualisation Interface
relationship between average house price sales (Experian) and percentage of 16-19 year olds entering university (Neighbourhood Statistics & Census aggregate statistics)
High average house price sales but low participation rates
Low average house price sales but high participation rates
Ten minutes from start to finish
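Once the three inputs share the 1991 Census Ward geography, the analysis comes down to a ward-level participation rate related to house prices. A minimal sketch with invented ward values; using Pearson's r as the summary is our assumption, not necessarily what the ConvertGrid interface displays. The numerator (under-20 applicants, 1998) and denominator (16-19 year olds, 1991) come from different sources and years, so the rate is an approximation.

```python
from math import sqrt


def participation_rate(applicants, persons_16_19):
    """Applicants entering university as a percentage of 16-19 year olds."""
    return 100.0 * applicants / persons_16_19


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical ward-level values, already on the 1991 Census Ward geography.
wards = {
    "W1": {"applicants": 15, "persons_16_19": 100, "avg_price": 95_000},
    "W2": {"applicants": 25, "persons_16_19": 200, "avg_price": 60_000},
    "W3": {"applicants": 40, "persons_16_19": 160, "avg_price": 140_000},
}

rates = [participation_rate(w["applicants"], w["persons_16_19"]) for w in wards.values()]
prices = [w["avg_price"] for w in wards.values()]
print("participation rates (%):", rates)
print("r =", round(pearson(prices, rates), 3))
```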
5: Mixed Methods Data
there is an increasing interest in and recognition of the value of re-using qualitative data
in the past few years there has been a significant move to utilise mixed methods strategies in research
ESDS has seen the deposit of multiple methods datasets combining quantitative and qualitative datasets
processed and supported by dedicated unit - ESDS Qualidata
5: ESDS Qualidata
a range of qualitative datasets, hosted by the UK Data Archive
data from Economic and Social Research Council (ESRC) individual and programme research grant awards (Data Policy)
data from ‘classic’ social science studies
other funders/sources
focus on digital collections, but also facilitate paper-based archiving
5: Types of qualitative data
diverse data types: in-depth interviews; semi-structured interviews; focus groups; oral histories; mixed methods data; open-ended survey questions; case notes/records of meetings; diaries/research diaries
multimedia: audio, video, photos and text (most common is interview transcriptions)
formats: digital, paper, analogue audio-visual
data structures differ across different ‘document types’
5: Classic study datasets
Townsend – Poverty, old age and Katherine Buildings
Thompson – oral history and Edwardians
Goldthorpe et al - The Affluent Worker
Jackson and Marsden – Education and the Working class
National Social Policy and Social Change Archive
5: Online access to data
5: schoolchildren’s attitudes towards risk-taking and health
a typical example of a mixed methods study might involve undertaking a sample survey and conducting ethnographic fieldwork (e.g. observation and in-depth interviews) based on the survey sample or on other cases
Incidents and the Health-related Behaviour of Schoolchildren, 1997, M. Denscombe
studying ‘critical incidents’ in the lives of young people, which act as crucial flashpoints in the generation of attitudes towards health-related behaviour
the project used a mixture of quantitative and qualitative methodology:
a survey of 1,648 children
eleven transcripts of focus group interviews
eight transcripts of interviews, each with two students together
Denscombe's in-depth interviews also cover a lot of detail about the role and pressure of exams at the age of 15/16, and about future life ambitions
Secondary use?
qualitative aspect can offer a more detailed explanation of a quantitative analysis and possibly enable a more complex model to be built
sequencing of data collection methods or the selection of cases needs to be carefully considered in re-use
in larger data collections, the data types may have been collected by different teams with differing methodological agendas - researchers tend to prioritise one method because of familiarity with the data type and analytic methods
possibility that each method could show conflicting findings: re-users should be careful about how they report findings and be reflexive about how the secondary data were selected, confronted and analysed
Collaboration - UK
Government agencies: work closely
Research Councils: on formal data sharing policies
Research Centres and Programmes collecting data
Other funding agencies, e.g. JISC: on technical issues, authentication, digitisation, T&L resources
TNA: on records management and preservation practice
E-science: on grid-enabled data issues, ontologies
Research Methods centres: on data quality and secondary analysis
Conclusion
secondary analysis permits a range of valuable analyses to be undertaken quickly, effectively, transparently and with minimal respondent burden
digital formats have enabled users to consult full documentation easily, explore and analyse data online, and make linkages between appropriate resources in the context of an increasingly complex data infrastructure
data access services themselves may be virtual centres, distributed across multiple sites
we anticipate that grid developments will provide increased scope for harmonising access to different data types
Contact
www.esds.ac.uk
help@esds.ac.uk
corti@essex.ac.uk
01206 872145