Guidelines for data preparation Social Science Data Archives: creating, depositing and using data Swansea 23 March 2005 John Southall

Mar 29, 2015


Guidelines for data preparation

Social Science Data Archives: creating, depositing and using data

Swansea, 23 March 2005

John Southall


ESRC Datasets Policy – what is expected of award holders?

• to preserve and share data from ESRC-funded research

• funding allowed to prepare data for archiving

• all award holders must offer data for deposit to the ESDS within 3 months of the end of the award

• any potential problems should be notified to the ESDS at the earliest opportunity

• final payment will be withheld if dataset has not been offered to ESDS within 3 months of the end of the award, except where a waiver has been agreed in advance

• ESRC Datasets Policy: www.esrc.ac.uk/esrccontent/researchfunding/sec17.asp


Longer-term data sharing

• data centres/archives make (selected) data created available to other bona fide researchers

• safeguards to protect the interests of the original collector, who may retain Intellectual Property Rights

• preserve data using up-to-date curation systems and keep pace with technology and data trends


Characteristics of a ‘good’ archived research collection

• high quality research design and data management

• accurate data, well organised and labelled files

• appropriate measurement of key concepts

• high quality supporting documentation created

– major stages of research recorded

– research/measurement instruments documented

• data that can be stored in user-friendly ‘dissemination’ formats, but can also be archived in a future-proof ‘preservation’ format

• consent, confidentiality and IPR resolved before fieldwork begins


Intellectual content

Examples of data acquired by ESDS:

• builds on previous research and methods

• repeated measures, time series

• comparative potential

• addresses new issues

• tried and trusted, or harmonised measures/scales used

• innovative approach to discipline and methodology


Extensive raw data

• types of research data assembled

– survey data

– in-depth interviews

– focus groups

– field notes/participant observation

– case study notes

• images and audio-visual materials (supports textual transcripts)

• range of material – broad focus


Supporting documentation

• to produce catalogue record and user guide

– funding application

– questionnaire/interview schedules

– description of methodology (details of sample design, response rate, etc.)

– ‘codebook’ (variable names, variable descriptions, code names and variable formatting information)

– technical report describing the research project

– communication with informants on confidentiality

– coding schemes / themes

– end of award report

– software description/versions used

– bibliographies, resulting publications

– code used to create derived variables or check data (e.g. SPSS, STATA or SAS “command files”)

• anything that adds insight or aids understanding and secondary usage


Catalogue record

Standardised description (metadata) fields taken from DDI specification for social science datasets


Online User Guide


Survey data


Labelling of survey data I

• all variables should be named

• variable names should not exceed 8 characters where possible, as the most common format for disseminating data is SPSS

• all variables should be labelled

• labels should be brief (preferably < 80 characters), but precise and always make explicit the unit of measurement for continuous (interval) variables

• where possible, all variable labels should reference the question number (and if necessary questionnaire), for example, the variable q11bhexc might have the label “q11b: hours spent taking physical exercise in a typical week”

• this gives the unit of measurement and a reference to the question number (q11b), so the user can quickly and easily cross-reference to it
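The naming and labelling rules above can be checked mechanically before deposit. What follows is a minimal Python sketch, not an ESDS or UKDA tool: the function name `check_variable_labels`, the name-to-label dictionary layout and the regular expression used to recognise a question reference such as "q11b:" are all assumptions made for illustration.

```python
import re

MAX_NAME_LENGTH = 8    # guideline: variable names should not exceed 8 characters
MAX_LABEL_LENGTH = 80  # guideline: labels preferably under 80 characters

# Assumed convention: a label starts with a question reference such as "q11b:".
QUESTION_REF = re.compile(r"^q\d+[a-z]*:", re.IGNORECASE)


def check_variable_labels(labels):
    """Return a list of (variable, problem) pairs for a name -> label mapping."""
    problems = []
    for name, label in labels.items():
        if len(name) > MAX_NAME_LENGTH:
            problems.append((name, "name longer than 8 characters"))
        if not label or not label.strip():
            problems.append((name, "no variable label"))
            continue
        if len(label) >= MAX_LABEL_LENGTH:
            problems.append((name, "label is 80 characters or longer"))
        if not QUESTION_REF.match(label.strip()):
            problems.append((name, "label does not start with a question number"))
    return problems


if __name__ == "__main__":
    example = {
        "q11bhexc": "q11b: hours spent taking physical exercise in a typical week",
        "respondent_age": "",  # hypothetical variable that breaks both rules
    }
    for variable, problem in check_variable_labels(example):
        print(variable, "-", problem)
```

A check like this only flags candidates for review; whether a label is precise and states the unit of measurement still needs a human reader.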


Survey data - variables


Labelling of survey data II

• for categorical variables, all codes (values) should be given a brief label (preferably < 60 characters)

• for example, p1sex (gender of person 1) might have these value labels:

– 1 = male

– 2 = female

– -8 = don’t know

– -9 = not answered

• where possible, all such labelling should be created and supplied to the UKDA as part of the data file itself. This is expected with data supplied in one of the three major statistical packages - SPSS, STATA or SAS.
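One way to confirm that every code in a categorical variable carries a value label is sketched below in Python. The function name `check_value_labels` and the sample data are hypothetical; in practice the values and labels would be read from the SPSS, STATA or SAS file being prepared.

```python
MAX_VALUE_LABEL_LENGTH = 60  # guideline: value labels preferably under 60 characters


def check_value_labels(values, value_labels):
    """Report codes present in the data without a label, and labels that are too long."""
    unlabelled = sorted(set(values) - set(value_labels))
    too_long = sorted(
        code
        for code, label in value_labels.items()
        if len(label) >= MAX_VALUE_LABEL_LENGTH
    )
    return {"unlabelled": unlabelled, "too_long": too_long}


if __name__ == "__main__":
    p1sex_labels = {1: "male", 2: "female", -8: "don't know", -9: "not answered"}
    p1sex_values = [1, 2, 2, -9, 3, 1]  # hypothetical data; 3 is an undocumented code
    print(check_value_labels(p1sex_values, p1sex_labels))
```

Here the undocumented code 3 would be reported as unlabelled, which is the kind of gap that is easier to resolve by the data creator than by a later user.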


Accuracy of data: validation checks

larger computer-aided surveys (CAPI, CATI or CAWI)

• these are the most accurate way of gathering survey data, but the software (e.g. Blaise) and hardware (e.g. a laptop for every interviewer) may be beyond project resources 

• computer-aided surveys allow one to build in as many logical checks - on question routing and responses - as possible at the point of data creation

non computer-aided surveys

• less control over initial responses, but checks can be performed:

– at the point of data entry/transcription if ‘data entry’ software is used. However, there are few cheap data entry packages around

– the only feasible option may be to enter data without checks directly into a spreadsheet style interface (e.g. Excel worksheet, SPSS data view), and perform validation checks afterwards - via command files in statistical packages or Visual Basic code in MS Excel or Access
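Where data have been keyed in without checks, validation afterwards usually combines range checks on responses with routing checks between questions. The slides mention command files or Visual Basic for this; the same idea is sketched here in Python purely for illustration. The routing rule (q11b is asked only when a hypothetical filter question q11a is answered 1 = yes) and the variable name q11a are assumptions, not part of any real questionnaire.

```python
MISSING_CODES = {-8, -9}  # don't know / not answered


def validate_record(record):
    """Return a list of problems found in one survey record (a dict)."""
    problems = []
    hours = record.get("q11bhexc")
    answered = hours is not None and hours not in MISSING_CODES

    # Range check: hours of exercise in a week must lie between 0 and 168.
    if answered and not 0 <= hours <= 168:
        problems.append("q11bhexc out of range 0-168")

    # Routing check (hypothetical routing): q11b is asked only if q11a == 1.
    if record.get("q11a") == 2 and answered:
        problems.append("q11bhexc answered although q11a routed past it")

    return problems


if __name__ == "__main__":
    records = [
        {"id": 1, "q11a": 1, "q11bhexc": 5},
        {"id": 2, "q11a": 1, "q11bhexc": 200},
        {"id": 3, "q11a": 2, "q11bhexc": 3},
    ]
    for record in records:
        for problem in validate_record(record):
            print(record["id"], problem)
```

Keeping such checks in a script, rather than correcting values by hand, also produces the kind of checking code that the supporting documentation list asks depositors to supply.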


An example of data seemingly untouched by the human eye:

Originating error in text variables:

Occupation        Description of occupation
‘sole trader’     ‘purveyor of seafood’

Propagated error in derived numeric variables:

• Respondent was coded under the Standard Industrial Classification (SIC) code relating to food retailers:

5.2 Retail sale of food, beverages and tobacco in specialised stores


Online access to data

Nesstar:

• browse detailed information (metadata) about these data sources, including links to other sources

• do simple data analysis and visualisation on microdata

• bookmark analyses

• download the appropriate subset of data in one of a number of formats (e.g. SPSS, Excel)

• data must be ‘perfect’ - 100% labelled


Identifiers

'Direct' and 'indirect' identifiers may threaten confidentiality

• Direct identifiers may have been collected as part of the survey administration process and include names, addresses including postcode information, telephone number etc.

• Indirect identifiers are variables which include information that, when linked with other publicly available sources, could result in a breach of confidentiality. This could include geographical information, workplace/organisation, educational institution or occupation


Quantitative data

• remove the identifier from the dataset

• aggregate/reduce the precision of a variable

– record the year of birth rather than the day, month and year; record postcode sectors (first 3 or 4 digits) rather than full postcode

• bracket a coded (categorical) variable

– aggregate SOC up to 'minor group' codes by removing the terminal digit

• generalise the meaning of a nominal (string) variable

• restrict the upper or lower ranges of a continuous variable
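Several of these techniques reduce to simple transformations of a single variable. The Python sketch below illustrates four of them under stated assumptions: dates are ISO strings, a postcode sector is taken to be the outward code plus the first character of the inward code, SOC unit group codes are four-character strings, and all function names and example values are hypothetical.

```python
def year_of_birth(date_of_birth):
    """Keep only the year from an ISO date string such as '1961-07-23'."""
    return int(date_of_birth[:4])


def postcode_sector(postcode):
    """Drop the last two characters of a full UK postcode, e.g. 'CO4 3SQ' -> 'CO4 3'."""
    compact = postcode.replace(" ", "").upper()
    outward, inward = compact[:-3], compact[-3:]
    return f"{outward} {inward[0]}"


def soc_minor_group(unit_group_code):
    """Remove the terminal digit of a SOC unit group code held as a string."""
    return unit_group_code[:-1]


def top_code(value, ceiling):
    """Restrict the upper range of a continuous variable to a chosen ceiling."""
    return min(value, ceiling)


if __name__ == "__main__":
    print(year_of_birth("1961-07-23"))
    print(postcode_sector("CO4 3SQ"))
    print(soc_minor_group("1234"))   # hypothetical code
    print(top_code(250000, 100000))  # hypothetical income and ceiling
```

Each transformation loses detail on purpose, so the choices made (the ceiling used, the level of geography kept) belong in the documentation supplied with the data.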


Derived and aggregated products

• permission to share and IPR is main issue

• range of potential parties with interest:

– owners, funders, data gatherers, employers, other stakeholders, etc.

• all original source information must be recorded


Qualitative data


Transcribing research

• integrated into the ongoing research – budget accordingly

• full transcriptions or summaries

• costs and benefits

– self transcription

– internal team transcription

– external transcription

• full transcriptions

– consistent layout

– speaker tags

– line breaks

– header with identifier/other details

– checked for errors


Identifiers removed

• scheme devised – different for each dataset

• ideally should reflect any pseudonyms used in publications

• confidentiality respected

• anonymisation?

• problems of anonymisation

– applied too weakly

– applied too strongly

– timing

– potential for distortion

– examples

• user undertakings

• appropriate and sympathetic
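A pseudonym scheme of the kind described above can be applied consistently across a set of transcripts with a small script. The following Python sketch is an illustration only: the function name `apply_pseudonyms` and all names in the example are invented, and the scheme itself would be devised per dataset and ideally match the pseudonyms used in publications.

```python
import re


def apply_pseudonyms(text, pseudonyms):
    """Replace each real name in text with its agreed pseudonym (whole words only)."""
    if not pseudonyms:
        return text
    # Try longer names first so 'Mary Jones' is not half-replaced as 'Mary'.
    names = sorted(pseudonyms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(name) for name in names) + r")\b")
    # A single pass avoids replacing text that is itself a pseudonym.
    return pattern.sub(lambda match: pseudonyms[match.group(1)], text)


if __name__ == "__main__":
    scheme = {"Mary Jones": "Anna Smith", "Mary": "Anna"}  # hypothetical scheme
    transcript = "Mary said Maryland was nice. Ask Mary Jones."
    print(apply_pseudonyms(transcript, scheme))
```

Automated replacement of this kind addresses only names that are already listed; it does not detect indirect identifiers, so a careful read of each transcript is still needed to avoid anonymising too weakly or too strongly.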


Options for preserving confidentiality

• anonymisation

• consent to archive at the time of field work

• researcher contacts informants retrospectively

• user undertakings

• in exceptional circumstances - permission to use or closure of material


Labelling and listing qualitative data

• e.g. set of in-depth interviews

• data list: list of contents of research collection

• acts as a point of entry for secondary user

• qualitative data: Excel template for interviewee/case study characteristics


Online access to qualitative data

• new emphasis on providing direct access to collection content

– supports more powerful resource discovery

– greater scope for searching and browsing content of data (supplementary to higher level study-related metadata)

– since users can search and explore content directly… can retrieve data immediately

• providing access to qualitative data via common interface (ESDS Qualidata Online)

• supporting tools for searching, retrieval, and analysis across different datasets

Means that data must be accurate and standardised


Enhanced User Guides - qualitative studies

• detailed notes on study methodology; ‘behind the scenes’ interviews with depositors; FAQs

• provide a deeper understanding of the study and research methods

• provide guidance on data resources and how to re-use them

• exemplars and case studies of re-use, including full bibliographies


Back up and security

• digital, paper and audio media are fragile - digital media are even easier to change/copy/delete!

• a good backup procedure will protect against a range of mishaps such as:

– accidental changes to data

– accidental deletion of data

– loss of data due to media or software faults

– virus infections and hackers

– catastrophic events (such as fire or flood)

• backup frequently, retain off site copies

• consider storage conditions, fireproofing etc.
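Because digital copies can be silently changed or corrupted, a backup is only useful if it can be shown to match the original. A minimal Python sketch of copying a file and verifying the copy by checksum is given below; the function names and the choice of SHA-256 are assumptions for illustration, not a procedure prescribed by ESDS.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, backup_dir):
    """Copy a file into backup_dir and confirm the copy matches the original."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source} does not match the original")
    return target
```

Recording the checksums alongside the off-site copies makes it possible to re-check them later, which helps detect media faults before the working copy is also lost.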


ESDS in-house processing

• in-house data processing

– ‘cleaning up’ research data

– collating documentation received from depositor

– repairing minor errors

– meeting users’ expectations

– cannot engage in major processing tasks unless destined for publishing into online systems


Ethics and legal issues: up front

• issues of consent and confidentiality allowing archiving should be included in the project management plan and addressed before data collection starts

• longer-term rights management in place and IPR issues considered

• unless a waiver on deposition has been agreed, researchers should not make commitments to informants which preclude archiving their data


Consent for archiving

• anonymity and privacy of research participants should be respected

• explicit ‘informed’ consent gained

• information for research participants should be clear and coherent and include:

– purpose of research

– what is involved in participation

– benefits and risks

– storage and access to data

– usage of data (current and future uses)

– withdrawal of consent at any time

– Data Protection and Copyright Acts

• N.B. additional measures are needed when participants are unable to consent through incapacity or age

• reflect needs and views of all

• works in practice


Legal issues in data preparation

• ‘Duty of confidentiality’

• Law of Defamation

• Data Protection Act 1998 and EU Directive

• Copyright Act 1988

• Freedom of Information


Duty of Confidentiality

• disclosure of information may constitute a breach of confidentiality and possibly a breach of contract

• not governed by an Act of Parliament

• not necessarily in writing

• can be a legal contractual

• exemptions are:

– relevant police investigations or proceedings

– disclosure by court order

– ‘public interest’ - defined by the courts

– ethical obligations in cases of disclosure of child abuse


Law of Defamation

• a defamatory statement is one which may injure the reputation of another person, company or business


Data Protection Act 1998

• eight principles:

– fairly and lawfully processed

– processed for limited purposes

– adequate, relevant and not excessive

– accurate

– not kept longer than necessary

– processed in accordance with the data subject's rights

– secure

– not transferred to countries without adequate protection

• allows for secondary use of data for research purposes under certain conditions


Copyright Act 1988

• developed for the broadcasting industry not research!

• protection of author’s rights

• multiple copyrights apply:

– automatically assigned to the speaker

– researcher holds the copyright in the sound recording of an interview (obtain written assignment of copyright from interviewee, or oral agreement (licence) to use)

– employer holds the copyright in research data (obtain copyright clearance from employer)

• copyright lasts for 70 years after the end of the year in which the author dies

• copying work is an infringement unless it is for the purposes of research, private study, criticism or review or reporting current events, and if the use can be regarded as being in the context of ‘fair’ dealing

• seek legal advice on problem issues


Freedom of Information

• Freedom of Information Act 2000

A statutory right for individuals and organisations to request information held by public authorities. FOI specifically excludes environmental information which is covered by …

• Environmental Information Regulations 2004

• enables individuals and organisations to obtain environmental information held by public authorities


What is the legislation?

• statutory rights of access to information

• apply to public authorities – e.g. ESRC and the universities are public authorities

• anyone, anywhere can request a copy of any information you hold – includes data sets

• not all information has to be released

• must respond to most requests in 20 days


Exemptions – information protected by law

• Don’t panic - not all information has to be made available under FoI and EIRs

• FOI and EIRs provide a number of exemptions that can be applied to the release of information

• the presumption is that information will be made available unless for good reason (a public interest test)

• exemptions protect scientific output, commercial business and personal information (through the Data Protection Act)

• exemptions can be complex and difficult to apply. If in doubt, ask….


Conclusion: archivable research

• suitable for electronic dissemination

• suitable formats for re-use and long-term preservation

• well documented to provide adequate context for new re-use

• consent and IPR enable long-term flexible sharing

• limited in-house data processing undertaken by ESDS

• meeting users’ needs

– building an expansive and varied data portfolio

– creating online exploratory/data browsing systems

good housekeeping = good research = good archives


Depositing data with ESDS

• provide details of all data collected, together with three samples of qualitative data, if applicable

• to do this, complete the Data Submission form on the ‘Deposit’ pages on ESDS web site

• dataset will then be formally reviewed for archiving by the UKDA Acquisitions Review Committee

• if accepted, complete the UKDA deposit and licence forms, and send the data, documentation and forms to the UK Data Archive within the required time-scales

• you will be notified when your data are being released via the UKDA online catalogue

• access to data will be granted to registered bona fide researchers only via Athens authentication


Creating or depositing data

www.esds.ac.uk/aandp/create

[email protected]: 01206 872572/872974