Preparing Research Data for Sharing
An overview for LSHTM students
Gareth Knight & Victoria Cranna
This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License
LSHTM eThesis session, presented on 10th and 18th July 2013

Preparing research data for sharing

May 11, 2015



Ian Timaeus

Workshop session delivered alongside the 'Making your thesis legal' workshop in July and September 2013 to PhD, MPhil and DrPH students who are completing their thesis. It discusses standards for sharing data, issues that need addressing, file formats, data protection, usability and licences.
Transcript

Page 2: Preparing research data for sharing

Data Sharing in the News

Page 3: Preparing research data for sharing

Research Data

“Data produced during the research activity should be managed appropriately, ensuring that it is stored, organised and documented in a manner that allows it to be understood and used for the intended purpose.”
Research Degrees Handbook: Academic Year 2012-13

Page 4: Preparing research data for sharing

To Share or Not to Share
1. Is sharing justified?
• What benefits will it provide?
• What are the risks associated with sharing data?
2. Are you able to share?
• Intellectual Property Rights (IPR)
• Participant consent
• Other obligations, e.g. confidentiality
3. Are there any conditions associated with sharing?
• What measures need to be in place to protect the data? (e.g. recording access requests, restricting use to specific purposes)

Information Commissioner's Office: Data Sharing Code of Practice, http://www.ico.org.uk/for_organisations/data_protection/topic_guides/data_sharing

Page 5: Preparing research data for sharing

Reasons for

• Encourages validation of research findings
• Increases the visibility of research findings through attribution and further analysis
• Complies with sponsor obligations
• Complies with journal publisher requirements
• Offers a simple way to deal with annoying data requests

Page 6: Preparing research data for sharing

Reasons against

• Ownership issues, e.g. third-party rights
• Participant confidentiality: the Data Protection Act (DPA) 1998 does not apply to anonymised data
• Sensitivity: implications of release (e.g. geo-references for animal migration)
• Commercial or research exploitation
• Contractual, regulatory and legislative obligations

What are the risks of data release?

Page 7: Preparing research data for sharing

Protection of Research Participants

“Researchers must ensure the confidentiality of personal information relating to research participants”
“Prior to publication or depositing data in a public depository, data should be fully anonymised”
LSHTM Guidelines on Good Research Practice

Page 8: Preparing research data for sharing

Data Protection Act 1998

Personal data
Information that can be used to identify an individual, either in isolation or in combination with other information, e.g. name, age, address.
Sensitive personal data
• Racial or ethnic origin
• Political opinions
• Religious beliefs
• Trade union membership
• Physical or mental health
• Sexual life
• Criminal convictions
Purpose: to protect living individuals’ fundamental rights and freedoms in relation to the storage, processing and disclosure of information held about them

Page 9: Preparing research data for sharing

Data Protection Principles
Eight principles which broadly state that personal data shall be:
1. Fairly and lawfully processed

2. Obtained only for specified purposes, and shall not be further processed for other purposes that are incompatible with the original reason

3. Adequate, relevant and not excessive in comparison to original purpose

4. Accurate and where necessary, kept up to date

5. Held no longer than is necessary

6. Processed in accordance with the data subject’s rights

7. Kept securely and safely with appropriate measures to prevent unauthorised or unlawful processing of the data and against accidental loss, destruction or damage

8. Not transferred to countries without adequate protection

Page 10: Preparing research data for sharing

Potential Exemptions
No blanket exemption, but...
• Certain exemptions apply for research purposes, including statistical or historical purposes.
• If the research processing is not targeted at particular individuals and does not cause substantial distress or damage to a data subject, then:
• 2nd principle: personal data can be processed for purposes other than those for which they were originally obtained
• 5th principle: personal data can be held indefinitely
• Analysis results do not identify data subjects

Information Commissioner's Office: Guide to Data Protection, http://www.ico.org.uk/for_organisations/data_protection/the_guide

Page 11: Preparing research data for sharing

Reducing Disclosure risk

Disclosure types:
• Identity: a person is identified directly
• Attribute: sensitive information about a subject is revealed
• Inferential: the value of a subject’s characteristic can be determined more accurately than would otherwise have been possible
Techniques:
• Remove obvious identifiers (DPA 1998)
• Replace real data with synthetic data
• Limit the variables that are made available
• Sampling from a larger group
• Group significant values / top and bottom coding
• Limit geographic detail
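Several of the techniques above can be applied mechanically to each record before release. The following is a minimal sketch, assuming records held as Python dicts; the column names (`name`, `address`, `nhs_number`, `age`, `income`, `postcode`), the 10-year age bands and the income floor and ceiling are hypothetical examples rather than LSHTM or ICO requirements. Applying these steps does not by itself guarantee that a dataset is anonymised.

```python
# Minimal de-identification sketch. Column names and thresholds are
# hypothetical examples, not LSHTM or ICO requirements.
DIRECT_IDENTIFIERS = {"name", "address", "nhs_number"}


def top_bottom_code(value, floor, ceiling):
    """Top/bottom coding: clamp extreme values to the floor or ceiling."""
    return max(floor, min(value, ceiling))


def age_band(age, width=10):
    """Group an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def coarsen_postcode(postcode):
    """Limit geographic detail: keep only the outward part of a UK postcode."""
    parts = postcode.split()
    return parts[0] if parts else postcode


def deidentify(record, income_floor=0, income_ceiling=100_000):
    """Return a new record with direct identifiers removed and detail reduced."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        out["age"] = age_band(out["age"])
    if "income" in out:
        out["income"] = top_bottom_code(out["income"], income_floor, income_ceiling)
    if "postcode" in out:
        out["postcode"] = coarsen_postcode(out["postcode"])
    return out


if __name__ == "__main__":
    raw = {"name": "A. Participant", "age": 34, "income": 250_000,
           "postcode": "WC1E 7HT", "outcome": "recovered"}
    print(deidentify(raw))
```

Run on the sample record, this prints `{'age': '30-39', 'income': 100000, 'postcode': 'WC1E', 'outcome': 'recovered'}`. The original record is left untouched, so the identifiable master copy can be kept securely while only the reduced copy is shared.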

Avoiding inappropriate attribution of information to a data subject

Information Commissioner's Office: Anonymisation Code of Practice, http://www.ico.org.uk/for_organisations/data_protection/topic_guides/anonymisation
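One way to gauge identity-disclosure risk after applying these techniques, not named on the slide, is k-anonymity: every combination of quasi-identifiers (variables that are not identifying alone but may be in combination) should be shared by at least k records. The sketch below assumes records held as dicts; the quasi-identifiers `age`, `sex` and `district` and the default k of 5 are hypothetical choices for illustration.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_value(records, quasi_identifiers):
    """The dataset's k: the size of the smallest such group (0 if no records)."""
    sizes = group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def risky_groups(records, quasi_identifiers, k=5):
    """Combinations shared by fewer than k records, i.e. easier to single out."""
    sizes = group_sizes(records, quasi_identifiers)
    return sorted(combo for combo, n in sizes.items() if n < k)


if __name__ == "__main__":
    qi = ["age", "sex", "district"]
    data = ([{"age": "30-39", "sex": "F", "district": "WC1E"}] * 3
            + [{"age": "70-79", "sex": "M", "district": "NW1"}])
    print(k_value(data, qi), risky_groups(data, qi, k=2))
```

A k of 1 means at least one participant is unique on those variables. Groups flagged this way are candidates for further grouping, coding or suppression before release.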

What about aggregated data?
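Aggregated tables can still disclose information when a cell contains only a handful of people. A common safeguard is small-cell suppression. This is a minimal sketch, assuming a cross-tabulation of two categorical variables and a hypothetical threshold of 5 (thresholds vary between organisations and data types).

```python
from collections import Counter


def tabulate(records, row_var, col_var):
    """Aggregate records into a frequency table keyed by (row, column)."""
    return Counter((r[row_var], r[col_var]) for r in records)


def suppress_small_cells(table, threshold=5):
    """Primary suppression: replace any non-zero count below `threshold`."""
    marker = f"<{threshold}"
    return {cell: (marker if 0 < n < threshold else n) for cell, n in table.items()}


if __name__ == "__main__":
    data = ([{"district": "WC1E", "diagnosis": "TB"}] * 7
            + [{"district": "NW1", "diagnosis": "TB"}] * 2)
    print(suppress_small_cells(tabulate(data, "district", "diagnosis")))
```

This covers primary suppression only. If row or column totals are also published, a suppressed cell can sometimes be recovered by subtraction, so further (secondary) suppression may be needed.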

Page 12: Preparing research data for sharing

Ensuring Continued Access
Problems:
1. User doesn’t have the relevant software package
2. User runs a different operating system from the creator (e.g. Linux, Mac OS)
3. Software package is obsolete
Options:
• Emulation of the original environment
• Export to another format
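Exporting to a software-neutral format is often the simpler of the two options. This is a minimal sketch using Python's standard `csv` module, assuming the data are held as a list of dicts; the field names and file name are hypothetical. The read-back step shows a limitation worth documenting: delimited text carries no type information, so every value returns as a string.

```python
import csv
import io


def export_csv(records, fieldnames, stream):
    """Write dict records to `stream` as comma-delimited text with a header row."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


def read_csv(stream):
    """Read the delimited text back; every value is returned as a string."""
    return list(csv.DictReader(stream))


if __name__ == "__main__":
    # For a real file, open it with newline="" and an explicit encoding, e.g.
    # open("study_data.csv", "w", newline="", encoding="utf-8")
    buffer = io.StringIO()
    export_csv([{"id": 1, "age_band": "30-39"}], ["id", "age_band"], buffer)
    buffer.seek(0)
    print(read_csv(buffer))  # the id comes back as "1", not 1
```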

Page 13: Preparing research data for sharing

Choosing File Formats

Format should be:
• Accessible using a wide range of software tools
• In widespread use
• Able to support relevant information attributes without loss
• Based upon a public specification
• Able to be created without DRM or other limitations

“turning [a] PDF into XML is like turning a hamburger into a cow”
Peter Murray-Rust

Page 14: Preparing research data for sharing

Recommended Formats
Quantitative tabular:
• Preferred: SPSS portable format (.por), delimited text (.txt) with a command/setup file
• Acceptable: SPSS (.sav), Stata (.dta), MS Access & other proprietary formats
Geospatial:
• Preferred: ESRI Shapefile, geo-referenced TIFF (.tif, .tfw)
• Acceptable: ESRI Geodatabase format (.mdb), MapInfo Interchange Format (.mif), Keyhole Markup Language (KML) (.kml)
Qualitative text:
• Preferred: XML-encoded text (e.g. DDI, TEI), Open Document Format (ODF), Rich Text Format (RTF)
• Acceptable: MS Word, NVivo
Still images:
• Preferred: TIFF, lossless JPEG 2000
• Acceptable: PNG, RAW, lossy (compressed) JPEG 2000
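Before depositing, it can help to check a data folder against these recommendations. This is a minimal sketch that classifies files by extension. The extension sets are transcribed from the slide with some assumptions: `.csv` is treated as delimited text, `.shx`/`.dbf` as Shapefile components, `.nvp`/`.nvpx` as NVivo projects, and RAW is omitted because camera RAW files use many vendor-specific extensions. An extension cannot show whether a JPEG 2000 image is lossless, so `.jp2` is listed only as acceptable.

```python
from pathlib import Path

# Extensions transcribed from the slide; the sets are illustrative, not complete.
PREFERRED = {
    ".por",                    # SPSS portable
    ".txt", ".csv",            # delimited text (pair with a command/setup file)
    ".shp", ".shx", ".dbf",    # ESRI Shapefile and its companion files
    ".tif", ".tiff", ".tfw",   # (geo-referenced) TIFF and its world file
    ".xml", ".odt", ".rtf",    # XML-encoded text, ODF text, RTF
}
ACCEPTABLE = {
    ".sav", ".dta",            # SPSS, Stata
    ".mdb", ".accdb",          # MS Access; .mdb is also used by ESRI personal geodatabases
    ".mif", ".kml",            # MapInfo Interchange Format, KML
    ".doc", ".docx",           # MS Word
    ".nvp", ".nvpx",           # NVivo projects
    ".png",
    ".jp2",                    # JPEG 2000: extension does not reveal lossless vs lossy
}


def classify(path):
    """Return 'preferred', 'acceptable' or 'review' based on a file's extension."""
    suffix = Path(path).suffix.lower()
    if suffix in PREFERRED:
        return "preferred"
    if suffix in ACCEPTABLE:
        return "acceptable"
    return "review"


def audit(folder):
    """Classify every file under `folder`, recursively."""
    return {str(p): classify(p) for p in Path(folder).rglob("*") if p.is_file()}
```

For example, `audit("thesis_data")` (a hypothetical folder name) returns a mapping from each file path to its category. Anything marked `review` is a candidate for export to one of the preferred formats.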

Page 15: Preparing research data for sharing

Ensuring Understandability

Researcher questions:
• What does the variable mean?
• How were the results produced?
• What are the boundaries of the measurement?
• What instruments and measures were used?

A user (a third party or your future self) has difficulty understanding some aspect of the research data.

Sources:
• Lab notebooks & research protocols
• Codebooks and data dictionaries
• Equipment settings & instrument calibration
Approach:
1. Check requirements in your field (e.g. clinical)
2. Look at other collections (e.g. UKDS)
3. Consider questions that users may have when accessing the data
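A codebook answers the first of these questions directly. It is more useful still if it is checked against the data. The following is a minimal sketch, assuming a hypothetical `Variable` structure (name, label, units, value codes) that does not follow any particular standard such as DDI. It reports columns missing from the codebook and values that are not among the documented codes.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One codebook entry (field names are illustrative, not a standard)."""
    name: str
    label: str
    units: str = ""
    codes: dict = field(default_factory=dict)  # e.g. {1: "Male", 2: "Female"}


def undocumented_columns(records, codebook):
    """Columns present in the data but missing from the codebook."""
    documented = {v.name for v in codebook}
    seen = set()
    for r in records:
        seen.update(r)
    return sorted(seen - documented)


def invalid_codes(records, codebook):
    """(row index, variable, value) triples whose value is not a listed code."""
    problems = []
    for i, r in enumerate(records):
        for v in codebook:
            if v.codes and v.name in r and r[v.name] not in v.codes:
                problems.append((i, v.name, r[v.name]))
    return problems


if __name__ == "__main__":
    codebook = [Variable("sex", "Sex of participant", codes={1: "Male", 2: "Female"}),
                Variable("weight", "Body weight", units="kg")]
    data = [{"sex": 1, "weight": 70.5}, {"sex": 3, "weight": 64.0, "bmi": 22.1}]
    print(undocumented_columns(data, codebook), invalid_codes(data, codebook))
```

Here the check would flag `bmi` as undocumented and the code `3` in row 1 as undefined. These are exactly the gaps a third party, or your future self, would stumble over.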

Page 16: Preparing research data for sharing

Ensuring Usability

Scenarios:
1. User is uncertain whether they are permitted to analyse the data, so does not use it.
2. Researcher uses the data in research for a non-permitted purpose.

End user unsure about the permitted use of data

Licence should specify:
• Data that the licence applies to
• Who owns each component
• Who is permitted access & use
• Conditions associated with use
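The four points above can be captured as a simple checklist record while drafting a licence or deposit agreement. This is a minimal sketch; `LicenceRecord` and its fields are hypothetical and do not correspond to any repository's metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class LicenceRecord:
    """Hypothetical record of the four things a licence should specify."""
    applies_to: list                                 # data files/components covered
    owners: dict = field(default_factory=dict)       # component -> rights holder
    permitted_users: str = ""                        # who may access and use the data
    conditions: list = field(default_factory=list)   # conditions associated with use

    def gaps(self):
        """List what is still unspecified, so it can be fixed before release."""
        problems = []
        if not self.applies_to:
            problems.append("no data listed")
        problems += [f"no owner recorded for {c}" for c in self.applies_to
                     if c not in self.owners]
        if not self.permitted_users:
            problems.append("permitted users not stated")
        if not self.conditions:
            problems.append("conditions of use not stated")
        return problems


if __name__ == "__main__":
    lic = LicenceRecord(applies_to=["survey.csv", "interviews.rtf"],
                        owners={"survey.csv": "LSHTM"},
                        permitted_users="Registered researchers",
                        conditions=["Cite the dataset"])
    print(lic.gaps())  # reports the missing owner for interviews.rtf
```

Recording an owner per component matters when a dataset combines your own data with third-party material, the ownership issue raised earlier under reasons against sharing.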

Page 17: Preparing research data for sharing

1. Standard licence model
Creative Commons
• Attribution (BY): creator must be credited
• No Derivatives (ND): no editing or manipulation
• Non-Commercial (NC): no commercial use
• Share Alike (SA): derivatives must be shared under the same licence

Open Data Commons
• Public Domain Dedication & License (PDDL)
• Attribution License (ODC-By)
• Open Database License (ODC-ODbL): attribution, share-alike

Various software licence models
• GNU General Public License (GPL)
• GNU Lesser General Public License (LGPL)
• BSD license
• Etc.
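The Creative Commons elements combine into six standard licences. Attribution (BY) is part of all of them, and ND cannot be combined with SA, because share-alike only governs derivatives, which ND forbids. This is a minimal sketch of that rule; the function name is hypothetical. CC0, a public domain dedication, sits outside this scheme.

```python
def cc_licence(non_commercial=False, no_derivatives=False, share_alike=False):
    """Compose a Creative Commons licence code from the chosen elements.

    BY is always included. ND and SA are mutually exclusive.
    """
    if no_derivatives and share_alike:
        raise ValueError("ND and SA cannot be combined")
    parts = ["BY"]
    if non_commercial:
        parts.append("NC")
    if no_derivatives:
        parts.append("ND")
    if share_alike:
        parts.append("SA")
    return "CC " + "-".join(parts)


if __name__ == "__main__":
    print(cc_licence())                                        # CC BY
    print(cc_licence(non_commercial=True, share_alike=True))   # CC BY-NC-SA
```

Note that NC and ND restrict reuse of data considerably. Before choosing them, check whether a sponsor or journal requires a more open licence.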

Page 18: Preparing research data for sharing

2. Tailored Licence Form
• National Cancer Research Institute: Data and Material Transfer Agreement template, http://www.ncri.org.uk/default.asp?s=1&p=8&ss=9
• UK Data Service licence, http://ukdataservice.ac.uk/deposit-data/support/licence.aspx
• CeLSIUS Data Access Agreement, http://celsius.lshtm.ac.uk/documents/Data%20Access%20Agreement.doc
• Participant consent form, http://www.lshtm.ac.uk/research/ethicscommittees/

Digital Curation Centre: How to License Research Data, http://www.dcc.ac.uk/resources/how-guides/license-research-data

Page 19: Preparing research data for sharing

LSHTM Data Repository

• Public: data made available for anonymous access

• Registered: End user required to register for time-limited access

• Approved: End user must state the purpose for which they wish to use the data.

• Embargoed: Data are withheld for a designated time period, e.g. 5 years.

• Request: Data not held in the repository may be requested from the creator

In-development service capable of curating, preserving, and sharing LSHTM research data
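The access levels above amount to a small decision rule. This is a minimal sketch of how such a rule might be expressed, assuming that registered access expires on a stored date, that approved access depends only on whether the stated purpose has been accepted, and that embargoed data become downloadable once the embargo ends. These are assumptions for illustration, not a description of how the LSHTM repository actually works, and the `Access` and `can_download` names are hypothetical.

```python
from datetime import date
from enum import Enum


class Access(Enum):
    PUBLIC = "public"
    REGISTERED = "registered"
    APPROVED = "approved"
    EMBARGOED = "embargoed"
    REQUEST = "request"


def can_download(level, today, *, registration_expires=None,
                 purpose_approved=False, embargo_ends=None):
    """Decide whether a user may download a dataset held at `level`."""
    if level is Access.PUBLIC:
        return True  # anonymous access
    if level is Access.REGISTERED:
        # time-limited: valid only until the registration expiry date
        return registration_expires is not None and today <= registration_expires
    if level is Access.APPROVED:
        return purpose_approved  # stated purpose reviewed and accepted
    if level is Access.EMBARGOED:
        return embargo_ends is not None and today >= embargo_ends
    # REQUEST: data are not held in the repository; contact the creator instead.
    return False


if __name__ == "__main__":
    today = date(2015, 5, 11)
    print(can_download(Access.REGISTERED, today, registration_expires=date(2015, 12, 31)))
    print(can_download(Access.EMBARGOED, today, embargo_ends=date(2018, 7, 1)))
```

Keeping the rule in one place makes each dataset's access level an explicit, reviewable decision rather than an ad hoc one.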

Page 20: Preparing research data for sharing

A Few Useful References
• MANTRA (Data Management training for PhD students): http://datalib.edina.ac.uk/mantra/
• UK Data Archive, Managing and Sharing Data: http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
• LSHTM Information Management support material: http://intra.lshtm.ac.uk/infoman/

• Data Protection web pages: http://intra.lshtm.ac.uk/infoman/data/

• Guidelines on good research practice: Implementing research governance: http://www.lshtm.ac.uk/research/ethicscommittees/good_research_practice.pdf

• Research Degrees Handbook: http://www.lshtm.ac.uk/study/currentstudents/studentinformation/rd_handbook_12_13.pdf

• Information Management and Security Policy: http://intra.lshtm.ac.uk/infoman/security/index.html
