Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data Data Management and Sharing workshop Edinburgh, 17 June 2008

Mar 28, 2015


Aaron McBride
Dealing with confidential research information - Anonymisation techniques and other measures to enable using and sharing research data

Data Management and Sharing workshop, Edinburgh, 17 June 2008

Using and sharing confidential research data

…obtained from people as participants

Requires a combination of:
• discussing consent and confidentiality with participants / respondents (dialogue)
• anonymisation of data
• user access restrictions
  researchers only; use licence with confidentiality agreement; data unavailable for certain time period

What is required is study specific and depends on:
• nature of research
• planned data uses

Identity disclosure

A person’s identity can be disclosed through:
• direct identifiers
  name, address, postcode, telephone number, voice, picture
  usually NOT essential research information (administrative)
• indirect identifiers – possible disclosure in combination with other information
  occupation, geography, unique or exceptional values (outliers) or characteristics

Why anonymise data?

• Ethical reasons
– protect identity (sensitive, illegal, confidential info)
– disguise research location

• Commercial reasons

• Legal reasons
– protect personal data (DPA)

Essential points

• Never disclose personal data (unless specific consent)

• Reasonable / appropriate level of anonymity

• Maintain maximum meaningful info

• Where possible replace rather than remove

• Identifying info may provide context, do not over-anonymise

• Re-users of data have the same legal and ethical obligation to NOT disclose confidential info as primary users

Anonymising quantitative data

• Remove direct identifiers
  names, address, institution

• Reduce the variable precision through aggregation
  postcode sector vs full postcode, birth year vs date of birth, occupational categories

• Generalise meaning of text
  occupational expertise

• Restrict upper / lower ranges to hide outliers
  income, age
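
The four techniques above can be sketched on a single survey record. This is a minimal illustration, not a prescribed procedure: the field names, the caps and the occupation mapping are all assumptions made for the example, and only top-coding is shown (bottom-coding would mirror it with `max`).

```python
# All names and thresholds below are hypothetical; adapt them to the dataset.
DIRECT_IDENTIFIERS = {"name", "address", "institution"}
INCOME_CAP = 100_000   # assumed top-coding threshold for income
AGE_CAP = 85           # assumed top-coding threshold for age
OCCUPATION_GROUPS = {"cardiac surgeon": "health professional"}  # illustrative


def anonymise_record(record):
    """Return an anonymised copy of one record; the input is not modified."""
    # 1. Remove direct identifiers.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # 2. Reduce precision: full postcode -> postcode sector
    #    (outward code plus the first character of the inward code).
    if "postcode" in out:
        outward, inward = out.pop("postcode").split()
        out["postcode_sector"] = f"{outward} {inward[0]}"

    # 2. Reduce precision: date of birth (ISO string assumed) -> birth year.
    if "date_of_birth" in out:
        out["birth_year"] = int(out.pop("date_of_birth")[:4])

    # 3. Generalise text: specific occupation -> broad category.
    #    Unmapped values fall back to "other" so they are not passed through.
    if "occupation" in out:
        out["occupation"] = OCCUPATION_GROUPS.get(out["occupation"], "other")

    # 4. Restrict upper ranges (top-coding) to hide outliers.
    if "income" in out:
        out["income"] = min(out["income"], INCOME_CAP)
    if "age" in out:
        out["age"] = min(out["age"], AGE_CAP)
    return out
```

Because the function returns a copy, the original record stays available, in line with the advice to keep a copy of the original data.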

Relational data

Extra care needed - combinations of related datasets or a dataset in combination with publicly available info can disclose information e.g. businesses studied are mapped in publication

Geo-referenced data

Point data may reveal position of individuals, organisations, businesses, etc.

• Remove point coordinates – loss of all geographical info

• Reduce precision - replace point coordinates with line or polygon of larger area
  km2 area, postcode district, ward, road

• Reduce precision - replace point coordinate with meaningful variable typifying the geographical position
  catchment area, poverty index, population density

But: geo-referenced data are valuable for re-use. Maintaining geo-references and imposing access restrictions is better
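
One way to reduce the precision of point data is to snap each point to a coarse grid cell, so that all points in the same cell become indistinguishable. The sketch below assumes latitude / longitude in decimal degrees and a cell size of 0.1 degrees; both are illustrative choices, and a real project might use postcode districts or wards instead, as listed above.

```python
import math


def coarsen_point(lat, lon, cell_deg=0.1):
    """Replace a point with the south-west corner of its grid cell.

    cell_deg is an assumed cell size in degrees; larger cells give
    stronger protection and less geographical detail.
    """
    lat_cell = math.floor(lat / cell_deg) * cell_deg
    lon_cell = math.floor(lon / cell_deg) * cell_deg
    # Round away floating-point noise from the multiplication.
    return round(lat_cell, 6), round(lon_cell, 6)
```

Two nearby respondents, for example at (55.95, -3.19) and (55.91, -3.11), end up with the same coarsened coordinates and can no longer be told apart by location alone.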

Anonymising qualitative data

• Plan or apply editing at start
  anonymise during transcription, highlight sensitive info for later anonymising

• Except: longitudinal studies - anonymise when data collection complete (linkages)

• Avoid blanking out information

• Use pseudonyms or codes

• Removing or aggregating identifiers in text can distort data, make them unusable and unreliable or misleading - avoid over-anonymising

• Consistency within research team and throughout project

• [bracket] replacements for clarity

• XML mark-up can be used for anonymisation (TEI tag)

<seg type="anonymised">word to be anonymised</seg>
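
Pseudonym replacement with either [bracket] marking or the TEI-style tag can be sketched as follows. The function name and the replacement table are hypothetical, and wrapping the substitute (rather than the original word) in the `<seg>` element is an assumption about how the mark-up is applied.

```python
import re
from xml.sax.saxutils import escape


def anonymise_text(text, replacements, tei=False):
    """Replace each original term with its marked substitute.

    replacements maps original term -> pseudonym or description.
    With tei=False substitutes are marked as [substitute]; with
    tei=True they are wrapped in <seg type="anonymised">...</seg>.
    """
    for original, substitute in replacements.items():
        if tei:
            marked = f'<seg type="anonymised">{escape(substitute)}</seg>'
        else:
            marked = f"[{substitute}]"
        # Whole-word match only, so "Amy" does not alter "Amyas".
        pattern = r"\b" + re.escape(original) + r"\b"
        # A function replacement avoids backslash interpretation in `marked`.
        text = re.sub(pattern, lambda _m, marked=marked: marked, text)
    return text
```

Using one shared replacement table for the whole project supports the consistency point above. Replacements are applied one after another, so a substitute should not itself contain a term that appears later in the table.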

Tips

• Always consider anonymisation together with consent agreements and user access restrictions

• Regulating / restricting user access may offer a better solution than anonymising

• Remove, mask, change identifiers

• Maintain maximum information

• Create log of all anonymisations

• Keep copy of original data

• Plan at start of research, not at the end

Example: Anonymisation log interview transcripts

Interview / Page    Original       Changed to
Int1
p1                  Spain          European
p1                  E-print Ltd    Printing
p2                  20th June      June
p2                  Amy            Moira
Int2
p1                  Francis        my friend
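
A log like the one above can be kept as a simple CSV file. The sketch below is one possible layout, with hypothetical column names; since the log contains the original identifiers, it would normally need the same protection as the original data.

```python
import csv

LOG_FIELDS = ["interview", "page", "original", "changed_to"]  # assumed columns


def log_rows(interview, page, changes):
    """Build log rows for one transcript page.

    changes is a list of (original, changed_to) pairs.
    """
    return [
        {"interview": interview, "page": page,
         "original": original, "changed_to": changed_to}
        for original, changed_to in changes
    ]


def write_log(rows, handle):
    """Write log rows as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

For example, `log_rows("Int1", "p1", [("Spain", "European")])` yields one row that `write_log` can append to the project's log file.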

Sources

• Clark, A. 2006. Anonymising research data. NCRM Working Paper Series 7/06. ESRC National Centre for Research Methods. [http://www.ncrm.ac.uk/research/outputs/publications/WorkingPapers/2006/0706_anonymising_research_data.pdf]

• Economic and Social Data Service (ESDS) guidelines, UK Data Archive

• Inter-University Consortium for Political and Social Research (ICPSR). 2005. Guide to Social Science Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle. 3rd Edition. ICPSR, Ann Arbor.

• Timescapes meetings & discussions

Exercises / scenarios

• Anonymising qualitative data:
– Foot & mouth study Cumbria 2001-2003 (5407)
– Conflicts and violence in prison (4596)

• Anonymising quantitative data: Labour Force Survey

• Confidential relational and geo-referenced data: British Household Panel Survey