Application of probabilistic linkage methods to join infectious disease surveillance records to death registrations

T Lamagni, N Potz, D Powell, N Hinton, A Grant, E Sheridan, R Pebody

Healthcare-Associated Infection & Antimicrobial Resistance Department

Application of Probabilistic Linkage Methods_Join Infectious Disease Surveillance Records-Death Registrations_PVERConf_May2011

May 11, 2015



PowerPoint Presentation from May 2011 Personal Validation and Entity Resolution Conference. Presenters: T. Lamagni, N. Potz, D. Powell, N. Hinton, A. Grant, E. Sheridan, R. Pebody. Presentation Title: Application of probabilistic linkage methods to join infectious disease surveillance records to death registrations
Transcript
Page 1

Application of probabilistic linkage methods to join infectious disease surveillance records to death registrations

T Lamagni, N Potz, D Powell, N Hinton, A Grant, E Sheridan, R Pebody

Healthcare-Associated Infection & Antimicrobial Resistance Department

Page 2

overview

1. data sharing between organisations

2. use of probabilistic linkage methods for study on infectious disease deaths

3. further uses of probabilistic linkage

4. summary and conclusions

Page 3

data sharing between public bodies

• many potential benefits to sharing data between agencies, including:

1. accessing new information

2. reducing demands on suppliers of data

• the Data Protection Act 1998 (UK) allows data sharing, depending on the legal status of the data owner and recipient

• Department for Constitutional Affairs, ‘Information sharing vision statement’, 2006:

“Government is committed to more information sharing between public sector organisations and service providers”

Page 4

challenges of data sharing

ethical concerns

• disclosure of personal information between organisations raises concerns over potential erosion of rights to privacy

• in the UK, data sharing is regulated by the Information Commissioner’s Office

technical barriers

• datasets are often very large

• datasets may lack a common unique identifier

Page 5

Collaborative project between the Health Protection Agency (HPA) and the Office for National Statistics (ONS), 2005-07.

research study on mortality associated with MRSA infection

aims of linkage study

• estimate case fatality following meticillin-resistant Staphylococcus aureus (MRSA) bacteraemia

• undertake analysis of death certification practice

• provide sampling frame for confidential investigation

objectives

1. develop mechanism to match death registrations to MRSA records

2. carry out an independent evaluation of method

3. use linked data in fulfilment of aims given above

Page 6

matching death registrations to infection records, 2004-05

A method was needed to link the datasets, taking into account:

• lack of a unique identifier in the majority of infection records

• errors in patient identifiers

• size of datasets (MRSA: n=10,305; death registrations: n=1,153,221)

Variables available for matching:

variable | coding format | completion in infection records (%) | completion in death registrations (%)
NHS number | 10-digit (validity checked) | 29.6 | 99.9
Forename initial | single letter (A–Z) | 96.8 | 100
Surname Soundex | letter + 3 digits | 97.7 | 100
Sex | 1 (male), 2 (female) | 97.9 | 100
Date of birth | DD/MM/YYYY | 99.0 | 100
Postcode | letter prefix only | 51.4 | 99.8
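The two derived identifiers in this table can be sketched as follows. This is a minimal Python illustration that assumes the standard American Soundex rules and the NHS number modulus 11 check-digit scheme. The study does not say which Soundex variant or validation routine it actually used, and the function names `soundex` and `nhs_number_is_valid` are hypothetical.

```python
# Standard American Soundex digit groups (an assumption: other variants exist).
_SOUNDEX_DIGITS = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for letter in letters
}


def soundex(surname: str) -> str:
    """Encode a surname as a letter followed by three digits, e.g. 'R163'."""
    letters = [c for c in surname.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    previous = _SOUNDEX_DIGITS.get(letters[0], "")
    digits = []
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same code
        code = _SOUNDEX_DIGITS.get(letter, "")  # vowels and Y map to ""
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets this, so a repeated code after it is kept
    return (letters[0] + "".join(digits) + "000")[:4]


def nhs_number_is_valid(nhs_number: str) -> bool:
    """Check a 10-digit NHS number against its modulus 11 check digit."""
    digits = nhs_number.replace(" ", "")
    if len(digits) != 10 or any(c not in "0123456789" for c in digits):
        return False
    # Weights 10, 9, ..., 2 applied to the first nine digits.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - total % 11
    if check == 11:
        check = 0
    if check == 10:
        return False  # no valid check digit exists for this number
    return check == int(digits[9])


print(soundex("Robert"), soundex("Ashcraft"))  # R163 A261
print(nhs_number_is_valid("943 476 5919"))     # True
```

Soundex gives similar-sounding surnames the same code (Robert and Rupert both encode to R163), which makes it tolerant of some spelling errors in the surname field.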

Page 7

probabilistic matching method

The matching was tested on invasive Streptococcus pneumoniae infection records (n=1252), which allowed independent evaluation against NHS Central Register Tracing (patient surname needed for tracing).

[Figure: total weight of record pair, with pairs classified as good matches, query matches or non-matches]

The method was developed to link large volumes of data containing errors and omissions by using the cumulative value of the information available.

Matching steps:

1. Acquisition of infection and mortality data

2. Pre-match preparation, including blocking (to reduce computational demand) & weighting of matching variables

3. Match records (SQL Server) and calculate total weights for each record pair

4. Build linked file
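Blocking from step 2 can be illustrated with a small sketch. Records are compared only when they share a blocking key, so the number of comparisons stays far below the full 10,305 × 1,153,221 cross product. The record layout, field names and helper functions below are hypothetical. The full Soundex code is used as the key for simplicity, although the slides show blocks such as A1 and A2 that may be coarser.

```python
from collections import defaultdict


def candidate_pairs(infection, deaths, block_key):
    """Yield (infection_id, death_id) pairs whose blocking keys agree."""
    # Index the larger file (death registrations) by blocking key once.
    index = defaultdict(list)
    for death in deaths:
        index[block_key(death)].append(death["id"])
    # Each infection record is compared only with deaths in the same block.
    for rec in infection:
        for death_id in index.get(block_key(rec), []):
            yield rec["id"], death_id


def soundex_key(rec):
    return rec["soundex"]


def year_of_birth_key(rec):
    return rec["dob"][-4:]  # DD/MM/YYYY -> YYYY


# Hypothetical toy records; field names are illustrative only.
infection = [
    {"id": "i1", "soundex": "A112", "dob": "03/05/1941"},
    {"id": "i2", "soundex": "B200", "dob": "11/02/1950"},
]
deaths = [
    {"id": "d1", "soundex": "A112", "dob": "03/05/1941"},
    {"id": "d2", "soundex": "A420", "dob": "21/09/1941"},
    {"id": "d3", "soundex": "C300", "dob": "01/01/1960"},
]

by_soundex = set(candidate_pairs(infection, deaths, soundex_key))
by_year = set(candidate_pairs(infection, deaths, year_of_birth_key))
print(sorted(by_soundex))  # [('i1', 'd1')]
print(sorted(by_year))     # [('i1', 'd1'), ('i1', 'd2')]
```

Running two independent blocking passes means that a true match with an error in one blocking variable (such as a misspelt surname) can still be found by the other pass.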

Page 8

blocking and weighting variables

[Figure: infection data and mortality data are each blocked by SOUNDEX* (blocks A1, A2, …) and, separately, by year of birth (blocks 1941, 1942, …); records are paired only within matching blocks]

Matching variables (e.g. patient identifiers) are compared within each matched pair of records, and each comparison is given a weight:

match | weight
SOUNDEX* A112 vs A112 | +17.2
SOUNDEX* A112 vs A420 | -8.0
year of birth 1941 vs 1941 | +6.8

The weights for the matched SOUNDEX*, matched year of birth and the other variables are summed (+ …) to give the total weight of the record pair.

Weights are based on the likelihood of each value representing a true match.

* code based on surname
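The weighting can be sketched with the Fellegi-Sunter log-likelihood-ratio formulation, a standard approach to probabilistic linkage. The slides do not confirm that the study used this exact formula. For each variable, agreement scores log2(m/u) and disagreement scores log2((1-m)/(1-u)). Here m is the probability of agreement among true matches and u is the probability of agreement among non-matches. The m and u values below are hypothetical. This sketch also uses one weight per variable, whereas the slide's figures (e.g. +17.2 for SOUNDEX A112) appear to be value-specific.

```python
import math

# Hypothetical m-probabilities (P(agree | true match)) and
# u-probabilities (P(agree | non-match)); illustrative values only.
M_U = {
    "soundex": (0.95, 0.01),
    "forename_initial": (0.97, 0.06),
    "sex": (0.98, 0.50),
    "year_of_birth": (0.96, 0.02),
}


def field_weight(a, b, m, u):
    """Log2 likelihood-ratio weight for one matching variable."""
    if a is None or b is None:
        return 0.0  # missing value: no evidence either way (an assumption)
    if a == b:
        return math.log2(m / u)          # agreement: positive weight
    return math.log2((1 - m) / (1 - u))  # disagreement: negative weight


def total_weight(rec_a, rec_b):
    """Sum the per-variable weights to give the record pair's total weight."""
    return sum(
        field_weight(rec_a.get(name), rec_b.get(name), m, u)
        for name, (m, u) in M_U.items()
    )


infection_rec = {"soundex": "A112", "forename_initial": "J", "sex": 1, "year_of_birth": 1941}
death_rec = {"soundex": "A112", "forename_initial": "J", "sex": 1, "year_of_birth": 1941}
other_death = {"soundex": "A420", "forename_initial": "M", "sex": 2, "year_of_birth": 1941}
print(round(total_weight(infection_rec, death_rec), 1))
print(round(total_weight(infection_rec, other_death), 1))
```

A missing value, such as the 48.6% of infection records without a postcode, adds nothing to the total. The pair is then judged on the remaining evidence, which is how the cumulative value of the available information is used.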

Page 9

post-matching stages

Inputs: matched record pairs from SOUNDEX blocking and matched record pairs from year of birth blocking

1. merge and de-duplicate

2. set threshold for auto accept/reject

3. manually check pairs in ‘grey zone’

Output: final matched dataset
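The merge, de-duplication and threshold stages can be sketched as follows. The sketch assumes that each blocking pass produces (infection ID, death ID, total weight) tuples. The thresholds 5.0 and 15.0 are hypothetical placeholders, not values from the study.

```python
def merge_and_classify(pass_a, pass_b, lower, upper):
    """Merge scored pairs from two blocking passes, drop duplicates, apply thresholds.

    Each pass is an iterable of (infection_id, death_id, total_weight).
    Pairs with weight >= upper are accepted, pairs with weight < lower are
    rejected, and anything in between goes to the grey zone for manual review.
    """
    merged = {}
    for inf_id, death_id, weight in [*pass_a, *pass_b]:
        key = (inf_id, death_id)
        # A pair found by both blocking passes is kept only once.
        merged[key] = max(weight, merged.get(key, float("-inf")))
    accepted, grey_zone, rejected = [], [], []
    for pair, weight in sorted(merged.items()):
        if weight >= upper:
            accepted.append(pair)
        elif weight < lower:
            rejected.append(pair)
        else:
            grey_zone.append(pair)
    return accepted, grey_zone, rejected


soundex_pass = [("i1", "d1", 24.0), ("i2", "d5", 3.1)]
year_pass = [("i1", "d1", 24.0), ("i1", "d2", -1.2), ("i3", "d7", 9.5)]
accepted, grey_zone, rejected = merge_and_classify(
    soundex_pass, year_pass, lower=5.0, upper=15.0
)
print(accepted)   # [('i1', 'd1')]
print(grey_zone)  # [('i3', 'd7')]
print(rejected)   # [('i1', 'd2'), ('i2', 'd5')]
```

Setting `lower` equal to `upper` removes the grey zone, which gives the single accept/reject threshold described in the conclusions. The sketch does not enforce one death registration per infection record. A real linked file would also need that check.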

Page 10

evaluation of probabilistic matching vs NHS Central Register Tracing

Potz N et al. Probabilistic record linkage of infection records and death registrations: a tool to strengthen surveillance. Stat Commun Infect Dis 2010; 2(1):article 6.

[Figure: probabilistic matching compared with NHS Central Register Tracing, with the manual checking zone marked]

Page 11

probability of true match according to distribution of total weight scores

Potz N et al. Probabilistic record linkage of infection records and death registrations: a tool to strengthen surveillance. Stat Commun Infect Dis 2010; 2(1):article 6.

Page 12

evaluation of probabilistic matching vs NHS Central Register Tracing

probabilistic record linkage | NHS CR Tracing: traced, dead | NHS CR Tracing: traced, not dead | NHS CR Tracing: not traced | total
matched to a death record | 465 | 1 | 10 | 476
not matched to a death record | 15 | 692 | 60 | 767
total | 480 | 693 | 70 | 1243

+ve predictive value 97.7% (465/476) to 99.8% (465/466)

-ve predictive value 90.2% (692/767) to 97.9% (692/707)
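The predictive value ranges follow directly from the table. The figures are consistent with a lower bound that counts untraced records as disagreements and an upper bound that excludes them. This reading is inferred from the arithmetic and is not stated on the slide. The function name is hypothetical.

```python
# Cross-tabulation from the slide: rows = linkage result, columns = NHS CR Tracing.
matched = {"dead": 465, "not_dead": 1, "not_traced": 10}
unmatched = {"dead": 15, "not_dead": 692, "not_traced": 60}


def predictive_value_bounds(row, agree_key):
    """Return (lower, upper) predictive values for one row of the table."""
    agree = row[agree_key]
    traced = row["dead"] + row["not_dead"]
    lower = agree / sum(row.values())  # untraced records counted as disagreements
    upper = agree / traced             # untraced records excluded
    return lower, upper


ppv = predictive_value_bounds(matched, "dead")
npv = predictive_value_bounds(unmatched, "not_dead")
print(f"PPV {ppv[0]:.1%} to {ppv[1]:.1%}")  # PPV 97.7% to 99.8%
print(f"NPV {npv[0]:.1%} to {npv[1]:.1%}")  # NPV 90.2% to 97.9%
```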

Potz N et al. Probabilistic record linkage of infection records and death registrations: a tool to strengthen surveillance. Stat Commun Infect Dis 2010; 2(1):article 6.

Page 13

interval between diagnosis of MRSA bacteraemia and death, England 2004-05

30-day case fatality rate = 38%

7-day case fatality rate = 20%
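Once infection records are linked to death registrations, an N-day case fatality rate is the proportion of cases with a linked death within N days of diagnosis. The sketch below uses hypothetical records. It assumes that deaths are counted from day 0 to day N inclusive, because the exact definition used in the study is not given here.

```python
from datetime import date


def case_fatality(cases, days):
    """Proportion of cases whose linked death fell within `days` days of diagnosis."""
    deaths = sum(
        1
        for diagnosed, died in cases
        if died is not None and 0 <= (died - diagnosed).days <= days
    )
    return deaths / len(cases)


# Hypothetical linked records: (diagnosis date, death date or None if no linked death)
cases = [
    (date(2004, 1, 1), date(2004, 1, 5)),   # died on day 4
    (date(2004, 2, 1), date(2004, 2, 20)),  # died on day 19
    (date(2004, 3, 1), None),               # no linked death registration
    (date(2004, 4, 1), date(2004, 6, 1)),   # died on day 61
]
print(case_fatality(cases, 7))   # 0.25
print(case_fatality(cases, 30))  # 0.5
```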

Lamagni TL, et al. Mortality in patients with MRSA bacteraemia, England 2004-05. J Hosp Infect 2011;77:16-20.

Page 14

Kaplan-Meier time to death following invasive S. pyogenes infection England & Wales 2003-04

Lamagni TL et al. Predictors of death after severe Streptococcus pyogenes infection. Emerg Infect Dis 2009;15(8):1304-7.
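Time-to-death curves like this one are commonly produced with the Kaplan-Meier product-limit estimator, sketched below on hypothetical follow-up data. The sketch treats patients without a linked death as censored at the end of follow-up. That setup is an assumption about how such an analysis would typically work and is not taken from the slides or the cited paper.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient (e.g. days from diagnosis)
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all patients sharing this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censorings at t leave the risk set after t.
        at_risk -= deaths + censored
    return curve


# Hypothetical follow-up (days) for five patients; event 0 = censored
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
```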

Page 15

further application of probabilistic linkage

De-duplication of routine surveillance data: a new probabilistic matching system was implemented in July 2009

Linkage to other health datasets: surveillance data are linked to external health datasets to augment routine monitoring and provide a platform for research (Hospital Episode Statistics, clinical patient networks, primary care surveillance)

e.g. project linking patients on UK Renal Registry (all patients undergoing renal dialysis) to bacteraemia surveillance data to identify risk factors and impact on mortality

Page 16

summary & conclusions

• probabilistic linkage offers a viable technique to link ‘difficult’ datasets

• the method can be amended depending on intended use, e.g. a single threshold to accept/reject matches can be used where absolute certainty of a match is not needed

• data sharing between health sector organisations is providing unique opportunities for public health research (powerful studies at relatively low cost + pursuit of novel research questions through access to new information)

• ensuring public trust and confidence in the security of data, and demonstrating public benefit, are essential

Page 17

acknowledgements

Study Team: Nicki Potz, Senior Scientist; David Powell, Database Manager; David Bridger, Research Nurse

Additional members of the Project Board: Andrew Chronias, HPA; Clare Griffiths, Office for National Statistics (ONS); Nourieh Hoveyda, ONS; Cleo Rooney, ONS; Levin Wheller, ONS; Jennie Wilson, HPA; Richard Pebody, ONS/HPA

Steering Group: Georgia Duckworth, HPA; Joy Dobbs, ONS; Peter Goldblatt, ONS; Andrew Phillips, University College London; Sarah Scobie, National Patient Safety Agency; Robert Spencer, Hospital Infection Society.

Funders: Department of Health for England

Enhanced S. pneumoniae surveillance provided courtesy of the HPA Respiratory and Systemic Infection Laboratory and the HPA Immunisation Department

We thank our microbiology colleagues in laboratories across the UK for their continued reporting of infectious diseases.