Application of probabilistic linkage methods to join infectious disease surveillance records to death registrations T Lamagni, N Potz, D Powell, N Hinton, A Grant, E Sheridan, R Pebody Healthcare-Associated Infection & Antimicrobial Resistance Department
Application of Probabilistic Linkage Methods_Join Infectious Disease Surveillance Records-Death Registrations_PVERConf_May2011
PowerPoint Presentation from May 2011 Personal Validation and Entity Resolution Conference. Presenters: T. Lamagni, N. Potz, D. Powell, N. Hinton, A. Grant, E. Sheridan, R. Pebody. Presentation Title: Application of probabilistic linkage methods to join infectious disease surveillance records to death registrations
Transcript
Application of probabilistic linkage methods to join infectious disease surveillance records to death registrations
T Lamagni, N Potz, D Powell, N Hinton, A Grant, E Sheridan, R Pebody
Healthcare-Associated Infection & Antimicrobial Resistance Department
overview
1. data sharing between organisations
2. use of probabilistic linkage methods for study on infectious disease deaths
3. further uses of probabilistic linkage
4. summary and conclusions
data sharing between public bodies
• multitude of potential benefits to sharing of data between agencies, including:
1. accessing new information
2. reducing demands on suppliers of data
• Data Protection Act 1998 (UK) allows data sharing, depending on owner / recipient legal status
• Department for Constitutional Affairs ‘Information sharing vision statement’, 2006: “Government is committed to more information sharing between public sector organisations and service providers”
challenges of data sharing
ethical concerns
• disclosure of personal information between organisations raises concerns over potential erosion of rights to privacy
• in UK, data sharing is regulated through Information Commissioner’s Office
technical barriers
• size of datasets often very large
• datasets may lack common unique identifier
Collaborative project between Health Protection Agency and Office for National Statistics (2005-07).
research study on mortality associated with MRSA infection
aims of linkage study
• estimate case fatality following meticillin-resistant Staphylococcus aureus (MRSA) bacteraemia
• undertake analysis of death certification practice
• provide sampling frame for confidential investigation
objectives
1. develop mechanism to match death registrations to MRSA records
2. carry out an independent evaluation of method
3. use linked data in fulfilment of aims given above
matching death registrations to infection records 2004-05
Method needed to link datasets taking into account:
• lack of unique identifier in majority of infection records
• errors in patient identifiers
• size of datasets (MRSA: n=10,305; death registrations: n=1,153,221)
Variables available for matching:
variable           coding format                  completion of variable (%)
                                                  infection records   death registrations
NHS number         10-digit (validity checked)    29.6                99.9
Forename initial   Single letter (A–Z)            96.8                100
Surname Soundex    Letter + 3 digits              97.7                100
Sex                1 (male), 2 (female)           97.9                100
Date of birth      DD/MM/YYYY                     99.0                100
Postcode           Letter prefix only             51.4                99.8
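The coding formats in the table can be illustrated with a minimal Python sketch. It rests on two assumptions that the slides do not confirm: that the NHS number validity check is the standard Modulus 11 check digit, and that the surname code is the common American Soundex variant. The function names are hypothetical.

```python
import re

# Assumption: standard American Soundex letter-to-digit groups.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(surname: str) -> str:
    """Return a letter + 3 digit Soundex code, or "" if no letters are present."""
    letters = re.sub(r"[^A-Za-z]", "", surname).upper()
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]


def nhs_number_is_valid(value: str) -> bool:
    """Check a 10-digit NHS number against its Modulus 11 check digit."""
    digits = re.sub(r"\s", "", value)
    if not re.fullmatch(r"\d{10}", digits):
        return False
    # Weights 10 down to 2 are applied to the first nine digits.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    return check != 10 and check == int(digits[9])
```

Coding surnames this way lets records be compared on a spelling-tolerant key without exchanging the full surname.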
probabilistic matching method
Testing of matching undertaken using invasive Streptococcus pneumoniae infection (n=1252) to allow independent evaluation using NHS Central Register Tracing (patient surname needed for tracing).
[Figure: distribution of total weight of record pairs, divided into non-matches, query matches and good matches]
Method developed to link large volumes of data that contain errors and omissions using the cumulative value of information available.
Matching steps:
1. Acquisition of infection and mortality data
2. Pre-match preparation including blocking (to reduce computational demand) & weighting of matching variables
3. Match records (SQL Server) and calculate total weights for each record pair
4. Build linked file
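To show why blocking reduces computational demand, the sketch below pairs records only when they share a blocking key, instead of comparing every infection record with every death registration. The study matched records in SQL Server; this Python version, its field names (`id`, `soundex`, `yob`) and the toy records are illustrative assumptions only.

```python
from collections import defaultdict


def candidate_pairs(infections, deaths, key):
    """Pair each infection record with the death records sharing its block key."""
    index = defaultdict(list)
    for death in deaths:
        if death.get(key) is not None:
            index[death[key]].append(death["id"])
    pairs = set()
    for infection in infections:
        block = infection.get(key)
        if block is None:
            continue  # a missing key cannot be blocked on in this pass
        for death_id in index.get(block, []):
            pairs.add((infection["id"], death_id))
    return pairs


# Hypothetical toy records, not study data.
infections = [
    {"id": "i1", "soundex": "A112", "yob": 1941},
    {"id": "i2", "soundex": None, "yob": 1942},
]
deaths = [
    {"id": "d1", "soundex": "A112", "yob": 1941},
    {"id": "d2", "soundex": "A420", "yob": 1941},
    {"id": "d3", "soundex": "B200", "yob": 1942},
]

by_soundex = candidate_pairs(infections, deaths, "soundex")
by_yob = candidate_pairs(infections, deaths, "yob")
all_candidates = by_soundex | by_yob  # set union also removes duplicate pairs
```

Running two blocking passes means a record with a missing or miscoded Soundex can still be paired through year of birth, and vice versa.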
blocking and weighting variables
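[Diagram: infection data and mortality data are each blocked twice, once by SOUNDEX* (e.g. A1, A2, …) and once by year of birth (e.g. 1941, 1942, …); records are compared only within matching blocks, and each compared variable contributes a weight to the total for the record pair]
Example weights shown:
SOUNDEX* A112 vs A112 (match): +17.2
SOUNDEX* A112 vs A420 (non-match): -8.0
year of birth 1941 vs 1941 (match): +6.8
total weight = sum of the weights of all matching variables (+ …)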
weights are based on the likelihood of each value representing a true match
matching variables (e.g. patient identifiers) compared within each matched pair of records
* code based on surname
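The slides do not give the weight formula. A common choice in probabilistic linkage is the Fellegi-Sunter log-likelihood ratio, and the sketch below assumes that form: `m` is the probability that a variable agrees in a true match and `u` the probability that it agrees by chance. The SOUNDEX weights and the year of birth agreement weight are the example values from the slide; the year of birth disagreement weight and the rule that missing values contribute nothing are hypothetical.

```python
import math


def agreement_weight(m: float, u: float) -> float:
    """Weight added when a variable agrees (assumed Fellegi-Sunter form)."""
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    """Weight added (usually negative) when a variable disagrees."""
    return math.log2((1 - m) / (1 - u))


def total_weight(infection, death, weights):
    """Sum agreement/disagreement weights over the variables present in both records."""
    total = 0.0
    for field, (agree, disagree) in weights.items():
        a, b = infection.get(field), death.get(field)
        if a is None or b is None:
            continue  # assumption: a missing value contributes no weight
        total += agree if a == b else disagree
    return total


EXAMPLE_WEIGHTS = {
    "soundex": (17.2, -8.0),  # both values taken from the slide
    "yob": (6.8, -5.0),       # +6.8 from the slide; -5.0 is a hypothetical placeholder
}

same_person = total_weight(
    {"soundex": "A112", "yob": 1941}, {"soundex": "A112", "yob": 1941}, EXAMPLE_WEIGHTS
)
different_surname = total_weight(
    {"soundex": "A112", "yob": 1941}, {"soundex": "A420", "yob": 1941}, EXAMPLE_WEIGHTS
)
```

Under this form, rare values that agree (small `u`) earn larger weights than common ones, which is one way weights can reflect the likelihood that each value represents a true match.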
post-matching stages
1. matched record pairs from SOUNDEX blocking and matched record pairs from year of birth blocking
2. merge and de-duplicate
3. set threshold for auto accept/reject
4. manually check pairs in ‘grey zone’
5. final matched dataset
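The post-matching stages can be sketched as follows: scored pairs from the two blocking passes are merged, duplicates are dropped, and each pair is auto-accepted, auto-rejected or sent for manual checking. The threshold values, pair identifiers and function names are hypothetical; the slides do not state the thresholds used.

```python
def merge_passes(*passes):
    """Merge {pair: total_weight} dicts; a pair found in both passes is kept once."""
    merged = {}
    for scored in passes:
        merged.update(scored)
    return merged


def classify(weight, lower, upper):
    """Auto-accept at or above `upper`, auto-reject below `lower`, else manual review."""
    if weight >= upper:
        return "accept"
    if weight < lower:
        return "reject"
    return "review"


# Hypothetical scored pairs from each blocking pass, not study data.
from_soundex = {("i1", "d1"): 24.0, ("i3", "d7"): 9.5}
from_yob = {("i1", "d1"): 24.0, ("i1", "d2"): -1.2}

merged = merge_passes(from_soundex, from_yob)
LOWER, UPPER = 5.0, 15.0  # hypothetical thresholds bounding the 'grey zone'
decisions = {pair: classify(w, LOWER, UPPER) for pair, w in merged.items()}
```

Setting `LOWER` equal to `UPPER` removes the grey zone and gives a single accept/reject threshold, for uses where absolute certainty of a match is not needed.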
evaluation of probabilistic matching vs NHS Central Register Tracing
Potz N et al. Probabilistic record linkage of infection records and death registrations: a tool to strengthen surveillance. Stat Commun Infect Dis 2010; 2(1):article 6.
[Figure: probability of true match according to distribution of total weight scores, with the manual checking zone marked]
Potz N et al. Probabilistic record linkage of infection records and death registrations: a tool to strengthen surveillance. Stat Commun Infect Dis 2010; 2(1):article 6.
evaluation of probabilistic matching vs NHS Central Register Tracing
                                NHS CR Tracing
Probabilistic record linkage    Traced Dead   Traced Not dead   Not traced   Total
Matched to a death record       465           1                 10           476
Not matched to a death record   15            692               60           767
Total                           480           693               70           1243

+ve predictive value 97.7% (465/476) to 99.8% (465/466)
-ve predictive value 90.2% (692/767) to 97.9% (692/707)
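The predictive value ranges can be reproduced from the table counts. Judging by the denominators, the lower bound of each range counts untraced patients as linkage errors and the upper bound excludes them; that reading is an inference from the arithmetic, not something stated on the slide. The function name is hypothetical.

```python
def predictive_value_range(correct, row_total, not_traced):
    """Return (lower %, upper %): untraced counted as errors, then excluded."""
    lower = 100 * correct / row_total
    upper = 100 * correct / (row_total - not_traced)
    return round(lower, 1), round(upper, 1)


# Row "Matched to a death record": 465 traced dead, 476 in total, 10 not traced.
ppv_range = predictive_value_range(correct=465, row_total=476, not_traced=10)

# Row "Not matched to a death record": 692 traced not dead, 767 in total, 60 not traced.
npv_range = predictive_value_range(correct=692, row_total=767, not_traced=60)
```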
Potz N et al. Probabilistic record linkage of infection records and death registrations: a tool to strengthen surveillance. Stat Commun Infect Dis 2010; 2(1):article 6.
interval between diagnosis of MRSA bacteraemia and death England 2004-5
30 day case fatality rate = 38%
7 day case fatality rate = 20%
Lamagni TL, et al. Mortality in patients with MRSA bacteraemia, England 2004-05. J Hosp Infect 2011;77:16-20.
Kaplan-Meier time to death following invasive S. pyogenes infection England & Wales 2003-04
Lamagni TL et al. Predictors of death after severe Streptococcus pyogenes infection. Emerg Infect Dis 2009;15(8):1304-7.
further application of probabilistic linkage
• De-duplication of routine surveillance data: new probabilistic matching system implemented in July 2009
• Linkage to other health datasets: surveillance data linkage to external health datasets to augment routine monitoring / provide platform for research (Hospital Episode Statistics, clinical patient networks, primary care surveillance), e.g. project linking patients on UK Renal Registry (all patients undergoing renal dialysis) to bacteraemia surveillance data to identify risk factors and impact on mortality
summary & conclusions
• probabilistic linkage offers a viable technique to link ‘difficult’ datasets
• method can be amended depending on intended use, e.g. use of single threshold to accept/reject matches where absolute certainty of match not needed
• data sharing between health sector organisations is providing unique opportunities for public health research (powerful studies at relatively low cost + pursuit of novel research questions through access to new information)
• ensuring public trust and confidence in security of data and demonstrating public benefit essential
acknowledgements
Study Team: Nicki Potz, Senior Scientist; David Powell, Database Manager; David Bridger, Research Nurse
Additional members of the Project Board: Andrew Chronias, HPA; Clare Griffiths, Office for National Statistics (ONS); Nourieh Hoveyda, ONS; Cleo Rooney, ONS; Levin Wheller, ONS; Jennie Wilson, HPA; Richard Pebody, ONS/HPA
Steering Group: Georgia Duckworth, HPA; Joy Dobbs, ONS; Peter Goldblatt, ONS; Andrew Phillips, University College London; Sarah Scobie, National Patient Safety Agency; Robert Spencer, Hospital Infection Society
Funders: Department of Health for England
Enhanced S. pneumoniae surveillance provided courtesy of HPA Respiratory and Systemic Infection Laboratory and HPA Immunisation Department
We thank our microbiology colleagues in laboratories across the UK for their continued reporting of infectious diseases.