Data re-use in the CALIBER programme. Anoop Shah ([email protected]), Clinical Epidemiology Group, University College London, 14th November 2013

Data re-use in the CALIBER programme

May 11, 2015

Technology

Ian Timaeus

An overview of work to make research data easier to manage, analyse and use in the CALIBER programme. Presentation given by Anoop Shah of UCL at the Data Management in Practice workshop, held on 14th November 2013 at the London School of Hygiene and Tropical Medicine.
Transcript
Page 1: Data re-use in the CALIBER programme

Data re-use in the CALIBER programme

Anoop Shah ([email protected])

Clinical Epidemiology Group, University College London

14th November 2013

Page 2: Data re-use in the CALIBER programme

1 The CALIBER programme

2 Why make research data re-usable?

3 The CALIBER approach

4 Summary

Page 3: Data re-use in the CALIBER programme

The CALIBER programme

UCL & LSHTM collaboration

[Figure: data sources feeding the CALIBER linked research database: Hospital Episode Statistics, MINAP registry, General practice, Death registrations]

Funded by NIHR and Wellcome Trust

Page 4: Data re-use in the CALIBER programme

CALIBER data

Page 5: Data re-use in the CALIBER programme

Defining continuous variables

Clinical, e.g. blood pressure; laboratory, e.g. white cell count

• Recorded in CPRD (primary care)
• Identified by ‘entity code’ and medcode (more granular)
• Lab data now electronically transferred
• Problems:
  – Missing units
  – Erroneous values
  – Inconsistent recording
  – Missing data

Page 6: Data re-use in the CALIBER programme

Medcodes associated with a test result

Example: neutrophil counts (a type of white blood cell), which may be absolute or percentage

Medcode   Percent   Term
18        89.6      Neutrophil count
17622     9.9       Percentage neutrophils
23114     0.3       Granulocyte count
23115     0.1       Percentage granulocytes
13777     0.1       Neutrophil count NOS
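The medcode table above can be held as a small lookup so that each result is labelled before analysis. This is a minimal sketch: the medcodes and terms come from the table, but the "absolute" / "percentage" labels are inferred from the wording of each term, which is an assumption, and the function name is hypothetical.

```python
# Medcodes and terms are copied from the table above. The kind labels
# ("absolute" / "percentage") are inferred from the term wording (assumption).
NEUTROPHIL_MEDCODES = {
    18: ("Neutrophil count", "absolute"),
    17622: ("Percentage neutrophils", "percentage"),
    23114: ("Granulocyte count", "absolute"),
    23115: ("Percentage granulocytes", "percentage"),
    13777: ("Neutrophil count NOS", "absolute"),
}


def result_kind(medcode):
    """Return 'absolute', 'percentage', or None for a medcode not in the list."""
    entry = NEUTROPHIL_MEDCODES.get(medcode)
    return entry[1] if entry else None
```

A label derived from the medcode alone is only a starting point; the recorded units and the distribution of values still need checking.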

Page 7: Data re-use in the CALIBER programme

Distribution of values for different units

Page 8: Data re-use in the CALIBER programme

Most common units

Page 9: Data re-use in the CALIBER programme

Analysis issues

• Extraction algorithm
• Remove biologically implausible extreme values
  – In a huge dataset with no restriction on possible values, there will be some errors
• Standardise units
• Decide how to analyse
  – Timing, e.g. relative to index date
  – Repeat measures
  – Transformation, splines, categories etc.
  – Missing data (e.g. multiple imputation)
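The unit standardisation and implausible-value steps listed above might look like the following. This is a sketch under stated assumptions: the unit strings, the target unit (10^9 cells/L) and the plausible range are all hypothetical illustrations, not values used by CALIBER and not clinical guidance.

```python
# Conversion factors to 10^9 cells/L. The unit strings are hypothetical labels.
TO_10E9_PER_L = {
    "10^9/L": 1.0,
    "10^6/L": 0.001,
}

# Illustrative limits only; a real study would set these with clinical input.
PLAUSIBLE_RANGE = (0.0, 100.0)


def clean_value(value, unit):
    """Return the value in 10^9/L, or None if it cannot be used.

    A result is rejected when the value is missing, the unit is missing or
    unknown, or the standardised value lies outside PLAUSIBLE_RANGE.
    """
    factor = TO_10E9_PER_L.get(unit)
    if value is None or factor is None:
        return None
    standardised = value * factor
    low, high = PLAUSIBLE_RANGE
    if not (low <= standardised <= high):
        return None
    return standardised
```

Returning None rather than dropping the row keeps the decision about missing data (for example, multiple imputation) separate from the cleaning step.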

Page 10: Data re-use in the CALIBER programme

Observation time in GP practice

• Observation time: when registered at GP practice
• Practice ‘up to standard date’: date after which we expect that data are recorded
• If nothing recorded while registered at GP:
  – Patient may be abroad
  – Patient may be genuinely healthy
• Excluding observation time with no records risks bias
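One way to read the points above is that observation time starts at the later of the registration date and the practice's up to standard date, and does not depend on whether any records exist. The sketch below assumes that reading; the function and parameter names are hypothetical.

```python
from datetime import date


def observation_window(registration_start, registration_end, up_to_standard):
    """Return (start, end) of observation time, or None if there is none.

    Start is the later of registration start and the practice's up to
    standard date. The window is defined by registration alone, so periods
    with no recorded events are kept.
    """
    start = max(registration_start, up_to_standard)
    if start > registration_end:
        return None
    return start, registration_end
```

For example, a patient registered from 2000-01-01 to 2010-12-31 at a practice that was up to standard from 2003-06-01 would be observed from 2003-06-01 to 2010-12-31.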

Page 11: Data re-use in the CALIBER programme

Defining a diagnosis, e.g. atrial fibrillation

Page 12: Data re-use in the CALIBER programme

Defining a diagnosis

• Cross-map against different datasets
• Individual data sources may miss cases, so consider using linked datasets
  – Important for accurate measures of incidence
  – May be less important for associations between disease and risk factor, as long as the risk factor does not influence recording

Page 13: Data re-use in the CALIBER programme

Non-fatal myocardial infarction: all sources miss cases

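[Figure: Venn diagram of non-fatal myocardial infarction cases recorded in primary care (CPRD), the MINAP disease registry and Hospital Episode Statistics. Region percentages as extracted: 8%, 6%, 7%, 20%, 18%, 10%]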
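The kind of overlap shown in this figure can be tabulated from linked data by counting, for each patient, which sources recorded the event. The sketch below assumes each source has already been reduced to a set of patient identifiers; the function name and the example identifiers are hypothetical.

```python
def source_overlap(sources):
    """Count patients by the exact combination of sources that recorded them.

    sources: dict mapping a source name to a set of patient identifiers.
    Returns a dict mapping frozenset(source names) to a patient count.
    """
    all_ids = set().union(*sources.values())
    counts = {}
    for patient_id in all_ids:
        recorded_in = frozenset(
            name for name, ids in sources.items() if patient_id in ids
        )
        counts[recorded_in] = counts.get(recorded_in, 0) + 1
    return counts


# Hypothetical identifiers, for illustration only.
example = source_overlap({
    "cprd": {1, 2, 3},
    "minap": {2, 3, 4},
    "hes": {3, 5},
})
```

Dividing each count by the total number of patients gives the percentage for each region of the diagram.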

Page 14: Data re-use in the CALIBER programme

Motivations for re-using data

• Time taken to prepare data and define variables
• Cost
• Different definitions used by different groups
• Lack of transparency and reproducibility

Page 15: Data re-use in the CALIBER programme

Possible approaches

• Ad hoc sharing of codelists and algorithms within a group
• Publish codelists and algorithms with papers
• The CALIBER approach
  – Repository of codelists and algorithms
  – Web portal for researcher access

Page 16: Data re-use in the CALIBER programme

CALIBER ‘LEGO’ data access model

1001, 2000-01-01, 23,1,NULL,I48
1001, 1994-08-11,1234,1,3,7L1H300
1001, 1993-01-01, 253,1,1,793Mz00
1231, 2012-03-03, 23,1,123,K65
1121, 2013-05-04, 7,1,3,5,14AN.00
1121, 2011-05-21, 81,1,9, G573100
1511, 1993-01-11, 91,1,6,9hF1.00
1511, 199-03-11, 91,1,6, G573100
9913, 2012-05-21, 81,1,9, G573100
67222, 1994-11-01,1234,1,3,7L1H300
67222, 1995-12-21,1234,1,3,7L1H300
67222, 1991-03-03,1234,1,3,7L1H310
682444, 1993-01-01, 253,1,1,793Mz00

1001, 2000-01-01, af_gprd=1
1231, 2012-03-03, af_hes=3
1121, 2013-05-04, af_procs_gprd=1
1511, 1993-01-11, heart_valve_gprd=2
9913, 2012-05-21, af_hes=1
67222, 1994-08-11, af_hes=1
682444, 1993-01-01, heart_valve_hes=2

af=1, af_diag_date=2001-12-01
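The slide appears to show three layers: raw coded records, source-specific variables derived from codelists, and a final research variable. The sketch below illustrates that layering under stated assumptions: the record layout (patient, date, source, code), the codelist contents and the function names are all hypothetical, and the rule of taking the earliest date is an assumption rather than the documented CALIBER algorithm.

```python
from datetime import date

# Hypothetical codelists: (source, code) -> intermediate variable name.
CODELISTS = {
    ("hes", "I48"): "af_hes",
    ("gprd", "G573100"): "af_gprd",
}


def derive_variables(records):
    """Map raw (patient_id, event_date, source, code) rows to
    (patient_id, event_date, variable) rows, dropping unlisted codes."""
    derived = []
    for patient_id, event_date, source, code in records:
        variable = CODELISTS.get((source, code))
        if variable is not None:
            derived.append((patient_id, event_date, variable))
    return derived


def af_phenotype(derived, patient_id):
    """Combine the af_* intermediate variables for one patient into a flag
    and the earliest recorded date (assumed rule)."""
    dates = [
        event_date
        for pid, event_date, variable in derived
        if pid == patient_id and variable.startswith("af_")
    ]
    if not dates:
        return {"af": 0, "af_diag_date": None}
    return {"af": 1, "af_diag_date": min(dates)}


# Hypothetical rows, for illustration only.
raw = [
    (1001, date(2000, 1, 1), "hes", "I48"),
    (1001, date(1998, 5, 1), "gprd", "G573100"),
    (1001, date(1993, 1, 1), "gprd", "793Mz00"),
    (2002, date(2005, 3, 3), "gprd", "793Mz00"),
]
derived = derive_variables(raw)
```

Keeping the intermediate layer separate means a corrected codelist changes the derived variables without touching either the raw records or the combining rule.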

Page 17: Data re-use in the CALIBER programme

CALIBER phenotypes (research variables)

• Consistent definitions for multiple studies (over 300 variables curated)
• Read, ICD-9, ICD-10, OPCS codelists
• Web portal to view variable definitions, and registered users can view codelists (https://www.caliberresearch.org/portal)
• Future: able to download scripts (e.g. Stata, R, SQL)

Page 18: Data re-use in the CALIBER programme

CALIBER data portal

Page 19: Data re-use in the CALIBER programme

Open data

Page 20: Data re-use in the CALIBER programme

CALIBER data portal

• Encourage researchers to define variables in a way that will be of use to others
• Final validated versions of codelists and variables
• Review by clinician and researcher

Page 21: Data re-use in the CALIBER programme

CALIBER analysis software

• R packages for managing codelists and data preparation (http://caliberanalysis.r-forge.r-project.org/)
• Lookup tables and data dictionaries
• Functions to simplify / automate common steps in data preparation

Page 22: Data re-use in the CALIBER programme

CALIBER expects researchers to contribute to the resource

[Figure: flowchart of the CALIBER research workflow. Labels as extracted: Research coordinator; Website content; Analysis; Publication; Impacts; Investigators, Non-investigators, Industry; Experienced, Non-experienced; Website form; Approvals; Data; Unified data access form; LEGO data access model; Contribute phenotyping algorithms, linkages; Project feasibility and prioritization; Open access; Contribute to knowledge base; Advancement of knowledge; Translation; Legislation, policy, guidelines; Economic benefit, industry]

Page 23: Data re-use in the CALIBER programme

Difficulties encountered

• Setting up the data portal takes time, needs dedicated staff
• Researchers need to think outside their own project
• Variables are updated / corrected; need to store different versions
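The need to store different versions of a variable could be met in many ways; one minimal illustration is an append-only store where each published codelist gets the next version number. The class and method names below are hypothetical and are not taken from the CALIBER portal or its R packages.

```python
class CodelistStore:
    """Append-only store: publishing never overwrites an earlier version."""

    def __init__(self):
        self._versions = {}  # name -> list of frozensets; index = version - 1

    def publish(self, name, codes):
        """Store a new version of a codelist and return its version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(frozenset(codes))
        return len(versions)

    def get(self, name, version=None):
        """Return the requested version, or the latest if version is None."""
        versions = self._versions[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]


# Hypothetical codelist contents, for illustration only.
store = CodelistStore()
first = store.publish("af_hes", ["I48"])
second = store.publish("af_hes", ["I48", "I48.0"])
```

An analysis that records the version number it used can then be rerun against exactly the same codelist after later corrections.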

Page 24: Data re-use in the CALIBER programme

Summary

• When analysing routine data, think about how the data were collected, and cross-check different sources of information
• Data sharing and re-use can bring benefits but needs time and resources to manage