Data Cleaning

Jul 14, 2015

Maulik Chauhan
Page 1: Data Cleaning

5/12/2018 Data Cleaning - slidepdf.com

http://slidepdf.com/reader/full/data-cleaning-55a4d8086337c 1/37

Data Cleaning

Page 2: Data Cleaning

Understanding Discrepancies

Discrepancies are "inconsistencies" found in the clinical trial data which need to be corrected as per the study protocol (the guiding document)

For better query management, one has to know the source or origin of discrepancies

Data in a clinical trial should be congruent with the study protocol

Page 3: Data Cleaning

Data Inconsistencies

Data consistent?
Data legible?
Data complete?
Data correct?
Data logical?

Page 4: Data Cleaning

Discrepancy Types

Completeness & Consistency:
  Checks for empty fields
  Looking for all related items
  Cross-checking values

Real-world checks:
  Range checks
  Discrete value checks
  One value greater than/less than/equal to another

Page 5: Data Cleaning

Discrepancy Types

Quality control:
  Are dates in logical sequence?
  Is header information consistent?
  Any missing visits or pages?

Compliance & Safety:
  Are visits in compliance with protocol?
  Inclusion/exclusion criteria met?
  Checking lab values against normals
  Comparison of values over time

Page 6: Data Cleaning

Validating/Cleaning Data

Data cleaning or validation is a collection of activities used to assure validity & accuracy of data

Logical & statistical checks to detect impossible values due to
  data entry errors
  coding
  inconsistent data

Page 7: Data Cleaning

Who Cleans the Database?

Data Management Plan, through SOPs, clearly defines tasks & responsibilities involved in database cleaning

Please see SOPs... who is responsible for cleaning!

Page 8: Data Cleaning

Cleaning Data

Activities include addressing discrepancies by:
  Manual review of data and CRFs by clinical team or data management
  Computerized checks of data by the clinical data management system:
    Validation/discrepancy designing and build-up into the database

Page 9: Data Cleaning

Definition of data cleaning

Making sure that raw data were accurately entered into a computer-readable file
Checking that character variables contain only valid values
Checking that numeric values are within predetermined ranges
Checking for & eliminating duplicate data entries
Checking if there are missing values for variables where complete data are necessary

Page 10: Data Cleaning

Definition of data cleaning... contd

Checking for uniqueness of certain values, such as subject IDs
Checking for invalid data values & invalid date sequences
Verifying that complex multi-file (or cross-panel) rules have been followed. For example, if an AE of type X occurs, other data such as concomitant medications or procedures might be expected
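
Several of the checks in this definition (valid character values, numeric ranges, missing values, unique subject IDs) can be sketched in a few lines of Python. The records, the allowed values for sex, and the age limits are all invented for illustration; a real study would take them from the protocol and guidelines.

```python
from collections import Counter

# Hypothetical records and rules, invented for illustration only.
records = [
    {"subject_id": "001", "sex": "M", "age": 34},
    {"subject_id": "002", "sex": "X", "age": 41},    # invalid character value
    {"subject_id": "003", "sex": "F", "age": 430},   # out-of-range numeric value
    {"subject_id": "003", "sex": "F", "age": None},  # repeated ID, missing value
]
VALID_SEX = {"M", "F"}  # assumed list of valid values
AGE_RANGE = (18, 65)    # assumed protocol limits


def find_discrepancies(rows):
    """Return (row_index, field, message) tuples for each failed check."""
    found = []
    id_counts = Counter(row["subject_id"] for row in rows)
    for i, row in enumerate(rows):
        if row["sex"] not in VALID_SEX:
            found.append((i, "sex", "invalid value"))
        if row["age"] is None:
            found.append((i, "age", "missing value"))
        elif not AGE_RANGE[0] <= row["age"] <= AGE_RANGE[1]:
            found.append((i, "age", "out of range"))
        if id_counts[row["subject_id"]] > 1:
            found.append((i, "subject_id", "not unique"))
    return found


for issue in find_discrepancies(records):
    print(issue)
```

The function only reports discrepancies; whether each one is queried or corrected is left to the study guidelines.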

Page 11: Data Cleaning

Documents to be Followed

Protocol

Guidelines - General & Project-Specific

SOPs

Subject Flowcharts

Clean Patient Check Lists

Tracking Spreadsheets

Page 12: Data Cleaning

Clean Data Checklist

Refers to a list of checks to be performed by data management while cleaning the database
Checklist is developed & customized as per client specifications
Provides list of checks to be performed both
  on an ongoing/periodic basis
  towards end of study
Strict adherence to checklist prevents missing out on any of the critical activities

Page 13: Data Cleaning

Data Cleaning
  Point-by-point checks
  Textual data
  Continuing events
  Query generation
  Query integration
  Missing data
  Duplicate data
  Protocol violation
  Coding
  SAE Recon
  Data consistency
  Ranges
  External data
  Visit sequence

Page 14: Data Cleaning

Point-by-Point Checks

Refers to cross checking between CRF & database for every data point
Constitutes a "second-check" apart from data entry
Incorrect entries/entries missed out by Data Entry are corrected during cleaning
Special emphasis to be given for
  Dates
  Numerical values
  Header information (including indexing)

Page 15: Data Cleaning

Missing Data Checks

Missing responses to be queried for,

unless indicated by investigator as

not done

not available

not applicable

Validations to be programmed to flag

missing field discrepancies

Missing Data...!!
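
A programmed missing-field validation of this kind might look like the sketch below. The record layout (a `reasons` entry holding the investigator's stated reason per field) and the field names are hypothetical; only the three exempting reasons come from the slide.

```python
# Reasons that, per the slide, exempt a blank field from being queried.
EXEMPT_REASONS = {"not done", "not available", "not applicable"}


def missing_field_queries(record, required_fields):
    """Return required fields that are blank and carry no exempting reason.

    `record` maps field name -> value; `record["reasons"]` (a hypothetical
    layout) maps field name -> the investigator's stated reason, if any.
    """
    reasons = record.get("reasons", {})
    queries = []
    for field in required_fields:
        value = record.get(field)
        is_blank = value is None or str(value).strip() == ""
        reason = reasons.get(field, "").strip().lower()
        if is_blank and reason not in EXEMPT_REASONS:
            queries.append(field)
    return queries


# Hypothetical vitals record: one blank field, one blank but marked "Not Done".
vitals = {
    "systolic_bp": "",
    "diastolic_bp": 80,
    "weight_kg": None,
    "reasons": {"weight_kg": "Not Done"},
}
print(missing_field_queries(vitals, ["systolic_bp", "diastolic_bp", "weight_kg"]))
```

Here only `systolic_bp` would be queried, since the blank weight carries an exempting reason.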

Page 16: Data Cleaning

Missing Page Checks

Expected pages identified during setup

of studies

Tracking reports of missing pages to be

maintained to identify

CRFs misrouted in-house

CRFs never sent from Investigator's site

AE Record

Page 17: Data Cleaning

Protocol Violation Checks

Protocol adherence to be reviewed &

violations, if any, to be queried

Primary safety & efficacy endpoints to

be reviewed, to ensure protocol

compliance

Page 18: Data Cleaning

Key protocol violations

Inclusion & exclusion criteria adherence

Age

Concomitant medications/antibiotics

Medical condition

Study drug dosing regimen adherence

Study or drug termination specifications

Switches in medications

Page 19: Data Cleaning

Continuity of Data Checks

Refers to checking continuity of events that occur

across study

across visits

Includes

Adverse Events

Medications

Treatments/Procedures

Overlapping Start/Stop Dates &

Outcomes to be checked across visits

Page 20: Data Cleaning

Continuity of Data Checks

Overlapping dates across visits:

Scenario: Per protocol, AEs are to be recorded on Visits 1, 2 & 3

"Headache" is recorded as follows:

Visit  Start Date   Stop Date    Outcome
1      01-Jan-2004  12-Jan-2004  Continues
2      01-Jan-2004  12-Jan-2004  Resolved
3      20-Jan-2004  20-Jan-2004  Resolved
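
One way to screen rows like these automatically is sketched below. The slide does not state which rule the example breaks, so both rules in the code are assumptions: an event marked "Continues" should not carry a stop date, and a stop date should not precede the start date. Date parsing assumes English month abbreviations.

```python
from datetime import datetime

# The "Headache" rows from the slide: (visit, start, stop, outcome).
headache = [
    (1, "01-Jan-2004", "12-Jan-2004", "Continues"),
    (2, "01-Jan-2004", "12-Jan-2004", "Resolved"),
    (3, "20-Jan-2004", "20-Jan-2004", "Resolved"),
]


def parse(date_text):
    """Parse dates written like 01-Jan-2004."""
    return datetime.strptime(date_text, "%d-%b-%Y").date()


def continuity_flags(rows):
    """Return (visit, message) pairs for rows breaking the assumed rules."""
    flags = []
    for visit, start, stop, outcome in rows:
        # Assumed rule 1: an ongoing event should not carry a stop date.
        if outcome == "Continues" and stop:
            flags.append((visit, "outcome 'Continues' but stop date given"))
        # Assumed rule 2: an event cannot stop before it starts.
        if stop and parse(stop) < parse(start):
            flags.append((visit, "stop date before start date"))
    return flags


print(continuity_flags(headache))
```

Under these assumed rules, only the Visit 1 row is flagged.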

Page 21: Data Cleaning

Consistency Checks

Designed to identify potential data errors by checking
  sequential order of dates
  corresponding events
  missing data (indicated as existing elsewhere)
Involves cross checking between data points
  across CRFs
  within same CRF

Page 22: Data Cleaning

Consistency Checks

Cross check across different CRFs:

AE reported with action "concomitant medication" (AE Record)
Ensure corresponding concomitant medication reported in appropriate timeframe (Concomitant Medication Record)

Event  Start Date   Stop Date    Outcome
Fever  13-Jun-2005  20-Jun-2005  Resolved

Medication   Start Date   Stop Date    Outcome
Paracetamol  14-Jun-2005  20-Jun-2005  Stopped
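
A cross-CRF check of this kind could be sketched as follows, using the Fever and Paracetamol rows from the slide. Treating "appropriate timeframe" as "the medication period overlaps the AE period" is an assumption; study guidelines may define the window differently.

```python
from datetime import date

# Values from the slide's example.
ae = {"event": "Fever", "start": date(2005, 6, 13), "stop": date(2005, 6, 20),
      "action": "concomitant medication"}
conmeds = [
    {"medication": "Paracetamol", "start": date(2005, 6, 14), "stop": date(2005, 6, 20)},
]


def has_matching_conmed(ae_row, conmed_rows):
    """True if any medication period overlaps the AE period (assumed rule)."""
    return any(
        med["start"] <= ae_row["stop"] and med["stop"] >= ae_row["start"]
        for med in conmed_rows
    )


if ae["action"] == "concomitant medication" and not has_matching_conmed(ae, conmeds):
    print(f"Query: no concomitant medication recorded for {ae['event']}")
else:
    print(f"{ae['event']}: matching concomitant medication found")
```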

Page 23: Data Cleaning

Consistency Checks

Cross check within same CRF:

1st DCM: Report doses of antibiotics taken "before" intake of first dose of study drug

2nd DCM: Report doses of antibiotics taken "after" intake of first dose of study drug

NOTE: First dose of study drug is taken on 15-May-2001

1st DCM:
Antibiotic    Dose  Route  Start Date   Stop Date
Amoxicillin   6 mg  Oral   11-May-2001  14-May-2001

2nd DCM:
Antibiotic    Dose  Route  Start Date   Stop Date
Streptomycin  7 mg  IV     16-May-2001  17-May-2001
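
The within-CRF check can be sketched as a comparison of each DCM's dates against the first-dose date. The two rules in the docstring are assumptions about how "before" and "after" would be enforced; the dates come from the slide.

```python
from datetime import date

FIRST_DOSE = date(2001, 5, 15)  # from the slide's NOTE

# (antibiotic, start date, stop date), one list per DCM, values from the slide.
before_dcm = [("Amoxicillin", date(2001, 5, 11), date(2001, 5, 14))]
after_dcm = [("Streptomycin", date(2001, 5, 16), date(2001, 5, 17))]


def misplaced_antibiotics(before_rows, after_rows, first_dose):
    """Return (dcm, antibiotic) pairs that appear to sit in the wrong DCM.

    Assumed rules: a "before" entry must stop earlier than the first dose,
    and an "after" entry must not start earlier than the first dose.
    """
    wrong = []
    for name, start, stop in before_rows:
        if stop >= first_dose:
            wrong.append(("before", name))
    for name, start, stop in after_rows:
        if start < first_dose:
            wrong.append(("after", name))
    return wrong


print(misplaced_antibiotics(before_dcm, after_dcm, FIRST_DOSE))
```

With the slide's data the result is an empty list, meaning both entries sit in the expected DCM.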

Page 24: Data Cleaning

Coding Checks

Textual or free text data collected & reported (AEs, medications) must be coded before they can be aggregated & used in summary analysis
Coding consists of matching text collected on CRF to terms in a standard dictionary
Items that cannot be matched, or coded without clarification from site
  Ulcers, for example, require a location (gastric, duodenal, mouth, foot, etc.) to be coded
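
The matching step can be illustrated with a toy lookup. The dictionary below and its codes are hypothetical placeholders, not entries from any real coding dictionary, and exact case-insensitive matching is a simplifying assumption.

```python
# Hypothetical mini-dictionary; terms and codes are placeholders only.
DICTIONARY = {
    "headache": "C001",
    "gastric ulcer": "C002",
    "duodenal ulcer": "C003",
}


def code_term(verbatim):
    """Return the code for an exact, case-insensitive match, else None."""
    return DICTIONARY.get(verbatim.strip().lower())


verbatims = ["Headache", "Ulcer", "gastric ulcer "]
uncoded = [text for text in verbatims if code_term(text) is None]
print(uncoded)  # terms needing clarification from the site
```

"Ulcer" stays uncoded because no location was given, which is the situation the slide describes.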

Page 25: Data Cleaning

Range Checks

Designed to identify
  statistical outliers
  values that are physiologically impossible
  values that are outside normal variation of population under study
Ensure that appropriate range values are applied
  For example, ranges for WBCs can be applied either in 'percentage' or in 'absolute'
Ensure that appropriate ranges are applied depending on whether lab used is
  Primary
  Secondary

Page 26: Data Cleaning

Range Checks

Cross check between Hematology record & AE record:

Hematology Test  Date         Result                   Normal Range
WBC              05-Jan-2006  13,710 cells/µL (cu mm)  4,300 - 10,800 cells/µL (cu mm)

Event                    Start Date   Stop Date    Outcome
Streptococcal infection  04-Jan-2006  07-Jan-2006  Resolved
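
The cross check between the two records can be sketched as a range test followed by a search for an AE covering the sample date. The values are from the slide; the field names and the decision to query when no AE is found are assumptions.

```python
from datetime import date

# Values from the slide; field names are illustrative.
lab = {"test": "WBC", "date": date(2006, 1, 5), "result": 13710,
       "low": 4300, "high": 10800}
aes = [{"event": "Streptococcal infection",
        "start": date(2006, 1, 4), "stop": date(2006, 1, 7)}]


def review_lab(lab_row, ae_rows):
    """Classify a lab result and look for an AE covering the sample date."""
    if lab_row["low"] <= lab_row["result"] <= lab_row["high"]:
        return "within range"
    covering = [ae["event"] for ae in ae_rows
                if ae["start"] <= lab_row["date"] <= ae["stop"]]
    if covering:
        return "out of range, AE on record: " + ", ".join(covering)
    return "out of range, no AE on record: query"


print(review_lab(lab, aes))
```

For the slide's data the elevated WBC falls inside the infection's date window, so an AE is already on record for that date.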

Page 27: Data Cleaning

External Data Checks

Ensure receipt of all required external data from centralized vendors:
  Laboratory Data
  Device Data (ECG, Bioimages)
Missing e-data records to be tracked & requested from vendor on a periodic basis
Missing data to be noted & corresponding values to be 're-loaded' by vendor

Page 28: Data Cleaning

External Data Checks

Examples of missing data/values:

Missing collection time of blood sample

Missing date of ECG

Missing location of chest radiograph

Missing systolic blood pressure

Missing microbiological culture transmittal ID

Page 29: Data Cleaning

External Data Checks

Examples of invalid data/values:

Incorrect loading of visit number 

Incorrect loading of subject number 

Incorrect loading of date/time of collection

Page 30: Data Cleaning

Duplicate Data Checks

Refers to duplicate entries
  within a single CRF
  across similar CRFs
Duplicate entries & duplicate records to be deleted per guideline specifications
Examples:
  Treatment 'physiotherapy' on '30-Aug-2001' reported twice on either same Treatment Record or across two different Treatment Records

Page 31: Data Cleaning

Duplicate Data Checks

Examples:

Both Visit 4 & Visit 10 Blood Chemistry CRFs

(with different collection dates) are updated with

same values for all tests performed

Both 'primary' & 'additional' Medical History CRFs at Screening are reported with same details of abnormalities

Which one to retain...?
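
Detecting candidates for such duplicates can be sketched by grouping rows on a normalized key. The CRF identifiers and rows below are hypothetical, modeled on the physiotherapy example. The sketch only flags duplicates; which copy to retain remains a decision for the guidelines.

```python
from collections import defaultdict

# Hypothetical treatment rows: (crf_id, treatment, date_text).
treatments = [
    ("TRT-1", "physiotherapy", "30-Aug-2001"),
    ("TRT-1", "physiotherapy", "30-Aug-2001"),   # repeated on the same record
    ("TRT-2", "Physiotherapy ", "30-Aug-2001"),  # repeated on another record
    ("TRT-2", "massage", "31-Aug-2001"),
]


def find_duplicates(rows):
    """Group rows by normalized (treatment, date); keep groups seen more than once."""
    groups = defaultdict(list)
    for crf_id, treatment, date_text in rows:
        key = (treatment.strip().lower(), date_text)
        groups[key].append(crf_id)
    return {key: crfs for key, crfs in groups.items() if len(crfs) > 1}


for key, crfs in find_duplicates(treatments).items():
    print(key, "reported on", crfs)
```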

Page 32: Data Cleaning

Textual Data Checks

All textual data to be proofread & checked

for spelling errors

Obvious misspellings to be corrected per Internal Correction (as specified by guidelines)
Common examples of textual data:
  Abnormalities/pre-existing conditions in Medical History record
  Adverse Events
  Medications/Antibiotics
  Project & study-specific data

Page 33: Data Cleaning

Visit Sequence Checks

Sequence of visits should be reviewed & if out of sequence, should be either
  queried
  corrected per Internal Correction (as per guidelines)
Either a single CRF or a group of CRFs could be out of sequence with that particular visit

Page 34: Data Cleaning

Visit Sequence Checks

Visit  Visit date
1      01-Jan-2000
2      02-Jan-2000
3      03-Jan-2000
4      04-Jan-2000

Vitals Record
Visit  Date of Vitals
1      01-Jan-2000
2      03-Jan-2000
3      02-Jan-2000
4      04-Jan-2000

Screening
Record              Visit date
Demography          20-Feb-2006
Med. History        20-Feb-2005
Inclusion Criteria  20-Feb-2006
AE                  20-Feb-2006
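
Both patterns in these tables can be screened with short functions: one looks for a visit dated earlier than the preceding visit, the other for a record whose date differs from the rest of its visit. The dates are copied from the slide; treating the most common date as the expected one is an assumption, and date parsing assumes English month abbreviations.

```python
from collections import Counter
from datetime import datetime


def parse(date_text):
    """Parse dates written like 01-Jan-2000."""
    return datetime.strptime(date_text, "%d-%b-%Y").date()


# Values copied from the slide.
vitals_dates = {1: "01-Jan-2000", 2: "03-Jan-2000", 3: "02-Jan-2000", 4: "04-Jan-2000"}
screening_dates = {
    "Demography": "20-Feb-2006",
    "Med. History": "20-Feb-2005",
    "Inclusion Criteria": "20-Feb-2006",
    "AE": "20-Feb-2006",
}


def out_of_sequence(dates_by_visit):
    """Return visit numbers dated earlier than the preceding visit."""
    flagged = []
    previous = None
    for visit in sorted(dates_by_visit):
        current = parse(dates_by_visit[visit])
        if previous is not None and current < previous:
            flagged.append(visit)
        previous = current
    return flagged


def odd_dates(dates_by_record):
    """Return records whose date differs from the most common date."""
    most_common_date, _ = Counter(dates_by_record.values()).most_common(1)[0]
    return [name for name, d in dates_by_record.items() if d != most_common_date]


print(out_of_sequence(vitals_dates))  # visits to review in the Vitals Record
print(odd_dates(screening_dates))     # screening records with a differing date
```

A flagged entry is only a candidate: per the previous slide, it is then either queried or corrected per Internal Correction.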

Page 35: Data Cleaning

SAE Reconciliation Checks

All SAEs reported on CRFs should be reconciled with those reported on SAE Reports & vice versa
Communication to be maintained with
  Sponsor
  Clinical Scientist
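
Two-way reconciliation can be sketched as a pair of set differences. The subjects, events, and dates below are invented, and matching on an exact (subject, event, onset date) key is a simplifying assumption; real reconciliation usually has to tolerate differences in wording and dates.

```python
# Hypothetical SAE keys: (subject_id, event term, onset date).
crf_saes = {
    ("001", "myocardial infarction", "2006-03-02"),
    ("004", "pneumonia", "2006-04-11"),
}
safety_saes = {
    ("001", "myocardial infarction", "2006-03-02"),
    ("007", "stroke", "2006-05-20"),
}


def reconcile(crf, safety):
    """Return SAEs present in one source but absent from the other."""
    return {
        "missing_from_sae_reports": sorted(crf - safety),
        "missing_from_crf": sorted(safety - crf),
    }


for side, events in reconcile(crf_saes, safety_saes).items():
    print(side, events)
```

Each unmatched entry would then be raised with the Sponsor or Clinical Scientist, as the slide indicates.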

Page 36: Data Cleaning

Questions ?

Page 37: Data Cleaning

Thank you!