Data Cleaning
5/12/2018 Data Cleaning - slidepdf.com
http://slidepdf.com/reader/full/data-cleaning-55a4d8086337c 1/37
Understanding Discrepancies
Discrepancies are "inconsistencies" found in the clinical trial data that need to be corrected as per the study protocol (the guiding document)
For better query management, one has to know the source or origin of discrepancies
Data in a clinical trial should be congruent with the study protocol
Data Inconsistencies
Questions to ask when looking for data inconsistencies:
Is the data consistent?
Is the data legible?
Is the data complete?
Is the data correct?
Is the data logical?
Discrepancy Types
Completeness & Consistency:
Checks for empty fields
Looking for all related items
Cross-checking values
Real-world checks:
Range checks
Discrete value checks
One value greater than/less than/equal to another
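The completeness and real-world checks above can be sketched as simple programmed edit checks. This is a minimal Python sketch, not a CDMS implementation: it assumes a CRF record held as a dictionary, and the field names, the allowed codes for sex and the 60-250 mmHg systolic range are hypothetical values chosen for illustration.

```python
# Minimal edit-check sketch. Field names, allowed codes and the
# 60-250 mmHg range are hypothetical, not taken from any protocol.
REQUIRED = ("subject_id", "sex", "age", "weight_kg")
ALLOWED_SEX = {"M", "F"}   # discrete value list (assumed coding)
SBP_RANGE = (60, 250)      # assumed plausible systolic BP, mmHg

def edit_checks(rec):
    """Return a list of discrepancy messages for one CRF record (a dict)."""
    issues = []
    # Completeness: required fields must not be empty
    for field in REQUIRED:
        if rec.get(field) in (None, ""):
            issues.append(f"{field}: missing value")
    # Discrete value check: only coded values are allowed
    sex = rec.get("sex")
    if sex not in (None, "") and sex not in ALLOWED_SEX:
        issues.append(f"sex: invalid value {sex!r}")
    # Range check
    sbp, dbp = rec.get("systolic_bp"), rec.get("diastolic_bp")
    low, high = SBP_RANGE
    if sbp is not None and not low <= sbp <= high:
        issues.append(f"systolic_bp: {sbp} outside {low}-{high}")
    # Comparison check: one value must be greater than another
    if sbp is not None and dbp is not None and sbp <= dbp:
        issues.append("systolic_bp should be greater than diastolic_bp")
    return issues

record = {"subject_id": "001", "sex": "X", "age": 34,
          "systolic_bp": 310, "diastolic_bp": 85, "weight_kg": None}
print(edit_checks(record))
```

Each message would normally become a discrepancy to be reviewed and, if needed, queried with the site.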
Discrepancy Types
Quality control:
Are dates in logical sequence?
Is header information consistent?
Any missing visits or pages?
Compliance & Safety:
Are visits in compliance with the protocol?
Inclusion/exclusion criteria met?
Checking lab values against normals
Comparison of values over time
Validating/Cleaning Data
Data cleaning or validation is a collection of activities used to assure the validity & accuracy of data
Logical & statistical checks are used to detect impossible values arising from:
data entry errors
coding errors
inconsistent data
Who Cleans the Database?
The Data Management Plan, through SOPs, clearly defines the tasks & responsibilities involved in database cleaning
Please see the SOPs for who is responsible for cleaning!
Cleaning Data
Activities include addressing discrepancies by:
Manual review of data and CRFs by the clinical team or data management
Computerized checks of data by the clinical data management system: validation/discrepancy checks designed and built into the database
Definition of data cleaning
Making sure that raw data were accurately entered into a computer-readable file
Checking that character variables contain only valid values
Checking that numeric values are within predetermined ranges
Checking for & eliminating duplicate data entries
Checking whether there are missing values for variables where complete data are necessary
Definition of data cleaning (contd.)
Checking for uniqueness of certain values, such as subject IDs
Checking for invalid data values & invalid date sequences
Verifying that complex multi-file (or cross-panel) rules have been followed. For example, if an AE of type X occurs, other data such as concomitant medications or procedures might be expected
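Two of the checks above, uniqueness of subject IDs and a cross-panel rule, can be sketched in a few lines of Python. The panel extracts, field names and the exact rule ("an AE treated with a concomitant medication implies at least one con-med record for that subject") are hypothetical simplifications for illustration.

```python
from collections import Counter

# Hypothetical extracts from three panels; IDs and field names are illustrative.
subject_ids = ["1001", "1002", "1002", "1003"]
adverse_events = [
    {"subject": "1001", "term": "Fever", "action": "concomitant medication"},
    {"subject": "1003", "term": "Headache", "action": "concomitant medication"},
]
con_meds = [{"subject": "1003", "drug": "Paracetamol"}]

def duplicate_ids(ids):
    """Subject IDs that appear more than once (they should be unique)."""
    return sorted(i for i, n in Counter(ids).items() if n > 1)

def ae_without_con_med(aes, meds):
    """Cross-panel rule (assumed): an AE treated with a medication
    implies at least one con-med record for the same subject."""
    subjects_with_meds = {m["subject"] for m in meds}
    return [ae for ae in aes
            if ae["action"] == "concomitant medication"
            and ae["subject"] not in subjects_with_meds]

print(duplicate_ids(subject_ids))
print([(ae["subject"], ae["term"]) for ae in ae_without_con_med(adverse_events, con_meds)])
```

The rule is deliberately subject-level; a real cross-panel check would usually also compare dates.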
Documents to be Followed
Protocol
Guidelines: General & Project-Specific
SOPs
Subject Flowcharts
Clean Patient Checklists
Tracking Spreadsheets
Clean Data Checklist
Refers to a list of checks to be performed by data management while cleaning the database
The checklist is developed & customized as per client specifications
Provides a list of checks to be performed both:
on an ongoing/periodic basis
towards the end of the study
Strict adherence to the checklist prevents missing out on any critical activities
Components of Data Cleaning:
Point-by-point checks
Textual data
Continuing events
Query generation
Query integration
Missing data
Duplicate data
Protocol violation
Coding
SAE reconciliation
Data consistency
Ranges
External data
Visit sequence
Point-by-Point Checks
Refers to cross-checking between the CRF & the database for every data point
Constitutes a "second check" in addition to data entry
Incorrect entries, or entries missed by Data Entry, are corrected during cleaning
Special emphasis is to be given to:
Dates
Numerical values
Header information (including indexing)
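Point-by-point review is done by a person, but the comparison itself can be sketched in Python when both sides are available as records, for example the CRF values captured by an independent second entry (an assumption here). The field names and values below are hypothetical.

```python
# Hypothetical values: as written on the paper CRF and as held in the database.
crf = {"subject_id": "1001", "visit": 2, "visit_date": "12-Jan-2004", "weight_kg": "72.5"}
db = {"subject_id": "1001", "visit": 2, "visit_date": "21-Jan-2004", "weight_kg": "72.5"}

def point_by_point(source, database):
    """Compare every data point; return (field, CRF value, database value) mismatches.
    A field present on only one side is reported with None for the other."""
    mismatches = []
    for field in sorted(source.keys() | database.keys()):
        if source.get(field) != database.get(field):
            mismatches.append((field, source.get(field), database.get(field)))
    return mismatches

print(point_by_point(crf, db))
```

Here the transposed date (12 vs 21) is the kind of entry error the slide says to emphasise.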
Missing Data Checks
Missing responses are to be queried, unless indicated by the investigator as:
not done
not available
not applicable
Validations are to be programmed to flag missing-field discrepancies
Missing Data...!!
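A programmed missing-field validation might look like the Python sketch below. The codes ND, NAV and NA are a hypothetical convention for "not done", "not available" and "not applicable"; each study's guidelines define the actual codes. Blank, unexplained fields are returned for querying, and explained ones are accepted.

```python
# Hypothetical convention for codes the investigator may enter instead of a value.
EXPLAINED = {"ND": "not done", "NAV": "not available", "NA": "not applicable"}

def check_missing(record, required):
    """Return (fields to query, fields explained by the investigator)."""
    to_query, explained = [], []
    for field in required:
        value = record.get(field)
        text = "" if value is None else str(value).strip().upper()
        if text in EXPLAINED:
            explained.append((field, EXPLAINED[text]))
        elif text == "":
            to_query.append(field)  # blank and unexplained: raise a query
    return to_query, explained

record = {"height_cm": 170, "weight_kg": "", "ecg_date": "ND", "pregnancy_test": "na"}
required = ["height_cm", "weight_kg", "ecg_date", "pregnancy_test", "heart_rate"]
print(check_missing(record, required))
```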
Missing Page Checks
Expected pages are identified during study setup
Tracking reports of missing pages are to be maintained to identify:
CRFs misrouted in-house
CRFs never sent from the Investigator's site
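Comparing the expected pages defined at setup with the pages logged in-house can be sketched as a set difference in Python. The visit schedule and page names are hypothetical.

```python
# Hypothetical expected-page schedule set up at study start: visit -> CRF pages.
EXPECTED = {
    1: {"Demography", "Medical History", "Vitals"},
    2: {"Vitals", "AE Record"},
    3: {"Vitals", "AE Record", "Study Termination"},
}

def missing_pages(received):
    """received: set of (visit, page) pairs logged in-house.
    Returns the expected pairs not yet received, sorted by visit then page."""
    return sorted((visit, page)
                  for visit, pages in EXPECTED.items()
                  for page in pages
                  if (visit, page) not in received)

received = {(1, "Demography"), (1, "Medical History"), (1, "Vitals"),
            (2, "Vitals"), (3, "Vitals"), (3, "AE Record")}
print(missing_pages(received))
```

The list only says which pages are outstanding. Telling a CRF misrouted in-house from one never sent by the site still needs the tracking report.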
Protocol Violation Checks
Protocol adherence is to be reviewed & violations, if any, are to be queried
Primary safety & efficacy endpoints are to be reviewed to ensure protocol compliance
Key protocol violations
Inclusion & exclusion criteria adherence
Age
Concomitant medications/antibiotics
Medical condition
Study drug dosing regimen adherence
Study or drug termination specifications
Switches in medications
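Two of the key violations listed above, age outside the inclusion range and prohibited concomitant medications, can be checked programmatically. In this Python sketch the 18-65 age range and the prohibited-drug list are assumed values for illustration only; the real limits come from the protocol.

```python
from datetime import date

# Assumed protocol limits for illustration only; take real values from the protocol.
MIN_AGE, MAX_AGE = 18, 65
PROHIBITED = {"warfarin", "ketoconazole"}

def age_on(birth, on):
    """Age in completed years on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))

def protocol_violations(subject):
    violations = []
    # Inclusion criterion: age at informed consent
    age = age_on(subject["birth_date"], subject["consent_date"])
    if not MIN_AGE <= age <= MAX_AGE:
        violations.append(f"Age {age} outside inclusion range {MIN_AGE}-{MAX_AGE}")
    # Exclusion criterion: prohibited concomitant medications
    for med in subject["con_meds"]:
        if med.lower() in PROHIBITED:
            violations.append(f"Prohibited concomitant medication: {med}")
    return violations

subject = {"birth_date": date(2000, 6, 15), "consent_date": date(2018, 5, 1),
           "con_meds": ["Paracetamol", "Warfarin"]}
print(protocol_violations(subject))
```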
Continuity of Data Checks
Refers to checking the continuity of events that occur:
across the study
across visits
Includes:
Adverse Events
Medications
Treatments/Procedures
Overlapping start/stop dates & outcomes are to be checked across visits
Continuity of Data Checks
Overlapping dates across visits:
Scenario: Per protocol, AEs are to be recorded at Visits 1, 2 & 3
"Headache" is recorded as follows:
Visit Start Date Stop Date Outcome
1 01-Jan-2004 12-Jan-2004 Continues
2 01-Jan-2004 12-Jan-2004 Resolved
3 20-Jan-2004 20-Jan-2004 Resolved
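The "Headache" example can be screened automatically. This Python sketch applies two assumed rules: an entry whose outcome is "Continues" should not carry a stop date, and entries for the same event at different visits should not have overlapping date ranges. Both rules are illustrative; the study guidelines decide what is actually queried.

```python
from datetime import datetime
from itertools import combinations

def parse(d):
    """Parse dates in the DD-Mon-YYYY format used on the CRFs."""
    return datetime.strptime(d, "%d-%b-%Y").date()

# "Headache" entries from the table above: (visit, start, stop, outcome)
headache = [(1, "01-Jan-2004", "12-Jan-2004", "Continues"),
            (2, "01-Jan-2004", "12-Jan-2004", "Resolved"),
            (3, "20-Jan-2004", "20-Jan-2004", "Resolved")]

def continuity_issues(entries):
    issues = []
    for visit, start, stop, outcome in entries:
        # Assumed rule: an event that continues should not carry a stop date
        if outcome == "Continues" and stop:
            issues.append(f"Visit {visit}: outcome 'Continues' but stop date {stop}")
    for a, b in combinations(entries, 2):
        # Two date ranges overlap if each starts on/before the other stops
        if parse(a[1]) <= parse(b[2]) and parse(b[1]) <= parse(a[2]):
            issues.append(f"Visits {a[0]} & {b[0]}: overlapping dates")
    return issues

for issue in continuity_issues(headache):
    print(issue)
```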
Consistency Checks
Designed to identify potential data errors by checking:
the sequential order of dates
corresponding events
missing data (indicated as existing elsewhere)
Involves cross-checking between data points:
across CRFs
within the same CRF
Consistency Checks
Cross-check across different CRFs:
AE reported with action "concomitant medication" (AE Record)
Ensure the corresponding concomitant medication is reported in the appropriate timeframe (Concomitant Medication Record)
Event Start Date Stop Date Outcome
Fever 13-Jun-2005 20-Jun-2005 Resolved
Medication Start Date Stop Date Outcome
Paracetamol 14-Jun-2005 20-Jun-2005 Stopped
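The Fever/Paracetamol cross-check can be sketched in Python. The "appropriate timeframe" is assumed here to mean that the medication starts between the AE start and stop dates. That is a hypothetical rule; a study may allow a different window.

```python
from datetime import datetime

def parse(d):
    """Parse dates in the DD-Mon-YYYY format used on the CRFs."""
    return datetime.strptime(d, "%d-%b-%Y").date()

# Records from the example above
ae = {"event": "Fever", "start": "13-Jun-2005", "stop": "20-Jun-2005",
      "action": "concomitant medication"}
con_meds = [{"medication": "Paracetamol", "start": "14-Jun-2005", "stop": "20-Jun-2005"}]

def matching_con_meds(ae, meds):
    """Con meds started within the AE's start-stop window (assumed timeframe rule)."""
    ae_start, ae_stop = parse(ae["start"]), parse(ae["stop"])
    return [m["medication"] for m in meds if ae_start <= parse(m["start"]) <= ae_stop]

def needs_query(ae, meds):
    """Query if the AE action says a con med was given but none is recorded in time."""
    return ae["action"] == "concomitant medication" and not matching_con_meds(ae, meds)

print(matching_con_meds(ae, con_meds), needs_query(ae, con_meds))
```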
Consistency Checks
Cross-check within the same CRF:
NOTE: First dose of study drug is taken on 15-May-2001
1st DCM: Report doses of antibiotics taken "before" intake of the first dose of study drug
Antibiotic Dose Route Start Date Stop Date
Amoxicillin 6 mg Oral 11-May-2001 14-May-2001
2nd DCM: Report doses of antibiotics taken "after" intake of the first dose of study drug
Antibiotic Dose Route Start Date Stop Date
Streptomycin 7 mg IV 16-May-2001 17-May-2001
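The within-CRF check can be sketched in Python by comparing each antibiotic's dates with the first-dose date. The rules are assumptions for illustration: a "before" entry must stop before the first dose, and an "after" entry must start on or after it. An antibiotic spanning the first-dose date is flagged for review rather than silently assigned to either DCM.

```python
from datetime import datetime

def parse(d):
    """Parse dates in the DD-Mon-YYYY format used on the CRFs."""
    return datetime.strptime(d, "%d-%b-%Y").date()

FIRST_DOSE = parse("15-May-2001")

# (DCM, antibiotic, start, stop) rows from the two DCMs above
rows = [("before", "Amoxicillin", "11-May-2001", "14-May-2001"),
        ("after", "Streptomycin", "16-May-2001", "17-May-2001")]

def misplaced(rows, first_dose):
    """Flag antibiotics whose dates contradict the DCM they were reported in."""
    issues = []
    for dcm, drug, start, stop in rows:
        if dcm == "before" and parse(stop) >= first_dose:
            issues.append(f"{drug}: in 'before' DCM but stops on/after first dose")
        elif dcm == "after" and parse(start) < first_dose:
            issues.append(f"{drug}: in 'after' DCM but starts before first dose")
    return issues

print(misplaced(rows, FIRST_DOSE))
```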
Coding Checks
Textual or free-text data collected & reported (AEs, medications) must be coded before they can be aggregated & used in summary analysis
Coding consists of matching the text collected on the CRF to terms in a standard dictionary
Items that cannot be matched, or cannot be coded without clarification from the site, are to be queried
Ulcers, for example, require a location (gastric, duodenal, mouth, foot, etc.) to be coded
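The matching step described above can be sketched in Python with a toy lookup table standing in for a standard dictionary such as MedDRA. The entries are hypothetical and not taken from any real dictionary release. Verbatim terms are normalised (case, extra spaces) and matched exactly; anything unmatched, like a bare "Ulcer" with no location, is returned for a query.

```python
# Toy stand-in for a standard coding dictionary; entries are hypothetical.
DICTIONARY = {
    "headache": "Headache",
    "fever": "Pyrexia",
    "gastric ulcer": "Gastric ulcer",
    "duodenal ulcer": "Duodenal ulcer",
}

def code_terms(verbatims):
    """Return (coded, to_query): exact matches after normalisation, and unmatched terms."""
    coded, to_query = {}, []
    for text in verbatims:
        key = " ".join(text.lower().split())  # normalise case and whitespace
        if key in DICTIONARY:
            coded[text] = DICTIONARY[key]
        else:
            to_query.append(text)  # needs clarification from the site
    return coded, to_query

print(code_terms(["Headache", "FEVER", " Gastric  ulcer ", "Ulcer"]))
```

Real coding tools also use synonym lists and coder review; exact matching is only the simplest case.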
Range Checks
Designed to identify:
statistical outliers
values that are physiologically impossible
values that are outside the normal variation of the population under study
Ensure that appropriate range values are applied. For example, ranges for WBCs can be applied either as a 'percentage' or as an 'absolute' value
Ensure that appropriate ranges are applied depending on whether the lab used is:
Primary
Secondary
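Choosing the right range can be sketched in Python by keying reference ranges on the lab, the test and the unit. A WBC differential (neutrophils) is used as an assumed example, and the numbers are placeholders, not clinical reference values. The same result is classified differently depending on the unit and lab. A missing range raises an error instead of silently applying the wrong one.

```python
# Hypothetical reference ranges keyed by (lab, test, unit).
# The numbers are placeholders for illustration, not clinical reference values.
RANGES = {
    ("primary", "neutrophils", "%"): (40.0, 75.0),
    ("primary", "neutrophils", "10^3/uL"): (1.8, 7.5),
    ("secondary", "neutrophils", "%"): (45.0, 70.0),
}

def range_flag(lab, test, unit, value):
    """Classify a result against the range for the lab and unit actually used."""
    if (lab, test, unit) not in RANGES:
        raise KeyError(f"no reference range for {test} ({unit}) at {lab} lab")
    low, high = RANGES[(lab, test, unit)]
    if value < low:
        return "LOW"
    if value > high:
        return "HIGH"
    return "NORMAL"

# The same number means different things depending on the unit and lab
print(range_flag("primary", "neutrophils", "%", 5.0))        # LOW
print(range_flag("primary", "neutrophils", "10^3/uL", 5.0))  # NORMAL
print(range_flag("secondary", "neutrophils", "%", 72.0))     # HIGH
```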
Range Checks
Cross-check between Hematology record & AE record:
Hematology Record
Test Date Result Normal Range
WBC 05-Jan-2006 13,710 cells/µL (cu mm) 4,300 - 10,800 cells/µL (cu mm)
AE Record
Event Start Date Stop Date Outcome
Streptococcal infection 04-Jan-2006 07-Jan-2006 Resolved
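The hematology/AE cross-check can be sketched in Python. An out-of-range result is flagged for review only when no AE was ongoing on the sample date. Here the raised WBC falls within the streptococcal infection's dates, so it is not flagged. The sketch only checks date overlap; whether the AE clinically explains the value is the reviewer's judgment.

```python
from datetime import datetime

def parse(d):
    """Parse dates in the DD-Mon-YYYY format used on the CRFs."""
    return datetime.strptime(d, "%d-%b-%Y").date()

# Values from the Hematology and AE records above (WBC in cells/uL)
wbc = {"test": "WBC", "date": "05-Jan-2006", "result": 13710, "low": 4300, "high": 10800}
aes = [{"event": "Streptococcal infection", "start": "04-Jan-2006", "stop": "07-Jan-2006"}]

def ongoing_aes(lab, aes):
    """AEs whose start-stop window includes the sample date."""
    day = parse(lab["date"])
    return [ae["event"] for ae in aes if parse(ae["start"]) <= day <= parse(ae["stop"])]

def flag_for_review(lab, aes):
    """Out-of-range result with no AE ongoing on the sample date."""
    out_of_range = not lab["low"] <= lab["result"] <= lab["high"]
    return out_of_range and not ongoing_aes(lab, aes)

print(ongoing_aes(wbc, aes), flag_for_review(wbc, aes))
```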
External Data Checks
Ensure receipt of all required external data from centralized vendors:
Laboratory data
Device data (ECG, bioimages)
Missing e-data records are to be tracked & requested from the vendor on a periodic basis
Missing data are to be noted & the corresponding values 're-loaded' by the vendor
External Data Checks
Examples of missing data/values:
Missing collection time of blood sample
Missing date of ECG
Missing location of chest radiograph
Missing systolic blood pressure
Missing microbiological culture transmittal ID
External Data Checks
Examples of invalid data/values:
Incorrect loading of visit number
Incorrect loading of subject number
Incorrect loading of date/time of collection
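Both kinds of vendor problem, missing values and invalid loading of subject or visit numbers, can be screened when a transfer file arrives. In this Python sketch the enrolled subjects, the scheduled visits and the required fields are hypothetical; flagged rows would be sent back to the vendor for correction and re-loading.

```python
# Hypothetical study reference data and required fields for a lab transfer.
ENROLLED = {"1001", "1002"}
SCHEDULED_VISITS = {1, 2, 3, 4}
REQUIRED = ("subject", "visit", "collection_date", "collection_time")

def vendor_issues(rows):
    """Return messages for missing fields and invalid subject/visit numbers."""
    issues = []
    for i, row in enumerate(rows, start=1):
        for field in REQUIRED:
            if not row.get(field):
                issues.append(f"row {i}: missing {field}")
        if row.get("subject") and row["subject"] not in ENROLLED:
            issues.append(f"row {i}: unknown subject {row['subject']}")
        if row.get("visit") and row["visit"] not in SCHEDULED_VISITS:
            issues.append(f"row {i}: invalid visit {row['visit']}")
    return issues

rows = [
    {"subject": "1001", "visit": 2, "collection_date": "05-Jan-2006", "collection_time": "09:30"},
    {"subject": "1009", "visit": 2, "collection_date": "05-Jan-2006", "collection_time": ""},
    {"subject": "1002", "visit": 12, "collection_date": "06-Jan-2006", "collection_time": "10:15"},
]
for issue in vendor_issues(rows):
    print(issue)
```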
Duplicate Data Checks
Refers to duplicate entries:
within a single CRF
across similar CRFs
Duplicate entries & duplicate records are to be deleted per guideline specifications
Examples:
Treatment 'physiotherapy' on '30-Aug-2001' reported twice, either on the same Treatment Record or across two different Treatment Records
Duplicate Data Checks
Examples:
Both Visit 4 & Visit 10 Blood Chemistry CRFs (with different collection dates) are populated with the same values for all tests performed
Both 'primary' & 'additional' Medical History CRFs at Screening are reported with the same details of abnormalities
Which one to retain...?
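Both duplicate patterns above can be detected in Python: repeated entries that share the same key values (case-insensitive), and different visits whose complete result sets are identical. The records and lab values are hypothetical. The sketch only finds candidates; which record to retain is still decided per guideline specifications or by query.

```python
from collections import defaultdict

treatments = [
    {"record": "Treatment Record 1", "treatment": "Physiotherapy", "date": "30-Aug-2001"},
    {"record": "Treatment Record 2", "treatment": "physiotherapy", "date": "30-Aug-2001"},
    {"record": "Treatment Record 2", "treatment": "Massage", "date": "31-Aug-2001"},
]

def duplicate_entries(rows, keys):
    """Group rows sharing the same (case-insensitive) key values; keep groups of 2+."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(str(row[k]).lower() for k in keys)].append(row["record"])
    return {k: v for k, v in groups.items() if len(v) > 1}

# Blood chemistry results per visit (hypothetical values)
chemistry = {4: {"ALT": 22, "AST": 30, "Creatinine": 0.9},
             10: {"ALT": 22, "AST": 30, "Creatinine": 0.9},
             7: {"ALT": 25, "AST": 28, "Creatinine": 1.0}}

def identical_panels(panels):
    """Pairs of visits whose complete result sets are identical."""
    visits = sorted(panels)
    return [(a, b) for i, a in enumerate(visits) for b in visits[i + 1:]
            if panels[a] == panels[b]]

print(duplicate_entries(treatments, ("treatment", "date")))
print(identical_panels(chemistry))
```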
Textual Data Checks
All textual data are to be proofread & checked for spelling errors
Obvious misspellings are to be corrected per Internal Correction (as specified by guidelines)
Common examples of textual data:
Abnormalities/pre-existing conditions in the Medical History record
Adverse Events
Medications/Antibiotics
Project- & study-specific data
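Proofreading can be supported by suggesting likely corrections for unrecognised words. This Python sketch uses the standard-library `difflib.get_close_matches` against a hypothetical list of accepted terms. A strict cutoff keeps suggestions to near misspellings. Suggestions are for the reviewer only; corrections still follow Internal Correction per guidelines.

```python
import difflib

# Hypothetical list of accepted terms; in practice this might come from a coding dictionary.
KNOWN_TERMS = ["headache", "nausea", "hypertension", "diarrhoea", "paracetamol", "amoxicillin"]

def spelling_suggestions(text):
    """Map each unrecognised word to its closest known term, if one is close enough."""
    suggestions = {}
    for word in text.lower().split():
        if word not in KNOWN_TERMS:
            match = difflib.get_close_matches(word, KNOWN_TERMS, n=1, cutoff=0.8)
            if match:
                suggestions[word] = match[0]
    return suggestions

print(spelling_suggestions("Severe hedache after paracetmol"))
```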
Visit Sequence Checks
The sequence of visits should be reviewed &, if out of sequence, should be either:
queried, or
corrected per Internal Correction (as per guidelines)
Either a single CRF or a group of CRFs could be out of sequence with that particular visit
Visit Sequence Checks
Visit Visit date
1 01-Jan-2000
2 02-Jan-2000
3 03-Jan-2000
4 04-Jan-2000
Visit Date of Vitals (Vitals Record)
1 01-Jan-2000
2 03-Jan-2000
3 02-Jan-2000
4 04-Jan-2000
Screening Record Visit date
Demography 20-Feb-2006
Med. History 20-Feb-2005
Inclusion Criteria 20-Feb-2006
AE 20-Feb-2006
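Both patterns in the tables above can be caught in Python. The first is a visit dated earlier than the visit before it (Vitals at Visits 2 and 3). The second is a single CRF whose date disagrees with the rest of its visit (Med. History at Screening). Treating the most common date within a visit as the reference is an assumed heuristic for illustration.

```python
from collections import Counter
from datetime import datetime

def parse(d):
    """Parse dates in the DD-Mon-YYYY format used on the CRFs."""
    return datetime.strptime(d, "%d-%b-%Y").date()

# Dates from the Vitals Record example above: visit -> date of vitals
vitals = {1: "01-Jan-2000", 2: "03-Jan-2000", 3: "02-Jan-2000", 4: "04-Jan-2000"}
screening = {"Demography": "20-Feb-2006", "Med. History": "20-Feb-2005",
             "Inclusion Criteria": "20-Feb-2006", "AE": "20-Feb-2006"}

def out_of_sequence(dates_by_visit):
    """(previous, current) visit pairs where the current date is earlier."""
    visits = sorted(dates_by_visit)
    return [(prev, cur) for prev, cur in zip(visits, visits[1:])
            if parse(dates_by_visit[cur]) < parse(dates_by_visit[prev])]

def inconsistent_within_visit(dates_by_record):
    """Records whose date differs from the most common date at that visit (assumed reference)."""
    reference = Counter(dates_by_record.values()).most_common(1)[0][0]
    return sorted(r for r, d in dates_by_record.items() if d != reference)

print(out_of_sequence(vitals))
print(inconsistent_within_visit(screening))
```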
SAE Reconciliation Checks
All SAEs reported on CRFs should be reconciled with those reported on SAE Reports & vice versa
Communication is to be maintained with:
Sponsor
Clinical Scientist
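The "& vice versa" in reconciliation amounts to a two-way comparison. This Python sketch assumes each SAE is identified by a hypothetical (subject, event term, onset date) key taken from the CRF data and from the SAE Reports. Real reconciliation also compares outcomes and other details, and must tolerate wording differences between the two sources.

```python
# Hypothetical SAE listings keyed by (subject, event term, onset date).
crf_saes = {("1001", "Myocardial infarction", "03-Mar-2006"),
            ("1004", "Pneumonia", "11-Apr-2006")}
report_saes = {("1001", "Myocardial infarction", "03-Mar-2006"),
               ("1007", "Sepsis", "19-Apr-2006")}

def reconcile(crf, reports):
    """Two-way comparison: matched SAEs and SAEs found on only one side."""
    return {
        "matched": sorted(crf & reports),
        "on_crf_only": sorted(crf - reports),
        "on_reports_only": sorted(reports - crf),
    }

for category, saes in reconcile(crf_saes, report_saes).items():
    print(category, saes)
```

Unmatched entries on either side would be raised with the Sponsor or Clinical Scientist.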
Questions?