Data Quality Data Cleaning

Beverly Musick, M.S.
May 20, 2010

This module was recorded at the health informatics training course (data management series) offered by the Regional East African Centre for Health Informatics (REACH-Informatics) in Eldoret, Kenya. Funding was made possible by NIH’s Fogarty Center. The training was held at the Academic Model Providing Access to Healthcare (AMPATH), a USAID-funded program, supported by the Regenstrief Institute at Indiana University. The modules were created in collaboration with the School of Informatics at IUPUI.

Creative Commons Attribution-ShareAlike 3.0 Unported License



Quality Control

• Quality Control is the process of monitoring and maintaining the reliability, accuracy, and completeness of the data during the conduct of the project.

• Requires a multidisciplinary team which includes clinicians, data entry staff, statisticians, systems administrators, and data managers.

• Requires sharing knowledge about disease progression, clinical practice patterns, effects of medical treatments, relationships between variables and expected timing of events.


Ensuring Data Quality

• Point of Assessment
  – Collection: review form before patient leaves the clinic
  – Entry: range restrictions, logical checks
  – Post-entry: clean-up queries
  – Statistical Analysis: data trends


Ensuring Data Quality (cont.)

• To ensure data quality the data manager needs to understand:
  – Goals of program
  – Standards of operation
  – Impact of intervention or program
  – Relationships between variables
  – Expected timing of events


Clean-up Queries

Missing Data

• Generate reports regarding the percent of missing data for each item on the data collection forms

• Highlight differences between programs or specific groups of patients in order to identify methods to minimize missing data
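
The two bullets above can be sketched in Python as follows. This is only an illustration, assuming the records are a list of dictionaries in which None or an empty string marks a missing item. The field names (`clinic`, `weight_kg`, `cd4`) and the helper `percent_missing` are hypothetical and do not come from the module.

```python
from collections import defaultdict

def percent_missing(records, fields, group_by=None):
    """Return {group: {field: percent missing}}.

    With group_by=None every record falls into a single group named "all".
    """
    totals = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for rec in records:
        group = rec.get(group_by) if group_by else "all"
        totals[group] += 1
        for field in fields:
            # Assumption: None or "" means the item was not filled in.
            if rec.get(field) in (None, ""):
                missing[group][field] += 1
    return {
        group: {f: 100.0 * missing[group][f] / n for f in fields}
        for group, n in totals.items()
    }

# Hypothetical records from two clinics.
records = [
    {"clinic": "A", "weight_kg": 61.0, "cd4": 350},
    {"clinic": "A", "weight_kg": None, "cd4": 410},
    {"clinic": "B", "weight_kg": 70.2, "cd4": None},
    {"clinic": "B", "weight_kg": None, "cd4": None},
]
report = percent_missing(records, ["weight_kg", "cd4"], group_by="clinic")
```

Grouping by clinic (or any other program or patient attribute) is what makes the differences between groups visible. In this made-up data, clinic B is missing every CD4 value while clinic A is missing none.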


Date Comparison

• Ensure that the date of birth precedes all other dates.

• Calculate age and verify that the date of birth makes sense.

• For patients who have died, ensure that the date of death follows all other dates.
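
A minimal sketch of these three date checks, assuming each patient record is a dictionary holding `datetime.date` values. The field names, the helper names and the `MAX_PLAUSIBLE_AGE` cut-off are assumptions made for illustration, not part of the module.

```python
from datetime import date

MAX_PLAUSIBLE_AGE = 110  # assumed cut-off; adjust to the population

def age_in_years(dob, on):
    """Completed years between the date of birth and a reference date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))

def date_order_problems(patient, as_of):
    """Return a list of date-order problems found in one patient record."""
    problems = []
    dob = patient["date_of_birth"]
    death = patient.get("date_of_death")
    dates = list(patient["visit_dates"])
    if death is not None:
        dates.append(death)

    # Date of birth must not come after any other date.
    if any(d < dob for d in dates):
        problems.append("a date precedes the date of birth")

    # Age must be plausible on the reference date.
    age = age_in_years(dob, as_of)
    if not 0 <= age <= MAX_PLAUSIBLE_AGE:
        problems.append(f"implausible age: {age}")

    # Nothing should be recorded after the date of death.
    if death is not None and any(d > death for d in patient["visit_dates"]):
        problems.append("a visit follows the date of death")
    return problems

# Hypothetical patient with a visit recorded after death.
patient = {
    "date_of_birth": date(1975, 3, 14),
    "date_of_death": date(2009, 6, 1),
    "visit_dates": [date(2008, 1, 10), date(2009, 7, 15)],
}
problems = date_order_problems(patient, as_of=date(2010, 5, 20))
```

Visits that fall on the date of birth itself are allowed here, which is a design choice that suits newborn records; a stricter program could use `<=` instead.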


Date Comparison (cont.)

• Generate a clean-up list for observation dates that are after today’s date or, preferably, the date of data entry.

• Generate a similar list for observation dates that precede the date of inception of your program.

• Examine the interval between observation/visit dates to ensure that the expected time frame is reflected.
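
The observation-date checks above could look something like the sketch below. `PROGRAM_START` and `MAX_GAP` are placeholder values chosen for the example; the real inception date and the expected visit interval depend on the program and are not given in the module.

```python
from datetime import date, timedelta

PROGRAM_START = date(2005, 1, 1)  # hypothetical inception date
MAX_GAP = timedelta(days=120)     # hypothetical longest expected gap between visits

def observation_date_flags(visit_dates, entry_date):
    """Return (date, reason) pairs for one patient's observation dates."""
    flags = []
    ordered = sorted(visit_dates)
    for d in ordered:
        if d > entry_date:
            flags.append((d, "after date of data entry"))
        if d < PROGRAM_START:
            flags.append((d, "before program inception"))
    # Compare each visit with the one before it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > MAX_GAP:
            flags.append((later, "gap since previous visit exceeds expected interval"))
    return flags

visits = [date(1999, 5, 2), date(2009, 1, 5), date(2009, 2, 2), date(2011, 1, 1)]
flags = observation_date_flags(visits, entry_date=date(2010, 5, 20))
```

Comparing against the date of data entry rather than today's date catches future dates even when the query is run long after the form was entered.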


Checks on Numeric Data

• Confirm all values are within the expected range.

• Investigate possible outliers by verifying against source document, comparing with other values for same subject, or cross-referencing with other variables such as current illnesses in the case of elevated lab result.

• Confirm that values make sense with respect to patient’s age, gender, disease status, etc.
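
A range check of this kind can be sketched as a lookup table of expected limits. The variables and limits in `EXPECTED_RANGES` are illustrative placeholders only, not clinical guidance; each program would set its own with input from clinicians.

```python
# Illustrative limits only; real limits should come from the clinical team.
EXPECTED_RANGES = {
    "hemoglobin_g_dl": (3.0, 20.0),
    "cd4_count": (0, 2000),
    "weight_kg": (30, 200),
}

def out_of_range(records, ranges=EXPECTED_RANGES):
    """Return (patient_id, field, value) for every value outside its range."""
    flagged = []
    for rec in records:
        for field, (low, high) in ranges.items():
            value = rec.get(field)
            # Missing values are handled by a separate missing-data report.
            if value is not None and not (low <= value <= high):
                flagged.append((rec["patient_id"], field, value))
    return flagged

records = [
    {"patient_id": 101, "hemoglobin_g_dl": 12.4, "cd4_count": 350, "weight_kg": 64},
    {"patient_id": 102, "hemoglobin_g_dl": 124.0, "cd4_count": 410, "weight_kg": 6.4},
    {"patient_id": 103, "hemoglobin_g_dl": None, "cd4_count": 90, "weight_kg": 58},
]
flagged = out_of_range(records)
```

The flagged list is a starting point for the follow-up described above: each entry still has to be checked against the source document and the patient's other values before it is changed.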


Checks on Adult Heights/Weights

• Calculate BMI from height and weight (BMI = weight (kg) / height (m)²)

• Most values should be between 10 and 40

• Flag unexpected weight fluctuations
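
As a rough sketch of these adult checks, the code below uses the standard BMI definition, weight in kilograms divided by the square of height in metres. The 10 to 40 window comes from the slide; the 10% threshold for a weight fluctuation and the function names are assumptions made for the example.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

def bmi_needs_review(weight_kg, height_m, low=10.0, high=40.0):
    """True when the BMI falls outside the window where most adults are expected."""
    return not (low <= bmi(weight_kg, height_m) <= high)

def weight_fluctuations(weights_kg, max_change=0.10):
    """Return indices of weights that differ from the previous weight
    by more than max_change (as a fraction of the previous weight)."""
    flagged = []
    for i in range(1, len(weights_kg)):
        previous, current = weights_kg[i - 1], weights_kg[i]
        if abs(current - previous) / previous > max_change:
            flagged.append(i)
    return flagged

normal = bmi(70, 1.75)                    # about 22.9
height_in_cm = bmi_needs_review(70, 175)  # height entered in cm instead of m
jumps = weight_fluctuations([62.0, 61.5, 80.0, 62.5])
```

A BMI far below 10 is often a sign that height was entered in centimetres rather than metres, which is why the second call above is flagged.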


Checks on Pediatric Heights/Weights

• Calculate weight-for-age Z-scores using Epi Info NutStat software (http://www.cdc.gov/epiinfo/) or SAS software (http://www.cdc.gov/nccdphp/dnpao/growthcharts/resources/sas.htm)

• Review date of birth, visit date, age and weight for Z-scores less than -5 or greater than 5.

• Similar checks can be made with height-for-age and weight-for-height Z-scores.
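
Once the Z-scores have been calculated by one of the tools above, the review list itself is simple to produce. The sketch below assumes the scores have been exported into a field named `waz`; that name, the other field names and the sample values are made up for illustration and were not computed from growth charts.

```python
Z_LIMIT = 5.0  # review threshold from the slide: less than -5 or greater than 5

def z_scores_to_review(rows, z_field="waz", limit=Z_LIMIT):
    """Return rows whose Z-score is present and beyond +/- limit."""
    return [
        row for row in rows
        if row.get(z_field) is not None and abs(row[z_field]) > limit
    ]

# Made-up rows; the waz values are example inputs, not computed results.
rows = [
    {"patient_id": 1, "age_months": 18, "weight_kg": 10.4, "waz": -0.6},
    {"patient_id": 2, "age_months": 24, "weight_kg": 2.1, "waz": -7.3},
    {"patient_id": 3, "age_months": 6, "weight_kg": 19.0, "waz": 6.8},
    {"patient_id": 4, "age_months": 30, "weight_kg": None, "waz": None},
]
to_review = z_scores_to_review(rows)
```

Passing `z_field="haz"` or `z_field="whz"` (again hypothetical names) would apply the same check to height-for-age or weight-for-height scores.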


Checks on Numeric Data (cont.)

• Review longitudinal data.

• If special missing values are coded, ensure that the codes do not overlap with valid data.

• For lab results, a qualifier such as < or > should be stored in a separate variable.
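
The last two points can be illustrated with a short sketch. It assumes lab results arrive as text such as "<50" and that missing-value codes are plain numbers like 99 or 999; the function names and sample values are hypothetical.

```python
import re

# Optional qualifier (<, >, <=, >=) followed by a non-negative number.
_RESULT = re.compile(r"^\s*([<>]=?)?\s*(\d+(?:\.\d+)?)\s*$")

def split_lab_result(raw):
    """Split a raw result such as '<50' into (qualifier, numeric value)."""
    match = _RESULT.match(raw)
    if match is None:
        raise ValueError(f"unrecognised lab result: {raw!r}")
    qualifier, number = match.groups()
    return qualifier or "", float(number)

def overlapping_codes(missing_codes, valid_range):
    """Return the missing-value codes that fall inside the valid data range."""
    low, high = valid_range
    return [code for code in missing_codes if low <= code <= high]

below_limit = split_lab_result("<50")
plain = split_lab_result("412")
# 99 is a poor missing code for age if ages up to 110 are valid.
clashes = overlapping_codes([99, 999], valid_range=(0, 110))
```

Keeping the qualifier in its own variable leaves the numeric column free of text, so range checks and summaries can run on it directly.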


Cross-Variable Checks

• Confirm that there is consistency between gender and other variables such as pregnancy.

• Look for contraindicated medication combinations.

• Look for data that may have been recorded under the wrong patient ID.
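
A sketch of the first two cross-variable checks follows. The field names and the entries in `CONTRAINDICATED` are placeholders (`drug_a` and `drug_b` are not real medications); an actual list would have to come from the clinicians on the team. Detecting data filed under the wrong patient ID usually needs a comparison of demographics across forms and is not shown here.

```python
# Placeholder pairs; replace with combinations supplied by clinicians.
CONTRAINDICATED = {frozenset({"drug_a", "drug_b"})}

def cross_variable_flags(record):
    """Return a list of cross-variable inconsistencies for one record."""
    flags = []
    if record.get("gender") == "M" and record.get("pregnant"):
        flags.append("male patient recorded as pregnant")
    medications = set(record.get("medications", []))
    for pair in CONTRAINDICATED:
        # Flag the record when both drugs of a pair are present.
        if pair <= medications:
            flags.append("contraindicated combination: " + " + ".join(sorted(pair)))
    return flags

record = {
    "patient_id": 205,
    "gender": "M",
    "pregnant": True,
    "medications": ["drug_b", "drug_c", "drug_a"],
}
flags = cross_variable_flags(record)
```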
