Octagon Research Solutions, Inc.
Leading the Electronic Transformation of Clinical R&D
© 2009 Octagon Research Solutions, Inc. All Rights Reserved.
Data Profiling
Octagon Research Solutions
Metadata Profiling
• Metadata (structure)
  – Likeness of nomenclature among study databases
  – Helps answer planning questions such as:
    • Claim: “The studies are 90% identical.” Are they?
    • If they indeed are, can you create pool(s) of source data to gain efficiency?
Not our main focus today
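As a rough first check of a claim like “90% identical,” one could compare the variable names of two study databases. The Python sketch below assumes each study's metadata has already been reduced to a set of variable names. The study contents are hypothetical, and matching names say nothing about matching types, lengths, or meaning.

```python
# Hypothetical example: the variable names per study are invented for illustration.
def shared_fraction(vars_a: set, vars_b: set) -> float:
    """Fraction of all variable names (the union) that both studies share."""
    union = vars_a | vars_b
    if not union:
        return 1.0  # two empty studies are trivially identical
    return len(vars_a & vars_b) / len(union)


study_1 = {"SUBJID", "VISIT", "LPARM", "LVALUE", "LUNIT"}
study_2 = {"SUBJID", "VISIT", "LPARM", "LVALUE", "LRANGE"}
print(f"{shared_fraction(study_1, study_2):.0%}")  # 67%
```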
Data Profiling
• Data (content)
  – Statistics, e.g., min, max, average
  – Relationship
  – Pattern
Fact: Data are often “bad, worse, or ugly”
Goal: Get a realistic pulse on the quality of the data
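The content statistics listed above can be computed column by column. A minimal Python sketch, assuming a character source column supplied as a list of strings in which a blank value means missing; the sample values are hypothetical.

```python
from statistics import mean


def profile_column(values):
    """Basic content statistics for a character column (blank = missing)."""
    present = [v for v in values if v.strip()]
    lengths = [len(v) for v in present]
    return {
        "n": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min_len": min(lengths, default=0),
        "max_len": max(lengths, default=0),
        "avg_len": mean(lengths) if lengths else 0.0,
    }


print(profile_column(["5.4", "140", "", "5.4", "<0.5"]))
```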
Case Study (“Slightly” Altered for Illustration Purposes)
• Background
  – Central lab, i.e., eDT (electronic data transfer)
    • CHEM for biochemistry (20807 records), along with 4 other labs
  – No annotated CRF
    • Mapping document initially authored using variable labels
Case Study (cont’d)
• Sponsor decisions:
  – Match standard results with original results, i.e., no unit conversion; therefore, LBSTRESC = LBORRES
  – Mapping of LPARM to LBTEST and LBTESTCD will be done through a sponsor-supplied lookup table
Easy enough, right?
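On paper the mapping is straightforward. A minimal Python sketch of the two sponsor decisions, assuming CHEM.LVALUE is the source of LBORRES. The lookup entries are hypothetical: GLUC and SODIUM are CDISC test codes, but the sponsor's actual table is not shown. Nothing in this sketch checks whether LVALUE is numeric.

```python
# Hypothetical lookup table standing in for the sponsor-supplied one.
LPARM_LOOKUP = {
    "GLUCOSE": ("Glucose", "GLUC"),
    "SODIUM": ("Sodium", "SODIUM"),
}


def map_chem_record(rec: dict) -> dict:
    """Map one CHEM record to a few LB variables (assumes LVALUE holds the result)."""
    # A KeyError here means an LPARM value is missing from the lookup table.
    lbtest, lbtestcd = LPARM_LOOKUP[rec["LPARM"]]
    return {
        "LBTEST": lbtest,
        "LBTESTCD": lbtestcd,
        "LBORRES": rec["LVALUE"],
        # Sponsor decision: no unit conversion, so the standard result equals the original.
        "LBSTRESC": rec["LVALUE"],
    }


print(map_chem_record({"LPARM": "GLUCOSE", "LVALUE": "5.4"}))
```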
Case Study (cont’d)
• Programmer noticed errors
  – LBSTRESN is a numeric variable, but CHEM.LVALUE contains non-numeric data
• Programmer determined that the mapping specifications document was not detailed enough and began to involve the analyst
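The problem surfaces as soon as LBSTRESN is derived from LVALUE. A Python sketch, assuming non-numeric results leave LBSTRESN missing and are collected for review; the sample values are hypothetical.

```python
def to_lbstresn(lvalue: str):
    """Return LVALUE as a float, or None when it is not numeric (LBSTRESN missing)."""
    try:
        return float(lvalue)
    except ValueError:
        return None


values = ["5.4", "140", "<0.5", "NOT DONE", "12.0"]
rejects = [v for v in values if to_lbstresn(v) is None]
print(rejects)  # ['<0.5', 'NOT DONE']
```

Python's float() also accepts strings such as "nan", "inf", and "1e3", so a stricter definition of “numeric” may be wanted in practice.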
Case Study (cont’d)
• Let’s look at some options at their disposal (novice to veteran):
  – SAS System Viewer
  – A creative method by an Excel-savvy user
  – SAS PROC FREQ
Case Study (cont’d)
• SAS System Viewer
  – Read-only, great for displaying data
  – Unreliable as a data browser
• Analyze data in Excel
  – Very manual
  – Change of data ownership; possibly “lost in translation”?
  – “Smart” behaviors, e.g., “01JAN2009 12:00” becomes “1/1/2009 12:00:00 PM”, auto-trimming, etc.
• SAS PROC FREQ
  – CHEM.LVALUE: 20807 records reduced to 1237 unique values
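Outside SAS, the same reduction to unique values can be approximated with a frequency count. A rough Python analogue of a one-way PROC FREQ table on LVALUE, using hypothetical sample values:

```python
from collections import Counter


def value_frequencies(values):
    """Count occurrences of each distinct value, like a one-way frequency table."""
    return Counter(values)


lvalues = ["5.4", "140", "5.4", "<0.5", "140", "5.4"]  # hypothetical sample
freq = value_frequencies(lvalues)
for value, count in freq.most_common():
    print(value, count)
print(len(freq), "unique values")
```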
Case Study (cont’d)
• 4th option
  – A data pattern analyzer
Case Study (cont’d)
• The data pattern analyzer reduced 20807 records to only 11 patterns
Aha, we found the needle in the haystack! 0.3% of LVALUE is not numeric.
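The slides do not describe how the analyzer derives its patterns. One common approach, sketched here in Python as an assumption rather than the tool's actual method, replaces every digit with 9 and every letter with A, keeps all other characters, and counts the resulting shapes. The sample values are hypothetical.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Collapse a value to its shape: ASCII digits -> 9, letters -> A, others kept."""
    shape = []
    for ch in value:
        if "0" <= ch <= "9":
            shape.append("9")
        elif ch.isalpha():
            shape.append("A")
        else:
            shape.append(ch)
    return "".join(shape)


def pattern_profile(values):
    """Count how many values share each pattern."""
    return Counter(value_pattern(v) for v in values)


lvalues = ["5.4", "140", "6.1", "<0.5", "NEG"]
print(pattern_profile(lvalues).most_common())
# [('9.9', 2), ('999', 1), ('<9.9', 1), ('AAA', 1)]
```

Shapes such as <9.9 or AAA stand out immediately against the purely numeric ones, however many records share them.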
Case Study (cont’d)
• Drilled down to the actual values with non-numeric data patterns
Through issue/resolution with the sponsor, added detailed instructions for LVALUE to accommodate the non-numeric values
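Drilling down means going from a suspicious pattern back to the raw values behind it. A Python sketch, assuming a “numeric” value is an optional sign followed by a plain decimal number; every other distinct value is listed with its count. The sample values are hypothetical.

```python
import re
from collections import Counter

# Assumption: an optional sign and a plain decimal number count as numeric.
PLAIN_NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)", re.ASCII)


def non_numeric_values(values):
    """Count each distinct value that is not a plain decimal number."""
    return Counter(v for v in values if not PLAIN_NUMBER.fullmatch(v.strip()))


lvalues = ["5.4", "140", "<0.5", ">500", "<0.5", "HEMOLYZED"]
for value, count in non_numeric_values(lvalues).most_common():
    print(value, count)
```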
Another Data Pattern Example #1
• Source: Character variable AEV.STOP (AE stop date), being mapped to AE
• Realized the source is “somewhat” a free-form field
  – Critical data point; must handle case by case using the regular expression (regex) technique
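The actual contents of AEV.STOP are not shown, so the formats below are hypothetical. A Python sketch of the case-by-case regex approach, assuming the target is an ISO 8601 date such as SDTM's AEENDTC and that anything no rule matches goes back for manual review:

```python
import re
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
MON = "(" + "|".join(MONTHS) + ")"

# One regex per (hypothetical) observed format, tried in order.
FULL_DATE = re.compile(r"(\d{1,2})\s*" + MON + r"\s*(\d{4})")  # 05JAN2009, 5 JAN 2009
MONTH_YEAR = re.compile(MON + r"\s*(\d{4})")                    # JAN2009
YEAR_ONLY = re.compile(r"(\d{4})")                              # 2009


def stop_to_iso8601(raw: str):
    """Return an ISO 8601 date (possibly partial), or None if no rule applies."""
    text = raw.strip().upper()
    m = FULL_DATE.fullmatch(text)
    if m:
        try:
            return date(int(m[3]), MONTHS.index(m[2]) + 1, int(m[1])).isoformat()
        except ValueError:  # impossible date such as 31FEB2009
            return None
    m = MONTH_YEAR.fullmatch(text)
    if m:
        return f"{m[2]}-{MONTHS.index(m[1]) + 1:02d}"  # day unknown: keep it partial
    m = YEAR_ONLY.fullmatch(text)
    if m:
        return m[1]
    return None  # e.g. "ONGOING" needs a case-by-case decision


for raw in ["05JAN2009", "jan2009", "2009", "ONGOING"]:
    print(raw, "->", stop_to_iso8601(raw))
```

Keeping each format in its own regex makes it easy to add a rule when review turns up a new variant, instead of growing one unreadable pattern.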
Another Data Pattern Example #2
• Source: Character variable DOSE.DOSE_ACT (actual dose), being mapped to EX
• Realized the source does not always contain numbers
  – Used both EX.EXDOSE and EX.EXDOSTXT
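In SDTM, EXDOSE is numeric, while EXDOSTXT holds dose information in text form, e.g., a range such as 200-400. A Python sketch of that split, assuming DOSE_ACT arrives as a string and any value Python's float() rejects goes to EXDOSTXT; the sample values are hypothetical.

```python
def split_dose(dose_act: str):
    """Return (EXDOSE, EXDOSTXT) for one DOSE_ACT value; at most one is populated."""
    text = dose_act.strip()
    if not text:
        return None, None  # no dose recorded
    try:
        return float(text), None  # numeric dose -> EXDOSE
    except ValueError:
        return None, text  # non-numeric dose -> EXDOSTXT


for raw in ["100", "200-400", " ", "UNKNOWN"]:
    print(repr(raw), "->", split_dose(raw))
```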
Wrapping Up
• Integrated data profiling – a tool demo
• The bigger picture:
  – Data rules (e.g., pre-defined business rules, data standards, etc.)
  – Data corrections
• Although ETL is a solution platform for CDISC SDTM data conversion, too much of it is a symptom of a problem
Thank you!
Anthony Chow
(610) 535-6500 x5526