SDTM Validation: Methodologies and Tools, Bay Area CDISC Implementation Network Meeting, Friday, April 30th, 2010, Dan Shiu
Disclaimer
The ideas and examples presented here do NOT imply:
– They have been or will be implemented at Amgen
– They have not been or will not be implemented at Amgen
– Amgen agrees or disagrees with them
The ideas and examples presented here DO represent:
– My personal views
– My sweat and blood
Regulations, Guidance, and Expectations on SDTM Validation
FDA 21 CFR Part 11 applies to computer systems (e.g.
Base SAS) but not to use/output of the systems (e.g.
SAS programs/datasets)
FDA Guidance for Industry: Study Data Specifications
for electronic submission – data tabulation datasets
should follow SDTMIG
FDA website: SDTM Validation Specifications –
validation checks from FDA software tools (Janus)
Data submitted to a regulatory agency is expected to be complete and accurate, regardless of the regulatory requirement
SDTM Validation Categories
SDTM Mapping Validation
– Raw Data → Mapping Specifications/aCRF → Programming
→ SDTM Data
– Verify raw data is CORRECTLY and TRUTHFULLY converted
to SDTM data
SDTM Compliance Checks
– Rules have been developed to ensure the software used by
FDA (WebSDM™ by PhaseForward) can check and load the
submitted SDTM data into their data warehouse (Janus)
– Each rule carries a degree of severity for non-compliance – in
the worst case may result in refusal to file
SDTM Mapping Validation vs. Compliance Checks
Diagram: SDTM Mapping Validation and SDTM Compliance Checks, illustrated with example rules:
The QS domain is not intended for use in submitting diaries capturing routine study data
Measurement, Test, or Examination values must have consistent standard unit value (--STRESU) across all records in EG, LB, QS, VS
Start Date/Time of Observation (--STDTC) must be less than or equal to End Date/Time of Observation (--ENDTC)
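Two of the example rules above are easy to picture as code. The following is a minimal Python sketch, not the WebSDM or Janus implementation. It assumes records are held as a list of dictionaries, that dates are ISO 8601 strings without time zones, and that "consistent standard unit" means one --STRESU value per test code; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def check_start_before_end(records, start_var, end_var):
    """Return records whose start date/time is later than the end date/time."""
    failures = []
    for rec in records:
        start = rec.get(start_var) or ""
        end = rec.get(end_var) or ""
        if not start or not end:
            continue  # missing dates are out of scope for this rule
        # Compare only the precision both values share, so a partial date
        # such as "2010-04" is not flagged against "2010-04-15".
        n = min(len(start), len(end))
        if start[:n] > end[:n]:
            failures.append(rec)
    return failures


def check_consistent_units(records, testcd_var, unit_var):
    """Return test codes that appear with more than one standard unit."""
    units = defaultdict(set)
    for rec in records:
        unit = rec.get(unit_var)
        if unit:
            units[rec[testcd_var]].add(unit)
    return {test: sorted(found) for test, found in units.items() if len(found) > 1}


# Hypothetical sample data for illustration only.
ae = [
    {"USUBJID": "S1-001", "AESTDTC": "2010-04-01", "AEENDTC": "2010-04-03"},
    {"USUBJID": "S1-002", "AESTDTC": "2010-04-10", "AEENDTC": "2010-04-02"},
    {"USUBJID": "S1-003", "AESTDTC": "2010-04", "AEENDTC": "2010-04-15"},
]
vs = [
    {"VSTESTCD": "WEIGHT", "VSSTRESU": "kg"},
    {"VSTESTCD": "WEIGHT", "VSSTRESU": "LB"},
    {"VSTESTCD": "HEIGHT", "VSSTRESU": "cm"},
]

bad_dates = check_start_before_end(ae, "AESTDTC", "AEENDTC")
mixed_units = check_consistent_units(vs, "VSTESTCD", "VSSTRESU")
print([r["USUBJID"] for r in bad_dates])
print(mixed_units)
```

The first example rule (use of the QS domain) concerns how data is mapped and is not something a data-level check like these can decide.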
SDTM Validation Methodologies
SDTM Mapping Validation
– Full Independent-programming
– Risk-based QC Process
– Characteristics-based QC Process
SDTM Compliance Checks
– WebSDM (v1.5/v2.6/v3.0)
– Janus (v1.0 Draft)
– Other SDTMIG custom checks
Full Independent-Programming
Create SDTM mapping specifications/aCRFs
Programmer creates production SDTM datasets based on mapping specifications/aCRFs
QC role creates QC SDTM datasets based on the same mapping specifications/aCRFs
PROC COMPARE production vs. QC SDTM datasets
Resolve discrepancies until production SDTM matches with QC SDTM
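The comparison step above is done with PROC COMPARE in SAS. As a rough illustration of what that step checks, here is a minimal Python sketch, assuming both datasets are lists of dictionaries and that the key variables uniquely identify a record. The function name and sample data are hypothetical, and this is not a substitute for PROC COMPARE.

```python
def compare_datasets(production, qc, key_vars):
    """List discrepancies between production and QC datasets.

    Each discrepancy is (key, variable or message, production value, QC value).
    Assumes key_vars uniquely identify a record; duplicates would be collapsed.
    """
    def index(rows):
        return {tuple(row[k] for k in key_vars): row for row in rows}

    prod_idx, qc_idx = index(production), index(qc)
    discrepancies = []
    for key in sorted(prod_idx.keys() | qc_idx.keys()):
        if key not in qc_idx:
            discrepancies.append((key, "record missing from QC", None, None))
        elif key not in prod_idx:
            discrepancies.append((key, "record missing from production", None, None))
        else:
            prod_row, qc_row = prod_idx[key], qc_idx[key]
            for var in sorted(prod_row.keys() | qc_row.keys()):
                if prod_row.get(var) != qc_row.get(var):
                    discrepancies.append((key, var, prod_row.get(var), qc_row.get(var)))
    return discrepancies


# Hypothetical sample data for illustration only.
production = [
    {"USUBJID": "S1-001", "AESEQ": 1, "AETERM": "HEADACHE"},
    {"USUBJID": "S1-002", "AESEQ": 1, "AETERM": "NAUSEA"},
]
qc = [
    {"USUBJID": "S1-001", "AESEQ": 1, "AETERM": "HEADACHE"},
    {"USUBJID": "S1-002", "AESEQ": 1, "AETERM": "Nausea"},
]

diffs = compare_datasets(production, qc, ["USUBJID", "AESEQ"])
for diff in diffs:
    print(diff)
```

An empty result corresponds to the stopping condition on the slide: production matches QC.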
Issues with Full Independent-Programming
Result is still dependent and biased – production and QC are both built from the same mapping specifications/aCRFs
Inconsistent QC process across
products/studies/milestones
QC not based on risk – spend more time on
less important/risky issues
Double resources – programmers, code, datasets, documentation
Inefficiency – delayed deliverables
Risk-based QC
Not all uses of SDTM data are equally
important
Not all programming steps are equally error-prone
Align QC efforts with the intended use of
SDTM as well as the programming steps used
to produce data
Spend most of your QC resources on data with
the greatest business/quality risk!
Risk-based QC Concept
Risk Assessment Examples – Complexity
Programming Complexity
Low
- No pooling or merging of data
- No calculations or derivations
- Basic data steps and sorting
Medium
- Simple data merges
- Simple pre-processing of data, sub-setting, where/if clauses, retains, arrays, transposing
- Steps involving validated/standard macros
High
- Complex merging data across various source data
- Complex derivation and calculation of data
Risk Assessment Examples – Intended Use
Intended Use of SDTM Data
Low
- Internal use only
- Not to be used for major business decisions
Medium
- Data/safety review
- Non-endpoint data
High
- Regulatory submission
- Primary analysis/final CSR
- Endpoint safety and efficacy data
Risk-based QC Method Examples
Method | Responsibility | Time Needed
Log Review – use automated log checking utility to detect potential errors | Programmer, QC Role | Short
Code Review – line-by-line review of code and log | QC Role, designated group | Medium
Requirements/Specifications Review – comparison of SDTM data with specifications/aCRF | Programmer, QC Role, Statistician | Medium
Spot Check Review – ad hoc programming/visual checks on SDTM/raw data | QC Role, Statistician | Medium
Independent Programming – programming to produce matching datasets | QC Role | Long
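The slides do not say which log checking utility is meant. As a minimal sketch of the idea, the Python below scans SAS log lines for messages that often signal a problem. The pattern list is an illustrative assumption rather than a complete or official set, and the function name and sample log are hypothetical.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive list).
LOG_PATTERNS = [
    re.compile(r"^ERROR(\s+\d+-\d+)?:"),
    re.compile(r"^WARNING(\s+\d+-\d+)?:"),
    re.compile(r"^NOTE: .*uninitialized", re.IGNORECASE),
    re.compile(r"^NOTE: MERGE statement has more than one data set with repeats of BY values"),
    re.compile(r"^NOTE: .*values have been converted"),
]


def scan_log(lines):
    """Return (line number, text) for every log line matching a pattern."""
    findings = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in LOG_PATTERNS):
            findings.append((number, line.rstrip()))
    return findings


# Hypothetical log excerpt for illustration only.
sample_log = [
    "NOTE: The data set WORK.AE has 120 observations and 25 variables.",
    "NOTE: Variable AESTDY is uninitialized.",
    "WARNING: Multiple lengths were specified for the variable AETERM by input data set(s).",
    "NOTE: DATA statement used (Total process time):",
    "ERROR: File WORK.DM.DATA does not exist.",
]

findings = scan_log(sample_log)
for number, text in findings:
    print(number, text)
```

Because a scan like this is quick to run, it fits the "Short" time estimate given for Log Review in the table.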
Risk Matrix Examples
Rows: Complexity of Program. Columns: Intended Use (Business Risk/Impact of Error).
Complexity | Intended Use: Low | Intended Use: Medium | Intended Use: High
High | 1. Log Review; 2. Requirements/Specifications Review; 3. Code Review | 1. Log Review; 2. Requirements/Specifications Review; 3. Spot Check Review; 4. Code Review | 1. Log Review; 2. Requirements/Specifications Review; 3. Independent Programming
Medium | 1. Log Review; 2. Requirements/Specifications Review | 1. Log Review; 2. Requirements/Specifications Review; 3. Spot Check Review | 1. Log Review; 2. Requirements/Specifications Review; 3. Spot Check Review; 4. Code Review
Low | 1. Log Review; 2. Requirements/Specifications Review | 1. Log Review; 2. Requirements/Specifications Review; 3. Spot Check Review | 1. Log Review; 2. Requirements/Specifications Review; 3. Spot Check Review
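The matrix above can be held as a simple lookup table so that the QC plan for a dataset follows directly from its two risk ratings. This is a minimal Python sketch. It assumes the cells of the matrix run Low, Medium, High from left to right under Intended Use, and the names `QC_MATRIX` and `qc_methods` are hypothetical.

```python
LOG = "Log Review"
SPEC = "Requirements/Specifications Review"
SPOT = "Spot Check Review"
CODE = "Code Review"
INDEP = "Independent Programming"

# Keys are (programming complexity, intended use).
QC_MATRIX = {
    ("High", "Low"): [LOG, SPEC, CODE],
    ("High", "Medium"): [LOG, SPEC, SPOT, CODE],
    ("High", "High"): [LOG, SPEC, INDEP],
    ("Medium", "Low"): [LOG, SPEC],
    ("Medium", "Medium"): [LOG, SPEC, SPOT],
    ("Medium", "High"): [LOG, SPEC, SPOT, CODE],
    ("Low", "Low"): [LOG, SPEC],
    ("Low", "Medium"): [LOG, SPEC, SPOT],
    ("Low", "High"): [LOG, SPEC, SPOT],
}


def qc_methods(complexity, intended_use):
    """Return the ordered QC methods for one dataset's risk ratings."""
    return QC_MATRIX[(complexity, intended_use)]


# Example: a complex derivation feeding a regulatory submission.
print(qc_methods("High", "High"))
# Example: a simple mapping used for internal review only.
print(qc_methods("Low", "Low"))
```

In this example matrix, Independent Programming is reserved for the single cell where both complexity and intended-use risk are High.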
Characteristics-based QC
SDTM Mapping Validation: Full Independent-programming, Risk-based QC
Raw Data → Mapping Specifications/aCRF → Programming → SDTM Data
Are these the best ways?
Characteristics-based QC Concept
Each data element has characteristics
Characteristics describe a data element as whole
If all characteristics match, data elements match
If all data elements match, raw data is CORRECTLY and TRUTHFULLY converted to SDTM
“Grandma, what big eyes you have!”
“Grandma, what big ears you have!”
“Grandma, what big teeth you have!”
Data Element Examples
Data Element: a group of data, regardless of datasets, variables, records, attributes, that together represent a precise meaning or semantics
– CDISC SHARE Project: The vision for CDISC SHARE is to build a global, accessible electronic library, which through advanced technology, enables precise and standardized data element definitions that can be used in applications and studies to improve biomedical research and its link with healthcare.
Age Element: USUBJID, AGE
Race Element: USUBJID, RACE, SUPPDM.QNAM=“RACEOTH”, QVAL
AE Term Element: USUBJID, AETERM, AEDECOD
SF36 Score Element: USUBJID, QSCAT=“SF36”, QSORRES, QSSTRESC, QSSTRESN, QSSTAT, QSREASND
Data Element Characteristics
Numeric Characteristics – Descriptive Statistics: can be generated from PROC SUMMARY, PROC MEANS, PROC UNIVARIATE
– N, NMISS, MIN, MAX, MEAN, MODE
– SUM, RANGE, VAR, STD, STDMEAN
– Coefficient of Variation, Skewness, Kurtosis
Character Characteristics
– FREQ, NOBS, min/max length
– Checksum: e.g. odd parity bit, a simplified algorithm
“Pain” = 01010000 01100001 01101001 01101110
Count the number of 1s → 14
To keep odd parity, set the parity bit to 1 (14 + 1 = 15 is odd) → checksum = 1
If all checksums match → all character values match
If statistics of all checksums match → all character values match
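The parity example above can be reproduced in a few lines of Python. This is a minimal sketch that assumes ASCII encoding; the function names are hypothetical. A single parity bit only detects an odd number of flipped bits, so two different values can share the same parity. A real implementation would probably use a stronger checksum such as CRC-32.

```python
def bit_string(text, encoding="ascii"):
    """Return the value as a string of 8-bit binary groups."""
    return "".join(f"{byte:08b}" for byte in text.encode(encoding))


def count_ones(text, encoding="ascii"):
    """Count the 1 bits in the encoded value."""
    return bit_string(text, encoding).count("1")


def odd_parity_bit(text, encoding="ascii"):
    """Return the bit that makes the total number of 1s odd."""
    return 1 if count_ones(text, encoding) % 2 == 0 else 0


print(bit_string("Pain"))
print(count_ones("Pain"))
print(odd_parity_bit("Pain"))
```

Running the same function over the raw and SDTM values of a character element gives two lists of checksums that can then be compared directly or through their summary statistics.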
Characteristics-based QC Examples
QC on Age Element
– From raw data: demog.age_raw
– From SDTM: DM.AGE
– Compare: N, MIN, MAX, MEAN, MODE, SUM, STD
QC on AE Term Element
– From raw data: adverse.subjectid, adverse.aevt, adverse.aept
– From SDTM: AE.USUBJID, AE.AETERM, AE.AEDECOD
– Compare: FREQ, NOBS, min/max length, checksum
QC on SF36 Score Element
– From raw data: sf36.subjectid, sf36.score_raw, sf36.cmt
– From SDTM: QS.USUBJID, QS.QSCAT=“SF36”, QS.QSORRES, QS.QSSTRESC, QS.QSSTRESN, QS.QSSTAT, QS.QSREASND
– Compare numeric: N, NMISS, MIN, MAX, MEAN, MODE, SUM, STD, RANGE
– Compare character: FREQ, NOBS, min/max length
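The Age example above can be sketched in Python as follows. In SAS the same statistics would come from PROC SUMMARY, PROC MEANS, or PROC UNIVARIATE. This sketch assumes values are held in plain lists with `None` for missing, and that each list has at least two non-missing values. The function names and sample ages are hypothetical.

```python
import math
import statistics


def numeric_characteristics(values):
    """Compute descriptive statistics for one numeric data element."""
    present = [v for v in values if v is not None]
    return {
        "N": len(present),
        "NMISS": len(values) - len(present),
        "MIN": min(present),
        "MAX": max(present),
        "MEAN": statistics.mean(present),
        "MODE": statistics.mode(present),
        "SUM": sum(present),
        "STD": statistics.stdev(present),  # sample standard deviation (n - 1)
    }


def compare_characteristics(raw, sdtm, rel_tol=1e-9):
    """Return the characteristics that differ, as name -> (raw, sdtm)."""
    return {
        name: (raw[name], sdtm[name])
        for name in raw
        if not math.isclose(raw[name], sdtm[name], rel_tol=rel_tol)
    }


# Hypothetical values: the last age was mistyped (62 became 26) in conversion.
age_raw = [34, 51, 47, 51, None, 62]
dm_age = [34, 51, 47, 51, None, 26]

mismatches = compare_characteristics(
    numeric_characteristics(age_raw), numeric_characteristics(dm_age)
)
print(sorted(mismatches))
```

Record order does not affect these statistics, so the raw and SDTM data do not need to be sorted or merged before comparison.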
Characteristics-based QC Benefits
Data element characteristics exist as soon as
data is created/refreshed
Characteristics-based QC is an extension of
risk-based QC in a more consistent way
Characteristics-based QC can be applied to all
end-to-end data conversion processes (e.g.
raw to SDTM, SDTM to ADaM)
Characteristics-based QC can be automated!
SDTM Compliance Checks
Diagram: Raw Data → SDTM (Mapping Validation)
SDTM Validation and Loading at FDA
Diagram (flow, reconstructed from its labels):
Sponsor: SDTM, Define.xml, eCTD
Electronic Submission → FDA Electronic Document Room
Data Validation and Loading: WebSDM Checks, JANUS Checks → JANUS Data Repository
Outcomes: Pass; Communication / Refuse to File; Communication
Review: FDA Review Tools (JMP, J-Review, WebSDM, etc.)
WebSDM v3.0 Checks
154 rules based on SDTMIG 3.1.2
Checks apply to data (classes, domains, variables, values) and metadata (define.xml, SDTM Terminology.xls)
Severity (Low, Medium, High) is only an indicator of potential problems or anomalies in the data. There is no direct correlation between a severity value and an FDA decision about whether the data is acceptable for review or not.
Janus v1.0 (Draft) Checks
109 rules based on SDTMIG 3.1.1
Overlap with WebSDM rules but with different
definition of the severity levels
Severity | Description
High | The error is serious and will prevent the study data from being loaded successfully into the Janus repository. The SDTM study will not be loaded into the Janus repository.
Medium | The error may impact the reviewability of the submission, but will not have an impact on loading the study data into the Janus repository. The SDTM study will be loaded into the Janus repository.
Low | The error may or may not impact the reviewability or the integrity of the submission but will not have an impact on loading the study data into the Janus repository. The SDTM study will be loaded into the Janus repository.
WebSDM vs. Janus – Severity
WebSDM and Janus may assign different severity levels for the same rule
Custom SDTMIG Compliance Checks
WebSDM/Janus checks cannot cover all of the
explicit/implicit rules in SDTMIG:
– 8/40/200 character limitation check
– USUBJID value must be unique for each subject
across all trials in the submission
– IDVAR (variable), IDVARVAL (record) reference
check against parent domain for CO
– ISO 8601 format check on Duration, Elapsed Time,
and Interval values
– And many more ……
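Two of the custom checks listed above can be sketched in Python. This is a minimal sketch under stated assumptions: the 8/40/200 limits are taken to mean variable names, variable labels, and character values respectively, and the duration pattern is a simplified reading of the ISO 8601 PnYnMnDTnHnMnS and PnW forms that does not cover intervals. All function names and sample data are hypothetical.

```python
import re

# Simplified ISO 8601 duration pattern (an assumption, not the full standard).
ISO_DURATION = re.compile(
    r"P(?=\d|T\d)(\d+Y)?(\d+M)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?"
    r"|P\d+W"
)


def is_iso8601_duration(value):
    """Return True if the whole value matches the simplified pattern."""
    return ISO_DURATION.fullmatch(value) is not None


def check_length_limits(variables, records):
    """Check name (8), label (40), and character value (200) length limits.

    variables maps variable name to label; records is a list of dicts.
    """
    findings = []
    for name, label in variables.items():
        if len(name) > 8:
            findings.append(("name > 8", name))
        if len(label) > 40:
            findings.append(("label > 40", name))
    for number, record in enumerate(records, start=1):
        for name, value in record.items():
            if isinstance(value, str) and len(value) > 200:
                findings.append(("value > 200", f"{name} in record {number}"))
    return findings


# Hypothetical metadata and data for illustration only.
variables = {
    "AETERM": "Reported Term for the Adverse Event",
    "AESTARTDATE": "Start Date",
}
records = [{"AETERM": "X" * 201}]

length_findings = check_length_limits(variables, records)
print(length_findings)
for candidate in ["P2Y", "PT30M", "P3W", "P1DT", "2 WEEKS"]:
    print(candidate, is_iso8601_duration(candidate))
```

Checks such as USUBJID uniqueness across trials or the IDVAR/IDVARVAL reference check for CO need more than one dataset at a time and are not shown here.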
Tools for SDTM Compliance Checks
Proprietary Software: WebSDM™ from Phase Forward, …., etc.
Free Software:
– OpenCDISC Validator
Direct-download and installation on PC
Graphic user interface
Reporting in Excel, CSV, and HTML
– SAS Clinical Standards Toolkit
PC/UNIX installation support from IT
Interactive/Batch SAS programming interface
Reporting functions not provided but can be custom-built
SAS CST is a framework including:
– Directory structure
– Metadata: datasets, format catalog, XML, Excel
– Data: datasets, format catalog, XML, Excel
– Source code: SAS programs/macros
Tools Comparison
Feature | OpenCDISC Validator | SAS Clinical Standards Toolkit
Installation | User direct-download; PC/USB flash drive, tweak on UNIX | IT/SAS administrator support; PC (9.1.3/9.2) and UNIX (9.2)
Interface | Graphic user interface | Interactive/Batch SAS programming interface
Supported Standards / Features | Validate SDTMIG 3.1.1/3.1.2 based on WebSDM v3/Janus v1 draft; Additional custom checks; CDISC-NCI Terminology; Generate/Validate define.xml based on CRTDDS v1 | Validate SDTMIG 3.1.1 based on WebSDM v2.6/Janus v1 draft; Additional custom checks; CDISC-NCI Terminology; Generate/Validate define.xml based on CRTDDS v1
Reporting | Excel/CSV/HTML reports; Can only limit number of occurrences per rule; WebSDM/Janus rule ID on website but not on reports; Severity levels follow Janus | Results in SAS datasets; Can limit number of occurrences per rule/dataset/actual value; WebSDM/Janus ID in results; Severity levels follow WebSDM/Janus
Tools Comparison (Cont’d)
Feature | OpenCDISC Validator | SAS Clinical Standards Toolkit
Processing | Real memory; Check on SAS transport XPT or other delimited text files | Disk and real memory; Redundant processing steps; Check on SAS datasets
Performance | Fair (hours) for small studies but potential memory crash for large studies | To be improved (1+ day)
Maintenance | Open XML code for configuration; Open Java code on website; Standard/Custom metadata in XML/Excel | Open source SAS code/configuration; Standard/Custom metadata in SAS datasets
Flexibility | Need XML/Java expertise for any customization/enhancement | Select/Deselect rules to check in SAS code; Build custom checks with SAS code; Build graphic user interface in SAS/Excel
Documentation | Website Instructions | Installation Instructions; IQ/OQ document; Examples/Exercises; User’s Guide
Technical Support | Website forum | SAS technical support from phone/email/website
References and Contact
FDA Guidance for Industry, Part 11, Electronic Records; Electronic Signatures – Scope and Application: http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm072322.pdf
FDA Guidance for Industry, Study Data Specifications (v1.5.1): http://www.fda.gov/downloads/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/ElectronicSubmissions/UCM199759.pdf
WebSDM Checks: http://www.phaseforward.com/products/cdisc
Janus Checks: http://www.fda.gov/ForIndustry/DataStandards/StudyDataStandards/ucm155327.htm
OpenCDISC Validator: http://www.opencdisc.org
SAS Clinical Standards Toolkit: http://ftp.sas.com/techsup/download/hotfix/12clintlkt.html
Contact Information: [email protected]