A Day in the Life of an RDF Curator
Josephine Anne Gough, Information Architect Team Lead
F. Hoffmann-La Roche Ltd, Switzerland
About Work on the Roche Global Data Standards Repository
Who am I?
• My name is Amy, Didier, Robin, Ivan
• I am an Information Architect working in the Roche Data Standards Office
• I used to work in Data Management / Statistical Programming
• I found out about RDF and Semantic technology from the GDSR
• I took a leap of faith and changed my career direction to work in RDF
What is RDF?
• RDF is a really cool technology
• It’s been around for 15 years
• Resource Description Framework: we can describe all of our Data Standards in abstract models and then fill those models with content
• RDF means we can describe our standards in a machine-readable format
• RDF means we can uniquely identify things
• RDF means that we can link pieces of information and even whole models together
• RDF… is just really cool when you get to know it… you get hooked.
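The ideas in these bullets (unique identification, description, linking) can be sketched in a few lines of plain Python without any RDF library. This is a toy illustration, not how the GDSR stores its data: each statement is a (subject, predicate, object) tuple, every resource is a URI, and a shared URI is what links one statement to the next. The `http://example.org/gdsr/` namespace, the `mapsTo` predicate and the resource names are hypothetical; only `rdfs:label` is real W3C vocabulary.

```python
# Toy model of RDF: each statement is a (subject, predicate, object) tuple.
# The EX namespace, mapsTo predicate and resource names are hypothetical.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/gdsr/"

triples = {
    # A CRF field, uniquely identified by its URI, with a human-readable label...
    (EX + "MH_Form_Field_TERM", RDFS + "label", "Medical History Term"),
    # ...linked to an SDTM variable, which is itself a resource with a URI.
    (EX + "MH_Form_Field_TERM", EX + "mapsTo", EX + "SDTM_MHTERM"),
    (EX + "SDTM_MHTERM", RDFS + "label", "Reported Term for the Medical History"),
}


def objects(subject, predicate):
    """Return every object stated for a subject/predicate pair, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


# Follow the link from the CRF field to the SDTM variable and read its label.
target = objects(EX + "MH_Form_Field_TERM", EX + "mapsTo")[0]
print(objects(target, RDFS + "label"))  # ['Reported Term for the Medical History']
```

Because the SDTM variable is a resource rather than a string, anything else that points at the same URI is automatically linked to it.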
Demo of RDF
• Short demo of RDF in TopBraid to show linking of information through an RDF network.
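The TopBraid demo can't be reproduced here, but the idea it shows, following links from one resource through the network, is essentially a graph traversal. A minimal sketch, assuming a tiny hypothetical graph with made-up `ex:` identifiers (not real GDSR resources), walks outgoing links breadth-first and reports how many hops away each reachable resource is:

```python
from collections import deque

# Hypothetical mini-network; the ex: identifiers are illustrative only.
triples = [
    ("ex:AlbuminTest", "ex:hasUnit", "ex:Unit_g_per_L"),
    ("ex:AlbuminTest", "ex:sdtmTestCode", "ex:LBTESTCD_ALB"),
    ("ex:LBTESTCD_ALB", "ex:inCodelist", "ex:Codelist_LBTESTCD"),
    ("ex:Codelist_LBTESTCD", "ex:publishedBy", "ex:CDISC"),
    ("ex:AsthmaQuestionnaire", "ex:publishedBy", "ex:Roche"),
]


def reachable(start, graph):
    """Breadth-first walk along outgoing links; returns {resource: hops}."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subject, _predicate, obj in graph:
            if subject == node and obj not in hops:
                hops[obj] = hops[node] + 1
                queue.append(obj)
    return hops


for resource, distance in reachable("ex:AlbuminTest", triples).items():
    print(distance, resource)
```

Starting from the albumin test, the walk reaches the codelist in two hops and its publisher in three, while the unconnected questionnaire is never visited.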
What is the GDSR?
• The GDSR is Roche’s RDF dataset where all Data Standards are defined
• GDSR stands for Global Data Standards Repository
• The GDSR holds CDISC, Roche and other standards such as Questionnaires, Biomarkers, Lab data.
• It has a browser and web services… but I’ll get to that later
• It is used by anyone in Roche who needs information about Clinical Data Standards or wants to download products based on them
• It is available to the business 24/7, 365 days a year
• Our job is basically to look after the content and exports from the GDSR
Roche GDSR
GDSR Menu
Data Collection homepage
SDTM VS Domain
CDISC Controlled Terminology
Labs filtering for Albumin
Asthma Questionnaire
GDSR Search
Today
• My boss “Cheffe” keeps me really busy
• But she’s nice: she makes sure that training is part of the job
• And boy, is there a lot to learn!
TRAINING
Technical Training
SPARQL: SPARQL Protocol and RDF Query Language
RDF: Resource Description Framework
HTTP: Hypertext Transfer Protocol
REST: Representational State Transfer
XML: Extensible Markup Language
URI: Uniform Resource Identifier
RDFS: RDF Schema
XSLT: Extensible Stylesheet Language Transformations
OWL: Web Ontology Language
SKOS: Simple Knowledge Organization System
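Of these, SPARQL is the one used most day to day. At its core a SPARQL SELECT matches a basic graph pattern: a set of triple patterns with ?variables that must all be satisfied by one consistent set of bindings. The toy evaluator below is a sketch of that idea only (no FILTER, OPTIONAL or indexing, unlike a real engine such as the one in TopBraid), over hypothetical `ex:` data:

```python
# Toy evaluation of a SPARQL basic graph pattern over in-memory triples.
# A sketch only: no FILTER, OPTIONAL or indexes. All ex: names are hypothetical.
triples = [
    ("ex:MH_TERM_field", "ex:mapsTo", "ex:MHTERM"),
    ("ex:MHTERM", "rdfs:label", "Reported Term for the Medical History"),
    ("ex:AE_TERM_field", "ex:mapsTo", "ex:AETERM"),
    ("ex:AETERM", "rdfs:label", "Reported Term for the Adverse Event"),
]


def is_var(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Return binding extended so that pattern matches triple, or None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if new.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            new[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return new


def evaluate(bgp, graph):
    """Join triple patterns left to right, keeping only consistent bindings."""
    solutions = [{}]
    for pattern in bgp:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Equivalent SPARQL:
#   SELECT ?field ?label WHERE { ?field ex:mapsTo ?var . ?var rdfs:label ?label }
query = [("?field", "ex:mapsTo", "?var"), ("?var", "rdfs:label", "?label")]
for row in evaluate(query, triples):
    print(row["?field"], "->", row["?label"])
```

The shared ?var is what joins the two patterns: it links each CRF field to the label of the SDTM variable it maps to.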
Tools Training
TopBraid RDF development
Graph Validation
oXygen XML editor XSLT development
Graph Administration
GDSR Browser
GDSR
Administering GDSR Search
GDSR UI Models
Content Training… so glad I knew some of these already!
Roche Data Collection
SDTM
Roche extensions to CT
Roche EDC
ISO 11179
Roche extensions to SDTM
Questionnaires ePRO
CDISC Controlled Terminology
ADaM
Roche Data Analysis
Biomedical Concepts
Other skills
User Requirements
Documentation
UAT
Customer Facing
Validation and Testing
GDSR Overall Architecture
W3C Semantic Standards RDF - OWL - SKOS
ISO 11179 for Metadata Registries
CDISC Foundational Standards
Sponsor Extensions
Protocol Submission DC DT DA
HTTP REST Application Services
Overwhelming at first!
• A lot to learn
• Business needs to be aware of the learning curve
• Continuous reading/training program
• Simple steps
• Progression to more complex parts
• Each step is absorbing and interesting
• Keep the training wheel turning
My job broadly splits into three parts
• Day to day changes to the GDSR metadata
• Product development
• Modeling
CHANGES TO METADATA
Daily content changes - Jira requests
• A user sends us a request to change metadata via the Jira ticketing system.
• Today I’ve got a bunch of changes, for example:
• They want to change the CRF label on one of the Medical History forms
• The SDTM group have some new annotations
• There is a new Questionnaire to be uploaded.
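In practice the label change is made in TopBraid, but conceptually it is a delete-then-insert on the graph, the pattern SPARQL 1.1 Update expresses with DELETE/INSERT. The sketch below shows that pattern on a Python set of triples; the `ex:` field identifier and both label values are hypothetical examples, not the real Medical History form.

```python
# Sketch of a label change as delete-then-insert on a set of triples.
# The ex: identifiers and label values are hypothetical.
RDFS_LABEL = "rdfs:label"

graph = {
    ("ex:MH_Form_MHTERM", RDFS_LABEL, "Medical condition"),
    ("ex:MH_Form_MHTERM", "ex:mapsTo", "ex:MHTERM"),
}


def replace_value(graph, subject, predicate, new_value):
    """Remove every (subject, predicate, *) triple, then add the new one.

    Unlike a SPARQL DELETE/INSERT ... WHERE, this adds the new value even
    if no old value existed. Returns the removed triples so the change can
    be recorded against the ticket.
    """
    removed = {t for t in graph if t[0] == subject and t[1] == predicate}
    graph.difference_update(removed)
    graph.add((subject, predicate, new_value))
    return removed


old = replace_value(graph, "ex:MH_Form_MHTERM", RDFS_LABEL, "Medical History Term")
print(old)
```

Only the label triple changes; the field keeps its URI, so every link to it (such as the SDTM mapping) survives the edit.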
Example Jira request to IA team
Which graph do I change?
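Which graph to change is a question about named graphs: the same resource can have statements spread across several graphs. A minimal sketch, assuming the content is held as quads (subject, predicate, object, graph) and that each standard lives in its own named graph; the graph names below are hypothetical, not the GDSR's actual layout:

```python
from collections import defaultdict

# Quads: (subject, predicate, object, named graph).
# Graph names and ex: identifiers are hypothetical, not the real GDSR layout.
quads = [
    ("ex:MH_Form", "rdfs:label", "Medical History", "ex:graph/data-collection"),
    ("ex:MH_Form_MHTERM", "ex:partOf", "ex:MH_Form", "ex:graph/data-collection"),
    ("ex:MH_Form_MHTERM", "rdfs:label", "Medical condition", "ex:graph/data-collection"),
    ("ex:MH_Form_MHTERM", "ex:mapsTo", "ex:MHTERM", "ex:graph/sdtm-annotations"),
    ("ex:MHTERM", "rdfs:label", "Reported Term for the Medical History", "ex:graph/cdisc-sdtm"),
]


def graphs_about(subject, quads):
    """Map each named graph to the predicates it asserts about subject."""
    found = defaultdict(set)
    for s, p, _o, g in quads:
        if s == subject:
            found[g].add(p)
    return dict(found)


for graph, predicates in graphs_about("ex:MH_Form_MHTERM", quads).items():
    print(graph, sorted(predicates))
```

In this made-up layout a label change would go to the data-collection graph, while a new SDTM annotation would go to the annotations graph.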
Changing the CRF field label for Medical History in TopBraid
Admin Panel to load content into the GDSR
Medical History Form in GDSR browser
Adding new standards: bulk upload example (CRF)
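A bulk upload boils down to turning rows of a spreadsheet into triples, minting one URI per new item. A minimal sketch, assuming a CSV with hypothetical columns `field_id`, `label` and `sdtm_variable`; the namespaces and the `mapsTo` predicate are illustrative, not the GDSR's real upload format:

```python
import csv
import io

# Hypothetical bulk-upload sheet: the column names, namespaces and the
# mapsTo predicate are illustrative assumptions, not the GDSR's real format.
SHEET = """field_id,label,sdtm_variable
AE_TERM,Adverse Event,AETERM
AE_START,Start Date,AESTDTC
"""

CRF = "http://example.org/gdsr/crf/"
SDTM = "http://example.org/gdsr/sdtm/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
MAPS_TO = "http://example.org/gdsr/mapsTo"


def rows_to_triples(text):
    """Turn each spreadsheet row into triples, minting one URI per CRF field."""
    triples = []
    for row_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        if not row["label"].strip():
            # Reject the row rather than load a field with no label.
            raise ValueError(f"data row {row_no}: missing label")
        subject = CRF + row["field_id"]
        triples.append((subject, RDFS_LABEL, row["label"]))
        triples.append((subject, MAPS_TO, SDTM + row["sdtm_variable"]))
    return triples


for triple in rows_to_triples(SHEET):
    print(triple)
```

Validating each row before anything is loaded keeps a half-finished upload out of the graph.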
PRODUCT DEVELOPMENT
Product development
1. Web Services give XML
2. Transform XML into other formats using XSLT
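Step 2 is done with XSLT in oXygen. Python's standard library has no XSLT processor, so the sketch below swaps in `xml.etree.ElementTree` to show the same idea: read a web-service XML response and flatten it into a CSV product. The XML shape (a `codelist` element holding `term` elements with `code` and `label` attributes) is a hypothetical stand-in, not the GDSR's actual web-service schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical web-service response; the real GDSR XML schema is not shown here.
RESPONSE = """<codelist name="Vital Signs Position">
  <term code="SITTING" label="Sitting"/>
  <term code="STANDING" label="Standing"/>
  <term code="SUPINE" label="Supine"/>
</codelist>"""


def codelist_to_csv(xml_text):
    """Flatten <term> elements into CSV rows, the kind of job an XSLT would do."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["codelist", "code", "label"])
    for term in root.iter("term"):
        writer.writerow([root.get("name"), term.get("code"), term.get("label")])
    return out.getvalue()


print(codelist_to_csv(RESPONSE))
```

Keeping the web service as the single XML source means each export format is just another transformation of the same response.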
Product development in oXygen
Product for SAS programmers: lab unit conversions in CSV
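A lab unit conversion product is, at heart, a lookup table of analyte, source unit, target unit and factor, exported as CSV. The sketch below assumes that shape; the column names are hypothetical, and the two rows are textbook conversions (1 g/dL = 10 g/L; glucose in mg/dL is converted to mmol/L using its molar mass of about 180.16 g/mol) rather than values taken from the GDSR.

```python
import csv
import io

# Illustrative rows only; a real product would read factors from the GDSR.
# (analyte, from_unit, to_unit, multiply_by)
CONVERSIONS = [
    ("Albumin", "g/dL", "g/L", 10.0),
    ("Glucose", "mg/dL", "mmol/L", 10.0 / 180.16),  # x10 to mg/L, then / 180.16 mg/mmol
]


def convert(value, analyte, from_unit, to_unit):
    """Apply the matching conversion factor, or fail loudly if none exists."""
    for name, src, dst, factor in CONVERSIONS:
        if (name, src, dst) == (analyte, from_unit, to_unit):
            return value * factor
    raise KeyError(f"no conversion for {analyte}: {from_unit} -> {to_unit}")


def conversions_csv():
    """Render the conversion table as CSV text for SAS programmers."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ANALYTE", "FROM_UNIT", "TO_UNIT", "FACTOR"])
    for name, src, dst, factor in CONVERSIONS:
        writer.writerow([name, src, dst, f"{factor:.6g}"])
    return out.getvalue()


print(conversions_csv())
print(round(convert(90.0, "Glucose", "mg/dL", "mmol/L"), 2))  # 5.0
```

Raising an error for a missing pair, instead of silently returning the input, makes gaps in the table visible to the programmer.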
GDSR Export Products in Production Today
• Operational CRFs in PDF
• SDTM and codelists in Excel
• Lab analytes in Excel and CSV
• Questionnaires in Excel and CSV
• Medidata Rave ALS (EDC build)
• PK test codes in CSV and Excel format
Operational CRF
Product downloads from the GDSR
• Demo of the SDTM spreadsheet export, time permitting
GDSR Export Products in progress
• Non-CRF contracts with standard specifications
• Conformance checks
• Data Analysis TLGs
• Data Analysis VAD specifications
• eSubmission
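Conformance checks are still in progress, so the following is only a sketch of the simplest kind of check: flag any value that is not a term in the codelist its variable points to. The codelist contents and the variable-to-codelist mapping below are illustrative subsets chosen for the example, not current CDISC or Roche terminology.

```python
# Illustrative codelist subsets; the real terms would come from the GDSR.
CODELISTS = {
    "VSPOS": {"SITTING", "STANDING", "SUPINE"},
    "SEX": {"F", "M", "U"},
}


def check_conformance(records, variable_to_codelist):
    """Return (row_index, variable, value) for each value not in its codelist."""
    issues = []
    for i, record in enumerate(records):
        for variable, codelist in variable_to_codelist.items():
            value = record.get(variable)
            if value is not None and value not in CODELISTS[codelist]:
                issues.append((i, variable, value))
    return issues


data = [
    {"SEX": "F", "VSPOS": "SITTING"},
    {"SEX": "Female", "VSPOS": "SUPINE"},
    {"SEX": "M", "VSPOS": "LYING"},
]
print(check_conformance(data, {"SEX": "SEX", "VSPOS": "VSPOS"}))
# [(1, 'SEX', 'Female'), (2, 'VSPOS', 'LYING')]
```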
eSubmission CRF
MODELING
RDF Modelling
• Models are currently separated logically by domain, but all are Linked Data
• 40+ existing models in the GDSR
• New content can require new models and/or extensions
• The team has learnt from scratch
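One common way to extend a model without copying it is RDFS subclass reasoning: if a sponsor class is declared rdfs:subClassOf a CDISC class, anything typed with the sponsor class also counts as an instance of the CDISC class. A minimal sketch of that inference, with hypothetical `roche:`, `cdisc:` and `ex:` names and single inheritance assumed for brevity (real RDFS allows several superclasses):

```python
# Hypothetical hierarchy: a sponsor extension subclasses a CDISC-style class.
# Single inheritance is assumed to keep the sketch short.
SUBCLASS_OF = {
    "roche:LabTestWithConversion": "cdisc:LabTest",
    "cdisc:LabTest": "cdisc:Finding",
}
TYPES = {
    "ex:AlbuminTest": "roche:LabTestWithConversion",
    "ex:HeightMeasurement": "cdisc:Finding",
}


def superclasses(cls):
    """Follow rdfs:subClassOf upwards (stopping on cycles); includes cls itself."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF and SUBCLASS_OF[chain[-1]] not in chain:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def instances_of(cls):
    """RDFS-style inference: an instance of a subclass is an instance of cls."""
    return sorted(i for i, t in TYPES.items() if cls in superclasses(t))


print(instances_of("cdisc:LabTest"))  # ['ex:AlbuminTest']
print(instances_of("cdisc:Finding"))  # ['ex:AlbuminTest', 'ex:HeightMeasurement']
```

The extension adds one class and one subClassOf statement; everything written against the base class keeps working for the extended content.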
How to learn modeling in RDF?
RDF Apprenticeship
Know the domain
Explore existing models
Make extensions to existing models
Experiment with designing new models
Guidance and review from experienced modelers
Iterate, review with peers, try out
Modeling: Case history in apprenticeship
1. Make changes to existing Data Collection models via Jira
2. Create a whole new CRF for a new TA
3. Learn and explore CDISC SDTM models and code lists
4. See if the same model could be applied to non-CRF data
5. Do a gap analysis of use cases between SDTM and non-CRF data
6. Propose any new Classes or Predicates for the model
7. Present ideas to the team in “live” modeling workshops
8. Finalise the model and fill it with content
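Steps 4 to 6 amount to a set comparison: which predicates does each non-CRF use case need that the SDTM-based model does not yet provide? A minimal sketch, with hypothetical predicate names and use cases standing in for what would really be queried from the models:

```python
# Hypothetical predicate sets; in practice these would be queried from the models.
sdtm_model = {"hasVariable", "hasLabel", "hasCodelist", "hasDataType", "hasRole"}
non_crf_use_cases = {
    "lab transfer": {"hasVariable", "hasLabel", "hasCodelist", "hasVendor"},
    "ECG transfer": {"hasVariable", "hasDataType", "hasFileFormat", "hasVendor"},
}


def gap_analysis(model, use_cases):
    """For each use case, list the predicates the model lacks (step 6 candidates)."""
    return {name: sorted(needed - model) for name, needed in use_cases.items()}


gaps = gap_analysis(sdtm_model, non_crf_use_cases)
print(gaps)

# Predicates missing for every use case are the strongest candidates to propose.
common = sorted(set.intersection(*(set(v) for v in gaps.values())))
print(common)  # ['hasVendor']
```

A gap shared by all use cases argues for extending the common model; a gap unique to one use case may belong in a narrower extension.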
RDF at the heart of…
RDF finally makes it a reality that we can really link together:
• Standards to Standards (Via Models and Linked Data)
• Standards to Governance (in an MDR with Admin and Validation)
• Standards to User Interface (configurable Browser and Search)
• Standards to Web Services (configurable schema downloads in XML)
• Standards to exports (Word, PDF, Excel, etc.)
Thanks go to the Roche IA Team:
• Amy Klopman (SDTM Models, SOA Models)
• Robin Köger (our XSLT and Web Service go-to, Data Analysis Models)
• Didier Clement (Data Collection Models, non CRF and Biomarker Models)
• Ivan Robinson (UI Models, Controlled Terminology Models)
Special thanks for our most interesting jobs goes to:
Frederik Malfait creator of the GDSR
Doing now what patients need next