Synchronising Diversely Implemented Databases to Support Administration of Clinical Research
Stuart Anderson, Mark Hartswood, Conrad Hughes
CRISP (Clinical Research Information Systems Project), School of Informatics, University of Edinburgh
Jan 18, 2016
Many administrative databases, much the same data
[Diagram: three Research Organisations and NHS R&D, with duplicate records for Project R#30, Title “A very important study”…]
Share the data automatically!
[Diagram: three Research Organisations and NHS R&D, with records for Project R#30, Title “A very important study”…]
Research organisations
• Research Organisations (ROs) in NHS Lothian administering clinical research projects:
  – NHS Research & Development Office
  – Wellcome Trust Clinical Research Facility (WTCRF)
  – Scottish Cancer Research Network (SCRN)
  – Experimental Cancer Medicine Centre (ECMC)
• NHS R&D involved in all projects, at least in terms of handling approvals
Project meta-information
• Project title
• Project start and end dates
• Project ethics status and research approval
• Project sponsors, funders and finance data
• Project personnel
• Sponsor and personnel contact details
• Patient lists and activity records
• …
Supposedly the same data, but in different databases
A CRISPy Opportunity
• We could:
  – Reduce data entry costs
  – Improve data quality
  – Improve awareness of activity
• ...if we find ways to share common data between databases
• Suits government “bureaucracy busting”
Options
• Looked at commercial solutions:
  – Some didn’t understand the complexity and risks (e.g. rsync in two directions)
  – Competent-sounding ones were prohibitively expensive (e.g. £170k per site)
• Our solution: DIY approach using free software
Harmony
• Document synchronisation framework
  – By Benjamin Pierce et al.: http://www.seas.upenn.edu/~harmony
• Reconciles changes made to multiple disconnected structured documents containing the same data (or subsets thereof - the “view update” problem), e.g.
  – Internet browser bookmarks files
  – Calendar applications
• Strong theoretical approach with emphasis on provable safety: changes only propagated under well-defined circumstances
Overview of Harmony
[Diagram: Harmony takes RO1’s Document X, RO2’s Document X and an Archive (~Old X) as inputs, and produces Updated Document X (RO1), Updated Document X (RO2), a New Archive, and a log of changes and conflicts]
Harmony operation: Equality
Archive: <project> <rd_id>R#30</rd_id> <title>A very important study</title> <ethics>yes</ethics> </project>
RO1’s Document X: <project> <rd_id>R#30</rd_id> <title>A very important study</title> <ethics>yes</ethics> </project>
RO2’s Document X: <project> <rd_id>R#30</rd_id> <title>A very important study</title> <ethics>yes</ethics> </project>
After running Harmony:
New Archive: <project> <rd_id>R#30</rd_id> <title>A very important study</title> <ethics>yes</ethics> </project>
New Document X (RO1): <project> <rd_id>R#30</rd_id> <title>A very important study</title> <ethics>yes</ethics> </project>
New Document X (RO2): <project> <rd_id>R#30</rd_id> <title>A very important study</title> <ethics>yes</ethics> </project>
Log: “Documents are equal”
Harmony operation: Changes
Archive: <project> <rd_id>R#30</rd_id> <title>A very important study</title> <ethics>no</ethics> </project>
RO1’s Document X: <project> <rd_id>R#30</rd_id> <title>A very important study</title> <ethics>yes</ethics> </project>
RO2’s Document X: <project> <rd_id>R#30</rd_id> <title>A very very important study</title> <ethics>no</ethics> </project>
After running Harmony:
New Archive: <project> <rd_id>R#30</rd_id> <title>A very very important study</title> <ethics>yes</ethics> </project>
New Document X (RO1): <project> <rd_id>R#30</rd_id> <title>A very very important study</title> <ethics>yes</ethics> </project>
New Document X (RO2): <project> <rd_id>R#30</rd_id> <title>A very very important study</title> <ethics>yes</ethics> </project>
Log: “Project R#30: title change propagated from RO2 to RO1; ethics change propagated from RO1 to RO2”
Harmony operation: Conflict
Archive: <project> <rd_id>R#30</rd_id> <title>A very important study</title> <ethics>yes</ethics> </project>
RO1’s Document X: <project> <rd_id>R#30</rd_id> <title>A very unimportant study</title> <ethics>yes</ethics> </project>
RO2’s Document X: <project> <rd_id>R#30</rd_id> <title>A very very important study</title> <ethics>yes</ethics> </project>
After running Harmony:
New Archive: <project> <rd_id>R#30</rd_id> <title>A very important study</title> <ethics>yes</ethics> </project>
New Document X (RO1): <project> <rd_id>R#30</rd_id> <title>A very unimportant study</title> <ethics>yes</ethics> </project>
New Document X (RO2): <project> <rd_id>R#30</rd_id> <title>A very very important study</title> <ethics>yes</ethics> </project>
Log: “Project R#30 conflict over title fields – R#30 title changes not propagated”
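The three cases on these slides (equality, changes, conflict) can be illustrated with a much-simplified field-level three-way merge. This is a sketch only, not Harmony's actual algorithm or API: it assumes each project is a flat record with the same fields in the archive and in both copies, and the function name `three_way_merge` and the log wording are hypothetical.

```python
def three_way_merge(archive, ro1, ro2):
    """Merge two flat records against their common archive, field by field.

    Assumes all three dicts have the same keys (an illustrative simplification).
    Returns (new_archive, new_ro1, new_ro2, log).
    """
    new_archive, new_ro1, new_ro2, log = {}, dict(ro1), dict(ro2), []
    for field in sorted(archive):
        old, v1, v2 = archive[field], ro1[field], ro2[field]
        if v1 == v2:
            # Both copies agree (unchanged, or changed identically).
            new_archive[field] = v1
        elif v1 == old:
            # Only RO2 changed the field: propagate to RO1.
            new_ro1[field] = v2
            new_archive[field] = v2
            log.append(f"{field}: change propagated from RO2 to RO1")
        elif v2 == old:
            # Only RO1 changed the field: propagate to RO2.
            new_ro2[field] = v1
            new_archive[field] = v1
            log.append(f"{field}: change propagated from RO1 to RO2")
        else:
            # Both changed it differently: leave every copy as it is.
            new_archive[field] = old
            log.append(f"{field}: conflict, changes not propagated")
    return new_archive, new_ro1, new_ro2, log


if __name__ == "__main__":
    # The "Conflict" slide, expressed as flat records.
    archive = {"rd_id": "R#30", "title": "A very important study", "ethics": "yes"}
    ro1 = dict(archive, title="A very unimportant study")
    ro2 = dict(archive, title="A very very important study")
    for line in three_way_merge(archive, ro1, ro2)[3]:
        print(line)
```

The important design point is that a change is propagated only when exactly one side differs from the archive; anything else is logged and left for people to resolve.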
Conflicts
Id: E06377, Field: Project title
  – NHS R&D: Global Registry of Acute Coronary Events (GRACE)
  – WTCRF: GRACE: Global Registry of Acute Coronary Events
Id: E01058, Field: Project title
  – NHS R&D: PROCARDIS (Precocious Coronary Artery Disease) study
  – WTCRF: PROCARDIS - Precocious coronary artery disease study - a study to identify inherited causes of heart disease
Id: E01033, Field: Project title
  – NHS R&D: VITATOPS: a randomised controlled trial of vitamins to prevent stroke.
  – WTCRF: Vitatops: A randomised controlled trial of vitamins to prevent sroke.
600 pre-roll-out conflicts to resolve; these examples are fairly trivial
Provenance issues?
• Trust
• Alignment
• Form and meaning
• Authority
• Control
Trust
• Organisations are allowing other participants to write to their databases
• Do you trust them?
  – Alignment of goals
  – Need to establish confidence in each other’s procedures and practices
  – Established through regular meetings
  – Others might know more than you do
Alignment: record identity
• Need to identify which records in different databases refer to the same project, funding body or person
• Use R&D Number, assigned by NHS R&D, for projects
  – Creation complicated because projects may initially be entered (without R&D#) by ROs
  – Deletion complicated because some projects may leave scope but no projects should really be deleted
• Funding bodies and persons are handled more loosely
  – Identity and duplication less critical here
Establishing identity
[Diagram: syncing two database tables. Database 1 and Database 2 each have a shared key (SK) column; unique shared keys identify records across databases]
Synchronising tables across two databases depends on having a unique shared key. This value has to be guaranteed to be unique within each table, and to identify corresponding records uniquely across databases.
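The shared-key requirement can be checked before any synchronisation is attempted. The sketch below is illustrative only: the function name `check_shared_keys` and the example key values are hypothetical, and it assumes the keys from each table have already been read into plain lists.

```python
from collections import Counter


def check_shared_keys(keys_db1, keys_db2):
    """Report duplicate and unmatched shared keys for two tables."""
    # A key occurring more than once breaks uniqueness within a table.
    dup1 = sorted(k for k, n in Counter(keys_db1).items() if n > 1)
    dup2 = sorted(k for k, n in Counter(keys_db2).items() if n > 1)
    s1, s2 = set(keys_db1), set(keys_db2)
    return {
        "duplicates_db1": dup1,
        "duplicates_db2": dup2,
        "matched": sorted(s1 & s2),   # records present in both databases
        "only_db1": sorted(s1 - s2),  # records not yet known to database 2
        "only_db2": sorted(s2 - s1),  # records not yet known to database 1
    }


if __name__ == "__main__":
    # Hypothetical key values, for illustration only.
    report = check_shared_keys([3, 7, 9], [3, 7, 7, 12])
    for name, keys in report.items():
        print(name, keys)
```

Any non-empty duplicates list means the table cannot safely be synchronised on that key until the duplicates are resolved.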
Do they have the same meaning?
• Start/end date
  – Approval? Funding? Recruitment? Analysis?
  – Often driven by reporting requirements
• Some fields too contentious, not useful to share, so not included in sync
  – Option to synchronise separate meanings as separate fields
• Get parties to agree on common meanings
  – Valuable communications exercise among participants
Shared meaning = shared form?
• Field types/sizes
• Field values
  – N/A na None No Pending
  – Funder classification varies from DB to DB
• Personnel roles
  – One column per role or one row per role?
• Some adjustment and convergence possible to participants’ databases
• Transform data to “standard” on export/import
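A transformation to a “standard” form on export might look like the sketch below. Everything specific here is an assumption made for illustration: which spellings count as equivalent would have to be agreed among the participants, and the names `CANONICAL`, `normalise_value` and `roles_to_rows` are hypothetical.

```python
# Hypothetical mapping of local spellings to one agreed value.
# "No" and "Pending" are deliberately not mapped: they may carry
# distinct meanings, so they pass through unchanged.
CANONICAL = {"n/a": "N/A", "na": "N/A", "none": "N/A"}


def normalise_value(raw):
    """Map a local field value to the agreed spelling."""
    if raw is None:
        # Assumption: a database NULL is treated as "N/A".
        return "N/A"
    text = raw.strip()
    return CANONICAL.get(text.lower(), text)


def roles_to_rows(project_id, record, role_columns):
    """Convert a one-column-per-role record to one row per role."""
    return [
        (project_id, role, record[role])
        for role in role_columns
        if record.get(role)  # skip roles with no person recorded
    ]


if __name__ == "__main__":
    print(normalise_value(" na "))
    print(roles_to_rows("R#30", {"pi": "A. Smith", "nurse": ""}, ["pi", "nurse"]))
```

Doing this in each site's export/import step lets the participating databases keep their own layouts while the exchanged documents share one form.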
Authority
• Harmony is symmetric: no peer to a sync gets priority
• Some information should only be sourced by R&D (responsible for approvals)
• Some information is best sourced by ROs (personnel, funding)
• But:
  – Databases involved don’t record sources of information
  – Strict rules impair usability and make for an unpopular (and unused) system
• Solution:
  – Emphasise audit over control
  – But provide limited inter-site control at data import stage
Control
• Each database contains organisation-specific (and private) information
• Some content is just irrelevant to others
• Some patient data!
• Solution: import/export script run locally by each organisation only exports a chosen subset of tables, rows and columns
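A local export step of this kind could be sketched as follows. This is not the project's actual script: the function `export_projects`, the column names, and the `shared` flag used to select rows are hypothetical, and rows are assumed to be available as dictionaries.

```python
import xml.etree.ElementTree as ET


def export_projects(rows, columns, include_row):
    """Export only the chosen rows and columns as an XML document string."""
    root = ET.Element("projects")
    for row in rows:
        if not include_row(row):
            continue  # row is out of scope or private: never leaves the site
        project = ET.SubElement(root, "project")
        for col in columns:
            # Only whitelisted columns are copied into the export.
            ET.SubElement(project, col).text = str(row[col])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    rows = [
        {"rd_id": "R#30", "title": "A very important study", "ethics": "yes",
         "patient_list": "private", "shared": True},
        {"rd_id": "R#31", "title": "A local study", "ethics": "no",
         "patient_list": "private", "shared": False},
    ]
    print(export_projects(rows, ["rd_id", "title", "ethics"], lambda r: r["shared"]))
```

Because the selection is a whitelist applied locally, anything not explicitly chosen (such as patient data) stays inside the organisation by default.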
Benefits
• Data only entered once for all
• Everyone takes responsibility for data they’re most expert in
• Disagreement (“conflict”) is permitted, and may be resolved through human-human communication
• Limited (inter-site) audit operating so expect/hope for responsible behaviour
Conclusion
• Real data synchronisation application has been far from the theoretical ideal
  – Issues of alignment, scope, identity, policy, trust, data quality, form and meaning
• Solutions to problems encountered aren’t just technical: organisational engagement and trust have been essential in keeping the task tractable
• Rolling out now, so reality yet to be seen
  – Depends on fair balance of effort and reward among participants