Semantic Interoperability for Health Network
Deliverable 4.3: Ontology / Information models covering the public health use cases
[Version 1, March 12, 2014]
Call: FP7-ICT-2011-7
Grant agreement for: Network of Excellence (NoE)
Project acronym: SemanticHealthNet
Project full title: Semantic Interoperability for Health Network
Grant agreement no.: 288408
Budget: 3.222.380 EURO
Funding: 2.945.364 EURO
Start: 01.12.2011 - End: 31.05.2015
Website: www.semantichealthnet.eu
Coordinators:
The SemanticHealthNet project is partially funded by the European Commission.
D4.3 Ontology / Information models covering the public health use cases Page 2 of 44
Document description
Deliverable: D4.3
Publishable summary:
SemanticHealthNet (SHN) assumes that several standards and proprietary implementations for representing the content of electronic health records (EHRs) will co-exist for a long time. Thus, the project focuses on providing an integrative semantic abstraction on top of them that is able to act as a mediator. As practical exemplars, SHN has focused on chronic heart failure and cardiovascular prevention, which drive the development of semantic resources by work package four (WP4). In the two previous WP4 deliverables, we provided the basis of the proposed interoperability approach: an ontological framework and an initial set of semantic patterns obtained by following a bottom-up approach. As a result of our second deliverable (D4.2), we created the Heart Failure Summary (HFS), a minimal dataset that contains the essential information needed to optimize heart failure management. In this deliverable, we describe an extension of the underlying semantic architecture and focus on the semantic interoperability challenges that arise when clinical practice data (e.g. EHRs) are used by public health systems. We exemplify them by extending the HFS with two risk factors of cardiovascular diseases (CVDs) that are of interest for public health studies, viz. tobacco and alcohol use. Thus, the HFS should be understood as a global resource that can be used by both stakeholder communities. A dedicated instrument (e.g. a questionnaire) for public health reporting on heart failure would be very similar to the HFS from a modelling point of view. Public health use cases require access to heterogeneous information sources that include data described at different levels of detail, generated within heterogeneous contexts (e.g. a smoking cessation clinic record vs. a primary care record) and addressing different requirements. We have demonstrated how semantic patterns can be used to improve semantic interoperability across them.
Semantic patterns are based on a reference ontology model (i.e. the SHN ontological framework) and facilitate the mapping of clinical model data into their semantic representation. Query exemplars are provided in Chapter 4 to demonstrate (1) that data from heterogeneous models can be queried homogeneously and (2) that data can be retrieved within their context, which helps public health systems interpret them correctly. There are also limits to semantic interoperability, for instance when codes such as Ex-smoker are used to refer both to someone who quit last month after smoking for 30 years and to a person who quit 20 years ago after smoking for only two years. Thus, there are limits to comparability and semantic interoperability that stem from fundamental differences in clinical processes and facts about the world, and that cannot be resolved at the level of information systems. At best, the degree of uncertainty/vagueness can be estimated.
D4.3 Ontology / Information models covering the public health use cases Page 3 of 44
Table of contents
1 Introduction and objectives
4.2 Description of the models
4.3 Mapping of clinical model data to their semantic representation by using semantic patterns as bridge
4.4 Homogeneous query of data from the three clinical models: Tobacco Use (openEHR), Meaningful Use (HL7 C-CDA) and Tobacco Detailed Use (DCM-HL7v3)
5 Summary and conclusions
1 Introduction and objectives
1.1 Background
SemanticHealthNet (SHN) faces the challenge of improving semantic interoperability of clinical information. It assumes that several standards and proprietary implementations for representing the content of electronic health records (EHRs) will co-exist for a long time. Thus, the project focuses on providing an integrative semantic abstraction on top of them, representing a homogeneous view that is able to mediate across the heterogeneous underlying representations.
The approach targets the whole range of health-related information across all medical domains. As practical exemplars, SHN has focused on chronic heart failure and cardiovascular prevention, which drive the development of semantic resources by work package four (WP4).
The two previous deliverables had two main focuses. The first was to clarify the distinction between information entities and clinical entities, in order to determine which things have to be represented by information models and which by ontologies (with a focus on SNOMED CT as an ontology-based clinical terminology). As a result, we proposed an ontology engineering approach based on a top-level ontology, description logics and Semantic Web standards, and investigated how the semantic equivalence of iso-semantic expressions could be ascertained once this distinction is properly made. The second focus was the application of this approach to the semantic representation of the Heart Failure Summary (HFS), a minimal dataset that contains the essential information needed to optimize heart failure management. As a result, recurring modelling issues were identified and a set of semantic patterns was provided by following a bottom-up approach.
In this deliverable, we address the semantic interoperability challenges that arise when clinical practice data (e.g. EHRs) are used by public health systems. We exemplify them by focusing on two risk factors of cardiovascular diseases (CVDs) that are of interest for public health studies, viz. tobacco and alcohol use. Public health and clinical practice requirements are not the same, and EHR data are usually not appropriate for many public health tasks. The Heart Failure Summary should be understood as a global resource that can be used by both stakeholder communities. A dedicated instrument (e.g. a questionnaire) for public health reporting on heart failure would be very similar to the HFS from a modelling point of view. Therefore, we decided to extend the HFS with detailed information about both risk factors, as described in the following. Furthermore, we describe an extension of the underlying semantic architecture that will be applied to the public health use case.
1.2 Objectives
The goal of this work is to demonstrate that the proposed semantic interoperability solution can be applied to improve semantic interoperability between public health systems and clinical practice records (EHRs). To do so, we will extend the HFS with a set of good-quality representations of additional information required from a public health perspective. Using this extension as a working example, we will provide semantic patterns and apply them to that information in order to make it semantically interoperable.
The main purpose of using semantic patterns is to facilitate the mapping of existing structured data into an ontology-based representation, in a way that can be interpreted by computers independently of the degree of granularity at which the data have been provided. These patterns are based on a formal model of meaning (reference model), constituted by a set of OWL DL ontologies under a highly constrained top-level ontology, which assists the modelling task.
The semantic patterns we create will be produced together with the formal representation of their meaning in OWL DL. Some patterns will be instantiated with fictitious data for demonstration purposes. The patterns produced will be based on a subset of top-level patterns, which can be reused in different use cases.
We will exemplify the use of semantic patterns by applying them to encode clinical data rendered from heterogeneous sources. This will demonstrate that patient information can be retrieved homogeneously for a particular public health use case, independently of the underlying clinical model representation.
1.3 The public health perspective
The focus of public health is to improve health and quality of life through the prevention and treatment of disease and through the promotion of healthy behaviours. While the clinical approach focuses on the treatment of individual patients, public health considers the characteristics of groups of people. Knowledge of the characteristics of a population can be derived not only from the aggregation of clinical records but also from other sources such as social care records, personal health records, service payments, health surveys, etc. Each of these information systems addresses different information requirements and contexts. Thus, homogeneous access to them by public agencies would significantly improve and facilitate their work. In the following, we concentrate on public health aspects of cardiovascular diseases (CVD) and their risks.
As reported in deliverable 2.1, there is considerable evidence that more than 50% of the recent reduction in CVD is due to decreases in risk factors. CVD risk is frequently the result of multiple interacting risk factors and is usually expressed in terms of developing CVD (incidence), experiencing an event (e.g. a heart attack), or dying.
The European guidelines on cardiovascular disease prevention (ESC)¹ provide a set of recommendations to reduce CVD risk factors. Many of the recommendations are related to behavioural factors, such as: (i) advising smokers to quit and offering assistance; (ii) avoiding exposure to passive smoking; (iii) facilitating lifestyle change with cognitive-behavioural strategies; (iv) a healthy diet; (v) weight reduction in overweight or obese patients; (vi) physical exercise; etc. Others are related to the use of prophylactic drug therapies, such as statins or antihypertensive treatment.
Although there is overwhelming evidence that most of the reduction in CVD is due to decreases in risk factors, there are still major evidence gaps (e.g. how to help citizens achieve lifestyle changes) for which “real-world evidence”, i.e. evidence obtained outside the artificial environment of clinical trials, will be crucial, and which requires large numbers of EHRs and other applications to be analysed. The ESC report highlights this fact, as well as the lack of knowledge about people who do not usually participate in clinical trials and about the long-term outcomes of interventions.
Semantic patterns act as a bridge between structured data and their semantic representation based on the above ontological framework, and can be used to guide the mapping process between the two. The main rationale for semantic patterns is to facilitate recurring content modelling tasks. Examples of recurring issues are: Who does what, when and where? What is the location of something (an object or a process)? What is the time frame of something, e.g. the duration of a process (like a clinical situation) or the life of a material entity? In their attempt to provide clinical information within a standardized structure that addresses the requirements of some particular use case⁹, semantic patterns are not too different from clinical archetypes and their use in templates.
Semantic patterns are based on the above ontological framework and are employed to insulate users (more precisely, those who semantically annotate clinical models) from the underlying ontology formalisms. Semantic patterns are characterized by the following:
They are based on the Closed World Assumption (CWA), which implies that everything we do not know is false, whereas the Open World Assumption (OWA) states that everything we do not know is undefined. For example, the question “Was the heart failure caused by a heart attack?” will look for that statement in our model; since we do not have that statement, the two approaches interpret it differently: false under a closed world approach and not computable (unknown) under an open world approach.
Patterns can be specialized following a paradigm similar to object-oriented design and frame systems, in which the properties defined in a parent class are inherited by all its child classes, which can further constrain them. For instance, if we specialise a diagnosis pattern
8 S. Schulz, A. Rector, J. Rodrigues, C. Chute, B. Üstün, K. Spackman. Ontology-based convergence of medical terminologies: SNOMED CT and ICD-11. Proc. of eHealth2012. Vienna, Austria: OCG, 2012.
9 E. Blomqvist, E. Daga, A. Gangemi, V. Presutti. Modelling and using ontology design patterns. [http://www.neon-
Table 2-2 Diagnosis example based on I_CS_PT pattern
Top-level pattern I_NCS_PT (cf. Table 2-3):
The information about no clinical situation pattern (I_NCS_PT) is defined as a specialisation of
I_CS_PT and can be used to represent the absence of a particular patient clinical situation. It consists
of the following SPO triples:
11 S. Schulz, A. Rector, J. Rodrigues, C. Chute, B. Üstün, K. Spackman. Ontology-based convergence of medical terminologies: SNOMED CT and ICD-11. Proc. of eHealth2012. Vienna, Austria: OCG, 2012.
Table 2-14 Heart failure situation example based on CS_PT
Finally, Figure 2-1 depicts the hierarchical graphical representation of the top-level patterns described above. The clinical process and clinical situation semantic patterns can only occur as components of other patterns, since semantic patterns encode information entities. They are used through composition by other patterns such as I_CS_PT, I_NCS_PT, etc. UML notation has been used to represent specialisation and composition.
Figure 2-1 Hierarchy of top-level patterns
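Specialisation and composition can be pictured with a small data structure. The Python sketch below is a hypothetical simplification, not the SHN implementation: a pattern is a named map from predicates to object slots, a child pattern inherits its parent's triples and may override (constrain) some of them, and an object slot may hold another pattern (composition). The class name sct:EvaluationProcedure is an illustrative assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

# An object slot holds either an ontology class name or another pattern
# (composition, e.g. I_CS_PT composing CS_PT).
Obj = Union[str, "Pattern"]


@dataclass
class Pattern:
    """Hypothetical, simplified semantic pattern: predicate -> object slot."""
    name: str
    triples: Dict[str, Obj] = field(default_factory=dict)
    parent: Optional["Pattern"] = None

    def effective_triples(self) -> Dict[str, Obj]:
        # Inherit the parent's triples; the child's own triples override
        # (i.e. further constrain) those with the same predicate.
        inherited = self.parent.effective_triples() if self.parent else {}
        return {**inherited, **self.triples}

    def specialise(self, name: str, constraints: Dict[str, Obj]) -> "Pattern":
        return Pattern(name, dict(constraints), parent=self)


CS_PT = Pattern("CS_PT", {
    "has participant": "btl:MaterialObject",
    "occurs at": "btl:TemporalRegion",
    "happens at": "btl:MaterialObject",
})
I_CS_PT = Pattern("I_CS_PT", {
    "describes situation": CS_PT,  # composition: the situation slot is a pattern
    "results from process": "btl:Process",
    "has attribute": "shn:InformationAttribute",
})

# Hypothetical specialisation: the acquiring process is narrowed to an
# evaluation procedure; all other slots are inherited unchanged.
DIAGNOSIS_PT = I_CS_PT.specialise("DIAGNOSIS_PT", {
    "results from process": "sct:EvaluationProcedure",
})
```

The parent pattern is left untouched by the specialisation, which mirrors the frame-system behaviour described above: constraints flow downwards only.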
2.1.3 OWL DL representation
The representation of the above top-level patterns in OWL 2 DL allows the precise formalization of the proposed ontological framework and the use of DL reasoning. DL reasoning is useful for achieving two important goals.
On the one hand, it can be used to detect equivalent clinical information in iso-semantic models. This includes the ability to compare different distributions of content between information models and ontologies/terminologies, and to test whether they are semantically equivalent. For instance, there are two possible representations for encoding a breast cancer diagnosis with SNOMED CT: (1) using one diagnosis information model element and the concept Breast cancer, or (2) using two information model elements representing the disease diagnosed (Cancer) and the disease location (Breast structure). An appropriate representation, supported by a DL reasoner, should discover that both representations are semantically equivalent.
On the other hand, DL reasoning can be used for an advanced exploitation of clinical information by means of semantic queries, such as retrieving patients who use tobacco, independently of the form of tobacco (e.g. cigar, pipe, etc.) and of the type of consumption (e.g. snuff or smoking).
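The query above relies on subsumption: a record coded with a specific tobacco form must be found by a query for the general class. A minimal sketch, assuming a hypothetical single-parent is-a hierarchy (the class names are simplifications, not SNOMED CT concepts), is:

```python
# Illustrative is-a hierarchy (child -> parent). Class names are hypothetical
# simplifications, not SNOMED CT concepts or identifiers.
IS_A = {
    "CigaretteSmoking": "TobaccoSmoking",
    "CigarSmoking": "TobaccoSmoking",
    "PipeSmoking": "TobaccoSmoking",
    "TobaccoSmoking": "TobaccoUse",
    "SnuffUse": "TobaccoUse",
}


def is_subsumed_by(cls: str, ancestor: str) -> bool:
    """Walk up the single-parent hierarchy until the ancestor or the root."""
    current = cls
    while current is not None:
        if current == ancestor:
            return True
        current = IS_A.get(current)
    return False


records = {"P1": "PipeSmoking", "P2": "SnuffUse", "P3": "CigarSmoking", "P4": "AlcoholUse"}
tobacco_users = sorted(p for p, c in records.items() if is_subsumed_by(c, "TobaccoUse"))
print(tobacco_users)  # ['P1', 'P2', 'P3']
```

A DL reasoner computes the same kind of answer from class definitions rather than from an explicit parent table, and also handles multiple parents.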
Table 2-15 and Table 2-16 depict the translation of the patterns into OWL DL, according to the proposed ontological framework. The first table depicts the translation of the predicates into OWL DL expressions. Following the triple-based representation of the patterns, the subject (SUBJ) and object (OBJ) correspond to ontology classes and the predicate to an OWL DL expression. These DL expressions use one or more object properties from our ontologies, together with different quantifiers, as a consequence of the underlying ontological model.
Predicate OWL DL expression
'describes situation' SUBJ subClassOf shn:isAboutSituation only OBJ
'describes quality' SUBJ subClassOf shn:isAboutQuality only OBJ
'results from process' SUBJ subClassOf btl:isOutcomeOf some OBJ
'has attribute' SUBJ subClassOf shn:hasInformationAttribute some OBJ
'is quality of' SUBJ subClassOf btl:inheresIn some OBJ
'has observed value' SUBJ subClassOf btl:Quality and btl:projectsOnto some OBJ
'has value' SUBJ subClassOf btl:isRepresentedBy only (shn:hasValue some OBJ)
'has units' SUBJ subClassOf btl:isRepresentedBy only (shn:hasInformationAttribute some OBJ)
'has scale' SUBJ subClassOf btl:isRepresentedBy only (shn:hasInformationAttribute some OBJ)
'has participant' SUBJ subClassOf btl:hasParticipant some OBJ
'occurs at' SUBJ subClassOf btl:projectsOnto some OBJ
'happens at' SUBJ subClassOf btl:isIncludedIn some OBJ
'follows' SUBJ subClassOf btl:isPrecededBy some OBJ
Table 2-15 OWL DL representation of the top-level patterns predicates
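Table 2-15 can be read as a set of string templates. The Python sketch below is a hypothetical helper, not part of the SHN tooling: it renders a single triple as a subClassOf axiom and conjoins a pattern's triples into one class expression. Only the simple predicates are included, and the rule of parenthesising any multi-word filler is our own assumption, based on the fact that in Manchester syntax the filler of a restriction binds tighter than "and".

```python
# Templates transcribed from a subset of Table 2-15 (the simple predicates);
# "{obj}" is replaced by the object class of a triple.
TEMPLATES = {
    "describes situation": "shn:isAboutSituation only {obj}",
    "describes quality": "shn:isAboutQuality only {obj}",
    "results from process": "btl:isOutcomeOf some {obj}",
    "is quality of": "btl:inheresIn some {obj}",
    "has participant": "btl:hasParticipant some {obj}",
    "occurs at": "btl:projectsOnto some {obj}",
    "happens at": "btl:isIncludedIn some {obj}",
    "follows": "btl:isPrecededBy some {obj}",
}


def restriction(predicate: str, obj: str) -> str:
    # Complex fillers get parentheses so that "and" stays inside the filler.
    filler = f"({obj})" if " " in obj else obj
    return TEMPLATES[predicate].format(obj=filler)


def triple_to_axiom(subj: str, predicate: str, obj: str) -> str:
    """Render one SPO triple as in Table 2-15."""
    return f"{subj} subClassOf {restriction(predicate, obj)}"


def pattern_to_expression(root: str, triples: list) -> str:
    """Conjoin the root class with one restriction per (predicate, object)."""
    return " and ".join([root] + [restriction(p, o) for p, o in triples])


print(pattern_to_expression("shn:InformationItem", [
    ("describes situation", "btl:BiologicalLife and btl:hasPart some shn:ClinicalSituation"),
    ("results from process", "btl:Process"),
]))
```

With a past-history filler, the printed expression reproduces the beginning of the PH_CS_PT expression of Table 2-16.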
Note that different quantifiers are used. For connecting information entities with clinical entities, we have used the universal quantifier ('only'), because we cannot assume that an instance of X exists wherever there is an instance of "information on X". We are aware that such models pose technical difficulties, as they allow statements that are not about anything at all. However, to the best of our knowledge, adverse reasoning consequences are avoided as long as we use very specific object properties (here isAboutSituation instead of a generic represents). Indeed, a false diagnosis is an existing statement that is not about any individual clinical situation at all. A second possibility is to allow hypothetical entities. A more satisfactory but more complicated solution would be the use of a higher-order logic, so that the uncertainty can be correctly targeted at the statement or belief rather than at the underlying state of the world. However, this is likely to remain beyond the scope of easily usable computational logics for the near future, so approximations using description logics are required.
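Why 'only' tolerates statements about nothing can be seen on a finite toy interpretation. In the sketch below (all individual names are hypothetical), an information item belongs to "isAboutSituation only C" when every situation it is about is in C, which is vacuously true when it is about nothing; "isAboutSituation some C" would instead require an existing heart failure situation.

```python
# Finite toy interpretation: for each information item, the individuals it
# is about (its isAboutSituation fillers). All names are hypothetical.
about = {
    "dx_confirmed": ["hf_situation_1"],
    "dx_false": [],  # a false diagnosis: about no existing situation
}
heart_failure_situations = {"hf_situation_1"}


def in_only(fillers, members) -> bool:
    """x is in (R only C) iff every R-filler of x is in C; vacuous if none."""
    return all(f in members for f in fillers)


def in_some(fillers, members) -> bool:
    """x is in (R some C) iff at least one R-filler of x is in C."""
    return any(f in members for f in fillers)


for item, fillers in about.items():
    print(item, in_only(fillers, heart_failure_situations),
          in_some(fillers, heart_failure_situations))
```

The false diagnosis still qualifies as information about heart failure under 'only' but not under 'some', which is the behaviour the modelling choice above aims for.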
The second table (cf. Table 2-16) shows the result of applying the OWL DL translation of the predicates to each pattern. The OWL snippets can be understood as equivalent to the patterns. As can be observed, apart from the translation of the predicates, additional changes are required for some patterns, such as those that refer to the past history of a clinical situation (PH_CS_PT) or that include negation (I_NCS_PT, NPH_CS_PT).
Pattern OWL DL expression
I_CS_PT
shn:InformationItem and shn:isAboutSituation only shn:ClinicalSituation and btl:isOutcomeOf some btl:Process and shn:hasInformationAttribute some shn:InformationAttribute
PH_CS_PT
shn:InformationItem and shn:isAboutSituation only (btl:BiologicalLife and btl:hasPart some shn:ClinicalSituation) and btl:isOutcomeOf some btl:Process and shn:hasInformationAttribute some shn:InformationAttribute
I_NCS_PT
shn:InformationItem and shn:isAboutSituation only (shn:ClinicalSituation and not btl:hasPart some shn:ClinicalSituation) and btl:isOutcomeOf some btl:Process and shn:hasInformationAttribute some shn:InformationAttribute
NPH_CS_PT
shn:InformationItem and shn:isAboutSituation only (btl:BiologicalLife and not btl:hasPart some shn:ClinicalSituation) and btl:isOutcomeOf some btl:Process and shn:hasInformationAttribute some shn:InformationAttribute
OB_CS_PT
shn:ObservationResult
and shn:isAboutQuality only (btl:ProcessQuality
and btl:inheresIn some shn:ClinicalSituation
and btl:projectsOnto some (btl:ValueRegion
and btl:isRepresentedBy only (
shn:hasInformationAttribute some shn:MeasurementUnits
and shn:hasValue some xml:datatype
and shn:hasInformationAttribute some shn:Scale)))
CP_PT
btl:Process
and btl:hasParticipant some btl:MaterialObject
and btl:projectsOnto some btl:TemporalRegion
and btl:isIncludedIn some btl:MaterialObject
CS_PT
shn:ClinicalSituation
and btl:hasParticipant some btl:MaterialObject
and btl:projectsOnto some btl:TemporalRegion
and btl:isIncludedIn some btl:MaterialObject
and btl:isPrecededBy only shn:ClinicalSituation
Table 2-16 OWL DL representation of the top-level patterns
Examples of how the above two tables are used to transform data modelled according to the semantic patterns into their OWL DL representation, how these data can be queried, and some of the implications with regard to negation are discussed at the end of Chapter 4.
3 Extending the Heart Failure Summary (HFS) to support Public Health
In the following, we describe the two models created for recording tobacco and alcohol use, respectively. For readability, we provide for each model its graphical representation as given by the openEHR HFS template¹². This template uses a subset of openEHR archetypes, in which textual descriptions are provided for each included data element. Figure 3-1 depicts the main structure of the template. Then, we apply the semantic patterns described in Chapter 2 to the modelling of the tobacco and alcohol use information. The proposed patterns do not cover aspects of information presentation, such as order or indentation, but only the semantics of the information they represent.
Figure 3-1 Main structure of the OpenEHR Heart Failure Summary extended to support public health
3.1 Tobacco Use Summary
Figure 3-2 depicts the graphical representation of the Tobacco Use Summary model. The model can be subdivided into two sub-models, depending on how the tobacco is consumed (i.e. smoked or as snuff). Table 3-1 and Table 3-11 summarize the data elements and value restrictions for each one. SNOMED CT terms were used where possible for encoding the value restrictions. Next, we analyse
Table 3-2 Re-naming of the SNOMED CT concepts used in the smoking tobacco model
In addition, Figure 3-3 shows the original SNOMED CT hierarchy of the above concepts and how it would look after re-modelling, taking into consideration the two aspects described above: misleading term names and an incomplete or faulty underlying model of meaning.
Figure 3-3 Re-modelling of the original Smoking Tobacco hierarchy
In the re-naming suggestion for the SNOMED CT concepts, we have not considered aspects such as “over 20 per day” or “20 or less per day”, which are part of concepts such as heavy smoker and moderate smoker, respectively. On the one hand, this kind of knowledge is not formally represented in SNOMED CT and is only included in the name of a primitive concept. On the other hand, what counts as a heavy, light or moderate smoker can probably not be defined universally. We consider that this kind of knowledge may vary across institutions or depend on study purposes, and therefore it should not be included in the terminology. It could instead be implemented as a separate set of rules that could be shared in an interoperability scenario. For instance, if in our local implementation a heavy cigarette smoker means “>= 10 cigarettes per day”, we could represent this with the following general OWL DL axiom:
shn:ObservationResult and shn:isAboutQuality some (shn:MassIntake
    and btl:inheresIn some sct:CigaretteTobaccoSmokingSituation
    and btl:projectsOnto some (btl:ValueRegion
        and btl:isRepresentedBy only (shn:hasInformationAttribute some sct:PerDay
            and shn:hasValue some double[>= 10])))
SubClassOf (shn:InformationItem and shn:isAboutSituation only sct:HeavyCigaretteTobaccoSmokingSituation)
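Because the threshold is a local convention rather than terminology content, the same rule can also be kept as a small, shareable, parameterised function. The Python sketch below is an illustrative assumption that mirrors the axiom above; the IntakeObservation structure and the product and unit strings are hypothetical, not codes from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntakeObservation:
    """Simplified observation result: amount of a product consumed per unit of time."""
    product: str  # hypothetical labels, e.g. "cigarette", "cigar"
    amount: float
    unit: str     # e.g. "per day"


# Site-specific threshold, kept outside the terminology as a shareable rule.
HEAVY_CIGARETTE_THRESHOLD_PER_DAY = 10.0


def is_heavy_cigarette_smoker(obs: IntakeObservation,
                              threshold: float = HEAVY_CIGARETTE_THRESHOLD_PER_DAY) -> bool:
    # Mirrors the axiom: cigarette smoking, unit "per day", value >= threshold.
    return obs.product == "cigarette" and obs.unit == "per day" and obs.amount >= threshold
```

Another institution could share the same function with a different threshold (e.g. `threshold=20` for an “over 20 per day” reading) without touching the terminology.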
Taking into consideration what has been explained, we now apply the top-level patterns mentioned above to the tobacco smoking model. This consists of constraining them according to the cardinality and value restrictions provided by the semantic pattern.
The most frequently used pattern is I_CS_PT. We will use it to represent all data element/value combinations shown in the left column of Table 3-3. Two triples of the pattern are used, S1 and S2, which represent a clinical situation and the process performed to acquire that information, respectively, and whose predicate cardinality has been constrained to 1 and their value range (the object part of the triple) to a subclass of clinical finding (i.e. clinical situation) and to an evaluation procedure according
Table 4-1 Data elements and values (SNOMED CT) of an excerpt of DCM and C-CDA tobacco models
The Meaningful Use model provides only one data element for recording tobacco smoking status. The status value is constrained to a set of SNOMED CT codes to meet the certification criteria in support of Meaningful Use Stage 2 (e.g. Current every day smoker).
For the DCM we have represented the amount of tobacco consumed per day and the type of tobacco
used. Figure 4-1 and Figure 4-2 depict an excerpt of each clinical model represented in XML.
From the above, we can state that it is not possible to impose a single model representation across diverse clinical communities (e.g. public health vs. primary care vs. specialised care) and clinical practices, and that the requirements will dictate the level of information detail needed. As we can see, each of the models provides a different level of information granularity, apart from being represented according to different EHR standards. Table 4-2 shows a summary of the names of the tobacco data items of each model that will be used in the interoperability use case.
14 HL7 Implementation Guide for CDA Release 2: IHE Health Story Consolidation, Release 1 (Consolidated CDA, C-CDA): http://www.hl7.org/implement/standards/ (last accessed Jan. 2014)
15 US Meaningful Use Stage 2: http://www.cms.gov/Regulations-and-Guidance/Legislation/EHRIncentivePrograms/Downloads/Stage2_Guide_EPs_9_23_13.pdf (last accessed Jan. 2014)
Considering these clinical limits, the immediate question is what degree of semantic interoperability we can offer, i.e. to what extent we can make the above models semantically interoperable. At the beginning of this chapter, we mentioned two goals: to demonstrate (i) that data from heterogeneous models can be queried homogeneously, and (ii) that data can be retrieved within their context, which helps public health systems interpret them correctly.
Assuming that we have provided the means for recording the contextual information previously described (e.g. information provider, site of care, etc.) (cf. Section 4.1), and that it can be retrieved as part of the answer to each query, the next step is to demonstrate that data rendered according to each model can be queried homogeneously, and that we get an answer even when each model provides the information at a different level of granularity.
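The idea of querying three differently structured models through one representation can be sketched with plain mapping functions. In the Python sketch below, everything is hypothetical: the SmokingInfo structure stands in for the pattern-based representation, the record keys follow the item names of Table 4-2, and the status labels and the assignment of the fictitious patients X, Y and Z to the three models are illustrative assumptions. It only shows that, once mapped, a single query works regardless of the source granularity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SmokingInfo:
    """Hypothetical common, pattern-like view of tobacco smoking data."""
    patient: str
    source: str
    current_smoker: bool
    form: Optional[str]  # None when the source model does not record it
    per_day_at_least: Optional[float]
    date_ceased: Optional[date]


# Hypothetical status labels; real models use coded values (e.g. SNOMED CT).
CURRENT_STATUSES = {"Current smoker", "Heavy tobacco smoker"}


def from_openehr(patient: str, rec: dict) -> SmokingInfo:
    return SmokingInfo(patient, "openEHR", rec["Status"] in CURRENT_STATUSES,
                       rec.get("Form"), rec.get("Typical Smoked Amount"),
                       rec.get("Date Ceased"))


def from_ccda(patient: str, rec: dict) -> SmokingInfo:
    status = rec["Status"]
    # Assumed reading of "heavy" as ">= 10 per day", as in the text.
    at_least = 10.0 if status == "Heavy tobacco smoker" else None
    return SmokingInfo(patient, "C-CDA", status in CURRENT_STATUSES, None, at_least, None)


def from_dcm(patient: str, rec: dict) -> SmokingInfo:
    # Assumption: a recorded daily amount implies current smoking.
    return SmokingInfo(patient, "DCM", True, rec["Type of tobacco use"],
                       float(rec["Number per day"]), None)


records = [
    # Quit in March 2010; the day of the month is an assumption.
    from_openehr("X", {"Status": "Ex-smoker", "Date Ceased": date(2010, 3, 1)}),
    from_ccda("Y", {"Status": "Heavy tobacco smoker"}),
    from_dcm("Z", {"Type of tobacco use": "Cigar", "Number per day": 15}),
]
current = sorted(r.patient for r in records if r.current_smoker)
print(current)  # ['Y', 'Z']
```

Fields that a source model cannot provide stay empty instead of being guessed, so coarser models simply contribute less detail to the common view.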
Tobacco Use (openEHR) Meaningful Use (C-CDA) Tobacco Detailed Use (DCM)
Status Status Number per day
Form Type of tobacco use
Typical Smoked Amount
Pattern of Use
Date Ceased
Pack Years Pack years
Status
Table 4-2 Summary of the data items of each of the three clinical models used for the interoperability use case
Table 4-5 HL7v3 DCM “Cigar tobacco smoker and smokes 15 cigar per day”: Correspondences and pattern triples
4.4 Homogeneous query of data from the three clinical models: Tobacco Use (openEHR), Meaningful Use (HL7 C-CDA) and Tobacco Detailed Use (DCM-HL7v3)
Once the patient data obtained from the three heterogeneous systems have been mapped into their semantic pattern-based representation (cf. Table 4-3, Table 4-4 and Table 4-5), we can apply the correspondences provided by Table 2-15 and Table 2-16 to obtain the OWL DL representation of the data, which allows them to be queried homogeneously and enables the use of DL reasoning.
Next, we formulate a set of queries that could be of interest to a public health system that aggregates data about patients diagnosed with heart failure in order to assess the impact of tobacco smoking on the development of heart failure.
(This use case is fictitious and it only aims at demonstrating the value of the approach proposed with regards to the exploitation of EHR information for public health purposes.)
Question exemplars: How many patients diagnosed with heart failure…
(Q1) … are currently heavy smokers (>10 / day)?
(Q2) … quit smoking within the last 20 years?
(Q3) … never used tobacco in any of its forms (smoke, snuff, etc.)?
Table 4-6 depicts the DL representations of the above questions:
The first question (Q1) should retrieve patients Y and Z. Patient Z's data explicitly state that Z smokes “15 cigar / day”; patient Y's data state that Y is a heavy cigarette smoker. The term is used according to Meaningful Use, meaning “>=10 /day” (here >= 10 cigarettes / day). The query does not specify the smoking form, only that the patients smoke tobacco. Thus, both patients smoke more than 10 / day (cigarettes or cigars), and both information instances will be retrieved together with their contextual information.
The second question (Q2) should retrieve patient X, who has a past smoking history and quit in March 2010 (within the last twenty years). The question asks for all past tobacco smokers, independently of the form, whose smoking end date is on or after March 1994 (within the last twenty years).
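The date condition of Q2 amounts to comparing each recorded end-of-smoking date with a cutoff twenty years before the query date. The Python sketch below assumes a query date of 1 March 2014 and a quit day of 1 March 2010 (the text gives only the months); the helper names are hypothetical.

```python
from datetime import date


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def quit_within(date_ceased: dict, query_date: date, years: int) -> list:
    """Patients whose recorded end-of-smoking date is on or after the cutoff."""
    cutoff = years_before(query_date, years)
    return sorted(p for p, d in date_ceased.items() if d is not None and d >= cutoff)


# Fictitious data mirroring the example: only X is a past smoker.
date_ceased = {"X": date(2010, 3, 1), "Y": None, "Z": None}
print(years_before(date(2014, 3, 1), 20))              # 1994-03-01
print(quit_within(date_ceased, date(2014, 3, 1), 20))  # ['X']
```

Current smokers have no end date and are therefore excluded, matching the expectation that Q2 returns only patient X.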
Finally, the third query (Q3) will not retrieve any patient: two of the patients are current smokers, and the third smoked in the past. We consider that if it has not been explicitly recorded that a patient never smoked, we cannot infer it (absence of information does not mean negation). Unlike traditional databases, which assume complete information, ontologies assume incomplete information, and this affects how data can be queried: in a traditional database, a fact that is not recorded is considered false, whereas in an ontology a statement that has not been explicitly negated is not considered false. This has to be taken into account when formulating queries, which constitutes a barrier for most users. However, based on our present experience, we think that query patterns built on principles similar to those of the semantic patterns described above could hide that complexity.
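The difference between the two assumptions can be illustrated with a small Python sketch. The evidence sets mirror the use case (X is an ex-smoker, Y and Z are current smokers); patient W, with no tobacco data at all, is hypothetical and added only to make the contrast visible. The fact labels are illustrative, not SHN classes.

```python
# Recorded tobacco facts per patient, mirroring the use case.
evidence = {
    "X": {"past_smoker"},
    "Y": {"current_smoker"},
    "Z": {"current_smoker"},
}

def never_smoked_closed_world(evidence):
    """Closed world (database style): anything not recorded is assumed false."""
    return {p for p, facts in evidence.items()
            if not facts & {"current_smoker", "past_smoker"}}

def never_smoked_open_world(evidence):
    """Open world (ontology style): only an explicit negative statement counts."""
    return {p for p, facts in evidence.items() if "never_smoked" in facts}

# Hypothetical patient W: nothing about tobacco has been recorded.
with_w = {**evidence, "W": set()}
print(never_smoked_closed_world(with_w))  # {'W'}
print(never_smoked_open_world(with_w))    # set()
```

With the use case data alone both functions return an empty set; they only diverge for a patient whose record is silent, which is exactly the case a closed-world query would wrongly count as a non-smoker.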
Q3 explicitly asks for those patients whose records state that they do not have a past history of smoking. We will only count those patients in the result and will not treat absence of information as a result. Where we know that this information can be derived from other statements and that the derivation is generalizable to all patients, a specific data instance stating the absence will have to be created, probably automatically whenever certain data are checked for presence. However, the decision whether missing data about a specific patient characteristic can be interpreted as an absent condition may only be possible on an individual basis. In that case, we can only provide mechanisms that make such decisions easier.
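The automatic creation of absence statements described above can be sketched in Python, assuming a hypothetical structured checklist in which each item is either checked or not. The rule, the `Statement` type and the patients V and W are all illustrative assumptions: an item that was checked without a positive finding yields an explicit absence, while an item that was never checked yields nothing and therefore stays unknown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    patient: str
    situation: str
    present: bool  # False = explicitly recorded absence, not missing data

def statements_from_checklist(patient, checked_items, positive_items):
    """Hypothetical rule: checked items become explicit statements
    (present or absent); unchecked items produce no statement at all."""
    return [Statement(patient, item, item in positive_items)
            for item in checked_items]

def explicitly_absent(statements, situation):
    """Patients a Q3-style query would count: explicit absences only."""
    return {s.patient for s in statements
            if s.situation == situation and not s.present}

stmts = statements_from_checklist("W", ["tobacco use", "alcohol use"], {"alcohol use"})
stmts += statements_from_checklist("V", ["alcohol use"], set())  # tobacco never checked
print(explicitly_absent(stmts, "tobacco use"))  # {'W'}
```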
As can be observed in the rendering of the DL queries, the queries also follow the semantic patterns, in this case after their transformation into OWL DL. Q1 follows the observation result semantic pattern, Q2 the past history situation pattern, and Q3 the no past history of clinical situation pattern. All three retrieve the data within their context, which is encoded as part of the evaluation procedure (cf. Table 2-12).
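One way query patterns might hide this complexity is sketched below in Python: a hypothetical function fills an “observation result” query template, so that the user supplies only the domain-specific parts while the quantifiers and nesting prescribed by the pattern stay fixed. The function name and parameters are assumptions; only the generated expression, which reproduces Q1 of Table 4-6, comes from the text.

```python
def observation_result_query(quality, situation, unit, comparison, value):
    """Fill a hypothetical 'observation result' query pattern.

    The caller supplies only the domain-specific parts; the quantifiers
    ('only' / 'some') and the nesting are fixed by the pattern.
    """
    return (
        "shn:ObservationResult and shn:isAboutQuality only ("
        f"{quality} and btl:inheresIn some {situation} "
        "and btl:projectsOnto some (btl:ValueRegion and btl:isRepresentedBy only "
        f"(shn:hasInformationAttribute some {unit} "
        f"and shn:hasValue some int[{comparison}{value}])))"
    )

# Regenerates Q1 (Manchester syntax) from four domain-specific arguments.
q1 = observation_result_query(
    "shn:MassIntake", "sct:TobaccoSmokingSituation", "sct:PerDay", ">", 10)
print(q1)
```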
#Q1
shn:ObservationResult
  and shn:isAboutQuality only (shn:MassIntake
    and btl:inheresIn some sct:TobaccoSmokingSituation
    and btl:projectsOnto some (btl:ValueRegion
      and btl:isRepresentedBy only (shn:hasInformationAttribute some sct:PerDay
        and shn:hasValue some int[>10])))

#Q2
shn:InformationItem
  and btl:isOutcomeOf some sct:Evaluation
  and shn:isAboutSituation only (btl:BiologicalLife
    and btl:hasPart some (sct:TobaccoSmokingSituation
      and btl:projectsOnto some (btl:PointInTime
        and btl:isRepresentedBy some dateTime[>="1994-03-01T00:00:00Z"])))
  and shn:hasInformationAttribute some sct:InThePast

#Q3
shn:InformationItem
  and shn:isAboutSituation only (btl:BiologicalLife
    and not btl:hasPart some sct:TobaccoSmokingSituation)
  and btl:isOutcomeOf some sct:Evaluation
  and shn:hasInformationAttribute some sct:InThePast
Table 4-6 DL Query examples
We have demonstrated that, although the data are expressed at different levels of detail, they can be accessed homogeneously thanks to the underlying model of meaning (i.e. the ontological framework).
We have formulated the above queries as DL queries to make it easier for the reader to understand what is being queried. However, query languages (QLs) based on DL are computationally expensive and therefore have limited scalability. Query languages based on RDF graphs, such as SPARQL, perform better but are agnostic with regard to OWL DL semantics and therefore generally do not support DL reasoning. Nevertheless, combined solutions exist that perform better while partially exploiting DL reasoning capabilities. Besides, QLs such as SPARQL are more expressive, in the sense that their query functionality is closer to that of traditional database QLs such as SQL (ordering, counting, filtering, string matching, etc.).
Table 4-7 shows the rendering of Q2 in SPARQL, with triple patterns written in Turtle syntax and with the COUNT operation used to obtain the number of patients retrieved by the query (here: ?count = “2”), instead of each individual data instance, as the DL query would return (cf. Table 4-8).
Table 4-7 Q2 rendered in SPARQL, including COUNT option
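The difference between returning data instances (as the DL query does) and counting patients (as a SPARQL COUNT does) can be illustrated with a small Python sketch. It is not the SPARQL query of Table 4-7 and performs no OWL reasoning: a tiny basic-graph-pattern matcher, similar in spirit to a SPARQL WHERE clause, evaluates a simplified Q2 pattern (the date restriction is left out for brevity) over hypothetical triples in which two recorded items concern the same patient.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    if binding is None:
        binding = {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break      # constant term does not match
        else:
            yield from match(rest, triples, candidate)

# Hypothetical data: items 1 and 2 come from two evaluations of patient X;
# item 3 (patient Y) lacks the "in the past" attribute.
triples = [
    ("item1", "rdf:type", "shn:InformationItem"),
    ("item1", "shn:hasInformationAttribute", "sct:InThePast"),
    ("item1", "btl:isOutcomeOf", "eval1"),
    ("eval1", "shn:informationSubject", "patientX"),
    ("item2", "rdf:type", "shn:InformationItem"),
    ("item2", "shn:hasInformationAttribute", "sct:InThePast"),
    ("item2", "btl:isOutcomeOf", "eval2"),
    ("eval2", "shn:informationSubject", "patientX"),
    ("item3", "rdf:type", "shn:InformationItem"),
    ("item3", "btl:isOutcomeOf", "eval3"),
    ("eval3", "shn:informationSubject", "patientY"),
]

Q2_PATTERN = [
    ("?item", "rdf:type", "shn:InformationItem"),
    ("?item", "shn:hasInformationAttribute", "sct:InThePast"),
    ("?item", "btl:isOutcomeOf", "?evaluation"),
    ("?evaluation", "shn:informationSubject", "?patient"),
]

rows = list(match(Q2_PATTERN, triples))
instances = {row["?item"] for row in rows}          # instance-level answer
patient_count = len({row["?patient"] for row in rows})  # like COUNT(DISTINCT ?patient)
print(sorted(instances), patient_count)  # ['item1', 'item2'] 1
```

Counting distinct patients rather than matched rows matters whenever the same patient has several matching records.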
Finally, Table 4-8 shows an example data instance retrieved in answer to Q2:
Individual: Instance_Tobacco_Smoking_Situation_Patient
  Types:
    shn:InformationItem
      and shn:isAboutSituation only (btl:BiologicalLife
        and btl:hasPart some (sct:TobaccoSmokingSituation
          and btl:projectsOnto some (shn:EndPointInTime
            and btl:isRepresentedBy value "1994-03-01T00:00:00Z"^^dateTime)))
      and btl:isOutcomeOf value Instance_Evaluation_Process
      and shn:hasInformationAttribute some sct:InThePast

Individual: Instance_Evaluation_Process
  Types:
    sct:HistoryTaking
      and shn:informationProvider value Instance_Patient_X
      and shn:informationSubject value Instance_Patient_X
      and shn:siteOfCare value Instance_Outpatient_Consultation_01 …
Table 4-8 OWL DL rendering of an example data instance retrieved for Q2. There are two instances: one that represents the tobacco smoking situation and its end point in time, and one that provides contextual information.
5 Summary and conclusions
In this document (SemanticHealthNet deliverable 4.3), we have addressed the semantic interoperability challenges that exist when EHRs are used for public-health-specific data analysis. Our objective has not been to address interoperability challenges arising from the use of different EHR standards and representations. We have explained that the granularity, completeness and data quality required for public health inquiries are often not provided by EHRs. Besides, clinical information mostly has to be retrieved within its context in order to be interpreted safely. For instance, it makes a difference whether a piece of information is provided by a clinician, a patient, or a machine. In addition, data acquisition and recording can be highly biased when the data are not considered of primary importance by the clinician, as is often the case with data on recreational drugs like tobacco and alcohol. Thus, all this contextual information is required in order to know to what degree the data can be trusted. Smoking and alcohol use data are usually measured approximately, and data items such as pack-years are often used in public health for assessing CVD risk. There are also limits to semantic interoperability when codes such as Ex-smoker are used to refer both to someone who quit last month after smoking for 30 years and to a person who quit 20 years ago after smoking for only two years. Thus, there are limits to comparability and semantic interoperability that stem from fundamental differences in clinical processes and in facts about the world, and these cannot be resolved at the level of information systems. At best, the degree of uncertainty/vagueness can be estimated.
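The Ex-smoker example can be made concrete with the usual pack-year definition (packs smoked per day multiplied by years of smoking, with 20 cigarettes per pack). In this Python sketch the daily consumption of 20 cigarettes is an assumption, since the text gives only the smoking durations.

```python
CIGARETTES_PER_PACK = 20  # conventional pack size in the pack-year definition

def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Pack-years = packs smoked per day x years of smoking."""
    return cigarettes_per_day / CIGARETTES_PER_PACK * years_smoked

# The two ex-smokers from the example; 20 cigarettes/day is assumed for both.
quit_last_month = {"code": "Ex-smoker", "pack_years": pack_years(20, 30)}
quit_20_years_ago = {"code": "Ex-smoker", "pack_years": pack_years(20, 2)}

print(quit_last_month)    # {'code': 'Ex-smoker', 'pack_years': 30.0}
print(quit_20_years_ago)  # {'code': 'Ex-smoker', 'pack_years': 2.0}
```

Both records carry the same code, yet their cumulative exposure differs fifteenfold, and the time since quitting is not captured at all.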
We have focused on the reuse of data from EHRs for public health purposes, assuming that at least parts of the EHRs are sufficiently standardized and quality-assured. This can be the case with summaries like the HFS. Since the objective of this deliverable is to show how the EHR can be used to satisfy public health goals, we have expanded the Heart Failure Summary (HFS) with two relevant behavioural factors, tobacco and alcohol use. Furthermore, we have described an extension of the underlying semantic architecture, which has been applied to the public health use case in order to provide the semantic representation of the tobacco and alcohol use data and to demonstrate how it improves semantic interoperability.
The use of semantic patterns to support and guide the mapping of structured data into their semantic representation, introduced in D4.2, has been extended and supported by more examples. The selection of the appropriate semantic pattern could be supported by keywords such as “past situation”, “absence”, “observation result”, “test result”, “assessment”, “symptom”, etc., together with the semantic category of each SNOMED CT concept used within the clinical model (e.g. finding, substance, qualifier value).
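A possible form of such support is sketched below in Python. Both lookup tables are hypothetical: the keywords come from the list above, but the pattern names only loosely follow those used in this deliverable, and the role assigned to each semantic category (e.g. qualifier values such as sct:InThePast acting as information attributes) is an assumption for illustration.

```python
# Hypothetical keyword -> semantic pattern table (names are illustrative).
KEYWORD_TO_PATTERN = {
    "past situation": "past history of clinical situation",
    "absence": "absence of clinical situation",
    "observation result": "observation result",
    "test result": "test result",
    "assessment": "assessment",
    "symptom": "symptom",
}

# Assumed role that a concept of each SNOMED CT semantic category plays.
CATEGORY_TO_ROLE = {
    "finding": "clinical situation",
    "substance": "substance",
    "qualifier value": "information attribute",
}

def suggest_pattern(field_label, concept_category):
    """Return candidate patterns for a data field and the role of its bound concept."""
    label = field_label.lower()
    candidates = [p for kw, p in KEYWORD_TO_PATTERN.items() if kw in label]
    role = CATEGORY_TO_ROLE.get(concept_category, "needs manual review")
    return candidates, role

print(suggest_pattern("Tobacco smoking: past situation", "finding"))
# (['past history of clinical situation'], 'clinical situation')
```

Unknown categories fall back to manual review rather than a guess, which keeps the final binding decision with the modeller.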
We have provided evidence that SNOMED CT, as the underlying domain ontology, bears a considerable risk that users are misled by the textual description of a concept (e.g. bumetanide as a substance vs. as a product). In such cases, the use of a semantic pattern could also help to guide the terminology binding. We have also shown that some SNOMED CT concepts are placed under the wrong semantic category (e.g. they represent information but are placed under a clinical entity category), that some concepts are underspecified (e.g. there is no explicit relationship between substance consumption and the related substance), and that SNOMED CT is not always expressive enough for our use case (e.g. there is no concept for representing an occasional tobacco snuff user, although there is one for an occasional tobacco user).
Given the above, the use of SNOMED CT as the domain terminology within the EHR requires some decisions to be made, such as (i) creating value sets to be used across medical specialties; (ii) clarifying the role of the terminology within the EHR by defining what kinds of entities should be represented (information vs. clinical entities); and (iii) adding axioms to hitherto underspecified concepts. To this end, the proposed approach does not aim at forbidding the use of complex terms, but at placing them correctly in the ontology so that they can be used consistently within information models. This contrasts with other approaches, which are normative and disallow the use of complex terms. The SemanticHealthNet approach, instead, is descriptive rather than normative, and it is suited to mediating across heterogeneous normative approaches, as we have demonstrated.
Although not perfect, SNOMED CT is increasingly being adopted by different countries in their national healthcare systems. It is increasingly grounded on formal ontological principles and is being redesigned according to these principles. SemanticHealthNet is keeping pace with SNOMED CT’s progress, which, however, should not preclude the use of other medical terminologies. The fact that other terminological standards like ICD, LOINC and ICNP are in the process of being harmonized with SNOMED CT via a common ontology demonstrates an increasing concern to clarify the meaning of medical terms through formal-ontological grounding, thus following a route similar to that of SemanticHealthNet.
Given the state of the art of semantic technologies, the biggest challenge consists in finding intermediate, scalable solutions that leave the door open to upcoming progress in the field. This might influence the choice between a logic-based and a non-logic-based language, as well as the type of reasoning performed, if any, which may depend on the technological state of the art and on the semantic interoperability requirements. Moreover, several renderings of a rich representation might be required, each serving a different purpose, e.g. data validation vs. data query.
In the third project year, SemanticHealthNet WP4 will open its scope to the semantic annotation of formalised clinical practice guidelines. It will also explore the relationships with existing modelling approaches (e.g. CIMI, SIAMM, etc.) and standards such as ContSys (CEN ISO/DIS 13940). The latter defines a system of concepts from an enterprise/clinical perspective, which represents both the content and the context of healthcare services under a process view and a generic clinical process model.¹⁶ It seems to be complementary to the proposed approach. ContSys pays attention to the modelling of healthcare and clinical processes, considering their workflows, whereas in the SemanticHealthNet approach, as developed so far, the sequence of healthcare processes that ends up producing the data has not been modelled in detail. Yet there is some overlap, e.g. in concepts like “health condition”, “observed condition”, “observed condition value” or “health component specification”, for which we have found parallels in our use of “clinical situation”, “observation result about a clinical situation quality” and “result of the observation of the clinical situation quality”. We will examine this standard in more depth in order to identify ways of harmonization.