Swat4 ls fca_slides

Refining Health Outcomes of Interest using Formal Concept Analysis and Semantic Query Expansion

Olivier Curé1, Henri Maurer2, Paea Le Pendu3, Nigam Shah3

1: CNRS LIGM lab, UPEM, France
2: Edinburgh University, UK
3: BMIR lab, Stanford University, USA


Problem setting

● Applications need to select, extract, compare and analyze groups of patients using Electronic Health Records (EHRs)

● This requires defining Health Outcomes of Interest (HOIs), e.g. myocardial infarction, chronic obstructive pulmonary disease.

● With clinical text, these definitions should capture variations of terms and ensure good precision and recall of the text-mining process.


Problem setting (2)

● It is not practical to define these HOIs precisely with concept identifiers, e.g. UMLS CUIs.

● We provide a solution that produces and refines HOI definitions from terms provided by the end-user.

● Our solution aims to propose sound and complete definitions in a best-effort way.


Approach overview

[Figure: approach overview diagram. Labels: Diseases, Procedures, Drugs, Devices; BioPortal knowledge; terms, concepts; Terminology3 DB; Semantic Query Expansion; Formal Concept Analysis; Statistics-Based Pruning]


SQE

● Improve search results by expanding queries with the transitive closure of the subsumption relationship of ontology concepts.

● Queries can be generalized (resp. specialized) via expansions with ancestors (resp. descendants).

● Ex: expanding a query with 'neoplasm' or 'tumor' when searching for 'cancer'.
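
The expansion described above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the toy `IS_A` hierarchy and the function names `ancestors`, `descendants` and `expand` are assumptions made up for the example.

```python
from collections import deque

# Hypothetical toy hierarchy: child concept -> set of direct parents (is-a).
IS_A = {
    "carcinoma": {"cancer"},
    "cancer": {"tumor"},
    "tumor": {"neoplasm"},
}


def ancestors(concept, is_a):
    """Transitive closure of the is-a relation, going upwards."""
    seen, queue = set(), deque([concept])
    while queue:
        current = queue.popleft()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def descendants(concept, is_a):
    """All concepts that have `concept` among their ancestors."""
    return {c for c in is_a if concept in ancestors(c, is_a)}


def expand(query, is_a, generalize=False):
    """Specialize the query with descendants, or generalize it with ancestors."""
    extra = ancestors(query, is_a) if generalize else descendants(query, is_a)
    return {query} | extra


print(expand("cancer", IS_A, generalize=True))
print(expand("cancer", IS_A))
```

In this toy hierarchy, generalizing 'cancer' adds 'tumor' and 'neoplasm', while specializing it adds 'carcinoma'.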


FCA

● Abstract conceptual descriptions from a set of objects described by some attributes.

● Used in machine learning and knowledge management.

● A formal context is a triple (G, M, I): a set of objects G, a set of attributes M, and a binary relation I between G and M.

● A formal context can be represented as a matrix.
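
As a rough sketch of the matrix view, the snippet below stores a formal context as a 0/1 matrix and implements the two derivation operators of FCA. The objects, attributes and incidence values are hypothetical and chosen only for illustration.

```python
# Hypothetical toy context: objects G, attributes M, incidence I as a 0/1 matrix.
G = ["o1", "o2", "o3"]
M = ["a", "b", "c"]
I = [
    [1, 1, 0],  # o1 has a, b
    [1, 0, 1],  # o2 has a, c
    [1, 1, 1],  # o3 has a, b, c
]


def common_attributes(objects):
    """Attributes shared by every object in `objects`."""
    rows = [I[G.index(g)] for g in objects]
    return {m for j, m in enumerate(M) if all(row[j] for row in rows)}


def common_objects(attributes):
    """Objects that carry every attribute in `attributes`."""
    cols = [M.index(m) for m in attributes]
    return {g for i, g in enumerate(G) if all(I[i][j] for j in cols)}


shared = common_attributes({"o1", "o3"})
print(shared, common_objects(shared))
```

A pair (A, B) is a formal concept when `common_attributes(A) == B` and `common_objects(B) == A`; here ({o1, o3}, {a, b}) is one such pair.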


FCA (2)

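[Figure: concept lattice; nodes are shown as extent-intent pairs, with ⊥ and ⊤ marking the bottom and top of the lattice]

{1,2}-{CF1,F1,CF2,F2}
{3}-{CF1,F1,MF2,F2}
{6}-{BLF1,F1,MF2,F2}
{4,5}-{BLF1,F1,BLF2,F2}
{1,2,3}-{CF1,F1,F2}
{3,6}-{MF2,F1,F2}
{4,5,6}-{BLF1,F1,F2}
{1,2,3,4,5,6}-{F1,F2}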
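
The lattice above can be reproduced with a naive enumeration. The underlying context table is not part of this transcript, so `CONTEXT` below is an assumption inferred from the listed pairs (for example, objects 1 and 2 are taken to carry exactly CF1, F1, CF2 and F2); the function names are likewise illustrative.

```python
from itertools import combinations

# Context inferred (an assumption) from the extent-intent pairs listed above.
CONTEXT = {
    1: {"CF1", "F1", "CF2", "F2"},
    2: {"CF1", "F1", "CF2", "F2"},
    3: {"CF1", "F1", "MF2", "F2"},
    4: {"BLF1", "F1", "BLF2", "F2"},
    5: {"BLF1", "F1", "BLF2", "F2"},
    6: {"BLF1", "F1", "MF2", "F2"},
}
ALL_ATTRIBUTES = set().union(*CONTEXT.values())


def intent(objects):
    """Attributes shared by all given objects (all attributes if none given)."""
    result = set(ALL_ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def extent(attributes):
    """Objects carrying all given attributes."""
    return {g for g, attrs in CONTEXT.items() if attributes <= attrs}


def formal_concepts():
    """Naive enumeration: close every subset of objects and deduplicate."""
    found = set()
    for size in range(len(CONTEXT) + 1):
        for subset in combinations(CONTEXT, size):
            b = intent(subset)
            found.add((frozenset(extent(b)), frozenset(b)))
    return found


concepts = formal_concepts()
for a, b in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(a), sorted(b))
```

Under this assumed context the enumeration yields nine formal concepts: the eight extent-intent pairs listed above plus the bottom concept with an empty extent. The subset-closing loop is exponential in the number of objects, so it is only suitable for tiny contexts; dedicated algorithms such as NextClosure avoid that cost.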


Method

● SQE: relational database approach

– We use the ontologies stored in Stanford's DB and its materialization of concept subsumption (almost 14 million entries).

● FCA: objects and attributes of the formal context are concept identifiers (UMLS concept identifiers).
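
With a materialized subsumption table, the SQE step reduces to a single indexed lookup instead of a recursive traversal. The sketch below uses an in-memory SQLite database to illustrate the idea; the table name `subsumption`, its columns and the concept identifiers are hypothetical and do not reflect the actual schema of the database mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (descendant, ancestor) pair, with the
# transitive closure already materialized.
conn.execute("CREATE TABLE subsumption (descendant TEXT, ancestor TEXT)")
conn.execute("CREATE INDEX idx_desc ON subsumption (descendant)")
conn.execute("CREATE INDEX idx_anc ON subsumption (ancestor)")
conn.executemany(
    "INSERT INTO subsumption VALUES (?, ?)",
    [
        ("C_child", "C_parent"),
        ("C_child", "C_root"),   # indirect ancestor stored explicitly
        ("C_parent", "C_root"),
    ],
)


def ancestors_of(concept_id):
    """Generalization: every ancestor in one lookup, no recursion needed."""
    rows = conn.execute(
        "SELECT ancestor FROM subsumption WHERE descendant = ? ORDER BY ancestor",
        (concept_id,),
    )
    return [row[0] for row in rows]


def descendants_of(concept_id):
    """Specialization: every descendant in one lookup."""
    rows = conn.execute(
        "SELECT descendant FROM subsumption WHERE ancestor = ? ORDER BY descendant",
        (concept_id,),
    )
    return [row[0] for row in rows]


print(ancestors_of("C_child"))
print(descendants_of("C_root"))
```

Materializing the closure trades storage (one row per ancestor-descendant pair) for constant-depth queries.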


Method (3)

● To improve relevance, an FCA-based pruning approach identifies potential concepts among the discovered ones.

● The formal context is composed of matching concepts as objects and candidate concepts as attributes.

● Thus the binary relation corresponds to the subsumption relationship.
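
One way to build such a context is sketched below, assuming the relation holds between a matching concept g and a candidate concept m whenever m is an ancestor of g. That direction, the identifiers (`m1`, `d1`, ...) and the helper `build_context` are assumptions for illustration only.

```python
# Hypothetical input: each matching concept mapped to the candidate concepts
# that subsume it (e.g. as returned by the query expansion step).
ANCESTORS = {
    "m1": {"d1", "d2"},
    "m2": {"d2"},
    "m3": {"d2", "d3"},
}


def build_context(ancestors):
    """Return (objects, attributes, 0/1 incidence matrix) for the context."""
    objects = sorted(ancestors)
    attributes = sorted(set().union(*ancestors.values()))
    matrix = [[int(m in ancestors[g]) for m in attributes] for g in objects]
    return objects, attributes, matrix


objects, attributes, matrix = build_context(ANCESTORS)
for g, row in zip(objects, matrix):
    print(g, row)
```

Each row is a matching concept and each column a candidate concept; a 1 marks a subsumption link.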


Method (4)

● Ex: 10365: “hyperlipoproteinemia type iv” and 740154: “disease, disorder or finding”

● Standard FCA algorithms are used to define the FCA lattice.


Method (5)

● Qualifying a discovered concept is performed using a top-down navigation of the FCA lattice.

● For each formal concept <Ai,Bi>, we compute the transitive closure of the sub-concepts of Ai (resp. Bi), denoted LAi (resp. LBi).

● If |LBi ∩ LAi| / |LBi| ≥ Θ, with Θ a predefined pruning threshold, then Bi is a potential concept.
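
The threshold test can be sketched as follows. This is a minimal illustration under stated assumptions: the `children` map and concept names are hypothetical, the closure excludes the seed concepts themselves unless they are reachable from another seed, and an empty LBi is treated as not qualifying. None of these choices is confirmed by the slides.

```python
def descendants_closure(concepts, children):
    """All sub-concepts reachable from `concepts` via the `children` map."""
    seen, stack = set(), list(concepts)
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def is_potential(extent, intent, children, theta):
    """Check |LBi ∩ LAi| / |LBi| >= theta for a formal concept <Ai, Bi>."""
    la = descendants_closure(extent, children)
    lb = descendants_closure(intent, children)
    if not lb:
        return False  # assumption: an empty LBi never qualifies
    return len(lb & la) / len(lb) >= theta


# Hypothetical hierarchy: candidate "d" subsumes two matching concepts and "x".
CHILDREN = {"d": ["m1", "m2", "x"], "m1": ["s1"], "m2": ["s2"]}
print(is_potential({"m1", "m2"}, {"d"}, CHILDREN, theta=0.75))
print(is_potential({"m1", "m2"}, {"d"}, CHILDREN, theta=0.3))
```

Here LAi = {s1, s2} and LBi = {m1, m2, x, s1, s2}, so the ratio is 2/5: the candidate is rejected at Θ = 0.75 and accepted at Θ = 0.3.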


Method (6)

● Concept sets:

– M: matching

– D: discovered

– P: potential

– C: other concepts


Example

● A search for “hypercholesterolemia” over 18 ontologies yields:

– 20 matching concepts (i.e., FCA objects)

– 102 discovered concepts (i.e., FCA attributes)

● Generates an FCA lattice with 67 formal concepts

● First formal concept satisfying a Θ=.75 pruning threshold is at the 4th level of the lattice: only 4 concepts out of 16 LBi are covered by LAi .

● These 4 concepts have the following preferred labels: “hypercholesterolemia”, “cholesterolosis”, “secondary hypercholesterolemia” and “hyperlipidemia”.


Method (7)

● We include interactions with the end-user to validate potential discoveries.

● Hence the domain expert has the final decision on acceptance/rejection of a proposition.

● Important issue: trade-off between user interactions and precision/recall of results.

● The end-user can validate whenever she wants.

● Interactions are performed in a web interface providing additional information on the search (clinical text snippets, number of patients).


Evaluation

● i2b2 obesity NLP reference set used as an evaluation data set

● The gold standard is the set of results from a previous experiment conducted at Stanford.

● Evaluation in terms of specificity, sensitivity and duration of computation (on commodity hardware)
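
For reference, sensitivity and specificity against a gold standard can be computed as in the sketch below. The sets and the helper `evaluate` are hypothetical and are not the evaluation data or code used in this work.

```python
def evaluate(retrieved, gold, universe):
    """Sensitivity and specificity of `retrieved` against `gold`."""
    tp = len(retrieved & gold)             # retrieved and in the gold standard
    fp = len(retrieved - gold)             # retrieved but not in the gold standard
    fn = len(gold - retrieved)             # in the gold standard but missed
    tn = len(universe - retrieved - gold)  # correctly left out
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical toy data: ten items, four of them in the gold standard.
universe = set(range(10))
gold = {0, 1, 2, 3}
retrieved = {0, 1, 2, 4}
scores = evaluate(retrieved, gold, universe)
print(scores)
```

Sensitivity is the fraction of gold-standard items that were retrieved; specificity is the fraction of non-gold items that were correctly left out.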


Evaluation (2)

● An improvement of 2% and 3% on sensitivity and specificity, respectively.

● Computation takes on the order of seconds on a standard laptop.


Evaluation (3)

● More interesting is that some of our false negatives seem to be relevant to the search.

● Some of these false negatives come from both the matching and the potential (i.e. FCA-based) approaches:

● Matching example:

– “sitosterolemia” for hypercholesterolemia

● Potential examples:

– “h/o: raised blood, familial hyperlipoproteinemia”, “fh: raised blood lipids” for hypercholesterolemia (while the gold standard contains concepts such as “hyperlipoproteinemia type ii”), which confirms the relevance of using a semantic approach.

● Note that among our true positives, depending on the use case, a significant number of items have been retrieved from the potential concept set, i.e., using our FCA statistical approach.


Conclusion

● We have proposed a semi-automatic solution for defining HOIs.

● Approach uses SQE and FCA enriched with a statistical approach.

● Our results are comparable to state-of-the-art methods.

● It refines HOI definitions efficiently with relevant terms/concepts.


Future work

● Conduct user-driven evaluations with clinicians and researchers.

● Analyze acceptance/rejection of end-users in practical scenarios.

● Use active learning over past query refinements to improve future queries.

● Study our method's impact on mining EHR clinical notes and on cohort-building tools.


Thanks

Questions ?

