
Current advances to bridge the usability-expressivity gap in biomedical semantic search (and visualizing linked data)

Jul 17, 2015

Maulik Kamdar
Page 1: Current advances to bridge the usability-expressivity gap in biomedical semantic search (and visualizing linked data)

CURRENT ADVANCES TO BRIDGE THE USABILITY-EXPRESSIVITY GAP IN BIOMEDICAL SEMANTIC SEARCH (AND VISUALIZING LINKED DATA)

Maulik R. Kamdar Biomedical Informatics PhD Program

3rd April 2015

Page 2

QUERYING HETEROGENEOUS DATASETS ON THE LINKED DATA WEB
André Freitas, Edward Curry, João Gabriel Oliveira and Seán O'Riain
Internet Computing, February 2012

EVALUATING THE USABILITY OF NATURAL LANGUAGE QUERY LANGUAGES AND INTERFACES TO SEMANTIC WEB KNOWLEDGE BASES
Esther Kaufmann and Abraham Bernstein
Journal of Web Semantics, November 2010

Page 3

INTRODUCTION

• Opportunities
  - Builds on existing Web infrastructure (URIs and HTTP) and Semantic Web standards (RDF, RDFS, vocabularies)
  - Reduces barriers to data publication, consumption, reuse and availability, adding a fine-grained structure
  - Exposes previously siloed databases as data graphs (D2R, Google Refine) to be interlinked and integrated with other datasets, creating a global-scale interlinked dataspace

• Challenges
  - Users must know which exposed datasets potentially contain the data they want, where those datasets are located, and what data model they use
  - Syntax of structured query languages like SPARQL
  - Heterogeneous, loosely connected (as yet!) and distributed data sources, with different descriptors for the same entity

Page 4

USABILITY-EXPRESSIVITY GAP

Page 5

USABILITY-EXPRESSIVITY GAP

Page 6

USABILITY-EXPRESSIVITY GAP

Page 7

USABILITY-EXPRESSIVITY GAP

Page 8

USABILITY-EXPRESSIVITY GAP

Page 9

USABILITY-EXPRESSIVITY GAP

Page 10

EXISTING APPROACHES

• Information Retrieval Approaches
  - Entity-centric Search (SWSE, Sindice)
  - Structure Search (Semplore): use of inverted indexes and user feedback strategies

• Natural Language Queries
  - Question Answering (PowerAqua, FREyA)
  - Difficult to expand across domains
  - Best-effort Natural Language Interfaces (Treo)
  - Habitability Problem: users need guidance and support
  - WordNet/Wikipedia semantic approximation techniques

• Structured SPARQL Queries

Page 11

CHALLENGE DIMENSIONS

• Query expressivity
  - Query datasets by referencing elements in the data model, and operate over the data (aggregate results, express conditional statements).

• Usability
  - An easy-to-operate, intuitive, and task-efficient query interface.

• Vocabulary-level semantic matching
  - Semantically match query terms to dataset vocabulary-level terms.

• Entity reconciliation
  - Match entities expressed in the query to semantically equivalent dataset entities.

• Semantic tractability mechanisms
  - Answer queries not supported by explicit dataset statements (for example, “Is Natalie Portman an Actress?” can be supported by the statement “Natalie Portman starred in Star Wars”).

Page 12
Page 13
Page 14

GOOGLE KNOWLEDGE GRAPH

Page 15

GOOGLE KNOWLEDGE GRAPH

Page 16

BIOMEDICAL MOTIVATION

[Figure: compound screening stages. Stage labels: Query databases, Virtual Screening, Literature. Compound counts: ~300 000 compounds, ~300 interesting compounds, ~10 interesting compounds, ~5 compounds. Other labels: Hypothesis Generation, (Linked) Data]

“Are there Drugs with molecular weight under 400 tested against ‘Colon Cancer’?”

“Do any Publications refer to assays using ‘Aspirin’ as the primary Drug in treatment of ‘Prostate Cancer’?”

Page 17

REVEALD: A USER-DRIVEN DOMAIN-SPECIFIC INTERACTIVE SEARCH PLATFORM FOR BIOMEDICAL RESEARCH

Maulik R. Kamdar, Dimitris Zeginis, Ali Hasnain, Stefan Decker and Helena F. Deus

Journal of Biomedical Informatics February 2014

Page 18

CHALLENGES

• Awareness of which exposed datasets potentially contain the data users want, and of their data models.

• Large, heterogeneous biomedical data sources, which are too dynamic for reliable data centralization.

• Assembling SPARQL queries to create the aggregated information for bioinformatics analysis still poses a high cognitive entry barrier.

• Human-readable, and more specifically domain-specific, representation of query results is required.

• None of the previous systems had been tested in biomedical domains, except DistilBio, VIQUEN and Cuebee.

• Trade-off between expressivity and usability.

Page 19

BACKGROUND: CANCO DOMAIN-SPECIFIC MODEL

Zeginis, Dimitris, et al. "A collaborative methodology for developing a semantic model for interlinking Cancer Chemoprevention linked-data sources." Semantic Web 5.2 (2014): 127-142.

Page 20

BACKGROUND: CANCO DOMAIN-SPECIFIC MODEL

Zeginis, Dimitris, et al. "A collaborative methodology for developing a semantic model for interlinking Cancer Chemoprevention linked-data sources." Semantic Web 5.2 (2014): 127-142.

Page 21

LIFE SCIENCES LINKED OPEN DATA CLOUD

Life Sciences: ~3 billion triples, 53 datasets

Cyganiak,R. and Jentzsch,A. (2014) The Linking Open Data cloud diagram. http://lod-cloud.net/ [Accessed: March 23, 2013]

Page 22

BACKGROUND: CATALOGUING & LINKING

1248 concepts and 1255 properties were harvested from more than 53 Linked Biomedical Data Sources (LBDS) into the Life Sciences Linked Open Data (LSLOD) catalogue and linked to the CanCO Query Elements.

Hasnain, Ali, et al. "Cataloguing and linking life sciences LOD cloud." 1st International Workshop on Ontology Engineering in a Data-driven World (OEDW 2012).

Page 23

BACKGROUND: ENTITY RECONCILIATION

Page 24

BACKGROUND: FEDERATED ARCHITECTURE

Chebi:Compound    void-ext:subClassOf  Granatum:Molecule
Pubchem:Compound  void-ext:subClassOf  Granatum:Molecule

SPARQL Query:        ?molec a Granatum:Molecule
Transformed Queries: ?molec a Chebi:Compound
                     ?molec a Pubchem:Compound

[Architecture diagram. Data sources: Chebi, DrugBank, UniProt, Others (Life Sciences Linked Open Data, LSLOD). Components: LSLOD Catalogue, CanCO, Saved Queries, Rule Templates, Experimental Datasets, Query Engine, Query Logging, Social Collaborative Workspace. Processes: RDFization, Cataloguing & Links Creation, Transformation]

Hasnain, Ali, et al. "A Roadmap for navigating the Life Sciences Linked Open Data Cloud." International Semantic Technology (JIST2014) conference. 2014.
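The query transformation on this slide can be sketched roughly as follows. This is a minimal illustration, not the platform's actual engine: the `SUBCLASS_LINKS` list and the `transform_pattern` function are hypothetical names, and the catalogue is assumed to be available as a plain list of (source class, predicate, target concept) triples.

```python
# Hypothetical in-memory stand-in for the catalogue's subclass links
# (the two links shown on the slide).
SUBCLASS_LINKS = [
    ("Chebi:Compound", "void-ext:subClassOf", "Granatum:Molecule"),
    ("Pubchem:Compound", "void-ext:subClassOf", "Granatum:Molecule"),
]


def transform_pattern(variable, concept, links=SUBCLASS_LINKS):
    """Rewrite '<variable> a <concept>' into one pattern per linked source class."""
    source_classes = [
        source
        for (source, predicate, target) in links
        if predicate == "void-ext:subClassOf" and target == concept
    ]
    # Assumption: when no link exists, the original concept is queried as-is.
    targets = source_classes or [concept]
    return [f"{variable} a {cls}" for cls in targets]


for pattern in transform_pattern("?molec", "Granatum:Molecule"):
    print(pattern)
# prints:
# ?molec a Chebi:Compound
# ?molec a Pubchem:Compound
```

Each transformed pattern would then presumably be sent to the data source that uses that class.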

Page 25

BACKGROUND: FEDERATED ARCHITECTURE

• Non-intuitive
• SPARQL, RDF, Schema knowledge required
• Domain-specific visualization of results is not possible

Page 26

REVEALD SEARCH PLATFORM

• ReVeaLD (Real-Time Visual Explorer and Aggregator of Linked Data) is a user-driven domain-specific search platform.

• Intuitively formulate advanced search queries using a click-input-select mechanism.

• Visualize the results in a domain-suitable format.

• Entity-centric and Visual Query Search System.

• Assembly of the query is governed by a Domain-specific Language (DSL), which in this case is the Cancer Chemoprevention Ontology (CanCO).

Page 27

REVEALD SEARCH PLATFORM

Demo: https://www.youtube.com/watch?v=6HHK4ASIkJM&hd=1

Page 28

REVEALD SEARCH PLATFORM

Demo: https://www.youtube.com/watch?v=6HHK4ASIkJM&hd=1

Page 29

DSL VISUAL REPRESENTATION

• Concept Map Visualization

Page 30

VISUAL QUERY BUILDER

CanCO DSL

Page 31

VISUAL QUERY BUILDER

CanCO DSL

Page 32

VISUAL QUERY MODEL

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX granatum: <http://chem.deri.ie/granatum/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT * WHERE {
  ?x0_Assay a granatum:Assay ;
      granatum:hasInput ?x1_Target ;
      granatum:identify ?x2_ChemopreventiveAgent ;
      granatum:outcome_method ?x3_outcome_method .
  ?x1_Target granatum:title ?x4_title .
  ?x2_ChemopreventiveAgent granatum:molecularWeight ?x10_molecularWeight ;
      granatum:SMILESnotation ?x9_SMILESnotation ;
      granatum:hasFormula ?x7_hasFormula ;
      granatum:HBD ?x5_Hydrogen_Bond_Donors ;
      granatum:HBA ?x6_Hydrogen_Bond_Acceptors ;
      granatum:TPSA ?x8_Topological_Polar_Surface_Area .
  FILTER regex(xsd:string(?x4_title), "estrogen receptor", "is")
  FILTER ( xsd:double(?x10_molecularWeight) < 300 )
}
LIMIT 100

[Figure: visual query model and its SPARQL translation. Data sources shown: Pubchem, ChEBI, Uniprot]

All Assays, which Target Estrogen Receptors present in Human (Organism), and which identify potential Chemopreventive Agents with Molecular Weight < 300

http://srvgal78.deri.ie:8080/explorer?type=sampleQuery&nodes=17-1-30-33-73-78-91-81-82-92-98-63&links=17.1-17.30-1.33-17.73-17.78-1.91-30.81-30.82-30.92-30.98-33.63&filters=1.91.c.estrogen%20receptor|30.98.lt.300|33.63.c.human&flexible=1
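The saved-query URL above appears to encode the visual query model in its `nodes`, `links` and `filters` parameters. Here is a minimal sketch of decoding it with the Python standard library. The function name `parse_query_url` and the reading of each filter as `<node>.<node>.<operator>.<value>` are assumptions made for illustration. So is the guess that `c` means "contains" and `lt` means "less than". None of this is documented on the slide.

```python
from urllib.parse import parse_qs, urlsplit

URL = (
    "http://srvgal78.deri.ie:8080/explorer?type=sampleQuery"
    "&nodes=17-1-30-33-73-78-91-81-82-92-98-63"
    "&links=17.1-17.30-1.33-17.73-17.78-1.91-30.81-30.82-30.92-30.98-33.63"
    "&filters=1.91.c.estrogen%20receptor|30.98.lt.300|33.63.c.human&flexible=1"
)


def parse_query_url(url):
    """Decode node ids, node-to-node links and filters from a saved-query URL."""
    params = parse_qs(urlsplit(url).query)  # also decodes %20 to a space
    nodes = [int(n) for n in params["nodes"][0].split("-")]
    links = [
        tuple(int(i) for i in link.split("."))
        for link in params["links"][0].split("-")
    ]
    filters = []
    for raw in params["filters"][0].split("|"):
        # Assumed layout: <node>.<node>.<operator>.<value>
        source, target, operator, value = raw.split(".", 3)
        filters.append((int(source), int(target), operator, value))
    return {"nodes": nodes, "links": links, "filters": filters}


model = parse_query_url(URL)
print(model["filters"])
# prints: [(1, 91, 'c', 'estrogen receptor'), (30, 98, 'lt', '300'), (33, 63, 'c', 'human')]
```

Under this reading, the first two filters line up with the `regex` and `< 300` FILTER clauses of the SPARQL query on this slide.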

Page 33

REVEALD DATA BROWSER

Page 34

REVEALD DATA BROWSER

Page 35

REVEALD DATA BROWSER

Page 36

REVEALD DATA BROWSER

Page 37

GRAPHIC RULES

• Query: SELECT * WHERE {<clickedURI> ?p ?o}

• Results are subjected to a set of Graphic Rules, which follow the Event-Condition-Action (ECA) paradigm and provide visual representations using the Fresnel Display Vocabulary.

• Example:
  - Event: each triple retrieved as a query execution result
    <http://www4.wiwiss.fu-berlin.de/drugbank/resource/targets/844> <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/pdbIdPage> “http://www.pdb.org/pdb/explore/explore.do?structureId=1IVO”
  - Condition: sdf_file or pdbIdPage (Predicate) + http (Object)
  - Action: HTTP GET and invoke a specific Resource Renderer
  - Resource Renderer: GLMol Molecular Viewer
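The Event-Condition-Action flow in the example can be sketched as follows. This is an illustration under stated assumptions, not ReVeaLD's implementation: `GraphicRule`, `on_triple` and the renderer functions are hypothetical names, the rule is written in plain Python rather than in the Fresnel Display Vocabulary, and the action only returns a label instead of performing the HTTP GET and launching GLMol.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class GraphicRule:
    condition: Callable[[Triple], bool]
    action: Callable[[Triple], str]


def is_structure_link(triple: Triple) -> bool:
    """Condition: predicate is sdf_file or pdbIdPage and the object is an http URL."""
    _, predicate, obj = triple
    return predicate.endswith(("sdf_file", "pdbIdPage")) and obj.startswith("http")


def render_molecule(triple: Triple) -> str:
    # Placeholder action: a real renderer would fetch triple[2] over HTTP
    # and hand the structure file to the molecular viewer.
    return f"GLMol Molecular Viewer <- {triple[2]}"


def render_text(triple: Triple) -> str:
    """Fallback: plain textual representation of the triple."""
    return " ".join(triple)


RULES = [GraphicRule(is_structure_link, render_molecule)]


def on_triple(triple: Triple, rules=RULES) -> str:
    """Event handler: apply the first rule whose condition holds, else show text."""
    for rule in rules:
        if rule.condition(triple):
            return rule.action(triple)
    return render_text(triple)


example = (
    "http://www4.wiwiss.fu-berlin.de/drugbank/resource/targets/844",
    "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/pdbIdPage",
    "http://www.pdb.org/pdb/explore/explore.do?structureId=1IVO",
)
print(on_triple(example))
# prints: GLMol Molecular Viewer <- http://www.pdb.org/pdb/explore/explore.do?structureId=1IVO
```

Keeping conditions and actions as separate callables means a new renderer needs only one more `GraphicRule` entry.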

Page 38

SINGLE ENTITY SEARCH

Page 39

EVALUATION

• Tracking Real-time User Experience (TRUE) methodology, widely used in the HCI community to evaluate computer games.

• Game-based evaluation in which domain users are given tasks to complete, with time and interactions tracked using Google Analytics.

• Subjectivistic evaluation in which users were asked to fill out a survey.

• The evaluation focused on two usability concerns:
  - Does the users' familiarity with the DSL affect the time needed to formulate the query?
  - Does a constrained (smaller) DSL lead to less time needed for query formulation?

Page 40

EVALUATION RESULTS

Page 41

EVALUATION RESULTS

Page 42

EVALUATION RESULTS

Page 43

OTHER IMPLEMENTATIONS: LINKED TCGA

Saleem, M., Kamdar, M. R., et al. (2014). Big linked cancer data: Integrating linked TCGA and PubMed. Web Semantics: Science, Services and Agents on the World Wide Web, 27, 34-41.

http://srvgal78.deri.ie/tcga-pubmed/

Page 44

OTHER IMPLEMENTATIONS: LINKED TCGA

Saleem, M., Kamdar, M. R., et al. (2014). Big linked cancer data: Integrating linked TCGA and PubMed. Web Semantics: Science, Services and Agents on the World Wide Web, 27, 34-41.

http://srvgal78.deri.ie/tcga-pubmed/

Page 45

OTHER IMPLEMENTATIONS: LINKED TCGA

Saleem, M., Kamdar, M. R., et al. (2014). Big linked cancer data: Integrating linked TCGA and PubMed. Web Semantics: Science, Services and Agents on the World Wide Web, 27, 34-41.

http://srvgal78.deri.ie/tcga-pubmed/

Page 46

OTHER IMPLEMENTATIONS: LINKED TCGA

Saleem, M., Kamdar, M. R., et al. (2014). Big linked cancer data: Integrating linked TCGA and PubMed. Web Semantics: Science, Services and Agents on the World Wide Web, 27, 34-41.

http://srvgal78.deri.ie/tcga-pubmed/

Page 47

OTHER IMPLEMENTATIONS: LINKEDPPI

Kazemzadeh, L., Kamdar, M. R., et al. LinkedPPI: Enabling Intuitive, Integrative Protein-Protein Interaction Discovery. Linked Science, 48.

Page 48

OTHER IMPLEMENTATIONS: LINKEDPPI

Kazemzadeh, L., Kamdar, M. R., et al. LinkedPPI: Enabling Intuitive, Integrative Protein-Protein Interaction Discovery. Linked Science, 48.

Page 49

DISCUSSION

• DSL Incrementation Mechanism
  - Extend the current model represented in the Visual Query Builder by adding new concepts and properties.
  - Use or merge publicly available extensions of the DSL.

• No reliance on the Federated Query Engine, SPARQL Endpoint, underlying DSL and Graphic Rules.

• Corrupt Graphic Rules result in a textual representation of the relevant triple.

• Domain-specific Languages increase usability and enable abstraction of underlying data models.

Query expressivity: Medium (SELECT, FILTER, OPTIONAL)
Usability: Medium (Entity-centric Search, VQS)
Vocabulary-level semantic matching: Low (Indexed Term URI to Concept)
Entity reconciliation: Low (owl:sameAs for same unique keys)
Semantic tractability mechanisms: None
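The "owl:sameAs for same unique keys" entry can be illustrated with a small sketch. The details of the platform's reconciliation step are not given on the slides, so this only shows the general idea: entities from different sources that share a key are linked with `owl:sameAs`. The function name `same_as_links`, the example URIs and the keys are all hypothetical.

```python
from collections import defaultdict


def same_as_links(entities):
    """Link entities that share a unique key.

    entities: iterable of (uri, unique_key) pairs gathered from several sources.
    Returns (uri, "owl:sameAs", uri) triples that link each entity to the
    first entity seen with the same key.
    """
    by_key = defaultdict(list)
    for uri, key in entities:
        by_key[key].append(uri)

    triples = []
    for uris in by_key.values():
        first = uris[0]
        for other in uris[1:]:
            triples.append((first, "owl:sameAs", other))
    return triples


# Hypothetical entities; the keys stand in for any shared identifier.
entities = [
    ("http://example.org/sourceA/compound/1", "KEY-001"),
    ("http://example.org/sourceB/drug/42", "KEY-001"),
    ("http://example.org/sourceB/drug/43", "KEY-002"),
]
for triple in same_as_links(entities):
    print(triple)
# prints: ('http://example.org/sourceA/compound/1', 'owl:sameAs', 'http://example.org/sourceB/drug/42')
```

Exact key matching misses entities described under different identifiers, which fits the "Low" rating above.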

Page 50

FUTURE WORK

• Ontologies, indexed term labels and the catalogue as elements in a Controlled Natural Language to increase usability

• Results pipelined to any problem-solving method (such as Autodock Vina, visualization, or an ML algorithm)

• Faceted Search, Related Entity Recognition based on Feature-based Similarity Measures

• Allowing users of the platform to provide their own DSL, data sources, and graphic rules

• SPARQL Endpoint availability and latency

• Ontology Reuse instead of Ontology Alignment!

Page 51

Thank You!
