RDF2ES : Bring KaBOB online as RDF REST services using ElasticSearch by François Belleau BD2K Hackathon 1: San Diego May 7-9, 2015
Page 1: BD2K hackathon - Bio2RDF submission

RDF2ES : Bring KaBOB online as RDF REST services using ElasticSearch

by François Belleau

BD2K Hackathon 1: San Diego May 7-9, 2015

Page 2: BD2K hackathon - Bio2RDF submission

Goal of the project

● Experiment with Elasticsearch as a triplestore to expose RDF on the web

● Create a mashup like that of the KaBOB project

● Expose linked data as JSON-LD documents accessible via REST

● Explore and visualize linked data using Kibana

Page 3: BD2K hackathon - Bio2RDF submission

1) Build an Elasticsearch cluster

● Our bioinformatics lab has a 4-node cluster (16 GB RAM and 750 GB hard disk) on which Elasticsearch has been installed.

Page 4: BD2K hackathon - Bio2RDF submission

2) Load some of the KaBOB data sources into Elasticsearch

KaBOB: ontology-based semantic integration of biomedical databases
http://www.biomedcentral.com/1471-2105/16/126/abstract

The recent KaBOB paper describes how a mashup was created from 14 ontologies and 18 data sources converted to RDF, all loaded into a triplestore that has not been made public. It is a well-designed mashup, built on ontologies and data normalization, a quality standard that Bio2RDF's triplestores never really reached. However, it is not available to the bioinformatics community, and rebuilding it from scratch is a lot of work.

KaBOB currently imports the following 14 ontologies:

3. Chemical Entities of Biological Interest (ChEBI) [11] (54,838 from ONTOBEE)

5. Gene Ontology (GO) [7] (42,807 from ONTOBEE)

KaBOB currently imports data from the following 18 data sources:

2. DrugBank [21] (19,844 from Bio2RDF)

4. UniProt Gene Ontology Annotation (GOA) [23]

5. HUGO Gene Nomenclature Committee (HGNC) [24] (43,407 from Bio2RDF)

6. HomoloGene [25] (18,712 from Bio2RDF)

8. InterPro [27] (25,272 from Bio2RDF)

12. NCBI Gene [31] (47,728 from Bio2RDF)

13. Online Mendelian Inheritance in Man (OMIM) [32] (14,609 from Bio2RDF)

18. UniProt [37] (124,567 from UniProt)
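Loading documents like these into Elasticsearch is typically done through its bulk API, which takes newline-delimited JSON: one action line followed by one document line. The sketch below only builds such a payload; it is not the process used in this project. The index name `kabob`, the sample documents and the `example.org` URIs are assumptions for illustration.

```python
import json


def build_bulk_payload(docs, index_name):
    """Build a newline-delimited body for the Elasticsearch bulk API.

    Each document is preceded by an "index" action line. The JSON-LD "@id"
    is reused as the Elasticsearch document id (an assumption of this sketch).
    Older Elasticsearch versions also expect a "_type" in the action line
    or in the request URL.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name, "_id": doc["@id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample documents, not taken from the loaded datasets.
sample_docs = [
    {"@id": "http://example.org/resource/A", "label": "Resource A"},
    {"@id": "http://example.org/resource/B", "label": "Resource B"},
]

payload = build_bulk_payload(sample_docs, "kabob")
print(payload)
```

The resulting string could then be sent with an HTTP POST to the cluster's `_bulk` endpoint.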

Page 5: BD2K hackathon - Bio2RDF submission

2.1) KaBOB data sources loaded in ES

KaBOB currently imports the following 14 ontologies:

3. Chemical Entities of Biological Interest (ChEBI) [11] (54,838 from ONTOBEE)

5. Gene Ontology (GO) [7] (42,807 from ONTOBEE)

KaBOB currently imports data from the following 18 data sources:

2. DrugBank [21] (19,844 from Bio2RDF)

4. UniProt Gene Ontology Annotation (GOA) [23]

5. HUGO Gene Nomenclature Committee (HGNC) [24] (43,407 from Bio2RDF)

6. HomoloGene [25] (18,712 from Bio2RDF)

8. InterPro [27] (25,272 from Bio2RDF)

12. NCBI Gene [31] (47,728 from Bio2RDF)

13. Online Mendelian Inheritance in Man (OMIM) [32] (14,609 from Bio2RDF)

18. UniProt [37] (124,567 from UniProt)

Page 6: BD2K hackathon - Bio2RDF submission

3) Talend ETL process to transform RDF from the web to JSON-LD
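The slide does not show the Talend job itself. As a rough illustration of the transformation it names, the following sketch groups RDF triples by subject into expanded JSON-LD node objects. It is written in Python rather than Talend, and the tuple layout, the function name `triples_to_jsonld` and the `example.org` URIs are assumptions made for this example.

```python
import json
from collections import defaultdict


def triples_to_jsonld(triples):
    """Group (subject, predicate, object, is_literal) tuples by subject.

    Returns a list of expanded JSON-LD node objects: each predicate IRI maps
    to a list of {"@value": ...} (literals) or {"@id": ...} (resources).
    """
    nodes = defaultdict(dict)
    for subject, predicate, obj, is_literal in triples:
        value = {"@value": obj} if is_literal else {"@id": obj}
        nodes[subject].setdefault(predicate, []).append(value)
    return [{"@id": subject, **props} for subject, props in nodes.items()]


# Hypothetical triples for illustration only.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
triples = [
    ("http://example.org/resource/A", RDFS + "label", "Resource A", True),
    ("http://example.org/resource/A", RDFS + "seeAlso", "http://example.org/resource/B", False),
]

docs = triples_to_jsonld(triples)
print(json.dumps(docs, indent=2))
```

One document per subject keeps everything known about a resource in a single Elasticsearch record, which is convenient when a REST service has to return the description of one URI.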

Page 7: BD2K hackathon - Bio2RDF submission

4) Build REST JSON-LD service using Talend ESB

Page 8: BD2K hackathon - Bio2RDF submission

4.1) http://json-ld.bio2rdf.org

Page 9: BD2K hackathon - Bio2RDF submission

4.2) Describe service

Page 10: BD2K hackathon - Bio2RDF submission

4.3) Search service
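The implementation of the search service is not shown on the slide. One plausible way to back such a service is a full-text `query_string` query from the Elasticsearch query DSL; the sketch below only builds the request body. The function name and the default result size are assumptions.

```python
import json


def build_search_body(text, size=10):
    """Build an Elasticsearch request body for a full-text search.

    Uses the query_string query, which searches the given text across
    the indexed fields, and limits the number of returned hits.
    """
    return {"query": {"query_string": {"query": text}}, "size": size}


body = build_search_body("insulin", size=5)
print(json.dumps(body, indent=2))
```

This body would be sent to the `_search` endpoint of the index holding the JSON-LD documents.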

Page 11: BD2K hackathon - Bio2RDF submission

4.4) Links service

Page 12: BD2K hackathon - Bio2RDF submission

5) Explore with Kibana

http://kabob.bio2rdf.org

Page 13: BD2K hackathon - Bio2RDF submission

5.1) Visualize search results with the dashboard
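Kibana charts are generally driven by Elasticsearch aggregations. As a hedged illustration of what sits behind a "documents per source" panel, the sketch below builds a terms aggregation request. The field name `dataset` is hypothetical; the actual mapping used in this project is not given on the slides.

```python
import json


def build_terms_aggregation(field, buckets=10):
    """Build a request body that counts documents per distinct field value.

    "size": 0 suppresses the hit list so only the aggregation buckets
    are returned.
    """
    return {
        "size": 0,
        "aggs": {
            "per_value": {"terms": {"field": field, "size": buckets}}
        },
    }


# "dataset" is an assumed field name for this example.
agg_body = build_terms_aggregation("dataset")
print(json.dumps(agg_body, indent=2))
```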

Page 14: BD2K hackathon - Bio2RDF submission

Conclusion

1. Semantic web is:
a. RDF
b. SPARQL
c. Linked data

2. Big Data is:
a. Cluster
b. JSON
c. MongoDB, Hadoop, Elasticsearch
d. and petabytes

Maybe BD2K will be an RDF + JSON-LD cluster?

Page 15: BD2K hackathon - Bio2RDF submission

Thanks for the hackathon and good luck to all hackers.