RDF2ES: Bring KaBOB online as RDF REST services using Elasticsearch
by François Belleau
BD2K Hackathon 1: San Diego May 7-9, 2015
Goal of the project
● Experiment with Elasticsearch as a triplestore to expose RDF on the web
● Create a mashup like the one in the KaBOB project
● Expose linked data as JSON-LD documents accessible via REST
● Explore and visualize linked data using Kibana software
1) Build an Elasticsearch cluster
● Our bioinformatics lab has a 4-node cluster (16 GB RAM and 750 GB hard disk) on which Elasticsearch has been installed.
2) Load some of KaBOB's data sources into Elasticsearch
KaBOB: ontology-based semantic integration of biomedical databases
http://www.biomedcentral.com/1471-2105/16/126/abstract
The recent KaBOB paper describes how a mashup was created from 14 ontologies and 18 data sources converted to RDF, all loaded into a triplestore that has not been made public. It is great work: a well-designed mashup built on ontologies and data normalization, a quality standard never really reached in Bio2RDF's triplestores. But it is not available to the bioinformatics community, and rebuilding it from scratch is a lot of work.
2.1) KaBOB data sources loaded into ES
KaBOB currently imports the following 14 ontologies:
3. Chemical Entities of Biological Interest (ChEBI) [11] (54,838 from ONTOBEE)
5. Gene Ontology (GO) [7] (42,807 from ONTOBEE)
KaBOB currently imports data from the following 18 data sources:
2. DrugBank [21] (19,844 from Bio2RDF)
4. UniProt Gene Ontology Annotation (GOA) [23]
5. HUGO Gene Nomenclature Committee (HGNC) [24] (43,407 from Bio2RDF)
6. HomoloGene [25] (18,712 from Bio2RDF)
8. InterPro [27] (25,272 from Bio2RDF)
12. NCBI Gene [31] (47,728 from Bio2RDF)
13. Online Mendelian Inheritance in Man (OMIM) [32] (14,609 from Bio2RDF)
18. UniProt [37] (124,567 from UniProt)
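The slides do not show how the documents were loaded. One common route is Elasticsearch's bulk API, which takes newline-delimited JSON: an action line followed by the document itself. The sketch below only builds that request body. The index name `kabob`, the use of `@id` as document id, and the sample document are assumptions for illustration, not the project's actual setup.

```python
import json

def build_bulk_body(docs, index="kabob"):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    The index name and the choice of "@id" as document id are
    illustrative assumptions.
    """
    lines = []
    for doc in docs:
        # Action line: tells Elasticsearch where to index the next line.
        action = {"index": {"_index": index, "_id": doc["@id"]}}
        lines.append(json.dumps(action))
        # Source line: the JSON-LD document itself.
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical sample document.
docs = [{"@id": "http://bio2rdf.org/drugbank:DB00001", "label": "Lepirudin"}]
body = build_bulk_body(docs)
```

The resulting string would be sent as the body of a POST to `/_bulk`. Older Elasticsearch releases also expect a `_type` in the action line.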
3) Talend ETL process to transform RDF from the web to JSON-LD
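The Talend job itself is not shown in the slides. As a rough idea of what such a transformation involves, here is a minimal Python sketch (not Talend) that groups triples by subject into one expanded JSON-LD node per subject. The tuple layout with an explicit `is_iri` flag and the `example.org` URIs are assumptions made for the example.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def triples_to_jsonld(triples):
    """Group (subject, predicate, object, is_iri) tuples into one
    expanded JSON-LD node object per subject."""
    nodes = defaultdict(dict)
    for s, p, o, is_iri in triples:
        node = nodes[s]
        node["@id"] = s
        if p == RDF_TYPE:
            # rdf:type maps to the JSON-LD "@type" keyword.
            node.setdefault("@type", []).append(o)
        else:
            # IRIs become {"@id": ...}, literals become {"@value": ...}.
            value = {"@id": o} if is_iri else {"@value": o}
            node.setdefault(p, []).append(value)
    return list(nodes.values())

# Hypothetical triples.
triples = [
    ("http://example.org/gene/1", RDF_TYPE, "http://example.org/vocab/Gene", True),
    ("http://example.org/gene/1", "http://www.w3.org/2000/01/rdf-schema#label", "example gene", False),
    ("http://example.org/gene/1", "http://example.org/vocab/encodes", "http://example.org/protein/9", True),
]
documents = triples_to_jsonld(triples)
```

Each resulting node is a self-contained JSON document, which is what makes it a natural fit for a document store such as Elasticsearch.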
4) Build REST JSON-LD service using Talend ESB
4.1) http://json-ld.bio2rdf.org
4.2) Describe service
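The slides give no implementation details for this service. One plausible reading is that "describe" returns the whole document for a single URI, which in Elasticsearch is a get-by-id request. The sketch below only builds the request path. The index name `kabob` and the use of the resource URI as document id are assumptions.

```python
from urllib.parse import quote

def describe_path(uri, index="kabob"):
    """Return the Elasticsearch path that fetches the document whose
    id is `uri`. The URI is percent-encoded so that its slashes and
    colons do not break the path."""
    return f"/{index}/_doc/{quote(uri, safe='')}"

path = describe_path("http://bio2rdf.org/drugbank:DB00001")
```

The `_doc` segment is the current Elasticsearch convention; releases from 2015 used a mapping type name in that position.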
4.3) Search service
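How the search service queries Elasticsearch is not shown. A full-text search over all fields could be expressed with a `query_string` query, as in this sketch. The function name and the default page size are assumptions.

```python
def search_body(text, size=10):
    """Build an Elasticsearch request body that runs a full-text
    query_string search and returns at most `size` hits."""
    return {
        "query": {"query_string": {"query": text}},
        "size": size,
    }

body = search_body("insulin", size=5)
```

The body would be sent to the index's `_search` endpoint, and the hits returned as JSON-LD documents.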
4.4) Links service
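The behaviour of the links service is not described in the slides. Assuming it lists the resources a document points to, the outgoing links can be read straight from an expanded JSON-LD node: every value of the form `{"@id": ...}` is a link. The node below is hypothetical.

```python
def outgoing_links(node):
    """Return (predicate, target URI) pairs for every IRI-valued
    property of an expanded JSON-LD node."""
    links = []
    for key, values in node.items():
        if key.startswith("@"):
            # Skip JSON-LD keywords such as "@id" and "@type".
            continue
        for value in values:
            if isinstance(value, dict) and "@id" in value:
                links.append((key, value["@id"]))
    return links

# Hypothetical expanded JSON-LD node.
node = {
    "@id": "http://example.org/gene/1",
    "@type": ["http://example.org/vocab/Gene"],
    "http://example.org/vocab/label": [{"@value": "example gene"}],
    "http://example.org/vocab/encodes": [{"@id": "http://example.org/protein/9"}],
}
links = outgoing_links(node)
```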
5.1) Visualize search results with the dashboard
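Kibana charts are driven by Elasticsearch aggregations. As an illustration of the kind of request behind a "documents per source" chart, this sketch builds a terms aggregation. The field name `source` and the aggregation name `by_value` are assumptions, not fields known to exist in the project's index.

```python
def count_by_field(field, top=10):
    """Build an Elasticsearch request body that counts documents per
    distinct value of `field` and returns the `top` largest buckets."""
    return {
        "size": 0,  # no hits needed, only the aggregation buckets
        "aggs": {
            "by_value": {"terms": {"field": field, "size": top}},
        },
    }

body = count_by_field("source", top=5)
```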
Conclusion
1. Semantic web is:
a. RDF
b. SPARQL
c. Linked data
2. Big Data is:
a. Cluster
b. JSON
c. MongoDB, Hadoop, Elasticsearch
d. and petabytes
Maybe BD2K will be an RDF + JSON-LD cluster?
Thanks for the hackathon and good luck to all hackers.