
Why sparql tohu

Jan 22, 2018



Page 1

RDF/SPARQL: a UniProtKB/Swiss-Prot practical perspective

Jerven Bolleman, Developer, Swiss-Prot Group

Page 2

Our Goals

• Provide core bioinformatics resources

– UniProtKB/

– …

• Provide services and infrastructure

– Vital-IT: HPC for the life sciences

– …

Page 3

Genetic Variations and Diseases in UniProtKB/Swiss-Prot: The Ins and Outs of Expert Manual Curation

Famiglietti, et al.

We annotate a lot of diseases and variants!

http://europepmc.org/abstract/MED/24848695

Page 6

Why provide a public SPARQL endpoint

• A 10-man wet laboratory cannot afford:

– to host their own in-house database holding all, or even just a part, of all life-science data.

– not to have access to, and use, existing life-science information.
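With a public endpoint, "having access" needs nothing more than HTTP. The Python sketch below builds, without sending, a SPARQL 1.1 Protocol query-via-GET request of the kind a small lab could issue. It assumes the UniProt endpoint URL https://sparql.uniprot.org/sparql and the UniProt core vocabulary (up:Protein, up:reviewed); the query itself is only illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: UniProt's public SPARQL endpoint.
ENDPOINT = "https://sparql.uniprot.org/sparql"

# Illustrative query: ten reviewed (Swiss-Prot) protein entries.
QUERY = """PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein
WHERE {
  ?protein a up:Protein ;
           up:reviewed true .
}
LIMIT 10"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    # The query travels URL-encoded in the "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_request(ENDPOINT, QUERY)
```

Passing `req` to `urllib.request.urlopen` would actually send it; no local database, parser or schema is involved on the lab's side.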

Page 7

Not CPU Time... But Brain Time

The right kind of optimisation

Page 8

Why provide a public SPARQL endpoint

• Classical SQL can be provided on the web, but it:

– is not practical

– offers no federation

– has poor standards conformance

• Local SQL is expensive

• Local JSON is no better

• Nor is local XML
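On the federation point: SPARQL 1.1 defines a SERVICE clause that hands part of a query to a remote endpoint, which classical SQL has no standard way to do across the web. A minimal Python sketch that only assembles such a query string; the `lab:` vocabulary (lab:expresses, lab:measurement) and its namespace are hypothetical stand-ins for a lab's own data, and running the query assumes a local triple store that supports SPARQL 1.1 federation.

```python
# Hypothetical namespace for a lab's own data; not a real ontology.
LAB_PREFIX = "http://example.org/lab#"
# Assumption: UniProt's public SPARQL endpoint.
UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"


def federated_query(limit: int = 10) -> str:
    """Join local lab observations with UniProt annotations via SERVICE."""
    return f"""PREFIX lab: <{LAB_PREFIX}>
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein ?measurement ?mnemonic
WHERE {{
  # Evaluated against the local triple store.
  ?sample lab:expresses ?protein ;
          lab:measurement ?measurement .
  # Delegated to the remote UniProt endpoint.
  SERVICE <{UNIPROT_ENDPOINT}> {{
    ?protein up:mnemonic ?mnemonic .
  }}
}}
LIMIT {limit}"""


q = federated_query(5)
```

The design point is that the join key is the protein IRI itself, so no mapping table between the lab's identifiers and UniProt's is needed as long as the lab data reuses UniProt IRIs.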

Page 9

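Data Integration: Traditional

[Diagram: Pathway.txt and UniProt.txt each need their own parser (Pathway Parser, UniProt Parser) and their own schema (Pathway Schema, UniProt Schema); together with own lab data they are loaded into a data warehouse and queried with SQL. The diagram carries six cost markers ($).]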

Page 10

Data Integration: RDF/SPARQL

[Diagram: Pathway.rdf, UniProt.rdf and own lab data are loaded directly into a triple store and queried with SPARQL. Only two cost markers remain ($ and $?).]
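Why the RDF path needs no per-source parser or schema can be shown with a toy in-memory store in Python. This is deliberately not a real triple store or SPARQL engine: triples are plain string tuples, prefixed names such as pathway:participant and lab:expresses are hypothetical, and only the UniProt IRI for P04637 (human p53) is real. Because all three sources use the same IRI for the protein, loading them is a set union and the cross-source join falls out of pattern matching.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

# Real UniProt IRI; both other (hypothetical) sources reuse it verbatim.
P53 = "http://purl.uniprot.org/uniprot/P04637"

uniprot_rdf: set[Triple] = {(P53, "up:mnemonic", "P53_HUMAN")}
pathway_rdf: set[Triple] = {("pathway:apoptosis", "pathway:participant", P53)}
own_lab_data: set[Triple] = {("lab:sample1", "lab:expresses", P53)}

# "Loading" is just set union: shared IRIs line up without schema mapping.
store: set[Triple] = uniprot_rdf | pathway_rdf | own_lab_data


def match(s: Optional[str], p: Optional[str], o: Optional[str]) -> Iterator[Triple]:
    """Yield stored triples matching a pattern; None acts as a variable."""
    for t in store:
        if all(want is None or want == got for want, got in zip((s, p, o), t)):
            yield t


def pathways_for_lab_samples() -> list[tuple[str, str, str]]:
    """Join across all three sources, like a basic graph pattern in SPARQL."""
    rows = []
    for sample, _, protein in match(None, "lab:expresses", None):
        for pathway, _, _ in match(None, "pathway:participant", protein):
            for _, _, name in match(protein, "up:mnemonic", None):
                rows.append((sample, pathway, name))
    return sorted(rows)
```

Here `pathways_for_lab_samples()` returns `[("lab:sample1", "pathway:apoptosis", "P53_HUMAN")]`. A real triple store adds indexes, a query planner and SPARQL on top, but the integration step stays a merge of graphs.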

Page 11

Why not some other graph database?

Ecosystem: RDF enables sharing and reuse of data at low cost

Identity, precision, standards

Page 12

Why provide a public SPARQL endpoint

• Document-centric REST is not enough

– Swiss-Prot available as REST (over e-mail!!) since 1986

– expasy.ch since 1993

– www.uniprot.org since 2002

• Most users use a GUI, not a CLI

• Developers build GUIs on a CLI
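A GUI built on a query interface mostly needs tabular results it can render. SPARQL endpoints return them in the standard W3C SPARQL 1.1 Query Results JSON Format, so one small adapter can work against any conforming endpoint. A Python sketch of such an adapter; the sample document is hand-written for illustration (the second solution deliberately leaves ?mnemonic unbound), not real endpoint output.

```python
import json

# Hand-written example in the SPARQL 1.1 Query Results JSON Format.
SAMPLE = """{
  "head": {"vars": ["protein", "mnemonic"]},
  "results": {"bindings": [
    {"protein": {"type": "uri", "value": "http://purl.uniprot.org/uniprot/P04637"},
     "mnemonic": {"type": "literal", "value": "P53_HUMAN"}},
    {"protein": {"type": "uri", "value": "http://purl.uniprot.org/uniprot/P38398"}}
  ]}
}"""


def to_rows(results_json: str) -> list[dict]:
    """Flatten SPARQL JSON results into one dict per solution.

    Variables left unbound in a solution (e.g. via OPTIONAL) map to None.
    """
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {v: b[v]["value"] if v in b else None for v in variables}
        for b in doc["results"]["bindings"]
    ]


rows = to_rows(SAMPLE)
```

Each row maps directly onto a table row in a GUI, with `head.vars` giving the column order.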

Page 13

© 2015 SIB

Page 14
Page 15

[Chart: SPARQL queries per month, 2015-01 to 2015-09, on a log scale (100 to 1'000'000), showing total queries and the ASK, SELECT, CONSTRUCT and DESCRIBE query forms.]

Queries per month in 2015; peak: 4 million per month
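To put the peak in perspective, a back-of-the-envelope Python calculation of the average rate it implies, assuming a 30-day month and load spread evenly over it. Real traffic is bursty, so short-term peaks will be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def mean_rate_per_second(queries_per_month: int, days_in_month: int = 30) -> float:
    """Average queries per second, assuming an even spread over the month."""
    return queries_per_month / (days_in_month * SECONDS_PER_DAY)


peak = mean_rate_per_second(4_000_000)
print(f"{peak:.2f} queries/s on average at the 2015 peak")
```

This prints `1.54 queries/s on average at the 2015 peak`, i.e. roughly one and a half queries every second, around the clock.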

Page 16

Real users

A mix of hard analytics and super-specific queries

Estimated at somewhere between 400 and 1200 real humans per month

We know they are real because they take holidays ;)

Page 17

Questions?