From MyGene.info and MyVariant.info towards BioThings API. Chunlei Wu, Ph.D., [email protected], @chunleiwu. Associate Professor of Molecular Medicine, Dept. of Molecular Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA. 01/22/2016.

Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Feb 12, 2017

Transcript
Page 1: Chunlei Wu BD2K 201601 MyGene.info and MyVariant.info

Chunlei Wu, [email protected]

@chunleiwu

Associate Professor of Molecular Medicine

Dept. of Molecular Experimental Medicine

The Scripps Research Institute

La Jolla, CA, USA

01/22/2016

From MyGene.info and MyVariant.info towards BioThings API

Page 2

MyGene.info and MyVariant.info recap

(Aggregated) Gene/Variant Annotations as a (high-performance) (real-time) Web Service

Page 3

So many variant annotation resources

dbNSFP

The Exome Aggregation Consortium (ExAC)

Page 4

Annotations centered around bio-entities

[Figure: bio-entity nodes Gene (G), Variant (V), Pathway (P), Disease (D), Metabolite (M)]

Page 5

Simple JSON-based Aggregation mechanism

{
  "_id": "chr1:g.196659237C>T",
  "cadd": { … },
  "clinvar": { … },
  "cosmic": { … },
  "dbsnp": { … },
  "dbnsfp": { … },
  "evs": { … },
  "emv": { … },
  "mutdb": { … },
  "gwassnp": { … },
  "snpedia": { … },
  "wellderly": { … }
}

{
  "_id": "chr1:g.196659237C>T",
  "dbsnp": {
    "snpclass": "single",
    "rsid": "rs1061170",
    "func": "missense"
  }
}

{
  "_id": "chr1:g.196659237C>T",
  "cosmic": {
    "tumor_site": "breast",
    "mut_freq": 0.49
  }
}

{
  "_id": "chr1:g.196659237C>T",
  "dbnsfp": {
    "sift": {
      "breast": "tolerated",
      "val": 1
    }
  }
}

"cadd" "clinvar" "evs" "mutdb"
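The aggregation idea on this slide can be illustrated with a minimal Python sketch. It assumes each data source has already been parsed into documents keyed by the same "_id" and that every source writes under its own top-level key; the function name `aggregate_by_id` is hypothetical and is not part of MyVariant.info.

```python
from typing import Any, Dict, Iterable


def aggregate_by_id(source_docs: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Merge per-source documents that share an "_id" into one document per variant.

    Hypothetical helper for illustration only.
    """
    merged: Dict[str, Dict[str, Any]] = {}
    for doc in source_docs:
        variant_id = doc["_id"]
        target = merged.setdefault(variant_id, {"_id": variant_id})
        for key, value in doc.items():
            if key != "_id":
                # Each source owns one top-level key (e.g. "dbsnp", "cosmic"),
                # so merging is a plain key-by-key copy.
                target[key] = value
    return merged


# Per-source documents taken from the slide.
docs = [
    {"_id": "chr1:g.196659237C>T",
     "dbsnp": {"snpclass": "single", "rsid": "rs1061170", "func": "missense"}},
    {"_id": "chr1:g.196659237C>T",
     "cosmic": {"tumor_site": "breast", "mut_freq": 0.49}},
]
aggregated = aggregate_by_id(docs)
```

Because each source is confined to its own key, a single source can be refreshed by overwriting that key without touching the others.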

Page 6

Keep data always up-to-date

Each data source is updated individually. Colors indicate their different updating schedules.

Schematic view of MyVariant.info architecture

Page 7

High-performance web service APIs

Schematic view of MyVariant.info architecture

Page 8

MyVariant.info for the end users:

http://MyVariant.info (currently v1 API, two endpoints)

http://MyVariant.info/v1/query?q=<query>

any query term(s)

matching variant hits

http://MyVariant.info/v1/variant/<variantid>

hgvs id(s)

matching variant object(s)

Both support batch mode via POST

Simple API. No sign-up. No API key.

Try our live API and documentation
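As a rough illustration of the two v1 endpoints listed on this slide, the following Python sketch builds the request URLs with the standard library only. The helper names are hypothetical, the lower-case host is assumed to be equivalent to the one on the slide, and `fetch_json` is defined but not called so the block runs without network access.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base URL taken from the slide (v1 API); host case is not significant.
BASE_URL = "http://myvariant.info/v1"


def query_url(query: str) -> str:
    """URL for the query endpoint: any query term(s) in, matching hits out."""
    return f"{BASE_URL}/query?{urlencode({'q': query})}"


def variant_url(hgvs_id: str) -> str:
    """URL for the variant endpoint: one HGVS id in, one variant object out."""
    # HGVS ids contain ':' and '>', so percent-encode them for the URL path.
    return f"{BASE_URL}/variant/{quote(hgvs_id, safe='')}"


def fetch_json(url: str):
    """Fetch a URL and decode the JSON body (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


example_query = query_url("rs1061170")
example_variant = variant_url("chr1:g.196659237C>T")
```

Calling `fetch_json(example_variant)` would then return the aggregated variant object, assuming the service is reachable.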

Page 9

MyGene.info for the end users:

http://MyGene.info (currently v2 API, two endpoints)

http://MyGene.info/v2/query?q=<query>

any query term(s)

matching gene hits

http://MyGene.info/v2/gene/<geneid>

gene id(s)

matching gene object(s)

Both support batch mode via POST

Simple API. No sign-up. No API key.

Try our live API and documentation
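Batch mode via POST could look roughly like the sketch below, which only constructs the request and does not send it. The form field name `ids` and the helper name `batch_gene_request` are assumptions made for illustration; the slide does not state the parameter name, so check the API documentation before relying on it.

```python
from typing import Iterable, Union
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the slide (v2 API).
GENE_ENDPOINT = "http://mygene.info/v2/gene"


def batch_gene_request(gene_ids: Iterable[Union[int, str]]) -> Request:
    """Build (but do not send) a POST request for several gene ids at once.

    ASSUMPTION: ids are sent as one comma-separated form field named "ids".
    """
    body = urlencode({"ids": ",".join(str(g) for g in gene_ids)}).encode("utf-8")
    return Request(
        GENE_ENDPOINT,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = batch_gene_request([1017, 1018])
```

Passing `request` to `urllib.request.urlopen` would perform the call; sending one POST per batch avoids issuing a separate GET for every id.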

Page 10

MyGene.info usage updates

[Chart: monthly hits in millions, last year vs. this year; labeled values 2M and 3M]

Page 11

Usage spikes (5M hits/day) during X-Mas 2014

Page 12

Increased client adoption

Requests by MyGene.info clients, 10/07/2015-01/05/2016

[Pie chart; slice labels: 30%, 9%, 35%, 26%]

Highlights:
• mygene Python client usage now surpasses BioGPS usage
• mygene R client usage has now increased to 9% from <1%

Page 13

Increased client adoption

[Pie chart; slice labels: 30%, 9%, 35%, 26%]

mygene Python client hosted in PyPI

mygene R client hosted in Bioconductor

Page 14

MyVariant.info updates

Over 334 million annotated variants in total

The Exome Aggregation Consortium (ExAC)

New additions:

dbNSFP

Updated:

Page 15

MyVariant.info updates

1 million requests in 3 months (10/07/2015-01/05/2016)

[Pie chart; slice labels: 30%, 68%, 2%]

Page 16

MyVariant.info official Python/R Clients

myvariant Python client hosted in PyPI (initial release in Aug 2015)

myvariant R client hosted in Bioconductor (initial release in Oct 2015)

Page 17

A Node.js client made by a user with passion

Page 18

Next?

MyVariant.info

MyGene.info

Page 19

Make our APIs serve Linked Data

via

Page 20

Why Linked Data?

[Figure: bio-entity nodes Gene (G), Variant (V), Pathway (P), Disease (D), Metabolite (M)]

Page 21

Linked Data for data aggregation

[Figure: variant (V) nodes for MyVariant.info and Another Variant API]

Page 22

Linked Data for data aggregation

MyVariant.info

Another Variant API

{
  "_id": "chr1:g.196659237C>T",
  "cosmic": {
    "tumor_site": "breast",
    "mut_freq": 0.49
  },
  "clinvar": {…},
  "dbsnp": {…},
  …
}

{
  "pop": "GWD",
  "nobs": 226,
  "freq": 0.371681415929,
  …
}

{
  "_id": "chr1:g.196659237C>T",
  "cosmic": {
    "tumor_site": "breast",
    "mut_freq": 0.49
  },
  "clinvar": {…},
  "dbsnp": {…},
  "new_src": {
    "pop": "GWD",
    "nobs": 226,
    "freq": 0.371681415929
  },
  …
}
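The merge shown on this slide, where a record from another variant API ends up under a "new_src" key, might be sketched as follows. The helper name `attach_source` and the choice to reject an already-used key are assumptions for illustration, not a description of how MyVariant.info does it.

```python
import copy
from typing import Any, Dict


def attach_source(variant_doc: Dict[str, Any], source_key: str,
                  record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of variant_doc with an external record nested under source_key.

    Hypothetical helper: refuses to overwrite a key that an existing source
    already uses, so one source can never clobber another.
    """
    if source_key in variant_doc:
        raise KeyError(f"source key already present: {source_key}")
    merged = copy.deepcopy(variant_doc)
    merged[source_key] = copy.deepcopy(record)
    return merged


# Values taken from the slide; elided fields ("…") are simply left out.
variant = {
    "_id": "chr1:g.196659237C>T",
    "cosmic": {"tumor_site": "breast", "mut_freq": 0.49},
}
external_record = {"pop": "GWD", "nobs": 226, "freq": 0.371681415929}

merged = attach_source(variant, "new_src", external_record)
```

The sketch takes for granted that both records describe the same variant; establishing that match across APIs is the part that Linked Data is meant to help with.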

Page 23

JSON + context = JSON-LD

{
  "@context": {
    "clinvar": "http://schema.myvariant.info/datasource/clinvar",
    "rcv": "http://schema.myvariant.info/datanode/rcv",
    "gene": "http://schema.myvariant.info/datanode/gene",
    "_id": "@id"
  },
  "_id": "chr6:g.26093141G>A",
  "clinvar": {
    "@context": {
      "uniprot": "http://identifiers.org/uniprot/",
      "omim": "http://identifiers.org/omim/"
    },
    "chrom": "6",
    "alt": "A",
    "ref": "G",
    "allele_id": 15048,
    "rsid": "rs1800562",
    "rcv": {
      "@context": {
        "accession": "http://identifer.org/clinvar"
      },
      "accession": "RCV000000020",
      "origin": "germline",
      "clinical_significance": "risk factor"
    },
    "gene": {
      "@context": {
        "symbol": "http://identifiers.org/hgnc.symbol/"
      },
      "id": "3077",
      "symbol": "HFE"
    },
    "omim": "613609.0001",
    "variant_id": 9
  }
}

Page 24

Processed JSON-LD

JSON-LD N-Quads output:

<chr6:g.26093141G>A> <http://schema.myvariant.info/datasource/clinvar> _:b0 .
_:b0 <http://identifiers.org/omim/> "613609.0001" .
_:b0 <http://schema.myvariant.info/datanode/gene> _:b1 .
_:b0 <http://schema.myvariant.info/datanode/rcv> _:b2 .
_:b1 <http://identifiers.org/hgnc.symbol/> "HFE" .
_:b2 <http://identifer.org/clinvar> "RCV000000020" .

JSON-LD compacted output:

{
  "@id": "chr6:g.26093141G>A",
  "http://schema.myvariant.info/datasource/clinvar": {
    "http://identifiers.org/omim/": "613609.0001",
    "http://schema.myvariant.info/datanode/gene": {
      "http://identifiers.org/hgnc.symbol/": "HFE"
    },
    "http://schema.myvariant.info/datanode/rcv": {
      "http://identifer.org/clinvar": "RCV000000020"
    }
  }
}

Page 25

In a nutshell, what does a JSON-LD context do?

Maps values in a JSON object to defined URIs

"http://identifer.org/clinvar" → clinvar.rcv.accession

Page 26

JSON-LD context makes your data

"Linkable" → downstream processing libraries → "Linked"
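To make the role of the context concrete, here is a deliberately simplified Python sketch that walks a document with nested "@context" blocks and reports which leaf fields are mapped to which URIs. It is not a JSON-LD processor and ignores most of the specification; the function name `iter_mapped_fields` is hypothetical, and the sample document is a trimmed copy of the ClinVar example from the earlier slide, with its URIs kept exactly as shown there.

```python
from typing import Any, Dict, Iterator, Optional, Tuple


def iter_mapped_fields(
    node: Dict[str, Any],
    inherited: Optional[Dict[str, Any]] = None,
    path: Tuple[str, ...] = (),
) -> Iterator[Tuple[str, str, Any]]:
    """Yield (dotted_path, uri, value) for leaf fields the active context maps to a URI.

    Simplification: a nested "@context" just extends the inherited one, and
    keyword mappings such as "_id": "@id" are skipped.
    """
    context = dict(inherited or {})
    context.update(node.get("@context", {}))
    for key, value in node.items():
        if key == "@context":
            continue
        field_path = path + (key,)
        if isinstance(value, dict):
            # Descend into nested objects, carrying the active context along.
            yield from iter_mapped_fields(value, context, field_path)
            continue
        uri = context.get(key)
        if isinstance(uri, str) and not uri.startswith("@"):
            yield ".".join(field_path), uri, value


doc = {
    "@context": {
        "clinvar": "http://schema.myvariant.info/datasource/clinvar",
        "rcv": "http://schema.myvariant.info/datanode/rcv",
        "_id": "@id",
    },
    "_id": "chr6:g.26093141G>A",
    "clinvar": {
        "rsid": "rs1800562",
        "rcv": {
            # URI spelled as on the slide.
            "@context": {"accession": "http://identifer.org/clinvar"},
            "accession": "RCV000000020",
            "origin": "germline",
        },
    },
}
fields = list(iter_mapped_fields(doc))
```

Here `fields` holds a single entry pairing the path `clinvar.rcv.accession` with its URI and value, which is the same mapping the slide draws with an arrow. A real application would more likely hand the document to a full JSON-LD library.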

Page 27

A Python library for processing JSON-LD data

In [1]: fetch_value_source_for_variant("chr6:g.26093141G>A", "http://identifiers.org/dbsnp/")
Out[1]:
['rs1800562 http://schema.myvarint.info/datasource/dbnsfp',
 'rs1800562 http://schema.myvarint.info/datasource/clinvar',
 'rs1800562 http://schema.myvarint.info/datasource/dbsnp',
 'rs1800562 http://schema.myvarint.info/datasource/evs',
 'rs1800562 http://schema.myvarint.info/datasource/gwassnps',
 'rs1800562 http://schema.myvarint.info/datasource/mutdb']

By Kevin Xin

Page 28

Need to define an API spec to enable data exchange via JSON-LD

• Output as a JSON object with a defined _id.
• "jsonld=true/false" toggle for the inclusion of JSON-LD context.
• Support the retrieval of a single entity via GET (use case: individual data aggregation on the fly)
• Support the retrieval of a list of entities via POST (use case: routine data aggregation in batches)
• Output should indicate the entity existence:
    GET /variant/<unknown_id> → 404
    POST /variant/ id1, <unknown_id>, id3 → [id1: {…}, <unknown_id>: "notfound", id3: {…}]
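The behaviour listed in this spec can be sketched with plain Python functions standing in for the GET and POST handlers. Everything here is an assumption for illustration: the in-memory `STORE`, the placeholder `CONTEXT`, the function names, and the choice to return the batch result as a mapping from id to object (the slide's bracket notation is informal).

```python
from typing import Any, Dict, Iterable, Tuple, Union

# Hypothetical in-memory data; a real service would query a database.
STORE: Dict[str, Dict[str, Any]] = {
    "id1": {"_id": "id1"},
    "id3": {"_id": "id3"},
}
# Placeholder JSON-LD context, included only when jsonld=True.
CONTEXT: Dict[str, str] = {"_id": "@id"}


def render(doc: Dict[str, Any], jsonld: bool) -> Dict[str, Any]:
    """Copy a stored object, adding the JSON-LD context when requested."""
    out = dict(doc)
    if jsonld:
        out["@context"] = CONTEXT
    return out


def get_entity(entity_id: str, jsonld: bool = False) -> Tuple[int, Dict[str, Any]]:
    """GET-style lookup of one entity: (200, object) or (404, {})."""
    doc = STORE.get(entity_id)
    if doc is None:
        return 404, {}
    return 200, render(doc, jsonld)


def post_entities(
    ids: Iterable[str], jsonld: bool = False
) -> Dict[str, Union[Dict[str, Any], str]]:
    """POST-style batch lookup: unknown ids are reported as "notfound"."""
    results: Dict[str, Union[Dict[str, Any], str]] = {}
    for entity_id in ids:
        doc = STORE.get(entity_id)
        results[entity_id] = "notfound" if doc is None else render(doc, jsonld)
    return results


status, body = get_entity("unknown_id")
batch = post_entities(["id1", "unknown_id", "id3"], jsonld=True)
```

Reporting missing ids inline keeps a batch response aligned with the request, so a client aggregating data in bulk can tell which entities were absent without issuing follow-up requests.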

Page 29

BioThings API

MyVariant.info

MyGene.info

By Cyrus Afrasiabi

Page 30

BioThings API

MyVariant.info MyGene.info

JSON data aggregation mechanism

High-performance query engine

Well-designed REST API pattern

JSON-LD enabled Linked Data

Data-updating scheduler

Python/R clients

…

Page 31

Data-sharing via Web API is trending

Making a single web service is trivial, but making a sustainable/scalable web API is non-trivial.

We would like to help other groups to create their own hosted web API for sharing their data.

Page 32

Action item 1: BioThings API whitepaper

Also an action item from the last BD2K CA consortium meeting and from the API working group at last year's NIH BD2K AHM

Page 33

Action item 2: BioThings API framework

NIH commons

Infrastructure as a Service:

Software as a Service: BioThings API

Page 34

Action item 3: expansion to other "BioThings"

[Figure: Disease (D), Drugs (D)]

MyDrug.info

MyDisease.info

need an alt. name here

Page 35

Acknowledgement

Funding and Support: U54GM114833, U01HG008473

Washington U: Ben Ainscough, Obi Griffith

TSRI: Andrew Su, Jiwen Xin, Cyrus Afrasiabi, Ginger Tsueng, Adam Mark, Greg Stupp, Tim Putman

STSI: Eric Topol, Ali Torkamani, Galina Erikson

U. Washington: Sean Mooney, Moritz Juchler, Nikhil Gopal

OICR: Robin Haw

UC Berkeley: Chris Mungall

UCSD: Trish Whetzel

MyVariant.info

MyGene.info