Sharing Genetic Variation Data via EMBL-EBI: The European Variation Archive

Gary Saunders, PhD

Post on 18-Jan-2016

www.ebi.ac.uk/eva

Agenda

• Overview of European Variation Archive (EVA)

• EVA model of data sharing

• Summary of how we share genetic variation data

• Merging open-access variation datasets

European Variation Archive – EVA (Eva)

• Curated genetic variation data sharing & analysis platform

• All types of variation:

• SNVs, MNVs, small indels and structural variation

• Germ line, somatic, within / cross population, potentially between species

Single portal for open access variation data
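The variant classes named on this slide can be told apart, to a first approximation, from the REF and ALT alleles of a VCF record. The following is a minimal sketch and not EVA's own classification logic: the function name `classify_variant` is hypothetical, and treating any symbolic or breakend ALT allele as "structural" is a simplifying assumption.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair (illustrative rules only)."""
    # Assumption: symbolic alleles such as <DEL> and breakend notation
    # (containing '[' or ']') are reported as structural variation.
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "structural"
    # Same length: one base changed is an SNV, several bases an MNV.
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    # Different lengths: sequence was inserted or deleted.
    return "indel"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("AT", "GC"), ("A", "ATG"), ("A", "<DEL>")]:
        print(ref, alt, classify_variant(ref, alt))
```

A length cut-off separating small indels from structural variants is left out on purpose, since the slides do not state one.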

EVA Data Sharing Model

[Diagram labels: Submitter; Archived at EBI; Sample(s); Methodology; Genome; EVA; Publication; Collaborators; Wider Study Data; Stable POA; Credit for reuse]

EVA: Study Browser

• Core EVA functionality: portal to open-access genetic variation project data (VCF files):

EVA: Study Browser – project pages

• Core EVA functionality: portal to open-access genetic variation project data (VCF files):

EVA: Study Browser – assessing data quality

• Core EVA functionality: portal to open-access genetic variation project data (VCF files):

Submission to EVA

• Minimal or data-rich submissions are accepted

• Collaborative process

• Submitter recognition

• Hold date

• Links to runs / experiments / analyses

• Accession number in 48 hours

• EVA has a dynamic study loading pipeline

• Online documentation

• eva-helpdesk@ebi.ac.uk

Rate of Submission to EVA

[Chart: Non-human and Total series, March 2014 to October 2015; axis mark at 1 billion]

Merging Open-Access Datasets

[Diagram: Data submitters share data]

Conclusion

European Variation Archive

www.ebi.ac.uk/eva

• Open-access genetic variation archive

• Curated resource

• All types of variants

• All species

• Simplified submission system

Acknowledgments

EVA

Justin Paschall

Ignacio Medina Castello

Gary Saunders

Cristina Yenyxe Gonzalez

Jag Kandasamy

Ilkka Lappalainen

EGA

Jeff Almeida-King

Vasudev Kumanduri

Saif Ur-Rehman

Tom Smith

Ensembl Variation

Fiona Cunningham

Sarah Hunt

William McLaren

Anja Thormann

Laurent Gil

ENA

Rasko Leinonen

Rajesh Radhakrishnan

Daniel Vaughan

Funding

@ebivariation

www.ebi.ac.uk/eva

Case-study: deCODE

• Variation data from 2000 Icelanders

• VCF files

• Novel samples and metadata, custom reference genome

• Hold until publication

Variant Call Format (VCF): The Community Standard

• Most VCF validation tools do not truly conform to the specification:

• Of the ~250 human VCFs loaded to EVA, fewer than 10% were truly valid on the first pass

• (EVA has published a comprehensive C++ VCF validator that raises errors and warnings)

Most publicly available VCFs are not truly valid
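To give a flavour of what conforming to the specification involves, here is a minimal sketch of a few structural checks on a VCF file. It is not the EVA C++ validator and covers only a small part of the specification: a `##fileformat` first line, the eight fixed header columns, and an integer POS field. The function name `check_vcf_lines` and the wording of the messages are hypothetical.

```python
REQUIRED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def check_vcf_lines(lines):
    """Return a list of (line_number, message) problems found in VCF text lines."""
    problems = []
    seen_header = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # The specification requires the file to open with a fileformat line.
        if number == 1 and not line.startswith("##fileformat=VCFv"):
            problems.append((number, "first line is not ##fileformat=VCFv..."))
        if line.startswith("##"):
            continue  # meta-information lines are not inspected further here
        if line.startswith("#"):
            # Column header line: the first eight columns are fixed.
            if line.split("\t")[:8] != REQUIRED_COLUMNS:
                problems.append((number, "header columns do not match the fixed eight"))
            seen_header = True
            continue
        # Anything else is treated as a data line.
        if not seen_header:
            problems.append((number, "data line appears before the #CHROM header"))
        fields = line.split("\t")
        if len(fields) < 8:
            problems.append((number, "fewer than 8 tab-separated fields"))
            continue
        if not fields[1].isdigit():
            problems.append((number, "POS is not a non-negative integer"))
    return problems


if __name__ == "__main__":
    sample = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t100\t.\tA\tG\t.\tPASS\t.\n",
    ]
    print(check_vcf_lines(sample))
```

A full validator would also check meta-information syntax, INFO and FORMAT definitions, genotype fields and record ordering, which is where many files fail.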

Sharing Genetic Variation Data

• Data accuracy

• Metadata

• Links to associated data

• Credit to data generator(s) for reuse

PROBLEMS
