Data Driven Innovation Interoperability Tech Track (#agridata) 18 & 19 March 2015, Wageningen (@rfinkers)
Page 1: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Data Driven Innovation

Interoperability Tech Track (#agridata)

18 & 19 March 2015, Wageningen (@rfinkers)

Page 2: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Outline

Introduction “Interoperable Genetic Diversity”

Concept ”Bring Your Own Data” party

Aim BYOD Green Genetics?

Outcome BYOD Green Genetics

Hands on


Page 3: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)
Page 4: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Climate change & Social disruption

Photograph: AFP/Getty Images, http://www.theguardian.com/commentisfree/2015/mar/08/guardian-view-climate-change-social-disruption#img-1

Page 5: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)
Page 6: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Select a genetically diverse collection


Legacy databases (e.g. Uniprot)

Genome Sequence & Genome Annotation

Genome Variation Data (re-sequencing collections) & SNP annotation

Accession Passport Information

Accession Phenotype Information

Page 7: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Web based aggregation of Information


Page 8: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Interoperable Genetic Diversity

Genebanks should utilize genomics data

●But should not store them!

Genomics studies should make variant data available

●But need access to passport and characterization & evaluation data.

Breeders need tools to access diversity

Finkers, van Hintum et al. 2014 DOI: 10.1017/S1479262114000689

Genebank(s)

Genomics provider(s)

Page 9: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Intermezzo: Linked Open Data

Standardization makes the information interoperable:

● Controlled vocabularies

● Machine readable

● All sources can be queried with a single question instead of visiting many websites
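Linked Open Data is normally published as RDF triples and queried with SPARQL. To keep the idea self-contained, the sketch below uses plain Python tuples as (subject, predicate, object) triples. The `ex:` identifiers, predicates and values are hypothetical toy data, not CGN's or any provider's real vocabulary. It illustrates how two sources that reuse the same identifiers can be answered by one question.

```python
# Toy Linked Data sketch: triples are (subject, predicate, object) tuples.
# All "ex:" identifiers, predicates and values are hypothetical.

# Passport triples as a genebank might publish them.
genebank = {
    ("ex:CGN00001", "ex:species", "Solanum lycopersicum"),
    ("ex:CGN00001", "ex:countryOfOrigin", "PER"),
    ("ex:CGN00002", "ex:species", "Solanum pimpinellifolium"),
    ("ex:CGN00002", "ex:countryOfOrigin", "ECU"),
}

# Variant triples from a genomics provider, reusing the same accession
# identifiers; the shared identifiers are what make the sources linkable.
genomics = {
    ("ex:CGN00001", "ex:hasVariant", "ex:SNP_ch10_123"),
    ("ex:CGN00002", "ex:hasVariant", "ex:SNP_ch10_456"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# One question over both sources combined: which variants occur in
# accessions originating from Peru?
graph = genebank | genomics
peruvian = {s for s, _, _ in match(graph, p="ex:countryOfOrigin", o="PER")}
variants = sorted(o for s, _, o in match(graph, p="ex:hasVariant") if s in peruvian)
print(variants)  # ['ex:SNP_ch10_123']
```

In a real deployment the `match` function would be replaced by a SPARQL query against a triple store, but the principle is the same: the question spans both sources because they share identifiers and vocabulary.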

Page 10: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Interoperable Genetic Diversity (2)

Implications:

●Data can be stored at many different locations, but can be found by computers

●Newly published information (in the correct format) will be included automatically.

●Tools can be written for dedicated questions, such as assessing allelic variation, or used for collection management
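The first two implications can be illustrated with a small sketch: the question is written once and asked over whatever sources are registered, so a newly published dataset in the same format shows up in the answer without changing the query. Here each "source" is a hypothetical local function standing in for a remote endpoint; the provider names and identifiers are invented for illustration.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical registry of data providers. Each function stands in for a
# remote endpoint and returns the triples that provider publishes.
sources: List[Callable[[], Iterable[Triple]]] = []


def provider_a() -> Iterable[Triple]:
    return [("ex:CGN00001", "ex:hasVariant", "ex:SNP_1")]


def provider_b() -> Iterable[Triple]:
    return [("ex:CGN00002", "ex:hasVariant", "ex:SNP_2")]


def accessions_with_variants() -> Set[str]:
    """Ask one fixed question over every currently registered source."""
    return {
        s
        for source in sources
        for s, p, _ in source()
        if p == "ex:hasVariant"
    }


sources.append(provider_a)
before = accessions_with_variants()

# A new dataset is published in the same format: the query is unchanged,
# yet the answer now includes it.
sources.append(provider_b)
after = accessions_with_variants()
print(sorted(before), sorted(after))  # ['ex:CGN00001'] ['ex:CGN00001', 'ex:CGN00002']
```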

Finkers, van Hintum et al. 2014 DOI: 10.1017/S1479262114000689

Genebank(s)

Genomics provider(s)

Page 11: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Interdisciplinary Approach Needed


Genebanks

Genomics provider(s)

Page 12: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Interdisciplinary Approach Needed

Need for Data Scientists & Domain Experts


Genebanks

Genomics provider(s)

Page 13: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Format: Bring your own Data Workshop

1. Users define the question(s)

2. Users and linked data experts define concepts and ontologies

3. Experts help to create linked data and formulate the query
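As a rough illustration of steps 2 and 3, the sketch below maps the columns of a data owner's own table onto agreed (here hypothetical) predicates and converts each row into triples, after which the question can be asked against the shared predicates. The column names, predicates and rows are invented; a real BYOD would use established ontologies and RDF tooling rather than this hand-rolled mapping.

```python
import csv
import io

# Step 2 (sketch): the agreed concepts, expressed as a mapping from the
# owner's local column names to shared, hypothetical predicates.
COLUMN_TO_PREDICATE = {
    "species": "ex:species",
    "fruit_colour": "ex:fruitColour",
}

# The data owner's own table (made-up rows).
raw = """accession,species,fruit_colour
CGN00001,Solanum lycopersicum,red
CGN00002,Solanum pimpinellifolium,yellow
"""


def to_triples(text: str) -> list:
    """Step 3 (sketch): turn each mapped cell into a (subject, predicate, object) triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"ex:{row['accession']}"
        for column, predicate in COLUMN_TO_PREDICATE.items():
            triples.append((subject, predicate, row[column]))
    return triples


triples = to_triples(raw)

# The user's question, formulated against the shared predicates.
yellow = [s for s, p, o in triples if p == "ex:fruitColour" and o == "yellow"]
print(yellow)  # ['ex:CGN00002']
```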

Page 14: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Bring Your Own Data Workshop

More Info: http://www.dtls.nl/fair-data/byod/


Data owners

Domain Experts

Trainers

Linked Data Experts

Page 15: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Example: Solanaceae Trait Ontology

Page 16: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

BYOD in action

Page 17: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Select a genetically diverse collection


Legacy databases (e.g. Uniprot)

Genome Sequence & Genome Annotation

Genome Variation Data (re-sequencing collections) & SNP annotation

Accession Passport Information

Accession Phenotype Information

Page 18: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Example Query


Page 19: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Outcome: Query Graph


Page 20: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

FAIRport* in VLPB?

*More on FAIRport in the presentation of Luiz Bonino, Thursday 10:30

Page 21: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Summary

Blueprint of “Interoperable Genetic Diversity” shown

BYOD resulted in interoperable data which could be queried

●Request your own BYOD?

Public <-> Private integration possible

Page 22: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Select a genetically diverse collection


Legacy databases (e.g. Uniprot)

Genome Sequence & Genome Annotation

Genome Variation Data (re-sequencing collections) & SNP annotation

Accession Passport Information

Accession Phenotype Information

Page 23: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Select a genetically diverse collection


Legacy databases (e.g. Uniprot)

Genome Sequence & Genome Annotation

Genome Variation Data (re-sequencing collections) & SNP annotation

Accession Passport Information

Accession Phenotype Information

Page 24: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Working Prototype

screendump


Page 25: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Questions?

Acknowledgements:

BYOD team

Theo van Hintum & Frank Menting (CGN)

Denis Guryunov & Martijn van Kaauwen (prototype)

et al.

Page 26: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

HaploSmasher Hands On Session

HaploSmasher Prototype:

● Genomic regions as input, e.g. SL2.40ch03:10000..10200

● Solyc gene identifiers as input, e.g. Solyc10g085020

● Filter SNPs on SnpEff impact type: HIGH, MODERATE, LOW, MODIFIER

● No input validation yet: use the correct notation and existing Solyc gene IDs
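Because the prototype does not validate its input yet, a client-side check can catch malformed queries early. The sketch below classifies a query as either a region or a Solyc gene ID; the regular expressions are assumptions inferred only from the two examples on this slide, not HaploSmasher's actual rules. A format check cannot confirm that a gene ID actually exists; that would still require a lookup against the annotation.

```python
import re
from typing import NamedTuple, Union

# Patterns inferred from the slide's examples; they are assumptions.
REGION_RE = re.compile(r"(SL2\.40ch\d{2}):(\d+)\.\.(\d+)")
GENE_RE = re.compile(r"Solyc(\d{2})g(\d{6})")


class Region(NamedTuple):
    seqid: str
    start: int
    end: int


class GeneId(NamedTuple):
    chromosome: int
    identifier: str


def parse_query(text: str) -> Union[Region, GeneId]:
    """Classify and validate one query string; raise ValueError if malformed."""
    text = text.strip()
    m = REGION_RE.fullmatch(text)
    if m:
        start, end = int(m.group(2)), int(m.group(3))
        if start > end:
            raise ValueError(f"start > end in {text!r}")
        return Region(m.group(1), start, end)
    m = GENE_RE.fullmatch(text)
    if m:
        return GeneId(int(m.group(1)), text)
    raise ValueError(f"not a region or Solyc gene ID: {text!r}")


print(parse_query("SL2.40ch03:10000..10200"))
# Region(seqid='SL2.40ch03', start=10000, end=10200)
print(parse_query("Solyc10g085020"))
# GeneId(chromosome=10, identifier='Solyc10g085020')
```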

Page 27: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

HaploSmasher

Page 28: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

HaploSmasher

Query CGN FAIRdata graph

● Prototype is only generating links to CGN passport data now

● Graph data of three CGN accessions is available in our test set

Page 29: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

HaploSmasher examples:

Haplotype Output

Page 30: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

Example queries

http://www.plantbreeding.wur.nl/hs/

Also, explore variation data & Linked resources

●http://www.tomatogenome.net

Examples:

● Beta-tubulin Solyc10g085020 (HIGH & MODERATE vs. ALL effects)

● Glutamate dehydrogenase Solyc05g052100

● Uridine kinase Solyc02g067880

● Magnesium chelatase Solyc04g015750
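The "HIGH & MODERATE vs. ALL effects" comparison amounts to filtering SNPs on their SnpEff impact. The sketch below assumes SnpEff's standard `ANN` INFO field in a VCF, where annotations are comma-separated and the third `|`-separated field is the putative impact. The records are made-up and abbreviated, and this is not HaploSmasher's own code.

```python
from typing import List, Set, Tuple

# SnpEff impact categories, as listed on the slide.
ALL_IMPACTS = {"HIGH", "MODERATE", "LOW", "MODIFIER"}


def impacts(info: str) -> Set[str]:
    """Collect the impact values from the ANN entry of a VCF INFO string."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            fields = (ann.split("|") for ann in entry[len("ANN="):].split(","))
            return {f[2] for f in fields if len(f) > 2}
    return set()


def filter_snps(records: List[Tuple[str, str]], selected: Set[str]) -> List[Tuple[str, str]]:
    """Keep (position, INFO) records with at least one impact in `selected`."""
    return [r for r in records if impacts(r[1]) & selected]


# Made-up, heavily abbreviated records (real ANN entries have more fields).
records = [
    ("SL2.40ch10:101", "DP=12;ANN=T|missense_variant|MODERATE|GENE1"),
    ("SL2.40ch10:202", "DP=9;ANN=A|synonymous_variant|LOW|GENE1"),
    ("SL2.40ch10:303", "DP=15;ANN=G|stop_gained|HIGH|GENE1"),
    ("SL2.40ch10:404", "DP=7;ANN=C|intron_variant|MODIFIER|GENE1"),
]

high_moderate = filter_snps(records, {"HIGH", "MODERATE"})
everything = filter_snps(records, ALL_IMPACTS)
print(len(high_moderate), len(everything))  # 2 4
```

Restricting to HIGH and MODERATE keeps only SNPs predicted to change the protein, which is why the beta-tubulin example drops from many SNPs to very few.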


Page 31: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

HaploSmasher examples:

Conserved housekeeping genes:

● Beta-tubulin Solyc10g085020 439 AA

● 1 SNP (HIGH & MODERATE effect), two haplotypes
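One way to read "1 SNP, two haplotypes" is that accessions are grouped by their combined alleles at the selected SNP positions, so a single biallelic SNP yields at most two groups. The sketch below assumes homozygous calls with no missing data (a simplification) and uses invented accession names and alleles; it is not HaploSmasher's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical, homozygous genotype calls: accession -> alleles at the
# selected SNP positions, always in the same position order.
one_snp = {"acc1": ("A",), "acc2": ("A",), "acc3": ("G",)}
three_snps = {
    "acc1": ("A", "C", "T"),
    "acc2": ("A", "C", "T"),
    "acc3": ("G", "C", "T"),
    "acc4": ("G", "T", "T"),
}


def haplotype_groups(genotypes: Dict[str, Tuple[str, ...]]) -> Dict[str, List[str]]:
    """Group accessions that share the same allele string."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for accession, alleles in genotypes.items():
        groups["".join(alleles)].append(accession)
    return dict(groups)


print(haplotype_groups(one_snp))     # {'A': ['acc1', 'acc2'], 'G': ['acc3']}
print(haplotype_groups(three_snps))  # {'ACT': ['acc1', 'acc2'], 'GCT': ['acc3'], 'GTT': ['acc4']}
```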

Page 32: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

HaploSmasher examples:

● Beta-tubulin Solyc10g085020 439 AA

● 136 SNPs (all SnpEff impact types)

● Part of haplotype groups:

Page 33: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

HaploSmasher examples:

● Glutamate dehydrogenase Solyc05g052100

● 13 SNPs (HIGH, MODERATE)

Page 34: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

HaploSmasher examples:

● Uridine kinase Solyc02g067880

● 23 SNPs (HIGH, MODERATE)

● Example haplotype groups:

Page 35: Data Driven Innovation - Interoperable Genebanks (Tech Track Session)

HaploSmasher examples:

● magnesium chelatase Solyc04g015750

● 21 SNPs (HIGH, MODERATE)

● Example haplotype groups: