iEVOBIO 2011 The role of grass-roots data sharing communities, standards and megasequencing projects in the genomics revolution Dawn Field NERC Centre for Ecology and Hydrology
2011Field talk at iEVOBIO 2011

Jan 27, 2015

A keynote talk at the iEVOBIO 2011 meeting (http://ievobio.org/). It has been a great meeting.
Transcript
Page 1: 2011Field talk at iEVOBIO 2011

iEVOBIO 2011

The role of grass-roots data sharing communities, standards and megasequencing projects in the genomics revolution

Dawn Field
NERC Centre for Ecology and Hydrology

Page 2: 2011Field talk at iEVOBIO 2011

Opportunities and Challenges

The era of genomics is just beginning...

...how will we cope with the data?

...how will we gain the most knowledge from this investment in data?

Page 3: 2011Field talk at iEVOBIO 2011

PARADIGM SHIFT

1960-1990: 16S RNA
1990-2010: Genomes
2010-2020: Pangenomes

Nikos Kyrpides

Page 4: 2011Field talk at iEVOBIO 2011

GREAT CHALLENGES

            1995-2009   2010-2015
Finished    1000        3000
Draft       1000        10000

P. Chain et al. Science, 2009

Genome Sequencing Projects on GOLD
September 2009, 5643 projects
[Chart: number of projects (axis 0 to 6000), Incomplete vs Complete]

Nikos Kyrpides

Page 5: 2011Field talk at iEVOBIO 2011

Page 6: 2011Field talk at iEVOBIO 2011

Culturable

Unculturable

Nikos Kyrpides

Page 7: 2011Field talk at iEVOBIO 2011

The trend is now increasingly geared towards ever more ambitious megasequencing projects...

Page 8: 2011Field talk at iEVOBIO 2011

Page 9: 2011Field talk at iEVOBIO 2011

And democratization of access to sequencing power...

Just one example....

Page 10: 2011Field talk at iEVOBIO 2011

Metagenomics: Putting data-generating capacity into perspective with an example from Bergen

(1) 1 metagenome, Sargasso Sea, Sanger sequencing (Venter et al, 2005)

(~80) 41 metagenomes, “Global Ocean Survey”, Sanger sequencing (Rusch et al, 2007)

(~120) 4 metagenomes & 4 metatranscriptomes, Bergen mesocosm experiment, pyrosequencing (Gilbert et al, 2008)

Gilbert JA, Field D, Huang Y, Edwards R, Li W, Gilna P, Joint I. (2008) Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS ONE. Aug 22;3(8):e3042.

Page 11: 2011Field talk at iEVOBIO 2011

The Bergen ocean acidification study produced 19% of the reads produced in the GOS study and 5% of the total basepairs of sequence.

Further evidence for the “Unknown Genome” and the Dark Matter of the Tree of Life

Page 12: 2011Field talk at iEVOBIO 2011

The Data
- Flood
- Tsunami
- Deluge
?

Page 13: 2011Field talk at iEVOBIO 2011

the data bonanza

Page 14: 2011Field talk at iEVOBIO 2011

To exploit fully the promise of these data we need both scientific innovation and community agreement on how to provide appropriate stewardship of these resources for the benefit of all.

Requires the evolution of our scientific, technological and sociological thinking....

Page 15: 2011Field talk at iEVOBIO 2011

SuperMarket

The Genome Catalogue

Page 16: 2011Field talk at iEVOBIO 2011

DataMarket

Norman Morrison

Page 17: 2011Field talk at iEVOBIO 2011

Packaging data

Page 18: 2011Field talk at iEVOBIO 2011

Labels for data

<phenotype>

<environmental context>

Page 19: 2011Field talk at iEVOBIO 2011

Standards

Principles:

• Not everything should be ‘standardized’
• Aggregation of data, information, and knowledge requires standard ways of doing things
• Standards provide foundations; standards should drive innovation (think of electrical plugs or the internet)
• Pick the right concepts to standardize – at the right time, with the right people
• Requires good ‘group think’ – or ‘systems thinking’

Page 20: 2011Field talk at iEVOBIO 2011

Community-driven solutions:

The Common Path:

• Identify the problem
• Define a community to address it
• Define scope of the solution
• Implement solution
• Gain adoption of solution

Page 21: 2011Field talk at iEVOBIO 2011

The Genomic Standards Consortium

Innovation through Collaboration

GSC 10, Argonne, 2010
GSC 11, Hinxton, 2010
GSC 12, Bremen, 2011
GSC 13, BGI, 2012

Page 22: 2011Field talk at iEVOBIO 2011

The GSC’s Mission

• the implementation of new genomic standards

• methods of capturing and exchanging metadata

• harmonization of metadata collection and analysis efforts across the wider genomics community

Page 23: 2011Field talk at iEVOBIO 2011

The GSC fulfills its mission by

• Organizing meetings
• Forming working groups
• Creating Consensus Products

Page 24: 2011Field talk at iEVOBIO 2011

Pelin Yilmaz et al 2011

Page 25: 2011Field talk at iEVOBIO 2011

Page 26: 2011Field talk at iEVOBIO 2011

Use of MIGS/MIMS/MIENS

Please provide this minimum information when you publish

• a genome
• a metagenome
• a gene marker study (e.g. ribosomal genes)

GenBank, EMBL and DDBJ now accept this information and encourage its submission to their public DNA databases
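
A minimum-information checklist of this kind can be checked mechanically before submission. The sketch below is only an illustration of that idea: the field names and the three study types are hypothetical stand-ins, not the official MIGS/MIMS/MIENS item names, and the example record is made up.

```python
# Hypothetical checklists: these field names are illustrative assumptions,
# NOT the official MIGS/MIMS/MIENS items.
REQUIRED_FIELDS = {
    "genome": ["project_name", "collection_date", "geographic_location", "environment"],
    "metagenome": ["project_name", "collection_date", "geographic_location", "environment"],
    "marker_gene": ["project_name", "collection_date", "geographic_location", "target_gene"],
}


def is_filled(value):
    """A field counts as filled if it is not None and not blank."""
    return value is not None and str(value).strip() != ""


def missing_fields(record, study_type):
    """Return the required fields that are absent or empty in `record`."""
    return [name for name in REQUIRED_FIELDS[study_type]
            if not is_filled(record.get(name))]


# Made-up example record with one blank and one absent field.
record = {
    "project_name": "Example survey",
    "collection_date": "2011-01-01",
    "geographic_location": "",
}
print(missing_fields(record, "metagenome"))
```

A submission tool built along these lines could refuse, or simply flag, records for which the returned list is not empty.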

Page 27: 2011Field talk at iEVOBIO 2011

Labels for data

<MIGS>

<MIMS>

Page 28: 2011Field talk at iEVOBIO 2011

Page 29: 2011Field talk at iEVOBIO 2011

The Microbial Earth Project

Goal: International effort to sequence a reference genome for every cultured Archaeal and Bacterial organism (~9,000 microbes)

Phase I: Sequence one representative from every characterized microbial type species

GEBA, HMP

Page 30: 2011Field talk at iEVOBIO 2011

Source: Jack A. Gilbert, Argonne National Labs

http://earthmicrobiome.org

Page 31: 2011Field talk at iEVOBIO 2011

Field et al unpublished work on a Metadata Coverage Index (MCI)

MCI > 50
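
The slide gives no formula for the MCI, so the following is a sketch under a stated assumption: that the index is the percentage of checklist fields that carry a value in a record. The function name, field names and example records are all hypothetical and are not taken from the unpublished work.

```python
def metadata_coverage_index(record, fields):
    """Percentage (0-100) of `fields` that have a non-empty value in `record`.

    Assumed definition for illustration; the talk does not define the MCI.
    """
    if not fields:
        raise ValueError("fields must not be empty")
    filled = sum(
        1 for name in fields
        if record.get(name) is not None and str(record[name]).strip() != ""
    )
    return 100.0 * filled / len(fields)


# Hypothetical checklist fields and made-up records.
FIELDS = ["project_name", "collection_date", "geographic_location", "environment"]
records = [
    {"project_name": "A", "collection_date": "2011-01-01", "geographic_location": "x"},
    {"project_name": "B"},
]

scores = [metadata_coverage_index(r, FIELDS) for r in records]
# Keep only records above the threshold shown on the slide (MCI > 50).
well_described = [r["project_name"] for r, s in zip(records, scores) if s > 50]
print(scores, well_described)
```

Under this assumed definition, a threshold such as MCI > 50 selects records in which more than half of the checklist fields are filled in.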

Page 32: 2011Field talk at iEVOBIO 2011

GSC 5 at the EBI, 2008

Page 33: 2011Field talk at iEVOBIO 2011

Page 34: 2011Field talk at iEVOBIO 2011

Page 35: 2011Field talk at iEVOBIO 2011

Top ten journals publishing genome reports

[Chart: total genome publications (1995-2011) per journal: J Bacteriology, PNAS, Nature, Science, SIGS, PLoS ONE, Genome Research, PLoS Genetics, Nat Biotech, BMC Genomics]

Total 1160 genome publications in 60 peer-reviewed publications

Source: GenomesOnline Database, May 28, 2011

Page 36: 2011Field talk at iEVOBIO 2011

Incentives for compliance

Page 37: 2011Field talk at iEVOBIO 2011

MIGS compliant marine phage genomes

Page 38: 2011Field talk at iEVOBIO 2011

GSC 9 at the JCVI – April 2010

Page 39: 2011Field talk at iEVOBIO 2011

Page 40: 2011Field talk at iEVOBIO 2011

Darwin Core vs GSC MIxS standard

Darwin Core

GSC MIxS

Peter Dawyndt

Page 41: 2011Field talk at iEVOBIO 2011

Darwin Core vs GSC MIxS standard

Darwin Core

GSC MIxS standard

Taxon
Identification
Occurrence
IPR related info
Event
Location
GeologicalContext
SamplingProtocol
EnvironmentalConditions

Peter Dawyndt

Page 42: 2011Field talk at iEVOBIO 2011

Preliminary (first) conclusions

• DC & GSC checklist more complementary than overlapping

How can we make these standards completely orthogonal?

Page 43: 2011Field talk at iEVOBIO 2011

Page 44: 2011Field talk at iEVOBIO 2011

http://gensc.org

More Information about the GSC...

Page 45: 2011Field talk at iEVOBIO 2011

Feast of the Mind

Page 46: 2011Field talk at iEVOBIO 2011

Labels for data

<soil>

<water>

Page 47: 2011Field talk at iEVOBIO 2011

http://environmentontology.org

Member of OBO Foundry http://obofoundry.org

Page 48: 2011Field talk at iEVOBIO 2011

– building on ontologies

Users:

1) Pick terms
2) View hits
3) Browse
4) Follow links to primary data

http://ontogrator.org

Morrison et al, 2011 SIGS

Page 49: 2011Field talk at iEVOBIO 2011

Ontogrator approach depends on the quality of the

• Data Resources
• Knowledge Organization Systems (KOS)

used

Can we use this approach to improve both?
Can we complete the virtuous cycle?

Page 50: 2011Field talk at iEVOBIO 2011

Page 51: 2011Field talk at iEVOBIO 2011

Field, et al 2009. Science. 326:234-236. 

http://biosharing.org

Page 52: 2011Field talk at iEVOBIO 2011

Page 53: 2011Field talk at iEVOBIO 2011

Conclusions

• The era of genomics is just beginning…
• Self-organization by the scientific community can pay dividends (e.g. consensus building, large-scale co-ordination)
  – Standards are keys to unlocking data
  – Group thinking overcomes the tragedy of the commons
• Emerging key players from the molecular domain – “one stop shops”
  – Genomic Standards Consortium
  – BioSharing – driving cross-community collaborations

Page 54: 2011Field talk at iEVOBIO 2011

Feast of the Mind

Page 55: 2011Field talk at iEVOBIO 2011

Future

• Analysis – proof that sharing is beneficial
• Making the field of data sharing more quantitative
  – Objective measures of consensus
  – Useful metrics: e.g. Metadata Coverage Index (MCI)
  – Modelling – e.g. how best to incentivize data sharing?
• Further shared concepts
  – Minimum Information about a Sampling Site (MISS)
  – Minimum Data Policy
  – PubData?

Page 56: 2011Field talk at iEVOBIO 2011

Acknowledgements

Bergen and L4 metagenomics
Jack Gilbert, Ian Joint, Paul Somerfield, Sue Huse, Paul Swift, Rob Knight

NEBC
Bela Tiwari, Tim Booth, Mesude Bicak

CEH
Norman Morrison, Dave Hancock

University of Manchester

Henning Hermjakob, Chris Taylor

European Bioinformatics Institute

Susanna Sansone, Philippe Rocca-Serra, Eamonn Maguire

Oxford University

Genomic Standards Consortium
Peter Sterk

Page 57: 2011Field talk at iEVOBIO 2011

Acknowledgements

Coordination, workshops, working groups,infrastructure and exchange visits

Additional workshop funds

Local Hosts of GSC workshops

Sponsors of GSC 9 and GSC 10

GSC Funding

RCN4GSC