Informatics Infrastructure at the start of the Second Decade of DNA Barcoding SUJEEVAN RATNASINGHAM BIODIVERSITY INSTITUTE OF ONTARIO UNIVERSITY OF GUELPH

Feb 11, 2017

Data & Analytics

Transcript
Page 1: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

SUJEEVAN RATNASINGHAM

BIODIVERSITY INSTITUTE OF ONTARIO
UNIVERSITY OF GUELPH

Page 2: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

10+ Years 4M+ Records 0.5M+ Species 100+ Nations

Page 3: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Decade 1 - Capacity Building

Community Database

Collaborative Networks

Software Tools

Data Standards

Page 4: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Data Standards

Complete

Locus

Quality

Provenance

Page 5: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Building the Library

[Figure: growth of the library, 2004 to 2014; axis from 10,000 to 10,000,000; series: barcodes, Linnean species, BINs]

Page 6: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Community Benchmarks

Categories: Taxonomic, Thematic, Geographic

Examples: Spiders, Birds, Lepidoptera of North America, Birds of Argentina, IUCN Red List, CITES, Fish, Bees, Amphibians, Mammals, Mosquitoes

Page 7: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Collaborative Networks

[Figure: registered users (thousands) by year, 2004 to 2014; axis from 0 to 16]

Data Sharing

2005 – 102 users from 30 institutions

Page 8: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

1000+ Institutions from 94 countries sharing data on BOLD

[Figure: user network showing data sharing Across Nations and Within Nations; legend: 100K+, 10K – 100K, 1K – 10K]

BOLD User Network - 2015

Page 9: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Canada, France, USA, Germany, Costa Rica, United Kingdom, Switzerland, Finland

[Figure: user network showing data sharing Across Nations and Within Nations; legend: 100K+, 10K – 100K, 1K – 10K]

BOLD User Network - 2015

Page 10: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Kenya, Belgium, Madagascar, Norway, Austria, Sweden, Jordan, Argentina, China, Brazil, Spain, Mexico, Panama, Portugal, Pakistan, Egypt, South Africa, India, New Zealand, Netherlands

[Figure: user network showing data sharing Across Nations and Within Nations; legend: 100K+, 10K – 100K, 1K – 10K]

BOLD User Network - 2015

Page 11: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

[Figure: user network showing data sharing Across Nations and Within Nations; legend: 100K+, 10K – 100K, 1K – 10K]

BOLD User Network - 2015

66 Other Countries from Every Continent

Page 12: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Testing the Library Depth

BBC Tree of Life, 2014

Animals

400K+ Species

163K+ with Full Taxonomy

80% of the BOLD Library

500+ Orders

Test Data: 4000 species from 200 orders, 20 per order

Page 13: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Reference

Queries

Testing the Library Depth

Page 14: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

[Figure: top match similarity by year, 2005 to 2015; axis from 0 to 100; series: >98%, >95%, >92%, >90%]

Testing the Library Depth
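
The curves on this slide report the share of test queries whose top match in the library exceeds a similarity threshold. A minimal sketch of that tally follows, assuming each query has already been reduced to its best-match similarity in percent; the function name and the sample values are illustrative and do not come from the talk.

```python
from typing import Dict, Iterable, Sequence


def share_above_thresholds(
    top_match_similarities: Iterable[float],
    thresholds: Sequence[float] = (98.0, 95.0, 92.0, 90.0),
) -> Dict[float, float]:
    """Return, for each threshold, the percentage of queries whose
    top-match similarity is strictly greater than that threshold."""
    sims = list(top_match_similarities)
    if not sims:
        raise ValueError("no query results supplied")
    return {
        t: 100.0 * sum(1 for s in sims if s > t) / len(sims)
        for t in thresholds
    }


# Hypothetical top-match similarities (percent) for five queries.
example = [99.4, 97.0, 93.5, 91.0, 85.2]
shares = share_above_thresholds(example)
```

Running the same tally on the query results for each year would give one point per threshold per year, which is the shape of the plot described above.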

Page 15: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Testing the Library Depth

[Figure: two PCA scatter plots (PC1 vs PC2), one for K-mers (k=3) and one for Amino Acid Composition; points grouped by order: Coleoptera, Diptera, Ephemeroptera, Hemiptera, Hymenoptera, Lepidoptera, Plecoptera, Thysanoptera, Trichoptera]

Ratnasingham, Ma, Hebert, in prep.
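
One of the two panels is built from k-mer (k=3) composition. As a rough sketch of that feature, the snippet below counts overlapping 3-mers in a DNA sequence and normalises them to frequencies, giving a 64-dimensional vector per sequence that a PCA could then project. The function name, the choice to skip windows containing ambiguous bases, and the toy sequence are assumptions for illustration; the talk does not describe the exact preprocessing.

```python
from itertools import product
from typing import Dict


def kmer_frequencies(sequence: str, k: int = 3) -> Dict[str, float]:
    """Relative frequency of every overlapping k-mer over A, C, G, T.

    Windows containing any other character (N, gaps, ...) are skipped.
    """
    seq = sequence.upper()
    # Start every possible k-mer at zero so all vectors share the same keys.
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence has no valid k-mers")
    return {kmer: n / total for kmer, n in counts.items()}


# Toy sequence with one ambiguous base (N).
freqs = kmer_frequencies("ATGATGNATG")
```

Because every vector has the same 64 keys in the same order, the vectors for many sequences can be stacked into a matrix before dimensionality reduction.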

Page 16: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Barcode Index Number (BIN)

Algorithm

• Tuned to the marker (COX1)

• Fixed parameters for balanced OTU generation

• Uses prior threshold but refines for each group

Registry

• Occurrence of DNA Barcode (place and time)

• Aggregation of all associated metadata

• Reusable - works across studies
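
The "prior threshold" idea can be illustrated with plain single-linkage clustering at a fixed distance cut-off: any two aligned sequences connected by a chain of close pairs end up in the same OTU. This is only a sketch of a first clustering stage. It does not implement the per-group refinement mentioned on the slide, and the 2% threshold, the function names, and the toy sequences are assumptions for illustration rather than BOLD's parameters.

```python
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_otus(seqs: List[str], threshold: float = 0.02) -> List[List[int]]:
    """Group sequence indices into OTUs: two sequences share an OTU when a
    chain of pairwise distances <= threshold connects them."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(j)] = find(i)  # merge the two clusters

    groups: Dict[int, List[int]] = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy 100-bp sequences: the first two differ at one site (1%), the third at many.
a = "ACGT" * 25
b = a[:-1] + "A"
c = "TTGA" * 25
otus = single_linkage_otus([a, b, c])
```

The all-pairs loop is quadratic in the number of sequences, which is fine for a demonstration but would need indexing or pre-clustering at library scale.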

Page 17: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

BOLD:AAF2716

Notioplusia illustrata

Oraesia Janzen 03

Ctenoplusia sp. ANIC1

Page 18: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Importance of registering OTUs

[Figure: bar chart by group (Mammalia, Birds, Insecta, Fish, Araneae, Mollusca, Plants, Fungi); series: Species, Barcode Index Numbers, Unregistered OTUs; axis ticks from 0 to 300,000 and from 0 to 60,000; label: LATITUDINAL RANGE]

Page 19: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding
Page 20: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Allow anyone, anywhere, to identify any organism

Page 21: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

MAIL

Page 22: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding
Page 23: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

LifeScanner Solution Overview

Species Identification

ID Engine

Sample Collection

Sequencing

PCR

Prep

Partner Sequencing Labs

Page 24: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding
Page 25: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding
Page 26: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Moving into decade 2

Page 27: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Embrace Big Data

[Figure: Impact plotted against Complexity (Data volume & Dimensionality)]

Reporting: What happened?

Analysis: Why it happened?

Monitoring: What is happening?

Forecasting: What might happen?

Page 28: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Support for multiple scales

10^2, 10^3, 10^4, 10^4

Page 29: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Support for NGS based Barcoding

Page 30: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

BOLD4

Launch of Beta on Sept 29, 2015

Advancing the Automation of Species Identification

Page 31: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Community Curated Libraries

Tier 1

• Purpose generated & reference specimens available

• Barcode compliant & consistent

• Key species (e.g. Dirty 22, Domesticated & Bush meat)

Tier 2

• Curated for consistency in taxonomic assignments

• Barcode compliant & consistent

• CITES/REDLIST (e.g. Endangered & controlled species)

Tier 3

• Mined from BOLD

• Limited verification and only to be used as last resort

• Disease vectors & invasive species

78%

20%

25%
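
One way to read the three tiers is as a fallback order for identification: consult the most strictly curated library first and fall through to mined data only as a last resort. The sketch below assumes each tier is a simple mapping from a query key to a species name. The data structures, the function name, and the entries are hypothetical and say nothing about how BOLD implements this.

```python
from typing import Dict, Optional, Sequence, Tuple


def identify_with_tiers(
    query: str,
    tiers: Sequence[Tuple[str, Dict[str, str]]],
) -> Optional[Tuple[str, str]]:
    """Return (tier_name, species) from the first tier that knows the query,
    or None when no tier has a match. Tiers are tried in the order given."""
    for tier_name, library in tiers:
        species = library.get(query)
        if species is not None:
            return tier_name, species
    return None


# Hypothetical libraries keyed by a placeholder sequence identifier.
tiers = [
    ("Tier 1", {"seq-001": "Species A"}),
    ("Tier 2", {"seq-001": "Species A (curated)", "seq-002": "Species B"}),
    ("Tier 3", {"seq-003": "Species C"}),
]
hit_first = identify_with_tiers("seq-001", tiers)
hit_last = identify_with_tiers("seq-003", tiers)
miss = identify_with_tiers("seq-999", tiers)
```

Returning the tier name alongside the species lets a caller report how much verification stands behind an identification.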

Page 32: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Community Defined Metadata Extensions

Rougerie R, Smith AM, Fernandez-Triana J, Lopez-Vaamonde C, Ratnasingham S, Hebert PDN. 2011. Molecular analysis of parasitoid linkages (MAPL): gut contents of adult parasitoid wasps reveal larval host. Mol Ecol 20:179-186.

Page 33: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

More Analytical Tools


Page 34: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Community Developed Analytical Tools

Plug-in Framework

6000+ Analytical Packages in CRAN

Page 35: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Analytical Pipelines

Simple Pipeline

Page 36: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

BOLD 4 + SAP Lumira

Page 37: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

BOLD4 – Some other features

• Checklist Support (synonyms, progress, shopping lists)

• Data portal for core facilities

• Complete record histories

• RESL algorithm on your own datasets

• Storage and analysis of pre-clustered NGS data

Page 38: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Support for Metabarcoding

mBRAVE: Metabarcoding Research And Visualization Environment

Linkages & Partners

Page 39: Informatics Infrastructure at the start of the Second Decade of DNA Barcoding

Acknowledgements

BOLD Team

Paul Hebert
Dan Janzen & Winnie Hallwachs
Scott Miller
Axel Hausmann
Stefan Kremer
Many others!

Biodiversity Institute Collections Team

Canadian Center for DNA Barcoding Team

BOLD Users