A peek inside the bioinformatics black box A/Prof. Torsten Seemann Victorian Life Sciences Computation Initiative (VLSCI) Doherty Centre for Applied Microbial Genomics (DCAMG) The University of Melbourne DCAMG Symposium - Melbourne, AU - Mon 20 July 2015
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015

Aug 12, 2015

Torsten Seemann
Transcript
Page 1: A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015

A peek inside the bioinformatics black box

A/Prof. Torsten Seemann

Victorian Life Sciences Computation Initiative (VLSCI)

Doherty Centre for Applied Microbial Genomics (DCAMG)

The University of Melbourne

DCAMG Symposium - Melbourne, AU - Mon 20 July 2015

Page 2

Bioinformatics

Page 3

Bioinformatics

Page 4

What’s inside the black box?

Page 5

It’s black boxes, all the way down.

Data

Algorithms

Software

Analyses

Page 6

We use real black boxes too!

Page 7

The data

Page 8

The currency of genomics

Reads

Reads are stored in FASTQ files

Genome
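To make "reads are stored in FASTQ files" concrete, here is a minimal sketch of reading one. It assumes the common four-lines-per-record layout and Phred+33 quality encoding; the `Read` type, the `parse_fastq` name and the example record are illustrative, not taken from the talk.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    sequence: str
    qualities: list[int]  # Phred scores, assuming Phred+33 encoding


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a FASTQ stream with exactly 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops parsing
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield Read(header[1:], sequence, [ord(c) - 33 for c in quality])


# Hypothetical single-read file held in memory for the example.
sequence = "ATTAGCTTAGATTGTAG"
example = io.StringIO(f"@read1\n{sequence}\n+\n{'I' * len(sequence)}\n")
reads = list(parse_fastq(example))
print(reads[0].name, len(reads[0].sequence), reads[0].qualities[0])
```

Real pipelines would normally use an established parser and handle compressed input; the point here is only that each read carries a name, its bases and one quality score per base.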

Page 9

What data do we really have?

Isolate genome

Sequenced reads

Other isolates in sequencing run

Contamination

Unsequenced regions

What we want

Page 10

Metadata

■ Genome data itself is of limited value

■ Needs “extra” information

□ location: Australia 37.8S,145.0E
□ date: 2015 2015-07-20
□ source: human 60yo male faecal swab
□ etc.
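One way to keep this "extra" information attached to a genome is a small structured record. The sketch below uses the values from the slide; the class name, the field names and the `isolate_id` value are assumptions made for illustration, not a metadata standard.

```python
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class IsolateMetadata:
    isolate_id: str        # hypothetical identifier, not from the slide
    country: str
    latitude: float        # decimal degrees, south is negative
    longitude: float       # decimal degrees, east is positive
    collection_date: date
    host: str
    source: str


record = IsolateMetadata(
    isolate_id="ISOLATE-001",
    country="Australia",
    latitude=-37.8,
    longitude=145.0,
    collection_date=date(2015, 7, 20),
    host="human, 60yo male",
    source="faecal swab",
)
print(asdict(record))
```

Storing the date and coordinates as typed values rather than free text makes later filtering and sorting by time or place straightforward.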

Page 11

Got my reads, now what?

Page 12

De novo genome assembly

Page 13

Draft vs. Finished genomes

250 bp - Illumina - $100

8000 bp - PacBio - $1000

Page 14

Compare to already assembled genomes

Reference  AGTCTGATTAGCTTAGCTTGTAGCGCTATATTATAGTCTGATTAGCTTAGAT
Reads            ATTAGCTTAGATTGTAG
                      CTTAGATTGTAGC-C
               TGATTAGCTTAGATTGTAGC-CTATAT
                   TAGCTTAGATTGTAGC-CTATATT
                        TAGATTGTAGC-CTATATTA
                        TAGATTGTAGC-CTATATTAT
                           ^ SNP   ^ Deletion
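The SNP and the deletion on this slide can be recovered with a very simple pileup. The sketch below uses the reference and reads from the slide; the 0-based start offsets were worked out by matching each read against the reference and are an assumption, as are the `call_variants` name and the `min_depth` cutoff. Real variant callers also weigh base and mapping quality, which this ignores.

```python
from collections import Counter

reference = "AGTCTGATTAGCTTAGCTTGTAGCGCTATATTATAGTCTGATTAGCTTAGAT"

# (0-based start on the reference, aligned read); '-' marks a deleted base.
aligned_reads = [
    (6, "ATTAGCTTAGATTGTAG"),
    (11, "CTTAGATTGTAGC-C"),
    (4, "TGATTAGCTTAGATTGTAGC-CTATAT"),
    (8, "TAGCTTAGATTGTAGC-CTATATT"),
    (13, "TAGATTGTAGC-CTATATTA"),
    (13, "TAGATTGTAGC-CTATATTAT"),
]


def build_pileup(reference, aligned_reads):
    """Count the bases observed at each reference position."""
    pileup = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    return pileup


def call_variants(reference, pileup, min_depth=2):
    """Report positions where every covering read disagrees with the reference."""
    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        base, support = counts.most_common(1)[0]
        if base != reference[pos] and support == depth:
            kind = "deletion" if base == "-" else "SNP"
            variants.append((pos + 1, kind, reference[pos], base))  # 1-based position
    return variants


variants = call_variants(reference, build_pileup(reference, aligned_reads))
for variant in variants:
    print(variant)
```

With these offsets, all six reads show A where the reference has C at position 17, and all five reads covering position 25 show a gap where the reference has G.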

Page 15

Best practice

■ Use both approaches
□ reference-based + de novo

■ Best of both worlds
□ and worst of both worlds - interpretation is non-trivial

■ Still need
□ good epidemiology, metadata and domain knowledge!

Page 16

Comparing genomes

Page 17

Core vs. Pan genome

Core is common to all & has similar sequence.
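In set terms, the core genome is the intersection of the gene sets and the pan genome is their union. A minimal sketch, assuming genes have already been grouped so that the same name means "similar sequence"; the isolate names and gene contents are made up for illustration.

```python
# Hypothetical gene content per isolate (illustrative only).
genomes = {
    "isolate_A": {"gyrA", "rpoB", "recA", "blaZ"},
    "isolate_B": {"gyrA", "rpoB", "recA", "mecA"},
    "isolate_C": {"gyrA", "rpoB", "recA"},
}

core = set.intersection(*genomes.values())  # present in every isolate
pan = set.union(*genomes.values())          # present in at least one isolate
accessory = pan - core                      # present in some but not all

print("core:     ", sorted(core))
print("pan:      ", sorted(pan))
print("accessory:", sorted(accessory))
```

Deciding which genes count as "the same" across isolates is the hard part in practice and usually relies on sequence-similarity clustering, which is outside this sketch.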

Page 18

Pan genome analysis

Rows are genomes, columns are genes.
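The "rows are genomes, columns are genes" picture is a presence/absence matrix. The sketch below builds one from per-isolate gene sets; the data and the `presence_absence` helper are hypothetical and only illustrate the layout.

```python
# Hypothetical gene content per isolate (illustrative only).
genomes = {
    "isolate_A": {"gyrA", "rpoB", "recA", "blaZ"},
    "isolate_B": {"gyrA", "rpoB", "recA", "mecA"},
    "isolate_C": {"gyrA", "rpoB", "recA"},
}


def presence_absence(genomes):
    """Return (gene columns, {genome: row of 0/1 flags})."""
    genes = sorted(set().union(*genomes.values()))
    rows = {
        name: [1 if gene in content else 0 for gene in genes]
        for name, content in sorted(genomes.items())
    }
    return genes, rows


genes, rows = presence_absence(genomes)

# Print one row per genome and one column per gene.
print(f"{'genome':<10} " + " ".join(genes))
for name, row in rows.items():
    cells = " ".join(str(flag).center(len(gene)) for flag, gene in zip(row, genes))
    print(f"{name:<10} {cells}")

# Column sums: how many genomes carry each gene.
gene_counts = {gene: sum(row[i] for row in rows.values()) for i, gene in enumerate(genes)}
print(gene_counts)
```

Columns whose sum equals the number of genomes are core genes; sparse columns are the accessory genes that distinguish isolates.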

Page 19

Phylogenomics

Page 20

Inferring transmission

■ Identical sequence does not imply transmission

■ Easier to rule transmission out than to rule it in
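The asymmetry between ruling out and ruling in can be sketched with pairwise SNP distances. Everything here is an assumption for illustration: the aligned sequences are invented, and `RULE_OUT_THRESHOLD` is an arbitrary cutoff, not a recommended value. A small distance, even zero, is reported only as "not ruled out".

```python
from itertools import combinations

# Hypothetical aligned core-genome sites for four isolates (same length).
alignment = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAC",
    "iso3": "ACGAACGTTC",
    "iso4": "TCGAACCTTG",
}

RULE_OUT_THRESHOLD = 3  # arbitrary illustrative cutoff, in SNPs


def snp_distance(a, b):
    """Count sites where both sequences have a called base and differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    return sum(1 for x, y in zip(a, b) if x != y and x in "ACGT" and y in "ACGT")


distances = {
    (p, q): snp_distance(alignment[p], alignment[q])
    for p, q in combinations(sorted(alignment), 2)
}

for (p, q), d in distances.items():
    verdict = "ruled out" if d > RULE_OUT_THRESHOLD else "not ruled out"
    print(f"{p} vs {q}: {d} SNPs -> {verdict}")
```

"Not ruled out" is deliberately the strongest statement the code makes: confirming transmission still needs the epidemiology and metadata.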

Page 21

Conclusion

Page 22

The future

■ Genomics is delivering on the promise
□ still not maximally exploited

■ Directions

□ more use of pan-genome
□ understanding recombination / horizontal transfer
□ dynamics of microevolution
□ useful visualization of large data sets
□ open science: data sharing, open source software
