BME 130 – Genomes Lecture 5 Genome assembly I The good old days

Dec 14, 2015


Byron Hext
Transcript
Page 1: BME 130 – Genomes Lecture 5 Genome assembly I The good old days.

BME 130 – Genomes

Lecture 5

Genome assembly I: The good old days

Page 2

Administrivia

Homework 1 – on the website today, due Friday; homework policy

Student-led paper discussion; choose groups and pick paper

Guest lecture Friday – Bob Kuhn will demo the UCSC genome browser

Page 3

Genomics in the news

Genomic Fossils Calibrate the Long-Term Evolution of Hepadnaviruses

Citation: Gilbert C, Feschotte C (2010) Genomic Fossils Calibrate the Long-Term Evolution of Hepadnaviruses. PLoS Biol 8(9): e1000495. doi:10.1371/journal.pbio.1000495

Page 4

Figure 4.10 Genomes 3 (© Garland Science 2007)

Page 5

Figure 4.10 part 1 of 2 Genomes 3 (© Garland Science 2007)

Page 6

Figure 4.10 part 2 of 2 Genomes 3 (© Garland Science 2007)

Page 7

Sequence assembly

de novo: overlap, layout, consensus

reference-guided: reads placed along a reference sequence

[Diagram: reads s1 to s6 shown at the overlap, layout (s1, s2, s5, s3, s4, s6), and consensus stages of de novo assembly; in the reference-guided case the same reads are aligned to the reference sequence]

Page 8

de novo sequence assembly: overlap

[Diagram: reads s1 to s6 compared with one another]

The most CPU- and memory-demanding stage

Phusion: group reads sharing >= 11 k-mers of 17 bases

Phrap: “banded” alignment of reads around k-mer matches; tolerate alignment mismatches of low-quality bases

Celera: k-mer seed and extend alignment of reads

Arachne: 24-mer seed and extend alignment of reads

newbler: flowgram similarities (?)
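The k-mer idea shared by several of these tools can be illustrated with a toy sketch. It is only loosely modelled on the Phusion line above (group reads that share at least 11 k-mers of 17 bases) and is not the code of any of the listed assemblers. The function names `kmers` and `candidate_overlaps` are hypothetical, and the sketch assumes error-free reads on a single strand (no reverse complements).

```python
from collections import defaultdict
from itertools import combinations


def kmers(read, k):
    """Return the set of distinct k-mers in a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def candidate_overlaps(reads, k=17, min_shared=11):
    """Return {(name_a, name_b): count} for read pairs sharing >= min_shared k-mers.

    Assumption: forward strand only, exact k-mer matches only.
    """
    index = defaultdict(set)            # k-mer -> names of reads containing it
    for name, seq in reads.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)

    shared = defaultdict(int)           # (name_a, name_b) -> shared k-mer count
    for names in index.values():
        for pair in combinations(sorted(names), 2):
            shared[pair] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Tiny made-up reads, with small k so the example fits on a slide.
toy_reads = {"s1": "ACGTACGGTCA", "s2": "TACGGTCATTG", "s3": "GGGGGGGGGGG"}
print(candidate_overlaps(toy_reads, k=4, min_shared=3))  # {('s1', 's2'): 5}
```

Indexing k-mers first means only reads that share at least one k-mer are ever compared, which is one way to avoid a full all-against-all alignment in the most expensive stage.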

Page 9

de novo sequence assembly

Generate alignments

Find connected components

Wide range of strategies for the layout stage, many using mate-pair information

[Diagram: reads s1 to s6 arranged by their alignments (s1, s2, s5, s3, s4, s6)]
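The "find connected components" step can be sketched with a small union-find over read names. This is a minimal illustration, not the layout algorithm of any particular assembler; the function name `connected_components` and the overlap pairs in the usage example are made up.

```python
def connected_components(reads, overlaps):
    """Group read names into sets linked, directly or indirectly, by overlaps."""
    parent = {r: r for r in reads}

    def find(r):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in overlaps:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a     # merge the two components

    groups = {}
    for r in reads:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())


reads = ["s1", "s2", "s3", "s4", "s5", "s6"]
overlaps = [("s1", "s2"), ("s2", "s5"), ("s3", "s4"), ("s4", "s6")]  # hypothetical
print(connected_components(reads, overlaps))
# [['s1', 's2', 's5'], ['s3', 's4', 's6']]
```

Each component is a candidate contig; ordering the reads within a component, and using mate-pair information to do so, is the part where the strategies differ.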

Page 10

de novo sequence assembly: consensus

[Diagram: layout of reads s1 to s6 collapsed into a consensus]

PHRAP: consensus base is the base with the highest quality score; the quality score for the position is based on all reads' quality scores

PCAP/CAP3: sum up the quality scores for each base and take the base with the highest sum; quality score for the position: highest sum minus all other sums
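The two consensus rules on this slide can be compared on a single alignment column. The sketch below is a simplified reading of the slide, not the actual PHRAP or CAP3 code. It treats "highest sum – all other sums" as a subtraction, and because the slide does not say how PHRAP combines the reads' qualities into a position score, the first function simply returns the winning read's quality as a stand-in. Function names and the example column are hypothetical.

```python
from collections import defaultdict


def consensus_max_quality(column):
    """Pick the base from the single highest-quality read in this column.

    Assumption: the returned score is just that read's quality (a stand-in).
    """
    base, quality = max(column, key=lambda bq: bq[1])
    return base, quality


def consensus_quality_sum(column):
    """Pick the base with the largest summed quality.

    Position score = highest sum minus the sum of all other bases' sums.
    """
    sums = defaultdict(int)
    for base, quality in column:
        sums[base] += quality
    best = max(sums, key=sums.get)
    others = sum(q for b, q in sums.items() if b != best)
    return best, sums[best] - others


# One column: (base, quality score) from each read covering the position.
column = [("A", 30), ("A", 20), ("C", 40), ("A", 10)]
print(consensus_max_quality(column))  # ('C', 40)
print(consensus_quality_sum(column))  # ('A', 20)
```

The example column is chosen so that the two rules disagree: one confident read carries C, while three less confident reads carry A.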

Page 11

Reference-guided sequence assembly

[Diagram: reads s1 to s6 aligned to the reference sequence]

Advantages: (much) faster; (much) less memory

Disadvantages: indels/rearrangements; lack of a closely related reference; bias towards reference similarity

Pop M et al., “Comparative Genome Assembly”, Brief Bioinform. 2004 Sep;5(3):237-48.
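A toy placement routine shows both why reference-guided assembly is cheap and why indels hurt it. This is a sketch under strong assumptions (ungapped placement, forward strand only, brute-force scan), not the method from the paper cited above. The names `hamming` and `place_read` and the sequences are made up.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def place_read(read, reference, max_mismatches=2):
    """Return (offset, mismatches) of the best ungapped placement, or None.

    Assumption: substitutions only, so reads containing indels may not place.
    """
    best = None
    for offset in range(len(reference) - len(read) + 1):
        d = hamming(read, reference[offset:offset + len(read)])
        if d <= max_mismatches and (best is None or d < best[1]):
            best = (offset, d)
    return best


reference = "ACGTTGCAAGGCTA"
print(place_read("TTGCAAG", reference))    # (3, 0)  exact match
print(place_read("TTGGAAG", reference))    # (3, 1)  one substitution
print(place_read("TTGCTTAAG", reference))  # None    two-base insertion
```

Each read is compared only with the reference, never with the other reads, which is where the speed and memory savings come from. The third read carries a two-base insertion relative to the reference and cannot be placed by this ungapped scheme.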

Page 12

Figure 4.11a Genomes 3 (© Garland Science 2007)

Why is this called a sequence gap and not a physical gap?

Page 13

Closing a physical gap means finding a physical clone to sequence that will span the gap

Page 14

Figure 4.11b Genomes 3 (© Garland Science 2007)

Genomic DNA is the template for this PCR

Page 15

Figure 4.12 Genomes 3 (© Garland Science 2007)

Chromosome walking (is slow)

Page 16

Figure 4.13 Genomes 3 (© Garland Science 2007)

PCR from clone library

Insert 1 connects to who?

Page 17

Figure 4.14 Genomes 3 (© Garland Science 2007)

Page 18

Figure 4.15 Genomes 3 (© Garland Science 2007)

Page 19

Figure 4.15a Genomes 3 (© Garland Science 2007)

Page 20

Figure 4.15b Genomes 3 (© Garland Science 2007)

Page 21

Figure 4.15c Genomes 3 (© Garland Science 2007)

Page 22

Figure 4.15d Genomes 3 (© Garland Science 2007)

Page 23

Figure 4.16 Genomes 3 (© Garland Science 2007)

Assembly can be validated by mate-pair information
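One way to picture mate-pair validation: the two reads of a pair come from opposite ends of the same clone insert, so in a correct assembly they should lie roughly one insert length apart. The sketch below flags pairs that do not. It is illustrative only; it ignores read orientation, and the function name `check_mate_pairs`, the coordinates, and the insert size are all made-up values.

```python
def check_mate_pairs(placements, pairs, expected_insert, tolerance):
    """Return (left, right, span) for pairs whose span is off by more than tolerance.

    placements maps read name -> (start, end) in assembly coordinates.
    Assumption: orientation is not checked, only distance.
    """
    suspicious = []
    for left, right in pairs:
        if left not in placements or right not in placements:
            continue                    # one mate unplaced: nothing to check
        start_left, _ = placements[left]
        _, end_right = placements[right]
        span = end_right - start_left
        if abs(span - expected_insert) > tolerance:
            suspicious.append((left, right, span))
    return suspicious


placements = {
    "a1": (0, 500), "a2": (2500, 3000),     # spans 3000: consistent
    "b1": (1000, 1500), "b2": (9000, 9500), # spans 8500: far too long
}
pairs = [("a1", "a2"), ("b1", "b2")]
print(check_mate_pairs(placements, pairs, expected_insert=3000, tolerance=300))
# [('b1', 'b2', 8500)]
```

A cluster of flagged pairs around the same region suggests a misjoin or a collapsed repeat there, whereas a single outlier is more likely a chimeric clone.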

Page 24

Figure 4.16a Genomes 3 (© Garland Science 2007)

Page 25

Figure 4.16b Genomes 3 (© Garland Science 2007)

Page 26

Figure 4.17a Genomes 3 (© Garland Science 2007)

Page 27

Figure 4.17b Genomes 3 (© Garland Science 2007)

Page 28

Figure 4.18 Genomes 3 (© Garland Science 2007)