Dec 20, 2015

Page 1

CS273a Lecture 9, Aut08, Batzoglou
CS273a Lecture 9, Fall 2008

Quality of assemblies—mouse

Terminology: N50 contig length. If we sort contigs from largest to smallest and start covering the genome in that order, N50 is the length of the contig that just covers the 50th percentile, i.e., the contig whose addition brings the cumulative length to at least half the genome size.

7.7X sequence coverage
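A minimal Python sketch of the N50 definition above. It assumes contig lengths arrive as a list of integers and that the 50% threshold is measured against an estimated genome size; when no genome size is given, it falls back to the total assembly length (that fallback is an assumption; some tools report the genome-size-based value separately as NG50). The function name `n50` and the toy lengths are hypothetical.

```python
from typing import Optional, Sequence


def n50(contig_lengths: Sequence[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Return the N50 contig length.

    Contigs are sorted from largest to smallest and their lengths are
    accumulated; N50 is the length of the contig at which the running total
    first reaches half of `genome_size` (or of the total assembly length
    when `genome_size` is omitted, an assumed fallback).
    Returns None if the contigs never cover half of `genome_size`.
    """
    total = genome_size if genome_size is not None else sum(contig_lengths)
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        # Compare 2 * covered >= total to avoid fractional halves.
        if 2 * covered >= total:
            return length
    return None


if __name__ == "__main__":
    contigs = [10, 8, 5, 3, 2]           # hypothetical contig lengths (kb)
    print(n50(contigs))                  # → 8  (half of 28 kb reached at the 8 kb contig)
    print(n50(contigs, genome_size=40))  # → 5  (half of a 40 kb genome reached at 5 kb)
```

Passing the genome size matters when an assembly is fragmented: the same contigs give a smaller N50 against a 40 kb genome than against their own 28 kb total.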

Page 2

Quality of assemblies—dog

7.5X sequence coverage

Page 3

Quality of assemblies—chimp

3.6X sequence coverage

Assisted assembly
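The coverage figures quoted for these assemblies (7.7X, 7.5X, 3.6X) are total sequenced bases divided by genome size. As an illustrative aside, under the idealized Lander-Waterman model (reads placed uniformly at random) the expected fraction of bases covered by no read is e^{-c} at coverage c. That model is brought in here as an assumption, not stated on the slides; it gives a rough sense of how much more of the genome falls in gaps at 3.6X than at 7.5X. The read count and read length in the example are hypothetical.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Sequence coverage c = (number of reads x read length) / genome size."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(c: float) -> float:
    """Idealized Lander-Waterman estimate: with reads placed uniformly at
    random, a given base is missed by every read with probability ~ e^{-c}."""
    return math.exp(-c)


if __name__ == "__main__":
    # Hypothetical run: 30 million 750-bp reads over a 3 Gbp genome.
    c = coverage(30_000_000, 750, 3_000_000_000)
    print(f"coverage = {c:.1f}X")  # → coverage = 7.5X
    for cov in (7.7, 7.5, 3.6):    # coverages quoted on the slides
        print(f"{cov}X -> ~{expected_uncovered_fraction(cov):.2%} of bases uncovered")
```

Real genomes have repeats and uneven coverage, so these numbers are lower bounds on what an assembler actually leaves unresolved.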

Page 4

History of WGA

• 1982: λ-virus, 48,502 bp

• 1995: H. influenzae, 1 Mbp

• 2000: fly, 100 Mbp

• 2001 – present: human (3 Gbp), mouse (2.5 Gbp), rat*, chicken, dog, chimpanzee, several fungal genomes

Gene Myers: “Let’s sequence the human genome with the shotgun strategy.”

Phil Green: “That is impossible, and a bad idea anyway.”

1997

Page 5

• $399 Personal Genome Service (November 2007)
• $2,500 Health Compass service (April 2008)
• $985 deCODEme (November 2007)
• $350,000 Whole-genome sequencing (November 2007)
• Genetic Information Nondiscrimination Act (May 2008)

Page 6

Applications

• Whole-genome sequencing
• Comparative genomics
• Genome resequencing
• Structural variation analysis
• Polymorphism discovery
• Metagenomics
• Environmental sequencing
• Gene expression profiling
• Genotyping
• Population genetics
• Migration studies
• Ancestry inference
• Relationship inference
• Genetic screening
• Drug targeting
• Forensics

Page 7

Sequencing applications

Demand for more sequencing

Sequencing technology improvement

Increase in sequencing data output

New sequencing applications

Page 8

Sequencing technology: Sanger sequencing

[Figure: cost per finished bp, log scale from $10.00 down to $0.01, over 1975 to 2008]

Read length: 15 – 200 bp → 500 – 1,000 bp

Throughput: “grad-student years” → 2 ∙ 10^6 bp/day

Fred Sanger

Page 9

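Sequencing technology: Sanger sequencing

Genome size: 3 ∙ 10^9 bp (1x coverage); 10x coverage means reading 3 ∙ 10^10 bp

$$\frac{10 \times 3 \cdot 10^{9}\ \text{bp}}{2 \cdot 10^{6}\ \text{bp/day}} = 1.5 \cdot 10^{4}\ \text{days} \approx 40\ \text{years}$$

$$10 \times 3 \cdot 10^{9}\ \text{bp} \times \$0.001/\text{bp} = \$30\ \text{million}$$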
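The back-of-the-envelope Sanger estimate can be reproduced in a few lines of Python. The parameters are the slide's figures (3 ∙ 10^9 bp genome, 10x coverage, 2 ∙ 10^6 bp/day, $0.001/bp); the function name `sequencing_effort` and the 365-day year are illustrative choices, not from the lecture.

```python
def sequencing_effort(genome_bp: float, coverage: float,
                      bp_per_day: float, dollars_per_bp: float) -> tuple[float, float]:
    """Return (years, dollars) needed to sequence `genome_bp` at `coverage`."""
    total_bp = coverage * genome_bp          # bases that must be read
    years = total_bp / bp_per_day / 365.0    # at a single fixed throughput
    dollars = total_bp * dollars_per_bp
    return years, dollars


if __name__ == "__main__":
    years, dollars = sequencing_effort(3e9, 10, 2e6, 0.001)
    print(f"~{years:.0f} years, ${dollars / 1e6:.0f} million")  # → ~41 years, $30 million
```

The exact figure is about 41 years, which the slide rounds to 40.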

Page 10

Pyrosequencing on a chip

Mostafa Ronaghi, Stanford Genome Technologies Center

454 Life Sciences

Page 11

Sequencing technology: Next-generation sequencing

Read length: 250 bp

Throughput: 300 Mb/day

Cost: ~ 10,000 bp/$

De novo: yes

Genome Sequencer / FLX

“short reads”

Page 12

Single Molecule Array for Genotyping—Solexa

Page 13

Sequencing technology: Next-generation sequencing

Read length: ~ 35 bp

Throughput: 300 – 500 Mb/day

Cost: ~ 100,000 bp/$

De novo: yes

Genome Analyzer / SOLiD Analyzer

“microreads”

Page 14

Sequencing technology: Next-generation sequencing

Read length: ~ 50-150 bp

Throughput: 3 Gb/day

Cost: ~ 3,000,000 bp/$

De novo: yes

Genome Analyzer / SOLiD Analyzer

reads

Page 15

Illumina Projections

Page 16

Complete Genomics

$5,000 this summer. Quality?...

1,000 genomes in 2009; 20,000 genomes in 2010

Page 17

Pacific Biosciences

Page 18

So, how fast is cost going down?

• 2006: $10 million
• 2008: $100,000
• 2009: $10,000
• ?: $1,000
• ???: $100
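One way to put a number on “how fast” is to compute the average annual fold-decrease between the dated price points, assuming a smooth exponential decline between them. The “?” and “???” entries are left out because no year is given for them. The function names below are hypothetical.

```python
import math


def annual_fold_decrease(year0: int, cost0: float, year1: int, cost1: float) -> float:
    """Average factor by which cost shrinks per year, assuming exponential decline."""
    return (cost0 / cost1) ** (1.0 / (year1 - year0))


def halving_time_months(fold_per_year: float) -> float:
    """Months for the cost to halve at a constant annual fold-decrease."""
    return 12.0 * math.log(2) / math.log(fold_per_year)


if __name__ == "__main__":
    points = [(2006, 10e6), (2008, 100e3), (2009, 10e3)]  # dated prices from the slide
    for (y0, c0), (y1, c1) in zip(points, points[1:]):
        f = annual_fold_decrease(y0, c0, y1, c1)
        print(f"{y0}-{y1}: ~{f:.0f}x cheaper per year, "
              f"halving every ~{halving_time_months(f):.1f} months")
```

Both intervals work out to roughly a 10-fold drop per year, i.e., the cost halving about every 3.6 months.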

Page 19

Molecular Inversion Probes

Page 20

Illumina Genotype Arrays

Page 21

Sequencing technology: Next-generation sequencing

Read length: 1 bp

Throughput: 1 – 2 Mb/day

Cost: 5,000 bp/$

De novo: no

Infinium Assay / GeneChip Array

genotypes

“SNP chips”

Page 22

Nanopore Sequencing

http://www.mcb.harvard.edu/branton/index.htm

Page 23

Sequencing technology: Next-generation sequencing

Page 24

Sequencing technology

Technology      Read length (bp)   Throughput (Mb/day)   Cost (bp/$)   De novo
Sanger          1,000              2                     1,000         yes
454             250                300                   10,000        yes
Solexa / ABI    35                 500                   100,000       yes
SNP chip        1                  2                     5,000         no

Application: Sanger / 454 / Solexa/ABI / SNP chip

Bacterial sequencing: $

Mammalian sequencing: $$$ $$ not likely today

Mammalian resequencing: $$$ $$ $

Metagenomics: $$ $

Genotyping: $$$ $$$ $$$
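To connect the technology table to the application ratings, here is a sketch that converts each technology's read length, throughput, and cost-per-base figures into the number of reads, machine-days, and dollars for a 10x pass over a 3 ∙ 10^9 bp mammalian genome. It uses only the table's numbers and ignores assembly feasibility, error rates, and library preparation, which the “$” ratings and the de novo column also weigh. The SNP chip row is omitted because a 1 bp “read” genotypes known sites rather than sequencing a genome. The `Technology` class and `ten_x_genome` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Technology:
    name: str
    read_length_bp: int
    throughput_mb_per_day: float
    bp_per_dollar: float


# Figures copied from the technology table (2008 estimates).
TECHNOLOGIES = [
    Technology("Sanger", 1_000, 2, 1_000),
    Technology("454", 250, 300, 10_000),
    Technology("Solexa / ABI", 35, 500, 100_000),
]


def ten_x_genome(tech: Technology, genome_bp: float = 3e9, coverage: float = 10):
    """Return (reads, machine_days, dollars) to reach `coverage` over `genome_bp`."""
    total_bp = coverage * genome_bp
    reads = total_bp / tech.read_length_bp
    days = total_bp / (tech.throughput_mb_per_day * 1e6)
    dollars = total_bp / tech.bp_per_dollar
    return reads, days, dollars


if __name__ == "__main__":
    for tech in TECHNOLOGIES:
        reads, days, dollars = ten_x_genome(tech)
        print(f"{tech.name:>13}: {reads:.1e} reads, {days:,.0f} days, ${dollars:,.0f}")
```

The read counts also hint at why short reads make de novo mammalian assembly hard: Solexa / ABI needs far more, far shorter reads than Sanger to reach the same coverage.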

Page 25

Multiple Sequence Alignment

Page 26

Evolution at the DNA level

…ACGGTGCAGTTACCA…

…AC----CAGTCCACCA…

SEQUENCE EDITS: Mutation, Deletion

REARRANGEMENTS: Inversion, Translocation, Duplication
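A small illustration of how sequence edits appear in a pairwise alignment: each column is a match, a substitution (mutation), or a gap (insertion/deletion). The aligned pair below is hypothetical, modeled loosely on the slide's example but trimmed so both gapped strings have equal length, which a column-by-column comparison requires. Rearrangements such as inversions or translocations are not captured by this column model. The function name `classify_columns` is hypothetical.

```python
from collections import Counter


def classify_columns(a: str, b: str) -> Counter:
    """Count alignment columns by type for two gapped, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap-only column; carries no information
        elif x == "-" or y == "-":
            counts["indel"] += 1
        elif x == y:
            counts["match"] += 1
        else:
            counts["mutation"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical alignment: one 4-bp deletion and one substitution.
    top = "ACGGTGCAGTTACCA"
    bottom = "AC----CAGTCACCA"
    print(classify_columns(top, bottom))
```

Counting columns this way is the basis of simple edit-based scoring; real aligners assign weights to each column type instead of just tallying them.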

Page 27

Evolutionary Rates

[Figure: evolutionary rates, annotated “OK” (three times), “X” (twice), “Still OK?”, and “next generation”]

Page 28

Orthology, Paralogy, Inparalogs, Outparalogs
