Genomic Sequence Questions - UWI St. Augustine
Introduction to next-gen sequencing bioinformatics.ca
Genomic Sequence Questions
• How are sequence maps of genomes produced?
• How is the information in the genome deciphered?
• What can comparative genomics reveal about genome structure and evolution?
• How does the availability of genomic sequence affect what we can do/ask?
The human nuclear genome viewed as a set of labeled DNA
Chapter 13 Opener
3072 characters/page: 1953 reams of paper to print out. DNA = 1.8 meters!
AGCTAGTACAGCAGCTAGGCCGCATCATTAATTCGTATATATATATTCTCTCTCTAGAGCATCACATGCTACTAGCTGATATTCCTTCCGCGCGGCCGGCGAATCATTTACGTAAAAAAATTTTTCGC
[The slide background repeats this example sequence to fill the page.]
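The paper figure on this slide can be checked with a little arithmetic. The sketch below assumes a haploid genome of about 3 billion bases and 500 sheets per ream; neither number appears on the slide, so both are labeled as assumptions in the code.

```python
# Rough check of the "reams of paper" figure.
GENOME_BP = 3_000_000_000   # assumption: ~3 billion bases (haploid human genome)
CHARS_PER_PAGE = 3072       # from the slide
SHEETS_PER_REAM = 500       # assumption: a standard ream, one page per sheet


def reams_needed(genome_bp, chars_per_page, sheets_per_ream):
    """Number of reams required to print one base per character."""
    pages = genome_bp / chars_per_page
    return pages / sheets_per_ream


reams = reams_needed(GENOME_BP, CHARS_PER_PAGE, SHEETS_PER_REAM)
print(int(reams))
```

Under these assumptions the result agrees with the 1953 reams quoted on the slide.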
“Normal” DNA synthesis without dideoxy terminators
Figure 7-15
The structure of 2’,3’-dideoxynucleotides
The dideoxy sequencing method
Figure 20-16a
The dideoxy sequencing method
Figure 20-16b
Lecture 3.0 8
Principles of DNA Sequencing
[Diagram: a 5’ primer annealed to a 3’ template; sequence shown: G C A T G C]
Four reactions, each containing dATP, dCTP, dGTP and dTTP plus one dideoxy terminator, with the terminated products shown:
• ddATP: GCddA
• ddCTP: GddC, GCATGddC
• ddTTP: GCAddT
• ddGTP: ddG, GCATddG
Principles of DNA Sequencing
[Diagram: gel with lanes G, C, T, A run between the − and + electrodes; bands ordered from short to long fragments read G C A T G C]
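The logic of the four reactions and the gel read-out can be sketched in a few lines. This is a toy model, not a simulation of real chemistry: it assumes every possible terminated product is present and that bands separate perfectly by length. The function names are hypothetical.

```python
def terminated_fragments(new_strand):
    """For each dideoxy terminator, list the lengths of the products it ends.

    A product of length i exists in the ddX reaction whenever base i of the
    newly synthesized strand is X.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the sequence by ordering all bands from shortest to longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = terminated_fragments("GCATGC")
print(lanes)            # {'A': [3], 'C': [2, 6], 'G': [1, 5], 'T': [4]}
print(read_gel(lanes))  # GCATGC
```

The lane contents match the products on the slide: the ddC reaction yields products of length 2 and 6 (GddC, GCATGddC), the ddG reaction lengths 1 and 5, and so on.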
Multiplexed CE with Fluorescent detection
ABI 3700: 96 x 700 bases
[Figure labels: Small Fragments, Large Fragments]
• Assembly (often multiple versions)
– Depth (coverage)
– Gaps (sequence and physical)
– Scaffolds (100 N convention)
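The "100 N convention" refers to joining ordered contigs into a scaffold with a run of 100 N characters marking each gap of unknown size. A minimal sketch, with hypothetical function names and made-up contigs:

```python
import re

GAP = "N" * 100  # placeholder for a gap of unknown length


def build_scaffold(contigs):
    """Join ordered, oriented contigs with a 100-N spacer between neighbours."""
    return GAP.join(contigs)


def split_scaffold(scaffold):
    """Recover the contigs by splitting on runs of N."""
    return [piece for piece in re.split(r"N+", scaffold) if piece]


scaffold = build_scaffold(["ACGTACGT", "GGCCGGCC", "TTAATTAA"])
print(len(scaffold))             # 3 contigs of 8 bp plus 2 gaps of 100 N
print(split_scaffold(scaffold))
```

Because the spacer length is a convention rather than a measurement, coordinates downstream of a gap should be treated as approximate.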
Shotgun Sequencing
Isolate Chromosome
Shear DNA into Fragments
Clone into Seq. Vectors
Sequence
Shotgun Sequencing
Sequence Chromatogram
Send to Computer
Assembled Sequence
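The "send to computer" step is where overlapping reads are merged. The sketch below is a toy greedy overlap assembler, a stand-in for illustration only; it is not how Phrap or any production assembler works, and it ignores sequencing errors, reverse complements and repeats. The reads and the `min_len` threshold are invented for the example.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to merge: remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))  # ['ATGGCGTACCTGA']
```

When no overlap of at least `min_len` remains, the function returns several contigs rather than one, which is the toy equivalent of an assembly with gaps.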
Shotgun Sequencing
• Very efficient process for small-scale (~10 kb) sequencing (preferred method)
• First applied to whole genome sequencing in 1995 (H. influenzae)
• Now standard for all prokaryotic genome sequencing projects
• Successfully applied to D. melanogaster
• Moderately successful for H. sapiens
Genome sequencing is now automated
Figure 13-3
There are two main approaches to genome sequencing
• Laboratory Intensive
– Physical Maps
– Chromosome isolation
– “Walking”
– Slow, but you always know where you are
• Computationally Intensive
– Fast to generate data; any of many technologies can be used to randomly generate sequence from a variety of sources
– Slow to put all the pieces back together
Chromosome walking
Figure 20-13
A physical map puts clones in order
Figure 13-7a
Strategy for ordered-clone sequencing
Figure 13-8
The logic of creating a sequence map of the genome
Figure 13-2
End reads from multiple inserts may be overlapped to produce a contig
Figure 13-4
To sequence a genome, plasmids with different, but known, insert sizes are required
• Small insert plasmid library: ~2 kb +/- 100 bp
• Medium insert plasmid library: ~10 kb +/- 500 bp
• Large insert library: ~50 kb +/- 1 kb
Note: Regardless of the insert size in the library, all clones are sequenced using “mated” or paired-end sequencing
Strategy for whole-genome shotgun sequencing assembly
What do you do if you encounter a “GAP”, a region that is missing from the contigs? (Note: Contig = Contiguous sequence)
Paired-end reads may be used to join two sequence contigs
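One way to picture the joining step: if the two reads from the same clone land in different contigs, those contigs must lie within roughly one insert length of each other. The sketch below counts such linking pairs. The input format, the function name and the `min_links` threshold are all assumptions made for illustration.

```python
from collections import Counter


def link_contigs(mate_pairs, min_links=2):
    """Find contig pairs supported by at least min_links mate pairs.

    mate_pairs: list of (contig_of_read_1, contig_of_read_2) tuples, one per
    clone, giving the contig in which each end read was placed.
    """
    links = Counter()
    for c1, c2 in mate_pairs:
        if c1 != c2:  # mates inside one contig say nothing about joins
            links[tuple(sorted((c1, c2)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_links}


pairs = [
    ("contig1", "contig1"),
    ("contig1", "contig2"),
    ("contig2", "contig1"),
    ("contig2", "contig3"),
]
print(link_contigs(pairs))  # {('contig1', 'contig2'): 2}
```

Requiring more than one supporting pair is a simple guard against chimeric clones; a fuller treatment would also use read orientation and the known insert size to estimate the gap length.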
The Bioinformatic Pipeline
• Many software packages; the most widely used free suite is Phred-Phrap-Consed
• Quality values are obtained and files generated
• Vector sequences are removed
• A repeat library is constructed and sequences are masked
• Reads are assembled, viewed and assessed
• Primers are designed to close gaps
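To make the quality step concrete: a Phred quality value Q corresponds to an estimated base-calling error probability P = 10^(-Q/10), so Q20 means a 1 in 100 chance of error. The sketch below converts Q to P and trims low-quality read ends. The trimming rule and the threshold of 20 are simplifying assumptions for illustration, not the algorithm used by Phred or Phrap.

```python
def error_probability(q):
    """Error probability implied by a Phred quality value."""
    return 10 ** (-q / 10)


def trim_low_quality(seq, quals, min_q=20):
    """Strip bases below min_q from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]


print(error_probability(30))
print(trim_low_quality("ACGTAC", [5, 10, 30, 40, 30, 8]))  # GTA
```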
Genomic Sequence
• What was the sequencing strategy?
• What is the genome size? Repeat content?
• What “fold” coverage exists? 1X? 10X?
• Has host and vector contamination been removed?
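"Fold" coverage is the total number of bases sequenced divided by the genome size. Under the idealized assumption that reads land uniformly at random (a Poisson model, which real libraries only approximate), the expected fraction of the genome left unsequenced at coverage c is e^(-c). The read counts below are invented for the example.

```python
import math


def fold_coverage(n_reads, read_len, genome_size):
    """Average number of times each base is sequenced."""
    return n_reads * read_len / genome_size


def expected_uncovered_fraction(coverage):
    """Expected fraction of bases with no read, assuming random placement."""
    return math.exp(-coverage)


c = fold_coverage(n_reads=500_000, read_len=600, genome_size=30_000_000)
print(c)  # 10.0
for cov in (1, 10):
    print(cov, expected_uncovered_fraction(cov))
```

At 1X roughly a third of the genome is expected to be missing, while at 10X the expected missing fraction falls below 0.01%, which is why the coverage question matters when judging a sequence.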
The Plasmodium falciparum Genome
• Approx. 30 million bp in size, distributed across 14 chromosomes
• Genome project is an internationally funded effort (NIH, Wellcome Trust, Burroughs Wellcome Foundation)
• Sequence is being generated at 3 different sites (Sanger Centre, Stanford, TIGR)
• Sequence is nearly complete in terms of total coverage but unfinished in terms of assembly
• Sequence is nearly 80% A/T in composition
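Base composition of the kind quoted above is straightforward to measure from any sequence. A minimal sketch follows; the function name is hypothetical and the example string is made up, not P. falciparum data.

```python
def at_fraction(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are A or T."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "AT" for b in bases) / len(bases)


print(at_fraction("ATATATGCAT"))  # 0.8
```

Ambiguous characters such as N are left out of the denominator so that gaps in a draft assembly do not bias the estimate.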
The Sequencing Strategy
• Separate chromosomes on a pulsed-field gel
• In some cases, make chromosome-specific BACs or YACs
• Shotgun sequence smaller plasmids
• Remove contaminants (vector, E. coli, yeast)
• Assemble “contigs”
P. falciparum Statistics (3D7)
Add consed picture
Contig Assembly
Chromosome
BACs YACs
Shotgun Clones (Plasmids)
Contiguated Clones
Contig Assembly Problems
Chromosome
BACs YACs
Physical gap: no cloned DNA exists (PCR, Library, Walking)
Contig Assembly Problems
Sequence gap: clone exists but no sequence read
Contig Assembly Problems
Repetitive DNA elements
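Repeats cause trouble because reads from different copies look identical to an overlap-based assembler. One simple way to spot them, sketched below, is to count k-mers that occur more than once. The function name and the choice of k are assumptions, and real repeat libraries are built with far more sophisticated tools.

```python
from collections import Counter


def repeated_kmers(seq, k):
    """Return every k-mer that occurs more than once, with its count."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


print(repeated_kmers("ATATATAT", 4))  # {'ATAT': 3, 'TATA': 2}
print(repeated_kmers("ACGTTGCA", 4))  # {}
```

Any read that falls entirely inside a repeated region cannot be placed uniquely, which is how repeats lead to collapsed or mis-joined contigs.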
The Nature of Unfinished Unannotated Sequence
• Fragmented
• May contain vector or library host DNA
• May have sequence gaps
• May be mis-assembled
• Genes and features are not identified
• Probably will NEVER be “finished”
Module 1 Introduction to next-gen sequencing
FRANCIS OUELLETTE
History of DNA Sequencing
1870  Miescher: Discovers DNA
1940  Avery: Proposes DNA as ‘Genetic Material’
1953  Watson & Crick: Double Helix Structure of DNA
1965  Holley: Sequences Yeast tRNA-Ala
1970  Wu: Sequences λ Cohesive End DNA
1977  Sanger: Dideoxy Chain Termination; Gilbert: Chemical Degradation
1980
1990
2002
Human microbiome, deep environmental sequencing, Bar-Seq
Other: Epigenome, rearrangements, ChIP-Seq
Differences between the various platforms:
• Nanotechnology used
• Resolution of the image analysis
• Chemistry and enzymology
• Signal to noise detection in the software
• Software/images/file size/pipeline
• Cost $$$
Next Generation DNA Sequencing Technologies
Adapted from Richard Wilson, School of Medicine, Washington University, “Sequencing the Cancer Genome” http://tinyurl.com/5f3alk
Human Genome: 6 GB == 6000 MB

                            3730                  454          Illumina
Req’d coverage              6                     12           30
bp/read                     600                   400          2 x 75
reads/run                   96                    500,000      100,000,000
bp/run                      57,600                0.5 GB       15 GB
# runs req’d                625,000               144          12
runs/day                    2                     1            0.1
Machine days/human genome   312,500 (856 years)   144          120
Cost/run                    $48                   $6,800       $9,300
Total cost                  $15,000,000           $979,200     $111,600
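The derived rows of the table follow from three formulas: runs required = genome size × coverage / bp per run, machine days = runs / runs per day, and total cost = runs × cost per run. The sketch below recomputes them, taking the listed bp/run, runs/day and cost/run at face value; the dictionary layout and function name are illustrative choices.

```python
GENOME_BP = 6_000_000_000  # "6 GB" from the slide, read as 6 billion bases

# Per-platform inputs copied from the table.
PLATFORMS = {
    "3730": {"coverage": 6, "bp_per_run": 57_600,
             "runs_per_day": 2, "cost_per_run": 48},
    "454": {"coverage": 12, "bp_per_run": 500_000_000,
            "runs_per_day": 1, "cost_per_run": 6_800},
    "Illumina": {"coverage": 30, "bp_per_run": 15_000_000_000,
                 "runs_per_day": 0.1, "cost_per_run": 9_300},
}


def project(platform):
    """Return (runs required, machine days, total cost) for one platform."""
    runs = GENOME_BP * platform["coverage"] / platform["bp_per_run"]
    days = runs / platform["runs_per_day"]
    cost = runs * platform["cost_per_run"]
    return runs, days, cost


for name, spec in PLATFORMS.items():
    runs, days, cost = project(spec)
    print(f"{name}: {runs:,.0f} runs, {days:,.0f} machine days, ${cost:,.0f}")
```

The run counts and machine days reproduce the table for all three platforms, as do the 454 and Illumina totals. For the 3730, 625,000 runs at $48 comes to $30,000,000 rather than the $15,000,000 listed, so the slide's figure presumably rests on an assumption not shown in the table.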