CS262 Lecture 11, Win07, Batzoglou
Some Terminology
insert: a fragment that was incorporated in a circular genome, and can be copied (cloned)
vector: the circular genome (host) that incorporated the fragment
BAC: Bacterial Artificial Chromosome, a type of insert–vector combination, typically of length 100-200 kb
read: a 500-900 bp long word that comes out of a sequencing machine
coverage: the average number of reads (or inserts) that cover a position in the target DNA piece
shotgun sequencing: the process of obtaining many reads from random locations in DNA, to detect overlaps and assemble
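The definition of coverage can be made concrete with a small calculation. The sketch below is illustrative only: the function names and the toy numbers are assumptions, not part of the lecture. It computes coverage both from the usual N * L / G estimate (number of reads times read length over target length) and by counting reads position by position.

```python
def expected_coverage(num_reads, read_length, target_length):
    """Average number of reads covering a position: N * L / G."""
    return num_reads * read_length / target_length


def per_position_depth(read_starts, read_length, target_length):
    """Count how many reads cover each position of the target."""
    depth = [0] * target_length
    for start in read_starts:
        # A read starting near the end is truncated at the target boundary.
        for pos in range(start, min(start + read_length, target_length)):
            depth[pos] += 1
    return depth


# Toy example (hypothetical numbers): 4 reads of length 5 on a 10-letter target.
starts = [0, 2, 4, 5]
depth = per_position_depth(starts, 5, 10)
average_depth = sum(depth) / len(depth)
print(depth, average_depth, expected_coverage(4, 5, 10))
```

In this toy case no read runs off the end of the target, so the counted average and the N * L / G estimate agree.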
Whole Genome Shotgun Sequencing
[Figure: the genome is cut many times at random into inserts: plasmids (2 – 10 Kbp) and cosmids (40 Kbp). Each insert yields a forward-reverse pair of reads, ~500 bp each, separated by a known distance.]
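To illustrate how a forward-reverse read pair comes from one insert, here is a minimal simulation. It is a sketch under stated assumptions: the function names are hypothetical, and the sizes are shrunk to toy values (a 200-letter genome, 60-letter insert, 10-letter reads) instead of the Kbp-scale inserts and ~500 bp reads described on the slide.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def sample_mate_pair(genome, insert_length, read_length, rng):
    """Cut one insert at a random location and read both of its ends."""
    start = rng.randrange(len(genome) - insert_length + 1)
    insert = genome[start:start + insert_length]
    forward = insert[:read_length]
    # The second read comes from the other end, on the opposite strand.
    reverse = reverse_complement(insert[-read_length:])
    return start, forward, reverse


rng = random.Random(0)  # fixed seed so the toy run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(200))
start, forward, reverse = sample_mate_pair(
    genome, insert_length=60, read_length=10, rng=rng
)
print(start, forward, reverse)
```

Because the insert length is known, the two reads of a pair are known to lie roughly that far apart in the genome, which is the information used later to order and orient contigs.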
Fragment Assembly (in whole-genome shotgun sequencing)
Fragment Assembly
Given N reads, where N ~ 30 million, we need to use a linear-time algorithm.
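Comparing every read against every other read is quadratic in N, which is not feasible for 30 million reads. One common way around this, shown here as a sketch and not necessarily the method used in the lecture, is to index the reads by their k-mers and only consider pairs that share one. The function name and the choice of k are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def candidate_overlap_pairs(reads, k):
    """Return pairs of read indices that share at least one k-mer."""
    # One pass over all reads: time linear in the total read length.
    index = defaultdict(set)
    for read_id, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            index[read[i:i + k]].add(read_id)

    # Only reads that fall in the same k-mer bucket are paired up.
    pairs = set()
    for read_ids in index.values():
        for a, b in combinations(sorted(read_ids), 2):
            pairs.add((a, b))
    return pairs


reads = ["ACGATTACA", "TTACAATAG", "GGGGCCCCG"]
pairs = candidate_overlap_pairs(reads, k=4)
print(pairs)
```

Building the index is linear in the input size. The pairing step is only cheap when each k-mer occurs in few reads, so highly repeated k-mers would need to be filtered out in practice.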
Steps to Assemble a Genome
1. Find overlapping reads
2. Merge some “good” pairs of reads into longer contigs
3. Link contigs to form supercontigs
4. Derive consensus sequence ..ACGATTACAATAGGTT..
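Steps 1 and 2 can be sketched on two error-free reads. This is a simplified illustration, assuming exact suffix-prefix matches and a hypothetical minimum overlap of 3 letters; real reads contain sequencing errors, so real assemblers use alignment instead of exact comparison.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    # Try the longest possible overlap first; min_overlap is assumed >= 1.
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Merge two reads into one contig if they overlap, else return None."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep `left` whole and append the part of `right` past the overlap.
    return left + right[size:]


contig = merge_reads("ACGATTACA", "TTACAATAGGTT")
print(contig)
```

The two toy reads overlap on `TTACA`, and merging them gives the example sequence shown in step 4.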
Some Terminology
read: a 500-900 bp long word that comes out of the sequencer
mate pair: a pair of reads from the two ends of the same insert fragment
contig: a contiguous sequence formed by several overlapping reads with no gaps
supercontig (scaffold): an ordered and oriented set of contigs, usually by mate pairs
consensus sequence: the sequence derived from the multiple alignment of reads
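A consensus sequence can be illustrated with a simple column-by-column majority vote. This is a minimal sketch, assuming the reads are already aligned and padded with `-` where a read does not cover a column; the function name and the toy alignment are hypothetical, and ties are broken by whichever letter is seen first.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Majority-vote consensus over the columns of an alignment."""
    width = max(len(row) for row in aligned_reads)
    result = []
    for col in range(width):
        # Collect the letters of all reads that cover this column.
        letters = [row[col] for row in aligned_reads
                   if col < len(row) and row[col] != gap]
        if letters:
            result.append(Counter(letters).most_common(1)[0][0])
    return "".join(result)


# The third read carries one sequencing error (G instead of C in column 7).
alignment = [
    "ACGATTACA-------",
    "---ATTACAATAG---",
    "-----TAGAATAGGTT",
]
print(consensus(alignment))
```

The erroneous `G` is outvoted by the two reads that agree on `C`, which is why higher coverage gives a more reliable consensus.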
• Euler: indexing, Euler graph, layout by picking paths, consensus
Quality of assemblies—mouse
Terminology: N50 contig length
If we sort contigs from largest to smallest, and start covering the genome in that order, N50 is the length of the contig that just covers the 50th percentile.
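The N50 definition translates directly into a short computation. The sketch below is an illustration with made-up contig lengths; the `genome_size` parameter is an assumption added here, since N50 is often reported relative to the total assembled length when the true genome size is unknown.

```python
def n50(contig_lengths, genome_size=None):
    """Length of the contig at which sorted contigs first cover half the target."""
    if genome_size is None:
        # Fall back to the total assembled length (a common convention).
        genome_size = sum(contig_lengths)
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0  # the contigs never reach half of the genome


lengths = [80, 70, 50, 40, 30, 20]  # hypothetical contig lengths
print(n50(lengths), n50(lengths, genome_size=400))
```

With these lengths the total is 290, so the two largest contigs (80 + 70 = 150) already cover half and N50 is 70. Against an assumed genome size of 400, a third contig is needed and N50 drops to 50.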