CS273a Lecture 9/10, Aut 10, Batzoglou: Multiple Sequence Alignment
Dec 21, 2015
CS273a Lecture 9/10, Aut 10, BatzoglouCS273a Lecture 10, Fall 2010
Evolution at the DNA level
…ACGGTGCAGTTACCA…
…AC----CAGTCCACCA…
Mutation
SEQUENCE EDITS
REARRANGEMENTS
Deletion
Inversion
Translocation
Duplication
Orthology and Paralogy
HB Human
WB Worm
HA1 Human
HA2 Human
Yeast
WA Worm
Orthologs: Derived by speciation
Paralogs: Everything else
Orthology, Paralogy, Inparalogs, Outparalogs
Genome Evolution – Macro Events
• Inversions
• Deletions
• Duplications
Synteny maps
Comparison of human and mouse
Building synteny maps
Recommended local aligners
• BLASTZ
  Most accurate, especially for genes
  Chains local alignments
• WU-BLAST
  Good tradeoff of efficiency/sensitivity
  Best command-line options
• BLAT
  Fast, less sensitive
  Good for
  • comparing very similar sequences
  • finding rough homology map
Index-based local alignment
Dictionary:
All words of length k (~10)
Alignment initiated between words of alignment score ≥ T (typically T = k)
Alignment:
Ungapped extensions until score falls below statistical threshold
Output:
All local alignments with score > statistical threshold
[Figure: query scanned against the DB word index]
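The dictionary, extension, and output steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not BLAST itself: it seeds only on exact k-mer matches (rather than all words scoring at least T), uses a made-up +1/-1 match/mismatch score, and stands in for the statistical threshold with a fixed `min_score` cutoff and an X-drop stopping rule. All function and parameter names are hypothetical.

```python
from collections import defaultdict

def build_index(db, k):
    """Dictionary step: map every word of length k in db to its start positions."""
    index = defaultdict(list)
    for i in range(len(db) - k + 1):
        index[db[i:i + k]].append(i)
    return index

def extend(query, db, qi, di, k, match=1, mismatch=-1, xdrop=3):
    """Ungapped extension of a seed at (qi, di) in both directions.
    Each side stops once the running score falls xdrop below its best score."""
    def one_side(q, d, step):
        best = score = best_len = n = 0
        while 0 <= q < len(query) and 0 <= d < len(db):
            score += match if query[q] == db[d] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score >= xdrop:
                break
            q += step
            d += step
        return best, best_len

    right, rlen = one_side(qi + k, di + k, +1)
    left, llen = one_side(qi - 1, di - 1, -1)
    # (score, query start, db start, alignment length)
    return (k * match + left + right, qi - llen, di - llen, k + llen + rlen)

def local_alignments(query, db, k=4, min_score=6):
    """Output step: all distinct ungapped local alignments scoring above min_score."""
    index = build_index(db, k)
    hits = set()
    for qi in range(len(query) - k + 1):
        for di in index.get(query[qi:qi + k], ()):
            hit = extend(query, db, qi, di, k)
            if hit[0] > min_score:
                hits.add(hit)
    return sorted(hits, reverse=True)

print(local_alignments("TTACGGTGCAGTT", "GGACGGTGCAGCC"))
```

Several seeds on the same diagonal extend to the same segment, which is why the hits are collected in a set before being reported.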
Question: Using an idea from overlap detection, is there a better way to find all local alignments between two genomes?
Chaining local alignments
1. Find local alignments
2. Chain: O(N log N) L.I.S. (longest increasing subsequence)
3. Restricted DP
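Step 2 can be illustrated with a standard O(N log N) longest-increasing-subsequence routine. The sketch below assumes each local alignment has been reduced to a single anchor point (x, y), its start position in each genome, and that all anchors are on the same strand and unweighted; practical chainers also weight anchors by score and penalize gaps. The name `chain_anchors` is hypothetical.

```python
from bisect import bisect_left

def chain_anchors(anchors):
    """Return a longest chain of anchors that is strictly increasing in both coordinates."""
    # Sort by x; break ties by decreasing y so two anchors with equal x never chain together.
    pts = sorted(anchors, key=lambda p: (p[0], -p[1]))
    tails = []        # tails[i]: smallest final y over all chains of length i + 1
    tails_idx = []    # index into pts of the anchor that ends that chain
    prev = [-1] * len(pts)
    for i, (_, y) in enumerate(pts):
        pos = bisect_left(tails, y)          # binary search: O(log N) per anchor
        if pos == len(tails):
            tails.append(y)
            tails_idx.append(i)
        else:
            tails[pos] = y
            tails_idx[pos] = i
        prev[i] = tails_idx[pos - 1] if pos > 0 else -1
    # Walk the back-pointers from the end of the longest chain.
    chain = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        chain.append(pts[i])
        i = prev[i]
    return chain[::-1]

print(chain_anchors([(1, 1), (2, 5), (3, 2), (4, 3), (5, 7), (6, 4)]))
```

The gaps between consecutive anchors in the returned chain are the regions where step 3 would run restricted dynamic programming.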
Progressive Alignment
• When evolutionary tree is known:
  Align closest first, in the order of the tree
  In each step, align two sequences x, y, or profiles px, py, to generate a new alignment with associated profile presult
Weighted version:
  Tree edges have weights, proportional to the divergence in that edge
  New profile is a weighted average of two old profiles
[Figure: tree over sequences x, w, y, z]
Example
Profile: (A, C, G, T, -)
px = (0.8, 0.2, 0, 0, 0)
py = (0.6, 0, 0, 0, 0.4)
s(px, py) = 0.8*0.6*s(A, A) + 0.2*0.6*s(C, A) + 0.8*0.4*s(A, -) + 0.2*0.4*s(C, -)
Result: pxy = (0.7, 0.1, 0, 0, 0.2)
s(px, -) = 0.8*1.0*s(A, -) + 0.2*1.0*s(C, -)
Result: px- = (0.4, 0.1, 0, 0, 0.5)
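The example above can be checked numerically with a short sketch. The slide does not specify s(·,·), so the match/mismatch/gap values below are assumptions chosen only for illustration, and the helper names `profile_score` and `merge_profiles` are hypothetical. With equal weights, `merge_profiles` reproduces pxy and px- from the example; unequal weights give the weighted version.

```python
ALPHABET = "ACGT-"

def s(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Assumed scoring function; the values are illustrative, not from the slides."""
    if a == "-" and b == "-":
        return 0.0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def profile_score(px, py):
    """Expected score of aligning two profile columns: sum over a, b of px[a]*py[b]*s(a, b)."""
    return sum(pa * pb * s(a, b)
               for a, pa in zip(ALPHABET, px)
               for b, pb in zip(ALPHABET, py))

def merge_profiles(px, py, wx=0.5, wy=0.5):
    """New profile column as a weighted average of the two old ones."""
    return tuple(wx * a + wy * b for a, b in zip(px, py))

px = (0.8, 0.2, 0.0, 0.0, 0.0)
py = (0.6, 0.0, 0.0, 0.0, 0.4)
gap_column = (0.0, 0.0, 0.0, 0.0, 1.0)

print(profile_score(px, py), merge_profiles(px, py))
print(profile_score(px, gap_column), merge_profiles(px, gap_column))
```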
Threaded Blockset Aligner
Human–Cow
HMR – CD
Restricted Area
Profile Alignment
Reconstructing the Ancestral Mammalian Genome
Human: C
Baboon: C
Cat: C
Dog: G
[Figure: phylogenetic tree with inferred ancestral bases: C, C or G, G]
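One simple way to infer ancestral bases for a column like this is the bottom-up pass of Fitch parsimony, sketched below. The tree topology ((Human, Baboon), (Cat, Dog)) is an assumption, since the slide's tree did not survive extraction, and the function name `fitch` is hypothetical. Under that topology the sketch gives C for the primate ancestor and "C or G" for the Cat/Dog ancestor.

```python
def fitch(node, leaf_states):
    """Bottom-up pass of Fitch parsimony.
    node is a leaf name (str) or a (left, right) tuple of subtrees.
    Returns (set of candidate ancestral states, minimum number of substitutions)."""
    if isinstance(node, str):
        return {leaf_states[node]}, 0
    left_set, left_cost = fitch(node[0], leaf_states)
    right_set, right_cost = fitch(node[1], leaf_states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra substitution needed.
        return common, left_cost + right_cost
    # The children disagree: keep every candidate and charge one substitution.
    return left_set | right_set, left_cost + right_cost + 1

leaves = {"Human": "C", "Baboon": "C", "Cat": "C", "Dog": "G"}
primates = ("Human", "Baboon")       # assumed grouping
carnivores = ("Cat", "Dog")          # assumed grouping

print(fitch(primates, leaves))
print(fitch(carnivores, leaves))
print(fitch((primates, carnivores), leaves))
```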
Finding Conserved Elements (1)
• Binomial method
  25-bp window in the human genome
  Binomial distribution of k matches in N bases given the neutral probability of substitution
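One way to read the binomial method: if each aligned base matches independently with some neutral probability, the number of matches in an N-base window is binomially distributed, and a window looks conserved when seeing k or more matches would be unlikely under neutrality. The sketch below computes that tail probability. The neutral match probability of 0.7 is a made-up value for illustration, and `binomial_tail` is a hypothetical name.

```python
from math import comb

def binomial_tail(k, n, p_match):
    """P(X >= k) for X ~ Binomial(n, p_match): the chance of at least k matches
    in an n-base window when each base matches with neutral probability p_match."""
    return sum(comb(n, i) * p_match**i * (1 - p_match)**(n - i)
               for i in range(k, n + 1))

N = 25           # window size from the slide
P_NEUTRAL = 0.7  # assumed neutral per-base match probability (illustrative only)

for k in (20, 23, 25):
    print(k, binomial_tail(k, N, P_NEUTRAL))
```

Smaller tail probabilities mean the window is harder to explain by neutral evolution alone.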
Finding Conserved Elements (2)
• Parsimony Method
  Count minimum # of mutations explaining each column
  Assign a probability to this parsimony score given neutral model
  Multiply probabilities across 25-bp window of human genome
[Figure: example alignment column (A, C, A, A, G)]
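The three steps of the parsimony method can be sketched as below. Everything specific here is an assumption made for illustration: the five-leaf tree, the leaf names s1 to s5, and especially the `NEUTRAL_PROB` table, which stands in for whatever neutral model assigns a probability to each parsimony score. The minimum mutation count per column uses Fitch parsimony.

```python
from math import prod

def min_mutations(node, column, leaf_index):
    """Fitch parsimony for one alignment column.
    Returns (candidate state set, minimum number of mutations) for the subtree at node."""
    if isinstance(node, str):
        return {column[leaf_index[node]]}, 0
    left_set, left_cost = min_mutations(node[0], column, leaf_index)
    right_set, right_cost = min_mutations(node[1], column, leaf_index)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def window_probability(columns, tree, leaf_index, neutral_prob):
    """Multiply the neutral-model probability of each column's parsimony score."""
    return prod(neutral_prob[min_mutations(tree, col, leaf_index)[1]]
                for col in columns)

# Hypothetical tree and leaf order; a column string lists the bases of s1..s5.
TREE = ((("s1", "s2"), "s3"), ("s4", "s5"))
LEAF_INDEX = {"s1": 0, "s2": 1, "s3": 2, "s4": 3, "s5": 4}
# Made-up P(parsimony score | neutral model); real values come from the neutral model.
NEUTRAL_PROB = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

print(min_mutations(TREE, "ACAAG", LEAF_INDEX))
print(window_probability(["AAAAA", "ACAAG"], TREE, LEAF_INDEX, NEUTRAL_PROB))
```

For a real 25-bp window, summing log probabilities instead of multiplying avoids numerical underflow.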
Finding Conserved Elements (3)
GERP
Phylo HMMs
HMM
Phylogenetic Tree Model
Phylo HMM
How do the methods agree/disagree?
Statistical Power to Detect Constraint
L
N
C: cutoff # mutations
D: neutral mutation rate
[?]: constraint mutation rate relative to neutral