Dec 16, 2015

Cortez Wormwood
Transcript
Page 1: Example Mitochondrial cytochrome b – transport electrons From NCBI protein web page, search for cytb and Loxodonta africana (African elephant) Elephas.

Example: Mitochondrial cytochrome b (transports electrons)

From the NCBI protein web page, search for cytb and
Loxodonta africana (African elephant)
Elephas maximus (Indian elephant)
Mammuthus primigenius (Siberian woolly mammoth)

Which modern elephant is closer to the mammoth?

Use ClustalW to do the alignment

Chap. 3: Sequence Alignment

>0012AAX12542.1| cytochrome b [Elephas maximus]
MTHTRKSHPLFKIINKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTMTAFSSMSHICRDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLVEWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLILILLLLLLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALLLSILILGLMPLLHTSKHRSMMLRPLSQVLFWALTMDLLMLTWIGSQPVEYPYIAIGQMASILYFSIILAFLPIAGMIENYLIK

>gi|56578537|gb|AAW01445.1| cytochrome b [Loxodonta africana]
MTHIRKSYPLLKIINKSFIDLPTPSNISAWWNFGSLLGACLITQILTGLFLAMHYTPDTMTAFSSMSHICRDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLVEWIWGGFSVDKATLNRFFALHFILPFTMTALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLILILLLLLLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSILILGLMPLLHTSKYRSMMLRPLSQVLFWTLTMDLLMLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGMIENYLIK

>gi|2924604|dbj|BAA25008.1| cytochrome b [Mammuthus primigenius]
MTHIRKSHPLLKILNKSFIDLPTPSNISTWWNFGSLLGACLITQILTGLFLAMHYTPDTMTAFSSMSHICRDVNYGWIIRQLHSNGASIFFLCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTDLVEWIWGGFSVDKATLNRFFALHFILPFTMIALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLILILFLLLLALLSPDMLGDPDNYMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALLLSILILGIMPLLHTSKHRSMMLRPLSQVLFWTLATDLLMLTWIGSQPVEYPYIIIGQMASILYFSIILAFLPIAGMIENYLIK

Pairwise sequence alignment is the most fundamental operation of bioinformatics

It is used to decide if two proteins (or genes) are related structurally or functionally

It is used to identify domains or motifs that are shared among proteins

It is the basis of BLAST searching (next)

It is used in the analysis of genomes

Globin

Globins carry oxygen and were among the first proteins to be sequenced

Hemoglobin: in red blood cells

Myoglobin: in muscle cells of mammals

Leghemoglobin: in legumes (beans, etc.)

Globin

(a) Myoglobin (b) Tetrameric hemoglobin (c) Beta globin subunit (d) Myoglobin and beta globin

Similarity and Homology

Similarity
Observation or measurement of resemblance, independent of the source of the resemblance
Can be observed now and involves no historical hypothesis

Homology
Specifies that the sequences (and the organisms carrying them) descended from a common ancestor
Implies that similarities are shared ancestral characteristics
Usually cannot be asserted from direct historical evidence, so it is an inference from observations of similarity

Homology
Similarity attributed to descent from a common ancestor

Two types of homology:

Orthologs: homologous sequences in different species that arose from a common ancestral gene during speciation; they may or may not perform a similar function

Paralogs: homologous sequences within a single species that arose by gene duplication

Orthologs: members of a gene (protein) family in various organisms. This tree shows globin orthologs.

Paralogs: members of a gene (protein) family within a species. This tree shows human globin paralogs.

Orthologs and paralogs are often viewed in a single tree

Globin phylogeny by Dayhoff (1972)

Globin phylogeny by Dayhoff in evolutionary time (1972)

Direct Alignment

Given two sequences, score +1 if the letters in the same position match, -1 otherwise

Extremely simple, but what if there is a gap?
A gap arises when a base is inserted or deleted (an indel)
Possibly specific to biological data
Possibly a more significant mutation, so give it a more negative score as a penalty

R N D K P F S T A R N
R N Q K P K W W T A
+ + - + + - - - - - -
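A minimal Python sketch of the direct (ungapped) scoring rule above: +1 for a match, -1 otherwise, with a position that has no partner in the shorter sequence also counted as -1, as in the example. The function name `direct_score` is hypothetical.

```python
from itertools import zip_longest


def direct_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Score two sequences position by position, without gaps.

    Positions past the end of the shorter sequence are paired with None
    and therefore count as mismatches.
    """
    score = 0
    for x, y in zip_longest(a, b):
        score += match if x == y else mismatch
    return score


print(direct_score("RNDKPFSTARN", "RNQKPKWWTA"))  # 4 matches, 7 mismatches: -3
```

The overhanging N at the end is exactly the kind of position where allowing a gap would help, which motivates the gap penalty.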

Visual Alignment: Dotplot

One sequence on the x axis and the other on the y axis

Place a dot at a crosspoint if the residues are identical in both sequences
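The dotplot rule can be sketched in a few lines of Python. This is the simplest single-residue version (real dotplot tools usually add a sliding window and a match threshold to suppress noise); the names `dotplot` and `show` are hypothetical.

```python
def dotplot(x: str, y: str) -> list[list[bool]]:
    """Return a matrix whose entry [i][j] is True when y[i] == x[j].

    x runs along the horizontal axis (columns), y along the vertical axis (rows).
    """
    return [[xc == yc for xc in x] for yc in y]


def show(x: str, y: str) -> str:
    """Render the dotplot as text: '*' for a dot, '.' otherwise."""
    rows = ["  " + " ".join(x)]
    for yc, row in zip(y, dotplot(x, y)):
        rows.append(yc + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(rows)


print(show("GATTACA", "GATCA"))
```

Runs of dots along a diagonal correspond to similar stretches; a sequence plotted against itself always shows the main diagonal.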

Special Dotplot

Periodic Palindrome

Sequence Alignment

Direct alignment:
g c t g a a c g
c t a t a a t c

Alignments with gaps:
g c t g - a a - c - g
- - c t - a t a a t c

g c t g - a a - c g
- c t a t a a t c -

What are the criteria for a good alignment? Use a score to check for optimality; the score may not produce a unique optimal alignment

Calculation of an alignment score
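One way to calculate an alignment score, sketched in Python under assumed values (+1 match, -1 mismatch, -1 for any column containing a gap; the slide's actual scores are not shown). It scores the direct alignment of g c t g a a c g / c t a t a a t c and the last gapped alignment from the Sequence Alignment slide. The name `alignment_score` is hypothetical.

```python
def alignment_score(top: str, bottom: str,
                    match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Score two rows of an alignment column by column.

    '-' marks a gap; a column with a gap in either row receives the gap score.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


print(alignment_score("gctgaacg", "ctataatc"))      # direct alignment: -4
print(alignment_score("gctg-aa-cg", "-ctataatc-"))  # alignment with gaps: 0
```

Under these assumed scores the gapped alignment beats the direct one, even though it pays for four gap columns.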

General approach to pairwise alignment

Given two sequences, select an algorithm that generates a score
Allow gaps (insertions, deletions)
The score reflects the degree of similarity
Alignments can be global or local
Estimate the probability that the alignment occurred by chance

Pairwise alignment: protein sequences can be more informative than DNA

Protein is more informative (20 vs 4 characters); many amino acids share related biophysical properties

Codons are degenerate: changes in the third position often do not alter the amino acid that is specified

Protein sequences offer a longer “look-back” time

DNA sequences can be translated into protein and then used in pairwise alignments

DNA alignments are often appropriate:
to confirm the identity of a cDNA
to study noncoding regions of DNA
to study DNA polymorphisms (example: Neanderthal vs modern human DNA)

Genetic Code
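Translating DNA into protein before aligning can be sketched with the standard genetic code, built here from the usual TCAG-ordered table string. This is an illustrative sketch, not a full translation tool: it reads frame 1 only and assumes the standard code. Vertebrate mitochondrial genes such as cytb actually use a different table (e.g., TGA codes for tryptophan there rather than stop). The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons TTT, TTC, TTA, ..., GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA (or RNA) coding sequence in frame 1; trailing bases are ignored."""
    dna = dna.upper().replace("U", "T")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


print(translate("ATGGCC"))        # MA
print(translate("GCTGCCGCAGCG"))  # AAAA: third-position changes, same amino acid
```

The second call illustrates the degeneracy point above: all four GCx codons encode alanine.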

Scoring Matrix

Dotplot: incredibly useful in identifying biologically significant and interesting regions, but it does not provide a measure of statistical similarity

A numerical method is needed: scoring matrices

Provide not just position-by-position overlap, but also the nature and characteristics of the residues being aligned

Empirical weighting schemes

Scoring Matrix

Three biological factors in constructing a scoring matrix:

Conservation
Account for conservation between proteins, and provide a way to assess conservative substitutions
The score represents which residues can substitute for other residues without adversely affecting the function of the native protein (determined by charge, size, hydrophobicity, etc.)

Frequency
Reflect how often residues occur among proteins; rare residues are given more weight

Evolution
By design, implicitly represent evolutionary patterns

Review:
http://books.google.com/books?hl=en&lr=&id=9p3E2sS1aJUC&oi=fnd&pg=PA73&ots=eJ0lzjEg_b&sig=Fl2kBl5QBq7VIoy-eDgDqXhaZ14#v=onepage&q&f=false

Scoring Matrix: Log-Odds Score

$q_{ij}$: probability that residues i and j are seen aligned
$p_i$: probability of observing amino acid i among all proteins

$$s_{ij} = \log \frac{q_{ij}}{p_i\,p_j}$$

The score represents the ratio of the observed versus random frequency of substituting i by j
Positive score: the two residues are substituted for each other more often than by chance
Negative score: they are less likely to be substituted than by chance
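The log-odds formula as a small Python function, a sketch assuming base-2 logarithms (scores in bits); published matrices such as BLOSUM62 additionally scale (BLOSUM62 uses half-bit units) and round to integers. The frequencies in the usage lines are made-up illustrative numbers, and `log_odds` is a hypothetical name.

```python
import math


def log_odds(q_ij: float, p_i: float, p_j: float, base: float = 2.0) -> float:
    """Log-odds score s_ij = log(q_ij / (p_i * p_j)).

    q_ij: observed frequency with which residues i and j are aligned.
    p_i, p_j: background frequencies of residues i and j.
    """
    return math.log(q_ij / (p_i * p_j), base)


print(log_odds(0.25, 0.5, 0.25))   # seen twice as often as chance: +1 bit
print(log_odds(0.0625, 0.5, 0.5))  # seen a quarter as often as chance: -2 bits
```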

Scoring Matrix

Nucleotides

Amino acids: more complicated, a 20x20 matrix

Other Scores

Gap penalty: gap initiation and extension

ClustalW recommends:
For DNA sequences: the identity matrix (1 for a match, 0 for a mismatch), with a gap penalty of 10 for initiation and 0.1 for extension per residue
For amino acid sequences: the BLOSUM62 matrix for substitutions, with a gap penalty of 11 for initiation and 1 for extension per residue

Example gaps of length 1 and 3:

a a a g a a a
a a a - a a a

a a a g g g a a a
a a a - - - a a a
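A sketch of the initiation plus extension gap cost in Python, applied to the two example gaps with the amino acid settings above. It assumes the extension cost is charged for every gap position (penalty = initiation + extension x length, the same form as the affine penalty -(a + bx) used later); some tools instead charge extension only after the first position. `gap_penalty` is a hypothetical name.

```python
def gap_penalty(length: int, initiation: float, extension: float) -> float:
    """Penalty for a single gap of the given length: initiation + extension * length.

    Assumption: extension is charged for every gap position, including the first.
    """
    if length <= 0:
        return 0
    return initiation + extension * length


# Amino acid settings from the slide: initiation 11, extension 1 per residue.
print(gap_penalty(1, 11, 1))  # one-residue gap: 12
print(gap_penalty(3, 11, 1))  # three-residue gap: 14, versus 36 for three separate gaps
```

The point of the scheme is visible in the comment: one long gap is much cheaper than several short ones, which matches how indels often span several residues at once.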

Pairwise Alignment: Global and Local

Given a scoring scheme, find alignments maximizing the score

Global
Align the entire protein or DNA sequence
Needleman and Wunsch (dynamic programming)

Local
Focus on regions of greatest similarity
Smith and Waterman
In general, preferable to global alignment, because often only portions of proteins align

Global and Local in Dotplot

Dynamic Programming

Guaranteed to yield an optimal global alignment

Drawback: many alignments may give the same optimal score, and none of them may correspond to the biologically correct alignment
W. Fitch and T. Smith found 17 alignments of the alpha- and beta-chains of chicken hemoglobin with the same optimal score, only one of which is correct based on structures

Drawback: complexity O(nm) for sequences of length n and m

Dynamic Programming

Rock removal game
Two piles of rocks, each with 10 rocks
Players A and B alternately remove either one rock from a single pile or one rock from each pile
The player who removes the last rock(s) wins the game

Use a reduction strategy, starting with smaller problems

Consider the 2+2 problem:
A removes one rock from each pile, then B removes one rock from each pile: B wins
A removes one rock from one pile, then B takes one rock from the same pile: B wins

3+3 problem?

Rock Removal with 10+10

Legend: ↑ A takes one from pile X; ← A takes one from pile Y; A takes one from each pile; * A will lose
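The reduction strategy can be run bottom-up in Python: a position is a win for the player about to move if some legal move leaves the opponent in a losing position, and (0, 0) is a loss because the previous player took the last rock. This is a sketch of that table, not the slide's own figure; `rock_table` is a hypothetical name.

```python
def rock_table(n: int, m: int) -> list[list[bool]]:
    """win[i][j] is True if the player about to move can force a win with
    piles of i and j rocks (moves: one from either pile, or one from each)."""
    # win[0][0] stays False: no rocks left, so the player to move has already lost.
    win = [[False] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            # Winning if some legal move leaves the opponent in a losing position.
            win[i][j] = ((i > 0 and not win[i - 1][j])
                         or (j > 0 and not win[i][j - 1])
                         or (i > 0 and j > 0 and not win[i - 1][j - 1]))
    return win


table = rock_table(10, 10)
for row in table:
    print(" ".join("W" if cell else "L" for cell in row))
print("A wins 10+10:", table[10][10])
```

Every cell depends only on cells above it and to its left, the same dependency pattern as the grid problems that follow. The printed table shows losing positions exactly where both piles are even, so A loses 2+2 and 10+10 but wins 3+3.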

Manhattan Tourist Problem

Visit as many tourist sites as possible in a Manhattan grid
Move only east or south
Start at the upper left corner
End at #15, the lower right corner

Problem Statement

Given a weighted grid G with two distinguished vertices (nodes), a source and a sink

Find the longest (maximum-weight) path from the source to the sink

Weight: number of attraction sites on an edge (link)

Each vertex (node) can be identified by (i, j); source at (0, 0), sink at (n, m)

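Example grid (4 x 4 vertices, so n = m = 3), edge weights row by row:
East edges in row 0: 3 2 4
South edges from row 0 to row 1: 1 0 2 4
East edges in row 1: 3 2 4
South edges from row 1 to row 2: 4 6 5 2
East edges in row 2: 0 7 3
South edges from row 2 to row 3: 4 4 5 2
East edges in row 3: 3 3 0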

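Solution

Define $s_{i,j}$: the length (total weight) of the longest path from the source to vertex (i, j), for $0 \le i \le n$, $0 \le j \le m$

Solve smaller problems first

Solving for $s_{0,j}$ and $s_{i,0}$ is easy: there is only one path to each of these vertices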

Filled in so far: first row $s_{0,0}, \dots, s_{0,3}$ = 0, 3, 5, 9; first column $s_{0,0}, \dots, s_{3,0}$ = 0, 1, 5, 9

Solution (2)

Iteratively solve for neighboring vertices: $s_{i,1}$, then $s_{i,2}$, etc.

$$s_{i,j} = \max\begin{cases} s_{i-1,j} + \text{weight of edge } (i-1,j) \to (i,j) \\ s_{i,j-1} + \text{weight of edge } (i,j-1) \to (i,j) \end{cases}$$

Filled in so far: first row 0, 3, 5, 9; first column 0, 1, 5, 9; next column $s_{1,1} = 4$, $s_{2,1} = 10$, $s_{3,1} = 14$

e.g., $s_{1,1} = \max(s_{0,1} + 0,\ s_{1,0} + 3) = \max(3, 4) = 4$

Algorithm

Given Weast(i,j) and Wsouth(i,j), the weights of the east-bound edge (i,j-1) → (i,j) and the south-bound edge (i-1,j) → (i,j):

s[0,0] = 0
for i = 1 to n: s[i,0] = s[i-1,0] + Wsouth(i,0)
for j = 1 to m: s[0,j] = s[0,j-1] + Weast(0,j)
for i = 1 to n
    for j = 1 to m
        s[i,j] = max(s[i-1,j] + Wsouth(i,j),
                     s[i,j-1] + Weast(i,j))
return s[n,m]
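The pseudocode translates almost line for line into Python. This sketch uses the example grid's weights as read from the flattened slide (that reading is an assumption), stores south-bound weights as `south[i-1][j]` and east-bound weights as `east[i][j-1]`, and returns the whole table so intermediate values can be checked; `manhattan_tourist`, `SOUTH` and `EAST` are hypothetical names.

```python
def manhattan_tourist(south: list[list[int]], east: list[list[int]]) -> list[list[int]]:
    """Longest-path table s for a grid with n + 1 rows and m + 1 columns of vertices.

    south[i - 1][j]: weight of edge (i-1, j) -> (i, j)
    east[i][j - 1]:  weight of edge (i, j-1) -> (i, j)
    """
    n, m = len(south), len(east[0])
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):                     # first column: only south moves
        s[i][0] = s[i - 1][0] + south[i - 1][0]
    for j in range(1, m + 1):                     # first row: only east moves
        s[0][j] = s[0][j - 1] + east[0][j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j] + south[i - 1][j],
                          s[i][j - 1] + east[i][j - 1])
    return s


SOUTH = [[1, 0, 2, 4], [4, 6, 5, 2], [4, 4, 5, 2]]
EAST = [[3, 2, 4], [3, 2, 4], [0, 7, 3], [3, 3, 0]]
table = manhattan_tourist(SOUTH, EAST)
print(table[3][3])  # best total weight from source (0,0) to sink (3,3)
```

Only the score is returned; recovering the path itself would need a backtrack step like the one used for sequence alignment.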

General Graph Problem

In a general graph, vertices are not regular: unlike the grid, a vertex need not have two incoming edges (indegree) and two outgoing edges (outdegree)

Directed Acyclic Graph

DAG: Directed Acyclic Graph, G = (V, E)

Longest Path Problem

$$s_v = \max_{u \in \mathrm{Predecessor}(v)} \left( s_u + \text{weight of edge } (u, v) \right)$$

The predecessor relationship has to be established ahead of time, i.e., vertices are processed in topological order

(Figure: vertex v with predecessors u1, u2 and u3, each joined to v by a weighted edge)
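A sketch of the DAG recurrence in Python, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to establish the predecessor order ahead of time. The graph below is hypothetical: a source `a` feeding u1, u2 and u3, which all feed v; its weights are made up, not read from the figure. `dag_longest_path` is likewise a hypothetical name.

```python
from graphlib import TopologicalSorter


def dag_longest_path(preds: dict[str, dict[str, float]], source: str) -> dict[str, float]:
    """Longest-path scores s_v from `source` in a DAG.

    preds[v] maps each predecessor u of v to the weight of edge (u, v).
    Vertices not reachable from `source` are left out of the result.
    """
    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    order = TopologicalSorter({v: set(us) for v, us in preds.items()}).static_order()
    s: dict[str, float] = {}
    for v in order:
        if v == source:
            s[v] = 0
            continue
        candidates = [s[u] + w for u, w in preds.get(v, {}).items() if u in s]
        if candidates:
            s[v] = max(candidates)
    return s


preds = {
    "u1": {"a": 1},
    "u2": {"a": 2},
    "u3": {"a": 0},
    "v": {"u1": 5, "u2": 7, "u3": 3},
}
print(dag_longest_path(preds, "a")["v"])  # max(1 + 5, 2 + 7, 0 + 3) = 9
```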

Graph Problem Applied to Alignment

Measures of similarity:

Hamming distance: for equal-length sequences, the number of positions at which they differ

Levenshtein (edit) distance, 1966: for sequences of unequal length, the minimum number of edit operations (insertion, deletion, or substitution of a single character in either sequence) required to change one string into the other

e.g., Levenshtein distance = 3:

a g - t c c
c g c t c a
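Both distances in Python, as a sketch: Hamming is a direct count, while Levenshtein uses the grid dynamic programming developed on the next slides, keeping only two rows of the table at a time. Function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(v: str, w: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning v into w."""
    prev = list(range(len(w) + 1))          # row 0: j insertions
    for i in range(1, len(v) + 1):
        cur = [i] + [0] * len(w)            # column 0: i deletions
        for j in range(1, len(w) + 1):
            cur[j] = min(prev[j] + 1,                           # delete v[i-1]
                         cur[j - 1] + 1,                        # insert w[j-1]
                         prev[j - 1] + (v[i - 1] != w[j - 1]))  # match or substitute
        prev = cur
    return prev[-1]


print(levenshtein("agtcc", "cgctca"))  # the slide's example: 3
```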

Edit Distance and Alignment

Two strings, v and w; gaps are allowed in either string, except that two gaps are not allowed in the same column

v: A T - G T T A T -
w: A T C G T - A - G

Each column is represented by the position reached in each original (ungapped) string: v: (1 2 2 3 4 5 6 7 7), w: (1 2 3 4 5 5 6 6 7)

For both strings together: (0,0) (1,1) (2,2) (2,3) (3,4) (4,5) (5,5) (6,6) (7,6) (7,7)

This represents a path in a grid
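The correspondence between an alignment and a grid path can be sketched in Python: each column advances i if the first row has a character, j if the second does, or both. `alignment_to_path` is a hypothetical name.

```python
def alignment_to_path(top: str, bottom: str) -> list[tuple[int, int]]:
    """Convert two aligned rows ('-' = gap) into the grid vertices (i, j) they visit.

    i counts characters of the first string consumed so far, j those of the second.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    i = j = 0
    path = [(0, 0)]
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        i += x != "-"   # move south when v contributes a character
        j += y != "-"   # move east when w contributes a character
        path.append((i, j))
    return path


print(alignment_to_path("AT-GTTAT-", "ATCGT-A-G"))
```

A match or mismatch column is a diagonal step, and a gap in either row is a horizontal or vertical step, which is why the double-gap column is forbidden: it would not move at all.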

Edit Distance

Vertex (i, j) corresponds to having aligned the first i characters of v with the first j characters of w

G = (V, E); Longest Path Problem

$$s_v = \max_{u \in \mathrm{Predecessor}(v)} \left( s_u + \text{weight of edge } (u, v) \right)$$

The predecessor relationship has to be established ahead of time

Global Alignment

A string is a sequence of characters drawn from an alphabet A of size k

Scoring matrix δ of size (k+1) x (k+1): the extra row and column score a character against a gap (-)

Problem statement: given two strings, v and w, and a scoring matrix δ, find the longest (maximum-score) path

Dynamic programming kernel (recurrence relation):

$$s_{i,j} = \max\begin{cases} s_{i-1,j} + \delta(v_i, -) \\ s_{i,j-1} + \delta(-, w_j) \\ s_{i-1,j-1} + \delta(v_i, w_j) \end{cases}$$

Global Alignment

Example of scoring matrix Match: +1; mismatch: -μ; indels: -σ

Indels often span several residues at once, so a gap penalty proportional to indel length is considered too severe. Affine gap penalties soften the penalty rate: an indel of length x is penalized -(a + bx), with a one-time opening cost a and a per-residue extension cost b

$$s_{i,j} = \max\begin{cases} s_{i-1,j} - \sigma \\ s_{i,j-1} - \sigma \\ s_{i-1,j-1} + 1 & \text{if } v_i = w_j \\ s_{i-1,j-1} - \mu & \text{otherwise} \end{cases}$$
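A compact Needleman-Wunsch sketch in Python for the recurrence above (match +1, mismatch -μ, indel -σ, linear gaps), including a traceback that recovers one optimal alignment. When several predecessors tie, this sketch prefers the diagonal, then the vertical move; other tie-breaking rules can return a different alignment with the same score. `needleman_wunsch` is a hypothetical name.

```python
def needleman_wunsch(v: str, w: str, match: int = 1, mu: int = 1, sigma: int = 1):
    """Global alignment with match +1, mismatch -mu, indel -sigma.

    Returns (score, aligned v, aligned w).
    """
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = -sigma * i                 # prefix of v aligned against gaps only
    for j in range(1, m + 1):
        s[0][j] = -sigma * j                 # prefix of w aligned against gaps only

    def diag(i: int, j: int) -> int:
        return match if v[i - 1] == w[j - 1] else -mu

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j] - sigma,           # v[i-1] against a gap
                          s[i][j - 1] - sigma,           # w[j-1] against a gap
                          s[i - 1][j - 1] + diag(i, j))  # match or mismatch

    # Trace back from (n, m) to (0, 0), re-deriving which case produced each cell.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + diag(i, j):
            top.append(v[i - 1])
            bottom.append(w[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] - sigma:
            top.append(v[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(w[j - 1])
            j -= 1
    return s[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```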

Needleman-Wunsch, 1970

Setting up a matrix

Page 46: Example Mitochondrial cytochrome b – transport electrons From NCBI protein web page, search for cytb and Loxodonta africana (African elephant) Elephas.

Scoring the matrix

Identifying the optimal alignment

Local Alignment

Global sequence alignment is useful for aligning sequences from the same protein family, for example

In biological applications, often only substrings of two sequences are highly conserved

Temple Smith and Michael Waterman, 1981

In a global alignment, biologically irrelevant diagonal matches may yield a higher score than a short, highly conserved region

Local Alignment Problem

Given two strings v and w and a scoring matrix δ

Find substrings of v and w whose global alignment score is maximal among all pairs of substrings of v and w

Seemingly harder: the global alignment finds the longest path from (0,0) to (n,m), whereas the local alignment must find the longest path among all paths between two arbitrary vertices, (i,j) to (i', j')

Add edges of weight 0 from (0,0) to every other vertex (vertex (0,0) becomes a predecessor of every vertex)

Local Alignment Solution

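Recurrence kernel becomes:

$$s_{i,j} = \max\begin{cases} 0 \\ s_{i-1,j} + \delta(v_i, -) \\ s_{i,j-1} + \delta(-, w_j) \\ s_{i-1,j-1} + \delta(v_i, w_j) \end{cases}$$

The 0 term corresponds to the weight-0 edge from (0,0): a local alignment may start at any vertex

Select the largest $s_{i,j}$ over the whole matrix (not just $s_{n,m}$)

Other non-maximal local alignments may have biological significance: select the k best nonoverlapping local alignments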
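A Smith-Waterman sketch in Python for the local recurrence, under the same assumed scores as the global example (match +1, mismatch -μ, indel -σ). It keeps the single best cell and traces back until a zero is reached; selecting the k best nonoverlapping alignments is not attempted here. `smith_waterman` is a hypothetical name.

```python
def smith_waterman(v: str, w: str, match: int = 1, mu: int = 1, sigma: int = 1):
    """Best local alignment score and the substrings of v and w that achieve it."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0

    def diag(i: int, j: int) -> int:
        return match if v[i - 1] == w[j - 1] else -mu

    best, best_ij = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(0,                             # free start: weight-0 edge from (0,0)
                          s[i - 1][j] - sigma,
                          s[i][j - 1] - sigma,
                          s[i - 1][j - 1] + diag(i, j))
            if s[i][j] > best:                       # largest s_ij anywhere, not s_nm
                best, best_ij = s[i][j], (i, j)

    # Trace back from the best cell until a cell with score 0 is reached.
    i, j = best_ij
    while i > 0 and j > 0 and s[i][j] > 0:
        if s[i][j] == s[i - 1][j - 1] + diag(i, j):
            i, j = i - 1, j - 1
        elif s[i][j] == s[i - 1][j] - sigma:
            i -= 1
        else:
            j -= 1
    end_i, end_j = best_ij
    return best, v[i:end_i], w[j:end_j]


print(smith_waterman("TTACGTAA", "GGACGTCC"))  # (4, 'ACGT', 'ACGT')
```

Compared with the global version, the only changes are the extra 0 in the maximum, taking the best cell anywhere in the matrix, and stopping the traceback at a zero.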
