Bioinformatics Methods Course Multiple Sequence Alignment Burkhard Morgenstern University of Göttingen Institute of Microbiology and Genetics Department of Bioinformatics Göttingen, October/November 2006

Tools for multiple sequence alignment

Jan 30, 2016

Page 1: Tools for multiple sequence alignment

Bioinformatics Methods Course

Multiple Sequence Alignment

Burkhard Morgenstern

University of Göttingen

Institute of Microbiology and Genetics

Department of Bioinformatics

Göttingen, October/November 2006

Page 2: Tools for multiple sequence alignment

Tools for multiple sequence alignment

T Y I M R E A Q Y E

T C I V M R E A Y E

Page 3: Tools for multiple sequence alignment

Tools for multiple sequence alignment

T Y I - M R E A Q Y E

T C I V M R E A - Y E

Page 4: Tools for multiple sequence alignment

Tools for multiple sequence alignment

T Y I M R E A Q Y E

T C I V M R E A Y E

Y I M Q E V Q Q E

Y I A M R E Q Y E

Page 5: Tools for multiple sequence alignment

Tools for multiple sequence alignment

T Y I - M R E A Q Y E

T C I V M R E A - Y E

Y - I - M Q E V Q Q E

Y - I A M R E - Q Y E

Page 6: Tools for multiple sequence alignment

Tools for multiple sequence alignment

T Y I - M R E A Q Y E

T C I V M R E A - Y E

- Y I - M Q E V Q Q E

Y - I A M R E - Q Y E

Astronomical number of possible alignments!

Page 7: Tools for multiple sequence alignment

Tools for multiple sequence alignment

T Y I - M R E A Q Y E

T C I V - M R E A Y E

- Y I - M Q E V Q Q E

Y - I A M R E - Q Y E

Astronomical number of possible alignments!
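To give a feel for how fast the number of alignments grows, here is a minimal sketch for the pairwise case only. It assumes one particular counting convention: every column is either a residue pair, a residue over a gap, or a gap over a residue, and two alignments that differ only in the order of adjacent gap columns count as different. The function name `count_alignments` is made up for this illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Count gapped alignments of two sequences with lengths m and n.

    The last column is either (residue, residue), (residue, gap) or
    (gap, residue), which gives a three-term recurrence.
    """
    if m == 0 or n == 0:
        # Only one option left: align the remaining residues against gaps.
        return 1
    return (count_alignments(m - 1, n - 1)   # last column pairs two residues
            + count_alignments(m - 1, n)     # last column: residue over gap
            + count_alignments(m, n - 1))    # last column: gap over residue


# The count explodes even for short sequences.
for length in (2, 5, 10, 20):
    print(length, count_alignments(length, length))
```

With more than two sequences the number of possible alignments grows even faster, which is why exhaustive enumeration is not an option.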

Page 8: Tools for multiple sequence alignment

Tools for multiple sequence alignment

T Y I - M R E A Q Y E

T C I V M R E A - Y E

- Y I - M Q E V Q Q E

Y - I A M R E - Q Y E

Which one is the best?

Page 9: Tools for multiple sequence alignment

Tools for multiple sequence alignment

Questions in development of alignment programs:

(1) What is a good alignment?

→ objective function ("score")

(2) How to find a good alignment?

→ optimization algorithm

The first question is far more important!

Page 10: Tools for multiple sequence alignment

Tools for multiple sequence alignment

Before defining an objective function (scoring scheme):

What is a biologically good alignment?

Page 11: Tools for multiple sequence alignment

Tools for multiple sequence alignment

Criteria for alignment quality:

1. 3D-Structure: align residues at corresponding positions in 3D structure of protein!

2. Evolution: align residues with common ancestors!

Page 12: Tools for multiple sequence alignment

Tools for multiple sequence alignment

T Y I - M R E A Q Y E

T C I V - M R E A Y E

- Y I - M Q E V Q Q E

- Y I A M R E - Q Y E

An alignment is a hypothesis about sequence evolution.

Search for the most plausible hypothesis!

Page 13: Tools for multiple sequence alignment

Tools for multiple sequence alignment

For amino acids a and b, compute

the probability $p_{a,b}$ of the substitution a → b (or b → a) and the frequencies $q_a$ and $q_b$ of a and b.

Define

$s(a,b) = \log \dfrac{p_{a,b}}{q_a \, q_b}$
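A minimal sketch of this log-odds definition follows. The probabilities in the example calls are invented purely for illustration; real substitution matrices estimate them from trusted alignments.

```python
import math


def log_odds_score(p_ab: float, q_a: float, q_b: float) -> float:
    """s(a,b) = log(p_ab / (q_a * q_b)).

    p_ab: probability of seeing a aligned with b in related sequences
    q_a, q_b: background frequencies of a and b
    """
    return math.log(p_ab / (q_a * q_b))


# Invented numbers, for illustration only.
# Pair seen more often than expected by chance -> positive score.
print(log_odds_score(0.04, 0.1, 0.1))
# Pair seen less often than expected by chance -> negative score.
print(log_odds_score(0.005, 0.1, 0.1))
```

The sign of the score says whether a pair is observed more or less often than the background frequencies alone would predict.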

Page 14: Tools for multiple sequence alignment

Tools for multiple sequence alignment

Page 15: Tools for multiple sequence alignment
Page 16: Tools for multiple sequence alignment

Tools for multiple sequence alignment

Traditional objective functions:

Define the score of an alignment as

the sum of the individual similarity scores s(a,b), minus a gap penalty g for each gap in the alignment.

Needleman-Wunsch scoring system (1970) for pairwise alignment (= alignment of two sequences)

Page 17: Tools for multiple sequence alignment

Example:

T Y W I V

T - - L V

Score = s(T,T) + s(I,L) + s(V,V) - 2g
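The example score can be reproduced with a few lines of code. This is a sketch under assumptions: `toy_s` is a made-up similarity function (2 for identical residues, -1 otherwise) standing in for a real substitution matrix, and the penalty g is charged once per gap character, which matches the 2g in the example.

```python
def alignment_score(row1, row2, s, g):
    """Score a pairwise alignment: sum of s(a,b) minus g per gap character."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    total = 0.0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total -= g          # one penalty per gap character
        else:
            total += s(a, b)    # similarity of the aligned residue pair
    return total


def toy_s(a, b):
    """Hypothetical similarity function, not a real substitution matrix."""
    return 2.0 if a == b else -1.0


# s(T,T) + s(I,L) + s(V,V) - 2g = 2 - 1 + 2 - 2*1
print(alignment_score("TYWIV", "T--LV", toy_s, g=1.0))
```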

Page 18: Tools for multiple sequence alignment

T Y W I V

T - - L V

Idea: alignment with optimal (maximal) score probably biologically meaningful.

Dynamic programming algorithm finds optimal alignment for two sequences efficiently (Needleman and Wunsch, 1970).
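The dynamic programming idea can be sketched as follows. This is a simplified illustration, not a production aligner: it assumes a plain match/mismatch score instead of a substitution matrix and a linear penalty per gap character. All parameter names and default values are choices made for this sketch.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=1):
    """Global pairwise alignment; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    # score[i][j] = best score for aligning the prefixes x[:i] and y[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -i * gap
    for j in range(1, m + 1):
        score[0][j] = -j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # pair x[i-1], y[j-1]
                              score[i - 1][j] - gap,      # x[i-1] over a gap
                              score[i][j - 1] - gap)      # gap over y[j-1]

    # Trace back one optimal path from the bottom-right corner.
    row_x, row_y = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if x[i - 1] == y[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                row_x.append(x[i - 1])
                row_y.append(y[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] - gap:
            row_x.append(x[i - 1])
            row_y.append("-")
            i -= 1
        else:
            row_x.append("-")
            row_y.append(y[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_x)), "".join(reversed(row_y))


best, top, bottom = needleman_wunsch("TYIMREAQYE", "TCIVMREAYE")
print(best)
print(top)
print(bottom)
```

The table has (n+1)·(m+1) cells and each cell takes constant time, which is what makes the pairwise case efficient.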

Page 19: Tools for multiple sequence alignment

Tools for multiple sequence alignment

Traditional Objective functions can be generalized to multiple alignment (e.g. sum-of-pair score, tree alignment)

Needleman-Wunsch algorithm can also be generalized to find optimal multiple alignment, but:

Very time and memory consuming!

→ Heuristic algorithm needed, i.e. a fast but sub-optimal solution
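As an illustration of how the pairwise objective function generalizes, here is a sketch of a sum-of-pairs score. It rests on a few assumptions: columns where both rows have a gap are ignored, each residue-gap column costs g, and the similarity function `toy_s` is a made-up stand-in for a substitution matrix.

```python
from itertools import combinations


def sum_of_pairs(rows, s, g):
    """Sum the pairwise alignment scores induced by a multiple alignment."""
    total = 0
    for r1, r2 in combinations(rows, 2):
        for a, b in zip(r1, r2):
            if a == "-" and b == "-":
                continue            # gap facing gap: no contribution
            if a == "-" or b == "-":
                total -= g          # residue facing a gap
            else:
                total += s(a, b)    # residue pair
    return total


def toy_s(a, b):
    """Hypothetical similarity function: +1 for identity, -1 otherwise."""
    return 1 if a == b else -1


print(sum_of_pairs(["AC", "AC", "A-"], toy_s, g=1))
```

Scoring a given multiple alignment this way is cheap. The expensive part is searching for the alignment that maximizes the score, which is what motivates heuristics.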

Page 20: Tools for multiple sequence alignment

Tools for multiple sequence alignment

Most commonly used heuristic for multiple alignment:

Progressive alignment

(mid 1980s)

Page 21: Tools for multiple sequence alignment

"Progressive" alignment

WCEAQTKNGQGWVPSNYITPVN

WWRLNDKEGYVPRNLLGLYP

AVVIQDNSDIKVVPKAKIIRD

YAVESEAHPGSFQPVAALERIN

WLNYNETTGERGDFPGTYVEYIGRKKISP

Page 22: Tools for multiple sequence alignment

"Progressive" alignment

WCEAQTKNGQGWVPSNYITPVN

WWRLNDKEGYVPRNLLGLYP

AVVIQDNSDIKVVPKAKIIRD

YAVESEAHPGSFQPVAALERIN

WLNYNETTGERGDFPGTYVEYIGRKKISP

Guide tree

Page 23: Tools for multiple sequence alignment

"Progressive" alignment

WCEAQTKNGQGWVPSNYITPVN

WW--RLNDKEGYVPRNLLGLYP-

AVVIQDNSDIKVVP--KAKIIRD

YAVESEASFQPVAALERIN

WLNYNEERGDFPGTYVEYIGRKKISP

Profile alignment, “once a gap - always a gap”

Page 24: Tools for multiple sequence alignment

"Progressive" alignment

WCEAQTKNGQGWVPSNYITPVN

WW--RLNDKEGYVPRNLLGLYP-

AVVIQDNSDIKVVP--KAKIIRD

YAVESEASVQ--PVAALERIN------

WLN-YNEERGDFPGTYVEYIGRKKISP

Profile alignment, “once a gap - always a gap”

Page 25: Tools for multiple sequence alignment

"Progressive" alignment

WCEAQTKNGQGWVPSNYITPVN-

WW--RLNDKEGYVPRNLLGLYP-

AVVIQDNSDIKVVP--KAKIIRD

YAVESEASVQ--PVAALERIN------

WLN-YNEERGDFPGTYVEYIGRKKISP

Profile alignment, “once a gap - always a gap”

Page 26: Tools for multiple sequence alignment

"Progressive" alignment

WCEAQTKNGQGWVPSNYITPVN--------

WW--RLNDKEGYVPRNLLGLYP--------

AVVIQDNSDIKVVP--KAKIIRD-------

YAVESEA---SVQ--PVAALERIN------

WLN-YNE---ERGDFPGTYVEYIGRKKISP

Profile alignment, “once a gap - always a gap”
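The "once a gap - always a gap" rule can be illustrated with a small sketch of the merge step alone. It assumes the pairing of columns between two sub-alignments has already been decided (in a real program by a profile-profile dynamic programming step, which is not shown). The `path` encoding with the letters M, A and B is a hypothetical convention invented for this example.

```python
def merge_profiles(profile_a, profile_b, path):
    """Merge two sub-alignments without ever removing an existing gap.

    path is a string over {'M', 'A', 'B'}:
      'M' = next column of A is paired with the next column of B
      'A' = next column of A faces a new all-gap column in B
      'B' = next column of B faces a new all-gap column in A
    """
    width_a, width_b = len(profile_a[0]), len(profile_b[0])
    if (set(path) - set("MAB")
            or path.count("M") + path.count("A") != width_a
            or path.count("M") + path.count("B") != width_b):
        raise ValueError("path does not match the profile widths")
    merged_a = [""] * len(profile_a)
    merged_b = [""] * len(profile_b)
    i = j = 0
    for step in path:
        use_a, use_b = step in "MA", step in "MB"
        # Existing columns are copied unchanged, so earlier gaps survive;
        # new gaps are only ever added as whole columns.
        merged_a = [row + (src[i] if use_a else "-")
                    for row, src in zip(merged_a, profile_a)]
        merged_b = [row + (src[j] if use_b else "-")
                    for row, src in zip(merged_b, profile_b)]
        i += use_a
        j += use_b
    return merged_a + merged_b


for row in merge_profiles(["WW--R", "AVVIQ"], ["WLN"], "MMMAA"):
    print(row)
```

Because columns are only copied or padded, a gap placed early in the procedure can never be revised later, which is the main weakness of the progressive strategy.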

Page 27: Tools for multiple sequence alignment

CLUSTAL W

Most important software program:

CLUSTAL W:

J. Thompson, T. Gibson, D. Higgins (1994), CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment … Nuc. Acids. Res. 22, 4673 - 4680

(~20,000 citations in the literature)

Page 28: Tools for multiple sequence alignment

Tools for multiple sequence alignment

Problems with traditional approach:

Results depend on gap penalty

Heuristic guide tree determines alignment;

alignment used for phylogeny reconstruction

Algorithm produces global alignments.

Page 29: Tools for multiple sequence alignment

Tools for multiple sequence alignment

Problems with traditional approach:

But:

Many sequence families share only local similarity

E.g. sequences share one conserved motif

Page 30: Tools for multiple sequence alignment

Local sequence alignment

Find common motif in sequences; ignore the rest

EYENS

ERYENS

ERYAS

Page 31: Tools for multiple sequence alignment

Local sequence alignment

Find common motif in sequences; ignore the rest

E-YENS

ERYENS

ERYA-S

Page 32: Tools for multiple sequence alignment

Local sequence alignment

Find common motif in sequences; ignore the rest: local alignment

E-YENS

ERYENS

ERYA-S
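For two sequences, the "ignore the rest" idea is classically implemented by the Smith-Waterman algorithm. The slides do not name it, so it is brought in here only as an illustration. The sketch below returns only the best local score and assumes a simple match/mismatch scheme with a linear gap penalty.

```python
def smith_waterman_score(x, y, match=1, mismatch=-1, gap=1):
    """Best local alignment score between any substrings of x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere, so
            # dissimilar flanking regions are simply ignored.
            cur[j] = max(0,
                         prev[j - 1] + sub,
                         prev[j] - gap,
                         cur[j - 1] - gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman_score("EYENS", "ERYENS"))
print(smith_waterman_score("ERYAS", "ERYENS"))
```

The only change compared with global dynamic programming is the floor at zero and taking the maximum over the whole table instead of the bottom-right cell.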

Page 33: Tools for multiple sequence alignment

Gibbs Motif Sampler

Local multiple alignment without gaps:

C.E. Lawrence et al. (1993), Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment, Science 262, 208-214

Page 34: Tools for multiple sequence alignment

Traditional alignment approaches:

Either global or local methods!

Page 35: Tools for multiple sequence alignment

New question: sequence families with multiple local similarities

Neither local nor global methods applicable

Page 36: Tools for multiple sequence alignment

New question: sequence families with multiple local similarities

Alignment possible if order conserved

Page 37: Tools for multiple sequence alignment

The DIALIGN approach

Morgenstern, Dress, Werner (1996), PNAS 93, 12098-12103

Combination of global and local methods

Assemble multiple alignment from gap-free local pair-wise alignments ("fragments")

Page 38: Tools for multiple sequence alignment

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 39: Tools for multiple sequence alignment

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 40: Tools for multiple sequence alignment

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 41: Tools for multiple sequence alignment

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 42: Tools for multiple sequence alignment

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 43: Tools for multiple sequence alignment

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 44: Tools for multiple sequence alignment

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 45: Tools for multiple sequence alignment

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Page 46: Tools for multiple sequence alignment

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 47: Tools for multiple sequence alignment

The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Page 48: Tools for multiple sequence alignment

The DIALIGN approach

atc------taatagttaaactcccccgtgc-ttag

cagtgcgtgtattactaac----------gg-ttcaatcgcg

caaa--gagtatcacc----------cctgaattgaataa

Consistency!

Page 49: Tools for multiple sequence alignment

The DIALIGN approach

atc------TAATAGTTAaactccccCGTGC-TTag

cagtgcGTGTATTACTAAc----------GG-TTCAATcgcg

caaa--GAGTATCAcc----------CCTGaaTTGAATaa

Page 50: Tools for multiple sequence alignment

The DIALIGN approach

Multiple alignment:

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 51: Tools for multiple sequence alignment

The DIALIGN approach

Multiple alignment:

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

(1) Calculate all optimal pair-wise alignments

Page 52: Tools for multiple sequence alignment

The DIALIGN approach

Multiple alignment:

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

(1) Calculate all optimal pair-wise alignments

Page 53: Tools for multiple sequence alignment

The DIALIGN approach

Multiple alignment:

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

(1) Calculate all optimal pair-wise alignments

Page 54: Tools for multiple sequence alignment

The DIALIGN approach

Fragments from optimal pair-wise alignments might be inconsistent

Page 55: Tools for multiple sequence alignment

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 56: Tools for multiple sequence alignment

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 57: Tools for multiple sequence alignment

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 58: Tools for multiple sequence alignment

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Page 59: Tools for multiple sequence alignment

The DIALIGN approach

atc------taatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaa--gagtatcacccctgaattgaataa

Page 60: Tools for multiple sequence alignment

The DIALIGN approach

atctaatagttaaactcccccgtgcttag

cagtgcgtgtattactaacggttcaatcgcg

caaagagtatcacccctgaattgaataa

Page 61: Tools for multiple sequence alignment

The DIALIGN approach

Score of alignment:

Define weight score for fragments based on probability of random occurrence

Score of alignment = sum of weight scores of fragments

Goal: find consistent set of fragments with maximum total weight
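For two sequences, finding a consistent set of fragments with maximum total weight reduces to a chaining problem, sketched below. Several things here are assumptions for illustration: the `Fragment` record, the weights (taken as given rather than derived from random-occurrence probabilities), and the restriction to the pairwise case. Consistency across more than two sequences needs additional checks that this sketch does not cover.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    start1: int    # start position in sequence 1 (0-based)
    start2: int    # start position in sequence 2 (0-based)
    length: int    # fragments are gap-free, so one length suffices
    weight: float


def precedes(f, g):
    """True if f ends before g starts in both sequences."""
    return (f.start1 + f.length <= g.start1
            and f.start2 + f.length <= g.start2)


def best_chain(fragments):
    """Return (total weight, chain) of a maximum-weight consistent chain."""
    frags = sorted(fragments, key=lambda f: (f.start1, f.start2))
    if not frags:
        return 0.0, []
    best = [f.weight for f in frags]   # best chain weight ending at frag i
    back = [-1] * len(frags)           # predecessor in that chain
    for i, g in enumerate(frags):
        for j in range(i):
            if precedes(frags[j], g) and best[j] + g.weight > best[i]:
                best[i] = best[j] + g.weight
                back[i] = j
    i = max(range(len(frags)), key=best.__getitem__)
    total, chain = best[i], []
    while i != -1:
        chain.append(frags[i])
        i = back[i]
    return total, chain[::-1]


a = Fragment(0, 0, 3, 2.0)
b = Fragment(2, 5, 3, 3.0)   # overlaps a in sequence 1, crosses c
c = Fragment(4, 3, 2, 1.5)
print(best_chain([a, b, c]))
```

In the usage example the single heaviest fragment b is rejected, because the two lighter fragments a and c are mutually consistent and together weigh more.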

Page 62: Tools for multiple sequence alignment

The DIALIGN approach

Advantages of segment-based approach:

Program can produce global and local alignments!

Sequence families alignable that cannot be aligned with standard methods

Page 63: Tools for multiple sequence alignment

T-COFFEE

C. Notredame, D. Higgins, J. Heringa (2000), T-Coffee: A novel algorithm for multiple sequence alignment, J. Mol. Biol.

Page 64: Tools for multiple sequence alignment
Page 65: Tools for multiple sequence alignment
Page 66: Tools for multiple sequence alignment
Page 67: Tools for multiple sequence alignment

T-COFFEE

Less sensitive to spurious pairwise similarities

Can handle local homologies better than CLUSTAL

Page 68: Tools for multiple sequence alignment

T-COFFEE

Idea:

1. Build a library of pairwise alignments

2. Alignments of seq i, j and of seq j, k support the alignment of seq i, k.
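The support idea can be sketched as a library extension step. The details are assumptions made for this illustration: residues are written as `(sequence_id, position)` tuples, the library is a plain dictionary of weighted residue pairs, and the support contributed through a third sequence is taken to be the smaller of the two supporting weights.

```python
from collections import defaultdict


def extend_library(library):
    """Add indirect support through third sequences to each residue pair.

    library maps (x, y) -> weight, where x and y are (seq_id, pos) residues
    that were aligned in some pairwise alignment.
    """
    neighbours = defaultdict(dict)
    for (x, y), w in library.items():
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = {}
    for x in neighbours:
        for y in neighbours:
            if x >= y or x[0] == y[0]:
                continue   # visit each pair once; skip same-sequence pairs
            weight = neighbours[x].get(y, 0)          # direct evidence
            for z in set(neighbours[x]) & set(neighbours[y]):
                if z[0] not in (x[0], y[0]):
                    # x~z and z~y together support x~y
                    weight += min(neighbours[x][z], neighbours[y][z])
            if weight > 0:
                extended[(x, y)] = weight
    return extended


library = {
    (("i", 0), ("j", 0)): 5,   # from the alignment of seq i and seq j
    (("j", 0), ("k", 0)): 3,   # from the alignment of seq j and seq k
}
print(extend_library(library))
```

In the example, residue 0 of seq i and residue 0 of seq k were never aligned directly, yet they receive a positive weight because both are aligned to the same residue of seq j.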

Page 69: Tools for multiple sequence alignment

Evaluation of multi-alignment methods

Alignment evaluation by comparison to trusted benchmark alignments.

"True" alignment known from information about structure or evolution.
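One common way to compare a program's output with a trusted reference is to count how many aligned residue pairs of the reference are reproduced. The sketch below assumes that gaps are written as `-`, that both alignments list the same sequences in the same order, and that all residue pairs are scored rather than only core blocks. The function names are made up for this example.

```python
def aligned_pairs(rows):
    """Set of residue pairs placed in the same column of an alignment.

    A residue is identified as (row_index, position_in_ungapped_sequence).
    """
    pairs = set()
    seen = [0] * len(rows)            # residues consumed so far, per row
    for col in range(len(rows[0])):
        present = []
        for k, row in enumerate(rows):
            if row[col] != "-":
                present.append((k, seen[k]))
                seen[k] += 1
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                pairs.add((present[a], present[b]))
    return pairs


def pair_accuracy(test_rows, reference_rows):
    """Fraction of reference residue pairs also found in the test alignment."""
    reference = aligned_pairs(reference_rows)
    if not reference:
        return 0.0
    return len(reference & aligned_pairs(test_rows)) / len(reference)


reference = ["AC-G", "ACTG"]
print(pair_accuracy(["AC-G", "ACTG"], reference))
print(pair_accuracy(["ACG-", "ACTG"], reference))
```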

Page 70: Tools for multiple sequence alignment

Evaluation of multi-alignment methods

For protein alignment:

M. McClure et al. (1994):

4 protein families, known functional sites

J. Thompson et al. (1999):

Benchmark data base, 130 known 3D structures (BAliBASE)

T. Lassmann & E. Sonnhammer (2002): BAliBASE + simulated evolution (ROSE)

Page 71: Tools for multiple sequence alignment

Evaluation of multi-alignment methods

Page 72: Tools for multiple sequence alignment

Evaluation of multi-alignment methods

BAliBASE reference alignments

1aboA   1  .NLFVALYDfvasgdntlsitkGEKLRVLgynhn..............gE
1ycsB   1  kGVIYALWDyepqnddelpmkeGDCMTIIhrede............deiE
1pht    1  gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG
1ihvA   1  .NFRVYYRDsrd......pvwkGPAKLLWkg.................eG
1vie    1  .drvrkksga.........awqGQIVGWYctnlt.............peG

1aboA  36  WCEAQt..kngqGWVPSNYITPVN......
1ycsB  39  WWWARl..ndkeGYVPRNLLGLYP......
1pht   51  WLNGYnettgerGDFPGTYVEYIGrkkisp
1ihvA  27  AVVIQd..nsdiKVVPRRKAKIIRd.....
1vie   28  YAVESeahpgsvQIYPVAALERIN......

Key: alpha helix RED, beta strand GREEN, core blocks UNDERSCORE

Page 73: Tools for multiple sequence alignment
Page 74: Tools for multiple sequence alignment

Result: DIALIGN best method for distantly related sequences, T-Coffee best for globally related proteins

Page 75: Tools for multiple sequence alignment

Evaluation of multi-alignment methods

BAliBASE: 5 categories of benchmark sequences

(globally related, internal gaps, end gaps)

CLUSTAL W, T-COFFEE, MAFFT, PROBCONS perform well on globally related sequences, DIALIGN superior for local similarities

Page 76: Tools for multiple sequence alignment

Evaluation of multi-alignment methods

Conclusion: no single best multi-alignment program!

Advice: try different methods!

Page 77: Tools for multiple sequence alignment

Anchored sequence alignment

Idea: semi-automatic alignment

use expert knowledge to define constraints instead of fully automated alignment

Define parts of the sequences where biologically correct alignment is known as anchor points, align rest of the sequences automatically.

Page 78: Tools for multiple sequence alignment

Anchored sequence alignment

NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN

IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT

GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS

Page 79: Tools for multiple sequence alignment

Anchored sequence alignment

NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN

IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT

GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS

Anchor points in multiple alignment

Page 80: Tools for multiple sequence alignment

Anchored sequence alignment

NLFV ALYDFVASGDNTLSITKGEKLRVLGYNHN

IIHREDKGVIYALWDYEPQND DELPMKEGDCMT

GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS

Anchor points in multiple alignment

Page 81: Tools for multiple sequence alignment

Anchored sequence alignment

-------NLF V-ALYDFVAS GD-------- NTLSITKGEk lrvLGYNhn

iihredkGVI Y-ALWDYEPQ ND-------- DELPMKEGDC MT-------

-------GYQ YrALYDYKKE REedidlhlg DILTVNKGSL VA-LGFS--

Anchored multiple alignment

Page 82: Tools for multiple sequence alignment

Algorithmic questions

Goal:

Find optimal alignment (= consistent set of fragments) under constraints given by user-specified anchor points!

Page 83: Tools for multiple sequence alignment

Additional input file with anchor points:

1 3 215 231 5 4.5

2 3 34 78 23 1.23

1 4 317 402 8 8.5

Algorithmic questions

Page 84: Tools for multiple sequence alignment

Algorithmic questions

NLFVALYDFVASGDNTLSITKGEKLRVLGYNHN

IIHREDKGVIYALWDYEPQNDDELPMKEGDCMT

GYQYRALYDYKKEREEDIDLHLGDILTVNKGSLVALGFS

Page 85: Tools for multiple sequence alignment

Additional input file with anchor points:

1 3 215 231 5 4.5

2 3 34 78 23 1.23

1 4 317 402 8 8.5

Algorithmic questions

Page 86: Tools for multiple sequence alignment

Additional input file with anchor points:

1 3 215 231 5 4.5

2 3 34 78 23 1.23

1 4 317 402 8 8.5

Sequences

Algorithmic questions

Page 87: Tools for multiple sequence alignment

Additional input file with anchor points:

1 3 215 231 5 4.5

2 3 34 78 23 1.23

1 4 317 402 8 8.5

Sequences start positions

Algorithmic questions

Page 88: Tools for multiple sequence alignment

Additional input file with anchor points:

1 3 215 231 5 4.5

2 3 34 78 23 1.23

1 4 317 402 8 8.5

Sequences start positions length

Algorithmic questions

Page 89: Tools for multiple sequence alignment

Algorithmic questions

Additional input file with anchor points:

Sequences  Start positions  Length  Score
1  3       215  231         5       4.5
2  3       34   78          23      1.23
1  4       317  402         8       8.5
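Reading such an anchor file is straightforward. The sketch below assumes the six whitespace-separated columns described on the slide (two sequence numbers, two start positions, a length and a score). The `Anchor` record and its field names are invented for this example and are not part of any program's interface.

```python
from typing import NamedTuple


class Anchor(NamedTuple):
    seq1: int      # number of the first sequence
    seq2: int      # number of the second sequence
    start1: int    # start position of the anchor in the first sequence
    start2: int    # start position of the anchor in the second sequence
    length: int    # length of the anchored segment
    score: float   # user-assigned priority of the anchor


def parse_anchors(text):
    """Parse anchor lines, skipping blank lines."""
    anchors = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 6:
            raise ValueError(f"line {line_number}: expected 6 fields")
        seq1, seq2, start1, start2, length = map(int, fields[:5])
        anchors.append(
            Anchor(seq1, seq2, start1, start2, length, float(fields[5])))
    return anchors


example = """
1 3 215 231 5 4.5
2 3 34 78 23 1.23
1 4 317 402 8 8.5
"""
for anchor in parse_anchors(example):
    print(anchor)
```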