Introduction to Bioinformatics Burkhard Morgenstern Institute of Microbiology and Genetics Department of Bioinformatics Goldschmidtstr. 1 Göttingen, March 2004

Dec 19, 2015


Introduction to Bioinformatics

Burkhard Morgenstern

Institute of Microbiology and Genetics

Department of Bioinformatics

Goldschmidtstr. 1

Göttingen, March 2004

Bioinformatics in Göttingen:

Dep. of Bioinformatics (UKG), Edgar Wingender

Dep. of Bioinformatics (IMG), BM

Inst. Num. and Applied Mathematics, Stephan Waack

Dep. of Genetics (Hans Fritz, IMG), Rainer Merkl

Definition:

Bioinformatics = development and application of software tools for Molecular Biology

Bioinformatics:

Topics:

(a) Sequence Analysis (Gene finding …)

(b) Structure Analysis (RNA, Protein)

(c) Gene Expression Analysis

(d) Metabolic Pathways, Virtual Cell

Bioinformatics:

Areas of work:

(a) Application of software tools for data analysis in (Molecular) Biology

(b) Computing infrastructure, database development, support

(c) Development of algorithms and software tools

Information flow in the cell

Idea:

Sequence -> Structure -> Function

Lots of data available at the sequence level

Fewer data at the structure and function level

Topics of lecture:

Data bases SwissProt, GenBank

Pair-wise sequence comparison

Data base searching

Multiple sequence alignment

Gene prediction

Protein data bases

Sanger and Tuppy: protein-sequencing methods (1951)

Margaret Dayhoff: Atlas of Protein Sequence and Structure (1972); later: Protein Identification Resource (PIR) as international collaboration

(a) Organize proteins into families;

(b) Amino acid substitution frequencies

Amos Bairoch: SwissProt (1986)

Exponential growth of data bases

DNA data bases

Maxam and Gilbert; Sanger: DNA sequencing methods (1977)

GenBank DNA data base (1979), now run by NCBI.

Collaboration with EMBL (1982), DDBJ (1984)

Translated DNA sequences stored in protein data bases (PIR, trEMBL)

Most important tool for sequence analysis:

Sequence comparison

The dot plot

Y Q E W T Y I V A R E A Q Y E

C I V M R E Q Y

[Dot plot: Y Q E W T Y I V A R E A Q Y E on the horizontal axis, C I V M R E Q Y on the vertical axis; an X marks every pair of identical residues.]

  Y Q E W T Y Q E V R E Y Q E I
C . . . . . . . . . . . . . . .
I . . . . . . . . . . . . . . X
V . . . . . . . . X . . . . . .
M . . . . . . . . . . . . . . .
R . . . . . . . . . X . . . . .
Y X . . . . X . . . . . X . . .
Q . X . . . . X . . . . . X . .
E . . X . . . . X . . X . . X .

Advantages:

1. Various types of similarity detectable (repeats, inversions)

2. Useful for large-scale analysis
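A dot plot can be computed directly from its definition. The following is a minimal Python sketch, not taken from the lecture: the function name dot_plot and the use of "." for empty cells are choices made here for illustration.

```python
def dot_plot(seq_top, seq_side):
    """Return one text row per residue of seq_side.

    A cell holds 'X' if the residue of seq_side equals the residue
    of seq_top in that column, and '.' otherwise.
    """
    return [
        "".join("X" if a == b else "." for a in seq_top)
        for b in seq_side
    ]


if __name__ == "__main__":
    top, side = "YQEWTYQEVREYQEI", "CIVMRYQE"
    print("  " + top)
    for residue, row in zip(side, dot_plot(top, side)):
        print(residue, row)
```

With these two sequences, the threefold repeat Y Q E in the horizontal sequence shows up as several parallel diagonal runs of X.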

Pair-wise sequence alignment

Evolutionarily or structurally related sequences: alignment possible

Sequence homologies represented by inserting gaps

[Comparison matrix: T Y I V A R E A Q Y E on the horizontal axis, C I V M R E Q Y on the vertical axis; an X marks pairs of identical residues.]

T Y I V A R E A Q Y E

C I V M R E Q Y

T Y I V A R E A Q Y E
- C I V M R E - Q Y -

Global alignment: sequences aligned over the entire length

Basic task:

Find best alignment of two sequences

= alignment that reflects structural and evolutionary relations

Questions:

1. What is a good alignment?

2. How to find the best alignment?

Problem: Astronomical number of possible alignments

T Y I V A R E A Q Y E
- C I V M R E - Q Y -

T Y I V A R E A Q Y E
C I - V M R E - Q Y -

Stupid computer has to find out: which alignment is best?
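To get a feeling for how large this number is, the alignments can be counted with a small recursion. This is a sketch under a stated assumption that is not part of the lecture: an alignment is taken to be a sequence of columns (residue/residue, residue/gap, gap/residue), so alignments that differ only in the order of adjacent gaps are counted separately. The function name count_alignments is hypothetical.

```python
def count_alignments(len_a, len_b):
    """Count global alignments of two sequences of the given lengths.

    Assumption: every alignment is a sequence of columns, each being
    residue/residue, residue/gap or gap/residue.
    """
    # counts[i][j] = number of alignments of prefixes of length i and j;
    # a prefix aligned with the empty prefix has exactly one alignment.
    counts = [[1] * (len_b + 1) for _ in range(len_a + 1)]
    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            counts[i][j] = (
                counts[i - 1][j - 1]  # last column: residue/residue
                + counts[i - 1][j]    # last column: residue of a with gap
                + counts[i][j - 1]    # last column: gap with residue of b
            )
    return counts[len_a][len_b]


print(count_alignments(11, 8))     # lengths of the two example sequences
print(count_alignments(100, 100))  # two short proteins
```

Under this counting, even the two short example sequences of lengths 11 and 8 already allow roughly 2.5 million alignments, and the number grows rapidly with sequence length.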

First (simplified) rules:

1. Minimize number of mismatches

2. Maximize number of matches
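These simplified rules can be checked by counting columns. The sketch below is illustrative only (the helper name column_counts is hypothetical); it counts matches, mismatches and gap columns for the two candidate alignments of the example sequences.

```python
def column_counts(row_a, row_b):
    """Count (matches, mismatches, gap columns) of an alignment.

    The alignment is given as two equally long rows, '-' marks a gap.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    matches = mismatches = gaps = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


print(column_counts("TYIVAREAQYE", "-CIVMRE-QY-"))  # (6, 2, 3)
print(column_counts("TYIVAREAQYE", "CI-VMRE-QY-"))  # (5, 3, 3)
```

The first alignment has more matches and fewer mismatches with the same number of gaps, so it is preferred under these rules.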

T Y I V A R E A Q Y E
C I - V M R E - Q Y -

Second (simplified) rule:

Minimize number of gaps

T Y I V - A R E A Q Y E
C I - V M - R E - Q Y -

For protein sequences: Different degrees of similarity among amino acids. Counting matches/mismatches is oversimplistic.

T Y I V

T L V

T Y I V

T L - V

T Y I V

T - L V

Use similarity scores for amino acids:

Define score s(a,b) for amino acids a and b

Given a similarity score for pairs of amino acids:

Define score of alignment as sum of similarity values s(a,b) of aligned residues minus gap penalty g for each residue aligned with a gap

Example:

Score = s(T,T) + s(I,L) + s(V,V) - g
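The score definition translates into a few lines of code. This is a minimal sketch: the similarity values in TOY_SCORES and the gap penalty g = 2 are hypothetical numbers chosen only for illustration, not values from a real substitution matrix.

```python
def alignment_score(row_a, row_b, s, g):
    """Sum of s(a, b) over aligned residues minus g per gap column."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total -= g           # residue aligned with a gap
        else:
            total += s(a, b)     # two aligned residues
    return total


# Hypothetical similarity values, for illustration only.
TOY_SCORES = {("T", "T"): 5, ("I", "L"): 2, ("V", "V"): 4}


def toy_s(a, b):
    """Symmetric lookup; unknown pairs get -1 (also an assumption)."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), -1))


print(alignment_score("TYIV", "T-LV", toy_s, g=2))  # 5 - 2 + 2 + 4 = 9
```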

Dynamic-programming algorithm finds alignment with best score.

(Needleman and Wunsch, 1970)

T Y I V A R E A Q Y E
- C I V M R E - Q Y -

Alignment corresponds to path through comparison matrix

[Comparison matrix: T Y I V A R E A Q Y E on the horizontal axis, C I V M R E Q Y on the vertical axis; the X marks trace the path of the alignment.]

T W L V - R E A Q I
- C I V M R E - H Y

Score of alignment: Sum of similarity values of aligned residues minus gap penalty

Example: S = - g + s(W,C) + s(L,I) + s(V,V) - g + s(R,R) …

Alignment corresponds to path through comparison matrix

[Comparison matrix for the alignment of T W L V - R E A Q I with - C I V M R E - H Y; the X marks trace the path of the alignment.]

Dynamic programming: Calculate scores S(i,j) of optimal alignment of prefixes up to positions i and j.

T W L V - R
- C I V M R

S(i,j) can be calculated from possible predecessors S(i-1,j-1), S(i,j-1), S(i-1,j).

Score of optimal path that comes from top left = S(i-1,j-1) + s(R,R)

T W L V - R
- C I V M R

Score of optimal path that comes from above = S(i,j-1) - g

T W L V R -
- C I V M R

Score of optimal path that comes from left = S(i-1,j) - g

T W L - - V R
- C I V M R -

Score of optimal path = Maximum of these three values

Recursion formula:

S(i,j) = max { S(i-1,j-1) + s(a_i,b_j) , S(i-1,j) - g , S(i,j-1) - g }

[Comparison matrix: T W L V R on the horizontal axis, C I V M R E H Y on the vertical axis.]

Fill matrix from top left to bottom right

Find optimal alignment by trace-back procedure
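Matrix filling and trace-back can be sketched together as follows. This is an illustrative Python sketch of global alignment with a linear gap penalty, not the lecture's own implementation. Assumptions made here: border entries are set to -i*g (a prefix aligned against gaps only), ties in the trace-back are broken in favour of the diagonal step, and identity_score is a hypothetical stand-in for a real amino-acid similarity matrix.

```python
def needleman_wunsch(a, b, s, g):
    """Global alignment of a and b; returns (score, row_a, row_b).

    i runs over a (horizontal sequence), j over b (vertical sequence).
    """
    n, m = len(a), len(b)
    # S[i][j] = score of the best alignment of a[:i] and b[:j]
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = -i * g
    for j in range(1, m + 1):
        S[0][j] = -j * g
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # from top left
                S[i - 1][j] - g,                          # a[i-1] with gap
                S[i][j - 1] - g,                          # b[j-1] with gap
            )
    # Trace back from the bottom-right corner to the top-left corner.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] - g:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


def identity_score(x, y):
    """Hypothetical similarity: +1 for identical residues, -1 otherwise."""
    return 1 if x == y else -1


score, top, bottom = needleman_wunsch("TYIV", "TLV", identity_score, g=1)
print(score)
print(top)
print(bottom)
```

Several alignments can share the best score; which one is reported depends on the tie-breaking order in the trace-back.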

Initial matrix entries?

Entries S(i,j): scores of optimal alignment of prefixes up to positions i and j.

T W L V
- C I V

Entries S(i,0): scores of optimal alignment of prefix up to position i and empty prefix.

Score = - i * g

T W L V
- - - -

Initial matrix entries: Example, g = 2

          T    W    L    V    R
     0   -2   -4   -6   -8  -10
C   -2
I   -4
V   -6
M   -8
R  -10
E  -12
H  -14
Y  -16
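The border values of this example follow directly from Score = - i * g. A minimal sketch (the function name initial_entries is hypothetical):

```python
def initial_entries(len_top, len_side, g):
    """Border of the global-alignment matrix for gap penalty g.

    Returns the first row (prefixes of the horizontal sequence against
    the empty prefix) and the first column (same for the vertical one).
    """
    first_row = [-i * g for i in range(len_top + 1)]
    first_col = [-j * g for j in range(len_side + 1)]
    return first_row, first_col


row, col = initial_entries(len("TWLVR"), len("CIVMREHY"), g=2)
print(row)  # [0, -2, -4, -6, -8, -10]
print(col)  # [0, -2, -4, -6, -8, -10, -12, -14, -16]
```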

Pair-wise global alignment

[Comparison matrix for the two sequences below; the X marks trace the path of the global alignment.]

T W L V - R E A Q I
- C I V M R E - F Y

Complexity:

l1 and l2 lengths of sequences:

Computing time and memory proportional to l1 * l2

Time and space complexity = O(l1 * l2)

Pair-wise local alignment

Sequences often share only local sequence similarity (conserved genes or domains)

Important for database searching

[Comparison matrix for the two sequences below, with X marks along the alignment path.]

T W L V - R E A Q I
- C I V M R E - F Y

Problem:

Find pair of segments with maximal alignment score

(not necessarily part of optimal global alignment!)

Recursion formula for global alignment:

S(i,j) = max { S(i-1,j-1) + s(a_i,b_j) , S(i-1,j) - g , S(i,j-1) - g }

Recursion formula for local alignment:

S(i,j) = max { 0 , S(i-1,j-1) + s(a_i,b_j) , S(i-1,j) - g , S(i,j-1) - g }

Initial matrix entries = 0

          T    W    L    V    R
     0    0    0    0    0    0
C    0    0
I    0
V    0
M    0
R    0
E    0
H    0
Y    0

s(C,T) = -2
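As a worked instance of the local recursion for the first inner cell (a sketch; the convention that i indexes the horizontal sequence and j the vertical one is taken from the earlier slides, and s is assumed to be symmetric):

```latex
\begin{align*}
S(1,1) &= \max\{\,0,\; S(0,0) + s(C,T),\; S(0,1) - g,\; S(1,0) - g\,\} \\
       &= \max\{\,0,\; 0 - 2,\; -g,\; -g\,\} \\
       &= 0 \qquad \text{for any gap penalty } g \ge 0 .
\end{align*}
```

The leading 0 in the maximum is what keeps a local alignment from carrying a negative score forward.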

Store position with maximal value S(i,j) in matrix
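The local recursion, together with remembering the position of the maximum, can be sketched as follows. This is an illustrative Python sketch, not the lecture's or any package's implementation. Assumptions made here: a linear gap penalty g, trace-back that stops at the first entry equal to 0, and a hypothetical similarity function identity_score (+2 for identical residues, -1 otherwise).

```python
def smith_waterman(a, b, s, g):
    """Best local alignment of a and b; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    # Border entries stay 0; S[i][j] is never negative.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                0,
                S[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                S[i - 1][j] - g,
                S[i][j - 1] - g,
            )
            if S[i][j] > best:  # store position with maximal value
                best, best_pos = S[i][j], (i, j)
    # Trace back from the maximum until an entry 0 is reached.
    row_a, row_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and S[i][j] > 0:
        if S[i][j] == S[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif S[i][j] == S[i - 1][j] - g:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(row_a)), "".join(reversed(row_b))


def identity_score(x, y):
    """Hypothetical similarity: +2 for identical residues, -1 otherwise."""
    return 2 if x == y else -1


print(smith_waterman("TWLVREAQI", "CIVMREFY", identity_score, g=2))
# (4, 'RE', 'RE')
```

With these toy scores, the best-scoring pair of segments is the shared R E; a real amino-acid similarity matrix would typically give a longer local alignment.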

Algorithm by Smith and Waterman (1981)

Implementation: e.g. BestFit in GCG package