Cédric Notredame (16/03/22) Comparing Two Protein Sequences Cédric Notredame

Dec 30, 2015


Andrew Robbins
Transcript
Page 1: Cédric Notredame (08/10/2015) Comparing Two Protein Sequences Cédric Notredame.

Cédric Notredame (19/04/23)

Comparing Two Protein Sequences

Cédric Notredame

Page 2:

Our Scope

Pairwise Alignment methods are POWERFUL

Pairwise Alignment methods are LIMITED

If You Understand the LIMITS they Become VERY POWERFUL

Look once Under the Hood

Page 3:

Outline

-WHY Does It Make Sense To Compare Sequences

-HOW Can we Align Two Sequences ?

-HOW can I Search a Database ?

-HOW Can we Compare Two Sequences ?

Page 4:

Why Does It Make Sense To Compare Sequences ?

Sequence Evolution

Page 5:

Why Do We Want To Compare Sequences

wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
        | | |||||||| || | |||    ||| | ||||| ||||
????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA

EXTRAPOLATE

??????

Homology?

SwissProt

Page 6:

Why Do We Want To Compare Sequences

Page 7:

Why Does It Make Sense To Align Sequences ?

-Evolution is our Real Tool.

-Nature is LAZY and Keeps re-using Stuff.

-Evolution is mostly DIVERGENT

Same Sequence Same Ancestor

Page 8:

Why Does It Make Sense To Align Sequences ?

Same Sequence

Same Function

Same 3D Fold

Same Origin

Page 9:

Comparing Is Reconstructing Evolution

Page 10:

An Alignment is a STORY

ADKPKRPLSAYMLWLN

ADKPKRPLSAYMLWLN    ADKPKRPLSAYMLWLN

Mutations + Selection

ADKPKRPKPRLSAYMLWLN    ADKPRRPLS-YMLWLN

Page 11:

An Alignment is a STORY

ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN

Mutation

Insertion

Deletion

ADKPKRPLSAYMLWLN

ADKPKRPLSAYMLWLN    ADKPKRPLSAYMLWLN

Mutations + Selection

ADKPKRPKPRLSAYMLWLN    ADKPRRPLS-YMLWLN
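As a rough illustration of reading an alignment as a story, the sketch below labels every column of the alignment shown on this slide as a match, a mutation, or an indel. The function name `classify_columns` is made up for this example. A pairwise alignment on its own cannot tell whether a gap comes from an insertion in one sequence or a deletion in the other, so both cases are reported as "indel".

```python
def classify_columns(top, bottom):
    """Label each column of a pairwise alignment.

    Both rows must have the same length; '-' marks a gap.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    labels = []
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            labels.append("indel")      # insertion or deletion, direction unknown
        elif x == y:
            labels.append("match")      # residue kept unchanged
        else:
            labels.append("mutation")   # residue substituted
    return labels


labels = classify_columns("ADKPRRP---LS-YMLWLN", "ADKPKRPKPRLSAYMLWLN")
summary = {kind: labels.count(kind) for kind in ("match", "mutation", "indel")}
print(summary)
```

For the two sequences above this counts one substitution (R against K) and four gap columns, the rest being conserved positions.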

Page 12:

Evolution is NOT Always Divergent…

AFGP with (ThrAlaAla)n
Similar to Trypsinogen

AFGP with (ThrAlaAla)n
NOT Similar to Trypsinogen

N

S

Chen et al., 1997, PNAS, 94, 3811-16

Page 13:

Evolution is NOT Always Divergent

AFGP with (ThrAlaAla)n
Similar to Trypsinogen

AFGP with (ThrAlaAla)n
NOT Similar to Trypsinogen

N

S

SIMILAR Sequences BUT

DIFFERENT origin

Page 14:

Evolution is NOT always Divergent…

But in MOST cases, you may assume it is…

Same Sequence

Same Function

Same 3D Fold

Same Origin

Similar Function DOES NOT REQUIRE Similar Sequence

Similar Sequence

Historical Legacy

Page 15:

How Do Sequences Evolve

Each Portion of a Genome has its own Agenda.

Page 16:

How Do Sequences Evolve ?

CONSTRAINED Genome Positions Evolve SLOWLY

EVERY Protein Family Has its Own Level Of Constraint

Family            Ks     Ka

Histone3          6.4    0
Insulin           4.0    0.1
Interleukin I     4.6    1.4
Globin            5.1    0.6
Apolipoprot. AI   4.5    1.6
Interferon G      8.6    2.8

Rates in Substitutions/site/Billion Years, as measured on Mouse vs Human (80 Million years).
Ks: Synonymous Mutations; Ka: Non-Synonymous (Non-Neutral) Mutations.
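One common way to summarise such a table is the Ka/Ks ratio: the lower it is, the more strongly the protein is constrained. The sketch below applies that ratio to the six families listed on this slide. The ratio itself and the names `RATES` and `constraint_ratio` are additions made for illustration, not something the slide states.

```python
# Rates copied from the slide: (Ks, Ka) in substitutions/site/billion years.
RATES = {
    "Histone3": (6.4, 0.0),
    "Insulin": (4.0, 0.1),
    "Interleukin I": (4.6, 1.4),
    "Globin": (5.1, 0.6),
    "Apolipoprot. AI": (4.5, 1.6),
    "Interferon G": (8.6, 2.8),
}


def constraint_ratio(ks, ka):
    """Return Ka/Ks; values close to 0 suggest strong constraint."""
    return ka / ks


# Families ordered from most constrained to least constrained.
ranked = sorted(RATES, key=lambda family: constraint_ratio(*RATES[family]))
for family in ranked:
    ks, ka = RATES[family]
    print(f"{family:16s} Ka/Ks = {constraint_ratio(ks, ka):.3f}")
```

With these numbers Histone3 comes out as the most constrained family, which matches the point of the slide: every protein family has its own level of constraint.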

Page 17:

Different molecular clocks for different proteins--another prediction

Page 18:

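How Do Sequences Evolve ?
The Amino Acids Venn Diagram

To Make Things Worse, Every Residue has its Own Personality

[Figure: Venn diagram of amino acid properties. Set labels: Hydrophobic, Aliphatic, Aromatic, Polar, Small. Residue labels: G, C, L I V, A, F, S T, W Y, Q H K, R, E D, N, P G.]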

Page 19:

How Do Sequences Evolve ?

In a structure, each Amino Acid plays a Special Role

OmpR, Cter Domain

In the core, SIZE MATTERS

On the surface, CHARGE MATTERS


Page 20:

How Do Sequences Evolve ?

Accepted Mutations Depend on the Structure

In the core:
Big -> Big
Small -> Small
NO DELETION

On the surface:
Charged -> Charged
Small <-> Big or Small
DELETIONS

Page 21:

How Can We Compare Sequences ?

Substitution Matrices

Page 22:

How Can We Compare Sequences ?

To Compare Two Sequences, We need:

Their Function

Their Structure

We Do Not Have Them !!!

Page 23:

How Can We Compare Sequences ?

We will Need To Replace Structural Information With Sequence Information.

Same Sequence

Same Function

Same 3D Fold

Same Origin

It CANNOT Work ALL THE TIME !!!

Page 24:

How Can We Compare Sequences ?

To Compare Sequences, We need to Compare Residues

We Need to Know How Much it COSTS to SUBSTITUTE

an Alanine into an Isoleucine
a Tryptophan into a Glycine
…

The table that contains the costs for all the possible substitutions is called the SUBSTITUTION MATRIX

How to derive that matrix?

Page 25:

How Can We Compare Sequences ?

[Figure: the amino acid Venn diagram. Set labels: Hydrophobic, Aliphatic, Aromatic, Polar, Small. Residue labels: G, C, L I V, A, F, S T, W, Y, Q H, K, R, E D, N, P G.]

Using Knowledge Could Work

But we do not know enough about Evolution and Structure.

Using Data works better.

Page 26:

How Can We Compare Sequences ?
Making a Substitution Matrix

-Take 100 nice pairs of Protein Sequences, easy to align (80% identical).

-Align them…

-Count each mutation in the alignments

    -25 Tryptophans into Phenylalanine
    -30 Isoleucines into Leucine
    …

-For each mutation, set the substitution score to the log odds ratio:

    Score = Log ( Observed / Expected by chance )
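The recipe above can be sketched in a few lines. This is a minimal illustration on toy data, not the procedure used to build PAM or BLOSUM: the function name `log_odds_scores`, the base-2 logarithm, and the way "expected by chance" is estimated (from the residue frequencies of the same alignments) are all assumptions made for the example.

```python
import math
from collections import Counter


def log_odds_scores(aligned_pairs):
    """Compute log2(observed / expected) for every residue pair seen.

    aligned_pairs: list of (row1, row2) strings of equal length, '-' for gaps.
    Pairs are unordered, so ('A', 'L') also stands for ('L', 'A').
    """
    pair_counts = Counter()
    residue_counts = Counter()
    for top, bottom in aligned_pairs:
        for x, y in zip(top, bottom):
            if x == "-" or y == "-":
                continue  # gap columns carry no substitution information
            pair_counts[tuple(sorted((x, y)))] += 1
            residue_counts[x] += 1
            residue_counts[y] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    scores = {}
    for (x, y), count in pair_counts.items():
        observed = count / total_pairs
        p_x = residue_counts[x] / total_residues
        p_y = residue_counts[y] / total_residues
        # An unordered pair of different residues can arise in two orders.
        expected = p_x * p_y if x == y else 2 * p_x * p_y
        scores[(x, y)] = math.log2(observed / expected)
    return scores


# Toy data: A and L are mostly conserved and occasionally swapped.
toy_alignments = [("AAAA", "AAAA"), ("LLLL", "LLLL"), ("AL", "LA")]
scores = log_odds_scores(toy_alignments)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 2))
```

Pairs seen more often than chance predicts get a positive score, pairs seen less often get a negative one, which is the sign convention substitution matrices rely on.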

Page 27:

How Can We Compare Sequences ?
Making a Substitution Matrix

The Diagonal Indicates How Conserved a residue tends to be.
W is VERY Conserved

Some Residues are Easier To mutate into other similar residues

Cysteines that make disulfide bridges and those that do not get averaged

Page 30:

How Can We Compare Sequences ?
Using a Substitution Matrix

ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN

Mutation

Insertion

Deletion

Given two Sequences and a substitution Matrix,
We must Compute the CHEAPEST Alignment

Page 31:

Scoring an Alignment

Most popular Substitution Matrices
• PAM250
• Blosum62 (Most widely used)

Raw Score

TPEA
¦| |
APGA

Score = 1 + 6 + 0 + 2 = 9

• Question: Is it possible to get such a good alignment by chance only?
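The raw score on this slide is simply the sum of the substitution scores of the aligned columns. The sketch below reproduces that sum for TPEA against APGA. The dictionary `SLIDE_SCORES` holds only the four per-column values shown on the slide, so it is a fragment used for illustration and not a full substitution matrix; the function name `raw_score` is likewise made up.

```python
# Per-pair scores read off the slide (1, 6, 0 and 2). Only the pairs needed
# for this example are listed; a real matrix covers all 20 x 20 residues.
SLIDE_SCORES = {
    ("A", "T"): 1,
    ("P", "P"): 6,
    ("E", "G"): 0,
    ("A", "A"): 2,
}


def raw_score(top, bottom, scores):
    """Sum the substitution scores over the columns of a gap-free alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        # Pairs are stored in alphabetical order because the matrix is symmetric.
        total += scores[tuple(sorted((x, y)))]
    return total


print(raw_score("TPEA", "APGA", SLIDE_SCORES))  # 1 + 6 + 0 + 2 = 9
```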

Page 32:

Insertions and Deletions

Gap Penalties

• Opening a gap is more expensive than extending it

Seq A  GARFIELDTHE----CAT
       |||||||||||    |||
Seq B  GARFIELDTHELASTCAT

gap

Gap Opening Penalty
Gap Extension Penalty
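A small sketch of why an opening penalty plus an extension penalty favours one long gap over several short ones. The penalty values (10 and 1) and the convention used here, cost = GOP + GEP × length, are assumptions chosen for illustration; programs differ on whether the first gap position is also charged the extension penalty.

```python
def affine_gap_cost(length, gop=10.0, gep=1.0):
    """Cost of a single gap of the given length under an affine penalty.

    gop: gap opening penalty, paid once per gap (hypothetical value).
    gep: gap extension penalty, paid for each gapped position (hypothetical value).
    """
    if length < 0:
        raise ValueError("gap length cannot be negative")
    if length == 0:
        return 0.0
    return gop + gep * length


# The 4-residue gap of the GARFIELD example, as one gap or as four gaps of 1.
one_long_gap = affine_gap_cost(4)
four_short_gaps = 4 * affine_gap_cost(1)
print(one_long_gap, four_short_gaps)
```

Under these assumed values the single gap costs 14 and the four scattered gaps cost 44, so an aligner minimising cost will group missing residues into one insertion or deletion event.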

Page 33:

How Can We Compare Sequences ?
Limits of the substitution Matrices

They ignore non-local interactions and Assume that identical residues are equal

They assume evolution rate to be constant

ADKPKRPLSAYMLWLN

ADKPKRPLSAYMLWLN    ADKPKRPLSAYMLWLN

Mutations + Selection

ADKPKRPKPRLSAYMLWLN    ADKPRRPLS-YMLWLN

Page 34:

How Can We Compare Sequences ?
Limits of the substitution Matrices

Substitution Matrices Cannot Work !!!

Page 35:

How Can We Compare Sequences ?
Limits of the substitution Matrices

I know… But at least, could I get some idea of when they are likely to do all right?

Page 36:

How Can We Compare Sequences ?
The Twilight Zone

[Figure: % Sequence Identity plotted against Length. Region labels: "Similar Sequence, Similar Structure" (Same 3D Fold); "Twilight Zone"; "Different Sequence, Structure ????". Axis marks shown: 100, 30%, 30.]

Page 37:

How Can We Compare Sequences ?
The Twilight Zone

Substitution Matrices Work Reasonably Well on Sequences that have more than 30 % identity over more than 100 residues
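The rule of thumb above translates directly into a check. The sketch below is only that rule written as code: the function names are invented for the example, and computing identity over the columns where neither sequence has a gap is one of several conventions in use, chosen here as an assumption.

```python
def percent_identity(top, bottom):
    """Percentage of identical residues over the gap-free columns."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(top, bottom) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


def outside_twilight_zone(identity, aligned_length):
    """True when the slide's rule of thumb holds: >30% identity over >100 residues."""
    return identity > 30 and aligned_length > 100


print(outside_twilight_zone(45.0, 250))   # comfortable case
print(outside_twilight_zone(25.0, 250))   # identity too low
print(outside_twilight_zone(80.0, 40))    # alignment too short
```

A short alignment can reach a high identity by chance, which is why the rule constrains both the identity and the length.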

Page 42:

How Can We Compare Sequences ?
Which Matrix Shall I use

The Initial PAM matrix was computed on 80% similar Proteins

It has been extrapolated to more distantly related sequences.

Pam 250
Pam 350

Other Matrices Exist:
BLOSUM 42
BLOSUM 62

Page 43:

How Can We Compare Sequences ?
Which Matrix Shall I use

PAM: Distant Proteins, High Index (PAM 350)
BLOSUM: Distant Proteins, Low Index (Blosum30)

Choosing The Right Matrix may be Tricky…

•GONNET 250 > BLOSUM62 > PAM 250.

•But This will depend on:

    •The Family.
    •The Program Used and Its Tuning.

•Insertions, Deletions?

Page 44:

HOW Can we Align Two Sequences ?

Dot Matrices
Global Alignments
Local Alignment

Page 46:

Dot Matrices

QUESTION

What are the elements shared by two sequences ?

Page 47:

Dot Matrices

>Seq1
THEFATCAT
>Seq2
THELASTCAT

[Figure: dot matrix with T H E F A T C A T along one axis and THEFASTCAT along the other]

Window

Stringency

Page 48:

Dot Matrices

Sequences Window size

Stringency

Page 49:

Dot Matrices: Stringency

Window=1, Stringency=1

Window=11, Stringency=7

Window=25, Stringency=15
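The window and stringency parameters can be made concrete with a short sketch. It assumes the usual reading of the two terms: a dot is drawn at (i, j) when at least `stringency` of the `window` residue pairs starting there are identical. The function names are invented for this example, and real dot-plot programs may score windows with a substitution matrix instead of counting identities.

```python
def dot_matrix(seq1, seq2, window=1, stringency=1):
    """Return the set of (i, j) window start positions that earn a dot.

    i indexes seq1, j indexes seq2 (both 0-based).
    """
    dots = set()
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            identical = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            if identical >= stringency:
                dots.add((i, j))
    return dots


def render(seq1, seq2, dots):
    """Draw the matrix as text: one row per seq2 position, '*' for a dot."""
    rows = []
    for j in range(len(seq2)):
        rows.append("".join("*" if (i, j) in dots else "." for i in range(len(seq1))))
    return "\n".join(rows)


noisy = dot_matrix("THEFATCAT", "THEFASTCAT", window=1, stringency=1)
strict = dot_matrix("THEFATCAT", "THEFASTCAT", window=4, stringency=4)
print(render("THEFATCAT", "THEFASTCAT", noisy))
print(sorted(strict))
```

With window 1 every shared letter produces a dot, including the off-diagonal noise from repeated T and A; with a window of 4 and full stringency only the two diagonal segments on either side of the extra S remain.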

Page 55:

Dot Matrices: Limits

-Visual aid

-Best Way to EXPLORE the Sequence Organisation

-Does NOT provide us with an ALIGNMENT

wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
        | | |||||||| || | |||    ||| | ||||| ||||
????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA

Page 56:

Global Alignments

-Take 2 Nice Protein Sequences

-A good Substitution Matrix (blosum)

-A Gap opening Penalty (GOP)

-A Gap extension Penalty (GEP)

[Figure: Affine Gap Penalty. Cost plotted against gap length L; the gaps of an example alignment are labelled GOP and GEP.]

Parsimony: Evolution takes the simplest path

(So We Think…)


Page 58:

Global Alignments

-Take 2 Nice Protein Sequences

-A good Substitution Matrix (blosum)

-A Gap opening Penalty (GOP)

-A Gap extension Penalty (GEP)

-DYNAMIC PROGRAMMING

>Seq1
THEFATCAT
>Seq2
THEFASTCAT

DYNAMIC PROGRAMMING

THEFA-TCAT
THEFASTCAT

Page 59:

Global Alignments

Brute Force Enumeration

F A S T

F A T

----FAT
FAST---

---FAT-
FAST---

--F-AT-
FAST---

(L1+L2)! / ((L1)! * (L2)!)

DYNAMIC PROGRAMMING
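To get a feel for why brute force enumeration is hopeless, the sketch below evaluates the slide's formula, (L1+L2)! / (L1! × L2!), and compares it with the number of cells a dynamic programming matrix has to fill, (L1+1) × (L2+1). The function names are invented for the example, and the cell count assumes the standard one-matrix formulation of dynamic programming.

```python
from math import factorial


def brute_force_count(l1, l2):
    """Evaluate (L1+L2)! / (L1! * L2!) for two sequence lengths."""
    return factorial(l1 + l2) // (factorial(l1) * factorial(l2))


def dp_cells(l1, l2):
    """Number of cells in a dynamic programming matrix for the same lengths."""
    return (l1 + 1) * (l2 + 1)


for l1, l2 in [(3, 4), (10, 10), (100, 100)]:
    count = brute_force_count(l1, l2)
    print(f"L1={l1:3d} L2={l2:3d}  brute force: {count:.3e}  DP cells: {dp_cells(l1, l2)}")
```

For FAT against FAST the formula already gives 35 candidates, and for two sequences of 100 residues the count has about 59 digits, while the dynamic programming matrix has roughly ten thousand cells.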

Page 60:

Global Alignments
DYNAMIC PROGRAMMING

Dynamic Programming (Needleman and Wunsch)

Match=1
MisMatch=-1
Gap=-1

[Figure: the dynamic programming matrix for FAT against F A S T, filled in three steps and then traced back]

F A S T
F A - T
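The matrix on this slide can be recomputed with a short Needleman and Wunsch sketch. It uses the slide's scores (Match=1, MisMatch=-1, Gap=-1) with a linear gap penalty, which is a simplification compared with the affine GOP/GEP scheme discussed earlier. The function name and the tie-breaking order in the traceback (diagonal first) are choices made for this example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Traceback from the bottom-right corner, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, row_a, row_b = needleman_wunsch("FAT", "FAST")
print(best)
print(row_b)
print(row_a)
```

For FAT against FAST this yields a score of 2 and the alignment FAST over FA-T, the same result as the traceback shown on the slide.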

Page 62:

Global Alignments
DYNAMIC PROGRAMMING

Global Alignments are very sensitive to gap Penalties

Global Alignments do not take into account the MODULAR nature of Proteins

C: K vitamin dep. Ca Binding
K: Kringle Domain
G: Growth Factor module
F: Finger Module

Page 63:

Local Alignments

GLOBAL Alignment

LOCAL Alignment

Smith And Waterman (SW)=LOCAL Alignment
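A minimal Smith and Waterman sketch, to show what changes compared with a global alignment: cell scores are never allowed to drop below zero, and the traceback starts at the best-scoring cell and stops at the first zero. The scores (1, -1, -1), the linear gap penalty, the function name, and the GARFIELD example are assumptions made for illustration.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(
                0,                             # start a new local alignment here
                h[i - 1][j - 1] + pair(i, j),  # extend along the diagonal
                h[i - 1][j] + gap,             # gap in b
                h[i][j - 1] + gap,             # gap in a
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Traceback from the best cell until a zero is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


local_score, local_a, local_b = smith_waterman("GARFIELD", "FIEL")
print(local_score, local_a, local_b)
```

Only the shared segment FIEL is reported; the unrelated flanking residues of the longer sequence are ignored instead of being forced into the alignment, which is what makes local alignment suitable for modular proteins.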

Page 64:

Local Alignments

We now have a PairWise Comparison Algorithm,

We are ready to search Databases

Page 65:

Database Search

[Figure: a QUERY (Q) is passed to a Comparison Engine (SW) and compared against a Database; each hit is listed with its score and its E-value]

E-values
How many times do we expect such an Alignment by chance?

Page 67:

CONCLUSION

Page 68:

Sequence Comparison

-Thanks to evolution, We CAN compare Sequences

-There is a relation between Sequence and Structure.

-The Easiest way to Compare Two Sequences is a dotplot.

-Substitution matrices only work well with similar Sequences (More than 30% id).

Page 69:

A few Addresses
