Sequence analysis June 18, 2008 Learning objectives-Understand the concept of sliding window programs. Understand difference between identity, similarity and homology. Appreciate that proteins can be modular Workshop-Learn to recognize amino acid structures. Perform sliding window to compute %G+C as a function of position in sequence. Become familiar with the Dotter program.

Dec 20, 2015


Sequence analysis

June 18, 2008

Learning objectives
• Understand the concept of sliding window programs.
• Understand the difference between identity, similarity and homology.
• Appreciate that proteins can be modular.

Workshop
• Learn to recognize amino acid structures.
• Perform a sliding window analysis to compute %G+C as a function of position in a sequence.
• Become familiar with the Dotter program.


Sliding window

A sliding window gathers information about properties of nucleotides or amino acids.

[Figure: the sequence GCATATGCGCATATCCCGTCAATACCA is shown three times, with the window at successive positions, accompanied by the values 4, 5 and 6.]

A simple example is to calculate the %G+C content within a window. Then move the window one nucleotide and repeat the calculation.
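The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the workshop's own program: the function name `gc_percent_profile` and the window size of 10 are assumptions chosen for the example.

```python
def gc_percent_profile(seq, window):
    """Return the %G+C of every full window, moving one nucleotide at a time."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    profile = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append(100.0 * gc / window)
    return profile


if __name__ == "__main__":
    seq = "GCATATGCGCATATCCCGTCAATACCA"  # sequence from the slide
    # window=10 is an illustrative choice, not a value given on the slide
    for pos, value in enumerate(gc_percent_profile(seq, window=10), start=1):
        print(pos, f"{value:.0f}")
```

Each output line pairs a window start position with the %G+C of that window, which is the data a %G+C versus position plot would show.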


Sliding window

If the window is too small, it is difficult to detect the trend of the measurement. If it is too large, you could miss meaningful data.

[Figure: two plots of %G+C against sequence number, one for a large window size and one for a small window size.]
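One way to see the effect of window size is to compare how much the %G+C value can jump between neighbouring positions. The sketch below is only an illustration: the helper names and the window sizes 3 and 12 are assumptions, and the sequence is the one from the sliding window slide.

```python
def gc_profile(seq, window):
    """%G+C for each window position (step of one nucleotide)."""
    return [
        100.0 * sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def largest_step(profile):
    """Largest change between neighbouring window positions."""
    return max(abs(b - a) for a, b in zip(profile, profile[1:]))


seq = "GCATATGCGCATATCCCGTCAATACCA"  # sequence from the sliding window slide
small = gc_profile(seq, window=3)     # window sizes are illustrative choices
large = gc_profile(seq, window=12)
print(f"window=3:  {len(small)} points, largest step {largest_step(small):.1f}")
print(f"window=12: {len(large)} points, largest step {largest_step(large):.1f}")
```

Moving the window by one nucleotide changes the G+C count by at most one, so a single step is at most 100/window percent. The small window therefore gives a jumpy curve, while the large window gives a smoother curve with fewer points that can average away short features.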


Dot Plot with window = 1

[Figure: dot plot of ATGCCTAG against ATGCCTAG with window = 1. A dot marks each cell where the two nucleotides match.]

Note that 25% of the table will be filled due to random chance: there is a 1 in 4 chance of a match at each position.


Dot Plot with window = 3

[Figure: dot plot of ATGCCTAG against ATGCCTAG with window = 3.]

The larger the window, the more noise can be filtered.

What is the percent chance that you will receive a match randomly? One in 4³: (¼)³ × 100 ≈ 1.56%.
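The two dot plots and the random-match arithmetic can be reproduced with a short sketch. It is not the Dotter program: it places a dot only where the two windows match exactly, and the function names are assumptions. The probability helper assumes all four nucleotides are equally likely.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return the set of (row, col) window starts where the windows match exactly.

    row indexes seq_a and col indexes seq_b (both 0-based).
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if seq_a[i:i + window] == seq_b[j:j + window]:
                dots.add((i, j))
    return dots


def random_match_percent(window, alphabet_size=4):
    """Chance (in %) that two random windows match, assuming equal base frequencies."""
    return 100.0 * (1.0 / alphabet_size) ** window


def render(seq_a, seq_b, dots):
    """Draw the plot as text: '*' for a dot, '.' for an empty cell."""
    lines = ["  " + " ".join(seq_b)]
    for i, base in enumerate(seq_a):
        cells = ["*" if (i, j) in dots else "." for j in range(len(seq_b))]
        lines.append(base + " " + " ".join(cells))
    return "\n".join(lines)


seq = "ATGCCTAG"  # sequence from the slides
for w in (1, 3):
    dots = dot_plot(seq, seq, window=w)
    print(f"window={w}: {len(dots)} dots, random match {random_match_percent(w):.2f}%")
    print(render(seq, seq, dots))
```

With window = 1 the plot has 16 dots in 64 cells, which is the 25% background expected by chance. With window = 3 only the six windows on the main diagonal remain.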


Amino acid characteristics


Four levels of protein structure

1) Primary: linear sequence (AGHIPLLQ)

2) Secondary: initial folding patterns (AGHIPLLQTTT)

3) Tertiary: complex folding patterns

4) Quaternary: interactions between polypeptides


Chou-Fasman Rules (Mathews, Van Holde, Ahern)

Amino Acid   α-Helix   β-Sheet   Turn
Ala          1.29      0.90      0.78
Cys          1.11      0.74      0.80
Leu          1.30      1.02      0.59
Met          1.47      0.97      0.39
Glu          1.44      0.75      1.00
Gln          1.27      0.80      0.97
His          1.22      1.08      0.69
Lys          1.23      0.77      0.96
Val          0.91      1.49      0.47
Ile          0.97      1.45      0.51
Phe          1.07      1.32      0.58
Tyr          0.72      1.25      1.05
Trp          0.99      1.14      0.75
Thr          0.82      1.21      1.03
Gly          0.56      0.92      1.64
Ser          0.82      0.95      1.33
Asp          1.04      0.72      1.41
Asn          0.90      0.76      1.23
Pro          0.52      0.64      1.91
Arg          0.96      0.99      0.88

Labels on the slide: Favors α-Helix, Favors β-Sheet, Favors Turns.


Chou & Fasman structure prediction

Chou & Fasman [Biochemistry 13(2):222-245 (1974)]. By studying a number of proteins whose structures were known, they were able to determine stretches of amino acids that could serve to form an α-helix or a β-sheet. These amino acids are called helix formers or sheet formers and can have different strengths for forming their structures. Once these nucleation sites are determined, adjacent amino acids are examined to see if the structure can be extended in either or both directions. Values for some amino acids allow extension; other amino acids do not. Some amino acids are categorized as helix breakers or sheet breakers. A string of these will terminate the current structure. This method is about 60-65% accurate.
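The search for nucleation sites is itself a sliding window procedure. The sketch below is a simplified illustration only, not the published Chou-Fasman algorithm. The helix values are copied from the Chou-Fasman Rules table in this document, but the window of 6 residues, the requirement of 4 helix formers and the cutoff of 1.0 are assumptions made for the example, and the extension and breaker steps are not implemented.

```python
# Helix propensities copied from the Chou-Fasman Rules table (one-letter codes).
P_HELIX = {
    "A": 1.29, "C": 1.11, "L": 1.30, "M": 1.47, "E": 1.44, "Q": 1.27,
    "H": 1.22, "K": 1.23, "V": 0.91, "I": 0.97, "F": 1.07, "Y": 0.72,
    "W": 0.99, "T": 0.82, "G": 0.56, "S": 0.82, "D": 1.04, "N": 0.90,
    "P": 0.52, "R": 0.96,
}


def helix_nucleation_sites(protein, window=6, min_formers=4, former_cutoff=1.0):
    """Return 0-based starts of windows holding at least `min_formers` helix formers.

    A residue counts as a helix former here when its table value exceeds
    `former_cutoff`. All three parameters are illustrative assumptions.
    """
    sites = []
    for start in range(len(protein) - window + 1):
        chunk = protein[start:start + window]
        formers = sum(P_HELIX[aa] > former_cutoff for aa in chunk)
        if formers >= min_formers:
            sites.append(start)
    return sites


# Example peptide taken from the protein structure slide.
print(helix_nucleation_sites("AGHIPLLQTTT"))
```

A fuller implementation would then walk outward from each reported site and stop when it meets a run of helix breakers.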


Amino Acid   Hydrop. Value
A             1.8
C             2.5
D            -3.5
E            -3.5
F             2.8
G            -0.4
H            -3.2
I             4.5
K            -3.9
L             3.8
M             1.9
N            -3.5
P            -1.6
Q            -3.5
R            -4.5
S            -0.8
T            -0.7
V             4.2
W            -0.9
Y            -1.3


Kyte-Doolittle Hydropathy

– Another sliding window routine [J. Mol. Biol. 157:105-132 (1982)]. Kyte and Doolittle determined a "hydropathy scale" value for each amino acid based on empirical observations.

[Figure: regions labeled 1 through 7.]
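A hydropathy profile can be sketched as the mean scale value inside each window. The values below are copied from the hydropathy table in this document. The function name and the window size of 7 are assumptions for illustration, and averaging is one simple way to summarize a window.

```python
# Hydropathy values copied from the table in this document.
KD = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8, "G": -0.4,
    "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8, "M": 1.9, "N": -3.5,
    "P": -1.6, "Q": -3.5, "R": -4.5, "S": -0.8, "T": -0.7, "V": 4.2,
    "W": -0.9, "Y": -1.3,
}


def hydropathy_profile(protein, window=7):
    """Mean hydropathy of each window, moving one residue at a time."""
    scores = [KD[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Example peptide taken from the protein structure slide.
for start, value in enumerate(hydropathy_profile("AGHIPLLQTTT"), start=1):
    print(start, f"{value:+.2f}")
```

Positive values indicate windows dominated by hydrophobic residues, and negative values indicate hydrophilic windows.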


Evolutionary Basis of Sequence Alignment

1. Identity: a quantity that describes how much two sequences are alike in the strictest terms.
2. Similarity: a quantity that relates how much two amino acid sequences are alike.
3. Homology: a conclusion drawn from data suggesting that two genes share a common evolutionary history.
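Identity is the easiest of the three to compute. The sketch below assumes the two sequences are already aligned, with `-` marking gaps, and counts only columns without a gap. Other choices of denominator exist, so treat this as one possible convention. The example sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned (non-gap) columns that hold the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned peptides: 6 of 8 columns are identical.
print(percent_identity("AGHIPLLQ", "AGHLPLIQ"))
```

Homology, by contrast, cannot be computed this way. It is a conclusion drawn from evidence such as identity and similarity values.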


Purpose of finding differences and similarities of amino acids in two proteins.

Infer structural information

Infer functional information

Infer evolutionary relationships


One is mouse trypsin and the other is crayfish trypsin. They are homologous proteins. The sequences share 41% identity.


Modular nature of proteins

Proteins possess local regions of similarity.

Proteins can be thought of as assemblies of modular domains.


Modular nature of proteins (cont. 1)

Gene A: Exon 1a, Exon 2a

Duplication of Exon 2a

Gene A: Exon 1a, Exon 2a, Exon 2a

Gene B: Exon 1b, Exon 2b, Exon 2b

Exchange with Gene B

Gene A: Exon 1a, Exon 2a, Exon 3 (Exon 2b from Gene B)

Gene B: Exon 1b, Exon 2b, Exon 3 (Exon 2a from Gene A)


Identity Matrix

Simplest type of scoring matrix

     L  I  C  A
L    1  0  0  0
I       1  0  0
C          1  0
A             1


Similarity

It is easy to score whether an amino acid is identical to another (the score is 1 if identical and 0 if not). However, it is not easy to give a score for amino acids that are somewhat similar.

[Figure: structures of leucine and isoleucine.]

Should they get a 0 (non-identical), a 1 (identical), or something in between?
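The question can be made concrete by comparing an identity score with a toy similarity score. The residue groups and the partial score of 0.5 below are hypothetical, chosen only to illustrate the idea of "something in between". They are not taken from any published scoring matrix.

```python
# Hypothetical residue groups, chosen only for illustration.
GROUPS = [set("LIVM"), set("FYW"), set("KRH"), set("DE"), set("STNQ")]


def identity_score(a, b):
    """1 if the two residues are identical, otherwise 0."""
    return 1 if a == b else 0


def similarity_score(a, b, partial=0.5):
    """1.0 if identical, `partial` if in the same (hypothetical) group, else 0.0."""
    if a == b:
        return 1.0
    if any(a in group and b in group for group in GROUPS):
        return partial
    return 0.0


print("identity   L vs I:", identity_score("L", "I"))
print("similarity L vs I:", similarity_score("L", "I"))
```

Under the identity scheme leucine and isoleucine score 0, exactly like leucine and aspartate. The toy similarity scheme gives the pair partial credit because both residues fall in the same group.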


Two proteins that are similar in certain regions

Tissue plasminogen activator (PLAT) and coagulation factor 12 (F12).


The Dotter Program

• Program consists of three components:

• Sliding window

• A table that gives a score for each amino acid match

• A graph that converts the score to a dot of a certain density (the higher the dot density, the higher the score)
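The three components can be sketched together as follows. This is not Dotter's implementation. The function names, the text "shades" used in place of a grayscale graph, and the example peptides are all assumptions, and the scoring function is a plain identity score standing in for a real scoring table.

```python
def window_score(seq_a, seq_b, i, j, window, score):
    """Average pairwise score along the diagonal window starting at (i, j)."""
    total = sum(score(seq_a[i + k], seq_b[j + k]) for k in range(window))
    return total / window


def density_plot(seq_a, seq_b, window, score, shades=" .:*#"):
    """Return text rows where darker characters mean higher window scores.

    Assumes `score` returns values between 0 and 1.
    """
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            s = window_score(seq_a, seq_b, i, j, window, score)
            level = min(int(s * len(shades)), len(shades) - 1)
            row += shades[level]
        rows.append(row)
    return rows


def identity(a, b):
    """Stand-in scoring table: 1.0 for identical residues, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


# Hypothetical peptides used only to exercise the sketch.
for row in density_plot("AGHIPLLQ", "AGHLPLIQ", window=3, score=identity):
    print("|" + row + "|")
```

Swapping `identity` for a function that gives partial credit to similar residues changes the densities without touching the window or plotting code, which is why the three components are kept separate.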


Region of similarity

A single region on F12 is similar to two regions on PLAT.