Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003
Jan 05, 2016

oakes

Transcript
Page 1: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Welcome to Research Simulation 1

PSSMs & Search for Repeated Sequences

Monday, 9 June 2003

Page 2: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Free-living Nostoc

Cyanobacteria: Anabaena/Nostoc grown on NO3-, air

[Figure labels: heterocysts, CO2, sucrose, N2, O2]

Matveyev and Elhai (unpublished)

Page 3: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Free-living Nostoc

Cyanobacteria: Anabaena/Nostoc grown on NO3-, air

[Figure labels: heterocysts, CO2, sucrose, N2, NH3, O2]

Matveyev and Elhai (unpublished)

Page 4: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Tandem Heptameric Repeats: Do they come in complementary pairs?

Page 5: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Tandem Heptameric Repeats: Do they come in complementary pairs?

C2. Consider the sequence below of one strand of a DNA fragment.

5'-AGAGAGAGCTAAGGTCTCTCC-3'

Which of the following is a likely structure for the single-stranded fragment to assume?

[Figure: candidate structures A, B, and C]
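One way to reason about question C2 is to look for a segment whose reverse complement occurs further along the same strand, since such a pair could fold back and base-pair as a stem. The following is a minimal sketch under stated assumptions: the function names and the minimum loop size of 3 nucleotides are illustrative choices, not part of the slides, and only Watson-Crick pairs are considered.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_stem(seq: str, min_loop: int = 3) -> tuple:
    """Return (stem_length, left_start, right_start), 0-based.

    Brute-force search for the longest segment whose reverse complement
    occurs downstream, leaving at least `min_loop` unpaired bases between
    the two arms (min_loop is an assumed, hypothetical parameter).
    """
    best = (0, -1, -1)
    for i in range(len(seq)):
        for length in range(1, len(seq) - i + 1):
            arm = seq[i:i + length]
            j = seq.find(reverse_complement(arm), i + length + min_loop)
            if j == -1:
                break  # a longer arm starting at i cannot pair either
            if length > best[0]:
                best = (length, i, j)
    return best


fragment = "AGAGAGAGCTAAGGTCTCTCC"
print(longest_stem(fragment))  # (6, 1, 14): GAGAGA can pair with TCTCTC
```

The result only says which segments are able to pair; which of the drawn structures that corresponds to still has to be read off the figure.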

Page 14: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Tandem Heptameric Repeats: Do they come in complementary pairs?

IF:

Page 15: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Tandem Heptameric Repeats: Do they come in complementary pairs?

TCATTGGTCATTGGTCATTGGTCATTTGTCCTTTGT

AACAGTAACAGGAAACAGTAAACAATAAACAGGAAACAGTAAAC
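The question of whether two repeat tracts are complementary can be tested mechanically. The sketch below is an illustration only: it assumes exact 7-mer matches are required (no mismatches), and the function names are hypothetical rather than taken from the course material.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int = 7) -> list:
    """Return all overlapping k-mers of seq, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def complementary_kmers(tract_a: str, tract_b: str, k: int = 7) -> set:
    """Return k-mers of tract_a whose reverse complement occurs in tract_b."""
    return {
        kmer for kmer in kmers(tract_a, k)
        if reverse_complement(kmer) in tract_b
    }


# The two tandem-repeat tracts shown on the slide.
tract_1 = "TCATTGGTCATTGGTCATTGGTCATTTGTCCTTTGT"
tract_2 = "AACAGTAACAGGAAACAGTAAACAATAAACAGGAAACAGTAAAC"
print(sorted(complementary_kmers(tract_1, tract_2)))
```

An empty result would mean that no heptamer of the first tract has an exact complementary partner in the second. Allowing mismatches would need a scoring approach such as the matrices introduced later in the talk.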

Page 17: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Regulatory Proteins and Their Binding Sites

GTA ..(8).. TAC

5’-GTA ..(8).. TACNNNNNNNNNNTANNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGNNNNNNNNNNNNNNNN

3’-CAT ..(8).. ATGNNNNNNNNNNATNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNNTACNNNNNNNNNNNNNNNN

hetQ

NtcA

N RNA Polymerase

Page 18: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Differentiation in cyanobacteria: What does NtcA bind to?

Herrero et al. (2001) J Bacteriol 183:411-425

mRNA

…(20-24)…TAnnnTGTA…(8)…TAC
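The core of the NtcA motif, GTA followed by eight arbitrary nucleotides and then TAC, can be searched for with a regular expression. This is a minimal sketch: the function name is hypothetical, only the GTA..(8)..TAC core is matched (the TAnnnT element and its spacing are ignored), and a lookahead is used so that overlapping sites are reported.

```python
import re

# GTA, any 8 nucleotides, TAC; the lookahead allows overlapping matches.
NTCA_CORE = re.compile(r"(?=(GTA[ACGT]{8}TAC))")


def find_ntca_sites(seq: str) -> list:
    """Return (0-based start, matched site) for each GTA..(8)..TAC hit."""
    return [(m.start(), m.group(1)) for m in NTCA_CORE.finditer(seq.upper())]


# Example input: a 24-nucleotide sequence of the kind aligned in Table 1.
print(find_ntca_sites("GTTCTGTAACAAAGACTACAAAAC"))
# [(5, 'GTAACAAAGACTAC')]
```

An exact pattern like this misses sites that deviate from the consensus at even one position, which is the motivation for the position-specific scoring matrices that follow.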

Page 19: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Position-specific scoring matrices

Table 1: Examples of position-specific scoring matrices from sequence alignment

A. Sequence alignment [a]

A T T T A G T A T C A A A A A T A A C A A T T C
G T T C T G T A A C A A A G A C T A C A A A A C
A T T T T G T A G C T A C T T A T A C T A T T T
A A G C T G T A A C A A A A T C T A C C A A A T
C A T T T G T A C A G T C T G T T A C C T T T A

Page 20: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Position-specific scoring matrices

A. Sequence alignment [a]

A T T T A G T A T C A A A A A T A A C A A T T C
G T T C T G T A A C A A A G A C T A C A A A A C
A T T T T G T A G C T A C T T A T A C T A T T T
A A G C T G T A A C A A A A T C T A C C A A A T
C A T T T G T A C A G T C T G T T A C C T T T A

B. Table of occurrences [a]

A 3 2 0 0 1 0 0 5 2 1 3 4 3 2 2 1 1 5 0 2 4 2 2 1
C 1 0 0 2 0 0 0 0 1 4 0 0 2 0 0 2 0 0 5 2 0 0 0 2
G 1 0 1 0 0 5 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0
T 0 3 4 3 4 0 5 0 1 0 1 1 0 2 2 2 4 0 0 1 1 3 3 2
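The table of occurrences is obtained by counting, column by column, how often each nucleotide appears in the alignment. A minimal sketch follows; the variable and function names are illustrative and not from the slides.

```python
# The five aligned sequences from part A of Table 1.
ALIGNMENT = [
    "ATTTAGTATCAAAAATAACAATTC",
    "GTTCTGTAACAAAGACTACAAAAC",
    "ATTTTGTAGCTACTTATACTATTT",
    "AAGCTGTAACAAAATCTACCAAAT",
    "CATTTGTACAGTCTGTTACCTTTA",
]


def count_table(alignment: list) -> dict:
    """Count occurrences of each nucleotide at each alignment column."""
    length = len(alignment[0])
    counts = {nt: [0] * length for nt in "ACGT"}
    for seq in alignment:
        for pos, nt in enumerate(seq):
            counts[nt][pos] += 1
    return counts


for nt, row in count_table(ALIGNMENT).items():
    print(nt, *row)
```

Running this reproduces the four rows of part B.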

Page 21: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Position-specific scoring matrices

B. Table of occurrences [a] (alignment columns 4 to 13)

A  0  1  0  0  5  2  1  3  4  3
C  2  0  0  0  0  1  4  0  0  2
G  0  0  5  0  0  1  0  1  0  0
T  3  4  0  5  0  1  0  1  1  0

C. Position-specific scoring matrix (B = 0) [b]

A  0    .20  0    0    1.0  .40  .20  .60  .80  .60
C  .40  0    0    0    0    .20  .80  0    0    .40
G  0    0    1.0  0    0    .20  0    .20  0    0
T  .60  .80  0    1.0  0    .20  0    .20  .20  0
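With no pseudocounts (B = 0), each matrix entry is simply the count divided by the number of sequences in that column. A short sketch, with hypothetical names, that converts the ten-column count table into the frequency matrix:

```python
# Counts for the ten alignment columns shown in part B.
COUNTS = {
    "A": [0, 1, 0, 0, 5, 2, 1, 3, 4, 3],
    "C": [2, 0, 0, 0, 0, 1, 4, 0, 0, 2],
    "G": [0, 0, 5, 0, 0, 1, 0, 1, 0, 0],
    "T": [3, 4, 0, 5, 0, 1, 0, 1, 1, 0],
}


def to_frequencies(counts: dict) -> dict:
    """Divide each count by its column total (the B = 0 case)."""
    n_positions = len(next(iter(counts.values())))
    totals = [sum(counts[nt][i] for nt in counts) for i in range(n_positions)]
    return {
        nt: [counts[nt][i] / totals[i] for i in range(n_positions)]
        for nt in counts
    }


pssm = to_frequencies(COUNTS)
print([round(x, 2) for x in pssm["T"]])
# [0.6, 0.8, 0.0, 1.0, 0.0, 0.2, 0.0, 0.2, 0.2, 0.0]
```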

Page 22: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Position-specific scoring matrices

Table 2: Scoring a sequence with a PSSM

urt-71                 T    A    G    T    A    T    C    A    A    A
Score [a]              .60  .20  1.0  1.0  1.0  .20  .80  .60  .80  .60
With pseudocounts [b]  .51  .24  .75  .79  .79  .24  .61  .51  .65  .51
Normalized [b]         1.6  .75  4.2  2.5  2.5  .75  3.4  1.6  2.0  1.6

Score = .60 * .20 * 1.0 * …
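The score of a candidate site is the product of the matrix entries selected by its nucleotides, one per position. The following sketch multiplies out the "Score" row of Table 2 using the B = 0 matrix; the names are illustrative.

```python
import math

# B = 0 position-specific scoring matrix (ten positions).
PSSM = {
    "A": [0.0, 0.2, 0.0, 0.0, 1.0, 0.4, 0.2, 0.6, 0.8, 0.6],
    "C": [0.4, 0.0, 0.0, 0.0, 0.0, 0.2, 0.8, 0.0, 0.0, 0.4],
    "G": [0.0, 0.0, 1.0, 0.0, 0.0, 0.2, 0.0, 0.2, 0.0, 0.0],
    "T": [0.6, 0.8, 0.0, 1.0, 0.0, 0.2, 0.0, 0.2, 0.2, 0.0],
}


def site_score(site: str, pssm: dict) -> float:
    """Multiply the per-position scores of the nucleotides in `site`."""
    return math.prod(pssm[nt][pos] for pos, nt in enumerate(site))


print(site_score("TAGTATCAAA", PSSM))  # about 0.0055
```

Note that with B = 0 a single nucleotide never seen in the alignment at some position drives the whole product to zero.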

Page 23: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Position-specific scoring matrices: Introduction of pseudocounts

A. Sequence alignment [a]

A T T T A G T A T C A A A A A T A A C A A T T C
G T T C T G T A A C A A A G A C T A C A A A A C
A T T T T G T A G C T A C T T A T A C T A T T T
A A G C T G T A A C A A A A T C T A C C A A A T
C A T T T G T A C A G T C T G T T A C C T T T A

$q_{G,6}$ = 5 real counts

$p_G$ = ? pseudocounts

Page 24: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Position-specific scoring matrices: Introduction of pseudocounts

Score(position, nucleotide) = (q + p) / (N + B)

q = real counts of the nucleotide at the position; N = number of sequences in the alignment

p = pseudocounts = B * (overall frequency of nucleotide)

[A] = 0.32, [T] = 0.32, [C] = 0.18, [G] = 0.18

B = total number of pseudocounts

= square root (N)?

or = 0.1?
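The pseudocount formula can be checked numerically. This sketch assumes background frequencies of 0.32 for A and T and 0.18 for C and G, chosen because they reproduce the matrix entries shown in the tables of this talk; the function name is hypothetical.

```python
import math

# Assumed background frequencies (they sum to 1 and match the tabulated values).
BACKGROUND = {"A": 0.32, "T": 0.32, "C": 0.18, "G": 0.18}


def pseudocount_score(q: int, n: int, b: float, freq: float) -> float:
    """Score = (q + p) / (N + B), with p = B * background frequency."""
    p = b * freq
    return (q + p) / (n + b)


N = 5  # sequences in the alignment
for b in (math.sqrt(N), 0.1):
    g_all = pseudocount_score(5, N, b, BACKGROUND["G"])   # G seen 5 times
    a_none = pseudocount_score(0, N, b, BACKGROUND["A"])  # A never seen
    print(f"B = {b:.2f}: G(q=5) = {g_all:.2f}, A(q=0) = {a_none:.3f}")
# B = 2.24: G(q=5) = 0.75, A(q=0) = 0.099
# B = 0.10: G(q=5) = 0.98, A(q=0) = 0.006
```

The larger B is, the more the scores are pulled toward the background; B = 0.1 leaves the observed frequencies almost untouched while still avoiding zeros.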

Page 25: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Position-specific scoring matrices: Introduction of pseudocounts

C. Position-specific scoring matrix (B = 0) [b]

A  0    .20  0    0    1.0  .40  .20
C  .40  0    0    0    0    .20  .80
G  0    0    1.0  0    0    .20  0
T  .60  .80  0    1.0  0    .20  0

D. Position-specific scoring matrix (B = square root (N) = 2.2) [c]

A  .099  .24   .099  .099  .79   .38  .24
C  .33   .056  .056  .056  .056  .19  .61
G  .056  .056  .75   .056  .056  .19  .056
T  .51   .65   .099  .79   .099  .24  .099

Page 26: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Position-specific scoring matrices: Normalization

A. Sequence alignment [a]

A T T T A G T A T C A A A A A T A A C A A T T C
G T T C T G T A A C A A A G A C T A C A A A A C
A T T T T G T A G C T A C T T A T A C T A T T T
A A G C T G T A A C A A A A T C T A C C A A A T
C A T T T G T A C A G T C T G T T A C C T T T A

B. Table of occurrences [a]

A 3 2 0 0 1 0 0 5 2 1 3 4 3 2 2 1 1 5 0 2 4 2 2 1
C 1 0 0 2 0 0 0 0 1 4 0 0 2 0 0 2 0 0 5 2 0 0 0 2
G 1 0 1 0 0 5 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0
T 0 3 4 3 4 0 5 0 1 0 1 1 0 2 2 2 4 0 0 1 1 3 3 2

How to account for similarity due to similar base composition?

Compare Score(PSSM) / Score(background frequency)

0.79 / 0.32 = 2.5
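Normalization divides each pseudocount-adjusted score by the background frequency of the same nucleotide, so that a value above 1 means "more common at this position than expected from base composition alone". The sketch below assumes the background frequencies 0.32 (A, T) and 0.18 (C, G); with those values 0.79 / 0.32 comes to about 2.5, in agreement with the normalized row of Table 2. Names are illustrative.

```python
# Assumed background frequencies.
BACKGROUND = {"A": 0.32, "T": 0.32, "C": 0.18, "G": 0.18}

# Site urt-71 and its pseudocount-adjusted scores from Table 2.
SITE = "TAGTATCAAA"
SCORES = [0.51, 0.24, 0.75, 0.79, 0.79, 0.24, 0.61, 0.51, 0.65, 0.51]


def normalize(site: str, scores: list, background: dict) -> list:
    """Divide each positional score by the nucleotide's background frequency."""
    return [score / background[nt] for nt, score in zip(site, scores)]


print([f"{x:.2f}" for x in normalize(SITE, SCORES, BACKGROUND)])
```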

Page 27: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Position-specific scoring matrices: Log odds form

E. Position-specific scoring matrix (B = 0.1) [c]

A  .006  .20   .006  .006  .99   .40   .20   .59
C  .40   .004  .004  .004  .004  .20   .79   .004
G  .004  .004  .98   .004  .004  .20   .004  .20
T  .59   .79   .006  .99   .006  .20   .006  .20

F. Position-specific scoring matrix: Log-odds form (B = 0.1) [c,d]

A  2.2  0.7  2.2  2.2  0.0  0.4  0.7  0.2
C  0.4  2.5  2.5  2.5  2.5  0.7  0.1  2.5
G  2.5  2.5  0.0  2.5  2.5  0.7  2.5  0.7
T  0.2  0.1  2.2  0.0  2.2  0.7  2.2  0.7

Log odds = -log10(score)

Score * score * score … becomes log + log + log …
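Taking the negative base-10 logarithm of each entry turns the product of scores into a sum, where a lower total means a better match. The sketch below uses the first eight columns of the B = 0.1 matrix as rounded on the slide, so results can differ slightly from the tabulated log values, which appear to have been computed from unrounded scores. Function names are hypothetical.

```python
import math

# B = 0.1 matrix (eight positions), values as rounded on the slide.
PSSM = {
    "A": [0.006, 0.20, 0.006, 0.006, 0.99, 0.40, 0.20, 0.59],
    "C": [0.40, 0.004, 0.004, 0.004, 0.004, 0.20, 0.79, 0.004],
    "G": [0.004, 0.004, 0.98, 0.004, 0.004, 0.20, 0.004, 0.20],
    "T": [0.59, 0.79, 0.006, 0.99, 0.006, 0.20, 0.006, 0.20],
}


def to_neg_log(pssm: dict) -> dict:
    """Convert every score to -log10(score)."""
    return {nt: [-math.log10(s) for s in row] for nt, row in pssm.items()}


def log_score(site: str, log_pssm: dict) -> float:
    """Sum the per-position log values; equals -log10 of the product."""
    return sum(log_pssm[nt][pos] for pos, nt in enumerate(site))


log_pssm = to_neg_log(PSSM)
site = "TAGTATCA"
product = math.prod(PSSM[nt][pos] for pos, nt in enumerate(site))
print(round(log_score(site, log_pssm), 3), round(-math.log10(product), 3))
```

Both printed numbers agree, which is the point of the last line of the slide: multiplying scores and adding logs rank sites identically.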

Page 28: Welcome to Research Simulation 1 PSSMs & Search for Repeated Sequences Monday, 9 June 2003

Position-specific scoring matrices: Decrease complexity through info analysis

Uncertainty: $H_c = -\sum_i p_{ic} \log_2 p_{ic}$

$H_1 = -\left[\tfrac{4}{11}\log_2\tfrac{4}{11} + \tfrac{3}{11}\log_2\tfrac{3}{11} + \tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{3}{11}\log_2\tfrac{3}{11}\right] = 1.87$

$H_{31} = -\left[\tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{1}{11}\log_2\tfrac{1}{11} + \tfrac{8}{11}\log_2\tfrac{8}{11}\right] = 1.28$

Information content: $I = \sum_c (H_{\max} - H_c)$ (summed over all columns $c$)
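The two worked uncertainty values can be reproduced directly from the column counts (4, 3, 1, 3 and 1, 1, 1, 8 out of 11 sequences). This is a minimal sketch with hypothetical function names; it assumes a maximum uncertainty of log2(4) = 2 bits for four equally likely nucleotides and treats a zero count as contributing nothing to the sum.

```python
import math


def column_uncertainty(counts: list) -> float:
    """Shannon uncertainty H_c = -sum(p * log2(p)) for one alignment column."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_content(columns: list, h_max: float = 2.0) -> float:
    """Sum of (H_max - H_c) over all columns; h_max = log2(4) for DNA."""
    return sum(h_max - column_uncertainty(col) for col in columns)


print(round(column_uncertainty([4, 3, 1, 3]), 2))  # 1.87
print(round(column_uncertainty([1, 1, 1, 8]), 2))  # 1.28
print(round(information_content([[4, 3, 1, 3], [1, 1, 1, 8]]), 2))  # 0.85
```

Columns with low uncertainty, such as column 31, carry most of the information, so columns near the 2-bit maximum are the natural candidates to drop when reducing the complexity of the matrix.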