Towards a model for -1 frameshift sites. Alain Denise 1,2, Michaël Bekaert 1, Laure Bidou 1, Guillemette Duchateau-Nguyen 1, Jean-Paul Forest 2, Christine Froidevaux 2, Isabelle Hatin 1, Jean-Pierre Rousset 1, Michel Termier 1. 1 IGM (Institut de Génétique et Microbiologie), 2 LRI (Laboratoire de Recherche en Informatique), Université Paris-Sud, Orsay

Dec 21, 2015


Towards a model for -1 frameshift sites

Alain Denise1,2, Michaël Bekaert1, Laure Bidou1, Guillemette Duchateau-Nguyen1,

Jean-Paul Forest2, Christine Froidevaux2,

Isabelle Hatin1, Jean-Pierre Rousset1, Michel Termier1

1 IGM (Institut de Génétique et Microbiologie)

2 LRI (Laboratoire de Recherche en Informatique)

Université Paris-Sud, Orsay


Translation

5’ CAU AUG GAU UAC AUG GUC UAA GAU 3’

mRNA


Translation

5’ CAU AUG GAU UAC AUG GUC UAA GAU 3’

The ribosome reads bases in triplets (codons), starting from a START codon


Translation

5’ CAU AUG GAU UAC AUG GUC UAA GAU 3’

The ribosome adds one amino acid per codon


Translation

5’ CAU AUG GAU UAC AUG GUC UAA GAU 3’

The synthesis goes on until a STOP codon is read

1 mRNA gives 1 protein
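The reading process on these slides can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the codon table is deliberately partial (it covers just the codons of the example mRNA), and the function name `translate` and the choice to start at the first AUG found are ours, not part of the talk.

```python
# Partial codon table: only the codons that occur in the example mRNA.
CODON_TABLE = {
    "CAU": "His", "AUG": "Met", "GAU": "Asp",
    "UAC": "Tyr", "GUC": "Val", "UAA": "STOP",
}

def translate(mrna):
    """Read codons from the first AUG until a STOP codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no START codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons outside the partial table
        if residue == "STOP":
            break
        protein.append(residue)
    return protein

# The example mRNA from the slides, written without spaces.
mrna = "CAUAUGGAUUACAUGGUCUAAGAU"
print("-".join(translate(mrna)))  # Met-Asp-Tyr-Met-Val
```

The codons after UAA (here GAU) are never read, which is the sense of "1 mRNA gives 1 protein".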


Experimental fact

• Some mRNAs encode two distinct proteins with the same 5’ end


Programmed -1 frameshifting

Non-deterministic event

ORF1a, 0 phase: START0 to STOP0 (usual translation)

ORF1b, -1 phase: read after a -1 frameshift, up to STOP-1

1 mRNA gives 2 distinct proteins in a precise ratio


Typical -1 frameshift site [Brierley, 1989]

5’ NNX XXY YYZ 3’

Slippery sequence, followed by a secondary structure

Figure labels: AUG, P, SP; S1, L1, S2, L2, L’1
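As a rough illustration of the slippery-sequence pattern, the sketch below scans an RNA string for heptamers of the form X XXY YYZ (three identical nucleotides, three identical nucleotides, then any nucleotide). The function names are hypothetical, and the check encodes only what the pattern on the slide states; any further constraints on which nucleotides are allowed for X, Y or Z are not applied here.

```python
def is_slippery(heptamer):
    """True if the 7 nucleotides fit X XXY YYZ: XXX identical, YYY identical, Z free."""
    return (
        len(heptamer) == 7
        and len(set(heptamer[0:3])) == 1
        and len(set(heptamer[3:6])) == 1
    )

def find_slippery_sites(seq):
    """Return the start index of every heptamer in seq that fits the pattern."""
    return [i for i in range(len(seq) - 6) if is_slippery(seq[i:i + 7])]

# Start of the IBV site used in this talk (UAU UUA AAC GGG UAC), without spaces.
ibv = "UAUUUAAACGGGUAC"
print(find_slippery_sites(ibv))  # [2], i.e. the heptamer UUUAAAC
```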


IBV frameshift site

UAU UUA AAC

AUG

S1

S2

Slippery sequence Pseudoknot

5’

3’

GGGUAC

UGACGAUGGGG

GCUG AUACCCC

A G G C U C G

U C C G A G C

G

UUGC

GAAA


Translation with frameshift

UAU UUA AAC GGG UAC

5’

3’

UGACGAUGGGG

GCUG AUACCCC

A G G C U C G

U C C G A G C

G

UUGC

GAAA

-1 shift


Translation with frameshift

UA UUU AAA CGG GUA CGG GGU AGC AGU

5’

3’
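The two readings of the IBV site can be reproduced with a small sketch. It assumes a simplified picture in which the ribosome reads in the 0 phase through the slippery sequence, steps back by one nucleotide, and then keeps reading triplets; the function names and the `slip_end` parameter are ours.

```python
def codons(seq, start=0):
    """Split seq into complete codons, beginning at index start."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]

def read_with_minus1_shift(seq, slip_end):
    """Read in the 0 phase up to slip_end, then back up one nucleotide and continue."""
    return codons(seq[:slip_end]) + codons(seq, slip_end - 1)

# IBV site as written on the slides, without spaces.
ibv = "UAUUUAAACGGGUACGGGGUAGCAGU"

# 0 phase: UAU UUA AAC GGG UAC GGG GUA GCA
print(" ".join(codons(ibv)))

# -1 shift after the slippery sequence (first 9 nucleotides):
# UAU UUA AAC CGG GUA CGG GGU AGC AGU
print(" ".join(read_with_minus1_shift(ibv, 9)))
```

After the shift, the codons CGG GUA CGG GGU AGC AGU are the ones shown in the -1 phase on the slide; the C that ends AAC is read a second time as the first base of CGG.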


Goals

To improve the known model for viral frameshift sites

To identify new frameshift sites in viral and non-viral genomes


Our approach

Biological sequences

Formal models

Prediction tools

In silico and in vivo validation

Applications to other genomes

represent, explain, predict


IBV frameshift site: spacer

5’ GGGUAC 3’


Spacer consensus

HAST-1 UAC AAA

BEV UGU UG

EAV UGA GAG

HCV GAG UC

IBV GGG UAC

MHV GGG UU

TGEV GAG

RCNMV UAG GC

BWYV GGA GUG

PLRV GGG CAA

BLV UAA UAG A

FIV UGG AAG GC

HIV-1 GGG AAG AU

HTLV-2 UCC UUA A

JSR UGG GUG A

MMTV gag-pro UUG UAA A

MMTV pro-pol UGA U

RSV UAG GGA

SRV-1 GGA CUG A

Consensus UGG UAG AGAA GUA
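One way to read the consensus row is as a position-by-position vote over the spacers listed above. The sketch below makes that assumption explicit: it takes the most frequent nucleotide at each position, counting only the spacers long enough to reach that position. The slide does not say how its consensus was derived, so the method and the names here are ours.

```python
from collections import Counter

# Spacer sequences from the table above, written without spaces.
SPACERS = [
    "UACAAA", "UGUUG", "UGAGAG", "GAGUC", "GGGUAC", "GGGUU", "GAG",
    "UAGGC", "GGAGUG", "GGGCAA", "UAAUAGA", "UGGAAGGC", "GGGAAGAU",
    "UCCUUAA", "UGGGUGA", "UUGUAAA", "UGAU", "UAGGGA", "GGACUGA",
]

def consensus(seqs):
    """Most frequent nucleotide at each position, over the sequences that reach it."""
    result = []
    for pos in range(max(len(s) for s in seqs)):
        column = Counter(s[pos] for s in seqs if len(s) > pos)
        result.append(column.most_common(1)[0][0])
    return "".join(result)

print(consensus(SPACERS))
```

Under this assumption the first seven positions come out as UGGUAGA, which agrees with the first seven letters of the consensus row. Only two spacers reach position 8, and they disagree there, so that last position is not informative.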


Lab experiments

Test construct: pSV40, lacZ, FS signal, luc (luc in the -1 phase)

Control construct: pSV40, lacZ, FS signal N, luc (luc in the 0 phase)

lacZ: expression reporter; luc: FS reporter
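The slide does not spell out how a frameshift rate is obtained from the two constructs. A common way to use such a dual-reporter design, given here purely as an assumption, is to normalise the FS reporter (luc) by the expression reporter (lacZ) in each construct and then divide the test ratio by the control ratio. The function and parameter names are hypothetical.

```python
def frameshift_rate(luc_test, lacz_test, luc_control, lacz_control):
    """Percent frameshifting from reporter activities (assumed normalisation).

    luc/lacZ corrects each construct for its overall expression level;
    dividing test by control expresses the result relative to in-phase luc.
    """
    test_ratio = luc_test / lacz_test
    control_ratio = luc_control / lacz_control
    return 100.0 * test_ratio / control_ratio
```

With this definition, equal luc/lacZ ratios in the test and control constructs would correspond to a rate of 100 %.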


Spacer: lab experiments

Variant         Spacer   Relative FS rate (wild type = 100)
wild-type IBV   GGGUA    100
U mutant        UGGUA    100
A mutant        AGGUA    55
C mutant        CGGUA    32
CC mutant       CCGUA    70
CCU mutant      CCUUA    49


Refining the model: Machine learning

• To identify relevant properties that characterize FS sites

• Disjunctive learning: not all sequences frameshift for the same reasons [Giedroc et al., 2000]


Annotating data: spacer

5’ GGGUAC 3’


Example of data: SP

• SP = GGGUAC

– number of A = 1; C = 1; G = 3; U = 1;

– % of A = 17; C = 17; G = 50; U = 17;

– first = G;

– last = C;
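The spacer attributes listed above are simple to compute. The sketch below is one possible encoding, with hypothetical names; percentages are taken relative to the spacer length and rounded to the nearest integer.

```python
def spacer_features(sp):
    """Describe a spacer by nucleotide counts, percentages, first and last base."""
    counts = {nt: sp.count(nt) for nt in "ACGU"}
    percents = {nt: round(100 * n / len(sp)) for nt, n in counts.items()}
    return {
        "length": len(sp),
        "count": counts,
        "percent": percents,
        "first": sp[0],
        "last": sp[-1],
    }

for name, value in spacer_features("GGGUAC").items():
    print(name, "=", value)
```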


Annotating data: stem 1

UGACGAUGGGG

GCUG AUACCCC

5’

3’


Example of data: stem 1

• S1 =

– 5' side : GGGGUAGCAGU

– 3' side : CCCCAUAGUCG

– stability : -20.7 kcal/mol


Annotating data: full sequence

U UUA AAC

5’

3’

GGGUAC

UGACGAUGGGG

GCUG AUACCCC

A G G C U C G

U C C G A G C

G

UUGC

GAAA


Example of data: FS rate

FS rate = 22%


GloBo

Disjunctive learning algorithm

Suited to small amounts of data

Won the PTE challenge on analogous data


Example of rules

If
SP length 5
and number of G in S1.5’ bottom half 3
and number of G in S1.5’ 4
and %T in S2.5’ 30
and %G in S2.5’ 70
then FS rate 5%

If
%G in S1.5' bottom half 80
and %C in L1 45
then FS rate 5%

If
SP length 5
and S1.3' length 6
and %C in S1.3' 45
then FS rate 5%

...
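Structurally, a rule set of this kind is a disjunction of conjunctions: an example is covered as soon as every condition of at least one rule holds. The sketch below shows that structure only. The two rules in it are hypothetical and written for illustration; they are not rules learned by GloBo. The spacer variants are the ones from the "Spacer: lab experiments" slide.

```python
# Spacer variants from the "Spacer: lab experiments" slide.
SPACERS = ["GGGUA", "UGGUA", "AGGUA", "CGGUA", "CCGUA", "CCUUA"]

# Hypothetical rules (NOT learned by GloBo): each rule is a list of
# conditions that must all hold for the rule to apply.
RULES = [
    [lambda sp: sp[0] == "C"],
    [lambda sp: sp[0] == "A", lambda sp: sp.count("G") >= 2],
]

def matches(rule, sp):
    """A rule applies when all of its conditions hold (conjunction)."""
    return all(condition(sp) for condition in rule)

def covered(rules, sp):
    """An example is covered when at least one rule applies (disjunction)."""
    return any(matches(rule, sp) for rule in rules)

def coverage(rules, examples):
    """Fraction of the examples covered by the rule set."""
    return sum(covered(rules, sp) for sp in examples) / len(examples)

print([sp for sp in SPACERS if covered(RULES, sp)])
print(coverage(RULES, SPACERS))
```

Keeping each rule as a plain list of predicates makes it easy to report, per rule, which examples it covers, which is the kind of figure a "covering of examples" percentage summarises.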


Covering and prediction

If
SP length 5
and number of G in S1.5’ bottom half 3
and number of G in S1.5’ 4
and %T in S2.5’ 30
and %G in S2.5’ 70
then FS rate 5%

Covering of examples: 70%

Examples predicted in test set: 80%


Is R1 relevant for frameshifting?

Variant         Stem 1 5’-side   Relative FS rate   R1
wild-type IBV   GGGGU AUCAGU     100                yes
mutant 1        GGUCG AUCAGU     41                 yes
mutant 2        GGGGU UCUACA     55                 yes
mutant 3        GCUCG AUCAGU     36                 no
mutant 4        GCCCU AUCAGU     73                 no


Covering and prediction

If
SP length 5
and S1.3' length 6
and %C in S1.3' 45
then FS rate 5%

Covering of examples: 45%

Examples predicted in test set: 40%


Conclusion

• Spacer:

– correlation between primary sequence and FS rate has been established

– systematic experimentation going on


Conclusion

Biological sequences

Formal models

Prediction tools

In silico and in vivo validation

Applications to other genomes


Spacer

Virus : Sequence
HAST-I : U A C A A A
BEV : U G U U G
EAV : U G A G A G
HCV : G A G U C
IBV : G G G U A C
MHV : G G G U U
TGEV : G A G
RCNMV : U A G G C
BWYV : G G A G U G
PLRV : G G G C A A
BLV : U A A U A G A
FIV : U G G A A G G C
HIV-1 : G G G A A G A U
HTLV-II : U C C U U A A
JSR : U G G G U G A
MMTV : U U G U A A A
MMTV : U G A U
RSV : U A G G G A
SRV-1 : G G A C U G A

Consensus : U G G U A G AG A A G U A