Centre for Integrative Bioinformatics VU
1-month Practical Course: Genome Analysis
Sequence comparison by ‘Sequence Harmony’ identifies subtype-specific functional sites

Dec 14, 2015

Significance of Alignment Positions

• Observed occurrence of amino acids at some position in an alignment that deviates from expected may indicate some (functional) significance

• What does ‘deviates from expected’ mean?

• unlikely occurrences

• What is unlikely?

• an observed result that can be obtained in only (relatively) few ways

Pfam Ig Family Alignment

Aquaporin: Motifs

• NPA: stabilizes loops B and E

• G(a)xxxG(a)xxG(a):

• Crossing of right-handed helical bundles

Andreas Engel and Henning Stahlberg, in: Current Topics in Membranes (2001), Hohmann, Agre & Nielsen (Eds.) Academic Press

Counting…

• Number of possibilities for finding some combination of amino acids:

• which types?

• how many of each?

• Examples:

• WWW: 3 W, only 1 way

• RHH: 1 R, 2 H, three ways

• SHQ: 1 S, 1 H, 1 Q, six ways
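
The three small examples can be checked by counting distinct orderings of a set of residues, which is the multinomial coefficient N! / (n1! · n2! · ...). The sketch below assumes that this is the counting the slide intends; the function name `count_arrangements` is chosen here for illustration.

```python
from collections import Counter
from math import factorial


def count_arrangements(residues: str) -> int:
    """Number of distinct orderings of the given residues.

    This is the multinomial coefficient N! / (n1! * n2! * ...),
    where n1, n2, ... are the counts of each residue type.
    """
    ways = factorial(len(residues))
    for count in Counter(residues).values():
        ways //= factorial(count)  # each partial quotient is an exact integer
    return ways


for column in ("WWW", "RHH", "SHQ"):
    print(column, count_arrangements(column))
```

For the three columns above this gives 1, 3 and 6 ways, matching the slide.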

Counting… (2)

• ‘Real’ examples:

• WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW

• 33 W: only 1 way

• RRRRRRRRRRRRRRRRHHHHHHHHHHHHHHHHH

• 16 R, 17 H: ? ways (~2.33 × 10^9)

• SSSSSHSSCCCCCCCCEEQQEEEEEEEEEQEEE

• 7 S, 1 H, 8 C, 14 E, 3 Q: ??? ways (~5.32 × 10^23)

• ‘many’ ways

but, we can calculate that!

Shannon’s ‘Information Entropy’:

• ‘A Mathematical Theory of Communication’, The Bell System Technical Journal, Vol. 27, 1948.

“ Can we define a quantity which will measure, in some sense, how much information is ‘produced’ by such a process, or better, at what rate information is produced? ”

• He was thinking about the Transmission of Information, i.e., from a Source through some Channel to a Destination.

Solution: Entropy

• the entropy of a set of probabilities pi

• measures information, choice and uncertainty

• zero only if only one pi is not zero

• there is only one choice

• maximal if all pi are equal

• most ‘uncertain’ situation: all options are possible

$$H = -\sum_{i=1}^{n} p_i \log p_i$$
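
The two limiting cases listed on this slide can be checked numerically. The sketch below is a minimal illustration, assuming base-2 logarithms (so H is in bits) and the usual convention that terms with p = 0 contribute nothing; the function name `shannon_entropy` is chosen here.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits; terms with p == 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Only one option is possible: no uncertainty.
h_single = shannon_entropy([1.0, 0.0, 0.0, 0.0])

# All 20 amino acid types equally likely: the most uncertain situation.
h_uniform = shannon_entropy([1 / 20] * 20)

# Any other distribution over the same 20 options lies in between.
h_skewed = shannon_entropy([0.81] + [0.01] * 19)

print(h_single, h_uniform, h_skewed)
```

With n equally likely options the entropy equals log2(n), which is the upper bound for any distribution over n options.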

Information Content

• Shannon was thinking about the Transmission of Information, i.e., from a Source through some Channel to a Destination.

• …but it applies equally well to any type of ‘message’

• We can use it to measure the level of conservation in columns in an alignment

Simple Example: Sequence Entropy

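Seven example columns of six residues each:

LLLLLL  ALLLLL  AALLLL  AAALLL  AAAALL  AAAAAL  AAAAAA

[Figure: entropy H (vertical axis, 0 to 1.0) plotted against p (horizontal axis, 0 to 1.0), with p1 = f(‘L’) and p2 = f(‘A’). H is zero at p1 = 0 and at p2 = 0, and maximal at p1 = p2 = ½.]

$$H = -\sum_{i=1}^{n} p_i \log p_i$$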
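
The entropy of each of the seven L/A example columns can be computed directly from residue counts. This is a small sketch that assumes base-2 logarithms, which matches an H axis running from 0 to 1.0 for two residue types; `column_entropy` is a name chosen here.

```python
from collections import Counter
import math


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residue frequencies in one alignment column."""
    total = len(column)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(column).values())


columns = ["LLLLLL", "ALLLLL", "AALLLL", "AAALLL", "AAAALL", "AAAAAL", "AAAAAA"]
entropies = [column_entropy(c) for c in columns]

for column, h in zip(columns, entropies):
    print(f"{column}  H = {h:.3f}")
```

The values rise from 0 for the fully conserved column to 1 for the half-L, half-A column and fall back to 0, tracing the symmetric curve in the figure.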

Sequence Analysis: Comparing Groups

• Many biological problems relate to questions like:

“ Why do these proteins do this, and those proteins not? ”

• or

“ Why do these patients get sick, and those not? ”

The answer can be related to similarities and differences between sequences

• Similarities (conservation) relate to functionally critical positions

• Differences can explain functional differences

Identification of Functional Sites

• Functional differences between Protein (sub-)families

• Current practice:

• use Multiple Sequence Alignment

• look for Conserved Sites within (sub-)families

• (ignore sites that are overall conserved)
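
The current practice described above can be sketched as a scan over alignment columns. The example below assumes the strictest reading of "conserved" (a single residue type in the whole group) and uses the first five columns of three Smad2 and three Smad1 rows from the Smad-MH2 alignment shown later in this deck; the function names are chosen here for illustration.

```python
def conserved_residue(column):
    """Return the single residue of a fully conserved column, else None."""
    return column[0] if len(set(column)) == 1 else None


def subfamily_specific_sites(group_a, group_b):
    """1-based positions conserved within each group but differing between them.

    Positions with the same conserved residue in both groups are overall
    conserved and are therefore ignored.
    """
    sites = []
    columns = zip(zip(*group_a), zip(*group_b))
    for position, (col_a, col_b) in enumerate(columns, start=1):
        res_a = conserved_residue(col_a)
        res_b = conserved_residue(col_b)
        if res_a is not None and res_b is not None and res_a != res_b:
            sites.append(position)
    return sites


smad2 = ["DLQPV", "DAAPV", "DLQPV"]  # H.sapiens, D.melanogaster, D.rerio
smad1 = ["DVQAV", "DVQAV", "DVQAV"]  # M.musculus, H.sapiens, S.scrofa
specific = subfamily_specific_sites(smad2, smad1)
print(specific)
```

Only position 4 (P in Smad2, A in Smad1) is reported. Positions 2 and 3 differ between the groups but are missed because Smad2 is not conserved there, which is the limitation the following slides address.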

Conservation and (functional) Differences:

• Sequence Entropy measures Conservation

• But Sites that are Different are not always Conserved:

                      Conservation in
Test-set    Known    Both    One    Not
SMAD          29      10      14      5
MIP           23       0       7     16
Rab5/6        28      10      14      4
Ras/Ral       12       1      11      0
TOTAL:        92      21      46     25

Identification of Functional Sites (2)

• Functional differences between Protein (sub-)families

• Example Binders vs. Non-Binders:

• sites crucial for binding: conserved

• sites determining ‘non-binding’: not conserved

Take into account Non-Conserved Sites as well!

• comparing Amino-acid Compositions

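TGF-β signalling pathway

[Figure: two parallel pathways. TGF-β: receptors TβR-II and TβR-I, phosphorylated (p) AR-Smads, Smad association, nucleus, activation/repression of TGF-β target genes. BMP: receptors BMPR-I and BMPR-II, phosphorylated (p) BR-Smads, Smad association, nucleus, activation/repression of BMP target genes. Outcomes: division, differentiation, motility, adhesion, programmed cell death. A label ‘specificity’ marks the figure.]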

Smad-MH2 Alignment & Functionally Specific Sites

• 27 known sites of functional specificity

• based mostly on site-specific mutants, characterized by BMPR-I vs. TβR-I binding affinity

Smad2 H.sapiens  D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 D.melanogaster   D A A P V M Y H E P A F W C S I S Y Y E L N T R V G E T F H A S Q P S I T V D G F T D P S N S E - R F C L G L

Smad2 D.rerio   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 C.auratus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 R.norvegicus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 M.musculus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 D.rerio   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L C L

Smad3 S.scrofa   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 X.laevis   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 H.sapiens   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 M.musculus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R L C L G L

Smad3 C.auratus   D L Q P V T Y C E S A F W C S I S Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N A E - R F C L G L

Smad3 G.gallus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 S.scrofa   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 R.norvegicus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad1 S.mansoni   T M H P V N Y Q E P K Y W C S I V Y Y E L N N R V G E A F N A S Q L S I I I D G F T D P S N N S D R F C L G L

Smad1 M.musculus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 H.sapiens   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 S.scrofa   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 R.norvegicus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 X.tropicalis   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N R N R F C L G L

Smad1 G.gallus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L

Smad1 D.rerio   D V H P V A Y Q E P K H W C S I V Y Y E L N N R V G E A F L A S S T S V L V D G F T D P S N N R N R F C L G L

Smad1 C.coturnix   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L

Smad5 H.sapiens   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L

Smad5 M.musculus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L

Smad5 R.norvegicus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P A N N K S R F C L G L

Smad5 G.gallus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad5 D.rerio   D V Q P V E Y Q E P S H W C S I V Y Y E L N N R V G E A Y H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad8 M.musculus   D F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L

Smad8 R.norvegicus   D F R P V C Y E E P L H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L

Smad8 G.gallus   N F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S I L I D G F T D P S N N K N R F C L G L

[Alignment position ruler: 10, 20, 30, 40, 50]

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N E V V E Q T R R H I G K G V R L Y Y I G G E V F A E C L S D S S I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y D W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D N A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H N F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D T S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K

[Alignment position ruler: 60, 70, 80, 90, 100, 110]

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L S Q S V S Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y R L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C S L K I F S N Q E F A H - - - - L L S R T V H H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T L R M S F V K G W G A E Y H R Q D V

I P S R C S L K I F N N Q E F A E - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A K Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T L R M S F V K G W G A E Y H R Q D V

I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K V F N N Q L F A Q - - - - L L A Q S V H H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K V F N N Q L F A Q L L A Q L L A Q S V H H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q L F A Q - - - - P L A Q S V N H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

[Alignment position ruler: 120, 130, 140, 150, 160, 170]

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D R V L T Q M G S P R L P C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P N L R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W V E I H L N G P L Q W L D R V L T Q M G T P R N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E V H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

[Alignment position ruler: 180, 190, 200, 210]

Method     Predict   %TP   %FN   %FP
AMAS          6      21%   76%    3%
TreeDet      21      52%   48%   21%
SDPpred      12      31%   59%   10%

Comparing Groups of Sequences: Entropy

• Relative ‘Entropy’ rE^{A/B}, group A vs. B:

$$rE_i^{A/B} = \sum_x p_{i,x}^A \log \frac{p_{i,x}^A}{p_{i,x}^B}$$

• using probabilities p of amino acid type x at position i

• Degenerate for p^B = 0, i.e. when A and B are fully different!

• Introduce Relative ‘Entropy’ rE^{A/AB}, A vs. all (‘AB’):

$$rE_i^{A/AB} = \sum_x p_{i,x}^A \log \frac{p_{i,x}^A}{p_{i,x}^{AB}}$$

• Not degenerate, but still without a fixed upper bound

• Upper bound depends on relative size of groups
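
Both relative entropies can be tried on a toy column. The sketch below assumes base-2 logarithms and estimates probabilities as plain residue frequencies (no pseudocounts); it returns infinity for the degenerate case in which a residue seen in A never occurs in the reference group. The helper names are chosen here.

```python
from collections import Counter
import math


def frequencies(column):
    """Residue frequencies of one alignment column as a dict."""
    total = len(column)
    return {aa: n / total for aa, n in Counter(column).items()}


def relative_entropy(p, q):
    """sum_x p_x * log2(p_x / q_x); infinite if q_x == 0 where p_x > 0."""
    total = 0.0
    for aa, p_x in p.items():
        q_x = q.get(aa, 0.0)
        if q_x == 0.0:
            return math.inf  # degenerate: residue absent from reference group
        total += p_x * math.log2(p_x / q_x)
    return total


group_a = "LLLLLL"        # 6 sequences, all L at this position
small_b = "AAAAAA"        # 6 sequences, all A
large_b = "A" * 18        # 18 sequences, all A

p_a = frequencies(group_a)
re_a_b = relative_entropy(p_a, frequencies(small_b))                   # A vs. B
re_a_ab_equal = relative_entropy(p_a, frequencies(group_a + small_b))  # A vs. AB
re_a_ab_large = relative_entropy(p_a, frequencies(group_a + large_b))  # A vs. AB

print(re_a_b, re_a_ab_equal, re_a_ab_large)
```

For fully different groups, A vs. B is infinite, while A vs. AB stays finite but changes from 1 bit to 2 bits when B grows from 6 to 18 sequences, illustrating the dependence on relative group size.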

Comparing Groups: Sequence Harmony

• Weigh groups A and B equally:

• Take p^A + p^B instead of p^{AB}

$$SH_i^{A/B} = -\sum_x p_{i,x}^A \log \frac{p_{i,x}^A}{p_{i,x}^A + p_{i,x}^B}$$

Defined on the fixed interval [0, 1] (for base-2 logarithms)

• one is complete overlap in composition: Harmony

• zero is no overlap in composition: No Harmony
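
The stated endpoints can be reproduced on toy columns. This sketch assumes the sum is taken with a leading minus sign and base-2 logarithms, which is the reading under which identical compositions give one and non-overlapping compositions give zero; probabilities are plain residue frequencies and the function names are chosen here.

```python
from collections import Counter
import math


def frequencies(column):
    """Residue frequencies of one alignment column as a dict."""
    total = len(column)
    return {aa: n / total for aa, n in Counter(column).items()}


def sequence_harmony(column_a, column_b):
    """-sum_x pA_x * log2(pA_x / (pA_x + pB_x)) for one alignment position."""
    p_a = frequencies(column_a)
    p_b = frequencies(column_b)
    return sum(-p * math.log2(p / (p + p_b.get(aa, 0.0)))
               for aa, p in p_a.items())


sh_identical = sequence_harmony("LLAA", "AALL")  # same composition
sh_disjoint = sequence_harmony("LLLL", "AAAA")   # no residue type shared
sh_partial = sequence_harmony("LLLL", "LLAA")    # partial overlap

print(sh_identical, sh_disjoint, sh_partial)
```

Because the denominator p^A + p^B is never zero for a residue present in A, the degenerate case of the A vs. B relative entropy does not arise here.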

Smad-MH2 Alignment & Sequence Harmony

• Walter Pirovano*, K. Anton Feenstra* and Jaap Heringa. “Sequence Comparison by Sequence Harmony Identifies Subtype Specific Functional Sites”, Nucleic Acids Res., in press (2006).

• K. Anton Feenstra, Walter Pirovano and Jaap Heringa. “Sub-type Specific Sites for SMAD Receptor Binding Identified by Sequence Comparison using ‘Sequence Harmony’ ”. in: From Computational Biophysics to Systems Biology. pp. 73-78. Eds. U.H.E. Hansmann, J. Meinke, S. Mohanty and O. Zimmermann, Jülich, NIC Series, Vol. 34, 2006.

• Elena Marchiori*, Walter Pirovano, Jaap Heringa and K. Anton Feenstra*. “A Feature Selection Algorithm for Detecting Subtype Specific Sites for Smad Receptor Binding”, Bio-ICMLA06, accepted (2006).

Smad2 H.sapiens  D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 D.melanogaster   D A A P V M Y H E P A F W C S I S Y Y E L N T R V G E T F H A S Q P S I T V D G F T D P S N S E - R F C L G L

Smad2 D.rerio   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 C.auratus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 R.norvegicus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 M.musculus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 D.rerio   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L C L

Smad3 S.scrofa   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 X.laevis   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 H.sapiens   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 M.musculus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R L C L G L

Smad3 C.auratus   D L Q P V T Y C E S A F W C S I S Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N A E - R F C L G L

Smad3 G.gallus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 S.scrofa   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 R.norvegicus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad1 S.mansoni   T M H P V N Y Q E P K Y W C S I V Y Y E L N N R V G E A F N A S Q L S I I I D G F T D P S N N S D R F C L G L

Smad1 M.musculus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 H.sapiens   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 S.scrofa   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 R.norvegicus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 X.tropicalis   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N R N R F C L G L

Smad1 G.gallus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L

Smad1 D.rerio   D V H P V A Y Q E P K H W C S I V Y Y E L N N R V G E A F L A S S T S V L V D G F T D P S N N R N R F C L G L

Smad1 C.coturnix   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L

Smad5 H.sapiens   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L

Smad5 M.musculus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L

Smad5 R.norvegicus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P A N N K S R F C L G L

Smad5 G.gallus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad5 D.rerio   D V Q P V E Y Q E P S H W C S I V Y Y E L N N R V G E A Y H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad8 M.musculus   D F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L

Smad8 R.norvegicus   D F R P V C Y E E P L H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L

Smad8 G.gallus   N F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S I L I D G F T D P S N N K N R F C L G L

[Alignment position ruler: 10, 20, 30, 40, 50]

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N E V V E Q T R R H I G K G V R L Y Y I G G E V F A E C L S D S S I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y D W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A T V E M T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D N A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N A A V E L T R R H I G R G V R L Y Y I G G E V F A E C L S D S A I F V Q S P N C N Q R Y G W H P A T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H N F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N F H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D S S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C L S D T S I F V Q S R N C N Y H H G F H P T T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K

L S N V N R N S T I E N T R R H I G K G V H L Y Y V G G E V Y A E C V S D S S I F V Q S R N C N Y Q H G F H P A T V C K

[Position ruler for the alignment block above: ticks at 60, 70, 80, 90, 100, 110]

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L S Q S V S Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y R L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C N L K I F N N Q E F A A - - - - L L A Q S V N Q G F E A V Y Q L T R M C T I R M S F V K G W G A E Y R R Q T V

I P P G C S L K I F S N Q E F A H - - - - L L S R T V H H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T L R M S F V K G W G A E Y H R Q D V

I P S R C S L K I F N N Q E F A E - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A K Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E T V Y E L T K M C T L R M S F V K G W G A E Y H R Q D V

I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S S C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q E F A Q - - - - L L A Q S V N H G F E A V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K V F N N Q L F A Q - - - - L L A Q S V H H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K V F N N Q L F A Q L L A Q L L A Q S V H H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

I P S G C S L K I F N N Q L F A Q - - - - P L A Q S V N H G F E V V Y E L T K M C T I R M S F V K G W G A E Y H R Q D V

[Position ruler for the alignment block above: ticks at 120, 130, 140, 150, 160, 170]

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D R V L T Q M G S P R L P C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S V R C S S M S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P N L R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W I E L H L N G P L Q W L D K V L T Q M G S P S I R C S S V S

T S T P C W V E I H L N G P L Q W L D R V L T Q M G T P R N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E V H L H G P L Q W L D K V L T Q M G S P L N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

T S T P C W I E I H L H G P L Q W L D K V L T Q M G S P H N P I S S V S

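[Position ruler for the alignment block above: ticks at 180, 190, 200, 210]

[Equation residue: of the Sequence Harmony formula for groups A and B only the symbols SH, A/B, i, x, p and log survive; the arrangement of the terms is not recoverable here]

[Plot residue: horizontal axis ticks from 260 to 460 in steps of 20; vertical axis from 0 to 1]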
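As a rough illustration of scoring one alignment position for two groups, the Python sketch below assumes a particular form for the score: SH = 1/2 * sum over residue types x of [ pA*log2((pA+pB)/pA) + pB*log2((pA+pB)/pB) ]. This form is an assumption chosen only because it gives 0 when the groups share no residue types and 1 when their compositions are identical, which matches how SH values are used later in the deck; it is not claimed to be the exact formula of the course. The function names and the toy columns are hypothetical.

```python
import math
from collections import Counter


def frequencies(column):
    """Relative frequency of each residue type in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return {res: n / total for res, n in counts.items()}


def sequence_harmony(column_a, column_b):
    """Harmony-like score for one column (assumed form, see lead-in).

    Returns 0.0 when the groups share no residue types and 1.0 when
    their residue compositions are identical.
    """
    pa = frequencies(column_a)
    pb = frequencies(column_b)
    sh = 0.0
    for res in set(pa) | set(pb):
        a = pa.get(res, 0.0)
        b = pb.get(res, 0.0)
        # Terms with zero frequency contribute nothing (p*log p -> 0).
        if a > 0:
            sh += 0.5 * a * math.log2((a + b) / a)
        if b > 0:
            sh += 0.5 * b * math.log2((a + b) / b)
    return sh


# Toy columns (hypothetical data): one residue per sequence in each group.
print(sequence_harmony("TTTTTTT", "AAAAAAA"))  # disjoint compositions
print(sequence_harmony("LLLLLLL", "LLLLLLL"))  # identical compositions
print(sequence_harmony("QQQQQQQ", "SSSSSSQ"))  # partial overlap
```

Under this assumed form the score is symmetric in the two groups, so swapping the arguments gives the same value.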

Page 20

Smad2 H.sapiens  D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 D.melanogaster   D A A P V M Y H E P A F W C S I S Y Y E L N T R V G E T F H A S Q P S I T V D G F T D P S N S E - R F C L G L

Smad2 D.rerio   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 C.auratus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 R.norvegicus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 M.musculus   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L G L

Smad2 D.rerio   D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N S E - R F C L C L

Smad3 S.scrofa   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 X.laevis   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 H.sapiens   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 M.musculus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R L C L G L

Smad3 C.auratus   D L Q P V T Y C E S A F W C S I S Y Y E L N Q R V G E T F H A S Q P S L T V D G F T D P S N A E - R F C L G L

Smad3 G.gallus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 S.scrofa   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad3 R.norvegicus   D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G F T D P S N S E - R F C L G L

Smad1 S.mansoni   T M H P V N Y Q E P K Y W C S I V Y Y E L N N R V G E A F N A S Q L S I I I D G F T D P S N N S D R F C L G L

Smad1 M.musculus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 H.sapiens   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 S.scrofa   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 R.norvegicus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad1 X.tropicalis   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N R N R F C L G L

Smad1 G.gallus   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L

Smad1 D.rerio   D V H P V A Y Q E P K H W C S I V Y Y E L N N R V G E A F L A S S T S V L V D G F T D P S N N R N R F C L G L

Smad1 C.coturnix   D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G F T D P S N N K N R F C L G L

Smad5 H.sapiens   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L

Smad5 M.musculus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K S R F C L G L

Smad5 R.norvegicus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P A N N K S R F C L G L

Smad5 G.gallus   D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad5 D.rerio   D V Q P V E Y Q E P S H W C S I V Y Y E L N N R V G E A Y H A S S T S V L V D G F T D P S N N K N R F C L G L

Smad8 M.musculus   D F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L

Smad8 R.norvegicus   D F R P V C Y E E P L H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G F T D P S N N R N R F C L G L

Smad8 G.gallus   N F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S I L I D G F T D P S N N K N R F C L G L

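[Position ruler for the alignment above: ticks at 260, 270, 280, 290, 300, 310. Group labels: AR for the first 15 sequences (Smad2, Smad3), BR for the remaining 17 (Smad1, Smad5, Smad8)]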

Smads: Comparing two Groups

Page 21

Finding Low-harmony sites in Smad-MH2

Pos.   Sec.str.  SH    AR    BR    Interaction
L263   B1’       0     La    Vfm   SARA
T267   B1’       0     Tm    Acen  SARA
S269   loop      0     CSh   Eq    ?? (putative)
A272   loop      0     A     Kqls  ?? (putative)
F273   loop      0     F     Hy    ?? (putative)
Q284   B2        0     Qt    N     TβR-I
Q294   loop      0.16  Q     Sq    c-Ski/SnoN
P295   B3        0     P     Trl   c-Ski/SnoN
L297   B3        0.11  LMi   Vi    c-Ski/SnoN
T298   B3        0     T     Li    c-Ski/SnoN
S308   L1        0     Sa    N     c-Ski/SnoN
E309   L1        0     E     Krs   c-Ski/SnoN
–      L1        0     –     Nsd   c-Ski/SnoN
A323   H1        0     Ae    S     ALK1/2
V325   H1        0     V     I     ?? (putative)
M327   H1        0     LMq   N     ALK1/2
R334   loop      0.18  Rk    K     ?? (putative)
R337   B5        0     R     H     ?? (putative)

[Position ruler for the alignment excerpt below: ticks at 270, 280, 290]

D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G

D A A P V M Y H E P A F W C S I S Y Y E L N T R V G E T F H A S Q P S I T V D G

D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G

D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G

D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G

D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G

D L Q P V T Y S E P A F W C S I A Y Y E L N Q R V G E T F H A S Q P S L T V D G

D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G

D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G

D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G

D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G

D L Q P V T Y C E S A F W C S I S Y Y E L N Q R V G E T F H A S Q P S L T V D G

D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G

D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G

D L Q P V T Y C E P A F W C S I S Y Y E L N Q R V G E T F H A S Q P S M T V D G

T M H P V N Y Q E P K Y W C S I V Y Y E L N N R V G E A F N A S Q L S I I I D G

D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G

D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G

D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G

D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G

D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G

D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G

D V H P V A Y Q E P K H W C S I V Y Y E L N N R V G E A F L A S S T S V L V D G

D V Q A V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S I L V D G

D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G

D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G

D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G

D V Q P V A Y E E P K H W C S I V Y Y E L N N R V G E A F H A S S T S V L V D G

D V Q P V E Y Q E P S H W C S I V Y Y E L N N R V G E A Y H A S S T S V L V D G

D F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G

D F R P V C Y E E P L H W C S V A Y Y E L N N R V G E T F Q A S S R S V L I D G

N F R P V C Y E E P Q H W C S V A Y Y E L N N R V G E T F Q A S S R S I L I D G

[Position ruler tick for the alignment excerpt above: 300]

Page 22

Finding Low-harmony sites in Smad-MH2

21%

7%

28%

33%

79%

93%

(SH=0) 32

(SH<0.2) 40

Sequence Harmony

59%10%31%12SDPpred

21%

3%

%FP

48%

76%

%FN

52%21TreeDet

21%6AMAS

%TPPredictMethod


Page 23

Smad-MH2: Low Harmony Patches

Page 24

Smad-MH2: Functional Clusters

Figure labels (structure image not preserved):

Residues: R462, C463, Q400, R410, W368, Y366, A392, S269, F273, N443, Q294, Q309, L297, L440, N381, A354, V461, S460, Q407, Q364, P360, R365, T267, A272, I341, P295, S308, T298, R337, F346, P378, Q284, V325, A323, R427, M327, T430, R334

Interaction partners: FAST1, Mixer, SARA; c-Ski/SnoN; SARA; TβR-I/ALK1/2; TβR-I/BMPR-I; ?; SARA/Mixer; TβR-I/BMPR-I/ALK1/2; ?

Functional classes: receptor-binding; retention & transcription factors; co-repressors

Page 25

Comparison to Other Prediction Methods

[Plot: sensitivity (0% to 100%) against specificity (80% to 100%). Series: AMAS, SDP-pred, TreeDet, Sequence Harmony. Annotations: 23 sites, 8 sites]
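The two axes of these comparison plots, sensitivity and specificity, follow the usual definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The Python sketch below shows how they could be computed for one set of predicted sites. The function name, the site sets and the alignment length are made-up values for illustration, not the benchmark data of the course.

```python
def sensitivity_specificity(predicted, known, all_positions):
    """Return (sensitivity, specificity) for a set of predicted sites.

    predicted     -- positions selected by a method
    known         -- positions with known functional relevance
    all_positions -- every position in the alignment
    """
    predicted = set(predicted)
    known = set(known)
    all_positions = set(all_positions)

    tp = len(predicted & known)                  # predicted and known
    fp = len(predicted - known)                  # predicted but not known
    fn = len(known - predicted)                  # known but missed
    tn = len(all_positions - predicted - known)  # neither

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Hypothetical example: 100 positions, 4 known sites, 5 predictions.
positions = range(1, 101)
known = {10, 20, 30, 40}
predicted = {10, 20, 30, 50, 60}
sens, spec = sensitivity_specificity(predicted, known, positions)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Repeating this for a range of score cutoffs gives one point per cutoff, which is how a curve on such a plot can be traced.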

Page 26

Comparison to Other Prediction Methods (2)

[Plot: sensitivity (0% to 100%) against specificity (80% to 100%). Series: AMAS cumulative, AMAS, SDP-pred, TreeDet, SH + Entropy (inc), SH + Entropy (dec), Sequence Harmony]

Page 27

Comparison to Other Prediction Methods (3)

[Plot: sensitivity (0% to 100%) against specificity (80% to 100%). Series: AMAS cumulative, AMAS, SDP-pred, TreeDet, SH + Ranges + E(inc), SH + Ranges + E(dec), SH + Entropy (inc), SH + Entropy (dec), Sequence Harmony. Annotations: 18 sites, 2]

Page 28

Ras family: Rab5 vs. Rab6

[Plot: sensitivity (0% to 100%) against specificity (40% to 100%). Series: SDP-pred, TreeDet, SH + E(dec), SH + E(inc), SH + Ranges + E(dec), Sequence Harmony]

Page 29

Ras Family: Ras vs. Ral

[Plot: sensitivity (0% to 100%) against specificity (75% to 100%). Series: SDPpred, TreeDet, SH + E(dec), SH + E(inc), SH + Ranges + E(dec), Sequence Harmony]

Page 30

MIP family: AQP vs. GLP

[Plot: sensitivity (0% to 100%) against specificity (0% to 100%). Series: SDPpred (5Å), TreeDet (5Å), Sequence Harmony (5Å)]

Page 31

Conclusions Smad-MH2 Sequence Harmony

• 40 sites of low Sequence Harmony in Smad-MH2

• different between the AR (TGF-β) and BR (BMP) sub-type Smads

• Low-harmony sites in Smad-MH2 are functionally relevant

• Other methods do not select all known (functional) sites!

Sequence information maps to structure:

• 14 low-harmony sites in Smad-MH2 of unknown function

• 11 putative functions from structural considerations

• promising candidates that determine TGF-β/BMP specificity

• confirm (or refute) putative functions?

Next: Analyze Protein-Protein Interactions

Page 32

General Conclusions

• Lack of experimental data

• Adequate quality and quantity hard to attain

• Discriminating power of test-sets varies

• Conservation is not the best identifier for functional differences

• Selections too conservative and not very specific

• Differences, as measured by Sequence Harmony, are a good alternative

• Selections include most known sites, but with somewhat lower specificity

Page 33

Brajenovic, M. et al. J. Biol. Chem. 2004;279:12804-12811

Connectivity map of human Par complexes based on TAP purifications and co-immunoprecipitation experiments

Connectivity map of human Par complexes based on TAP purifications and co-immunoprecipitation experiments. The TAP-tagged proteins used as baits are represented as rhomboids. Lines connecting proteins indicate presence in a TAP complex or coimmunoprecipitation (dotted lines). The width of each line represents the degree of sequence coverage of the identification, which depends on the robustness of the interaction but also on the expression level and a number of other factors. Green boxes/lines represent previously known interactors/interactions; red boxes/lines represent novel interactors/interactions. Proteins that are found specifically with only one TAP-protein are grouped in boxes (S1–S6), whereas proteins that are consistently found together with more than one TAP-protein are grouped in modules (M1 and M2).

Page 34

Page 35

Page 36

Charting protein complexes, signaling pathways, and networks in the immune system

Bauch A, Superti-Furga G Source: IMMUNOLOGICAL REVIEWS 210: 187-207 APR 2006

Page 37

Copyright ©2006 by the National Academy of Sciences

Yang, Xiaowen et al. (2006) Proc. Natl. Acad. Sci. USA 103, 17237-17242

Fig. 3. The selective nature of the primary interaction site

Canonical interaction motifs:

Mode I: R/K-X-X-S/T-X-P

Mode II: R/K-X-X-X-S/T-X-P

Mode III: S-W-T-Y (C-term.)
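As a small illustration, the three canonical motifs listed above can be written as regular expressions. This Python sketch takes the slide's notation literally (X is any residue, "/" separates alternatives, Mode III is anchored at the C-terminus) and ignores phosphorylation state. The function name and the test peptides are invented for the example.

```python
import re

# Literal translations of the motif notation; "." stands for X (any residue).
MOTIFS = {
    "Mode I": re.compile(r"[RK]..[ST].P"),
    "Mode II": re.compile(r"[RK]...[ST].P"),
    "Mode III": re.compile(r"SWTY$"),  # only at the C-terminal end
}


def find_motifs(sequence):
    """Return {mode: [0-based start positions]} for each motif that occurs.

    Uses non-overlapping matches, which is enough for a quick scan.
    """
    hits = {}
    for mode, pattern in MOTIFS.items():
        starts = [match.start() for match in pattern.finditer(sequence)]
        if starts:
            hits[mode] = starts
    return hits


# Invented peptides, one per mode.
print(find_motifs("AARSASAPGG"))  # {'Mode I': [2]}
print(find_motifs("GGKAAATAPG"))  # {'Mode II': [2]}
print(find_motifs("MMMMSWTY"))    # {'Mode III': [4]}
```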

Page 38

Page 39

Yang, Xiaowen et al. (2006) Proc. Natl. Acad. Sci. USA 103, 17237-17242

Fig. 5. Dynamic nature of the 14-3-3 dimers

Fig. 5. Dynamic nature of the 14-3-3 dimers. (A) Crystal structure of the apo-isoform looking down the peptide binding grooves, which are labeled open and closed for the individual monomers. (B) Superimposition of all seven closed state 14-3-3 isoforms using only one monomer as the reference, with shown in blue and in green. The other 14-3-3 monomers, which have intermediate positions, are colored transparent gray.

Page 40

Yang, Xiaowen et al. (2006) Proc. Natl. Acad. Sci. USA 103, 17237-17242

Fig. 1. Overview of the dimeric 14-3-3 structure

Fig. 1. Overview of the dimeric 14-3-3 structure. Helices and loops involved in target domain interactions are labeled. Each monomer is colored blue to red from the N to C terminus. An aperture exists at the central dimeric interface, which is marked with a circle.

Yang et al. 2006 Structural basis for protein–protein interactions in the 14-3-3 protein family PNAS 103, 17237

Page 41

Yang, Xiaowen et al. (2006) Proc. Natl. Acad. Sci. USA 103, 17237-17242

Fig. 2. Schematic representation of the heterodimerization process involving the 14-3-3epsilon (green) and zeta (yellow) isoforms

The lines between identified residues indicate specific interactions

Page 42

HIV Differential Progression/Replication

• Differences in disease progression in HIV-infected patients based on:

• Immunotype (e.g., B57 vs. non-B57)

• Occurrence of specific 'escape' mutations

• Aim: apply Sequence Harmony to find (additional) key sites that determine disease progression or viral replication rates

Page 43

Input: multiple sequence alignment of capsid protein

Page 44

Comparison of multiple groups:

• B57 vs. non-B57

• 'Progressors' (P) vs. 'Long-term non-progressors' (L)

• Early stage vs. Late stage

• Late stage: progressors (P) vs. non-progressors (L)

• This last comparison is especially interesting: what defines 'non-progression'?

Page 45

HIV Capsid Specificity: B57 vs. non-B57

• 36 residues selected from the 422-residue alignment, below the cutoff of 0.9

• 26 sites (excluding gaps), including all 7 known B57 escape mutations

Position (Ali)  Position (Ref)  Sequence Harmony  Rank
251             T242            0.05              1
156             I147            0.44              2
123             -               0.49              5
15              R15             0.50              1
182             S173            0.68              1
122             -               0.70              5
136             -               0.75              7
121             -               0.76              5
257             G248            0.78              1
12              E12             0.80              1
168             V159            0.80              1
62              G62             0.82              1
401             T389            0.83              1
55              E55             0.83              2
127             N124            0.83              5
130             Q127            0.83              7
277             L268            0.84              1
390             -               0.85              1
104             I104            0.87              1
53              T53             0.87              2
273             R264            0.87              1
125             T122            0.87              5
132             -               0.87              7
133             -               0.87              7
134             -               0.87              7
131             -               0.87              7
135             -               0.87              7
111             S111            0.88              1
409             K397            0.88              1
289             T280            0.88              1
224             L215            0.88              1
91              R91             0.89              1
155             A146            0.90              2
379             V370            0.90              1
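The selection step behind this table can be sketched as a simple filter: keep the positions whose Sequence Harmony score lies below a cutoff, then optionally drop positions that are gaps ("-") in the reference sequence. The Python example below uses six rows copied from the table. The tuple layout and the function name `select_sites` are assumptions made for this example, not part of the Sequence Harmony software.

```python
# (alignment position, reference position, Sequence Harmony, rank)
# Six rows copied from the table; "-" marks a gap in the reference.
ROWS = [
    (251, "T242", 0.05, 1),
    (156, "I147", 0.44, 2),
    (123, "-", 0.49, 5),
    (15, "R15", 0.50, 1),
    (122, "-", 0.70, 5),
    (91, "R91", 0.89, 1),
]


def select_sites(rows, cutoff=0.9, include_gaps=True):
    """Return rows with harmony strictly below cutoff, lowest score first."""
    selected = []
    for ali_pos, ref_pos, harmony, rank in rows:
        if harmony >= cutoff:
            continue  # not below the cutoff
        if not include_gaps and ref_pos == "-":
            continue  # gap in the reference sequence
        selected.append((ali_pos, ref_pos, harmony, rank))
    return sorted(selected, key=lambda row: row[2])


print(len(select_sites(ROWS)))                      # all six rows pass 0.9
print(len(select_sites(ROWS, include_gaps=False)))  # four rows without gaps
print([row[0] for row in select_sites(ROWS, cutoff=0.5)])
```

A stricter cutoff shortens the list, which is the trade-off between the number of sites selected and how distinct the two groups are at those sites.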

Page 46

Output nB57/B57: Structure

Page 47

Output: 'Stereotypes'

Page 48

Output: Distinct Specificity Regions

n-B57 vs. LP

L-early vs L-late

L vs. P

L vs. P-late

L-late vs. P-late

P-early vs. P-late

Page 49

Output: Detail in the sequence(s)

Page 50

Sequence comparison by ‘Sequence Harmony’ identifies subtype-specific functional sites

… end …
