Sequence variations, protein interactions, and diseases
Teresa Przytycka
NIH / NLM / NCBI
October 7, 2009

Inferring protein interaction from sequence co-evolution
• Protein interaction and sequence co-evolution
Interacting proteins are expected to co-evolve to ensure proper binding
BSOSC Review, November 2008 4
Mirrortree method: inferring protein interaction from the co-evolution principle
[Goh et al. 2000, Pazos and Valencia 2001]
[Figure: orthologs of Protein A and Protein B in Organisms 1 to n; phylogenetic trees; n × n distance matrix for each protein; evolutionary vectors; compute correlation]
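The correlation step of the mirrortree idea can be sketched in a few lines. This is a minimal illustration, not the published implementation: it assumes the two distance matrices are already computed over the same organisms in the same order, and the function names and toy matrices are hypothetical.

```python
from math import sqrt

def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix
    into an evolutionary vector."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)

def mirrortree_score(dist_a, dist_b):
    """Correlate the evolutionary vectors of two proteins.
    Both matrices must list the same organisms in the same order."""
    if len(dist_a) != len(dist_b):
        raise ValueError("distance matrices must cover the same organisms")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))

# Toy distance matrices over four organisms (made-up numbers).
dist_a = [[0, 1, 2, 3],
          [1, 0, 2, 3],
          [2, 2, 0, 3],
          [3, 3, 3, 0]]
# Protein B evolves twice as fast but along the same tree shape.
dist_b = [[2 * d for d in row] for row in dist_a]
# Protein C has an unrelated distance pattern.
dist_c = [[0, 3, 1, 2],
          [3, 0, 3, 1],
          [1, 3, 0, 2],
          [2, 1, 2, 0]]

score_ab = mirrortree_score(dist_a, dist_b)
score_ac = mirrortree_score(dist_a, dist_c)
```

A high score alone does not separate functional co-evolution from shared speciation history, which is the first of the open questions raised in the talk.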
Simple idea but lots of questions…
• How to separate co-evolution due to common speciation history from co-evolution due to function?
• Is the co-evolution signal distributed uniformly over the sequence? Or only between binding sites?
• Predicting interaction specificity
Kann et al. Proteins 2007
Kann et al. JMB 2009
Jothi et al. JMB 2007
Jothi et al. Bioinformatics 2005
Do binding sites co-evolve more tightly?
Binding sites are important, but they are not the only contributor to the co-evolutionary signal
Kann et al. JMB 2009
Predicting interacting domains
Given interacting multi-domain proteins, predict the domains that are in contact
Jothi et al. JMB 2007
Mirror tree approach can be used to recognize interacting domains
[Figure: MSAs of the orthologs of Protein A and Protein B (Organisms 1 to n); domain alignments; similarity matrices; correlation for each domain pair. Correlation values shown: 0.63, 0.83, 0.79 and 0.59, 0.91, 0.89]
Results: in 64% of cases, the domain pair with the highest correlation was interacting (55% expected by chance)
Jothi et al. JMB 2007
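The domain-level prediction can be sketched as: score every pair of one domain from Protein A and one domain from Protein B, then report the pair with the highest correlation. The sketch below assumes each domain has already been reduced to an evolutionary vector over the same organism pairs; the domain names, vectors, and function names are hypothetical.

```python
from itertools import product
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)

def best_domain_pair(domains_a, domains_b):
    """Score every (domain of A, domain of B) pair and return the pair
    with the highest correlation together with all scores."""
    scores = {
        (name_a, name_b): pearson(vec_a, vec_b)
        for (name_a, vec_a), (name_b, vec_b)
        in product(domains_a.items(), domains_b.items())
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy evolutionary vectors per domain (made-up numbers).
domains_a = {"A1": [1, 2, 3, 4, 5, 6],
             "A2": [2, 1, 4, 3, 6, 5]}
domains_b = {"B1": [1.1, 2.0, 2.9, 4.2, 5.0, 6.1],
             "B2": [3, 1, 2, 3, 1, 2]}

best, scores = best_domain_pair(domains_a, domains_b)
```

As the reported 64% versus 55% suggests, the top-scoring pair is a prediction with modest margin over chance, not a guarantee of physical contact.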
Predicting interacting domains
• Given interacting multi-domain proteins, predict the domains that are in contact (Jothi et al. JMB 2007)
• Given a protein-protein interaction network, predict the domains that are in contact (Guimaraes et al. Genome Biology 2008)
• Protein interaction and sequence co-evolution
• Predicting domain interaction from protein interaction networks
Parsimony approach
Assumption: Protein interactions are mediated by domain interactions
Hypothesis: Interactions evolved in the most parsimonious way
Method: Find the smallest set of domain pairs whose interaction would explain all protein interactions in the network
Guimaraes et al. Genome Biology 2008
Linear programming formulation
For each domain pair Di Dj: variable xij taking value 0 or 1
Constraints: one per protein interaction Pm Pn (at least one domain pair Di Dj, with Di in Pm and Dj in Pn, must have xij = 1)
Objective function (representing the parsimony assumption): minimize the number of domain pairs with xij = 1
Interacting domain pairs: domain pairs with xij = 1
Additional problems to solve:
• Model the noise in the network
• Estimate p-values
Guimaraes et al. Genome Biology 2008
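The parsimony objective (smallest set of domain pairs that explains every protein interaction) can be illustrated on a toy network. The sketch below is not the linear programming solver from the talk: it swaps in an exhaustive search over subsets, which is only practical for very small inputs, and it ignores network noise and p-values. Protein names, domain names, and function names are hypothetical.

```python
from itertools import combinations, product

def candidate_pairs(domains_m, domains_n):
    """All domain pairs (Di, Dj) with Di in protein Pm and Dj in protein Pn,
    stored in sorted order so that (Di, Dj) and (Dj, Di) coincide."""
    return {tuple(sorted(pair)) for pair in product(domains_m, domains_n)}

def most_parsimonious_pairs(protein_domains, interactions):
    """Return a smallest set of domain pairs such that every protein
    interaction contains at least one chosen pair (brute force)."""
    constraints = [
        candidate_pairs(protein_domains[m], protein_domains[n])
        for m, n in interactions
    ]
    if any(not c for c in constraints):
        raise ValueError("an interaction has no candidate domain pair")
    universe = sorted(set().union(*constraints))
    # Try subsets in order of increasing size: the first subset that
    # satisfies every constraint has the minimum number of pairs.
    for size in range(1, len(universe) + 1):
        for chosen in combinations(universe, size):
            chosen_set = set(chosen)
            if all(c & chosen_set for c in constraints):
                return chosen_set
    return set()  # reached only when there are no interactions

# Toy network (made-up proteins and domains).
protein_domains = {
    "P1": ["D1", "D2"],
    "P2": ["D3"],
    "P3": ["D1"],
    "P4": ["D3", "D4"],
}
interactions = [("P1", "P2"), ("P3", "P4"), ("P1", "P4")]

explaining_pairs = most_parsimonious_pairs(protein_domains, interactions)
```

Here a single domain pair, D1 with D3, appears in all three interactions, so it alone explains the toy network. When several minimum-size solutions exist, this sketch returns whichever one it encounters first.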
Results compared to previous methods: identifying the interacting domain pair in an interacting protein pair
• Protein interaction and sequence co-evolution
• Predicting domain interaction from protein interaction networks
• Combining genetic sequence variation, genome-wide expression profiles, and protein interactions to infer pathways dysregulated in complex diseases
Genetic variations in individuals affect gene expression levels
[Figure: gene expression matrix (Gene 1, Gene 2, Gene 3, … across Controls 1 to 3 and Cases 1 to 8) and a matrix of genotype variations (genomic loci across cases), labeled eQTL]
Huang et al. Bioinformatics 2009
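An eQTL association pairs a genomic locus with a gene whose expression level tracks the genotype at that locus. A minimal sketch of such a scan follows, assuming genotypes are coded as allele counts (0, 1, 2) and using a plain least-squares fit with an arbitrary r² cutoff. The cutoff, names, and data are hypothetical, and a real analysis would also handle covariates and multiple testing.

```python
def linear_fit(genotypes, expression):
    """Least-squares fit expression = intercept + slope * genotype.
    Returns (slope, intercept, r_squared), or None if either input
    has zero variance."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e)
              for g, e in zip(genotypes, expression))
    if sxx == 0 or syy == 0:
        return None
    slope = sxy / sxx
    return slope, mean_e - slope * mean_g, sxy ** 2 / (sxx * syy)

def eqtl_scan(genotype_by_locus, expression_by_gene, min_r2=0.8):
    """List (locus, gene, r_squared) for every pair whose fit
    reaches the r_squared cutoff."""
    hits = []
    for locus, genotypes in genotype_by_locus.items():
        for gene, expression in expression_by_gene.items():
            fit = linear_fit(genotypes, expression)
            if fit is not None and fit[2] >= min_r2:
                hits.append((locus, gene, fit[2]))
    return hits

# Six individuals; genotypes are allele counts (made-up data).
genotype_by_locus = {
    "locus1": [0, 0, 1, 1, 2, 2],
    "locus2": [0, 1, 0, 1, 0, 1],
}
expression_by_gene = {
    "gene1": [1.0, 1.2, 2.1, 1.9, 3.0, 3.1],
    "gene2": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],
}

hits = eqtl_scan(genotype_by_locus, expression_by_gene)
```

In this toy data only the locus1 and gene1 pair passes the cutoff, since gene1 expression rises with each additional allele at locus1.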
Bringing in the PPI network and other high-throughput networks
[Figure: gene expression matrix (genes across controls and cases) and genomic loci across cases, labeled eQTL, causal gene, and target gene]
Kim et al., submitted
Associations of genes expressed differently in disease and control groups are the primary target
[Figure: gene expression matrix (genes across controls and disease cases) and loci across cases, labeled eQTL, putative causal mutations, and representative target disease genes]
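Selecting genes that are expressed differently between cases and controls can be sketched with a per-gene two-sample statistic. The example below uses Welch's t-statistic as a stand-in. The talk does not say which test was used, so this choice, the function names, and the data are all assumptions.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(cases, controls):
    """Welch's t-statistic for two samples with possibly unequal variances.
    Each sample needs at least two values."""
    standard_error = sqrt(variance(cases) / len(cases)
                          + variance(controls) / len(controls))
    return (mean(cases) - mean(controls)) / standard_error

def rank_target_genes(expression, case_ids, control_ids):
    """Rank genes by absolute Welch t-statistic, most different first.
    `expression` maps gene -> {sample id -> expression level}."""
    scores = {
        gene: welch_t([levels[s] for s in case_ids],
                      [levels[s] for s in control_ids])
        for gene, levels in expression.items()
    }
    ranked = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
    return ranked, scores

# Three controls and four cases (made-up expression levels).
control_ids = ["control1", "control2", "control3"]
case_ids = ["case1", "case2", "case3", "case4"]
expression = {
    "gene1": {"control1": 1.0, "control2": 1.1, "control3": 0.9,
              "case1": 3.0, "case2": 3.2, "case3": 2.9, "case4": 3.1},
    "gene2": {"control1": 2.0, "control2": 2.2, "control3": 1.8,
              "case1": 2.1, "case2": 1.9, "case3": 2.0, "case4": 2.2},
}

ranked, scores = rank_target_genes(expression, case_ids, control_ids)
```

Genes at the top of such a ranking would be the candidates whose eQTL associations are examined first.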
Uncovering causal genes and dysregulated pathways
[Figure: gene expression matrix (genes across controls and disease cases) and loci across cases, labeled eQTL, disease markers, and causal genes]
Kim et al., submitted
Acknowledgments
Former lab members
Katia Guimaraes (associate professor, Brazil)
Raja Jothi (currently PI at NIEHS)
Elena Zotenko (currently Max Planck Institute)
Current lab members
Dong Yeon Cho
Yoo-ah Kim
Yang Huang
Damian Wojtowicz
Jie Zheng
Collaborators
Maricel Kann (UMBC)
Evolutionarily conserved regions help separate functional co-evolution from co-evolution due to common speciation history
• Compute conservation profile: low entropy means evolutionarily conserved
• Use conserved positions only
• Additional correction using previously mentioned methods
Kann et al. Proteins 2007
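The first two steps (conservation profile, then conserved positions only) can be sketched with per-column Shannon entropy over a multiple sequence alignment. The entropy cutoff, the function names, and the toy alignment are hypothetical. Gaps are treated as just another symbol, which is a simplification.

```python
from collections import Counter
from math import log2

def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column.
    Low entropy means the position is evolutionarily conserved."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def conservation_profile(msa):
    """Entropy of every column of an alignment given as equal-length strings."""
    if len({len(seq) for seq in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(column) for column in zip(*msa)]

def conserved_positions(msa, max_entropy=0.5):
    """Indices of columns whose entropy is at or below the cutoff."""
    return [i for i, h in enumerate(conservation_profile(msa))
            if h <= max_entropy]

def restrict(msa, positions):
    """Keep only the selected columns of the alignment."""
    return ["".join(seq[i] for i in positions) for seq in msa]

# Toy alignment of four orthologs (made-up sequences).
msa = ["MKVLA",
       "MRVLG",
       "MKVIA",
       "MQVLS"]

profile = conservation_profile(msa)
kept = conserved_positions(msa)
conserved_msa = restrict(msa, kept)
```

The reduced alignment would then be the input for distance matrices and the correlation step, so that the score reflects the conserved regions only.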