Evolutionarily conserved motifs and modules in mitochondrial protein-protein interaction networks
Mohieddin Jafari, Mehdi Sadeghi, Mehdi Mirzaie, Sayed-Amir Marashi, Mostafa Rezaei-Tavirani
PII: S1567-7249(13)00250-X
DOI: 10.1016/j.mito.2013.09.006
Reference: MITOCH 859
To appear in: Mitochondrion
Received date: 20 May 2013
Revised date: 18 August 2013
Accepted date: 23 September 2013
Please cite this article as: Jafari, Mohieddin, Sadeghi, Mehdi, Mirzaie, Mehdi, Marashi, Sayed-Amir, Rezaei-Tavirani, Mostafa, Evolutionarily conserved motifs and modules in mitochondrial protein-protein interaction networks, Mitochondrion (2013), doi: 10.1016/j.mito.2013.09.006
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
were applied to analyze the global properties of the common PIN. Using these
tools, various network properties like closeness centrality and betweenness
centrality were computed. The final visualization of the common PIN was done with the
ForceAtlas2 layout in Gephi. To find modules, the simulated annealing
algorithm (Wang and Zhang, 2007) was used. The MAVisto software (Schreiber and
Schwöbbermeyer, 2005) was applied to detect network motifs with three, four, five
and six nodes. Z-scores and p-values were computed by generating one hundred
randomized networks. Following Zhang et al. (2005), we implemented a
procedure to find network themes (see Supplementary file 2).
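The Z-score computation can be illustrated with a minimal sketch. It assumes, for illustration only, that the motif of interest is the triangle, that the null model is degree-preserving rewiring by double-edge swaps, and that the p-value is the empirical fraction of randomized networks containing at least as many triangles as the observed network. The text states only that one hundred randomized networks were generated, so the procedure implemented in MAVisto may differ. All function names and the toy graph are hypothetical.

```python
import random
from statistics import mean, pstdev

def count_triangles(edges):
    """Count triangles in an undirected simple graph given as a set of frozenset edges."""
    adj = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    for e in edges:
        u, v = tuple(e)
        total += len(adj[u] & adj[v])  # each common neighbour closes a triangle
    return total // 3  # every triangle is seen once per edge

def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization by attempted double-edge swaps (assumed null model)."""
    edge_list = [tuple(e) for e in edges]
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # the swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set:
            continue  # the swap would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_set

def motif_z_score(edges, n_random=100, seed=0):
    """Return (observed count, Z-score, empirical p-value) for the triangle motif."""
    rng = random.Random(seed)
    observed = count_triangles(edges)
    null = [count_triangles(rewire(edges, 10 * len(edges), rng))
            for _ in range(n_random)]
    sd = pstdev(null)
    z = (observed - mean(null)) / sd if sd > 0 else float("nan")
    p = sum(c >= observed for c in null) / n_random
    return observed, z, p

# Hypothetical toy network: two triangles joined by two extra edges.
toy_edges = {frozenset(e) for e in
             [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3), (0, 5)]}
print(motif_z_score(toy_edges, n_random=100, seed=1))
```

The same scheme extends to larger motifs by replacing `count_triangles` with a counter for the subgraph of interest.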
2.4. Leave-one-out cross validation
The PIN of zebrafish (Danio rerio) was selected to validate the results obtained
from the analysis of the common PIN. The same procedure was followed to extract
the zebrafish mitochondrial PIN data. Then, we obtained a list of the common
mitochondrial OPs of all five species (Supplementary file 1B). Our goal here was to
investigate whether interactions among these OPs in one species can be
predicted from the common interactions of the four other species. A
leave-one-out cross validation was then performed as follows. At each iteration, the PIN of one
species was compared to the common PIN of the four other species. In each
comparison, the numbers of common edges and network motifs were analyzed.
3. Results and Discussion
3.1. Construction of the common PIN
It is known that there are many interactions between mitochondrial and
cytoplasmic proteins. However, mitochondrial proteins and pathways are more
conserved across different eukaryotic species than proteins and pathways
in the cytoplasm (Müller et al., 2012). Therefore, we focused on the mitochondrial PIN
to include only those proteins and pathways which are persistently present in the
four distantly-related species, namely human, mouse, fruit fly and worm.
To construct the common PIN of different mitochondria, the redundancy of
orthologous protein sets (OPs, which are the nodes of the common PIN) should be
minimized. However, in the process of finding orthologous proteins, we found
examples, resulting from evolutionary events such as gene duplication, of
multiple proteins in one species which are orthologous to a single protein of another
species. Edges were then added between those OP pairs which have conserved
interactions across the four species, and isolated single nodes were excluded.
The resulting PIN includes 80 nodes and 169 edges, and the overall percentage of
unique nodes (i.e., those OPs which share no proteins with other OPs) increased
from 39.1% to 60.1%. Still, there were some similar OPs which differ in only one
or two proteins (40.9%). Some of these nodes have the same neighbors. These
nodes were merged as well. The final common PIN contains 51 nodes and 111
edges, comprising two connected components of sizes 2 and 49. The ratio of unique
OPs in the final network is 83.3% (Fig. 3). This value is comparable to what was
reported in a previous study (Stuart et al., 2003). The major connected component
of the common PIN, which contains 49 nodes and 110 edges, was used in the rest
of our analysis (Fig. 4 and Supplementary file 3).
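The construction steps described in this subsection can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: each species' PIN is assumed to be already expressed as edges between OP identifiers, the merging of near-identical OPs is omitted, and all names and data are hypothetical.

```python
from collections import defaultdict

def common_pin(species_edges):
    """Keep only OP-OP edges observed in every species.

    Isolated OPs drop out automatically because only edge endpoints are kept.
    """
    edge_sets = [{frozenset(e) for e in edges} for edges in species_edges.values()]
    common = set.intersection(*edge_sets)
    nodes = {n for e in common for n in e}
    return nodes, common

def connected_components(nodes, edges):
    """Return the connected components as sets of nodes, largest first."""
    adj = defaultdict(set)
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        components.append(comp)
    return sorted(components, key=len, reverse=True)

def unique_op_ratio(op_members):
    """Fraction of OPs that share no protein with any other OP."""
    owners = defaultdict(int)
    for proteins in op_members.values():
        for p in set(proteins):
            owners[p] += 1
    unique = [op for op, ps in op_members.items()
              if all(owners[p] == 1 for p in ps)]
    return len(unique) / len(op_members)

# Hypothetical toy data: edges between OP identifiers in four species.
toy_species_edges = {
    "human": [("OP1", "OP2"), ("OP2", "OP3"), ("OP4", "OP5")],
    "mouse": [("OP2", "OP1"), ("OP2", "OP3"), ("OP4", "OP5")],
    "fly":   [("OP1", "OP2"), ("OP4", "OP5"), ("OP3", "OP5")],
    "worm":  [("OP1", "OP2"), ("OP5", "OP4")],
}
toy_nodes, toy_common = common_pin(toy_species_edges)
print([len(c) for c in connected_components(toy_nodes, toy_common)])
```

Storing each edge as a `frozenset` makes the comparison independent of the order in which the two OPs are listed.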
In the common PIN, the average path length and diameter are 3.54 and 8,
respectively, which indicates that this network is small-world (Yu et al., 2008).
Moreover, the average degree and heterogeneity of this network are 4.49 and 0.68,
respectively, indicating that the common PIN is scale-free.
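As a consistency check, the reported average degree follows from the size of the major component: 2 × 110 / 49 ≈ 4.49. The sketch below shows how the average degree, average shortest path length and diameter can be computed by breadth-first search. It assumes a connected, undirected, unweighted graph with at least two nodes; the function names and the toy graph are hypothetical, and heterogeneity is left out because the text does not state which definition was used.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def global_properties(adj):
    """Return (average degree, average shortest path length, diameter)."""
    n = len(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    avg_degree = 2 * n_edges / n
    lengths = []
    for s in adj:
        d = bfs_distances(adj, s)
        lengths.extend(dist for t, dist in d.items() if t != s)
    return avg_degree, sum(lengths) / len(lengths), max(lengths)

# Hypothetical toy path graph a-b-c-d (not data from the paper).
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(global_properties(toy))

# Average degree of the reported major component (49 nodes, 110 edges).
print(round(2 * 110 / 49, 2))  # 4.49
```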
3.2. Functional modules in the common PIN
In the common PIN, we found five modules using the simulated annealing algorithm
(Guimerà and Amaral, 2005; Wang and Zhang, 2007). Each of these modules
corresponds well to a specific biological function. The functional modules can
be classified into five major categories (Table 1). The module related to the
translation process and ribosome structure contains eight unique OPs, including S7
and S16 (in the 28S subunit), L11 (in the 39S subunit), one ribosomal release factor,
two translation elongation factors, and additionally two adenylate kinases. In this
module, S7 is connected to all other nodes; it is therefore a provincial hub that may
play a key role in the translation process.
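Hub roles of this kind are usually assessed with the two quantities defined by Guimerà and Amaral (2005): the within-module degree z-score and the participation coefficient. The sketch below computes both for a toy network. Whether the authors used exactly these quantities to label provincial and global hubs is an assumption, and the function name, node names and module labels are hypothetical.

```python
from math import sqrt

def node_roles(adj, module_of):
    """Return {node: (z, P)}: within-module degree z-score and participation coefficient."""
    modules = set(module_of.values())
    # Number of links each node has to nodes of its own module.
    k_in = {n: sum(module_of[m] == module_of[n] for m in adj[n]) for n in adj}
    roles = {}
    for n in adj:
        peers = [k_in[m] for m in adj if module_of[m] == module_of[n]]
        mu = sum(peers) / len(peers)
        sd = sqrt(sum((k - mu) ** 2 for k in peers) / len(peers))
        z = (k_in[n] - mu) / sd if sd > 0 else 0.0
        k = len(adj[n])
        # P is 0 when all links stay in one module and approaches 1 when they are spread evenly.
        p = (1.0 - sum((sum(module_of[m] == s for m in adj[n]) / k) ** 2
                       for s in modules)) if k else 0.0
        roles[n] = (z, p)
    return roles

# Hypothetical toy network: hub "h" linked to every node of module 1 and to "x" in module 2.
toy_adj = {
    "h": {"a", "b", "c", "x"},
    "a": {"h"}, "b": {"h"}, "c": {"h"},
    "x": {"h", "y"}, "y": {"x"},
}
toy_modules = {"h": 1, "a": 1, "b": 1, "c": 1, "x": 2, "y": 2}
for node, (z, p) in sorted(node_roles(toy_adj, toy_modules).items()):
    print(node, round(z, 2), round(p, 3))
```

A node with a high z and a low participation coefficient has most of its links inside its own module, which is the usual sense of a provincial hub.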
The second module is related to protein import into the mitochondrial inner
membrane (eight proteins). Most of the OPs in this module are related to the TIM
complex (subunits 8, 9, 10, 13, 16 and 22). GrpE (a member of the PAM complex) and
L24 (a 39S ribosomal protein) also appear in this module.
The OPs in the third module are mostly involved in the TCA cycle, including citrate
synthase, three subunits of succinate dehydrogenase, 2-oxoglutarate
dehydrogenase, isocitrate dehydrogenase, succinyl-CoA ligase and frataxin. In this
module, succinate dehydrogenase iron-sulfur subunit (OP16) and ATP synthase
subunit alpha (OP4) are provincial and global hubs, respectively. OP16 has
connections to the electron transport chain module (i.e., the fourth module,
see below), while OP4 has links to both the electron transport chain and the
translation module. OP4 has the highest betweenness centrality value. This property
is related to the evolutionary importance of the ATP synthase subunit alpha as a
producer of ATP, the main product of mitochondria (Joy et al., 2005).
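Betweenness centrality counts, for each node, the shortest paths between other node pairs that pass through it. A minimal sketch using Brandes' algorithm for an undirected, unweighted graph is given here for illustration. The network analysis tool used by the authors may apply a different normalization, and the function name and toy graph are hypothetical.

```python
from collections import deque

def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes' algorithm)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, recording shortest-path counts and predecessors.
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the nodes farthest from s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy graph: node "x" bridges the pairs {a, b} and {c, d}.
toy_bridge = {
    "a": {"b", "x"}, "b": {"a", "x"},
    "x": {"a", "b", "c", "d"},
    "c": {"d", "x"}, "d": {"c", "x"},
}
print(betweenness_centrality(toy_bridge))
```

In the toy graph only the bridging node receives a non-zero value, which mirrors the role described for OP4 as a link between modules.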
The fourth module, which is related to the electron transport chain, includes
eight OPs: cytochrome b, four subunits of cytochrome c oxidase, chains 1 and 4 of
NADH-ubiquinone oxidoreductase and surfeit locus protein 1. The latter protein is
probably involved in the biogenesis of the COX complex (Zhu et al., 1998). In this
module, the subunits of cytochrome c oxidase have the highest within-module degree,
which confirms that this protein complex has a central role in the electron
transport chain.
The fifth module is involved in several metabolic processes, and includes three
enzymes of ubiquinone metabolism, three enzymes involved in the catabolism of
amino acids and two enzymes of lipid and lipoyl metabolism.
In the constructed common PIN, some protein-protein interaction data are lost
due to our strict rules for finding and keeping OPs. However, module analysis
shows that the modular structure of the common PIN agrees well with
distinct biological functions in the mitochondrion. Some previous studies (Wang and
Zhang, 2007) showed the opposite, i.e., no correlation between structural and
functional modules was found. The difference between these observations is
presumably due to the fact that we included only highly-reliable and
highly-conserved interactions in our common PIN of four mitochondria, while the previous
studies used a much larger dataset of whole PINs of organisms, which also includes
unreliable interactions.
In a further analysis, the modules in the common PIN were compared to the modules
found in the PIN of each species. More precisely, in the PIN of each species at least five
modules were found. Among these, we selected the five modules which
matched best with the five modules of the common PIN based on their shared
proteins. For each of the modules of the common PIN, the frequencies of the shared
proteins were computed. The results are summarized in Supplementary file 4. This
comparison shows that most of these modules do not show great variability
across the four species.
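The matching step can be sketched as a simple overlap search. This assumes that "matched best" means the species module sharing the largest number of proteins with a given common-PIN module, and that the reported frequency is the shared fraction of that common module. Both are illustrative assumptions, and the function name, module names and protein labels are hypothetical.

```python
def best_matching_modules(common_modules, species_modules):
    """For each common-PIN module, pick the species module sharing the most proteins.

    Returns {common module: (best species module, fraction of common module shared)}.
    """
    matches = {}
    for name, members in common_modules.items():
        best = max(species_modules,
                   key=lambda m: len(members & species_modules[m]))
        shared = members & species_modules[best]
        matches[name] = (best, len(shared) / len(members))
    return matches

# Hypothetical toy modules (labels are placeholders, not the paper's data).
toy_common_modules = {
    "translation": {"S7", "S16", "L11"},
    "TCA": {"CS", "SDHB", "IDH"},
}
toy_species_modules = {
    "m1": {"S7", "S16", "X"},
    "m2": {"CS", "SDHB", "IDH", "Y"},
    "m3": {"L11"},
}
print(best_matching_modules(toy_common_modules, toy_species_modules))
```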
It should be emphasized that with our method for finding OPs and common
interactions, some well-known protein complexes may be overlooked. For
example, we did not find the protein complexes responsible for some known mitochondrial
functions, such as the TOM, PAM and SAM complexes. To investigate why
these complexes are absent from the common PIN, we studied the three
complexes as follows. By a full-text search of the UniProtKB database, we found
a total of 148 mitochondrial proteins related to (but not necessarily
involved in) these three complexes. By analyzing these 148 proteins, 48 OPs were
found. However, only 7 OPs have conserved interactions across human, mouse,
fruit fly and worm. Among these, six OPs, namely OP_23, OP_43, OP_44, OP_45,
OP_46 and OP_50, are not part of the TOM, PAM or SAM complexes (i.e., they were
found simply because the keywords TOM, PAM and SAM appear somewhere in
their annotation). The remaining OP, which is OP_30, is part of the SAM complex
but also has lipoyl synthase activity. In our analysis this OP is classified in the fifth
module, which mainly includes proteins with enzymatic activities.
Incompleteness of protein data and of interaction data may be responsible
for the relatively small size of the common PIN and the lack of many of the known
mitochondrial complexes. As an example, TOM7 is reported in the TrEMBL database
(which includes only predicted, unconfirmed protein sequences). Therefore, there is
no biological observation which can confirm the activity of this protein in fruit fly,
and hence TOM7 is not included in the common PIN. On the other hand, OP_a233
in Supplementary file 1A (i.e., TOM40) is present in all four species, but
there is no conserved interaction between this OP and any other OP of the
common PIN.
3.3. PIN motifs and themes
In another part of our study, motif analysis was performed on the common PIN.
We analyzed all of the conserved motifs containing three, four, five and six nodes.
In the common PIN, 92% of these motifs are significantly overrepresented in
comparison with the randomized networks (Supplementary file 5A). “Triangle”
and also “complete quadrangle” have significantly higher Z-scores than the other
three- and four-node motifs, while “V-motif”, “U-motif” and “Paw” are not found
to be significantly overrepresented (Table 2). Similar patterns are observed in the
analysis of network motifs of the mitochondrial PIN of each species
(Supplementary file 5B).
We decided to investigate the biological relevance of overrepresented network
motifs by comparing the proteins with the modules in which they are located. It is
difficult to draw biological conclusions from the analysis of three-node motifs in
undirected graphs (Milo et al., 2004). Since it has been proposed that biologically
relevant conclusions can be obtained by analyzing network themes (Zhang et al.,
2005), we decided to study network themes instead of the three-node motifs.
Figure 5 shows four network motifs and three sets of network themes and their
frequencies of appearance in the five modules of the common PIN. Explanations of
the possible biological relevance of these motifs and themes are presented in
Supplementary file 2.
While previous studies have reported that overrepresentation of network motifs
is related to biological processes (Wuchty et al., 2003), here we suggest that certain
network themes may also have specific biological functions. These results
emphasize that the motif and theme superfamilies have evolved for similar tasks
(Milo et al., 2004). The different biological functions are not only correlated with the
topological properties, but are also conserved during evolution.
3.4. Cross validation of PIN
Construction of the common PIN by strict rules results in a network with a very
low false positive rate. In order to confirm the accuracy of the common PIN, the
mitochondrial PIN of zebrafish was selected for leave-one-out cross validation.
Zebrafish was chosen for this analysis because it has an acceptable number of
known proteins in Swiss-Prot.
At the beginning of the cross-validation, we found a list of common OPs in all
five species. Then, in each iteration, the common PIN of four species was constructed
and compared to the PIN of the remaining species. In Table 3, an edge of the
remaining species is said to be confirmed by the common PIN if the same
interaction is observed in all four other species (the fourth column), or in at
least three of the four species (the sixth column).
The results of this analysis are shown in Table 3. Generally, there is an
acceptable agreement between the common PIN of four species and the PIN of the
remaining species, which was not used in the construction of the common PIN. When
an interaction needs to be observed in only three of the four other species to confirm
the corresponding interaction in the remaining species, the results improve to up to 94%.
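The two confirmation criteria can be sketched as a single leave-one-out routine with a support threshold. This is a minimal illustration, not the authors' code: it assumes every PIN is already restricted to the OPs common to all five species and stored as a set of undirected edges, and the helper names and toy edges are hypothetical.

```python
def edge_set(*pairs):
    """Build a set of undirected edges from (OP, OP) pairs."""
    return {frozenset(p) for p in pairs}

def leave_one_out(species_edges, min_support=None):
    """Fraction of each held-out species' edges confirmed by the other species.

    An edge is confirmed if it occurs in at least `min_support` of the other
    species; by default it must occur in all of them (the strict common PIN).
    """
    results = {}
    for held_out, edges in species_edges.items():
        others = [e for s, e in species_edges.items() if s != held_out]
        needed = len(others) if min_support is None else min_support
        confirmed = [e for e in edges if sum(e in o for o in others) >= needed]
        results[held_out] = len(confirmed) / len(edges) if edges else 0.0
    return results

# Hypothetical toy PINs over shared OPs A to E (not data from the paper).
toy_pins = {
    "human":     edge_set(("A", "B"), ("B", "C"), ("C", "D")),
    "mouse":     edge_set(("A", "B"), ("B", "C"), ("C", "D")),
    "fly":       edge_set(("A", "B"), ("B", "C")),
    "worm":      edge_set(("A", "B"), ("C", "D")),
    "zebrafish": edge_set(("A", "B"), ("B", "C"), ("D", "E")),
}
print(leave_one_out(toy_pins)["zebrafish"])                 # all four species required
print(leave_one_out(toy_pins, min_support=3)["zebrafish"])  # at least three required
```

In the toy data, relaxing the requirement from four to three supporting species raises the confirmed fraction for zebrafish from one third to two thirds, the same direction of change reported for Table 3.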
The “conservedness” of significantly overrepresented network motifs (see Table
2) is also investigated. In Table 4 we report the frequency of each network motif.
In general, we found an acceptable agreement between the motifs in the common
PIN and the motifs of the remaining PIN. Compared with the study of Wuchty et
al. (2003) (the last row of the table), the frequencies reported in our study are much
greater. This difference is presumably due to the fact that Wuchty et al.
considered only those interactions which were determined experimentally (retrieved
from the DIP database in 2003 (Xenarios et al., 2002)). Therefore, there is a
considerable probability that many interactions are not counted in their dataset,
which is equivalent to a large false negative rate.
4. Conclusions
In this paper, we introduce a network of common mitochondrial protein-protein
interactions using strict rules for finding common interactions. As a result, this
network has a minimal number of false positives. Additionally, in our study, both
computational and experimental data are exploited for construction, which
increases the number of included interactions in the common PIN.
We believe that application of the common PIN can stimulate more relevant
discussions on the biological properties of the protein-protein interaction network.
For example, finding a hub is more meaningful in the common PIN than a full PIN
of an organism, due to the reliability of the edges in the common PIN.
Acknowledgements
The authors would like to thank Zhi Wang and Jianzhi Zhang (University of Michigan) for the
module finding program (the implementation of the simulated annealing algorithm). We also
thank Sadegh Azimzadeh Jamalkandi (NIGEB), Ali Sharifi-Zarchi (University of Tehran) and
Hadi Pourmohammadi (Shahid Beheshti University) for their valuable comments during this
project. This work is in part supported by a grant from Institute for Research in Fundamental
Sciences (IPM) (No. CS 1391-0-01).
References
Amoutzias, G.D., Robertson, D.L., Bornberg-Bauer, E., 2004. The evolution of protein interaction networks in regulatory proteins. Comp. Funct. Genomics 5, 79-84.
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G., 2000. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25-29.
Bastian, M., Heymann, S., Jacomy, M., 2009. Gephi: An Open Source Software for Exploring and Manipulating Networks, International AAAI Conference on Weblogs and Social Media, Paris, France.
Berg, J., Lässig, M., Wagner, A., 2004. Structure and evolution of protein interaction networks: a statistical model for link dynamics and gene duplications. BMC Evol. Biol. 4, 51.
Bhatia, V.N., Perlman, D.H., Costello, C.E., McComb, M.E., 2009. Software tool for researching annotations of proteins: open-source protein annotation software with data visualization. Anal. Chem. 81, 9819-9823.
del Sol, A., Fujihashi, H., O'Meara, P., 2005. Topology of small-world networks of protein-protein complex structures. Bioinformatics 21, 1311-1315.
Dong, J., Horvath, S., 2007. Understanding network concepts in modules. BMC Syst. Biol. 1, 24.
Estrada, E., 2010. Quantifying network heterogeneity. Phys. Rev. E. 82, 066102.
Fraser, H.B., Wall, D.P., Hirsh, A.E., 2003. A simple dependence between protein evolution rate and the number of protein-protein interactions. BMC Evol. Biol. 3, 11.
Grindrod, P., Kibble, M., 2004. Review of uses of network and graph theory concepts within proteomics. Expert Rev. Proteomics 1, 229-238.
Guimerà, R., Amaral, L.A.N., 2005. Functional cartography of complex metabolic networks. Nature 433, 895-900.
Hirsh, A.E., Fraser, H.B., 2001. Protein dispensability and rate of evolution. Nature 411, 1046-1049.
Joy, M.P., Brock, A., Ingber, D.E., Huang, S., 2005. High-betweenness proteins in the yeast protein interaction network. J. Biomed. Biotechnol. 2005, 96-103.
Junker, B.H., Schreiber, F., 2007. Analysis of Biological Networks. John Wiley & Sons, Inc.
Kiemer, L., Cesareni, G., 2007. Comparative interactomics: comparing apples and pears? Trends Biotechnol. 25, 448-454.
Milo, R., Itzkovitz, S., Kashtan, N., Levitt, R., Shen-Orr, S., Ayzenshtat, I., Sheffer, M., Alon, U., 2004. Superfamilies of evolved and designed networks. Science 303, 1538-1542.
Müller, M., Mentel, M., van Hellemond, J.J., Henze, K., Woehle, C., Gould, S.B., Yu, R.-Y., van der Giezen, M., Tielens, A.G.M., Martin, W.F., 2012. Biochemistry and evolution of anaerobic energy metabolism in eukaryotes. Microbiol. Mol. Biol. Rev. 76, 444-495.
Perocchi, F., Jensen, L.J., Gagneur, J., Ahting, U., von Mering, C., Bork, P., Prokisch, H., Steinmetz, L.M., 2006. Assessing systems properties of yeast mitochondria through an interaction map of the organelle. PLoS Genet. 2, e170.
Pfeiffer, T., Soyer, O.S., Bonhoeffer, S., 2005. The evolution of connectivity in metabolic networks. PLoS Biol. 3, e228.
Pieroni, E., de la Fuente van Bentem, S., Mancosu, G., Capobianco, E., Hirt, H., de la Fuente, A., 2008. Protein networking: insights into global functional organization of proteomes. Proteomics 8, 799-816.
Reja, R., Venkatakrishnan, A.J., Lee, J., Kim, B.C., Ryu, J.W., Gong, S., Bhak, J., Park, D., 2009. MitoInteractome: mitochondrial protein interactome database, and its application in 'aging network' analysis. BMC Genomics 10, S20.
Schreiber, F., Schwöbbermeyer, H., 2005. MAVisto: a tool for the exploration of network motifs. Bioinformatics 21, 3572-3574.
Shutt, T.E., Shadel, G.S., 2007. Expanding the mitochondrial interactome. Genome Biol. 8, 203-203.
Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L., Ideker, T., 2011. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27, 431-432.
Stuart, J.M., Segal, E., Koller, D., Kim, S.K., 2003. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249-255.
Szklarczyk, D., Franceschini, A., Kuhn, M., Simonovic, M., Roth, A., Minguez, P., Doerks, T., Stark, M., Muller, J., Bork, P., Jensen, L.J., von Mering, C., 2011. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 39, D561-568.
Thomas, A., Cannings, R., Monk, N.A.M., Cannings, C., 2003. On the structure of protein-protein interaction networks. Biochem. Soc. Transac. 31, 1491-1496.
Wagner, A., 2003. How the global structure of protein interaction networks evolves. Proc. Biol. Sci. 270, 457-466.
Wang, Z., Zhang, J., 2007. In search of the biological significance of modular structures in protein networks. PLoS Comput. Biol. 3, e107.
Watts, D.J., 2004. The “New” Science of Networks. Annu. Rev. Sociol. 30, 243-270.
Wuchty, S., Oltvai, Z.N., Barabási, A.L., 2003. Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat. Genet. 35, 176-179.
Xenarios, I., Salwínski, L., Duan, X.J., Higney, P., Kim, S.M., Eisenberg, D., 2002. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucl. Acids Res. 30, 303-305.
Yu, Q.B., Li, G., Wang, G., Sun, J.C., Wang, P.C., Wang, C., Mi, H.L., Ma, W.M., Cui, J., Cui, Y.L., Chong, K., Li, Y.X., Li, Y.H., Zhao, Z., Shi, T.L., Yang, Z.N., 2008. Construction of a chloroplast protein interaction network and functional mining of photosynthetic proteins in Arabidopsis thaliana. Cell Res. 18, 1007-1019.
Zhang, L.V., King, O.D., Wong, S.L., Goldberg, D.S., Tong, H.Y., Lesage, G., Andrews, B., Bussey, H., Roth, F.P., 2005. Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J. Biol. 4, 6.
Zhu, Z., Yao, J., Johns, T., Fu, K., De Bie, I., Macmillan, C., Cuthbert, A.P., Newbold, R.F., Wang, J., Chevrette, M., Brown, G.K., Brown, R.M., Shoubridge, E.A., 1998. SURF1, encoding a factor involved in the biogenesis of cytochrome c oxidase, is mutated in Leigh syndrome. Nat. Genet. 20, 337-343.
Figure legends
Figure 1: List of network motifs. Ten different network motifs, along with the names used in our
manuscript, are presented.
Figure 2: (A) Example of an orthologous protein set, OP_44. An edge links protein X and
protein Y if the protein sequences have ≥30% identity in pairwise sequence alignment. (B)
Example of two similar OPs (OP_45 and OP_46). Three of the four proteins show ≥30% identity
in the OPs. The difference in the fruit fly proteins results in different OPs.
Figure 3: (A) OP multiplicity. The plot shows the percentage of OPs (y-axis) that contain a particular
number of proteins from a single organism (x-axis). For example, 96.1% of OPs
include a unique mouse protein and 3.9% of OPs include a mouse protein repeated two times.
No OP contains a mouse protein that is repeated three times. (B) Summary of pairwise
alignment results. Four bar plots show the total number of mitochondrial proteins in each organism
mutually compared to the others: human, mouse, fruit fly and worm, respectively.
Figure 4: The 49 proteins which are persistently present in the five modules of the common PIN
(modules found by the simulated annealing algorithm). Nodes with greater degrees are shown
as circles with greater diameters. Nodes in each module are shown with a unique color.
Figure 5: Distinct distributions of the four significantly overrepresented network motifs (Table
2) and three sets of themes in the common PIN. The themes in panel (E) are all possible themes
made by combinations of Triangle motifs sharing one certain edge. The resulting themes have ≥4
nodes and ≥5 edges. Panel (F) summarizes all themes made of Triangles, assuming that each pair
of Triangles shares a unique edge. Panel (G) includes all themes which can be made by Triangles
sharing a certain node.
Tables
Table 1: Properties of nodes in the common mitochondrial PIN. The module IDs and the values of four different
centrality measures, including degree, closeness and betweenness, are given for all OPs in the common PIN. Proteins
are categorized based on the functions in which their modules are involved, namely, translation, inner membrane
import, TCA cycle, electron transport and metabolic activities.