Evolutionarily conserved motifs and modules in mitochondrial protein-protein interaction networks Mohieddin Jafari, Mehdi Sadeghi, Mehdi Mirzaie, Sayed-Amir Marashi, Mostafa Rezaei-Tavirani PII: S1567-7249(13)00250-X DOI: doi: 10.1016/j.mito.2013.09.006 Reference: MITOCH 859 To appear in: Mitochondrion Received date: 20 May 2013 Revised date: 18 August 2013 Accepted date: 23 September 2013 Please cite this article as: Jafari, Mohieddin, Sadeghi, Mehdi, Mirzaie, Mehdi, Marashi, Sayed-Amir, Rezaei-Tavirani, Mostafa, Evolutionarily conserved motifs and modules in mitochondrial protein-protein interaction networks, Mitochondrion (2013), doi: 10.1016/j.mito.2013.09.006 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Evolutionarily conserved motifs and modules in mitochondrial protein–protein interaction networks




ACCEPTED MANUSCRIPT


Evolutionarily conserved motifs and modules in mitochondrial protein-protein interaction networks

Mohieddin Jafari1, Mehdi Sadeghi2*, Mehdi Mirzaie1,3, Sayed-Amir Marashi4*, Mostafa Rezaei-Tavirani1

1Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

2National Institute of Genetic Engineering and Biotechnology (NIGEB), Tehran, Iran.

3School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.

4Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran.

*Corresponding authors

Mehdi Sadeghi

National Institute of Genetic Engineering and Biotechnology, Pajoohesh Blvd., 17 Km Tehran-Karaj Highway,

Tehran, Iran. P.O. Box: 161/14965

Tel: +98-21 44580373

E-mail address: [email protected]

Sayed-Amir Marashi

Department of Biotechnology, College of Science, University of Tehran, Enghelab Ave., Tehran, Iran

E-mail address: [email protected]

Email addresses:

MJ: [email protected]

MM: [email protected]

SAM: [email protected]

MRT: [email protected]


Abstract

Advances in organelle interactomics have led to new insights into organelle functions. In this study, we considered the common mitochondrial PIN of four evolutionarily distant eukaryotic species, namely Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans. By comparative interactomics analysis of the mitochondrial PINs in these organisms, five conserved modules were identified. These modules cover the main mitochondrial tasks and comprise proteins involved in the translation process, mitochondrial import inner membrane proteins, TCA cycle enzymes, the mitochondrial electron transport chain, and metabolic enzymes. Furthermore, we reemphasize that subgraphs of the network, i.e., motifs and themes, may represent evolutionarily conserved topological units which are biologically significant.

Keywords: Comparative interactomics; Organelle interactomics; Mitochondria; Evolution.

Abbreviations: PIN, Protein Interaction Network; OP, Orthologous Protein.


1. Introduction

Protein-protein interaction network (PIN) analysis is one of the fastest growing research areas in molecular systems biology (Kiemer and Cesareni, 2007). Analysis of PINs has improved our understanding of cellular processes at a systems level. Because of the inherent complexity of PINs, one proposed strategy in such studies is to first partition the large-scale networks into isolated subnetworks, such as organelles.

Amongst cellular organelles, the mitochondrion has attracted considerable attention in proteomics and interactomics studies, mainly because of its importance in energy production, apoptosis, aging and human diseases (Shutt and Shadel, 2007). In a previous study, a yeast PIN including 876 proteins was constructed (Perocchi et al., 2006). Analysis of this PIN resulted in the functional annotation of the unknown proteins. In another study, a database called MitoInteractome was created by predicting mitochondrial PINs in 74 species. Additionally, human PIN data were used to construct the network of proteins involved in aging (Reja et al., 2009).

In the following subsection, we formally introduce the definitions which will be

used throughout the rest of the paper.


1.1. Basic definitions in PIN analysis

Each PIN contains proteins as nodes (or vertices), which are connected by edges (or links) representing physical or functional interactions (Pieroni et al., 2008). PINs are usually represented as undirected graphs, although interactions might be directed. The distance between nodes i and j is the length of the shortest path between i and j, i.e., the minimum number of edges that must be traversed to travel from i to j.

The degree of a node is defined as the number of links attached to the node. A network is said to be scale-free if the degree distribution of its nodes follows a power-law distribution (del Sol et al., 2005). In many real-world networks, including PINs, the node degree distribution follows a power law, although some studies have reported that the degree distributions in the PINs of yeast and humans do not exactly follow a power-law distribution (Grindrod and Kibble, 2004; Thomas et al., 2003).

Each “high-degree” node of a network is called a hub. A global hub is a node which is a hub in the complete network, while a provincial hub is a node which is a hub only in a subnetwork (Guimerà and Amaral, 2005).

A function that assigns a numerical value to a node is called a centrality (Junker and Schreiber, 2007). For example, degree is a centrality function. The closeness centrality of node v is defined as the inverse of the sum of the distances between v and every other node in the network. The betweenness centrality of node v is defined as the number of shortest paths (between any other nodes i and j) which include v.
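
The distance and centrality definitions above can be made concrete with a short sketch. The Python code below is an illustration only, not the implementation used by Gephi or Cytoscape; the function names and the adjacency-dictionary input are assumptions. It counts shortest paths literally, as in the definition given in the text, whereas many software packages instead sum, for each pair of nodes, the fraction of shortest paths that pass through v.

```python
from collections import deque


def bfs(adj, source):
    """Return (dist, sigma) from `source`: shortest-path lengths and
    the number of distinct shortest paths to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def closeness(adj, v):
    """Inverse of the sum of distances from v to all reachable nodes."""
    dist, _ = bfs(adj, v)
    total = sum(d for node, d in dist.items() if node != v)
    return 1.0 / total if total else 0.0


def betweenness_count(adj, v):
    """Number of shortest paths between unordered pairs {i, j}
    (with i, j different from v) that pass through v."""
    dist_v, sigma_v = bfs(adj, v)
    others = [n for n in adj if n != v]
    count = 0
    for a, i in enumerate(others):
        dist_i, sigma_i = bfs(adj, i)
        for j in others[a + 1:]:
            if v in dist_i and j in dist_i and j in dist_v:
                if dist_i[v] + dist_v[j] == dist_i[j]:
                    count += sigma_i[v] * sigma_v[j]
    return count


# Toy path graph a - b - c - d (hypothetical node names).
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(len(path["b"]))                # degree of b: 2
print(closeness(path, "b"))          # 1 / (1 + 1 + 2) = 0.25
print(betweenness_count(path, "b"))  # paths a-c and a-d pass through b: 2
```

For disconnected networks this sketch sums distances over reachable nodes only, which is one of several possible conventions.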

The heterogeneity of a network is defined as the coefficient of variation of the degree distribution. Mathematically speaking, heterogeneity is equal to the square root of the variance of the degrees (i.e., their standard deviation), divided by the mean of the degrees (Dong and Horvath, 2007). Scale-free networks typically have a heterogeneity value of 0.4 to 0.6 (Estrada, 2010).
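
As an illustration of this definition, the following minimal sketch computes heterogeneity from a list of node degrees. It assumes the population standard deviation; the text does not say whether the population or the sample variance is intended, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def heterogeneity(degrees):
    """Coefficient of variation of the degrees: standard deviation / mean.
    Assumption: population standard deviation (pstdev)."""
    return pstdev(degrees) / mean(degrees)


# A star with one centre and four leaves has degrees [4, 1, 1, 1, 1]:
# mean 1.6, standard deviation 1.2, heterogeneity 0.75.
print(heterogeneity([4, 1, 1, 1, 1]))
# A ring, where every node has degree 2, is perfectly homogeneous.
print(heterogeneity([2, 2, 2, 2]))
```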

Some studies have focused on different subgraphs which appear in a PIN,

including modules, motifs and themes. A partitioning of a network into

subnetworks is called a modular decomposition if each subnetwork includes a high

number of inside-subnetwork links and a low number of between-subnetwork links

(Guimerà and Amaral, 2005). Any small subgraph is called a network motif

(Wuchty et al., 2003). The list of all three- and four-node network motifs is

presented in Figure 1. If the frequency of a motif in a network is statistically

significantly higher than what is expected by chance, then this network motif is

said to be overrepresented. A network theme is a simple graph formed as a union

of identical network motifs (Zhang et al., 2005).


1.2. Biological networks and evolution

It is widely accepted that fitness of elements within a biological system depends

on their interactions. Additionally, it is believed that networks are generally

optimized by natural selection (Pfeiffer et al., 2005). There are two lines of studies

related to network evolution.

Firstly, some studies have presented models for the evolution of network

structures. For example, it has been shown that the scale-free characteristics of

complex networks can arise from preferential attachment of new nodes to the

previous nodes with high degrees (hubs) (Watts, 2004). Clearly, by removing or

adding edges, changes may occur in the topological properties of networks

(Amoutzias et al., 2004; Berg et al., 2004; Wagner, 2003).
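
The preferential attachment mechanism mentioned above can be sketched in a few lines. The code below is a simplified growth model in the spirit of the Barabási-Albert model and is given for illustration only; the function name, the parameter `m` (edges added per new node) and the complete-graph seed network are assumptions, not details taken from the cited studies.

```python
import random


def preferential_attachment(n_nodes, m, seed=0):
    """Grow an undirected network in which each new node links to m
    distinct existing nodes chosen with probability proportional to degree."""
    rng = random.Random(seed)
    # Seed network: a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = preferential_attachment(n_nodes=200, m=2, seed=1)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
# Early nodes tend to accumulate many links and become hubs.
print(sorted(degree.values(), reverse=True)[:5])
```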

Secondly, some studies have investigated the properties of real-world networks in order to understand their evolutionary relationships. In such studies, conserved network substructures are typically examined. For instance, PIN motifs in six evolutionarily distant organisms have been previously investigated (Wuchty et al., 2003), confirming that network motifs could be recognized as evolutionarily conserved topological units. Moreover, it was shown that the evolutionary rate of proteins is directly related to fitness, necessity and their interactions with other proteins (Fraser et al., 2003; Hirsh and Fraser, 2001). In another study, gene-expression correlation networks were used to identify functionally conserved modules in four evolutionarily distant organisms (Stuart et al., 2003). It has been suggested that conserved subgraphs in biological networks represent conservation of similar functions (Milo et al., 2004).

1.3. The goal of the present work

In the present study, mitochondrial PINs in four organisms (Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Homo sapiens) are analyzed with a comparative interactomics approach to discover a “common PIN” which includes high-confidence protein-protein interactions. The resulting network, in turn, helps in finding the conserved network motifs and modules. Finally, the relationship between these substructures and their biological functions is discussed.

2. Materials and Methods

2.1. Datasets

Mitochondrial protein IDs of four organisms, namely, Mus musculus (mouse), Drosophila melanogaster (fruit fly), Caenorhabditis elegans (worm) and Homo sapiens (human) were retrieved from Swiss-Prot. Briefly, in the UniProtKB dataset we used Advanced Search to find all the proteins in the desired “organism” by choosing “subcellular locations” as “mitochondrion” and finally choosing the “reviewed” proteins only (i.e., only the manually annotated and reviewed proteins of Swiss-Prot). For example, human mitochondrial proteins can be retrieved by searching “(organism: "Homo sapiens" AND annotation: (type: location "mitochondrion")) AND reviewed: yes” in UniProtKB. For human, mouse, fruit fly and worm, 915, 873, 199 and 183 proteins were retrieved, respectively.

In the next step, the STRING database was used to find the interactions among these proteins (Szklarczyk et al., 2011). We used the default settings of STRING, which include all methods for identifying protein-protein interactions (namely “neighborhood”, “gene fusion”, “co-occurrence”, “co-expression”, “experiments”, “databases” and “text mining”) with a confidence score of ≥0.4. All data were extracted from UniProt release 2012_06 and STRING 9.0 (17 Jan. 2012).
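
The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the input format (triples of two protein IDs and a score on a 0 to 1 scale) and the protein IDs are hypothetical, and no STRING file format or API is assumed.

```python
def filter_interactions(scored_edges, proteins, min_score=0.4):
    """Keep interactions whose two partners are both in `proteins`
    (e.g. the mitochondrial protein list of one species) and whose
    confidence score is at least `min_score`."""
    kept = set()
    for a, b, score in scored_edges:
        if a != b and a in proteins and b in proteins and score >= min_score:
            kept.add(frozenset((a, b)))  # undirected edge
    return kept


# Hypothetical toy data: P1..P3 are mitochondrial, X9 is not.
mito = {"P1", "P2", "P3"}
raw = [("P1", "P2", 0.90), ("P2", "P3", 0.39),
       ("P1", "X9", 0.80), ("P3", "P1", 0.40)]
print(filter_interactions(raw, mito))  # keeps P1-P2 and P1-P3
```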

For annotating proteins with gene ontology (GO) terms (Ashburner et al., 2000), we used the “biological processes” category. STRAP software (Bhatia et al., 2009) was employed to retrieve the data from the gene ontology database.

2.2. Orthologous protein sets and the “common PIN”


The Needleman-Wunsch algorithm, an exact global pairwise alignment algorithm, was used to find orthologous protein sets (OPs) among the four organisms. Each OP is a set of four proteins, one from each of the four species, with mutual sequence identity of ≥30% (Fig. 2 and Supplementary file 1A). Some OPs contain similar proteins of the same species, due to the existence of paralogous proteins. In such cases, the corresponding proteins were merged if they had the same neighbors in their corresponding networks.
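
To illustrate the alignment step, the sketch below implements Needleman-Wunsch global alignment and a percent-identity measure. It is not the authors' implementation: the scoring scheme (match +1, mismatch -1, linear gap penalty -1) and the definition of identity as identical columns divided by alignment length are assumptions, since the text specifies only the algorithm and the 30% threshold. Protein alignments in practice usually rely on a substitution matrix.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of sequences a and b; returns the two aligned
    strings, with '-' marking gaps. The scoring values are assumptions."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a, b):
    """Identical aligned columns as a percentage of the alignment length."""
    top, bottom = needleman_wunsch(a, b)
    if not top:
        return 0.0
    matches = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * matches / len(top)


# Toy sequences: six identical columns out of seven, about 85.7% identity.
print(percent_identity("GATTACA", "GATACA"))
```

Under these assumptions, two proteins would be treated as candidates for the same OP when `percent_identity` returns at least 30 for every pair in the set.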

In the next step, an edge was added between two OPs, say OP1=(p1h, p1m, p1f, p1w) and OP2=(p2h, p2m, p2f, p2w), if the protein pairs (p1h, p2h), (p1m, p2m), (p1f, p2f) and (p1w, p2w) all interact with each other according to STRING. OPs that do not satisfy this condition with any other OP are not used in the common PIN. In the end, we obtain a network in which the nodes represent OPs and the edges represent common interactions. With the above-mentioned procedure, we construct the common PIN in such a way that the number of redundant OPs is minimized.
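
The edge rule described above can be written compactly. The following sketch is an illustration under assumed data structures (a dictionary of OPs and one edge set per species); all identifiers and protein IDs are hypothetical, and the merging of redundant OPs is not covered.

```python
SPECIES = ("human", "mouse", "fly", "worm")


def common_pin(ops, interactions):
    """ops: OP id -> {species: protein id}.
    interactions: species -> set of frozenset({protein, protein}).
    An edge joins two OPs only if their members interact in every species;
    OPs left without any edge are dropped."""
    edges = set()
    op_ids = sorted(ops)
    for idx, u in enumerate(op_ids):
        for v in op_ids[idx + 1:]:
            if all(frozenset((ops[u][s], ops[v][s])) in interactions[s]
                   for s in SPECIES):
                edges.add((u, v))
    nodes = {n for edge in edges for n in edge}
    return nodes, edges


# Hypothetical toy data: OP1-OP2 interact in all four species,
# OP1-OP3 interact everywhere except in worm.
ops = {
    "OP1": {"human": "h1", "mouse": "m1", "fly": "f1", "worm": "w1"},
    "OP2": {"human": "h2", "mouse": "m2", "fly": "f2", "worm": "w2"},
    "OP3": {"human": "h3", "mouse": "m3", "fly": "f3", "worm": "w3"},
}
interactions = {
    "human": {frozenset(("h1", "h2")), frozenset(("h1", "h3"))},
    "mouse": {frozenset(("m1", "m2")), frozenset(("m1", "m3"))},
    "fly": {frozenset(("f1", "f2")), frozenset(("f1", "f3"))},
    "worm": {frozenset(("w1", "w2"))},
}
print(common_pin(ops, interactions))  # only the OP1-OP2 edge survives
```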

2.3. Network analysis

Gephi (Bastian et al., 2009) and Cytoscape (Smoot ME, 2011) software packages were used to analyze the global properties of the common PIN. Using these tools, various network properties such as closeness centrality and betweenness centrality were computed. The final visualization of the common PIN was done with the ForceAtlas2 layout in Gephi. In order to find modules, a simulated annealing algorithm (Wang and Zhang, 2007) was used. MAVisto software (Schreiber and Schwöbbermeyer, 2005) was applied to detect network motifs with three, four, five and six nodes. Z-scores and p-values were computed by generating one hundred randomized networks. Following Zhang et al. (2005), we implemented a procedure to find network themes (see Supplementary file 2).
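
The idea behind the motif Z-scores can be illustrated for the simplest case, the triangle. The sketch below is not MAVisto's procedure: the randomization by degree-preserving edge swaps, the number of swaps, the empirical p-value and all function names are assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def count_triangles(edges):
    """Count triangles in an undirected graph given as unique (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is seen once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def rewire(edges, rng, n_swaps):
    """Randomize by swapping edge endpoints; every node keeps its degree."""
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set.difference_update((edge_list[i], edge_list[j]))
        edge_set.update((new1, new2))
        edge_list[i], edge_list[j] = new1, new2
    return edge_set


def motif_z_score(edges, n_random=100, seed=0):
    """Return (observed count, Z-score, empirical p-value) for triangles."""
    rng = random.Random(seed)
    observed = count_triangles(edges)
    counts = [count_triangles(rewire(edges, rng, 10 * len(edges)))
              for _ in range(n_random)]
    mu, sigma = mean(counts), pstdev(counts)
    z = (observed - mu) / sigma if sigma > 0 else float("nan")
    p = sum(c >= observed for c in counts) / n_random
    return observed, z, p


# Toy network: two triangles joined by one edge (hypothetical data).
toy = {(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)}
print(motif_z_score(toy))
```

A motif would be called overrepresented when its observed count lies far above the counts obtained in the randomized networks.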

2.4. Leave-one-out cross validation

The PIN of zebrafish (Danio rerio) was selected for validation of the results obtained from the analysis of the common PIN. The same procedure was followed to extract zebrafish mitochondrial PIN data. Then, we obtained a list of common mitochondrial OPs of all five species (Supplementary file 1B). Here, our goal is to investigate whether it is possible to predict interactions among these OPs in one species based on the common interactions of the four other species. A leave-one-out cross validation was then performed as follows. At each iteration, the PIN of one species was compared to the common PIN of the four other species. In each comparison, the numbers of common edges and network motifs were analyzed.
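
The leave-one-out loop can be sketched for edges as follows. The data layout (one edge set per species, expressed over the shared OPs) and the function name are assumptions, and the sketch reports raw counts only, since the exact ratio used for the comparison is not specified at this point in the text.

```python
def leave_one_out(edges_by_species):
    """edges_by_species: species -> set of frozenset({op_u, op_v}).
    For each held-out species, intersect the edge sets of the other
    species (their common PIN) and compare it with the held-out PIN.
    Returns species -> (common edges, held-out edges, shared edges)."""
    report = {}
    for held_out, held_edges in edges_by_species.items():
        others = [e for s, e in edges_by_species.items() if s != held_out]
        common = set.intersection(*others)
        shared = common & held_edges
        report[held_out] = (len(common), len(held_edges), len(shared))
    return report


# Hypothetical toy data over three OP pairs.
e1, e2, e3 = (frozenset(p) for p in (("A", "B"), ("B", "C"), ("C", "D")))
pins = {
    "human": {e1, e2, e3},
    "mouse": {e1, e2},
    "fly": {e1, e2},
    "worm": {e1},
    "zebrafish": {e1, e2},
}
for species, counts in leave_one_out(pins).items():
    print(species, counts)
```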


3. Results and Discussion

3.1. Construction of the common PIN

It is known that there are many interactions between mitochondrial and cytoplasmic proteins. However, mitochondrial proteins and pathways are more conserved across different eukaryotic species than proteins and pathways in the cytoplasm (Müller et al., 2012). Therefore, we focused on the mitochondrial PIN to include only those proteins and pathways which are persistently present in the four distantly related species, namely human, mouse, fruit fly and worm.

In order to obtain the common PIN of the different mitochondria, the redundancy of orthologous protein sets (OPs, which are the nodes of the common PIN) should be minimized. On the other hand, in the process of finding orthologous proteins, as a result of different evolutionary events such as gene duplication, we found cases in which several proteins of one species are orthologous to the same protein of another species. Edges were then added between those OP pairs which have conserved interactions across the four species, and isolated single nodes were excluded. The resulting PIN includes 80 nodes and 169 edges, and the overall percentage of unique nodes (i.e., those OPs which share no proteins with other OPs) increased from 39.1% to 60.1%. Still, there were some similar OPs which differ in only one or two proteins (40.9%). Some of these nodes have the same neighbors; these nodes were merged as well. The final common PIN contains 51 nodes and 111 edges and comprises two connected components of sizes 2 and 49. The ratio of unique OPs in the final network is 83.3% (Fig. 3). This value is comparable to what was reported in a previous study (Stuart et al., 2003). The major connected component of the common PIN, which contains 49 nodes and 110 edges, was used in the rest of our analysis (Fig. 4 and Supplementary file 3).

In the common PIN, the average path length and diameter are 3.54 and 8, respectively, which shows that this network is small-world (Yu et al., 2008). Moreover, the average degree and heterogeneity of this network are 4.49 and 0.68, respectively. This means that the common PIN is scale-free.
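
As a quick consistency check on the reported numbers: in an undirected network, the average degree equals twice the number of edges divided by the number of nodes. A minimal sketch (the function name is illustrative):

```python
def mean_degree(n_nodes, n_edges):
    """Average degree of an undirected network: each edge adds 2 to the degree sum."""
    return 2 * n_edges / n_nodes


# Major connected component of the common PIN: 49 nodes and 110 edges.
print(round(mean_degree(49, 110), 2))  # 4.49, matching the reported value
```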

3.2. Functional modules in the common PIN

In the common PIN, we found five modules using the simulated annealing algorithm (Guimerà and Amaral, 2005; Wang and Zhang, 2007). Each of these modules corresponds well to a specific biological function, so the functional modules can be classified into five major categories (Table 1). The module related to the translation process and ribosome structure contains eight unique OPs: S7 and S16 (in the 28S subunit), L11 (in the 39S subunit), one ribosomal release factor, two translation elongation factors and two adenylate kinases. In this module, S7 is connected to all other nodes, and it is therefore a provincial hub that may play a key role in the translation process.

The second module is related to protein import into the mitochondrial inner membrane (eight proteins). Most of the OPs in this module are related to the TIM complex (subunits 8, 9, 10, 13, 16 and 22). GrpE (a member of the PAM complex) and L24 (a 39S ribosomal protein) also appear in this module.

The OPs in the third module are mostly involved in the TCA cycle, including citrate synthase, three subunits of succinate dehydrogenase, 2-oxoglutarate dehydrogenase, isocitrate dehydrogenase, succinyl-CoA ligase and frataxin. In this module, the succinate dehydrogenase iron-sulfur subunit (OP16) and ATP synthase subunit alpha (OP4) are provincial and global hubs, respectively. OP16 has connections to the electron transport chain module (i.e., the fourth module, see below), while OP4 has links to both the electron transport chain and the translation modules. OP4 has the highest betweenness centrality value. This property is related to the evolutionary importance of the ATP synthase subunit alpha as a producer of ATP, the main product of mitochondria (Joy et al., 2005).

The fourth module, which is related to the electron transport chain, includes eight OPs: cytochrome b, four subunits of cytochrome c oxidase, chains 1 and 4 of NADH-ubiquinone oxidoreductase and surfeit locus protein 1. The latter protein is probably involved in the biogenesis of the COX complex (Zhu et al., 1998). In this module, the subunits of cytochrome c oxidase have the highest within-module degree, which confirms that this protein complex has a central role in the electron transport chain.

The fifth module is involved in several metabolic processes and includes three enzymes of ubiquinone metabolism, three enzymes involved in the catabolism of amino acids and two enzymes of lipid and lipoyl metabolism.

In the constructed common PIN, some protein-protein interaction data are lost due to our strict rules for finding and keeping OPs. However, module analysis shows that the modular structure of the common PIN agrees well with distinct biological functions in the mitochondrion. In some previous studies (Wang and Zhang, 2007), the opposite was shown, i.e., no correlation between structural and functional modules was found. The difference between these observations is presumably due to the fact that we included only highly reliable and highly conserved interactions in our common PIN of four mitochondria, while the previous studies used much larger datasets of whole-organism PINs, which also include unreliable interactions.

In a further analysis, the modules in the common PIN were compared to the modules found in the PIN of each species. More precisely, at least five modules were found in the PIN of each species. Among these modules, we selected the five that best matched the five modules of the common PIN based on their shared proteins. For each of the modules of the common PIN, the frequencies of the shared proteins were computed. The results are summarized in Supplementary file 4. This comparison shows that most of these modules do not show great variability across the four species.

It should be emphasized that with our method for finding OPs and common interactions, some well-known protein complexes may be overlooked. For example, we did not find the protein complexes responsible for some known mitochondrial functions, such as the TOM, PAM and SAM complexes. To better investigate the reason for the disappearance of these complexes from the common PIN, we studied these three complexes as follows. By a full-text search of the UniProtKB database, we found a total of 148 mitochondrial proteins related to (but not necessarily involved in) these three complexes. By analyzing these 148 proteins, 48 OPs were found. However, only 7 OPs have conserved interactions across human, mouse, fruit fly and worm. Among these, six OPs, namely OP_23, OP_43, OP_44, OP_45, OP_46 and OP_50, are not part of the TOM, PAM or SAM complexes (i.e., they are found simply because the keywords TOM, PAM and SAM appear somewhere in their annotation). The remaining OP, OP_30, is part of the SAM complex but also has lipoyl synthase activity. In our analysis this OP is classified in the fifth module, which mainly includes proteins with enzymatic activities.

Incompleteness of the protein data and of the interaction data may be responsible for the relatively small size of the common PIN and for the absence of many of the known mitochondrial complexes. As an example, TOM7 is reported in the TrEMBL database (which includes only predicted, unconfirmed protein sequences). Therefore, there is no biological observation which can confirm the activity of this protein in fruit fly, and hence TOM7 is not included in the common PIN. On the other hand, OP_a233 in Supplementary file 1A (i.e., TOM40) is present in all four species, but there is no conserved interaction between this OP and any other OP of the common PIN.

3.3. PIN motifs and themes

In another part of our study, motif analysis was performed on the common PIN.

We analyzed all of the conserved motifs containing three, four, five and six nodes.

In the common PIN, 92% of these motifs are significantly overrepresented in

comparison with the randomized networks (Supplementary file 5A). “Triangle”

and also “complete quadrangle” have significantly higher Z-scores than the other

three- and four-node motifs, while “V-motif”, “U-motif” and “Paw” are not found

to be significantly overrepresented (Table 2). Similar patterns are observed in the


analysis of network motifs of the mitochondrial PIN of each species

(Supplementary file 5B).

We decided to investigate the biological relevance of overrepresented network motifs by comparing the proteins with the modules in which they are located. Biological conclusions are unlikely to be drawn from the analysis of three-node motifs in undirected graphs (Milo et al., 2004). Since it has been proposed that biologically relevant conclusions can be obtained by the analysis of network themes (Zhang et al., 2005), we decided to study network themes instead of the three-node motifs. Figure 5 shows four network motifs and three sets of network themes and their frequencies of appearance in the five modules of the common PIN. Explanations of the possible biological relevance of these motifs and themes are presented in Supplementary file 2.

While previous studies have reported that overrepresentation of network motifs is related to biological processes (Wuchty et al., 2003), here we suggest that certain network themes may also have specific biological functions. These results emphasize that the motif and theme superfamilies have evolved for similar tasks (Milo et al., 2004). The different biological functions are not only correlated with the topological properties, but are also conserved during evolution.

3.4. Cross validation of PIN


Construction of the common PIN by strict rules results in a network with a very small false positive rate. In order to confirm the accuracy of the common PIN, the mitochondrial PIN of zebrafish was selected for leave-one-out cross validation. Zebrafish was chosen for this analysis because it has an acceptable number of known proteins in Swiss-Prot.

At the beginning of the cross-validation, we found a list of common OPs in all five species. Then, in each iteration, the common PIN of four species was constructed and compared to the PIN of the remaining species. In Table 3, an edge of the remaining species is said to be confirmed by the common PIN if the same interaction is observed in all four other species (the fourth column), or in at least three of the four species (the sixth column).

The results of this analysis are shown in Table 3. Generally, there is an acceptable agreement between the common PIN of four species and the PIN of the remaining species, which was not used in the common PIN construction. When an interaction observed in at least three species is sufficient for confirming the corresponding interaction in the remaining species, the agreement improves to up to 94%.

The “conservedness” of significantly overrepresented network motifs (see Table 2) was also investigated. In Table 4 we report the frequency of each network motif. In general, we found an acceptable agreement between the motifs in the common PIN and the motifs of the remaining PIN. Compared with the study of Wuchty et al. (2003) (the last row of the table), the frequencies reported in our study are much greater. This difference is presumably due to the fact that Wuchty et al. considered only those interactions which were determined experimentally (retrieved from the DIP database in 2003 (Xenarios et al., 2002)). Therefore, there is a considerable probability that many interactions are not counted in their dataset, which is equivalent to a large false negative rate.

4. Conclusions

In this paper, we introduce a network of common mitochondrial protein-protein

interactions using strict rules for finding common interactions. As a result, this

network has a minimal number of false positives. Additionally, in our study, both

computational and experimental data are exploited for construction, which

increases the number of included interactions in the common PIN.

We believe that application of the common PIN can stimulate more relevant

discussions on the biological properties of the protein-protein interaction network.

For example, finding a hub is more meaningful in the common PIN than a full PIN

of an organism, due to the reliability of the edges in the common PIN.


Acknowledgements

The authors would like to thank Zhi Wang and Jianzhi Zhang (University of Michigan) for the

module finding program (the implementation of the simulated annealing algorithm). We also

thank Sadegh Azimzadeh Jamalkandi (NIGEB), Ali Sharifi-Zarchi (University of Tehran) and

Hadi Pourmohammadi (Shahid Beheshti University) for their valuable comments during this

project. This work is in part supported by a grant from Institute for Research in Fundamental

Sciences (IPM) (No. CS 1391-0-01).


References

Amoutzias, G.D., Robertson, D.L., Bornberg-Bauer, E., 2004. The evolution of protein interaction networks in regulatory proteins. Comp. Funct. Genomics 5, 79-84.

Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G., 2000. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25-29.

Bastian, M., Heymann, S., Jacomy, M., 2009. Gephi: An Open Source Software for Exploring and Manipulating Networks, International AAAI Conference on Weblogs and Social Media, Paris, France.

Berg, J., Lässig, M., Wagner, A., 2004. Structure and evolution of protein interaction networks: a statistical model for link dynamics and gene duplications. BMC Evol. Biol. 4, 51.

Bhatia, V.N., Perlman, D.H., Costello, C.E., McComb, M.E., 2009. Software tool for researching annotations of proteins: open-source protein annotation software with data visualization. Anal. Chem. 81, 9819-9823.

del Sol, A., Fujihashi, H., O'Meara, P., 2005. Topology of small-world networks of protein-protein complex structures. Bioinformatics 21, 1311-1315.

Dong, J., Horvath, S., 2007. Understanding network concepts in modules. BMC Syst. Biol. 1, 24.

Estrada, E., 2010. Quantifying network heterogeneity. Phys. Rev. E 82, 066102.

Fraser, H.B., Wall, D.P., Hirsh, A.E., 2003. A simple dependence between protein evolution rate and the number of protein-protein interactions. BMC Evol. Biol. 3, 11.

Grindrod, P., Kibble, M., 2004. Review of uses of network and graph theory concepts within proteomics. Expert Rev. Proteomics 1, 229-238.

Guimerà, R., Amaral, L.A.N., 2005. Functional cartography of complex metabolic networks. Nature 433, 895-900.

Hirsh, A.E., Fraser, H.B., 2001. Protein dispensability and rate of evolution. Nature 411, 1046-1049.

Joy, M.P., Brock, A., Ingber, D.E., Huang, S., 2005. High-betweenness proteins in the yeast protein interaction network. J. Biomed. Biotechnol. 2005, 96-103.

Junker, B.H., Schreiber, F., 2007. Analysis of Biological Networks. John Wiley & Sons, Inc.

Kiemer, L., Cesareni, G., 2007. Comparative interactomics: comparing apples and pears? Trends Biotechnol. 25, 448-454.

Milo, R., Itzkovitz, S., Kashtan, N., Levitt, R., Shen-Orr, S., Ayzenshtat, I., Sheffer, M., Alon, U., 2004. Superfamilies of evolved and designed networks. Science 303, 1538-1542.

Müller, M., Mentel, M., van Hellemond, J.J., Henze, K., Woehle, C., Gould, S.B., Yu, R.-Y., van der Giezen, M., Tielens, A.G.M., Martin, W.F., 2012. Biochemistry and evolution of anaerobic energy metabolism in eukaryotes. Microbiol. Mol. Biol. Rev. 76, 444-495.

Perocchi, F., Jensen, L.J., Gagneur, J., Ahting, U., von Mering, C., Bork, P., Prokisch, H., Steinmetz, L.M., 2006. Assessing systems properties of yeast mitochondria through an interaction map of the organelle. PLoS Genet. 2, e170.

Pfeiffer, T., Soyer, O.S., Bonhoeffer, S., 2005. The evolution of connectivity in metabolic networks. PLoS Biol. 3, e228.

Pieroni, E., de la Fuente van Bentem, S., Mancosu, G., Capobianco, E., Hirt, H., de la Fuente, A., 2008. Protein networking: insights into global functional organization of proteomes. Proteomics 8, 799-816.

Reja, R., Venkatakrishnan, A.J., Lee, J., Kim, B.C., Ryu, J.W., Gong, S., Bhak, J., Park, D., 2009. MitoInteractome: mitochondrial protein interactome database, and its application in 'aging network' analysis. BMC Genomics 10, S20.

Schreiber, F., Schwöbbermeyer, H., 2005. MAVisto: a tool for the exploration of network motifs. Bioinformatics 21, 3572-3574.

Shutt, T.E., Shadel, G.S., 2007. Expanding the mitochondrial interactome. Genome Biol. 8, 203-203.

Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L., Ideker, T., 2011. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27, 431-432.

Stuart, J.M., Segal, E., Koller, D., Kim, S.K., 2003. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249-255.

Szklarczyk, D., Franceschini, A., Kuhn, M., Simonovic, M., Roth, A., Minguez, P., Doerks, T., Stark, M., Muller, J., Bork, P., Jensen, L.J., von Mering, C., 2011. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 39, D561-568.

Thomas, A., Cannings, R., Monk, N.A.M., Cannings, C., 2003. On the structure of protein-protein interaction networks. Biochem. Soc. Transac. 31, 1491-1496.

Wagner, A., 2003. How the global structure of protein interaction networks evolves. Proc. Biol. Sci. 270, 457-466.

Wang, Z., Zhang, J., 2007. In search of the biological significance of modular structures in protein networks. PLoS Comput. Biol. 3, e107.

Watts, D.J., 2004. The “New” Science of Networks. Annu. Rev. Sociol. 30, 243-270.

Wuchty, S., Oltvai, Z.N., Barabási, A.L., 2003. Evolutionary conservation of motif constituents in the yeast protein interaction network. Nat. Genet. 35, 176-179.

Xenarios, I., Salwínski, L., Duan, X.J., Higney, P., Kim, S.M., Eisenberg, D., 2002. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucl. Acids Res. 30, 303-305.

Yu, Q.B., Li, G., Wang, G., Sun, J.C., Wang, P.C., Wang, C., Mi, H.L., Ma, W.M., Cui, J., Cui, Y.L., Chong, K., Li, Y.X., Li, Y.H., Zhao, Z., Shi, T.L., Yang, Z.N., 2008. Construction of a chloroplast protein interaction network and functional mining of photosynthetic proteins in Arabidopsis thaliana. Cell Res. 18, 1007-1019.

Zhang, L.V., King, O.D., Wong, S.L., Goldberg, D.S., Tong, H.Y., Lesage, G., Andrews, B., Bussey, H., Roth, F.P., 2005. Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J. Biol. 4, 6.

Zhu, Z., Yao, J., Johns, T., Fu, K., De Bie, I., Macmillan, C., Cuthbert, A.P., Newbold, R.F., Wang, J., Chevrette, M., Brown, G.K., Brown, R.M., Shoubridge, E.A., 1998. SURF1, encoding a factor involved in the biogenesis of cytochrome c oxidase, is mutated in Leigh syndrome. Nat. Genet. 20, 337-343.

Figure legends

Figure 1: List of network motifs. Ten different network motifs are presented, along with the names used in this manuscript.

Figure 2: (A) Example of an orthologous protein set, OP_44. An edge links protein X and protein Y if their sequences have ≥30% identity in a pairwise sequence alignment. (B) Example of two similar OPs (OP_45 and OP_46). Three of the four proteins in these OPs show ≥30% identity. The differences in the fruit fly proteins result in different OPs.
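
The ≥30% identity rule in the Figure 2 legend can be illustrated with a short sketch. It assumes that the sequences are already pairwise aligned with `-` as the gap character and that identity is computed over ungapped columns; the legend does not state which denominator was used, so this choice is an assumption. The protein names and sequences are invented.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_edges(alignments, threshold=30.0):
    """Keep the protein pairs whose alignment reaches the identity threshold.

    alignments maps (protein_x, protein_y) to a pair of aligned strings.
    """
    return {
        pair
        for pair, (a, b) in alignments.items()
        if percent_identity(a, b) >= threshold
    }


# Invented toy alignments.
alignments = {
    ("human_X", "mouse_X"): ("MKT-LLA", "MKTALLA"),
    ("human_X", "worm_Y"): ("MKTLLA", "GHSPQA"),
}
edges = identity_edges(alignments)
```

Only the first pair passes the threshold here, so it is the only edge drawn between the two proteins.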

Figure 3: (A) OP multiplicity. The plot shows the percentage of OPs (y-axis) that contain a particular number of proteins from a single organism (x-axis). For example, 96.1% of OPs include a single mouse protein and 3.9% of OPs include two mouse proteins. No OP contains three mouse proteins. (B) Summary of pairwise alignment results. Four bar plots show the total number of mitochondrial proteins in each organism mutually compared with the others: human, mouse, fruit fly and worm, respectively.

Figure 4: The 49 proteins that are persistently present in the five modules of the common PIN (modules found by the simulated annealing algorithm). Nodes with greater degrees are shown as circles with greater diameters. Nodes in each module are shown in a unique color.

Figure 5: Distinct distributions of the four significantly overrepresented network motifs (Table 2) and three sets of themes in the common PIN. The themes in panel (E) are all possible themes made by combining Triangle motifs that share one certain edge. The resulting themes have ≥4 nodes and ≥5 edges. Panel (F) summarizes all themes made of Triangles, assuming that each pair of Triangles shares a unique edge. Panel (G) includes all themes that can be made by Triangles sharing a certain node.
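
The panel (E) construction, Triangles combined around one shared edge, can be sketched as follows. This is an illustrative reading of the legend and not the authors' procedure: it lists every edge that belongs to at least two Triangles, together with those Triangles. All function names and the toy edge list are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles(adj):
    """Return every fully connected node triple as a frozenset."""
    found = set()
    for u, v, w in combinations(sorted(adj), 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            found.add(frozenset((u, v, w)))
    return found


def edge_sharing_themes(edges):
    """Map each edge to its Triangles, keeping edges that lie in two or more."""
    adj = build_adjacency(edges)
    by_edge = {}
    for tri in triangles(adj):
        for pair in combinations(sorted(tri), 2):
            by_edge.setdefault(frozenset(pair), []).append(tri)
    return {edge: tris for edge, tris in by_edge.items() if len(tris) >= 2}


# Toy graph: Triangles a-b-c and b-c-d share the edge b-c.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
themes = edge_sharing_themes(toy_edges)
theme_nodes = set().union(*themes[frozenset(("b", "c"))])
```

Two Triangles sharing one edge span four nodes and five edges, which matches the lower bounds quoted in the legend.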

Tables

Table 1: Properties of nodes in the common mitochondrial PIN. The module IDs and the values of the centrality measures (degree, closeness and betweenness) are computed for all OPs in the common PIN. Proteins are categorized based on the functions in which their modules are involved, namely translation, inner membrane import, TCA cycle, electron transport and metabolic activities.

| OP ID | Protein Name | Degree | Closeness Centrality | Betweenness Centrality | Module ID |
|---|---|---|---|---|---|
| OP_38 | 28S ribosomal protein S7 | 11 | 2.56 | 260.05 | 1 |
| OP_19 | Elongation factor G | 8 | 2.37 | 352.18 | 1 |
| OP_35 | 39S ribosomal protein L11 | 8 | 2.5 | 166.63 | 1 |
| OP_20 | Elongation factor G | 5 | 3 | 2.17 | 1 |
| OP_21 | Elongation factor Ts | 5 | 3 | 0.70 | 1 |
| OP_39 | 28S ribosomal protein S16 | 5 | 2.87 | 34.34 | 1 |
| OP_37 | Ribosome-releasing factor 2 | 4 | 3.02 | 0.67 | 1 |
| OP_27 | Adenylate kinase 2 | 3 | 3.08 | 0.20 | 1 |
| OP_28 | Adenylate kinase 2 | 2 | 3.1 | 0 | 1 |
| OP_29 | Adenylate kinase isoenzyme 4 | 1 | 3.54 | 0 | 1 |
| OP_44 | Mitochondrial import inner membrane translocase subunit TIM10 | 9 | 3.14 | 257.92 | 2 |
| OP_51 | Mitochondrial import inner membrane translocase subunit TIM9 | 5 | 4.04 | 2.5 | 2 |
| OP_23 | GrpE protein homolog 1 | 3 | 3.18 | 81.97 | 2 |
| OP_46 | Mitochondrial import inner membrane translocase subunit TIM13 | 3 | 4.08 | 0 | 2 |
| OP_48 | Mitochondrial import inner membrane translocase subunit TIM22 | 3 | 4.08 | 0 | 2 |
| OP_36 | 39S ribosomal protein L24 | 2 | 3.17 | 25.8 | 2 |
| OP_45 | Mitochondrial import inner membrane translocase subunit TIM13 | 2 | 4.1 | 0 | 2 |
| OP_50 | Mitochondrial import inner membrane translocase subunit TIM8 A & B | 2 | 4.1 | 0 | 2 |
| OP_47 | Mitochondrial import inner membrane translocase subunit TIM16 | 1 | 4.17 | 0 | 2 |
| OP_04 | ATP synthase subunit alpha | 13 | 2.35 | 580.52 | 3 |
| OP_16 | Succinate dehydrogenase [ubiquinone] iron-sulfur subunit | 13 | 2.92 | 149.78 | 3 |
| OP_15 | Succinate dehydrogenase [ubiquinone] flavoprotein subunit | 6 | 3.08 | 24.7 | 3 |
| OP_40 | Succinyl-CoA ligase [ADP/GDP-forming] subunit alpha | 5 | 3.14 | 27.34 | 3 |
| OP_06 | Citrate synthase | 4 | 3.17 | 26.84 | 3 |
| OP_24 | Translation factor GUF1 | 4 | 2.83 | 77.13 | 3 |
| OP_17 | Succinate dehydrogenase [ubiquinone] cytochrome b small subunit | 3 | 3.85 | 0 | 3 |
| OP_25 | Isocitrate dehydrogenase [NAD] subunit alpha & beta & gamma | 3 | 3.85 | 0.33 | 3 |
| OP_18 | Probable 2-oxoglutarate dehydrogenase | 2 | 3.87 | 0 | 3 |
| OP_26 | Isocitrate dehydrogenase [NAD] subunit alpha & beta & gamma | 2 | 3.87 | 0 | 3 |
| OP_32 | Mitochondrial tRNA-specific 2-thiouridylase 1 | 2 | 3.14 | 0 | 3 |
| OP_22 | Frataxin | 1 | 3.89 | 0 | 3 |
| OP_10 | Cytochrome c oxidase subunit 1 | 8 | 3.1 | 61.23 | 4 |
| OP_11 | Cytochrome c oxidase subunit 2 | 8 | 3.77 | 5.63 | 4 |
| OP_12 | Cytochrome c oxidase subunit 3 | 8 | 3.77 | 5.63 | 4 |
| OP_33 | NADH-ubiquinone oxidoreductase chain 1 | 8 | 3 | 33.85 | 4 |
| OP_05 | ATP synthase subunit beta | 7 | 3.02 | 22.06 | 4 |
| OP_14 | Cytochrome b | 7 | 3.02 | 33 | 4 |
| OP_13 | Cytochrome c oxidase subunit 5A | 6 | 3.04 | 18.81 | 4 |
| OP_34 | NADH-ubiquinone oxidoreductase chain 4 | 5 | 3.85 | 0 | 4 |
| OP_41 | Surfeit locus protein 1 | 3 | 4.02 | 0 | 4 |
| OP_09 | 2-methoxy-6-polyprenyl-1,4-benzoquinol methylase | 5 | 2.77 | 332 | 5 |
| OP_01 | 3-hydroxyisobutyrate dehydrogenase | 3 | 4.36 | 137 | 5 |
| OP_30 | Lipoyl synthase | 3 | 3.54 | 219 | 5 |
| OP_07 | 4-hydroxybenzoate polyprenyl transferase | 2 | 3.73 | 0 | 5 |
| OP_08 | Ubiquinone biosynthesis protein COQ4 homolog | 2 | 3.73 | 0 | 5 |
| OP_31 | Methylmalonate-semialdehyde dehydrogenase [acylating] | 2 | 5.33 | 47 | 5 |
| OP_02 | Medium-chain specific acyl-CoA dehydrogenase | 1 | 6.31 | 0 | 5 |
| OP_03 | Aldehyde dehydrogenase X | 1 | 5.37 | 0 | 5 |
| OP_42 | Dimethyladenosine transferase 1 | 1 | 4.52 | 0 | 5 |
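
For readers who want to reproduce node-level measures of the kind listed in Table 1, the sketch below computes degree and average shortest-path distance with a breadth-first search. The closeness values in Table 1 are all greater than 1, which is consistent with an average distance rather than its reciprocal; that interpretation is an assumption and is not stated in the caption. Betweenness is omitted to keep the example short, and the toy graph is invented.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def degree(adj, node):
    """Number of direct interaction partners."""
    return len(adj[node])


def mean_distance(adj, node):
    """Average shortest-path length from node to all other reachable nodes."""
    dist = bfs_distances(adj, node)
    others = [d for n, d in dist.items() if n != node]
    return sum(others) / len(others) if others else 0.0


# Invented toy graph: a simple path a-b-c-d.
adj = build_adjacency([("a", "b"), ("b", "c"), ("c", "d")])
```

On this path, the end node `a` has a larger average distance than the inner node `b`, so under this reading smaller values indicate more central nodes.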

Table 2: The results of statistical analysis of all three- and four-node network motifs. Five of these network motifs

are significantly overrepresented. The significant motifs are asterisked.

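| Motif | MAVisto label | Vertices | Edges | No. | P-Value | Z-Score |
|---|---|---|---|---|---|---|
| Triangle | HNL | 3 | 3 | 86 | 0 * | 12.5 |
| V-motif | FS3 | 3 | 2 | 612 | 1 | 0 |
| Divided Quadrangle | QWTWL | 4 | 5 | 332 | 0 * | 20.7 |
| Complete Quadrangle | TV55L | 4 | 6 | 30 | 0 * | 29.0 |
| Flag | PYM13 | 4 | 4 | 1438 | 0 * | 10.1 |
| Quadrangle | QWTU9 | 4 | 4 | 270 | 0 * | 13.1 |
| Paw | PY6A3 | 4 | 3 | 1358 | 1 | 0 |
| U-motif | PYGYL | 4 | 3 | 3073 | 0.72 | -0.55 |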
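
The P-value and Z-score columns of Table 2 compare an observed motif count with counts in randomized networks. The sketch below shows the usual form of both statistics, assuming the randomized counts are already available. The function names and the numbers are hypothetical and do not reproduce the values in Table 2. Returning 0 when the randomized counts do not vary is a convention chosen here, not one taken from the source.

```python
from statistics import mean, pstdev


def motif_z_score(observed, random_counts):
    """(observed - mean of randomized counts) / their population std. deviation."""
    mu = mean(random_counts)
    sigma = pstdev(random_counts)
    if sigma == 0:
        return 0.0  # assumed convention when randomized counts are constant
    return (observed - mu) / sigma


def empirical_p_value(observed, random_counts):
    """Fraction of randomized networks with at least the observed count."""
    hits = sum(1 for count in random_counts if count >= observed)
    return hits / len(random_counts)


# Hypothetical counts of one motif in five randomized networks.
random_counts = [40, 44, 36, 42, 38]
z = motif_z_score(50, random_counts)
p = empirical_p_value(50, random_counts)
```

A large positive Z-score together with a small P-value marks a motif as overrepresented relative to the randomized ensemble.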

Table 3: Results of the leave-one-out cross validation. The common PIN is evaluated with respect to the

“conservedness” of the edges. In the fourth column, an edge was assumed to be confirmed if it exists in all of the

four other PINs. In the sixth column, an edge was assumed to be confirmed if it exists in at least three other PINs. H:

Human; M: Mouse; F: Fruit fly; W: Worm; Z: Zebrafish.

| Species involved in the common PIN | Compared to | Number of edges shared by all 4 species | Confirmed edges (with 4 edges) | Number of edges shared by at least 3 species | Confirmed edges (with at least 3 edges) |
|---|---|---|---|---|---|
| H, M, F, W | Z | 40 | 70% | 57 | 86% |
| H, M, F, Z | W | 29 | 58% | 39 | 76% |
| H, M, Z, W | F | 43 | 64% | 64 | 94% |
| H, F, Z, W | M | 32 | 57% | 44 | 79% |
| M, F, Z, W | H | 23 | 50% | 37 | 76% |
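
One way to read the leave-one-out scheme in Table 3 is sketched below: a common edge set is built from four species, and the fraction of those edges that also occur in the held-out fifth species is reported. This is an interpretation of the caption and not the authors' code. The helper `e`, the function names, the node labels and the toy networks are hypothetical.

```python
def e(a, b):
    """Hypothetical helper: an undirected edge between two OP identifiers."""
    return frozenset((a, b))


def common_edges(networks, min_species=None):
    """Edges found in at least min_species networks (default: all of them)."""
    needed = len(networks) if min_species is None else min_species
    counts = {}
    for edges in networks.values():
        for edge in edges:
            counts[edge] = counts.get(edge, 0) + 1
    return {edge for edge, n in counts.items() if n >= needed}


def leave_one_out(networks, held_out, min_species=None):
    """Return (number of shared edges, fraction confirmed in the held-out PIN)."""
    training = {s: edges for s, edges in networks.items() if s != held_out}
    shared = common_edges(training, min_species)
    confirmed = shared & networks[held_out]
    fraction = len(confirmed) / len(shared) if shared else 0.0
    return len(shared), fraction


# Invented toy PINs for five species.
networks = {
    "H": {e("A", "B"), e("A", "C"), e("B", "C")},
    "M": {e("A", "B"), e("A", "C"), e("B", "C")},
    "F": {e("A", "B"), e("A", "C"), e("B", "C")},
    "W": {e("A", "B"), e("A", "C")},
    "Z": {e("A", "B"), e("B", "C")},
}
strict = leave_one_out(networks, "Z")
relaxed = leave_one_out(networks, "Z", min_species=3)
```

In the toy data, relaxing the rule from all four species to at least three admits one more edge and changes the confirmed fraction, which mirrors the two pairs of columns in Table 3.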

Table 4: Evolutionary conservation of network motifs. In each row, the common PIN of four species is constructed and the percentage of its motifs that are also present in the PIN of a fifth species is calculated. H: Human; M: Mouse; F: Fruit fly; W: Worm; Z: Zebrafish; A: Arabidopsis thaliana; Y: yeast.

| Species involved in the common PIN | Compared to | Triangle | Complete quadrangle | Divided quadrangle | Quadrangle | Flag | Reference |
|---|---|---|---|---|---|---|---|
| H, M, F, W | Z | 65% | 38% | 37% | 36% | 36% | Present work |
| H, M, F, Z | W | 73% | 67% | 62% | 58% | 54% | Present work |
| H, M, Z, W | F | 89% | 83% | 79% | 76% | 73% | Present work |
| H, F, Z, W | M | 61% | 37% | 37% | 43% | 34% | Present work |
| M, F, Z, W | H | 61% | 47% | 53% | 34% | 35% | Present work |
| A, H, M, F, W | Y | 21% | 33% | 19% | 6.7% | 7.7% | (Wuchty et al., 2003) |
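
The percentages in Table 4 can be illustrated for the Triangle motif with the sketch below. It assumes that a motif counts as present in the fifth species only when every one of its edges is found there; the caption does not spell out this criterion, so it is an assumption. The helper `e`, the function names and the toy edge sets are hypothetical.

```python
from itertools import combinations


def e(a, b):
    """Hypothetical helper: an undirected edge between two OP identifiers."""
    return frozenset((a, b))


def triangles(edges):
    """Return each Triangle in an edge set as a list of its three edges."""
    nodes = sorted({node for edge in edges for node in edge})
    found = []
    for trio in combinations(nodes, 3):
        sides = [e(a, b) for a, b in combinations(trio, 2)]
        if all(side in edges for side in sides):
            found.append(sides)
    return found


def conserved_fraction(common, other):
    """Fraction of Triangles in `common` whose edges all occur in `other`."""
    tris = triangles(common)
    if not tris:
        return 0.0
    kept = sum(1 for sides in tris if all(side in other for side in sides))
    return kept / len(tris)


# Invented toy data: the common PIN has Triangles a-b-c and b-c-d.
common = {e("a", "b"), e("a", "c"), e("b", "c"), e("b", "d"), e("c", "d")}
fifth = {e("a", "b"), e("a", "c"), e("b", "c"), e("b", "d")}
ratio = conserved_fraction(common, fifth)
```

Here the fifth species lacks one edge of the second Triangle, so only one of the two Triangles counts as conserved.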

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Highlights

>Comparative interactomics provides an evolutionarily conserved common network.

>We studied the common mitochondrial protein interaction network of four eukaryotes.

>Comparative interactomics revealed evolutionarily conserved modules.

>The functional interpretation of the conserved modules is discussed.