Genetic Interaction Network as an Important Determinant of Gene Order in Genome Evolution

Yu-Fei Yang,†,‡,1,2,3 Wenqing Cao,‡,1,2,3 Shaohuan Wu,1,2,3 and Wenfeng Qian*,1,2,3

1 State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
2 Key Laboratory of Genetic Network Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
3 University of Chinese Academy of Sciences, Beijing, China
† Present address: Genetron Health Co., Ltd., Beijing, China
‡ These authors contributed equally to this work.
*Corresponding author: E-mail: [email protected].
Associate editor: Jianzhi Zhang

Article. Open Access. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
Mol. Biol. Evol. 34(12):3254–3266 doi:10.1093/molbev/msx264 Advance Access publication October 3, 2017
Downloaded from https://academic.oup.com/mbe/article-abstract/34/12/3254/4318638 by Institute of Geneties and Developmental Biology,CAS user on 03 December 2017

Abstract

Although it is generally accepted that eukaryotic gene order is not random, the basic principles of gene arrangement on a chromosome remain poorly understood. Here, we extended existing population genetics theories that were based on two-locus models and proposed a hypothesis that genetic interaction networks drive the evolution of eukaryotic gene order. We predicted that genes with positive epistasis would move toward each other in evolution, during which a negative correlation between epistasis and gene distance formed. We tested and confirmed our prediction with computational simulations and empirical data analyses. Importantly, we demonstrated that gene order in the budding yeast could be successfully predicted from the genetic interaction network. Taken together, our study reveals the role of the genetic interaction network in the evolution of gene order, extends our understanding of the encoding principles in genomes, and potentially offers new strategies to improve synthetic biology.

Key words: gene order, genetic interaction network, genetic recombination, fitness, yeast.

Introduction

With thousands of genomes being sequenced, it is increasingly being observed that gene order in a genome is not random (Hurst et al. 2004). For example, six genes in the allantoin degradation (DAL) pathway formed a cluster on chromosome IX during the evolution of Saccharomyces cerevisiae (Wong and Wolfe 2005). More dramatically, three genes in the galactose utilization (GAL) pathway formed a cluster in multiple lineages independently during fungal evolution (Slot and Rokas 2010). However, the evolutionary principles underlying such nonrandom gene order are still elusive, except when neighboring genes form an operon (Lawrence 1999; Lawrence 2002; Qian and Zhang 2008; Zaslaver et al. 2011).

A number of hypotheses have been proposed to explain the evolution of gene order. First, the clustering of genes with similar functions in the genome may facilitate their coordinated expression. Although neighboring genes indeed tend to have similar expression profiles (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004), such phenomena could also be explained by the “leaky” expression of neighboring genes (Spellman and Rubin 2002; Hurst et al. 2004; Liao and Zhang 2008; Ghanbarian and Hurst 2015). Second, housekeeping or essential genes tend to cluster in a genome (Lercher et al. 2002; Pal and Hurst 2003), a phenomenon that might be explained by natural selection to reduce gene expression noise (Batada and Hurst 2007). However, this theory cannot explain the nonrandom gene order within and between such clusters. Third, mutational bias, such as tandem gene duplication, could also lead to nonrandom gene order; however, after removing tandem duplicate genes, gene order is still nonrandom in the aspects described earlier (Hurst et al. 2004). Together, these observations suggest that additional evolutionary mechanisms exist to explain nonrandom gene order in the genome.

The evolution of gene order may be driven by natural selection to optimize recombination frequencies among genes because gene order determines gene distance (D, defined as the number of genes between two genes on a chromosome) and gene distance is highly correlated with recombination frequency (supplementary fig. S1, Supplementary Material online; the budding yeast as an example). Several theoretical analyses suggested that the evolution of recombination frequency between a pair of genes can be influenced by their epistatic interaction (Nei 1967, 1969; Eshel and Feldman 1970; Feldman et al. 1980; Kondrashov 1982, 1988; Charlesworth 1990; Kouyos et al. 2007; Charlesworth and Charlesworth 2011). Here, epistasis, or genetic interaction, refers to the phenomenon that the fitness effects of two mutations on two different genes are not independent (Phillips 2008), and can be quantified as the difference between the relative fitness of the double mutant


(ω_ab) and the multiplicative expectation from those of two single mutants (ω_aB ω_Ab). Previous theories could be summarized as two effects: the short-term effect and the long-term effect (fig. 1A). On the one hand, the linkage of genetically interacting genes (regardless of sign) is advantageous in the short term because it helps to maintain the epistasis-induced linkage disequilibrium (LD), which is favored by natural selection (Nei 1967, 1969; Eshel and Feldman 1970; Feldman et al. 1980; Kouyos et al. 2007; Charlesworth and Charlesworth 2011) (fig. 1A, the solid line; supplementary note S1, Supplementary Material online). Here, the coefficient of LD is defined as the difference between the genotype frequency of the wild-type individuals (X_AB) and the multiplicative expectation from the frequencies of the wild-type alleles (X_A X_B). On the other hand, for two deleterious mutations, genetic recombination breaks the negative LD induced by negative epistasis and thus increases the proportion of double mutants. This increase benefits the population in the long term by facilitating the purge of deleterious mutations, which has been suggested to be related to the origin of sexual reproduction (Feldman et al. 1980; Kondrashov 1982; Charlesworth 1990) (fig. 1A, the dashed line).
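The two quantities defined above can be written down directly. The following is a minimal sketch, assuming haploid genotypes and a wild-type fitness of 1; the function names and the numerical inputs are hypothetical and chosen only for illustration.

```python
def epistasis(w_ab: float, w_aB: float, w_Ab: float) -> float:
    """Epistasis: double-mutant fitness minus the multiplicative expectation.

    All fitness values are relative to the wild type (w_AB = 1).
    """
    return w_ab - w_aB * w_Ab


def linkage_disequilibrium(x_AB: float, x_Ab: float, x_aB: float, x_ab: float) -> float:
    """LD coefficient: X_AB minus the product of wild-type allele frequencies."""
    x_A = x_AB + x_Ab  # frequency of allele A
    x_B = x_AB + x_aB  # frequency of allele B
    return x_AB - x_A * x_B


# Hypothetical inputs: each single mutant loses 1% of fitness and the
# double mutant is slightly fitter than the multiplicative expectation.
e = epistasis(w_ab=0.985, w_aB=0.99, w_Ab=0.99)
ld = linkage_disequilibrium(x_AB=0.7, x_Ab=0.1, x_aB=0.1, x_ab=0.1)
```

A positive value of `e` corresponds to positive epistasis, and a positive value of `ld` means that the wild-type genotype AB is more common than expected from the allele frequencies alone.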

To summarize, genetic recombination is never favored by natural selection for positively epistatic gene pairs, whereas for negatively epistatic gene pairs, the long- and short-term effects of genetic recombination counteract each other (fig. 1A). Therefore, epistasis could drive the evolution of recombination frequencies among genes on the same chromosome, potentially by altering gene order. In this process, a negative correlation between epistasis and gene distance, hereafter referred to as E–D correlation, evolves. Furthermore, this correlation would be especially strong among positively epistatic gene pairs due to the synergistic combination of the long- and short-term effects (fig. 1A). However, these theoretical predictions have never been rigorously tested with empirical data. In this study, we tested these predictions with both simulation and yeast empirical epistasis data. Our study thus reveals a basic principle in the evolution of gene order and enhances our power to decode information from well-constructed genetic interaction networks and thousands of sequenced genomes.

Results

Negative E–D Correlation Is Observed during in silico Evolution

We first performed in silico evolution in which two genes (A and B) were considered, each having a wild-type allele (A or B) and a deleterious allele (a or b). The relative fitness of the haploid wild-type genotype (ω_AB) was defined as 1, and the epistasis (E) was defined as ω_ab − ω_aB ω_Ab. For two genotypes with different gene distances (D) between A and B, gene flow within this region is strictly prohibited because recombination within the region between A and B leads to the gain or loss of genes after segregation (Wu and Ting 2004). Furthermore, the “modifier” locus of recombination frequency is completely linked with A and B (Nei 1967, 1969), making it possible to directly compare the fitness of genotypes with different gene distances. Therefore, to investigate the impact of gene distance on fitness, we compared the average fitness of two populations over generations, one with D between A and B equal to 50 and the other with D equal to 0. Based on the empirical data from the budding yeast S. cerevisiae (Mancera et al. 2008), these D values correspond to recombination frequencies R = 0.264 and 0.064, respectively (supplementary fig. S1, Supplementary Material online). In each generation, we calculated the frequency changes of genotypes by considering both natural selection and genetic recombination. Figure 1B shows the results of the first 100 generations of in silico evolution, when long-term effects begin to dominate the evolutionary process. If epistasis between A and B was positive, the population with D = 50 was always outcompeted by the population with D = 0 (fig. 1B, ω_{D=50} − ω_{D=0} < 0). Furthermore, the fitness difference increased with the magnitude of the epistasis value (fig. 1B). In other words, reduced D between two genes with positive epistasis is favored by natural selection. By contrast, if the epistasis between A and B was negative, the population with D = 50 exhibited a short-term disadvantage followed by a long-term advantage compared with the population with D = 0 (fig. 1B). As expected, the long-term advantage was due to an elevated purging rate of deleterious alleles [fig. 1C, X_a(D = 50) − X_a(D = 0) < 0]. A similar trend was observed when we compared a population with D = 50 and one with D = 100 (R = 0.464; supplementary fig. S2, Supplementary Material online).
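The per-generation update described above can be illustrated with a standard deterministic haploid two-locus recursion. This is only a sketch under stated assumptions and not necessarily the procedure used in the study: selection is applied before recombination, there is no recurrent mutation or drift, and the fitness defect `s` and the initial deleterious allele frequencies `x_a` and `x_b` are hypothetical values. The two recombination frequencies are the ones quoted above for D = 0 and D = 50.

```python
def next_generation(freqs, fitness, r):
    """Apply selection, then recombination at frequency r.

    freqs and fitness are dicts keyed by genotype: 'AB', 'Ab', 'aB', 'ab'.
    Returns the new genotype frequencies and the mean fitness before selection.
    """
    mean_w = sum(freqs[g] * fitness[g] for g in freqs)
    sel = {g: freqs[g] * fitness[g] / mean_w for g in freqs}
    # Linkage disequilibrium after selection; recombination erodes it.
    ld = sel["AB"] * sel["ab"] - sel["Ab"] * sel["aB"]
    new = {
        "AB": sel["AB"] - r * ld,
        "ab": sel["ab"] - r * ld,
        "Ab": sel["Ab"] + r * ld,
        "aB": sel["aB"] + r * ld,
    }
    return new, mean_w


def mean_fitness_trajectory(r, e, s=0.01, x_a=0.1, x_b=0.1, generations=100):
    """Mean fitness per generation for a given recombination frequency and epistasis.

    s, x_a and x_b are assumed values for illustration only.
    """
    fitness = {"AB": 1.0, "Ab": 1 - s, "aB": 1 - s, "ab": (1 - s) ** 2 + e}
    # Start in linkage equilibrium.
    freqs = {
        "AB": (1 - x_a) * (1 - x_b),
        "Ab": (1 - x_a) * x_b,
        "aB": x_a * (1 - x_b),
        "ab": x_a * x_b,
    }
    trajectory = []
    for _ in range(generations):
        freqs, mean_w = next_generation(freqs, fitness, r)
        trajectory.append(mean_w)
    return trajectory


near = mean_fitness_trajectory(r=0.064, e=0.004)  # D = 0
far = mean_fitness_trajectory(r=0.264, e=0.004)   # D = 50
difference = [f - n for f, n in zip(far, near)]   # analogous to fig. 1B
```

Plotting `difference` against the generation index for several positive and negative values of `e` would give a curve family comparable in spirit to figure 1B, although the exact values depend on the assumed parameters.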

We also performed in silico evolution in a series of strains in which D varied between 0 and 100 (R = 0.064 and 0.464, respectively). We recorded the average fitness of each population at the 100th generation (ω_100) and identified the optimal D, D_opt, for each epistasis value (fig. 1D). Given that the effective population size (N_e) in the budding yeast is 10^7 (Wagner 2005), the minimal selection coefficient that can be detected for yeast is 10^−7. In other words, all D values that reduce the relative fitness by <10^−7 are permitted during evolution. We calculated the mean of all permitted D values (dashed line in fig. 1D). As predicted in our model, a strong negative E–D correlation was observed from the results of in silico evolution. Importantly, such negative E–D correlation was also observed at the 50th and 200th generations (supplementary fig. S3, Supplementary Material online). To further test whether the outcome of in silico evolution was sensitive to population genetics parameters, we examined various values for initial allele frequencies and fitness defects. We observed a negative E–D correlation with all parameter sets (supplementary fig. S4, Supplementary Material online).
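The selection of permitted D values can be sketched as follows. The function name and the ω_100 values in the example are hypothetical; only the 10^−7 threshold comes from the text.

```python
def permitted_distances(w100_by_d, threshold=1e-7):
    """Return the optimal D and the mean of all permitted D values.

    w100_by_d maps a gene distance D to the mean fitness at generation 100.
    A distance is permitted when its fitness is below the optimum by less
    than the threshold, i.e. the difference is invisible to selection.
    """
    d_opt = max(w100_by_d, key=w100_by_d.get)
    best = w100_by_d[d_opt]
    permitted = [d for d, w in w100_by_d.items() if best - w < threshold]
    return d_opt, sum(permitted) / len(permitted)


# Hypothetical fitness values for one epistasis value.
w100 = {0: 0.9990000, 10: 0.99899995, 20: 0.9989998, 30: 0.9989990}
d_opt, mean_permitted = permitted_distances(w100)
```

Repeating this for each epistasis value and plotting `mean_permitted` against epistasis corresponds to the dashed line described for figure 1D.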

Chromosomal Arrangement of Genes in Star-Like Motifs of Genetic Interaction Networks

The analyses we have described so far were based on two-locus processes. In reality, however, a gene may have genetic interactions with multiple genes that together form a complex genetic interaction network (Boone et al. 2007). In sharp contrast to the network topology, genes are linearly aligned on a limited number of chromosomes, and therefore, the optimization of pairwise gene distances may be restricted by the chromosomal localization of other genes. Thus, it remains unknown whether the negative E–D correlation would evolve in the context of a highly connected network of genetic interactions.

We first examined the impact of epistasis on the chromosomal order of genes in a star-like motif, which is typical in empirical genetic interaction networks (Costanzo et al. 2010). To this end, we built a toy motif in which a hub gene interacts with nine partner genes with different epistasis values ranging from −0.004 to 0.004 (fig. 2A). We fixed the chromosomal location of the hub gene and attempted to place partner genes on the same chromosome. We focused on the epistasis and gene distance of hub-containing gene pairs, and therefore, we placed all partner genes on the same side of the hub gene on the chromosome for convenience. We calculated ω_100 for each of the total (9! =) 362,880 possible gene orders and found that ω_100 varied among them (fig. 2B). Importantly, the gene order with the highest ω_100 showed a perfect negative E–D correlation (ρ_ED = −1, fig. 2C), whereas the gene order with the lowest ω_100 showed a perfect positive E–D correlation (ρ_ED = 1, fig. 2D). In fact, we found that ω_100 was negatively correlated with ρ_ED (fig. 2B, ρ = −0.94, P < 10^−100, Spearman’s correlation), implying that the negative E–D correlation itself is under natural selection. To understand whether the negative correlation between ω_100 and ρ_ED is still present under the parameters derived from empirical data, we randomly chose epistasis values and fitness defects from two genome-wide studies in the budding yeast (Costanzo et al. 2010, 2016) and still observed strong negative correlations between ω_100 and ρ_ED (supplementary fig. S5A and B and table S1, Supplementary Material online). We also confirmed that the negative correlation between ω_100 and ρ_ED was insensitive to D and initial allele frequencies (supplementary fig. S5C and D and table S1, Supplementary Material online).
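The enumeration of gene orders and the calculation of ρ_ED for each of them can be sketched as follows. Two assumptions are made for illustration: the nine epistasis values are taken to be evenly spaced between −0.004 and 0.004, and partner genes are placed at distances 0 to 8 from the hub. The fitness ω_100 of each order is not computed here because it requires the full multi-locus simulation.

```python
from itertools import permutations


def ranks(values):
    """Ranks (1 = smallest) for a list without ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


# Assumed epistasis of the hub with each of the nine partner genes.
EPISTASIS = [0.004, 0.003, 0.002, 0.001, 0.0, -0.001, -0.002, -0.003, -0.004]
RANK_E = ranks(EPISTASIS)


def rho_ed(order):
    """Spearman correlation between epistasis and distance for one gene order.

    order[k] is the index of the partner gene placed k genes away from the
    hub, so the rank of its distance is simply k + 1. Without ties,
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    """
    n = len(order)
    d2 = sum((RANK_E[gene] - (k + 1)) ** 2 for k, gene in enumerate(order))
    return 1 - 6 * d2 / (n * (n * n - 1))


# All 9! = 362,880 gene orders.
rhos = [rho_ed(order) for order in permutations(range(9))]
```

Placing the partners in order of decreasing epistasis, `rho_ed(tuple(range(9)))`, gives the perfectly negative correlation, and the reversed order gives the perfectly positive one.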

FIG. 1. Theoretical prediction and computational simulations of the negative E–D correlation. (A) The short- and long-term effects are in the same direction when epistasis is positive but are in opposite directions when epistasis is negative. Thus, a negative E–D correlation can be predicted from the view of population genetics. (B) Fitness differences between a strain with D = 50 and a strain with D = 0 are plotted over 100 generations during simulations of in silico evolution. (C) The difference in allele a’s frequency (X_a) between strains with D = 50 and D = 0 is plotted over 100 generations during simulations of in silico evolution. (D) The average fitness of a population at the 100th generation (ω_100) is plotted against epistasis and D. For each epistasis, we defined all permitted D values with their resulting ω_100. If ω_100 is smaller than that of the optimal distance (D_opt) by <10^−7, the minimal selective coefficient that can be detected by nature given the effective population size (N_e, 10^7) of yeast, D is permitted. The mean of all permitted D values is plotted against epistasis (dashed line).

Next, we calculated the distance (d) to the fittest gene order shown in figure 2C, which was defined as the number of differently placed genes (fig. 2E), for each possible gene order. We found that an increase in d reduced ω_100 (fig. 2E), again emphasizing the impact of gene order on fitness. To further investigate the impact of the range of epistasis on ω_100, we generated a series of epistasis ranges from 0.004 to 0.020, shuffled the gene order, and recalculated the average ω_100 of all gene orders with the same d (fig. 2E). We found that although it was always true that higher d led to a reduction of ω_100, the increase in the range of epistasis values enlarged the fitness differences among gene orders (fig. 2E).
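The distance d between two gene orders, as defined above, amounts to counting the positions at which the two orders place a different gene. A minimal sketch follows; the gene labels and the reference order are hypothetical.

```python
def order_distance(order, reference):
    """Number of positions at which two gene orders place a different gene."""
    if len(order) != len(reference):
        raise ValueError("gene orders must have the same length")
    return sum(a != b for a, b in zip(order, reference))


# Hypothetical fittest order of the nine partner genes, read outward from the hub.
fittest = "BCDEFGHIJ"
# A variant in which the first three partner genes are rotated.
variant = "CDBEFGHIJ"
d = order_distance(variant, fittest)
```

Under this definition d can never equal 1, because moving one gene necessarily displaces at least one other.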

FIG. 2. The negative E–D correlation in star-like motifs of genetic interaction networks. (A) A toy model of a star-like motif in which gene A is the hub. Gene A has positive epistasis with genes B, C, D, and E and negative epistasis with genes G, H, I, and J. The range of epistasis is (0.004 − [−0.004] =) 0.008 in this motif. (B) Spearman’s correlation coefficient between epistasis and D (ρ_ED) varies among gene orders. The average fitness at the 100th generation (ω_100) is negatively correlated with ρ_ED. (C) The gene order with the highest ω_100; ρ_ED = −1. (D) The gene order with the lowest ω_100; ρ_ED = 1. (E) The difference between a gene order and the gene order with the highest ω_100 (d) is defined as the number of differently placed genes. Two examples with d = 3 are shown. The heat map of the relative ω_100 (normalized to the highest ω_100) is shown. The average relative ω_100 decreases with the increase of d. The reduction is more dramatic when the range of epistasis values is larger. (F) Shuffling among genes B, C, D, and E (positive epistasis with the hub gene A) has a larger impact on ω_100 than shuffling among genes G, H, I, and J (negative epistasis with the hub gene A). P values of one-tailed Mann–Whitney U tests are shown. The gray dashed line indicates the ω_100 of the optimal gene order in panel (C).

Our model further predicted that positive epistasis should play a more important role in the evolution of gene order (fig. 1), such that altering the order of partner genes that have positive epistasis with the hub gene would lead to a larger fitness reduction. As expected, we observed that the gene-order variants with changes exclusively to the positively epistatic genes generally had a larger reduction in fitness compared with those with changes exclusively to the negatively epistatic genes (fig. 2F).

Chromosomal Arrangement of Genes in All-Connected Motifs of Genetic Interaction Networks

We further examined the negative correlation between ω_100 and ρ_ED in all-connected motifs. To this end, we built a toy all-connected motif with five nodes and assigned epistasis values in the range of −0.0036 to 0.0036 to edges (fig. 3A). Again, we observed a strong negative correlation between ω_100 and ρ_ED (fig. 3B, ρ = −0.87, P = 3.2 × 10^−38). The gene order with the highest fitness had a strong negative E–D correlation (ρ_ED = −1.00, fig. 3C), whereas the gene order with the lowest fitness had a positive E–D correlation (ρ_ED = 0.39, fig. 3D). Similarly, we confirmed that the negative correlation between ω_100 and ρ_ED was insensitive to epistasis values, fitness defects, gene distances, and initial allele frequencies (supplementary fig. S6 and table S2, Supplementary Material online). Furthermore, we observed that the fitness of a gene order decreased with the increase in its d (fig. 3E), and this trend was stronger when the range of epistasis was larger (fig. 3E).

Negative E–D Correlation in S. cerevisiae

To investigate whether the negative E–D correlation is supported by empirical evidence, we retrieved the pairwise epistasis data generated by Costanzo et al., who systematically measured the vegetative growth rates of both single and double mutants in the budding yeast S. cerevisiae and estimated epistasis values for 26 million gene pairs (Costanzo et al. 2010, 2016). As expected, we observed a significant negative E–D correlation among linked genes (fig. 4A, ρ = −0.15, P = 4.0 × 10^−8, N = 1,254). Consistent with this trend, unlinked genes on the same chromosome exhibited lower epistasis values (fig. 4A, gray dashed line). As a control, we permuted the gene orders and recalculated the correlation coefficients 1,000 times. We found that the negative E–D correlation disappeared after permutation (fig. 4B, P < 0.001, permutation test).
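A permutation control of this kind can be sketched as follows. This is an illustration under assumptions rather than the pipeline used in the study: gene positions are shuffled among the genes of a single hypothetical chromosome, D is recomputed as the number of genes between the two members of each pair, and the one-sided P value is the fraction of shuffled orders whose Spearman coefficient is at least as negative as the observed one. Restricting the analysis to linked pairs is omitted for brevity, and all gene names and epistasis values are made up.

```python
import random


def average_ranks(values):
    """Ranks (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def ed_correlation(position, pairs):
    """E-D correlation; D is the number of genes between the two genes."""
    e = [eps for _, _, eps in pairs]
    d = [abs(position[a] - position[b]) - 1 for a, b, _ in pairs]
    return spearman(e, d)


def permutation_test(position, pairs, n_perm=1000, seed=0):
    rng = random.Random(seed)
    observed = ed_correlation(position, pairs)
    genes = list(position)
    slots = [position[g] for g in genes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(slots)  # permute the gene order
        if ed_correlation(dict(zip(genes, slots)), pairs) <= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical chromosome with eight genes and six measured gene pairs.
position = {f"g{i}": i for i in range(8)}
pairs = [("g0", "g1", 0.004), ("g1", "g2", 0.003), ("g0", "g2", 0.002),
         ("g0", "g4", 0.0), ("g1", "g6", -0.002), ("g0", "g7", -0.004)]
observed, p_value = permutation_test(position, pairs)
```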

These observations are potentially attributable to a number of confounding factors. The first is mutational bias, such as tandem duplication. However, duplicate genes tend to have negative epistasis (Tischler et al. 2006; Dean et al. 2008; DeLuna et al. 2008; Musso et al. 2008; Vavouri et al. 2008; Qian et al. 2010), which should result in a positive E–D correlation. Nevertheless, we controlled for this mutational bias by randomly keeping only one gene in a gene family and still observed the negative E–D correlation (supplementary fig. S7A, Supplementary Material online, ρ = −0.17, P = 1.8 × 10^−5, N = 641).

Second, genes with coordinated expression are clustered (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004). If coordinately expressed genes tend to have positive epistasis, the negative E–D correlation could result from these genes. To control for this effect, we first inferred expression pattern similarity for each pair of genes by calculating the correlation of gene expression levels in multiple conditions (Qian and Zhang 2014). We did not observe a significant correlation between epistasis and expression similarity (ρ = −0.015, P = 0.6, N = 1,202). Nevertheless, we divided these gene pairs into two groups according to expression similarity, recalculated the E–D correlation within each group, and still observed significant negative E–D correlations (supplementary fig. S7B and C, Supplementary Material online). Similar results were obtained when we calculated the partial E–D correlation after controlling for expression similarity (partial ρ = −0.14, P = 6.6 × 10^−7, N = 1,202). In addition, we also found that coordinated gene expression occurring through 3D chromatin interactions did not confound our results (supplementary fig. S7D, Supplementary Material online, ρ = −0.17, P = 8.7 × 10^−6, N = 704), which was not unexpected as 3D chromatin interactions do not influence recombination frequency.
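One common way to obtain a first-order partial correlation is to combine the three pairwise coefficients, here Spearman coefficients among epistasis (E), gene distance (D), and the controlled variable (Z). The sketch below assumes this textbook formula; the text does not state which implementation was used, and the input values are hypothetical.

```python
import math


def partial_correlation(r_ed: float, r_ez: float, r_dz: float) -> float:
    """Correlation between E and D after controlling for Z.

    r_ed, r_ez and r_dz are the pairwise (rank) correlation coefficients.
    """
    return (r_ed - r_ez * r_dz) / math.sqrt((1 - r_ez ** 2) * (1 - r_dz ** 2))


# Hypothetical coefficients: when Z is nearly uncorrelated with E,
# controlling for it barely changes the E-D correlation.
partial = partial_correlation(r_ed=-0.15, r_ez=-0.02, r_dz=-0.10)
```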

FIG. 3. The negative E–D correlation in all-connected motifs of genetic interaction networks. (A) A toy model of an all-connected motif. (B) The fitness at the 100th generation (ω_100) is negatively correlated with ρ_ED. (C) The gene order with the highest ω_100; ρ_ED = −1.00. (D) The gene order with the lowest ω_100; ρ_ED = 0.39. (E) The heat map of the relative ω_100 (normalized to the highest ω_100) is shown. The relative ω_100 decreases with the increase of d. The reduction is more dramatic when the range of epistasis is larger.

Because functionally related genes are nonrandomly distributed on chromosomes (Wong and Wolfe 2005; Slot and Rokas 2010) and functional relationships between genes may lead to epistasis, we next examined whether functional relationships could confound the negative E–D correlation. We observed similar negative E–D correlations for gene pairs with either high or low semantic similarity of GO terms in molecular functions (supplementary fig. S7E and F, Supplementary Material online), biological processes (supplementary fig. S7G and H, Supplementary Material online), and cellular components (supplementary fig. S7I and J, Supplementary Material online). Again, we calculated partial correlations controlling for semantic similarity of GO terms in molecular functions (partial ρ = −0.15, P = 7.3 × 10^−7, N = 1,151), biological processes (partial ρ = −0.15, P = 3.2 × 10^−7, N = 1,151), and cellular components (partial ρ = −0.15, P = 3.2 × 10^−7, N = 1,151). All these results indicate that the functional relationship is not a confounding factor in the negative E–D correlation, which is not unexpected given that a large fraction of genetic interactions do not reflect functional relationships (He et al. 2010; Costanzo et al. 2016).

Finally, we investigated the impact of gene expression noise on the negative E–D correlation because it has been proposed that essential genes were colocalized in open chromatin regions to reduce gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016). To control for this effect, we first calculated the average gene expression noise (Newman et al. 2006) for each gene pair. We used the distance of each coefficient of variation (CV) to a running median of CV values (DM) to quantify gene expression noise (Newman et al. 2006) in order to minimize the effect of gene expression magnitude on gene expression noise. We observed a negative E–D correlation for gene pairs with either high or low average DM (supplementary fig. S7K and L, Supplementary Material online). Again, we calculated the partial correlation controlling for gene expression noise and still observed a negative E–D correlation (partial ρ = −0.16, P = 9.7 × 10^−3, N = 258).

Positive Epistasis Plays an Important Role in the Origin of the Negative E–D Correlation

Our model further predicted that the reduction of the distance between positively epistatic genes should play a more important role in the formation of the negative E–D correlation (figs. 1 and 2). Indeed, a significant negative E–D correlation was observed among positively epistatic gene pairs in S. cerevisiae (fig. 4C, ρ = −0.14, P = 0.0064, N = 391), whereas no significant correlation was observed among negatively epistatic gene pairs (fig. 4D, ρ = −0.025, P = 0.47, N = 863). We further verified the role of positive epistasis by shuffling epistasis values among all 391 positively epistatic gene pairs in S. cerevisiae. As expected, the E–D correlation was significantly weakened after the permutation (fig. 4E, P = 0.005, one-tailed permutation test). By contrast, no significant difference was observed after shuffling negative epistasis values (fig. 4F, P = 0.722, one-tailed permutation test), even though the latter analysis shuffled more gene pairs (N = 863).

FIG. 4. A negative E–D correlation is observed in the empirical genetic interaction network of the budding yeast S. cerevisiae, and positive epistasis plays a more important role in its formation. (A) A significant negative E–D correlation is observed in S. cerevisiae. Gene pairs are separated into bins based on D, with equal width of five genes. The mean value of epistasis and the standard error of the mean (SEM) within each bin are shown. Spearman’s correlation coefficient ρ and corresponding P values were calculated from the raw data (N = 1,254). The gray dashed line shows the average epistasis among unlinked genes (D > 100). (B) Distribution of E–D correlation coefficients in 1,000 shuffled genomes. The arrow indicates the observed correlation coefficient in S. cerevisiae. (C and D) A significant negative E–D correlation is observed among positively epistatic gene pairs (N = 391) but not among negatively epistatic gene pairs (N = 863). Spearman’s correlation coefficient ρ and the corresponding P values are calculated from the raw data. The dashed line shows the average epistasis among unlinked genes. (E) The distribution of correlation coefficients in 1,000 artificial genomes in which values of positive epistasis are shuffled. The arrow indicates the E–D correlation coefficient in reality. (F) Similar to (E); values of negative epistasis are shuffled. (G and H) The proportion of gene pairs with significant positive epistasis is significantly correlated with D, but that with significant negative epistasis is not. SEMs are estimated based on binomial distribution. The dashed lines show the proportion of gene pairs with significant positive or negative epistasis among unlinked genes.

In studies by Costanzo et al., epistasis was classified into three categories: significantly positive, significantly negative, and nonsignificant (Costanzo et al. 2010, 2016). A strong negative correlation was observed between gene distance and the proportion of gene pairs with significant positive epistasis (fig. 4G; ρ = −0.82, P = 5.5 × 10⁻⁵), whereas no significant correlation was observed between gene distance and the proportion of gene pairs with significant negative epistasis (fig. 4H; ρ = 0.43, P = 0.088). All these observations emphasize the important role of positive epistasis in the origin of the negative E–D correlation.

Epistasis-Driven Evolution of Gene Order after the Whole Genome Duplication in Yeast

Thus far, we have observed a negative E–D correlation and demonstrated that the correlation was mainly attributable to positively epistatic gene pairs. The aforementioned theories, simulations, and empirical evidence led us to propose a hypothesis of epistasis-driven evolution of gene order in S. cerevisiae. The whole genome duplication (WGD; fig. 5A) and the subsequent extensive gene losses substantially changed the epistatic relations among genes and markedly rewired the genetic interaction network in yeast (Kellis et al. 2004; Dixon et al. 2008; Tischler et al. 2008; VanderSluis et al. 2010). For instance, only 29% of the identified synthetic lethal interactions are conserved between Schizosaccharomyces pombe and S. cerevisiae (Dixon et al. 2008). At the same time, extensive chromosome rearrangement events occurred. For example, in a reconstructed ancestral species before the WGD (Byrne and Wolfe 2005), we identified the gene pairs that were located on the same chromosome; surprisingly, 84.6% of them are localized on different chromosomes in S. cerevisiae. More strikingly, among gene pairs that are localized on the same chromosome in both the ancestral species and S. cerevisiae, 99.6% differ in gene distance. Based on these observations, we proposed that the rewired genetic interaction network drove the evolution of gene order, resulting in numerous chromosome rearrangement events (Kellis et al. 2004). When gene losses ceased, the rewiring of genetic interactions slowed, and the evolutionary force on gene distance also diminished. Consistently, synteny relationships are strongly conserved in the species of the Saccharomyces sensu stricto group (Kellis et al. 2003).

Our model predicted that the negative E–D correlation would be weaker if the gene order in S. cerevisiae had remained unchanged since the WGD. The reason is that the gene order in the ancestor was not subject to the natural selection imposed by the genetic interaction network of the current S. cerevisiae genome. Furthermore, given the massive gene losses after the WGD, the genetic interaction network cannot be 100% conserved. To test this prediction, we calculated the gene distances in the reconstructed ancestral species mentioned earlier (Byrne and Wolfe 2005). Indeed, we found that the negative E–D correlation disappeared when the gene distances in S. cerevisiae were replaced by those in the ancestral species (fig. 5B; ρ = 6.9 × 10⁻³, P = 0.26, N = 26,630). This observation indicates that the negative E–D correlation in S. cerevisiae formed during the evolution of gene order after the WGD. Consistently, we observed that positively epistatic gene pairs decreased their distances, whereas negatively epistatic gene pairs increased their distances during evolution (fig. 5C; ρ = −0.17, P = 0.061, N = 127).

To further test the role of positive epistasis in the evolution of gene order, we identified genes that were ancestrally linked (i.e., D ≤ 100 in the reconstructed ancestor) and examined whether they moved toward or away from each other during evolution. Consistent with our model, genes with significant positive epistasis were more likely to move toward each other than genes without significant epistasis (fig. 5D; P = 1.5 × 10⁻³, two-tailed Fisher's exact test), whereas the difference between gene pairs with significant negative epistasis and those without significant epistasis was not significant (fig. 5D; P = 0.41, two-tailed Fisher's exact test). Together, these observations support our hypothesis of epistasis-driven evolution of gene order in yeast.
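A two-tailed Fisher's exact test on a 2 × 2 table (epistasis category by direction of movement) can be sketched as below. The counts are hypothetical and chosen only to exercise the function; they are not the counts behind figure 5D, and the function name is an assumption of this example.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical counts: rows are gene pairs with significant positive
# epistasis and gene pairs without significant epistasis; columns are
# "moved toward each other" and "moved away from each other".
p_value = fisher_exact_two_tailed(8, 2, 1, 5)
```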

Genetic Interaction Network Accurately Predicts Gene Order in Yeast

Finally, we determined whether the gene order in S. cerevisiae could be successfully predicted by the empirical data of genetic interaction networks (Costanzo et al. 2010, 2016). To this end, we identified 22 all-connected three-node motifs in

FIG. 5. The origin of the negative E–D correlation after the WGD in yeast. (A) Phylogenetic relationship among yeast species. The black arrow indicates the reconstructed ancestor, and the gray arrow indicates the WGD event. (B) A negative E–D correlation is not observed when the gene order in S. cerevisiae is replaced by that in the reconstructed ancestor. Gene pairs are separated into bins based on D, with an equal width of five genes. The mean and SEM of epistasis within each bin are shown. Spearman's correlation coefficient ρ and the corresponding P values were calculated from the raw data (N = 26,630). The dashed line shows the average epistasis among unlinked genes. (C) The change in D (S. cerevisiae − the reconstructed ancestor) is negatively correlated with the epistasis in S. cerevisiae (N = 127). (D) The D between a gene pair in S. cerevisiae is compared with that in the reconstructed ancestor. Proportions of gene pairs moving toward and away from each other among gene pairs with significant positive epistasis (left), nonsignificant epistasis (middle), and significant negative epistasis (right) are shown.

Yang et al doi101093molbevmsx264 MBE


which all three genes are localized within a 100-gene range on a chromosome. An example is shown in figure 6A, in which VMA22 positively interacts with SRB2 and negatively interacts with AIM17, and the epistatic interaction between SRB2 and AIM17 is weak. All six possible gene orders are enumerated in figure 6B. Among them, the first gene order exhibits a perfect E–D anticorrelation (ρ = −1; fig. 6C), which is exactly the prediction of our model. In fact, it is also the real order of these genes on chromosome VIII. The second gene order successfully places positively epistatic genes close to each other and negatively epistatic genes far from each other. Therefore, it is generally consistent with our prediction (fig. 6C). The prediction was considered successful if either of these two gene orders actually occurred in the yeast genome.
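The enumeration of gene orders for a three-node motif can be illustrated with a short sketch. The epistasis values and the three chromosomal slots below are hypothetical stand-ins (only the signs follow the VMA22, SRB2, and AIM17 example), and the function names are assumptions of this example. The slots are deliberately unevenly spaced so that the three pairwise distances are distinct.

```python
from itertools import permutations

def ranks_distinct(values):
    """1-based ranks for a list whose values are all distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def spearman_distinct(x, y):
    """Spearman's rho for tie-free data, via the rank-difference formula."""
    rx, ry = ranks_distinct(x), ranks_distinct(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical epistasis values: positive, negative, and weak.
EPISTASIS = {
    ("VMA22", "SRB2"): 0.08,
    ("VMA22", "AIM17"): -0.05,
    ("SRB2", "AIM17"): 0.01,
}
SLOTS = (0, 3, 10)  # hypothetical gene positions along a chromosome

def score_orders(epistasis, slots):
    """Return (rho, order) for every assignment of genes to slots."""
    genes = sorted({gene for pair in epistasis for gene in pair})
    scored = []
    for order in permutations(genes):
        position = dict(zip(order, slots))
        e_values = list(epistasis.values())
        d_values = [abs(position[a] - position[b]) for a, b in epistasis]
        scored.append((spearman_distinct(e_values, d_values), order))
    return sorted(scored)  # most negative E-D correlation first

scored = score_orders(EPISTASIS, SLOTS)
best_rho, best_order = scored[0]
```

With these toy inputs, the order that places the positively interacting pair closest together and the negatively interacting pair farthest apart is ranked first.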

We examined the accuracy of our prediction at the genomic scale. We first divided these 22 motifs into two groups based on the magnitude of σ_epistasis, the standard deviation of epistasis values in a motif. Group L contains 11 motifs with larger σ_epistasis, and Group S contains 11 motifs with smaller σ_epistasis (fig. 6D). We found that the gene orders of seven (out of 11) motifs in Group L were precisely predicted by our model, and the accuracy was significantly higher than the random expectation (17.6%; fig. 6E; P = 0.001, permutation test). By contrast, the predictive accuracy in Group S (2 out of 11) was not significantly different from the random expectation (fig. 6E; P = 0.607, permutation test), because low variation among epistasis values in a motif reduces the selective coefficients (figs. 2E and 3E). More broadly, the proportion of successful predictions (first and second gene orders in fig. 6B) is 100% in Group L, significantly higher than the random expectation (33.3%; fig. 6F; P < 0.001, permutation test). This high predictive power suggests that epistasis plays a vital role in driving the evolution of gene order. We further identified 243 and 1,302 all-connected motifs within the range of 150 and 200 genes on the same chromosome, respectively, and again confirmed the predictive power of the genetic interaction network (supplementary fig. S8A–F, Supplementary Material online). Moreover, the predictive power of epistasis on gene order is independent of expression similarity, because the latter could not accurately predict gene order (supplementary fig. S9A–I, Supplementary Material online).

We then determined whether gene order could be predicted when the genetic interaction network is incomplete. To this end, we identified 92 star-like motifs in which all genes are on the same chromosome and at least one gene is within D ≤ 40 of the hub gene. For example, SFH1, which encodes a component of a chromatin remodeling complex, genetically interacts with 90 genes on the same chromosome (fig. 6G shows 20 genes with D < 100 to SFH1). Because the negative E–D correlation is mainly contributed by positively epistatic gene pairs in theory (figs. 1 and 2), our model predicted that genes having strong positive epistasis with SFH1 should be located close to it on the chromosome. Indeed, these genes (VRP1, FKS1, RPL26A, VPS38, and DCR2) are located close to SFH1. Specifically, the gene (VRP1) having the strongest positive epistatic interaction with SFH1 was the closest gene to SFH1 on the chromosome (fig. 6H).

To examine the predictive accuracy at the genomic scale, we divided these 92 star-like motifs into two groups based on the difference between the top two highest epistasis values in the motif (Diff_epistasis; fig. 6J). We found that the proportion of successful predictions was 32.6% in Group L and 21.7% in Group S, both of which were significantly higher than the random expectation (fig. 6K; P < 0.001 for Group L and P = 0.001 for Group S, permutation test). Furthermore, despite the lower predictive power, our model could still predict gene locations in 707 star-like motifs in which the closest gene to the hub was within D = 90 (supplementary fig. S8G and H, Supplementary Material online; P = 0.002 for Group L and P = 0.036 for Group S, permutation test).

Discussion

In our study, we provided results from both computational simulation and empirical data analysis to support the role of genetic interaction networks in driving the evolution of gene order. We performed simulations with different sets of parameters, some of which were from empirical data or within the range of empirical data (Schacherer et al. 2009; Costanzo et al. 2010). The negative E–D correlation was observed with all parameter sets (supplementary figs. S4–S6, Supplementary Material online), consistent with both theoretical predictions (fig. 1A) and empirical results (fig. 4).

The epistasis values were estimated mainly from null alleles (Costanzo et al. 2010); thus, it remains unknown whether our conclusion applies to other deleterious alleles. Fortunately, Costanzo et al. also estimated 7,786,453 pairwise epistasis values for either decreased abundance by mRNA perturbation (DAmP) alleles or temperature-sensitive (ts) alleles with mutations typically changing coding sequences, which offered us the opportunity to address this question (Costanzo et al. 2010, 2016). We found that the epistasis values of null alleles were significantly correlated with those of other deleterious alleles of the same gene (supplementary fig. S10A–L, Supplementary Material online), suggesting that the negative E–D correlation is likely a general phenomenon for various deleterious alleles. More importantly, because DNA deletion events that lead to null alleles are frequently observed in yeast natural populations (Schacherer et al. 2009), epistasis among null alleles by itself may be sufficiently strong to drive the evolution of gene distance.

In principle, if the frequencies of deleterious alleles in a population are significantly reduced by natural selection, the advantage of genetic linkage could be small for this population. However, numerous deleterious mutations (e.g., deletion of a whole gene, nonsense mutations, missense mutations, and mutations altering start codons) have been observed in natural populations (Liti et al. 2009; Schacherer et al. 2009), suggesting that the frequencies of deleterious alleles may not be low. This phenomenon is probably due to the antagonistic pleiotropic effects of a mutation in multiple environments (Qian et al. 2012).

It is worth noting that theoretical analysis predicts that epistasis can drive the evolution of recombination frequency rather than gene distance. In addition to gene distance, 1) the


lengths of intergenic regions and 2) recombination hot/cold spots could also affect recombination frequency. Nevertheless, a strong correlation between gene distance and physical distance, that is, the number of nucleotides between two genes, was observed (ρ = 0.998, P < 10⁻¹⁰⁰, N = 1,624,935), suggesting that the effect of (1) is negligible. Furthermore, physical distance is highly correlated with recombination frequency when the physical distance is <180 kb (supplementary fig. S1C, Supplementary Material online; ρ = 0.88, P < 10⁻¹⁰⁰, N = 336,227), suggesting that the effect of (2) is also limited. As a result, recombination frequency and gene distance are highly correlated (supplementary fig. S1B, Supplementary Material online; ρ = 0.87, P < 10⁻¹⁰⁰, N = 337,892 for gene pairs with D ≤ 100). Because the recombination frequency was only measured for 58% of gene pairs in S. cerevisiae (Mancera et al. 2008) and was unknown for the reconstructed ancestral species, we used gene distance to approximate recombination frequency in this study. The optimization of gene distances among

FIG. 6. Gene orders can be predicted from the yeast genetic interaction network. (A) A three-node all-connected motif in the yeast genetic interaction network. The standard deviation of epistasis values (σ_epistasis) in this motif is 0.05. (B) Six possible gene orders are listed. The first is the observed gene order on chromosome VIII in yeast. (C) The E–D relationship for the six possible gene orders in (B). We consider the prediction successful if one of the first two gene orders is observed, because they exhibit negative E–D correlations. In particular, the prediction is precise if the first gene order is observed. The dashed line represents E = 0. (D) A histogram of the standard deviation of epistasis values (σ_epistasis), based on which motifs are divided into two groups with similar sizes, Group S and Group L. (E) The proportions of precise predictions in Group S and Group L, and the random (R) expectations based on permutation. Error bars represent standard deviations among 1,000 permutations. (F) The proportions of successful predictions, similar to (E). (G–I) A star-like motif in the yeast genetic interaction network and the gene order on chromosome XII in yeast. The hub gene SFH1 and 19 genes with a distance <100 to SFH1 are shown. (J) A histogram of differences between the two largest epistasis values in the motif (Diff_epistasis), based on which motifs are divided into two groups with similar sizes, Group S and Group L. (K) Proportion of successful predictions of the closest gene in Group S and Group L. Their respective random expectations based on permutation are shown in gray. Error bars represent standard deviations among 1,000 permutations.


multiple gene pairs may often involve the rearrangement of genes on a chromosome.

It is also worth noting that synthetic genetic array (SGA), the experimental strategy used to generate double mutants, is based on recombination between two null alleles (Tong et al. 2001; Costanzo et al. 2010). Thus, double mutants for linked genes may have a smaller initial frequency, which may lead to inaccuracy in estimating fitness values for these mutants. To determine whether such potential experimental bias has any impact on our results, we calculated the partial E–D correlation after controlling for the double mutant fitness values. Again, we observed a strong negative E–D correlation (partial ρ = −0.17, P = 1.3 × 10⁻⁹, N = 1,254). This result was not unexpected because the fitness of double mutants and gene distance were only weakly correlated (ρ = 0.086, P = 0.002, N = 1,254). More discussion on the caveats in the analysis of empirical data is included in supplementary note S2, Supplementary Material online.
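The text does not state how the partial correlation was computed. One common choice, given here only as an assumption, is the first-order partial correlation applied to Spearman coefficients, where F denotes the double mutant fitness (a symbol introduced for this illustration):

\[
\rho_{ED \cdot F} = \frac{\rho_{ED} - \rho_{EF}\,\rho_{DF}}{\sqrt{\left(1 - \rho_{EF}^{2}\right)\left(1 - \rho_{DF}^{2}\right)}}
\]

Under this formula, a weak correlation between F and D (small ρ_DF) leaves the numerator close to ρ_ED and the second factor under the square root close to 1, so controlling for F can change the E–D correlation only modestly.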

In a recent study, 1,800 genetic suppression interactions were identified in the budding yeast (van Leeuwen et al. 2016), which can be regarded as an extreme form of positive genetic interactions. Among them, nine gene pairs are tightly linked (D ≤ 2; supplementary table S3, Supplementary Material online). This number is significantly greater than the random expectation (permutation test, P = 0.01), again supporting our model.

The selection on clusters of locally adapted alleles, which sometimes were captured in chromosomal inversions, has been studied previously (Yeaman 2013; Kirkpatrick 2017). In our study, we proposed a hypothesis (the negative E–D correlation) on deleterious alleles and tested this hypothesis with analyses of empirical data and simulation, both in the context of genetic interaction networks. Therefore, our results provide additional clues for understanding the basic principles of genome organization. In particular, the origin and maintenance of clusters of genes in the same metabolic pathway, which have puzzled scientists for years (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011), could be well explained by genetic interactions. Genes in a linear metabolic pathway exhibit positive epistasis because 1) double mutants and single mutants have identical impacts on destroying the function of the pathway (He et al. 2010), and 2) the accumulation of deleterious intermediate products resulting from a loss-of-function mutation in a downstream gene of a metabolic pathway may be prevented by a loss-of-function mutation in an upstream gene (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011). Because genetic linkage is advantageous among genes with positive epistasis, the clustering of genes in the same linear metabolic pathway, such as genes in the galactose utilization pathway or allantoin degradation pathway, is favored by natural selection (Wong and Wolfe 2005; Slot and Rokas 2010).

Many factors have been reported to influence the evolution of gene order, such as tandem gene duplication, position effects on gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016), coordinated gene expression among neighboring genes (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004), and clustering of functionally related genes (Wong and Wolfe 2005; Slot and Rokas 2010), among many others. In this study, we provided evidence that, in addition to these factors, the genetic interaction network also played an important role in driving the evolution of gene order. Because the empirical data of epistasis (fitness values) are available in the budding yeast, an evolutionary simulation integrating many genetic interactions is possible, which makes epistasis unique among all factors driving the evolution of gene order. Based on the empirical data of the yeast genetic interaction network, our simulation indicates that the selective coefficient is on the order of 10⁻⁷, suggesting that epistasis may play an important role in determining gene order in yeast, a species with a relatively large effective population size (10⁷). A negative E–D correlation is expected to be observed in species with a smaller effective population size only if the range of epistasis is larger than that in yeast (figs. 2 and 3). Because genome-wide empirical data of epistasis are unavailable in species with a smaller effective population size (e.g., humans and flies), further investigation is required to determine whether the genetic interaction network plays a role in the evolution of gene order in these species.

Materials and Methods

Genomes

The genome annotation of S. cerevisiae was downloaded from the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org, version R64). The gene order of the reconstructed ancestor before the WGD (Gordon et al. 2009) was downloaded from the Yeast Gene Order Browser (YGOB) (Byrne and Wolfe 2005).

Simulation

To examine the fitness effect of gene distance (D), two-locus dynamics under selection were simulated for populations with different D (from 0 to 100 in 1-gene increments) as well as a series of epistasis values (E, from −0.004 to 0.004 in 0.001 increments) between two loci. The recombination frequency (R) between these two loci was estimated from D using the equation below:

\[ R = D \times 0.004 + 0.064 \]

which is the linear relationship estimated from the distance between two genes (D) and the empirical recombination frequencies (R) between them quantified in a previous study (Mancera et al. 2008). The simulation was performed in haploid organisms to be consistent with the analyses of empirical data, in which epistasis values were quantified in haploid yeast (Costanzo et al. 2010, 2016). A and B are wild-type alleles of two di-allelic loci on the same chromosome; a and b are their deleterious alleles, with fitness values of ω_aB = ω_Ab = 0.992. The fitness of the wild type (ω_AB) was defined as 1, and the epistasis E was defined as ω_ab − ω_aB ω_Ab. The initial allele frequencies of a and b were both 0.1. The two loci were initially under linkage equilibrium, and therefore the frequencies of AB (X_AB), Ab (X_Ab), aB (X_aB), and ab (X_ab) were 0.81, 0.09, 0.09, and 0.01, respectively. After random mating,


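selection, and recombination, the frequencies of the four genotypes in the next generation were calculated with the following equations (Nei 1967):

\[ X'_{AB} = \frac{X_{AB}\,\omega_{AB}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}} \]

\[ X'_{Ab} = \frac{X_{Ab}\,\omega_{Ab}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}} \]

\[ X'_{aB} = \frac{X_{aB}\,\omega_{aB}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}} \]

\[ X'_{ab} = \frac{X_{ab}\,\omega_{ab}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}} \]

ω̄ is the average fitness of a population and was calculated as follows:

\[ \bar{\omega} = X_{AB}\,\omega_{AB} + X_{ab}\,\omega_{ab} + X_{aB}\,\omega_{aB} + X_{Ab}\,\omega_{Ab} \]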

For each population, the average fitness (ω̄) and the frequencies of deleterious alleles (X_a and X_b) were recorded in each generation of the simulation. The difference between the average fitness of two populations with different D (D = 50 and D = 0) reflects whether linkage is favored by natural selection or not. The frequency difference of the deleterious alleles between these two populations is related to the long-term effects of recombination. We also performed the simulation with a variety of parameter values (supplementary fig. S4, Supplementary Material online).
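The two-locus recursion above can be sketched in a few lines. This is an illustrative implementation under the stated parameter values (R = D × 0.004 + 0.064, single-mutant fitness 0.992, initial deleterious allele frequency 0.1); the function and variable names are assumptions of this example, not the authors' code.

```python
def recombination_frequency(distance):
    """Linear map from gene distance D to recombination frequency R."""
    return distance * 0.004 + 0.064

def simulate(distance, epistasis, generations=100, w_single=0.992, q=0.1):
    """Iterate the haploid two-locus recursion for a fixed number of generations.

    Returns the mean fitness after the last generation and the final
    haplotype frequencies. The population starts at linkage equilibrium.
    """
    r = recombination_frequency(distance)
    w = {"AB": 1.0, "Ab": w_single, "aB": w_single,
         "ab": epistasis + w_single * w_single}  # E = w_ab - w_aB * w_Ab
    x = {"AB": (1 - q) ** 2, "Ab": (1 - q) * q,
         "aB": q * (1 - q), "ab": q * q}
    for _ in range(generations):
        w_bar = sum(x[g] * w[g] for g in x)
        imbalance = (x["AB"] * w["AB"] * x["ab"] * w["ab"]
                     - x["aB"] * w["aB"] * x["Ab"] * w["Ab"])
        shift = r * imbalance / w_bar ** 2
        x = {
            "AB": x["AB"] * w["AB"] / w_bar - shift,
            "Ab": x["Ab"] * w["Ab"] / w_bar + shift,
            "aB": x["aB"] * w["aB"] / w_bar + shift,
            "ab": x["ab"] * w["ab"] / w_bar - shift,
        }
    w_bar = sum(x[g] * w[g] for g in x)
    return w_bar, x

# Compare tightly linked (D = 0) and more distant (D = 50) loci
# under positive epistasis.
fitness_linked, freqs_linked = simulate(0, 0.004)
fitness_distant, freqs_distant = simulate(50, 0.004)
```

The sign of `fitness_linked - fitness_distant` then indicates whether linkage is favored for the chosen epistasis value.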

The average fitness of a population at the 100th generation (ω̄100) was used to infer the fitness effect of gene distance at a given epistasis value. For each epistasis value, we could identify an optimal gene distance (D_opt) as well as a series of permitted gene distances (D_permitted). The difference in ω̄100 between D_permitted and D_opt was <10⁻⁷. The mean of all D_permitted values was calculated. In addition, our simulation did not introduce new mutations to the population, and therefore the frequencies of deleterious alleles decreased over generations. Nevertheless, the negative E–D correlation was also observed at the 50th and 200th generations (supplementary fig. S3, Supplementary Material online).
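A sketch of how D_opt and D_permitted could be obtained from the simulated ω̄100 values is given below. The recursion is repeated in compact form so that the block runs on its own; the helper names and the tie-breaking rule (the smallest D among equally fit distances) are assumptions of this example.

```python
def mean_fitness_100(distance, epistasis, w_single=0.992, q=0.1):
    """Mean fitness after 100 generations of the two-locus recursion."""
    r = distance * 0.004 + 0.064
    w_ab = epistasis + w_single * w_single
    x_AB, x_Ab, x_aB, x_ab = (1 - q) ** 2, (1 - q) * q, q * (1 - q), q * q
    for _ in range(100):
        w_bar = x_AB + (x_Ab + x_aB) * w_single + x_ab * w_ab
        shift = r * (x_AB * x_ab * w_ab
                     - x_aB * w_single * x_Ab * w_single) / w_bar ** 2
        x_AB, x_Ab, x_aB, x_ab = (
            x_AB / w_bar - shift,
            x_Ab * w_single / w_bar + shift,
            x_aB * w_single / w_bar + shift,
            x_ab * w_ab / w_bar - shift,
        )
    return x_AB + (x_Ab + x_aB) * w_single + x_ab * w_ab

def permitted_distances(epistasis, tolerance=1e-7, max_distance=100):
    """Return D_opt, the list of permitted distances, and their mean."""
    fitness = [mean_fitness_100(d, epistasis) for d in range(max_distance + 1)]
    best = max(fitness)
    d_opt = fitness.index(best)  # smallest D that attains the maximum
    permitted = [d for d, f in enumerate(fitness) if best - f < tolerance]
    return d_opt, permitted, sum(permitted) / len(permitted)

d_opt_positive, permitted_positive, mean_positive = permitted_distances(0.004)
d_opt_none, permitted_none, mean_none = permitted_distances(0.0)
```

Without epistasis the recursion leaves the loci at linkage equilibrium, so every distance is permitted; with positive epistasis the permitted set is anchored at tight linkage.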

Toy Motifs

We built toy motifs in a genetic interaction network to simulate the fitness effect of gene distance. In the star-like motif (fig. 2A), a hub gene (A) interacted with nine partners (genes B, C, D, E, F, G, H, I, and J) with nine epistasis values ranging from −0.004 to 0.004 in 0.001 increments. We placed the nine partner genes on the same side of the hub gene on a chromosome. Because we focused on the epistasis between the hub gene and the partners, placing the partners on different sides would not affect the results. We fixed the chromosomal location of the hub gene, and the partner genes were placed at D = 20, 30, 40, 50, 60, 70, 80, 90, and 100 from the hub gene. The locations of the nine partners were shuffled while keeping the epistasis values unchanged. The fitness of a gene order was defined as the average ω̄100 for all nine hub-containing gene pairs. Thus, we could calculate the fitness for a total of (9! =) 362,880 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to test whether the simulation result was parameter sensitive (supplementary fig. S5, Supplementary Material online). Here, epistasis values and fitness defects were randomly chosen from the empirical data in previous studies (Costanzo et al. 2010, 2016).

For the all-connected motif (fig. 3A), five genes (A, B, C, D, and E) interacted with each other with ten epistasis values ranging from −0.0036 to 0.0036 in 0.0008 increments. The five genes were placed on the same chromosome, and the gene locations were set as 1, 19, 65, 93, and 102, so that the maximum D in the motif was 100. The locations of these five genes were shuffled, whereas the epistasis values were kept unchanged. Similar to the star-like motifs, the fitness of a gene order was defined as the average ω̄100 for all ten gene pairs. The fitness was calculated for a total of (5! =) 120 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to determine whether the simulation result was parameter sensitive (supplementary fig. S6, Supplementary Material online).
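The exhaustive evaluation of the 120 gene orders in the all-connected motif might look like the sketch below. Two points are assumptions of this example rather than statements from the text: the ten epistasis values are assigned to gene pairs in an arbitrary fixed order, and D is taken as the absolute difference between gene locations. The recursion is repeated in compact form and cached, since many orders reuse the same (D, E) combination.

```python
from functools import lru_cache
from itertools import combinations, permutations

@lru_cache(maxsize=None)
def mean_fitness_100(distance, epistasis, w_single=0.992, q=0.1):
    """Mean fitness after 100 generations of the two-locus recursion."""
    r = distance * 0.004 + 0.064
    w_ab = epistasis + w_single * w_single
    x_AB, x_Ab, x_aB, x_ab = (1 - q) ** 2, (1 - q) * q, q * (1 - q), q * q
    for _ in range(100):
        w_bar = x_AB + (x_Ab + x_aB) * w_single + x_ab * w_ab
        shift = r * (x_AB * x_ab * w_ab
                     - x_aB * w_single * x_Ab * w_single) / w_bar ** 2
        x_AB, x_Ab, x_aB, x_ab = (
            x_AB / w_bar - shift,
            x_Ab * w_single / w_bar + shift,
            x_aB * w_single / w_bar + shift,
            x_ab * w_ab / w_bar - shift,
        )
    return x_AB + (x_Ab + x_aB) * w_single + x_ab * w_ab

GENES = "ABCDE"
LOCATIONS = (1, 19, 65, 93, 102)
PAIRS = list(combinations(GENES, 2))  # the ten gene pairs
E_VALUES = [round(-0.0036 + 0.0008 * k, 4) for k in range(10)]
EPISTASIS = dict(zip(PAIRS, E_VALUES))  # arbitrary assignment (assumption)

def order_fitness(order):
    """Average mean fitness over all ten gene pairs for one gene order."""
    position = dict(zip(order, LOCATIONS))
    values = [mean_fitness_100(abs(position[a] - position[b]), e)
              for (a, b), e in EPISTASIS.items()]
    return sum(values) / len(values)

fitness_by_order = {order: order_fitness(order) for order in permutations(GENES)}
best_order = max(fitness_by_order, key=fitness_by_order.get)
```

The same pattern extends to the star-like motif by replacing the pair list with the nine hub-containing pairs and permuting the nine partner locations.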

Epistasis

Epistasis values were retrieved from the studies of Costanzo et al. (2010, 2016). Because the epistasis values of the overlapping gene pairs in these studies were highly correlated (r = 0.71, P < 10⁻¹⁰⁰, Pearson's correlation, N = 2,604,539), we merged their epistasis values. A pair of genes was filtered out if (a) the epistasis value between their null mutations was not a number (NaN), or (b) the epistasis values of the same null mutation pair had opposite signs in reciprocal crosses or in different studies. We also removed the gene pairs in which the null mutation in at least one gene resulted in a higher fitness value than that of the wild type. This filtering was performed because only a small number of antagonistic pleiotropic genes were detected in rich media (Qian et al. 2012). Thus, the elevated fitness observed upon gene deletion in Costanzo et al. was likely due to inaccurate estimation of fitness. If epistasis between a pair of genes was examined multiple times, we used the epistasis value with the smallest P value. Following Costanzo et al. (2010), epistasis values with P < 0.05 were classified as significant.

In addition to the two data sets from Costanzo et al., several other studies have also estimated epistasis values using high-throughput strategies (Boone et al. 2007). In principle, we could have included all of them in this study. However, for practical reasons, these data sets were not suitable for our study. First, it would not be appropriate to combine other data sets with that of Costanzo et al., because epistasis values from multiple studies potentially have different sources of error, especially if they followed different protocols. Second, the sample sizes of epistasis data of linked genes from other studies were not sufficiently large to examine the E–D correlation. Because the data set from Costanzo et al. represents the largest available so far, containing 27 times more


significant epistasis than the total of all other studies, we focused on these epistasis data in this study.

Recombination Frequency

Genotypes of meiosis products from 46 tetrads were downloaded from Mancera et al. (2008) at http://www.ebi.ac.uk/huber/recombination. A pair of DNA markers was filtered out if the genotypes were available in less than half of the spores (<92 spores). For each pair of markers, spores with parental genotypes (N_p) and nonparental genotypes (N_np) were counted. The recombination frequency of a pair of markers (R_m) was calculated as follows:

\[ R_m = \frac{N_{np}}{N_p + N_{np}} \]

The recombination frequency between a pair of genes (R_g) was defined as the average recombination frequency of all marker pairs within the pair of genes, which was calculated as follows:

\[ R_g = \frac{1}{I\,J} \sum_{i=1}^{I} \sum_{j=1}^{J} R_{m,ij} \]

Subscripts i and j represent the ith and jth markers in each gene, respectively, and I and J are the numbers of markers in the two genes.
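The two recombination-frequency definitions can be illustrated with a small sketch. The input layout (one list of parental-origin labels per marker, with None for a missing genotype) and the function names are assumptions of this example; the threshold of 92 spores follows the filtering rule stated above.

```python
def marker_pair_rf(marker_1, marker_2, min_spores=92):
    """Recombination frequency R_m between two markers.

    Each marker is a list with one entry per spore: a parental-origin
    label, or None when the genotype is missing. Returns None when fewer
    than `min_spores` spores are genotyped at both markers.
    """
    n_parental = n_nonparental = 0
    for g1, g2 in zip(marker_1, marker_2):
        if g1 is None or g2 is None:
            continue
        if g1 == g2:
            n_parental += 1      # both markers from the same parent
        else:
            n_nonparental += 1   # recombinant spore
    total = n_parental + n_nonparental
    if total < min_spores:
        return None
    return n_nonparental / total

def gene_pair_rf(markers_gene_1, markers_gene_2, min_spores=92):
    """R_g: average R_m over all usable marker pairs between two genes."""
    values = [marker_pair_rf(m1, m2, min_spores)
              for m1 in markers_gene_1 for m2 in markers_gene_2]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None

# Toy example with four spores and a relaxed threshold (hypothetical data).
m1 = ["S", "S", "Y", "Y"]
m2 = ["S", "Y", "Y", "Y"]
m3 = ["S", "S", "Y", None]
rf_marker = marker_pair_rf(m1, m2, min_spores=2)
rf_gene = gene_pair_rf([m1], [m2, m3], min_spores=2)
```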

Identification of Duplicate Genes

All-against-all BlastP was performed to search for duplicate genes. Gene pairs with E values < 10⁻¹⁰ were defined as duplicate genes.

Expression Similarity

Expression profiles of 6,359 genes in 40 studies were compiled by a previous study (Kafri et al. 2005). Pearson's correlation coefficient for two genes was calculated within each study in the compiled data set, and the expression similarity between these two genes was defined as the average correlation coefficient among studies.

Other Yeast Functional Genomic Data

Protein–protein interaction data were obtained from the SGD. Three-dimensional chromatin colocalization data were retrieved from Duan et al. (2010). Genome-wide gene expression noise data were downloaded from Newman et al. (2006). A list of essential genes was downloaded from the Database of Essential Genes (DEG, version 10.6) (Luo et al. 2014). Semantic similarity of Gene Ontology (GO) terms was calculated using the R package GOSemSim (version 1.22.0) (Yu et al. 2010).

Shuffling of Gene Positions or Epistasis Values

To obtain the null distribution of E–D correlation coefficients, we shuffled gene positions while keeping the genome structure of S. cerevisiae unchanged (i.e., the number of chromosomes and the number of genes on each chromosome). The epistasis value of each gene pair was also kept unchanged. Gene distances were then calculated based on the new genomic locations.
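Shuffling gene positions while preserving the number of chromosomes and the number of genes per chromosome can be sketched as follows. The toy genome, the gene names, and the definition of gene distance as the difference between gene indices on a chromosome are assumptions of this example.

```python
import random

def shuffle_genome(chromosomes, rng):
    """Randomly reassign genes to positions, keeping chromosome sizes fixed.

    `chromosomes` is a list of lists of gene names in chromosomal order.
    """
    genes = [gene for chromosome in chromosomes for gene in chromosome]
    rng.shuffle(genes)
    shuffled, start = [], 0
    for chromosome in chromosomes:
        shuffled.append(genes[start:start + len(chromosome)])
        start += len(chromosome)
    return shuffled

def gene_distance(chromosomes, gene_1, gene_2):
    """Index difference between two genes, or None if on different chromosomes."""
    location = {gene: (c, i)
                for c, chromosome in enumerate(chromosomes)
                for i, gene in enumerate(chromosome)}
    (c1, i1), (c2, i2) = location[gene_1], location[gene_2]
    return abs(i1 - i2) if c1 == c2 else None

# Toy genome with three chromosomes (hypothetical gene names).
genome = [["g1", "g2", "g3", "g4"], ["g5", "g6"], ["g7", "g8", "g9"]]
shuffled_genome = shuffle_genome(genome, random.Random(1))
```

Repeating the shuffle many times and recomputing the E–D correlation on each shuffled genome yields the null distribution.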

To estimate the relative importance of different epistasis categories (positive and negative) in shaping the E–D correlation, we shuffled epistasis values of one category while keeping the other category unchanged. Gene positions were not changed during this process.

Estimating the Predictive Accuracy of Gene Order

To examine the accuracy of our prediction of gene order, we estimated the success rate by chance (gray bars in fig. 6 and supplementary figs. S8 and S9, Supplementary Material online) by randomly assigning the positions of genes 1,000 times. For each permutation, the average proportion of successful predictions among motifs was estimated. The mean and standard deviation among the 1,000 permutations were then calculated.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Author Contributions

Y.-F.Y., W.C., and W.Q. conceived the research. Y.-F.Y., W.C., and W.Q. analyzed the data, and Y.-F.Y., S.W., and W.Q. wrote the article.

Acknowledgments

We thank Bin He and Weiwei Zhai for discussion. This work was supported by grants from the National Natural Science Foundation of China to W.Q. (91731302).

References

Batada NN, Hurst LD. 2007. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet. 39(8):945–949.

Boone C, Bussey H, Andrews BJ. 2007. Exploring genetic interactions and networks with yeast. Nat Rev Genet. 8(6):437–449.

Boutanaev AM, Kalmykova AI, Shevelyov YY, Nurminsky DI. 2002. Large clusters of co-expressed genes in the Drosophila genome. Nature 420(6916):666–669.

Byrne KP, Wolfe KH. 2005. The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 15(10):1456–1461.

Charlesworth B. 1990. Mutation-selection balance and the evolutionary advantage of sex and recombination. Genet Res. 55(3):199–221.

Charlesworth D, Charlesworth B. 2011. Mimicry: the hunting of the supergene. Curr Biol. 21(20):R846–R848.

Chen X, Zhang J. 2016. The genomic landscape of position effects on protein expression level and noise in yeast. Cell Syst. 2(5):347–354.

Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, et al. 1998. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2(1):65–73.

Cohen BA, Mitra RD, Hughes JD, Church GM. 2000. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet. 26(2):183–186.

Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, et al. 2010. The genetic landscape of a cell. Science 327(5964):425–431.

Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, et al. 2016. A global genetic interaction network maps a wiring diagram of cellular function. Science 353(6306):aaf1420.

Dean EJ, Davis JC, Davis RW, Petrov DA. 2008. Pervasive and persistent redundancy among duplicated genes in yeast. PLoS Genet. 4(7):e1000113.

Epistasis-Driven Evolution of Gene Order . doi:10.1093/molbev/msx264 MBE

3265Downloaded from httpsacademicoupcommbearticle-abstract341232544318638by Institute of Geneties and Developmental BiologyCAS useron 03 December 2017

DeLuna A, Vetsigian K, Shoresh N, Hegreness M, Colon-Gonzalez M, Chao S, Kishony R. 2008. Exposing the fitness contribution of duplicated genes. Nat Genet. 40(5):676–681.

Dixon SJ, Fedyshyn Y, Koh JL, Prasad TS, Chahwan C, Chua G, Toufighi K, Baryshnikova A, Hayles J, Hoe KL, et al. 2008. Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc Natl Acad Sci U S A. 105(43):16653–16658.

Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, Shendure J, Fields S, Blau CA, Noble WS. 2010. A three-dimensional model of the yeast genome. Nature 465(7296):363–367.

Eshel I, Feldman MW. 1970. On the evolutionary effect of recombination. Theor Popul Biol. 1(1):88–100.

Feldman MW, Christiansen FB, Brooks LD. 1980. Evolution of recombination in a constant environment. Proc Natl Acad Sci U S A. 77(8):4838–4841.

Ghanbarian AT, Hurst LD. 2015. Neighboring genes show correlated evolution in gene expression. Mol Biol Evol. 32(7):1748–1766.

Gordon JL, Byrne KP, Wolfe KH. 2009. Additions, losses, and rearrangements on the evolutionary route from a reconstructed ancestor to the modern Saccharomyces cerevisiae genome. PLoS Genet. 5(5):e1000485.

He X, Qian W, Wang Z, Li Y, Zhang J. 2010. Prevalent positive epistasis in Escherichia coli and Saccharomyces cerevisiae metabolic networks. Nat Genet. 42(3):272–276.

Hurst LD, Pal C, Lercher MJ. 2004. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 5(4):299–310.

Kafri R, Bar-Even A, Pilpel Y. 2005. Transcription control reprogramming in genetic backup circuits. Nat Genet. 37(3):295–299.

Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428(6983):617–624.

Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423(6937):241–254.

Kirkpatrick M. 2017. The evolution of genome structure by natural and sexual selection. J Hered. 108(1):3–11.

Kondrashov AS. 1982. Selection against harmful mutations in large sexual and asexual populations. Genet Res. 40(3):325–332.

Kondrashov AS. 1988. Deleterious mutations and the evolution of sexual reproduction. Nature 336(6198):435–440.

Kouyos RD, Silander OK, Bonhoeffer S. 2007. Epistasis between deleterious mutations and the evolution of recombination. Trends Ecol Evol. 22(6):308–315.

Lang GI, Botstein D. 2011. A test of the coordinated expression hypothesis for the origin and maintenance of the GAL cluster in yeast. PLoS One 6(9):e25290.

Lawrence J. 1999. Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev. 9(6):642–648.

Lawrence JG. 2002. Shared strategies in gene organization among prokaryotes and eukaryotes. Cell 110(4):407–413.

Lercher MJ, Urrutia AO, Hurst LD. 2002. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 31(2):180–183.

Liao BY, Zhang J. 2008. Coexpression of linked genes in Mammalian genomes is generally disadvantageous. Mol Biol Evol. 25(8):1555–1565.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458(7236):337–341.

Luo H, Lin Y, Gao F, Zhang CT, Zhang R. 2014. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res. 42(Database issue):D574–D580.

Mancera E, Bourgon R, Brozzi A, Huber W, Steinmetz LM. 2008. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454(7203):479–485.

Musso G, Costanzo M, Huangfu M, Smith AM, Paw J, San Luis BJ, Boone C, Giaever G, Nislow C, Emili A, et al. 2008. The extensive and condition-dependent nature of epistasis among whole-genome duplicates in yeast. Genome Res. 18(7):1092–1099.

Nei M. 1967. Modification of linkage intensity by natural selection. Genetics 57(3):625–641.

Nei M. 1969. Linkage modifications and sex difference in recombination. Genetics 63(3):681–699.

Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, Weissman JS. 2006. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441(7095):840–846.

Pal C, Hurst LD. 2003. Evidence for co-evolution of gene order and recombination rate. Nat Genet. 33(3):392–395.

Phillips PC. 2008. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 9(11):855–867.

Qian W, Liao BY, Chang AY, Zhang J. 2010. Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet. 26(10):425–430.

Qian W, Ma D, Xiao C, Wang Z, Zhang J. 2012. The genomic landscape and evolutionary resolution of antagonistic pleiotropy in yeast. Cell Rep. 2(5):1399–1410.

Qian W, Zhang J. 2008. Evolutionary dynamics of nematode operons: easy come, slow go. Genome Res. 18(3):412–421.

Qian W, Zhang J. 2014. Genomic evidence for adaptation by gene duplication. Genome Res. 24(8):1356–1362.

Schacherer J, Shapiro JA, Ruderfer DM, Kruglyak L. 2009. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458(7236):342–345.

Slot JC, Rokas A. 2010. Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi. Proc Natl Acad Sci U S A. 107(22):10136–10141.

Spellman PT, Rubin GM. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol. 1(1):5.

Tischler J, Lehner B, Chen N, Fraser AG. 2006. Combinatorial RNA interference in Caenorhabditis elegans reveals that redundancy between gene duplicates can be maintained for more than 80 million years of evolution. Genome Biol. 7(8):R69.

Tischler J, Lehner B, Fraser AG. 2008. Evolutionary plasticity of genetic interaction networks. Nat Genet. 40(4):390–391.

Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, et al. 2001. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294(5550):2364–2368.

van Leeuwen J, Pons C, Mellor JC, Yamaguchi TN, Friesen H, Koschwanez J, Usaj MM, Pechlaner M, Takar M, Usaj M, et al. 2016. Exploring genetic suppression interactions on a global scale. Science 354(6312):aag0839.

VanderSluis B, Bellay J, Musso G, Costanzo M, Papp B, Vizeacoumar FJ, Baryshnikova A, Andrews B, Boone C, Myers CL. 2010. Genetic interactions reveal the evolutionary trajectories of duplicate genes. Mol Syst Biol. 6:429.

Vavouri T, Semple JI, Lehner B. 2008. Widespread conservation of genetic redundancy during a billion years of eukaryotic evolution. Trends Genet. 24(10):485–488.

Wagner A. 2005. Energy constraints on the evolution of gene expression. Mol Biol Evol. 22(6):1365–1374.

Williams EJ, Bowles DJ. 2004. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 14(6):1060–1067.

Wong S, Wolfe KH. 2005. Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat Genet. 37(7):777–782.

Wu CI, Ting CT. 2004. Genes and speciation. Nat Rev Genet. 5(2):114–122.

Yeaman S. 2013. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proc Natl Acad Sci U S A. 110(19):E1743–E1751.

Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S. 2010. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26(7):976–978.

Zaslaver A, Baugh LR, Sternberg PW. 2011. Metazoan operons accelerate recovery from growth-arrested states. Cell 145(6):981–992.


(ω_ab) and the multiplicative expectation from those of two single mutants (ω_aB ω_Ab). Previous theories could be summarized as two effects: the short-term effect and the long-term effect (fig. 1A). On the one hand, the linkage of genetically interacting genes (regardless of sign) is advantageous in the short term because it helps to maintain the epistasis-induced linkage disequilibrium (LD), which is favored by natural selection (Nei 1967, 1969; Eshel and Feldman 1970; Feldman et al. 1980; Kouyos et al. 2007; Charlesworth and Charlesworth 2011) (fig. 1A, the solid line; supplementary note S1, Supplementary Material online). Here, the coefficient of LD is defined as the difference between the genotype frequency of the wild-type individuals (X_AB) and the multiplicative expectation from the frequencies of the wild-type alleles (X_A X_B). On the other hand, for two deleterious mutations, genetic recombination breaks the negative LD induced by negative epistasis and thus increases the proportion of double mutants. This increase benefits the population in the long term by facilitating the purge of deleterious mutations, which has been suggested to be related to the origin of sexual reproduction (Feldman et al. 1980; Kondrashov 1982; Charlesworth 1990) (fig. 1A, the dashed line).

To summarize, genetic recombination is never favored by natural selection for positively epistatic gene pairs, whereas for negatively epistatic gene pairs, the long- and short-term effects of genetic recombination counteract each other (fig. 1A). Therefore, epistasis could drive the evolution of recombination frequencies among genes on the same chromosome, potentially by altering gene order. In this process, a negative correlation between epistasis and gene distance, hereafter referred to as the E–D correlation, evolves. Furthermore, this correlation would be especially strong among positively epistatic gene pairs due to the synergistic combination of the long- and short-term effects (fig. 1A). However, these theoretical predictions have never been rigorously tested with empirical data. In this study, we tested these predictions with both simulation and yeast empirical epistasis data. Our study thus reveals a basic principle in the evolution of gene order and enhances our power to decode information from well-constructed genetic interaction networks and thousands of sequenced genomes.

Results

Negative E–D Correlation Is Observed during in silico Evolution

We first performed in silico evolution in which two genes (A and B) were considered, each having a wild-type allele (A or B) and a deleterious allele (a or b). The relative fitness of the haploid wild-type genotype (ω_AB) was defined as 1, and the epistasis (E) was defined as ω_ab − ω_aB ω_Ab. For two genotypes with different gene distances (D) between A and B, gene flow within this region is strictly prohibited because recombination within the region between A and B leads to the gain or loss of genes after segregation (Wu and Ting 2004). Furthermore, the "modifier" locus of recombination frequency is completely linked with A and B (Nei 1967, 1969), making it possible to directly compare the fitness of genotypes with different gene distances. Therefore, to investigate the impact of gene distance on fitness, we compared the average fitness of two populations over generations, one with D between A and B equal to 50 and the other with D equal to 0. Based on the empirical data from the budding yeast S. cerevisiae (Mancera et al. 2008), these D values correspond to recombination frequencies R = 0.264 and 0.064, respectively (supplementary fig. S1, Supplementary Material online). In each generation, we calculated the frequency changes of genotypes by considering both natural selection and genetic recombination. Figure 1B shows the results of the first 100 generations of in silico evolution, when long-term effects begin to dominate the evolutionary process. If epistasis between A and B was positive, the population with D = 50 was always outcompeted by the population with D = 0 (fig. 1B, ω_(D=50) − ω_(D=0) < 0). Furthermore, the fitness difference increased with the magnitude of the epistasis value (fig. 1B). In other words, reduced D between two genes with positive epistasis is favored by natural selection. By contrast, if the epistasis between A and B was negative, the population with D = 50 exhibited a short-term disadvantage followed by a long-term advantage compared with the population with D = 0 (fig. 1B). As expected, the long-term advantage was due to an elevated purging rate of deleterious alleles [fig. 1C, X_a (D = 50) − X_a (D = 0) < 0]. A similar trend was observed when we compared a population with D = 50 and one with D = 100 (R = 0.464; supplementary fig. S2, Supplementary Material online).
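The two-locus comparison described above can be illustrated with a small deterministic simulation. The sketch below is not the authors' implementation. It assumes a haploid, infinitely large population without mutation that starts in linkage equilibrium, and it applies selection followed by recombination in each generation. The selection coefficient `s` and the initial deleterious-allele frequency `q0` are hypothetical values chosen for illustration; only the recombination frequencies 0.064 and 0.264 and the epistasis definition are taken from the text.

```python
def simulate(epistasis, r, s=0.05, q0=0.5, generations=100):
    """Return the mean fitness of the population in each generation.

    Assumed, not taken from the source: selection coefficient s for each
    single mutant, initial deleterious-allele frequency q0 at both loci,
    linkage equilibrium at the start, no mutation, infinite population.
    """
    w = {"AB": 1.0, "Ab": 1.0 - s, "aB": 1.0 - s}
    w["ab"] = w["aB"] * w["Ab"] + epistasis        # E = w_ab - w_aB * w_Ab
    x = {"AB": (1 - q0) ** 2, "Ab": (1 - q0) * q0,
         "aB": q0 * (1 - q0), "ab": q0 ** 2}
    mean_fitness = []
    for _ in range(generations):
        w_bar = sum(x[g] * w[g] for g in x)
        mean_fitness.append(w_bar)
        x = {g: x[g] * w[g] / w_bar for g in x}     # natural selection
        ld = x["AB"] * x["ab"] - x["Ab"] * x["aB"]  # equals X_AB - X_A * X_B
        x["AB"] -= r * ld                           # recombination erodes LD
        x["ab"] -= r * ld
        x["Ab"] += r * ld
        x["aB"] += r * ld
    return mean_fitness


# Positive epistasis: compare loose linkage (D = 50) with tight linkage (D = 0).
tight = simulate(epistasis=0.004, r=0.064)
loose = simulate(epistasis=0.004, r=0.264)
difference = [a - b for a, b in zip(loose, tight)]
```

Under these assumptions, `difference` stays at or below zero for positive epistasis, which matches the direction of the effect reported in figure 1B. The magnitudes depend on the assumed `s` and `q0` and should not be compared with the published values.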

We also performed in silico evolution in a series of strains in which D varied between 0 and 100 (R = 0.064 and 0.464, respectively). We recorded the average fitness of each population at the 100th generation (ω_100) and identified the optimal D, D_opt, for each epistasis value (fig. 1D). Given that the effective population size (N_e) in the budding yeast is ~10^7 (Wagner 2005), the minimal selection coefficient that can be detected for yeast is ~10^−7. In other words, all D values that reduce the relative fitness by < 10^−7 are permitted during evolution. We calculated the mean of all permitted D values (dashed line in fig. 1D). As predicted in our model, a strong negative E–D correlation was observed from the results of in silico evolution. Importantly, such a negative E–D correlation was also observed at the 50th and 200th generations (supplementary fig. S3, Supplementary Material online). To further test whether the outcome of in silico evolution was sensitive to population genetics parameters, we examined various values for initial allele frequencies and fitness defects. We observed a negative E–D correlation with all parameter sets (supplementary fig. S4, Supplementary Material online).
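The rule for "permitted" gene distances can be written compactly. The helper below is a sketch under stated assumptions: the function name and the input format (a dictionary from D to the fitness at the 100th generation, for one epistasis value) are hypothetical, and the threshold of 10^−7 is the value given in the text.

```python
def permitted_distances(fitness_by_distance, threshold=1e-7):
    """Return (D_opt, sorted permitted D values, mean of permitted D values).

    A distance is permitted when its fitness is lower than the fitness at
    D_opt by less than `threshold`.
    """
    d_opt = max(fitness_by_distance, key=fitness_by_distance.get)
    w_opt = fitness_by_distance[d_opt]
    permitted = sorted(d for d, w in fitness_by_distance.items()
                       if w_opt - w < threshold)
    return d_opt, permitted, sum(permitted) / len(permitted)


# Made-up fitness values for four distances.
toy = {0: 0.9, 10: 0.9 - 5e-8, 20: 0.9 - 2e-7, 30: 0.8}
d_opt, permitted, mean_d = permitted_distances(toy)
```

In this toy input, only the distances whose fitness lies within the threshold of the optimum contribute to the mean, which corresponds to the dashed line in figure 1D.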

Chromosomal Arrangement of Genes in Star-Like Motifs of Genetic Interaction Networks

The analyses we have described so far were based on two-locus processes. In reality, however, a gene may have genetic interactions with multiple genes that together form a complex genetic interaction network (Boone et al. 2007). In sharp contrast to the network topology, genes are linearly aligned on a limited number of chromosomes, and therefore, the optimization of pairwise gene distances may be restricted by the chromosomal localization of other genes. Thus, it remains unknown whether the negative E–D correlation would evolve in the context of a highly connected network of genetic interactions.

We first examined the impact of epistasis on the chromosomal order of genes in a star-like motif, which is typical in empirical genetic interaction networks (Costanzo et al. 2010). To this end, we built a toy motif in which a hub gene interacts with nine partner genes with different epistasis values, ranging from −0.004 to 0.004 (fig. 2A). We fixed the chromosomal location of the hub gene and attempted to place partner genes on the same chromosome. We focused on the epistasis and gene distance of hub-containing gene pairs, and therefore, we placed all partner genes on the same side of the hub gene on the chromosome for convenience. We calculated ω_100 for each of the total (9! =) 362,880 possible gene orders and found that ω_100 varied among them (fig. 2B). Importantly, the gene order with the highest ω_100 showed a perfect negative E–D correlation (ρ_ED = −1, fig. 2C), whereas the gene order with the lowest ω_100 showed a perfect positive E–D correlation (ρ_ED = 1, fig. 2D). In fact, we found that ω_100 was negatively correlated with ρ_ED (fig. 2B, ρ = −0.94, P < 10^−100, Spearman's correlation), implying that the negative E–D correlation itself is under natural selection. To understand whether the negative correlation between ω_100 and ρ_ED is still present under the parameters derived from empirical data, we randomly chose epistasis values and fitness defects from two genome-wide studies in the budding yeast (Costanzo et al. 2010, 2016) and still observed strong negative correlations between ω_100 and ρ_ED (supplementary fig. S5A and B and table S1, Supplementary Material online). We also confirmed that the negative correlation between ω_100 and ρ_ED was insensitive to D and initial allele frequencies (supplementary fig. S5C and D and table S1, Supplementary Material online).
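The enumeration of gene orders and the calculation of ρ_ED for a star-like motif can be sketched as follows. This example only computes ρ_ED and does not model fitness, because the multilocus fitness calculation is not specified in this section. The five-partner motif and its epistasis values are hypothetical (the toy motif in the text has nine partners), and the position index of a partner is used as a stand-in for its distance from the hub.

```python
from itertools import permutations


def rho_ed(order, epistasis):
    """Spearman's rho between epistasis with the hub and distance from the hub.

    `order` lists the partner genes from nearest to farthest from the hub.
    The shortcut formula used here is valid only without ties, which holds
    when all epistasis values are distinct.
    """
    n = len(order)
    by_e = sorted(order, key=lambda g: epistasis[g])
    e_rank = {g: i + 1 for i, g in enumerate(by_e)}
    d2 = sum((e_rank[g] - (pos + 1)) ** 2 for pos, g in enumerate(order))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical epistasis values between the hub and five partner genes.
epistasis = {"B": 0.004, "C": 0.002, "D": 0.0, "E": -0.002, "F": -0.004}
rho_by_order = {order: rho_ed(order, epistasis)
                for order in permutations(epistasis)}
best = min(rho_by_order, key=rho_by_order.get)
```

The order stored in `best` places the partner with the most positive epistasis next to the hub and the partner with the most negative epistasis farthest away, which is the arrangement with ρ_ED = −1.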

FIG. 1. Theoretical prediction and computational simulations of the negative E–D correlation. (A) The short- and long-term effects are in the same direction when epistasis is positive but are in opposite directions when epistasis is negative. Thus, a negative E–D correlation can be predicted from the view of population genetics. (B) Fitness differences between a strain with D = 50 and a strain with D = 0 are plotted over 100 generations during simulations of in silico evolution. (C) The difference in the frequency of allele a (X_a) between strains with D = 50 and D = 0 is plotted over 100 generations during simulations of in silico evolution. (D) The average fitness of a population at the 100th generation (ω_100) is plotted against epistasis and D. For each epistasis value, we defined all permitted D values from their resulting ω_100: D is permitted if its ω_100 is smaller than that of the optimal distance (D_opt) by < 10^−7, the minimal selection coefficient that can be detected by natural selection given the effective population size (N_e ≈ 10^7) of yeast. The mean of all permitted D values is plotted against epistasis (dashed line).

Next, for each possible gene order, we calculated the distance (d) to the fittest gene order shown in figure 2C, which was defined as the number of differently placed genes (fig. 2E). We found that an increase in d reduced ω_100 (fig. 2E), again emphasizing the impact of gene order on fitness. To further investigate the impact of the range of epistasis on ω_100, we generated a series of epistasis ranges from 0.004 to 0.020, shuffled the gene order, and recalculated the average ω_100 of all gene orders with the same d (fig. 2E). We found that although it was always true that higher d led to a reduction of ω_100, the increase in the range of epistasis values enlarged the fitness differences among gene orders (fig. 2E).
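The distance d between a gene order and the fittest gene order, defined above as the number of differently placed genes, amounts to a position-by-position comparison. The snippet below is a minimal sketch; the function name is hypothetical, and the reference order "BCDEFGHIJ" is used only as a stand-in for the fittest order.

```python
def order_distance(order, reference):
    """Number of positions at which two gene orders place different genes."""
    if sorted(order) != sorted(reference):
        raise ValueError("both orders must contain the same genes")
    return sum(a != b for a, b in zip(order, reference))


reference = "BCDEFGHIJ"                             # stand-in for the fittest order
d_swap = order_distance("CBDEFGHIJ", reference)     # two genes exchanged
d_cycle = order_distance("CDBEFGHIJ", reference)    # three genes rotated
```

Because moving one gene always displaces at least one other gene, d = 1 cannot occur under this definition; the smallest nonzero distance is 2.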

Our model further predicted that positive epistasis should play a more important role in the evolution of gene order (fig. 1), such that altering the order of partner genes that have positive epistasis with the hub gene would lead to a larger fitness reduction. As expected, we observed that the gene-order variants with changes exclusively to the positively epistatic genes generally had a larger reduction in fitness compared with those with changes exclusively to the negatively epistatic genes (fig. 2F).

FIG. 2. The negative E–D correlation in star-like motifs of genetic interaction networks. (A) A toy model of a star-like motif, in which gene A is the hub. Gene A has positive epistasis with genes B, C, D, and E and negative epistasis with genes G, H, I, and J. The range of epistasis is (0.004 − [−0.004] =) 0.008 in this motif. (B) Spearman's correlation coefficient between epistasis and D (ρ_ED) varies among gene orders. The average fitness at the 100th generation (ω_100) is negatively correlated with ρ_ED. (C) The gene order with the highest ω_100; ρ_ED = −1. (D) The gene order with the lowest ω_100; ρ_ED = 1. (E) The difference between a gene order and the gene order with the highest ω_100 (d) is defined as the number of differently placed genes. Two examples with d = 3 are shown. The heat map of the relative ω_100 (normalized to the highest ω_100) is shown. The average relative ω_100 decreases with the increase of d. The reduction is more dramatic when the range of epistasis values is larger. (F) Shuffling among genes B, C, D, and E (positive epistasis with the hub gene A) has a larger impact on ω_100 than shuffling among genes G, H, I, and J (negative epistasis with the hub gene A). P values of one-tailed Mann–Whitney U tests are shown. The gray dashed line indicates the ω_100 of the optimal gene order in panel (C).

Chromosomal Arrangement of Genes in All-Connected Motifs of Genetic Interaction Networks

We further examined the negative correlation between ω_100 and ρ_ED in all-connected motifs. To this end, we built a toy all-connected motif with five nodes and assigned epistasis values in the range of −0.0036 to 0.0036 to edges (fig. 3A). Again, we observed a strong negative correlation between ω_100 and ρ_ED (fig. 3B, ρ = −0.87, P = 3.2 × 10^−38). The gene order with the highest fitness had a strong negative E–D correlation (ρ_ED = −1.00, fig. 3C), whereas the gene order with the lowest fitness had a positive E–D correlation (ρ_ED = 0.39, fig. 3D). Similarly, we confirmed that the negative correlation between ω_100 and ρ_ED was insensitive to epistasis values, fitness defects, gene distances, and initial allele frequencies (supplementary fig. S6 and table S2, Supplementary Material online). Furthermore, we observed that the fitness of a gene order decreased with the increase in its d (fig. 3E), and this trend was stronger when the range of epistasis was larger (fig. 3E).

Negative E–D Correlation in S. cerevisiae

To investigate whether the negative E–D correlation is supported by empirical evidence, we retrieved the pairwise epistasis data generated by Costanzo et al., who systematically measured the vegetative growth rates of both single and double mutants in the budding yeast S. cerevisiae and estimated epistasis values for 26 million gene pairs (Costanzo et al. 2010, 2016). As expected, we observed a significant negative E–D correlation among linked genes (fig. 4A, ρ = −0.15, P = 4.0 × 10^−8, N = 1254). Consistent with this trend, unlinked genes on the same chromosome exhibited lower epistasis values (fig. 4A, gray dashed line). As a control, we permuted the gene orders and recalculated the correlation coefficients 1000 times. We found that the negative E–D correlation disappeared after permutation (fig. 4B, P < 0.001, permutation test).

These observations are potentially attributable to a number of confounding factors. The first is mutational bias, such as tandem duplication. However, duplicate genes tend to have negative epistasis (Tischler et al. 2006; Dean et al. 2008; DeLuna et al. 2008; Musso et al. 2008; Vavouri et al. 2008; Qian et al. 2010), which should result in a positive E–D correlation. Nevertheless, we controlled for this mutational bias by randomly keeping only one gene in a gene family and still observed the negative E–D correlation (supplementary fig. S7A, Supplementary Material online; ρ = −0.17, P = 1.8 × 10^−5, N = 641).

Second, genes with coordinated expression are clustered (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004). If coordinately expressed genes tend to have positive epistasis, the negative E–D correlation could result from these genes. To control for this effect, we first inferred expression pattern similarity for each pair of genes by calculating the correlation of gene expression levels in multiple conditions (Qian and Zhang 2014). We did not observe a significant correlation between epistasis and expression similarity (ρ = −0.015, P = 0.6, N = 1202). Nevertheless, we divided these gene pairs into two groups according to expression similarity, recalculated the E–D correlation within each group, and still observed significant negative E–D correlations (supplementary fig. S7B and C, Supplementary Material online). Similar results were obtained when we calculated the partial E–D correlation after controlling for expression similarity (partial ρ = −0.14, P = 6.6 × 10^−7, N = 1202). In addition, we found that coordinated gene expression occurring through 3D chromatin interactions did not confound our results (supplementary fig. S7D, Supplementary Material online; ρ = −0.17, P = 8.7 × 10^−6, N = 704), which was not unexpected, as 3D chromatin interactions do not influence recombination frequency.
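A partial rank correlation of the kind reported above can be computed from three pairwise Spearman coefficients. The sketch below uses the standard first-order partial correlation formula applied to ranks; whether the authors used exactly this estimator is an assumption, and the function names and example vectors are hypothetical.

```python
def ranks(values):
    """1-based ranks, with tied values assigned their average rank."""
    first, count = {}, {}
    for i, v in enumerate(sorted(values)):
        first.setdefault(v, i)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_spearman(x, y, z):
    """Spearman correlation between x and y after controlling for z."""
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


# Made-up values: x = epistasis, y = gene distance, z = expression similarity.
x = [1, 2, 3, 4]
y = [1, 2, 4, 3]
z = [2, 4, 1, 3]
partial = partial_spearman(x, y, z)
```

The same function applies when z is semantic similarity of GO terms or average expression noise. The formula is undefined when z is perfectly rank-correlated with x or y, a case this sketch does not handle.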

Because functionally related genes are nonrandomly distributed on chromosomes (Wong and Wolfe 2005; Slot and Rokas 2010) and functional relationships between genes may lead to epistasis, we next examined whether functional relationships could confound the negative E–D correlation. We observed similar negative E–D correlations for gene pairs with either high or low semantic similarity of GO terms in molecular functions (supplementary fig. S7E and F, Supplementary Material online), biological processes (supplementary fig. S7G and H, Supplementary Material online), and cellular components (supplementary fig. S7I and J, Supplementary Material online). Again, we calculated partial correlations controlling for semantic similarity of GO terms in molecular functions (partial ρ = −0.15, P = 7.3 × 10^−7, N = 1151), biological processes (partial ρ = −0.15, P = 3.2 × 10^−7, N = 1151), and cellular components (partial ρ = −0.15, P = 3.2 × 10^−7, N = 1151). All these results indicate that the functional relationship is not a confounding factor in the negative E–D correlation, which is not unexpected given that a large fraction of genetic interactions do not reflect functional relationships (He et al. 2010; Costanzo et al. 2016).

FIG. 3. The negative E–D correlation in all-connected motifs of genetic interaction networks. (A) A toy model of an all-connected motif. (B) The fitness at the 100th generation (ω_100) is negatively correlated with ρ_ED. (C) The gene order with the highest ω_100; ρ_ED = −1.00. (D) The gene order with the lowest ω_100; ρ_ED = 0.39. (E) The heat map of the relative ω_100 (normalized to the highest ω_100) is shown. The relative ω_100 decreases with the increase of d. The reduction is more dramatic when the range of epistasis is larger.

Finally, we investigated the impact of gene expression noise on the negative E–D correlation because it has been proposed that essential genes were colocalized in open chromatin regions to reduce gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016). To control for this effect, we first calculated the average gene expression noise (Newman et al. 2006) for each gene pair. We used the distance of each coefficient of variation (CV) to a running median of CV values (DM) to quantify gene expression noise (Newman et al. 2006) in order to minimize the effect of gene expression magnitude on gene expression noise. We observed a negative E–D correlation for gene pairs with either high or low average DM (supplementary fig. S7K and L, Supplementary Material online). Again, we calculated the partial correlation controlling for gene expression noise and still observed a negative E–D correlation (partial ρ = −0.16, P = 9.7 × 10^−3, N = 258).

Positive Epistasis Plays an Important Role in the Origin of the Negative E–D Correlation

Our model further predicted that the reduction of the distance between positively epistatic genes should play a more important role in the formation of the negative E–D correlation (figs. 1 and 2). Indeed, a significant negative E–D correlation was observed among positively epistatic gene pairs in S. cerevisiae (fig. 4C, ρ = −0.14, P = 0.0064, N = 391), whereas no significant correlation was observed among negatively epistatic gene pairs (fig. 4D, ρ = −0.025, P = 0.47, N = 863). We further verified the role of positive epistasis by shuffling epistasis values among all 391 positively epistatic gene pairs in S. cerevisiae. As expected, the E–D correlation was significantly weakened after the permutation (fig. 4E, P = 0.005, one-tailed permutation test). By contrast, no significant difference was observed after shuffling negative epistasis values (fig. 4F, P = 0.722, one-tailed permutation test), even though the latter analysis shuffled more gene pairs (N = 863).

FIG. 4. A negative E–D correlation is observed in the empirical genetic interaction network of the budding yeast S. cerevisiae, and positive epistasis plays a more important role in its formation. (A) A significant negative E–D correlation is observed in S. cerevisiae. Gene pairs are separated into bins based on D, with an equal width of five genes. The mean value of epistasis and the standard error of the mean (SEM) within each bin are shown. Spearman's correlation coefficient ρ and the corresponding P values were calculated from the raw data (N = 1254). The gray dashed line shows the average epistasis among unlinked genes (D > 100). (B) Distribution of E–D correlation coefficients in 1000 shuffled genomes. The arrow indicates the observed correlation coefficient in S. cerevisiae. (C and D) A significant negative E–D correlation is observed among positively epistatic gene pairs (N = 391) but not among negatively epistatic gene pairs (N = 863). Spearman's correlation coefficient ρ and the corresponding P values are calculated from the raw data. The dashed line shows the average epistasis among unlinked genes. (E) The distribution of correlation coefficients in 1000 artificial genomes in which values of positive epistasis are shuffled. The arrow indicates the E–D correlation coefficient in reality. (F) Similar to (E), but values of negative epistasis are shuffled. (G and H) The proportion of gene pairs with significant positive epistasis is significantly correlated with D, but that with significant negative epistasis is not. SEMs are estimated based on the binomial distribution. The dashed lines show the proportion of gene pairs with significant positive or negative epistasis among unlinked genes.

In studies by Costanzo et al., epistasis was classified into three categories: significantly positive, significantly negative, and nonsignificant (Costanzo et al. 2010, 2016). A strong negative correlation was observed between gene distance and the proportion of gene pairs with significant positive epistasis (fig. 4G, ρ = −0.82, P = 5.5 × 10^−5), whereas no significant correlation was observed between gene distance and the proportion of gene pairs with significant negative epistasis (fig. 4H, ρ = 0.43, P = 0.088). All these observations emphasize the important role of positive epistasis in the origin of the negative E–D correlation.
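The proportions analyzed above, together with binomial standard errors, can be computed per distance bin as in the following sketch. The input format (a list of gene pairs given as a distance D and a flag for significant positive epistasis) and the function name are hypothetical. The bin width of five genes follows the binning described for figure 4A and is assumed here to apply to these proportions as well.

```python
from collections import defaultdict
from math import sqrt


def proportion_by_distance(pairs, bin_width=5):
    """Map each bin start to (proportion, binomial SEM, number of pairs).

    `pairs` is an iterable of (D, is_significant) tuples; the SEM of a
    proportion p estimated from n pairs is sqrt(p * (1 - p) / n).
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [n pairs, n significant]
    for d, significant in pairs:
        entry = counts[d // bin_width]
        entry[0] += 1
        entry[1] += bool(significant)
    result = {}
    for b in sorted(counts):
        n, k = counts[b]
        p = k / n
        result[b * bin_width] = (p, sqrt(p * (1 - p) / n), n)
    return result


# Made-up gene pairs: (distance, has significant positive epistasis).
toy_pairs = [(0, True), (1, False), (2, True), (4, False), (7, True)]
summary = proportion_by_distance(toy_pairs)
```

The per-bin proportions can then be correlated with the bin positions to obtain a coefficient comparable to those reported for figure 4G and H.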

Epistasis-Driven Evolution of Gene Order after the Whole Genome Duplication in Yeast

Thus far, we have observed a negative E–D correlation and demonstrated that the correlation was mainly attributable to positively epistatic gene pairs. The aforementioned theories, simulations, and empirical evidence led us to propose a hypothesis of epistasis-driven evolution of gene order in S. cerevisiae. The whole genome duplication (WGD, fig. 5A) and the subsequent extensive gene losses substantially changed the epistatic relations among genes and markedly rewired the genetic interaction network in yeast (Kellis et al. 2004; Dixon et al. 2008; Tischler et al. 2008; VanderSluis et al. 2010). For instance, only 29% of the identified synthetic lethality is conserved between Schizosaccharomyces pombe and S. cerevisiae (Dixon et al. 2008). At the same time, extensive chromosome rearrangement events occurred. For example, in a reconstructed ancestral species before the WGD (Byrne and Wolfe 2005), we identified the gene pairs that were located on the same chromosome, and surprisingly, 84.6% of them are localized on different chromosomes in S. cerevisiae. More strikingly, among gene pairs that are localized on the same chromosome in both the ancestral species and S. cerevisiae, 99.6% differ in gene distance. Based on these observations, we proposed that the rewired genetic interaction network drove the evolution of gene order, resulting in numerous chromosome rearrangement events (Kellis et al. 2004). When gene losses ceased, the rewiring of genetic interactions slowed, and the evolutionary force on gene distance also diminished. Consistently, synteny relationships are strongly conserved in the species of the Saccharomyces sensu stricto group (Kellis et al. 2003).

Our model predicted that the negative E–D correlation should be weaker if the gene order in S. cerevisiae had been unchanged since the WGD. The reason is that the gene order in the ancestor was not subject to the natural selection imposed by the genetic interaction network of the current S. cerevisiae genome. Furthermore, given the massive gene losses after the WGD, the genetic interaction network cannot be 100% conserved. To test this prediction, we calculated the gene distances in the reconstructed ancestral species mentioned earlier (Byrne and Wolfe 2005). Indeed, we found that the negative E–D correlation disappeared when the gene distances in S. cerevisiae were replaced by those in the ancestral species (fig. 5B, ρ = 6.9 × 10⁻³, P = 0.26, N = 26,630). This observation indicates that the negative E–D correlation in S. cerevisiae formed during the evolution of gene order after the WGD. Consistently, we observed that positively epistatic gene pairs decreased their distances, whereas negatively epistatic gene pairs increased their distances during evolution (fig. 5C, ρ = −0.17, P = 0.061, N = 127).

To further test the role of positive epistasis in the evolution of gene order, we identified genes that were ancestrally linked (i.e., D ≤ 100 in the reconstructed ancestor) and examined whether they moved toward or away from each other during evolution. Consistent with our model, genes with significant positive epistasis were more likely to move toward each other than genes without significant epistasis (fig. 5D, P = 1.5 × 10⁻³, two-tailed Fisher's exact test), whereas the difference between gene pairs with significant negative epistasis and those without significant epistasis was not significant (fig. 5D, P = 0.41, two-tailed Fisher's exact test). Together, these observations support our hypothesis of epistasis-driven evolution of gene order in yeast.

FIG. 5. The origin of the negative E–D correlation after the WGD in yeast. (A) Phylogenetic relationship among yeast species. The black arrow indicates the reconstructed ancestor, and the gray arrow indicates the WGD event. (B) Negative E–D correlation is not observed when the gene order in S. cerevisiae is replaced by that in the reconstructed ancestor. Gene pairs are separated into bins based on D, with equal width of five genes. The mean and SEM of epistasis within each bin are shown. Spearman's correlation coefficient ρ and the corresponding P values were calculated from the raw data (N = 26,630). The dashed line shows the average epistasis among unlinked genes. (C) The change in D (S. cerevisiae minus the reconstructed ancestor) is negatively correlated with the epistasis in S. cerevisiae (N = 127). (D) The D between a gene pair in S. cerevisiae is compared with that in the reconstructed ancestor. Proportions of gene pairs moving toward and away from each other among gene pairs with significant positive epistasis (left), nonsignificant epistasis (middle), and significant negative epistasis (right) are shown.

Genetic Interaction Network Accurately Predicts Gene Order in Yeast

Finally, we determined whether the gene order in S. cerevisiae could be successfully predicted by the empirical data of genetic interaction networks (Costanzo et al. 2010, 2016). To this end, we identified 22 all-connected three-node motifs in which all three genes are localized within a 100-gene range on a chromosome. An example is shown in figure 6A, in which VMA22 positively interacts with SRB2 and negatively interacts with AIM17, and the epistatic interaction between SRB2 and AIM17 is weak. All six possible gene orders are enumerated in figure 6B. Among them, the first gene order exhibits a perfect E–D anticorrelation (ρ = −1, fig. 6C), which is exactly the prediction of our model. In fact, it is also the real order of these genes on chromosome VIII. The second gene order successfully places positively epistatic genes close to each other and negatively epistatic genes far from each other. Therefore, it is generally consistent with our prediction (fig. 6C). Either gene order was considered as being successfully predicted by our model if it actually occurred in the yeast genome.
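The enumeration of gene orders for a three-gene motif can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the epistasis values and the three chromosomal slots are hypothetical placeholders chosen only to mirror the qualitative pattern described above (one positive, one negative, and one weak interaction), and `ranks` and `spearman` are small helpers written for this example.

```python
from itertools import permutations

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's rho, computed as Pearson's correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

GENES = ("VMA22", "SRB2", "AIM17")
# Hypothetical epistasis values (placeholders, not the measured data).
EPISTASIS = {
    ("VMA22", "SRB2"): 0.06,    # positive interaction
    ("VMA22", "AIM17"): -0.05,  # negative interaction
    ("SRB2", "AIM17"): 0.005,   # weak interaction
}
# Hypothetical chromosomal slots occupied by the three genes.
SLOTS = (0, 10, 50)

def ed_correlation(order):
    """E-D correlation when the genes occupy SLOTS in the given order."""
    position = dict(zip(order, SLOTS))
    e = list(EPISTASIS.values())
    d = [abs(position[a] - position[b]) for a, b in EPISTASIS]
    return spearman(e, d)

correlations = {order: ed_correlation(order) for order in permutations(GENES)}
predicted = min(correlations, key=correlations.get)  # most negative rho
```

With these placeholder values, the order with the most negative ρ is the one that puts the positively interacting pair side by side and the negatively interacting pair farthest apart.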

We examined the accuracy of our prediction at the genomic scale. We first divided these 22 motifs into two groups based on the standard deviation of epistasis values within a motif ($\sigma_{epistasis}$). Group L contains 11 motifs with larger $\sigma_{epistasis}$, and Group S contains 11 motifs with smaller $\sigma_{epistasis}$ (fig. 6D). We found that the gene orders of seven (out of 11) motifs in Group L were precisely predicted by our model, and the accuracy was significantly higher than the random expectation (17.6%, fig. 6E, P = 0.001, permutation test). By contrast, the predictive accuracy in Group S (2 out of 11) was not significantly different from the random expectation (fig. 6E, P = 0.607, permutation test), because low variation among epistasis values in a motif reduces the selective coefficients (figs. 2E and 3E). More broadly, the proportion of successful predictions (first and second gene orders in fig. 6B) is 100% in Group L, significantly higher than the random expectation (33.3%, fig. 6F, P < 0.001, permutation test). This high predictive power suggests that epistasis plays a vital role in driving the evolution of gene order. We further identified 243 and 1,302 all-connected motifs within the range of 150 and 200 genes on the same chromosome, respectively, and again confirmed the predictive power of the genetic interaction network (supplementary fig. S8A–F, Supplementary Material online). Moreover, the predictive power of epistasis on gene order is independent of expression similarity because the latter could not accurately predict gene order (supplementary fig. S9A–I, Supplementary Material online).

We then determined whether gene order could be predicted when the genetic interaction network is incomplete. To this end, we identified 92 star-like motifs in which all genes are on the same chromosome and at least one gene lies within D ≤ 40 of the hub gene. For example, SFH1, which encodes a component of a chromatin remodeling complex, genetically interacts with 90 genes on the same chromosome (fig. 6G shows 20 genes with D < 100 to SFH1). Because the negative E–D correlation is mainly contributed by positively epistatic gene pairs in theory (figs. 1 and 2), our model predicted that genes having strong positive epistasis with SFH1 should be located close to it on the chromosome. Indeed, these genes (VRP1, FKS1, RPL26A, VPS38, DCR2) are located close to SFH1. Specifically, the gene (VRP1) having the strongest positive epistatic interaction with SFH1 was the closest gene to SFH1 on the chromosome (fig. 6H).

To examine the predictive accuracy at the genomic scale, we divided these 92 star-like motifs into two groups based on the difference between the two highest epistasis values in the motif ($Diff_{epistasis}$, fig. 6J). We found that the proportion of successful predictions was 32.6% in Group L and 21.7% in Group S, both of which were significantly higher than random expectation (fig. 6K, P < 0.001 for Group L and P = 0.001 for Group S, permutation test). Furthermore, despite the lower predictive power, our model could still predict gene locations in 707 star-like motifs in which the closest gene to the hub was within D = 90 (supplementary fig. S8G and H, Supplementary Material online, P = 0.002 for Group L and P = 0.036 for Group S, permutation test).

Discussion

In our study, we provided results from both computational simulation and empirical data analysis to support the role of genetic interaction networks in driving the evolution of gene order. We performed simulations with different sets of parameters, some of which were from empirical data or within the range of empirical data (Schacherer et al. 2009; Costanzo et al. 2010). The negative E–D correlation was observed with all parameter sets (supplementary figs. S4–S6, Supplementary Material online), consistent with both theoretical predictions (fig. 1A) and empirical results (fig. 4).

The epistasis values were estimated mainly from null alleles (Costanzo et al. 2010); thus, it remains unknown whether our conclusion applies to other deleterious alleles. Fortunately, Costanzo et al. also estimated 7,786,453 pairwise epistasis values for either decreased abundance by mRNA perturbation (DAmP) alleles or temperature-sensitive (ts) alleles with mutations typically changing coding sequences, which offered us the opportunity to address this question (Costanzo et al. 2010, 2016). We found that the epistasis values of null alleles were significantly correlated with those of other deleterious alleles of the same gene (supplementary fig. S10A–L, Supplementary Material online), suggesting that the negative E–D correlation is likely a general phenomenon for various deleterious alleles. More importantly, because DNA deletion events that lead to null alleles are frequently observed in yeast natural populations (Schacherer et al. 2009), epistasis among null alleles by itself may be sufficiently strong to drive the evolution of gene distance.

In principle, if the frequencies of deleterious alleles in a population are significantly reduced by natural selection, the advantage of genetic linkage could be small for this population. However, numerous deleterious mutations (e.g., deletion of a whole gene, nonsense mutations, missense mutations, and mutations altering start codons) have been observed in natural populations (Liti et al. 2009; Schacherer et al. 2009), suggesting that the frequencies of deleterious alleles may not be low. This phenomenon is probably due to the antagonistic pleiotropic effects of a mutation in multiple environments (Qian et al. 2012).

It is worth noting that theoretical analysis predicts that epistasis can drive the evolution of recombination frequency rather than gene distance. In addition to gene distance, 1) the lengths of intergenic regions and 2) recombination hot/cold spots could also affect recombination frequency. Nevertheless, a strong correlation between gene distance and physical distance, that is, the number of nucleotides between two genes, was observed (ρ = 0.998, P < 10⁻¹⁰⁰, N = 1,624,935), suggesting that the effect of (1) is negligible. Furthermore, physical distance is highly correlated with recombination frequency when the physical distance is < 180 kb (supplementary fig. S1C, Supplementary Material online, ρ = 0.88, P < 10⁻¹⁰⁰, N = 336,227), suggesting that the effect of (2) is also limited. As a result, recombination frequency and gene distance are highly correlated (supplementary fig. S1B, Supplementary Material online, ρ = 0.87, P < 10⁻¹⁰⁰, N = 337,892 for gene pairs with D ≤ 100). Because the recombination frequency was only measured for 58% of gene pairs in S. cerevisiae (Mancera et al. 2008) and was unknown for the reconstructed ancestral species, we used gene distance to approximate recombination frequency in this study. The optimization of gene distances among multiple gene pairs may often involve the rearrangement of genes on a chromosome.

FIG. 6. Gene orders can be predicted from the yeast genetic interaction network. (A) A three-node all-connected motif in the yeast genetic interaction network. The standard deviation of epistasis values ($\sigma_{epistasis}$) in this motif is 0.05. (B) Six possible gene orders are listed. The first is the observed gene order on chromosome VIII in yeast. (C) The E–D relationship for the six possible gene orders in (B). We consider the prediction successful if either of the first two gene orders is observed because they exhibit negative E–D correlations. In particular, the prediction is precise if the first gene order is observed. The dashed line represents E = 0. (D) A histogram of the standard deviation of epistasis values ($\sigma_{epistasis}$), based on which genes are divided into two groups with similar sizes, Group S and Group L. (E) The proportions of precise prediction of Group S, Group L, and random (R) expectations based on permutation. Error bars represent standard deviations among 1,000 permutations. (F) The proportions of successful predictions, similar to (E). (G–I) A star-like motif in the yeast genetic interaction network and the gene order on chromosome XII in yeast. The hub gene SFH1 and 19 genes with the distance < 100 to SFH1 are shown. (J) A histogram of differences in the two largest epistasis values in the motif ($Diff_{epistasis}$), based on which genes are divided into two groups with similar sizes, Group S and Group L. (K) Proportion of successful predictions of the closest gene in Group S and Group L. Their respective random expectations based on permutation are shown in gray. Error bars represent standard deviations among 1,000 permutations.

It is also worth noting that synthetic genetic array (SGA), the experimental strategy used to generate double mutants, is based on recombination between two null alleles (Tong et al. 2001; Costanzo et al. 2010). Thus, double mutants for linked genes may have a smaller initial frequency, which may lead to inaccuracy in estimating fitness values for these mutants. To determine whether such potential experimental bias has any impact on our results, we calculated the partial E–D correlation after controlling for the double mutant fitness values. Again, we observed a significant negative E–D correlation (partial ρ = −0.17, P = 1.3 × 10⁻⁹, N = 1,254). This result was not unexpected because the fitness of double mutants and gene distance were only weakly correlated (ρ = 0.086, P = 0.002, N = 1,254). More discussion on the caveats in the analysis of empirical data is included in supplementary note S2, Supplementary Material online.

In a recent study, 1,800 genetic suppression interactions were identified in the budding yeast (van Leeuwen et al. 2016), which can be regarded as an extreme form of positive genetic interactions. Among them, nine gene pairs are tightly linked (D ≤ 2, supplementary table S3, Supplementary Material online). This number is significantly greater than the random expectation (permutation test, P = 0.01), again supporting our model.

The selection on clusters of locally adapted alleles, which sometimes were captured in chromosomal inversions, has been studied previously (Yeaman 2013; Kirkpatrick 2017). In our study, we proposed a hypothesis (the negative E–D correlation) on deleterious alleles and tested this hypothesis with analyses on empirical data and simulation, both in the context of genetic interaction networks. Therefore, our results provide additional clues for understanding the basic principles of genome organization. In particular, the origin and maintenance of clusters of genes in the same metabolic pathway, which have puzzled scientists for years (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011), could be well explained by genetic interactions. Genes in a linear metabolic pathway exhibit positive epistasis because 1) double mutants and single mutants have identical impacts on destroying the function of the pathway (He et al. 2010), and 2) the accumulation of deleterious intermediate products resulting from a loss-of-function mutation in a downstream gene of a metabolic pathway may be prevented by a loss-of-function mutation in an upstream gene (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011). Because genetic linkage is advantageous among genes with positive epistasis, the clustering of genes in the same linear metabolic pathway, such as genes in the galactose utilization pathway or allantoin degradation pathway, is favored by natural selection (Wong and Wolfe 2005; Slot and Rokas 2010).

Many factors have been reported to influence the evolution of gene order, such as tandem gene duplication, position effects on gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016), coordinated gene expression among neighboring genes (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004), and clustering of functionally related genes (Wong and Wolfe 2005; Slot and Rokas 2010), among many others. In this study, we provided evidence that, in addition to these factors, the genetic interaction network also played an important role in driving the evolution of gene order. Because the empirical data of epistasis (fitness values) are available in the budding yeast, an evolutionary simulation integrating many genetic interactions is possible, which makes epistasis unique among all factors driving the evolution of gene order. Based on the empirical data of the yeast genetic interaction network, our simulation indicates that the selective coefficient is on the order of 10⁻⁷, suggesting that epistasis may play an important role in determining gene order in yeast, a species with a relatively large effective population size (10⁷). A negative E–D correlation is expected to be observed in species with a smaller effective population size only if the range of epistasis is larger than that in yeast (figs. 2 and 3). Because the genome-wide empirical data of epistasis are unavailable in species with a smaller effective population size (e.g., humans and flies), whether the genetic interaction network plays a role in the evolution of gene order in these species requires further investigation.

Materials and Methods

Genomes

The genome annotation of S. cerevisiae was downloaded from the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org, version R64). The gene order of the reconstructed ancestor before the WGD (Gordon et al. 2009) was downloaded from the Yeast Gene Order Browser (YGOB) (Byrne and Wolfe 2005).

Simulation

To examine the fitness effect of gene distance (D), two-locus dynamics under selection were simulated for populations with different D (from 0 to 100 in 1-gene increments) as well as a series of epistasis values (E, from −0.004 to 0.004 in 0.001 increments) between two loci. The recombination frequency (R) between these two loci was estimated from D using the equation below:

$$R = D \times 0.004 + 0.064$$

which is the linear relationship estimated from the distance between two genes (D) and the empirical recombination frequencies (R) between them quantified in a previous study (Mancera et al. 2008). The simulation was performed in haploid organisms in order to be in alignment with the analyses of empirical data, in which epistasis values were quantified in haploid yeast (Costanzo et al. 2010, 2016). A and B are wild-type alleles of two di-allelic loci on the same chromosome; a and b are their deleterious alleles, with fitness values of $\omega_{aB} = \omega_{Ab} = 0.992$. The fitness of the wild-type ($\omega_{AB}$) was defined as 1, and the epistasis E was defined as $\omega_{ab} - \omega_{aB}\omega_{Ab}$. The initial allele frequencies of a and b were both 0.1. The two loci were initially under linkage equilibrium, and therefore the frequencies of AB ($X_{AB}$), Ab ($X_{Ab}$), aB ($X_{aB}$), and ab ($X_{ab}$) were 0.81, 0.09, 0.09, and 0.01, respectively. After random mating,


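selection, and recombination, the frequencies of the four genotypes in the next generation were calculated with the following equations (Nei 1967):

$$X'_{AB} = \frac{X_{AB}\omega_{AB}}{\bar{\omega}} - \frac{R\,(X_{AB}\omega_{AB}X_{ab}\omega_{ab} - X_{aB}\omega_{aB}X_{Ab}\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{Ab} = \frac{X_{Ab}\omega_{Ab}}{\bar{\omega}} + \frac{R\,(X_{AB}\omega_{AB}X_{ab}\omega_{ab} - X_{aB}\omega_{aB}X_{Ab}\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{aB} = \frac{X_{aB}\omega_{aB}}{\bar{\omega}} + \frac{R\,(X_{AB}\omega_{AB}X_{ab}\omega_{ab} - X_{aB}\omega_{aB}X_{Ab}\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{ab} = \frac{X_{ab}\omega_{ab}}{\bar{\omega}} - \frac{R\,(X_{AB}\omega_{AB}X_{ab}\omega_{ab} - X_{aB}\omega_{aB}X_{Ab}\omega_{Ab})}{\bar{\omega}^{2}}$$

$\bar{\omega}$ is the average fitness of a population and was calculated as follows:

$$\bar{\omega} = X_{AB}\omega_{AB} + X_{ab}\omega_{ab} + X_{aB}\omega_{aB} + X_{Ab}\omega_{Ab}$$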

For each population, the average fitness ($\bar{\omega}$) and the frequencies of deleterious alleles ($X_a$ and $X_b$) were recorded in each generation of the simulation. The difference between the average fitness of two populations with different D (D = 50 and D = 0) reflects whether linkage is favored by natural selection or not. The frequency difference of the deleterious alleles between these two populations is related to the long-term effects of recombination. We also performed the simulation with a variety of parameter values (supplementary fig. S4, Supplementary Material online).
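The recursion described in this section can be sketched as a short Python function. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function and variable names are invented for this example, the default parameter values are the ones quoted in the text, and "average fitness at generation t" is taken here to mean the mean fitness of the population after t rounds of selection and recombination.

```python
def recombination_frequency(d):
    """Linear relationship between gene distance D and R quoted in the text."""
    return d * 0.004 + 0.064

def mean_fitness(x, w):
    """Average fitness: genotype frequencies weighted by genotype fitness."""
    return sum(x[g] * w[g] for g in x)

def simulate(d, epistasis, generations=100, w_single=0.992, q=0.1):
    """Iterate the haploid two-locus recursion for a fixed number of generations.

    Returns the final genotype frequencies and the final average fitness.
    """
    r = recombination_frequency(d)
    # Genotype fitness; E = w_ab - w_aB * w_Ab, so w_ab = E + w_aB * w_Ab.
    w = {"AB": 1.0, "Ab": w_single, "aB": w_single,
         "ab": epistasis + w_single * w_single}
    # Linkage equilibrium with deleterious allele frequency q at both loci.
    x = {"AB": (1 - q) ** 2, "Ab": (1 - q) * q,
         "aB": q * (1 - q), "ab": q * q}
    for _ in range(generations):
        w_bar = mean_fitness(x, w)
        # Recombination term shared by the four genotype equations.
        shift = r * (x["AB"] * w["AB"] * x["ab"] * w["ab"]
                     - x["aB"] * w["aB"] * x["Ab"] * w["Ab"]) / w_bar ** 2
        x = {"AB": x["AB"] * w["AB"] / w_bar - shift,
             "Ab": x["Ab"] * w["Ab"] / w_bar + shift,
             "aB": x["aB"] * w["aB"] / w_bar + shift,
             "ab": x["ab"] * w["ab"] / w_bar - shift}
    return x, mean_fitness(x, w)

# Compare a tightly linked pair (D = 0) with a loosely linked pair (D = 50)
# under positive epistasis.
_, fitness_tight = simulate(0, 0.004)
_, fitness_loose = simulate(50, 0.004)
linkage_advantage = fitness_tight - fitness_loose
```

A positive `linkage_advantage` corresponds to linkage being favored for that epistasis value. When E = 0 the fitness values are exactly multiplicative, no linkage disequilibrium builds up, and the two populations should have the same average fitness.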

The average fitness of a population at the 100th generation ($\bar{\omega}_{100}$) was used to infer the fitness effect of gene distance at a given epistasis value. For each epistasis value, we could identify an optimal gene distance ($D_{opt}$) as well as a series of permitted gene distances ($D_{permitted}$). The difference in $\bar{\omega}_{100}$ between $D_{permitted}$ and $D_{opt}$ was < 10⁻⁷. The mean of all $D_{permitted}$ values was calculated. In addition, our simulation did not introduce new mutations to the population, and therefore the frequencies of deleterious alleles decreased over generations. Nevertheless, the negative E–D correlation was also observed at the 50th and 200th generations (supplementary fig. S3, Supplementary Material online).

Toy Motifs

We built a toy motif in a genetic interaction network to simulate the fitness effect of gene distance. In the star-like motif (fig. 2A), a hub gene (A) interacted with nine partners (genes B, C, D, E, F, G, H, I, and J) with nine epistasis values ranging from −0.004 to 0.004 in 0.001 increments. We placed the nine partner genes on the same side of the hub gene on a chromosome. Because we focused on the epistasis between the hub gene and the partners, placing the partners on different sides would not affect the results. We fixed the chromosomal location of the hub gene, and the partner genes were placed at D = 20, 30, 40, 50, 60, 70, 80, 90, and 100 from the hub gene. The locations of the nine partners were shuffled while keeping the epistasis values unchanged. The fitness of a gene order was defined as the average $\bar{\omega}_{100}$ for all nine hub-containing gene pairs. Thus, we could calculate the fitness for a total of (9! =) 362,880 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to test whether the simulation result was parameter sensitive (supplementary fig. S5, Supplementary Material online). Here, epistasis values and fitness defects were randomly chosen from the empirical data in previous studies (Costanzo et al. 2010, 2016).

For the all-connected motif (fig. 3A), five genes (A, B, C, D, and E) interacted with each other with ten epistasis values ranging from −0.0036 to 0.0036 in 0.0008 increments. The five genes were placed on the same chromosome, and the gene locations were set as 1, 19, 65, 93, and 102 so that the maximum D in the motif was 100. The locations of these five genes were shuffled, whereas the epistasis values were kept unchanged. Similar to the star-like motifs, the fitness of a gene order was defined as the average $\bar{\omega}_{100}$ for all ten gene pairs. The fitness was calculated for a total of (5! =) 120 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to determine whether the simulation result was parameter sensitive (supplementary fig. S6, Supplementary Material online).
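The all-connected toy motif is small enough to enumerate directly. The sketch below is hypothetical rather than a reproduction of the authors' code. It makes two assumptions that the text does not state: D is taken as the number of genes between two locations (the difference in location minus 1, which gives a maximum D of 100 for locations 1 and 102), and the ten epistasis values are assigned to the ten gene pairs in an arbitrary fixed order. The helper `omega_100` is a compact version of the two-locus recursion with the default parameters given in the Simulation section.

```python
from functools import lru_cache
from itertools import combinations, permutations

@lru_cache(maxsize=None)
def omega_100(d, epistasis, generations=100, w_single=0.992, q=0.1):
    """Average fitness after `generations` rounds of the two-locus recursion."""
    r = d * 0.004 + 0.064
    # Genotype order: AB, Ab, aB, ab.
    w = (1.0, w_single, w_single, epistasis + w_single ** 2)
    x = ((1 - q) ** 2, (1 - q) * q, q * (1 - q), q ** 2)
    for _ in range(generations):
        w_bar = sum(xi * wi for xi, wi in zip(x, w))
        shift = r * (x[0] * w[0] * x[3] * w[3]
                     - x[2] * w[2] * x[1] * w[1]) / w_bar ** 2
        x = (x[0] * w[0] / w_bar - shift, x[1] * w[1] / w_bar + shift,
             x[2] * w[2] / w_bar + shift, x[3] * w[3] / w_bar - shift)
    return sum(xi * wi for xi, wi in zip(x, w))

GENES = "ABCDE"
LOCATIONS = (1, 19, 65, 93, 102)
PAIRS = list(combinations(GENES, 2))
# Ten epistasis values from -0.0036 to 0.0036 in 0.0008 increments.
E_VALUES = [round(-0.0036 + 0.0008 * k, 4) for k in range(10)]
# Assumption: which pair receives which value is arbitrary in this sketch.
EPISTASIS = dict(zip(PAIRS, E_VALUES))

def order_fitness(order):
    """Average omega_100 over all ten gene pairs for one gene order."""
    location = dict(zip(order, LOCATIONS))
    total = 0.0
    for a, b in PAIRS:
        d = abs(location[a] - location[b]) - 1  # assumed definition of D
        total += omega_100(d, EPISTASIS[(a, b)])
    return total / len(PAIRS)

fitness = {order: order_fitness(order) for order in permutations(GENES)}
best_order = max(fitness, key=fitness.get)
```

The same pattern extends to the star-like motif, where only the nine hub-containing pairs contribute and 9! orders are enumerated.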

Epistasis

Epistasis values were retrieved from the studies of Costanzo et al. (Costanzo et al. 2010, 2016). Because the epistasis values of the overlapping gene pairs in these studies were highly correlated (r = 0.71, P < 10⁻¹⁰⁰, Pearson's correlation, N = 2,604,539), we merged their epistasis values. A pair of genes was filtered out if (a) the epistasis value between their null mutations was not a number (NaN) or (b) the epistasis values of the same null mutation pair had opposite signs in reciprocal crosses or in different studies. We also removed the gene pairs with the null mutation in at least one gene resulting in a higher fitness value than that of the wild-type. This filtering was performed because only a small number of antagonistic pleiotropic genes were detected in rich media (Qian et al. 2012). Thus, the elevated fitness observed upon gene deletion in Costanzo et al. was likely due to inaccurate estimation of fitness. If epistasis between a pair of genes was examined multiple times, we used the epistasis value with the smallest P value. Following Costanzo et al. (2010), epistasis values with P < 0.05 were classified as significant.
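The filtering and merging rules above can be expressed as a small function. The record layout used here (tuples of two gene names, an epistasis value, and a P value, plus a dictionary of single-mutant fitness values with wild-type fitness equal to 1) is a hypothetical format chosen for illustration and is not the format of the published data files. Treating a pair as filtered when any of its measurements is NaN is also an assumption of this sketch.

```python
import math

def merge_epistasis(records, single_fitness, wild_type_fitness=1.0, alpha=0.05):
    """Apply the filters described in the text and keep one value per gene pair.

    records: iterable of (gene_1, gene_2, epistasis, p_value); reciprocal
             crosses and different studies appear as separate records.
    single_fitness: dict mapping gene name to single-mutant fitness.
    Returns a dict mapping frozenset({gene_1, gene_2}) to
    (epistasis, p_value, is_significant).
    """
    by_pair = {}
    for gene_1, gene_2, epistasis, p_value in records:
        by_pair.setdefault(frozenset((gene_1, gene_2)), []).append((epistasis, p_value))

    merged = {}
    for pair, observations in by_pair.items():
        values = [e for e, _ in observations]
        # (a) drop pairs with a missing (NaN) epistasis value
        if any(math.isnan(e) for e in values):
            continue
        # (b) drop pairs whose measurements disagree in sign
        if any(e > 0 for e in values) and any(e < 0 for e in values):
            continue
        # drop pairs in which a single mutant appears fitter than the wild type
        if any(single_fitness[g] > wild_type_fitness for g in pair):
            continue
        # keep the measurement with the smallest P value
        epistasis, p_value = min(observations, key=lambda obs: obs[1])
        merged[pair] = (epistasis, p_value, p_value < alpha)
    return merged
```

For example, a pair measured twice with epistasis 0.1 (P = 0.01) and 0.2 (P = 0.2) would be kept with the value 0.1 and classified as significant.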

In addition to the two data sets from Costanzo et al., several other studies have also estimated epistasis values using high-throughput strategies (Boone et al. 2007). In principle, we could have included all of them in this study. However, for practical reasons, these data sets were not suitable for our study. First, it would not be appropriate to combine other data sets with that of Costanzo et al. because epistasis values from multiple studies potentially have different sources of error, especially if they followed different protocols. Second, the sample sizes of epistasis data of linked genes from other studies were not sufficiently large to examine the E–D correlation. Because the data set from Costanzo et al. represents the largest available so far, containing 27 times more significant epistasis than the total of all other studies, we focused on these epistasis data in this study.

Recombination Frequency

Genotypes of meiosis products from 46 tetrads were downloaded from Mancera et al. (2008) at http://www.ebi.ac.uk/huber/recombination. A pair of DNA markers was filtered out if the genotypes were available in less than half of the spores (< 92 spores). For each pair of markers, spores with parental genotypes ($N_p$) and nonparental genotypes ($N_{np}$) were counted. The recombination frequency of a pair of markers ($R_m$) was calculated as follows:

$$R_m = N_{np} / (N_p + N_{np})$$

The recombination frequency between a pair of genes ($R_g$) was defined as the average recombination frequency of all marker pairs within the pair of genes, which was calculated as follows:

$$R_g = \sum_{i=1}^{I} \sum_{j=1}^{J} R_{m,ij} / (IJ)$$

Subscripts i and j represent the ith and jth markers in each gene, respectively, and I and J denote the numbers of markers in the two genes.
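The two formulas above translate into a short calculation. In this sketch the input format is an assumption made for illustration: each marker is a list with one entry per spore giving the parental origin of the allele (any two labels, here "S" and "Y"), with `None` for a missing genotype. A spore is counted as parental when both markers carry the same label. Marker pairs that fail the minimum-spore filter are skipped when averaging.

```python
def marker_pair_frequency(marker_1, marker_2, min_spores=92):
    """R_m = N_np / (N_p + N_np) for one pair of markers.

    Returns None when fewer than `min_spores` spores are genotyped at both markers.
    """
    n_parental = 0
    n_nonparental = 0
    for allele_1, allele_2 in zip(marker_1, marker_2):
        if allele_1 is None or allele_2 is None:
            continue  # genotype missing in this spore
        if allele_1 == allele_2:
            n_parental += 1
        else:
            n_nonparental += 1
    if n_parental + n_nonparental < min_spores:
        return None
    return n_nonparental / (n_parental + n_nonparental)

def gene_pair_frequency(markers_gene_1, markers_gene_2, min_spores=92):
    """R_g: average R_m over all marker pairs between two genes."""
    values = []
    for marker_1 in markers_gene_1:
        for marker_2 in markers_gene_2:
            r_m = marker_pair_frequency(marker_1, marker_2, min_spores)
            if r_m is not None:
                values.append(r_m)
    return sum(values) / len(values) if values else None

# Toy data for 184 spores (46 tetrads): 23 spores are recombinant between m1 and m2.
m1 = ["S"] * 92 + ["Y"] * 92
m2 = ["S"] * 69 + ["Y"] * 115
r_m = marker_pair_frequency(m1, m2)
r_g = gene_pair_frequency([m1], [m1, m2])
```

In the toy data, `r_m` is 23/184, and `r_g` averages that value with the zero recombination frequency of `m1` against itself.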

Identification of Duplicate Genes

All-against-all BlastP was performed to search for duplicate genes. Gene pairs with E values < 10⁻¹⁰ were defined as duplicate genes.

Expression Similarity

Expression profiles of 6,359 genes in 40 studies were compiled by a previous study (Kafri et al. 2005). Pearson's correlation coefficient for two genes was calculated within each study in the compiled data set, and expression similarity between these two genes was defined as the average correlation coefficient among studies.
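As a small worked illustration of this definition, the sketch below averages per-study Pearson correlations for one gene pair. The data layout (a dictionary from study name to a list of expression values for each gene) and the function names are assumptions made for this example. Studies in which either gene has no variation are skipped here because the correlation is undefined; the text does not say how such cases were handled.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's correlation coefficient; returns None if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)

def expression_similarity(profiles_a, profiles_b):
    """Average within-study correlation between two genes.

    profiles_a, profiles_b: dict mapping study name to a list of expression
    values, with conditions in the same order for both genes.
    """
    correlations = []
    for study in sorted(set(profiles_a) & set(profiles_b)):
        r = pearson(profiles_a[study], profiles_b[study])
        if r is not None:
            correlations.append(r)
    return sum(correlations) / len(correlations) if correlations else None

gene_a = {"study_1": [1.0, 2.0, 3.0], "study_2": [1.0, 2.0, 3.0]}
gene_b = {"study_1": [2.0, 4.0, 6.0], "study_2": [3.0, 2.0, 1.0]}
similarity = expression_similarity(gene_a, gene_b)
```

In this toy case the two genes are perfectly correlated in one study and perfectly anticorrelated in the other, so the averaged similarity is zero.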

Other Yeast Functional Genomic Data

Protein–protein interaction data were obtained from the SGD. Three-dimensional chromatin colocalization data were retrieved from Duan et al. (2010). Genome-wide gene expression noise data were downloaded from Newman et al. (2006). A list of essential genes was downloaded from the Database of Essential Genes (DEG, version 10.6) (Luo et al. 2014). Semantic similarity of Gene Ontology (GO) terms was calculated using the R package GOSemSim (version 1.22.0) (Yu et al. 2010).

Shuffling of Gene Positions or Epistasis Values

To obtain the null distribution of E–D correlation coefficients, we shuffled gene positions while keeping the genome structure of S. cerevisiae unchanged (i.e., number of chromosomes and number of genes on each chromosome). The epistasis value of each gene pair was also kept unchanged. Gene distances were then calculated based on the new genomic locations.

To estimate the relative importance of different epistasis categories (positive and negative) in shaping the E–D correlation, we shuffled epistasis values of one category while keeping the other category unchanged. Gene positions were not changed during this process.
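Both shuffling schemes can be sketched with the Python standard library. The data structures are assumptions made for this example: `positions` maps each gene to a (chromosome, index) slot and `epistasis` maps gene pairs to values. Permuting genes among the existing slots keeps the number of chromosomes and the number of genes per chromosome fixed. The restriction to linked pairs (D ≤ 100) and the treatment of zero values as belonging to the negative category are choices of this sketch, not statements about the authors' code.

```python
import random

def _ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's rho; returns None when a variable has no variation."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5

def ed_correlation(positions, epistasis, max_d=100):
    """Spearman correlation between epistasis and distance for linked pairs."""
    e_values, d_values = [], []
    for (gene_a, gene_b), e in epistasis.items():
        (chrom_a, index_a), (chrom_b, index_b) = positions[gene_a], positions[gene_b]
        if chrom_a == chrom_b and abs(index_a - index_b) <= max_d:
            e_values.append(e)
            d_values.append(abs(index_a - index_b))
    return spearman(e_values, d_values) if len(e_values) > 2 else None

def shuffle_positions(positions, rng):
    """Permute genes among the existing slots, preserving genome structure."""
    genes = list(positions)
    slots = [positions[g] for g in genes]
    rng.shuffle(slots)
    return dict(zip(genes, slots))

def shuffle_category(epistasis, positive, rng):
    """Shuffle epistasis values within one sign category; leave the other intact."""
    keys = [k for k, e in epistasis.items() if (e > 0) == positive]
    values = [epistasis[k] for k in keys]
    rng.shuffle(values)
    shuffled = dict(epistasis)
    shuffled.update(zip(keys, values))
    return shuffled

def null_distribution(positions, epistasis, n_shuffles, rng):
    """E-D correlation coefficients across genomes with shuffled gene positions."""
    return [ed_correlation(shuffle_positions(positions, rng), epistasis)
            for _ in range(n_shuffles)]
```

A typical use would be `null_distribution(positions, epistasis, 1000, random.Random(0))`, followed by a comparison of the observed coefficient with the resulting list.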

Estimating the Predictive Accuracy of Gene Order

To examine the accuracy of our prediction of gene order, we estimated the success rate by chance (gray bars in fig. 6 and supplementary figs. S8 and S9, Supplementary Material online) by randomly assigning the positions of genes 1,000 times. For each permutation, the average proportion of successful predictions among motifs was estimated. The mean and standard deviation among the 1,000 permutations were then calculated.
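For the star-like motifs, the chance expectation can be sketched as follows. This example rests on assumptions that go beyond the text: a prediction is counted as successful when the partner with the highest epistasis value is also the partner closest to the hub, random assignment is implemented by shuffling partner distances within each motif, and the one-sided P value is the fraction of permutations whose success rate is at least the observed one. All names and the toy data are hypothetical.

```python
import random
from statistics import mean, stdev

def is_success(motif):
    """motif: list of (epistasis_with_hub, distance_to_hub), one per partner."""
    strongest = max(range(len(motif)), key=lambda i: motif[i][0])
    closest = min(range(len(motif)), key=lambda i: motif[i][1])
    return strongest == closest

def success_rate(motifs):
    """Proportion of motifs whose closest partner is the predicted one."""
    return sum(is_success(m) for m in motifs) / len(motifs)

def random_expectation(motifs, n_perm=1000, seed=0):
    """Success rates obtained when partner distances are randomly reassigned."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_perm):
        shuffled = []
        for motif in motifs:
            distances = [d for _, d in motif]
            rng.shuffle(distances)
            shuffled.append([(e, d) for (e, _), d in zip(motif, distances)])
        rates.append(success_rate(shuffled))
    return mean(rates), stdev(rates), rates

def permutation_p(observed, rates):
    """Fraction of permutations with a success rate at least as high as observed."""
    return sum(r >= observed for r in rates) / len(rates)

# Toy data: 20 identical motifs in which the strongest partner is the closest one.
toy_motifs = [[(0.05, 3), (0.01, 20), (-0.02, 40), (0.0, 60)] for _ in range(20)]
observed = success_rate(toy_motifs)
expected_mean, expected_sd, rates = random_expectation(toy_motifs)
p_value = permutation_p(observed, rates)
```

With four partners per motif, the chance success rate in this toy example should be close to one in four, which is the quantity plotted as the random expectation.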

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Author Contributions

Y.-F.Y., W.C., and W.Q. conceived the research; Y.-F.Y., W.C., and W.Q. analyzed the data; and Y.-F.Y., S.W., and W.Q. wrote the article.

Acknowledgments

We thank Bin He and Weiwei Zhai for discussion. This work was supported by grants from the National Natural Science Foundation of China to W.Q. (91731302).

References

Batada NN, Hurst LD. 2007. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet. 39(8):945–949.

Boone C, Bussey H, Andrews BJ. 2007. Exploring genetic interactions and networks with yeast. Nat Rev Genet. 8(6):437–449.

Boutanaev AM, Kalmykova AI, Shevelyov YY, Nurminsky DI. 2002. Large clusters of co-expressed genes in the Drosophila genome. Nature 420(6916):666–669.

Byrne KP, Wolfe KH. 2005. The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 15(10):1456–1461.

Charlesworth B. 1990. Mutation-selection balance and the evolutionary advantage of sex and recombination. Genet Res. 55(3):199–221.

Charlesworth D, Charlesworth B. 2011. Mimicry: the hunting of the supergene. Curr Biol. 21(20):R846–R848.

Chen X, Zhang J. 2016. The genomic landscape of position effects on protein expression level and noise in yeast. Cell Syst. 2(5):347–354.

Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, et al. 1998. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2(1):65–73.

Cohen BA, Mitra RD, Hughes JD, Church GM. 2000. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet. 26(2):183–186.

Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, et al. 2010. The genetic landscape of a cell. Science 327(5964):425–431.

Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, et al. 2016. A global genetic interaction network maps a wiring diagram of cellular function. Science 353(6306):aaf1420.

Dean EJ, Davis JC, Davis RW, Petrov DA. 2008. Pervasive and persistent redundancy among duplicated genes in yeast. PLoS Genet. 4(7):e1000113.

Epistasis-Driven Evolution of Gene Order. doi:10.1093/molbev/msx264. MBE

DeLuna A, Vetsigian K, Shoresh N, Hegreness M, Colon-Gonzalez M, Chao S, Kishony R. 2008. Exposing the fitness contribution of duplicated genes. Nat Genet. 40(5):676–681.

Dixon SJ, Fedyshyn Y, Koh JL, Prasad TS, Chahwan C, Chua G, Toufighi K, Baryshnikova A, Hayles J, Hoe KL, et al. 2008. Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc Natl Acad Sci U S A. 105(43):16653–16658.

Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, Shendure J, Fields S, Blau CA, Noble WS. 2010. A three-dimensional model of the yeast genome. Nature 465(7296):363–367.

Eshel I, Feldman MW. 1970. On the evolutionary effect of recombination. Theor Popul Biol. 1(1):88–100.

Feldman MW, Christiansen FB, Brooks LD. 1980. Evolution of recombination in a constant environment. Proc Natl Acad Sci U S A. 77(8):4838–4841.

Ghanbarian AT, Hurst LD. 2015. Neighboring genes show correlated evolution in gene expression. Mol Biol Evol. 32(7):1748–1766.

Gordon JL, Byrne KP, Wolfe KH. 2009. Additions, losses, and rearrangements on the evolutionary route from a reconstructed ancestor to the modern Saccharomyces cerevisiae genome. PLoS Genet. 5(5):e1000485.

He X, Qian W, Wang Z, Li Y, Zhang J. 2010. Prevalent positive epistasis in Escherichia coli and Saccharomyces cerevisiae metabolic networks. Nat Genet. 42(3):272–276.

Hurst LD, Pal C, Lercher MJ. 2004. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 5(4):299–310.

Kafri R, Bar-Even A, Pilpel Y. 2005. Transcription control reprogramming in genetic backup circuits. Nat Genet. 37(3):295–299.

Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428(6983):617–624.

Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423(6937):241–254.

Kirkpatrick M. 2017. The evolution of genome structure by natural and sexual selection. J Hered. 108(1):3–11.

Kondrashov AS. 1982. Selection against harmful mutations in large sexual and asexual populations. Genet Res. 40(3):325–332.

Kondrashov AS. 1988. Deleterious mutations and the evolution of sexual reproduction. Nature 336(6198):435–440.

Kouyos RD, Silander OK, Bonhoeffer S. 2007. Epistasis between deleterious mutations and the evolution of recombination. Trends Ecol Evol. 22(6):308–315.

Lang GI, Botstein D. 2011. A test of the coordinated expression hypothesis for the origin and maintenance of the GAL cluster in yeast. PLoS One 6(9):e25290.

Lawrence J. 1999. Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev. 9(6):642–648.

Lawrence JG. 2002. Shared strategies in gene organization among prokaryotes and eukaryotes. Cell 110(4):407–413.

Lercher MJ, Urrutia AO, Hurst LD. 2002. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 31(2):180–183.

Liao BY, Zhang J. 2008. Coexpression of linked genes in Mammalian genomes is generally disadvantageous. Mol Biol Evol. 25(8):1555–1565.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458(7236):337–341.

Luo H, Lin Y, Gao F, Zhang CT, Zhang R. 2014. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res. 42(Database issue):D574–D580.

Mancera E, Bourgon R, Brozzi A, Huber W, Steinmetz LM. 2008. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454(7203):479–485.

Musso G, Costanzo M, Huangfu M, Smith AM, Paw J, San Luis BJ, Boone C, Giaever G, Nislow C, Emili A, et al. 2008. The extensive and condition-dependent nature of epistasis among whole-genome duplicates in yeast. Genome Res. 18(7):1092–1099.

Nei M. 1967. Modification of linkage intensity by natural selection. Genetics 57(3):625–641.

Nei M. 1969. Linkage modifications and sex difference in recombination. Genetics 63(3):681–699.

Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, Weissman JS. 2006. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441(7095):840–846.

Pal C, Hurst LD. 2003. Evidence for co-evolution of gene order and recombination rate. Nat Genet. 33(3):392–395.

Phillips PC. 2008. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 9(11):855–867.

Qian W, Liao BY, Chang AY, Zhang J. 2010. Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet. 26(10):425–430.

Qian W, Ma D, Xiao C, Wang Z, Zhang J. 2012. The genomic landscape and evolutionary resolution of antagonistic pleiotropy in yeast. Cell Rep. 2(5):1399–1410.

Qian W, Zhang J. 2008. Evolutionary dynamics of nematode operons: easy come, slow go. Genome Res. 18(3):412–421.

Qian W, Zhang J. 2014. Genomic evidence for adaptation by gene duplication. Genome Res. 24(8):1356–1362.

Schacherer J, Shapiro JA, Ruderfer DM, Kruglyak L. 2009. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458(7236):342–345.

Slot JC, Rokas A. 2010. Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi. Proc Natl Acad Sci U S A. 107(22):10136–10141.

Spellman PT, Rubin GM. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol. 1(1):5.

Tischler J, Lehner B, Chen N, Fraser AG. 2006. Combinatorial RNA interference in Caenorhabditis elegans reveals that redundancy between gene duplicates can be maintained for more than 80 million years of evolution. Genome Biol. 7(8):R69.

Tischler J, Lehner B, Fraser AG. 2008. Evolutionary plasticity of genetic interaction networks. Nat Genet. 40(4):390–391.

Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, et al. 2001. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294(5550):2364–2368.

van Leeuwen J, Pons C, Mellor JC, Yamaguchi TN, Friesen H, Koschwanez J, Usaj MM, Pechlaner M, Takar M, Usaj M, et al. 2016. Exploring genetic suppression interactions on a global scale. Science 354(6312):aag0839.

VanderSluis B, Bellay J, Musso G, Costanzo M, Papp B, Vizeacoumar FJ, Baryshnikova A, Andrews B, Boone C, Myers CL. 2010. Genetic interactions reveal the evolutionary trajectories of duplicate genes. Mol Syst Biol. 6:429.

Vavouri T, Semple JI, Lehner B. 2008. Widespread conservation of genetic redundancy during a billion years of eukaryotic evolution. Trends Genet. 24(10):485–488.

Wagner A. 2005. Energy constraints on the evolution of gene expression. Mol Biol Evol. 22(6):1365–1374.

Williams EJ, Bowles DJ. 2004. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 14(6):1060–1067.

Wong S, Wolfe KH. 2005. Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat Genet. 37(7):777–782.

Wu CI, Ting CT. 2004. Genes and speciation. Nat Rev Genet. 5(2):114–122.

Yeaman S. 2013. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proc Natl Acad Sci U S A. 110(19):E1743–E1751.

Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S. 2010. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26(7):976–978.

Zaslaver A, Baugh LR, Sternberg PW. 2011. Metazoan operons accelerate recovery from growth-arrested states. Cell 145(6):981–992.


by the chromosomal localization of other genes. Thus, it remains unknown whether the negative E–D correlation would evolve in the context of a highly connected network of genetic interactions.

We first examined the impact of epistasis on the chromosomal order of genes in a star-like motif, which is typical in empirical genetic interaction networks (Costanzo et al. 2010). To this end, we built a toy motif in which a hub gene interacts with nine partner genes with different epistasis values ranging from −0.004 to 0.004 (fig. 2A). We fixed the chromosomal location of the hub gene and attempted to place partner genes on the same chromosome. We focused on the epistasis and gene distance of hub-containing gene pairs, and therefore we placed all partner genes on the same side of the hub gene on the chromosome for convenience. We calculated ω_100 for each of the total (9! =) 362,880 possible gene orders and found that ω_100 varied among them (fig. 2B). Importantly, the gene order with the highest ω_100 showed a perfect negative E–D correlation (ρ_ED = −1, fig. 2C), whereas the gene order with the lowest ω_100 showed a perfect positive E–D correlation (ρ_ED = 1, fig. 2D). In fact, we found that ω_100 was negatively correlated with ρ_ED (fig. 2B, ρ = −0.94, P < 10^−100, Spearman's correlation), implying that the negative E–D correlation itself is under natural selection. To understand whether the negative correlation between ω_100 and ρ_ED is still present under the parameters derived from empirical data, we randomly chose epistasis values and fitness defects from two genome-wide studies in the budding yeast (Costanzo et al. 2010, 2016) and still observed strong negative correlations between ω_100 and ρ_ED (supplementary fig. S5A and B and table S1, Supplementary Material online). We also confirmed that the negative correlation between ω_100 and ρ_ED was insensitive to D and initial allele frequencies (supplementary fig. S5C and D and table S1, Supplementary Material online).
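The enumeration of gene orders in a star-like motif can be illustrated with a minimal Python sketch. It computes only ρ_ED for each ordering of the partner genes; it does not simulate ω_100, because the fitness model is not restated in this section. The function names, the unit spacing between neighboring partner genes, and the epistasis values in the usage note are assumptions made for illustration.

```python
from itertools import permutations

def ranks(values):
    """Average 1-based ranks; tied values share their mean rank."""
    s = sorted(values)
    return [s.index(x) + (s.count(x) + 1) / 2 for x in values]

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def rho_ed_by_order(epistasis):
    """Map every ordering of partner genes to its rho_ED.

    epistasis[g] is the epistasis between partner g and the hub.
    Assumption: the partner at position k (0-based) lies k + 1 genes
    away from the hub, all partners on the same side.
    """
    result = {}
    for order in permutations(range(len(epistasis))):
        e = [epistasis[g] for g in order]
        d = list(range(1, len(order) + 1))
        result[order] = spearman(e, d)
    return result
```

With hypothetical values such as `[0.004, 0.002, -0.002, -0.004]`, the ordering that places partners by decreasing epistasis yields ρ_ED = −1 and the reverse ordering yields ρ_ED = 1. Enumerating all nine partners is possible with the same function but takes noticeably longer in pure Python.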

FIG. 1. Theoretical prediction and computational simulations of the negative E–D correlation. (A) The short- and long-term effects are in the same direction when epistasis is positive but are in opposite directions when epistasis is negative. Thus, a negative E–D correlation can be predicted from the view of population genetics. (B) Fitness differences between a strain with D = 50 and a strain with D = 0 are plotted over 100 generations during simulations of in silico evolution. (C) The difference in the frequency of allele a (X_a) between strains with D = 50 and D = 0 is plotted over 100 generations during simulations of in silico evolution. (D) The average fitness of a population at the 100th generation (ω_100) is plotted against epistasis and D. For each epistasis, we defined all permitted D values with their resulting ω_100. If ω_100 is smaller than that of the optimal distance (D_opt) by < 10^−7, the minimal selective coefficient that can be detected by nature given the effective population size (N_e ≈ 10^7) of yeast, D is permitted. The mean of all permitted D values is plotted against epistasis (dashed line).

Next, we calculated the distance (d) to the fittest gene order shown in figure 2C, which was defined as the number of differently placed genes (fig. 2E), for each possible gene order. We found that an increase in d reduced ω_100 (fig. 2E), again emphasizing the impact of gene order on fitness. To further investigate the impact of the range of epistasis on ω_100, we generated a series of epistasis ranging from 0.004 to 0.020, shuffled the gene order, and recalculated the average ω_100 of all gene orders with the same d (fig. 2E). We found that, although it was always true that higher d led to a reduction of ω_100, the increase in the range of epistasis values enlarged the fitness differences among gene orders (fig. 2E).
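The distance d between a gene order and the fittest gene order can be sketched as a position-by-position comparison. This reading of "differently placed genes" is an assumption, as are the function name and the example orders.

```python
def order_distance(order, reference):
    """Number of positions at which two gene orders place a different gene."""
    if sorted(order) != sorted(reference):
        raise ValueError("both orders must contain the same genes")
    return sum(1 for a, b in zip(order, reference) if a != b)

# Hypothetical partner-gene orders, written from the hub outward.
fittest = list("BCDEFGHIJ")
swapped = list("CBDEFGHIJ")   # two genes exchanged
rotated = list("CDBEFGHIJ")   # three genes cycled
```

Under this definition d = 1 cannot occur, because moving one gene always displaces at least one other; the smallest nonzero value is d = 2 (a swap), and cycling three genes gives d = 3.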

Our model further predicted that positive epistasis should play a more important role in the evolution of gene order (fig. 1), such that altering the order of partner genes that have positive epistasis with the hub gene would lead to a larger fitness reduction. As expected, we observed that the gene-order variants with changes exclusively to the positively epistatic genes generally had a larger reduction in fitness compared with those with changes exclusively to the negatively epistatic genes (fig. 2F).

FIG. 2. The negative E–D correlation in star-like motifs of genetic interaction networks. (A) A toy model of a star-like motif in which gene A is the hub. Gene A has positive epistasis with genes B, C, D, and E and negative epistasis with genes G, H, I, and J. The range of epistasis is (0.004 − [−0.004] =) 0.008 in this motif. (B) Spearman's correlation coefficient between epistasis and D (ρ_ED) varies among gene orders. The average fitness at the 100th generation (ω_100) is negatively correlated with ρ_ED. (C) The gene order with the highest ω_100; ρ_ED = −1. (D) The gene order with the lowest ω_100; ρ_ED = 1. (E) The difference between a gene order and the gene order with the highest ω_100 (d) is defined as the number of differently placed genes. Two examples with d = 3 are shown. The heat map of the relative ω_100 (normalized to the highest ω_100) is shown. The average relative ω_100 decreases with the increase of d. The reduction is more dramatic when the range of epistasis values is larger. (F) Shuffling among genes B, C, D, and E (positive epistasis with the hub gene A) has a larger impact on ω_100 than shuffling among genes G, H, I, and J (negative epistasis with the hub gene A). P values of one-tailed Mann–Whitney U test are shown. The gray dashed line indicates the ω_100 of the optimal gene order in panel (C).

Chromosomal Arrangement of Genes in All-Connected Motifs of Genetic Interaction Networks

We further examined the negative correlation between ω_100 and ρ_ED in all-connected motifs. To this end, we built a toy all-connected motif with five nodes and assigned epistasis values in the range of −0.0036 to 0.0036 to edges (fig. 3A). Again, we observed a strong negative correlation between ω_100 and ρ_ED (fig. 3B, ρ = −0.87, P = 3.2 × 10^−38). The gene order with the highest fitness had a strong negative E–D correlation (ρ_ED = −1.00, fig. 3C), whereas the gene order with the lowest fitness had a positive E–D correlation (ρ_ED = 0.39, fig. 3D). Similarly, we confirmed that the negative correlation between ω_100 and ρ_ED was insensitive to epistasis values, fitness defects, gene distances, and initial allele frequencies (supplementary fig. S6 and table S2, Supplementary Material online). Furthermore, we observed that the fitness of a gene order decreased with the increase in its d (fig. 3E), and this trend was stronger when the range of epistasis was larger (fig. 3E).

Negative E–D Correlation in S. cerevisiae

To investigate whether the negative E–D correlation is supported by empirical evidence, we retrieved the pairwise epistasis data generated by Costanzo et al., who systematically measured the vegetative growth rates of both single and double mutants in the budding yeast S. cerevisiae and estimated epistasis values for 26 million gene pairs (Costanzo et al. 2010, 2016). As expected, we observed a significant negative E–D correlation among linked genes (fig. 4A, ρ = −0.15, P = 4.0 × 10^−8, N = 1,254). Consistent with this trend, unlinked genes on the same chromosome exhibited lower epistasis values (fig. 4A, gray dashed line). As a control, we permuted the gene orders and recalculated the correlation coefficients 1,000 times. We found that the negative E–D correlation disappeared after permutation (fig. 4B, P < 0.001, permutation test).
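A permutation control of this kind can be sketched as follows. The sketch simplifies the procedure described above: instead of permuting gene orders on chromosomes, it shuffles the distance labels among gene pairs, which is an assumption made to keep the example short. The function names, the seed, and the one-tailed P value with a +1 correction are likewise illustrative choices, not the authors' implementation.

```python
import random

def ranks(values):
    """Average 1-based ranks; tied values share their mean rank."""
    s = sorted(values)
    return [s.index(x) + (s.count(x) + 1) / 2 for x in values]

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def permutation_test(epistasis, distance, n_perm=1000, seed=0):
    """One-tailed test: how often is a shuffled rho as negative as observed?"""
    rng = random.Random(seed)
    observed = spearman(epistasis, distance)
    shuffled = list(distance)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if spearman(epistasis, shuffled) <= observed:
            hits += 1
    # Adding 1 to both counts avoids reporting P = 0 from a finite sample.
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `permutation_test(e, d)` on lists of epistasis values and gene distances returns the observed ρ and the fraction of shuffles that produce an equally or more negative correlation.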

These observations are potentially attributable to a number of confounding factors. The first is mutational bias, such as tandem duplication. However, duplicate genes tend to have negative epistasis (Tischler et al. 2006; Dean et al. 2008; DeLuna et al. 2008; Musso et al. 2008; Vavouri et al. 2008; Qian et al. 2010), which should result in a positive E–D correlation. Nevertheless, we controlled for this mutational bias by randomly keeping only one gene in a gene family and still observed the negative E–D correlation (supplementary fig. S7A, Supplementary Material online, ρ = −0.17, P = 1.8 × 10^−5, N = 641).

Second, genes with coordinated expression are clustered (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004). If coordinately expressed genes tend to have positive epistasis, the negative E–D correlation could result from these genes. To control for this effect, we first inferred expression pattern similarity for each pair of genes by calculating the correlation of gene expression levels in multiple conditions (Qian and Zhang 2014). We did not observe a significant correlation between epistasis and expression similarity (ρ = −0.015, P = 0.6, N = 1,202). Nevertheless, we divided these gene pairs into two groups according to expression similarity, recalculated the E–D correlation within each group, and still observed significant negative E–D correlations (supplementary fig. S7B and C, Supplementary Material online). Similar results were obtained when we calculated the partial E–D correlation after controlling for expression similarity (partial ρ = −0.14, P = 6.6 × 10^−7, N = 1,202). In addition, we also found that coordinated gene expression occurring through 3D chromatin interactions did not confound our results (supplementary fig. S7D, Supplementary Material online, ρ = −0.17, P = 8.7 × 10^−6, N = 704), which was not unexpected, as 3D chromatin interactions do not influence recombination frequency.
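The text does not state which estimator was used for the partial correlation. One standard choice, shown here as an assumption, is the first-order partial correlation computed from the three pairwise Spearman coefficients, where E is epistasis, D is gene distance, and S (a symbol introduced here) is the controlled variable, such as expression similarity:

```latex
\rho_{ED \cdot S} \;=\;
\frac{\rho_{ED} - \rho_{ES}\,\rho_{DS}}
     {\sqrt{\left(1 - \rho_{ES}^{2}\right)\left(1 - \rho_{DS}^{2}\right)}}
```

Under this formula, when ρ_ES is close to zero, as reported above for expression similarity, the numerator stays close to ρ_ED, so controlling for S removes little of the E–D correlation.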

FIG. 3. The negative E–D correlation in all-connected motifs of genetic interaction networks. (A) A toy model of an all-connected motif. (B) The fitness at the 100th generation (ω_100) is negatively correlated with ρ_ED. (C) The gene order with the highest ω_100; ρ_ED = −1.00. (D) The gene order with the lowest ω_100; ρ_ED = 0.39. (E) The heat map of the relative ω_100 (normalized to the highest ω_100) is shown. The relative ω_100 decreases with the increase of d. The reduction is more dramatic when the range of epistasis is larger.

Because functionally related genes are nonrandomly distributed on chromosomes (Wong and Wolfe 2005; Slot and Rokas 2010) and functional relationships between genes may lead to epistasis, we next examined whether functional relationships could confound the negative E–D correlation. We observed similar negative E–D correlations for gene pairs with either high or low semantic similarity of GO terms in molecular functions (supplementary fig. S7E and F, Supplementary Material online), biological processes (supplementary fig. S7G and H, Supplementary Material online), and cellular components (supplementary fig. S7I and J, Supplementary Material online). Again, we calculated partial correlations controlling for semantic similarity of GO terms in molecular functions (partial ρ = −0.15, P = 7.3 × 10^−7, N = 1,151), biological processes (partial ρ = −0.15, P = 3.2 × 10^−7, N = 1,151), and cellular components (partial ρ = −0.15, P = 3.2 × 10^−7, N = 1,151). All these results indicate that the functional relationship is not a confounding factor in the negative E–D correlation, which is not unexpected given that a large fraction of genetic interactions do not reflect functional relationships (He et al. 2010; Costanzo et al. 2016).

Finally, we investigated the impact of gene expression noise on the negative E–D correlation because it has been proposed that essential genes were colocalized in open chromatin regions to reduce gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016). To control for this effect, we first calculated the average gene expression noise (Newman et al. 2006) for each gene pair. We used the distance of each coefficient of variation (CV) to a running median of CV values (DM) to quantify gene expression noise (Newman et al. 2006) in order to minimize the effect of gene expression magnitude on gene expression noise. We observed a negative E–D correlation for gene pairs with either high or low average DM (supplementary fig. S7K and L, Supplementary Material online). Again, we calculated the partial correlation controlling for gene expression noise and still observed a negative E–D correlation (partial ρ = −0.16, P = 9.7 × 10^−3, N = 258).
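The DM measure can be sketched as the difference between a gene's CV and the median CV of genes with similar mean expression. The window size, the ordering by mean expression, and the function names below are assumptions made for illustration; the exact procedure of Newman et al. (2006) may differ.

```python
from statistics import median

def distance_to_median(mean_expr, cv, window=5):
    """DM per gene: its CV minus the running median of CV values.

    Genes are ordered by mean expression, and the running median is taken
    over a window centered on each gene (truncated at both ends).
    """
    order = sorted(range(len(cv)), key=lambda i: mean_expr[i])
    half = window // 2
    dm = [0.0] * len(cv)
    for pos, gene in enumerate(order):
        lo = max(0, pos - half)
        hi = min(len(order), pos + half + 1)
        dm[gene] = cv[gene] - median(cv[j] for j in order[lo:hi])
    return dm

def pair_noise(dm, i, j):
    """Average expression noise of a gene pair, as used for stratification."""
    return (dm[i] + dm[j]) / 2
```

Subtracting a local median is what removes the dependence of CV on expression magnitude: a gene is scored as noisy only relative to genes expressed at a similar level.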

Positive Epistasis Plays an Important Role in the Origin of the Negative E–D Correlation

Our model further predicted that the reduction of the distance between positively epistatic genes should play a more important role in the formation of the negative E–D correlation (figs. 1 and 2). Indeed, a significant negative E–D correlation was observed among positively epistatic gene pairs in S. cerevisiae (fig. 4C, ρ = −0.14, P = 0.0064, N = 391), whereas no significant correlation was observed among negatively epistatic gene pairs (fig. 4D, ρ = −0.025, P = 0.47, N = 863). We further verified the role of positive epistasis by shuffling epistasis values among all 391 positively epistatic gene pairs in S. cerevisiae. As expected, the E–D correlation was significantly weakened after the permutation (fig. 4E, P = 0.005, one-tailed permutation test). By contrast, no significant difference was observed after shuffling negative epistasis values (fig. 4F, P = 0.722, one-tailed permutation test), even though the latter analysis shuffled more gene pairs (N = 863).

FIG. 4. A negative E–D correlation is observed in the empirical genetic interaction network of the budding yeast S. cerevisiae, and positive epistasis plays a more important role in its formation. (A) A significant negative E–D correlation is observed in S. cerevisiae. Gene pairs are separated into bins based on D, with an equal width of five genes. The mean value of epistasis and the standard error of the mean (SEM) within each bin are shown. Spearman's correlation coefficient ρ and the corresponding P values were calculated from the raw data (N = 1,254). The gray dashed line shows the average epistasis among unlinked genes (D > 100). (B) Distribution of E–D correlation coefficients in 1,000 shuffled genomes. The arrow indicates the observed correlation coefficient in S. cerevisiae. (C and D) A significant negative E–D correlation is observed among positively epistatic gene pairs (N = 391) but not among negatively epistatic gene pairs (N = 863). Spearman's correlation coefficient ρ and the corresponding P values are calculated from the raw data. The dashed line shows the average epistasis among unlinked genes. (E) The distribution of correlation coefficients in 1,000 artificial genomes in which values of positive epistasis are shuffled. The arrow indicates the observed E–D correlation coefficient. (F) Similar to (E), but values of negative epistasis are shuffled. (G and H) The proportion of gene pairs with significant positive epistasis is significantly correlated with D, but that with significant negative epistasis is not. SEMs are estimated based on the binomial distribution. The dashed lines show the proportion of gene pairs with significant positive or negative epistasis among unlinked genes.
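The binning used for display in figure 4A (bins of five genes, with the mean and SEM of epistasis per bin) can be sketched as below. The bin boundaries (1 to 5, 6 to 10, and so on), the use of the sample standard deviation, and the function name are assumptions; D is taken to be a positive integer number of genes.

```python
from statistics import mean, stdev

def bin_by_distance(pairs, width=5):
    """Summarize epistasis per distance bin.

    pairs: iterable of (D, epistasis) tuples.
    Returns {bin_start: (mean, SEM, n)}; SEM is NaN for single-pair bins.
    """
    bins = {}
    for d, e in pairs:
        start = ((d - 1) // width) * width + 1   # bins 1-5, 6-10, ...
        bins.setdefault(start, []).append(e)
    summary = {}
    for start in sorted(bins):
        values = bins[start]
        n = len(values)
        sem = stdev(values) / n ** 0.5 if n > 1 else float("nan")
        summary[start] = (mean(values), sem, n)
    return summary
```

As in the caption, binning serves only the plot; the correlation coefficient itself is computed from the unbinned gene pairs.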


In studies by Costanzo et al., epistasis was classified into three categories: significantly positive, significantly negative, and nonsignificant (Costanzo et al. 2010, 2016). A strong negative correlation was observed between gene distance and the proportion of gene pairs with significant positive epistasis (fig. 4G, ρ = −0.82, P = 5.5 × 10^−5), whereas no significant correlation was observed between gene distance and the proportion of gene pairs with significant negative epistasis (fig. 4H, ρ = 0.43, P = 0.088). All these observations emphasize the important role of positive epistasis in the origin of the negative E–D correlation.

Epistasis-Driven Evolution of Gene Order after the Whole Genome Duplication in Yeast

Thus far, we have observed a negative E–D correlation and demonstrated that the correlation was mainly attributable to positively epistatic gene pairs. The aforementioned theories, simulations, and empirical evidence led us to propose a hypothesis of epistasis-driven evolution of gene order in S. cerevisiae. The whole genome duplication (WGD, fig. 5A) and the subsequent extensive gene losses substantially changed the epistatic relations among genes and markedly rewired the genetic interaction network in yeast (Kellis et al. 2004; Dixon et al. 2008; Tischler et al. 2008; VanderSluis et al. 2010). For instance, only 29% of the identified synthetic lethality is conserved between Schizosaccharomyces pombe and S. cerevisiae (Dixon et al. 2008). At the same time, extensive chromosome rearrangement events occurred. For example, in a reconstructed ancestral species before the WGD (Byrne and Wolfe 2005), we identified the gene pairs that were located on the same chromosome, and surprisingly, 84.6% of them are localized on different chromosomes in S. cerevisiae. More strikingly, among gene pairs that are localized on the same chromosome in both the ancestral species and S. cerevisiae, 99.6% differ in gene distance. Based on these observations, we proposed that the rewired genetic interaction network drove the evolution of gene order, resulting in numerous chromosome rearrangement events (Kellis et al. 2004). When gene losses ceased, the rewiring of genetic interactions slowed, and the evolutionary force on gene distance also diminished. Consistently, synteny relationships are strongly conserved in the species of the Saccharomyces sensu stricto group (Kellis et al. 2003).

Our model predicted that the negative E–D correlation should be weaker if the gene order in S. cerevisiae has been unchanged since the WGD. The reason is that the gene order in the ancestor was not subject to the natural selection imposed by the genetic interaction network of the current S. cerevisiae genome. Furthermore, given the massive gene losses after the WGD, the genetic interaction network cannot be 100% conserved. To test this prediction, we calculated the gene distances in the reconstructed ancestral species mentioned earlier (Byrne and Wolfe 2005). Indeed, we found that the negative E–D correlation disappeared when the gene distances in S. cerevisiae were replaced by those in the ancestral species (fig. 5B, ρ = 6.9 × 10^−3, P = 0.26, N = 26,630). This observation indicates that the negative E–D correlation in S. cerevisiae formed during the evolution of gene order after the WGD. Consistently, we observed that positively epistatic gene pairs decreased their distances, whereas negatively epistatic gene pairs increased their distances during evolution (fig. 5C, ρ = −0.17, P = 0.061, N = 127).

To further test the role of positive epistasis in the evolution of gene order, we identified genes that were ancestrally linked (i.e., D ≤ 100 in the reconstructed ancestor) and examined whether they moved toward or away from each other during evolution. Consistent with our model, genes with significant positive epistasis were more likely to move toward each other than genes without significant epistasis (fig. 5D, P = 1.5 × 10^−3, two-tailed Fisher's exact test), whereas the difference between gene pairs with significant negative epistasis and those without significant epistasis was not significant (fig. 5D, P = 0.41, two-tailed Fisher's exact test). Together, these observations support our hypothesis of epistasis-driven evolution of gene order in yeast.
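A two-tailed Fisher's exact test on a 2 × 2 table (for example, epistasis class by direction of movement) can be sketched with the standard library alone. The function name and the table layout are assumptions, and no counts from the study are reproduced here. The sketch uses the common definition that sums the probabilities of all tables no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be, e.g., pairs with positive vs. nonsignificant epistasis;
    columns could be pairs that moved toward vs. away from each other.
    """
    row1, col1, total = a + b, a + c, a + b + c + d

    def weight(x):
        # Unnormalized hypergeometric probability of seeing x in the top-left cell.
        return comb(col1, x) * comb(total - col1, row1 - x)

    lo = max(0, row1 - (total - col1))
    hi = min(row1, col1)
    observed = weight(a)
    # Integer weights make the "no more likely than observed" comparison exact.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(total, row1)
```

For the textbook table [[3, 1], [1, 3]], the function returns 34/70, which is approximately 0.486.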

FIG. 5. The origin of the negative E–D correlation after the WGD in yeast. (A) Phylogenetic relationship among yeast species. The black arrow indicates the reconstructed ancestor, and the gray arrow indicates the WGD event. (B) Negative E–D correlation is not observed when the gene order in S. cerevisiae is replaced by that in the reconstructed ancestor. Gene pairs are separated into bins based on D, with an equal width of five genes. The mean and SEM of epistasis within each bin are shown. Spearman's correlation coefficient ρ and the corresponding P values were calculated from the raw data (N = 26,630). The dashed line shows the average epistasis among unlinked genes. (C) The change in D (S. cerevisiae minus the reconstructed ancestor) is negatively correlated with the epistasis in S. cerevisiae (N = 127). (D) The D between a gene pair in S. cerevisiae is compared with that in the reconstructed ancestor. Proportions of gene pairs moving toward and away from each other among gene pairs with significant positive epistasis (left), nonsignificant epistasis (middle), and significant negative epistasis (right) are shown.

Genetic Interaction Network Accurately Predicts Gene Order in Yeast

Finally, we determined whether the gene order in S. cerevisiae could be successfully predicted by the empirical data of genetic interaction networks (Costanzo et al. 2010, 2016). To this end, we identified 22 all-connected three-node motifs in which all three genes are localized within a 100-gene range on a chromosome. An example is shown in figure 6A, in which VMA22 positively interacts with SRB2 and negatively interacts with AIM17, and the epistatic interaction between SRB2 and AIM17 is weak. All six possible gene orders are enumerated in figure 6B. Among them, the first gene order exhibits a perfect E–D anticorrelation (ρ = −1, fig. 6C), which is exactly the prediction of our model. In fact, it is also the real order of these genes on chromosome VIII. The second gene order successfully places positively epistatic genes close to each other and negatively epistatic genes far from each other. Therefore, it is generally consistent with our prediction (fig. 6C). Either gene order was considered as being successfully predicted by our model if it actually occurred in the yeast genome.
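The enumeration of six gene orders for a three-node motif can be sketched as follows. The sketch assumes that the three chromosomal slots are fixed and unequally spaced, so that mirror-image orders give different pairwise distances; the gene names, slot positions, and epistasis values are hypothetical and are not the values for VMA22, SRB2, and AIM17.

```python
from itertools import combinations, permutations

def ranks(values):
    """Average 1-based ranks; tied values share their mean rank."""
    s = sorted(values)
    return [s.index(x) + (s.count(x) + 1) / 2 for x in values]

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def rank_orders(genes, slots, epistasis):
    """Score each assignment of genes to slots by its E-D correlation.

    Returns (rho, order) tuples sorted so the most negative rho comes first.
    """
    pairs = list(combinations(genes, 2))
    e = [epistasis[frozenset(p)] for p in pairs]
    scored = []
    for order in permutations(genes):
        pos = dict(zip(order, slots))
        d = [abs(pos[a] - pos[b]) for a, b in pairs]
        scored.append((spearman(e, d), order))
    return sorted(scored)

# Hypothetical motif: A-B positive, A-C negative, B-C close to zero.
toy_epistasis = {
    frozenset("AB"): 0.05,
    frozenset("AC"): -0.05,
    frozenset("BC"): 0.0,
}
toy_ranking = rank_orders(("A", "B", "C"), (0, 3, 10), toy_epistasis)
```

In this toy case the top-ranked order is A, B, C with ρ = −1: the positively interacting pair occupies the two nearest slots and the negatively interacting pair occupies the two most distant ones.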

We examined the accuracy of our prediction at the genomic scale. We first divided these 22 motifs into two groups based on the magnitude of σ_epistasis: Group L contains 11 motifs with larger σ_epistasis, and Group S contains 11 motifs with smaller σ_epistasis (fig. 6D). We found that the gene orders of seven (out of 11) motifs in Group L were precisely predicted by our model, and the accuracy was significantly higher than the random expectation (17.6%, fig. 6E, P = 0.001, permutation test). By contrast, the predictive accuracy in Group S (2 out of 11) was not significantly different from the random expectation (fig. 6E, P = 0.607, permutation test), because low variation among epistasis values in a motif reduces the selective coefficients (figs. 2E and 3E). More broadly, the proportion of successful predictions (first and second gene orders in fig. 6B) is 100% in Group L, significantly higher than the random expectation (33.3%, fig. 6F, P < 0.001, permutation test). This high predictive power suggests that epistasis plays a vital role in driving the evolution of gene order. We further identified 243 and 1,302 all-connected motifs within the range of 150 and 200 genes on the same chromosome, respectively, and again confirmed the predictive power of the genetic interaction network (supplementary fig. S8A–F, Supplementary Material online). Moreover, the predictive power of epistasis on gene order is independent of expression similarity because the latter could not accurately predict gene order (supplementary fig. S9A–I, Supplementary Material online).

We then determined whether gene order could be predicted when the genetic interaction network is incomplete. To this end, we identified 92 star-like motifs in which all genes are on the same chromosome and at least one gene is within D ≤ 40 of the hub gene. For example, SFH1, which encodes a component of a chromatin remodeling complex, genetically interacts with 90 genes on the same chromosome (fig. 6G shows 20 genes with D < 100 to SFH1). Because the negative E–D correlation is mainly contributed by positively epistatic gene pairs in theory (figs. 1 and 2), our model predicted that genes having strong positive epistasis with SFH1 should be located close to it on the chromosome. Indeed, these genes (VRP1, FKS1, RPL26A, VPS38, DCR2) are located close to SFH1. Specifically, the gene (VRP1) having the strongest positive epistatic interaction with SFH1 was the closest gene to SFH1 on the chromosome (fig. 6H).

To examine the predictive accuracy at the genomic scale, we divided these 92 star-like motifs into two groups based on the difference between the top two highest epistasis values in the motif ($\text{Diff}_{\text{epistasis}}$, fig. 6J). We found that the proportion of successful predictions was 32.6% in Group L and 21.7% in Group S, both of which were significantly higher than random expectation (fig. 6K, P < 0.001 for Group L and P = 0.001 for Group S, permutation test). Furthermore, despite the lower predictive power, our model could still predict gene locations in 707 star-like motifs in which the closest gene to the hub was within D = 90 (supplementary fig. S8G and H, Supplementary Material online, P = 0.002 for Group L and P = 0.036 for Group S, permutation test).

Discussion

In our study, we provided results from both computational simulation and empirical data analysis to support the role of genetic interaction networks in driving the evolution of gene order. We performed simulations with different sets of parameters, some of which were from empirical data or within the range of empirical data (Schacherer et al. 2009; Costanzo et al. 2010). The negative E–D correlation was observed with all parameter sets (supplementary figs. S4–S6, Supplementary Material online), consistent with both theoretical predictions (fig. 1A) and empirical results (fig. 4).

The epistasis values were estimated mainly from null alleles (Costanzo et al. 2010); thus, it remains unknown whether our conclusion applies to other deleterious alleles. Fortunately, Costanzo et al. also estimated 7,786,453 pairwise epistasis values for either decreased abundance by mRNA perturbation (DAmP) alleles or temperature-sensitive (ts) alleles with mutations typically changing coding sequences, which offered us the opportunity to address this question (Costanzo et al. 2010, 2016). We found that the epistasis values of null alleles were significantly correlated with those of other deleterious alleles of the same gene (supplementary fig. S10A–L, Supplementary Material online), suggesting that the negative E–D correlation is likely a general phenomenon for various deleterious alleles. More importantly, because DNA deletion events that lead to null alleles are frequently observed in yeast natural populations (Schacherer et al. 2009), epistasis among null alleles by itself may be sufficiently strong to drive the evolution of gene distance.

In principle, if the frequencies of deleterious alleles in a population are significantly reduced by natural selection, the advantage of genetic linkage could be small for this population. However, numerous deleterious mutations (e.g., deletion of a whole gene, nonsense mutations, missense mutations, and mutations altering start codons) have been observed in natural populations (Liti et al. 2009; Schacherer et al. 2009), suggesting that the frequencies of deleterious alleles may not be low. This phenomenon is probably due to the antagonistic pleiotropic effects of a mutation in multiple environments (Qian et al. 2012).

It is worth noting that theoretical analysis predicts that epistasis can drive the evolution of recombination frequency rather than gene distance. In addition to gene distance, 1) the lengths of intergenic regions and 2) recombination hot/cold spots could also affect recombination frequency. Nevertheless, a strong correlation between gene distance and physical distance, that is, the number of nucleotides between two genes, was observed (ρ = 0.998, P < $10^{-100}$, N = 1,624,935), suggesting that the effect of (1) is negligible. Furthermore, physical distance is highly correlated with recombination frequency when the physical distance is < 180 kb (supplementary fig. S1C, Supplementary Material online, ρ = 0.88, P < $10^{-100}$, N = 336,227), suggesting that the effect of (2) is also limited. As a result, recombination frequency and gene distance are highly correlated (supplementary fig. S1B, Supplementary Material online, ρ = 0.87, P < $10^{-100}$, N = 337,892, for gene pairs with D ≤ 100). Because the recombination frequency was only measured for 58% of gene pairs in S. cerevisiae (Mancera et al. 2008) and was unknown for the reconstructed ancestral species, we used gene distance to approximate recombination frequency in this study. The optimization of gene distances among multiple gene pairs may often involve the rearrangement of genes on a chromosome.

FIG. 6. Gene orders can be predicted from the yeast genetic interaction network. (A) A three-node all-connected motif in the yeast genetic interaction network. The standard deviation of epistasis values ($\sigma_{\text{epistasis}}$) in this motif is 0.05. (B) Six possible gene orders are listed. The first is the observed gene order on chromosome VIII in yeast. (C) The E–D relationship for the six possible gene orders in (B). We consider the prediction successful if the first two gene orders are observed because they exhibit negative E–D correlations. In particular, the prediction is precise if the first gene order is observed. The dashed line represents E = 0. (D) A histogram of the standard deviation of epistasis values ($\sigma_{\text{epistasis}}$), based on which genes are divided into two groups with similar sizes, Group S and Group L. (E) The proportions of precise prediction of Group S, Group L, and random (R) expectations based on permutation. Error bars represent standard deviations among 1,000 permutations. (F) The proportions of successful predictions, similar to (E). (G–I) A star-like motif in the yeast genetic interaction network and the gene order on chromosome XII in yeast. The hub gene SFH1 and 19 genes with the distance < 100 to SFH1 are shown. (J) A histogram of differences in the two largest epistasis values in the motif ($\text{Diff}_{\text{epistasis}}$), based on which genes are divided into two groups with similar sizes, Group S and Group L. (K) Proportion of successful predictions of the closest gene in Group S and Group L. Their respective random expectations based on permutation are shown in gray. Error bars represent standard deviations among 1,000 permutations.

It is also worth noting that synthetic genetic array (SGA), the experimental strategy used to generate double mutants, is based on recombination between two null alleles (Tong et al. 2001; Costanzo et al. 2010). Thus, double mutants for linked genes may have a smaller initial frequency, which may lead to inaccuracy in estimating fitness values for these mutants. To determine whether such potential experimental bias has any impact on our results, we calculated the partial E–D correlation after controlling for the double mutant fitness values. Again, we observed a strong negative E–D correlation (partial ρ = −0.17, P = 1.3 × $10^{-9}$, N = 1,254). This result was not unexpected because the fitness of double mutants and gene distance were only weakly correlated (ρ = 0.086, P = 0.002, N = 1,254). More discussion on the caveats in the analysis of empirical data is included in supplementary note S2, Supplementary Material online.
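A partial rank correlation of this kind can be sketched as below. The text does not state which estimator was used, so this example assumes the standard first-order partial correlation formula applied to Spearman coefficients; the function names and the toy input vectors are hypothetical.

```python
def ranks(values):
    """Average 1-based ranks; tied values share their mean rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) - 1) / 2 + 1 for v in values]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


def partial_spearman(x, y, z):
    """Partial Spearman correlation of x and y, controlling for z.

    Assumed mapping to the text: x = epistasis (E), y = gene distance (D),
    z = double-mutant fitness.
    """
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    return partial_correlation(pearson(rx, ry), pearson(rx, rz), pearson(ry, rz))


# Toy vectors (hypothetical): z is uncorrelated with both x and y,
# so controlling for it leaves the x-y correlation unchanged.
print(partial_spearman([1, 2, 3, 4], [4, 3, 2, 1], [1, -1, -1, 1]))
```

When the control variable is only weakly correlated with the other two, as reported in the text for double-mutant fitness and gene distance, the partial coefficient stays close to the uncontrolled one.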

In a recent study, 1,800 genetic suppression interactions were identified in the budding yeast (van Leeuwen et al. 2016), which can be regarded as an extreme form of positive genetic interactions. Among them, nine gene pairs are tightly linked (D ≤ 2, supplementary table S3, Supplementary Material online). This number is significantly greater than the random expectation (permutation test, P = 0.01), again supporting our model.

The selection on clusters of locally adapted alleles, which sometimes were captured in chromosomal inversions, has been studied previously (Yeaman 2013; Kirkpatrick 2017). In our study, we proposed a hypothesis (the negative E–D correlation) on deleterious alleles and tested this hypothesis with analyses on empirical data and simulation, both in the context of genetic interaction networks. Therefore, our results provide additional clues for understanding the basic principles of genome organization. In particular, the origin and maintenance of clusters of genes in the same metabolic pathway, which have puzzled scientists for years (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011), could be well explained by genetic interactions. Genes in a linear metabolic pathway exhibit positive epistasis because 1) double mutants and single mutants have identical impacts on destroying the function of the pathway (He et al. 2010), and 2) the accumulation of deleterious intermediate products resulting from a loss-of-function mutation in a downstream gene of a metabolic pathway may be prevented by a loss-of-function mutation in an upstream gene (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011). Because genetic linkage is advantageous among genes with positive epistasis, the clustering of genes in the same linear metabolic pathway, such as genes in the galactose utilization pathway or allantoin degradation pathway, is favored by natural selection (Wong and Wolfe 2005; Slot and Rokas 2010).

Many factors have been reported to influence the evolution of gene order, such as tandem gene duplication, position effects on gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016), coordinated gene expression among neighboring genes (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004), and clustering of functionally related genes (Wong and Wolfe 2005; Slot and Rokas 2010), among many others. In this study, we provided evidence that, in addition to these factors, the genetic interaction network also played an important role in driving the evolution of gene order. Because the empirical data of epistasis (fitness values) are available in the budding yeast, an evolutionary simulation integrating many genetic interactions is possible, which makes epistasis unique among all factors driving the evolution of gene order. Based on the empirical data of the yeast genetic interaction network, our simulation indicates that the selective coefficient is on the order of $10^{-7}$, suggesting that epistasis may play an important role in determining gene order in yeast, a species with a relatively large effective population size ($10^{7}$). A negative E–D correlation is expected to be observed in species with a smaller effective population size only if the range of epistasis is larger than that in yeast (figs. 2 and 3). Because genome-wide empirical data of epistasis are unavailable in species with a smaller effective population size (e.g., humans and flies), further investigation is required to determine whether the genetic interaction network plays a role in the evolution of gene order in these species.

Materials and Methods

Genomes

The genome annotation of S. cerevisiae was downloaded from the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org, version R64). The gene order of the reconstructed ancestor before the WGD (Gordon et al. 2009) was downloaded from the Yeast Gene Order Browser (YGOB) (Byrne and Wolfe 2005).

Simulation

To examine the fitness effect of gene distance (D), two-locus dynamics under selection were simulated for populations with different D (from 0 to 100 in 1-gene increments) as well as a series of epistasis values (E, from −0.004 to 0.004 in 0.001 increments) between two loci. The recombination frequency (R) between these two loci was estimated from D using the equation below:

$$R = D \times 0.004 + 0.064$$

which is the linear relationship estimated from the distance between two genes (D) and the empirical recombination frequencies (R) between them quantified in a previous study (Mancera et al. 2008). The simulation was performed in haploid organisms in order to be in alignment with the analyses of empirical data, in which epistasis values were quantified in haploid yeast (Costanzo et al. 2010, 2016). A and B are wild-type alleles of two di-allelic loci on the same chromosome; a and b are their deleterious alleles, with fitness values of $\omega_{aB} = \omega_{Ab} = 0.992$. The fitness of the wild type ($\omega_{AB}$) was defined as 1, and the epistasis E was defined as $\omega_{ab} - \omega_{aB}\,\omega_{Ab}$. The initial allele frequencies of a and b were both 0.1. The two loci were initially under linkage equilibrium, and therefore the frequencies of AB ($X_{AB}$), Ab ($X_{Ab}$), aB ($X_{aB}$), and ab ($X_{ab}$) were 0.81, 0.09, 0.09, and 0.01, respectively. After random mating, selection, and recombination, the frequencies of the four genotypes in the next generation were calculated with the following equations (Nei 1967):

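$$X'_{AB} = \frac{X_{AB}\,\omega_{AB}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{Ab} = \frac{X_{Ab}\,\omega_{Ab}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{aB} = \frac{X_{aB}\,\omega_{aB}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{ab} = \frac{X_{ab}\,\omega_{ab}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$\bar{\omega}$ is the average fitness of a population and was calculated as follows:

$$\bar{\omega} = X_{AB}\,\omega_{AB} + X_{ab}\,\omega_{ab} + X_{aB}\,\omega_{aB} + X_{Ab}\,\omega_{Ab}$$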
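The recursion above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `recombination_rate` and `simulate` are hypothetical, the default parameter values are the ones stated in the text, and the mean fitness is recorded before each update so that index 100 of the returned list corresponds to the 100th generation.

```python
def recombination_rate(d):
    """Linear map from gene distance D to recombination frequency R (from the text)."""
    return d * 0.004 + 0.064


def simulate(d, epistasis, generations=100, w_single=0.992, q0=0.1):
    """Iterate the haploid two-locus recursion; return mean fitness per generation."""
    r = recombination_rate(d)
    # Genotype fitness values; E is defined as w_ab - w_aB * w_Ab.
    w = {"AB": 1.0, "Ab": w_single, "aB": w_single,
         "ab": w_single * w_single + epistasis}
    # Initial genotype frequencies under linkage equilibrium.
    p0 = 1.0 - q0
    x = {"AB": p0 * p0, "Ab": p0 * q0, "aB": q0 * p0, "ab": q0 * q0}
    history = []
    for _ in range(generations + 1):
        w_bar = sum(x[g] * w[g] for g in x)
        history.append(w_bar)
        # Recombination term shared by the four update equations.
        ld = (x["AB"] * w["AB"] * x["ab"] * w["ab"]
              - x["aB"] * w["aB"] * x["Ab"] * w["Ab"])
        shift = r * ld / w_bar ** 2
        x = {
            "AB": x["AB"] * w["AB"] / w_bar - shift,
            "Ab": x["Ab"] * w["Ab"] / w_bar + shift,
            "aB": x["aB"] * w["aB"] / w_bar + shift,
            "ab": x["ab"] * w["ab"] / w_bar - shift,
        }
    return history


# Fitness difference between tight (D = 0) and loose (D = 50) linkage
# at the 100th generation, for one positive epistasis value.
tight = simulate(0, 0.004)
loose = simulate(50, 0.004)
print(tight[100] - loose[100])
```

Note that with E = 0 the recombination term vanishes under linkage equilibrium, so the trajectories for different D coincide; any dependence of mean fitness on D in this sketch therefore comes from epistasis.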

For each population, the average fitness ($\bar{\omega}$) and the frequencies of deleterious alleles ($X_{a}$ and $X_{b}$) were recorded in each generation of the simulation. The difference between the average fitness of two populations with different D (D = 50 and D = 0) reflects whether linkage is favored by natural selection or not. The frequency difference of the deleterious alleles between these two populations is related to the long-term effects of recombination. We also performed the simulation with a variety of parameter values (supplementary fig. S4, Supplementary Material online).

The average fitness of a population at the 100th generation ($\bar{\omega}_{100}$) was used to infer the fitness effect of gene distance at a given epistasis value. For each epistasis value, we could identify an optimal gene distance ($D_{\text{opt}}$) as well as a series of permitted gene distances ($D_{\text{permitted}}$). The difference in $\bar{\omega}_{100}$ between $D_{\text{permitted}}$ and $D_{\text{opt}}$ was < $10^{-7}$. The mean of all $D_{\text{permitted}}$ values was calculated. In addition, our simulation did not introduce new mutations to the population, and therefore the frequencies of deleterious alleles decreased over generations. Nevertheless, the negative E–D correlation was also observed at the 50th and 200th generations (supplementary fig. S3, Supplementary Material online).
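The selection of the optimal and permitted gene distances can be sketched as follows, assuming the simulated fitness at the 100th generation has already been tabulated per gene distance. The function name `optimal_and_permitted` and the fitness values in the usage example are hypothetical, and the optimal distance itself is counted among the permitted distances here, which is an assumption.

```python
def optimal_and_permitted(fitness_by_distance, tolerance=1e-7):
    """Return (D_opt, sorted permitted distances, mean permitted distance).

    `fitness_by_distance` maps a gene distance D to the mean fitness at the
    100th generation for one epistasis value.
    """
    d_opt = max(fitness_by_distance, key=fitness_by_distance.get)
    best = fitness_by_distance[d_opt]
    permitted = sorted(d for d, w in fitness_by_distance.items()
                       if best - w < tolerance)
    return d_opt, permitted, sum(permitted) / len(permitted)


# Hypothetical fitness values for four distances (illustration only).
example = {0: 0.99990000, 1: 0.99989995, 2: 0.99989980, 3: 0.99980000}
print(optimal_and_permitted(example))
```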

Toy Motifs

We built a toy motif in a genetic interaction network to simulate the fitness effect of gene distance. In the star-like motif (fig. 2A), a hub gene (A) interacted with nine partners (genes B, C, D, E, F, G, H, I, and J) with nine epistasis values ranging from −0.004 to 0.004 in 0.001 increments. We attempted to place nine partner genes on the same side of the hub gene on a chromosome. Because we focused on the epistasis between the hub gene and the partners, placing the partners on different sides would not affect the results. We fixed the chromosomal location of the hub gene, and the partner genes were placed at D = 20, 30, 40, 50, 60, 70, 80, 90, and 100 from the hub gene. The locations of the nine partners were shuffled while keeping the epistasis values unchanged. The fitness of a gene order was defined as the average $\bar{\omega}_{100}$ for all nine hub-containing gene pairs. Thus, we could calculate the fitness for a total of (9! =) 362,880 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to test whether the simulation result was parameter sensitive (supplementary fig. S5, Supplementary Material online). Here, epistasis values and fitness defects were randomly chosen from the empirical data in previous studies (Costanzo et al. 2010, 2016).

For the all-connected motif (fig. 3A), five genes (A, B, C, D, and E) interacted with each other with ten epistasis values ranging from −0.0036 to 0.0036 in 0.0008 increments. The five genes were placed on the same chromosome, and the gene locations were set as 1, 19, 65, 93, and 102, so that the maximum D in the motif was 100. The locations of these five genes were shuffled, whereas the epistasis values were kept unchanged. Similar to the star-like motifs, the fitness of a gene order was defined as the average $\bar{\omega}_{100}$ for all ten gene pairs. The fitness was calculated for a total of (5! =) 120 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to determine whether the simulation result was parameter sensitive (supplementary fig. S6, Supplementary Material online).

Epistasis

Epistasis values were retrieved from the studies of Costanzo et al. (Costanzo et al. 2010, 2016). Because the epistasis values of the overlapping gene pairs in these studies were highly correlated (r = 0.71, P < $10^{-100}$, Pearson's correlation, N = 2,604,539), we merged their epistasis values. A pair of genes was filtered if (a) the epistasis value between their null mutations was not a number (NaN) or (b) the epistasis values of the same null mutation pair had opposite signs in reciprocal crosses or in different studies. We also removed the gene pairs with the null mutation in at least one gene resulting in a higher fitness value than that of the wild type. This filtering was performed because only a small number of antagonistic pleiotropic genes were detected in rich media (Qian et al. 2012). Thus, the elevated fitness observed upon gene deletion in Costanzo et al. was likely due to inaccurate estimation of fitness. If epistasis between a pair of genes was examined multiple times, we used the epistasis value with the smallest P value. Following Costanzo et al. (2010), epistasis values with P < 0.05 were classified as significant.
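The filtering rules above can be sketched as a single pass over per-cross measurements. The record layout (`Measurement`) and the function name `filter_epistasis` are hypothetical and do not reflect the format of the published data files; as a further assumption, NaN measurements are dropped individually and a pair is discarded only when no numeric measurement remains.

```python
import math
from collections import defaultdict
from typing import NamedTuple


class Measurement(NamedTuple):
    gene_a: str
    gene_b: str
    epistasis: float
    p_value: float
    fitness_a: float  # single-mutant fitness, wild type = 1
    fitness_b: float


def filter_epistasis(measurements, alpha=0.05):
    """Return {gene pair: (epistasis, is_significant)} after filtering."""
    by_pair = defaultdict(list)
    for m in measurements:
        by_pair[frozenset((m.gene_a, m.gene_b))].append(m)

    kept = {}
    for pair, records in by_pair.items():
        # (a) Ignore NaN epistasis values (assumption: drop per measurement).
        records = [m for m in records if not math.isnan(m.epistasis)]
        if not records:
            continue
        # (b) Opposite signs in reciprocal crosses or different studies.
        if (any(m.epistasis > 0 for m in records)
                and any(m.epistasis < 0 for m in records)):
            continue
        # Single mutant fitter than the wild type: likely inaccurate estimate.
        if any(m.fitness_a > 1.0 or m.fitness_b > 1.0 for m in records):
            continue
        # Multiple measurements: keep the one with the smallest P value.
        best = min(records, key=lambda m: m.p_value)
        kept[pair] = (best.epistasis, best.p_value < alpha)
    return kept
```

Keying each pair by a `frozenset` makes reciprocal crosses (A×B and B×A) fall into the same group without any extra bookkeeping.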

In addition to the two data sets from Costanzo et al., several other studies have also estimated epistasis values using high-throughput strategies (Boone et al. 2007). In principle, we could have included all of them in this study. However, for practical reasons, these data sets were not suitable for our study. First, it would not be appropriate to combine other data sets with that of Costanzo et al. because epistasis values from multiple studies potentially have different sources of error, especially if they followed different protocols. Second, the sample sizes of epistasis data of linked genes from other studies were not sufficiently large to examine the E–D correlation. Because the data set from Costanzo et al. represents the largest available so far, containing 27 times more significant epistasis than the total of all other studies, we focused on these epistasis data in this study.

Recombination Frequency

Genotypes of meiosis products from 46 tetrads were downloaded from Mancera et al. (2008) at http://www.ebi.ac.uk/huber/recombination. A pair of DNA markers was filtered if the genotypes were available in less than half of the spores (< 92 spores). For each pair of markers, spores with parental genotypes ($N_{p}$) and nonparental genotypes ($N_{np}$) were counted. The recombination frequency of a pair of markers ($R_{m}$) was calculated as follows:

$$R_{m} = \frac{N_{np}}{N_{p} + N_{np}}$$

The recombination frequency between a pair of genes ($R_{g}$) was defined as the average recombination frequency of all marker pairs within the pair of genes, which was calculated as follows:

$$R_{g} = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J} R_{m,ij}$$

Subscripts i and j represent the ith and jth markers in each gene, respectively, and I and J denote the numbers of markers in the two genes.
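The two definitions above can be sketched as follows. The data layout is an assumption made for illustration: each marker is a list with one entry per spore, giving the parental origin of the allele (any hashable label) or `None` when the genotype is missing. A spore is counted only when both markers are genotyped, which is one possible reading of the filtering rule, and marker pairs that fail the filter are excluded from the gene-level average.

```python
def marker_pair_frequency(marker_1, marker_2, min_spores=92):
    """R_m = N_np / (N_p + N_np), or None if too few spores are genotyped."""
    calls = [(a, b) for a, b in zip(marker_1, marker_2)
             if a is not None and b is not None]
    if len(calls) < min_spores:
        return None
    # Nonparental spores carry alleles from different parents at the two markers.
    n_np = sum(1 for a, b in calls if a != b)
    return n_np / len(calls)


def gene_pair_frequency(markers_gene_1, markers_gene_2, min_spores=92):
    """R_g: average R_m over all marker pairs between the two genes."""
    values = [marker_pair_frequency(m1, m2, min_spores)
              for m1 in markers_gene_1 for m2 in markers_gene_2]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


# Toy example with four spores and a lowered threshold (illustration only).
m1 = ["S", "S", "Y", "Y"]
m2 = ["S", "Y", "Y", "Y"]
print(marker_pair_frequency(m1, m2, min_spores=2))
```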

Identification of Duplicate Genes

All-against-all BlastP was performed to search for duplicate genes. Gene pairs with E values < $10^{-10}$ were defined as duplicate genes.

Expression Similarity

Expression profiles of 6,359 genes in 40 studies were compiled by a previous study (Kafri et al. 2005). Pearson's correlation coefficient for two genes was calculated within each study in the compiled data set, and expression similarity between these two genes was defined as the average correlation coefficient among studies.

Other Yeast Functional Genomic Data

Protein–protein interaction data were obtained from the SGD. Three-dimensional chromatin colocalization data were retrieved from Duan et al. (2010). Genome-wide gene expression noise data were downloaded from Newman et al. (2006). A list of essential genes was downloaded from the Database of Essential Genes (DEG, version 10.6) (Luo et al. 2014). Semantic similarity of Gene Ontology (GO) terms was calculated using the R package GOSemSim (version 1.22.0) (Yu et al. 2010).

Shuffling of Gene Positions or Epistasis Values

To obtain the null distribution of E–D correlation coefficients, we shuffled gene positions while keeping the genome structure of S. cerevisiae unchanged (i.e., number of chromosomes and number of genes on each chromosome). The epistasis value of each gene pair was also kept unchanged. Gene distances were then calculated based on the new genomic locations.
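One way to shuffle gene positions while preserving the genome structure is sketched below. The genome representation (a dictionary from chromosome name to an ordered gene list) and both function names are hypothetical. Gene distance is taken here as the difference in gene index on the same chromosome, which is an assumption about the exact definition of D.

```python
import random


def shuffle_gene_positions(genome, rng):
    """Reassign genes to random slots, keeping each chromosome's gene count."""
    genes = [g for chrom in genome.values() for g in chrom]
    rng.shuffle(genes)
    shuffled, start = {}, 0
    for name, chrom in genome.items():
        shuffled[name] = genes[start:start + len(chrom)]
        start += len(chrom)
    return shuffled


def gene_distance(genome, gene_a, gene_b):
    """Index difference for genes on the same chromosome, else None."""
    for chrom in genome.values():
        if gene_a in chrom and gene_b in chrom:
            return abs(chrom.index(gene_a) - chrom.index(gene_b))
    return None


# Toy genome with two chromosomes (illustration only).
toy = {"I": ["a", "b", "c"], "II": ["d", "e"]}
print(shuffle_gene_positions(toy, random.Random(1)))
```

Because epistasis values stay attached to gene pairs, recomputing gene distances on each shuffled genome yields one draw from the null distribution of the E–D correlation.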

To estimate the relative importance of different epistasis categories (positive and negative) in shaping the E–D correlation, we shuffled epistasis values of one category while keeping the other category unchanged. Gene positions were not changed during this process.

Estimating the Predictive Accuracy of Gene Order

To examine the accuracy of our prediction of gene order, we estimated the success rate by chance (gray bars in fig. 6 and supplementary figs. S8 and S9, Supplementary Material online) by randomly assigning the positions of genes 1,000 times. For each permutation, the average proportion of successful predictions among motifs was estimated. The mean and standard deviation among the 1,000 permutations were then calculated.
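For star-like motifs, the chance success rate can be sketched as below. This is a simplified illustration under stated assumptions: each motif is represented as a list of (epistasis with hub, distance to hub) tuples, a prediction counts as successful when the partner with the highest epistasis is the closest gene to the hub, and random positions are emulated by shuffling the distances within each motif rather than repositioning genes across the genome. All names and example values are hypothetical.

```python
import random
import statistics


def closest_is_top(motif):
    """True if the partner with the highest epistasis is closest to the hub."""
    top = max(motif, key=lambda partner: partner[0])
    return top[1] == min(d for _, d in motif)


def success_rate(motifs):
    """Proportion of motifs with a successful prediction."""
    return sum(closest_is_top(m) for m in motifs) / len(motifs)


def random_expectation(motifs, n_perm=1000, seed=0):
    """Mean and standard deviation of the success rate under shuffling."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_perm):
        shuffled = []
        for motif in motifs:
            distances = [d for _, d in motif]
            rng.shuffle(distances)
            shuffled.append([(e, d) for (e, _), d in zip(motif, distances)])
        rates.append(success_rate(shuffled))
    return statistics.mean(rates), statistics.stdev(rates)


# Two hypothetical motifs: the first is a success, the second is not.
motifs = [
    [(0.05, 3), (0.01, 10), (-0.02, 20)],
    [(0.04, 50), (0.00, 5)],
]
print(success_rate(motifs), random_expectation(motifs))
```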

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Author Contributions

Y.-F.Y., W.C., and W.Q. conceived the research; Y.-F.Y., W.C., and W.Q. analyzed the data; and Y.-F.Y., S.W., and W.Q. wrote the article.

Acknowledgments

We thank Bin He and Weiwei Zhai for discussion. This work was supported by grants from the National Natural Science Foundation of China to W.Q. (91731302).

References

Batada NN, Hurst LD. 2007. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet. 39(8):945–949.

Boone C, Bussey H, Andrews BJ. 2007. Exploring genetic interactions and networks with yeast. Nat Rev Genet. 8(6):437–449.

Boutanaev AM, Kalmykova AI, Shevelyov YY, Nurminsky DI. 2002. Large clusters of co-expressed genes in the Drosophila genome. Nature 420(6916):666–669.

Byrne KP, Wolfe KH. 2005. The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 15(10):1456–1461.

Charlesworth B. 1990. Mutation-selection balance and the evolutionary advantage of sex and recombination. Genet Res. 55(3):199–221.

Charlesworth D, Charlesworth B. 2011. Mimicry: the hunting of the supergene. Curr Biol. 21(20):R846–R848.

Chen X, Zhang J. 2016. The genomic landscape of position effects on protein expression level and noise in yeast. Cell Syst. 2(5):347–354.

Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, et al. 1998. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2(1):65–73.

Cohen BA, Mitra RD, Hughes JD, Church GM. 2000. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet. 26(2):183–186.

Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, et al. 2010. The genetic landscape of a cell. Science 327(5964):425–431.

Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, et al. 2016. A global genetic interaction network maps a wiring diagram of cellular function. Science 353(6306):aaf1420.

Dean EJ, Davis JC, Davis RW, Petrov DA. 2008. Pervasive and persistent redundancy among duplicated genes in yeast. PLoS Genet. 4(7):e1000113.


DeLuna A, Vetsigian K, Shoresh N, Hegreness M, Colon-Gonzalez M, Chao S, Kishony R. 2008. Exposing the fitness contribution of duplicated genes. Nat Genet. 40(5):676–681.

Dixon SJ, Fedyshyn Y, Koh JL, Prasad TS, Chahwan C, Chua G, Toufighi K, Baryshnikova A, Hayles J, Hoe KL, et al. 2008. Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc Natl Acad Sci U S A. 105(43):16653–16658.

Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, Shendure J, Fields S, Blau CA, Noble WS. 2010. A three-dimensional model of the yeast genome. Nature 465(7296):363–367.

Eshel I, Feldman MW. 1970. On the evolutionary effect of recombination. Theor Popul Biol. 1(1):88–100.

Feldman MW, Christiansen FB, Brooks LD. 1980. Evolution of recombination in a constant environment. Proc Natl Acad Sci U S A. 77(8):4838–4841.

Ghanbarian AT, Hurst LD. 2015. Neighboring genes show correlated evolution in gene expression. Mol Biol Evol. 32(7):1748–1766.

Gordon JL, Byrne KP, Wolfe KH. 2009. Additions, losses, and rearrangements on the evolutionary route from a reconstructed ancestor to the modern Saccharomyces cerevisiae genome. PLoS Genet. 5(5):e1000485.

He X, Qian W, Wang Z, Li Y, Zhang J. 2010. Prevalent positive epistasis in Escherichia coli and Saccharomyces cerevisiae metabolic networks. Nat Genet. 42(3):272–276.

Hurst LD, Pal C, Lercher MJ. 2004. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 5(4):299–310.

Kafri R, Bar-Even A, Pilpel Y. 2005. Transcription control reprogramming in genetic backup circuits. Nat Genet. 37(3):295–299.

Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428(6983):617–624.

Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423(6937):241–254.

Kirkpatrick M. 2017. The evolution of genome structure by natural and sexual selection. J Hered. 108(1):3–11.

Kondrashov AS. 1982. Selection against harmful mutations in large sexual and asexual populations. Genet Res. 40(3):325–332.

Kondrashov AS. 1988. Deleterious mutations and the evolution of sexual reproduction. Nature 336(6198):435–440.

Kouyos RD, Silander OK, Bonhoeffer S. 2007. Epistasis between deleterious mutations and the evolution of recombination. Trends Ecol Evol. 22(6):308–315.

Lang GI, Botstein D. 2011. A test of the coordinated expression hypothesis for the origin and maintenance of the GAL cluster in yeast. PLoS One 6(9):e25290.

Lawrence J. 1999. Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev. 9(6):642–648.

Lawrence JG. 2002. Shared strategies in gene organization among prokaryotes and eukaryotes. Cell 110(4):407–413.

Lercher MJ, Urrutia AO, Hurst LD. 2002. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 31(2):180–183.

Liao BY, Zhang J. 2008. Coexpression of linked genes in Mammalian genomes is generally disadvantageous. Mol Biol Evol. 25(8):1555–1565.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458(7236):337–341.

Luo H, Lin Y, Gao F, Zhang CT, Zhang R. 2014. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res. 42(Database issue):D574–D580.

Mancera E, Bourgon R, Brozzi A, Huber W, Steinmetz LM. 2008. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454(7203):479–485.

Musso G, Costanzo M, Huangfu M, Smith AM, Paw J, San Luis BJ, Boone C, Giaever G, Nislow C, Emili A, et al. 2008. The extensive and condition-dependent nature of epistasis among whole-genome duplicates in yeast. Genome Res. 18(7):1092–1099.

Nei M. 1967. Modification of linkage intensity by natural selection. Genetics 57(3):625–641.

Nei M. 1969. Linkage modifications and sex difference in recombination. Genetics 63(3):681–699.

Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, Weissman JS. 2006. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441(7095):840–846.

Pal C, Hurst LD. 2003. Evidence for co-evolution of gene order and recombination rate. Nat Genet. 33(3):392–395.

Phillips PC. 2008. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 9(11):855–867.

Qian W, Liao BY, Chang AY, Zhang J. 2010. Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet. 26(10):425–430.

Qian W, Ma D, Xiao C, Wang Z, Zhang J. 2012. The genomic landscape and evolutionary resolution of antagonistic pleiotropy in yeast. Cell Rep. 2(5):1399–1410.

Qian W, Zhang J. 2008. Evolutionary dynamics of nematode operons: easy come, slow go. Genome Res. 18(3):412–421.

Qian W, Zhang J. 2014. Genomic evidence for adaptation by gene duplication. Genome Res. 24(8):1356–1362.

Schacherer J, Shapiro JA, Ruderfer DM, Kruglyak L. 2009. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458(7236):342–345.

Slot JC, Rokas A. 2010. Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi. Proc Natl Acad Sci U S A. 107(22):10136–10141.

Spellman PT, Rubin GM. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol. 1(1):5.

Tischler J, Lehner B, Chen N, Fraser AG. 2006. Combinatorial RNA interference in Caenorhabditis elegans reveals that redundancy between gene duplicates can be maintained for more than 80 million years of evolution. Genome Biol. 7(8):R69.

Tischler J, Lehner B, Fraser AG. 2008. Evolutionary plasticity of genetic interaction networks. Nat Genet. 40(4):390–391.

Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, et al. 2001. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294(5550):2364–2368.

van Leeuwen J, Pons C, Mellor JC, Yamaguchi TN, Friesen H, Koschwanez J, Usaj MM, Pechlaner M, Takar M, Usaj M, et al. 2016. Exploring genetic suppression interactions on a global scale. Science 354(6312):aag0839.

VanderSluis B, Bellay J, Musso G, Costanzo M, Papp B, Vizeacoumar FJ, Baryshnikova A, Andrews B, Boone C, Myers CL. 2010. Genetic interactions reveal the evolutionary trajectories of duplicate genes. Mol Syst Biol. 6:429.

Vavouri T, Semple JI, Lehner B. 2008. Widespread conservation of genetic redundancy during a billion years of eukaryotic evolution. Trends Genet. 24(10):485–488.

Wagner A. 2005. Energy constraints on the evolution of gene expression. Mol Biol Evol. 22(6):1365–1374.

Williams EJ, Bowles DJ. 2004. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 14(6):1060–1067.

Wong S, Wolfe KH. 2005. Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat Genet. 37(7):777–782.

Wu CI, Ting CT. 2004. Genes and speciation. Nat Rev Genet. 5(2):114–122.

Yeaman S. 2013. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proc Natl Acad Sci U S A. 110(19):E1743–E1751.

Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S. 2010. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26(7):976–978.

Zaslaver A, Baugh LR, Sternberg PW. 2011. Metazoan operons accelerate recovery from growth-arrested states. Cell 145(6):981–992.

Yang et al. doi:10.1093/molbev/msx264 MBE

3266Downloaded from httpsacademicoupcommbearticle-abstract341232544318638by Institute of Geneties and Developmental BiologyCAS useron 03 December 2017


ranging from 0.004 to 0.020, shuffled the gene order, and recalculated the average ω_100 of all gene orders with the same d (fig. 2E). We found that although it was always true that higher d led to a reduction of ω_100, the increase in the range of epistasis values enlarged the fitness differences among gene orders (fig. 2E).
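The quantity d used here (defined in the legend of fig. 2E as the number of differently placed genes) can be illustrated with a minimal sketch. The function name and the example orders are hypothetical, and comparing the two orders slot by slot is an assumed reading of "differently placed".

```python
def order_difference(order, reference):
    """Count the slots at which two gene orders place different genes."""
    if sorted(order) != sorted(reference):
        raise ValueError("both orders must contain the same genes")
    return sum(a != b for a, b in zip(order, reference))


# Hypothetical five-gene example: three genes rotated among three slots.
reference = ["A", "B", "C", "D", "E"]
variant = ["B", "C", "A", "D", "E"]
print(order_difference(variant, reference))  # 3
```

Under this reading, d is 0 for the reference order itself and at least 2 for any other order, because moving one gene necessarily displaces another.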

Our model further predicted that positive epistasis should play a more important role in the evolution of gene order (fig. 1), such that altering the order of partner genes that have positive epistasis with the hub gene would lead to a larger fitness reduction. As expected, we observed that the gene-order variants with changes exclusively to the positively

FIG. 2. The negative E–D correlation in star-like motifs of genetic interaction networks. (A) A toy model of a star-like motif in which gene A is the hub. Gene A has positive epistasis with genes B, C, D, and E, and negative epistasis with genes G, H, I, and J. The range of epistasis is (0.004 − [−0.004] =) 0.008 in this motif. (B) Spearman's correlation coefficient between epistasis and D (ρ_ED) varies among gene orders. The average fitness at the 100th generation (ω_100) is negatively correlated with ρ_ED. (C) The gene order with the highest ω_100; ρ_ED = −1. (D) The gene order with the lowest ω_100; ρ_ED = 1. (E) The difference between a gene order and the gene order with the highest ω_100 (d) is defined as the number of differently placed genes. Two examples with d = 3 are shown. The heat map of the relative ω_100 (normalized to the highest ω_100) is shown. The average relative ω_100 decreases with the increase of d. The reduction is more dramatic when the range of epistasis values is larger. (F) Shuffling among genes B, C, D, and E (positive epistasis with the hub gene A) has a larger impact on ω_100 than shuffling among genes G, H, I, and J (negative epistasis with the hub gene A). P values of one-tailed Mann–Whitney U test are shown. The gray dashed line indicates the ω_100 of the optimal gene order in panel (C).


epistatic genes generally had a larger reduction in fitness compared with those with changes exclusively to the negatively epistatic genes (fig. 2F).

Chromosomal Arrangement of Genes in All-Connected Motifs of Genetic Interaction Networks

We further examined the negative correlation between ω_100 and ρ_ED in all-connected motifs. To this end, we built a toy all-connected motif with five nodes and assigned epistasis values in the range of −0.0036 to 0.0036 to edges (fig. 3A). Again, we observed a strong negative correlation between ω_100 and ρ_ED (fig. 3B; ρ = −0.87, P = 3.2 × 10^−38). The gene order with the highest fitness had a strong negative E–D correlation (ρ_ED = −1.00; fig. 3C), whereas the gene order with the lowest fitness had a positive E–D correlation (ρ_ED = 0.39; fig. 3D). Similarly, we confirmed that the negative correlation between ω_100 and ρ_ED was insensitive to epistasis values, fitness defects, gene distances, and initial allele frequencies (supplementary fig. S6 and table S2, Supplementary Material online). Furthermore, we observed that the fitness of a gene order decreased with the increase in its d (fig. 3E), and this trend was stronger when the range of epistasis was larger (fig. 3E).

Negative E–D Correlation in S. cerevisiae

To investigate whether the negative E–D correlation is supported by empirical evidence, we retrieved the pairwise epistasis data generated by Costanzo et al., who systematically measured the vegetative growth rates of both single and double mutants in the budding yeast S. cerevisiae and estimated epistasis values for 26 million gene pairs (Costanzo et al. 2010, 2016). As expected, we observed a significant negative E–D correlation among linked genes (fig. 4A; ρ = −0.15, P = 4.0 × 10^−8, N = 1254). Consistent with this trend, unlinked genes on the same chromosome exhibited lower epistasis values (fig. 4A, gray dashed line). As a control, we permuted the gene orders and recalculated the correlation coefficients 1000 times. We found that the negative E–D correlation disappeared after permutation (fig. 4B; P < 0.001, permutation test).
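The logic of such a permutation control can be sketched as follows. This is a simplified illustration rather than the authors' pipeline: it shuffles distances among gene pairs instead of permuting gene positions along chromosomes, the function names are hypothetical, and the toy data are invented.

```python
import random


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def permutation_p(epistasis, distance, n_perm=1000, seed=0):
    """One-tailed P: fraction of shuffles with rho <= the observed rho."""
    rng = random.Random(seed)
    observed = spearman(epistasis, distance)
    shuffled = list(distance)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if spearman(epistasis, shuffled) <= observed:
            hits += 1
    return observed, hits / n_perm


# Invented toy data in which epistasis declines with gene distance.
E = [0.08, 0.05, 0.03, 0.01, -0.01, -0.02, -0.04, -0.06]
D = [2, 5, 9, 14, 30, 41, 66, 90]
rho, p = permutation_p(E, D)
print(rho, p)
```

A small P here means that few shuffled data sets reach a correlation as negative as the observed one, which is the sense in which the correlation "disappears" after permutation.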

These observations are potentially attributable to a number of confounding factors. The first is mutational bias, such as tandem duplication. However, duplicate genes tend to have negative epistasis (Tischler et al. 2006; Dean et al. 2008; DeLuna et al. 2008; Musso et al. 2008; Vavouri et al. 2008; Qian et al. 2010), which should result in a positive E–D correlation. Nevertheless, we controlled for this mutational bias by randomly keeping only one gene in a gene family and still observed the negative E–D correlation (supplementary fig. S7A, Supplementary Material online; ρ = −0.17, P = 1.8 × 10^−5, N = 641).

Second, genes with coordinated expression are clustered (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004). If coordinately expressed genes tend to have positive epistasis, the negative E–D correlation could result from these genes. To control for this effect, we first inferred expression pattern similarity for each pair of genes by calculating the correlation of gene expression levels in multiple conditions (Qian and Zhang 2014). We did not observe a significant correlation between epistasis and expression similarity (ρ = −0.015, P = 0.6, N = 1202). Nevertheless, we divided these gene pairs into two groups according to expression similarity, recalculated the E–D correlation within each group, and still observed significant negative E–D correlations (supplementary fig. S7B and C, Supplementary Material online). Similar results were obtained when we calculated the partial E–D correlation after controlling for expression similarity (partial ρ = −0.14, P = 6.6 × 10^−7, N = 1202). In addition, we also found that coordinated gene expression occurring through 3D chromatin interactions did not confound our results (supplementary fig. S7D, Supplementary Material online; ρ = −0.17, P = 8.7 × 10^−6, N = 704), which was not unexpected, as 3D chromatin interactions do not influence recombination frequency.
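For reference, a first-order partial correlation of this kind is commonly computed from the three pairwise coefficients. The expression below is the standard textbook form, applied to Spearman coefficients as an assumption; the text does not state which implementation was used. Here S denotes the controlled variable (expression similarity in this paragraph).

```latex
\rho_{ED \cdot S} = \frac{\rho_{ED} - \rho_{ES}\,\rho_{DS}}{\sqrt{\left(1 - \rho_{ES}^{2}\right)\left(1 - \rho_{DS}^{2}\right)}}
```

When the correlation between E and S is close to zero, the numerator is essentially ρ_ED, so controlling for S leaves the E–D correlation largely unchanged.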

Because functionally related genes are nonrandomly distributed on chromosomes (Wong and Wolfe 2005;

FIG. 3. The negative E–D correlation in all-connected motifs of genetic interaction networks. (A) A toy model of an all-connected motif. (B) The fitness at the 100th generation (ω_100) is negatively correlated with ρ_ED. (C) The gene order with the highest ω_100; ρ_ED = −1.00. (D) The gene order with the lowest ω_100; ρ_ED = 0.39. (E) The heat map of the relative ω_100 (normalized to the highest ω_100) is shown. The relative ω_100 decreases with the increase of d. The reduction is more dramatic when the range of epistasis is larger.


Slot and Rokas 2010) and functional relationships between genes may lead to epistasis, we next examined whether functional relationships could confound the negative E–D correlation. We observed similar negative E–D correlations for gene pairs with either high or low semantic similarity of GO terms in molecular functions (supplementary fig. S7E and F, Supplementary Material online), biological processes (supplementary fig. S7G and H, Supplementary Material online), and cellular components (supplementary fig. S7I and J, Supplementary Material online). Again, we calculated partial correlations controlling for semantic similarity of GO terms in molecular functions (partial ρ = −0.15, P = 7.3 × 10^−7, N = 1151), biological processes (partial ρ = −0.15, P = 3.2 × 10^−7, N = 1151), and cellular components (partial ρ = −0.15, P = 3.2 × 10^−7, N = 1151). All these results indicate that the functional relationship is not a confounding factor in the negative E–D correlation, which is not unexpected given that a large fraction of genetic interactions do not reflect functional relationships (He et al. 2010; Costanzo et al. 2016).

Finally, we investigated the impact of gene expression noise on the negative E–D correlation because it has been proposed that essential genes were colocalized in open chromatin regions to reduce gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016). To control for this effect, we first calculated the average gene expression noise (Newman et al. 2006) for each gene pair. We used the distance of each coefficient of variation (CV) to a running median of CV values (DM) to quantify gene expression noise (Newman et al. 2006) in order to minimize the effect of gene expression magnitude on gene expression noise. We observed a negative E–D correlation for gene pairs with either high or low average DM (supplementary fig. S7K and L, Supplementary Material online). Again, we calculated the partial correlation controlling for gene expression noise and still observed a negative E–D correlation (partial ρ = −0.16, P = 9.7 × 10^−3, N = 258).

Positive Epistasis Plays an Important Role in the Origin of the Negative E–D Correlation

Our model further predicted that the reduction of the distance between positively epistatic genes should play a more important role in the formation of the negative E–D correlation (figs. 1 and 2). Indeed, a significant negative E–D correlation was observed among positively epistatic gene pairs in S. cerevisiae (fig. 4C; ρ = −0.14, P = 0.0064, N = 391), whereas no significant correlation was observed among negatively epistatic gene pairs (fig. 4D; ρ = −0.025, P = 0.47, N = 863). We further verified the role of positive epistasis by shuffling epistasis values among all 391 positively epistatic gene pairs in S. cerevisiae. As expected, the E–D correlation was significantly weakened after the permutation (fig. 4E; P = 0.005, one-tailed permutation test). By contrast, no significant difference was observed after shuffling negative epistasis values (fig. 4F; P = 0.722, one-tailed permutation test), even though the latter analysis shuffled more gene pairs (N = 863).

FIG. 4. A negative E–D correlation is observed in the empirical genetic interaction network of the budding yeast S. cerevisiae, and positive epistasis plays a more important role in its formation. (A) A significant negative E–D correlation is observed in S. cerevisiae. Gene pairs are separated into bins based on D, with an equal width of five genes. The mean value of epistasis and the standard error of the mean (SEM) within each bin are shown. Spearman's correlation coefficient ρ and corresponding P values were calculated from the raw data (N = 1254). The gray dashed line shows the average epistasis among unlinked genes (D > 100). (B) Distribution of E–D correlation coefficients in 1000 shuffled genomes. The arrow indicates the observed correlation coefficient in S. cerevisiae. (C and D) A significant negative E–D correlation is observed among positively epistatic gene pairs (N = 391) but not among negatively epistatic gene pairs (N = 863). Spearman's correlation coefficient ρ and the corresponding P values are calculated from the raw data. The dashed line shows the average epistasis among unlinked genes. (E) The distribution of correlation coefficients in 1000 artificial genomes in which values of positive epistasis are shuffled. The arrow indicates the E–D correlation coefficient in reality. (F) Similar to (E); values of negative epistasis are shuffled. (G and H) The proportion of gene pairs with significant positive epistasis is significantly correlated with D, but that with significant negative epistasis is not. SEMs are estimated based on binomial distribution. The dashed lines show the proportion of gene pairs with significant positive or negative epistasis among unlinked genes.


In studies by Costanzo et al., epistasis was classified into three categories: significantly positive, significantly negative, and nonsignificant (Costanzo et al. 2010, 2016). A strong negative correlation was observed between gene distance and the proportion of gene pairs with significant positive epistasis (fig. 4G; ρ = −0.82, P = 5.5 × 10^−5), whereas no significant correlation was observed between gene distance and the proportion of gene pairs with significant negative epistasis (fig. 4H; ρ = 0.43, P = 0.088). All these observations emphasize the important role of positive epistasis in the origin of the negative E–D correlation.
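A minimal sketch of how such per-bin proportions and their binomial SEMs could be tabulated is given below. The bin width of five genes and the binomial SEM follow the legend of fig. 4; the function name, the choice to start bins at D = 0, and the toy input are assumptions.

```python
from collections import defaultdict
from math import sqrt


def binned_proportion(distances, is_significant, bin_width=5):
    """Return {bin_start: (proportion, binomial SEM, n)} per distance bin."""
    counts = defaultdict(lambda: [0, 0])  # bin_start -> [significant, total]
    for d, sig in zip(distances, is_significant):
        start = (d // bin_width) * bin_width
        counts[start][0] += int(sig)
        counts[start][1] += 1
    summary = {}
    for start in sorted(counts):
        k, n = counts[start]
        p = k / n
        # Binomial standard error of a proportion: sqrt(p * (1 - p) / n).
        summary[start] = (p, sqrt(p * (1 - p) / n), n)
    return summary


# Invented toy input: gene distances and whether epistasis was called
# significantly positive for each pair.
distances = [1, 2, 3, 4, 6, 7, 8, 9]
flags = [True, True, False, True, False, False, True, False]
summary = binned_proportion(distances, flags)
for start, (p, sem, n) in summary.items():
    print(start, round(p, 2), round(sem, 3), n)
```

The per-bin proportions could then be correlated with the bin positions to obtain a coefficient comparable to the one reported for fig. 4G.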

Epistasis-Driven Evolution of Gene Order after the Whole Genome Duplication in Yeast

Thus far, we have observed a negative E–D correlation and demonstrated that the correlation was mainly attributable to positively epistatic gene pairs. The aforementioned theories, simulations, and empirical evidence led us to propose a hypothesis of epistasis-driven evolution of gene order in S. cerevisiae. The whole genome duplication (WGD; fig. 5A) and the subsequent extensive gene losses substantially changed the epistatic relations among genes and markedly rewired the genetic interaction network in yeast (Kellis et al. 2004; Dixon et al. 2008; Tischler et al. 2008; VanderSluis et al. 2010). For instance, only 29% of the identified synthetic lethality is conserved between Schizosaccharomyces pombe and S. cerevisiae (Dixon et al. 2008). At the same time, extensive chromosome rearrangement events occurred. For example, in a reconstructed ancestral species before the WGD (Byrne and Wolfe 2005), we identified the gene pairs that were located on the same chromosome, and surprisingly, 84.6% of them are localized on different chromosomes in S. cerevisiae. More strikingly, among gene pairs that are localized on the same chromosome in both the ancestral species and S. cerevisiae, 99.6% differ in gene distance. Based on these observations, we proposed that the rewired genetic interaction network drove the evolution of gene order, resulting in numerous chromosome rearrangement events (Kellis et al. 2004). When gene losses ceased, the rewiring of genetic interactions slowed, and the evolutionary force on gene distance also diminished. Consistently, synteny relationships are strongly conserved in the species of the Saccharomyces sensu stricto group (Kellis et al. 2003).

Our model predicted that the negative E–D correlation should be weaker if the gene order in S. cerevisiae has been unchanged since the WGD. The reason is that the gene order in the ancestor was not subject to the natural selection imposed by the genetic interaction network of the current S. cerevisiae genome. Furthermore, given the massive gene losses after the WGD, the genetic interaction network cannot be 100% conserved. To test this prediction, we calculated the gene distances in the reconstructed ancestral species mentioned earlier (Byrne and Wolfe 2005). Indeed, we found that the negative E–D correlation disappeared when the gene distances in S. cerevisiae were replaced by those in the ancestral species (fig. 5B; ρ = 6.9 × 10^−3, P = 0.26, N = 26630). This observation indicates that the negative E–D correlation in S. cerevisiae formed during the evolution of gene order after the WGD. Consistently, we observed that positively epistatic gene pairs decreased their distances, whereas negatively epistatic gene pairs increased their distances during evolution (fig. 5C; ρ = −0.17, P = 0.061, N = 127).

To further test the role of positive epistasis in the evolution of gene order, we identified genes that were ancestrally linked (i.e., D ≤ 100 in the reconstructed ancestor) and examined whether they moved toward or away from each other during evolution. Consistent with our model, genes with significant positive epistasis were more likely to move toward each other than genes without significant epistasis (fig. 5D; P = 1.5 × 10^−3, two-tailed Fisher's exact test), whereas the difference between gene pairs with significant negative epistasis and those without significant epistasis was not significant (fig. 5D; P = 0.41, two-tailed Fisher's exact test). Together, these observations support our hypothesis of epistasis-driven evolution of gene order in yeast.

Genetic Interaction Network Accurately Predicts Gene Order in Yeast

Finally, we determined whether the gene order in S. cerevisiae could be successfully predicted by the empirical data of genetic interaction networks (Costanzo et al. 2010, 2016). To this end, we identified 22 all-connected three-node motifs in

FIG. 5. The origin of the negative E–D correlation after the WGD in yeast. (A) Phylogenetic relationship among yeast species. The black arrow indicates the reconstructed ancestor, and the gray arrow indicates the WGD event. (B) Negative E–D correlation is not observed when the gene order in S. cerevisiae is replaced by that in the reconstructed ancestor. Gene pairs are separated into bins based on D, with an equal width of five genes. The mean and SEM of epistasis within each bin are shown. Spearman's correlation coefficient ρ and the corresponding P values were calculated from the raw data (N = 26630). The dashed line shows the average epistasis among unlinked genes. (C) The change in D (S. cerevisiae minus the reconstructed ancestor) is negatively correlated with the epistasis in S. cerevisiae (N = 127). (D) The D between a gene pair in S. cerevisiae is compared with that in the reconstructed ancestor. Proportions of gene pairs moving toward and away from each other among gene pairs with significant positive epistasis (left), nonsignificant epistasis (middle), and significant negative epistasis (right) are shown.


which all three genes are localized within a 100-gene range on a chromosome. An example is shown in figure 6A, in which VMA22 positively interacts with SRB2 and negatively interacts with AIM17, and the epistatic interaction between SRB2 and AIM17 is weak. All six possible gene orders are enumerated in figure 6B. Among them, the first gene order exhibits a perfect E–D anticorrelation (ρ = −1; fig. 6C), which is exactly the prediction of our model. In fact, it is also the real order of these genes on chromosome VIII. The second gene order successfully places positively epistatic genes close to each other and negatively epistatic genes far from each other. Therefore, it is generally consistent with our prediction (fig. 6C). Either gene order was considered as being successfully predicted by our model if it actually occurred in the yeast genome.
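The enumeration described above can be sketched in a few lines. Only the gene names and the signs of the interactions come from the text; the epistasis magnitudes, the three chromosomal slot positions, and the function names are hypothetical, and the slots are assumed to be unevenly spaced so that the three pairwise distances are distinct.

```python
from itertools import combinations, permutations

# Hypothetical epistasis values that mimic the signs described in the text.
EPISTASIS = {
    frozenset(("VMA22", "SRB2")): 0.08,    # positive
    frozenset(("VMA22", "AIM17")): -0.06,  # negative
    frozenset(("SRB2", "AIM17")): 0.01,    # weak
}
SLOTS = (0, 3, 10)  # hypothetical positions of the three slots, in gene units


def spearman_no_ties(x, y):
    """Spearman's rho via the rank-difference formula (valid without ties)."""
    n = len(x)
    rank_x = {v: r for r, v in enumerate(sorted(x))}
    rank_y = {v: r for r, v in enumerate(sorted(y))}
    d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def ed_correlation(order):
    """E-D correlation when the genes in `order` occupy SLOTS left to right."""
    position = dict(zip(order, SLOTS))
    e, d = [], []
    for g1, g2 in combinations(order, 2):
        e.append(EPISTASIS[frozenset((g1, g2))])
        d.append(abs(position[g1] - position[g2]))
    return spearman_no_ties(e, d)


genes = ("VMA22", "SRB2", "AIM17")
scores = {order: ed_correlation(order) for order in permutations(genes)}
best = min(scores, key=scores.get)
for order, rho in sorted(scores.items(), key=lambda kv: kv[1]):
    print(" - ".join(order), round(rho, 2))
```

With these toy numbers, exactly one of the six arrangements places the most positive pair at the shortest distance and the most negative pair at the longest, and that arrangement is the one with a correlation of −1.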

We examined the accuracy of our prediction at the genomic scale. We first divided these 22 motifs into two groups based on the magnitude of σ_epistasis: Group L contains 11 motifs with larger σ_epistasis, and Group S contains 11 motifs with smaller σ_epistasis (fig. 6D). We found that the gene orders of seven (out of 11) motifs in Group L were precisely predicted by our model, and the accuracy was significantly higher than the random expectation (17.6%; fig. 6E; P = 0.001, permutation test). By contrast, the predictive accuracy in Group S (2 out of 11) was not significantly different from the random expectation (fig. 6E; P = 0.607, permutation test), because low variation among epistasis values in a motif reduces the selective coefficients (figs. 2E and 3E). More broadly, the proportion of successful predictions (first and second gene orders in fig. 6B) is 100% in Group L, significantly higher than the random expectation (33.3%; fig. 6F; P < 0.001, permutation test). This high predictive power suggests that epistasis plays a vital role in driving the evolution of gene order. We further identified 243 and 1302 all-connected motifs within the range of 150 and 200 genes on the same chromosome, respectively, and again confirmed the predictive power of the genetic interaction network (supplementary fig. S8A–F, Supplementary Material online). Moreover, the predictive power of epistasis on gene order is independent of expression similarity because the latter could not accurately predict gene order (supplementary fig. S9A–I, Supplementary Material online).

We then determined whether gene order could be predicted when the genetic interaction network is incomplete. To this end, we identified 92 star-like motifs in which all genes are on the same chromosome and at least one gene has D ≤ 40 to the hub gene. For example, SFH1, which encodes a component of a chromatin remodeling complex, genetically interacts with 90 genes on the same chromosome (fig. 6G shows 20 genes with D < 100 to SFH1). Because the negative E–D correlation is mainly contributed by positively epistatic gene pairs in theory (figs. 1 and 2), our model predicted that genes having strong positive epistasis with SFH1 should be located close to it on the chromosome. Indeed, these genes (VRP1, FKS1, RPL26A, VPS38, DCR2) are located close to SFH1. Specifically, the gene (VRP1) having the strongest positive epistatic interaction with SFH1 was the closest gene to SFH1 on the chromosome (fig. 6H).

To examine the predictive accuracy at the genomic scale, we divided these 92 star-like motifs into two groups based on the difference between the top two highest epistasis values in the motif (Diff_epistasis; fig. 6J). We found that the proportion of successful predictions was 32.6% in Group L and 21.7% in Group S, both of which were significantly higher than random expectation (fig. 6K; P < 0.001 for Group L and P = 0.001 for Group S, permutation test). Furthermore, despite the lower predictive power, our model could still predict gene locations in 707 star-like motifs in which the closest gene to the hub was within D = 90 (supplementary fig. S8G and H, Supplementary Material online; P = 0.002 for Group L and P = 0.036 for Group S, permutation test).
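The star-motif prediction rule can be sketched as follows: predict that the partner with the highest epistasis value is the gene closest to the hub, then compare the outcome with a shuffled baseline. The function names, the epistasis and distance values, and the particular shuffling scheme are hypothetical; only the gene names come from the text, and the authors' permutation procedure may differ.

```python
import random


def predict_closest(epistasis_with_hub):
    """Predict that the partner with the highest epistasis is closest to the hub."""
    return max(epistasis_with_hub, key=epistasis_with_hub.get)


def is_success(epistasis_with_hub, distance_to_hub):
    """True if the predicted partner is the one actually closest to the hub."""
    closest = min(distance_to_hub, key=distance_to_hub.get)
    return predict_closest(epistasis_with_hub) == closest


def random_expectation(epistasis_with_hub, distance_to_hub, n_perm=1000, seed=0):
    """Success rate after shuffling which partner carries which distance."""
    rng = random.Random(seed)
    genes = list(distance_to_hub)
    dists = [distance_to_hub[g] for g in genes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(dists)
        hits += is_success(epistasis_with_hub, dict(zip(genes, dists)))
    return hits / n_perm


# Hypothetical values for five partners of a hub gene.
epistasis = {"VRP1": 0.20, "FKS1": 0.12, "RPL26A": 0.09, "VPS38": 0.07, "DCR2": 0.05}
distance = {"VRP1": 3, "FKS1": 11, "RPL26A": 18, "VPS38": 25, "DCR2": 40}

print(predict_closest(epistasis), is_success(epistasis, distance))
print(random_expectation(epistasis, distance))
```

With five partners, the shuffled baseline should hover around one in five, which gives a sense of why success rates of the size reported above can still be significantly higher than random expectation.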

Discussion

In our study, we provided results from both computational simulation and empirical data analysis to support the role of genetic interaction networks in driving the evolution of gene order. We performed simulations with different sets of parameters, some of which were from empirical data or within the range of empirical data (Schacherer et al. 2009; Costanzo et al. 2010). The negative E–D correlation was observed with all parameter sets (supplementary figs. S4–S6, Supplementary Material online), consistent with both theoretical predictions (fig. 1A) and empirical results (fig. 4).

The epistasis values were estimated mainly from null alleles (Costanzo et al. 2010); thus, it remains unknown whether our conclusion applies to other deleterious alleles. Fortunately, Costanzo et al. also estimated 7786453 pairwise epistasis values for either decreased abundance by mRNA perturbation (DAmP) alleles or temperature-sensitive (ts) alleles with mutations typically changing coding sequences, which offered us the opportunity to address this question (Costanzo et al. 2010, 2016). We found that the epistasis values of null alleles were significantly correlated with those of other deleterious alleles of the same gene (supplementary fig. S10A–L, Supplementary Material online), suggesting that the negative E–D correlation is likely a general phenomenon for various deleterious alleles. More importantly, because DNA deletion events that lead to null alleles are frequently observed in yeast natural populations (Schacherer et al. 2009), epistasis among null alleles by itself may be sufficiently strong to drive the evolution of gene distance.

In principle, if the frequencies of deleterious alleles in a population are significantly reduced by natural selection, the advantage of genetic linkage could be small for this population. However, numerous deleterious mutations (e.g., deletion of a whole gene, nonsense mutations, missense mutations, and mutations altering start codons) have been observed in natural populations (Liti et al. 2009; Schacherer et al. 2009), suggesting that the frequencies of deleterious alleles may not be low. This phenomenon is probably due to the antagonistic pleiotropic effects of a mutation in multiple environments (Qian et al. 2012).

It is worth noting that theoretical analysis predicts that epistasis can drive the evolution of recombination frequency rather than gene distance. In addition to gene distance, 1) the


lengths of intergenic regions and 2) recombination hot/cold spots could also affect recombination frequency. Nevertheless, a strong correlation between gene distance and physical distance, that is, the number of nucleotides between two genes, was observed (ρ = 0.998, P < 10^−100, N = 1624935), suggesting that the effect of (1) is negligible. Furthermore, physical distance is highly correlated with recombination frequency when the physical distance is < 180 kb (supplementary fig. S1C, Supplementary Material online; ρ = 0.88, P < 10^−100, N = 336227), suggesting that the effect of (2) is also limited. As a result, recombination frequency and gene distance are highly correlated (supplementary fig. S1B, Supplementary Material online; ρ = 0.87, P < 10^−100, N = 337892 for gene pairs with D ≤ 100). Because the recombination frequency was only measured for 58% of gene pairs in S. cerevisiae (Mancera et al. 2008) and was unknown for the reconstructed ancestral species, we used gene distance to approximate recombination frequency in this study. The optimization of gene distances among

FIG. 6. Gene orders can be predicted from the yeast genetic interaction network. (A) A three-node all-connected motif in the yeast genetic interaction network. The standard deviation of epistasis values (σ_epistasis) in this motif is 0.05. (B) Six possible gene orders are listed. The first is the observed gene order on chromosome VIII in yeast. (C) The E–D relationship for the six possible gene orders in (B). We consider the prediction successful if the first two gene orders are observed because they exhibit negative E–D correlations. In particular, the prediction is precise if the first gene order is observed. The dashed line represents E = 0. (D) A histogram of the standard deviation of epistasis values (σ_epistasis), based on which genes are divided into two groups with similar sizes, Group S and Group L. (E) The proportions of precise prediction of Group S, Group L, and random (R) expectations based on permutation. Error bars represent standard deviations among 1000 permutations. (F) The proportions of successful predictions, similar to (E). (G–I) A star-like motif in the yeast genetic interaction network and the gene order on chromosome XII in yeast. The hub gene SFH1 and 19 genes with the distance < 100 to SFH1 are shown. (J) A histogram of differences in the two largest epistasis values in the motif (Diff_epistasis), based on which genes are divided into two groups with similar sizes, Group S and Group L. (K) Proportion of successful predictions of the closest gene in Group S and Group L. Their respective random expectations based on permutation are shown in gray. Error bars represent standard deviations among 1000 permutations.


multiple gene pairs may often involve the rearrangement of genes on a chromosome.

It is also worth noting that synthetic genetic array (SGA), the experimental strategy used to generate double mutants, is based on recombination between two null alleles (Tong et al. 2001; Costanzo et al. 2010). Thus, double mutants for linked genes may have a smaller initial frequency, which may lead to inaccuracy in estimating fitness values for these mutants. To determine whether such potential experimental bias has any impact on our results, we calculated the partial E–D correlation after controlling for the double mutant fitness values. Again, we observed a strong negative E–D correlation (partial ρ = −0.17, P = 1.3 × 10^−9, N = 1254). This result was not unexpected because the fitness of double mutants and gene distance were only weakly correlated (ρ = 0.086, P = 0.002, N = 1254). More discussion on the caveats in the analysis of empirical data is included in supplementary note S2, Supplementary Material online.

In a recent study, 1800 genetic suppression interactions were identified in the budding yeast (van Leeuwen et al. 2016), which can be regarded as an extreme form of positive genetic interactions. Among them, nine gene pairs are tightly linked (D ≤ 2; supplementary table S3, Supplementary Material online). This number is significantly greater than the random expectation (permutation test, P = 0.01), again supporting our model.

The selection on clusters of locally adapted alleles, which sometimes were captured in chromosomal inversions, has been studied previously (Yeaman 2013; Kirkpatrick 2017). In our study, we proposed a hypothesis (the negative E–D correlation) on deleterious alleles and tested this hypothesis with analyses on empirical data and simulation, both in the context of genetic interaction networks. Therefore, our results provide additional clues for understanding the basic principles of genome organization. In particular, the origin and maintenance of clusters of genes in the same metabolic pathway, which have puzzled scientists for years (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011), could be well explained by genetic interactions. Genes in a linear metabolic pathway exhibit positive epistasis because 1) double mutants and single mutants have identical impacts on destroying the function of the pathway (He et al. 2010) and 2) the accumulation of deleterious intermediate products resulting from a loss-of-function mutation in a downstream gene of a metabolic pathway may be prevented by a loss-of-function mutation in an upstream gene (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011). Because genetic linkage is advantageous among genes with positive epistasis, the clustering of genes in the same linear metabolic pathway, such as genes in the galactose utilization pathway or allantoin degradation pathway, is favored by natural selection (Wong and Wolfe 2005; Slot and Rokas 2010).

Many factors have been reported to influence the evolution of gene order, such as tandem gene duplication, position effects on gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016), coordinated gene expression among neighboring genes (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004), and clustering of functionally related genes (Wong and Wolfe 2005; Slot and Rokas 2010), among many others. In this study, we provided evidence that, in addition to these factors, the genetic interaction network also played an important role in driving the evolution of gene order. Because the empirical data of epistasis (fitness values) are available in the budding yeast, an evolutionary simulation integrating many genetic interactions is possible, which makes epistasis unique among all factors driving the evolution of gene order. Based on the empirical data of the yeast genetic interaction network, our simulation indicates that the selective coefficient is on the order of 10^−7, suggesting that epistasis may play an important role in determining gene order in yeast, a species with a relatively large effective population size (~10^7). A negative E–D correlation is expected to be observed in species with a smaller effective population size only if the range of epistasis is larger than that in yeast (figs. 2 and 3). Because the genome-wide empirical data of epistasis are unavailable in species with a smaller effective population size (e.g., humans and flies), whether the genetic interaction network plays a role in the evolution of gene order in these species requires further investigation.
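As a rough check on this scale argument, one can use the common rule of thumb that selection outweighs genetic drift when the product of effective population size and selective coefficient is about 1 or larger. The exact threshold and constant factor depend on ploidy and on the population model, so this is an approximation under that assumption, not the authors' calculation.

```latex
N_e \, s \approx 10^{7} \times 10^{-7} = 1
```

For a hypothetical species with a 100-fold smaller effective population size, the same selective coefficient would give a product of about 0.01, which is consistent with the expectation that a larger range of epistasis would be needed there.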

Materials and Methods

Genomes

The genome annotation of S. cerevisiae was downloaded from the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org, version R64). The gene order of the reconstructed ancestor before the WGD (Gordon et al. 2009) was downloaded from the Yeast Gene Order Browser (YGOB) (Byrne and Wolfe 2005).

Simulation

To examine the fitness effect of gene distance (D), two-locus dynamics under selection were simulated for populations with different D (from 0 to 100 in 1-gene increments) as well as a series of epistasis values (E, from −0.004 to 0.004 in 0.001 increments) between two loci. The recombination frequency (R) between these two loci was estimated from D using the equation below:

$$R = D \times 0.004 + 0.064$$

which is the linear relationship estimated from the distance between two genes (D) and the empirical recombination frequencies (R) between them quantified in a previous study (Mancera et al. 2008). The simulation was performed in haploid organisms to align with the analyses of empirical data, in which epistasis values were quantified in haploid yeast (Costanzo et al. 2010, 2016). A and B are wild-type alleles of two di-allelic loci on the same chromosome; a and b are their deleterious alleles, with fitness values of ω_aB = ω_Ab = 0.992. The fitness of the wild type (ω_AB) was defined as 1, and the epistasis E was defined as ω_ab − ω_aB ω_Ab. The initial allele frequencies of a and b were both 0.1. The two loci were initially under linkage equilibrium, and therefore the frequencies of AB (X_AB), Ab (X_Ab), aB (X_aB), and ab (X_ab) were 0.81, 0.09, 0.09, and 0.01, respectively. After random mating, selection, and recombination, the frequencies of the four genotypes in the next generation were calculated with the following equations (Nei 1967):

$$X'_{AB} = \frac{X_{AB}\,\omega_{AB}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{Ab} = \frac{X_{Ab}\,\omega_{Ab}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{aB} = \frac{X_{aB}\,\omega_{aB}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{ab} = \frac{X_{ab}\,\omega_{ab}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

ω̄ is the average fitness of a population and was calculated as follows:

$$\bar{\omega} = X_{AB}\,\omega_{AB} + X_{ab}\,\omega_{ab} + X_{aB}\,\omega_{aB} + X_{Ab}\,\omega_{Ab}$$

For each population, the average fitness (ω̄) and the frequencies of deleterious alleles (X_a and X_b) were recorded in each generation of the simulation. The difference between the average fitness of two populations with different D (D = 50 and D = 0) reflects whether linkage is favored by natural selection. The frequency difference of the deleterious alleles between these two populations is related to the long-term effects of recombination. We also performed the simulation with a variety of parameter values (supplementary fig. S4, Supplementary Material online).
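The recursion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the code used in the study: the function names `recombination_frequency` and `simulate_two_locus` are hypothetical, and recording the mean fitness after each round of selection and recombination (so that the last entry plays the role of ω_100) is one reading of the text.

```python
def recombination_frequency(d):
    """Map gene distance D (number of genes) to recombination frequency R."""
    return d * 0.004 + 0.064


def simulate_two_locus(d, epistasis, w_single=0.992, q0=0.1, generations=100):
    """Deterministic haploid two-locus recursion (after Nei 1967).

    Returns one (mean_fitness, freq_a, freq_b) tuple per generation,
    measured after that generation's selection and recombination
    (an assumed bookkeeping convention).
    """
    r = recombination_frequency(d)
    # Genotype fitness values; epistasis E = w_ab - w_aB * w_Ab.
    w = {"AB": 1.0, "Ab": w_single, "aB": w_single,
         "ab": w_single * w_single + epistasis}
    # Linkage equilibrium at the start: 0.81, 0.09, 0.09, 0.01 for q0 = 0.1.
    p0 = 1.0 - q0
    x = {"AB": p0 * p0, "Ab": p0 * q0, "aB": q0 * p0, "ab": q0 * q0}

    history = []
    for _ in range(generations):
        w_bar = sum(x[g] * w[g] for g in x)
        # Fitness-weighted disequilibrium term shared by all four equations.
        diseq = (x["AB"] * w["AB"] * x["ab"] * w["ab"]
                 - x["aB"] * w["aB"] * x["Ab"] * w["Ab"])
        shift = r * diseq / w_bar ** 2
        x = {"AB": x["AB"] * w["AB"] / w_bar - shift,
             "Ab": x["Ab"] * w["Ab"] / w_bar + shift,
             "aB": x["aB"] * w["aB"] / w_bar + shift,
             "ab": x["ab"] * w["ab"] / w_bar - shift}
        mean_fitness = sum(x[g] * w[g] for g in x)
        # Allele a is carried by genotypes aB and ab; allele b by Ab and ab.
        history.append((mean_fitness, x["aB"] + x["ab"], x["Ab"] + x["ab"]))
    return history


if __name__ == "__main__":
    linked = simulate_two_locus(d=0, epistasis=0.004)
    distant = simulate_two_locus(d=50, epistasis=0.004)
    print("mean fitness at generation 100, D = 0: ", linked[-1][0])
    print("mean fitness at generation 100, D = 50:", distant[-1][0])
```

Comparing the last entries for D = 0 and D = 50 at a given E mirrors the comparison described in the text; the differences are small, so they should be inspected at full floating-point precision.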

The average fitness of a population at the 100th generation (ω_100) was used to infer the fitness effect of gene distance at a given epistasis value. For each epistasis value, we could identify an optimal gene distance (D_opt) as well as a series of permitted gene distances (D_permitted). The difference in ω_100 between D_permitted and D_opt was < 10⁻⁷. The mean of all D_permitted values was calculated. In addition, our simulation did not introduce new mutations to the population, and therefore the frequencies of deleterious alleles decreased over generations. Nevertheless, the negative E–D correlation was also observed at the 50th and 200th generations (supplementary fig. S3, Supplementary Material online).
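Given ω_100 for every simulated distance, D_opt and D_permitted follow from a simple scan. The sketch below assumes the values are held in a dictionary keyed by D; the function name `optimal_and_permitted` and the toy numbers are hypothetical, and ties for the maximum are resolved by taking the first distance encountered.

```python
def optimal_and_permitted(w100_by_distance, tolerance=1e-7):
    """Return (D_opt, sorted D_permitted, mean of D_permitted).

    w100_by_distance maps gene distance D to the mean fitness at
    generation 100. A distance is permitted when its fitness lies
    within `tolerance` of the optimum.
    """
    d_opt = max(w100_by_distance, key=w100_by_distance.get)
    w_opt = w100_by_distance[d_opt]
    permitted = sorted(d for d, w in w100_by_distance.items()
                       if w_opt - w < tolerance)
    return d_opt, permitted, sum(permitted) / len(permitted)


# Toy values for illustration only (not simulation output).
toy = {0: 0.99900000, 1: 0.99899995, 2: 0.99899000}
print(optimal_and_permitted(toy))
```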

Toy Motifs

We built a toy motif in a genetic interaction network to simulate the fitness effect of gene distance. In the star-like motif (fig. 2A), a hub gene (A) interacted with nine partners (genes B, C, D, E, F, G, H, I, and J) with nine epistasis values ranging from −0.004 to 0.004 in 0.001 increments. We attempted to place nine partner genes on the same side of the hub gene on a chromosome. Because we focused on the epistasis between the hub gene and the partners, placing the partners on different sides would not affect the results. We fixed the chromosomal location of the hub gene, and the partner genes were placed at D = 20, 30, 40, 50, 60, 70, 80, 90, and 100 from the hub gene. The locations of the nine partners were shuffled while keeping the epistasis values unchanged. The fitness of a gene order was defined as the average ω_100 for all nine hub-containing gene pairs. Thus, we could calculate the fitness for a total of (9! =) 362,880 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to test whether the simulation result was parameter sensitive (supplementary fig. S5, Supplementary Material online). Here, epistasis values and fitness defects were randomly chosen from the empirical data in previous studies (Costanzo et al. 2010, 2016).

For the all-connected motif (fig. 3A), five genes (A, B, C, D, and E) interacted with each other with ten epistasis values ranging from −0.0036 to 0.0036 in 0.0008 increments. The five genes were placed on the same chromosome, and the gene locations were set as 1, 19, 65, 93, and 102 so that the maximum D in the motif was 100. The locations of these five genes were shuffled, whereas the epistasis values were kept unchanged. Similar to the star-like motifs, the fitness of a gene order was defined as the average ω_100 for all ten gene pairs. The fitness was calculated for a total of (5! =) 120 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to determine whether the simulation result was parameter sensitive (supplementary fig. S6, Supplementary Material online).
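The enumeration over the 120 gene orders of the all-connected motif can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the two-locus recursion is re-implemented compactly so that the block is self-contained, the assignment of the ten epistasis values to edges is arbitrary (the assignment in fig. 3A is not given in the text), D is taken to be the number of intervening genes (so that locations 1 and 102 give D = 100), and all function names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations, permutations


@lru_cache(maxsize=None)
def w100(d, epistasis, w_single=0.992, q0=0.1, generations=100):
    """Mean fitness after `generations` rounds of the two-locus recursion."""
    r = d * 0.004 + 0.064
    w_ab = w_single * w_single + epistasis
    x_AB, x_Ab, x_aB, x_ab = (1 - q0) ** 2, (1 - q0) * q0, q0 * (1 - q0), q0 * q0
    for _ in range(generations):
        w_bar = x_AB + (x_Ab + x_aB) * w_single + x_ab * w_ab
        shift = r * (x_AB * x_ab * w_ab - x_aB * x_Ab * w_single ** 2) / w_bar ** 2
        x_AB = x_AB / w_bar - shift
        x_Ab = x_Ab * w_single / w_bar + shift
        x_aB = x_aB * w_single / w_bar + shift
        x_ab = x_ab * w_ab / w_bar - shift
    return x_AB + (x_Ab + x_aB) * w_single + x_ab * w_ab


def ranks(values):
    """Average 1-based ranks, with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation coefficient (Pearson correlation of ranks)."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


GENES = "ABCDE"
LOCATIONS = (1, 19, 65, 93, 102)
PAIRS = list(combinations(GENES, 2))
# Assumption: epistasis values are assigned to the ten edges in this arbitrary order.
EPISTASIS = {pair: round(-0.0036 + 0.0008 * k, 4) for k, pair in enumerate(PAIRS)}


def evaluate_orders():
    """Return (gene order, fitness, rho_ED) for every permutation of locations."""
    results = []
    for order in permutations(LOCATIONS):
        position = dict(zip(GENES, order))
        # Assumption: D counts the genes lying between the two genes.
        distances = [abs(position[a] - position[b]) - 1 for a, b in PAIRS]
        epistasis = [EPISTASIS[pair] for pair in PAIRS]
        fitness = sum(w100(d, e) for d, e in zip(distances, epistasis)) / len(PAIRS)
        results.append((order, fitness, spearman(epistasis, distances)))
    return results


results = evaluate_orders()
best = max(results, key=lambda item: item[1])
print(len(results), "gene orders; fittest order and its rho_ED:", best[0], best[2])
```

Caching `w100` on (D, E) keeps the run short because only a limited number of distinct distance and epistasis combinations occur across the 120 orders. The same loop would apply to the star-like motif, although enumerating all 362,880 orders takes correspondingly longer.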

Epistasis

Epistasis values were retrieved from the studies of Costanzo et al. (Costanzo et al. 2010, 2016). Because the epistasis values of the overlapping gene pairs in these studies were highly correlated (r = 0.71, P < 10⁻¹⁰⁰, Pearson's correlation, N = 2,604,539), we merged their epistasis values. A pair of genes was filtered if (a) the epistasis value between their null mutations was not a number (NaN) or (b) the epistasis values of the same null mutation pair had opposite signs in reciprocal crosses or in different studies. We also removed gene pairs in which the null mutation of at least one gene resulted in a higher fitness value than that of the wild type. This filtering was performed because only a small number of antagonistic pleiotropic genes were detected in rich media (Qian et al. 2012). Thus, the elevated fitness observed upon gene deletion in Costanzo et al. was likely due to inaccurate estimation of fitness. If epistasis between a pair of genes was examined multiple times, we used the epistasis value with the smallest P value. Following Costanzo et al. (2010), epistasis values with P < 0.05 were classified as significant.
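The filtering rules above can be expressed as a short routine. The sketch below is hypothetical in its details: the record layout (gene names, epistasis value, P value, and the two single-mutant fitness values) is invented for illustration and does not reflect the file format of the Costanzo et al. data sets, and a pair is assumed to be dropped entirely if any of its measurements is NaN.

```python
import math
from collections import defaultdict


def filter_epistasis(measurements, alpha=0.05):
    """Apply the pair-level filters and keep one epistasis value per gene pair.

    measurements: iterable of
        (gene_a, gene_b, epistasis, p_value, fitness_a, fitness_b)
    where the fitness values are single-mutant fitness relative to wild type (1.0).
    Returns {frozenset({gene_a, gene_b}): (epistasis, p_value, is_significant)}.
    """
    by_pair = defaultdict(list)
    for gene_a, gene_b, eps, p, fit_a, fit_b in measurements:
        # Reciprocal crosses (a x b and b x a) collapse onto the same key.
        by_pair[frozenset((gene_a, gene_b))].append((eps, p, fit_a, fit_b))

    kept = {}
    for pair, rows in by_pair.items():
        # (a) Any missing epistasis value removes the pair (assumed reading).
        if any(math.isnan(eps) for eps, _, _, _ in rows):
            continue
        # (b) Opposite signs across reciprocal crosses or studies remove the pair.
        signs = {eps > 0 for eps, _, _, _ in rows if eps != 0}
        if len(signs) > 1:
            continue
        # A single mutant fitter than wild type is treated as a measurement artifact.
        if any(fit_a > 1 or fit_b > 1 for _, _, fit_a, fit_b in rows):
            continue
        # Among repeated measurements, keep the one with the smallest P value.
        eps, p, _, _ = min(rows, key=lambda row: row[1])
        kept[pair] = (eps, p, p < alpha)
    return kept
```

For example, a pair measured twice with epistasis 0.02 (P = 0.01) and 0.03 (P = 0.20) would be retained with the value 0.02 and flagged as significant.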

In addition to the two data sets from Costanzo et al., several other studies have also estimated epistasis values using high-throughput strategies (Boone et al. 2007). In principle, we could have included all of them in this study. However, for practical reasons, these data sets were not suitable for our study. First, it would not be appropriate to combine other data sets with that of Costanzo et al. because epistasis values from multiple studies potentially have different sources of error, especially if they followed different protocols. Second, the sample sizes of epistasis data of linked genes from other studies were not sufficiently large to examine the E–D correlation. Because the data set from Costanzo et al. represents the largest available so far, containing 27 times more significant epistasis than the total of all other studies, we focused on these epistasis data in this study.

Recombination Frequency

Genotypes of meiosis products from 46 tetrads were downloaded from Mancera et al. (2008) at http://www.ebi.ac.uk/huber/recombination. A pair of DNA markers was filtered if the genotypes were available in less than half of the spores (< 92 spores). For each pair of markers, spores with parental genotypes (N_p) and nonparental genotypes (N_np) were counted. The recombination frequency of a pair of markers (R_m) was calculated as follows:

$$R_m = \frac{N_{np}}{N_p + N_{np}}$$

The recombination frequency between a pair of genes (R_g) was defined as the average recombination frequency of all marker pairs within the pair of genes, which was calculated as follows:

$$R_g = \frac{\sum_{i=1} \sum_{j=1} R_{m,ij}}{ij}$$

Subscripts i and j represent the ith and jth markers in each gene, respectively.
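The two definitions above translate directly into code. The sketch below assumes that each marker is represented as a list with one parental label per spore (`None` when the genotype is missing); this representation, the labels, and the function names are hypothetical and do not describe the format of the Mancera et al. files.

```python
def marker_pair_recombination(genotypes_1, genotypes_2, min_spores=92):
    """R_m = N_np / (N_p + N_np) for one pair of markers.

    Each argument lists one parental label per spore (None when missing).
    Returns None when fewer than `min_spores` spores are genotyped at
    both markers, i.e. when the marker pair is filtered out.
    """
    n_parental = n_nonparental = 0
    for g1, g2 in zip(genotypes_1, genotypes_2):
        if g1 is None or g2 is None:
            continue
        if g1 == g2:
            n_parental += 1      # both markers inherited from the same parent
        else:
            n_nonparental += 1   # markers inherited from different parents
    total = n_parental + n_nonparental
    if total < min_spores:
        return None
    return n_nonparental / total


def gene_pair_recombination(markers_gene_1, markers_gene_2, min_spores=92):
    """R_g: mean R_m over all retained marker pairs spanning the two genes."""
    values = [marker_pair_recombination(m1, m2, min_spores)
              for m1 in markers_gene_1 for m2 in markers_gene_2]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None
```

With 46 tetrads there are 184 spores, so the default `min_spores=92` corresponds to the half-of-spores threshold stated in the text; a smaller value is convenient for toy inputs.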

Identification of Duplicate Genes

All-against-all BlastP was performed to search for duplicate genes. Gene pairs with E values < 10⁻¹⁰ were defined as duplicate genes.

Expression Similarity

Expression profiles of 6,359 genes in 40 studies were compiled by a previous study (Kafri et al. 2005). Pearson's correlation coefficient for two genes was calculated within each study in the compiled data set, and expression similarity between these two genes was defined as the average correlation coefficient among studies.
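As a minimal sketch of this definition, the function below averages per-study Pearson correlations for two genes. It assumes each gene's data are stored as a dictionary mapping a study identifier to a list of expression values, with both genes measured under the same conditions within a study; this layout and the function names are hypothetical.

```python
def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def expression_similarity(profiles_a, profiles_b):
    """Average the within-study correlations over studies covering both genes."""
    shared = [study for study in profiles_a if study in profiles_b]
    correlations = [pearson(profiles_a[s], profiles_b[s]) for s in shared]
    return sum(correlations) / len(correlations)
```

Computing the correlation within each study before averaging avoids mixing expression scales across studies, which appears to be the motivation for the definition in the text.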

Other Yeast Functional Genomic Data

Protein–protein interaction data were obtained from the SGD. Three-dimensional chromatin colocalization data were retrieved from Duan et al. (2010). Genome-wide gene expression noise data were downloaded from Newman et al. (2006). A list of essential genes was downloaded from the Database of Essential Genes (DEG, version 10.6) (Luo et al. 2014). Semantic similarity of Gene Ontology (GO) terms was calculated using the R package GOSemSim (version 1.22.0) (Yu et al. 2010).

Shuffling of Gene Positions or Epistasis Values

To obtain the null distribution of E–D correlation coefficients, we shuffled gene positions while keeping the genome structure of S. cerevisiae unchanged (i.e., number of chromosomes and number of genes on each chromosome). The epistasis value of each gene pair was also kept unchanged. Gene distances were then calculated based on the new genomic locations.

To estimate the relative importance of different epistasis categories (positive and negative) in shaping the E–D correlation, we shuffled epistasis values of one category while keeping the other category unchanged. Gene positions were not changed during this process.
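Both shuffling schemes can be sketched with the standard library. The code below is illustrative only: chromosomes are assumed to be lists of gene names in chromosomal order, epistasis values are assumed to be stored in a dictionary keyed by gene pair, D is assumed to count intervening genes with linkage defined as D ≤ 100, and the function names are hypothetical.

```python
import random


def shuffle_gene_positions(chromosomes, rng=random):
    """Reassign genes to random positions, preserving genes per chromosome."""
    genes = [gene for chrom in chromosomes for gene in chrom]
    rng.shuffle(genes)
    shuffled, start = [], 0
    for chrom in chromosomes:
        shuffled.append(genes[start:start + len(chrom)])
        start += len(chrom)
    return shuffled


def linked_distances(chromosomes, pairs, max_distance=100):
    """Return D for each pair on the same chromosome with D <= max_distance."""
    location = {gene: (c, i)
                for c, chrom in enumerate(chromosomes)
                for i, gene in enumerate(chrom)}
    distances = {}
    for a, b in pairs:
        (chrom_a, index_a), (chrom_b, index_b) = location[a], location[b]
        if chrom_a == chrom_b:
            d = abs(index_a - index_b) - 1  # assumption: count intervening genes
            if d <= max_distance:
                distances[(a, b)] = d
    return distances


def shuffle_one_category(epistasis, positive=True, rng=random):
    """Shuffle values among pairs of one sign; the other sign is untouched.

    Pairs with epistasis exactly 0 are grouped with the negative category here.
    """
    keys = [pair for pair, value in epistasis.items() if (value > 0) == positive]
    values = [epistasis[pair] for pair in keys]
    rng.shuffle(values)
    shuffled = dict(epistasis)
    shuffled.update(zip(keys, values))
    return shuffled
```

Repeating `shuffle_gene_positions` followed by `linked_distances` and a rank correlation against the unchanged epistasis values, for example 1,000 times, would yield a null distribution of E–D correlation coefficients of the kind described above.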

Estimating the Predictive Accuracy of Gene Order

To examine the accuracy of our prediction of gene order, we estimated the success rate by chance (gray bars in fig. 6 and supplementary figs. S8 and S9, Supplementary Material online) by randomly assigning the positions of genes 1,000 times. For each permutation, the average proportion of successful predictions among motifs was estimated. The mean and standard deviation among the 1,000 permutations were then calculated.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Author Contributions

Y.-F.Y., W.C., and W.Q. conceived the research; Y.-F.Y., W.C., and W.Q. analyzed the data; and Y.-F.Y., S.W., and W.Q. wrote the article.

Acknowledgments

We thank Bin He and Weiwei Zhai for discussion. This work was supported by grants from the National Natural Science Foundation of China to W.Q. (91731302).

References

Batada NN, Hurst LD. 2007. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet. 39(8):945–949.

Boone C, Bussey H, Andrews BJ. 2007. Exploring genetic interactions and networks with yeast. Nat Rev Genet. 8(6):437–449.

Boutanaev AM, Kalmykova AI, Shevelyov YY, Nurminsky DI. 2002. Large clusters of co-expressed genes in the Drosophila genome. Nature 420(6916):666–669.

Byrne KP, Wolfe KH. 2005. The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 15(10):1456–1461.

Charlesworth B. 1990. Mutation-selection balance and the evolutionary advantage of sex and recombination. Genet Res. 55(3):199–221.

Charlesworth D, Charlesworth B. 2011. Mimicry: the hunting of the supergene. Curr Biol. 21(20):R846–R848.

Chen X, Zhang J. 2016. The genomic landscape of position effects on protein expression level and noise in yeast. Cell Syst. 2(5):347–354.

Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, et al. 1998. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2(1):65–73.

Cohen BA, Mitra RD, Hughes JD, Church GM. 2000. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet. 26(2):183–186.

Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, et al. 2010. The genetic landscape of a cell. Science 327(5964):425–431.

Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, et al. 2016. A global genetic interaction network maps a wiring diagram of cellular function. Science 353(6306):aaf1420.

Dean EJ, Davis JC, Davis RW, Petrov DA. 2008. Pervasive and persistent redundancy among duplicated genes in yeast. PLoS Genet. 4(7):e1000113.

DeLuna A, Vetsigian K, Shoresh N, Hegreness M, Colon-Gonzalez M, Chao S, Kishony R. 2008. Exposing the fitness contribution of duplicated genes. Nat Genet. 40(5):676–681.

Dixon SJ, Fedyshyn Y, Koh JL, Prasad TS, Chahwan C, Chua G, Toufighi K, Baryshnikova A, Hayles J, Hoe KL, et al. 2008. Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc Natl Acad Sci U S A. 105(43):16653–16658.

Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, Shendure J, Fields S, Blau CA, Noble WS. 2010. A three-dimensional model of the yeast genome. Nature 465(7296):363–367.

Eshel I, Feldman MW. 1970. On the evolutionary effect of recombination. Theor Popul Biol. 1(1):88–100.

Feldman MW, Christiansen FB, Brooks LD. 1980. Evolution of recombination in a constant environment. Proc Natl Acad Sci U S A. 77(8):4838–4841.

Ghanbarian AT, Hurst LD. 2015. Neighboring genes show correlated evolution in gene expression. Mol Biol Evol. 32(7):1748–1766.

Gordon JL, Byrne KP, Wolfe KH. 2009. Additions, losses, and rearrangements on the evolutionary route from a reconstructed ancestor to the modern Saccharomyces cerevisiae genome. PLoS Genet. 5(5):e1000485.

He X, Qian W, Wang Z, Li Y, Zhang J. 2010. Prevalent positive epistasis in Escherichia coli and Saccharomyces cerevisiae metabolic networks. Nat Genet. 42(3):272–276.

Hurst LD, Pal C, Lercher MJ. 2004. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 5(4):299–310.

Kafri R, Bar-Even A, Pilpel Y. 2005. Transcription control reprogramming in genetic backup circuits. Nat Genet. 37(3):295–299.

Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428(6983):617–624.

Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423(6937):241–254.

Kirkpatrick M. 2017. The evolution of genome structure by natural and sexual selection. J Hered. 108(1):3–11.

Kondrashov AS. 1982. Selection against harmful mutations in large sexual and asexual populations. Genet Res. 40(3):325–332.

Kondrashov AS. 1988. Deleterious mutations and the evolution of sexual reproduction. Nature 336(6198):435–440.

Kouyos RD, Silander OK, Bonhoeffer S. 2007. Epistasis between deleterious mutations and the evolution of recombination. Trends Ecol Evol. 22(6):308–315.

Lang GI, Botstein D. 2011. A test of the coordinated expression hypothesis for the origin and maintenance of the GAL cluster in yeast. PLoS One 6(9):e25290.

Lawrence J. 1999. Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev. 9(6):642–648.

Lawrence JG. 2002. Shared strategies in gene organization among prokaryotes and eukaryotes. Cell 110(4):407–413.

Lercher MJ, Urrutia AO, Hurst LD. 2002. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 31(2):180–183.

Liao BY, Zhang J. 2008. Coexpression of linked genes in Mammalian genomes is generally disadvantageous. Mol Biol Evol. 25(8):1555–1565.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458(7236):337–341.

Luo H, Lin Y, Gao F, Zhang CT, Zhang R. 2014. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res. 42(Database issue):D574–D580.

Mancera E, Bourgon R, Brozzi A, Huber W, Steinmetz LM. 2008. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454(7203):479–485.

Musso G, Costanzo M, Huangfu M, Smith AM, Paw J, San Luis BJ, Boone C, Giaever G, Nislow C, Emili A, et al. 2008. The extensive and condition-dependent nature of epistasis among whole-genome duplicates in yeast. Genome Res. 18(7):1092–1099.

Nei M. 1967. Modification of linkage intensity by natural selection. Genetics 57(3):625–641.

Nei M. 1969. Linkage modifications and sex difference in recombination. Genetics 63(3):681–699.

Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, Weissman JS. 2006. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441(7095):840–846.

Pal C, Hurst LD. 2003. Evidence for co-evolution of gene order and recombination rate. Nat Genet. 33(3):392–395.

Phillips PC. 2008. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 9(11):855–867.

Qian W, Liao BY, Chang AY, Zhang J. 2010. Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet. 26(10):425–430.

Qian W, Ma D, Xiao C, Wang Z, Zhang J. 2012. The genomic landscape and evolutionary resolution of antagonistic pleiotropy in yeast. Cell Rep. 2(5):1399–1410.

Qian W, Zhang J. 2008. Evolutionary dynamics of nematode operons: easy come, slow go. Genome Res. 18(3):412–421.

Qian W, Zhang J. 2014. Genomic evidence for adaptation by gene duplication. Genome Res. 24(8):1356–1362.

Schacherer J, Shapiro JA, Ruderfer DM, Kruglyak L. 2009. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458(7236):342–345.

Slot JC, Rokas A. 2010. Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi. Proc Natl Acad Sci U S A. 107(22):10136–10141.

Spellman PT, Rubin GM. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol. 1(1):5.

Tischler J, Lehner B, Chen N, Fraser AG. 2006. Combinatorial RNA interference in Caenorhabditis elegans reveals that redundancy between gene duplicates can be maintained for more than 80 million years of evolution. Genome Biol. 7(8):R69.

Tischler J, Lehner B, Fraser AG. 2008. Evolutionary plasticity of genetic interaction networks. Nat Genet. 40(4):390–391.

Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, et al. 2001. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294(5550):2364–2368.

van Leeuwen J, Pons C, Mellor JC, Yamaguchi TN, Friesen H, Koschwanez J, Usaj MM, Pechlaner M, Takar M, Usaj M, et al. 2016. Exploring genetic suppression interactions on a global scale. Science 354(6312):aag0839.

VanderSluis B, Bellay J, Musso G, Costanzo M, Papp B, Vizeacoumar FJ, Baryshnikova A, Andrews B, Boone C, Myers CL. 2010. Genetic interactions reveal the evolutionary trajectories of duplicate genes. Mol Syst Biol. 6:429.

Vavouri T, Semple JI, Lehner B. 2008. Widespread conservation of genetic redundancy during a billion years of eukaryotic evolution. Trends Genet. 24(10):485–488.

Wagner A. 2005. Energy constraints on the evolution of gene expression. Mol Biol Evol. 22(6):1365–1374.

Williams EJ, Bowles DJ. 2004. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 14(6):1060–1067.

Wong S, Wolfe KH. 2005. Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat Genet. 37(7):777–782.

Wu CI, Ting CT. 2004. Genes and speciation. Nat Rev Genet. 5(2):114–122.

Yeaman S. 2013. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proc Natl Acad Sci U S A. 110(19):E1743–E1751.

Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S. 2010. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26(7):976–978.

Zaslaver A, Baugh LR, Sternberg PW. 2011. Metazoan operons accelerate recovery from growth-arrested states. Cell 145(6):981–992.

epistatic genes generally had a larger reduction in fitness compared with those with changes exclusively to the negatively epistatic genes (fig. 2F).

Chromosomal Arrangement of Genes in All-Connected Motifs of Genetic Interaction Networks

We further examined the negative correlation between ω_100 and ρ_ED in all-connected motifs. To this end, we built a toy all-connected motif with five nodes and assigned epistasis values in the range of −0.0036 to 0.0036 to edges (fig. 3A). Again, we observed a strong negative correlation between ω_100 and ρ_ED (fig. 3B, ρ = −0.87, P = 3.2 × 10⁻³⁸). The gene order with the highest fitness had a strong negative E–D correlation (ρ_ED = −1.00, fig. 3C), whereas the gene order with the lowest fitness had a positive E–D correlation (ρ_ED = 0.39, fig. 3D). Similarly, we confirmed that the negative correlation between ω_100 and ρ_ED was insensitive to epistasis values, fitness defects, gene distances, and initial allele frequencies (supplementary fig. S6 and table S2, Supplementary Material online). Furthermore, we observed that the fitness of a gene order decreased with the increase in its d (fig. 3E), and this trend was stronger when the range of epistasis was larger (fig. 3E).

Negative E–D Correlation in S. cerevisiae

To investigate whether the negative E–D correlation is supported by empirical evidence, we retrieved the pairwise epistasis data generated by Costanzo et al., who systematically measured the vegetative growth rates of both single and double mutants in the budding yeast S. cerevisiae and estimated epistasis values for 26 million gene pairs (Costanzo et al. 2010, 2016). As expected, we observed a significant negative E–D correlation among linked genes (fig. 4A, ρ = −0.15, P = 4.0 × 10⁻⁸, N = 1,254). Consistent with this trend, unlinked genes on the same chromosome exhibited lower epistasis values (fig. 4A, gray dashed line). As a control, we permuted the gene orders and recalculated the correlation coefficients 1,000 times. We found that the negative E–D correlation disappeared after permutation (fig. 4B, P < 0.001, permutation test).

These observations are potentially attributable to a number of confounding factors. The first is mutational bias, such as tandem duplication. However, duplicate genes tend to have negative epistasis (Tischler et al. 2006; Dean et al. 2008; DeLuna et al. 2008; Musso et al. 2008; Vavouri et al. 2008; Qian et al. 2010), which should result in a positive E–D correlation. Nevertheless, we controlled for this mutational bias by randomly keeping only one gene in a gene family and still observed the negative E–D correlation (supplementary fig. S7A, Supplementary Material online, ρ = −0.17, P = 1.8 × 10⁻⁵, N = 641).

Second, genes with coordinated expression are clustered (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004). If coordinately expressed genes tend to have positive epistasis, the negative E–D correlation could result from these genes. To control for this effect, we first inferred expression pattern similarity for each pair of genes by calculating the correlation of gene expression levels in multiple conditions (Qian and Zhang 2014). We did not observe a significant correlation between epistasis and expression similarity (ρ = −0.015, P = 0.6, N = 1,202). Nevertheless, we divided these gene pairs into two groups according to expression similarity, recalculated the E–D correlation within each group, and still observed significant negative E–D correlations (supplementary fig. S7B and C, Supplementary Material online). Similar results were obtained when we calculated the partial E–D correlation after controlling for expression similarity (partial ρ = −0.14, P = 6.6 × 10⁻⁷, N = 1,202). In addition, we also found that coordinated gene expression occurring through 3D chromatin interactions did not confound our results (supplementary fig. S7D, Supplementary Material online, ρ = −0.17, P = 8.7 × 10⁻⁶, N = 704), which was not unexpected, as 3D chromatin interactions do not influence recombination frequency.
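The text does not state how the partial correlations were computed. One common choice, shown here purely as an assumption, is the first-order partial correlation formula applied to the pairwise (rank) correlation coefficients; the function name `partial_correlation` is hypothetical.

```python
def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    Takes the three pairwise correlation coefficients as input. Applying
    it to Spearman coefficients gives a partial rank correlation.
    """
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


# Illustration with x = epistasis, y = gene distance, z = expression similarity.
# The first two inputs echo coefficients reported in the text; the third
# (distance versus expression similarity) is a made-up placeholder.
print(partial_correlation(r_xy=-0.15, r_xz=-0.015, r_yz=-0.10))
```

When the control variable is nearly uncorrelated with epistasis, as reported above for expression similarity, the partial coefficient stays close to the raw E–D correlation, which is in line with the similar values reported in the text.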

Because functionally related genes are nonrandomly distributed on chromosomes (Wong and Wolfe 2005; Slot and Rokas 2010) and functional relationships between genes may lead to epistasis, we next examined whether functional relationships could confound the negative E–D correlation. We observed similar negative E–D correlations for gene pairs with either high or low semantic similarity of GO terms in molecular functions (supplementary fig. S7E and F, Supplementary Material online), biological processes (supplementary fig. S7G and H, Supplementary Material online), and cellular components (supplementary fig. S7I and J, Supplementary Material online). Again, we calculated partial correlations controlling for semantic similarity of GO terms in molecular functions (partial ρ = −0.15, P = 7.3 × 10⁻⁷, N = 1,151), biological processes (partial ρ = −0.15, P = 3.2 × 10⁻⁷, N = 1,151), and cellular components (partial ρ = −0.15, P = 3.2 × 10⁻⁷, N = 1,151). All these results indicate that the functional relationship is not a confounding factor in the negative E–D correlation, which is not unexpected given that a large fraction of genetic interactions do not reflect functional relationships (He et al. 2010; Costanzo et al. 2016).

FIG. 3. The negative E–D correlation in all-connected motifs of genetic interaction networks. (A) A toy model of an all-connected motif. (B) The fitness at the 100th generation (ω_100) is negatively correlated with ρ_ED. (C) The gene order with the highest ω_100; ρ_ED = −1.00. (D) The gene order with the lowest ω_100; ρ_ED = 0.39. (E) The heat map of the relative ω_100 (normalized to the highest ω_100) is shown. The relative ω_100 decreases with the increase of d. The reduction is more dramatic when the range of epistasis is larger.

Finally, we investigated the impact of gene expression noise on the negative E–D correlation because it has been proposed that essential genes were colocalized in open chromatin regions to reduce gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016). To control for this effect, we first calculated the average gene expression noise (Newman et al. 2006) for each gene pair. We used the distance of each coefficient of variation (CV) to a running median of CV values (DM) to quantify gene expression noise (Newman et al. 2006) in order to minimize the effect of gene expression magnitude on gene expression noise. We observed a negative E–D correlation for gene pairs with either high or low average DM (supplementary fig. S7K and L, Supplementary Material online). Again, we calculated the partial correlation controlling for gene expression noise and still observed a negative E–D correlation (partial ρ = −0.16, P = 9.7 × 10⁻³, N = 258).

Positive Epistasis Plays an Important Role in the Origin of the Negative E–D Correlation

Our model further predicted that the reduction of the distance between positively epistatic genes should play a more important role in the formation of the negative E–D correlation (figs. 1 and 2). Indeed, a significant negative E–D correlation was observed among positively epistatic gene pairs in S. cerevisiae (fig. 4C, ρ = −0.14, P = 0.0064, N = 391), whereas no significant correlation was observed among negatively epistatic gene pairs (fig. 4D, ρ = −0.025, P = 0.47, N = 863). We further verified the role of positive epistasis by shuffling epistasis values among all 391 positively epistatic gene pairs in S. cerevisiae. As expected, the E–D correlation was significantly weakened after the permutation (fig. 4E, P = 0.005, one-tailed permutation test). By contrast, no significant difference was observed after shuffling negative epistasis values (fig. 4F, P = 0.722, one-tailed permutation test), even though the latter analysis shuffled more gene pairs (N = 863).

FIG. 4. A negative E–D correlation is observed in the empirical genetic interaction network of the budding yeast S. cerevisiae, and positive epistasis plays a more important role in its formation. (A) A significant negative E–D correlation is observed in S. cerevisiae. Gene pairs are separated into bins based on D, with equal width of five genes. The mean value of epistasis and the standard error of the mean (SEM) within each bin are shown. Spearman's correlation coefficient ρ and corresponding P values were calculated from the raw data (N = 1,254). The gray dashed line shows the average epistasis among unlinked genes (D > 100). (B) Distribution of E–D correlation coefficients in 1,000 shuffled genomes. The arrow indicates the observed correlation coefficient in S. cerevisiae. (C and D) A significant negative E–D correlation is observed among positively epistatic gene pairs (N = 391) but not among negatively epistatic gene pairs (N = 863). Spearman's correlation coefficient ρ and the corresponding P values are calculated from the raw data. The dashed line shows the average epistasis among unlinked genes. (E) The distribution of correlation coefficients in 1,000 artificial genomes in which values of positive epistasis are shuffled. The arrow indicates the E–D correlation coefficient in reality. (F) Similar to (E); values of negative epistasis are shuffled. (G and H) The proportion of gene pairs with significant positive epistasis is significantly correlated with D, but that with significant negative epistasis is not. SEMs are estimated based on binomial distribution. The dashed lines show the proportion of gene pairs with significant positive or negative epistasis among unlinked genes.

In studies by Costanzo et al., epistasis was classified into three categories: significantly positive, significantly negative, and nonsignificant (Costanzo et al. 2010, 2016). A strong negative correlation was observed between gene distance and the proportion of gene pairs with significant positive epistasis (fig. 4G, ρ = −0.82, P = 5.5 × 10⁻⁵), whereas no significant correlation was observed between gene distance and the proportion of gene pairs with significant negative epistasis (fig. 4H, ρ = 0.43, P = 0.088). All these observations emphasize the important role of positive epistasis in the origin of the negative E–D correlation.

Epistasis-Driven Evolution of Gene Order after the Whole Genome Duplication in Yeast

Thus far, we have observed a negative E–D correlation and demonstrated that the correlation was mainly attributable to positively epistatic gene pairs. The aforementioned theories, simulations, and empirical evidence led us to propose a hypothesis of epistasis-driven evolution of gene order in S. cerevisiae. The whole genome duplication (WGD, fig. 5A) and the subsequent extensive gene losses substantially changed the epistatic relations among genes and markedly rewired the genetic interaction network in yeast (Kellis et al. 2004; Dixon et al. 2008; Tischler et al. 2008; VanderSluis et al. 2010). For instance, only 29% of the identified synthetic lethality is conserved between Schizosaccharomyces pombe and S. cerevisiae (Dixon et al. 2008). At the same time, extensive chromosome rearrangement events occurred. For example, in a reconstructed ancestral species before the WGD (Byrne and Wolfe 2005), we identified the gene pairs that were located on the same chromosome, and surprisingly, 84.6% of them are localized on different chromosomes in S. cerevisiae. More strikingly, among gene pairs that are localized on the same chromosome in both the ancestral species and S. cerevisiae, 99.6% differ in gene distance. Based on these observations, we proposed that the rewired genetic interaction network drove the evolution of gene order, resulting in numerous chromosome rearrangement events (Kellis et al. 2004). When gene losses ceased, the rewiring of genetic interactions slowed, and the evolutionary force on gene distance also diminished. Consistently, synteny relationships are strongly conserved in the species of the Saccharomyces sensu stricto group (Kellis et al. 2003).

Our model predicted that the negative E–D correlation should be weaker if the gene order in S. cerevisiae has been unchanged since the WGD. The reason is that the gene order in the ancestor was not subject to the natural selection imposed by the genetic interaction network of the current S. cerevisiae genome. Furthermore, given the massive gene losses after the WGD, the genetic interaction network cannot be 100% conserved. To test this prediction, we calculated the gene distances in the reconstructed ancestral species mentioned earlier (Byrne and Wolfe 2005). Indeed, we found that the negative E–D correlation disappeared when the gene distances in S. cerevisiae were replaced by those in the ancestral species (fig. 5B, ρ = 6.9 × 10⁻³, P = 0.26, N = 26,630). This observation indicates that the negative E–D correlation in S. cerevisiae formed during the evolution of gene order after the WGD. Consistently, we observed that positively epistatic gene pairs decreased their distances, whereas negatively epistatic gene pairs increased their distances during evolution (fig. 5C, ρ = −0.17, P = 0.061, N = 127).

To further test the role of positive epistasis in the evolution of gene order, we identified genes that were ancestrally linked (i.e., D ≤ 100 in the reconstructed ancestor) and examined whether they moved toward or away from each other during evolution. Consistent with our model, genes with significant positive epistasis were more likely to move toward each other than genes without significant epistasis (fig. 5D, P = 1.5 × 10⁻³, two-tailed Fisher's exact test), whereas the difference between gene pairs with significant negative epistasis and those without significant epistasis was not significant (fig. 5D, P = 0.41, two-tailed Fisher's exact test). Together, these observations support our hypothesis of epistasis-driven evolution of gene order in yeast.

Genetic Interaction Network Accurately Predicts Gene Order in Yeast

Finally, we determined whether the gene order in S. cerevisiae could be successfully predicted by the empirical data of genetic interaction networks (Costanzo et al. 2010, 2016). To this end, we identified 22 all-connected three-node motifs in

FIG. 5. The origin of the negative E–D correlation after the WGD in yeast. (A) Phylogenetic relationship among yeast species. The black arrow indicates the reconstructed ancestor, and the gray arrow indicates the WGD event. (B) Negative E–D correlation is not observed when the gene order in S. cerevisiae is replaced by that in the reconstructed ancestor. Gene pairs are separated into bins based on D, with equal width of five genes. The mean and SEM of epistasis within each bin are shown. Spearman's correlation coefficient ρ and the corresponding P values were calculated from the raw data (N = 26,630). The dashed line shows the average epistasis among unlinked genes. (C) The change in D (D in S. cerevisiae minus D in the reconstructed ancestor) is negatively correlated with the epistasis in S. cerevisiae (N = 127). (D) The D between a gene pair in S. cerevisiae is compared with that in the reconstructed ancestor. Proportions of gene pairs moving toward and away from each other among gene pairs with significant positive epistasis (left), nonsignificant epistasis (middle), and significant negative epistasis (right) are shown.

Yang et al doi101093molbevmsx264 MBE

3260Downloaded from httpsacademicoupcommbearticle-abstract341232544318638by Institute of Geneties and Developmental BiologyCAS useron 03 December 2017

which all three genes are localized within a 100-gene range on a chromosome. An example is shown in figure 6A, in which VMA22 positively interacts with SRB2 and negatively interacts with AIM17, and the epistatic interaction between SRB2 and AIM17 is weak. All six possible gene orders are enumerated in figure 6B. Among them, the first gene order exhibits a perfect E–D anticorrelation (ρ = −1, fig. 6C), which is exactly the prediction of our model. In fact, it is also the real order of these genes on chromosome VIII. The second gene order successfully places positively epistatic genes close to each other and negatively epistatic genes far from each other. Therefore, it is generally consistent with our prediction (fig. 6C). Either gene order was considered as being successfully predicted by our model if it actually occurred in the yeast genome.
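The enumeration of gene orders for a three-node motif can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the epistasis values and the three chromosomal slots are hypothetical placeholders (the text gives only the signs of the interactions), and Spearman's ρ is computed with the shortcut formula that assumes no ties. The sketch ranks the six orders by ρ and recovers the perfectly anticorrelated one; it does not attempt to reproduce the criterion by which the second order in figure 6B is counted as successful.

```python
from itertools import permutations

# Hypothetical epistasis values; only their signs and ordering follow the text.
EPISTASIS = {
    frozenset(("VMA22", "SRB2")): 0.10,    # positive interaction
    frozenset(("VMA22", "AIM17")): -0.08,  # negative interaction
    frozenset(("SRB2", "AIM17")): 0.01,    # weak interaction
}
SLOTS = (0, 10, 35)  # hypothetical gene positions, in units of genes


def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def spearman(xs, ys):
    """Spearman's rho via the no-ties shortcut formula."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def score_orders(genes=("VMA22", "SRB2", "AIM17")):
    """Return a sorted list of (rho, order) over all assignments of genes to slots."""
    results = []
    for order in permutations(genes):
        pos = dict(zip(order, SLOTS))
        e_vals, d_vals = [], []
        for pair, e in EPISTASIS.items():
            g1, g2 = tuple(pair)
            e_vals.append(e)
            d_vals.append(abs(pos[g1] - pos[g2]))
        results.append((spearman(e_vals, d_vals), order))
    return sorted(results)


if __name__ == "__main__":
    for rho, order in score_orders():
        print(f"{' - '.join(order)}: rho = {rho:+.1f}")
```

With these placeholder values, the order that places the positively interacting pair closest and the negatively interacting pair farthest apart sorts first.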

We examined the accuracy of our prediction at the genomic scale. We first divided these 22 motifs into two groups based on the magnitude of σ_epistasis: Group L contains 11 motifs with larger σ_epistasis, and Group S contains 11 motifs with smaller σ_epistasis (fig. 6D). We found that the gene orders of seven (out of 11) motifs in Group L were precisely predicted by our model, and the accuracy was significantly higher than the random expectation (17.6%, fig. 6E, P = 0.001, permutation test). By contrast, the predictive accuracy in Group S (2 out of 11) was not significantly different from the random expectation (fig. 6E, P = 0.607, permutation test), because low variation among epistasis values in a motif reduces the selective coefficients (figs. 2E and 3E). More broadly, the proportion of successful predictions (first and second gene orders in fig. 6B) is 100% in Group L, significantly higher than the random expectation (33.3%, fig. 6F, P < 0.001, permutation test). This high predictive power suggests that epistasis plays a vital role in driving the evolution of gene order. We further identified 243 and 1,302 all-connected motifs within the range of 150 and 200 genes on the same chromosome, respectively, and again confirmed the predictive power of the genetic interaction network (supplementary fig. S8A–F, Supplementary Material online). Moreover, the predictive power of epistasis on gene order is independent of expression similarity because the latter could not accurately predict gene order (supplementary fig. S9A–I, Supplementary Material online).

We then determined whether gene order could be predicted when the genetic interaction network is incomplete. To this end, we identified 92 star-like motifs in which all genes are on the same chromosome and at least one gene lies at D ≤ 40 from the hub gene. For example, SFH1, which encodes a component of a chromatin remodeling complex, genetically interacts with 90 genes on the same chromosome (fig. 6G shows 20 genes with D < 100 to SFH1). Because the negative E–D correlation is mainly contributed by positively epistatic gene pairs in theory (figs. 1 and 2), our model predicted that genes having strong positive epistasis with SFH1 should be located close to it on the chromosome. Indeed, these genes (VRP1, FKS1, RPL26A, VPS38, DCR2) are located close to SFH1. Specifically, the gene (VRP1) having the strongest positive epistatic interaction with SFH1 was the closest gene to SFH1 on the chromosome (fig. 6H).

To examine the predictive accuracy at the genomic scale, we divided these 92 star-like motifs into two groups based on the difference between the two highest epistasis values in the motif (Diff_epistasis, fig. 6J). We found that the proportion of successful predictions was 32.6% in Group L and 21.7% in Group S, both of which were significantly higher than random expectation (fig. 6K, P < 0.001 for Group L and P = 0.001 for Group S, permutation test). Furthermore, despite the lower predictive power, our model could still predict gene locations in 707 star-like motifs in which the closest gene to the hub was within D = 90 (supplementary fig. S8G and H, Supplementary Material online, P = 0.002 for Group L and P = 0.036 for Group S, permutation test).

Discussion

In our study, we provided results from both computational simulation and empirical data analysis to support the role of genetic interaction networks in driving the evolution of gene order. We performed simulations with different sets of parameters, some of which were from empirical data or within the range of empirical data (Schacherer et al. 2009; Costanzo et al. 2010). The negative E–D correlation was observed with all parameter sets (supplementary figs. S4–S6, Supplementary Material online), consistent with both theoretical predictions (fig. 1A) and empirical results (fig. 4).

The epistasis values were estimated mainly from null alleles (Costanzo et al. 2010); thus, it remains unknown whether our conclusion applies to other deleterious alleles. Fortunately, Costanzo et al. also estimated 7,786,453 pairwise epistasis values for either decreased abundance by mRNA perturbation (DAmP) alleles or temperature-sensitive (ts) alleles with mutations typically changing coding sequences, which offered us the opportunity to address this question (Costanzo et al. 2010, 2016). We found that the epistasis values of null alleles were significantly correlated with those of other deleterious alleles of the same gene (supplementary fig. S10A–L, Supplementary Material online), suggesting that the negative E–D correlation is likely a general phenomenon for various deleterious alleles. More importantly, because DNA deletion events that lead to null alleles are frequently observed in yeast natural populations (Schacherer et al. 2009), epistasis among null alleles by itself may be sufficiently strong to drive the evolution of gene distance.

In principle, if the frequencies of deleterious alleles in a population are significantly reduced by natural selection, the advantage of genetic linkage could be small for this population. However, numerous deleterious mutations (e.g., deletion of a whole gene, nonsense mutations, missense mutations, and mutations altering start codons) have been observed in natural populations (Liti et al. 2009; Schacherer et al. 2009), suggesting that the frequencies of deleterious alleles may not be low. This phenomenon is probably due to the antagonistic pleiotropic effects of a mutation in multiple environments (Qian et al. 2012).

It is worth noting that theoretical analysis predicts that epistasis can drive the evolution of recombination frequency rather than gene distance. In addition to gene distance, 1) the lengths of intergenic regions and 2) recombination hot/cold spots could also affect recombination frequency. Nevertheless, a strong correlation between gene distance and physical distance, that is, the number of nucleotides between two genes, was observed (ρ = 0.998, P < 10⁻¹⁰⁰, N = 1,624,935), suggesting that the effect of (1) is negligible. Furthermore, physical distance is highly correlated with recombination frequency when the physical distance is < 180 kb (supplementary fig. S1C, Supplementary Material online, ρ = 0.88, P < 10⁻¹⁰⁰, N = 336,227), suggesting that the effect of (2) is also limited. As a result, recombination frequency and gene distance are highly correlated (supplementary fig. S1B, Supplementary Material online, ρ = 0.87, P < 10⁻¹⁰⁰, N = 337,892 for gene pairs with D ≤ 100). Because the recombination frequency was only measured for 58% of gene pairs in S. cerevisiae (Mancera et al. 2008) and was unknown for the reconstructed ancestral species, we used gene distance to approximate recombination frequency in this study. The optimization of gene distances among

FIG. 6. Gene orders can be predicted from the yeast genetic interaction network. (A) A three-node all-connected motif in the yeast genetic interaction network. The standard deviation of epistasis values (σ_epistasis) in this motif is 0.05. (B) Six possible gene orders are listed. The first is the observed gene order on chromosome VIII in yeast. (C) The E–D relationship for the six possible gene orders in (B). We consider the prediction successful if the first two gene orders are observed because they exhibit negative E–D correlations. In particular, the prediction is precise if the first gene order is observed. The dashed line represents E = 0. (D) A histogram of the standard deviation of epistasis values (σ_epistasis), based on which genes are divided into two groups with similar sizes, Group S and Group L. (E) The proportions of precise prediction of Group S, Group L, and random (R) expectations based on permutation. Error bars represent standard deviations among 1,000 permutations. (F) The proportions of successful predictions, similar to (E). (G–I) A star-like motif in the yeast genetic interaction network and the gene order on chromosome XII in yeast. The hub gene SFH1 and 19 genes with the distance < 100 to SFH1 are shown. (J) A histogram of differences in the two largest epistasis values in the motif (Diff_epistasis), based on which genes are divided into two groups with similar sizes, Group S and Group L. (K) Proportion of successful predictions of the closest gene in Group S and Group L. Their respective random expectations based on permutation are shown in gray. Error bars represent standard deviations among 1,000 permutations.


multiple gene pairs may often involve the rearrangement of genes on a chromosome.

It is also worth noting that synthetic genetic array (SGA), the experimental strategy used to generate double mutants, is based on recombination between two null alleles (Tong et al. 2001; Costanzo et al. 2010). Thus, double mutants for linked genes may have a smaller initial frequency, which may lead to inaccuracy in estimating fitness values for these mutants. To determine whether such potential experimental bias has any impact on our results, we calculated the partial E–D correlation after controlling for the double mutant fitness values. Again, we observed a strong negative E–D correlation (partial ρ = −0.17, P = 1.3 × 10⁻⁹, N = 1,254). This result was not unexpected because the fitness of double mutants and gene distance were only weakly correlated (ρ = 0.086, P = 0.002, N = 1,254). More discussion on the caveats in the analysis of empirical data is included in supplementary note S2, Supplementary Material online.

In a recent study, 1,800 genetic suppression interactions were identified in the budding yeast (van Leeuwen et al. 2016), which can be regarded as an extreme form of positive genetic interactions. Among them, nine gene pairs are tightly linked (D ≤ 2, supplementary table S3, Supplementary Material online). This number is significantly greater than the random expectation (permutation test, P = 0.01), again supporting our model.

The selection on clusters of locally adapted alleles, which sometimes were captured in chromosomal inversions, has been studied previously (Yeaman 2013; Kirkpatrick 2017). In our study, we proposed a hypothesis (the negative E–D correlation) on deleterious alleles and tested this hypothesis with analyses on empirical data and simulation, both in the context of genetic interaction networks. Therefore, our results provide additional clues for understanding the basic principles of genome organization. In particular, the origin and maintenance of clusters of genes in the same metabolic pathway, which have puzzled scientists for years (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011), could be well explained by genetic interactions. Genes in a linear metabolic pathway exhibit positive epistasis because 1) double mutants and single mutants have identical impacts on destroying the function of the pathway (He et al. 2010), and 2) the accumulation of deleterious intermediate products resulting from a loss-of-function mutation in a downstream gene of a metabolic pathway may be prevented by a loss-of-function mutation in an upstream gene (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011). Because genetic linkage is advantageous among genes with positive epistasis, the clustering of genes in the same linear metabolic pathway, such as genes in the galactose utilization pathway or allantoin degradation pathway, is favored by natural selection (Wong and Wolfe 2005; Slot and Rokas 2010).

Many factors have been reported to influence the evolution of gene order, such as tandem gene duplication, position effects on gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016), coordinated gene expression among neighboring genes (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004), and clustering of functionally related genes (Wong and Wolfe 2005; Slot and Rokas 2010), among many others. In this study, we provided evidence that, in addition to these factors, the genetic interaction network also played an important role in driving the evolution of gene order. Because the empirical data of epistasis (fitness values) are available in the budding yeast, an evolutionary simulation integrating many genetic interactions is possible, which makes epistasis unique among all factors driving the evolution of gene order. Based on the empirical data of the yeast genetic interaction network, our simulation indicates that the selective coefficient is on the order of 10⁻⁷, suggesting that epistasis may play an important role in determining gene order in yeast, a species with a relatively large effective population size (10⁷). A negative E–D correlation is expected to be observed in species with a smaller effective population size only if the range of epistasis is larger than that in yeast (figs. 2 and 3). Because the genome-wide empirical data of epistasis are unavailable in species with a smaller effective population size (e.g., humans and flies), whether the genetic interaction network plays a role in the evolution of gene order in these species requires further investigation.

Materials and Methods

Genomes

The genome annotation of S. cerevisiae was downloaded from the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org, version R64). The gene order of the reconstructed ancestor before the WGD (Gordon et al. 2009) was downloaded from the Yeast Gene Order Browser (YGOB) (Byrne and Wolfe 2005).

Simulation

To examine the fitness effect of gene distance (D), two-locus dynamics under selection were simulated for populations with different D (from 0 to 100 in 1-gene increments) as well as a series of epistasis values (E, from −0.004 to 0.004 in 0.001 increments) between two loci. The recombination frequency (R) between these two loci was estimated from D using the equation below:

$$R = D \times 0.004 + 0.064$$

which is the linear relationship estimated from the distance between two genes (D) and the empirical recombination frequencies (R) between them quantified in a previous study (Mancera et al. 2008). The simulation was performed in haploid organisms in order to be in alignment with the analyses of empirical data, in which epistasis values were quantified in haploid yeast (Costanzo et al. 2010, 2016). A and B are wild-type alleles of two di-allelic loci on the same chromosome; a and b are their deleterious alleles, with fitness values of $\omega_{aB} = \omega_{Ab} = 0.992$. The fitness of the wild type ($\omega_{AB}$) was defined as 1, and the epistasis E was defined as $\omega_{ab} - \omega_{aB}\,\omega_{Ab}$. The initial allele frequencies of a and b were both 0.1. The two loci were initially under linkage equilibrium, and therefore, the frequencies of AB ($X_{AB}$), Ab ($X_{Ab}$), aB ($X_{aB}$), and ab ($X_{ab}$) were 0.81, 0.09, 0.09, and 0.01, respectively. After random mating, selection, and recombination, the frequencies of the four genotypes in the next generation were calculated with the following equations (Nei 1967):

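$$X'_{AB} = \frac{X_{AB}\,\omega_{AB}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{Ab} = \frac{X_{Ab}\,\omega_{Ab}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{aB} = \frac{X_{aB}\,\omega_{aB}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{ab} = \frac{X_{ab}\,\omega_{ab}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$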

$\bar{\omega}$ is the average fitness of a population and was calculated as follows:

$$\bar{\omega} = X_{AB}\,\omega_{AB} + X_{ab}\,\omega_{ab} + X_{aB}\,\omega_{aB} + X_{Ab}\,\omega_{Ab}$$

For each population, the average fitness ($\bar{\omega}$) and the frequencies of deleterious alleles ($X_a$ and $X_b$) were recorded in each generation of the simulation. The difference between the average fitness of two populations with different D (D = 50 and D = 0) reflects whether linkage is favored by natural selection or not. The frequency difference of the deleterious alleles between these two populations is related to the long-term effects of recombination. We also performed the simulation with a variety of parameter values (supplementary fig. S4, Supplementary Material online).

The average fitness of a population at the 100th generation ($\bar{\omega}_{100}$) was used to infer the fitness effect of gene distance at a given epistasis value. For each epistasis value, we could identify an optimal gene distance ($D_{opt}$) as well as a series of permitted gene distances ($D_{permitted}$). The difference in $\bar{\omega}_{100}$ between $D_{permitted}$ and $D_{opt}$ was < 10⁻⁷. The mean of all $D_{permitted}$ values was calculated. In addition, our simulation did not introduce new mutations to the population, and therefore, the frequencies of deleterious alleles reduced over generations. Nevertheless, the negative E–D correlation was also observed at the 50th and 200th generation (supplementary fig. S3, Supplementary Material online).
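The two-locus recursion and the search for the optimal and permitted gene distances can be sketched as below. This is a minimal illustration under stated assumptions, not the simulation code used in the study: the function names are hypothetical, the mean fitness "at the 100th generation" is taken to be that of the population obtained after 100 rounds of the recursion, and the default parameters are the ones quoted in the text (single-mutant fitness 0.992, initial deleterious allele frequency 0.1, R = D × 0.004 + 0.064).

```python
def recombination_rate(d):
    """Linear D-to-R relationship quoted in the text."""
    return d * 0.004 + 0.064


def mean_fitness_trajectory(d, e, generations=100, w_single=0.992, q0=0.1):
    """Iterate the haploid two-locus recursion.

    Returns the mean fitness after each generation (index 0 = 1st
    generation) and the final genotype frequencies.
    """
    r = recombination_rate(d)
    w = {"AB": 1.0, "Ab": w_single, "aB": w_single,
         "ab": w_single * w_single + e}  # from E = w_ab - w_aB * w_Ab
    p0 = 1.0 - q0
    # Linkage equilibrium: genotype frequency = product of allele frequencies.
    x = {"AB": p0 * p0, "Ab": p0 * q0, "aB": q0 * p0, "ab": q0 * q0}
    trajectory = []
    for _ in range(generations):
        w_bar = sum(x[g] * w[g] for g in x)
        shift = r * (x["AB"] * w["AB"] * x["ab"] * w["ab"]
                     - x["aB"] * w["aB"] * x["Ab"] * w["Ab"]) / w_bar ** 2
        x = {"AB": x["AB"] * w["AB"] / w_bar - shift,
             "Ab": x["Ab"] * w["Ab"] / w_bar + shift,
             "aB": x["aB"] * w["aB"] / w_bar + shift,
             "ab": x["ab"] * w["ab"] / w_bar - shift}
        trajectory.append(sum(x[g] * w[g] for g in x))
    return trajectory, x


def optimal_and_permitted(e, distances=range(101), tol=1e-7):
    """Return D_opt and all D whose final mean fitness is within tol of it."""
    w100 = {d: mean_fitness_trajectory(d, e)[0][-1] for d in distances}
    d_opt = max(w100, key=w100.get)
    permitted = [d for d in distances if w100[d_opt] - w100[d] < tol]
    return d_opt, permitted


if __name__ == "__main__":
    for e in (-0.004, 0.0, 0.004):
        d_opt, permitted = optimal_and_permitted(e)
        mean_permitted = sum(permitted) / len(permitted)
        print(f"E = {e:+.3f}: D_opt = {d_opt}, mean D_permitted = {mean_permitted:.1f}")
```

When E = 0 and the loci start at linkage equilibrium, the bracketed term in the recursion is zero, so gene distance has no effect on mean fitness and every distance is permitted.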

Toy Motifs

We built a toy motif in a genetic interaction network to simulate the fitness effect of gene distance. In the star-like motif (fig. 2A), a hub gene (A) interacted with nine partners (genes B, C, D, E, F, G, H, I, and J), with nine epistasis values ranging from −0.004 to 0.004 in 0.001 increments. We attempted to place nine partner genes on the same side of the hub gene on a chromosome. Because we focused on the epistasis between the hub gene and the partners, placing the partners on different sides would not affect the results. We fixed the chromosomal location of the hub gene, and the partner genes were placed at D = 20, 30, 40, 50, 60, 70, 80, 90, and 100 from the hub gene. The locations of the nine partners were shuffled while keeping the epistasis values unchanged. The fitness of a gene order was defined as the average $\bar{\omega}_{100}$ for all nine hub-containing gene pairs. Thus, we could calculate the fitness for a total of (9! =) 362,880 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to test whether the simulation result was parameter sensitive (supplementary fig. S5, Supplementary Material online). Here, epistasis values and fitness defects were randomly chosen from the empirical data in previous studies (Costanzo et al. 2010, 2016).

For the all-connected motif (fig. 3A), five genes (A, B, C, D, and E) interacted with each other, with ten epistasis values ranging from −0.0036 to 0.0036 in 0.0008 increments. The five genes were placed on the same chromosome, and the gene locations were set as 1, 19, 65, 93, and 102, so that the maximum D in the motif was 100. The locations of these five genes were shuffled, whereas the epistasis values were kept unchanged. Similar to the star-like motifs, the fitness of a gene order was defined as the average $\bar{\omega}_{100}$ for all ten gene pairs. The fitness was calculated for a total of (5! =) 120 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to determine whether the simulation result was parameter sensitive (supplementary fig. S6, Supplementary Material online).

Epistasis

Epistasis values were retrieved from the studies of Costanzo et al. (Costanzo et al. 2010, 2016). Because the epistasis values of the overlapping gene pairs in these studies were highly correlated (r = 0.71, P < 10⁻¹⁰⁰, Pearson's correlation, N = 2,604,539), we merged their epistasis values. A pair of genes was filtered out if (a) the epistasis value between their null mutations was not a number (NaN), or (b) the epistasis values of the same null mutation pair had opposite signs in reciprocal crosses or in different studies. We also removed the gene pairs in which the null mutation in at least one gene resulted in a higher fitness value than that of the wild type. This filtering was performed because only a small number of antagonistic pleiotropic genes were detected in rich media (Qian et al. 2012). Thus, the elevated fitness observed upon gene deletion in Costanzo et al. was likely due to inaccurate estimation of fitness. If epistasis between a pair of genes was examined multiple times, we used the epistasis value with the smallest P value. Following Costanzo et al. (2010), epistasis values with P < 0.05 were classified as significant.

In addition to the two data sets from Costanzo et al., several other studies have also estimated epistasis values using high-throughput strategies (Boone et al. 2007). In principle, we could have included all of them in this study. However, for practical reasons, these data sets were not suitable for our study. First, it would not be appropriate to combine other data sets with that of Costanzo et al. because epistasis values from multiple studies potentially have different sources of error, especially if they followed different protocols. Second, the sample sizes of epistasis data of linked genes from other studies were not sufficiently large to examine the E–D correlation. Because the data set from Costanzo et al. represents the largest available so far, containing 27 times more significant epistasis than the total of all other studies, we focused on these epistasis data in this study.

Recombination Frequency

Genotypes of meiosis products from 46 tetrads were downloaded from Mancera et al. (2008) at http://www.ebi.ac.uk/huber/recombination. A pair of DNA markers was filtered out if the genotypes were available in less than half of the spores (< 92 spores). For each pair of markers, spores with parental genotypes ($N_p$) and nonparental genotypes ($N_{np}$) were counted. The recombination frequency of a pair of markers ($R_m$) was calculated as follows:

$$R_m = \frac{N_{np}}{N_p + N_{np}}$$

The recombination frequency between a pair of genes ($R_g$) was defined as the average recombination frequency of all marker pairs within the pair of genes, which was calculated as follows:

$$R_g = \frac{1}{I\,J}\sum_{i=1}^{I}\sum_{j=1}^{J} R_{m,ij}$$

Subscripts i and j represent the ith and jth markers in each gene, respectively, and I and J denote the numbers of markers in the two genes.
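The two recombination-frequency definitions above can be sketched as follows. This is an illustrative reading rather than the original pipeline: the function names and the encoding of genotypes (one parental label per spore, `None` when a spore is not genotyped) are assumptions, and the filter is applied here to the number of spores genotyped at both markers of a pair.

```python
def marker_pair_rm(geno_1, geno_2, min_spores=92):
    """Recombination frequency R_m for one pair of markers.

    geno_1, geno_2: per-spore parental origin at each marker
    (e.g. 'a' or 'b'; None = not genotyped). Returns None when fewer
    than min_spores spores are genotyped at both markers.
    """
    n_p = n_np = 0
    for g1, g2 in zip(geno_1, geno_2):
        if g1 is None or g2 is None:
            continue
        if g1 == g2:
            n_p += 1   # parental combination
        else:
            n_np += 1  # nonparental (recombinant) combination
    if n_p + n_np < min_spores:
        return None
    return n_np / (n_p + n_np)


def gene_pair_rg(markers_gene_1, markers_gene_2, min_spores=92):
    """Average R_m over all usable marker pairs spanning two genes."""
    values = []
    for m1 in markers_gene_1:
        for m2 in markers_gene_2:
            rm = marker_pair_rm(m1, m2, min_spores)
            if rm is not None:
                values.append(rm)
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # Toy data with four spores; min_spores is lowered accordingly.
    m1 = ["a", "a", "b", "b"]
    m2 = ["a", "b", "b", "b"]
    print(marker_pair_rm(m1, m2, min_spores=4))
    print(gene_pair_rg([m1], [m2, m1], min_spores=4))
```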

Identification of Duplicate Genes

All-against-all BlastP was performed to search for duplicate genes. Gene pairs with E values < 10⁻¹⁰ were defined as duplicate genes.

Expression Similarity

Expression profiles of 6,359 genes in 40 studies were compiled by a previous study (Kafri et al. 2005). Pearson's correlation coefficient for two genes was calculated within each study in the compiled data set, and expression similarity between these two genes was defined as the average correlation coefficient among studies.
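The definition of expression similarity can be illustrated with a short sketch. The data layout (one dictionary per gene, mapping a study name to that gene's expression values in the study) and the function names are assumptions made for this example; the compiled data set itself is not reproduced here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def expression_similarity(profiles_a, profiles_b):
    """Average within-study Pearson correlation between two genes.

    profiles_a, profiles_b: dict mapping study name -> list of expression
    values measured for the gene in that study (same condition order).
    """
    shared = [s for s in profiles_a if s in profiles_b]
    rs = [pearson(profiles_a[s], profiles_b[s]) for s in shared]
    return sum(rs) / len(rs)


if __name__ == "__main__":
    # Hypothetical profiles for two genes in two studies.
    gene_a = {"study1": [1.0, 2.0, 3.0], "study2": [1.0, 2.0, 3.0]}
    gene_b = {"study1": [2.0, 4.0, 6.0], "study2": [3.0, 2.0, 1.0]}
    print(expression_similarity(gene_a, gene_b))
```

Averaging the per-study coefficients, rather than pooling all conditions into one correlation, keeps differences in scale between studies from dominating the result.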

Other Yeast Functional Genomic Data

Protein–protein interaction data were obtained from the SGD. Three-dimensional chromatin colocalization data were retrieved from Duan et al. (2010). Genome-wide gene expression noise data were downloaded from Newman et al. (2006). A list of essential genes was downloaded from the Database of Essential Genes (DEG, version 10.6) (Luo et al. 2014). Semantic similarity of Gene Ontology (GO) terms was calculated using the R package GOSemSim (version 1.22.0) (Yu et al. 2010).

Shuffling of Gene Positions or Epistasis Values

To obtain the null distribution of E–D correlation coefficients, we shuffled gene positions while keeping the genome structure of S. cerevisiae unchanged (i.e., number of chromosomes and number of genes on each chromosome). The epistasis value of each gene pair was also kept unchanged. Gene distances were then calculated based on the new genomic locations.
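A structure-preserving shuffle of gene positions could look like the sketch below. The genome representation (chromosome name mapped to an ordered list of gene names), the function names, and the use of the index difference as gene distance are assumptions of this example, not details taken from the study.

```python
import random


def shuffle_gene_positions(genome, seed=None):
    """Reassign genes to random positions, keeping chromosome sizes fixed.

    genome: dict mapping chromosome name -> ordered list of gene names.
    """
    rng = random.Random(seed)
    genes = [g for chrom in genome.values() for g in chrom]
    rng.shuffle(genes)
    shuffled, start = {}, 0
    for chrom, original in genome.items():
        shuffled[chrom] = genes[start:start + len(original)]
        start += len(original)
    return shuffled


def gene_distance(genome, gene_1, gene_2):
    """Index difference of two genes on one chromosome; None if unlinked."""
    for chrom in genome.values():
        if gene_1 in chrom and gene_2 in chrom:
            return abs(chrom.index(gene_1) - chrom.index(gene_2))
    return None


if __name__ == "__main__":
    toy = {"I": ["g1", "g2", "g3"], "II": ["g4", "g5"]}
    null_genome = shuffle_gene_positions(toy, seed=1)
    print(null_genome, gene_distance(null_genome, "g1", "g3"))
```

Repeating the shuffle many times and recomputing the E–D correlation for each shuffled genome would give the null distribution described above.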

To estimate the relative importance of different epistasis categories (positive and negative) in shaping the E–D correlation, we shuffled epistasis values of one category while keeping the other category unchanged. Gene positions were not changed during this process.

Estimating the Predictive Accuracy of Gene Order

To examine the accuracy of our prediction of gene order, we estimated the success rate by chance (gray bars in fig. 6 and supplementary figs. S8 and S9, Supplementary Material online) by randomly assigning the positions of genes 1,000 times. For each permutation, the average proportion of successful predictions among motifs was estimated. The mean and standard deviation among the 1,000 permutations were then calculated.
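For the star-like motifs, the chance success rate could be estimated along the following lines. This is a hedged sketch: a prediction is counted as successful when the partner with the highest epistasis is also the closest gene to the hub, the motif data are hypothetical, and the function names are introduced only for illustration.

```python
import random
from statistics import mean, stdev


def is_success(epistasis, distances):
    """True if the partner with the largest epistasis is closest to the hub."""
    best = max(range(len(epistasis)), key=lambda i: epistasis[i])
    closest = min(range(len(distances)), key=lambda i: distances[i])
    return best == closest


def random_expectation(motifs, n_perm=1000, seed=0):
    """Mean and SD of the success proportion under shuffled gene positions.

    motifs: list of (epistasis_list, distance_list) pairs, one per hub gene.
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_perm):
        hits = 0
        for epistasis, distances in motifs:
            shuffled = list(distances)
            rng.shuffle(shuffled)  # randomly reassign partner positions
            hits += is_success(epistasis, shuffled)
        proportions.append(hits / len(motifs))
    return mean(proportions), stdev(proportions)


if __name__ == "__main__":
    # Twenty identical hypothetical motifs with four partners each.
    motifs = [([0.05, -0.02, 0.01, 0.0], [3, 40, 12, 77])] * 20
    observed = sum(is_success(e, d) for e, d in motifs) / len(motifs)
    expected, sd = random_expectation(motifs)
    print(f"observed = {observed:.2f}, random = {expected:.3f} +/- {sd:.3f}")
```

With four partners per hub, the random expectation in this toy example is close to one in four.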

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Author Contributions

Y.-F.Y., W.C., and W.Q. conceived the research; Y.-F.Y., W.C., and W.Q. analyzed the data; and Y.-F.Y., S.W., and W.Q. wrote the article.

Acknowledgments

We thank Bin He and Weiwei Zhai for discussion. This work was supported by grants from the National Natural Science Foundation of China to W.Q. (91731302).

References

Batada NN, Hurst LD. 2007. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet. 39(8):945–949.

Boone C, Bussey H, Andrews BJ. 2007. Exploring genetic interactions and networks with yeast. Nat Rev Genet. 8(6):437–449.

Boutanaev AM, Kalmykova AI, Shevelyov YY, Nurminsky DI. 2002. Large clusters of co-expressed genes in the Drosophila genome. Nature 420(6916):666–669.

Byrne KP, Wolfe KH. 2005. The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 15(10):1456–1461.

Charlesworth B. 1990. Mutation-selection balance and the evolutionary advantage of sex and recombination. Genet Res. 55(3):199–221.

Charlesworth D, Charlesworth B. 2011. Mimicry: the hunting of the supergene. Curr Biol. 21(20):R846–R848.

Chen X, Zhang J. 2016. The genomic landscape of position effects on protein expression level and noise in yeast. Cell Syst. 2(5):347–354.

Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, et al. 1998. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2(1):65–73.

Cohen BA, Mitra RD, Hughes JD, Church GM. 2000. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet. 26(2):183–186.

Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, et al. 2010. The genetic landscape of a cell. Science 327(5964):425–431.

Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, et al. 2016. A global genetic interaction network maps a wiring diagram of cellular function. Science 353(6306):aaf1420.

Dean EJ, Davis JC, Davis RW, Petrov DA. 2008. Pervasive and persistent redundancy among duplicated genes in yeast. PLoS Genet. 4(7):e1000113.


DeLuna A, Vetsigian K, Shoresh N, Hegreness M, Colon-Gonzalez M, Chao S, Kishony R. 2008. Exposing the fitness contribution of duplicated genes. Nat Genet. 40(5):676–681.

Dixon SJ, Fedyshyn Y, Koh JL, Prasad TS, Chahwan C, Chua G, Toufighi K, Baryshnikova A, Hayles J, Hoe KL, et al. 2008. Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc Natl Acad Sci U S A. 105(43):16653–16658.

Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, Shendure J, Fields S, Blau CA, Noble WS. 2010. A three-dimensional model of the yeast genome. Nature 465(7296):363–367.

Eshel I, Feldman MW. 1970. On the evolutionary effect of recombination. Theor Popul Biol. 1(1):88–100.

Feldman MW, Christiansen FB, Brooks LD. 1980. Evolution of recombination in a constant environment. Proc Natl Acad Sci U S A. 77(8):4838–4841.

Ghanbarian AT, Hurst LD. 2015. Neighboring genes show correlated evolution in gene expression. Mol Biol Evol. 32(7):1748–1766.

Gordon JL, Byrne KP, Wolfe KH. 2009. Additions, losses, and rearrangements on the evolutionary route from a reconstructed ancestor to the modern Saccharomyces cerevisiae genome. PLoS Genet. 5(5):e1000485.

He X, Qian W, Wang Z, Li Y, Zhang J. 2010. Prevalent positive epistasis in Escherichia coli and Saccharomyces cerevisiae metabolic networks. Nat Genet. 42(3):272–276.

Hurst LD, Pal C, Lercher MJ. 2004. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 5(4):299–310.

Kafri R, Bar-Even A, Pilpel Y. 2005. Transcription control reprogramming in genetic backup circuits. Nat Genet. 37(3):295–299.

Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428(6983):617–624.

Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423(6937):241–254.

Kirkpatrick M. 2017. The evolution of genome structure by natural and sexual selection. J Hered. 108(1):3–11.

Kondrashov AS. 1982. Selection against harmful mutations in large sexual and asexual populations. Genet Res. 40(3):325–332.

Kondrashov AS. 1988. Deleterious mutations and the evolution of sexual reproduction. Nature 336(6198):435–440.

Kouyos RD, Silander OK, Bonhoeffer S. 2007. Epistasis between deleterious mutations and the evolution of recombination. Trends Ecol Evol. 22(6):308–315.

Lang GI, Botstein D. 2011. A test of the coordinated expression hypothesis for the origin and maintenance of the GAL cluster in yeast. PLoS One 6(9):e25290.

Lawrence J. 1999. Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev. 9(6):642–648.

Lawrence JG. 2002. Shared strategies in gene organization among prokaryotes and eukaryotes. Cell 110(4):407–413.

Lercher MJ, Urrutia AO, Hurst LD. 2002. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 31(2):180–183.

Liao BY, Zhang J. 2008. Coexpression of linked genes in Mammalian genomes is generally disadvantageous. Mol Biol Evol. 25(8):1555–1565.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458(7236):337–341.

Luo H, Lin Y, Gao F, Zhang CT, Zhang R. 2014. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res. 42(Database issue):D574–D580.

Mancera E, Bourgon R, Brozzi A, Huber W, Steinmetz LM. 2008. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454(7203):479–485.

Musso G, Costanzo M, Huangfu M, Smith AM, Paw J, San Luis BJ, Boone C, Giaever G, Nislow C, Emili A, et al. 2008. The extensive and condition-dependent nature of epistasis among whole-genome duplicates in yeast. Genome Res. 18(7):1092–1099.

Nei M. 1967. Modification of linkage intensity by natural selection. Genetics 57(3):625–641.

Nei M. 1969. Linkage modifications and sex difference in recombination. Genetics 63(3):681–699.

Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, Weissman JS. 2006. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441(7095):840–846.

Pal C, Hurst LD. 2003. Evidence for co-evolution of gene order and recombination rate. Nat Genet. 33(3):392–395.

Phillips PC. 2008. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 9(11):855–867.

Qian W, Liao BY, Chang AY, Zhang J. 2010. Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet. 26(10):425–430.

Qian W, Ma D, Xiao C, Wang Z, Zhang J. 2012. The genomic landscape and evolutionary resolution of antagonistic pleiotropy in yeast. Cell Rep. 2(5):1399–1410.

Qian W, Zhang J. 2008. Evolutionary dynamics of nematode operons: easy come, slow go. Genome Res. 18(3):412–421.

Qian W, Zhang J. 2014. Genomic evidence for adaptation by gene duplication. Genome Res. 24(8):1356–1362.

Schacherer J, Shapiro JA, Ruderfer DM, Kruglyak L. 2009. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458(7236):342–345.

Slot JC, Rokas A. 2010. Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi. Proc Natl Acad Sci U S A. 107(22):10136–10141.

Spellman PT, Rubin GM. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol. 1(1):5.

Tischler J, Lehner B, Chen N, Fraser AG. 2006. Combinatorial RNA interference in Caenorhabditis elegans reveals that redundancy between gene duplicates can be maintained for more than 80 million years of evolution. Genome Biol. 7(8):R69.

Tischler J, Lehner B, Fraser AG. 2008. Evolutionary plasticity of genetic interaction networks. Nat Genet. 40(4):390–391.

Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, et al. 2001. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294(5550):2364–2368.

van Leeuwen J, Pons C, Mellor JC, Yamaguchi TN, Friesen H, Koschwanez J, Usaj MM, Pechlaner M, Takar M, Usaj M, et al. 2016. Exploring genetic suppression interactions on a global scale. Science 354(6312):aag0839.

VanderSluis B, Bellay J, Musso G, Costanzo M, Papp B, Vizeacoumar FJ, Baryshnikova A, Andrews B, Boone C, Myers CL. 2010. Genetic interactions reveal the evolutionary trajectories of duplicate genes. Mol Syst Biol. 6:429.

Vavouri T, Semple JI, Lehner B. 2008. Widespread conservation of genetic redundancy during a billion years of eukaryotic evolution. Trends Genet. 24(10):485–488.

Wagner A. 2005. Energy constraints on the evolution of gene expression. Mol Biol Evol. 22(6):1365–1374.

Williams EJ, Bowles DJ. 2004. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 14(6):1060–1067.

Wong S, Wolfe KH. 2005. Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat Genet. 37(7):777–782.

Wu CI, Ting CT. 2004. Genes and speciation. Nat Rev Genet. 5(2):114–122.

Yeaman S. 2013. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proc Natl Acad Sci U S A. 110(19):E1743–E1751.

Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S. 2010. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26(7):976–978.

Zaslaver A, Baugh LR, Sternberg PW. 2011. Metazoan operons accelerate recovery from growth-arrested states. Cell 145(6):981–992.

Slot and Rokas 2010), and functional relationships between genes may lead to epistasis, we next examined whether functional relationships could confound the negative E–D correlation. We observed similar negative E–D correlations for gene pairs with either high or low semantic similarity of GO terms in molecular functions (supplementary fig. S7E and F, Supplementary Material online), biological processes (supplementary fig. S7G and H, Supplementary Material online), and cellular components (supplementary fig. S7I and J, Supplementary Material online). Again, we calculated partial correlations controlling for semantic similarity of GO terms in molecular functions (partial ρ = −0.15, P = 7.3 × 10⁻⁷, N = 1,151), biological processes (partial ρ = −0.15, P = 3.2 × 10⁻⁷, N = 1,151), and cellular components (partial ρ = −0.15, P = 3.2 × 10⁻⁷, N = 1,151). All these results indicate that the functional relationship is not a confounding factor in the negative E–D correlation, which is not unexpected given that a large fraction of genetic interactions do not reflect functional relationships (He et al. 2010; Costanzo et al. 2016).

Finally, we investigated the impact of gene expression noise on the negative E–D correlation because it has been proposed that essential genes were colocalized in open chromatin regions to reduce gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016). To control for this effect, we first calculated the average gene expression noise (Newman et al. 2006) for each gene pair. We used the distance of each coefficient of variation (CV) to a running median of CV values (DM) to quantify gene expression noise (Newman et al. 2006) in order to minimize the effect of gene expression magnitude on gene expression noise. We observed a negative E–D correlation for gene pairs with either high or low average DM (supplementary fig. S7K and L, Supplementary Material online). Again, we calculated the partial correlation controlling for gene expression noise and still observed a negative E–D correlation (partial ρ = −0.16, P = 9.7 × 10⁻³, N = 258).
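The partial correlations used in these control analyses can be illustrated with a minimal sketch. It assumes the first-order partial correlation formula applied to Spearman rank correlations, which is one common approach; the text does not state which implementation was used, and the function names and toy inputs are hypothetical.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def partial_spearman(x, y, z):
    """Rank correlation between x and y after controlling for z.

    Assumption: first-order partial correlation formula on Spearman's rho.
    """
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5
```

In this setting, `x` would hold epistasis values, `y` gene distances, and `z` the potential confounder (for example, GO semantic similarity or average DM) for the same gene pairs.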

Positive Epistasis Plays an Important Role in the Origin of the Negative E–D Correlation

Our model further predicted that the reduction of the distance between positively epistatic genes should play a more important role in the formation of the negative E–D correlation (figs. 1 and 2). Indeed, a significant negative E–D correlation was observed among positively epistatic gene pairs in S. cerevisiae (fig. 4C; ρ = −0.14, P = 0.0064, N = 391), whereas no significant correlation was observed among negatively epistatic gene pairs (fig. 4D; ρ = −0.025, P = 0.47, N = 863). We further verified the role of positive epistasis by shuffling epistasis values among all 391 positively epistatic gene pairs in S. cerevisiae. As expected, the E–D correlation was significantly weakened after the permutation (fig. 4E; P = 0.005, one-tailed permutation test). By contrast, no significant difference was observed after shuffling negative epistasis values (fig. 4F; P = 0.722, one-tailed permutation test), even though the latter analysis shuffled more gene pairs (N = 863).
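The shuffling analysis can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it assumes that the E–D correlation is recomputed over all gene pairs after epistasis values are shuffled within the chosen subset, and that the one-tailed P value is the fraction of permutations whose correlation is at least as negative as the observed one. All function and parameter names are hypothetical.

```python
import random
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def shuffle_test(distance, epistasis, subset, n_perm=1000, seed=0):
    """Shuffle epistasis values among the pairs indexed by `subset`.

    Returns the observed rho and the fraction of permutations whose rho is
    at least as negative as the observed one (one-tailed).
    """
    rng = random.Random(seed)
    observed = spearman(distance, epistasis)
    hits = 0
    for _ in range(n_perm):
        shuffled = [epistasis[i] for i in subset]
        rng.shuffle(shuffled)
        permuted = list(epistasis)
        for i, value in zip(subset, shuffled):
            permuted[i] = value
        if spearman(distance, permuted) <= observed:
            hits += 1
    return observed, hits / n_perm
```

To mimic the two analyses described above, `subset` would index either the positively epistatic pairs (E > 0) or the negatively epistatic pairs (E < 0).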

FIG. 4. A negative E–D correlation is observed in the empirical genetic interaction network of the budding yeast S. cerevisiae, and positive epistasis plays a more important role in its formation. (A) A significant negative E–D correlation is observed in S. cerevisiae. Gene pairs are separated into bins based on D, with an equal width of five genes. The mean value of epistasis and the standard error of the mean (SEM) within each bin are shown. Spearman's correlation coefficient ρ and the corresponding P values were calculated from the raw data (N = 1,254). The gray dashed line shows the average epistasis among unlinked genes (D > 100). (B) Distribution of E–D correlation coefficients in 1,000 shuffled genomes. The arrow indicates the observed correlation coefficient in S. cerevisiae. (C and D) A significant negative E–D correlation is observed among positively epistatic gene pairs (N = 391) but not among negatively epistatic gene pairs (N = 863). Spearman's correlation coefficient ρ and the corresponding P values are calculated from the raw data. The dashed line shows the average epistasis among unlinked genes. (E) The distribution of correlation coefficients in 1,000 artificial genomes in which values of positive epistasis are shuffled. The arrow indicates the observed E–D correlation coefficient. (F) Similar to (E), but values of negative epistasis are shuffled. (G and H) The proportion of gene pairs with significant positive epistasis is significantly correlated with D, but that with significant negative epistasis is not. SEMs are estimated based on the binomial distribution. The dashed lines show the proportion of gene pairs with significant positive or negative epistasis among unlinked genes.

In studies by Costanzo et al., epistasis was classified into three categories: significantly positive, significantly negative, and nonsignificant (Costanzo et al. 2010, 2016). A strong negative correlation was observed between gene distance and the proportion of gene pairs with significant positive epistasis (fig. 4G; ρ = −0.82, P = 5.5 × 10⁻⁵), whereas no significant correlation was observed between gene distance and the proportion of gene pairs with significant negative epistasis (fig. 4H; ρ = 0.43, P = 0.088). All these observations emphasize the important role of positive epistasis in the origin of the negative E–D correlation.

Epistasis-Driven Evolution of Gene Order after the Whole Genome Duplication in Yeast

Thus far, we have observed a negative E–D correlation and demonstrated that the correlation was mainly attributable to positively epistatic gene pairs. The aforementioned theories, simulations, and empirical evidence led us to propose a hypothesis of epistasis-driven evolution of gene order in S. cerevisiae. The whole genome duplication (WGD; fig. 5A) and the subsequent extensive gene losses substantially changed the epistatic relations among genes and markedly rewired the genetic interaction network in yeast (Kellis et al. 2004; Dixon et al. 2008; Tischler et al. 2008; VanderSluis et al. 2010). For instance, only 29% of the identified synthetic lethality is conserved between Schizosaccharomyces pombe and S. cerevisiae (Dixon et al. 2008). At the same time, extensive chromosome rearrangement events occurred. For example, in a reconstructed ancestral species before the WGD (Byrne and Wolfe 2005), we identified the gene pairs that were located on the same chromosome, and surprisingly, 84.6% of them are localized on different chromosomes in S. cerevisiae. More strikingly, among gene pairs that are localized on the same chromosome in both the ancestral species and S. cerevisiae, 99.6% differ in gene distance. Based on these observations, we proposed that the rewired genetic interaction network drove the evolution of gene order, resulting in numerous chromosome rearrangement events (Kellis et al. 2004). When gene losses ceased, the rewiring of genetic interactions slowed, and the evolutionary force on gene distance also diminished. Consistently, synteny relationships are strongly conserved in the species of the Saccharomyces sensu stricto group (Kellis et al. 2003).

Our model predicted that the negative E–D correlation should be weaker if the gene order in S. cerevisiae has been unchanged since the WGD. The reason is that the gene order in the ancestor was not subject to the natural selection imposed by the genetic interaction network of the current S. cerevisiae genome. Furthermore, given the massive gene losses after the WGD, the genetic interaction network cannot be 100% conserved. To test this prediction, we calculated the gene distances in the reconstructed ancestral species mentioned earlier (Byrne and Wolfe 2005). Indeed, we found that the negative E–D correlation disappeared when the gene distances in S. cerevisiae were replaced by those in the ancestral species (fig. 5B; ρ = 6.9 × 10⁻³, P = 0.26, N = 26,630). This observation indicates that the negative E–D correlation in S. cerevisiae formed during the evolution of gene order after the WGD. Consistently, we observed that positively epistatic gene pairs decreased their distances, whereas negatively epistatic gene pairs increased their distances during evolution (fig. 5C; ρ = −0.17, P = 0.061, N = 127).

To further test the role of positive epistasis in the evolution of gene order, we identified genes that were ancestrally linked (i.e., D ≤ 100 in the reconstructed ancestor) and examined whether they moved toward or away from each other during evolution. Consistent with our model, genes with significant positive epistasis were more likely to move toward each other than genes without significant epistasis (fig. 5D; P = 1.5 × 10⁻³, two-tailed Fisher's exact test), whereas the difference between gene pairs with significant negative epistasis and those without significant epistasis was not significant (fig. 5D; P = 0.41, two-tailed Fisher's exact test). Together, these observations support our hypothesis of epistasis-driven evolution of gene order in yeast.
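The comparison of "toward" versus "away" counts between two classes of gene pairs amounts to a test on a 2 × 2 table. The sketch below implements a two-tailed Fisher's exact test under one common definition (summing the probabilities of all tables with the same margins that are no more likely than the observed one). The function name is hypothetical, and the underlying counts are not given in this section, so none are reproduced here.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, pairs with significant positive epistasis
    and pairs without significant epistasis; columns could be pairs that
    moved toward and away from each other.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(k):
        # Number of tables with k in the top-left cell, given the margins.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Integer weights avoid floating-point ties when comparing tables.
    extreme = sum(weight(k) for k in range(low, high + 1) if weight(k) <= observed)
    return extreme / total
```

Working with integer table counts until the final division keeps the "no more likely than observed" comparison exact.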

FIG. 5. The origin of the negative E–D correlation after the WGD in yeast. (A) Phylogenetic relationship among yeast species. The black arrow indicates the reconstructed ancestor, and the gray arrow indicates the WGD event. (B) A negative E–D correlation is not observed when the gene order in S. cerevisiae is replaced by that in the reconstructed ancestor. Gene pairs are separated into bins based on D, with an equal width of five genes. The mean and SEM of epistasis within each bin are shown. Spearman's correlation coefficient ρ and the corresponding P values were calculated from the raw data (N = 26,630). The dashed line shows the average epistasis among unlinked genes. (C) The change in D (S. cerevisiae minus the reconstructed ancestor) is negatively correlated with the epistasis in S. cerevisiae (N = 127). (D) The D between a gene pair in S. cerevisiae is compared with that in the reconstructed ancestor. Proportions of gene pairs moving toward and away from each other among gene pairs with significant positive epistasis (left), nonsignificant epistasis (middle), and significant negative epistasis (right) are shown.

Genetic Interaction Network Accurately Predicts Gene Order in Yeast

Finally, we determined whether the gene order in S. cerevisiae could be successfully predicted by the empirical data of genetic interaction networks (Costanzo et al. 2010, 2016). To this end, we identified 22 all-connected three-node motifs in which all three genes are localized within a 100-gene range on a chromosome. An example is shown in figure 6A, in which VMA22 positively interacts with SRB2 and negatively interacts with AIM17, and the epistatic interaction between SRB2 and AIM17 is weak. All six possible gene orders are enumerated in figure 6B. Among them, the first gene order exhibits a perfect E–D anticorrelation (ρ = −1; fig. 6C), which is exactly the prediction of our model. In fact, it is also the real order of these genes on chromosome VIII. The second gene order successfully places positively epistatic genes close to each other and negatively epistatic genes far from each other. Therefore, it is generally consistent with our prediction (fig. 6C). Either gene order was considered as being successfully predicted by our model if it actually occurred in the yeast genome.
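The enumeration of gene orders for a three-node motif can be sketched as below. The sketch assigns the three genes to three fixed chromosomal slots in every possible order and scores each order by the Spearman correlation between epistasis and distance over the three gene pairs. The epistasis values and slot positions are hypothetical placeholders chosen only to match the signs described in the text (positive, negative, weak); they are not the measured values.

```python
from itertools import combinations, permutations
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def score_orders(genes, epistasis, slots):
    """Map each possible gene order to its E-D Spearman correlation."""
    scores = {}
    for order in permutations(genes):
        position = dict(zip(order, slots))
        e_values, d_values = [], []
        for g1, g2 in combinations(genes, 2):
            e_values.append(epistasis[frozenset((g1, g2))])
            d_values.append(abs(position[g1] - position[g2]))
        scores[order] = spearman(e_values, d_values)
    return scores


# Hypothetical inputs: signs follow the text, magnitudes are invented.
genes = ("VMA22", "SRB2", "AIM17")
epistasis = {
    frozenset(("VMA22", "SRB2")): 0.06,    # positive interaction
    frozenset(("VMA22", "AIM17")): -0.04,  # negative interaction
    frozenset(("SRB2", "AIM17")): 0.01,    # weak interaction
}
slots = (0, 60, 80)  # hypothetical slot positions, in gene units
scores = score_orders(genes, epistasis, slots)
predicted_order = min(scores, key=scores.get)
```

Under this scoring, the order with the most negative correlation is the model's prediction, and any order with a negative correlation places the positively interacting pair closer than the negatively interacting pair on average.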

We examined the accuracy of our prediction at the genomic scale. We first divided these 22 motifs into two groups based on the magnitude of σ_epistasis: Group L contains 11 motifs with larger σ_epistasis, and Group S contains 11 motifs with smaller σ_epistasis (fig. 6D). We found that the gene orders of seven (out of 11) motifs in Group L were precisely predicted by our model, and the accuracy was significantly higher than the random expectation (17.6%; fig. 6E; P = 0.001, permutation test). By contrast, the predictive accuracy in Group S (2 out of 11) was not significantly different from the random expectation (fig. 6E; P = 0.607, permutation test) because low variation among epistasis values in a motif reduces the selective coefficients (figs. 2E and 3E). More broadly, the proportion of successful predictions (first and second gene orders in fig. 6B) is 100% in Group L, significantly higher than the random expectation (33.3%; fig. 6F; P < 0.001, permutation test). This high predictive power suggests that epistasis plays a vital role in driving the evolution of gene order. We further identified 243 and 1,302 all-connected motifs within the range of 150 and 200 genes on the same chromosome, respectively, and again confirmed the predictive power of the genetic interaction network (supplementary fig. S8A–F, Supplementary Material online). Moreover, the predictive power of epistasis on gene order is independent of expression similarity because the latter could not accurately predict gene order (supplementary fig. S9A–I, Supplementary Material online).

We then determined whether gene order could be predicted when the genetic interaction network is incomplete. To this end, we identified 92 star-like motifs in which all genes are on the same chromosome and at least one gene has D ≤ 40 to the hub gene. For example, SFH1, which encodes a component of a chromatin remodeling complex, genetically interacts with 90 genes on the same chromosome (fig. 6G shows 20 genes with D < 100 to SFH1). Because the negative E–D correlation is mainly contributed by positively epistatic gene pairs in theory (figs. 1 and 2), our model predicted that genes having strong positive epistasis with SFH1 should be located close to it on the chromosome. Indeed, these genes (VRP1, FKS1, RPL26A, VPS38, DCR2) are located close to SFH1. Specifically, the gene (VRP1) having the strongest positive epistatic interaction with SFH1 was the closest gene to SFH1 on the chromosome (fig. 6H).

To examine the predictive accuracy at the genomic scale, we divided these 92 star-like motifs into two groups based on the difference between the two highest epistasis values in the motif (Diff_epistasis; fig. 6J). We found that the proportion of successful predictions was 32.6% in Group L and 21.7% in Group S, both of which were significantly higher than the random expectation (fig. 6K; P < 0.001 for Group L and P = 0.001 for Group S, permutation test). Furthermore, despite the lower predictive power, our model could still predict gene locations in 707 star-like motifs in which the closest gene to the hub was within D = 90 (supplementary fig. S8G and H, Supplementary Material online; P = 0.002 for Group L and P = 0.036 for Group S, permutation test).

Discussion

In our study, we provided results from both computational simulation and empirical data analysis to support the role of genetic interaction networks in driving the evolution of gene order. We performed simulations with different sets of parameters, some of which were from empirical data or within the range of empirical data (Schacherer et al. 2009; Costanzo et al. 2010). The negative E–D correlation was observed with all parameter sets (supplementary figs. S4–S6, Supplementary Material online), consistent with both theoretical predictions (fig. 1A) and empirical results (fig. 4).

The epistasis values were estimated mainly from null alleles (Costanzo et al. 2010); thus, it remains unknown whether our conclusion applies to other deleterious alleles. Fortunately, Costanzo et al. also estimated 7,786,453 pairwise epistasis values for either decreased abundance by mRNA perturbation (DAmP) alleles or temperature-sensitive (ts) alleles, with mutations typically changing coding sequences, which offered us the opportunity to address this question (Costanzo et al. 2010, 2016). We found that the epistasis values of null alleles were significantly correlated with those of other deleterious alleles of the same gene (supplementary fig. S10A–L, Supplementary Material online), suggesting that the negative E–D correlation is likely a general phenomenon for various deleterious alleles. More importantly, because DNA deletion events that lead to null alleles are frequently observed in yeast natural populations (Schacherer et al. 2009), epistasis among null alleles by itself may be sufficiently strong to drive the evolution of gene distance.

In principle, if the frequencies of deleterious alleles in a population are significantly reduced by natural selection, the advantage of genetic linkage could be small for this population. However, numerous deleterious mutations (e.g., deletion of a whole gene, nonsense mutations, missense mutations, and mutations altering start codons) have been observed in natural populations (Liti et al. 2009; Schacherer et al. 2009), suggesting that the frequencies of deleterious alleles may not be low. This phenomenon is probably due to the antagonistic pleiotropic effects of a mutation in multiple environments (Qian et al. 2012).

It is worth noting that theoretical analysis predicts that epistasis can drive the evolution of recombination frequency rather than gene distance. In addition to gene distance, 1) the lengths of intergenic regions and 2) recombination hot/cold spots could also affect recombination frequency. Nevertheless, a strong correlation between gene distance and physical distance, that is, the number of nucleotides between two genes, was observed (ρ = 0.998, P < 10⁻¹⁰⁰, N = 1,624,935), suggesting that the effect of (1) is negligible. Furthermore, physical distance is highly correlated with recombination frequency when the physical distance is < 180 kb (supplementary fig. S1C, Supplementary Material online; ρ = 0.88, P < 10⁻¹⁰⁰, N = 336,227), suggesting that the effect of (2) is also limited. As a result, recombination frequency and gene distance are highly correlated (supplementary fig. S1B, Supplementary Material online; ρ = 0.87, P < 10⁻¹⁰⁰, N = 337,892 for gene pairs with D ≤ 100). Because the recombination frequency was only measured for 58% of gene pairs in S. cerevisiae (Mancera et al. 2008) and was unknown for the reconstructed ancestral species, we used gene distance to approximate recombination frequency in this study. The optimization of gene distances among multiple gene pairs may often involve the rearrangement of genes on a chromosome.

FIG. 6. Gene orders can be predicted from the yeast genetic interaction network. (A) A three-node all-connected motif in the yeast genetic interaction network. The standard deviation of epistasis values (σ_epistasis) in this motif is 0.05. (B) Six possible gene orders are listed. The first is the observed gene order on chromosome VIII in yeast. (C) The E–D relationship for the six possible gene orders in (B). We consider the prediction successful if either of the first two gene orders is observed because they exhibit negative E–D correlations. In particular, the prediction is precise if the first gene order is observed. The dashed line represents E = 0. (D) A histogram of the standard deviation of epistasis values (σ_epistasis), based on which motifs are divided into two groups of similar size, Group S and Group L. (E) The proportions of precise predictions in Group S and Group L, and the random (R) expectation based on permutation. Error bars represent standard deviations among 1,000 permutations. (F) The proportions of successful predictions, similar to (E). (G–I) A star-like motif in the yeast genetic interaction network and the gene order on chromosome XII in yeast. The hub gene SFH1 and 19 genes with distance < 100 to SFH1 are shown. (J) A histogram of differences between the two largest epistasis values in the motif (Diff_epistasis), based on which motifs are divided into two groups of similar size, Group S and Group L. (K) Proportion of successful predictions of the closest gene in Group S and Group L. Their respective random expectations based on permutation are shown in gray. Error bars represent standard deviations among 1,000 permutations.

It is also worth noting that synthetic genetic array (SGA), the experimental strategy used to generate double mutants, is based on recombination between two null alleles (Tong et al. 2001; Costanzo et al. 2010). Thus, double mutants for linked genes may have a smaller initial frequency, which may lead to inaccuracy in estimating fitness values for these mutants. To determine whether such potential experimental bias has any impact on our results, we calculated the partial E–D correlation after controlling for the double mutant fitness values. Again, we observed a strong negative E–D correlation (partial ρ = −0.17, P = 1.3 × 10⁻⁹, N = 1,254). This result was not unexpected because the fitness of double mutants and gene distance were only weakly correlated (ρ = 0.086, P = 0.002, N = 1,254). More discussion on the caveats in the analysis of empirical data is included in supplementary note S2, Supplementary Material online.

In a recent study, 1,800 genetic suppression interactions were identified in the budding yeast (van Leeuwen et al. 2016), which can be regarded as an extreme form of positive genetic interactions. Among them, nine gene pairs are tightly linked (D ≤ 2; supplementary table S3, Supplementary Material online). This number is significantly greater than the random expectation (permutation test, P = 0.01), again supporting our model.

The selection on clusters of locally adapted alleles, which sometimes were captured in chromosomal inversions, has been studied previously (Yeaman 2013; Kirkpatrick 2017). In our study, we proposed a hypothesis (the negative E–D correlation) on deleterious alleles and tested this hypothesis with analyses of empirical data and simulation, both in the context of genetic interaction networks. Therefore, our results provide additional clues for understanding the basic principles of genome organization. In particular, the origin and maintenance of clusters of genes in the same metabolic pathway, which have puzzled scientists for years (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011), could be well explained by genetic interactions. Genes in a linear metabolic pathway exhibit positive epistasis because 1) double mutants and single mutants have identical impacts on destroying the function of the pathway (He et al. 2010) and 2) the accumulation of deleterious intermediate products resulting from a loss-of-function mutation in a downstream gene of a metabolic pathway may be prevented by a loss-of-function mutation in an upstream gene (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011). Because genetic linkage is advantageous among genes with positive epistasis, the clustering of genes in the same linear metabolic pathway, such as genes in the galactose utilization pathway or allantoin degradation pathway, is favored by natural selection (Wong and Wolfe 2005; Slot and Rokas 2010).

Many factors have been reported to influence the evolution of gene order, such as tandem gene duplication, position effects on gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016), coordinated gene expression among neighboring genes (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004), and clustering of functionally related genes (Wong and Wolfe 2005; Slot and Rokas 2010), among many others. In this study, we provided evidence that, in addition to these factors, the genetic interaction network also played an important role in driving the evolution of gene order. Because the empirical data of epistasis (fitness values) are available in the budding yeast, an evolutionary simulation integrating many genetic interactions is possible, which makes epistasis unique among all factors driving the evolution of gene order. Based on the empirical data of the yeast genetic interaction network, our simulation indicates that the selective coefficient is on the order of 10⁻⁷, suggesting that epistasis may play an important role in determining gene order in yeast, a species with a relatively large effective population size (10⁷). A negative E–D correlation is expected to be observed in species with a smaller effective population size only if the range of epistasis is larger than that in yeast (figs. 2 and 3). Because the genome-wide empirical data of epistasis are unavailable in species with a smaller effective population size (e.g., humans and flies), whether the genetic interaction network plays a role in the evolution of gene order in these species requires further investigation.

Materials and Methods

Genomes

The genome annotation of S. cerevisiae was downloaded from the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org, version R64). The gene order of the reconstructed ancestor before the WGD (Gordon et al. 2009) was downloaded from the Yeast Gene Order Browser (YGOB) (Byrne and Wolfe 2005).

Simulation

To examine the fitness effect of gene distance (D), two-locus dynamics under selection were simulated for populations with different D (from 0 to 100 in 1-gene increments) as well as a series of epistasis values (E, from −0.004 to 0.004 in 0.001 increments) between two loci. The recombination frequency (R) between these two loci was estimated from D using the equation below:

$$R = D \times 0.004 + 0.064,$$

which is the linear relationship estimated from the distance between two genes (D) and the empirical recombination frequencies (R) between them quantified in a previous study (Mancera et al. 2008). The simulation was performed in haploid organisms to align with the analyses of empirical data, in which epistasis values were quantified in haploid yeast (Costanzo et al. 2010, 2016). A and B are the wild-type alleles of two di-allelic loci on the same chromosome; a and b are their deleterious alleles, with fitness values of $\omega_{aB} = \omega_{Ab} = 0.992$. The fitness of the wild type ($\omega_{AB}$) was defined as 1, and the epistasis E was defined as $\omega_{ab} - \omega_{aB}\omega_{Ab}$. The initial allele frequencies of a and b were both 0.1. The two loci were initially under linkage equilibrium, and therefore the frequencies of AB ($X_{AB}$), Ab ($X_{Ab}$), aB ($X_{aB}$), and ab ($X_{ab}$) were 0.81, 0.09, 0.09, and 0.01, respectively. After random mating, selection, and recombination, the frequencies of the four genotypes in the next generation were calculated with the following equations (Nei 1967):

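$$X'_{AB} = \frac{X_{AB}\omega_{AB}}{\bar{\omega}} - \frac{R\,(X_{AB}\omega_{AB}X_{ab}\omega_{ab} - X_{aB}\omega_{aB}X_{Ab}\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{Ab} = \frac{X_{Ab}\omega_{Ab}}{\bar{\omega}} + \frac{R\,(X_{AB}\omega_{AB}X_{ab}\omega_{ab} - X_{aB}\omega_{aB}X_{Ab}\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{aB} = \frac{X_{aB}\omega_{aB}}{\bar{\omega}} + \frac{R\,(X_{AB}\omega_{AB}X_{ab}\omega_{ab} - X_{aB}\omega_{aB}X_{Ab}\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{ab} = \frac{X_{ab}\omega_{ab}}{\bar{\omega}} - \frac{R\,(X_{AB}\omega_{AB}X_{ab}\omega_{ab} - X_{aB}\omega_{aB}X_{Ab}\omega_{Ab})}{\bar{\omega}^{2}}$$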

$\bar{\omega}$ is the average fitness of a population and was calculated as follows:

$$\bar{\omega} = X_{AB}\omega_{AB} + X_{ab}\omega_{ab} + X_{aB}\omega_{aB} + X_{Ab}\omega_{Ab}$$

For each population, the average fitness ($\bar{\omega}$) and the frequencies of deleterious alleles ($X_a$ and $X_b$) were recorded in each generation of the simulation. The difference between the average fitness of two populations with different D (D = 50 and D = 0) reflects whether linkage is favored by natural selection. The frequency difference of the deleterious alleles between these two populations is related to the long-term effects of recombination. We also performed the simulation with a variety of parameter values (supplementary fig. S4, Supplementary Material online).
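The recursion described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the starting population is treated as generation 0, and the mean fitness recorded for each generation is that of the population after the update.

```python
def recombination_frequency(d):
    """Linear map from gene distance D to recombination frequency R (as given in the text)."""
    return d * 0.004 + 0.064


def simulate_two_locus(d, e, generations=100, w_single=0.992, q=0.1):
    """Iterate the haploid two-locus recursion (selection, then recombination).

    Returns one (mean fitness, X_a, X_b) tuple per generation.
    Assumption: values are recorded after each update.
    """
    r = recombination_frequency(d)
    # Genotype fitnesses; epistasis E = w_ab - w_aB * w_Ab.
    w = {"AB": 1.0, "Ab": w_single, "aB": w_single, "ab": e + w_single * w_single}
    p = 1.0 - q
    # Initial genotype frequencies under linkage equilibrium.
    x = {"AB": p * p, "Ab": p * q, "aB": q * p, "ab": q * q}
    history = []
    for _ in range(generations):
        w_bar = sum(x[g] * w[g] for g in x)
        # Recombination term shared by all four equations.
        shift = r * (x["AB"] * w["AB"] * x["ab"] * w["ab"]
                     - x["aB"] * w["aB"] * x["Ab"] * w["Ab"]) / w_bar ** 2
        x = {
            "AB": x["AB"] * w["AB"] / w_bar - shift,
            "Ab": x["Ab"] * w["Ab"] / w_bar + shift,
            "aB": x["aB"] * w["aB"] / w_bar + shift,
            "ab": x["ab"] * w["ab"] / w_bar - shift,
        }
        mean_fitness = sum(x[g] * w[g] for g in x)
        history.append((mean_fitness, x["aB"] + x["ab"], x["Ab"] + x["ab"]))
    return history


if __name__ == "__main__":
    linked = simulate_two_locus(d=0, e=0.004)[-1][0]
    distant = simulate_two_locus(d=50, e=0.004)[-1][0]
    print(f"mean fitness difference (D=0 minus D=50): {linked - distant:.3e}")
```

Comparing the last entries for D = 0 and D = 50 at a fixed E gives the kind of fitness difference the text uses to judge whether linkage is favored.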

The average fitness of a population at the 100th generation ($\bar{\omega}_{100}$) was used to infer the fitness effect of gene distance at a given epistasis value. For each epistasis value, we could identify an optimal gene distance ($D_{opt}$) as well as a series of permitted gene distances ($D_{permitted}$), defined as those for which the difference in $\bar{\omega}_{100}$ between $D_{permitted}$ and $D_{opt}$ was $< 10^{-7}$. The mean of all $D_{permitted}$ values was calculated. In addition, our simulation did not introduce new mutations to the population, and therefore the frequencies of deleterious alleles decreased over generations. Nevertheless, the negative E–D correlation was also observed at the 50th and 200th generations (supplementary fig. S3, Supplementary Material online).

Toy Motifs

We built toy motifs in a genetic interaction network to simulate the fitness effect of gene distance. In the star-like motif (fig. 2A), a hub gene (A) interacted with nine partners (genes B, C, D, E, F, G, H, I, and J) with nine epistasis values ranging from $-0.004$ to $0.004$ in $0.001$ increments. We placed the nine partner genes on the same side of the hub gene on a chromosome. Because we focused on the epistasis between the hub gene and its partners, placing the partners on different sides would not affect the results. We fixed the chromosomal location of the hub gene, and the partner genes were placed at D = 20, 30, 40, 50, 60, 70, 80, 90, and 100 from the hub gene. The locations of the nine partners were shuffled while keeping the epistasis values unchanged. The fitness of a gene order was defined as the average $\bar{\omega}_{100}$ for all nine hub-containing gene pairs. Thus, we could calculate the fitness for a total of (9! =) 362,880 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to test whether the simulation result was parameter sensitive (supplementary fig. S5, Supplementary Material online). Here, epistasis values and fitness defects were randomly chosen from the empirical data in previous studies (Costanzo et al. 2010, 2016).

For the all-connected motif (fig. 3A), five genes (A, B, C, D, and E) interacted with each other with ten epistasis values ranging from $-0.0036$ to $0.0036$ in $0.0008$ increments. The five genes were placed on the same chromosome, and the gene locations were set as 1, 19, 65, 93, and 102 so that the maximum D in the motif was 100. The locations of these five genes were shuffled, whereas the epistasis values were kept unchanged. Similar to the star-like motifs, the fitness of a gene order was defined as the average $\bar{\omega}_{100}$ for all ten gene pairs. The fitness was calculated for a total of (5! =) 120 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to determine whether the simulation result was parameter sensitive (supplementary fig. S6, Supplementary Material online).
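The all-connected motif lends itself to a brute-force enumeration, sketched below in Python. Several details are assumptions made for illustration only: which epistasis value belongs to which gene pair is arbitrary here (the text does not say), D is taken as the number of genes between two locations (so that locations 1 and 102 give D = 100), and all names are hypothetical.

```python
from itertools import combinations, permutations


def w100(d, e, w_single=0.992, q=0.1, generations=100):
    """Mean fitness after `generations` rounds of the two-locus recursion."""
    r = d * 0.004 + 0.064
    w_ab = e + w_single * w_single
    x_AB, x_Ab, x_aB, x_ab = (1 - q) ** 2, (1 - q) * q, q * (1 - q), q * q
    for _ in range(generations):
        w_bar = x_AB + w_single * (x_Ab + x_aB) + w_ab * x_ab
        shift = r * (x_AB * x_ab * w_ab - x_aB * x_Ab * w_single ** 2) / w_bar ** 2
        x_AB, x_Ab, x_aB, x_ab = (
            x_AB / w_bar - shift,
            x_Ab * w_single / w_bar + shift,
            x_aB * w_single / w_bar + shift,
            x_ab * w_ab / w_bar - shift,
        )
    return x_AB + w_single * (x_Ab + x_aB) + w_ab * x_ab


GENES = "ABCDE"
LOCATIONS = (1, 19, 65, 93, 102)           # gene locations from the text
PAIRS = list(combinations(GENES, 2))       # the ten gene pairs
# Ten epistasis values from -0.0036 to 0.0036 in 0.0008 steps.
E_VALUES = [round(-0.0036 + 0.0008 * k, 4) for k in range(10)]
# Hypothetical assignment of values to pairs.
EPISTASIS = dict(zip(PAIRS, E_VALUES))


def order_fitness(order):
    """Average w100 over all ten pairs for one assignment of genes to locations."""
    location = dict(zip(order, LOCATIONS))
    total = 0.0
    for g1, g2 in PAIRS:
        d = abs(location[g1] - location[g2]) - 1  # assumed: genes in between
        total += w100(d, EPISTASIS[(g1, g2)])
    return total / len(PAIRS)


fitness_by_order = {order: order_fitness(order) for order in permutations(GENES)}
best = max(fitness_by_order, key=fitness_by_order.get)

if __name__ == "__main__":
    print("orders evaluated:", len(fitness_by_order))
    print("fittest order:", "".join(best))
```

The same loop extends to the star-like motif by replacing the ten pairs with the nine hub-containing pairs, at the cost of enumerating 9! orders.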

Epistasis

Epistasis values were retrieved from the studies of Costanzo et al. (Costanzo et al. 2010, 2016). Because the epistasis values of the overlapping gene pairs in these studies were highly correlated (r = 0.71, $P < 10^{-100}$, Pearson's correlation, N = 2,604,539), we merged their epistasis values. A pair of genes was filtered if (a) the epistasis value between their null mutations was not a number (NaN) or (b) the epistasis values of the same null mutation pair had opposite signs in reciprocal crosses or in different studies. We also removed gene pairs in which the null mutation of at least one gene resulted in a fitness value higher than that of the wild type. This filtering was performed because only a small number of antagonistic pleiotropic genes were detected in rich media (Qian et al. 2012). Thus, the elevated fitness observed upon gene deletion in Costanzo et al. was likely due to inaccurate estimation of fitness. If epistasis between a pair of genes was examined multiple times, we used the epistasis value with the smallest P value. Following Costanzo et al. (2010), epistasis values with P < 0.05 were classified as significant.
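The filtering rules in this section can be expressed as a short Python sketch. The record layout (tuples of gene names, epistasis, and P value) and the function name are hypothetical, and one simplification is assumed: a NaN measurement is dropped individually, so a pair disappears only when none of its measurements is usable.

```python
import math
from collections import defaultdict


def merge_epistasis(records, single_fitness, alpha=0.05):
    """Merge repeated measurements per gene pair, applying the stated filters.

    records: iterable of (gene1, gene2, epistasis, p_value)
    single_fitness: dict gene -> fitness of its null mutant (wild type = 1)
    Returns dict frozenset({gene1, gene2}) -> (epistasis, p_value, significant).
    """
    grouped = defaultdict(list)
    for g1, g2, eps, p in records:
        if math.isnan(eps):                      # filter (a): NaN epistasis
            continue
        grouped[frozenset((g1, g2))].append((eps, p))

    merged = {}
    for pair, measurements in grouped.items():
        # Remove pairs where a single null mutant appears fitter than wild type.
        if any(single_fitness[g] > 1.0 for g in pair):
            continue
        # Filter (b): opposite signs across reciprocal crosses or studies.
        signs = {eps > 0 for eps, _ in measurements if eps != 0}
        if len(signs) > 1:
            continue
        # Keep the measurement with the smallest P value.
        eps, p = min(measurements, key=lambda m: m[1])
        merged[pair] = (eps, p, p < alpha)
    return merged
```

Keying on a `frozenset` treats reciprocal crosses (gene1 × gene2 and gene2 × gene1) as the same pair, which is what the opposite-sign check requires.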

In addition to the two data sets from Costanzo et al., several other studies have also estimated epistasis values using high-throughput strategies (Boone et al. 2007). In principle, we could have included all of them in this study. However, for practical reasons, these data sets were not suitable for our study. First, it would not be appropriate to combine other data sets with that of Costanzo et al. because epistasis values from multiple studies potentially have different sources of error, especially if they followed different protocols. Second, the sample sizes of epistasis data of linked genes from other studies were not sufficiently large to examine the E–D correlation. Because the data set from Costanzo et al. is the largest available so far, containing 27 times more significant epistasis than the total of all other studies, we focused on these epistasis data in this study.

Recombination Frequency

Genotypes of meiosis products from 46 tetrads were downloaded from Mancera et al. (2008) at http://www.ebi.ac.uk/huber/recombination. A pair of DNA markers was filtered if the genotypes were available in less than half of the spores (< 92 spores). For each pair of markers, spores with parental genotypes ($N_p$) and nonparental genotypes ($N_{np}$) were counted. The recombination frequency of a pair of markers ($R_m$) was calculated as follows:

$$R_m = \frac{N_{np}}{N_p + N_{np}}$$

The recombination frequency between a pair of genes ($R_g$) was defined as the average recombination frequency of all marker pairs within the pair of genes, which was calculated as follows:

$$R_g = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J} R_{m,ij}$$

Subscripts i and j represent the ith and jth markers in each gene, respectively, and I and J denote the numbers of markers in the two genes.
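The two recombination-frequency definitions can be sketched as follows. The genotype encoding is an assumption for illustration: each marker is a list with one entry per spore, holding a parent-of-origin label (for example `"S"` or `"Y"`) or `None` when the genotype is missing, and a spore counts as nonparental when the two markers carry different labels. Function names are hypothetical.

```python
def marker_pair_recombination(marker1, marker2, min_spores=92):
    """R_m = N_np / (N_p + N_np) for one pair of markers.

    Returns None when fewer than `min_spores` spores are genotyped at both markers.
    """
    informative = [(a, b) for a, b in zip(marker1, marker2)
                   if a is not None and b is not None]
    if len(informative) < min_spores:
        return None
    n_np = sum(a != b for a, b in informative)   # nonparental spores
    return n_np / len(informative)


def gene_pair_recombination(markers_gene1, markers_gene2, min_spores=92):
    """R_g: average R_m over all marker pairs between two genes."""
    rates = [marker_pair_recombination(m1, m2, min_spores)
             for m1 in markers_gene1 for m2 in markers_gene2]
    rates = [r for r in rates if r is not None]
    return sum(rates) / len(rates) if rates else None
```

With 46 tetrads there are 184 spores, so the default threshold of 92 corresponds to the "less than half of the spores" filter in the text.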

Identification of Duplicate Genes

All-against-all BlastP was performed to search for duplicate genes. Gene pairs with E values $< 10^{-10}$ were defined as duplicate genes.

Expression Similarity

Expression profiles of 6,359 genes in 40 studies were compiled by a previous study (Kafri et al. 2005). Pearson's correlation coefficient for two genes was calculated within each study in the compiled data set, and expression similarity between these two genes was defined as the average correlation coefficient among studies.
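A minimal Python sketch of this definition follows. The data layout is assumed for illustration: each gene is represented by a dictionary mapping a study identifier to that gene's expression vector in the study, and only studies present for both genes are averaged.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def expression_similarity(profiles_a, profiles_b):
    """Average within-study Pearson correlation between two genes.

    profiles_a, profiles_b: dict study_id -> list of expression values.
    """
    shared_studies = profiles_a.keys() & profiles_b.keys()
    return mean(pearson(profiles_a[s], profiles_b[s]) for s in shared_studies)
```

Averaging per-study coefficients, rather than pooling all conditions into one vector, keeps studies with many conditions from dominating the similarity score.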

Other Yeast Functional Genomic Data

Protein–protein interaction data were obtained from the SGD. Three-dimensional chromatin colocalization data were retrieved from Duan et al. (2010). Genome-wide gene expression noise data were downloaded from Newman et al. (2006). A list of essential genes was downloaded from the Database of Essential Genes (DEG, version 10.6) (Luo et al. 2014). Semantic similarity of Gene Ontology (GO) terms was calculated using the R package GOSemSim (version 1.22.0) (Yu et al. 2010).

Shuffling of Gene Positions or Epistasis Values

To obtain the null distribution of E–D correlation coefficients, we shuffled gene positions while keeping the genome structure of S. cerevisiae unchanged (i.e., the number of chromosomes and the number of genes on each chromosome). The epistasis value of each gene pair was also kept unchanged. Gene distances were then calculated based on the new genomic locations.

To estimate the relative importance of different epistasis categories (positive and negative) in shaping the E–D correlation, we shuffled epistasis values of one category while keeping the other category unchanged. Gene positions were not changed during this process.
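The position-shuffling null model can be sketched in Python as below. This is an illustration under assumptions, not the published pipeline: genes are reassigned to random (chromosome, index) slots so that chromosome sizes are preserved, gene distance is taken as the index difference on the same chromosome, only pairs within `max_d` are kept, and Spearman's coefficient is computed on average ranks. All names are hypothetical.

```python
import random
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def null_ed_correlations(chrom_sizes, epistasis, n_shuffles, max_d=100, seed=0):
    """Null distribution of E-D correlations under shuffled gene positions.

    chrom_sizes: dict chromosome -> number of gene slots
    epistasis: dict (gene1, gene2) -> epistasis value (kept unchanged)
    """
    rng = random.Random(seed)
    slots = [(c, i) for c, n in chrom_sizes.items() for i in range(n)]
    genes = sorted({g for pair in epistasis for g in pair})
    null = []
    for _ in range(n_shuffles):
        positions = dict(zip(genes, rng.sample(slots, len(genes))))
        e_vals, d_vals = [], []
        for (g1, g2), e in epistasis.items():
            (c1, i1), (c2, i2) = positions[g1], positions[g2]
            if c1 == c2 and abs(i1 - i2) <= max_d:   # linked pairs only
                e_vals.append(e)
                d_vals.append(abs(i1 - i2))
        # Skip shuffles where the correlation is undefined.
        if len(set(e_vals)) > 1 and len(set(d_vals)) > 1:
            null.append(spearman(e_vals, d_vals))
    return null
```

An observed coefficient can then be compared with the returned list, for example by counting how many null values are at least as negative.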

Estimating the Predictive Accuracy of Gene Order

To examine the accuracy of our prediction of gene order, we estimated the success rate by chance (gray bars in fig. 6 and supplementary figs. S8 and S9, Supplementary Material online) by randomly assigning the positions of genes 1,000 times. For each permutation, the average proportion of successful predictions among motifs was estimated. The mean and standard deviation among the 1,000 permutations were then calculated.
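The chance success rate can be estimated with a loop like the Python sketch below. It simplifies the genome to a single linear array of positions (an assumption; chromosome structure is ignored here) and takes the success criterion as a caller-supplied function. The example criterion for star-like motifs, "the partner with the highest epistasis is strictly the closest to the hub", and all names are hypothetical.

```python
import random
from statistics import mean, stdev


def chance_success_rate(motifs, genes, is_success, n_perm=1000, seed=0):
    """Mean and SD, across permutations, of the proportion of successful motifs."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_perm):
        slots = list(range(len(genes)))
        rng.shuffle(slots)                       # random gene positions
        positions = dict(zip(genes, slots))
        successes = sum(1 for m in motifs if is_success(m, positions))
        rates.append(successes / len(motifs))
    return mean(rates), stdev(rates)


def top_partner_is_closest(motif, positions):
    """Example criterion for a star-like motif given as (hub, {partner: epistasis})."""
    hub, partners = motif
    top = max(partners, key=partners.get)
    dist = {g: abs(positions[g] - positions[hub]) for g in partners}
    return all(dist[top] < dist[g] for g in partners if g != top)
```

The observed proportion of successful predictions can then be compared with the returned mean and standard deviation, or ranked against the individual permutation rates to obtain a permutation P value.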

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Author Contributions

Y.-F.Y., W.C., and W.Q. conceived the research; Y.-F.Y., W.C., and W.Q. analyzed the data; and Y.-F.Y., S.W., and W.Q. wrote the article.

Acknowledgments

We thank Bin He and Weiwei Zhai for discussion. This work was supported by grants from the National Natural Science Foundation of China to W.Q. (91731302).

References

Batada NN, Hurst LD. 2007. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet. 39(8):945–949.

Boone C, Bussey H, Andrews BJ. 2007. Exploring genetic interactions and networks with yeast. Nat Rev Genet. 8(6):437–449.

Boutanaev AM, Kalmykova AI, Shevelyov YY, Nurminsky DI. 2002. Large clusters of co-expressed genes in the Drosophila genome. Nature 420(6916):666–669.

Byrne KP, Wolfe KH. 2005. The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 15(10):1456–1461.

Charlesworth B. 1990. Mutation-selection balance and the evolutionary advantage of sex and recombination. Genet Res. 55(3):199–221.

Charlesworth D, Charlesworth B. 2011. Mimicry: the hunting of the supergene. Curr Biol. 21(20):R846–R848.

Chen X, Zhang J. 2016. The genomic landscape of position effects on protein expression level and noise in yeast. Cell Syst. 2(5):347–354.

Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, et al. 1998. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2(1):65–73.

Cohen BA, Mitra RD, Hughes JD, Church GM. 2000. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet. 26(2):183–186.

Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, et al. 2010. The genetic landscape of a cell. Science 327(5964):425–431.

Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, et al. 2016. A global genetic interaction network maps a wiring diagram of cellular function. Science 353(6306):aaf1420.

Dean EJ, Davis JC, Davis RW, Petrov DA. 2008. Pervasive and persistent redundancy among duplicated genes in yeast. PLoS Genet. 4(7):e1000113.


DeLuna A, Vetsigian K, Shoresh N, Hegreness M, Colon-Gonzalez M, Chao S, Kishony R. 2008. Exposing the fitness contribution of duplicated genes. Nat Genet. 40(5):676–681.

Dixon SJ, Fedyshyn Y, Koh JL, Prasad TS, Chahwan C, Chua G, Toufighi K, Baryshnikova A, Hayles J, Hoe KL, et al. 2008. Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc Natl Acad Sci U S A. 105(43):16653–16658.

Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, Shendure J, Fields S, Blau CA, Noble WS. 2010. A three-dimensional model of the yeast genome. Nature 465(7296):363–367.

Eshel I, Feldman MW. 1970. On the evolutionary effect of recombination. Theor Popul Biol. 1(1):88–100.

Feldman MW, Christiansen FB, Brooks LD. 1980. Evolution of recombination in a constant environment. Proc Natl Acad Sci U S A. 77(8):4838–4841.

Ghanbarian AT, Hurst LD. 2015. Neighboring genes show correlated evolution in gene expression. Mol Biol Evol. 32(7):1748–1766.

Gordon JL, Byrne KP, Wolfe KH. 2009. Additions, losses, and rearrangements on the evolutionary route from a reconstructed ancestor to the modern Saccharomyces cerevisiae genome. PLoS Genet. 5(5):e1000485.

He X, Qian W, Wang Z, Li Y, Zhang J. 2010. Prevalent positive epistasis in Escherichia coli and Saccharomyces cerevisiae metabolic networks. Nat Genet. 42(3):272–276.

Hurst LD, Pal C, Lercher MJ. 2004. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 5(4):299–310.

Kafri R, Bar-Even A, Pilpel Y. 2005. Transcription control reprogramming in genetic backup circuits. Nat Genet. 37(3):295–299.

Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428(6983):617–624.

Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423(6937):241–254.

Kirkpatrick M. 2017. The evolution of genome structure by natural and sexual selection. J Hered. 108(1):3–11.

Kondrashov AS. 1982. Selection against harmful mutations in large sexual and asexual populations. Genet Res. 40(3):325–332.

Kondrashov AS. 1988. Deleterious mutations and the evolution of sexual reproduction. Nature 336(6198):435–440.

Kouyos RD, Silander OK, Bonhoeffer S. 2007. Epistasis between deleterious mutations and the evolution of recombination. Trends Ecol Evol. 22(6):308–315.

Lang GI, Botstein D. 2011. A test of the coordinated expression hypothesis for the origin and maintenance of the GAL cluster in yeast. PLoS One 6(9):e25290.

Lawrence J. 1999. Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev. 9(6):642–648.

Lawrence JG. 2002. Shared strategies in gene organization among prokaryotes and eukaryotes. Cell 110(4):407–413.

Lercher MJ, Urrutia AO, Hurst LD. 2002. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 31(2):180–183.

Liao BY, Zhang J. 2008. Coexpression of linked genes in Mammalian genomes is generally disadvantageous. Mol Biol Evol. 25(8):1555–1565.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458(7236):337–341.

Luo H, Lin Y, Gao F, Zhang CT, Zhang R. 2014. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res. 42(Database issue):D574–D580.

Mancera E, Bourgon R, Brozzi A, Huber W, Steinmetz LM. 2008. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454(7203):479–485.

Musso G, Costanzo M, Huangfu M, Smith AM, Paw J, San Luis BJ, Boone C, Giaever G, Nislow C, Emili A, et al. 2008. The extensive and condition-dependent nature of epistasis among whole-genome duplicates in yeast. Genome Res. 18(7):1092–1099.

Nei M. 1967. Modification of linkage intensity by natural selection. Genetics 57(3):625–641.

Nei M. 1969. Linkage modifications and sex difference in recombination. Genetics 63(3):681–699.

Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, Weissman JS. 2006. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441(7095):840–846.

Pal C, Hurst LD. 2003. Evidence for co-evolution of gene order and recombination rate. Nat Genet. 33(3):392–395.

Phillips PC. 2008. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 9(11):855–867.

Qian W, Liao BY, Chang AY, Zhang J. 2010. Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet. 26(10):425–430.

Qian W, Ma D, Xiao C, Wang Z, Zhang J. 2012. The genomic landscape and evolutionary resolution of antagonistic pleiotropy in yeast. Cell Rep. 2(5):1399–1410.

Qian W, Zhang J. 2008. Evolutionary dynamics of nematode operons: easy come, slow go. Genome Res. 18(3):412–421.

Qian W, Zhang J. 2014. Genomic evidence for adaptation by gene duplication. Genome Res. 24(8):1356–1362.

Schacherer J, Shapiro JA, Ruderfer DM, Kruglyak L. 2009. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458(7236):342–345.

Slot JC, Rokas A. 2010. Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi. Proc Natl Acad Sci U S A. 107(22):10136–10141.

Spellman PT, Rubin GM. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol. 1(1):5.

Tischler J, Lehner B, Chen N, Fraser AG. 2006. Combinatorial RNA interference in Caenorhabditis elegans reveals that redundancy between gene duplicates can be maintained for more than 80 million years of evolution. Genome Biol. 7(8):R69.

Tischler J, Lehner B, Fraser AG. 2008. Evolutionary plasticity of genetic interaction networks. Nat Genet. 40(4):390–391.

Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, et al. 2001. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294(5550):2364–2368.

van Leeuwen J, Pons C, Mellor JC, Yamaguchi TN, Friesen H, Koschwanez J, Usaj MM, Pechlaner M, Takar M, Usaj M, et al. 2016. Exploring genetic suppression interactions on a global scale. Science 354(6312):aag0839.

VanderSluis B, Bellay J, Musso G, Costanzo M, Papp B, Vizeacoumar FJ, Baryshnikova A, Andrews B, Boone C, Myers CL. 2010. Genetic interactions reveal the evolutionary trajectories of duplicate genes. Mol Syst Biol. 6:429.

Vavouri T, Semple JI, Lehner B. 2008. Widespread conservation of genetic redundancy during a billion years of eukaryotic evolution. Trends Genet. 24(10):485–488.

Wagner A. 2005. Energy constraints on the evolution of gene expression. Mol Biol Evol. 22(6):1365–1374.

Williams EJ, Bowles DJ. 2004. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 14(6):1060–1067.

Wong S, Wolfe KH. 2005. Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat Genet. 37(7):777–782.

Wu CI, Ting CT. 2004. Genes and speciation. Nat Rev Genet. 5(2):114–122.

Yeaman S. 2013. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proc Natl Acad Sci U S A. 110(19):E1743–E1751.

Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S. 2010. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26(7):976–978.

Zaslaver A, Baugh LR, Sternberg PW. 2011. Metazoan operons accelerate recovery from growth-arrested states. Cell 145(6):981–992.


In the studies by Costanzo et al., epistasis was classified into three categories: significantly positive, significantly negative, and nonsignificant (Costanzo et al. 2010, 2016). A strong negative correlation was observed between gene distance and the proportion of gene pairs with significant positive epistasis (fig. 4G, $\rho = -0.82$, $P = 5.5 \times 10^{-5}$), whereas no significant correlation was observed between gene distance and the proportion of gene pairs with significant negative epistasis (fig. 4H, $\rho = 0.43$, P = 0.088). All these observations emphasize the important role of positive epistasis in the origin of the negative E–D correlation.

Epistasis-Driven Evolution of Gene Order after the Whole Genome Duplication in Yeast

Thus far, we have observed a negative E–D correlation and demonstrated that the correlation was mainly attributable to positively epistatic gene pairs. The aforementioned theories, simulations, and empirical evidence led us to propose a hypothesis of epistasis-driven evolution of gene order in S. cerevisiae. The whole genome duplication (WGD, fig. 5A) and the subsequent extensive gene losses substantially changed the epistatic relations among genes and markedly rewired the genetic interaction network in yeast (Kellis et al. 2004; Dixon et al. 2008; Tischler et al. 2008; VanderSluis et al. 2010). For instance, only 29% of the identified synthetic lethality is conserved between Schizosaccharomyces pombe and S. cerevisiae (Dixon et al. 2008). At the same time, extensive chromosome rearrangement events occurred. For example, in a reconstructed ancestral species before the WGD (Byrne and Wolfe 2005), we identified the gene pairs that were located on the same chromosome; surprisingly, 84.6% of them are localized on different chromosomes in S. cerevisiae. More strikingly, among gene pairs that are localized on the same chromosome in both the ancestral species and S. cerevisiae, 99.6% differ in gene distance. Based on these observations, we proposed that the rewired genetic interaction network drove the evolution of gene order, resulting in numerous chromosome rearrangement events (Kellis et al. 2004). When gene losses ceased, the rewiring of genetic interactions slowed, and the evolutionary force on gene distance also diminished. Consistently, synteny relationships are strongly conserved in the species of the Saccharomyces sensu stricto group (Kellis et al. 2003).

Our model predicted that the negative E–D correlation should be weaker if the gene order in S. cerevisiae had remained unchanged since the WGD. The reason is that the gene order in the ancestor was not subject to the natural selection imposed by the genetic interaction network of the current S. cerevisiae genome. Furthermore, given the massive gene losses after the WGD, the genetic interaction network cannot be 100% conserved. To test this prediction, we calculated the gene distances in the reconstructed ancestral species mentioned earlier (Byrne and Wolfe 2005). Indeed, we found that the negative E–D correlation disappeared when the gene distances in S. cerevisiae were replaced by those in the ancestral species (fig. 5B, $\rho = 6.9 \times 10^{-3}$, P = 0.26, N = 26,630). This observation indicates that the negative E–D correlation in S. cerevisiae formed during the evolution of gene order after the WGD. Consistently, we observed that positively epistatic gene pairs decreased their distances, whereas negatively epistatic gene pairs increased their distances during evolution (fig. 5C, $\rho = -0.17$, P = 0.061, N = 127).

To further test the role of positive epistasis in the evolution of gene order, we identified genes that were ancestrally linked (i.e., D ≤ 100 in the reconstructed ancestor) and examined whether they moved toward or away from each other during evolution. Consistent with our model, genes with significant positive epistasis were more likely to move toward each other than genes without significant epistasis (fig. 5D, $P = 1.5 \times 10^{-3}$, two-tailed Fisher's exact test), whereas the difference between gene pairs with significant negative epistasis and those without significant epistasis was not significant (fig. 5D, P = 0.41, two-tailed Fisher's exact test). Together, these observations support our hypothesis of epistasis-driven evolution of gene order in yeast.

FIG. 5. The origin of the negative E–D correlation after the WGD in yeast. (A) Phylogenetic relationship among yeast species. The black arrow indicates the reconstructed ancestor, and the gray arrow indicates the WGD event. (B) The negative E–D correlation is not observed when the gene order in S. cerevisiae is replaced by that in the reconstructed ancestor. Gene pairs are separated into bins based on D, with an equal width of five genes. The mean and SEM of epistasis within each bin are shown. Spearman's correlation coefficient $\rho$ and the corresponding P values were calculated from the raw data (N = 26,630). The dashed line shows the average epistasis among unlinked genes. (C) The change in D (S. cerevisiae − the reconstructed ancestor) is negatively correlated with the epistasis in S. cerevisiae (N = 127). (D) The D between a gene pair in S. cerevisiae is compared with that in the reconstructed ancestor. Proportions of gene pairs moving toward and away from each other among gene pairs with significant positive epistasis (left), nonsignificant epistasis (middle), and significant negative epistasis (right) are shown.

Genetic Interaction Network Accurately Predicts Gene Order in Yeast

Finally, we determined whether the gene order in S. cerevisiae could be successfully predicted from the empirical data of genetic interaction networks (Costanzo et al. 2010, 2016). To this end, we identified 22 all-connected three-node motifs in which all three genes are localized within a 100-gene range on a chromosome. An example is shown in figure 6A, in which VMA22 positively interacts with SRB2 and negatively interacts with AIM17, and the epistatic interaction between SRB2 and AIM17 is weak. All six possible gene orders are enumerated in figure 6B. Among them, the first gene order exhibits a perfect E–D anticorrelation ($\rho = -1$, fig. 6C), which is exactly the prediction of our model. In fact, it is also the real order of these genes on chromosome VIII. The second gene order successfully places positively epistatic genes close to each other and negatively epistatic genes far from each other. Therefore, it is generally consistent with our prediction (fig. 6C). A motif was considered successfully predicted by our model if either of these two gene orders actually occurred in the yeast genome.
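The enumeration of six gene orders for a three-node motif can be sketched as follows. Only the signs of the three interactions come from the text; the numeric epistasis values and the three unequally spaced chromosomal slots are hypothetical placeholders chosen for illustration. The sketch flags only the order with $\rho = -1$ and does not try to reproduce the looser criterion used for the second "successful" order.

```python
from itertools import combinations, permutations

# Hypothetical magnitudes; only the signs (positive, negative, weak) follow the text.
EPISTASIS = {
    frozenset(("VMA22", "SRB2")): 0.05,
    frozenset(("VMA22", "AIM17")): -0.05,
    frozenset(("SRB2", "AIM17")): 0.0,
}
SLOTS = (0, 10, 40)  # hypothetical, unequally spaced positions on one chromosome


def rank(values):
    """Ranks (1 = smallest) for a list of distinct values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_no_ties(x, y):
    """Spearman's rho via the sum of squared rank differences (no ties assumed)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank(x), rank(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def ed_correlation(order):
    """E-D correlation across the three gene pairs for one gene order."""
    position = dict(zip(order, SLOTS))
    pairs = list(combinations(order, 2))
    e = [EPISTASIS[frozenset(p)] for p in pairs]
    d = [abs(position[a] - position[b]) for a, b in pairs]
    return spearman_no_ties(e, d)


rho_by_order = {order: ed_correlation(order)
                for order in permutations(("VMA22", "SRB2", "AIM17"))}
predicted = min(rho_by_order, key=rho_by_order.get)

if __name__ == "__main__":
    for order, rho in sorted(rho_by_order.items(), key=lambda kv: kv[1]):
        print(" - ".join(order), f"rho = {rho:+.1f}")
```

With three gene pairs and distinct distances, exactly one of the six orders can reach $\rho = -1$, which is why that order serves as the model's precise prediction.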

We examined the accuracy of our prediction at the genomic scale. We first divided these 22 motifs into two groups based on the magnitude of $\sigma_{epistasis}$: Group L contains 11 motifs with larger $\sigma_{epistasis}$, and Group S contains 11 motifs with smaller $\sigma_{epistasis}$ (fig. 6D). We found that the gene orders of seven (out of 11) motifs in Group L were precisely predicted by our model, and the accuracy was significantly higher than the random expectation (17.6%, fig. 6E, P = 0.001, permutation test). By contrast, the predictive accuracy in Group S (2 out of 11) was not significantly different from the random expectation (fig. 6E, P = 0.607, permutation test), because low variation among epistasis values in a motif reduces the selective coefficients (figs. 2E and 3E). More broadly, the proportion of successful predictions (first and second gene orders in fig. 6B) is 100% in Group L, significantly higher than the random expectation (33.3%, fig. 6F, P < 0.001, permutation test). This high predictive power suggests that epistasis plays a vital role in driving the evolution of gene order. We further identified 243 and 1,302 all-connected motifs within the range of 150 and 200 genes on the same chromosome, respectively, and again confirmed the predictive power of the genetic interaction network (supplementary fig. S8A–F, Supplementary Material online). Moreover, the predictive power of epistasis on gene order is independent of expression similarity because the latter could not accurately predict gene order (supplementary fig. S9A–I, Supplementary Material online).

We then determined whether gene order could be predicted when the genetic interaction network is incomplete. To this end, we identified 92 star-like motifs in which all genes are on the same chromosome and at least one gene lies within D ≤ 40 of the hub gene. For example, SFH1, which encodes a component of a chromatin remodeling complex, genetically interacts with 90 genes on the same chromosome (fig. 6G shows 20 genes with D < 100 to SFH1). Because the negative E–D correlation is mainly contributed by positively epistatic gene pairs in theory (figs. 1 and 2), our model predicted that genes having strong positive epistasis with SFH1 should be located close to it on the chromosome. Indeed, these genes (VRP1, FKS1, RPL26A, VPS38, DCR2) are located close to SFH1. Specifically, the gene having the strongest positive epistatic interaction with SFH1 (VRP1) was the closest gene to SFH1 on the chromosome (fig. 6H).

To examine the predictive accuracy at the genomic scale, we divided these 92 star-like motifs into two groups based on the difference between the top two highest epistasis values in the motif ($Diff_{epistasis}$, fig. 6J). We found that the proportion of successful predictions was 32.6% in Group L and 21.7% in Group S, both of which were significantly higher than the random expectation (fig. 6K, P < 0.001 for Group L and P = 0.001 for Group S, permutation test). Furthermore, despite the lower predictive power, our model could still predict gene locations in 707 star-like motifs in which the closest gene to the hub was within D = 90 (supplementary fig. S8G and H, Supplementary Material online; P = 0.002 for Group L and P = 0.036 for Group S, permutation test).

Discussion

In our study, we provided results from both computational simulation and empirical data analysis to support the role of genetic interaction networks in driving the evolution of gene order. We performed simulations with different sets of parameters, some of which were from empirical data or within the range of empirical data (Schacherer et al. 2009; Costanzo et al. 2010). The negative E–D correlation was observed with all parameter sets (supplementary figs. S4–S6, Supplementary Material online), consistent with both theoretical predictions (fig. 1A) and empirical results (fig. 4).

The epistasis values were estimated mainly from null alleles (Costanzo et al. 2010); thus, it remains unknown whether our conclusion applies to other deleterious alleles. Fortunately, Costanzo et al. also estimated 7,786,453 pairwise epistasis values for either decreased abundance by mRNA perturbation (DAmP) alleles or temperature-sensitive (ts) alleles, with mutations typically changing coding sequences, which offered us the opportunity to address this question (Costanzo et al. 2010, 2016). We found that the epistasis values of null alleles were significantly correlated with those of other deleterious alleles of the same gene (supplementary fig. S10A–L, Supplementary Material online), suggesting that the negative E–D correlation is likely a general phenomenon for various deleterious alleles. More importantly, because DNA deletion events that lead to null alleles are frequently observed in yeast natural populations (Schacherer et al. 2009), epistasis among null alleles by itself may be sufficiently strong to drive the evolution of gene distance.

In principle, if the frequencies of deleterious alleles in a population are significantly reduced by natural selection, the advantage of genetic linkage could be small for this population. However, numerous deleterious mutations (e.g., deletion of a whole gene, nonsense mutations, missense mutations, and mutations altering start codons) have been observed in natural populations (Liti et al. 2009; Schacherer et al. 2009), suggesting that the frequencies of deleterious alleles may not be low. This phenomenon is probably due to the antagonistic pleiotropic effects of a mutation in multiple environments (Qian et al. 2012).

It is worth noting that theoretical analysis predicts that epistasis can drive the evolution of recombination frequency rather than gene distance. In addition to gene distance, 1) the lengths of intergenic regions and 2) recombination hot/cold spots could also affect recombination frequency. Nevertheless, a strong correlation between gene distance and physical distance, that is, the number of nucleotides between two genes, was observed (ρ = 0.998, P < 10⁻¹⁰⁰, N = 1,624,935), suggesting that the effect of (1) is negligible. Furthermore, physical distance is highly correlated with recombination frequency when the physical distance is < 180 kb (supplementary fig. S1C, Supplementary Material online; ρ = 0.88, P < 10⁻¹⁰⁰, N = 336,227), suggesting that the effect of (2) is also limited. As a result, recombination frequency and gene distance are highly correlated (supplementary fig. S1B, Supplementary Material online; ρ = 0.87, P < 10⁻¹⁰⁰, N = 337,892 for gene pairs with D ≤ 100). Because the recombination frequency was only measured for 58% of gene pairs in S. cerevisiae (Mancera et al. 2008) and was unknown for the reconstructed ancestral species, we used gene distance to approximate recombination frequency in this study. The optimization of gene distances among multiple gene pairs may often involve the rearrangement of genes on a chromosome.

FIG. 6. Gene orders can be predicted from the yeast genetic interaction network. (A) A three-node all-connected motif in the yeast genetic interaction network. The standard deviation of epistasis values (σ_epistasis) in this motif is 0.05. (B) Six possible gene orders are listed. The first is the observed gene order on chromosome VIII in yeast. (C) The E–D relationship for the six possible gene orders in (B). We consider the prediction successful if the first two gene orders are observed because they exhibit negative E–D correlations. In particular, the prediction is precise if the first gene order is observed. The dashed line represents E = 0. (D) A histogram of the standard deviation of epistasis values (σ_epistasis), based on which genes are divided into two groups with similar sizes, Group S and Group L. (E) The proportions of precise prediction of Group S, Group L, and random (R) expectations based on permutation. Error bars represent standard deviations among 1,000 permutations. (F) The proportions of successful predictions, similar to (E). (G–I) A star-like motif in the yeast genetic interaction network and the gene order on chromosome XII in yeast. The hub gene SFH1 and 19 genes with the distance < 100 to SFH1 are shown. (J) A histogram of differences in the two largest epistasis values in the motif (Diff_epistasis), based on which genes are divided into two groups with similar sizes, Group S and Group L. (K) Proportion of successful predictions of the closest gene in Group S and Group L. Their respective random expectations based on permutation are shown in gray. Error bars represent standard deviations among 1,000 permutations.

It is also worth noting that synthetic genetic array (SGA), the experimental strategy used to generate double mutants, is based on recombination between two null alleles (Tong et al. 2001; Costanzo et al. 2010). Thus, double mutants for linked genes may have a smaller initial frequency, which may lead to inaccuracy in estimating fitness values for these mutants. To determine whether such potential experimental bias has any impact on our results, we calculated the partial E–D correlation after controlling for the double mutant fitness values. Again, we observed a strong negative E–D correlation (partial ρ = −0.17, P = 1.3 × 10⁻⁹, N = 1,254). This result was not unexpected because the fitness of double mutants and gene distance were only weakly correlated (ρ = 0.086, P = 0.002, N = 1,254). More discussion on the caveats in the analysis of empirical data is included in supplementary note S2, Supplementary Material online.
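The idea of controlling for a third variable can be illustrated with the standard first-order partial correlation formula. The sketch below is only an illustration: whether the authors used this exact estimator is an assumption, the function name `partial_correlation` is hypothetical, and the input coefficients are made-up numbers rather than values from the study.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y after controlling for z.

    Here x would be epistasis (E), y gene distance (D), and z the fitness
    of the double mutant; all three inputs are pairwise correlations.
    """
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator


# Hypothetical coefficients, chosen only to show the arithmetic.
example = partial_correlation(r_xy=-0.20, r_xz=0.10, r_yz=0.30)
print(round(example, 3))
```

When the control variable is uncorrelated with both E and D, the function returns the original coefficient unchanged, which matches the intuition that a weakly correlated covariate can only shift the E–D correlation slightly.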

In a recent study, 1,800 genetic suppression interactions were identified in the budding yeast (van Leeuwen et al. 2016), which can be regarded as an extreme form of positive genetic interactions. Among them, nine gene pairs are tightly linked (D ≤ 2; supplementary table S3, Supplementary Material online). This number is significantly greater than the random expectation (permutation test, P = 0.01), again supporting our model.

The selection on clusters of locally adapted alleles, which sometimes were captured in chromosomal inversions, has been studied previously (Yeaman 2013; Kirkpatrick 2017). In our study, we proposed a hypothesis (the negative E–D correlation) on deleterious alleles and tested this hypothesis with analyses on empirical data and simulation, both in the context of genetic interaction networks. Therefore, our results provide additional clues for understanding the basic principles of genome organization. In particular, the origin and maintenance of clusters of genes in the same metabolic pathway, which have puzzled scientists for years (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011), could be well explained by genetic interactions. Genes in a linear metabolic pathway exhibit positive epistasis because 1) double mutants and single mutants have identical impacts on destroying the function of the pathway (He et al. 2010), and 2) the accumulation of deleterious intermediate products resulting from a loss-of-function mutation in a downstream gene of a metabolic pathway may be prevented by a loss-of-function mutation in an upstream gene (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011). Because genetic linkage is advantageous among genes with positive epistasis, the clustering of genes in the same linear metabolic pathway, such as genes in the galactose utilization pathway or allantoin degradation pathway, is favored by natural selection (Wong and Wolfe 2005; Slot and Rokas 2010).

Many factors have been reported to influence the evolution of gene order, such as tandem gene duplication, position effects on gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016), coordinated gene expression among neighboring genes (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004), and clustering of functionally related genes (Wong and Wolfe 2005; Slot and Rokas 2010), among many others. In this study, we provided evidence that, in addition to these factors, the genetic interaction network also played an important role in driving the evolution of gene order. Because the empirical data of epistasis (fitness values) are available in the budding yeast, an evolutionary simulation integrating many genetic interactions is possible, which makes epistasis unique among all factors driving the evolution of gene order. Based on the empirical data of the yeast genetic interaction network, our simulation indicates that the selective coefficient is on the order of 10⁻⁷, suggesting that epistasis may play an important role in determining gene order in yeast, a species with a relatively large effective population size (10⁷). A negative E–D correlation is expected to be observed in species with a smaller effective population size only if the range of epistasis is larger than that in yeast (figs. 2 and 3). Because genome-wide empirical data of epistasis are unavailable in species with a smaller effective population size (e.g., humans and flies), whether the genetic interaction network plays a role in the evolution of gene order in these species requires further investigation.

Materials and Methods

Genomes
The genome annotation of S. cerevisiae was downloaded from the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org, version R64). The gene order of the reconstructed ancestor before the WGD (Gordon et al. 2009) was downloaded from the Yeast Gene Order Browser (YGOB) (Byrne and Wolfe 2005).

Simulation
To examine the fitness effect of gene distance (D), two-locus dynamics under selection were simulated for populations with different D (from 0 to 100 in 1-gene increments) as well as a series of epistasis values (E, from −0.004 to 0.004 in 0.001 increments) between two loci. The recombination frequency (R) between these two loci was estimated from D using the equation below:

\[
R = D \times 0.004 + 0.064,
\]

which is the linear relationship estimated from the distance between two genes (D) and the empirical recombination frequencies (R) between them quantified in a previous study (Mancera et al. 2008). The simulation was performed in haploid organisms in order to be in alignment with the analyses of empirical data, in which epistasis values were quantified in haploid yeast (Costanzo et al. 2010, 2016). A and B are wild-type alleles of two di-allelic loci on the same chromosome; a and b are their deleterious alleles, with fitness values of ω_aB = ω_Ab = 0.992. The fitness of the wild type (ω_AB) was defined as 1, and the epistasis E was defined as ω_ab − ω_aB ω_Ab. The initial allele frequencies of a and b were both 0.1. The two loci were initially under linkage equilibrium, and therefore the frequencies of AB (X_AB), Ab (X_Ab), aB (X_aB), and ab (X_ab) were 0.81, 0.09, 0.09, and 0.01, respectively. After random mating, selection, and recombination, the frequencies of the four genotypes in the next generation were calculated with the following equations (Nei 1967):

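\[
\begin{aligned}
X'_{AB} &= \frac{X_{AB}\,\omega_{AB}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}} \\
X'_{Ab} &= \frac{X_{Ab}\,\omega_{Ab}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}} \\
X'_{aB} &= \frac{X_{aB}\,\omega_{aB}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}} \\
X'_{ab} &= \frac{X_{ab}\,\omega_{ab}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}
\end{aligned}
\]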

ω̄ is the average fitness of a population and was calculated as follows:

\[
\bar{\omega} = X_{AB}\,\omega_{AB} + X_{ab}\,\omega_{ab} + X_{aB}\,\omega_{aB} + X_{Ab}\,\omega_{Ab}
\]
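The recursion above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the default parameters are the ones quoted in this section (single-mutant fitness 0.992, initial allele frequency 0.1, 100 generations), and counting "generation 100" as the state after 100 updates is an assumption about indexing.

```python
def recombination_frequency(distance):
    """Map gene distance D to recombination frequency R (linear fit quoted in the text)."""
    return 0.004 * distance + 0.064


def mean_fitness_after(distance, epistasis, generations=100,
                       single_fitness=0.992, q=0.1):
    """Return (mean fitness, genotype frequencies) after `generations` updates."""
    r = recombination_frequency(distance)
    # Wild type has fitness 1; E = w_ab - w_aB * w_Ab fixes the double mutant.
    w = {"AB": 1.0, "Ab": single_fitness, "aB": single_fitness,
         "ab": epistasis + single_fitness * single_fitness}
    # Start at linkage equilibrium: 0.81, 0.09, 0.09, 0.01 for q = 0.1.
    x = {"AB": (1 - q) ** 2, "Ab": (1 - q) * q, "aB": q * (1 - q), "ab": q * q}
    for _ in range(generations):
        w_bar = sum(x[g] * w[g] for g in x)
        disequilibrium = (x["AB"] * w["AB"] * x["ab"] * w["ab"]
                          - x["aB"] * w["aB"] * x["Ab"] * w["Ab"])
        shift = r * disequilibrium / w_bar ** 2
        x = {"AB": x["AB"] * w["AB"] / w_bar - shift,
             "Ab": x["Ab"] * w["Ab"] / w_bar + shift,
             "aB": x["aB"] * w["aB"] / w_bar + shift,
             "ab": x["ab"] * w["ab"] / w_bar - shift}
    return sum(x[g] * w[g] for g in x), x


# Compare a tightly linked pair (D = 0) with a distant pair (D = 50).
linked, _ = mean_fitness_after(distance=0, epistasis=0.004)
distant, _ = mean_fitness_after(distance=50, epistasis=0.004)
print(linked - distant)
```

With E = 0 the fitness values are multiplicative, selection generates no linkage disequilibrium, and the two populations should end with the same mean fitness regardless of D, which is a convenient sanity check on the recursion.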

For each population, the average fitness (ω̄) and the frequencies of deleterious alleles (X_a and X_b) were recorded in each generation of the simulation. The difference between the average fitness of two populations with different D (D = 50 and D = 0) reflects whether linkage is favored by natural selection or not. The frequency difference of the deleterious alleles between these two populations is related to the long-term effects of recombination. We also performed the simulation with a variety of parameter values (supplementary fig. S4, Supplementary Material online).

The average fitness of a population at the 100th generation (ω̄_100) was used to infer the fitness effect of gene distance at a given epistasis value. For each epistasis value, we could identify an optimal gene distance (D_opt) as well as a series of permitted gene distances (D_permitted). The difference in ω̄_100 between D_permitted and D_opt was < 10⁻⁷. The mean of all D_permitted values was calculated. In addition, our simulation did not introduce new mutations to the population, and therefore the frequencies of deleterious alleles decreased over generations. Nevertheless, the negative E–D correlation was also observed at the 50th and 200th generations (supplementary fig. S3, Supplementary Material online).
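The selection of the optimal and permitted gene distances can be sketched as follows. The function name `permitted_distances` and the toy fitness table are hypothetical; the only element taken from the text is the 10⁻⁷ tolerance on the difference in mean fitness at generation 100.

```python
def permitted_distances(fitness_by_distance, tolerance=1e-7):
    """Return (D_opt, sorted permitted distances, mean permitted distance).

    `fitness_by_distance` maps each gene distance D to the mean fitness at
    generation 100 obtained for one fixed epistasis value.
    """
    d_opt = max(fitness_by_distance, key=fitness_by_distance.get)
    best = fitness_by_distance[d_opt]
    permitted = sorted(d for d, w in fitness_by_distance.items()
                       if best - w < tolerance)
    return d_opt, permitted, sum(permitted) / len(permitted)


# Made-up fitness values for four distances, for illustration only.
toy_table = {0: 0.9900000, 1: 0.98999995, 2: 0.9899998, 3: 0.9899990}
print(permitted_distances(toy_table))
```

In this toy table only D = 0 and D = 1 fall within the tolerance of the optimum, so the mean permitted distance is 0.5.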

Toy Motifs
We built a toy motif in a genetic interaction network to simulate the fitness effect of gene distance. In the star-like motif (fig. 2A), a hub gene (A) interacted with nine partners (genes B, C, D, E, F, G, H, I, and J) with nine epistasis values ranging from −0.004 to 0.004 in 0.001 increments. We attempted to place nine partner genes on the same side of the hub gene on a chromosome. Because we focused on the epistasis between the hub gene and the partners, placing the partners on different sides would not affect the results. We fixed the chromosomal location of the hub gene, and the partner genes were placed at D = 20, 30, 40, 50, 60, 70, 80, 90, and 100 from the hub gene. The locations of the nine partners were shuffled while keeping the epistasis values unchanged. The fitness of a gene order was defined as the average ω̄_100 for all nine hub-containing gene pairs. Thus, we could calculate the fitness for a total of (9! =) 362,880 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to test whether the simulation result was parameter sensitive (supplementary fig. S5, Supplementary Material online). Here, epistasis values and fitness defects were randomly chosen from the empirical data in previous studies (Costanzo et al. 2010, 2016).
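The exhaustive search over gene orders in the star-like motif can be sketched as below. To keep the example short, the pair fitness is a hypothetical stand-in (`toy_pair_fitness`) rather than the simulated mean fitness at generation 100, and only four partners are used; all names and numbers are illustrative assumptions.

```python
from itertools import permutations


def best_star_order(epistasis_by_partner, slots, pair_fitness):
    """Try every assignment of partners to slots and return the fittest one.

    The fitness of an order is the average pair fitness over all
    hub-containing gene pairs, as in the text.
    """
    partners = sorted(epistasis_by_partner)
    best_order, best_fitness, n_orders = None, float("-inf"), 0
    for order in permutations(partners):
        n_orders += 1
        fitness = sum(pair_fitness(epistasis_by_partner[p], d)
                      for p, d in zip(order, slots)) / len(slots)
        if fitness > best_fitness:
            best_order, best_fitness = order, fitness
    return best_order, best_fitness, n_orders


def toy_pair_fitness(epistasis, distance):
    """Hypothetical surrogate: positive epistasis is penalized by distance."""
    return 1.0 - epistasis * distance


partners = {"B": -0.002, "C": 0.004, "D": 0.0, "E": 0.002}
order, fitness, n_orders = best_star_order(partners, [20, 30, 40, 50], toy_pair_fitness)
print(order, n_orders)
```

With nine partners the same loop visits 9! = 362,880 orders; replacing the surrogate with a simulated pair fitness would reproduce the procedure described above under the stated assumptions.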

For the all-connected motif (fig. 3A), five genes (A, B, C, D, and E) interacted with each other with ten epistasis values ranging from −0.0036 to 0.0036 in 0.0008 increments. The five genes were placed on the same chromosome, and the gene locations were set as 1, 19, 65, 93, and 102 so that the maximum D in the motif was 100. The locations of these five genes were shuffled, whereas the epistasis values were kept unchanged. Similar to the star-like motifs, the fitness of a gene order was defined as the average ω̄_100 for all ten gene pairs. The fitness was calculated for a total of (5! =) 120 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to determine whether the simulation result was parameter sensitive (supplementary fig. S6, Supplementary Material online).

Epistasis
Epistasis values were retrieved from the studies of Costanzo et al. (2010, 2016). Because the epistasis values of the overlapping gene pairs in these studies were highly correlated (r = 0.71, P < 10⁻¹⁰⁰, Pearson's correlation, N = 2,604,539), we merged their epistasis values. A pair of genes was filtered if (a) the epistasis value between their null mutations was not a number (NaN) or (b) the epistasis values of the same null mutation pair had opposite signs in reciprocal crosses or in different studies. We also removed the gene pairs with the null mutation in at least one gene resulting in a higher fitness value than that of the wild type. This filtering was performed because only a small number of antagonistic pleiotropic genes were detected in rich media (Qian et al. 2012). Thus, the elevated fitness observed upon gene deletion in Costanzo et al. was likely due to inaccurate estimation of fitness. If epistasis between a pair of genes was examined multiple times, we used the epistasis value with the smallest P value. Following Costanzo et al. (2010), epistasis values with P < 0.05 were classified as significant.
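The filtering rules in this paragraph can be summarized in a small function. The data layout (a list of `(epistasis, p_value)` tuples per gene pair plus the two single-mutant fitness values) and the function name are hypothetical, and treating a pair as filtered only when no numeric measurement remains is an assumption about how rule (a) was applied.

```python
import math


def select_epistasis(measurements, fitness_a, fitness_b, wild_type_fitness=1.0):
    """Return (epistasis, p_value, significant) for one gene pair, or None if filtered.

    `measurements` holds (epistasis, p_value) tuples from reciprocal crosses
    and from both data sets.
    """
    valid = [(e, p) for e, p in measurements if not math.isnan(e)]
    if not valid:
        return None  # rule (a): no numeric epistasis value
    signs = {e > 0 for e, _ in valid if e != 0}
    if len(signs) > 1:
        return None  # rule (b): opposite signs across measurements
    if fitness_a > wild_type_fitness or fitness_b > wild_type_fitness:
        return None  # a single deletion appears fitter than the wild type
    epistasis, p_value = min(valid, key=lambda m: m[1])  # smallest P value wins
    return epistasis, p_value, p_value < 0.05


print(select_epistasis([(0.03, 0.2), (0.05, 0.01)], fitness_a=0.90, fitness_b=0.95))
```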

In addition to the two data sets from Costanzo et al., several other studies have also estimated epistasis values using high-throughput strategies (Boone et al. 2007). In principle, we could have included all of them in this study. However, for practical reasons, these data sets were not suitable for our study. First, it would not be appropriate to combine other data sets with that of Costanzo et al. because epistasis values from multiple studies potentially have different sources of error, especially if they followed different protocols. Second, the sample sizes of epistasis data of linked genes from other studies were not sufficiently large to examine the E–D correlation. Because the data set from Costanzo et al. represents the largest available so far, containing 27 times more significant epistasis than the total of all other studies, we focused on these epistasis data in this study.

Recombination Frequency
Genotypes of meiosis products from 46 tetrads were downloaded from Mancera et al. (2008) at http://www.ebi.ac.uk/huber/recombination. A pair of DNA markers was filtered if the genotypes were available in less than half of the spores (< 92 spores). For each pair of markers, spores with parental genotypes (N_p) and nonparental genotypes (N_np) were counted. The recombination frequency of a pair of markers (R_m) was calculated as follows:

\[
R_m = \frac{N_{np}}{N_p + N_{np}}
\]

The recombination frequency between a pair of genes (R_g) was defined as the average recombination frequency of all marker pairs within the pair of genes, which was calculated as follows:

\[
R_g = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} R_{m,ij}
\]

Subscripts i and j represent the ith and jth markers in each gene, respectively, and I and J are the numbers of markers in the two genes.
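The two definitions above can be sketched as follows. The genotype encoding is an assumption: each marker is a list with one entry per spore, coded by parent of origin (for example 0 or 1), with `None` for a missing genotype, so that equal codes at both markers indicate a parental combination. The function names are hypothetical.

```python
def marker_recombination(genotypes_1, genotypes_2, min_spores=92):
    """R_m = N_np / (N_p + N_np) for one pair of markers, or None if filtered."""
    parental = nonparental = 0
    for g1, g2 in zip(genotypes_1, genotypes_2):
        if g1 is None or g2 is None:
            continue  # genotype unavailable in this spore
        if g1 == g2:
            parental += 1
        else:
            nonparental += 1
    total = parental + nonparental
    if total < min_spores:
        return None  # genotypes available in too few spores
    return nonparental / total


def gene_recombination(markers_in_gene_1, markers_in_gene_2, min_spores=92):
    """R_g: average R_m over all retained marker pairs between two genes."""
    rates = []
    for m1 in markers_in_gene_1:
        for m2 in markers_in_gene_2:
            rate = marker_recombination(m1, m2, min_spores)
            if rate is not None:
                rates.append(rate)
    return sum(rates) / len(rates) if rates else None


# Toy data with five spores; the threshold is lowered only for this example.
print(marker_recombination([0, 0, 1, 1, None], [0, 1, 1, 1, 0], min_spores=4))
```

The default threshold of 92 corresponds to half of the 184 spores from 46 tetrads.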

Identification of Duplicate Genes
All-against-all BlastP was performed to search for duplicate genes. Gene pairs with E values < 10⁻¹⁰ were defined as duplicate genes.
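The E-value cutoff can be applied to BLAST results with a short filter such as the one below. It assumes the hits were exported in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E value, bit score); the text does not state which output format was used, and the gene names in the example are placeholders.

```python
def duplicate_pairs(blast_lines, max_evalue=1e-10):
    """Return the set of unordered gene pairs whose BlastP E value is below the cutoff."""
    pairs = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query != subject and evalue < max_evalue:
            pairs.add(tuple(sorted((query, subject))))
    return pairs


example_lines = [
    "geneA\tgeneB\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-50\t400",
    "geneA\tgeneA\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600",
    "geneA\tgeneC\t30.0\t80\t56\t2\t1\t80\t5\t84\t0.002\t35",
]
print(duplicate_pairs(example_lines))
```

Self hits are ignored, and reciprocal hits collapse into a single unordered pair.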

Expression Similarity
Expression profiles of 6,359 genes in 40 studies were compiled by a previous study (Kafri et al. 2005). Pearson's correlation coefficient for two genes was calculated within each study in the compiled data set, and expression similarity between these two genes was defined as the average correlation coefficient among studies.
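The averaging of per-study correlations can be written compactly. In this sketch each gene is represented by a list of expression profiles, one per study, in the same study order for both genes; this layout and the function names are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def expression_similarity(profiles_1, profiles_2):
    """Average the within-study Pearson correlation between two genes."""
    coefficients = [pearson(p1, p2) for p1, p2 in zip(profiles_1, profiles_2)]
    return sum(coefficients) / len(coefficients)


# Two toy studies: perfectly correlated in the first, anticorrelated in the second.
gene_1 = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
gene_2 = [[2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
print(expression_similarity(gene_1, gene_2))
```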

Other Yeast Functional Genomic Data
Protein–protein interaction data were obtained from the SGD. Three-dimensional chromatin colocalization data were retrieved from Duan et al. (2010). Genome-wide gene expression noise data were downloaded from Newman et al. (2006). A list of essential genes was downloaded from the Database of Essential Genes (DEG, version 10.6) (Luo et al. 2014). Semantic similarity of Gene Ontology (GO) terms was calculated using the R package GOSemSim (version 1.22.0) (Yu et al. 2010).

Shuffling of Gene Positions or Epistasis Values
To obtain the null distribution of E–D correlation coefficients, we shuffled gene positions while keeping the genome structure of S. cerevisiae unchanged (i.e., number of chromosomes and number of genes on each chromosome). The epistasis value of each gene pair was also kept unchanged. Gene distances were then calculated based on the new genomic locations.

To estimate the relative importance of different epistasis categories (positive and negative) in shaping the E–D correlation, we shuffled epistasis values of one category while keeping the other category unchanged. Gene positions were not changed during this process.
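The position-shuffling procedure can be sketched as below. Several details are assumptions made for illustration: the genome is a dictionary from chromosome name to an ordered gene list, gene distance D is taken as the difference in gene rank on a chromosome, the correlation is Spearman's rank correlation, and all function names are hypothetical.

```python
import random


def shuffle_gene_positions(genome, rng):
    """Shuffle genes across the genome, keeping each chromosome's gene count."""
    genes = [g for chromosome in genome.values() for g in chromosome]
    rng.shuffle(genes)
    shuffled, start = {}, 0
    for name, chromosome in genome.items():
        shuffled[name] = genes[start:start + len(chromosome)]
        start += len(chromosome)
    return shuffled


def gene_distances(genome, pairs):
    """Return D for every pair located on the same chromosome."""
    location = {g: (name, i) for name, chrom in genome.items()
                for i, g in enumerate(chrom)}
    distances = {}
    for a, b in pairs:
        (chrom_a, i), (chrom_b, j) = location[a], location[b]
        if chrom_a == chrom_b:
            distances[(a, b)] = abs(i - j)
    return distances


def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (None if undefined)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / var ** 0.5 if var > 0 else None


def null_correlations(genome, epistasis, n_shuffles=1000, max_distance=100, seed=0):
    """E-D correlations after shuffling positions; epistasis values stay fixed."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_shuffles):
        distances = gene_distances(shuffle_gene_positions(genome, rng), epistasis)
        kept = [(epistasis[p], d) for p, d in distances.items() if d <= max_distance]
        if len(kept) > 1:
            rho = spearman([e for e, _ in kept], [d for _, d in kept])
            if rho is not None:
                null.append(rho)
    return null
```

Here `epistasis` is a dictionary keyed by gene pair, so the same keys drive both the distance lookup and the correlation. Comparing the observed E–D correlation with the list returned by `null_correlations` gives an empirical P value.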

Estimating the Predictive Accuracy of Gene Order
To examine the accuracy of our prediction of gene order, we estimated the success rate by chance (gray bars in fig. 6 and supplementary figs. S8 and S9, Supplementary Material online) by randomly assigning the positions of genes 1,000 times. For each permutation, the average proportion of successful predictions among motifs was estimated. The mean and standard deviation among the 1,000 permutations were then calculated.
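The random expectation for star-like motifs can be approximated with a permutation loop like the one below. This is a simplified sketch: positions are shuffled within each motif rather than across the genome, success is defined as the partner with the largest epistasis value also being the closest one, and all names and numbers are hypothetical.

```python
import random
import statistics


def closest_is_strongest(epistasis_values, distances):
    """True if the partner with the largest epistasis value is also the closest."""
    strongest = max(range(len(epistasis_values)), key=epistasis_values.__getitem__)
    closest = min(range(len(distances)), key=distances.__getitem__)
    return strongest == closest


def random_expectation(motifs, n_permutations=1000, seed=0):
    """Mean and SD of the success proportion when partner positions are shuffled."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_permutations):
        successes = 0
        for epistasis_values, distances in motifs:
            shuffled = list(distances)
            rng.shuffle(shuffled)
            successes += closest_is_strongest(epistasis_values, shuffled)
        proportions.append(successes / len(motifs))
    return statistics.mean(proportions), statistics.stdev(proportions)


# Twenty identical toy motifs with four partners each.
toy_motifs = [([0.01, 0.03, -0.02, 0.0], [5, 12, 30, 44])] * 20
print(random_expectation(toy_motifs, n_permutations=1000, seed=1))
```

With four partners per motif the expected success rate by chance is about one in four, and the standard deviation across permutations plays the role of the error bars described above.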

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.

Author Contributions
Y.-F.Y., W.C., and W.Q. conceived the research; Y.-F.Y., W.C., and W.Q. analyzed the data; and Y.-F.Y., S.W., and W.Q. wrote the article.

Acknowledgments
We thank Bin He and Weiwei Zhai for discussion. This work was supported by grants from the National Natural Science Foundation of China to W.Q. (91731302).

References
Batada NN, Hurst LD. 2007. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet. 39(8):945–949.

Boone C, Bussey H, Andrews BJ. 2007. Exploring genetic interactions and networks with yeast. Nat Rev Genet. 8(6):437–449.

Boutanaev AM, Kalmykova AI, Shevelyov YY, Nurminsky DI. 2002. Large clusters of co-expressed genes in the Drosophila genome. Nature 420(6916):666–669.

Byrne KP, Wolfe KH. 2005. The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 15(10):1456–1461.

Charlesworth B. 1990. Mutation-selection balance and the evolutionary advantage of sex and recombination. Genet Res. 55(3):199–221.

Charlesworth D, Charlesworth B. 2011. Mimicry: the hunting of the supergene. Curr Biol. 21(20):R846–R848.

Chen X, Zhang J. 2016. The genomic landscape of position effects on protein expression level and noise in yeast. Cell Syst. 2(5):347–354.

Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, et al. 1998. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell. 2(1):65–73.

Cohen BA, Mitra RD, Hughes JD, Church GM. 2000. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet. 26(2):183–186.

Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, et al. 2010. The genetic landscape of a cell. Science 327(5964):425–431.

Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, et al. 2016. A global genetic interaction network maps a wiring diagram of cellular function. Science 353(6306):aaf1420.

Dean EJ, Davis JC, Davis RW, Petrov DA. 2008. Pervasive and persistent redundancy among duplicated genes in yeast. PLoS Genet. 4(7):e1000113.


DeLuna A, Vetsigian K, Shoresh N, Hegreness M, Colon-Gonzalez M, Chao S, Kishony R. 2008. Exposing the fitness contribution of duplicated genes. Nat Genet. 40(5):676–681.

Dixon SJ, Fedyshyn Y, Koh JL, Prasad TS, Chahwan C, Chua G, Toufighi K, Baryshnikova A, Hayles J, Hoe KL, et al. 2008. Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc Natl Acad Sci U S A. 105(43):16653–16658.

Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, Shendure J, Fields S, Blau CA, Noble WS. 2010. A three-dimensional model of the yeast genome. Nature 465(7296):363–367.

Eshel I, Feldman MW. 1970. On the evolutionary effect of recombination. Theor Popul Biol. 1(1):88–100.

Feldman MW, Christiansen FB, Brooks LD. 1980. Evolution of recombination in a constant environment. Proc Natl Acad Sci U S A. 77(8):4838–4841.

Ghanbarian AT, Hurst LD. 2015. Neighboring genes show correlated evolution in gene expression. Mol Biol Evol. 32(7):1748–1766.

Gordon JL, Byrne KP, Wolfe KH. 2009. Additions, losses, and rearrangements on the evolutionary route from a reconstructed ancestor to the modern Saccharomyces cerevisiae genome. PLoS Genet. 5(5):e1000485.

He X, Qian W, Wang Z, Li Y, Zhang J. 2010. Prevalent positive epistasis in Escherichia coli and Saccharomyces cerevisiae metabolic networks. Nat Genet. 42(3):272–276.

Hurst LD, Pal C, Lercher MJ. 2004. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 5(4):299–310.

Kafri R, Bar-Even A, Pilpel Y. 2005. Transcription control reprogramming in genetic backup circuits. Nat Genet. 37(3):295–299.

Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428(6983):617–624.

Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423(6937):241–254.

Kirkpatrick M. 2017. The evolution of genome structure by natural and sexual selection. J Hered. 108(1):3–11.

Kondrashov AS. 1982. Selection against harmful mutations in large sexual and asexual populations. Genet Res. 40(3):325–332.

Kondrashov AS. 1988. Deleterious mutations and the evolution of sexual reproduction. Nature 336(6198):435–440.

Kouyos RD, Silander OK, Bonhoeffer S. 2007. Epistasis between deleterious mutations and the evolution of recombination. Trends Ecol Evol. 22(6):308–315.

Lang GI, Botstein D. 2011. A test of the coordinated expression hypothesis for the origin and maintenance of the GAL cluster in yeast. PLoS One 6(9):e25290.

Lawrence J. 1999. Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev. 9(6):642–648.

Lawrence JG. 2002. Shared strategies in gene organization among prokaryotes and eukaryotes. Cell 110(4):407–413.

Lercher MJ, Urrutia AO, Hurst LD. 2002. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 31(2):180–183.

Liao BY, Zhang J. 2008. Coexpression of linked genes in Mammalian genomes is generally disadvantageous. Mol Biol Evol. 25(8):1555–1565.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458(7236):337–341.

Luo H, Lin Y, Gao F, Zhang CT, Zhang R. 2014. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res. 42(Database issue):D574–D580.

Mancera E, Bourgon R, Brozzi A, Huber W, Steinmetz LM. 2008. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454(7203):479–485.

Musso G, Costanzo M, Huangfu M, Smith AM, Paw J, San Luis BJ, Boone C, Giaever G, Nislow C, Emili A, et al. 2008. The extensive and condition-dependent nature of epistasis among whole-genome duplicates in yeast. Genome Res. 18(7):1092–1099.

Nei M. 1967. Modification of linkage intensity by natural selection. Genetics 57(3):625–641.

Nei M. 1969. Linkage modifications and sex difference in recombination. Genetics 63(3):681–699.

Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, Weissman JS. 2006. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441(7095):840–846.

Pal C, Hurst LD. 2003. Evidence for co-evolution of gene order and recombination rate. Nat Genet. 33(3):392–395.

Phillips PC. 2008. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 9(11):855–867.

Qian W, Liao BY, Chang AY, Zhang J. 2010. Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet. 26(10):425–430.

Qian W, Ma D, Xiao C, Wang Z, Zhang J. 2012. The genomic landscape and evolutionary resolution of antagonistic pleiotropy in yeast. Cell Rep. 2(5):1399–1410.

Qian W, Zhang J. 2008. Evolutionary dynamics of nematode operons: easy come, slow go. Genome Res. 18(3):412–421.

Qian W, Zhang J. 2014. Genomic evidence for adaptation by gene duplication. Genome Res. 24(8):1356–1362.

Schacherer J, Shapiro JA, Ruderfer DM, Kruglyak L. 2009. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458(7236):342–345.

Slot JC, Rokas A. 2010. Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi. Proc Natl Acad Sci U S A. 107(22):10136–10141.

Spellman PT, Rubin GM. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol. 1(1):5.

Tischler J, Lehner B, Chen N, Fraser AG. 2006. Combinatorial RNA interference in Caenorhabditis elegans reveals that redundancy between gene duplicates can be maintained for more than 80 million years of evolution. Genome Biol. 7(8):R69.

Tischler J, Lehner B, Fraser AG. 2008. Evolutionary plasticity of genetic interaction networks. Nat Genet. 40(4):390–391.

Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, et al. 2001. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294(5550):2364–2368.

van Leeuwen J, Pons C, Mellor JC, Yamaguchi TN, Friesen H, Koschwanez J, Usaj MM, Pechlaner M, Takar M, Usaj M, et al. 2016. Exploring genetic suppression interactions on a global scale. Science 354(6312):aag0839.

VanderSluis B, Bellay J, Musso G, Costanzo M, Papp B, Vizeacoumar FJ, Baryshnikova A, Andrews B, Boone C, Myers CL. 2010. Genetic interactions reveal the evolutionary trajectories of duplicate genes. Mol Syst Biol. 6:429.

Vavouri T, Semple JI, Lehner B. 2008. Widespread conservation of genetic redundancy during a billion years of eukaryotic evolution. Trends Genet. 24(10):485–488.

Wagner A. 2005. Energy constraints on the evolution of gene expression. Mol Biol Evol. 22(6):1365–1374.

Williams EJ, Bowles DJ. 2004. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 14(6):1060–1067.

Wong S, Wolfe KH. 2005. Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat Genet. 37(7):777–782.

Wu CI, Ting CT. 2004. Genes and speciation. Nat Rev Genet. 5(2):114–122.

Yeaman S. 2013. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proc Natl Acad Sci U S A. 110(19):E1743–E1751.

Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S. 2010. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26(7):976–978.

Zaslaver A, Baugh LR, Sternberg PW. 2011. Metazoan operons accelerate recovery from growth-arrested states. Cell 145(6):981–992.



which all three genes are localized within a 100-gene range on a chromosome. An example is shown in figure 6A, in which VMA22 positively interacts with SRB2 and negatively interacts with AIM17, and the epistatic interaction between SRB2 and AIM17 is weak. All six possible gene orders are enumerated in figure 6B. Among them, the first gene order exhibits a perfect E–D anticorrelation (ρ = −1, fig. 6C), which is exactly the prediction of our model. In fact, it is also the real order of these genes on chromosome VIII. The second gene order successfully places positively epistatic genes close to each other and negatively epistatic genes far from each other. Therefore, it is generally consistent with our prediction (fig. 6C). Either gene order was considered as being successfully predicted by our model if it actually occurred in the yeast genome.
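The enumeration of gene orders for a three-gene motif can be sketched as follows. The epistasis values and chromosomal slots below are hypothetical numbers chosen only to mimic the qualitative pattern described for VMA22, SRB2, and AIM17 (one positive, one weak, and one negative interaction); they are not the measured values, and using a rank correlation without ties is an assumption.

```python
from itertools import permutations


def spearman_no_ties(x, y):
    """Spearman's rho for equally long sequences without tied values."""
    rank_x = {v: i for i, v in enumerate(sorted(x))}
    rank_y = {v: i for i, v in enumerate(sorted(y))}
    n = len(x)
    d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def score_orders(epistasis, slots):
    """Return {gene order: E-D correlation} for every assignment of genes to slots."""
    genes = sorted({g for pair in epistasis for g in pair})
    results = {}
    for order in permutations(genes):
        position = dict(zip(order, slots))
        e_values, d_values = [], []
        for (a, b), e in epistasis.items():
            e_values.append(e)
            d_values.append(abs(position[a] - position[b]))
        results[order] = spearman_no_ties(e_values, d_values)
    return results


# Hypothetical epistasis values and slot coordinates, for illustration only.
epistasis = {("VMA22", "SRB2"): 0.06, ("SRB2", "AIM17"): 0.0, ("VMA22", "AIM17"): -0.06}
scores = score_orders(epistasis, slots=[10, 20, 50])
predicted = min(scores, key=scores.get)
print(predicted, scores[predicted])
```

The order with the most negative E–D correlation is the one the model predicts; in this toy setting it places the positively interacting pair in adjacent, closely spaced slots and the negatively interacting pair at the two ends.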

We examined the accuracy of our prediction at the genomic scale. We first divided these 22 motifs into two groups based on the magnitude of σ_epistasis: Group L contains 11 motifs with larger σ_epistasis, and Group S contains 11 motifs with smaller σ_epistasis (fig. 6D). We found that the gene orders of seven (out of 11) motifs in Group L were precisely predicted by our model, and the accuracy was significantly higher than the random expectation (17.6%, fig. 6E, P = 0.001, permutation test). By contrast, the predictive accuracy in Group S (2 out of 11) was not significantly different from the random expectation (fig. 6E, P = 0.607, permutation test) because low variation among epistasis values in a motif reduces the selective coefficients (figs. 2E and 3E). More broadly, the proportion of successful predictions (first and second gene orders in fig. 6B) is 100% in Group L, significantly higher than the random expectation (33.3%, fig. 6F, P < 0.001, permutation test). This high predictive power suggests that epistasis plays a vital role in driving the evolution of gene order. We further identified 243 and 1,302 all-connected motifs within the range of 150 and 200 genes on the same chromosome, respectively, and again confirmed the predictive power of the genetic interaction network (supplementary fig. S8A–F, Supplementary Material online). Moreover, the predictive power of epistasis on gene order is independent of expression similarity because the latter could not accurately predict gene order (supplementary fig. S9A–I, Supplementary Material online).

We then determined whether gene order could be predicted when the genetic interaction network is incomplete. To this end, we identified 92 star-like motifs in which all genes are on the same chromosome and at least one gene is within D ≤ 40 of the hub gene. For example, SFH1, which encodes a component of a chromatin remodeling complex, genetically interacts with 90 genes on the same chromosome (fig. 6G shows 20 genes with D < 100 to SFH1). Because the negative E–D correlation is mainly contributed by positively epistatic gene pairs in theory (figs. 1 and 2), our model predicted that genes having strong positive epistasis with SFH1 should be located close to it on the chromosome. Indeed, these genes (VRP1, FKS1, RPL26A, VPS38, DCR2) are located close to SFH1. Specifically, the gene (VRP1) having the strongest positive epistatic interaction with SFH1 was the closest gene to SFH1 on the chromosome (fig. 6H).

To examine the predictive accuracy at the genomic scale, we divided these 92 star-like motifs into two groups based on the difference between the top two highest epistasis values in the motif (Diff_epistasis, fig. 6J). We found that the proportion of successful predictions was 32.6% in Group L and 21.7% in Group S, both of which were significantly higher than random expectation (fig. 6K, P < 0.001 for Group L and P = 0.001 for Group S, permutation test). Furthermore, despite the lower predictive power, our model could still predict gene locations in 707 star-like motifs in which the closest gene to the hub was within D = 90 (supplementary fig. S8G and H, Supplementary Material online, P = 0.002 for Group L and P = 0.036 for Group S, permutation test).

Discussion

In our study, we provided results from both computational simulation and empirical data analysis to support the role of genetic interaction networks in driving the evolution of gene order. We performed simulations with different sets of parameters, some of which were from empirical data or within the range of empirical data (Schacherer et al. 2009; Costanzo et al. 2010). The negative E–D correlation was observed with all parameter sets (supplementary figs. S4–S6, Supplementary Material online), consistent with both theoretical predictions (fig. 1A) and empirical results (fig. 4).

The epistasis values were estimated mainly from null alleles (Costanzo et al. 2010); thus, it remains unknown whether our conclusion applies to other deleterious alleles. Fortunately, Costanzo et al. also estimated 7,786,453 pairwise epistasis values for either decreased abundance by mRNA perturbation (DAmP) alleles or temperature-sensitive (ts) alleles with mutations typically changing coding sequences, which offered us the opportunity to address this question (Costanzo et al. 2010, 2016). We found that the epistasis values of null alleles were significantly correlated with those of other deleterious alleles of the same gene (supplementary fig. S10A–L, Supplementary Material online), suggesting that the negative E–D correlation is likely a general phenomenon for various deleterious alleles. More importantly, because DNA deletion events that lead to null alleles are frequently observed in yeast natural populations (Schacherer et al. 2009), epistasis among null alleles by itself may be sufficiently strong to drive the evolution of gene distance.

In principle, if the frequencies of deleterious alleles in a population are significantly reduced by natural selection, the advantage of genetic linkage could be small for this population. However, numerous deleterious mutations (e.g., deletion of a whole gene, nonsense mutations, missense mutations, and mutations altering start codons) have been observed in natural populations (Liti et al. 2009; Schacherer et al. 2009), suggesting that the frequencies of deleterious alleles may not be low. This phenomenon is probably due to the antagonistic pleiotropic effects of a mutation in multiple environments (Qian et al. 2012).

It is worth noting that theoretical analysis predicts that epistasis can drive the evolution of recombination frequency rather than gene distance. In addition to gene distance, 1) the lengths of intergenic regions and 2) recombination hot/cold spots could also affect recombination frequency. Nevertheless, a strong correlation between gene distance and physical distance, that is, the number of nucleotides between two genes, was observed (ρ = 0.998, P < 10⁻¹⁰⁰, N = 1,624,935), suggesting that the effect of (1) is negligible. Furthermore, physical distance is highly correlated with recombination frequency when the physical distance is <180 kb (supplementary fig. S1C, Supplementary Material online, ρ = 0.88, P < 10⁻¹⁰⁰, N = 336,227), suggesting that the effect of (2) is also limited. As a result, recombination frequency and gene distance are highly correlated (supplementary fig. S1B, Supplementary Material online, ρ = 0.87, P < 10⁻¹⁰⁰, N = 337,892 for gene pairs with D ≤ 100). Because the recombination frequency was only measured for 58% of gene pairs in S. cerevisiae (Mancera et al. 2008) and was unknown for the reconstructed ancestral species, we used gene distance to approximate recombination frequency in this study. The optimization of gene distances among multiple gene pairs may often involve the rearrangement of genes on a chromosome.

FIG. 6. Gene orders can be predicted from the yeast genetic interaction network. (A) A three-node all-connected motif in the yeast genetic interaction network. The standard deviation of epistasis values (σ_epistasis) in this motif is 0.05. (B) Six possible gene orders are listed. The first is the observed gene order on chromosome VIII in yeast. (C) The E–D relationship for the six possible gene orders in (B). We consider the prediction successful if the first two gene orders are observed because they exhibit negative E–D correlations. In particular, the prediction is precise if the first gene order is observed. The dashed line represents E = 0. (D) A histogram of the standard deviation of epistasis values (σ_epistasis), based on which genes are divided into two groups with similar sizes, Group S and Group L. (E) The proportions of precise prediction of Group S, Group L, and random (R) expectations based on permutation. Error bars represent standard deviations among 1,000 permutations. (F) The proportions of successful predictions, similar to (E). (G–I) A star-like motif in the yeast genetic interaction network and the gene order on chromosome XII in yeast. The hub gene SFH1 and 19 genes with the distance <100 to SFH1 are shown. (J) A histogram of differences in the two largest epistasis values in the motif (Diff_epistasis), based on which genes are divided into two groups with similar sizes, Group S and Group L. (K) Proportion of successful predictions of the closest gene in Group S and Group L. Their respective random expectations based on permutation are shown in gray. Error bars represent standard deviations among 1,000 permutations.

It is also worth noting that synthetic genetic array (SGA), the experimental strategy used to generate double mutants, is based on recombination between two null alleles (Tong et al. 2001; Costanzo et al. 2010). Thus, double mutants for linked genes may have a smaller initial frequency, which may lead to inaccuracy in estimating fitness values for these mutants. To determine whether such potential experimental bias has any impact on our results, we calculated the partial E–D correlation after controlling for the double mutant fitness values. Again, we observed a strong negative E–D correlation (partial ρ = −0.17, P = 1.3 × 10⁻⁹, N = 1,254). This result was not unexpected because the fitness of double mutants and gene distance were only weakly correlated (ρ = 0.086, P = 0.002, N = 1,254). More discussion on the caveats in the analysis of empirical data is included in supplementary note S2, Supplementary Material online.
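A partial correlation of this kind can be sketched as below. The sketch assumes a rank-based (Spearman) coefficient, as the symbol ρ suggests, and uses the standard first-order partial correlation formula; the text does not state which software or exact procedure was used, and the function names are hypothetical.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def partial_spearman(x, y, z):
    """Rank correlation between x and y after controlling for z."""
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

Here `x` would hold the epistasis values, `y` the gene distances, and `z` the double-mutant fitness values of the same gene pairs.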

In a recent study, about 1,800 genetic suppression interactions were identified in the budding yeast (van Leeuwen et al. 2016), which can be regarded as an extreme form of positive genetic interactions. Among them, nine gene pairs are tightly linked (D ≤ 2, supplementary table S3, Supplementary Material online). This number is significantly greater than the random expectation (permutation test, P = 0.01), again supporting our model.

The selection on clusters of locally adapted alleles, which sometimes were captured in chromosomal inversions, has been studied previously (Yeaman 2013; Kirkpatrick 2017). In our study, we proposed a hypothesis (the negative E–D correlation) on deleterious alleles and tested this hypothesis with analyses on empirical data and simulation, both in the context of genetic interaction networks. Therefore, our results provide additional clues for understanding the basic principles of genome organization. In particular, the origin and maintenance of clusters of genes in the same metabolic pathway, which have puzzled scientists for years (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011), could be well explained by genetic interactions. Genes in a linear metabolic pathway exhibit positive epistasis because 1) double mutants and single mutants have identical impacts on destroying the function of the pathway (He et al. 2010), and 2) the accumulation of deleterious intermediate products resulting from a loss-of-function mutation in a downstream gene of a metabolic pathway may be prevented by a loss-of-function mutation in an upstream gene (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011). Because genetic linkage is advantageous among genes with positive epistasis, the clustering of genes in the same linear metabolic pathway, such as genes in the galactose utilization pathway or allantoin degradation pathway, is favored by natural selection (Wong and Wolfe 2005; Slot and Rokas 2010).

Many factors have been reported to influence the evolution of gene order, such as tandem gene duplication, position effects on gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016), coordinated gene expression among neighboring genes (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004), and clustering of functionally related genes (Wong and Wolfe 2005; Slot and Rokas 2010), among many others. In this study, we provided evidence that, in addition to these factors, the genetic interaction network also played an important role in driving the evolution of gene order. Because the empirical data of epistasis (fitness values) are available in the budding yeast, an evolutionary simulation integrating many genetic interactions is possible, which makes epistasis unique among all factors driving the evolution of gene order. Based on the empirical data of the yeast genetic interaction network, our simulation indicates that the selective coefficient is on the order of 10⁻⁷, suggesting that epistasis may play an important role in determining gene order in yeast, a species with a relatively large effective population size (10⁷). A negative E–D correlation is expected to be observed in species with a smaller effective population size only if the range of epistasis is larger than that in yeast (figs. 2 and 3). Because genome-wide empirical data of epistasis are unavailable in species with a smaller effective population size (e.g., humans and flies), whether the genetic interaction network plays a role in the evolution of gene order in these species requires further investigation.

Materials and Methods

Genomes

The genome annotation of S. cerevisiae was downloaded from the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org, version R64). The gene order of the reconstructed ancestor before the WGD (Gordon et al. 2009) was downloaded from the Yeast Gene Order Browser (YGOB) (Byrne and Wolfe 2005).

Simulation

To examine the fitness effect of gene distance (D), two-locus dynamics under selection were simulated for populations with different D (from 0 to 100 in 1-gene increments) as well as a series of epistasis values (E, from −0.004 to 0.004 in 0.001 increments) between two loci. The recombination frequency (R) between these two loci was estimated from D using the equation below:

$$R = D \times 0.004 + 0.064$$

which is the linear relationship estimated from the distance between two genes (D) and the empirical recombination frequencies (R) between them quantified in a previous study (Mancera et al. 2008). The simulation was performed in haploid organisms to align with the analyses of empirical data, in which epistasis values were quantified in haploid yeast (Costanzo et al. 2010, 2016). A and B are wild-type alleles of two di-allelic loci on the same chromosome; a and b are their deleterious alleles, with fitness values of $\omega_{aB} = \omega_{Ab} = 0.992$. The fitness of the wild type ($\omega_{AB}$) was defined as 1, and the epistasis E was defined as $\omega_{ab} - \omega_{aB}\omega_{Ab}$. The initial allele frequencies of a and b were both 0.1. The two loci were initially under linkage equilibrium, and therefore the frequencies of AB ($X_{AB}$), Ab ($X_{Ab}$), aB ($X_{aB}$), and ab ($X_{ab}$) were 0.81, 0.09, 0.09, and 0.01, respectively. After random mating, selection, and recombination, the frequencies of the four genotypes in the next generation were calculated with the following equations (Nei 1967):

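$$
\begin{aligned}
X'_{AB} &= \frac{X_{AB}\,\omega_{AB}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}} \\
X'_{Ab} &= \frac{X_{Ab}\,\omega_{Ab}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}} \\
X'_{aB} &= \frac{X_{aB}\,\omega_{aB}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}} \\
X'_{ab} &= \frac{X_{ab}\,\omega_{ab}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}
\end{aligned}
$$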

$\bar{\omega}$ is the average fitness of a population and was calculated as follows:

$$\bar{\omega} = X_{AB}\,\omega_{AB} + X_{ab}\,\omega_{ab} + X_{aB}\,\omega_{aB} + X_{Ab}\,\omega_{Ab}$$

For each population, the average fitness ($\bar{\omega}$) and the frequencies of deleterious alleles ($X_a$ and $X_b$) were recorded in each generation of the simulation. The difference between the average fitness of two populations with different D (D = 50 and D = 0) reflects whether linkage is favored by natural selection or not. The frequency difference of the deleterious alleles between these two populations is related to the long-term effects of recombination. We also performed the simulation with a variety of parameter values (supplementary fig. S4, Supplementary Material online).
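The two-locus recursion described above can be sketched in Python as follows. This is a minimal illustration under the parameter values given in the text, not the authors' implementation; the function names are hypothetical, and recording the mean fitness on the updated genotype frequencies is an assumption about the bookkeeping.

```python
def recombination_frequency(d):
    """Linear relationship between gene distance D and R given in the text."""
    return d * 0.004 + 0.064


def simulate(d, e, generations=100, w_single=0.992, q0=0.1):
    """Iterate the haploid two-locus recursion for one population.

    Returns one record per generation:
    (mean fitness, frequency of allele a, frequency of allele b).
    """
    r = recombination_frequency(d)
    # Genotype fitness values; epistasis E = w_ab - w_aB * w_Ab.
    w = {"AB": 1.0, "Ab": w_single, "aB": w_single,
         "ab": w_single * w_single + e}
    # Initial genotype frequencies under linkage equilibrium.
    x = {"AB": (1 - q0) ** 2, "Ab": (1 - q0) * q0,
         "aB": q0 * (1 - q0), "ab": q0 * q0}
    history = []
    for _ in range(generations):
        w_bar = sum(x[g] * w[g] for g in x)
        shift = r * (x["AB"] * w["AB"] * x["ab"] * w["ab"]
                     - x["aB"] * w["aB"] * x["Ab"] * w["Ab"]) / w_bar ** 2
        x = {"AB": x["AB"] * w["AB"] / w_bar - shift,
             "Ab": x["Ab"] * w["Ab"] / w_bar + shift,
             "aB": x["aB"] * w["aB"] / w_bar + shift,
             "ab": x["ab"] * w["ab"] / w_bar - shift}
        # Assumption: fitness is recorded on the updated frequencies.
        mean_fitness = sum(x[g] * w[g] for g in x)
        history.append((mean_fitness, x["aB"] + x["ab"], x["Ab"] + x["ab"]))
    return history


# Compare a tightly linked pair (D = 0) with a loosely linked pair (D = 50)
# under positive epistasis.
linked = simulate(d=0, e=0.004)
unlinked = simulate(d=50, e=0.004)
print(linked[-1][0] - unlinked[-1][0])
```

A positive printed difference would indicate that tighter linkage yields the higher mean fitness at the 100th generation for this epistasis value.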

The average fitness of a population at the 100th generation ($\bar{\omega}_{100}$) was used to infer the fitness effect of gene distance at a given epistasis value. For each epistasis value, we could identify an optimal gene distance ($D_{\text{opt}}$) as well as a series of permitted gene distances ($D_{\text{permitted}}$). The difference in $\bar{\omega}_{100}$ between $D_{\text{permitted}}$ and $D_{\text{opt}}$ was <10⁻⁷. The mean of all $D_{\text{permitted}}$ values was calculated. In addition, our simulation did not introduce new mutations to the population, and therefore the frequencies of deleterious alleles decreased over generations. Nevertheless, the negative E–D correlation was also observed at the 50th and 200th generations (supplementary fig. S3, Supplementary Material online).
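The search for the optimal and permitted gene distances can be sketched as a scan over D. The sketch below is illustrative: it re-implements the two-locus recursion compactly with the parameter values from the text, and the function names and the default tolerance argument are hypothetical.

```python
def mean_fitness_at(d, e, generations=100, w_single=0.992, q0=0.1):
    """Mean fitness after `generations` rounds of the two-locus recursion."""
    r = d * 0.004 + 0.064
    w = (1.0, w_single, w_single, w_single * w_single + e)   # AB, Ab, aB, ab
    x = ((1 - q0) ** 2, (1 - q0) * q0, q0 * (1 - q0), q0 * q0)
    for _ in range(generations):
        xw = [xi * wi for xi, wi in zip(x, w)]
        w_bar = sum(xw)
        shift = r * (xw[0] * xw[3] - xw[1] * xw[2]) / w_bar ** 2
        x = (xw[0] / w_bar - shift, xw[1] / w_bar + shift,
             xw[2] / w_bar + shift, xw[3] / w_bar - shift)
    return sum(xi * wi for xi, wi in zip(x, w))


def scan_distances(e, d_values=range(0, 101), tolerance=1e-7):
    """Return (D_opt, permitted distances, mean permitted distance) for E = e."""
    fitness = {d: mean_fitness_at(d, e) for d in d_values}
    d_opt = max(fitness, key=fitness.get)
    permitted = [d for d in d_values if fitness[d_opt] - fitness[d] < tolerance]
    return d_opt, permitted, sum(permitted) / len(permitted)


for e in (-0.004, 0.0, 0.004):
    d_opt, permitted, mean_permitted = scan_distances(e)
    print(e, d_opt, len(permitted), mean_permitted)
```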

Toy Motifs

We built a toy motif in a genetic interaction network to simulate the fitness effect of gene distance. In the star-like motif (fig. 2A), a hub gene (A) interacted with nine partners (genes B, C, D, E, F, G, H, I, and J) with nine epistasis values ranging from −0.004 to 0.004 in 0.001 increments. We attempted to place nine partner genes on the same side of the hub gene on a chromosome. Because we focused on the epistasis between the hub gene and the partners, placing the partners on different sides would not affect the results. We fixed the chromosomal location of the hub gene, and the partner genes were placed at D = 20, 30, 40, 50, 60, 70, 80, 90, and 100 from the hub gene. The locations of the nine partners were shuffled while keeping the epistasis values unchanged. The fitness of a gene order was defined as the average $\bar{\omega}_{100}$ for all nine hub-containing gene pairs. Thus, we could calculate the fitness for a total of (9! =) 362,880 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to test whether the simulation result was parameter sensitive (supplementary fig. S5, Supplementary Material online). Here, epistasis values and fitness defects were randomly chosen from the empirical data in previous studies (Costanzo et al. 2010, 2016).

For the all-connected motif (fig. 3A), five genes (A, B, C, D, and E) interacted with each other with ten epistasis values ranging from −0.0036 to 0.0036 in 0.0008 increments. The five genes were placed on the same chromosome, and the gene locations were set as 1, 19, 65, 93, and 102, so that the maximum D in the motif was 100. The locations of these five genes were shuffled, whereas the epistasis values were kept unchanged. Similar to the star-like motifs, the fitness of a gene order was defined as the average $\bar{\omega}_{100}$ for all ten gene pairs. The fitness was calculated for a total of (5! =) 120 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to determine whether the simulation result was parameter sensitive (supplementary fig. S6, Supplementary Material online).
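The enumeration of gene orders for the all-connected toy motif can be sketched as below. Two details are assumptions rather than statements from the text: D is taken as the number of intervening genes (location difference minus one, which makes the maximum D equal to 100), and the ten epistasis values are assigned to gene pairs in an arbitrary fixed order. The helper names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations, permutations


@lru_cache(maxsize=None)
def pair_fitness(e, d, generations=100, w_single=0.992, q0=0.1):
    """Mean fitness at the final generation for one gene pair (E = e, D = d)."""
    r = d * 0.004 + 0.064
    w = (1.0, w_single, w_single, w_single * w_single + e)   # AB, Ab, aB, ab
    x = ((1 - q0) ** 2, (1 - q0) * q0, q0 * (1 - q0), q0 * q0)
    for _ in range(generations):
        xw = [xi * wi for xi, wi in zip(x, w)]
        w_bar = sum(xw)
        shift = r * (xw[0] * xw[3] - xw[1] * xw[2]) / w_bar ** 2
        x = (xw[0] / w_bar - shift, xw[1] / w_bar + shift,
             xw[2] / w_bar + shift, xw[3] / w_bar - shift)
    return sum(xi * wi for xi, wi in zip(x, w))


def order_fitness(order, locations, epistasis):
    """Average pair fitness over all gene pairs for one gene order."""
    loc = dict(zip(order, locations))
    values = [pair_fitness(e, abs(loc[a] - loc[b]) - 1)   # assumed D definition
              for (a, b), e in epistasis.items()]
    return sum(values) / len(values)


genes = "ABCDE"
locations = (1, 19, 65, 93, 102)
e_values = [round(-0.0036 + 0.0008 * k, 4) for k in range(10)]
# Assumption: the pairing of epistasis values with gene pairs is arbitrary.
epistasis = dict(zip(combinations(genes, 2), e_values))

scores = {order: order_fitness(order, locations, epistasis)
          for order in permutations(genes)}
best = max(scores, key=scores.get)
print(best, scores[best])
```

The star-like motif follows the same pattern, with only the nine hub-containing pairs entering the average and the nine partner positions being permuted.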

Epistasis

Epistasis values were retrieved from the studies of Costanzo et al. (Costanzo et al. 2010, 2016). Because the epistasis values of the overlapping gene pairs in these studies were highly correlated (r = 0.71, P < 10⁻¹⁰⁰, Pearson's correlation, N = 2,604,539), we merged their epistasis values. A pair of genes was filtered if (a) the epistasis value between their null mutations was not a number (NaN) or (b) the epistasis values of the same null mutation pair had opposite signs in reciprocal crosses or in different studies. We also removed the gene pairs with the null mutation in at least one gene resulting in a higher fitness value than that of the wild type. This filtering was performed because only a small number of antagonistic pleiotropic genes were detected in rich media (Qian et al. 2012). Thus, the elevated fitness observed upon gene deletion in Costanzo et al. was likely due to inaccurate estimation of fitness. If epistasis between a pair of genes was examined multiple times, we used the epistasis value with the smallest P value. Following Costanzo et al. (2010), epistasis values with P < 0.05 were classified as significant.
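The filtering rules above can be sketched as follows. The record layout (dictionary keys) and gene names are hypothetical, and the sketch assumes that a single NaN measurement removes the whole gene pair, which is one possible reading of rule (a).

```python
from collections import defaultdict
from math import isnan


def filter_epistasis(measurements, alpha=0.05):
    """Apply the filters to a list of measurement records.

    Each record is a dict with keys 'pair' (two gene names), 'epistasis',
    'p_value', 'fitness_a', and 'fitness_b' (single-mutant fitness values,
    with wild-type fitness equal to 1).
    """
    by_pair = defaultdict(list)
    for m in measurements:
        by_pair[tuple(sorted(m["pair"]))].append(m)   # merge reciprocal crosses

    kept = {}
    for pair, records in by_pair.items():
        values = [r["epistasis"] for r in records]
        if any(isnan(v) for v in values):
            continue                                   # rule (a): NaN
        if min(values) < 0 < max(values):
            continue                                   # rule (b): opposite signs
        if any(r["fitness_a"] > 1 or r["fitness_b"] > 1 for r in records):
            continue                                   # single mutant fitter than wild type
        best = min(records, key=lambda r: r["p_value"])
        kept[pair] = {"epistasis": best["epistasis"],
                      "p_value": best["p_value"],
                      "significant": best["p_value"] < alpha}
    return kept


# Hypothetical records for illustration only.
records = [
    {"pair": ("YFG1", "YFG2"), "epistasis": 0.05, "p_value": 0.01, "fitness_a": 0.9, "fitness_b": 0.8},
    {"pair": ("YFG2", "YFG1"), "epistasis": 0.03, "p_value": 0.20, "fitness_a": 0.8, "fitness_b": 0.9},
    {"pair": ("YFG1", "YFG3"), "epistasis": 0.04, "p_value": 0.01, "fitness_a": 0.9, "fitness_b": 0.7},
    {"pair": ("YFG3", "YFG1"), "epistasis": -0.02, "p_value": 0.03, "fitness_a": 0.7, "fitness_b": 0.9},
    {"pair": ("YFG2", "YFG3"), "epistasis": float("nan"), "p_value": 1.0, "fitness_a": 0.8, "fitness_b": 0.7},
    {"pair": ("YFG3", "YFG4"), "epistasis": 0.02, "p_value": 0.01, "fitness_a": 0.7, "fitness_b": 1.02},
    {"pair": ("YFG4", "YFG5"), "epistasis": -0.01, "p_value": 0.30, "fitness_a": 0.95, "fitness_b": 0.9},
]
filtered = filter_epistasis(records)
print(filtered)
```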

In addition to the two data sets from Costanzo et al., several other studies have also estimated epistasis values using high-throughput strategies (Boone et al. 2007). In principle, we could have included all of them in this study. However, for practical reasons, these data sets were not suitable for our study. First, it would not be appropriate to combine other data sets with that of Costanzo et al. because epistasis values from multiple studies potentially have different sources of error, especially if they followed different protocols. Second, the sample sizes of epistasis data of linked genes from other studies were not sufficiently large to examine the E–D correlation. Because the data set from Costanzo et al. represents the largest available so far, containing 27 times more significant epistasis than the total of all other studies, we focused on these epistasis data in this study.

Recombination Frequency

Genotypes of meiosis products from 46 tetrads were downloaded from Mancera et al. (2008) at http://www.ebi.ac.uk/huber/recombination. A pair of DNA markers was filtered if the genotypes were available in less than half of the spores (<92 spores). For each pair of markers, spores with parental genotypes ($N_p$) and nonparental genotypes ($N_{np}$) were counted. The recombination frequency of a pair of markers ($R_m$) was calculated as follows:

$$R_m = \frac{N_{np}}{N_p + N_{np}}$$

The recombination frequency between a pair of genes ($R_g$) was defined as the average recombination frequency of all marker pairs within the pair of genes, which was calculated as follows:

$$R_g = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J} R_{m,ij}$$

Subscripts i and j represent the ith and jth markers in each gene, respectively, and I and J are the numbers of markers in the two genes.
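The two recombination-frequency formulas can be sketched as follows. The genotype encoding (a parental label per spore, `None` for a missing call) and the function names are assumptions made for illustration; the actual file format of the downloaded data is not described here.

```python
def marker_pair_recombination(geno_1, geno_2, min_fraction=0.5):
    """R_m for one marker pair, or None if the pair is filtered.

    geno_1, geno_2: per-spore parental origin at each marker
    (for example 'S' or 'Y'), with None for a missing genotype.
    """
    typed = [(a, b) for a, b in zip(geno_1, geno_2)
             if a is not None and b is not None]
    if len(typed) < min_fraction * len(geno_1):
        return None                      # genotyped in fewer than half of the spores
    n_nonparental = sum(a != b for a, b in typed)
    return n_nonparental / len(typed)    # N_np / (N_p + N_np)


def gene_pair_recombination(markers_gene_1, markers_gene_2):
    """R_g: average R_m over all marker pairs between two genes."""
    rates = [marker_pair_recombination(m1, m2)
             for m1 in markers_gene_1 for m2 in markers_gene_2]
    rates = [r for r in rates if r is not None]
    return sum(rates) / len(rates) if rates else None


# Hypothetical genotypes for four spores.
marker_a = ["S", "S", "Y", "Y"]
marker_b = ["S", "Y", "Y", "Y"]
print(marker_pair_recombination(marker_a, marker_b))
print(gene_pair_recombination([marker_a], [marker_b, marker_a]))
```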

Identification of Duplicate Genes

All-against-all BlastP was performed to search for duplicate genes. Gene pairs with E values <10⁻¹⁰ were defined as duplicate genes.

Expression Similarity

Expression profiles of 6,359 genes in 40 studies were compiled by a previous study (Kafri et al. 2005). Pearson's correlation coefficient for two genes was calculated within each study in the compiled data set, and expression similarity between these two genes was defined as the average correlation coefficient among studies.
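The definition of expression similarity can be sketched in a few lines. The data layout (one expression vector per study for each gene) is an assumption, and the numbers are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def expression_similarity(profiles_a, profiles_b):
    """Average within-study Pearson correlation between two genes.

    profiles_a, profiles_b: lists with one expression vector per study,
    in the same study order for both genes.
    """
    coefficients = [pearson(a, b) for a, b in zip(profiles_a, profiles_b)]
    return sum(coefficients) / len(coefficients)


# Two hypothetical studies with three and four conditions.
gene_a = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0]]
gene_b = [[2.0, 4.0, 6.0], [4.0, 3.0, 2.0, 1.0]]
print(expression_similarity(gene_a, gene_b))
```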

Other Yeast Functional Genomic Data

Protein–protein interaction data were obtained from the SGD. Three-dimensional chromatin colocalization data were retrieved from Duan et al. (2010). Genome-wide gene expression noise data were downloaded from Newman et al. (2006). A list of essential genes was downloaded from the Database of Essential Genes (DEG, version 10.6) (Luo et al. 2014). Semantic similarity of Gene Ontology (GO) terms was calculated using the R package GOSemSim (version 1.22.0) (Yu et al. 2010).

Shuffling of Gene Positions or Epistasis Values

To obtain the null distribution of E–D correlation coefficients, we shuffled gene positions while keeping the genome structure of S. cerevisiae unchanged (i.e., number of chromosomes and number of genes on each chromosome). The epistasis value of each gene pair was also kept unchanged. Gene distances were then calculated based on the new genomic locations.
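The position shuffle can be sketched as below. The genome representation (a dictionary from chromosome name to an ordered gene list), the gene names, and the use of index differences as gene distance are assumptions for illustration. The returned (E, D) pairs can be passed to whichever correlation statistic is used, and repeating the shuffle yields a null distribution.

```python
import random


def shuffle_gene_positions(genome, rng):
    """Randomly reassign genes to positions, preserving chromosome sizes."""
    genes = [g for members in genome.values() for g in members]
    rng.shuffle(genes)
    shuffled, start = {}, 0
    for chrom, members in genome.items():
        shuffled[chrom] = genes[start:start + len(members)]
        start += len(members)
    return shuffled


def linked_pairs(genome, epistasis):
    """(E, D) for every gene pair with an epistasis value on one chromosome."""
    location = {g: (chrom, i)
                for chrom, members in genome.items()
                for i, g in enumerate(members)}
    pairs = []
    for (a, b), e in epistasis.items():
        (chrom_a, i), (chrom_b, j) = location[a], location[b]
        if chrom_a == chrom_b:
            pairs.append((e, abs(i - j)))   # distance in gene units
    return pairs


# Hypothetical miniature genome and epistasis values.
genome = {"I": ["g1", "g2", "g3"], "II": ["g4", "g5"]}
epistasis = {("g1", "g3"): 0.02, ("g1", "g4"): -0.01, ("g4", "g5"): 0.03}
observed = linked_pairs(genome, epistasis)
shuffled = shuffle_gene_positions(genome, random.Random(1))
print(observed, linked_pairs(shuffled, epistasis))
```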

To estimate the relative importance of different epistasis categories (positive and negative) in shaping the E–D correlation, we shuffled epistasis values of one category while keeping the other category unchanged. Gene positions were not changed during this process.

Estimating the Predictive Accuracy of Gene Order

To examine the accuracy of our prediction of gene order, we estimated the success rate by chance (gray bars in fig. 6 and supplementary figs. S8 and S9, Supplementary Material online) by randomly assigning the positions of genes 1,000 times. For each permutation, the average proportion of successful predictions among motifs was estimated. The mean and standard deviation among the 1,000 permutations were then calculated.
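For star-like motifs, the chance success rate can be sketched as follows. The sketch makes two simplifying assumptions that are not taken from the text: a prediction counts as successful when the partner with the largest epistasis value is the one closest to the hub, and positions are randomized by permuting partner distances within each motif. All names and values are hypothetical.

```python
import random
from statistics import mean, stdev


def success_rate(motifs):
    """Fraction of motifs whose top-epistasis partner is closest to the hub.

    motifs: list of (epistasis values, distances to hub), one entry per partner.
    """
    hits = 0
    for epistasis, distances in motifs:
        top = max(range(len(epistasis)), key=epistasis.__getitem__)
        closest = min(range(len(distances)), key=distances.__getitem__)
        hits += top == closest
    return hits / len(motifs)


def random_expectation(motifs, n_perm=1000, seed=0):
    """Mean and standard deviation of the success rate over permutations."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_perm):
        permuted = [(e, rng.sample(d, len(d))) for e, d in motifs]
        rates.append(success_rate(permuted))
    return mean(rates), stdev(rates)


# Twenty identical hypothetical motifs with four partners each.
motifs = [([0.05, 0.01, -0.02, 0.0], [3, 10, 25, 40]) for _ in range(20)]
observed_rate = success_rate(motifs)
expected_mean, expected_sd = random_expectation(motifs)
print(observed_rate, expected_mean, expected_sd)
```

An empirical P value could then be taken as the fraction of permutations whose success rate is at least the observed one.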

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Author Contributions

Y.-F.Y., W.C., and W.Q. conceived the research; Y.-F.Y., W.C., and W.Q. analyzed the data; and Y.-F.Y., S.W., and W.Q. wrote the article.

Acknowledgments

We thank Bin He and Weiwei Zhai for discussion. This work was supported by grants from the National Natural Science Foundation of China to W.Q. (91731302).

References

Batada NN, Hurst LD. 2007. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet. 39(8):945–949.

Boone C, Bussey H, Andrews BJ. 2007. Exploring genetic interactions and networks with yeast. Nat Rev Genet. 8(6):437–449.

Boutanaev AM, Kalmykova AI, Shevelyov YY, Nurminsky DI. 2002. Large clusters of co-expressed genes in the Drosophila genome. Nature 420(6916):666–669.

Byrne KP, Wolfe KH. 2005. The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 15(10):1456–1461.

Charlesworth B. 1990. Mutation-selection balance and the evolutionary advantage of sex and recombination. Genet Res. 55(3):199–221.

Charlesworth D, Charlesworth B. 2011. Mimicry: the hunting of the supergene. Curr Biol. 21(20):R846–R848.

Chen X, Zhang J. 2016. The genomic landscape of position effects on protein expression level and noise in yeast. Cell Syst. 2(5):347–354.

Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, et al. 1998. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2(1):65–73.

Cohen BA, Mitra RD, Hughes JD, Church GM. 2000. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet. 26(2):183–186.

Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, et al. 2010. The genetic landscape of a cell. Science 327(5964):425–431.

Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, et al. 2016. A global genetic interaction network maps a wiring diagram of cellular function. Science 353(6306):aaf1420.

Dean EJ, Davis JC, Davis RW, Petrov DA. 2008. Pervasive and persistent redundancy among duplicated genes in yeast. PLoS Genet. 4(7):e1000113.


DeLuna A, Vetsigian K, Shoresh N, Hegreness M, Colon-Gonzalez M, Chao S, Kishony R. 2008. Exposing the fitness contribution of duplicated genes. Nat Genet. 40(5):676–681.

Dixon SJ, Fedyshyn Y, Koh JL, Prasad TS, Chahwan C, Chua G, Toufighi K, Baryshnikova A, Hayles J, Hoe KL, et al. 2008. Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc Natl Acad Sci U S A. 105(43):16653–16658.

Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, Shendure J, Fields S, Blau CA, Noble WS. 2010. A three-dimensional model of the yeast genome. Nature 465(7296):363–367.

Eshel I, Feldman MW. 1970. On the evolutionary effect of recombination. Theor Popul Biol. 1(1):88–100.

Feldman MW, Christiansen FB, Brooks LD. 1980. Evolution of recombination in a constant environment. Proc Natl Acad Sci U S A. 77(8):4838–4841.

Ghanbarian AT, Hurst LD. 2015. Neighboring genes show correlated evolution in gene expression. Mol Biol Evol. 32(7):1748–1766.

Gordon JL, Byrne KP, Wolfe KH. 2009. Additions, losses, and rearrangements on the evolutionary route from a reconstructed ancestor to the modern Saccharomyces cerevisiae genome. PLoS Genet. 5(5):e1000485.

He X, Qian W, Wang Z, Li Y, Zhang J. 2010. Prevalent positive epistasis in Escherichia coli and Saccharomyces cerevisiae metabolic networks. Nat Genet. 42(3):272–276.

Hurst LD, Pal C, Lercher MJ. 2004. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 5(4):299–310.

Kafri R, Bar-Even A, Pilpel Y. 2005. Transcription control reprogramming in genetic backup circuits. Nat Genet. 37(3):295–299.

Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428(6983):617–624.

Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423(6937):241–254.

Kirkpatrick M. 2017. The evolution of genome structure by natural and sexual selection. J Hered. 108(1):3–11.

Kondrashov AS. 1982. Selection against harmful mutations in large sexual and asexual populations. Genet Res. 40(3):325–332.

Kondrashov AS. 1988. Deleterious mutations and the evolution of sexual reproduction. Nature 336(6198):435–440.

Kouyos RD, Silander OK, Bonhoeffer S. 2007. Epistasis between deleterious mutations and the evolution of recombination. Trends Ecol Evol. 22(6):308–315.

Lang GI, Botstein D. 2011. A test of the coordinated expression hypothesis for the origin and maintenance of the GAL cluster in yeast. PLoS One 6(9):e25290.

Lawrence J. 1999. Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev. 9(6):642–648.

Lawrence JG. 2002. Shared strategies in gene organization among prokaryotes and eukaryotes. Cell 110(4):407–413.

Lercher MJ, Urrutia AO, Hurst LD. 2002. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 31(2):180–183.

Liao BY, Zhang J. 2008. Coexpression of linked genes in Mammalian genomes is generally disadvantageous. Mol Biol Evol. 25(8):1555–1565.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458(7236):337–341.

Luo H, Lin Y, Gao F, Zhang CT, Zhang R. 2014. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res. 42(Database issue):D574–D580.

Mancera E, Bourgon R, Brozzi A, Huber W, Steinmetz LM. 2008. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454(7203):479–485.

Musso G, Costanzo M, Huangfu M, Smith AM, Paw J, San Luis BJ, Boone C, Giaever G, Nislow C, Emili A, et al. 2008. The extensive and condition-dependent nature of epistasis among whole-genome duplicates in yeast. Genome Res. 18(7):1092–1099.

Nei M. 1967. Modification of linkage intensity by natural selection. Genetics 57(3):625–641.

Nei M. 1969. Linkage modifications and sex difference in recombination. Genetics 63(3):681–699.

Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, Weissman JS. 2006. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441(7095):840–846.

Pal C, Hurst LD. 2003. Evidence for co-evolution of gene order and recombination rate. Nat Genet. 33(3):392–395.

Phillips PC. 2008. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 9(11):855–867.

Qian W, Liao BY, Chang AY, Zhang J. 2010. Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet. 26(10):425–430.

Qian W, Ma D, Xiao C, Wang Z, Zhang J. 2012. The genomic landscape and evolutionary resolution of antagonistic pleiotropy in yeast. Cell Rep. 2(5):1399–1410.

Qian W, Zhang J. 2008. Evolutionary dynamics of nematode operons: easy come, slow go. Genome Res. 18(3):412–421.

Qian W, Zhang J. 2014. Genomic evidence for adaptation by gene duplication. Genome Res. 24(8):1356–1362.

Schacherer J, Shapiro JA, Ruderfer DM, Kruglyak L. 2009. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458(7236):342–345.

Slot JC, Rokas A. 2010. Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi. Proc Natl Acad Sci U S A. 107(22):10136–10141.

Spellman PT, Rubin GM. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol. 1(1):5.

Tischler J, Lehner B, Chen N, Fraser AG. 2006. Combinatorial RNA interference in Caenorhabditis elegans reveals that redundancy between gene duplicates can be maintained for more than 80 million years of evolution. Genome Biol. 7(8):R69.

Tischler J, Lehner B, Fraser AG. 2008. Evolutionary plasticity of genetic interaction networks. Nat Genet. 40(4):390–391.

Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, et al. 2001. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294(5550):2364–2368.

van Leeuwen J, Pons C, Mellor JC, Yamaguchi TN, Friesen H, Koschwanez J, Usaj MM, Pechlaner M, Takar M, Usaj M, et al. 2016. Exploring genetic suppression interactions on a global scale. Science 354(6312):aag0839.

VanderSluis B, Bellay J, Musso G, Costanzo M, Papp B, Vizeacoumar FJ, Baryshnikova A, Andrews B, Boone C, Myers CL. 2010. Genetic interactions reveal the evolutionary trajectories of duplicate genes. Mol Syst Biol. 6:429.

Vavouri T, Semple JI, Lehner B. 2008. Widespread conservation of genetic redundancy during a billion years of eukaryotic evolution. Trends Genet. 24(10):485–488.

Wagner A. 2005. Energy constraints on the evolution of gene expression. Mol Biol Evol. 22(6):1365–1374.

Williams EJ, Bowles DJ. 2004. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 14(6):1060–1067.

Wong S, Wolfe KH. 2005. Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat Genet. 37(7):777–782.

Wu CI, Ting CT. 2004. Genes and speciation. Nat Rev Genet. 5(2):114–122.

Yeaman S. 2013. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proc Natl Acad Sci U S A. 110(19):E1743–E1751.

Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S. 2010. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26(7):976–978.

Zaslaver A, Baugh LR, Sternberg PW. 2011. Metazoan operons accelerate recovery from growth-arrested states. Cell 145(6):981–992.

Yang et al. doi:10.1093/molbev/msx264 MBE

3266Downloaded from httpsacademicoupcommbearticle-abstract341232544318638by Institute of Geneties and Developmental BiologyCAS useron 03 December 2017

Page 9: Genetic Interaction Network as an Important Determinant of Gene Order …qianlab.genetics.ac.cn/pdf/Qian18.pdf · 2019-09-29 · Genetic Interaction Network as an Important Determinant

lengths of intergenic regions and 2) recombination hot/cold spots could also affect recombination frequency. Nevertheless, a strong correlation between gene distance and physical distance, that is, the number of nucleotides between two genes, was observed (ρ = 0.998, P < 10⁻¹⁰⁰, N = 1,624,935), suggesting that the effect of (1) is negligible. Furthermore, physical distance is highly correlated with recombination frequency when the physical distance is <180 kb (supplementary fig. S1C, Supplementary Material online; ρ = 0.88, P < 10⁻¹⁰⁰, N = 336,227), suggesting that the effect of (2) is also limited. As a result, recombination frequency and gene distance are highly correlated (supplementary fig. S1B, Supplementary Material online; ρ = 0.87, P < 10⁻¹⁰⁰, N = 337,892 for gene pairs with D ≤ 100). Because the recombination frequency was only measured for 58% of gene pairs in S. cerevisiae (Mancera et al. 2008) and was unknown for the reconstructed ancestral species, we used gene distance to approximate recombination frequency in this study. The optimization of gene distances among multiple gene pairs may often involve the rearrangement of genes on a chromosome.

FIG. 6. Gene orders can be predicted from the yeast genetic interaction network. (A) A three-node, all-connected motif in the yeast genetic interaction network. The standard deviation of epistasis values ($\sigma_{\mathrm{epistasis}}$) in this motif is 0.05. (B) Six possible gene orders are listed. The first is the observed gene order on chromosome VIII in yeast. (C) The E–D relationship for the six possible gene orders in (B). We consider the prediction successful if the first two gene orders are observed because they exhibit negative E–D correlations. In particular, the prediction is precise if the first gene order is observed. The dashed line represents E = 0. (D) A histogram of the standard deviation of epistasis values ($\sigma_{\mathrm{epistasis}}$), based on which genes are divided into two groups with similar sizes, Group S and Group L. (E) The proportions of precise prediction of Group S, Group L, and random (R) expectations based on permutation. Error bars represent standard deviations among 1,000 permutations. (F) The proportions of successful predictions, similar to (E). (G–I) A star-like motif in the yeast genetic interaction network and the gene order on chromosome XII in yeast. The hub gene SFH1 and 19 genes with the distance <100 to SFH1 are shown. (J) A histogram of differences in the two largest epistasis values in the motif ($\mathrm{Diff}_{\mathrm{epistasis}}$), based on which genes are divided into two groups with similar sizes, Group S and Group L. (K) Proportion of successful predictions of the closest gene in Group S and Group L. Their respective random expectations based on permutation are shown in gray. Error bars represent standard deviations among 1,000 permutations.

It is also worth noting that synthetic genetic array (SGA), the experimental strategy used to generate double mutants, is based on recombination between two null alleles (Tong et al. 2001; Costanzo et al. 2010). Thus, double mutants for linked genes may have a smaller initial frequency, which may lead to inaccuracy in estimating fitness values for these mutants. To determine whether such potential experimental bias has any impact on our results, we calculated the partial E–D correlation after controlling for the double mutant fitness values. Again, we observed a strong negative E–D correlation (partial ρ = −0.17, P = 1.3 × 10⁻⁹, N = 1,254). This result was not unexpected because the fitness of double mutants and gene distance were only weakly correlated (ρ = 0.086, P = 0.002, N = 1,254). More discussion on the caveats in the analysis of empirical data is included in supplementary note S2, Supplementary Material online.
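The partial correlation described above can be sketched as follows. This is a minimal illustration, assuming a first-order partial correlation computed from pairwise Spearman coefficients; the source does not state which estimator or software was used, and the function names and toy values are hypothetical.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Correlation between x and y after controlling for z (assumed estimator)."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy values only: epistasis (E), gene distance (D), double-mutant fitness.
epistasis = [0.01, -0.02, 0.03, 0.00, -0.01]
distance = [12, 80, 3, 40, 65]
double_mutant_fitness = [0.90, 0.70, 0.95, 0.80, 0.85]
print(partial_spearman(epistasis, distance, double_mutant_fitness))
```

With real data, `epistasis`, `distance`, and `double_mutant_fitness` would each hold one value per linked gene pair.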

In a recent study, 1,800 genetic suppression interactions were identified in the budding yeast (van Leeuwen et al. 2016), which can be regarded as an extreme form of positive genetic interactions. Among them, nine gene pairs are tightly linked (D ≤ 2; supplementary table S3, Supplementary Material online). This number is significantly greater than the random expectation (permutation test, P = 0.01), again supporting our model.

The selection on clusters of locally adapted alleles, which sometimes were captured in chromosomal inversions, has been studied previously (Yeaman 2013; Kirkpatrick 2017). In our study, we proposed a hypothesis (the negative E–D correlation) on deleterious alleles and tested this hypothesis with analyses on empirical data and simulation, both in the context of genetic interaction networks. Therefore, our results provide additional clues for understanding the basic principles of genome organization. In particular, the origin and maintenance of clusters of genes in the same metabolic pathway, which have puzzled scientists for years (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011), could be well explained by genetic interactions. Genes in a linear metabolic pathway exhibit positive epistasis because 1) double mutants and single mutants have identical impacts on destroying the function of the pathway (He et al. 2010), and 2) the accumulation of deleterious intermediate products resulting from a loss-of-function mutation in a downstream gene of a metabolic pathway may be prevented by a loss-of-function mutation in an upstream gene (Wong and Wolfe 2005; Slot and Rokas 2010; Lang and Botstein 2011). Because genetic linkage is advantageous among genes with positive epistasis, the clustering of genes in the same linear metabolic pathway, such as genes in the galactose utilization pathway or allantoin degradation pathway, is favored by natural selection (Wong and Wolfe 2005; Slot and Rokas 2010).

Many factors have been reported to influence the evolution of gene order, such as tandem gene duplication, position effects on gene expression noise (Batada and Hurst 2007; Chen and Zhang 2016), coordinated gene expression among neighboring genes (Cho et al. 1998; Cohen et al. 2000; Boutanaev et al. 2002; Spellman and Rubin 2002; Williams and Bowles 2004), and clustering of functionally related genes (Wong and Wolfe 2005; Slot and Rokas 2010), among many others. In this study, we provided evidence that, in addition to these factors, the genetic interaction network also played an important role in driving the evolution of gene order. Because the empirical data of epistasis (fitness values) are available in the budding yeast, an evolutionary simulation integrating many genetic interactions is possible, which makes epistasis unique among all factors driving the evolution of gene order. Based on the empirical data of the yeast genetic interaction network, our simulation indicates that the selective coefficient is on the order of 10⁻⁷, suggesting that epistasis may play an important role in determining gene order in yeast, a species with a relatively large effective population size (10⁷). A negative E–D correlation is expected to be observed in species with a smaller effective population size only if the range of epistasis is larger than that in yeast (figs. 2 and 3). Because genome-wide empirical data of epistasis are unavailable in species with a smaller effective population size (e.g., humans and flies), whether the genetic interaction network plays a role in the evolution of gene order in these species requires further investigation.

Materials and Methods

Genomes

The genome annotation of S. cerevisiae was downloaded from the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org, version R64). The gene order of the reconstructed ancestor before the WGD (Gordon et al. 2009) was downloaded from the Yeast Gene Order Browser (YGOB) (Byrne and Wolfe 2005).

Simulation

To examine the fitness effect of gene distance (D), two-locus dynamics under selection were simulated for populations with different D (from 0 to 100 in 1-gene increments) as well as a series of epistasis values (E, from −0.004 to 0.004 in 0.001 increments) between two loci. The recombination frequency (R) between these two loci was estimated from D using the equation below:

$$R = D \times 0.004 + 0.064,$$

which is the linear relationship estimated from the distance between two genes (D) and the empirical recombination frequencies (R) between them, quantified in a previous study (Mancera et al. 2008). The simulation was performed in haploid organisms to align with the analyses of empirical data, in which epistasis values were quantified in haploid yeast (Costanzo et al. 2010, 2016). A and B are wild-type alleles of two di-allelic loci on the same chromosome; a and b are their deleterious alleles, with fitness values of $\omega_{aB} = \omega_{Ab} = 0.992$. The fitness of the wild type ($\omega_{AB}$) was defined as 1, and the epistasis E was defined as $\omega_{ab} - \omega_{aB}\omega_{Ab}$. The initial allele frequencies of a and b were both 0.1. The two loci were initially under linkage equilibrium, and therefore, the frequencies of AB ($X_{AB}$), Ab ($X_{Ab}$), aB ($X_{aB}$), and ab ($X_{ab}$) were 0.81, 0.09, 0.09, and 0.01, respectively. After random mating, selection, and recombination, the frequencies of the four genotypes in the next generation were calculated with the following equations (Nei 1967):

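$$X'_{AB} = \frac{X_{AB}\,\omega_{AB}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{Ab} = \frac{X_{Ab}\,\omega_{Ab}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{aB} = \frac{X_{aB}\,\omega_{aB}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{ab} = \frac{X_{ab}\,\omega_{ab}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$\bar{\omega}$ is the average fitness of a population and was calculated as follows:

$$\bar{\omega} = X_{AB}\,\omega_{AB} + X_{ab}\,\omega_{ab} + X_{aB}\,\omega_{aB} + X_{Ab}\,\omega_{Ab}$$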

For each population, the average fitness ($\bar{\omega}$) and the frequencies of deleterious alleles ($X_a$ and $X_b$) were recorded in each generation of the simulation. The difference between the average fitness of two populations with different D (D = 50 and D = 0) reflects whether linkage is favored by natural selection or not. The frequency difference of the deleterious alleles between these two populations is related to the long-term effects of recombination. We also performed the simulation with a variety of parameter values (supplementary fig. S4, Supplementary Material online).
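The two-locus recursion described in this section can be sketched in a few lines of Python. This is an illustrative reimplementation under the stated parameters (single-mutant fitness 0.992, initial deleterious allele frequency 0.1, R = 0.004 D + 0.064), not the authors' code; the function names are hypothetical, and generation 0 is taken to be the initial population.

```python
def recombination_rate(d):
    """Linear D-to-R relationship quoted in the text."""
    return 0.004 * d + 0.064


def simulate(d, e, w_single=0.992, q=0.1, generations=100):
    """Iterate the haploid two-locus recursion.

    Returns a list of (mean fitness, X_a, X_b) tuples, one per generation,
    starting with generation 0 (the initial population).
    """
    r = recombination_rate(d)
    # Wild type has fitness 1; epistasis is E = w_ab - w_aB * w_Ab.
    w = {"AB": 1.0, "Ab": w_single, "aB": w_single, "ab": w_single * w_single + e}
    # Linkage equilibrium: genotype frequency = product of allele frequencies.
    x = {"AB": (1 - q) ** 2, "Ab": (1 - q) * q, "aB": q * (1 - q), "ab": q * q}
    # AB and ab lose recombinants; Ab and aB gain them.
    sign = {"AB": -1, "Ab": 1, "aB": 1, "ab": -1}

    def record(freq):
        w_bar = sum(freq[g] * w[g] for g in freq)
        return w_bar, freq["aB"] + freq["ab"], freq["Ab"] + freq["ab"]

    trajectory = [record(x)]
    for _ in range(generations):
        w_bar = trajectory[-1][0]
        ld = x["AB"] * w["AB"] * x["ab"] * w["ab"] - x["aB"] * w["aB"] * x["Ab"] * w["Ab"]
        x = {g: x[g] * w[g] / w_bar + sign[g] * r * ld / w_bar ** 2 for g in x}
        trajectory.append(record(x))
    return trajectory


# Compare a tightly linked pair (D = 0) with a loosely linked pair (D = 50).
for e in (-0.004, 0.0, 0.004):
    diff = simulate(50, e)[100][0] - simulate(0, e)[100][0]
    print(f"E = {e:+.3f}: mean fitness (D = 50) - mean fitness (D = 0) = {diff:.3e}")
```

The printed difference is the quantity the text uses to judge whether linkage is favored at a given epistasis value.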

The average fitness of a population at the 100th generation ($\bar{\omega}_{100}$) was used to infer the fitness effect of gene distance at a given epistasis value. For each epistasis value, we could identify an optimal gene distance ($D_{opt}$) as well as a series of permitted gene distances ($D_{permitted}$). The difference in $\bar{\omega}_{100}$ between $D_{permitted}$ and $D_{opt}$ was <10⁻⁷. The mean of all $D_{permitted}$ values was calculated. In addition, our simulation did not introduce new mutations to the population, and therefore, the frequencies of deleterious alleles decreased over generations. Nevertheless, the negative E–D correlation was also observed at the 50th and 200th generations (supplementary fig. S3, Supplementary Material online).
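As a rough sketch of how the optimal and permitted gene distances could be identified, the snippet below scans D from 0 to 100 for one epistasis value and keeps every D whose mean fitness at generation 100 lies within 10⁻⁷ of the best one. The helper names are hypothetical, and the recursion is a compact restatement of the two-locus model with the parameters given in the text.

```python
def mean_fitness_at_generation(d, e, w_single=0.992, q=0.1, generations=100):
    """Mean fitness after `generations` rounds of the haploid two-locus recursion."""
    r = 0.004 * d + 0.064
    w = (1.0, w_single, w_single, w_single * w_single + e)  # AB, Ab, aB, ab
    x = ((1 - q) ** 2, (1 - q) * q, q * (1 - q), q * q)
    signs = (-1, 1, 1, -1)  # AB and ab lose recombinants; Ab and aB gain them
    for _ in range(generations):
        w_bar = sum(xi * wi for xi, wi in zip(x, w))
        ld = x[0] * w[0] * x[3] * w[3] - x[2] * w[2] * x[1] * w[1]
        x = tuple(xi * wi / w_bar + s * r * ld / w_bar ** 2
                  for xi, wi, s in zip(x, w, signs))
    return sum(xi * wi for xi, wi in zip(x, w))


def permitted_distances(e, distances=range(0, 101), tolerance=1e-7):
    """Return (D_opt, list of permitted D, mean of permitted D) for epistasis e."""
    fitness = {d: mean_fitness_at_generation(d, e) for d in distances}
    d_opt = max(fitness, key=fitness.get)
    permitted = [d for d in fitness if fitness[d_opt] - fitness[d] < tolerance]
    return d_opt, permitted, sum(permitted) / len(permitted)


for e in (-0.004, -0.002, 0.0, 0.002, 0.004):
    d_opt, permitted, mean_d = permitted_distances(e)
    print(f"E = {e:+.3f}: D_opt = {d_opt}, {len(permitted)} permitted, mean = {mean_d:.1f}")
```

When E = 0, fitness is multiplicative and recombination has no effect in this model, so every distance is permitted and the mean is 50.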

Toy Motifs

We built a toy motif in a genetic interaction network to simulate the fitness effect of gene distance. In the star-like motif (fig. 2A), a hub gene (A) interacted with nine partners (genes B, C, D, E, F, G, H, I, and J), with nine epistasis values ranging from −0.004 to 0.004 in 0.001 increments. We attempted to place nine partner genes on the same side of the hub gene on a chromosome. Because we focused on the epistasis between the hub gene and the partners, placing the partners on different sides would not affect the results. We fixed the chromosomal location of the hub gene, and the partner genes were placed at D = 20, 30, 40, 50, 60, 70, 80, 90, and 100 from the hub gene. The locations of the nine partners were shuffled while keeping the epistasis values unchanged. The fitness of a gene order was defined as the average $\bar{\omega}_{100}$ for all nine hub-containing gene pairs. Thus, we could calculate the fitness for a total of (9! =) 362,880 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to test whether the simulation result was parameter sensitive (supplementary fig. S5, Supplementary Material online). Here, epistasis values and fitness defects were randomly chosen from the empirical data in previous studies (Costanzo et al. 2010, 2016).

For the all-connected motif (fig. 3A), five genes (A, B, C, D, and E) interacted with each other, with ten epistasis values ranging from −0.0036 to 0.0036 in 0.0008 increments. The five genes were placed on the same chromosome, and the gene locations were set as 1, 19, 65, 93, and 102 so that the maximum D in the motif was 100. The locations of these five genes were shuffled, whereas the epistasis values were kept unchanged. Similar to the star-like motifs, the fitness of a gene order was defined as the average $\bar{\omega}_{100}$ for all ten gene pairs. The fitness was calculated for a total of (5! =) 120 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to determine whether the simulation result was parameter sensitive (supplementary fig. S6, Supplementary Material online).
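The enumeration over gene orders in the all-connected motif might look like the sketch below. Two details are assumptions rather than statements from the text: D is taken as the number of genes lying between two locations (so that locations 1 and 102 give D = 100), and the ten epistasis values are assigned to gene pairs in alphabetical pair order. All names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations, permutations


@lru_cache(maxsize=None)
def w100(d, e, w_single=0.992, q=0.1, generations=100):
    """Mean fitness at generation 100 for one gene pair (haploid two-locus model)."""
    r = 0.004 * d + 0.064
    w = (1.0, w_single, w_single, w_single * w_single + e)  # AB, Ab, aB, ab
    x = ((1 - q) ** 2, (1 - q) * q, q * (1 - q), q * q)
    signs = (-1, 1, 1, -1)
    for _ in range(generations):
        w_bar = sum(xi * wi for xi, wi in zip(x, w))
        ld = x[0] * w[0] * x[3] * w[3] - x[2] * w[2] * x[1] * w[1]
        x = tuple(xi * wi / w_bar + s * r * ld / w_bar ** 2
                  for xi, wi, s in zip(x, w, signs))
    return sum(xi * wi for xi, wi in zip(x, w))


GENES = "ABCDE"
LOCATIONS = (1, 19, 65, 93, 102)
PAIRS = list(combinations(GENES, 2))  # the ten gene pairs of the motif
# Assumed assignment: -0.0036, -0.0028, ..., 0.0036 in alphabetical pair order.
EPISTASIS = {pair: round(-0.0036 + 0.0008 * k, 4) for k, pair in enumerate(PAIRS)}


def order_fitness(location_of):
    """Average w100 over all ten pairs for one gene-to-location assignment."""
    total = 0.0
    for g1, g2 in PAIRS:
        d = abs(location_of[g1] - location_of[g2]) - 1  # assumed definition of D
        total += w100(d, EPISTASIS[(g1, g2)])
    return total / len(PAIRS)


def rank_gene_orders():
    """Return (fitness, gene order) for all 120 orders, fittest first."""
    results = []
    for locations in permutations(LOCATIONS):
        location_of = dict(zip(GENES, locations))
        order = "".join(sorted(GENES, key=location_of.get))
        results.append((order_fitness(location_of), order))
    return sorted(results, reverse=True)


ranked = rank_gene_orders()
print("fittest order:", ranked[0][1], "least fit order:", ranked[-1][1])
```

Caching `w100` keeps the enumeration cheap, because only a handful of distinct (D, E) combinations occur across the 120 orders.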

Epistasis

Epistasis values were retrieved from the studies of Costanzo et al. (Costanzo et al. 2010, 2016). Because the epistasis values of the overlapping gene pairs in these studies were highly correlated (r = 0.71, P < 10⁻¹⁰⁰, Pearson's correlation, N = 2,604,539), we merged their epistasis values. A pair of genes was filtered if (a) the epistasis value between their null mutations was not a number (NaN) or (b) the epistasis values of the same null mutation pair had opposite signs in reciprocal crosses or in different studies. We also removed the gene pairs with the null mutation in at least one gene resulting in a higher fitness value than that of the wild type. This filtering was performed because only a small number of antagonistic pleiotropic genes were detected in rich media (Qian et al. 2012). Thus, the elevated fitness observed upon gene deletion in Costanzo et al. was likely due to inaccurate estimation of fitness. If epistasis between a pair of genes was examined multiple times, we used the epistasis value with the smallest P value. Following Costanzo et al. (2010), epistasis values with P < 0.05 were classified as significant.
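The filtering rules above can be illustrated with a short sketch. The record layout (`gene_a`, `gene_b`, `epistasis`, `p_value`, `fitness_a`, `fitness_b`) is hypothetical and does not reflect the format of the Costanzo et al. files; wild-type fitness is assumed to be 1, and a pair is dropped if any of its measurements is NaN.

```python
import math
from collections import defaultdict


def merge_epistasis(records, alpha=0.05):
    """Apply the filters and return {gene pair: (epistasis, is_significant)}."""
    by_pair = defaultdict(list)
    for rec in records:
        # Reciprocal crosses (A x B and B x A) map to the same unordered pair.
        pair = tuple(sorted((rec["gene_a"], rec["gene_b"])))
        by_pair[pair].append(rec)

    merged = {}
    for pair, recs in by_pair.items():
        values = [r["epistasis"] for r in recs]
        if any(math.isnan(v) for v in values):
            continue  # (a) missing epistasis value
        if min(values) < 0 < max(values):
            continue  # (b) opposite signs across crosses or studies
        if any(r["fitness_a"] > 1.0 or r["fitness_b"] > 1.0 for r in recs):
            continue  # a single mutant appears fitter than the wild type
        best = min(recs, key=lambda r: r["p_value"])  # smallest P value wins
        merged[pair] = (best["epistasis"], best["p_value"] < alpha)
    return merged


toy_records = [
    {"gene_a": "g1", "gene_b": "g2", "epistasis": 0.02, "p_value": 0.20,
     "fitness_a": 0.95, "fitness_b": 0.90},
    {"gene_a": "g2", "gene_b": "g1", "epistasis": 0.03, "p_value": 0.01,
     "fitness_a": 0.90, "fitness_b": 0.95},
    {"gene_a": "g3", "gene_b": "g4", "epistasis": 0.05, "p_value": 0.01,
     "fitness_a": 0.80, "fitness_b": 0.99},
    {"gene_a": "g4", "gene_b": "g3", "epistasis": -0.04, "p_value": 0.02,
     "fitness_a": 0.99, "fitness_b": 0.80},
    {"gene_a": "g5", "gene_b": "g6", "epistasis": float("nan"), "p_value": 0.5,
     "fitness_a": 0.97, "fitness_b": 0.96},
    {"gene_a": "g7", "gene_b": "g8", "epistasis": -0.01, "p_value": 0.03,
     "fitness_a": 1.02, "fitness_b": 0.90},
]
print(merge_epistasis(toy_records))
```

Only the g1–g2 pair survives in this toy input: the other pairs are removed by the sign, NaN, and fitness filters, respectively.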

In addition to the two data sets from Costanzo et al., several other studies have also estimated epistasis values using high-throughput strategies (Boone et al. 2007). In principle, we could have included all of them in this study. However, for practical reasons, these data sets were not suitable for our study. First, it would not be appropriate to combine other data sets with that of Costanzo et al. because epistasis values from multiple studies potentially have different sources of error, especially if they followed different protocols. Second, the sample sizes of epistasis data of linked genes from other studies were not sufficiently large to examine the E–D correlation. Because the data set from Costanzo et al. represents the largest available so far, containing 27 times more significant epistasis than the total of all other studies, we focused on these epistasis data in this study.

Recombination Frequency

Genotypes of meiosis products from 46 tetrads were downloaded from Mancera et al. (2008) at http://www.ebi.ac.uk/huber/recombination. A pair of DNA markers was filtered if the genotypes were available in less than half of the spores (<92 spores). For each pair of markers, spores with parental genotypes ($N_p$) and nonparental genotypes ($N_{np}$) were counted. The recombination frequency of a pair of markers ($R_m$) was calculated as follows:

$$R_m = \frac{N_{np}}{N_p + N_{np}}$$

The recombination frequency between a pair of genes ($R_g$) was defined as the average recombination frequency of all marker pairs within the pair of genes, which was calculated as follows:

$$R_g = \frac{\sum_{i=1} \sum_{j=1} R_{m,ij}}{ij}$$

Subscripts i and j represent the ith and jth markers in each gene, respectively.
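A minimal sketch of the two recombination-frequency calculations is given below. The data layout is an assumption: each marker is a list with one genotype call per spore (parent of origin coded as 0 or 1, or `None` when missing), and marker pairs that fail the coverage filter are simply skipped when averaging. The function names are hypothetical.

```python
def marker_pair_rf(marker1, marker2, min_spores=92):
    """R_m = N_np / (N_p + N_np) for one marker pair, or None if filtered."""
    calls = [(a, b) for a, b in zip(marker1, marker2)
             if a is not None and b is not None]
    if len(calls) < min_spores:
        return None  # genotypes available in too few spores
    n_nonparental = sum(1 for a, b in calls if a != b)
    return n_nonparental / len(calls)


def gene_pair_rf(markers_gene1, markers_gene2, min_spores=92):
    """R_g: average R_m over all marker pairs between two genes."""
    values = [marker_pair_rf(m1, m2, min_spores)
              for m1 in markers_gene1 for m2 in markers_gene2]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


# Toy example with four spores and a relaxed coverage threshold.
gene1 = [[0, 0, 1, 1], [0, 0, 1, 1]]
gene2 = [[0, 1, 1, 1], [0, 1, 1, None]]
print(gene_pair_rf(gene1, gene2, min_spores=3))
```

In the toy call, one spore out of four carries a nonparental combination for the fully genotyped markers, and one out of three for the pairs involving the marker with a missing call.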

Identification of Duplicate Genes

All-against-all BlastP was performed to search for duplicate genes. Gene pairs with E values <10⁻¹⁰ were defined as duplicate genes.

Expression Similarity

Expression profiles of 6,359 genes in 40 studies were compiled by a previous study (Kafri et al. 2005). Pearson's correlation coefficient for two genes was calculated within each study in the compiled data set, and expression similarity between these two genes was defined as the average correlation coefficient among studies.
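The expression similarity defined above could be computed along these lines. The input layout (one dictionary per study, mapping gene names to expression vectors) is hypothetical, and the handling of studies in which a gene is missing or has constant expression (they are skipped) is an assumption not stated in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient, or None if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def expression_similarity(gene1, gene2, studies):
    """Average within-study Pearson correlation between two genes."""
    coefficients = []
    for profiles in studies:
        if gene1 not in profiles or gene2 not in profiles:
            continue  # assumed: skip studies lacking either gene
        r = pearson(profiles[gene1], profiles[gene2])
        if r is not None:
            coefficients.append(r)
    return sum(coefficients) / len(coefficients) if coefficients else None


toy_studies = [
    {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 4.0, 6.0]},  # perfectly correlated
    {"g1": [1.0, 2.0, 3.0], "g2": [3.0, 2.0, 1.0]},  # perfectly anticorrelated
    {"g1": [0.5, 0.7, 0.9]},                         # g2 not measured
]
print(expression_similarity("g1", "g2", toy_studies))
```

Correlating within each study before averaging avoids mixing expression values measured on different platforms or scales.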

Other Yeast Functional Genomic Data

Protein–protein interaction data were obtained from the SGD. Three-dimensional chromatin colocalization data were retrieved from Duan et al. (2010). Genome-wide gene expression noise data were downloaded from Newman et al. (2006). A list of essential genes was downloaded from the Database of Essential Genes (DEG, version 10.6) (Luo et al. 2014). Semantic similarity of Gene Ontology (GO) terms was calculated using the R package GOSemSim (version 1.22.0) (Yu et al. 2010).

Shuffling of Gene Positions or Epistasis Values

To obtain the null distribution of E–D correlation coefficients, we shuffled gene positions while keeping the genome structure of S. cerevisiae unchanged (i.e., number of chromosomes and number of genes on each chromosome). The epistasis value of each gene pair was also kept unchanged. Gene distances were then calculated based on the new genomic locations.
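One way to implement this shuffling is sketched below. The genome is represented as a dictionary mapping chromosome names to ordered gene lists (a hypothetical layout), and gene distance is assumed to be the number of genes lying between two genes on the same chromosome; unlinked pairs return `None`.

```python
import random


def shuffle_gene_positions(genome, rng):
    """Reassign all genes at random while preserving chromosome sizes."""
    genes = [gene for chromosome in genome.values() for gene in chromosome]
    rng.shuffle(genes)
    shuffled, start = {}, 0
    for name, chromosome in genome.items():
        shuffled[name] = genes[start:start + len(chromosome)]
        start += len(chromosome)
    return shuffled


def gene_distance(genome, gene1, gene2):
    """Number of intervening genes (assumed definition of D); None if unlinked."""
    for chromosome in genome.values():
        if gene1 in chromosome and gene2 in chromosome:
            return abs(chromosome.index(gene1) - chromosome.index(gene2)) - 1
    return None


toy_genome = {"I": ["a", "b", "c"], "II": ["d", "e"]}
rng = random.Random(0)  # fixed seed so the toy example is reproducible
for _ in range(3):
    permuted = shuffle_gene_positions(toy_genome, rng)
    print(permuted, gene_distance(permuted, "a", "c"))
```

Repeating the shuffle many times and recomputing the E–D correlation on each permuted genome gives the null distribution against which the observed coefficient is compared.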

To estimate the relative importance of different epistasis categories (positive and negative) in shaping the E–D correlation, we shuffled epistasis values of one category while keeping the other category unchanged. Gene positions were not changed during this process.

Estimating the Predictive Accuracy of Gene Order

To examine the accuracy of our prediction of gene order, we estimated the success rate by chance (gray bars in fig. 6 and supplementary figs. S8 and S9, Supplementary Material online) by randomly assigning the positions of genes 1,000 times. For each permutation, the average proportion of successful predictions among motifs was estimated. The mean and standard deviation among the 1,000 permutations were then calculated.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Author Contributions

Y.-F.Y., W.C., and W.Q. conceived the research; Y.-F.Y., W.C., and W.Q. analyzed the data; and Y.-F.Y., S.W., and W.Q. wrote the article.

Acknowledgments

We thank Bin He and Weiwei Zhai for discussion. This work was supported by grants from the National Natural Science Foundation of China to W.Q. (91731302).

References

Batada NN, Hurst LD. 2007. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet. 39(8):945–949.

Boone C, Bussey H, Andrews BJ. 2007. Exploring genetic interactions and networks with yeast. Nat Rev Genet. 8(6):437–449.

Boutanaev AM, Kalmykova AI, Shevelyov YY, Nurminsky DI. 2002. Large clusters of co-expressed genes in the Drosophila genome. Nature 420(6916):666–669.

Byrne KP, Wolfe KH. 2005. The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 15(10):1456–1461.

Charlesworth B. 1990. Mutation-selection balance and the evolutionary advantage of sex and recombination. Genet Res. 55(3):199–221.

Charlesworth D, Charlesworth B. 2011. Mimicry: the hunting of the supergene. Curr Biol. 21(20):R846–R848.

Chen X, Zhang J. 2016. The genomic landscape of position effects on protein expression level and noise in yeast. Cell Syst. 2(5):347–354.

Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, et al. 1998. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2(1):65–73.

Cohen BA, Mitra RD, Hughes JD, Church GM. 2000. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet. 26(2):183–186.

Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, et al. 2010. The genetic landscape of a cell. Science 327(5964):425–431.

Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, et al. 2016. A global genetic interaction network maps a wiring diagram of cellular function. Science 353(6306):aaf1420.

Dean EJ, Davis JC, Davis RW, Petrov DA. 2008. Pervasive and persistent redundancy among duplicated genes in yeast. PLoS Genet. 4(7):e1000113.


DeLuna A, Vetsigian K, Shoresh N, Hegreness M, Colon-Gonzalez M, Chao S, Kishony R. 2008. Exposing the fitness contribution of duplicated genes. Nat Genet. 40(5):676–681.

Dixon SJ, Fedyshyn Y, Koh JL, Prasad TS, Chahwan C, Chua G, Toufighi K, Baryshnikova A, Hayles J, Hoe KL, et al. 2008. Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc Natl Acad Sci U S A. 105(43):16653–16658.

Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, Shendure J, Fields S, Blau CA, Noble WS. 2010. A three-dimensional model of the yeast genome. Nature 465(7296):363–367.

Eshel I, Feldman MW. 1970. On the evolutionary effect of recombination. Theor Popul Biol. 1(1):88–100.

Feldman MW, Christiansen FB, Brooks LD. 1980. Evolution of recombination in a constant environment. Proc Natl Acad Sci U S A. 77(8):4838–4841.

Ghanbarian AT, Hurst LD. 2015. Neighboring genes show correlated evolution in gene expression. Mol Biol Evol. 32(7):1748–1766.

Gordon JL, Byrne KP, Wolfe KH. 2009. Additions, losses, and rearrangements on the evolutionary route from a reconstructed ancestor to the modern Saccharomyces cerevisiae genome. PLoS Genet. 5(5):e1000485.

He X, Qian W, Wang Z, Li Y, Zhang J. 2010. Prevalent positive epistasis in Escherichia coli and Saccharomyces cerevisiae metabolic networks. Nat Genet. 42(3):272–276.

Hurst LD, Pal C, Lercher MJ. 2004. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 5(4):299–310.

Kafri R, Bar-Even A, Pilpel Y. 2005. Transcription control reprogramming in genetic backup circuits. Nat Genet. 37(3):295–299.

Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428(6983):617–624.

Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423(6937):241–254.

Kirkpatrick M. 2017. The evolution of genome structure by natural and sexual selection. J Hered. 108(1):3–11.

Kondrashov AS. 1982. Selection against harmful mutations in large sexual and asexual populations. Genet Res. 40(3):325–332.

Kondrashov AS. 1988. Deleterious mutations and the evolution of sexual reproduction. Nature 336(6198):435–440.

Kouyos RD, Silander OK, Bonhoeffer S. 2007. Epistasis between deleterious mutations and the evolution of recombination. Trends Ecol Evol. 22(6):308–315.

Lang GI, Botstein D. 2011. A test of the coordinated expression hypothesis for the origin and maintenance of the GAL cluster in yeast. PLoS One 6(9):e25290.

Lawrence J. 1999. Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev. 9(6):642–648.

Lawrence JG. 2002. Shared strategies in gene organization among prokaryotes and eukaryotes. Cell 110(4):407–413.

Lercher MJ, Urrutia AO, Hurst LD. 2002. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 31(2):180–183.

Liao BY, Zhang J. 2008. Coexpression of linked genes in Mammalian genomes is generally disadvantageous. Mol Biol Evol. 25(8):1555–1565.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458(7236):337–341.

Luo H, Lin Y, Gao F, Zhang CT, Zhang R. 2014. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res. 42(Database issue):D574–D580.

Mancera E, Bourgon R, Brozzi A, Huber W, Steinmetz LM. 2008. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454(7203):479–485.

Musso G, Costanzo M, Huangfu M, Smith AM, Paw J, San Luis BJ, Boone C, Giaever G, Nislow C, Emili A, et al. 2008. The extensive and condition-dependent nature of epistasis among whole-genome duplicates in yeast. Genome Res. 18(7):1092–1099.

Nei M. 1967. Modification of linkage intensity by natural selection. Genetics 57(3):625–641.

Nei M. 1969. Linkage modifications and sex difference in recombination. Genetics 63(3):681–699.

Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, Weissman JS. 2006. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441(7095):840–846.

Pal C, Hurst LD. 2003. Evidence for co-evolution of gene order and recombination rate. Nat Genet. 33(3):392–395.

Phillips PC. 2008. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 9(11):855–867.

Qian W, Liao BY, Chang AY, Zhang J. 2010. Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet. 26(10):425–430.

Qian W, Ma D, Xiao C, Wang Z, Zhang J. 2012. The genomic landscape and evolutionary resolution of antagonistic pleiotropy in yeast. Cell Rep. 2(5):1399–1410.

Qian W, Zhang J. 2008. Evolutionary dynamics of nematode operons: easy come, slow go. Genome Res. 18(3):412–421.

Qian W, Zhang J. 2014. Genomic evidence for adaptation by gene duplication. Genome Res. 24(8):1356–1362.

Schacherer J, Shapiro JA, Ruderfer DM, Kruglyak L. 2009. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458(7236):342–345.

Slot JC, Rokas A. 2010. Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi. Proc Natl Acad Sci U S A. 107(22):10136–10141.

Spellman PT, Rubin GM. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol. 1(1):5.

Tischler J, Lehner B, Chen N, Fraser AG. 2006. Combinatorial RNA interference in Caenorhabditis elegans reveals that redundancy between gene duplicates can be maintained for more than 80 million years of evolution. Genome Biol. 7(8):R69.

Tischler J, Lehner B, Fraser AG. 2008. Evolutionary plasticity of genetic interaction networks. Nat Genet. 40(4):390–391.

Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, et al. 2001. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294(5550):2364–2368.

van Leeuwen J, Pons C, Mellor JC, Yamaguchi TN, Friesen H, Koschwanez J, Usaj MM, Pechlaner M, Takar M, Usaj M, et al. 2016. Exploring genetic suppression interactions on a global scale. Science 354(6312):aag0839.

VanderSluis B, Bellay J, Musso G, Costanzo M, Papp B, Vizeacoumar FJ, Baryshnikova A, Andrews B, Boone C, Myers CL. 2010. Genetic interactions reveal the evolutionary trajectories of duplicate genes. Mol Syst Biol. 6:429.

Vavouri T, Semple JI, Lehner B. 2008. Widespread conservation of genetic redundancy during a billion years of eukaryotic evolution. Trends Genet. 24(10):485–488.

Wagner A. 2005. Energy constraints on the evolution of gene expression. Mol Biol Evol. 22(6):1365–1374.

Williams EJ, Bowles DJ. 2004. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 14(6):1060–1067.

Wong S, Wolfe KH. 2005. Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat Genet. 37(7):777–782.

Wu CI, Ting CT. 2004. Genes and speciation. Nat Rev Genet. 5(2):114–122.

Yeaman S. 2013. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proc Natl Acad Sci U S A. 110(19):E1743–E1751.

Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S. 2010. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26(7):976–978.

Zaslaver A, Baugh LR, Sternberg PW. 2011. Metazoan operons accelerate recovery from growth-arrested states. Cell 145(6):981–992.


multiple gene pairs may often involve the rearrangementof genes on a chromosome

It is also worth noting that synthetic genetic array (SGA)the experimental strategy used to generate double mutants isbased on recombination between two null alleles (Tong et al2001 Costanzo et al 2010) Thus double mutants for linkedgenes may have a smaller initial frequency which may lead toinaccuracy in estimating fitness values for these mutants Todetermine whether such potential experimental bias has anyimpact on our results we calculated the partial EndashD correla-tion after controlling for the double mutant fitness valuesAgain we observed a strong negative EndashD correlation (partialqfrac14017 Pfrac14 13 109 Nfrac14 1254) This result was notunexpected because the fitness of double mutants andgene distance was only weakly correlated (qfrac14 0086Pfrac14 0002 Nfrac14 1254) More discussion on the caveats inthe analysis of empirical data is included in supplementarynote S2 Supplementary Material online

In a recently study 1800 genetic suppression interac-tions were identified in the budding yeast (van Leeuwen et al2016) which can be regarded as an extreme form of positivegenetic interactions Among them nine gene pairs are tightlylinked (D 2 supplementary table S3 SupplementaryMaterial online) This number is significantly greater thanthe random expectation (permutation test Pfrac14 001) againsupporting our model

The selection on clusters of locally adapted alleles whichsometimes were captured in chromosomal inversions hasbeen studied previously (Yeaman 2013 Kirkpatrick 2017) Inour study we proposed a hypothesis (the negative EndashD cor-relation) on deleterious alleles and tested this hypothesis withanalyses on empirical data and simulation both in the con-text of genetic interaction networks Therefore our resultsprovide additional clues for understanding the basic princi-ples of genome organization In particular the origin andmaintenance of clusters of genes in the same metabolic path-way which have puzzled scientists for years (Wong and Wolfe2005 Slot and Rokas 2010 Lang and Botstein 2011) could bewell explained by genetic interactions Genes in a linear met-abolic pathway exhibit positive epistasis because 1) doublemutants and single mutants have identical impacts ondestroying the function of the pathway (He et al 2010)and 2) the accumulation of deleterious intermediate prod-ucts resulting from a loss-of-function mutation in a down-stream gene of a metabolic pathway may be prevented by aloss-of-function mutation in an upstream gene (Wong andWolfe 2005 Slot and Rokas 2010 Lang and Botstein 2011)Because genetic linkage is advantageous among genes withpositive epistasis the clustering of genes in the same linearmetabolic pathway such as genes in the galactose utilizationpathway or allantoin degradation pathway is favored by nat-ural selection (Wong and Wolfe 2005 Slot and Rokas 2010)

Many factors have been reported to influence the evolu-tion of gene order such as tandem gene duplication positioneffects on gene expression noise (Batada and Hurst 2007Chen and Zhang 2016) coordinated gene expression amongneighboring genes (Cho et al 1998 Cohen et al 2000Boutanaev et al 2002 Spellman and Rubin 2002

Williams and Bowles 2004) clustering of functionally relatedgenes (Wong and Wolfe 2005 Slot and Rokas 2010) amongmany others In this study we provided evidence that inaddition to these factors the genetic interaction networkalso played an important role in driving the evolution ofgene order Because the empirical data of epistasis (fitnessvalues) are available in the budding yeast an evolutionarysimulation integrating many genetic interactions is possiblewhich makes epistasis unique among all factors driving theevolution of gene order Based on the empirical data of yeastgenetic interaction network our simulation indicates that theselective coefficient is on the order of 107 suggesting thatepistasis may play an important role in determining geneorder in yeast a species with a relative large effective popu-lation size (107) A negative EndashD correlation is expected to beobserved in species with a smaller effective population sizeonly if the range of epistasis is larger than that in yeast (figs 2and 3) Because the genome-wide empirical data of epistasisare unavailable in species with a smaller effective populationsize (eg humans and flies) it requires further investigationswhether the genetic interaction network plays a role in theevolution of gene order in these species in the future

Materials and Methods

Genomes

The genome annotation of S. cerevisiae was downloaded from the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org, version R64). The gene order of the reconstructed ancestor before the WGD (Gordon et al. 2009) was downloaded from the Yeast Gene Order Browser (YGOB) (Byrne and Wolfe 2005).

Simulation

To examine the fitness effect of gene distance (D), two-locus dynamics under selection were simulated for populations with different D (from 0 to 100 in 1-gene increments) as well as a series of epistasis values (E, from −0.004 to 0.004 in 0.001 increments) between two loci. The recombination frequency (R) between these two loci was estimated from D using the equation below:

$$R = D \times 0.004 + 0.064$$

which is the linear relationship estimated from the distance between two genes (D) and the empirical recombination frequencies (R) between them quantified in a previous study (Mancera et al. 2008). The simulation was performed in haploid organisms to be consistent with the analyses of empirical data, in which epistasis values were quantified in haploid yeast (Costanzo et al. 2010, 2016). A and B are wild-type alleles of two di-allelic loci on the same chromosome; a and b are their deleterious alleles, with fitness values of $\omega_{aB} = \omega_{Ab} = 0.992$. The fitness of the wild type ($\omega_{AB}$) was defined as 1, and the epistasis E was defined as $\omega_{ab} - \omega_{aB}\,\omega_{Ab}$. The initial allele frequencies of a and b were both 0.1. The two loci were initially under linkage equilibrium, and therefore the frequencies of AB ($X_{AB}$), Ab ($X_{Ab}$), aB ($X_{aB}$), and ab ($X_{ab}$) were 0.81, 0.09, 0.09, and 0.01, respectively. After random mating,

Epistasis-Driven Evolution of Gene Order. doi:10.1093/molbev/msx264. MBE


selection, and recombination, the frequencies of the four genotypes in the next generation were calculated with the following equations (Nei 1967):

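$$X'_{AB} = \frac{X_{AB}\,\omega_{AB}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{Ab} = \frac{X_{Ab}\,\omega_{Ab}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{aB} = \frac{X_{aB}\,\omega_{aB}}{\bar{\omega}} + \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

$$X'_{ab} = \frac{X_{ab}\,\omega_{ab}}{\bar{\omega}} - \frac{R\,(X_{AB}\,\omega_{AB}\,X_{ab}\,\omega_{ab} - X_{aB}\,\omega_{aB}\,X_{Ab}\,\omega_{Ab})}{\bar{\omega}^{2}}$$

Here $X$ and $\omega$ with a genotype subscript denote the frequency and fitness of that genotype, and $\bar{\omega}$ is the average fitness of a population, which was calculated as follows:

$$\bar{\omega} = X_{AB}\,\omega_{AB} + X_{ab}\,\omega_{ab} + X_{aB}\,\omega_{aB} + X_{Ab}\,\omega_{Ab}$$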

For each population, the average fitness ($\bar{\omega}$) and the frequencies of deleterious alleles ($X_a$ and $X_b$) were recorded in each generation of the simulation. The difference between the average fitness of two populations with different D (D = 50 and D = 0) reflects whether linkage is favored by natural selection or not. The frequency difference of the deleterious alleles between these two populations is related to the long-term effects of recombination. We also performed the simulation with a variety of parameter values (supplementary fig. S4, Supplementary Material online).
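The recursion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`recombination_frequency`, `next_generation`, `mean_fitness_at`) are hypothetical, and "the 100th generation" is assumed to mean the state after 100 applications of the recursion.

```python
def recombination_frequency(d):
    """Map gene distance D (in genes) to recombination frequency R."""
    return d * 0.004 + 0.064


def next_generation(x, w, r):
    """One round of selection and recombination in a haploid two-locus model.

    x and w map the genotypes "AB", "Ab", "aB", "ab" to frequency and fitness.
    """
    w_bar = sum(x[g] * w[g] for g in x)  # mean fitness of the population
    # Post-selection disequilibrium term, scaled by R and squared mean fitness.
    shift = r * (x["AB"] * w["AB"] * x["ab"] * w["ab"]
                 - x["aB"] * w["aB"] * x["Ab"] * w["Ab"]) / w_bar ** 2
    return {
        "AB": x["AB"] * w["AB"] / w_bar - shift,
        "Ab": x["Ab"] * w["Ab"] / w_bar + shift,
        "aB": x["aB"] * w["aB"] / w_bar + shift,
        "ab": x["ab"] * w["ab"] / w_bar - shift,
    }


def mean_fitness_at(d, epistasis, generation=100, single_fitness=0.992, q0=0.1):
    """Mean fitness after `generation` rounds, starting at linkage equilibrium.

    Assumption: generation t is the state after t recursion steps.
    """
    w = {"AB": 1.0, "Ab": single_fitness, "aB": single_fitness,
         # Epistasis is defined as E = w_ab - w_aB * w_Ab.
         "ab": epistasis + single_fitness * single_fitness}
    p0 = 1.0 - q0
    x = {"AB": p0 * p0, "Ab": p0 * q0, "aB": q0 * p0, "ab": q0 * q0}
    r = recombination_frequency(d)
    for _ in range(generation):
        x = next_generation(x, w, r)
    return sum(x[g] * w[g] for g in x)


# Positive epistasis: compare tight linkage (D = 0) with loose linkage (D = 50).
advantage = mean_fitness_at(0, 0.004) - mean_fitness_at(50, 0.004)
print(f"mean fitness advantage of D = 0 over D = 50 at E = 0.004: {advantage:.3e}")
```

With E = 0 the disequilibrium term stays at zero when the population starts at linkage equilibrium, so in this sketch the mean fitness does not depend on D in that case.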

The average fitness of a population at the 100th generation ($\bar{\omega}_{100}$) was used to infer the fitness effect of gene distance at a given epistasis value. For each epistasis value, we could identify an optimal gene distance ($D_{opt}$) as well as a series of permitted gene distances ($D_{permitted}$), for which the difference in $\bar{\omega}_{100}$ between $D_{permitted}$ and $D_{opt}$ was $< 10^{-7}$. The mean of all $D_{permitted}$ values was calculated. In addition, our simulation did not introduce new mutations to the population, and therefore the frequencies of deleterious alleles decreased over generations. Nevertheless, the negative E–D correlation was also observed at the 50th and 200th generations (supplementary fig. S3, Supplementary Material online).
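The selection of the optimal and permitted distances can be illustrated as follows. This is a sketch only: the function name `permitted_distances` and the fitness values in `toy` are hypothetical and are not simulation output.

```python
def permitted_distances(fitness_by_distance, tolerance=1e-7):
    """Return (D_opt, sorted permitted distances, mean permitted distance).

    fitness_by_distance maps each gene distance D to the mean fitness
    recorded for that D at one fixed epistasis value.
    """
    d_opt = max(fitness_by_distance, key=fitness_by_distance.get)
    best = fitness_by_distance[d_opt]
    # A distance is permitted if its fitness is within `tolerance` of the best.
    permitted = sorted(d for d, f in fitness_by_distance.items()
                       if best - f < tolerance)
    return d_opt, permitted, sum(permitted) / len(permitted)


# Hypothetical fitness values, for illustration only.
toy = {0: 0.99950000, 1: 0.99949996, 2: 0.99949991, 3: 0.99949970}
d_opt, permitted, mean_permitted = permitted_distances(toy)
print(d_opt, permitted, mean_permitted)
```

Note that the optimal distance itself always counts as permitted in this sketch, because its difference from the optimum is zero.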

Toy Motifs

We built a toy motif in a genetic interaction network to simulate the fitness effect of gene distance. In the star-like motif (fig. 2A), a hub gene (A) interacted with nine partners (genes B, C, D, E, F, G, H, I, and J) with nine epistasis values ranging from −0.004 to 0.004 in 0.001 increments. We placed all nine partner genes on the same side of the hub gene on a chromosome. Because we focused on the epistasis between the hub gene and the partners, placing the partners on different sides would not affect the results. We fixed the chromosomal location of the hub gene, and the partner genes were placed at D = 20, 30, 40, 50, 60, 70, 80, 90, and 100 from the hub gene. The locations of the nine partners were shuffled while keeping the epistasis values unchanged. The fitness of a gene order was defined as the average of the generation-100 mean fitness ($\bar{\omega}_{100}$) for all nine hub-containing gene pairs. Thus, we could calculate the fitness for a total of 9! = 362,880 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to test whether the simulation result was parameter sensitive (supplementary fig. S5, Supplementary Material online). Here, epistasis values and fitness defects were randomly chosen from the empirical data in previous studies (Costanzo et al. 2010, 2016).
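The enumeration of gene orders in the star-like motif can be sketched as below. The names `rank_star_orders` and `toy_pair_fitness` are hypothetical, and `toy_pair_fitness` is an invented placeholder for the simulated generation-100 mean fitness of a gene pair; it is not derived from the simulation. Four partners are used instead of nine so that the example enumerates only 4! = 24 orders.

```python
from itertools import permutations


def rank_star_orders(epistasis_values, distances, pair_fitness):
    """Score every assignment of partner distances to epistasis values.

    Returns a list of (order_fitness, assignment) sorted from best to worst,
    where assignment[i] is the distance from the hub given to the partner
    whose epistasis with the hub is epistasis_values[i].
    """
    scored = []
    for assignment in permutations(distances):
        # Fitness of a gene order: average over all hub-containing pairs.
        fitness = sum(pair_fitness(e, d)
                      for e, d in zip(epistasis_values, assignment)) / len(assignment)
        scored.append((fitness, assignment))
    scored.sort(reverse=True)
    return scored


def toy_pair_fitness(e, d):
    """Hypothetical stand-in for the simulated mean fitness of one gene pair."""
    return 1.0 - 1e-3 * e * d


epistasis_values = [-0.004, -0.001, 0.001, 0.004]
distances = [20, 50, 80, 100]
ranking = rank_star_orders(epistasis_values, distances, toy_pair_fitness)
best_fitness, best_assignment = ranking[0]
print(len(ranking), best_assignment)
```

Under this placeholder scoring, the best order places the partner with the most positive epistasis closest to the hub. In an actual analysis, `pair_fitness` would be replaced by a lookup of simulated values for each (E, D) combination.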

For the all-connected motif (fig. 3A), five genes (A, B, C, D, and E) interacted with each other with ten epistasis values ranging from −0.0036 to 0.0036 in 0.0008 increments. The five genes were placed on the same chromosome, and the gene locations were set as 1, 19, 65, 93, and 102, so that the maximum D in the motif was 100. The locations of these five genes were shuffled, whereas the epistasis values were kept unchanged. Similar to the star-like motif, the fitness of a gene order was defined as the average of the generation-100 mean fitness ($\bar{\omega}_{100}$) for all ten gene pairs. The fitness was calculated for a total of 5! = 120 possible gene orders. Other parameters (epistasis, fitness defect, D, and initial allele frequency) were also used to determine whether the simulation result was parameter sensitive (supplementary fig. S6, Supplementary Material online).
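For the all-connected motif, every gene pair contributes, so the distance of each pair depends on where both genes are placed. The sketch below makes several assumptions that are not stated in the text: D is taken as the number of genes lying between two locations (which gives a maximum of 100 for locations 1 and 102), the ten epistasis values are assigned to the ten pairs in an arbitrary order, and `toy_pair_fitness` is an invented placeholder for the simulated pair fitness.

```python
from itertools import combinations, permutations

GENES = ["A", "B", "C", "D", "E"]
LOCATIONS = [1, 19, 65, 93, 102]
# Ten values from -0.0036 to 0.0036 in 0.0008 increments, assigned to the
# ten gene pairs in an arbitrary (assumed) order.
EPISTASIS = {pair: round(-0.0036 + 0.0008 * k, 4)
             for k, pair in enumerate(combinations(GENES, 2))}


def gene_distance(loc1, loc2):
    """Assumed definition: number of genes lying between two locations."""
    return abs(loc1 - loc2) - 1


def order_fitness(order, pair_fitness):
    """Average pair fitness over all ten pairs for one placement of the genes.

    order[i] is the gene placed at LOCATIONS[i].
    """
    location = dict(zip(order, LOCATIONS))
    values = [pair_fitness(e, gene_distance(location[g1], location[g2]))
              for (g1, g2), e in EPISTASIS.items()]
    return sum(values) / len(values)


def toy_pair_fitness(e, d):
    """Hypothetical stand-in for the simulated mean fitness of one gene pair."""
    return 1.0 - 1e-3 * e * d


scores = {order: order_fitness(order, toy_pair_fitness)
          for order in permutations(GENES)}
best_order = max(scores, key=scores.get)
print(len(scores), best_order)
```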

Epistasis

Epistasis values were retrieved from the studies of Costanzo et al. (2010, 2016). Because the epistasis values of the overlapping gene pairs in these studies were highly correlated (r = 0.71, P < $10^{-100}$, Pearson's correlation, N = 2,604,539), we merged their epistasis values. A pair of genes was filtered out if (a) the epistasis value between their null mutations was not a number (NaN), or (b) the epistasis values of the same null mutation pair had opposite signs in reciprocal crosses or in different studies. We also removed the gene pairs in which the null mutation of at least one gene resulted in a higher fitness value than that of the wild type. This filtering was performed because only a small number of antagonistic pleiotropic genes were detected in rich media (Qian et al. 2012). Thus, the elevated fitness observed upon gene deletion in Costanzo et al. was likely due to inaccurate estimation of fitness. If epistasis between a pair of genes was examined multiple times, we used the epistasis value with the smallest P value. Following Costanzo et al. (2010), epistasis values with P < 0.05 were classified as significant.
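The filtering and merging rules above could be expressed roughly as follows. This is a sketch with a hypothetical record layout `(gene1, gene2, epistasis, p_value, fitness1, fitness2)`; it assumes that a pair is dropped if any of its measurements is NaN, and that wild-type fitness is 1.

```python
import math
from collections import defaultdict


def merge_epistasis(records, alpha=0.05):
    """Filter and merge repeated epistasis measurements per gene pair.

    Returns {frozenset({gene1, gene2}): (epistasis, p_value, is_significant)}.
    """
    by_pair = defaultdict(list)
    for g1, g2, e, p, f1, f2 in records:
        # Reciprocal crosses (g1, g2) and (g2, g1) share one unordered key.
        by_pair[frozenset((g1, g2))].append((e, p, f1, f2))

    merged = {}
    for pair, rows in by_pair.items():
        if any(math.isnan(e) for e, _, _, _ in rows):
            continue  # (a) epistasis value is not a number
        if any(e > 0 for e, _, _, _ in rows) and any(e < 0 for e, _, _, _ in rows):
            continue  # (b) opposite signs across crosses or studies
        if any(f1 > 1.0 or f2 > 1.0 for _, _, f1, f2 in rows):
            continue  # a null mutant appears fitter than the wild type
        e, p, _, _ = min(rows, key=lambda row: row[1])  # smallest P value
        merged[pair] = (e, p, p < alpha)
    return merged


# Hypothetical records, for illustration only.
records = [
    ("g1", "g2", 0.02, 0.01, 0.90, 0.95),
    ("g2", "g1", 0.03, 0.20, 0.95, 0.90),
    ("g1", "g3", float("nan"), 0.50, 0.90, 0.80),
    ("g3", "g4", 0.05, 0.01, 0.80, 0.70),
    ("g4", "g3", -0.02, 0.03, 0.70, 0.80),
    ("g5", "g6", -0.10, 0.001, 1.02, 0.90),
]
merged = merge_epistasis(records)
print(merged)
```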

In addition to the two data sets from Costanzo et al., several other studies have also estimated epistasis values using high-throughput strategies (Boone et al. 2007). In principle, we could have included all of them in this study. However, for practical reasons, these data sets were not suitable for our study. First, it would not be appropriate to combine other data sets with that of Costanzo et al. because epistasis values from multiple studies potentially have different sources of error, especially if they followed different protocols. Second, the sample sizes of epistasis data of linked genes from other studies were not sufficiently large to examine the E–D correlation. Because the data set from Costanzo et al. represents the largest available so far, containing 27 times more


significant epistasis than the total of all other studies, we focused on these epistasis data in this study.

Recombination Frequency

Genotypes of meiosis products from 46 tetrads were downloaded from Mancera et al. (2008) at http://www.ebi.ac.uk/huber/recombination. A pair of DNA markers was filtered out if the genotypes were available in less than half of the spores (< 92 spores). For each pair of markers, spores with parental genotypes ($N_p$) and nonparental genotypes ($N_{np}$) were counted. The recombination frequency of a pair of markers ($R_m$) was calculated as follows:

$$R_m = \frac{N_{np}}{N_p + N_{np}}$$

The recombination frequency between a pair of genes ($R_g$) was defined as the average recombination frequency of all marker pairs within the pair of genes, which was calculated as follows:

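$$R_g = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J} R_{m,ij}$$

Subscripts i and j represent the ith and jth markers in each gene, respectively, and I and J are the numbers of markers in the two genes, so that $R_g$ is the mean of $R_m$ over all marker pairs.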
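The two recombination-frequency calculations can be sketched as follows. The data layout is an assumption: each marker is a list with one entry per spore, coded 0 or 1 for the parent of origin and `None` for a missing genotype. The "less than half of the spores" filter is interpreted here as requiring at least `min_typed` spores genotyped at both markers.

```python
def marker_recombination(geno1, geno2, min_typed=92):
    """Recombination frequency between two markers across spores.

    Returns None when fewer than `min_typed` spores (or none at all) are
    genotyped at both markers.
    """
    n_p = n_np = 0
    for a, b in zip(geno1, geno2):
        if a is None or b is None:
            continue          # skip spores with a missing genotype
        if a == b:
            n_p += 1          # parental combination
        else:
            n_np += 1         # nonparental (recombinant) combination
    typed = n_p + n_np
    if typed == 0 or typed < min_typed:
        return None
    return n_np / typed


def gene_recombination(markers1, markers2, min_typed=92):
    """Average marker-pair recombination frequency between two genes."""
    values = [marker_recombination(m1, m2, min_typed)
              for m1 in markers1 for m2 in markers2]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


# Hypothetical four-spore example with a relaxed threshold.
gene1_markers = [[0, 0, 1, 1]]
gene2_markers = [[0, 1, 1, 1], [0, 0, 1, None]]
r_g = gene_recombination(gene1_markers, gene2_markers, min_typed=2)
print(r_g)
```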

Identification of Duplicate Genes

All-against-all BlastP was performed to search for duplicate genes. Gene pairs with E values < $10^{-10}$ were defined as duplicate genes.

Expression Similarity

Expression profiles of 6,359 genes in 40 studies were compiled by a previous study (Kafri et al. 2005). Pearson's correlation coefficient for two genes was calculated within each study in the compiled data set, and expression similarity between these two genes was defined as the average correlation coefficient among studies.
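As an illustration of this definition, the sketch below averages within-study Pearson correlations for one gene pair. The function names and the two example "studies" are hypothetical, and the sketch assumes both genes were measured under the same conditions within each study.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def expression_similarity(profiles1, profiles2):
    """Average the within-study correlations of two genes across studies.

    profiles1[k] and profiles2[k] are the expression vectors of the two
    genes in study k.
    """
    correlations = [pearson(p1, p2) for p1, p2 in zip(profiles1, profiles2)]
    return sum(correlations) / len(correlations)


# Two hypothetical studies: correlated in the first, anticorrelated in the second.
gene1 = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
gene2 = [[2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
similarity = expression_similarity(gene1, gene2)
print(similarity)
```

Averaging per-study coefficients, rather than pooling all conditions into one vector, keeps study-specific scales and batch effects from dominating the correlation.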

Other Yeast Functional Genomic Data

Protein–protein interaction data were obtained from the SGD. Three-dimensional chromatin colocalization data were retrieved from Duan et al. (2010). Genome-wide gene expression noise data were downloaded from Newman et al. (2006). A list of essential genes was downloaded from the Database of Essential Genes (DEG, version 10.6) (Luo et al. 2014). Semantic similarity of Gene Ontology (GO) terms was calculated using the R package GOSemSim (version 1.22.0) (Yu et al. 2010).

Shuffling of Gene Positions or Epistasis Values

To obtain the null distribution of E–D correlation coefficients, we shuffled gene positions while keeping the genome structure of S. cerevisiae unchanged (i.e., number of chromosomes and number of genes on each chromosome). The epistasis value of each gene pair was also kept unchanged. Gene distances were then calculated based on the new genomic locations.
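One way to implement such a shuffle is sketched below. It assumes that genes are permuted genome-wide (so a gene may move to a different chromosome) and that the distance between two linked genes is the number of genes between them; both are assumptions, and the function names are hypothetical.

```python
import random


def shuffle_gene_positions(chromosomes, rng):
    """Randomly reassign genes to positions, keeping each chromosome's size.

    chromosomes is a list of gene lists, one per chromosome, in genomic order.
    """
    genes = [gene for chrom in chromosomes for gene in chrom]
    rng.shuffle(genes)
    shuffled, start = [], 0
    for chrom in chromosomes:
        shuffled.append(genes[start:start + len(chrom)])
        start += len(chrom)
    return shuffled


def linked_distances(chromosomes):
    """Map each same-chromosome gene pair to its distance.

    Assumed definition: number of genes lying between the two genes.
    """
    distances = {}
    for chrom in chromosomes:
        for i, g1 in enumerate(chrom):
            for j in range(i + 1, len(chrom)):
                distances[frozenset((g1, chrom[j]))] = j - i - 1
    return distances


# Hypothetical two-chromosome genome; epistasis values stay keyed by gene pair.
genome = [["a", "b", "c"], ["d", "e"]]
rng = random.Random(0)
shuffled = shuffle_gene_positions(genome, rng)
null_distances = linked_distances(shuffled)
print(shuffled, null_distances)
```

Repeating the shuffle many times and recomputing the E–D correlation on `null_distances` each time would yield the null distribution; because epistasis values are keyed by gene pair, they remain unchanged under the shuffle.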

To estimate the relative importance of different epistasis categories (positive and negative) in shaping the E–D correlation, we shuffled epistasis values of one category while keeping the other category unchanged. Gene positions were not changed during this process.

Estimating the Predictive Accuracy of Gene Order

To examine the accuracy of our prediction of gene order, we estimated the success rate by chance (gray bars in fig. 6 and supplementary figs. S8 and S9, Supplementary Material online) by randomly assigning the positions of genes 1,000 times. For each permutation, the average proportion of successful predictions among motifs was estimated. The mean and standard deviation among the 1,000 permutations were then calculated.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.

Author Contributions

Y.-F.Y., W.C., and W.Q. conceived the research; Y.-F.Y., W.C., and W.Q. analyzed the data; and Y.-F.Y., S.W., and W.Q. wrote the article.

Acknowledgments

We thank Bin He and Weiwei Zhai for discussion. This work was supported by grants from the National Natural Science Foundation of China to W.Q. (91731302).

References

Batada NN, Hurst LD. 2007. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet. 39(8):945–949.

Boone C, Bussey H, Andrews BJ. 2007. Exploring genetic interactions and networks with yeast. Nat Rev Genet. 8(6):437–449.

Boutanaev AM, Kalmykova AI, Shevelyov YY, Nurminsky DI. 2002. Large clusters of co-expressed genes in the Drosophila genome. Nature 420(6916):666–669.

Byrne KP, Wolfe KH. 2005. The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 15(10):1456–1461.

Charlesworth B. 1990. Mutation-selection balance and the evolutionary advantage of sex and recombination. Genet Res. 55(3):199–221.

Charlesworth D, Charlesworth B. 2011. Mimicry: the hunting of the supergene. Curr Biol. 21(20):R846–R848.

Chen X, Zhang J. 2016. The genomic landscape of position effects on protein expression level and noise in yeast. Cell Syst. 2(5):347–354.

Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, et al. 1998. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2(1):65–73.

Cohen BA, Mitra RD, Hughes JD, Church GM. 2000. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet. 26(2):183–186.

Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, et al. 2010. The genetic landscape of a cell. Science 327(5964):425–431.

Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, et al. 2016. A global genetic interaction network maps a wiring diagram of cellular function. Science 353(6306):aaf1420.

Dean EJ, Davis JC, Davis RW, Petrov DA. 2008. Pervasive and persistent redundancy among duplicated genes in yeast. PLoS Genet. 4(7):e1000113.


DeLuna A, Vetsigian K, Shoresh N, Hegreness M, Colon-Gonzalez M, Chao S, Kishony R. 2008. Exposing the fitness contribution of duplicated genes. Nat Genet. 40(5):676–681.

Dixon SJ, Fedyshyn Y, Koh JL, Prasad TS, Chahwan C, Chua G, Toufighi K, Baryshnikova A, Hayles J, Hoe KL, et al. 2008. Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc Natl Acad Sci U S A. 105(43):16653–16658.

Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, Shendure J, Fields S, Blau CA, Noble WS. 2010. A three-dimensional model of the yeast genome. Nature 465(7296):363–367.

Eshel I, Feldman MW. 1970. On the evolutionary effect of recombination. Theor Popul Biol. 1(1):88–100.

Feldman MW, Christiansen FB, Brooks LD. 1980. Evolution of recombination in a constant environment. Proc Natl Acad Sci U S A. 77(8):4838–4841.

Ghanbarian AT, Hurst LD. 2015. Neighboring genes show correlated evolution in gene expression. Mol Biol Evol. 32(7):1748–1766.

Gordon JL, Byrne KP, Wolfe KH. 2009. Additions, losses, and rearrangements on the evolutionary route from a reconstructed ancestor to the modern Saccharomyces cerevisiae genome. PLoS Genet. 5(5):e1000485.

He X, Qian W, Wang Z, Li Y, Zhang J. 2010. Prevalent positive epistasis in Escherichia coli and Saccharomyces cerevisiae metabolic networks. Nat Genet. 42(3):272–276.

Hurst LD, Pal C, Lercher MJ. 2004. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 5(4):299–310.

Kafri R, Bar-Even A, Pilpel Y. 2005. Transcription control reprogramming in genetic backup circuits. Nat Genet. 37(3):295–299.

Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428(6983):617–624.

Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423(6937):241–254.

Kirkpatrick M. 2017. The evolution of genome structure by natural and sexual selection. J Hered. 108(1):3–11.

Kondrashov AS. 1982. Selection against harmful mutations in large sexual and asexual populations. Genet Res. 40(3):325–332.

Kondrashov AS. 1988. Deleterious mutations and the evolution of sexual reproduction. Nature 336(6198):435–440.

Kouyos RD, Silander OK, Bonhoeffer S. 2007. Epistasis between deleterious mutations and the evolution of recombination. Trends Ecol Evol. 22(6):308–315.

Lang GI, Botstein D. 2011. A test of the coordinated expression hypothesis for the origin and maintenance of the GAL cluster in yeast. PLoS One 6(9):e25290.

Lawrence J. 1999. Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev. 9(6):642–648.

Lawrence JG. 2002. Shared strategies in gene organization among prokaryotes and eukaryotes. Cell 110(4):407–413.

Lercher MJ, Urrutia AO, Hurst LD. 2002. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 31(2):180–183.

Liao BY, Zhang J. 2008. Coexpression of linked genes in Mammalian genomes is generally disadvantageous. Mol Biol Evol. 25(8):1555–1565.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458(7236):337–341.

Luo H, Lin Y, Gao F, Zhang CT, Zhang R. 2014. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res. 42(Database issue):D574–D580.

Mancera E, Bourgon R, Brozzi A, Huber W, Steinmetz LM. 2008. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454(7203):479–485.

Musso G, Costanzo M, Huangfu M, Smith AM, Paw J, San Luis BJ, Boone C, Giaever G, Nislow C, Emili A, et al. 2008. The extensive and condition-dependent nature of epistasis among whole-genome duplicates in yeast. Genome Res. 18(7):1092–1099.

Nei M. 1967. Modification of linkage intensity by natural selection. Genetics 57(3):625–641.

Nei M. 1969. Linkage modifications and sex difference in recombination. Genetics 63(3):681–699.

Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, Weissman JS. 2006. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441(7095):840–846.

Pal C, Hurst LD. 2003. Evidence for co-evolution of gene order and recombination rate. Nat Genet. 33(3):392–395.

Phillips PC. 2008. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 9(11):855–867.

Qian W, Liao BY, Chang AY, Zhang J. 2010. Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet. 26(10):425–430.

Qian W, Ma D, Xiao C, Wang Z, Zhang J. 2012. The genomic landscape and evolutionary resolution of antagonistic pleiotropy in yeast. Cell Rep. 2(5):1399–1410.

Qian W, Zhang J. 2008. Evolutionary dynamics of nematode operons: easy come, slow go. Genome Res. 18(3):412–421.

Qian W, Zhang J. 2014. Genomic evidence for adaptation by gene duplication. Genome Res. 24(8):1356–1362.

Schacherer J, Shapiro JA, Ruderfer DM, Kruglyak L. 2009. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458(7236):342–345.

Slot JC, Rokas A. 2010. Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi. Proc Natl Acad Sci U S A. 107(22):10136–10141.

Spellman PT, Rubin GM. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol. 1(1):5.

Tischler J, Lehner B, Chen N, Fraser AG. 2006. Combinatorial RNA interference in Caenorhabditis elegans reveals that redundancy between gene duplicates can be maintained for more than 80 million years of evolution. Genome Biol. 7(8):R69.

Tischler J, Lehner B, Fraser AG. 2008. Evolutionary plasticity of genetic interaction networks. Nat Genet. 40(4):390–391.

Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, et al. 2001. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294(5550):2364–2368.

van Leeuwen J, Pons C, Mellor JC, Yamaguchi TN, Friesen H, Koschwanez J, Usaj MM, Pechlaner M, Takar M, Usaj M, et al. 2016. Exploring genetic suppression interactions on a global scale. Science 354(6312):aag0839.

VanderSluis B, Bellay J, Musso G, Costanzo M, Papp B, Vizeacoumar FJ, Baryshnikova A, Andrews B, Boone C, Myers CL. 2010. Genetic interactions reveal the evolutionary trajectories of duplicate genes. Mol Syst Biol. 6:429.

Vavouri T, Semple JI, Lehner B. 2008. Widespread conservation of genetic redundancy during a billion years of eukaryotic evolution. Trends Genet. 24(10):485–488.

Wagner A. 2005. Energy constraints on the evolution of gene expression. Mol Biol Evol. 22(6):1365–1374.

Williams EJ, Bowles DJ. 2004. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 14(6):1060–1067.

Wong S, Wolfe KH. 2005. Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat Genet. 37(7):777–782.

Wu CI, Ting CT. 2004. Genes and speciation. Nat Rev Genet. 5(2):114–122.

Yeaman S. 2013. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proc Natl Acad Sci U S A. 110(19):E1743–E1751.

Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S. 2010. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26(7):976–978.

Zaslaver A, Baugh LR, Sternberg PW. 2011. Metazoan operons accelerate recovery from growth-arrested states. Cell 145(6):981–992.


selection and recombination the frequencies of the fourgenotypes in the next generation were calculated with thefollowing equations (Nei 1967)

X0

AB frac14XABxAB

x R XABxABXabxab XaBxaBXAbxAbeth THORN

x2

X0

Ab frac14XAbxAb

xthorn R XABxABXabxab XaBxaBXAbxAbeth THORN

x2

X0

aB frac14XaBxaB

xthorn R XABxABXabxab XaBxaBXAbxAbeth THORN

x2

X0

ab frac14Xabxab

x R XABxABXabxab XaBxaBXAbxAbeth THORN

x2

x is the average fitness of a population and was calculatedas follows

x frac14 XABxAB thorn Xabxab thorn XaBxaB thorn XAbxAb

For each population the average fitness (x) and thefrequencies of deleterious alleles (Xa and Xb) wererecorded in each generation of the simulation The differ-ence between the average fitness of two populations withdifferent D (Dfrac14 50 and Dfrac14 0) reflects whether linkage isfavored by natural selection or not The frequency differ-ence of the deleterious alleles between these two popu-lations is related to the long-term effects ofrecombination We also performed the simulation witha variety of parameter values (supplementary fig S4Supplementary Material online)

The average fitness of a population at the 100th generation(x100) was used to infer the fitness effect of gene distance at agiven epistasis value For each epistasis value we could iden-tify an optimal gene distance (Dopt) as well as a series ofpermitted gene distances (Dpermitted) The difference in x100

between Dpermitted and Dopt was lt107 The mean of allDpermitted values was calculated In addition our simulationdid not introduce new mutations to the population andtherefore the frequencies of deleterious alleles reduced overgenerations Nevertheless the negative EndashD correlation wasalso observed at the 50th and 200th generation (supplemen-tary fig S3 Supplementary Material online)

Toy MotifsWe built a toy motif in a genetic interaction network tosimulate the fitness effect of gene distance In the star-likemotif (fig 2A) a hub gene (A) interacted with nine partners(genes B C D E F G H I and J) with nine epistasis valuesranging from 0004 to 0004 in 0001 increments Weattempted to place nine partner genes on the same side ofthe hub gene on a chromosome Because we focused on theepistasis between the hub gene and the partners placingthe partners on different sides would not affect the resultsWe fixed the chromosomal location of the hub gene and thepartner genes were placed at Dfrac14 20 30 40 50 60 70 80 90and 100 from the hub gene The locations of the nine partnerswere shuffled while keeping the epistasis values unchanged

The fitness of a gene order was defined as the average x100 forall nine hub-containing gene pairs Thus we could calculatethe fitness for a total of (9frac14) 362880 possible gene ordersOther parameters (epistasis fitness defect D and initial allelefrequency) were also used to test whether the simulationresult was parameter sensitive (supplementary fig S5Supplementary Material online) Here epistasis values andfitness defects were randomly chosen from the empiricaldata in previous studies (Costanzo et al 2010 2016)

For the all-connected motif (fig 3A) five genes (A B C Dand E) interacted with each other with ten epistasis valuesranging from 00036 to 00036 in 00008 increments Thefive genes were placed on the same chromosome and thegene locations were set as 1 19 65 93 and 102 so that themaximum D in the motif was 100 The locations of these fivegenes were shuffled whereas the epistasis values were keptunchanged Similar to the star-like motifs the fitness of a geneorder was defined as the average x100 for all ten gene pairsThe fitness was calculated for a total of (5frac14) 120 possiblegene orders Other parameters (epistasis fitness defect D andinitial allele frequency) were also used to determine whetherthe simulation result was parameter sensitive (supplementaryfig S6 Supplementary Material online)

EpistasisEpistasis values were retrieved from the studies of Costanzoet al (Costanzo et al 2010 2016) Because the epistasis valuesof the overlapping gene pairs in these studies were highlycorrelated (rfrac14 071 Plt 10100 Pearsonrsquos correlationNfrac14 2604539) we merged their epistasis values A pair ofgenes was filtered if (a) the epistasis value between theirnull mutations was not a number (NaN) or (b) the epistasisvalues of the same null mutation pair had opposite signs inreciprocal crosses or in different studies We also removed thegene pairs with the null mutation in at least one gene result-ing in a higher fitness value than that of the wild-type Thisfiltering was performed because only a small number of an-tagonistic pleiotropic genes were detected in rich media(Qian et al 2012) Thus the elevated fitness observed upongene deletion in Costanzo et al was likely due to inaccurateestimation of fitness If epistasis between a pair of genes wasexamined multiple times we used the epistasis value with thesmallest P value Following Costanzo et al (2010) epistasisvalues with Plt 005 were classified as significant

In addition to the two data sets from Costanzo et al sev-eral other studies have also estimated epistasis values usinghigh-throughput strategies (Boone et al 2007) In principlewe could have included all of them in this study However forpractical reasons these data sets were not suitable for ourstudy First it would not be appropriate to combine otherdata sets with that of Costanzo et al because epistasis valuesfrom multiple studies potentially have different sources oferror especially if they followed different protocols Secondthe sample sizes of epistasis data of linked genes from otherstudies were not sufficiently large to examine the EndashD corre-lation Because the data set from Costanzo et al representsthe largest available so far containing 27 times more

Yang et al doi101093molbevmsx264 MBE

3264Downloaded from httpsacademicoupcommbearticle-abstract341232544318638by Institute of Geneties and Developmental BiologyCAS useron 03 December 2017

significant epistasis than the total of all other studies wefocused on these epistasis data in this study

Recombination FrequencyGenotypes of meiosis products from 46 tetrads were down-loaded from Mancera et al (2008) at httpwwwebiacukhuberrecombination A pair of DNA markers was filteredif the genotypes were available in less than half of the spores(lt 92 spores) For each pair of markers spores with parentalgenotypes (Np) and nonparental genotypes (Nnp) werecounted The recombination frequency of a pair of markers(Rm) was calculated as follows

Rm frac14 Nnp=ethNp thorn NnpTHORN

The recombination frequency between a pair of genes (Rg)wasdefinedastheaveragerecombinationfrequencyofallmarkerpairs within the pair of genes which was calculated as follows

Rg frac14X

ifrac141

X

jfrac141

Rm ij=ij

Subscripts i and j represent ith and jth markers in eachgene respectively

Identification of Duplicate GenesAll-against-all BlastP was performed to search for duplicategenes Gene pairs with E valueslt 1010 were defined as du-plicate genes

Expression SimilarityExpression profiles of 6359 genes in 40 studies were compiledby a previous study (Kafri et al 2005) Pearsonrsquos correlationcoefficient for two genes was calculated within each study inthe compiled data set and expression similarity betweenthese two genes was defined as the average correlation coef-ficient among studies

Other Yeast Functional Genomic DataProteinndashprotein interaction data were obtained from theSGD Three-dimensional chromatin colocalization data wereretrieved from Duan et al (2010) Genome-wide gene expres-sion noise data were downloaded from Newman et al (2006)A list of essential genes was downloaded from the Database ofEssential Genes (DEG version 106) (Luo et al 2014) Semanticsimilarity of Gene Ontology (GO) terms was calculated usingthe R package GOSemSim (version 1220) (Yu et al 2010)

Shuffling of Gene Positions or Epistasis ValuesTo obtain the null distribution of EndashD correlation coefficientswe shuffled gene positions while keeping the genome structureof S cerevisiae unchanged (ie number of chromosomes andnumber of genes on each chromosome) The epistasis value ofeach gene pair was also kept unchanged Gene distances werethen calculated based on the new genomic locations

To estimate the relative importance of different epistasiscategories (positive and negative) in shaping the EndashD corre-lation we shuffled epistasis values of one category while

keeping the other category unchanged Gene positionswere not changed during this process

Estimating the Predictive Accuracy of Gene OrderTo examine the accuracy of our prediction of gene order weestimated the success rate by chance (gray bars in fig 6 andsupplementary figs S8 and S9 Supplementary Material online)by randomly assigning the positions of genes 1000 times Foreach permutation the average proportion of successful pre-dictions among motifs was estimated The mean and standarddeviation among the 1000 permutations were then calculated

Supplementary MaterialSupplementary data are available at Molecular Biology andEvolution online

Author ContributionsY-FY WC and WQ conceived the research Y-FY WCand WQ analyzed the data and Y-FY SW and WQ wrotethe article

AcknowledgmentsWe thank Bin He and Weiwei Zhai for discussion This workwas supported by grants from the National Natural ScienceFoundation of China to WQ (91731302)

References

Batada NN, Hurst LD. 2007. Evolution of chromosome organization driven by selection for reduced gene expression noise. Nat Genet. 39(8):945–949.

Boone C, Bussey H, Andrews BJ. 2007. Exploring genetic interactions and networks with yeast. Nat Rev Genet. 8(6):437–449.

Boutanaev AM, Kalmykova AI, Shevelyov YY, Nurminsky DI. 2002. Large clusters of co-expressed genes in the Drosophila genome. Nature 420(6916):666–669.

Byrne KP, Wolfe KH. 2005. The Yeast Gene Order Browser: combining curated homology and syntenic context reveals gene fate in polyploid species. Genome Res. 15(10):1456–1461.

Charlesworth B. 1990. Mutation-selection balance and the evolutionary advantage of sex and recombination. Genet Res. 55(3):199–221.

Charlesworth D, Charlesworth B. 2011. Mimicry: the hunting of the supergene. Curr Biol. 21(20):R846–R848.

Chen X, Zhang J. 2016. The genomic landscape of position effects on protein expression level and noise in yeast. Cell Syst. 2(5):347–354.

Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, Wodicka L, Wolfsberg TG, Gabrielian AE, Landsman D, Lockhart DJ, et al. 1998. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell 2(1):65–73.

Cohen BA, Mitra RD, Hughes JD, Church GM. 2000. A computational analysis of whole-genome expression data reveals chromosomal domains of gene expression. Nat Genet. 26(2):183–186.

Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, Ding H, Koh JL, Toufighi K, Mostafavi S, et al. 2010. The genetic landscape of a cell. Science 327(5964):425–431.

Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, et al. 2016. A global genetic interaction network maps a wiring diagram of cellular function. Science 353(6306):aaf1420.

Dean EJ, Davis JC, Davis RW, Petrov DA. 2008. Pervasive and persistent redundancy among duplicated genes in yeast. PLoS Genet. 4(7):e1000113.

Epistasis-Driven Evolution of Gene Order. doi:10.1093/molbev/msx264. MBE


DeLuna A, Vetsigian K, Shoresh N, Hegreness M, Colon-Gonzalez M, Chao S, Kishony R. 2008. Exposing the fitness contribution of duplicated genes. Nat Genet. 40(5):676–681.

Dixon SJ, Fedyshyn Y, Koh JL, Prasad TS, Chahwan C, Chua G, Toufighi K, Baryshnikova A, Hayles J, Hoe KL, et al. 2008. Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc Natl Acad Sci U S A. 105(43):16653–16658.

Duan Z, Andronescu M, Schutz K, McIlwain S, Kim YJ, Lee C, Shendure J, Fields S, Blau CA, Noble WS. 2010. A three-dimensional model of the yeast genome. Nature 465(7296):363–367.

Eshel I, Feldman MW. 1970. On the evolutionary effect of recombination. Theor Popul Biol. 1(1):88–100.

Feldman MW, Christiansen FB, Brooks LD. 1980. Evolution of recombination in a constant environment. Proc Natl Acad Sci U S A. 77(8):4838–4841.

Ghanbarian AT, Hurst LD. 2015. Neighboring genes show correlated evolution in gene expression. Mol Biol Evol. 32(7):1748–1766.

Gordon JL, Byrne KP, Wolfe KH. 2009. Additions, losses, and rearrangements on the evolutionary route from a reconstructed ancestor to the modern Saccharomyces cerevisiae genome. PLoS Genet. 5(5):e1000485.

He X, Qian W, Wang Z, Li Y, Zhang J. 2010. Prevalent positive epistasis in Escherichia coli and Saccharomyces cerevisiae metabolic networks. Nat Genet. 42(3):272–276.

Hurst LD, Pal C, Lercher MJ. 2004. The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet. 5(4):299–310.

Kafri R, Bar-Even A, Pilpel Y. 2005. Transcription control reprogramming in genetic backup circuits. Nat Genet. 37(3):295–299.

Kellis M, Birren BW, Lander ES. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428(6983):617–624.

Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. 2003. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423(6937):241–254.

Kirkpatrick M. 2017. The evolution of genome structure by natural and sexual selection. J Hered. 108(1):3–11.

Kondrashov AS. 1982. Selection against harmful mutations in large sexual and asexual populations. Genet Res. 40(3):325–332.

Kondrashov AS. 1988. Deleterious mutations and the evolution of sexual reproduction. Nature 336(6198):435–440.

Kouyos RD, Silander OK, Bonhoeffer S. 2007. Epistasis between deleterious mutations and the evolution of recombination. Trends Ecol Evol. 22(6):308–315.

Lang GI, Botstein D. 2011. A test of the coordinated expression hypothesis for the origin and maintenance of the GAL cluster in yeast. PLoS One 6(9):e25290.

Lawrence J. 1999. Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev. 9(6):642–648.

Lawrence JG. 2002. Shared strategies in gene organization among prokaryotes and eukaryotes. Cell 110(4):407–413.

Lercher MJ, Urrutia AO, Hurst LD. 2002. Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 31(2):180–183.

Liao BY, Zhang J. 2008. Coexpression of linked genes in Mammalian genomes is generally disadvantageous. Mol Biol Evol. 25(8):1555–1565.

Liti G, Carter DM, Moses AM, Warringer J, Parts L, James SA, Davey RP, Roberts IN, Burt A, Koufopanou V, et al. 2009. Population genomics of domestic and wild yeasts. Nature 458(7236):337–341.

Luo H, Lin Y, Gao F, Zhang CT, Zhang R. 2014. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res. 42(Database issue):D574–D580.

Mancera E, Bourgon R, Brozzi A, Huber W, Steinmetz LM. 2008. High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Nature 454(7203):479–485.

Musso G, Costanzo M, Huangfu M, Smith AM, Paw J, San Luis BJ, Boone C, Giaever G, Nislow C, Emili A, et al. 2008. The extensive and condition-dependent nature of epistasis among whole-genome duplicates in yeast. Genome Res. 18(7):1092–1099.

Nei M. 1967. Modification of linkage intensity by natural selection. Genetics 57(3):625–641.

Nei M. 1969. Linkage modifications and sex difference in recombination. Genetics 63(3):681–699.

Newman JR, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, Weissman JS. 2006. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441(7095):840–846.

Pal C, Hurst LD. 2003. Evidence for co-evolution of gene order and recombination rate. Nat Genet. 33(3):392–395.

Phillips PC. 2008. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 9(11):855–867.

Qian W, Liao BY, Chang AY, Zhang J. 2010. Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet. 26(10):425–430.

Qian W, Ma D, Xiao C, Wang Z, Zhang J. 2012. The genomic landscape and evolutionary resolution of antagonistic pleiotropy in yeast. Cell Rep. 2(5):1399–1410.

Qian W, Zhang J. 2008. Evolutionary dynamics of nematode operons: easy come, slow go. Genome Res. 18(3):412–421.

Qian W, Zhang J. 2014. Genomic evidence for adaptation by gene duplication. Genome Res. 24(8):1356–1362.

Schacherer J, Shapiro JA, Ruderfer DM, Kruglyak L. 2009. Comprehensive polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature 458(7236):342–345.

Slot JC, Rokas A. 2010. Multiple GAL pathway gene clusters evolved independently and by different mechanisms in fungi. Proc Natl Acad Sci U S A. 107(22):10136–10141.

Spellman PT, Rubin GM. 2002. Evidence for large domains of similarly expressed genes in the Drosophila genome. J Biol. 1(1):5.

Tischler J, Lehner B, Chen N, Fraser AG. 2006. Combinatorial RNA interference in Caenorhabditis elegans reveals that redundancy between gene duplicates can be maintained for more than 80 million years of evolution. Genome Biol. 7(8):R69.

Tischler J, Lehner B, Fraser AG. 2008. Evolutionary plasticity of genetic interaction networks. Nat Genet. 40(4):390–391.

Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, et al. 2001. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294(5550):2364–2368.

van Leeuwen J, Pons C, Mellor JC, Yamaguchi TN, Friesen H, Koschwanez J, Usaj MM, Pechlaner M, Takar M, Usaj M, et al. 2016. Exploring genetic suppression interactions on a global scale. Science 354(6312):aag0839.

VanderSluis B, Bellay J, Musso G, Costanzo M, Papp B, Vizeacoumar FJ, Baryshnikova A, Andrews B, Boone C, Myers CL. 2010. Genetic interactions reveal the evolutionary trajectories of duplicate genes. Mol Syst Biol. 6:429.

Vavouri T, Semple JI, Lehner B. 2008. Widespread conservation of genetic redundancy during a billion years of eukaryotic evolution. Trends Genet. 24(10):485–488.

Wagner A. 2005. Energy constraints on the evolution of gene expression. Mol Biol Evol. 22(6):1365–1374.

Williams EJ, Bowles DJ. 2004. Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 14(6):1060–1067.

Wong S, Wolfe KH. 2005. Birth of a metabolic gene cluster in yeast by adaptive gene relocation. Nat Genet. 37(7):777–782.

Wu CI, Ting CT. 2004. Genes and speciation. Nat Rev Genet. 5(2):114–122.

Yeaman S. 2013. Genomic rearrangements and the evolution of clusters of locally adaptive loci. Proc Natl Acad Sci U S A. 110(19):E1743–E1751.

Yu G, Li F, Qin Y, Bo X, Wu Y, Wang S. 2010. GOSemSim: an R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26(7):976–978.

Zaslaver A, Baugh LR, Sternberg PW. 2011. Metazoan operons accelerate recovery from growth-arrested states. Cell 145(6):981–992.



significant epistasis than the total of all other studies, we focused on these epistasis data in this study.

Recombination Frequency

Genotypes of meiosis products from 46 tetrads were downloaded from Mancera et al. (2008) at http://www.ebi.ac.uk/huber/recombination. A pair of DNA markers was filtered out if the genotypes were available in less than half of the spores (< 92 spores). For each pair of markers, spores with parental genotypes ($N_p$) and nonparental genotypes ($N_{np}$) were counted. The recombination frequency of a pair of markers ($R_m$) was calculated as follows:

$$R_m = \frac{N_{np}}{N_p + N_{np}}$$

The recombination frequency between a pair of genes ($R_g$) was defined as the average recombination frequency of all marker pairs within the pair of genes, which was calculated as follows:

$$R_g = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} R_{m,ij}$$

Subscripts i and j represent the ith and jth markers in each gene, respectively, and I and J denote the numbers of markers in the two genes.
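The two recombination formulas can be illustrated with a minimal Python sketch. The data layout (one list of per-spore parental origins per marker, with `None` for a missing genotype), the function names, and the choice to average only over marker pairs that pass the filter are assumptions made for illustration; they are not taken from the original analysis.

```python
from itertools import product


def marker_pair_recombination(geno_a, geno_b, min_spores=92):
    """Return R_m = N_np / (N_p + N_np) for one marker pair, or None if filtered out.

    geno_a, geno_b: per-spore parental origin (0 or 1) of each marker, with
    None where the genotype is unavailable (hypothetical encoding).
    """
    called = [(a, b) for a, b in zip(geno_a, geno_b)
              if a is not None and b is not None]
    # Filter pairs genotyped in fewer than min_spores spores (assumed to mean
    # spores in which both markers were genotyped).
    if len(called) < min_spores or not called:
        return None
    n_np = sum(1 for a, b in called if a != b)  # nonparental spores
    n_p = len(called) - n_np                    # parental spores
    return n_np / (n_p + n_np)


def gene_pair_recombination(markers_gene1, markers_gene2, min_spores=92):
    """Return R_g, the mean R_m over all retained marker pairs between two genes."""
    values = []
    for m1, m2 in product(markers_gene1, markers_gene2):
        r_m = marker_pair_recombination(m1, m2, min_spores)
        if r_m is not None:
            values.append(r_m)
    return sum(values) / len(values) if values else None
```

With 46 tetrads there are 184 spores, so the default threshold of 92 corresponds to the "less than half of the spores" criterion in the text.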

Identification of Duplicate Genes

All-against-all BlastP was performed to search for duplicate genes. Gene pairs with E values < $10^{-10}$ were defined as duplicate genes.
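The E-value criterion can be sketched as a filter over BLAST tabular output. The sketch assumes the standard 12-column tab-separated layout (query ID in column 1, subject ID in column 2, E value in column 11); the function name, the exclusion of self hits, and the treatment of pairs as unordered are illustrative assumptions, since the source does not describe these details.

```python
def duplicate_pairs(blast_lines, max_evalue=1e-10):
    """Return the set of unordered gene pairs with a BlastP hit below max_evalue.

    blast_lines: iterable of tab-separated lines in the 12-column tabular
    layout (assumed input format).
    """
    pairs = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip comment lines and malformed rows
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        # Self hits are ignored (assumption); a pair is kept if either
        # direction of the search passes the threshold.
        if query != subject and evalue < max_evalue:
            pairs.add(tuple(sorted((query, subject))))
    return pairs
```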

Expression Similarity

Expression profiles of 6,359 genes in 40 studies were compiled by a previous study (Kafri et al. 2005). Pearson's correlation coefficient for two genes was calculated within each study in the compiled data set, and expression similarity between these two genes was defined as the average correlation coefficient among studies.
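A minimal sketch of this definition follows: compute Pearson's correlation within each study, then average across studies. The input layout (one expression vector per study for each gene) and the function names are assumptions for illustration; how studies with missing values were handled is not stated in the text and is not modeled here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def expression_similarity(profiles_a, profiles_b):
    """Average the within-study Pearson correlations of two genes.

    profiles_a, profiles_b: lists with one expression vector per study, in
    the same study order for both genes (hypothetical layout).
    """
    correlations = [pearson(x, y) for x, y in zip(profiles_a, profiles_b)]
    return sum(correlations) / len(correlations)
```

Averaging per-study coefficients, rather than pooling all conditions into one vector, keeps study-specific scales from dominating the similarity score.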

Other Yeast Functional Genomic Data

Protein–protein interaction data were obtained from the SGD. Three-dimensional chromatin colocalization data were retrieved from Duan et al. (2010). Genome-wide gene expression noise data were downloaded from Newman et al. (2006). A list of essential genes was downloaded from the Database of Essential Genes (DEG, version 10.6) (Luo et al. 2014). Semantic similarity of Gene Ontology (GO) terms was calculated using the R package GOSemSim (version 1.22.0) (Yu et al. 2010).

Shuffling of Gene Positions or Epistasis Values

To obtain the null distribution of E–D correlation coefficients, we shuffled gene positions while keeping the genome structure of S. cerevisiae unchanged (i.e., number of chromosomes and number of genes on each chromosome). The epistasis value of each gene pair was also kept unchanged. Gene distances were then calculated based on the new genomic locations.
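One way to shuffle positions while preserving genome structure is to permute the existing position slots among genes, as in the sketch below. The data layout (gene mapped to a chromosome and coordinate), the function names, and the rule that distance is undefined for genes on different chromosomes are assumptions for illustration, not details confirmed by the text.

```python
import random


def shuffle_gene_positions(positions, rng=random):
    """Reassign the existing position slots to genes at random.

    positions: dict mapping gene -> (chromosome, coordinate). Because slots
    are only permuted, the number of chromosomes and the number of genes on
    each chromosome stay unchanged.
    """
    genes = list(positions)
    slots = [positions[g] for g in genes]
    rng.shuffle(slots)
    return dict(zip(genes, slots))


def gene_distance(positions, gene1, gene2):
    """Distance between two genes, or None if they lie on different chromosomes (assumption)."""
    chrom1, coord1 = positions[gene1]
    chrom2, coord2 = positions[gene2]
    return abs(coord1 - coord2) if chrom1 == chrom2 else None
```

Repeating the shuffle many times and recomputing the E–D correlation on each shuffled genome, with epistasis values left attached to their gene pairs, would give the null distribution described above.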

To estimate the relative importance of different epistasis categories (positive and negative) in shaping the E–D correlation, we shuffled epistasis values of one category while keeping the other category unchanged. Gene positions were not changed during this process.
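The category-specific shuffle can be sketched as follows, under one reading of the procedure: values are permuted only among the gene pairs that belong to the chosen category, and all other pairs keep their values. The dictionary layout and the function name are hypothetical.

```python
import random


def shuffle_one_category(epistasis, category, rng=random):
    """Permute epistasis values among gene pairs of one sign category.

    epistasis: dict mapping (gene1, gene2) -> epistasis value.
    category: "positive" or "negative". Pairs outside the category keep
    their original values; gene positions are not touched.
    """
    if category not in ("positive", "negative"):
        raise ValueError("category must be 'positive' or 'negative'")
    in_category = (lambda e: e > 0) if category == "positive" else (lambda e: e < 0)
    pairs = [pair for pair, e in epistasis.items() if in_category(e)]
    values = [epistasis[pair] for pair in pairs]
    rng.shuffle(values)
    shuffled = dict(epistasis)
    shuffled.update(zip(pairs, values))
    return shuffled
```

Comparing the E–D correlation after shuffling only positive values with that after shuffling only negative values indicates which category contributes more to the observed correlation.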

Estimating the Predictive Accuracy of Gene Order

To examine the accuracy of our prediction of gene order, we estimated the success rate by chance (gray bars in fig. 6 and supplementary figs. S8 and S9, Supplementary Material online) by randomly assigning the positions of genes 1,000 times. For each permutation, the average proportion of successful predictions among motifs was estimated. The mean and standard deviation among the 1,000 permutations were then calculated.
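The permutation baseline can be sketched generically. Because the motif-based prediction itself is not described in this section, the sketch takes a caller-supplied `score` function (hypothetical) that returns the average proportion of successful predictions for a given assignment of positions. Permuting the existing positions and using the sample standard deviation are also assumptions.

```python
import random
from statistics import mean, stdev


def chance_success_rate(positions, score, n_permutations=1000, seed=None):
    """Mean and standard deviation of the success rate under random gene positions.

    positions: dict mapping gene -> position.
    score: function taking such a dict and returning the average proportion
    of successful predictions among motifs (supplied by the caller).
    """
    rng = random.Random(seed)
    genes = list(positions)
    slots = [positions[g] for g in genes]
    rates = []
    for _ in range(n_permutations):
        rng.shuffle(slots)  # randomly reassign positions to genes
        rates.append(score(dict(zip(genes, slots))))
    return mean(rates), stdev(rates)
```

The returned mean and standard deviation play the role of the chance expectation against which the observed predictive accuracy is compared.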

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.
