
Article

Adaptive Evolution of Gene Expression in Drosophila
Graphical Abstract

Highlights

• Adaptive evolution of gene expression is pervasive in Drosophila
• Stabilization and adaptation of gene expression follow distinct molecular clocks
• Gene function determines the rate of expression adaptation
• Sex-specific adaptation of gene expression occurs predominantly in males

Nourmohammad et al., 2017, Cell Reports 20, 1385–1395, August 8, 2017 © 2017 The Authors. http://dx.doi.org/10.1016/j.celrep.2017.07.033

Authors

Armita Nourmohammad, Joachim Rambeau, Torsten Held, Viera Kovacova, Johannes Berg, Michael Lässig

[email protected] (A.N.), [email protected] (M.L.)

In Brief

Drosophila presents an evolutionary conundrum: there is ubiquitous genomic adaptation, yet it has been impossible to identify system-wide signals of adaptation for gene expression. Nourmohammad et al. develop a method to infer stabilizing and directional selection from expression data. They show that adaptation dominates the evolution of gene expression in Drosophila.


Armita Nourmohammad,1,4,* Joachim Rambeau,2 Torsten Held,2 Viera Kovacova,3 Johannes Berg,2 and Michael Lässig2,*

1Joseph-Henri Laboratories of Physics and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
2Institut für Theoretische Physik, Universität zu Köln, Zülpicher Str. 77, 50937 Köln, Germany
3CECAD, Universität zu Köln, Joseph-Stelzmann-Str. 26, 50931 Köln, Germany
4Lead Contact

*Correspondence: [email protected] (A.N.), [email protected] (M.L.)

SUMMARY

Gene expression levels are important quantitative traits that link genotypes to molecular functions and fitness. In Drosophila, population-genetic studies have revealed substantial adaptive evolution at the genomic level, but the evolutionary modes of gene expression remain controversial. Here, we present evidence that adaptation dominates the evolution of gene expression levels in flies. We show that 64% of the observed expression divergence across seven Drosophila species is due to adaptive changes driven by directional selection. Our results are derived from time-resolved data of gene expression divergence across a family of related species, using a probabilistic inference method for gene-specific selection. Adaptive gene expression is stronger in specific functional classes, including regulation, sensory perception, sexual behavior, and morphology. Moreover, we identify a large group of genes with sex-specific adaptation of expression, which predominantly occurs in males. Our analysis opens an avenue to map system-wide selection on molecular quantitative traits independently of their genetic basis.

INTRODUCTION

Several studies have found evidence for widespread adaptive evolution of the Drosophila genome (Andolfatto, 2005; Mustonen and Lässig, 2007; Sella et al., 2009). This includes adaptive changes in the non-coding sequence, consistent with classical ideas on the importance of regulatory evolution for phenotypic adaptation (King and Wilson, 1975). Gene expression levels are important molecular phenotypes that quantify the effects of regulation on organismic traits and fitness. Insights into how genome evolution affects gene expression have come from studies of quantitative trait loci (QTLs); see Fraser (2011), Romero et al. (2012), and Pai et al. (2015) for reviews. These studies compare lineage- or species-specific differences in expression QTLs, in line with Orr's sign test for selection on quantitative traits (Orr, 1998). Due to the limited number of QTLs, the sign test is only applicable to gene groups that have been pre-determined based on criteria other than selection on expression levels. In yeast, at least 10% of the genes have been inferred to undergo adaptive evolution of expression (Fraser et al., 2010). By extending the sign test to include information on outgroup species, it has been possible to identify lineage-specific positive selection on cis-regulatory expression QTLs in functional gene classes of mice (Fraser et al., 2011) and plants (Riedel et al., 2015). A similar approach has been used to correlate population-specific environmental variables with expression SNPs; this has shown that local adaptation of the human population is driven by gene expression in a number of gene classes (Fraser, 2013). In flies, expression-QTL analysis has been used to estimate cis and trans effects on expression (Genissel et al., 2008; Wittkopp et al., 2008) and to compare the evolution of expression with that of the underlying regulatory sequence (Coolon et al., 2014); related studies have been performed in yeast (Bullard et al., 2010; Artieri and Fraser, 2014). These QTL studies have brought specific insights into modes of gene expression evolution in specific functional classes. However, given the complexity of the regulatory genotype-to-phenotype map and the limited sensitivity of QTL studies, our understanding of how genome-wide adaptive changes relate to mRNA and protein levels has remained incomplete (Hoekstra and Coyne, 2007; Fraser, 2011; Pai et al., 2015).

An alternative approach is to analyze the evolution of gene expression by methods of quantitative genetics, without explicit reference to genetic evolution of the QTL (Rifkin et al., 2003; Khaitovich et al., 2004, 2005; Lemos et al., 2005; Rifkin et al., 2005; Gilad et al., 2006; Whitehead and Crawford, 2006; Zhang et al., 2007; Bedford and Hartl, 2009; Fraser et al., 2011; Romero et al., 2012; Pai et al., 2015). These studies compare the expression divergence across species, the variation within species, and the expected behavior for neutral evolution (Lynch and Hill, 1986). A broad picture of evolutionary constraint on gene expression levels caused by stabilizing selection has emerged in a number of species, including Drosophila (Rifkin et al., 2003; Lemos et al., 2005; Rifkin et al., 2005; Gilad et al., 2006; Bedford and Hartl, 2009; Romero et al., 2012). Mutation accumulation experiments in Drosophila show that the neutral expression divergence generated by random mutations in the lab significantly exceeds the natural expression variation, indicating strong negative selection on most random mutations affecting gene expression (Rifkin et al., 2005). A comparative study between human and chimpanzee has produced signatures of predominantly neutral evolution of gene expression (Khaitovich et al., 2004, 2005). Other studies in primates have identified stabilizing selection, as well as lineage- and tissue-specific directional expression changes (Gilad et al., 2006; Blekhman et al., 2008; Brawand et al., 2011; Romero et al., 2012). However, it has remained difficult to demonstrate that positive selection, as opposed to relaxed stabilizing selection, is the evolutionary cause of expression divergence (Fraser, 2011). Thus, estimating the genome-wide contribution of adaptation to the evolution of gene expression is an outstanding problem.

Figure 1. Phylogenetic Tree and Evolutionary Distances of 7 Drosophila Species
Phylogeny of the Drosophila genus, as reconstructed in Drosophila 12 Genomes Consortium et al. (2007) from synonymous sequence divergence. Six clades are marked by colored triangles; their ancestral nodes are marked by colored circles. The table specifies the species contained in each of the clades and the clade divergence time $\tau_C$ (see Experimental Procedures).

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

In this paper, we show that adaptation is the prevalent evolutionary mode of gene expression in the Drosophila genus. We infer directional selection driving adaptation, together with conservation under stabilizing selection, and we show that these forces act on different scales of evolutionary time. Our inference is based on theoretical results on the evolution of molecular quantitative traits (Held et al., 2014; Nourmohammad et al., 2013a, 2013b), using solely the dependence of gene expression divergence on the divergence time of 7 Drosophila species. Moreover, the method relies only on phenotypic observables and does not depend on the number and effects of the underlying QTL; these molecular determinants of gene expression are often unknown and vary considerably among genes.

RESULTS

Pattern of Gene Expression Divergence

We use gene expression data from samples of males and females (Zhang et al., 2007), which cover 6,332 orthologous genes in seven Drosophila species. A phylogenetic tree of these species is shown in Figure 1. The dataset of Zhang et al. (2007) is obtained from species-specific microarrays, which makes it suited to cross-species analysis. Gene expression levels are defined by a standard transformation of mRNA counts, which accounts for differences in assay sensitivity among experimental probes (Quackenbush, 2002). The transformation method and its implications for evolutionary analysis are detailed in Experimental Procedures and Supplemental Experimental Procedures (Figure S1). We use these data to estimate the mean expression level of a gene within each species, its total heritable expression variance $\Delta$ (referred to as expression diversity), and its non-heritable expression variance between biological replicates. For each pair of species, we obtain the cross-species expression divergence $D$ of a gene as the squared difference between the species mean levels. Cross-species differences in expression for a single gene are noisy and reflect the physiology of that gene, but averages over all genes or over large classes of genes show a clear evolutionary pattern that can be compared with model expectations. In particular, the time-dependent expression divergence $\langle D_{ij} \rangle$, where $i, j$ label a given pair of species and angular brackets denote averages over genes, plays a central role in our analysis, as explained in Box 1. We define the rescaled divergence as

$$\Omega_{ij} = \frac{\langle D_{ij} \rangle}{D_0}, \qquad \text{(Equation 1)}$$

where the trait scale $D_0$ is defined such that $\Omega_{ij} \approx 1$ for neutral evolution in the limit of long divergence times (details of this definition are given in Experimental Procedures and Box 1). The evolution of these divergence measures depends only weakly on the effect distribution of expression QTL and on the amount of recombination between these loci, which is key to quantitative genetics approaches (Lynch and Hill, 1986; Leinonen et al., 2013; Nourmohammad et al., 2013a, 2013b; Held et al., 2014).
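A minimal sketch of how the per-gene divergence and the rescaled divergence of Equation 1 could be computed, assuming the expression values have already been transformed as described above and that the trait scale D_0 is supplied by the caller (its estimation, described in Experimental Procedures, is not reproduced here). The input format (a dictionary of replicate levels per species for each gene) and the function names `species_means`, `average_divergence`, and `rescaled_divergence` are hypothetical, and the toy numbers are made up.

```python
from itertools import combinations
from statistics import mean


def species_means(replicates):
    """Mean expression level of one gene in each species.

    `replicates` maps a species name to a list of replicate expression
    levels (hypothetical input format).
    """
    return {species: mean(levels) for species, levels in replicates.items()}


def average_divergence(genes):
    """Gene-averaged divergence <D_ij> for every pair of species.

    For a single gene, D_ij is the squared difference between the species
    mean levels; the result averages D_ij over all genes in `genes`.
    """
    per_gene = [species_means(gene) for gene in genes]
    species = sorted(per_gene[0])
    return {
        (i, j): mean((m[i] - m[j]) ** 2 for m in per_gene)
        for i, j in combinations(species, 2)
    }


def rescaled_divergence(avg_divergence, d0):
    """Equation 1: Omega_ij = <D_ij> / D_0, with D_0 supplied by the caller."""
    return {pair: d / d0 for pair, d in avg_divergence.items()}


# Toy usage with two genes and three species (made-up numbers).
genes = [
    {"mel": [1.0, 3.0], "sim": [3.0], "yak": [4.0]},
    {"mel": [0.0], "sim": [1.0], "yak": [0.0]},
]
omega = rescaled_divergence(average_divergence(genes), d0=2.0)
print(omega)  # {('mel', 'sim'): 0.5, ('mel', 'yak'): 1.0, ('sim', 'yak'): 0.5}
```

In the actual analysis, species are grouped into clades and divergences are averaged within each clade; the sketch only shows the per-pair arithmetic.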

To obtain a genome-wide evolutionary picture of gene expression in Drosophila, we evaluate the aggregate time-dependent divergence for all genes and species in our dataset (Supplemental Experimental Procedures). Grouping the species into 6 clades, we obtain a consistent pattern of divergence $\Omega(\tau)$ as a function of divergence time $\tau$ (Figure 2). We can attribute this pattern to biological divergence of expression levels, because the species-specific design of microarrays suppresses technical errors that depend on evolutionary distance (Zhang et al., 2007). To test this prerequisite for evolutionary analysis, we compare the mean expression levels for specific gene classes across species. We find no distance-dependent differences, which provides strong evidence that our data are free of technical divergence caused by a species bias in probe sensitivity (Figure S1).

Box 1. Trait Evolution in a Fitness Seascape

FITNESS MODEL

The schematic shows the evolution of a quantitative trait in a single-peak fitness seascape (green curves). The distribution of trait values within a species (gray curves) changes over a macro-evolutionary period $\tau$, which can be observed as cross-species divergence of the mean trait values $D(\tau)$ (gray arrow). The fitness seascape constrains trait values around a fitness peak by stabilizing selection, and evolutionary displacements of this peak generate directional selection (green arrow). The minimal fitness model has two parameters: the stabilizing strength $c$ is proportional to the inverse square width of the fitness peak, and the driving rate $\upsilon$ measures the mean square displacement of the fitness peak per unit of evolutionary time (see Experimental Procedures and Supplemental Information). Lower plane: in a typical realization, the population mean trait (black line) follows the moving fitness optimum (green line) with delay and additional fluctuations.

TIME-DEPENDENT DIVERGENCE

The rescaled mean square displacement $\Omega(\tau)$ is plotted against the rescaled divergence time $\tau$. Neutral evolution (gray): $\Omega(\tau)$ reaches a saturation value of $\Omega_0 = 1$ with a relaxation time of $\tau_0 \sim 1$ (in units of the inverse mutation rate). Conservation (blue): in a single-peak fitness landscape, $\Omega(\tau)$ has a smaller saturation value, $\Omega_{\mathrm{stab}} \sim 1/c$, which is reached faster than at neutrality, $\tau_{\mathrm{stab}} \sim \Omega_{\mathrm{stab}} < 1$. Adaptation (green): in a fluctuating fitness seascape, there is a linear surplus $\Omega_{\mathrm{ad}}(\tau)$, which measures the amount of trait adaptation. We use the nonlinear relation between the trait divergence $\Omega(\tau)$ and the divergence time $\tau$ to infer the fitness parameters $(c, \upsilon)$.

LINEAGE- AND GENE-SPECIFIC INFERENCE

Based on a joint probabilistic description of trait evolution and fitness fluctuations, we can infer the likelihood of the fitness parameters, stabilizing strength $c$ and driving rate $\upsilon$, for individual genes. The inference involves summing over all evolutionary histories of mean and optimal trait values across the phylogeny (black and green lines) that lead to the observed values $E_1, E_2, \ldots$ at the terminal nodes (shown here for three species). Over macro-evolutionary distances, this sum is dominated by the most parsimonious lineage-specific evolutionary history and can be evaluated analytically (see Experimental Procedures and Supplemental Information). The evolutionary histories on different branches mutually constrain each other because they are connected at the branch points (yellow diamonds).

Figure 2. Adaptive Evolution of Gene Expression
The rescaled time-dependent divergence $\Omega(\tau)$ from all genes is plotted against the divergence time $\tau$ for six partial species clades (small squares) and for the entire Drosophila genus (large square). Species clades and divergence times (scaled by the rate of synonymous mutation) are defined by the phylogeny of the Drosophila genus (Figure 1). Trait divergence values are scaled by the asymptotic long-term limit under neutral evolution (see text and Experimental Procedures). These data are shown with theoretical curves $\Omega(\tau)$ under directional selection (green line), under stabilizing selection (blue line), and for neutral evolution (gray line). Inferred model parameters are stabilizing strength $c = 18.4$ and driving rate $\upsilon = 0.08$ (Box 1) (see Experimental Procedures and Supplemental Experimental Procedures). We infer a time-dependent adaptive component of the expression divergence $\Omega_{\mathrm{ad}}(\tau)$ (green shaded area); the complementary component $\Omega_{\mathrm{eq}}(\tau)$ (blue shaded area) is generated by genetic drift under stabilizing selection. Adaptation accounts for a fraction $\omega_{\mathrm{ad}} = \Omega_{\mathrm{ad}}/\Omega = 64\%$ of the expression divergence across the Drosophila genus ($\tau_{\mathrm{Dros.}} = 1.4$). See Figure S3 for a comparison of the data to models of time-independent stabilizing selection (Bedford and Hartl, 2009); see also Figures S1, S2, and S4–S7.

The rescaled expression divergence data in Figure 2 show macro-evolution of expression levels. The average expression divergence has two distinct molecular clocks: a rapid increase on timescales $\tau$ below the D. melanogaster (D. mel)-D. simulans (D. sim) divergence time is followed by a slower increase on larger timescales. This pattern is clearly incompatible with neutral evolution, where the rescaled divergence would follow a uniform linear pattern on short timescales and saturate to 1 on timescales given by the inverse point mutation rate (gray line in Figure 2, to be compared with the aggregate divergence plot in Box 1). The actual pattern shows stronger evolutionary constraint, which is clearly visible already within the D. mel-D. sim-D. yakuba (D. yak) clade: the species pair D. mel-D. yak has about twice the divergence time but only 1.2 times the expression divergence compared to the pair D. mel-D. sim. Hence, the characteristic constraint time is of the order $\tau_{\text{mel-sim}}$, about a factor of 10 shorter than the neutral saturation time. This pattern indicates evolution under substantial stabilizing selection, in qualitative agreement with previous studies (Supplemental Experimental Procedures) (Rifkin et al., 2003; Lemos et al., 2005; Bedford and Hartl, 2009) and with a standard $Q_{ST}/F_{ST}$ analysis (Leinonen et al., 2013). However, the expression divergence increases with the divergence time throughout the Drosophila genus (green shaded area) and does not show evidence of saturation for larger values of divergence time $\tau$. This observation is in accordance with a similar pattern of the expression divergence observed previously (Zhang et al., 2007) and is backed up by our probabilistic analysis reported later. In the following, we show that the increase of expression divergence beyond $\tau_{\text{mel-sim}}$ reflects adaptive evolution of gene expression over macro-evolutionary timescales, and we provide a parsimonious explanation for the separation of molecular clocks.
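To make the contrast between the neutral and the constrained pattern concrete, the sketch below uses simple exponential relaxation curves as stand-ins for the analytical forms of Box 1 (those forms are given in Held et al., 2014, and are not reproduced here). The assumptions are: a neutral curve 1 - exp(-τ) that saturates at 1; a stabilizing curve with the same initial slope that saturates at Ω_stab = 1/(2c) on a timescale of order Ω_stab (taking Equation 3 at face value); and a linear adaptive surplus υτ/2. The function names and the short divergence time `tau = 0.05` are hypothetical.

```python
import math


def omega_neutral(tau):
    """Toy neutral curve: slope 1 at short times, saturates at 1."""
    return 1.0 - math.exp(-tau)


def omega_stabilizing(tau, c):
    """Toy stabilizing curve: same short-time slope as the neutral curve,
    saturating at Omega_stab = 1/(2c) on a timescale of order Omega_stab."""
    omega_stab = 1.0 / (2.0 * c)
    return omega_stab * (1.0 - math.exp(-tau / omega_stab))


def omega_seascape(tau, c, upsilon):
    """Toy seascape curve: stabilizing part plus linear surplus upsilon*tau/2."""
    return omega_stabilizing(tau, c) + 0.5 * upsilon * tau


def doubling_ratio(curve, tau):
    """Factor by which the divergence grows when the divergence time doubles."""
    return curve(2.0 * tau) / curve(tau)


c, upsilon = 18.4, 0.08  # parameter values quoted in Figure 2
tau = 0.05               # hypothetical short divergence time
print(doubling_ratio(omega_neutral, tau))
print(doubling_ratio(lambda t: omega_seascape(t, c, upsilon), tau))
```

With these toy curves, doubling a short divergence time nearly doubles the neutral divergence but raises the constrained divergence by only about 20%, qualitatively like the D. mel-D. yak versus D. mel-D. sim comparison. The precise ratios depend entirely on the assumed curve shapes and on the chosen τ.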

Fitness Model for Gene Expression

The inference of adaptation is based on a minimal dynamical model of selection: gene expression levels $E$ evolve in a single-peak fitness seascape

$$f(E, t) = f^* - \text{const.} \times \left(E - E^*(t)\right)^2$$

(Nourmohammad et al., 2013a; Held et al., 2014). This model is illustrated in Box 1 and formally defined in Experimental Procedures. The fitness peak $E^*(t)$ for a given gene performs a random walk over macro-evolutionary periods, which represents continual changes of the optimal expression of that gene. Despite its simplicity, the seascape model combines two salient features of selection on gene expression: stabilizing selection generates evolutionary constraint, and directional selection drives long-term adaptive changes. These selection components are measured by two parameters: the stabilizing strength $c$, which is proportional to the inverse square width of the fitness peak, and the driving rate $\upsilon$, which measures the mean square displacement of the peak position $E^*(t)$ per unit of evolutionary time. The fitness peak value $f^*$ is arbitrary, because only fitness differences between individuals matter for the evolution of a species.

The fitness seascape model captures distinct selective causes of adaptive evolution. Long-term environmental shifts can lead to changes in the optimal expression levels that accumulate over macro-evolutionary periods. The co-evolution of genes in networks acts in a similar way: changes in the expression of one gene generate time-dependent selection on the expression of functionally correlated genes. This time dependence broadly describes cross-gene epistasis in regulatory and metabolic pathways. The seascape model is not concerned with individual environmental shifts or epistatic changes; it describes the evolutionary systems biology of cumulative effects over macro-evolutionary periods and over groups of genes. These effects can be inferred from our kind of dataset and give rise to a random-walk model for fitness peak positions $E^*(t)$. Later, we extend this model to include larger, punctuated shifts of fitness peaks.
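The random-walk picture of the fitness peak can be illustrated with a short simulation. This is a sketch under stated assumptions, not the model defined in Experimental Procedures: the optimum E*(t) takes Gaussian steps whose variance grows by υ per unit time, and the population mean expression level relaxes toward the optimum at a rate standing in for stabilizing selection, with added Gaussian noise standing in for genetic drift (an Ornstein-Uhlenbeck process with a moving center). The parameter names `relax_rate` and `drift_var` and all numerical values except υ = 0.08 are hypothetical.

```python
import math
import random


def simulate_seascape(t_max, dt, upsilon, relax_rate, drift_var, seed=0):
    """Toy trajectories of a moving fitness optimum and the population mean.

    upsilon    -- driving rate: mean-square displacement of the optimum per unit time
    relax_rate -- hypothetical rate at which the mean is pulled toward the optimum
    drift_var  -- hypothetical variance per unit time of drift fluctuations
    Returns two lists (optimum, mean_level) sampled every dt, starting at 0.
    """
    rng = random.Random(seed)
    optimum, mean_level = [0.0], [0.0]
    for _ in range(int(round(t_max / dt))):
        # Random-walk step of the fitness optimum.
        e_opt = optimum[-1] + rng.gauss(0.0, math.sqrt(upsilon * dt))
        gamma = mean_level[-1]
        # Euler-Maruyama step: pull toward the updated optimum, then drift noise.
        gamma += -relax_rate * (gamma - e_opt) * dt
        gamma += rng.gauss(0.0, math.sqrt(drift_var * dt))
        optimum.append(e_opt)
        mean_level.append(gamma)
    return optimum, mean_level


opt, mean_level = simulate_seascape(t_max=50.0, dt=0.01, upsilon=0.08,
                                    relax_rate=2.0, drift_var=0.01, seed=1)
lag = [g - e for g, e in zip(mean_level, opt)]
print(f"final optimum {opt[-1]:.3f}, final mean {mean_level[-1]:.3f}")
print(f"mean-square lag {sum(x * x for x in lag) / len(lag):.4f}")
```

The mean level trails the optimum with a delay set by `relax_rate` and wanders around it because of `drift_var`, which is the qualitative behavior sketched in the lower plane of Box 1.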

Inference of Adaptive Evolution

Time-dependent selection generates complex evolutionary dynamics of expression levels for a given gene. The top diagram of Box 1 shows a typical pattern: the population mean level follows the fitness peak displacements with some delay; additional deviations of mean and optimal trait value are generated by genetic drift. The fitness seascape model provides a simple conceptual and computational basis to infer these dynamics. The simplest inference scheme is based on aggregate time-dependent expression divergence data; a probabilistic extension to individual genes is discussed later. Box 1 shows the analytical form of the rescaled trait divergence $\Omega(\tau)$ in a fitness seascape with positive stabilizing strength and driving rate ($c > 0$, $\upsilon > 0$; green solid line); the corresponding form $\Omega_{\mathrm{eq}}(\tau)$ in a fitness landscape of the same stabilizing strength and zero driving rate ($c > 0$, $\upsilon = 0$; blue solid line) reaches the saturation value $\Omega_{\mathrm{stab}}$ (Nourmohammad et al., 2013a; Held et al., 2014). The saturation time is inversely proportional to the stabilizing strength, $\tau_{\mathrm{stab}} \sim \Omega_{\mathrm{stab}} \sim 1/c$, and is shorter than the saturation time expected from neutral evolution, $\tau \sim 1$ (gray solid line). The resulting decomposition

$$\Omega(\tau) = \Omega_{\mathrm{eq}}(\tau) + \Omega_{\mathrm{ad}}(\tau) \qquad \text{(Equation 2)}$$

determines the adaptive fraction

$$\omega_{\mathrm{ad}}(\tau) = \frac{\Omega_{\mathrm{ad}}(\tau)}{\Omega(\tau)} = \frac{\langle D_{\mathrm{ad}}(\tau) \rangle}{\langle D(\tau) \rangle}$$

of the trait divergence, which is driven by directional selection; the complementary fraction $1 - \omega_{\mathrm{ad}}(\tau)$ is generated by genetic drift under stabilizing selection. In the linear regime $\Omega(\tau) \approx \Omega_{\mathrm{stab}} + \Omega_{\mathrm{ad}}(\tau)$, which covers all species clades in this dataset, the fitted amplitudes provide simple estimates of the selection parameters (Held et al., 2014):

$$c \approx \frac{1}{2\,\Omega_{\mathrm{stab}}}, \qquad \upsilon \approx \frac{2\,\Omega_{\mathrm{ad}}(\tau)}{\tau}. \qquad \text{(Equation 3)}$$

Here we determine these parameters, together with the trait scale $D_0$, by a maximally conservative inference procedure, which produces the smallest value of stabilizing strength $c$ compatible with the data (Experimental Procedures).
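In the linear regime, Equations 2 and 3 amount to fitting a straight line Ω(τ) ≈ Ω_stab + (υ/2)τ to the aggregate divergence data. The sketch below does this with ordinary least squares and converts intercept and slope into c and υ, assuming Equation 3 in the form c ≈ 1/(2Ω_stab), υ ≈ 2Ω_ad(τ)/τ. It is not the maximally conservative procedure of Experimental Procedures, which also fits D_0. The function names `fit_linear_regime` and `adaptive_fraction` are hypothetical, and the input data are synthetic.

```python
def fit_linear_regime(taus, omegas):
    """Least-squares fit of Omega(tau) = Omega_stab + s * tau.

    Returns (c, upsilon) via Equation 3: c ~ 1/(2 Omega_stab) and
    upsilon ~ 2 Omega_ad(tau) / tau = 2 s.
    """
    n = len(taus)
    if n < 2:
        raise ValueError("need at least two divergence times")
    tau_bar = sum(taus) / n
    omega_bar = sum(omegas) / n
    sxx = sum((t - tau_bar) ** 2 for t in taus)
    sxy = sum((t - tau_bar) * (o - omega_bar) for t, o in zip(taus, omegas))
    slope = sxy / sxx
    omega_stab = omega_bar - slope * tau_bar
    return 1.0 / (2.0 * omega_stab), 2.0 * slope


def adaptive_fraction(tau, c, upsilon):
    """omega_ad(tau) = Omega_ad / (Omega_stab + Omega_ad) in the linear regime."""
    omega_stab = 1.0 / (2.0 * c)
    omega_ad = 0.5 * upsilon * tau
    return omega_ad / (omega_stab + omega_ad)


# Synthetic, noise-free data generated with c = 10 and upsilon = 0.1.
taus = [0.5, 1.0, 1.5]
omegas = [0.05 + 0.05 * t for t in taus]
print(fit_linear_regime(taus, omegas))            # approximately (10.0, 0.1)
print(round(adaptive_fraction(1.4, 18.4, 0.08), 2))  # 0.67
```

Plugging the Figure 2 parameters (c = 18.4, υ = 0.08, τ_Dros. = 1.4) into this linear approximation gives an adaptive fraction of about two-thirds, in the same range as the 64% obtained from the full fit; the approximation ignores the short-time drift regime.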

How much adaptation is in the evolutionary process? This can be measured by the fitness flux $\Phi$, which is the cumulative fitness gain through adaptive changes over an evolutionary period (Mustonen and Lässig, 2007, 2010; Held et al., 2014). For a gene with evolving mean expression level $\Gamma(t)$, the fitness flux at a given time $t$ is defined as the rate $d\Gamma(t)/dt$ of expression change multiplied by the fitness gradient $\partial F(\Gamma, t)/\partial \Gamma$, where $F(\Gamma, t)$ is the mean fitness of the population at time $t$ (Supplemental Experimental Procedures). Positive fitness flux under time-dependent selection does not imply a net gain in fitness, because fitness gains through adaptation can be offset by losses through displacements of the fitness peak. We can compare these dynamics to walking on an escalator that moves in the opposite direction. The quantity $\Phi$ corresponds to the walker's total number of uphill steps on the escalator but is unrelated to an absolute height gain. In the seascape model, the fitness flux is proportional to the driving rate $\upsilon$ and to the stabilizing strength $c$, because a population under stronger selection follows the fitness peak more closely and accumulates more adaptation. Hence, the cumulative fitness flux $\Phi$ over an evolutionary period $\tau$ has the expectation value $\langle 2N\Phi(\tau) \rangle = 2c\upsilon\tau$ (scaled by the effective population size $N$) (Held et al., 2014). Substituting Equation 3 gives $2c\upsilon\tau \approx 2\,\Omega_{\mathrm{ad}}(\tau)/\Omega_{\mathrm{stab}}$, and in the linear regime $\Omega_{\mathrm{ad}}/\Omega_{\mathrm{stab}} = \omega_{\mathrm{ad}}/(1 - \omega_{\mathrm{ad}})$. This establishes a simple relation between fitness flux and the adaptive part of the expression divergence:

$$\langle 2N\Phi(\tau) \rangle \approx \frac{2\,\omega_{\mathrm{ad}}(\tau)}{1 - \omega_{\mathrm{ad}}(\tau)}. \qquad \text{(Equation 4)}$$

Thus, the time-dependent pattern of expression divergence discriminates directional selection in a genuine fitness seascape from purely stabilizing selection in a static fitness landscape. The joint inference of these selection components provides a more powerful signal of adaptation than $Q_{ST}/F_{ST}$ analysis (Leinonen et al., 2013), which would infer only stabilizing selection from these data. In Supplemental Experimental Procedures, we discuss in detail the relationship between the test for selection based on trait divergence and other trait-based selection tests: the $Q_{ST}/F_{ST}$ test (Leinonen et al., 2013), Ornstein-Uhlenbeck models (Hansen, 1997; Bedford and Hartl, 2009), and the McDonald-Kreitman test (McDonald and Kreitman, 1991).
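The escalator picture and Equation 4 can be checked with a few lines of arithmetic. The sketch assumes a quadratic mean fitness F(Γ, t) = f* - k (Γ - E*(t))² with a hypothetical curvature `k` (fitness in arbitrary units) and accumulates the flux as the sum of ΔΓ × ∂F/∂Γ along a discretized trajectory, evaluating the gradient at each step's midpoint. The last two functions evaluate the seascape expectation 2cυτ and the right-hand side of Equation 4. All function names are hypothetical.

```python
def fitness_gradient(gamma, e_opt, k):
    """dF/dGamma for the assumed quadratic mean fitness f* - k (Gamma - E*)^2."""
    return -2.0 * k * (gamma - e_opt)


def cumulative_fitness_flux(gammas, optima, k):
    """Sum of (change in mean level) x (fitness gradient) along a trajectory.

    The gradient is evaluated at the midpoint of each step, using the mean
    level and the optimum averaged over the step.
    """
    total = 0.0
    for i in range(len(gammas) - 1):
        step = gammas[i + 1] - gammas[i]
        gamma_mid = 0.5 * (gammas[i] + gammas[i + 1])
        opt_mid = 0.5 * (optima[i] + optima[i + 1])
        total += step * fitness_gradient(gamma_mid, opt_mid, k)
    return total


def expected_scaled_flux(c, upsilon, tau):
    """Seascape expectation <2N Phi(tau)> = 2 c upsilon tau."""
    return 2.0 * c * upsilon * tau


def scaled_flux_from_fraction(omega_ad):
    """Right-hand side of Equation 4: 2 omega_ad / (1 - omega_ad)."""
    return 2.0 * omega_ad / (1.0 - omega_ad)


# Escalator: the optimum moves up one unit per step and the mean level follows
# one unit behind, so the population's fitness never changes, yet every step
# is uphill.
print(cumulative_fitness_flux([-1.0, 0.0, 1.0, 2.0], [0.0, 1.0, 2.0, 3.0], k=1.0))  # 6.0
print(round(scaled_flux_from_fraction(0.64), 2))        # 3.56
print(round(expected_scaled_flux(18.4, 0.08, 1.4), 2))  # 4.12
```

In the escalator example, the fitness is -k at both the start and the end, while the flux is positive. For a static optimum, the midpoint rule reproduces the exact fitness gain of a quadratic fitness function. The two Equation 4 evaluations use the genus-wide values quoted in Figure 2 and are simple arithmetic on those numbers, not a re-analysis of the data.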

Fitness Seascape of Drosophila Gene Expression

We first use the aggregate time-dependent divergence data to infer a gene-averaged fitness seascape of expression levels in Drosophila (Figure 2). The least-squares fitted seascape model (green line) contains stabilizing and directional selection, as illustrated in the center plot of Box 1. This model explains the observed pattern $\Omega(\tau)$: the evolutionary constraint between neighboring species (here D. mel, D. sim, and D. yak) is caused by stabilizing selection, and the approximately linear long-term increase signals adaptation.

Our inference also shows that stabilizing selection alone cannot explain the Drosophila expression data. In a static fitness landscape with substantial stabilizing strength, genetic drift generates a rapidly saturating pattern of $D(\tau)$ that is not observed in the data (Figure S3). A previous study of this dataset suggests a pattern with slow saturation on timescales on the order of the Drosophila genus divergence time (Bedford and Hartl, 2009); such a pattern would imply weak stabilizing selection ($c < 1$) and near-neutral evolution of gene expression levels (see Supplemental Experimental Procedures for a detailed discussion). Compared to the seascape model, the weak-selection landscape model provides a suboptimal fit to the observed $D(\tau)$ data (Figure S3). This ranking of models is confirmed and quantified by the probabilistic analysis.

Probabilistic Inference of Adaptively Regulated Genes

Next, we extend the inference of selection to the noisy patterns of individual genes. Using a probabilistic extension of the selection test, we obtain gene-specific posterior likelihood distributions of stabilizing strength and fitness flux, $Q(c, \Phi)$ (Box 1) (Experimental Procedures). Maximizing this function allows us to infer, for each gene, the most likely values of stabilizing strength $c$ and fitness flux $\Phi$ (or, equivalently, of $c$ and $\upsilon$) given its observed expression values in 7 Drosophila species on the phylogeny of Figure 1 (see also Box 1).

Figure 3. Probabilistic Inference of Adaptive Gene Expression
(A) Distribution of maximum-likelihood values of the scaled cumulative fitness flux, $2N\Phi$, inferred for individual genes (see Experimental Procedures and Supplemental Experimental Procedures). Our inference classifies 54% of all genes as adaptively regulated ($2N\Phi > 4$, green shaded part of the distribution).
(B) Bayesian inference of fitness models. The posterior log-likelihood score $S(c, \Phi)$ favors the optimal seascape model ($c = 18.4$, $2N\Phi = 3.8$; green square) over the best landscape model ($c_{\mathrm{eq}} = 16$, $\Phi = 0$; blue square) and the neutral model ($c = 0$, $\Phi = 0$; gray square).
See also Figures S1, S2, S6, and S7.

This analysis again shows the dominant role of adaptation in gene expression: for 54% of all genes, we infer a significant maximum-likelihood fitness flux $\Phi$ across the Drosophila genus; we classify these genes as adaptively regulated (Table S1). Figure 3A shows the distribution of maximum-likelihood values of the fitness flux for individual genes, which determines the inferred clade-specific fraction of adaptive expression divergence (Table 1). This fraction $\omega_{\mathrm{ad}}(\tau)$ increases with clade divergence time, in accordance with the aggregate data of Figure 2. Between D. mel and D. sim, which diverged about 2–3 mya, 92% of the expression divergence can be attributed to genetic drift under stabilizing selection. Across the entire Drosophila genus, which had its last common ancestor about 40 mya, we infer 64% of the expression divergence to be adaptive.
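The per-gene classification can be sketched as follows, assuming that each gene's log-likelihood over (c, 2NΦ) is available as a function. Here a hypothetical toy quadratic stands in for the posterior Q(c, Φ), which is defined in Experimental Procedures. The sketch finds the maximum on a grid and flags a gene as adaptively regulated when its maximum-likelihood scaled flux exceeds 4, as in Figure 3A. The function names, grids, and toy genes are all hypothetical.

```python
from itertools import product


def max_likelihood(log_lik, c_grid, flux_grid):
    """Grid point (c, 2N*Phi) that maximizes a per-gene log-likelihood."""
    return max(product(c_grid, flux_grid), key=lambda p: log_lik(*p))


def adaptive_genes(ml_fluxes, threshold=4.0):
    """Flags for 2N*Phi > threshold and the fraction of flagged genes
    (expects a non-empty list)."""
    flags = [flux > threshold for flux in ml_fluxes]
    return flags, sum(flags) / len(flags)


def toy_log_lik(center_c, center_flux):
    """Hypothetical quadratic log-likelihood peaked at (center_c, center_flux)."""
    return lambda c, flux: -((c - center_c) ** 2) - (flux - center_flux) ** 2


c_grid = [float(x) for x in range(0, 31)]    # c = 0, 1, ..., 30
flux_grid = [0.5 * x for x in range(0, 21)]  # 2N*Phi = 0, 0.5, ..., 10
genes = [toy_log_lik(18, 6.0), toy_log_lik(15, 1.0), toy_log_lik(20, 5.2)]
ml = [max_likelihood(g, c_grid, flux_grid) for g in genes]
flags, fraction = adaptive_genes([flux for _, flux in ml])
print(ml)
print(flags, round(fraction, 2))
```

A grid search keeps the sketch transparent; with a real posterior, a finer grid or a numerical optimizer would be needed, and the threshold of 4 is taken from the Figure 3 caption.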

The Bayesian scheme also allows us to quantify the overall statistical significance of our selection inference. In Figure 3B, we plot the cumulative log-likelihood score for all genes as a function of stabilizing strength $c$ and cumulative fitness flux $\Phi$. As shown by a log-likelihood test, the global maximum-likelihood seascape model is strongly favored over the maximum-likelihood landscape model ($P < 10^{-3600}$) and over neutral evolution ($P < 10^{-5400}$) (see Equation 35 in Supplemental Experimental Procedures). This analysis rejects neutral evolution and evolution under static stabilizing selection in a robust way: it does not require model assumptions on the adaptive dynamics, and the ranking of models is stable under alternate evaluation of species divergence times (Figure S2). We conclude that the long-term increase of expression divergence beyond the D. mel-D. sim divergence time, as observed in Figure 2, is a statistically significant signal of adaptive evolution.

Drift and Adaptation Follow Distinct Molecular Clocks

The fitness seascape model interprets the two molecular clocks observed in gene expression divergence in terms of distinct evolutionary forces: the rapid short-term increase is caused by genetic drift, whereas the slower long-term increase is caused by adaptation. Because genetic drift and adaptation differ in tempo, the relative contribution of adaptation to expression divergence depends on evolutionary time: the adaptive part is small for the youngest species clades, but adaptation becomes dominant across the entire Drosophila genus (green shaded area in Figure 2). This nonlinearity is a specific evolutionary feature of quantitative traits with a complex molecular basis, which can have individual loci under weak selection (Sunyaev and Roth, 2013). Substitutions at these loci generate predominantly diffusive divergence of expression on short timescales ($\tau < \tau_{\mathrm{stab}}$); this pattern is modified by stabilizing and directional selection on longer timescales.
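The time dependence of the adaptive fraction can be illustrated with toy curves of the same kind: a drift part that grows diffusively at short times and saturates at Ω_stab, plus a linear adaptive surplus. Assuming exponential relaxation for the drift part and Ω_stab = 1/(2c), both stand-ins rather than the analytical forms used in the paper, the fraction ω_ad(τ) starts small and rises with divergence time. The function name `adaptive_fraction_curve` and the chosen τ values are hypothetical; c and υ are the values quoted in Figure 2.

```python
import math


def adaptive_fraction_curve(tau, c, upsilon):
    """Toy omega_ad(tau) = Omega_ad / (Omega_eq + Omega_ad).

    Omega_eq: diffusive at short times (slope 1), saturating at 1/(2c).
    Omega_ad: linear adaptive surplus upsilon * tau / 2.
    """
    omega_stab = 1.0 / (2.0 * c)
    omega_eq = omega_stab * (1.0 - math.exp(-tau / omega_stab))
    omega_ad = 0.5 * upsilon * tau
    return omega_ad / (omega_eq + omega_ad)


for tau in (0.001, 0.01, 0.1, 0.5, 1.4):
    print(f"tau = {tau:<5}  omega_ad = {adaptive_fraction_curve(tau, 18.4, 0.08):.2f}")
```

In this toy, ω_ad tends to υ/(2 + υ) at very short times, because drift and the adaptive surplus both grow linearly there; its rise at larger τ reflects the saturation of the drift part.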

The existence of two molecular clocks has an immediate consequence: the power of inference methods for trait adaptation increases with the evolutionary time span. Quantitative genetics studies over short divergence times cannot distinguish neutral evolution from evolution under selection, because the divergence pattern is dominated by the diffusive molecular clock in both cases (Box 1). For example, the observation of drift-dominated behavior in primates (Khaitovich et al., 2004) is consistent with neutral evolution but cannot exclude stabilizing or directional selection. Thus, it is compatible with the signal of adaptive evolution based on expression QTLs in humans (Fraser, 2013). In contrast, the data from the 7 Drosophila species span both short and long divergence times and display two molecular clocks. This divergence pattern is no longer consistent with neutral evolution and is indicative of macro-evolutionary adaptation of gene expression in Drosophila.

Testing Alternative Evolutionary Scenarios

The minimal seascape model explains the pattern of gene expression divergence across the Drosophila genus in a parsimonious way. But are there equally parsimonious alternative modes of selection or demography that are consistent with the data? To assess the specificity and robustness of the seascape-based inference, we characterize the statistics of gene expression levels in a number of alternative modes of evolution by analytical approximations and simulations, and we compare the results to the Drosophila data.

First, demographic effects may increase or decrease the effective population size in a specific lineage, which affects the stabilizing strength $c$ for all genes. As shown in Figure S4, lineage-specific changes in effective population size that persist over sufficiently long evolutionary periods can be traced in the aggregate time-dependent divergence $\Omega(\tau)$. Such effects are not observed in our data, which suggests that long-term demographic effects do not play a dominant role in the evolution of Drosophila gene expression levels (Figure S4). This result does not exclude short-term changes of population size, which occurred, for example, in the evolution of the D. mel lineage (Lachaise et al., 1988). Such changes can be traced in sequence polymorphism spectra (Glinka et al., 2003; Haddrill et al., 2005; Stephan and Li, 2007; Thornton et al., 2007), but they have only minor effects on gene expression levels (Figure S4).

Table 1. Selection Parameters and Amount of Adaptation

Gene class (gene number)  | $c$  | $2N\Phi$ | D. mel-D. sim | D. mel-D. yak | D. vir-D. moj | D. mel-D. ana | D. mel-D. pse | Dros. (D. mel-D. moj) | a (%)
All genes (6,332)         | 18.4 | 3.8      | 7             | 23            | 48            | 59            | 61            | 64                    | 54
Broad codon usage (1,176) | 15.6 | 3.9      | 9             | 25            | 49            | 61            | 62            | 66                    | 57
Narrow codon usage (501)  | 18.0 | 2.4      | 5             | 15            | 36            | 48            | 49            | 53                    | 18
High expression (553)     | 14.3 | 1.7      | 1             | 8             | 27            | 39            | 40            | 44                    | 0

Dros., Drosophila genus; $c$, maximum-likelihood stabilizing strength; $2N\Phi$, maximum-likelihood fitness flux; clade columns (D. mel-D. sim through Dros.), $\omega_{\mathrm{ad}}$ (%), the clade-dependent adaptive fraction of the gene expression divergence; a, fraction of adaptively regulated genes across the Drosophila genus, given by the condition $2N\Phi > 4$.

Next, we ask whether the Drosophila data can be explained by lineage- and gene-specific relaxation of stabilizing selection. We consider a specific non-adaptive mode of expression changes: functional genes evolve under stabilizing selection in a static fitness landscape, but individual genes can (partially) lose function at a given point in their evolutionary history, which relaxes selection on their expression. We model loss of function as stochastic events occurring at a small rate, independently for each gene and on each lineage. This model produces a divergence function $\Omega(\tau)$ with a long-term nonlinearity that is not seen in the data (Figure S5). The most direct way to discriminate between relaxation of selection and adaptive evolution is to use a directional bias: most functional genes are upregulated by stabilizing selection (a similar bias has been exploited in expression QTL studies; Fraser et al., 2010; Fraser, 2011, 2013). Hence, in the loss-of-function mode, a comparison of expression levels for a given gene would show small cross-species differences at higher expression levels (i.e., between the lineages with a functional gene), together with large deviations at lower levels (i.e., in the lineages with lost gene function). Accordingly, the distribution of expression divergence values for a given species pair would show a broad tail generated by the loss events (Figure S5). These features are not observed in our data, indicating that relaxed stabilizing selection alone cannot explain the evolution of Drosophila expression levels (Figure S5). Loss of gene function does happen in our phylogeny, but affected genes will often lose expression altogether and are hence largely absent from our dataset.

We also compare the Drosophila data with alternative models of adaptive evolution. For example, individual genes can undergo a (partial) neo-functionalization that requires a major change in their expression. We describe this mode of evolution by a punctuated fitness seascape, in which large shifts of the peak position are stochastic events occurring at a small rate (Held et al., 2014). This process produces an aggregate divergence function $\Omega(\tau)$ that is compatible with the data. However, it also predicts a broad tail in the distribution of expression divergence values, which is not observed (Figure S5). We conclude that gradual but continual changes in optimal levels, as described by our minimal model, are the dominant evolutionary force driving the adaptation of gene expression in Drosophila.
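A toy calculation illustrates the broad-tail criterion. The sketch assumes that each gene's cross-species difference is a Gaussian contribution from gradual peak motion, optionally plus a Poisson number of large jumps. All parameter values are hypothetical. Excess kurtosis is our choice of summary statistic; Figure S5 compares full distributions.

```python
import math
import random


def poisson(rng, lam):
    """Poisson variate by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_differences(n_genes, tau, gradual_var, jump_rate, jump_size, seed=0):
    """Cross-species expression differences for n_genes independent genes.

    Gradual peak motion contributes a Gaussian with variance gradual_var * tau;
    punctuated shifts add a Poisson(jump_rate * tau) number of Gaussian jumps
    with standard deviation jump_size.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_genes):
        x = rng.gauss(0.0, math.sqrt(gradual_var * tau))
        for _ in range(poisson(rng, jump_rate * tau)):
            x += rng.gauss(0.0, jump_size)
        diffs.append(x)
    return diffs


def excess_kurtosis(xs):
    """Zero for a Gaussian; large and positive for a distribution with broad tails."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0


if __name__ == "__main__":
    gradual = simulate_differences(20_000, 1.0, 1.0, 0.0, 0.0, seed=1)
    punctuated = simulate_differences(20_000, 1.0, 1.0, 0.1, 3.0, seed=2)
    print(f"gradual:    excess kurtosis = {excess_kurtosis(gradual):.2f}")
    print(f"punctuated: excess kurtosis = {excess_kurtosis(punctuated):.2f}")
```

Rare large shifts leave the variance moderate but inflate the fourth moment. For this reason a tail statistic separates the two scenarios even when the aggregate divergence does not.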

Functional Determinants of Selection

By applying our inference to specific classes of genes, we can get a more detailed view of the adaptation of gene expression in Drosophila. First, we observe a strong correlation between codon usage and adaptation: genes with specific codon usage show strongly reduced adaptive expression divergence and lower average fitness flux than genes with broad codon usage (Figure 4; Table 1). Specific codon usage is known to be prevalent in highly expressed genes (Ikemura, 1985); consistently, we find stronger conservation of expression and lower levels of fitness flux in this class (Table 1). Different codons for the same amino acid differ in their efficiency of translation (Ikemura, 1985; Shields et al., 1988), which implies that genes with broad codon usage have a higher potential for adaptive changes at the post-transcriptional level. Here we find stronger adaptation at the mRNA level in this gene class, which suggests a two-tier mode of evolution: adaptive mRNA changes lay the ground on which coherent adaptive tuning of protein levels can build.
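Genes are stratified by codon usage using the effective number of codons (Wright, 1990; see the Figure 4 legend). The following is a minimal sketch of Wright's homozygosity-based estimator for the standard genetic code. It omits refinements found in published tools, and the function and constant names are ours. Lower values indicate more specific (narrower) codon usage.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

SYNONYMS = {}
for _codon, _aa in CODE.items():
    if _aa != "*":
        SYNONYMS.setdefault(_aa, []).append(_codon)

# Degeneracy class -> number of amino acids in that class (standard code).
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def effective_number_of_codons(cds):
    """Wright's (1990) effective number of codons N_c for one coding sequence.

    Returns a value between 20 (one codon per amino acid) and 61 (uniform
    usage), or None if a degeneracy class has no usable amino acid.
    """
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    counts = Counter(c for c in codons if CODE.get(c, "*") != "*")
    homozygosity = {}  # degeneracy class -> per-amino-acid F values
    for aa, syn in SYNONYMS.items():
        if len(syn) == 1:
            continue  # Met and Trp carry no codon-usage information
        n = sum(counts[c] for c in syn)
        if n < 2:
            continue  # F is undefined for fewer than two observations
        sum_p2 = sum((counts[c] / n) ** 2 for c in syn)
        homozygosity.setdefault(len(syn), []).append((n * sum_p2 - 1) / (n - 1))
    f_avg = {k: sum(v) / len(v) for k, v in homozygosity.items()}
    if 3 not in f_avg and 2 in f_avg and 4 in f_avg:
        f_avg[3] = (f_avg[2] + f_avg[4]) / 2  # Wright's rule when Ile is missing
    if any(k not in f_avg for k in CLASS_SIZES):
        return None
    if any(f_avg[k] <= 0 for k in CLASS_SIZES):
        return 61.0  # near-uniform usage in a small sample
    nc = 2 + sum(size / f_avg[k] for k, size in CLASS_SIZES.items())
    return min(nc, 61.0)
```

A sequence that uses a single codon for every amino acid gives N_c = 20. Uniform usage of all sense codons is capped at 61.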

At the same time, we find no significant correlation between fitness flux for expression changes and adaptation of the amino acid sequence, as measured by a McDonald-Kreitman test (Figure S6) (McDonald and Kreitman, 1991). This decoupling makes sense, because expression changes of a gene are caused by cis- and trans-regulatory sequence changes but do not require evolution of its coding sequence. At the broad level of analysis afforded by our dataset, we conclude that for a given gene, expression level and coding sequence evolve independently to a large degree. For a metabolite or a transcription factor, adaptive changes of its cellular concentration are often coupled with conservation of its function.

Figure 4. Adaptation of Gene Expression Depends on Codon Usage Bias

(A) The aggregate time-dependent divergence $\Omega(\tau)$ for genes with broad codon usage and for genes with specific codon usage is shown, together with theoretical curves under directional selection (dashed and dashed-dotted lines); the theoretical curve inferred for all genes is shown for comparison (solid line; cf. Figure 2). Codon usage is measured by the effective number of codons (Supplemental Experimental Procedures) (Wright, 1990); inferred model parameters are listed in Table 1.

(B) The distribution of the cumulative fitness flux $2NF$ is plotted against the effective number of codons $n$ (circle, average; line, median; box, 50% around median; bars, 70% around median) (Supplemental Experimental Procedures) (Wright, 1990).

See also Figures S1, S2, S4, S5, and S7.

Our gene-specific inference can be used to detect functional gene classes associated with adaptive evolution of regulation. A full ranking of gene classes by enrichment in adaptively regulated genes, with associated p values, is reported in Table S1. Gene functions associated with enhanced adaptation of expression include sensory perception, regulation, neural maturation, regulation of growth, aging, and morphology. Adaptively regulated functions also include response to UV radiation, which has been identified as an important climate-mediated trait in humans (Hancock et al., 2011; Fraser, 2013). Adaptive evolution of genes related to growth, regulation, and morphology has been previously inferred by expression QTL and comparative studies of gene regulation in other species (Fraser et al., 2011; Fraser, 2011; Romero et al., 2012). Here we identify these categories from a quantitative, system-wide scan for adaptively regulated genes. This points to the power of our phenotype-based inference scheme, which is not confounded by the combinatorial complexity of cis-regulatory sequence in higher eukaryotes.
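This section does not state which test produced the enrichment p values in Table S1. The sketch below assumes a one-sided hypergeometric (Fisher-type) test, which is a common choice but not necessarily the authors' procedure. The gene-class names in the usage example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_adaptive, n_class, n_adaptive_in_class):
    """One-sided hypergeometric tail P(X >= n_adaptive_in_class).

    X counts adaptively regulated genes in a class of n_class genes drawn
    without replacement from n_total genes, n_adaptive of which are adaptive.
    """
    upper = min(n_adaptive, n_class)
    tail = sum(
        comb(n_adaptive, k) * comb(n_total - n_adaptive, n_class - k)
        for k in range(n_adaptive_in_class, upper + 1)
    )
    return tail / comb(n_total, n_class)


def rank_gene_classes(classes, adaptive_genes, all_genes):
    """Rank gene classes (name -> iterable of genes) by enrichment p value."""
    universe = set(all_genes)
    adaptive = set(adaptive_genes) & universe
    rows = []
    for name, members in classes.items():
        members = set(members) & universe
        hits = len(members & adaptive)
        p = enrichment_p_value(len(universe), len(adaptive), len(members), hits)
        rows.append((p, name, hits, len(members)))
    return sorted(rows)


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(100)]
    adaptive = genes[:20]  # e.g., genes passing 2N F_a > 4
    classes = {"class A (hypothetical)": genes[:10],
               "class B (hypothetical)": genes[50:60]}
    for p, name, hits, size in rank_gene_classes(classes, adaptive, genes):
        print(f"{name}: {hits}/{size} adaptive, p = {p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids overflow for genome-scale counts. Division returns an ordinary float.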

Sex-Specific Evolution of Expression

We test the role of expression differentiation between male and female samples for adaptive evolution across the Drosophila genus. The sex specificity of a given gene (Zhang et al., 2007), defined as the difference between its male and female expression levels, $E_{mf} = E_m - E_f$, is a distinct trait whose evolutionary pattern can be analyzed by our method. We can distinguish two modes of evolution: conservation of sex specificity maintained by stabilizing selection and sex-specific adaptation of expression (Figure 5A). Most genes of our dataset have well-conserved and often small sex specificity; these genes evolve their expression levels coherently between males and females (Zhang et al., 2007). The remaining 19% of the genes have a significant cumulative fitness flux $F_{mf}$ of their specificity trait; we classify them as undergoing sex-specific adaptation of expression in the Drosophila genus. These genes cover all four chromosomes of the Drosophila genome.
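The specificity trait and the flux threshold can be written out directly. This is a minimal sketch with several assumptions. The per-gene values of $2NF_{mf}$ are taken as inputs, presumed to come from the inference in the Experimental Procedures rather than computed here. The 4.5 cutoff is the one stated there. Classifying bias by the sign of the species-averaged $E_{mf}$ is a hypothetical simplification; the text instead uses the classification of Assis et al. (2012).

```python
def sex_specificity(male, female):
    """E_mf = E_m - E_f for each gene and species.

    male and female map gene -> list of genetic means, one per species,
    in the same species order.
    """
    return {g: [m - f for m, f in zip(male[g], female[g])] for g in male}


def adaptive_fraction_by_bias(male, female, flux_mf, threshold=4.5):
    """Fraction of genes with 2N F_mf > threshold among male- and female-biased genes.

    Bias is taken here as the sign of the species-averaged E_mf, a simplification
    of the external bias classification used in the text.
    """
    groups = {"male-biased": [], "female-biased": []}
    for gene, values in sex_specificity(male, female).items():
        mean_emf = sum(values) / len(values)
        key = "male-biased" if mean_emf > 0 else "female-biased"
        groups[key].append(flux_mf[gene] > threshold)
    return {k: (sum(v) / len(v) if v else float("nan")) for k, v in groups.items()}


if __name__ == "__main__":
    male = {"g1": [2.0, 2.5, 2.2], "g2": [0.1, 0.0, 0.2]}    # hypothetical
    female = {"g1": [0.5, 0.4, 0.6], "g2": [0.3, 0.2, 0.4]}  # hypothetical
    flux_mf = {"g1": 6.0, "g2": 1.0}                          # hypothetical 2N F_mf
    print(adaptive_fraction_by_bias(male, female, flux_mf))
```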

Gene functions associated with sex-specific adaptation of expression include regulation of translation, reproduction, post-mating behavior, and (immune) response to biotic stimuli (Table S2). To understand the distribution of these adaptive processes between sexes, we apply our inference to classes of genes with different species-averaged sex bias of expression (Assis et al., 2012). For male-biased genes, the aggregate divergence $\Omega_{mf}$ signals substantial sex-specific adaptation (Figure 5B). Consistently, fitness flux $F_{mf}$ is strongly enhanced in genes that are predominantly expressed in males (Figure 5C). Fitness flux is lower in other classes, including genes expressed predominantly in females.

Altogether, we find a remarkable evolutionary asymmetry between sexes: male bias in expression is associated with adaptive evolution of expression (orange shaded areas in Figures 5B and 5C), whereas female bias in expression is under weaker directional selection and primarily reflects conserved physiological differences between male and female organisms. This result complements a previously observed evolutionary asymmetry at the sequence level: genes with male-biased expression show increased amino acid divergence (Zhang et al., 2007). As suggested by a McDonald-Kreitman test, this increase can be associated with adaptive evolution of gene function (Figure S6).

DISCUSSION

We have shown that adaptive regulation accounts for most of the macro-evolutionary divergence in gene expression across the Drosophila genus. Genes differ considerably in the amount of adaptation, depending on their codon usage, sexual differentiation, and functional class. These results provide evidence for system-wide adaptation of gene regulation in Drosophila at the primary level of transcription, notwithstanding further evolutionary complexities at the level of translation (Romero et al., 2012; Artieri and Fraser, 2014). It remains to be seen whether a similar prevalence of adaptation in the evolution of expression will be found in other species.

Our inference of adaptation exploits the complex dependence of the expression divergence on the evolutionary distance between species. It reflects two fundamental evolutionary features of quantitative traits. First, such traits generate a divergence pattern with two distinct molecular clocks: at a short evolutionary distance, the divergence is always near the expected value under neutrality; at a longer distance, it depends jointly on stabilizing and directional selection (Figure 2; Box 1). This feature reconciles seemingly contradictory results of previous studies: analysis of closely related species produces a signal of neutral evolution (Khaitovich et al., 2004, 2005), whereas evolutionary constraint becomes apparent for more distant species (Rifkin et al., 2003; Lemos et al., 2005; Rifkin et al., 2005; Gilad et al., 2006; Bedford and Hartl, 2009; Romero et al., 2012). Second, the phenotypic evolution of gene expression decouples from details of its genetic basis. This explains why we find overall strong selection on gene expression levels even though selection on individual QTL is often weak (Sunyaev and Roth, 2013). The probabilistic extension of our inference scheme, which is based on gene-specific expression divergence, identifies functional gene classes associated with adaptive evolution of regulation.

Figure 5. Sex-Specific Evolution of Gene Expression

(A) Schematic showing conservation of sex specificity (left panel) versus sex-specific adaptation of expression (right panel). The sex-specificity trait (brown line) is defined as the difference between male and female expression levels (purple and blue lines). The schematics show all three lines as functions of evolutionary time.

(B) The time-dependent divergence of the sex-specificity trait $\Omega_{mf}$ for all genes, genes with male-biased expression, and genes with female-biased expression is shown, together with theoretical curves under directional selection.

(C) The distribution of the cumulative fitness flux for the sex-specificity trait, $2NF_{mf}$, is plotted against the species-averaged sex specificity $E_{mf}$ (circle, average; line, median; box, 50% around median; bars, 70% around median) (Supplemental Experimental Procedures). Sex-specific adaptation ($2NF_{mf} > 4.5$, orange shaded part) occurs predominantly in male-biased genes.

See also Figures S1 and S4–S7.

The selection model underlying our analysis is a single-peak fitness seascape, which contains components of stabilizing and directional selection on a quantitative trait (Box 1). These components are well-established notions of quantitative genetics on micro-evolutionary timescales. Each of them can provide a snapshot of the predominant selection pressure in a population. However, neither component alone gives a complete description of selection over macro-evolutionary periods. If selection on a trait is directional at a given evolutionary time, will that selection relax after the trait value has significantly adapted in the direction of selection? If selection is stabilizing, can we assume that the optimal trait value will remain invariant in the context of a different species? To address these questions, we need a conceptual and quantitative synthesis of stabilizing and directional selection. The single-peak seascape model arguably provides the simplest such synthesis. It also provides a simple picture of continual adaptation over macro-evolutionary periods: a species follows a moving fitness peak, and this process generates positive fitness flux but no net increase in fitness.
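A toy simulation can check the moving-peak picture. It is deliberately simplified and is not the population-genetic model behind the inference. A single population mean trait follows linear dynamics toward a peak that itself performs an Ornstein-Uhlenbeck process, and every parameter value is hypothetical. Fitness flux is accumulated as the fitness gradient with respect to the trait times the trait change, evaluated at the midpoint of each step. This follows the idea of fitness flux in Mustonen and Lässig (2010).

```python
import math
import random


def fitness_flux_on_moving_peak(c=1.0, k=1.0, relax=0.1, peak_var=1.0, drift=0.1,
                                dt=0.01, n_steps=200_000, seed=1):
    """Toy population mean trait X chasing an Ornstein-Uhlenbeck fitness peak P.

    Fitness is f(X, P) = -(c / 2) * (X - P)^2. X relaxes toward P at rate k and
    diffuses with constant `drift`; P relaxes to 0 at rate `relax` with
    stationary variance `peak_var`. Returns (cumulative fitness flux, net change
    in fitness between the first and the last time point).
    """
    rng = random.Random(seed)
    peak_noise = math.sqrt(2.0 * relax * peak_var * dt)
    trait_noise = math.sqrt(2.0 * drift * dt)
    peak = rng.gauss(0.0, math.sqrt(peak_var))  # peak starts in its stationary state
    trait = peak                                # population starts on the peak (f = 0)
    flux = 0.0
    for _ in range(n_steps):
        new_peak = peak - relax * peak * dt + peak_noise * rng.gauss(0.0, 1.0)
        new_trait = trait - k * (trait - peak) * dt + trait_noise * rng.gauss(0.0, 1.0)
        # Fitness gradient with respect to the trait, at the midpoint of the step,
        # times the trait change; fitness changes caused by peak motion are excluded.
        grad = -c * 0.5 * ((trait - peak) + (new_trait - new_peak))
        flux += grad * (new_trait - trait)
        trait, peak = new_trait, new_peak
    net_change = -0.5 * c * (trait - peak) ** 2  # fitness at the start was exactly 0
    return flux, net_change


if __name__ == "__main__":
    flux, net = fitness_flux_on_moving_peak()
    print(f"cumulative fitness flux = {flux:.1f}, net fitness change = {net:.3f}")
```

For these linear dynamics, the stationary flux per unit time works out to c·relax·k·peak_var/(k + relax), about 0.09 for the defaults. The flux therefore grows steadily, while the net fitness change stays bounded. The positive flux is balanced by fitness lost as the peak moves away from the population.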

Our method of selection inference can be applied to a spectrum of molecular quantitative traits with a complex genetic basis, provided that comparative data from multiple, sufficiently diverged species are available. Such traits include genome-wide protein levels, protein-DNA binding interactions, and enzymatic activities. For most of these traits, we have only partial knowledge of the underlying genetic loci and their effects on trait and fitness. Our method complements QTL studies and opens a way to infer quantitative phenotype-fitness maps at the systems level.

EXPERIMENTAL PROCEDURES

Sequence Data and Evolutionary Tree

Synonymous genome sequence divergence is used to estimate the species divergence times $\tau_{ij}$, scaled in units of the inverse point mutation rate $\mu^{-1}$ (Drosophila 12 Genomes Consortium et al., 2007). The resulting phylogeny (Drosophila 12 Genomes Consortium et al., 2007) is shown in Figure 1, where we also compare the divergence times $\tau_{ij}$ with divergence measures based on amino acid distances.

Expression Data and Primary Analysis

We use genome-wide expression data from 7 Drosophila species (D. melanogaster [D. mel], D. simulans [D. sim], D. yakuba [D. yak], D. ananassae [D. ana], D. pseudoobscura [D. pse], D. virilis [D. vir], and D. mojavensis [D. moj]), obtained in Zhang et al. (2007) (GEO: GSE6640). These data contain mRNA intensity measurements for a number of biological replicates from adult (5–7 days post-eclosion) males and females in each species. For quality assessment, a number of technical replicates were obtained for each biological replicate. Moreover, specific microarray platforms were designed for each of these species, which allows for a reliable comparison of expression levels across species (Figure S1) (Zhang et al., 2007). We restrict the analysis to the 6,332 genes that have unambiguous one-to-one orthologs across all lines and are tested by at least four probes in each microarray platform (see Supplemental Information for details).

The expression levels $E^a_{i,s,k}$ are labeled by gene $a$, species $i$, sex $s$, and biological replicate $k$. The levels are transformed to mean 0 and cross-gene variance 1 for each replicate (Z-transformation of microarrays) (Quackenbush, 2002). In the Supplemental Experimental Procedures, we show that this transformation accurately captures evolutionary information and that our results are robust under quantitative details of the transformation (Figure S1). Furthermore, the assays do not generate a spurious signal of expression divergence across species (Figure S1). For a given gene $a$, we estimate the variance across biological replicates, $\delta^a_i$, and the genetic mean, $G^a_i$, in each species, and the divergence $D^a_{ij} = (G^a_i - G^a_j)^2$ between any two species $i, j$. These divergence data inform our primary inference of adaptation. For each pair of species, we evaluate the aggregate divergence $\langle D_{ij} \rangle$ and the rescaled divergence $\langle \Omega_{ij} \rangle$ given by Equation 1, using a trait scale $D_0$ obtained by our model fit (described later). For each clade $C$ in our phylogeny, we obtain the aggregate data $(\tau_C, \Omega_C)$ shown in Figures 2, 4A, and 5B by averaging $\tau_{ij}$ and $\Omega_{ij}$ over all pairs of species $i, j \in C$ that are connected via the root of the clade. Unlike pairwise divergence between species, clade-specific divergence data allow unbiased error analysis and model ranking, because they are only weakly correlated through the structure of the phylogeny (Figure 1).
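The primary analysis steps can be sketched in a few functions, under the following assumptions. The sex index $s$ is dropped, so the code handles one sex at a time. Genes appear in the same order in every species. The sampling-error correction for within-species diversity is omitted. $D_0$ and the divergence times $\tau_{ij}$ are supplied as inputs. The data layout and function names are hypothetical.

```python
import statistics


def z_transform(levels):
    """Rescale one replicate to mean 0 and cross-gene variance 1."""
    mean = statistics.fmean(levels)
    sd = statistics.pstdev(levels)
    return [(x - mean) / sd for x in levels]


def genetic_means(expr):
    """expr[species] is a list of replicates, each a list of per-gene levels.

    Returns species -> list of genetic means G_i^a over Z-transformed replicates.
    """
    means = {}
    for species, replicates in expr.items():
        transformed = [z_transform(rep) for rep in replicates]
        means[species] = [statistics.fmean(col) for col in zip(*transformed)]
    return means


def rescaled_divergence(means, i, j, d0):
    """Omega_ij: gene-averaged D_ij^a = (G_i^a - G_j^a)^2, divided by D0."""
    d = [(gi - gj) ** 2 for gi, gj in zip(means[i], means[j])]
    return statistics.fmean(d) / d0


def clade_point(means, tau, left, right, d0):
    """(tau_C, Omega_C) averaged over species pairs connected via the clade root,
    i.e., one species from each daughter subtree (left, right).

    tau maps frozenset({i, j}) -> divergence time tau_ij.
    """
    pairs = [(i, j) for i in left for j in right]
    tau_c = statistics.fmean(tau[frozenset(pair)] for pair in pairs)
    omega_c = statistics.fmean(rescaled_divergence(means, i, j, d0) for i, j in pairs)
    return tau_c, omega_c
```

Pairing one species from each daughter subtree is what "connected via the root of the clade" means here. Pairs inside a subtree belong to smaller clades.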

Inference of Selection

The minimal fitness seascape for a given gene takes the form

$$f(E, t) = f^* - \frac{c}{2N E_0^2}\,\bigl(E - E^*(t)\bigr)^2,$$

where the optimal trait value $E^*(t)$ performs an Ornstein-Uhlenbeck process with mean square displacement $\upsilon E_0^2$ per unit $\mu^{-1}$ of evolutionary time (Box 1); $E_0^2$ is the average genetic variation of expression in the long-term limit of neutral evolution (Nourmohammad et al., 2013b).
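The seascape and its driving process can be written as two small functions. This is a minimal sketch. The section fixes only the short-time displacement $\upsilon E_0^2$ per unit $\mu^{-1}$, so the Ornstein-Uhlenbeck relaxation rate `relax`, the population size `N`, and the constant $f^*$ (here `f_max`) are hypothetical inputs. The exact OU update is a standard discretization, not necessarily the authors' scheme.

```python
import math
import random


def seascape_fitness(E, E_opt, c, N, E0, f_max=0.0):
    """Single-peak seascape f(E, t) = f* - c / (2 N E0^2) * (E - E*(t))^2.

    f_max plays the role of f*; in units of 1/(2N), a displacement of one E0
    from the optimum costs c.
    """
    return f_max - c / (2.0 * N * E0 ** 2) * (E - E_opt) ** 2


def ou_peak_step(E_opt, dt, upsilon, E0, relax, rng):
    """Advance the optimum E*(t) by dt with an exact Ornstein-Uhlenbeck update.

    relax is a hypothetical relaxation rate; the stationary variance is set to
    upsilon * E0^2 / (2 * relax), so that the short-time mean square
    displacement per unit time equals upsilon * E0^2.
    """
    stationary_var = upsilon * E0 ** 2 / (2.0 * relax)
    decay = math.exp(-relax * dt)
    sd = math.sqrt(stationary_var * (1.0 - decay ** 2))
    return E_opt * decay + sd * rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    c, upsilon = 18.4, 0.08   # global values inferred in the text
    N, E0 = 1000, 1.0         # hypothetical
    E_opt, E_pop = 0.0, 0.0   # population trait held fixed for illustration
    for _ in range(5):
        E_opt = ou_peak_step(E_opt, 0.1, upsilon, E0, 1.0, rng)
        scaled = 2 * N * seascape_fitness(E_pop, E_opt, c, N, E0)
        print(f"E* = {E_opt:+.3f}   2N f(E_pop) = {scaled:+.4f}")
```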

We use the time dependence of the clade-specific aggregate expression divergence, $(\tau_C, \langle D_C(\tau) \rangle)$, to infer the fitness parameters of stabilizing strength $c$ and driving rate $\upsilon$. We treat the trait scale $D_0$ as an additional fit parameter and assume that the saturation due to stabilizing selection occurs at the latest possible time (i.e., for the largest $\Omega_{\mathrm{stab}}$) consistent with the data, resulting in a best model with conservative estimates of stabilizing strength $c$. This assumption is necessary because the resolution of the data on short evolutionary timescales is bounded by the D. mel-D. sim divergence. Using this procedure, we infer a global fitness seascape with parameters $(c = 18.4, \upsilon = 0.08)$ and a resulting average fitness flux $2NF = 3.8$ per gene across the Drosophila genus ($\tau_{\mathrm{Dros.}} = 1.4$) (Figure 2). The fitness flux is independent of the trait scale $D_0$ and hence of the preceding assumption. Moreover, we use the trait scale $D_0$ and the neutral sequence diversity $\pi_0$ determined from synonymous polymorphisms (Begun et al., 2007) to estimate the expected aggregate trait diversity within a given species, $\langle \Delta \rangle \simeq \pi_0 D_0$ (Supplemental Experimental Procedures). This estimate of $\langle \Delta \rangle$ determines the sampling error of the observed expression divergence $D$ (Figure 2 shows error-corrected divergence data). Conversely, the trait scale $D_0$ can be inferred directly from data of $\langle \Delta \rangle$ (Supplemental Experimental Procedures), but such data are not available for all species in the present set.

Control fits of the same data to equilibrium models, including the well-known Ornstein-Uhlenbeck dynamics for the population mean trait (Hansen, 1997; Bedford and Hartl, 2009), are shown in Figure S3. To account for the expression noise due to the limited number of biological replicates in each species, we use a probabilistic extension of this test. We evaluate the Bayesian posterior probability distribution for the stabilizing strength and fitness flux in individual genes, $Q(c, F \mid \mathbf{E}^a)$, given their sample mean data $\mathbf{E}^a = (E^a_1, \ldots, E^a_7)$. This produces gene-specific expectation values $c_a$ and $F_a$ (Figures 3A, 4B, and 5C). This method uses gene-specific trait scales $D_0$, which account for differences in mutational variance between genes. We use a conservative condition on fitness flux, $2NF_a > 4$, to infer adaptively regulated genes (Table S1). The cumulative log-likelihood score $S(c, F) = \sum_a \log Q(c, F \mid \mathbf{E}^a)$ quantifies the statistical significance of our inference (Figure 3B).
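Once per-gene posteriors are available, the summary quantities above reduce to sums over a parameter grid. The sketch makes three assumptions. The posterior weights $Q(c, F \mid \mathbf{E}^a)$ have already been evaluated on a shared grid; their likelihood model is described in the Supplemental Experimental Procedures and is not reconstructed here. The flux axis is expressed directly as $2NF$. The grid layout and function names are hypothetical.

```python
import math


def normalize(grid):
    """Normalize non-negative weights Q[i][j] over grid points (c_i, 2NF_j) to sum 1."""
    total = sum(sum(row) for row in grid)
    return [[w / total for w in row] for row in grid]


def posterior_means(grid, c_vals, flux_vals):
    """Posterior expectation values (c_a, 2N F_a) for one gene."""
    q = normalize(grid)
    c_mean = sum(q[i][j] * c for i, c in enumerate(c_vals) for j in range(len(flux_vals)))
    f_mean = sum(q[i][j] * f for i in range(len(c_vals)) for j, f in enumerate(flux_vals))
    return c_mean, f_mean


def log_likelihood_score(grids):
    """S(c, F) = sum over genes of log Q(c, F | E^a); all weights must be > 0."""
    normed = [normalize(g) for g in grids]
    n_c, n_f = len(normed[0]), len(normed[0][0])
    return [[sum(math.log(q[i][j]) for q in normed) for j in range(n_f)]
            for i in range(n_c)]


def adaptive_genes(grids, c_vals, flux_vals, threshold=4.0):
    """Indices of genes whose posterior mean 2N F_a exceeds the threshold."""
    return [a for a, grid in enumerate(grids)
            if posterior_means(grid, c_vals, flux_vals)[1] > threshold]
```

Summing log posteriors across genes on a common grid gives the score surface $S(c, F)$. The per-gene expectation values feed the $2NF_a > 4$ classification.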

Analysis of Alternative Evolutionary Scenarios

To test for lineage-specific demographic effects, we compare the aggregate rescaled divergence data, $\Omega = \langle D(\tau) \rangle / D_0$, to theoretical functions $\Omega(\tau; \tau_i)$ computed for an alternative model with a change in effective population size on the phylogenetic branch of species $i$ (Figure S4). We also examine two alternative selection scenarios: relaxed stabilizing selection by partial loss of function ($c_a$ switches to a reduced value with rate $\gamma$) and punctuated fitness peak shifts ($E^*_a$ jumps by an amount on the order of $E_0$ with a rate on the order of $\upsilon\mu$) (Figure S5). The observed distributions of cross-species expression differences are consistent with the minimal seascape model but at variance with both alternative models (Figure S5).

Analysis of Specific Gene Classes

To infer sex-specific evolution, we define specificity traits as differences between male and female expression levels, $E^a_{mf,i} = E^a_{m,i} - E^a_{f,i}$, for each gene (Zhang et al., 2007). Genes with sex-specific adaptive evolution of expression are identified by a condition on the cumulative fitness flux for the specificity trait, $2NF^a_{mf} > 4.5$ (Table S2). Genes with male- and female-biased expression are identified using the results of Assis et al. (2012).

Simulation Tests

We simulate Fisher-Wright evolution to validate our probabilistic inference scheme and to establish its robustness under trait epistasis (Figure S7).

SUPPLEMENTAL INFORMATION

Supplemental Information includes Supplemental Experimental Procedures, seven figures, and two tables and can be found with this article online at http://dx.doi.org/10.1016/j.celrep.2017.07.033.

AUTHOR CONTRIBUTIONS

A.N., J.R., T.H., V.K., J.B., and M.L. designed the research and analyzed the data. A.N., T.H., and M.L. wrote the article.

ACKNOWLEDGMENTS

We acknowledge discussions with P. Andolfatto, N. Barton, A. Beyer, H. Fraser, M. Łuksza, L.B. Oliver, J. Plotkin, S. Schiffels, P. Shah, and D. Sturgill. This work has been supported by the James S. McDonnell Foundation (A.N.), U.S. National Science Foundation grant PHY1305525 (A.N.), and Deutsche Forschungsgemeinschaft grant SFB 680. We acknowledge the U.S. National Science Foundation grant PHY1125915 to the Kavli Institute for Theoretical Physics (UCSB), where part of this work was performed.

Received: September 6, 2016

Revised: April 15, 2017

Accepted: July 13, 2017

Published: August 8, 2017

REFERENCES

Andolfatto, P. (2005). Adaptive evolution of non-coding DNA in Drosophila. Nature 437, 1149–1152.

Artieri, C.G., and Fraser, H.B. (2014). Evolution at two levels of gene expression in yeast. Genome Res. 24, 411–421.

Assis, R., Zhou, Q., and Bachtrog, D. (2012). Sex-biased transcriptome evolution in Drosophila. Genome Biol. Evol. 4, 1189–1200.

Bedford, T., and Hartl, D.L. (2009). Optimization of gene expression by natural selection. Proc. Natl. Acad. Sci. USA 106, 1133–1138.

Begun, D.J., Holloway, A.K., Stevens, K., Hillier, L.W., Poh, Y.P., Hahn, M.W., Nista, P.M., Jones, C.D., Kern, A.D., Dewey, C.N., et al. (2007). Population genomics: whole-genome analysis of polymorphism and divergence in Drosophila simulans. PLoS Biol. 5, e310.

Blekhman, R., Oshlack, A., Chabot, A.E., Smyth, G.K., and Gilad, Y. (2008). Gene regulation in primates evolves under tissue-specific selection pressures. PLoS Genet. 4, e1000271.

Brawand, D., Soumillon, M., Necsulea, A., Julien, P., Csardi, G., Harrigan, P., Weier, M., Liechti, A., Aximu-Petri, A., Kircher, M., et al. (2011). The evolution of gene expression levels in mammalian organs. Nature 478, 343–348.

Bullard, J.H., Mostovoy, Y., Dudoit, S., and Brem, R.B. (2010). Polygenic and directional regulatory evolution across pathways in Saccharomyces. Proc. Natl. Acad. Sci. USA 107, 5058–5063.

Coolon, J.D., McManus, C.J., Stevenson, K.R., Graveley, B.R., and Wittkopp, P.J. (2014). Tempo and mode of regulatory evolution in Drosophila. Genome Res. 24, 797–808.

Drosophila 12 Genomes Consortium, Clark, A., Eisen, M., Smith, D., Bergman, C., Oliver, B., Markow, T., Kaufman, T., Kellis, M., Gelbart, W., Iyer, V., et al. (2007). Evolution of genes and genomes on the Drosophila phylogeny. Nature 450, 203–218.






























Fraser, H.B. (2011). Genome-wide approaches to the study of adaptive gene expression evolution: systematic studies of evolutionary adaptations involving gene expression will allow many fundamental questions in evolutionary biology to be addressed. BioEssays 33, 469–477.

Fraser, H.B. (2013). Gene expression drives local adaptation in humans. Genome Res. 23, 1089–1096.

Fraser, H.B., Moses, A.M., and Schadt, E.E. (2010). Evidence for widespread adaptive evolution of gene expression in budding yeast. Proc. Natl. Acad. Sci. USA 107, 2977–2982.

Fraser, H.B., Babak, T., Tsang, J., Zhou, Y., Zhang, B., Mehrabian, M., and Schadt, E.E. (2011). Systematic detection of polygenic cis-regulatory evolution. PLoS Genet. 7, e1002023.

Genissel, A., McIntyre, L.M., Wayne, M.L., and Nuzhdin, S.V. (2008). Cis and trans regulatory effects contribute to natural variation in transcriptome of Drosophila melanogaster. Mol. Biol. Evol. 25, 101–110.

Gilad, Y., Oshlack, A., Smyth, G.K., Speed, T.P., and White, K.P. (2006). Expression profiling in primates reveals a rapid evolution of human transcription factors. Nature 440, 242–245.

Glinka, S., Ometto, L., Mousset, S., Stephan, W., and De Lorenzo, D. (2003). Demography and natural selection have shaped genetic variation in Drosophila melanogaster: a multi-locus approach. Genetics 165, 1269–1278.

Haddrill, P.R., Thornton, K.R., Charlesworth, B., and Andolfatto, P. (2005). Multilocus patterns of nucleotide variability and the demographic and selection history of Drosophila melanogaster populations. Genome Res. 15, 790–799.

Hancock, A.M., Witonsky, D.B., Alkorta-Aranburu, G., Beall, C.M., Gebremedhin, A., Sukernik, R., Utermann, G., Pritchard, J.K., Coop, G., and Di Rienzo, A. (2011). Adaptations to climate-mediated selective pressures in humans. PLoS Genet. 7, e1001375.

Hansen, T.F. (1997). Stabilizing selection and the comparative analysis of adaptation. Evolution 51, 1341–1351.

Held, T., Nourmohammad, A., and Lässig, M. (2014). Adaptive evolution of molecular phenotypes. J. Stat. Mech. 9, P09029.

Hoekstra, H.E., and Coyne, J.A. (2007). The locus of evolution: evo devo and the genetics of adaptation. Evolution 61, 995–1016.

Ikemura, T. (1985). Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2, 13–34.

Khaitovich, P., Weiss, G., Lachmann, M., Hellmann, I., Enard, W., Muetzel, B., Wirkner, U., Ansorge, W., and Pääbo, S. (2004). A neutral model of transcriptome evolution. PLoS Biol. 2, E132.

Khaitovich, P., Hellmann, I., Enard, W., Nowick, K., Leinweber, M., Franz, H., Weiss, G., Lachmann, M., and Pääbo, S. (2005). Parallel patterns of evolution in the genomes and transcriptomes of humans and chimpanzees. Science 309, 1850–1854.

King, M.C., and Wilson, A.C. (1975). Evolution at two levels in humans and chimpanzees. Science 188, 107–116.

Lachaise, D., Cariou, M., David, J., Lemeunier, F., Tsacas, L., and Ashburner, M. (1988). Historical biogeography of the Drosophila melanogaster species subgroup. In Evolutionary Biology, Volume 22, M. Hecht, B. Wallace, and G. Prance, eds. (Springer), pp. 159–225.

Leinonen, T., McCairns, R.J., O’Hara, R.B., and Merilä, J. (2013). Q(ST)-F(ST) comparisons: evolutionary and ecological insights from genomic heterogeneity. Nat. Rev. Genet. 14, 179–190.

Lemos, B., Meiklejohn, C.D., Cáceres, M., and Hartl, D.L. (2005). Rates of divergence in gene expression profiles of primates, mice, and flies: stabilizing selection and variability among functional categories. Evolution 59, 126–137.

Lynch, M., and Hill, W.G. (1986). Phenotypic evolution by neutral mutation. Evolution 40, 915–935.

McDonald, J.H., and Kreitman, M. (1991). Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652–654.

Mustonen, V., and Lässig, M. (2007). Adaptations to fluctuating selection in Drosophila. Proc. Natl. Acad. Sci. USA 104, 2277–2282.

Mustonen, V., and Lässig, M. (2010). Fitness flux and ubiquity of adaptive evolution. Proc. Natl. Acad. Sci. USA 107, 4248–4253.

Nourmohammad, A., Held, T., and Lässig, M. (2013a). Universality and predictability in molecular quantitative genetics. Curr. Opin. Genet. Dev. 23, 684–693.

Nourmohammad, A., Schiffels, S., and Lässig, M. (2013b). Evolution of molecular phenotypes under stabilizing selection. J. Stat. Mech. 1, P01012.

Orr, H.A. (1998). Testing natural selection vs. genetic drift in phenotypic evolution using quantitative trait locus data. Genetics 149, 2099–2104.

Pai, A.A., Pritchard, J.K., and Gilad, Y. (2015). The genetic and mechanistic basis for variation in gene regulation. PLoS Genet. 11, e1004857.

Quackenbush, J. (2002). Microarray data normalization and transformation. Nat. Genet. 32 (Suppl), 496–501.

Riedel, N., Khatri, B.S., Lässig, M., and Berg, J. (2015). Multiple-line inference of selection on quantitative traits. Genetics 201, 305–322.

Rifkin, S.A., Kim, J., and White, K.P. (2003). Evolution of gene expression in the Drosophila melanogaster subgroup. Nat. Genet. 33, 138–144.

Rifkin, S.A., Houle, D., Kim, J., and White, K.P. (2005). A mutation accumulation assay reveals a broad capacity for rapid evolution of gene expression. Nature 438, 220–223.

Romero, I.G., Ruvinsky, I., and Gilad, Y. (2012). Comparative studies of gene expression and the evolution of gene regulation. Nat. Rev. Genet. 13, 505–516.

Sella, G., Petrov, D.A., Przeworski, M., and Andolfatto, P. (2009). Pervasive natural selection in the Drosophila genome? PLoS Genet. 5, e1000495.

Shields, D.C., Sharp, P.M., Higgins, D.G., and Wright, F. (1988). “Silent” sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5, 704–716.

Stephan, W., and Li, H. (2007). The recent demographic and adaptive history of Drosophila melanogaster. Heredity (Edinb) 98, 65–68.

Sunyaev, S.R., and Roth, F.P. (2013). Systems biology and the analysis of genetic variation. Curr. Opin. Genet. Dev. 23, 599–601.

Thornton, K.R., Jensen, J.D., Becquet, C., and Andolfatto, P. (2007). Progress and prospects in mapping recent selection in the genome. Heredity (Edinb) 98, 340–348.

Whitehead, A., and Crawford, D.L. (2006). Neutral and adaptive variation in gene expression. Proc. Natl. Acad. Sci. USA 103, 5425–5430.

Wittkopp, P.J., Haerum, B.K., and Clark, A.G. (2008). Regulatory changes underlying expression differences within and between Drosophila species. Nat. Genet. 40, 346–350.

Wright, F. (1990). The effective number of codons used in a gene. Gene 87, 23–29.

Zhang, Y., Sturgill, D., Parisi, M., Kumar, S., and Oliver, B. (2007). Constraint and turnover in sex-biased gene expression in the genus Drosophila. Nature 450, 233–237.
Cell Reports, Volume 20

Supplemental Information

Adaptive Evolution of Gene Expression in Drosophila

Armita Nourmohammad, Joachim Rambeau, Torsten Held, Viera Kovacova, Johannes Berg, and Michael Lässig

[Figure S1 image. Panels (A)–(C), for raw, Z-transformed, and quantile-normalized data: mean expression level per replicate for D. sim*, D. sim, D. mel, D. yak, D. ana, D. pse, D. vir, and D. moj, and rescaled divergence $\Omega(\tau)$ versus divergence time $\tau$. Panel (D): mean expression level per species for gene classes. Panel (E): cumulative distribution function of the clade-specific expression divergence $D_C$ for mel-sim ($\tau \approx 0.11\,\mu^{-1}$), mel-yak ($\tau \approx 0.27\,\mu^{-1}$), moj-vir ($\tau \approx 0.71\,\mu^{-1}$), mel-ana ($\tau \approx 1.12\,\mu^{-1}$), mel-pseud ($\tau \approx 1.17\,\mu^{-1}$), and the Drosophila genus ($\tau \approx 1.38\,\mu^{-1}$).]

Figure S1: Transformation of expression data and testing for technical expression divergence. Related to Figures 2, 3, 4, 5. The following statistics are compared between (A) raw intensities, (B) Z-transformed intensities, and (C) quantile-normalized intensities. Top panels: Average expression intensities across all genes are shown for all biological replicates of female (circle) and male (triangle) organisms (error bars indicate standard deviation). Center panels: Clustering of expression intensities for all genes (horizontal axis) and all replicates (vertical axis, denoted by “species sex ID”) by Euclidean distance (see section 1 of SI). For raw intensities, the replicates of each species cluster together, but cross-species differences do not reflect evolutionary distances, as shown by the scrambled phylogenies on the right-hand side. Z-transformed and quantile-normalized intensities recover the species clades of the sequence-based Drosophila phylogeny; cf. Fig. 1. Tree branches are colored by species, as in the top panels. Bottom panels: The aggregate (rescaled) divergence for clades, $\Omega_C$ (filled squares), and for individual pairs of species, $\Omega_{ij}$ (empty squares), is plotted against divergence time, $\tau$ (as in Fig. 2). The expression divergence is rescaled by a common denominator $D_0$, consistent with Fig. 2 in the main text. The dependence of expression divergence on $\tau$ is masked for raw intensities but consistent for Z-transformed and quantile-normalized intensities. In addition, clade-based statistics substantially reduce the noise of the expression divergence data. We conclude that a Z or quantile transformation of the data is essential to capture evolutionary information, but our results are robust under variants of the transformation. See section 1 of SI. (D) The species-specific aggregate mean gene expression level $\langle E_i \rangle$ is plotted for different classes of genes. Top panel: genes with varying levels of sequence divergence across 7 Drosophila species: 10% highest divergence (dark blue triangles), 20% medium divergence (medium blue squares), and 10% lowest divergence (light blue triangles). Bottom panel: highly expressed genes (green triangles) and male-biased genes (orange triangles). The error bars show the standard deviation of the mean in each class. In all classes, we find no significant species dependence of the class averages $\langle E_i \rangle$. (E) The cumulative distributions of clade-specific expression divergence (unscaled), $D_C$, estimated for the Drosophila clades (Fig. 2), are indistinguishable between gene classes with varying levels of sequence divergence; the color code is as in (D, top panel). We conclude that the assay is free of technical divergence; see section 1 of SI.

[Figure S2 image. (A) Amino acid divergence $\tilde\tau$ versus synonymous divergence $\tau$. (B) Biological replicate variance ($\delta$), diversity ($\Delta_{\rm sim}$), dimorphism ($\Delta_{mf}$), divergence ($D$), and cross-gene variance ($V$) for D. mel, D. sim, D. yak, D. ana, D. pse, D. vir, and D. moj.]

Figure S2: Sequence and gene expression variation. Related to Figures 2, 3, 4. (A) Pairwise amino acid sequence divergence vs. divergence time from synonymous sequence (circles), together with the clade-specific divergence times (squares, colored by clade as in Fig. 1). We conclude that evolutionary trees based on amino acid distances are less suitable for our analysis (cf. the discussion of the control analysis with equilibrium models in section 2 of SI). (B) Gene-averaged expression variance across biological replicates $\langle \delta \rangle$ (equation 4), expression diversity $\langle \Delta \rangle$ (equation 6), male-female expression dimorphism $\langle \Delta_{mf} \rangle$ (equation 7), clade divergence $\langle D \rangle$ (equation 8; used in Figs. 2 and 4), and cross-gene variance of expression $V = \langle \Gamma_i^2 \rangle \approx 1$ (×). We find a clear ranking $\langle \delta_i \rangle < \langle \Delta_{\rm sim} \rangle \lesssim \langle \Delta_{mf} \rangle < \langle D_{ij} \rangle < V_i$. The color code for single-species data is shown in the legend; colors for clades are as in Fig. 1.

[Figure S3 image. Expression divergence $D$ versus (A) divergence time $\tau$ and (B) rescaled amino acid distance $\tilde\tau$.]

Figure S3: Fitness landscape models as control. Related to Figure 2. (A) Clade-specific gene expression divergence, $D_C$ (unscaled, filled squares), together with pairwise expression divergence, $D_{ij}$ (empty squares), is plotted against the divergence time estimated from four-fold synonymous sites (Drosophila 12 Genomes Consortium et al., 2007) (Fig. 1). The seascape model with the trait scale $D_0$ as a fit parameter (green solid line; stabilizing strength $c^* = 18.4$, driving rate $\upsilon^* = 0.08$; as in Fig. 2) explains these data; this model is discussed in the main text. An alternative seascape model with the trait scale inferred from the D. simulans diversity data (dashed green line; $c = 18.6$, $\upsilon = 0.07$) is very similar, which serves as a consistency check. The landscape models with the trait scale as a fit parameter (solid blue line; $c_{\rm eq} < 1$) and with the trait scale inferred from the diversity data (dashed blue line; $c_{\rm eq} = 8$) provide a significantly poorer fit; see section 2 of SI for the likelihood comparison of these models. In particular, neither of the equilibrium models can explain the evolution of expression in the youngest clades: the model with diversity from data overestimates the divergences $D_{\rm mel-yak}$ and $D_{\rm mel-sim}$, and the model with inferred diversity overestimates the relative divergence $D_{\rm mel-yak}/D_{\rm mel-sim}$. (B) The same clade-specific gene expression divergence $D_C$ (filled squares) and pairwise expression divergence $D_{ij}$ (empty squares) are plotted against the amino acid sequence distance of Fig. S2A (Bedford and Hartl, 2009), uniformly rescaled to give the same scaled genus divergence time $\tau_{\rm Dros.} = 1.4$ as in (A). We find the same ranking of models, but all fits become poorer due to the nonlinearities of the amino acid divergence times (cf. Fig. S2A). See section 2 of SI for a detailed comparison with the results of ref. (Bedford and Hartl, 2009).

[Figure S4 image. Panels (A)–(C): rescaled divergence $\Omega(\tau)$ versus divergence time $\tau$.]

Figure S4: Test of lineage-specific demography. Related to Figures 2, 4, 5. We compare the polarized (rescaled) divergence $\Omega_{C,i}$ with species $i$ as outgroup (equation 36) to background data from partial clades excluding species $i$; both quantities are plotted against the clade divergence time. (A) Left panel: Data for clades with outgroup D. melanogaster. Center and right panels: Evolution with an enhanced or reduced effective population size $N_i$ in the outgroup lineage. Analytical curves and simulation results are shown for $N_i = 3N$ (dashed lines) and $N_i = N/2$ (dashed-dotted lines) in a fitness landscape (stabilizing strength $c = 20$, driving rate $\upsilon = 0$; center panel) and in a seascape ($c = 20$, $\upsilon = 0.09$; right panel). (B) Same as (A), with outgroup D. mojavensis. (C) Data for each of the other five species chosen as outgroup. These data give no evidence of long-term lineage-specific demography. The analytical and simulation results show that lineage-specific demography under stabilizing selection does not confound the signal of adaptive evolution in the time-dependent divergence $\Omega$ shown in Fig. 2. Lineage-specific demography is introduced in section 3; simulation details are given in section 5 of SI.

[Figure S5 image. (A) Counts versus expression difference. (B)–(D) Rescaled divergence $\Omega(\tau)$ versus divergence time $\tau$ (top) and fraction versus expression difference (bottom).]

Figure S5: Test of alternative selection scenarios. Related to Figures 2, 4, 5. (A) Distributions of clade-specific expression level differences, $P_C(\Delta E)$ (equation 39; color code as in Fig. 1), standard-normalized to mean 0 and variance 1. These distributions are approximately Gaussian (black line: standard normal distribution). (B) Minimal seascape model. Top panel: Time-dependent (rescaled) divergence $\Omega(\tau)$ (bullets: simulation results; line: analytical curve as in Fig. 2). Bottom panel: Standard-normalized distributions of trait differences, $P_\tau(\Delta E)$, from simulations for $\tau = 0.21$, 0.69, and 1.37 (green, orange, and blue bullets) are of Gaussian form (dotted line). The same quantities are shown for two alternative fitness models: (C) Loss-of-function model. Functional genes evolve in a static fitness landscape of stabilizing strength $c = 4.5$; individual genes lose function with rate $\gamma = 0.04\mu$, resulting in reduced selection ($c \to 0.01\,c$). The loss events generate a nonlinearity in $\Omega(\tau)$ and a broad tail in $P_\tau(\Delta E)$ that are not observed in the data. (D) Punctuated fitness seascape. Individual genes jump to a new, uncorrelated fitness peak with rate $0.16\mu$. These dynamics also generate a broad tail in $P_\tau(\Delta E)$. Together, the Drosophila data of $\Omega_C$ (Fig. 2) and of $P_C(\Delta E)$ favor the minimal seascape model over both alternatives. The loss-of-function model and the punctuated seascape model are introduced in section 3; simulation details are given in section 5 of SI.

[Figure S6 image. Adaptive substitutions $\alpha_{\rm seq}$ versus (A) fitness flux $2N\Phi$ and (B) average sex specificity $E_{mf}$.]

Figure S6: Adaptive gene expression versus adaptive evolution of protein sequence. Related to Figures 2, 3, 5. (A) The distribution of $\alpha_{\rm seq} = D_n P_s/(D_s P_n) - 1$, denoting the fraction of adaptive amino acid substitutions (Smith and Eyre-Walker, 2002), is plotted against the cumulative fitness flux of gene expression, $2N\Phi$, reported in Table S1 and shown in Fig. 3A (circle: average; line: median; box: 50% around median; bars: 70% around median). We find no correlation between these statistics, which suggests that adaptive evolution of gene expression is an independent mode of evolution. (B) The distribution of $\alpha_{\rm seq}$ plotted against the average sex specificity $E_{mf}$ shows increased adaptive protein evolution in genes with sex-biased expression, which is strongest in male-biased genes (cf. the results of ref. (Zhang et al., 2007)). For the definition of sex-biased expression, see section 4 of SI.

[Figure S7 image. (A) Inferred fitness flux $2N\Phi$ versus input fitness flux $2N\Phi_{\rm in}$. (B) Inferred stabilizing strength $c$ versus input stabilizing strength $c_{\rm in}$. (C) Fitness flux $2N\Phi$ versus epistasis strength $\varepsilon^2$.]

Figure S7: Simulation tests of the inference scheme. Related to Figures 2, 3, 4, 5. (A, B) Distributions of the cumulative fitness flux $2N\Phi_\alpha$ and stabilizing strength $c_\alpha$ inferred from simulated expression data are plotted against the simulation input parameters $2N\Phi_{\rm in}$ and $c_{\rm in}$ (red line: median; box: 50% around the median; bar: 75% around the median). This simulation analysis indicates that the inferred gene-specific maximum-likelihood values $(\Phi_\alpha, c_\alpha)$ reported in Table S1 and shown in Fig. 3B are, on average, conservative estimates of the underlying evolutionary parameters $(c_{\rm in}, \Phi_{\rm in})$. See section 5 of SI for simulation details. (C) Selection inference for epistatic traits. Simulation results of the actual fitness flux are compared to flux values inferred by the standard test based on the time-dependent (rescaled) divergence $\Omega(\tau)$ (see section 2 of SI). Both quantities are plotted against the strength of epistasis, $\varepsilon^2$, defined as the ratio of epistatic and additive trait variance (section 5 of SI); horizontal lines show the actual fitness flux without epistasis ($\varepsilon^2 = 0$). Simulations are shown for selection parameters ($c = 4.5$, $\upsilon = 0.4$) (green) and ($c = 4.5$, $\upsilon = 0$) (blue). We conclude that our inference of adaptive evolution based on the aggregate rescaled divergence $\Omega(\tau)$ (Fig. 2) is not confounded by trait epistasis. See section 5 of SI for simulation details.


Supplemental Procedures

1. Data and primary analysis

Sequence data and phylogenetic tree. Our inference procedure requires the following global sequence-based information (which does not include expression QTL):

(a) A phylogenetic tree of the 7 Drosophila species included in this study. Here we use the tree of the Drosophila 12 Genomes Consortium (Drosophila 12 Genomes Consortium et al., 2007), which is based on genome-wide divergence at synonymous sequence sites. This tree determines six clades of phylogenetically related species (Fig. 1), which are used in our analysis of time-dependent expression divergence (Figs. 2, 4A, and 5B).

(b) Divergence times between all pairs of species, scaled in units of the inverse neutral point mutation rate. The tree of Fig. 1 uses a lineage-specific mutation rate to infer the lengths of its 12 branches. The scaled divergence time $\tau_{ij}$ for a given species pair $(i, j)$ is the sum of the lengths of the branches connecting these species. The scaled divergence time of a clade $C$ is defined as an average over species pairs,

$$\tau_C = \frac{1}{|C_1|\,|C \setminus C_1|} \sum_{i \in C_1} \sum_{j \in C \setminus C_1} \tau_{ij}, \tag{1}$$

where $C$ is the set of species in the clade and $(C_1, C_2)$ is the partitioning of this set defined by the root node.

An accurate inference of divergence times is an important prerequisite for our evolutionary analysis of gene expression. The times $\tau_{ij}$ have been inferred in ref. (Drosophila 12 Genomes Consortium et al., 2007) from synonymous sequence divergence, accounting for saturation effects due to multiple mutations. We can compare these times with the analogous times $\tilde\tau_{ij}$ inferred from amino acid sequence divergence, which have been used in a previous study (Bedford and Hartl, 2009). Fig. S2A shows a scatter plot $(\tau_{ij}, \tilde\tau_{ij})$ for all species pairs, together with the clade divergence times $(\tau_C, \tilde\tau_C)$ as defined in eq. (1). Compared to the molecular clock of neutral evolution, the amino acid times $\tilde\tau_{ij}$ suffer from significant inhomogeneities within the Drosophila genus. We conclude that the $\tilde\tau_{ij}$ values provide a nonlinear measure of divergence times, which is less suitable for evolutionary analysis than the times $\tau_{ij}$ inferred from synonymous sequence.
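Eq. (1) averages the pairwise times over all species pairs that straddle the clade's root node; the same average reappears in eq. (10) for the rescaled divergence. A minimal Python sketch, assuming the pairwise times are stored in a dict keyed by unordered species pairs; the helper `clade_time` and the toy times are hypothetical and are not the Drosophila values:

```python
from itertools import product


def clade_time(tau, part1, part2):
    """Eq. (1): average scaled divergence time over species pairs (i, j)
    with i in C1 and j in the complement of C1 within the clade.

    tau   -- dict mapping frozenset({i, j}) to the scaled time tau_ij
    part1 -- species on one side of the clade's root node (C1)
    part2 -- species on the other side of the root node
    """
    total = sum(tau[frozenset(pair)] for pair in product(part1, part2))
    return total / (len(part1) * len(part2))


# Hypothetical toy times in units of 1/mu (not the Drosophila tree).
tau = {
    frozenset({"a", "b"}): 0.2,
    frozenset({"a", "c"}): 0.6,
    frozenset({"b", "c"}): 0.8,
}

# Clade ((a, b), c): the root node separates {a, b} from {c}.
print(clade_time(tau, ["a", "b"], ["c"]))  # (0.6 + 0.8) / 2, about 0.7
```

Keying by `frozenset` makes the lookup independent of the order in which a pair is written, which matches the symmetry $\tau_{ij} = \tau_{ji}$.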

Expression data. We use genome-wide expression data from 7 Drosophila species obtained by ref. (Zhang et al., 2007) (GEO: GSE6640). These data are well suited for our analysis: they cover several clades of species that are well comparable at the organismic level and sufficiently diverged for adaptive evolution of expression to be detectable (section 2). Moreover, Drosophila has a larger effective population size, higher mutation rates, and shorter generation times than typical mammalian species (Gilad et al., 2006a), and adaptive evolution has been detected at the genomic level by several methods (Andolfatto, 2005; Mustonen and Lässig, 2007; Sella et al., 2009). Hence, compared to more recent data from other species (Brawand et al., 2011; Perry et al., 2012; Tsankov et al., 2010), the Drosophila expression data of Zhang et al. (Zhang et al., 2007) are a suitable target for the inference of adaptive evolution. These data contain mRNA intensity measurements for a number of biological replicates (4–7) from adult (5–7 days post eclosion) males and females in each species. Specific microarray platforms were designed for each of these species, allowing for a reliable comparison of expression levels across species. Each platform has an array of probes mapped to assembled genome sequences and to GLEANR gene annotations by the Drosophila 12 Genomes Consortium (Drosophila 12 Genomes Consortium et al., 2007), which also provides sequence homology tables. For each species, at least four hybridizations were performed, including technical (dye-flipped) replicates for each of the biological replicates. We restrict the analysis to the 6332 genes that have unambiguous one-to-one orthologs across all lines and are tested by at least four probes in each microarray platform. We obtain a set of expression levels $E^\alpha_{i,s,\kappa}$ (defined as log2 intensities), labeled by gene number $\alpha \in \{1, \ldots, g = 6332\}$, species $i \in \{\text{mel, sim, yak, ana, pse, vir, moj}\}$ (Fig. 1), sex $s \in \{m, f\}$, and biological replicate $\kappa \in \{1, \ldots, k_{i,s}\}$ with $k_{i,s} = 4$–$7$; biological replicates contain similar amounts of genetic material. The data contain two strains of D. simulans from the Tucson Drosophila Stock Center (D. sim: 14021-0251.011 and D. sim: 14021-0251.198), which are used to estimate the genetic variance of expression (see below).

Transformation of expression levels. A raw probe microarray signal is largely influenced by non-biological factors that affect all probes on a chip, such as varying total RNA abundance and labeling and hybridization efficiency. The data provided by Zhang et al. (Zhang et al., 2007) are log2-transformed intensities after a primary batch correction, obtained using the variance-stabilizing transformation method (Huber et al., 2002). As in a previous evolutionary analysis of the same dataset (Bedford and Hartl, 2009), we perform a standard Z-transform normalization for each replicate by a linear transformation of the intensities (Quackenbush, 2002),

$$E^\alpha_{i,s,\kappa} \;\to\; \frac{E^\alpha_{i,s,\kappa} - \langle E_{i,s,\kappa} \rangle}{\sqrt{V_{i,s,\kappa}}}, \tag{2}$$

where $\langle E_{i,s,\kappa} \rangle$ and $V_{i,s,\kappa}$ denote the mean and variance of expression across all genes in a given replicate $(i, s, \kappa)$. The transformed levels $E^\alpha_{i,s,\kappa}$ are shifted to mean 0 and normalized to variance 1 across all genes in each biological replicate.
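Eq. (2) acts separately on each replicate $(i, s, \kappa)$, using the mean and variance over all genes in that replicate. A minimal sketch, assuming one replicate is given as a list of log2 levels (one entry per gene) and using the population variance across genes, which for $g = 6332$ genes differs negligibly from the unbiased one; `z_transform` is a hypothetical name:

```python
import statistics


def z_transform(replicate):
    """Eq. (2): shift one replicate's log2 expression levels (one value per
    gene) to mean 0 and scale them to variance 1 across genes."""
    mean = statistics.fmean(replicate)
    # Population variance across genes (an assumption; see the lead-in).
    sd = statistics.pvariance(replicate, mu=mean) ** 0.5
    return [(e - mean) / sd for e in replicate]


# Hypothetical log2 intensities for four genes in a single replicate.
levels = [8.0, 9.0, 10.0, 11.0]
print(z_transform(levels))
```

Because the transformation is linear and monotone within a replicate, it preserves the ranking of genes while removing replicate-wide shifts and scale differences.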

Evolutionary implications of the transformation. From an evolutionary point of view, any transformation is a heuristic to make quantitative trait data more comparable between species. Specifically, the transformation should minimize the ratio of non-evolutionary noise to evolutionary signal. The validity of a specific transformation scheme has to be judged by the consistency of the results. Here we show that the Z transformation (2) produces a consistent evolutionary signal and that its results are robust under quantitative details of the transformation; we also verify that the species-specific assay of ref. (Zhang et al., 2007) does not generate a spurious signal of divergence across species.

(a) The Z transformation captures evolutionary information. As shown in Fig. S1A (top panel), the average intensities of probes across all genes, $\langle E_{i,s,\kappa} \rangle$, are comparable between biological replicates of a given strain but differ substantially among species and even between the two strains of D. simulans. Clustering of the expression intensities based on the Euclidean distance between homologues across biological replicates shows that evolutionary information is masked in the raw probe intensities: these intensities cluster together for replicates of the same species, but cross-species differences between intensities of homologues lead to a scrambled phylogeny (Fig. S1A, center panel). This masking can also be seen in a plot of the aggregate time-dependent rescaled divergence $\Omega$ defined in equation (9) (Fig. S1A, bottom panel). In contrast, for Z-transformed data, the clustering of gene expression levels produces a phylogeny that recovers the species clades of the sequence-based Drosophila phylogeny, and the aggregate rescaled expression divergence $\Omega$ shows a consistent dependence on divergence time (Fig. S1B, bottom panel; cf. Fig. 2). Therefore, the Z transformation is essential to capture the evolutionary information in the data (Quackenbush, 2002).

(b) Results are robust under variants of the transformation. To test the sensitivity of our results to the specific choice of transformation, we performed the commonly used quantile normalization on the raw expression intensities (Bolstad et al., 2003), implemented in the R package “preprocessCore” as the function “normalize.quantiles”. Quantile normalization forces the observed distributions of raw intensities to be the same across all replicates, equal to the distribution obtained by averaging each quantile across samples. For quantile-normalized expression intensities, clustering again recovers the clades of the sequence-based Drosophila phylogeny, and we obtain an aggregate time-dependent divergence $\Omega(\tau)$ very similar to that of the Z-transformed data (Fig. S1C). In this paper, we use the Z transformation as the normalization method because it is a more conservative choice for preprocessing the data and does not homogenize the expression distributions across species.

(c) Absence of “technical divergence”. The expression levels were measured by species-specific microarray platforms (Zhang et al., 2007) designed to eliminate confounding effects of sequence divergence on hybridization and hence to make expression levels suitable for cross-species comparison. We can test this property in the Z-transformed data. If the assay has a technical bias, we would expect its effects to be more pronounced in genes with higher sequence divergence. Specifically, assume an imperfect assay with a hybridization bias towards a given species $i^*$ and stochastic degradation of sensitivity in other species. In a minimal model, the technical effect $\Delta E$ of an amino acid mutation is a stochastic variable with mean $A$ and variance $B$. The resulting observed expression level follows a biased random walk that depends on the sequence divergence (mismatch density) $d^\alpha_{ii^*}$,

$$E^\alpha_i = E^\alpha_{i^*} - A\, d^\alpha_{ii^*} + \chi^\alpha_i, \quad \text{with} \quad \langle \chi^\alpha_i \rangle = 0, \quad \langle \chi^\alpha_i \chi^\alpha_i \rangle = B\, d^\alpha_{ii^*}, \tag{3}$$

and positive constants $A, B$. A biased assay generates a “technical” aggregate expression divergence $\langle D_{ii^*} \rangle = A^2 \langle d_{ii^*}^2 \rangle + B \langle d_{ii^*} \rangle$ that would confound our evolutionary analysis. In addition, it leads to species-dependent aggregate mean expression levels $\langle E_i \rangle = \langle E_{i^*} \rangle - A \langle d_{ii^*} \rangle$. The Z transformation eliminates the aggregate bias over all genes, but species-dependent averages $\langle E_i \rangle$ would still be observable in classes of genes with high and low sequence divergence. In Fig. S1D (top), we plot $\langle E_i \rangle$ in the classes of genes with the 10% highest, 20% medium, and 10% lowest level of sequence divergence (measured by the total branch length of gene-specific phylogenies inferred from the divergence of synonymous sites across orthologues (Drosophila 12 Genomes Consortium et al., 2007)). Fig. S1D (bottom) shows the same average for two classes of genes studied in this paper, highly expressed genes and male-biased genes (as defined below). In all classes, we find no significant species dependence of the class averages $\langle E_i \rangle$. In addition, Fig. S1E shows the distributions of clade-specific expression divergence $D_C$ (unscaled) for gene classes with varying levels of sequence divergence (as in Fig. S1D). We find no significant difference in the expression divergence distributions across these gene classes, indicating that our inference of adaptation from gene expression divergence in Fig. 2 is not driven by technical divergence in the assay. Overall, we conclude that the assay is free of technical divergence.
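Two steps in (b) and (c) lend themselves to short checks. The sketch below implements the rank-averaging idea behind quantile normalization (ties are broken by position, a simplification; the R function `normalize.quantiles` in preprocessCore may treat ties differently, so this is not a re-implementation of it) and verifies by Monte Carlo that eq. (3) implies $\langle D_{ii^*} \rangle = A^2 \langle d_{ii^*}^2 \rangle + B \langle d_{ii^*} \rangle$: each gene's squared difference has expectation equal to squared bias plus variance. The Gaussian form of $\chi^\alpha_i$, the helper names, and all numbers are assumptions for illustration:

```python
import random


def quantile_normalize(columns):
    """Quantile normalization as described in (b): every replicate receives the
    same distribution, namely the rank-wise mean across replicates.
    Ties are broken by position (a simplification)."""
    n_genes = len(columns[0])
    # Gene indices of each replicate, ordered from lowest to highest value.
    orders = [sorted(range(n_genes), key=col.__getitem__) for col in columns]
    rank_means = [
        sum(col[order[r]] for col, order in zip(columns, orders)) / len(columns)
        for r in range(n_genes)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


def simulated_technical_divergence(A, B, mismatch, seed=1):
    """Gene average of (E_i - E_i*)^2 under eq. (3), drawing
    E_i - E_i* = -A d + chi with chi ~ Gaussian(0, B d) (Gaussian is assumed)."""
    rng = random.Random(seed)
    total = 0.0
    for d in mismatch:
        chi = rng.gauss(0.0, (B * d) ** 0.5)
        total += (-A * d + chi) ** 2
    return total / len(mismatch)


def predicted_technical_divergence(A, B, mismatch):
    """Closed form <D_ii*> = A^2 <d^2> + B <d> (squared bias plus variance)."""
    n = len(mismatch)
    return A ** 2 * sum(d * d for d in mismatch) / n + B * sum(mismatch) / n


# Hypothetical raw intensities: two replicates, three genes.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]

# Hypothetical mismatch densities and bias parameters A = 2.0, B = 0.5.
mismatch = [0.05, 0.1, 0.2] * 50_000
print(simulated_technical_divergence(2.0, 0.5, mismatch))
print(predicted_technical_divergence(2.0, 0.5, mismatch))  # about 0.128
```

The simulated and predicted values agree to within Monte Carlo error, which illustrates why a biased assay would produce a divergence signal growing with sequence divergence, the signature tested in Fig. S1D and S1E.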

Expression statistics within and across species. Using the normalized expression levels, we can define averages and natural variation of expression at three different levels:

(a) The mean and (unbiased) variance of expression across biological replicates characterize the distribution of expression levels for a given genotype. Here we estimate these quantities from the replicate data,

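$$E^\alpha_{i,s} = \frac{1}{k_i} \sum_\kappa E^\alpha_{i,s,\kappa}, \qquad \delta^\alpha_{i,s} = \frac{1}{k_i - 1} \sum_\kappa \left(E^\alpha_{i,s,\kappa} - E^\alpha_{i,s}\right)^2, \tag{4}$$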

and we define the sample mean and variance,

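$$E^\alpha_i = \frac{1}{2}\left(E^\alpha_{i,m} + E^\alpha_{i,f}\right), \qquad \delta^\alpha_i = \frac{1}{4}\left(\delta^\alpha_{i,m} + \delta^\alpha_{i,f}\right). \tag{5}$$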

(b) The genetic mean and diversity of expression characterize the distribution of heritable expression differences in a population. Heritable components of quantitative traits are often inferred from “common garden” breeding experiments under standardized environmental conditions. The genetic mean and diversity for a given gene are defined in terms of the data within one species,

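$$\Gamma^\alpha_i = E^\alpha_i \pm \mathrm{SE}_\Gamma, \qquad \Delta^\alpha_i = \mathrm{Var}\, E^\alpha_i - \frac{1}{k_i} \delta^\alpha_i, \tag{6}$$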

where $\mathrm{SE}_\Gamma$ is the standard error for estimating the population mean expression from $n_i$ genetically independent samples in species $i$, each of which is an average over $k_i$ independent biological replicates, $\mathrm{SE}_\Gamma = \left((\Delta^\alpha_i + \delta^\alpha_i / k_i)/n_i\right)^{1/2}$. The unbiased estimate of the variance among genetically distinct samples is denoted by $\mathrm{Var}\, E^\alpha_i$; the standard error for estimating the expected expression value of each genetic sample from its $k_i$ biological replicates propagates into the expression diversity within each species, $\Delta^\alpha_i$, as given by equation (6). The data of ref. (Zhang et al., 2007) limit the direct information on diversity to a broad estimate from two D. simulans strains, $\Delta^\alpha_{\rm sim} = \frac{1}{2}\left(E^\alpha_{\rm sim1} - E^\alpha_{\rm sim2}\right)^2 - \delta^\alpha_{\rm sim}/k_{\rm sim}$.

Therefore, we infer the aggregate diversity $\langle \Delta \rangle$ self-consistently from the model parameters, using the pattern of gene expression divergence (Fig. 2) and the sequence heterozygosity; see equation (20) below. We use the estimated diversity to determine the sampling error of the observed expression divergence $D$. As a consistency check, the model estimate of the expression diversity in D. simulans is very similar to the observed value $\langle \Delta_{\rm sim} \rangle$ (section 2).

Similarly, we define the expression dimorphism between males and females in each species,

$$\Delta^\alpha_{i,mf} = \frac{1}{2}\left(E^\alpha_{i,m} - E^\alpha_{i,f}\right)^2 - \frac{1}{k_i} \delta^\alpha_i. \tag{7}$$

(c) The expression divergence is defined as the squared difference between population means, $D^\alpha_{ij} = \left(\Gamma^\alpha_i - \Gamma^\alpha_j\right)^2$, and characterizes evolutionary expression differences between two species. Here we estimate the divergence for a given gene from the cross-species data, accounting for propagation of error in evaluating the species-average gene expression level,

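$$D^\alpha_{ij} = \left(E^\alpha_i - E^\alpha_j\right)^2 - 2\langle \Delta \rangle - \frac{1}{k_i} \delta^\alpha_i - \frac{1}{k_j} \delta^\alpha_j. \tag{8}$$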

Here we have replaced the species expression diversity with the model fit parameter of aggregate diversity, $\langle \Delta \rangle$, and have set the number of genetically independent samples to $n_i = 1$.
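Eqs. (4)–(8) are simple per-gene estimators, collected below as Python helpers; the function names and inputs are hypothetical. The correction terms follow one reasoning step: with one genetic sample per species ($n_i = 1$), an observed species mean deviates from the population mean by genetic sampling noise of variance $\langle \Delta \rangle$ plus replicate noise of variance $\delta^\alpha_i / k_i$, so for independent noise in the two species the expected $(E^\alpha_i - E^\alpha_j)^2$ exceeds $D^\alpha_{ij}$ by $2\langle \Delta \rangle + \delta^\alpha_i/k_i + \delta^\alpha_j/k_j$, which eq. (8) subtracts. Likewise, for two strains, $\frac{1}{2}(E^\alpha_{\rm sim1} - E^\alpha_{\rm sim2})^2$ is the unbiased two-sample variance:

```python
import statistics


def replicate_stats(levels):
    """Eq. (4): mean and unbiased (k - 1) variance over the k biological
    replicates of one gene in one species and sex."""
    return statistics.fmean(levels), statistics.variance(levels)


def sample_stats(male_levels, female_levels):
    """Eq. (5): sex average E_i and delta_i = (delta_m + delta_f) / 4."""
    e_m, d_m = replicate_stats(male_levels)
    e_f, d_f = replicate_stats(female_levels)
    return 0.5 * (e_m + e_f), 0.25 * (d_m + d_f)


def sim_diversity(e_sim1, e_sim2, delta_sim, k_sim):
    """Two-strain diversity: (E_sim1 - E_sim2)^2 / 2 - delta_sim / k_sim.
    Single-gene values can be negative because the estimate is noisy."""
    return 0.5 * (e_sim1 - e_sim2) ** 2 - delta_sim / k_sim


def dimorphism(e_male, e_female, delta, k):
    """Eq. (7): (E_m - E_f)^2 / 2 - delta / k."""
    return 0.5 * (e_male - e_female) ** 2 - delta / k


def gene_divergence(e_i, e_j, delta_i, delta_j, k_i, k_j, mean_diversity):
    """Eq. (8) with n_i = 1 and the aggregate diversity <Delta> for both species."""
    return (e_i - e_j) ** 2 - 2.0 * mean_diversity - delta_i / k_i - delta_j / k_j


# Hypothetical Z-transformed levels of one gene.
print(sample_stats([1.0, 2.0, 3.0], [3.0, 5.0]))  # (3.0, 0.75)
print(sim_diversity(1.0, 0.0, delta_sim=0.2, k_sim=4))  # about 0.45
print(gene_divergence(1.0, 0.0, 0.4, 0.6, 4, 6, 0.1))  # about 0.6
```

Negative single-gene estimates are kept rather than clipped, so that gene averages such as $\langle D_{ij} \rangle$ remain unbiased; this is one reason the text relies on aggregate measures and a probabilistic single-gene inference.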


Equations (6) and (8) follow Wright's decomposition of the variance of a quantitative trait into intra- and inter-species components (Wright, 1950), which underlies the quantitative genetics summary statistics $F_{ST}$ and $Q_{ST}$ (see section 2). For the analysis of sex-specific evolution (section 4), we use the same rationale for the sex-specificity traits $E^\alpha_{i,mf} = E^\alpha_{i,m} - E^\alpha_{i,f}$.

In Fig. S2B, we compare gene-averaged values of expression variance across biological replicates, diversity, dimorphism, and divergence (these averages are denoted by angular brackets), as well as the cross-gene variance of expression. We find a clear ranking $\langle \delta_i \rangle < \langle \Delta_{\rm sim} \rangle \lesssim \langle \Delta_{i,mf} \rangle < \langle D_{ij} \rangle < V_i$ for all species $i$ and $j$, where $V_i = \langle \Gamma_i^2 \rangle \approx 1$ by our normalization. In the $\Omega$ test for selection on gene expression, we use divergence estimates given by equation (8) in aggregate measures across groups of species and classes of genes. However, our data set has a low number of genetic samples per species. Hence, single-gene estimates of diversity and divergence are noisy, which calls for a probabilistic inference of selection. The $\Omega$ test and its probabilistic extension for individual genes are described in section 2.

Time-dependent aggregate (rescaled) divergence, Ω. The aggregate rescaled expression divergence $\Omega_{ij}$ for a given species pair $(i, j)$ is defined as

$$\Omega_{ij} = \frac{\langle D_{ij} \rangle}{D_0}. \tag{9}$$

The gene-specific expression divergence $D^\alpha_{ij}$ is given by equation (8). Angular brackets denote averages over all genes in our dataset, $\langle D_{ij} \rangle = \frac{1}{g} \sum_\alpha D^\alpha_{ij}$. The denominator $D_0 = \lim_{\tau \to \infty} \langle D(\tau, c = 0) \rangle$ is chosen such that the rescaled trait divergence $\Omega_{ij}$ approaches 1 for neutral evolution in the limit of long divergence times (section 2). This asymptotic neutral divergence $D_0$ is related to the scale $E_0^2$, previously defined as the average genetic variation of the trait in the long-term limit of neutral evolution (Nourmohammad et al., 2013b), by $D_0 = 2E_0^2$. The rescaled divergence $\Omega_C$ for a species clade $C$ is defined as an average over species pairs,

$$\Omega_C = \frac{1}{|C_1|\,|C \setminus C_1|} \sum_{i \in C_1} \sum_{j \in C \setminus C_1} \Omega_{ij}, \tag{10}$$

in analogy with the definition (1) of clade divergence times. We also define aggregate divergences $\Omega^G_{ij}$ and $\Omega^G_C$ for specific gene classes $G$, using restricted averages $\langle \ldots \rangle_G$.
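Eqs. (9) and (10) amount to a gene average followed by an average over species pairs that straddle the clade's root node. A minimal sketch, assuming per-gene divergences from eq. (8) are already computed; `omega_pair`, `omega_clade`, the trait scale `d0`, and all numbers are hypothetical (in the text, $D_0$ is obtained from the model fit):

```python
from statistics import fmean


def omega_pair(gene_divergences, d0):
    """Eq. (9): Omega_ij = <D_ij> / D0, with <...> the average over genes."""
    return fmean(gene_divergences) / d0


def omega_clade(omega, part1, part2):
    """Eq. (10): average Omega_ij over species pairs that straddle the
    clade's root node (i in C1, j in the rest of the clade)."""
    return fmean(omega[frozenset((i, j))] for i in part1 for j in part2)


d0 = 0.5  # hypothetical trait scale
omega = {
    frozenset(("a", "b")): omega_pair([0.02, 0.04], d0),  # about 0.06
    frozenset(("a", "c")): omega_pair([0.05, 0.07], d0),  # about 0.12
    frozenset(("b", "c")): omega_pair([0.06, 0.10], d0),  # about 0.16
}
print(omega_clade(omega, ["a", "b"], ["c"]))  # about 0.14
```

Restricting the gene list passed to `omega_pair` to a gene class $G$ gives the class-specific $\Omega^G_{ij}$ and, through `omega_clade`, $\Omega^G_C$.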

2. Inference of selection on gene expression

Evolutionary model. We consider the evolution of gene expression levels under genetic drift, mutation, and selection given by a fitness model with peak displacements on macro-evolutionary time scales. In the minimal seascape model (Held et al., 2014; Nourmohammad et al., 2013a), the fitness of a given gene depends on its expression level $E$ and on evolutionary time $t$,

$$f(E, t) = f^* - c_0 \left(E - E^*(t)\right)^2. \tag{11}$$

The expression value of maximum fitness, $E^*(t)$, performs an Ornstein-Uhlenbeck random walk with diffusion constant $\upsilon_0$, average value $\bar E$, and stationary mean square deviation $r^2 D_0/2$, where $r^2$ is a constant of order 1 and $D_0$ is the trait scale introduced in eq. (9). This process is defined by the Langevin equation

$$\frac{d}{dt} E^*(t) = -\frac{\upsilon_0}{r^2 D_0}\left(E^*(t) - \bar E\right) + \eta(t), \tag{12}$$


where $\eta(t)$ is a delta-correlated Gaussian random process with average 0 and variance $\upsilon_0$. These random variables are assumed to be independent for each gene and on each lineage. The Ornstein-Uhlenbeck fitness seascape should not be confused with the Ornstein-Uhlenbeck models previously used for the evolution of quantitative traits under stabilizing selection (Beaulieu et al., 2012; Bedford and Hartl, 2009; Butler and King, 2004; Hansen, 1997; Hansen et al., 2008; Kalinka et al., 2010; Rohlfs et al., 2014) (a detailed comparison is given below).
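Eq. (12) can be integrated numerically with the Euler-Maruyama scheme: in each step of length $\Delta t$, the peak relaxes towards $\bar E$ at rate $\theta = \upsilon_0/(r^2 D_0)$ and receives a Gaussian kick of variance $\upsilon_0 \Delta t$. For an Ornstein-Uhlenbeck process with relaxation rate $\theta$ and noise variance $\upsilon_0$, the stationary variance is $\upsilon_0/(2\theta) = r^2 D_0/2$, matching the text. This is a sketch only, with hypothetical parameter values, and not the simulation code described in section 5 of SI:

```python
import random


def simulate_peak(upsilon0, r2, d0, e_bar, t_max, dt, seed=0):
    """Euler-Maruyama integration of eq. (12) for the fitness peak E*(t).

    Relaxation rate: upsilon0 / (r2 * d0); white noise with
    <eta(t) eta(t')> = upsilon0 * delta(t - t').
    """
    rng = random.Random(seed)
    rate = upsilon0 / (r2 * d0)
    kick_sd = (upsilon0 * dt) ** 0.5
    e_star = e_bar  # start at the long-term average
    path = [e_star]
    for _ in range(int(t_max / dt)):
        e_star += -rate * (e_star - e_bar) * dt + rng.gauss(0.0, kick_sd)
        path.append(e_star)
    return path


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical parameters: stationary variance r2 * d0 / 2 = 1.0.
path = simulate_peak(upsilon0=1.0, r2=1.0, d0=2.0, e_bar=0.0,
                     t_max=20_000.0, dt=0.05)
print(variance(path))  # close to 1.0, up to discretization and sampling error
```

The time step must be small compared to the relaxation time $1/\theta$; with the values above, the discretization inflates the stationary variance by roughly one percent.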

The minimal seascape model captures two kinds of selection on gene expression in a unified way:

(a) Stabilizing selection. This type of selection constrains the intra- and inter-population variation of expression levels to values around $E^*(t)$. We define the dimensionless stabilizing strength

c = N D0 c0, (13)

where N is the effective population size. In the limit case υ0 = 0, the fitness seascape reduces toa static fitness landscape, f(E) = f∗ − c0(E − E∗)2, and stabilizing selection is the only selectiveforce. This provides a simple interpretation of the selection parameter c: it compares the (hypothetical)genetic load c0D0/2 of a neutrally evolving trait evaluated in the landscape f(E) and the actual geneticload 1/2N in the same landscape, assuming a mutation-selection-drift equilibrium at low mutationrates (Nourmohammad et al., 2013a). This parameter signals the regimes of weak (c . 1) and strong(c & 1) stabilizing selection (Nourmohammad et al., 2013b).

(b) Directional selection. In a fitness seascape, this type of selection triggers an adaptive response of the population mean trait in the direction of fitness peak displacements. We define the scaled driving rate

$$\upsilon = \frac{2\upsilon_0}{\mu D_0}. \tag{14}$$

This parameter measures the mean square displacement of the fitness peak, in units of the trait scale D0 and per unit 1/µ of evolutionary time. In macro-evolutionary seascapes, υ is sufficiently low for the population to follow fitness peak displacements; such seascapes are a joint model of stabilizing and directional selection (Held et al., 2014). The values of υ inferred from our data fall in this regime (see section 2). Because the seascape dynamics is a short-range Markov process, the mean square peak displacement over a scaled evolutionary time τ is then simply D0υτ/2. (Here we express υ in units of µ and τ in units of 1/µ, which differs slightly from the notation in refs. (Held et al., 2014; Nourmohammad et al., 2013a).) In the long-term regime υτ ≫ r², the fitness peak dynamics becomes stationary with mean Ē and variance r²D0/2. This regime turns out to be well beyond the divergence times in our species sample. Hence, the statistics of Drosophila gene expression levels and our inference of selection are independent of r².

Fitness flux. This measure of adaptation is defined as the speed of movement on a fitness land- or seascape by genotype or heritable phenotype changes in a population (Held et al., 2014; Mustonen and Lassig, 2010). The cumulative fitness flux associated with the population mean expression level Γ(t) of a gene in a fitness seascape f(E, t) is given by

$$\Phi(\tau) = \int_{t=0}^{\tau} \frac{\partial f(\Gamma, t)}{\partial \Gamma}\, \frac{d\Gamma(t)}{dt}\, dt. \tag{15}$$

This quantity measures the total amount of adaptation over a macro-evolutionary period τ in a population history. It satisfies the fitness flux theorem (Mustonen and Lassig, 2010), which generalizes Fisher's fundamental theorem of natural selection to mutation-selection-drift processes. As shown by the fitness flux theorem, the average cumulative fitness flux over parallel evolutionary histories, in units of 1/2N, measures the importance of adaptation compared to genetic drift: adaptation is substantial if 〈2NΦ(τ)〉 ≳ 1. For a stationary adaptive process in the minimal seascape (11), the average scaled cumulative fitness flux takes the simple form (Held et al., 2014; Nourmohammad et al., 2013a)

$$\langle 2N\Phi(\tau)\rangle \simeq 2c\upsilon\tau, \tag{16}$$

up to factors of order π0. The exact functional form of the fitness flux is given in ref. (Held et al., 2014). A population evolving under strong stabilizing selection, i.e., in a sharply peaked fitness seascape (c ≫ 1), follows the movements of the fitness peak, measured by the driving rate υ, more closely and hence accumulates a larger fitness flux over time. It is therefore intuitive that the average cumulative fitness flux of the population (eq. 16) is proportional to the product of the stabilizing strength c and the driving rate υ.
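Equation (15) can be evaluated along a sampled trajectory Γ(t). In this sketch, which uses hypothetical trajectories, the gradient of the minimal seascape (11), ∂f/∂Γ = −2c0(Γ − E∗(t)), is evaluated at the midpoint of each step and multiplied by the step ΔΓ; the midpoint rule is our discretization choice. In a static landscape the flux reduces to the fitness gain f(Γend) − f(Γstart), whereas a population trailing a moving peak with a constant lag keeps accumulating positive flux. The result is in fitness units, not in the scaled units 2NΦ.

```python
import numpy as np


def cumulative_fitness_flux(t, gamma, c0, e_star):
    """Discretized eq. (15) for the seascape f(E, t) = f* - c0 * (E - E*(t))**2.

    t, gamma : sampled times and population mean trait Gamma(t)
    e_star   : callable returning the peak position E*(t) for an array of times
    """
    t = np.asarray(t, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    t_mid = 0.5 * (t[1:] + t[:-1])
    g_mid = 0.5 * (gamma[1:] + gamma[:-1])
    grad = -2.0 * c0 * (g_mid - e_star(t_mid))    # df/dGamma at step midpoints
    return float(np.sum(grad * np.diff(gamma)))   # sum of df/dGamma * dGamma


t = np.linspace(0.0, 1.0, 101)
# Static peak at E* = 1, Gamma climbs from 0 to 1: flux equals f(1) - f(0) = c0
static = cumulative_fitness_flux(t, t, 1.0, lambda s: np.ones_like(s))
# Peak moving as E*(t) = t, population lagging by 0.1: gradient 0.2 times total shift 1
lagging = cumulative_fitness_flux(t, t - 0.1, 1.0, lambda s: s)
print(static, lagging)
```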

The cumulative fitness flux is closely related to the time-dependent fraction of expression divergence that is adaptive, ωad(τ) (equation 24). We introduce the shorthand Φ = Φ(τDros.) with τDros. = 1.4 (Fig. 1); this quantity measures the amount of adaptation across the Drosophila genus. By the probabilistic inference method discussed below, we obtain expectation values 2NΦα of the rescaled fitness flux for individual genes over the divergence time of the Drosophila genus (equation 32). We use these values to describe the overall statistics of expression adaptation (Fig. 3A), to infer differences in adaptation between gene classes (Fig. 4; Table 1), and to define significantly adaptive genes (using a threshold 2NΦα > 4; Table S1). For the analysis of sex-specific adaptation (Fig. 5), we define an analogous fitness flux 2NΦmf for sex-specificity traits (section 4).

Evolutionary modes of quantitative traits. In the minimal seascape model, the aggregate time-dependent (rescaled) divergence Ω defined by equation (9) depends on the divergence time τ and on the selection parameters of stabilizing strength c and driving rate υ; the exact form of this function is given in ref. (Held et al., 2014). We can use the behavior of the rescaled time-dependent divergence Ω to distinguish three modes of evolution:

(a) Neutral evolution (c = 0). The rescaled trait divergence has an initially linear increase due to mutations and genetic drift, and it approaches a maximum value 1 with a scaled relaxation time of 1,

$$\Omega_0(\tau) \simeq \begin{cases} \tau & \text{for } \tau \ll 1, \\ 1 & \text{for } \tau \gg 1. \end{cases} \tag{17}$$

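(b) Evolution under stabilizing selection (c ≳ 1, υ = 0). In a static fitness landscape, the rescaled trait divergence approaches a smaller maximum value, Ωstab(c) < 1, with a proportionally shorter relaxation time (Held et al., 2014),

$$\Omega_{\mathrm{eq}}(\tau) \simeq \begin{cases} [1 + G(c)]\,\tau & \text{for } \tau \ll \Omega_{\mathrm{stab}}(c), \\ \Omega_{\mathrm{stab}}(c) & \text{for } \tau \gg \Omega_{\mathrm{stab}}(c). \end{cases} \tag{18}$$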

Over a wide range of evolutionary parameters, which includes the inferred values for the data set of this study, the maximum value depends on the stabilizing strength in a simple way, Ωstab(c) ∼ 1/(2c), with corrections for weaker selection and for larger nucleotide sequence diversity (Nourmohammad et al., 2013b). The factor [1 + G(c)] captures the short-time constraint on the trait divergence due to stabilizing selection, compared to neutrality (eq. 17). The functional form of G(c) is given explicitly in ref. (Nourmohammad et al., 2013b). Over a wide range of the stabilizing strength c, this constraint remains weak and Ω(τ) evolves near neutrality (Nourmohammad et al., 2013b), as long as τ ≪ Ωstab(c).

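(c) Adaptive evolution under stabilizing and directional selection (c ≳ 1, υ > 0). In a genuine fitness seascape, the divergence acquires an adaptive component,

$$\Omega(\tau) = \Omega_{\mathrm{eq}}(\tau) + \Omega_{\mathrm{ad}}(\tau) = \begin{cases} [1 + G(c)]\,\tau & \text{for } \tau \ll \Omega_{\mathrm{stab}}(c), \\ \Omega_{\mathrm{stab}}(c) + \tfrac{1}{2}\upsilon\,[\tau - 2\Omega_{\mathrm{stab}}(c)] & \text{for } \tau \gg \Omega_{\mathrm{stab}}(c), \end{cases} \tag{19}$$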

with corrections for τ approaching the saturation time of fitness peak displacements, r²/υ. The full analytical forms of the functions Ω0(τ) (equation 17), Ωeq(τ) (equation 18), and Ω(τ) (equation 19) are given in refs. (Held et al., 2014; Nourmohammad et al., 2013a).
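The asymptotic branches of equations (17) to (19) are easy to tabulate. The sketch below returns the short-time or the long-time branch depending on whether τ lies below or above Ωstab(c) ∼ 1/(2c). This hard switch is a crude stand-in for the exact crossover function of Held et al. (2014), which is not reproduced here, and G(c) is passed in as a number because its explicit form is not given in this section. With υ = 0 the long-time branch reduces to the landscape plateau of equation (18). The function names are hypothetical.

```python
def omega_stab(c):
    """Plateau of the equilibrium divergence, Omega_stab(c) ~ 1/(2c) for strong selection."""
    return 1.0 / (2.0 * c)


def omega_neutral(tau):
    """Asymptotic branches of eq. (17), switched at the neutral relaxation time tau = 1."""
    return tau if tau < 1.0 else 1.0


def omega_seascape(tau, c, upsilon, G=0.0):
    """Asymptotic branches of eq. (19); upsilon = 0 gives eq. (18). G stands for G(c)."""
    w = omega_stab(c)
    if tau < w:
        return (1.0 + G) * tau                    # quasi-neutral regime
    return w + 0.5 * upsilon * (tau - 2.0 * w)    # plateau plus adaptive linear growth


# Aggregate best-fit parameters quoted below in the text, at tau_Dros. = 1.4
# (this evaluates the asymptotic formula only; it is not a value reported in this study)
print(omega_seascape(1.4, c=18.4, upsilon=0.08))
```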

Moreover, our analysis shows that the trait scale D0 equals, up to a selection-dependent coefficient, the ratio of the expected trait diversity 〈∆〉 and the neutral sequence diversity π0 within a given species,

$$D_0 = \frac{\langle \Delta \rangle(c)}{\pi_0}\,[1 + G(c)]^{-1}. \tag{20}$$

Importantly, the relation (20) is robust under changes of the effective population size. We expect such changes to affect 〈∆〉 and π0 in the same way but to leave their ratio invariant. This is consistent with the role of D0 in our macroevolutionary analysis. At neutrality, D0 = 〈∆〉0/π0 is simply the mutational variance of a quantitative trait, as defined in refs. (Chakraborty and Nei, 1982; Lynch and Hill, 1986; Lynch and Walsh, 1998), up to a rescaling of evolutionary time to units of the inverse point mutation rate 1/µ. This relation remains approximately valid under stabilizing selection over a wide range of parameters c, for which G(c) ≪ 1 (Nourmohammad et al., 2013b). This implies a universal quasi-neutral short-term behavior of the divergence (Held et al., 2014; Nourmohammad et al., 2013a),

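$$\langle D(\tau) \rangle \simeq D_0\,[1 + G(c)]\,\tau = \frac{\langle \Delta \rangle(c)}{\pi_0}\,\tau \quad \text{for } \tau \ll \Omega_{\mathrm{stab}}(c). \tag{21}$$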
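Equations (20) and (21) amount to simple arithmetic. The sketch below computes D0 from the expression diversity 〈∆〉 and the synonymous diversity π0; the value π0 = 0.018 for D. simulans is the one quoted in this section, while the expression diversity is a hypothetical placeholder. It also illustrates the robustness argument: rescaling 〈∆〉 and π0 by the same factor, as a change of effective population size is expected to do, leaves D0 unchanged. The function names are hypothetical.

```python
def trait_scale(mean_diversity, pi0, G=0.0):
    """Eq. (20): D0 = <Delta>(c) / pi0 * [1 + G(c)]**-1, with G = G(c) given as a number."""
    return mean_diversity / pi0 / (1.0 + G)


def quasi_neutral_divergence(tau, mean_diversity, pi0):
    """Eq. (21): <D(tau)> ~ <Delta>(c) / pi0 * tau, valid for tau << Omega_stab(c)."""
    return mean_diversity / pi0 * tau


pi0_sim = 0.018        # synonymous heterozygosity in D. simulans (quoted in the text)
delta_sim = 0.009      # hypothetical aggregate expression diversity, for illustration only
D0 = trait_scale(delta_sim, pi0_sim)
# A change of N scales <Delta> and pi0 alike, so D0 stays the same
D0_rescaled = trait_scale(2.0 * delta_sim, 2.0 * pi0_sim)
print(D0, D0_rescaled)
```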

Ω-test for selection on quantitative traits. The time dependence of the divergence provides a joint test for stabilizing and directional selection on quantitative traits. We can infer the selection parameters of a seascape model by fitting the function Ω(τ) (equation 19) and the corresponding trait scale D0 (equation 9) to data (τ, D). This method has the following properties:

(a) The inference of selection requires data on time-dependent divergence (τ, D) from species with different divergence times in the regime τ ≳ Ωstab. In the quasi-neutral regime τ ≲ Ωstab, the rescaled time-dependent divergence Ω is insensitive to selection (equations 17–19).

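(b) By the decomposition (equation 19), the time-dependent ratio

$$\omega_{\mathrm{ad}}(\tau) = \frac{\Omega_{\mathrm{ad}}(\tau)}{\Omega(\tau)} \equiv \frac{\langle D_{\mathrm{ad}}(\tau)\rangle}{\langle D(\tau)\rangle} \tag{22}$$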

defines the adaptive fraction of the trait divergence. The complementary fraction, 1 − ωad(τ), is attributed to genetic drift under stabilizing selection.


(c) We can approximate the divergence (equation 19) by the linear form Ω(τ) ≈ Ωstab + Ωad(τ) = Ωstab + υτ/2. Therefore, a linear fit to the data already produces simple estimates of stabilizing strength and driving rate,

$$c \approx \frac{1}{2\,\Omega_{\mathrm{stab}}}, \qquad \upsilon \approx \frac{2\,\Omega_{\mathrm{ad}}(\tau)}{\tau}, \tag{23}$$

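and infers the adaptive fraction of expression divergence, which is related to the average scaled fitness flux (Held et al., 2014) (equation 16),

$$\omega_{\mathrm{ad}}(\tau) \approx \frac{\Omega(\tau) - \Omega_{\mathrm{stab}}}{\Omega(\tau)}, \qquad \langle 2N\Phi(\tau)\rangle \approx \frac{2\,\Omega_{\mathrm{ad}}(\tau)}{\Omega(\tau) - \Omega_{\mathrm{ad}}(\tau)}. \tag{24}$$

(The second relation follows by inserting the estimates (23) into equation (16), which gives 2cυτ ≈ 2Ωad(τ)/Ωstab, and using Ωstab = Ω(τ) − Ωad(τ).)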

The quantities ωad(τ) and 〈2NΦ(τ)〉 are independent of the trait scale D0.

(d) The rescaled trait divergence Ω (equation 9) decouples from the genetic basis of the trait. Specifically, it depends only weakly on the number and effect size of the underlying QTL (Held et al., 2014; Nourmohammad et al., 2013b), on the amount of recombination between these sites (Held et al., 2014; Nourmohammad et al., 2013b), and on the nonlinearities in the genotype-phenotype map (trait epistasis; see section 5 and Fig. S7C). The time-dependent divergence Ω(τ) also decouples from details of the selection dynamics; the test can be applied to punctuated adaptive processes, which have fewer and larger peak displacements (Held et al., 2014) (section 3).

(e) A variant of the Ω test consists in directly inferring D0 from diversity data in any species of the data set by equation 20, as discussed previously in ref. (Held et al., 2014). Given the scarce information on trait diversity in our data set, we do not use this version of the test for our inference of selection in the present paper (however, we perform a consistency check based on diversity estimates in D. simulans).
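The linear version of the Ω test in item (c) can be sketched with an ordinary least-squares fit. The example assumes rescaled divergence data (τ, Ω) that already lie in the regime τ ≳ Ωstab (points in the quasi-neutral regime would have to be excluded first) and uses synthetic, exactly linear values rather than the data of this study. It returns the estimates of equation (23) together with the adaptive fraction and the scaled fitness flux of equation (24); the printout also compares the flux with 2cυτ from equation (16). The function name `linear_omega_test` is hypothetical.

```python
import numpy as np


def linear_omega_test(tau, omega, tau_eval):
    """Least-squares fit of Omega(tau) ~ Omega_stab + upsilon * tau / 2, then eqs. (23)-(24)."""
    slope, intercept = np.polyfit(tau, omega, 1)   # coefficients, highest power first
    w_stab = intercept
    c = 1.0 / (2.0 * w_stab)                       # eq. (23)
    upsilon = 2.0 * slope                          # eq. (23): upsilon = 2 Omega_ad(tau) / tau
    w_ad = 0.5 * upsilon * tau_eval
    w = w_stab + w_ad                              # linear form of eq. (19)
    frac_ad = (w - w_stab) / w                     # eq. (24), adaptive fraction
    flux = 2.0 * w_ad / (w - w_ad)                 # eq. (24), <2N Phi(tau)>
    return c, upsilon, frac_ad, flux


# Synthetic clade data on an exact line (hypothetical values)
tau = np.array([0.1, 0.3, 0.6, 1.0, 1.4])
omega = 0.025 + 0.04 * tau
c, upsilon, frac_ad, flux = linear_omega_test(tau, omega, tau_eval=1.4)
print(c, upsilon, frac_ad, flux, 2 * c * upsilon * 1.4)
```

With real clade data the fit would carry error bars, and the full analysis fits the nonlinear function Ω(τ) together with the trait scale D0, as described for the expression data.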

Comparison of the Ω test with related methods. Our inference method for selection on quantitative traits can be compared with three well-known selection tests for phenotypic and genomic data:

(a) QST/FST ratio test for selection on quantitative traits. The summary statistics FST and QST measure the expected fraction of the total genetic variation harbored in a pair of populations that can be attributed to the divergence between these populations; the complementary fraction is attributed to the diversity within populations. FST refers to neutrally evolving sequence loci (Lande, 1992; Wright, 1943, 1950), which can be regarded as a “pseudo-trait” with aggregate divergence and diversity. QST is the analogous measure for quantitative traits under selection (Spitze, 1993). The expected dependence of these measures on divergence time can be expressed in terms of the rescaled divergence Ω (equation 9),

$$F_{ST}(\tau) = \frac{\langle D(\tau)\rangle_0}{\langle D(\tau)\rangle_0 + 2\langle\Delta\rangle_0} \simeq \frac{\Omega_0(\tau)}{\Omega_0(\tau) + 2\pi_0}, \tag{25}$$

$$Q_{ST}(\tau) = \frac{\langle D(\tau)\rangle}{\langle D(\tau)\rangle + 2\langle\Delta\rangle} \simeq \frac{\Omega(\tau)}{\Omega(\tau) + 2\pi_0}, \tag{26}$$

where we use expectation values 〈. . .〉 in an ensemble of parallel-evolving populations and the subscript 0 refers to neutral evolution. The QST/FST test (Leinonen et al., 2013) stipulates that a quantitative trait is evolving at neutrality if QST/FST = 1, under stabilizing selection if QST/FST < 1, and under directional selection if QST/FST > 1. The data set of this study shows aggregate values QST/FST between 0.6 for the mel-sim clade and 0.8 across the entire Drosophila genus; these values are obtained using equations (10), (25), and (26). Hence, this test signals broad stabilizing selection but no directional selection. In contrast, the time-dependent divergence test infers both stabilizing and directional selection from the linear dependence Ω(τ) (Fig. 2 and equation 19). This inference shows a conceptually important point: stabilizing and directional selection are not mutually exclusive, but joint features of selection on macro-evolutionary time scales.

The QST/FST test infers adaptive evolution only under quite restrictive conditions: QST/FST > 1, or equivalently Ω > Ω0, implies that directional selection is the dominant selection component for short divergence times. In the seascape model, this requires large driving rates (υ ≳ 1), sufficiently large peak displacement amplitudes r, and sufficiently large stabilizing strength c (Held et al., 2014). Hence, values QST/FST > 1 are most likely to be observed for individual traits that have undergone a large shift of the optimal trait value in their recent evolutionary history, which is in accordance with data from a number of studies; see (Le Corre and Kremer, 2012; Leinonen et al., 2013) for a comprehensive review. In this study, which uses aggregate data over large classes of genes, we do not expect, and do not find, values QST/FST > 1. We note that on sufficiently short time scales, both the QST/FST test and the test based on the trait divergence are insensitive to selection, because the trait divergence is in the quasi-neutral regime (equation 21).

(b) Ornstein-Uhlenbeck model for quantitative trait evolution. This phenomenological model describes a quantitative trait evolving under genetic drift and stabilizing selection (Beaulieu et al., 2012; Butler and King, 2004; Hansen, 1997; Hansen et al., 2008) and has been applied to the evolution of gene expression (Bedford and Hartl, 2009; Kalinka et al., 2010; Rohlfs et al., 2014) (a detailed comparison with the results of ref. (Bedford and Hartl, 2009) is given below). The model is defined by a Langevin equation for the population mean trait,

$$\frac{d}{dt}\Gamma(t) = -\lambda\,(\Gamma - E^*) + \eta_\Gamma(t), \tag{27}$$

where ηΓ(t) is the random variable of a delta-correlated Gaussian process with average 0 and variance σ²/N. The model constants λ and σ² are usually regarded as independent fit parameters. The Ornstein-Uhlenbeck dynamics of the population mean trait Γ(t) around a fixed optimal trait value E∗ (equation 27) should not be confused with the Ornstein-Uhlenbeck dynamics of the time-dependent optimum E∗(t) in our seascape model (equation 11).

A Langevin equation similar to (27) can be derived from more general population-genetic models for the evolution of a quantitative trait E in a static fitness landscape f(E) = −c0(E − E∗)², which have been introduced in refs. (de Vladar and Barton, 2011; Nourmohammad et al., 2013b). In these models, the population mean trait follows the Ornstein-Uhlenbeck process

$$\frac{d}{dt}\Gamma(t) = -2\langle\Delta\rangle\, c_0\,(\Gamma - E^*) - 2\mu\,(\Gamma - \Gamma_0) + \eta_\Gamma(t), \tag{28}$$

where Γ0 is the genetic mean trait in the long-term limit of neutral evolution and ηΓ(t) is the random variable of a delta-correlated Gaussian process with average 0 and variance 〈∆〉/N. Comparison with equation (27) determines the Ornstein-Uhlenbeck coefficients in terms of the stabilizing strength and the average trait diversity (λ = 2〈∆〉c0, σ² = 〈∆〉). Equation (28) contains an additional mutational term −2µ(Γ − Γ0), which implies that the expectation value 〈Γ〉 differs from the optimum trait value E∗. We note that the diffusion constant 〈∆〉/N determines the behavior of the trait divergence (equation 21) and of the QST/FST ratio in the quasi-neutral regime (τ ≪ Ωstab). The Ornstein-Uhlenbeck model has been generalized to account for lineage-specific stabilizing selection in a phylogeny (Beaulieu et al., 2012; Butler and King, 2004; Hansen, 1997; Hansen et al., 2008; Kalinka et al., 2010; Rohlfs et al., 2014); however, inferring independent selection parameters for each lineage may lead to overfitting of our data set. Instead, we use the seascape model (11) to infer lineage- and gene-specific changes of the trait optimum E∗(t) using a single additional selection parameter υ.

(c) McDonald-Kreitman test for adaptive sequence evolution (McDonald and Kreitman, 1991). This sequence-based test for selection evaluates the ratio of the cross- to intra-species sequence variation for a sequence class under putative selection (e.g., non-synonymous mutations in protein-coding sequence) and compares it to the analogous ratio for bona fide neutral changes (e.g., synonymous mutations). Positive selection in the query sequence is inferred if the ratio in the sequence class is larger than the neutral expectation. In contrast, the selection test for quantitative traits based on the time-dependent divergence Ω(τ) requires only data from traits under selection, but from three or more species with divergence times beyond the equilibrium relaxation time Ωstab. These differences highlight distinct evolutionary characteristics of quantitative traits. First, such traits have a quasi-neutral regime of macro-evolutionary divergence times (equation 21) that has no direct analogue in sequence evolution (Nourmohammad et al., 2013b). Second, in most cases we do not have a gauge of neutrally evolving traits analogous to synonymous sequence.
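The right-hand sides of equations (25) and (26) give the QST/FST ratio directly in terms of rescaled divergences. The sketch below uses them under the approximation 〈∆〉 ≈ D0π0 implicit in these expressions (i.e., G(c) ≪ 1), with hypothetical input values. It illustrates the point made in item (a): a constrained divergence Ω < Ω0 yields a ratio below 1, regardless of how much of Ω is adaptive. The function names are hypothetical.

```python
def st_from_omega(omega, pi0):
    """Right-hand side of eqs. (25)-(26): X_ST ~ Omega / (Omega + 2 pi0)."""
    return omega / (omega + 2.0 * pi0)


def qst_fst_ratio(omega, omega_neutral, pi0):
    """QST/FST at equal divergence time: Omega(tau) versus the neutral Omega_0(tau)."""
    return st_from_omega(omega, pi0) / st_from_omega(omega_neutral, pi0)


pi0 = 0.018                                # illustrative synonymous diversity
print(qst_fst_ratio(0.081, 1.0, pi0))      # constrained trait (hypothetical Omega values)
print(qst_fst_ratio(1.0, 1.0, pi0))        # trait diverging like the neutral reference
```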

Application of the Ω test to gene expression data. We apply the Ω test to the Drosophila gene expression data of ref. (Zhang et al., 2007) as follows. To evaluate rescaled expression divergence data (τC, ΩC) for six Drosophila species clades (equations 1 and 10), we estimate the aggregate unscaled gene expression divergence across clades, 〈DC〉, and fit the model function 〈D(τ)〉 to these data. This fit contains the three parameters (D0, c, υ). The time scale of stabilizing selection observed in the data (i.e., the first bend in the divergence curve) equals 〈Dstab〉 ∼ D0/c. We treat the trait scale D0 as an additional fit parameter and determine this scale by assuming that the saturation due to stabilizing selection occurs as late as possible (i.e., for the largest Ωstab) consistent with the data, resulting in a best model with conservative estimates of stabilizing strength c. The rescaled expression divergence with the inferred trait scale, Ω(τ) = 〈D(τ)〉/D0, is plotted in Fig. 2. The best-fit seascape model has parameters (c∗ = 18.4, υ∗ = 0.08) (green line in Fig. 2); this model explains the divergence data and produces evidence for adaptive evolution of gene expression. Using the decomposition into adaptive and drift components (green and blue shaded areas), we obtain a cumulative fitness flux 〈2NΦ〉 = 3.8 across the entire Drosophila genus (equations 23 and 24). We note that the inference of fitness flux decouples from the scale D0. The probabilistic extension of this test to individual genes is discussed below.

As a consistency check, we use the aggregate expression diversity in D. simulans, 〈∆〉sim (estimated from two strains in this data set), and the average heterozygosity of synonymous sites in the D. simulans population, π0 = 0.018 (Begun et al., 2007), to estimate the trait scale D0 from equation (20). The resulting optimal seascape model has parameters (c = 18.6, υ = 0.07), which are very similar to the values quoted above (see Fig. S3A).

Control analysis of equilibrium models. We can also compare the aggregate expression data (τC, ΩC) to fitness landscape models of time-independent stabilizing selection:

(a) We can infer a landscape model from the divergence data. This model variant has two independent parameters, the stabilizing strength c and the trait scale D0, which we set to its maximum fitted value to obtain a conservative estimate of stabilizing strength. It provides a significantly worse fit to the data than the seascape model (Fig. S3A). In particular, it cannot explain the pattern of expression divergence between close species. The model predicts a quasi-neutral linear growth of the divergence with Dmel−yak/Dmel−sim ≈ τmel−yak/τmel−sim ≈ 2 (equation 21), which drastically overestimates the observed ratio Dmel−yak/Dmel−sim ≈ 1.2. This model also fails to infer substantial stabilizing selection (ceq = 0.25).

(b) With divergence and diversity inferred from data, the landscape model has a single free parameter, the stabilizing strength c. In contrast to the seascape model, the best-fit landscape model provides a poor fit to the data (Fig. S3A). It captures the average rescaled divergence Ω across the Drosophila clades but fails to describe the systematic amplitude differences between these clades. In particular, the landscape model drastically overestimates the divergence of close species, Dmel−yak and Dmel−sim. Compared to the landscape model with fitted trait scale, this model variant also produces a worse fit to the data. The probabilistic analysis reported in equation (35) shows that both equilibrium models have a significantly lower likelihood than the seascape model.

Comparison with a previous study. Bedford and Hartl (BH) (Bedford and Hartl, 2009) analyze aggregate expression levels from the same data set and fit these data to an Ornstein-Uhlenbeck model of evolution under stabilizing selection (Hansen, 1997) (equation 27), which is closely related to our landscape model inferred solely from divergence data. The Ornstein-Uhlenbeck model cannot infer adaptation; it assumes stabilizing selection and has fit parameters that determine the equilibration time and the level of saturation. Based on this model, BH infer stabilizing selection on expression levels, and they report an apparent saturation of gene expression divergence. This saturation is at variance with the linear growth on time scales beyond the divergence time of D. melanogaster and D. simulans, which is inferred in Fig. 2 and in ref. (Zhang et al., 2007). The analysis of ref. (Bedford and Hartl, 2009) presents the following issues of data analysis and of interpretation of the results:

(a) BH (Bedford and Hartl, 2009) use amino acid distances in their phylogeny. These distances are affected by selection (Smith and Eyre-Walker, 2002). As shown in Fig. S2A, they produce a nonlinear measure of divergence times, τij, which is less suitable for evolutionary analysis than the times τij inferred from synonymous sequence (Drosophila 12 Genomes Consortium et al., 2007) that are used in this study. To test the influence of amino acid sequence divergence on our inference of adaptive evolution, we repeat the analysis with this variant of the phylogeny. We find the same ranking of models: the seascape model explains the data significantly better than landscape models, none of which provides a satisfactory fit to the divergence between close species (Fig. S3B). The probabilistic analysis of equation (35) confirms this ranking. At the same time, it shows that amino acid times are suboptimal: they lead to a significant likelihood cost for all models. We conclude that our inference of adaptive evolution is robust under variations of the sequence-based phylogenies.

(b) BH (Bedford and Hartl, 2009) analyze expression divergence for pairs of species, while we group the species into clades (Fig. 2). The pairwise approach leads to a noisier dependence of the expression divergence data on evolutionary time (Fig. S1C) and makes a straightforward distinction between conservation and adaptation more difficult at the level of aggregate data. Moreover, pairwise expression divergence data are strongly correlated through the structure of the phylogeny, which is apparent from the clustering of these data (open squares in Fig. S3A,B). Clade-specific divergence data are statistically more independent, which allows for meaningful error analysis and model ranking.


(c) The saturation of expression levels claimed in ref. (Bedford and Hartl, 2009) is ascribed to time scales similar to the neutral relaxation time (τ ∼ 1 in units of the inverse neutral point mutation rate, dashed line in Fig. S3B). Under any Ornstein-Uhlenbeck or landscape model, this pattern would imply weak stabilizing selection (ceq = 0.25 in the landscape model) and weak constraint on gene expression divergence. That is, gene expression would evolve near neutrality throughout the Drosophila genus: the divergence would be 87% of the neutral divergence for τmel−sim and 68% for τDros.

Probabilistic inference of selection. Here we describe the extension of our selection inference method to expression data of individual genes. A minimal seascape model is determined by the parameters (c, υ) or equivalently by (c, Φ), where Φ = 2cυτDros./(2N) denotes the expected cumulative fitness flux over the genus divergence time (equation 16). We derive a posterior probability distribution $Q(c, \Phi \mid E^\alpha)$, where $E^\alpha = (E^\alpha_1, \ldots, E^\alpha_7)$ denotes the expression levels of gene α in the 7 species of our data set. This derivation consists of three steps: we obtain the probability distribution $Q(\Gamma^\alpha \mid c, \Phi)$ of population mean traits $\Gamma^\alpha = (\Gamma^\alpha_1, \ldots, \Gamma^\alpha_7)$ in a given seascape model, we include sampling effects to determine the distribution $Q(E^\alpha \mid c, \Phi)$, and we use Bayes' theorem to infer the posterior distribution $Q(c, \Phi \mid E^\alpha)$.

The basic building block of evolutionary statistics in the minimal seascape model has been derived previously (Held et al., 2014): the lineage propagator $G_\tau(\Gamma, E^* \mid \Gamma_a, E^*_a)$ is the probability density of mean and optimal trait values $(\Gamma, E^*)$, given the values $(\Gamma_a, E^*_a)$ in an ancestral population at scaled evolutionary distance τ. The lineage propagator is related to the stationary distribution of the seascape dynamics, $Q_{\mathrm{stat}}(\Gamma, E^*) = \lim_{\tau\to\infty} G_\tau(\Gamma, E^* \mid \Gamma_a, E^*_a)$. Both distributions are Gaussian functions that depend on the seascape model parameters and on the neutral variance (trait scale) D0; their detailed analytical form is given in equations (30)–(33) and (A.15)–(A.20) of ref. (Held et al., 2014). The probability distribution of population mean traits across the Drosophila genus is the stationary distribution for its last common ancestor multiplied by the lineage propagators for all branches of the phylogeny; this expression is to be integrated over all unknown expression levels. Specifically, we obtain

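$$Q(\Gamma^\alpha \mid c, \Phi, D_0) = \int Q_{\mathrm{stat}}(\Gamma^\alpha_l, E^*_l) \prod_{i=1}^{l-1} G_{\tau(i)}\!\left(\Gamma^\alpha_i, E^*_i \mid \Gamma^\alpha_{a(i)}, E^*_{a(i)}\right) d\Gamma^\alpha_{k+1} \cdots d\Gamma^\alpha_l \, dE^*_1 \cdots dE^*_l, \tag{29}$$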

where i = 1, . . . , k labels the extant species and i = k + 1, . . . , l the clade ancestor species (with l = 2k − 1 = 13 and the index l referring to the last common ancestor of all species), a(i) denotes the closest ancestor of species i, and τ(i) is the scaled length of the branch between i and a(i). The deviations of the expression measurements $E^\alpha_i$ from the population mean trait $\Gamma^\alpha_i$ can be described by a Gaussian sampling error model with variance $\Delta^\alpha_i + \delta^\alpha_i / k_i$, as given by equation (6). We obtain

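$$Q(E^\alpha \mid c, \Phi, D_0) = \frac{1}{Z} \int Q(\Gamma^\alpha \mid c, \Phi, D_0)\, \exp\!\left[-\frac{1}{2} \sum_{i=1}^{k} \frac{(E^\alpha_i - \Gamma^\alpha_i)^2}{\Delta^\alpha_i + \delta^\alpha_i / k_i}\right] d\Gamma^\alpha_1 \cdots d\Gamma^\alpha_k, \tag{30}$$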

where Z is a normalization factor. This multivariate Gaussian integral can be evaluated in a straightforward way by the saddle-point method. Here we approximate the noisy cross-replicate variance of individual genes by the species average δi. Due to the limited data on heritable species-specific expression diversity $\Delta^\alpha_i$, we use the expected functional form of the trait diversity as a function of the trait scale D0 and stabilizing strength c, as given by eq. 20 and ref. (Nourmohammad et al., 2013b). Finally, Bayes' theorem gives the posterior distribution

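$$Q(c, \Phi \mid E^\alpha) = \frac{\int Q(E^\alpha \mid c, \Phi, D_0)\, P_0(c, \Phi)\, dD_0}{\int Q(E^\alpha \mid c, \Phi, D_0)\, P_0(c, \Phi)\, dD_0\, dc\, d\Phi}, \tag{31}$$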


where P0(c, Φ) denotes the prior distribution of seascape parameters. This distribution determines the maximum-likelihood posterior values of stabilizing strength, fitness flux, and adaptive fraction of expression divergence,

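$$(c_\alpha, \Phi_\alpha) = \arg\max_{c, \Phi} Q(c, \Phi \mid E^\alpha), \qquad \omega^\alpha_{\mathrm{ad}}(\tau) = \frac{(\tau/\tau_{\mathrm{Dros.}})\,\Phi_\alpha}{(\tau/\tau_{\mathrm{Dros.}})\,\Phi_\alpha + 1/N}; \tag{32}$$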

see equation (24). In equation (31), we use a prior distribution P0(c, Φ) ∼ exp(−ac − bΦ) with Lagrange multipliers a, b that calibrate the average posterior values 〈c〉 and 〈Φ〉 over all genes to our inference from aggregate data (see above). This choice generates a conservative inference of gene-specific seascape parameters that reflects two statistical features of our data. First, gene data E explained by a seascape model with parameters (c, Φ) and a neutral trait variance D0 (see text above and ref. (Nourmohammad et al., 2013b)) have a similar likelihood in a family of models with parameters (λc, Φ) and neutral trait variance λD0, where λ > 0 is a rescaling factor, as long as the stabilizing strength is above some minimum value. In other words, there is a residual freedom in model parameters that leaves the fitness flux Φ invariant. This freedom exists because the gene-specific diversity values $\Delta^\alpha_i$ are too noisy to be included in the inference. Our prior distribution favors posterior values c close to the minimum stabilizing strength, which are consistent with the inference from aggregate data. Second, the distribution (30) has an algebraic tail, Q(E | c, Φ) ∼ Φ⁻¹ for 2NΦ ≫ 1, which is caused by the diffusive dynamics of the fitness peak. Our prior distribution suppresses this tail and favors posterior values Φ close to the maximum-likelihood value Φ∗. The validation of this inference scheme by simulation tests is described in section 5.
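Once the likelihood Q(Eα | c, Φ, D0) has been integrated over D0 and tabulated on a grid of (c, Φ), the maximum of the posterior (31) with the exponential prior P0 ∼ exp(−ac − bΦ) can be located by a direct search; the normalization in the denominator of (31) does not affect the argmax. The sketch below assumes such a precomputed log-likelihood grid (the propagator calculation of equations 29 and 30 is not reproduced), uses a toy input, and converts a flux estimate into the adaptive fraction of equation (32). The function names and grids are hypothetical.

```python
import numpy as np


def map_estimate(loglik, c_grid, phi_grid, a, b):
    """Grid argmax of log Q(E | c, Phi) + log P0(c, Phi), with P0 ~ exp(-a c - b Phi).

    loglik[i, j] is the log-likelihood at (c_grid[i], phi_grid[j]), already integrated over D0.
    """
    C, P = np.meshgrid(c_grid, phi_grid, indexing="ij")
    log_post = loglik - a * C - b * P                     # unnormalized log posterior
    i, j = np.unravel_index(np.argmax(log_post), log_post.shape)
    return float(c_grid[i]), float(phi_grid[j])


def adaptive_fraction(phi_alpha, tau, tau_dros, N):
    """Eq. (32): omega_ad = x / (x + 1/N) with x = (tau / tau_dros) * Phi_alpha."""
    x = (tau / tau_dros) * phi_alpha
    return x / (x + 1.0 / N)


# Toy log-likelihood peaked at (c, Phi) = (5, 2); a flat prior (a = b = 0) recovers the peak
c_grid = np.arange(0.0, 11.0)
phi_grid = np.arange(0.0, 5.0)
C, P = np.meshgrid(c_grid, phi_grid, indexing="ij")
toy = -((C - 5.0) ** 2) - (P - 2.0) ** 2
print(map_estimate(toy, c_grid, phi_grid, a=0.0, b=0.0))
```

For example, a gene with 2NΦα = 4 at τ = τDros. has an adaptive fraction of 2/3, consistent with equation (24).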

Statistical significance of the inference. The probabilistic extension of the Ω test plays an important role in our global inference: it quantifies the statistical significance of our evidence for adaptive evolution under directional selection. Specifically, we evaluate the cumulative log-likelihood score for all genes of our data set as a function of the evolutionary variables of stabilizing strength c and cumulative fitness flux Φ,

$$S(c, \Phi) = \sum_{\alpha=1}^{g} \log Q(c, \Phi \mid E^\alpha), \tag{33}$$

where Q(c, Φ | Eα) is given by equation (31). This function is shown in Fig. 3B with its maximum shifted to 0. The global maximum-likelihood seascape model with divergence and diversity estimated from data has parameters

$$(c^*, \Phi^*) = \arg\max_{c, \Phi} S(c, \Phi) = \left(18.4,\ \frac{3.8}{2N}\right). \tag{34}$$

We can use the log-likelihood difference ∆S = S(c, Φ) − S(0, 0) to rank all other models against their corresponding neutral model. For the cases discussed in this paper, we find the following ∆S values:

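$$\begin{array}{lccc}
 & \text{seascape} & \text{landscape, } \langle\Delta\rangle \text{ inferred} & \text{landscape, } \langle\Delta\rangle \text{ from data} \\
\text{syn. phylogeny} & 12464 & 6194 & 4068 \\
\text{aa. phylogeny} & 9810 & 5452 & 1970
\end{array} \tag{35}$$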

In all inference schemes (divergence times inferred from synonymous or amino acid sequence divergence), we find the same ranking: the optimal seascape model is significantly more likely than the optimal landscape model and the neutral model. The landscape model with the diversity 〈∆〉 (or equivalently, trait scale D0) as a fit parameter has a higher likelihood than the landscape model with the diversity estimated from the D. simulans strains. By a log-likelihood test, the score differences ∆S translate into P values as reported in the main text. Maximum-likelihood values analogous to equation (34) can also be defined for classes of genes (Table 1).
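The step from ∆S to a P value can be sketched with Wilks' theorem. Assuming (our assumption, not a statement of this study) that 2∆S follows a χ² distribution with two degrees of freedom, as for the two extra parameters (c, Φ) of the seascape model relative to the neutral model (0, 0), and ignoring the boundary effect that arises because the neutral point lies at the edge of the parameter space, the tail probability is exactly exp(−∆S). The sketch reports log10 P, because P itself underflows double precision for the ∆S values in (35). It is not necessarily the calculation behind the P values in the main text, and the function name is hypothetical.

```python
import math


def log10_p_wilks_df2(delta_s):
    """log10 P(chi2_2 > 2 * delta_s); the chi2 tail with 2 dof is exp(-x / 2), so this is -delta_s / ln 10."""
    return -delta_s / math.log(10.0)


# Seascape-versus-neutral score differences from table (35)
for label, delta_s in [("syn. phylogeny", 12464.0), ("aa. phylogeny", 9810.0)]:
    print(label, round(log10_p_wilks_df2(delta_s)))
```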


3. Analysis of alternative evolutionary scenarios

Lineage-specific demography. Demographic effects, such as population bottlenecks, affect the patterns of sequence variation in Drosophila (Aquadro et al., 2001; Glinka et al., 2003; Lachaise et al., 1988; Stephan and Li, 2007; Thornton et al., 2007). Here we examine the effects of strong, long-term demographic heterogeneities on the divergence and diversity of expression levels. Specifically, we consider changes in effective population size to a value Ni = λN in a given Drosophila lineage i, which is defined by the terminal branch of species i in the phylogeny and extends over a scaled evolutionary time τi (Fig. 1). Because the stabilizing strength c = N D0 c0 is proportional to the effective population size (equation 13), a depletion of effective population size leads to a global relaxation of stabilizing selection on gene expression, given by a reduced stabilizing strength λc in the fitness seascape (equation 11). For each clade C with i ∈ C, we define the polarized rescaled divergence,

$$\Omega_{C,i} = \frac{1}{|C \setminus C_1|} \sum_{j \in C \setminus C_1} \Omega_{ij}, \tag{36}$$

where (C1, C \ C1) is the partitioning of clade C defined by its root node and we assume i ∈ C1. The pairwise rescaled divergence Ωij is given by equation (9). Similarly, we define the polarized divergence time,

$$\tau_{C,i} = \frac{1}{|C \setminus C_1|} \sum_{j \in C \setminus C_1} \tau_{ij}. \tag{37}$$
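Equations (36) and (37) are the same averaging operation applied to two pairwise quantities. A minimal sketch, assuming the pairwise values are stored in a dictionary keyed by unordered species pairs; the species abbreviations and numbers in the usage lines are hypothetical and not taken from the data set.

```python
from statistics import fmean


def polarized_average(pairwise, clade, c1, i):
    """Polarized clade average of eqs. (36)-(37).

    pairwise: dict mapping frozenset({a, b}) -> value (Omega_ab or tau_ab)
    clade:    set of species in clade C
    c1:       the subclade C1 (from the root split of C) that contains i
    Returns the mean of pairwise[{i, j}] over all j in C minus C1.
    """
    if i not in c1 or not c1 < clade:
        raise ValueError("need i in C1 and C1 a proper subset of C")
    return fmean(pairwise[frozenset((i, j))] for j in clade - c1)


# Hypothetical example: C = {mel, sim, yak, ere} with root split C1 = {mel, sim}.
omega = {frozenset(("mel", "yak")): 0.9, frozenset(("mel", "ere")): 1.1}
clade = {"mel", "sim", "yak", "ere"}
omega_c_mel = polarized_average(omega, clade, c1={"mel", "sim"}, i="mel")
print(omega_c_mel)
```

Passing a dictionary of pairwise divergence times τ_ij instead of Ω_ij gives τ_{C,i} with the same function.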

In Fig. S4A,B, we plot polarized data (τ_{C,i}, Ω_{C,i}) together with background data (τ_C, Ω_C) from partial clades excluding species i. Under a change of population size in lineage i with τ_i ≳ Ω_stab(λc), the polarized data should follow a pattern with reduced (λ < 1) or increased (λ > 1) long-term constraint,

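$$
\Omega(\tau, \tau_i) = \Omega_{\mathrm{eq}}(\tau, \tau_i) + \Omega_{\mathrm{ad}}(\tau, \tau_i)
= \begin{cases}
[1 + G(c)]\,\tau & \text{for } \tau \ll \Omega_{\mathrm{stab}}(\lambda c), \\[4pt]
\frac{1}{2}\left[\Omega_{\mathrm{stab}}(\lambda c) + \Omega_{\mathrm{stab}}(c)\right] + \frac{1}{2}\upsilon\tau + F(\lambda, c) & \text{for } \tau \gg \tau_i + \Omega_{\mathrm{stab}}(c),
\end{cases}
\tag{38}
$$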

where the shift F(λ, c) is generated by the demographic inhomogeneity on intermediate time scales; this pattern is shown in Fig. S4A,B for λ = 1/2 and λ = 3. A similar calculation shows that short-term population bottlenecks have a negligible effect on the statistics of trait divergence Ω. We observe no deviation between polarized and background Ω data, indicating the absence of strong demographic effects shaping the evolution of expression levels. Equation (38) also shows that demographic effects do not confound the test of selection based on the time-dependent divergence Ω for adaptive evolution under directional selection. For a time-independent optimal trait value (υ = 0), global relaxation of stabilizing selection increases the divergence, as noted in previous studies (Fraser, 2011; Gilad et al., 2006b; Khaitovich, 2005); however, it does not generate the linear increase Ω_ad(τ) ≃ υτ/2 characteristic of fitness peak displacements (Fig. S4).
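The diagnostic in equation (38) is the long-time slope of Ω(τ): a constant offset (including the demographic shift F(λ, c)) leaves the slope υ/2 unchanged, whereas a static optimum gives zero slope. The sketch below only illustrates reading υ off that slope by ordinary least squares on points beyond a cutoff; the cutoff `tau_min` and the function names are hypothetical, and this is not the likelihood-based inference used in the paper.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def driving_rate_from_slope(taus, omegas, tau_min):
    """Estimate upsilon assuming Omega(tau) ~ const + upsilon * tau / 2 for tau > tau_min.

    tau_min is a HYPOTHETICAL cutoff, meant to lie beyond tau_i + Omega_stab(c).
    """
    late = [(t, w) for t, w in zip(taus, omegas) if t > tau_min]
    if len(late) < 2:
        raise ValueError("need at least two points beyond tau_min")
    xs, ys = zip(*late)
    return 2.0 * ols_slope(xs, ys)


# Synthetic check: offset 0.4 (any shift F is absorbed here) plus upsilon*tau/2.
taus = [0.1 * k for k in range(1, 31)]
omegas = [0.4 + 0.5 * 0.3 * t for t in taus]
print(driving_rate_from_slope(taus, omegas, tau_min=1.0))
```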

Gene-specific relaxation of stabilizing selection. We can also test for lineage- and gene-specific relaxation of stabilizing selection on gene expression, which arises, for example, from a partial loss of gene function. We model the loss dynamics by a stochastic process: with a small rate γ, individual genes switch the stabilizing strength of their fitness seascape to a reduced value λc (with λ < 1). We choose the switch rate γ and the stabilizing strength c so as to approximately match the observed Ω(τ) pattern. To discriminate between relaxed stabilizing selection and directional selection, we can use the distributions of clade-specific expression differences ∆E^α_C, which are defined as averages over pairwise differences ∆E^α_ij = E^α_i − E^α_j in analogy to equation (10). The observed distributions are of approximately Gaussian form,

$$P_C(\Delta E) = \frac{1}{\sqrt{2\pi D_C}} \exp\left[-\frac{(\Delta E)^2}{2 D_C}\right], \tag{39}$$

as shown by the collapse plot of Fig. S5A. This is in accordance with the minimal seascape model, which predicts a Gaussian distribution P_τ(∆E) with variance ⟨D(τ)⟩. In contrast, stochastic relaxation of stabilizing selection generates broad non-Gaussian tails, increasing with divergence time τ, that are not observed in the data (Fig. S5C, bottom). Furthermore, the loss dynamics generates a nonlinear time-dependent divergence Ω(τ) (Fig. S5C, top), which is not observed in the data either (Fig. 2). We conclude that relaxed stabilizing selection alone cannot explain the observed statistics of Drosophila gene expression levels. This does not exclude that relaxed selection affects some genes in our data set, or more broadly genes with complete loss of function, which are underrepresented in the set of conserved orthologs.
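One simple way to quantify the collapse is to rescale each clade's differences by √D_C and measure the excess kurtosis: a Gaussian P_C(∆E) as in equation (39) gives values near 0, while broad tails give positive values. This diagnostic is our illustrative choice, not the statistic used for Fig. S5; the heavy-tailed mixture in the usage lines is a hypothetical stand-in for relaxed or punctuated selection.

```python
import random
from statistics import fmean


def rescaled_excess_kurtosis(diffs):
    """Excess kurtosis of clade-specific differences rescaled by sqrt(D_C).

    D_C is estimated as the mean squared difference, since P_C(dE) in
    eq. (39) has zero mean. A Gaussian gives values near 0; broad tails
    give positive values.
    """
    d_c = fmean(x * x for x in diffs)
    z = [x / d_c ** 0.5 for x in diffs]
    return fmean(v ** 4 for v in z) - 3.0


rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
# Hypothetical heavy-tailed mixture: 10% of differences with a 5x larger spread.
mixture = [rng.gauss(0.0, 5.0 if rng.random() < 0.1 else 1.0) for _ in range(100_000)]
print(rescaled_excess_kurtosis(gaussian), rescaled_excess_kurtosis(mixture))
```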

Punctuated directional selection. The Ornstein-Uhlenbeck dynamics of fitness peaks in the minimal seascape model (equation 27) describes the accumulation of small but continual changes of optimal expression levels. Larger peak shifts can be caused by discrete ecological events, including major migrations and speciations, and by gene-specific factors such as neo-functionalization (Lynch and Force, 2000). Here we model such events by a punctuated fitness seascape (Held et al., 2014): with a small rate υµ/(2r²), individual genes are subject to fitness peak shifts by an amount of order D0. This stochastic model differs from previous models of lineage-specific selection (Bedford and Hartl, 2009; Brawand et al., 2011; Butler and King, 2004; Hansen, 1997; Hansen et al., 2008; Kalinka et al., 2010; Rohlfs et al., 2014), where fitness peak shifts are constrained to known branch points of the phylogeny. Evolution in a punctuated fitness seascape generates a rescaled time-dependent divergence Ω(τ) of the form of equation (19); adaptation is signaled by the same term Ω_ad(τ) ≃ υτ/2 as in a minimal seascape with the same driving rate υ (Fig. S5D, top). To discriminate between the two models, we again use the distributions P_C(∆E) of clade-specific expression differences. In a punctuated seascape, these distributions have broad non-Gaussian tails, increasing with divergence time τ_C, that are not observed in the data (Fig. S5D, bottom). We conclude that large peak shifts are a subleading factor of expression changes in our data set.

Other modes of adaptation. Further evolutionary modes affecting gene expression include:

(a) Time-dependent stabilizing selection (Held et al., 2014). This type of selection can be modeled by a fitness seascape of the form (11) with time-dependent stabilizing strength c(t), given by a generalized Ornstein-Uhlenbeck process with constraint c(t) > c_min. The recurrent tightening of expression constraint driven by increases of c(t) is a mode of adaptation that is independent of fitness peak changes. The rescaled divergence Ω does not trace this mode: as long as the expression optimum E* is time-independent, the function Ω(τ) reaches an asymptotic value Ω_stab(c_min). This pattern is similar to evolutionary equilibrium in a single-peak fitness landscape and does not contain the term Ω_ad(τ) ≃ υτ/2 characteristic of fitness peak displacements.

(b) Adaptive gene turnover, including sub- and neo-functionalization after gene duplication (Lynch and Katju, 2004; Lynch et al., 2001), regulatory sequence duplication (Nourmohammad and Lassig, 2011), and de novo formation of genes (Tautz and Domazet-Loso, 2011). This mode is suppressed in our data set of conserved orthologous genes, but it is likely to be more prevalent in the complementary set of Drosophila genes.


(c) Adaptation by large-effect loci. Our diffusive evolutionary model assumes that expression levels are determined by multiple eQTL and that changes at individual loci have only moderate effects. The evolution of more general traits with few large-effect loci can be studied in simulations. We find that in diffusive fitness seascapes, mutations at large-effect loci are mostly deleterious. In this case, the population adapts to the gradual changes of the expression optimum predominantly by fixation of small-effect mutations, while large-effect substitutions are suppressed. In punctuated fitness seascapes, large-effect mutations can accelerate the adaptive response to large shifts of the fitness peak, but such shifts are a subleading factor of expression changes in our data set (see above).

A detailed investigation of these evolutionary modes is beyond the scope of this study. Importantly, however,they do not confound the inference of adaptation under directional selection reported here.

4. Analysis of specific gene classes

Codon usage. The effective number of codons, n, measures the redundancy of the genetic code within a given gene (Wright, 1990). This number takes values between 20 (each amino acid is encoded by a single codon) and 61 (all sense codons are used equally). Genes with specific codon usage (small n) tend to have higher expression than genes with broad codon usage (Ikemura, 1985; Shields et al., 1988). Here we compute the species-averaged effective number of codons, n^α = (1/7) Σ_i n^α_i, for all genes in our data set. We find a consistent dependence of expression adaptation on codon usage:

(a) Aggregate analysis by the time-dependent divergence Ω signals strongly reduced adaptation for genes with specific codon usage (n < 42) and enhanced adaptation for genes with broad codon usage (n > 50), compared to the average over all genes (Fig. 4A and Table 1).

(b) The pattern of expression divergence also signals strongly reduced adaptation for genes with a high average expression level, E^α = (1/7) Σ_i E^α_i > 0.9 (Table 1). Additionally, we compare the fitness flux of a gene to its codon adaptation index (CAI), which measures the similarity between the codon usage in a specific gene and the codon preference of highly expressed genes (Sharp and Li, 1987). Consistently, we find a reduced amount of fitness flux in genes with a high codon adaptation index (CAI ≳ 0.65); these genes are likely to be highly expressed.

(c) At the level of individual genes, there is a clear correlation between fitness flux Φ^α and effective codon number n^α (Fig. 4B).
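The bookkeeping in (a) to (c) can be sketched as follows: average n over the seven species, assign the codon-usage classes used in the text, and compute a rank correlation between Φ^α and n^α. The label "intermediate" for 42 ≤ n ≤ 50, the choice of Spearman's coefficient, and the per-species values in the usage line are our assumptions; the text does not specify which correlation measure underlies Fig. 4B.

```python
from statistics import fmean


def codon_class(n_eff):
    """Codon-usage classes from the text (the label 'intermediate' is ours)."""
    if n_eff < 42:
        return "specific"
    if n_eff > 50:
        return "broad"
    return "intermediate"


def average_ranks(xs):
    """1-based ranks, with tied values assigned their average rank."""
    order = sorted(range(len(xs)), key=xs.__getitem__)
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-species effective codon numbers for one gene (seven species).
n_alpha = fmean([44.0, 45.5, 43.0, 46.0, 44.5, 47.0, 45.0])
print(n_alpha, codon_class(n_alpha))
```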

Inference of adaptive sequence evolution. For the genes in our data set, we estimate the fractions of synonymous and non-synonymous polymorphic nucleotides (P_s and P_n) from the database of the Drosophila melanogaster Genetic Reference Panel (DGRP) (Mackay et al., 2012). The corresponding nucleotide divergence measures (D_s and D_n) are obtained from sequence alignments between the D. melanogaster and D. simulans reference genomes (Drosophila 12 Genomes Consortium et al., 2007). The McDonald-Kreitman test (McDonald and Kreitman, 1991; Smith and Eyre-Walker, 2002) signals adaptive evolution of amino acids if α_seq = (D_n P_s)/(D_s P_n) − 1 > 0. Fig. S6 shows the distribution of α_seq values for classes of genes with different amounts of expression adaptation, measured by the fitness flux Φ (equation 32). We find no correlation between these statistics: in each class, about 30% of the genes have α_seq > 0. This result does not contradict the correlation between gene expression divergence and amino acid divergence reported in ref. (Zhang et al., 2007), because an enhanced amino acid substitution rate measured by a D_n/D_s test (Li, 1993) may be caused by adaptive changes or by relaxation of negative selection.
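The statistic α_seq defined above has the same sign as the more familiar Smith and Eyre-Walker form α = 1 − (D_s P_n)/(D_n P_s), since α = 1 − 1/(1 + α_seq). A minimal sketch of both, plus the per-class fraction of genes with α_seq > 0; the inputs may be counts or per-site fractions, because normalizing D_n, P_n by the same non-synonymous site number (and D_s, P_s by the synonymous one) cancels in the ratio. Function names and the example tuples are hypothetical.

```python
def alpha_seq(dn, ds, pn, ps):
    """alpha_seq = (Dn * Ps) / (Ds * Pn) - 1, as defined in the text."""
    if ds <= 0 or pn <= 0:
        raise ValueError("Ds and Pn must be positive")
    return (dn * ps) / (ds * pn) - 1.0


def alpha_smith_eyre_walker(dn, ds, pn, ps):
    """Familiar form alpha = 1 - (Ds * Pn) / (Dn * Ps); same sign as alpha_seq."""
    if dn <= 0 or ps <= 0:
        raise ValueError("Dn and Ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


def fraction_adaptive(genes):
    """Fraction of genes with alpha_seq > 0; genes holds (Dn, Ds, Pn, Ps) tuples."""
    genes = list(genes)
    return sum(alpha_seq(*g) > 0 for g in genes) / len(genes)


# Hypothetical class of two genes: one with an excess of non-synonymous divergence.
print(fraction_adaptive([(20, 10, 5, 5), (5, 10, 5, 5)]))
```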


Analysis of functional gene classes. We use The Ontologizer (Bauer et al., 2008) for statistical analysis of functional enrichment in our data set. From a base set of all 6332 genes in our database, we identify enriched functional categories in the query sets of adaptively regulated genes (2NΦ^α > 4) and of genes with sex-specific adaptation of expression (2NΦ^α_mf > 4.5, see below). We use the calculation method Parent-Child-Union with Bonferroni correction and 1000 resampling steps. The enriched functional categories in these gene sets are reported in Tables S1 and S2 with a significance threshold P < 0.1 (corrected for multiple hypothesis testing). We list three main categories: biological processes, cellular components, and molecular functions. Each functional category is assigned to a functional cluster (in bold letters) that is inferred by REVIGO (Supek et al., 2011), using the semantic similarity measure SimRel with threshold 0.5. This clustering facilitates the interpretation of functional gene classes associated with adaptation of gene expression.

Sex-specific evolution and sex bias of expression. To quantify differences of gene expression between males and females, we define the sex-specificity trait of a given gene as the difference between its expression levels in males and in females (Zhang et al., 2007),

$$E^{\alpha}_{mf,i} = E^{\alpha}_{m,i} - E^{\alpha}_{f,i}. \tag{40}$$

We analyze these traits by the same methods as the sex-averaged expression levels E^α_i defined by equation (5). Specifically, we define the rescaled time-dependent divergence Ω_{mf,C} and the fitness flux 2NΦ_mf in analogy to equations (10) and (15), and we infer gene-specific maximum-likelihood values 2NΦ^α_mf in analogy to equation (11). We define two conceptually distinct measures of male-female differentiation:

(a) Sex-specific adaptation. In accordance with ref. (Zhang et al., 2007), we find that most genes of our data set have a well-conserved and often small sex specificity; these genes evolve their expression levels coherently between males and females. We use the rescaled fitness flux 2NΦ_mf to delineate coherent evolution of expression levels (i.e., conservation of the specificity trait) from sex-specific adaptation (i.e., adaptive changes of the male-female expression difference), as illustrated in Fig. 5A. A set of 1155 sex-specific adaptive genes is identified by the condition 2NΦ^α_mf > 4.5 (Table S2); we use a more stringent threshold than for Φ^α because the sex-specificity trait statistics have larger errors.

(b) Sex bias. We identify genes with male- and female-biased expression in Drosophila using the results of Assis et al. (2012), which are based on a number of statistical tests in the whole body and in the gonads of males and females in D. melanogaster and D. pseudoobscura. A gene is classified as sex-biased in expression if it is flagged by at least three of these tests, which produces a list of 450 male-biased and 499 female-biased genes. A related measure of bias within our data set is the species-averaged specificity trait, E^α_mf = (1/7) Σ_i E^α_{mf,i}.
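Equation (40), the species-averaged specificity trait, and the threshold rule for sex-specific adaptation can be sketched in a few lines. The sketch assumes one male and one female (log) expression value per species for a given gene; the function names are hypothetical, and 2NΦ^α_mf itself comes from the maximum-likelihood inference, which is not reproduced here.

```python
from statistics import fmean


def sex_specificity(male, female):
    """Per-species sex-specificity trait E_mf,i = E_m,i - E_f,i (eq. 40)."""
    if len(male) != len(female):
        raise ValueError("need one male and one female value per species")
    return [m - f for m, f in zip(male, female)]


def species_averaged_specificity(male, female):
    """E_mf^alpha = (1/7) sum_i E_mf,i over the seven species (any length works)."""
    return fmean(sex_specificity(male, female))


def is_sex_specific_adaptive(two_n_phi_mf, threshold=4.5):
    """Classification rule from the text: 2N Phi_mf^alpha > 4.5."""
    return two_n_phi_mf > threshold
```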

Our analysis establishes a relation between these two measures in our data set: strong sex-specific adaptation of expression occurs in male-biased, but not in female-biased genes. First, the aggregate rescaled divergence Ω_mf in male-biased genes shows evidence for adaptive evolution with a linear adaptive component υτ/2. Unbiased and female-biased genes have only a small average divergence in their sex-specificity trait that is of the order of the expression diversity (i.e., it lies within the error range of the observed expression levels), providing no evidence for adaptation (Fig. 5B). Second, the fitness flux Φ^α_mf is strongly enhanced for genes with large E^α_mf (Fig. 5C). Accordingly, 32% of male-biased genes are classified as sex-specific adaptive. Functional categories associated with sex-specific adaptation of expression are reported in Table S2.


5. Simulation tests

In-silico evolution of quantitative traits. We use a Fisher-Wright process for the evolution of populations along the Drosophila phylogeny of Fig. 1. A population consists of N individuals with genomes a^(1), …, a^(N). A genotype is an ℓ-letter sequence a = (a_1, …, a_ℓ) with alleles a_k = 0, 1 (k = 1, …, ℓ). It defines an expression level E(a) = Σ_{k=1}^{ℓ} E_k a_k with neutral variance D0 = (1/2) Σ_{k=1}^{ℓ} E_k². We use uniform single-locus effects E_k; our results are insensitive to the form of the effect distribution (Held et al., 2014). In each generation, the sequences undergo point mutations with probability µτ0, where τ0 is the generation time. The sequences of the next generation are then obtained by multinomial sampling with a probability proportional to [1 + τ0 f(E(a), t)], where the fitness function f(E, t) is given by equation (11). Simulations are performed with N = 100 and π0 = 0.1 for traits with ℓ = 100, uniform effects E_k = 1, and average fitness optimum Ē = 70. We use three different types of selection (for details, see ref. (Held et al., 2014)):

(a) Minimal fitness seascape. Before each reproduction step, a new optimal trait value E*(t + τ0) is drawn from a Gaussian distribution with mean E*(t)(1 − µτ0υ/(2r²)) + Ē µτ0υ/(2r²) and variance µτ0υD0/2.

(b) Fitness landscape. The optimal trait value E* is time-independent (Fig. S4A,B). In the model of gene-specific relaxation of selection (see section 3), the stabilizing strength of individual genes switches to a smaller value, c → 0.01c, with a small rate γ (Fig. S5C).

(c) Punctuated fitness seascape (see section 3). Before each reproduction step, a new, uncorrelated optimal trait value is drawn with probability µτ0υ/(2r²) from a Gaussian distribution with variance r²D0/2, where r² is a constant of order 1 (Fig. S5D).
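The Fisher-Wright step and the three peak dynamics (a) to (c) can be sketched in pure Python for a single lineage (phylogenetic branching omitted). This is not the authors' simulation code, and several choices are ours: a quadratic fitness f(E, t) = −c0 (E − E*(t))² stands in for equation (11), which is not reproduced in this section, with a hypothetical coefficient c0; the mutation probability µτ0 is applied per site; the weights 1 + τ0 f are clipped at zero; the punctuated redraw is centered on Ē; and υ = r² = 1 are illustrative. The mutation rate follows from π0 = 4µN.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    n: int = 100           # population size N
    length: int = 100      # number of trait loci, ell
    effect: float = 1.0    # uniform single-locus effect E_k
    e_bar: float = 70.0    # average fitness optimum E-bar
    pi0: float = 0.1       # sequence diversity, pi0 = 4*mu*N
    tau0: float = 1.0      # generation time
    upsilon: float = 1.0   # driving rate (illustrative value)
    r2: float = 1.0        # constant r^2 of order 1 (illustrative value)
    c0: float = 0.005      # HYPOTHETICAL coefficient of the assumed quadratic fitness

    @property
    def mu(self) -> float:
        return self.pi0 / (4 * self.n)

    @property
    def d0(self) -> float:
        # neutral variance D0 = (1/2) * sum_k E_k^2 for uniform effects
        return 0.5 * self.length * self.effect ** 2

    @property
    def peak_rate(self) -> float:
        # mu * tau0 * upsilon / (2 r^2)
        return self.mu * self.tau0 * self.upsilon / (2 * self.r2)


def peak_minimal(e_star, p, rng):
    """(a) Minimal seascape: one discretized Ornstein-Uhlenbeck step of the optimum."""
    a = p.peak_rate
    mean = e_star * (1 - a) + p.e_bar * a
    return rng.gauss(mean, (p.mu * p.tau0 * p.upsilon * p.d0 / 2) ** 0.5)


def peak_landscape(e_star, p, rng):
    """(b) Static landscape: the optimum never moves."""
    return e_star


def peak_punctuated(e_star, p, rng):
    """(c) Punctuated seascape: rare uncorrelated redraws (mean E-bar assumed)."""
    if rng.random() < p.peak_rate:
        return rng.gauss(p.e_bar, (p.r2 * p.d0 / 2) ** 0.5)
    return e_star


def trait(genome, p):
    return p.effect * sum(genome)  # E(a) = sum_k E_k a_k with uniform E_k


def next_generation(pop, e_star, p, rng):
    u = p.mu * p.tau0  # per-site flip probability (our assumption)
    mutated = [[1 - a if rng.random() < u else a for a in g] for g in pop]
    # weights 1 + tau0 * f(E) with ASSUMED f(E) = -c0 (E - E*)^2, clipped at zero
    w = [max(0.0, 1 - p.tau0 * p.c0 * (trait(g, p) - e_star) ** 2) for g in mutated]
    if sum(w) == 0:
        w = [1.0] * len(mutated)
    # multinomial sampling: N draws with replacement, proportional to w
    return [list(g) for g in rng.choices(mutated, weights=w, k=p.n)]


def simulate(generations, peak_step, p=None, seed=0):
    p = p or Params()
    rng = random.Random(seed)
    freq = p.e_bar / (p.length * p.effect)  # start with mean trait near E-bar
    pop = [[1 if rng.random() < freq else 0 for _ in range(p.length)]
           for _ in range(p.n)]
    e_star, mean_traits = p.e_bar, []
    for _ in range(generations):
        e_star = peak_step(e_star, p, rng)
        pop = next_generation(pop, e_star, p, rng)
        mean_traits.append(sum(trait(g, p) for g in pop) / p.n)
    return mean_traits, e_star


mean_traits, final_optimum = simulate(50, peak_minimal)
print(mean_traits[-1], final_optimum)
```

The per-site Python loops keep the sketch dependency-free but are slow; long runs along a full phylogeny would normally be vectorized.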

Validation of the probabilistic inference scheme. To test the performance of our inference scheme, we generate expression values E^α = (E^α_1, …, E^α_7) for individual genes with trait scales E²_{0,α} by Fisher-Wright simulations along the Drosophila phylogeny of Fig. 1. We use minimal fitness seascapes of the form (11) with input parameters (c_in, υ_in) and a sequence diversity π0 = 4µN = 0.05. We then infer maximum-likelihood posterior values (c_α, 2NΦ^α) by the probabilistic method described in section 2 (equation 32). In Fig. S7A, we plot the distribution of inferred fitness flux values 2NΦ^α against the input expectation value 2NΦ_in = 2 c_in υ_in τ_Dros (equation 16). The underlying simulations use a range of trait scales E²_{0,α} = 0.25–4.0 appropriate for log expression levels; the inference of Φ^α does not require knowledge of this scale (see section 2). Fig. S7B shows the corresponding distribution of inferred values c_α as a function of the input stabilizing strength c_in. These simulations use a uniform trait scale E²_{0,α} = 1 (inferring the actual scales requires sufficiently reliable gene-specific expression diversity data). The posterior values (Φ^α, c_α) provide reasonable, on average conservative estimates of the input model parameters (Φ_in, c_in). In particular, the inference of a significant fitness flux (Φ > 1/(2N)) is incompatible with evolution under static stabilizing selection (υ = 0, c > 0) or near neutrality (c ≃ 1), independently of the underlying model for the adaptive evolution of a molecular trait.
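For reference, the input expectation of equation (16) and the significance condition Φ > 1/(2N), which is equivalent to 2NΦ > 1, amount to the following arithmetic. The function names and the value of τ_Dros in the usage lines are hypothetical placeholders.

```python
def expected_scaled_flux(c_in, upsilon_in, tau_dros):
    """Input expectation 2N Phi_in = 2 c_in upsilon_in tau_Dros (eq. 16)."""
    return 2.0 * c_in * upsilon_in * tau_dros


def exceeds_drift_level(two_n_phi):
    """Phi > 1/(2N) is equivalent to 2N Phi > 1."""
    return two_n_phi > 1.0


tau_dros = 1.0  # hypothetical placeholder for the scaled tree depth
# A static landscape (upsilon_in = 0) never produces a significant expected flux.
print(exceeds_drift_level(expected_scaled_flux(5.0, 0.0, tau_dros)))
print(exceeds_drift_level(expected_scaled_flux(2.0, 0.5, tau_dros)))
```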

Robustness of the inference to trait epistasis. The analytical theory underlying our inference method (Held et al., 2014; Nourmohammad et al., 2013b) covers molecular quantitative traits with a linear genotype-phenotype map, E(a) = Σ_{k=1}^{ℓ} E_k a_k (see above). Here we extend this method to nonlinear traits of the form

$$E(\mathbf{a}) = \sum_{k=1}^{\ell} E_k a_k + \sum_{k<k'} E_{kk'}\, a_k a_{k'};$$

such nonlinearities are commonly referred to as trait epistasis. The strength of epistasis can be defined as the ratio of nonlinear and linear neutral trait variance,

$$\varepsilon^2 = \frac{\sum_{k<k'} E_{kk'}^2}{\sum_{k} E_k^2}.$$

Trait epistasis introduces only minor changes to the quantitative genetics theory of refs. (Held et al., 2014; Nourmohammad et al., 2013b). In particular, the quasi-neutral growth of the trait divergence is still given by equation (21), where ∆ is now the total genetic diversity of the trait.

To specifically test our inference method, we perform Fisher-Wright simulations as described above over a wide range of the parameter ε²; individual epistatic effects E_kk' are drawn from a Gaussian distribution with mean 0. In an ensemble of 6000 independently evolving traits, we record both the actual average fitness flux (equation 15) and the inferred fitness flux determined from the aggregate divergence Ω (equation 24). Neither quantity shows a systematic dependence on ε² (Fig. S7C), suggesting that our inference of adaptive evolution is not confounded by trait epistasis.
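The epistatic genotype-phenotype map and the strength ε² can be sketched as below. The text specifies only that the E_kk' are Gaussian with mean 0; choosing their standard deviation so that the expected ε² equals a target value is our assumption, as are the function names.

```python
import itertools
import random


def epistatic_trait(genome, linear, pairwise):
    """E(a) = sum_k E_k a_k + sum_{k<k'} E_kk' a_k a_k'.

    pairwise maps index pairs (k, k') with k < k' to E_kk'.
    """
    value = sum(e * a for e, a in zip(linear, genome))
    value += sum(e * genome[k] * genome[kp] for (k, kp), e in pairwise.items())
    return value


def epistasis_strength(linear, pairwise):
    """eps^2 = sum_{k<k'} E_kk'^2 / sum_k E_k^2."""
    return sum(e * e for e in pairwise.values()) / sum(e * e for e in linear)


def draw_pairwise(linear, eps2, rng):
    """Gaussian E_kk' with mean 0; the standard deviation (our assumption) is
    chosen so that the EXPECTED eps^2 equals eps2."""
    pairs = list(itertools.combinations(range(len(linear)), 2))
    sigma = (eps2 * sum(e * e for e in linear) / len(pairs)) ** 0.5
    return {pair: rng.gauss(0.0, sigma) for pair in pairs}


linear = [1.0] * 100  # uniform effects E_k = 1, as in the simulations above
pairwise = draw_pairwise(linear, eps2=0.2, rng=random.Random(1))
genome_rng = random.Random(2)
genome = [genome_rng.randint(0, 1) for _ in range(100)]
print(epistasis_strength(linear, pairwise), epistatic_trait(genome, linear, pairwise))
```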


Supplemental References

Aquadro, C., Bauer DuMont, V., and Reed, F. (2001). Genome-wide variation in the human and fruitfly: a comparison. Curr Opin Genet Dev, 11(6):627–634.

Bauer, S., Grossmann, S., Vingron, M., and Robinson, P. (2008). Ontologizer 2.0–a multifunctional tool for GO term enrichment analysis and data exploration. Bioinformatics, 24(14):1650–1651.

Beaulieu, J., Jhwueng, D.-C., Boettiger, C., and O'Meara, B. (2012). Modeling stabilizing selection: expanding the Ornstein-Uhlenbeck model of adaptive evolution. Evolution, 66(8):2369–2383.

Bolstad, B., Irizarry, R., Astrand, M., and Speed, T. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19(2):185–193.

Butler, M. and King, A. (2004). Phylogenetic comparative analysis: A modeling approach for adaptive evolution. American Naturalist, 164(6):683–695.

Chakraborty, R. and Nei, M. (1982). Genetic Differentiation of Quantitative Characters Between Populations or Species. I. Mutation and Random Genetic Drift. Genetical Research, 39(3):303–314.

de Vladar, H. and Barton, N. (2011). The statistical mechanics of a polygenic character under stabilizing selection, mutation and drift. J. R. Soc. Interface, 8(58):720–739.

Gilad, Y., Oshlack, A., and Rifkin, S. (2006a). Natural selection on gene expression. Trends Genet, 22(8):456–461.

Hansen, T., Pienaar, J., and Orzack, S. (2008). A comparative method for studying adaptation to a randomly evolving environment. Evolution, 62(8):1965–1977.

Huber, W., von Heydebreck, A., Sultmann, H., Poustka, A., and Vingron, M. (2002). Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics, 18 Suppl 1:S96–104.

Kalinka, A., Varga, K., Gerrard, D., Preibisch, S., Corcoran, D., Jarrells, J., Ohler, U., Bergman, C., and Tomancak, P. (2010). Gene expression divergence recapitulates the developmental hourglass model. Nature, 468(7325):811–814.

Lande, R. (1992). Neutral Theory of Quantitative Genetic Variance in an Island Model with Local Extinction and Colonization. Evolution, 46(2):381.

Le Corre, V. and Kremer, A. (2012). The genetic differentiation at quantitative trait loci under local adaptation. Mol. Ecol., 21(7):1548–1566.

Li, W. (1993). Unbiased estimation of the rates of synonymous and nonsynonymous substitution. Journal of Molecular Evolution, 36(1):96–99.

Lynch, M. and Force, A. (2000). The probability of duplicate gene preservation by subfunctionalization. Genetics, 154(1):459–473.

Lynch, M. and Katju, V. (2004). The altered evolutionary trajectories of gene duplicates. Trends Genet, 20(11):544–549.

Lynch, M. and Walsh, B. (1998). Genetics and analysis of quantitative traits. Sinauer Associates Inc.

Mackay, T., Richards, S., Stone, E., Barbadilla, A., Ayroles, J., Zhu, D., Casillas, S., Han, Y., Magwire, M., Cridland, J., et al. (2012). The Drosophila melanogaster Genetic Reference Panel. Nature, 482(7384):173–178.

Nourmohammad, A. and Lassig, M. (2011). Formation of regulatory modules by local sequence duplication. PLoS Comput. Biol., 7(10):e1002167.

Perry, G., Melsted, P., Marioni, J., Wang, Y., Bainer, R., Pickrell, J., Michelini, K., Zehr, S., Yoder, A., Stephens, M., et al. (2012). Comparative RNA sequencing reveals substantial genetic variation in endangered primates. Genome Res., 22(4):602–610.

Rohlfs, R., Harrigan, P., and Nielsen, R. (2014). Modeling gene expression evolution with an extended Ornstein-Uhlenbeck process accounting for within-species variation. Mol. Biol. Evol., 31(1):201–211.

Sharp, P. and Li, W. (1987). The codon Adaptation Index–a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res., 15(3):1281–1295.

Smith, N. and Eyre-Walker, A. (2002). Adaptive protein evolution in Drosophila. Nature, 415(6875):1022–1024.

Spitze, K. (1993). Population structure in Daphnia obtusa: quantitative genetic and allozymic variation. Genetics, 135(2):367–374.

Supek, F., Bosnjak, M., Skunca, N., and Smuc, T. (2011). REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS ONE, 6(7):e21800.

Tautz, D. and Domazet-Loso, T. (2011). The evolutionary origin of orphan genes. Nature Rev. Genet., 12(10):692–702.

Tsankov, A., Thompson, D., Socha, A., Regev, A., and Rando, O. (2010). The role of nucleosome positioning in the evolution of gene regulation. PLoS Biol., 8(7):e1000414.

Wright, S. (1943). Isolation by distance. Genetics, 28:114–138.

Wright, S. (1950). Genetical structure of populations. Nature, 166(4215):247–249.
