
The Evolution of Multimeric Protein Assemblages

Michael Lynch*,1

1Department of Biology, Indiana University

*Corresponding author: E-mail: [email protected].

Associate editor: Jeffrey Thorne

Abstract

Although the mechanisms by which complex cellular features evolve constitute one of the great unsolved problems of evolutionary biology, it is clear that the emergence of new protein–protein interactions, often accompanied by the diversification of duplicate genes, is involved. Using information on the levels of protein multimerization in major phylogenetic groups as a guide to the patterns that must be explained and relying on results from population-genetic theory to define the relative plausibility of alternative evolutionary pathways, a framework for understanding the evolution of dimers is developed. The resultant theory demonstrates that the likelihoods of alternative pathways for the emergence of protein complexes depend strongly on the effective population size. Nonetheless, it is equally clear that further advancements in this area will require comparative studies on the fitness consequences of alternative monomeric and dimeric proteins.

Key words: complex adaptation, dimer, genome evolution, heteromer, molecular evolution, random genetic drift.

Research article

Although the earliest cells must have been considerably simpler than any of today’s free-living organisms, the mechanisms by which complex cellular features emerge remain unclear. However, recent insights into the molecular architecture of protein complexes and the population-genetic conditions required for their establishment provide guidance as to the range of likely possibilities. Many of the protein complexes that comprise cellular features are assembled from subunits derived from the same gene or from loci related via gene duplication rather than from products of unrelated genes. This appears to be the case, for example, for the flagellum (Liu and Ochman 2007), centrioles (Carvalho-Santos et al. 2010), the nuclear pore complex (Alber et al. 2007), the spliceosome (Scofield and Lynch 2008), the cytoskeleton (Lowe and Amos 2009), the proteasome (Hughes 1997), chromatin-remodeling complexes (Monahan et al. 2008), ion channels (Dent 2010), nucleosomes (Malik and Henikoff 2003), the ribosome (Smits et al. 2007), and many other components of prokaryotic and eukaryotic cells. Thus, an essential first step for understanding the emergence of complex cellular adaptations is the development of a general theory for the evolution of protein–protein interactions.

Potential advantages to protein complex formation include increased structural size and diversity, reduced problems of folding single large proteins, and increased opportunities for allosteric regulation and protein activation (Marianayagam et al. 2004; Hashimoto et al. 2011). On the other hand, proteins with oligomerization potential can also be dangerous, human disorders involving the production of inappropriate protein aggregates (e.g., Alzheimer’s and Parkinson’s disease and amyotrophic lateral sclerosis) being prime examples (Chiti and Dobson 2009; Huntington 2011). Overexpression of genes encoding adhesion-prone proteins often encourages deleterious promiscuous protein–protein interactions (Semple et al. 2008; Vavouri et al. 2009), and such negative selection pressure seems to be reflected in the fact that many highly expressed genes have features that minimize the propensity for self-aggregation (Tartaglia et al. 2007).

This fine line between adaptation and potentially malfunctional protein aggregation raises the possibility that some oligomeric associations may not have arisen initially as de novo adaptations, but as simple compensatory mechanisms for ameliorating defects incurred by individual subunits. There is, for example, a significant tendency for the proteins of larger more complex organisms (with smaller effective population sizes) to be more adhesive and hence more likely to engage in promiscuous protein–protein interactions (Fernández et al. 2004; Fernández and Lynch 2011). This situation, which results from the accumulation of amino acid changes that reduce the protection of backbone hydrogen bonds from water attack, may secondarily promote the evolution of multimeric complexes with better overall wrapping. Relative to the situation in prokaryotes, eukaryotic proteins also exhibit substantial increases in the lengths of interdomain regions (Wang et al. 2011), which presumably influences the tendency to engage in intramolecular versus intermolecular interactions. It is also notable that universal mutation pressure toward A/T nucleotides (Hershberg and Petrov 2010; Hildebrand et al. 2010; Lynch 2010a) encourages a bias toward more hydrophobic residues with A/T-rich codons (Knight et al. 2001; Bastolla et al. 2004), which might further reduce the ability of species with relatively small population sizes to resist the accumulation of mutations that encourage protein adhesion.

Background

Approximately 65% of proteins in prokaryotes and 55% of those in eukaryotes exist as dimers or higher-order

© The Author 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Mol. Biol. Evol. 29(5):1353–1366. 2012 doi:10.1093/molbev/msr300 Advance Access publication December 5, 2011

Downloaded from http://mbe.oxfordjournals.org/ at Indiana University Bloomington Libraries on October 7, 2013


FIG. 1. Left: Relative frequencies of the three main classes of protein structures deposited in 3D Complex.org v2.0 (Levy et al. 2006, 2008), based primarily on taxa for which at least 20 records were available. Sample sizes are 3545, 628, and 4125 for eubacteria, unicellular eukaryotes, and vertebrates, respectively. Right: Frequency distributions of levels of homomeric complexity (including monomers) for six major taxonomic groups, drawn from the same database. Although the data are more limited, the distributions are similar for heteromers.

complexes (not including very high-order structures such as the cytoskeleton, ribosomes, etc.) (fig. 1, left). Among complexes, homomers are about 4× more frequent than heteromers in unicellular species, whereas the two types are equally frequent in vertebrates. As a consequence, heteromers constitute ∼10% of proteins in unicellular species but nearly 30% in vertebrates. In all phylogenetic groups, there is also a strong nearly negative exponential frequency distribution for the numbers of subunits within protein complexes (fig. 1, right). Although there are many possible explanations for such distributions, such patterns can arise as natural outcomes of a steady-state process in which subunits are stochastically gained and lost (Lynch 2007).

These observations are subject to bias, as they are derived from the protein structures that happen to be deposited in the Protein Data Bank (PDB), which need not constitute a random sample. Nevertheless, the fact that very similar distributions are found for Eubacteria and Archaebacteria, as well as for invertebrates and vertebrates, suggests that the results in figure 1 provide reasonable first-order approximations of the dispersion of protein-complex sizes in the major domains of life.

At least three issues are relevant to understanding the conditions under which multimeric structures might evolve. First, colocalization of gene products is an essential starting point for the coevolution of protein subunits. The frequent subcellular localization of specific mRNAs promotes spatial aggregation (Holt and Bullock 2009), and the innate tendency for proteins to self-aggregate generates additional oligomerization potential (Ispolatov et al. 2005; Wright et al. 2005; Lukatsky et al. 2007; Andre et al. 2008). By physically tying two loci together, gene fusion provides still another powerful way to facilitate mutual adhesion between two protein domains, with subsequent gene fission potentially leading to the evolution of a heterodimeric relationship (Kuriyan and Eisenberg 2007). Alternatively, loop shortening between two interfacing domains of a monomeric protein can promote homodimer formation when the ancestral binding contacts can no longer access each other (Bennett et al. 1994).

Second, the establishment of stable protein–protein interactions often involves the substitution of just a small number of amino acids or the insertion/deletion of a small stretch of residues, with individual mutations often having mildly deleterious effects unless compensated by changes at other locations (Jones and Thornton 1995; Bogan et al. 1998; Janin et al. 1998; Hashimoto et al. 2010). Protein complexes are generally stabilized by hydrophobic interactions and/or hydrogen bonding, and the addition of just a few pairs of appropriately spaced residues (such as Arg–Asp, Lys–Glu, or Cys–Cys) or the elimination of a few unfavorable contacts (such as Arg–Arg or Glu–Glu) may be sufficient to generate a functional interface. Likewise, the alteration of a few key residues can cause a multimer to revert to a monomeric state.

Third, as products of more than one genetic locus, heteromers are expected to incur elevated mutation rates to defective structures relative to homomers. Heteromers may also experience stoichiometric imbalance if the source genes are expressed at different levels, leading to the circulation of potentially harmful monomeric subunits. Thus, if a favorable homomeric interaction is to be displaced evolutionarily by a heteromeric structure, the latter must either enjoy a net advantage or the magnitude of genetic drift must exceed the net mutational and selective disadvantage (Lynch et al. 2001; Lynch 2007). Drawing from these observations, an


attempt is made below to identify the population-genetic conditions that most plausibly promote the emergence of multimeric protein assemblages. The focus will be on the evolution of dimers, although the approaches taken should have more general utility, as most higher-order complexes are derived from the dimerization of lower-order structures (e.g., tetramers are often dimers of dimers). It is assumed that a protein starts its evolutionary history as a functional monomer, with certain kinds of amino acid substitutions inducing structural modifications that cause the monomeric subunit to become more prone to adhesion with other colocalized proteins with similar features. Reliable aggregation with a complementary subunit may simply maintain the original level of protein efficiency, increase functionality beyond that of the initial monomer, and/or yield a multimer with entirely novel capabilities.

Although these are the general steps in the transformation of a monomer to a dimer, there are several dimensions to the problem. First, a newly arisen dimer may simply result from the aggregation of monomeric subunits of the original protein (homodimerization) or be composed of proteins from divergent genes (e.g., heterodimerization involving the members of a duplicate-gene pair). Second, the emergence of a dimeric structure may require the accumulation of one or more neutral or deleterious intermediate-state mutations. Third, for situations involving gene duplication, the potential exists for one locus to initially evolve a neofunctionalized homodimer, while the original locus maintains the initial function conferred by the monomeric state, with still another gene duplication allowing the two subunits of the homodimer to secondarily diverge to a more nonsymmetric (heterodimeric) state (Ispolatov et al. 2005; Pereira-Leal et al. 2007; Reid et al. 2010).

Results

The initial focus will be on the origin of homodimers (the aggregation of two protein subunits derived from the same locus), both because of the high frequency of such aggregates and because such a condition provides a likely launching pad for the emergence of heterodimers following gene duplication and subsequent divergence. Three alternative models will be considered, with emphasis on the dependence of the mean time required for a population-level transition from a monomeric to a homodimeric state on population size, mutation rate, and the selective advantages/disadvantages of the stepwise mutations contributing to the final adaptation. Most of the derived expressions can be readily modified to describe the reversion of dimers to monomers.

The theory, which is supported by computer simulation results in the supplemental material (Supplementary Material online), is developed initially in a manner that treats population sizes and mutation rates as independent parameters. However, although such an approach follows past tradition, empirical evidence implies that mutation rates are strongly negatively correlated with effective population sizes across the tree of life, with the single-site rate of base-substitution mutation per generation being approximated by $0.000025N^{-0.6}$ in observed taxa, where N is the effective population size (Lynch 2010c). Thus, to determine the natural scaling of the time to establishment with effective population sizes and also to reduce the dimensionality of the analyses, the results in the main text will rely on this relationship. Unless otherwise stated, it is assumed that the genetic effective population size is equal to the actual number of reproductive adults.
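The mutation-rate scaling just described is easy to tabulate. The following is a minimal sketch, assuming only the power law quoted above; the function name and the example population sizes are illustrative choices, not taken from the paper.

```python
def mutation_rate_per_site(n_e: float) -> float:
    """Per-site base-substitution rate per generation, u = 2.5e-5 * N^(-0.6).

    n_e is the effective population size N; the coefficient and exponent
    are the values quoted in the text (Lynch 2010c).
    """
    if n_e <= 0:
        raise ValueError("effective population size must be positive")
    return 2.5e-5 * n_e ** -0.6


if __name__ == "__main__":
    # Illustrative population sizes (hypothetical, chosen to span many orders).
    for n_e in (1e4, 1e6, 1e8):
        print(f"N = {n_e:.0e}  u = {mutation_rate_per_site(n_e):.3e}")
```

Because u falls with N more slowly than 1/N, the population-wide mutational input 2Nu still rises with N, roughly as the 0.4 power of N.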

The Domain-Swapping Model

A frequently invoked mechanism for the origin of homodimers is encapsulated in the domain-swapping model (Bennett et al. 1994), whereby a monomeric protein with two interfacing domains is physically altered in such a way that binding between domains within the same polypeptide chain is prevented and can only be accomplished by reciprocal domain sharing between two monomeric subunits (fig. 2). An attractive feature of this model is the presence of complementary binding domains in the ancestral protein, and convincing examples of domain-swapping proteins exist (Liu and Eisenberg 2002), but the conditions required for such an evolutionary transition remain unclear. Moreover, it is plausible that the process is bidirectional, with mutations in domain-swapping dimers sometimes causing reversions to the monomeric condition.

Here, we consider the simplest case in which an allele for the domain-swapping protein arises by a single mutation that denies self-accessibility within the ancestral monomer (such as a deletion in a loop between the two domains of the ancestral protein). If the dimer is beneficial, such a mutant allele can readily proceed to fixation by positive selection in a haploid species. However, in a diploid outcrossing species, the mutant allele will initially be present exclusively in heterozygotes, where fitness may be compromised by the production of malfunctional composites of the two alternative monomeric subunits, for example, chimeras with unbound domains. Such heterozygote disadvantage will impose a barrier to the spread of the mutant allele, as this requires the population to pass through a bottleneck in mean fitness, a highly unlikely event unless the power of random genetic drift is substantially greater than the heterozygote disadvantage. The magnitude of the latter will presumably depend on the rate of folding of the ancestral monomer and the overall cellular concentration of both allelic products, as slow folding and/or high concentration should magnify the likelihood of encounters between the two alternative proteins.

In this and all remaining analyses, we will assume an idealized random-mating Wright–Fisher population containing N diploid individuals (i.e., discrete generations with consecutive phases of selection, mutation, and random genetic drift), inquiring as to the time that is expected to elapse between a starting point of a population monomorphic for allele A, which produces a monomer, and a final state of monomorphism for domain-swapping allele a. This total time for establishment, denoted as $t_e$, can be viewed as the sum of the arrival time for the first mutant a allele


FIG. 2. The mean time to establishment of a homodimeric complex for various values of heterozygote disadvantage (δ) and homozygote advantage (s) under the domain-swapping model. The mutation rate per site is assumed to scale negatively with the effective population size, as described in the text, with the mutation rate to a domain-swapping allele (u) being assumed to equal 10× the single-site base substitution rate. The solid points denote the threshold effective population size beyond which the time to establishment of the homodimer becomes exponentially large, as defined by equation (4). The small discontinuities in the lowest curves result from the use of alternative approximations noted in the supplementary material (Supplementary Material online).

destined to fixation and the subsequent time for such an allele to progress from initial frequency 1/(2N) to 1.0.
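The fate of a single new mutant in such a Wright–Fisher population can be explored by direct simulation. The sketch below is an illustration under stated assumptions, not the simulation code used for the paper: it takes genotype fitnesses of 1, 1 − δ, and 1 + s for AA, Aa, and aa (the scheme used in this section), follows one new copy of allele a with no further mutation, and uses hypothetical function names throughout.

```python
import random


def select(p: float, delta: float, s: float) -> float:
    """Frequency of allele a after selection, with random mating.

    Genotype fitnesses: AA = 1, Aa = 1 - delta, aa = 1 + s.
    """
    q = 1.0 - p
    mean_w = q * q + 2.0 * p * q * (1.0 - delta) + p * p * (1.0 + s)
    return (p * q * (1.0 - delta) + p * p * (1.0 + s)) / mean_w


def drift(p: float, n: int, rng: random.Random) -> float:
    """Binomial sampling of 2N gene copies (random genetic drift)."""
    copies = sum(rng.random() < p for _ in range(2 * n))
    return copies / (2 * n)


def fate_of_new_mutant(n: int, delta: float, s: float, rng: random.Random):
    """Follow one new copy of a until loss or fixation.

    Returns (fixed, generations elapsed).
    """
    p = 1.0 / (2 * n)
    generations = 0
    while 0.0 < p < 1.0:
        p = drift(select(p, delta, s), n, rng)
        generations += 1
    return p == 1.0, generations


def estimate_fixation_probability(n, delta, s, trials, seed=1):
    """Fraction of independent new mutants that reach fixation."""
    rng = random.Random(seed)
    fixed = sum(fate_of_new_mutant(n, delta, s, rng)[0] for _ in range(trials))
    return fixed / trials


if __name__ == "__main__":
    # Small hypothetical population so that the run finishes quickly.
    print(estimate_fixation_probability(n=20, delta=0.05, s=0.1, trials=2000))
```

Sampling each gene copy individually keeps the sketch free of external dependencies, at the cost of speed; it is practical only for small N.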

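Letting the fitnesses of genotypes AA, Aa, and aa be 1, 1 − δ, and 1 + s, respectively, the probability of fixation of an allele a exhibiting heterozygote inferiority is given by

$$\phi_f = \frac{\operatorname{erf}\{[p_0 - (0.5/(1+\omega))]\sqrt{4\theta(1+\omega)}\} + \operatorname{erf}\{\sqrt{\theta/(1+\omega)}\}}{\operatorname{erf}\{[1 - (0.5/(1+\omega))]\sqrt{4\theta(1+\omega)}\} + \operatorname{erf}\{\sqrt{\theta/(1+\omega)}\}}, \tag{1}$$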

where $p_0$ is the initial frequency of a, θ = Ns, ω = s/(2δ) (Walsh 1982), and

$$\operatorname{erf}(x) = \int_0^x e^{-y^2}\,dy,$$

which is readily solved with numerical approximations in Abramowitz and Stegun (1964). For generations in which a mutation arises, the initial frequency of mutant a alleles has expectation $p_0 \simeq [1/(2N)] + u$, where u is the mutation rate from A to a, which reduces to $\simeq 1/(2N)$ when $2Nu \ll 1$, and $\simeq u$ when $2Nu \gg 1$. With 2Nu new mutations arising per generation, the average arrival time of the first mutation destined to fix is $\simeq 1/(2Nu\phi_f)$, and the expected time to complete establishment of the domain-swapped allele is then

$$t_e \simeq \frac{1}{2Nu\phi_f} + t_f, \tag{2}$$

where $t_f$ is the mean fixation time (approximated with the expression for a recessive mutation presented in the supplementary material [Supplementary Material online]). Results from computer simulations of newly arisen mutations in the classical Wright–Fisher framework demonstrate that these formulations are quite accurate (supplementary fig. 1, Supplementary Material online).
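A fixation probability for an underdominant allele, and the resulting arrival time, can also be computed numerically. The sketch below is an independent check rather than a transcription of the expression above: it assumes the general diffusion result, in which the fixation probability is the ratio of the integrals of G(x) from 0 to p0 and from 0 to 1, with G(x) = exp[−2N(s + 2δ)(x − p̂)²] and p̂ = δ/(s + 2δ) for fitnesses 1, 1 − δ, and 1 + s. That parameter grouping is an assumption of this sketch, as are the function names. Python's `math.erf` carries a 2/√π normalization that cancels in the ratio.

```python
import math


def fixation_probability(p0: float, n: float, delta: float, s: float) -> float:
    """Diffusion fixation probability of allele a from initial frequency p0.

    Assumes delta >= 0 and s >= 0 (heterozygote disadvantage delta,
    homozygote advantage s) and a diploid population of size n.
    """
    if delta < 0.0 or s < 0.0:
        raise ValueError("this sketch assumes delta >= 0 and s >= 0")
    k = s + 2.0 * delta
    if k == 0.0:
        return p0  # neutral limit
    c = math.sqrt(2.0 * n * k)
    p_hat = delta / k  # frequency at which mean fitness is minimized
    if p0 <= p_hat:
        # erfc form avoids cancellation when both arguments are large
        numerator = math.erfc(c * (p_hat - p0)) - math.erfc(c * p_hat)
    else:
        numerator = math.erf(c * (p0 - p_hat)) + math.erf(c * p_hat)
    denominator = math.erf(c * (1.0 - p_hat)) + math.erf(c * p_hat)
    return numerator / denominator


def arrival_time(n: float, u: float, phi_f: float) -> float:
    """Mean wait, in generations, for the first mutation destined to fix."""
    if phi_f == 0.0:
        return math.inf
    return 1.0 / (2.0 * n * u * phi_f)


if __name__ == "__main__":
    n, u = 1000, 1e-7  # hypothetical values
    for delta in (0.0, 0.001, 0.01):
        phi = fixation_probability(1.0 / (2 * n), n, delta, s=0.01)
        print(delta, phi, arrival_time(n, u, phi))
```

The mean fixation time is omitted here because its expression is given only in the supplementary material.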

Of special interest is the population size ($N^*$) above which the efficiency of selection is so strong that there is effectively no possibility of passing through the fitness bottleneck imposed by heterozygotes. With heterozygotes having a fitness reduction of δ, homozygotes an advantage of s, and p being the frequency of the domain-swapping allele, mean population fitness is defined as $\bar{W} = 1 - 2p(1-p)\delta + p^2 s$. The latter reaches a minimum when $\hat{p} = \delta/(s + 2\delta)$, with $p < \hat{p}$ implying net selection against and $p > \hat{p}$ net selection in favor of allele a. Thus, the key issue is whether a mutant allele can drift from initial frequency $p_0$ to $\hat{p}$. As a first-order approximation, when p is small and the frequency of aa homozygotes is negligible, allele a acts like a deleterious mutation being removed from the population at rate δ. Transition to the domain-swapping state then requires allele a to drift a distance $\hat{p}$ against a persistent selection gradient of ∼δ, the probability of which is given by the diffusion approximation


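$$\phi_m = \frac{e^{2\delta} - 1}{e^{4N\delta\hat{p}} - 1}. \tag{3}$$

Noting that $\phi_m \to 0$ as $4N\delta\hat{p}$ becomes large, the population size barrier to the establishment of the domain-swapping protein is then

$$N^* \simeq \frac{1}{\hat{p}\delta} = \frac{s + 2\delta}{\delta^2}. \tag{4}$$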
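The quantities leading up to this threshold can be evaluated directly. The following is a minimal sketch of the mean-fitness function, its minimizing frequency p̂ = δ/(s + 2δ), and equations (3) and (4) as given in the text; the function names, the overflow cutoff, and the numerical values in the usage example are assumptions made for illustration.

```python
import math


def mean_fitness(p: float, delta: float, s: float) -> float:
    """Mean population fitness, W = 1 - 2p(1 - p)*delta + p^2 * s."""
    return 1.0 - 2.0 * p * (1.0 - p) * delta + p * p * s


def p_hat(delta: float, s: float) -> float:
    """Allele frequency at which mean fitness is minimized."""
    return delta / (s + 2.0 * delta)


def phi_m(n: float, delta: float, s: float) -> float:
    """Equation (3): probability of drifting to p_hat against selection."""
    exponent = 4.0 * n * delta * p_hat(delta, s)
    if exponent > 700.0:  # math.expm1 would overflow; the ratio is ~0
        return 0.0
    return math.expm1(2.0 * delta) / math.expm1(exponent)


def n_star(delta: float, s: float) -> float:
    """Equation (4): threshold population size, (s + 2*delta) / delta^2."""
    return (s + 2.0 * delta) / delta ** 2


if __name__ == "__main__":
    delta, s = 0.01, 0.02  # hypothetical selection coefficients
    print("p_hat  =", p_hat(delta, s))
    print("N*     =", n_star(delta, s))
    for n in (1e2, 1e3, 1e4, 1e5):
        print(f"N = {n:.0e}  phi_m = {phi_m(n, delta, s):.3e}")
```

At N = N*, the exponent 4Nδp̂ equals 4, so the threshold marks the onset of the exponential decline in this probability rather than a sharp cutoff.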

Although these results provide a framework for evaluating the mean time to establishment of an underdominant mutation for arbitrary magnitudes of drift, mutation, and selection, as noted above, the strong negative correlation between mutation rates and effective population sizes across the tree of life will cause the mean time to establishment in natural populations to scale more weakly with population size than would be the case with independent N and u. Nevertheless, even when this issue is accounted for, it is still clear that $t_e$ increases with population size under the domain-swapping model (fig. 2).

Under this model, a transition from a monomeric to a dimeric state is most plausible under two sets of conditions: 1) a haploid population in which heterozygote disadvantage is never experienced and 2) a diploid population in which selection against heterozygotes is inefficient, either because the effective population size is small or because the reduction in heterozygote fitness is negligible. In both cases, the mean time to establishment scales with $(Nu\phi_f)^{-1}$ (preceded by a factor of 0.5 in the case of diploidy). Thus, assuming negligible selection on heterozygotes, if the selective advantage for homozygotes (or the beneficial haploid state) substantially exceeds the power of drift ($Ns \gg 1$), then $\phi_f \simeq 2s$, and $t_e$ scales with $(Nus)^{-1}$, which implies proportionality to $N^{-0.4}$ when the mutation-rate scaling noted above is employed. This result even holds when the effective population size (N) is unequal to the actual population size ($N_a$), as in this case the mutational input is proportional to $N_a u$ and $\phi_f$ to $sN/N_a$, the product again being Nus. If, on the other hand, the power of drift overwhelms even positive selection for homozygotes, $\phi_f \simeq 1/(2N)$, and $t_e$ scales simply with $u^{-1}$, which is $\propto N^{0.6}$.

Notably, these rather different scalings are on a per-generation time scale, and a more meaningful comparator across the tree of life ought to involve absolute time units. Because organisms with small N generally have much longer generation times than those with large N (e.g., generation times of land plants and metazoans are typically on the order of weeks to years, whereas those for microbes are on the order of hours to days), the scaling of $t_e$ with N in absolute time units must be more negative than that based on generations. The approximate scaling of generation length with $\sim N^{-0.8}$ suggested in Lynch (2010b) implies a scaling of $t_e$ in the above cases between $N^{-1.2}$ and $N^{-0.2}$. Thus, if an advantageous allele conferring domain-swapping ability is able to proceed to fixation with no significant inhibition from heterozygote disadvantage, such an architecture is expected to emerge most rapidly in species with large N.
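The exponents quoted for the two limiting cases follow from power-law bookkeeping. The worked steps below assume only the scalings stated in the text, u ∝ N^(−0.6) for the mutation rate and N^(−0.8) for generation length; the symbol g for generation length is introduced here for convenience.

$$
\begin{aligned}
\text{strong selection } (Ns \gg 1):\quad & t_e \propto (Nus)^{-1} \propto \left(N \cdot N^{-0.6}\right)^{-1} = N^{-0.4}, &\quad t_e\, g &\propto N^{-0.4} \cdot N^{-0.8} = N^{-1.2}, \\
\text{drift dominated } (\phi_f \simeq 1/(2N)):\quad & t_e \propto u^{-1} \propto N^{0.6}, &\quad t_e\, g &\propto N^{0.6} \cdot N^{-0.8} = N^{-0.2}.
\end{aligned}
$$

In both cases the exponent in absolute time is negative, which is why establishment is expected to be fastest at large N when heterozygote disadvantage is negligible.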

In contrast, if there is significant enough heterozygote disadvantage that $N \gg (s + 2\delta)/\delta^2$, it is virtually impossible for a domain-swapped allele to proceed to fixation. Clearly, a knowledge of the fitness consequences of mixtures of the products of ancestral and derived alleles is essential to resolving how readily domain swapping can evolve in diploid populations. There appears to be no direct evidence on the matter of whether domain-swapping dimers confer greater or lesser fitness than monomers, and if $s = 0$, the threshold barrier to domain-swapping evolution is just $N^* = 2/\delta$.

The theory presented above is entirely general in that a simple change in definitions of terms is all that is required for estimating the reverse transition of homodimer to monomer. In principle, a lineage might wander back and forth between alternative states, with the long-term probability of being monomeric equaling $r_{dm}/(r_{dm} + r_{md})$, where $r_{dm}$ and $r_{md}$ are the rates of transition from dimer to monomer and vice versa.
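As a minimal illustration of the long-term occupancy formula, the sketch below treats the lineage as a two-state chain and checks the closed form against direct iteration. The per-generation transition rates used in the example are arbitrary placeholders, not estimates from the paper, and the function names are hypothetical.

```python
def prob_monomer(r_md, r_dm):
    """Long-term probability of the monomeric state, r_dm / (r_dm + r_md)."""
    return r_dm / (r_dm + r_md)


def iterate_two_state(r_md, r_dm, steps=100_000, p_monomer=1.0):
    """Iterate the monomer probability of a two-state chain, one generation per step."""
    p = p_monomer
    for _ in range(steps):
        # Stay monomeric with probability 1 - r_md, or return from the dimeric state.
        p = p * (1.0 - r_md) + (1.0 - p) * r_dm
    return p


# Placeholder rates: monomer -> dimer twice as fast as dimer -> monomer.
print(prob_monomer(r_md=0.002, r_dm=0.001))
print(iterate_two_state(r_md=0.002, r_dm=0.001))
```

With these placeholder rates, both calculations converge on a monomer probability of one third.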

The Compensatory Mutation Model

A second scenario by which homodimers might arise involves two (or more) changes that are individually neutral or deleterious but together alter the monomeric structure in a way that encourages stable dimeric complexation. In contrast to the situation with the domain-swapping model, here the ancestral monomer is nonadhesive and therefore not compromised by the presence of derived alleles within heterozygous carriers. Consequently, even though more mutations are required to make the transition to a homodimeric state, the population need not experience a bottleneck in mean fitness because recurrently introduced intermediate-step alleles are either neutral or kept rare by selection while also serving as substrate for secondary mutations to beneficial final-step alleles, which then enjoy a clear path to fixation by positive selection.

We start with a two-site model with no recombination, with single-site mutations experiencing a reduction in relative fitness equal to $\delta$, and dimerization of the double mutant causing a fitness increase of $s$ per gene copy (fig. 3). A diploid random-mating population is again assumed, with the genotypic fitnesses being determined by the additive effects of the two alleles. Single mutations of relevance to the final adaptation are assumed to arise at rate $u$ per gene, summed over all relevant sites. Back mutations are ignored, as we assume that multiple amino acid alterations can lead to the relevant (and functionally equivalent) first-step changes so that the forward mutation rate dominates the evolutionary process. Mutations with major deleterious effects are ignored as well, as these will remove all alleles from the population at equal rates. Finally, enough potential sites are assumed to be involved in the initial dimerization process that the mutation rate can be assumed to be the same at both steps in the process.

FIG. 3. Expected number of generations to establishment of a homodimer requiring two mutational changes under the assumption that the mutation rate per site scales negatively with the effective population size, as described in the text, with $u$ being assumed to be equal to 10× the per-site rate. Results are given for three levels of selective disadvantage of the first-step mutants ($\delta$) and four levels of advantage for the second-step mutants ($s$), using the expressions presented in the text. Note that the times to establishment at small population sizes are lower with $\delta > 0$ than with $\delta = 0$ because first-step alleles are assumed to be present at the frequency defined by selection–mutation balance in the former case but are required to arise by new mutations in the latter case. In the lower right, only monomeric subunits with two complementary alterations assemble into dimers. The small discontinuities in the upper left curves result from the use of alternative approximations noted in the supplementary material (Supplementary Material online).

As in the case of the domain-swapping model, the rate of establishment of the homodimer under this model depends on the population size (Weissman et al. 2009; Lynch 2010b; Lynch and Abegg 2010). Starting with the case of neutral intermediates ($\delta = 0$), if the population is sufficiently small, the evolutionary dynamics will proceed in two discrete steps, with a first-step mutant becoming fixed by drift prior to the arrival of a successful second-step mutation. The rate of establishment by this sequential path is equal to the reciprocal of the sum of the expected arrival times for the mutations contributing to the adaptation, which in the case of neutral intermediates is

$$ r_s \simeq \frac{u}{1 + 1/(2N\phi_b)}, \qquad (5) $$

where

$$ \phi_b = \frac{1 - e^{-2s}}{1 - e^{-4Ns}} \qquad (6) $$

is the probability of fixation of a beneficial (second-step) mutation with additive effects, initially in single copy (Kimura 1962). This expression essentially fully describes the rate of establishment when $N \ll 1/\sqrt{8u\phi_b}$, but this is a potentially narrow domain, as with $u = 10^{-7}$ and $\phi_b = 0.01$, $N$ must be $< 10^4$.
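Equations (5) and (6), together with the population-size bound just quoted, can be evaluated numerically. The sketch below is a direct transcription under the stated assumptions (additive effects, neutral intermediate); the function names are illustrative, and `math.expm1` is used only to keep the fixation probability accurate when $s$ is very small.

```python
import math


def phi_b(N, s):
    """Equation (6): fixation probability of a single-copy additive beneficial allele."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-2.0 * s) / -math.expm1(-4.0 * N * s)


def rate_sequential(N, u, s):
    """Equation (5): sequential-fixation rate with a neutral intermediate."""
    return u / (1.0 + 1.0 / (2.0 * N * phi_b(N, s)))


def sequential_domain_limit(u, phi):
    """Population size below which equation (5) dominates: 1 / sqrt(8 u phi_b)."""
    return 1.0 / math.sqrt(8.0 * u * phi)


# Values quoted in the text: u = 1e-7 and phi_b = 0.01.
print(sequential_domain_limit(1e-7, 0.01))
print(phi_b(N=1e6, s=0.01), rate_sequential(N=1e3, u=1e-7, s=0.01))
```

The bound evaluates to roughly $1.1 \times 10^4$, consistent with the statement that $N$ must be below about $10^4$, and for large $Ns$ the fixation probability approaches $2s$.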

With larger population sizes, prior to fixation, first-step alleles can lead to the establishment of second-step mutations by a process sometimes called stochastic tunneling or the rescue effect. Some of the basic results have been derived in different contexts by Walsh (1995), Lynch et al. (2001), Komarova et al. (2003), and Iwasa et al. (2004). Consider a newly arisen single-site mutation, assumed to be effectively neutral so that the initial dynamics are governed entirely by random genetic drift, and the probability of being lost is $1 - (2N)^{-1}$ unless a secondary mutation to a beneficial function can propel a sublineage to fixation. As the details for this rescue effect have been worked out previously, it is simply stated here that the probability that a first-step mutation acquires a beneficial (homodimerizing) secondary mutation destined to fixation is approximately equal to $\sqrt{u\phi_b}$. Noting that $2Nu$ first-step mutations arise per generation, the expected rate of appearance of second-site mutations destined to fixation by stochastic tunneling is then $2Nu\sqrt{u\phi_b}$, with the overall rate being

$$ r_t \simeq \sqrt{u\phi_b}\left(\frac{1}{2Nu} + 1\right)^{-1}, \qquad (7a) $$

where the second term (negligible when $2Nu \ll 1$) accounts for the additional waiting time for the second mutation (Weissman et al. 2009). For populations large enough to generate at least one single-site mutation per generation ($2Nu > 1$), a more appropriate approximation is

$$ r_t \simeq 2u\sqrt{N\phi_b/\pi} \qquad (7b) $$

(Weissman et al. 2009).

In summary, the mean time to establishment of the homodimer via two mutations with a neutral intermediate state is equal to the sum of two terms: the reciprocal of the rate of appearance of double mutants destined to fixation and the time to fixation. The rate in the former term can be approximated by the sum of the sequential and stochastic tunneling rates for $2Nu < 1$, and otherwise by the semideterministic tunneling rate, equation (7b). An expression for the time to fixation is derived in the supplementary material (Supplementary Material online). Taken together, these expressions provide a relatively simple and accurate description of a fairly complex process, as illustrated by comparison with computer simulations (supplementary fig. 2, Supplementary Material online).
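The piecewise recipe summarized above can be sketched as follows. This is an illustrative transcription of equations (5), (6), (7a), and (7b) only: the fixation-time term, which is derived in the supplementary material, is deliberately omitted, and all function names are hypothetical.

```python
import math


def phi_b(N, s):
    """Equation (6): fixation probability of an additive beneficial allele."""
    return -math.expm1(-2.0 * s) / -math.expm1(-4.0 * N * s)


def rate_sequential(N, u, phi):
    """Equation (5): sequential fixation through a neutral intermediate."""
    return u / (1.0 + 1.0 / (2.0 * N * phi))


def rate_tunnel_7a(N, u, phi):
    """Equation (7a): stochastic tunneling."""
    return math.sqrt(u * phi) / (1.0 / (2.0 * N * u) + 1.0)


def rate_tunnel_7b(N, u, phi):
    """Equation (7b): semideterministic tunneling, intended for 2Nu > 1."""
    return 2.0 * u * math.sqrt(N * phi / math.pi)


def arrival_rate(N, u, s):
    """Rate of appearance of double mutants destined to fixation (neutral intermediate)."""
    phi = phi_b(N, s)
    if 2.0 * N * u < 1.0:
        return rate_sequential(N, u, phi) + rate_tunnel_7a(N, u, phi)
    return rate_tunnel_7b(N, u, phi)


def mean_arrival_time(N, u, s):
    """Expected generations until such a double mutant arises (fixation time excluded)."""
    return 1.0 / arrival_rate(N, u, s)


for N in (1e3, 1e5, 1e8):
    print(N, mean_arrival_time(N, u=1e-7, s=0.01))
```

Holding $u$ fixed across population sizes, as done here for simplicity, differs from figure 3, where the mutation rate is assumed to decline with $N$.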

Ignoring the fixation time, for populations large enough to experience the rescue effect, equation (7a) implies that provided $2Nu < 1$, $t_e$ scales with $[2Nu(u\phi_b)^{0.5}]^{-1}$ for the two-site model with a neutral intermediate (fig. 3). This expression can be extended to allow for additional intermediate-step mutations en route to the final adaptation by simply substituting nested terms of $(u\phi_b)^{0.5}$ for each previous $\phi_b$, yielding, for example, $[2Nu \cdot u^{0.5} \cdot (u\phi_b)^{0.25}]^{-1}$ for the three-site model. Assuming large enough population sizes that $\phi_b \simeq 2s$, for two-, three-, four-, and five-step adaptations, this leads to a population size scaling for $t_e$ of $N^{-0.10}$, $N^{0.05}$, $N^{0.125}$, and $N^{0.1625}$, respectively, on a per-generation basis, and $N^{-0.90}$, $N^{-0.75}$, $N^{-0.675}$, and $N^{-0.6375}$ on an absolute time basis. When $2Nu > 1$, the scaling is altered to $[(4Nu^2\phi_b/\pi)^{0.5}]^{-1}$ for the two-site model, which translates to a scaling of $N^{-0.70}$ in absolute time. In this case, for larger numbers of intermediate states, we expect the exponent on $u$ to increase accordingly, leading to an absolute time scaling for $t_e$ of $N^{-0.4}$, $N^{-0.1}$, and $N^{0.2}$, respectively, for three-, four-, and five-site adaptations. Because the mutational input is proportional to the actual population size ($N_a$), which clearly scales more rapidly than linearly with $N$ for species with large $N_a$ (Lynch 2007; Neher and Shraiman 2011), actual scalings of $t_e$ with $N$ are likely to be even more negative than those given above.
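The nested substitution described above amounts to a geometric series in the exponent on $u$, which makes the quoted scalings easy to reproduce. The sketch below assumes $\phi_b \simeq 2s$ (independent of $N$) and the power laws $u \propto N^{-0.6}$ and generation length $\propto N^{-0.8}$ used in the text; the helper names are illustrative.

```python
MUTATION_EXP = -0.6     # assumed: u scales as N^-0.6
GENERATION_EXP = -0.8   # assumed: generation length scales as N^-0.8


def u_power_tunneling(sites):
    """Exponent on u in N * u^x for a `sites`-step adaptation with 2Nu < 1.

    Two sites give u * u^0.5; three sites give u * u^0.5 * u^0.25; and so on.
    """
    return sum(0.5 ** k for k in range(sites))


def te_exponents(n_power, u_power):
    """Return (per-generation, absolute-time) exponents of N for t_e."""
    per_generation = -(n_power + u_power * MUTATION_EXP)
    return per_generation, per_generation + GENERATION_EXP


for sites in (2, 3, 4, 5):
    per_gen, absolute = te_exponents(1.0, u_power_tunneling(sites))
    print(sites, round(per_gen, 4), round(absolute, 4))

# Two-site semideterministic case (2Nu > 1): t_e ~ (N u^2)^-0.5.
print(te_exponents(0.5, 1.0))
```

Under these assumptions the loop returns per-generation exponents of $-0.1$, $0.05$, $0.125$, and $0.1625$, and the final line gives $-0.7$ in absolute time for the two-site case with $2Nu > 1$.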

This general approach is readily extended to situations in which the intermediate states are deleterious. Considering first the two-site model, the rate of establishment by the sequential pathway becomes

$$ r_s \simeq \frac{2Nu}{(1/\phi_d) + (1/\phi_b)}, \qquad (8a) $$

where $\phi_d$ is the probability of fixation of the deleterious intermediate, obtained by substituting $-\delta$ for $s$ in equation (6); and the rate of establishment by the rescue effect is

$$ r_t \simeq 2Nu^2\phi_b/\delta. \qquad (8b) $$

This expression follows simply from the fact that prior to the arrival of a second-step mutation, deleterious first-step mutations will remain in approximate selection–mutation balance with frequency $u/\delta$, with each copy having a probability $u\phi_b$ of giving rise to a successful second-step mutation. The mean time to establishment can then again be approximated by adding the reciprocal of the sum of the two arrival rates to the fixation time (given in the second section of the supplementary material [Supplementary Material online]), an approach that yields an excellent fit to simulated data (supplementary fig. 3, Supplementary Material online).
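A numerical sketch of equations (8a) and (8b) follows. The only nonobvious step is evaluating $\phi_d$: substituting $-\delta$ into equation (6) gives $(e^{2\delta} - 1)/(e^{4N\delta} - 1)$, which is rearranged here so that large $N\delta$ underflows harmlessly to zero rather than overflowing. The function names and parameter values are illustrative assumptions.

```python
import math


def fixation_prob(N, s):
    """Equation (6) for additive selection coefficient s (negative if deleterious)."""
    if s == 0.0:
        return 1.0 / (2.0 * N)
    if s > 0.0:
        return -math.expm1(-2.0 * s) / -math.expm1(-4.0 * N * s)
    d = -s
    # (e^{2d} - 1) / (e^{4Nd} - 1), rewritten with decaying exponentials only.
    return math.expm1(2.0 * d) * math.exp(-4.0 * N * d) / -math.expm1(-4.0 * N * d)


def rate_sequential_deleterious(N, u, s, delta):
    """Equation (8a): fixation of the deleterious intermediate, then of the dimer."""
    phi_d = fixation_prob(N, -delta)
    if phi_d == 0.0:
        return 0.0  # the intermediate effectively never fixes
    return 2.0 * N * u / (1.0 / phi_d + 1.0 / fixation_prob(N, s))


def rate_tunnel_deleterious(N, u, s, delta):
    """Equation (8b): rescue of intermediates held at frequency u / delta."""
    return 2.0 * N * u ** 2 * fixation_prob(N, s) / delta


for N in (1e2, 1e4, 1e6):
    print(N,
          rate_sequential_deleterious(N, 1e-8, 0.01, 0.001),
          rate_tunnel_deleterious(N, 1e-8, 0.01, 0.001))
```

With these placeholder values, the sequential path contributes most at small $N$ and essentially vanishes at large $N$, where the rescue term takes over.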

If multiple ($d > 2$) equally deleterious intermediate states precede a successful dimer, the rate of stochastic tunneling is $\simeq 4N(u/\delta)^d(s/\delta)$ per generation (Weissman et al. 2009; Lynch and Abegg 2010). Again, ignoring the final (and usually shorter) phase of fixation, the mean time to establishment in absolute time units then scales as $N^{-0.6}$, $N^{0.0}$, $N^{0.6}$, and $N^{1.2}$, respectively, for adaptations involving one, two, three, and four intermediate deleterious states. As shown in the supplementary material (Supplementary Material online), if the spatial clustering of mutations causes the rate of double mutation to be substantially greater than $u^2$, as the evidence suggests (Schrider et al. 2011), the rate of adaptation under the deleterious intermediate model can be greatly accelerated, although the scaling with $N$ is unaltered.

Finally, it is worth considering the consequences of recombination, as the preceding results assumed complete linkage between selected sites. Although an intermediate level of recombination ($\sim s/2$ between sites for two-site adaptations) maximizes the rate of establishment, the effect is not great, and recombination has little influence on the scaling of $t_e$ with $N$ for neutral intermediates (Lynch and Abegg 2010; Weissman et al. 2010). However, in the case of deleterious intermediates, when the rate of recombinational breakdown exceeds the selective advantage, the consistent return of adaptive alleles to deleterious intermediate states strongly inhibits the rate of establishment at large $N$.

With the average recombination rate in prokaryotes being on the order of $10^{-9}$ per nucleotide site (Lynch 2007), and average lengths of coding regions being on the order of 1 kb, the maximum rate of recombination between sites in the same protein is just $10^{-6}$ per generation. In this case, the scaling of $t_e$ with $N$ is expected to be nearly independent of the recombination rate unless selection is extremely weak. On the other hand, recombination rates in unicellular eukaryotes tend to be on the order of $10^{-7}$ to $10^{-6}$ per site, whereas those in metazoans and land plants generally range from $10^{-9}$ to $10^{-7}$, yielding more potential for recombinational interference in the establishment of epistatically interacting mutations, particularly for sites contained within different exons in species with long introns.

Taken together, these results suggest that under the compensatory mutation model, transitions to a homodimeric state will generally either be made more rapidly in large populations or occur at approximately equal rates at all population sizes. The only potential exceptions to this generality occur when intermediate states are deleterious and the recombination rate exceeds the final selective advantage or when four or more linked intermediate states must be traversed to achieve the final adaptation (although in this case, the time to establishment may be so large as to make such a pathway highly unlikely). This negative to weak positive scaling of $t_e$ with population size under the compensatory mutation model is quite different from the situation with domain swapping, even though there is an ancestral precedent for domain interaction in the latter case, and even though the compensatory mutation route may involve deleterious intermediate alleles.


Gene Duplication Models

Although gene duplication may be the most common route to the origin of heterodimers, it may also potentiate the emergence of homodimers. To fulfill the definition of a homodimer, the products of a duplicate gene must interact entirely with each other rather than with those of the paralogous gene. In principle, this might be accomplished if the two paralogs were expressed at different subcellular locations, different times in development, etc., perhaps made possible by an initial step of subfunctionalization resulting from incomplete regulatory-region duplication (Force et al. 1999; Katju and Lynch 2006). We will assume that following such a duplication event, the pair of gene duplicates suffers one of two fates: 1) loss of one of the copies via the fixation of a nonfunctionalizing mutation or 2) permanent preservation of both copies, as one acquires beneficial mutations to a new function (in this case, embodied in the creation of a homodimer) at the expense of the ancestral essential function, which is retained by the other copy. This model does not deny the possibility of neofunctionalizing mutations that do not involve homodimeric construction, but the concern here is primarily with this particular path.

The following results rely on a number of methods developed in Lynch et al. (2001). Gene duplications arise at rate $2ND$ at the population level, with $D$ being the rate of duplication per gene per generation, and the initial frequency at a new locus being $1/(2N)$, with all other “absentee alleles” at the novel locus being effectively null. Starting with the case of complete linkage between duplicates, a newly arisen gene duplicate is initially destined to fixation with probability $1/(2N)$ and to loss with probability $1 - (2N)^{-1}$. Conditional on proceeding to fixation and assuming there are no intrinsic advantages to functionally redundant duplicates, either one member of the pair will be lost by a nonfunctionalizing mutation (which arise at rate $u_0$ per locus) or the pair will be preserved by a neofunctionalizing mutation in one copy (the probability of which is denoted $\rho_1$). Conditional on the haplotype carrying the duplicate pair being initially destined to loss, the possibility exists that it will be rescued and propelled to fixation by a neofunctionalizing mutation (the probability of which is denoted $\rho_2$). Letting $1 - (2N)^{-1} \simeq 1$, the rate of establishment of the new locus (which in this case specifically leads to dimerization) can then be expressed as

$$ r_d \simeq D\left(\frac{\rho_1}{\rho_1 + 2u_0} + 2N\rho_2\right). \qquad (9) $$

The preservation probabilities, $\rho_1$ and $\rho_2$, may take on various forms depending on the evolutionary path to homodimerization, but unless the population size is extremely small, the second term will almost always dominate so that $r_d \simeq 2ND\rho_2$. For example, under the assumption that just a single mutation is required for dimerization at the new locus, and letting $u_1$ be the rate of origin of neofunctionalizing mutations, the rate of fixation of neofunctionalizing mutations conditional on fixation of the duplicate locus is $\rho_1 \simeq 2Nu_1\phi_b$; and following the logic outlined above for the rescue effect, $\rho_2 \simeq \sqrt{u_1\phi_b}$, where $\phi_b$ is the fixation probability defined by equation (6), with $s$ being the selective advantage of each copy of the dimerizing allele. The simplified function, $r_d \simeq 2ND\rho_2$, provides a generally good fit to data acquired by simulations (supplementary fig. 4, Supplementary Material online).
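To see how strongly the second term of equation (9) dominates, the full and simplified rates can be compared for the single-mutation case. The sketch below uses the expressions for $\rho_1$ and $\rho_2$ given above; the parameter values are arbitrary placeholders chosen only for illustration, and the function names are hypothetical.

```python
import math


def phi_b(N, s):
    """Equation (6): fixation probability of an additive beneficial allele."""
    return -math.expm1(-2.0 * s) / -math.expm1(-4.0 * N * s)


def duplication_rate_full(N, D, u0, u1, s):
    """Equation (9) with single-mutation forms of rho_1 and rho_2 (linked duplicates)."""
    phi = phi_b(N, s)
    rho1 = 2.0 * N * u1 * phi        # neofunctionalization after the duplicate fixes
    rho2 = math.sqrt(u1 * phi)       # rescue of a duplicate otherwise destined to loss
    return D * (rho1 / (rho1 + 2.0 * u0) + 2.0 * N * rho2)


def duplication_rate_simple(N, D, u1, s):
    """Simplified form r_d ~ 2 N D rho_2."""
    return 2.0 * N * D * math.sqrt(u1 * phi_b(N, s))


# Placeholder parameters (not taken from the paper).
N, D, u0, u1, s = 1e6, 1e-8, 1e-6, 1e-8, 0.01
full = duplication_rate_full(N, D, u0, u1, s)
simple = duplication_rate_simple(N, D, u1, s)
print(full, simple, full / simple)
```

For these placeholder values the two rates differ by only a few percent, in line with the statement that the rescue term dominates unless $N$ is very small.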

If more than one mutation is required for dimerization, the previous expressions must be modified to allow for rare sequences of mutational events that can ensure fixation by positive selection. For example, if a neutral intermediate mutation is required prior to the construction of a dimer by a second mutation, $\phi_b$ in the previous expression for $\rho_2$ must be replaced by the probability that a first-step mutation is rescued by a second-step mutation, $\sqrt{u_2\phi_b}$, where $u_2$ is the rate at which first-step alleles acquire second-step mutations, here assumed to equal $u_1$, leading to $\rho_2 \simeq \sqrt{u_1\sqrt{u_1\phi_b}}$ (Weissman et al. 2010). In addition, because of the longer time span involved in the acquisition of additional mutations, the probability that duplicate genes are hit with a nonfunctionalizing mutation prior to procuring otherwise adaptive mutations must be considered. These issues, as well as derivations for smaller population sizes, are developed in the supplementary material (Supplementary Material online), where it is again shown that the overall results closely approximate observations derived from computer simulations (supplementary fig. 4, Supplementary Material online).

A key feature of this gene duplication model is that although neofunctionalized (dimerizing) a alleles will almost certainly historically arise by mutation at the ancestral locus prior to gene duplication, because aa homozygotes are lethal due to the lack of the essential ancestral function, allele a cannot advance beyond the low frequency expected under balancing selection (heterozygote superiority). Tandem a–a duplicates that are completely linked also cannot contribute to homodimer origin because the absence of essential ancestral function in the linked pair would again prevent fixation.

FIG. 4. Scaled probability of preservation by homodimerization of a newly arisen gene duplicate (relative to the neutral expectation of $1/(2N)$), derived from simulations of a Wright–Fisher population. The mutation rate for sites involved in the construction of dimerizing alleles follows the natural negative scaling with effective population size as described in the text, whereas the rate of mutation to defective gene copies is $u_0 = 100u_1$. Complete linkage is assumed in all panels except the upper right, where free recombination with the ancestral gene copy is assumed. Discontinuities in some of the curves result from the use of alternative approximations for different population size domains.

On the other hand, there are two ways in which an A/a polymorphism at the ancestral locus can lead to rapid neofunctionalization if the duplicated gene is unlinked to its parental copy (Spofford 1969; Lynch et al. 2001). Consider the case in which just a single mutation is required for neofunctionalization. Provided $Ns^2 > 4$ and $u_0 < s^2$, the neofunctionalized allele will be maintained at the ancestral locus at equilibrium frequency $p_n \simeq (s^2 - u_0)/s$ by balancing selection. Details are worked out in Lynch et al. (2001), and the overall consequences for homodimerization are shown in figure 4, but an especially simple limit for the probability of fixation of the homodimer can be obtained when $u_0 \ll s^2$, in which case the probability that the duplicate locus is randomly initiated from a neofunctionalized allele is $p_n \simeq s$. As most individuals at the original locus will be AA, this neofunctionalized allele will experience the full selective advantage $s$, and with the probability of fixation at large $N$ being $\simeq 2s$ (the asymptotic value given by eq. 6), this path yields a probability of establishment of the homodimerized allele of $s \cdot 2s = 2s^2$. Alternatively, the new locus will be founded by the normal allele with probability $(1 - s)$, and in this case a selective advantage is derived from the masking of null homozygotes at the original locus, whose frequency is $\simeq s^2$, leading to a probability of fixation of $2s^2$. Should the latter event occur, the homodimerizing allele, which is already established at a fairly high frequency at the ancestral locus, will go to fixation there with probability close to 1.0, so the overall probability of establishment by this path is also $\simeq 2s^2$. Thus, for large populations, the rate of establishment of a homodimer via unlinked duplicates, $\simeq 4NDs^2$, is essentially independent of the mutation rate due to the fact that an ample supply of novel alleles is present at the outset.

Finally, we consider the situation in which the evolution of a homodimer following duplication requires two mutations with the first-step allele being deleterious, the theory for which is presented and shown to agree reasonably well with simulations in the supplementary material (Supplementary Material online). In this case, the mean time to establishment scales inversely with $1/\sqrt{\delta}$ to $1/\delta$ depending on whether the population size is large or small (fig. 4), although the overall scaling with population size is not greatly different from that seen with neutral intermediates.

In summary, three gene duplication models for the origin of homodimers have been considered. First, in the simplest case of a single mutation with additive beneficial effects being required, when duplicates are completely linked, $t_e$ scales as $[2ND(u\phi_b)^{0.5}]^{-1}$ in generations, implying a population size scaling of $N^{-1.5}$ on an absolute time basis. If, however, such duplicates are unlinked, neofunctionalization can proceed without any mutational input in large populations that maintain neofunctionalizing alleles by selection–mutation balance, and the time to establishment scales as $N^{-1.0}$ in generations and $N^{-1.8}$ in absolute time units. Second, for the two-site model with neutral intermediates, when the duplicates are linked, the absolute time scaling varies from $N^{-1.9}$ to $N^{-1.35}$ for small versus large population sizes. Finally, for the two-site model with a deleterious intermediate state, the absolute time scaling for $t_e$ is not much different from the neutral-intermediate case, ranging between $N^{-1.6}$ and $N^{-1.2}$ for small to large populations. Thus, we again conclude that an advantageous homodimer is considerably more likely to evolve in populations with large effective sizes.

Homodimer to Heterodimer Transitions

Although heterodimers can, in principle, arise from promiscuous interactions among nonorthologous proteins, most seem to arise from interactions between paralogs arising from gene duplication, which will be the focus here. Homodimerization may precede gene duplication, providing a natural launching pad for heterodimerization following paralog divergence, or gene duplication may occur first, with complexation of the paralogous products arising secondarily. Three general scenarios, not necessarily exclusive, are sketched out below.

FIG. 5. Potential paths to the evolution of heterodimers, with the involvement of gene duplication as described in the text. Left: Solid circles represent individual proteins, which together make dimers consisting of A (blue) and/or a (red) subunits. Ancestral heterozygotes produce three types of dimers in a 1:2:1 ratio, assuming random assembly. Following gene duplication, the encoded products at each locus diverge to the point that self-assembly is avoided, leading to “fixed heterozygosity.” Right: Two cases are shown: on the left and right, respectively, the ancestral state is a dimer and a monomer. In both cases, duplication and divergence result in a situation where the protein is always assembled as a heterodimer (one subunit from each locus).

Consider first the situation in which a locus encoding a homodimer harbors two alleles, such that the cross-product dimer created within heterozygotes elevates fitness beyond that for either of the two pure types produced in homozygotes. In this case, the two alleles will have been maintained in the ancestral (preduplication) population by balancing selection, with frequencies $s_1/(s_1 + s_2)$ and $s_2/(s_1 + s_2)$ for the a and A alleles under the assumption that the fitnesses of the AA, Aa, and aa genotypes at the ancestral locus are $1 - s_1$, $1$, and $1 - s_2$, respectively, with $0 < s_1, s_2 < 1$. As described above, gene duplication then provides an opportunity for each locus to fix an alternative allele, in which case every member of the population would have the expression pattern found in the ancestral heterozygote (fig. 5). Assuming the products of each locus randomly assemble at this early stage of duplicate-gene establishment, three types of dimers would be found within individuals in a 1:2:1 ratio (as is also true for heterozygotes in the ancestral single-locus state). Following the establishment of this complementing duplication state, subsequent mutational modifications at one or both loci might then lead to a pure heterodimer with the products of the individual loci no longer self-assembling.

The selective advantage of a newly arisen duplicate under this model, derived in the supplementary material (Supplementary Material online), has a simple form. Provided the population size is large enough that the power of selection outweighs that of drift, the probability of fixation of the duplicate is

$$ \phi_{dup} \simeq \frac{2s_1 s_2}{3(s_1 + s_2)}, \qquad (10) $$

which reaches a maximum of $s/3$ when $s_1 = s_2 = s$. The expected time to transition (in generations) from a homodimer to a heterodimer under this model is then $(2ND\phi_{dup})^{-1}$, which again implies a much shorter time to establishment in larger populations ($\propto N^{-1.8}$ in absolute time units). This particular model is, of course, only relevant to diploid species, as haploids cannot harbor ancestral heterozygosity.
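Equation (10) and the associated waiting time are simple enough to tabulate directly. The sketch below also returns the ancestral equilibrium allele frequencies under heterozygote superiority; the function names and the example values of $N$, $D$, $s_1$, and $s_2$ are illustrative assumptions rather than estimates from the paper.

```python
def equilibrium_frequencies(s1, s2):
    """Ancestral balanced frequencies (a, A) for fitnesses AA = 1 - s1, Aa = 1, aa = 1 - s2."""
    total = s1 + s2
    return s1 / total, s2 / total


def phi_dup(s1, s2):
    """Equation (10): fixation probability of a duplicate fixing ancestral heterozygosity."""
    return 2.0 * s1 * s2 / (3.0 * (s1 + s2))


def transition_time(N, D, s1, s2):
    """Expected generations from homodimer to heterodimer, (2 N D phi_dup)^-1."""
    return 1.0 / (2.0 * N * D * phi_dup(s1, s2))


# Placeholder values: symmetric selection against both homozygotes.
print(equilibrium_frequencies(0.03, 0.03))
print(phi_dup(0.03, 0.03))              # symmetric case, phi_dup = s / 3
print(transition_time(N=1e6, D=1e-8, s1=0.03, s2=0.03))
```

In the symmetric case the fixation probability reduces to $s/3$, and the waiting time is inversely proportional to $N$ when $D$ is held constant.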

The remaining two scenarios by which a heterodimer may evolve are conceptually very similar to the subfunctionalization model of gene duplication (Force et al. 1999; Lynch and Force 2000; Lynch et al. 2001), with duplicate genes being reciprocally preserved when each copy loses a complementary essential subfunction (fig. 5). On the one hand, duplication of a locus already engaged in homodimerization may lead to a situation in which both loci acquire complementary mutations that together discourage homodimerization (enforcing the exclusive construction of heterodimers between the distinct monomers produced by the two loci). A mechanism similar to this has been suggested for the evolution of chaperonins (Ruano-Rubio and Fares 2007). Alternatively, an ancestral locus engaged in monomer production may become duplicated, in this case with subfunctionalization (or partial incapacitation) at the two loci resulting in heterodimer production. In both cases, the resultant heterodimeric structure may simply conserve the ancestral gene function, although it is possible that enhanced fitness may result from novel features associated with dimerization, either at the time of establishment or after the arrival of secondary advantageous mutations (Hughes 1994).

A key aspect of these subfunctionalization models is that if the fitness of individuals with complementary paralogs is no greater than that for the ancestral single-locus state, there is essentially no chance of joint preservation if $Nu_0 > 1$, where $u_0$ is the null mutation rate (Lynch and Force 2000; Lynch et al. 2001). There are two reasons for this. First, because the mean time for an initially neutral allele to drift to fixation is $4N$ generations, at sufficiently large population sizes all descendants of such a duplicate locus will almost certainly be hit with a silencing mutation prior to fixation of the lineage. Second, a protein function that requires the products of two loci will have a mutation rate to the null state that is elevated by $u_0$ relative to the single-locus case, which acts to maintain the single-locus state by positive selection. Thus, in contrast to the situation with fixation of adaptive ancestral heterozygosity, the latter two models predict that transitions from homodimers to heterodimers via gene duplication are much more likely in small than in large populations if the driving force of duplicate gene preservation is complementary degenerative mutation.

Discussion

Heavy on theory and light on data, the previous results are offered as a starting point for discourse on the evolution of protein complexes. Although comparative biology provides a catalog of the historical products of biodiversification, the mechanisms by which complex cellular features evolve are constrained by fundamental principles of population genetics. If a proposed path to the acquisition of a specific molecular state by natural selection can be shown to be near impossible in common population-genetic environments, either the proposed molecular route of evolution is wrong, a constrained set of population-genetic conditions was involved, or the evolutionary transition to the novel state was driven by entirely nonadaptive forces. Consider, for example, the evolution of domain-swapping dimers, transitions to which are generally assumed to result from single deletion events. If the disadvantage of chimeric proteins in heterozygotes exceeds the power of random genetic drift, such changes cannot proceed to fixation in diploid populations, as the opposing gradient of natural selection would be overwhelming. Thus, the evolution of domain-swapping dimers appears to require either a haploid condition or, for diploidy, a situation in which the features of combined allelic products in heterozygotes do not cause deleterious effects of sufficient magnitude to offset the vagaries of random genetic drift.

For more complex adaptations requiring multiple mutations, it is often argued that adaptive evolution essentially never occurs via intermediate deleterious states. Such arguments derive from the assumption that evolutionary routes to adaptation involving deleterious early steps impose a bottleneck in mean population fitness. However, it is now clear that the rescue effect (stochastic tunneling) by secondary mutations provides a powerful mechanism for vaulting an adaptive valley while avoiding negative repercussions at the population level. With sufficiently low recombination rates, adaptive progress can be made without a population ever experiencing a decline in mean fitness because deleterious intermediate-state alleles never rise to high frequencies. The existence of such evolutionary pathways to adaptive exploitation raises significant challenges for laboratory studies that strive to reconstruct the historical order of events leading to the establishment of complex adaptations under the assumption that all intermediate steps must have been nondeleterious and procured in a stepwise fashion (Weinreich et al. 2006; Dean and Thornton 2007; Ortlund et al. 2007). Via the rescue effect, mutations that are individually deleterious not only can become fixed but do so simultaneously with the rescuing mutations, leading to episodic evolutionary change. A number of recent studies suggest that this sort of evolution is common with respect to the evolution of gene function (DePristo et al. 2007; Yokoyama et al. 2008; Field and Matz 2010; Carroll et al. 2011), and there is no reason to think that dimerizing interfaces cannot arise in a similar manner.
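A rough numerical comparison illustrates why the rescue effect matters most in large populations. The sketch below assumes a haploid, nonrecombining Wright-Fisher population, a first-step mutation with selective disadvantage s_d, and a double mutant with advantage s_b. The tunneling expression is the widely used approximation N u1 (u2 / s_d) phi(s_b), in the spirit of Iwasa et al. (2004) and Weissman et al. (2009), and is reasonable only when the intermediate is deleterious enough to remain rare. All function names and parameter values are hypothetical.

```python
import math

def fixation_prob_haploid(n, s):
    """Diffusion-approximation fixation probability of a single new mutant in
    a haploid Wright-Fisher population of size n with selection coefficient s."""
    if s == 0.0:
        return 1.0 / n
    if 2.0 * n * s < -700.0:
        # Strongly deleterious: the probability is below floating-point range.
        return 0.0
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)

def sequential_rate(n, u1, s_d):
    """Rate at which the deleterious first-step mutation fixes on its own,
    assumed here to be the limiting step of the stepwise route."""
    return n * u1 * fixation_prob_haploid(n, -s_d)

def tunneling_rate(n, u1, u2, s_d, s_b):
    """Approximate tunneling rate: first-step mutants arise at rate n*u1, each
    lineage leaves about 1/s_d mutant individuals before being purged, each of
    those acquires the second mutation with probability u2, and the double
    mutant then fixes with probability phi(s_b)."""
    return n * u1 * (u2 / s_d) * fixation_prob_haploid(n, s_b)

if __name__ == "__main__":
    u1 = u2 = 1e-8          # hypothetical per-generation mutation rates
    s_d, s_b = 0.01, 0.01   # hypothetical selection coefficients
    for n in (10, 10**3, 10**5, 10**7):
        seq = sequential_rate(n, u1, s_d)
        tun = tunneling_rate(n, u1, u2, s_d, s_b)
        print(f"N = {n:>8d}   sequential = {seq:.3e}   tunneling = {tun:.3e}")
```

With these illustrative values, stepwise fixation of the deleterious intermediate dominates only in the smallest population, whereas tunneling dominates once the intermediate can no longer drift to fixation. The tunneling approximation is not reliable for very small N, where the stepwise route prevails in any case.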

To close the loop between theory and observation on the evolution of protein complexes, substantial comparative work is needed on the features of orthologous proteins with monomeric versus multimeric forms but otherwise identical functions. To date, there appears to be no compelling evidence that multimeric proteins outperform their monomeric orthologs in other species, and some examples suggest otherwise. For example, the mismatch-repair system, which plays important roles in replication fidelity, DNA repair, and recombination, is composed of monomeric proteins in eubacteria but dimers in eukaryotes (Kunkel and Erie 2005; Iyer et al. 2006), yet the repair efficiency of eukaryotic systems appears to be lower than that in prokaryotes (Lynch 2011). As another example, the sliding clamps used in DNA replication are homodimeric in eubacteria but homotrimeric in eukaryotes, with both structures having very similar overall architecture (Kelman and O’Donnell 1995), yet replication-fork progression rates are nearly an order of magnitude faster in prokaryotes (Lynch 2007). In addition, although the ribosome has a much more complex protein repertoire in eukaryotes than in prokaryotes (Smith et al. 2008), there is no evidence that translation fidelity is elevated in the former, and although the data are limited, a number of observations suggest the opposite (e.g., Loftfield and Vanderjagt 1972; Buchanan et al. 1980; Parker 1989; Salas-Marco and Bedwell 2005; Kramer and Farabaugh 2007).

Many more comparative studies will be required to determine if shifts to higher-order complexes are typically unaccompanied by significant enhancement in functionality, but continued evidence of this nature would further support the idea that multimeric proteins often simply arise as compensatory responses to accumulated defects in monomers in species experiencing relatively small effective population sizes (Fernandez and Lynch 2011). For the time being, the preceding theory at least provides a plausible set of mechanistic explanations for the patterns in figure 1. Consistent with the approximately equal apportionment of monomers in prokaryotes and eukaryotes (fig. 1), the theory suggests that homodimers can evolve at least as readily in large as in small populations under a wide variety of conditions. On the other hand, the data in figure 1 suggest that once established, homodimers make transitions to heterodimers much more readily in small than in large populations. This pattern appears to be most consistent with a scenario in which such structures arise largely as a consequence of the accumulation of complementary degenerative mutations in duplicated genes.

These observations raise the possibility that, as in the case of gene structure and genomic architecture (Lynch 2007), variation in the power of random genetic drift among phylogenetic lineages has contributed significantly to the emergence of many of the complex (and often arcane) features of eukaryotic cells. Nonadaptive arguments for the origin of cellular infrastructure have been made before. For example, invoking a process called constructive neutral evolution, Stoltzfus (1999), Gray et al. (2010), and Lukes et al. (2011) suggested that the establishment of molecular machines such as the ribosome and the spliceosome may have emerged via the fortuitous, neutral establishment of interactions among lower-level protein components, which in turn suppressed the effects of subsequent mutations that would have otherwise inactivated the individual parts. Under this model, after the establishment of such degenerative mutations, the components of such a complex would then be mutually interdependent. In a related exercise, Frank (2007) argued that any evolutionary step that leads to increased robustness of a cellular function will also magnify the likelihood that the underlying component parts will acquire mutational defects, again leading to the growth of complexity at the expense of the previously autonomous parts.

Although the latter arguments provide potentially plausible paths for the nonadaptive evolution of complex structures, key aspects of the evolutionary dynamics required to arrive at the postulated end points remain to be explored. In both cases, scenarios of stepwise fixation appear to have been assumed, which as noted above are unlikely to be realized except in very small populations (perhaps too small to avoid extinction). Also lacking is attention to the problematical issues of segregation and recombination in diploid populations, which can substantially alter the fates of pairs of mutations, especially when carried in different genes. Most notable, however, is the need to account for the fact that essentially all added layers of complexity impose a mutational burden relative to simpler structures that carry out the same function but comprise smaller targets for inactivating mutations. This differential vulnerability to mutation, which acts like a weak form of selection against added (gratuitous) complexity, may be the key to explaining why so many higher-order structures that have evolved in eukaryotes maintain simpler forms in prokaryotes (Lynch 2007).
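The mutational-burden argument can be expressed as a back-of-the-envelope comparison. In the Python sketch below, the selective disadvantage of a more complex structure is taken to be its excess rate of inactivating mutation (the number of additional critical sites times the per-site mutation rate), and selection is treated as effective only when this disadvantage exceeds roughly 1/(2N_e). The threshold, the function names, and every numerical value are illustrative assumptions rather than estimates from this study.

```python
def mutational_disadvantage(extra_sites, u_per_site):
    """Excess inactivating-mutation rate of the more complex structure,
    treated as an effective selection coefficient against it."""
    return extra_sites * u_per_site

def selection_overcomes_drift(s, n_e, threshold_factor=2.0):
    """Rule of thumb: selection is effective when threshold_factor * n_e * s > 1.
    The factor of 2 is an assumed diploid convention, not a sharp boundary."""
    return threshold_factor * n_e * s > 1.0

if __name__ == "__main__":
    # Hypothetical: 30 extra critical sites, 1e-9 mutations per site per generation.
    s = mutational_disadvantage(extra_sites=30, u_per_site=1e-9)
    for label, n_e in (("small N_e", 1e4), ("intermediate N_e", 1e6), ("large N_e", 1e8)):
        verdict = "selection effective" if selection_overcomes_drift(s, n_e) else "drift dominates"
        print(f"{label:>16s} ({n_e:.0e}): 2*N_e*s = {2 * n_e * s:.4f} -> {verdict}")
```

Under these assumed values, only the largest population can purge the added complexity on mutational grounds alone, consistent with the suggestion that simpler forms are more readily maintained in lineages with large effective population sizes.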

Supplementary Material

Supplementary Material is available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This work has been supported by the National Institutes of Health (R01 GM036827 to M.L. and W. K. Thomas), National Science Foundation (EF-0827411 to M.L.), and US Department of Defense (ONRBAA10-002 to M.L., P. Foster, H. Tang, and S. Finkel).

References

Abramowitz M, Stegun IA. 1964. Handbook of mathematical functions with formulas, graphs, and mathematical tables. Washington (DC): U.S. Government Printing Office.

Alber F, Dokudovskaya S, Veenhoff LM, et al. (12 co-authors). 2007. The molecular architecture of the nuclear pore complex. Nature 450:695–701.

Andre I, Strauss CE, Kaplan DB, Bradley P, Baker D. 2008. Emergence of symmetry in homooligomeric biological assemblies. Proc Natl Acad Sci U S A. 105:16148–16152.

Bastolla U, Moya A, Viguera E, van Ham RC. 2004. Genomic determinants of protein folding thermodynamics in prokaryotic organisms. J Mol Biol. 343:1451–1466.

Bennett MJ, Choe S, Eisenberg D. 1994. Domain swapping: entangling alliances between proteins. Proc Natl Acad Sci U S A. 91:3127–3131.

Bogan AA, Thorn KS. 1998. Anatomy of hot spots in protein interfaces. J Mol Biol. 280:1–9.

Buchanan JH, Bunn CL, Lappin RI, Stevens A. 1980. Accuracy of in vitro protein synthesis: translation of polyuridylic acid by cell-free extracts of human fibroblasts. Mech Ageing Dev. 12:339–353.

Carroll SM, Ortlund EA, Thornton JW. 2011. Mechanisms for the evolution of a derived function in the ancestral glucocorticoid receptor. PLoS Genet. 7:e1002117.

Carvalho-Santos Z, Machado P, Branco P, Tavares-Cadete F, Rodrigues-Martins A, Pereira-Leal JB, Bettencourt-Dias M. 2010. Stepwise evolution of the centriole-assembly pathway. J Cell Sci. 123:1414–1426.

Chiti F, Dobson CM. 2009. Amyloid formation by globular proteins under native conditions. Nat Chem Biol. 5:15–22.

Dean AM, Thornton JW. 2007. Mechanistic approaches to the study of evolution: the functional synthesis. Nat Rev Genet. 8:675–688.

Dent JA. 2010. The evolution of pentameric ligand-gated ion channels. Adv Exp Med Biol. 683:11–23.

DePristo MA, Hartl DL, Weinreich DM. 2007. Mutational reversions during adaptive protein evolution. Mol Biol Evol. 24:1608–1610.

Fernandez A, Lynch M. 2011. The origins of interactome complexity. Nature 474:502–505.

Fernandez A, Scott R, Berry RS. 2004. The nonconserved wrapping of conserved protein folds reveals a trend toward increasing connectivity in proteomic networks. Proc Natl Acad Sci U S A. 101:2823–2827.

Field SF, Matz MV. 2010. Retracing evolution of red fluorescence in GFP-like proteins from Faviina corals. Mol Biol Evol. 27:225–233.

Force A, Lynch M, Pickett B, Amores A, Yan Y-L, Postlethwait J. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151:1531–1545.

Frank SA. 2007. Maladaptation and the paradox of robustness in evolution. PLoS One 2:e1021.

Gray MW, Lukes J, Archibald JM, Keeling PJ, Doolittle WF. 2010. Cell biology. Irremediable complexity? Science 330:920–921.

Hashimoto K, Madej T, Bryant SH, Panchenko AR. 2010. Functional states of homooligomers: insights from the evolution of glycosyltransferases. J Mol Biol. 399:196–206.

Hashimoto K, Nishi H, Bryant S, Panchenko AR. 2011. Caught in self-interaction: evolutionary and functional mechanisms of protein homooligomerization. Phys Biol. 8:035007.

Hershberg R, Petrov DA. 2010. Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet. 6:e1001115.

Hildebrand F, Meyer A, Eyre-Walker A. 2010. Evidence of selection upon genomic GC-content in bacteria. PLoS Genet. 6:e1001107.

Holt CE, Bullock SL. 2009. Subcellular mRNA localization in animal cells and why it matters. Science 326:1212–1216.

Hughes AL. 1994. The evolution of functionally novel proteins after gene duplication. Proc R Soc Lond Ser B. 256:119–124.

Hughes AL. 1997. Evolution of the proteasome components. Immunogenetics 46:82–92.

Huntington JA. 2011. Serpin structure, function and dysfunction. J Thromb Haemost. 9(Suppl 1):26–34.

Ispolatov I, Yuryev A, Mazo I, Maslov S. 2005. Binding properties and evolution of homodimers in protein-protein interaction networks. Nucleic Acids Res. 33:3629–3635.

Iwasa Y, Michor F, Nowak MA. 2004. Stochastic tunnels in evolutionary dynamics. Genetics 166:1571–1579.

Iyer RR, Pluciennik A, Burdett V, Modrich PL. 2006. DNA mismatch repair: functions and mechanisms. Chem Rev. 106:302–323.

Janin J, Miller S, Chothia C. 1988. Surface, subunit interfaces and interior of oligomeric proteins. J Mol Biol. 204:155–164.

Jones S, Thornton JM. 1995. Protein-protein interactions: a review of protein dimer structures. Prog Biophys Mol Biol. 63:31–65.

Katju V, Lynch M. 2006. On the formation of novel genes by duplication in the Caenorhabditis elegans genome. Mol Biol Evol. 23:1056–1067.

Kelman Z, O’Donnell M. 1995. Structural and functional similarities of prokaryotic and eukaryotic DNA polymerase sliding clamps. Nucleic Acids Res. 23:3613–3620.

Kimura M. 1962. On the probability of fixation of mutant genes in populations. Genetics 47:713–719.

Knight RD, Freeland SJ, Landweber LF. 2001. A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol. 2:RESEARCH0010.

Komarova NL, Sengupta A, Nowak MA. 2003. Mutation-selection networks of cancer initiation: tumor suppressor genes and chromosomal instability. J Theor Biol. 223:433–450.

Kramer EB, Farabaugh PJ. 2007. The frequency of translational misreading errors in E. coli is largely determined by tRNA competition. RNA 13:87–96.

Kunkel TA, Erie DA. 2005. DNA mismatch repair. Annu Rev Biochem. 74:681–710.

Kuriyan J, Eisenberg D. 2007. The origin of protein interactions and allostery in colocalization. Nature 450:983–990.

Levy ED, Boeri Erba E, Robinson CV, Teichmann SA. 2008. Assembly reflects evolution of protein complexes. Nature 453:1262–1265.

Levy ED, Pereira-Leal JB, Chothia C, Teichmann SA. 2006. 3D complex: a structural classification of protein complexes. PLoS Comput Biol. 2:e155.

Liu R, Ochman H. 2007. Stepwise formation of the bacterial flagellar system. Proc Natl Acad Sci U S A. 104:7116–7121.

Liu Y, Eisenberg D. 2002. 3D domain swapping: as domains continue to swap. Protein Sci. 11:1285–1299.

Loftfield RB, Vanderjagt D. 1972. The frequency of errors in protein biosynthesis. Biochem J. 128:1353–1356.

Lowe J, Amos LA. 2009. Evolution of cytomotive filaments: the cytoskeleton from prokaryotes to eukaryotes. Int J Biochem Cell Biol. 41:323–329.

Lukatsky DB, Shakhnovich BE, Mintseris J, Shakhnovich EI. 2007. Structural similarity enhances interaction propensity of proteins. J Mol Biol. 365:1596–1606.

Lukes J, Archibald JM, Keeling PJ, Doolittle WF, Gray MW. 2011. How a neutral evolutionary ratchet can build cellular complexity. IUBMB Life 63:528–537.

Lynch M. 2007. The origins of genome architecture. Sunderland (MA): Sinauer Assocs., Inc.

Lynch M. 2010a. Rate, molecular spectrum, and consequences of spontaneous mutations in man. Proc Natl Acad Sci U S A. 107:961–968.

Lynch M. 2010b. Scaling expectations for the time to establishment of complex adaptations. Proc Natl Acad Sci U S A. 107:16577–16582.

Lynch M. 2010c. Evolution of the mutation rate. Trends Genet. 26:345–352.

Lynch M. 2011. The lower bound to the evolution of mutation rates. Genome Biol Evol. 3:1107–1118.

Lynch M, Abegg A. 2010. The rate of establishment of complex adaptations. Mol Biol Evol. 27:1404–1414.

Lynch M, Force A. 2000. The probability of duplicate-gene preservation by subfunctionalization. Genetics 154:459–473.

Lynch M, O’Hely M, Walsh B, Force A. 2001. The probability of fixation of a newly arisen gene duplicate. Genetics 159:1789–1804.

Malik HS, Henikoff S. 2003. Phylogenomics of the nucleosome. Nat Struct Biol. 10:882–891.

Marianayagam NJ, Sunde M, Matthews JM. 2004. The power of two: protein dimerization in biology. Trends Biochem Sci. 29:618–625.

Monahan BJ, Villen J, Marguerat S, Bahler J, Gygi SP, Winston F. 2008. Fission yeast SWI/SNF and RSC complexes show compositional and functional differences from budding yeast. Nat Struct Mol Biol. 15:873–880.

Neher RA, Shraiman BI. 2011. Genetic draft and quasi-neutrality in large facultatively sexual populations. Genetics 188:975–996.

Ortlund EA, Bridgham JT, Redinbo MR, Thornton JW. 2007. Crystal structure of an ancient protein: evolution by conformational epistasis. Science 317:1544–1548.

Parker J. 1989. Errors and alternatives in reading the universal genetic code. Microbiol Rev. 53:273–298.

Pereira-Leal JB, Levy ED, Kamp C, Teichmann SA. 2007. Evolution of protein complexes by duplication of homomeric interactions. Genome Biol. 8:R51.

Reid AJ, Ranea JA, Orengo CA. 2010. Comparative evolutionary analysis of protein complexes in E. coli and yeast. BMC Genomics 11:79.

Ruano-Rubio V, Fares MA. 2007. Testing the neutral fixation of hetero-oligomerism in the archaeal chaperonin CCT. Mol Biol Evol. 24:1384–1396.

Salas-Marco J, Bedwell DM. 2005. Discrimination between defects in elongation fidelity and termination efficiency provides mechanistic insights into translational readthrough. J Mol Biol. 348:801–815.

Schrider DR, Hourmozdi JN, Hahn MW. 2011. Pervasive multinucleotide mutational events in eukaryotes. Curr Biol. 21:1051–1054.

Scofield DG, Lynch M. 2008. Evolutionary diversification of the SM family of RNA-associated proteins. Mol Biol Evol. 25:2255–2267.

Semple JI, Vavouri T, Lehner B. 2008. A simple principle concerning the robustness of protein complex activity to changes in gene expression. BMC Syst Biol. 2:1.

Smith TF, Lee JC, Gutell RR, Hartman H. 2008. The origin and evolution of the ribosome. Biol Direct. 3:16.

Smits P, Smeitink JA, van den Heuvel LP, Huynen MA, Ettema TJ. 2007. Reconstructing the evolution of the mitochondrial ribosomal proteome. Nucleic Acids Res. 35:4686–4703.

Spofford JB. 1969. Heterosis and the evolution of duplications. Am Nat. 103:407–432.

Stoltzfus A. 1999. On the possibility of constructive neutral evolution. J Mol Evol. 49:169–181.

Tartaglia GG, Pechmann S, Dobson CM, Vendruscolo M. 2007. Life on the edge: a link between gene expression levels and aggregation rates of human proteins. Trends Biochem Sci. 32:204–206.

Vavouri T, Semple JI, Garcia-Verdugo R, Lehner B. 2009. Intrinsic protein disorder and interaction promiscuity are widely associated with dosage sensitivity. Cell 138:198–208.

Walsh JB. 1982. Rate of accumulation of reproductive isolation by chromosome rearrangements. Am Nat. 120:510–532.

Walsh JB. 1995. How often do duplicated genes evolve new functions? Genetics 139:421–428.

Wang M, Kurland CG, Caetano-Anolles G. 2011. Reductive evolution of proteomes and protein structures. Proc Natl Acad Sci U S A. 108:11954–11958.

Weinreich DM, Delaney NF, DePristo MA, Hartl DL. 2006. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312:111–114.

Weissman DB, Desai MM, Fisher DS, Feldman MW. 2009. The rate at which asexual populations cross fitness valleys. Theor Popul Biol. 75:286–300.

Weissman DB, Feldman MW, Fisher DS. 2010. The rate of fitness-valley crossing in sexual populations. Genetics 186:1389–1410.

Wright CF, Teichmann SA, Clarke J, Dobson CM. 2005. The importance of sequence diversity in the aggregation and evolution of proteins. Nature 438:878–881.

Yokoyama S, Tada T, Zhang H, Britt L. 2008. Elucidation of phenotypic adaptations: molecular analyses of dim-light vision proteins in vertebrates. Proc Natl Acad Sci U S A. 105:13480–13485.
