Enzyme Design
DOI: 10.1002/anie.201204077
Computational Enzyme Design
Gert Kiss, Nihan Çelebi-Ölçüm, Rocco Moretti, David Baker, and K. N. Houk*
Angewandte Chemie
Keywords: active-site design · biomolecular catalysis · non-natural reactions · protein engineering · theozymes
Angewandte Reviews · K. N. Houk et al.
Angew. Chem. Int. Ed. 2013, 52, 5700–5725 · © 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim · www.angewandte.org
1. Introduction
Life depends on protein catalysts that control and accelerate the reactions of metabolism. More than three billion years of evolution have produced enzymes that catalyze chemical transformations which, without a catalyst, would be too slow to measure under normal conditions. The most proficient enzymes accelerate reactions to turnover rates limited only by the diffusion of reactants to the catalyst. Furthermore, enzymes are selective: they accelerate specific reactions of specific substrates. They frequently depend on cofactors or coenzymes, and are often sensitive to environmental conditions (pH value, temperature, and solvent), yet operate over a wide range of these conditions, which makes them ideal components for complex and tightly regulated metabolic pathways.
Humans have taken advantage of enzymatic processes for as long as naturally occurring fermentation has been controlled to preserve foods, to make bread and cheese, or to produce alcoholic beverages. However, the word “enzyme” (from Ancient Greek “en zýmē”, “in dough/yeast”) was not coined until 1876, when the German physiologist Wilhelm Kühne chose to simplify references to “elements that are responsible for fermentation processes” by giving them a name.[1] At the time, the identity of enzymes was speculative, but in 1894 Emil Fischer proposed a “lock and key” model to explain their substrate specificity. Thirty-two years later, in 1926, James B. Sumner purified and crystallized urease and showed that enzymes are proteins in their own right. In 1946, Linus Pauling speculated that enzymes are “closely complementary in structure to the activated complex for the reaction catalyzed”,[2] a remarkable statement considering that at the time “no one [had] succeeded in determining the structure of any enzyme nor in finding out how the enzyme does its job”.[3]
The study of structure–function relationships at the atomic level remained elusive for another two decades, until the first high-resolution crystal structure of an enzyme was solved and the field of structural biology emerged.[4]
Since then, a surge of research on enzyme catalysis has continued to probe, adjust, and expand our understanding of these seemingly miraculous “nano-machines”. The factors proposed to explain the observed rate enhancements range from noncovalent transition-state (TS) stabilization (electrostatic effects, desolvation, restriction of motion, etc.) to covalent bonding (low-barrier hydrogen bonds, formation of intermediates, metal–ion interactions, etc.). Researchers have come to embrace Pauling’s hypothesis as a general statement of what is responsible for the catalytic power of natural enzymes, but the importance of preorganization and chemical catalysis is now also recognized.[5] The most proficient[6] of these catalysts offer far more than an active site that is complementary to the TS; they enter into the reaction, altering the TS and thus changing the free-energy profile from that in solution.[7] This “covalent hypothesis” explains why the vast majority of enzymes achieve TS binding constants that are orders of magnitude beyond what can be expected from noncovalent interactions alone.
Catalytic proficiency is formally the binding constant of the complex formed between the enzyme and the transition state, and was defined by Wolfenden as $K_{TX}^{-1} = (k_{cat}/K_M)/k_{uncat}$.[6] Remarkably, $K_{TX}^{-1}$ spans 21 orders of magnitude ($10^{8}$ to $10^{29}$ M$^{-1}$)[6,8] for the enzymes that have been studied to date,[6,9–12] with an average value of $10^{16.0\pm4.0}$ M$^{-1}$.[13] This average corresponds to a free energy of transition-state binding of 22 kcal mol$^{-1}$, and values can range up to 38 kcal mol$^{-1}$, much higher than a noncovalent TS binding free energy of 15 kcal mol$^{-1}$.
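To make the link between proficiency and binding free energy concrete, the Python sketch below converts a $K_{TX}^{-1}$ value into a free energy with $\Delta G = -RT \ln K$ (reported here as a positive magnitude) and back. It assumes a temperature of 298.15 K, which the text does not specify, and the helper names are illustrative only.

```python
import math

# Gas constant in kcal mol^-1 K^-1
R_KCAL = 1.987204e-3


def proficiency(kcat_over_km: float, kuncat: float) -> float:
    """Wolfenden's catalytic proficiency K_TX^-1 = (kcat/KM)/kuncat.

    kcat_over_km is in M^-1 s^-1 and kuncat (first order) in s^-1,
    so the result is in M^-1.
    """
    return kcat_over_km / kuncat


def ts_binding_free_energy(k_tx_inv: float, temperature: float = 298.15) -> float:
    """Magnitude of the TS binding free energy, RT ln(K_TX^-1), in kcal/mol."""
    return R_KCAL * temperature * math.log(k_tx_inv)


def binding_constant(free_energy: float, temperature: float = 298.15) -> float:
    """Inverse relation: the K_TX^-1 (M^-1) that corresponds to a free energy in kcal/mol."""
    return math.exp(free_energy / (R_KCAL * temperature))


# The average proficiency quoted above, 10^16 M^-1
print(f"{ts_binding_free_energy(1e16):.1f} kcal/mol")
# The binding constant that a 15 kcal/mol binding free energy would correspond to
print(f"{binding_constant(15.0):.1e} M^-1")
```

The second line shows that a 15 kcal mol⁻¹ binding free energy corresponds to a binding constant of only about $10^{11}$ M$^{-1}$, several orders of magnitude below the average enzyme proficiency.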
Recent developments in computational chemistry and biology have come together in the “inside-out” approach to enzyme engineering. Proteins have been designed to catalyze reactions not previously accelerated in nature. Some of these proteins fold and act as catalysts, but the success rate is still low. The achievements and limitations of the current technology are highlighted and contrasted to other protein engineering techniques. On its own, computational “inside-out” design can lead to the production of catalytically active and selective proteins, but their kinetic performance falls short of that of natural enzymes. When combined with directed evolution, molecular dynamics simulations, and crowd-sourced structure-prediction approaches, however, computational designs can be significantly improved in terms of binding, turnover, and thermal stability.
From the Contents
1. Introduction
2. Protein Engineering
3. The Inside-out Approach to Computational Enzyme Design
4. Computational Enzyme Design: Achievements
5. Challenges in Enzyme Design
6. Summary and Outlook
[*] Dr. G. Kiss, Dr. N. Çelebi-Ölçüm, Prof. Dr. Dr. K. N. Houk
Department of Chemistry and Biochemistry, University of California, Los Angeles
607 Charles E. Young Dr. East, Los Angeles, CA 90095 (USA)
Dr. R. Moretti, Prof. Dr. D. Baker
Department of Biochemistry and Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195 (USA)
Dr. G. Kiss
Current address: Department of Chemistry, Stanford University, Stanford, CA 94305 (USA)
Dr. N. Çelebi-Ölçüm
Current address: Yeditepe University, Department of Chemical Engineering, Istanbul (Turkey)
Chemists envision designing and synthesizing molecules with the attributes of enzymes: selective, proficient, “green”, operating in water under ambient conditions, nontoxic, and biodegradable. To do so, at least a subset of the factors above has to be considered, depending on the target reaction. Furthermore, it can be desirable to combine catalytic turnover with substrate-, stereo-, regio-, or chemoselectivity, as well as with tolerance towards organic solvents, elevated temperatures, and chemical degradation. Many different approaches of this type have been reviewed extensively; these include bioinformatics approaches,[14,15] engineering based on natural evolution,[16] host–guest and supramolecular chemistry,[13] directed evolution,[17–21] catalytic antibodies,[22–24] organocatalysis,[25] rational structure-based protein engineering,[26] and computational protein design.[27]
In this Review we describe the computational “inside-out” approach to enzyme design, and the beginnings of what we strive to develop into a robust technology for making catalysts for synthesis, biotechnology, and therapeutics. The idea behind our computational design strategy is to use biochemical building blocks (amino acids, cofactors, coenzymes, etc.) to build catalysts for nonbiological processes that can themselves be produced by microbiological techniques. The recent surge in computational power has spurred the development and testing of improved structure prediction and conformational search algorithms. Quantum mechanical methods are used to predict the arrangements of functional groups that maximize binding and stabilization of the transition states of the desired reaction. If a protein can be designed that folds into the necessary 3D geometry, catalytic conversion of non-natural chemicals into product(s) should be possible. To avoid having to predict the stability of new sequences from scratch, we incorporate the designed active site into stable protein folds. Furthermore, we borrow as many active-site components from natural precedent as is possible for the non-natural reaction or substrate.
We discuss the “inside-out” protocol, highlight approximations and bottlenecks, explore examples of successful design, examine achievements and challenges, and present cases in which variations of and additions to the original design protocol were beneficial. We conclude that molecular dynamics (MD) simulations, post-design directed evolution, and crowd-sourced redesign can lead to improved efficiencies. We begin by discussing some of the protein engineering approaches that have paved the way.
Gert Kiss studied chemistry in Heidelberg, Germany, carried out research in the lab of Noah W. Allen at UNCA, and then went on to pursue a PhD with K. N. Houk at UCLA. As a graduate student he was an NIH-CBI fellow, an LLNL Lawrence Scholar, and a recipient of the Stauffer Research Award. He is currently an NIH Simbios postdoctoral fellow with Vijay S. Pande at Stanford.
Nihan Çelebi-Ölçüm received a BS in chemistry at Boğaziçi University, Turkey, and an MS in computational and theoretical chemistry at Université Henri Poincaré in Nancy, France. She received her PhD with Viktorya Aviyente at Boğaziçi University and then carried out postdoctoral research with Kendall Houk at UCLA. Recently she joined the faculty at Yeditepe University, Turkey.
Rocco Moretti received his BS in biochemistry at Worcester Polytechnic Institute, and his PhD in biochemistry at the University of Wisconsin with Aseem Ansari and Jon Thorson. He is currently a senior fellow in the research group of David Baker at the University of Washington.
David Baker received his PhD with Randy Schekman at UC Berkeley and carried out postdoctoral studies in biophysics with David Agard at UCSF. He is currently a Professor of Biochemistry at the University of Washington and an Investigator at the Howard Hughes Medical Institute. He is a member of the National Academy of Sciences and the American Academy of Arts and Sciences.
K. N. Houk received his PhD with R. B. Woodward at Harvard, and then taught at Louisiana State University and the University of Pittsburgh before joining UCLA in 1986. He is now the Saul Winstein Chair in Organic Chemistry. He was Director of the Chemistry Division of the National Science Foundation from 1988 to 1990. He is a member of the National Academy of Sciences, the American Academy of Arts and Sciences, and the International Academy of Quantum Molecular Sciences.
2. Protein Engineering
2.1. Catalytic Antibodies
Following the pioneering work of the research groups of Lerner and Schultz in the mid-1980s,[28, 29] catalytic antibodies have been produced for a wide range of chemical transformations.[24, 30] The concept is based on Pauling’s hypothesis that enzymes provide an environment complementary in structure and electronic distribution to that of the rate-limiting TS.[2, 31] When the immune system is challenged with a hapten that resembles the key TS characteristics for a given reaction, antibodies are produced that bind the hapten and thus also the TS it mimics. Transition-state binding equates to a lowered reaction barrier and thus to an increased turnover rate compared to the uncatalyzed reaction in solution.[32] The production of catalytic antibodies takes advantage of the rapid mutation and selection against a specific antigen that are key characteristics of adaptive immune responses. The resulting binding interactions are specific and can be harnessed to catalyze non-natural reactions and to promote the conversion of non-natural substrates. Quantum mechanical computations are useful for the design of transition-state analogues (TSAs) that can serve as haptens for a given reaction.[33] Janda and co-workers reviewed strategies and challenges in the development of new haptens.[24]
Among the reactions that have been catalyzed by antibodies are Diels–Alder cycloadditions, acyl transfer reactions, oxy-Cope rearrangements, and cyclizations. Catalytic proficiencies range from $K_{TX}^{-1} = 10^{4.6}$ M$^{-1}$ to $K_{TX}^{-1} = 10^{8.6}$ M$^{-1}$, so the transition states of the reactions they catalyze are bound more strongly than the substrates ($K_M^{-1} = 10^{3.5\pm1.0}$ M$^{-1}$).[13,30] Nature’s enzymes, on the other hand, display far larger $K_{TX}^{-1}$ values, typically between $10^{12}$ and $10^{20}$ M$^{-1}$. Exceptional cases, such as ODCase and alkyl sulfatases, display $K_{TX}^{-1}$ values of $10^{24}$ M$^{-1}$ and $10^{29}$ M$^{-1}$, respectively.[8,34] The catalytic efficiency ($k_{cat}/K_M$) of such enzymes is often limited only by the diffusion rate of the substrate and ranges from $10^{4}$ to $10^{9}$ M$^{-1}$ s$^{-1}$. In comparison, catalytic antibodies fall short of this limit by 4 orders of magnitude or more ($k_{cat}/K_M = 10^{2}$–$10^{5}$ M$^{-1}$ s$^{-1}$).[13,24, 34] This shortfall has been attributed to various factors, including product inhibition,[35–37] lower binding constants,[13] lack of covalent binding and catalysis,[7] smaller buried surface area,[13] differences in evolutionary timescales,[22] and inadequacies of the immunoglobulin fold.[22]
The comparatively low stability of the immunoglobulin fold and the high cost of producing antibody catalysts further limit their applications in industrial settings. Nonetheless, there is increasing interest in their therapeutic potential, for example in neutralizing HIV-1,[38, 39] in antibody-directed enzyme prodrug therapy (ADEPT), and in the inactivation of addictive substances through antibody-mediated breakdown of drug molecules.[40]
While catalytic antibodies often suffer from product inhibition and generally cannot be programmed with elaborate arrays of catalytic functionality, they have provided researchers with an important toolkit for the study of biocatalytic processes. In the following subsections, we highlight examples and discuss the role of computations in designing and understanding antibody catalysis.
2.1.1. Diels–Alder Reaction
The first catalytic antibody for a Diels–Alder reaction (1E9) was reported by Auditor and co-workers in 1989.[41] It catalyzes the cycloaddition between tetrachlorothiophene and N-ethyl maleimide with a rate enhancement (kcat/kuncat) of 1000 M (Scheme 1); the ratio carries units of molarity because the uncatalyzed reaction is bimolecular.[42] The crystal structure reveals a mostly hydrophobic binding pocket with a single polar residue (AsnH85). Chen et al. studied 1E9 through a combination of QM calculations, docking studies, molecular dynamics simulations, and a linear interaction energy approach.[43] The active site of 1E9 provides high shape complementarity to the TS geometry as well as electrostatic interactions that favor the TS over the reactants.
The Diels–Alder cycloaddition offers the opportunity to aim beyond catalytic rate acceleration and towards stereochemical control. Gouverneur et al. demonstrated this in a spectacular way. The cycloaddition between trans-1-N-acylamino-1,3-butadiene and N,N-dimethylacrylamide affords a mixture of endo and exo stereoisomers under thermal conditions. QM calculations were used to elucidate the characteristics of the stereoisomeric transition states and to design transition-state analogues for the endo and exo pathways.[44]
Scheme 1. Antibody-catalyzed cycloaddition of tetrachlorothiophene and N-ethyl maleimide.
Antibody 13G5 catalyzes the disfavored exo-Diels–Alder reaction between methyl N-butadienyl carbamate and N,N-dimethylacrylamide ($k_{cat} = 1.20 \times 10^{-3}$ min$^{-1}$, $k_{cat}/k_{uncat} = 6.9$ M) and yields a single enantiomer in high enantiomeric excess (Scheme 2).[45] The crystal structure and QM calculations showed that AspH50 and TyrL36 account for most of the catalytic effect of 13G5, whereas AsnL91 stabilizes the ground state more than the TS and slightly retards the reaction.[45, 46] This finding suggests that AsnL91 provides a structural framework that helps the antibody orient the substrates, rather than contributing catalytically.
The absolute configuration of the product was determined experimentally to be exo-(3S,4S). The specificity of the reaction was initially explored by docking the transition state into the crystal structure of antibody 13G5.[45] MD relaxation of the antibody around the frozen TS revealed that the catalytic base (AspH50) is coordinated by one water molecule in the presence of the exo-(3S,4S) TS and by three in the presence of the exo-(3R,4R) TS,[46] and that the interaction of AspH50 with the carbamate NH group is significantly weakened in the exo-(3R,4R) pathway.
Antibody 10F11 catalyzes a retro-Diels–Alder reaction that liberates HNO with a kcat/kuncat value of 2500 (Scheme 3).[47] Inspection of the crystal structure of 10F11 suggested good shape complementarity with the TS and identified specific active-site residues (Trp, Phe, Ser) that were proposed to contribute to catalysis.[48] Density functional theory (DFT) calculations on models of the active site were used to study the interactions that can stabilize the transition state.[49]
Kim et al. reviewed and compared the noncovalent catalysis of Diels–Alder reactions by cyclodextrins, self-assembling capsules, antibodies, and RNAses, and concluded that, unlike enzyme catalysts, none of these hosts provides substantial specific binding of the transition states.[50]
2.1.2. Kemp Elimination
The first catalytic antibody for the ring opening of 5-nitrobenzisoxazole (Kemp elimination) was reported by Hilvert and co-workers in 1996 (Scheme 4).[51] Antibody 34E4 displays a kcat/KM value of $5.5 \times 10^{3}$ M$^{-1}$ s$^{-1}$, a kcat/kuncat value of $2.1 \times 10^{4}$ relative to the uncatalyzed reaction ($k_{uncat} = 3.1 \times 10^{-5}$ s$^{-1}$), and a $(k_{cat}/K_M)/k_{\mathrm{OAc^-}}$ value of $3.4 \times 10^{8}$ relative to the acetate-promoted reaction in water ($k_{\mathrm{OAc^-}} = 1.6 \times 10^{-5}$ M$^{-1}$ s$^{-1}$). Both experimental and computational investigations demonstrated that, like other proton-transfer reactions, the Kemp elimination is sensitive to the geometry in which the substrate and base are aligned.[52,53] Furthermore, when a carboxylate functions as the catalytic base, Kemp eliminations are also highly sensitive to the polarity of the solvent.[54, 55] These features, together with the simplicity of the reaction and the ease with which its progress can be monitored (UV/Vis), have made the Kemp elimination a frequently studied model for base-catalyzed biochemical transformations.
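The 34E4 constants above are related by simple arithmetic, and a rate enhancement can be translated into an apparent barrier lowering with transition-state theory, $\Delta\Delta G^{\ddagger} = RT \ln(k_{cat}/k_{uncat})$. The Python sketch below is a hedged illustration: it assumes 298.15 K and identical pre-exponential factors for the catalyzed and uncatalyzed paths, neither of which is stated in the text, and the derived kcat is computed here rather than quoted from the source.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1

# Values quoted in the text for antibody 34E4
kcat_over_kuncat = 2.1e4   # dimensionless rate enhancement
kuncat = 3.1e-5            # s^-1, uncatalyzed reaction
kcat_over_km = 5.5e3       # M^-1 s^-1
k_oac = 1.6e-5             # M^-1 s^-1, acetate-promoted reaction in water

# Turnover number implied by the two first-order constants (~0.65 s^-1)
kcat = kcat_over_kuncat * kuncat

# Second-order comparison against acetate as a general base (~3.4e8)
ratio_vs_acetate = kcat_over_km / k_oac


def barrier_lowering(rate_ratio: float, temperature: float = 298.15) -> float:
    """Apparent ddG‡ = RT ln(rate ratio) in kcal/mol, assuming equal prefactors."""
    return R_KCAL * temperature * math.log(rate_ratio)


print(f"kcat ≈ {kcat:.2f} s^-1")
print(f"(kcat/KM)/kOAc ≈ {ratio_vs_acetate:.1e}")
print(f"ddG‡ ≈ {barrier_lowering(kcat_over_kuncat):.1f} kcal/mol")
```

Recomputing the acetate ratio from the two quoted constants reproduces the reported $3.4 \times 10^{8}$, which is a quick consistency check on the kinetic data.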
2.1.3. Aldol/Retro-Aldol Reaction
QM calculations on aldol reactions date back to the 1980s[56] and offer early geometric descriptions of the transition state. Aldolase antibodies ab38C2, ab84G3, and ab33F12 were later raised against TS-analogous haptens and catalyze aldol and retro-aldol reactions with activities comparable to those of natural aldolases, but with a broader substrate scope (Scheme 5).[57–61] These antibodies resemble class I aldolases, which use the ε-amino group of an active-site lysine residue to form a Schiff base with the substrate. Polar residues line the otherwise hydrophobic active site at distances of 5 to 7 Å from the ε-amino group of LysH93. In a QM study, Arnó and Domingo investigated the role of some of these residues as potential general acid catalysts in the C–C bond-formation step.[62]
2.1.4. Decarboxylation-Catalyzed Ring-Opening Reaction
The rate of decarboxylation of 5-nitro-3-carboxybenzisoxazole varies by up to eight orders of magnitude depending on solvent polarity.[54, 55] Aprotic polar solvents promote the reaction by desolvating the carboxylate reactant and by stabilizing the transition state through dispersion interactions.[63–66] Antibody 21D8 catalyzes the decarboxylation of 5-nitro-3-carboxybenzisoxazole[67] up to 61 000-fold over the background reaction in water (Scheme 6).[68] QM, MD, and free-energy perturbation (FEP) calculations were performed to explore the origins of catalysis further,[69] and showed that partial solvation of the carboxylate group was detrimental to catalysis, but that this was countered by favorable hydrogen-bonding interactions with the isoxazole oxygen atom.
Scheme 2. Antibody-catalyzed disfavored exo-Diels–Alder reaction of methyl N-butadienyl carbamate and N,N-dimethylacrylamide.
Scheme 3. Antibody-catalyzed retro-Diels–Alder reaction.
Scheme 4. Antibody-catalyzed ring opening of 5-nitrobenzisoxazole.
2.1.5. Cyclization of trans-Epoxy Alcohols
Four different antibodies were raised against two different haptens to catalyze the disfavored endo-tet cyclization of trans-epoxy alcohols (Scheme 7).[70, 71] Quantum mechanical calculations show that the endo-tet transition state has SN1 character and can be stabilized electrostatically by a carboxylate.[72] This hypothesis was tested with QM models of motifs that use concurrent general acid/base catalysis.[73, 74] Analogous calculations for 6-exo and 7-endo cyclizations predicted preferential formation of the seven-membered product. This prediction was verified experimentally[75] and is also in line with the X-ray structure eventually obtained for antibody Fab 5C8.[71] The active site contains an AspH95–HisL89 dyad that appears to be poised for acid/base catalysis.
2.1.6. Hydrolysis of Aromatic Amides and Esters
Antibody 43C9 catalyzes the hydrolysis of aromatic amides and esters with an unusually large kcat/kuncat value of $2.5 \times 10^{5}$.[30, 76–78] Getzoff and co-workers built a computational homology model of the antibody’s variable region and proposed that ArgL96 forms the oxyanion hole, while HisL91 is the catalytic nucleophile.[79] The X-ray structure of 43C9, solved subsequently, supported these predictions and showed that a water-mediated hydrogen-bonding network in the active site is important for catalysis.[80] Kollman and co-workers later performed QM calculations, MD simulations, and free-energy calculations on 43C9 and proposed direct hydroxide attack as an alternative to the mechanism involving nucleophilic catalysis by HisL91.[81]
2.1.7. Chorismate–Prephenate Rearrangement
The Claisen rearrangement of chorismate to prephenate is catalyzed by natural chorismate mutases[82,83] and by the catalytic antibodies 1F7 and 11F1-2E11.[84, 85] Wiest and Houk used QM calculations to mimic the active site of chorismate mutase and proposed that specific hydrogen-bond donors are responsible for the approximately 200-fold rate acceleration displayed by catalytic antibody 1F7.[86]
Scheme 5. Examples of antibody-catalyzed aldol and retro-aldol reactions.
Scheme 6. Antibody-catalyzed decarboxylation of 5-nitro-3-carboxybenzisoxazole.
Scheme 7. Antibody-catalyzed disfavored endo-tet cyclization reactions of trans-epoxy alcohols. nd = not determined.
2.2. Directed Evolution
Directed or laboratory evolution has become one of the more mature forms of protein engineering and has found its way into modern industrial-scale applications.[87] It is a powerful and commonly used approach to enzyme engineering that relies on iterative cycles of mutagenesis and selection.[17,18] Examples of its application include improved thermostability,[88] tolerance to organic solvents,[89] strengthened protein–protein interactions,[90] altered substrate promiscuity/specificity,[91, 92] enhanced enzymatic activity,[93] and inversion of enantioselectivity.[94, 95] In the directed evolution of catalytic function, a starting gene is mutagenized to create a library of variants, which is screened for enzymes with an improvement in the sought-after property (stability, substrate specificity, activity, etc.). Typically, the improvement in any one round is small, and the process is repeated many times.
Strategies for the construction of libraries include random whole-gene mutagenesis, for example by error-prone PCR (Figure 1a), site-saturation or targeted mutagenesis (CASTing, ISM)[95] (Figure 1b), and the generation of chimeras through sequence recombination (Figure 1c).[19] A key strength of random mutagenesis is that no structural or mechanistic information about the enzyme is required, and that beneficial mutations can be uncovered at unexpected positions distant from the active site.[96]
Site-saturation or targeted strategies, on the other hand, focus on particular regions of an enzyme (for example, the active site) and require prior structural or biochemical knowledge about the protein. Reducing the randomized sequence space increases the probability that multiple beneficial mutations are uncovered within the active site.[97] The approach is of value when dramatic alterations of an enzyme’s function are sought or when improved function depends on a combination of active-site variations.
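How quickly the randomized space grows, and therefore why limiting it helps, can be estimated with a commonly used library-coverage calculation. This Python sketch assumes NNK degenerate codons (32 codons per randomized site) and the oversampling estimate $T = -V \ln(1 - F)$, the number of transformants $T$ needed to sample a fraction $F$ of a library of $V$ equally likely codon combinations; neither the codon scheme nor the formula is specified in the text above.

```python
import math


def library_size(n_sites: int, codons_per_site: int = 32) -> int:
    """Number of codon combinations when n_sites are randomized (NNK: 32 codons/site)."""
    return codons_per_site ** n_sites


def clones_for_coverage(n_sites: int, coverage: float = 0.95,
                        codons_per_site: int = 32) -> int:
    """Transformants to screen so that a fraction `coverage` of the library is expected
    to be sampled, using T = -V ln(1 - F) with all V combinations equally likely."""
    v = library_size(n_sites, codons_per_site)
    return math.ceil(-v * math.log(1.0 - coverage))


for n in (1, 2, 3):
    print(n, library_size(n), clones_for_coverage(n))
```

Each additional randomized site multiplies the screening effort by roughly 32, so one saturated site needs about a hundred clones while three simultaneous sites already need about a hundred thousand.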
Beneficial mutations within a library can be identified, for example, through statistical analysis of protein sequence–activity relationships (ProSAR),[98] and then combined and incorporated by gene shuffling.[99] Molecular and functional diversity can be further expanded with neutral drift libraries, in which mutations that are neutral with respect to the function and stability of the enzyme are accumulated.[100,101]
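The core idea of sequence–activity analysis, attributing measured activity changes to individual mutations across many variants, can be sketched with an additive model fitted by ordinary least squares on log-activity. This is an assumption-laden simplification (ProSAR itself uses more elaborate statistical models); the NumPy-based helper is hypothetical, and the screening data are invented for illustration.

```python
import numpy as np


def fit_additive_effects(variants, activities, mutations):
    """Fit log(activity) = baseline + sum of per-mutation contributions by least squares.

    variants:   list of sets, each holding the mutation labels present in one variant
    activities: measured activities (must be positive, since they are log-transformed)
    mutations:  ordered list of every mutation label to be scored
    Returns the fitted baseline and a dict of per-mutation log-effects.
    """
    # Design matrix: an intercept column plus one 0/1 column per mutation
    X = np.array([[1.0] + [1.0 if m in v else 0.0 for m in mutations] for v in variants])
    y = np.log(np.asarray(activities, dtype=float))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], dict(zip(mutations, coef[1:]))


# Hypothetical screening data for three mutations (illustrative numbers only)
variants = [set(), {"A"}, {"B"}, {"C"}, {"A", "B"}, {"A", "C"}]
activities = [1.0, 2.0, 0.5, 1.0, 1.0, 2.0]
baseline, effects = fit_additive_effects(variants, activities, ["A", "B", "C"])
for name, effect in sorted(effects.items(), key=lambda kv: -kv[1]):
    print(name, f"{np.exp(effect):.2f}x")
```

Ranking mutations by fitted effect separates beneficial (A), neutral (C), and deleterious (B) changes, which is the kind of information used to decide which mutations to recombine; real data would also contain noise and epistatic interactions that this additive model ignores.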
A key challenge for directed evolution is identifying the individual variants that display the desired improvements within a large set of randomized protein sequences.[20] Selection-based in vitro techniques, such as mRNA display[102] and emulsion-based microfluidic FADS (fluorescence-activated droplet sorter),[103] offer substantial throughput. Screening-based techniques that measure substrate or product concentrations are the most versatile, but are also more limited in throughput.[104] Once a genotype–phenotype link is established, directed evolution can work with all biologically produced proteins, including those that contain non-natural amino acids or non-natural prosthetic modifications.[105]
Some experiments have involved completely naive starting points, but directed evolution works best for enzymes that display some level of activity towards the desired reaction or towards a highly similar one.[19] The success of a directed evolution program depends on a clear, uphill path from the starting point to a highly active variant,[105] yet most protein sequences do not display the desired initial activity. This challenge can be partly overcome with neutral drift libraries and by gradually shifting the selective pressure from the existing function to the desired one.[20, 21]
Many attempts have been made to engineer and redesign proteins and enzymes over the past few decades. Those that met with success employed variations of directed evolution, ranging from random mutagenesis to semirational or focused library-generating strategies and, in specific cases, sophisticated statistical selection such as ProSAR. Two recent examples are the asymmetric synthesis of chiral amines for the industrial production of the type-2 diabetes drug sitagliptin (Januvia)[106] and the oxidative desymmetrization of a prochiral amine for the production of the hepatitis C drug boceprevir.[107] In other cases, computational approaches have significantly advanced our understanding of the mechanism by which directed evolution can change the enantioselectivity of an enzyme.[108, 109] In the past 5 years alone, over 60 articles have reported enhancing the thermostability, substrate and cofactor specificity, enantioselectivity, and reaction rate of natural enzymes. Many of these enzymes were engineered for applications in asymmetric organic synthesis, and include transaminases, enoate reductases, esterases, monoamine oxidases, dehalogenases, and aldolases, as well as cytochrome P450s (oxidations and epoxidations) and Baeyer–Villiger monooxygenases. The topic has recently been the subject of several excellent reviews.[87,110–113]
2.3. Natural Evolution and Enzyme Redesign
Nature has experimented with ways to generate new catalytic functions for billions of years. Studying these strategies can provide insights for making educated mutations to native active sites with the goal of eliciting new functions. Enzymes that belong to mechanistically diverse superfamilies are valuable starting points for such redesign efforts, particularly when their structures are conserved across the superfamily. Members of such superfamilies are also frequently promiscuous, and one enzyme often catalyzes a number of chemical transformations, albeit at much lower rates than its physiological reaction.[16, 114]
Figure 1. Three strategies for creating protein libraries by directed evolution. a) Random mutagenesis across the full sequence. b) Targeted mutagenesis that is focused on a specific site. c) Protein sequence recombination for the replacement of entire segments. Reprinted from Ref. [19], with permission.
One successful redesign approach has been to enhance promiscuous functions based on sequence and structure alignments. Here, the redesign starts from a template enzyme with innate activity for the target reaction, and information on a naturally existing enzyme that is known to promote the target reaction is applied to redesign the template. Fersht and co-workers, for example, compared the sequence of N-acetylneuraminate lyase (NAL) with that of the homologous dihydrodipicolinate synthase (DHDPS) and identified a Leu–Arg mismatch in the active site.[115] The Leu142Arg mutant was made (along with a number of mutations that stabilize the new Arg). This switched the activity of NAL from its native retro-aldol cleavage (N-acetylneuraminate to pyruvate and N-acetyl-D-mannosamine) to that of DHDPS (condensation of pyruvate with L-aspartate-β-semialdehyde). The native retro-aldol activity of NAL was abolished, while its innate DHDPS activity increased eightfold. Similarly, a Leu to Arg mutation switched the physiological activity of 4-oxalocrotonate tautomerase (4-OT) to that of trans-3-chloroacrylic acid dehalogenase (CaaD).[116] Structural studies revealed only minor geometric changes. The kcat value for the CaaD activity of 4-OT increased 9-fold, and kcat/KM increased 50-fold. A somewhat more ambitious study introduced four mutations into the active site of keto-L-gulonate 6-phosphate decarboxylase (KGPDC) to increase the rate of its promiscuous D-arabino-hex-3-ulose 6-phosphate synthase (HPS) activity 170-fold.[117, 118]
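The alignment-based step in these redesigns, spotting active-site positions where a template differs from a homologue that performs the target reaction, can be sketched as a simple scan over a precomputed alignment. In this Python sketch the sequences are toy, hypothetical strings (not NAL or DHDPS), the alignment is assumed to have been produced beforehand with a separate tool, and the active-site columns are supplied by hand.

```python
def active_site_mismatches(template: str, reference: str, positions) -> list:
    """List aligned active-site columns where template and reference residues differ.

    template, reference: pre-aligned sequences of equal length ('-' marks a gap)
    positions: 1-based alignment columns assumed to line the active site
    Returns candidate mutations in <from><position><to> notation.
    """
    if len(template) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    labels = []
    for pos in positions:
        t, r = template[pos - 1], reference[pos - 1]
        # Skip identical columns and columns where either sequence has a gap
        if t != r and t != "-" and r != "-":
            labels.append(f"{t}{pos}{r}")
    return labels


# Toy, hypothetical sequences (not NAL or DHDPS)
template = "MKTLAYGSE-NV"
reference = "MKTRAYGAEQNV"
print(active_site_mismatches(template, reference, positions=[4, 8, 10]))
```

The labels use alignment columns, which match residue numbers only when no gaps precede them in the template; mapping back to true residue numbering (as in Leu142Arg) would need an extra step. Each mismatch is only a candidate: as the NAL example shows, a productive switch may also require stabilizing mutations around the new residue.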
Although similar to the above, the redesign of an enzyme towards a reaction for which it has no promiscuous activity is a greater challenge. Here too, sequence and structure alignments with members of the same superfamily form the basis for redesign. Gerlt and co-workers combined rational mutation with directed evolution to redesign L-Ala-D/L-Glu epimerase (AEE) and muconate lactonizing enzyme (MLE).[119] The efforts were aimed at introducing OSBS (o-succinylbenzoate synthase) activity into AEE and MLE, neither of which shows promiscuity towards the OSBS reaction. The feat was achieved by altering the substrate specificity: a single mutation (Asp→Gly and Glu→Gly, respectively) allows AEE and MLE to accept the OSBS substrate, which readily reacts with the unchanged catalytic residues to give o-succinylbenzoate. Ohta and co-workers went a step further and generated α-arylpropionate racemase activity in the homologous aryl malonate decarboxylase (AMD) by introducing a catalytic acid/base into the active site (Gly74Cys).[120] In a separate study, the enantioselectivity of the same decarboxylase was inverted by using a double mutant (Gly74Cys, Cys188Ser).[121] The mutant gives (R)-α-thienylpropionate in 60% yield and 84% ee, but is approximately 600-fold less active than wild-type AMD. Random mutagenesis improved kcat 10-fold, narrowing the gap to the wild-type AMD to a 60-fold drop in activity.[122]
Dunaway-Mariano and co-workers impressively showed that function can be transplanted within the crotonase superfamily by replacing a His–Asp dyad with a Glu–Glu acid/base pair.[123] Two glutamates were introduced into the 4-CBA-CoA dehalogenase active site, and six additional mutations were necessary to give a fully soluble and stable protein. With a kcat value of 0.064 s−1, the activity of the octamutant is far below that of the wild-type crotonase (kcat = 1000 s−1), but the exercise shows that "an entirely new catalytic pathway can be created at the expense of the pre-existing pathway through a limited number of amino acid substitutions".
The redesign of enzyme superfamily members is useful in deciphering the strategies and principles that guide the natural evolution of catalytic biomolecules. The chemical versatility that is accessible to the protein engineer by this route, however, is limited to the generally narrow range of reaction types within a superfamily. The crotonase superfamily (CS) is a notable exception in which "nature has varied common structural features to evolve catalysts for a remarkably diverse set of reactions"[124] spanning all six classes of reaction defined in the Enzyme Commission (EC) classification scheme.
2.4. Rational and De Novo Protein Design
2.4.1. Design and Prediction of Protein Folds
More drastic engineering approaches that are based on a variety of computational techniques have led to the redesign of entire proteins. Early work in the field focused on the redesign of helical bundles[125] and employed strategies that aim at generating specific hydrophobic/hydrophilic patterns, a primary determinant of the orientation and register of helical bundles. The approach gave some control over the formation of a fold without requiring the prediction of specific side-chain orientations.[126–128] Further work extended the computational design approach to protein structures with less-regular geometries.[129] The general applicability of computational protocols, such as that of RosettaDesign, was tested by re-engineering a diverse set of nine small globular proteins.[130] The computational design of proteins of complex topology is assisted by techniques such as dead-end elimination and Monte Carlo sampling that attempt to pack side chains in their minimal-energy positions. The scope of computer-based engineering is not limited to the redesign of existing topologies. Kuhlman et al., for example, iterated between sequence design and structure prediction to access novel protein folds, and produced the Top7 α/β topology.[131] In contrast to previous design procedures that treated the backbone as rigid and required a vast conformational space to be sampled, the design of Top7 was possible in part because of a flexible-backbone minimization step in the iterative protocol.
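To make dead-end elimination concrete, the following toy sketch implements the original (Desmet) criterion: a rotamer r at a position is discarded if some alternative rotamer t at the same position gives a lower energy even when r is evaluated in its best possible environment and t in its worst. It assumes a pairwise-decomposable energy supplied as plain lookup tables; the function name and data layout are illustrative assumptions, not those of RosettaDesign or any other package.

```python
def dead_end_eliminate(self_e, pair_e):
    """Iteratively prune rotamers with the original (Desmet) DEE criterion.

    self_e[i][r]         : self energy of rotamer r at position i (toy values)
    pair_e[(i, j)][r][s] : pair energy of rotamer r at i with rotamer s at j, i < j
    Returns a dict mapping each position to the set of surviving rotamer indices.
    """
    n = len(self_e)
    alive = {i: set(range(len(self_e[i]))) for i in range(n)}

    def pair(i, r, j, s):
        # Pair tables are stored once per unordered pair (i < j).
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    def best_case(i, r):
        # Lowest energy rotamer r can possibly reach given the surviving rotamers.
        return self_e[i][r] + sum(
            min(pair(i, r, j, s) for s in alive[j]) for j in range(n) if j != i)

    def worst_case(i, t):
        # Highest energy rotamer t can possibly reach given the surviving rotamers.
        return self_e[i][t] + sum(
            max(pair(i, t, j, s) for s in alive[j]) for j in range(n) if j != i)

    changed = True
    while changed:  # repeat, because each pruning step can enable further ones
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                if any(best_case(i, r) > worst_case(i, t) for t in alive[i] if t != r):
                    alive[i].discard(r)
                    changed = True
    return alive


# Toy example: rotamer 1 at position 0 is always 5 units worse, so it is pruned.
print(dead_end_eliminate([[0.0, 5.0], [0.0, 0.0]], {(0, 1): [[0.0, 0.0], [0.0, 0.0]]}))
# {0: {0}, 1: {0, 1}}
```

A pruned rotamer can never be part of the global minimum-energy conformation, so an exhaustive or stochastic search afterwards only needs to explore the surviving rotamers.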
2.4.2. Protein–Protein Interactions
Computational approaches have also been employed for the design of protein–protein interactions. Huang et al. achieved micromolar binding affinities by using a design and minimization approach in which the best amino acid identities and rotamers were predicted for the protein–protein interface.[132] Improved affinities were obtained when naturally occurring protein–protein interfaces were used as a guide.[90]
Key residues that are thought to account for the bulk of the binding affinity are chosen as "hot spots" and placed in locations likely to maximize binding. The rest of the interface is then "filled in" to maximize packing around these key interactions; this approach yielded a binding affinity of 130 nM.
The somewhat more challenging task of designing a single binder to a fixed, biologically relevant partner gave rise to a computational design with a binding affinity of 200 nM.[133] Directed evolution further improved the binding constants to 180 pM and 4 nM. Analysis of the mutations suggests that the computational designs could be improved by accounting for backbone flexibility and by using better electrostatic and solvation models.[133] More recently, DeGrado and co-workers utilized their computational design approach CHAMP (computed helical anti-membrane protein method)[134] to produce a helical peptide that targets a transmembrane helix of the integrin αIIbβ3.[135] The DeGrado research group further showcased the utility of computational design approaches by generating helical protein assemblies along carbon nanotubes.[136]
2.4.3. DNA Binders
The design of DNA binders is another interesting direction of computational protein design. One technique is to combine preexisting DNA-binding modules by redesigning the intermodule interfaces, thereby reducing the problem to the design of protein–protein interactions.[137] More targeted changes were made in the computational redesign of homing endonucleases that can recognize a single base-pair difference.[138,139] The design for recognition of multiple base-pair changes has also been demonstrated.[140] The simultaneous introduction of multiple adjacent base-pair changes proved more successful than a stepwise combination of mutations from individual base-pair changes. The design of such sequence-specificity changes has been used in the case of the homing endonuclease I-AniI to probe the role of DNA sequence in binding and catalysis.[141]
2.4.4. Protein–Ligand Interactions
Early work on protein–small-molecule binding appeared promising, with reports of binders for metals,[142,143] lactate,[144] serotonin,[144] TNT,[144] and nerve agents.[145] However, doubts arose when the periplasmic binding proteins designed for lactate, serotonin, TNT, and nerve agents did not show ligand binding when assayed by isothermal titration calorimetry (ITC) or NMR spectroscopy.[146] It is thought that the initial reports of success may have arisen from reliance on an indirect, environmentally sensitive fluorescence-based readout. While not a solved problem, some progress has been made on protein–small-molecule binding. Boas and Harbury applied computational design to periplasmic binding proteins and found that native-site recapitulation required high-resolution rotamer sampling, continuous minimization, and accurate electrostatic calculations.[147,148] DeGrado and co-workers were able to create an α-helical bundle that binds a heme-like cofactor.[148] Recent attempts at redesigning a dipeptide binder, on the other hand, were unsuccessful, presumably because the flexibility of the binding site was inadequately accounted for.[149]
2.4.5. Catalytic Peptides and Proteins
Early examples of the de novo design of chemical functions include, but are not limited to, the following studies:
Johnsson et al. designed a metal-free oxaloacetate decarboxylase (oxaldie) that operates through an imine mechanism by incorporating a reactive amine onto an amphiphilic α-helix.[150] Designed oxaldies catalyze the decarboxylation of oxaloacetate with a kcat/KM value of 0.63 M−1 s−1. The rate of imine formation was found to be three to four orders of magnitude larger with oxaldie than with simple amine catalysts and comparable to that of catalytic antibodies (10³–10⁶).[151]
Sasaki and Kaiser designed "helichrome", an artificial hemeprotein in which four amphiphilic α-helices were covalently tethered to one face of a porphyrin ring to create a hydrophobic pocket for substrate binding. The Fe(III) complex of helichrome showed hydroxylase activity and converted aniline into p-aminophenol in the presence of NADPH with a kcat/KM value of 1.67 M−1 s−1.[152]
Broo et al. designed a helix–loop–helix hairpin motif that dimerizes to form four-helix bundles and that utilizes histidine residues to catalyze the acyl-transfer reaction of activated esters.[153] Rossi et al. inserted two and four copies of the artificial triazacyclononane amino acid into three distinct helix–loop–helix peptides. They generated Zn(II)-binding sites capable of accelerating the transesterification of an RNA model substrate up to 380-fold.[154]
Dutton and co-workers used the tryptophan and tyrosine radical maquettes α3W1 and α3Y1[155] as models of radical enzymes to study how side-chain radicals are generated, controlled, and directed towards catalysis.[156] In more recent work, Pecoraro and co-workers utilized α3D as a scaffold for the placement of three cysteine residues that are capable of binding the heavy-metal ions Cd(II), Hg(II), and Pb(II).[157]
DeGrado and co-workers described the catalysis of an O2-dependent phenol oxidase reaction by de novo diiron model proteins based on the four-chain heterotetrameric helical bundle DFtet.[158] The most active variant catalyzes the oxidation of 4-nitrophenyl acetate with a 1000-fold rate enhancement.
Bolon and Mayo used a "compute and build" strategy to incorporate hydrolase activity into a catalytically inert E. coli thioredoxin scaffold. The resulting PZD2 utilizes a nucleophilic histidine to accelerate the hydrolysis of p-nitrophenyl acetate 180-fold.[159]
3. The Inside-out Approach to Computational Enzyme Design
In recent years, computational algorithms have become increasingly reliable for identifying amino acid sequences compatible with a target tertiary structure. Efforts towards solving the inverse protein folding problem[160–163] reached a milestone with the design and successful experimental proof of the structure of the 93-residue α/β protein Top7.[131] This showed that, for an arbitrary fold, it is possible to use computational methods to predict sequences that would produce that stable fold. While a great deal remains to be done in this area, another great challenge is to create functional proteins that can promote non-natural chemical reactions.
A collaborative effort between the research groups of Baker and Houk has led to the development of an "inside-out" protocol towards this goal (Figure 2). At the core of the computational design protocol is a theoretical active site (theozyme, Figure 2, top panel) with the appropriate functionality for catalysis. Here, quantum mechanical (QM) calculations are employed to determine the catalytic units that will be most effective at stabilizing the transition state (TS) in a precise geometrical arrangement. Protein scaffolds are selected from the PDB (http://www.rcsb.org)[164] and are used as templates into which the QM transition-state geometry is grafted (RosettaMatch,[165] Figure 2, center panel). Amino acid residues surrounding the QM theozyme are mutated and optimized to ensure good packing and fold stability, and to complement the geometric and electronic features of the TS (RosettaDesign, Figure 2, bottom panel).
3.1. Theozymes
In the first step of the inside-out design protocol, QM calculations are carried out to generate three-dimensional arrangements of functional groups that are optimal for stabilizing the TS of the targeted reaction.[166] A theozyme (short for theoretical enzyme) is typically constructed from an array of amino acid side chains and backbone amides, but the incorporation of unnatural amino acids and cofactors can further expand the chemical space. For a given reaction, a number of distinct theozyme motifs are usually generated, which differ in the composition of their functional groups. The energy profile of each motif is computed, and the magnitude of catalysis is assessed. The theozyme motifs are further diversified geometrically by producing an ensemble of conformations without disrupting the catalytic interactions.
3.2. Incorporating Theozymes into Protein Scaffolds
RosettaMatch has been used to search the native active sites of existing protein structures for backbone positions that can accommodate the three-dimensional side-chain arrangement in a theozyme. The program "matches" the theozyme motif into the pocket by sequentially attaching each side chain of the theozyme to the backbone of the protein scaffold. Side-chain rotamers are generated for every position in the scaffold active site to which the functional groups of the theozyme are mapped. An ideal match is one in which the exact three-dimensional geometry of the theozyme can be realized. Deviations from the optimum geometry by just a few tenths of an ångström and a few degrees can lead to energetic penalties of up to 5 kcal mol−1, which translates to roughly four orders of magnitude in the reaction rate (kcat). In practice, an ideal match has not yet been obtained for any of the designed enzymes, a circumstance that can be attributed to the discrete nature of both the protein backbone and the primary matching algorithm, as well as to the computational cost of mapping out conformational space. Matching thus quickly becomes a bottleneck in the computational design protocol, particularly when a theozyme invokes three or more catalytic residues. Hence, an exact search typically does not yield a single match, and it becomes necessary to assign tolerance values to catalytic distances, angles, and dihedrals. The resulting matches are generally distorted from the theozyme geometry and necessitate some form of geometric filtering and ranking according to their theozyme-likeness. A useful utility for this purpose is EDGE (enzyme design geometry evaluation), which uses geometric hashing to compare theozyme atoms with a target structure and ranks matches based on the summation of their deviations.
SABER (selection of active/binding sites for enzyme redesign), a program developed by Houk and co-workers, offers an alternative to RosettaMatch: instead of placing theozymes into predefined active sites, SABER searches the Protein Data Bank (PDB) for proteins with the appropriate catalytic functionality already in place. When a suitable active site is found, only those amino acid residues need to be mutated that are required to accommodate the new substrate in its transition-state geometry. This stands in contrast to the RosettaMatch-based approach, where both the new catalytic functionality and the new substrate must be accommodated, which generally requires more mutations than the SABER-based approach.
Figure 2. Key steps in the computational inside-out design protocol (shown here for the Kemp elimination): from QM theozyme, to match, to design.
3.3. Active-Site Design
After the theozyme has been attached to a scaffold protein, either by RosettaMatch or by SABER, the RosettaDesign module is used to restrain the catalytic residues and to generate an optimal sequence/structure for the remainder of the active site. Rotamer sampling by Monte Carlo simulated annealing is used to optimize the identity and position of active-site residues, both in terms of their interactions with the theozyme and with each other. To further refine the active site, this rotamer sampling is performed for multiple rounds, interspersed with minimization of the side chains, backbone, substrate conformation, and rigid-body position. Throughout the process, the theozyme geometry is enforced through restraints. To ensure that the resulting sequence is intrinsically compatible with the theozyme, rather than being externally forced, a final cycle of repacking and minimization without the geometric restraints is commonly run.[167] Ideally, these steps lead to the introduction of amino acid residues that add interactions to stabilize the positions of the key catalytic residues, tune their pKa values, and optimize transition-state binding. In practice, each match that enters the active-site design stage contains a theozyme that is already significantly distorted compared to the ideal QM TS geometry. RosettaDesign is then tasked with generating the best possible stabilization for a geometry that is itself non-ideal. While RosettaDesign attempts to constrain the design to the ideal theozyme, as specified by the geometric restraints, even the highest-ranked final designs normally differ quite considerably from the original theozyme geometry. Figure 3 illustrates this point with four Kemp elimination designs that are superimposed onto the catalytic heavy atoms of their theozyme. The individual side chains cluster together in their general three-dimensional arrangements (Figure 3a) but lack the precise positioning that naturally evolved enzymes display within a catalytic class (Figure 3b).[168]
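RosettaDesign's packer is considerably more elaborate, but the core loop of Monte Carlo simulated annealing over rotamer choices can be sketched as follows. This is a toy under stated assumptions: one rotamer per position, a pairwise-decomposable energy given as lookup tables, a geometric cooling schedule, and made-up default parameters. None of these names or values come from Rosetta.

```python
import math
import random


def total_energy(conf, self_e, pair_e):
    """Sum of self and pair energies for one rotamer assignment (toy energy model)."""
    n = len(conf)
    e = sum(self_e[i][conf[i]] for i in range(n))
    e += sum(pair_e[(i, j)][conf[i]][conf[j]] for i in range(n) for j in range(i + 1, n))
    return e


def anneal_rotamers(self_e, pair_e, steps=20000, t_start=5.0, t_end=0.05, seed=0):
    """Monte Carlo simulated annealing over one rotamer choice per position.

    self_e[i][r] and pair_e[(i, j)][r][s] (i < j) are assumed pairwise energies.
    Returns the lowest-energy assignment visited and its energy.
    """
    rng = random.Random(seed)
    n = len(self_e)

    def pair(i, r, j, s):
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    def local(i, r, conf):
        # All energy terms that involve position i when it carries rotamer r.
        return self_e[i][r] + sum(pair(i, r, j, conf[j]) for j in range(n) if j != i)

    conf = [rng.randrange(len(e)) for e in self_e]
    e_cur = total_energy(conf, self_e, pair_e)
    best, e_best = list(conf), e_cur
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(n)
        r_new = rng.randrange(len(self_e[i]))
        d_e = local(i, r_new, conf) - local(i, conf[i], conf)
        # Metropolis criterion: always accept downhill moves, uphill ones with exp(-dE/T).
        if d_e <= 0 or rng.random() < math.exp(-d_e / t):
            conf[i] = r_new
            e_cur += d_e
            if e_cur < e_best:
                best, e_best = list(conf), e_cur
    return best, e_best
```

Because the energy is pairwise decomposable, each move only needs the terms that involve the changed position, which is what keeps this kind of sampling cheap enough to repeat over many rounds.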
3.4. Filtering, Ranking, and Evaluating Computational Designs
Prior to the experimental workup, final designs are assessed for their capability to stabilize the key catalytic residues. They are ranked on the basis of empirical criteria such as Rosetta energy, ligand-binding scores, hydrogen bonding, active-site geometry, and packing scores. Comparison with the original scaffold protein plays an important role, as the native context serves as a reference for what a well-folded protein looks like. Thus far, assessing the quality of final designs has relied heavily on the chemical intuition of the human designer, both for judging how "enzyme-like" prospective designs are and for capturing properties that are currently not accounted for by the Rosetta scoring framework. Nature's catalytic units are generally supported by frameworks of hydrogen bonds, steric packing, π–π stacking, limited dynamics, and limited water accessibility. At present, some of this is implicitly accounted for through various energy scores that penalize poor interactions and reward good ones throughout the design and repacking process. The explicit provision of supporting interactions for the catalytic unit can be viewed as a second, and in a sense more challenging, stage in the design process, for which we are only now beginning to establish automated protocols. Increasingly, tools such as Foldit, EDGE, various in-house scripts, and more rigorous computational tests that probe the dynamics of the systems are being developed and refined with the goal of maximizing the success rate, particularly as more challenging reactions are pursued.
The design of Kemp eliminases and retro-aldolases, for example, was carried out with the first version of RosettaMatch and RosettaDesign. The scaffold set consisted of only 87 proteins, only a discrete matching algorithm was available, and the backbones of the proteins were treated as rigid. The assessment of designs was performed by manual inspection of the optimized final structures. The design of proteins towards catalysis of a bimolecular Diels–Alder cycloaddition was carried out with an updated version of Rosetta, using the discrete matching algorithm against a scaffold set of 227 proteins. Final designs were assessed both by manual inspection of the optimized geometries and by molecular dynamics simulations.
The current collection of Rosetta modules (Rosetta3) extends the scaffold set to the entire PDB, introduces a secondary, nondiscrete matching algorithm that complements the primary one, and allows a small degree of backbone plasticity in response to a new active-site sequence. MD simulations were found to be valuable for assessing the structural integrity of a newly designed active site and for exposing design flaws that are intractable from static evaluations.[169]
Figure 3. Geometric overlay of catalytic atoms. a) Theozyme (black/orange) over four final Rosetta designs in the TIM barrel fold (light green). The catalytic heavy atoms are highlighted as spheres. RMSD values of KE designs with the His–Glu/Asp dyad compared to the theozyme: 1.2 Å for KE70 (second most active), 0.8 Å for KE38 (inactive), 0.8 Å for KE54 (inactive), 0.5 Å for KE66 (inactive). b) Catalytic triad from esterase crystal structures; RMSD = 0.45 Å within the same fold.[168]
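RMSD values such as those quoted for Figure 3 are typically computed after an optimal rigid-body superposition of corresponding atoms. A minimal NumPy sketch of the Kabsch algorithm is given below, assuming the catalytic heavy atoms of a design and of its theozyme have already been paired row by row; it is an illustration of the technique, not necessarily the exact procedure used to produce Figure 3.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two N x 3 coordinate sets after optimal rigid superposition.

    Rows of p and q must refer to the same atoms in the same order
    (e.g., catalytic heavy atoms of a design and of the theozyme).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # The SVD of the cross-covariance matrix yields the optimal rotation.
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ r.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Because translation and rotation are removed first, the RMSD reflects only differences in the internal arrangement of the catalytic atoms, which is the quantity the overlay in Figure 3 is meant to convey.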
The sequence of a final design generally differs by 10% or more from that of the wild-type template protein. Depending on the degree of the perturbation, the packing and hydrogen-bonding interactions within the modified protein are expected to be less ideal than those of the wild-type scaffold protein. Cycling through repacking and geometry optimization during the design process ensures that the overall conformation of the new protein is at a local minimum of its potential energy landscape. However, neighboring minima may exist (corresponding to alternative conformations of side chains and loops) that could have become thermodynamically more favorable in the design process. The actual conformational state of a designed active site might thus differ significantly from that of the computational model, a possibility that can readily be investigated through molecular dynamics (MD) simulations. MD evaluations are now performed on a routine basis for finalized designs as a means to pinpoint structural weaknesses and to guide adjustments in the form of additional and/or alternative mutations.
3.5. Experimentation
Aside from the source of the genes (chemical synthesis versus cloning), experimental validation of computationally designed enzymes is much the same as activity measurements for any other enzyme.
3.5.1. Synthesis and Expression
In the case of the retro-aldolases,[170] the Kemp eliminases,[170,171] and the Diels–Alderases,[172] the final optimized protein sequences were sent to a commercial gene synthesis company for typical codon optimization and cloning into a standard His-tagged E. coli expression vector. E. coli BL21(DE3) cells were then transformed with the plasmid, and the gene was expressed under conventional IPTG or autoinduction conditions.[173] Soluble protein can thus be obtained by conventional IMAC purification,[174–176] along with gel filtration.
3.5.2. Enzyme Assays
One potential complication in assaying the activity of a designed enzyme is the low activity of most of the initial variants. Assays that can detect activity slightly above background levels are thus preferred to identify these weak catalysts. Such assays are selected on the basis of the target reaction. The Kemp eliminases of Röthlisberger et al. and the retro-aldolases of Jiang et al. were designed for reactions with a spectrophotometric shift, and continuous monitoring by UV/Vis spectroscopy over the course of more than 10 min or 40 h, respectively, allowed for the detection of product formation. In contrast, the Diels–Alderases of Siegel et al. were designed for a reaction that was spectrophotometrically silent, so product formation was monitored by LC-MS, with time points taken over the course of several days. In this case, a chiral LC-MS assay allowed for further characterization of the stereospecificity of the reaction, which showed that the catalyst was specific for the product configuration selected at the theozyme stage.
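For continuous spectrophotometric assays like those described above, an initial velocity can be read off the early, linear part of the absorbance trace with the Beer–Lambert law; if the substrate concentration is well below KM, an apparent kcat/KM follows after subtracting the uncatalyzed background rate. The sketch below assumes a 1 cm path length and a user-supplied molar extinction coefficient; the function names and all numbers in it are hypothetical and are not values from the cited studies.

```python
import numpy as np


def initial_rate(times_s, absorbance, epsilon_per_m_cm, path_cm=1.0, n_points=None):
    """Initial velocity (M/s) from a continuous absorbance trace via Beer-Lambert.

    Fits a straight line to the first n_points samples (all, if None) and
    converts dA/dt into d[P]/dt = (dA/dt) / (epsilon * l).
    """
    t = np.asarray(times_s, dtype=float)
    a = np.asarray(absorbance, dtype=float)
    if n_points is not None:
        t, a = t[:n_points], a[:n_points]
    slope, _intercept = np.polyfit(t, a, 1)
    return slope / (epsilon_per_m_cm * path_cm)


def apparent_kcat_over_km(v_enzyme, v_background, enzyme_m, substrate_m):
    """Apparent kcat/KM (M^-1 s^-1); only valid when [S] << KM, where v ~ (kcat/KM)[E][S]."""
    return (v_enzyme - v_background) / (enzyme_m * substrate_m)


# Hypothetical trace: 10 min sampled every 10 s, product formed at 1e-8 M/s,
# with an assumed extinction coefficient of 1.0e4 M^-1 cm^-1.
t = np.linspace(0.0, 600.0, 61)
a = 0.05 + 1.0e-8 * 1.0e4 * t
v = initial_rate(t, a, epsilon_per_m_cm=1.0e4)
print(apparent_kcat_over_km(v, v_background=2.0e-9, enzyme_m=1.0e-6, substrate_m=1.0e-4))
```

Subtracting a background trace measured without enzyme matters most for the weak initial designs, whose catalyzed rates may be only slightly above the uncatalyzed reaction.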
3.5.3. Directed Evolution
Typically, the initial successful designs have low activity. This low starting activity has been further improved through multiple rounds of directed evolution. A combination of random mutagenesis and targeted diversification has yielded improved activities. Further computational analysis also fed into this work, allowing for the selection of potential mutations (including insertions) to incorporate during the rounds of selection. Specific examples are highlighted in Section 4.2.3.
4. Computational Enzyme Design—Achievements
4.1. Retro-Aldolases
The design of a novel retro-aldolase is the first example in which the computational inside-out approach was employed to construct a functional active site. The resulting retro-aldolases catalyze C−C bond cleavage in 4-hydroxy-4-(6-methoxy-2-naphthyl)-2-butanone.[170] Analogous to the strategy used by type I aldolases, the reaction mechanism invokes a nucleophilic lysine and the formation of an iminium intermediate (Scheme 8).[177]
The computational designs are based on four distinct theozymes (Figure 4). They feature a lysine that forms a Schiff base and a general acid/base (I: Lys/Asp dyad, II: Tyr, III: His/Asp dyad, IV: H2O) for deprotonation of the β-alcohol. The charged side-chain (Lys-Asp-Lys) mediated proton-transfer scheme in motif I is analogous to that of D-2-deoxyribose-5-phosphate aldolase.[178] Motifs II, III, and IV mimic the active sites of catalytic antibodies, in which a lysine is placed into a hydrophobic pocket to lower its pKa value.
Scheme 8. Steps in the amine-catalyzed retro-aldol reaction of 4-hydroxy-4-(6-methoxy-2-naphthyl)-2-butanone.
The geometries of the four active-site motifs were obtained from QM theozyme calculations, which were carried out for every step along the retro-aldol reaction path. The transition-state geometries were then combined to generate a composite active site that carries the geometric information of the complete reaction profile. The resulting consensus theozymes of the four motifs were further diversified by varying a) the internal degrees of freedom of the composite transition state, b) the orientation of the catalytic side chains with respect to the composite transition state within ranges consistent with catalysis, and c) the conformations of the catalytic side chains. For each motif, a set of 10¹³–10¹⁸ unique active-site geometries was generated. The hashing algorithm within RosettaMatch[165] was used to search for placements of these into the binding pockets of 71 protein scaffolds. Around 180 000 distinct solutions (matches) were found. RosettaDesign was subsequently utilized to optimize the active-site sequence for optimal packing around the composite transition state and the catalytic lysine. A total of 72 designs in 10 different scaffolds were selected for experimental characterization. The final selection criteria were based on a) the predicted binding energy of the transition state, b) the extent to which the catalytic geometry was satisfied, c) the packing around the active lysine, and d) the consistency of side-chain conformation after side-chain repacking in the presence and absence of the composite transition state.
Of the 72 proteins, 70 were soluble when expressed and purified from Escherichia coli, and a respectable 32 showed detectable retro-aldolase activity. Product formation was monitored with a fluorescence-based assay. The active designs span five different protein scaffolds (1mw4, 1f5j, 1thf, 1i4n, 1a53) from the triosephosphate isomerase (TIM) barrel and jelly-roll folds, and are based on the active-site theozyme motifs III and IV. The designs in the relatively open jelly-roll scaffold show simple linear kinetics, whereas the TIM barrel designs with more enclosed active-site pockets display more complex kinetics, a potential indication of restricted substrate access and/or product release. Two apo structures (the S210A variant of RA22 and the M48K variant of RA61) were solved at 2.2 and 1.9 Å resolution, respectively.[170] The backbone geometries and side-chain orientations are in excellent agreement with those of the designs. Respectable rate enhancements of up to four orders of magnitude were achieved. However, even the best computational design falls two to three orders of magnitude short of the rate enhancement (kcat/kuncat) achieved by comparable catalytic antibodies.[60, 61] The catalytic efficiencies (kcat/KM) of the designs range between 0.02 and 0.74 M−1 s−1 and are modest, particularly when compared to those of natural enzymes.
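One common reading of "simple linear kinetics" is that the initial rate grows linearly with substrate concentration because [S] stays well below KM, in which case the Michaelis–Menten expression reduces to v ≈ (kcat/KM)[E][S] and only kcat/KM can be extracted. The sketch below, with hypothetical parameter values, shows how close the full expression is to that linear limit and defines the rate enhancement kcat/kuncat used above; it is not a fit to the retro-aldolase data.

```python
def mm_rate(s, e_total, kcat, km):
    """Michaelis-Menten initial rate: v = kcat [E] [S] / (KM + [S])."""
    return kcat * e_total * s / (km + s)


def linear_regime_rate(s, e_total, kcat, km):
    """Low-substrate limit ([S] << KM): v = (kcat/KM) [E] [S]."""
    return (kcat / km) * e_total * s


def rate_enhancement(kcat, kuncat):
    """Dimensionless rate enhancement kcat/kuncat (both first-order constants, s^-1)."""
    return kcat / kuncat


# Hypothetical parameters: [E] = 1 uM, kcat = 0.01 s^-1, KM = 1 mM.
for s in (1e-5, 1e-4, 1e-3):
    ratio = mm_rate(s, 1e-6, 0.01, 1e-3) / linear_regime_rate(s, 1e-6, 0.01, 1e-3)
    print(f"[S] = {s:.0e} M: MM/linear = {ratio:.3f}")
```

The ratio equals KM/(KM + [S]), so the linear approximation is within about 1% at [S] = KM/100 but overestimates the rate twofold at [S] = KM.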
In an effort to shed light on the performance discrepancy of computationally designed enzymes relative to catalytic antibodies, Ruscio et al.[179] studied the influence of structural fluctuations of the protein on the active-site preorganization of RA22 by using molecular dynamics. The authors found that an alternative orientation of the substrate with respect to His233 is optimal for nucleophilic attack by Lys159. They further noted that the His233–Asp53 dyad is disrupted by the solvation of Asp53, which in turn provides conformational flexibility to His233, thus affecting its interaction with the substrate. The authors attributed the comparatively low activity of RA22 to these dynamic distortions in the deprotonation step of the reaction.
Lassila et al. recently showed that the designed interactions of water with Tyr78 and Ser87 in RA61 do not contribute to catalysis.[180] Activity is instead largely attributed to the nucleophilic character of the catalytic lysine (pKa = 6.8–7.5) and to the favorable interaction energy between the enzyme and the naphthyl group of the substrate.
4.2. Kemp Eliminases
4.2.1. Computational Designs
The Kemp elimination (Figure 5a) is a well-studied ring-opening reaction that is initiated by deprotonation of the substrate. The reaction serves as a model for the biochemically relevant abstraction of protons from carbon centers, although it has no natural counterpart. It has become an attractive target for catalyst design, with approaches ranging from catalytic antibodies[51] to "synzymes".[181] Most recently, DeGrado and co-workers employed a minimalist design approach to endow calmodulin with Kemp elimination activity.[182]
The rate of the Kemp elimination depends strongly on the medium when a carboxylate functions as the general base: rate accelerations of 10⁷ can be achieved simply by moving acetate from water into a polar aprotic solvent such as acetonitrile.[183] An additional acceleration of 10⁶ can be achieved through precise positioning of the donor and acceptor for this reaction,[53, 183, 184] thereby giving a theoretical limit of 10¹³ for the rate enhancement of the Kemp elimination.
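Rate factors and barrier differences are related through transition-state theory, k ∝ exp(−ΔG‡/RT), so a rate ratio corresponds to exp(ΔΔG‡/RT). The helpers below are a worked sketch of that conversion at an assumed 298.15 K. Applied to the numbers above, they relate the 10¹³ limit to a barrier lowering of roughly 17.7 kcal mol−1, and they show that the 5 kcal mol−1 penalty mentioned in Section 3.2 corresponds to a factor of about 4.6 × 10³, close to four orders of magnitude.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def rate_factor(ddg_kcal, temp_k=298.15):
    """Rate ratio for a barrier change of ddg_kcal (kcal/mol): exp(ddG / RT)."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


def ddg_for_factor(factor, temp_k=298.15):
    """Barrier lowering (kcal/mol) needed for a given rate enhancement: RT ln(factor)."""
    return R_KCAL * temp_k * math.log(factor)


print(f"1e13 -> {ddg_for_factor(1e13):.1f} kcal/mol")
print(f"5 kcal/mol -> {rate_factor(5.0):.0f}-fold")
print(f"0.9 kcal/mol -> {rate_factor(0.9):.1f}-fold")
```

Because the relation is logarithmic, the medium effect (10⁷) and the positioning effect (10⁶) simply add in energy terms, roughly 9.5 and 8.2 kcal mol−1. The same conversion shows that a 0.9 kcal mol−1 spread in barriers corresponds to less than a fivefold spread in rates.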
Figure 4. Retro-aldolase theozyme motifs.
In the first example of active-site design towards catalysis of an unnatural reaction, Röthlisberger et al. used the inside-out protocol to produce eight active enzymes that promote the base-catalyzed ring opening of 5-nitrobenzisoxazole (5-NBZ).[171] Two distinct theozymes (Figure 5b) were employed in the process and gave rise to catalysts with rate enhancements of up to 10⁵. A crystal structure was solved for KE07, an active design with kcat/KM = 12.2 M−1 s−1, and it superimposed well on the computational model.
The kinetic parameters of these eight computational Kemp elimination designs are comparable to those of catalytic antibodies. The rate enhancements (kcat/kuncat) range from 10³ to 10⁵ and compare well with the kcat/kuncat value of 10⁴ displayed by catalytic antibodies 34E4 and 35F10.[51] In terms of substrate binding, on the other hand, the two catalytic antibodies outperform the eight computational designs by up to 10-fold (antibody KM values of 0.6 to 0.1 mM compared to a range of 4.2 to 0.6 mM for the designs). Three of the computational designs were further enhanced by in vitro directed evolution. The kcat/KM value of KE07 was improved 200-fold,[185] that of KE70 over 400-fold,[186] and in the case of KE59 the kcat/KM value was increased over 2000-fold.[187] These studies demonstrate how computational protein design can be used to generate enzymes with modest activities that can then be further optimized through directed evolution approaches.
4.2.2. Computational Analyses of De Novo Kemp Designs
4.2.2.1. PDDG/PM3 Monte Carlo Study
Alexandrova et al. described the analysis of the fouractive Kemp
elimination designs KE07 (258 residues), KE10(253), KE15 (258), and
KE16 (258) by mapping out thereaction coordinate with a
semiempirical PDDG/PM3 QM/MM Monte Carlo approach.[188] The
computational setupconsisted of 200 residue cutaways of the four
designs. Thesemiempirical QM part consisted of the 5-NBZ substrate
andthe catalytic base (Glu/Asp). Water molecules were not
included in the PM3 region. Side-chain motions were sampledwhile
the protein backbones were held fixed. The attempt togain insight
into what governs the observed activities and toestablish a
correlation between the computed and experi-mental barriers was met
with limited success. The computedbarriers were plagued by large
absolute error bars and bya trend that was opposite to what was
found experimentally. Itshould be noted, however, that within the
series of fourdesigns that was chosen for this study, the free
energies ofactivation (DG�) span a range of merely 0.9 kcal
mol�1—toonarrow to be picked up by most modern QM methods.
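To see why a 0.9 kcal mol⁻¹ spread is so hard to resolve, it helps to translate barrier differences into rate ratios with the Eyring equation. The minimal Python sketch below assumes identical pre-exponential factors for all designs (so kB·T/h cancels) and a temperature of 298.15 K; both are illustrative assumptions. Under them, the entire KE07/KE10/KE15/KE16 series spans less than a fivefold range in rate. The inverse function shows that a rate enhancement of 10⁵ corresponds to roughly 6.8 kcal mol⁻¹ of barrier lowering.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def rate_ratio(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Rate ratio implied by a barrier difference.

    Assumes equal Eyring prefactors, so kB*T/h cancels in the ratio.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


def barrier_difference(ratio: float, temperature_k: float = 298.15) -> float:
    """Inverse of rate_ratio: barrier difference (kcal/mol) implied by a rate ratio."""
    return R_KCAL * temperature_k * math.log(ratio)


# The 0.9 kcal/mol spread across the four designs:
print(f"{rate_ratio(0.9):.1f}")          # 4.6
# A 10^5 rate enhancement expressed as barrier lowering:
print(f"{barrier_difference(1e5):.1f}")  # 6.8
```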
4.2.2.2. DFT-Based Approaches
Density functional theory (DFT) calculations were employed to study
six active and four inactive Kemp elimination designs with free
energies of activation (ΔG‡) ranging from 18.3 to 20.6 kcal mol⁻¹
(actives) and ΔG‡ ≥ 23.2 kcal mol⁻¹ (inactives).[169] Three
modeling approaches were explored: a minimalistic representation of
the catalytic units (Figure 6, upper right), QM on the full active
sites, and computations on the entire protein systems after a short
MD simulation (Figure 6, left), in which the active site and selected
water molecules were treated with QM (Figure 6, bottom right).
Qualitatively, the full-protein MD-QM/MM approach compared best to
experiment: the computed barriers for inactive designs were
significantly higher than those of active designs. Beyond this
qualitative agreement, however, the approach shows only a weak
correlation with the experimentally determined energy barriers
(R² = 0.58), indicating that significant contributions to catalysis
still escape this computational model. A lesson from these studies
is that computing energy barriers for base-catalyzed reactions such
as the Kemp elimination necessitates an explicit treatment of
solvent molecules and other polar groups as part of the QM
calculations.
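The R² quoted above measures how well computed and experimental barriers follow a common straight line. As a hedged illustration of what that number does and does not capture, the Python sketch below computes the coefficient of determination of an ordinary least-squares fit, which for simple linear regression equals the squared Pearson correlation. The function name and the toy numbers are hypothetical and are not data from Ref. [169].

```python
def r_squared(x, y):
    """Coefficient of determination of an ordinary least-squares line through
    paired data; for simple linear regression this equals Pearson r squared."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least three points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return sxy * sxy / (sxx * syy)


# Toy numbers only (not barriers from the study):
print(round(r_squared([1, 2, 3, 4], [2, 4, 5, 8]), 3))  # 0.963
```

Note that a constant systematic offset between two series leaves R² at 1, so R² alone says nothing about absolute accuracy; that requires a separate error metric such as the mean absolute deviation.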
Figure 5. The Kemp elimination. a) Reaction scheme of
5-nitrobenzisoxazole. b) The two theozymes that were employed.
Figure 6. Modeling approaches for analysis of Kemp designs range
from QM on the catalytic unit (top right; with circled backbone heavy
atoms frozen) to full enzyme ONIOM QM/MM after 2 ns MD (QM layer
shown as sticks at bottom right). Modified from Ref. [169].
4.2.2.3. Empirical Valence Bond (EVB) Calculations
Warshel and co-workers used a two-layer EVB approach to evaluate
Kemp designs. They applied free energy perturbation/umbrella
sampling (FEP/US) calculations to designs KE07 (and directed
evolution variants),[189] KE70, KE59, and HG-2.[190] The EVB setup
was calibrated to reproduce ab initio calculations of the reaction
surface in a solvent cage and then applied to obtain free energies
of activation (ΔG‡). For many systems these are in exceptional
agreement with experimentally determined values, yet for others
(e.g. KE59, HG-2) the deviations are significant (Table 1).
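The core of any EVB treatment is a small Hamiltonian that mixes reactant-like and product-like diabatic states. Its lowest eigenvalue gives the adiabatic ground-state surface. The toy Python sketch below shows this for two states along a single coordinate. The harmonic diabatic curves, the constant coupling H12, the shift alpha, and all parameter values are illustrative assumptions. The FEP/US sampling and calibrated force fields used by Warshel and co-workers are not reproduced. In a real calibration, the diabatic shift and the coupling are typically the parameters adjusted to match the reference reaction.

```python
import math


def evb_ground_state(h11: float, h22: float, h12: float) -> float:
    """Lower eigenvalue of the two-state EVB Hamiltonian [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    half_gap = 0.5 * (h11 - h22)
    return mean - math.sqrt(half_gap * half_gap + h12 * h12)


def toy_profile(force_const=100.0, alpha=0.0, h12=5.0, n=201):
    """Adiabatic energy along a 1D coordinate x in [0, 1] (arbitrary units).

    Assumed toy diabatic states: a reactant parabola centred at x = 0 and a
    product parabola centred at x = 1, shifted in energy by alpha.
    """
    xs = [i / (n - 1) for i in range(n)]
    energies = []
    for x in xs:
        e_reactant = 0.5 * force_const * x * x
        e_product = 0.5 * force_const * (x - 1.0) ** 2 + alpha
        energies.append(evb_ground_state(e_reactant, e_product, h12))
    return xs, energies


def reactant_barrier(energies):
    """Highest point of the profile minus the lowest point in the reactant half."""
    reactant_min = min(energies[: len(energies) // 2])
    return max(energies) - reactant_min


xs, es = toy_profile()
# The diabatic curves cross at 12.5 (x = 0.5); the coupling lowers the
# adiabatic top to 12.5 - 5 = 7.5, and the reactant minimum lies near -0.5.
print(round(reactant_barrier(es), 3))  # 8.0
```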
4.2.2.4. Active-Site Dynamics from MD Simulations
Molecular dynamics simulations of 23 Kemp eliminases (14 active,
9 inactive), although of a rather qualitative nature, were more
conclusive than previous computational studies. Analysis of the
simulation data showed that the failed computational designs are
unable to maintain essential active-site hydrogen bonds.[169] This
becomes particularly clear in the example of the inactive KE38
(Figure 7c). In contrast to the catalytic His–Asn contact in the
naturally evolved cathepsin K (Figure 7a) and the catalytic His–Asp
dyad in the active KE70 (Figure 7b), there is no significant
population in which the KE38 His–Glu dyad is intact, and His alone
is too weak a base to deprotonate the substrate.
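Analyses like those in Figure 7 reduce each MD frame to two numbers describing the catalytic contact, a distance d and an angle θ. The Python sketch below is one way to compute them and to report how often a contact stays intact. The definitions used here are assumptions for illustration: d is the H···acceptor distance, θ is the donor–H···acceptor angle, and the contact counts as intact when d ≤ 2.5 Å and θ ≥ 120°. These are not the three hydrogen-bond categories of Ref. [192] used in the study. Reading coordinates from a trajectory is left to whatever MD toolkit is at hand.

```python
import math
from typing import Iterable, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _norm(v: Vec) -> float:
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def contact_geometry(donor: Vec, hydrogen: Vec, acceptor: Vec) -> Tuple[float, float]:
    """Return (d, theta): H...acceptor distance (Å) and donor-H...acceptor angle (deg)."""
    h_to_d = _sub(donor, hydrogen)
    h_to_a = _sub(acceptor, hydrogen)
    d = _norm(h_to_a)
    if d == 0.0 or _norm(h_to_d) == 0.0:
        raise ValueError("coincident atoms")
    cos_t = sum(p * q for p, q in zip(h_to_d, h_to_a)) / (_norm(h_to_d) * d)
    cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding outside [-1, 1]
    return d, math.degrees(math.acos(cos_t))


def intact_fraction(frames: Iterable[Tuple[Vec, Vec, Vec]],
                    d_max: float = 2.5, theta_min: float = 120.0) -> float:
    """Fraction of frames whose contact meets the (assumed) cutoffs."""
    geoms = [contact_geometry(*f) for f in frames]
    if not geoms:
        return 0.0
    ok = sum(1 for d, t in geoms if d <= d_max and t >= theta_min)
    return ok / len(geoms)
```

Plotting θ against d for every frame reproduces the kind of scatter shown in Figure 7. The single fraction is a cruder but convenient summary for ranking designs.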
Overall, a disassembly of the designed catalytic contacts was
observed to occur through a combination of two factors, excessive
solvent accessibility and alternative side-chain packing, both of
which give rise to distinct distribution patterns. This observation
is relevant to rational enzyme design in general, but particularly
for the catalysis of reactions that depend on a carboxylate base, as
these are usually sensitive to polar protic solvents such as water.
Solvent molecules that come into direct contact with the carboxylate
oxygen atoms can significantly reduce their base strength (by up to
10⁶-fold in terms of kcat), and Figure 8a shows this trend for a
cross-section of the dataset. On average, the active sites of
functional designs are less hydrated than those of inactive designs
(Figure 8b), but even the microenvironments of the most active
designs are still far from those of naturally evolved acid/base
catalysts such as cathepsin K (outermost right column in
Figure 8b).
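Hydration of the catalytic base can be quantified per MD frame by counting water molecules near the carboxylate oxygen atoms. The Python sketch below borrows the 3.2 Å cutoff from the Figure 12 caption. Counting each water through its oxygen position alone, and reporting the population standard deviation, are illustrative assumptions. Each frame is a hypothetical tuple of the two carboxylate oxygen positions and a list of water-oxygen positions.

```python
import math
from statistics import mean, pstdev


def waters_near_carboxylate(o1, o2, water_oxygens, cutoff=3.2):
    """Number of waters whose oxygen lies within `cutoff` Å of either carboxylate
    oxygen; a water close to both oxygens is counted once."""
    return sum(
        1
        for w in water_oxygens
        if math.dist(w, o1) <= cutoff or math.dist(w, o2) <= cutoff
    )


def hydration_profile(frames, cutoff=3.2):
    """frames: non-empty iterable of (o1, o2, water_oxygens) tuples.

    Returns (mean, standard deviation) of the per-frame water counts.
    """
    counts = [waters_near_carboxylate(o1, o2, ws, cutoff) for o1, o2, ws in frames]
    return mean(counts), pstdev(counts)
```

Comparing these per-design averages across active and inactive variants gives distributions of the kind summarized in Figure 8.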
Taken together, active designs can be discerned from inactive ones
when this multidimensional problem is simplified to a
two-dimensional model (Figure 9).
What transpires from this study is that by querying the dynamics of
a protein–substrate complex in the presence of explicitly
represented solvent molecules, and by asking specific questions
based on chemical intuition, one can gather a wealth of information
about the system at hand and relate it to experimental observables.
On this basis, it has become a useful approach to combine MD-based
analyses with the design and refinement of new enzymes and the
interpretation of results from directed evolution experiments.
Figure 7. Angle (θ) versus distance (d) scatter plots of the
catalytic contact. a) His–Asn contact of the naturally evolved
cathepsin K catalytic triad; b) His–Asp contact of the active design
KE70; c) His–Glu contact of the inactive design KE38. Data points
are from 20 ns MD simulations. Three hydrogen-bond categories[192]
are outlined with dotted lines. The individual distributions are
projected onto the axes. The progression of the catalytic contact
from QM theozyme, to final design, and to the fully relaxed MD
starting geometry is plotted with filled, half-filled, and empty
circles, respectively. Modified from Ref. [169].
Table 1: EVB activation free energies for computational Kemp
eliminases.[189,190]
System          PDB entry  Base          ΔG‡exp[a]  ΔG‡EVB[a]
HG-2 (S265T)    NA         Asp127        17.7       18.2
34E4 antibody   1vol       GluH50        17.9       17.3
KE59            NA         Glu231        18.3[b]    31.7
HG-2            3nyd       Asp127        18.5[c]    24.3
KE70            3npu       His16-Asp44   18.5       19.3
1A53-2          3nyz       Glu178        20.0       20.7
KE07            2rkx       Glu231        20.1       19.5
[a] In kcal mol⁻¹. [b] Computed with kcat = 0.29 s⁻¹.[171]
[c] Computed from an extrapolated kcat = 0.22 s⁻¹.[191]
NA = not available.
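The split between "exceptional agreement" and significant deviations in Table 1 can be made explicit by tabulating ΔG‡EVB − ΔG‡exp for each system. The Python sketch below transcribes the table values and flags systems that deviate by more than a threshold. The 2 kcal mol⁻¹ threshold and the function names are arbitrary choices for illustration. It then compares the mean absolute deviation with and without the flagged systems.

```python
# (system, ΔG‡exp, ΔG‡EVB) in kcal/mol, transcribed from Table 1
TABLE_1 = [
    ("HG-2 (S265T)", 17.7, 18.2),
    ("34E4 antibody", 17.9, 17.3),
    ("KE59", 18.3, 31.7),
    ("HG-2", 18.5, 24.3),
    ("KE70", 18.5, 19.3),
    ("1A53-2", 20.0, 20.7),
    ("KE07", 20.1, 19.5),
]


def deviations(rows):
    """EVB minus experiment for each system, rounded to the table's precision."""
    return {name: round(evb - exp, 1) for name, exp, evb in rows}


def flag_outliers(rows, threshold=2.0):
    """Systems whose |ΔG‡EVB - ΔG‡exp| exceeds `threshold` (an arbitrary cutoff)."""
    return [name for name, d in deviations(rows).items() if abs(d) > threshold]


def mean_abs_error(rows, exclude=()):
    """Mean |ΔG‡EVB - ΔG‡exp| over all systems not listed in `exclude`."""
    errors = [abs(evb - exp) for name, exp, evb in rows if name not in exclude]
    return sum(errors) / len(errors)


outliers = flag_outliers(TABLE_1)
print(outliers)                                             # ['KE59', 'HG-2']
print(round(mean_abs_error(TABLE_1), 2))                    # 3.2
print(round(mean_abs_error(TABLE_1, exclude=outliers), 2))  # 0.64
```

For the remaining five systems the deviations stay below 1 kcal mol⁻¹. The overall average is dominated by KE59 (13.4 kcal mol⁻¹) and HG-2 (5.8 kcal mol⁻¹).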
4.2.3. Directed Evolution of Kemp Eliminases KE07, KE70, and KE59
Tawfik and co-workers combined directed evolution methods with
rational design and were able to further improve the catalytic
activities of three computationally designed Kemp eliminases.
4.2.3.1. KE07
Seven rounds of random mutagenesis and selection resulted in up to
eight mutations and a 200-fold increase in the kcat/KM value compared
to the computationally designed “wild-type” KE07.[185] The
improvement resulted from a 2.6-fold lower KM and a 76-fold higher
kcat value, which can largely be attributed to the Ile7Asp mutation
adjacent to the active site. The Ile7Asp mutation weakens the
partial salt bridge between the catalytic Glu101 and Lys222
(Figure 10a) in a dual fashion: Asp7 directly competes with Glu101
for Lys222 and also recruits additional water molecules that can
directly interact with Glu101 (Figure 10b), effectively breaking up
the salt bridge and tuning the pKa value of the catalytic
base.[169,185]
4.2.3.2. KE70
A combination of computational optimization and nine rounds of
random mutagenesis resulted in a >400-fold increase in the kcat/KM
value (up to 12-fold lower KM and 53-fold higher kcat).[186] The
improvement was attributed to tighter substrate binding, fine-tuned
electrostatics (Figure 11a,b), and stabilization of the catalytic
dyad in an orientation optimal for catalysis (Figure 11c,d).
Progressive rounds of directed evolution cause the “D loop” to
become less mobile (Figure 11c) and allow the catalytic dyad
residues (His17 and Asp45) to form a stronger hydrogen bond
(Figure 11d).
The active site of KE70 is based on theozyme II in Figure 5b. The
interaction potential near the ideal distance (r0) of the
His17–Asp45 contact can be approximated by a harmonic function.
Thus, the energetic penalty for deviations from r0 is approximately
proportional to (Δr)². Furthermore, assuming a simple
transition-state model and using the Eyring equation, ln(kcat)
decreases linearly with the activation free energy. The linear
relationship between (Δr)² and ln(kcat) (Figure 11d) then suggests
that the increase in the kcat value of the evolved variants results
in large part from the tightening of the hydrogen bond between His17
and Asp45, as the active-site residues of the more evolved KE70
variants become more optimally placed and less mobile.
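The argument can be written out in three lines. This is a sketch under stated assumptions. κ is a hypothetical effective force constant for the His17–Asp45 contact, and ΔG‡₀ is a hypothetical barrier at the ideal distance. Both ΔG‡₀ and the Eyring prefactor are taken as identical for all variants, and the full harmonic penalty is assumed to add to the barrier:

```latex
\begin{aligned}
\Delta G^{\ddagger} &\approx \Delta G^{\ddagger}_{0} + \tfrac{1}{2}\,\kappa\,(\Delta r)^{2},
  \qquad \Delta r = r - r_{0} \\
k_\mathrm{cat} &= \frac{k_\mathrm{B}T}{h}\,
  \exp\!\left(-\frac{\Delta G^{\ddagger}}{RT}\right) \\
-\ln k_\mathrm{cat} &= \frac{\Delta G^{\ddagger}_{0}}{RT}
  - \ln\frac{k_\mathrm{B}T}{h}
  + \frac{\kappa}{2RT}\,(\Delta r)^{2}
\end{aligned}
```

Under these assumptions, −ln(kcat) rises linearly with (Δr)², with slope κ/(2RT). This is the form plotted in Figure 11d, where r0 = 1.8 Å for this contact.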
In contrast to KE07, beneficial mutations were not restricted to the
second and third shells, but also included first-shell residues.
4.2.3.3. KE59
The functional mutations that gave rise to the comparatively high
initial activity of this computational Kemp design also caused it to
be one of the least stable. In contrast to KE07 and KE70, a number of
fold-stabilizing consensus mutations had to be introduced prior to
directed evolution. KE59 was then subjected to 16 rounds of directed
evolution, which resulted in a >2000-fold increase in the kcat/KM
value, mostly through a significantly increased kcat value.[187] The
most proficient variant displayed a KM value of 37 μM, a kcat/KM
value of 0.6 × 10⁶ M⁻¹ s⁻¹, and a kcat/kuncat value of approximately
10⁷, kinetic parameters that approach those of natural enzymes.
Figure 8. Water coordination distributions from MD simulations (d
KE59 is also the only design in the series that accepts a variety of
benzisoxazoles besides 5-NBZ. In fact, the largest optimization was
achieved for 5,7-dichlorobenzisoxazole, a significantly less
activated substrate. The large improvement in the kcat value was
attributed mainly to the ability of more advanced variants to
exclude bulk solvent more effectively from interacting with the
catalytic base (Figure 12a). Substituents at the 5- and 7-positions
of the substrate were found to be well suited for this purpose
(Figure 12b). Conversely, Glu230 can adopt an alternative and
catalytically suboptimal conformation in which it interacts with the
backbone NH group of Ser210 and an average of four water molecules
(Figure 12c).
4.3. Diels–Alderases
Siegel et al. describe the inside-out computational design and
experimental characterization of enzymes catalyzing a bimolecular
Diels–Alder reaction (Figure 13a) with high stereoselectivity and
substrate specificity (Figure 13b).[172] No naturally occurring
enzymes are known that can catalyze this cycloaddition. The
catalytic motif was inspired by previous catalytic antibody studies,
in which an Asp, an Asn, and a Tyr formed the catalytic
arrangement.[44,46] The catalytic motif here consists of a Gln and
two Tyr residues positioned so as to bind the bimolecular transition
state leading to the 3R,4S-endo product. Two active proteins were
produced: DA20 was designed into a β-propeller scaffold and DA42
into the KSI scaffold (Figure 14).
Computational evaluation methods (QM and MD) helped rationalize
experimental observations and guided adjustments to early designs
that resulted in improved kinetic parameters. A notable example is
the development of DA_20_10 from DA_20_00. Molecular dynamics
simulations of DA_20_00 (kcat = 0.1 h⁻¹) show that the catalytic
Tyr121 can access a noncatalytic conformation in which it binds to
the backbone carbonyl group of residue 271 (Figure 15, red).
Increasing the steric bulk at position 272 was proposed to interfere
with this interaction (Figure 15a, blue), allowing Tyr121 to assume
the conformation required for binding and catalysis. Figure 15b
shows an overlay of the active sites of DA_20_00, DA_20_10, and the
QM theozyme to illustrate these observations. DA_20_10 was
characterized with a kcat value of 2.1 h⁻¹.
The crystal structure that was solved for a variant of the DA_20
design superimposes well onto the computational design. The
catalytic efficiency is comparable to that previously achieved by
catalytic antibodies: the designs show equal or higher catalytic
rates, but relatively weak binding of the dienophile.
Figure 11. a) Crystal structure of the “wild-type” KE70 design.
b) Crystal structure of the round 6 variant R6 6/10A. Gly101Ser
stabilizes Arg69 in an alternative conformation that does not
interfere with the Asp45–His17 catalytic dyad. c) Atomic fluctuation
profiles from MD simulations. The active-site residues (circles) and
the catalytic dyad (stars) are labeled. Peaks correspond to loops
with elevated flexibility. d) The square of the deviation (Δr)² from
the ideal hydrogen-bond distance (1.8 Å for this contact) versus
−ln(kcat). Modified from Ref. [186].
Figure 12. Number of water molecules within 3.2 Å of either of the
Glu230 carboxylate oxygen atoms (from MD simulations), plotted
a) over all evolved KE59 variants with available kcat values for
5-nitrobenzisoxazole, and b) over all substrates with available kcat
values for variant R13. Error bars correspond to ± the standard
deviation of the MD-based distributions. c) The predominant
conformation of Glu230 as observed in MD simulations with
5-nitro-6-chlorobenzisoxazole (blue, substrate in orange) versus the
conformation observed in the crystal structures (green), shown here
for the R13 3/11H variant. Modified from Ref. [187].
4.4. Iterative Approach to “Inside-Out” Design of Enzymes
The Houk and Mayo research groups together explored an iterative
variation[191] of the Baker/Houk inside-out approach.[171] Rather
than expressing and characterizing a large number of computationally
designed proteins, the efforts were focused on a single template
protein (PDB-ID 1gor;[193] Figure 16a). As with the Röthlisberger
designs, the native active site of the template was engineered to
complement the transition-state geometry for the Kemp elimination of
5-NBZ (Figure 5a). Theozyme I (Figure 5b) served as the catalytic
motif, and Phoenix[194] rather than Rosetta was used for the in
silico design of the active site. The overall protocol was similar
to that of the Röthlisberger study, which had been validated in its
ability to generate active Kemp eliminases. However, no activity
could be obtained with the 1gor scaffold. HG-1 (Figure 16b), the
resulting inactive first-generation design, differs from the
wild-type 1gor by seven mutations and is fully folded under the
conditions of the activity assays, with a secondary structure very
similar to that of the wild-type scaffold. Analysis of the structure
and dynamics of HG-1, however, highlighted a number of problems. The
innately flexible active-site pocket of 1gor could not be engineered
to provide the necessary support for the theozyme geometry in HG-1
(compare Figure 18a,b). Additionally, a substantial number of water
molecules was found to access the active site of HG-1, an
observation that has implications both for the binding of the
hydrophobic substrate (Figure 18c) and for the base strength of
Glu237, the intended catalytic residue. Efforts to increase the
hydrophobic character of the HG-1 active site were unsuccessful, and
so a more invasive strategy was explored: rather than searching the
RCSB for a protein with a native active site better suited for
theozyme I (Figure 5b), the focus of the computational design was
shifted away from the native active site and onto a pre-existing
small pocket inside the β barrel (shaded area at center-bottom of
Figure 16a,b).
Figure 13. Diels–Alder reaction between
4-carboxybenzyl-trans-1,3-butadiene-1-carbamate and
N,N-dimethylacrylamide (a), which gives only the 3R,4S endo product
(b). Part (b) is reprinted from Ref. [172].
Figure 14. Computationally designed Diels–Alderases. a) DA_20_10,
b) DA_42_04. Active-site Gln, Tyr, and substrates are shown as red
sticks. c) Active site of DA_20_00 and d) of DA_20_10. Mutations are
highlighted in red (for DA_20_00 compared to the native protein
scaffold) and in orange (for DA_20_10 compared to DA_20_00). Parts
(b) and (c) are reprinted from Ref. [172].
Figure 15. a) DA_20_00 shows a narrow distribution at 2 Å (hydrogen
bond between Tyr121 and the carbonyl group at position 271), while
DA_20_10 shows a wide distribution at 5 Å (no hydrogen bond).
b) Overlay of the QM transition-state geometry (orange) with
equilibrated geometries from MD simulations on DA_20_00 (red) and
DA_20_10 (blue). Reprinted from Ref. [172].
The resulting HG-2 design differs from the wild-type 1gor by 12
mutations and utilizes Asp127 as the general base (Figure 16c). The
native pocket was deepened by 7 Å and tightened at the entrance,
effectively generating a new active site inside the β barrel
(Figure 17).
MD simulations predicted that the new design would be active. They
showed that HG-2 was capable of stabilizing the theozyme geometry,
that it could limit the influx of water molecules to the catalytic
base, and that it could support the catalytic contact between
substrate and Asp127 (Figure 18d). The protein was expressed and
kinetically characterized, and its activity was confirmed with a
kcat/KM value of 123.2 M⁻¹ s⁻¹, comparable to the 163 M⁻¹ s⁻¹ of the
most active Röthlisberger design, KE59.
The crystal structure of HG-2 was solved to a resolution of 1.2 Å.
A transition-state analogue (TSA) was cocrystallized and occupied
the active site in two distinct but catalytically competent
orientations. The structure validated the dynamics-guided
computational design, but it also drew attention to the importance
of accounting for alternative substrate orientations in future
versions of the enzyme design protocol. Additional single-point
mutations were explored, and the Ser265Thr variant further enhanced
the kcat/KM value by a factor of three.
Figure 16. Variation of the active site. a) The unmodified 1gor
scaffold, b) design HG-1, and c) design HG-2. The TS model is shown
in orange. Reprinted from Ref. [191].
Figure 17. Active site relocated by 7 Å. a) HG-1. b) HG-2. Cutaway
view with active-site residues in red and the TS model in orange.
Figure 18. Dynamics of HG-1 (a) and of HG-2 (b). Equidistant
snapshots from MD simulations are shown with the backbone in blue
and the active-site residues in red. The backbone dynamics are of
comparable magnitude in both HG-1 and HG-2, but side-chain
active-site dynamics differ significantly. c) and d) show
angle–distance scatter plots of the catalytic contacts between
substrate and base. Modified from Ref. [191].
4.5. Structure Prediction and Design through Crowd Sourcing
Modern computer algorithms have become very effective at
approximating the physics that governs molecular interactions. A
significant challenge that remains, however, is that of
conformational sampling. The free-energy landscapes of biomolecules
are so vast that navigating them is one of the major bottlenecks in
the study of protein folding, structure prediction, and protein
engineering. Various approaches have been developed over the past
few years to address this numerical problem, ranging from structure
prediction (e.g. Rosetta) to simulation (e.g. Markov state models in
combination with Folding@home) and highly specialized hardware
(e.g. Anton). Most recently, crowd-sourced structure prediction has
demonstrated its utility as a surprisingly effective addition to
statistical and deterministic search algorithms.
Cooper et al. introduced Foldit, a graphical user interface to some
of Rosetta’s functionality, which has the added capability to serve
as an online multiplayer game.[195] The idea behind recruiting
“homo ludens” (the playing (wo)man) for scientific challenges is
based on observations from human-based computing, in which certain
tasks, such as shape recognition, can be performed faster and more
efficiently by humans than by machines. The Foldit study shows that
basic spatial recognition, intuition, and decision making can
outcompete the stochastic component of conformational search when
applied to problems of protein-structure prediction.
The collaborative nature of the game allows participants to form
groups and to share and evolve their experience and strategies in
the form of “recipes” to compete more effectively against other
groups. Successful “recipes” get used and tweaked more often than
unsuccessful ones, and so an interesting evolutionary process takes
place in which new and enhanced prediction algorithms are discovered
by the gaming community. The two most popular “recipes”, for
example, encoded an algorithm that turned out to bear a striking
resemblance to an improved structure-prediction method, the
development of which had not been published at the time.[196]
Two recent reports demonstrate the utility of crowd-sourcing through
Foldit beyond a proof-of-principle stage. Khatib et al. generated
models for molecular replacement with which the long-elusive
structure of the M-PMV retroviral protease could be solved,[197] and
Eiben et al. achieved an 18-fold improvement in substrate binding for
a previously designed Diels–Alderase through substantial loop
redesign (Figure 19).[198]
Although the evolutionary dynamics of crowd-sourcing and the
applicability of nonscientific thought processes to real-world
scientific problems are fascinating topics in their own right, the
quality of the resulting structure predictions strongly depends on
how well the scientific objectives can be broken down into palatable
challenges.[197] Many interesting developments can be expected here
in the near future.
5. Challenges in Enzyme Design
The field of computational chemistry and biology has experienced
significant advances through the development of new algorithms and
hardware, but, equally important, through an increase and
solidificat