
ATLAS of Biochemistry: A Repository of All Possible Biochemical Reactions for Synthetic Biology and Metabolic Engineering Studies

Noushin Hadadi, Jasmin Hafner, Adrian Shajkofci, Aikaterini Zisaki, and Vassily Hatzimanikatis*

Laboratory of Computational Systems Biotechnology (LCSB), Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland

*S Supporting Information

ABSTRACT: Because the complexity of metabolism cannot be intuitively understood or analyzed, computational methods are indispensable for studying biochemistry and deepening our understanding of cellular metabolism to promote new discoveries. We used the computational framework BNICE.ch along with cheminformatic tools to assemble the whole theoretical reactome from the known metabolome through expansion of the known biochemistry presented in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. We constructed the ATLAS of Biochemistry, a database of all theoretical biochemical reactions based on known biochemical principles and compounds. ATLAS includes more than 130 000 hypothetical enzymatic reactions that connect two or more KEGG metabolites through novel enzymatic reactions that have never been reported to occur in living organisms. Moreover, ATLAS reactions integrate 42% of KEGG metabolites that are not currently present in any KEGG reaction into one or more novel enzymatic reactions. The generated repository of information is organized in a Web-based database (http://lcsb-databases.epfl.ch/atlas/) that allows the user to search for all possible routes from any substrate compound to any product. The resulting pathways involve known and novel enzymatic steps that may indicate unidentified enzymatic activities and provide potential targets for protein engineering. Our approach of introducing novel biochemistry into pathway design and associated databases will be important for synthetic biology and metabolic engineering.

KEYWORDS: generalized enzyme reaction rules, novel reaction prediction, metabolite integration, automated network reconstruction, metabolic engineering

The revolutionary ongoing sequencing of whole genomes is yielding tremendous amounts of biological data [1] for revealing new biochemical reactions and cellular processes. In addition, with increasing progress in analytical and instrumental techniques in metabolomics studies, many new compounds and metabolites are being identified in biological systems [2]. However, because the origins and biological functions of the majority of characterized metabolites remain unknown, we must design tools for functional interpretation to map them within the context of metabolic pathways. Another key challenge is to determine the roles of newly identified genes and their associated proteins in an organism [3−5]. To take full advantage of the wealth of generated information, it is of importance to annotate and categorize predicted proteins, identify their biochemical functionalities, and integrate them into their corresponding pathways [6].

To address these challenges, we propose a computational approach with several design elements to gain a systems-level comprehension of the enzymatic reactions across all species and identify all of the theoretically possible enzymatic reactions based on known biochemistry.

The Kyoto Encyclopedia of Genes and Genomes (KEGG) database [7], which we consider to be the reference for known biochemical reactions and metabolites, is one of the most complete repositories of metabolic data. This manually curated knowledge base systematically organizes and visualizes the experimental knowledge of biological systems in an interactive and computable form [8,9]. The KEGG database comprises 15 main databases that categorize biological data at three different levels: genomic information, systems information, and biochemical information [10].

The KEGG database was created in 1995, and increasing numbers of genes, metabolites, and enzymatic reactions are included in each annual release (Figure 1). The rapid growth of the KEGG database and the results of our study illustrate the abundance of as-of-yet uncharacterized enzymatic reactions in nature, indicating remarkable potential for the discovery of new metabolic functionalities. This knowledge gap calls for the design of computational tools that harness the enzymatic capabilities of known enzymes, aiming to explore all of the

Special Issue: Synthetic Biology in Europe

Received: February 8, 2016
Published: July 12, 2016

Research Article

pubs.acs.org/synthbio

© 2016 American Chemical Society
DOI: 10.1021/acssynbio.6b00054
ACS Synth. Biol. 2016, 5, 1155−1166


theoretically possible biochemical reactions between known metabolites.

Another interesting observation is the divergence between the numbers of newly added reactions and compounds in recent years. One possible explanation is that the recent advances in analytical techniques in metabolomic and lipidomic experiments have characterized many metabolites in biological samples. Another challenge raised by the advances in metabolomics is identifying the biological functions of novel compounds and integrating them into biological pathways.

We believe that the best way to predict or design a novel enzymatic biotransformation is to learn from known biochemical reactions and, accordingly, to derive the biotransformation rules that govern known enzyme chemistry in nature [11].

Several computational tools exist for predicting novel biological reactions that rely on the concept of “generalized biochemical rules” [12−16]. These tools differ in their scopes and numbers of generalized rules and curating procedures, and their performances can be compared on the basis of the numbers of known enzymatic reactions that they can reconstruct. Distinctions in capabilities and applications of these methods have recently been extensively reviewed and evaluated by Hadadi and Hatzimanikatis [17]. The generalized biochemical rules are formulated on the basis of known biochemistry; they simulate the biotransformations that occur in enzymatic reactions and are designed in a generalized manner to act on a broader range of substrates rather than the native ones. Therefore, applying these rules to known metabolites can computationally reconstruct known enzymatic reactions and predict enzymatic reactions that have never been observed in nature but are likely feasible because their reaction mechanisms are based on known biochemistry.

The Biochemical Network Integrated Computational Explorer (BNICE.ch) has been under development for the last 15 years and is one of the first methods for discovering and characterizing new enzymatic reactions based on known biochemistry [12,18]. The latest version of BNICE.ch has distilled the cataloged biochemistry from the KEGG database into 361 bidirectional generalized reaction rules that recapitulate the biochemical actions of more than 6500 KEGG reactions. The BNICE.ch framework has previously been effectively applied in several studies to explore retrobiosynthetic routes for the biosynthesis of different chemicals [19], to identify novel biodegradation pathways for xenobiotics [20,21], to investigate lipid metabolism [22], and to design novel enzymatic reactions, pathways, and intermediate metabolites [17,18,23,24].
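As a rough illustration of what a generalized rule does, the following Python sketch models a compound as the set of functional groups it carries and a rule as the replacement of one group by another. This toy encoding, the group names, and the function `apply_rule` are assumptions made for illustration only; they are not the representation used by BNICE.ch.

```python
from typing import FrozenSet, Optional

# Toy model (assumption): a compound is the set of functional groups it carries.
Compound = FrozenSet[str]


def apply_rule(compound: Compound, consumed: str, produced: str) -> Optional[Compound]:
    """Apply the generalized rule 'consumed -> produced' if the reactive site is present."""
    if consumed not in compound:
        return None  # the rule does not match this substrate
    return frozenset((compound - {consumed}) | {produced})


# Hypothetical substrates: a native one, a non-native one, and one without the site.
native = frozenset({"hydroxyl"})
non_native = frozenset({"hydroxyl", "phenyl"})
no_site = frozenset({"carboxyl"})

oxidized_native = apply_rule(native, "hydroxyl", "carbonyl")
oxidized_non_native = apply_rule(non_native, "hydroxyl", "carbonyl")
no_match = apply_rule(no_site, "hydroxyl", "carbonyl")

# A bidirectional rule can be read in reverse by swapping the two groups.
restored = apply_rule(oxidized_native, "carbonyl", "hydroxyl")
```

The same rule fires on both the native and the non-native substrate because it only inspects the reactive site, which is the sense in which such rules are "generalized".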

In the present work, we applied BNICE.ch for the first time to all of the biological compounds reported in the KEGG database to determine how the known biochemistry of established enzymatic reactions evolves if we use the generalized enzyme reaction rules of BNICE.ch over all of the KEGG compounds, thus exploring the possible space of metabolic reactions that potentially can be found in nature. In doing so, we reconstructed all of the known and reproducible KEGG reactions (as of March 2014) and discovered more than 130 000 novel enzymatic reactions between two or more known KEGG compounds. We have organized these results into a Web-based database named ATLAS of Biochemistry (shortened to “ATLAS” in the rest of this article) that comprises all of the known and hypothetically possible enzymatic reactions between any two (or more) KEGG compounds. We have performed several further analyses on the generated de novo reactions, such as thermodynamic feasibility studies and comparative structural analyses between the newly discovered and previously known KEGG reactions.

To the best of our knowledge, there is no available database that accounts for all of the theoretically possible enzymatic reactions that connect known biological compounds on the basis of known biochemistry. Nevertheless, such information is of great interest not only to fill the knowledge gaps in metabolic networks but also for synthetic biology and metabolic engineering, in which the discovery of novel reactions for designing de novo biosynthesis pathways is an important mission.

Furthermore, we discovered that approximately 65% of the newly added KEGG reactions in 2015 already exist in ATLAS as novel reactions. This finding validates the consistency of our generalized reaction rules with acknowledged biochemistry and demonstrates that our proposed hypothetical enzymatic reactions are an important complement to known cataloged biochemistry. Every newly discovered reaction that is already part of our ATLAS database confirms the biochemical validity and importance of our method.

■ RESULTS AND DISCUSSION

We have generated a repository of all possible enzymatic reactions between KEGG compounds, indicating that there is a huge potential for biological compounds to be interconverted by relatively few reaction mechanisms. This collection is a valuable source of information for biological and biochemical studies, and its characteristics and significance will be presented and discussed in the following sections. ATLAS can be consulted at the Web site http://lcsb-databases.epfl.ch/atlas/ and is free for academic use upon subscription.

ATLAS of Biochemistry: Known and Novel Reactions.

Network Generation and Exact Reconstruction of KEGG Reactions. Database preprocessing allows us to assess the quality of the information in the KEGG database and ensures that KEGG compounds and reactions fulfill the minimum requirements for BNICE.ch (more details can be found in Table 2 in Methods). In the first step of our analysis, we applied the 361 bidirectional generalized reaction rules to the 16 798 compounds with identified two-dimensional (2D) structures in KEGG 2015, resulting in 137 877 generated reactions between KEGG compounds. Next, we screened the generated reactions against the KEGG 2015 reaction database to distinguish between

Figure 1. Growth of the KEGG compounds and reactions databases over the last 16 years.


“known” and “novel” KEGG reactions. We identified 5270 reconstructed KEGG reactions that appeared exactly as described in KEGG, and the remaining majority of the 132 607 enzymatic reactions interconnected KEGG compounds through novel reactions.

Pathway Search and Biotransformation Reconstruction. Of the 8959 preprocessed KEGG 2015 reaction database entries (Table 1), 5270 were reconstructed exactly as they appeared in the KEGG database. We revisited the 3689 reactions that were not reconstructed and ran the pathway reconstruction algorithm from their substrate(s) to their product(s) within ATLAS. As a benchmark, we searched for pathways of lengths 1, 2, and 3 to investigate whether the biotransformation of these reactions can be reconstructed through an alternative reaction mechanism. This study resulted in reconstruction of the biotransformation (as opposed to exact mechanism reconstruction) of 1381 KEGG reactions (37% of the 3689 reactions); 916 KEGG reactions were captured in one step, 387 KEGG reactions were generated in two sequential reaction steps, and 78 KEGG reactions required three sequential reaction steps. The user can perform the same analysis on the Web site, choosing any pathway length to connect any two desired compounds. Table 1 provides an overview of the KEGG reaction statistics and summarizes the results of the exact and biotransformation reconstruction approaches. The lists of curated KEGG reactions in each approach are provided in Table A in the Supporting Information.

The following examples demonstrate the importance of the results obtained from the pathway search and how these results can clarify the reaction mechanisms of KEGG reactions that are not (yet) fully characterized.

Example 1: One-Step Reconstructed Biotransformation. Out of the 916 KEGG reactions that were reconstructed in one biotransformation step, 43 are annotated in KEGG with the comment “enzyme not yet characterized.” As Figure 2 illustrates, BNICE.ch also proposes a three-level Enzyme Commission (EC) number for such reactions, which can guide the identification of the exact mechanisms underlying these reactions. The complete list of this class of reconstructed KEGG reactions (biotransformation reconstruction in one step) is available on the Web site. Furthermore, the user can perform this analysis and retrieve the demonstrated results.
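The bounded-length pathway search described above (pathways of lengths 1, 2, and 3 between a substrate and a product) can be sketched as a depth-limited enumeration over a directed reaction network. This is a minimal Python sketch under stated assumptions: the network encoding, the function `find_pathways`, and all compound and reaction IDs are hypothetical, and the search actually implemented behind the ATLAS Web site may differ.

```python
from typing import Dict, List, Tuple

# Directed edges: substrate -> list of (reaction_id, product).
# All compound and reaction IDs below are hypothetical placeholders.
Network = Dict[str, List[Tuple[str, str]]]


def find_pathways(network: Network, source: str, target: str,
                  max_steps: int = 3) -> List[List[str]]:
    """Return every reaction sequence of at most max_steps leading from source to target."""
    results: List[List[str]] = []

    def extend(compound: str, visited: Tuple[str, ...], path: List[str]) -> None:
        if compound == target and path:
            results.append(list(path))
            return
        if len(path) == max_steps:
            return  # depth limit reached without hitting the target
        for reaction_id, product in network.get(compound, []):
            if product in visited:
                continue  # do not revisit a compound within one pathway
            extend(product, visited + (product,), path + [reaction_id])

    extend(source, (source,), [])
    return results


toy_network: Network = {
    "A": [("rxn1", "B"), ("rxn2", "C")],
    "B": [("rxn3", "C")],
    "C": [("rxn4", "D")],
}

up_to_three_steps = find_pathways(toy_network, "A", "C", max_steps=3)
one_step_only = find_pathways(toy_network, "A", "C", max_steps=1)
```

In this toy network the conversion of A into C is found both as a direct step and as a two-step route through B, mirroring the distinction between one-step and multistep biotransformation reconstructions.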

Example 2: Multistep Reconstructed Biotransformation. The biotransformations of 387 KEGG reactions are reconstructed in two-step reaction mechanisms using ATLAS reactions, and for 78 KEGG reactions the biotransformation is captured by three consecutive ATLAS reactions. In both

Table 1. Statistics of Reactions for Three Different Sections: (i) Preprocessing Step; (ii) KEGG Reaction Reconstruction (Exact Reconstruction or Biotransformation Reconstruction); (iii) Generation of ATLAS Reactionsᵃ

ᵃ The reference database for creating the generalized reaction rules was the KEGG 2014 reaction database, whereas all of the reconstructions were done using the KEGG 2015 database (compounds and reactions). The last column shows the analysis of the reactions that appeared in KEGG 2015. Remarkably, 65% of the qualified reactions that were introduced in 2015 (665 reactions) were already part of ATLAS based on KEGG 2014 reactions.

Figure 2. (A) Example of a KEGG reaction (R06580) with missing enzyme characterization. (B) Three alternative ATLAS reactions (suggesting different pairs of cofactors) to recover the biotransformation of R06580. The ATLAS reactions provide the EC classification numbers up to the third level (1.14.13.- or 1.14.16.-), which are essential for the enzyme characterization.


cases, most of these reactions are categorized as “multistep” reactions in the KEGG database, with missing information regarding their mechanisms (e.g., the individual reaction steps). Figures 3 and 4 present examples of two- and three-step cases, respectively, and the complete information on this class of reconstructed KEGG reactions is provided on the Web site.

Validation of the Generalized Reaction Rules. The reference reactions that were used to formulate the generalized reaction rules of BNICE.ch were from the KEGG 2014 reaction database (see Methods). Thus, comparing the generated novel ATLAS reactions to the reactions that were added in 2015 to the KEGG database helps us investigate the consistency of the generated hypothetical reactions with known biochemistry. Moreover, through this analysis we can assess the predictive characteristics of our method for creating novel enzymatic reactions that may become known reactions in the future.

KEGG introduced 776 additional reactions in 2015 relative to the 2014 version (Table 1). We performed the same preprocessing procedures as explained in Table 2 for the 776 reactions and after the automatic preprocessing determined that

Figure 3. Reconstruction of a biotransformation of an example KEGG reaction (R03359) with an unknown reaction mechanism as two consecutive ATLAS reactions. (A) Available information for R03359 in the KEGG database, indicating a multistep mechanism without further characterization of the reaction steps and intermediates. (B) Proposed sets of two consecutive ATLAS reactions that allow the uncharacterized mechanism of R03359 to be captured via two different KEGG intermediates. In the first proposed two-step mechanism (upper part), our results suggest several alternative reaction mechanisms for each step (different pairs of cofactors). For the second proposed two-step mechanism (lower part), we found several alternative reaction mechanisms for the first reaction step and a single reaction mechanism for the second reaction step.

Figure 4. (A) Available information for R07949 in the KEGG database, indicating a multistep mechanism without further characterization of the reaction steps. (B) For reconstructing the biotransformation of R07949, ATLAS proposes three consecutive reaction steps involving two intermediate KEGG compounds. Interestingly, the second step is a KEGG reaction, and also, our results propose two different alternative reactions for the last step, one KEGG reaction and one ATLAS reaction.


665 of these reactions meet the aforementioned criteria. Remarkably, 354 of these reactions were predicted in ATLAS as novel reactions before they were incorporated into the KEGG database. Additionally, through the pathway search algorithm, we reconstructed the biotransformations of 61 of the remaining reactions in one-step mechanisms, 17 in two-step mechanisms, and four in three-step mechanisms (Table 1). The list of 776 reactions and information regarding their reconstructions are provided in Table B in the Supporting Information.

Integration of KEGG Compounds in de Novo Reactions. Fifty-six percent of the 16 798 KEGG compounds with defined structures (9371 compounds) do not participate in any KEGG reaction. As such, these compounds are not connected to any other compound in the KEGG database (disconnected metabolites).

An important outcome of this study is the integration of 3945 of the 9371 KEGG compounds (42%) that are not part of any reaction in the KEGG database into at least one novel enzymatic reaction. The participation of these disconnected KEGG metabolites in ATLAS reactions ranges from 1 to 708 reactions (Figure 5). This finding validates the potential of our methodology for future applications to explain the origin of metabolomics data and their integration into metabolic pathways. The list of KEGG compounds that have been integrated into novel reactions together with the number of reactions for each compound is provided in Table C in the Supporting Information.
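The per-compound participation counts summarized in Figure 5 amount to counting, for each disconnected compound, the novel reactions in which it appears. The sketch below shows one way to do this in Python; the function `participation_counts` and the toy compound IDs are hypothetical, and only the two percentages at the end use numbers quoted in the text.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def participation_counts(reactions: Iterable[Set[str]],
                         disconnected: Set[str]) -> Dict[str, int]:
    """Count, per disconnected compound, the reactions in which it participates."""
    counts: Counter = Counter()
    for compounds in reactions:
        for compound in compounds & disconnected:
            counts[compound] += 1
    return dict(counts)


# Hypothetical toy data: X* are disconnected compounds, K* are connected ones.
toy_reactions = [{"X1", "K1"}, {"X1", "X2"}, {"K1", "K2"}]
toy_disconnected = {"X1", "X2", "X3"}

counts = participation_counts(toy_reactions, toy_disconnected)
integrated = len(counts)  # disconnected compounds found in at least one reaction

# Percentages quoted in the text, recomputed from the reported counts.
disconnected_pct = round(100 * 9371 / 16798)  # share of compounds with no KEGG reaction
integrated_pct = round(100 * 3945 / 9371)     # share of those integrated by ATLAS
```

Binning the values of `counts` would give a histogram of the kind shown in Figure 5.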

Reconsideration of Biochemistry in the Context of Chemical Knowledge. After reconstruction of the exact mechanisms or the biotransformations of 6651 KEGG reactions through first- and second-level analyses, 2308 out of the 8959 KEGG 2015 reactions remained non-reconstructed (Table 1). For these reactions, we investigated whether their biotransformations could be reconstructed by BNICE.ch when we allowed chemical compounds from PubChem to be used as intermediates in the network generation and reaction reconstruction. We first identified the substrate(s) of each reaction and performed BNICE.ch using all of the generalized reaction rules for two iterations. In each iteration, we allowed both KEGG and PubChem compounds in the generated network. We next performed pathway searches as described in the previous section and reconstructed the biotransformations of 100 additional KEGG reactions that include at least one PubChem intermediate between the original substrates and products (Figure 6 and Table A in the Supporting Information). These results further suggest that the reason why the mechanisms of several biological reactions are not yet characterized is a lack of information at the metabolite level (unknown intermediates).
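The reaction counts quoted in this article fit together arithmetically, which the short Python check below makes explicit. All inputs are numbers stated in the text or in the Table 1 footnote; the variable names are ours, and the reading of the roughly 65% validation figure as the sum of exact predictions and one- to three-step biotransformation reconstructions is an assumption.

```python
# Counts quoted in the text (KEGG 2015 reconstruction statistics).
generated = 137877          # reactions generated between KEGG compounds
preprocessed = 8959         # preprocessed KEGG 2015 reactions
exact = 5270                # reconstructed exactly as in KEGG
one_step, two_step, three_step = 916, 387, 78

novel = generated - exact
not_exact = preprocessed - exact
biotransformation = one_step + two_step + three_step
reconstructed = exact + biotransformation
remaining = preprocessed - reconstructed
remaining_after_pubchem = remaining - 100   # 100 more recovered via PubChem intermediates

biotransformation_pct = round(100 * biotransformation / not_exact)

# Validation on reactions added in KEGG 2015 (assumed composition of the ~65% figure).
qualified_2015 = 665
recovered_2015 = 354 + 61 + 17 + 4
validated_fraction = recovered_2015 / qualified_2015
```

Under these assumptions the derived values match the figures given in the text: 132 607 novel reactions, 3689 reactions not reconstructed exactly, 6651 reconstructed overall, and 2308 remaining before PubChem intermediates are allowed.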

Figure 5. Disconnected KEGG compounds and their appearance in novel reactions of ATLAS. The two extreme cases at the right and left ends of the graph show that 460 compounds that are not in any KEGG reaction participated in at least 1 novel reaction in ATLAS and that 190 compounds were integrated in numerous novel reactions ranging from 101 to 708 novel reactions.

Figure 6. (A) Example of a not well characterized KEGG reaction (R05540) that can be reconstructed in two consecutive BNICE.ch reactions; no EC classification or other biochemical and biological information is provided in the database. (B) BNICE.ch reconstruction of the uncharacterized reaction using a PubChem compound as an intermediate metabolite.


We further investigated in detail the remaining 2208 non-reconstructed reactions. These reactions are either from 2014 or appeared only in KEGG 2015. In the case of the 2014 reactions, the primary reasons why these reactions were not used for the formulation of the generalized reaction rules and were therefore not reconstructed by BNICE.ch are presented in Table 2 in Methods. The 2015 reactions will require further analysis to investigate whether new generalized reaction rules can be formulated to capture their mechanisms (Table 2 and Table B in the Supporting Information).

Furthermore, we investigated the 5385 KEGG compounds that were not integrated into any ATLAS reaction and therefore remained disconnected from any other KEGG metabolite. Specifically, we examined whether each of these compounds could be connected to any other metabolite by the use of PubChem compounds in the BNICE.ch network generation step. Notably, 3641 of these compounds (68%) were integrated into at least one reaction that involves both KEGG and PubChem compounds (Figure 7 and Table C in the Supporting Information).

The reconstruction of KEGG reactions and the integration of KEGG compounds into enzymatic reactions can be increased by considering compounds from chemical databases such as PubChem. Thus, many compounds that are currently only known to chemical databases can potentially participate in metabolism and therefore could be included in biological databases. Moreover, the existence of these intermediates in the PubChem database suggests that they can be chemically detected and that their absence from biological databases could be due to several reasons, e.g., their instability in a biological environment.
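The two-iteration network generation described in this section, in which products are kept only if they belong to an allowed compound set (KEGG alone, or KEGG plus PubChem), can be sketched as follows. This is an illustrative Python sketch, not BNICE.ch itself: rules are modeled as hypothetical lookup tables from a substrate to its products, and `expand_network`, `table_rule`, and all compound names are assumptions.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Rule = Callable[[str], List[str]]  # maps a substrate to the products it can yield


def table_rule(table: Dict[str, List[str]]) -> Rule:
    """Build a toy rule from a substrate -> products lookup table."""
    return lambda compound: table.get(compound, [])


def expand_network(seeds: Iterable[str], rules: Iterable[Rule], allowed: Set[str],
                   iterations: int = 2) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Apply every rule to the newest compounds for a fixed number of iterations."""
    rule_list = list(rules)
    compounds: Set[str] = set(seeds)
    frontier: Set[str] = set(compounds)
    reactions: Set[Tuple[str, str]] = set()
    for _ in range(iterations):
        next_frontier: Set[str] = set()
        for substrate in frontier:
            for rule in rule_list:
                for product in rule(substrate):
                    if product not in allowed:
                        continue  # product is in neither reference database
                    reactions.add((substrate, product))
                    if product not in compounds:
                        compounds.add(product)
                        next_frontier.add(product)
        frontier = next_frontier
    return compounds, reactions


# Hypothetical example: S and P are "KEGG" compounds, I1 exists only in "PubChem".
kegg, pubchem = {"S", "P"}, {"I1"}
rules = [table_rule({"S": ["I1"], "I1": ["P"]}), table_rule({"S": ["X"]})]

with_pubchem = expand_network(["S"], rules, kegg | pubchem)
kegg_only = expand_network(["S"], rules, kegg)
```

In the toy case the product P is reached only when the intermediate I1 is allowed, which is the situation the text describes for reactions whose intermediates are known to chemical but not biological databases.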

EC Number Analysis. BNICE.ch either exactly or partially reconstructed the mechanisms of 6528 KEGG reactions along with a repository of 132 607 novel reactions and proposed potential third-level EC numbers for all of the reactions that it generates. Each reconstructed KEGG reaction is therefore annotated with one (or several alternative) EC numbers that are either consistent with or different from the EC classification that KEGG proposes.

Several KEGG reactions lack information regarding EC classification and can be categorized on the basis of their available EC information, e.g., full (fourth-level) EC number, partial EC number (can be first-, second-, or third-level), or no EC number at all (Figure 8). Notably, for 60% of these KEGG reactions lacking complete EC information, we propose third-level EC annotation. The information regarding the EC annotation for reconstructed KEGG and novel reactions is available on the Web site.
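The categories used here (full, partial, or no EC number) can be derived mechanically from the EC string by counting how many leading fields are specified. The helper functions below are a hypothetical Python sketch; they treat any non-numeric field, such as the "-" in 1.14.13.-, as unspecified.

```python
from typing import Optional


def ec_level(ec_number: Optional[str]) -> int:
    """Return how many leading EC fields are specified (0 to 4)."""
    if not ec_number:
        return 0
    level = 0
    for field in ec_number.strip().split("."):
        if not field.isdigit():
            break  # '-' (or any non-numeric field) marks an unspecified level
        level += 1
    return min(level, 4)


def ec_category(ec_number: Optional[str]) -> str:
    """Classify an EC annotation as 'full', 'partial', or 'none'."""
    level = ec_level(ec_number)
    if level == 4:
        return "full"
    return "partial" if level > 0 else "none"


def improved_by_third_level(ec_number: Optional[str]) -> bool:
    """True if a proposed third-level EC number adds information to the annotation."""
    return ec_level(ec_number) < 3
```

For example, `ec_level("1.14.13.-")` is 3, so a reaction annotated only as `1.-.-.-` or with no EC number would gain information from a third-level proposal.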

Figure 7. The KEGG compound aspartame, without any information regarding its corresponding metabolic reactions, reacts in silico with several generalized reaction rules using different pairs of cofactors. The reactions result in four combinations of one KEGG product and one PubChem product.


Thermodynamic Analysis. The standard Gibbs free energy of reaction (ΔrG′°) is estimated for all of the reactions generated by BNICE.ch using the group contribution method (GCM) [25]. GCM decomposes the compounds into several predefined groups with their corresponding Gibbs free energies of formation (ΔfG′°) and estimates the ΔfG′° values for all of the compounds on the basis of the group values. In some cases, the decomposition of compounds results in groups without a corresponding free energy in the GCM database, in which case GCM cannot estimate the ΔfG′° value. ΔrG′° is further calculated from the estimated ΔfG′° values for the compounds. In this study, GCM reported ΔrG′° for 76% of the KEGG reactions and 39% of the novel ATLAS reactions.

To gain more insight into the novel reactions, we performed comparative analyses of the KEGG and novel ATLAS reactions with respect to the EC number distribution and the ranges of the estimated ΔrG′° values. When we compared the distributions of estimated ΔrG′° for the known and novel reactions, we observed that the estimated ΔrG′° values for the novel reactions fall within the same range as those of the KEGG reactions for the same EC class (Figure 9). Moreover, in the reaction distribution along the six EC classes, we observed that a large portion of novel reactions are transferases, which might indicate an underrepresentation of transferase activity in KEGG.
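The estimation logic described above can be sketched in Python: a compound's ΔfG′° is the sum of its group contributions, ΔrG′° is the stoichiometry-weighted sum of the ΔfG′° values, and the estimate fails when a group has no tabulated energy. The group names, the energies in `GROUP_ENERGY`, and both functions are hypothetical placeholders; real GCM tables and any correction terms are not reproduced here.

```python
from typing import Dict, Optional

# Hypothetical group contributions in kJ/mol. These numbers are placeholders
# for illustration, not values from any published GCM table.
GROUP_ENERGY: Dict[str, float] = {"g1": -10.0, "g2": -25.0, "g3": 5.0}


def formation_energy(groups: Dict[str, int]) -> Optional[float]:
    """Sum group contributions; return None if any group has no tabulated energy."""
    total = 0.0
    for group, count in groups.items():
        if group not in GROUP_ENERGY:
            return None
        total += count * GROUP_ENERGY[group]
    return total


def reaction_energy(stoichiometry: Dict[str, int],
                    compounds: Dict[str, Dict[str, int]]) -> Optional[float]:
    """Stoichiometry-weighted sum of formation energies (products > 0, substrates < 0)."""
    total = 0.0
    for name, coefficient in stoichiometry.items():
        energy = formation_energy(compounds[name])
        if energy is None:
            return None  # one unestimated compound makes the reaction unestimated
        total += coefficient * energy
    return total


# Hypothetical compounds described by their group decompositions.
toy_compounds = {
    "S": {"g1": 2, "g2": 1},
    "P": {"g1": 1, "g2": 1, "g3": 1},
    "Q": {"g1": 1, "unknown_group": 1},
}

estimated = reaction_energy({"S": -1, "P": 1}, toy_compounds)
not_estimated = reaction_energy({"S": -1, "Q": 1}, toy_compounds)
```

The second reaction has no estimate because one product contains a group missing from the table, which is the mechanism by which coverage falls below 100% in the text.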

BridgIT Analysis of Novel Reactions. We performed BridgIT analysis to compare the structural similarity of the 132 607 ATLAS novel reactions predicted by BNICE.ch with known KEGG reactions. The results are available on the Web site (http://lcsb-databases.epfl.ch/atlas/) and include for each novel reaction the structurally most similar KEGG reaction with its Tanimoto similarity score. The information regarding the closest KEGG reaction is provided as a link to the KEGG database, which directs users to the KEGG Web site for the given KEGG reaction. This design allows users to attain other useful information such as genes, organisms, and pathways that are provided by KEGG for that reaction. The association of novel reactions with gene sequences is crucial for metabolic engineering purposes and gap filling in metabolic networks.
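The Tanimoto similarity score mentioned above is the size of the intersection of two feature sets divided by the size of their union. The Python sketch below applies it to toy feature sets and picks the closest known reaction; how BridgIT actually builds reaction fingerprints is not described in this section, so the feature names, reaction IDs, and both functions are assumptions.

```python
from typing import Dict, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets, in the range 0 to 1."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def closest_known(novel: Set[str], known: Dict[str, Set[str]]) -> Tuple[str, float]:
    """Return the ID and score of the known reaction most similar to the novel one."""
    best_id = max(known, key=lambda reaction_id: tanimoto(novel, known[reaction_id]))
    return best_id, tanimoto(novel, known[best_id])


# Hypothetical fingerprints: each string stands for one structural feature.
novel_reaction = {"f1", "f2", "f3"}
known_reactions = {
    "known_a": {"f1", "f2", "f3", "f4"},
    "known_b": {"f1", "f5"},
}

best_match = closest_known(novel_reaction, known_reactions)
```

Here `known_a` shares three of four features in the union with the novel reaction, so it is reported as the closest match with a score of 0.75.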

Description of the Online Database. The ATLAS information on the Web site is organized in two tables: “BNICE.ch curated KEGG reactions” and “BNICE.ch ATLAS reactions.” The first table, “BNICE.ch curated KEGG reactions”, lists each BNICE.ch-curated KEGG reaction together with its KEGG reaction ID, reaction equation, enzyme name, and EC number. Furthermore, each KEGG reaction is annotated with information regarding the corresponding BNICE.ch reaction rule (third-level EC number from BNICE.ch), the Gibbs free energy of reaction, and information regarding how the reaction is reconstructed in ATLAS (either the exact KEGG reaction reconstruction or the biotransformation reconstruction). The second table, “BNICE.ch ATLAS reactions”, indexes the ensemble of reactions generated by BNICE.ch, comprising KEGG and novel reactions. The information in this table is organized in the same way as in the previous table but presents BridgIT results for all of the novel reactions.

The user-friendly interface allows users to search for keywords, sort the entries of columns, and export the tables in the desired format. The user can also replicate all of the reported examples of biotransformations via the integrated “pathway search tool”. Furthermore, one can query for all of the

Figure 8. In KEGG, the information regarding EC classification ranges from no EC assignment to complete EC classification. Since BNICE.ch proposes EC numbers up to the third level for reconstructed KEGG reactions, we could improve the EC information for 732 KEGG reactions with missing or incomplete EC classification.

Figure 9. Distributions of the Gibbs free energy of reaction for each type of first-level EC classification (classes 1 to 6). Red bars represent KEGG reactions, and blue bars represent ATLAS reactions.

ACS Synthetic Biology Research Article

DOI: 10.1021/acssynbio.6b00054
ACS Synth. Biol. 2016, 5, 1155−1166


possible pathways (one, two, or three steps or longer) between the substrate(s) and product(s) of any specified KEGG reaction and obtain all of the alternative biotransformations that ATLAS proposes for that KEGG reaction. One possible application of this tool is gap filling in metabolic networks, in which the user introduces the two disconnected metabolites in the pathway search tool and the maximum desired pathway length. The output is a list of all possible pathways between the two metabolites sorted by pathway length, with information regarding the intermediate metabolites and the reaction IDs (KEGG or novel reactions) catalyzing each reaction step. A detailed visualization of the generated pathways is available via the “Graph” button, including information regarding cofactor utilization and the Gibbs free energy of the reaction. Additionally, for novel reactions we provide the BridgIT results. The second integrated tool on the Web site is the “connectivity map”, which loads all of the reactions in ATLAS for any given KEGG compound. The user introduces the KEGG compound ID or the compound name and a maximal number of desired enzymatic steps, and the underlying algorithm generates a map of all of the enzymatically connected compounds and reactions.
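The behavior of the pathway search tool for gap filling can be sketched as a depth-limited path enumeration over a reaction graph. The example below is only an illustration under stated assumptions: the algorithm behind the Web site is not described in the text, and the function name `find_pathways`, the edge representation, and the toy compound and reaction identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]   # (reaction ID, substrate, product)
Step = Tuple[str, str]        # (reaction ID, compound reached)


def find_pathways(edges: List[Edge], source: str, target: str,
                  max_steps: int) -> List[List[Step]]:
    """Enumerate loop-free pathways from source to target.

    Only pathways with at most max_steps reaction steps are returned,
    sorted by length (shortest first).
    """
    graph: Dict[str, List[Step]] = defaultdict(list)
    for reaction_id, substrate, product in edges:
        graph[substrate].append((reaction_id, product))

    found: List[List[Step]] = []

    def extend(node: str, visited: Set[str], steps: List[Step]) -> None:
        if node == target and steps:
            found.append(list(steps))
            return
        if len(steps) == max_steps:
            return
        for reaction_id, nxt in graph[node]:
            if nxt not in visited:  # never revisit a compound
                steps.append((reaction_id, nxt))
                extend(nxt, visited | {nxt}, steps)
                steps.pop()

    extend(source, {source}, [])
    return sorted(found, key=len)


# Toy network: "R" marks known reactions, "N" marks novel ones.
edges = [("R1", "A", "B"), ("R2", "B", "C"), ("N1", "A", "C"), ("N2", "C", "D")]
for pathway in find_pathways(edges, "A", "C", max_steps=2):
    print(pathway)
```

Restricting the search depth mirrors the user-specified maximum pathway length. The same traversal without a fixed target would collect every compound reachable within the given number of steps, which is one plausible reading of the connectivity map.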

■ METHODS

To explore the biochemistry of enzymatic reactions, we (i) used the KEGG compound and reaction databases as a reference, (ii) preprocessed the KEGG compound and reaction information, (iii) applied the BNICE.ch framework to generate all of the theoretically possible biochemical reactions between the KEGG compounds, and (iv) performed several other complementary analyses to assess the closeness of the generated information to known biochemistry. In the following sections, we discuss the different elements of our approach. An outline of the workflow is presented in Figure 10.

1. Preprocessing of the KEGG Compound and Reaction Databases. We used the KEGG reaction database (2014 and earlier) as a reference for developing the generalized reaction rules. Prior to this step, we preprocessed the KEGG reactions on the basis of several criteria to assess the integrity of the cataloged reactions to be used as a reference for the formulation of reaction rules. For certain reactions, the information provided in KEGG is not complete; hence, those reactions are not reconstructable using BNICE.ch and had to be excluded from further analyses. Our reaction preprocessing step was accomplished in two levels (Table 2):

Figure 10. Workflow for generating ATLAS, which is divided into three major blocks. Block I includes the process of database preparation and data generation. Block II contains the different analyses of the generated data. Block III corresponds to the publicly available database “ATLAS of Biochemistry”, with two sections “BNICE.ch curated KEGG reactions” and “BNICE.ch ATLAS reactions”. The numbers (from 1 to 8) indicate the corresponding subsections in Methods.


(i) Automatic preprocessing. We introduced a list of predefined criteria to be used as a metric for filtering the reactions that do not fulfill these criteria. In this first step, we excluded reactions involving compounds with undefined molecular structures, e.g., compounds without a structural file (molfile). Furthermore, we excluded reactions that change only the stereochemistry of the molecules (e.g., racemase and epimerase) because BNICE.ch does not constrain the stereochemistry of molecules. However, BNICE.ch can reformat the results to introduce and analyze the molecules on the basis of stereochemistry.24

(ii) Manual quality control step. We manually analyzed the output of the first preprocessing step reaction by reaction. In this second level of preprocessing, we performed manual quality control of the remaining reactions to identify well-characterized enzymatic reactions with detailed information regarding their reaction mechanisms that are eligible to be used as references for developing the generalized reaction rules. Some examples of types of reactions that were excluded include reactions without EC numbers and incomplete and unbalanced reactions.
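The exclusion criteria named in the two preprocessing levels can be expressed as simple checks on a reaction record. The sketch below is an illustration only: in the study the second level was carried out manually, whereas here the missing EC number and the element balance are tested programmatically. The data classes, the field `flat_key` (a hypothetical stereochemistry-free structure key), and the toy records are assumptions and not part of BNICE.ch.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Compound:
    formula: Optional[Dict[str, int]]  # None when no structure file exists
    flat_key: Optional[str] = None     # hypothetical stereo-free structure key


@dataclass
class Reaction:
    rid: str
    ec: Optional[str]
    substrates: Dict[str, int]         # compound ID -> stoichiometric coefficient
    products: Dict[str, int]


def side_atoms(side: Dict[str, int], compounds: Dict[str, Compound]) -> Counter:
    """Total element counts on one side of a reaction."""
    atoms: Counter = Counter()
    for cid, coeff in side.items():
        for element, n in compounds[cid].formula.items():
            atoms[element] += coeff * n
    return atoms


def side_keys(side: Dict[str, int], compounds: Dict[str, Compound]) -> Counter:
    """Multiset of stereo-free structure keys on one side of a reaction."""
    keys: Counter = Counter()
    for cid, coeff in side.items():
        keys[compounds[cid].flat_key] += coeff
    return keys


def exclusion_reason(rxn: Reaction, compounds: Dict[str, Compound]) -> Optional[str]:
    """Return why a reaction would be excluded, or None if it passes."""
    members = list(rxn.substrates) + list(rxn.products)
    if any(compounds[c].formula is None for c in members):
        return "undefined structure"
    if side_keys(rxn.substrates, compounds) == side_keys(rxn.products, compounds):
        return "stereochemistry change only"
    if rxn.ec is None:
        return "no EC number"
    if side_atoms(rxn.substrates, compounds) != side_atoms(rxn.products, compounds):
        return "unbalanced"
    return None


# Toy records with hypothetical identifiers.
compounds = {
    "L-ala": Compound({"C": 3, "H": 7, "N": 1, "O": 2}, "ala"),
    "D-ala": Compound({"C": 3, "H": 7, "N": 1, "O": 2}, "ala"),
    "glc": Compound({"C": 6, "H": 12, "O": 6}, "glc"),
    "fru": Compound({"C": 6, "H": 12, "O": 6}, "fru"),
    "polymer": Compound(None),
}
racemase = Reaction("toy1", "5.1.1.1", {"L-ala": 1}, {"D-ala": 1})
isomerase = Reaction("toy2", "5.3.1.5", {"glc": 1}, {"fru": 1})
print(exclusion_reason(racemase, compounds), exclusion_reason(isomerase, compounds))
```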

We considered the outcome of the first step to be a database of biochemical reactions that we could use to test the functionality of our generalized reaction rules and the outcome of the second step to be a reference set of well-explained enzymatic reactions for developing the generalized reaction rules.

To generate ATLAS, we applied the generalized reaction rules to the KEGG 2015 compound database. Because we used the KEGG 2014 reaction database as a reference for developing generalized reaction rules, applying these rules to the KEGG 2015 compound database helped us investigate the predictive property of our proposed method. Therefore, we could examine whether the generalized reaction rules could predict a reaction as novel on the basis of the KEGG 2014 database that became known in KEGG 2015.

Before applying the reaction rules to the KEGG compounds, we performed a preprocessing step to evaluate the quality of the compound information. Table 3 summarizes the criteria that we considered for preprocessing and excluding KEGG compounds and the ensuing results.

After preprocessing of the KEGG compound and reaction databases with the mentioned criteria, we applied the 361 bidirectional generalized reaction rules incorporated into BNICE.ch to the 16 798 KEGG compounds with defined 2D structures. The list of KEGG reactions in the reference database and the list of preprocessed KEGG compounds can be found in Table D in the Supporting Information.

BNICE.ch. The BNICE.ch framework involves two main components: (i) a database of generalized enzymatic reaction rules and (ii) a network generation algorithm. The output of

Table 2. Two-Level Preprocessing of KEGG Reactions To Exclude Reactions That Did Not Fulfill the Minimum Requirements for Formulating the Generalized Reaction Rules

Table 3. In KEGG 2015, 655 out of 17 453 Compounds Did Not Have a Defined Structure and Were Therefore Excluded from Further Analyses in This Work; On the Other Hand, We Identified 1729 Entries for Compounds Having More than One Compound, with Their Only Difference Being Stereochemistry


BNICE.ch is subject to screening against biological and chemical databases, which are chosen on the basis of the type and objectives of the study. Furthermore, the group contribution method25 for the estimation of the Gibbs free energies of formation and reaction is integrated into the BNICE.ch framework, and the energetic properties are calculated and reported for all of the generated compounds and reactions. These methods allow us to evaluate the thermodynamic feasibility of the generated information.

2. BNICE.ch: Generalized Enzymatic Reaction Rules. The enzymatic reaction rules constitute the backbone of the BNICE.ch framework. These rules are based on the following concept: a given enzyme is expected to recognize substrates other than the native ones that share the same reactive site(s) and a similar neighboring structure and catalyze (or evolve to catalyze) the same biotransformation for the non-native substrate. BNICE.ch reaction rules are named and classified on the basis of the Enzyme Commission (EC) system.26 BNICE.ch rules describe the biochemistry of the reaction and the reactive site(s) of a substrate rather than the specific structure of the entire molecule and therefore group enzymatic reactions that involve similar chemistry. Because the rules can act on different substrates, they not only reconstruct the specific native reactions for which they have been designed but also postulate many more novel reactions. Every novel reaction generated with BNICE.ch is associated with a third-level EC number and is attributed to a biochemically relevant reaction mechanism.

If a specific KEGG reaction can be replicated using a generalized reaction rule, we denote the KEGG reaction as being “reconstructed” by BNICE.ch. The percentage of KEGG reactions that can be reconstructed with BNICE.ch is called the “coverage”, which is an important indicator of the performance of our method and distinguishes it from other similar tools. It should be noted that the coverage is a moving target because it depends on the actual number of “known KEGG reactions”, which increases every year.

BNICE.ch currently incorporates 361 bidirectional generalized reaction rules (or 722 forward and reverse rules). More details regarding the procedure for developing the generalized reaction rules are provided in previous publications on the BNICE.ch framework.11,12,19,22,27

3. BNICE.ch: Network Generation Algorithm. BNICE.ch employs an automated network generation algorithm that works in an iterative manner. Starting with a set of input molecule(s), cofactors, and generalized reaction rules, the algorithm proceeds as follows:

(1) Every molecule is checked for reactivity, i.e., it is evaluated to find whether it has the appropriate reactive sites (functionalities) to undergo the reactions corresponding to the specified list of reaction rules.

(2) Upon acting on a molecule, the generalized reaction rules recognize the reactive sites of the molecule and apply the biotransformation in which the atoms and bonds are rearranged to form the product.

(3) Next, all of the reactants are placed in a “reacted” list, and all of the products from these reactants are placed in an “unreacted” list if they are molecules that have not been specified or generated previously. This completes the first step and is defined as “iteration 1”.

(4) Each molecule in the unreacted list is checked for its reactivity, the reaction rules are applied, and new reacted and unreacted lists are created for “iteration 2”.

(5) The procedure is repeated iteratively, and an iteration count is maintained as new molecules are created, keeping track of the iteration number of each species, which corresponds to the number of steps required to create a given product from the original reactant(s).

To manage the exponentially growing number of products that are created because of the combinatorial nature of the network generation process, we have defined several input parameters that govern the size of the generated reaction network and should be specified by the user prior to running the algorithm:17

(1) number of generalized reaction rules

(2) number of iterations

(3) databases to be included in the process of screening and eventually filtering the generated compounds and reactions.

In this study, because we aimed to explore known biochemistry, we used all of the 361 bidirectional reaction rules and applied them in one iteration to all of the KEGG compounds. With respect to database integration, we allowed only KEGG compounds to be included in the generated reaction network without any constraints on the generated reactions.
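The iterative procedure and its three input parameters (rule set, number of iterations, and compound database used for filtering) can be sketched as a breadth-first expansion. This is a toy illustration, not the BNICE.ch implementation: real rules recognize reactive sites and rearrange atoms and bonds of molecular structures, whereas here molecules are plain strings and the two rules are hypothetical string rewrites. The function name `generate_network` and the parameter `allowed` are assumptions.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# A rule returns the products it can form from a molecule, or an empty
# list when the molecule has no matching "reactive site".
Rule = Callable[[str], List[str]]


def generate_network(seeds: List[str], rules: Dict[str, Rule], iterations: int,
                     allowed: Optional[Set[str]] = None
                     ) -> Tuple[Dict[str, int], Set[Tuple[str, str, str]]]:
    """Apply every rule to every unreacted molecule for a fixed number of iterations.

    Returns the iteration at which each species first appeared and the set
    of generated reactions as (rule name, substrate, product).
    """
    generation = {m: 0 for m in seeds}
    reactions: Set[Tuple[str, str, str]] = set()
    unreacted = list(generation)
    for iteration in range(1, iterations + 1):
        next_unreacted: List[str] = []
        for molecule in unreacted:
            for name, rule in rules.items():
                for product in rule(molecule):
                    # Database screening: keep only products in the allowed set.
                    if allowed is not None and product not in allowed:
                        continue
                    reactions.add((name, molecule, product))
                    if product not in generation:
                        generation[product] = iteration
                        next_unreacted.append(product)
        unreacted = next_unreacted
    return generation, reactions


# Hypothetical toy rules on strings: the letter stands in for a reactive site.
rules: Dict[str, Rule] = {
    "a_to_b": lambda m: [m.replace("a", "b", 1)] if "a" in m else [],
    "b_to_c": lambda m: [m.replace("b", "c", 1)] if "b" in m else [],
}

# Two iterations starting from one seed, without database filtering.
print(generate_network(["aa"], rules, iterations=2))

# Setting used in the study, in miniature: one iteration over all
# database compounds, keeping only products that are database compounds.
database = {"aa", "ba"}
print(generate_network(sorted(database), rules, iterations=1, allowed=database))
```

With a single iteration and the compound filter switched on, every generated reaction connects two compounds already in the database, which corresponds to the setting described above for building ATLAS from KEGG compounds.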

4. Known and Novel Reactions: Biological and Chemical Databases Integrated. Another important component of BNICE.ch is its interaction with the integrated biological and chemical databases, which allows us to screen our results against available information on compounds and reactions and investigate which portion of the obtained information (compounds and reactions) is already known and which portion is novel.

The most important integrated sources of metabolic data in BNICE.ch are the KEGG,7 SEED,28 MetaCyc,29 and ChEBI databases30 and PubChem,31 the most comprehensive available database for compound structures. In this study, KEGG was used as our reference database for compounds (we allowed only KEGG compounds in our results) and reactions (for differentiating between known and novel reactions). If a reaction could be found in the KEGG reaction database, we designated it as known; otherwise, we designated it as novel.

Reconstruction of KEGG Reactions. We reconstructed thousands of known enzymatic KEGG reactions. We present this collection of reconstructed KEGG reactions as “BNICE.ch curated KEGG reactions” because we not only reconstructed them as reported in KEGG but also annotated them with a third-level EC number (coming from BNICE.ch reaction rules). We also provide ΔrG′° values, which are not available in the KEGG database. The curation of known reactions is performed on two levels:

5. Exact Reconstruction of KEGG Reactions. After generating ATLAS, we screened all of the ATLAS reactions against the KEGG reaction database. When we identified an exact match for a reaction, we categorized it as “exactly reconstructed”.
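Screening generated reactions against a reference reaction database amounts to comparing canonical reaction keys. The sketch below assumes that two reactions match exactly when they have the same participants with the same stoichiometry, and that direction is ignored because the rules are bidirectional; both assumptions, the function names, and the toy identifiers are illustrative and are not taken from BNICE.ch.

```python
from typing import Dict, FrozenSet, Set, Tuple

Side = FrozenSet[Tuple[str, int]]
ReactionKey = FrozenSet[Side]


def reaction_key(substrates: Dict[str, int], products: Dict[str, int]) -> ReactionKey:
    """Direction-insensitive key built from both sides of a reaction."""
    left: Side = frozenset(substrates.items())
    right: Side = frozenset(products.items())
    return frozenset((left, right))


def classify(substrates: Dict[str, int], products: Dict[str, int],
             known: Set[ReactionKey]) -> str:
    """Label a generated reaction as 'known' (exact match) or 'novel'."""
    return "known" if reaction_key(substrates, products) in known else "novel"


# Toy reference database holding a single reaction: cpdA + cpdB -> cpdC + cpdD.
known = {reaction_key({"cpdA": 1, "cpdB": 1}, {"cpdC": 1, "cpdD": 1})}

# The same reaction written in reverse still matches.
print(classify({"cpdC": 1, "cpdD": 1}, {"cpdA": 1, "cpdB": 1}, known))
# A reaction with different products does not.
print(classify({"cpdA": 1, "cpdB": 1}, {"cpdC": 1, "cpdE": 1}, known))
```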

6. Biotransformation Reconstruction. In a second step, for all of the reactions that were not exactly reconstructed as presented in KEGG, we performed a pathway search analysis to investigate whether “biotransformation” of these reactions could be reconstructed irrespective of their mechanisms and


involved cofactors. Our algorithm identifies the substrate(s) and product(s) of each reaction and then performs a pathway search within ATLAS to identify pathways of one, two, or three enzymatic reaction steps that can replicate the biotransformation of the reaction by connecting its substrate(s) to its product(s). If a pathway between the substrate and the product of the reaction is found, the corresponding generalized reaction rule or a combination of two or three consecutive generalized reaction rules can supposedly catalyze the KEGG reaction.

7. BridgIT Analysis. To further analyze the generated hypothetical reactions, we utilized the computational tool BridgIT32 to assess the structural similarity of the generated novel reactions to KEGG reactions. BridgIT has recently been developed in our group as a complementary tool to BNICE.ch that enables the quantification of the similarity of a novel reaction to a known KEGG reaction with respect to the structures of their substrates and products. BridgIT translates the structural definition of the molecules that participate in a reaction into a mathematical notation using compound fingerprints, which are then used to create a so-called “reaction vector.” BridgIT has an integrated database of KEGG reaction vectors that allows it to compare the reaction vector of a given reaction (for instance, a de novo reaction generated by BNICE.ch) with all of the KEGG reaction vectors using the Tanimoto distance.33
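The comparison of reaction vectors can be illustrated with the Tanimoto coefficient for binary fingerprints, T(A, B) = |A ∩ B| / |A ∪ B|, which ranges from 0 to 1. The sketch below assumes that each reaction vector is available as a set of "on" bit positions; how BridgIT actually builds reaction vectors from compound fingerprints is not reproduced here, and the function names, reaction labels, and bit sets are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bits."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def closest_known(query: FrozenSet[int],
                  known: Dict[str, FrozenSet[int]]) -> Tuple[str, float]:
    """Return the most similar known reaction and its similarity score."""
    return max(((rid, tanimoto(query, fp)) for rid, fp in known.items()),
               key=lambda pair: pair[1])


# Hypothetical reaction vectors for two known reactions and one novel reaction.
known_vectors = {
    "known_1": frozenset({1, 2, 3, 4}),
    "known_2": frozenset({7, 8}),
}
novel_vector = frozenset({1, 2, 3})
print(closest_known(novel_vector, known_vectors))
```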

Through assessment of the structural similarity between novel and known reactions, each novel reaction is assigned a Tanimoto similarity score that quantifies its similarity to the existing reactions. The Tanimoto score varies between 0 and 1, in which 1 indicates high similarity and 0 indicates no similarity. Using this score, we can further assign gene and protein sequences to novel reactions, which can be useful in evolutionary protein engineering and computational protein design for the experimental implementation of the de novo reactions. We performed BridgIT for all of the novel reactions in our database, and the results are available on the Web site.

8. Reaction Characterization. The EC number is a classification scheme for enzyme-catalyzed reactions that defines four levels of information, with the first level describing the most general classification and the fourth level providing information on the substrates participating in the reaction. This globally accepted enzyme nomenclature provides valuable information regarding the type of the enzyme-catalyzed reaction. All of the novel reactions of ATLAS are assigned a third-level EC number derived from their corresponding generalized reaction rules indicating their potential enzymatic group. Furthermore, we estimate the ΔrG′° values for the ATLAS reactions, facilitating the evaluation of the thermodynamic feasibility of the reactions under standard conditions. If the value of ΔrG′° is negative, the reaction is thermodynamically feasible in the designated direction under standard conditions. We have performed several comparative analyses using these two pieces of information to compare the known KEGG reactions and the novel ATLAS reactions with respect to the distributions of their EC numbers and ΔrG′° values.
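The two descriptors used for reaction characterization are easy to handle programmatically. The sketch below shows, under simple assumptions, how an EC number can be truncated to its third level, how a set of reactions can be tallied over the six first-level EC classes, and how the sign of ΔrG′° translates into a feasibility flag. The function names and the example EC strings are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# The six first-level EC classes considered in this study.
EC_CLASSES = {1: "oxidoreductases", 2: "transferases", 3: "hydrolases",
              4: "lyases", 5: "isomerases", 6: "ligases"}


def third_level(ec: str) -> str:
    """Truncate an EC number to its first three levels."""
    return ".".join(ec.split(".")[:3])


def class_distribution(ec_numbers: Iterable[str]) -> Dict[str, int]:
    """Count reactions per first-level EC class."""
    counts = Counter(int(ec.split(".")[0]) for ec in ec_numbers)
    return {name: counts.get(level, 0) for level, name in EC_CLASSES.items()}


def favorable(delta_r_g: Optional[float]) -> Optional[bool]:
    """True if the standard Gibbs energy of reaction is negative.

    Returns None when no estimate is available.
    """
    if delta_r_g is None:
        return None
    return delta_r_g < 0


print(third_level("2.7.1.1"))
print(class_distribution(["2.7.1", "2.3.1", "1.1.1"]))
print(favorable(-3.2), favorable(None))
```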

■ CONCLUSIONS

We have introduced here the “ATLAS of Biochemistry”, a large collection of novel reactions along with their EC identifiers up to the third level and candidate enzymes that can potentially catalyze these de novo reactions. ATLAS provides a valuable resource of information for those who build and analyze metabolic models and for metabolic engineering projects and synthetic biology studies directed toward finding novel biosynthesis or biodegradation pathways.

Moreover, ATLAS integrates 42% of the KEGG compounds that are not currently part of any KEGG reaction into de novo enzymatic reactions. This high number of plausible reactions interconnecting KEGG compounds is a significant outcome, particularly because, using BNICE.ch on the basis of the 2014 version of KEGG, we could correctly predict 452 enzymatic reactions that were identified as known reactions in the 2015 version. Synthetic biology can exploit the potential of these novel biocatalytic reactions to drive discoveries in biotechnology, medicine, and green chemistry. This study also illustrates that the current knowledge concerning metabolic reactions can be expanded dramatically and that further experimental and computational effort is indispensable to complete our understanding of cellular metabolism.

■ ASSOCIATED CONTENT

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acssynbio.6b00054.

Tables A−D (XLSX)

■ AUTHOR INFORMATION

Corresponding Author

*Tel: +41 21 693 9870. Fax: +41 21 693 9875. E-mail: [email protected]

Author Contributions

V.H., N.H., A.Z., and J.H. designed the study; N.H. and J.H. performed the experiments; V.H., N.H., and J.H. analyzed the data and wrote the manuscript. A.S. developed the Web site and its tools.

Notes

The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS

This work was supported by the Swiss National Science Foundation (SNF) and SystemsX.ch, the Swiss Initiative in Systems Biology.

■ REFERENCES

(1) Pareek, C. S., Smoczynski, R., and Tretyn, A. (2011) Sequencing technologies and genome sequencing. J. Appl. Genet. 52, 413−435.
(2) Sugimoto, M., Kawakami, M., Robert, M., Soga, T., and Tomita, M. (2012) Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis. Curr. Bioinf. 7, 96−108.
(3) Eren, K., Deveci, M., Kucuktunc, O., and Catalyurek, U. V. (2013) A comparative analysis of biclustering algorithms for gene expression data. Briefings Bioinf. 14, 279−292.
(4) Brown, T. A. (2010) Gene Cloning and DNA Analysis: An Introduction, 6th ed., Wiley-Blackwell, Oxford, U.K.
(5) Friedberg, I. (2006) Automated protein function prediction - the genomic challenge. Briefings Bioinf. 7, 225−242.
(6) Radivojac, P., Clark, W. T., Oron, T. R., Schnoes, A. M., Wittkop, T., Sokolov, A., Graim, K., Funk, C., Verspoor, K., Ben-Hur, A., Pandey, G., Yunes, J. M., Talwalkar, A. S., Repo, S., Souza, M. L., Piovesan, D., Casadio, R., Wang, Z., Cheng, J. L., Fang, H., Gough, J., Koskinen, P., Toronen, P., Nokso-Koivisto, J., Holm, L., Cozzetto, D., Buchan, D. W. A., Bryson, K., Jones, D. T., Limaye, B., Inamdar, H., Datta, A., Manjari, S. K., Joshi, R., Chitale, M., Kihara, D., Lisewski, A. M., Erdin, S., Venner, E., Lichtarge, O., Rentzsch, R., Yang, H. X., Romero, A. E., Bhat, P., Paccanaro, A., Hamp, T., Kassner, R., Seemayer, S., Vicedo, E., Schaefer, C., Achten, D., Auer, F., Boehm, A., Braun, T., Hecht, M., Heron, M., Honigschmid, P., Hopf, T. A.,


Kaufmann, S., Kiening, M., Krompass, D., Landerer, C., Mahlich, Y., Roos, M., Bjorne, J., Salakoski, T., Wong, A., Shatkay, H., Gatzmann, F., Sommer, I., Wass, M. N., Sternberg, M. J. E., Skunca, N., Supek, F., Bosnjak, M., Panov, P., Dzeroski, S., Smuc, T., Kourmpetis, Y. A. I., van Dijk, A. D. J., ter Braak, C. J. F., Zhou, Y. P., Gong, Q. T., Dong, X. R., Tian, W. D., Falda, M., Fontana, P., Lavezzo, E., Di Camillo, B., Toppo, S., Lan, L., Djuric, N., Guo, Y. H., Vucetic, S., Bairoch, A., Linial, M., Babbitt, P. C., Brenner, S. E., Orengo, C., Rost, B., Mooney, S. D., and Friedberg, I. (2013) A large-scale evaluation of computational protein function prediction. Nat. Methods 10, 221−227.
(7) Kanehisa, M., and Goto, S. (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27−30.
(8) Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A. C., and Kanehisa, M. (2007) KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35, W182−185.
(9) Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M., and Tanabe, M. (2016) KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457−D462.
(10) Kanehisa, M., Goto, S., Sato, Y., Kawashima, M., Furumichi, M., and Tanabe, M. (2014) Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res. 42, D199−205.
(11) Hatzimanikatis, V., Li, C. H., Ionita, J. A., and Broadbelt, L. J. (2004) Metabolic networks: enzyme function and metabolite structure. Curr. Opin. Struct. Biol. 14, 300−306.
(12) Hatzimanikatis, V., Li, C. H., Ionita, J. A., Henry, C. S., Jankowski, M. D., and Broadbelt, L. J. (2005) Exploring the diversity of complex metabolic networks. Bioinformatics 21, 1603−1609.
(13) Moriya, Y., Shigemizu, D., Hattori, M., Tokimatsu, T., Kotera, M., Goto, S., and Kanehisa, M. (2010) PathPred: an enzyme-catalyzed metabolic pathway prediction server. Nucleic Acids Res. 38, W138−143.
(14) Cho, A., Yun, H., Park, J. H., Lee, S. Y., and Park, S. (2010) Prediction of novel synthetic pathways for the production of desired chemicals. BMC Syst. Biol. 4, 35.
(15) Hou, B. K., Ellis, L. B., and Wackett, L. P. (2004) Encoding microbial metabolic logic: predicting biodegradation. J. Ind. Microbiol. Biotechnol. 31, 261−272.
(16) Ellis, L. B., Gao, J., Fenner, K., and Wackett, L. P. (2008) The University of Minnesota pathway prediction system: predicting metabolic logic. Nucleic Acids Res. 36, W427−432.
(17) Hadadi, N., and Hatzimanikatis, V. (2015) Design of computational retrobiosynthesis tools for the design of de novo synthetic pathways. Curr. Opin. Chem. Biol. 28, 99−104.
(18) Soh, K. C., and Hatzimanikatis, V. (2010) DREAMS of metabolism. Trends Biotechnol. 28, 501−508.
(19) Henry, C. S., Broadbelt, L. J., and Hatzimanikatis, V. (2010) Discovery and Analysis of Novel Metabolic Pathways for the Biosynthesis of Industrial Chemicals: 3-Hydroxypropanoate. Biotechnol. Bioeng. 106, 462−473.
(20) Finley, S. D., Broadbelt, L. J., and Hatzimanikatis, V. (2010) In silico feasibility of novel biodegradation pathways for 1,2,4-trichlorobenzene. BMC Syst. Biol. 4, 7.
(21) Finley, S. D., Broadbelt, L. J., and Hatzimanikatis, V. (2009) Thermodynamic analysis of biodegradation pathways. Biotechnol. Bioeng. 103, 532−541.
(22) Hadadi, N., Soh, K. C., Seijo, M., Zisaki, A., Guan, X. L., Wenk, M. R., and Hatzimanikatis, V. (2014) A computational framework for integration of lipidomics data into metabolic pathways. Metab. Eng. 23, 1−8.
(23) Gonzalez-Lergier, J., Broadbelt, L. J., and Hatzimanikatis, V. (2006) Analysis of the maximum theoretical yield for the synthesis of erythromycin precursors in Escherichia coli. Biotechnol. Bioeng. 95, 638−644.
(24) Gonzalez-Lergier, J., Broadbelt, L. J., and Hatzimanikatis, V. (2005) Theoretical considerations and computational analysis of the complexity in polyketide synthesis pathways. J. Am. Chem. Soc. 127, 9930−9938.
(25) Jankowski, M. D., Henry, C. S., Broadbelt, L. J., and Hatzimanikatis, V. (2008) Group contribution method for thermodynamic analysis of complex metabolic networks. Biophys. J. 95, 1487−1499.
(26) Lilley, D. M., Clegg, R. M., Diekmann, S., Seeman, N. C., von Kitzing, E., and Hagerman, P. (1995) Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB). A nomenclature of junctions and branchpoints in nucleic acids. Recommendations 1994. Eur. J. Biochem. 230, 1−2.
(27) Soh, K. C., and Hatzimanikatis, V. (2010) Dreams of Metabolism. Trends Biotechnol. 28, 501−508.
(28) Overbeek, R., Begley, T., Butler, R. M., Choudhuri, J. V., Chuang, H.-Y., Cohoon, M., de Crecy-Lagard, V., Diaz, N., Disz, T., Edwards, R., Fonstein, M., Frank, E. D., Gerdes, S., Glass, E. M., Goesmann, A., Hanson, A., Iwata-Reuyl, D., Jensen, R., Jamshidi, N., Krause, L., Kubal, M., Larsen, N., Linke, B., McHardy, A. C., Meyer, F., Neuweger, H., Olsen, G., Olson, R., Osterman, A., Portnoy, V., Pusch, G. D., Rodionov, D. A., Ruckert, C., Steiner, J., Stevens, R., Thiele, I., Vassieva, O., Ye, Y., Zagnitko, O., and Vonstein, V. (2005) The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 33, 5691−5702.
(29) Karp, P. D., Riley, M., Saier, M., Paulsen, I. T., Paley, S. M., and Pellegrini-Toole, A. (2000) The EcoCyc and MetaCyc databases. Nucleic Acids Res. 28, 56−59.
(30) Hastings, J., de Matos, P., Dekker, A., Ennis, M., Harsha, B., Kale, N., Muthukrishnan, V., Owen, G., Turner, S., Williams, M., and Steinbeck, C. (2013) The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res. 41, D456−D463.
(31) Wang, Y., Xiao, J., Suzek, T. O., Zhang, J., Wang, J., and Bryant, S. H. (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 37, W623−633.
(32) Seijo, M., Hadadi, N., Soh, K. C., Miskovic, L., and Hatzimanikatis, V. (2016) A Method for Evaluating Similarity of Biochemical Reactions and Its Uses for Mapping Orphan and Novel Reactions to Gene Sequences.
(33) Godden, J., Xue, L., and Bajorath, J. (2000) Combinatorial preferences affect molecular similarity/diversity calculations using binary fingerprints and Tanimoto coefficients. J. Chem. Inf. Comput. Sci. 40, 163−166.
