Mechanism Deduction from Noisy Chemical Reaction Networks

Jonny Proppe and Markus Reiher*

December 20, 2018

Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland

Abstract

We introduce KiNetX, a fully automated meta-algorithm for the kinetic analysis of complex chemical reaction networks derived from semi-accurate but efficient electronic structure calculations. It is designed to (i) accelerate the automated exploration of such networks and (ii) cope with model-inherent errors in electronic structure calculations on elementary reaction steps. We developed and implemented KiNetX to possess three features. First, KiNetX evaluates the kinetic relevance of every species in a (yet incomplete) reaction network to confine the search for new elementary reaction steps only to those species that are considered possibly relevant. Second, KiNetX identifies and eliminates all kinetically irrelevant species and elementary reactions to reduce a complex network graph to a comprehensible mechanism. Third, KiNetX estimates the sensitivity of species concentrations toward changes in individual rate constants (derived from relative free energies), which allows us to systematically select the most efficient electronic structure model for each elementary reaction given a predefined accuracy. The novelty of KiNetX consists in the rigorous propagation of correlated free-energy uncertainty through all steps of our kinetic analysis. To examine the performance of KiNetX, we developed AutoNetGen. It semirandomly generates chemistry-mimicking reaction networks by encoding chemical logic into their underlying graph structure. AutoNetGen allows us to consider a vast number of distinct chemistry-like scenarios and, hence, to discuss the importance of rigorous uncertainty propagation in a statistical context. Our results reveal that KiNetX reliably supports the deduction of product ratios, dominant reaction pathways, and possibly other network properties from semi-accurate electronic structure data.

* Corresponding author: markus.reiher@phys.chem.ethz.ch

arXiv:1803.09346v3 [physics.chem-ph] 19 Dec 2018

1 Introduction

A detailed understanding of reactive chemical systems on arbitrary time scales would support the optimization of chemical processes through directed manipulations of promoting and interfering factors during the course of a reaction. For this purpose, we need to uncover the kinetic properties of complex chemical reaction networks from a first-principles perspective, as highly accurate free-energy differences are required for modeling the temporal progress of a reaction. However, even electronic structure data may lack the necessary degree of accuracy despite being derived from the first principles of quantum mechanics. Therefore, our objective is to model the kinetics of complex chemical processes on a given time scale with rigorous uncertainty quantification. Such a procedure will ultimately allow us to automatically deduce product distributions, reaction mechanisms, and other network properties from semi-accurate electronic structure data and to assess their reliability.

There are two steps required prior to the kinetic analysis of a chemical reaction network. First, the possibly vast chemical reaction space underlying the problem at hand needs to be explored, at least partially, to generate a reaction network. This preparatory task has been considered in detail by us and others [1–17]; see also Ref. 18 for a recent review of this topic. Any exploration attempt targeting mechanistic completeness faces the major challenge of handling a possibly factorial growth of reaction paths for an increasing number of species. Obviously, this combinatorial problem will rapidly force every exploration algorithm to stop after a few layers, the number of which may be smaller than what would be needed for a truly comprehensive understanding of a complex reaction mechanism. This unsystematic search (in terms of kinetics) will lead to exploring regions of reaction space that are kinetically irrelevant under given external reaction conditions.

Second, the reaction network under consideration needs to be endowed with parameters (relative free energies or rate constants) obtained from electronic structure calculations, including estimates of their correlated uncertainty [19–28] (originating from, e.g., approximate exchange–correlation functionals or partition functions). As an alternative to ensembles of electronic structure models, advanced machine learning methods (e.g., Gaussian process regression or Bayesian neural network regression) [29] could be employed for the accurate estimation of uncertainty-equipped model parameters. The state of the art in reaction rate theory has recently been presented and discussed at the 2016 Faraday Discussion on Reaction Rate Theory [30–33], where we have already presented the principal workflow to arrive at first-principles free energies equipped with correlated uncertainties, using the example of a small model network of the formose reaction [26]. We showed that the propagation of correlated uncertainty in activation free energies to time-dependent species concentrations can yield striking variances in equilibration times. For a mean standard deviation of about 3 kcal mol^-1 in the activation free energies (maximum standard deviation of 5.5 kcal mol^-1), we found a maximum deviation in the equilibration time of almost 23 orders of magnitude.

The focus of this paper is on the kinetic analysis of complex chemical reaction networks. Hence, our discussion starts from a network for which the initial species concentrations and all rate constants, including estimates for their correlated uncertainty, are supposed to be known. We define a reaction network endowed with rate constants as the graph representation of a kinetic model based on mass action, which is expressed as a system of ordinary differential equations (ODEs) consisting of variables (species concentrations) and parameters (rate constants). Since numerical integration of ODE systems is not restricted to chemical kinetics but is relevant in basically all areas of science, there exists a plethora of computer programs for this purpose. A review of the corresponding algorithms would go beyond the scope of this paper. We refer to Chapter 9.1 in Ref. 34 for a concise overview of ODE solvers employed in the field of chemical kinetics.

Software packages for kinetic modeling are numerous and have a long tradition in chemical kinetics [34]. For many chemical problems there exist tailored codes, a personal selection of which is briefly introduced hereafter. One of the most widely applied software packages for comprehensive kinetic modeling is CHEMKIN [35–37]. As the development of CHEMKIN was and is particularly driven by gas phase chemistry, it comprises application-specific features that take care of, e.g., transport processes or changes in pressure and/or temperature during the course of a reaction. An open-source alternative to CHEMKIN in gas phase chemistry, with comparable feature scope, is Cantera [38]. Another open-source software package developed for comprehensive kinetic modeling is COPASI (COmplex PAthway SImulator) [39]. Due to its focus on biochemical systems, where particle numbers are likely to be very small, COPASI contains an implementation for stochastic kinetic simulations [40] in addition to its deterministic counterpart (numerical integration of ODEs). The master equation, which is the fundamental equation for stochastic chemical kinetics, is also crucial in cases where the time scales of reactive events and collisional relaxation compete with each other such that a nonequilibrium description of state transitions becomes necessary. MESMER (Master Equation Solver for Multi-Energy Well Reactions) has been developed for this purpose by Glowacki and colleagues [41,42] based on their results for both gas phase [43] and solution phase [44,45] chemistry.

To match the philosophy of our developments for a new kind of computational quantum chemistry (SCINE [46]), we require a comprehensive kinetic network analysis to be based on the following steps:

(i) translation of a reaction network endowed with uncertainty-equipped rate constants to an ensemble of kinetic models, each model representing a unique set of rate constants,

(ii) numerical integration of the ensemble of (possibly stiff) kinetic models,

(iii) identification of possibly relevant species based on noisy concentration data to guide the search for new species,

(iv) elimination of kinetically irrelevant species and elementary reactions based on noisy concentration data (statistical mechanism deduction),

(v) global sensitivity analysis to rank reactions according to how much concentration noise the uncertainty of the underlying rate constants induces.

Here, we introduce KiNetX, a meta-algorithm that accomplishes all of these tasks in a fully automated fashion. The two key aspects of KiNetX are that it (i) can steer the exploration of chemical reaction space in order to accelerate this exploration and (ii) enables routine statistico-kinetic analyses of reaction networks endowed with rate constants for which correlated uncertainty information is available. This way, we appreciate recent developments related to the exploration of chemical reaction networks [1–18] and the uncertainty quantification of reaction energies [19–28], both derived from the first principles of quantum mechanics. The novelty of KiNetX consists in the uncertainty quantification approach it is built on. Instead of making educated guesses, consulting rules, or fitting to experimental data [34], it has become possible to estimate the correlated uncertainty of free-energy predictions from ensembles of electronic structure models, which allows for a direct evaluation of the underlying free-energy covariance matrix. The explicit propagation of these correlated uncertainties through the entire workflow renders KiNetX a novel toolbox for statistico-kinetic modeling that significantly increases the reliability of mechanistic conclusions drawn from quantum chemical reaction data. We aimed at a very generic input format that does not enforce any context-related formalities. Even nonchemical problems (if expressed as mass action-type ODEs) can be studied. The uncertainty framework of KiNetX can, in principle, be coupled to any of the kinetic modeling codes mentioned above if a suitable parser is provided. To examine the importance of rigorous uncertainty propagation in kinetic modeling, we developed an automated network generator, AutoNetGen. AutoNetGen creates chemistry-mimicking reaction networks based on chemical logic, with which we can study an arbitrary number of chemistry-like scenarios.

This paper is organized as follows. In Section 2, we introduce the basic equations of kinetic modeling, highlight the importance of uncertainty quantification for this field, and discuss the technical details of KiNetX and AutoNetGen. In Section 3, we present the KiNetX workflow for a specific example and provide statistics on the gain in reliability obtained by incorporating uncertainty quantification into the kinetic modeling framework.

2 Theoretical and Algorithmic Details

2.1 Kinetic Modeling from a Network Perspective

We describe the structure of a chemical reaction network by a graph of $N$ vertices and $L$ bidirectional edges. The network is strictly bidirectional as we assume every chemical transformation to be reversible. Each of the two edges corresponding to a reaction pair (a reversible elementary reaction) is assigned an arbitrary but unique direction (forward, $+$, or backward, $-$).

We define the $N$-dimensional column vector of time-dependent species concentrations,

$$ \mathbf{y}(t) = \bigl( y_1(t), \ldots, y_N(t) \bigr)^\top, \tag{1} $$

which keeps track of the population density of each vertex at a given time. Here, $y_n(t)$ refers to the concentration of the $n$-th chemical species at time $t$, which is a strictly nonnegative quantity. The $2L$-dimensional column vector of rate constants,

$$ \mathbf{k} = \begin{pmatrix} \mathbf{k}^+ \\ \mathbf{k}^- \end{pmatrix} = \bigl( k_1^+, \ldots, k_L^+, k_1^-, \ldots, k_L^- \bigr)^\top, \tag{2} $$

contains strictly nonnegative scaling factors that determine the transition rate for each direction of an edge. We define the $(N \times L)$-dimensional stoichiometry matrix of forward reactions,

$$ \mathbf{S}^+ = \begin{pmatrix} S^+_{11} & \cdots & S^+_{1L} \\ \vdots & \ddots & \vdots \\ S^+_{N1} & \cdots & S^+_{NL} \end{pmatrix}, \tag{3} $$

where the element $S^+_{nl}$ ($n \in \{1, \ldots, N\}$ and $l \in \{1, \ldots, L\}$) describes the number of molecules of the $n$-th species that is consumed in the $l$-th forward reaction. Analogously to $\mathbf{S}^+$, we define the $(N \times L)$-dimensional stoichiometry matrix of backward reactions,

$$ \mathbf{S}^- = \begin{pmatrix} S^-_{11} & \cdots & S^-_{1L} \\ \vdots & \ddots & \vdots \\ S^-_{N1} & \cdots & S^-_{NL} \end{pmatrix}. \tag{4} $$

All elements in $\mathbf{S}^+$ and $\mathbf{S}^-$ are strictly nonnegative integers. The combined $(N \times L)$-dimensional stoichiometry matrix reads

$$ \mathbf{S} = \mathbf{S}^- - \mathbf{S}^+. \tag{5} $$

From the quantities introduced above and assuming mass action kinetics, we can express the $L$-dimensional column vectors of forward ($+$) reaction rates and backward ($-$) reaction rates as

$$ \mathbf{f}^{+/-}(t) = \bigl( f_1^{+/-}(t), \ldots, f_L^{+/-}(t) \bigr)^\top, \tag{6} $$

with components

$$ f_l^{+/-}(t) = k_l^{+/-} \prod_{n=1}^{N} \bigl( y_n(t) \bigr)^{S_{nl}^{+/-}}. \tag{7} $$

By definition, the product of concentrations in Eq. (7) only equals zero if a species involved in the $l$-th forward/backward reaction (indicated by a positive value of $S_{nl}^{+/-}$) is not populated (zero concentration), since for all species not involved we obtain a factor of 1 due to $S_{nl}^{+/-} = 0$. Based on the combined stoichiometry matrix $\mathbf{S}$ and the combined $L$-dimensional column vector of reaction rates,

$$ \mathbf{f}(t) = \mathbf{f}^+(t) - \mathbf{f}^-(t), \tag{8} $$

we can express the time-dependent change in the species concentrations as

$$ \mathbf{g}(t) = \bigl( g_1(t), \ldots, g_N(t) \bigr)^\top = \frac{\mathrm{d}}{\mathrm{d}t} \mathbf{y}(t) = \mathbf{S}\, \mathbf{f}(t). \tag{9} $$

The relevant procedure prior to any kinetic analysis of reaction networks is the numerical integration of $\mathbf{g}(t)$, which is therefore the central quantity in kinetic modeling.
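To make Eqs. (5)–(9) concrete, the following minimal Python sketch evaluates the right-hand side $\mathbf{g} = \mathbf{S}\,\mathbf{f}$ for a toy network. The function names, the toy network (2A ⇌ B, B ⇌ C), and all numerical values are illustrative assumptions and are not taken from KiNetX.

```python
from math import prod

def reaction_rates(k, S_consumed, y):
    """Mass-action rates, Eq. (7): f_l = k_l * prod_n y_n ** S_nl."""
    N, L = len(S_consumed), len(k)
    return [k[l] * prod(y[n] ** S_consumed[n][l] for n in range(N))
            for l in range(L)]

def rhs(k_fwd, k_bwd, S_plus, S_minus, y):
    """g = S f with S = S^- - S^+ (Eq. 5) and f = f^+ - f^- (Eq. 8)."""
    N, L = len(S_plus), len(k_fwd)
    f_plus = reaction_rates(k_fwd, S_plus, y)
    f_minus = reaction_rates(k_bwd, S_minus, y)
    f = [fp - fm for fp, fm in zip(f_plus, f_minus)]
    return [sum((S_minus[n][l] - S_plus[n][l]) * f[l] for l in range(L))
            for n in range(N)]

# Hypothetical toy network: 2A <=> B and B <=> C, species order (A, B, C).
S_plus = [[2, 0], [0, 1], [0, 0]]    # molecules consumed in forward direction
S_minus = [[0, 0], [1, 0], [0, 1]]   # molecules consumed in backward direction
g = rhs([1.0, 2.0], [0.5, 0.0], S_plus, S_minus, [1.0, 0.5, 0.0])
print(g)  # [-1.5, -0.25, 1.0]
```

Species that do not take part in a reaction contribute a factor of 1 because Python evaluates `0.0 ** 0` as `1.0`, which mirrors the remark below Eq. (7).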

2.2 Uncertainty Quantification in Kinetic Modeling

The objective of uncertainty quantification in kinetic modeling [34] is to assess the accuracy (or bias) and precision (or variance) of concentration profiles obtained from numerical integration of ODE systems. To obtain reliable results, it may be necessary to invest great effort in estimating the correlated uncertainty of model parameters (free-energy differences or rate constants). The mathematical object we are searching for is the joint probability distribution of model parameters [47]. One way to estimate parameter correlation is the backward propagation of uncertainty observed in measured concentration profiles [48], which requires knowledge of the underlying mechanism containing all kinetically relevant elementary reaction steps. This strategy is clearly appealing for verifying mechanism completeness, but it does not serve our purpose of understanding chemical reactivity from a first-principles perspective.

It has recently been shown that the neglect of parameter correlation in kinetic models can easily lead to false mechanistic conclusions despite being derived from electronic structure calculations [26,49]. As the parameters of an electronic structure model are (unknown) functions of chemical space, free energies of reaction pathways comprising similar species will not change independently of each other when the value of an electronic structure parameter is changed. Fortunately, recent statistical developments in quantum chemistry [19–28] enable the estimation of correlated uncertainty in free-energy differences. Still, reliably estimating the correlation between free energies is computationally hard, as it requires sampling from ensembles of electronic structure models [25,26] and, hence, repeated first-principles calculations for all species of the chemical system studied (including transition states and possibly other structures along the reaction pathway). Even if machine learning models were employed, a comprehensive training set based on a vast number of electronic structure calculations would be necessary [50]. We have already demonstrated the steps required for the propagation of correlated uncertainty in activation free energies for a model network of the formose reaction [26]. Here, we will generalize the procedure for the study of arbitrary chemical systems.

The estimation of uncertainty of a target quantity from the joint probability distribution of model parameters is referred to as forward uncertainty quantification, also known as uncertainty propagation. The opposite procedure is referred to as inverse uncertainty quantification and is applied to calibration problems [51,52] (the backward propagation of concentration uncertainty mentioned above belongs to the latter approach). Every statistical analysis implemented in KiNetX is based on uncertainty propagation. We consider an ensemble of $B + 1$ vectors of rate constants that is obtained by drawing $B$ samples from the joint probability distribution of activation free energies with mean $\mathbb{E}[\mathbf{A}] = \mathbf{A}_0$ and variance $\mathbb{V}[\mathbf{A}] = \boldsymbol{\Sigma}_A$, which are subsequently mapped to rate constants based on, e.g., Eyring's transition state theory [53]. Each sampled vector of rate constants is labeled $\mathbf{k}_b$ with $b \in \{1, \ldots, B\}$. An additional vector, $\mathbf{k}_0$, is directly obtained from $\mathbf{A}_0$. The ensemble of rate constant vectors, $K_B = \{\mathbf{k}_0, \mathbf{k}_1, \ldots, \mathbf{k}_B\}$, is the basis for any bottom-up uncertainty analysis in kinetic modeling and is taken into account explicitly by every subalgorithm of KiNetX.
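The construction of the ensemble $K_B$ can be sketched as follows. This is a minimal illustration under stated assumptions, not the KiNetX implementation. It assumes a multivariate normal distribution (consistent with using only the mean and covariance), the standard Eyring expression $k = (k_\mathrm{B} T / h) \exp(-A / RT)$ with a transmission coefficient of one, a temperature of 298.15 K, and free energies in J/mol. The function names and the example covariance matrix are hypothetical.

```python
import math
import random

K_BOLTZ = 1.380649e-23   # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J s
R_GAS = 8.314462618      # gas constant, J/(mol K)

def eyring(A, T=298.15):
    """Eyring rate constant for an activation free energy A in J/mol."""
    return K_BOLTZ * T / H_PLANCK * math.exp(-A / (R_GAS * T))

def cholesky(cov):
    """Lower-triangular C with C C^T = cov (cov must be positive definite)."""
    n = len(cov)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(C[i][m] * C[j][m] for m in range(j))
            if i == j:
                C[i][j] = math.sqrt(cov[i][i] - s)
            else:
                C[i][j] = (cov[i][j] - s) / C[j][j]
    return C

def sample_rate_constants(A0, cov, B, T=298.15, seed=0):
    """Return [k_0, k_1, ..., k_B]; k_0 from the mean A0, k_b from samples."""
    rng = random.Random(seed)
    C = cholesky(cov)
    n = len(A0)
    ensemble = [[eyring(a, T) for a in A0]]          # nominal vector k_0
    for _ in range(B):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent normals
        A = [A0[i] + sum(C[i][j] * z[j] for j in range(i + 1))
             for i in range(n)]                      # correlated sample
        ensemble.append([eyring(a, T) for a in A])
    return ensemble

# Hypothetical example: two barriers (J/mol) with positively correlated errors.
ensemble = sample_rate_constants([80e3, 90e3],
                                 [[4e6, 2e6], [2e6, 3e6]], B=3)
```

Because the rate constant depends exponentially on the free energy, a shift of about 5.7 kJ/mol at 298.15 K changes it by one order of magnitude in this model, which is why correlated sampling of the barriers matters for the resulting concentration noise.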

Note that the setup introduced here neglects third- and higher-order moments of the joint probability distribution of activation free energies, which may be a weak assumption for actual reaction networks derived from first principles. To avoid these limitations, one can always sample directly from the ensemble of electronic structure models [25,26], which requires repeated first-principles calculations and is therefore rather inefficient compared to sampling from $\boldsymbol{\Sigma}_A$. Another possibility, not yet explored by us, is the application of matrix algebra to construct special matrices that simplify expressions for higher-order moments of joint probability distributions (in particular, skewness and kurtosis) [54].

2.3 Overview of the KiNetX Meta-Algorithm

All reaction networks analyzed with KiNetX in this work were generated with AutoNetGen (Section 2.6). Both algorithms are written in Matlab [55]. For the numerical integration of (generally stiff) ODE systems, we interface to the ode15s module [56] of Matlab.

The KiNetX workflow consists of three core algorithms (Sections 2.4.1–2.4.3), all of which take the correlated uncertainty in the underlying model parameters (rate constants) explicitly into account:

1. Uncertainty propagation: Solve an ensemble of kinetic models derived from a unique reaction network and estimate the kinetic relevance of every species based on the maximum rate of formation (Section 2.4.1).

2. Network reduction: Identify and eliminate all kinetically irrelevant vertices and edges of the network by applying a hierarchy of flux analyses, resulting in a sparse network that is a comprehensible representation of the underlying reaction mechanism (Section 2.4.2).

3. Sensitivity analysis: Determine the effect of rate constant perturbations on time-dependent species concentrations through an extended version of Morris screening [57] (Section 2.4.3).

The minimal input requirements for KiNetX are:

• A chemical reaction network with $N$ vertices and $2L$ unidirectional edges (provided by AutoNetGen in this work).

• A set of $N$ initial species concentrations, $\mathbf{y}_0$ (provided by the user).

• An ensemble of $B + 1$ sets of $2L$ rate constants each,

$$ K_B = \{\mathbf{k}_b\}, \tag{10} $$

with $b = 0, \ldots, B$, which may be derived from an ensemble of electronic structure models based on, e.g., density functional theory [22,24,25] (provided by AutoNetGen in this work).

• A maximum reaction time, $t_{\max}$, representing a practical time scale or a time scale of interest (provided by the user).

At present, we require the input to be provided in SI units. Optional input parameters are:

• The maximum tolerable concentration error, $\varepsilon_y$, between the exact and an approximate solution.

• A flux threshold, $G_{\min}$, above which a chemical species will be considered kinetically relevant.

• The number of log-distributed time points, $U + 1$, between $t = 0$ and $t = t_{\max}$ at which species concentrations will be evaluated based on cubic spline interpolation.

• A confidence level, $\gamma$, important for assessing network properties derived from an ensemble of kinetic models.

• The number of Morris samples, $C$, considered in our global sensitivity analysis.

• Absolute and relative concentration error tolerances considered during numerical integration of ODE systems.

Every execution of KiNetX is based on one network with a fixed number of vertices and edges. Hence, for steering the exploration of reaction networks, KiNetX needs to be executed repeatedly. We designed KiNetX to suggest the next exploration step by estimating the kinetic relevance of every species based on the maximum rate of formation (Section 2.5). We will make the KiNetX program available through our webpage (scine.ethz.ch) in 2019.

Currently, KiNetX is limited to the analysis of homogeneous and isothermal reactive chemical systems in dilute solution, which constitute a major category of chemical systems found in nature and examined in chemical research. One prominent subcategory thereof that attracts much attention in fundamental and industrial research is homogeneous catalysis, a field that is rather underrepresented in the kinetic modeling literature [58].

For the study of dilute systems, collisions involving three or more reactive species can be considered negligible from a statistical point of view. For this reason, each element of the forward and backward reaction rate vectors, $\mathbf{f}^+$ and $\mathbf{f}^-$, is of the form

$$ f_l^{+/-} = k_l^{+/-}\, y_{n_{l1}^{+/-}}\, y_{n_{l2}^{+/-}}, \tag{11} $$

where the vertex index $n$ is determined by the edge index ($l$), the direction ($+$ or $-$), and the (arbitrary) position in the rate equation ($i = 1, 2$). In case of a unimolecular reaction, we simply set one of the two vertex indices to $n_{li}^{+/-} = 0$, denoting a hypothetical null-species with a defined constant value of $y_0 = 1$, independent of the unit of measurement.
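The null-species convention of Eq. (11) lets unimolecular and bimolecular steps share one code path. A minimal sketch, in which the function name and the 1-based species indexing with index 0 reserved for the null-species are illustrative assumptions:

```python
def pair_rate(k, y, n1, n2):
    """Eq. (11): f = k * y[n1] * y[n2], where index 0 is the null-species (y_0 = 1).

    y holds the concentrations of species 1..N; n1 and n2 are 1-based indices.
    """
    y_ext = [1.0] + list(y)   # prepend the null-species concentration
    return k * y_ext[n1] * y_ext[n2]

y = [0.5, 0.1]                            # concentrations of species 1 and 2
unimolecular = pair_rate(2.0, y, 1, 0)    # species 1 reacting alone
bimolecular = pair_rate(2.0, y, 1, 2)     # species 1 + species 2
dimerization = pair_rate(2.0, y, 1, 1)    # species 1 + species 1
```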

2.4 The KiNetX Workflow

2.4.1 Uncertainty Propagation

The first step of our KiNetX workflow is the numerical integration of an ensemble of $B + 1$ kinetic models, each of them representing a unique set of rate constants. Clearly, the user may also choose $B = 0$ (leading to the usual kinetic modeling setup comprising a single numerical integration), but KiNetX is actually designed to analyze an ensemble of kinetic models.

To assess the magnitude of noise in the $B$ solutions ($y$-uncertainty) resulting from the ensemble of sets of rate constants ($k$-uncertainty), we introduce two measures,

$$ \delta_B(t_u) = \frac{1}{2B} \sum_{b=1}^{B} \sum_{n=1}^{N} \bigl| y_{n,b}(t_u) - \langle y_n(t_u) \rangle \bigr|, \tag{12} $$

where

$$ \langle y_n(t_u) \rangle = \frac{1}{B} \sum_{b=1}^{B} y_{n,b}(t_u) \tag{13} $$

is the mean value of concentrations at time $t = t_u$, and

$$ \Delta_B = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_B(t_{u-1/2}) \cdot (t_u - t_{u-1}). \tag{14} $$

The factor $t_{\max}$ in the denominator of Eq. (14) equals the sum of time differences given that $t_0 = 0$,

$$ t_{\max} = t_U = t_U - t_0 = \sum_{u=1}^{U} (t_u - t_{u-1}). \tag{15} $$

Furthermore, we define

$$ t_{u-1/2} = \frac{1}{2} (t_{u-1} + t_u). \tag{16} $$

According to Eq. (12), $\delta_B(t_u)$ represents the ensemble-averaged $y$-uncertainty summed over all species at time $t = t_u$. Here, we focus on $\delta_B(t_{\max})$, as it measures the variability in the product distribution, a network property of particular interest for synthetic chemists. On the contrary, $\Delta_B$ represents a time-averaged $y$-uncertainty, i.e., the overall variability of species concentrations between $t_0 = 0$ and $t_U = t_{\max}$. The combination of both measures may be valuable for determining the underlying reaction mechanism. For instance, if $\delta_B(t_{\max})$ is close to zero but $\Delta_B$ is rather large, it is likely that we are faced with both or either of the following two scenarios: (i) The sequence of elementary steps is identical or very similar for some $k$-vectors, but the uncertainty of the activation barrier associated with the rate-determining step is significant; (ii) different $k$-vectors suggest different routes to the same metastable sink of the reaction network. Furthermore, if both $\delta_B(t_{\max})$ and $\Delta_B$ are below a user-defined concentration error, $\varepsilon_y$, we can safely neglect the ensemble of kinetic models and base all further kinetic analysis on the nominal sample represented by $\mathbf{k}_0$ only.
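The two noise measures of Eqs. (12)–(16) can be written down in a few lines. The sketch below is an illustration only. It assumes that the sampled concentrations at a given time are available as nested lists and that $\delta_B$ can be evaluated at the interval midpoints through a callable (in practice, an interpolant of the solutions). The names `delta_B`, `Delta_B`, and `delta_at` are hypothetical.

```python
def delta_B(samples):
    """Eq. (12): samples[b][n] is the concentration of species n in sample b."""
    B, N = len(samples), len(samples[0])
    mean = [sum(s[n] for s in samples) / B for n in range(N)]   # Eq. (13)
    return sum(abs(s[n] - mean[n]) for s in samples for n in range(N)) / (2 * B)

def Delta_B(times, delta_at):
    """Eq. (14): midpoint-rule time average of delta_B over [t_0, t_U].

    times is [t_0, ..., t_U] with t_0 = 0; delta_at(t) returns delta_B at time t.
    """
    t_max = times[-1] - times[0]                    # equals t_U for t_0 = 0
    total = 0.0
    for t_prev, t_next in zip(times[:-1], times[1:]):
        t_mid = 0.5 * (t_prev + t_next)             # Eq. (16)
        total += delta_at(t_mid) * (t_next - t_prev)
    return total / t_max

# Two sampled models, two species: each sample deviates by 0.1 per species.
d = delta_B([[0.2, 0.8], [0.4, 0.6]])
# A constant noise level of 0.1 must give a time average of 0.1.
D = Delta_B([0.0, 1.0, 10.0], lambda t: 0.1)
```

With the prefactor 1/2, moving a given amount of concentration from one species to another counts once rather than twice, so in this example `d` is 0.1.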

When applying Eqs. (12) and (14), it is important that the number and identity of the time points $t_u$ are the same for all solutions. However, it is very unlikely that the time points obtained from an ODE solver are identical for two different $k$-vectors. For this purpose, KiNetX calculates a log-distributed vector of reference time points, $(t_0, t_1, \ldots, t_U)^\top$ with $t_0 = 0$ and $t_U = t_{\max}$, which is a function of the user-defined parameters $t_{\max}$ and $U$. To obtain species concentrations at the same time points for other $k$-vectors, KiNetX interpolates the corresponding solutions through cubic spline smoothing. Note that spline smoothing is only reasonable if data noise is negligible, a property which one can assess easily in the case of one control variable (here, time). To evaluate the reliability of the cubic spline interpolations, we compared them against results from Gaussian process regression [59]. Under suitable assumptions, Gaussian process regression yields an optimal trade-off between fitting and smoothness (cubic splines only ensure the latter property). Hence, it can be employed to predict concentrations from previously unconsidered values in the time domain (here, the domain between $t_0 = 0$ and $t_U = t_{\max}$). Furthermore, as Gaussian process regression is a strictly Bayesian method, one may obtain reliable uncertainty estimates for each prediction. However, as this regression method scales cubically with the number of data points, it is not suitable for repetitive applications (usually, hundreds of regressions would be necessary). In all cases studied, we found that both the mean deviation between the two interpolation methods (cubic spline smoothing versus Gaussian process regression) and the mean predictive variance obtained from Gaussian process regression are negligibly small compared to the mean variance of the concentrations themselves (by a factor of $< 10^{-6}$). Hence, data noise is indeed negligible and cubic splines work well for interpolating concentration profiles.
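One possible construction of such a reference grid is sketched below. The text does not specify how the log-distributed points are combined with $t_0 = 0$, so this sketch makes an assumption: $t_0 = 0$ is prepended to $U$ geometrically spaced points between a hypothetical first nonzero time `t_first` and $t_{\max}$. The interpolation step itself is not shown.

```python
def log_time_grid(t_max, U, t_first):
    """Return [t_0, ..., t_U] with t_0 = 0 and t_1..t_U geometrically spaced.

    t_first (the first nonzero time point) is an assumed extra parameter.
    """
    if U < 2 or not 0.0 < t_first < t_max:
        raise ValueError("need U >= 2 and 0 < t_first < t_max")
    ratio = (t_max / t_first) ** (1.0 / (U - 1))
    grid = [0.0] + [t_first * ratio ** i for i in range(U)]
    grid[-1] = t_max          # guard against round-off in the last point
    return grid

grid = log_time_grid(1000.0, 4, 1.0)   # approximately [0, 1, 10, 100, 1000]
```

A grid of this kind resolves the fast initial phase of a reaction finely while still reaching long times with few points, which suits concentration profiles that span several orders of magnitude in time.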

2.4.2 Network Reduction

We introduce a hierarchy of two reduction algorithms with increasing sophistication (in terms of both rigor and required computing resources), which identify and eliminate kinetically irrelevant species and elementary reactions. Initially, we perform a detailed flux analysis, followed by a local barrier analysis if the former analysis suggests a reduced model resulting in a concentration error that exceeds the user-defined threshold $\varepsilon_y$. Both algorithms reflect our interpretation of established kinetic modeling concepts [34].

Detailed Flux Analysis (DFA). To keep track of the net concentration that has passed an edge between $t = 0$ and $t = t_{\max}$, we integrate the absolute values of $f_l(t)$ over that time interval,

$$ F_l = \int_{t=0}^{t=t_{\max}} |f_l(t)|\, \mathrm{d}t, \tag{17} $$

which we define as the edge flux corresponding to the $l$-th reaction pair. The vector of edge fluxes, $\mathbf{F}$, is the basis for determining the vertex flux corresponding to the $n$-th species,

$$ G_n = \bigl( \mathbf{s}_n^+ + \mathbf{s}_n^- \bigr) \mathbf{F}, \tag{18} $$

where $\mathbf{s}_n^+$ and $\mathbf{s}_n^-$ represent the $n$-th row of $\mathbf{S}^+$ and $\mathbf{S}^-$, respectively. Our DFA implementation approximates the edge flux according to

$$ F_l \approx F_l^{\mathrm{DFA}} = \sum_{u=1}^{U} |f_l(t_{u-1/2})| \cdot (t_u - t_{u-1}). \tag{19} $$

Analogous to the exact expression, the approximate vertex flux reads

$$ G_n^{\mathrm{DFA}} = \bigl( \mathbf{s}_n^+ + \mathbf{s}_n^- \bigr) \mathbf{F}^{\mathrm{DFA}}. \tag{20} $$

If $G_n^{\mathrm{DFA}} < G_{\min}$, where $G_{\min}$ is a user-defined threshold, the $n$-th vertex will be removed from the kinetic model, except for the case where the $n$-th vertex represents a reactant. All edges that were connected to at least one of the removed vertices will be removed, too. After this procedure, the number of vertices and edges should have decreased significantly.

Note that in the case of a closed system, $G_n^{\mathrm{DFA}}$ approaches a finite value with increasing time, which renders $G_{\min}$ an intuitive threshold that resembles a concentration error. The assumption behind our DFA is based on this intuitive interpretation: If a reaction channel with a flux smaller than $G_{\min}$ is removed, the redistribution of this minute amount of flux (i.e., concentration) should yield a concentration error that is comparable to $G_{\min}$. Therefore, we have a means at hand that potentially allows us to control the concentration error introduced by a specific value of $G_{\min}$. Clearly, choosing a smaller value of $G_{\min}$ will increase the reliability of the DFA-based solution but also decreases the possible degree of network reduction.
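The DFA steps of Eqs. (19) and (20), followed by the thresholding rule, can be sketched as below. This is a simplified illustration, not the KiNetX code. It assumes that the net rates $f_l$ can be evaluated at the interval midpoints through a callable, and the function names and the toy network (A ⇌ B, A ⇌ C) are hypothetical.

```python
def edge_fluxes(times, net_rate_at):
    """Eq. (19): midpoint-rule estimate of F_l; net_rate_at(t) -> [f_1, ..., f_L]."""
    L = len(net_rate_at(times[0]))
    F = [0.0] * L
    for t_prev, t_next in zip(times[:-1], times[1:]):
        f_mid = net_rate_at(0.5 * (t_prev + t_next))
        for l in range(L):
            F[l] += abs(f_mid[l]) * (t_next - t_prev)
    return F

def vertex_fluxes(S_plus, S_minus, F):
    """Eq. (20): G_n = (s_n^+ + s_n^-) . F for every vertex n."""
    return [sum((sp + sm) * Fl for sp, sm, Fl in zip(row_p, row_m, F))
            for row_p, row_m in zip(S_plus, S_minus)]

def reduce_network(S_plus, S_minus, F, G_min, reactants):
    """Drop vertices with G_n < G_min (reactants are always kept) and their edges."""
    G = vertex_fluxes(S_plus, S_minus, F)
    keep_v = [n for n in range(len(G)) if G[n] >= G_min or n in reactants]
    dropped = set(range(len(G))) - set(keep_v)
    keep_e = [l for l in range(len(F))
              if not any(S_plus[n][l] + S_minus[n][l] > 0 for n in dropped)]
    return keep_v, keep_e

# Hypothetical network A <=> B (edge 0) and A <=> C (edge 1); A is the reactant.
S_plus = [[1, 1], [0, 0], [0, 0]]
S_minus = [[0, 0], [1, 0], [0, 1]]
kept = reduce_network(S_plus, S_minus, [0.9, 1e-8], G_min=1e-6, reactants={0})
print(kept)  # ([0, 1], [0])
```

In this example only a negligible amount of concentration ever passes through the edge to C, so vertex C and its edge are removed while the A ⇌ B channel is retained.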

To determine the accuracy loss caused by a DFA, we introduce the measures

$$ \delta_{pq}(t_u) = \frac{1}{2} \sum_{n=1}^{N} \bigl| y_{n,p}(t_u) - y_{n,q}(t_u) \bigr| \tag{21} $$

and

$$ \Delta_{pq} = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_{pq}(t_{u-1/2}) \cdot (t_u - t_{u-1}), \tag{22} $$

which resemble the control quantities δB(tu) and ∆B. Here, however, we compare two specific solutions, yp(t) and yq(t), with each other, one resulting from the original network and the other resulting from the corresponding sparse network obtained through our DFA. Note that the number of vertices differs for the original and sparse networks. Hence, to render Eq. (21) practical, we define N to be the number of vertices contained in the original network and set all elements in ysparse(t) to zero that refer to vertices not contained in the sparse network. If all of the B + 1 kinetic models yield both δpq(tu) values (again, we focus on tu = tU = tmax) and ∆pq values that are below a user-defined concentration error, εy, KiNetX considers the DFA-based network reduction reliable. However, if there is at least one kinetic model that yields δpq(tmax) > εy or ∆pq > εy, it depends on the user-defined confidence level, γ, whether the DFA-based network reduction is considered reliable or not. We require the confidence level to fulfill the condition 0 ≤ γ ≤ 1. If the (1 + γ)/2 quantile of δpq(tmax) and/or ∆pq exceeds εy, the DFA-based network reduction is considered unreliable. Here, the lowest and highest possible quantiles represent the median and the maximum of both measures over all B + 1 samples, respectively.

Local Barrier Analysis (LBA). There are certain cases in which a DFA-based network reduction is prone to yield unreliable results. One example is autocatalysis. Imagine the following model reaction network:

2A ⇌ B (very slow)
B ⇌ C (fast)
C + A ⇌ D (fast)
D + A ⇌ E (fast)
E ⇌ 2C (fast)

Even though the first reaction step is very slow, it is required to initiate all other reaction steps. Without B, neither C nor D nor E can be formed. Even though B is obviously a very important species for the reaction mechanism, it will only be relevant at the very beginning of the reaction (i.e., until a minute amount of it has been formed). Hence, the vertex flux for B, GB, will be very small, possibly smaller than the user-defined threshold Gmin, leading to a false elimination of this species and the second of the five edges.
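The error measures of Eqs. (21) and (22) and the quantile-based reliability criterion can be sketched as follows. Two choices here are assumptions rather than statements about KiNetX: the midpoint value in Eq. (22) is approximated by the average of the two neighboring grid values, and the quantile is taken from the sorted sample via the inverse empirical distribution function. All function names are hypothetical.

```python
import math


def delta_pq(y_p, y_q):
    """Eq. (21): half the summed absolute concentration difference at one time point."""
    return 0.5 * sum(abs(a - b) for a, b in zip(y_p, y_q))


def big_delta_pq(times, traj_p, traj_q):
    """Eq. (22), with the midpoint value approximated by a trapezoidal average.

    times[0] is taken as t_0 = 0 and times[-1] as t_max (assumption).
    """
    deltas = [delta_pq(yp, yq) for yp, yq in zip(traj_p, traj_q)]
    total = 0.0
    for u in range(1, len(times)):
        midpoint = 0.5 * (deltas[u - 1] + deltas[u])
        total += midpoint * (times[u] - times[u - 1])
    return total / times[-1]


def quantile(values, q):
    """Inverse empirical CDF: q = 1 gives the maximum, q = 0.5 a (lower) median."""
    ordered = sorted(values)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]


def reduction_is_reliable(errors, eps_y, gamma):
    """Reliable if the (1 + gamma)/2 quantile of the errors does not exceed eps_y."""
    return quantile(errors, (1.0 + gamma) / 2.0) <= eps_y
```

For a sparse network, the missing vertices would first be padded with zero concentrations so that both solutions have N entries, as described above.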

In this case, δpq(tmax) and ∆pq will clearly exceed εy, and the LBA will start automatically. The idea behind our LBA algorithm is fairly simple: (i) Set the rate constants of the first reaction pair to zero, which is equivalent to an infinitely high activation barrier or the removal of the corresponding edge. (ii) Repeat the numerical integration and compare the solution to the original one based on δpq(tmax) and ∆pq. (iii) Set the rate constants of the first reaction pair back to their original values. (iv) Repeat the entire procedure for the second to the L-th reaction pair and for each of the B + 1 kinetic models.

After this procedure, every reaction pair is associated with B + 1 values for δpq(tmax) and ∆pq. If the (1 + γ)/2 quantile of δpq(tmax) and/or ∆pq is smaller than εy, the corresponding reaction pair will be considered kinetically irrelevant and removed from the network. All unconnected vertices will be removed, too. Subsequently, the resulting B + 1 reduced kinetic models are integrated, and their solutions are compared to the original ones, again based on δpq(tmax) and ∆pq. If the (1 + γ)/2 quantile of δpq(tmax) and/or ∆pq is smaller than εy, the LBA-based network reduction will be considered reliable. It is still possible that εy is exceeded, as the kinetic models we consider are generally nonlinear: If the lack of one edge does not alter the solution and the same result is obtained for another edge, it does not imply that the simultaneous lack of both edges leads to the same conclusion. If εy is still exceeded after the LBA-based network reduction, KiNetX will recover the original network and continue its analysis without any reduction.
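A minimal sketch of the LBA loop for a single kinetic model is given below. It is not the KiNetX implementation: the toy network A ⇌ B ⇌ C, its rate constants, the explicit Euler integrator, and all function names are hypothetical, and only the δ measure at tmax is evaluated.

```python
def integrate_chain(k_pairs, y0=(1.0, 0.0, 0.0), t_max=10.0, steps=10000):
    """Explicit Euler integration of the toy network A <=> B <=> C."""
    (k1f, k1b), (k2f, k2b) = k_pairs
    a, b, c = y0
    dt = t_max / steps
    for _ in range(steps):
        r1 = k1f * a - k1b * b   # net rate of A <=> B
        r2 = k2f * b - k2b * c   # net rate of B <=> C
        a, b, c = a - r1 * dt, b + (r1 - r2) * dt, c + r2 * dt
    return (a, b, c)


def delta(y_p, y_q):
    """Half the summed absolute concentration difference, cf. Eq. (21)."""
    return 0.5 * sum(abs(p - q) for p, q in zip(y_p, y_q))


def lba_scores(k_pairs, integrate=integrate_chain):
    """Zero one reaction pair at a time and score the deviation at t_max."""
    reference = integrate(k_pairs)
    scores = []
    for l in range(len(k_pairs)):
        perturbed = list(k_pairs)          # copy, so the original values are restored
        perturbed[l] = (0.0, 0.0)          # equivalent to an infinitely high barrier
        scores.append(delta(reference, integrate(perturbed)))
    return scores


# Hypothetical rate constants: pair 0 is essential, pair 1 is negligibly slow.
scores = lba_scores([(1.0, 0.5), (1e-6, 1e-6)])
eps_y = 0.01
irrelevant = [l for l, s in enumerate(scores) if s < eps_y]
```

In an ensemble setting, this loop would be repeated for each of the B + 1 k-vectors, and the quantile of the resulting scores would be compared against εy instead of a single value.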

2.4.3 Sensitivity Analysis

In the following, we assume that the concentration profiles of the sparse reaction network reveal pronounced uncertainties such that it is difficult to correctly assign product ratios or to suggest a specific reaction mechanism. To estimate to which rate constants the concentration profiles are most sensitive, a sensitivity analysis is required [34]. For this purpose, the rate constants are perturbed to study how such perturbations affect the time-dependent species concentrations.

In a local sensitivity analysis, the model parameters are perturbed one by one from their nominal values (here, the elements of $\mathbf{k}_0$). While a local sensitivity analysis is straightforward to conduct and computationally feasible (usually, L kinetic simulations are required), it has a significant disadvantage in that it does not take into account the correlation between the model parameters (cf. Section 2.2). Consequently, one may overlook important correlation effects on the uncertainty of concentration profiles. Note that our LBA algorithm resembles an extreme variant of a local sensitivity analysis (cf. Section 2.4.3), where each rate constant is perturbed to its minimal value.

In a global sensitivity analysis, the correlation between the model parameters is taken into account, but the process requires significantly more computational resources than a local sensitivity analysis. Furthermore, the design of a global sensitivity analysis is not as unambiguous as for a local sensitivity analysis, which explains the existence of several approaches that may require CL (Morris screening, where C is an integer usually much smaller than B), B (polynomial chaos), or 2BL (Sobol's method) numerical integrations [34].

Morris screening [57] is among the simplest global sensitivity analyses, as it is not designed to induce a mapping between the model parameters and the model solution. Instead, its purpose is to categorize the model parameters as either important or unimportant, depending on how strongly changes in them affect the model solutions. We are particularly interested in this categorization since we aim for a descriptor that informs us about the quality of rate constants obtained from an efficient (basic) quantum chemical model. If we find that the uncertainty of some species concentrations is too large to derive sensible conclusions about specific network properties, the results of a global sensitivity analysis will support us in identifying the most critical rate constants that require a reevaluation based on a more sophisticated (benchmark) quantum chemical model.

The original version of Morris screening does not take into account the joint probability distribution of the model parameters. Here, we introduce an extended variant of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample $\mathbf{k}_0$ and replaces the elements $k^{+}_{01}$ and $k^{-}_{01}$ with the corresponding elements of $\mathbf{k}_1$, $k^{+}_{11}$ and $k^{-}_{11}$. For this new vector of rate constants, $\mathbf{k}_{01}$, a numerical integration is carried out. Subsequently, the elements $k^{+}_{02}$ and $k^{-}_{02}$ of $\mathbf{k}_{01}$ are replaced with the elements $k^{+}_{12}$ and $k^{-}_{12}$ of $\mathbf{k}_1$, and a numerical integration based on the new vector, $\mathbf{k}_{02}$, is carried out. This procedure is repeated L times until we arrive at $\mathbf{k}_{0L} = \mathbf{k}_1$, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is now repeated by an element-wise change from $\mathbf{k}_1$ to $\mathbf{k}_2$ and, generally, by an element-wise change from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ until b = C − 1 is reached. In the end, we will have generated solutions for another C(L − 1) kinetic models in addition to the B + 1 solutions obtained for $\mathbf{K}_B$. We are interested in the C(L − 1) new solutions and the first C + 1 of the B + 1 solutions obtained previously, amounting to CL + 1 solutions relevant for our sensitivity analysis.
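The element-wise walk from one sampled k-vector to the next can be sketched as follows. The representation of a k-vector as a tuple of (forward, backward) rate-constant pairs and the function names are assumptions made for illustration only.

```python
def morris_path(k_start, k_end):
    """Return k_{b1}, ..., k_{bL}: replace one reaction pair at a time."""
    current = list(k_start)
    path = []
    for l in range(len(current)):
        current[l] = k_end[l]          # swap in the l-th pair of the next sample
        path.append(tuple(current))
    return path


def morris_vectors(samples, C):
    """All CL + 1 k-vectors visited when walking through the first C + 1 samples."""
    vectors = [tuple(samples[0])]
    for b in range(C):
        vectors.extend(morris_path(samples[b], samples[b + 1]))
    return vectors


# Hypothetical ensemble with L = 3 reaction pairs and C = 2 Morris samples.
k0 = ((1.0, 0.1), (2.0, 0.2), (3.0, 0.3))
k1 = ((1.5, 0.1), (2.5, 0.2), (3.5, 0.3))
k2 = ((0.5, 0.1), (1.5, 0.2), (2.5, 0.3))
visited = morris_vectors([k0, k1, k2], C=2)
new_models = [k for k in visited if k not in (k0, k1, k2)]
```

Each walk ends exactly at the next sampled vector, so only L − 1 of the L vectors per walk require a new numerical integration, which gives the C(L − 1) additional models mentioned above.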

For each of these solutions, KiNetX determines the δpq(tmax) and ∆pq metrics, where p and q represent two adjacent k-vectors as constructed by our extended version of Morris screening. CL values are obtained for each metric, C thereof for every reaction pair. We define the sensitivity coefficients $z^{\delta}_{l}$ and $z^{\Delta}_{l}$ as

\[
z^{\delta}_{l} = \frac{1}{C} \sum_{b=0}^{C-1} \delta_{bl,\,b(l-1)}(t_{\max}) \tag{23}
\]

and

\[
z^{\Delta}_{l} = \frac{1}{C} \sum_{b=0}^{C-1} \Delta_{bl,\,b(l-1)} \tag{24}
\]

respectively. The subscript bl refers to sample $\mathbf{k}_{bl}$, and we define $\mathbf{k}_{b0} = \mathbf{k}_{b}$. Note that in Eqs. (23) and (24), we do not divide the δpq(tmax) and ∆pq metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term conditional relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.
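Given the step-wise error values, Eqs. (23) and (24) reduce to a column average. The sketch below assumes the errors are already available as a nested list; the layout `step_errors[b][l - 1]` and the function names are hypothetical.

```python
def sensitivity_coefficients(step_errors):
    """Average the step-wise errors over the C Morris samples, cf. Eqs. (23) and (24).

    step_errors[b][l - 1] holds the error measure between the solutions of
    k_{bl} and k_{b(l-1)}, i.e. the effect of changing reaction pair l in walk b.
    """
    C = len(step_errors)
    L = len(step_errors[0])
    return [sum(step_errors[b][l] for b in range(C)) / C for l in range(L)]


def rank_reaction_pairs(z):
    """Indices of reaction pairs, most sensitive first."""
    return sorted(range(len(z)), key=lambda l: z[l], reverse=True)


# Hypothetical errors for C = 2 walks over L = 3 reaction pairs.
z = sensitivity_coefficients([[0.25, 0.0, 1.0],
                              [0.75, 0.0, 1.0]])
ranking = rank_reaction_pairs(z)
```

The same function serves both metrics: passing δ values at tmax yields the first coefficient, passing ∆ values yields the second.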

Finally, we can set up a ranking of sensitivity coefficients. The larger $z^{\delta}_{l}$ and $z^{\Delta}_{l}$, the larger the effect of changes in $k^{+/-}_{l}$ on the concentration profiles. With this ranking at hand, one can systematically improve on both the accuracy and precision of the concentration profiles to reliably suggest product distributions (based on $z^{\delta}_{l}$) or specific reaction mechanisms (based on $z^{\Delta}_{l}$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be infeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well known in the reaction engineering community. It was introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas-phase reactions (in particular, combustion) [62], which has recently been extended to understand the kinetics of solution-phase chemistry [63]. Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution-phase chemistry. As the effect of a given free-energy error on the thermal rate constant decreases with increasing temperature in classical rate theories, k-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take k-uncertainty into account.

We start from the reactants (species with nonzero initial concentrations) and attempt to find all direct products, i.e., intermediates that are formed by a single elementary reaction of the reactants. The reactants are considered active, whereas the direct products are considered inactive. Only active species are considered in the exploration procedure, i.e., at this point, all possible reaction steps have been discovered (according to the exploration algorithm employed). To estimate which of the inactive species is potentially the most relevant for the mechanism, we focus on the formation rate of each inactive species (indicated by an asterisk),

\[
\vec{g}_{n^{*}}(t_u) = s^{+}_{n^{*}} f^{-}(t_u) + s^{-}_{n^{*}} f^{+}(t_u) \tag{25}
\]

Note that $s^{+}_{n^{*}}$ is multiplied with $f^{-}(t)$ since, in a backward (−) reaction, the left-hand-side species (+) are formed. An equivalent argument holds for the multiplication of $s^{-}_{n^{*}}$ with $f^{+}(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points tu,

\[
\vec{g}^{\,\max}_{n^{*}} = \max\left\{ \vec{g}_{n^{*}}(t_0), \vec{g}_{n^{*}}(t_1), \ldots, \vec{g}_{n^{*}}(t_U) \right\} \tag{26}
\]

The species with the highest value of $\vec{g}^{\,\max}_{n^{*}}$ will be promoted to active. Note that if an ensemble of k-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

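\[
\eta\!\left(\vec{g}^{\,\max}_{n^{*}}\right) = \sqrt{ \left\langle \vec{g}^{\,\max}_{n^{*}} \right\rangle^{2} + \frac{1}{B-1} \sum_{b=1}^{B} \left( \vec{g}^{\,\max}_{n^{*},b} - \left\langle \vec{g}^{\,\max}_{n^{*}} \right\rangle \right)^{2} } \tag{27}
\]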

that takes into account both the ensemble average of maximum formation rates,

\[
\left\langle \vec{g}^{\,\max}_{n^{*}} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \vec{g}^{\,\max}_{n^{*},b} \tag{28}
\]

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
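The ranking of Eqs. (26) to (28) can be sketched with the standard library alone. The input layout (one time series of formation rates per sample and species) and the function names are assumptions for illustration; the formation rates themselves, Eq. (25), are taken as given.

```python
import math
import statistics


def max_formation_rate(rates_over_time):
    """Eq. (26): largest formation rate over all time points."""
    return max(rates_over_time)


def eta(max_rates):
    """Eq. (27): square root of squared ensemble mean plus sample variance."""
    mean = statistics.mean(max_rates)          # Eq. (28)
    var = statistics.variance(max_rates)       # uses the 1/(B - 1) normalization
    return math.sqrt(mean ** 2 + var)


def next_active_species(ensemble):
    """ensemble: dict species -> list (one entry per sample) of rate time series."""
    scores = {n: eta([max_formation_rate(series) for series in samples])
              for n, samples in ensemble.items()}
    return max(scores, key=scores.get), scores


# Hypothetical inactive species with equal mean but different spread.
ensemble = {
    "X": [[0.0, 1.0, 0.5], [0.0, 1.0, 0.5], [0.0, 1.0, 0.5]],
    "Y": [[0.1, 0.5], [0.2, 1.0], [0.3, 1.5]],
}
chosen, scores = next_active_species(ensemble)
```

Both species have an average maximum formation rate of 1.0, but the spread of Y raises its score, so Y is promoted first, in line with the preference for the high-variance case.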

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\vec{g}^{\,\max}_{n^{*}}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold. In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between t = 0 and t = tmax) are being generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

• The number of samples, B, to be drawn from $\Sigma_{A}$.

• A thermostat temperature, T, for the calculation of rate constants.

• The average free-energy increase/decrease, $\mu(A^{-} - A^{+})$, for a new intermediate state (−) formed by reaction of an intermediate state already present in the network (+).

• The average difference between the free energies of a left-hand-side intermediate state (+) and its corresponding right-hand-side intermediate state (−), $\sigma(A^{-} - A^{+})$.

• A minimum activation free energy, $\min(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

• A maximum activation free energy, $\max(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

• The average free-energy uncertainty, $\langle \sigma_{A} \rangle$ (required for generating an ensemble of kinetic models).

• The maximum number of edges, Lmax, to be generated.

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of N = 103 vertices and L = 118 edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L⁻¹ and 0.50 mol L⁻¹, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of B + 1 = 25 k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol⁻¹. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol⁻¹. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature T = 298.15 K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L⁻¹ during the course of the reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at t = tmax. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L⁻¹ and 0.9 mol L⁻¹, the observable value of which will, in turn, affect the concentrations of the potential side products 29, 32, 34, 42, and 68.

Figure 1. Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of εy = 0.01 mol L⁻¹, we find that a DFA-based network reduction with Gmin = 1.0 × 10⁻³ mol L⁻¹ is reliable with respect to both δpq(tmax) and ∆pq. Here, we chose the minimum confidence level of γ = 0 for the sake of clarity. The smaller γ, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose Gmin to be one order of magnitude smaller than εy. The reason is that Gmin is assessed on a species-by-species basis, whereas εy is compared against the quantities δpq(tmax) and ∆pq, which measure the concentration error summed over all species. One may resolve this situation by dividing Gmin by N, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2. Sparse variant of the exemplary reaction network CRN-X, consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For C = 5 Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for δpq(tmax) and/or ∆pq exceeding εy with respect to the quantile specified by γ. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L⁻¹. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L⁻¹, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3. Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval tu − tu−1, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42 (the other dominant side product) via 30 and 14. In summary, the dominant reaction pathways read:

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4. Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than εy, as measured by δpq(tmax) and ∆pq. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges, which corresponds to an effective network reduction of about 40%. This value is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol⁻¹ with a standard deviation of 0.4 kcal mol⁻¹. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $\mathbf{k}_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several instead of a single solution naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary, on average, in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to 1.0 × 10⁻¹⁷ mol L⁻¹ s⁻¹ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, tmax), we obtain a flux of 3.7 × 10⁻¹³ mol L⁻¹, which is very small compared to εy = 0.01 mol L⁻¹. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.

Table 1. Mean values (μ) and standard deviations (σ) of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                     μ(ζ_CRN-0)   μ(ζ_CRN-B)   σ(ζ_CRN-0)   σ(ζ_CRN-B)
exploration steps              12           14           7            7
critical reactions             5            10           4            7
Lexpl                          54           60           26           27
Nexpl                          29           31           11           11
Lred                           30           37           18           21
Nred                           17           20           8            8
δB,expl(tmax) / mol L⁻¹        n/a          0.14         n/a          0.13
∆B,expl / mol L⁻¹              n/a          0.15         n/a          0.13
δB,refined(tmax) / mol L⁻¹     n/a          <0.01        n/a          0.01
∆B,refined / mol L⁻¹           n/a          <0.01        n/a          0.01
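The flux quoted above can be checked with a short calculation. The sketch assumes that tmax equals the half life of a unimolecular reaction with a barrier of 100 kJ mol⁻¹ at 298.15 K (as stated in the Conclusions), that the Eyring equation is used with a transmission coefficient of one, and that the minimum formation rate is held constant over the whole reaction time; the variable names are hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J K^-1
H = 6.62607015e-34      # Planck constant, J s
R = 8.314462618         # gas constant, J mol^-1 K^-1


def eyring_rate_constant(barrier_j_per_mol, temperature=298.15):
    """Unimolecular Eyring rate constant in s^-1 (transmission coefficient of 1 assumed)."""
    return (K_B * temperature / H) * math.exp(-barrier_j_per_mol / (R * temperature))


k_100 = eyring_rate_constant(100e3)       # barrier of 100 kJ/mol
t_max = math.log(2.0) / k_100             # half life of a first-order process, in s
g_last = 1.0e-17                          # minimum formation rate, mol L^-1 s^-1
flux_bound = g_last * t_max               # upper bound on the accumulated flux, mol L^-1
```

With these assumptions, tmax comes out at roughly 3.7 × 10⁴ s, so the accumulated flux is about 3.7 × 10⁻¹³ mol L⁻¹, many orders of magnitude below εy.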

DFA-based network reduction with a flux threshold of Gmin = 1.0 × 10⁻⁵ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument of the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of max{δB(tmax), ∆B} from 0.15 mol L⁻¹ to < 0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all B + 1 = 6 samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an “exact” solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of max{δpq(tmax), ∆pq} comparing the CRN-0 solutions against the “exact” solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the “exact” solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm that constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase achieved by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered 5 additional kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, generalizing the conclusions drawn from these artificial cases may not be justified. At the same time, it is difficult to draw general conclusions about certain network properties, such as the percentage of critical (highly noise-inducing) reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps, as they may depend strongly on the underlying graph structure, the distribution of activation free energies or rate constants, as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time $t_\mathrm{max}$, which we set to the half-life associated with a unimolecular rate constant corresponding to a barrier height of 100 kJ mol⁻¹. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks.

1. For the construction of the $(x + 1)$-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the $x$-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_{\exp}$. The user can specify two different values for $\mu_{\exp}$, one for unimolecular reactions, $\mu_{\exp,\mathrm{uni}}$, and one for bimolecular reactions, $\mu_{\exp,\mathrm{bi}}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_\mathrm{new})$, ranging from zero to one. If $p(V_\mathrm{new}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_\mathrm{new}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), $2L$ activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is obtained by sampling from a normal distribution with mean $\mu(A^- - A^+)$ and standard deviation $\sigma(A^- - A^+)$ and adding the sampled value to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained by sampling from a uniform distribution with bounds $\min(A^\ddagger - A^{+/-})$ and $\max(A^\ddagger - A^{+/-})$ and adding the sampled value to the free energy of the higher-energy side of that edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
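The bookkeeping of steps 1 and 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `count_intermediate_states` and `sample_channel_count`, the layer sizes, and the mean of the exponential distribution are assumptions chosen purely for illustration.

```python
import math
import random


def count_intermediate_states(n_total, n_new):
    """Count potentially reactive intermediate states for the next layer.

    n_total: number of vertices N registered so far (hypothetical input name)
    n_new:   number of vertices N_x in the most recent layer
    """
    n_old = n_total - n_new
    unimolecular = n_new                        # type A -> P
    homobimolecular = n_new                     # type 2A -> P
    hetero_new_new = n_new * (n_new - 1) // 2   # A + B -> P, both species new
    hetero_new_old = n_new * n_old              # A + B -> P, one new, one old
    return unimolecular, homobimolecular, hetero_new_new + hetero_new_old


def sample_channel_count(mean, rng):
    """Floor of a draw from an exponential distribution with the given mean."""
    return math.floor(rng.expovariate(1.0 / mean))


rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
uni, homo, hetero = count_intermediate_states(n_total=10, n_new=4)
# One channel count per unimolecular state; the mean 5.0 is illustrative.
channels = [sample_channel_count(5.0, rng) for _ in range(uni)]
print(uni, homo, hetero, channels)
```

For N = 10 and N_x = 4, the sketch yields 4 unimolecular, 4 homobimolecular, and 6 + 24 = 30 heterobimolecular states, following the counting rule of step 1.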

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$ k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right) \tag{29} $$

where $h$, $k_\mathrm{B}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
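As a numerical illustration of Eq. (29), the following Python sketch evaluates the rate constant for a barrier of 100 kJ mol⁻¹ at 298.15 K and the corresponding first-order half-life. The function name `eyring_rate_constant` is an assumption made here, and the constants are standard SI values rather than values taken from the paper.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K
H = 6.62607015e-34   # Planck constant in J s
R = 8.314462618      # molar gas constant in J/(mol K)


def eyring_rate_constant(barrier_kj_per_mol, temperature=298.15):
    """Eyring rate constant (in 1/s for a unimolecular step)."""
    prefactor = K_B * temperature / H
    exponent = -barrier_kj_per_mol * 1000.0 / (R * temperature)
    return prefactor * math.exp(exponent)


k = eyring_rate_constant(100.0)
half_life = math.log(2.0) / k   # half-life of a first-order process in seconds
print(f"k = {k:.3e} 1/s, half-life = {half_life:.3e} s")
```

The half-life comes out at roughly 3.7 × 10⁴ s, which is in line with the value of $t_\mathrm{max}$ listed in Table 2.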

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$ that fulfills the condition that its largest eigenvalue equals $\sigma_{A,\mathrm{max}}^2$. Defining $\sigma_{A,\mathrm{max}}^2$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\mathrm{max}}$.

1. Generate a random $(N \times N)$-dimensional matrix $P$ with elements that are uniformly sampled from $[-0.5, +0.5]$. $N$ is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an $(N \times N)$-dimensional matrix $Q = PP^\top$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$ \Sigma_A = Q \, \frac{\langle\sigma_A\rangle^2}{\max(E_{ii})} $$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
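The three steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the largest eigenvalue is estimated by power iteration (a substitute chosen here to avoid external libraries; any symmetric eigenvalue solver would do), and the values of `n` and `sigma` are arbitrary.

```python
import math
import random


def largest_eigenvalue(matrix, iterations=1000):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(matrix)
    v = [1.0] * n
    estimate = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        estimate = math.sqrt(sum(x * x for x in w))
        v = [x / estimate for x in w]
    return estimate


def random_covariance(n, sigma, rng):
    """Random covariance matrix whose largest eigenvalue equals sigma**2."""
    # Step 1: random matrix P with entries drawn uniformly from [-0.5, 0.5]
    p = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    # Step 2: Q = P P^T is symmetric and positive semidefinite
    q = [[sum(p[i][k] * p[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]
    # Step 3: rescale Q so that its largest eigenvalue becomes sigma**2
    scale = sigma ** 2 / largest_eigenvalue(q)
    return [[scale * q[i][j] for j in range(n)] for i in range(n)]


rng = random.Random(42)   # fixed seed, only for reproducibility of the sketch
sigma = 2.0               # illustrative value, not taken from the paper
cov = random_covariance(4, sigma, rng)
```

Because every diagonal element of a positive-semidefinite matrix is bounded by its largest eigenvalue, all variances on the diagonal of `cov` stay at or below `sigma**2`.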

Note that in a practical setup, we expect the user to provide $B + 1$ $k$-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that AutoNetGen and KiNetX require as input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                   CRN-X        CRN-0        CRN-B

AutoNetGen
B + 1                             25           1            6
T (K)                             298.15       298.15       298.15
μ(A^− − A^+) (kJ mol⁻¹)           −25          −25          −25
σ(A^− − A^+) (kJ mol⁻¹)           50           50           50
min(A^‡ − A^(+/−)) (kJ mol⁻¹)     0            0            0
max(A^‡ − A^(+/−)) (kJ mol⁻¹)     100          100          100
⟨σ_A⟩ (kJ mol⁻¹)                  2.09         2.09         2.09
L_max                             125          100          100
μ_exp(N_uni)                      5            5            5
μ_exp(N_bi)                       2            2            2
p(V_new)                          0.5          0.1          0.1

KiNetX
t_max (s)                         36954        36954        36954
ε_y (mol L⁻¹)                     0.01         0.01         0.01
G_min (mol L⁻¹)                   1.0 × 10⁻³   1.0 × 10⁻⁵   1.0 × 10⁻⁵
U                                 1000         1000         1000
γ                                 0            1            1
C                                 5            1            5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.


[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.


[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.


[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package", Technical Report SAND-80-8003, Sandia National Labs, Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics", Technical Report SAND-89-8009, Sandia National Labs, Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics", Technical Report SAND-96-8216, Sandia National Labs, Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.


[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI – a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.


[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.


[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.


  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

1 Introduction

A detailed understanding of reactive chemical systems on arbitrary time scales would support the optimization of chemical processes through directed manipulations of promoting and interfering factors during the course of a reaction. For this purpose, we need to uncover the kinetic properties of complex chemical reaction networks from a first-principles perspective, as highly accurate free-energy differences are required for modeling the temporal progress of a reaction. However, even electronic structure data may lack the necessary degree of accuracy despite being derived from the first principles of quantum mechanics. Therefore, our objective is to model the kinetics of complex chemical processes on a given time scale with rigorous uncertainty quantification. Such a procedure will ultimately allow us to automatically deduce product distributions, reaction mechanisms, and other network properties from semi-accurate electronic structure data and to assess their reliability.

There are two steps required prior to the kinetic analysis of a chemical reaction network. First, the possibly vast chemical reaction space underlying the problem at hand needs to be explored, at least partially, to generate a reaction network. This preparatory task has been considered in detail by us and others [1–17]; see also Ref. [18] for a recent review of this topic. Any exploration attempt targeting mechanistic completeness is faced with the major challenge of handling a possibly factorial growth of reaction paths for an increasing number of species. Obviously, this combinatorial problem will rapidly force every exploration algorithm to stop after a few layers, the number of which may be smaller than what would be needed for a truly comprehensive understanding of a complex reaction mechanism. This unsystematic search (in terms of kinetics) will lead to exploring regions of reaction space that are kinetically irrelevant under given external reaction conditions.

Second, the reaction network under consideration needs to be endowed with parameters (relative free energies or rate constants) obtained from electronic structure calculations, including estimates of their correlated uncertainty [19–28] (originating from, e.g., approximate exchange–correlation functionals or partition functions). As an alternative to ensembles of electronic structure models, advanced machine learning methods (e.g., Gaussian process regression or Bayesian neural network regression) [29] could be employed for the accurate estimation of uncertainty-equipped model parameters. The state of the art in reaction rate theory has recently been presented and discussed at the 2016 Faraday Discussion on Reaction Rate Theory [30–33], where we presented the principal workflow to arrive at first-principles free energies equipped with correlated uncertainties, using the example of a small model network of the formose reaction [26]. We showed that the propagation of correlated uncertainty in activation free energies to time-dependent species concentrations can yield striking variances in equilibration times. For a mean standard deviation of about 3 kcal mol⁻¹ in the activation free energies (maximum standard deviation of 5.5 kcal mol⁻¹), we found a maximum deviation in the equilibration time of almost 23 orders of magnitude.

The focus of this paper is on the kinetic analysis of complex chemical reaction networks. Hence, our discussion starts from a network for which the initial species concentrations and all rate constants, including estimates for their correlated uncertainty, are supposed to be known. We define a reaction network endowed with rate constants as the graph representation of a kinetic model based on mass action, which is expressed as a system of ordinary differential equations (ODEs) consisting of variables (species concentrations) and parameters (rate constants). Since numerical integration of ODE systems is not restricted to chemical kinetics but is relevant in basically all areas of science, there exists a plethora of computer programs for this purpose. A review of the corresponding algorithms would go beyond the scope of this paper. We refer to Chapter 9.1 in Ref. [34] for a concise overview of ODE solvers employed in the field of chemical kinetics.

Software packages for kinetic modeling are numerous and have a long tradition in chemical kinetics [34]. For many chemical problems, there exist tailored codes, a personal selection of which is briefly introduced hereafter. One of the most widely applied software packages for comprehensive kinetic modeling is CHEMKIN [35–37]. As the development of CHEMKIN was and is particularly driven by gas-phase chemistry, it comprises application-specific features that take care of, e.g., transport processes or changes in pressure and/or temperature during the course of a reaction. An open-source alternative to CHEMKIN in gas-phase chemistry with a comparable feature scope is Cantera [38]. Another open-source software package developed for comprehensive kinetic modeling is COPASI (COmplex PAthway SImulator) [39]. Due to its focus on biochemical systems, where particle numbers are likely to be very small, COPASI contains an implementation for stochastic kinetic simulations [40] in addition to its deterministic counterpart (numerical integration of ODEs). The master equation, which is the fundamental equation for stochastic chemical kinetics, is also crucial in cases where the time scales of reactive events and collisional relaxation compete with each other such that a nonequilibrium description of state transitions becomes necessary. MESMER (Master Equation Solver for Multi-Energy Well Reactions) has been developed for this purpose by Glowacki and colleagues [41, 42] based on their results for both gas-phase [43] and solution-phase [44, 45] chemistry.

To match the philosophy of our developments for a new kind of computational quantum chemistry (SCINE [46]), we require a comprehensive kinetic network analysis to be based on the following steps:


(i) translation of a reaction network endowed with uncertainty-equipped rate constants to an ensemble of kinetic models, each model representing a unique set of rate constants,

(ii) numerical integration of the ensemble of (possibly stiff) kinetic models,

(iii) identification of possibly relevant species based on noisy concentration data to guide the search for new species,

(iv) elimination of kinetically irrelevant species and elementary reactions based on noisy concentration data (statistical mechanism deduction),

(v) global sensitivity analysis to rank reactions according to how much concentration noise the uncertainty of the underlying rate constants induces.

Here, we introduce KiNetX, a meta-algorithm that accomplishes all of these tasks in a fully automated fashion. The two key aspects of KiNetX are that it (i) can steer the exploration of chemical reaction space in order to accelerate this exploration and (ii) enables routine statistico-kinetic analyses of reaction networks endowed with rate constants for which correlated uncertainty information is available. This way, we build on recent developments related to the exploration of chemical reaction networks [1–18] and the uncertainty quantification of reaction energies [19–28], both derived from the first principles of quantum mechanics. The novelty of KiNetX consists in the uncertainty quantification approach it is built on. Instead of making educated guesses, consulting rules, or fitting to experimental data [34], it has become possible to estimate the correlated uncertainty of free-energy predictions from ensembles of electronic structure models, which allows for a direct evaluation of the underlying free-energy covariance matrix. The explicit propagation of these correlated uncertainties through the entire workflow renders KiNetX a novel toolbox for statistico-kinetic modeling that significantly increases the reliability of mechanistic conclusions drawn from quantum chemical reaction data. We aimed at a very generic input format that does not enforce any context-related formalities. Even nonchemical problems (if expressed as mass action-type ODEs) can be studied. The uncertainty framework of KiNetX can, in principle, be coupled to any of the kinetic modeling codes mentioned above if a suitable parser is provided. To examine the importance of rigorous uncertainty propagation in kinetic modeling, we developed an automated network generator, AutoNetGen. AutoNetGen creates chemistry-mimicking reaction networks based on chemical logic, with which we can study an arbitrary number of chemistry-like scenarios.

This paper is organized as follows. In Section 2, we introduce the basic equations of kinetic modeling, highlight the importance of uncertainty quantification for this field, and discuss the technical details of KiNetX and AutoNetGen. In Section 3, we present the KiNetX workflow for a specific example and provide statistics on the gain in reliability from incorporating uncertainty quantification into the kinetic modeling framework.

2 Theoretical and Algorithmic Details

2.1 Kinetic Modeling from a Network Perspective

We describe the structure of a chemical reaction network by a graph of N vertices and L bidirectional edges. The network is strictly bidirectional, as we assume every chemical transformation to be reversible. Each of the two edges corresponding to a reaction pair (a reversible elementary reaction) is assigned an arbitrary but unique direction (forward, +, or backward, −).

We define the $N$-dimensional column vector of time-dependent species concentrations,

$$ \mathbf{y}(t) = \big( y_1(t), \cdots, y_N(t) \big)^\top \tag{1} $$

which keeps track of the population density of each vertex at a given time. Here, $y_n(t)$ refers to the concentration of the $n$-th chemical species at time $t$, which is a strictly nonnegative quantity. The $2L$-dimensional column vector of rate constants,

$$ \mathbf{k} = \begin{pmatrix} \mathbf{k}^+ \\ \mathbf{k}^- \end{pmatrix} = \big( k_1^+, \cdots, k_L^+, k_1^-, \cdots, k_L^- \big)^\top \tag{2} $$

contains strictly nonnegative scaling factors that determine the transition rate for each direction of an edge. We define the $(N \times L)$-dimensional stoichiometry matrix of forward reactions,

$$ \mathbf{S}^+ = \begin{pmatrix} S_{11}^+ & \cdots & S_{1L}^+ \\ \vdots & \ddots & \vdots \\ S_{N1}^+ & \cdots & S_{NL}^+ \end{pmatrix} \tag{3} $$

where the element $S_{nl}^+$ ($n \in \{1, \cdots, N\}$ and $l \in \{1, \cdots, L\}$) describes the number of molecules of the $n$-th species that is consumed in the $l$-th forward reaction. Analogously to $\mathbf{S}^+$, we define the $(N \times L)$-dimensional stoichiometry matrix of backward reactions,

$$ \mathbf{S}^- = \begin{pmatrix} S_{11}^- & \cdots & S_{1L}^- \\ \vdots & \ddots & \vdots \\ S_{N1}^- & \cdots & S_{NL}^- \end{pmatrix} \tag{4} $$

All elements in $\mathbf{S}^+$ and $\mathbf{S}^-$ are strictly nonnegative integers. The combined $(N \times L)$-dimensional stoichiometry matrix reads

$$ \mathbf{S} = \mathbf{S}^- - \mathbf{S}^+ \tag{5} $$


From the quantities introduced above, and assuming mass-action kinetics, we can express the $L$-dimensional column vectors of forward (+) reaction rates and backward (−) reaction rates as

$$ \mathbf{f}^{+/-}(t) = \big( f_1^{+/-}(t), \cdots, f_L^{+/-}(t) \big)^\top \tag{6} $$

with components

$$ f_l^{+/-}(t) = k_l^{+/-} \prod_{n=1}^{N} \big( y_n(t) \big)^{S_{nl}^{+/-}} \tag{7} $$

By definition, the product of concentrations in Eq. (7) only equals zero if a species involved in the $l$-th forward/backward reaction (indicated by a positive value of $S_{nl}^{+/-}$) is not populated (zero concentration), since for all species not involved, we obtain a factor of 1 due to $S_{nl}^{+/-} = 0$. Based on the combined stoichiometry matrix $\mathbf{S}$ and the combined $L$-dimensional column vector of reaction rates,

$$ \mathbf{f}(t) = \mathbf{f}^+(t) - \mathbf{f}^-(t) \tag{8} $$

we can express the time-dependent change in the species concentrations as

$$ \mathbf{g}(t) = \big( g_1(t), \cdots, g_N(t) \big)^\top = \frac{\mathrm{d}}{\mathrm{d}t} \mathbf{y}(t) = \mathbf{S}\,\mathbf{f}(t) \tag{9} $$

The relevant procedure prior to any kinetic analysis of reaction networks is the numerical integration of $\mathbf{g}(t)$, which is therefore the central quantity in kinetic modeling.
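To make Eqs. (5) to (9) concrete, the following Python sketch evaluates $\mathbf{g}$ for a toy network with a single reversible reaction, A + B ⇌ C. The function names, rate constants, and concentrations are assumptions made for illustration; the sketch only evaluates the right-hand side of the ODE system and does not integrate it.

```python
def reaction_rates(k, stoich, y):
    """Mass-action rates f_l = k_l * prod_n y_n ** S_nl, cf. Eq. (7)."""
    n_species, n_reactions = len(stoich), len(stoich[0])
    rates = []
    for l in range(n_reactions):
        rate = k[l]
        for n in range(n_species):
            # Species that are not involved have S_nl = 0 and contribute a factor of 1.
            rate *= y[n] ** stoich[n][l]
        rates.append(rate)
    return rates


def concentration_change(k_plus, k_minus, s_plus, s_minus, y):
    """g = S f with S = S^- - S^+ and f = f^+ - f^-, cf. Eqs. (5), (8), (9)."""
    f_plus = reaction_rates(k_plus, s_plus, y)
    f_minus = reaction_rates(k_minus, s_minus, y)
    f = [fp - fm for fp, fm in zip(f_plus, f_minus)]
    return [
        sum((s_minus[n][l] - s_plus[n][l]) * f[l] for l in range(len(f)))
        for n in range(len(y))
    ]


# Toy network A + B <=> C: rows are the species A, B, C; one column per reaction.
s_plus = [[1], [1], [0]]    # A and B are consumed in the forward reaction
s_minus = [[0], [0], [1]]   # C is consumed in the backward reaction
y = [1.0, 0.5, 0.0]         # illustrative concentrations
g = concentration_change([2.0], [1.0], s_plus, s_minus, y)
print(g)
```

With these inputs, the forward rate is 2.0 × 1.0 × 0.5 = 1.0 and the backward rate vanishes because C is not populated, so A and B are depleted at the same rate at which C is formed.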

2.2 Uncertainty Quantification in Kinetic Modeling

The objective of uncertainty quantification in kinetic modeling [34] is to assess the accuracy (or bias) and precision (or variance) of concentration profiles obtained from numerical integration of ODE systems. To obtain reliable results, it may be necessary to invest great effort in estimating the correlated uncertainty of model parameters (free-energy differences or rate constants). The mathematical object we are searching for is the joint probability distribution of model parameters [47]. One way to estimate parameter correlation is the backward propagation of uncertainty observed in measured concentration profiles [48], which requires knowledge of the underlying mechanism containing all kinetically relevant elementary reaction steps. This strategy is clearly appealing for verifying mechanism completeness, but it does not serve our purpose of understanding chemical reactivity from a first-principles perspective.

It has recently been shown that the neglect of parameter correlation in kinetic models can easily lead to false mechanistic conclusions, even if the models are derived from electronic structure calculations [26, 49]. As the parameters of an electronic structure model are (unknown) functions of chemical space, free energies of reaction pathways comprising similar species will not change independently of each other when the value of an electronic structure parameter is changed. Fortunately, recent statistical developments in quantum chemistry [19–28] enable the estimation of correlated uncertainty in free-energy differences. Still, reliably estimating the correlation between free energies is computationally hard, as it requires sampling from ensembles of electronic structure models [25, 26] and, hence, repeated first-principles calculations for all species of the chemical system studied (including transition states and possibly other structures along the reaction pathway). Even if machine learning models were employed, a comprehensive training set based on a vast number of electronic structure calculations would be necessary [50]. We have already demonstrated the steps required for the propagation of correlated uncertainty in activation free energies for a model network of the formose reaction [26]. Here, we generalize the procedure to the study of arbitrary chemical systems.

The estimation of uncertainty of a target quantity from the joint probability distribution of model parameters is referred to as forward uncertainty quantification, also known as uncertainty propagation. The opposite procedure is referred to as inverse uncertainty quantification and is applied to calibration problems [51, 52] (the backward propagation of concentration uncertainty mentioned above belongs to the latter approach). Every statistical analysis implemented in KiNetX is based on uncertainty propagation. We consider an ensemble of $B+1$ vectors of rate constants that is obtained by drawing $B$ samples from the joint probability distribution of activation free energies with mean $\mathrm{E}[\mathbf{A}] = \mathbf{A}_0$ and variance $\mathrm{V}[\mathbf{A}] = \boldsymbol{\Sigma}_A$, which are subsequently mapped to rate constants based on, e.g., Eyring's transition state theory [53]. Each sampled vector of rate constants is labeled $\mathbf{k}_b$ with $b \in \{1, \dots, B\}$. An additional vector, $\mathbf{k}_0$, is directly obtained from $\mathbf{A}_0$. The ensemble of rate constant vectors, $\mathbf{K}_B = \{\mathbf{k}_0, \mathbf{k}_1, \dots, \mathbf{k}_B\}$, is the basis for any bottom-up uncertainty analysis in kinetic modeling and is taken into account explicitly by every subalgorithm of KiNetX.

Note that the setup introduced here neglects third- and higher-order moments of the joint probability distribution of activation free energies, which may be a poor approximation for actual reaction networks derived from first principles. To avoid this limitation, one can always sample directly from the ensemble of electronic structure models [25, 26], which requires repeated first-principles calculations and is therefore rather inefficient compared to sampling from $\boldsymbol{\Sigma}_A$. Another possibility, not yet explored by us, is the application of matrix algebra to construct special matrices that simplify expressions for higher-order moments of joint probability distributions (in particular, skewness and kurtosis) [54].
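As an illustration of how an ensemble of rate-constant vectors could be built from a mean vector and a covariance matrix of activation free energies, consider the Python sketch below. It rests on several assumptions that are not stated in the text: the joint distribution is taken to be multivariate normal (consistent with using only the first two moments), the Eyring equation is used with a transmission coefficient of one, all quantities are in SI units, and the function names and numerical values are hypothetical.

```python
import math
import random

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R = 8.314462618       # molar gas constant, J/(mol K)


def eyring(a, temperature):
    """Eyring rate constant for an activation free energy a (J/mol)."""
    return K_B * temperature / H * math.exp(-a / (R * temperature))


def cholesky(cov):
    """Lower-triangular C with C C^T = cov (cov must be positive definite)."""
    n = len(cov)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(c[i][m] * c[j][m] for m in range(j))
            c[i][j] = math.sqrt(s) if i == j else s / c[j][j]
    return c


def rate_constant_ensemble(a0, cov, n_samples, temperature, rng):
    """Return [k_0, k_1, ..., k_B]; k_0 stems from a0, the rest from Gaussian draws."""
    c = cholesky(cov)
    ensemble = [[eyring(a, temperature) for a in a0]]
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in a0]
        # Correlated sample: a_b = a0 + C z
        a_b = [a0[i] + sum(c[i][j] * z[j] for j in range(i + 1))
               for i in range(len(a0))]
        ensemble.append([eyring(a, temperature) for a in a_b])
    return ensemble


# Hypothetical example: two strongly correlated barriers (J/mol).
a0 = [60e3, 80e3]
cov = [[4.0e6, 3.6e6], [3.6e6, 4.0e6]]   # std. dev. 2 kJ/mol, correlation 0.9
K = rate_constant_ensemble(a0, cov, 24, 298.15, random.Random(0))
print(len(K))   # B + 1 vectors of rate constants
```

Because the two barriers are strongly correlated in this example, the sampled rate constants tend to rise and fall together, which is the effect that independent sampling of each barrier would miss.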

2.3 Overview of the KiNetX Meta-Algorithm

All reaction networks analyzed with KiNetX in this work were generated with AutoNetGen (Section 2.6). Both algorithms are written in Matlab [55]. For the numerical integration of (generally stiff) ODE systems, we interface to the ode15s module [56] of Matlab.

The KiNetX workflow consists of three core algorithms (Sections 2.4.1–2.4.3), all of which take the correlated uncertainty in the underlying model parameters (rate constants) explicitly into account:

1. Uncertainty propagation: Solve an ensemble of kinetic models derived from a unique reaction network and estimate the kinetic relevance of every species based on the maximum rate of formation (Section 2.4.1).

2. Network reduction: Identify and eliminate all kinetically irrelevant vertices and edges of the network by applying a hierarchy of flux analyses, resulting in a sparse network that is a comprehensible representation of the underlying reaction mechanism (Section 2.4.2).

3. Sensitivity analysis: Determine the effect of rate-constant perturbations on time-dependent species concentrations through an extended version of Morris screening [57] (Section 2.4.3).

The minimal input requirements for KiNetX are:

• A chemical reaction network with $N$ vertices and $2L$ unidirectional edges (provided by AutoNetGen in this work).

• A set of $N$ initial species concentrations, $\mathbf{y}_0$ (provided by the user).

• An ensemble of $B+1$ sets of $2L$ rate constants each,

$$\mathbf{K}_B = \{\mathbf{k}_b\} \tag{10}$$

with $b = 0, \dots, B$, which may be derived from an ensemble of electronic structure models based on, e.g., density functional theory [22, 24, 25] (provided by AutoNetGen in this work).

• A maximum reaction time, $t_\mathrm{max}$, representing a practical time scale or a time scale of interest (provided by the user).

At present, we require the input to be provided in SI units. Optional input parameters are:

• The maximum tolerable concentration error, $\varepsilon_y$, between the exact and an approximate solution.


• A flux threshold, $G_\mathrm{min}$, above which a chemical species will be considered kinetically relevant.

• The number of log-distributed time points, $U+1$, between $t = 0$ and $t = t_\mathrm{max}$ at which species concentrations will be evaluated based on cubic spline interpolation.

• A confidence level, $\gamma$, important for assessing network properties derived from an ensemble of kinetic models.

• The number of Morris samples, $C$, considered in our global sensitivity analysis.

• Absolute and relative concentration error tolerances considered during numerical integration of ODE systems.

Every execution of KiNetX is based on one network with a fixed number of vertices and edges. Hence, for steering the exploration of reaction networks, KiNetX needs to be executed repeatedly. We designed KiNetX to suggest the next exploration step by estimating the kinetic relevance of every species based on the maximum rate of formation (Section 2.5). We will make the KiNetX program available through our webpage (scine.ethz.ch) in 2019.

Currently, KiNetX is limited to the analysis of homogeneous and isothermal reactive chemical systems in dilute solution, which constitute a major category of chemical systems found in nature and examined in chemical research. One prominent subcategory thereof that attracts much attention in fundamental and industrial research is homogeneous catalysis, a field that is rather underrepresented in the kinetic modeling literature [58]. For the study of dilute systems, collisions involving three or more reactive species can be considered negligible from a statistical point of view. For this reason, each element of the forward and backward reaction rate vectors, $\mathbf{f}^+$ and $\mathbf{f}^-$, is of the form

$$f_l^\pm = k_l^\pm \, y_{n_{l1}^\pm} \, y_{n_{l2}^\pm}, \tag{11}$$

where the vertex index $n$ is determined by the edge index ($l$), the direction ($+$ or $-$), and the (arbitrary) position in the rate equation ($i = 1, 2$). In case of a unimolecular reaction, we simply set one of the two vertex indices to $n_{li}^\pm = 0$, denoting a hypothetical null species with a defined constant value of $y_0 = 1$, independent of the unit of measurement.

2.4 The KiNetX Workflow

2.4.1 Uncertainty Propagation

The first step of our KiNetX workflow is the numerical integration of an ensemble of $B+1$ kinetic models, each of them representing a unique set of rate constants. The user may also choose $B = 0$ (leading to the usual kinetic modeling setup comprising a single numerical integration), but KiNetX is designed to analyze an ensemble of kinetic models.

To assess the magnitude of noise in the $B$ solutions (y-uncertainty) resulting from the ensemble of sets of rate constants (k-uncertainty), we introduce two measures,

$$\delta_B(t_u) = \frac{1}{2B} \sum_{b=1}^{B} \sum_{n=1}^{N} \left| y_{nb}(t_u) - \langle y_n(t_u) \rangle \right|, \tag{12}$$

where

$$\langle y_n(t_u) \rangle = \frac{1}{B} \sum_{b=1}^{B} y_{nb}(t_u) \tag{13}$$

is the mean concentration of the $n$-th species at time $t = t_u$, and

$$\Delta_B = \frac{1}{t_\mathrm{max}} \sum_{u=1}^{U} \delta_B(t_{u-1/2}) \cdot (t_u - t_{u-1}). \tag{14}$$

The factor $t_\mathrm{max}$ in the denominator of Eq. (14) equals the sum of time differences, given that $t_0 = 0$,

$$t_\mathrm{max} = t_U = t_U - t_0 = \sum_{u=1}^{U} (t_u - t_{u-1}). \tag{15}$$

Furthermore, we define

$$t_{u-1/2} = \frac{1}{2} (t_{u-1} + t_u). \tag{16}$$

According to Eq. (12), $\delta_B(t_u)$ represents the ensemble-averaged y-uncertainty summed over all species at time $t = t_u$. Here, we focus on $\delta_B(t_\mathrm{max})$, as it measures the variability in the product distribution, a network property of particular interest for synthetic chemists. By contrast, $\Delta_B$ represents a time-averaged y-uncertainty, i.e., the overall variability of species concentrations between $t_0 = 0$ and $t_U = t_\mathrm{max}$. The combination of both measures may be valuable for determining the underlying reaction mechanism. For instance, if $\delta_B(t_\mathrm{max})$ is close to zero but $\Delta_B$ is rather large, we are likely faced with one or both of the following scenarios: (i) the sequence of elementary steps is identical or very similar for some k-vectors, but the uncertainty of the activation barrier associated with the rate-determining step is significant; (ii) different k-vectors suggest different routes to the same metastable sink of the reaction network. Furthermore, if both $\delta_B(t_\mathrm{max})$ and $\Delta_B$ are below a user-defined concentration error, $\varepsilon_y$, we can safely neglect the ensemble of kinetic models and base all further kinetic analysis on the nominal sample represented by $\mathbf{k}_0$ only.
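A compact way to read Eqs. (12) to (16) is as a mean absolute deviation followed by a midpoint-rule time average. The Python sketch below is only an illustration under assumed conventions: concentrations are stored as nested lists indexed by sample and species, the concentrations at the interval midpoints are assumed to be available already (e.g., from interpolation), and the function names are hypothetical.

```python
def delta_B(y):
    """Eq. (12): y[b][n] is the concentration of species n in sample b = 1..B."""
    B, N = len(y), len(y[0])
    mean = [sum(y[b][n] for b in range(B)) / B for n in range(N)]   # Eq. (13)
    return sum(abs(y[b][n] - mean[n]) for b in range(B) for n in range(N)) / (2 * B)


def Delta_B(y_mid, t):
    """Eq. (14): y_mid[u-1] holds the B x N concentrations at the midpoint t_{u-1/2}."""
    total = sum(delta_B(y_mid[u - 1]) * (t[u] - t[u - 1]) for u in range(1, len(t)))
    return total / (t[-1] - t[0])   # t[0] = 0 and t[-1] = t_max, Eq. (15)


# Hypothetical data: B = 2 samples, N = 2 species, U = 2 time intervals.
t = [0.0, 1.0, 3.0]
y_mid = [
    [[1.0, 0.0], [0.0, 1.0]],   # the samples disagree completely at t = 0.5
    [[0.5, 0.5], [0.5, 0.5]],   # the samples agree at t = 2.0
]
print(delta_B(y_mid[0]), Delta_B(y_mid, t))
```

In the first interval the two samples disagree completely, so the summed deviation is maximal there and vanishes in the second interval; the time average weights both contributions by the interval lengths.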

When applying Eqs. (12) and (14), it is important that the number and identity of the time points $t_u$ are the same for all solutions. However, it is very unlikely that the time points obtained from an ODE solver are identical for two different k-vectors. For this reason, KiNetX calculates a log-distributed vector of reference time points, $(t_0 \; t_1 \,\cdots\, t_U)^\top$ with $t_0 = 0$ and $t_U = t_\mathrm{max}$, which is a function of the user-defined parameters $t_\mathrm{max}$ and $U$. To obtain species concentrations at the same time points for all k-vectors, KiNetX interpolates the corresponding solutions through cubic spline smoothing. Note that spline smoothing is only reasonable if data noise is negligible, a property which one can assess easily in the case of one control variable (here, time). To evaluate the reliability of the cubic spline interpolations, we compared them against results from Gaussian process regression [59]. Under suitable assumptions, Gaussian process regression yields an optimal trade-off between fitting and smoothness (cubic splines only ensure the latter property). Hence, it can be employed to predict concentrations from previously unconsidered values in the time domain (here, the domain between $t_0 = 0$ and $t_U = t_\mathrm{max}$). Furthermore, as Gaussian process regression is a strictly Bayesian method, one may obtain reliable uncertainty estimates for each prediction. However, as this regression method scales cubically with the number of data points, it is not suitable for repetitive applications (usually, hundreds of regressions would be necessary). In all cases studied, we found that both the mean deviation between the two interpolation methods (cubic spline smoothing versus Gaussian process regression) and the mean predictive variance obtained from Gaussian process regression are negligibly small compared to the mean variance of the concentrations themselves (by a factor of $<10^{-6}$). Hence, data noise is indeed negligible, and cubic splines work well for interpolating concentration profiles.

2.4.2 Network Reduction

We introduce a hierarchy of two reduction algorithms with increasing sophistication (in terms of both rigor and required computing resources), which identify and eliminate kinetically irrelevant species and elementary reactions. Initially, we perform a detailed flux analysis, followed by a local barrier analysis if the former analysis suggests a reduced model resulting in a concentration error that exceeds the user-defined threshold $\varepsilon_y$. Both algorithms reflect our interpretation of established kinetic modeling concepts [34].

Detailed Flux Analysis (DFA). To keep track of the net concentration that has passed an edge between $t = 0$ and $t = t_\mathrm{max}$, we integrate the absolute values of $f_l(t)$ over that time interval,

$$F_l = \int_{t=0}^{t=t_\mathrm{max}} |f_l(t)| \, \mathrm{d}t, \tag{17}$$

which we define as the edge flux corresponding to the $l$-th reaction pair. The vector of edge fluxes, $\mathbf{F}$, is the basis for determining the vertex flux corresponding to the $n$-th species,

$$G_n = \left( \mathbf{s}_n^+ + \mathbf{s}_n^- \right) \mathbf{F}, \tag{18}$$

where $\mathbf{s}_n^+$ and $\mathbf{s}_n^-$ represent the $n$-th row of $\mathbf{S}^+$ and $\mathbf{S}^-$, respectively. Our DFA implementation approximates the edge flux according to

$$F_l \approx F_l^\mathrm{DFA} = \sum_{u=1}^{U} |f_l(t_{u-1/2})| \cdot (t_u - t_{u-1}). \tag{19}$$

Analogous to the exact expression, the approximate vertex flux reads

$$G_n^\mathrm{DFA} = \left( \mathbf{s}_n^+ + \mathbf{s}_n^- \right) \mathbf{F}^\mathrm{DFA}. \tag{20}$$

If $G_n^\mathrm{DFA} < G_\mathrm{min}$, where $G_\mathrm{min}$ is a user-defined threshold, the $n$-th vertex will be removed from the kinetic model, except for the case where the $n$-th vertex represents a reactant. All edges that were connected to at least one of the removed vertices will be removed, too. After this procedure, the number of vertices and edges should have decreased significantly.

Note that in the case of a closed system, $G_n^\mathrm{DFA}$ approaches a finite value with increasing time, which renders $G_\mathrm{min}$ an intuitive threshold that resembles a concentration error. The assumption behind our DFA is based on this intuitive interpretation: If a reaction channel with a flux smaller than $G_\mathrm{min}$ is removed, the redistribution of this minute amount of flux (i.e., concentration) should yield a concentration error that is comparable to $G_\mathrm{min}$. Therefore, we have a means at hand that potentially allows us to control the concentration error introduced by a specific value of $G_\mathrm{min}$. Clearly, choosing a smaller value of $G_\mathrm{min}$ will increase the reliability of the DFA-based solution but also decrease the possible degree of network reduction.
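The DFA steps described above (midpoint-rule edge fluxes, vertex fluxes, and thresholding) can be sketched in a few lines of Python. This is a hedged illustration rather than the KiNetX implementation: the function names, the list-based data layout, and the three-species example network are assumptions, and the net rates at the interval midpoints are assumed to be given.

```python
def edge_fluxes(f_mid, t):
    """Eq. (19): f_mid[u-1][l] is the net rate f_l at the midpoint t_{u-1/2}."""
    n_edges = len(f_mid[0])
    return [
        sum(abs(f_mid[u - 1][l]) * (t[u] - t[u - 1]) for u in range(1, len(t)))
        for l in range(n_edges)
    ]


def vertex_fluxes(S_plus, S_minus, F):
    """Eq. (20): G_n = (s_n^+ + s_n^-) . F"""
    return [
        sum((sp + sm) * F_l for sp, sm, F_l in zip(row_p, row_m, F))
        for row_p, row_m in zip(S_plus, S_minus)
    ]


def dfa_reduce(S_plus, S_minus, F, G_min, reactants):
    """Flag vertices and edges that survive the flux threshold G_min."""
    G = vertex_fluxes(S_plus, S_minus, F)
    # Reactants are always kept, regardless of their vertex flux.
    keep_vertex = [G[n] >= G_min or n in reactants for n in range(len(G))]
    # An edge survives only if every vertex it touches survives.
    keep_edge = [
        all(keep_vertex[n] for n in range(len(G))
            if S_plus[n][l] + S_minus[n][l] > 0)
        for l in range(len(F))
    ]
    return keep_vertex, keep_edge


# Hypothetical network: A <-> B (edge 0) and A <-> C (edge 1); A is the reactant.
S_plus = [[1, 1], [0, 0], [0, 0]]
S_minus = [[0, 0], [1, 0], [0, 1]]
keep_vertex, keep_edge = dfa_reduce(S_plus, S_minus, [0.5, 1e-5], 1e-3, {0})
print(keep_vertex, keep_edge)
```

In this example, species C and the edge leading to it carry a flux far below the threshold and are therefore flagged for removal, whereas the reactant A would be kept even if its flux were small.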

To determine the accuracy loss caused by a DFA, we introduce the measures

$$\delta_{pq}(t_u) = \frac{1}{2} \sum_{n=1}^{N} \left| y_{np}(t_u) - y_{nq}(t_u) \right| \tag{21}$$

and

$$\Delta_{pq} = \frac{1}{t_\mathrm{max}} \sum_{u=1}^{U} \delta_{pq}(t_{u-1/2}) \cdot (t_u - t_{u-1}), \tag{22}$$

which resemble the control quantities $\delta_B(t_u)$ and $\Delta_B$. Here, however, we compare two specific solutions, $\mathbf{y}_p(t)$ and $\mathbf{y}_q(t)$, with each other, one resulting from the original network and the other resulting from the corresponding sparse network obtained through our DFA. Note that the number of vertices differs for the original and sparse networks. Hence, to render Eq. (21) practical, we define $N$ to be the number of vertices contained in the original network and set all elements in $\mathbf{y}_\mathrm{sparse}(t)$ to zero that refer to vertices not contained in the sparse network. If all of the $B+1$ kinetic models yield both $\delta_{pq}(t_u)$-values (again, we focus on $t_u = t_U = t_\mathrm{max}$) and $\Delta_{pq}$-values that are below a user-defined concentration error, $\varepsilon_y$, KiNetX considers the DFA-based network reduction reliable. However, if there is at least one kinetic model that yields $\delta_{pq}(t_\mathrm{max}) > \varepsilon_y$ or $\Delta_{pq} > \varepsilon_y$, it depends on the user-defined confidence level, $\gamma$, whether the DFA-based network reduction is considered reliable or not. We require the confidence level to fulfill the condition $0 \le \gamma \le 1$. If the $(1+\gamma)/2$ quantile of $\delta_{pq}(t_\mathrm{max})$ and/or $\Delta_{pq}$ exceeds $\varepsilon_y$, the DFA-based network reduction is considered unreliable. Here, the lowest and highest possible quantiles represent the median and the maximum of both measures over all $B+1$ samples, respectively.

Local Barrier Analysis (LBA). There are certain cases in which a DFA-based network reduction is prone to yield unreliable results. One example is autocatalysis. Imagine the following model reaction network:

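$$\begin{aligned} 2\,\mathrm{A} &\rightleftharpoons \mathrm{B} && \text{(very slow)} \\ \mathrm{B} &\rightleftharpoons \mathrm{C} && \text{(fast)} \\ \mathrm{C} + \mathrm{A} &\rightleftharpoons \mathrm{D} && \text{(fast)} \\ \mathrm{D} + \mathrm{A} &\rightleftharpoons \mathrm{E} && \text{(fast)} \\ \mathrm{E} &\rightleftharpoons 2\,\mathrm{C} && \text{(fast)} \end{aligned}$$

Even though the first reaction step is very slow, it is required to initiate all other reaction steps. Without B, neither C nor D nor E can be formed. Even though B is obviously a very important species for the reaction mechanism, it will only be relevant at the very beginning of the reaction (i.e., until a minute amount of it has been formed). Hence, the vertex flux for B, $G_\mathrm{B}$, will be very small, possibly smaller than the user-defined threshold $G_\mathrm{min}$, leading to a false elimination of this species and of the two edges connected to it (the first and second of the five edges).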

In this case, $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$ will clearly exceed $\varepsilon_y$, and the LBA will start automatically. The idea behind our LBA algorithm is fairly simple: Set the rate constants of the first reaction pair to zero, which is equivalent to an infinitely high activation barrier or the removal of the corresponding edge. Repeat numerical integration and compare the solution to the original one based on $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. Set the rate constants of the first reaction pair to their original values. Repeat the entire procedure for the second, ..., $L$-th reaction pair and for each of the $B+1$ kinetic models.

After this procedure, every reaction pair is associated with $B+1$ values for $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. If the $(1+\gamma)/2$ quantile of $\delta_{pq}(t_\mathrm{max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the corresponding reaction pair will be considered kinetically irrelevant and removed from the network. All unconnected vertices will be removed, too. Subsequently, the resulting $B+1$ reduced kinetic models are integrated, and their solutions are compared to the original ones, again based on $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. If the $(1+\gamma)/2$ quantile of $\delta_{pq}(t_\mathrm{max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the LBA-based network reduction will be considered reliable. It is still possible that $\varepsilon_y$ is exceeded, as the kinetic models we consider are generally nonlinear: If the lack of one edge does not alter the solution and the same result is obtained for another edge, it does not imply that the simultaneous lack of both edges leads to the same conclusion. If $\varepsilon_y$ is still exceeded after the LBA-based network reduction, KiNetX will recover the original network and continue its analysis without any reduction.
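The quantile-based reliability criterion used for both the DFA and the LBA can be sketched as follows. The text does not specify how the quantile is computed; the linear interpolation between order statistics used here, the function names, and the example values are assumptions. The sketch treats a reduction as reliable only if both measures stay within the error tolerance.

```python
def quantile(values, q):
    """Linearly interpolated q-quantile (q = 0.5: median, q = 1: maximum)."""
    v = sorted(values)
    pos = q * (len(v) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(v) - 1)
    return v[lo] + (pos - lo) * (v[hi] - v[lo])


def reduction_is_reliable(delta_values, Delta_values, gamma, eps_y):
    """Compare the (1 + gamma)/2 quantile of both measures with eps_y.

    delta_values and Delta_values hold one entry per kinetic model (B + 1 in total).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    q = (1.0 + gamma) / 2.0
    return quantile(delta_values, q) <= eps_y and quantile(Delta_values, q) <= eps_y


# Hypothetical errors of three kinetic models (in concentration units).
delta_values = [0.001, 0.002, 0.020]
Delta_values = [0.001, 0.001, 0.002]
print(reduction_is_reliable(delta_values, Delta_values, 0.0, 0.01))   # median criterion
print(reduction_is_reliable(delta_values, Delta_values, 1.0, 0.01))   # maximum criterion
```

With the lowest confidence level only the median has to stay below the tolerance, so a single outlying model is ignored; with the highest confidence level the same outlier renders the reduction unreliable.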

2.4.3 Sensitivity Analysis

In the following, we assume that the concentration profiles of the sparse reaction network reveal pronounced uncertainties such that it is difficult to correctly assign product ratios or to suggest a specific reaction mechanism. To estimate to which rate constants the concentration profiles are most sensitive, a sensitivity analysis is required [34]. For this purpose, the rate constants are perturbed to study how such perturbations affect the time-dependent species concentrations.

In a local sensitivity analysis, the model parameters are perturbed one by one from their nominal values (here, the elements of $\mathbf{k}_0$). While a local sensitivity analysis is straightforward to conduct and computationally feasible (usually, $L$ kinetic simulations are required), it has a significant disadvantage in that it does not take into account the correlation between the model parameters (cf. Section 2.2). Consequently, one may overlook important correlation effects on the uncertainty of concentration profiles. Note that our LBA algorithm resembles an extreme variant of a local sensitivity analysis (cf. Section 2.4.2), where each rate constant is perturbed to its minimal value.

In a global sensitivity analysis, the correlation between the model parameters is taken into account, but the process requires significantly more computational resources than a local sensitivity analysis. Furthermore, the design of a global sensitivity analysis is not as unambiguous as for a local sensitivity analysis, which explains the existence of several approaches that may require $CL$ (Morris screening, where $C$ is an integer usually much smaller than $B$), $B$ (polynomial chaos), or $2BL$ (Sobol's method) numerical integrations [34].

Morris screening [57] is among the simplest of global sensitivity analyses, as it is not designed to induce a mapping between the model parameters and the model solution. Instead, its purpose is to categorize the model parameters as either important or unimportant, depending on how strongly changes in them affect the model solutions. We are particularly interested in this categorization, since we aim for a descriptor that informs us about the quality of rate constants obtained from an efficient (basic) quantum chemical model. If we find that the uncertainty of some species concentrations is too large to derive sensible conclusions about specific network properties, the results of a global sensitivity analysis will support us in identifying the most critical rate constants that require a reevaluation based on a more sophisticated (benchmark) quantum chemical model.

The original version of Morris screening does not take into account the joint probability distribution of the model parameters. Here, we introduce an extended variant of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample $\mathbf{k}_0$ and replaces the elements $k_{0,1}^+$ and $k_{0,1}^-$ with the corresponding elements of $\mathbf{k}_1$, $k_{1,1}^+$ and $k_{1,1}^-$. For this new vector of rate constants, $\mathbf{k}_{0,1}$, a numerical integration is carried out. Subsequently, the elements $k_{0,2}^+$ and $k_{0,2}^-$ of $\mathbf{k}_{0,1}$ are replaced with the elements $k_{1,2}^+$ and $k_{1,2}^-$ of $\mathbf{k}_1$, and a numerical integration based on the new vector $\mathbf{k}_{0,2}$ is carried out. This procedure is repeated $L$ times until we arrive at $\mathbf{k}_{0,L} = \mathbf{k}_1$, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is now repeated by an element-wise change from $\mathbf{k}_1$ to $\mathbf{k}_2$ and, generally, by an element-wise change from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ until $b = C-1$ is reached. In the end, we will have generated solutions for another $C(L-1)$ kinetic models in addition to the $B+1$ solutions obtained for $\mathbf{K}_B$. We are interested in the $C(L-1)$ new solutions and the first $C+1$ of the $B+1$ solutions obtained previously, amounting to $CL+1$ solutions relevant for our sensitivity analysis.

For each of these solutions, KiNetX determines the $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$ metrics, where $p$ and $q$ represent two adjacent k-vectors as constructed by our extended version of Morris screening. $CL$ values are obtained for each metric, $C$ thereof for every reaction pair. We define the sensitivity coefficients $z_l^\delta$ and $z_l^\Delta$ as

$$z_l^\delta = \frac{1}{C} \sum_{b=0}^{C-1} \delta_{bl,\,b(l-1)}(t_\mathrm{max}) \tag{23}$$

and

$$z_l^\Delta = \frac{1}{C} \sum_{b=0}^{C-1} \Delta_{bl,\,b(l-1)}, \tag{24}$$

respectively. The subscript $bl$ refers to sample $\mathbf{k}_{bl}$, and we define $\mathbf{k}_{b0} = \mathbf{k}_b$. Note that in Eqs. (23) and (24), we do not divide the $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$ metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term conditional relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.
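To make the construction of adjacent k-vectors and the averaging in Eqs. (23) and (24) concrete, the following Python sketch builds the element-wise path from one sampled vector to the next and accumulates a sensitivity coefficient per reaction pair. Several aspects are assumptions for illustration: each k-vector is stored as a flat list with the $L$ forward constants followed by the $L$ backward constants, the function names are hypothetical, and `metric` is a placeholder callable that, in an actual workflow, would integrate both kinetic models and return $\delta_{pq}(t_\mathrm{max})$ or $\Delta_{pq}$.

```python
def morris_path(k_from, k_to):
    """Vectors k_{b,1}, ..., k_{b,L} leading from k_b to k_{b+1}.

    Step l copies the l-th forward/backward pair of rate constants from k_to.
    """
    L = len(k_from) // 2
    path, current = [], list(k_from)
    for l in range(L):
        current[l] = k_to[l]            # forward rate constant of pair l
        current[L + l] = k_to[L + l]    # backward rate constant of pair l
        path.append(list(current))
    return path


def sensitivity_coefficients(ensemble, C, metric):
    """z_l = (1/C) * sum over b of metric(k_{b,l}, k_{b,l-1}), cf. Eqs. (23)/(24)."""
    L = len(ensemble[0]) // 2
    z = [0.0] * L
    for b in range(C):
        previous = ensemble[b]          # k_{b,0} = k_b
        for l, k_bl in enumerate(morris_path(ensemble[b], ensemble[b + 1])):
            z[l] += metric(k_bl, previous) / C
            previous = k_bl
    return z


def dummy_metric(p, q):
    """Placeholder for delta_pq(t_max); here simply the summed change in k."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical ensemble with L = 2 reaction pairs and C = 2 Morris samples.
ensemble = [[1.0, 2.0, 10.0, 20.0], [3.0, 2.0, 10.0, 25.0], [3.0, 5.0, 11.0, 25.0]]
z = sensitivity_coefficients(ensemble, 2, dummy_metric)
print(z)
```

The last vector of each path coincides with the next ensemble member, which is why only $C(L-1)$ additional integrations are needed in practice.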

Finally, we can set up a ranking of sensitivity coefficients. The larger $z_l^\delta$ and $z_l^\Delta$, the larger the effect of changes in $k_l^\pm$ on the concentration profiles. With this ranking at hand, one can systematically improve on both the accuracy and precision of the concentration profiles to reliably suggest product distributions (based on $z_l^\delta$) or specific reaction mechanisms (based on $z_l^\Delta$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be infeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well known in the reaction engineering community. It has been introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas-phase reactions (in particular, combustion) [62], which has recently been extended to understand the kinetics of solution-phase chemistry [63]. Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution-phase chemistry. As the sensitivity of a thermal rate constant to errors in the activation energy decreases with increasing temperature in classical rate theories, k-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take k-uncertainty into account.

We start from the reactants (species with nonzero initial concentrations) and attempt to find all direct products, i.e., intermediates that are formed by a single elementary reaction of the reactants. The reactants are considered active, whereas the direct products are considered inactive. Only active species are considered in the exploration procedure, i.e., at this point, all possible reaction steps have been discovered (according to the exploration algorithm employed). To estimate which of the inactive species is potentially the most relevant for the mechanism, we focus on the formation rate of each inactive species (indicated by an asterisk),

$$\vec{g}_{n^*}(t_u) = \mathbf{s}_{n^*}^+ \mathbf{f}^-(t_u) + \mathbf{s}_{n^*}^- \mathbf{f}^+(t_u). \tag{25}$$

Note that $\mathbf{s}_{n^*}^+$ is multiplied with $\mathbf{f}^-(t)$, since in a backward ($-$) reaction, the left-hand-side species ($+$) are formed. An equivalent argument holds for the multiplication of $\mathbf{s}_{n^*}^-$ with $\mathbf{f}^+(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

$$\vec{g}_{n^*}^{\,\mathrm{max}} = \max \left\{ \vec{g}_{n^*}(t_0), \vec{g}_{n^*}(t_1), \dots, \vec{g}_{n^*}(t_U) \right\}. \tag{26}$$

The species with the highest value of $\vec{g}_{n^*}^{\,\mathrm{max}}$ will be promoted to active. Note that if an ensemble of k-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

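$$\eta\!\left( \vec{g}_{n^*}^{\,\mathrm{max}} \right) = \sqrt{ \left\langle \vec{g}_{n^*}^{\,\mathrm{max}} \right\rangle^2 + \frac{1}{B-1} \sum_{b=1}^{B} \left( \vec{g}_{n^*b}^{\,\mathrm{max}} - \left\langle \vec{g}_{n^*}^{\,\mathrm{max}} \right\rangle \right)^2 }, \tag{27}$$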

that takes into account both the ensemble average of maximum formation rates,

$$\left\langle \vec{g}_{n^*}^{\,\mathrm{max}} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \vec{g}_{n^*b}^{\,\mathrm{max}}, \tag{28}$$

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
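The ranking criterion of Eqs. (25) to (28) can be illustrated with the short Python sketch below. The function names, the list-based layout of stoichiometric rows and rate vectors, and the example numbers are assumptions; the sketch only mirrors the formulas and is not the KiNetX implementation.

```python
import math


def max_formation_rate(s_plus_n, s_minus_n, f_plus_t, f_minus_t):
    """Eqs. (25)-(26): maximum over all time points of s_n^+ . f^- + s_n^- . f^+.

    f_plus_t[u][l] and f_minus_t[u][l] are the forward/backward rates at time t_u.
    """
    return max(
        sum(sp * fm + sm * fp
            for sp, sm, fp, fm in zip(s_plus_n, s_minus_n, fp_u, fm_u))
        for fp_u, fm_u in zip(f_plus_t, f_minus_t)
    )


def eta(g_max):
    """Eq. (27): g_max[b-1] is the maximum formation rate in sample b (B >= 2)."""
    B = len(g_max)
    mean = sum(g_max) / B                                   # Eq. (28)
    variance = sum((g - mean) ** 2 for g in g_max) / (B - 1)
    return math.sqrt(mean ** 2 + variance)


# Two hypothetical inactive species with the same mean but different spread.
print(eta([1.0, 1.0, 1.0]))   # no variance
print(eta([0.0, 2.0]))        # same mean, large variance
```

For equal means, the species whose maximum formation rate varies more strongly across the ensemble obtains the larger score and would therefore be promoted first.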

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\vec{g}_{n^*}^{\,\mathrm{max}}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold. In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between $t = 0$ and $t = t_\mathrm{max}$) are being generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

• The number of samples, $B$, to be drawn from $\boldsymbol{\Sigma}_A$.

• A thermostat temperature, $T$, for the calculation of rate constants.

• The average free-energy increase/decrease, $\mu(A^- - A^+)$, for a new intermediate state ($-$) formed by reaction of an intermediate state already present in the network ($+$).

• The average difference between the free energies of a left-hand-side intermediate state ($+$) and its corresponding right-hand-side intermediate state ($-$), $\sigma(A^- - A^+)$.

• A minimum activation free energy, $\min(A^\ddagger - A^\pm)$, with respect to the higher-energy intermediate state.

• A maximum activation free energy, $\max(A^\ddagger - A^\pm)$, with respect to the higher-energy intermediate state.

• The average free-energy uncertainty, $\langle \sigma_A \rangle$ (required for generating an ensemble of kinetic models).

• The maximum number of edges, $L_\mathrm{max}$, to be generated.

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of $N = 103$ vertices and $L = 118$ edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L$^{-1}$ and 0.50 mol L$^{-1}$, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of $B + 1 = 25$ k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol$^{-1}$. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol$^{-1}$. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature of $T = 298.15$ K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L$^{-1}$ during the course of reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at $t = t_{\max}$. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L$^{-1}$ and 0.9 mol L$^{-1}$, the observable value of which will in turn affect the concentrations of the potential side products 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of $\varepsilon_y = 0.01$ mol L$^{-1}$, we find that a DFA-based network reduction with $G_{\min} = 1.0 \times 10^{-3}$ mol L$^{-1}$ is reliable with respect to both $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Here, we chose the minimum confidence level of $\gamma = 0$ for the sake of clarity. The smaller $\gamma$, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose $G_{\min}$ to be one order of magnitude smaller than $\varepsilon_y$. The reason is that $G_{\min}$ is assessed on a species-by-species basis, whereas $\varepsilon_y$ is compared against the quantities $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$, which measure the concentration error summed over all species. One may resolve this situation by dividing $G_{\min}$ by $N$, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA analysis varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For $C = 5$ Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeding $\varepsilon_y$ with respect to the quantile specified by $\gamma$. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L$^{-1}$. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L$^{-1}$, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval $t_u - t_{u-1}$, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42 (the other dominant side product) via 30 and 14. In summary, the dominant reaction pathways read

$$
\begin{aligned}
2 + 2 &\rightarrow 4 \rightarrow 9 + 10 \\
1 + 9 &\rightarrow 6 + 13 \\
13 &\rightarrow 33 + 33 \\
1 + 6 &\rightarrow 19 \\
1 + 19 &\rightarrow 33 + 68 \\
10 &\rightarrow 30 \rightarrow 14 \rightarrow 42
\end{aligned}
$$

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.
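The pruning argument (a vertex that can only be reached via irrelevant channels is itself irrelevant) can be illustrated with a minimal sketch. This is not the KiNetX implementation: the function name `reachable_species`, the tuple-based edge representation, and the toy network are assumptions made purely for illustration, and edges are treated as one-directional for simplicity.

```python
def reachable_species(reactants, edges):
    """Return all species reachable from `reactants` via the given edges.

    Each edge is a (lhs, rhs) pair of tuples of species labels. An edge can
    only fire if every species on its left-hand side is already reachable.
    """
    reachable = set(reactants)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in edges:
            if set(lhs) <= reachable and not set(rhs) <= reachable:
                reachable |= set(rhs)
                changed = True
    return reachable


# Hypothetical toy network: species 5 and 6 are only formed downstream of the
# channel 3 -> 5, which we assume a kinetic analysis has flagged as irrelevant.
edges = [((1, 2), (3,)), ((3,), (4,)), ((3,), (5,)), ((5,), (6,))]
relevant_edges = [e for e in edges if e != ((3,), (5,))]

all_species = reachable_species([1, 2], edges)
kept_species = reachable_species([1, 2], relevant_edges)
pruned_species = all_species - kept_species  # never need to be explored further
```

Discarding a single channel here removes two species at once, which is the mechanism behind the potentially superlinear growth of redundant states.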

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction at that stage problematic. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than $\varepsilon_y$, as measured by $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.
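The two reduction percentages can be reproduced from the vertex and edge counts reported for CRN-X. The short sketch below assumes that "network elements" means vertices plus edges counted together, which is our reading and is consistent with the quoted numbers; the helper name `reduction` is hypothetical.

```python
def reduction(n_vertices, n_edges, n_vertices_full, n_edges_full):
    """Fraction of network elements (vertices + edges) that were removed."""
    kept = n_vertices + n_edges
    full = n_vertices_full + n_edges_full
    return 1.0 - kept / full


FULL = (103, 118)                  # full CRN-X: N = 103 vertices, L = 118 edges
guided = reduction(63, 68, *FULL)  # KiNetX-guided exploration
sparse = reduction(18, 21, *FULL)  # DFA-reduced (sparse) network

print(f"guided exploration: {guided:.0%}, DFA reduction: {sparse:.0%}")
```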

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks explored without kinetic guidance is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol$^{-1}$ with a standard deviation of 0.4 kcal mol$^{-1}$. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $k_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several instead of a single solution naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to $1.0 \times 10^{-17}$ mol L$^{-1}$ s$^{-1}$ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, $t_{\max}$), we obtain a flux of $3.7 \times 10^{-13}$ mol L$^{-1}$, which is very small compared to $\varepsilon_y = 0.01$ mol L$^{-1}$. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.

Table 1: Mean values ($\mu$) and standard deviations ($\sigma$) of properties $\zeta$ obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

| Property $\zeta$ | $\mu(\zeta_{\text{CRN-0}})$ | $\mu(\zeta_{\text{CRN-B}})$ | $\sigma(\zeta_{\text{CRN-0}})$ | $\sigma(\zeta_{\text{CRN-B}})$ |
|---|---|---|---|---|
| exploration steps | 12 | 14 | 7 | 7 |
| critical reactions | 5 | 10 | 4 | 7 |
| $L_{\text{expl}}$ | 54 | 60 | 26 | 27 |
| $N_{\text{expl}}$ | 29 | 31 | 11 | 11 |
| $L_{\text{red}}$ | 30 | 37 | 18 | 21 |
| $N_{\text{red}}$ | 17 | 20 | 8 | 8 |
| $\delta_{B}^{\text{expl}}(t_{\max})$ / (mol L$^{-1}$) | – | 0.14 | – | 0.13 |
| $\Delta_{B}^{\text{expl}}$ / (mol L$^{-1}$) | – | 0.15 | – | 0.13 |
| $\delta_{B}^{\text{refined}}(t_{\max})$ / (mol L$^{-1}$) | – | < 0.01 | – | 0.01 |
| $\Delta_{B}^{\text{refined}}$ / (mol L$^{-1}$) | – | < 0.01 | – | 0.01 |

DFA-based network reduction with a flux threshold of $G_{\min} = 1.0 \times 10^{-5}$ mol L$^{-1}$ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument of the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of $\max\{\delta_{B}(t_{\max}), \Delta_{B}\}$ from 0.15 mol L$^{-1}$ to < 0.01 mol L$^{-1}$ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all $B + 1 = 6$ samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
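Why absolute rather than relative free energies should be averaged can be seen on a closed reaction cycle, where the reaction free energies must sum to zero. The following sketch is only an illustration under stated assumptions: the three-membered cycle, the energy values, and the helper names `barriers` and `cycle_sum` are hypothetical and not taken from KiNetX.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 6                  # B + 1 kinetic models
lhs = np.array([0, 1, 2])      # edge e connects vertex lhs[e] ...
rhs = np.array([1, 2, 0])      # ... to vertex rhs[e] (cycle X -> Y -> Z -> X)

# Hypothetical absolute free energies in kJ/mol, one row per sample.
vertices = rng.normal([0.0, -10.0, -5.0], 2.0, size=(n_samples, 3))
ts = vertices.max(axis=1, keepdims=True) + rng.uniform(20.0, 60.0, size=(n_samples, 3))


def barriers(vertices, ts):
    """Forward and backward activation free energies of every edge."""
    return ts - vertices[:, lhs], ts - vertices[:, rhs]


def cycle_sum(forward, backward):
    """Sum of reaction free energies around the cycle (zero if consistent)."""
    return (forward - backward).sum(axis=1)


# Refinement of edge 0 on ABSOLUTE energies: average its transition state and
# both connected vertices over all samples.
ref_vertices, ref_ts = vertices.copy(), ts.copy()
ref_vertices[:, [0, 1]] = vertices[:, [0, 1]].mean(axis=0)
ref_ts[:, 0] = ts[:, 0].mean()
fwd, bwd = barriers(ref_vertices, ref_ts)
consistent = cycle_sum(fwd, bwd)

# Naive alternative on RELATIVE energies: average only the two barriers of edge 0.
fwd0, bwd0 = barriers(vertices, ts)
naive_fwd, naive_bwd = fwd0.copy(), bwd0.copy()
naive_fwd[:, 0] = fwd0[:, 0].mean()
naive_bwd[:, 0] = bwd0[:, 0].mean()
inconsistent = cycle_sum(naive_fwd, naive_bwd)
```

With averaged absolute energies, the barrier of the refined edge is identical in all samples and every sample still closes the thermodynamic cycle. Averaging the barriers alone leaves a nonzero residual around the cycle in each sample. Note that the barriers of neighboring edges sharing a refined vertex change as well, in line with the "(at least)" qualifier above.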

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of $\max\{\delta_{pq}(t_{\max}), \Delta_{pq}\}$ comparing the CRN-0 solutions against the "exact" solutions amounts to $15.7 \times 10^{-3}$ mol L$^{-1}$, whereas it amounts to only $1.3 \times 10^{-3}$ mol L$^{-1}$ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered an additional number of 5 kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be questionable to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties, such as the percentage of critical (highly noise-inducing) reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps, as they may be strongly dependent on the underlying graph structure, the distribution of activation free energies/rate constants, as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol$^{-1}$ with respect to the higher-energy intermediate) with the reaction time $t_{\max}$, which we set to the half-life of a unimolecular reaction whose rate constant corresponds to a barrier height of 100 kJ mol$^{-1}$. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction found here to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol$^{-1}$, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks.

1. For the construction of the $(x + 1)$-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the $x$-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_{\exp}$. The user can specify two different values for $\mu_{\exp}$, one for unimolecular reactions, $\mu_{\exp,\mathrm{uni}}$, and one for bimolecular reactions, $\mu_{\exp,\mathrm{bi}}$. The floor value of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_{\mathrm{new}})$, ranging from zero to one. If $p(V_{\mathrm{new}}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_{\mathrm{new}}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), $2L$ activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is obtained from a value sampled from a normal distribution with mean $\mu(A^{-} - A^{+})$ and standard deviation $\sigma(A^{-} - A^{+})$, which is added to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained from a value sampled from a uniform distribution with bounds $\min(A^{\ddagger} - A^{+/-})$ and $\max(A^{\ddagger} - A^{+/-})$, which is added to the higher-energy side of an edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$, as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
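The bookkeeping of steps 1 and 2 can be sketched in a few lines. This is a hedged illustration rather than the AutoNetGen code: the function names, the example layer sizes, and the random seed are assumptions, while the mean of 5 for unimolecular states is the value listed for this study in Table 2.

```python
import math

import numpy as np


def count_reactive_states(n_total, n_new):
    """Numbers of potentially reactive intermediate states for the next layer.

    n_total is N (all vertices registered so far), n_new is N_x (vertices of
    the most recent layer). Returns (unimolecular, homobimolecular,
    heterobimolecular) counts as given in step 1.
    """
    uni = n_new
    homo_bi = n_new
    hetero_bi = n_new * (n_new - 1) // 2 + n_new * (n_total - n_new)
    return uni, homo_bi, hetero_bi


def sample_channel_count(rng, mean):
    """Step 2: number of reaction channels = floor of an exponential draw."""
    return math.floor(rng.exponential(mean))


rng = np.random.default_rng(42)
uni, homo_bi, hetero_bi = count_reactive_states(n_total=10, n_new=4)

# One draw per unimolecular state, using the unimolecular mean of 5.
channels_uni = [sample_channel_count(rng, 5.0) for _ in range(uni)]

# Sanity check of the heterobimolecular count: all pairs among N species
# minus the pairs among the N - N_x old species, which were explored before.
pairs_new = math.comb(10, 2) - math.comb(10 - 4, 2)
```

Because the floor of an exponential draw is frequently zero, many reactive states produce no channel at all, which keeps the combinatorial growth of the network in check.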

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$
k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right), \tag{29}
$$

where $h$, $k_\mathrm{B}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
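As a minimal numerical sketch of Eq. (29), the function below evaluates the Eyring rate constant for a unimolecular step. The function name and the use of CODATA 2018 constants are our choices; standard-state concentration factors for bimolecular steps are deliberately ignored. The example also evaluates the half-life belonging to a 100 kJ mol$^{-1}$ barrier, which is how the reaction time $t_{\max}$ is defined in the main text.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K
H = 6.62607015e-34   # Planck constant in J s
R = 8.314462618      # gas constant in J/(mol K)


def eyring_rate_constant(barrier_kj_mol, temperature=298.15):
    """Eq. (29): k = (k_B T / h) * exp(-barrier / (R T)), barrier in kJ/mol."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_kj_mol * 1.0e3 / (R * temperature))


k_100 = eyring_rate_constant(100.0)   # unimolecular rate constant in 1/s
t_half = math.log(2.0) / k_100        # half-life of a first-order decay in s

# Sensitivity: shifting a barrier by 1 kcal/mol (4.184 kJ/mol) rescales k.
factor_per_kcal = eyring_rate_constant(100.0 - 4.184) / k_100
```

With these constants, the half-life comes out at roughly $3.7 \times 10^{4}$ s, within a fraction of a percent of the $t_{\max}$ value listed in Table 2. A barrier error of 1 kcal mol$^{-1}$ changes the rate constant by roughly a factor of five at this temperature, which illustrates why modest free-energy uncertainties can produce large concentration noise.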

Random Construction of Covariance Matrices

Covariance matrices are symmetric positive-semidefinite square matrices by definition, i.e., their eigenvalues are strictly nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$, which fulfills the condition that its largest eigenvalue equals $\sigma_{A,\max}^2$. Defining $\sigma_{A,\max}^2$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\max}$.

1. Generate a random $(N \times N)$-dimensional matrix $P$ with elements that are uniformly sampled from $[-0.5, +0.5]$. $N$ is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an $(N \times N)$-dimensional matrix $Q = PP^{\top}$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$
\Sigma_A = Q \, \frac{\langle\sigma_A\rangle^2}{\max_i E_{ii}},
$$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
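The three steps of this recipe translate directly into a short NumPy sketch. The function name, the network size, and the random seed are hypothetical. Drawing the correlated vertex perturbations from a multivariate normal distribution is our assumption for illustration, since the recipe itself only fixes the covariance matrix.

```python
import numpy as np


def random_covariance(n_vertices, sigma, rng):
    """Random covariance matrix whose largest eigenvalue equals sigma**2."""
    p = rng.uniform(-0.5, 0.5, size=(n_vertices, n_vertices))  # step 1
    q = p @ p.T                                                # step 2
    e_max = np.linalg.eigvalsh(q).max()                        # largest eigenvalue of Q
    return q * sigma**2 / e_max                                # step 3


rng = np.random.default_rng(1)
sigma_a = 2.09   # average free-energy uncertainty in kJ/mol, as in Table 2
cov = random_covariance(20, sigma_a, rng)

# Assumed sampling model: B = 5 correlated perturbations of the nominal
# vertex free energies (set to zero here for simplicity).
nominal = np.zeros(20)
samples = rng.multivariate_normal(nominal, cov, size=5)

# Per-vertex standard deviations are bounded by sigma_a by construction.
vertex_sigmas = np.sqrt(np.diag(cov))
```

Since every diagonal element of a symmetric positive-semidefinite matrix is bounded by its largest eigenvalue, no single vertex uncertainty can exceed the chosen scale.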

Note that in a practical setup, we expect the user to provide $B + 1$ k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

| Input parameter | CRN-X | CRN-0 | CRN-B |
|---|---|---|---|
| AutoNetGen | | | |
| $B + 1$ | 25 | 1 | 6 |
| $T$ / K | 298.15 | 298.15 | 298.15 |
| $\mu(A^{-} - A^{+})$ / (kJ mol$^{-1}$) | −25 | −25 | −25 |
| $\sigma(A^{-} - A^{+})$ / (kJ mol$^{-1}$) | 50 | 50 | 50 |
| $\min(A^{\ddagger} - A^{+/-})$ / (kJ mol$^{-1}$) | 0 | 0 | 0 |
| $\max(A^{\ddagger} - A^{+/-})$ / (kJ mol$^{-1}$) | 100 | 100 | 100 |
| $\langle\sigma_A\rangle$ / (kJ mol$^{-1}$) | 2.09 | 2.09 | 2.09 |
| $L_{\max}$ | 125 | 100 | 100 |
| $\mu_{\exp}(N_{\mathrm{uni}})$ | 5 | 5 | 5 |
| $\mu_{\exp}(N_{\mathrm{bi}})$ | 2 | 2 | 2 |
| $p(V_{\mathrm{new}})$ | 0.5 | 0.1 | 0.1 |
| KiNetX | | | |
| $t_{\max}$ / s | 36954 | 36954 | 36954 |
| $\varepsilon_y$ / (mol L$^{-1}$) | 0.01 | 0.01 | 0.01 |
| $G_{\min}$ / (mol L$^{-1}$) | $1.0 \times 10^{-3}$ | $1.0 \times 10^{-5}$ | $1.0 \times 10^{-5}$ |
| $U$ | 1000 | 1000 | 1000 |
| $\gamma$ | 0 | 1 | 1 |
| $C$ | 5 | 1 | 5 |

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.

[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C.; et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C.; et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G.; et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S.; et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package"; Technical Report SAND-80-8003, Sandia National Labs., Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics"; Technical Report SAND-89-8009, Sandia National Labs., Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics"; Technical Report SAND-96-8216, Sandia National Labs., Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes"; Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI – a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple “Boxed Molecular Kinetics” Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] “SCINE: Software for Chemical Interaction Networks”, ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] “MATLAB”, Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

species concentrations can yield striking variances in equilibration times. For a mean standard deviation of about 3 kcal mol$^{-1}$ in the activation free energies (maximum standard deviation of 5.5 kcal mol$^{-1}$), we found a maximum deviation in the equilibration time of almost 23 orders of magnitude.

The focus of this paper is on the kinetic analysis of complex chemical reaction networks. Hence, our discussion starts from a network for which the initial species concentrations and all rate constants, including estimates for their correlated uncertainty, are supposed to be known. We define a reaction network endowed with rate constants as the graph representation of a kinetic model based on mass action, which is expressed as a system of ordinary differential equations (ODEs) consisting of variables (species concentrations) and parameters (rate constants). Since numerical integration of ODE systems is not restricted to chemical kinetics but is relevant in basically all areas of science, there exists a plethora of computer programs for this purpose. A review of the corresponding algorithms would go beyond the scope of this paper. We refer to Chapter 9.1 in Ref. 34 for a concise overview of ODE solvers employed in the field of chemical kinetics.

Software packages for kinetic modeling are numerous and have a long tradition in chemical kinetics [34]. For many chemical problems, there exist tailored codes, a personal selection of which is briefly introduced hereafter. One of the most widely applied software packages for comprehensive kinetic modeling is CHEMKIN [35–37]. As the development of CHEMKIN was and is particularly driven by gas-phase chemistry, it comprises application-specific features that take care of, e.g., transport processes or changes in pressure and/or temperature during the course of a reaction. An open-source alternative to CHEMKIN in gas-phase chemistry with a comparable feature scope is Cantera [38].

Another open-source software package developed for comprehensive kinetic modeling is COPASI (COmplex PAthway SImulator) [39]. Due to its focus on biochemical systems, where particle numbers are likely to be very small, COPASI contains an implementation for stochastic kinetic simulations [40] in addition to its deterministic counterpart (numerical integration of ODEs). The master equation, which is the fundamental equation for stochastic chemical kinetics, is also crucial in cases where the time scales of reactive events and collisional relaxation compete with each other such that a nonequilibrium description of state transitions becomes necessary. MESMER (Master Equation Solver for Multi-Energy Well Reactions) has been developed for this purpose by Glowacki and colleagues [41, 42] based on their results for both gas-phase [43] and solution-phase [44, 45] chemistry.

To match the philosophy of our developments for a new kind of computational quantum chemistry (SCINE [46]), we require a comprehensive kinetic network analysis to be based on the following steps:

(i) translation of a reaction network endowed with uncertainty-equipped rate constants to an ensemble of kinetic models, each model representing a unique set of rate constants,

(ii) numerical integration of the ensemble of (possibly stiff) kinetic models,

(iii) identification of possibly relevant species based on noisy concentration data to guide the search for new species,

(iv) elimination of kinetically irrelevant species and elementary reactions based on noisy concentration data (statistical mechanism deduction),

(v) global sensitivity analysis to rank reactions according to how much concentration noise the uncertainty of the underlying rate constants induces.

Here, we introduce KiNetX, a meta-algorithm that accomplishes all of these tasks in a fully automated fashion. The two key aspects of KiNetX are that it (i) can steer the exploration of chemical reaction space in order to accelerate this exploration and (ii) enables routine statistico-kinetic analyses of reaction networks endowed with rate constants for which correlated uncertainty information is available. In this way, we build on recent developments related to the exploration of chemical reaction networks [1–18] and the uncertainty quantification of reaction energies [19–28], both derived from the first principles of quantum mechanics. The novelty of KiNetX consists in the uncertainty quantification approach it is built on. Instead of making educated guesses, consulting rules, or fitting to experimental data [34], it has become possible to estimate the correlated uncertainty of free-energy predictions from ensembles of electronic structure models, which allows for a direct evaluation of the underlying free-energy covariance matrix. The explicit propagation of these correlated uncertainties through the entire workflow renders KiNetX a novel toolbox for statistico-kinetic modeling that significantly increases the reliability of mechanistic conclusions drawn from quantum chemical reaction data. We aimed at a very generic input format that does not enforce any context-related formalities. Even nonchemical problems (if expressed as mass action-type ODEs) can be studied. The uncertainty framework of KiNetX can, in principle, be coupled to any of the kinetic modeling codes mentioned above if a suitable parser is provided. To examine the importance of rigorous uncertainty propagation in kinetic modeling, we developed an automated network generator, AutoNetGen. AutoNetGen creates chemistry-mimicking reaction networks based on chemical logic, with which we can study an arbitrary number of chemistry-like scenarios.

This paper is organized as follows. In Section 2, we introduce the basic equations of kinetic modeling, highlight the importance of uncertainty quantification for this field, and discuss the technical details of KiNetX and AutoNetGen. In Section 3, we present the KiNetX workflow for a specific example and provide statistics on the gain in reliability achieved by incorporating uncertainty quantification into the kinetic modeling framework.

2 Theoretical and Algorithmic Details

2.1 Kinetic Modeling from a Network Perspective

We describe the structure of a chemical reaction network by a graph of $N$ vertices and $L$ bidirectional edges. The network is strictly bidirectional, as we assume every chemical transformation to be reversible. Each of the two edges corresponding to a reaction pair (a reversible elementary reaction) is assigned an arbitrary but unique direction (forward, +, or backward, −).

We define the $N$-dimensional column vector of time-dependent species concentrations,

$$\mathbf{y}(t) = \big(y_1(t), \cdots, y_N(t)\big)^\top, \tag{1}$$

which keeps track of the population density of each vertex at a given time. Here, $y_n(t)$ refers to the concentration of the $n$-th chemical species at time $t$, which is a strictly nonnegative quantity. The $2L$-dimensional column vector of rate constants,

$$\mathbf{k} = \begin{pmatrix} \mathbf{k}^+ \\ \mathbf{k}^- \end{pmatrix} = \big(k_1^+, \cdots, k_L^+, k_1^-, \cdots, k_L^-\big)^\top, \tag{2}$$

contains strictly nonnegative scaling factors that determine the transition rate for each direction of an edge. We define the $(N \times L)$-dimensional stoichiometry matrix of forward reactions,

$$\mathbf{S}^+ = \begin{pmatrix} S_{11}^+ & \cdots & S_{1L}^+ \\ \vdots & \ddots & \vdots \\ S_{N1}^+ & \cdots & S_{NL}^+ \end{pmatrix}, \tag{3}$$

where the element $S_{nl}^+$ ($n \in \{1, \cdots, N\}$ and $l \in \{1, \cdots, L\}$) describes the number of molecules of the $n$-th species that is consumed in the $l$-th forward reaction. Analogously to $\mathbf{S}^+$, we define the $(N \times L)$-dimensional stoichiometry matrix of backward reactions,

$$\mathbf{S}^- = \begin{pmatrix} S_{11}^- & \cdots & S_{1L}^- \\ \vdots & \ddots & \vdots \\ S_{N1}^- & \cdots & S_{NL}^- \end{pmatrix}. \tag{4}$$

All elements in $\mathbf{S}^+$ and $\mathbf{S}^-$ are strictly nonnegative integers. The combined $(N \times L)$-dimensional stoichiometry matrix reads

$$\mathbf{S} = \mathbf{S}^- - \mathbf{S}^+. \tag{5}$$

From the quantities introduced above, and assuming mass action kinetics, we can express the $L$-dimensional column vectors of forward (+) reaction rates and backward (−) reaction rates as

$$\mathbf{f}^{+/-}(t) = \big(f_1^{+/-}(t), \cdots, f_L^{+/-}(t)\big)^\top, \tag{6}$$

with components

$$f_l^{+/-}(t) = k_l^{+/-} \prod_{n=1}^{N} \big(y_n(t)\big)^{S_{nl}^{+/-}}. \tag{7}$$

By definition, the product of concentrations in Eq. (7) only equals zero if a species involved in the $l$-th forward/backward reaction (indicated by a positive value of $S_{nl}^{+/-}$) is not populated (zero concentration), since for all species not involved we obtain a factor of 1 due to $S_{nl}^{+/-} = 0$. Based on the combined stoichiometry matrix $\mathbf{S}$ and the combined $L$-dimensional column vector of reaction rates,

$$\mathbf{f}(t) = \mathbf{f}^+(t) - \mathbf{f}^-(t), \tag{8}$$

we can express the time-dependent change in the species concentrations as

$$\mathbf{g}(t) = \big(g_1(t), \cdots, g_N(t)\big)^\top = \frac{\mathrm{d}}{\mathrm{d}t}\,\mathbf{y}(t) = \mathbf{S}\,\mathbf{f}(t). \tag{9}$$

The essential procedure preceding any kinetic analysis of reaction networks is the numerical integration of $\mathbf{g}(t)$, which is therefore the central quantity in kinetic modeling.
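As an illustration of Eqs. (5) and (7)–(9), the following minimal Python sketch evaluates the right-hand side g = S f for a toy network. It is not taken from KiNetX (which is written in Matlab); the function names `reaction_rates` and `concentration_change` and the toy rate constants are assumptions made for this example only.

```python
from math import prod

def reaction_rates(k, S_dir, y):
    """Eq. (7): f_l = k_l * prod_n y_n ** S_nl for one direction (+ or -)."""
    n_species, n_reactions = len(S_dir), len(k)
    return [
        k[l] * prod(y[n] ** S_dir[n][l] for n in range(n_species))
        for l in range(n_reactions)
    ]

def concentration_change(k_fwd, k_bwd, S_plus, S_minus, y):
    """Eqs. (5), (8), (9): g = S f with S = S^- - S^+ and f = f^+ - f^-."""
    f_plus = reaction_rates(k_fwd, S_plus, y)
    f_minus = reaction_rates(k_bwd, S_minus, y)
    f = [fp - fm for fp, fm in zip(f_plus, f_minus)]
    return [
        sum((S_minus[n][l] - S_plus[n][l]) * f[l] for l in range(len(f)))
        for n in range(len(y))
    ]

# Toy network with one reaction pair, 2A <=> B (species order: A, B).
S_plus = [[2], [0]]    # the forward reaction consumes 2 A
S_minus = [[0], [1]]   # the backward reaction consumes 1 B
g = concentration_change([3.0], [1.0], S_plus, S_minus, [1.0, 0.5])
```

For this toy input, the net rate is f = 3.0 − 0.5 = 2.5, so A is depleted at twice the rate at which B is formed. Species that do not take part in a reaction contribute a factor of 1 because their exponent is zero, as stated for Eq. (7).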

2.2 Uncertainty Quantification in Kinetic Modeling

The objective of uncertainty quantification in kinetic modeling [34] is to assess the accuracy (or bias) and precision (or variance) of concentration profiles obtained from numerical integration of ODE systems. To obtain reliable results, it may be necessary to invest great effort in estimating the correlated uncertainty of model parameters (free-energy differences or rate constants). The mathematical object we are searching for is the joint probability distribution of model parameters [47]. One way to estimate parameter correlation is the backward propagation of uncertainty observed in measured concentration profiles [48], which requires knowledge of the underlying mechanism containing all kinetically relevant elementary reaction steps. This strategy is clearly appealing for verifying mechanism completeness, but it does not serve our purpose of understanding chemical reactivity from a first-principles perspective.

It has recently been shown that the neglect of parameter correlation in kinetic models can easily lead to false mechanistic conclusions despite being derived from electronic structure calculations [26, 49]. As the parameters of an electronic structure model are (unknown) functions of chemical space, free energies of reaction pathways comprising similar species will not change independently of each other when the value of an electronic structure parameter is changed. Fortunately, recent statistical developments in quantum chemistry [19–28] enable the estimation of correlated uncertainty in free-energy differences. Still, reliably estimating the correlation between free energies is computationally hard, as it requires sampling from ensembles of electronic structure models [25, 26] and, hence, repeated first-principles calculations for all species of the chemical system studied (including transition states and possibly other structures along the reaction pathway). Even if machine learning models were employed, a comprehensive training set based on a vast number of electronic structure calculations would be necessary [50]. We have already demonstrated the steps required for the propagation of correlated uncertainty in activation free energies for a model network of the formose reaction [26]. Here, we will generalize the procedure for the study of arbitrary chemical systems.

The estimation of uncertainty of a target quantity from the joint probability distribution of model parameters is referred to as forward uncertainty quantification, also known as uncertainty propagation. The opposite procedure is referred to as inverse uncertainty quantification and is applied to calibration problems [51, 52] (the backward propagation of concentration uncertainty mentioned above belongs to the latter approach). Every statistical analysis implemented in KiNetX is based on uncertainty propagation. We consider an ensemble of $B + 1$ vectors of rate constants that is obtained by drawing $B$ samples from the joint probability distribution of activation free energies with mean $\mathrm{E}[\mathbf{A}] = \mathbf{A}_0$ and variance $\mathrm{V}[\mathbf{A}] = \boldsymbol{\Sigma}_A$, which are subsequently mapped to rate constants based on, e.g., Eyring's transition state theory [53]. Each sampled vector of rate constants is labeled $\mathbf{k}_b$ with $b \in \{1, \cdots, B\}$. An additional vector, $\mathbf{k}_0$, is directly obtained from $\mathbf{A}_0$. The ensemble of rate constant vectors, $\mathbf{K}_B = \{\mathbf{k}_0, \mathbf{k}_1, \cdots, \mathbf{k}_B\}$, is the basis for any bottom-up uncertainty analysis in kinetic modeling and is taken into account explicitly by every subalgorithm of KiNetX.
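The construction of the ensemble can be sketched in Python as follows. This is a minimal illustration, not the KiNetX implementation: it assumes that the joint distribution is a multivariate normal fully specified by its mean and covariance matrix, that free energies are given in J/mol, and that the Eyring equation is used without a transmission coefficient. All function names and numerical values are hypothetical.

```python
import math
import random

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R = 8.314462618       # molar gas constant, J/(mol K)

def eyring(a, temperature=298.15):
    """Rate constant from an activation free energy a (J/mol)."""
    return K_B * temperature / H * math.exp(-a / (R * temperature))

def cholesky(cov):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low

def rate_constant_ensemble(a0, cov, n_samples, seed=0):
    """Return [k_0, k_1, ..., k_B]; k_0 is computed from the mean a0 itself."""
    rng = random.Random(seed)
    low = cholesky(cov)
    ensemble = [[eyring(a) for a in a0]]
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in a0]
        # Correlated sample: a = a0 + L z, with L the Cholesky factor of cov.
        a = [a0[i] + sum(low[i][j] * z[j] for j in range(i + 1))
             for i in range(len(a0))]
        ensemble.append([eyring(x) for x in a])
    return ensemble

# Two strongly correlated barriers (illustrative values in J/mol and (J/mol)^2).
a0 = [80e3, 85e3]
cov = [[9e6, 8e6], [8e6, 9e6]]
ensemble = rate_constant_ensemble(a0, cov, n_samples=100)
```

Because the two barriers in this toy input are strongly correlated, sampled rate constants tend to shift together, which is the effect that independent sampling of each barrier would miss.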

Note that the setup introduced here neglects third- and higher-order moments of the joint probability distribution of activation free energies, which may be a questionable assumption for actual reaction networks derived from first principles. To avoid these limitations, one can always sample directly from the ensemble of electronic structure models [25, 26], which requires repeated first-principles calculations and is therefore rather inefficient compared to sampling from $\boldsymbol{\Sigma}_A$. Another possibility, not yet explored by us, is the application of matrix algebra to construct special matrices that simplify expressions for higher-order moments of joint probability distributions (in particular, skewness and kurtosis) [54].

2.3 Overview of the KiNetX Meta-Algorithm

All reaction networks analyzed with KiNetX in this work were generated with AutoNetGen (Section 2.6). Both algorithms are written in Matlab [55]. For the numerical integration of (generally stiff) ODE systems, we interface to the ode15s module [56] of Matlab.

The KiNetX workflow consists of three core algorithms (Sections 2.4.1–2.4.3), all of which take the correlated uncertainty in the underlying model parameters (rate constants) explicitly into account:

1. Uncertainty propagation: Solve an ensemble of kinetic models derived from a unique reaction network and estimate the kinetic relevance of every species based on the maximum rate of formation (Section 2.4.1).

2. Network reduction: Identify and eliminate all kinetically irrelevant vertices and edges of the network by applying a hierarchy of flux analyses, resulting in a sparse network that is a comprehensible representation of the underlying reaction mechanism (Section 2.4.2).

3. Sensitivity analysis: Determine the effect of rate constant perturbations on time-dependent species concentrations through an extended version of Morris screening [57] (Section 2.4.3).

The minimal input requirements for KiNetX are:

• A chemical reaction network with $N$ vertices and $2L$ unidirectional edges (provided by AutoNetGen in this work).

• A set of $N$ initial species concentrations, $\mathbf{y}_0$ (provided by the user).

• An ensemble of $B + 1$ sets of $2L$ rate constants each,

$$\mathbf{K}_B = \{\mathbf{k}_b\}, \tag{10}$$

with $b = 0, \cdots, B$, which may be derived from an ensemble of electronic structure models based on, e.g., density functional theory [22, 24, 25] (provided by AutoNetGen in this work).

• A maximum reaction time, $t_{\max}$, representing a practical time scale or a time scale of interest (provided by the user).

At present, we require the input to be provided in SI units. Optional input parameters are:

• The maximum tolerable concentration error, $\varepsilon_y$, between the exact and an approximate solution.

• A flux threshold, $G_{\min}$, above which a chemical species will be considered kinetically relevant.

• The number of log-distributed time points, $U + 1$, between $t = 0$ and $t = t_{\max}$ at which species concentrations will be evaluated based on cubic spline interpolation.

• A confidence level, $\gamma$, important for assessing network properties derived from an ensemble of kinetic models.

• The number of Morris samples, $C$, considered in our global sensitivity analysis.

• Absolute and relative concentration error tolerances considered during numerical integration of ODE systems.

Every execution of KiNetX is based on one network with a fixed number of vertices and edges. Hence, for steering the exploration of reaction networks, KiNetX needs to be executed repeatedly. We designed KiNetX to suggest the next exploration step by estimating the kinetic relevance of every species based on the maximum rate of formation (Section 2.5). We will make the KiNetX program available through our webpage (scine.ethz.ch) in 2019.

Currently, KiNetX is limited to the analysis of homogeneous and isothermal reactive chemical systems in dilute solution, which constitute a major category of chemical systems found in nature and examined in chemical research. One prominent subcategory thereof that attracts much attention in fundamental and industrial research is homogeneous catalysis, a field that is rather underrepresented in the kinetic modeling literature [58].

For the study of dilute systems, collisions involving three or more reactive species can be considered negligible from a statistical point of view. For this reason, each element of the forward and backward reaction rate vectors $\mathbf{f}^+$ and $\mathbf{f}^-$ is of the form

$$f_l^{+/-} = k_l^{+/-} \, y_{n_{l1}^{+/-}} \, y_{n_{l2}^{+/-}}, \tag{11}$$

where the vertex index $n$ is determined by the edge index ($l$), the direction (+ or −), and the (arbitrary) position in the rate equation ($i = 1, 2$). In the case of a unimolecular reaction, we simply set one of the two vertex indices to $n_{li}^{+/-} = 0$, denoting a hypothetical null-species with a defined constant value of $y_0 = 1$, independent of the unit of measurement.
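The null-species convention of Eq. (11) lets unimolecular and bimolecular steps share one code path. A minimal Python sketch follows; the function name `pair_rate` and the storage layout (null-species at list index 0) are assumptions of this example.

```python
def pair_rate(k, y, n1, n2):
    """Eq. (11): f = k * y[n1] * y[n2]; index 0 is the null-species (y[0] = 1)."""
    return k * y[n1] * y[n2]

# Index 0 holds the null-species; indices 1..N hold the real species.
y = [1.0, 0.2, 0.5]
rate_uni = pair_rate(4.0, y, 1, 0)   # unimolecular step of species 1
rate_bi = pair_rate(4.0, y, 1, 2)    # bimolecular step of species 1 and 2
```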

2.4 The KiNetX Workflow

2.4.1 Uncertainty Propagation

The first step of our KiNetX workflow is the numerical integration of an ensemble of $B + 1$ kinetic models, each of them representing a unique set of rate constants. Clearly, the user may also choose $B = 0$ (leading to the usual kinetic modeling setup comprising a single numerical integration), but KiNetX is actually designed to analyze an ensemble of kinetic models.

To assess the magnitude of noise in the $B$ solutions ($y$-uncertainty) resulting from the ensemble of sets of rate constants ($k$-uncertainty), we introduce two measures,

$$\delta_B(t_u) = \frac{1}{2B} \sum_{b=1}^{B} \sum_{n=1}^{N} \big| y_{nb}(t_u) - \langle y_n(t_u) \rangle \big|, \tag{12}$$

where

$$\langle y_n(t_u) \rangle = \frac{1}{B} \sum_{b=1}^{B} y_{nb}(t_u) \tag{13}$$

is the mean value of concentrations at time $t = t_u$, and

$$\Delta_B = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_B(t_{u-1/2}) \cdot (t_u - t_{u-1}). \tag{14}$$

The factor $t_{\max}$ in the denominator of Eq. (14) equals the sum of time differences given that $t_0 = 0$,

$$t_{\max} = t_U = t_U - t_0 = \sum_{u=1}^{U} (t_u - t_{u-1}). \tag{15}$$

Furthermore, we define

$$t_{u-1/2} = \frac{1}{2}\,(t_{u-1} + t_u). \tag{16}$$

According to Eq. (12), $\delta_B(t_u)$ represents the ensemble-averaged $y$-uncertainty summed over all species at time $t = t_u$. Here, we focus on $\delta_B(t_{\max})$, as it measures the variability in the product distribution, a network property of particular interest for synthetic chemists. By contrast, $\Delta_B$ represents a time-averaged $y$-uncertainty, i.e., the overall variability of species concentrations between $t_0 = 0$ and $t_U = t_{\max}$. The combination of both measures may be valuable for determining the underlying reaction mechanism. For instance, if $\delta_B(t_{\max})$ is close to zero but $\Delta_B$ is rather large, it is likely that we are faced with one or both of the following scenarios: (i) The sequence of elementary steps is identical or very similar for some $k$-vectors, but the uncertainty of the activation barrier associated with the rate-determining step is significant; (ii) different $k$-vectors suggest different routes to the same metastable sink of the reaction network. Furthermore, if both $\delta_B(t_{\max})$ and $\Delta_B$ are below a user-defined concentration error $\varepsilon_y$, we can safely neglect the ensemble of kinetic models and base all further kinetic analysis on the nominal sample represented by $\mathbf{k}_0$ only.
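Both measures are simple to evaluate once all solutions are available on a common time grid. The following Python sketch illustrates Eqs. (12)–(14) under the assumption that the concentrations at the required time points (including the midpoints of Eq. (16)) have already been obtained; the function names and the toy data are hypothetical.

```python
def delta_b(samples):
    """Eq. (12) at one time point.

    samples[b][n] is the concentration of species n in sampled model b
    (b = 1..B; the nominal model k_0 is not part of the sum).
    """
    n_models, n_species = len(samples), len(samples[0])
    # Eq. (13): ensemble mean of each species concentration.
    means = [sum(s[n] for s in samples) / n_models for n in range(n_species)]
    total = sum(abs(s[n] - means[n]) for s in samples for n in range(n_species))
    return total / (2 * n_models)

def time_average(delta_mid, t):
    """Eq. (14): delta_mid[u-1] is delta at the midpoint t_{u-1/2}; t[0] = 0."""
    weighted = sum(d * (t[u] - t[u - 1]) for u, d in enumerate(delta_mid, start=1))
    return weighted / t[-1]

# Two sampled models that predict opposite products give the largest spread.
spread = delta_b([[1.0, 0.0], [0.0, 1.0]])
no_spread = delta_b([[0.3, 0.7], [0.3, 0.7]])
averaged = time_average([0.5, 0.5], [0.0, 1.0, 4.0])
```

In the toy input, two models that route all material to different products yield 0.5, whereas identical predictions yield zero.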

When applying Eqs. (12) and (14), it is important that the number and identity of the time points $t_u$ are the same for all solutions. However, it is very unlikely that the time points obtained from an ODE solver are identical for two different $k$-vectors. To resolve this issue, KiNetX calculates a log-distributed vector of reference time points, $(t_0, t_1, \cdots, t_U)^\top$ with $t_0 = 0$ and $t_U = t_{\max}$, which is a function of the user-defined parameters $t_{\max}$ and $U$. To obtain species concentrations at the same time points for every $k$-vector, KiNetX interpolates the corresponding solutions through cubic spline smoothing. Note that spline smoothing is only reasonable if data noise is negligible, a property which one can assess easily in the case of one control variable (here, time). To evaluate the reliability of the cubic spline interpolations, we compared them against results from Gaussian process regression [59]. Under suitable assumptions, Gaussian process regression yields an optimal trade-off between fitting and smoothness (cubic splines only ensure the latter property). Hence, it can be employed to predict concentrations at previously unconsidered values in the time domain (here, the domain between $t_0 = 0$ and $t_U = t_{\max}$). Furthermore, as Gaussian process regression is a strictly Bayesian method, one may obtain reliable uncertainty estimates for each prediction. However, as this regression method scales cubically with the number of data points, it is not suitable for repetitive applications (usually, hundreds of regressions would be necessary). In all cases studied, we found that both the mean deviation between the two interpolation methods (cubic spline smoothing versus Gaussian process regression) and the mean predictive variance obtained from Gaussian process regression are negligibly small compared to the mean variance of the concentrations themselves (by a factor of $< 10^{-6}$). Hence, data noise is indeed negligible, and cubic splines work well for interpolating concentration profiles.
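The text does not specify how the log-distributed grid is constructed. One construction that depends only on t_max and U and includes t_0 = 0 is to space log(1 + t) uniformly. The Python sketch below uses this construction purely as an assumption; the actual KiNetX grid may differ, and the function names are hypothetical. The spline interpolation step itself is not shown.

```python
import math

def log_time_grid(t_max, n_intervals):
    """Return [t_0, ..., t_U] with t_0 = 0, t_U = t_max, uniform in log(1 + t)."""
    step = math.log1p(t_max) / n_intervals
    grid = [math.expm1(u * step) for u in range(n_intervals + 1)]
    grid[0], grid[-1] = 0.0, t_max   # pin the end points against round-off
    return grid

def midpoints(grid):
    """Eq. (16): t_{u-1/2} = (t_{u-1} + t_u) / 2."""
    return [0.5 * (a + b) for a, b in zip(grid, grid[1:])]

grid = log_time_grid(999.0, 3)   # approximately [0, 9, 99, 999]
mids = midpoints(grid)
```

Note that this particular construction depends on the unit of time, which is unproblematic only because the input is required in SI units.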

2.4.2 Network Reduction

We introduce a hierarchy of two reduction algorithms with increasing sophistication (in terms of both rigor and required computing resources), which identify and eliminate kinetically irrelevant species and elementary reactions. Initially, we perform a detailed flux analysis, followed by a local barrier analysis if the former analysis suggests a reduced model resulting in a concentration error that exceeds the user-defined threshold $\varepsilon_y$. Both algorithms reflect our interpretation of established kinetic modeling concepts [34].

Detailed Flux Analysis (DFA). To keep track of the net concentration that has passed an edge between $t = 0$ and $t = t_{\max}$, we integrate the absolute values of $f_l(t)$ over that time interval,

$$F_l = \int_{t=0}^{t=t_{\max}} |f_l(t)| \, \mathrm{d}t, \tag{17}$$

which we define as the edge flux corresponding to the $l$-th reaction pair. The vector of edge fluxes, $\mathbf{F}$, is the basis for determining the vertex flux corresponding to the $n$-th species,

$$G_n = (\mathbf{s}_n^+ + \mathbf{s}_n^-)\,\mathbf{F}, \tag{18}$$

where $\mathbf{s}_n^+$ and $\mathbf{s}_n^-$ represent the $n$-th row of $\mathbf{S}^+$ and $\mathbf{S}^-$, respectively. Our DFA implementation approximates the edge flux according to

$$F_l \approx F_l^{\mathrm{DFA}} = \sum_{u=1}^{U} |f_l(t_{u-1/2})| \cdot (t_u - t_{u-1}). \tag{19}$$

Analogous to the exact expression, the approximate vertex flux reads

$$G_n^{\mathrm{DFA}} = (\mathbf{s}_n^+ + \mathbf{s}_n^-)\,\mathbf{F}^{\mathrm{DFA}}. \tag{20}$$

If $G_n^{\mathrm{DFA}} < G_{\min}$, where $G_{\min}$ is a user-defined threshold, the $n$-th vertex will be removed from the kinetic model, except for the case where the $n$-th vertex represents a reactant. All edges that were connected to at least one of the removed vertices will be removed, too. After this procedure, the number of vertices and edges should have decreased significantly.

Note that, in the case of a closed system, $G_n^{\mathrm{DFA}}$ approaches a finite value with increasing time, which renders $G_{\min}$ an intuitive threshold that resembles a concentration error. The assumption behind our DFA is based on this intuitive interpretation: If a reaction channel with a flux smaller than $G_{\min}$ is removed, the redistribution of this minute amount of flux (i.e., concentration) should yield a concentration error that is comparable to $G_{\min}$. Therefore, we have a means at hand that potentially allows us to control the concentration error introduced by a specific value of $G_{\min}$. Clearly, choosing a smaller value of $G_{\min}$ will increase the reliability of the DFA-based solution but will also decrease the possible degree of network reduction.
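The DFA steps of Eqs. (19) and (20), followed by the threshold test, can be sketched in Python as shown below. This is an illustration under stated assumptions rather than the KiNetX code: the net rates at the midpoints are taken as given, vertices are indexed from 0, and all function names and toy numbers are hypothetical.

```python
def edge_fluxes(f_mid, t):
    """Eq. (19): F_l ~ sum_u |f_l(t_{u-1/2})| * (t_u - t_{u-1}).

    f_mid[u-1][l] is the net rate of reaction pair l at the midpoint t_{u-1/2}.
    """
    n_pairs = len(f_mid[0])
    return [
        sum(abs(f_mid[u - 1][l]) * (t[u] - t[u - 1]) for u in range(1, len(t)))
        for l in range(n_pairs)
    ]

def vertex_fluxes(S_plus, S_minus, F):
    """Eq. (20): G_n = (s_n^+ + s_n^-) . F for every vertex n."""
    return [
        sum((sp + sm) * fl for sp, sm, fl in zip(row_p, row_m, F))
        for row_p, row_m in zip(S_plus, S_minus)
    ]

def reduce_network(S_plus, S_minus, G, g_min, reactants):
    """Return (kept vertices, kept edges) after applying the threshold G_min."""
    kept_vertices = [n for n, g in enumerate(G) if g >= g_min or n in reactants]
    removed = set(range(len(G))) - set(kept_vertices)
    # An edge is dropped as soon as it touches at least one removed vertex.
    kept_edges = [
        l for l in range(len(S_plus[0]))
        if not any(S_plus[n][l] or S_minus[n][l] for n in removed)
    ]
    return kept_vertices, kept_edges

# Toy network: A <=> B (pair 0, dominant) and A <=> C (pair 1, negligible).
S_plus = [[1, 1], [0, 0], [0, 0]]
S_minus = [[0, 0], [1, 0], [0, 1]]
t = [0.0, 1.0, 3.0]
F = edge_fluxes([[0.4, 1e-6], [0.1, 1e-6]], t)
G = vertex_fluxes(S_plus, S_minus, F)
kept = reduce_network(S_plus, S_minus, G, g_min=1e-3, reactants={0})
```

In this toy case, species C and the edge A ⇌ C fall below the threshold and are removed, while the reactant A is kept regardless of its flux.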

To determine the accuracy loss caused by a DFA, we introduce the measures

$$\delta_{pq}(t_u) = \frac{1}{2} \sum_{n=1}^{N} \big| y_{np}(t_u) - y_{nq}(t_u) \big| \tag{21}$$

and

$$\Delta_{pq} = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_{pq}(t_{u-1/2}) \cdot (t_u - t_{u-1}), \tag{22}$$

which resemble the control quantities $\delta_B(t_u)$ and $\Delta_B$. Here, however, we compare two specific solutions, $\mathbf{y}_p(t)$ and $\mathbf{y}_q(t)$, with each other, one resulting from the original network and the other resulting from the corresponding sparse network obtained through our DFA. Note that the number of vertices differs for the original and sparse networks. Hence, to render Eq. (21) practical, we define $N$ to be the number of vertices contained in the original network and set all elements in $\mathbf{y}_{\mathrm{sparse}}(t)$ to zero that refer to vertices not contained in the sparse network. If all of the $B + 1$ kinetic models yield both $\delta_{pq}(t_u)$-values (again, we focus on $t_u = t_U = t_{\max}$) and $\Delta_{pq}$-values that are below a user-defined concentration error $\varepsilon_y$, KiNetX considers the DFA-based network reduction reliable. However, if there is at least one kinetic model that yields $\delta_{pq}(t_{\max}) > \varepsilon_y$ or $\Delta_{pq} > \varepsilon_y$, it depends on the user-defined confidence level $\gamma$ whether the DFA-based network reduction is considered reliable or not. We require the confidence level to fulfill the condition $0 \le \gamma \le 1$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeds $\varepsilon_y$, the DFA-based network reduction is considered unreliable. Here, the lowest and highest possible quantiles represent the median and the maximum of both measures over all $B + 1$ samples, respectively.

Local Barrier Analysis (LBA). There are certain cases in which a DFA-based network reduction is prone to yield unreliable results. One example is autocatalysis. Imagine the following model reaction network:

2A ⇌ B (very slow)
B ⇌ C (fast)
C + A ⇌ D (fast)
D + A ⇌ E (fast)
E ⇌ 2C (fast)

Even though the first reaction step is very slow, it is required to initiate all other reaction steps. Without B, neither C nor D nor E can be formed. Even though B is obviously a very important species for the reaction mechanism, it will only be relevant at the very beginning of the reaction (i.e., until a minute amount of it has been formed). Hence, the vertex flux for B, $G_{\mathrm{B}}$, will be very small, possibly smaller than the user-defined threshold $G_{\min}$, leading to a false elimination of this species and of the first two of the five edges (both of which involve B).

In this case, $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ will clearly exceed $\varepsilon_y$, and the LBA will start automatically. The idea behind our LBA algorithm is fairly simple: Set the rate constants of the first reaction pair to zero, which is equivalent to an infinitely high activation barrier or the removal of the corresponding edge. Repeat numerical integration and compare the solution to the original one based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Set the rate constants of the first reaction pair to their original values. Repeat the entire procedure for the second, ..., $L$-th reaction pair and for each of the $B + 1$ kinetic models.

After this procedure, every reaction pair is associated with $B + 1$ values for $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the corresponding reaction pair will be considered kinetically irrelevant and removed from the network. All unconnected vertices will be removed, too. Subsequently, the resulting $B + 1$ reduced kinetic models are integrated, and their solutions are compared to the original ones, again based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the LBA-based network reduction will be considered reliable. It is still possible that $\varepsilon_y$ is exceeded, as the kinetic models we consider are generally nonlinear. If the lack of one edge does not alter the solution and the same result is obtained for another edge, it does not imply that the simultaneous lack of both edges leads to the same conclusion. If $\varepsilon_y$ is still exceeded after the LBA-based network reduction, KiNetX will recover the original network and continue its analysis without any reduction.
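The quantile test that decides whether a reduced network is accepted can be sketched in Python as follows. The text does not state how quantiles are estimated from the B + 1 values, so the linear interpolation used here is an assumption, as are the function names and the toy data. The sketch also requires both error measures to pass, which is one reading of the "and/or" wording above.

```python
def quantile(values, q):
    """Linearly interpolated q-quantile (0 <= q <= 1) of a list of numbers."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

def reduction_is_reliable(delta_tmax, delta_avg, eps_y, gamma):
    """True if the (1 + gamma)/2 quantiles of both error measures stay within eps_y."""
    q = (1.0 + gamma) / 2.0
    return quantile(delta_tmax, q) <= eps_y and quantile(delta_avg, q) <= eps_y

# B + 1 = 5 kinetic models; a single model violates the tolerance eps_y = 0.01.
delta_tmax = [0.001, 0.002, 0.003, 0.004, 0.2]
delta_avg = [0.001] * 5
accept_median = reduction_is_reliable(delta_tmax, delta_avg, 0.01, gamma=0.0)
accept_maximum = reduction_is_reliable(delta_tmax, delta_avg, 0.01, gamma=1.0)
```

With γ = 0 the test looks at the median and tolerates the single outlier, whereas γ = 1 looks at the maximum and rejects the reduction.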

2.4.3 Sensitivity Analysis

In the following, we assume that the concentration profiles of the sparse reaction network reveal pronounced uncertainties such that it is difficult to correctly assign product ratios or to suggest a specific reaction mechanism. To estimate to which rate constants the concentration profiles are most sensitive, a sensitivity analysis is required [34]. For this purpose, the rate constants are perturbed to study how such perturbations affect the time-dependent species concentrations.

In a local sensitivity analysis, the model parameters are perturbed one by one from their nominal values (here, the elements of $\mathbf{k}_0$). While a local sensitivity analysis is straightforward to conduct and computationally feasible (usually, $L$ kinetic simulations are required), it has a significant disadvantage in that it does not take into account the correlation between the model parameters (cf. Section 2.2). Consequently, one may overlook important correlation effects on the uncertainty of concentration profiles. Note that our LBA algorithm resembles an extreme variant of a local sensitivity analysis (cf. Section 2.4.2), where each rate constant is perturbed to its minimal value.

In a global sensitivity analysis, the correlation between the model parameters is taken into account, but the process requires significantly more computational resources than a local sensitivity analysis. Furthermore, the design of a global sensitivity analysis is not as unambiguous as for a local sensitivity analysis, which explains the existence of several approaches that may require $CL$ (Morris screening, where $C$ is an integer usually much smaller than $B$), $B$ (polynomial chaos), or $2BL$ (Sobol's method) numerical integrations [34].

Morris screening [57] is among the simplest of global sensitivity analyses, as it is not designed to induce a mapping between the model parameters and the model solution. Instead, its purpose is to categorize the model parameters as either important or unimportant, depending on how strongly changes in them affect the model solutions. We are particularly interested in this categorization since we aim for a descriptor that informs us about the quality of rate constants obtained from an efficient (basic) quantum chemical model. If we find that the uncertainty of some species concentrations is too large to derive sensible conclusions about specific network properties, the results of a global sensitivity analysis will support us in identifying the most critical rate constants that require a reevaluation based on a more sophisticated (benchmark) quantum chemical model.

The original version of Morris screening does not take into account the joint probability distribution of the model parameters. Here, we introduce an extended variant of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample $\mathbf{k}_0$ and replaces the elements $k^{+}_{01}$ and $k^{-}_{01}$ with the corresponding elements of $\mathbf{k}_1$, $k^{+}_{11}$ and $k^{-}_{11}$. For this new vector of rate constants, $\mathbf{k}_{01}$, a numerical integration is carried out. Subsequently, the elements $k^{+}_{02}$ and $k^{-}_{02}$ of $\mathbf{k}_{01}$ are replaced with the elements $k^{+}_{12}$ and $k^{-}_{12}$ of $\mathbf{k}_1$, and a numerical integration based on the new vector, $\mathbf{k}_{02}$, is carried out. This procedure is repeated $L$ times until we arrive at $\mathbf{k}_{0L} = \mathbf{k}_1$, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is now repeated by an element-wise change from $\mathbf{k}_1$ to $\mathbf{k}_2$ and, generally, by an element-wise change from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ until $b = C - 1$ is reached. In the end, we will have generated solutions for another $C(L - 1)$ kinetic models in addition to the $B + 1$ solutions obtained for $\mathbf{K}_B$. We are interested in the $C(L - 1)$ new solutions and the first $C + 1$ of the $B + 1$ solutions obtained previously, amounting to $CL + 1$ solutions relevant for our sensitivity analysis.
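The element-wise walk from one sampled rate-constant vector to the next can be sketched in a few lines of Python. The function name `morris_chain`, the data layout (a list of `(k_plus, k_minus)` tuples per sample), and the numerical values are hypothetical; the sketch only reproduces the bookkeeping described above and confirms the counts $C(L-1)$ and $CL+1$ for a small example.

```python
def morris_chain(ensemble, C):
    """Build the vectors k_{b,l} for b = 0..C-1 and l = 0..L.

    ensemble: list of at least C + 1 sampled k-vectors; each k-vector is a
    list of L (k_plus, k_minus) tuples, one per reaction pair.
    """
    L = len(ensemble[0])
    chain = {}
    for b in range(C):
        current = list(ensemble[b])
        chain[(b, 0)] = tuple(current)               # k_{b,0} = k_b
        for l in range(1, L + 1):
            current[l - 1] = ensemble[b + 1][l - 1]  # swap in pair l of k_{b+1}
            chain[(b, l)] = tuple(current)
    return chain

# Hypothetical ensemble: B + 1 = 4 samples, L = 3 reaction pairs, all values distinct.
ensemble = [[(10.0 * b + l + 0.5, 10.0 * b + l + 0.25) for l in range(3)]
            for b in range(4)]
C, L = 2, 3
chain = morris_chain(ensemble, C)

distinct = set(chain.values())                       # all vectors needed for the analysis
already_integrated = {tuple(k) for k in ensemble}    # solutions available beforehand
new_models = distinct - already_integrated           # vectors that still need integration

print(len(new_models), len(distinct))
```

Because every perturbation swaps in values from an actual sample, each intermediate vector stays within the range spanned by the sampled rate constants.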

For each of these solutions, KiNetX determines the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics, where $p$ and $q$ represent two adjacent k-vectors as constructed by our extended version of Morris screening. $CL$ values are obtained for each metric, $C$ thereof for every reaction pair. We define the sensitivity coefficients $z^{\delta}_l$ and $z^{\Delta}_l$ as

\[
z^{\delta}_l = \frac{1}{C} \sum_{b=0}^{C-1} \delta_{bl,\,b(l-1)}(t_{\max}) \tag{23}
\]

and

\[
z^{\Delta}_l = \frac{1}{C} \sum_{b=0}^{C-1} \Delta_{bl,\,b(l-1)} \tag{24}
\]

respectively. The subscript $bl$ refers to sample $\mathbf{k}_{bl}$, and we define $\mathbf{k}_{b0} = \mathbf{k}_b$. Note that in Eqs. (23) and (24) we do not divide the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term "conditional" relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.

Finally, we can set up a ranking of sensitivity coefficients. The larger $z^{\delta}_l$ and $z^{\Delta}_l$, the larger the effect of changes in $k^{+/-}_l$ on the concentration profiles. With this ranking at hand, one can systematically improve on both the accuracy and precision of the concentration profiles to reliably suggest product distributions (based on $z^{\delta}_l$) or specific reaction mechanisms (based on $z^{\Delta}_l$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.
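A minimal Python sketch of Eqs. (23) and (24) and the subsequent ranking is given below. The dictionary layout `metric[(b, l)]`, the function names, and the numerical values are hypothetical; in practice the metric values would come from comparing the solutions of adjacent k-vectors.

```python
def sensitivity_coefficients(metric, C, L):
    """Average a metric over the C Morris samples for each reaction pair l = 1..L.

    metric[(b, l)] is delta(t_max) or Delta between the solutions obtained
    for k_{b,l} and k_{b,l-1}.
    """
    return {l: sum(metric[(b, l)] for b in range(C)) / C for l in range(1, L + 1)}

def ranking(z):
    """Reaction pairs sorted from largest to smallest sensitivity coefficient."""
    return sorted(z, key=z.get, reverse=True)

# Hypothetical delta values (mol/L) for C = 2 Morris samples and L = 3 reaction pairs.
delta = {(0, 1): 0.02, (1, 1): 0.04,
         (0, 2): 0.30, (1, 2): 0.10,
         (0, 3): 0.00, (1, 3): 0.01}
z_delta = sensitivity_coefficients(delta, C=2, L=3)
order = ranking(z_delta)
print(order)
```

The same two functions apply unchanged to the $\Delta$ metric, which yields the second ranking used for mechanistic questions.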

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be infeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well known in the reaction engineering community. It has been introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas-phase reactions (in particular, combustion) [62], which has recently been extended to understand the kinetics of solution-phase chemistry [63]. Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution-phase chemistry. As the relative uncertainty that a given free-energy error induces in the thermal rate constant is a decreasing function of temperature in classical rate theories, k-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take k-uncertainty into account.
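The temperature argument can be made concrete with a short Python sketch. In Eyring-type rate expressions, $k \propto \exp(-\Delta A^{\ddagger}/RT)$, so an error $\sigma$ in the activation free energy changes $k$ by a factor of $\exp(\sigma/RT)$. The function name, the error of 1 kcal mol$^{-1}$, and the temperature of 1500 K chosen to represent combustion conditions are assumptions for illustration.

```python
from math import exp

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def rate_uncertainty_factor(sigma_kcal, temperature):
    """Factor by which k changes if the activation free energy is off by sigma."""
    return exp(sigma_kcal / (R_KCAL * temperature))

room = rate_uncertainty_factor(1.0, 298.15)        # room-temperature solution chemistry
combustion = rate_uncertainty_factor(1.0, 1500.0)  # assumed combustion-like temperature
print(f"{room:.1f} {combustion:.1f}")
```

The same free-energy error thus scales the rate constant several-fold at room temperature but by well under a factor of two at the higher temperature.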

We start from the reactants (species with nonzero initial concentrations) and attempt to find all direct products, i.e., intermediates that are formed by a single elementary reaction of the reactants. The reactants are considered active, whereas the direct products are considered inactive. Only active species are considered in the exploration procedure, i.e., at this point, all possible reaction steps have been discovered (according to the exploration algorithm employed). To estimate which of the inactive species is potentially the most relevant for the mechanism, we focus on the formation rate of each inactive species (indicated by an asterisk),

\[
\vec{g}_{n^*}(t_u) = \mathbf{s}^{+}_{n^*} \mathbf{f}^{-}(t_u) + \mathbf{s}^{-}_{n^*} \mathbf{f}^{+}(t_u). \tag{25}
\]

Note that $\mathbf{s}^{+}_{n^*}$ is multiplied with $\mathbf{f}^{-}(t)$ since in a backward ($-$) reaction the left-hand-side species ($+$) are formed. An equivalent argument holds for the multiplication of $\mathbf{s}^{-}_{n^*}$ with $\mathbf{f}^{+}(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

\[
\vec{g}^{\max}_{n^*} = \max\left\{ \vec{g}_{n^*}(t_0), \vec{g}_{n^*}(t_1), \cdots, \vec{g}_{n^*}(t_U) \right\}. \tag{26}
\]

The species with the highest value of $\vec{g}^{\max}_{n^*}$ will be promoted to active. Note that if an ensemble of k-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

\[
\eta\left(\vec{g}^{\max}_{n^*}\right) = \sqrt{ \left\langle \vec{g}^{\max}_{n^*} \right\rangle^{2} + \frac{1}{B - 1} \sum_{b=1}^{B} \left( \vec{g}^{\max}_{n^*,b} - \left\langle \vec{g}^{\max}_{n^*} \right\rangle \right)^{2} }, \tag{27}
\]

that takes into account both the ensemble average of maximum formation rates,

\[
\left\langle \vec{g}^{\max}_{n^*} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \vec{g}^{\max}_{n^*,b}, \tag{28}
\]

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
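Equations (25) to (28) can be sketched in Python as follows. The sketch assumes that the products in Eq. (25) denote sums over all reaction pairs, with one stoichiometric coefficient and one rate per pair; the function names and all numerical values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def max_formation_rate(s_plus, s_minus, f_plus, f_minus):
    """Maximum over all time points of the formation rate of one inactive species.

    s_plus[l], s_minus[l]: coefficients of the species on the left-hand and
    right-hand side of reaction pair l.
    f_plus[u][l], f_minus[u][l]: forward and backward rates of pair l at time t_u.
    """
    rates = []
    for fp_u, fm_u in zip(f_plus, f_minus):
        # Left-hand-side species are formed by backward reactions and vice versa.
        rates.append(sum(sp * fm + sm * fp
                         for sp, sm, fp, fm in zip(s_plus, s_minus, fp_u, fm_u)))
    return max(rates)

def eta(g_max_samples):
    """Ranking score: square root of (ensemble mean squared plus sample variance)."""
    avg = mean(g_max_samples)
    return sqrt(avg ** 2 + variance(g_max_samples))  # variance uses the 1/(B - 1) form

# One species, two reaction pairs, two time points (hypothetical rates).
g_max = max_formation_rate(s_plus=[1, 0], s_minus=[0, 1],
                           f_plus=[[2.0, 3.0], [1.0, 4.0]],
                           f_minus=[[0.5, 0.1], [0.2, 0.1]])

# Two inactive species with equal mean but different spread of maximum rates.
eta_x = eta([1.0, 1.0, 1.0])
eta_y = eta([0.5, 1.0, 1.5])
print(g_max, eta_x, eta_y)
```

The second species obtains the larger score although both means agree, which reflects the stated preference for the high-variance case.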

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\vec{g}^{\max}_{n^*}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold: In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between $t = 0$ and $t = t_{\max}$) are being generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

• The number of samples, $B$, to be drawn from $\Sigma_A$.

• A thermostat temperature, $T$, for the calculation of rate constants.

• The average free-energy increase/decrease, $\mu(A^{-} - A^{+})$, for a new intermediate state ($-$) formed by reaction of an intermediate state already present in the network ($+$).

• The average difference between the free energies of a left-hand-side intermediate state ($+$) and its corresponding right-hand-side intermediate state ($-$), $\sigma(A^{-} - A^{+})$.

• A minimum activation free energy, $\min(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

• A maximum activation free energy, $\max(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

• The average free-energy uncertainty, $\langle \sigma_A \rangle$ (required for generating an ensemble of kinetic models).

• The maximum number of edges, $L_{\max}$, to be generated.

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of $N = 103$ vertices and $L = 118$ edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L$^{-1}$ and 0.50 mol L$^{-1}$, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of $B + 1 = 25$ k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol$^{-1}$. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol$^{-1}$. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature of $T = 298.15$ K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L$^{-1}$ during the course of reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at $t = t_{\max}$. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L$^{-1}$ and 0.9 mol L$^{-1}$, the observable value of which will in turn affect the concentrations of the potential side products 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of $\varepsilon_y = 0.01$ mol L$^{-1}$, we find that a DFA-based network reduction with $G_{\min} = 1.0 \times 10^{-3}$ mol L$^{-1}$ is reliable with respect to both $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Here, we chose the minimum confidence level of $\gamma = 0$ for the sake of clarity. The smaller $\gamma$, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose $G_{\min}$ to be one order of magnitude smaller than $\varepsilon_y$. The reason is that $G_{\min}$ is assessed on a species-by-species basis, whereas $\varepsilon_y$ is compared against the quantities $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$, which measure the concentration error summed over all species. One may resolve this situation by dividing $G_{\min}$ by $N$, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA analysis varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X, consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For $C = 5$ Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeding $\varepsilon_y$ with respect to the quantile specified by $\gamma$. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L$^{-1}$. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L$^{-1}$, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval $t_u - t_{u-1}$, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42, the other dominant side product, via 30 and 14. In summary, the dominant reaction pathways read

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than $\varepsilon_y$, as measured by $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges, which corresponds to an effective network reduction of about 40%. This is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.
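The two reduction percentages can be reproduced from the element counts given in this section, assuming that vertices and edges are counted together as network elements (the full network CRN-X has 103 vertices and 118 edges):

```latex
\[
1 - \frac{63 + 68}{103 + 118} = 1 - \frac{131}{221} \approx 0.41,
\qquad
1 - \frac{18 + 21}{103 + 118} = 1 - \frac{39}{221} \approx 0.82 .
\]
```

The first value corresponds to the kinetics-guided exploration and the second to the DFA-based reduction of the fully explored network.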

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol$^{-1}$ with a standard deviation of 0.4 kcal mol$^{-1}$. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $\mathbf{k}_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

Table 1: Mean values (μ) and standard deviations (σ) of properties ζ obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                        μ(ζ_CRN-0)  μ(ζ_CRN-B)  σ(ζ_CRN-0)  σ(ζ_CRN-B)
exploration steps                 12          14          7           7
critical reactions                5           10          4           7
L_expl                            54          60          26          27
N_expl                            29          31          11          11
L_red                             30          37          18          21
N_red                             17          20          8           8
δ_B,expl(t_max) / mol L^-1        -           0.14        -           0.13
Δ_B,expl / mol L^-1               -           0.15        -           0.13
δ_B,refined(t_max) / mol L^-1     -           <0.01       -           0.01
Δ_B,refined / mol L^-1            -           <0.01       -           0.01

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several instead of a single solution naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to $1.0 \times 10^{-17}$ mol L$^{-1}$ s$^{-1}$ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, $t_{\max}$), we obtain a flux of $3.7 \times 10^{-13}$ mol L$^{-1}$, which is very small compared to $\varepsilon_y = 0.01$ mol L$^{-1}$. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.
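The order of magnitude of this flux can be checked with a short Python sketch. It assumes that the reaction time is the half life of a unimolecular Eyring process with a barrier of 100 kJ mol$^{-1}$ (as stated for $t_{\max}$ in the Conclusions), a transmission coefficient of one, and the temperature of 298.15 K used for CRN-X; the function and variable names are hypothetical.

```python
from math import exp, log

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # gas constant, J/(mol K)

def eyring_rate_constant(barrier_j_per_mol, temperature):
    """Unimolecular Eyring rate constant in 1/s (transmission coefficient of one)."""
    return K_B * temperature / H * exp(-barrier_j_per_mol / (R * temperature))

T = 298.15                             # assumed; the value reported for CRN-X
k = eyring_rate_constant(100.0e3, T)   # barrier of 100 kJ/mol
t_max = log(2.0) / k                   # half life of a first-order process
flux = 1.0e-17 * t_max                 # mol/L formed at the minimum rate over t_max
print(f"t_max = {t_max:.2e} s, flux = {flux:.1e} mol/L")
```

Under these assumptions, the product of the minimum formation rate and the reaction time agrees with the flux quoted above to two significant figures.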

DFA-based network reduction with a flux threshold of $G_{\min} = 1.0 \times 10^{-5}$ mol L$^{-1}$ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument of the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of $\max\{\delta_B(t_{\max}), \Delta_B\}$ from 0.15 mol L$^{-1}$ to < 0.01 mol L$^{-1}$ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all $B + 1 = 6$ samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
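Why absolute rather than relative free energies should be averaged can be illustrated with a small Python sketch. It uses a closed cycle of three species, for which thermodynamic consistency requires the reaction free energies around the loop to sum to zero. The species labels, the energies, and the restriction to vertices (transition states are left out) are simplifying assumptions for illustration.

```python
from statistics import mean

cycle = [("A", "B"), ("B", "C"), ("C", "A")]  # closed loop A -> B -> C -> A

# Hypothetical absolute free energies (kcal/mol) of three species in three sampled models.
samples = [
    {"A": 0.0, "B": -2.0, "C": 1.0},
    {"A": 0.4, "B": -1.1, "C": 0.2},
    {"A": -0.3, "B": -2.6, "C": 1.5},
]

def reaction_energies(absolute):
    """Reaction free energies along the cycle from absolute free energies."""
    return {(i, j): absolute[j] - absolute[i] for i, j in cycle}

def cycle_sum(relative):
    """Sum of reaction free energies around the loop; zero if consistent."""
    return sum(relative[edge] for edge in cycle)

critical = ("A", "B")

# Refinement on absolute free energies: A and B take their ensemble means in every model.
refined_abs = []
for s in samples:
    r = dict(s)
    for species in critical:
        r[species] = mean(x[species] for x in samples)
    refined_abs.append(reaction_energies(r))

# Refinement on relative free energies: only the A -> B reaction energy is overwritten.
mean_ab = mean(reaction_energies(s)[critical] for s in samples)
refined_rel = []
for s in samples:
    rel = reaction_energies(s)
    rel[critical] = mean_ab
    refined_rel.append(rel)

print([round(cycle_sum(r), 12) for r in refined_abs])
print([round(cycle_sum(r), 12) for r in refined_rel])
```

Averaging absolute energies keeps every model consistent around the loop, whereas overwriting a single reaction energy leaves a nonzero residual in the individual models.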

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of $\max\{\delta_{pq}(t_{\max}), \Delta_{pq}\}$ comparing the CRN-0 solutions against the "exact" solutions amounts to $15.7 \times 10^{-3}$ mol L$^{-1}$, whereas it amounts to only $1.3 \times 10^{-3}$ mol L$^{-1}$ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered 5 additional kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be ambiguous to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, i.e., highly noise-inducing, reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may be strongly dependent on the underlying graph structure and the distribution of activation free energies and rate constants, as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is in principle able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol$^{-1}$ with respect to the higher-energy intermediate) with the reaction time $t_{\max}$, which we set to the half life of a unimolecular reaction whose rate constant corresponds to a barrier height of 100 kJ mol$^{-1}$. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol$^{-1}$, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks.

1. For the construction of the $(x + 1)$-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the $x$-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_\mathrm{exp}$. The user can specify two different values for $\mu_\mathrm{exp}$, one for unimolecular reactions, $\mu_\mathrm{exp,uni}$, and one for bimolecular reactions, $\mu_\mathrm{exp,bi}$. The floor value of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_\mathrm{new})$ ranging from zero to one. If $p(V_\mathrm{new}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_\mathrm{new}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), $2L$ activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprises either one or two vertices) is sampled from a normal distribution with mean $\mu(A^- - A^+)$ and standard deviation $\sigma(A^- - A^+)$, which is added to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is sampled from a uniform distribution with bounds $\min(A^\ddagger - A^{+/-})$ and $\max(A^\ddagger - A^{+/-})$, which is added to the higher-energy side of an edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
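The bookkeeping of steps 1 and 2 can be illustrated with a minimal Python sketch. It is not the AutoNetGen implementation (which is written in Matlab); the function names `count_reactive_states`, `sample_channel_count`, and `creates_new_vertex` are hypothetical, and the numerical arguments are arbitrary illustration values.

```python
import math
import random


def count_reactive_states(n_total, n_new):
    """Count candidate reactive states for the next layer (step 1).

    n_total is the number of registered vertices N, n_new the number of
    vertices N_x in the newest layer.
    """
    n_old = n_total - n_new
    unimolecular = n_new                        # type A -> P
    homobimolecular = n_new                     # type 2A -> P
    hetero_new_new = n_new * (n_new - 1) // 2   # A + B -> P, both species new
    hetero_new_old = n_new * n_old              # A + B -> P, one new, one old
    return unimolecular, homobimolecular, hetero_new_new + hetero_new_old


def sample_channel_count(mean_channels, rng):
    """Floor of an exponential draw with the given mean (step 2)."""
    # random.expovariate expects the rate, i.e., the inverse of the mean.
    return math.floor(rng.expovariate(1.0 / mean_channels))


def creates_new_vertex(p_new, rng):
    """True: form a new vertex; False: link to a vertex of an earlier layer."""
    # rng.random() lies in [0, 1), so p_new = 0 never and p_new = 1 always
    # yields a new vertex.
    return rng.random() < p_new


rng = random.Random(0)
print(count_reactive_states(n_total=10, n_new=4))   # (4, 4, 30)
print([sample_channel_count(2.0, rng) for _ in range(5)])
```

Mass balance and the actual wiring of vertices are deliberately left out of this sketch.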

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$ k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right), \tag{29} $$

where $h$, $k_\mathrm{B}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
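As a minimal sketch of Eq. (29), the following Python snippet converts an activation free energy to a rate constant and evaluates the half life of the corresponding first-order process. The function names are hypothetical, and barriers are assumed to be given in kJ mol$^{-1}$.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant in J/K
H = 6.62607015e-34      # Planck constant in J s
R = 8.314462618         # gas constant in J/(mol K)


def eyring_rate_constant(barrier_kj_mol, temperature=298.15):
    """Rate constant in 1/s for an activation free energy in kJ/mol."""
    barrier_j_mol = 1000.0 * barrier_kj_mol
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_j_mol / (R * temperature))


def half_life(rate_constant):
    """Half life of a first-order (unimolecular) process."""
    return math.log(2.0) / rate_constant


k = eyring_rate_constant(100.0)
print(f"k = {k:.3e} 1/s, half life = {half_life(k):.0f} s")
```

For a barrier of 100 kJ mol$^{-1}$ at 298.15 K, the half life obtained this way is on the order of $3.7 \times 10^4$ s, the same order as the reaction time $t_\mathrm{max}$ used in this study.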

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are strictly nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$ which fulfills the condition that its largest eigenvalue equals $\sigma^2_{A,\mathrm{max}}$. Defining $\sigma^2_{A,\mathrm{max}}$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bound by $\sigma_{A,\mathrm{max}}$.

1. Generate a random $(N \times N)$-dimensional matrix $P$ with elements that are uniformly sampled from $[-0.5, +0.5]$. $N$ is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an $(N \times N)$-dimensional matrix $Q = P P^\top$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$ \Sigma_A = Q \, \frac{\langle\sigma_A\rangle^2}{\max(E_{ii})}, $$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
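The three steps of this recipe can be sketched in plain Python as follows. This is an illustration rather than the original Matlab code: all function names are hypothetical, and the largest eigenvalue is estimated by power iteration (an assumption made here to avoid external libraries; any symmetric eigenvalue solver would serve equally well).

```python
import random


def random_matrix(n, rng):
    """Step 1: N x N matrix with entries drawn uniformly from [-0.5, +0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]


def gram_matrix(p):
    """Step 2: Q = P P^T, symmetric positive semidefinite by construction."""
    n = len(p)
    return [[sum(p[i][k] * p[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def largest_eigenvalue(q, iterations=2000):
    """Power iteration; adequate for a small symmetric PSD matrix."""
    n = len(q)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(q[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    qv = [sum(q[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * qv[i] for i in range(n))   # Rayleigh quotient


def random_covariance(n, sigma_max, rng):
    """Step 3: rescale Q so that its largest eigenvalue equals sigma_max**2."""
    q = gram_matrix(random_matrix(n, rng))
    scale = sigma_max ** 2 / largest_eigenvalue(q)
    return [[scale * x for x in row] for row in q]


cov = random_covariance(4, 2.0, random.Random(42))
print(largest_eigenvalue(cov))   # close to 2.0**2
```

Because the diagonal elements of a positive-semidefinite matrix cannot exceed its largest eigenvalue, every variance in `cov` is bounded by `sigma_max**2`.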

Note that in a practical setup, we expect the user to provide $B + 1$ $k$-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and, hence, problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.

Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                                   CRN-X          CRN-0          CRN-B

AutoNetGen
$B + 1$                                           25             1              6
$T$ / K                                           298.15         298.15         298.15
$\mu(A^- - A^+)$ / (kJ mol$^{-1}$)                −25            −25            −25
$\sigma(A^- - A^+)$ / (kJ mol$^{-1}$)             50             50             50
$\min(A^\ddagger - A^{+/-})$ / (kJ mol$^{-1}$)    0              0              0
$\max(A^\ddagger - A^{+/-})$ / (kJ mol$^{-1}$)    100            100            100
$\langle\sigma_A\rangle$ / (kJ mol$^{-1}$)        2.09           2.09           2.09
$L_\mathrm{max}$                                  125            100            100
$\mu_\mathrm{exp}(N_\mathrm{uni})$                5              5              5
$\mu_\mathrm{exp}(N_\mathrm{bi})$                 2              2              2
$p(V_\mathrm{new})$                               0.5            0.1            0.1

KiNetX
$t_\mathrm{max}$ / s                              36954          36954          36954
$\varepsilon_y$ / (mol L$^{-1}$)                  0.01           0.01           0.01
$G_\mathrm{min}$ / (mol L$^{-1}$)                 1.0 × 10⁻³     1.0 × 10⁻⁵     1.0 × 10⁻⁵
$U$                                               1000           1000           1000
$\gamma$                                          0              1              1
$C$                                               5              1              5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.

[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package"; Technical Report SAND–80-8003; Sandia National Labs: Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics"; Technical Report SAND-89-8009; Sandia National Labs: Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics"; Technical Report SAND-96-8216; Sandia National Labs: Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI – a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

(i) translation of a reaction network endowed with uncertainty-equipped rate constants to an ensemble of kinetic models, each model representing a unique set of rate constants,

(ii) numerical integration of the ensemble of (possibly stiff) kinetic models,

(iii) identification of possibly relevant species based on noisy concentration data to guide the search for new species,

(iv) elimination of kinetically irrelevant species and elementary reactions based on noisy concentration data (statistical mechanism deduction),

(v) global sensitivity analysis to rank reactions according to how much concentration noise the uncertainty of the underlying rate constants induces.

Here, we introduce KiNetX, a meta-algorithm that accomplishes all of these tasks in a fully automated fashion. The two key aspects of KiNetX are that it (i) can steer the exploration of chemical reaction space in order to accelerate this exploration and (ii) enables routine statistico-kinetic analyses of reaction networks endowed with rate constants for which correlated uncertainty information is available. This way, we appreciate recent developments related to the exploration of chemical reaction networks [1–18] and the uncertainty quantification of reaction energies [19–28], both derived from the first principles of quantum mechanics. The novelty of KiNetX consists in the uncertainty quantification approach it is built on. Instead of making educated guesses, consulting rules, or fitting to experimental data [34], it has become possible to estimate the correlated uncertainty of free-energy predictions from ensembles of electronic structure models, which allows for a direct evaluation of the underlying free-energy covariance matrix. The explicit propagation of these correlated uncertainties through the entire workflow renders KiNetX a novel toolbox for statistico-kinetic modeling that significantly increases the reliability of mechanistic conclusions drawn from quantum chemical reaction data. We aimed at a very generic input format that does not enforce any context-related formalities. Even nonchemical problems (if expressed as mass action-type ODEs) can be studied. The uncertainty framework of KiNetX can, in principle, be coupled to any of the kinetic modeling codes mentioned above if a suitable parser is provided. To examine the importance of rigorous uncertainty propagation in kinetic modeling, we developed an automated network generator, AutoNetGen. AutoNetGen creates chemistry-mimicking reaction networks based on chemical logic, with which we can study an arbitrary number of chemistry-like scenarios.

This paper is organized as follows. In Section 2, we introduce the basic equations of kinetic modeling, highlight the importance of uncertainty quantification for this field, and discuss the technical details of KiNetX and AutoNetGen. In Section 3, we present the KiNetX workflow using a specific example and provide statistics on the reliability increase gained by incorporating uncertainty quantification into the kinetic modeling framework.

2 Theoretical and Algorithmic Details

2.1 Kinetic Modeling from a Network Perspective

We describe the structure of a chemical reaction network by a graph of $N$ vertices and $L$ bidirectional edges. The network is strictly bidirectional, as we assume every chemical transformation to be reversible. Each of the two edges corresponding to a reaction pair (a reversible elementary reaction) is assigned an arbitrary but unique direction (forward, +, or backward, −).

We define the $N$-dimensional column vector of time-dependent species concentrations,

$$ y(t) = \big( y_1(t), \cdots, y_N(t) \big)^\top, \tag{1} $$

which keeps track of the population density of each vertex at a given time. Here, $y_n(t)$ refers to the concentration of the $n$-th chemical species at time $t$, which is a strictly nonnegative quantity. The $2L$-dimensional column vector of rate constants,

$$ k = \begin{pmatrix} k^+ \\ k^- \end{pmatrix} = \big( k_1^+, \cdots, k_L^+, k_1^-, \cdots, k_L^- \big)^\top, \tag{2} $$

contains strictly nonnegative scaling factors that determine the transition rate for each direction of an edge. We define the $(N \times L)$-dimensional stoichiometry matrix of forward reactions,

$$ S^+ = \begin{pmatrix} S_{11}^+ & \cdots & S_{1L}^+ \\ \vdots & \ddots & \vdots \\ S_{N1}^+ & \cdots & S_{NL}^+ \end{pmatrix}, \tag{3} $$

where the element $S_{nl}^+$ ($n \in \{1, \cdots, N\}$ and $l \in \{1, \cdots, L\}$) describes the number of molecules of the $n$-th species that is consumed in the $l$-th forward reaction. Analogously to $S^+$, we define the $(N \times L)$-dimensional stoichiometry matrix of backward reactions,

$$ S^- = \begin{pmatrix} S_{11}^- & \cdots & S_{1L}^- \\ \vdots & \ddots & \vdots \\ S_{N1}^- & \cdots & S_{NL}^- \end{pmatrix}. \tag{4} $$

All elements in $S^+$ and $S^-$ are strictly nonnegative integers. The combined $(N \times L)$-dimensional stoichiometry matrix reads

$$ S = S^- - S^+. \tag{5} $$

From the quantities introduced above, and assuming mass action kinetics, we can express the $L$-dimensional column vectors of forward (+) reaction rates and backward (−) reaction rates as

$$ f^{+/-}(t) = \big( f_1^{+/-}(t), \cdots, f_L^{+/-}(t) \big)^\top \tag{6} $$

with components

$$ f_l^{+/-}(t) = k_l^{+/-} \prod_{n=1}^{N} \big( y_n(t) \big)^{S_{nl}^{+/-}}. \tag{7} $$

By definition, the product of concentrations in Eq. (7) only equals zero if a species involved in the $l$-th forward/backward reaction (indicated by a positive value of $S_{nl}^{+/-}$) is not populated (zero concentration), since for all species not involved we obtain a factor of 1 due to $S_{nl}^{+/-} = 0$. Based on the combined stoichiometry matrix $S$ and the combined $L$-dimensional column vector of reaction rates,

$$ f(t) = f^+(t) - f^-(t), \tag{8} $$

we can express the time-dependent change in the species concentrations as

$$ g(t) = \big( g_1(t), \cdots, g_N(t) \big)^\top = \frac{\mathrm{d}}{\mathrm{d}t} y(t) = S f(t). \tag{9} $$

The relevant procedure prior to any kinetic analysis of reaction networks is the numerical integration of $g(t)$, which is therefore the central quantity in kinetic modeling.
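To make Eqs. (5) to (9) concrete, the following plain-Python sketch evaluates $g = S f$ for a toy network. The function names and the nested-list representation of the stoichiometry matrices are assumptions made for illustration and do not reflect the Matlab implementation of KiNetX.

```python
def reaction_rates(k, s, y):
    """Mass-action rates of Eq. (7) for one direction.

    k: L rate constants; s: N x L stoichiometry matrix (nested lists);
    y: N species concentrations.
    """
    rates = []
    for l in range(len(k)):
        rate = k[l]
        for n in range(len(y)):
            rate *= y[n] ** s[n][l]   # factor of 1 whenever s[n][l] == 0
        rates.append(rate)
    return rates


def concentration_change(k_fwd, k_bwd, s_fwd, s_bwd, y):
    """g = S f with S = S^- - S^+ and f = f^+ - f^- (Eqs. 5, 8, and 9)."""
    f_fwd = reaction_rates(k_fwd, s_fwd, y)
    f_bwd = reaction_rates(k_bwd, s_bwd, y)
    return [
        sum((s_bwd[n][l] - s_fwd[n][l]) * (f_fwd[l] - f_bwd[l])
            for l in range(len(k_fwd)))
        for n in range(len(y))
    ]


# Toy network with one reaction pair, A + B <-> C (species order: A, B, C).
s_fwd = [[1], [1], [0]]   # consumed in the forward reaction
s_bwd = [[0], [0], [1]]   # consumed in the backward reaction
print(concentration_change([2.0], [0.5], s_fwd, s_bwd, [1.0, 3.0, 4.0]))
# [-4.0, -4.0, 4.0]
```

A function of this kind is what one would hand to a stiff ODE solver as the right-hand side.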

2.2 Uncertainty Quantification in Kinetic Modeling

The objective of uncertainty quantification in kinetic modeling [34] is to assess the accuracy (or bias) and precision (or variance) of concentration profiles obtained from numerical integration of ODE systems. To obtain reliable results, it may be necessary to invest great effort in estimating the correlated uncertainty of model parameters (free-energy differences or rate constants). The mathematical object we are searching for is the joint probability distribution of model parameters [47]. One way to estimate parameter correlation is the backward propagation of uncertainty observed in measured concentration profiles [48], which requires knowledge of the underlying mechanism containing all kinetically relevant elementary reaction steps. This strategy is clearly appealing for verifying mechanism completeness, but it does not serve our purpose of understanding chemical reactivity from a first-principles perspective.

It has recently been shown that the neglect of parameter correlation in kinetic models can easily lead to false mechanistic conclusions, even when the models are derived from electronic structure calculations [26,49]. As the parameters of an electronic structure model are (unknown) functions of chemical space, free energies of reaction pathways comprising similar species will not change independently of each other when the value of an electronic structure parameter is changed. Fortunately, recent statistical developments in quantum chemistry [19–28] enable the estimation of correlated uncertainty in free-energy differences. Still, reliably estimating the correlation between free energies is computationally hard, as it requires sampling from ensembles of electronic structure models [25,26] and, hence, repeated first-principles calculations for all species of the chemical system studied (including transition states and possibly other structures along the reaction pathway). Even if machine learning models were employed, a comprehensive training set based on a vast number of electronic structure calculations would be necessary [50]. We have already demonstrated the steps required for the propagation of correlated uncertainty in activation free energies for a model network of the formose reaction [26]. Here, we will generalize the procedure for the study of arbitrary chemical systems.

The estimation of uncertainty of a target quantity from the joint probability distribution of model parameters is referred to as forward uncertainty quantification, also known as uncertainty propagation. The opposite procedure is referred to as inverse uncertainty quantification and is applied to calibration problems [51,52] (the backward propagation of concentration uncertainty mentioned above belongs to the latter approach). Every statistical analysis implemented in KiNetX is based on uncertainty propagation. We consider an ensemble of $B + 1$ vectors of rate constants that is obtained by drawing $B$ samples from the joint probability distribution of activation free energies with mean $\mathrm{E}[A] = A_0$ and variance $\mathrm{V}[A] = \Sigma_A$, which are subsequently mapped to rate constants based on, e.g., Eyring's transition state theory [53]. Each sampled vector of rate constants is labeled $k_b$ with $b \in \{1, \cdots, B\}$. An additional vector, $k_0$, is directly obtained from $A_0$. The ensemble of rate constant vectors, $K_B = \{k_0, k_1, \cdots, k_B\}$, is the basis for any bottom-up uncertainty analysis in kinetic modeling and is taken into account explicitly by every subalgorithm of KiNetX.
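The construction of such an ensemble can be sketched in a few lines of Python. The sketch assumes a multivariate normal distribution (consistent with using only the mean and the covariance matrix), draws correlated samples through a Cholesky factor, and maps them to rate constants with the Eyring equation. All function names and numerical values are hypothetical, and taking the absolute value of a negative barrier follows the AutoNetGen convention described in the Appendix.

```python
import math
import random

K_B = 1.380649e-23      # Boltzmann constant in J/K
H = 6.62607015e-34      # Planck constant in J s
R = 8.314462618e-3      # gas constant in kJ/(mol K)


def cholesky(cov):
    """Lower-triangular factor C with C C^T = cov (cov positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def sample_barriers(mean, chol, rng):
    """One correlated draw A = A0 + C z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(chol[i][j] * z[j] for j in range(i + 1))
            for i, m in enumerate(mean)]


def eyring(barrier_kj_mol, temperature):
    """Rate constant for a barrier in kJ/mol; negative barriers are mirrored."""
    return K_B * temperature / H * math.exp(-abs(barrier_kj_mol) / (R * temperature))


def rate_constant_ensemble(mean, cov, n_samples, temperature=298.15, seed=0):
    """Return [k_0, k_1, ..., k_B]; k_0 belongs to the nominal barriers A0."""
    rng = random.Random(seed)
    chol = cholesky(cov)
    barriers = [list(mean)]
    barriers += [sample_barriers(mean, chol, rng) for _ in range(n_samples)]
    return [[eyring(a, temperature) for a in vec] for vec in barriers]


ensemble = rate_constant_ensemble([50.0, 80.0], [[4.0, 3.0], [3.0, 4.0]], 5)
print(len(ensemble))   # 6, i.e., B + 1 with B = 5
```

In a practical setup, the sampled barriers would be replaced by values computed from an ensemble of electronic structure models.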

Note that the setup introduced here neglects third- and higher-order moments of the joint probability distribution of activation free energies, which may be a weak assumption for actual reaction networks derived from first principles. To avoid these limitations, one can always sample directly from the ensemble of electronic structure models [25,26], which requires repeated first-principles calculations and is therefore rather inefficient compared to sampling from $\Sigma_A$. Another possibility, not yet explored by us, is the application of matrix algebra to construct special matrices that simplify expressions for higher-order moments of joint probability distributions (in particular, skewness and kurtosis) [54].

2.3 Overview of the KiNetX Meta-Algorithm

All reaction networks analyzed with KiNetX in this work were generated with AutoNetGen (Section 2.6). Both algorithms are written in Matlab [55]. For the numerical integration of (generally stiff) ODE systems, we interface to the ode15s module [56] of Matlab.

The KiNetX workflow consists of three core algorithms (Sections 2.4.1–2.4.3), all of which take the correlated uncertainty in the underlying model parameters (rate constants) explicitly into account:

1. Uncertainty propagation: Solve an ensemble of kinetic models derived from a unique reaction network and estimate the kinetic relevance of every species based on the maximum rate of formation (Section 2.4.1).

2. Network reduction: Identify and eliminate all kinetically irrelevant vertices and edges of the network by applying a hierarchy of flux analyses, resulting in a sparse network that is a comprehensible representation of the underlying reaction mechanism (Section 2.4.2).

3. Sensitivity analysis: Determine the effect of rate constant perturbations on time-dependent species concentrations through an extended version of Morris screening [57] (Section 2.4.3).

The minimal input requirements for KiNetX are:

• A chemical reaction network with $N$ vertices and $2L$ unidirectional edges (provided by AutoNetGen in this work).

• A set of $N$ initial species concentrations $y_0$ (provided by the user).

• An ensemble of $B + 1$ sets of $2L$ rate constants each,

$$ K_B = \{ k_b \} \tag{10} $$

with $b = 0, \cdots, B$, which may be derived from an ensemble of electronic structure models based on, e.g., density functional theory [22,24,25] (provided by AutoNetGen in this work).

• A maximum reaction time $t_\mathrm{max}$ representing a practical time scale or a time scale of interest (provided by the user).

At present, we require the input to be provided in SI units. Optional input parameters are:

• The maximum tolerable concentration error, $\varepsilon_y$, between the exact and an approximate solution.

• A flux threshold, $G_\mathrm{min}$, above which a chemical species will be considered kinetically relevant.

• The number of log-distributed time points, $U + 1$, between $t = 0$ and $t = t_\mathrm{max}$ at which species concentrations will be evaluated based on cubic spline interpolation.

• A confidence level, $\gamma$, important for assessing network properties derived from an ensemble of kinetic models.

• The number of Morris samples, $C$, considered in our global sensitivity analysis.

• Absolute and relative concentration error tolerances considered during numerical integration of ODE systems.

Every execution of KiNetX is based on one network with a fixed number of vertices and edges. Hence, for steering the exploration of reaction networks, KiNetX needs to be executed repeatedly. We designed KiNetX to suggest the next exploration step by estimating the kinetic relevance of every species based on the maximum rate of formation (Section 2.5). We will make the KiNetX program available through our webpage (scine.ethz.ch) in 2019.

Currently, KiNetX is limited to the analysis of homogeneous and isothermal reactive chemical systems in dilute solution, which constitute a major category of chemical systems found in nature and examined in chemical research. One prominent subcategory thereof that attracts much attention in fundamental and industrial research is homogeneous catalysis, a field that is rather underrepresented in the kinetic modeling literature [58]. For the study of dilute systems, collisions involving three or more reactive species can be considered negligible from a statistical point of view. For this reason, each element of the forward and backward reaction rate vectors, $f^+$ and $f^-$, is of the form

$$ f_l^{+/-} = k_l^{+/-} \, y_{n_{l,1}^{+/-}} \, y_{n_{l,2}^{+/-}}, \tag{11} $$

where the vertex index $n$ is determined by the edge index ($l$), the direction (+ or −), and the (arbitrary) position in the rate equation ($i = 1, 2$). In case of a unimolecular reaction, we simply set one of the two vertex indices to $n_{l,i}^{+/-} = 0$, denoting a hypothetical null-species with a defined constant value of $y_0 = 1$, independent of the unit of measurement.
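The null-species convention of Eq. (11) allows unimolecular and bimolecular rates to be evaluated with one and the same expression, as the following short Python sketch shows. The function name `rate` and the example values are hypothetical; index 0 of the concentration list is reserved for the null-species.

```python
def rate(k, y, n1, n2):
    """Rate of Eq. (11); y[0] is the null-species with constant value 1."""
    return k * y[n1] * y[n2]


# y[0] is the null-species; y[1] and y[2] are actual species concentrations.
y = [1.0, 0.2, 0.5]
unimolecular = rate(3.0, y, 1, 0)       # k * y1
heterobimolecular = rate(3.0, y, 1, 2)  # k * y1 * y2
homobimolecular = rate(3.0, y, 2, 2)    # k * y2 ** 2
print(unimolecular, heterobimolecular, homobimolecular)
```

Storing two vertex indices per edge and direction therefore suffices to describe every reaction type considered here.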

2.4 The KiNetX Workflow

2.4.1 Uncertainty Propagation

The first step of our KiNetX workflow is the numerical integration of an ensemble of $B + 1$ kinetic models, each of them representing a unique set of rate constants. Clearly, the user can also choose $B = 0$ (leading to the usual kinetic modeling setup comprising a single numerical integration), but KiNetX is actually designed to analyze an ensemble of kinetic models.

To assess the magnitude of noise in the $B$ solutions ($y$-uncertainty) resulting from the ensemble of sets of rate constants ($k$-uncertainty), we introduce two measures,

$$ \delta_B(t_u) = \frac{1}{2B} \sum_{b=1}^{B} \sum_{n=1}^{N} \big| y_{n,b}(t_u) - \langle y_n(t_u) \rangle \big|, \tag{12} $$

where

$$ \langle y_n(t_u) \rangle = \frac{1}{B} \sum_{b=1}^{B} y_{n,b}(t_u) \tag{13} $$

is the mean value of concentrations at time $t = t_u$, and

$$ \Delta_B = \frac{1}{t_\mathrm{max}} \sum_{u=1}^{U} \delta_B(t_{u-1/2}) \cdot (t_u - t_{u-1}). \tag{14} $$

The factor $t_\mathrm{max}$ in the denominator of Eq. (14) equals the sum of time differences given that $t_0 = 0$,

$$ t_\mathrm{max} = t_U = t_U - t_0 = \sum_{u=1}^{U} (t_u - t_{u-1}). \tag{15} $$

Furthermore, we define

$$ t_{u-1/2} = \frac{1}{2} (t_{u-1} + t_u). \tag{16} $$

According to Eq. (12), $\delta_B(t_u)$ represents the ensemble-averaged $y$-uncertainty summed over all species at time $t = t_u$. Here, we focus on $\delta_B(t_\mathrm{max})$ as it measures the variability in the product distribution, a network property of particular interest for synthetic chemists. By contrast, $\Delta_B$ represents a time-averaged $y$-uncertainty, i.e., the overall variability of species concentrations between $t_0 = 0$ and $t_U = t_\mathrm{max}$. The combination of both measures may be valuable for determining the underlying reaction mechanism. For instance, if $\delta_B(t_\mathrm{max})$ is close to zero but $\Delta_B$ is rather large, it is likely that we are faced with one or both of the following two scenarios: (i) the sequence of elementary steps is identical or very similar for some $k$-vectors, but the uncertainty of the activation barrier associated with the rate-determining step is significant; (ii) different $k$-vectors suggest different routes to the same metastable sink of the reaction network. Furthermore, if both $\delta_B(t_\mathrm{max})$ and $\Delta_B$ are below a user-defined concentration error $\varepsilon_y$, we can safely neglect the ensemble of kinetic models and base all further kinetic analysis on the nominal sample represented by $\mathbf{k}_0$ only.
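The two measures of Eqs. (12) and (14) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the nested-list data layout, and the toy numbers are hypothetical, and the values of $\delta_B$ at the interval midpoints are assumed to be available already.

```python
def delta_B(Y_u):
    """Eq. (12): Y_u[b][n] is the concentration of species n in sample b at one time t_u."""
    B = len(Y_u)
    N = len(Y_u[0])
    mean = [sum(Y_u[b][n] for b in range(B)) / B for n in range(N)]  # Eq. (13)
    total = sum(abs(Y_u[b][n] - mean[n]) for b in range(B) for n in range(N))
    return total / (2 * B)


def Delta_B(t, delta_mid):
    """Eq. (14): t = [t_0, ..., t_U]; delta_mid[u-1] is delta_B at the midpoint t_{u-1/2}."""
    t_max = t[-1] - t[0]
    weighted = sum(d * (t[u] - t[u - 1]) for u, d in enumerate(delta_mid, start=1))
    return weighted / t_max


# Toy ensemble: two samples that predict opposite products at t_max.
Y_final = [[1.0, 0.0], [0.0, 1.0]]
d = delta_B(Y_final)

# Toy time grid with two intervals and assumed midpoint values of delta_B.
D = Delta_B([0.0, 1.0, 3.0], [0.5, 0.2])
```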

When applying Eqs. (12) and (14), it is important that the number and identity of the time points $t_u$ are the same for all solutions. However, it is very unlikely that the time points obtained from an ODE solver are identical for two different $k$-vectors. For this reason, KiNetX calculates a log-distributed vector of reference time points $(t_0, t_1, \cdots, t_U)^\top$ with $t_0 = 0$ and $t_U = t_\mathrm{max}$, which is a function of the user-defined parameters $t_\mathrm{max}$ and $U$. To obtain species concentrations at the same time points for other $k$-vectors, KiNetX interpolates the corresponding solutions through cubic spline smoothing. Note that spline smoothing is only reasonable if data noise is negligible, a property which one can assess easily in the case of one control variable (here, time). To evaluate the reliability of the cubic spline interpolations, we compared them against results from Gaussian process regression [59]. Under suitable assumptions, Gaussian process regression yields an optimal trade-off between fitting and smoothness (cubic splines only ensure the latter property). Hence, it can be employed to predict concentrations from previously unconsidered values in the time domain (here, the domain between $t_0 = 0$ and $t_U = t_\mathrm{max}$). Furthermore, as Gaussian process regression is a strictly Bayesian method, one may obtain reliable uncertainty estimates for each prediction. However, as this regression method scales cubically with the number of data points, it is not suitable for repetitive applications (usually, hundreds of regressions would be necessary). In all cases studied, we found that both the mean deviation between the two interpolation methods (cubic spline smoothing versus Gaussian process regression) and the mean predictive variance obtained from Gaussian process regression are negligibly small compared to the mean variance of the concentrations themselves (by a factor of $< 10^{-6}$). Hence, data noise is indeed negligible, and cubic splines work well for interpolating concentration profiles.
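The idea of a common reference grid can be sketched as follows. Two assumptions are made here that the text does not specify: the first positive time point `t_first` (a log-spaced grid cannot start at zero, so $t_0 = 0$ is prepended), and the use of piecewise-linear interpolation as a simple stand-in for the cubic spline smoothing that KiNetX employs. All function names are hypothetical.

```python
import bisect


def log_time_grid(t_first, t_max, U):
    """Return [t_0, ..., t_U] with t_0 = 0 and t_1..t_U log-spaced from t_first to t_max (U >= 2)."""
    ratio = (t_max / t_first) ** (1.0 / (U - 1))
    inner = [t_first * ratio ** i for i in range(U - 1)]
    return [0.0] + inner + [t_max]  # t_max is appended exactly to avoid round-off


def interpolate(t_solver, y_solver, t_ref):
    """Piecewise-linear stand-in for spline interpolation onto the reference grid."""
    out = []
    for t in t_ref:
        j = bisect.bisect_right(t_solver, t)
        j = min(max(j, 1), len(t_solver) - 1)  # clamp to a valid interval
        t0, t1 = t_solver[j - 1], t_solver[j]
        w = (t - t0) / (t1 - t0)
        out.append((1.0 - w) * y_solver[j - 1] + w * y_solver[j])
    return out


grid = log_time_grid(1e-3, 1e3, 7)
y_on_grid = interpolate([0.0, 1.0, 2.0], [0.0, 10.0, 20.0], [0.5, 1.5, 2.0])
```

Once every solution is available on the same grid, Eqs. (12) and (14) can be evaluated sample by sample without further bookkeeping.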

2.4.2 Network Reduction

We introduce a hierarchy of two reduction algorithms with increasing sophistication (in terms of both rigor and required computing resources), which identify and eliminate kinetically irrelevant species and elementary reactions. Initially, we perform a detailed flux analysis, followed by a local barrier analysis if the former analysis suggests a reduced model resulting in a concentration error that exceeds the user-defined threshold $\varepsilon_y$. Both algorithms reflect our interpretation of established kinetic modeling concepts [34].

Detailed Flux Analysis (DFA). To keep track of the net concentration that has passed an edge between $t = 0$ and $t = t_\mathrm{max}$, we integrate the absolute values of $f_l(t)$ over that time interval,

$$F_l = \int_{t=0}^{t=t_\mathrm{max}} |f_l(t)|\,\mathrm{d}t, \tag{17}$$

which we define as the edge flux corresponding to the $l$-th reaction pair. The vector of edge fluxes, $\mathbf{F}$, is the basis for determining the vertex flux corresponding to the $n$-th species,

$$G_n = \left(\mathbf{s}_n^+ + \mathbf{s}_n^-\right)\mathbf{F}, \tag{18}$$

where $\mathbf{s}_n^+$ and $\mathbf{s}_n^-$ represent the $n$-th row of $\mathbf{S}^+$ and $\mathbf{S}^-$, respectively. Our DFA implementation approximates the edge flux according to

$$F_l \approx F_l^\mathrm{DFA} = \sum_{u=1}^{U} |f_l(t_{u-1/2})| \cdot \left(t_u - t_{u-1}\right). \tag{19}$$

Analogous to the exact expression, the approximate vertex flux reads

$$G_n^\mathrm{DFA} = \left(\mathbf{s}_n^+ + \mathbf{s}_n^-\right)\mathbf{F}^\mathrm{DFA}. \tag{20}$$

If $G_n^\mathrm{DFA} < G_\mathrm{min}$, where $G_\mathrm{min}$ is a user-defined threshold, the $n$-th vertex will be removed from the kinetic model, except for the case where the $n$-th vertex represents a reactant. All edges that were connected to at least one of the removed vertices will be removed, too. After this procedure, the number of vertices and edges should have decreased significantly.
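The DFA steps of Eqs. (19) and (20), together with the removal rule, can be sketched as follows. The data layout (nested lists for $\mathbf{S}^+$, $\mathbf{S}^-$, and the midpoint rates), the function names, and the three-species toy network are assumptions made for illustration only.

```python
def edge_fluxes(t, f_mid):
    """Eq. (19): f_mid[u-1][l] is the net rate f_l at the midpoint t_{u-1/2}."""
    L = len(f_mid[0])
    F = [0.0] * L
    for u in range(1, len(t)):
        dt = t[u] - t[u - 1]
        for l in range(L):
            F[l] += abs(f_mid[u - 1][l]) * dt
    return F


def vertex_fluxes(S_plus, S_minus, F):
    """Eq. (20): row n of (S+ + S-) multiplied with the edge-flux vector."""
    return [sum((sp + sm) * Fl for sp, sm, Fl in zip(row_p, row_m, F))
            for row_p, row_m in zip(S_plus, S_minus)]


def dfa_keep(G, G_min, reactants):
    """Vertices that survive: flux of at least G_min, or a reactant."""
    return [n for n, g in enumerate(G) if g >= G_min or n in reactants]


def surviving_edges(S_plus, S_minus, kept):
    """Edges whose vertices all survived the vertex removal."""
    kept = set(kept)
    N, L = len(S_plus), len(S_plus[0])
    return [l for l in range(L)
            if all(n in kept for n in range(N) if S_plus[n][l] or S_minus[n][l])]


# Toy network: edge 0 connects species 0 and 1, edge 1 connects species 1 and 2.
S_plus = [[1, 0], [0, 1], [0, 0]]
S_minus = [[0, 0], [1, 0], [0, 1]]
t = [0.0, 1.0, 3.0]
f_mid = [[0.2, 1e-5], [0.1, 1e-5]]

F = edge_fluxes(t, f_mid)
G = vertex_fluxes(S_plus, S_minus, F)
kept = dfa_keep(G, G_min=1e-3, reactants={0})
edges = surviving_edges(S_plus, S_minus, kept)
```

In this toy case, species 2 carries a negligible vertex flux and is removed together with the edge that leads to it.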

Note that in the case of a closed system, $G_n^\mathrm{DFA}$ approaches a finite value with increasing time, which renders $G_\mathrm{min}$ an intuitive threshold that resembles a concentration error. The assumption behind our DFA is based on this intuitive interpretation: If a reaction channel with a flux smaller than $G_\mathrm{min}$ is removed, the redistribution of this minute amount of flux (i.e., concentration) should yield a concentration error that is comparable to $G_\mathrm{min}$. Therefore, we have a means at hand that potentially allows us to control the concentration error introduced by a specific value of $G_\mathrm{min}$. Clearly, choosing a smaller value of $G_\mathrm{min}$ will increase the reliability of the DFA-based solution but also decrease the possible degree of network reduction.

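To determine the accuracy loss caused by a DFA, we introduce the measures

$$\delta_{pq}(t_u) = \frac{1}{2}\sum_{n=1}^{N}\left| y_{np}(t_u) - y_{nq}(t_u) \right| \tag{21}$$

and

$$\Delta_{pq} = \frac{1}{t_\mathrm{max}}\sum_{u=1}^{U}\delta_{pq}\left(t_{u-1/2}\right)\cdot\left(t_u - t_{u-1}\right), \tag{22}$$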

which resemble the control quantities $\delta_B(t_u)$ and $\Delta_B$. Here, however, we compare two specific solutions, $\mathbf{y}_p(t)$ and $\mathbf{y}_q(t)$, with each other, one resulting from the original network and the other resulting from the corresponding sparse network obtained through our DFA. Note that the number of vertices differs for the original and sparse networks. Hence, to render Eq. (21) practical, we define $N$ to be the number of vertices contained in the original network and set all elements in $\mathbf{y}_\mathrm{sparse}(t)$ to zero that refer to vertices not contained in the sparse network. If all of the $B + 1$ kinetic models yield both $\delta_{pq}(t_u)$-values (again, we focus on $t_u = t_U = t_\mathrm{max}$) and $\Delta_{pq}$-values that are below a user-defined concentration error $\varepsilon_y$, KiNetX considers the DFA-based network reduction reliable. However, if there is at least one kinetic model that yields $\delta_{pq}(t_\mathrm{max}) > \varepsilon_y$ or $\Delta_{pq} > \varepsilon_y$, it depends on the user-defined confidence level $\gamma$ whether the DFA-based network reduction is considered reliable or not. We require the confidence level to fulfill the condition $0 \le \gamma \le 1$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_\mathrm{max})$ and/or $\Delta_{pq}$ exceeds $\varepsilon_y$, the DFA-based network reduction is considered unreliable. Here, the lowest and highest possible quantiles represent the median and the maximum of both measures over all $B + 1$ samples, respectively.

Local Barrier Analysis (LBA). There are certain cases in which a DFA-based network reduction is prone to yield unreliable results. One example is autocatalysis. Imagine the following model reaction network:

2A ⇌ B (very slow)
B ⇌ C (fast)
C + A ⇌ D (fast)
D + A ⇌ E (fast)
E ⇌ 2C (fast)

Even though the first reaction step is very slow, it is required to initiate all other reaction steps. Without B, neither C nor D nor E can be formed. Even though B is obviously a very important species for the reaction mechanism, it will only be relevant at the very beginning of the reaction (i.e., until a minute amount of it has been formed). Hence, the vertex flux for B, $G_\mathrm{B}$, will be very small, possibly smaller than the user-defined threshold $G_\mathrm{min}$, leading to a false elimination of this species and the second of the five edges.

In this case, $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$ will clearly exceed $\varepsilon_y$, and the LBA will start automatically. The idea behind our LBA algorithm is fairly simple: Set the rate constants of the first reaction pair to zero, which is equivalent to an infinitely high activation barrier or the removal of the corresponding edge. Repeat numerical integration and compare the solution to the original one based on $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. Set the rate constants of the first reaction pair to their original values. Repeat the entire procedure for the second, ..., $L$-th reaction pair and for each of the $B + 1$ kinetic models.
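The LBA loop for a single kinetic model can be sketched as below. The integrator is passed in as a callable `solve`, which is a hypothetical interface and not the KiNetX API. The toy model (two parallel irreversible first-order channels with an analytic solution) is likewise an assumption chosen so that the example runs without an ODE solver. Only $\delta_{pq}(t_\mathrm{max})$ is evaluated here; $\Delta_{pq}$ would be handled analogously.

```python
import math


def delta_pq(y_p, y_q):
    """Eq. (21) at a single time point."""
    return 0.5 * sum(abs(a - b) for a, b in zip(y_p, y_q))


def local_barrier_analysis(k_plus, k_minus, solve):
    """Zero each reaction pair in turn, re-solve, and compare with the original solution."""
    y_ref = solve(k_plus, k_minus)
    errors = []
    for l in range(len(k_plus)):
        kp, km = list(k_plus), list(k_minus)
        kp[l] = 0.0  # infinitely high barrier in the forward direction
        km[l] = 0.0  # and in the backward direction
        errors.append(delta_pq(solve(kp, km), y_ref))
        # kp and km are copies, so the original rate constants stay untouched
    return errors


def toy_solve(k_plus, k_minus, t_max=10.0):
    """Analytic y(t_max) for A -> P_l with rate constants k_plus[l]; k_minus is ignored."""
    k_tot = sum(k_plus)  # assumed to be positive in this toy example
    a = math.exp(-k_tot * t_max)
    return [a] + [k / k_tot * (1.0 - a) for k in k_plus]


errors = local_barrier_analysis([1.0, 1e-6], [0.0, 0.0], toy_solve)
```

Removing the fast channel changes the final concentrations almost completely, whereas removing the slow channel changes them only marginally, so only the latter would qualify as kinetically irrelevant.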

After this procedure, every reaction pair is associated with $B + 1$ values for $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_\mathrm{max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the corresponding reaction pair will be considered kinetically irrelevant and removed from the network. All unconnected vertices will be removed, too. Subsequently, the resulting $B + 1$ reduced kinetic models are integrated, and their solutions are compared to the original ones, again based on $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_\mathrm{max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the LBA-based network reduction will be considered reliable. It is still possible that $\varepsilon_y$ is exceeded, as the kinetic models we consider are generally nonlinear: If the lack of one edge does not alter the solution and the same result is obtained for another edge, it does not imply that the simultaneous lack of both edges leads to the same conclusion. If $\varepsilon_y$ is still exceeded after the LBA-based network reduction, KiNetX will recover the original network and continue its analysis without any reduction.
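The quantile criterion used for both the DFA- and LBA-based reductions can be sketched as follows. The empirical quantile is computed with linear interpolation between order statistics, which is one of several common conventions and an assumption here; the text does not state which convention KiNetX uses. Function names and numbers are illustrative.

```python
import math


def quantile(values, q):
    """Empirical q-quantile with linear interpolation between order statistics."""
    v = sorted(values)
    pos = q * (len(v) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return v[lo] + (pos - lo) * (v[hi] - v[lo])


def reduction_is_reliable(deltas, Deltas, eps_y, gamma):
    """Reliable if neither (1 + gamma)/2 quantile exceeds eps_y."""
    q = (1.0 + gamma) / 2.0
    return quantile(deltas, q) <= eps_y and quantile(Deltas, q) <= eps_y


# One value per kinetic model; a single outlier exceeds eps_y = 0.01.
deltas = [0.001, 0.002, 0.003, 0.004, 0.05]
Deltas = [0.001, 0.002, 0.003, 0.004, 0.05]

median_based = reduction_is_reliable(deltas, Deltas, eps_y=0.01, gamma=0.0)
maximum_based = reduction_is_reliable(deltas, Deltas, eps_y=0.01, gamma=1.0)
```

With $\gamma = 0$ the median decides and the outlier is tolerated; with $\gamma = 1$ the maximum decides and the same reduction is rejected.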

2.4.3 Sensitivity Analysis

In the following, we assume that the concentration profiles of the sparse reaction network reveal pronounced uncertainties such that it is difficult to correctly assign product ratios or to suggest a specific reaction mechanism. To estimate to which rate constants the concentration profiles are most sensitive, a sensitivity analysis is required [34]. For this purpose, the rate constants are perturbed to study how such perturbations affect the time-dependent species concentrations.

In a local sensitivity analysis, the model parameters are perturbed one by one from their nominal values (here, the elements of $\mathbf{k}_0$). While a local sensitivity analysis is straightforward to conduct and computationally feasible (usually, $L$ kinetic simulations are required), it has a significant disadvantage in that it does not take into account the correlation between the model parameters (cf. Section 2.2). Consequently, one may overlook important correlation effects on the uncertainty of concentration profiles. Note that our LBA algorithm resembles an extreme variant of a local sensitivity analysis (cf. Section 2.4.2), where each rate constant is perturbed to its minimal value.

In a global sensitivity analysis, the correlation between the model parameters is taken into account, but the process requires significantly more computational resources than a local sensitivity analysis. Furthermore, the design of a global sensitivity analysis is not as unambiguous as that of a local sensitivity analysis, which explains the existence of several approaches that may require $CL$ (Morris screening, where $C$ is an integer usually much smaller than $B$), $B$ (polynomial chaos), or $2BL$ (Sobol's method) numerical integrations [34].

Morris screening [57] is among the simplest of global sensitivity analyses as it is not designed to induce a mapping between the model parameters and the model solution. Instead, its purpose is to categorize the model parameters as either important or unimportant, depending on how strongly changes in them affect the model solutions. We are particularly interested in this categorization since we aim for a descriptor that informs us about the quality of rate constants obtained from an efficient (basic) quantum chemical model. If we find that the uncertainty of some species concentrations is too large to derive sensible conclusions about specific network properties, the results of a global sensitivity analysis will support us in identifying the most critical rate constants that require a reevaluation based on a more sophisticated (benchmark) quantum chemical model.

The original version of Morris screening does not take into account the joint probability distribution of the model parameters. Here, we introduce an extended variant of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample $\mathbf{k}_0$ and replaces the elements $k_{0,1}^+$ and $k_{0,1}^-$ with the corresponding elements of $\mathbf{k}_1$, $k_{1,1}^+$ and $k_{1,1}^-$. For this new vector of rate constants, $\mathbf{k}_{01}$, a numerical integration is carried out. Subsequently, the elements $k_{0,2}^+$ and $k_{0,2}^-$ of $\mathbf{k}_{01}$ are replaced with the elements $k_{1,2}^+$ and $k_{1,2}^-$ of $\mathbf{k}_1$, and a numerical integration based on the new vector, $\mathbf{k}_{02}$, is carried out. This procedure is repeated $L$ times until we arrive at $\mathbf{k}_{0L} = \mathbf{k}_1$, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is then repeated by an element-wise change from $\mathbf{k}_1$ to $\mathbf{k}_2$ and, generally, by an element-wise change from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ until $b = C - 1$ is reached. In the end, we will have generated solutions for another $C(L - 1)$ kinetic models in addition to the $B + 1$ solutions obtained for $\mathbf{K}_B$. We are interested in the $C(L - 1)$ new solutions and the first $C + 1$ of the $B + 1$ solutions obtained previously, amounting to $CL + 1$ solutions relevant for our sensitivity analysis.
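The construction of the intermediate rate-constant vectors can be sketched as follows. Each sample is represented as a list of $(k^+, k^-)$ tuples, one per reaction pair; this layout, the function names, and the numbers are assumptions made for illustration.

```python
def morris_path(k_from, k_to):
    """Return [k_{b1}, ..., k_{bL}]: pair l of k_from is replaced step by step with pair l of k_to."""
    current = list(k_from)
    path = []
    for l in range(len(k_from)):
        current[l] = k_to[l]  # swap forward and backward rate constant of pair l together
        path.append(list(current))
    return path


def morris_design(samples, C):
    """Paths from k_b to k_{b+1} for b = 0, ..., C-1; samples = [k_0, k_1, ..., k_B]."""
    return [morris_path(samples[b], samples[b + 1]) for b in range(C)]


# Three samples (B = 2) of a network with L = 3 reaction pairs.
samples = [
    [(1.0, 0.10), (2.0, 0.20), (3.0, 0.30)],  # nominal sample k_0
    [(1.5, 0.15), (2.5, 0.25), (3.5, 0.35)],  # k_1
    [(0.5, 0.05), (1.5, 0.15), (2.5, 0.25)],  # k_2
]
design = morris_design(samples, C=2)

# The last vector of each path equals an existing sample, so only L - 1 per path are new.
n_new = sum(len(path) - 1 for path in design)
```

Because every replaced pair is taken from an actual sample, each perturbation is drawn from the joint distribution conditional on the current values of all other rate constants.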

For each of these solutions, KiNetX determines the $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$ metrics, where $p$ and $q$ represent two adjacent $k$-vectors as constructed by our extended version of Morris screening. $CL$ values are obtained for each metric, $C$ thereof for every reaction pair. We define the sensitivity coefficients $z_l^\delta$ and $z_l^\Delta$ as

$$z_l^\delta = \frac{1}{C}\sum_{b=0}^{C-1}\delta_{bl,\,b(l-1)}(t_\mathrm{max}) \tag{23}$$

and

$$z_l^\Delta = \frac{1}{C}\sum_{b=0}^{C-1}\Delta_{bl,\,b(l-1)}, \tag{24}$$

respectively. The subscript $bl$ refers to sample $\mathbf{k}_{bl}$, and we define $\mathbf{k}_{b0} = \mathbf{k}_b$. Note that in Eqs. (23) and (24), we do not divide the $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$ metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term conditional relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.

Finally, we can set up a ranking of sensitivity coefficients. The larger $z_l^\delta$ and $z_l^\Delta$, the larger the effect of changes in $k_l^\pm$ on the concentration profiles. With this ranking at hand, one can systematically improve on both the accuracy and precision of the concentration profiles to reliably suggest product distributions (based on $z_l^\delta$) or specific reaction mechanisms (based on $z_l^\Delta$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.
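The averaging of Eqs. (23) and (24) and the subsequent ranking amount to a column mean followed by a sort. In the sketch below, `metric[b][l]` is assumed to hold the already computed $\delta$ or $\Delta$ value for Morris sample $b$ and reaction pair $l$ (zero-based); the names and numbers are hypothetical.

```python
def sensitivity_coefficients(metric):
    """Eqs. (23)/(24): average the metric over the C Morris samples for each reaction pair."""
    C = len(metric)
    L = len(metric[0])
    return [sum(metric[b][l] for b in range(C)) / C for l in range(L)]


def rank_pairs(z):
    """Reaction-pair indices sorted from most to least sensitive."""
    return sorted(range(len(z)), key=lambda l: z[l], reverse=True)


# C = 2 Morris samples, L = 3 reaction pairs.
metric = [[0.02, 0.0, 0.3],
          [0.04, 0.0, 0.1]]
z = sensitivity_coefficients(metric)
ranking = rank_pairs(z)
```

The pairs at the top of the ranking would be the first candidates for reevaluation with a benchmark quantum chemical model.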

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be infeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well-known in the reaction engineering community. It was introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas-phase reactions (in particular, combustion) [62], which has recently been extended to understand the kinetics of solution-phase chemistry [63]. Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution-phase chemistry. As the sensitivity of the thermal rate constant to errors in the activation free energy is a decreasing function of temperature in classical rate theories, $k$-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take $k$-uncertainty into account.
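The temperature argument can be made explicit with a short worked relation. It assumes an Eyring-type rate expression (as employed in Section 3.1) and a small error $\delta A^\ddagger$ in the activation free energy; the symbol $\delta A^\ddagger$ is introduced here for illustration only.

$$k = \frac{k_\mathrm{B} T}{h}\exp\left(-\frac{\Delta A^\ddagger}{RT}\right) \quad\Longrightarrow\quad \delta \ln k = -\frac{\delta A^\ddagger}{RT}$$

For a fixed free-energy error, the relative error in $k$ thus scales as $1/T$. Under this assumption, raising the temperature from about 298 K to 1500 K would reduce $|\delta \ln k|$ by roughly a factor of five.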

We start from the reactants (species with nonzero initial concentrations) and attempt to find all direct products, i.e., intermediates that are formed by a single elementary reaction of the reactants. The reactants are considered active, whereas the direct products are considered inactive. Only active species are considered in the exploration procedure, i.e., at this point, all possible reaction steps have been discovered (according to the exploration algorithm employed). To estimate which of the inactive species is potentially the most relevant for the mechanism, we focus on the formation rate of each inactive species (indicated by an asterisk),

$$\overrightarrow{g}_{n^*}(t_u) = \mathbf{s}_{n^*}^+\,\mathbf{f}^-(t_u) + \mathbf{s}_{n^*}^-\,\mathbf{f}^+(t_u). \tag{25}$$

Note that $\mathbf{s}_{n^*}^+$ is multiplied with $\mathbf{f}^-(t)$ since in a backward ($-$) reaction, the left-hand-side species ($+$) are formed. An equivalent argument holds for the multiplication of $\mathbf{s}_{n^*}^-$ with $\mathbf{f}^+(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

$$\overrightarrow{g}_{n^*}^\mathrm{max} = \max\left\{\overrightarrow{g}_{n^*}(t_0), \overrightarrow{g}_{n^*}(t_1), \cdots, \overrightarrow{g}_{n^*}(t_U)\right\}. \tag{26}$$

The species with the highest value of $\overrightarrow{g}_{n^*}^\mathrm{max}$ will be promoted to active. Note that if an ensemble of $k$-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

$$\eta\left(\overrightarrow{g}_{n^*}^\mathrm{max}\right) = \sqrt{\left\langle\overrightarrow{g}_{n^*}^\mathrm{max}\right\rangle^2 + \frac{1}{B-1}\sum_{b=1}^{B}\left(\overrightarrow{g}_{n^*b}^\mathrm{max} - \left\langle\overrightarrow{g}_{n^*}^\mathrm{max}\right\rangle\right)^2}, \tag{27}$$

that takes into account both the ensemble average of maximum formation rates,

$$\left\langle\overrightarrow{g}_{n^*}^\mathrm{max}\right\rangle = \frac{1}{B}\sum_{b=1}^{B}\overrightarrow{g}_{n^*b}^\mathrm{max}, \tag{28}$$

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
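The ranking model of Eqs. (27) and (28) can be sketched as follows. The dictionary layout, the function names, and the species labels are assumptions made for this example.

```python
import math


def eta(g_max):
    """Eq. (27): g_max[b] is the maximum formation rate of one inactive species in sample b."""
    B = len(g_max)
    mean = sum(g_max) / B                                # Eq. (28)
    var = sum((g - mean) ** 2 for g in g_max) / (B - 1)  # sample variance
    return math.sqrt(mean ** 2 + var)


def next_active(g_max_by_species):
    """Inactive species with the largest eta, i.e., the one to be promoted to active."""
    return max(g_max_by_species, key=lambda s: eta(g_max_by_species[s]))


# Two inactive species with the same mean but different spread over B = 3 samples.
candidates = {"a": [1.0, 1.0, 1.0], "b": [0.0, 1.0, 2.0]}
promoted = next_active(candidates)
```

Both candidates share the same ensemble average, so the variance term decides and the high-variance species is promoted, in line with the argument above.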

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\overrightarrow{g}_{n^*}^\mathrm{max}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold: In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between $t = 0$ and $t = t_\mathrm{max}$) are being generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

• The number of samples, $B$, to be drawn from $\boldsymbol{\Sigma}_A$.

• A thermostat temperature, $T$, for the calculation of rate constants.

• The average free-energy increase/decrease, $\mu(A^- - A^+)$, for a new intermediate state ($-$) formed by reaction of an intermediate state already present in the network ($+$).

• The average difference between the free energies of a left-hand-side intermediate state ($+$) and its corresponding right-hand-side intermediate state ($-$), $\sigma(A^- - A^+)$.

• A minimum activation free energy, $\min(A^\ddagger - A^\pm)$, with respect to the higher-energy intermediate state.

• A maximum activation free energy, $\max(A^\ddagger - A^\pm)$, with respect to the higher-energy intermediate state.

• The average free-energy uncertainty, $\langle\sigma_A\rangle$ (required for generating an ensemble of kinetic models).

• The maximum number of edges, $L_\mathrm{max}$, to be generated.

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of $N = 103$ vertices and $L = 118$ edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L$^{-1}$ and 0.50 mol L$^{-1}$, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of $B + 1 = 25$ $k$-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol$^{-1}$. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol$^{-1}$. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature $T = 298.15$ K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.
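To give a feeling for what an activation free-energy uncertainty of about 1 kcal mol$^{-1}$ means for a rate constant at 298.15 K, the following sketch evaluates the standard Eyring expression for a unimolecular step. The function names are hypothetical, the transmission coefficient is assumed to be one, and standard-state factors for bimolecular steps are omitted; none of this is taken from the KiNetX or AutoNetGen code.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K
H = 6.62607015e-34   # Planck constant in J s
R = 8.314462618      # molar gas constant in J/(mol K)
KCAL = 4184.0        # J per kcal


def eyring_rate_constant(dA_kcal, T=298.15):
    """k = (k_B T / h) * exp(-dA / (R T)) in 1/s for an activation free energy in kcal/mol."""
    return K_B * T / H * math.exp(-dA_kcal * KCAL / (R * T))


def uncertainty_factor(sigma_kcal, T=298.15):
    """Multiplicative change in k when the barrier is shifted by sigma_kcal."""
    return math.exp(sigma_kcal * KCAL / (R * T))


prefactor = eyring_rate_constant(0.0)  # k_B T / h at 298.15 K
factor = uncertainty_factor(1.1)       # effect of a 1.1 kcal/mol barrier shift
```

Under these assumptions, a barrier shift of 1.1 kcal mol$^{-1}$ changes the rate constant by a factor of roughly six at room temperature, which illustrates why seemingly small free-energy uncertainties can produce the diverse concentration profiles discussed below.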

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L$^{-1}$ during the course of reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at $t = t_\mathrm{max}$. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L$^{-1}$ and 0.9 mol L$^{-1}$, the observable value of which will in turn affect the concentrations of the potential side products 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of $\varepsilon_y = 0.01$ mol L$^{-1}$, we find that a DFA-based network reduction with $G_\mathrm{min} = 1.0 \times 10^{-3}$ mol L$^{-1}$ is reliable with respect to both $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. Here, we chose the minimum confidence level of $\gamma = 0$ for the sake of clarity. The smaller $\gamma$, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose $G_\mathrm{min}$ to be one order of magnitude smaller than $\varepsilon_y$. The reason is that $G_\mathrm{min}$ is assessed on a species-by-species basis, whereas $\varepsilon_y$ is compared against the quantities $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$, which measure the concentration error summed over all species. One may resolve this situation by dividing $G_\mathrm{min}$ by $N$, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of $k$-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large $y$-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For $C = 5$ Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for $\delta_{pq}(t_\mathrm{max})$ and/or $\Delta_{pq}$ exceeding $\varepsilon_y$ with respect to the quantile specified by $\gamma$. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product with a final concentration of about 0.8 mol L$^{-1}$. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L$^{-1}$, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval $t_u - t_{u-1}$, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42 (the other dominant side product) via 30 and 14. In summary, the dominant reaction pathways read:

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than $\varepsilon_y$ as measured by $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. Hence, we needed to switch off the sensitivity analysis during exploration as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering $k$-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks explored without kinetic guidance is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol$^{-1}$ with a standard deviation of 0.4 kcal mol$^{-1}$. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $\mathbf{k}_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several instead of a single solution naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to $1.0 \times 10^{-17}$ mol L$^{-1}$ s$^{-1}$ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, $t_\mathrm{max}$), we obtain a flux of $3.7 \times 10^{-13}$ mol L$^{-1}$, which is very small compared to $\varepsilon_y = 0.01$ mol L$^{-1}$. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5: There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.

Table 1: Mean values ($\mu$) and standard deviations ($\sigma$) of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

| Property $\zeta$ | $\mu(\zeta_\text{CRN-0})$ | $\mu(\zeta_\text{CRN-B})$ | $\sigma(\zeta_\text{CRN-0})$ | $\sigma(\zeta_\text{CRN-B})$ |
|---|---|---|---|---|
| exploration steps | 12 | 14 | 7 | 7 |
| critical reactions | 5 | 10 | 4 | 7 |
| $L_\mathrm{expl}$ | 54 | 60 | 26 | 27 |
| $N_\mathrm{expl}$ | 29 | 31 | 11 | 11 |
| $L_\mathrm{red}$ | 30 | 37 | 18 | 21 |
| $N_\mathrm{red}$ | 17 | 20 | 8 | 8 |
| $\delta_B^\mathrm{expl}(t_\mathrm{max})$ / mol L$^{-1}$ | – | 0.14 | – | 0.13 |
| $\Delta_B^\mathrm{expl}$ / mol L$^{-1}$ | – | 0.15 | – | 0.13 |
| $\delta_B^\mathrm{refined}(t_\mathrm{max})$ / mol L$^{-1}$ | – | <0.01 | – | 0.01 |
| $\Delta_B^\mathrm{refined}$ / mol L$^{-1}$ | – | <0.01 | – | 0.01 |

DFA-based network reduction with a flux threshold of Gmin = 1.0 × 10⁻⁵ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument of the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of max{δB(tmax), ΔB} from 0.15 mol L⁻¹ to < 0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all B + 1 = 6 samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
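The difference between refining absolute and relative free energies can be illustrated with a small sketch. It is not taken from KiNetX; the three-membered cycle, all energies, and the helper names (`barriers`, `cycle_sum`) are hypothetical. As a proxy for microscopic reversibility, the sketch checks that the reaction free energies around a closed cycle sum to zero in every sampled model.

```python
from statistics import mean

# Hypothetical absolute free energies (kJ/mol) for a cycle A -> B -> C -> A
# in two sampled kinetic models; all numbers are made up for illustration.
EDGES = [("A", "B"), ("B", "C"), ("C", "A")]
vertex_samples = [
    {"A": 0.0, "B": -10.0, "C": -4.0},
    {"A": 2.0, "B": -15.0, "C": -1.0},
]
ts_samples = [
    {("A", "B"): 50.0, ("B", "C"): 40.0, ("C", "A"): 70.0},
    {("A", "B"): 55.0, ("B", "C"): 43.0, ("C", "A"): 66.0},
]
CRITICAL = ("A", "B")  # the reaction whose free energies are refined


def barriers(vertices, ts):
    """Forward and backward activation free energies of every edge."""
    return {e: (ts[e] - vertices[e[0]], ts[e] - vertices[e[1]]) for e in EDGES}


def cycle_sum(bar):
    """Sum of reaction free energies (forward minus backward barrier) around the cycle."""
    return sum(fwd - bwd for fwd, bwd in bar.values())


# (a) Averaging relative free energies: only the two barriers of the critical edge change.
original = [barriers(v, t) for v, t in zip(vertex_samples, ts_samples)]
fwd_mean = mean(b[CRITICAL][0] for b in original)
bwd_mean = mean(b[CRITICAL][1] for b in original)
relative_refined = [{**b, CRITICAL: (fwd_mean, bwd_mean)} for b in original]

# (b) Averaging absolute free energies of the vertices and the edge of the critical reaction.
vertex_mean = {name: mean(v[name] for v in vertex_samples) for name in CRITICAL}
ts_mean = mean(t[CRITICAL] for t in ts_samples)
absolute_refined = [
    barriers({**v, **vertex_mean}, {**t, CRITICAL: ts_mean})
    for v, t in zip(vertex_samples, ts_samples)
]

for label, models in (("relative", relative_refined), ("absolute", absolute_refined)):
    print(label, [round(cycle_sum(b), 6) for b in models])
```

In this toy example, variant (a) leaves a nonzero cycle sum in each sample, whereas variant (b) keeps the cycle closed and still yields identical barriers for the critical reaction in all samples. Note that variant (b) also shifts the barriers of neighboring reactions that share a refined vertex.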

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of max{δpq(tmax), Δpq} comparing the CRN-0 solutions against the "exact" solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered 5 additional kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be questionable to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, i.e., highly noise-inducing, reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may be strongly dependent on the underlying graph structure, the distribution of activation free energies/rate constants, as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time tmax, which we set to the half-life corresponding to a unimolecular rate constant for a barrier height of 100 kJ mol⁻¹. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the (x + 1)-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the x-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_\mathrm{exp}$. The user can specify two different values for $\mu_\mathrm{exp}$, one for unimolecular reactions, $\mu_\mathrm{exp,uni}$, and one for bimolecular reactions, $\mu_\mathrm{exp,bi}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_\mathrm{new})$, ranging from zero to one. If $p(V_\mathrm{new}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_\mathrm{new}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), 2L activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is sampled from a normal distribution with mean $\mu(A^{-} - A^{+})$ and standard deviation $\sigma(A^{-} - A^{+})$, which is added to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is sampled from a uniform distribution with bounds $\min(A^{\ddagger} - A^{+/-})$ and $\max(A^{\ddagger} - A^{+/-})$, which is added to the higher-energy side of an edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
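Steps 1 and 2 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the AutoNetGen implementation (which is written in Matlab); the function names and the use of Python's `random` module are hypothetical choices for this sketch.

```python
import math
import random


def count_intermediate_states(n_new, n_total):
    """Step 1: number of uni-, homobi-, and heterobimolecular intermediate states.

    n_new is N_x (species of the newest layer), n_total is N (all species so far).
    """
    n_old = n_total - n_new
    unimolecular = n_new
    homobimolecular = n_new
    heterobimolecular = n_new * (n_new - 1) // 2 + n_new * n_old
    return unimolecular, homobimolecular, heterobimolecular


def sample_channel_count(mu_exp, rng):
    """Step 2: floor of a draw from an exponential distribution with mean mu_exp."""
    return math.floor(rng.expovariate(1.0 / mu_exp))


def creates_new_vertex(p_new, rng):
    """Step 2: decide between a new vertex (True) and a vertex of an earlier layer (False)."""
    return rng.random() < p_new


rng = random.Random(42)  # fixed seed so that a network can be regenerated
print(count_intermediate_states(n_new=3, n_total=5))  # (3, 3, 9)
print([sample_channel_count(5.0, rng) for _ in range(5)])
```

Because `rng.random()` returns values in [0, 1), the limiting cases behave as described in the text: `p_new = 0` never creates a new vertex, and `p_new = 1` always does. Mass conservation, which AutoNetGen also enforces, is not part of this sketch.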

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left(-\frac{A_l^{\ddagger} - A_l^{+/-}}{RT}\right) \tag{29}$$

where $h$, $k_\mathrm{B}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
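As a worked example of Eq. (29), the following sketch evaluates the rate constant for a barrier of 100 kJ mol⁻¹ at 298.15 K and the corresponding half-life of a unimolecular reaction. The function names are hypothetical, and the physical constants are the exact SI (2019) values, which may differ slightly from those used in the original Matlab code.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # gas constant, J/(mol K)


def eyring_rate_constant(activation_free_energy, temperature=298.15):
    """Eq. (29); activation_free_energy = A_ddagger - A in J/mol, result in 1/s."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-activation_free_energy / (R * temperature))


def half_life(rate_constant):
    """Half-life of a first-order (unimolecular) decay."""
    return math.log(2.0) / rate_constant


k = eyring_rate_constant(100.0e3)  # 100 kJ/mol barrier
print(f"k = {k:.3e} 1/s, half-life = {half_life(k):.0f} s")
```

The resulting half-life is on the order of 3.7 × 10⁴ s, in line with the reaction time tmax listed in Table 2.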

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$, which fulfills the condition that its largest eigenvalue equals $\sigma_{A,\mathrm{max}}^2$. Defining $\sigma_{A,\mathrm{max}}^2$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\mathrm{max}}$.

1. Generate a random (N × N)-dimensional matrix $\mathbf{P}$ with elements that are uniformly sampled from [−0.5, +0.5]. N is the number of vertices in the network.

2. The multiplication of $\mathbf{P}$ with its transpose leads to an (N × N)-dimensional matrix $\mathbf{Q} = \mathbf{P}\mathbf{P}^\top$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $\mathbf{Q}$,

$$\Sigma_A = \mathbf{Q}\,\frac{\langle\sigma_A\rangle^2}{\max_i E_{ii}}$$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
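The three steps above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the largest eigenvalue is estimated by power iteration to keep the example free of external libraries, whereas a full eigendecomposition routine would be the more common choice in practice.

```python
import math
import random


def largest_eigenvalue(matrix, iterations=1000):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * w[i] for i in range(n))  # Rayleigh quotient for unit vector v


def random_covariance(n, sigma, seed=0):
    """Random covariance matrix whose largest eigenvalue is (approximately) sigma**2."""
    rng = random.Random(seed)
    # Step 1: random matrix P with elements uniformly sampled from [-0.5, +0.5].
    p = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    # Step 2: Q = P P^T is symmetric and positive semidefinite.
    q = [[sum(p[i][m] * p[j][m] for m in range(n)) for j in range(n)] for i in range(n)]
    # Step 3: rescale Q so that its largest eigenvalue becomes sigma**2.
    scale = sigma ** 2 / largest_eigenvalue(q)
    return [[q[i][j] * scale for j in range(n)] for i in range(n)]


sigma_a = random_covariance(n=4, sigma=2.09)  # e.g., sigma in kJ/mol
print([round(sigma_a[i][i], 3) for i in range(4)])  # variances on the diagonal
```

Since every diagonal element of a positive-semidefinite matrix is bounded by its largest eigenvalue, all variances produced this way stay below `sigma**2`.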

Note that in a practical setup, we expect the user to provide B + 1 k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                  CRN-X         CRN-0         CRN-B

AutoNetGen
B + 1                            25            1             6
T (K)                            298.15        298.15        298.15
μ(A⁻ − A⁺) (kJ mol⁻¹)            −25           −25           −25
σ(A⁻ − A⁺) (kJ mol⁻¹)            50            50            50
min(A‡ − A⁺/⁻) (kJ mol⁻¹)        0             0             0
max(A‡ − A⁺/⁻) (kJ mol⁻¹)        100           100           100
⟨σA⟩ (kJ mol⁻¹)                  2.09          2.09          2.09
Lmax                             125           100           100
μexp(Nuni)                       5             5             5
μexp(Nbi)                        2             2             2
p(Vnew)                          0.5           0.1           0.1

KiNetX
tmax (s)                         36954         36954         36954
εy (mol L⁻¹)                     0.01          0.01          0.01
Gmin (mol L⁻¹)                   1.0 × 10⁻³    1.0 × 10⁻⁵    1.0 × 10⁻⁵
U                                1000          1000          1000
γ                                0             1             1
C                                5             1             5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.

[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package"; Technical Report SAND-80-8003, Sandia National Labs., Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics"; Technical Report SAND-89-8009, Sandia National Labs., Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics"; Technical Report SAND-96-8216, Sandia National Labs., Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes"; Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI – a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks"; ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB"; Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.


discuss the technical details of KiNetX and AutoNetGen. In Section 3, we present the KiNetX workflow at a specific example and provide statistics on the reliability increase by incorporating uncertainty quantification into the kinetic modeling framework.

2 Theoretical and Algorithmic Details

2.1 Kinetic Modeling from a Network Perspective

We describe the structure of a chemical reaction network by a graph of N vertices and L bidirectional edges. The network is strictly bidirectional, as we assume every chemical transformation to be reversible. Each of the two edges corresponding to a reaction pair (a reversible elementary reaction) is assigned an arbitrary but unique direction (forward, +, or backward, −).

We define the N-dimensional column vector of time-dependent species concentrations,

$$\mathbf{y}(t) = \big(y_1(t), \cdots, y_N(t)\big)^\top \tag{1}$$

which keeps track of the population density of each vertex at a given time. Here, $y_n(t)$ refers to the concentration of the n-th chemical species at time t, which is a strictly nonnegative quantity. The 2L-dimensional column vector of rate constants,

$$\mathbf{k} = \begin{pmatrix} \mathbf{k}^{+} \\ \mathbf{k}^{-} \end{pmatrix} = \big(k_1^{+}, \cdots, k_L^{+}, k_1^{-}, \cdots, k_L^{-}\big)^\top \tag{2}$$

contains strictly nonnegative scaling factors that determine the transition rate for each direction of an edge. We define the (N × L)-dimensional stoichiometry matrix of forward reactions,

$$\mathbf{S}^{+} = \begin{pmatrix} S_{11}^{+} & \cdots & S_{1L}^{+} \\ \vdots & \ddots & \vdots \\ S_{N1}^{+} & \cdots & S_{NL}^{+} \end{pmatrix} \tag{3}$$

where the element $S_{nl}^{+}$ ($n \in \{1, \cdots, N\}$ and $l \in \{1, \cdots, L\}$) describes the number of molecules of the n-th species that is consumed in the l-th forward reaction. Analogously to $\mathbf{S}^{+}$, we define the (N × L)-dimensional stoichiometry matrix of backward reactions,

$$\mathbf{S}^{-} = \begin{pmatrix} S_{11}^{-} & \cdots & S_{1L}^{-} \\ \vdots & \ddots & \vdots \\ S_{N1}^{-} & \cdots & S_{NL}^{-} \end{pmatrix} \tag{4}$$

All elements in $\mathbf{S}^{+}$ and $\mathbf{S}^{-}$ are strictly nonnegative integers. The combined (N × L)-dimensional stoichiometry matrix reads

$$\mathbf{S} = \mathbf{S}^{-} - \mathbf{S}^{+} \tag{5}$$


From the quantities introduced above and assuming mass action kinetics, we can express the L-dimensional column vectors of forward (+) reaction rates and backward (−) reaction rates as

$$\mathbf{f}^{+/-}(t) = \big(f_1^{+/-}(t), \cdots, f_L^{+/-}(t)\big)^\top \tag{6}$$

with components

$$f_l^{+/-}(t) = k_l^{+/-} \prod_{n=1}^{N} \big(y_n(t)\big)^{S_{nl}^{+/-}} \tag{7}$$

By definition, the product of concentrations in Eq. (7) only equals zero if a species involved in the l-th forward/backward reaction (indicated by a positive value of $S_{nl}^{+/-}$) is not populated (zero concentration), since for all species not involved we obtain a factor of 1 due to $S_{nl}^{+/-} = 0$. Based on the combined stoichiometry matrix $\mathbf{S}$ and the combined L-dimensional column vector of reaction rates,

$$\mathbf{f}(t) = \mathbf{f}^{+}(t) - \mathbf{f}^{-}(t) \tag{8}$$

we can express the time-dependent change in the species concentrations as

$$\mathbf{g}(t) = \big(g_1(t), \cdots, g_N(t)\big)^\top = \frac{\mathrm{d}}{\mathrm{d}t}\,\mathbf{y}(t) = \mathbf{S}\,\mathbf{f}(t) \tag{9}$$

The relevant procedure prior to any kinetic analysis of reaction networks is the numerical integration of $\mathbf{g}(t)$, which is therefore the central quantity in kinetic modeling.
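Equations (5) to (9) translate directly into code. The sketch below evaluates $\mathbf{g}(t)$ for a single reversible reaction A + B ⇌ C; the function names, rate constants, and concentrations are hypothetical and serve only to illustrate the definitions (the original implementation is in Matlab).

```python
def reaction_rates(k, s, y):
    """Eq. (7): f_l = k_l * prod_n y_n ** S_nl for one direction (+ or -)."""
    rates = []
    for l in range(len(k)):
        rate = k[l]
        for n in range(len(y)):
            rate *= y[n] ** s[n][l]  # y ** 0 == 1, also for y == 0
        rates.append(rate)
    return rates


def concentration_change(k_plus, k_minus, s_plus, s_minus, y):
    """Eqs. (5), (8), (9): g = S f with S = S^- - S^+ and f = f^+ - f^-."""
    f_plus = reaction_rates(k_plus, s_plus, y)
    f_minus = reaction_rates(k_minus, s_minus, y)
    f = [fp - fm for fp, fm in zip(f_plus, f_minus)]
    return [
        sum((s_minus[n][l] - s_plus[n][l]) * f[l] for l in range(len(f)))
        for n in range(len(y))
    ]


# One reversible reaction A + B <=> C; rows are species (A, B, C), columns are reactions.
s_plus = [[1], [1], [0]]   # molecules consumed in the forward direction
s_minus = [[0], [0], [1]]  # molecules consumed in the backward direction
y = [1.0, 0.5, 0.2]        # hypothetical concentrations in mol/L
g = concentration_change([2.0], [0.5], s_plus, s_minus, y)
print(g)  # A and B are consumed, C is formed
```

A function of this kind is what an ODE solver would call repeatedly when integrating $\mathbf{g}(t)$.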

2.2 Uncertainty Quantification in Kinetic Modeling

The objective of uncertainty quantification in kinetic modeling [34] is to assess the accuracy (or bias) and precision (or variance) of concentration profiles obtained from numerical integration of ODE systems. To obtain reliable results, it may be necessary to invest great effort in estimating the correlated uncertainty of model parameters (free-energy differences or rate constants). The mathematical object we are searching for is the joint probability distribution of model parameters [47]. One way to estimate parameter correlation is the backward propagation of uncertainty observed in measured concentration profiles [48], which requires knowledge of the underlying mechanism containing all kinetically relevant elementary reaction steps. This strategy is clearly appealing for verifying mechanism completeness, but it does not serve our purpose of understanding chemical reactivity from a first-principles perspective.

It has recently been shown that the neglect of parameter correlation in kinetic models can easily lead to false mechanistic conclusions despite being derived from electronic structure calculations [26,49]. As the parameters of an electronic structure model are (unknown) functions of chemical space, free energies of reaction pathways comprising similar species will not change independently of each other when the value of an electronic structure parameter is changed. Fortunately, recent statistical developments in quantum chemistry [19–28] enable the estimation of correlated uncertainty in free-energy differences. Still, reliably estimating the correlation between free energies is computationally hard, as it requires sampling from ensembles of electronic structure models [25,26] and, hence, repeated first-principles calculations for all species of the chemical system studied (including transition states and possibly other structures along the reaction pathway). Even if machine learning models were employed, a comprehensive training set based on a vast number of electronic structure calculations would be necessary [50]. We have already demonstrated the steps required for the propagation of correlated uncertainty in activation free energies for a model network of the formose reaction [26]. Here, we will generalize the procedure for the study of arbitrary chemical systems.

The estimation of uncertainty of a target quantity from the joint probability distribution of model parameters is referred to as forward uncertainty quantification, also known as uncertainty propagation. The opposite procedure is referred to as inverse uncertainty quantification and is applied to calibration problems [51,52] (the backward propagation of concentration uncertainty mentioned above belongs to the latter approach). Every statistical analysis implemented in KiNetX is based on uncertainty propagation. We consider an ensemble of $B + 1$ vectors of rate constants that is obtained by drawing $B$ samples from the joint probability distribution of activation free energies with mean $\mathrm{E}[\mathbf{A}] = \mathbf{A}_0$ and variance $\mathrm{V}[\mathbf{A}] = \boldsymbol{\Sigma}_A$, which are subsequently mapped to rate constants based on, e.g., Eyring's transition state theory [53]. Each sampled vector of rate constants is labeled $\mathbf{k}_b$ with $b \in \{1, \ldots, B\}$. An additional vector, $\mathbf{k}_0$, is directly obtained from $\mathbf{A}_0$. The ensemble of rate constant vectors, $K_B = \{\mathbf{k}_0, \mathbf{k}_1, \ldots, \mathbf{k}_B\}$, is the basis for any bottom-up uncertainty analysis in kinetic modeling and is taken into account explicitly by every subalgorithm of KiNetX.
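The construction of the ensemble of rate constant vectors can be illustrated with a minimal Python sketch (for illustration only; it is not the Matlab implementation of KiNetX). It assumes a multivariate normal distribution, which is one choice consistent with specifying only a mean and a covariance matrix, and it assumes unimolecular reactions so that the Eyring prefactor is simply k_B T / h. All function names, units (J/mol), and numerical values are hypothetical.

```python
import math
import random

KB = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R = 8.314462618       # gas constant, J/(mol K)

def cholesky(cov):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low

def eyring(a, temperature):
    """Eyring rate constant (s^-1) for an activation free energy a in J/mol."""
    return KB * temperature / H * math.exp(-a / (R * temperature))

def sample_rate_constants(a0, cov, n_samples, temperature, seed=0):
    """Return [k_0, k_1, ..., k_B]; k_0 stems from the mean A_0.

    Assumption: activation free energies are drawn from a multivariate
    normal distribution with mean a0 and covariance cov.
    """
    rng = random.Random(seed)
    low = cholesky(cov)
    ensemble = [[eyring(a, temperature) for a in a0]]
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in a0]
        # Correlated draw: a_b = a0 + L z, with L the Cholesky factor of cov.
        a_b = [a0[i] + sum(low[i][m] * z[m] for m in range(i + 1))
               for i in range(len(a0))]
        ensemble.append([eyring(a, temperature) for a in a_b])
    return ensemble
```

Calling `sample_rate_constants([80e3, 90e3], [[4e6, 3e6], [3e6, 4e6]], 10, 298.15)` returns 11 vectors of two rate constants each, the first of which corresponds to the nominal sample.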

Note that the setup introduced here neglects third- and higher-order moments of the joint probability distribution of activation free energies, which may be a weak assumption for actual reaction networks derived from first principles. To avoid these limitations, one can always sample directly from the ensemble of electronic structure models [25,26], which requires repeated first-principles calculations and is therefore rather inefficient compared to sampling from $\boldsymbol{\Sigma}_A$. Another possibility, not yet explored by us, is the application of matrix algebra to construct special matrices that simplify expressions for higher-order moments of joint probability distributions (in particular, skewness and kurtosis) [54].

2.3 Overview of the KiNetX Meta-Algorithm

All reaction networks analyzed with KiNetX in this work were generated with AutoNetGen (Section 2.6). Both algorithms are written in Matlab [55]. For the numerical integration of (generally stiff) ODE systems, we interface to the ode15s module [56] of Matlab.

The KiNetX workflow consists of three core algorithms (Sections 2.4.1–2.4.3), all of which take the correlated uncertainty in the underlying model parameters (rate constants) explicitly into account:

1. Uncertainty propagation: Solve an ensemble of kinetic models derived from a unique reaction network and quantify the resulting uncertainty in the species concentrations (Section 2.4.1).

2. Network reduction: Identify and eliminate all kinetically irrelevant vertices and edges of the network by applying a hierarchy of flux analyses, resulting in a sparse network that is a comprehensible representation of the underlying reaction mechanism (Section 2.4.2).

3. Sensitivity analysis: Determine the effect of rate constant perturbations on time-dependent species concentrations through an extended version of Morris screening [57] (Section 2.4.3).

The minimal input requirements for KiNetX are:

• A chemical reaction network with $N$ vertices and $2L$ unidirectional edges (provided by AutoNetGen in this work).

• A set of $N$ initial species concentrations, $\mathbf{y}_0$ (provided by the user).

• An ensemble of $B + 1$ sets of $2L$ rate constants each,

$$K_B = \{\mathbf{k}_b\} \tag{10}$$

with $b = 0, \ldots, B$, which may be derived from an ensemble of electronic structure models based on, e.g., density functional theory [22,24,25] (provided by AutoNetGen in this work).

• A maximum reaction time, $t_\mathrm{max}$, representing a practical time scale or a time scale of interest (provided by the user).

At present, we require the input to be provided in SI units. Optional input parameters are:

• The maximum tolerable concentration error, $\varepsilon_y$, between the exact and an approximate solution.

• A flux threshold, $G_\mathrm{min}$, above which a chemical species will be considered kinetically relevant.

• The number of log-distributed time points, $U + 1$, between $t = 0$ and $t = t_\mathrm{max}$ at which species concentrations will be evaluated based on cubic spline interpolation.

• A confidence level, $\gamma$, important for assessing network properties derived from an ensemble of kinetic models.

• The number of Morris samples, $C$, considered in our global sensitivity analysis.

• Absolute and relative concentration error tolerances considered during numerical integration of ODE systems.

Every execution of KiNetX is based on one network with a fixed number of vertices and edges. Hence, for steering the exploration of reaction networks, KiNetX needs to be executed repeatedly. We designed KiNetX to suggest the next exploration step by estimating the kinetic relevance of every species based on the maximum rate of formation (Section 2.5). We will make the KiNetX program available through our webpage (scine.ethz.ch) in 2019.

Currently, KiNetX is limited to the analysis of homogeneous and isothermal reactive chemical systems in dilute solution, which constitute a major category of chemical systems found in nature and examined in chemical research. One prominent subcategory thereof that attracts much attention in fundamental and industrial research is homogeneous catalysis, a field that is rather underrepresented in the kinetic modeling literature [58]. For the study of dilute systems, collisions involving three or more reactive species can be considered negligible from a statistical point of view. For this reason, each element of the forward and backward reaction rate vectors, $\mathbf{f}^+$ and $\mathbf{f}^-$, is of the form

$$f_l^{+/-} = k_l^{+/-} \, y_{n_{l1}^{+/-}} \, y_{n_{l2}^{+/-}} \tag{11}$$

where the vertex index $n$ is determined by the edge index ($l$), the direction ($+$ or $-$), and the (arbitrary) position in the rate equation ($i = 1, 2$). In case of a unimolecular reaction, we simply set one of the two vertex indices to $n_{li}^{+/-} = 0$, denoting a hypothetical null-species with a defined constant value of $y_0 = 1$ independent of the unit of measurement.
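The null-species convention of Eq. (11) can be sketched in a few lines of Python. This is an illustrative sketch, not the KiNetX code: the function name and the data layout (one index pair per edge and direction) are assumptions. Index 0 is reserved for the null-species with a fixed value of 1, so that unimolecular and bimolecular steps share the same product form.

```python
def rate_vectors(k_fwd, k_bwd, lhs, rhs, conc):
    """Forward and backward rates of all edges according to Eq. (11).

    k_fwd, k_bwd: rate constants k_l^+ and k_l^- for every edge l.
    lhs, rhs:     per-edge index pairs (n_l1, n_l2); index 0 is the null-species.
    conc:         concentrations of the real species y_1, ..., y_N.
    """
    y = [1.0] + list(conc)  # prepend the null-species, y_0 = 1
    f_plus = [k * y[a] * y[b] for k, (a, b) in zip(k_fwd, lhs)]
    f_minus = [k * y[a] * y[b] for k, (a, b) in zip(k_bwd, rhs)]
    return f_plus, f_minus
```

For the reaction pair 2A ⇌ B with A as species 1 and B as species 2, the left-hand side is encoded as `(1, 1)` and the right-hand side as `(2, 0)`.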

2.4 The KiNetX Workflow

2.4.1 Uncertainty Propagation

The first step of our KiNetX workflow is the numerical integration of an ensemble of $B + 1$ kinetic models, each of them representing a unique set of rate constants. Clearly, the user may also choose $B = 0$ (leading to the usual kinetic modeling setup comprising a single numerical integration), but KiNetX is actually designed to analyze an ensemble of kinetic models.

To assess the magnitude of noise in the $B$ solutions ($y$-uncertainty) resulting from the ensemble of sets of rate constants ($k$-uncertainty), we introduce two measures,

$$\delta_B(t_u) = \frac{1}{2B} \sum_{b=1}^{B} \sum_{n=1}^{N} \left| y_{nb}(t_u) - \langle y_n(t_u) \rangle \right| \tag{12}$$

where

$$\langle y_n(t_u) \rangle = \frac{1}{B} \sum_{b=1}^{B} y_{nb}(t_u) \tag{13}$$

is the mean value of concentrations at time $t = t_u$, and

$$\Delta_B = \frac{1}{t_\mathrm{max}} \sum_{u=1}^{U} \delta_B(t_{u-1/2}) \cdot (t_u - t_{u-1}) \tag{14}$$

The factor $t_\mathrm{max}$ in the denominator of Eq. (14) equals the sum of time differences given that $t_0 = 0$,

$$t_\mathrm{max} = t_U = t_U - t_0 = \sum_{u=1}^{U} (t_u - t_{u-1}) \tag{15}$$

Furthermore, we define

$$t_{u-1/2} = \frac{1}{2}(t_{u-1} + t_u) \tag{16}$$

According to Eq. (12), $\delta_B(t_u)$ represents the ensemble-averaged $y$-uncertainty summed over all species at time $t = t_u$. Here, we focus on $\delta_B(t_\mathrm{max})$ as it measures the variability in the product distribution, a network property of particular interest for synthetic chemists. In contrast, $\Delta_B$ represents a time-averaged $y$-uncertainty, i.e., the overall variability of species concentrations between $t_0 = 0$ and $t_U = t_\mathrm{max}$. The combination of both measures may be valuable for determining the underlying reaction mechanism. For instance, if $\delta_B(t_\mathrm{max})$ is close to zero but $\Delta_B$ is rather large, it is likely that we are faced with one or both of the following two scenarios: (i) the sequence of elementary steps is identical or very similar for some $k$-vectors, but the uncertainty of the activation barrier associated with the rate-determining step is significant; (ii) different $k$-vectors suggest different routes to the same metastable sink of the reaction network. Furthermore, if both $\delta_B(t_\mathrm{max})$ and $\Delta_B$ are below a user-defined concentration error $\varepsilon_y$, we can safely neglect the ensemble of kinetic models and base all further kinetic analysis on the nominal sample represented by $\mathbf{k}_0$ only.
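The two noise measures can be written down compactly. The following Python sketch mirrors Eqs. (12) to (14) under the assumption that the concentrations of the B sampled models are already available on a common time grid and that the values entering the time average have been evaluated at the interval midpoints; function names and data layout are hypothetical.

```python
def delta_b(samples):
    """Eq. (12): samples[b][n] is the concentration of species n in sampled model b
    (b = 1, ..., B) at one fixed time point."""
    n_samples = len(samples)
    n_species = len(samples[0])
    # Eq. (13): ensemble mean of every species concentration
    means = [sum(s[n] for s in samples) / n_samples for n in range(n_species)]
    total = sum(abs(s[n] - means[n]) for s in samples for n in range(n_species))
    return total / (2 * n_samples)

def time_average(values_mid, times):
    """Eq. (14): midpoint-rule time average.

    values_mid[u - 1] is the measure evaluated at t_{u-1/2}; times holds
    t_0, ..., t_U, so that times[-1] - times[0] equals t_max for t_0 = 0.
    """
    t_max = times[-1] - times[0]
    weighted = sum(v * (times[u + 1] - times[u]) for u, v in enumerate(values_mid))
    return weighted / t_max
```

The same `time_average` helper applies to any quantity that is averaged over time with the midpoint rule, provided the midpoint values are supplied.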

When applying Eqs. (12) and (14), it is important that the number and identity of the time points $t_u$ are the same for all solutions. However, it is very unlikely that the time points obtained from an ODE solver are identical for two different $k$-vectors. For this reason, KiNetX calculates a log-distributed vector of reference time points, $(t_0, t_1, \ldots, t_U)^\top$ with $t_0 = 0$ and $t_U = t_\mathrm{max}$, which is a function of the user-defined parameters $t_\mathrm{max}$ and $U$. To obtain species concentrations at the same time points for other $k$-vectors, KiNetX interpolates the corresponding solutions through cubic spline smoothing. Note that spline smoothing is only reasonable if data noise is negligible, a property which one can assess easily in the case of one control variable (here, time). To evaluate the reliability of the cubic spline interpolations, we compared them against results from Gaussian process regression [59]. Under suitable assumptions, Gaussian process regression yields an optimal trade-off between fitting and smoothness (cubic splines only ensure the latter property). Hence, it can be employed to predict concentrations from previously unconsidered values in the time domain (here, the domain between $t_0 = 0$ and $t_U = t_\mathrm{max}$). Furthermore, as Gaussian process regression is a strictly Bayesian method, one may obtain reliable uncertainty estimates for each prediction. However, as this regression method scales cubically with the number of data points, it is not suitable for repetitive applications (usually, hundreds of regressions would be necessary). In all cases studied, we found that both the mean deviation between the two interpolation methods (cubic spline smoothing versus Gaussian process regression) and the mean predictive variance obtained from Gaussian process regression are negligibly small compared to the mean variance of the concentrations themselves (by a factor of $< 10^{-6}$). Hence, data noise is indeed negligible, and cubic splines work well for interpolating concentration profiles.
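How exactly the reference time points are distributed is not specified here. One possible construction that starts at zero, ends at the maximum reaction time, and depends on no further parameters is a grid that is evenly spaced in log(1 + t). The Python sketch below implements this particular choice; it is an assumption made for illustration and not necessarily the grid used by KiNetX.

```python
import math

def reference_times(t_max, n_intervals):
    """Return U + 1 time points with t_0 = 0 and t_U = t_max.

    Assumption: the points are evenly spaced in log(1 + t), which permits
    t_0 = 0 while still resolving early times more finely than late times.
    """
    step = math.log1p(t_max) / n_intervals
    return [math.expm1(u * step) for u in range(n_intervals + 1)]
```

With such a grid, early reaction times are resolved more finely than late ones, which is the purpose of a log-distributed grid.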

2.4.2 Network Reduction

We introduce a hierarchy of two reduction algorithms with increasing sophistication (in terms of both rigor and required computing resources), which identify and eliminate kinetically irrelevant species and elementary reactions. Initially, we perform a detailed flux analysis, followed by a local barrier analysis if the former analysis suggests a reduced model resulting in a concentration error that exceeds the user-defined threshold $\varepsilon_y$. Both algorithms reflect our interpretation of established kinetic modeling concepts [34].

Detailed Flux Analysis (DFA). To keep track of the net concentration that has passed an edge between $t = 0$ and $t = t_\mathrm{max}$, we integrate the absolute values of $f_l(t)$ over that time interval,

$$F_l = \int_{t=0}^{t=t_\mathrm{max}} |f_l(t)| \, \mathrm{d}t \tag{17}$$

which we define as the edge flux corresponding to the $l$-th reaction pair. The vector of edge fluxes, $\mathbf{F}$, is the basis for determining the vertex flux corresponding to the $n$-th species,

$$G_n = \left(\mathbf{s}_n^+ + \mathbf{s}_n^-\right) \mathbf{F} \tag{18}$$

where $\mathbf{s}_n^+$ and $\mathbf{s}_n^-$ represent the $n$-th row of $\mathbf{S}^+$ and $\mathbf{S}^-$, respectively. Our DFA implementation approximates the edge flux according to

$$F_l \approx F_l^\mathrm{DFA} = \sum_{u=1}^{U} |f_l(t_{u-1/2})| \cdot (t_u - t_{u-1}) \tag{19}$$

Analogous to the exact expression, the approximate vertex flux reads

$$G_n^\mathrm{DFA} = \left(\mathbf{s}_n^+ + \mathbf{s}_n^-\right) \mathbf{F}^\mathrm{DFA} \tag{20}$$

If $G_n^\mathrm{DFA} < G_\mathrm{min}$, where $G_\mathrm{min}$ is a user-defined threshold, the $n$-th vertex will be removed from the kinetic model, except for the case where the $n$-th vertex represents a reactant. All edges that were connected to at least one of the removed vertices will be removed, too. After this procedure, the number of vertices and edges should have decreased significantly.
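The DFA steps above (midpoint approximation of the edge fluxes, vertex fluxes, and thresholding) can be sketched in Python as follows. The sketch assumes that the stoichiometric matrices of left-hand-side and right-hand-side species are given as nested lists with one row per vertex and one column per edge, and that the net edge rates have already been evaluated at the interval midpoints. All names are hypothetical; this is not the KiNetX implementation.

```python
def edge_fluxes(net_rates_mid, times):
    """Eq. (19): net_rates_mid[u][l] is f_l at the midpoint of the u-th interval."""
    n_edges = len(net_rates_mid[0])
    flux = [0.0] * n_edges
    for u, rates in enumerate(net_rates_mid):
        dt = times[u + 1] - times[u]
        for l in range(n_edges):
            flux[l] += abs(rates[l]) * dt
    return flux

def vertex_fluxes(s_plus, s_minus, flux):
    """Eq. (20): G_n = (s_n^+ + s_n^-) . F for every vertex n."""
    return [sum((sp + sm) * f for sp, sm, f in zip(row_p, row_m, flux))
            for row_p, row_m in zip(s_plus, s_minus)]

def reduce_network(s_plus, s_minus, flux, g_min, reactants):
    """Return the kept vertices (set) and kept edges (list) after thresholding."""
    g = vertex_fluxes(s_plus, s_minus, flux)
    # Reactants are always kept, irrespective of their vertex flux.
    kept_v = {n for n, g_n in enumerate(g) if g_n >= g_min or n in reactants}
    # An edge survives only if all vertices it connects survive.
    kept_e = [l for l in range(len(flux))
              if all(n in kept_v for n in range(len(g))
                     if s_plus[n][l] + s_minus[n][l] > 0)]
    return kept_v, kept_e
```

For a linear chain 0 ⇌ 1 ⇌ 2 with edge fluxes `[1.0, 1e-5]` and a threshold of `1e-3`, vertex 2 and the second edge are removed, whereas the reactant (vertex 0) and vertex 1 are kept.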

Note that in the case of a closed system, $G_n^\mathrm{DFA}$ is approaching a finite value with increasing time, which renders $G_\mathrm{min}$ an intuitive threshold that resembles a concentration error. The assumption behind our DFA is based on this intuitive interpretation: If a reaction channel with a flux smaller than $G_\mathrm{min}$ is removed, the redistribution of this minute amount of flux (i.e., concentration) should yield a concentration error that is comparable to $G_\mathrm{min}$. Therefore, we have a means at hand that potentially allows us to control the concentration error introduced by a specific value of $G_\mathrm{min}$. Clearly, choosing a smaller value of $G_\mathrm{min}$ will increase the reliability of the DFA-based solution but also decrease the possible degree of network reduction.

To determine the accuracy loss caused by a DFA, we introduce the measures

$$\delta_{pq}(t_u) = \frac{1}{2} \sum_{n=1}^{N} \left| y_{np}(t_u) - y_{nq}(t_u) \right| \tag{21}$$

and

$$\Delta_{pq} = \frac{1}{t_\mathrm{max}} \sum_{u=1}^{U} \delta_{pq}(t_{u-1/2}) \cdot (t_u - t_{u-1}) \tag{22}$$

which resemble the control quantities $\delta_B(t_u)$ and $\Delta_B$. Here, however, we compare two specific solutions, $\mathbf{y}_p(t)$ and $\mathbf{y}_q(t)$, with each other, one resulting from the original network and the other resulting from the corresponding sparse network obtained through our DFA. Note that the number of vertices differs for the original and sparse networks. Hence, to render Eq. (21) practical, we define $N$ to be the number of vertices contained in the original network and set all elements in $\mathbf{y}_\mathrm{sparse}(t)$ to zero that refer to vertices not contained in the sparse network. In case that all of the $B + 1$ kinetic models yield both $\delta_{pq}(t_u)$-values (again, we focus on $t_u = t_U = t_\mathrm{max}$) and $\Delta_{pq}$-values that are below a user-defined concentration error $\varepsilon_y$, KiNetX considers the DFA-based network reduction reliable. However, if there is at least one kinetic model that yields $\delta_{pq}(t_\mathrm{max}) > \varepsilon_y$ or $\Delta_{pq} > \varepsilon_y$, it depends on the user-defined confidence level $\gamma$ whether the DFA-based network reduction is considered reliable or not. We require the confidence level to fulfill the condition $0 \le \gamma \le 1$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_\mathrm{max})$ and/or $\Delta_{pq}$ exceeds $\varepsilon_y$, the DFA-based network reduction is considered unreliable. Here, the lowest and highest possible quantiles represent the median and the maximum of both measures over all $B + 1$ samples, respectively.

Local Barrier Analysis (LBA). There are certain cases in which a DFA-based network reduction is prone to yield unreliable results. One example is autocatalysis. Imagine the following model reaction network:

2A ⇌ B (very slow)
B ⇌ C (fast)
C + A ⇌ D (fast)
D + A ⇌ E (fast)
E ⇌ 2C (fast)

Even though the first reaction step is very slow, it is required to initiate all other reaction steps. Without B, neither C nor D nor E can be formed. Even though B is obviously a very important species for the reaction mechanism, it will only be relevant at the very beginning of the reaction (i.e., until a minute amount of it has been formed). Hence, the vertex flux for B, $G_\mathrm{B}$, will be very small, possibly smaller than the user-defined threshold $G_\mathrm{min}$, leading to a false elimination of this species and the first two of the five edges.

In this case, $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$ will clearly exceed $\varepsilon_y$, and the LBA will start automatically. The idea behind our LBA algorithm is fairly simple: Set the rate constants of the first reaction pair to zero, which is equivalent to an infinitely high activation barrier or the removal of the corresponding edge. Repeat numerical integration and compare the solution to the original one based on $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. Set the rate constants of the first reaction pair to their original values. Repeat the entire procedure for the second to $L$-th reaction pair and for each of the $B + 1$ kinetic models.

After this procedure, every reaction pair is associated with $B + 1$ values for $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_\mathrm{max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the corresponding reaction pair will be considered kinetically irrelevant and removed from the network. All unconnected vertices will be removed, too. Subsequently, the resulting $B + 1$ reduced kinetic models are integrated, and their solutions are compared to the original ones, again based on $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_\mathrm{max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the LBA-based network reduction will be considered reliable. It is still possible that $\varepsilon_y$ is exceeded, as the kinetic models we consider are generally nonlinear: If the lack of one edge does not alter the solution and the same result is obtained for another edge, it does not imply that the simultaneous lack of both edges leads to the same conclusion. If $\varepsilon_y$ is still exceeded after the LBA-based network reduction, KiNetX will recover the original network and continue its analysis without any reduction.
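The edge-by-edge loop of the LBA and the quantile criterion controlled by the confidence level can be sketched as follows. This Python sketch rests on several assumptions: the numerical integration and the comparison of two solutions are hidden behind a hypothetical callback `solution_error`, which is taken to return the larger of the two error measures; the quantile is computed with the nearest-rank convention; and a reaction pair is flagged as irrelevant if the quantile does not exceed the error tolerance.

```python
import math

def quantile(values, q):
    """Empirical q-quantile (0 <= q <= 1), nearest-rank convention."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def is_reliable(errors, gamma, eps_y):
    """True if the (1 + gamma)/2 quantile of the errors does not exceed eps_y."""
    return quantile(errors, (1.0 + gamma) / 2.0) <= eps_y

def local_barrier_analysis(ensemble, solution_error, gamma, eps_y):
    """Return the indices of reaction pairs judged kinetically irrelevant.

    ensemble:       B + 1 rate-constant sets, each a list of (k_plus, k_minus) pairs.
    solution_error: hypothetical callback (k_original, k_modified) -> float that
                    integrates both models and compares their solutions.
    """
    n_pairs = len(ensemble[0])
    irrelevant = []
    for l in range(n_pairs):
        errors = []
        for k in ensemble:
            k_off = list(k)
            k_off[l] = (0.0, 0.0)  # infinitely high barrier: edge l switched off
            errors.append(solution_error(k, k_off))
        if is_reliable(errors, gamma, eps_y):
            irrelevant.append(l)
    return irrelevant
```

With a confidence level of 0, the criterion reduces to the median of the errors over all kinetic models; with a confidence level of 1, it reduces to their maximum.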

2.4.3 Sensitivity Analysis

In the following, we assume that the concentration profiles of the sparse reaction network reveal pronounced uncertainties such that it is difficult to correctly assign product ratios or to suggest a specific reaction mechanism. To estimate to which rate constants the concentration profiles are most sensitive, a sensitivity analysis is required [34]. For this purpose, the rate constants are perturbed to study how such perturbations affect the time-dependent species concentrations.

In a local sensitivity analysis, the model parameters are perturbed one by one from their nominal values (here, the elements of $\mathbf{k}_0$). While a local sensitivity analysis is straightforward to conduct and computationally feasible (usually, $L$ kinetic simulations are required), it has a significant disadvantage in that it does not take into account the correlation between the model parameters (cf. Section 2.2). Consequently, one may overlook important correlation effects on the uncertainty of concentration profiles. Note that our LBA algorithm resembles an extreme variant of a local sensitivity analysis (cf. Section 2.4.2), where each rate constant is perturbed to its minimal value.

In a global sensitivity analysis, the correlation between the model parameters is taken into account, but the process requires significantly more computational resources than a local sensitivity analysis. Furthermore, the design of a global sensitivity analysis is not as unambiguous as for a local sensitivity analysis, which explains the existence of several approaches that may require $CL$ (Morris screening, where $C$ is an integer usually much smaller than $B$), $B$ (polynomial chaos), or $2BL$ (Sobol's method) numerical integrations [34].

Morris screening [57] is among the simplest of global sensitivity analyses, as it is not designed to induce a mapping between the model parameters and the model solution. Instead, its purpose is to categorize the model parameters as either important or unimportant depending on how strongly changes in them affect the model solutions. We are particularly interested in this categorization since we aim for a descriptor that informs us about the quality of rate constants obtained from an efficient (basic) quantum chemical model. If we find that the uncertainty of some species concentrations is too large to derive sensible conclusions about specific network properties, the results of a global sensitivity analysis will support us in identifying the most critical rate constants that require a reevaluation based on a more sophisticated (benchmark) quantum chemical model.

The original version of Morris screening does not take into account the joint probability distribution of the model parameters. Here, we introduce an extended variant of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample $\mathbf{k}_0$ and replaces the elements $k_{0,1}^+$ and $k_{0,1}^-$ with the corresponding elements of $\mathbf{k}_1$, $k_{1,1}^+$ and $k_{1,1}^-$. For this new vector of rate constants, $\mathbf{k}_{0_1}$, a numerical integration is carried out. Subsequently, the elements $k_{0,2}^+$ and $k_{0,2}^-$ of $\mathbf{k}_{0_1}$ are replaced with the elements $k_{1,2}^+$ and $k_{1,2}^-$ of $\mathbf{k}_1$, and a numerical integration based on the new vector $\mathbf{k}_{0_2}$ is carried out. This procedure is repeated $L$ times until we arrive at $\mathbf{k}_{0_L} = \mathbf{k}_1$, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is repeated now by an element-wise change from $\mathbf{k}_1$ to $\mathbf{k}_2$, and generally by an element-wise change from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ until $b = C - 1$ is reached. In the end, we will have generated solutions for another $C(L - 1)$ kinetic models in addition to the $B + 1$ solutions obtained for $K_B$. We are interested in the $C(L - 1)$ new solutions and the first $C + 1$ of the $B + 1$ solutions obtained previously, amounting to $CL + 1$ solutions relevant for our sensitivity analysis.

For each of these solutions, KiNetX determines the $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$ metrics, where $p$ and $q$ represent two adjacent $k$-vectors as constructed by our extended version of Morris screening. $CL$ values are obtained for each metric, $C$ thereof for every reaction pair. We define the sensitivity coefficients $z_l^\delta$ and $z_l^\Delta$ as

$$z_l^\delta = \frac{1}{C} \sum_{b=0}^{C-1} \delta_{b_l b_{l-1}}(t_\mathrm{max}) \tag{23}$$

and

$$z_l^\Delta = \frac{1}{C} \sum_{b=0}^{C-1} \Delta_{b_l b_{l-1}} \tag{24}$$

respectively. The subscript $b_l$ refers to sample $\mathbf{k}_{b_l}$, and we define $\mathbf{k}_{b_0} = \mathbf{k}_b$. Note that in Eqs. (23) and (24), we do not divide the $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$ metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term conditional relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.
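The element-wise walk between consecutive samples and the averaging in Eqs. (23) and (24) can be sketched in Python. The sketch is illustrative only: the comparison of two solutions (numerical integration followed by evaluation of one of the two metrics) is hidden behind a hypothetical callback `metric`, and every rate-constant vector is represented as a list of (forward, backward) pairs.

```python
def morris_path(ensemble, n_morris):
    """Yield (b, l, k_bl) for b = 0, ..., C - 1 and l = 1, ..., L.

    k_bl agrees with k_{b+1} in the reaction pairs 1, ..., l and with k_b in
    the remaining pairs, so that k_bL equals k_{b+1}.
    """
    n_pairs = len(ensemble[0])
    for b in range(n_morris):
        current = list(ensemble[b])
        for l in range(1, n_pairs + 1):
            current[l - 1] = ensemble[b + 1][l - 1]  # swap in pair l of k_{b+1}
            yield b, l, list(current)

def sensitivity_coefficients(ensemble, n_morris, metric):
    """Average the metric between adjacent vectors k_{b,l-1} and k_{b,l} over b."""
    n_pairs = len(ensemble[0])
    z = [0.0] * n_pairs
    previous = None
    for b, l, k_bl in morris_path(ensemble, n_morris):
        if l == 1:
            previous = list(ensemble[b])  # k_{b,0} is defined as k_b
        z[l - 1] += metric(k_bl, previous) / n_morris
        previous = k_bl
    return z
```

The generator visits C·L vectors in total, C of which coincide with samples that were already integrated, in line with the count of C(L − 1) additional integrations given above.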

Finally, we can set up a ranking of sensitivity coefficients. The larger $z_l^\delta$ and $z_l^\Delta$, the larger the effect of changes in $k_l^{+/-}$ on the concentration profiles. With this ranking at hand, one can systematically improve on both the accuracy and precision of the concentration profiles to reliably suggest product distributions (based on $z_l^\delta$) or specific reaction mechanisms (based on $z_l^\Delta$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be unfeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well-known in the reaction engineering community. It has been introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas phase reactions (in particular, combustion) [62], which has recently been extended to understand the kinetics of solution phase chemistry [63]. Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution phase chemistry. As the uncertainty of a thermal rate constant caused by a given free-energy uncertainty is a decreasing function of temperature in classical rate theories, $k$-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take $k$-uncertainty into account.
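The temperature argument can be made explicit with a short worked equation, assuming Eyring's transition state theory and a fixed temperature. Since $\ln k$ is linear in the activation free energy $A^\ddagger$, a free-energy uncertainty $\sigma_A$ translates directly into an uncertainty of $\ln k$ that scales with $1/T$:

```latex
k(T) = \frac{k_\mathrm{B} T}{h} \exp\!\left(-\frac{A^\ddagger}{RT}\right)
\quad\Longrightarrow\quad
\ln k = \ln\frac{k_\mathrm{B} T}{h} - \frac{A^\ddagger}{RT}
\quad\Longrightarrow\quad
\sigma_{\ln k} = \frac{\sigma_A}{RT}
```

As an illustration (the combustion-like temperature of 1500 K is chosen here only as an example), $\sigma_A = 1$ kcal mol$^{-1}$ yields $\sigma_{\ln k} \approx 1.69$ at 298.15 K, i.e., a factor of about 5.4 in $k$, but only $\sigma_{\ln k} \approx 0.34$ at 1500 K, i.e., a factor of about 1.4.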

We start from the reactants (species with nonzero initial concentrations) and attempt to find all direct products, i.e., intermediates that are formed by a single elementary reaction of the reactants. The reactants are considered active, whereas the direct products are considered inactive. Only active species are considered in the exploration procedure, i.e., at this point, all possible reaction steps among the active species have been discovered (according to the exploration algorithm employed). To estimate which of the inactive species is potentially the most relevant for the mechanism, we focus on the formation rate of each inactive species (indicated by an asterisk),

$$\vec{g}_{n^*}(t_u) = \mathbf{s}_{n^*}^+ \mathbf{f}^-(t_u) + \mathbf{s}_{n^*}^- \mathbf{f}^+(t_u) \tag{25}$$

Note that $\mathbf{s}_{n^*}^+$ is multiplied with $\mathbf{f}^-(t)$ since in a backward ($-$) reaction, the left-hand-side species ($+$) are formed. An equivalent argument holds for the multiplication of $\mathbf{s}_{n^*}^-$ with $\mathbf{f}^+(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

$$\vec{g}_{n^*}^{\,\mathrm{max}} = \max\left\{ \vec{g}_{n^*}(t_0), \vec{g}_{n^*}(t_1), \ldots, \vec{g}_{n^*}(t_U) \right\} \tag{26}$$

The species with the highest value of $\vec{g}_{n^*}^{\,\mathrm{max}}$ will be promoted to active. Note that if an ensemble of $k$-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

$$\eta\left(\vec{g}_{n^*}^{\,\mathrm{max}}\right) = \sqrt{ \left\langle \vec{g}_{n^*}^{\,\mathrm{max}} \right\rangle^2 + \frac{1}{B - 1} \sum_{b=1}^{B} \left( \vec{g}_{n^* b}^{\,\mathrm{max}} - \left\langle \vec{g}_{n^*}^{\,\mathrm{max}} \right\rangle \right)^2 } \tag{27}$$

that takes into account both the ensemble average of maximum formation rates,

$$\left\langle \vec{g}_{n^*}^{\,\mathrm{max}} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \vec{g}_{n^* b}^{\,\mathrm{max}} \tag{28}$$

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
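The ranking statistic of Eqs. (27) and (28) is straightforward to evaluate. The Python sketch below assumes that the maximum formation rate of every inactive species has already been determined for each of the B sampled models; the function names and the dictionary layout are hypothetical.

```python
import math

def eta(max_rates):
    """Eq. (27): square root of (ensemble mean)^2 plus the sample variance.

    max_rates holds the B maximum formation rates of one inactive species.
    """
    b = len(max_rates)
    mean = sum(max_rates) / b                                  # Eq. (28)
    var = sum((g - mean) ** 2 for g in max_rates) / (b - 1)
    return math.sqrt(mean ** 2 + var)

def next_active(candidates):
    """candidates maps each inactive species to its list of maximum formation rates;
    the species with the largest eta is returned."""
    return max(candidates, key=lambda s: eta(candidates[s]))
```

For two species with the same mean, e.g. `{"a": [1.0, 1.0, 1.0], "b": [0.5, 1.0, 1.5]}`, the statistic favors the high-variance species `"b"`.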

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\vec{g}_{n^*}^{\,\mathrm{max}}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold: In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between $t = 0$ and $t = t_\mathrm{max}$) are being generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

• The number of samples, $B$, to be drawn from $\boldsymbol{\Sigma}_A$.

• A thermostat temperature, $T$, for the calculation of rate constants.

• The average free-energy increase/decrease, $\mu(A^- - A^+)$, for a new intermediate state ($-$) formed by reaction of an intermediate state already present in the network ($+$).

• The average difference between the free energies of a left-hand-side intermediate state ($+$) and its corresponding right-hand-side intermediate state ($-$), $\sigma(A^- - A^+)$.

• A minimum activation free energy, $\min(A^\ddagger - A^{+/-})$, with respect to the higher-energy intermediate state.

• A maximum activation free energy, $\max(A^\ddagger - A^{+/-})$, with respect to the higher-energy intermediate state.

• The average free-energy uncertainty, $\langle \sigma_A \rangle$ (required for generating an ensemble of kinetic models).

• The maximum number of edges, $L_\mathrm{max}$, to be generated.

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of $N = 103$ vertices and $L = 118$ edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L$^{-1}$ and 0.50 mol L$^{-1}$, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of $B + 1 = 25$ $k$-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol$^{-1}$. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol$^{-1}$. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature $T = 298.15$ K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.
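To put these numbers into perspective, the following Python sketch converts an uncertainty in the activation free energy into the corresponding multiplicative change of an Eyring rate constant at fixed temperature. It is a back-of-the-envelope illustration with a hypothetical function name, not part of KiNetX.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def rate_factor(sigma_a_kcal, temperature):
    """Factor by which an Eyring rate constant changes when the activation
    free energy is shifted by sigma_a_kcal (in kcal/mol) at fixed temperature."""
    return math.exp(sigma_a_kcal / (R_KCAL * temperature))
```

For an activation free-energy uncertainty of 1.1 kcal mol$^{-1}$ at 298.15 K, `rate_factor(1.1, 298.15)` amounts to a factor of about 6.4 in the rate constant.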

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L$^{-1}$ during the course of reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at $t = t_\mathrm{max}$. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L$^{-1}$ and 0.9 mol L$^{-1}$, the observable value of which will in turn affect the concentrations of the potential side products, 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of $\varepsilon_y = 0.01$ mol L$^{-1}$, we find that a DFA-based network reduction with $G_\mathrm{min} = 1.0 \times 10^{-3}$ mol L$^{-1}$ is reliable with respect to both $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. Here, we chose the minimum confidence level of $\gamma = 0$ for the sake of clarity. The smaller $\gamma$, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose $G_\mathrm{min}$ to be one order of magnitude smaller than $\varepsilon_y$. The reason is that $G_\mathrm{min}$ is assessed on a species-by-species basis, whereas $\varepsilon_y$ is compared against the quantities $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$, which measure the concentration error summed over all species. One may resolve this situation by dividing $G_\mathrm{min}$ by $N$, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of $k$-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For C = 5 Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for δpq(tmax) and/or ∆pq exceeding εy with respect to the quantile specified by γ. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L⁻¹. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L⁻¹, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval tu − tu−1, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42, the other dominant side product, via 30 and 14. In summary, the dominant reaction pathways read

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than εy, as measured by δpq(tmax) and ∆pq. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol⁻¹ with a standard deviation of 0.4 kcal mol⁻¹. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on k0. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several instead of a single solution naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to 1.0 × 10⁻¹⁷ mol L⁻¹ s⁻¹ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, tmax), we obtain a flux of 3.7 × 10⁻¹³ mol L⁻¹, which is very small compared to εy = 0.01 mol L⁻¹. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.

Table 1: Mean values and standard deviations of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                       µ(ζ, CRN-0)   µ(ζ, CRN-B)   σ(ζ, CRN-0)   σ(ζ, CRN-B)
exploration steps                12            14            7             7
critical reactions               5             10            4             7
Lexpl                            54            60            26            27
Nexpl                            29            31            11            11
Lred                             30            37            18            21
Nred                             17            20            8             8
δB,expl(tmax) (mol L⁻¹)          –             0.14          –             0.13
∆B,expl (mol L⁻¹)                –             0.15          –             0.13
δB,refined(tmax) (mol L⁻¹)       –             <0.01         –             0.01
∆B,refined (mol L⁻¹)             –             <0.01         –             0.01
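The flux estimate discussed in this subsection can be verified in one line. The sketch below assumes that the smallest recorded formation rate acts over the entire reaction time and uses the value tmax = 36954 s listed in Table 2.

```latex
\[
  G \;\le\; r_{\min}\, t_{\max}
  = \left(1.0 \times 10^{-17}\ \mathrm{mol\,L^{-1}\,s^{-1}}\right)
    \left(36954\ \mathrm{s}\right)
  \approx 3.7 \times 10^{-13}\ \mathrm{mol\,L^{-1}}
  \;\ll\; \varepsilon_y = 0.01\ \mathrm{mol\,L^{-1}} .
\]
```

The resulting flux is more than ten orders of magnitude below the concentration error tolerance, which illustrates how small a rate threshold would have to be to retain such a channel.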

DFA-based network reduction with a flux threshold of Gmin = 1.0 × 10⁻⁵ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument of the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of max{δB(tmax), ∆B} from 0.15 mol L⁻¹ to < 0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all B + 1 = 6 samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
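The mimicked refinement can be illustrated with a small sketch. It is not the KiNetX implementation: the three-vertex cycle, the energy values, and all function names are hypothetical and serve only to show why averaging absolute free energies keeps the ensemble thermodynamically consistent.

```python
from statistics import mean

# Hypothetical toy network: a cycle of three isomerizations 0 -> 1 -> 2 -> 0.
# Each edge is stored as (left-hand vertex, right-hand vertex).
EDGES = [(0, 1), (1, 2), (2, 0)]


def activation_energies(sample):
    """Forward and backward activation free energies from absolute free energies."""
    v, e = sample["vertices"], sample["edges"]
    forward = [e[l] - v[a] for l, (a, b) in enumerate(EDGES)]
    backward = [e[l] - v[b] for l, (a, b) in enumerate(EDGES)]
    return forward, backward


def refine(samples, critical_edges):
    """Replace the absolute free energies of critical edges, and of the
    vertices they connect, by their ensemble means."""
    vertices = {v for l in critical_edges for v in EDGES[l]}
    v_mean = {v: mean(s["vertices"][v] for s in samples) for v in vertices}
    e_mean = {l: mean(s["edges"][l] for s in samples) for l in critical_edges}
    return [
        {
            "vertices": [v_mean.get(i, a) for i, a in enumerate(s["vertices"])],
            "edges": [e_mean.get(l, a) for l, a in enumerate(s["edges"])],
        }
        for s in samples
    ]


def cycle_imbalance(sample):
    """Sum of reaction free energies around the cycle (zero if consistent)."""
    forward, backward = activation_energies(sample)
    return sum(f - b for f, b in zip(forward, backward))


# Two sampled models with absolute free energies in kJ/mol (made-up numbers).
samples = [
    {"vertices": [0.0, -10.0, -25.0], "edges": [40.0, 55.0, 60.0]},
    {"vertices": [1.5, -12.0, -23.0], "edges": [43.0, 52.0, 63.0]},
]
refined = refine(samples, critical_edges=[0])
```

After refinement, the forward and backward barriers of the critical edge coincide in both samples, and the reaction free energies still sum to zero around the cycle because every barrier is derived from one set of absolute free energies per sample. Averaging individual forward or backward barriers directly would not guarantee this.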

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of max{δpq(tmax), ∆pq} comparing the CRN-0 solutions against the "exact" solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase achieved by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered an additional number of 5 kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be ambiguous to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties, such as the percentage of critical (highly noise-inducing) reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps, as they may be strongly dependent on the underlying graph structure, the distribution of activation free energies or rate constants, as well as their correlated uncertainties. It is not known to us that the literature on solution chemistry (or chemistry in general) offers enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is in principle able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time tmax, which we set to the half-life of a unimolecular rate constant corresponding to a barrier height of 100 kJ mol⁻¹. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems several activation barriers may be much higher in energy, and hence we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks.

1. For the construction of the (x + 1)-th network layer, we first define all potentially reactive intermediate states. At that stage, we register N = N0 + ··· + Nx vertices. Since we already explored all possible reactions between the first N − Nx species, we will only consider the following potentially reactive intermediate states: Nx unimolecular intermediate states (leading to reactions of type A → P) defined by the Nx species of the x-th network layer, Nx homobimolecular intermediate states (type 2A → P), and Nx(Nx − 1)/2 + Nx(N − Nx) heterobimolecular intermediate states (type A + B → P). The first Nx(Nx − 1)/2 heterobimolecular states represent all possible nonredundant combinations of the Nx new species among each other, whereas the latter Nx(N − Nx) represent all possible combinations between the Nx new species and the N − Nx old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean µexp. The user can specify two different values for µexp: one for unimolecular reactions, µexp,uni, and one for bimolecular reactions, µexp,bi. The floor value of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter p(Vnew), ranging from zero to one. If p(Vnew) = 0, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if p(Vnew) = 1. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), 2L activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is sampled from a normal distribution with mean µ(A⁻ − A⁺) and standard deviation σ(A⁻ − A⁺), which is added to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is sampled from a uniform distribution with bounds min(A‡ − A+/−) and max(A‡ − A+/−), which is added to the higher-energy side of an edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix ΣA as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation 2〈σA〉. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
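Steps 1 and 2 above can be sketched in a few lines. The function names and the list-based bookkeeping of layer sizes are our own illustrative choices, not the AutoNetGen interface; only the counting formulas, the floored exponential draw, and the role of p(Vnew) are taken from the description.

```python
import math
import random


def count_reactive_states(layer_sizes):
    """Count the potentially reactive intermediate states for layer x + 1.

    layer_sizes = [N_0, ..., N_x]; the last entry is the newest layer.
    """
    n_total = sum(layer_sizes)   # N = N_0 + ... + N_x
    n_new = layer_sizes[-1]      # N_x, species of the x-th layer
    n_old = n_total - n_new      # N - N_x, already fully explored
    return {
        "unimolecular": n_new,                       # A -> P
        "homobimolecular": n_new,                    # 2A -> P
        "heterobimolecular": n_new * (n_new - 1) // 2 + n_new * n_old,  # A + B -> P
    }


def sample_channel_count(mean_channels, rng):
    """Floor of a draw from an exponential distribution with the given mean."""
    # random.Random.expovariate expects the rate, i.e., the inverse mean.
    return math.floor(rng.expovariate(1.0 / mean_channels))


def links_to_new_vertex(p_new, rng):
    """Decide whether a channel creates a new vertex (True) or reconnects
    to a vertex of an earlier layer (False)."""
    # rng.random() lies in [0, 1): p_new = 0 never yields True, p_new = 1 always does.
    return rng.random() < p_new


rng = random.Random(42)  # fixed seed for reproducibility of the sketch
states = count_reactive_states([2, 3])
channels = sample_channel_count(5.0, rng)
```

For two old and three new species, the sketch yields 3 unimolecular, 3 homobimolecular, and 3 + 6 = 9 heterobimolecular states. Mass balance, which AutoNetGen enforces for every reaction, is deliberately left out here.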

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$ k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right) \tag{29} $$

where h, kB, R, and T are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
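A minimal sketch of this conversion is given below. The function names are ours, the physical constants are the exact SI values, and the barrier is assumed to be given in kJ mol⁻¹ as in Table 2. The example also reproduces the order of magnitude of the reaction time tmax, which is defined as the half-life of a unimolecular reaction with a barrier of 100 kJ mol⁻¹.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J K^-1
H = 6.62607015e-34   # Planck constant in J s
R = 8.314462618      # gas constant in J mol^-1 K^-1


def eyring_rate_constant(barrier_kj_mol, temperature=298.15):
    """Rate constant k = (k_B T / h) exp(-barrier / (R T)).

    The barrier is the activation free energy in kJ/mol; for a
    unimolecular reaction the result has units of s^-1.
    """
    barrier_j_mol = barrier_kj_mol * 1.0e3
    return (K_B * temperature / H) * math.exp(-barrier_j_mol / (R * temperature))


def half_life(rate_constant):
    """Half-life of a first-order process with the given rate constant."""
    return math.log(2.0) / rate_constant


k_100 = eyring_rate_constant(100.0)   # barrier of 100 kJ/mol at 298.15 K
t_half = half_life(k_100)             # roughly 3.7e4 s
```

With these constants, the half-life comes out at roughly 3.7 × 10⁴ s, close to the tmax value listed in Table 2. A small deviation from the tabulated number is to be expected if slightly different values of the constants were used.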

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are strictly nonnegative. We outline a simple recipe to construct a random covariance matrix ΣA which fulfills the condition that its largest eigenvalue equals σ²A,max. Defining σ²A,max to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by σA,max.

1. Generate a random (N × N)-dimensional matrix P with elements that are uniformly sampled from [−0.5, +0.5]. N is the number of vertices in the network.

2. The multiplication of P with its transpose leads to an (N × N)-dimensional matrix Q = PPᵀ that already is a covariance matrix [65] with eigenvalues Eii.

3. The covariance matrix ΣA results from a rescaling of Q,

$$ \Sigma_A = Q \, \frac{\langle \sigma_A \rangle^2}{\max_i E_{ii}} $$

which implies that the largest eigenvalue of ΣA equals 〈σA〉².
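The three steps above can be sketched without external libraries as follows. The function names are ours, and the largest eigenvalue is estimated by power iteration, which is our choice for a dependency-free illustration. In practice, a linear-algebra library routine for symmetric eigenvalue problems would be the natural tool.

```python
import random


def random_matrix(n, rng):
    """Step 1: (n x n) matrix with elements sampled uniformly from [-0.5, +0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]


def times_own_transpose(p):
    """Step 2: Q = P P^T, which is symmetric and positive semidefinite."""
    n = len(p)
    return [[sum(p[i][k] * p[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def largest_eigenvalue(q, iterations=2000):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(q)
    v = [1.0 / n ** 0.5] * n
    for _ in range(iterations):
        w = [sum(q[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    qv = [sum(q[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, qv))  # Rayleigh quotient v^T Q v


def random_covariance(n, sigma, rng):
    """Step 3: rescale Q so that its largest eigenvalue equals sigma**2."""
    q = times_own_transpose(random_matrix(n, rng))
    scale = sigma ** 2 / largest_eigenvalue(q)
    return [[scale * x for x in row] for row in q]


rng = random.Random(0)                    # fixed seed for reproducibility
cov = random_covariance(6, 2.09, rng)     # sigma in kJ/mol, value from Table 2
```

Because every diagonal element of a positive-semidefinite matrix is bounded by its largest eigenvalue, all vertex variances in the rescaled matrix are at most σ² (up to the accuracy of the eigenvalue estimate). This is the bound on the free-energy uncertainties mentioned above.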

Note that in a practical setup, we expect the user to provide B + 1 k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.

Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                  CRN-X         CRN-0         CRN-B

AutoNetGen
B + 1                            25            1             6
T (K)                            298.15        298.15        298.15
µ(A⁻ − A⁺) (kJ mol⁻¹)            −25           −25           −25
σ(A⁻ − A⁺) (kJ mol⁻¹)            50            50            50
min(A‡ − A+/−) (kJ mol⁻¹)        0             0             0
max(A‡ − A+/−) (kJ mol⁻¹)        100           100           100
〈σA〉 (kJ mol⁻¹)                 2.09          2.09          2.09
Lmax                             125           100           100
µexp(Nuni)                       5             5             5
µexp(Nbi)                        2             2             2
p(Vnew)                          0.5           0.1           0.1

KiNetX
tmax (s)                         36954         36954         36954
εy (mol L⁻¹)                     0.01          0.01          0.01
Gmin (mol L⁻¹)                   1.0 × 10⁻³    1.0 × 10⁻⁵    1.0 × 10⁻⁵
U                                1000          1000          1000
γ                                0             1             1
C                                5             1             5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.

[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package", Technical Report SAND-80-8003, Sandia National Labs, Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics", Technical Report SAND-89-8009, Sandia National Labs, Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics", Technical Report SAND-96-8216, Sandia National Labs, Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI – a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes; Oxford University Press: New York, 2nd ed., 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

From the quantities introduced above, and assuming mass-action kinetics, we can express the $L$-dimensional column vectors of forward (+) reaction rates and backward (−) reaction rates as

$$\mathbf{f}^{\pm}(t) = \left( f_1^{\pm}(t) \;\cdots\; f_L^{\pm}(t) \right)^{\top} \tag{6}$$

with components

$$f_l^{\pm}(t) = k_l^{\pm} \prod_{n=1}^{N} \left( y_n(t) \right)^{S_{nl}^{\pm}} \tag{7}$$

By definition, the product of concentrations in Eq. (7) equals zero only if a species involved in the $l$-th forward/backward reaction (indicated by a positive value of $S_{nl}^{\pm}$) is not populated (zero concentration), since every species not involved contributes a factor of 1 due to $S_{nl}^{\pm} = 0$. Based on the combined stoichiometry matrix $\mathbf{S}$ and the combined $L$-dimensional column vector of reaction rates,

$$\mathbf{f}(t) = \mathbf{f}^{+}(t) - \mathbf{f}^{-}(t) \tag{8}$$

we can express the time-dependent change in the species concentrations as

$$\mathbf{g}(t) = \left( g_1(t) \;\cdots\; g_N(t) \right)^{\top} = \frac{\mathrm{d}}{\mathrm{d}t}\, \mathbf{y}(t) = \mathbf{S}\, \mathbf{f}(t) \tag{9}$$

The essential procedure prior to any kinetic analysis of reaction networks is the numerical integration of $\mathbf{g}(t)$, which is therefore the central quantity in kinetic modeling.
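As a minimal illustration of Eqs. (7) to (9), the following Python sketch evaluates the right-hand side $\mathbf{g}$ for a toy network with a single reaction pair. The function names, the toy network, and all numerical values are hypothetical. The sketch also assumes that the combined stoichiometry matrix is $\mathbf{S} = \mathbf{S}^{-} - \mathbf{S}^{+}$ (right-hand side minus left-hand side), which matches the sign convention of Eq. (8) but is defined outside this section.

```python
from math import prod


def reaction_rates(k, S_side, y):
    """Eq. (7) for one direction: f_l = k_l * prod_n y_n ** S_side[n][l]."""
    return [
        k[l] * prod(y[n] ** S_side[n][l] for n in range(len(y)))
        for l in range(len(k))
    ]


def concentration_change(k_plus, k_minus, S_plus, S_minus, y):
    """Eqs. (8) and (9): g = S f with f = f+ - f-."""
    f_plus = reaction_rates(k_plus, S_plus, y)
    f_minus = reaction_rates(k_minus, S_minus, y)
    f = [fp - fm for fp, fm in zip(f_plus, f_minus)]
    # Assumption: S = S_minus - S_plus (right-hand side minus left-hand side)
    return [
        sum((S_minus[n][l] - S_plus[n][l]) * f[l] for l in range(len(f)))
        for n in range(len(y))
    ]


# Toy network with one reaction pair, 2 A <=> B (species order: A, B)
S_plus = [[2], [0]]    # left-hand-side stoichiometry
S_minus = [[0], [1]]   # right-hand-side stoichiometry
g = concentration_change([3.0], [1.0], S_plus, S_minus, [2.0, 4.0])
print(g)
```

A species with exponent zero contributes a factor of 1 even at zero concentration, as noted below Eq. (7), because Python evaluates `0.0 ** 0` as `1.0`.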

2.2 Uncertainty Quantification in Kinetic Modeling

The objective of uncertainty quantification in kinetic modeling [3,4] is to assess the accuracy (or bias) and precision (or variance) of concentration profiles obtained from numerical integration of ODE systems. To obtain reliable results, it may be necessary to invest great effort in estimating the correlated uncertainty of model parameters (free-energy differences or rate constants). The mathematical object we are searching for is the joint probability distribution of model parameters [47]. One way to estimate parameter correlation is the backward propagation of uncertainty observed in measured concentration profiles [48], which requires knowledge of the underlying mechanism containing all kinetically relevant elementary reaction steps. This strategy is clearly appealing for verifying mechanism completeness, but it does not serve our purpose of understanding chemical reactivity from a first-principles perspective.

It has recently been shown that the neglect of parameter correlation in kinetic models can easily lead to false mechanistic conclusions, even when these models are derived from electronic structure calculations [26,49]. As the parameters of an electronic structure model are (unknown) functions of chemical space, free energies of reaction pathways comprising similar species will not change independently of each other when the value of an electronic structure parameter is changed. Fortunately, recent statistical developments in quantum chemistry [19–28] enable the estimation of correlated uncertainty in free-energy differences. Still, reliably estimating the correlation between free energies is computationally hard, as it requires sampling from ensembles of electronic structure models [25,26] and, hence, repeated first-principles calculations for all species of the chemical system studied (including transition states and possibly other structures along the reaction pathway). Even if machine learning models were employed, a comprehensive training set based on a vast number of electronic structure calculations would be necessary [50]. We have already demonstrated the steps required for the propagation of correlated uncertainty in activation free energies for a model network of the formose reaction [26]. Here, we generalize the procedure for the study of arbitrary chemical systems.

The estimation of uncertainty of a target quantity from the joint probability distribution of model parameters is referred to as forward uncertainty quantification, also known as uncertainty propagation. The opposite procedure is referred to as inverse uncertainty quantification and is applied to calibration problems [51,52] (the backward propagation of concentration uncertainty mentioned above belongs to the latter approach). Every statistical analysis implemented in KiNetX is based on uncertainty propagation. We consider an ensemble of $B + 1$ vectors of rate constants that is obtained by drawing $B$ samples from the joint probability distribution of activation free energies with mean $\mathbb{E}[\mathbf{A}] = \mathbf{A}_0$ and variance $\mathbb{V}[\mathbf{A}] = \boldsymbol{\Sigma}_A$, which are subsequently mapped to rate constants based on, e.g., Eyring's transition state theory [53]. Each sampled vector of rate constants is labeled $\mathbf{k}_b$ with $b \in \{1, \cdots, B\}$. An additional vector, $\mathbf{k}_0$, is directly obtained from $\mathbf{A}_0$. The ensemble of rate constant vectors, $\mathbf{K}_B = \{\mathbf{k}_0, \mathbf{k}_1, \cdots, \mathbf{k}_B\}$, is the basis for any bottom-up uncertainty analysis in kinetic modeling and is taken into account explicitly by every subalgorithm of KiNetX.
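The construction of the ensemble $\mathbf{K}_B$ can be sketched as follows. This is an illustration under stated assumptions, not the KiNetX implementation: the sketch assumes a multivariate normal distribution (only mean and covariance enter, in line with the second-order setup described here), draws correlated samples through a hand-written Cholesky factorization, and maps them to rate constants with the Eyring equation $k = (k_\mathrm{B} T / h) \exp(-A / RT)$. The function names and all numerical values are hypothetical.

```python
import math
import random

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R = 8.314462618       # molar gas constant, J/(mol K)


def eyring(a, temperature):
    """Eyring rate constant for an activation free energy a in J/mol."""
    return K_B * temperature / H * math.exp(-a / (R * temperature))


def cholesky(cov):
    """Lower-triangular factor low with low * low^T = cov (cov positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def rate_constant_ensemble(a0, cov, n_samples, temperature, rng):
    """Return [k_0, k_1, ..., k_B]; k_0 is derived from the mean a0."""
    low = cholesky(cov)
    ensemble = [[eyring(a, temperature) for a in a0]]
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in a0]
        # Correlated sample: a_b = a0 + low * z (Gaussian assumption)
        a_b = [
            a0[i] + sum(low[i][j] * z[j] for j in range(i + 1))
            for i in range(len(a0))
        ]
        ensemble.append([eyring(a, temperature) for a in a_b])
    return ensemble


rng = random.Random(0)
a0 = [80.0e3, 85.0e3]                           # mean barriers, J/mol
cov = [[16.0e6, 12.0e6], [12.0e6, 16.0e6]]      # covariance, (J/mol)^2
K = rate_constant_ensemble(a0, cov, 100, 298.15, rng)
```

Because the rate constant depends exponentially on the activation free energy, the sampled rate constants follow a log-normal rather than a normal distribution under this assumption.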

Note that the setup introduced here neglects third- and higher-order moments of the joint probability distribution of activation free energies, which may be a weak assumption for actual reaction networks derived from first principles. To avoid this limitation, one can always sample directly from the ensemble of electronic structure models [25,26], which requires repeated first-principles calculations and is therefore rather inefficient compared to sampling from $\boldsymbol{\Sigma}_A$. Another possibility, not yet explored by us, is the application of matrix algebra to construct special matrices that simplify expressions for higher-order moments of joint probability distributions (in particular, skewness and kurtosis) [54].

2.3 Overview of the KiNetX Meta-Algorithm

All reaction networks analyzed with KiNetX in this work were generated with AutoNetGen (Section 2.6). Both algorithms are written in Matlab [55]. For the numerical integration of (generally stiff) ODE systems, we interface to the ode15s module [56] of Matlab.

The KiNetX workflow consists of three core algorithms (Sections 2.4.1–2.4.3), all of which take the correlated uncertainty in the underlying model parameters (rate constants) explicitly into account:

1. Uncertainty propagation: Solve an ensemble of kinetic models derived from a unique reaction network and estimate the kinetic relevance of every species based on the maximum rate of formation (Section 2.4.1).

2. Network reduction: Identify and eliminate all kinetically irrelevant vertices and edges of the network by applying a hierarchy of flux analyses, resulting in a sparse network that is a comprehensible representation of the underlying reaction mechanism (Section 2.4.2).

3. Sensitivity analysis: Determine the effect of rate constant perturbations on time-dependent species concentrations through an extended version of Morris screening [57] (Section 2.4.3).

The minimal input requirements for KiNetX are:

• A chemical reaction network with $N$ vertices and $2L$ unidirectional edges (provided by AutoNetGen in this work).

• A set of $N$ initial species concentrations, $\mathbf{y}_0$ (provided by the user).

• An ensemble of $B + 1$ sets of $2L$ rate constants each,

$$\mathbf{K}_B = \{\mathbf{k}_b\} \tag{10}$$

with $b = 0, \cdots, B$, which may be derived from an ensemble of electronic structure models based on, e.g., density functional theory [22,24,25] (provided by AutoNetGen in this work).

• A maximum reaction time, $t_{\max}$, representing a practical time scale or a time scale of interest (provided by the user).

At present, we require the input to be provided in SI units. Optional input parameters are:

• The maximum tolerable concentration error, $\varepsilon_y$, between the exact and an approximate solution.

• A flux threshold, $G_{\min}$, above which a chemical species will be considered kinetically relevant.

• The number of log-distributed time points, $U + 1$, between $t = 0$ and $t = t_{\max}$ at which species concentrations will be evaluated based on cubic spline interpolation.

• A confidence level, $\gamma$, important for assessing network properties derived from an ensemble of kinetic models.

• The number of Morris samples, $C$, considered in our global sensitivity analysis.

• Absolute and relative concentration error tolerances considered during numerical integration of ODE systems.

Every execution of KiNetX is based on one network with a fixed number of vertices and edges. Hence, for steering the exploration of reaction networks, KiNetX needs to be executed repeatedly. We designed KiNetX to suggest the next exploration step by estimating the kinetic relevance of every species based on the maximum rate of formation (Section 2.5). We will make the KiNetX program available through our webpage (scine.ethz.ch) in 2019.

Currently, KiNetX is limited to the analysis of homogeneous and isothermal reactive chemical systems in dilute solution, which constitute a major category of chemical systems found in nature and examined in chemical research. One prominent subcategory thereof that attracts much attention in fundamental and industrial research is homogeneous catalysis, a field that is rather underrepresented in the kinetic modeling literature [58].

For the study of dilute systems, collisions involving three or more reactive species can be considered negligible from a statistical point of view. For this reason, each element of the forward and backward reaction rate vectors, $\mathbf{f}^{+}$ and $\mathbf{f}^{-}$, is of the form

$$f_l^{\pm} = k_l^{\pm}\, y_{n_{l1}^{\pm}}\, y_{n_{l2}^{\pm}} \tag{11}$$

where the vertex index $n$ is determined by the edge index ($l$), the direction (+ or −), and the (arbitrary) position in the rate equation ($i = 1, 2$). In case of a unimolecular reaction, we simply set one of the two vertex indices to $n_{li}^{\pm} = 0$, denoting a hypothetical null-species with a defined constant value of $y_0 = 1$, independent of the unit of measurement.
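The null-species convention of Eq. (11) lets unimolecular and bimolecular steps share one code path. A minimal Python sketch, in which the function name, the index layout, and the numerical values are hypothetical:

```python
def pair_rate(k, y, n1, n2):
    """Eq. (11): f = k * y[n1] * y[n2]; index 0 is the null-species."""
    return k * y[n1] * y[n2]


# y[0] = 1 is the hypothetical null-species; y[1] and y[2] are real species
y = [1.0, 0.5, 0.2]
rate_uni = pair_rate(2.0, y, 1, 0)   # unimolecular step of species 1
rate_bi = pair_rate(2.0, y, 1, 2)    # bimolecular step of species 1 and 2
```

Storing two vertex indices per rate expression, rather than a full stoichiometric column, keeps the rate evaluation at constant cost per edge regardless of the network size.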

2.4 The KiNetX Workflow

2.4.1 Uncertainty Propagation

The first step of our KiNetX workflow is the numerical integration of an ensemble of $B + 1$ kinetic models, each of them representing a unique set of rate constants. The user may also choose $B = 0$ (leading to the usual kinetic modeling setup comprising a single numerical integration), but KiNetX is designed to analyze an ensemble of kinetic models.

To assess the magnitude of noise in the $B$ solutions ($\mathbf{y}$-uncertainty) resulting from the ensemble of sets of rate constants ($\mathbf{k}$-uncertainty), we introduce two measures,

$$\delta_B(t_u) = \frac{1}{2B} \sum_{b=1}^{B} \sum_{n=1}^{N} \left| y_{nb}(t_u) - \langle y_n(t_u) \rangle \right| \tag{12}$$

where

$$\langle y_n(t_u) \rangle = \frac{1}{B} \sum_{b=1}^{B} y_{nb}(t_u) \tag{13}$$

is the mean value of concentrations at time $t = t_u$, and

$$\Delta_B = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_B(t_{u-1/2}) \cdot (t_u - t_{u-1}) \tag{14}$$

The factor $t_{\max}$ in the denominator of Eq. (14) equals the sum of time differences given that $t_0 = 0$,

$$t_{\max} = t_U = t_U - t_0 = \sum_{u=1}^{U} (t_u - t_{u-1}) \tag{15}$$

Furthermore, we define

$$t_{u-1/2} = \frac{1}{2} (t_{u-1} + t_u) \tag{16}$$

According to Eq. (12), $\delta_B(t_u)$ represents the ensemble-averaged $\mathbf{y}$-uncertainty summed over all species at time $t = t_u$. Here, we focus on $\delta_B(t_{\max})$, as it measures the variability in the product distribution, a network property of particular interest for synthetic chemists. In contrast, $\Delta_B$ represents a time-averaged $\mathbf{y}$-uncertainty, i.e., the overall variability of species concentrations between $t_0 = 0$ and $t_U = t_{\max}$. The combination of both measures may be valuable for determining the underlying reaction mechanism. For instance, if $\delta_B(t_{\max})$ is close to zero but $\Delta_B$ is rather large, it is likely that we are faced with one or both of the following two scenarios: (i) The sequence of elementary steps is identical or very similar for some $\mathbf{k}$-vectors, but the uncertainty of the activation barrier associated with the rate-determining step is significant; (ii) different $\mathbf{k}$-vectors suggest different routes to the same metastable sink of the reaction network. Furthermore, if both $\delta_B(t_{\max})$ and $\Delta_B$ are below a user-defined concentration error, $\varepsilon_y$, we can safely neglect the ensemble of kinetic models and base all further kinetic analysis on the nominal sample represented by $\mathbf{k}_0$ only.
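The two noise measures of Eqs. (12) and (14) can be sketched in a few lines of Python. The function names and the toy data are hypothetical, and the sketch assumes that the values of $\delta_B$ at the midpoints $t_{u-1/2}$ have already been obtained (e.g., from interpolated concentration profiles).

```python
def delta_B(samples):
    """Eq. (12) at one time point.

    samples[b][n] is the concentration of species n in ensemble member b.
    """
    B = len(samples)
    N = len(samples[0])
    mean = [sum(s[n] for s in samples) / B for n in range(N)]   # Eq. (13)
    total = sum(abs(s[n] - mean[n]) for s in samples for n in range(N))
    return total / (2 * B)


def Delta_B(times, delta_mid):
    """Eq. (14); delta_mid[u - 1] is delta_B at the midpoint t_{u-1/2}."""
    t_max = times[-1] - times[0]
    weighted = sum(
        d * (t1 - t0) for d, t0, t1 in zip(delta_mid, times, times[1:])
    )
    return weighted / t_max


# Two ensemble members, two species, complete disagreement at this time point
d = delta_B([[1.0, 0.0], [0.0, 1.0]])
# Three time points, hence two midpoint values
D = Delta_B([0.0, 1.0, 3.0], [0.5, 0.25])
```

If the total concentration is conserved, a deviation in one species is mirrored by the others, so the factor 1/2 in Eq. (12) avoids counting the same displaced concentration twice.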

When applying Eqs. (12) and (14), it is important that the number and identity of the time points $t_u$ are the same for all solutions. However, it is very unlikely that the time points obtained from an ODE solver are identical for two different $\mathbf{k}$-vectors. For this reason, KiNetX calculates a log-distributed vector of reference time points, $(t_0, t_1, \cdots, t_U)^{\top}$ with $t_0 = 0$ and $t_U = t_{\max}$, which is a function of the user-defined parameters $t_{\max}$ and $U$. To obtain species concentrations at the same time points for all $\mathbf{k}$-vectors, KiNetX interpolates the corresponding solutions through cubic spline smoothing. Note that spline smoothing is only reasonable if data noise is negligible, a property which one can assess easily in the case of one control variable (here, time). To evaluate the reliability of the cubic spline interpolations, we compared them against results from Gaussian process regression [59]. Under suitable assumptions, Gaussian process regression yields an optimal trade-off between fitting and smoothness (cubic splines only ensure the latter property). Hence, it can be employed to predict concentrations from previously unconsidered values in the time domain (here, the domain between $t_0 = 0$ and $t_U = t_{\max}$). Furthermore, as Gaussian process regression is a strictly Bayesian method, one may obtain reliable uncertainty estimates for each prediction. However, as this regression method scales cubically with the number of data points, it is not suitable for repetitive applications (usually, hundreds of regressions would be necessary). In all cases studied, we found that both the mean deviation between the two interpolation methods (cubic spline smoothing versus Gaussian process regression) and the mean predictive variance obtained from Gaussian process regression are negligibly small compared to the mean variance of the concentrations themselves (by a factor of $< 10^{-6}$). Hence, data noise is indeed negligible, and cubic splines work well for interpolating concentration profiles.
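The text does not state how the log-distributed grid accommodates $t_0 = 0$, where the logarithm is undefined. One possible construction that depends only on $t_{\max}$ and $U$ spaces the points uniformly in $\log(1 + t)$. This is an assumption made for illustration and not necessarily the rule used by KiNetX; the function names are hypothetical.

```python
def log_time_grid(t_max, U):
    """U + 1 points with t_0 = 0 and t_U = t_max, uniform in log(1 + t)."""
    return [(1.0 + t_max) ** (u / U) - 1.0 for u in range(U + 1)]


def midpoints(times):
    """Eq. (16): t_{u-1/2} = (t_{u-1} + t_u) / 2."""
    return [0.5 * (t0 + t1) for t0, t1 in zip(times, times[1:])]


grid = log_time_grid(1000.0, 4)
mids = midpoints(grid)
```

Such a grid resolves the fast initial dynamics with small steps and covers the slow approach to the final product distribution with progressively larger ones.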

2.4.2 Network Reduction

We introduce a hierarchy of two reduction algorithms with increasing sophistication (in terms of both rigor and required computing resources), which identify and eliminate kinetically irrelevant species and elementary reactions. Initially, we perform a detailed flux analysis, followed by a local barrier analysis if the former analysis suggests a reduced model resulting in a concentration error that exceeds the user-defined threshold $\varepsilon_y$. Both algorithms reflect our interpretation of established kinetic modeling concepts [3,4].

Detailed Flux Analysis (DFA). To keep track of the net concentration that has passed an edge between $t = 0$ and $t = t_{\max}$, we integrate the absolute values of $f_l(t)$ over that time interval,

$$F_l = \int_{t=0}^{t=t_{\max}} |f_l(t)| \, \mathrm{d}t \tag{17}$$

which we define as the edge flux corresponding to the $l$-th reaction pair. The vector of edge fluxes, $\mathbf{F}$, is the basis for determining the vertex flux corresponding to the $n$-th species,

$$G_n = \left( \mathbf{s}_n^{+} + \mathbf{s}_n^{-} \right) \mathbf{F} \tag{18}$$

where $\mathbf{s}_n^{+}$ and $\mathbf{s}_n^{-}$ represent the $n$-th row of $\mathbf{S}^{+}$ and $\mathbf{S}^{-}$, respectively. Our DFA implementation approximates the edge flux according to

$$F_l \approx F_l^{\mathrm{DFA}} = \sum_{u=1}^{U} |f_l(t_{u-1/2})| \cdot (t_u - t_{u-1}) \tag{19}$$

Analogous to the exact expression, the approximate vertex flux reads

$$G_n^{\mathrm{DFA}} = \left( \mathbf{s}_n^{+} + \mathbf{s}_n^{-} \right) \mathbf{F}^{\mathrm{DFA}} \tag{20}$$

If $G_n^{\mathrm{DFA}} < G_{\min}$, where $G_{\min}$ is a user-defined threshold, the $n$-th vertex will be removed from the kinetic model, except for the case where the $n$-th vertex represents a reactant. All edges that were connected to at least one of the removed vertices will be removed, too. After this procedure, the number of vertices and edges should have decreased significantly.
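The DFA reduction step described above can be sketched as follows for a single kinetic model. The function names, the toy network, and the numerical values are hypothetical, and the net rates at the midpoints $t_{u-1/2}$ are assumed to be available from a previous integration.

```python
def edge_fluxes(times, net_rates_mid):
    """Eq. (19); net_rates_mid[u - 1][l] is f_l at the midpoint t_{u-1/2}."""
    n_edges = len(net_rates_mid[0])
    return [
        sum(
            abs(f[l]) * (t1 - t0)
            for f, t0, t1 in zip(net_rates_mid, times, times[1:])
        )
        for l in range(n_edges)
    ]


def vertex_fluxes(S_plus, S_minus, F):
    """Eq. (20): G_n = (s_n^+ + s_n^-) . F"""
    return [
        sum((sp + sm) * f for sp, sm, f in zip(row_p, row_m, F))
        for row_p, row_m in zip(S_plus, S_minus)
    ]


def dfa_reduce(S_plus, S_minus, F, G_min, reactants):
    """Return the indices of the vertices and edges kept by the DFA."""
    G = vertex_fluxes(S_plus, S_minus, F)
    removed = {n for n, g in enumerate(G) if g < G_min and n not in reactants}
    kept_vertices = [n for n in range(len(G)) if n not in removed]
    # An edge is dropped if it touches at least one removed vertex
    kept_edges = [
        l for l in range(len(F))
        if not any(S_plus[n][l] or S_minus[n][l] for n in removed)
    ]
    return kept_vertices, kept_edges


# Toy network: edge 0 is A <=> B, edge 1 is A <=> C (species order: A, B, C)
S_plus = [[1, 1], [0, 0], [0, 0]]
S_minus = [[0, 0], [1, 0], [0, 1]]
F = edge_fluxes([0.0, 1.0, 2.0], [[0.4, 1.0e-6], [0.2, 1.0e-6]])
kept = dfa_reduce(S_plus, S_minus, F, G_min=1.0e-3, reactants={0})
```

In this toy case, species C carries almost no flux and is removed together with the edge that leads to it, whereas the reactant A is kept regardless of its vertex flux.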

Note that in the case of a closed system, $G_n^{\mathrm{DFA}}$ approaches a finite value with increasing time, which renders $G_{\min}$ an intuitive threshold that resembles a concentration error. The assumption behind our DFA is based on this intuitive interpretation: If a reaction channel with a flux smaller than $G_{\min}$ is removed, the redistribution of this minute amount of flux (i.e., concentration) should yield a concentration error that is comparable to $G_{\min}$. Therefore, we have a means at hand that potentially allows us to control the concentration error introduced by a specific value of $G_{\min}$. Clearly, choosing a smaller value of $G_{\min}$ will increase the reliability of the DFA-based solution, but it also decreases the possible degree of network reduction.

To determine the accuracy loss caused by a DFA, we introduce the measures

$$\delta_{pq}(t_u) = \frac{1}{2} \sum_{n=1}^{N} \left| y_{np}(t_u) - y_{nq}(t_u) \right| \tag{21}$$

and

$$\Delta_{pq} = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_{pq}(t_{u-1/2}) \cdot (t_u - t_{u-1}) \tag{22}$$

which resemble the control quantities $\delta_B(t_u)$ and $\Delta_B$. Here, however, we compare two specific solutions, $\mathbf{y}_p(t)$ and $\mathbf{y}_q(t)$, with each other, one resulting from the original network and the other resulting from the corresponding sparse network obtained through our DFA. Note that the number of vertices differs for the original and sparse networks. Hence, to render Eq. (21) practical, we define $N$ to be the number of vertices contained in the original network and set all elements in $\mathbf{y}_{\mathrm{sparse}}(t)$ to zero that refer to vertices not contained in the sparse network. In case that all of the $B + 1$ kinetic models yield both $\delta_{pq}(t_u)$-values (again, we focus on $t_u = t_U = t_{\max}$) and $\Delta_{pq}$-values that are below a user-defined concentration error, $\varepsilon_y$, KiNetX considers the DFA-based network reduction reliable. However, if there is at least one kinetic model that yields $\delta_{pq}(t_{\max}) > \varepsilon_y$ or $\Delta_{pq} > \varepsilon_y$, it depends on the user-defined confidence level, $\gamma$, whether the DFA-based network reduction is considered reliable or not. We require the confidence level to fulfill the condition $0 \le \gamma \le 1$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeds $\varepsilon_y$, the DFA-based network reduction is considered unreliable. Here, the lowest and highest possible quantiles ($\gamma = 0$ and $\gamma = 1$) represent the median and the maximum of both measures over all $B + 1$ samples, respectively.

Local Barrier Analysis (LBA). There are certain cases in which a DFA-based network reduction is prone to yield unreliable results. One example is autocatalysis. Imagine the following model reaction network:

2 A → B (very slow)
B → C (fast)
C + A → D (fast)
D + A → E (fast)
E → 2 C (fast)

Even though the first reaction step is very slow, it is required to initiate all other reaction steps. Without B, neither C nor D nor E can be formed. Even though B is obviously a very important species for the reaction mechanism, it will only be relevant at the very beginning of the reaction (i.e., until a minute amount of it has been formed). Hence, the vertex flux for B, $G_{\mathrm{B}}$, will be very small, possibly smaller than the user-defined threshold $G_{\min}$, leading to a false elimination of this species and of the edges connected to it (the first two of the five edges).

In this case, $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ will clearly exceed $\varepsilon_y$, and the LBA will start automatically. The idea behind our LBA algorithm is fairly simple: Set the rate constants of the first reaction pair to zero, which is equivalent to an infinitely high activation barrier or the removal of the corresponding edge. Repeat numerical integration and compare the solution to the original one based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Set the rate constants of the first reaction pair to their original values. Repeat the entire procedure for the second to the $L$-th reaction pair and for each of the $B + 1$ kinetic models.

After this procedure, every reaction pair is associated with $B + 1$ values for $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the corresponding reaction pair will be considered kinetically irrelevant and removed from the network. All unconnected vertices will be removed, too. Subsequently, the resulting $B + 1$ reduced kinetic models are integrated, and their solutions are compared to the original ones, again based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the LBA-based network reduction will be considered reliable. It is still possible that $\varepsilon_y$ is exceeded, as the kinetic models we consider are generally nonlinear: If the lack of one edge does not alter the solution and the same result is obtained for another edge, it does not follow that the simultaneous lack of both edges leaves the solution unaltered as well. If $\varepsilon_y$ is still exceeded after the LBA-based network reduction, KiNetX will recover the original network and continue its analysis without any reduction.
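The quantile criterion shared by the DFA and the LBA can be sketched as follows. The text does not specify how quantiles are interpolated between samples, so the linear interpolation used here is an assumption. The function names and numerical values are hypothetical, and the sketch reads the criterion as requiring both measures to stay within $\varepsilon_y$.

```python
def quantile(values, q):
    """Linearly interpolated q-quantile (q = 0.5: median, q = 1: maximum)."""
    v = sorted(values)
    pos = q * (len(v) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(v) - 1)
    return v[lo] + (pos - lo) * (v[hi] - v[lo])


def reduction_is_reliable(delta_values, Delta_values, eps_y, gamma):
    """Reliable only if neither (1 + gamma) / 2 quantile exceeds eps_y."""
    q = (1.0 + gamma) / 2.0
    return (
        quantile(delta_values, q) <= eps_y
        and quantile(Delta_values, q) <= eps_y
    )


# B + 1 = 5 kinetic models; one of them reacts strongly to the reduction
deltas = [0.001, 0.002, 0.003, 0.004, 0.2]
Deltas = [0.001, 0.001, 0.002, 0.002, 0.1]
lenient = reduction_is_reliable(deltas, Deltas, eps_y=0.01, gamma=0.0)
strict = reduction_is_reliable(deltas, Deltas, eps_y=0.01, gamma=1.0)
```

With $\gamma = 0$, the single outlier is tolerated because only the median is tested, whereas $\gamma = 1$ tests the maximum and rejects the reduction.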

2.4.3 Sensitivity Analysis

In the following, we assume that the concentration profiles of the sparse reaction network reveal pronounced uncertainties such that it is difficult to correctly assign product ratios or to suggest a specific reaction mechanism. To estimate to which rate constants the concentration profiles are most sensitive, a sensitivity analysis is required [3,4]. For this purpose, the rate constants are perturbed to study how such perturbations affect the time-dependent species concentrations.

In a local sensitivity analysis, the model parameters are perturbed one by one from their nominal values (here, the elements of $\mathbf{k}_0$). While a local sensitivity analysis is straightforward to conduct and computationally feasible (usually, $L$ kinetic simulations are required), it has a significant disadvantage in that it does not take into account the correlation between the model parameters (cf. Section 2.2). Consequently, one may overlook important correlation effects on the uncertainty of concentration profiles. Note that our LBA algorithm resembles an extreme variant of a local sensitivity analysis (cf. Section 2.4.2), where each rate constant is perturbed to its minimal value.

In a global sensitivity analysis, the correlation between the model parameters is taken into account, but the process requires significantly more computational resources than a local sensitivity analysis. Furthermore, the design of a global sensitivity analysis is not as unambiguous as for a local sensitivity analysis, which explains the existence of several approaches that may require $CL$ (Morris screening, where $C$ is an integer usually much smaller than $B$), $B$ (polynomial chaos), or $2BL$ (Sobol's method) numerical integrations [3,4].

Morris screening [57] is among the simplest of global sensitivity analyses, as it is not designed to induce a mapping between the model parameters and the model solution. Instead, its purpose is to categorize the model parameters as either important or unimportant, depending on how strongly changes in them affect the model solutions. We are particularly interested in this categorization since we aim for a descriptor that informs us about the quality of rate constants obtained from an efficient (basic) quantum chemical model. If we find that the uncertainty of some species concentrations is too large to derive sensible conclusions about specific network properties, the results of a global sensitivity analysis will support us in identifying the most critical rate constants that require a reevaluation based on a more sophisticated (benchmark) quantum chemical model.

The original version of Morris screening does not take into account the joint probability distribution of the model parameters. Here, we introduce an extended variant of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample $\mathbf{k}_0$ and replaces the elements $k_{0,1}^{+}$ and $k_{0,1}^{-}$ with the corresponding elements of $\mathbf{k}_1$, $k_{1,1}^{+}$ and $k_{1,1}^{-}$. For this new vector of rate constants, $\mathbf{k}_{0,1}$, a numerical integration is carried out. Subsequently, the elements $k_{0,2}^{+}$ and $k_{0,2}^{-}$ of $\mathbf{k}_{0,1}$ are replaced with the elements $k_{1,2}^{+}$ and $k_{1,2}^{-}$ of $\mathbf{k}_1$, and a numerical integration based on the new vector $\mathbf{k}_{0,2}$ is carried out. This procedure is repeated $L$ times until we arrive at $\mathbf{k}_{0,L} = \mathbf{k}_1$, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is now repeated by an element-wise change from $\mathbf{k}_1$ to $\mathbf{k}_2$ and, generally, by an element-wise change from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ until $b = C - 1$ is reached. In the end, we will have generated solutions for another $C(L - 1)$ kinetic models in addition to the $B + 1$ solutions obtained for $\mathbf{K}_B$. We are interested in the $C(L - 1)$ new solutions and the first $C + 1$ of the $B + 1$ solutions obtained previously, amounting to $CL + 1$ solutions relevant for our sensitivity analysis.
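The sequence of rate-constant vectors visited by this extended Morris screening can be enumerated with a short Python sketch. The function name and the data layout (each $\mathbf{k}$-vector stored as a list of $L$ pairs $(k^{+}, k^{-})$) are hypothetical choices for illustration.

```python
def morris_path(ensemble, C):
    """Enumerate (b, l, k_{b,l}) for b = 0..C-1 and l = 1..L.

    ensemble[b] is a list of L (k_plus, k_minus) pairs, one per reaction pair.
    """
    n_pairs = len(ensemble[0])
    path = []
    for b in range(C):
        current = list(ensemble[b])             # k_{b,0} = k_b
        for l in range(n_pairs):
            current[l] = ensemble[b + 1][l]     # swap in pair l of k_{b+1}
            path.append((b, l + 1, list(current)))
    return path


ensemble = [
    [(1.0, 0.1), (2.0, 0.2)],   # k_0
    [(1.5, 0.1), (2.0, 0.4)],   # k_1
    [(0.8, 0.3), (2.5, 0.2)],   # k_2
]
path = morris_path(ensemble, C=2)
# Vectors with l < L are new; those with l = L coincide with k_{b+1}
new_models = [k for (b, l, k) in path if l < len(ensemble[0])]
```

Only the $C(L - 1)$ intermediate vectors require additional integrations, since every path segment ends at a sample that is already part of $\mathbf{K}_B$.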

For each of these solutions, KiNetX determines the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics, where $p$ and $q$ represent two adjacent $\mathbf{k}$-vectors as constructed by our extended version of Morris screening. $CL$ values are obtained for each metric, $C$ thereof for every reaction pair. We define the sensitivity coefficients $z_l^{\delta}$ and $z_l^{\Delta}$ as

$$z_l^{\delta} = \frac{1}{C} \sum_{b=0}^{C-1} \delta_{(b,l),(b,l-1)}(t_{\max}) \tag{23}$$

and

$$z_l^{\Delta} = \frac{1}{C} \sum_{b=0}^{C-1} \Delta_{(b,l),(b,l-1)} \tag{24}$$

respectively. The subscript $(b,l)$ refers to sample $\mathbf{k}_{b,l}$, and we define $\mathbf{k}_{b,0} = \mathbf{k}_b$. Note that in Eqs. (23) and (24), we do not divide the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term conditional relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.
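The averaging in Eqs. (23) and (24) can be sketched generically in Python. In the sketch below, `metric` is a hypothetical callable that stands in for integrating the two kinetic models and evaluating $\delta_{pq}(t_{\max})$ or $\Delta_{pq}$. The toy metric in the usage example merely sums absolute rate-constant differences so that the example runs without an ODE solver; it is not a concentration-based measure.

```python
def sensitivity_coefficients(ensemble, C, metric):
    """Eqs. (23)/(24): z_l = (1 / C) * sum_b metric(k_{b,l}, k_{b,l-1}).

    ensemble[b] is a list of L (k_plus, k_minus) pairs, one per reaction pair.
    """
    n_pairs = len(ensemble[0])
    z = [0.0] * n_pairs
    for b in range(C):
        previous = list(ensemble[b])            # k_{b,0} = k_b
        for l in range(n_pairs):
            current = list(previous)
            current[l] = ensemble[b + 1][l]     # k_{b,l}
            z[l] += metric(current, previous) / C
            previous = current
    return z


def toy_metric(k_p, k_q):
    """Placeholder for delta_pq(t_max); NOT derived from concentrations."""
    return sum(abs(p[0] - q[0]) + abs(p[1] - q[1]) for p, q in zip(k_p, k_q))


ensemble = [
    [(1.0, 1.0), (2.0, 2.0)],   # k_0
    [(1.5, 1.0), (2.0, 4.0)],   # k_1
    [(1.0, 1.0), (2.0, 2.0)],   # k_2
]
z = sensitivity_coefficients(ensemble, C=2, metric=toy_metric)
ranking = sorted(range(len(z)), key=lambda l: z[l], reverse=True)
```

Sorting the reaction pairs by their coefficient yields the ranking from which the most critical rate constants are selected.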

Finally, we can set up a ranking of sensitivity coefficients. The larger $z_l^{\delta}$ and $z_l^{\Delta}$, the larger the effect of changes in $k_l^{\pm}$ on the concentration profiles. With this ranking at hand, one can systematically improve on both the accuracy and precision of the concentration profiles to reliably suggest product distributions (based on $z_l^{\delta}$) or specific reaction mechanisms (based on $z_l^{\Delta}$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be unfeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well known in the reaction engineering community. It was introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas-phase reactions (in particular, combustion) [62], which has recently been extended to understand the kinetics of solution-phase chemistry [63]. Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution-phase chemistry. As the sensitivity of the thermal rate constant to errors in the activation free energy is a decreasing function of temperature in classical rate theories, $\mathbf{k}$-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take $\mathbf{k}$-uncertainty into account.

We start from the reactants (species with nonzero initial concentrations) and attempt to find all direct products, i.e., intermediates that are formed by a single elementary reaction of the reactants. The reactants are considered active, whereas the direct products are considered inactive. Only active species are considered in the exploration procedure, i.e., at this point, all possible reaction steps have been discovered (according to the exploration algorithm employed). To estimate which of the inactive species is potentially the most relevant for the mechanism, we focus on the formation rate of each inactive species (indicated by an asterisk),

$$\overrightarrow{g}_{n^*}(t_u) = \mathbf{s}_{n^*}^{+} \mathbf{f}^{-}(t_u) + \mathbf{s}_{n^*}^{-} \mathbf{f}^{+}(t_u) \tag{25}$$

Note that $\mathbf{s}_{n^*}^{+}$ is multiplied with $\mathbf{f}^{-}(t)$ since, in a backward (−) reaction, the left-hand-side species (+) are formed. An equivalent argument holds for the multiplication of $\mathbf{s}_{n^*}^{-}$ with $\mathbf{f}^{+}(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

$$\overrightarrow{g}_{n^*}^{\max} = \max \left\{ \overrightarrow{g}_{n^*}(t_0), \overrightarrow{g}_{n^*}(t_1), \cdots, \overrightarrow{g}_{n^*}(t_U) \right\} \tag{26}$$

The species with the highest value of $\overrightarrow{g}_{n^*}^{\max}$ will be promoted to active. Note that if an ensemble of $\mathbf{k}$-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

$$\eta\left( \overrightarrow{g}_{n^*}^{\max} \right) = \sqrt{ \left\langle \overrightarrow{g}_{n^*}^{\max} \right\rangle^2 + \frac{1}{B - 1} \sum_{b=1}^{B} \left( \overrightarrow{g}_{n^* b}^{\max} - \left\langle \overrightarrow{g}_{n^*}^{\max} \right\rangle \right)^2 } \tag{27}$$

that takes into account both the ensemble average of maximum formation rates,

$$\left\langle \overrightarrow{g}_{n^*}^{\max} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \overrightarrow{g}_{n^* b}^{\max} \tag{28}$$

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
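The ranking statistic of Eqs. (27) and (28) can be sketched as follows. The function names, the species labels, and the maximum formation rates are hypothetical.

```python
import math


def eta(g_max_samples):
    """Eq. (27): sqrt(mean ** 2 + sample variance) over the B sampled models."""
    B = len(g_max_samples)
    mean = sum(g_max_samples) / B                                   # Eq. (28)
    variance = sum((g - mean) ** 2 for g in g_max_samples) / (B - 1)
    return math.sqrt(mean ** 2 + variance)


def next_active_species(g_max_by_species):
    """Return the inactive species with the largest eta."""
    return max(g_max_by_species, key=lambda name: eta(g_max_by_species[name]))


# Same average maximum formation rate, but different spread over the ensemble
g_max_by_species = {
    "X": [1.0, 1.0, 1.0],
    "Y": [0.0, 1.0, 2.0],
}
chosen = next_active_species(g_max_by_species)
```

In this toy case, both species share the same ensemble average, and the species with the larger variance is promoted, in line with the preference stated above.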

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\overrightarrow{g}_{n^*}^{\max}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold: In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between $t = 0$ and $t = t_{\max}$) are generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are as follows (for optional AutoNetGen input, see the Appendix):

• The number of samples $B$ to be drawn from $\Sigma_A$.

• A thermostat temperature $T$ for the calculation of rate constants.

• The average free-energy increase/decrease, $\mu(A^{-} - A^{+})$, for a new intermediate state ($-$) formed by reaction of an intermediate state already present in the network ($+$).

• The spread of the free-energy difference between a left-hand-side intermediate state ($+$) and its corresponding right-hand-side intermediate state ($-$), $\sigma(A^{-} - A^{+})$, which serves as the standard deviation of the sampling distribution (see the Appendix).

• A minimum activation free energy, $\min(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

• A maximum activation free energy, $\max(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

• The average free-energy uncertainty, $\langle\sigma_A\rangle$ (required for generating an ensemble of kinetic models).

• The maximum number of edges, $L_{\max}$, to be generated.

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of $N = 103$ vertices and $L = 118$ edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L$^{-1}$ and 0.50 mol L$^{-1}$, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of $B + 1 = 25$ k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol$^{-1}$. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol$^{-1}$. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature of $T = 298.15$ K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L$^{-1}$ during the course of the reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at $t = t_{\max}$. Also, the potential main product 33 reveals final concentrations ranging between about 0.3 mol L$^{-1}$ and 0.9 mol L$^{-1}$, the observable value of which will in turn affect the concentrations of the potential side products 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.
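The selection of dominant species can be illustrated with a short NumPy sketch. It assumes that a species counts as dominant as soon as it exceeds the threshold in at least one kinetic model of the ensemble; the text does not spell out how the ensemble is aggregated, so this rule, the array layout, and the function name `dominant_species` are assumptions made for illustration.

```python
import numpy as np


def dominant_species(conc, threshold=0.01):
    """Indices of species whose concentration exceeds `threshold` (mol/L)
    at any time point in at least one kinetic model.

    `conc` has shape (n_models, n_times, n_species).
    """
    conc = np.asarray(conc, dtype=float)
    exceeds = (conc > threshold).any(axis=(0, 1))
    return np.flatnonzero(exceeds)


# Made-up trajectories: 2 models, 3 time points, 3 species.
conc = np.array([
    [[1.00, 0.000, 0.00], [0.60, 0.005, 0.20], [0.30, 0.008, 0.35]],
    [[1.00, 0.000, 0.00], [0.80, 0.002, 0.10], [0.70, 0.020, 0.15]],
])
dominant = dominant_species(conc)
```

In this toy case, the second species crosses the threshold only in the second model. Under a stricter rule that requires all models to agree, it would be dropped, which shows how the ensemble can change the set of species that are plotted.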

Given a concentration error tolerance of $\varepsilon_y = 0.01$ mol L$^{-1}$, we find that a DFA-based network reduction with $G_{\min} = 1.0 \times 10^{-3}$ mol L$^{-1}$ is reliable with respect to both $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Here, we chose the minimum confidence level of $\gamma = 0$ for the sake of clarity. The smaller $\gamma$, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose $G_{\min}$ to be one order of magnitude smaller than $\varepsilon_y$. The reason is that $G_{\min}$ is assessed on a species-by-species basis, whereas $\varepsilon_y$ is compared against the quantities $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$, which measure the concentration error summed over all species. One may resolve this situation by dividing $G_{\min}$ by $N$, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA analysis varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X, consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are colored black, whereas all other vertices are colored red.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For $C = 5$ Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeding $\varepsilon_y$ with respect to the quantile specified by $\gamma$. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L$^{-1}$. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L$^{-1}$, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval $t_u - t_{u-1}$, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42 (the other dominant side product) via 30 and 14. In summary, the dominant reaction pathways read:

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red edges with arrow heads). All species that are part of dominant reaction paths are represented by red vertices, except for reactant vertices, which are colored black.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which makes any flux-based network reduction at that stage risky. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than $\varepsilon_y$, as measured by $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.
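The two reduction figures quoted above can be checked from the vertex and edge counts given in this section. The sketch below assumes that "network elements" means vertices and edges counted together, which is our reading and not stated explicitly:

```latex
\[
1 - \frac{18 + 21}{103 + 118} = 1 - \frac{39}{221} \approx 0.82
\quad \text{(DFA-based reduction)},
\qquad
1 - \frac{63 + 68}{103 + 118} = 1 - \frac{131}{221} \approx 0.41
\quad \text{(guided exploration)}.
\]
```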

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol$^{-1}$ with a standard deviation of 0.4 kcal mol$^{-1}$. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $k_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

Table 1: Mean values and standard deviations of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                        μ(ζ_CRN-0)   μ(ζ_CRN-B)   σ(ζ_CRN-0)   σ(ζ_CRN-B)
exploration steps                 12           14           7            7
critical reactions                5            10           4            7
L_expl                            54           60           26           27
N_expl                            29           31           11           11
L_red                             30           37           18           21
N_red                             17           20           8            8
δ^B_expl(t_max) / mol L^-1        –            0.14         –            0.13
Δ^B_expl / mol L^-1               –            0.15         –            0.13
δ^B_refined(t_max) / mol L^-1     –            <0.01        –            0.01
Δ^B_refined / mol L^-1            –            <0.01        –            0.01

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several solutions instead of a single one naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration step for each network. The minimum over all networks amounts to $1.0 \times 10^{-17}$ mol L$^{-1}$ s$^{-1}$ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, $t_{\max}$), we obtain a flux of $3.7 \times 10^{-13}$ mol L$^{-1}$, which is very small compared to $\varepsilon_y = 0.01$ mol L$^{-1}$. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.
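The flux quoted above follows from multiplying the smallest recorded maximum formation rate by the full reaction time. As a worked check, taking $t_{\max} = 36954$ s from Table 2 and treating the rate as constant over the whole interval (an upper-bound simplification on our part):

```latex
\[
\overrightarrow{g}^{\max} \, t_{\max}
= 1.0 \times 10^{-17}\ \mathrm{mol\,L^{-1}\,s^{-1}} \times 36954\ \mathrm{s}
\approx 3.7 \times 10^{-13}\ \mathrm{mol\,L^{-1}}
\ll \varepsilon_y = 10^{-2}\ \mathrm{mol\,L^{-1}}.
\]
```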

DFA-based network reduction with a flux threshold of $G_{\min} = 1.0 \times 10^{-5}$ mol L$^{-1}$ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument in the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of $\max\{\delta^{B}(t_{\max}), \Delta^{B}\}$ from 0.15 mol L$^{-1}$ to < 0.01 mol L$^{-1}$ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all $B + 1 = 6$ samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
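The mimicked refinement can be sketched as follows. The sketch assumes that the free energy of a bimolecular intermediate state is the sum of the free energies of its two species, and all names (`refine_critical`, `activation_energies`) and numbers are hypothetical. Because only absolute free energies are averaged, forward and backward barriers of every edge remain tied to the same vertex energies in each model.

```python
import numpy as np


def refine_critical(A_vertex, A_edge, reactions, critical):
    """Replace sampled absolute free energies by their ensemble mean for
    every state touched by a critical reaction.

    A_vertex: (n_models, n_vertices) absolute free energies of species.
    A_edge:   (n_models, n_edges) absolute free energies of transition states.
    reactions: list of (lhs_vertices, rhs_vertices) index tuples, one per edge.
    critical: indices of the edges flagged as critical.
    """
    A_vertex = np.array(A_vertex, dtype=float)  # copies; inputs stay untouched
    A_edge = np.array(A_edge, dtype=float)
    for l in critical:
        lhs, rhs = reactions[l]
        for v in (*lhs, *rhs):
            A_vertex[:, v] = A_vertex[:, v].mean()
        A_edge[:, l] = A_edge[:, l].mean()
    return A_vertex, A_edge


def activation_energies(A_vertex, A_edge, reactions):
    """Forward and backward activation free energies per model and edge."""
    fwd = np.empty_like(A_edge)
    bwd = np.empty_like(A_edge)
    for l, (lhs, rhs) in enumerate(reactions):
        fwd[:, l] = A_edge[:, l] - A_vertex[:, list(lhs)].sum(axis=1)
        bwd[:, l] = A_edge[:, l] - A_vertex[:, list(rhs)].sum(axis=1)
    return fwd, bwd


rng = np.random.default_rng(seed=0)
reactions = [((0,), (1,)), ((1,), (2, 3))]     # A -> B and B -> C + D (made up)
A_vertex = rng.normal(0.0, 2.0, size=(6, 4))   # 6 models, 4 species
A_edge = 60.0 + rng.normal(0.0, 2.0, size=(6, 2))
A_v, A_e = refine_critical(A_vertex, A_edge, reactions, critical=[0])
fwd, bwd = activation_energies(A_v, A_e, reactions)
```

After the call, both barriers of the critical edge are identical in all six models, while the second edge still varies between models even though one of its vertices was refined.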

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of $\max\{\delta_{pq}(t_{\max}), \Delta_{pq}\}$ comparing the CRN-0 solutions against the "exact" solutions amounts to $15.7 \times 10^{-3}$ mol L$^{-1}$, whereas it amounts to only $1.3 \times 10^{-3}$ mol L$^{-1}$ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm that constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase gained by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered 5 additional kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be questionable to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, i.e., highly noise-inducing, reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may be strongly dependent on the underlying graph structure and on the distribution of activation free energies and rate constants as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is in principle able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol$^{-1}$ with respect to the higher-energy intermediate) with the reaction time $t_{\max}$, which we set to the half-life of a unimolecular reaction whose rate constant corresponds to a barrier height of 100 kJ mol$^{-1}$. This way, we prevent the network reduction procedure from merely deleting reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol$^{-1}$, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the $(x+1)$-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the $x$-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_{\exp}$. The user can specify two different values for $\mu_{\exp}$, one for unimolecular reactions, $\mu_{\exp,\mathrm{uni}}$, and one for bimolecular reactions, $\mu_{\exp,\mathrm{bi}}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or how many vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_{\mathrm{new}})$, ranging from zero to one. If $p(V_{\mathrm{new}}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_{\mathrm{new}}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), $2L$ activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is obtained from a value sampled from a normal distribution with mean $\mu(A^{-} - A^{+})$ and standard deviation $\sigma(A^{-} - A^{+})$, which is added to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained from a value sampled from a uniform distribution with bounds $\min(A^{\ddagger} - A^{+/-})$ and $\max(A^{\ddagger} - A^{+/-})$, which is added to the free energy of the higher-energy side of the edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$, as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
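Steps 1 and 2 of the list above can be sketched as follows. This is not the AutoNetGen implementation: the function names `reactive_states` and `n_channels`, the species labels, and the exponential means are placeholders chosen for illustration.

```python
import math
from itertools import combinations, product

import numpy as np


def reactive_states(old, new):
    """Step 1: reactive intermediate states involving at least one species
    of the newest layer (`new`); pairs among `old` were handled earlier."""
    uni = [(a,) for a in new]                  # A -> P
    homo = [(a, a) for a in new]               # 2A -> P
    hetero = list(combinations(new, 2)) + list(product(new, old))  # A + B -> P
    return uni, homo, hetero


def n_channels(rng, mean):
    """Step 2: floor of a draw from an exponential distribution with `mean`."""
    return math.floor(rng.exponential(mean))


old, new = [0, 1], [2, 3, 4]   # N - N_x = 2 old species, N_x = 3 new species
uni, homo, hetero = reactive_states(old, new)

rng = np.random.default_rng(seed=0)
# The means below are illustrative placeholders, not values from the paper.
channels = {state: n_channels(rng, 1.0 if len(state) == 1 else 0.5)
            for state in uni + homo + hetero}
```

For $N_x = 3$ and $N - N_x = 2$, the sketch yields 3 unimolecular, 3 homobimolecular, and $3 \cdot 2/2 + 3 \cdot 2 = 9$ heterobimolecular states, in line with the counting rule of step 1.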

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

\[
k^{+/-}_{l} = \frac{k_{\mathrm{B}} T}{h} \exp\left( -\frac{A^{\ddagger}_{l} - A^{+/-}_{l}}{RT} \right), \tag{29}
\]

where $h$, $k_{\mathrm{B}}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.

Random Construction of Covariance Matrices

Covariance matrices are symmetric positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$, which fulfills the condition that its largest eigenvalue equals $\sigma^{2}_{A,\max}$. Defining $\sigma^{2}_{A,\max}$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\max}$.

1. Generate a random $(N \times N)$-dimensional matrix $P$ with elements that are uniformly sampled from $[-0.5, +0.5]$. $N$ is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an $(N \times N)$-dimensional matrix $Q = PP^{\top}$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

\[
\Sigma_A = Q \, \frac{\langle\sigma_A\rangle^{2}}{\max\{E_{ii}\}},
\]

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^{2}$.
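The three steps above translate almost directly into NumPy. The sketch below is an illustration only; the function name `random_covariance`, the seed, and the network size are arbitrary choices, and the final sampling line merely indicates how correlated perturbations of the vertex free energies could be drawn from the result.

```python
import numpy as np


def random_covariance(n, sigma, rng):
    """Random covariance matrix whose largest eigenvalue equals sigma**2."""
    P = rng.uniform(-0.5, 0.5, size=(n, n))   # step 1
    Q = P @ P.T                               # step 2: symmetric, PSD
    largest = np.linalg.eigvalsh(Q).max()     # eigvalsh expects symmetric input
    return Q * sigma**2 / largest             # step 3


rng = np.random.default_rng(seed=1)
n_vertices = 5
sigma = 2.09   # kJ/mol, i.e., 0.5 kcal/mol; illustrative choice
cov = random_covariance(n_vertices, sigma, rng)

# Draw 5 correlated perturbations of the vertex free energies (zero mean
# here; in practice they would be added to the nominal values).
samples = rng.multivariate_normal(np.zeros(n_vertices), cov, size=5)
```

Rescaling by a positive scalar leaves symmetry and positive semidefiniteness intact, so the result remains a valid covariance matrix.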

Note that in a practical setup, we expect the user to provide $B + 1$ k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                    CRN-X         CRN-0         CRN-B

AutoNetGen
B + 1                              25            1             6
T / K                              298.15        298.15        298.15
μ(A^- − A^+) / (kJ mol^-1)         −25           −25           −25
σ(A^- − A^+) / (kJ mol^-1)         50            50            50
min(A^‡ − A^+/-) / (kJ mol^-1)     0             0             0
max(A^‡ − A^+/-) / (kJ mol^-1)     100           100           100
⟨σ_A⟩ / (kJ mol^-1)                2.09          2.09          2.09
L_max                              125           100           100
μ_exp(N_uni)                       5             5             5
μ_exp(N_bi)                        2             2             2
p(V_new)                           0.5           0.1           0.1

KiNetX
t_max / s                          36954         36954         36954
ε_y / (mol L^-1)                   0.01          0.01          0.01
G_min / (mol L^-1)                 1.0 × 10^-3   1.0 × 10^-5   1.0 × 10^-5
U                                  1000          1000          1000
γ                                  0             1             1
C                                  5             1             5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.


[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.


[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C.; et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C.; et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G.; et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S.; et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package"; Technical Report SAND–80-8003; Sandia National Labs: Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics"; Technical Report SAND-89-8009; Sandia National Labs: Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics"; Technical Report SAND-96-8216; Sandia National Labs: Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI – a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes; Oxford University Press: New York, 2nd ed., 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

structure parameter is changed. Fortunately, recent statistical developments in quantum chemistry [19–28] enable the estimation of correlated uncertainty in free-energy differences. Still, reliably estimating the correlation between free energies is computationally hard, as it requires sampling from ensembles of electronic structure models [25,26] and, hence, repeated first-principles calculations for all species of the chemical system studied (including transition states and possibly other structures along the reaction pathway). Even if machine learning models were employed, a comprehensive training set based on a vast number of electronic structure calculations would be necessary [50]. We have already demonstrated the steps required for the propagation of correlated uncertainty in activation free energies for a model network of the formose reaction [26]. Here, we will generalize the procedure for the study of arbitrary chemical systems.

The estimation of uncertainty of a target quantity from the joint probability distribution of model parameters is referred to as forward uncertainty quantification, also known as uncertainty propagation. The opposite procedure is referred to as inverse uncertainty quantification and is applied to calibration problems [51,52] (the backward propagation of concentration uncertainty mentioned above belongs to the latter approach). Every statistical analysis implemented in KiNetX is based on uncertainty propagation. We consider an ensemble of $B + 1$ vectors of rate constants that is obtained by drawing $B$ samples from the joint probability distribution of activation free energies with mean $\mathbb{E}[\mathbf{A}] = \mathbf{A}_0$ and variance $\mathbb{V}[\mathbf{A}] = \boldsymbol{\Sigma}_\mathbf{A}$, which are subsequently mapped to rate constants based on, e.g., Eyring's transition state theory [53]. Each sampled vector of rate constants is labeled $\mathbf{k}_b$ with $b \in \{1, \dots, B\}$. An additional vector, $\mathbf{k}_0$, is directly obtained from $\mathbf{A}_0$. The ensemble of rate constant vectors, $K_B = \{\mathbf{k}_0, \mathbf{k}_1, \dots, \mathbf{k}_B\}$, is the basis for any bottom-up uncertainty analysis in kinetic modeling and is taken into account explicitly by every subalgorithm of KiNetX.
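The construction of the ensemble of rate-constant vectors can be illustrated with a minimal sketch. It is not the KiNetX implementation (which is written in Matlab) and rests on assumptions that the text does not make: the joint distribution is taken to be multivariate normal (only mean and covariance are used), barriers are given in J/mol, and the plain Eyring expression is applied without a standard-state concentration factor. The function names `cholesky`, `eyring`, and `rate_constant_ensemble` are hypothetical.

```python
import math
import random

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # molar gas constant, J/(mol K)

def cholesky(cov):
    """Lower-triangular factor C with C C^T = cov (cov: symmetric positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low

def eyring(barrier, temperature=298.15):
    """Eyring rate constant for an activation free energy in J/mol."""
    return K_B * temperature / H * math.exp(-barrier / (R * temperature))

def rate_constant_ensemble(a0, cov, n_samples, temperature=298.15, seed=0):
    """Return [k_0, k_1, ..., k_B]; k_0 stems from the mean barriers a0,
    each k_b from one correlated Gaussian draw (assumed distribution)."""
    rng = random.Random(seed)
    low = cholesky(cov)
    n = len(a0)
    ensemble = [[eyring(a, temperature) for a in a0]]
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        a = [a0[i] + sum(low[i][j] * z[j] for j in range(i + 1)) for i in range(n)]
        ensemble.append([eyring(x, temperature) for x in a])
    return ensemble
```

For instance, `rate_constant_ensemble([80e3, 60e3], [[25e6, 20e6], [20e6, 25e6]], n_samples=5)` draws five correlated samples for two barriers with a standard deviation of 5 kJ/mol each and a correlation coefficient of 0.8 (illustrative numbers only).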

Note that the setup introduced here neglects third- and higher-order moments of the joint probability distribution of activation free energies, which may be a weak assumption for actual reaction networks derived from first principles. To avoid these limitations, one can always sample directly from the ensemble of electronic structure models [25,26], which requires repeated first-principles calculations and is therefore rather inefficient compared to sampling from $\boldsymbol{\Sigma}_\mathbf{A}$. Another possibility, not yet explored by us, is the application of matrix algebra to construct special matrices that simplify expressions for higher-order moments of joint probability distributions (in particular, skewness and kurtosis) [54].

2.3 Overview of the KiNetX Meta-Algorithm

All reaction networks analyzed with KiNetX in this work were generated with AutoNetGen (Section 2.6). Both algorithms are written in Matlab [55]. For the numerical integration of (generally stiff) ODE systems, we interface to the ode15s module [56] of Matlab.

The KiNetX workflow consists of three core algorithms (Sections 2.4.1–2.4.3), all of which take the correlated uncertainty in the underlying model parameters (rate constants) explicitly into account:

1. Uncertainty propagation. Solve an ensemble of kinetic models derived from a unique reaction network and estimate the kinetic relevance of every species based on the maximum rate of formation (Section 2.4.1).

2. Network reduction. Identify and eliminate all kinetically irrelevant vertices and edges of the network by applying a hierarchy of flux analyses, resulting in a sparse network that is a comprehensible representation of the underlying reaction mechanism (Section 2.4.2).

3. Sensitivity analysis. Determine the effect of rate constant perturbations on time-dependent species concentrations through an extended version of Morris screening [57] (Section 2.4.3).

The minimal input requirements for KiNetX are:

• A chemical reaction network with $N$ vertices and $2L$ unidirectional edges (provided by AutoNetGen in this work).

• A set of $N$ initial species concentrations, $\mathbf{y}_0$ (provided by the user).

• An ensemble of $B + 1$ sets of $2L$ rate constants each,

\[
K_B = \{\mathbf{k}_b\} \tag{10}
\]

with $b = 0, \dots, B$, which may be derived from an ensemble of electronic structure models based on, e.g., density functional theory [22,24,25] (provided by AutoNetGen in this work).

• A maximum reaction time, $t_{\max}$, representing a practical time scale or a time scale of interest (provided by the user).

At present, we require the input to be provided in SI units. Optional input parameters are:

• The maximum tolerable concentration error, $\varepsilon_y$, between the exact and an approximate solution.

• A flux threshold, $G_{\min}$, above which a chemical species will be considered kinetically relevant.

• The number of log-distributed time points, $U + 1$, between $t = 0$ and $t = t_{\max}$ at which species concentrations will be evaluated based on cubic spline interpolation.

• A confidence level, $\gamma$, important for assessing network properties derived from an ensemble of kinetic models.

• The number of Morris samples, $C$, considered in our global sensitivity analysis.

• Absolute and relative concentration error tolerances considered during numerical integration of ODE systems.
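The required and optional inputs listed above could be collected in a single container, as in the following sketch. The class name, all field names, and all default values are hypothetical; the text does not state which defaults KiNetX uses. The two consistency checks on the confidence level and on the number of Morris samples anticipate conditions given in Sections 2.4.2 and 2.4.3.

```python
from dataclasses import dataclass

@dataclass
class KineticInput:
    # Minimal input (SI units)
    n_vertices: int             # N
    n_pairs: int                # L; the network has 2L unidirectional edges
    y0: list                    # N initial concentrations
    k_ensemble: list            # B + 1 sets of 2L rate constants each
    t_max: float                # maximum reaction time
    # Optional input; every default below is a hypothetical placeholder
    eps_y: float = 1e-2         # maximum tolerable concentration error
    g_min: float = 1e-4         # flux threshold
    n_time_points: int = 101    # U + 1 log-distributed time points
    gamma: float = 0.95         # confidence level, 0 <= gamma <= 1
    n_morris: int = 0           # C; 0 skips the sensitivity analysis
    abs_tol: float = 1e-9       # ODE solver tolerances
    rel_tol: float = 1e-6

    def __post_init__(self):
        if len(self.y0) != self.n_vertices:
            raise ValueError("expected one initial concentration per vertex")
        if any(len(k) != 2 * self.n_pairs for k in self.k_ensemble):
            raise ValueError("each rate-constant set must have 2L entries")
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must lie in [0, 1]")
        if self.t_max <= 0.0:
            raise ValueError("t_max must be positive")
        if self.n_morris > self.n_samples:
            raise ValueError("C must not exceed B")

    @property
    def n_samples(self):
        """B, the number of sampled rate-constant sets (k_0 excluded)."""
        return len(self.k_ensemble) - 1
```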

Every execution of KiNetX is based on one network with a fixed number of vertices and edges. Hence, for steering the exploration of reaction networks, KiNetX needs to be executed repeatedly. We designed KiNetX to suggest the next exploration step by estimating the kinetic relevance of every species based on the maximum rate of formation (Section 2.5). We will make the KiNetX program available through our webpage (scine.ethz.ch) in 2019.

Currently, KiNetX is limited to the analysis of homogeneous and isothermal reactive chemical systems in dilute solution, which constitute a major category of chemical systems found in nature and examined in chemical research. One prominent subcategory thereof that attracts much attention in fundamental and industrial research is homogeneous catalysis, a field that is rather underrepresented in the kinetic modeling literature [58].

For the study of dilute systems, collisions involving three or more reactive species can be considered negligible from a statistical point of view. For this reason, each element of the forward and backward reaction rate vectors, $\mathbf{f}^+$ and $\mathbf{f}^-$, is of the form

\[
f_l^{\pm} = k_l^{\pm} \, y_{n_{l1}^{\pm}} \, y_{n_{l2}^{\pm}} \tag{11}
\]

where the vertex index $n$ is determined by the edge index ($l$), the direction ($+$ or $-$), and the (arbitrary) position in the rate equation ($i = 1, 2$). In case of a unimolecular reaction, we simply set one of the two vertex indices to $n_{li}^{\pm} = 0$, denoting a hypothetical null-species with a defined constant value of $y_0 = 1$, independent of the unit of measurement.
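Eq. (11) and the null-species convention can be made concrete with a short sketch. The data layout is an assumption made for illustration: species are numbered from 1, each reaction pair stores two vertex indices per side, and index 0 is reserved for the null-species with unit value. The function name `reaction_rates` is hypothetical.

```python
def reaction_rates(k_forward, k_backward, lhs, rhs, y):
    """Forward and backward rates according to Eq. (11).

    lhs[l] and rhs[l] hold the two vertex indices of the left- and
    right-hand side of reaction pair l; index 0 is the null-species.
    y[n - 1] is the concentration of species n (n = 1, ..., N).
    """
    conc = [1.0] + list(y)  # position 0: null-species with constant value 1
    f_plus = [k * conc[i] * conc[j] for k, (i, j) in zip(k_forward, lhs)]
    f_minus = [k * conc[i] * conc[j] for k, (i, j) in zip(k_backward, rhs)]
    return f_plus, f_minus

# Two reaction pairs, 2A <-> B and B <-> C, with A = 1, B = 2, C = 3
lhs = [(1, 1), (2, 0)]
rhs = [(2, 0), (3, 0)]
f_plus, f_minus = reaction_rates([0.1, 3.0], [0.2, 4.0], lhs, rhs, [2.0, 0.5, 0.1])
```

With this convention, unimolecular and bimolecular steps share one code path, which is presumably the purpose of the null-species.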

2.4 The KiNetX Workflow

2.4.1 Uncertainty Propagation

The first step of our KiNetX workflow is the numerical integration of an ensemble of $B + 1$ kinetic models, each of them representing a unique set of rate constants. Clearly, the user may also choose $B = 0$ (leading to the usual kinetic modeling setup comprising a single numerical integration), but KiNetX is actually designed to analyze an ensemble of kinetic models.

To assess the magnitude of noise in the $B$ solutions (y-uncertainty) resulting from the ensemble of sets of rate constants (k-uncertainty), we introduce two measures,

\[
\delta_B(t_u) = \frac{1}{2B} \sum_{b=1}^{B} \sum_{n=1}^{N} \left| y_{nb}(t_u) - \langle y_n(t_u) \rangle \right| \tag{12}
\]

where

\[
\langle y_n(t_u) \rangle = \frac{1}{B} \sum_{b=1}^{B} y_{nb}(t_u) \tag{13}
\]

is the mean value of concentrations at time $t = t_u$, and

\[
\Delta_B = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_B(t_{u-1/2}) \cdot (t_u - t_{u-1}) \tag{14}
\]

The factor $t_{\max}$ in the denominator of Eq. (14) equals the sum of time differences, given that $t_0 = 0$:

\[
t_{\max} = t_U = t_U - t_0 = \sum_{u=1}^{U} (t_u - t_{u-1}) \tag{15}
\]

Furthermore, we define

\[
t_{u-1/2} = \frac{1}{2} (t_{u-1} + t_u) \tag{16}
\]

According to Eq. (12), $\delta_B(t_u)$ represents the ensemble-averaged y-uncertainty summed over all species at time $t = t_u$. Here, we focus on $\delta_B(t_{\max})$, as it measures the variability in the product distribution, a network property of particular interest for synthetic chemists. In contrast, $\Delta_B$ represents a time-averaged y-uncertainty, i.e., the overall variability of species concentrations between $t_0 = 0$ and $t_U = t_{\max}$. The combination of both measures may be valuable for determining the underlying reaction mechanism. For instance, if $\delta_B(t_{\max})$ is close to zero but $\Delta_B$ is rather large, it is likely that we are faced with one or both of the following scenarios: (i) the sequence of elementary steps is identical or very similar for some k-vectors, but the uncertainty of the activation barrier associated with the rate-determining step is significant; (ii) different k-vectors suggest different routes to the same metastable sink of the reaction network. Furthermore, if both $\delta_B(t_{\max})$ and $\Delta_B$ are below a user-defined concentration error, $\varepsilon_y$, we can safely neglect the ensemble of kinetic models and base all further kinetic analysis on the nominal sample represented by $\mathbf{k}_0$ only.
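The two uncertainty measures can be sketched in a few lines. The sketch assumes that the concentrations of the $B$ sampled solutions are already available on a common time grid and that the values of $\delta_B$ at the interval midpoints $t_{u-1/2}$ have been computed beforehand (e.g., from interpolated concentrations); the function names are hypothetical.

```python
def delta_b(samples):
    """Eq. (12): samples[b][n] is the concentration of species n in sampled
    solution b (b = 1, ..., B) at one fixed time point."""
    n_samples = len(samples)
    n_species = len(samples[0])
    mean = [sum(s[n] for s in samples) / n_samples for n in range(n_species)]  # Eq. (13)
    total = sum(abs(s[n] - mean[n]) for s in samples for n in range(n_species))
    return total / (2 * n_samples)

def time_average(values_mid, times):
    """Eq. (14): midpoint-rule time average. values_mid[u] is the measure at the
    midpoint of the interval [times[u], times[u + 1]]; times holds t_0, ..., t_U."""
    t_max = times[-1] - times[0]
    weighted = sum(v * (times[u + 1] - times[u]) for u, v in enumerate(values_mid))
    return weighted / t_max
```

If the total concentration is conserved, shifting an amount $x$ from one species to another changes the sum of absolute deviations by $2x$, which may explain the factor $1/2$ in Eq. (12); this reading is an interpretation, not a statement from the text.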

When applying Eqs. (12) and (14), it is important that the number and identity of the time points $t_u$ are the same for all solutions. However, it is very unlikely that the time points obtained from an ODE solver are identical for two different k-vectors. For this reason, KiNetX calculates a log-distributed vector of reference time points, $(t_0, t_1, \dots, t_U)^\top$, with $t_0 = 0$ and $t_U = t_{\max}$, which is a function of the user-defined parameters $t_{\max}$ and $U$. To obtain species concentrations at the same time points for other k-vectors, KiNetX interpolates the corresponding solutions through cubic spline smoothing. Note that spline smoothing is only reasonable if data noise is negligible, a property which one can assess easily in the case of one control variable (here, time). To evaluate the reliability of the cubic spline interpolations, we compared them against results from Gaussian process regression [59]. Under suitable assumptions, Gaussian process regression yields an optimal trade-off between fitting and smoothness (cubic splines only ensure the latter property). Hence, it can be employed to predict concentrations at previously unconsidered values in the time domain (here, the domain between $t_0 = 0$ and $t_U = t_{\max}$). Furthermore, as Gaussian process regression is a strictly Bayesian method, one may obtain reliable uncertainty estimates for each prediction. However, as this regression method scales cubically with the number of data points, it is not suitable for repetitive applications (usually, hundreds of regressions would be necessary). In all cases studied, we found that both the mean deviation between the two interpolation methods (cubic spline smoothing versus Gaussian process regression) and the mean predictive variance obtained from Gaussian process regression are negligibly small compared to the mean variance of the concentrations themselves (by a factor of $<10^{-6}$). Hence, data noise is indeed negligible, and cubic splines work well for interpolating concentration profiles.

2.4.2 Network Reduction

We introduce a hierarchy of two reduction algorithms with increasing sophistication (in terms of both rigor and required computing resources), which identify and eliminate kinetically irrelevant species and elementary reactions. Initially, we perform a detailed flux analysis, followed by a local barrier analysis if the former analysis suggests a reduced model resulting in a concentration error that exceeds the user-defined threshold $\varepsilon_y$. Both algorithms reflect our interpretation of established kinetic modeling concepts [34].

Detailed Flux Analysis (DFA). To keep track of the net concentration that has passed an edge between $t = 0$ and $t = t_{\max}$, we integrate the absolute values of $f_l(t)$ over that time interval,

\[
F_l = \int_{t=0}^{t=t_{\max}} |f_l(t)| \, \mathrm{d}t \tag{17}
\]

which we define as the edge flux corresponding to the $l$-th reaction pair. The vector of edge fluxes, $\mathbf{F}$, is the basis for determining the vertex flux corresponding to the $n$-th species,

\[
G_n = \left( \mathbf{s}_n^+ + \mathbf{s}_n^- \right) \mathbf{F} \tag{18}
\]

where $\mathbf{s}_n^+$ and $\mathbf{s}_n^-$ represent the $n$-th row of $\mathbf{S}^+$ and $\mathbf{S}^-$, respectively. Our DFA implementation approximates the edge flux according to

\[
F_l \approx F_l^{\mathrm{DFA}} = \sum_{u=1}^{U} |f_l(t_{u-1/2})| \cdot (t_u - t_{u-1}) \tag{19}
\]

Analogous to the exact expression, the approximate vertex flux reads

\[
G_n^{\mathrm{DFA}} = \left( \mathbf{s}_n^+ + \mathbf{s}_n^- \right) \mathbf{F}^{\mathrm{DFA}} \tag{20}
\]

If $G_n^{\mathrm{DFA}} < G_{\min}$, where $G_{\min}$ is a user-defined threshold, the $n$-th vertex will be removed from the kinetic model, except for the case where the $n$-th vertex represents a reactant. All edges that were connected to at least one of the removed vertices will be removed, too. After this procedure, the number of vertices and edges should have decreased significantly.

Note that in the case of a closed system, $G_n^{\mathrm{DFA}}$ approaches a finite value with increasing time, which renders $G_{\min}$ an intuitive threshold that resembles a concentration error. The assumption behind our DFA is based on this intuitive interpretation: If a reaction channel with a flux smaller than $G_{\min}$ is removed, the redistribution of this minute amount of flux (i.e., concentration) should yield a concentration error that is comparable to $G_{\min}$. Therefore, we have a means at hand that potentially allows us to control the concentration error introduced by a specific value of $G_{\min}$. Clearly, choosing a smaller value of $G_{\min}$ will increase the reliability of the DFA-based solution but also decrease the possible degree of network reduction.
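The DFA steps of Eqs. (19) and (20) and the subsequent pruning rule can be sketched as follows. Two assumptions are made because the corresponding definitions lie outside this section: $f_l$ is taken to be the net rate of reaction pair $l$, and $\mathbf{S}^+$ and $\mathbf{S}^-$ are taken to hold the nonnegative stoichiometric coefficients of the left- and right-hand sides. Function names and the list-based data layout are hypothetical.

```python
def edge_fluxes(net_rates_mid, times):
    """Eq. (19): net_rates_mid[u][l] is f_l at the midpoint of [times[u], times[u + 1]]."""
    n_pairs = len(net_rates_mid[0])
    return [
        sum(abs(row[l]) * (times[u + 1] - times[u]) for u, row in enumerate(net_rates_mid))
        for l in range(n_pairs)
    ]

def vertex_fluxes(s_plus, s_minus, flux):
    """Eq. (20): G_n = (s+_n + s-_n) . F, with s_plus[n][l] and s_minus[n][l]."""
    return [
        sum((sp + sm) * f for sp, sm, f in zip(row_p, row_m, flux))
        for row_p, row_m in zip(s_plus, s_minus)
    ]

def reduce_network(g, s_plus, s_minus, g_min, reactants):
    """Keep vertices with G_n >= g_min (reactants are always kept) and keep
    only those edges that touch no removed vertex."""
    n_vertices = len(g)
    keep_vertices = [n for n in range(n_vertices) if g[n] >= g_min or n in reactants]
    removed = set(range(n_vertices)) - set(keep_vertices)
    n_pairs = len(s_plus[0])
    keep_edges = [
        l for l in range(n_pairs)
        if not any(s_plus[n][l] or s_minus[n][l] for n in removed)
    ]
    return keep_vertices, keep_edges
```

A toy case with species A, B, C and the pairs A/B and A/C, in which hardly any flux passes through the second pair, would remove C together with the second pair while retaining the reactant A.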

To determine the accuracy loss caused by a DFA, we introduce the measures

\[
\delta_{pq}(t_u) = \frac{1}{2} \sum_{n=1}^{N} \left| y_{np}(t_u) - y_{nq}(t_u) \right| \tag{21}
\]

and

\[
\Delta_{pq} = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_{pq}(t_{u-1/2}) \cdot (t_u - t_{u-1}) \tag{22}
\]

which resemble the control quantities $\delta_B(t_u)$ and $\Delta_B$. Here, however, we compare two specific solutions, $\mathbf{y}_p(t)$ and $\mathbf{y}_q(t)$, with each other, one resulting from the original network and the other resulting from the corresponding sparse network obtained through our DFA. Note that the number of vertices differs for the original and sparse networks. Hence, to render Eq. (21) practical, we define $N$ to be the number of vertices contained in the original network and set all elements in $\mathbf{y}_{\mathrm{sparse}}(t)$ to zero that refer to vertices not contained in the sparse network. In case that all of the $B + 1$ kinetic models yield both $\delta_{pq}(t_u)$-values (again, we focus on $t_u = t_U = t_{\max}$) and $\Delta_{pq}$-values that are below a user-defined concentration error, $\varepsilon_y$, KiNetX considers the DFA-based network reduction reliable. However, if there is at least one kinetic model that yields $\delta_{pq}(t_{\max}) > \varepsilon_y$ or $\Delta_{pq} > \varepsilon_y$, it depends on the user-defined confidence level, $\gamma$, whether the DFA-based network reduction is considered reliable or not. We require the confidence level to fulfill the condition $0 \le \gamma \le 1$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeds $\varepsilon_y$, the DFA-based network reduction is considered unreliable. Here, the lowest and highest possible quantiles represent the median and the maximum of both measures over all $B + 1$ samples, respectively.

Local Barrier Analysis (LBA). There are certain cases in which a DFA-based network reduction is prone to yield unreliable results. One example is autocatalysis. Imagine the following model reaction network:

2A ⇌ B (very slow)
B ⇌ C (fast)
C + A ⇌ D (fast)
D + A ⇌ E (fast)
E ⇌ 2C (fast)

Even though the first reaction step is very slow, it is required to initiate all other reaction steps. Without B, neither C nor D nor E can be formed. Even though B is obviously a very important species for the reaction mechanism, it will only be relevant at the very beginning of the reaction (i.e., until a minute amount of it has been formed). Hence, the vertex flux for B, $G_\mathrm{B}$, will be very small, possibly smaller than the user-defined threshold $G_{\min}$, leading to a false elimination of this species and the second of the five edges.

In this case, $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ will clearly exceed $\varepsilon_y$, and the LBA will start automatically. The idea behind our LBA algorithm is fairly simple: Set the rate constants of the first reaction pair to zero, which is equivalent to an infinitely high activation barrier or the removal of the corresponding edge. Repeat numerical integration and compare the solution to the original one based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Set the rate constants of the first reaction pair to their original values. Repeat the entire procedure for the second, ..., $L$-th reaction pair and for each of the $B + 1$ kinetic models.

After this procedure, every reaction pair is associated with $B + 1$ values for $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the corresponding reaction pair will be considered kinetically irrelevant and removed from the network. All unconnected vertices will be removed, too. Subsequently, the resulting $B + 1$ reduced kinetic models are integrated, and their solutions are compared to the original ones, again based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the LBA-based network reduction will be considered reliable. It is still possible that $\varepsilon_y$ is exceeded, as the kinetic models we consider are generally nonlinear: If the lack of one edge does not alter the solution and the same result is obtained for another edge, it does not imply that the simultaneous lack of both edges leads to the same conclusion. If $\varepsilon_y$ is still exceeded after the LBA-based network reduction, KiNetX will recover the original network and continue its analysis without any reduction.
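The quantile criterion shared by the DFA and LBA checks, together with the LBA screening loop, might look as follows. This is a sketch under stated assumptions: the quantile is computed by linear interpolation between order statistics (so that $\gamma = 0$ yields the median and $\gamma = 1$ the maximum), both metrics are required to stay below $\varepsilon_y$, each k-vector is stored as a list of `(k_plus, k_minus)` tuples, and `solve` and `compare` are user-supplied callables standing in for the numerical integration and for Eqs. (21) and (22). All names are hypothetical.

```python
import math

def quantile(values, q):
    """Linear-interpolation quantile: q = 0.5 is the median, q = 1 the maximum."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    low = int(math.floor(pos))
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (pos - low) * (ordered[high] - ordered[low])

def below_threshold(deltas, big_deltas, gamma, eps_y):
    """True if the (1 + gamma)/2 quantiles of both metrics stay below eps_y."""
    q = (1.0 + gamma) / 2.0
    return quantile(deltas, q) < eps_y and quantile(big_deltas, q) < eps_y

def irrelevant_pairs(k_ensemble, solve, compare, gamma, eps_y):
    """LBA screening: switch off one reaction pair at a time in every ensemble
    member and collect the pairs whose removal barely changes the solution."""
    n_pairs = len(k_ensemble[0])
    reference = [solve(k) for k in k_ensemble]
    result = []
    for l in range(n_pairs):
        deltas, big_deltas = [], []
        for k, ref in zip(k_ensemble, reference):
            trial = list(k)
            trial[l] = (0.0, 0.0)  # infinitely high barrier in both directions
            d, big_d = compare(solve(trial), ref)
            deltas.append(d)
            big_deltas.append(big_d)
        if below_threshold(deltas, big_deltas, gamma, eps_y):
            result.append(l)
    return result
```

The screening requires $L(B + 1)$ additional integrations, which illustrates why the LBA is only started when the cheaper DFA fails.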

2.4.3 Sensitivity Analysis

In the following, we assume that the concentration profiles of the sparse reaction network reveal pronounced uncertainties such that it is difficult to correctly assign product ratios or to suggest a specific reaction mechanism. To estimate to which rate constants the concentration profiles are most sensitive, a sensitivity analysis is required [34]. For this purpose, the rate constants are perturbed to study how such perturbations affect the time-dependent species concentrations.

In a local sensitivity analysis, the model parameters are perturbed one by one from their nominal values (here, the elements of $\mathbf{k}_0$). While a local sensitivity analysis is straightforward to conduct and computationally feasible (usually, $L$ kinetic simulations are required), it has a significant disadvantage in that it does not take into account the correlation between the model parameters (cf. Section 2.2). Consequently, one may overlook important correlation effects on the uncertainty of concentration profiles. Note that our LBA algorithm resembles an extreme variant of a local sensitivity analysis (cf. Section 2.4.2), where each rate constant is perturbed to its minimal value.

In a global sensitivity analysis, the correlation between the model parameters is taken into account, but the process requires significantly more computational resources than a local sensitivity analysis. Furthermore, the design of a global sensitivity analysis is not as unambiguous as for a local sensitivity analysis, which explains the existence of several approaches that may require $CL$ (Morris screening, where $C$ is an integer usually much smaller than $B$), $B$ (polynomial chaos), or $2BL$ (Sobol's method) numerical integrations [34].

Morris screening [57] is among the simplest of global sensitivity analyses, as it is not designed to induce a mapping between the model parameters and the model solution. Instead, its purpose is to categorize the model parameters as either important or unimportant, depending on how strongly changes in them affect the model solutions. We are particularly interested in this categorization since we aim for a descriptor that informs us about the quality of rate constants obtained from an efficient (basic) quantum chemical model. If we find that the uncertainty of some species concentrations is too large to derive sensible conclusions about specific network properties, the results of a global sensitivity analysis will support us in identifying the most critical rate constants that require a reevaluation based on a more sophisticated (benchmark) quantum chemical model.

The original version of Morris screening does not take into account the joint probability distribution of the model parameters. Here, we introduce an extended variant of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample $\mathbf{k}_0$ and replaces the elements $k_{0,1}^+$ and $k_{0,1}^-$ with the corresponding elements of $\mathbf{k}_1$, $k_{1,1}^+$ and $k_{1,1}^-$. For this new vector of rate constants, $\mathbf{k}_{01}$, a numerical integration is carried out. Subsequently, the elements $k_{0,2}^+$ and $k_{0,2}^-$ of $\mathbf{k}_{01}$ are replaced with the elements $k_{1,2}^+$ and $k_{1,2}^-$ of $\mathbf{k}_1$, and a numerical integration based on the new vector, $\mathbf{k}_{02}$, is carried out. This procedure is repeated $L$ times until we arrive at $\mathbf{k}_{0L} = \mathbf{k}_1$, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is now repeated by an element-wise change from $\mathbf{k}_1$ to $\mathbf{k}_2$ and, generally, by an element-wise change from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ until $b = C - 1$ is reached. In the end, we will have generated solutions for another $C(L - 1)$ kinetic models in addition to the $B + 1$ solutions obtained for $K_B$. We are interested in the $C(L - 1)$ new solutions and the first $C + 1$ of the $B + 1$ solutions obtained previously, amounting to $CL + 1$ solutions relevant for our sensitivity analysis.

For each of these solutions, KiNetX determines the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics, where $p$ and $q$ represent two adjacent k-vectors as constructed by our extended version of Morris screening. $CL$ values are obtained for each metric, $C$ thereof for every reaction pair. We define the sensitivity coefficients $z_l^{\delta}$ and $z_l^{\Delta}$ as

\[
z_l^{\delta} = \frac{1}{C} \sum_{b=0}^{C-1} \delta_{bl,\,bl-1}(t_{\max}) \tag{23}
\]

and

\[
z_l^{\Delta} = \frac{1}{C} \sum_{b=0}^{C-1} \Delta_{bl,\,bl-1} \tag{24}
\]

respectively. The subscript $bl$ refers to sample $\mathbf{k}_{bl}$, and we define $\mathbf{k}_{b0} = \mathbf{k}_b$. Note that in Eqs. (23) and (24), we do not divide the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term conditional relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.
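The pair-wise walk from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ and the averaging of Eqs. (23) and (24) can be sketched as follows. Assumptions made for illustration: each k-vector is a list of `(k_plus, k_minus)` tuples, `solve` stands in for the numerical integration, and `metric` stands in for either $\delta_{pq}(t_{\max})$ or $\Delta_{pq}$. For brevity, the sketch recomputes the solutions for $\mathbf{k}_b$ and $\mathbf{k}_{bL}$ instead of reusing those from the uncertainty propagation step. All names are hypothetical.

```python
def morris_path(k_from, k_to):
    """Vectors k_{b1}, ..., k_{bL}: the rate constants of reaction pair l are
    replaced one pair at a time, so the last vector equals k_to."""
    path = []
    current = list(k_from)
    for l in range(len(k_from)):
        current = list(current)
        current[l] = k_to[l]
        path.append(current)
    return path

def sensitivity_coefficients(k_ensemble, n_morris, solve, metric):
    """Eqs. (23)/(24): z_l is the average over b = 0, ..., C - 1 of the metric
    between the solutions for k_{b,l} and k_{b,l-1}, with k_{b0} = k_b."""
    n_pairs = len(k_ensemble[0])
    z = [0.0] * n_pairs
    for b in range(n_morris):
        previous = solve(k_ensemble[b])
        for l, k in enumerate(morris_path(k_ensemble[b], k_ensemble[b + 1])):
            current = solve(k)
            z[l] += metric(current, previous) / n_morris
            previous = current
    return z
```

Because every step of the walk ends at an actual sample of the ensemble, the perturbation sizes reflect the joint distribution of rate constants rather than an arbitrary step width.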

Finally, we can set up a ranking of sensitivity coefficients. The larger $z_l^{\delta}$ and $z_l^{\Delta}$, the larger the effect of changes in $k_l^{\pm}$ on the concentration profiles. With this ranking at hand, one can systematically improve on both the accuracy and precision of the concentration profiles to reliably suggest product distributions (based on $z_l^{\delta}$) or specific reaction mechanisms (based on $z_l^{\Delta}$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be infeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well known in the reaction engineering community. It has been introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas phase reactions (in particular, combustion) [62], which has recently been extended to understand the kinetics of solution phase chemistry [63]. Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution phase chemistry. As the sensitivity of the thermal rate constant to errors in the activation barrier decreases with increasing temperature in classical rate theories, k-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take k-uncertainty into account.
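The temperature argument can be made explicit with a short worked example based on the Eyring expression [53]. The barrier error $\sigma = 5$ kJ/mol and the two temperatures are illustrative assumptions, not values taken from this work.

```latex
k = \frac{k_\mathrm{B} T}{h} \exp\!\left( -\frac{\Delta G^{\ddagger}}{RT} \right)
\quad \Rightarrow \quad
\frac{\partial \ln k}{\partial \Delta G^{\ddagger}} = -\frac{1}{RT},
\qquad
\frac{k(\Delta G^{\ddagger} - \sigma)}{k(\Delta G^{\ddagger})} = \exp\!\left( \frac{\sigma}{RT} \right)

% sigma = 5 kJ/mol:
%   T = 298.15 K  ->  exp(sigma / RT) ~ 7.5
%   T = 1500 K    ->  exp(sigma / RT) ~ 1.5
```

Under these assumptions, the same barrier error changes the rate constant by a factor of about 7.5 at room temperature but only by a factor of about 1.5 at 1500 K.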

We start from the reactants (species with nonzero initial concentrations) and attemptto find all direct products ie intermediates that are formed by a single elementary re-action of the reactants The reactants are considered active whereas the direct productsare considered inactive Only active species are considered in the exploration procedureie at this point all possible reaction steps have been discovered (according to the explo-ration algorithm employed) To estimate which of the inactive species is potentially themost relevant for the mechanism we focus on the formation rate of each inactive species(indicated by an asterisk)

$$ \vec{g}_{n^*}(t_u) = s^{+}_{n^*} f^{-}(t_u) + s^{-}_{n^*} f^{+}(t_u) \tag{25} $$

Note that $s^{+}_{n^*}$ is multiplied with $f^{-}(t)$ since, in a backward ($-$) reaction, the left-hand-side species ($+$) are formed. An equivalent argument holds for the multiplication of $s^{-}_{n^*}$ with $f^{+}(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

$$ \vec{g}^{\,\max}_{n^*} = \max\left\{ \vec{g}_{n^*}(t_0), \vec{g}_{n^*}(t_1), \ldots, \vec{g}_{n^*}(t_U) \right\} \tag{26} $$

The species with the highest value of $\vec{g}^{\,\max}_{n^*}$ will be promoted to active. Note that if an ensemble of k-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

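$$ \eta\left(\vec{g}^{\,\max}_{n^*}\right) = \sqrt{ \left\langle \vec{g}^{\,\max}_{n^*} \right\rangle^{2} + \frac{1}{B-1} \sum_{b=1}^{B} \left( \vec{g}^{\,\max}_{n^*,b} - \left\langle \vec{g}^{\,\max}_{n^*} \right\rangle \right)^{2} } \tag{27} $$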

that takes into account both the ensemble average of maximum formation rates

$$ \left\langle \vec{g}^{\,\max}_{n^*} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \vec{g}^{\,\max}_{n^*,b} \tag{28} $$

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\vec{g}^{\,\max}_{n^*}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold. In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.
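The ranking defined by Eqs. (25) to (28) can be sketched in a few lines of Python. The snippet below is a minimal illustration and not the KiNetX implementation: the function names, the array layouts (time points or ensemble members along the first axis), and the treatment of the stoichiometric coefficients and fluxes as arrays over all reactions are assumptions made here for illustration.

```python
import numpy as np

def max_formation_rates(s_plus, s_minus, f_plus, f_minus):
    """Maximum formation rate of each inactive species over all time points.

    s_plus, s_minus: (n_inactive, n_reactions) stoichiometric coefficients of
        the inactive species on the left-hand (+) and right-hand (-) sides.
    f_plus, f_minus: (n_times, n_reactions) forward and backward fluxes.
    These array layouts are assumptions of this sketch.
    """
    s_plus, s_minus = np.asarray(s_plus, float), np.asarray(s_minus, float)
    f_plus, f_minus = np.asarray(f_plus, float), np.asarray(f_minus, float)
    # Eq. (25): left-hand-side species are formed by backward fluxes,
    # right-hand-side species by forward fluxes.
    g = f_minus @ s_plus.T + f_plus @ s_minus.T
    return g.max(axis=0)  # Eq. (26): maximum over t_0, ..., t_U

def rank_inactive_species(g_max):
    """Rank inactive species by eta = sqrt(mean^2 + sample variance).

    g_max: (B, n_inactive) maximum formation rates, one row per kinetic model.
    Returns (species indices sorted by descending eta, eta values).
    """
    g_max = np.asarray(g_max, float)
    mean = g_max.mean(axis=0)  # Eq. (28)
    # Sample variance with the 1/(B - 1) prefactor of Eq. (27); it is set to
    # zero in this sketch if only a single model is available.
    if g_max.shape[0] > 1:
        var = g_max.var(axis=0, ddof=1)
    else:
        var = np.zeros_like(mean)
    eta = np.sqrt(mean**2 + var)  # Eq. (27)
    return np.argsort(-eta, kind="stable"), eta
```

For two species with the same ensemble-averaged maximum formation rate, the one with the larger spread obtains the larger value of η and would therefore be promoted first, in line with the preference for the high-variance case described above.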

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between $t = 0$ and $t = t_{\max}$) are generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

• The number of samples, $B$, to be drawn from $\Sigma_A$.

• A thermostat temperature, $T$, for the calculation of rate constants.

• The average free-energy increase/decrease, $\mu(A^{-} - A^{+})$, for a new intermediate state ($-$) formed by reaction of an intermediate state already present in the network ($+$).

• The standard deviation of the free-energy difference between a left-hand-side intermediate state ($+$) and its corresponding right-hand-side intermediate state ($-$), $\sigma(A^{-} - A^{+})$.

• A minimum activation free energy, $\min(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

• A maximum activation free energy, $\max(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

• The average free-energy uncertainty, $\langle\sigma_A\rangle$ (required for generating an ensemble of kinetic models).

• The maximum number of edges, $L_{\max}$, to be generated.

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of $N = 103$ vertices and $L = 118$ edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L$^{-1}$ and 0.50 mol L$^{-1}$, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of $B + 1 = 25$ k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol$^{-1}$. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol$^{-1}$. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature of $T = 298.15$ K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L$^{-1}$ during the course of the reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at $t = t_{\max}$. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L$^{-1}$ and 0.9 mol L$^{-1}$, the observable value of which will in turn affect the concentrations of the potential side products 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of $\varepsilon_y = 0.01$ mol L$^{-1}$, we find that a DFA-based network reduction with $G_{\min} = 1.0 \times 10^{-3}$ mol L$^{-1}$ is reliable with respect to both $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Here, we chose the minimum confidence level of $\gamma = 0$ for the sake of clarity. The smaller $\gamma$, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose $G_{\min}$ to be one order of magnitude smaller than $\varepsilon_y$. The reason is that $G_{\min}$ is assessed on a species-by-species basis, whereas $\varepsilon_y$ is compared against the quantities $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$, which measure the concentration error summed over all species. One may resolve this situation by dividing $G_{\min}$ by $N$, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA analysis varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are colored black, whereas all other vertices are colored red.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For $C = 5$ Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeding $\varepsilon_y$ with respect to the quantile specified by $\gamma$. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L$^{-1}$. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L$^{-1}$, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval $t_u - t_{u-1}$, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42, the other dominant side product, via 30 and 14. In summary, the dominant reaction pathways read:

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red edges with arrow heads). All species that are part of dominant reaction paths are represented by red vertices, except for reactant vertices, which are colored black.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction problematic at that stage. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than $\varepsilon_y$, as measured by $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses as introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol$^{-1}$ with a standard deviation of 0.4 kcal mol$^{-1}$. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $k_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

Table 1: Mean values ($\mu$) and standard deviations ($\sigma$) of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

| Property $\zeta$ | $\mu(\zeta_{\text{CRN-0}})$ | $\mu(\zeta_{\text{CRN-B}})$ | $\sigma(\zeta_{\text{CRN-0}})$ | $\sigma(\zeta_{\text{CRN-B}})$ |
|---|---|---|---|---|
| exploration steps | 12 | 14 | 7 | 7 |
| critical reactions | 5 | 10 | 4 | 7 |
| $L_{\mathrm{expl}}$ | 54 | 60 | 26 | 27 |
| $N_{\mathrm{expl}}$ | 29 | 31 | 11 | 11 |
| $L_{\mathrm{red}}$ | 30 | 37 | 18 | 21 |
| $N_{\mathrm{red}}$ | 17 | 20 | 8 | 8 |
| $\delta_{B}^{\mathrm{expl}}(t_{\max})$ / (mol L$^{-1}$) | – | 0.14 | – | 0.13 |
| $\Delta_{B}^{\mathrm{expl}}$ / (mol L$^{-1}$) | – | 0.15 | – | 0.13 |
| $\delta_{B}^{\mathrm{refined}}(t_{\max})$ / (mol L$^{-1}$) | – | < 0.01 | – | 0.01 |
| $\Delta_{B}^{\mathrm{refined}}$ / (mol L$^{-1}$) | – | < 0.01 | – | 0.01 |

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several solutions instead of a single one naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration step for each network. The minimum over all networks amounts to $1.0 \times 10^{-17}$ mol L$^{-1}$ s$^{-1}$ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, $t_{\max}$), we obtain a flux of $3.7 \times 10^{-13}$ mol L$^{-1}$, which is very small compared to $\varepsilon_y = 0.01$ mol L$^{-1}$. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.
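As a quick plausibility check of the flux quoted in this paragraph, and assuming that it is simply the product of the smallest recorded maximum formation rate and the full reaction time $t_{\max} = 36954$ s listed in Table 2, the arithmetic reads:

```latex
\vec{g}^{\,\max} \, t_{\max}
  = 1.0 \times 10^{-17}\ \mathrm{mol\,L^{-1}\,s^{-1}} \times 36954\ \mathrm{s}
  \approx 3.7 \times 10^{-13}\ \mathrm{mol\,L^{-1}}
  \ll \varepsilon_y = 10^{-2}\ \mathrm{mol\,L^{-1}}
```

Under this assumption, the flux is more than ten orders of magnitude below the concentration error tolerance, which illustrates how small a rate threshold would have to be to retain such a species.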

DFA-based network reduction with a flux threshold of $G_{\min} = 1.0 \times 10^{-5}$ mol L$^{-1}$ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument in the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of $\max\{\delta_{B}(t_{\max}), \Delta_{B}\}$ from 0.15 mol L$^{-1}$ to < 0.01 mol L$^{-1}$ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all $B + 1 = 6$ samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
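The mimicked refinement can be illustrated with a short Python sketch. It is not the procedure implemented in KiNetX or AutoNetGen: the function names, the array layout (one row per sample), and the assumption that the free energy of a two-vertex side is the sum of the two vertex free energies are choices made here for illustration only.

```python
import numpy as np

def refine_critical(a_vertex, a_edge, reactions, critical):
    """Replace absolute free energies of critical reactions by ensemble means.

    a_vertex: (B + 1, N) absolute free energies of vertices (species).
    a_edge:   (B + 1, L) absolute free energies of edges (transition states).
    reactions: list of (lhs, rhs) tuples of vertex indices, one per edge.
    critical: indices of the edges flagged as critical.
    """
    a_vertex = np.array(a_vertex, dtype=float)  # copies, inputs stay untouched
    a_edge = np.array(a_edge, dtype=float)
    for l in critical:
        lhs, rhs = reactions[l]
        for n in set(lhs) | set(rhs):
            a_vertex[:, n] = a_vertex[:, n].mean()
        a_edge[:, l] = a_edge[:, l].mean()
    return a_vertex, a_edge

def activation_free_energies(a_vertex, a_edge, reactions):
    """Forward and backward activation free energies from absolute values."""
    fwd = np.empty_like(a_edge)
    bwd = np.empty_like(a_edge)
    for l, (lhs, rhs) in enumerate(reactions):
        # Assumption: a side comprising two vertices has the summed energy.
        fwd[:, l] = a_edge[:, l] - a_vertex[:, list(lhs)].sum(axis=1)
        bwd[:, l] = a_edge[:, l] - a_vertex[:, list(rhs)].sum(axis=1)
    return fwd, bwd
```

Because all activation free energies are recomputed from the same set of absolute values, the difference between forward and backward barriers of every reaction still equals its reaction free energy in every sample. Averaging forward and backward barriers independently would not guarantee this consistency.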

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of $\max\{\delta_{pq}(t_{\max}), \Delta_{pq}\}$ comparing the CRN-0 solutions against the "exact" solutions amounts to $15.7 \times 10^{-3}$ mol L$^{-1}$, whereas it amounts to only $1.3 \times 10^{-3}$ mol L$^{-1}$ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX for a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered 5 additional kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be questionable to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, i.e., highly noise-inducing, reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may be strongly dependent on the underlying graph structure and on the distribution of activation free energies and rate constants, as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol$^{-1}$ with respect to the higher-energy intermediate) with the reaction time $t_{\max}$, which we set to the half-life corresponding to a unimolecular rate constant for a barrier height of 100 kJ mol$^{-1}$. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol$^{-1}$, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (association and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the $(x+1)$-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the $x$-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_{\mathrm{exp}}$. The user can specify two different values for $\mu_{\mathrm{exp}}$, one for unimolecular reactions, $\mu_{\mathrm{exp,uni}}$, and one for bimolecular reactions, $\mu_{\mathrm{exp,bi}}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_{\mathrm{new}})$, ranging from zero to one. If $p(V_{\mathrm{new}}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_{\mathrm{new}}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), $2L$ activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is obtained by sampling a value from a normal distribution with mean $\mu(A^{-} - A^{+})$ and standard deviation $\sigma(A^{-} - A^{+})$ and adding it to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained by sampling a value from a uniform distribution with bounds $\min(A^{\ddagger} - A^{+/-})$ and $\max(A^{\ddagger} - A^{+/-})$ and adding it to the free energy of the higher-energy side of that edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
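The bookkeeping of steps 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions and not the AutoNetGen source code: both function names are hypothetical, and NumPy's exponential sampler is used as a stand-in for whatever random number generator AutoNetGen employs.

```python
import numpy as np

def count_reactive_states(n_new, n_total):
    """Number of potentially reactive intermediate states for the next layer.

    n_new:   N_x, the number of species in the most recent layer.
    n_total: N = N_0 + ... + N_x, the number of all species found so far.
    Returns (unimolecular, homobimolecular, heterobimolecular) counts.
    """
    uni = n_new
    homo = n_new
    # New-new pairs plus new-old pairs; old-old pairs were explored before.
    hetero = n_new * (n_new - 1) // 2 + n_new * (n_total - n_new)
    return uni, homo, hetero

def sample_channel_counts(n_states, mu_exp, rng):
    """Floor of exponential draws = number of reaction channels per state."""
    draws = rng.exponential(scale=mu_exp, size=n_states)
    return np.floor(draws).astype(int)

rng = np.random.default_rng(seed=1)
uni, homo, hetero = count_reactive_states(n_new=3, n_total=5)
channels_uni = sample_channel_counts(uni, mu_exp=5.0, rng=rng)
channels_bi = sample_channel_counts(homo + hetero, mu_exp=2.0, rng=rng)
```

For three new species and two old ones, this yields 3 unimolecular, 3 homobimolecular, and 9 heterobimolecular states. The 9 heterobimolecular states are the 10 distinct pairs among 5 species minus the single old-old pair that was already explored.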

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$ k_l^{+/-} = \frac{k_{\mathrm{B}} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right), \tag{29} $$

where $h$, $k_{\mathrm{B}}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
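Eq. (29) translates directly into code. The following sketch uses SI values of the physical constants and hypothetical function names; it also evaluates the half-life of a unimolecular reaction with a barrier of 100 kJ mol$^{-1}$, which is the quantity the reaction time $t_{\max}$ is tied to in the Conclusions.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K
H = 6.62607015e-34   # Planck constant in J s
R = 8.314462618      # gas constant in J/(mol K)

def eyring_rate_constant(activation_free_energy, temperature):
    """Eq. (29): activation free energy in J/mol, temperature in K."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-activation_free_energy / (R * temperature))

def half_life(rate_constant):
    """Half-life of a first-order (unimolecular) process."""
    return math.log(2.0) / rate_constant

k = eyring_rate_constant(100.0e3, 298.15)  # barrier of 100 kJ/mol
t_half = half_life(k)
```

With these constants, the half-life comes out at roughly $3.7 \times 10^{4}$ s, close to the value of $t_{\max}$ listed in Table 2. Small deviations are to be expected from differently rounded constants.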

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$ which fulfills the condition that its largest eigenvalue equals $\sigma^2_{A,\max}$. Defining $\sigma^2_{A,\max}$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\max}$.

1. Generate a random $(N \times N)$-dimensional matrix $P$ with elements that are uniformly sampled from $[-0.5, +0.5]$. $N$ is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an $(N \times N)$-dimensional matrix $Q = PP^{\top}$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$ \Sigma_A = Q \, \frac{\langle\sigma_A\rangle^2}{\max\{E_{ii}\}}, $$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
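The three steps above can be sketched with NumPy as follows. The function names are hypothetical, and drawing correlated vertex free energies from a multivariate normal distribution is an assumption made here for illustration; the text only states that samples are drawn from the nominal values and $\Sigma_A$.

```python
import numpy as np

def random_covariance(n_vertices, sigma, rng):
    """Random covariance matrix whose largest eigenvalue equals sigma**2."""
    # Step 1: uniform random matrix with elements in [-0.5, +0.5).
    p = rng.uniform(-0.5, 0.5, size=(n_vertices, n_vertices))
    # Step 2: P P^T is symmetric and positive semidefinite.
    q = p @ p.T
    # Step 3: rescale so that the largest eigenvalue becomes sigma**2.
    largest = np.linalg.eigvalsh(q).max()
    return q * sigma**2 / largest

def sample_vertex_free_energies(nominal, cov, n_samples, rng):
    """Correlated samples around the nominal values (assumed Gaussian)."""
    return rng.multivariate_normal(nominal, cov, size=n_samples)

rng = np.random.default_rng(seed=0)
cov = random_covariance(n_vertices=10, sigma=2.09, rng=rng)  # kJ/mol
samples = sample_vertex_free_energies(np.zeros(10), cov, 24, rng)
```

Since every diagonal element of a symmetric matrix is bounded by its largest eigenvalue, no individual vertex variance in this construction can exceed the chosen upper limit.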

Note that in a practical setup, we expect the user to provide $B + 1$ k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required as input for AutoNetGen and KiNetX and contains the values chosen for the three cases discussed in this work.

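Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

| Input parameter | CRN-X | CRN-0 | CRN-B |
|---|---|---|---|
| AutoNetGen | | | |
| $B + 1$ | 25 | 1 | 6 |
| $T$ / K | 298.15 | 298.15 | 298.15 |
| $\mu(A^{-} - A^{+})$ / (kJ mol$^{-1}$) | −25 | −25 | −25 |
| $\sigma(A^{-} - A^{+})$ / (kJ mol$^{-1}$) | 50 | 50 | 50 |
| $\min(A^{\ddagger} - A^{+/-})$ / (kJ mol$^{-1}$) | 0 | 0 | 0 |
| $\max(A^{\ddagger} - A^{+/-})$ / (kJ mol$^{-1}$) | 100 | 100 | 100 |
| $\langle\sigma_A\rangle$ / (kJ mol$^{-1}$) | 2.09 | 2.09 | 2.09 |
| $L_{\max}$ | 125 | 100 | 100 |
| $\mu_{\mathrm{exp}}(N_{\mathrm{uni}})$ | 5 | 5 | 5 |
| $\mu_{\mathrm{exp}}(N_{\mathrm{bi}})$ | 2 | 2 | 2 |
| $p(V_{\mathrm{new}})$ | 0.5 | 0.1 | 0.1 |
| KiNetX | | | |
| $t_{\max}$ / s | 36954 | 36954 | 36954 |
| $\varepsilon_y$ / (mol L$^{-1}$) | 0.01 | 0.01 | 0.01 |
| $G_{\min}$ / (mol L$^{-1}$) | $1.0 \times 10^{-3}$ | $1.0 \times 10^{-5}$ | $1.0 \times 10^{-5}$ |
| $U$ | 1000 | 1000 | 1000 |
| $\gamma$ | 0 | 1 | 1 |
| $C$ | 5 | 1 | 5 |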

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.


[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

31

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C.; et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C.; et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G.; et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S.; et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package"; Technical Report SAND-80-8003; Sandia National Labs: Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics"; Technical Report SAND-89-8009; Sandia National Labs: Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics"; Technical Report SAND-96-8216; Sandia National Labs: Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes"; Version 2.3.0; DOI: 10.5281/zenodo.170284; http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI—a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks"; ETH Zürich: Zürich, Switzerland; http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB"; Release 2018a; The MathWorks, Inc.: Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372; DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

of (generally stiff) ODE systems, we interface to the ode15s module [56] of Matlab.

The KiNetX workflow consists of three core algorithms (Sections 2.4.1–2.4.3), all of which take the correlated uncertainty in the underlying model parameters (rate constants) explicitly into account:

1. Uncertainty propagation: Solve an ensemble of kinetic models derived from a unique reaction network and estimate the kinetic relevance of every species based on the maximum rate of formation (Section 2.4.1).

2. Network reduction: Identify and eliminate all kinetically irrelevant vertices and edges of the network by applying a hierarchy of flux analyses, resulting in a sparse network that is a comprehensible representation of the underlying reaction mechanism (Section 2.4.2).

3. Sensitivity analysis: Determine the effect of rate constant perturbations on time-dependent species concentrations through an extended version of Morris screening [57] (Section 2.4.3).

The minimal input requirements for KiNetX are:

• A chemical reaction network with $N$ vertices and $2L$ unidirectional edges (provided by AutoNetGen in this work).

• A set of $N$ initial species concentrations, $\mathbf{y}_0$ (provided by the user).

• An ensemble of $B + 1$ sets of $2L$ rate constants each,

$$\mathbf{K}_B = \{\mathbf{k}_b\} \qquad (10)$$

with $b = 0, \dots, B$, which may be derived from an ensemble of electronic structure models based on, e.g., density functional theory [22,24,25] (provided by AutoNetGen in this work).

• A maximum reaction time, $t_{\max}$, representing a practical time scale or a time scale of interest (provided by the user).

At present, we require the input to be provided in SI units. Optional input parameters are:

• The maximum tolerable concentration error, $\varepsilon_y$, between the exact and an approximate solution.

• A flux threshold, $G_{\min}$, above which a chemical species will be considered kinetically relevant.

• The number of log-distributed time points, $U + 1$, between $t = 0$ and $t = t_{\max}$ at which species concentrations will be evaluated based on cubic spline interpolation.

• A confidence level, $\gamma$, important for assessing network properties derived from an ensemble of kinetic models.

• The number of Morris samples, $C$, considered in our global sensitivity analysis.

• Absolute and relative concentration error tolerances considered during numerical integration of ODE systems.

Every execution of KiNetX is based on one network with a fixed number of vertices and edges. Hence, for steering the exploration of reaction networks, KiNetX needs to be executed repeatedly. We designed KiNetX to suggest the next exploration step by estimating the kinetic relevance of every species based on the maximum rate of formation (Section 2.5). We will make the KiNetX program available through our webpage (scine.ethz.ch) in 2019.

Currently, KiNetX is limited to the analysis of homogeneous and isothermal reactive chemical systems in dilute solution, which constitute a major category of chemical systems found in nature and examined in chemical research. One prominent subcategory thereof that attracts much attention in fundamental and industrial research is homogeneous catalysis, a field that is rather underrepresented in the kinetic modeling literature [58].

For the study of dilute systems, collisions involving three or more reactive species can be considered negligible from a statistical point of view. For this reason, each element of the forward and backward reaction rate vectors, $\mathbf{f}^+$ and $\mathbf{f}^-$, is of the form

$$f_l^{\pm} = k_l^{\pm}\, y_{n_{l1}^{\pm}}\, y_{n_{l2}^{\pm}} \qquad (11)$$

where the vertex index $n$ is determined by the edge index ($l$), the direction ($+$ or $-$), and the (arbitrary) position in the rate equation ($i = 1, 2$). In the case of a unimolecular reaction, we simply set one of the two vertex indices to $n_{li}^{\pm} = 0$, denoting a hypothetical null-species with a defined constant value of $y_0 = 1$, independent of the unit of measurement.
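A minimal sketch of Eq. (11) in Python may help to see how the null-species removes the need for a special unimolecular case. The function name `reaction_rates`, the list-based data layout, and the numerical values are assumptions made for illustration; they are not part of KiNetX.

```python
def reaction_rates(k, vertex_pairs, y):
    """Evaluate f_l = k_l * y[n_l1] * y[n_l2] for one direction (Eq. 11).

    vertex_pairs[l] = (n_l1, n_l2); index 0 denotes the null-species (y_0 = 1),
    so real species are numbered from 1.
    """
    y_ext = [1.0] + list(y)  # prepend the null-species with constant value 1
    return [k_l * y_ext[n1] * y_ext[n2]
            for k_l, (n1, n2) in zip(k, vertex_pairs)]


# Hypothetical example: A + B -> C (bimolecular) and C -> D (unimolecular)
y = [2.0, 3.0, 0.5, 0.0]        # concentrations of A, B, C, D
k_plus = [0.1, 4.0]             # forward rate constants
lhs = [(1, 2), (3, 0)]          # left-hand-side vertex indices per edge
f_plus = reaction_rates(k_plus, lhs, y)
```

The second edge uses the index pair `(3, 0)`, so its rate reduces to the first-order expression `k * y_C` without any branching in the code.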

2.4 The KiNetX Workflow

2.4.1 Uncertainty Propagation

The first step of our KiNetX workflow is the numerical integration of an ensemble of $B + 1$ kinetic models, each of them representing a unique set of rate constants. Clearly, the user may also choose $B = 0$ (leading to the usual kinetic modeling setup comprising a single numerical integration), but KiNetX is actually designed to analyze an ensemble of kinetic models.

To assess the magnitude of noise in the $B$ solutions (y-uncertainty) resulting from the ensemble of sets of rate constants (k-uncertainty), we introduce two measures,

$$\delta_B(t_u) = \frac{1}{2B} \sum_{b=1}^{B} \sum_{n=1}^{N} \left| y_{n,b}(t_u) - \langle y_n(t_u) \rangle \right| \qquad (12)$$

where

$$\langle y_n(t_u) \rangle = \frac{1}{B} \sum_{b=1}^{B} y_{n,b}(t_u) \qquad (13)$$

is the mean value of concentrations at time $t = t_u$, and

$$\Delta_B = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_B(t_{u-1/2}) \cdot (t_u - t_{u-1}) \qquad (14)$$

The factor $t_{\max}$ in the denominator of Eq. (14) equals the sum of time differences given that $t_0 = 0$,

$$t_{\max} = t_U = t_U - t_0 = \sum_{u=1}^{U} (t_u - t_{u-1}) \qquad (15)$$

Furthermore, we define

$$t_{u-1/2} = \frac{1}{2} (t_{u-1} + t_u) \qquad (16)$$

According to Eq. (12), $\delta_B(t_u)$ represents the ensemble-averaged y-uncertainty summed over all species at time $t = t_u$. Here, we focus on $\delta_B(t_{\max})$ as it measures the variability in the product distribution, a network property of particular interest for synthetic chemists. In contrast, $\Delta_B$ represents a time-averaged y-uncertainty, i.e., the overall variability of species concentrations between $t_0 = 0$ and $t_U = t_{\max}$. The combination of both measures may be valuable for determining the underlying reaction mechanism. For instance, if $\delta_B(t_{\max})$ is close to zero but $\Delta_B$ is rather large, we are likely faced with one or both of the following two scenarios: (i) The sequence of elementary steps is identical or very similar for some k-vectors, but the uncertainty of the activation barrier associated with the rate-determining step is significant; (ii) different k-vectors suggest different routes to the same metastable sink of the reaction network. Furthermore, if both $\delta_B(t_{\max})$ and $\Delta_B$ are below a user-defined concentration error $\varepsilon_y$, we can safely neglect the ensemble of kinetic models and base all further kinetic analysis on the nominal sample represented by $\mathbf{k}_0$ only.
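The two measures of Eqs. (12) and (14) can be sketched as follows. This is a plain-Python illustration, not the KiNetX implementation: it assumes that the concentrations of the $B$ perturbed samples are already available at the interval midpoints $t_{u-1/2}$ (e.g., from interpolation), and the function names and toy numbers are hypothetical.

```python
def delta_B(y_at_t):
    """Eq. (12): y_at_t[b][n] is the concentration of species n in sample b."""
    B = len(y_at_t)
    N = len(y_at_t[0])
    # Eq. (13): ensemble mean of every species
    mean = [sum(y_at_t[b][n] for b in range(B)) / B for n in range(N)]
    total = sum(abs(y_at_t[b][n] - mean[n]) for b in range(B) for n in range(N))
    return total / (2 * B)


def Delta_B(t, y_mid):
    """Eq. (14): y_mid[u-1] holds the ensemble at the midpoint t_{u-1/2}."""
    t_max = t[-1] - t[0]
    acc = sum(delta_B(y_mid[u - 1]) * (t[u] - t[u - 1])
              for u in range(1, len(t)))
    return acc / t_max


# Hypothetical toy ensemble: B = 2 samples, N = 2 species, U = 2 intervals
t = [0.0, 1.0, 3.0]
y_mid = [
    [[1.0, 0.0], [0.0, 1.0]],   # interval 1: the two samples disagree completely
    [[1.0, 0.0], [1.0, 0.0]],   # interval 2: the two samples agree
]
d_first = delta_B(y_mid[0])
D_total = Delta_B(t, y_mid)
```

In this toy case the ensemble agrees at late times but not at early times, which is the situation where a small $\delta_B(t_{\max})$ can coexist with a non-negligible $\Delta_B$.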

When applying Eqs. (12) and (14), it is important that the number and identity of the time points $t_u$ are the same for all solutions. However, it is very unlikely that the time points obtained from an ODE solver are identical for two different k-vectors. For this reason, KiNetX calculates a log-distributed vector of reference time points, $(t_0, t_1, \dots, t_U)^\top$ with $t_0 = 0$ and $t_U = t_{\max}$, which is a function of the user-defined parameters $t_{\max}$ and $U$. To obtain species concentrations at the same time points for other k-vectors, KiNetX interpolates the corresponding solutions through cubic spline smoothing. Note that spline smoothing is only reasonable if data noise is negligible, a property which one can assess easily in the case of one control variable (here, time). To evaluate the reliability of the cubic spline interpolations, we compared them against results from Gaussian process regression [59]. Under suitable assumptions, Gaussian process regression yields an optimal trade-off between fitting and smoothness (cubic splines only ensure the latter property). Hence, it can be employed to predict concentrations at previously unconsidered values in the time domain (here, the domain between $t_0 = 0$ and $t_U = t_{\max}$). Furthermore, as Gaussian process regression is a strictly Bayesian method, one may obtain reliable uncertainty estimates for each prediction. However, as this regression method scales cubically with the number of data points, it is not suitable for repetitive applications (usually, hundreds of regressions would be necessary). In all cases studied, we found that both the mean deviation between the two interpolation methods (cubic spline smoothing versus Gaussian process regression) and the mean predictive variance obtained from Gaussian process regression are negligibly small compared to the mean variance of the concentrations themselves (by a factor of $< 10^{-6}$). Hence, data noise is indeed negligible, and cubic splines work well for interpolating concentration profiles.

2.4.2 Network Reduction

We introduce a hierarchy of two reduction algorithms with increasing sophistication (in terms of both rigor and required computing resources), which identify and eliminate kinetically irrelevant species and elementary reactions. Initially, we perform a detailed flux analysis, followed by a local barrier analysis if the former analysis suggests a reduced model resulting in a concentration error that exceeds the user-defined threshold $\varepsilon_y$. Both algorithms reflect our interpretation of established kinetic modeling concepts [34].

Detailed Flux Analysis (DFA). To keep track of the net concentration that has passed an edge between $t = 0$ and $t = t_{\max}$, we integrate the absolute values of $f_l(t)$ over that time interval,

$$F_l = \int_{t=0}^{t=t_{\max}} |f_l(t)| \, \mathrm{d}t \qquad (17)$$

which we define as the edge flux corresponding to the $l$-th reaction pair. The vector of edge fluxes, $\mathbf{F}$, is the basis for determining the vertex flux corresponding to the $n$-th species,

$$G_n = \left( \mathbf{s}_n^+ + \mathbf{s}_n^- \right) \mathbf{F} \qquad (18)$$

where $\mathbf{s}_n^+$ and $\mathbf{s}_n^-$ represent the $n$-th row of $\mathbf{S}^+$ and $\mathbf{S}^-$, respectively. Our DFA implementation approximates the edge flux according to

$$F_l \approx F_l^{\mathrm{DFA}} = \sum_{u=1}^{U} |f_l(t_{u-1/2})| \cdot (t_u - t_{u-1}) \qquad (19)$$

Analogous to the exact expression, the approximate vertex flux reads

$$G_n^{\mathrm{DFA}} = \left( \mathbf{s}_n^+ + \mathbf{s}_n^- \right) \mathbf{F}^{\mathrm{DFA}} \qquad (20)$$

If $G_n^{\mathrm{DFA}} < G_{\min}$, where $G_{\min}$ is a user-defined threshold, the $n$-th vertex will be removed from the kinetic model, except for the case where the $n$-th vertex represents a reactant. All edges that were connected to at least one of the removed vertices will be removed, too. After this procedure, the number of vertices and edges should have decreased significantly.

Note that in the case of a closed system, $G_n^{\mathrm{DFA}}$ approaches a finite value with increasing time, which renders $G_{\min}$ an intuitive threshold that resembles a concentration error. The assumption behind our DFA is based on this intuitive interpretation: If a reaction channel with a flux smaller than $G_{\min}$ is removed, the redistribution of this minute amount of flux (i.e., concentration) should yield a concentration error that is comparable to $G_{\min}$. Therefore, we have a means at hand that potentially allows us to control the concentration error introduced by a specific value of $G_{\min}$. Clearly, choosing a smaller value of $G_{\min}$ will increase the reliability of the DFA-based solution, but it also decreases the possible degree of network reduction.
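The DFA steps of Eqs. (19) and (20), followed by the pruning rule, can be sketched as below. The sketch assumes that the rows of $\mathbf{S}^+$ and $\mathbf{S}^-$ hold non-negative stoichiometric coefficients of each species on the left- and right-hand side of every reaction pair, and that net rates at the interval midpoints are given; all function names and numbers are hypothetical and chosen for illustration only.

```python
def edge_fluxes(t, f_mid):
    """Eq. (19): f_mid[u-1][l] is the net rate of pair l at midpoint t_{u-1/2}."""
    L = len(f_mid[0])
    return [sum(abs(f_mid[u - 1][l]) * (t[u] - t[u - 1])
                for u in range(1, len(t)))
            for l in range(L)]


def vertex_fluxes(S_plus, S_minus, F):
    """Eq. (20): G_n = (s_n^+ + s_n^-) . F for every species n."""
    return [sum((sp + sm) * F_l for sp, sm, F_l in zip(row_p, row_m, F))
            for row_p, row_m in zip(S_plus, S_minus)]


def dfa_reduce(S_plus, S_minus, F, G_min, reactants):
    """Return indices of the vertices and edges kept by the flux threshold."""
    G = vertex_fluxes(S_plus, S_minus, F)
    keep_v = [n for n, G_n in enumerate(G) if G_n >= G_min or n in reactants]
    removed = set(range(len(G))) - set(keep_v)
    # an edge is dropped if any removed vertex takes part in it
    keep_e = [l for l in range(len(F))
              if not any(S_plus[n][l] + S_minus[n][l] > 0 for n in removed)]
    return keep_v, keep_e


# Hypothetical network: pair 0 is A <-> B, pair 1 is A <-> C (A is the reactant)
S_plus = [[1, 1], [0, 0], [0, 0]]    # left-hand-side coefficients (rows: A, B, C)
S_minus = [[0, 0], [1, 0], [0, 1]]   # right-hand-side coefficients
t = [0.0, 1.0, 2.0]
f_mid = [[0.4, 1e-6], [0.2, 1e-6]]   # net rates at the two midpoints
F = edge_fluxes(t, f_mid)
keep_v, keep_e = dfa_reduce(S_plus, S_minus, F, G_min=1e-3, reactants={0})
```

Here species C receives a flux far below the threshold, so both C and the edge leading to it are removed, while the reactant A would be kept regardless of its flux.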

To determine the accuracy loss caused by a DFA, we introduce the measures

$$\delta_{pq}(t_u) = \frac{1}{2} \sum_{n=1}^{N} \left| y_{n,p}(t_u) - y_{n,q}(t_u) \right| \qquad (21)$$

and

$$\Delta_{pq} = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_{pq}(t_{u-1/2}) \cdot (t_u - t_{u-1}) \qquad (22)$$

which resemble the control quantities $\delta_B(t_u)$ and $\Delta_B$. Here, however, we compare two specific solutions, $\mathbf{y}_p(t)$ and $\mathbf{y}_q(t)$, with each other, one resulting from the original network and the other resulting from the corresponding sparse network obtained through our DFA. Note that the number of vertices differs for the original and sparse networks. Hence, to render Eq. (21) practical, we define $N$ to be the number of vertices contained in the original network and set all elements in $\mathbf{y}_{\mathrm{sparse}}(t)$ to zero that refer to vertices not contained in the sparse network. If all of the $B + 1$ kinetic models yield both $\delta_{pq}(t_u)$-values (again, we focus on $t_u = t_U = t_{\max}$) and $\Delta_{pq}$-values that are below a user-defined concentration error $\varepsilon_y$, KiNetX considers the DFA-based network reduction reliable. However, if there is at least one kinetic model that yields $\delta_{pq}(t_{\max}) > \varepsilon_y$ or $\Delta_{pq} > \varepsilon_y$, it depends on the user-defined confidence level $\gamma$ whether the DFA-based network reduction is considered reliable or not. We require the confidence level to fulfill the condition $0 \le \gamma \le 1$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeds $\varepsilon_y$, the DFA-based network reduction is considered unreliable. Here, the lowest and highest possible quantiles represent the median and the maximum of both measures over all $B + 1$ samples, respectively.

Local Barrier Analysis (LBA). There are certain cases in which a DFA-based network reduction is prone to yield unreliable results. One example is autocatalysis. Imagine the following model reaction network:

2A ⇌ B (very slow)
B ⇌ C (fast)
C + A ⇌ D (fast)
D + A ⇌ E (fast)
E ⇌ 2C (fast)

Even though the first reaction step is very slow, it is required to initiate all other reaction steps. Without B, neither C nor D nor E can be formed. Even though B is obviously a very important species for the reaction mechanism, it will only be relevant at the very beginning of the reaction (i.e., until a minute amount of it has been formed). Hence, the vertex flux for B, $G_\mathrm{B}$, will be very small, possibly smaller than the user-defined threshold $G_{\min}$, leading to a false elimination of this species and of the first and second of the five edges.

In this case, $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ will clearly exceed $\varepsilon_y$, and the LBA will start automatically. The idea behind our LBA algorithm is fairly simple: Set the rate constants of the first reaction pair to zero, which is equivalent to an infinitely high activation barrier or the removal of the corresponding edge. Repeat the numerical integration and compare the solution to the original one based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Reset the rate constants of the first reaction pair to their original values. Repeat the entire procedure for the second to the $L$-th reaction pair and for each of the $B + 1$ kinetic models.
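The knock-out loop described above can be sketched as follows. This is only an illustration of the control flow: `solve` and `distance` are hypothetical user-supplied callables standing in for the numerical integration and for a single comparison metric such as $\delta_{pq}(t_{\max})$, and the toy "solver" in the example is a placeholder, not a kinetic model.

```python
def lba_scores(k_ensemble, solve, distance):
    """Return scores[l][b]: distance between the full and the knocked-out solution.

    k_ensemble[b][l] is a (k_plus, k_minus) tuple for reaction pair l in sample b.
    solve(k) returns a solution; distance(y_p, y_q) compares two solutions.
    """
    L = len(k_ensemble[0])
    scores = [[0.0] * len(k_ensemble) for _ in range(L)]
    for b, k in enumerate(k_ensemble):
        y_full = solve(k)                 # reference solution of sample b
        for l in range(L):
            k_cut = list(k)               # copy, so the original values persist
            k_cut[l] = (0.0, 0.0)         # infinitely high barrier: edge removed
            scores[l][b] = distance(y_full, solve(k_cut))
    return scores


# Placeholder solver and metric (assumptions for demonstration only)
def toy_solve(k):
    return [k_plus - k_minus for k_plus, k_minus in k]


def toy_distance(y_p, y_q):
    return 0.5 * sum(abs(a - b) for a, b in zip(y_p, y_q))


k_ensemble = [[(1.0, 0.5), (0.0, 0.0)]]   # one sample, two reaction pairs
scores = lba_scores(k_ensemble, toy_solve, toy_distance)
```

Working on a copy of each k-vector makes the "reset to original values" step implicit, and the total cost is $L$ additional integrations per kinetic model.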

After this procedure, every reaction pair is associated with $B + 1$ values for $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the corresponding reaction pair will be considered kinetically irrelevant and removed from the network. All unconnected vertices will be removed, too. Subsequently, the resulting $B + 1$ reduced kinetic models are integrated and their solutions are compared to the original ones, again based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the LBA-based network reduction will be considered reliable. It is still possible that $\varepsilon_y$ is exceeded, as the kinetic models we consider are generally nonlinear. If the lack of one edge does not alter the solution and the same result is obtained for another edge, it does not imply that the simultaneous lack of both edges leads to the same conclusion. If $\varepsilon_y$ is still exceeded after the LBA-based network reduction, KiNetX will recover the original network and continue its analysis without any reduction.
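The quantile-based reliability criterion used for the network reduction can be sketched as follows. The quantile definition (linear interpolation between order statistics) and the requirement that both metrics stay below $\varepsilon_y$ are assumptions of this sketch; the text only fixes the limiting cases, namely the median for $\gamma = 0$ and the maximum for $\gamma = 1$.

```python
def quantile(values, q):
    """Linear-interpolation quantile; q = 0.5 gives the median, q = 1 the maximum."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def reduction_is_reliable(delta_values, Delta_values, eps_y, gamma):
    """Check the (1 + gamma)/2 quantile of both metrics against eps_y."""
    q = (1.0 + gamma) / 2.0
    return (quantile(delta_values, q) <= eps_y
            and quantile(Delta_values, q) <= eps_y)


# Hypothetical metrics from three kinetic models; one of them is an outlier
delta_values = [0.001, 0.002, 0.05]
Delta_values = [0.001, 0.001, 0.001]
lenient = reduction_is_reliable(delta_values, Delta_values, eps_y=0.01, gamma=0.0)
strict = reduction_is_reliable(delta_values, Delta_values, eps_y=0.01, gamma=1.0)
```

With $\gamma = 0$ the single outlier is tolerated because only the median is tested, whereas $\gamma = 1$ requires every sample to stay below the threshold.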

2.4.3 Sensitivity Analysis

In the following, we assume that the concentration profiles of the sparse reaction network reveal pronounced uncertainties such that it is difficult to correctly assign product ratios or to suggest a specific reaction mechanism. To estimate to which rate constants the concentration profiles are most sensitive, a sensitivity analysis is required [34]. For this purpose, the rate constants are perturbed to study how such perturbations affect the time-dependent species concentrations.

In a local sensitivity analysis, the model parameters are perturbed one by one from their nominal values (here, the elements of $\mathbf{k}_0$). While a local sensitivity analysis is straightforward to conduct and computationally feasible (usually, $L$ kinetic simulations are required), it has a significant disadvantage in that it does not take into account the correlation between the model parameters (cf. Section 2.2). Consequently, one may overlook important correlation effects on the uncertainty of concentration profiles. Note that our LBA algorithm resembles an extreme variant of a local sensitivity analysis (cf. Section 2.4.2), where each rate constant is perturbed to its minimal value.

In a global sensitivity analysis, the correlation between the model parameters is taken into account, but the process requires significantly more computational resources than a local sensitivity analysis. Furthermore, the design of a global sensitivity analysis is not as unambiguous as for a local sensitivity analysis, which explains the existence of several approaches that may require $CL$ (Morris screening, where $C$ is an integer usually much smaller than $B$), $B$ (polynomial chaos), or $2BL$ (Sobol's method) numerical integrations [34].

Morris screening [57] is among the simplest of global sensitivity analyses, as it is not designed to induce a mapping between the model parameters and the model solution. Instead, its purpose is to categorize the model parameters as either important or unimportant depending on how strongly changes in them affect the model solutions. We are particularly interested in this categorization since we aim for a descriptor that informs us about the quality of rate constants obtained from an efficient (basic) quantum chemical model. If we find that the uncertainty of some species concentrations is too large to derive sensible conclusions about specific network properties, the results of a global sensitivity analysis will support us in identifying the most critical rate constants that require a reevaluation based on a more sophisticated (benchmark) quantum chemical model.

The original version of Morris screening does not take into account the joint probability distribution of the model parameters. Here, we introduce an extended variant of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample $\mathbf{k}_0$ and replaces the elements $k^+_{0,1}$ and $k^-_{0,1}$ with the corresponding elements of $\mathbf{k}_1$, $k^+_{1,1}$ and $k^-_{1,1}$. For this new vector of rate constants, $\mathbf{k}_{0,1}$, a numerical integration is carried out. Subsequently, the elements $k^+_{0,2}$ and $k^-_{0,2}$ of $\mathbf{k}_{0,1}$ are replaced with the elements $k^+_{1,2}$ and $k^-_{1,2}$ of $\mathbf{k}_1$, and a numerical integration based on the new vector $\mathbf{k}_{0,2}$ is carried out. This procedure is repeated $L$ times until we arrive at $\mathbf{k}_{0,L} = \mathbf{k}_1$, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is now repeated by an element-wise change from $\mathbf{k}_1$ to $\mathbf{k}_2$ and, generally, by an element-wise change from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ until $b = C - 1$ is reached. In the end, we will have generated solutions for another $C(L - 1)$ kinetic models in addition to the $B + 1$ solutions obtained for $\mathbf{K}_B$. We are interested in the $C(L - 1)$ new solutions and the first $C + 1$ of the $B + 1$ solutions obtained previously, amounting to $CL + 1$ solutions relevant for our sensitivity analysis.
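The construction of the intermediate k-vectors in the extended Morris screening can be sketched as follows. The generator name `morris_path` and the representation of a k-vector as a list of `(k_plus, k_minus)` tuples are assumptions for illustration; the numerical integration of each vector is left out.

```python
def morris_path(k_ensemble, C):
    """Yield (b, l, k_bl) along the element-wise path from k_b to k_{b+1}.

    k_ensemble[b] is a list of L (k_plus, k_minus) tuples; requires C <= B,
    i.e., C + 1 <= len(k_ensemble).
    """
    L = len(k_ensemble[0])
    for b in range(C):
        current = list(k_ensemble[b])
        for l in range(1, L + 1):
            # swap in the rate constants of reaction pair l from sample b + 1
            current[l - 1] = k_ensemble[b + 1][l - 1]
            yield b, l, list(current)


# Hypothetical ensemble with B = 2 (three samples) and L = 3 reaction pairs
k_ensemble = [
    [(1, 1), (2, 2), (3, 3)],
    [(10, 10), (20, 20), (30, 30)],
    [(100, 100), (200, 200), (300, 300)],
]
path = list(morris_path(k_ensemble, C=2))
new_models = [k for b, l, k in path if l < 3]   # vectors not integrated before
```

The path contains $CL$ vectors, of which the last one of every sweep coincides with an existing sample, leaving $C(L - 1)$ models that require a new integration.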

For each of these solutions, KiNetX determines the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics, where $p$ and $q$ represent two adjacent k-vectors as constructed by our extended version of Morris screening. $CL$ values are obtained for each metric, $C$ thereof for every reaction pair. We define the sensitivity coefficients $z_l^{\delta}$ and $z_l^{\Delta}$ as

$$z_l^{\delta} = \frac{1}{C} \sum_{b=0}^{C-1} \delta_{(b,l),(b,l-1)}(t_{\max}) \qquad (23)$$

and

$$z_l^{\Delta} = \frac{1}{C} \sum_{b=0}^{C-1} \Delta_{(b,l),(b,l-1)} \qquad (24)$$

respectively. The subscript $(b,l)$ refers to sample $\mathbf{k}_{b,l}$, and we define $\mathbf{k}_{b,0} = \mathbf{k}_b$. Note that in Eqs. (23) and (24), we do not divide the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term conditional relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.

Finally, we can set up a ranking of sensitivity coefficients. The larger $z_l^{\delta}$ and $z_l^{\Delta}$, the larger the effect of changes in $k_l^{\pm}$ on the concentration profiles. With this ranking at hand, one can systematically improve on both the accuracy and precision of the concentration profiles to reliably suggest product distributions (based on $z_l^{\delta}$) or specific reaction mechanisms (based on $z_l^{\Delta}$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.
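The averaging of Eqs. (23) and (24) and the subsequent ranking can be sketched in a few lines. The input layout `d[b][l-1]` (one metric value per Morris sweep and reaction pair) and the function names are assumptions of this sketch.

```python
def sensitivity_coefficients(d):
    """Eqs. (23)/(24): d[b][l-1] is the metric between k_{b,l} and k_{b,l-1}."""
    C = len(d)
    L = len(d[0])
    return [sum(d[b][l] for b in range(C)) / C for l in range(L)]


def rank_reaction_pairs(z):
    """Reaction-pair indices (1-based), sorted from most to least sensitive."""
    return sorted(range(1, len(z) + 1), key=lambda l: z[l - 1], reverse=True)


# Hypothetical metric values for C = 2 sweeps and L = 3 reaction pairs
d = [[0.1, 0.0, 0.4],
     [0.3, 0.0, 0.2]]
z = sensitivity_coefficients(d)
ranking = rank_reaction_pairs(z)
```

The same two functions apply to both metrics; feeding them $\delta$-values targets product distributions, feeding them $\Delta$-values targets the time-resolved mechanism.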

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be infeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well-known in the reaction engineering community. It has been introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas phase reactions (in particular, combustion) [62], which has recently been extended to understand the kinetics of solution phase chemistry [63]. Here, we follow a similar strategy, but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution phase chemistry. As the sensitivity of the thermal rate constant to errors in the activation free energy decreases with increasing temperature in classical rate theories, k-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take k-uncertainty into account.
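As a rough numerical illustration of the temperature argument, assume an Eyring-type rate constant [53], $k \propto \exp(-\Delta G^{\ddagger}/RT)$, and a hypothetical error $\epsilon$ in the activation free energy. The error of 5 kJ mol⁻¹ and the combustion-like temperature of 1500 K are assumptions chosen for this example only. The rate constant is then off by the factor

$$\frac{k(\Delta G^{\ddagger} - \epsilon)}{k(\Delta G^{\ddagger})} = \exp\!\left(\frac{\epsilon}{RT}\right), \qquad \exp\!\left(\frac{5\ \mathrm{kJ\,mol^{-1}}}{R \cdot 298.15\ \mathrm{K}}\right) \approx 7.5, \qquad \exp\!\left(\frac{5\ \mathrm{kJ\,mol^{-1}}}{R \cdot 1500\ \mathrm{K}}\right) \approx 1.5$$

with $R = 8.314\ \mathrm{J\,mol^{-1}\,K^{-1}}$. The same free-energy error therefore changes the rate constant by almost an order of magnitude at room temperature, but only by about 50% at 1500 K.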

We start from the reactants (species with nonzero initial concentrations) and attemptto find all direct products ie intermediates that are formed by a single elementary re-action of the reactants The reactants are considered active whereas the direct productsare considered inactive Only active species are considered in the exploration procedureie at this point all possible reaction steps have been discovered (according to the explo-ration algorithm employed) To estimate which of the inactive species is potentially themost relevant for the mechanism we focus on the formation rate of each inactive species(indicated by an asterisk)

rarrgnlowast(tu) = s+nlowastfminus(tu) + sminusnlowastf+(tu) (25)

Note that $s^{+}_{n^*}$ is multiplied with $f^{-}(t)$ since, in a backward (−) reaction, the left-hand-side species (+) are formed. An equivalent argument holds for the multiplication of $s^{-}_{n^*}$ with $f^{+}(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

$$\overrightarrow{g}^{\max}_{n^*} = \max\left\{\overrightarrow{g}_{n^*}(t_0),\ \overrightarrow{g}_{n^*}(t_1),\ \ldots,\ \overrightarrow{g}_{n^*}(t_U)\right\} \tag{26}$$

The species with the highest value of $\overrightarrow{g}^{\max}_{n^*}$ will be promoted to active. Note that if an ensemble of k-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

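$$\eta\left(\overrightarrow{g}^{\max}_{n^*}\right) = \sqrt{\left\langle \overrightarrow{g}^{\max}_{n^*} \right\rangle^{2} + \frac{1}{B-1}\sum_{b=1}^{B}\left(\overrightarrow{g}^{\max}_{n^*,b} - \left\langle \overrightarrow{g}^{\max}_{n^*} \right\rangle\right)^{2}} \tag{27}$$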

that takes into account both the ensemble average of maximum formation rates,

$$\left\langle \overrightarrow{g}^{\max}_{n^*} \right\rangle = \frac{1}{B}\sum_{b=1}^{B} \overrightarrow{g}^{\max}_{n^*,b} \tag{28}$$

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
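The ranking defined by Eqs. (26) to (28) can be sketched in a few lines of NumPy. The array layout, the function name, and the numbers in the usage example are assumptions made for illustration; KiNetX itself may organize these quantities differently.

```python
import numpy as np


def rank_inactive_species(formation_rates: np.ndarray) -> np.ndarray:
    """Rank inactive species by eta (Eq. 27), most promising first.

    formation_rates has shape (B, N_inactive, U + 1): the formation rate of
    every inactive species at every time point for each of the B kinetic models.
    Requires B >= 2 because of the 1/(B - 1) prefactor.
    """
    g_max = formation_rates.max(axis=2)      # Eq. 26, shape (B, N_inactive)
    mean = g_max.mean(axis=0)                # Eq. 28, ensemble average
    variance = g_max.var(axis=0, ddof=1)     # sample variance with 1/(B - 1)
    eta = np.sqrt(mean**2 + variance)        # Eq. 27
    return np.argsort(-eta)                  # indices sorted by decreasing eta


# Two inactive species, B = 3 models, two time points (illustrative numbers).
rates = np.array([
    [[0.0, 1.0], [0.0, 0.5]],
    [[0.0, 1.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.5]],
])
ranking = rank_inactive_species(rates)
print(ranking)  # species 1 comes first: same mean as species 0 but larger variance
```

Both species in the example share an average maximum formation rate of 1.0, so the ordering is decided by the variance term alone, which is the behavior described in the text.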

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\overrightarrow{g}^{\max}_{n^*}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold. In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between $t = 0$ and $t = t_{\max}$) are generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are as follows (for optional AutoNetGen input, see the Appendix):

• The number of samples, $B$, to be drawn from $\Sigma_A$

• A thermostat temperature, $T$, for the calculation of rate constants

• The average free-energy increase/decrease, $\mu(A^{-} - A^{+})$, for a new intermediate state (−) formed by reaction of an intermediate state already present in the network (+)

• The standard deviation, $\sigma(A^{-} - A^{+})$, of the free-energy difference between a left-hand-side intermediate state (+) and its corresponding right-hand-side intermediate state (−)

• A minimum activation free energy, $\min(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state

• A maximum activation free energy, $\max(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state

• The average free-energy uncertainty, $\langle\sigma_A\rangle$ (required for generating an ensemble of kinetic models)

• The maximum number of edges, $L_{\max}$, to be generated

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of N = 103 vertices and L = 118 edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L⁻¹ and 0.50 mol L⁻¹, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of B + 1 = 25 k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol⁻¹. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol⁻¹. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature T = 298.15 K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L⁻¹ during the course of the reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at $t = t_{\max}$. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L⁻¹ and 0.9 mol L⁻¹, the observable value of which will in turn affect the concentrations of the potential side products 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of $\varepsilon_y$ = 0.01 mol L⁻¹, we find that a DFA-based network reduction with $G_{\min}$ = 1.0 × 10⁻³ mol L⁻¹ is reliable with respect to both $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Here, we chose the minimum confidence level of $\gamma = 0$ for the sake of clarity. The smaller $\gamma$, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose $G_{\min}$ to be one order of magnitude smaller than $\varepsilon_y$. The reason is that $G_{\min}$ is assessed on a species-by-species basis, whereas $\varepsilon_y$ is compared against the quantities $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$, which measure the concentration error summed over all species. One may resolve this situation by dividing $G_{\min}$ by N, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA analysis varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are colored black, whereas all other vertices are colored red.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For C = 5 Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeding $\varepsilon_y$ with respect to the quantile specified by $\gamma$. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L⁻¹. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L⁻¹, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval $t_u - t_{u-1}$, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42, the other dominant side product, via 30 and 14. In summary, the dominant reaction pathways read:

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction at that stage problematic. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than $\varepsilon_y$, as measured by $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks explored without kinetic guidance is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol⁻¹ with a standard deviation of 0.4 kcal mol⁻¹. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $k_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

Table 1: Mean values (μ) and standard deviations (σ) of properties ζ obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                        μ(ζ_CRN-0)   μ(ζ_CRN-B)   σ(ζ_CRN-0)   σ(ζ_CRN-B)
exploration steps                 12           14           7            7
critical reactions                5            10           4            7
L_expl                            54           60           26           27
N_expl                            29           31           11           11
L_red                             30           37           18           21
N_red                             17           20           8            8
δ^B_expl(t_max) / (mol L⁻¹)       –            0.14         –            0.13
Δ^B_expl / (mol L⁻¹)              –            0.15         –            0.13
δ^B_refined(t_max) / (mol L⁻¹)    –            <0.01        –            0.01
Δ^B_refined / (mol L⁻¹)           –            <0.01        –            0.01

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several solutions instead of a single one naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration step for each network. The minimum over all networks amounts to 1.0 × 10⁻¹⁷ mol L⁻¹ s⁻¹ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, $t_{\max}$), we obtain a flux of 3.7 × 10⁻¹³ mol L⁻¹, which is very small compared to $\varepsilon_y$ = 0.01 mol L⁻¹. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.
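The flux quoted above can be checked by multiplying the smallest recorded maximum formation rate by the reaction time. The value of $t_{\max}$ = 36954 s is read from Table 2, and the symbol $G$ is used here only as a label for the resulting flux.

```latex
G = \overrightarrow{g}^{\max}\, t_{\max}
  = 1.0 \times 10^{-17}\ \mathrm{mol\,L^{-1}\,s^{-1}} \times 36954\ \mathrm{s}
  \approx 3.7 \times 10^{-13}\ \mathrm{mol\,L^{-1}}
  \ll \varepsilon_y = 10^{-2}\ \mathrm{mol\,L^{-1}}
```

A rate threshold capable of capturing this species would therefore have to lie more than ten orders of magnitude below the concentration tolerance.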

DFA-based network reduction with a flux threshold of $G_{\min}$ = 1.0 × 10⁻⁵ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument in the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of $\max\{\delta^{B}(t_{\max}), \Delta^{B}\}$ from 0.15 mol L⁻¹ to < 0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all B + 1 = 6 samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
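The mimicked refinement can be illustrated with a minimal sketch. The function names, the array layout, the random numbers, and the assumption that the free energy of a reaction side is the sum of the free energies of its vertices are all introduced here for illustration and are not taken from the KiNetX implementation.

```python
import numpy as np


def refine_critical(vertex_a, edge_a, edge_sides, critical_edges):
    """Average absolute free energies over all samples for critical reactions.

    vertex_a:   array (B + 1, N), absolute free energies of the vertices
    edge_a:     array (B + 1, L), absolute free energies of the edges
    edge_sides: per edge, a pair (lhs vertex indices, rhs vertex indices)
    """
    v = np.array(vertex_a, dtype=float)
    e = np.array(edge_a, dtype=float)
    for l in critical_edges:
        e[:, l] = e[:, l].mean()          # same edge energy in every model
        lhs, rhs = edge_sides[l]
        for n in (*lhs, *rhs):
            v[:, n] = v[:, n].mean()      # same vertex energy in every model
    return v, e


def barriers(v, e, edge_sides, l):
    """Forward and backward activation free energies of edge l for every sample."""
    lhs, rhs = edge_sides[l]
    a_lhs = v[:, list(lhs)].sum(axis=1)   # assumed: side energy = sum over vertices
    a_rhs = v[:, list(rhs)].sum(axis=1)
    return e[:, l] - a_lhs, e[:, l] - a_rhs


rng = np.random.default_rng(0)
edge_sides = [((0, 1), (2,))]                       # hypothetical reaction 0 + 1 -> 2
vertex_a = rng.normal(0.0, 2.0, size=(6, 3))        # B + 1 = 6 samples, 3 vertices
edge_a = 60.0 + rng.normal(0.0, 2.0, size=(6, 1))   # one edge
v, e = refine_critical(vertex_a, edge_a, edge_sides, critical_edges=[0])
forward, backward = barriers(v, e, edge_sides, 0)
print(forward.std(), backward.std())  # both vanish (up to rounding) after refinement
```

Because forward and backward barriers are both derived from the same averaged absolute values, their difference still equals the reaction free energy in every model, so microscopic reversibility is preserved by construction. Averaging the two barriers independently would not guarantee this.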

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an “exact” solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of $\max\{\delta_{pq}(t_{\max}), \Delta_{pq}\}$ comparing the CRN-0 solutions against the “exact” solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the “exact” solutions.
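As a quick check of the two averages just reported, their ratio gives the factor by which the ensemble-based analysis reduces the deviation from the “exact” solution. The symbol $r$ is introduced here only for illustration.

```latex
r = \frac{15.7 \times 10^{-3}\ \mathrm{mol\,L^{-1}}}{1.3 \times 10^{-3}\ \mathrm{mol\,L^{-1}}} \approx 12
```

The deviation thus shrinks by roughly one order of magnitude when the 5 additional sampled models are included.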

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX for a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered 5 additional kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by roughly a factor of 10 (from 15.7 × 10⁻³ mol L⁻¹ to 1.3 × 10⁻³ mol L⁻¹ in the average deviation from the “exact” solution).

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be questionable to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties, such as the percentage of critical (highly noise-inducing) reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps, as they may be strongly dependent on the underlying graph structure and on the distribution of activation free energies and rate constants as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time $t_{\max}$, which we set to the half-life of a unimolecular reaction with a barrier height of 100 kJ mol⁻¹. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (association and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the (x + 1)-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the x-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_{\exp}$. The user can specify two different values for $\mu_{\exp}$, one for unimolecular reactions, $\mu_{\exp,\mathrm{uni}}$, and one for bimolecular reactions, $\mu_{\exp,\mathrm{bi}}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_{\mathrm{new}})$, ranging from zero to one. If $p(V_{\mathrm{new}}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_{\mathrm{new}}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), 2L activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is obtained by sampling from a normal distribution with mean $\mu(A^{-} - A^{+})$ and standard deviation $\sigma(A^{-} - A^{+})$ and adding the sampled value to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained by sampling from a uniform distribution with bounds $\min(A^{\ddagger} - A^{+/-})$ and $\max(A^{\ddagger} - A^{+/-})$ and adding the sampled value to the free energy of the higher-energy side of that edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
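Steps 1 to 3 of the list above can be sketched with the Python standard library. All function names and the numerical values in the usage example are hypothetical and chosen for illustration only; the sketch covers neither the mass-balance bookkeeping nor the uncertainty sampling of step 4.

```python
import math
import random


def count_reactive_states(n_total: int, n_new: int) -> dict:
    """Step 1: number of candidate reactive states for the next layer."""
    n_old = n_total - n_new
    return {
        "unimolecular": n_new,
        "homobimolecular": n_new,
        "heterobimolecular": n_new * (n_new - 1) // 2 + n_new * n_old,
    }


def sample_channel_count(mean: float, rng: random.Random) -> int:
    """Step 2: floor of a draw from an exponential distribution with the given mean."""
    return math.floor(rng.expovariate(1.0 / mean))


def sample_edge_energies(a_lhs, mu, sigma, barrier_min, barrier_max, rng):
    """Step 3: sample right-hand-side and edge free energies for one reaction.

    Returns (a_rhs, a_edge, forward barrier, backward barrier).
    """
    a_rhs = a_lhs + rng.gauss(mu, sigma)
    # The barrier is added on top of the higher-energy side of the edge.
    a_edge = max(a_lhs, a_rhs) + rng.uniform(barrier_min, barrier_max)
    return a_rhs, a_edge, a_edge - a_lhs, a_edge - a_rhs


rng = random.Random(0)
states = count_reactive_states(n_total=10, n_new=4)
n_channels = sample_channel_count(mean=2.0, rng=rng)
a_rhs, a_edge, forward, backward = sample_edge_energies(
    a_lhs=0.0, mu=-25.0, sigma=50.0, barrier_min=0.0, barrier_max=100.0, rng=rng
)
print(states, n_channels, forward, backward)
```

For 10 registered vertices, 4 of which belong to the newest layer, the sketch yields 4 unimolecular, 4 homobimolecular, and 6 + 24 = 30 heterobimolecular candidate states. Because both barriers are measured from the same edge energy, the smaller of the two always lies within the sampled bounds.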

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$k^{+/-}_{l} = \frac{k_{\mathrm{B}} T}{h} \exp\left(-\frac{A^{\ddagger}_{l} - A^{+/-}_{l}}{RT}\right) \tag{29}$$

where $h$, $k_{\mathrm{B}}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
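A minimal sketch of Eq. (29) for a unimolecular step is given below. The function name is hypothetical, and the physical constants are the exact SI values, which may differ slightly from the ones used by the authors.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant in J K^-1
H = 6.62607015e-34    # Planck constant in J s
R = 8.314462618       # gas constant in J mol^-1 K^-1


def eyring_rate_constant(activation_free_energy_kj_mol: float, temperature_k: float) -> float:
    """Unimolecular rate constant in s^-1 according to Eq. (29)."""
    exponent = -activation_free_energy_kj_mol * 1.0e3 / (R * temperature_k)
    return (K_B * temperature_k / H) * math.exp(exponent)


# Barrier of 100 kJ/mol at 298.15 K, the upper bound of the barrier range used in this work.
k = eyring_rate_constant(100.0, 298.15)
half_life = math.log(2.0) / k
print(f"k = {k:.3e} s^-1, half-life = {half_life:.0f} s")
```

With these constants, the half-life comes out near 3.7 × 10⁴ s, close to the reaction time $t_{\max}$ = 36954 s listed in Table 2, which is consistent with $t_{\max}$ being set to the half-life of a unimolecular reaction with a 100 kJ mol⁻¹ barrier.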

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$, which fulfills the condition that its largest eigenvalue equals $\sigma^2_{A,\max}$. Defining $\sigma^2_{A,\max}$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\max}$.

1. Generate a random (N × N)-dimensional matrix $P$ with elements that are uniformly sampled from [−0.5, +0.5]. N is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an (N × N)-dimensional matrix $Q = PP^{\top}$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$\Sigma_A = Q\, \frac{\langle\sigma_A\rangle^{2}}{\max_i E_{ii}}$$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^{2}$.
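The three steps above translate directly into NumPy. The function name, the seed, the network size, and the number of samples are illustrative assumptions; the value of 2.09 kJ mol⁻¹ for the average free-energy uncertainty follows Table 2.

```python
import numpy as np


def random_covariance(n_vertices: int, sigma_a: float, rng: np.random.Generator) -> np.ndarray:
    """Random covariance matrix whose largest eigenvalue equals sigma_a**2."""
    p = rng.uniform(-0.5, 0.5, size=(n_vertices, n_vertices))  # step 1
    q = p @ p.T                                                 # step 2: symmetric, PSD
    largest = np.linalg.eigvalsh(q).max()                       # eigenvalues of symmetric Q
    return q * sigma_a**2 / largest                             # step 3: rescaling


rng = np.random.default_rng(seed=1)
cov = random_covariance(n_vertices=10, sigma_a=2.09, rng=rng)

# Draw correlated free-energy shifts for 5 sampled kinetic models (zero mean around nominal values).
shifts = rng.multivariate_normal(mean=np.zeros(10), cov=cov, size=5)
print(np.linalg.eigvalsh(cov).max(), shifts.shape)
```

Since every diagonal element of a positive-semidefinite matrix is bounded by its largest eigenvalue, the standard deviation of each individual vertex free energy cannot exceed the chosen value of `sigma_a` in this construction.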

Note that in a practical setup, we expect the user to provide B + 1 k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                      CRN-X          CRN-0          CRN-B

AutoNetGen
B + 1                                25             1              6
T / K                                298.15         298.15         298.15
μ(A⁻ − A⁺) / (kJ mol⁻¹)              −25            −25            −25
σ(A⁻ − A⁺) / (kJ mol⁻¹)              50             50             50
min(A‡ − A^{+/−}) / (kJ mol⁻¹)       0              0              0
max(A‡ − A^{+/−}) / (kJ mol⁻¹)       100            100            100
⟨σ_A⟩ / (kJ mol⁻¹)                   2.09           2.09           2.09
L_max                                125            100            100
μ_exp(N_uni)                         5              5              5
μ_exp(N_bi)                          2              2              2
p(V_new)                             0.5            0.1            0.1

KiNetX
t_max / s                            36954          36954          36954
ε_y / (mol L⁻¹)                      0.01           0.01           0.01
G_min / (mol L⁻¹)                    1.0 × 10⁻³     1.0 × 10⁻⁵     1.0 × 10⁻⁵
U                                    1000           1000           1000
γ                                    0              1              1
C                                    5              1              5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.


[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package", Technical Report SAND-80-8003, Sandia National Labs, Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics", Technical Report SAND-89-8009, Sandia National Labs, Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics", Technical Report SAND-96-8216, Sandia National Labs, Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI—a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook
• A flux threshold $G_{\min}$ above which a chemical species will be considered kinetically relevant.

• The number of log-distributed time points, $U + 1$, between $t = 0$ and $t = t_{\max}$ at which species concentrations will be evaluated based on cubic spline interpolation.

• A confidence level $\gamma$, important for assessing network properties derived from an ensemble of kinetic models.

• The number of Morris samples, $C$, considered in our global sensitivity analysis.

• Absolute and relative concentration error tolerances considered during numerical integration of ODE systems.

Every execution of KiNetX is based on one network with a fixed number of vertices and edges. Hence, for steering the exploration of reaction networks, KiNetX needs to be executed repeatedly. We designed KiNetX to suggest the next exploration step by estimating the kinetic relevance of every species based on the maximum rate of formation (Section 2.5). We will make the KiNetX program available through our webpage (scine.ethz.ch) in 2019.

Currently, KiNetX is limited to the analysis of homogeneous and isothermal reactive chemical systems in dilute solution, which constitute a major category of chemical systems found in nature and examined in chemical research. One prominent subcategory thereof that attracts much attention in fundamental and industrial research is homogeneous catalysis, a field that is rather underrepresented in the kinetic modeling literature.[58]

For the study of dilute systems, collisions involving three or more reactive species can be considered negligible from a statistical point of view. For this reason, each element of the forward and backward reaction rate vectors, $\mathbf{f}^{+}$ and $\mathbf{f}^{-}$, is of the form

$$f^{\pm}_{l} = k^{\pm}_{l} \, y_{n^{\pm}_{l1}} \, y_{n^{\pm}_{l2}} \tag{11}$$

where the vertex index $n$ is determined by the edge index ($l$), the direction (+ or −), and the (arbitrary) position in the rate equation ($i = 1, 2$). In case of a unimolecular reaction, we simply set one of the two vertex indices to $n^{\pm}_{li} = 0$, denoting a hypothetical null-species with a defined constant value of $y_0 = 1$, independent of the unit of measurement.
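The null-species convention of Eq. (11) can be illustrated with a minimal Python sketch. The function name `reaction_rates` and the data layout (index pairs per reaction side, concentrations as a plain list) are assumptions made for illustration and are not taken from KiNetX.

```python
def reaction_rates(k_fwd, k_bwd, lhs, rhs, y):
    """Forward and backward rates of every reaction pair, following Eq. (11).

    lhs[l] and rhs[l] are pairs of vertex indices (n_l1, n_l2); index 0 is the
    hypothetical null species with y_0 = 1, used for unimolecular steps.
    y holds the concentrations of the real species 1..N.
    """
    y_ext = [1.0] + list(y)  # prepend the null species so that y_ext[0] == 1
    f_fwd = [k * y_ext[i] * y_ext[j] for k, (i, j) in zip(k_fwd, lhs)]
    f_bwd = [k * y_ext[i] * y_ext[j] for k, (i, j) in zip(k_bwd, rhs)]
    return f_fwd, f_bwd


# Illustrative network: species 1 + species 2 <=> species 3
f_fwd, f_bwd = reaction_rates(
    k_fwd=[2.0], k_bwd=[0.5],
    lhs=[(1, 2)],   # bimolecular forward step
    rhs=[(3, 0)],   # unimolecular backward step, padded with the null species
    y=[0.1, 0.2, 0.4],
)
```

Padding with the null species lets unimolecular and bimolecular steps share one rate expression, so no branching on molecularity is needed.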

2.4 The KiNetX Workflow

2.4.1 Uncertainty Propagation

The first step of our KiNetX workflow is the numerical integration of an ensemble of $B + 1$ kinetic models, each of them representing a unique set of rate constants. Clearly, the user may also choose $B = 0$ (leading to the usual kinetic modeling setup comprising a single numerical integration), but KiNetX is actually designed to analyze an ensemble of kinetic models.

To assess the magnitude of noise in the $B$ solutions (y-uncertainty) resulting from the ensemble of sets of rate constants (k-uncertainty), we introduce two measures,

$$\delta_B(t_u) = \frac{1}{2B} \sum_{b=1}^{B} \sum_{n=1}^{N} \left| y_{nb}(t_u) - \langle y_n(t_u) \rangle \right| \tag{12}$$

where

$$\langle y_n(t_u) \rangle = \frac{1}{B} \sum_{b=1}^{B} y_{nb}(t_u) \tag{13}$$

is the mean value of concentrations at time $t = t_u$, and

$$\Delta_B = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_B(t_{u-1/2}) \cdot (t_u - t_{u-1}) \tag{14}$$

The factor $t_{\max}$ in the denominator of Eq. (14) equals the sum of time differences, given that $t_0 = 0$,

$$t_{\max} = t_U = t_U - t_0 = \sum_{u=1}^{U} (t_u - t_{u-1}) \tag{15}$$

Furthermore, we define

$$t_{u-1/2} = \frac{1}{2} (t_{u-1} + t_u) \tag{16}$$

According to Eq. (12), $\delta_B(t_u)$ represents the ensemble-averaged y-uncertainty summed over all species at time $t = t_u$. Here, we focus on $\delta_B(t_{\max})$, as it measures the variability in the product distribution, a network property of particular interest for synthetic chemists. In contrast, $\Delta_B$ represents a time-averaged y-uncertainty, i.e., the overall variability of species concentrations between $t_0 = 0$ and $t_U = t_{\max}$. The combination of both measures may be valuable for determining the underlying reaction mechanism. For instance, if $\delta_B(t_{\max})$ is close to zero but $\Delta_B$ is rather large, it is likely that we are faced with one or both of the following two scenarios: (i) the sequence of elementary steps is identical or very similar for some k-vectors, but the uncertainty of the activation barrier associated with the rate-determining step is significant; (ii) different k-vectors suggest different routes to the same metastable sink of the reaction network. Furthermore, if both $\delta_B(t_{\max})$ and $\Delta_B$ are below a user-defined concentration error $\varepsilon_y$, we can safely neglect the ensemble of kinetic models and base all further kinetic analysis on the nominal sample, represented by $\mathbf{k}_0$, only.
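As a rough illustration of Eqs. (12) to (16), the two noise measures can be written in a few lines of Python. The function names and the nested-list layout of the concentrations are assumptions made for this sketch, and the midpoint concentrations are taken as given input rather than interpolated.

```python
def delta_B(y):
    """Eq. (12): y[b][n] is the concentration of species n in ensemble member b
    at one fixed time point."""
    B, N = len(y), len(y[0])
    mean = [sum(y[b][n] for b in range(B)) / B for n in range(N)]  # Eq. (13)
    return sum(abs(y[b][n] - mean[n]) for b in range(B) for n in range(N)) / (2 * B)


def Delta_B(t, y_mid):
    """Eq. (14): t = [t_0, ..., t_U] with t_0 = 0; y_mid[u - 1] holds the
    ensemble concentrations y[b][n] at the midpoint t_{u-1/2} of Eq. (16)."""
    total = sum(delta_B(y_mid[u - 1]) * (t[u] - t[u - 1]) for u in range(1, len(t)))
    return total / t[-1]  # t_max = t_U, Eq. (15)


# Two ensemble members, two species: the members disagree early and agree late.
early = [[1.0, 0.0], [0.0, 1.0]]
late = [[0.5, 0.5], [0.5, 0.5]]
d_early = delta_B(early)
D = Delta_B([0.0, 1.0, 3.0], [early, late])
```

In this toy case the final-time measure vanishes while the time-averaged one does not, which mirrors the situation described above in which the product distribution is certain but the route to it is not.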

When applying Eqs. (12) and (14), it is important that the number and identity of the time points $t_u$ are the same for all solutions. However, it is very unlikely that the time points obtained from an ODE solver are identical for two different k-vectors. To resolve this, KiNetX calculates a log-distributed vector of reference time points $(t_0, t_1, \ldots, t_U)^{\top}$ with $t_0 = 0$ and $t_U = t_{\max}$, which is a function of the user-defined parameters $t_{\max}$ and $U$. To obtain species concentrations at the same time points for other k-vectors, KiNetX interpolates the corresponding solutions through cubic spline smoothing. Note that spline smoothing is only reasonable if data noise is negligible, a property which one can assess easily in the case of one control variable (here, time). To evaluate the reliability of the cubic spline interpolations, we compared them against results from Gaussian process regression.[59] Under suitable assumptions, Gaussian process regression yields an optimal trade-off between fitting and smoothness (cubic splines only ensure the latter property). Hence, it can be employed to predict concentrations at previously unconsidered values in the time domain (here, the domain between $t_0 = 0$ and $t_U = t_{\max}$). Furthermore, as Gaussian process regression is a strictly Bayesian method, one may obtain reliable uncertainty estimates for each prediction. However, as this regression method scales cubically with the number of data points, it is not suitable for repetitive applications (usually, hundreds of regressions would be necessary). In all cases studied, we found that both the mean deviation between the two interpolation methods (cubic spline smoothing versus Gaussian process regression) and the mean predictive variance obtained from Gaussian process regression are negligibly small compared to the mean variance of the concentrations themselves (by a factor of $<10^{-6}$). Hence, data noise is indeed negligible, and cubic splines work well for interpolating concentration profiles.
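The idea of resampling every solution on one common reference grid can be sketched as follows. Two points are assumptions of this sketch and not statements about KiNetX: the grid is taken to be evenly spaced in $\log(1 + t)$ so that it can start at $t_0 = 0$, and piecewise-linear interpolation is used as a simple stand-in for the cubic spline step.

```python
import bisect


def reference_times(t_max, U):
    """U + 1 points from t_0 = 0 to t_U = t_max, evenly spaced in log(1 + t).
    This particular form of a log-distributed grid is an assumption."""
    return [(1.0 + t_max) ** (u / U) - 1.0 for u in range(U + 1)]


def resample(t_solver, y_solver, t_ref):
    """Piecewise-linear stand-in for the cubic spline interpolation.

    t_solver must be strictly increasing; y_solver[j] is the concentration of
    one species at t_solver[j]. Returns the values at the reference times.
    """
    out = []
    for t in t_ref:
        # index of the right end of the bracketing interval, clamped to range
        j = min(max(bisect.bisect_right(t_solver, t), 1), len(t_solver) - 1)
        t_lo, t_hi = t_solver[j - 1], t_solver[j]
        w = (t - t_lo) / (t_hi - t_lo)
        out.append((1.0 - w) * y_solver[j - 1] + w * y_solver[j])
    return out


grid = reference_times(99.0, 2)
values = resample([0.0, 10.0, 100.0], [0.0, 1.0, 10.0], [0.0, 5.0, 55.0])
```

Once all ensemble members are evaluated on the same grid, the sums in Eqs. (12) and (14) can be taken term by term.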

2.4.2 Network Reduction

We introduce a hierarchy of two reduction algorithms with increasing sophistication (in terms of both rigor and required computing resources), which identify and eliminate kinetically irrelevant species and elementary reactions. Initially, we perform a detailed flux analysis, followed by a local barrier analysis if the former analysis suggests a reduced model resulting in a concentration error that exceeds the user-defined threshold $\varepsilon_y$. Both algorithms reflect our interpretation of established kinetic modeling concepts.[34]

Detailed Flux Analysis (DFA). To keep track of the net concentration that has passed an edge between $t = 0$ and $t = t_{\max}$, we integrate the absolute values of $f_l(t)$ over that time interval,

$$F_l = \int_{t=0}^{t=t_{\max}} |f_l(t)| \, \mathrm{d}t \tag{17}$$

which we define as the edge flux corresponding to the $l$-th reaction pair. The vector of edge fluxes, $\mathbf{F}$, is the basis for determining the vertex flux corresponding to the $n$-th species,

$$G_n = \left( \mathbf{s}^{+}_{n} + \mathbf{s}^{-}_{n} \right) \mathbf{F} \tag{18}$$

where $\mathbf{s}^{+}_{n}$ and $\mathbf{s}^{-}_{n}$ represent the $n$-th row of $\mathbf{S}^{+}$ and $\mathbf{S}^{-}$, respectively. Our DFA implementation approximates the edge flux according to

$$F_l \approx F^{\mathrm{DFA}}_l = \sum_{u=1}^{U} |f_l(t_{u-1/2})| \cdot (t_u - t_{u-1}) \tag{19}$$

Analogous to the exact expression, the approximate vertex flux reads

$$G^{\mathrm{DFA}}_n = \left( \mathbf{s}^{+}_{n} + \mathbf{s}^{-}_{n} \right) \mathbf{F}^{\mathrm{DFA}} \tag{20}$$

If $G^{\mathrm{DFA}}_n < G_{\min}$, where $G_{\min}$ is a user-defined threshold, the $n$-th vertex will be removed from the kinetic model, except for the case where the $n$-th vertex represents a reactant. All edges that were connected to at least one of the removed vertices will be removed, too. After this procedure, the number of vertices and edges should have decreased significantly.

Note that in the case of a closed system, $G^{\mathrm{DFA}}_n$ approaches a finite value with increasing time, which renders $G_{\min}$ an intuitive threshold that resembles a concentration error. The assumption behind our DFA is based on this intuitive interpretation: If a reaction channel with a flux smaller than $G_{\min}$ is removed, the redistribution of this minute amount of flux (i.e., concentration) should yield a concentration error that is comparable to $G_{\min}$. Therefore, we have a means at hand that potentially allows us to control the concentration error introduced by a specific value of $G_{\min}$. Clearly, choosing a smaller value of $G_{\min}$ will increase the reliability of the DFA-based solution but also decrease the possible degree of network reduction.
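The flux-based pruning of Eqs. (19) and (20) can be sketched in Python as follows. The function name, the argument layout, and the assumption that $\mathbf{S}^{+}$ and $\mathbf{S}^{-}$ contain nonnegative stoichiometric coefficients of the left-hand and right-hand sides are choices made for this illustration only.

```python
def detailed_flux_analysis(t, f_mid, S_plus, S_minus, G_min, reactants=()):
    """Approximate vertex fluxes and the vertices/edges that survive pruning.

    t        : reference times [t_0, ..., t_U]
    f_mid    : f_mid[u - 1][l] is the rate f_l at the midpoint t_{u-1/2}
    S_plus,
    S_minus  : N x L matrices whose rows are s_n^+ and s_n^- (assumed >= 0)
    reactants: vertex indices that are never removed
    """
    n_edges = len(f_mid[0])
    # Eq. (19): midpoint-rule approximation of the edge flux
    F = [sum(abs(f_mid[u - 1][l]) * (t[u] - t[u - 1]) for u in range(1, len(t)))
         for l in range(n_edges)]
    # Eq. (20): vertex flux from the rows of S+ and S-
    G = [sum((sp + sm) * F_l for sp, sm, F_l in zip(S_plus[n], S_minus[n], F))
         for n in range(len(S_plus))]
    keep = [n for n, G_n in enumerate(G) if G_n >= G_min or n in reactants]
    # an edge survives only if every vertex it touches survives
    edges = [l for l in range(n_edges)
             if all(n in keep for n in range(len(G))
                    if S_plus[n][l] + S_minus[n][l] > 0)]
    return G, keep, edges


# Illustrative chain A <=> B <=> C (vertices 0, 1, 2) with a negligible second step
S_plus = [[1, 0], [0, 1], [0, 0]]
S_minus = [[0, 0], [1, 0], [0, 1]]
f_mid = [[0.5, 1e-6], [0.3, 1e-6]]
G, keep, edges = detailed_flux_analysis([0.0, 1.0, 2.0], f_mid, S_plus, S_minus,
                                        G_min=1e-3, reactants=(0,))
```

Here species C collects a flux far below the threshold, so it and the edge leading to it are dropped while A, B, and their connecting edge remain.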

To determine the accuracy loss caused by a DFA, we introduce the measures

$$\delta_{pq}(t_u) = \frac{1}{2} \sum_{n=1}^{N} \left| y_{np}(t_u) - y_{nq}(t_u) \right| \tag{21}$$

and

$$\Delta_{pq} = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_{pq}(t_{u-1/2}) \cdot (t_u - t_{u-1}) \tag{22}$$

which resemble the control quantities $\delta_B(t_u)$ and $\Delta_B$. Here, however, we compare two specific solutions, $\mathbf{y}_p(t)$ and $\mathbf{y}_q(t)$, with each other, one resulting from the original network and the other resulting from the corresponding sparse network obtained through our DFA. Note that the number of vertices differs for the original and sparse networks. Hence, to render Eq. (21) practical, we define $N$ to be the number of vertices contained in the original network and set all elements in $\mathbf{y}_{\mathrm{sparse}}(t)$ to zero that refer to vertices not contained in the sparse network. If all of the $B + 1$ kinetic models yield both $\delta_{pq}(t_u)$-values (again, we focus on $t_u = t_U = t_{\max}$) and $\Delta_{pq}$-values that are below a user-defined concentration error $\varepsilon_y$, KiNetX considers the DFA-based network reduction reliable. However, if there is at least one kinetic model that yields $\delta_{pq}(t_{\max}) > \varepsilon_y$ or $\Delta_{pq} > \varepsilon_y$, it depends on the user-defined confidence level $\gamma$ whether the DFA-based network reduction is considered reliable or not. We require the confidence level to fulfill the condition $0 \le \gamma \le 1$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeds $\varepsilon_y$, the DFA-based network reduction is considered unreliable. Here, the lowest and highest possible quantiles represent the median and the maximum of both measures over all $B + 1$ samples, respectively.

Local Barrier Analysis (LBA). There are certain cases in which a DFA-based network reduction is prone to yield unreliable results. One example is autocatalysis. Imagine the following model reaction network:

2A ⇌ B (very slow)
B ⇌ C (fast)
C + A ⇌ D (fast)
D + A ⇌ E (fast)
E ⇌ 2C (fast)

Even though the first reaction step is very slow, it is required to initiate all other

reaction steps. Without B, neither C nor D nor E can be formed. Even though B is obviously a very important species for the reaction mechanism, it will only be relevant at the very beginning of the reaction (i.e., until a minute amount of it has been formed). Hence, the vertex flux for B, $G_B$, will be very small, possibly smaller than the user-defined threshold $G_{\min}$, leading to a false elimination of this species and, since both are connected to B, the first two of the five edges.

In this case, $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ will clearly exceed $\varepsilon_y$, and the LBA will start automatically. The idea behind our LBA algorithm is fairly simple: Set the rate constants of the first reaction pair to zero, which is equivalent to an infinitely high activation barrier or the removal of the corresponding edge. Repeat numerical integration and compare the solution to the original one based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Set the rate constants of the first reaction pair to their original values. Repeat the entire procedure for the second, ..., $L$-th reaction pair and for each of the $B + 1$ kinetic models.

After this procedure, every reaction pair is associated with $B + 1$ values for $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the corresponding reaction pair will be considered kinetically irrelevant and removed from the network. All unconnected vertices will be removed, too. Subsequently, the resulting $B + 1$ reduced kinetic models are integrated, and their solutions are compared to the original ones, again based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the LBA-based network reduction will be considered reliable. It is still possible that $\varepsilon_y$ is exceeded, as the kinetic models we consider are generally nonlinear: If the lack of one edge does not alter the solution and the same result is obtained for another edge, it does not imply that the simultaneous lack of both edges leads to the same conclusion. If $\varepsilon_y$ is still exceeded after the LBA-based network reduction, KiNetX will recover the original network and continue its analysis without any reduction.
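The edge-by-edge screening of the LBA, together with the $(1 + \gamma)/2$ quantile criterion, might be sketched as below. Several ingredients are assumptions of this sketch: the callable `error` is a hypothetical placeholder for "integrate both models and return the larger of $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$", the quantile uses a simple nearest-rank rule, and the final re-integration of the jointly reduced network is left out.

```python
import math


def quantile(values, q):
    """Empirical q-quantile by the nearest-rank rule (q = 1 gives the maximum)."""
    ordered = sorted(values)
    rank = max(math.ceil(q * len(ordered)), 1)
    return ordered[rank - 1]


def local_barrier_analysis(ensemble, error, eps_y, gamma):
    """Indices of reaction pairs judged kinetically irrelevant.

    ensemble : B + 1 rate-constant sets, each a list of (k_fwd, k_bwd) per pair
    error    : hypothetical callable (k_original, k_modified) -> concentration
               error between the two corresponding solutions
    """
    q = (1.0 + gamma) / 2.0
    irrelevant = []
    for l in range(len(ensemble[0])):
        errors = []
        for k in ensemble:
            k_mod = list(k)
            k_mod[l] = (0.0, 0.0)  # infinitely high barrier = edge removed
            errors.append(error(k, k_mod))
        if quantile(errors, q) < eps_y:
            irrelevant.append(l)
    return irrelevant


# Toy error model for demonstration only: the removed forward rate constant.
def toy_error(k, k_mod):
    return sum(abs(a[0] - b[0]) for a, b in zip(k, k_mod))


ensemble = [[(1.0, 0.0), (1e-4, 0.0)],
            [(2.0, 0.0), (2e-4, 0.0)],
            [(3.0, 0.0), (5e-2, 0.0)]]
median_rule = local_barrier_analysis(ensemble, toy_error, eps_y=1e-2, gamma=0.0)
maximum_rule = local_barrier_analysis(ensemble, toy_error, eps_y=1e-2, gamma=1.0)
```

With $\gamma = 0$ the second pair is removed because its median error is small, whereas $\gamma = 1$ keeps it because one ensemble member exceeds the tolerance.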

2.4.3 Sensitivity Analysis

In the following, we assume that the concentration profiles of the sparse reaction network reveal pronounced uncertainties such that it is difficult to correctly assign product ratios or to suggest a specific reaction mechanism. To estimate to which rate constants the concentration profiles are most sensitive, a sensitivity analysis is required.[34] For this purpose, the rate constants are perturbed to study how such perturbations affect the time-dependent species concentrations.

In a local sensitivity analysis, the model parameters are perturbed one by one from their nominal values (here, the elements of $\mathbf{k}_0$). While a local sensitivity analysis is straightforward to conduct and computationally feasible (usually, $L$ kinetic simulations are required), it has a significant disadvantage in that it does not take into account the correlation between the model parameters (cf. Section 2.2). Consequently, one may overlook important correlation effects on the uncertainty of concentration profiles. Note that our LBA algorithm resembles an extreme variant of a local sensitivity analysis (cf. Section 2.4.2), where each rate constant is perturbed to its minimal value.

In a global sensitivity analysis, the correlation between the model parameters is taken into account, but the process requires significantly more computational resources than a local sensitivity analysis. Furthermore, the design of a global sensitivity analysis is not as unambiguous as for a local sensitivity analysis, which explains the existence of several approaches that may require $CL$ (Morris screening, where $C$ is an integer usually much smaller than $B$), $B$ (polynomial chaos), or $2BL$ (Sobol's method) numerical integrations.[34]

Morris screening[57] is among the simplest of global sensitivity analyses, as it is not designed to induce a mapping between the model parameters and the model solution. Instead, its purpose is to categorize the model parameters as either important or unimportant, depending on how strongly changes in them affect the model solutions. We are particularly interested in this categorization since we aim for a descriptor that informs us about the quality of rate constants obtained from an efficient (basic) quantum chemical model. If we find that the uncertainty of some species concentrations is too large to derive sensible conclusions about specific network properties, the results of a global sensitivity analysis will support us in identifying the most critical rate constants that require a reevaluation based on a more sophisticated (benchmark) quantum chemical model.

The original version of Morris screening does not take into account the joint probability distribution of the model parameters. Here, we introduce an extended variant of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample $\mathbf{k}_0$ and replaces the elements $k^{+}_{0,1}$ and $k^{-}_{0,1}$ with the corresponding elements of $\mathbf{k}_1$, $k^{+}_{1,1}$ and $k^{-}_{1,1}$. For this new vector of rate constants, $\mathbf{k}_{0,1}$, a numerical integration is carried out. Subsequently, the elements $k^{+}_{0,2}$ and $k^{-}_{0,2}$ of $\mathbf{k}_{0,1}$ are replaced with the elements $k^{+}_{1,2}$ and $k^{-}_{1,2}$ of $\mathbf{k}_1$, and a numerical integration based on the new vector $\mathbf{k}_{0,2}$ is carried out. This procedure is repeated $L$ times until we arrive at $\mathbf{k}_{0,L} = \mathbf{k}_1$, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is repeated now by an element-wise change from $\mathbf{k}_1$ to $\mathbf{k}_2$ and, generally, by an element-wise change from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ until $b = C - 1$ is reached. In the end, we will have generated solutions for another $C(L - 1)$ kinetic models in addition to the $B + 1$ solutions obtained for $\mathbf{K}_B$. We are interested in the $C(L - 1)$ new solutions and the first $C + 1$ of the $B + 1$ solutions obtained previously, amounting to $CL + 1$ solutions relevant for our sensitivity analysis.
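The element-wise walk from one ensemble member to the next can be made concrete with a short Python generator. The name `morris_path` and the representation of each sample as a list of `(k_fwd, k_bwd)` tuples are assumptions made for this sketch.

```python
def morris_path(ensemble, C):
    """Yield (b, l, k_bl): sample b with its first l reaction pairs replaced by
    those of sample b + 1, so that k_{b,L} equals k_{b+1}.

    ensemble[b][l] = (k_fwd, k_bwd) of reaction pair l in sample b;
    at least C + 1 samples are required.
    """
    L = len(ensemble[0])
    for b in range(C):
        for l in range(1, L + 1):
            yield b, l, ensemble[b + 1][:l] + ensemble[b][l:]


# Three samples (k_0, k_1, k_2) of a network with L = 2 reaction pairs
ensemble = [[(1.0, 1.0), (1.0, 1.0)],
            [(2.0, 2.0), (2.0, 2.0)],
            [(3.0, 3.0), (3.0, 3.0)]]
path = list(morris_path(ensemble, C=2))
# vectors with l < L are the ones that still need a numerical integration
new_models = [k for b, l, k in path if l < len(ensemble[0])]
```

The path contains $CL$ vectors, of which $C$ coincide with existing ensemble members, which leaves $C(L - 1)$ new integrations as stated above.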

For each of these solutions, KiNetX determines the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics, where $p$ and $q$ represent two adjacent k-vectors as constructed by our extended version of Morris screening. $CL$ values are obtained for each metric, $C$ thereof for every reaction pair. We define the sensitivity coefficients $z^{\delta}_{l}$ and $z^{\Delta}_{l}$ as

$$z^{\delta}_{l} = \frac{1}{C} \sum_{b=0}^{C-1} \delta_{(b,l),(b,l-1)}(t_{\max}) \tag{23}$$

and

$$z^{\Delta}_{l} = \frac{1}{C} \sum_{b=0}^{C-1} \Delta_{(b,l),(b,l-1)} \tag{24}$$

respectively. The subscript $(b,l)$ refers to sample $\mathbf{k}_{b,l}$, and we define $\mathbf{k}_{b,0} = \mathbf{k}_b$. Note that in Eqs. (23) and (24), we do not divide the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term conditional relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.
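Given the $CL$ metric values, Eqs. (23) and (24) reduce to a column-wise average followed by a sort. The sketch below assumes the metrics are stored as a nested list indexed by sample and reaction pair; the function names are illustrative.

```python
def sensitivity_coefficients(metric, C, L):
    """Eqs. (23)/(24): average over the C Morris samples for each reaction pair.

    metric[b][l - 1] is delta_pq(t_max) or Delta_pq between the adjacent
    samples k_{b,l} and k_{b,l-1}.
    """
    return [sum(metric[b][l] for b in range(C)) / C for l in range(L)]


def rank_pairs(z):
    """Reaction-pair indices (0-based), ordered from most to least sensitive."""
    return sorted(range(len(z)), key=lambda l: z[l], reverse=True)


metric = [[0.1, 0.4, 0.0],
          [0.3, 0.2, 0.0]]
z = sensitivity_coefficients(metric, C=2, L=3)
order = rank_pairs(z)
```

The same two functions serve both coefficients; only the input metric changes.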

Finally, we can set up a ranking of sensitivity coefficients. The larger $z^{\delta}_{l}$ and $z^{\Delta}_{l}$, the larger the effect of changes in $k^{\pm}_{l}$ on the concentration profiles. With this ranking at hand, one can systematically improve on both the accuracy and precision of the concentration profiles to reliably suggest product distributions (based on $z^{\delta}_{l}$) or specific reaction mechanisms (based on $z^{\Delta}_{l}$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be infeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well-known in the reaction engineering community. It has been introduced by Broadbelt, Green, and co-workers[60] and recently revisited by Green and co-workers[61] for their Reaction Mechanism Generator,[13] an exploration software originally designed for the study of gas phase reactions (in particular, combustion),[62] which has recently been extended to understand the kinetics of solution phase chemistry.[63] Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution phase chemistry. As the uncertainty of the thermal rate constant caused by a given error in the activation free energy decreases with increasing temperature in classical rate theories, k-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take k-uncertainty into account.

We start from the reactants (species with nonzero initial concentrations) and attempt to find all direct products, i.e., intermediates that are formed by a single elementary reaction of the reactants. The reactants are considered active, whereas the direct products are considered inactive. Only active species are considered in the exploration procedure, i.e., at this point, all possible reaction steps have been discovered (according to the exploration algorithm employed). To estimate which of the inactive species is potentially the most relevant for the mechanism, we focus on the formation rate of each inactive species (indicated by an asterisk),

$$\vec{g}_{n^*}(t_u) = \mathbf{s}^{+}_{n^*} \mathbf{f}^{-}(t_u) + \mathbf{s}^{-}_{n^*} \mathbf{f}^{+}(t_u) \tag{25}$$

Note that $\mathbf{s}^{+}_{n^*}$ is multiplied with $\mathbf{f}^{-}(t)$ since in a backward (−) reaction, the left-hand-side species (+) are formed. An equivalent argument holds for the multiplication of $\mathbf{s}^{-}_{n^*}$ with $\mathbf{f}^{+}(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels.[60] Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

$$\vec{g}^{\max}_{n^*} = \max \left\{ \vec{g}_{n^*}(t_0), \vec{g}_{n^*}(t_1), \ldots, \vec{g}_{n^*}(t_U) \right\} \tag{26}$$
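Eqs. (25) and (26) amount to two dot products per time point followed by a maximum. The sketch below assumes that the rows of $\mathbf{S}^{+}$ and $\mathbf{S}^{-}$ hold nonnegative stoichiometric coefficients and that the rates are stored per time point; the function name is illustrative.

```python
def max_formation_rate(s_plus_n, s_minus_n, f_fwd, f_bwd):
    """Maximum formation rate of one inactive species n*, Eqs. (25) and (26).

    s_plus_n, s_minus_n      : rows of S+ and S- for this species (length L)
    f_fwd[u][l], f_bwd[u][l] : forward/backward rate of pair l at time t_u
    """
    rates = []
    for fp, fm in zip(f_fwd, f_bwd):
        # left-hand-side species are formed by backward reactions ...
        g = sum(sp * r for sp, r in zip(s_plus_n, fm))
        # ... and right-hand-side species by forward reactions
        g += sum(sm * r for sm, r in zip(s_minus_n, fp))
        rates.append(g)
    return max(rates)


# Species on the right-hand side of pair 0 and on the left-hand side of pair 1
g_max = max_formation_rate(
    s_plus_n=[0, 1], s_minus_n=[1, 0],
    f_fwd=[[0.5, 0.0], [0.2, 0.0]],
    f_bwd=[[0.0, 0.1], [0.0, 0.3]],
)
```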

The species with the highest value of $\vec{g}^{\max}_{n^*}$ will be promoted to active. Note that if an ensemble of k-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

$$\eta\left( \vec{g}^{\max}_{n^*} \right) = \sqrt{ \left\langle \vec{g}^{\max}_{n^*} \right\rangle^{2} + \frac{1}{B - 1} \sum_{b=1}^{B} \left( \vec{g}^{\max}_{n^* b} - \left\langle \vec{g}^{\max}_{n^*} \right\rangle \right)^{2} } \tag{27}$$

that takes into account both the ensemble average of maximum formation rates,

$$\left\langle \vec{g}^{\max}_{n^*} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \vec{g}^{\max}_{n^* b} \tag{28}$$

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
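The ranking score of Eq. (27) combines the squared ensemble mean with the sample variance. A minimal Python sketch follows; the function name, the dictionary of candidate species, and their values are purely illustrative.

```python
import math


def eta(g_max):
    """Eq. (27): g_max[b] is the maximum formation rate of one inactive species
    in ensemble member b (at least two members are required)."""
    B = len(g_max)
    mean = sum(g_max) / B                                 # Eq. (28)
    var = sum((g - mean) ** 2 for g in g_max) / (B - 1)   # sample variance
    return math.sqrt(mean ** 2 + var)


# Two hypothetical inactive species with equal mean but different spread
candidates = {"X": [2.0, 2.0], "Y": [1.0, 3.0]}
scores = {name: eta(values) for name, values in candidates.items()}
next_active = max(scores, key=scores.get)
```

Both candidates share the mean value 2, but species Y receives the higher score because of its larger variance, in line with the preference for the high-variance case described above.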

In the original algorithm by Susnow et al.,[60] which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\vec{g}^{\max}_{n^*}$ that are below a user-defined rate threshold. Grenda et al.[64] discussed an important limitation of the rate threshold: In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between $t = 0$ and $t = t_{\max}$) are generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

• The number of samples $B$ to be drawn from $\Sigma_A$.

• A thermostat temperature $T$ for the calculation of rate constants.

• The average free-energy increase/decrease $\mu(A^{-} - A^{+})$ for a new intermediate state (−) formed by reaction of an intermediate state already present in the network (+).

• The standard deviation $\sigma(A^{-} - A^{+})$ of the free-energy difference between a left-hand-side intermediate state (+) and its corresponding right-hand-side intermediate state (−).

• A minimum activation free energy $\min(A^{\ddagger} - A^{+/-})$ with respect to the higher-energy intermediate state.

• A maximum activation free energy $\max(A^{\ddagger} - A^{+/-})$ with respect to the higher-energy intermediate state.

• The average free-energy uncertainty $\langle\sigma_A\rangle$ (required for generating an ensemble of kinetic models).

• The maximum number of edges $L_{\max}$ to be generated.
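To make the role of the four free-energy inputs concrete, the sketch below draws the energies for a single edge in the way the Appendix describes it: a normally distributed shift for the right-hand side and a uniformly distributed barrier on top of the higher-energy side. The function name, the argument names, and the numerical values in the usage lines are illustrative assumptions and do not represent AutoNetGen's interface.

```python
import random


def sample_edge(a_lhs, mu, sigma, barrier_min, barrier_max, rng):
    """Sample absolute free energies for one edge (hypothetical helper).

    a_lhs: absolute free energy of the left-hand side (+)
    mu, sigma: mean and standard deviation of A(-) - A(+)
    barrier_min, barrier_max: bounds of the barrier relative to the
        higher-energy side
    """
    a_rhs = a_lhs + rng.gauss(mu, sigma)
    a_ts = max(a_lhs, a_rhs) + rng.uniform(barrier_min, barrier_max)
    forward = a_ts - a_lhs    # activation free energy, left to right
    backward = a_ts - a_rhs   # activation free energy, right to left
    return a_rhs, a_ts, forward, backward


rng = random.Random(0)  # fixed seed so the sketch is reproducible
a_rhs, a_ts, forward, backward = sample_edge(
    a_lhs=0.0, mu=-25.0, sigma=50.0,
    barrier_min=0.0, barrier_max=100.0, rng=rng,
)
```

Because both barriers are derived from the same transition-state energy, their difference always equals the reaction free energy, and the smaller of the two lies within the requested bounds.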

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of $N = 103$ vertices and $L = 118$ edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L⁻¹ and 0.50 mol L⁻¹, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of $B + 1 = 25$ k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol⁻¹. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol⁻¹. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature $T = 298.15$ K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L⁻¹ during the course of the reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at $t = t_{\max}$. Also, the potential main product 33 reveals final concentrations ranging between about 0.3 mol L⁻¹ and 0.9 mol L⁻¹, the observable value of which will in turn affect the concentrations of the potential side products 29, 32, 34, 42, and 68.
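A threshold-based selection of dominant species might look as follows. The sketch assumes that concentrations are stored as `conc[b][n][u]` (model b, species n, time point u) and that a species counts as dominant if it exceeds the threshold in at least one ensemble member; both the data layout and this reading of the criterion are assumptions.

```python
def dominant_species(conc, threshold=0.01):
    """Return sorted indices of species whose concentration exceeds
    `threshold` (mol/L) at any time point in any ensemble member."""
    dominant = set()
    for model in conc:                      # loop over kinetic models b
        for n, series in enumerate(model):  # loop over species n
            if max(series) > threshold:
                dominant.add(n)
    return sorted(dominant)


# Two models, three species, three time points (made-up numbers).
conc = [
    [[1.00, 0.60, 0.30], [0.000, 0.004, 0.008], [0.0, 0.2, 0.35]],
    [[1.00, 0.80, 0.55], [0.000, 0.009, 0.020], [0.0, 0.1, 0.20]],
]
selected = dominant_species(conc)
```

Here, the second species is selected only because of the second model, which illustrates how the ensemble can enlarge the set of species to be plotted.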

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of $\varepsilon_y$ = 0.01 mol L⁻¹, we find that a DFA-based network reduction with $G_{\min}$ = 1.0 × 10⁻³ mol L⁻¹ is reliable with respect to both $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Here, we chose the minimum confidence level of $\gamma = 0$ for the sake of clarity. The smaller $\gamma$, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose $G_{\min}$ to be one order of magnitude smaller than $\varepsilon_y$. The reason is that $G_{\min}$ is assessed on a species-by-species basis, whereas $\varepsilon_y$ is compared against the quantities $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$, which measure the concentration error summed over all species. One may resolve this situation by dividing $G_{\min}$ by $N$, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which consists of only 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA analysis varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For $C = 5$ Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeding $\varepsilon_y$ with respect to the quantile specified by $\gamma$. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L⁻¹. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L⁻¹, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval $t_u - t_{u-1}$, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42, the other dominant side product, via 30 and 14. In summary, the dominant reaction pathways read

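$$
\begin{aligned}
2 + 2 &\rightarrow 4 \rightarrow 9 + 10 \\
1 + 9 &\rightarrow 6 + 13 \\
13 &\rightarrow 33 + 33 \\
1 + 6 &\rightarrow 19 \\
1 + 19 &\rightarrow 33 + 68 \\
10 &\rightarrow 30 \rightarrow 14 \rightarrow 42
\end{aligned}
$$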

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction problematic at that point. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than $\varepsilon_y$, as measured by $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.
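The percentages quoted for CRN-X follow from counting network elements (vertices plus edges). The short check below reproduces them from the numbers given in the text; the helper name is chosen for this sketch only.

```python
def reduction(n_vertices, n_edges, n_full=103, l_full=118):
    """Fraction of network elements removed relative to the full CRN-X."""
    return 1.0 - (n_vertices + n_edges) / (n_full + l_full)


guided = reduction(63, 68)   # KiNetX-guided exploration
sparse = reduction(18, 21)   # DFA-based reduction of the full network

print(f"guided exploration: {guided:.1%} removed")
print(f"DFA reduction:      {sparse:.1%} removed")
```

The two results are roughly 41% and 82%, consistent with the rounded values of about 40% and 80% stated above.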

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, too little data is available to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol⁻¹ with a standard deviation of 0.4 kcal mol⁻¹. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $k_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

Table 1: Mean values (μ) and standard deviations (σ) of properties ζ obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

| Property ζ | μ(ζ_CRN-0) | μ(ζ_CRN-B) | σ(ζ_CRN-0) | σ(ζ_CRN-B) |
|---|---|---|---|---|
| exploration steps | 12 | 14 | 7 | 7 |
| critical reactions | 5 | 10 | 4 | 7 |
| $L_{\mathrm{expl}}$ | 54 | 60 | 26 | 27 |
| $N_{\mathrm{expl}}$ | 29 | 31 | 11 | 11 |
| $L_{\mathrm{red}}$ | 30 | 37 | 18 | 21 |
| $N_{\mathrm{red}}$ | 17 | 20 | 8 | 8 |
| $\delta^{B}_{\mathrm{expl}}(t_{\max})$ / mol L⁻¹ | – | 0.14 | – | 0.13 |
| $\Delta^{B}_{\mathrm{expl}}$ / mol L⁻¹ | – | 0.15 | – | 0.13 |
| $\delta^{B}_{\mathrm{refined}}(t_{\max})$ / mol L⁻¹ | – | <0.01 | – | 0.01 |
| $\Delta^{B}_{\mathrm{refined}}$ / mol L⁻¹ | – | <0.01 | – | 0.01 |

In the case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several solutions instead of a single one naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to 1.0 × 10⁻¹⁷ mol L⁻¹ s⁻¹ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, $t_{\max}$), we obtain a flux of 3.7 × 10⁻¹³ mol L⁻¹, which is very small compared to $\varepsilon_y$ = 0.01 mol L⁻¹. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.
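The flux estimate in this paragraph is a one-line calculation: the smallest recorded maximum formation rate multiplied by the full reaction time. The sketch below repeats it with the values from the text and Table 2; the variable names are chosen here for illustration.

```python
rate_min = 1.0e-17   # mol L^-1 s^-1, smallest maximum formation rate (CRN-B)
t_max = 36954.0      # s, reaction time from Table 2
eps_y = 0.01         # mol L^-1, concentration error tolerance

flux_upper_bound = rate_min * t_max   # mol L^-1 accumulated over [0, t_max]
ratio = eps_y / flux_upper_bound      # how far below the tolerance it sits

print(f"flux = {flux_upper_bound:.1e} mol/L, eps_y / flux = {ratio:.1e}")
```

Even when the rate is sustained over the whole reaction time, the accumulated flux stays more than ten orders of magnitude below the tolerance, which is why a rate threshold tuned to catch such a species would hardly prune anything.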

DFA-based network reduction with a flux threshold of $G_{\min}$ = 1.0 × 10⁻⁵ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument in the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of $\max\{\delta^{B}(t_{\max}), \Delta^{B}\}$ from 0.15 mol L⁻¹ to <0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all $B + 1 = 6$ samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
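The mimicked refinement can be illustrated with a small sketch. Each sample stores absolute free energies for the two sides of a reaction and for its transition state, and barriers are always derived from these. The data layout, the names, and the numbers are assumptions for illustration only.

```python
def refine(samples, keys=("lhs", "ts", "rhs")):
    """Replace the absolute free energies of one reaction by their ensemble
    mean in every sample (mimics refinement of a critical reaction)."""
    mean = {k: sum(s[k] for s in samples) / len(samples) for k in keys}
    return [dict(s, **mean) for s in samples]


def barriers(sample):
    """Forward and backward activation free energies from absolute values."""
    return sample["ts"] - sample["lhs"], sample["ts"] - sample["rhs"]


# Three sampled models of one reaction (kJ/mol, made-up numbers).
samples = [
    {"lhs": 0.0, "ts": 62.0, "rhs": -20.0},
    {"lhs": 3.0, "ts": 58.0, "rhs": -26.0},
    {"lhs": -3.0, "ts": 66.0, "rhs": -23.0},
]
refined = refine(samples)

for s in refined:
    fwd, bwd = barriers(s)
    # Forward minus backward barrier equals the reaction free energy,
    # so the equilibrium constant stays consistent with the kinetics.
    assert abs((fwd - bwd) - (s["rhs"] - s["lhs"])) < 1e-9
```

In a network, vertices are shared between reactions, so storing one absolute value per vertex keeps every thermodynamic cycle closed, whereas adjusting barriers reaction by reaction offers no such guarantee.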

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained by taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of $\max\{\delta_{pq}(t_{\max}), \Delta_{pq}\}$ comparing the CRN-0 solutions against the "exact" solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm that constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the gain in reliability obtained by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered 5 additional kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be questionable to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, i.e., highly noise-inducing, reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may be strongly dependent on the underlying graph structure and on the distribution of activation free energies and rate constants as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite the scarcity of data. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time $t_{\max}$, which we set to the half-life of a unimolecular reaction with a barrier height of 100 kJ mol⁻¹. This way, we prevent the network reduction procedure from merely deleting reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (association and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the (x + 1)-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the x-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_{\exp}$. The user can specify two different values for $\mu_{\exp}$, one for unimolecular reactions, $\mu_{\exp,\mathrm{uni}}$, and one for bimolecular reactions, $\mu_{\exp,\mathrm{bi}}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_{\mathrm{new}})$, ranging from zero to one. If $p(V_{\mathrm{new}}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_{\mathrm{new}}) = 1$. AutoNetGen ensures that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), $2L$ activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is obtained by sampling from a normal distribution with mean $\mu(A^{-} - A^{+})$ and standard deviation $\sigma(A^{-} - A^{+})$ and adding the sampled value to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained by sampling from a uniform distribution with bounds $\min(A^{\ddagger} - A^{+/-})$ and $\max(A^{\ddagger} - A^{+/-})$ and adding the sampled value to the free energy of the higher-energy side of that edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
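Steps 1 and 2 can be sketched as follows: count the candidate reactive states for a new layer, then draw the number of reaction channels for each of them. The function names are hypothetical, and note that `random.expovariate` takes the rate 1/mean rather than the mean itself.

```python
import math
import random


def count_reactive_states(n_total, n_new):
    """Candidate states when n_new of the n_total species form the newest layer.

    Returns (unimolecular, homobimolecular, heterobimolecular) counts.
    """
    unimolecular = n_new
    homobimolecular = n_new
    hetero_new_new = n_new * (n_new - 1) // 2      # new species among each other
    hetero_new_old = n_new * (n_total - n_new)     # new species with old species
    return unimolecular, homobimolecular, hetero_new_new + hetero_new_old


def sample_channel_count(mean, rng):
    """Floor of an exponentially distributed draw with the given mean."""
    return math.floor(rng.expovariate(1.0 / mean))


rng = random.Random(42)  # fixed seed for reproducibility
uni, homo, hetero = count_reactive_states(n_total=5, n_new=2)
channels = [sample_channel_count(2.0, rng) for _ in range(hetero)]
```

With five species in total and two of them new, this gives 2 unimolecular, 2 homobimolecular, and 1 + 6 = 7 heterobimolecular candidate states. Because of the floor operation, many candidate states receive zero channels, which keeps the combinatorial growth of the network in check.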

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$
k^{+/-}_{l} = \frac{k_{\mathrm{B}} T}{h} \exp\left( -\frac{A^{\ddagger}_{l} - A^{+/-}_{l}}{RT} \right), \tag{29}
$$

where $h$, $k_{\mathrm{B}}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
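As a numerical illustration of Eq. (29), the sketch below evaluates the Eyring rate constant for a barrier of 100 kJ mol⁻¹ at 298.15 K and the corresponding unimolecular half-life. The function name is chosen for this example; the constants are the exact SI values, so the result may differ in the last digits from a value obtained with other constants or unit conversions.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J K^-1
H = 6.62607015e-34      # Planck constant, J s
R = 8.314462618         # gas constant, J mol^-1 K^-1


def eyring_rate(barrier_kj_mol, temperature):
    """First-order rate constant (s^-1) from an activation free energy."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_kj_mol * 1000.0 / (R * temperature))


k = eyring_rate(100.0, 298.15)
half_life = math.log(2.0) / k   # s, for a unimolecular step

print(f"k = {k:.3e} s^-1, half-life = {half_life:.0f} s")
```

The half-life comes out at roughly 3.7 × 10⁴ s, which is of the same magnitude as the value of $t_{\max}$ listed in Table 2.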

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$ that fulfills the condition that its largest eigenvalue equals $\sigma^{2}_{A,\max}$. Defining $\sigma^{2}_{A,\max}$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\max}$.

1. Generate a random (N × N)-dimensional matrix $P$ with elements that are uniformly sampled from [−0.5, +0.5]. $N$ is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an (N × N)-dimensional matrix $Q = P P^{\top}$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$
\Sigma_A = Q \, \frac{\langle\sigma_A\rangle^{2}}{\max\{E_{ii}\}},
$$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^{2}$.
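The three steps translate into a short, dependency-free sketch. It estimates the largest eigenvalue of Q by power iteration, which is a substitution made here to avoid a full eigendecomposition and is an assumption of this example; the function names are hypothetical.

```python
import random


def largest_eigenvalue(matrix, iterations=2000):
    """Power iteration; returns the Rayleigh quotient of the final vector."""
    n = len(matrix)
    v = [1.0 / n ** 0.5] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * w[i] for i in range(n))


def random_covariance(n, sigma, rng, iterations=2000):
    """Random n x n covariance matrix whose largest eigenvalue is ~sigma**2."""
    # Step 1: P with entries drawn uniformly from [-0.5, 0.5].
    p = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    # Step 2: Q = P P^T is symmetric and positive semidefinite.
    q = [[sum(p[i][k] * p[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]
    # Step 3: rescale Q so that its largest eigenvalue becomes sigma**2.
    scale = sigma ** 2 / largest_eigenvalue(q, iterations)
    return [[q[i][j] * scale for j in range(n)] for i in range(n)]


sigma_a = random_covariance(n=4, sigma=0.5, rng=random.Random(1))
```

Since no diagonal element of a positive-semidefinite matrix can exceed its largest eigenvalue, every variance on the diagonal of the result is bounded by the chosen value of sigma squared.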

Note that in a practical setup, we expect the user to provide $B + 1$ k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.

Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

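Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

| Input parameter | CRN-X | CRN-0 | CRN-B |
|---|---|---|---|
| AutoNetGen | | | |
| $B + 1$ | 25 | 1 | 6 |
| $T$ / K | 298.15 | 298.15 | 298.15 |
| $\mu(A^{-} - A^{+})$ / (kJ mol⁻¹) | −25 | −25 | −25 |
| $\sigma(A^{-} - A^{+})$ / (kJ mol⁻¹) | 50 | 50 | 50 |
| $\min(A^{\ddagger} - A^{+/-})$ / (kJ mol⁻¹) | 0 | 0 | 0 |
| $\max(A^{\ddagger} - A^{+/-})$ / (kJ mol⁻¹) | 100 | 100 | 100 |
| $\langle\sigma_A\rangle$ / (kJ mol⁻¹) | 2.09 | 2.09 | 2.09 |
| $L_{\max}$ | 125 | 100 | 100 |
| $\mu_{\exp}(N_{\mathrm{uni}})$ | 5 | 5 | 5 |
| $\mu_{\exp}(N_{\mathrm{bi}})$ | 2 | 2 | 2 |
| $p(V_{\mathrm{new}})$ | 0.5 | 0.1 | 0.1 |
| KiNetX | | | |
| $t_{\max}$ / s | 36954 | 36954 | 36954 |
| $\varepsilon_y$ / (mol L⁻¹) | 0.01 | 0.01 | 0.01 |
| $G_{\min}$ / (mol L⁻¹) | 1.0 × 10⁻³ | 1.0 × 10⁻⁵ | 1.0 × 10⁻⁵ |
| $U$ | 1000 | 1000 | 1000 |
| $\gamma$ | 0 | 1 | 1 |
| $C$ | 5 | 1 | 5 |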

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.

[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C.; et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C.; et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G.; et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S.; et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package", Technical Report SAND–80-8003, Sandia National Labs, Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics", Technical Report SAND-89-8009, Sandia National Labs, Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics", Technical Report SAND-96-8216, Sandia National Labs, Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI - a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

the user also has the option to choose $B = 0$ (leading to the usual kinetic modeling setup comprising a single numerical integration), but KiNetX is actually designed to analyze an ensemble of kinetic models.

To assess the magnitude of noise in the $B$ solutions ($y$-uncertainty) resulting from the ensemble of sets of rate constants ($k$-uncertainty), we introduce two measures,

$$\delta_B(t_u) = \frac{1}{2B} \sum_{b=1}^{B} \sum_{n=1}^{N} \left| y_{nb}(t_u) - \langle y_n(t_u) \rangle \right| , \tag{12}$$

where

$$\langle y_n(t_u) \rangle = \frac{1}{B} \sum_{b=1}^{B} y_{nb}(t_u) \tag{13}$$

is the mean value of concentrations at time $t = t_u$, and

$$\Delta_B = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_B(t_{u-1/2}) \cdot (t_u - t_{u-1}) . \tag{14}$$

The factor $t_{\max}$ in the denominator of Eq. (14) equals the sum of time differences given that $t_0 = 0$,

$$t_{\max} = t_U = t_U - t_0 = \sum_{u=1}^{U} (t_u - t_{u-1}) . \tag{15}$$

Furthermore, we define

$$t_{u-1/2} = \frac{1}{2} (t_{u-1} + t_u) . \tag{16}$$

According to Eq. (12), $\delta_B(t_u)$ represents the ensemble-averaged $y$-uncertainty summed over all species at time $t = t_u$. Here, we focus on $\delta_B(t_{\max})$ as it measures the variability in the product distribution, a network property of particular interest for synthetic chemists. In contrast, $\Delta_B$ represents a time-averaged $y$-uncertainty, i.e., the overall variability of species concentrations between $t_0 = 0$ and $t_U = t_{\max}$. The combination of both measures may be valuable for determining the underlying reaction mechanism. For instance, if $\delta_B(t_{\max})$ is close to zero but $\Delta_B$ is rather large, it is likely that we are faced with one or both of the following scenarios: (i) the sequence of elementary steps is identical or very similar for some $k$-vectors, but the uncertainty of the activation barrier associated with the rate-determining step is significant; (ii) different $k$-vectors suggest different routes to the same metastable sink of the reaction network. Furthermore, if both $\delta_B(t_{\max})$ and $\Delta_B$ are below a user-defined concentration error, $\varepsilon_y$, we can safely neglect the ensemble of kinetic models and base all further kinetic analysis on the nominal sample represented by $\mathbf{k}_0$ only.
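The two measures of Eqs. (12) and (14) can be sketched in a few lines of Python. This is a minimal illustration, not the KiNetX implementation: the function names `delta_b` and `big_delta_b` and the nested-list data layout are assumptions made here, and the values of $\delta_B$ at the interval midpoints $t_{u-1/2}$ are assumed to be supplied by the caller (for example from interpolated concentration profiles).

```python
def delta_b(y_at_t):
    """Eq. (12): ensemble-averaged y-uncertainty at a single time point.

    y_at_t[b][n] is the concentration of species n in ensemble member b
    (hypothetical layout chosen for this sketch).
    """
    B = len(y_at_t)
    N = len(y_at_t[0])
    # Eq. (13): ensemble mean of every species concentration
    mean = [sum(y_at_t[b][n] for b in range(B)) / B for n in range(N)]
    total = sum(abs(y_at_t[b][n] - mean[n]) for b in range(B) for n in range(N))
    return total / (2 * B)


def big_delta_b(t, delta_mid):
    """Eq. (14): time-averaged y-uncertainty.

    t = [t_0, ..., t_U] with t_0 = 0; delta_mid[u - 1] holds delta_B
    evaluated at the midpoint t_{u-1/2} of the u-th interval.
    """
    t_max = t[-1] - t[0]  # Eq. (15)
    weighted = sum(d * (t[u] - t[u - 1]) for u, d in enumerate(delta_mid, start=1))
    return weighted / t_max
```

For two ensemble members that place all material in different species, e.g. `delta_b([[1.0, 0.0], [0.0, 1.0]])`, the measure evaluates to half the total concentration, which is the reason for the factor 1/2 in Eq. (12).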

When applying Eqs. (12) and (14), it is important that the number and identity of the time points $t_u$ are the same for all solutions. However, it is very unlikely that the time points obtained from an ODE solver are identical for two different $k$-vectors. For this purpose, KiNetX calculates a log-distributed vector of reference time points, $(t_0, t_1, \cdots, t_U)^{\top}$ with $t_0 = 0$ and $t_U = t_{\max}$, which is a function of the user-defined parameters $t_{\max}$ and $U$. To obtain species concentrations at the same time points for other $k$-vectors, KiNetX interpolates the corresponding solutions through cubic spline smoothing. Note that spline smoothing is only reasonable if data noise is negligible, a property which one can assess easily in the case of one control variable (here, time). To evaluate the reliability of the cubic spline interpolations, we compared them against results from Gaussian process regression [59]. Under suitable assumptions, Gaussian process regression yields an optimal trade-off between fitting and smoothness (cubic splines only ensure the latter property). Hence, it can be employed to predict concentrations for previously unconsidered values in the time domain (here, the domain between $t_0 = 0$ and $t_U = t_{\max}$). Furthermore, as Gaussian process regression is a strictly Bayesian method, one may obtain reliable uncertainty estimates for each prediction. However, as this regression method scales cubically with the number of data points, it is not suitable for repetitive applications (usually, hundreds of regressions would be necessary). In all cases studied, we found that both the mean deviation between the two interpolation methods (cubic spline smoothing versus Gaussian process regression) and the mean predictive variance obtained from Gaussian process regression are negligibly small compared to the mean variance of the concentrations themselves (by a factor of $< 10^{-6}$). Hence, data noise is indeed negligible, and cubic splines work well for interpolating concentration profiles.

2.4.2 Network Reduction

We introduce a hierarchy of two reduction algorithms with increasing sophistication (in terms of both rigor and required computing resources), which identify and eliminate kinetically irrelevant species and elementary reactions. Initially, we perform a detailed flux analysis, followed by a local barrier analysis if the former analysis suggests a reduced model resulting in a concentration error that exceeds the user-defined threshold $\varepsilon_y$. Both algorithms reflect our interpretation of established kinetic modeling concepts [34].

Detailed Flux Analysis (DFA). To keep track of the net concentration that has passed an edge between $t = 0$ and $t = t_{\max}$, we integrate the absolute values of $f_l(t)$ over that time interval,

$$F_l = \int_{t=0}^{t=t_{\max}} |f_l(t)| \, \mathrm{d}t , \tag{17}$$

which we define as the edge flux corresponding to the $l$-th reaction pair. The vector of edge fluxes, $\mathbf{F}$, is the basis for determining the vertex flux corresponding to the $n$-th species,

$$G_n = \left( \mathbf{s}_n^{+} + \mathbf{s}_n^{-} \right) \mathbf{F} , \tag{18}$$

where $\mathbf{s}_n^{+}$ and $\mathbf{s}_n^{-}$ represent the $n$-th row of $\mathbf{S}^{+}$ and $\mathbf{S}^{-}$, respectively. Our DFA implementation approximates the edge flux according to

$$F_l \approx F_l^{\mathrm{DFA}} = \sum_{u=1}^{U} |f_l(t_{u-1/2})| \cdot (t_u - t_{u-1}) . \tag{19}$$

Analogous to the exact expression, the approximate vertex flux reads

$$G_n^{\mathrm{DFA}} = \left( \mathbf{s}_n^{+} + \mathbf{s}_n^{-} \right) \mathbf{F}^{\mathrm{DFA}} . \tag{20}$$

If $G_n^{\mathrm{DFA}} < G_{\min}$, where $G_{\min}$ is a user-defined threshold, the $n$-th vertex will be removed from the kinetic model, except for the case where the $n$-th vertex represents a reactant. All edges that were connected to at least one of the removed vertices will be removed, too. After this procedure, the number of vertices and edges should have decreased significantly.

Note that in the case of a closed system, $G_n^{\mathrm{DFA}}$ approaches a finite value with increasing time, which renders $G_{\min}$ an intuitive threshold that resembles a concentration error. The assumption behind our DFA is based on this intuitive interpretation: if a reaction channel with a flux smaller than $G_{\min}$ is removed, the redistribution of this minute amount of flux (i.e., concentration) should yield a concentration error that is comparable to $G_{\min}$. Therefore, we have a means at hand that potentially allows us to control the concentration error introduced by a specific value of $G_{\min}$. Clearly, choosing a smaller value of $G_{\min}$ will increase the reliability of the DFA-based solution, but it also decreases the possible degree of network reduction.
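The DFA steps of Eqs. (19) and (20), together with the removal rule, can be illustrated as follows. This is a sketch under assumptions rather than the authors' code: the matrices $\mathbf{S}^{+}$ and $\mathbf{S}^{-}$ are represented as nested lists with one row per species and one column per reaction pair, the net rates at the interval midpoints are assumed to be available, and all function names are hypothetical.

```python
def edge_fluxes(t, f_mid):
    """Eq. (19): f_mid[l][u - 1] is f_l evaluated at the midpoint t_{u-1/2}."""
    return [
        sum(abs(f) * (t[u] - t[u - 1]) for u, f in enumerate(row, start=1))
        for row in f_mid
    ]


def vertex_fluxes(s_plus, s_minus, F):
    """Eq. (20): G_n = (s_n^+ + s_n^-) F for every species n."""
    return [
        sum((sp + sm) * Fl for sp, sm, Fl in zip(row_p, row_m, F))
        for row_p, row_m in zip(s_plus, s_minus)
    ]


def dfa_reduce(s_plus, s_minus, F, g_min, reactants):
    """Return indices of the vertices and edges kept by the flux criterion."""
    G = vertex_fluxes(s_plus, s_minus, F)
    # Reactants are never removed, even if their vertex flux is small.
    removed = {n for n, g in enumerate(G) if g < g_min and n not in reactants}
    kept_vertices = [n for n in range(len(G)) if n not in removed]
    # An edge is dropped as soon as it touches one removed vertex.
    kept_edges = [
        l for l in range(len(F))
        if not any(s_plus[n][l] or s_minus[n][l] for n in removed)
    ]
    return kept_vertices, kept_edges
```

As a usage example, consider a toy network with species A, B, C (indices 0, 1, 2) and two reaction pairs, A/B and A/C, where almost no flux passes the second pair. With a threshold of $10^{-3}$, species C and the second edge are eliminated.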

To determine the accuracy loss caused by a DFA, we introduce the measures

$$\delta_{pq}(t_u) = \frac{1}{2} \sum_{n=1}^{N} \left| y_{np}(t_u) - y_{nq}(t_u) \right| \tag{21}$$

and

$$\Delta_{pq} = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_{pq}(t_{u-1/2}) \cdot (t_u - t_{u-1}) , \tag{22}$$

which resemble the control quantities $\delta_B(t_u)$ and $\Delta_B$. Here, however, we compare two specific solutions, $\mathbf{y}_p(t)$ and $\mathbf{y}_q(t)$, with each other, one resulting from the original network and the other resulting from the corresponding sparse network obtained through our DFA. Note that the number of vertices differs for the original and sparse networks. Hence, to render Eq. (21) practical, we define $N$ to be the number of vertices contained in the original network and set all elements in $\mathbf{y}_{\mathrm{sparse}}(t)$ to zero that refer to vertices not contained in the sparse network. If all of the $B + 1$ kinetic models yield both $\delta_{pq}(t_u)$-values (again, we focus on $t_u = t_U = t_{\max}$) and $\Delta_{pq}$-values that are below a user-defined concentration error, $\varepsilon_y$, KiNetX considers the DFA-based network reduction reliable. However, if there is at least one kinetic model that yields $\delta_{pq}(t_{\max}) > \varepsilon_y$ or $\Delta_{pq} > \varepsilon_y$, it depends on the user-defined confidence level, $\gamma$, whether the DFA-based network reduction is considered reliable or not. We require the confidence level to fulfill the condition $0 \le \gamma \le 1$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeds $\varepsilon_y$, the DFA-based network reduction is considered unreliable. Here, the lowest and highest possible quantiles represent the median and the maximum of both measures over all $B + 1$ samples, respectively.

Local Barrier Analysis (LBA). There are certain cases in which a DFA-based network reduction is prone to yield unreliable results. One example is autocatalysis. Imagine the following model reaction network:

2A ⇌ B (very slow)
B ⇌ C (fast)
C + A ⇌ D (fast)
D + A ⇌ E (fast)
E ⇌ 2C (fast)

Even though the first reaction step is very slow, it is required to initiate all other reaction steps. Without B, neither C nor D nor E can be formed. Even though B is obviously a very important species for the reaction mechanism, it will only be relevant at the very beginning of the reaction (i.e., until a minute amount of it has been formed). Hence, the vertex flux for B, $G_{\mathrm{B}}$, will be very small, possibly smaller than the user-defined threshold $G_{\min}$, leading to a false elimination of this species and of the first two of the five edges, both of which are connected to B.

In this case, $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ will clearly exceed $\varepsilon_y$, and the LBA will start automatically. The idea behind our LBA algorithm is fairly simple: Set the rate constants of the first reaction pair to zero, which is equivalent to an infinitely high activation barrier or the removal of the corresponding edge. Repeat numerical integration and compare the solution to the original one based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Set the rate constants of the first reaction pair to their original values. Repeat the entire procedure for the second to $L$-th reaction pair and for each of the $B + 1$ kinetic models.

After this procedure, every reaction pair is associated with $B + 1$ values for $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the corresponding reaction pair will be considered kinetically irrelevant and removed from the network. All unconnected vertices will be removed, too. Subsequently, the resulting $B + 1$ reduced kinetic models are integrated, and their solutions are compared to the original ones, again based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the LBA-based network reduction will be considered reliable. It is still possible that $\varepsilon_y$ is exceeded, as the kinetic models we consider are generally nonlinear. If the lack of one edge does not alter the solution and the same result is obtained for another edge, it does not imply that the simultaneous lack of both edges leads to the same conclusion. If $\varepsilon_y$ is still exceeded after the LBA-based network reduction, KiNetX will recover the original network and continue its analysis without any reduction.
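The LBA loop and the quantile criterion described above can be sketched as follows. Several assumptions are made for this illustration only: the numerical integration and the evaluation of $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ are hidden behind a hypothetical user-supplied callback `error_of`, each $k$-vector is a list of `(k_plus, k_minus)` tuples, quantiles are computed by linear interpolation, and a reaction pair is flagged only if both measures fall below $\varepsilon_y$ (one possible reading of "and/or").

```python
import math


def quantile(values, q):
    """Empirical quantile (0 <= q <= 1) with linear interpolation."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def lba_irrelevant_pairs(k_ensemble, error_of, eps_y, gamma):
    """Return indices of reaction pairs whose removal keeps the error below eps_y.

    k_ensemble: the B + 1 rate-constant vectors.
    error_of(k_ref, k_mod): hypothetical callback that integrates both models
    and returns the tuple (delta_pq(t_max), Delta_pq).
    """
    q = (1.0 + gamma) / 2.0  # gamma = 0: median, gamma = 1: maximum
    n_pairs = len(k_ensemble[0])
    irrelevant = []
    for l in range(n_pairs):
        deltas, big_deltas = [], []
        for k in k_ensemble:
            k_mod = list(k)
            k_mod[l] = (0.0, 0.0)  # infinitely high barrier for pair l
            d, big_d = error_of(k, k_mod)
            deltas.append(d)
            big_deltas.append(big_d)
        if quantile(deltas, q) < eps_y and quantile(big_deltas, q) < eps_y:
            irrelevant.append(l)
    return irrelevant
```

Because every reaction pair is tested in isolation, the reduced models obtained from the returned indices still have to be integrated and checked jointly, as stated in the text.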

2.4.3 Sensitivity Analysis

In the following, we assume that the concentration profiles of the sparse reaction network reveal pronounced uncertainties such that it is difficult to correctly assign product ratios or to suggest a specific reaction mechanism. To estimate to which rate constants the concentration profiles are most sensitive, a sensitivity analysis is required [34]. For this purpose, the rate constants are perturbed to study how such perturbations affect the time-dependent species concentrations.

In a local sensitivity analysis, the model parameters are perturbed one by one from their nominal values (here, the elements of $\mathbf{k}_0$). While a local sensitivity analysis is straightforward to conduct and computationally feasible (usually, $L$ kinetic simulations are required), it has a significant disadvantage in that it does not take into account the correlation between the model parameters (cf. Section 2.2). Consequently, one may overlook important correlation effects on the uncertainty of concentration profiles. Note that our LBA algorithm resembles an extreme variant of a local sensitivity analysis (cf. Section 2.4.2), where each rate constant is perturbed to its minimal value.

In a global sensitivity analysis, the correlation between the model parameters is taken into account, but the process requires significantly more computational resources than a local sensitivity analysis. Furthermore, the design of a global sensitivity analysis is not as unambiguous as for a local sensitivity analysis, which explains the existence of several approaches that may require $CL$ (Morris screening, where $C$ is an integer usually much smaller than $B$), $B$ (polynomial chaos), or $2BL$ (Sobol's method) numerical integrations [34].

Morris screening [57] is among the simplest of global sensitivity analyses, as it is not designed to induce a mapping between the model parameters and the model solution. Instead, its purpose is to categorize the model parameters as either important or unimportant, depending on how strongly changes in them affect the model solutions. We are particularly interested in this categorization since we aim for a descriptor that informs us about the quality of rate constants obtained from an efficient (basic) quantum chemical model. If we find that the uncertainty of some species concentrations is too large to derive sensible conclusions about specific network properties, the results of a global sensitivity analysis will support us in identifying the most critical rate constants that require a reevaluation based on a more sophisticated (benchmark) quantum chemical model.

The original version of Morris screening does not take into account the joint probability distribution of the model parameters. Here, we introduce an extended variant of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample $\mathbf{k}_0$ and replaces the elements $k_{0,1}^{+}$ and $k_{0,1}^{-}$ with the corresponding elements of $\mathbf{k}_1$, $k_{1,1}^{+}$ and $k_{1,1}^{-}$. For this new vector of rate constants, $\mathbf{k}_{01}$, a numerical integration is carried out. Subsequently, the elements $k_{0,2}^{+}$ and $k_{0,2}^{-}$ of $\mathbf{k}_{01}$ are replaced with the elements $k_{1,2}^{+}$ and $k_{1,2}^{-}$ of $\mathbf{k}_1$, and a numerical integration based on the new vector $\mathbf{k}_{02}$ is carried out. This procedure is repeated $L$ times until we arrive at $\mathbf{k}_{0L} = \mathbf{k}_1$, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is repeated now by an element-wise change from $\mathbf{k}_1$ to $\mathbf{k}_2$ and, generally, by an element-wise change from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ until $b = C - 1$ is reached. In the end, we will have generated solutions for another $C(L - 1)$ kinetic models in addition to the $B + 1$ solutions obtained for $\mathbf{K}_B$. We are interested in the $C(L - 1)$ new solutions and the first $C + 1$ of the $B + 1$ solutions obtained previously, amounting to $CL + 1$ solutions relevant for our sensitivity analysis.

For each of these solutions, KiNetX determines the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics, where $p$ and $q$ represent two adjacent $k$-vectors as constructed by our extended version of Morris screening. $CL$ values are obtained for each metric, $C$ thereof for every reaction pair. We define the sensitivity coefficients $z_l^{\delta}$ and $z_l^{\Delta}$ as

$$z_l^{\delta} = \frac{1}{C} \sum_{b=0}^{C-1} \delta_{bl,\,b(l-1)}(t_{\max}) \tag{23}$$

and

$$z_l^{\Delta} = \frac{1}{C} \sum_{b=0}^{C-1} \Delta_{bl,\,b(l-1)} , \tag{24}$$

respectively. The subscript $bl$ refers to sample $\mathbf{k}_{bl}$, and we define $\mathbf{k}_{b0} = \mathbf{k}_b$. Note that in Eqs. (23) and (24), we do not divide the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term conditional relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.
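The element-wise walk from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ and the averaging of Eqs. (23) and (24) can be written compactly. The sketch below is illustrative only: `metric` is a hypothetical callback that is assumed to integrate the two adjacent kinetic models and return either $\delta_{pq}(t_{\max})$ or $\Delta_{pq}$, and each entry of a $k$-vector stands for the rate constants of one reaction pair.

```python
def morris_sensitivities(k_samples, C, metric):
    """Return [z_1, ..., z_L] following the extended Morris screening scheme.

    k_samples: list of at least C + 1 rate-constant vectors (k_0, k_1, ...).
    metric(k_p, k_q): hypothetical callback comparing the solutions of two
    adjacent k-vectors (delta_pq(t_max) for Eq. (23), Delta_pq for Eq. (24)).
    """
    L = len(k_samples[0])
    z = [0.0] * L
    for b in range(C):
        previous = list(k_samples[b])  # k_{b0} = k_b
        for l in range(L):
            current = list(previous)
            current[l] = k_samples[b + 1][l]  # swap in pair l of k_{b+1}
            z[l] += metric(current, previous) / C
            previous = current  # after l = L - 1 this equals k_{b+1}
    return z
```

Ranking the reaction pairs by the returned coefficients, for instance with `sorted(range(L), key=lambda l: z[l], reverse=True)`, yields the order in which rate constants would be candidates for refinement.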

Finally, we can set up a ranking of sensitivity coefficients. The larger $z_l^{\delta}$ and $z_l^{\Delta}$, the larger the effect of changes in $k_l^{+/-}$ on the concentration profiles. With this ranking at hand, one can systematically improve on both the accuracy and precision of the concentration profiles to reliably suggest product distributions (based on $z_l^{\delta}$) or specific reaction mechanisms (based on $z_l^{\Delta}$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be infeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well-known in the reaction engineering community. It has been introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas phase reactions (in particular, combustion) [62], which has recently been extended to understand the kinetics of solution phase chemistry [63]. Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution phase chemistry. As the sensitivity of the thermal rate constant to errors in the activation free energy is a decreasing function of temperature in classical rate theories, $k$-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take $k$-uncertainty into account.

We start from the reactants (species with nonzero initial concentrations) and attempt to find all direct products, i.e., intermediates that are formed by a single elementary reaction of the reactants. The reactants are considered active, whereas the direct products are considered inactive. Only active species are considered in the exploration procedure, i.e., at this point, all possible reaction steps have been discovered (according to the exploration algorithm employed). To estimate which of the inactive species is potentially the most relevant for the mechanism, we focus on the formation rate of each inactive species (indicated by an asterisk),

$$\overrightarrow{g}_{n^*}(t_u) = \mathbf{s}_{n^*}^{+} \mathbf{f}^{-}(t_u) + \mathbf{s}_{n^*}^{-} \mathbf{f}^{+}(t_u) . \tag{25}$$

Note that $\mathbf{s}_{n^*}^{+}$ is multiplied with $\mathbf{f}^{-}(t)$ since in a backward ($-$) reaction, the left-hand-side species ($+$) are formed. An equivalent argument holds for the multiplication of $\mathbf{s}_{n^*}^{-}$ with $\mathbf{f}^{+}(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

$$\overrightarrow{g}_{n^*}^{\max} = \max \left\{ \overrightarrow{g}_{n^*}(t_0), \overrightarrow{g}_{n^*}(t_1), \cdots, \overrightarrow{g}_{n^*}(t_U) \right\} . \tag{26}$$

The species with the highest value of $\overrightarrow{g}_{n^*}^{\max}$ will be promoted to active. Note that if an ensemble of $k$-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

$$\eta\left( \overrightarrow{g}_{n^*}^{\max} \right) = \sqrt{ \left\langle \overrightarrow{g}_{n^*}^{\max} \right\rangle^2 + \frac{1}{B - 1} \sum_{b=1}^{B} \left( \overrightarrow{g}_{n^* b}^{\max} - \left\langle \overrightarrow{g}_{n^*}^{\max} \right\rangle \right)^2 } , \tag{27}$$

that takes into account both the ensemble average of maximum formation rates,

$$\left\langle \overrightarrow{g}_{n^*}^{\max} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \overrightarrow{g}_{n^* b}^{\max} , \tag{28}$$

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
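The ranking criterion of Eqs. (27) and (28) is straightforward to evaluate once the maximum formation rates of all ensemble members are known. The following sketch assumes that these rates have already been extracted via Eq. (26); the function names and the dictionary layout are hypothetical choices for this illustration.

```python
import math


def eta(g_max):
    """Eq. (27): g_max[b] is the maximum formation rate in ensemble member b."""
    B = len(g_max)
    mean = sum(g_max) / B  # Eq. (28)
    variance = sum((g - mean) ** 2 for g in g_max) / (B - 1)
    return math.sqrt(mean ** 2 + variance)


def next_active_species(candidates):
    """Pick the inactive species with the largest eta.

    candidates maps a species label to its list of maximum formation rates.
    """
    return max(candidates, key=lambda label: eta(candidates[label]))
```

For example, two species with the same mean rate of 1.0 but ensemble values `[1.0, 1.0, 1.0]` and `[0.0, 2.0]` obtain $\eta = 1$ and $\eta = \sqrt{3}$, respectively, so the high-variance species is promoted first, in line with the reasoning given above.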

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\overrightarrow{g}_{n^*}^{\max}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold. In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between $t = 0$ and $t = t_{\max}$) are being generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

• The number of samples, $B$, to be drawn from $\boldsymbol{\Sigma}_A$.

• A thermostat temperature, $T$, for the calculation of rate constants.

• The average free-energy increase/decrease, $\mu(A^{-} - A^{+})$, for a new intermediate state ($-$) formed by reaction of an intermediate state already present in the network ($+$).

• The average difference between the free energies of a left-hand-side intermediate state ($+$) and its corresponding right-hand-side intermediate state ($-$), $\sigma(A^{-} - A^{+})$.

• A minimum activation free energy, $\min(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

• A maximum activation free energy, $\max(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

• The average free-energy uncertainty, $\langle \sigma_A \rangle$ (required for generating an ensemble of kinetic models).

• The maximum number of edges, $L_{\max}$, to be generated.

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of $N = 103$ vertices and $L = 118$ edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L$^{-1}$ and 0.50 mol L$^{-1}$, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of $B + 1 = 25$ $k$-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol$^{-1}$. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol$^{-1}$. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature $T = 298.15$ K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.
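To give a feeling for what an activation free-energy uncertainty of about 1.1 kcal mol$^{-1}$ means for a rate constant at 298.15 K, the Eyring equation can be evaluated directly. The sketch below is an illustration under assumptions, not the AutoNetGen code: it uses the standard form $k = (k_{\mathrm{B}} T / h) \exp(-\Delta A^{\ddagger} / RT)$ with a transmission coefficient of one and first-order units, and the function names are hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K
H = 6.62607015e-34   # Planck constant in J s
R = 8.314462618      # molar gas constant in J/(mol K)
KCAL = 4184.0        # J per kcal


def eyring_rate(barrier_kcal, temperature=298.15):
    """First-order Eyring rate constant in 1/s (transmission coefficient 1)."""
    exponent = -barrier_kcal * KCAL / (R * temperature)
    return (K_B * temperature / H) * math.exp(exponent)


def uncertainty_factor(sigma_kcal, temperature=298.15):
    """Multiplicative change of k when the barrier is shifted by sigma_kcal."""
    return math.exp(sigma_kcal * KCAL / (R * temperature))
```

With these definitions, `uncertainty_factor(1.1)` evaluates to roughly 6.4, i.e., a barrier error of 1.1 kcal mol$^{-1}$ changes the rate constant by about a factor of six at room temperature, whereas the same error has a much smaller effect at combustion-like temperatures such as 1500 K.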

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L$^{-1}$ during the course of reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at $t = t_{\max}$. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L$^{-1}$ and 0.9 mol L$^{-1}$, the observable value of which will, in turn, affect the concentrations of the potential side products, 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of $\varepsilon_y = 0.01$ mol L$^{-1}$, we find that a DFA-based network reduction with $G_{\min} = 1.0 \times 10^{-3}$ mol L$^{-1}$ is reliable with respect to both $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Here, we chose the minimum confidence level of $\gamma = 0$ for the sake of clarity. The smaller $\gamma$, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose $G_{\min}$ to be one order of magnitude smaller than $\varepsilon_y$. The reason is that $G_{\min}$ is assessed on a species-by-species basis, whereas $\varepsilon_y$ is compared against the quantities $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$, which measure the concentration error summed over all species. One may resolve this situation by dividing $G_{\min}$ by $N$, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of $k$-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2 Sparse variant of the exemplary reaction network CRN-X consisting of 18vertices (numbered circles) and 21 edges The center of every edge is represented by asquare which may be interpreted as transition state Two sides of each square are linkedwith either one or two lines which are in turn linked with a single vertex each Theleft-hand-side and right-hand-side vertices of a reaction are never connected with thesame side of a square Vertices corresponding to reactants are black-colored whereas allother vertices are red-colored

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For C = 5 Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for δ_pq(t_max) and/or Δ_pq exceeding ε_y with respect to the quantile specified by γ. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L⁻¹. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L⁻¹, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval t_u − t_(u−1), we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42, the other dominant side product, via 30 and 14. In summary, the dominant reaction pathways read

    2 + 2 → 4 → 9 + 10
    1 + 9 → 6 + 13
    13 → 33 + 33
    1 + 6 → 19
    1 + 19 → 33 + 68
    10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than ε_y, as measured by δ_pq(t_max) and Δ_pq. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.
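The two reduction percentages quoted above can be checked with a few lines of arithmetic. The sketch below assumes that "network elements" means vertices and edges counted together with equal weight; the function name is illustrative only and not part of KiNetX.

```python
def removed_fraction(kept_vertices, kept_edges, all_vertices, all_edges):
    """Fraction of network elements (vertices plus edges) that was removed."""
    kept = kept_vertices + kept_edges
    total = all_vertices + all_edges
    return 1.0 - kept / total

# Full CRN-X network: N = 103 vertices, L = 118 edges
FULL = (103, 118)

guided = removed_fraction(63, 68, *FULL)   # KiNetX-guided exploration
sparse = removed_fraction(18, 21, *FULL)   # DFA-based reduction (Fig. 2)

print(f"guided exploration: {guided:.1%} removed")
print(f"DFA-based reduction: {sparse:.1%} removed")
```

Under this counting convention, the guided exploration removes roughly 41% of the elements and the DFA-based reduction roughly 82%, in line with the rounded values of 40% and 80% given in the text.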

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol⁻¹ with a standard deviation of 0.4 kcal mol⁻¹. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on k_0. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

Table 1: Mean values (μ) and standard deviations (σ) of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

    Property ζ                        μ(ζ_CRN-0)  μ(ζ_CRN-B)  σ(ζ_CRN-0)  σ(ζ_CRN-B)
    exploration steps                 12          14          7           7
    critical reactions                5           10          4           7
    L_expl                            54          60          26          27
    N_expl                            29          31          11          11
    L_red                             30          37          18          21
    N_red                             17          20          8           8
    δ^B_expl(t_max) / mol L⁻¹         -           0.14        -           0.13
    Δ^B_expl / mol L⁻¹                -           0.15        -           0.13
    δ^B_refined(t_max) / mol L⁻¹      -           <0.01       -           0.01
    Δ^B_refined / mol L⁻¹             -           <0.01       -           0.01

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several instead of a single solution naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to 1.0 × 10⁻¹⁷ mol L⁻¹ s⁻¹ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, t_max), we obtain a flux of 3.7 × 10⁻¹³ mol L⁻¹, which is very small compared to ε_y = 0.01 mol L⁻¹. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.
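The flux quoted in this paragraph follows from multiplying the smallest recorded formation rate by the reaction time. The short sketch below reproduces that arithmetic with the values of t_max and ε_y listed in Table 2; the variable names are chosen for illustration.

```python
T_MAX = 36954.0        # s, reaction time t_max (Table 2)
MIN_RATE = 1.0e-17     # mol L^-1 s^-1, smallest recorded formation rate (CRN-B)
EPS_Y = 0.01           # mol L^-1, concentration error tolerance

# Upper bound on the flux: the rate is assumed constant over the full reaction time.
max_flux = MIN_RATE * T_MAX
tolerance_ratio = EPS_Y / max_flux

print(f"maximum flux: {max_flux:.1e} mol/L")
print(f"tolerance exceeds the flux by a factor of {tolerance_ratio:.1e}")
```

The flux lies about ten orders of magnitude below the tolerance, which illustrates why a rate threshold that would have caught this last exploration step is impractically small.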

DFA-based network reduction with a flux threshold of G_min = 1.0 × 10⁻⁵ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument of the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of max{δ^B(t_max), Δ^B} from 0.15 mol L⁻¹ to < 0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all B + 1 = 6 samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
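The point about absolute versus relative free energies can be illustrated with a toy example. The sketch below uses a hypothetical three-membered cycle A → B → C → A with made-up free energies (none of these numbers come from the paper) and checks thermodynamic consistency by summing the reaction free energies, i.e. forward minus backward barriers, around the closed cycle; the sum must vanish. Averaging absolute free energies keeps the sum at zero, whereas overwriting a single forward barrier with its ensemble mean does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 6  # B + 1 kinetic models

# Hypothetical absolute free energies in kJ/mol (illustrative values only).
edges = [(0, 1), (1, 2), (2, 0)]  # closed cycle A -> B -> C -> A
vertices = rng.normal(loc=[0.0, -10.0, -5.0], scale=2.0, size=(n_samples, 3))
ts = rng.normal(loc=[40.0, 50.0, 45.0], scale=4.0, size=(n_samples, 3))

def barriers(vertices, ts):
    """Forward and backward activation free energies for every sample and edge."""
    fwd = np.stack([ts[:, e] - vertices[:, i] for e, (i, j) in enumerate(edges)], axis=1)
    bwd = np.stack([ts[:, e] - vertices[:, j] for e, (i, j) in enumerate(edges)], axis=1)
    return fwd, bwd

def cycle_imbalance(fwd, bwd):
    """Largest absolute sum of reaction free energies around the cycle."""
    return np.abs((fwd - bwd).sum(axis=1)).max()

# (a) Refine edge 0 by averaging the ABSOLUTE energies of its vertices and its TS.
v_ref, ts_ref = vertices.copy(), ts.copy()
v_ref[:, [0, 1]] = vertices[:, [0, 1]].mean(axis=0)
ts_ref[:, 0] = ts[:, 0].mean()
fwd_abs, bwd_abs = barriers(v_ref, ts_ref)

# (b) Refine edge 0 by overwriting only its RELATIVE forward barrier.
fwd_rel, bwd_rel = barriers(vertices, ts)
fwd_rel[:, 0] = fwd_rel[:, 0].mean()

print("imbalance, absolute refinement:", cycle_imbalance(fwd_abs, bwd_abs))
print("imbalance, relative refinement:", cycle_imbalance(fwd_rel, bwd_rel))
```

In variant (a), the forward barrier of the refined edge is identical in all samples and the cycle remains balanced up to round-off. In variant (b), the forward and backward barriers of that edge no longer share a common transition-state energy, so the cycle sum deviates from zero.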

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of max{δ_pq(t_max), Δ_pq} comparing the CRN-0 solutions against the "exact" solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase gained by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered an additional number of 5 kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be ambiguous to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties, such as the percentage of critical (highly noise-inducing) reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps, as they may be strongly dependent on the underlying graph structure, the distribution of activation free energies or rate constants, as well as their correlated uncertainties. It is not known to us that the literature on solution chemistry (or chemistry in general) offers enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time t_max, which we set to the half life of a unimolecular rate constant corresponding to a barrier height of 100 kJ mol⁻¹. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks.

1. For the construction of the (x + 1)-th network layer, we first define all potentially reactive intermediate states. At that stage, we register N = N_0 + ··· + N_x vertices. Since we already explored all possible reactions between the first N − N_x species, we will only consider the following potentially reactive intermediate states: N_x unimolecular intermediate states (leading to reactions of type A → P) defined by the N_x species of the x-th network layer, N_x homobimolecular intermediate states (type 2A → P), and N_x(N_x − 1)/2 + N_x(N − N_x) heterobimolecular intermediate states (type A + B → P). The first N_x(N_x − 1)/2 heterobimolecular states represent all possible nonredundant combinations of the N_x new species among each other, whereas the latter N_x(N − N_x) represent all possible combinations between the N_x new species and the N − N_x old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean μ_exp. The user can specify two different values for μ_exp, one for unimolecular reactions, μ_exp,uni, and one for bimolecular reactions, μ_exp,bi. The floor value of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter p(V_new), ranging from zero to one. If p(V_new) = 0, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if p(V_new) = 1. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), 2L activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (which comprises either one or two vertices) is sampled from a normal distribution with mean μ(A⁻ − A⁺) and standard deviation σ(A⁻ − A⁺), which is added to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is sampled from a uniform distribution with bounds min(A‡ − A^(+/−)) and max(A‡ − A^(+/−)), which is added to the higher-energy side of an edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix Σ_A as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation 2⟨σ_A⟩. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
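Steps 1 and 2 above can be sketched in a few lines. The following is a minimal illustration, not the AutoNetGen implementation: the function names are hypothetical, and the means of 5 and 2 are the values of μ_exp(N_uni) and μ_exp(N_bi) listed in Table 2.

```python
import numpy as np

def count_intermediate_states(n_total, n_new):
    """Number of uni-, homobi-, and heterobimolecular states for a new layer.

    n_total: all vertices registered so far (N); n_new: vertices of layer x (N_x).
    """
    uni = n_new
    homo = n_new
    hetero_new_new = n_new * (n_new - 1) // 2   # new species among each other
    hetero_new_old = n_new * (n_total - n_new)  # new species with old species
    return uni, homo, hetero_new_new + hetero_new_old

def sample_channel_counts(n_states, mean, rng):
    """Floor of exponentially distributed draws: reaction channels per state."""
    return np.floor(rng.exponential(scale=mean, size=n_states)).astype(int)

rng = np.random.default_rng(1)
uni, homo, hetero = count_intermediate_states(n_total=5, n_new=3)
uni_channels = sample_channel_counts(uni, mean=5.0, rng=rng)
bi_channels = sample_channel_counts(homo + hetero, mean=2.0, rng=rng)

print("states (uni, homo, hetero):", uni, homo, hetero)
print("channels per unimolecular state:", uni_channels)
print("channels per bimolecular state:", bi_channels)
```

For N = 5 and N_x = 3, the count gives 3 unimolecular, 3 homobimolecular, and 9 heterobimolecular states. Because of the floor operation, a state receives zero channels whenever the draw falls below one, so not every potentially reactive state actually reacts.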

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

    k_l^{+/-} = \frac{k_B T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right),    (29)

where h, k_B, R, and T are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
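As a minimal sketch of Eq. (29), the function below converts an activation free energy into a first-order rate constant and evaluates the half life for a barrier of 100 kJ mol⁻¹ at 298.15 K. The function name and the choice of the exact SI values for the constants are assumptions made for this illustration.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R = 8.314462618       # gas constant, J/(mol K)

def eyring_rate_constant(barrier_kj_mol, temperature=298.15):
    """Unimolecular rate constant (1/s) from an activation free energy in kJ/mol."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_kj_mol * 1.0e3 / (R * temperature))

k_100 = eyring_rate_constant(100.0)
half_life = math.log(2.0) / k_100

print(f"k(100 kJ/mol) = {k_100:.3e} 1/s")
print(f"half life     = {half_life:.0f} s")
```

The resulting half life is close to the reaction time t_max = 36954 s listed in Table 2; small deviations can arise from the numerical values used for the physical constants.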

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix Σ_A, which fulfills the condition that its largest eigenvalue equals σ²_A,max. Defining σ²_A,max to be the largest eigenvalue ensures that all activation free-energy uncertainties are bound by σ_A,max.

1. Generate a random (N × N)-dimensional matrix P with elements that are uniformly sampled from [−0.5, +0.5]. N is the number of vertices in the network.

2. The multiplication of P with its transpose leads to an (N × N)-dimensional matrix Q = PP^⊤ that already is a covariance matrix [65] with eigenvalues E_ii.

3. The covariance matrix Σ_A results from a rescaling of Q,

    \Sigma_A = Q \, \frac{\langle \sigma_A \rangle^2}{\max_i E_{ii}},

which implies that the largest eigenvalue of Σ_A equals ⟨σ_A⟩².
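The three steps above translate directly into NumPy. The sketch below is an illustration under stated assumptions rather than the AutoNetGen code: the function name, the seed, and the network size of 10 vertices are arbitrary, while ⟨σ_A⟩ = 2.09 kJ mol⁻¹ and the ensemble size of 25 are taken from Table 2 (CRN-X).

```python
import numpy as np

def random_covariance(n_vertices, sigma, rng):
    """Random covariance matrix whose largest eigenvalue equals sigma**2."""
    p = rng.uniform(-0.5, 0.5, size=(n_vertices, n_vertices))  # step 1
    q = p @ p.T                                                 # step 2
    largest = np.linalg.eigvalsh(q).max()                       # eigenvalues of symmetric Q
    return q * sigma**2 / largest                               # step 3

rng = np.random.default_rng(42)
sigma_a = 2.09  # kJ/mol
cov = random_covariance(10, sigma_a, rng)

# Correlated free-energy shifts for an ensemble of B + 1 = 25 models
shifts = rng.multivariate_normal(mean=np.zeros(10), cov=cov, size=25)

print("largest eigenvalue:", np.linalg.eigvalsh(cov).max())
print("largest vertex standard deviation:", np.sqrt(np.diag(cov)).max())
print("ensemble shape:", shifts.shape)
```

Since every diagonal element of a positive-semidefinite matrix is bounded by its largest eigenvalue, no individual vertex standard deviation can exceed ⟨σ_A⟩ in this construction.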

Note that in a practical setup, we expect the user to provide B + 1 k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.

Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

    Input parameter                      CRN-X        CRN-0        CRN-B

    AutoNetGen
    B + 1                                25           1            6
    T / K                                298.15       298.15       298.15
    μ(A⁻ − A⁺) / (kJ mol⁻¹)              −25          −25          −25
    σ(A⁻ − A⁺) / (kJ mol⁻¹)              50           50           50
    min(A‡ − A^(+/−)) / (kJ mol⁻¹)       0            0            0
    max(A‡ − A^(+/−)) / (kJ mol⁻¹)       100          100          100
    ⟨σ_A⟩ / (kJ mol⁻¹)                   2.09         2.09         2.09
    L_max                                125          100          100
    μ_exp(N_uni)                         5            5            5
    μ_exp(N_bi)                          2            2            2
    p(V_new)                             0.5          0.1          0.1

    KiNetX
    t_max / s                            36954        36954        36954
    ε_y / (mol L⁻¹)                      0.01         0.01         0.01
    G_min / (mol L⁻¹)                    1.0 × 10⁻³   1.0 × 10⁻⁵   1.0 × 10⁻⁵
    U                                    1000         1000         1000
    γ                                    0            1            1
    C                                    5            1            5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.


[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.


[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.


[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C.; et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C.; et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G.; et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S.; et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package"; Technical Report SAND-80-8003, Sandia National Labs., Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics"; Technical Report SAND-89-8009, Sandia National Labs., Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics"; Technical Report SAND-96-8216, Sandia National Labs., Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.


[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI – a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.


[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.


[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.


  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

For this purpose, KiNetX calculates a log-distributed vector of reference time points, $(t_0, t_1, \cdots, t_U)^\top$ with $t_0 = 0$ and $t_U = t_{\max}$, which is a function of the user-defined parameters $t_{\max}$ and $U$. To obtain species concentrations at the same time points for other $\mathbf{k}$-vectors, KiNetX interpolates the corresponding solutions through cubic spline smoothing. Note that spline smoothing is only reasonable if data noise is negligible, a property that is easy to assess in the case of one control variable (here, time). To evaluate the reliability of the cubic spline interpolations, we compared them against results from Gaussian process regression [59]. Under suitable assumptions, Gaussian process regression yields an optimal trade-off between fitting and smoothness (cubic splines only ensure the latter property). Hence, it can be employed to predict concentrations at previously unconsidered values in the time domain (here, the domain between $t_0 = 0$ and $t_U = t_{\max}$). Furthermore, as Gaussian process regression is a strictly Bayesian method, one may obtain reliable uncertainty estimates for each prediction. However, as this regression method scales cubically with the number of data points, it is not suitable for repetitive applications (usually, hundreds of regressions would be necessary). In all cases studied, we found that both the mean deviation between the two interpolation methods (cubic spline smoothing versus Gaussian process regression) and the mean predictive variance obtained from Gaussian process regression are negligibly small compared to the mean variance of the concentrations themselves (by a factor of $<10^{-6}$). Hence, data noise is indeed negligible, and cubic splines work well for interpolating concentration profiles.

2.4.2 Network Reduction

We introduce a hierarchy of two reduction algorithms with increasing sophistication (in terms of both rigor and required computing resources), which identify and eliminate kinetically irrelevant species and elementary reactions. Initially, we perform a detailed flux analysis, followed by a local barrier analysis if the former analysis suggests a reduced model resulting in a concentration error that exceeds the user-defined threshold $\varepsilon_y$. Both algorithms reflect our interpretation of established kinetic modeling concepts [34].

Detailed Flux Analysis (DFA). To keep track of the net concentration that has passed an edge between $t = 0$ and $t = t_{\max}$, we integrate the absolute values of $f_l(t)$ over that time interval,

$$F_l = \int_{t=0}^{t=t_{\max}} |f_l(t)| \, \mathrm{d}t, \tag{17}$$

which we define as the edge flux corresponding to the $l$-th reaction pair. The vector of edge fluxes, $\mathbf{F}$, is the basis for determining the vertex flux corresponding to the $n$-th species,

$$G_n = \left(\mathbf{s}^+_n + \mathbf{s}^-_n\right)\mathbf{F}, \tag{18}$$

where $\mathbf{s}^+_n$ and $\mathbf{s}^-_n$ represent the $n$-th row of $\mathbf{S}^+$ and $\mathbf{S}^-$, respectively. Our DFA implementation approximates the edge flux according to

$$F_l \approx F_l^{\mathrm{DFA}} = \sum_{u=1}^{U} |f_l(t_{u-1/2})| \cdot (t_u - t_{u-1}). \tag{19}$$

Analogous to the exact expression, the approximate vertex flux reads

$$G_n^{\mathrm{DFA}} = \left(\mathbf{s}^+_n + \mathbf{s}^-_n\right)\mathbf{F}^{\mathrm{DFA}}. \tag{20}$$

If $G_n^{\mathrm{DFA}} < G_{\min}$, where $G_{\min}$ is a user-defined threshold, the $n$-th vertex will be removed from the kinetic model, except for the case where the $n$-th vertex represents a reactant. All edges that were connected to at least one of the removed vertices will be removed, too. After this procedure, the number of vertices and edges should have decreased significantly.
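The flux bookkeeping of Eqs. (19) and (20) and the subsequent pruning rule can be sketched in a few lines. This is a minimal illustration, not the KiNetX implementation: it assumes that $\mathbf{S}^+$ and $\mathbf{S}^-$ hold the non-negative stoichiometric coefficients of the left-hand-side and right-hand-side species (one row per species, one column per reaction pair), and the function names `edge_fluxes`, `vertex_fluxes`, and `dfa_reduce` are hypothetical.

```python
def edge_fluxes(times, midpoint_rates):
    """Eq. (19): F_l ~ sum_u |f_l(t_{u-1/2})| * (t_u - t_{u-1}).

    times          -- reference time points [t_0, ..., t_U]
    midpoint_rates -- midpoint_rates[l][u-1] = f_l(t_{u-1/2}) for u = 1..U
    """
    widths = [t1 - t0 for t0, t1 in zip(times, times[1:])]
    return [sum(abs(f) * w for f, w in zip(rates, widths))
            for rates in midpoint_rates]


def vertex_fluxes(s_plus, s_minus, fluxes):
    """Eq. (20): G_n = (s+_n + s-_n) . F, one entry per species."""
    return [sum((sp + sm) * F for sp, sm, F in zip(row_p, row_m, fluxes))
            for row_p, row_m in zip(s_plus, s_minus)]


def dfa_reduce(s_plus, s_minus, fluxes, g_min, reactants):
    """Return (kept_vertices, kept_edges) as lists of indices.

    A vertex is kept if its flux reaches g_min or if it is a reactant;
    an edge is dropped as soon as it touches a removed vertex.
    """
    G = vertex_fluxes(s_plus, s_minus, fluxes)
    kept_v = [n for n, g in enumerate(G) if g >= g_min or n in reactants]
    removed = set(range(len(G))) - set(kept_v)
    kept_e = [l for l in range(len(fluxes))
              if not any(s_plus[n][l] or s_minus[n][l] for n in removed)]
    return kept_v, kept_e
```

For a toy network with two edges leaving a single reactant, one carrying almost no flux, `dfa_reduce` drops the low-flux product together with its edge and keeps the rest.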

Note that in the case of a closed system, $G_n^{\mathrm{DFA}}$ approaches a finite value with increasing time, which renders $G_{\min}$ an intuitive threshold that resembles a concentration error. The assumption behind our DFA is based on this intuitive interpretation: If a reaction channel with a flux smaller than $G_{\min}$ is removed, the redistribution of this minute amount of flux (i.e., concentration) should yield a concentration error that is comparable to $G_{\min}$. Therefore, we have a means at hand that potentially allows us to control the concentration error introduced by a specific value of $G_{\min}$. Clearly, choosing a smaller value of $G_{\min}$ will increase the reliability of the DFA-based solution but also decrease the possible degree of network reduction.

To determine the accuracy loss caused by a DFA, we introduce the measures

$$\delta_{pq}(t_u) = \frac{1}{2} \sum_{n=1}^{N} \left| y_{np}(t_u) - y_{nq}(t_u) \right| \tag{21}$$

and

$$\Delta_{pq} = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_{pq}(t_{u-1/2}) \cdot (t_u - t_{u-1}), \tag{22}$$

which resemble the control quantities $\delta_B(t_u)$ and $\Delta_B$. Here, however, we compare two specific solutions, $\mathbf{y}_p(t)$ and $\mathbf{y}_q(t)$, with each other, one resulting from the original network and the other resulting from the corresponding sparse network obtained through our DFA. Note that the number of vertices differs for the original and sparse networks. Hence, to render Eq. (21) practical, we define $N$ to be the number of vertices contained in the original network and set all elements in $\mathbf{y}_{\mathrm{sparse}}(t)$ to zero that refer to vertices not contained in the sparse network. If all of the $B + 1$ kinetic models yield both $\delta_{pq}(t_u)$-values (again, we focus on $t_u = t_U = t_{\max}$) and $\Delta_{pq}$-values that are below a user-defined concentration error, $\varepsilon_y$, KiNetX considers the DFA-based network reduction reliable. However, if there is at least one kinetic model that yields $\delta_{pq}(t_{\max}) > \varepsilon_y$ or $\Delta_{pq} > \varepsilon_y$, it depends on the user-defined confidence level, $\gamma$, whether the DFA-based network reduction is considered reliable or not. We require the confidence level to fulfill the condition $0 \le \gamma \le 1$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeds $\varepsilon_y$, the DFA-based network reduction is considered unreliable. Here, the lowest and highest possible quantiles represent the median and the maximum of both measures over all $B + 1$ samples, respectively.

Local Barrier Analysis (LBA). There are certain cases in which a DFA-based network reduction is prone to yield unreliable results. One example is autocatalysis. Imagine the following model reaction network:

2A ⇌ B (very slow)
B ⇌ C (fast)
C + A ⇌ D (fast)
D + A ⇌ E (fast)
E ⇌ 2C (fast)

Even though the first reaction step is very slow, it is required to initiate all other reaction steps. Without B, neither C nor D nor E can be formed. Even though B is obviously a very important species for the reaction mechanism, it will only be relevant at the very beginning of the reaction (i.e., until a minute amount of it has been formed). Hence, the vertex flux for B, $G_{\mathrm{B}}$, will be very small, possibly smaller than the user-defined threshold $G_{\min}$, leading to a false elimination of this species and of the first two of the five edges, both of which are connected to it.

In this case, $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ will clearly exceed $\varepsilon_y$, and the LBA will start automatically. The idea behind our LBA algorithm is fairly simple: Set the rate constants of the first reaction pair to zero, which is equivalent to an infinitely high activation barrier or the removal of the corresponding edge. Repeat the numerical integration and compare the solution to the original one based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Reset the rate constants of the first reaction pair to their original values. Repeat the entire procedure for the second to the $L$-th reaction pair and for each of the $B + 1$ kinetic models.
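The LBA loop can be illustrated on the autocatalytic model network. The sketch below is not the KiNetX implementation and rests on several assumptions made only for brevity: every step is treated as irreversible, the rate constants in `K_NOMINAL` are hypothetical, a single kinetic model is used instead of $B + 1$, only $\delta_{pq}(t_{\max})$ is evaluated, and a fixed-step fourth-order Runge-Kutta integrator stands in for a proper stiff ODE solver.

```python
# Species order: A, B, C, D, E. Each entry: (reactant indices, product indices).
REACTIONS = [
    ((0, 0), (1,)),    # 2A -> B     (very slow)
    ((1,), (2,)),      # B -> C
    ((2, 0), (3,)),    # C + A -> D
    ((3, 0), (4,)),    # D + A -> E
    ((4,), (2, 2)),    # E -> 2C
]
K_NOMINAL = [1e-4, 1.0, 1.0, 1.0, 1.0]  # hypothetical rate constants


def rhs(y, k):
    """Mass-action rate equations dy/dt for the network above."""
    dy = [0.0] * len(y)
    for (lhs, prod), kl in zip(REACTIONS, k):
        rate = kl
        for n in lhs:
            rate *= y[n]
        for n in lhs:
            dy[n] -= rate
        for n in prod:
            dy[n] += rate
    return dy


def integrate(k, y0, t_max=100.0, dt=0.02):
    """Classical fourth-order Runge-Kutta with a fixed step size."""
    y = list(y0)
    for _ in range(int(round(t_max / dt))):
        k1 = rhs(y, k)
        k2 = rhs([a + 0.5 * dt * b for a, b in zip(y, k1)], k)
        k3 = rhs([a + 0.5 * dt * b for a, b in zip(y, k2)], k)
        k4 = rhs([a + dt * b for a, b in zip(y, k3)], k)
        y = [a + dt / 6.0 * (p + 2.0 * q + 2.0 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y


def delta(y_p, y_q):
    """Eq. (21) evaluated at a single time point."""
    return 0.5 * sum(abs(a - b) for a, b in zip(y_p, y_q))


def local_barrier_analysis(k, y0, eps_y):
    """For each edge, switch it off and report (index, delta, removable)."""
    y_ref = integrate(k, y0)
    report = []
    for l in range(len(k)):
        k_mod = list(k)
        k_mod[l] = 0.0  # infinitely high barrier for edge l
        d = delta(y_ref, integrate(k_mod, y0))
        report.append((l, d, d < eps_y))
    return report
```

Calling `local_barrier_analysis(K_NOMINAL, [1.0, 0.0, 0.0, 0.0, 0.0], eps_y=0.01)` flags the slow initiation step as not removable: without it, no species other than A is ever formed, so the solution differs strongly from the reference.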

After this procedure, every reaction pair is associated with $B + 1$ values for $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the corresponding reaction pair will be considered kinetically irrelevant and removed from the network. All unconnected vertices will be removed, too. Subsequently, the resulting $B + 1$ reduced kinetic models are integrated, and their solutions are compared to the original ones, again based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the LBA-based network reduction will be considered reliable. It is still possible that $\varepsilon_y$ is exceeded, as the kinetic models we consider are generally nonlinear: If the lack of one edge does not alter the solution and the same result is obtained for another edge, it does not imply that the simultaneous lack of both edges leads to the same conclusion. If $\varepsilon_y$ is still exceeded after the LBA-based network reduction, KiNetX will recover the original network and continue its analysis without any reduction.
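The accuracy measures of Eqs. (21) and (22) and the quantile-based reliability test shared by both reduction algorithms can be sketched as follows. This is an illustration under stated assumptions, not the KiNetX code: the function names are hypothetical, $t_0 = 0$ is assumed so that the last time point equals $t_{\max}$, and the empirical quantile uses linear interpolation between order statistics, which is only one of several common conventions (the text does not specify which one is used).

```python
def delta(y_p, y_q):
    """Eq. (21) at one time point.

    y_p, y_q list concentrations over the N vertices of the original
    network; entries of removed vertices are set to zero.
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(y_p, y_q))


def big_delta(times, delta_mid):
    """Eq. (22): time average of delta over [0, t_max].

    delta_mid[u-1] is delta evaluated at the interval midpoint t_{u-1/2}.
    """
    widths = [t1 - t0 for t0, t1 in zip(times, times[1:])]
    return sum(d * w for d, w in zip(delta_mid, widths)) / times[-1]


def quantile(values, q):
    """Empirical q-quantile (0 <= q <= 1) with linear interpolation."""
    v = sorted(values)
    pos = q * (len(v) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(v) - 1)
    return v[lo] + (pos - lo) * (v[hi] - v[lo])


def reduction_is_reliable(deltas, big_deltas, eps_y, gamma):
    """Reliable if neither (1 + gamma)/2 quantile exceeds eps_y.

    deltas, big_deltas -- one value per kinetic model (B + 1 in total)
    gamma = 0 tests the median, gamma = 1 tests the maximum.
    """
    q = (1.0 + gamma) / 2.0
    return quantile(deltas, q) <= eps_y and quantile(big_deltas, q) <= eps_y
```

With one outlier among otherwise small errors, the reduction passes at $\gamma = 0$ (median) but fails at $\gamma = 1$ (maximum), which is the trade-off between reduction depth and reliability discussed above.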

2.4.3 Sensitivity Analysis

In the following, we assume that the concentration profiles of the sparse reaction network reveal pronounced uncertainties such that it is difficult to correctly assign product ratios or to suggest a specific reaction mechanism. To estimate to which rate constants the concentration profiles are most sensitive, a sensitivity analysis is required [34]. For this purpose, the rate constants are perturbed to study how such perturbations affect the time-dependent species concentrations.

In a local sensitivity analysis, the model parameters are perturbed one by one from their nominal values (here, the elements of $\mathbf{k}_0$). While a local sensitivity analysis is straightforward to conduct and computationally feasible (usually, $L$ kinetic simulations are required), it has a significant disadvantage in that it does not take into account the correlation between the model parameters (cf. Section 2.2). Consequently, one may overlook important correlation effects on the uncertainty of concentration profiles. Note that our LBA algorithm resembles an extreme variant of a local sensitivity analysis (cf. Section 2.4.2), where each rate constant is perturbed to its minimal value.

In a global sensitivity analysis, the correlation between the model parameters is taken into account, but the process requires significantly more computational resources than a local sensitivity analysis. Furthermore, the design of a global sensitivity analysis is not as unambiguous as for a local sensitivity analysis, which explains the existence of several approaches that may require $CL$ (Morris screening, where $C$ is an integer usually much smaller than $B$), $B$ (polynomial chaos), or $2BL$ (Sobol's method) numerical integrations [34].

Morris screening [57] is among the simplest of global sensitivity analyses, as it is not designed to induce a mapping between the model parameters and the model solution. Instead, its purpose is to categorize the model parameters as either important or unimportant, depending on how strongly changes in them affect the model solutions. We are particularly interested in this categorization since we aim for a descriptor that informs us about the quality of rate constants obtained from an efficient (basic) quantum chemical model. If we find that the uncertainty of some species concentrations is too large to derive sensible conclusions about specific network properties, the results of a global sensitivity analysis will support us in identifying the most critical rate constants that require a reevaluation based on a more sophisticated (benchmark) quantum chemical model.

The original version of Morris screening does not take into account the joint probability distribution of the model parameters. Here, we introduce an extended variant of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample $\mathbf{k}_0$ and replaces the elements $k^+_{0,1}$ and $k^-_{0,1}$ with the corresponding elements of $\mathbf{k}_1$, $k^+_{1,1}$ and $k^-_{1,1}$. For this new vector of rate constants, $\mathbf{k}_{0_1}$, a numerical integration is carried out. Subsequently, the elements $k^+_{0,2}$ and $k^-_{0,2}$ of $\mathbf{k}_{0_1}$ are replaced with the elements $k^+_{1,2}$ and $k^-_{1,2}$ of $\mathbf{k}_1$, and a numerical integration based on the new vector, $\mathbf{k}_{0_2}$, is carried out. This procedure is repeated $L$ times until we arrive at $\mathbf{k}_{0_L} = \mathbf{k}_1$, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is now repeated by an element-wise change from $\mathbf{k}_1$ to $\mathbf{k}_2$ and, generally, by an element-wise change from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ until $b = C - 1$ is reached. In the end, we will have generated solutions for another $C(L - 1)$ kinetic models in addition to the $B + 1$ solutions obtained for $\mathbf{K}_B$. We are interested in the $C(L - 1)$ new solutions and the first $C + 1$ of the $B + 1$ solutions obtained previously, amounting to $CL + 1$ solutions relevant for our sensitivity analysis.
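The construction of the intermediate rate-constant vectors can be sketched as follows. This is only an illustration with hypothetical names (`morris_path`, `extended_morris_vectors`); each k-vector is assumed to be stored as a list of $L$ pairs $(k^+_l, k^-_l)$, one per reaction pair.

```python
def morris_path(k_from, k_to):
    """Yield k_{b_1}, ..., k_{b_L}.

    Starting from k_from, the l-th (k+, k-) pair is replaced by that of
    k_to, one reaction pair at a time; the last vector equals k_to.
    """
    current = list(k_from)
    for l, pair in enumerate(k_to):
        current[l] = pair
        yield list(current)


def extended_morris_vectors(samples, C):
    """Return the C*(L-1) new k-vectors that still need to be integrated.

    samples -- [k_0, k_1, ..., k_B], drawn from the joint distribution
    C       -- number of Morris samples (C <= B)
    """
    new_vectors = []
    for b in range(C):
        path = list(morris_path(samples[b], samples[b + 1]))
        # The final vector of each path is k_{b+1}, already integrated.
        assert path[-1] == list(samples[b + 1])
        new_vectors.extend(path[:-1])
    return new_vectors
```

Because every perturbation moves one reaction pair from an actual sample $\mathbf{k}_b$ to the next sample $\mathbf{k}_{b+1}$, the step sizes follow from the sampled joint distribution rather than from a fixed grid.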

For each of these solutions, KiNetX determines the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics, where $p$ and $q$ represent two adjacent $\mathbf{k}$-vectors as constructed by our extended version of Morris screening. $CL$ values are obtained for each metric, $C$ thereof for every reaction pair. We define the sensitivity coefficients $z_l^{\delta}$ and $z_l^{\Delta}$ as

$$z_l^{\delta} = \frac{1}{C} \sum_{b=0}^{C-1} \delta_{b_l b_{l-1}}(t_{\max}) \tag{23}$$

and

$$z_l^{\Delta} = \frac{1}{C} \sum_{b=0}^{C-1} \Delta_{b_l b_{l-1}}, \tag{24}$$

respectively. The subscript $b_l$ refers to sample $\mathbf{k}_{b_l}$, and we define $\mathbf{k}_{b_0} = \mathbf{k}_b$. Note that in Eqs. (23) and (24), we do not divide the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term conditional relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.

Finally, we can set up a ranking of sensitivity coefficients. The larger $z_l^{\delta}$ and $z_l^{\Delta}$, the larger the effect of changes in $k_l^{+/-}$ on the concentration profiles. With this ranking at hand, one can systematically improve on both the accuracy and precision of the concentration profiles to reliably suggest product distributions (based on $z_l^{\delta}$) or specific reaction mechanisms (based on $z_l^{\Delta}$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.
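Averaging and ranking according to Eqs. (23) and (24) amounts to a column mean followed by a sort. The sketch below assumes the metric values are stored as `metric[b][l-1]`, i.e., the $\delta$ (or $\Delta$) value between the adjacent vectors $\mathbf{k}_{b_l}$ and $\mathbf{k}_{b_{l-1}}$; the function names and this layout are hypothetical.

```python
def sensitivity_coefficients(metric):
    """Eqs. (23)/(24): z_l = (1/C) * sum_b metric[b][l-1], for l = 1..L.

    metric -- C rows (Morris samples), each with L entries (reaction pairs)
    """
    C = len(metric)
    L = len(metric[0])
    return [sum(metric[b][l] for b in range(C)) / C for l in range(L)]


def rank_reaction_pairs(z):
    """Zero-based indices of the reaction pairs, most sensitive first."""
    return sorted(range(len(z)), key=lambda l: z[l], reverse=True)
```

The reaction pairs at the top of the ranking are the candidates for a reevaluation with the benchmark model.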

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be infeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well known in the reaction engineering community. It was introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas-phase reactions (in particular, combustion) [62], which has recently been extended to understand the kinetics of solution-phase chemistry [63]. Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution-phase chemistry. As the effect of a given free-energy error on a thermal rate constant decreases with increasing temperature in classical rate theories, $\mathbf{k}$-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take $\mathbf{k}$-uncertainty into account.
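The temperature argument can be made concrete with Eyring's equation, $k = (k_\mathrm{B} T / h) \exp[-\Delta A^{\ddagger} / (RT)]$: an error $\sigma_A$ in the activation free energy scales the rate constant by a factor of $\exp[\sigma_A / (RT)]$, which shrinks as $T$ grows. The following sketch evaluates this factor. The function names are hypothetical, a unimolecular step with unit transmission coefficient is assumed, and the value of 1500 K is only an illustrative combustion-like temperature that does not appear in the text.

```python
from math import exp

K_B = 1.380649e-23          # Boltzmann constant in J/K
H = 6.62607015e-34          # Planck constant in J s
R = 1.98720425864083e-3     # gas constant in kcal/(mol K)


def eyring_rate_constant(dA_act, T):
    """Eyring rate constant in 1/s for a unimolecular step.

    dA_act -- activation free energy in kcal/mol
    T      -- temperature in K
    """
    return K_B * T / H * exp(-dA_act / (R * T))


def uncertainty_factor(sigma_A, T):
    """Factor by which k changes if dA_act is off by sigma_A (kcal/mol)."""
    return exp(sigma_A / (R * T))


if __name__ == "__main__":
    for T in (298.15, 1500.0):
        print(T, uncertainty_factor(1.0, T))
```

Under these assumptions, an error of 1 kcal/mol changes the rate constant by a factor of roughly 5 at 298.15 K but by less than a factor of 1.5 at 1500 K.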

We start from the reactants (species with nonzero initial concentrations) and attempt to find all direct products, i.e., intermediates that are formed by a single elementary reaction of the reactants. The reactants are considered active, whereas the direct products are considered inactive. Only active species are considered in the exploration procedure, i.e., at this point, all possible reaction steps have been discovered (according to the exploration algorithm employed). To estimate which of the inactive species is potentially the most relevant for the mechanism, we focus on the formation rate of each inactive species (indicated by an asterisk),

$$\overrightarrow{g}_{n^*}(t_u) = \mathbf{s}^+_{n^*} \mathbf{f}^-(t_u) + \mathbf{s}^-_{n^*} \mathbf{f}^+(t_u). \tag{25}$$

Note that $\mathbf{s}^+_{n^*}$ is multiplied with $\mathbf{f}^-(t)$ since in a backward ($-$) reaction, the left-hand-side species ($+$) are formed. An equivalent argument holds for the multiplication of $\mathbf{s}^-_{n^*}$ with $\mathbf{f}^+(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

$$\overrightarrow{g}^{\max}_{n^*} = \max\left\{ \overrightarrow{g}_{n^*}(t_0), \overrightarrow{g}_{n^*}(t_1), \cdots, \overrightarrow{g}_{n^*}(t_U) \right\}. \tag{26}$$

The species with the highest value of $\overrightarrow{g}^{\max}_{n^*}$ will be promoted to active. Note that if an ensemble of $\mathbf{k}$-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

$$\eta\left(\overrightarrow{g}^{\max}_{n^*}\right) = \sqrt{ \left\langle \overrightarrow{g}^{\max}_{n^*} \right\rangle^2 + \frac{1}{B - 1} \sum_{b=1}^{B} \left( \overrightarrow{g}^{\max}_{n^* b} - \left\langle \overrightarrow{g}^{\max}_{n^*} \right\rangle \right)^2 }, \tag{27}$$

that takes into account both the ensemble average of maximum formation rates,

$$\left\langle \overrightarrow{g}^{\max}_{n^*} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \overrightarrow{g}^{\max}_{n^* b}, \tag{28}$$

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
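The ranking statistic of Eqs. (26) to (28) can be sketched with the standard library alone. The function names and the dictionary layout are hypothetical; `statistics.variance` is used because it divides by $B - 1$, matching the prefactor in Eq. (27).

```python
from math import sqrt
from statistics import mean, variance


def max_formation_rate(rates_over_time):
    """Eq. (26): largest formation rate over all time points t_0..t_U."""
    return max(rates_over_time)


def eta(g_max_samples):
    """Eq. (27): sqrt(<g>^2 + sample variance), with <g> from Eq. (28).

    g_max_samples -- maximum formation rates of one inactive species,
                     one value per sampled kinetic model (at least two)
    """
    return sqrt(mean(g_max_samples) ** 2 + variance(g_max_samples))


def next_active_species(g_max_by_species):
    """Label of the inactive species with the largest eta."""
    return max(g_max_by_species, key=lambda n: eta(g_max_by_species[n]))
```

For two species with the same mean, e.g. `{"X": [1.0, 1.0, 1.0], "Y": [0.0, 1.0, 2.0]}`, the function selects `"Y"`, since its larger variance raises $\eta$ from 1 to $\sqrt{2}$.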

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\overrightarrow{g}^{\max}_{n^*}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold. In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between $t = 0$ and $t = t_{\max}$) are generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

• The number of samples, $B$, to be drawn from $\boldsymbol{\Sigma}_A$.

• A thermostat temperature, $T$, for the calculation of rate constants.

• The average free-energy increase/decrease, $\mu(A^- - A^+)$, for a new intermediate state ($-$) formed by reaction of an intermediate state already present in the network ($+$).

• The average difference between the free energies of a left-hand-side intermediate state ($+$) and its corresponding right-hand-side intermediate state ($-$), $\sigma(A^- - A^+)$.

• A minimum activation free energy, $\min(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

• A maximum activation free energy, $\max(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

• The average free-energy uncertainty, $\langle \sigma_A \rangle$ (required for generating an ensemble of kinetic models).

• The maximum number of edges, $L_{\max}$, to be generated.

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of $N = 103$ vertices and $L = 118$ edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L$^{-1}$ and 0.50 mol L$^{-1}$, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of $B + 1 = 25$ $\mathbf{k}$-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol$^{-1}$. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol$^{-1}$. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature of $T = 298.15$ K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L$^{-1}$ during the course of the reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at $t = t_{\max}$. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L$^{-1}$ and 0.9 mol L$^{-1}$, the observable value of which will in turn affect the concentrations of the potential side products 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of $\varepsilon_y = 0.01$ mol L$^{-1}$, we find that a DFA-based network reduction with $G_{\min} = 1.0 \times 10^{-3}$ mol L$^{-1}$ is reliable with respect to both $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Here, we chose the minimum confidence level of $\gamma = 0$ for the sake of clarity. The smaller $\gamma$, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose $G_{\min}$ to be one order of magnitude smaller than $\varepsilon_y$. The reason is that $G_{\min}$ is assessed on a species-by-species basis, whereas $\varepsilon_y$ is compared against the quantities $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$, which measure the concentration error summed over all species. One may resolve this situation by dividing $G_{\min}$ by $N$, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of $\mathbf{k}$-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X, consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large $\mathbf{y}$-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For $C = 5$ Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeding $\varepsilon_y$ with respect to the quantile specified by $\gamma$. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L$^{-1}$. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L$^{-1}$, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval $t_u - t_{u-1}$, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42, the other dominant side product, via 30 and 14. In summary, the dominant reaction pathways read

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than $\varepsilon_y$, as measured by $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges, which corresponds to an effective network reduction of about 40%. This value is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering $\mathbf{k}$-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol$^{-1}$ with a standard deviation of 0.4 kcal mol$^{-1}$. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $\mathbf{k}_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several solutions instead of a single one naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to $1.0 \times 10^{-17}$ mol L$^{-1}$ s$^{-1}$ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, $t_{\max}$), we obtain a flux of $3.7 \times 10^{-13}$ mol L$^{-1}$, which is very small compared to $\varepsilon_y = 0.01$ mol L$^{-1}$. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.

Table 1: Mean values (μ) and standard deviations (σ) of properties ζ obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                        μ(ζ_CRN-0)   μ(ζ_CRN-B)   σ(ζ_CRN-0)   σ(ζ_CRN-B)
exploration steps                 12           14           7            7
critical reactions                5            10           4            7
L_expl                            54           60           26           27
N_expl                            29           31           11           11
L_red                             30           37           18           21
N_red                             17           20           8            8
δ_B^expl(t_max) (mol L⁻¹)         –            0.14         –            0.13
Δ_B^expl (mol L⁻¹)                –            0.15         –            0.13
δ_B^refined(t_max) (mol L⁻¹)      –            <0.01        –            0.01
Δ_B^refined (mol L⁻¹)             –            <0.01        –            0.01

DFA-based network reduction with a flux threshold of $G_{\min} = 1.0 \times 10^{-5}$ mol L$^{-1}$ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument in the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of $\max\{\delta_B(t_{\max}), \Delta_B\}$ from 0.15 mol L$^{-1}$ to < 0.01 mol L$^{-1}$ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all $B + 1 = 6$ samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
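The point about absolute versus relative free energies can be illustrated with a minimal sketch for a single reaction. The function name, the input layout, and all numbers below are illustrative assumptions, not part of KiNetX or of the data used in this study. Averaging the absolute free energies of both sides and of the edge first, and forming barriers only afterwards, keeps the difference between forward and backward barrier equal to the reaction free energy by construction.

```python
from statistics import mean

def refine_reaction(a_reactant, a_product, a_edge):
    """Average absolute free energies over all samples, then form barriers.

    Each argument is a list with one absolute free energy per kinetic model
    (hypothetical input layout, values in kJ/mol).
    """
    a_r, a_p, a_ts = mean(a_reactant), mean(a_product), mean(a_edge)
    forward = a_ts - a_r    # activation free energy, reactant -> product
    backward = a_ts - a_p   # activation free energy, product -> reactant
    return forward, backward, a_p - a_r

# Illustrative samples for B + 1 = 6 kinetic models (made-up numbers).
reactant = [0.0, 1.2, -0.8, 0.5, -1.1, 0.2]
product = [-20.0, -18.5, -21.3, -19.2, -20.9, -20.1]
edge = [60.0, 62.1, 58.7, 61.0, 59.4, 60.8]

forward, backward, reaction_energy = refine_reaction(reactant, product, edge)
# Consistency check: forward minus backward barrier equals the reaction free energy.
print(abs((forward - backward) - reaction_energy) < 1e-9)
```

Averaging forward and backward barriers independently would not guarantee this identity once a vertex is shared by several reactions, which is the situation the text warns about.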

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of $\max\{\delta_{pq}(t_{\max}), \Delta_{pq}\}$ comparing the CRN-0 solutions against the "exact" solutions amounts to $15.7 \times 10^{-3}$ mol L$^{-1}$, whereas it amounts to only $1.3 \times 10^{-3}$ mol L$^{-1}$ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm that constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered 5 additional kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be questionable to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties, such as the percentage of critical (highly noise-inducing) reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps, as they may be strongly dependent on the underlying graph structure, the distribution of activation free energies and rate constants, as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol$^{-1}$ with respect to the higher-energy intermediate) with the reaction time $t_{\max}$, which we set to the half-life associated with a unimolecular rate constant corresponding to a barrier height of 100 kJ mol$^{-1}$. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol$^{-1}$, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the $(x + 1)$-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the $x$-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_{\exp}$. The user can specify two different values for $\mu_{\exp}$, one for unimolecular reactions, $\mu_{\exp,\mathrm{uni}}$, and one for bimolecular reactions, $\mu_{\exp,\mathrm{bi}}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_{\mathrm{new}})$, ranging from zero to one. If $p(V_{\mathrm{new}}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_{\mathrm{new}}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), $2L$ activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is obtained by sampling from a normal distribution with mean $\mu(A^- - A^+)$ and standard deviation $\sigma(A^- - A^+)$ and adding the sampled value to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained by sampling from a uniform distribution with bounds $\min(A^{\ddagger} - A^{+/-})$ and $\max(A^{\ddagger} - A^{+/-})$ and adding the sampled value to the free energy of the higher-energy side of that edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
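The bookkeeping of steps 1 and 2 can be sketched in a few lines. This is an illustration under stated assumptions, not the AutoNetGen implementation: the function names are hypothetical, and mass conservation as well as the actual graph construction are left out.

```python
import math
import random

def count_intermediate_states(n_total, n_new):
    """Number of potentially reactive intermediate states for the next layer.

    n_total is the number of vertices registered so far (N), n_new the number
    of vertices in the current layer (N_x). Returns the counts of
    (unimolecular, homobimolecular, heterobimolecular) states from step 1.
    """
    uni = n_new
    homo = n_new
    hetero = n_new * (n_new - 1) // 2 + n_new * (n_total - n_new)
    return uni, homo, hetero

def sample_channel_count(mean_channels, rng):
    """Step 2: floor of a draw from an exponential distribution with the given mean."""
    # random.expovariate expects the rate parameter, i.e. 1 / mean.
    return math.floor(rng.expovariate(1.0 / mean_channels))

def creates_new_vertex(p_new, rng):
    """Return True for a new vertex, False for a link to an earlier layer."""
    # rng.random() lies in [0, 1), so p_new = 0 never and p_new = 1 always yields True.
    return rng.random() < p_new

rng = random.Random(0)  # fixed seed so that the sketch is reproducible
print(count_intermediate_states(n_total=5, n_new=2))
channels = [sample_channel_count(5.0, rng) for _ in range(10000)]
print(sum(channels) / len(channels))
```

Because of the floor operation, the average number of channels is somewhat smaller than the mean of the underlying exponential distribution, and a state can end up with zero channels.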

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right), \tag{29}$$

where $h$, $k_\mathrm{B}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
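The conversion in Eq. (29) can be sketched as follows. The function name and the choice of units (barrier in kJ mol$^{-1}$, temperature in K) are assumptions made for this example; the physical constants are the exact SI values. For a barrier of 100 kJ mol$^{-1}$ at 298.15 K, the resulting half-life should come out close to the value of $t_{\max}$ listed in Table 2.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K
H = 6.62607015e-34   # Planck constant in J s
R = 8.314462618      # gas constant in J/(mol K)

def eyring_rate_constant(barrier_kj_mol, temperature=298.15):
    """Rate constant from an activation free energy via the Eyring equation.

    For a unimolecular step the result is in 1/s; bimolecular steps carry an
    additional inverse-concentration unit that is not handled in this sketch.
    """
    barrier_j_mol = 1000.0 * barrier_kj_mol
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_j_mol / (R * temperature))

k = eyring_rate_constant(100.0)
half_life = math.log(2.0) / k  # half-life of a first-order process
print(f"k = {k:.3e} 1/s, half-life = {half_life:.0f} s")
```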

Random Construction of Covariance Matrices

Covariance matrices are symmetric positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$ that fulfills the condition that its largest eigenvalue equals $\sigma^2_{A,\max}$. Defining $\sigma^2_{A,\max}$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\max}$.

1. Generate a random $(N \times N)$-dimensional matrix $P$ with elements that are uniformly sampled from $[-0.5, +0.5]$. $N$ is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an $(N \times N)$-dimensional matrix $Q = PP^{\top}$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$\Sigma_A = Q \, \frac{\langle\sigma_A\rangle^2}{\max(E_{ii})},$$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
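The three steps above can be sketched without external dependencies. Two choices here are assumptions of this example and not taken from the original implementation: the function names are hypothetical, and the largest eigenvalue is estimated by power iteration rather than by a full eigendecomposition (with NumPy available, `numpy.linalg.eigvalsh` would be the more natural choice).

```python
import random

def largest_eigenvalue(matrix, iterations=1000):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(matrix)
    v = [1.0] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        # For a unit vector v from the previous pass, |M v| approaches the largest eigenvalue.
        eigenvalue = norm
    return eigenvalue

def random_covariance(n, sigma, rng):
    """Random covariance matrix whose largest eigenvalue is approximately sigma**2."""
    # Step 1: N x N matrix P with entries sampled uniformly from [-0.5, 0.5].
    p = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    # Step 2: Q = P P^T is symmetric and positive semidefinite.
    q = [[sum(p[i][k] * p[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
    # Step 3: rescale Q so that its largest eigenvalue becomes sigma**2.
    scale = sigma ** 2 / largest_eigenvalue(q)
    return [[scale * q[i][j] for j in range(n)] for i in range(n)]

cov = random_covariance(4, 2.09, random.Random(0))
for row in cov:
    print(["%.3f" % value for value in row])
```

Since every diagonal element of a symmetric matrix is bounded by its largest eigenvalue, the variances on the diagonal of the result cannot exceed `sigma**2`, up to the accuracy of the eigenvalue estimate.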

Note that in a practical setup, we expect the user to provide $B + 1$ $\mathbf{k}$-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required as input for AutoNetGen and KiNetX and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                    CRN-X          CRN-0          CRN-B

AutoNetGen
B + 1                              25             1              6
T (K)                              298.15         298.15         298.15
μ(A⁻ − A⁺) (kJ mol⁻¹)              −25            −25            −25
σ(A⁻ − A⁺) (kJ mol⁻¹)              50             50             50
min(A‡ − A⁺/⁻) (kJ mol⁻¹)          0              0              0
max(A‡ − A⁺/⁻) (kJ mol⁻¹)          100            100            100
⟨σ_A⟩ (kJ mol⁻¹)                   2.09           2.09           2.09
L_max                              125            100            100
μ_exp(N_uni)                       5              5              5
μ_exp(N_bi)                        2              2              2
p(V_new)                           0.5            0.1            0.1

KiNetX
t_max (s)                          36954          36954          36954
ε_y (mol L⁻¹)                      0.01           0.01           0.01
G_min (mol L⁻¹)                    1.0 × 10⁻³     1.0 × 10⁻⁵     1.0 × 10⁻⁵
U                                  1000           1000           1000
γ                                  0              1              1
C                                  5              1              5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.


[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.


[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.


[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C.; et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C.; et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G.; et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S.; et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package"; Technical Report SAND-80-8003, Sandia National Labs, Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics"; Technical Report SAND-89-8009, Sandia National Labs, Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics"; Technical Report SAND-96-8216, Sandia National Labs, Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes"; Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.


[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI–a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks"; ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.


[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB"; Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.


[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.


  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

where $\mathbf{s}_n^+$ and $\mathbf{s}_n^-$ represent the $n$-th row of $\mathbf{S}^+$ and $\mathbf{S}^-$, respectively. Our DFA implementation approximates the edge flux according to

$$F_l \approx F_l^{\mathrm{DFA}} = \sum_{u=1}^{U} \left| f_l(t_{u-1/2}) \right| \cdot (t_u - t_{u-1}). \tag{19}$$

Analogous to the exact expression, the approximate vertex flux reads

$$G_n^{\mathrm{DFA}} = \left( \mathbf{s}_n^+ + \mathbf{s}_n^- \right) \mathbf{F}^{\mathrm{DFA}}. \tag{20}$$

If $G_n^{\mathrm{DFA}} < G_{\min}$, where $G_{\min}$ is a user-defined threshold, the $n$-th vertex will be removed from the kinetic model, except for the case where the $n$-th vertex represents a reactant. All edges that were connected to at least one of the removed vertices will be removed, too. After this procedure, the number of vertices and edges should have decreased significantly.

Note that in the case of a closed system, $G_n^{\mathrm{DFA}}$ approaches a finite value with increasing time, which renders $G_{\min}$ an intuitive threshold that resembles a concentration error. The assumption behind our DFA is based on this intuitive interpretation: If a reaction channel with a flux smaller than $G_{\min}$ is removed, the redistribution of this minute amount of flux (i.e., concentration) should yield a concentration error that is comparable to $G_{\min}$. Therefore, we have a means at hand that potentially allows us to control the concentration error introduced by a specific value of $G_{\min}$. Clearly, choosing a smaller value of $G_{\min}$ will increase the reliability of the DFA-based solution but will also decrease the possible degree of network reduction.
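Eqs. (19) and (20), together with the threshold test, can be sketched as follows. This is a minimal illustration under assumptions, not the KiNetX implementation: the function names and input layout are hypothetical, and $\mathbf{S}^+$ and $\mathbf{S}^-$ are taken to be $N \times L$ matrices of nonnegative stoichiometric coefficients for the two sides of each edge, a definition that lies outside this excerpt.

```python
def dfa_edge_fluxes(midpoint_rates, times):
    """Eq. (19): midpoint-rule estimate of the absolute flux through each edge.

    midpoint_rates[u][l] is the net rate f_l evaluated at the midpoint of the
    interval [times[u], times[u + 1]]; times holds the U + 1 grid points.
    """
    n_edges = len(midpoint_rates[0])
    fluxes = [0.0] * n_edges
    for u, rates in enumerate(midpoint_rates):
        dt = times[u + 1] - times[u]
        for l in range(n_edges):
            fluxes[l] += abs(rates[l]) * dt
    return fluxes

def dfa_vertex_fluxes(s_plus, s_minus, edge_fluxes):
    """Eq. (20): vertex flux as (s_n^+ + s_n^-) times the edge-flux vector."""
    return [
        sum((sp + sm) * f for sp, sm, f in zip(row_p, row_m, edge_fluxes))
        for row_p, row_m in zip(s_plus, s_minus)
    ]

def vertices_to_remove(vertex_fluxes, g_min, reactants):
    """Indices of vertices whose flux is below g_min; reactants are always kept."""
    return [n for n, g in enumerate(vertex_fluxes) if g < g_min and n not in reactants]

# Toy network A <-> B <-> C with made-up midpoint rates on the grid t = 0, 1, 3.
s_plus = [[1, 0], [0, 1], [0, 0]]    # left-hand sides: A in edge 0, B in edge 1
s_minus = [[0, 0], [1, 0], [0, 1]]   # right-hand sides: B in edge 0, C in edge 1
edge_fluxes = dfa_edge_fluxes([[0.2, 1e-7], [0.1, 1e-7]], [0.0, 1.0, 3.0])
vertex_fluxes = dfa_vertex_fluxes(s_plus, s_minus, edge_fluxes)
print(vertices_to_remove(vertex_fluxes, g_min=1e-5, reactants={0}))
```

In this toy case only vertex C falls below the threshold. The subsequent removal of all edges attached to a removed vertex is not shown.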

To determine the accuracy loss caused by a DFA, we introduce the measures

$$\delta_{pq}(t_u) = \frac{1}{2} \sum_{n=1}^{N} \left| y_{np}(t_u) - y_{nq}(t_u) \right| \tag{21}$$

and

$$\Delta_{pq} = \frac{1}{t_{\max}} \sum_{u=1}^{U} \delta_{pq}(t_{u-1/2}) \cdot (t_u - t_{u-1}), \tag{22}$$

which resemble the control quantities $\delta_B(t_u)$ and $\Delta_B$. Here, however, we compare two specific solutions, $\mathbf{y}_p(t)$ and $\mathbf{y}_q(t)$, with each other, one resulting from the original network and the other resulting from the corresponding sparse network obtained through our DFA. Note that the number of vertices differs for the original and sparse networks. Hence, to render Eq. (21) practical, we define $N$ to be the number of vertices contained in the original network and set all elements in $\mathbf{y}_{\mathrm{sparse}}(t)$ to zero that refer to vertices not contained in the sparse network. In case all of the $B + 1$ kinetic models yield both $\delta_{pq}(t_u)$-values (again, we focus on $t_u = t_U = t_{\max}$) and $\Delta_{pq}$-values that are below a user-defined concentration error, $\varepsilon_y$, KiNetX considers the DFA-based network reduction reliable. However, if there is at least one kinetic model that yields $\delta_{pq}(t_{\max}) > \varepsilon_y$ or $\Delta_{pq} > \varepsilon_y$, it depends on the user-defined confidence level, $\gamma$, whether the DFA-based network reduction is considered reliable or not. We require the confidence level to fulfill the condition $0 \le \gamma \le 1$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeds $\varepsilon_y$, the DFA-based network reduction is considered unreliable. Here, the lowest and highest possible quantiles represent the median and the maximum of both measures over all $B + 1$ samples, respectively.

Local Barrier Analysis (LBA). There are certain cases in which a DFA-based network reduction is prone to yield unreliable results. One example is autocatalysis. Imagine the following model reaction network:

2A ⇌ B (very slow)
B ⇌ C (fast)
C + A ⇌ D (fast)
D + A ⇌ E (fast)
E ⇌ 2C (fast)

Even though the first reaction step is very slow, it is required to initiate all other reaction steps. Without B, neither C nor D nor E can be formed. Even though B is obviously a very important species for the reaction mechanism, it will only be relevant at the very beginning of the reaction (i.e., until a minute amount of it has been formed). Hence, the vertex flux for B, $G_{\mathrm{B}}$, will be very small, possibly smaller than the user-defined threshold $G_{\min}$, leading to a false elimination of this species and of the first two of the five edges.

In this case, $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ will clearly exceed $\varepsilon_y$, and the LBA will start automatically. The idea behind our LBA algorithm is fairly simple: Set the rate constants of the first reaction pair to zero, which is equivalent to an infinitely high activation barrier or the removal of the corresponding edge. Repeat numerical integration and compare the solution to the original one based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Set the rate constants of the first reaction pair to their original values. Repeat the entire procedure for the second, ..., $L$-th reaction pair and for each of the $B + 1$ kinetic models.

After this procedure every reaction pair is associated with B + 1 values for δpq(tmax)

and ∆pq If the (1 + c)2 quantile of δpq(tmax) andor ∆pq is smaller than εy the cor-responding reaction pair will be considered kinetically irrelevant and removed from thenetwork All unconnected vertices will be removed too Subsequently the resulting B+1

reduced kinetic models are integrated and their solutions are compared to the originalones again based on δpq(tmax) and ∆pq If the (1 + c)2 quantile of δpq(tmax) andor ∆pq

is smaller than εy the LBA-based network reduction will be considered reliable It is stillpossible that εy is exceeded as the kinetic models we consider are generally nonlinearIf the lack of one edge does not alter the solution and the same result is obtained foranother edge it does not imply that the simultaneous lack of both edges leads to the

13

same conclusion If εy is still exceeded after the LBA-based network reduction KiNetX

will recover the original network and continue its analysis without any reduction
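The quantile-based decision used in the LBA can be sketched as follows. This is only an illustration under stated assumptions: the function names `quantile` and `irrelevant_pairs` are hypothetical, the quantile is estimated by linear interpolation between order statistics (the paper does not specify the estimator), and each list holds one error value per kinetic model, for instance the larger of the two metrics for that model.

```python
def quantile(values, q):
    """Empirical q-quantile (0 <= q <= 1) by linear interpolation."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def irrelevant_pairs(errors_by_pair, gamma, eps_y):
    """Reaction pairs whose (1 + gamma)/2 error quantile stays below eps_y."""
    level = (1.0 + gamma) / 2.0  # gamma = 0: median, gamma = 1: maximum
    return [pair for pair, errors in errors_by_pair.items()
            if quantile(errors, level) < eps_y]


# Hypothetical errors caused by switching off one reaction pair at a time,
# one value for each of B + 1 = 5 kinetic models (mol/L).
errors = {
    "2A <=> B": [0.90, 0.95, 0.92, 0.97, 0.91],
    "X <=> Y": [0.001, 0.002, 0.020, 0.003, 0.001],
}

print(irrelevant_pairs(errors, gamma=0.0, eps_y=0.01))  # ['X <=> Y']
print(irrelevant_pairs(errors, gamma=1.0, eps_y=0.01))  # []
```

With γ = 0 the median decides and the second pair is flagged as irrelevant; with γ = 1 the single outlier of 0.020 is enough to keep it, which shows how a larger confidence level makes the reduction more conservative.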

2.4.3 Sensitivity Analysis

In the following, we assume that the concentration profiles of the sparse reaction network reveal pronounced uncertainties such that it is difficult to correctly assign product ratios or to suggest a specific reaction mechanism. To estimate to which rate constants the concentration profiles are most sensitive, a sensitivity analysis is required [34]. For this purpose, the rate constants are perturbed to study how such perturbations affect the time-dependent species concentrations.

In a local sensitivity analysis, the model parameters are perturbed one by one from their nominal values (here, the elements of k_0). While a local sensitivity analysis is straightforward to conduct and computationally feasible (usually, L kinetic simulations are required), it has a significant disadvantage in that it does not take into account the correlation between the model parameters (cf. Section 2.2). Consequently, one may overlook important correlation effects on the uncertainty of concentration profiles. Note that our LBA algorithm resembles an extreme variant of a local sensitivity analysis (cf. Section 2.4.2), where each rate constant is perturbed to its minimal value.

In a global sensitivity analysis, the correlation between the model parameters is taken into account, but the process requires significantly more computational resources than a local sensitivity analysis. Furthermore, the design of a global sensitivity analysis is not as unambiguous as for a local sensitivity analysis, which explains the existence of several approaches that may require CL (Morris screening, where C is an integer usually much smaller than B), B (polynomial chaos), or 2BL (Sobol's method) numerical integrations [34].

Morris screening [57] is among the simplest of global sensitivity analyses, as it is not designed to induce a mapping between the model parameters and the model solution. Instead, its purpose is to categorize the model parameters as either important or unimportant, depending on how strongly changes in them affect the model solutions. We are particularly interested in this categorization since we aim for a descriptor that informs us about the quality of rate constants obtained from an efficient (basic) quantum chemical model. If we find that the uncertainty of some species concentrations is too large to derive sensible conclusions about specific network properties, the results of a global sensitivity analysis will support us in identifying the most critical rate constants that require a reevaluation based on a more sophisticated (benchmark) quantum chemical model.

The original version of Morris screening does not take into account the joint probability distribution of the model parameters. Here, we introduce an extended variant of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample k_0 and replaces the elements k^+_{0,1} and k^-_{0,1} with the corresponding elements of k_1, k^+_{1,1} and k^-_{1,1}. For this new vector of rate constants, k_{0,1}, a numerical integration is carried out. Subsequently, the elements k^+_{0,2} and k^-_{0,2} of k_{0,1} are replaced with the elements k^+_{1,2} and k^-_{1,2} of k_1, and a numerical integration based on the new vector, k_{0,2}, is carried out. This procedure is repeated L times until we arrive at k_{0,L} = k_1, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is repeated now by an element-wise change from k_1 to k_2 and, generally, by an element-wise change from k_b to k_{b+1} until b = C − 1 is reached. In the end, we will have generated solutions for another C(L − 1) kinetic models in addition to the B + 1 solutions obtained for K_B. We are interested in the C(L − 1) new solutions and the first C + 1 of the B + 1 solutions obtained previously, amounting to CL + 1 solutions relevant for our sensitivity analysis.
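The construction of the Morris path described above can be sketched in a few lines. The representation is an assumption made for illustration: each k-vector is a list of L tuples (k_plus, k_minus), and `morris_path` is a hypothetical helper name, not a KiNetX function.

```python
def morris_path(samples, C):
    """Chain of k-vectors k_0, k_{0,1}, ..., k_{0,L} = k_1, ..., k_C.

    samples: list of at least C + 1 k-vectors, each a list of L
             (k_plus, k_minus) tuples, one tuple per reaction pair.
    """
    L = len(samples[0])
    path = [list(samples[0])]
    for b in range(C):
        current = list(samples[b])
        for l in range(L):
            # Replace reaction pair l of k_b by the values from k_{b+1}.
            current[l] = samples[b + 1][l]
            path.append(list(current))
    return path


# Hypothetical ensemble: B + 1 = 4 samples, L = 3 reaction pairs.
samples = [[(10 * b + l, 10 * b + l + 0.5) for l in range(3)] for b in range(4)]
path = morris_path(samples, C=2)

new_vectors = [k for k in path if k not in samples]
print(len(path), len(new_vectors))  # CL + 1 = 7 vectors, C(L - 1) = 4 new ones
```

Adjacent vectors on the path differ in exactly one reaction pair, and every L-th vector coincides with a sample that has already been integrated, which is where the count of C(L − 1) additional integrations comes from.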

For each of these solutions, KiNetX determines the δ_pq(t_max) and ∆_pq metrics, where p and q represent two adjacent k-vectors as constructed by our extended version of Morris screening. CL values are obtained for each metric, C thereof for every reaction pair. We define the sensitivity coefficients z^δ_l and z^∆_l as

z_l^{\delta} = \frac{1}{C} \sum_{b=0}^{C-1} \delta_{b,l;\,b,l-1}(t_\mathrm{max})    (23)

and

z_l^{\Delta} = \frac{1}{C} \sum_{b=0}^{C-1} \Delta_{b,l;\,b,l-1}    (24)

respectively. The subscript b,l refers to sample k_{b,l}, and we define k_{b,0} = k_b. Note that in Eqs. (23) and (24), we do not divide the δ_pq(t_max) and ∆_pq metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term "conditional" relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.
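Eqs. (23) and (24) amount to a column-wise average over the C Morris samples. A minimal sketch, assuming the metric values have already been computed and are stored as a C × L nested list (the name `sensitivity_coefficients` and the numbers are hypothetical):

```python
def sensitivity_coefficients(metric):
    """Average over C Morris samples for each of the L reaction pairs.

    metric[b][l] holds the distance (delta or Delta) between the solutions
    for the two adjacent k-vectors that differ only in reaction pair l
    (0-based index) along the path from k_b to k_{b+1}.
    """
    C = len(metric)
    L = len(metric[0])
    return [sum(metric[b][l] for b in range(C)) / C for l in range(L)]


# Hypothetical metric values for C = 2 Morris samples and L = 3 reaction pairs.
delta = [
    [0.02, 0.30, 0.00],
    [0.04, 0.10, 0.00],
]
z = sensitivity_coefficients(delta)

# Rank reaction pairs from most to least critical.
ranking = sorted(range(len(z)), key=lambda l: z[l], reverse=True)
print(ranking)  # [1, 0, 2]
```

Because the perturbations are taken from actual samples of the joint distribution, no division by a step size appears here, in line with the remark following Eq. (24).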

Finally, we can set up a ranking of sensitivity coefficients. The larger z^δ_l and z^∆_l, the larger the effect of changes in k^±_l on the concentration profiles. With this ranking at hand, one can systematically improve on both the accuracy and precision of the concentration profiles to reliably suggest product distributions (based on z^δ_l) or specific reaction mechanisms (based on z^∆_l). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be infeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well-known in the reaction engineering community. It has been introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas phase reactions (in particular, combustion) [62], which has recently been extended to understand the kinetics of solution phase chemistry [63]. Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution phase chemistry. As the sensitivity of the thermal rate constant to errors in the activation free energy decreases with increasing temperature in classical rate theories, k-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take k-uncertainty into account.
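The temperature argument can be made concrete with a short calculation. The sketch assumes an exponential barrier dependence as in Eyring or Arrhenius theory, so that an error σ in the activation free energy changes the rate constant by a factor exp(σ/RT). The error of 1 kcal/mol and the combustion-like temperature of 1500 K are illustrative assumptions, not values from the paper.

```python
import math

R = 1.987204e-3  # molar gas constant in kcal mol^-1 K^-1


def uncertainty_factor(sigma_kcal, temperature):
    """Multiplicative change of k caused by a barrier error sigma (kcal/mol)."""
    return math.exp(sigma_kcal / (R * temperature))


room = uncertainty_factor(1.0, 298.15)        # solution-phase chemistry
combustion = uncertainty_factor(1.0, 1500.0)  # assumed combustion temperature

print(f"factor at 298.15 K: {room:.2f}")
print(f"factor at 1500 K:   {combustion:.2f}")
```

Under these assumptions, the same free-energy error changes the rate constant by roughly a factor of five at room temperature but by well under a factor of two at 1500 K.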

We start from the reactants (species with nonzero initial concentrations) and attempt to find all direct products, i.e., intermediates that are formed by a single elementary reaction of the reactants. The reactants are considered active, whereas the direct products are considered inactive. Only active species are considered in the exploration procedure, i.e., at this point, all possible reaction steps have been discovered (according to the exploration algorithm employed). To estimate which of the inactive species is potentially the most relevant for the mechanism, we focus on the formation rate of each inactive species (indicated by an asterisk),

\vec{g}_{n^*}(t_u) = s^{+}_{n^*} f^{-}(t_u) + s^{-}_{n^*} f^{+}(t_u)    (25)

Note that s^+_{n*} is multiplied with f^-(t) since in a backward (−) reaction, the left-hand-side species (+) are formed. An equivalent argument holds for the multiplication of s^-_{n*} with f^+(t). If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points t_u,

\vec{g}^{\,\mathrm{max}}_{n^*} = \max\left\{ \vec{g}_{n^*}(t_0), \vec{g}_{n^*}(t_1), \ldots, \vec{g}_{n^*}(t_U) \right\}    (26)

The species with the highest value of \vec{g}^max_{n*} will be promoted active. Note that if an ensemble of k-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

\eta\left( \vec{g}^{\,\mathrm{max}}_{n^*} \right) = \sqrt{ \left\langle \vec{g}^{\,\mathrm{max}}_{n^*} \right\rangle^{2} + \frac{1}{B-1} \sum_{b=1}^{B} \left( \vec{g}^{\,\mathrm{max}}_{n^*,b} - \left\langle \vec{g}^{\,\mathrm{max}}_{n^*} \right\rangle \right)^{2} }    (27)

that takes into account both the ensemble average of maximum formation rates,

\left\langle \vec{g}^{\,\mathrm{max}}_{n^*} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \vec{g}^{\,\mathrm{max}}_{n^*,b}    (28)

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
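The ranking criterion of Eqs. (27) and (28) can be sketched with the Python standard library. The function name `eta` and the example rates are hypothetical; `statistics.variance` uses the divisor B − 1, matching Eq. (27).

```python
import math
import statistics


def eta(max_rates):
    """Ranking score: sqrt(mean^2 + sample variance) of maximum formation rates."""
    mean = statistics.fmean(max_rates)      # Eq. (28)
    var = statistics.variance(max_rates)    # divisor B - 1, as in Eq. (27)
    return math.sqrt(mean ** 2 + var)


# Hypothetical maximum formation rates of two inactive species over B = 4 models.
species_x = [1.0, 1.0, 1.0, 1.0]   # same mean, no spread
species_y = [0.0, 2.0, 0.0, 2.0]   # same mean, large spread

scores = {"X": eta(species_x), "Y": eta(species_y)}
promoted = max(scores, key=scores.get)
print(promoted)  # Y
```

Both species share the same ensemble average, so the variance term alone decides the ranking and the high-variance species is promoted first.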

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of \vec{g}^max_{n*} that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold: In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between t = 0 and t = t_max) are being generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

• The number of samples, B, to be drawn from Σ_A.

• A thermostat temperature, T, for the calculation of rate constants.

• The average free-energy increase/decrease, μ(A^- − A^+), for a new intermediate state (−) formed by reaction of an intermediate state already present in the network (+).

• The average difference between the free energies of a left-hand-side intermediate state (+) and its corresponding right-hand-side intermediate state (−), σ(A^- − A^+).

• A minimum activation free energy, min(A^‡ − A^±), with respect to the higher-energy intermediate state.

• A maximum activation free energy, max(A^‡ − A^±), with respect to the higher-energy intermediate state.

• The average free-energy uncertainty, ⟨σ_A⟩ (required for generating an ensemble of kinetic models).

• The maximum number of edges, L_max, to be generated.

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of N = 103 vertices and L = 118 edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L⁻¹ and 0.50 mol L⁻¹, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of B + 1 = 25 k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol⁻¹. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol⁻¹. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature T = 298.15 K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.
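How rate constants follow from free energies via Eyring's equation, k = (k_B T / h) exp(−ΔA‡ / RT), can be sketched as below. The helper names `eyring` and `rate_pair`, the unit transmission coefficient, and the example free energies are assumptions for illustration; for bimolecular steps a standard-state concentration factor would additionally enter, which is omitted here.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J K^-1
H = 6.62607015e-34   # Planck constant, J s
R = 1.987204e-3      # molar gas constant, kcal mol^-1 K^-1


def eyring(barrier_kcal, temperature=298.15):
    """Eyring rate constant in s^-1 (unimolecular, transmission coefficient 1)."""
    return K_B * temperature / H * math.exp(-barrier_kcal / (R * temperature))


def rate_pair(a_plus, a_minus, a_ts, temperature=298.15):
    """Forward and backward rate constants from absolute free energies (kcal/mol)."""
    return eyring(a_ts - a_plus, temperature), eyring(a_ts - a_minus, temperature)


# Hypothetical reaction pair: left-hand side at 0, right-hand side at -3,
# transition state at 15 kcal/mol.
k_fwd, k_bwd = rate_pair(a_plus=0.0, a_minus=-3.0, a_ts=15.0)

# Shifting the transition state by 1.1 kcal/mol, the average
# activation free-energy uncertainty quoted above.
k_fwd_shifted, _ = rate_pair(a_plus=0.0, a_minus=-3.0, a_ts=16.1)

print(f"k+ = {k_fwd:.3e} s^-1, k- = {k_bwd:.3e} s^-1")
print(f"k+ changes by a factor of {k_fwd / k_fwd_shifted:.1f}")
```

In this sketch a barrier shift of 1.1 kcal mol⁻¹ changes the forward rate constant by roughly a factor of six at 298.15 K, which indicates why the 25 solutions can differ so strongly.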

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L⁻¹ during the course of reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at t = t_max. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L⁻¹ and 0.9 mol L⁻¹, the observable value of which will in turn affect the concentrations of the potential side products 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of ε_y = 0.01 mol L⁻¹, we find that a DFA-based network reduction with G_min = 1.0 × 10⁻³ mol L⁻¹ is reliable with respect to both δ_pq(t_max) and ∆_pq. Here, we chose the minimum confidence level of γ = 0 for the sake of clarity. The smaller γ, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose G_min to be one order of magnitude smaller than ε_y. The reason is that G_min is assessed on a species-by-species basis, whereas ε_y is compared against the quantities δ_pq(t_max) and ∆_pq, which measure the concentration error summed over all species. One may resolve this situation by dividing G_min by N, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For C = 5 Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for δ_pq(t_max) and/or ∆_pq exceeding ε_y with respect to the quantile specified by γ. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product with a final concentration of about 0.8 mol L⁻¹. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L⁻¹, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval t_u − t_{u−1}, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42, the other dominant side product, via 30 and 14. In summary, the dominant reaction pathways read:

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than ε_y, as measured by δ_pq(t_max) and ∆_pq. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks explored without kinetic guidance is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol⁻¹ with a standard deviation of 0.4 kcal mol⁻¹. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on k_0. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several instead of a single solution naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to 1.0 × 10⁻¹⁷ mol L⁻¹ s⁻¹ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, t_max), we obtain a flux of 3.7 × 10⁻¹³ mol L⁻¹, which is very small compared to ε_y = 0.01 mol L⁻¹. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.

Table 1: Mean values and standard deviations of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                      μ(ζ_CRN-0)   μ(ζ_CRN-B)   σ(ζ_CRN-0)   σ(ζ_CRN-B)
exploration steps               12           14           7            7
critical reactions              5            10           4            7
L_expl                          54           60           26           27
N_expl                          29           31           11           11
L_red                           30           37           18           21
N_red                           17           20           8            8
δ^B_expl(t_max) / mol L⁻¹       —            0.14         —            0.13
∆^B_expl / mol L⁻¹              —            0.15         —            0.13
δ^B_refined(t_max) / mol L⁻¹    —            <0.01        —            0.01
∆^B_refined / mol L⁻¹           —            <0.01        —            0.01

DFA-based network reduction with a flux threshold of G_min = 1.0 × 10⁻⁵ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument of the last paragraph, the consideration of several solutions increases the likelihood to identify kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of max{δ^B(t_max), ∆^B} from 0.15 mol L⁻¹ to < 0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all B + 1 = 6 samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
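One way to read the remark on absolute versus relative free energies is through a closed reaction cycle, for which the reaction free energies must sum to zero. The following sketch is our interpretation, not the KiNetX implementation: it uses a hypothetical three-species cycle A → B → C → A with made-up absolute free energies (kcal/mol) in three sampled models and omits transition states, since barriers do not enter the cycle sum.

```python
# Hypothetical absolute free energies of three species in three sampled models.
samples = [
    {"A": 0.0, "B": -2.0, "C": -1.0},
    {"A": 0.4, "B": -2.6, "C": -0.7},
    {"A": -0.3, "B": -1.5, "C": -1.4},
]
cycle = [("A", "B"), ("B", "C"), ("C", "A")]


def reaction_energies(vertex_energies):
    """Reaction free energies along the cycle from absolute free energies."""
    return {(p, q): vertex_energies[q] - vertex_energies[p] for p, q in cycle}


# (a) Refine reaction A -> B via ABSOLUTE energies: replace A and B by their
#     ensemble means in every model; all reactions touching A or B follow.
mean = {s: sum(x[s] for x in samples) / len(samples) for s in ("A", "B")}
refined_absolute = [reaction_energies({**x, **mean}) for x in samples]

# (b) Refine only the RELATIVE energy of A -> B and leave B -> C and C -> A
#     at their sample-specific values.
refined_relative = []
for x in samples:
    energies = reaction_energies(x)
    energies[("A", "B")] = mean["B"] - mean["A"]
    refined_relative.append(energies)

cycle_sums_absolute = [sum(e.values()) for e in refined_absolute]
cycle_sums_relative = [sum(e.values()) for e in refined_relative]
print(cycle_sums_absolute)  # zero up to rounding
print(cycle_sums_relative)  # nonzero: thermodynamically inconsistent cycle
```

Variant (a) keeps every model thermodynamically consistent, whereas variant (b) produces cycles with a nonzero net free-energy change, which is incompatible with microscopic reversibility.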

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of max{δ_pq(t_max), ∆_pq} comparing the CRN-0 solutions against the "exact" solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX for a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge to generate a multitude of reaction networks for covering a wide chemical spectrum, which is very time-consuming regarding the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered an additional number of 5 kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be questionable to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, highly noise-inducing reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may be strongly dependent on the underlying graph structure and the distribution of activation free energies and rate constants, as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation.

There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time t_max, which we set to the half life of a unimolecular rate constant corresponding to a barrier height of 100 kJ mol⁻¹. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (association and substitution). AutoNetGen requires the specification of reactants including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the $(x+1)$-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the $x$-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_{\mathrm{exp}}$. The user can specify two different values for $\mu_{\mathrm{exp}}$, one for unimolecular reactions, $\mu_{\mathrm{exp}}^{\mathrm{uni}}$, and one for bimolecular reactions, $\mu_{\mathrm{exp}}^{\mathrm{bi}}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_{\mathrm{new}})$, ranging from zero to one. If $p(V_{\mathrm{new}}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_{\mathrm{new}}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), $2L$ activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (which comprises either one or two vertices) is obtained by sampling a free-energy difference from a normal distribution with mean $\mu(A^{-} - A^{+})$ and standard deviation $\sigma(A^{-} - A^{+})$ and adding it to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained by sampling a barrier from a uniform distribution with bounds $\min(A^{\ddagger} - A^{+/-})$ and $\max(A^{\ddagger} - A^{+/-})$ and adding it to the free energy of the higher-energy side of that edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
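Steps 1 and 2 above can be illustrated with a minimal sketch. It is not the AutoNetGen implementation; the function names `count_reactive_states` and `sample_channel_count` are hypothetical and only mirror the counting rule and the floor-of-an-exponential draw described in the list.

```python
import math
import random


def count_reactive_states(n_total, n_new):
    """Count potentially reactive intermediate states for the next layer.

    n_total: number of vertices N registered so far.
    n_new:   number of vertices N_x in the most recent layer.
    """
    return {
        "unimolecular": n_new,                            # A -> P
        "homobimolecular": n_new,                         # 2A -> P
        "hetero_new_new": n_new * (n_new - 1) // 2,       # A + B -> P, both new
        "hetero_new_old": n_new * (n_total - n_new),      # A + B -> P, new with old
    }


def sample_channel_count(mean, rng):
    """Floor of an exponential draw with the given mean (step 2)."""
    # random.Random.expovariate takes the rate, i.e. 1 / mean.
    return math.floor(rng.expovariate(1.0 / mean))


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed, chosen only for reproducibility
    print(count_reactive_states(n_total=5, n_new=2))
    print([sample_channel_count(5.0, rng) for _ in range(5)])
```

Because the floor is taken, a draw below one yields zero channels, so a reactive intermediate state may remain unreacted in a given layer.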

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

\[
k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right), \tag{29}
\]

where $h$, $k_\mathrm{B}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
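As a quick numerical illustration of Eq. (29), the following sketch evaluates the Eyring rate constant for a unimolecular step and the corresponding half-life. The function names are hypothetical, the constants are the exact SI values, and the treatment is restricted to first-order rate constants in s$^{-1}$ (an assumption; bimolecular steps would additionally require a standard concentration).

```python
import math

K_B = 1.380649e-23    # Boltzmann constant in J/K
H = 6.62607015e-34    # Planck constant in J s
R = 8.314462618       # gas constant in J/(mol K)


def eyring_rate_constant(barrier_kj_per_mol, temperature_k):
    """First-order Eyring rate constant (1/s) for an activation free energy in kJ/mol."""
    barrier_j_per_mol = barrier_kj_per_mol * 1.0e3
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_j_per_mol / (R * temperature_k))


def half_life(rate_constant):
    """Half-life (s) of a first-order process."""
    return math.log(2.0) / rate_constant


if __name__ == "__main__":
    k = eyring_rate_constant(100.0, 298.15)
    print(f"k = {k:.3e} 1/s, half-life = {half_life(k):.0f} s")
```

For a barrier of 100 kJ mol$^{-1}$ at 298.15 K, the half-life obtained this way is on the order of $3.7 \times 10^{4}$ s, in line with the value of $t_{\max}$ listed in Table 2.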

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$ which fulfills the condition that its largest eigenvalue equals $\sigma^2_{A,\max}$. Defining $\sigma^2_{A,\max}$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\max}$.

1. Generate a random $(N \times N)$-dimensional matrix $P$ with elements that are uniformly sampled from $[-0.5, +0.5]$. $N$ is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an $(N \times N)$-dimensional matrix $Q = PP^{\top}$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

\[
\Sigma_A = Q \, \frac{\langle\sigma_A\rangle^2}{\max\{E_{ii}\}},
\]

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
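The three-step recipe can be sketched with NumPy as follows. The function names are hypothetical, and drawing the free-energy samples from a multivariate normal distribution is an assumption made here for illustration; the recipe itself only fixes the covariance matrix.

```python
import numpy as np


def random_covariance(n_vertices, sigma_max, rng):
    """Random covariance matrix whose largest eigenvalue equals sigma_max**2."""
    p = rng.uniform(-0.5, 0.5, size=(n_vertices, n_vertices))  # step 1
    q = p @ p.T                                                # step 2: symmetric, PSD
    largest = np.linalg.eigvalsh(q).max()
    return q * sigma_max**2 / largest                          # step 3: rescaling


def sample_free_energies(nominal, covariance, n_samples, rng):
    """Draw correlated free-energy samples (assumed multivariate normal)."""
    return rng.multivariate_normal(nominal, covariance, size=n_samples)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    cov = random_covariance(n_vertices=4, sigma_max=2.09, rng=rng)
    samples = sample_free_energies(np.zeros(4), cov, n_samples=5, rng=rng)
    print(np.linalg.eigvalsh(cov).max(), samples.shape)
```

Since every diagonal element of a positive-semidefinite matrix is bounded by its largest eigenvalue, no individual variance exceeds `sigma_max**2` after the rescaling.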

Note that in a practical setup, we expect the user to provide $B + 1$ k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                                        CRN-X               CRN-0               CRN-B

AutoNetGen
$B + 1$                                                25                  1                   6
$T$ (K)                                                298.15              298.15              298.15
$\mu(A^{-} - A^{+})$ (kJ mol$^{-1}$)                   −25                 −25                 −25
$\sigma(A^{-} - A^{+})$ (kJ mol$^{-1}$)                50                  50                  50
$\min(A^{\ddagger} - A^{+/-})$ (kJ mol$^{-1}$)         0                   0                   0
$\max(A^{\ddagger} - A^{+/-})$ (kJ mol$^{-1}$)         100                 100                 100
$\langle\sigma_A\rangle$ (kJ mol$^{-1}$)               2.09                2.09                2.09
$L_{\max}$                                             125                 100                 100
$\mu_{\mathrm{exp}}(N_{\mathrm{uni}})$                 5                   5                   5
$\mu_{\mathrm{exp}}(N_{\mathrm{bi}})$                  2                   2                   2
$p(V_{\mathrm{new}})$                                  0.5                 0.1                 0.1

KiNetX
$t_{\max}$ (s)                                         36954               36954               36954
$\varepsilon_y$ (mol L$^{-1}$)                         0.01                0.01                0.01
$G_{\min}$ (mol L$^{-1}$)                              1.0 × 10$^{-3}$     1.0 × 10$^{-5}$     1.0 × 10$^{-5}$
$U$                                                    1000                1000                1000
$\gamma$                                               0                   1                   1
$C$                                                    5                   1                   5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.


[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.


[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.


[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package"; Technical Report SAND–80-8003, Sandia National Labs, Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics"; Technical Report SAND-89-8009, Sandia National Labs, Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics"; Technical Report SAND-96-8216, Sandia National Labs, Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.


[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI - a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.


[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.


[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.


  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

or $\Delta_{pq} > \varepsilon_y$, it depends on the user-defined confidence level $\gamma$ whether the DFA-based network reduction is considered reliable or not. We require the confidence level to fulfill the condition $0 \le \gamma \le 1$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeds $\varepsilon_y$, the DFA-based network reduction is considered unreliable. Here, the lowest and highest possible quantiles represent the median and the maximum of both measures over all $B + 1$ samples, respectively.

Local Barrier Analysis (LBA). There are certain cases in which a DFA-based network reduction is prone to yield unreliable results. One example is autocatalysis. Imagine the following model reaction network:

2A ⇌ B (very slow)
B ⇌ C (fast)
C + A ⇌ D (fast)
D + A ⇌ E (fast)
E ⇌ 2C (fast)

Even though the first reaction step is very slow, it is required to initiate all other reaction steps. Without B, neither C nor D nor E can be formed. Even though B is obviously a very important species for the reaction mechanism, it will only be relevant at the very beginning of the reaction (i.e., until a minute amount of it has been formed). Hence, the vertex flux for B, $G_\mathrm{B}$, will be very small, possibly smaller than the user-defined threshold $G_{\min}$, leading to a false elimination of this species and the second of the five edges.

In this case, $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ will clearly exceed $\varepsilon_y$, and the LBA will start automatically. The idea behind our LBA algorithm is fairly simple: Set the rate constants of the first reaction pair to zero, which is equivalent to an infinitely high activation barrier or the removal of the corresponding edge. Repeat the numerical integration and compare the solution to the original one based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Set the rate constants of the first reaction pair back to their original values. Repeat the entire procedure for the second, ..., $L$-th reaction pair and for each of the $B + 1$ kinetic models.

After this procedure, every reaction pair is associated with $B + 1$ values for $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the corresponding reaction pair will be considered kinetically irrelevant and removed from the network. All unconnected vertices will be removed, too. Subsequently, the resulting $B + 1$ reduced kinetic models are integrated, and their solutions are compared to the original ones, again based on $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. If the $(1 + \gamma)/2$ quantile of $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ is smaller than $\varepsilon_y$, the LBA-based network reduction will be considered reliable. It is still possible that $\varepsilon_y$ is exceeded, as the kinetic models we consider are generally nonlinear: If the lack of one edge does not alter the solution and the same result is obtained for another edge, it does not imply that the simultaneous lack of both edges leads to the same conclusion. If $\varepsilon_y$ is still exceeded after the LBA-based network reduction, KiNetX will recover the original network and continue its analysis without any reduction.
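The quantile criterion shared by the DFA- and LBA-based reliability checks can be sketched as follows. This is a minimal illustration, not KiNetX code: the function name `reduction_is_reliable` is hypothetical, and `deviations` stands for the $B + 1$ values of either $\delta_{pq}(t_{\max})$ or $\Delta_{pq}$, which are assumed to be available from previous integrations.

```python
import numpy as np


def reduction_is_reliable(deviations, epsilon_y, gamma):
    """Check whether the (1 + gamma)/2 quantile of the deviations stays within epsilon_y.

    gamma = 0 evaluates the median, gamma = 1 evaluates the maximum.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    quantile = np.quantile(np.asarray(deviations, dtype=float), (1.0 + gamma) / 2.0)
    return bool(quantile <= epsilon_y)


if __name__ == "__main__":
    values = [0.001, 0.002, 0.05]  # illustrative deviations for B + 1 = 3 models
    print(reduction_is_reliable(values, epsilon_y=0.01, gamma=0.0))
    print(reduction_is_reliable(values, epsilon_y=0.01, gamma=1.0))
```

With these illustrative numbers, the median-based check ($\gamma = 0$) accepts the reduction, whereas the maximum-based check ($\gamma = 1$) rejects it because a single ensemble member exceeds the threshold.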

2.4.3 Sensitivity Analysis

In the following, we assume that the concentration profiles of the sparse reaction network reveal pronounced uncertainties such that it is difficult to correctly assign product ratios or to suggest a specific reaction mechanism. To estimate to which rate constants the concentration profiles are most sensitive, a sensitivity analysis is required [34]. For this purpose, the rate constants are perturbed to study how such perturbations affect the time-dependent species concentrations.

In a local sensitivity analysis, the model parameters are perturbed one by one from their nominal values (here, the elements of $\mathbf{k}_0$). While a local sensitivity analysis is straightforward to conduct and computationally feasible (usually, $L$ kinetic simulations are required), it has a significant disadvantage in that it does not take into account the correlation between the model parameters (cf. Section 2.2). Consequently, one may overlook important correlation effects on the uncertainty of concentration profiles. Note that our LBA algorithm resembles an extreme variant of a local sensitivity analysis (cf. Section 2.4.2), where each rate constant is perturbed to its minimal value.

In a global sensitivity analysis, the correlation between the model parameters is taken into account, but the process requires significantly more computational resources than a local sensitivity analysis. Furthermore, the design of a global sensitivity analysis is not as unambiguous as for a local sensitivity analysis, which explains the existence of several approaches that may require $CL$ (Morris screening, where $C$ is an integer usually much smaller than $B$), $B$ (polynomial chaos), or $2BL$ (Sobol's method) numerical integrations [34].

Morris screening [57] is among the simplest of global sensitivity analyses, as it is not designed to induce a mapping between the model parameters and the model solution. Instead, its purpose is to categorize the model parameters as either important or unimportant depending on how strongly changes in them affect the model solutions. We are particularly interested in this categorization since we aim for a descriptor that informs us about the quality of rate constants obtained from an efficient (basic) quantum chemical model. If we find that the uncertainty of some species concentrations is too large to derive sensible conclusions about specific network properties, the results of a global sensitivity analysis will support us in identifying the most critical rate constants that require a reevaluation based on a more sophisticated (benchmark) quantum chemical model.

The original version of Morris screening does not take into account the joint probability distribution of the model parameters. Here, we introduce an extended variant of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample $\mathbf{k}_0$ and replaces the elements $k^{+}_{0,1}$ and $k^{-}_{0,1}$ with the corresponding elements of $\mathbf{k}_1$, $k^{+}_{1,1}$ and $k^{-}_{1,1}$. For this new vector of rate constants, $\mathbf{k}_{0,1}$, a numerical integration is carried out. Subsequently, the elements $k^{+}_{0,2}$ and $k^{-}_{0,2}$ of $\mathbf{k}_{0,1}$ are replaced with the elements $k^{+}_{1,2}$ and $k^{-}_{1,2}$ of $\mathbf{k}_1$, and a numerical integration based on the new vector $\mathbf{k}_{0,2}$ is carried out. This procedure is repeated $L$ times until we arrive at $\mathbf{k}_{0,L} = \mathbf{k}_1$, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is now repeated by an element-wise change from $\mathbf{k}_1$ to $\mathbf{k}_2$ and, generally, by an element-wise change from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ until $b = C - 1$ is reached. In the end, we will have generated solutions for another $C(L - 1)$ kinetic models in addition to the $B + 1$ solutions obtained for $\mathbf{K}_B$. We are interested in the $C(L - 1)$ new solutions and the first $C + 1$ of the $B + 1$ solutions obtained previously, amounting to $CL + 1$ solutions relevant for our sensitivity analysis.
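The construction of the intermediate k-vectors can be sketched as follows. This is an illustration under stated assumptions rather than the KiNetX implementation: the function name `morris_path` is hypothetical, and the ensemble is assumed to be stored as an array of shape $(B + 1, L, 2)$ holding the forward and backward rate constant of each reaction pair.

```python
import numpy as np


def morris_path(k_samples, n_paths):
    """Build the intermediate k-vectors of the extended Morris screening.

    k_samples: array of shape (B + 1, L, 2); k_samples[b, l] holds the forward
               and backward rate constant of reaction pair l in sample b.
    n_paths:   number C of element-wise transitions k_b -> k_(b+1).

    Returns a list of ((b, l), vector) entries, where vector is k_b with the
    first l reaction pairs replaced by those of k_(b+1). The fully replaced
    vector equals k_(b+1) and is therefore not generated again.
    """
    n_models, n_pairs, _ = k_samples.shape
    if n_paths > n_models - 1:
        raise ValueError("C must not exceed B")
    new_vectors = []
    for b in range(n_paths):
        current = k_samples[b].copy()
        for l in range(n_pairs - 1):
            current[l] = k_samples[b + 1, l]  # swap in one reaction pair
            new_vectors.append(((b, l + 1), current.copy()))
    return new_vectors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ensemble = rng.uniform(size=(4, 3, 2))      # B + 1 = 4 samples, L = 3 pairs
    path = morris_path(ensemble, n_paths=2)     # C = 2
    print(len(path))                            # C * (L - 1) new kinetic models
```

Each returned vector would then be passed to the numerical integrator; replacing both rate constants of a pair at once keeps the forward and backward directions of a reaction consistent with a single ensemble member.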

For each of these solutions, KiNetX determines the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics, where $p$ and $q$ represent two adjacent k-vectors as constructed by our extended version of Morris screening. $CL$ values are obtained for each metric, $C$ thereof for every reaction pair. We define the sensitivity coefficients $z^{\delta}_l$ and $z^{\Delta}_l$ as

\[
z^{\delta}_l = \frac{1}{C} \sum_{b=0}^{C-1} \delta_{bl,\,bl-1}(t_{\max}) \tag{23}
\]

and

\[
z^{\Delta}_l = \frac{1}{C} \sum_{b=0}^{C-1} \Delta_{bl,\,bl-1}, \tag{24}
\]

respectively. The subscript $bl$ refers to sample $\mathbf{k}_{b,l}$, and we define $\mathbf{k}_{b,0} = \mathbf{k}_b$. Note that in Eqs. (23) and (24), we do not divide the $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$ metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term conditional relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.

Finally, we can set up a ranking of sensitivity coefficients. The larger $z^{\delta}_l$ and $z^{\Delta}_l$, the larger the effect of changes in $k^{+/-}_l$ on the concentration profiles. With this ranking at hand, one can systematically improve on both the accuracy and precision of the concentration profiles to reliably suggest product distributions (based on $z^{\delta}_l$) or specific reaction mechanisms (based on $z^{\Delta}_l$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.
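The averaging in Eqs. (23) and (24) and the subsequent ranking can be illustrated with a short sketch. The function name `sensitivity_ranking` is hypothetical, and the metric values are assumed to be arranged as a $(C \times L)$ array with one row per transition $\mathbf{k}_b \to \mathbf{k}_{b+1}$ and one column per reaction pair.

```python
import numpy as np


def sensitivity_ranking(metric_values):
    """Average a (C, L) array of metric values over C and rank the L reaction pairs.

    Returns the sensitivity coefficients z_l and the pair indices sorted from
    the most to the least influential reaction pair.
    """
    values = np.asarray(metric_values, dtype=float)
    z = values.mean(axis=0)                 # (1/C) * sum over b
    order = np.argsort(-z, kind="stable")   # descending, ties keep input order
    return z, order


if __name__ == "__main__":
    # Illustrative values for C = 2 transitions and L = 3 reaction pairs.
    deltas = [[0.1, 0.5, 0.0],
              [0.3, 0.1, 0.0]]
    z, order = sensitivity_ranking(deltas)
    print(z, order)
```

The same function applies to either metric; the reaction pairs at the top of the ranking are the candidates for a reevaluation with a benchmark quantum chemical model.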

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be unfeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well known in the reaction engineering community. It has been introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas-phase reactions (in particular, combustion) [62], which has recently been extended to understand the kinetics of solution-phase chemistry [63]. Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution-phase chemistry. As the sensitivity of the thermal rate constant to errors in the activation free energy decreases with increasing temperature in classical rate theories, k-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take k-uncertainty into account.

We start from the reactants (species with nonzero initial concentrations) and attempt to find all direct products, i.e., intermediates that are formed by a single elementary reaction of the reactants. The reactants are considered active, whereas the direct products are considered inactive. Only active species are considered in the exploration procedure, i.e., at this point, all possible reaction steps have been discovered (according to the exploration algorithm employed). To estimate which of the inactive species is potentially the most relevant for the mechanism, we focus on the formation rate of each inactive species (indicated by an asterisk),

\[
\vec{g}_{n^*}(t_u) = s^{+}_{n^*} f^{-}(t_u) + s^{-}_{n^*} f^{+}(t_u). \tag{25}
\]

Note that $s^{+}_{n^*}$ is multiplied with $f^{-}(t)$ since in a backward ($-$) reaction, the left-hand-side species ($+$) are formed. An equivalent argument holds for the multiplication of $s^{-}_{n^*}$ with $f^{+}(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

\[
\vec{g}^{\,\max}_{n^*} = \max\left\{ \vec{g}_{n^*}(t_0), \vec{g}_{n^*}(t_1), \ldots, \vec{g}_{n^*}(t_U) \right\}. \tag{26}
\]

The species with the highest value of $\vec{g}^{\,\max}_{n^*}$ will be promoted to active. Note that if an ensemble of k-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

\[
\eta\left(\vec{g}^{\,\max}_{n^*}\right) = \sqrt{ \left\langle \vec{g}^{\,\max}_{n^*} \right\rangle^2 + \frac{1}{B - 1} \sum_{b=1}^{B} \left( \vec{g}^{\,\max}_{n^*,b} - \left\langle \vec{g}^{\,\max}_{n^*} \right\rangle \right)^2 }, \tag{27}
\]

that takes into account both the ensemble average of maximum formation rates,

\[
\left\langle \vec{g}^{\,\max}_{n^*} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \vec{g}^{\,\max}_{n^*,b}, \tag{28}
\]

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
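A small sketch of the ranking statistic in Eqs. (27) and (28) follows. The function names are hypothetical, and the maximum formation rates per ensemble member are assumed to have been extracted from the integrated kinetic models beforehand.

```python
import numpy as np


def exploration_score(max_rates):
    """Square root of squared ensemble mean plus sample variance, as in Eq. (27)."""
    rates = np.asarray(max_rates, dtype=float)
    mean = rates.mean()                                      # Eq. (28)
    variance = rates.var(ddof=1) if rates.size > 1 else 0.0  # 1/(B - 1) normalization
    return float(np.sqrt(mean**2 + variance))


def next_species_to_activate(max_rates_by_species):
    """Return the inactive species with the largest score."""
    return max(max_rates_by_species,
               key=lambda name: exploration_score(max_rates_by_species[name]))


if __name__ == "__main__":
    # Illustrative maximum formation rates for two inactive species.
    candidates = {"X": [2.0, 2.0, 2.0], "Y": [1.0, 3.0, 2.0]}
    print(next_species_to_activate(candidates))
```

Both candidates in the example share the same ensemble mean; the species with the larger spread obtains the higher score and is promoted, which reflects the preference for high-variance cases stated above.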

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\vec{g}^{\,\max}_{n^*}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold. In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between $t = 0$ and $t = t_{\max}$) are generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

• The number of samples, B, to be drawn from ΣA.

• A thermostat temperature, T, for the calculation of rate constants.

• The average free-energy increase/decrease, μ(A⁻ − A⁺), for a new intermediate state (−) formed by reaction of an intermediate state already present in the network (+).

• The spread of the free-energy difference between a left-hand-side intermediate state (+) and its corresponding right-hand-side intermediate state (−), σ(A⁻ − A⁺), which serves as the standard deviation of the normal distribution described in the Appendix.

• A minimum activation free energy, min(A‡ − A⁺/⁻), with respect to the higher-energy intermediate state.

• A maximum activation free energy, max(A‡ − A⁺/⁻), with respect to the higher-energy intermediate state.

• The average free-energy uncertainty, 〈σA〉 (required for generating an ensemble of kinetic models).

• The maximum number of edges, Lmax, to be generated.

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of N = 103 vertices and L = 118 edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L⁻¹ and 0.50 mol L⁻¹, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of B + 1 = 25 k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol⁻¹. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol⁻¹. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature T = 298.15 K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L⁻¹ during the course of the reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at t = tmax. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L⁻¹ and 0.9 mol L⁻¹, the observable value of which will in turn affect the concentrations of the potential side products 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of εy = 0.01 mol L⁻¹, we find that a DFA-based network reduction with Gmin = 1.0 × 10⁻³ mol L⁻¹ is reliable with respect to both δpq(tmax) and ∆pq. Here, we chose the minimum confidence level of γ = 0 for the sake of clarity. The smaller γ, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose Gmin to be one order of magnitude smaller than εy. The reason is that Gmin is assessed on a species-by-species basis, whereas εy is compared against the quantities δpq(tmax) and ∆pq, which measure the concentration error summed over all species. One may resolve this situation by dividing Gmin by N, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which consists of only 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA analysis varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X, consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For C = 5 Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for δpq(tmax) and/or ∆pq exceeding εy with respect to the quantile specified by γ. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L⁻¹. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L⁻¹, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval t_u − t_(u−1), we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42, the other dominant side product, via 30 and 14. In summary, the dominant reaction pathways read:

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than εy, as measured by δpq(tmax) and ∆pq. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.
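Both percentages can be checked by counting network elements (vertices plus edges) relative to the full CRN-X network with 103 + 118 = 221 elements. The following is a back-of-the-envelope sketch under the assumption that "network reduction" refers to the fraction of removed vertices and edges taken together:

$$
1 - \frac{63 + 68}{103 + 118} = 1 - \frac{131}{221} \approx 0.41,
\qquad
1 - \frac{18 + 21}{103 + 118} = 1 - \frac{39}{221} \approx 0.82 .
$$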

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol⁻¹ with a standard deviation of 0.4 kcal mol⁻¹. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on k0. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

Table 1: Mean values and standard deviations of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                     μ(ζCRN-0)   μ(ζCRN-B)   σ(ζCRN-0)   σ(ζCRN-B)
exploration steps              12          14          7           7
critical reactions             5           10          4           7
Lexpl                          54          60          26          27
Nexpl                          29          31          11          11
Lred                           30          37          18          21
Nred                           17          20          8           8
δB_expl(tmax) (mol L⁻¹)        -           0.14        -           0.13
∆B_expl (mol L⁻¹)              -           0.15        -           0.13
δB_refined(tmax) (mol L⁻¹)     -           <0.01       -           0.01
∆B_refined (mol L⁻¹)           -           <0.01       -           0.01

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several solutions instead of a single one naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to 1.0 × 10⁻¹⁷ mol L⁻¹ s⁻¹ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, tmax), we obtain a flux of 3.7 × 10⁻¹³ mol L⁻¹, which is very small compared to εy = 0.01 mol L⁻¹. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.
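As a quick check of the flux quoted above, the minimum formation rate can be multiplied by the reaction time listed in Table 2 (tmax = 36954 s); treating the rate as constant over the whole interval is an upper-bound simplification:

$$
1.0 \times 10^{-17}\ \mathrm{mol\,L^{-1}\,s^{-1}} \times 36954\ \mathrm{s}
\approx 3.7 \times 10^{-13}\ \mathrm{mol\,L^{-1}} \ll \varepsilon_y = 10^{-2}\ \mathrm{mol\,L^{-1}} .
$$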

DFA-based network reduction with a flux threshold of Gmin = 1.0 × 10⁻⁵ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument in the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of max{δB(tmax), ∆B} from 0.15 mol L⁻¹ to < 0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all B + 1 = 6 samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
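The point about absolute versus relative free energies can be illustrated with a small sketch. The numbers, the tuple layout, and the helper `barriers` are hypothetical and chosen only for illustration; this is not code from KiNetX or AutoNetGen.

```python
from statistics import mean

def barriers(a_left, a_ts, a_right):
    """Forward and backward activation free energies of one edge."""
    return a_ts - a_left, a_ts - a_right

# Hypothetical absolute free energies (kJ/mol) of one reaction in three sampled
# models: (left-hand side, transition state, right-hand side).
samples = [(0.0, 62.0, -20.0), (1.5, 58.0, -24.0), (-1.5, 66.0, -19.0)]

# Mimicked refinement: average the ABSOLUTE free energies over all samples.
a_left = mean(s[0] for s in samples)
a_ts = mean(s[1] for s in samples)
a_right = mean(s[2] for s in samples)
fwd, bwd = barriers(a_left, a_ts, a_right)

# Both barriers derive from the same three energies, so their difference
# equals the reaction free energy (thermodynamic consistency).
reaction_free_energy = a_right - a_left
print(fwd - bwd, reaction_free_energy)

# Counter-example: overwrite only the forward barrier of sample 1 with the
# refined value but keep its original backward barrier. The reaction free
# energy implied by the two barriers then disagrees with the vertex energies.
l1, ts1, r1 = samples[1]
_, bwd1 = barriers(l1, ts1, r1)
implied = fwd - bwd1
actual = r1 - l1
print(implied, actual)
```

Averaging vertex and edge energies keeps forward and backward rate constants tied to one consistent free-energy surface, whereas editing a single barrier in isolation does not.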

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a sequence of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an “exact” solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of max{δpq(tmax), ∆pq} comparing the CRN-0 solutions against the “exact” solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the “exact” solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase achieved by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered 5 additional kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be questionable to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, i.e., highly noise-inducing, reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may be strongly dependent on the underlying graph structure and the distribution of activation free energies and rate constants, as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation.

There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time tmax, which we set to the half-life of a unimolecular reaction with a barrier height of 100 kJ mol⁻¹. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction found here to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (association and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the (x + 1)-th network layer, we first define all potentially reactive intermediate states. At that stage, we register N = N0 + ··· + Nx vertices. Since we already explored all possible reactions between the first N − Nx species, we will only consider the following potentially reactive intermediate states: Nx unimolecular intermediate states (leading to reactions of type A → P), defined by the Nx species of the x-th network layer; Nx homobimolecular intermediate states (type 2A → P); and Nx(Nx − 1)/2 + Nx(N − Nx) heterobimolecular intermediate states (type A + B → P). The first Nx(Nx − 1)/2 heterobimolecular states represent all possible nonredundant combinations of the Nx new species among each other, whereas the latter Nx(N − Nx) represent all possible combinations between the Nx new species and the N − Nx old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean μexp. The user can specify two different values for μexp, one for unimolecular reactions, μexp,uni, and one for bimolecular reactions, μexp,bi. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter p(Vnew), ranging from zero to one. If p(Vnew) = 0, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if p(Vnew) = 1. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), 2L activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is obtained from a sample of a normal distribution with mean μ(A⁻ − A⁺) and standard deviation σ(A⁻ − A⁺), which is added to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained from a sample of a uniform distribution with bounds min(A‡ − A⁺/⁻) and max(A‡ − A⁺/⁻), which is added to the free energy of the higher-energy side of the edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix ΣA, as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation 2〈σA〉. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
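Steps 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AutoNetGen implementation: all function and argument names are hypothetical, the numerical values are placeholders, and mass bookkeeping as well as the choice between new and existing vertices via p(Vnew) are omitted.

```python
import math
import random

def count_reactive_states(n_total, n_new):
    """Step 1: (uni-, homobi-, heterobimolecular) states for a new layer."""
    uni = n_new
    homo = n_new
    hetero = n_new * (n_new - 1) // 2 + n_new * (n_total - n_new)
    return uni, homo, hetero

def sample_channel_count(rng, mean_exp):
    """Step 2: floor of a draw from an exponential distribution with mean mean_exp."""
    return math.floor(rng.expovariate(1.0 / mean_exp))

def sample_edge_energies(rng, a_left, mu, sigma, barrier_min, barrier_max):
    """Step 3: absolute free energies of the right-hand side and of the edge."""
    a_right = a_left + rng.gauss(mu, sigma)
    # The barrier is added on top of the higher-energy side of the edge.
    a_edge = max(a_left, a_right) + rng.uniform(barrier_min, barrier_max)
    return a_right, a_edge

rng = random.Random(0)  # fixed seed so that the sketch is reproducible
print(count_reactive_states(n_total=5, n_new=3))  # (3, 3, 9)
n_channels = sample_channel_count(rng, mean_exp=5.0)
a_right, a_edge = sample_edge_energies(rng, 0.0, -25.0, 50.0, 0.0, 100.0)
print(n_channels, a_edge - 0.0, a_edge - a_right)  # channel count, both barriers
```

Because the barrier is measured from the higher-energy side, both the forward and the backward activation free energy of an edge are nonnegative by construction.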

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$
k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( - \frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right), \qquad (29)
$$

where h, kB, R, and T are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
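The Eyring equation is straightforward to evaluate numerically. The sketch below uses a hypothetical helper name and CODATA values of the constants; it also computes the half-life for a barrier of 100 kJ mol⁻¹ at 298.15 K, which should come out close to (though, depending on the constants used, not exactly at) the tmax = 36954 s listed in Table 2.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K
H = 6.62607015e-34   # Planck constant in J s
R = 8.314462618      # gas constant in J/(mol K)

def eyring_rate_constant(barrier_kj_per_mol, temperature=298.15):
    """Rate constant from an activation free energy (1/s for a unimolecular step)."""
    exponent = -barrier_kj_per_mol * 1.0e3 / (R * temperature)
    return (K_B * temperature / H) * math.exp(exponent)

k = eyring_rate_constant(100.0)
half_life = math.log(2.0) / k  # half-life of a first-order process
print(f"k = {k:.3e} 1/s, half-life = {half_life:.0f} s")
```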

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix ΣA which fulfills the condition that its largest eigenvalue equals σ²A,max. Defining σ²A,max to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by σA,max.

1. Generate a random (N × N)-dimensional matrix P with elements that are uniformly sampled from [−0.5, +0.5]. N is the number of vertices in the network.

2. The multiplication of P with its transpose leads to an (N × N)-dimensional matrix Q = P Pᵀ that already is a covariance matrix [65] with eigenvalues Eii.

3. The covariance matrix ΣA results from a rescaling of Q,

$$
\Sigma_A = Q \, \frac{\langle \sigma_A \rangle^2}{\max_i E_{ii}},
$$

which implies that the largest eigenvalue of ΣA equals 〈σA〉².
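The three steps of this recipe can be sketched with the Python standard library alone. The function names are hypothetical, and the largest eigenvalue is estimated by power iteration, which is an assumption of this sketch; the text does not state how the eigenvalues are computed, and a library eigensolver would serve equally well.

```python
import random

def matmul_transpose(p):
    """Return Q = P P^T for a square matrix given as a list of rows."""
    n = len(p)
    return [[sum(p[i][k] * p[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def largest_eigenvalue(q, iterations=1000):
    """Estimate the largest eigenvalue of a symmetric PSD matrix (power iteration)."""
    n = len(q)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(q[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(x * x for x in w) ** 0.5  # norm of Q v approaches the top eigenvalue
        v = [x / lam for x in w]
    return lam

def random_covariance(n, sigma_max, rng):
    """Steps 1-3: random P, Q = P P^T, rescale so the top eigenvalue is sigma_max^2."""
    p = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    q = matmul_transpose(p)
    scale = sigma_max ** 2 / largest_eigenvalue(q)
    return [[x * scale for x in row] for row in q]

# Example: 6 vertices and an upper uncertainty bound of 0.5 (e.g., in kcal/mol).
sigma_a = random_covariance(n=6, sigma_max=0.5, rng=random.Random(42))
print(largest_eigenvalue(sigma_a))
```

Since the diagonal elements of a positive-semidefinite matrix cannot exceed its largest eigenvalue, every variance in the rescaled matrix is bounded by the chosen maximum.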

Note that in a practical setup, we expect the user to provide B + 1 k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                 CRN-X        CRN-0        CRN-B

AutoNetGen
B + 1                           25           1            6
T (K)                           298.15       298.15       298.15
μ(A⁻ − A⁺) (kJ mol⁻¹)           −25          −25          −25
σ(A⁻ − A⁺) (kJ mol⁻¹)           50           50           50
min(A‡ − A⁺/⁻) (kJ mol⁻¹)       0            0            0
max(A‡ − A⁺/⁻) (kJ mol⁻¹)       100          100          100
〈σA〉 (kJ mol⁻¹)                 2.09         2.09         2.09
Lmax                            125          100          100
μexp(Nuni)                      5            5            5
μexp(Nbi)                       2            2            2
p(Vnew)                         0.5          0.1          0.1

KiNetX
tmax (s)                        36954        36954        36954
εy (mol L⁻¹)                    0.01         0.01         0.01
Gmin (mol L⁻¹)                  1.0 × 10⁻³   1.0 × 10⁻⁵   1.0 × 10⁻⁵
U                               1000         1000         1000
γ                               0            1            1
C                               5            1            5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.

[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. “CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package”; Technical Report SAND-80-8003, Sandia National Labs., Livermore, CA (United States), 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. “Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics”; Technical Report SAND-89-8009, Sandia National Labs., Livermore, CA (United States), 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. “CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics”; Technical Report SAND-96-8216, Sandia National Labs., Livermore, CA (United States), 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. “Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes”; Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.


[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI – a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.


[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.


[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.


  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

same conclusion. If $\varepsilon_y$ is still exceeded after the LBA-based network reduction, KiNetX will recover the original network and continue its analysis without any reduction.

2.4.3 Sensitivity Analysis

In the following, we assume that the concentration profiles of the sparse reaction network reveal pronounced uncertainties such that it is difficult to correctly assign product ratios or to suggest a specific reaction mechanism. To estimate to which rate constants the concentration profiles are most sensitive, a sensitivity analysis is required [34]. For this purpose, the rate constants are perturbed to study how such perturbations affect the time-dependent species concentrations.

In a local sensitivity analysis, the model parameters are perturbed one by one from their nominal values (here, the elements of $\mathbf{k}_0$). While a local sensitivity analysis is straightforward to conduct and computationally feasible (usually, L kinetic simulations are required), it has a significant disadvantage in that it does not take into account the correlation between the model parameters (cf. Section 2.2). Consequently, one may overlook important correlation effects on the uncertainty of concentration profiles. Note that our LBA algorithm resembles an extreme variant of a local sensitivity analysis (cf. Section 2.4.2), where each rate constant is perturbed to its minimal value.
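A one-at-a-time analysis of this kind can be sketched in a few lines. The following Python example is a minimal illustration, not the KiNetX implementation: the three-species toy network A ⇌ B → C, its rate constants, the 1% perturbation size, and the helper names `simulate` and `local_sensitivities` are all assumptions made for this sketch. Each rate constant is perturbed on its own while all others stay at their nominal values, so correlations between parameters never enter the result.

```python
def simulate(k, y0=(1.0, 0.0, 0.0), t_max=10.0, steps=2000):
    """Integrate the toy network A <=> B -> C with classical RK4.

    k = (k1_forward, k1_backward, k2_forward); returns concentrations at t_max.
    """
    def rhs(y):
        a, b, _ = y
        r1 = k[0] * a - k[1] * b   # net flux of A <=> B
        r2 = k[2] * b              # flux of B -> C
        return (-r1, r1 - r2, r2)

    h = t_max / steps
    y = list(y0)
    for _ in range(steps):
        s1 = rhs(y)
        s2 = rhs([yi + 0.5 * h * d for yi, d in zip(y, s1)])
        s3 = rhs([yi + 0.5 * h * d for yi, d in zip(y, s2)])
        s4 = rhs([yi + h * d for yi, d in zip(y, s3)])
        y = [yi + h / 6.0 * (d1 + 2 * d2 + 2 * d3 + d4)
             for yi, d1, d2, d3, d4 in zip(y, s1, s2, s3, s4)]
    return y


def local_sensitivities(k0, rel_step=0.01):
    """Perturb one rate constant at a time; all others keep their nominal value."""
    y_ref = simulate(k0)
    result = []
    for l in range(len(k0)):
        k = list(k0)
        k[l] *= 1.0 + rel_step
        y = simulate(k)
        # Summed absolute change of the final concentrations.
        result.append(sum(abs(a - b) for a, b in zip(y, y_ref)))
    return result


k0 = (1.0, 0.5, 0.2)   # hypothetical nominal rate constants
sens = local_sensitivities(k0)
```

One simulation per perturbed parameter (plus the reference run) is all that is needed, which is why the local approach is cheap.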

In a global sensitivity analysis, the correlation between the model parameters is taken into account, but the process requires significantly more computational resources than a local sensitivity analysis. Furthermore, the design of a global sensitivity analysis is not as unambiguous as for a local sensitivity analysis, which explains the existence of several approaches that may require CL (Morris screening, where C is an integer usually much smaller than B), B (polynomial chaos), or 2BL (Sobol's method) numerical integrations [34].

Morris screening [57] is among the simplest of global sensitivity analyses as it is not designed to induce a mapping between the model parameters and the model solution. Instead, its purpose is to categorize the model parameters as either important or unimportant, depending on how strongly changes in them affect the model solutions. We are particularly interested in this categorization since we aim for a descriptor that informs us about the quality of rate constants obtained from an efficient (basic) quantum chemical model. If we find that the uncertainty of some species concentrations is too large to derive sensible conclusions about specific network properties, the results of a global sensitivity analysis will support us in identifying the most critical rate constants that require a reevaluation based on a more sophisticated (benchmark) quantum chemical model.

The original version of Morris screening does not take into account the joint probability distribution of the model parameters. Here, we introduce an extended variant of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample $\mathbf{k}_0$ and replaces the elements $k^+_{0,1}$ and $k^-_{0,1}$ with the corresponding elements of $\mathbf{k}_1$, $k^+_{1,1}$ and $k^-_{1,1}$. For this new vector of rate constants, $\mathbf{k}_{0,1}$, a numerical integration is carried out. Subsequently, the elements $k^+_{0,2}$ and $k^-_{0,2}$ of $\mathbf{k}_{0,1}$ are replaced with the elements $k^+_{1,2}$ and $k^-_{1,2}$ of $\mathbf{k}_1$, and a numerical integration based on the new vector $\mathbf{k}_{0,2}$ is carried out. This procedure is repeated L times until we arrive at $\mathbf{k}_{0,L} = \mathbf{k}_1$, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is now repeated by an element-wise change from $\mathbf{k}_1$ to $\mathbf{k}_2$ and, generally, by an element-wise change from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ until $b = C - 1$ is reached. In the end, we will have generated solutions for another $C(L-1)$ kinetic models in addition to the $B + 1$ solutions obtained for $\mathbf{K}_B$. We are interested in the $C(L-1)$ new solutions and the first $C + 1$ of the $B + 1$ solutions obtained previously, amounting to $CL + 1$ solutions relevant for our sensitivity analysis.
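The element-wise walk between consecutive ensemble members can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `morris_path` and the data layout (each k-vector as a list of `(k_plus, k_minus)` pairs, one pair per reaction) are hypothetical choices for this sketch and are not taken from KiNetX.

```python
def morris_path(ensemble, C):
    """Build the intermediate k-vectors k_{b,l} for b = 0..C-1 and l = 1..L.

    ensemble: list of at least C + 1 sampled k-vectors; each k-vector is a
              list of (k_plus, k_minus) pairs, one pair per reaction.
    Returns a list of (b, l, k_vector) tuples in the order of construction.
    """
    path = []
    for b in range(C):
        current = list(ensemble[b])              # start from k_b = k_{b,0}
        for l in range(len(current)):
            current[l] = ensemble[b + 1][l]      # swap in pair l of k_{b+1}
            path.append((b, l + 1, list(current)))
    return path


# Hypothetical ensemble: 3 samples, L = 4 reaction pairs.
ensemble = [[(10.0 * b + l, 10.0 * b + l + 0.5) for l in range(4)]
            for b in range(3)]
path = morris_path(ensemble, C=2)
```

Of the C·L vectors on the path, the ones with l = L coincide with existing ensemble members, so only C(L − 1) of them require a new numerical integration, in line with the count given above.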

For each of these solutions, KiNetX determines the $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$ metrics, where p and q represent two adjacent k-vectors as constructed by our extended version of Morris screening. CL values are obtained for each metric, C thereof for every reaction pair. We define the sensitivity coefficients $z^\delta_l$ and $z^\Delta_l$ as

$$z^\delta_l = \frac{1}{C} \sum_{b=0}^{C-1} \delta_{(b,l),(b,l-1)}(t_\mathrm{max}) \tag{23}$$

and

$$z^\Delta_l = \frac{1}{C} \sum_{b=0}^{C-1} \Delta_{(b,l),(b,l-1)} \tag{24}$$

respectively. The subscript $(b,l)$ refers to sample $\mathbf{k}_{b,l}$, and we define $\mathbf{k}_{b,0} = \mathbf{k}_b$. Note that in Eqs. (23) and (24), we do not divide the $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$ metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term conditional relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.

Finally, we can set up a ranking of sensitivity coefficients. The larger $z^\delta_l$ and $z^\Delta_l$, the larger the effect of changes in $k^{+/-}_l$ on the concentration profiles. With this ranking at hand, one can systematically improve on both the accuracy and precision of the concentration profiles to reliably suggest product distributions (based on $z^\delta_l$) or specific reaction mechanisms (based on $z^\Delta_l$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.
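The averaging in Eqs. (23) and (24) and the subsequent ranking amount to a column mean followed by a sort. The Python sketch below assumes that the metric values have already been computed and stored as a C × L nested list (row b, column l holds the metric comparing $\mathbf{k}_{b,l}$ with $\mathbf{k}_{b,l-1}$); the function names and the numbers are illustrative assumptions, not output of KiNetX.

```python
def sensitivity_coefficients(metric):
    """Average a C x L table of metric values over the C Morris samples."""
    C = len(metric)
    L = len(metric[0])
    return [sum(metric[b][l] for b in range(C)) / C for l in range(L)]


def rank_reactions(z):
    """Return reaction-pair indices sorted from most to least sensitive."""
    return sorted(range(len(z)), key=lambda l: z[l], reverse=True)


# Illustrative values in mol/L for C = 2 Morris samples and L = 3 reaction pairs.
delta = [[0.02, 0.001, 0.10],
         [0.04, 0.003, 0.06]]
z_delta = sensitivity_coefficients(delta)
ranking = rank_reactions(z_delta)
```

The same two functions apply unchanged to the $\Delta_{pq}$ values; reaction pairs at the top of either ranking are the first candidates for reevaluation with a benchmark model.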

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be unfeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well-known in the reaction engineering community. It has been introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas-phase reactions (in particular, combustion) [62], which has recently been extended to understand the kinetics of solution-phase chemistry [63]. Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution-phase chemistry. As the relative uncertainty of the thermal rate constant caused by a given free-energy uncertainty is a decreasing function of temperature in classical rate theories, k-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take k-uncertainty into account.
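The temperature argument can be made explicit with Eyring's expression. The short derivation below is a sketch: the free-energy uncertainty of 1 kcal/mol and the combustion-like temperature of 1500 K are illustrative values chosen here, not numbers from this work. Because $\ln k$ is linear in the activation free energy at fixed temperature, a free-energy uncertainty $\sigma_A$ translates directly into an uncertainty of $\ln k$ that scales with $1/T$.

```latex
\begin{align}
  k(T) &= \frac{k_\mathrm{B} T}{h}
          \exp\!\left(-\frac{\Delta A^{\ddagger}}{RT}\right)
  &&\Rightarrow&
  \sigma_{\ln k} &= \frac{\sigma_A}{RT} \\
  % Illustrative numbers with \sigma_A = 1 kcal/mol, R = 1.987e-3 kcal/(mol K)
  T &= 298.15\,\mathrm{K}: & & &
  \sigma_{\ln k} &\approx 1.69, \quad e^{\sigma_{\ln k}} \approx 5.4 \\
  T &= 1500\,\mathrm{K}: & & &
  \sigma_{\ln k} &\approx 0.34, \quad e^{\sigma_{\ln k}} \approx 1.4
\end{align}
```

Under these assumptions, the same free-energy error changes a room-temperature rate constant by a factor of about five but a high-temperature rate constant by only about 40%.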

We start from the reactants (species with nonzero initial concentrations) and attempt to find all direct products, i.e., intermediates that are formed by a single elementary reaction of the reactants. The reactants are considered active, whereas the direct products are considered inactive. Only active species are considered in the exploration procedure, i.e., at this point, all possible reaction steps have been discovered (according to the exploration algorithm employed). To estimate which of the inactive species is potentially the most relevant for the mechanism, we focus on the formation rate of each inactive species (indicated by an asterisk),

$$\overrightarrow{g}_{n^*}(t_u) = s^+_{n^*} f^-(t_u) + s^-_{n^*} f^+(t_u) \tag{25}$$

Note that $s^+_{n^*}$ is multiplied with $f^-(t)$ since, in a backward (−) reaction, the left-hand-side species (+) are formed. An equivalent argument holds for the multiplication of $s^-_{n^*}$ with $f^+(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

$$\overrightarrow{g}^{\mathrm{max}}_{n^*} = \max\left\{ \overrightarrow{g}_{n^*}(t_0), \overrightarrow{g}_{n^*}(t_1), \cdots, \overrightarrow{g}_{n^*}(t_U) \right\} \tag{26}$$

The species with the highest value of $\overrightarrow{g}^{\mathrm{max}}_{n^*}$ will be promoted to active. Note that if an ensemble of k-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

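$$\eta\left(\overrightarrow{g}^{\mathrm{max}}_{n^*}\right) = \sqrt{\left\langle \overrightarrow{g}^{\mathrm{max}}_{n^*} \right\rangle^2 + \frac{1}{B-1} \sum_{b=1}^{B} \left( \overrightarrow{g}^{\mathrm{max}}_{n^*,b} - \left\langle \overrightarrow{g}^{\mathrm{max}}_{n^*} \right\rangle \right)^2} \tag{27}$$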

that takes into account both the ensemble average of maximum formation rates,

$$\left\langle \overrightarrow{g}^{\mathrm{max}}_{n^*} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \overrightarrow{g}^{\mathrm{max}}_{n^*,b} \tag{28}$$

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
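The ranking statistic of Eqs. (26) to (28) is straightforward to evaluate once the formation rates are available. The following Python sketch assumes that the formation rates of one inactive species have already been computed for every ensemble member and every time point; the function names and example numbers are hypothetical.

```python
import math


def max_formation_rate(rates_over_time):
    """Eq. (26): largest formation rate over all time points t_0 ... t_U."""
    return max(rates_over_time)


def eta(g_max_samples):
    """Eq. (27): square root of squared ensemble mean plus sample variance."""
    B = len(g_max_samples)
    mean = sum(g_max_samples) / B                                  # Eq. (28)
    variance = sum((g - mean) ** 2 for g in g_max_samples) / (B - 1)
    return math.sqrt(mean ** 2 + variance)


# Two hypothetical inactive species with the same mean but different spread.
narrow = [1.0e-6, 1.1e-6, 0.9e-6]
broad = [0.1e-6, 1.0e-6, 1.9e-6]
prefer_broad = eta(broad) > eta(narrow)
```

Since the variance enters additively under the square root, the species with the broader distribution of maximum formation rates always receives the larger score when the means agree.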

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\overrightarrow{g}^{\mathrm{max}}_{n^*}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold. In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between t = 0 and t = $t_\mathrm{max}$) are being generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

• The number of samples, B, to be drawn from $\Sigma_A$

• A thermostat temperature, T, for the calculation of rate constants

• The average free-energy increase/decrease, $\mu(A^- - A^+)$, for a new intermediate state (−) formed by reaction of an intermediate state already present in the network (+)

• The average difference between the free energies of a left-hand-side intermediate state (+) and its corresponding right-hand-side intermediate state (−), $\sigma(A^- - A^+)$

• A minimum activation free energy, $\min(A^\ddagger - A^{+/-})$, with respect to the higher-energy intermediate state

• A maximum activation free energy, $\max(A^\ddagger - A^{+/-})$, with respect to the higher-energy intermediate state

• The average free-energy uncertainty, $\langle\sigma_A\rangle$ (required for generating an ensemble of kinetic models)

• The maximum number of edges, $L_\mathrm{max}$, to be generated

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of N = 103 vertices and L = 118 edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L$^{-1}$ and 0.50 mol L$^{-1}$, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of B + 1 = 25 k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol$^{-1}$. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol$^{-1}$. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature T = 298.15 K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.
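To give a feeling for what an activation free-energy uncertainty of about 1.1 kcal mol$^{-1}$ means at 298.15 K, the following Python sketch evaluates the Eyring expression for a unimolecular step. It is an illustration only: the barrier of 20 kcal/mol is a hypothetical value, the transmission coefficient is assumed to be one, and the function name `eyring_rate_constant` is our own choice.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant in J/K
H = 6.62607015e-34          # Planck constant in J s
R = 1.98720425864083e-3     # molar gas constant in kcal/(mol K)


def eyring_rate_constant(dA_act, T=298.15):
    """First-order Eyring rate constant in 1/s.

    dA_act: activation free energy in kcal/mol (transmission coefficient = 1).
    """
    return K_B * T / H * math.exp(-dA_act / (R * T))


barrier = 20.0                                   # hypothetical barrier, kcal/mol
k_nominal = eyring_rate_constant(barrier)
k_shifted = eyring_rate_constant(barrier - 1.1)  # barrier lowered by 1.1 kcal/mol
ratio = k_shifted / k_nominal                    # equals exp(1.1 / (R * T))
```

Under these assumptions, the shift of 1.1 kcal/mol changes the rate constant by a factor between six and seven, independent of the barrier height, which indicates why the 25 sampled kinetic models can lead to rather different concentration profiles.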

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L$^{-1}$ during the course of reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at t = $t_\mathrm{max}$. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L$^{-1}$ and 0.9 mol L$^{-1}$, the observable value of which will in turn affect the concentrations of the potential side products, 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of $\varepsilon_y$ = 0.01 mol L$^{-1}$, we find that a DFA-based network reduction with $G_\mathrm{min} = 1.0 \times 10^{-3}$ mol L$^{-1}$ is reliable with respect to both $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. Here, we chose the minimum confidence level of γ = 0 for the sake of clarity. The smaller γ, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose $G_\mathrm{min}$ to be one order of magnitude smaller than $\varepsilon_y$. The reason is that $G_\mathrm{min}$ is assessed on a species-by-species basis, whereas $\varepsilon_y$ is compared against the quantities $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$, which measure the concentration error summed over all species. One may resolve this situation by dividing $G_\mathrm{min}$ by N, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA analysis varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For C = 5 Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for $\delta_{pq}(t_\mathrm{max})$ and/or $\Delta_{pq}$ exceeding $\varepsilon_y$ with respect to the quantile specified by γ. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L$^{-1}$. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L$^{-1}$, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval $t_u - t_{u-1}$, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42, the other dominant side product, via 30 and 14. In summary, the dominant reaction pathways read

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than $\varepsilon_y$, as measured by $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges, which corresponds to an effective network reduction of about 40%. This is significantly below the 80% we obtained with our DFA algorithm. However, with a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol$^{-1}$ with a standard deviation of 0.4 kcal mol$^{-1}$. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $\mathbf{k}_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several instead of a single solution naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to $1.0 \times 10^{-17}$ mol L$^{-1}$ s$^{-1}$ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, $t_\mathrm{max}$), we obtain a flux of $3.7 \times 10^{-13}$ mol L$^{-1}$, which is very small compared to $\varepsilon_y$ = 0.01 mol L$^{-1}$. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.

Table 1: Mean values (μ) and standard deviations (σ) of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                          μ(ζ_CRN-0)   μ(ζ_CRN-B)   σ(ζ_CRN-0)   σ(ζ_CRN-B)
exploration steps                   12           14           7            7
critical reactions                  5            10           4            7
L_expl                              54           60           26           27
N_expl                              29           31           11           11
L_red                               30           37           18           21
N_red                               17           20           8            8
δ^B_expl(t_max) / mol L^-1          –            0.14         –            0.13
Δ^B_expl / mol L^-1                 –            0.15         –            0.13
δ^B_refined(t_max) / mol L^-1       –            <0.01        –            0.01
Δ^B_refined / mol L^-1              –            <0.01        –            0.01

DFA-based network reduction with a flux threshold of $G_\mathrm{min} = 1.0 \times 10^{-5}$ mol L$^{-1}$ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument of the last paragraph, the consideration of several solutions increases the likelihood to identify kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of $\max\{\delta^B(t_\mathrm{max}), \Delta^B\}$ from 0.15 mol L$^{-1}$ to < 0.01 mol L$^{-1}$ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all B + 1 = 6 samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
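The reason for averaging absolute free energies can be seen in a small sketch. In the Python example below, every sample is a dictionary of absolute free energies for intermediate states and transition states, and all barriers are derived from these values, so forward and backward barriers of a reaction always differ by exactly its reaction free energy. The data layout, the labels, and the function names `refine_critical` and `barriers` are assumptions of this sketch; the energies are made-up numbers in kcal/mol.

```python
def refine_critical(samples, edges, critical):
    """Replace the absolute free energies of all states that belong to a
    critical edge by their ensemble mean, in every sample."""
    refined = [dict(s) for s in samples]
    for edge in critical:
        for label in edges[edge]:            # lhs state, transition state, rhs state
            mean = sum(s[label] for s in samples) / len(samples)
            for s in refined:
                s[label] = mean
    return refined


def barriers(sample, edges):
    """Forward and backward activation free energies from absolute values."""
    return {e: (sample[ts] - sample[lhs], sample[ts] - sample[rhs])
            for e, (lhs, ts, rhs) in edges.items()}


# Hypothetical two-step network X -> Y -> Z with three sampled models.
edges = {"e1": ("X", "TS1", "Y"), "e2": ("Y", "TS2", "Z")}
samples = [
    {"X": 0.0, "TS1": 15.0, "Y": -2.0, "TS2": 10.0, "Z": -5.0},
    {"X": 0.4, "TS1": 16.1, "Y": -2.5, "TS2": 9.2, "Z": -4.6},
    {"X": -0.4, "TS1": 14.3, "Y": -1.5, "TS2": 10.5, "Z": -5.7},
]
refined = refine_critical(samples, edges, critical=["e1"])
refined_barriers = [barriers(s, edges) for s in refined]
```

After this step, the barriers of the critical edge e1 are identical in all samples, while the noncritical edge e2 keeps its spread (although its forward barrier changes because it shares state Y with e1). Averaging forward and backward barriers independently would not guarantee this thermodynamic consistency.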

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an “exact” solution obtained by taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of $\max\{\delta_{pq}(t_\mathrm{max}), \Delta_{pq}\}$ comparing the CRN-0 solutions against the “exact” solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the “exact” solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX for a reaction network generated with AutoNetGen, an algorithm that constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase achieved by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered an additional 5 kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be questionable to generalize the conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties, such as the percentage of critical (highly noise-inducing) reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps, as they may depend strongly on the underlying graph structure and on the distribution of activation free energies and rate constants as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time $t_\mathrm{max}$, which we set to the half-life of a unimolecular reaction with a barrier height of 100 kJ mol⁻¹. This way, we prevent the network reduction procedure from merely deleting reactions because their activation barriers are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the (x + 1)-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the x-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_\mathrm{exp}$. The user can specify two different values for $\mu_\mathrm{exp}$, one for unimolecular reactions, $\mu_\mathrm{exp,uni}$, and one for bimolecular reactions, $\mu_\mathrm{exp,bi}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_\mathrm{new})$, ranging from zero to one. If $p(V_\mathrm{new}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_\mathrm{new}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), 2L activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is obtained by sampling from a normal distribution with mean $\mu(A^{-} - A^{+})$ and standard deviation $\sigma(A^{-} - A^{+})$ and adding the sampled value to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained by sampling from a uniform distribution with bounds $\min(A^{\ddagger} - A^{\pm})$ and $\max(A^{\ddagger} - A^{\pm})$ and adding the sampled value to the free energy of the higher-energy side of that edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
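Steps 1 and 2 above can be sketched in a few lines of Python. This is a minimal illustration, not the AutoNetGen implementation: the function names are hypothetical, and the mean of 5 for unimolecular states is taken from Table 2.

```python
import math
import random


def count_reactive_states(n_new, n_total):
    """Step 1: number of uni-, homobi-, and heterobimolecular states.

    n_new   -- number of species in the current (x-th) layer, N_x
    n_total -- number of all species registered so far, N
    """
    n_old = n_total - n_new
    uni = n_new
    homo = n_new
    hetero = n_new * (n_new - 1) // 2 + n_new * n_old
    return uni, homo, hetero


def sample_channel_count(mean, rng):
    """Step 2: floor of a draw from an exponential distribution with given mean."""
    return math.floor(rng.expovariate(1.0 / mean))


rng = random.Random(42)
uni, homo, hetero = count_reactive_states(n_new=3, n_total=5)
# One channel count per unimolecular intermediate state.
channels = [sample_channel_count(5.0, rng) for _ in range(uni)]
```

For three new species and two old ones, the sketch registers 3 unimolecular, 3 homobimolecular, and 3 + 6 = 9 heterobimolecular states. Because of the floor operation, a state receives zero channels whenever the sampled value is below one.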

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$k_l^{\pm} = \frac{k_\mathrm{B} T}{h} \exp\left(-\frac{A_l^{\ddagger} - A_l^{\pm}}{RT}\right), \tag{29}$$

where $h$, $k_\mathrm{B}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
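As a quick numerical check of Eq. (29), the following sketch evaluates the Eyring rate constant for a given barrier. The function name and the default temperature argument are illustrative choices; the physical constants are the exact SI values.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K
H = 6.62607015e-34   # Planck constant in J s
R = 8.314462618      # gas constant in J/(mol K)


def eyring_rate_constant(barrier_kj_mol, temperature=298.15):
    """Eq. (29): rate constant in 1/s for an activation free energy in kJ/mol."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_kj_mol * 1.0e3 / (R * temperature))


k = eyring_rate_constant(100.0)
half_life = math.log(2) / k   # half-life of a unimolecular reaction in s
```

For a barrier of 100 kJ mol⁻¹ at 298.15 K, this gives a half-life of roughly 3.7 × 10⁴ s, in line with the value of $t_\mathrm{max}$ listed in Table 2.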

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$ that fulfills the condition that its largest eigenvalue equals $\sigma^2_{A,\mathrm{max}}$. Defining $\sigma^2_{A,\mathrm{max}}$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\mathrm{max}}$.

1. Generate a random (N × N)-dimensional matrix $P$ with elements that are uniformly sampled from [−0.5, +0.5]. N is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an (N × N)-dimensional matrix $Q = PP^{\top}$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$\Sigma_A = Q\,\frac{\langle\sigma_A\rangle^2}{\max\{E_{ii}\}},$$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
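The three steps above translate directly into NumPy. The sketch below is an illustration under stated assumptions: the function name, the network size of 10 vertices, and the random seed are arbitrary, and the value of 2.09 kJ mol⁻¹ is the one listed in Table 2.

```python
import numpy as np


def random_covariance(n_vertices, sigma, rng):
    """Random covariance matrix whose largest eigenvalue equals sigma**2."""
    P = rng.uniform(-0.5, 0.5, size=(n_vertices, n_vertices))   # step 1
    Q = P @ P.T                                                  # step 2: symmetric, PSD
    largest = np.linalg.eigvalsh(Q).max()
    return Q * sigma**2 / largest                                # step 3: rescaling


rng = np.random.default_rng(1)
sigma = 2.09                                   # kJ/mol
Sigma_A = random_covariance(10, sigma, rng)

# Draw B + 1 = 6 correlated perturbations of the 10 vertex free energies.
samples = rng.multivariate_normal(np.zeros(10), Sigma_A, size=6)
```

Since every diagonal element of a positive-semidefinite matrix is bounded by its largest eigenvalue, no vertex variance in `Sigma_A` exceeds `sigma**2`.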

Note that in a practical setup, we expect the user to provide B + 1 k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence, problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                 CRN-X         CRN-0         CRN-B

AutoNetGen
B + 1                           25            1             6
T / K                           298.15        298.15        298.15
μ(A⁻ − A⁺) / (kJ mol⁻¹)         −25           −25           −25
σ(A⁻ − A⁺) / (kJ mol⁻¹)         50            50            50
min(A‡ − A±) / (kJ mol⁻¹)       0             0             0
max(A‡ − A±) / (kJ mol⁻¹)       100           100           100
⟨σ_A⟩ / (kJ mol⁻¹)              2.09          2.09          2.09
L_max                           125           100           100
μ_exp(N_uni)                    5             5             5
μ_exp(N_bi)                     2             2             2
p(V_new)                        0.5           0.1           0.1

KiNetX
t_max / s                       36954         36954         36954
ε_y / (mol L⁻¹)                 0.01          0.01          0.01
G_min / (mol L⁻¹)               1.0 × 10⁻³    1.0 × 10⁻⁵    1.0 × 10⁻⁵
U                               1000          1000          1000
γ                               0             1             1
C                               5             1             5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.

[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. “CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package”; Technical Report SAND-80-8003, Sandia National Labs., Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. “Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics”; Technical Report SAND-89-8009, Sandia National Labs., Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. “CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics”; Technical Report SAND-96-8216, Sandia National Labs., Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. “Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes”, Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI—a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple “Boxed Molecular Kinetics” Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] “SCINE: Software for Chemical Interaction Networks”, ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] “MATLAB”, Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

of Morris screening that explicitly considers this information, as it is the key element of KiNetX. In our implementation of Morris screening, one starts from the nominal sample $\mathbf{k}_0$ and replaces the elements $k^{+}_{0,1}$ and $k^{-}_{0,1}$ with the corresponding elements of $\mathbf{k}_1$, $k^{+}_{1,1}$ and $k^{-}_{1,1}$. For this new vector of rate constants, $\mathbf{k}_{0,1}$, a numerical integration is carried out. Subsequently, the elements $k^{+}_{0,2}$ and $k^{-}_{0,2}$ of $\mathbf{k}_{0,1}$ are replaced with the elements $k^{+}_{1,2}$ and $k^{-}_{1,2}$ of $\mathbf{k}_1$, and a numerical integration based on the new vector $\mathbf{k}_{0,2}$ is carried out. This procedure is repeated L times until we arrive at $\mathbf{k}_{0,L} = \mathbf{k}_1$, for which numerical integration was already carried out in the first step of the KiNetX workflow (Section 2.4.1). The entire procedure is now repeated by an element-wise change from $\mathbf{k}_1$ to $\mathbf{k}_2$ and, generally, by an element-wise change from $\mathbf{k}_b$ to $\mathbf{k}_{b+1}$ until $b = C - 1$ is reached. In the end, we will have generated solutions for another $C(L - 1)$ kinetic models in addition to the B + 1 solutions obtained for $\mathbf{K}_B$. We are interested in the $C(L - 1)$ new solutions and the first C + 1 of the B + 1 solutions obtained previously, amounting to CL + 1 solutions relevant for our sensitivity analysis.

For each of these solutions, KiNetX determines the $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$ metrics, where p and q represent two adjacent k-vectors as constructed by our extended version of Morris screening. CL values are obtained for each metric, C thereof for every reaction pair. We define the sensitivity coefficients $z_l^{\delta}$ and $z_l^{\Delta}$ as

$$z_l^{\delta} = \frac{1}{C} \sum_{b=0}^{C-1} \delta_{bl,\,bl-1}(t_\mathrm{max}) \tag{23}$$

and

$$z_l^{\Delta} = \frac{1}{C} \sum_{b=0}^{C-1} \Delta_{bl,\,bl-1}, \tag{24}$$

respectively. The subscript bl refers to sample $\mathbf{k}_{b,l}$, and we define $\mathbf{k}_{b,0} = \mathbf{k}_b$. Note that in Eqs. (23) and (24), we do not divide the $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$ metrics by the corresponding changes in the rate constants, which would be consistent with the usual definition of sensitivity coefficients. Instead, we implicitly define the dimension of each sensitivity coefficient to be a concentration divided by the conditional standard deviation of the associated rate constant. Here, the term conditional relates to the consideration of the current values of all other rate constants. This way, we obtain a direct measure of the average effect a rate-constant perturbation has on the solution of the corresponding kinetic model. We justify this unusual approach by the fact that all perturbations applied originate from actual samples of the underlying joint probability distribution of rate constants.
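The element-wise walk through the ensemble and the averaging in Eqs. (23) and (24) can be sketched as follows. This is an illustration only: the array layout of `K`, the function names, and the placeholder metric are assumptions. In KiNetX, the metric would compare the numerically integrated concentration profiles of two adjacent kinetic models, which is replaced here by a simple comparison of the rate constants themselves.

```python
import numpy as np


def morris_path(K, C):
    """Yield (b, l, k_bl) for the element-wise walk from K[b] to K[b + 1].

    K has shape (B + 1, L, 2): a forward and a backward rate constant
    for every reaction l and every sample b.
    """
    n_reactions = K.shape[1]
    for b in range(C):
        current = K[b].copy()
        for l in range(1, n_reactions + 1):
            current[l - 1] = K[b + 1, l - 1]   # swap in k+ and k- of reaction l
            yield b, l, current.copy()


def sensitivity_coefficients(K, C, metric):
    """Average of metric(k_bl, k_b(l-1)) over b for every reaction l."""
    z = np.zeros(K.shape[1])
    previous = None
    for b, l, k_bl in morris_path(K, C):
        if l == 1:
            previous = K[b]                    # k_{b,0} = k_b
        z[l - 1] += metric(k_bl, previous) / C
        previous = k_bl
    return z


def log_shift(p, q):
    """Placeholder metric: largest absolute change in ln k between p and q."""
    return np.abs(np.log(p) - np.log(q)).max()


rng = np.random.default_rng(0)
K = rng.lognormal(size=(4, 3, 2))              # B + 1 = 4 samples, L = 3 reactions
z = sensitivity_coefficients(K, C=3, metric=log_shift)
```

Each walk ends at the next sample, so only L − 1 of the L vectors per walk are new, which reproduces the count of C(L − 1) additional kinetic models stated above.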

Finally, we can set up a ranking of sensitivity coefficients. The larger $z_l^{\delta}$ and $z_l^{\Delta}$, the larger the effect of changes in $k_l^{\pm}$ on the concentration profiles. With this ranking at hand, one can systematically improve both the accuracy and the precision of the concentration profiles to reliably suggest product distributions (based on $z_l^{\delta}$) or specific reaction mechanisms (based on $z_l^{\Delta}$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be infeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well known in the reaction engineering community. It was introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas-phase reactions (in particular, combustion) [62], which has recently been extended to understand the kinetics of solution-phase chemistry [63]. Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution-phase chemistry. As the effect of a given free-energy error on the thermal rate constant decreases with increasing temperature in classical rate theories, k-uncertainty is usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take k-uncertainty into account.

We start from the reactants (species with nonzero initial concentrations) and attempt to find all direct products, i.e., intermediates that are formed by a single elementary reaction of the reactants. The reactants are considered active, whereas the direct products are considered inactive. Only active species are considered in the exploration procedure, i.e., at this point, all possible reaction steps have been discovered (according to the exploration algorithm employed). To estimate which of the inactive species is potentially the most relevant for the mechanism, we focus on the formation rate of each inactive species (indicated by an asterisk),

$$\vec{g}_{n^*}(t_u) = s^{+}_{n^*} f^{-}(t_u) + s^{-}_{n^*} f^{+}(t_u). \tag{25}$$

Note that $s^{+}_{n^*}$ is multiplied with $f^{-}(t)$ since, in a backward (−) reaction, the left-hand-side species (+) are formed. An equivalent argument holds for the multiplication of $s^{-}_{n^*}$ with $f^{+}(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

$$\vec{g}^{\,\mathrm{max}}_{n^*} = \max\left\{\vec{g}_{n^*}(t_0), \vec{g}_{n^*}(t_1), \ldots, \vec{g}_{n^*}(t_U)\right\}. \tag{26}$$

The species with the highest value of $\vec{g}^{\,\mathrm{max}}_{n^*}$ will be promoted to active. Note that if an ensemble of k-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

$$\eta\left(\vec{g}^{\,\mathrm{max}}_{n^*}\right) = \sqrt{\left\langle \vec{g}^{\,\mathrm{max}}_{n^*} \right\rangle^2 + \frac{1}{B-1} \sum_{b=1}^{B} \left(\vec{g}^{\,\mathrm{max}}_{n^*,b} - \left\langle \vec{g}^{\,\mathrm{max}}_{n^*} \right\rangle\right)^2}, \tag{27}$$

that takes into account both the ensemble average of maximum formation rates,

$$\left\langle \vec{g}^{\,\mathrm{max}}_{n^*} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \vec{g}^{\,\mathrm{max}}_{n^*,b}, \tag{28}$$

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
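A short sketch shows how the ranking of Eqs. (26) to (28) favors the high-variance species when the ensemble averages coincide. The function names and all numbers are hypothetical and serve only to illustrate the formulas.

```python
import numpy as np


def max_formation_rate(rates):
    """Eq. (26): maximum formation rate over the time axis (last axis)."""
    return np.max(rates, axis=-1)


def eta(g_max):
    """Eq. (27): sqrt(mean^2 + sample variance) over the ensemble (axis 0)."""
    mean = g_max.mean(axis=0)               # Eq. (28)
    variance = g_max.var(axis=0, ddof=1)    # 1 / (B - 1) normalization
    return np.sqrt(mean**2 + variance)


# Hypothetical maximum formation rates:
# 5 ensemble members (rows) x 2 inactive species (columns).
g_max = np.array([[1.0, 0.2],
                  [1.0, 1.8],
                  [1.0, 0.5],
                  [1.0, 1.5],
                  [1.0, 1.0]])
scores = eta(g_max)
next_active = int(np.argmax(scores))        # index of the species to promote
```

Both species have an average maximum formation rate of 1.0, but only the second one varies across the ensemble, so it obtains the larger score and is promoted first.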

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\vec{g}^{\,\mathrm{max}}_{n^*}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold. In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between t = 0 and t = $t_\mathrm{max}$) are generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are as follows (for optional AutoNetGen input, see the Appendix):

• The number of samples $B$ to be drawn from $\Sigma_A$.

• A thermostat temperature $T$ for the calculation of rate constants.

• The average free-energy increase/decrease, $\mu(A^{-} - A^{+})$, for a new intermediate state (−) formed by reaction of an intermediate state already present in the network (+).

• The spread $\sigma(A^{-} - A^{+})$ of the difference between the free energies of a left-hand-side intermediate state (+) and its corresponding right-hand-side intermediate state (−); it serves as the standard deviation of the sampling distribution described in the Appendix.

• A minimum activation free energy, $\min(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

• A maximum activation free energy, $\max(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

• The average free-energy uncertainty $\langle\sigma_A\rangle$ (required for generating an ensemble of kinetic models).

• The maximum number of edges $L_{\max}$ to be generated.

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of $N = 103$ vertices and $L = 118$ edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L⁻¹ and 0.50 mol L⁻¹, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of $B + 1 = 25$ k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol⁻¹. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol⁻¹. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature of $T = 298.15$ K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L⁻¹ during the course of the reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at $t = t_{\max}$. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L⁻¹ and 0.9 mol L⁻¹, the observable value of which will in turn affect the concentrations of the potential side products 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of $\varepsilon_y = 0.01$ mol L⁻¹, we find that a DFA-based network reduction with $G_{\min} = 1.0 \times 10^{-3}$ mol L⁻¹ is reliable with respect to both $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Here, we chose the minimum confidence level of $\gamma = 0$ for the sake of clarity. The smaller $\gamma$, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose $G_{\min}$ to be one order of magnitude smaller than $\varepsilon_y$. The reason is that $G_{\min}$ is assessed on a species-by-species basis, whereas $\varepsilon_y$ is compared against the quantities $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$, which measure the concentration error summed over all species. One may resolve this situation by dividing $G_{\min}$ by $N$, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which consists of only 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA analysis varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For $C = 5$ Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeding $\varepsilon_y$ with respect to the quantile specified by $\gamma$. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L⁻¹. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L⁻¹, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval $t_u - t_{u-1}$, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42, the other dominant side product, via 30 and 14. In summary, the dominant reaction pathways read:

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than $\varepsilon_y$, as measured by $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses as introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol⁻¹ with a standard deviation of 0.4 kcal mol⁻¹. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $\mathbf{k}_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

In the case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several instead of a single solution naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to 1.0 × 10⁻¹⁷ mol L⁻¹ s⁻¹ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, $t_{\max}$), we obtain a flux of 3.7 × 10⁻¹³ mol L⁻¹, which is very small compared to $\varepsilon_y = 0.01$ mol L⁻¹. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.

Table 1: Mean values and standard deviations of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

| Property ζ | μ(ζ, CRN-0) | μ(ζ, CRN-B) | σ(ζ, CRN-0) | σ(ζ, CRN-B) |
| exploration steps | 12 | 14 | 7 | 7 |
| critical reactions | 5 | 10 | 4 | 7 |
| $L_{\mathrm{expl}}$ | 54 | 60 | 26 | 27 |
| $N_{\mathrm{expl}}$ | 29 | 31 | 11 | 11 |
| $L_{\mathrm{red}}$ | 30 | 37 | 18 | 21 |
| $N_{\mathrm{red}}$ | 17 | 20 | 8 | 8 |
| $\delta^{B}_{\mathrm{expl}}(t_{\max})$ / mol L⁻¹ | – | 0.14 | – | 0.13 |
| $\Delta^{B}_{\mathrm{expl}}$ / mol L⁻¹ | – | 0.15 | – | 0.13 |
| $\delta^{B}_{\mathrm{refined}}(t_{\max})$ / mol L⁻¹ | – | <0.01 | – | 0.01 |
| $\Delta^{B}_{\mathrm{refined}}$ / mol L⁻¹ | – | <0.01 | – | 0.01 |

DFA-based network reduction with a flux threshold of $G_{\min} = 1.0 \times 10^{-5}$ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument in the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of $\max\{\delta^{B}(t_{\max}), \Delta^{B}\}$ from 0.15 mol L⁻¹ to < 0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all $B + 1 = 6$ samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
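The averaging of absolute free energies described above can be illustrated with a minimal sketch for a single reaction. The data layout (dictionaries with keys `lhs`, `ts`, `rhs`), the function names, and the numerical values are hypothetical and are not taken from KiNetX. Because the forward and backward barriers are both derived from the same three averaged absolute free energies, their difference equals the reaction free energy by construction.

```python
from statistics import fmean


def refine_reaction(samples):
    """Replace per-sample absolute free energies of one reaction by their means.

    Each sample is a dict holding the absolute free energies (kJ/mol) of the
    left-hand side ("lhs"), the transition state ("ts"), and the right-hand
    side ("rhs") of the reaction.
    """
    means = {key: fmean(s[key] for s in samples) for key in ("lhs", "ts", "rhs")}
    # Every kinetic model receives the same (averaged) absolute free energies.
    return [dict(means) for _ in samples]


def barriers(state):
    """Forward and backward activation free energies of one sample."""
    return state["ts"] - state["lhs"], state["ts"] - state["rhs"]


# Hypothetical ensemble of three samples for a single critical reaction.
ensemble = [
    {"lhs": 0.0, "ts": 62.0, "rhs": -20.0},
    {"lhs": 1.5, "ts": 58.0, "rhs": -23.0},
    {"lhs": -1.5, "ts": 60.0, "rhs": -17.0},
]
refined = refine_reaction(ensemble)
forward, backward = barriers(refined[0])
reaction_free_energy = refined[0]["rhs"] - refined[0]["lhs"]
```

In a full network, the averaged energy of a vertex would enter every reaction in which that species participates; the sketch does not cover this bookkeeping.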

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of $\max\{\delta_{pq}(t_{\max}), \Delta_{pq}\}$ comparing the CRN-0 solutions against the "exact" solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm that constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase achieved by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered 5 additional kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be problematic to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, i.e., highly noise-inducing, reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may depend strongly on the underlying graph structure and on the distribution of activation free energies, rate constants, and their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is in principle able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time $t_{\max}$, which we set to the half-life of a unimolecular rate constant corresponding to a barrier height of 100 kJ mol⁻¹. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (association and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks.

1. For the construction of the $(x+1)$-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the $x$-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_{\exp}$. The user can specify two different values for $\mu_{\exp}$, one for unimolecular reactions, $\mu_{\exp,\mathrm{uni}}$, and one for bimolecular reactions, $\mu_{\exp,\mathrm{bi}}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_{\mathrm{new}})$, ranging from zero to one. If $p(V_{\mathrm{new}}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_{\mathrm{new}}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), $2L$ activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is obtained by sampling a free-energy difference from a normal distribution with mean $\mu(A^{-} - A^{+})$ and standard deviation $\sigma(A^{-} - A^{+})$ and adding it to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained by sampling a barrier from a uniform distribution with bounds $\min(A^{\ddagger} - A^{+/-})$ and $\max(A^{\ddagger} - A^{+/-})$ and adding it to the free energy of the higher-energy side of the edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
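The bookkeeping of steps 1 and 2 can be sketched in a few lines. This is an illustration under stated assumptions, not AutoNetGen's implementation: the function names and the use of Python's `random` module are hypothetical choices, and mass balance and the actual edge construction are left out.

```python
import math
import random


def count_reactive_states(n_total: int, n_new: int) -> dict:
    """Count potentially reactive intermediate states for the next layer.

    n_total corresponds to N (all vertices registered so far) and n_new to
    N_x (vertices of the most recent layer).
    """
    if not 0 <= n_new <= n_total:
        raise ValueError("need 0 <= n_new <= n_total")
    return {
        "unimolecular": n_new,                         # A -> P
        "homobimolecular": n_new,                      # 2A -> P
        "hetero_new_new": n_new * (n_new - 1) // 2,    # A + B, both new
        "hetero_new_old": n_new * (n_total - n_new),   # A new, B old
    }


def sample_channel_count(mean: float, rng: random.Random) -> int:
    """Floor of a draw from an exponential distribution with the given mean."""
    return math.floor(rng.expovariate(1.0 / mean))


def pick_new_vertex(p_new: float, rng: random.Random) -> bool:
    """True: create a new vertex; False: link to a vertex of an earlier layer."""
    return rng.random() < p_new


rng = random.Random(42)  # fixed seed, chosen only for reproducibility
counts = count_reactive_states(n_total=5, n_new=3)
n_channels = sample_channel_count(mean=5.0, rng=rng)
```

With $N = 5$ and $N_x = 3$, the counts are 3 unimolecular, 3 homobimolecular, and 3 + 6 = 9 heterobimolecular states. Since `rng.random()` returns values in [0, 1), `pick_new_vertex` never creates a vertex for `p_new = 0` and always does for `p_new = 1`, in line with the limiting cases stated in step 2.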

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$k^{+/-}_{l} = \frac{k_{\mathrm{B}} T}{h} \exp\left(-\frac{A^{\ddagger}_{l} - A^{+/-}_{l}}{RT}\right), \tag{29}$$

where $h$, $k_{\mathrm{B}}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
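As a worked illustration of Eq. (29), the following sketch evaluates the Eyring rate constant for a given activation free energy and the half-life of a unimolecular step. The function names are hypothetical, the physical constants are the exact SI (2019) values, and any standard-state concentration factor for bimolecular reactions is outside the scope of this sketch.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant in J/K
H = 6.62607015e-34    # Planck constant in J s
R = 8.314462618       # molar gas constant in J/(mol K)


def eyring_rate_constant(barrier_kj_mol: float, temperature: float) -> float:
    """Rate constant (1/s) from an activation free energy given in kJ/mol."""
    barrier_j_mol = barrier_kj_mol * 1000.0
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_j_mol / (R * temperature))


def unimolecular_half_life(rate_constant: float) -> float:
    """Half-life (s) of a first-order process, t_1/2 = ln(2) / k."""
    return math.log(2.0) / rate_constant


k_100 = eyring_rate_constant(100.0, 298.15)
t_half = unimolecular_half_life(k_100)
```

For a barrier of 100 kJ mol⁻¹ at 298.15 K, the half-life comes out at roughly 3.7 × 10⁴ s, which is close to the $t_{\max}$ value listed in Table 2.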

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$ that fulfills the condition that its largest eigenvalue equals $\sigma^2_{A,\max}$. Defining $\sigma^2_{A,\max}$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\max}$.

1. Generate a random $(N \times N)$-dimensional matrix $\mathbf{P}$ with elements that are uniformly sampled from $[-0.5, +0.5]$. $N$ is the number of vertices in the network.

2. The multiplication of $\mathbf{P}$ with its transpose leads to an $(N \times N)$-dimensional matrix $\mathbf{Q} = \mathbf{P}\mathbf{P}^{\top}$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $\mathbf{Q}$,

$$\Sigma_A = \mathbf{Q}\,\frac{\langle\sigma_A\rangle^2}{\max_i E_{ii}},$$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
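The three steps above can be sketched in plain Python as follows. To keep the example free of external dependencies, the largest eigenvalue is estimated by power iteration, which is a substitution made here for illustration; a dense symmetric eigenvalue solver such as `numpy.linalg.eigvalsh` would be a natural alternative. All function names are hypothetical.

```python
import math
import random


def mat_vec(matrix, vec):
    """Product of a square matrix (list of rows) with a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def largest_eigenvalue(matrix, iterations=2000):
    """Estimate the largest eigenvalue of a symmetric PSD matrix (power iteration)."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        new = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            return 0.0
        vec = [x / norm for x in new]
    # Rayleigh quotient of the normalized iterate.
    return sum(a * b for a, b in zip(vec, mat_vec(matrix, vec)))


def random_covariance(n, sigma, rng):
    """Random covariance matrix whose largest eigenvalue equals sigma**2."""
    # Step 1: random matrix P with entries uniform in [-0.5, +0.5].
    p = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    # Step 2: Q = P P^T is symmetric and positive semidefinite.
    q = [[sum(p[i][k] * p[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]
    # Step 3: rescale Q so that its largest eigenvalue becomes sigma**2.
    e_max = largest_eigenvalue(q)
    if e_max == 0.0:
        raise ValueError("degenerate random matrix; draw again")
    scale = sigma ** 2 / e_max
    return [[value * scale for value in row] for row in q]


# Example with 6 vertices and sigma = 2.09 kJ/mol (cf. Table 2); seed is arbitrary.
cov = random_covariance(6, 2.09, random.Random(0))
```

The diagonal elements of `cov` are the variances of the individual vertex free energies; each is bounded from above by the largest eigenvalue.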

Note that in a practical setup, we expect the user to provide $B + 1$ k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

| Input parameter | CRN-X | CRN-0 | CRN-B |
| AutoNetGen | | | |
| $B + 1$ | 25 | 1 | 6 |
| $T$ / K | 298.15 | 298.15 | 298.15 |
| $\mu(A^{-} - A^{+})$ / (kJ mol⁻¹) | −25 | −25 | −25 |
| $\sigma(A^{-} - A^{+})$ / (kJ mol⁻¹) | 50 | 50 | 50 |
| $\min(A^{\ddagger} - A^{+/-})$ / (kJ mol⁻¹) | 0 | 0 | 0 |
| $\max(A^{\ddagger} - A^{+/-})$ / (kJ mol⁻¹) | 100 | 100 | 100 |
| $\langle\sigma_A\rangle$ / (kJ mol⁻¹) | 2.09 | 2.09 | 2.09 |
| $L_{\max}$ | 125 | 100 | 100 |
| $\mu_{\exp}(N_{\mathrm{uni}})$ | 5 | 5 | 5 |
| $\mu_{\exp}(N_{\mathrm{bi}})$ | 2 | 2 | 2 |
| $p(V_{\mathrm{new}})$ | 0.5 | 0.1 | 0.1 |
| KiNetX | | | |
| $t_{\max}$ / s | 36954 | 36954 | 36954 |
| $\varepsilon_y$ / (mol L⁻¹) | 0.01 | 0.01 | 0.01 |
| $G_{\min}$ / (mol L⁻¹) | 1.0 × 10⁻³ | 1.0 × 10⁻⁵ | 1.0 × 10⁻⁵ |
| $U$ | 1000 | 1000 | 1000 |
| $\gamma$ | 0 | 1 | 1 |
| $C$ | 5 | 1 | 5 |

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.


[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.


[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.


[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package"; Technical Report SAND-80-8003; Sandia National Labs: Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics"; Technical Report SAND-89-8009; Sandia National Labs: Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics"; Technical Report SAND-96-8216; Sandia National Labs: Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes"; Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI: a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook
concentration profiles to reliably suggest product distributions (based on $z_l^{\delta}$) or specific reaction mechanisms (based on $z_l^{\Delta}$). For an actual chemical reaction network derived from first-principles reaction data, one would reevaluate the critical rate constants with a benchmark quantum chemical model.

2.5 KiNetX-Guided Network Exploration

In Sections 2.4.1 to 2.4.3, we have introduced the entire KiNetX workflow, which analyzes one network at a time. However, given that electronic structure calculations usually require much more computational resources than a kinetic analysis, it is rather impractical to explore all relevant elementary steps prior to a kinetic analysis (not to mention that this strategy may even be infeasible). For this reason, we designed KiNetX to suggest the next exploration step, which may significantly increase the possible exploration depth.

The coupling of kinetic modeling with mechanism generation is well known in the reaction engineering community. It was introduced by Broadbelt, Green, and co-workers [60] and recently revisited by Green and co-workers [61] for their Reaction Mechanism Generator [13], an exploration software originally designed for the study of gas-phase reactions (in particular, combustion) [62], which has recently been extended to the kinetics of solution-phase chemistry [63]. Here, we follow a similar strategy but additionally take into account the correlated uncertainty in the rate constants. Note that the temperatures relevant in the field of combustion are much higher than what is usual in solution-phase chemistry. In classical rate theories, a free-energy error enters the thermal rate constant through the exponent $-A^{\ddagger}/(RT)$, so the resulting uncertainty of the rate constant decreases with increasing temperature; k-uncertainty is therefore usually neglected in combustion studies. This relationship explains why the Reaction Mechanism Generator does not take k-uncertainty into account.

We start from the reactants (species with nonzero initial concentrations) and attempt to find all direct products, i.e., intermediates that are formed by a single elementary reaction of the reactants. The reactants are considered active, whereas the direct products are considered inactive. Only active species are considered in the exploration procedure, i.e., at this point, all possible reaction steps have been discovered (according to the exploration algorithm employed). To estimate which of the inactive species is potentially the most relevant for the mechanism, we focus on the formation rate of each inactive species (indicated by an asterisk),

$$\overrightarrow{g}_{n^*}(t_u) = s^{+}_{n^*} f^{-}(t_u) + s^{-}_{n^*} f^{+}(t_u) \tag{25}$$

Note that $s^{+}_{n^*}$ is multiplied with $f^{-}(t)$ since, in a backward (−) reaction, the left-hand-side species (+) are formed. An equivalent argument holds for the multiplication of $s^{-}_{n^*}$ with $f^{+}(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

$$\overrightarrow{g}^{\max}_{n^*} = \max\left\{ \overrightarrow{g}_{n^*}(t_0),\ \overrightarrow{g}_{n^*}(t_1),\ \cdots,\ \overrightarrow{g}_{n^*}(t_U) \right\} \tag{26}$$

The species with the highest value of $\overrightarrow{g}^{\max}_{n^*}$ will be promoted to active. Note that if an ensemble of k-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

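$$\eta\left(\overrightarrow{g}^{\max}_{n^*}\right) = \sqrt{\left\langle \overrightarrow{g}^{\max}_{n^*} \right\rangle^{2} + \frac{1}{B-1} \sum_{b=1}^{B} \left( \overrightarrow{g}^{\max}_{n^*,b} - \left\langle \overrightarrow{g}^{\max}_{n^*} \right\rangle \right)^{2}} \tag{27}$$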

that takes into account both the ensemble average of maximum formation rates

$$\left\langle \overrightarrow{g}^{\max}_{n^*} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \overrightarrow{g}^{\max}_{n^*,b} \tag{28}$$

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
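The ranking defined by Eqs. (25) to (28) can be illustrated with a short sketch. It is not the KiNetX implementation; the function names, array layouts, and toy numbers below are assumptions made only for this example.

```python
import numpy as np

def max_formation_rates(s_plus, s_minus, f_plus, f_minus):
    """Eqs. (25)-(26): maximum formation rate of each inactive species.

    s_plus, s_minus : (n_inactive, n_reactions) stoichiometric coefficients of the
        inactive species on the left-hand (+) and right-hand (-) side (assumed layout).
    f_plus, f_minus : (n_reactions, n_times) forward and backward reaction rates.
    """
    g = s_plus @ f_minus + s_minus @ f_plus   # formation rate at every time point
    return g.max(axis=1)                      # maximum over t_0 ... t_U

def rank_inactive_species(g_max_ensemble):
    """Eqs. (27)-(28): eta for each inactive species from B ensemble members.

    g_max_ensemble : (B, n_inactive) maximum formation rates, one row per k-vector.
    """
    mean = g_max_ensemble.mean(axis=0)         # Eq. (28)
    var = g_max_ensemble.var(axis=0, ddof=1)   # sample variance with 1/(B - 1)
    eta = np.sqrt(mean**2 + var)               # Eq. (27)
    return eta, int(np.argmax(eta))

# Toy ensemble (hypothetical): two inactive species with equal mean but different spread.
g_max = np.array([[1.0, 0.2],
                  [1.0, 1.8],
                  [1.0, 1.0]])
eta, promoted = rank_inactive_species(g_max)

# Toy single-model rates (hypothetical) for one inactive species and two reactions.
g_single = max_formation_rates(np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]),
                               np.array([[1.0, 2.0], [3.0, 4.0]]),
                               np.array([[0.5, 0.1], [0.0, 0.0]]))
```

In the toy ensemble, both species have the same mean maximum formation rate, so the second species is promoted only because of its larger variance, which is the behavior described above.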

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\overrightarrow{g}^{\max}_{n^*}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold: In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between $t = 0$ and $t = t_{\max}$) are generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

  • The number of samples $B$ to be drawn from $\Sigma_A$.

  • A thermostat temperature $T$ for the calculation of rate constants.

  • The average free-energy increase/decrease, $\mu(A^{-} - A^{+})$, for a new intermediate state (−) formed by reaction of an intermediate state already present in the network (+).

  • The average difference between the free energies of a left-hand-side intermediate state (+) and its corresponding right-hand-side intermediate state (−), $\sigma(A^{-} - A^{+})$.

  • A minimum activation free energy, $\min(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

  • A maximum activation free energy, $\max(A^{\ddagger} - A^{+/-})$, with respect to the higher-energy intermediate state.

  • The average free-energy uncertainty, $\langle \sigma_A \rangle$ (required for generating an ensemble of kinetic models).

  • The maximum number of edges, $L_{\max}$, to be generated.

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of $N = 103$ vertices and $L = 118$ edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L⁻¹ and 0.50 mol L⁻¹, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of $B + 1 = 25$ k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol⁻¹. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol⁻¹. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature $T = 298.15$ K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.
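To give a feeling for what an activation free-energy uncertainty of about 1.1 kcal mol⁻¹ means for a rate constant, the following minimal sketch evaluates the standard Eyring expression $k = (k_\mathrm{B} T / h) \exp(-\Delta A^{\ddagger} / RT)$ with a transmission coefficient of one. The function names are hypothetical, and the comparison temperature of 1500 K is an illustrative choice that does not appear in the paper.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J K^-1
H = 6.62607015e-34     # Planck constant, J s
R = 8.314462618        # molar gas constant, J mol^-1 K^-1
KCAL = 4184.0          # J per kcal

def eyring_rate_constant(barrier_j_per_mol, temperature):
    """Eyring rate constant in s^-1 (unimolecular step, transmission coefficient 1)."""
    return (K_B * temperature / H) * math.exp(-barrier_j_per_mol / (R * temperature))

def rate_uncertainty_factor(sigma_j_per_mol, temperature):
    """Factor by which k changes if the barrier is shifted by sigma."""
    return math.exp(sigma_j_per_mol / (R * temperature))

T_ROOM = 298.15   # K, temperature used in the text
T_HOT = 1500.0    # K, illustrative high temperature (assumption)

factor_room = rate_uncertainty_factor(1.1 * KCAL, T_ROOM)
factor_hot = rate_uncertainty_factor(1.1 * KCAL, T_HOT)

# Consistency check: shifting a barrier by sigma scales k by exactly that factor.
barrier = 15.0 * KCAL   # hypothetical barrier height
ratio = (eyring_rate_constant(barrier, T_ROOM)
         / eyring_rate_constant(barrier + 1.1 * KCAL, T_ROOM))
```

Under these assumptions, the same free-energy error changes the rate constant by a factor of roughly six at room temperature but by less than a factor of two at the higher temperature, which illustrates why the spread among the sampled kinetic models can be large in room-temperature solution chemistry.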

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L⁻¹ during the course of the reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50 % of the initial concentration remains at $t = t_{\max}$. Also, the potential main product 33 reveals final concentrations ranging between about 0.3 mol L⁻¹ and 0.9 mol L⁻¹, the observable value of which will in turn affect the concentrations of the potential side products 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of $\varepsilon_y = 0.01$ mol L⁻¹, we find that a DFA-based network reduction with $G_{\min} = 1.0 \times 10^{-3}$ mol L⁻¹ is reliable with respect to both $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Here, we chose the minimum confidence level of $\gamma = 0$ for the sake of clarity. The smaller $\gamma$, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose $G_{\min}$ to be one order of magnitude smaller than $\varepsilon_y$. The reason is that $G_{\min}$ is assessed on a species-by-species basis, whereas $\varepsilon_y$ is compared against the quantities $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$, which measure the concentration error summed over all species. One may resolve this situation by dividing $G_{\min}$ by $N$, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20 % of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA analysis varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For $C = 5$ Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeding $\varepsilon_y$ with respect to the quantile specified by $\gamma$. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L⁻¹. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L⁻¹, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval $t_u - t_{u-1}$, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42 (the other dominant side product) via 30 and 14. In summary, the dominant reaction pathways read:

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which makes any flux-based network reduction at that stage risky. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than $\varepsilon_y$, as measured by $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40 %, which is significantly below the 80 % we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.
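The quoted reduction percentages can be reproduced with simple arithmetic, assuming that "network elements" means the sum of vertices and edges (this counting convention is our reading of the text, not something it states explicitly). The helper name below is hypothetical.

```python
def remaining_fraction(n_vertices, n_edges, n_vertices_full, n_edges_full):
    """Fraction of network elements (vertices + edges) kept relative to the full network."""
    return (n_vertices + n_edges) / (n_vertices_full + n_edges_full)

FULL = (103, 118)                                  # CRN-X: N = 103 vertices, L = 118 edges

dfa_kept = remaining_fraction(18, 21, *FULL)       # sparse network after DFA-based reduction
guided_kept = remaining_fraction(63, 68, *FULL)    # network after KiNetX-guided exploration

dfa_reduction = 1.0 - dfa_kept
guided_reduction = 1.0 - guided_kept
```

With these numbers, the DFA-reduced network keeps a little under 20 % of the elements (a reduction of roughly 80 %), and the guided exploration yields a reduction of roughly 40 %, in line with the rounded values given above.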

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks explored without kinetic guidance is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol⁻¹ with a standard deviation of 0.4 kcal mol⁻¹. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $k_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35 %. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several instead of a single solution naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to $1.0 \times 10^{-17}$ mol L⁻¹ s⁻¹ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, $t_{\max}$), we obtain a flux of $3.7 \times 10^{-13}$ mol L⁻¹, which is very small compared to $\varepsilon_y = 0.01$ mol L⁻¹. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5: There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.

Table 1: Mean values (µ) and standard deviations (σ) of properties ζ obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                      µ(ζ_CRN-0)   µ(ζ_CRN-B)   σ(ζ_CRN-0)   σ(ζ_CRN-B)
exploration steps               12           14           7            7
critical reactions              5            10           4            7
L_expl                          54           60           26           27
N_expl                          29           31           11           11
L_red                           30           37           18           21
N_red                           17           20           8            8
δ^B_expl(t_max) / mol L⁻¹       n/a          0.14         n/a          0.13
Δ^B_expl / mol L⁻¹              n/a          0.15         n/a          0.13
δ^B_refined(t_max) / mol L⁻¹    n/a          <0.01        n/a          0.01
Δ^B_refined / mol L⁻¹           n/a          <0.01        n/a          0.01

DFA-based network reduction with a flux threshold of $G_{\min} = 1.0 \times 10^{-5}$ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument of the last paragraph, the consideration of several solutions increases the likelihood to identify kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40 % (75 % compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75 % to 90 %, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28 % of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of $\max\{\delta^{B}(t_{\max}), \Delta^{B}\}$ from 0.15 mol L⁻¹ to < 0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all $B + 1 = 6$ samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
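Why absolute rather than relative free energies should be averaged can be seen in a small numerical sketch. It uses a hypothetical three-state cycle with random free energies (not data from the paper) and checks one consequence of microscopic reversibility: the reaction free energies, i.e., forward minus backward barriers, must sum to zero around any closed cycle in every kinetic model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models = 6                               # B + 1 = 6 kinetic models, as in the text

# Hypothetical cycle 0 <-> 1 <-> 2 <-> 0 with one transition state per edge.
edges = [(0, 1), (1, 2), (2, 0)]
A_min = rng.normal(0.0, 1.0, size=(n_models, 3))          # absolute free energies of vertices
A_ts = A_min.max(axis=1, keepdims=True) + rng.uniform(5.0, 15.0, size=(n_models, 3))

def barriers(a_min, a_ts):
    """Forward and backward activation free energies for every model and edge."""
    fwd = np.stack([a_ts[:, e] - a_min[:, i] for e, (i, j) in enumerate(edges)], axis=1)
    bwd = np.stack([a_ts[:, e] - a_min[:, j] for e, (i, j) in enumerate(edges)], axis=1)
    return fwd, bwd

def cycle_imbalance(fwd, bwd):
    """Sum of reaction free energies around the cycle; zero if thermodynamically consistent."""
    return (fwd - bwd).sum(axis=1)

# (a) Refine edge 0 by averaging ABSOLUTE free energies of its vertices and transition state.
A_min_ref, A_ts_ref = A_min.copy(), A_ts.copy()
A_min_ref[:, [0, 1]] = A_min[:, [0, 1]].mean(axis=0)
A_ts_ref[:, 0] = A_ts[:, 0].mean()
fwd_abs, bwd_abs = barriers(A_min_ref, A_ts_ref)

# (b) Refine edge 0 by averaging RELATIVE free energies (the two barriers) only.
fwd_rel, bwd_rel = barriers(A_min, A_ts)
fwd_rel[:, 0] = fwd_rel[:, 0].mean()
bwd_rel[:, 0] = bwd_rel[:, 0].mean()

imbalance_abs = cycle_imbalance(fwd_abs, bwd_abs)   # stays zero in every model
imbalance_rel = cycle_imbalance(fwd_rel, bwd_rel)   # generally nonzero
```

In variant (a), the refined barrier is identical in all models and every model remains consistent because all barriers are still derived from state energies. In variant (b), the refined edge no longer matches the unrefined neighboring edges, so the cycle does not close.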

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of $\max\{\delta_{pq}(t_{\max}), \Delta_{pq}\}$ comparing the CRN-0 solutions against the "exact" solutions amounts to $15.7 \times 10^{-3}$ mol L⁻¹, whereas it amounts to only $1.3 \times 10^{-3}$ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX for a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge to generate a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered 5 additional kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be questionable to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, i.e., highly noise-inducing, reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may be strongly dependent on the underlying graph structure and on the distribution of activation free energies and rate constants as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time $t_{\max}$, which we set to the half-life of a unimolecular reaction whose rate constant corresponds to a barrier height of 100 kJ mol⁻¹. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].
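The reaction time implied by the second argument can be estimated with a short calculation. The sketch below assumes Eyring's expression with a transmission coefficient of one and a temperature of 298.15 K (the value quoted in Section 3.1; the temperature used for the 100 networks is an assumption here). The function name is hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J K^-1
H = 6.62607015e-34     # Planck constant, J s
R = 8.314462618        # molar gas constant, J mol^-1 K^-1

def eyring_rate_constant(barrier_j_per_mol, temperature):
    """Eyring rate constant in s^-1 (unimolecular step, transmission coefficient 1)."""
    return (K_B * temperature / H) * math.exp(-barrier_j_per_mol / (R * temperature))

T = 298.15                                   # K (assumed)
k_100 = eyring_rate_constant(100.0e3, T)     # barrier height of 100 kJ/mol
t_max = math.log(2.0) / k_100                # half-life of a first-order process, s

# Flux accumulated over t_max at the smallest recorded formation rate (Section 3.2).
flux = 1.0e-17 * t_max                       # mol/L
```

Under these assumptions, the half-life is a few times 10⁴ s (about ten hours), and the accumulated flux comes out at a few times 10⁻¹³ mol L⁻¹, which agrees with the order of magnitude reported in Section 3.2.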

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the (x + 1)-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P), defined by the $N_x$ species of the x-th network layer; $N_x$ homobimolecular intermediate states (type 2A → P); and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_\mathrm{exp}$. The user can specify two different values for $\mu_\mathrm{exp}$: one for unimolecular reactions, $\mu_\mathrm{exp,uni}$, and one for bimolecular reactions, $\mu_\mathrm{exp,bi}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_\mathrm{new})$, ranging from zero to one. If $p(V_\mathrm{new}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_\mathrm{new}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), 2L activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is obtained by sampling a free-energy difference from a normal distribution with mean $\mu(A^- - A^+)$ and standard deviation $\sigma(A^- - A^+)$ and adding it to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained by sampling a barrier from a uniform distribution with bounds $\min(A^\ddagger - A^{+/-})$ and $\max(A^\ddagger - A^{+/-})$ and adding it to the free energy of the higher-energy side of that edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
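Steps 1 and 2 above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the AutoNetGen implementation: the function names are hypothetical, and the decision between a new vertex and an existing one is modeled as a simple Bernoulli draw with probability $p(V_\mathrm{new})$.

```python
import math
import random


def count_reactive_states(n_total, n_new):
    """Count candidate reactive states for the next layer (step 1).

    n_total: number of vertices registered so far (N).
    n_new:   number of vertices in the most recent layer (N_x).
    """
    hetero_new_new = n_new * (n_new - 1) // 2   # new species among each other
    hetero_new_old = n_new * (n_total - n_new)  # new species with old species
    return {
        "uni": n_new,        # A -> P
        "homo_bi": n_new,    # 2A -> P
        "hetero_bi": hetero_new_new + hetero_new_old,  # A + B -> P
    }


def sample_channel_count(mean, rng):
    """Number of reaction channels: floor of an exponential draw (step 2)."""
    return math.floor(rng.expovariate(1.0 / mean))


def choose_targets(n_channels, p_new, rng):
    """Per channel: True = create a new vertex, False = relink to an old one.

    Hypothetical helper; assumes independent draws with probability p_new.
    """
    return [rng.random() < p_new for _ in range(n_channels)]


rng = random.Random(0)
states = count_reactive_states(n_total=10, n_new=4)
n_channels = sample_channel_count(5.0, rng)
targets = choose_targets(n_channels, p_new=0.5, rng=rng)
```

With `p_new=0.0` every channel relinks to an existing vertex, and with `p_new=1.0` every channel creates a new one, which mirrors the limiting cases described in step 2. Mass conservation is not modeled in this sketch.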

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$
k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right), \tag{29}
$$

where $h$, $k_\mathrm{B}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
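As an illustration of Eq. (29), the following sketch evaluates the Eyring rate constant for a unimolecular step and the corresponding half life. The function names are ours, and the physical constants are the current SI values, which may differ slightly from those used in the original calculations. For a barrier of 100 kJ mol⁻¹ at 298.15 K, the half life comes out at roughly 3.7 × 10⁴ s, the same order as the reaction time $t_\mathrm{max}$ listed in Table 2.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 8.314462618      # gas constant, J/(mol K)


def eyring_rate_constant(barrier_kj_per_mol, temperature_k=298.15):
    """Unimolecular rate constant (1/s) from an activation free energy in kJ/mol."""
    barrier_j_per_mol = barrier_kj_per_mol * 1000.0
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_j_per_mol / (R * temperature_k))


def half_life(rate_constant):
    """Half life (s) of a first-order process with the given rate constant."""
    return math.log(2.0) / rate_constant


k_100 = eyring_rate_constant(100.0)
t_half = half_life(k_100)
```

Because the barrier enters exponentially, small changes in the constants or the temperature shift the half life noticeably, so an exact match with a tabulated value should not be expected.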

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$, which fulfills the condition that its largest eigenvalue equals $\sigma^2_{A,\mathrm{max}}$. Defining $\sigma^2_{A,\mathrm{max}}$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\mathrm{max}}$.

1. Generate a random (N × N)-dimensional matrix $P$ with elements that are uniformly sampled from [−0.5, +0.5]. N is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an (N × N)-dimensional matrix $Q = PP^\top$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$
\Sigma_A = Q \, \frac{\langle\sigma_A\rangle^2}{\max_i E_{ii}},
$$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
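The three-step recipe can be sketched with NumPy as follows. The function names are hypothetical, and drawing the vertex free energies from a multivariate normal distribution is our assumption; the text only states that they are sampled from their nominal values and $\Sigma_A$.

```python
import numpy as np


def random_covariance(n_vertices, sigma_mean, seed=None):
    """Random covariance matrix whose largest eigenvalue equals sigma_mean**2."""
    rng = np.random.default_rng(seed)
    # Step 1: uniform random matrix with elements in [-0.5, +0.5)
    p = rng.uniform(-0.5, 0.5, size=(n_vertices, n_vertices))
    # Step 2: P P^T is symmetric and positive semidefinite
    q = p @ p.T
    # Step 3: rescale so that the largest eigenvalue becomes sigma_mean**2
    largest = np.linalg.eigvalsh(q).max()
    return q * sigma_mean**2 / largest


def sample_free_energies(nominal, cov, n_samples, seed=None):
    """Draw correlated vertex free energies (assumed multivariate normal)."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(nominal, cov, size=n_samples)


cov = random_covariance(n_vertices=6, sigma_mean=2.09, seed=1)
samples = sample_free_energies(np.zeros(6), cov, n_samples=5, seed=2)
```

Since the largest eigenvalue bounds every diagonal element of a symmetric positive-semidefinite matrix, no single vertex variance can exceed `sigma_mean**2` in this construction.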

Note that in a practical setup, we expect the user to provide B + 1 k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                  CRN-X         CRN-0         CRN-B

AutoNetGen
B + 1                            25            1             6
T / K                            298.15        298.15        298.15
μ(A⁻ − A⁺) (kJ mol⁻¹)            −25           −25           −25
σ(A⁻ − A⁺) (kJ mol⁻¹)            50            50            50
min(A‡ − A⁺/⁻) (kJ mol⁻¹)        0             0             0
max(A‡ − A⁺/⁻) (kJ mol⁻¹)        100           100           100
⟨σ_A⟩ (kJ mol⁻¹)                 2.09          2.09          2.09
L_max                            125           100           100
μ_exp(N_uni)                     5             5             5
μ_exp(N_bi)                      2             2             2
p(V_new)                         0.5           0.1           0.1

KiNetX
t_max / s                        36954         36954         36954
ε_y (mol L⁻¹)                    0.01          0.01          0.01
G_min (mol L⁻¹)                  1.0 × 10⁻³    1.0 × 10⁻⁵    1.0 × 10⁻⁵
U                                1000          1000          1000
γ                                0             1             1
C                                5             1             5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.


[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.


[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.


[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. “CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package”; Technical Report SAND–80-8003; Sandia National Labs: Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. “Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics”; Technical Report SAND-89-8009; Sandia National Labs: Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. “CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics”; Technical Report SAND-96-8216; Sandia National Labs: Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. “Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes”, Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.


[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI – a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple “Boxed Molecular Kinetics” Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] “SCINE: Software for Chemical Interaction Networks”, ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.


[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] “MATLAB”, Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.


[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.


of $s^{-}_{n^*}$ with $f^{+}(t)$. If a species reveals a very high formation rate at a specific time during the course of the reaction, we assume that this species may be part of relevant reaction channels [60]. Hence, we are interested in the maximum formation rate of each inactive species with respect to all time points $t_u$,

$$
\overrightarrow{g}^{\max}_{n^*} = \max\left\{ \overrightarrow{g}_{n^*}(t_0), \overrightarrow{g}_{n^*}(t_1), \cdots, \overrightarrow{g}_{n^*}(t_U) \right\}. \tag{26}
$$

The species with the highest value of $\overrightarrow{g}^{\max}_{n^*}$ will be promoted to active. Note that if an ensemble of k-vectors is provided, there is more than one maximum formation rate for each inactive species. In this case, we rank the inactive species based on a simple statistical model,

$$
\eta\left(\overrightarrow{g}^{\max}_{n^*}\right) = \sqrt{ \left\langle \overrightarrow{g}^{\max}_{n^*} \right\rangle^2 + \frac{1}{B - 1} \sum_{b=1}^{B} \left( \overrightarrow{g}^{\max}_{n^*,b} - \left\langle \overrightarrow{g}^{\max}_{n^*} \right\rangle \right)^2 }, \tag{27}
$$

that takes into account both the ensemble average of maximum formation rates,

$$
\left\langle \overrightarrow{g}^{\max}_{n^*} \right\rangle = \frac{1}{B} \sum_{b=1}^{B} \overrightarrow{g}^{\max}_{n^*,b}, \tag{28}
$$

and the corresponding variance. If two species exhibit the same average maximum formation rate but significantly different variances, the high-variance case will be favored, as the corresponding species may lead us to potentially more important regions of the reaction space to be explored.
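The ranking statistic of Eqs. (27) and (28) can be illustrated with a short NumPy sketch. The function name and the example numbers are ours and serve only to show how two species with equal mean but different spread are ranked; at least two samples are needed for the variance term.

```python
import numpy as np


def ranking_score(max_rates):
    """Score per inactive species from an ensemble of maximum formation rates.

    max_rates: array of shape (B, n_inactive); row b holds the maximum
    formation rate of every inactive species for sampled model b.
    """
    g = np.asarray(max_rates, dtype=float)
    mean = g.mean(axis=0)            # ensemble average, Eq. (28)
    var = g.var(axis=0, ddof=1)      # sample variance with 1/(B - 1)
    return np.sqrt(mean**2 + var)    # Eq. (27)


# Hypothetical data: three sampled models, two inactive species
rates = np.array([
    [1.0, 0.2],
    [1.0, 1.8],
    [1.0, 1.0],
])
scores = ranking_score(rates)
best = int(np.argmax(scores))  # index of the species to be promoted to active
```

Both species have an average maximum formation rate of 1.0, but the second one varies across the ensemble and therefore receives the higher score.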

In the original algorithm by Susnow et al. [60], which is very similar to the one presented here but does not take into account ensembles of kinetic models, the exploration stops if all inactive species reveal values of $\overrightarrow{g}^{\max}_{n^*}$ that are below a user-defined rate threshold. Grenda et al. [64] discussed an important limitation of the rate threshold. In certain cases, e.g., where an autocatalytic cycle is an integral part of the reaction mechanism, some species that are of central mechanistic importance may reveal very small maximum formation rates during the exploration procedure. In such cases, the rate threshold may need to be chosen very small to achieve mechanistic completeness, which renders it basically impossible to choose a reasonable threshold for a yet unknown reaction. In the most inefficient case, a kinetics-guided exploration would lead to the same reaction network as a nonguided exploration.

2.6 Chemistry-Mimicking Networks from AutoNetGen

AutoNetGen generates chemistry-mimicking networks endowed with parameters (both activation free energies and rate constants) in a layer-by-layer fashion. The first layer represents the reactants, which need to be specified on input. A new layer is formed combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between $t = 0$ and $t = t_\mathrm{max}$) are being generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

• The number of samples $B$ to be drawn from $\Sigma_A$.

• A thermostat temperature $T$ for the calculation of rate constants.

• The average free-energy increase/decrease, $\mu(A^- - A^+)$, for a new intermediate state (−) formed by reaction of an intermediate state already present in the network (+).

• The standard deviation of the free-energy difference between a left-hand-side intermediate state (+) and its corresponding right-hand-side intermediate state (−), $\sigma(A^- - A^+)$.

• A minimum activation free energy, $\min(A^\ddagger - A^{+/-})$, with respect to the higher-energy intermediate state.

• A maximum activation free energy, $\max(A^\ddagger - A^{+/-})$, with respect to the higher-energy intermediate state.

• The average free-energy uncertainty, $\langle\sigma_A\rangle$ (required for generating an ensemble of kinetic models).

• The maximum number of edges, $L_\mathrm{max}$, to be generated.

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of N = 103 vertices and L = 118 edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L⁻¹ and 0.50 mol L⁻¹, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of B + 1 = 25 k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol⁻¹. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol⁻¹. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature T = 298.15 K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L⁻¹ during the course of reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at $t = t_\mathrm{max}$. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L⁻¹ and 0.9 mol L⁻¹, the observable value of which will, in turn, affect the concentrations of the potential side products 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of $\varepsilon_y$ = 0.01 mol L⁻¹, we find that a DFA-based network reduction with $G_\mathrm{min}$ = 1.0 × 10⁻³ mol L⁻¹ is reliable with respect to both $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. Here, we chose the minimum confidence level of $\gamma = 0$ for the sake of clarity. The smaller $\gamma$, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose $G_\mathrm{min}$ to be one order of magnitude smaller than $\varepsilon_y$. The reason is that $G_\mathrm{min}$ is assessed on a species-by-species basis, whereas $\varepsilon_y$ is compared against the quantities $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$, which measure the concentration error summed over all species. One may resolve this situation by dividing $G_\mathrm{min}$ by N, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA analysis varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X, consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For C = 5 Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for $\delta_{pq}(t_\mathrm{max})$ and/or $\Delta_{pq}$ exceeding $\varepsilon_y$ with respect to the quantile specified by $\gamma$. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product with a final concentration of about 0.8 mol L⁻¹. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L⁻¹, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval $t_u - t_{u-1}$, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42 (the other dominant side product) via 30 and 14. In summary, the dominant reaction pathways read:

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction at that stage problematic. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than $\varepsilon_y$ as measured by $\delta_{pq}(t_\mathrm{max})$ and $\Delta_{pq}$. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.
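The two reduction percentages quoted for CRN-X can be reproduced with a short calculation. The sketch assumes that "network elements" counts vertices and edges together, which is our reading of the text rather than a stated definition; the function name is hypothetical.

```python
def reduction_percent(n_vertices, n_edges, n_vertices_full, n_edges_full):
    """Percentage of network elements (vertices + edges) removed."""
    kept = n_vertices + n_edges
    full = n_vertices_full + n_edges_full
    return 100.0 * (1.0 - kept / full)


# Full CRN-X: 103 vertices and 118 edges
guided = reduction_percent(63, 68, 103, 118)  # KiNetX-guided exploration
dfa = reduction_percent(18, 21, 103, 118)     # DFA-based sparse network
```

Under this assumption, the guided exploration removes roughly 41% of the elements and the DFA-based reduction roughly 82%, in line with the rounded values of about 40% and 80%.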

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol⁻¹ with a standard deviation of 0.4 kcal mol⁻¹. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $k_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

Table 1: Mean values (μ) and standard deviations (σ) of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                        μ(ζ_CRN-0)   μ(ζ_CRN-B)   σ(ζ_CRN-0)   σ(ζ_CRN-B)
exploration steps                 12           14           7            7
critical reactions                5            10           4            7
L_expl                            54           60           26           27
N_expl                            29           31           11           11
L_red                             30           37           18           21
N_red                             17           20           8            8
δ^B_expl(t_max) / mol L⁻¹         -            0.14         -            0.13
Δ^B_expl / mol L⁻¹                -            0.15         -            0.13
δ^B_refined(t_max) / mol L⁻¹      -            <0.01        -            0.01
Δ^B_refined / mol L⁻¹             -            <0.01        -            0.01

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several instead of a single solution naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to 1.0 × 10⁻¹⁷ mol L⁻¹ s⁻¹ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, t_max), we obtain a flux of 3.7 × 10⁻¹³ mol L⁻¹, which is very small compared to ε_y = 0.01 mol L⁻¹. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.
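As a quick plausibility check of the two figures quoted in this paragraph, the following worked arithmetic uses only the averages stated in the text and the value of t_max listed in Table 2. The inputs are rounded ensemble averages, so the results are approximate.

```latex
\begin{align*}
\text{reduction of network elements (CRN-B)}
  &= 1 - \frac{31 + 60}{45 + 100} = 1 - \frac{91}{145} \approx 0.37, \\
\text{flux over } [0, t_{\max}]
  &= \left(1.0 \times 10^{-17}\ \mathrm{mol\,L^{-1}\,s^{-1}}\right) \times 36954\ \mathrm{s}
   \approx 3.7 \times 10^{-13}\ \mathrm{mol\,L^{-1}}.
\end{align*}
```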

DFA-based network reduction with a flux threshold of G_min = 1.0 × 10⁻⁵ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument of the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of max{δ^B(t_max), Δ^B} from 0.15 mol L⁻¹ to < 0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all B + 1 = 6 samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
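The mimicked refinement can be illustrated with a minimal sketch. The toy energies, the dictionary layout, and the function names `refine` and `barriers` are assumptions made for illustration only and are not part of KiNetX. The point shown is that averaging absolute free energies and then deriving both barriers from them keeps the forward and backward barriers consistent with the reaction free energy.

```python
from statistics import mean

# Hypothetical toy ensemble: absolute free energies (kJ/mol) of two vertices
# and of the edge (transition state) connecting them, for three samples.
samples = [
    {"v1": 0.0, "v2": -20.0, "e12": 60.0},
    {"v1": 2.0, "v2": -23.0, "e12": 64.0},
    {"v1": -2.0, "v2": -17.0, "e12": 59.0},
]


def refine(samples, keys):
    """Replace the absolute free energies named in `keys` by their ensemble mean."""
    means = {k: mean(s[k] for s in samples) for k in keys}
    return [{**s, **means} for s in samples]


def barriers(sample):
    """Forward and backward activation free energies from absolute values."""
    return sample["e12"] - sample["v1"], sample["e12"] - sample["v2"]


refined = refine(samples, ["v1", "v2", "e12"])
forward, backward = barriers(refined[0])
# After refinement, every sample yields the same pair of barriers, and
# forward - backward equals the reaction free energy v2 - v1.
```

Because both barriers are computed from the same set of absolute values, their difference cannot drift away from the reaction free energy, which is the consistency requirement referred to above.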

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of max{δ_pq(t_max), Δ_pq} comparing the CRN-0 solutions against the "exact" solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered an additional number of 5 kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be ambiguous to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties, such as the percentage of critical (highly noise-inducing) reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps, as they may be strongly dependent on the underlying graph structure and on the distribution of activation free energies or rate constants as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is in principle able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time t_max, which we set to the half-life of a unimolecular reaction whose rate constant corresponds to a barrier height of 100 kJ mol⁻¹. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems several activation barriers may be much higher in energy, and hence we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the (x + 1)-th network layer, we first define all potentially reactive intermediate states. At that stage, we register N = N_0 + ··· + N_x vertices. Since we already explored all possible reactions between the first N − N_x species, we will only consider the following potentially reactive intermediate states: N_x unimolecular intermediate states (leading to reactions of type A → P) defined by the N_x species of the x-th network layer, N_x homobimolecular intermediate states (type 2A → P), and N_x(N_x − 1)/2 + N_x(N − N_x) heterobimolecular intermediate states (type A + B → P). The first N_x(N_x − 1)/2 heterobimolecular states represent all possible nonredundant combinations of the N_x new species among each other, whereas the latter N_x(N − N_x) represent all possible combinations between the N_x new species and the N − N_x old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean μ_exp. The user can specify two different values for μ_exp, one for unimolecular reactions, μ_exp,uni, and one for bimolecular reactions, μ_exp,bi. The floor value of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter p(V_new), ranging from zero to one. If p(V_new) = 0, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if p(V_new) = 1. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), 2L activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is obtained by sampling a value from a normal distribution with mean μ(A⁻ − A⁺) and standard deviation σ(A⁻ − A⁺) and adding it to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained by sampling a value from a uniform distribution with bounds min(A‡ − A⁺/⁻) and max(A‡ − A⁺/⁻) and adding it to the free energy of the higher-energy side of that edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix Σ_A as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation 2⟨σ_A⟩. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
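The bookkeeping of steps 1 and 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the AutoNetGen implementation: the function names are hypothetical, and Python's `random.expovariate` (which takes the rate 1/mean) stands in for whatever random number generator AutoNetGen actually uses.

```python
import math
import random


def count_reactive_states(n_total, n_new):
    """Count the intermediate states considered when building layer x + 1.

    n_total is N (all vertices registered so far) and n_new is N_x
    (the vertices of the x-th layer).
    """
    n_old = n_total - n_new
    unimolecular = n_new                      # type A -> P
    homobimolecular = n_new                   # type 2A -> P
    heterobimolecular = (
        n_new * (n_new - 1) // 2              # new species among each other
        + n_new * n_old                       # new species with old species
    )                                         # type A + B -> P
    return unimolecular, homobimolecular, heterobimolecular


def sample_channel_count(mean_channels, rng):
    """Floor of an exponential draw with the given mean (step 2)."""
    return math.floor(rng.expovariate(1.0 / mean_channels))


rng = random.Random(0)  # fixed seed, for reproducibility of the sketch only
states = count_reactive_states(n_total=10, n_new=4)
channels = sample_channel_count(5.0, rng)
```

For N = 10 and N_x = 4, the counts are 4 unimolecular, 4 homobimolecular, and 6 + 24 = 30 heterobimolecular states. Taking the floor of the exponential draw means that a state yields no reaction channel whenever the draw falls below one.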

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

\[
k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right), \qquad (29)
\]

where h, k_B, R, and T are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
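A minimal numerical sketch of Eq. (29) follows. The function names are hypothetical, and the physical constants are the exact SI (2019) values, which may differ slightly from those used in the original calculations, so results agree with the paper only approximately.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant in J/K
H = 6.62607015e-34    # Planck constant in J s
R = 8.314462618       # gas constant in J/(mol K)


def eyring_rate_constant(barrier_kj_mol, temperature_k):
    """Rate constant k = (k_B T / h) exp(-barrier / (R T)).

    The activation free energy is given in kJ/mol and converted to J/mol.
    """
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_kj_mol * 1.0e3 / (R * temperature_k))


def half_life(rate_constant):
    """Half-life of a first-order (unimolecular) process."""
    return math.log(2.0) / rate_constant


k_100 = eyring_rate_constant(100.0, 298.15)
t_half = half_life(k_100)
```

With a barrier of 100 kJ mol⁻¹ at 298.15 K, the half-life comes out on the order of 3.7 × 10⁴ s, in line with the reaction time t_max listed in Table 2.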

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix Σ_A, which fulfills the condition that its largest eigenvalue equals σ²_A,max. Defining σ²_A,max to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by σ_A,max.

1. Generate a random (N × N)-dimensional matrix P with elements that are uniformly sampled from [−0.5, +0.5]. N is the number of vertices in the network.

2. The multiplication of P with its transpose leads to an (N × N)-dimensional matrix Q = PP^⊤ that already is a covariance matrix [65] with eigenvalues E_ii.

3. The covariance matrix Σ_A results from a rescaling of Q,

\[
\Sigma_A = Q \, \frac{\langle \sigma_A \rangle^2}{\max_i E_{ii}},
\]

which implies that the largest eigenvalue of Σ_A equals ⟨σ_A⟩².
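The three steps above can be sketched with the Python standard library alone. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and power iteration is used here as a simple stand-in for a full eigenvalue solver, which is an assumption that is adequate only for small matrices.

```python
import random


def largest_eigenvalue(matrix, iterations=500):
    """Estimate the largest eigenvalue of a symmetric PSD matrix (power iteration)."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * w[i] for i in range(n))  # Rayleigh quotient


def random_covariance(n, sigma, rng):
    """Random covariance matrix whose largest eigenvalue is close to sigma**2."""
    # Step 1: P with entries uniformly sampled from [-0.5, +0.5].
    p = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    # Step 2: Q = P P^T is symmetric and positive semidefinite.
    q = [[sum(p[i][k] * p[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]
    # Step 3: rescale Q so that its largest eigenvalue becomes sigma**2.
    scale = sigma ** 2 / largest_eigenvalue(q)
    return [[scale * x for x in row] for row in q]


cov = random_covariance(4, 2.09, random.Random(1))
```

Since every diagonal element of a positive-semidefinite matrix is bounded by its largest eigenvalue, the variances on the diagonal of the result cannot exceed sigma squared, which is the bound stated above. In practice, a linear algebra library would replace the power iteration.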

Note that in a practical setup, we expect the user to provide B + 1 k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                    CRN-X         CRN-0         CRN-B

AutoNetGen
B + 1                              25            1             6
T / K                              298.15        298.15        298.15
μ(A⁻ − A⁺) / (kJ mol⁻¹)            −25           −25           −25
σ(A⁻ − A⁺) / (kJ mol⁻¹)            50            50            50
min(A‡ − A⁺/⁻) / (kJ mol⁻¹)        0             0             0
max(A‡ − A⁺/⁻) / (kJ mol⁻¹)        100           100           100
⟨σ_A⟩ / (kJ mol⁻¹)                 2.09          2.09          2.09
L_max                              125           100           100
μ_exp(N_uni)                       5             5             5
μ_exp(N_bi)                        2             2             2
p(V_new)                           0.5           0.1           0.1

KiNetX
t_max / s                          36954         36954         36954
ε_y / (mol L⁻¹)                    0.01          0.01          0.01
G_min / (mol L⁻¹)                  1.0 × 10⁻³    1.0 × 10⁻⁵    1.0 × 10⁻⁵
U                                  1000          1000          1000
γ                                  0             1             1
C                                  5             1             5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.


[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.


[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.


[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package", Technical Report SAND–80-8003, Sandia National Labs., Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics", Technical Report SAND-89-8009, Sandia National Labs., Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics", Technical Report SAND-96-8216, Sandia National Labs., Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.


[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI - a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.


[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.


[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.


  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

combinatorially by exploring all possible reactions between all species of the current and previous layers. As AutoNetGen generates reaction networks based on abstract rules instead of deriving them from actual chemical representations (e.g., nuclear coordinates for intermediates and transition states), we cannot resort to descriptors identifying reactive sites [7] of molecules. Instead, we employ random number generators that either enable or disable the formation of an edge between a set of vertices (see the Appendix for more details on this topic). In the current version of AutoNetGen, only closed systems (no particle flux into or out of the system between t = 0 and t = t_max) are being generated. This limitation is introduced here for the sake of simplicity and not for technical reasons. KiNetX is not restricted to these kinds of systems and can also be applied to study open systems. The minimal input requirements for AutoNetGen are (for optional AutoNetGen input, see the Appendix):

• The number of samples B to be drawn from Σ_A.

• A thermostat temperature T for the calculation of rate constants.

• The average free-energy increase/decrease μ(A⁻ − A⁺) for a new intermediate state (−) formed by reaction of an intermediate state already present in the network (+).

• The standard deviation σ(A⁻ − A⁺) of the free-energy difference between a left-hand-side intermediate state (+) and its corresponding right-hand-side intermediate state (−).

• A minimum activation free energy min(A‡ − A⁺/⁻) with respect to the higher-energy intermediate state.

• A maximum activation free energy max(A‡ − A⁺/⁻) with respect to the higher-energy intermediate state.

• The average free-energy uncertainty ⟨σ_A⟩ (required for generating an ensemble of kinetic models).

• The maximum number of edges L_max to be generated.
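The minimal input listed above could be collected in a small validated container, as in the following sketch. The class name, field names, and validation rules are hypothetical and do not reflect the actual AutoNetGen interface, and the numerical values in the usage example are placeholders rather than the settings of this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoNetGenInput:
    """Hypothetical container for the minimal AutoNetGen input."""
    n_samples: int          # B, number of samples drawn from Sigma_A
    temperature: float      # T in K
    mu_reaction: float      # mean of A(-) - A(+) in kJ/mol
    sigma_reaction: float   # standard deviation of A(-) - A(+) in kJ/mol
    min_barrier: float      # lower bound of the activation free energy in kJ/mol
    max_barrier: float      # upper bound of the activation free energy in kJ/mol
    sigma_a: float          # average free-energy uncertainty in kJ/mol
    max_edges: int          # L_max

    def __post_init__(self):
        # Basic sanity checks; the rules are assumptions, not AutoNetGen's own.
        if self.n_samples < 0 or self.max_edges < 1:
            raise ValueError("sample and edge counts must be non-negative/positive")
        if self.temperature <= 0.0:
            raise ValueError("temperature must be positive")
        if self.sigma_reaction < 0.0 or self.sigma_a < 0.0:
            raise ValueError("standard deviations must be non-negative")
        if not 0.0 <= self.min_barrier <= self.max_barrier:
            raise ValueError("barrier bounds must satisfy 0 <= min <= max")


# Placeholder values for illustration only.
example_input = AutoNetGenInput(
    n_samples=5, temperature=298.15, mu_reaction=-10.0, sigma_reaction=20.0,
    min_barrier=0.0, max_barrier=100.0, sigma_a=2.0, max_edges=100,
)
```

Freezing the dataclass keeps a parameter set immutable once validated, so that every network generated from it can be traced back to exactly one input.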

3 Results and Discussion

3.1 Exemplary KiNetX Workflow

In the following, we present one full run of KiNetX applied to a reaction network randomly generated by AutoNetGen, consisting of N = 103 vertices and L = 118 edges, which we term CRN-X. The exploration started from two reactants, 1 and 2, with initial concentrations of 1.00 mol L⁻¹ and 0.50 mol L⁻¹, respectively. The molecular mass of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of B + 1 = 25 k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol⁻¹. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol⁻¹. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature T = 298.15 K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L⁻¹ during the course of reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at t = t_max. Also, the potential main product 33 reveals final concentrations ranging between about 0.3 mol L⁻¹ and 0.9 mol L⁻¹, the observable value of which will in turn affect the concentrations of the potential side products 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

Given a concentration error tolerance of ε_y = 0.01 mol L⁻¹, we find that a DFA-based network reduction with G_min = 1.0 × 10⁻³ mol L⁻¹ is reliable with respect to both δ_pq(t_max) and Δ_pq. Here, we chose the minimum confidence level of γ = 0 for the sake of clarity. The smaller γ, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose G_min to be one order of magnitude smaller than ε_y. The reason is that G_min is assessed on a species-by-species basis, whereas ε_y is compared against the quantities δ_pq(t_max) and Δ_pq, which measure the concentration error summed over all species. One may resolve this situation by dividing G_min by N, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which only consists of 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA analysis varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For $C = 5$


Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeding $\varepsilon_y$ with respect to the quantile specified by $\gamma$. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product with a final concentration of about 0.8 mol L$^{-1}$. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L$^{-1}$, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval $t_u - t_{u-1}$, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one


of the two dominant side products. On a longer time scale, 10 reacts to 42 (the other dominant side product) via 30 and 14. In summary, the dominant reaction pathways read

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution


of the current network did not differ from the solution of the full network by more than $\varepsilon_y$, as measured by $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks explored without kinetic guidance is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol$^{-1}$ with a standard deviation of 0.4 kcal mol$^{-1}$. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $k_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several instead of a single solution naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to $1.0 \times 10^{-17}$ mol L$^{-1}$ s$^{-1}$ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, $t_{\max}$), we obtain a flux of $3.7 \times 10^{-13}$ mol L$^{-1}$,


Table 1: Mean values and standard deviations of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

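Property $\zeta$                                               $\mu(\zeta_{\text{CRN-0}})$   $\mu(\zeta_{\text{CRN-B}})$   $\sigma(\zeta_{\text{CRN-0}})$   $\sigma(\zeta_{\text{CRN-B}})$

exploration steps                                              12      14       7      7
critical reactions                                             5       10       4      7
$L_{\text{expl}}$                                              54      60       26     27
$N_{\text{expl}}$                                              29      31       11     11
$L_{\text{red}}$                                               30      37       18     21
$N_{\text{red}}$                                               17      20       8      8
$\delta^{B}_{\text{expl}}(t_{\max})$ / mol L$^{-1}$            n/a     0.14     n/a    0.13
$\Delta^{B}_{\text{expl}}$ / mol L$^{-1}$                      n/a     0.15     n/a    0.13
$\delta^{B}_{\text{refined}}(t_{\max})$ / mol L$^{-1}$         n/a     <0.01    n/a    0.01
$\Delta^{B}_{\text{refined}}$ / mol L$^{-1}$                   n/a     <0.01    n/a    0.01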

which is very small compared to $\varepsilon_y = 0.01$ mol L$^{-1}$. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.
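The flux quoted above follows from multiplying the smallest recorded formation rate by the full reaction time. The following worked line is only a sketch of that arithmetic; it assumes the value $t_{\max} = 36954$ s listed in Table 2 of the Appendix, and the symbols $r_{\min}$ and $G$ are introduced here purely for illustration.

```latex
G = r_{\min}\, t_{\max}
  = \left(1.0 \times 10^{-17}\ \mathrm{mol\,L^{-1}\,s^{-1}}\right) \times \left(36954\ \mathrm{s}\right)
  \approx 3.7 \times 10^{-13}\ \mathrm{mol\,L^{-1}}
  \ll \varepsilon_y = 0.01\ \mathrm{mol\,L^{-1}}
```

The resulting flux lies more than ten orders of magnitude below the concentration error tolerance, which illustrates why a fixed rate threshold is hard to choose in advance.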

DFA-based network reduction with a flux threshold of $G_{\min} = 1.0 \times 10^{-5}$ mol L$^{-1}$ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument of the last paragraph, the consideration of several solutions increases the likelihood to identify kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0


case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of $\max\{\delta^{B}(t_{\max}), \Delta^{B}\}$ from 0.15 mol L$^{-1}$ to < 0.01 mol L$^{-1}$ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all $B + 1 = 6$ samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
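The mimicked refinement described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the KiNetX implementation: the data layout (`samples`, `edges`), the function names, and the convention that the free energy of one side of an edge is the sum of its vertex free energies are all hypothetical choices made here.

```python
from statistics import mean

# Hypothetical toy ensemble: two sampled models of a network A -> B (edge "e1").
# All free energies are absolute values in arbitrary but consistent units.
edges = {"e1": (("A",), ("B",))}  # edge id -> (left-hand vertices, right-hand vertices)
samples = [
    {"vertex": {"A": 0.0, "B": -20.0}, "edge": {"e1": 60.0}},
    {"vertex": {"A": 2.0, "B": -24.0}, "edge": {"e1": 66.0}},
]

def refine(samples, edges, critical):
    """Replace absolute free energies of critical edges and their vertices by ensemble means."""
    refined = [{"vertex": dict(s["vertex"]), "edge": dict(s["edge"])} for s in samples]
    for e in critical:
        ts_mean = mean(s["edge"][e] for s in samples)
        lhs, rhs = edges[e]
        for r in refined:
            r["edge"][e] = ts_mean
        for v in lhs + rhs:
            v_mean = mean(s["vertex"][v] for s in samples)
            for r in refined:
                r["vertex"][v] = v_mean
    return refined

def barriers(sample, edges, e):
    """Forward and backward activation free energies, both derived from absolute values."""
    lhs, rhs = edges[e]
    a_lhs = sum(sample["vertex"][v] for v in lhs)  # assumption: side energy = sum over vertices
    a_rhs = sum(sample["vertex"][v] for v in rhs)
    ts = sample["edge"][e]
    return ts - a_lhs, ts - a_rhs

refined = refine(samples, edges, ["e1"])
for r in refined:
    print(barriers(r, edges, "e1"))
```

Because both barriers are computed from the same absolute free energies, their difference always equals the free-energy difference between the two sides of the edge, so the forward and backward rate constants remain thermodynamically consistent. Averaging the two barriers independently would not guarantee this.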

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of $\max\{\delta_{pq}(t_{\max}), \Delta_{pq}\}$ comparing the CRN-0 solutions against the "exact" solutions amounts to $15.7 \times 10^{-3}$ mol L$^{-1}$, whereas it amounts to only $1.3 \times 10^{-3}$ mol L$^{-1}$ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species


considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming regarding the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered an additional number of 5 kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be questionable to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, i.e., highly noise-inducing, reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may be strongly dependent on the underlying graph structure, the distribution of activation free energies or rate constants, as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol$^{-1}$ with respect to the higher-energy intermediate) with the reaction time $t_{\max}$, which we set to the half-life of a unimolecular reaction whose rate constant corresponds to a barrier height of 100 kJ mol$^{-1}$. This way, we prevent the network reduction procedure from merely deleting reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small


compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol$^{-1}$, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks.

1. For the construction of the $(x + 1)$-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the $x$-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent


all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_{\exp}$. The user can specify two different values for $\mu_{\exp}$, one for unimolecular reactions, $\mu_{\exp,\text{uni}}$, and one for bimolecular reactions, $\mu_{\exp,\text{bi}}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_{\text{new}})$, ranging from zero to one. If $p(V_{\text{new}}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_{\text{new}}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), $2L$ activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is sampled from a normal distribution with mean $\mu(A^{-} - A^{+})$ and standard deviation $\sigma(A^{-} - A^{+})$, which is added to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is sampled from a uniform distribution with bounds $\min(A^{\ddagger} - A^{+/-})$ and $\max(A^{\ddagger} - A^{+/-})$, which is added to the higher-energy side of an edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$, as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
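Steps 1 to 3 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the AutoNetGen code itself: the function names are hypothetical, the numerical arguments in the usage lines are illustrative placeholders, and mass conservation and the choice between new and existing vertices via $p(V_{\text{new}})$ are deliberately left out.

```python
import math
import random

def count_candidate_states(n_total, n_new):
    """Potentially reactive intermediate states for the next layer (step 1)."""
    n_old = n_total - n_new
    unimolecular = n_new                       # type A -> P
    homobimolecular = n_new                    # type 2A -> P
    heterobimolecular = n_new * (n_new - 1) // 2 + n_new * n_old  # type A + B -> P
    return unimolecular, homobimolecular, heterobimolecular

def sample_channel_count(mu_exp, rng):
    """Floor of a draw from an exponential distribution with mean mu_exp (step 2)."""
    return math.floor(rng.expovariate(1.0 / mu_exp))  # expovariate takes the rate 1/mean

def sample_edge_energies(a_lhs, rng, mu, sigma, barrier_min, barrier_max):
    """Sample product-side and edge (transition-state) free energies (step 3)."""
    a_rhs = a_lhs + rng.gauss(mu, sigma)                              # normal shift of the product side
    a_ts = max(a_lhs, a_rhs) + rng.uniform(barrier_min, barrier_max)  # added to the higher-energy side
    return a_rhs, a_ts

rng = random.Random(42)
print(count_candidate_states(n_total=5, n_new=3))  # (3, 3, 9)
n_channels = sample_channel_count(mu_exp=5.0, rng=rng)
a_rhs, a_ts = sample_edge_energies(0.0, rng, mu=-25.0, sigma=50.0,
                                   barrier_min=0.0, barrier_max=100.0)
print(n_channels, a_ts - 0.0, a_ts - a_rhs)  # channel count, forward and backward barriers
```

Adding the uniform draw to the higher-energy side guarantees that both the forward and the backward barrier are nonnegative as long as the lower bound of the uniform distribution is nonnegative.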

Subsequently, AutoNetGen transforms all activation free energies to rate constants


based on the Eyring equation [53],

$$k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right), \tag{29}$$

where $h$, $k_\mathrm{B}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
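As a quick numerical check of the Eyring expression, the sketch below evaluates the rate constant for a barrier of 100 kJ mol$^{-1}$ at 298.15 K and the corresponding half-life. The function names are hypothetical and the constants are CODATA values chosen here; a unimolecular rate constant in s$^{-1}$ is assumed.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant / J K^-1
H = 6.62607015e-34   # Planck constant / J s
R = 8.314462618      # gas constant / J mol^-1 K^-1

def eyring_rate_constant(barrier_kj_per_mol, temperature=298.15):
    """Unimolecular rate constant in s^-1 for an activation free energy in kJ mol^-1."""
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_kj_per_mol * 1.0e3 / (R * temperature))

def half_life(rate_constant):
    """Half-life of a first-order decay with the given rate constant."""
    return math.log(2.0) / rate_constant

k_100 = eyring_rate_constant(100.0)
t_half = half_life(k_100)
print(f"k = {k_100:.3e} s^-1, half-life = {t_half:.0f} s")
```

Under these assumptions the half-life comes out near 3.7 × 10$^{4}$ s, close to the reaction time $t_{\max}$ = 36954 s listed in Table 2; the exact digits depend on the physical constants used.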

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$, which fulfills the condition that its largest eigenvalue equals $\sigma^2_{A,\max}$. Defining $\sigma^2_{A,\max}$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\max}$.

1. Generate a random $(N \times N)$-dimensional matrix $P$ with elements that are uniformly sampled from $[-0.5, +0.5]$. $N$ is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an $(N \times N)$-dimensional matrix $Q = P P^{\top}$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$\Sigma_A = Q \, \frac{\langle\sigma_A\rangle^2}{\max_i \{E_{ii}\}},$$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
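The three steps of this recipe can be sketched in pure Python. This is an illustration rather than the authors' implementation: the largest eigenvalue is estimated by power iteration, which is a technique swapped in here to avoid external libraries, and the function names and the fixed number of iterations are assumptions.

```python
import random

def largest_eigenvalue(m, iters=2000):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(m)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * mv[i] for i in range(n))  # Rayleigh quotient of the unit vector v

def random_covariance(n, sigma, rng):
    """Random covariance matrix whose largest eigenvalue is (approximately) sigma**2."""
    # Step 1: random matrix P with entries uniform in [-0.5, +0.5]
    p = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    # Step 2: Q = P P^T is symmetric and positive semidefinite
    q = [[sum(p[i][k] * p[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
    # Step 3: rescale Q so that its largest eigenvalue equals sigma**2
    scale = sigma ** 2 / largest_eigenvalue(q)
    return [[scale * q[i][j] for j in range(n)] for i in range(n)]

sigma = 2.09  # illustrative value in kJ mol^-1
cov = random_covariance(4, sigma, random.Random(0))
variances = [cov[i][i] for i in range(4)]
print(variances)  # each diagonal entry should not exceed sigma**2
```

Since every diagonal element of a positive-semidefinite matrix is bounded by its largest eigenvalue, the individual variances on the diagonal cannot exceed the chosen maximum, which is the bound the recipe is designed to enforce.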

Note that in a practical setup, we expect the user to provide $B + 1$ k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required as input for AutoNetGen and KiNetX and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                                        CRN-X                  CRN-0                  CRN-B

AutoNetGen

$B + 1$                                                25                     1                      6
$T$ / K                                                298.15                 298.15                 298.15
$\mu(A^{-} - A^{+})$ / (kJ mol$^{-1}$)                 −25                    −25                    −25
$\sigma(A^{-} - A^{+})$ / (kJ mol$^{-1}$)              50                     50                     50
$\min(A^{\ddagger} - A^{+/-})$ / (kJ mol$^{-1}$)       0                      0                      0
$\max(A^{\ddagger} - A^{+/-})$ / (kJ mol$^{-1}$)       100                    100                    100
$\langle\sigma_A\rangle$ / (kJ mol$^{-1}$)             2.09                   2.09                   2.09
$L_{\max}$                                             125                    100                    100
$\mu_{\exp}(N_{\text{uni}})$                           5                      5                      5
$\mu_{\exp}(N_{\text{bi}})$                            2                      2                      2
$p(V_{\text{new}})$                                    0.5                    0.1                    0.1

KiNetX

$t_{\max}$ / s                                         36954                  36954                  36954
$\varepsilon_y$ / (mol L$^{-1}$)                       0.01                   0.01                   0.01
$G_{\min}$ / (mol L$^{-1}$)                            $1.0 \times 10^{-3}$   $1.0 \times 10^{-5}$   $1.0 \times 10^{-5}$
$U$                                                    1000                   1000                   1000
$\gamma$                                               0                      1                      1
$C$                                                    5                      1                      5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.


[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.


[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.


[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package"; Technical Report SAND–80-8003, Sandia National Labs, Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics"; Technical Report SAND-89-8009, Sandia National Labs, Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics"; Technical Report SAND-96-8216, Sandia National Labs, Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.


[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI – a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.


[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.


[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.


  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

of 2 is twice as large as the molecular mass of 1. AutoNetGen generated an ensemble of B + 1 = 25 k-vectors sampled from a covariance matrix representing free-energy uncertainties of (0.5 ± 0.1) kcal mol⁻¹. The resulting uncertainties of the free energies of activation amount to (1.1 ± 0.6) kcal mol⁻¹. Rate constants were obtained on the basis of Eyring's transition state theory [53] for a temperature T = 298.15 K. A detailed list containing all input values submitted to both AutoNetGen and KiNetX can be found in the Appendix.

Fig. 1 shows concentration–time plots for all species that exceeded a concentration threshold of 0.01 mol L⁻¹ during the course of the reaction. We refer to these species as dominant species. The 25 different solutions draw a diverse picture. In some cases, reactant 1 is fully consumed at the end of the reaction course, and in other cases, more than 50% of the initial concentration remains at t = t_max. Also, the potential main product, 33, reveals final concentrations ranging between about 0.3 mol L⁻¹ and 0.9 mol L⁻¹, the observable value of which will in turn affect the concentrations of the potential side products, 29, 32, 34, 42, and 68.

Figure 1: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X before any parameter refinement. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.
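The selection of dominant species described above can be sketched in a few lines. This is only an illustration: the array layout `(n_models, n_times, n_species)`, the function name `dominant_species`, and the toy concentrations are assumptions made here, and we assume a species counts as dominant if it exceeds the threshold in at least one ensemble member.

```python
import numpy as np

def dominant_species(conc, threshold=0.01):
    """Indices of species exceeding `threshold` (mol/L) at any time
    point in any ensemble member.

    conc: array of shape (n_models, n_times, n_species).
    """
    conc = np.asarray(conc, dtype=float)
    # Collapse the model and time axes: one boolean flag per species.
    exceeds = (conc > threshold).any(axis=(0, 1))
    return np.flatnonzero(exceeds)

# Toy ensemble (hypothetical values): 2 models, 3 time points, 4 species.
conc = np.array([
    [[1.0, 0.0, 0.000, 0.0], [0.6, 0.3, 0.005, 0.0], [0.2, 0.7, 0.008, 0.0]],
    [[1.0, 0.0, 0.000, 0.0], [0.5, 0.4, 0.020, 0.0], [0.1, 0.8, 0.009, 0.0]],
])
dominant = dominant_species(conc)
print(dominant)
```

In this toy ensemble, the third species is dominant only because of the second model, which illustrates why the set of dominant species can depend on whether the ensemble or a single model is inspected.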

Given a concentration error tolerance of ε_y = 0.01 mol L⁻¹, we find that a DFA-based network reduction with G_min = 1.0 × 10⁻³ mol L⁻¹ is reliable with respect to both δ_pq(t_max) and Δ_pq. Here, we chose the minimum confidence level of γ = 0 for the sake of clarity. The smaller γ, the larger the possible degree of network reduction, which leads to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose G_min to be one order of magnitude smaller than ε_y. The reason is that G_min is assessed on a species-by-species basis, whereas ε_y is compared against the quantities δ_pq(t_max) and Δ_pq, which measure the concentration error summed over all species. One may resolve this situation by dividing G_min by N, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which consists of only 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA analysis varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X, consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are black-colored, whereas all other vertices are red-colored.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For C = 5 Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for δ_pq(t_max) and/or Δ_pq exceeding ε_y with respect to the quantile specified by γ. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L⁻¹. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L⁻¹, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval t_u − t_(u−1), we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42, the other dominant side product, via 30 and 14. In summary, the dominant reaction pathways read

$$
\begin{aligned}
2 + 2 &\rightarrow 4 \rightarrow 9 + 10 \\
1 + 9 &\rightarrow 6 + 13 \\
13 &\rightarrow 33 + 33 \\
1 + 6 &\rightarrow 19 \\
1 + 19 &\rightarrow 33 + 68 \\
10 &\rightarrow 30 \rightarrow 14 \rightarrow 42
\end{aligned}
$$

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than ε_y, as measured by δ_pq(t_max) and Δ_pq. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol⁻¹ with a standard deviation of 0.4 kcal mol⁻¹. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on k_0. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

Table 1: Mean values (μ) and standard deviations (σ) of properties ζ obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                      μ(ζ_CRN-0)   μ(ζ_CRN-B)   σ(ζ_CRN-0)   σ(ζ_CRN-B)
exploration steps               12           14           7            7
critical reactions              5            10           4            7
L_expl                          54           60           26           27
N_expl                          29           31           11           11
L_red                           30           37           18           21
N_red                           17           20           8            8
δ^B_expl(t_max) / mol L⁻¹       –            0.14         –            0.13
Δ^B_expl / mol L⁻¹              –            0.15         –            0.13
δ^B_refined(t_max) / mol L⁻¹    –            <0.01        –            0.01
Δ^B_refined / mol L⁻¹           –            <0.01        –            0.01

In the case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several solutions instead of a single one naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to 1.0 × 10⁻¹⁷ mol L⁻¹ s⁻¹ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, t_max), we obtain a flux of 3.7 × 10⁻¹³ mol L⁻¹, which is very small compared to ε_y = 0.01 mol L⁻¹. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.
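The flux estimate quoted in this section follows from a one-line multiplication, spelled out here for convenience. The symbols r_min (smallest recorded maximum formation rate) and G (resulting flux) are labels introduced only for this illustration; t_max = 36954 s is the value listed in Table 2.

```latex
\begin{align*}
G &= r_{\min}\, t_{\max}
   = \left(1.0 \times 10^{-17}\ \mathrm{mol\,L^{-1}\,s^{-1}}\right)
     \times \left(36954\ \mathrm{s}\right)
   \approx 3.7 \times 10^{-13}\ \mathrm{mol\,L^{-1}}, \\
\frac{G}{\varepsilon_y} &\approx
   \frac{3.7 \times 10^{-13}\ \mathrm{mol\,L^{-1}}}{0.01\ \mathrm{mol\,L^{-1}}}
   = 3.7 \times 10^{-11}.
\end{align*}
```

The flux is thus roughly eleven orders of magnitude below the concentration error tolerance.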

DFA-based network reduction with a flux threshold of G_min = 1.0 × 10⁻⁵ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument in the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of max{δ^B(t_max), Δ^B} from 0.15 mol L⁻¹ to < 0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all B + 1 = 6 samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
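The remark on absolute versus relative free energies can be illustrated with a small numerical sketch. It reflects our reading of the argument rather than the KiNetX implementation: the three-species cycle, all energies, and the helper `barriers` are hypothetical. In a thermodynamically consistent model, the reaction free energies (forward minus backward barrier) must sum to zero around any closed cycle.

```python
import numpy as np

rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 0)]   # hypothetical cycle 0 -> 1 -> 2 -> 0
n_samples = 6                      # B + 1 = 6, as in the CRN-B case

# Sampled absolute free energies in kJ/mol (rows = ensemble members).
A_vertex = np.array([0.0, -10.0, -5.0]) + rng.normal(0.0, 2.0, (n_samples, 3))
A_edge = np.array([60.0, 70.0, 65.0]) + rng.normal(0.0, 4.0, (n_samples, 3))

def barriers(A_v, A_e):
    """Forward and backward activation free energies of every edge."""
    fwd = np.array([A_e[l] - A_v[i] for l, (i, j) in enumerate(edges)])
    bwd = np.array([A_e[l] - A_v[j] for l, (i, j) in enumerate(edges)])
    return fwd, bwd

critical, b = 0, 1                 # edge to refine, ensemble member to inspect
i, j = edges[critical]

# (a) Refine ABSOLUTE free energies: replace the vertex and edge energies
# of the critical reaction by their ensemble means, then recompute barriers.
A_v, A_e = A_vertex[b].copy(), A_edge[b].copy()
A_v[[i, j]] = A_vertex[:, [i, j]].mean(axis=0)
A_e[critical] = A_edge[:, critical].mean()
fwd_abs, bwd_abs = barriers(A_v, A_e)
cycle_abs = np.sum(fwd_abs - bwd_abs)

# (b) Refine RELATIVE free energies: overwrite only the two barriers of the
# critical reaction by their ensemble means and leave all others untouched.
fwd_rel, bwd_rel = barriers(A_vertex[b], A_edge[b])
all_fwd, all_bwd = zip(*(barriers(A_vertex[s], A_edge[s]) for s in range(n_samples)))
fwd_rel[critical] = np.mean([f[critical] for f in all_fwd])
bwd_rel[critical] = np.mean([g[critical] for g in all_bwd])
cycle_rel = np.sum(fwd_rel - bwd_rel)

print(f"cycle sum, absolute refinement: {cycle_abs:.2e} kJ/mol")
print(f"cycle sum, relative refinement: {cycle_rel:.2e} kJ/mol")
```

In variant (a), every barrier is derived from one consistent set of absolute free energies, so the cycle sum vanishes up to rounding. In variant (b), the refined reaction no longer shares vertex energies with its neighbors, and the cycle sum is generally nonzero.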

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of max{δ_pq(t_max), Δ_pq} comparing the CRN-0 solutions against the "exact" solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase gained by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered 5 additional kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be questionable to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, highly noise-inducing reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may be strongly dependent on the underlying graph structure and the distribution of activation free energies and rate constants as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is in principle able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time t_max, which we set to the half-life of a unimolecular rate constant corresponding to a barrier height of 100 kJ mol⁻¹. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks.

1. For the construction of the (x + 1)-th network layer, we first define all potentially reactive intermediate states. At that stage, we register N = N_0 + ··· + N_x vertices. Since we already explored all possible reactions between the first N − N_x species, we will only consider the following potentially reactive intermediate states: N_x unimolecular intermediate states (leading to reactions of type A → P) defined by the N_x species of the x-th network layer, N_x homobimolecular intermediate states (type 2A → P), and N_x(N_x − 1)/2 + N_x(N − N_x) heterobimolecular intermediate states (type A + B → P). The first N_x(N_x − 1)/2 heterobimolecular states represent all possible nonredundant combinations of the N_x new species among each other, whereas the latter N_x(N − N_x) represent all possible combinations between the N_x new species and the N − N_x old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean μ_exp. The user can specify two different values for μ_exp, one for unimolecular reactions, μ_exp,uni, and one for bimolecular reactions, μ_exp,bi. The floor value of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter p(V_new), ranging from zero to one. If p(V_new) = 0, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if p(V_new) = 1. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), 2L activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is sampled from a normal distribution with mean μ(A⁻ − A⁺) and standard deviation σ(A⁻ − A⁺), which is added to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is sampled from a uniform distribution with bounds min(A^‡ − A^(+/−)) and max(A^‡ − A^(+/−)), which is added to the higher-energy side of an edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix Σ_A as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation 2⟨σ_A⟩. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
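Steps 1 and 2 of the list above can be sketched as follows. This is a minimal illustration, not the AutoNetGen source: the function names `count_reactive_states` and `sample_channel_counts` are hypothetical, and the means 5 and 2 are taken from Table 2 under the assumption that they apply to unimolecular and bimolecular states, respectively.

```python
import numpy as np

def count_reactive_states(n_new, n_total):
    """Number of potentially reactive intermediate states for the next layer.

    n_new:   N_x, species added in the most recent layer
    n_total: N, all species registered so far
    """
    n_old = n_total - n_new
    uni = n_new                                         # A -> P
    homo_bi = n_new                                     # 2A -> P
    hetero_bi = n_new * (n_new - 1) // 2 + n_new * n_old  # A + B -> P
    return uni, homo_bi, hetero_bi

def sample_channel_counts(n_states, mu_exp, rng):
    """Floor of exponential draws = number of reaction channels per state."""
    draws = rng.exponential(scale=mu_exp, size=n_states)
    return np.floor(draws).astype(int)

rng = np.random.default_rng(42)
uni, homo_bi, hetero_bi = count_reactive_states(n_new=3, n_total=5)
channels_uni = sample_channel_counts(uni, mu_exp=5.0, rng=rng)
channels_bi = sample_channel_counts(homo_bi + hetero_bi, mu_exp=2.0, rng=rng)

print(uni, homo_bi, hetero_bi)   # 3 new species, 2 old species
print(channels_uni, channels_bi)
```

With N_x = 3 new and N − N_x = 2 old species, the heterobimolecular count is 3 + 6 = 9. Because the floor of an exponential draw is frequently zero, many intermediate states end up with no outgoing reaction channel at all.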

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$
k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right), \tag{29}
$$

where h, k_B, R, and T are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
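As a minimal sketch of Eq. (29), the following function converts an activation free energy into a rate constant. The function name and the choice of kJ mol⁻¹ as input unit are assumptions made here; the physical constants are the exact SI values. For bimolecular reactions, the numerical value additionally implies a standard-state concentration, which is not treated in this sketch.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R = 8.314462618       # gas constant, J/(mol K)

def eyring_rate_constant(activation_free_energy, temperature=298.15):
    """Eyring rate constant (1/s) for a barrier given in kJ/mol."""
    prefactor = K_B * temperature / H
    exponent = -activation_free_energy * 1.0e3 / (R * temperature)
    return prefactor * math.exp(exponent)

k_barrierless = eyring_rate_constant(0.0)    # upper limit k_B T / h
k_high = eyring_rate_constant(100.0)         # largest barrier in Table 2

print(f"{k_barrierless:.3e} 1/s")
print(f"{k_high:.3e} 1/s")
```

The two calls span the range of barriers used in this study (0 to 100 kJ mol⁻¹) and thereby the range of rate constants that can occur in the generated networks at 298.15 K.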

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix Σ_A which fulfills the condition that its largest eigenvalue equals σ²_(A,max). Defining σ²_(A,max) to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by σ_(A,max).

1. Generate a random (N × N)-dimensional matrix P with elements that are uniformly sampled from [−0.5, +0.5]. N is the number of vertices in the network.

2. The multiplication of P with its transpose leads to an (N × N)-dimensional matrix Q = PPᵀ that already is a covariance matrix [65] with eigenvalues E_ii.

3. The covariance matrix Σ_A results from a rescaling of Q,

$$
\Sigma_A = Q \, \frac{\langle \sigma_A \rangle^2}{\max\{E_{ii}\}},
$$

which implies that the largest eigenvalue of Σ_A equals ⟨σ_A⟩².
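The three-step recipe translates almost literally into NumPy. The following is a sketch under our own naming (`random_covariance`, `sigma_max`), not the AutoNetGen code; the value 2.09 kJ mol⁻¹ is the ⟨σ_A⟩ listed in Table 2, and the final sampling line merely illustrates how such a matrix could be used to draw correlated free-energy shifts.

```python
import numpy as np

def random_covariance(n, sigma_max, rng):
    """Random covariance matrix whose largest eigenvalue is sigma_max**2."""
    P = rng.uniform(-0.5, 0.5, size=(n, n))   # step 1: uniform random matrix
    Q = P @ P.T                               # step 2: symmetric, PSD
    eigvals = np.linalg.eigvalsh(Q)           # eigenvalues of symmetric Q
    return Q * sigma_max**2 / eigvals.max()   # step 3: rescale

rng = np.random.default_rng(1)
Sigma_A = random_covariance(n=5, sigma_max=2.09, rng=rng)
eig = np.linalg.eigvalsh(Sigma_A)

# Correlated shifts of the vertex free energies for 25 ensemble members.
shifts = rng.multivariate_normal(np.zeros(5), Sigma_A, size=25)

print(eig.max(), 2.09**2)
print(np.sqrt(np.diag(Sigma_A)))   # per-vertex standard deviations
```

Since every diagonal element of a positive-semidefinite matrix is bounded by its largest eigenvalue, the per-vertex standard deviations printed in the last line cannot exceed `sigma_max`.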

Note that, in a practical setup, we expect the user to provide B + 1 k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence, problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.

Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                        CRN-X         CRN-0         CRN-B

AutoNetGen
B + 1                                  25            1             6
T / K                                  298.15        298.15        298.15
μ(A⁻ − A⁺) / (kJ mol⁻¹)                −25           −25           −25
σ(A⁻ − A⁺) / (kJ mol⁻¹)                50            50            50
min(A^‡ − A^(+/−)) / (kJ mol⁻¹)        0             0             0
max(A^‡ − A^(+/−)) / (kJ mol⁻¹)        100           100           100
⟨σ_A⟩ / (kJ mol⁻¹)                     2.09          2.09          2.09
L_max                                  125           100           100
μ_exp(N_uni)                           5             5             5
μ_exp(N_bi)                            2             2             2
p(V_new)                               0.5           0.1           0.1

KiNetX
t_max / s                              36954         36954         36954
ε_y / (mol L⁻¹)                        0.01          0.01          0.01
G_min / (mol L⁻¹)                      1.0 × 10⁻³    1.0 × 10⁻⁵    1.0 × 10⁻⁵
U                                      1000          1000          1000
γ                                      0             1             1
C                                      5             1             5
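The value of t_max in Table 2 can be checked against the statement that it corresponds to the half-life of a unimolecular reaction with a barrier of 100 kJ mol⁻¹. The following worked estimate uses Eq. (29) at T = 298.15 K with standard values of the physical constants; the rounding is ours.

```latex
\begin{align*}
k &= \frac{k_\mathrm{B} T}{h}
     \exp\left(-\frac{100\ \mathrm{kJ\,mol^{-1}}}{RT}\right)
   \approx \left(6.21 \times 10^{12}\ \mathrm{s^{-1}}\right)
     \times \left(3.03 \times 10^{-18}\right)
   \approx 1.88 \times 10^{-5}\ \mathrm{s^{-1}}, \\
t_{1/2} &= \frac{\ln 2}{k}
   \approx \frac{0.693}{1.88 \times 10^{-5}\ \mathrm{s^{-1}}}
   \approx 3.69 \times 10^{4}\ \mathrm{s}.
\end{align*}
```

This agrees with the tabulated t_max = 36954 s to within about 0.2%; the small deviation presumably reflects the values of the physical constants used.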

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.


[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.


[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.


[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C.; et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C.; et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G.; et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S.; et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package"; Technical Report SAND-80-8003; Sandia National Labs: Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics"; Technical Report SAND-89-8009; Sandia National Labs: Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics"; Technical Report SAND-96-8216; Sandia National Labs: Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0; DOI: 10.5281/zenodo.170284; http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI – a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

to more comprehensible reaction mechanisms. In an actual setup, we would recommend choosing a larger confidence level. Note that we chose G_min to be one order of magnitude smaller than ε_y. The reason is that G_min is assessed on a species-by-species basis, whereas ε_y is compared against the quantities δ_pq(t_max) and Δ_pq, which measure the concentration error summed over all species. One may resolve this situation by dividing G_min by N, which is rather conservative (i.e., it would lower the possible degree of network reduction) but also more reliable (i.e., the resulting concentration error would be smaller). Fig. 2 shows the sparse variant of CRN-X, which consists of only 18 vertices and 21 edges, corresponding to about 20% of the network elements contained in the original network. Note that the number of kinetically relevant species identified by our DFA analysis varies between 17 and 21 if the 25 solutions are considered individually. Hence, the explicit consideration of k-uncertainty has a direct effect on the degree of network reduction in this case. Note that the possible degree of reduction becomes larger (on average) for an increasing number of vertices and edges.

Figure 2: Sparse variant of the exemplary reaction network CRN-X, consisting of 18 vertices (numbered circles) and 21 edges. The center of every edge is represented by a square, which may be interpreted as a transition state. Two sides of each square are linked with either one or two lines, which are in turn linked with a single vertex each. The left-hand-side and right-hand-side vertices of a reaction are never connected with the same side of a square. Vertices corresponding to reactants are colored black, whereas all other vertices are colored red.

The combination of a large y-uncertainty and a still quite entangled network renders it difficult to suggest a specific reaction mechanism. To resolve this issue, we conducted a global sensitivity analysis based on our extended Morris screening approach. For C = 5 Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for δ_pq(t_max) and/or Δ_pq exceeding ε_y with respect to the quantile specified by γ. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product, with a final concentration of about 0.8 mol L⁻¹. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L⁻¹, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval t_u − t_(u−1), we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42 (the other dominant side product) via 30 and 14. In summary, the dominant reaction pathways read

    2 + 2 → 4 → 9 + 10
    1 + 9 → 6 + 13
    13 → 33 + 33
    1 + 6 → 19
    1 + 19 → 33 + 68
    10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red edges with arrow heads). All species that are part of dominant reaction paths are represented by red vertices, except for reactant vertices, which are colored black.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than ε_y, as measured by δ_pq(t_max) and Δ_pq. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol⁻¹ with a standard deviation of 0.4 kcal mol⁻¹. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on k_0. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

Table 1: Mean values (μ) and standard deviations (σ) of properties ζ obtained as a result of KiNetX-based analyses of the CRN-0 and CRN-B cases.

    Property ζ                      μ(ζ_CRN-0)  μ(ζ_CRN-B)  σ(ζ_CRN-0)  σ(ζ_CRN-B)
    exploration steps               12          14          7           7
    critical reactions              5           10          4           7
    L_expl                          54          60          26          27
    N_expl                          29          31          11          11
    L_red                           30          37          18          21
    N_red                           17          20          8           8
    δ^B_expl(t_max) (mol L⁻¹)       –           0.14        –           0.13
    Δ^B_expl (mol L⁻¹)              –           0.15        –           0.13
    δ^B_refined(t_max) (mol L⁻¹)    –           < 0.01      –           0.01
    Δ^B_refined (mol L⁻¹)           –           < 0.01      –           0.01

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several solutions instead of a single one naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to 1.0 × 10⁻¹⁷ mol L⁻¹ s⁻¹ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, t_max), we obtain a flux of 3.7 × 10⁻¹³ mol L⁻¹, which is very small compared to ε_y = 0.01 mol L⁻¹. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.
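
As a quick check of the arithmetic behind the quoted flux, one can multiply the minimum formation rate by the reaction time t_max = 36954 s listed in Table 2. The labels r_min and G below are introduced here for illustration only and are not notation from the paper.

```latex
\begin{align*}
G &= r_{\min}\, t_{\max}
   = \left(1.0 \times 10^{-17}\ \mathrm{mol\,L^{-1}\,s^{-1}}\right) \times \left(36954\ \mathrm{s}\right) \\
  &\approx 3.7 \times 10^{-13}\ \mathrm{mol\,L^{-1}}
   \ll \varepsilon_y = 0.01\ \mathrm{mol\,L^{-1}}
\end{align*}
```

The resulting flux is more than ten orders of magnitude below ε_y, which illustrates why a rate threshold capturing such a step would be impractically small.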

DFA-based network reduction with a flux threshold of G_min = 1.0 × 10⁻⁵ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument of the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of max{δ^B(t_max), Δ^B} from 0.15 mol L⁻¹ to < 0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all B + 1 = 6 samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
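
The difference between refining absolute and relative free energies can be illustrated with a minimal sketch. It assumes a hypothetical three-species cycle A → B → C → A with made-up energies (in kJ/mol); the array names and the choice of the critical edge are illustrative and not taken from KiNetX. Around a closed cycle, the reaction free energies (forward minus backward barrier) must sum to zero in every kinetic model.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical toy cycle A -> B -> C -> A; all numbers are illustrative (kJ/mol).
n_samples, n_species = 6, 3                      # B + 1 = 6 sampled kinetic models
vertex = np.array([0.0, -10.0, -20.0]) + rng.normal(0.0, 2.0, (n_samples, n_species))
edge = np.array([50.0, 40.0, 60.0]) + rng.normal(0.0, 4.0, (n_samples, n_species))
nxt = np.roll(np.arange(n_species), -1)          # edge i connects species i and nxt[i]


def barriers(vertex, edge):
    """Forward and backward activation free energies for every sample and edge."""
    return edge - vertex, edge - vertex[:, nxt]


def cycle_sum(forward, backward):
    """Sum of reaction free energies around the cycle; zero if thermodynamically consistent."""
    return (forward - backward).sum(axis=1)


critical = 0  # index of the edge treated as critical (arbitrary choice)

# (a) Refinement on absolute free energies: average the edge and both adjacent vertices.
v_abs, e_abs = vertex.copy(), edge.copy()
e_abs[:, critical] = edge[:, critical].mean()
for s in (critical, nxt[critical]):
    v_abs[:, s] = vertex[:, s].mean()
fwd_abs, bwd_abs = barriers(v_abs, e_abs)

# (b) Refinement on relative free energies: average only the two barriers of that edge.
fwd_rel, bwd_rel = barriers(vertex, edge)
fwd_rel[:, critical] = fwd_rel[:, critical].mean()
bwd_rel[:, critical] = bwd_rel[:, critical].mean()

print("cycle sums, absolute refinement:", np.round(cycle_sum(fwd_abs, bwd_abs), 6))
print("cycle sums, relative refinement:", np.round(cycle_sum(fwd_rel, bwd_rel), 6))
```

In variant (a), the barriers of the critical edge become identical across all models and every cycle sum stays at zero. In variant (b), the averaged barriers no longer match the per-sample energies of the neighboring reactions, so the cycle sums deviate from zero in the individual models.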

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of max{δ_pq(t_max), Δ_pq} comparing the CRN-0 solutions against the "exact" solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX for a reaction network generated with AutoNetGen, an algorithm that constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered an additional number of 5 kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be questionable to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties, such as the percentage of critical (highly noise-inducing) reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps, as they may be strongly dependent on the underlying graph structure, the distribution of activation free energies and rate constants, as well as their correlated uncertainties. We are not aware that the literature on solution chemistry (or chemistry in general) offers enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation.

There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time t_max, which we set to the half-life of a unimolecular reaction with a rate constant corresponding to a barrier height of 100 kJ mol⁻¹. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks.

In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks.

1. For the construction of the (x + 1)-th network layer, we first define all potentially reactive intermediate states. At that stage, we register N = N_0 + ··· + N_x vertices. Since we already explored all possible reactions between the first N − N_x species, we will only consider the following potentially reactive intermediate states: N_x unimolecular intermediate states (leading to reactions of type A → P) defined by the N_x species of the x-th network layer, N_x homobimolecular intermediate states (type 2A → P), and N_x(N_x − 1)/2 + N_x(N − N_x) heterobimolecular intermediate states (type A + B → P). The first N_x(N_x − 1)/2 heterobimolecular states represent all possible nonredundant combinations of the N_x new species among each other, whereas the latter N_x(N − N_x) represent all possible combinations between the N_x new species and the N − N_x old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean μ_exp. The user can specify two different values for μ_exp, one for unimolecular reactions, μ_exp,uni, and one for bimolecular reactions, μ_exp,bi. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter p(V_new), ranging from zero to one. If p(V_new) = 0, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if p(V_new) = 1. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), 2L activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (which comprises either one or two vertices) is obtained from a value sampled from a normal distribution with mean μ(A⁻ − A⁺) and standard deviation σ(A⁻ − A⁺), which is added to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained from a value sampled from a uniform distribution with bounds min(A‡ − A±) and max(A‡ − A±), which is added to the free energy of the higher-energy side of that edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix Σ_A as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation 2⟨σ_A⟩. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
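
Steps 1 and 2 above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the AutoNetGen implementation: the function names are hypothetical, the new-versus-existing decision is made once per channel, and the mass-conservation check is omitted.

```python
import math
import random


def count_intermediate_states(n_total: int, n_new: int) -> dict:
    """Step 1: potentially reactive intermediate states for the next layer.

    n_total is N (all registered vertices); n_new is N_x (vertices of the latest layer).
    """
    return {
        "unimolecular": n_new,                      # A -> P
        "homobimolecular": n_new,                   # 2A -> P
        "heterobimolecular": n_new * (n_new - 1) // 2 + n_new * (n_total - n_new),  # A + B -> P
    }


def sample_channels(mean_channels: float, p_new: float, rng: random.Random) -> list:
    """Step 2: the number of channels is the floor of an exponential draw with the given mean.

    Each channel either creates a new vertex (probability p_new) or links to an
    existing vertex (simplifying assumption: one decision per channel).
    """
    n_channels = math.floor(rng.expovariate(1.0 / mean_channels))
    return ["new" if rng.random() < p_new else "existing" for _ in range(n_channels)]


rng = random.Random(2018)
print(count_intermediate_states(n_total=5, n_new=3))
print(sample_channels(mean_channels=5.0, p_new=0.5, rng=rng))   # unimolecular state
print(sample_channels(mean_channels=2.0, p_new=0.5, rng=rng))   # bimolecular state
```

For N = 5 and N_x = 3, the count of heterobimolecular states is 3 + 6 = 9, in line with the formula in step 1. Because the floor of the exponential draw is taken, a state yields no channel at all whenever the draw falls below one.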

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$
k_l^{\pm} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{\pm}}{RT} \right), \tag{29}
$$

where h, k_B, R, and T are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
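
The Eyring equation is straightforward to evaluate numerically. The following minimal sketch (the function name and the use of CODATA constants are choices made here, not taken from AutoNetGen) computes the rate constant for a barrier of 100 kJ mol⁻¹ at 298.15 K and the corresponding unimolecular half-life, which should come out close to the reaction time t_max listed in Table 2; small deviations may arise from the values of the constants used.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant in J/K
H = 6.62607015e-34      # Planck constant in J s
R = 8.314462618         # gas constant in J/(mol K)


def eyring_rate_constant(barrier_kj_mol: float, temperature_k: float = 298.15) -> float:
    """Eyring rate constant (in 1/s for a unimolecular step) for a barrier in kJ/mol."""
    return K_B * temperature_k / H * math.exp(-barrier_kj_mol * 1.0e3 / (R * temperature_k))


k_100 = eyring_rate_constant(100.0)
half_life = math.log(2.0) / k_100   # half-life of a first-order process
print(f"k = {k_100:.3e} 1/s, half-life = {half_life:.0f} s")
```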

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix Σ_A that fulfills the condition that its largest eigenvalue equals σ²_A,max. Defining σ²_A,max to be the largest eigenvalue ensures that all activation free-energy uncertainties are bound by σ_A,max.

1. Generate a random (N × N)-dimensional matrix P with elements that are uniformly sampled from [−0.5, +0.5]. N is the number of vertices in the network.

2. The multiplication of P with its transpose leads to an (N × N)-dimensional matrix Q = PPᵀ that already is a covariance matrix [65] with eigenvalues E_ii.

3. The covariance matrix Σ_A results from a rescaling of Q,

$$
\Sigma_A = Q \, \frac{\langle \sigma_A \rangle^2}{\max_i E_{ii}},
$$

which implies that the largest eigenvalue of Σ_A equals ⟨σ_A⟩².
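
The three-step recipe above can be sketched with NumPy as follows. The function name, the seed, and the network size of 45 vertices are illustrative assumptions; the value 2.09 kJ mol⁻¹ for ⟨σ_A⟩ is taken from Table 2. The last lines draw B + 1 = 6 correlated vertex free-energy perturbations from the resulting matrix.

```python
import numpy as np


def random_covariance(n_vertices: int, sigma_max: float, rng: np.random.Generator) -> np.ndarray:
    """Random covariance matrix whose largest eigenvalue equals sigma_max**2."""
    p = rng.uniform(-0.5, 0.5, size=(n_vertices, n_vertices))  # step 1: random matrix P
    q = p @ p.T                                                 # step 2: Q = P P^T
    largest = np.linalg.eigvalsh(q).max()                       # largest eigenvalue of Q
    return q * sigma_max**2 / largest                           # step 3: rescaling


rng = np.random.default_rng(42)
sigma_a = 2.09                       # <sigma_A> in kJ/mol (Table 2)
n_vertices = 45                      # illustrative network size
cov = random_covariance(n_vertices, sigma_a, rng)
eigenvalues = np.linalg.eigvalsh(cov)

# Correlated perturbations of the vertex free energies for B + 1 = 6 kinetic models.
samples = rng.multivariate_normal(np.zeros(n_vertices), cov, size=6)

print("largest eigenvalue:", eigenvalues.max(), "target:", sigma_a**2)
print("largest vertex standard deviation:", np.sqrt(np.diag(cov)).max())
```

Since the diagonal elements of a symmetric positive-semidefinite matrix cannot exceed its largest eigenvalue, every vertex standard deviation obtained this way is at most ⟨σ_A⟩.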

Note that in a practical setup, we expect the user to provide B + 1 k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence, problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.

Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required as input for AutoNetGen and KiNetX and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

    Input parameter                 CRN-X        CRN-0        CRN-B

    AutoNetGen
    B + 1                           25           1            6
    T (K)                           298.15       298.15       298.15
    μ(A⁻ − A⁺) (kJ mol⁻¹)           −25          −25          −25
    σ(A⁻ − A⁺) (kJ mol⁻¹)           50           50           50
    min(A‡ − A±) (kJ mol⁻¹)         0            0            0
    max(A‡ − A±) (kJ mol⁻¹)         100          100          100
    ⟨σ_A⟩ (kJ mol⁻¹)                2.09         2.09         2.09
    L_max                           125          100          100
    μ_exp(N_uni)                    5            5            5
    μ_exp(N_bi)                     2            2            2
    p(V_new)                        0.5          0.1          0.1

    KiNetX
    t_max (s)                       36954        36954        36954
    ε_y (mol L⁻¹)                   0.01         0.01         0.01
    G_min (mol L⁻¹)                 1.0 × 10⁻³   1.0 × 10⁻⁵   1.0 × 10⁻⁵
    U                               1000         1000         1000
    γ                               0            1            1
    C                               5            1            5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.

[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

31

[15] Wang L-P McGibbon R T Pande V S Martinez T J Automated Discov-ery and Refinement of Reactive Molecular Dynamics Pathways J Chem TheoryComput 2016 12 638ndash649

[16] Simm G N Reiher M Context-Driven Exploration of Complex Chemical ReactionNetworks J Chem Theory Comput 2017 13 6108ndash6119

[17] Doumlntgen M Schmalz F Kopp W A Kroumlger L C Leonhard K AutomatedChemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and QuantumChemistry Simulations J Chem Inf Model 2018 58 1343ndash1355

[18] Dewyer A L Arguumlelles A J Zimmerman P M Methods for Exploring ReactionSpace in Molecular Systems WIREs Comput Mol Sci 2018 8 e1354

[19] Frederiksen S L Jacobsen K W Brown K S Sethna J P Bayesian EnsembleApproach to Error Estimation of Interatomic Potentials Phys Rev Lett 2004 93165501

[20] Mortensen J J Kaasbjerg K Frederiksen S L Noslashrskov J K Sethna J PJacobsen K W Bayesian Error Estimation in Density-Functional Theory PhysRev Lett 2005 95 216401

[21] Petzold V Bligaard T Jacobsen K W Construction of New Electronic DensityFunctionals with Error Estimation Through Fitting Top Catal 2012 55 402ndash417

[22] Wellendorff J Lundgaard K T Moslashgelhoslashj A Petzold V Landis D DNoslashrskov J K Bligaard T Jacobsen K W Density Functionals for SurfaceScience Exchange-Correlation Model Development with Bayesian Error EstimationPhys Rev B 2012 85 235149

[23] Medford A J Wellendorff J Vojvodic A Studt F Abild-Pedersen FJacobsen K W Bligaard T Noslashrskov J K Assessing the Reliability of CalculatedCatalytic Ammonia Synthesis Rates Science 2014 345 197ndash200

[24] Pandey M Jacobsen K W Heats of Formation of Solids with Error EstimationThe mBEEF Functional with and without Fitted Reference Energies Phys Rev B2015 91 235201

[25] Simm G N Reiher M Systematic Error Estimation for Chemical Reaction En-ergies J Chem Theory Comput 2016 12 2762ndash2773

[26] Proppe J Husch T Simm G N Reiher M Uncertainty Quantification forQuantum Chemical Models of Complex Reaction Networks Faraday Discuss 2016195 497ndash520

32

[27] Pernot P The Parameter Uncertainty Inflation Fallacy J Chem Phys 2017 147104102

[28] Simm G N Reiher M Error-Controlled Exploration of Chemical Reaction Net-works with Gaussian Processes J Chem Theory Comput 2018 14 5238ndash5248

[29] Bishop C M Pattern Recognition and Machine Learning Springer New York2006

[30] Althorpe S C et al Fundamentals General Discussion Faraday Discuss 2016195 139ndash169

[31] Althorpe S C et al Non-Adiabatic Reactions General Discussion Faraday Dis-cuss 2016 195 311ndash344

[32] Angulo G et al New Methods General Discussion Faraday Discuss 2016 195521ndash556

[33] Althorpe S et al Application to Large Systems General Discussion Faraday Dis-cuss 2016 195 671ndash698

[34] Turaacutenyi T Tomlin A S Analysis of Kinetic Reaction Mechanisms SpringerBerlin 2014

[35] Kee R J Miller J A Jefferson T H ldquoCHEMKIN A General-Purpose Problem-Independent Transportable FORTRAN Chemical Kinetics Code Packagerdquo Tech-nical Report SANDndash80-8003 Sandia National Labs Livermore CA United States1980

[36] Kee R J Rupley F M Miller J A ldquoChemkin-II A Fortran Chemical KineticsPackage for the Analysis of Gas-Phase Chemical Kineticsrdquo Technical Report SAND-89-8009 Sandia National Labs Livermore CA United States 1989

[37] Kee R J Rupley F M Meeks E Miller J A ldquoCHEMKIN-III A FORTRANChemical Kinetics Package for the Analysis of Gas-Phase Chemical and PlasmaKineticsrdquo Technical Report SAND-96-8216 Sandia National Labs Livermore CAUnited States 1996

[38] Goodwin D G Moffat H K Speth R L ldquoCantera An Object-Oriented SoftwareToolkit for Chemical Kinetics Thermodynamics and Transport Processesrdquo Version230 DOI 105281zenodo170284 httpwwwcanteraorg2017

33

[39] Hoops S Sahle S Gauges R Lee C Pahle J Simus N Singhal M Xu LMendes P Kummer U COPASImdasha COmplex PAthway SImulator Bioinformatics2006 22 3067ndash3074

[40] Gillespie D T Stochastic Simulation of Chemical Kinetics Annu Rev Phys Chem2007 58 35ndash55

[41] Glowacki D R Liang C-H Morley C Pilling M J Robertson S H MES-MER An Open-Source Master Equation Solver for Multi-Energy Well Reactions JPhys Chem A 2012 116 9545ndash9560

[42] Shannon R Glowacki D R A Simple ldquoBoxed Molecular Kineticsrdquo Approach ToAccelerate Rare Events in the Stochastic Kinetic Master Equation J Phys ChemA 2018 122 1531ndash1541

[43] Glowacki D R Lockhart J Blitz M A Klippenstein S J Pilling M JRobertson S H Seakins P W Interception of Excited Vibrational QuantumStates by O2 in Atmospheric Association Reactions Science 2012 337 1066ndash1069

[44] Glowacki D R Liang C H Marsden S P Harvey J N Pilling M J AlkeneHydroboration Hot Intermediates That React While They Are Cooling J AmChem Soc 2010 132 13621ndash13623

[45] Goldman L M Glowacki D R Carpenter B K Nonstatistical Dynamics inUnlikely Places [15] Hydrogen Migration in Chemically Activated CyclopentadieneJ Am Chem Soc 2011 133 5312ndash5318

[46] ldquoSCINE Software for Chemical Interaction Networksrdquo ETH Zuumlrich Zuumlrich Switzer-land httpwwwreiherethzchsoftwarescinehtml

[47] Grimmett G Stirzaker D Probability and Random Processes Oxford UniversityPress New York 2nd ed 1992

[48] Kirk P D W Stumpf M P H Gaussian Process Regression BootstrappingExploring the Effects of Uncertainty in Time Course Data Bioinformatics 200925 1300ndash1306

[49] Sutton J E Guo W Katsoulakis M A Vlachos D G Effects of Corre-lated Parameters and Uncertainty in Electronic-Structure-Based Chemical KineticModelling Nat Chem 2016 8 331ndash337

[50] Ramakrishnan R Dral P O Rupp M von Lilienfeld O A Quantum Chem-istry Structures and Properties of 134 Kilo Molecules Sci Data 2014 1 140022

34

[51] Pernot P Cailliez F A Critical Review of Statistical CalibrationPrediction Mod-els Handling Data Inconsistency and Model Inadequacy AIChE J 2017 63 4642ndash4665

[52] Proppe J Reiher M Reliable Estimation of Prediction Uncertainty for Physico-chemical Property Models J Chem Theory Comput 2017 13 3297ndash3317

[53] Eyring H The Activated Complex in Chemical Reactions J Chem Phys 19353 107ndash115

[54] Meijer E Matrix Algebra for Higher Order Moments Linear Algebra Appl 2005410 112ndash134

[55] ldquoMATLABrdquo Release 2018a The MathWorks Inc Natick MA United States 2018

[56] Shampine L Reichelt M The MATLAB ODE Suite SIAM J Sci Comput 199718 1ndash22

[57] Morris M D Factorial Sampling Plans for Preliminary Computational ExperimentsTechnometrics 1991 33 161ndash174

[58] Besora M Maseras F Microkinetic Modeling in Homogeneous Catalysis WIREsComput Mol Sci 2018 e1372 DOI 101002wcms1372

[59] Rasmussen C E Williams C K I Gaussian Processes for Machine LearningThe MIT Press Cambridge MA 2006

[60] Susnow R G Dean A M Green W H Peczak P Broadbelt L J Rate-BasedConstruction of Kinetic Models for Complex Systems J Phys Chem A 1997 1013731ndash3740

[61] Han K Green W H West R H On-the-Fly Pruning for Rate-Based ReactionMechanism Generation Comput Chem Eng 2017 100 1ndash8

[62] Song J Building Robust Chemical Reaction Mechanisms Next Generation of Au-tomatic Model Construction Software Doctoral thesis Massachusetts Institute ofTechnology 2004

[63] Jalan A West R H Green W H An Extensible Framework for CapturingSolvent Effects in Computer Generated Kinetic Models J Phys Chem B 2013117 2955ndash2970

[64] Grenda J M Androulakis I P Dean A M Green W H Application ofComputational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysisof Methane Ind Eng Chem Res 2003 42 1000ndash1010

35

[65] Horn R A Johnson C R Matrix Analysis Cambridge University Press Cam-bridge 1990

36

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook

Morris samples, our analysis suggests 9 out of 21 reaction pairs to be critical, i.e., they yield values for $\delta_{pq}(t_{\max})$ and/or $\Delta_{pq}$ exceeding $\varepsilon_y$ with respect to the quantile specified by $\gamma$. Assessing the 5 Morris samples one by one, the number of critical reactions varies between 7 and 13. Here, we simulated the refinement of rate constants by taking the averages of the absolute free energies in question (for both vertices and edges) over all 25 samples, which resulted in rate constants with zero variance for the 9 critical reaction pairs. The corresponding concentration–time plots are shown in Fig. 3. The interpretability of the solutions increased significantly. Species 33 is indeed the main product with a final concentration of about 0.8 mol L⁻¹. Furthermore, species 42 and 68 are relevant side products with final concentrations of 0.2–0.3 mol L⁻¹, whereas the final concentrations of species 29, 32, and 34 are less significant.

Figure 3: Concentration–time plots for the dominant species of the exemplary reaction network CRN-X after refinement of 9 pairs of rate constants. An ensemble of 25 distinct kinetic models derived from CRN-X has been considered.

When analyzing the edge flux quanta, i.e., the edge fluxes for a given time interval $t_u - t_{u-1}$, we can reconstruct the dominant reaction pathways leading to the main and side products (Fig. 4). In the very beginning of the reaction, 2 dimerizes immediately and completely to 4, which in turn immediately and completely dissociates to 9 and 10. The formation of 9 enables its reaction with 1 to form 6 and 13, the latter of which reacts quickly to 33, the main product. Of the three reaction channels that lead to the formation of 33, this channel is the most important one. The formation of 6 via 1 and 9 activates its reaction with 1 to 19, the latter two of which react further to 33 and 68, one of the two dominant side products. On a longer time scale, 10 reacts to 42 (the other dominant side product) via 30 and 14. In summary, the dominant reaction pathways read:

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.
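The statement that a vertex reachable only through irrelevant channels is itself irrelevant can be illustrated with a small reachability sketch. The reaction list encodes the dominant CRN-X pathways quoted above; species 99 and 100, the `relevant` flags, and the function name `reachable_species` are hypothetical and only serve as an illustration, not as the KiNetX implementation.

```python
def reachable_species(reactants, reactions):
    """Return all species reachable from `reactants` via relevant reactions.

    Each reaction is a tuple (educts, products, relevant). A reaction can
    fire only if it is flagged relevant and all of its educts are reachable.
    """
    reached = set(reactants)
    changed = True
    while changed:
        changed = False
        for educts, products, relevant in reactions:
            if relevant and set(educts) <= reached:
                new = set(products) - reached
                if new:
                    reached |= new
                    changed = True
    return reached


reactions = [
    # Dominant CRN-X pathways from the text (all flagged relevant).
    ((2, 2), (4,), True),
    ((4,), (9, 10), True),
    ((1, 9), (6, 13), True),
    ((13,), (33, 33), True),
    ((1, 6), (19,), True),
    ((1, 19), (33, 68), True),
    ((10,), (30,), True),
    ((30,), (14,), True),
    ((14,), (42,), True),
    # Hypothetical irrelevant channel and a species hidden behind it.
    ((1, 2), (99,), False),
    ((99,), (100,), True),
]

reached = reachable_species([1, 2], reactions)
print(sorted(reached))
```

Species 100 is never reached although its own formation channel is flagged relevant, because its only precursor, 99, is formed through an irrelevant channel.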

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than $\varepsilon_y$, as measured by $\delta_{pq}(t_{\max})$ and $\Delta_{pq}$. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks explored without kinetic guidance is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol⁻¹ with a standard deviation of 0.4 kcal mol⁻¹. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on $k_0$. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

Table 1: Mean values (μ) and standard deviations (σ) of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                        μ(ζ_CRN-0)   μ(ζ_CRN-B)   σ(ζ_CRN-0)   σ(ζ_CRN-B)
exploration steps                 12           14           7            7
critical reactions                5            10           4            7
L_expl                            54           60           26           27
N_expl                            29           31           11           11
L_red                             30           37           18           21
N_red                             17           20           8            8
δ^B_expl(t_max) / mol L⁻¹         -            0.14         -            0.13
Δ^B_expl / mol L⁻¹                -            0.15         -            0.13
δ^B_refined(t_max) / mol L⁻¹      -            <0.01        -            0.01
Δ^B_refined / mol L⁻¹             -            <0.01        -            0.01

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several instead of a single solution naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to 1.0 × 10⁻¹⁷ mol L⁻¹ s⁻¹ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, $t_{\max}$), we obtain a flux of 3.7 × 10⁻¹³ mol L⁻¹, which is very small compared to $\varepsilon_y$ = 0.01 mol L⁻¹. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.
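As a worked check of these numbers, the flux can be reproduced from the minimum formation rate and the reaction time listed in Table 2 (36954 s). The symbol $r$ for the formation rate is introduced here only for illustration, and the estimate assumes that the quoted flux is simply the rate multiplied by the full reaction time:

```latex
G = r \, t_{\max}
  = \left(1.0 \times 10^{-17}\ \mathrm{mol\,L^{-1}\,s^{-1}}\right) \times 36954\ \mathrm{s}
  \approx 3.7 \times 10^{-13}\ \mathrm{mol\,L^{-1}}
  \ll \varepsilon_y = 0.01\ \mathrm{mol\,L^{-1}}
```

The resulting flux lies roughly ten orders of magnitude below the concentration threshold, which is why a rate threshold small enough to retain such channels would provide no practical pruning.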

DFA-based network reduction with a flux threshold of $G_{\min}$ = 1.0 × 10⁻⁵ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument of the last paragraph, the consideration of several solutions increases the likelihood to identify kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of $\max\{\delta^{B}(t_{\max}), \Delta^{B}\}$ from 0.15 mol L⁻¹ to < 0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all B + 1 = 6 samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
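One way to see why absolute free energies should be averaged is to check thermodynamic consistency around a closed reaction cycle, where the reaction free energies must sum to zero. The following minimal sketch uses a hypothetical three-species cycle with made-up free energies for three sampled models; none of the numbers or names stem from the study.

```python
from statistics import mean

# Hypothetical absolute free energies (kJ/mol) of three species forming a
# cycle A -> B -> C -> A, for three sampled models (illustrative only).
samples = [
    {"A": 0.0, "B": -10.0, "C": -4.0},
    {"A": 1.5, "B": -12.0, "C": -3.0},
    {"A": -0.5, "B": -9.0, "C": -6.5},
]
cycle = [("A", "B"), ("B", "C"), ("C", "A")]


def reaction_energies(g):
    """Reaction free energies along the cycle from absolute free energies."""
    return [g[product] - g[reactant] for reactant, product in cycle]


# (a) Refinement on absolute free energies: average every species first,
# then derive the relative quantities from the averaged values.
g_mean = {s: mean(m[s] for m in samples) for s in samples[0]}
cycle_sum_absolute = sum(reaction_energies(g_mean))

# (b) Manipulating a relative free energy: overwrite only the A -> B
# reaction energy of the first sample with its ensemble mean.
dg = reaction_energies(samples[0])
dg[0] = mean(reaction_energies(m)[0] for m in samples)
cycle_sum_relative = sum(dg)

print(cycle_sum_absolute)  # vanishes up to rounding
print(cycle_sum_relative)  # nonzero: the cycle is no longer consistent
```

In variant (b), the cycle no longer closes, which is the kind of unphysical scenario that averaging absolute free energies avoids.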

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of $\max\{\delta_{pq}(t_{\max}), \Delta_{pq}\}$ comparing the CRN-0 solutions against the "exact" solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm that constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase achieved by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered an additional number of 5 kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be ambiguous to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, i.e., highly noise-inducing, reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may be strongly dependent on the underlying graph structure, the distribution of activation free energies or rate constants, as well as their correlated uncertainties. It is not known to us that the literature on solution chemistry (or chemistry in general) offers enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation.

There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is in principle able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time $t_{\max}$, which we set to the half-life of a unimolecular reaction with a rate constant corresponding to a barrier height of 100 kJ mol⁻¹. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the (x + 1)-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the x-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_{\exp}$. The user can specify two different values for $\mu_{\exp}$, one for unimolecular reactions, $\mu_{\exp,\mathrm{uni}}$, and one for bimolecular reactions, $\mu_{\exp,\mathrm{bi}}$. The floor value of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_\mathrm{new})$, ranging from zero to one. If $p(V_\mathrm{new}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_\mathrm{new}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), $2L$ activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is sampled from a normal distribution with mean $\mu(A^{-} - A^{+})$ and standard deviation $\sigma(A^{-} - A^{+})$, which is added to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is sampled from a uniform distribution with bounds $\min(A^{\ddagger} - A^{+/-})$ and $\max(A^{\ddagger} - A^{+/-})$, which is added to the higher-energy side of an edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
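Steps 1 and 2 above can be sketched in a few lines. The function names, the seed, and the example layer sizes are hypothetical; the mass-balance check that AutoNetGen enforces is omitted, so this is only an illustration of the counting and sampling logic rather than the actual implementation.

```python
import math
import random


def count_reactive_states(n_total, n_new):
    """Count potentially reactive intermediate states for the next layer.

    n_total is N (all vertices registered so far); n_new is N_x (vertices
    of the most recent layer).
    """
    n_old = n_total - n_new
    uni = n_new                                 # A -> P
    homo = n_new                                # 2A -> P
    hetero_new_new = n_new * (n_new - 1) // 2   # A + B -> P, both new
    hetero_new_old = n_new * n_old              # A + B -> P, new with old
    return uni, homo, hetero_new_new + hetero_new_old


def sample_channels(mu_exp, p_new, rng):
    """Draw the number of channels and decide new vertex vs. relinking.

    The channel count is the floor of a draw from an exponential
    distribution with mean mu_exp. Each list entry is True if a new vertex
    is generated and False if an earlier-layer vertex is linked instead.
    """
    n_channels = math.floor(rng.expovariate(1.0 / mu_exp))
    return [rng.random() < p_new for _ in range(n_channels)]


rng = random.Random(0)  # fixed seed, chosen arbitrarily
uni, homo, hetero = count_reactive_states(n_total=10, n_new=4)
print(uni, homo, hetero)
print(sample_channels(mu_exp=5.0, p_new=0.5, rng=rng))
```

For N = 10 and N_x = 4, the count yields 4 unimolecular, 4 homobimolecular, and 6 + 24 = 30 heterobimolecular states.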

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$
k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right), \tag{29}
$$

where $h$, $k_\mathrm{B}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
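A minimal sketch of Eq. (29) follows, assuming SI values for the physical constants and a barrier given in kJ mol⁻¹; the function name is hypothetical. It also evaluates the half-life of a unimolecular step with a barrier of 100 kJ mol⁻¹ at 298.15 K, which comes out at roughly 3.7 × 10⁴ s and is therefore of the same size as the reaction time listed in Table 2 (small deviations depend on the constants used).

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R = 8.314462618       # gas constant, J/(mol K)


def eyring_rate_constant(barrier_kj_per_mol, temperature=298.15):
    """Rate constant from the Eyring equation (s^-1 for unimolecular steps)."""
    barrier = barrier_kj_per_mol * 1000.0  # convert kJ/mol to J/mol
    return K_B * temperature / H * math.exp(-barrier / (R * temperature))


k = eyring_rate_constant(100.0)
half_life = math.log(2.0) / k  # first-order half-life in seconds
print(f"k = {k:.3e} s^-1, half-life = {half_life:.0f} s")
```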

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$ that fulfills the condition that its largest eigenvalue equals $\sigma_{A,\max}^2$. Defining $\sigma_{A,\max}^2$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bound by $\sigma_{A,\max}$.

1. Generate a random (N × N)-dimensional matrix $P$ with elements that are uniformly sampled from [−0.5, +0.5]. N is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an (N × N)-dimensional matrix $Q = P P^{\top}$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$
\Sigma_A = Q \, \frac{\langle\sigma_A\rangle^2}{\max(E_{ii})},
$$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
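The three steps can be sketched without external dependencies as follows. Power iteration is used here in place of a full eigendecomposition purely to keep the example self-contained (a library routine such as `numpy.linalg.eigvalsh` would be the usual choice); the function names, the seed, the matrix size, and the target value of 2.0 are hypothetical.

```python
import random


def largest_eigenvalue(m, n_iter=500):
    """Largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(m)
    v = [1.0] * n
    for _ in range(n_iter):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient of the normalized vector v.
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * mv[i] for i in range(n))


def random_covariance(n, sigma_max, rng):
    """Random covariance matrix with largest eigenvalue sigma_max**2."""
    # Step 1: P with elements drawn uniformly from [-0.5, +0.5].
    p = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    # Step 2: Q = P P^T is symmetric and positive semidefinite.
    q = [[sum(p[i][k] * p[j][k] for k in range(n)) for j in range(n)]
         for i in range(n)]
    # Step 3: rescale Q so that its largest eigenvalue equals sigma_max**2.
    scale = sigma_max ** 2 / largest_eigenvalue(q)
    return [[scale * q_ij for q_ij in row] for row in q]


sigma_a = random_covariance(n=5, sigma_max=2.0, rng=random.Random(0))
print(largest_eigenvalue(sigma_a))  # close to sigma_max**2
```

Since the diagonal elements of a positive-semidefinite matrix cannot exceed its largest eigenvalue, all variances in the rescaled matrix stay below the chosen bound.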

Note that in a practical setup, we expect the user to provide B + 1 k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                      CRN-X        CRN-0        CRN-B

AutoNetGen
B + 1                                25           1            6
T / K                                298.15       298.15       298.15
μ(A⁻ − A⁺) / (kJ mol⁻¹)              −25          −25          −25
σ(A⁻ − A⁺) / (kJ mol⁻¹)              50           50           50
min(A‡ − A⁺/⁻) / (kJ mol⁻¹)          0            0            0
max(A‡ − A⁺/⁻) / (kJ mol⁻¹)          100          100          100
⟨σ_A⟩ / (kJ mol⁻¹)                   2.09         2.09         2.09
L_max                                125          100          100
μ_exp(N_uni)                         5            5            5
μ_exp(N_bi)                          2            2            2
p(V_new)                             0.5          0.1          0.1

KiNetX
t_max / s                            36954        36954        36954
ε_y / (mol L⁻¹)                      0.01         0.01         0.01
G_min / (mol L⁻¹)                    1.0 × 10⁻³   1.0 × 10⁻⁵   1.0 × 10⁻⁵
U                                    1000         1000         1000
γ                                    0            1            1
C                                    5            1            5

References

[1] Broadbelt L J Stark S M Klein M T Computer Generated Pyrolysis Model-ing On-the-Fly Generation of Species Reactions and Rates Ind Eng Chem Res1994 33 790ndash799

[2] Kayala M A Baldi P ReactionPredictor Prediction of Complex Chemical Reac-tions at the Mechanistic Level Using Machine Learning J Chem Inf Model 201252 2526ndash2540

30

[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package", Technical Report SAND-80-8003, Sandia National Labs, Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics", Technical Report SAND-89-8009, Sandia National Labs, Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics", Technical Report SAND-96-8216, Sandia National Labs, Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI - a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.


of the two dominant side products. On a longer time scale, 10 reacts to 42 (the other dominant side product) via 30 and 14. In summary, the dominant reaction pathways read

2 + 2 → 4 → 9 + 10
1 + 9 → 6 + 13
13 → 33 + 33
1 + 6 → 19
1 + 19 → 33 + 68
10 → 30 → 14 → 42

Figure 4: Sparse variant of CRN-X illustrating the dominant reaction paths (red-colored edges with arrow heads). All species that are part of dominant reaction paths are represented by red-colored vertices, except for reactant vertices, which are black-colored.

In a practical setup, one would conduct the kinetic analysis presented above not only at the end but rather repeatedly during the exploration, as many of the intermediates and transition states may be kinetically irrelevant. The number of such redundant states potentially grows superlinearly with the size of the network, since every vertex that can only be reached via irrelevant channels will be irrelevant as well. Combined with the fact that the final network size is unknown a priori, the coupling of kinetic modeling with network exploration is generally crucial.

Here, we applied the algorithm presented in Section 2.5 to CRN-X but performed network reduction and sensitivity analysis only at the end of the exploration. As already mentioned, potentially relevant edges may reveal small fluxes in early stages of the exploration, which renders any flux-based network reduction critical. Instead of choosing a rate threshold as a completeness criterion, we stopped the exploration when the solution of the current network did not differ from the solution of the full network by more than ε_y, as measured by δ_pq(t_max) and Δ_pq. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, with a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks explored without kinetic guidance is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol⁻¹ with a standard deviation of 0.4 kcal mol⁻¹. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on k_0. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

Table 1: Mean values and standard deviations of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                        μ(ζ_CRN-0)   μ(ζ_CRN-B)   σ(ζ_CRN-0)   σ(ζ_CRN-B)

exploration steps                 12           14           7            7
critical reactions                5            10           4            7
L_expl                            54           60           26           27
N_expl                            29           31           11           11
L_red                             30           37           18           21
N_red                             17           20           8            8
δ_B,expl(t_max) / (mol L⁻¹)       -            0.14         -            0.13
Δ_B,expl / (mol L⁻¹)              -            0.15         -            0.13
δ_B,refined(t_max) / (mol L⁻¹)    -            <0.01        -            0.01
Δ_B,refined / (mol L⁻¹)           -            <0.01        -            0.01

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several solutions instead of a single one naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to 1.0 × 10⁻¹⁷ mol L⁻¹ s⁻¹ in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, t_max), we obtain a flux of 3.7 × 10⁻¹³ mol L⁻¹, which is very small compared to ε_y = 0.01 mol L⁻¹. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.
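The flux estimate in this paragraph can be retraced by hand. The following worked equation multiplies the smallest recorded formation rate by the full reaction time t_max = 36954 s from Table 2; the symbols G and r_min are labels introduced here for illustration and are not taken from the original text.

```latex
\[
  G \;=\; r_{\min}\, t_{\max}
    \;=\; 1.0 \times 10^{-17}\ \mathrm{mol\,L^{-1}\,s^{-1}} \times 36954\ \mathrm{s}
    \;\approx\; 3.7 \times 10^{-13}\ \mathrm{mol\,L^{-1}}
    \;\ll\; \varepsilon_y = 0.01\ \mathrm{mol\,L^{-1}}
\]
```

The resulting flux lies more than ten orders of magnitude below ε_y, which illustrates why a rate threshold alone is hard to choose in advance.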

DFA-based network reduction with a flux threshold of G_min = 1.0 × 10⁻⁵ mol L⁻¹ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument of the last paragraph, the consideration of several solutions increases the likelihood to identify kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of max{δ_B(t_max), Δ_B} from 0.15 mol L⁻¹ to < 0.01 mol L⁻¹ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all B + 1 = 6 samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
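The point about absolute versus relative free energies can be illustrated with a toy cycle A → B → C → A. The sketch below is a simplified illustration, not the KiNetX implementation: the vertex names, the numerical values, and the restriction to vertex (not transition-state) energies are assumptions made here. It checks that the reaction free energies around the closed cycle still sum to zero after refinement, which is a necessary consequence of microscopic reversibility.

```python
from statistics import mean

# Hypothetical absolute vertex free energies (kJ/mol) in two sampled models.
models = [
    {"A": 0.0, "B": -12.0, "C": -5.0},
    {"A": 2.0, "B": -16.0, "C": -4.0},
]
cycle = [("A", "B"), ("B", "C"), ("C", "A")]
critical = ("A", "B")  # reaction assumed to be selected for refinement


def cycle_sum(reaction_energies):
    """Sum of reaction free energies around the closed cycle."""
    return sum(reaction_energies[edge] for edge in cycle)


# Variant 1: average the ABSOLUTE free energies of the critical vertices.
refined_abs = []
for model in models:
    vertices = dict(model)
    for v in critical:
        vertices[v] = mean(m[v] for m in models)
    refined_abs.append({(r, p): vertices[p] - vertices[r] for r, p in cycle})

# Variant 2: average only the RELATIVE free energy of the critical reaction.
mean_rel = mean(m["B"] - m["A"] for m in models)
refined_rel = []
for model in models:
    energies = {(r, p): model[p] - model[r] for r, p in cycle}
    energies[critical] = mean_rel
    refined_rel.append(energies)

print([cycle_sum(e) for e in refined_abs])  # vanishes for every model
print([cycle_sum(e) for e in refined_rel])  # nonzero: thermodynamically inconsistent
```

In the first variant every reaction energy is still a difference of vertex energies, so any closed cycle sums to zero by construction; the second variant breaks this bookkeeping.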

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of max{δ_pq(t_max), Δ_pq} comparing the CRN-0 solutions against the "exact" solutions amounts to 15.7 × 10⁻³ mol L⁻¹, whereas it amounts to only 1.3 × 10⁻³ mol L⁻¹ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm that constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered an additional number of 5 kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be ambiguous to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, i.e., highly noise-inducing, reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may be strongly dependent on the underlying graph structure, the distribution of activation free energies (or rate constants), as well as their correlated uncertainties. It is not known to us that the literature on solution chemistry (or chemistry in general) offers enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol⁻¹ with respect to the higher-energy intermediate) with the reaction time t_max, which we set to the half-life of a unimolecular rate constant corresponding to a barrier height of 100 kJ mol⁻¹. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol⁻¹, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks.

1. For the construction of the (x + 1)-th network layer, we first define all potentially reactive intermediate states. At that stage, we register N = N_0 + ··· + N_x vertices. Since we already explored all possible reactions between the first N − N_x species, we will only consider the following potentially reactive intermediate states: N_x unimolecular intermediate states (leading to reactions of type A → P) defined by the N_x species of the x-th network layer, N_x homobimolecular intermediate states (type 2A → P), and N_x(N_x − 1)/2 + N_x(N − N_x) heterobimolecular intermediate states (type A + B → P). The first N_x(N_x − 1)/2 heterobimolecular states represent all possible nonredundant combinations of the N_x new species among each other, whereas the latter N_x(N − N_x) represent all possible combinations between the N_x new species and the N − N_x old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean μ_exp. The user can specify two different values for μ_exp, one for unimolecular reactions, μ_exp,uni, and one for bimolecular reactions, μ_exp,bi. The floor value of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter p(V_new), ranging from zero to one. If p(V_new) = 0, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if p(V_new) = 1. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), 2L activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is sampled from a normal distribution with mean μ(A− − A+) and standard deviation σ(A− − A+), which is added to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is sampled from a uniform distribution with bounds min(A‡ − A+/−) and max(A‡ − A+/−), which is added to the higher-energy side of an edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix Σ_A as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation 2⟨σ_A⟩. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
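Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AutoNetGen code: the function names, the use of Python's `random` module, and the seed are choices made here, and mass balance and vertex linking are left out.

```python
import math
import random


def count_reactive_states(n_total, n_new):
    """Number of candidate reactive states for the next layer (step 1).

    n_total is N, the number of vertices registered so far;
    n_new is N_x, the number of vertices in the most recent layer.
    """
    unimolecular = n_new                          # A -> P
    homobimolecular = n_new                       # 2A -> P
    hetero_new_new = n_new * (n_new - 1) // 2     # A + B -> P, both species new
    hetero_new_old = n_new * (n_total - n_new)    # A + B -> P, one new, one old
    return unimolecular + homobimolecular + hetero_new_new + hetero_new_old


def sample_channel_count(mean_channels, rng):
    """Floor of an exponential draw with the given mean (step 2)."""
    return math.floor(rng.expovariate(1.0 / mean_channels))


rng = random.Random(42)  # arbitrary seed, only for reproducibility
n_states = count_reactive_states(n_total=5, n_new=3)  # 3 + 3 + 3 + 6 states
channels = [sample_channel_count(5.0, rng) for _ in range(n_states)]
print(n_states, channels)
```

Because the floor of an exponential draw is frequently zero, many candidate states end up without any reaction channel, which keeps the generated networks sparse.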

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

\[
  k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right), \qquad (29)
\]

where h, k_B, R, and T are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
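As a plausibility check of Eq. (29), the following sketch evaluates the Eyring rate constant for a barrier of 100 kJ mol⁻¹ at 298.15 K and the corresponding first-order half-life. The function names and the choice of CODATA 2018 constants are assumptions made here; with these constants the half-life comes out at roughly ten hours, the same order as the value of t_max listed in Table 2 (small deviations are expected if different constants were used).

```python
import math

# Physical constants (CODATA 2018 values)
K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R = 8.314462618       # gas constant, J/(mol K)


def eyring_rate_constant(barrier_kj_mol, temperature_k=298.15):
    """Eyring rate constant (s^-1) for an activation free energy in kJ/mol."""
    barrier_j_mol = barrier_kj_mol * 1.0e3
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_j_mol / (R * temperature_k))


def half_life(rate_constant):
    """Half-life (s) of a first-order process with the given rate constant."""
    return math.log(2.0) / rate_constant


k_100 = eyring_rate_constant(100.0)
t_half = half_life(k_100)
print(f"k = {k_100:.3e} s^-1, half-life = {t_half:.0f} s")
```

Lower barriers give exponentially larger rate constants, so reactions with barriers well below 100 kJ mol⁻¹ complete long before t_max.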

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are strictly nonnegative. We outline a simple recipe to construct a random covariance matrix Σ_A, which fulfills the condition that its largest eigenvalue equals σ²_A,max. Defining σ²_A,max to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by σ_A,max.

1. Generate a random (N × N)-dimensional matrix P with elements that are uniformly sampled from [−0.5, +0.5]. N is the number of vertices in the network.

2. The multiplication of P with its transpose leads to an (N × N)-dimensional matrix Q = PPᵀ that already is a covariance matrix [65] with eigenvalues E_ii.

3. The covariance matrix Σ_A results from a rescaling of Q,

\[
  \Sigma_A = Q \, \frac{\langle \sigma_A \rangle^2}{\max\{E_{ii}\}},
\]

which implies that the largest eigenvalue of Σ_A equals ⟨σ_A⟩².
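The three steps above translate almost directly into NumPy. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the function name, the seed, the network size of five vertices, and the zero-valued nominal free energies are all hypothetical. It also draws a few perturbed free-energy vectors from the resulting covariance matrix.

```python
import numpy as np


def random_covariance(n_vertices, sigma_mean, rng):
    """Random covariance matrix whose largest eigenvalue equals sigma_mean**2."""
    # Step 1: random matrix with elements sampled uniformly from [-0.5, 0.5)
    p = rng.uniform(-0.5, 0.5, size=(n_vertices, n_vertices))
    # Step 2: P P^T is symmetric and positive semidefinite
    q = p @ p.T
    # Step 3: rescale by the largest eigenvalue (eigvalsh sorts ascending)
    largest = np.linalg.eigvalsh(q)[-1]
    return q * sigma_mean**2 / largest


rng = np.random.default_rng(seed=0)   # arbitrary seed for reproducibility
cov = random_covariance(n_vertices=5, sigma_mean=2.09, rng=rng)

# Draw B = 5 perturbed vertex free-energy vectors around hypothetical nominal values
nominal = np.zeros(5)
samples = rng.multivariate_normal(nominal, cov, size=5)
print(np.linalg.eigvalsh(cov)[-1], samples.shape)
```

Since every diagonal element of a positive-semidefinite matrix is bounded by its largest eigenvalue, each individual vertex variance cannot exceed ⟨σ_A⟩² in this construction.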

Note that in a practical setup we expect the user to provide B+ 1 k-vectors derivedfrom an ensemble of quantum chemical models instead of sampling the correspondingactivation free energies from a random (and hence problem-unrelated) covariance ma-trix However as it is current practice to generate a single set of activation free energiesinstead of an ensemble of them we encourage users to employ these random covariancematrices to develop an intuition for the potential effect of free-energy uncertainty on thekinetics of complex chemical systems

29

Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen andKiNetX on input and contains the values chosen for the three cases discussed in thiswork

Table 2 Input parameters for AutoNetGen and KiNetX and their values chosen forthis study

Input parameter CRN-X CRN-0 CRN-B

AutoNetGen

B + 1 25 1 6T K 29815 29815 29815micro(Aminus minus A+) (kJ molminus1) minus25 minus25 minus25σ(Aminus minus A+) (kJ molminus1) 50 50 50min(ADagger minus A+minus) (kJ molminus1) 0 0 0max(ADagger minus A+minus) (kJ molminus1) 100 100 100〈σA〉 (kJ molminus1) 209 209 209Lmax 125 100 100microexp(Nuni) 5 5 5microexp(Nbi) 2 2 2p(Vnew) 05 01 01

KiNetX

tmax s 36954 36954 36954εy (mol Lminus1) 001 001 001Gmin (mol Lminus1) 10times 10minus3 10times 10minus5 10times 10minus5

U 1000 1000 1000γ 0 1 1C 5 1 5

References

[1] Broadbelt L J Stark S M Klein M T Computer Generated Pyrolysis Model-ing On-the-Fly Generation of Species Reactions and Rates Ind Eng Chem Res1994 33 790ndash799

[2] Kayala M A Baldi P ReactionPredictor Prediction of Complex Chemical Reac-tions at the Mechanistic Level Using Machine Learning J Chem Inf Model 201252 2526ndash2540

30

[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. “CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package”; Technical Report SAND-80-8003, Sandia National Labs, Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. “Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics”; Technical Report SAND-89-8009, Sandia National Labs, Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. “CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics”; Technical Report SAND-96-8216, Sandia National Labs, Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. “Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes”, Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI – a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple “Boxed Molecular Kinetics” Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] “SCINE: Software for Chemical Interaction Networks”, ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] “MATLAB”, Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 2.1 Kinetic Modeling from a Network Perspective
    • 2.2 Uncertainty Quantification in Kinetic Modeling
    • 2.3 Overview of the KiNetX Meta-Algorithm
    • 2.4 The KiNetX Workflow
      • 2.4.1 Uncertainty Propagation
      • 2.4.2 Network Reduction
      • 2.4.3 Sensitivity Analysis
    • 2.5 KiNetX-Guided Network Exploration
    • 2.6 Chemistry-Mimicking Networks from AutoNetGen
  • 3 Results and Discussion
    • 3.1 Exemplary KiNetX Workflow
    • 3.2 Relevance of Uncertainty Propagation
  • 4 Conclusions and Outlook
Page 23: Mechanism Deduction from Noisy Chemical Reaction Networks

of the current network did not differ from the solution of the full network by more than εy, as measured by δpq(tmax) and ∆pq. Hence, we needed to switch off the sensitivity analysis during exploration, as it would have led to incomparable solutions.

The KiNetX-guided exploration of CRN-X required 17 steps, leading to 63 vertices and 68 edges. This corresponds to an effective network reduction of about 40%, which is significantly below the 80% we obtained with our DFA algorithm. However, when choosing a conservative exploration strategy that does not make a priori assumptions about the underlying mechanism, one cannot bypass a certain degree of redundancy.

3.2 Relevance of Uncertainty Propagation

To assess the importance of explicitly considering k-uncertainty in kinetic studies of room-temperature solution chemistry, we need to examine a multitude of chemical reaction networks for which correlated uncertainty information is available. Currently, the data situation is still too poor to resort to reaction networks generated with chemistry-based exploration codes. For this reason, we randomly created 100 reaction networks with AutoNetGen. Each network contains precisely 100 edges and is based on two reactants with the same initial concentrations and masses introduced in Section 3.1. All remaining input values are listed in Table 2. The average number of vertices in these networks, explored without kinetic guidance, is 45 (±6). The ensemble-averaged activation free-energy uncertainty amounts to 1.0 kcal mol^−1 with a standard deviation of 0.4 kcal mol^−1. This uncertainty range is rather small compared to what we estimated for small-molecule organic chemistry with the LC*-PBE0 density functional ensemble [26]. We conducted the kinetic analysis in two different ways. In the first case, we only considered the nominal kinetic model based on k0. We refer to this case as CRN-0. In the second case, named CRN-B, we considered the nominal model plus 5 additional sampled models. This way, we can estimate the importance of incorporating uncertainty propagation into kinetic modeling. The results of both analyses are summarized in Table 1.

Table 1: Mean values and standard deviations of properties obtained as a result of KiNetX-based analyses on the CRN-0 and CRN-B cases.

Property ζ                     µ(ζCRN-0)   µ(ζCRN-B)   σ(ζCRN-0)   σ(ζCRN-B)
exploration steps              12          14          7           7
critical reactions             5           10          4           7
Lexpl                          54          60          26          27
Nexpl                          29          31          11          11
Lred                           30          37          18          21
Nred                           17          20          8           8
δBexpl(tmax) (mol L^−1)        –           0.14        –           0.13
∆Bexpl (mol L^−1)              –           0.15        –           0.13
δBrefined(tmax) (mol L^−1)     –           <0.01       –           0.01
∆Brefined (mol L^−1)           –           <0.01       –           0.01

In case of exploration guidance by KiNetX, the average number of vertices and edges decreases to 31 and 60 for CRN-B, respectively, which corresponds to a reduction of network elements of about 35%. The number of vertices and edges in the case of CRN-0 is slightly smaller (29 and 54, respectively), which is to be expected since the consideration of several solutions instead of a single one naturally increases the number of potentially relevant reaction channels. This also indicates why two additional exploration steps were necessary on average in the CRN-B case. Additionally, we recorded the maximum formation rate initiating the last exploration for each network. The minimum over all networks amounts to 1.0 × 10^−17 mol L^−1 s^−1 in the CRN-B case. If we choose the maximum possible time interval (the time of reaction, tmax), we obtain a flux of 3.7 × 10^−13 mol L^−1, which is very small compared to εy = 0.01 mol L^−1. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.
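As a quick check of the flux estimate in this paragraph, the smallest recorded formation rate can be multiplied by the reaction time tmax = 36954 s listed in Table 2. The values are taken from the text; only the rounding is ours.

```latex
\[
1.0 \times 10^{-17}\ \mathrm{mol\,L^{-1}\,s^{-1}} \times 36954\ \mathrm{s}
\approx 3.7 \times 10^{-13}\ \mathrm{mol\,L^{-1}}
\ll \varepsilon_y = 0.01\ \mathrm{mol\,L^{-1}}
\]
```

The estimated flux thus lies roughly ten orders of magnitude below the concentration threshold εy.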

DFA-based network reduction with a flux threshold of Gmin = 1.0 × 10^−5 mol L^−1 was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument in the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of max{δB(tmax), ∆B} from 0.15 mol L^−1 to < 0.01 mol L^−1 for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all B + 1 = 6 samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
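The refinement step described here can be illustrated with a minimal sketch. It assumes, for simplicity, that each side of a reaction is a single vertex, and all names (`refine_critical_reactions`, `edge_ends`, the numerical values) are hypothetical and not taken from KiNetX. Because forward and backward barriers are both derived from the same absolute free energies, their difference always equals the reaction free energy, which is the consistency that averaging relative free energies could break.

```python
import numpy as np


def refine_critical_reactions(vertex_energies, edge_energies, edge_ends, critical_edges):
    """Replace absolute free energies of critical reactions by their ensemble mean.

    vertex_energies: array (samples, vertices); edge_energies: array (samples, edges).
    edge_ends[l] = (reactant vertex, product vertex) of edge l (single-vertex sides assumed).
    """
    vertices = np.array(vertex_energies, dtype=float)  # np.array copies the input
    edges = np.array(edge_energies, dtype=float)
    for l in critical_edges:
        edges[:, l] = edges[:, l].mean()               # same edge energy in every model
        for v in edge_ends[l]:
            vertices[:, v] = vertices[:, v].mean()     # same vertex energy in every model
    return vertices, edges


def activation_free_energies(vertices, edges, edge_ends):
    """Forward and backward barriers as differences of absolute free energies."""
    ends = np.asarray(edge_ends)
    forward = edges - vertices[:, ends[:, 0]]
    backward = edges - vertices[:, ends[:, 1]]
    return forward, backward


rng = np.random.default_rng(1)                         # arbitrary seed
edge_ends = [(0, 1), (1, 2)]                           # toy chain: 0 -> 1 -> 2
vertex_samples = np.array([0.0, -20.0, -35.0]) + rng.normal(0.0, 2.0, size=(6, 3))
edge_samples = np.array([60.0, 45.0]) + rng.normal(0.0, 4.0, size=(6, 2))

vertices, edges = refine_critical_reactions(
    vertex_samples, edge_samples, edge_ends, critical_edges=[0]
)
forward, backward = activation_free_energies(vertices, edges, edge_ends)
```

In this toy example, the barriers of the critical edge 0 become identical across all six models, while the non-critical edge 1 keeps its sample-to-sample spread.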

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an “exact” solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of max{δpq(tmax), ∆pq} comparing the CRN-0 solutions against the “exact” solutions amounts to 15.7 × 10^−3 mol L^−1, whereas it amounts to only 1.3 × 10^−3 mol L^−1 when comparing the ensemble-averaged CRN-B solutions against the “exact” solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm that constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase gained by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered 5 additional kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be ambiguous to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, i.e., highly noise-inducing, reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may be strongly dependent on the underlying graph structure, the distribution of activation free energies and rate constants, as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation. There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol^−1 with respect to the higher-energy intermediate) with the reaction time tmax, which we set to the half-life of a unimolecular reaction whose rate constant corresponds to a barrier height of 100 kJ mol^−1. This way, we avoid that the network reduction procedure merely deletes reactions because of activation barriers that are too high in energy. Note that, in actual chemical systems, several activation barriers may be much higher in energy and, hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol^−1, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the (x + 1)-th network layer, we first define all potentially reactive intermediate states. At that stage, we register N = N0 + ··· + Nx vertices. Since we already explored all possible reactions between the first N − Nx species, we will only consider the following potentially reactive intermediate states: Nx unimolecular intermediate states (leading to reactions of type A → P) defined by the Nx species of the x-th network layer, Nx homobimolecular intermediate states (type 2A → P), and Nx(Nx − 1)/2 + Nx(N − Nx) heterobimolecular intermediate states (type A + B → P). The first Nx(Nx − 1)/2 heterobimolecular states represent all possible nonredundant combinations of the Nx new species among each other, whereas the latter Nx(N − Nx) represent all possible combinations between the Nx new species and the N − Nx old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean µexp. The user can specify two different values for µexp, one for unimolecular reactions, µexp,uni, and one for bimolecular reactions, µexp,bi. The floor value of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter p(Vnew), ranging from zero to one. If p(Vnew) = 0, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if p(Vnew) = 1. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), 2L activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is obtained from a value sampled from a normal distribution with mean µ(A− − A+) and standard deviation σ(A− − A+), which is added to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained from a value sampled from a uniform distribution with bounds min(A‡ − A+/−) and max(A‡ − A+/−), which is added to the free energy of the higher-energy side of the edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix ΣA, as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation 2〈σA〉. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
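Steps 1 and 2 above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the function names are hypothetical, the mean values are placeholders, and mass conservation (which AutoNetGen enforces) is not modeled.

```python
import math
import random


def count_intermediate_states(n_new, n_total):
    """Count candidate states when n_new of n_total species belong to the newest layer."""
    n_old = n_total - n_new
    unimolecular = n_new                                          # A -> P
    homobimolecular = n_new                                       # 2A -> P
    heterobimolecular = n_new * (n_new - 1) // 2 + n_new * n_old  # A + B -> P
    return unimolecular, homobimolecular, heterobimolecular


def sample_channel_count(mean, rng):
    """Number of reaction channels: floor of an exponential draw with the given mean."""
    return math.floor(rng.expovariate(1.0 / mean))


def creates_new_vertex(p_new, rng):
    """True: form a new vertex; False: reconnect to a vertex of an earlier layer."""
    return rng.random() < p_new


rng = random.Random(42)  # arbitrary seed, for reproducibility only
states = count_intermediate_states(n_new=3, n_total=5)
channels = [sample_channel_count(5.0, rng) for _ in range(1000)]
```

For three new species out of five in total, the sketch yields 3 unimolecular, 3 homobimolecular, and 3 + 6 = 9 heterobimolecular intermediate states, in line with the counting formulas of step 1.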

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

\[
k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right), \tag{29}
\]

where h, kB, R, and T are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
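A minimal sketch of Eq. (29) follows; it is not taken from AutoNetGen, and the function names are hypothetical. It also evaluates the half-life of a unimolecular reaction with a 100 kJ mol^−1 barrier at 298.15 K, which comes out on the order of 3.7 × 10^4 s and is thus comparable to the value of tmax listed in Table 2.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K
H = 6.62607015e-34   # Planck constant in J s
R = 8.314462618      # gas constant in J/(mol K)


def eyring_rate_constant(activation_free_energy_kj_mol, temperature=298.15):
    """Rate constant from the Eyring equation for a barrier given in kJ/mol."""
    barrier = activation_free_energy_kj_mol * 1000.0  # convert to J/mol
    return K_B * temperature / H * math.exp(-barrier / (R * temperature))


def half_life(rate_constant):
    """Half-life of a first-order (unimolecular) process."""
    return math.log(2.0) / rate_constant


k_100 = eyring_rate_constant(100.0)   # in 1/s for a unimolecular step
t_half = half_life(k_100)             # in s
```

Because the rate constant depends exponentially on the barrier, a free-energy error of a few kJ mol^−1 already changes the rate constant severalfold at room temperature, which is why the uncertainty of activation free energies matters for the kinetics.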

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix ΣA, which fulfills the condition that its largest eigenvalue equals $\sigma_{A,\max}^{2}$. Defining $\sigma_{A,\max}^{2}$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bound by $\sigma_{A,\max}$.

1. Generate a random (N × N)-dimensional matrix P with elements that are uniformly sampled from [−0.5, +0.5]. N is the number of vertices in the network.

2. The multiplication of P with its transpose leads to an (N × N)-dimensional matrix Q = PP^⊤ that already is a covariance matrix [65] with eigenvalues Eii.

3. The covariance matrix ΣA results from a rescaling of Q,

\[
\Sigma_A = Q \, \frac{\langle \sigma_A \rangle^{2}}{\max_i \{E_{ii}\}},
\]

which implies that the largest eigenvalue of ΣA equals 〈σA〉^2.
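The three steps above translate almost directly into NumPy. The following sketch is an illustration rather than the authors' implementation; the function name and the example values (five vertices, a target of 2.09) are assumptions made for demonstration.

```python
import numpy as np


def random_covariance(n_vertices, sigma_target, rng):
    """Random covariance matrix whose largest eigenvalue equals sigma_target**2."""
    # Step 1: uniform random square matrix with entries in [-0.5, 0.5)
    p = rng.uniform(-0.5, 0.5, size=(n_vertices, n_vertices))
    # Step 2: P P^T is symmetric and positive semidefinite
    q = p @ p.T
    # Step 3: rescale so that the largest eigenvalue matches the target variance
    largest_eigenvalue = np.linalg.eigvalsh(q).max()
    return q * sigma_target**2 / largest_eigenvalue


rng = np.random.default_rng(0)        # arbitrary seed
cov = random_covariance(5, 2.09, rng)
eigenvalues = np.linalg.eigvalsh(cov)
standard_deviations = np.sqrt(np.diag(cov))
```

Since every diagonal element of a symmetric matrix is bounded by its largest eigenvalue, each individual standard deviation obtained this way cannot exceed the target value.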

Note that, in a practical setup, we expect the user to provide B + 1 k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and, hence, problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.
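To build such an intuition, one can draw correlated free-energy samples from a given covariance matrix along the lines of step 4 of the AutoNetGen procedure. The sketch below reflects our reading of that step under simplifying assumptions: each side of a reaction is a single vertex, barriers are measured from the higher-energy side, and all names and numbers are hypothetical.

```python
import numpy as np


def sample_free_energies(vertex_nominal, edge_nominal, edge_ends, cov, sigma_mean,
                         n_samples, rng):
    """Draw correlated vertex energies and matching edge (transition-state) energies."""
    vertex_nominal = np.asarray(vertex_nominal, dtype=float)
    edge_nominal = np.asarray(edge_nominal, dtype=float)
    ends = np.asarray(edge_ends)
    # Correlated vertex samples, shape (n_samples, number of vertices)
    vertices = rng.multivariate_normal(vertex_nominal, cov, size=n_samples)
    shifts = vertices - vertex_nominal
    # Mean shift of the two connected states plus independent noise with std 2 * sigma_mean
    mean_shift = 0.5 * (shifts[:, ends[:, 0]] + shifts[:, ends[:, 1]])
    noise = rng.normal(0.0, 2.0 * sigma_mean, size=mean_shift.shape)
    edges = edge_nominal + mean_shift + noise
    # A negative barrier (relative to the higher-energy side) is replaced by its absolute value
    higher = np.maximum(vertices[:, ends[:, 0]], vertices[:, ends[:, 1]])
    edges = higher + np.abs(edges - higher)
    return vertices, edges


rng = np.random.default_rng(7)                      # arbitrary seed
cov = np.array([[4.0, 2.0, 0.0],
                [2.0, 4.0, 2.0],
                [0.0, 2.0, 4.0]])                   # toy positive-definite covariance
vertices, edges = sample_free_energies(
    vertex_nominal=[0.0, -20.0, -35.0],
    edge_nominal=[60.0, 45.0],
    edge_ends=[(0, 1), (1, 2)],
    cov=cov, sigma_mean=2.0, n_samples=5, rng=rng,
)
```

Each row of `vertices` and `edges` defines one kinetic model; converting the resulting barriers to rate constants gives one k-vector per sample.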


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                CRN-X          CRN-0          CRN-B

AutoNetGen
B + 1                          25             1              6
T (K)                          298.15         298.15         298.15
µ(A− − A+) (kJ mol^−1)         −25            −25            −25
σ(A− − A+) (kJ mol^−1)         50             50             50
min(A‡ − A+/−) (kJ mol^−1)     0              0              0
max(A‡ − A+/−) (kJ mol^−1)     100            100            100
〈σA〉 (kJ mol^−1)               2.09           2.09           2.09
Lmax                           125            100            100
µexp(Nuni)                     5              5              5
µexp(Nbi)                      2              2              2
p(Vnew)                        0.5            0.1            0.1

KiNetX
tmax (s)                       36954          36954          36954
εy (mol L^−1)                  0.01           0.01           0.01
Gmin (mol L^−1)                1.0 × 10^−3    1.0 × 10^−5    1.0 × 10^−5
U                              1000           1000           1000
γ                              0              1              1
C                              5              1              5


[42] Shannon R Glowacki D R A Simple ldquoBoxed Molecular Kineticsrdquo Approach ToAccelerate Rare Events in the Stochastic Kinetic Master Equation J Phys ChemA 2018 122 1531ndash1541

[43] Glowacki D R Lockhart J Blitz M A Klippenstein S J Pilling M JRobertson S H Seakins P W Interception of Excited Vibrational QuantumStates by O2 in Atmospheric Association Reactions Science 2012 337 1066ndash1069

[44] Glowacki D R Liang C H Marsden S P Harvey J N Pilling M J AlkeneHydroboration Hot Intermediates That React While They Are Cooling J AmChem Soc 2010 132 13621ndash13623

[45] Goldman L M Glowacki D R Carpenter B K Nonstatistical Dynamics inUnlikely Places [15] Hydrogen Migration in Chemically Activated CyclopentadieneJ Am Chem Soc 2011 133 5312ndash5318

[46] ldquoSCINE Software for Chemical Interaction Networksrdquo ETH Zuumlrich Zuumlrich Switzer-land httpwwwreiherethzchsoftwarescinehtml

[47] Grimmett G Stirzaker D Probability and Random Processes Oxford UniversityPress New York 2nd ed 1992

[48] Kirk P D W Stumpf M P H Gaussian Process Regression BootstrappingExploring the Effects of Uncertainty in Time Course Data Bioinformatics 200925 1300ndash1306

[49] Sutton J E Guo W Katsoulakis M A Vlachos D G Effects of Corre-lated Parameters and Uncertainty in Electronic-Structure-Based Chemical KineticModelling Nat Chem 2016 8 331ndash337

[50] Ramakrishnan R Dral P O Rupp M von Lilienfeld O A Quantum Chem-istry Structures and Properties of 134 Kilo Molecules Sci Data 2014 1 140022

34

[51] Pernot P Cailliez F A Critical Review of Statistical CalibrationPrediction Mod-els Handling Data Inconsistency and Model Inadequacy AIChE J 2017 63 4642ndash4665

[52] Proppe J Reiher M Reliable Estimation of Prediction Uncertainty for Physico-chemical Property Models J Chem Theory Comput 2017 13 3297ndash3317

[53] Eyring H The Activated Complex in Chemical Reactions J Chem Phys 19353 107ndash115

[54] Meijer E Matrix Algebra for Higher Order Moments Linear Algebra Appl 2005410 112ndash134

[55] ldquoMATLABrdquo Release 2018a The MathWorks Inc Natick MA United States 2018

[56] Shampine L Reichelt M The MATLAB ODE Suite SIAM J Sci Comput 199718 1ndash22

[57] Morris M D Factorial Sampling Plans for Preliminary Computational ExperimentsTechnometrics 1991 33 161ndash174

[58] Besora M Maseras F Microkinetic Modeling in Homogeneous Catalysis WIREsComput Mol Sci 2018 e1372 DOI 101002wcms1372

[59] Rasmussen C E Williams C K I Gaussian Processes for Machine LearningThe MIT Press Cambridge MA 2006

[60] Susnow R G Dean A M Green W H Peczak P Broadbelt L J Rate-BasedConstruction of Kinetic Models for Complex Systems J Phys Chem A 1997 1013731ndash3740

[61] Han K Green W H West R H On-the-Fly Pruning for Rate-Based ReactionMechanism Generation Comput Chem Eng 2017 100 1ndash8

[62] Song J Building Robust Chemical Reaction Mechanisms Next Generation of Au-tomatic Model Construction Software Doctoral thesis Massachusetts Institute ofTechnology 2004

[63] Jalan A West R H Green W H An Extensible Framework for CapturingSolvent Effects in Computer Generated Kinetic Models J Phys Chem B 2013117 2955ndash2970

[64] Grenda J M Androulakis I P Dean A M Green W H Application ofComputational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysisof Methane Ind Eng Chem Res 2003 42 1000ndash1010

35

[65] Horn R A Johnson C R Matrix Analysis Cambridge University Press Cam-bridge 1990

36

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 21 Kinetic Modeling from a Network Perspective
    • 22 Uncertainty Quantification in Kinetic Modeling
    • 23 Overview of the KiNetX Meta-Algorithm
    • 24 The KiNetX Workflow
      • 241 Uncertainty Propagation
      • 242 Network Reduction
      • 243 Sensitivity Analysis
        • 25 KiNetX-Guided Network Exploration
        • 26 Chemistry-Mimicking Networks from AutoNetGen
          • 3 Results and Discussion
            • 31 Exemplary KiNetX Workflow
            • 32 Relevance of Uncertainty Propagation
              • 4 Conclusions and Outlook
Page 24: Mechanism Deduction from Noisy Chemical Reaction Networks

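Table 1: Mean values and standard deviations of properties obtained from KiNetX-based analyses of the CRN-0 and CRN-B cases.

| Property $\zeta$ | $\mu(\zeta_\text{CRN-0})$ | $\mu(\zeta_\text{CRN-B})$ | $\sigma(\zeta_\text{CRN-0})$ | $\sigma(\zeta_\text{CRN-B})$ |
|---|---|---|---|---|
| exploration steps | 12 | 14 | 7 | 7 |
| critical reactions | 5 | 10 | 4 | 7 |
| $L_\text{expl}$ | 54 | 60 | 26 | 27 |
| $N_\text{expl}$ | 29 | 31 | 11 | 11 |
| $L_\text{red}$ | 30 | 37 | 18 | 21 |
| $N_\text{red}$ | 17 | 20 | 8 | 8 |
| $\delta_B^\text{expl}(t_\text{max})$ / mol L$^{-1}$ | n/a | 0.14 | n/a | 0.13 |
| $\Delta_B^\text{expl}$ / mol L$^{-1}$ | n/a | 0.15 | n/a | 0.13 |
| $\delta_B^\text{refined}(t_\text{max})$ / mol L$^{-1}$ | n/a | < 0.01 | n/a | 0.01 |
| $\Delta_B^\text{refined}$ / mol L$^{-1}$ | n/a | < 0.01 | n/a | 0.01 |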

which is very small compared to $\varepsilon_y = 0.01$ mol L$^{-1}$. This finding confirms the limitations of the rate-based algorithm discussed in Section 2.5. There are cases in which the rate threshold would need to be chosen so small that nothing is gained by coupling a kinetic analysis to network exploration. However, one does not know a priori when this situation occurs.

DFA-based network reduction with a flux threshold of $G_\text{min} = 1.0 \times 10^{-5}$ mol L$^{-1}$ was successful for 83 networks in the CRN-B case, whereas it was successful for only 78 networks in the CRN-0 case. Similar to our argument in the last paragraph, the consideration of several solutions increases the likelihood of identifying kinetically relevant network elements. Therefore, we also find that the resulting number of vertices and edges is larger in the CRN-B case after network reduction (including LBA-based reduction). Note that the reduction percentage of approximately 40% (75% compared to the networks explored without kinetic guidance) is likely to become larger for an increasing size of the original network. To test this hypothesis, we chose the same 100 random seeds employed for the generation of the 100-edge networks but increased the number of edges to 500. Neglecting kinetic guidance during exploration and considering only DFA-based reduction, we find an average increase in the reduction of network elements from about 75% to 90%, which confirms our hypothesis.

We find that, on average, 10 reactions per network are critical in the CRN-B case, corresponding to 28% of the reactions. Analogously to the argument provided with regard to the possible degree of network reduction, we expect the number of critical reactions identified for an ensemble of kinetic models to be larger than for a single model. Indeed, only 5 reactions per network (on average) were found to be critical in the CRN-0 case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of $\max\{\delta_B(t_\text{max}), \Delta_B\}$ from 0.15 mol L$^{-1}$ to < 0.01 mol L$^{-1}$ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all $B + 1 = 6$ samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
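The refinement by averaging can be illustrated with a minimal Python sketch. It assumes a toy reaction A <-> B with hypothetical absolute free energies for the two vertices and the connecting edge (labeled "TS") in each of six sampled models. The numbers and the function names `refine_absolute` and `barriers` are illustrative assumptions and are not taken from KiNetX.

```python
from statistics import mean

# Hypothetical absolute free energies (kJ/mol) of a toy reaction A <-> B in
# B + 1 = 6 sampled kinetic models; "TS" is the edge (transition state).
samples = [
    {"A": 0.0, "B": -20.0, "TS": 60.0},
    {"A": 1.0, "B": -22.0, "TS": 62.0},
    {"A": -1.0, "B": -18.0, "TS": 58.0},
    {"A": 2.0, "B": -21.0, "TS": 61.0},
    {"A": -2.0, "B": -19.0, "TS": 59.0},
    {"A": 0.0, "B": -20.0, "TS": 60.0},
]


def refine_absolute(models):
    """Average every absolute free energy (vertices and edge) over all models."""
    return {key: mean(m[key] for m in models) for key in models[0]}


def barriers(model):
    """Forward and backward activation free energies of A <-> B."""
    return model["TS"] - model["A"], model["TS"] - model["B"]


refined = refine_absolute(samples)
forward, backward = barriers(refined)
reaction_free_energy = refined["B"] - refined["A"]

# The barrier difference equals the reaction free energy by construction,
# so the refined model stays consistent with microscopic reversibility.
assert abs((forward - backward) - reaction_free_energy) < 1e-9
print(forward, backward, reaction_free_energy)
```

In a full network, the same consistency holds for every cycle because each vertex carries a single absolute free energy per model.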

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an "exact" solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of $\max\{\delta_{pq}(t_\text{max}), \Delta_{pq}\}$ comparing the CRN-0 solutions against the "exact" solutions amounts to $15.7 \times 10^{-3}$ mol L$^{-1}$, whereas it amounts to only $1.3 \times 10^{-3}$ mol L$^{-1}$ when comparing the ensemble-averaged CRN-B solutions against the "exact" solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX on a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the gain in reliability from incorporating uncertainty quantification into the kinetic modeling framework, we faced the challenge of generating a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (±6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered an additional number of 5 kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be ambiguous to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties, such as the percentage of critical (highly noise-inducing) reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps, as they may be strongly dependent on the underlying graph structure, the distribution of activation free energies (or rate constants), as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation.

There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol$^{-1}$ with respect to the higher-energy intermediate) with the reaction time $t_\text{max}$, which we set to the half-life of a unimolecular reaction with a barrier height of 100 kJ mol$^{-1}$. This way, we prevent the network reduction procedure from merely deleting reactions because of activation barriers that are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy, and hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol$^{-1}$, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants, including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the $(x + 1)$-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the $x$-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_\text{exp}$. The user can specify two different values for $\mu_\text{exp}$: one for unimolecular reactions, $\mu_\text{exp,uni}$, and one for bimolecular reactions, $\mu_\text{exp,bi}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_\text{new})$, ranging from zero to one. If $p(V_\text{new}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_\text{new}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), $2L$ activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (comprising either one or two vertices) is obtained by sampling a value from a normal distribution with mean $\mu(A^- - A^+)$ and standard deviation $\sigma(A^- - A^+)$ and adding it to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained by sampling a value from a uniform distribution with bounds $\min(A^\ddagger - A^\pm)$ and $\max(A^\ddagger - A^\pm)$ and adding it to the free energy of the higher-energy side of that edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
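The bookkeeping of steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AutoNetGen implementation: the species labels, the function names `reactive_states` and `sample_channel_count`, and the mean values 5 and 2 passed to the sampler are hypothetical choices made for the example.

```python
import math
import random
from itertools import combinations


def reactive_states(old_species, new_species):
    """Enumerate potentially reactive intermediate states for the next layer."""
    uni = [(a,) for a in new_species]                  # type A -> P
    homo = [(a, a) for a in new_species]               # type 2A -> P
    hetero_new = list(combinations(new_species, 2))    # new + new, nonredundant
    hetero_old = [(a, b) for a in new_species for b in old_species]  # new + old
    return uni, homo, hetero_new + hetero_old


def sample_channel_count(mean_channels, rng):
    """Floor of a draw from an exponential distribution with the given mean."""
    return math.floor(rng.expovariate(1.0 / mean_channels))


old = ["S1", "S2", "S3"]   # N - N_x = 3 species from earlier layers
new = ["S4", "S5"]         # N_x = 2 species of the current layer
uni, homo, hetero = reactive_states(old, new)

# The counts agree with the expressions given in step 1.
n_x, n = len(new), len(old) + len(new)
assert len(uni) == n_x
assert len(homo) == n_x
assert len(hetero) == n_x * (n_x - 1) // 2 + n_x * (n - n_x)

# Step 2: number of reaction channels per state (illustrative means).
rng = random.Random(0)
channels = {
    state: sample_channel_count(5.0 if len(state) == 1 else 2.0, rng)
    for state in uni + homo + hetero
}
```

A channel count of zero simply means that no reaction is explored from that intermediate state.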

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$k_l^{\pm} = \frac{k_\mathrm{B} T}{h} \exp\left(-\frac{A_l^{\ddagger} - A_l^{\pm}}{RT}\right), \tag{29}$$

where $h$, $k_\mathrm{B}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
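As a worked example of Eq. (29), the following Python sketch evaluates the rate constant for a unimolecular step with a barrier of 100 kJ mol$^{-1}$ at 298.15 K and converts it to a half-life. The function name `eyring_rate_constant` and the use of current CODATA constants are assumptions of this sketch. With these constants, the half-life comes out within about one percent of the value of $t_\text{max}$ listed in Table 2.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant in J/K
H = 6.62607015e-34    # Planck constant in J s
R = 8.314462618       # gas constant in J/(mol K)


def eyring_rate_constant(barrier_kj_per_mol, temperature=298.15):
    """Unimolecular rate constant (1/s) from an activation free energy."""
    exponent = -barrier_kj_per_mol * 1.0e3 / (R * temperature)
    return (K_B * temperature / H) * math.exp(exponent)


k = eyring_rate_constant(100.0)
half_life = math.log(2.0) / k   # first-order half-life in seconds
print(f"k = {k:.3e} 1/s, half-life = {half_life:.0f} s")
```

Only the unimolecular case is covered here; the sketch makes no statement about how bimolecular rate constants are normalized.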

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are strictly nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$ which fulfills the condition that its largest eigenvalue equals $\sigma_{A,\text{max}}^2$. Defining $\sigma_{A,\text{max}}^2$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\text{max}}$.

1. Generate a random $(N \times N)$-dimensional matrix $P$ with elements that are uniformly sampled from $[-0.5, +0.5]$. $N$ is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an $(N \times N)$-dimensional matrix $Q = P P^\top$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$\Sigma_A = Q \, \frac{\langle\sigma_A\rangle^2}{\max_i E_{ii}},$$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
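The three steps above can be sketched in plain Python. The sketch is an illustration under stated assumptions rather than the authors' code: the largest eigenvalue is estimated by power iteration (a technique swapped in here to avoid external libraries), the function names are hypothetical, and the matrix size 5 and the scale 2.09 are example inputs.

```python
import random


def times_transpose(p):
    """Return Q = P P^T for a square matrix stored as a list of rows."""
    n = len(p)
    return [[sum(p[i][k] * p[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def largest_eigenvalue(q, iterations=5000):
    """Estimate the largest eigenvalue of a symmetric PSD matrix (power iteration)."""
    n = len(q)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(q[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    qv = [sum(q[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * qv[i] for i in range(n))   # Rayleigh quotient


def random_covariance(n, sigma_max, rng):
    """Random covariance matrix whose largest eigenvalue is sigma_max**2."""
    p = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]  # step 1
    q = times_transpose(p)                                              # step 2
    scale = sigma_max ** 2 / largest_eigenvalue(q)                      # step 3
    return [[scale * value for value in row] for row in q]


rng = random.Random(42)
sigma_max = 2.09                      # example scale in kJ/mol
cov = random_covariance(5, sigma_max, rng)
variances = [cov[i][i] for i in range(5)]
```

Because every diagonal element of a positive-semidefinite matrix is bounded by its largest eigenvalue, each entry of `variances` is at most `sigma_max**2`, which is the bound on the free-energy uncertainties stated in the text.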

Note that, in a practical setup, we expect the user to provide $B + 1$ k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.

Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required as input for AutoNetGen and KiNetX and contains the values chosen for the three cases discussed in this work.

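Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

| Input parameter | CRN-X | CRN-0 | CRN-B |
|---|---|---|---|
| AutoNetGen | | | |
| $B + 1$ | 25 | 1 | 6 |
| $T$ / K | 298.15 | 298.15 | 298.15 |
| $\mu(A^- - A^+)$ / (kJ mol$^{-1}$) | −25 | −25 | −25 |
| $\sigma(A^- - A^+)$ / (kJ mol$^{-1}$) | 50 | 50 | 50 |
| $\min(A^\ddagger - A^\pm)$ / (kJ mol$^{-1}$) | 0 | 0 | 0 |
| $\max(A^\ddagger - A^\pm)$ / (kJ mol$^{-1}$) | 100 | 100 | 100 |
| $\langle\sigma_A\rangle$ / (kJ mol$^{-1}$) | 2.09 | 2.09 | 2.09 |
| $L_\text{max}$ | 125 | 100 | 100 |
| $\mu_\text{exp}(N_\text{uni})$ | 5 | 5 | 5 |
| $\mu_\text{exp}(N_\text{bi})$ | 2 | 2 | 2 |
| $p(V_\text{new})$ | 0.5 | 0.1 | 0.1 |
| KiNetX | | | |
| $t_\text{max}$ / s | 36954 | 36954 | 36954 |
| $\varepsilon_y$ / (mol L$^{-1}$) | 0.01 | 0.01 | 0.01 |
| $G_\text{min}$ / (mol L$^{-1}$) | $1.0 \times 10^{-3}$ | $1.0 \times 10^{-5}$ | $1.0 \times 10^{-5}$ |
| $U$ | 1000 | 1000 | 1000 |
| $\gamma$ | 0 | 1 | 1 |
| $C$ | 5 | 1 | 5 |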

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.

[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package", Technical Report SAND–80-8003, Sandia National Labs., Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics", Technical Report SAND-89-8009, Sandia National Labs., Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics", Technical Report SAND-96-8216, Sandia National Labs., Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI—a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn R A Johnson C R Matrix Analysis Cambridge University Press Cam-bridge 1990

36


case. After global sensitivity analysis and refinement of activation free energies, we find an average decrease of $\max \delta_{B}(t_{\max})_{\Delta B}$ from 0.15 mol L$^{-1}$ to < 0.01 mol L$^{-1}$ for the CRN-B case, which highlights the success of our extended Morris screening approach. The refinement of activation free energies was mimicked by taking the mean value of absolute free energies (for both vertices and edges) over all $B + 1 = 6$ samples for the critical reactions. This way, the resulting activation free energies of (at least) the critical reactions are identical for all kinetic models. It is important to manipulate absolute instead of relative free energies. Otherwise, one may induce unphysical scenarios by violating the necessary condition of microscopic reversibility.
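The difference between manipulating absolute and relative free energies can be illustrated with a minimal sketch. It is not the KiNetX implementation: the toy cycle A → B → C → A, all numbers, and the function names (`refine_absolute`, `refine_relative`, `cycle_imbalance`) are assumptions made for illustration only. For a closed cycle, the forward minus backward barriers must sum to zero; averaging the absolute energies of a critical reaction preserves this, whereas averaging only its barriers generally does not.

```python
from statistics import mean

# Made-up absolute free energies (kJ/mol) for a toy cycle A -> B -> C -> A,
# given for B + 1 = 3 hypothetical kinetic models.
VERTEX_SAMPLES = [
    {"A": 0.0, "B": -10.0, "C": -4.0},
    {"A": 1.5, "B": -12.0, "C": -3.0},
    {"A": -0.5, "B": -9.0, "C": -6.5},
]
EDGE_SAMPLES = [  # absolute free energies of the transition states
    {("A", "B"): 60.0, ("B", "C"): 45.0, ("C", "A"): 70.0},
    {("A", "B"): 63.0, ("B", "C"): 41.0, ("C", "A"): 72.5},
    {("A", "B"): 58.5, ("B", "C"): 47.0, ("C", "A"): 69.0},
]
CRITICAL = [("A", "B")]  # reactions selected for refinement (assumed)


def barriers(vertices, edges):
    """(forward, backward) activation free energy for every edge."""
    return {(r, p): (ts - vertices[r], ts - vertices[p])
            for (r, p), ts in edges.items()}


def cycle_imbalance(barrier_map):
    """Sum of forward minus backward barriers around the closed cycle."""
    return sum(fwd - bwd for fwd, bwd in barrier_map.values())


def refine_absolute(vertex_samples, edge_samples, critical):
    """Overwrite absolute energies of critical reactions by ensemble means."""
    refined = []
    for vertices, edges in zip(vertex_samples, edge_samples):
        vertices, edges = dict(vertices), dict(edges)
        for r, p in critical:
            vertices[r] = mean(v[r] for v in vertex_samples)
            vertices[p] = mean(v[p] for v in vertex_samples)
            edges[(r, p)] = mean(e[(r, p)] for e in edge_samples)
        refined.append(barriers(vertices, edges))
    return refined


def refine_relative(vertex_samples, edge_samples, critical):
    """Overwrite only the barriers of critical reactions by ensemble means."""
    models = [barriers(v, e) for v, e in zip(vertex_samples, edge_samples)]
    for rxn in critical:
        fwd = mean(m[rxn][0] for m in models)
        bwd = mean(m[rxn][1] for m in models)
        for m in models:
            m[rxn] = (fwd, bwd)
    return models


absolute = [cycle_imbalance(m)
            for m in refine_absolute(VERTEX_SAMPLES, EDGE_SAMPLES, CRITICAL)]
relative = [cycle_imbalance(m)
            for m in refine_relative(VERTEX_SAMPLES, EDGE_SAMPLES, CRITICAL)]
print("absolute refinement:", absolute)
print("relative refinement:", relative)
```

In this toy setting, the cycle imbalance vanishes (up to rounding) for every model after absolute refinement, while the relative variant leaves a nonzero imbalance, i.e., a thermodynamically inconsistent set of barriers.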

Finally, we examined the reliability of the solutions obtained for CRN-0 and CRN-B after a series of kinetics-guided exploration, network reduction, and free-energy refinement. For this purpose, we generated an “exact” solution obtained from taking the mean value of absolute free energies over all reactions of the full network explored without kinetic guidance. The average value of $\max \delta_{pq}(t_{\max})_{\Delta pq}$ comparing the CRN-0 solutions against the “exact” solutions amounts to 15.7 × 10$^{-3}$ mol L$^{-1}$, whereas it amounts to only 1.3 × 10$^{-3}$ mol L$^{-1}$ when comparing the ensemble-averaged CRN-B solutions against the “exact” solutions.

4 Conclusions and Outlook

In this paper, we have demonstrated the strong capability of advanced kinetic modeling techniques for the deduction of product distributions, reaction mechanisms, and possibly other properties of chemical systems from reaction data equipped with correlated uncertainty information. Our approach is designed (but not limited) to be fed with raw data from quantum chemical calculations, as we aim to develop a flexible kinetic modeling framework rooted in the first principles of quantum mechanics. For this purpose, we developed the meta-algorithm KiNetX, which carries out kinetic analyses of complex chemical reaction networks in a fully automated manner, including guidance for network exploration, hierarchical network reduction, and global sensitivity analysis. The key feature of KiNetX is that the correlated uncertainties of the model parameters (activation free energies or rate constants) are rigorously propagated through all steps of the kinetic modeling workflow.

We demonstrated the entire workflow of KiNetX for a reaction network generated with AutoNetGen, an algorithm which constructs artificial reaction networks by encoding chemical logic into their underlying graph structure. Our results show that KiNetX can systematically interpret noisy concentration data to guide the exploration of reaction space and identify regions in a network that require more accurate free-energy data, without the need to carry out high-accuracy quantum chemical calculations for all species considered. Furthermore, we showed that the dominant reaction pathways can be reliably deduced as a result of these efforts. To study the reliability increase by incorporating uncertainty quantification into the kinetic modeling framework, we were faced with the challenge to generate a multitude of reaction networks covering a wide chemical spectrum, which is very time-consuming given the number of quantum chemical calculations required for this purpose. With the development of AutoNetGen, we were able to examine a large number of distinct chemistry-like scenarios in a short time. Here, we considered 100 networks consisting of 100 edges and 45 (± 6) vertices and distinguished between two cases. In the first case, we only considered a single kinetic model per network. In the second case, we considered five additional kinetic models per network. Our results suggest, despite the small number of samples considered in the second case, that the rigorous propagation of uncertainty through all steps of a kinetic modeling study can significantly increase the reliability of conclusions, here by a factor of 10.

While the findings are appealing, we understand that the use of chemistry-mimicking reaction networks requires a careful analysis to ensure that important network properties met in actual chemical scenarios are captured. Otherwise, it may be ambiguous to generalize our conclusions drawn from these artificial cases. At the same time, it is difficult to draw general conclusions about certain network properties (such as the percentage of critical, i.e., highly noise-inducing, reactions, the potential degree of network reduction, or the number of network layers required to correctly account for all kinetically relevant reaction steps), as they may be strongly dependent on the underlying graph structure and on the distribution of activation free energies and rate constants as well as their correlated uncertainties. To our knowledge, the literature on solution chemistry (or chemistry in general) does not offer enough data in this direction to reliably evaluate this dependency. Recent results from our group suggest that free-energy uncertainty may induce almost arbitrary magnitudes of concentration noise [26]. We developed AutoNetGen to offer a more general (i.e., statistics-based) perspective on this important issue despite a poor data situation.

There are a few arguments in favor of our artificial chemistry-mimicking reaction networks. First, AutoNetGen is, in principle, able to generate every network graph that corresponds to a specific chemical reaction characterized by elementary processes with a molecularity smaller than three. Second, we coordinated the range of activation free energies (0–100 kJ mol$^{-1}$ with respect to the higher-energy intermediate) with the reaction time $t_{\max}$, which we set to the half-life of a unimolecular reaction with a barrier height of 100 kJ mol$^{-1}$. This way, we prevent the network reduction procedure from deleting reactions merely because their activation barriers are too high in energy. Note that in actual chemical systems, several activation barriers may be much higher in energy and, hence, we expect the degree of network reduction to be rather small compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol$^{-1}$, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the $(x + 1)$-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the $x$-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_{\exp}$. The user can specify two different values for $\mu_{\exp}$, one for unimolecular reactions, $\mu_{\exp,\mathrm{uni}}$, and one for bimolecular reactions, $\mu_{\exp,\mathrm{bi}}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_{\mathrm{new}})$, ranging from zero to one. If $p(V_{\mathrm{new}}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_{\mathrm{new}}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), $2L$ activation free energies will be calculated (a forward and a backward one for each of the $L$ edges). The absolute free energy of the product (or right-hand) side of an edge (which comprises either one or two vertices) is obtained by sampling from a normal distribution with mean $\mu(A^{-} - A^{+})$ and standard deviation $\sigma(A^{-} - A^{+})$ and adding the sampled value to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is obtained by sampling from a uniform distribution with bounds $\min(A^{\ddagger} - A^{+/-})$ and $\max(A^{\ddagger} - A^{+/-})$ and adding the sampled value to the higher-energy side of the edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
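Steps 1 and 2 of this procedure can be sketched in a few lines. The following is an illustrative approximation under stated assumptions, not the AutoNetGen code: the function names (`reactive_states`, `sample_channel_count`, `creates_new_vertex`) and the species labels are hypothetical, and mass balance as well as the actual graph construction are omitted.

```python
import math
import random
from itertools import combinations


def reactive_states(old_species, new_species):
    """Enumerate intermediate states that involve at least one new species."""
    uni = [(a,) for a in new_species]                    # type A -> P
    homo = [(a, a) for a in new_species]                 # type 2A -> P
    hetero = list(combinations(new_species, 2))          # new + new
    hetero += [(a, b) for a in new_species for b in old_species]  # new + old
    return uni, homo, hetero


def sample_channel_count(mean_channels, rng):
    """Floor of an exponentially distributed number with the given mean."""
    return math.floor(rng.expovariate(1.0 / mean_channels))


def creates_new_vertex(p_new, rng):
    """Decide between a new vertex (True) and a link to an earlier layer."""
    return rng.random() < p_new


rng = random.Random(1)          # fixed seed, only to make the sketch repeatable
old = ["S1", "S2"]              # N - N_x = 2 species of earlier layers
new = ["S3", "S4", "S5"]        # N_x = 3 species of the current layer
uni, homo, hetero = reactive_states(old, new)
print(len(uni), len(homo), len(hetero))

# Number of reaction channels per state; the means 5 and 2 are example values.
channels = {state: sample_channel_count(5.0, rng) for state in uni}
channels.update({state: sample_channel_count(2.0, rng) for state in homo + hetero})
print(channels)
```

For $N_x = 3$ and $N = 5$, the enumeration yields three unimolecular, three homobimolecular, and $3 \cdot 2/2 + 3 \cdot 2 = 9$ heterobimolecular states, in line with the counting formulas of step 1.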

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$k_l^{+/-} = \frac{k_{\mathrm{B}} T}{h} \exp\left(-\frac{A_l^{\ddagger} - A_l^{+/-}}{RT}\right), \tag{29}$$

where $h$, $k_{\mathrm{B}}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
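As a worked example of Eq. (29), the sketch below evaluates the rate constant for a barrier of 100 kJ mol$^{-1}$ at 298.15 K and the corresponding first-order half-life, $\ln 2 / k$. The function names are hypothetical and the physical constants are the CODATA 2018 values, which may differ slightly from those used by the authors.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant in J/K
H = 6.62607015e-34    # Planck constant in J s
R = 8.314462618       # gas constant in J/(mol K)


def eyring_rate_constant(barrier_kj_mol, temperature_k):
    """Rate constant from Eq. (29); unit 1/s for a unimolecular step."""
    exponent = -barrier_kj_mol * 1.0e3 / (R * temperature_k)
    return (K_B * temperature_k / H) * math.exp(exponent)


def half_life(rate_constant):
    """Half-life of a first-order process, ln(2) / k."""
    return math.log(2.0) / rate_constant


k = eyring_rate_constant(100.0, 298.15)
t_half = half_life(k)
print(f"k = {k:.3e} s^-1, half-life = {t_half:.0f} s")
```

The resulting half-life is on the order of 3.7 × 10$^{4}$ s, which agrees to within a fraction of a percent with the value of $t_{\max}$ listed in Table 2; the remaining deviation presumably stems from the precision of the constants.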

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$ which fulfills the condition that its largest eigenvalue equals $\sigma^2_{A,\max}$. Defining $\sigma^2_{A,\max}$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\max}$.

1. Generate a random $(N \times N)$-dimensional matrix $P$ with elements that are uniformly sampled from $[-0.5, +0.5]$. $N$ is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an $(N \times N)$-dimensional matrix $Q = PP^{\top}$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$\Sigma_A = Q \, \frac{\langle\sigma_A\rangle^2}{\max\{E_{ii}\}},$$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
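The three steps above can be sketched with the Python standard library alone. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, and the largest eigenvalue is estimated by plain power iteration, whereas a symmetric eigensolver from a numerical library would normally be used.

```python
import math
import random


def gram_matrix(p):
    """Return Q = P P^T for a square matrix stored as a list of rows."""
    n = len(p)
    return [[sum(p[i][k] * p[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def largest_eigenvalue(q, iterations=5000):
    """Power iteration followed by a Rayleigh quotient (symmetric PSD input)."""
    n = len(q)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(q[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    qv = [sum(q[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * qv[i] for i in range(n))


def random_covariance(n_vertices, sigma, rng):
    """Random covariance matrix whose largest eigenvalue equals sigma**2."""
    p = [[rng.uniform(-0.5, 0.5) for _ in range(n_vertices)]
         for _ in range(n_vertices)]                     # step 1
    q = gram_matrix(p)                                   # step 2
    scale = sigma ** 2 / largest_eigenvalue(q)           # step 3
    return [[scale * x for x in row] for row in q]


rng = random.Random(0)   # fixed seed, only to make the sketch repeatable
cov = random_covariance(6, 2.09, rng)   # six vertices, sigma in kJ/mol
print(largest_eigenvalue(cov), 2.09 ** 2)
print([round(math.sqrt(cov[i][i]), 3) for i in range(6)])
```

Because every diagonal element of a positive-semidefinite matrix is bounded by its largest eigenvalue, the standard deviations printed in the last line cannot exceed the chosen `sigma`.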

Note that in a practical setup, we expect the user to provide $B + 1$ k-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and, hence, problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.
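To indicate how an ensemble of $B + 1$ free-energy sets might be drawn from such a covariance matrix, the sketch below samples correlated vertex free energies via a Cholesky factor and perturbs one edge according to step 4 of the AutoNetGen procedure. Everything specific here is an assumption for illustration: the small covariance matrix, the nominal energies, the function names, the restriction to a unimolecular reaction, and the use of a Cholesky factorization (which requires a positive-definite matrix).

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def sample_vertex_energies(nominal, low, rng):
    """One correlated draw: nominal + L z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in nominal]
    return [a + sum(low[i][k] * z[k] for k in range(i + 1))
            for i, a in enumerate(nominal)]


def sample_edge_energy(edge_nominal, shift_left, shift_right, sigma_mean, rng):
    """Shift the edge by the mean vertex change plus independent noise."""
    noise = rng.gauss(0.0, 2.0 * sigma_mean)
    return edge_nominal + 0.5 * (shift_left + shift_right) + noise


rng = random.Random(42)                  # fixed seed for repeatability
nominal = [0.0, -10.0, -4.0]             # made-up vertex energies (kJ/mol)
cov = [[4.0, 1.0, 0.5],                  # made-up covariance matrix
       [1.0, 3.0, 0.2],
       [0.5, 0.2, 2.0]]
edge_nominal = 60.0                      # transition state of vertex 0 -> 1
low = cholesky(cov)

ensemble = []
for _ in range(6):                       # B + 1 = 6 kinetic models
    energies = sample_vertex_energies(nominal, low, rng)
    shifts = [e - a for e, a in zip(energies, nominal)]
    edge = sample_edge_energy(edge_nominal, shifts[0], shifts[1], 2.09, rng)
    forward = abs(edge - energies[0])    # negative barriers are mirrored
    backward = abs(edge - energies[1])
    ensemble.append((forward, backward))

for forward, backward in ensemble:
    print(f"{forward:7.2f} {backward:7.2f}")
```

Each tuple in `ensemble` holds the forward and backward activation free energy of the same reaction in one kinetic model; feeding them into the Eyring equation would give one pair of rate constants per model.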


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                                           CRN-X          CRN-0          CRN-B

AutoNetGen
  $B + 1$                                                 25             1              6
  $T$ (K)                                                 298.15         298.15         298.15
  $\mu(A^{-} - A^{+})$ (kJ mol$^{-1}$)                    −25            −25            −25
  $\sigma(A^{-} - A^{+})$ (kJ mol$^{-1}$)                 50             50             50
  $\min(A^{\ddagger} - A^{+/-})$ (kJ mol$^{-1}$)          0              0              0
  $\max(A^{\ddagger} - A^{+/-})$ (kJ mol$^{-1}$)          100            100            100
  $\langle\sigma_A\rangle$ (kJ mol$^{-1}$)                2.09           2.09           2.09
  $L_{\max}$                                              125            100            100
  $\mu_{\exp}(N_{\mathrm{uni}})$                          5              5              5
  $\mu_{\exp}(N_{\mathrm{bi}})$                           2              2              2
  $p(V_{\mathrm{new}})$                                   0.5            0.1            0.1

KiNetX
  $t_{\max}$ (s)                                          36954          36954          36954
  $\varepsilon_y$ (mol L$^{-1}$)                          0.01           0.01           0.01
  $G_{\min}$ (mol L$^{-1}$)                               1.0 × 10$^{-3}$  1.0 × 10$^{-5}$  1.0 × 10$^{-5}$
  $U$                                                     1000           1000           1000
  $\gamma$                                                0              1              1
  $C$                                                     5              1              5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.


[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.


[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.


[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C.; et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C.; et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G.; et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S.; et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. “CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package”; Technical Report SAND-80-8003; Sandia National Labs: Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. “Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics”; Technical Report SAND-89-8009; Sandia National Labs: Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. “CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics”; Technical Report SAND-96-8216; Sandia National Labs: Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. “Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes”, Version 2.3.0; DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.


[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI—a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple “Boxed Molecular Kinetics” Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] “SCINE: Software for Chemical Interaction Networks”; ETH Zürich: Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.


[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] “MATLAB”, Release 2018a; The MathWorks, Inc.: Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.


[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.


considered Furthermore we showed that the dominant reaction pathways can be reliablydeduced as a result of these efforts To study the reliability increase by incorporatinguncertainty quantification into the kinetic modeling framework we were faced with thechallenge to generate a multitude of reaction networks for covering a wide chemicalspectrum which is very time-consuming regarding the number of quantum chemicalcalculations required for this purpose With the development of AutoNetGen we wereable to examine a large number of distinct chemistry-like scenarios in short time Here weconsidered 100 networks consisting of 100 edges and 45 (plusmn6) vertices and distinguishedbetween two cases In the first case we only considered a single kinetic model pernetwork In the second case we considered an additional number of 5 kinetic modelsper network Our results suggest despite the small number of samples considered in thesecond case that the rigorous propagation of uncertainty through all steps of a kineticmodeling study can significantly increase the reliability of conclusions here by a factorof 10

While the findings are appealing we understand that the use of chemistry-mimickingreaction networks requires a careful analysis to ensure that important network propertiesmet in actual chemical scenarios are captured Otherwise it may be ambiguous to gen-eralize our conclusions drawn from these artificial cases At the same time it is difficultto draw general conclusions about certain network propertiesmdash such as the percentageof critical (highly noise-inducing) reactions the potential degree of network reduction orthe number of network layers required to correctly account for all kinetically relevant re-action stepsmdashas they may be strongly dependent on the underlying graph structure thedistribution of activation free energies rate constants as well as their correlated uncer-tainties It is not known to us that the literature on solution chemistry (or chemistry ingeneral) offers enough data in this direction to reliably evaluate this dependency Recentresults from our group suggest that free-energy uncertainty may induce almost arbitrarymagnitudes of concentration noise26 We developed AutoNetGen to offer a more general(ie statistics-based) perspective on this important issue despite a poor data situationThere are a few arguments in favor of our artificial chemistry-mimicking reaction net-works First AutoNetGen is in principle able to generate every network graph thatcorresponds to a specific chemical reaction characterized by elementary processes with amolecularity smaller than three Second we coordinated the range of activation free en-ergies (0ndash100 kJ molminus1 with respect to the higher-energy intermediate) with the reactiontime tmax which we set to the half life of a unimolecular rate constant correspondingto a barrier height of 100 kJ molminus1 This way we avoid that the network reductionprocedure merely deletes reactions because of activation barriers that are too high inenergy Note that in actual chemical systems several activation barriers may be 
muchhigher in energy and hence we expect the degree of network reduction to be rather small

26

compared to actual chemical scenarios Third the activation free-energy uncertaintiesstudied here amount to (10 plusmn 04) kcal molminus1 which is rather small compared to whatwe found for actual activation barriers obtained from DFT calculations26 Fourth ouranalysis consistently suggests that the explicit consideration of ensembles of kinetic mod-els improves the reliability of conclusions for a diverse range of networks In total thereis some indication that our chemistry-mimicking reaction networks provide a trend forwhat is to be expected if rigorous uncertainty propagation is incorporated in the kineticanalysis of actual chemical systems Certainly the application of KiNetX to real-worldexamples (not only by us but by the entire community) is our long-term goal but it isoutside the scope of this paper In future work we will couple KiNetX to our automatednetwork exploration program Chemoton16 to assess the performance of KiNetX withrespect to relevant examples such as the very challenging autocatalytic formose reactionBoth projects are part of our developments for a new kind of computational quantumchemistry (SCINE)46

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National ScienceFoundation (project no 200020_169120)

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular re-actions (isomerization and dissociation) as well as bimolecular reactions (dissociation andsubstitution) AutoNetGen requires the specification of reactants including their massesThe following steps are carried out by AutoNetGen for the construction of artificial reac-tion networks

1 For the construction of the (x + 1)-th network layer we first define all potentiallyreactive intermediate states At that stage we register N = N0 + middot middot middot+Nx verticesSince we already explored all possible reactions between the first N minus Nx specieswe will only consider the following potentially reactive intermediate states Nx

unimolecular intermediate states (leading to reactions of type A rarr P) defined bythe Nx species of the x-th network layer Nx homobimolecular intermediate states(type 2A rarr P) and Nx(Nx minus 1)2 + Nx(N minusNx) heterobimolecular intermediatestates (type A + Brarr P) The first Nx(Nxminus1)2 heterobimolecular states represent

27

all possible nonredundant combinations of the Nx new species among each otherwhereas the latter Nx(N minusNx) represent all possible combinations between the Nx

new species and the N minusNx old species

2 For each of the potentially reactive intermediate states we draw from an exponen-tial distribution with mean microexp The user can specify two different values for microexpone for unimolecular reactions microexpuni and one for bimolecular reactions microexpbiThe floor value of the sampled value equals the number of reaction channels tobe explored from the current reactive intermediate state The specific number ofreaction channels determines how many distinct reactive transformations of thecorresponding intermediate state will occur and therefore how many new verticeswill be formed or vertices of an earlier layer will be reconnected The user can con-trol the tendency to generate new vertices over linking to vertices of earlier layersvia the parameter p(Vnew) ranging from zero to one If p(Vnew) = 0 AutoNetGenwill never generate a new vertex whereas it will never link to a vertex of an earlierlayer if p(Vnew) = 1 AutoNetGen ensures of course that the mass on both sides ofa reaction is always identical

3 When the exploration stops (eg when a user-defined number of edges is reached)2L activation free energies will be calculated The absolute free energy of theproduct (or right-hand) side of an edge (comprises either one or two vertices) issampled from a normal distribution with mean micro(AminusminusA+) and standard deviationσ(AminusminusA+) which is added to the absolute free energy of the reactant (or left-hand)side of that edge The absolute free energy of an edge is sampled from a uniformdistribution with bounds min(ADagger minusA+minus) and max(ADagger minusA+minus) which is added tothe higher-energy side of an edge The differences between edge and vertex energiesare the activation free energies of the underlying reaction network

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$ as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
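The bookkeeping of steps 1 and 2 can be illustrated with a minimal Python sketch. It is not the AutoNetGen implementation: the function names (`reactive_states`, `sample_channels`, `choose_targets`) and the species labels are hypothetical, and the mass-balance check mentioned in step 2 is deliberately left out.

```python
import math
import random
from itertools import combinations


def reactive_states(old_species, new_species):
    """Step 1: enumerate potentially reactive intermediate states."""
    uni = [(a,) for a in new_species]                 # type A -> P
    homo = [(a, a) for a in new_species]              # type 2A -> P
    hetero_new = list(combinations(new_species, 2))   # new + new, nonredundant
    hetero_old = [(a, b) for a in new_species for b in old_species]  # new + old
    return uni, homo, hetero_new + hetero_old


def sample_channels(rng, mean):
    """Step 2: floor of a draw from an exponential distribution with this mean."""
    return math.floor(rng.expovariate(1.0 / mean))


def choose_targets(rng, n_channels, p_new):
    """Decide per channel whether it leads to a new vertex or an earlier one."""
    return ["new" if rng.random() < p_new else "old" for _ in range(n_channels)]


rng = random.Random(0)
old_species = ["S1", "S2"]           # N - N_x = 2 species of earlier layers
new_species = ["S3", "S4", "S5"]     # N_x = 3 species of the current layer
uni, homo, hetero = reactive_states(old_species, new_species)
print(len(uni), len(homo), len(hetero))  # 3 3 9

n_channels = sample_channels(rng, mean=5.0)
targets = choose_targets(rng, n_channels, p_new=0.5)
```

With $N_x = 3$ and $N = 5$, the count of heterobimolecular states is $3 \cdot 2/2 + 3 \cdot 2 = 9$, in line with the formula of step 1. Because `rng.random()` returns values in [0, 1), the limiting cases `p_new=0` and `p_new=1` reproduce the behavior described in step 2.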

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left(-\frac{A_l^{\ddagger} - A_l^{+/-}}{RT}\right), \qquad (29)$$

where $h$, $k_\mathrm{B}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
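As an illustration of step 3 and Eq. (29), the following Python sketch samples the energies of a single edge and converts the two resulting activation free energies to rate constants. The helper names (`eyring`, `sample_edge`) and the numerical inputs are assumptions chosen for this example only; energies are taken in kJ mol$^{-1}$, and the physical constants are the CODATA 2018 values.

```python
import math
import random

K_B = 1.380649e-23     # Boltzmann constant in J/K
H = 6.62607015e-34     # Planck constant in J s
R = 8.314462618        # gas constant in J/(mol K)


def eyring(delta_a_kj, temperature=298.15):
    """Eq. (29): rate constant for an activation free energy in kJ/mol."""
    return K_B * temperature / H * math.exp(-delta_a_kj * 1.0e3 / (R * temperature))


def sample_edge(rng, a_reactant, mu, sigma, barrier_min, barrier_max):
    """Step 3 for one edge: product-side and edge free energies in kJ/mol."""
    a_product = a_reactant + rng.gauss(mu, sigma)
    a_edge = max(a_reactant, a_product) + rng.uniform(barrier_min, barrier_max)
    return a_product, a_edge


rng = random.Random(1)
a_plus = 0.0   # reactant-side free energy (illustrative reference value)
a_minus, a_ts = sample_edge(rng, a_plus, mu=-25.0, sigma=50.0,
                            barrier_min=0.0, barrier_max=100.0)
k_forward = eyring(a_ts - a_plus)     # uses the barrier seen from the reactant side
k_backward = eyring(a_ts - a_minus)   # uses the barrier seen from the product side
```

Since the sampled barrier is added to the higher-energy side of the edge, both activation free energies are nonnegative by construction, so neither rate constant exceeds the prefactor $k_\mathrm{B}T/h$.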

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$, which fulfills the condition that its largest eigenvalue equals $\sigma_{A,\max}^2$. Defining $\sigma_{A,\max}^2$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\max}$.

1. Generate a random $(N \times N)$-dimensional matrix $P$ with elements that are uniformly sampled from $[-0.5, +0.5]$. $N$ is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an $(N \times N)$-dimensional matrix, $Q = PP^{\top}$, that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$\Sigma_A = Q \, \frac{\langle\sigma_A\rangle^2}{\max_i E_{ii}},$$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
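The three steps above can be sketched in plain Python as follows. This is an illustrative reconstruction rather than the authors' code: the function names are hypothetical, and the largest eigenvalue is estimated by power iteration (an assumption made here to avoid external libraries; any symmetric eigensolver would serve the same purpose).

```python
import random


def transpose_product(p):
    """Return Q = P P^T for a square matrix stored as a list of rows."""
    n = len(p)
    return [[sum(p[i][k] * p[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def largest_eigenvalue(q, iterations=5000):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(q)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(q[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    qv = [sum(q[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * qv[i] for i in range(n))  # Rayleigh quotient


def random_covariance(n, sigma, rng):
    """Steps 1-3: random covariance matrix whose largest eigenvalue is sigma**2."""
    p = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    q = transpose_product(p)
    scale = sigma ** 2 / largest_eigenvalue(q)
    return [[scale * q[i][j] for j in range(n)] for i in range(n)]


rng = random.Random(42)
sigma_a = 2.09                      # illustrative value in kJ/mol
cov = random_covariance(6, sigma_a, rng)
lambda_max = largest_eigenvalue(cov)
```

The diagonal elements of a positive-semidefinite matrix cannot exceed its largest eigenvalue, so every vertex variance `cov[i][i]` stays below `sigma_a ** 2` after the rescaling.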

Note that, in a practical setup, we expect the user to provide $B + 1$ $k$-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and hence problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.
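For readers who wish to experiment with such random covariance matrices, the sketch below draws one perturbed sample in the spirit of step 4 of the AutoNetGen procedure. Several simplifying assumptions are made: correlated vertex shifts are generated through a Cholesky factor (which requires a positive-definite matrix), each side of the edge consists of a single vertex, and the small covariance matrix and all energies are hypothetical values in kJ mol$^{-1}$.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular factor L with L L^T = sigma (sigma positive definite)."""
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low


def sample_vertex_shifts(rng, sigma):
    """Correlated zero-mean Gaussian changes of the vertex free energies."""
    low = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in sigma]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(sigma))]


def perturbed_barriers(rng, a_left, a_right, a_edge, shift_left, shift_right, sigma_mean):
    """One sample of the forward and backward activation free energies of an edge."""
    edge = (a_edge + 0.5 * (shift_left + shift_right)
            + rng.gauss(0.0, 2.0 * sigma_mean))
    forward = abs(edge - (a_left + shift_left))     # absolute value if negative
    backward = abs(edge - (a_right + shift_right))
    return forward, backward


rng = random.Random(7)
sigma = [[4.0, 1.0], [1.0, 3.0]]     # hypothetical 2x2 covariance matrix
shifts = sample_vertex_shifts(rng, sigma)
forward, backward = perturbed_barriers(rng, 0.0, -20.0, 40.0,
                                       shifts[0], shifts[1], sigma_mean=2.09)
```

Repeating the last two calls yields an ensemble of barrier pairs for the same edge, from which an ensemble of rate constants can be derived.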


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                                      CRN-X             CRN-0             CRN-B

AutoNetGen
$B + 1$                                              25                1                 6
$T$ / K                                              298.15            298.15            298.15
$\mu(A^- - A^+)$ (kJ mol$^{-1}$)                     −25               −25               −25
$\sigma(A^- - A^+)$ (kJ mol$^{-1}$)                  50                50                50
$\min(A^{\ddagger} - A^{+/-})$ (kJ mol$^{-1}$)       0                 0                 0
$\max(A^{\ddagger} - A^{+/-})$ (kJ mol$^{-1}$)       100               100               100
$\langle\sigma_A\rangle$ (kJ mol$^{-1}$)             2.09              2.09              2.09
$L_{\max}$                                           125               100               100
$\mu_{\exp}(N_{\mathrm{uni}})$                       5                 5                 5
$\mu_{\exp}(N_{\mathrm{bi}})$                        2                 2                 2
$p(V_{\mathrm{new}})$                                0.5               0.1               0.1

KiNetX
$t_{\max}$ / s                                       36954             36954             36954
$\varepsilon_y$ (mol L$^{-1}$)                       0.01              0.01              0.01
$G_{\min}$ (mol L$^{-1}$)                            1.0 × 10$^{-3}$   1.0 × 10$^{-5}$   1.0 × 10$^{-5}$
$U$                                                  1000              1000              1000
$\gamma$                                             0                 1                 1
$C$                                                  5                 1                 5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.

[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package"; Technical Report SAND–80-8003, Sandia National Labs.: Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics"; Technical Report SAND-89-8009, Sandia National Labs.: Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics"; Technical Report SAND-96-8216, Sandia National Labs.: Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI—a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks", ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.


compared to actual chemical scenarios. Third, the activation free-energy uncertainties studied here amount to (1.0 ± 0.4) kcal mol$^{-1}$, which is rather small compared to what we found for actual activation barriers obtained from DFT calculations [26]. Fourth, our analysis consistently suggests that the explicit consideration of ensembles of kinetic models improves the reliability of conclusions for a diverse range of networks. In total, there is some indication that our chemistry-mimicking reaction networks provide a trend for what is to be expected if rigorous uncertainty propagation is incorporated in the kinetic analysis of actual chemical systems. Certainly, the application of KiNetX to real-world examples (not only by us, but by the entire community) is our long-term goal, but it is outside the scope of this paper. In future work, we will couple KiNetX to our automated network exploration program Chemoton [16] to assess the performance of KiNetX with respect to relevant examples such as the very challenging autocatalytic formose reaction. Both projects are part of our developments for a new kind of computational quantum chemistry (SCINE) [46].

Acknowledgments

The authors gratefully acknowledge financial support from the Swiss National Science Foundation (project no. 200020_169120).

Appendix

Details on AutoNetGen

We require AutoNetGen to yield a fully connected network representing unimolecular reactions (isomerization and dissociation) as well as bimolecular reactions (dissociation and substitution). AutoNetGen requires the specification of reactants including their masses. The following steps are carried out by AutoNetGen for the construction of artificial reaction networks:

1. For the construction of the $(x + 1)$-th network layer, we first define all potentially reactive intermediate states. At that stage, we register $N = N_0 + \cdots + N_x$ vertices. Since we already explored all possible reactions between the first $N - N_x$ species, we will only consider the following potentially reactive intermediate states: $N_x$ unimolecular intermediate states (leading to reactions of type A → P) defined by the $N_x$ species of the $x$-th network layer, $N_x$ homobimolecular intermediate states (type 2A → P), and $N_x(N_x - 1)/2 + N_x(N - N_x)$ heterobimolecular intermediate states (type A + B → P). The first $N_x(N_x - 1)/2$ heterobimolecular states represent all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2 For each of the potentially reactive intermediate states we draw from an exponen-tial distribution with mean microexp The user can specify two different values for microexpone for unimolecular reactions microexpuni and one for bimolecular reactions microexpbiThe floor value of the sampled value equals the number of reaction channels tobe explored from the current reactive intermediate state The specific number ofreaction channels determines how many distinct reactive transformations of thecorresponding intermediate state will occur and therefore how many new verticeswill be formed or vertices of an earlier layer will be reconnected The user can con-trol the tendency to generate new vertices over linking to vertices of earlier layersvia the parameter p(Vnew) ranging from zero to one If p(Vnew) = 0 AutoNetGenwill never generate a new vertex whereas it will never link to a vertex of an earlierlayer if p(Vnew) = 1 AutoNetGen ensures of course that the mass on both sides ofa reaction is always identical

3 When the exploration stops (eg when a user-defined number of edges is reached)2L activation free energies will be calculated The absolute free energy of theproduct (or right-hand) side of an edge (comprises either one or two vertices) issampled from a normal distribution with mean micro(AminusminusA+) and standard deviationσ(AminusminusA+) which is added to the absolute free energy of the reactant (or left-hand)side of that edge The absolute free energy of an edge is sampled from a uniformdistribution with bounds min(ADagger minusA+minus) and max(ADagger minusA+minus) which is added tothe higher-energy side of an edge The differences between edge and vertex energiesare the activation free energies of the underlying reaction network

4 We introduce free-energy uncertainties in two ways First we sample vertex freeenergies from their nominal values and the covariance matrix ΣA as described inthe next section Second for a given reaction and a given sample we add themean value of the free-energy changes in the two connected intermediate states(compared to their nominal values) to the free energy of the corresponding edgeand add another contribution to it which is sampled from a normal distributionwith zero mean and standard deviation 2〈σA〉 In case the resulting activation freeenergy becomes negative its absolute value will be chosen

Subsequently AutoNetGen transforms all activation free energies to rate constants

28

based on the Eyring equation53

k+minusl =

kBT

hexp

(minus ADaggerl minus A

+minusl

RT

) (29)

where h kB R and T are Planckrsquos constant Boltzmannrsquos constant the gas constantand a user-defined constant temperature respectively

Random Construction of Covariance Matrices

Covariance matrices are symmetric positive-semidefinite square matrices by definitionie their eigenvalues are strictly nonnegative We outline a simple recipe to constructa random covariance matrix ΣA which fulfills the condition that its largest eigenvalueequals σ2

Amax Defining σ2

Amaxto be the largest eigenvalue ensures that all activation

free-energy uncertainties are bound by σAmax

1 Generate a random (NtimesN)-dimensional matrix P with elements that are uniformlysampled from [minus05+05] N is the number of vertices in the network

2 The multiplication of P with its transpose leads to a (N timesN)-dimensional matrixQ = PPgt that already is a covariance matrix65 with eigenvalues Eii

3 The covariance matrix ΣA results from a rescaling of Q

ΣA = Q〈σA〉2

maxEii

which implies that the largest eigenvalue of ΣA equals 〈σA〉2

Note that in a practical setup we expect the user to provide B+ 1 k-vectors derivedfrom an ensemble of quantum chemical models instead of sampling the correspondingactivation free energies from a random (and hence problem-unrelated) covariance ma-trix However as it is current practice to generate a single set of activation free energiesinstead of an ensemble of them we encourage users to employ these random covariancematrices to develop an intuition for the potential effect of free-energy uncertainty on thekinetics of complex chemical systems

29

Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen andKiNetX on input and contains the values chosen for the three cases discussed in thiswork

Table 2 Input parameters for AutoNetGen and KiNetX and their values chosen forthis study

Input parameter CRN-X CRN-0 CRN-B

AutoNetGen

B + 1 25 1 6T K 29815 29815 29815micro(Aminus minus A+) (kJ molminus1) minus25 minus25 minus25σ(Aminus minus A+) (kJ molminus1) 50 50 50min(ADagger minus A+minus) (kJ molminus1) 0 0 0max(ADagger minus A+minus) (kJ molminus1) 100 100 100〈σA〉 (kJ molminus1) 209 209 209Lmax 125 100 100microexp(Nuni) 5 5 5microexp(Nbi) 2 2 2p(Vnew) 05 01 01

KiNetX

tmax s 36954 36954 36954εy (mol Lminus1) 001 001 001Gmin (mol Lminus1) 10times 10minus3 10times 10minus5 10times 10minus5

U 1000 1000 1000γ 0 1 1C 5 1 5

References

[1] Broadbelt L J Stark S M Klein M T Computer Generated Pyrolysis Model-ing On-the-Fly Generation of Species Reactions and Rates Ind Eng Chem Res1994 33 790ndash799

[2] Kayala M A Baldi P ReactionPredictor Prediction of Complex Chemical Reac-tions at the Mechanistic Level Using Machine Learning J Chem Inf Model 201252 2526ndash2540

30

[3] Maeda S Ohno K Morokuma K Systematic Exploration of the Mechanism ofChemical Reactions The Global Reaction Route Mapping (GRRM) Strategy Usingthe ADDF and AFIR Methods Phys Chem Chem Phys 2013 15 3683ndash3701

[4] Zimmerman P M Automated Discovery of Chemically Reasonable Elementary Re-action Steps J Comput Chem 2013 34 1385ndash1392

[5] Rappoport D Galvin C J Zubarev D Y Aspuru-Guzik A Complex ChemicalReaction Networks from Heuristics-Aided Quantum Chemistry J Chem TheoryComput 2014 10 897ndash907

[6] Wang L-P Titov A McGibbon R Liu F Pande V S Martiacutenez T JDiscovering Chemistry with an Ab Initio Nanoreactor Nat Chem 2014 6 1044ndash1048

[7] Bergeler M Simm G N Proppe J Reiher M Heuristics-Guided Explorationof Reaction Mechanisms J Chem Theory Comput 2015 11 5712ndash5722

[8] Doumlntgen M Przybylski-Freund M-D Kroumlger L C Kopp W A Ismail A ELeonhard K Automated Discovery of Reaction Pathways Rate Constants andTransition States Using Reactive Molecular Dynamics Simulations J Chem TheoryComput 2015 11 2517ndash2524

[9] Habershon S Sampling Reactive Pathways with Random Walks in Chemical SpaceApplications to Molecular Dissociation and Catalysis J Chem Phys 2015 143094106

[10] Suleimanov Y V Green W H Automated Discovery of Elementary ChemicalReaction Steps Using Freezing String and Berny Optimization Methods J ChemTheory Comput 2015 11 4248ndash4259

[11] Zimmerman P M Navigating Molecular Space for Reaction Mechanisms An Effi-cient Automated Procedure Mol Simul 2015 41 43ndash54

[12] Dewyer A L Zimmerman P M Finding Reaction Mechanisms Intuitive or Oth-erwise Org Biomol Chem 2017 15 501ndash504

[13] Gao C W Allen J W Green W H West R H Reaction Mechanism Gen-erator Automatic Construction of Chemical Kinetic Mechanisms Comput PhysCommun 2016 203 212ndash225

[14] Habershon S Automated Prediction of Catalytic Mechanism and Rate Law UsingGraph-Based Reaction Path Sampling J Chem Theory Comput 2016 12 1786ndash1798

31

[15] Wang L-P McGibbon R T Pande V S Martinez T J Automated Discov-ery and Refinement of Reactive Molecular Dynamics Pathways J Chem TheoryComput 2016 12 638ndash649

[16] Simm G N Reiher M Context-Driven Exploration of Complex Chemical ReactionNetworks J Chem Theory Comput 2017 13 6108ndash6119

[17] Doumlntgen M Schmalz F Kopp W A Kroumlger L C Leonhard K AutomatedChemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and QuantumChemistry Simulations J Chem Inf Model 2018 58 1343ndash1355

[18] Dewyer A L Arguumlelles A J Zimmerman P M Methods for Exploring ReactionSpace in Molecular Systems WIREs Comput Mol Sci 2018 8 e1354

[19] Frederiksen S L Jacobsen K W Brown K S Sethna J P Bayesian EnsembleApproach to Error Estimation of Interatomic Potentials Phys Rev Lett 2004 93165501

[20] Mortensen J J Kaasbjerg K Frederiksen S L Noslashrskov J K Sethna J PJacobsen K W Bayesian Error Estimation in Density-Functional Theory PhysRev Lett 2005 95 216401

[21] Petzold V Bligaard T Jacobsen K W Construction of New Electronic DensityFunctionals with Error Estimation Through Fitting Top Catal 2012 55 402ndash417

[22] Wellendorff J Lundgaard K T Moslashgelhoslashj A Petzold V Landis D DNoslashrskov J K Bligaard T Jacobsen K W Density Functionals for SurfaceScience Exchange-Correlation Model Development with Bayesian Error EstimationPhys Rev B 2012 85 235149

[23] Medford A J Wellendorff J Vojvodic A Studt F Abild-Pedersen FJacobsen K W Bligaard T Noslashrskov J K Assessing the Reliability of CalculatedCatalytic Ammonia Synthesis Rates Science 2014 345 197ndash200

[24] Pandey M Jacobsen K W Heats of Formation of Solids with Error EstimationThe mBEEF Functional with and without Fitted Reference Energies Phys Rev B2015 91 235201

[25] Simm G N Reiher M Systematic Error Estimation for Chemical Reaction En-ergies J Chem Theory Comput 2016 12 2762ndash2773

[26] Proppe J Husch T Simm G N Reiher M Uncertainty Quantification forQuantum Chemical Models of Complex Reaction Networks Faraday Discuss 2016195 497ndash520

32

[27] Pernot P The Parameter Uncertainty Inflation Fallacy J Chem Phys 2017 147104102

[28] Simm G N Reiher M Error-Controlled Exploration of Chemical Reaction Net-works with Gaussian Processes J Chem Theory Comput 2018 14 5238ndash5248

[29] Bishop C M Pattern Recognition and Machine Learning Springer New York2006

[30] Althorpe S C et al Fundamentals General Discussion Faraday Discuss 2016195 139ndash169

[31] Althorpe S C et al Non-Adiabatic Reactions General Discussion Faraday Dis-cuss 2016 195 311ndash344

[32] Angulo G et al New Methods General Discussion Faraday Discuss 2016 195521ndash556

[33] Althorpe S et al Application to Large Systems General Discussion Faraday Dis-cuss 2016 195 671ndash698

[34] Turaacutenyi T Tomlin A S Analysis of Kinetic Reaction Mechanisms SpringerBerlin 2014

[35] Kee R J Miller J A Jefferson T H ldquoCHEMKIN A General-Purpose Problem-Independent Transportable FORTRAN Chemical Kinetics Code Packagerdquo Tech-nical Report SANDndash80-8003 Sandia National Labs Livermore CA United States1980

[36] Kee R J Rupley F M Miller J A ldquoChemkin-II A Fortran Chemical KineticsPackage for the Analysis of Gas-Phase Chemical Kineticsrdquo Technical Report SAND-89-8009 Sandia National Labs Livermore CA United States 1989

[37] Kee R J Rupley F M Meeks E Miller J A ldquoCHEMKIN-III A FORTRANChemical Kinetics Package for the Analysis of Gas-Phase Chemical and PlasmaKineticsrdquo Technical Report SAND-96-8216 Sandia National Labs Livermore CAUnited States 1996

[38] Goodwin D G Moffat H K Speth R L ldquoCantera An Object-Oriented SoftwareToolkit for Chemical Kinetics Thermodynamics and Transport Processesrdquo Version230 DOI 105281zenodo170284 httpwwwcanteraorg2017

33

[39] Hoops S Sahle S Gauges R Lee C Pahle J Simus N Singhal M Xu LMendes P Kummer U COPASImdasha COmplex PAthway SImulator Bioinformatics2006 22 3067ndash3074

[40] Gillespie D T Stochastic Simulation of Chemical Kinetics Annu Rev Phys Chem2007 58 35ndash55

[41] Glowacki D R Liang C-H Morley C Pilling M J Robertson S H MES-MER An Open-Source Master Equation Solver for Multi-Energy Well Reactions JPhys Chem A 2012 116 9545ndash9560

[42] Shannon R Glowacki D R A Simple ldquoBoxed Molecular Kineticsrdquo Approach ToAccelerate Rare Events in the Stochastic Kinetic Master Equation J Phys ChemA 2018 122 1531ndash1541

[43] Glowacki D R Lockhart J Blitz M A Klippenstein S J Pilling M JRobertson S H Seakins P W Interception of Excited Vibrational QuantumStates by O2 in Atmospheric Association Reactions Science 2012 337 1066ndash1069

[44] Glowacki D R Liang C H Marsden S P Harvey J N Pilling M J AlkeneHydroboration Hot Intermediates That React While They Are Cooling J AmChem Soc 2010 132 13621ndash13623

[45] Goldman L M Glowacki D R Carpenter B K Nonstatistical Dynamics inUnlikely Places [15] Hydrogen Migration in Chemically Activated CyclopentadieneJ Am Chem Soc 2011 133 5312ndash5318

[46] ldquoSCINE Software for Chemical Interaction Networksrdquo ETH Zuumlrich Zuumlrich Switzer-land httpwwwreiherethzchsoftwarescinehtml

[47] Grimmett G Stirzaker D Probability and Random Processes Oxford UniversityPress New York 2nd ed 1992

[48] Kirk P D W Stumpf M P H Gaussian Process Regression BootstrappingExploring the Effects of Uncertainty in Time Course Data Bioinformatics 200925 1300ndash1306

[49] Sutton J E Guo W Katsoulakis M A Vlachos D G Effects of Corre-lated Parameters and Uncertainty in Electronic-Structure-Based Chemical KineticModelling Nat Chem 2016 8 331ndash337

[50] Ramakrishnan R Dral P O Rupp M von Lilienfeld O A Quantum Chem-istry Structures and Properties of 134 Kilo Molecules Sci Data 2014 1 140022

34

[51] Pernot P Cailliez F A Critical Review of Statistical CalibrationPrediction Mod-els Handling Data Inconsistency and Model Inadequacy AIChE J 2017 63 4642ndash4665

[52] Proppe J Reiher M Reliable Estimation of Prediction Uncertainty for Physico-chemical Property Models J Chem Theory Comput 2017 13 3297ndash3317

[53] Eyring H The Activated Complex in Chemical Reactions J Chem Phys 19353 107ndash115

[54] Meijer E Matrix Algebra for Higher Order Moments Linear Algebra Appl 2005410 112ndash134

[55] ldquoMATLABrdquo Release 2018a The MathWorks Inc Natick MA United States 2018

[56] Shampine L Reichelt M The MATLAB ODE Suite SIAM J Sci Comput 199718 1ndash22

[57] Morris M D Factorial Sampling Plans for Preliminary Computational ExperimentsTechnometrics 1991 33 161ndash174

[58] Besora M Maseras F Microkinetic Modeling in Homogeneous Catalysis WIREsComput Mol Sci 2018 e1372 DOI 101002wcms1372

[59] Rasmussen C E Williams C K I Gaussian Processes for Machine LearningThe MIT Press Cambridge MA 2006

[60] Susnow R G Dean A M Green W H Peczak P Broadbelt L J Rate-BasedConstruction of Kinetic Models for Complex Systems J Phys Chem A 1997 1013731ndash3740

[61] Han K Green W H West R H On-the-Fly Pruning for Rate-Based ReactionMechanism Generation Comput Chem Eng 2017 100 1ndash8

[62] Song J Building Robust Chemical Reaction Mechanisms Next Generation of Au-tomatic Model Construction Software Doctoral thesis Massachusetts Institute ofTechnology 2004

[63] Jalan A West R H Green W H An Extensible Framework for CapturingSolvent Effects in Computer Generated Kinetic Models J Phys Chem B 2013117 2955ndash2970

[64] Grenda J M Androulakis I P Dean A M Green W H Application ofComputational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysisof Methane Ind Eng Chem Res 2003 42 1000ndash1010

35

[65] Horn R A Johnson C R Matrix Analysis Cambridge University Press Cam-bridge 1990

36


all possible nonredundant combinations of the $N_x$ new species among each other, whereas the latter $N_x(N - N_x)$ represent all possible combinations between the $N_x$ new species and the $N - N_x$ old species.

2. For each of the potentially reactive intermediate states, we draw from an exponential distribution with mean $\mu_{\exp}$. The user can specify two different values for $\mu_{\exp}$, one for unimolecular reactions, $\mu_{\exp}^{\mathrm{uni}}$, and one for bimolecular reactions, $\mu_{\exp}^{\mathrm{bi}}$. The floor of the sampled value equals the number of reaction channels to be explored from the current reactive intermediate state. The specific number of reaction channels determines how many distinct reactive transformations of the corresponding intermediate state will occur and, therefore, how many new vertices will be formed or vertices of an earlier layer will be reconnected. The user can control the tendency to generate new vertices over linking to vertices of earlier layers via the parameter $p(V_{\mathrm{new}})$, ranging from zero to one. If $p(V_{\mathrm{new}}) = 0$, AutoNetGen will never generate a new vertex, whereas it will never link to a vertex of an earlier layer if $p(V_{\mathrm{new}}) = 1$. AutoNetGen ensures, of course, that the mass on both sides of a reaction is always identical.

3. When the exploration stops (e.g., when a user-defined number of edges is reached), $2L$ activation free energies will be calculated. The absolute free energy of the product (or right-hand) side of an edge (which comprises either one or two vertices) is sampled from a normal distribution with mean $\mu(A^{-} - A^{+})$ and standard deviation $\sigma(A^{-} - A^{+})$, which is added to the absolute free energy of the reactant (or left-hand) side of that edge. The absolute free energy of an edge is sampled from a uniform distribution with bounds $\min(A^{\ddagger} - A^{+/-})$ and $\max(A^{\ddagger} - A^{+/-})$, which is added to the higher-energy side of an edge. The differences between edge and vertex energies are the activation free energies of the underlying reaction network.

4. We introduce free-energy uncertainties in two ways. First, we sample vertex free energies from their nominal values and the covariance matrix $\Sigma_A$, as described in the next section. Second, for a given reaction and a given sample, we add the mean value of the free-energy changes in the two connected intermediate states (compared to their nominal values) to the free energy of the corresponding edge and add another contribution to it, which is sampled from a normal distribution with zero mean and standard deviation $2\langle\sigma_A\rangle$. In case the resulting activation free energy becomes negative, its absolute value will be chosen.
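Steps 2 and 3 of this procedure can be illustrated with a minimal Python sketch. It is not the AutoNetGen implementation: the function names (`sample_channel_count`, `choose_new_vertex`, `sample_edge_energies`) are hypothetical, every edge side is treated as a single lumped free energy, and the numerical values are the CRN-X settings listed in Table 2.

```python
import math
import random


def sample_channel_count(mu_exp, rng):
    """Step 2: floor of a draw from an exponential distribution with mean mu_exp."""
    # random.expovariate expects the rate parameter, i.e. 1 / mean.
    return math.floor(rng.expovariate(1.0 / mu_exp))


def choose_new_vertex(p_new, rng):
    """Step 2: True -> create a new vertex, False -> link to an earlier layer.

    rng.random() lies in [0, 1), so p_new = 0 never creates a new vertex
    and p_new = 1 always does.
    """
    return rng.random() < p_new


def sample_edge_energies(a_reactant, mu_da, sigma_da, barrier_min, barrier_max, rng):
    """Step 3: sample product-side and edge free energies (kJ/mol).

    Returns (a_product, a_edge, forward_barrier, backward_barrier).
    """
    # Product side: reactant free energy plus a normally distributed reaction free energy.
    a_product = a_reactant + rng.gauss(mu_da, sigma_da)
    # Edge: uniform offset added to the higher-energy side,
    # so both activation free energies are nonnegative.
    a_edge = max(a_reactant, a_product) + rng.uniform(barrier_min, barrier_max)
    return a_product, a_edge, a_edge - a_reactant, a_edge - a_product


rng = random.Random(0)  # fixed seed only for reproducibility of the sketch
n_channels = sample_channel_count(5.0, rng)       # unimolecular mean from Table 2
make_new = choose_new_vertex(0.5, rng)            # p(V_new) for CRN-X
a_prod, a_edge, fwd, bwd = sample_edge_energies(0.0, -25.0, 50.0, 0.0, 100.0, rng)
print(n_channels, make_new, round(fwd, 2), round(bwd, 2))
```

Because the uniform offset is added to the higher-energy side, one of the two barriers equals the offset itself and the other is larger by the magnitude of the reaction free energy.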

Subsequently, AutoNetGen transforms all activation free energies to rate constants based on the Eyring equation [53],

$$
k_l^{+/-} = \frac{k_\mathrm{B} T}{h} \exp\left( -\frac{A_l^{\ddagger} - A_l^{+/-}}{RT} \right), \tag{29}
$$

where $h$, $k_\mathrm{B}$, $R$, and $T$ are Planck's constant, Boltzmann's constant, the gas constant, and a user-defined constant temperature, respectively.
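As a quick numerical illustration of Eq. (29), the following sketch converts an activation free energy given in kJ mol$^{-1}$ into a rate constant. The function name `eyring_rate_constant` is hypothetical, the constants are the CODATA 2018 values, and the default temperature of 298.15 K is the one listed in Table 2; no standard-state concentration factor for bimolecular steps is included.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant in J/K
H = 6.62607015e-34    # Planck constant in J s
R = 8.314462618       # gas constant in J/(mol K)


def eyring_rate_constant(barrier_kj_per_mol, temperature=298.15):
    """Eq. (29): k = (k_B T / h) * exp(-(A_ts - A_state) / (R T)).

    barrier_kj_per_mol is the activation free energy A_ts - A_state in kJ/mol.
    """
    barrier_j_per_mol = barrier_kj_per_mol * 1000.0
    prefactor = K_B * temperature / H
    return prefactor * math.exp(-barrier_j_per_mol / (R * temperature))


# Barriers spanning the sampling range of Table 2 (0 to 100 kJ/mol).
for barrier in (0.0, 50.0, 100.0):
    print(barrier, eyring_rate_constant(barrier))
```

The exponential dependence is what makes free-energy errors matter: at this temperature, each additional $RT \ln 10 \approx 5.7$ kJ mol$^{-1}$ in the barrier lowers the rate constant by a factor of ten.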

Random Construction of Covariance Matrices

Covariance matrices are symmetric, positive-semidefinite square matrices by definition, i.e., their eigenvalues are nonnegative. We outline a simple recipe to construct a random covariance matrix $\Sigma_A$, which fulfills the condition that its largest eigenvalue equals $\sigma_{A,\max}^2$. Defining $\sigma_{A,\max}^2$ to be the largest eigenvalue ensures that all activation free-energy uncertainties are bounded by $\sigma_{A,\max}$.

1. Generate a random $(N \times N)$-dimensional matrix $P$ with elements that are uniformly sampled from $[-0.5, +0.5]$. $N$ is the number of vertices in the network.

2. The multiplication of $P$ with its transpose leads to an $(N \times N)$-dimensional matrix $Q = P P^{\top}$ that already is a covariance matrix [65] with eigenvalues $E_{ii}$.

3. The covariance matrix $\Sigma_A$ results from a rescaling of $Q$,

$$
\Sigma_A = Q \, \frac{\langle\sigma_A\rangle^2}{\max_i E_{ii}},
$$

which implies that the largest eigenvalue of $\Sigma_A$ equals $\langle\sigma_A\rangle^2$.
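The three steps of this recipe translate almost line by line into NumPy. The sketch below is an illustration under stated assumptions, not the authors' code: the function name `random_covariance` is hypothetical, and the network size (10 vertices) and seed are arbitrary; $\langle\sigma_A\rangle = 20.9$ kJ mol$^{-1}$ is taken from Table 2.

```python
import numpy as np


def random_covariance(n_vertices, sigma_a, rng):
    """Random covariance matrix whose largest eigenvalue equals sigma_a**2."""
    # Step 1: N x N matrix with entries drawn uniformly from [-0.5, 0.5).
    p = rng.uniform(-0.5, 0.5, size=(n_vertices, n_vertices))
    # Step 2: P P^T is symmetric and positive semidefinite.
    q = p @ p.T
    # Step 3: rescale so that the largest eigenvalue becomes sigma_a**2.
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    largest = np.linalg.eigvalsh(q)[-1]
    return q * sigma_a**2 / largest


rng = np.random.default_rng(0)  # fixed seed only for reproducibility
cov = random_covariance(10, 20.9, rng)

eigenvalues = np.linalg.eigvalsh(cov)
print("largest eigenvalue:", eigenvalues[-1])
print("largest standard deviation:", np.sqrt(np.diag(cov).max()))
```

Since every diagonal element of a symmetric positive-semidefinite matrix is bounded by its largest eigenvalue, the printed standard deviation cannot exceed $\langle\sigma_A\rangle$, which is the bound stated above.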

Note that, in a practical setup, we expect the user to provide $B + 1$ $k$-vectors derived from an ensemble of quantum chemical models instead of sampling the corresponding activation free energies from a random (and, hence, problem-unrelated) covariance matrix. However, as it is current practice to generate a single set of activation free energies instead of an ensemble of them, we encourage users to employ these random covariance matrices to develop an intuition for the potential effect of free-energy uncertainty on the kinetics of complex chemical systems.
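To make the sampling route concrete, the following sketch draws $B$ correlated sets of activation free energies from a random covariance matrix, following the two-part uncertainty model described for AutoNetGen (correlated vertex shifts, plus the mean vertex shift and an independent normal term with standard deviation $2\langle\sigma_A\rangle$ on each edge, with the absolute value taken if a barrier turns negative). It is a simplified illustration: the function name `sample_activation_energies`, the three-vertex toy network, and its energies are invented for this example, and every edge is assumed to connect exactly one vertex on each side.

```python
import numpy as np


def sample_activation_energies(a_vertices, a_edges, edges, cov, sigma_a, n_samples, rng):
    """Return forward and backward activation free energies, each (n_samples, n_edges).

    a_vertices: nominal vertex free energies; a_edges: nominal edge free energies;
    edges: (reactant_index, product_index) pairs; all energies in kJ/mol.
    """
    a_vertices = np.asarray(a_vertices, dtype=float)
    a_edges = np.asarray(a_edges, dtype=float)
    idx = np.asarray(edges)
    # Correlated vertex free energies around their nominal values.
    v = rng.multivariate_normal(a_vertices, cov, size=n_samples)
    shift = v - a_vertices
    # Edge shift: mean change of the two connected vertices ...
    mean_shift = 0.5 * (shift[:, idx[:, 0]] + shift[:, idx[:, 1]])
    # ... plus an independent normal contribution with standard deviation 2 * sigma_a.
    noise = rng.normal(0.0, 2.0 * sigma_a, size=mean_shift.shape)
    e = a_edges + mean_shift + noise
    # Negative activation free energies are replaced by their absolute values.
    forward = np.abs(e - v[:, idx[:, 0]])
    backward = np.abs(e - v[:, idx[:, 1]])
    return forward, backward


rng = np.random.default_rng(1)
sigma_a = 20.9                      # <sigma_A> from Table 2
a_vertices = [0.0, -30.0, 10.0]     # toy network (hypothetical values)
edges = [(0, 1), (1, 2)]
a_edges = [40.0, 60.0]

# Random covariance matrix rescaled to a largest eigenvalue of sigma_a**2.
p = rng.uniform(-0.5, 0.5, size=(3, 3))
q = p @ p.T
cov = q * sigma_a**2 / np.linalg.eigvalsh(q)[-1]

forward, backward = sample_activation_energies(
    a_vertices, a_edges, edges, cov, sigma_a, n_samples=5, rng=rng
)
print(forward.shape, backward.shape)
```

Together with the nominal set, the five samples correspond to $B + 1 = 6$ sets of activation free energies, each of which could then be converted to a $k$-vector with the Eyring equation.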


Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen and KiNetX on input and contains the values chosen for the three cases discussed in this work.

Table 2: Input parameters for AutoNetGen and KiNetX and their values chosen for this study.

Input parameter                                     CRN-X                  CRN-0                  CRN-B

AutoNetGen
$B + 1$                                             25                     1                      6
$T$ (K)                                             298.15                 298.15                 298.15
$\mu(A^{-} - A^{+})$ (kJ mol$^{-1}$)                $-25$                  $-25$                  $-25$
$\sigma(A^{-} - A^{+})$ (kJ mol$^{-1}$)             50                     50                     50
$\min(A^{\ddagger} - A^{+/-})$ (kJ mol$^{-1}$)      0                      0                      0
$\max(A^{\ddagger} - A^{+/-})$ (kJ mol$^{-1}$)      100                    100                    100
$\langle\sigma_A\rangle$ (kJ mol$^{-1}$)            20.9                   20.9                   20.9
$L_{\max}$                                          125                    100                    100
$\mu_{\exp}(N_{\mathrm{uni}})$                      5                      5                      5
$\mu_{\exp}(N_{\mathrm{bi}})$                       2                      2                      2
$p(V_{\mathrm{new}})$                               0.5                    0.1                    0.1

KiNetX
$t_{\max}$ (s)                                      36954                  36954                  36954
$\varepsilon_y$ (mol L$^{-1}$)                      0.01                   0.01                   0.01
$G_{\min}$ (mol L$^{-1}$)                           $1.0 \times 10^{-3}$   $1.0 \times 10^{-5}$   $1.0 \times 10^{-5}$
$U$                                                 1000                   1000                   1000
$\gamma$                                            0                      1                      1
$C$                                                 5                      1                      5

References

[1] Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33, 790–799.

[2] Kayala, M. A.; Baldi, P. ReactionPredictor: Prediction of Complex Chemical Reactions at the Mechanistic Level Using Machine Learning. J. Chem. Inf. Model. 2012, 52, 2526–2540.

[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. "CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package"; Technical Report SAND-80-8003; Sandia National Labs: Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. "Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics"; Technical Report SAND-89-8009; Sandia National Labs: Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. "CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics"; Technical Report SAND-96-8216; Sandia National Labs: Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. "Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes", Version 2.3.0. DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI—a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] "SCINE: Software for Chemical Interaction Networks"; ETH Zürich: Zürich, Switzerland. http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a; The MathWorks, Inc.: Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 21 Kinetic Modeling from a Network Perspective
    • 22 Uncertainty Quantification in Kinetic Modeling
    • 23 Overview of the KiNetX Meta-Algorithm
    • 24 The KiNetX Workflow
      • 241 Uncertainty Propagation
      • 242 Network Reduction
      • 243 Sensitivity Analysis
        • 25 KiNetX-Guided Network Exploration
        • 26 Chemistry-Mimicking Networks from AutoNetGen
          • 3 Results and Discussion
            • 31 Exemplary KiNetX Workflow
            • 32 Relevance of Uncertainty Propagation
              • 4 Conclusions and Outlook
Page 29: Mechanism Deduction from Noisy Chemical Reaction Networks

based on the Eyring equation53

k+minusl =

kBT

hexp

(minus ADaggerl minus A

+minusl

RT

) (29)

where h kB R and T are Planckrsquos constant Boltzmannrsquos constant the gas constantand a user-defined constant temperature respectively

Random Construction of Covariance Matrices

Covariance matrices are symmetric positive-semidefinite square matrices by definitionie their eigenvalues are strictly nonnegative We outline a simple recipe to constructa random covariance matrix ΣA which fulfills the condition that its largest eigenvalueequals σ2

Amax Defining σ2

Amaxto be the largest eigenvalue ensures that all activation

free-energy uncertainties are bound by σAmax

1 Generate a random (NtimesN)-dimensional matrix P with elements that are uniformlysampled from [minus05+05] N is the number of vertices in the network

2 The multiplication of P with its transpose leads to a (N timesN)-dimensional matrixQ = PPgt that already is a covariance matrix65 with eigenvalues Eii

3 The covariance matrix ΣA results from a rescaling of Q

ΣA = Q〈σA〉2

maxEii

which implies that the largest eigenvalue of ΣA equals 〈σA〉2

Note that in a practical setup we expect the user to provide B+ 1 k-vectors derivedfrom an ensemble of quantum chemical models instead of sampling the correspondingactivation free energies from a random (and hence problem-unrelated) covariance ma-trix However as it is current practice to generate a single set of activation free energiesinstead of an ensemble of them we encourage users to employ these random covariancematrices to develop an intuition for the potential effect of free-energy uncertainty on thekinetics of complex chemical systems

29

Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen andKiNetX on input and contains the values chosen for the three cases discussed in thiswork

Table 2 Input parameters for AutoNetGen and KiNetX and their values chosen forthis study

Input parameter CRN-X CRN-0 CRN-B

AutoNetGen

B + 1 25 1 6T K 29815 29815 29815micro(Aminus minus A+) (kJ molminus1) minus25 minus25 minus25σ(Aminus minus A+) (kJ molminus1) 50 50 50min(ADagger minus A+minus) (kJ molminus1) 0 0 0max(ADagger minus A+minus) (kJ molminus1) 100 100 100〈σA〉 (kJ molminus1) 209 209 209Lmax 125 100 100microexp(Nuni) 5 5 5microexp(Nbi) 2 2 2p(Vnew) 05 01 01

KiNetX

tmax s 36954 36954 36954εy (mol Lminus1) 001 001 001Gmin (mol Lminus1) 10times 10minus3 10times 10minus5 10times 10minus5

U 1000 1000 1000γ 0 1 1C 5 1 5

References

[1] Broadbelt L J Stark S M Klein M T Computer Generated Pyrolysis Model-ing On-the-Fly Generation of Species Reactions and Rates Ind Eng Chem Res1994 33 790ndash799

[2] Kayala M A Baldi P ReactionPredictor Prediction of Complex Chemical Reac-tions at the Mechanistic Level Using Machine Learning J Chem Inf Model 201252 2526ndash2540

30

[3] Maeda S Ohno K Morokuma K Systematic Exploration of the Mechanism ofChemical Reactions The Global Reaction Route Mapping (GRRM) Strategy Usingthe ADDF and AFIR Methods Phys Chem Chem Phys 2013 15 3683ndash3701

[4] Zimmerman P M Automated Discovery of Chemically Reasonable Elementary Re-action Steps J Comput Chem 2013 34 1385ndash1392

[5] Rappoport D Galvin C J Zubarev D Y Aspuru-Guzik A Complex ChemicalReaction Networks from Heuristics-Aided Quantum Chemistry J Chem TheoryComput 2014 10 897ndash907

[6] Wang L-P Titov A McGibbon R Liu F Pande V S Martiacutenez T JDiscovering Chemistry with an Ab Initio Nanoreactor Nat Chem 2014 6 1044ndash1048

[7] Bergeler M Simm G N Proppe J Reiher M Heuristics-Guided Explorationof Reaction Mechanisms J Chem Theory Comput 2015 11 5712ndash5722

[8] Doumlntgen M Przybylski-Freund M-D Kroumlger L C Kopp W A Ismail A ELeonhard K Automated Discovery of Reaction Pathways Rate Constants andTransition States Using Reactive Molecular Dynamics Simulations J Chem TheoryComput 2015 11 2517ndash2524

[9] Habershon S Sampling Reactive Pathways with Random Walks in Chemical SpaceApplications to Molecular Dissociation and Catalysis J Chem Phys 2015 143094106

[10] Suleimanov Y V Green W H Automated Discovery of Elementary ChemicalReaction Steps Using Freezing String and Berny Optimization Methods J ChemTheory Comput 2015 11 4248ndash4259

[11] Zimmerman P M Navigating Molecular Space for Reaction Mechanisms An Effi-cient Automated Procedure Mol Simul 2015 41 43ndash54

[12] Dewyer A L Zimmerman P M Finding Reaction Mechanisms Intuitive or Oth-erwise Org Biomol Chem 2017 15 501ndash504

[13] Gao C W Allen J W Green W H West R H Reaction Mechanism Gen-erator Automatic Construction of Chemical Kinetic Mechanisms Comput PhysCommun 2016 203 212ndash225

[14] Habershon S Automated Prediction of Catalytic Mechanism and Rate Law UsingGraph-Based Reaction Path Sampling J Chem Theory Comput 2016 12 1786ndash1798

31

[15] Wang L-P McGibbon R T Pande V S Martinez T J Automated Discov-ery and Refinement of Reactive Molecular Dynamics Pathways J Chem TheoryComput 2016 12 638ndash649

[16] Simm G N Reiher M Context-Driven Exploration of Complex Chemical ReactionNetworks J Chem Theory Comput 2017 13 6108ndash6119

[17] Doumlntgen M Schmalz F Kopp W A Kroumlger L C Leonhard K AutomatedChemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and QuantumChemistry Simulations J Chem Inf Model 2018 58 1343ndash1355

[18] Dewyer A L Arguumlelles A J Zimmerman P M Methods for Exploring ReactionSpace in Molecular Systems WIREs Comput Mol Sci 2018 8 e1354

[19] Frederiksen S L Jacobsen K W Brown K S Sethna J P Bayesian EnsembleApproach to Error Estimation of Interatomic Potentials Phys Rev Lett 2004 93165501

[20] Mortensen J J Kaasbjerg K Frederiksen S L Noslashrskov J K Sethna J PJacobsen K W Bayesian Error Estimation in Density-Functional Theory PhysRev Lett 2005 95 216401

[21] Petzold V Bligaard T Jacobsen K W Construction of New Electronic DensityFunctionals with Error Estimation Through Fitting Top Catal 2012 55 402ndash417

[22] Wellendorff J Lundgaard K T Moslashgelhoslashj A Petzold V Landis D DNoslashrskov J K Bligaard T Jacobsen K W Density Functionals for SurfaceScience Exchange-Correlation Model Development with Bayesian Error EstimationPhys Rev B 2012 85 235149

[23] Medford A J Wellendorff J Vojvodic A Studt F Abild-Pedersen FJacobsen K W Bligaard T Noslashrskov J K Assessing the Reliability of CalculatedCatalytic Ammonia Synthesis Rates Science 2014 345 197ndash200

[24] Pandey M Jacobsen K W Heats of Formation of Solids with Error EstimationThe mBEEF Functional with and without Fitted Reference Energies Phys Rev B2015 91 235201

[25] Simm G N Reiher M Systematic Error Estimation for Chemical Reaction En-ergies J Chem Theory Comput 2016 12 2762ndash2773

[26] Proppe J Husch T Simm G N Reiher M Uncertainty Quantification forQuantum Chemical Models of Complex Reaction Networks Faraday Discuss 2016195 497ndash520

32

[27] Pernot P The Parameter Uncertainty Inflation Fallacy J Chem Phys 2017 147104102

[28] Simm G N Reiher M Error-Controlled Exploration of Chemical Reaction Net-works with Gaussian Processes J Chem Theory Comput 2018 14 5238ndash5248

[29] Bishop C M Pattern Recognition and Machine Learning Springer New York2006

[30] Althorpe S C et al Fundamentals General Discussion Faraday Discuss 2016195 139ndash169

[31] Althorpe S C et al Non-Adiabatic Reactions General Discussion Faraday Dis-cuss 2016 195 311ndash344

[32] Angulo G et al New Methods General Discussion Faraday Discuss 2016 195521ndash556

[33] Althorpe S et al Application to Large Systems General Discussion Faraday Dis-cuss 2016 195 671ndash698

[34] Turaacutenyi T Tomlin A S Analysis of Kinetic Reaction Mechanisms SpringerBerlin 2014

[35] Kee R J Miller J A Jefferson T H ldquoCHEMKIN A General-Purpose Problem-Independent Transportable FORTRAN Chemical Kinetics Code Packagerdquo Tech-nical Report SANDndash80-8003 Sandia National Labs Livermore CA United States1980

[36] Kee R J Rupley F M Miller J A ldquoChemkin-II A Fortran Chemical KineticsPackage for the Analysis of Gas-Phase Chemical Kineticsrdquo Technical Report SAND-89-8009 Sandia National Labs Livermore CA United States 1989

[37] Kee R J Rupley F M Meeks E Miller J A ldquoCHEMKIN-III A FORTRANChemical Kinetics Package for the Analysis of Gas-Phase Chemical and PlasmaKineticsrdquo Technical Report SAND-96-8216 Sandia National Labs Livermore CAUnited States 1996

[38] Goodwin D G Moffat H K Speth R L ldquoCantera An Object-Oriented SoftwareToolkit for Chemical Kinetics Thermodynamics and Transport Processesrdquo Version230 DOI 105281zenodo170284 httpwwwcanteraorg2017

33

[39] Hoops S Sahle S Gauges R Lee C Pahle J Simus N Singhal M Xu LMendes P Kummer U COPASImdasha COmplex PAthway SImulator Bioinformatics2006 22 3067ndash3074

[40] Gillespie D T Stochastic Simulation of Chemical Kinetics Annu Rev Phys Chem2007 58 35ndash55

[41] Glowacki D R Liang C-H Morley C Pilling M J Robertson S H MES-MER An Open-Source Master Equation Solver for Multi-Energy Well Reactions JPhys Chem A 2012 116 9545ndash9560

[42] Shannon R Glowacki D R A Simple ldquoBoxed Molecular Kineticsrdquo Approach ToAccelerate Rare Events in the Stochastic Kinetic Master Equation J Phys ChemA 2018 122 1531ndash1541

[43] Glowacki D R Lockhart J Blitz M A Klippenstein S J Pilling M JRobertson S H Seakins P W Interception of Excited Vibrational QuantumStates by O2 in Atmospheric Association Reactions Science 2012 337 1066ndash1069

[44] Glowacki D R Liang C H Marsden S P Harvey J N Pilling M J AlkeneHydroboration Hot Intermediates That React While They Are Cooling J AmChem Soc 2010 132 13621ndash13623

[45] Goldman L M Glowacki D R Carpenter B K Nonstatistical Dynamics inUnlikely Places [15] Hydrogen Migration in Chemically Activated CyclopentadieneJ Am Chem Soc 2011 133 5312ndash5318

[46] ldquoSCINE Software for Chemical Interaction Networksrdquo ETH Zuumlrich Zuumlrich Switzer-land httpwwwreiherethzchsoftwarescinehtml

[47] Grimmett G Stirzaker D Probability and Random Processes Oxford UniversityPress New York 2nd ed 1992

[48] Kirk P D W Stumpf M P H Gaussian Process Regression BootstrappingExploring the Effects of Uncertainty in Time Course Data Bioinformatics 200925 1300ndash1306

[49] Sutton J E Guo W Katsoulakis M A Vlachos D G Effects of Corre-lated Parameters and Uncertainty in Electronic-Structure-Based Chemical KineticModelling Nat Chem 2016 8 331ndash337

[50] Ramakrishnan R Dral P O Rupp M von Lilienfeld O A Quantum Chem-istry Structures and Properties of 134 Kilo Molecules Sci Data 2014 1 140022

34

[51] Pernot P Cailliez F A Critical Review of Statistical CalibrationPrediction Mod-els Handling Data Inconsistency and Model Inadequacy AIChE J 2017 63 4642ndash4665

[52] Proppe J Reiher M Reliable Estimation of Prediction Uncertainty for Physico-chemical Property Models J Chem Theory Comput 2017 13 3297ndash3317

[53] Eyring H The Activated Complex in Chemical Reactions J Chem Phys 19353 107ndash115

[54] Meijer E Matrix Algebra for Higher Order Moments Linear Algebra Appl 2005410 112ndash134

[55] ldquoMATLABrdquo Release 2018a The MathWorks Inc Natick MA United States 2018

[56] Shampine L Reichelt M The MATLAB ODE Suite SIAM J Sci Comput 199718 1ndash22

[57] Morris M D Factorial Sampling Plans for Preliminary Computational ExperimentsTechnometrics 1991 33 161ndash174

[58] Besora M Maseras F Microkinetic Modeling in Homogeneous Catalysis WIREsComput Mol Sci 2018 e1372 DOI 101002wcms1372

[59] Rasmussen C E Williams C K I Gaussian Processes for Machine LearningThe MIT Press Cambridge MA 2006

[60] Susnow R G Dean A M Green W H Peczak P Broadbelt L J Rate-BasedConstruction of Kinetic Models for Complex Systems J Phys Chem A 1997 1013731ndash3740

[61] Han K Green W H West R H On-the-Fly Pruning for Rate-Based ReactionMechanism Generation Comput Chem Eng 2017 100 1ndash8

[62] Song J Building Robust Chemical Reaction Mechanisms Next Generation of Au-tomatic Model Construction Software Doctoral thesis Massachusetts Institute ofTechnology 2004

[63] Jalan A West R H Green W H An Extensible Framework for CapturingSolvent Effects in Computer Generated Kinetic Models J Phys Chem B 2013117 2955ndash2970

[64] Grenda J M Androulakis I P Dean A M Green W H Application ofComputational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysisof Methane Ind Eng Chem Res 2003 42 1000ndash1010

35

[65] Horn R A Johnson C R Matrix Analysis Cambridge University Press Cam-bridge 1990

36

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 21 Kinetic Modeling from a Network Perspective
    • 22 Uncertainty Quantification in Kinetic Modeling
    • 23 Overview of the KiNetX Meta-Algorithm
    • 24 The KiNetX Workflow
      • 241 Uncertainty Propagation
      • 242 Network Reduction
      • 243 Sensitivity Analysis
        • 25 KiNetX-Guided Network Exploration
        • 26 Chemistry-Mimicking Networks from AutoNetGen
          • 3 Results and Discussion
            • 31 Exemplary KiNetX Workflow
            • 32 Relevance of Uncertainty Propagation
              • 4 Conclusions and Outlook
Page 30: Mechanism Deduction from Noisy Chemical Reaction Networks

Control Parameters for AutoNetGen and KiNetX

Table 2 provides an overview of all parameters that are required for AutoNetGen andKiNetX on input and contains the values chosen for the three cases discussed in thiswork

Table 2 Input parameters for AutoNetGen and KiNetX and their values chosen forthis study

Input parameter CRN-X CRN-0 CRN-B

AutoNetGen

B + 1 25 1 6T K 29815 29815 29815micro(Aminus minus A+) (kJ molminus1) minus25 minus25 minus25σ(Aminus minus A+) (kJ molminus1) 50 50 50min(ADagger minus A+minus) (kJ molminus1) 0 0 0max(ADagger minus A+minus) (kJ molminus1) 100 100 100〈σA〉 (kJ molminus1) 209 209 209Lmax 125 100 100microexp(Nuni) 5 5 5microexp(Nbi) 2 2 2p(Vnew) 05 01 01

KiNetX

tmax s 36954 36954 36954εy (mol Lminus1) 001 001 001Gmin (mol Lminus1) 10times 10minus3 10times 10minus5 10times 10minus5

U 1000 1000 1000γ 0 1 1C 5 1 5

References

[1] Broadbelt L J Stark S M Klein M T Computer Generated Pyrolysis Model-ing On-the-Fly Generation of Species Reactions and Rates Ind Eng Chem Res1994 33 790ndash799

[2] Kayala M A Baldi P ReactionPredictor Prediction of Complex Chemical Reac-tions at the Mechanistic Level Using Machine Learning J Chem Inf Model 201252 2526ndash2540

30

[3] Maeda, S.; Ohno, K.; Morokuma, K. Systematic Exploration of the Mechanism of Chemical Reactions: The Global Reaction Route Mapping (GRRM) Strategy Using the ADDF and AFIR Methods. Phys. Chem. Chem. Phys. 2013, 15, 3683–3701.

[4] Zimmerman, P. M. Automated Discovery of Chemically Reasonable Elementary Reaction Steps. J. Comput. Chem. 2013, 34, 1385–1392.

[5] Rappoport, D.; Galvin, C. J.; Zubarev, D. Y.; Aspuru-Guzik, A. Complex Chemical Reaction Networks from Heuristics-Aided Quantum Chemistry. J. Chem. Theory Comput. 2014, 10, 897–907.

[6] Wang, L.-P.; Titov, A.; McGibbon, R.; Liu, F.; Pande, V. S.; Martínez, T. J. Discovering Chemistry with an Ab Initio Nanoreactor. Nat. Chem. 2014, 6, 1044–1048.

[7] Bergeler, M.; Simm, G. N.; Proppe, J.; Reiher, M. Heuristics-Guided Exploration of Reaction Mechanisms. J. Chem. Theory Comput. 2015, 11, 5712–5722.

[8] Döntgen, M.; Przybylski-Freund, M.-D.; Kröger, L. C.; Kopp, W. A.; Ismail, A. E.; Leonhard, K. Automated Discovery of Reaction Pathways, Rate Constants, and Transition States Using Reactive Molecular Dynamics Simulations. J. Chem. Theory Comput. 2015, 11, 2517–2524.

[9] Habershon, S. Sampling Reactive Pathways with Random Walks in Chemical Space: Applications to Molecular Dissociation and Catalysis. J. Chem. Phys. 2015, 143, 094106.

[10] Suleimanov, Y. V.; Green, W. H. Automated Discovery of Elementary Chemical Reaction Steps Using Freezing String and Berny Optimization Methods. J. Chem. Theory Comput. 2015, 11, 4248–4259.

[11] Zimmerman, P. M. Navigating Molecular Space for Reaction Mechanisms: An Efficient, Automated Procedure. Mol. Simul. 2015, 41, 43–54.

[12] Dewyer, A. L.; Zimmerman, P. M. Finding Reaction Mechanisms, Intuitive or Otherwise. Org. Biomol. Chem. 2017, 15, 501–504.

[13] Gao, C. W.; Allen, J. W.; Green, W. H.; West, R. H. Reaction Mechanism Generator: Automatic Construction of Chemical Kinetic Mechanisms. Comput. Phys. Commun. 2016, 203, 212–225.

[14] Habershon, S. Automated Prediction of Catalytic Mechanism and Rate Law Using Graph-Based Reaction Path Sampling. J. Chem. Theory Comput. 2016, 12, 1786–1798.

[15] Wang, L.-P.; McGibbon, R. T.; Pande, V. S.; Martinez, T. J. Automated Discovery and Refinement of Reactive Molecular Dynamics Pathways. J. Chem. Theory Comput. 2016, 12, 638–649.

[16] Simm, G. N.; Reiher, M. Context-Driven Exploration of Complex Chemical Reaction Networks. J. Chem. Theory Comput. 2017, 13, 6108–6119.

[17] Döntgen, M.; Schmalz, F.; Kopp, W. A.; Kröger, L. C.; Leonhard, K. Automated Chemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and Quantum Chemistry Simulations. J. Chem. Inf. Model. 2018, 58, 1343–1355.

[18] Dewyer, A. L.; Argüelles, A. J.; Zimmerman, P. M. Methods for Exploring Reaction Space in Molecular Systems. WIREs Comput. Mol. Sci. 2018, 8, e1354.

[19] Frederiksen, S. L.; Jacobsen, K. W.; Brown, K. S.; Sethna, J. P. Bayesian Ensemble Approach to Error Estimation of Interatomic Potentials. Phys. Rev. Lett. 2004, 93, 165501.

[20] Mortensen, J. J.; Kaasbjerg, K.; Frederiksen, S. L.; Nørskov, J. K.; Sethna, J. P.; Jacobsen, K. W. Bayesian Error Estimation in Density-Functional Theory. Phys. Rev. Lett. 2005, 95, 216401.

[21] Petzold, V.; Bligaard, T.; Jacobsen, K. W. Construction of New Electronic Density Functionals with Error Estimation Through Fitting. Top. Catal. 2012, 55, 402–417.

[22] Wellendorff, J.; Lundgaard, K. T.; Møgelhøj, A.; Petzold, V.; Landis, D. D.; Nørskov, J. K.; Bligaard, T.; Jacobsen, K. W. Density Functionals for Surface Science: Exchange-Correlation Model Development with Bayesian Error Estimation. Phys. Rev. B 2012, 85, 235149.

[23] Medford, A. J.; Wellendorff, J.; Vojvodic, A.; Studt, F.; Abild-Pedersen, F.; Jacobsen, K. W.; Bligaard, T.; Nørskov, J. K. Assessing the Reliability of Calculated Catalytic Ammonia Synthesis Rates. Science 2014, 345, 197–200.

[24] Pandey, M.; Jacobsen, K. W. Heats of Formation of Solids with Error Estimation: The mBEEF Functional with and without Fitted Reference Energies. Phys. Rev. B 2015, 91, 235201.

[25] Simm, G. N.; Reiher, M. Systematic Error Estimation for Chemical Reaction Energies. J. Chem. Theory Comput. 2016, 12, 2762–2773.

[26] Proppe, J.; Husch, T.; Simm, G. N.; Reiher, M. Uncertainty Quantification for Quantum Chemical Models of Complex Reaction Networks. Faraday Discuss. 2016, 195, 497–520.

[27] Pernot, P. The Parameter Uncertainty Inflation Fallacy. J. Chem. Phys. 2017, 147, 104102.

[28] Simm, G. N.; Reiher, M. Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes. J. Chem. Theory Comput. 2018, 14, 5238–5248.

[29] Bishop, C. M. Pattern Recognition and Machine Learning; Springer: New York, 2006.

[30] Althorpe, S. C. et al. Fundamentals: General Discussion. Faraday Discuss. 2016, 195, 139–169.

[31] Althorpe, S. C. et al. Non-Adiabatic Reactions: General Discussion. Faraday Discuss. 2016, 195, 311–344.

[32] Angulo, G. et al. New Methods: General Discussion. Faraday Discuss. 2016, 195, 521–556.

[33] Althorpe, S. et al. Application to Large Systems: General Discussion. Faraday Discuss. 2016, 195, 671–698.

[34] Turányi, T.; Tomlin, A. S. Analysis of Kinetic Reaction Mechanisms; Springer: Berlin, 2014.

[35] Kee, R. J.; Miller, J. A.; Jefferson, T. H. “CHEMKIN: A General-Purpose, Problem-Independent, Transportable, FORTRAN Chemical Kinetics Code Package”, Technical Report SAND–80-8003, Sandia National Labs., Livermore, CA, United States, 1980.

[36] Kee, R. J.; Rupley, F. M.; Miller, J. A. “Chemkin-II: A Fortran Chemical Kinetics Package for the Analysis of Gas-Phase Chemical Kinetics”, Technical Report SAND-89-8009, Sandia National Labs., Livermore, CA, United States, 1989.

[37] Kee, R. J.; Rupley, F. M.; Meeks, E.; Miller, J. A. “CHEMKIN-III: A FORTRAN Chemical Kinetics Package for the Analysis of Gas-Phase Chemical and Plasma Kinetics”, Technical Report SAND-96-8216, Sandia National Labs., Livermore, CA, United States, 1996.

[38] Goodwin, D. G.; Moffat, H. K.; Speth, R. L. “Cantera: An Object-Oriented Software Toolkit for Chemical Kinetics, Thermodynamics, and Transport Processes”, Version 2.3.0, DOI: 10.5281/zenodo.170284, http://www.cantera.org, 2017.

[39] Hoops, S.; Sahle, S.; Gauges, R.; Lee, C.; Pahle, J.; Simus, N.; Singhal, M.; Xu, L.; Mendes, P.; Kummer, U. COPASI—a COmplex PAthway SImulator. Bioinformatics 2006, 22, 3067–3074.

[40] Gillespie, D. T. Stochastic Simulation of Chemical Kinetics. Annu. Rev. Phys. Chem. 2007, 58, 35–55.

[41] Glowacki, D. R.; Liang, C.-H.; Morley, C.; Pilling, M. J.; Robertson, S. H. MESMER: An Open-Source Master Equation Solver for Multi-Energy Well Reactions. J. Phys. Chem. A 2012, 116, 9545–9560.

[42] Shannon, R.; Glowacki, D. R. A Simple “Boxed Molecular Kinetics” Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation. J. Phys. Chem. A 2018, 122, 1531–1541.

[43] Glowacki, D. R.; Lockhart, J.; Blitz, M. A.; Klippenstein, S. J.; Pilling, M. J.; Robertson, S. H.; Seakins, P. W. Interception of Excited Vibrational Quantum States by O2 in Atmospheric Association Reactions. Science 2012, 337, 1066–1069.

[44] Glowacki, D. R.; Liang, C. H.; Marsden, S. P.; Harvey, J. N.; Pilling, M. J. Alkene Hydroboration: Hot Intermediates That React While They Are Cooling. J. Am. Chem. Soc. 2010, 132, 13621–13623.

[45] Goldman, L. M.; Glowacki, D. R.; Carpenter, B. K. Nonstatistical Dynamics in Unlikely Places: [1,5] Hydrogen Migration in Chemically Activated Cyclopentadiene. J. Am. Chem. Soc. 2011, 133, 5312–5318.

[46] “SCINE: Software for Chemical Interaction Networks”, ETH Zürich, Zürich, Switzerland, http://www.reiher.ethz.ch/software/scine.html.

[47] Grimmett, G.; Stirzaker, D. Probability and Random Processes, 2nd ed.; Oxford University Press: New York, 1992.

[48] Kirk, P. D. W.; Stumpf, M. P. H. Gaussian Process Regression Bootstrapping: Exploring the Effects of Uncertainty in Time Course Data. Bioinformatics 2009, 25, 1300–1306.

[49] Sutton, J. E.; Guo, W.; Katsoulakis, M. A.; Vlachos, D. G. Effects of Correlated Parameters and Uncertainty in Electronic-Structure-Based Chemical Kinetic Modelling. Nat. Chem. 2016, 8, 331–337.

[50] Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Quantum Chemistry Structures and Properties of 134 Kilo Molecules. Sci. Data 2014, 1, 140022.

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] “MATLAB”, Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.

Page 31: Mechanism Deduction from Noisy Chemical Reaction Networks

[3] Maeda S Ohno K Morokuma K Systematic Exploration of the Mechanism ofChemical Reactions The Global Reaction Route Mapping (GRRM) Strategy Usingthe ADDF and AFIR Methods Phys Chem Chem Phys 2013 15 3683ndash3701

[4] Zimmerman P M Automated Discovery of Chemically Reasonable Elementary Re-action Steps J Comput Chem 2013 34 1385ndash1392

[5] Rappoport D Galvin C J Zubarev D Y Aspuru-Guzik A Complex ChemicalReaction Networks from Heuristics-Aided Quantum Chemistry J Chem TheoryComput 2014 10 897ndash907

[6] Wang L-P Titov A McGibbon R Liu F Pande V S Martiacutenez T JDiscovering Chemistry with an Ab Initio Nanoreactor Nat Chem 2014 6 1044ndash1048

[7] Bergeler M Simm G N Proppe J Reiher M Heuristics-Guided Explorationof Reaction Mechanisms J Chem Theory Comput 2015 11 5712ndash5722

[8] Doumlntgen M Przybylski-Freund M-D Kroumlger L C Kopp W A Ismail A ELeonhard K Automated Discovery of Reaction Pathways Rate Constants andTransition States Using Reactive Molecular Dynamics Simulations J Chem TheoryComput 2015 11 2517ndash2524

[9] Habershon S Sampling Reactive Pathways with Random Walks in Chemical SpaceApplications to Molecular Dissociation and Catalysis J Chem Phys 2015 143094106

[10] Suleimanov Y V Green W H Automated Discovery of Elementary ChemicalReaction Steps Using Freezing String and Berny Optimization Methods J ChemTheory Comput 2015 11 4248ndash4259

[11] Zimmerman P M Navigating Molecular Space for Reaction Mechanisms An Effi-cient Automated Procedure Mol Simul 2015 41 43ndash54

[12] Dewyer A L Zimmerman P M Finding Reaction Mechanisms Intuitive or Oth-erwise Org Biomol Chem 2017 15 501ndash504

[13] Gao C W Allen J W Green W H West R H Reaction Mechanism Gen-erator Automatic Construction of Chemical Kinetic Mechanisms Comput PhysCommun 2016 203 212ndash225

[14] Habershon S Automated Prediction of Catalytic Mechanism and Rate Law UsingGraph-Based Reaction Path Sampling J Chem Theory Comput 2016 12 1786ndash1798

31

[15] Wang L-P McGibbon R T Pande V S Martinez T J Automated Discov-ery and Refinement of Reactive Molecular Dynamics Pathways J Chem TheoryComput 2016 12 638ndash649

[16] Simm G N Reiher M Context-Driven Exploration of Complex Chemical ReactionNetworks J Chem Theory Comput 2017 13 6108ndash6119

[17] Doumlntgen M Schmalz F Kopp W A Kroumlger L C Leonhard K AutomatedChemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and QuantumChemistry Simulations J Chem Inf Model 2018 58 1343ndash1355

[18] Dewyer A L Arguumlelles A J Zimmerman P M Methods for Exploring ReactionSpace in Molecular Systems WIREs Comput Mol Sci 2018 8 e1354

[19] Frederiksen S L Jacobsen K W Brown K S Sethna J P Bayesian EnsembleApproach to Error Estimation of Interatomic Potentials Phys Rev Lett 2004 93165501

[20] Mortensen J J Kaasbjerg K Frederiksen S L Noslashrskov J K Sethna J PJacobsen K W Bayesian Error Estimation in Density-Functional Theory PhysRev Lett 2005 95 216401

[21] Petzold V Bligaard T Jacobsen K W Construction of New Electronic DensityFunctionals with Error Estimation Through Fitting Top Catal 2012 55 402ndash417

[22] Wellendorff J Lundgaard K T Moslashgelhoslashj A Petzold V Landis D DNoslashrskov J K Bligaard T Jacobsen K W Density Functionals for SurfaceScience Exchange-Correlation Model Development with Bayesian Error EstimationPhys Rev B 2012 85 235149

[23] Medford A J Wellendorff J Vojvodic A Studt F Abild-Pedersen FJacobsen K W Bligaard T Noslashrskov J K Assessing the Reliability of CalculatedCatalytic Ammonia Synthesis Rates Science 2014 345 197ndash200

[24] Pandey M Jacobsen K W Heats of Formation of Solids with Error EstimationThe mBEEF Functional with and without Fitted Reference Energies Phys Rev B2015 91 235201

[25] Simm G N Reiher M Systematic Error Estimation for Chemical Reaction En-ergies J Chem Theory Comput 2016 12 2762ndash2773

[26] Proppe J Husch T Simm G N Reiher M Uncertainty Quantification forQuantum Chemical Models of Complex Reaction Networks Faraday Discuss 2016195 497ndash520

32

[27] Pernot P The Parameter Uncertainty Inflation Fallacy J Chem Phys 2017 147104102

[28] Simm G N Reiher M Error-Controlled Exploration of Chemical Reaction Net-works with Gaussian Processes J Chem Theory Comput 2018 14 5238ndash5248

[29] Bishop C M Pattern Recognition and Machine Learning Springer New York2006

[30] Althorpe S C et al Fundamentals General Discussion Faraday Discuss 2016195 139ndash169

[31] Althorpe S C et al Non-Adiabatic Reactions General Discussion Faraday Dis-cuss 2016 195 311ndash344

[32] Angulo G et al New Methods General Discussion Faraday Discuss 2016 195521ndash556

[33] Althorpe S et al Application to Large Systems General Discussion Faraday Dis-cuss 2016 195 671ndash698

[34] Turaacutenyi T Tomlin A S Analysis of Kinetic Reaction Mechanisms SpringerBerlin 2014

[35] Kee R J Miller J A Jefferson T H ldquoCHEMKIN A General-Purpose Problem-Independent Transportable FORTRAN Chemical Kinetics Code Packagerdquo Tech-nical Report SANDndash80-8003 Sandia National Labs Livermore CA United States1980

[36] Kee R J Rupley F M Miller J A ldquoChemkin-II A Fortran Chemical KineticsPackage for the Analysis of Gas-Phase Chemical Kineticsrdquo Technical Report SAND-89-8009 Sandia National Labs Livermore CA United States 1989

[37] Kee R J Rupley F M Meeks E Miller J A ldquoCHEMKIN-III A FORTRANChemical Kinetics Package for the Analysis of Gas-Phase Chemical and PlasmaKineticsrdquo Technical Report SAND-96-8216 Sandia National Labs Livermore CAUnited States 1996

[38] Goodwin D G Moffat H K Speth R L ldquoCantera An Object-Oriented SoftwareToolkit for Chemical Kinetics Thermodynamics and Transport Processesrdquo Version230 DOI 105281zenodo170284 httpwwwcanteraorg2017

33

[39] Hoops S Sahle S Gauges R Lee C Pahle J Simus N Singhal M Xu LMendes P Kummer U COPASImdasha COmplex PAthway SImulator Bioinformatics2006 22 3067ndash3074

[40] Gillespie D T Stochastic Simulation of Chemical Kinetics Annu Rev Phys Chem2007 58 35ndash55

[41] Glowacki D R Liang C-H Morley C Pilling M J Robertson S H MES-MER An Open-Source Master Equation Solver for Multi-Energy Well Reactions JPhys Chem A 2012 116 9545ndash9560

[42] Shannon R Glowacki D R A Simple ldquoBoxed Molecular Kineticsrdquo Approach ToAccelerate Rare Events in the Stochastic Kinetic Master Equation J Phys ChemA 2018 122 1531ndash1541

[43] Glowacki D R Lockhart J Blitz M A Klippenstein S J Pilling M JRobertson S H Seakins P W Interception of Excited Vibrational QuantumStates by O2 in Atmospheric Association Reactions Science 2012 337 1066ndash1069

[44] Glowacki D R Liang C H Marsden S P Harvey J N Pilling M J AlkeneHydroboration Hot Intermediates That React While They Are Cooling J AmChem Soc 2010 132 13621ndash13623

[45] Goldman L M Glowacki D R Carpenter B K Nonstatistical Dynamics inUnlikely Places [15] Hydrogen Migration in Chemically Activated CyclopentadieneJ Am Chem Soc 2011 133 5312ndash5318

[46] ldquoSCINE Software for Chemical Interaction Networksrdquo ETH Zuumlrich Zuumlrich Switzer-land httpwwwreiherethzchsoftwarescinehtml

[47] Grimmett G Stirzaker D Probability and Random Processes Oxford UniversityPress New York 2nd ed 1992

[48] Kirk P D W Stumpf M P H Gaussian Process Regression BootstrappingExploring the Effects of Uncertainty in Time Course Data Bioinformatics 200925 1300ndash1306

[49] Sutton J E Guo W Katsoulakis M A Vlachos D G Effects of Corre-lated Parameters and Uncertainty in Electronic-Structure-Based Chemical KineticModelling Nat Chem 2016 8 331ndash337

[50] Ramakrishnan R Dral P O Rupp M von Lilienfeld O A Quantum Chem-istry Structures and Properties of 134 Kilo Molecules Sci Data 2014 1 140022

34

[51] Pernot P Cailliez F A Critical Review of Statistical CalibrationPrediction Mod-els Handling Data Inconsistency and Model Inadequacy AIChE J 2017 63 4642ndash4665

[52] Proppe J Reiher M Reliable Estimation of Prediction Uncertainty for Physico-chemical Property Models J Chem Theory Comput 2017 13 3297ndash3317

[53] Eyring H The Activated Complex in Chemical Reactions J Chem Phys 19353 107ndash115

[54] Meijer E Matrix Algebra for Higher Order Moments Linear Algebra Appl 2005410 112ndash134

[55] ldquoMATLABrdquo Release 2018a The MathWorks Inc Natick MA United States 2018

[56] Shampine L Reichelt M The MATLAB ODE Suite SIAM J Sci Comput 199718 1ndash22

[57] Morris M D Factorial Sampling Plans for Preliminary Computational ExperimentsTechnometrics 1991 33 161ndash174

[58] Besora M Maseras F Microkinetic Modeling in Homogeneous Catalysis WIREsComput Mol Sci 2018 e1372 DOI 101002wcms1372

[59] Rasmussen C E Williams C K I Gaussian Processes for Machine LearningThe MIT Press Cambridge MA 2006

[60] Susnow R G Dean A M Green W H Peczak P Broadbelt L J Rate-BasedConstruction of Kinetic Models for Complex Systems J Phys Chem A 1997 1013731ndash3740

[61] Han K Green W H West R H On-the-Fly Pruning for Rate-Based ReactionMechanism Generation Comput Chem Eng 2017 100 1ndash8

[62] Song J Building Robust Chemical Reaction Mechanisms Next Generation of Au-tomatic Model Construction Software Doctoral thesis Massachusetts Institute ofTechnology 2004

[63] Jalan A West R H Green W H An Extensible Framework for CapturingSolvent Effects in Computer Generated Kinetic Models J Phys Chem B 2013117 2955ndash2970

[64] Grenda J M Androulakis I P Dean A M Green W H Application ofComputational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysisof Methane Ind Eng Chem Res 2003 42 1000ndash1010

35

[65] Horn R A Johnson C R Matrix Analysis Cambridge University Press Cam-bridge 1990

36

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 21 Kinetic Modeling from a Network Perspective
    • 22 Uncertainty Quantification in Kinetic Modeling
    • 23 Overview of the KiNetX Meta-Algorithm
    • 24 The KiNetX Workflow
      • 241 Uncertainty Propagation
      • 242 Network Reduction
      • 243 Sensitivity Analysis
        • 25 KiNetX-Guided Network Exploration
        • 26 Chemistry-Mimicking Networks from AutoNetGen
          • 3 Results and Discussion
            • 31 Exemplary KiNetX Workflow
            • 32 Relevance of Uncertainty Propagation
              • 4 Conclusions and Outlook
Page 32: Mechanism Deduction from Noisy Chemical Reaction Networks

[15] Wang L-P McGibbon R T Pande V S Martinez T J Automated Discov-ery and Refinement of Reactive Molecular Dynamics Pathways J Chem TheoryComput 2016 12 638ndash649

[16] Simm G N Reiher M Context-Driven Exploration of Complex Chemical ReactionNetworks J Chem Theory Comput 2017 13 6108ndash6119

[17] Doumlntgen M Schmalz F Kopp W A Kroumlger L C Leonhard K AutomatedChemical Kinetic Modeling via Hybrid Reactive Molecular Dynamics and QuantumChemistry Simulations J Chem Inf Model 2018 58 1343ndash1355

[18] Dewyer A L Arguumlelles A J Zimmerman P M Methods for Exploring ReactionSpace in Molecular Systems WIREs Comput Mol Sci 2018 8 e1354

[19] Frederiksen S L Jacobsen K W Brown K S Sethna J P Bayesian EnsembleApproach to Error Estimation of Interatomic Potentials Phys Rev Lett 2004 93165501

[20] Mortensen J J Kaasbjerg K Frederiksen S L Noslashrskov J K Sethna J PJacobsen K W Bayesian Error Estimation in Density-Functional Theory PhysRev Lett 2005 95 216401

[21] Petzold V Bligaard T Jacobsen K W Construction of New Electronic DensityFunctionals with Error Estimation Through Fitting Top Catal 2012 55 402ndash417

[22] Wellendorff J Lundgaard K T Moslashgelhoslashj A Petzold V Landis D DNoslashrskov J K Bligaard T Jacobsen K W Density Functionals for SurfaceScience Exchange-Correlation Model Development with Bayesian Error EstimationPhys Rev B 2012 85 235149

[23] Medford A J Wellendorff J Vojvodic A Studt F Abild-Pedersen FJacobsen K W Bligaard T Noslashrskov J K Assessing the Reliability of CalculatedCatalytic Ammonia Synthesis Rates Science 2014 345 197ndash200

[24] Pandey M Jacobsen K W Heats of Formation of Solids with Error EstimationThe mBEEF Functional with and without Fitted Reference Energies Phys Rev B2015 91 235201

[25] Simm G N Reiher M Systematic Error Estimation for Chemical Reaction En-ergies J Chem Theory Comput 2016 12 2762ndash2773

[26] Proppe J Husch T Simm G N Reiher M Uncertainty Quantification forQuantum Chemical Models of Complex Reaction Networks Faraday Discuss 2016195 497ndash520

32

[27] Pernot P The Parameter Uncertainty Inflation Fallacy J Chem Phys 2017 147104102

[28] Simm G N Reiher M Error-Controlled Exploration of Chemical Reaction Net-works with Gaussian Processes J Chem Theory Comput 2018 14 5238ndash5248

[29] Bishop C M Pattern Recognition and Machine Learning Springer New York2006

[30] Althorpe S C et al Fundamentals General Discussion Faraday Discuss 2016195 139ndash169

[31] Althorpe S C et al Non-Adiabatic Reactions General Discussion Faraday Dis-cuss 2016 195 311ndash344

[32] Angulo G et al New Methods General Discussion Faraday Discuss 2016 195521ndash556

[33] Althorpe S et al Application to Large Systems General Discussion Faraday Dis-cuss 2016 195 671ndash698

[34] Turaacutenyi T Tomlin A S Analysis of Kinetic Reaction Mechanisms SpringerBerlin 2014

[35] Kee R J Miller J A Jefferson T H ldquoCHEMKIN A General-Purpose Problem-Independent Transportable FORTRAN Chemical Kinetics Code Packagerdquo Tech-nical Report SANDndash80-8003 Sandia National Labs Livermore CA United States1980

[36] Kee R J Rupley F M Miller J A ldquoChemkin-II A Fortran Chemical KineticsPackage for the Analysis of Gas-Phase Chemical Kineticsrdquo Technical Report SAND-89-8009 Sandia National Labs Livermore CA United States 1989

[37] Kee R J Rupley F M Meeks E Miller J A ldquoCHEMKIN-III A FORTRANChemical Kinetics Package for the Analysis of Gas-Phase Chemical and PlasmaKineticsrdquo Technical Report SAND-96-8216 Sandia National Labs Livermore CAUnited States 1996

[38] Goodwin D G Moffat H K Speth R L ldquoCantera An Object-Oriented SoftwareToolkit for Chemical Kinetics Thermodynamics and Transport Processesrdquo Version230 DOI 105281zenodo170284 httpwwwcanteraorg2017

33

[39] Hoops S Sahle S Gauges R Lee C Pahle J Simus N Singhal M Xu LMendes P Kummer U COPASImdasha COmplex PAthway SImulator Bioinformatics2006 22 3067ndash3074

[40] Gillespie D T Stochastic Simulation of Chemical Kinetics Annu Rev Phys Chem2007 58 35ndash55

[41] Glowacki D R Liang C-H Morley C Pilling M J Robertson S H MES-MER An Open-Source Master Equation Solver for Multi-Energy Well Reactions JPhys Chem A 2012 116 9545ndash9560

[42] Shannon R Glowacki D R A Simple ldquoBoxed Molecular Kineticsrdquo Approach ToAccelerate Rare Events in the Stochastic Kinetic Master Equation J Phys ChemA 2018 122 1531ndash1541

[43] Glowacki D R Lockhart J Blitz M A Klippenstein S J Pilling M JRobertson S H Seakins P W Interception of Excited Vibrational QuantumStates by O2 in Atmospheric Association Reactions Science 2012 337 1066ndash1069

[44] Glowacki D R Liang C H Marsden S P Harvey J N Pilling M J AlkeneHydroboration Hot Intermediates That React While They Are Cooling J AmChem Soc 2010 132 13621ndash13623

[45] Goldman L M Glowacki D R Carpenter B K Nonstatistical Dynamics inUnlikely Places [15] Hydrogen Migration in Chemically Activated CyclopentadieneJ Am Chem Soc 2011 133 5312ndash5318

[46] ldquoSCINE Software for Chemical Interaction Networksrdquo ETH Zuumlrich Zuumlrich Switzer-land httpwwwreiherethzchsoftwarescinehtml

[47] Grimmett G Stirzaker D Probability and Random Processes Oxford UniversityPress New York 2nd ed 1992

[48] Kirk P D W Stumpf M P H Gaussian Process Regression BootstrappingExploring the Effects of Uncertainty in Time Course Data Bioinformatics 200925 1300ndash1306

[49] Sutton J E Guo W Katsoulakis M A Vlachos D G Effects of Corre-lated Parameters and Uncertainty in Electronic-Structure-Based Chemical KineticModelling Nat Chem 2016 8 331ndash337

[50] Ramakrishnan R Dral P O Rupp M von Lilienfeld O A Quantum Chem-istry Structures and Properties of 134 Kilo Molecules Sci Data 2014 1 140022

34

[51] Pernot P Cailliez F A Critical Review of Statistical CalibrationPrediction Mod-els Handling Data Inconsistency and Model Inadequacy AIChE J 2017 63 4642ndash4665

[52] Proppe J Reiher M Reliable Estimation of Prediction Uncertainty for Physico-chemical Property Models J Chem Theory Comput 2017 13 3297ndash3317

[53] Eyring H The Activated Complex in Chemical Reactions J Chem Phys 19353 107ndash115

[54] Meijer E Matrix Algebra for Higher Order Moments Linear Algebra Appl 2005410 112ndash134

[55] ldquoMATLABrdquo Release 2018a The MathWorks Inc Natick MA United States 2018

[56] Shampine L Reichelt M The MATLAB ODE Suite SIAM J Sci Comput 199718 1ndash22

[57] Morris M D Factorial Sampling Plans for Preliminary Computational ExperimentsTechnometrics 1991 33 161ndash174

[58] Besora M Maseras F Microkinetic Modeling in Homogeneous Catalysis WIREsComput Mol Sci 2018 e1372 DOI 101002wcms1372

[59] Rasmussen C E Williams C K I Gaussian Processes for Machine LearningThe MIT Press Cambridge MA 2006

[60] Susnow R G Dean A M Green W H Peczak P Broadbelt L J Rate-BasedConstruction of Kinetic Models for Complex Systems J Phys Chem A 1997 1013731ndash3740

[61] Han K Green W H West R H On-the-Fly Pruning for Rate-Based ReactionMechanism Generation Comput Chem Eng 2017 100 1ndash8

[62] Song J Building Robust Chemical Reaction Mechanisms Next Generation of Au-tomatic Model Construction Software Doctoral thesis Massachusetts Institute ofTechnology 2004

[63] Jalan A West R H Green W H An Extensible Framework for CapturingSolvent Effects in Computer Generated Kinetic Models J Phys Chem B 2013117 2955ndash2970

[64] Grenda J M Androulakis I P Dean A M Green W H Application ofComputational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysisof Methane Ind Eng Chem Res 2003 42 1000ndash1010

35

[65] Horn R A Johnson C R Matrix Analysis Cambridge University Press Cam-bridge 1990

36

  • 1 Introduction
  • 2 Theoretical and Algorithmic Details
    • 21 Kinetic Modeling from a Network Perspective
    • 22 Uncertainty Quantification in Kinetic Modeling
    • 23 Overview of the KiNetX Meta-Algorithm
    • 24 The KiNetX Workflow
      • 241 Uncertainty Propagation
      • 242 Network Reduction
      • 243 Sensitivity Analysis
        • 25 KiNetX-Guided Network Exploration
        • 26 Chemistry-Mimicking Networks from AutoNetGen
          • 3 Results and Discussion
            • 31 Exemplary KiNetX Workflow
            • 32 Relevance of Uncertainty Propagation
              • 4 Conclusions and Outlook
Page 33: Mechanism Deduction from Noisy Chemical Reaction Networks

[27] Pernot P The Parameter Uncertainty Inflation Fallacy J Chem Phys 2017 147104102

[28] Simm G N Reiher M Error-Controlled Exploration of Chemical Reaction Net-works with Gaussian Processes J Chem Theory Comput 2018 14 5238ndash5248

[29] Bishop C M Pattern Recognition and Machine Learning Springer New York2006

[30] Althorpe S C et al Fundamentals General Discussion Faraday Discuss 2016195 139ndash169

[31] Althorpe S C et al Non-Adiabatic Reactions General Discussion Faraday Dis-cuss 2016 195 311ndash344

[32] Angulo G et al New Methods General Discussion Faraday Discuss 2016 195521ndash556

[33] Althorpe S et al Application to Large Systems General Discussion Faraday Dis-cuss 2016 195 671ndash698

[34] Turaacutenyi T Tomlin A S Analysis of Kinetic Reaction Mechanisms SpringerBerlin 2014

[35] Kee R J Miller J A Jefferson T H ldquoCHEMKIN A General-Purpose Problem-Independent Transportable FORTRAN Chemical Kinetics Code Packagerdquo Tech-nical Report SANDndash80-8003 Sandia National Labs Livermore CA United States1980

[36] Kee R J Rupley F M Miller J A ldquoChemkin-II A Fortran Chemical KineticsPackage for the Analysis of Gas-Phase Chemical Kineticsrdquo Technical Report SAND-89-8009 Sandia National Labs Livermore CA United States 1989

[37] Kee R J Rupley F M Meeks E Miller J A ldquoCHEMKIN-III A FORTRANChemical Kinetics Package for the Analysis of Gas-Phase Chemical and PlasmaKineticsrdquo Technical Report SAND-96-8216 Sandia National Labs Livermore CAUnited States 1996

[38] Goodwin D G Moffat H K Speth R L ldquoCantera An Object-Oriented SoftwareToolkit for Chemical Kinetics Thermodynamics and Transport Processesrdquo Version230 DOI 105281zenodo170284 httpwwwcanteraorg2017

33

[39] Hoops S Sahle S Gauges R Lee C Pahle J Simus N Singhal M Xu LMendes P Kummer U COPASImdasha COmplex PAthway SImulator Bioinformatics2006 22 3067ndash3074

[40] Gillespie D T Stochastic Simulation of Chemical Kinetics Annu Rev Phys Chem2007 58 35ndash55

[41] Glowacki D R Liang C-H Morley C Pilling M J Robertson S H MES-MER An Open-Source Master Equation Solver for Multi-Energy Well Reactions JPhys Chem A 2012 116 9545ndash9560

[42] Shannon R Glowacki D R A Simple ldquoBoxed Molecular Kineticsrdquo Approach ToAccelerate Rare Events in the Stochastic Kinetic Master Equation J Phys ChemA 2018 122 1531ndash1541

[43] Glowacki D R Lockhart J Blitz M A Klippenstein S J Pilling M JRobertson S H Seakins P W Interception of Excited Vibrational QuantumStates by O2 in Atmospheric Association Reactions Science 2012 337 1066ndash1069

[44] Glowacki D R Liang C H Marsden S P Harvey J N Pilling M J AlkeneHydroboration Hot Intermediates That React While They Are Cooling J AmChem Soc 2010 132 13621ndash13623

[45] Goldman L M Glowacki D R Carpenter B K Nonstatistical Dynamics inUnlikely Places [15] Hydrogen Migration in Chemically Activated CyclopentadieneJ Am Chem Soc 2011 133 5312ndash5318

[46] ldquoSCINE Software for Chemical Interaction Networksrdquo ETH Zuumlrich Zuumlrich Switzer-land httpwwwreiherethzchsoftwarescinehtml

[47] Grimmett G Stirzaker D Probability and Random Processes Oxford UniversityPress New York 2nd ed 1992

[48] Kirk P D W Stumpf M P H Gaussian Process Regression BootstrappingExploring the Effects of Uncertainty in Time Course Data Bioinformatics 200925 1300ndash1306

[49] Sutton J E Guo W Katsoulakis M A Vlachos D G Effects of Corre-lated Parameters and Uncertainty in Electronic-Structure-Based Chemical KineticModelling Nat Chem 2016 8 331ndash337

[50] Ramakrishnan R Dral P O Rupp M von Lilienfeld O A Quantum Chem-istry Structures and Properties of 134 Kilo Molecules Sci Data 2014 1 140022

34

[51] Pernot, P.; Cailliez, F. A Critical Review of Statistical Calibration/Prediction Models Handling Data Inconsistency and Model Inadequacy. AIChE J. 2017, 63, 4642–4665.

[52] Proppe, J.; Reiher, M. Reliable Estimation of Prediction Uncertainty for Physicochemical Property Models. J. Chem. Theory Comput. 2017, 13, 3297–3317.

[53] Eyring, H. The Activated Complex in Chemical Reactions. J. Chem. Phys. 1935, 3, 107–115.

[54] Meijer, E. Matrix Algebra for Higher Order Moments. Linear Algebra Appl. 2005, 410, 112–134.

[55] "MATLAB", Release 2018a, The MathWorks, Inc., Natick, MA, United States, 2018.

[56] Shampine, L.; Reichelt, M. The MATLAB ODE Suite. SIAM J. Sci. Comput. 1997, 18, 1–22.

[57] Morris, M. D. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 1991, 33, 161–174.

[58] Besora, M.; Maseras, F. Microkinetic Modeling in Homogeneous Catalysis. WIREs Comput. Mol. Sci. 2018, e1372, DOI: 10.1002/wcms.1372.

[59] Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, 2006.

[60] Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101, 3731–3740.

[61] Han, K.; Green, W. H.; West, R. H. On-the-Fly Pruning for Rate-Based Reaction Mechanism Generation. Comput. Chem. Eng. 2017, 100, 1–8.

[62] Song, J. Building Robust Chemical Reaction Mechanisms: Next Generation of Automatic Model Construction Software. Doctoral thesis, Massachusetts Institute of Technology, 2004.

[63] Jalan, A.; West, R. H.; Green, W. H. An Extensible Framework for Capturing Solvent Effects in Computer Generated Kinetic Models. J. Phys. Chem. B 2013, 117, 2955–2970.

[64] Grenda, J. M.; Androulakis, I. P.; Dean, A. M.; Green, W. H. Application of Computational Kinetic Mechanism Generation to Model the Autocatalytic Pyrolysis of Methane. Ind. Eng. Chem. Res. 2003, 42, 1000–1010.

[65] Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, 1990.
