Stochastic gradient based extreme learning machines for stable online learning of advanced combustion engines

Vijay Manikandan Janakiraman a,*, XuanLong Nguyen b, Dennis Assanis c

a Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA
b Department of Statistics, University of Michigan, Ann Arbor, MI, USA
c Stony Brook University, NY, USA
Article history: Received 6 May 2015; Received in revised form 1 November 2015; Accepted 7 November 2015

Keywords: Online learning; Extreme learning machine; System identification; Lyapunov stability; Engine control; Operating envelope
* Corresponding author. Presently at: NASA Ames Research Center, MS 269-1, Moffett Field, CA 94035-1000, USA. Tel.: +1 734 358 6633. E-mail address: [email protected] (V.M. Janakiraman).
Abstract

We propose and develop SG-ELM, a stable online learning algorithm based on stochastic gradients and Extreme Learning Machines (ELM). We propose SG-ELM particularly for systems that are required to be stable during learning, i.e., the estimated model parameters must remain bounded while learning proceeds. We use a Lyapunov approach to prove both asymptotic stability of the estimation error and boundedness of the model parameters, suitable for identification of nonlinear dynamic systems. Using the Lyapunov approach, we determine an upper bound for the learning rate of SG-ELM. The SG-ELM algorithm not only guarantees stable learning but also reduces the computational demand compared to the recursive least squares based OS-ELM algorithm (Liang et al., 2006). To demonstrate the working of SG-ELM on a real-world problem, an advanced combustion engine identification task is considered. The algorithm is applied to two case studies: an online regression learning problem for system identification of a Homogeneous Charge Compression Ignition (HCCI) engine, and an online classification learning problem (with class imbalance) for identifying its dynamic operating envelope. The case studies demonstrate that the accuracy of the proposed SG-ELM is comparable to that of the OS-ELM approach while adding stability and a reduction in computational effort.

© 2015 Elsevier B.V. All rights reserved.
1. Introduction
Homogeneous Charge Compression Ignition (HCCI) engines are of significant interest to the automotive industry owing to their ability to reduce emissions and fuel consumption significantly compared to existing technologies such as spark ignition (SI) and compression ignition (CI) engines [1–3]. Although HCCI engines tend to do well in laboratory controlled tests, practical implementation is quite challenging because HCCI engines do not have a direct trigger for ignition (such as the spark in SI or fuel injection in CI). Further, HCCI requires special engine designs such as exhaust gas recirculation (EGR) [4], variable valve timing (VVT) [5] and intake charge heating [6], among others. Such advanced designs also increase the complexity of the engine operation, making it unstable and extremely sensitive to operational disturbances [7,8]. Model based control is typically adopted to address the challenges involved in controlling HCCI [9,5,10]. For model development, both physics based approaches [9,5,10] and data
based approaches [11–14] have been shown to be effective. A key requirement for model based control is the ability of the models to accurately predict the engine state variables several operating cycles ahead of time, so that a control action with a known consequence can be applied to the engine. Further, in order to be vigilant against the engine drifting towards instabilities such as misfire, ringing, knock, etc. [15,16], the operating limits of the engine, particularly in transients, are required to be known. In order to develop controllers and operate the engine in a stable manner, both models of the engine state variables and of the operating envelope are necessary.
Data based modeling approaches for the HCCI engine state variables and dynamic operating envelope were demonstrated by the authors using neural networks [11], support vector machines [12] and extreme learning machines [13,14]. However, previous research considered an offline approach, where the data collected from engine experiments were taken offline and models were developed on computer workstations with ample processing power and memory. A key requirement in advancing HCCI modeling is to perform online learning, for the following reasons. The models developed offline are valid only in the controlled experimental conditions. For instance, the experiments are performed at a controlled ambient temperature, pressure and
humidity conditions. As a result, the models are valid only for the specified conditions; when the models are implemented on a vehicle, the expectation is that the model works over the wide range of climatic conditions the vehicle is exposed to, possibly including conditions that were never part of the experiments. Hence, an online adaptation that learns the behavior of the system in new or unfamiliar situations is required. Also, since the offline models are developed directly from experimental data, they may perform poorly in certain operating regions where the density of experimental data is low. As more data becomes available in such regions, an online mechanism can be used to adapt to it. In addition, the engine produces high velocity streaming data: operating at about 2500 revolutions per minute, an in-cylinder pressure sensor can produce about 1.8 million data observations per day (a four-stroke engine fires once every two revolutions, so 1250 combustion cycles per minute amounts to 1250 × 60 × 24 ≈ 1.8 million cycles per day). It becomes infeasible to store this volume of data for offline model development. Thus, an online learning framework that processes every data observation, updates the model and then discards the data is required for advanced engines like HCCI.
Online learning, as the name suggests, refers to obtaining a model online; i.e., learning happens while the system is in operation, as data streams in. Typically, the learning is sequential; i.e., the data from the system is processed one-by-one or batch-by-batch and the model parameters are updated. A data processor on-board a combustion engine usually has little computational power and memory. Thus, simple linear models used to be the natural choice for combustion engines. However, for a system like the HCCI engine, linear models may be insufficient to capture the complex dynamics, particularly for predicting several steps ahead in time [11]. While numerous nonlinear methods for online learning exist in the machine learning literature, a complete survey is beyond the scope of this paper. The recent paper on online sequential extreme learning machines (OS-ELM) [17] surveys popular online learning algorithms in the context of classification and regression and develops an efficient algorithm based on recursive least squares. The OS-ELM algorithm appears to be the present state of the art (although some variants have been proposed, such as [18–20]) for classification/regression problems, achieving a globally optimal solution and high generalization accuracy and, most importantly, doing so quickly. Also, based on observations from our previous work [21], we choose extreme learning machines (ELM) over other popular methods such as neural networks and support vector machines for the HCCI engine problem. It has been shown that both polynomial and linear methods are inferior in terms of prediction accuracy [12,11], although their simple algorithms are suitable for online applications. The online variants of SVM usually work by approximating the batch (offline) loss function so that data can be processed sequentially [22,23], and achieve accuracies similar to those of their offline counterparts. However, SVMs come with high computation and memory requirements that make them hard to use efficiently on a memory limited system such as the engine control unit [13]. Thus we prefer ELM over SVM and other state of the art nonlinear models.
In spite of its known advantages, an over-parameterized ELM may suffer from an ill-conditioning problem when a recursive least squares type update is performed (as in OS-ELM). This sometimes results in poor regularization behavior, as reported in [24,25,20,26,27], which leads to unbounded growth of the model parameters and unbounded model predictions. This may not be a serious problem for many applications, as the model usually improves as more data becomes available. However, for control problems in particular, if decisions are made simultaneously based on the online learned model (as in adaptive control [28]), it is critical that the parameter estimation algorithm behaves in a stable manner so that control actions can be trusted at all times. Hence a guarantee of stability and parameter boundedness is of extreme importance. To address this issue, we propose the
SG-ELM, a stable online learning algorithm based on stochastic gradient descent and extreme learning machines. By extending ELM to include a notion of stable learning, we hope to retain the simplicity and generalization power of ELM along with stability of identification, suitable for real-time control applications. We use a Lyapunov approach to prove both asymptotic stability of the estimation error and boundedness of the estimated parameters, suitable for identification of nonlinear dynamic systems. Using the Lyapunov approach, we determine an upper bound for the learning rate of SG-ELM that appears to avoid the poor regularization that may arise during online learning. These are the main contributions of this paper. Further, we also apply the SG-ELM algorithm to two real-world HCCI identification problems, including online state estimation and online operating boundary estimation, the latter being a novel application of online extreme learning machines.
The remainder of the article is organized as follows. The ELM modeling approach is described in Section 2, along with algorithm details on batch (offline) learning as well as the present state of the art, the OS-ELM algorithm. In Section 3, the stochastic gradient based ELM algorithm is derived along with a stability proof. In Section 4, the background on the HCCI engine and experimentation is discussed. Sections 5 and 6 cover the application of the SG-ELM algorithm to the two case studies, followed by conclusions in Section 7.
2. Extreme learning machines
Extreme Learning Machine (ELM) is an emerging learning paradigm for multi-class classification and regression problems [29,30]. An advantage of the ELM method is that training is extremely fast, thanks to the random assignment of the input layer parameters, which do not require adaptation to the data. In such a setup, the output layer parameters can be determined analytically using a linear least squares approach. Attractive features of ELM [29] include its universal approximation capability, a convex optimization problem that attains the smallest training error without getting trapped in local minima, a closed form solution that eliminates iterative training, and good generalization capability [30]. In comparison, a backpropagation neural network has the same objective function as ELM but often gets trapped in local minima, whereas ELM does not. Support vector machines, on the other hand, solve a convex optimization problem, but the computation involved is quite high and running times are slow for large data sets. Thus, ELM is very efficient both in terms of accuracy and running time compared to several state-of-the-art algorithms.
Consider the following data set

\{(x_1, y_1), \ldots, (x_N, y_N)\} \in (\mathcal{X}, \mathcal{Y}), \qquad (1)

where N denotes the number of training samples, \mathcal{X} denotes the space of the input features and \mathcal{Y} denotes the labels, whose nature differentiates the learning problem at hand. For instance, if Y takes integer values \{1, 2, 3, \ldots\} then the problem is referred to as classification, and if Y takes real values, it becomes a regression problem. ELMs are well suited to solving both regression and classification problems faster than state of the art algorithms [30]. A further distinction can be made depending on the availability of training data during the learning process: offline learning (or batch learning) versus online learning (or sequential learning). Offline learning can make use of all training data simultaneously, as all data is available to the algorithm and time is generally not a limiting factor, so it is possible to have the model see the data several times (iterations) so that the best accuracy can be
achieved. On the other hand, there may be situations where offline learning becomes infeasible and one has to resort to online learning, such as those involving high velocity streaming data where the time taken for learning becomes a bottleneck. In an online learning setting, data is available one-by-one or batch-by-batch and needs to be processed with limited computational effort and storage. Further, inference is required to be made with each newly available data observation along with the ones recorded in the past.
2.1. Batch (Offline) ELM
When the entire data set is available and a model is required to be trained using all of it, batch learning is adopted. In this case, the ELM algorithm involves solving the following optimization problem, similar to that of ridge regression:

\min_W \{ \|HW - Y\|^2 + \lambda \|W\|^2 \} \qquad (2)

H^T = \psi(W_r^T x(k) + b_r) \in R^{n_h \times 1}, \qquad (3)

where λ represents the regularization coefficient determined using cross-validation, Y represents the vector of labels, ψ represents the hidden layer activation function (sigmoidal, sinusoidal, radial basis, etc. [30]) and W_r ∈ R^{n×n_h}, W ∈ R^{n_h×y_d} represent the input and output layer parameters respectively. Here, n represents the dimension of the inputs x(k), n_h represents the number of hidden neurons of the ELM model, H represents the hidden layer output matrix and y_d represents the dimension of the outputs Y. The matrix W_r consists of randomly assigned values that map the input vector to a high dimensional feature space, while b_r ∈ R^{n_h} is a bias component assigned in a random manner similar to W_r. The number of hidden neurons determines the expressive power of the transformed feature space. The values of W_r and b_r can be assigned based on any continuous random distribution [30] and remain fixed during the learning process. Hence the training reduces to the single step calculation given by Eq. (4). The ELM decision hypothesis can be expressed as in Eq. (5) for classification and as in Eq. (6) for regression. It should be noted that the hidden layer and the corresponding activation functions give a nonlinear mapping of the data; if this mapping is eliminated, the ELM model becomes a linear least squares (Linear LS) model, which is considered as one of the baseline models in this study.

W^* = (H^T H + \lambda I)^{-1} H^T Y \qquad (4)

f(x) = \mathrm{sgn}(W^T [\psi(W_r^T x + b_r)]) \qquad (5)

f(x) = W^T [\psi(W_r^T x + b_r)] \qquad (6)

Since training involves solving a linear least squares problem with a convex objective function, the solution obtained by ELM is extremely fast and is a global optimum for the chosen n_h, W_r and b_r. The above formulation for classification (5) is not designed to handle imbalanced or skewed data sets [13]. As a modification to weigh the minority class data more, a simple weighting method can be incorporated in the ELM objective function (2) as

\min_W \{ (HW - Y)^T \Gamma (HW - Y) + \lambda W^T W \} \qquad (7)

\Gamma = \mathrm{diag}(\gamma_1, \gamma_2, \ldots, \gamma_N), \qquad \gamma_i = \begin{cases} 1 & \text{majority class data} \\ r \cdot f_s & \text{minority class data} \end{cases} \qquad (8)
where Γ represents the weight matrix, r represents the ratio of the number of majority class data to the number of minority class data, and f_s represents a scaling factor to be tuned for a given data set [13]. This results in the training step given by Eq. (9), with the decision hypothesis taking the same form as in Eq. (5):

W^* = (H^T \Gamma H + \lambda I)^{-1} H^T \Gamma Y. \qquad (9)
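To make the batch formulation concrete, the following is a minimal numpy sketch of Eqs. (2)–(9): a random sigmoidal feature map, the ridge solution of Eq. (4), and the class-weighted variant of Eq. (9). The function names and the uniform sampling of W_r and b_r are our own illustrative choices, not from the paper; as noted above, any continuous random distribution would do.

```python
import numpy as np

def elm_features(X, Wr, br):
    """Hidden layer output H (Eq. (3)) with a sigmoidal activation psi."""
    return 1.0 / (1.0 + np.exp(-(X @ Wr + br)))

def train_batch_elm(X, Y, n_hidden, lam, sample_weights=None, seed=0):
    """Batch ELM (Eq. (4)); weighted variant (Eq. (9)) when weights are given."""
    rng = np.random.default_rng(seed)
    Wr = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random, then fixed
    br = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = elm_features(X, Wr, br)                                # N x n_h
    if sample_weights is None:                                 # Eq. (4)
        A = H.T @ H + lam * np.eye(n_hidden)
        B = H.T @ Y
    else:                                                      # Eq. (9), Gamma diagonal
        G = np.asarray(sample_weights)[:, None]
        A = H.T @ (G * H) + lam * np.eye(n_hidden)
        B = H.T @ (G * Y)
    W = np.linalg.solve(A, B)                                  # output layer weights
    return Wr, br, W

# Regression usage (Eq. (6)): Y_hat = elm_features(X, Wr, br) @ W
# Classification (Eq. (5)):   labels = np.sign(elm_features(X, Wr, br) @ W)
```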
2.2. Online Sequential ELM (OS-ELM)
The OS-ELM [17] is a recursive version of the batch ELM algorithm. This version is used for online learning, where data is processed one-by-one or batch-by-batch, the model parameters are updated, and the data is then no longer required to be stored. Training involves two steps: an initialization step and a sequential learning step. During the initialization step, a set of N_0 data observations is used to initialize H_0 and W_0 by solving the batch ELM optimization problem

\min_{W_0} \{ \|H_0 W_0 - Y_0\|^2 + \lambda \|W_0\|^2 \} \qquad (10)

H_0 = [\psi(W_r^T x_0 + b_r)]^T \in R^{N_0 \times n_h}. \qquad (11)

The solution W_0 is given by

W_0 = K_0^{-1} H_0^T Y_0, \qquad (12)

where K_0 = H_0^T H_0 + \lambda I. Given another new data observation x_1, the problem becomes

\min_{W_1} \left\| \begin{bmatrix} H_0 \\ H_1 \end{bmatrix} W_1 - \begin{bmatrix} Y_0 \\ Y_1 \end{bmatrix} \right\|^2. \qquad (13)

The solution can be derived as

W_1 = W_0 + K_1^{-1} H_1^T (Y_1 - H_1 W_0), \qquad K_1 = K_0 + H_1^T H_1.

Based on the above, a generalized recursive algorithm for updating the least-squares solution can be computed as follows [17]:

M_{k+1} = M_k - M_k H_{k+1}^T (I + H_{k+1} M_k H_{k+1}^T)^{-1} H_{k+1} M_k \qquad (14)

W_{k+1} = W_k + M_{k+1} H_{k+1}^T (Y_{k+1} - H_{k+1} W_k), \qquad (15)

where M represents the covariance of the parameter estimate.
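As a reference point for the computational comparisons later in the paper, here is a minimal numpy sketch of the OS-ELM recursion of Eqs. (10)–(15). The function names are ours; H_k is assumed to be the hidden layer output of Eq. (3) evaluated on the new chunk of data.

```python
import numpy as np

def os_elm_init(H0, Y0, lam):
    """Initialization (Eqs. (10)-(12)): batch solve on the first N0 samples."""
    nh = H0.shape[1]
    M = np.linalg.inv(H0.T @ H0 + lam * np.eye(nh))   # M0 = K0^-1
    W = M @ (H0.T @ Y0)                               # Eq. (12)
    return W, M

def os_elm_update(W, M, Hk, Yk):
    """One recursive least squares step (Eqs. (14)-(15)) on a chunk (Hk, Yk)."""
    S = np.eye(Hk.shape[0]) + Hk @ M @ Hk.T           # I + H M H^T
    M = M - M @ Hk.T @ np.linalg.solve(S, Hk @ M)     # Eq. (14)
    W = W + M @ Hk.T @ (Yk - Hk @ W)                  # Eq. (15)
    return W, M
```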
3. Stochastic gradient based ELM algorithm
In this section, we propose an extension of the ELM algorithm for online learning using stochastic gradient descent (SGD). Stochastic gradient descent methods have been popular for several decades for online learning, but have been limited in practice by poor optimization characteristics (failure to converge to an absolute minimum, for instance) and slow convergence rates. Only recently has the asymptotic behavior of SGD methods been analyzed, indicating that SGD can be very powerful for learning from large data sets [31,32]. SGD based algorithms have been developed successfully for perceptron models, K-means, SVM and Lasso [31]. In this work, we apply the simple and scalable SGD algorithm to extreme learning machines and derive stability properties, so that an online learning algorithm useful for control purposes can be developed.
The justification of SGD based algorithms in machine learning can be briefly summarized as follows. In any learning problem, three types of errors are encountered, namely the approximation error, the estimation error and the optimization error [31], and the
expected risk E_exp(f) and the empirical risk E_emp(f) for a supervised learning problem can be given by

E_{exp}(f) = \int l(f(x), y) \, dP(x, y)

E_{emp}(f) = \frac{1}{N} \sum_{i=1}^{N} l(f(x_i), y_i),

where l(f(x), y) denotes the loss function between the prediction f(x) and the label y, and P(x, y) denotes the joint probability density of x and y. Let f^* = \arg\min_f E_{exp}(f) be the best possible prediction function. In practice, the prediction function is chosen from a family of parametric functions denoted by \mathcal{F}. Let f^*_{\mathcal{F}} = \arg\min_{f \in \mathcal{F}} E_{exp}(f) be the best prediction function chosen from the parameterized family \mathcal{F}. When a finite training data set becomes available, the empirical risk becomes a proxy for the expected risk for the learning problem [33]. Let \hat{f}_{\mathcal{F}} = \arg\min_{f \in \mathcal{F}} E_{emp}(f) be the solution that minimizes the empirical risk. However, this global solution is typically not obtained because of computational limitations, and hence the solution of the learning problem is reduced to finding an approximate minimizer \tilde{f}_{\mathcal{F}} of E_{emp}(f).

In the above setup, the approximation error (E_app) is the error introduced by approximating the true function space with the family \mathcal{F}, the estimation error (E_est) is the error introduced by optimizing over E_{emp}(f) instead of E_{exp}(f), and the optimization error (E_opt) is the error induced by stopping the optimization at \tilde{f}_{\mathcal{F}}. The total error E_tot can be expressed as

E_{app} = E_{exp}(f^*) - E_{exp}(f^*_{\mathcal{F}})
E_{est} = E_{exp}(f^*_{\mathcal{F}}) - E_{emp}(\hat{f}_{\mathcal{F}})
E_{opt} = E_{emp}(\hat{f}_{\mathcal{F}}) - E_{emp}(\tilde{f}_{\mathcal{F}})
E_{tot} = E_{app} + E_{est} + E_{opt}

The following observations are drawn from the asymptotic analysis of SGD algorithms [31,34].
1. The empirical risk E_emp(f) is only a surrogate for the expected risk E_exp(f), and hence increased effort to minimize E_opt may not translate to better learning. In fact, if E_opt is driven very low, there is a good chance that the prediction function will over-fit the training data.
2. SGD methods are among the worst optimization algorithms (in terms of reducing E_opt), but they minimize the expected risk relatively quickly. Therefore, in the large scale setting, when the limiting factor is computational time rather than the number of examples, SGD algorithms perform asymptotically better.
3. SGD achieves faster convergence when the loss function has strong convexity properties.
The last observation is key to developing our algorithm based on ELM models. The ELM models have a squared loss function, and when the hidden neurons are randomly assigned and fixed, training reduces to solving a convex optimization problem. This motivates the use of ELM models for SGD based learning. The SGD based algorithm can be derived for the ELM models as follows.
3.1. SG-ELM parameter update
Let (x_i, y_i), i = 1, 2, ..., N, be the streaming data under consideration. The data may be available to the algorithm from a continuous one-by-one stream, or be sampled one-by-one from a very large data set. Let the ELM empirical risk be defined as

J(W) = \min_W \frac{1}{2} \sum_{i=1}^{N} \|y_i - \phi_i^T W\|^2
     = \min_W \left\{ \frac{1}{2}\|y_1 - \phi_1^T W\|^2 + \cdots + \frac{1}{2}\|y_N - \phi_N^T W\|^2 \right\}
     = \min_W \{ J_1(W) + J_2(W) + \cdots + J_N(W) \}, \qquad (16)

where W ∈ R^{n_h×y_d}, y_i ∈ R^{1×y_d} and \phi_i ∈ R^{n_h×1} is the hidden layer output (see H^T in Eq. (3)). If an error e_i ∈ R^{1×y_d} is defined as (y_i - \phi_i^T W), the learning objective for a data observation i can be given by

J_i(W) = \frac{1}{2} e_i^T e_i
       = \frac{1}{2} (y_i - \phi_i^T W)^T (y_i - \phi_i^T W)
       = \frac{1}{2} y_i^T y_i + \frac{1}{2} W^T \phi_i \phi_i^T W - y_i^T \phi_i^T W

\frac{\partial J_i}{\partial W} = \phi_i \phi_i^T W - \phi_i y_i = \phi_i(\phi_i^T W - y_i) = -\phi_i e_i. \qquad (17)
In a regular gradient descent (GD) algorithm, the gradient of J(W) is used to update the model parameters as follows:

\frac{\partial J}{\partial W} = \frac{\partial J_1}{\partial W} + \frac{\partial J_2}{\partial W} + \cdots + \frac{\partial J_N}{\partial W}
 \;\Rightarrow\; \frac{\partial J}{\partial W} = -\phi_1 e_1 - \phi_2 e_2 - \cdots - \phi_N e_N

W_{k+1} = W_k - \Gamma_{SG} \frac{\partial J}{\partial W} = W_k + \Gamma_{SG}(\phi_1 e_1) + \cdots + \Gamma_{SG}(\phi_N e_N), \qquad (18)

where k is the iteration count and \Gamma_{SG} ∈ R^{n_h×n_h} represents the step size or update gain matrix of the GD algorithm.
It can be seen from Eq. (18) that the parameter matrix W is updated based on gradients calculated from all the available examples. If the number of data observations is large, the gradient calculation can take enormous computational effort. The stochastic gradient descent algorithm instead considers one example at a time and updates W based on the gradient calculated from (x_i, y_i), as shown in

W_{i+1} = W_i + \Gamma_{SG}(\phi_i e_i). \qquad (19)

From Eq. (18), it is clear that the optimal W is a function of gradients calculated from all the examples. As a result, as more data becomes available, W converges close to its optimal value in the SGD algorithm. Processing data one-by-one significantly reduces the computational requirement, making the algorithm scale well to large data sets.
In order to handle class imbalance, the algorithm in (19) can be modified by weighting the minority class data more heavily. The modified algorithm can be expressed as

W_{i+1} = W_i + \Gamma_{imb} \Gamma_{SG}(\phi_i e_i), \qquad (20)

where \Gamma_{imb} = r \cdot f_s, with r the imbalance ratio (a running count of majority class data to minority class data up to that instant) and f_s a scaling factor that is tuned to trade off false positives against missed detections for a given application.
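A minimal numpy sketch of the SG-ELM update of Eqs. (19) and (20) follows; the function name and argument layout are our own. The commented assertion encodes the learning rate bound λ_max(Γ_SG) < 1 derived in Section 3.2 below, and the commented value of Γ_SG is the one tuned in the regression case study of Section 5.

```python
import numpy as np

def sg_elm_update(W, phi, y, Gamma_sg, gamma_imb=1.0):
    """One SG-ELM step (Eqs. (19)-(20)) on a single sample.

    W         : (n_h, y_d) output layer parameters
    phi       : (n_h,)     hidden layer output phi_i for this sample (Eq. (3))
    y         : (y_d,)     target y_i
    Gamma_sg  : (n_h, n_h) step size matrix
    gamma_imb : r * f_s for minority class samples, 1.0 otherwise (Eq. (20))
    """
    e = y - phi @ W                                   # error e_i (Eq. (17))
    return W + gamma_imb * (Gamma_sg @ np.outer(phi, e))

# Stability bound from the Lyapunov analysis (Eq. (30)):
# Gamma_sg = 0.0008 * np.eye(n_h)   # value used in case study 1
# assert np.linalg.eigvalsh(Gamma_sg).max() < 1.0
```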
3.2. Stability analysis
The stability analysis of the SG-ELM algorithm can be derived as follows. The ELM structure makes the analysis simple and similar to that of a linear gradient based algorithm [35].
Table 1. Specifications of the experimental HCCI engine.

Engine type:        4-stroke, in-line
Fuel:               Gasoline
Displacement:       2.0 L
Bore/stroke:        86/86 mm
Compression ratio:  11.25:1
Injection type:     Direct injection
Valvetrain:         Variable valve timing with hydraulic cam phaser; 119° constant duration defined at 0.25 mm lift, 3.5 mm peak lift, and 50° crank angle phasing authority
HCCI strategy:      Exhaust recompression using negative valve overlap

¹ IMEP and NMEP are used interchangeably in this paper; both refer to the net quantity.
The instantaneous prediction error e_i (here the error e and the output y are transposed relative to their definitions in Section 3.1, for ease of derivation) can be expressed in terms of the parametric error \tilde{W} = W^* - W as

e_i = y_i - W^T \phi_i = W^{*T} \phi_i - W^T \phi_i = \tilde{W}^T \phi_i, \qquad (21)

where W^* represents the true model parameters. Further, the parametric error dynamics can be obtained as follows:

\tilde{W}_{i+1} = W^* - W_{i+1} = W^* - W_i - \Gamma_{SG} \phi_i e_i^T = \tilde{W}_i - \Gamma_{SG} \phi_i e_i^T. \qquad (22)

Consider the following positive definite, decrescent and radially unbounded [35] Lyapunov function V:

V(\tilde{W}) = \mathrm{tr}(\tilde{W}^T \Gamma_{SG}^{-1} \tilde{W}), \qquad (23)

where tr denotes the trace of a matrix. Then

\Delta V(\tilde{W}_i) = V(\tilde{W}_{i+1}) - V(\tilde{W}_i)
 = \mathrm{tr}(\tilde{W}_{i+1}^T \Gamma_{SG}^{-1} \tilde{W}_{i+1}) - \mathrm{tr}(\tilde{W}_i^T \Gamma_{SG}^{-1} \tilde{W}_i)
 = \mathrm{tr}((\tilde{W}_i - \Gamma_{SG} \phi_i e_i^T)^T \Gamma_{SG}^{-1} (\tilde{W}_i - \Gamma_{SG} \phi_i e_i^T)) - \mathrm{tr}(\tilde{W}_i^T \Gamma_{SG}^{-1} \tilde{W}_i)
 = \mathrm{tr}(-2 \tilde{W}_i^T \phi_i e_i^T + e_i \phi_i^T \Gamma_{SG} \phi_i e_i^T)
 = \mathrm{tr}(-2 e_i e_i^T + e_i \phi_i^T \Gamma_{SG} \phi_i e_i^T)
 = -2 e_i^T e_i + e_i^T \phi_i^T \Gamma_{SG} \phi_i e_i
 = -e_i^T M_{SG} e_i, \qquad (24)

where M_{SG} = 2 - \phi_i^T \Gamma_{SG} \phi_i. It can be seen that V_{i+1} - V_i \le 0 if M_{SG} > 0, i.e., if 2 - \phi_i^T \Gamma_{SG} \phi_i > 0, or

0 < \lambda_{max}(\Gamma_{SG}) < 2. \qquad (25)

When (25) is satisfied, V(\tilde{W}) \ge 0 is non-increasing in i and the limit

\lim_{i \to \infty} V(\tilde{W}) = V_\infty \qquad (26)

exists. From (24),

V_{i+1} - V_i = -e_i^T M_{SG} e_i, \qquad \sum_{i=0}^{\infty} (V_{i+1} - V_i) = -\sum_{i=0}^{\infty} e_i^T M_{SG} e_i \qquad (27)

\Rightarrow \sum_{i=0}^{\infty} e_i^T M_{SG} e_i = V(0) - V_\infty < \infty. \qquad (28)

Also,

\sum_{i=0}^{\infty} e_i^T I e_i \le \sum_{i=0}^{\infty} e_i^T M_{SG} e_i < \infty \qquad (29)

when M_{SG} > I, i.e., when

\lambda_{max}(\Gamma_{SG}) < 1. \qquad (30)

Hence, when (30) is satisfied, e_i ∈ L_2. From (19), (W_{i+1} - W_i) ∈ L_2 \cap L_\infty. Using the discrete time Barbalat's lemma [36],

\lim_{i \to \infty} e_i = 0 \qquad (31)

\lim_{i \to \infty} W_{i+1} = W_i. \qquad (32)
Hence, the SGD learning law in (19) guarantees that the estimated output ŷ_i converges to the actual output y_i and that the model parameters W converge to constant values. The parameters converge to the true parameters W^* only under conditions of persistence of excitation [35] in the input signals of the system (amplitude and frequency richness of x). Further, using the boundedness of V_i, e_i ∈ L_∞, which guarantees that the online model predictions are bounded as long as the system output is bounded. As the error between the true model and the estimation model converges to zero, the estimation model becomes a one-step ahead predictive model of the nonlinear system. In the next few sections, we evaluate our SG-ELM algorithm on an HCCI engine identification problem.
4. Homogeneous charge compression ignition engine
This section gives an overview of the homogeneous charge compression ignition engine system and experimentation. The engine specifications are listed in Table 1 [11]. HCCI is achieved by auto-ignition of the gas mixture in the cylinder. The fuel is injected early in the intake stroke and given sufficient time to mix with air, forming a homogeneous mixture. A large fraction of exhaust gas from the previous cycle is retained to elevate the temperature and hence the reaction rates of the fuel and air mixture. The variable valve timing capability of the engine enables trapping suitable quantities of exhaust gas in the cylinder.
The engine control knobs include the injected fuel mass (FM in mg/cyc), the crank angle at intake valve opening (IVO), the crank angle at exhaust valve closing (EVC) and the crank angle at start of fuel injection (SOI). The valve events are measured in degrees after exhaust top dead center (deg eTDC), while SOI is measured in degrees after combustion top dead center (deg cTDC). Other important physical variables that influence the performance of HCCI combustion include the intake manifold temperature T_in, intake manifold pressure P_in, mass flow rate of air at intake ṁ_in, exhaust gas temperature T_ex, exhaust manifold pressure P_ex, coolant temperature T_c, and fuel to air ratio (FA). The engine performance metrics are the combustion phasing, indicated by the crank angle at 50% mass fraction burned (CA50), and the combustion work output, given by the net indicated mean effective pressure (IMEP¹). For further reading on HCCI combustion and related variables, please refer to [37].
4.1. Experiment design
In order to perform HCCI engine identification, suitable experiments to obtain dynamic data from the engine need to be designed. The modeled variables, such as the engine states and the operating envelope, are dynamic variables.
[Fig. 1. A subset of the HCCI engine experimental data showing A-PRBS inputs (fuel mass, IVO, EVC, SOI) and engine outputs (IMEP, CA50, Rmax, λ). The misfire regions are shown in dotted rectangles. The data is indexed by combustion cycles.]
In order to capture both transient and steady state behavior, a set of dynamic experiments is conducted at constant rotational speeds and naturally aspirated conditions (no supercharging/turbocharging) by varying FM, IVO, EVC and SOI in a uniformly random manner. At every step change, the engine makes a transition between two set points and the transition is recorded as temporal data. In order to capture several such transients, an amplitude modulated pseudo-random binary sequence (A-PRBS) has been used to design the excitation signals for FM, IVO, EVC and SOI. A-PRBS excites the engine at different amplitudes and frequencies, exploring the operating space of the engine for the identification problem considered in this work.
4.2. HCCI instabilities
A subset of the data collected from the engine is shown in Fig. 1 [12], where it can be observed that for some combinations of the inputs (left panels), the HCCI engine misfires (shown by dotted rectangles). HCCI operation is limited by several phenomena that lead to undesirable engine behavior. As described in [38], the HCCI operating range is constrained to a small region of permissible unburned (pre-combustion) and burned (post-combustion) charge temperature states. Sufficiently high unburned gas temperatures are required to achieve ignition in the HCCI operating range, without which a complete misfire will occur. If the resulting combustion cannot achieve sufficiently high burned gas temperatures, commonly occurring in conditions with low fuel to diluent ratios or late combustion phasing, various degrees of quenching can occur, resulting in reduced work output and increased hydrocarbon and carbon monoxide emissions. Under some conditions, this may lead to high cyclic variation due to the positive feedback loop existing through the trapped residual gas [15,16]. Operation with high burned gas temperature, although stable and commonly reached at higher fueling rates where the fuel to diluent ratio is also high, yields high heat release and thus pressure rise rates that may pose challenges for engine noise and durability constraints. A discussion of the temperatures at which these phenomena occur may be found in [38].
HCCI operation is limited by a combination of the above instabilities, and during transients it may be challenging to respond to instabilities reactively. A proactive means of anticipating such instabilities is to develop a predictive model of the operating envelope of the engine, as discussed in Section 6.
4.3. Learning the HCCI engine data
In the HCCI modeling problem, both the inputs and the outputs of the engine are available as sensor measurements. The HCCI engine is a nonlinear dynamic system and the sensor measurements represent discrete time sequences. The input–output mapping can be modeled using a nonlinear auto regressive model with exogenous input (NARX) [39] as follows:

y(k) = f_{NARX}[u(k-1), \ldots, u(k-n_u), y(k-1), \ldots, y(k-n_y)], \qquad (33)

where u(k) ∈ R^{u_d} and y(k) ∈ R^{y_d} represent the inputs and outputs of the system respectively, k represents the discrete time index, f_NARX(·) represents the nonlinear function mapping specified by the model, n_u and n_y represent the number of past input and output samples required (the order of the system), and u_d and y_d represent the dimensions of the inputs and outputs respectively. Let x represent
the augmented input vector obtained by time-embedding the input and output measurements from the system:

x = [u(k-1), \ldots, u(k-n_u), y(k-1), \ldots, y(k-n_y)]^T. \qquad (34)

The measurement sequence can then be converted to training data of the form

\{(x_1, y_1), \ldots, (x_N, y_N)\} \in (\mathcal{X}, \mathcal{Y}), \qquad (35)

where N denotes the number of training samples and \mathcal{X} denotes the space of the input features (here \mathcal{X} = R^{u_d n_u + y_d n_y}, with \mathcal{Y} = R for regression and \mathcal{Y} = \{+1, -1\} for binary classification). The above conversion of system measurements to training data is natural for a series-parallel model architecture, which can be used to perform one-step ahead prediction (OSAP); i.e., given a set of measurements up to time index k, the model predicts the output at time k+1 (see Eq. (36)). A parallel architecture, on the other hand, is used to perform multiple step ahead predictions (MSAP)² by feeding back the predictions of the OSAP model in a recurrent manner (see Eq. (37)). The series-parallel and parallel architectures are well explained in [40].

\hat{y}(k+1) = \hat{f}_{NARX}[u(k), \ldots, u(k-n_u+1), y(k), \ldots, y(k-n_y+1)] \qquad (36)

\hat{y}(k+n_{pred}) = \hat{f}_{NARX}[u(k+n_{pred}-1), \ldots, u(k-n_u+n_{pred}), \hat{y}(k+n_{pred}-1), \ldots, \hat{y}(k-n_y+n_{pred})] \qquad (37)

The OSAP model is used for training, as existing simple training algorithms can be used; once the model becomes accurate for OSAP, it can be converted to an MSAP model in a straightforward manner. The MSAP model can be used for making the long term predictions useful for predictive control [5,41].
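As an illustration of Eqs. (33)–(36), the sketch below builds NARX regressors and one-step-ahead targets from measured input/output sequences; an ELM trained on (X, T) then realizes the series-parallel model. The function name and array layout are our own illustrative choices. For MSAP (Eq. (37)), the same regressor is formed at each step with the model's own past predictions substituted for the measured outputs.

```python
import numpy as np

def build_narx_data(u, y, nu, ny):
    """Stack NARX regressors x(k) (Eq. (34)) and targets y(k) (Eq. (33)).

    u : (N, u_d) measured inputs;  y : (N, y_d) measured outputs.
    Returns X of shape (N - n0, u_d*nu + y_d*ny) and T of shape (N - n0, y_d).
    """
    n0 = max(nu, ny)
    X, T = [], []
    for k in range(n0, len(y)):
        past_u = u[k - nu:k][::-1].ravel()   # u(k-1), ..., u(k-nu)
        past_y = y[k - ny:k][::-1].ravel()   # y(k-1), ..., y(k-ny)
        X.append(np.concatenate([past_u, past_y]))
        T.append(y[k])
    return np.asarray(X), np.asarray(T)

# Usage: X, T = build_narx_data(u, y, nu=1, ny=1); train any regressor on (X, T).
```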
5. Application case study 1: online regression learning for system identification of an HCCI engine
The problem considered in this case study is to develop a predictive model of the state variables of the HCCI engine using online learning. The state variables of an engine are the fundamental quantities that represent the engine's state of operation. As a consequence, these variables also influence the performance of the engine, such as fuel efficiency, emissions and stability, and are required to be monitored and regulated carefully. The significance of the state variables for control and a data based modeling approach were recently analyzed [11,14], where an offline model was developed using archived data. In this paper, an online learning framework for modeling the state variables of the HCCI engine, such as IMEP and CA50, is developed and is shown to be comparable to the offline models.
The proposed SG-ELM is applied and compared against OS-ELM for regression performance, evaluated using one-step ahead and multi-step ahead predictions. A linear model and an offline trained nonlinear ELM model similar to the one in [11] are included as baselines. The linear baseline model is included to justify the benefits of adopting a nonlinear model, while the offline trained model is included to show the effectiveness of the online algorithms in fully capturing the underlying behavior in spite of processing the data sequentially. The offline ELM model is expected to produce an accurate model, as it has sufficient time, computation and simultaneous access to all training data to learn the HCCI behavior sufficiently well.
² MSAP predictions are necessary for planning trajectories for a given engine operation, such as in the case of optimal control.
5.1. Model setup and evaluation metric
For the purpose of demonstration, the variables IMEP and CA50 are considered as outputs, whereas the control variables fueling (FM), exhaust valve closing (EVC) and fuel injection timing (SOI) are considered as inputs. Transient data from the HCCI engine at a constant speed of 1800 RPM and naturally aspirated conditions is used. A NARX model as described in Section 4.3 is considered, with u = [FM EVC SOI]^T and y = [IMEP CA50]^T, and n_u and n_y chosen as 1 (tuned by trial and error). The nonlinear model approximating f_NARX is initialized to an extreme learning machine model with random input layer weights and random values for the covariance matrices and output layer weights.

All the nonlinear models consist of 100 hidden units with fixed randomized input layer parameters. About 11,000 cycles of data are processed one-by-one as sampled by the engine data acquisition, and the model parameters are updated sequentially. After the training phase, the parameter update is switched off and the models are evaluated on the next 5100 cycles of data for one step ahead predictions. Further, to evaluate whether the learned models represent the actual HCCI dynamics, the multi-step ahead predictions of the models are compared using about 600 cycles of data. It should be noted that both the one-step ahead and multi-step ahead evaluations are done using data unseen during the training phase.

The parameters of each of the models are tuned for the given data set. As recommended for OS-ELM [17], the model is initialized prior to online learning using about 800 cycles of data (see Eqs. (14) and (15)). The initialization was performed using the batch ELM algorithm [30]. In order to have a fair comparison, the same W_0 is used as the initial condition for both OS-ELM and SG-ELM. The only parameter of SG-ELM, namely the gradient step size, was tuned to Γ_SG = 0.0008 I_100 for best accuracy.
The performance of the models is measured using the normalized root mean squared error (RMSE) given by

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{y_d} (y_i^j - \hat{y}_i^j)^2}, \qquad (38)

where both y_i^j and ŷ_i^j are normalized to lie between −1 and +1.
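For completeness, Eq. (38) in numpy form (a small sketch; the function name is ours and the arrays are assumed to be pre-scaled to [−1, +1]):

```python
import numpy as np

def normalized_rmse(Y, Y_hat):
    """Normalized RMSE of Eq. (38); Y and Y_hat have shape (n, y_d)."""
    return np.sqrt(np.mean(np.sum((Y - Y_hat) ** 2, axis=1)))
```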
5.2. Results and discussion
On performing online learning, it can be observed from Fig. 2 that the parameters of OS-ELM grow more aggressively than those of SG-ELM. In spite of both models having the same initial conditions, the step size parameter Γ_SG of SG-ELM gives additional control over the parameter growth and keeps the parameters bounded, as shown in Section 3.2. OS-ELM, on the other hand, has no such control over the parameter evolution; it is governed by the evolution of the covariance matrix M (see Eq. (14)). The covariance matrix M would be expected to add stability to the parameter evolution, but in practice it tends to be more aggressive, especially with correlated data and over-parameterized models, leading to the potential instabilities reported in [24,25,20,26,27]. As a consequence, the parameter values of SG-ELM remain small compared to those of OS-ELM (the norm of the estimated parameters is 16.64 for OS-ELM and 3.71 for SG-ELM). This has a significant implication in statistical learning theory [33]: a small norm of the model parameters implies a simpler model, which results in better generalization. Although this effect is slightly reflected in the prediction results summarized in Table 2 (see SG-ELM having the lowest MSAP RMSE), it is not significantly better for this problem, possibly because of insufficient data for convergence.
[Fig. 2. Comparison of parameter evolution for the OS-ELM and SG-ELM algorithms during online learning (each engine cycle corresponds to a sample of engine data processed by the models). A zoomed-in plot (right panels) shows that the parameter update of OS-ELM is more aggressive than that of SG-ELM. Although the parameters of both models are initialized to the same values, the parameters of OS-ELM continue to grow aggressively compared to SG-ELM. Note that small parameter values indicate good regularization; this plot gives a qualitative visualization of this behavior.]
Table 2. Performance comparison of OS-ELM and SG-ELM for the HCCI online regression learning problem. A baseline linear model and an offline trained ELM model (O-ELM) are also included for comparison. The offline O-ELM algorithm has access to all the data and can use the available memory and computational power, so its training time is not compared with the online algorithms. The RMSE values are averaged over 100 different trials.

            Training time³ (s)   OSAP RMSE   MSAP RMSE
Linear      1.0111               0.1277      0.1859
OS-ELM      9.6563               0.0952      0.1018
SG-ELM      1.5624               0.1050      0.0955
O-ELM       –                    0.1018      0.1027
⁴ For the pairwise t-test, the proposed SG-ELM is compared with the competing OS-ELM algorithm and a statistical test is performed with the null hypothesis that the two algorithms are not statistically different. About 100 trials were performed, with a subset of data sampled with replacement from the training data set for both algorithms, and the OSAP RMSE and MSAP RMSE were measured. Using the 100 values of OSAP and MSAP RMSE for both SG-ELM and OS-ELM, the pairwise t-test was carried out.
The value of Γ_SG has to be tuned correctly, along with sufficient training data, in order to ensure parameter convergence. Ultimately, the online learning mechanism is intended to run alongside the engine, and hence slow convergence may not be an issue in a real application.

The prediction results as well as the training time³ of the online models are compared in Table 2, where each algorithm is run for 100 trials and the average RMSE is reported. It can be observed that the computational time of SG-ELM is significantly lower (by a factor of about 6.2) than that of OS-ELM, showing the time gained by eliminating the covariance estimation step. The reduction in computation is expected to be more pronounced as the dimension and complexity of the data increase. It can be seen from Table 2 that the one-step ahead prediction accuracies (OSAP RMSE) of the nonlinear models are similar, with OS-ELM winning marginally. On the other hand, the multi-step ahead prediction accuracies (MSAP RMSE) are similar for the nonlinear models, with SG-ELM performing marginally better. The MSAP accuracy reflects the generalization performance of the model and is more crucial for the modeling problem, as the models ultimately feed their predictions to a predictive control framework that requires accurate and robust predictions of the engine several steps ahead of time.
³ The training time is the time taken for training without considering the tuning of hyper-parameters such as the number of hidden neurons, etc.
From our understanding of model complexity and generalization error, a model that is less complex (indicated by a minimum norm of parameters [30,33]) tends to generalize better, which is again demonstrated by SG-ELM. The performance of the linear baseline model is significantly lower than that of the nonlinear models, justifying the need for nonlinear models for the HCCI system. In order to show that the results of SG-ELM are statistically significant with respect to OS-ELM, a pairwise t-test is performed [42,43] using 100 bootstrapped sub-sample instances⁴. The p-values of the pairwise t-test for the OSAP RMSE and MSAP RMSE are 2.9632E−96 and 3.8491E−76 respectively, indicating that the difference between the SG-ELM and OS-ELM results is statistically significant at a very low significance level.
The MSAP predictions of the models are summarized in Fig. 3(a)–(d), where the model predictions for NMEP and CA50 are compared against real experimental data. Here the model is initialized using the experimental data at the first instant and allowed to make predictions recursively for several steps ahead. It can be seen that the nonlinear models outperform the linear model and, at the same time, the online learning models perform similarly to the offline trained models, indicating that online learning can fully identify the engine behavior at the operating condition where the data is collected.
[Fig. 3. (a) OS-ELM MSAP prediction, (b) SG-ELM MSAP prediction, (c) O-ELM MSAP prediction, (d) Linear MSAP prediction. 600-step ahead prediction results of OS-ELM, SG-ELM, O-ELM and the linear model are compared. The OS-ELM, SG-ELM and linear models are learned online using 11,000 cycles of data, while the O-ELM model is trained on the same data in an offline manner. The 600-step ahead prediction is performed on an unseen data set. For each of the models, the NMEP and CA50 predictions are compared to the experimentally recorded values. Note that the measured control inputs at the 600 cycles are used, while NMEP and CA50 are recurrently fed back to the model to perform the multi-step ahead predictions.]
It should be noted that this task is a case of multi-input multi-output modeling, which adds some limitations for the SG-ELM method. When the model complexity increases, SG-ELM may require more excitation for convergence, as opposed to OS-ELM, which converges more aggressively (although possibly losing stability). Further, tuning the learning rate Γ_SG may be time-consuming for models predicting multiple outputs with different noise characteristics and stability requirements.
6. Application case study 2: online classification learning (with class imbalance) for identifying the dynamic operating envelope of an HCCI engine
The problem considered in this case study is to model the dynamic operating envelope of the HCCI engine using online learning. The dynamic operating envelope of an engine can be defined as its stable operating region. The significance of the operating envelope and data based modeling approaches for it were recently introduced [13], where an offline model was developed using archived data. In this paper, an online learning framework for modeling the operating envelope of the HCCI engine is developed and is shown to be as good as the offline models in determining the HCCI operating envelope.
We consider the operating envelope defined by two common HCCI unstable modes: complete misfire and high variability combustion (a more detailed description is given in Section 4.2). The problem of identifying the HCCI operating envelope from experimental data can be posed as a binary classification problem. The engine sensor data can be labeled as stable or unstable using engine based heuristics [13]. Further, the engine dynamic data contains far more stable class data than unstable class data, which introduces a class imbalance. A cost-sensitive approach that modifies the objective function of the learning system to weigh the minority class data more heavily is preferred over under-sampling and over-sampling approaches [13] and is used in this study.
The proposed SG-ELM is applied and compared against OS-ELM for classification performance. A linear classification model and an offline trained nonlinear ELM model similar to the one in [13] are included as baselines, for the same reasons as in the previous case study: the linear baseline model is included to justify the benefits of adopting a nonlinear model, while the offline trained model is included to show the effectiveness of the online algorithms in fully capturing the underlying behavior in spite of processing the data sequentially.
Table 3. Performance comparison of the nonlinear models (OS-ELM and SG-ELM) for the online class imbalance learning problem. A baseline linear model and an offline trained ELM model (O-ELM) are also used for comparison; the O-ELM results are included for comparing classification accuracies. The offline O-ELM algorithm has access to all the data and can use the available memory and computational power, so its training time is not compared with the online algorithms. All reported values are averaged over 100 different trials.

Algorithm   Training time (s)   TPR      TNR      Total accuracy   GM accuracy
Linear      1.8648              0.9996   0.5665   0.7830           0.7524
OS-ELM      1.8443              0.7000   0.8713   0.7857           0.7791
SG-ELM      1.6011              0.8605   0.6923   0.7764           0.7714
O-ELM       –                   0.7530   0.8426   0.7978           0.7947
Table 4. Comparison of the best performance of the nonlinear models after fine tuning the parameters and with a good set of initial conditions. These models are used for the final prediction of the operating envelope of the HCCI engine.

Algorithm   Training time (s)   TPR      TNR      Total accuracy   GM accuracy
OS-ELM      1.8374              0.8328   0.8341   0.8335           0.8335
SG-ELM      0.9822              0.9876   0.7707   0.8792           0.8725
O-ELM       –                   0.8265   0.8569   0.8417           0.8416
6.1. Model setup and evaluation metric
The HCCI operating envelope is a function of the engine control inputs and engine physical variables such as temperature, pressure, flow rate, etc. Also, the envelope is a dynamic system, so a predictive model requires the measurement history up to an order N_h. The dynamic classifier model can be given by

\hat{y}_{k+1} = \mathrm{sgn}(f(x_k)), \qquad (39)

where sgn(·) represents the sign function, ŷ_{k+1} indicates the model prediction for the future cycle k+1, f(·) can take any structure depending on the learning algorithm, and x_k is given by

x_k = [IVO, EVC, FM, SOI, T_{in}, P_{in}, \dot{m}_{in}, T_{ex}, P_{ex}, T_c, FA, IMEP, CA50]^T \qquad (40)

at cycle k up to cycle k − N_h + 1. In the following sections, the function f(·) is learned from the available engine experimental data using the OS-ELM and SG-ELM algorithms. The engine measurements and their time histories (defined by x_k) are considered inputs to the model, while the stability labels are considered outputs. The feature vector is of dimension n = 39 and includes sensor measurements such as FM, IVO, EVC, SOI, T_c, T_in, P_in, ṁ_in, T_ex, P_ex, IMEP, CA50 and FA, along with N_h = 1 cycles of history (see (40)). The engine experimental data is split into training and testing sets. The training set consists of about 14,300 cycles of data, processed one-by-one as sampled by the engine data acquisition. After the training phase, the parameter update is switched off and the models are evaluated on the next 6200 cycles of data for one step ahead classification. The ratio of the number of majority class data to the number of minority class data (r) is about 4.5 for the training set and 9 for the testing set. The nonlinear model approximating f(·) is initialized to an extreme learning machine model with random input layer weights and random values for the covariance matrices and output layer weights. All the nonlinear models consist of 10 hidden units with fixed randomized input layer parameters. Similar to the previous case study, a small portion of the training data is used to initialize the ELM model parameters as well as the covariance matrix. The SG-ELM parameter Γ_SG is tuned to 0.001 I_10 by trial and error. A weighted classification version of the algorithms is used to handle the class imbalance problem: the minority class data is weighted by r times f_s, where r is the imbalance ratio of the training data, computed online as the ratio of the number of majority class to minority class data seen up to that instant.
For the class imbalance problem considered here, a conventional classifier metric like the overall misclassification rate cannot be used, as it would favor a biased classifier, i.e., one that ignores the minority class data. For instance, on a data set with 95% majority class data (with label +1), a classifier could achieve 95% accuracy by predicting all labels to be +1, which is obviously undesirable. Hence the following evaluation metrics for skewed data sets are considered. Let TP and TN represent the number of positive class (stable operation) and negative class (unstable modes) data classified correctly by the classifier. If N+ and N− represent the total number of positive and negative class data respectively, the true positive rate (TPR), true negative rate (TNR), geometric mean (GM) of TPR and TNR, and total accuracy (TA) of the classifier can be defined as follows [44]. It should be noted that the total accuracy and the geometric mean weight the accuracies of the majority and minority classes equally; i.e., they take high values only when both classes of data are classified correctly.

TPR = \frac{TP}{N^+}, \qquad TNR = \frac{TN}{N^-}, \qquad GM = \sqrt{TPR \times TNR}, \qquad TA = 0.5\,(TPR + TNR). \qquad (41)
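A small numpy sketch of the metrics of Eq. (41), assuming labels coded as +1 (stable) and −1 (unstable); the function name is our own:

```python
import numpy as np

def envelope_metrics(y_true, y_pred):
    """TPR, TNR, geometric mean and total accuracy (Eq. (41))."""
    pos, neg = (y_true == 1), (y_true == -1)
    tpr = np.mean(y_pred[pos] == 1)      # TP / N+
    tnr = np.mean(y_pred[neg] == -1)     # TN / N-
    return tpr, tnr, np.sqrt(tpr * tnr), 0.5 * (tpr + tnr)
```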
6.2. Results and discussion
The results of the online imbalance classification are summarized in Table 3, where the computational time as well as the classification performance can be compared based on 100 different trials. It can be observed that for the HCCI classification problem, all the models perform with similar average accuracies. Both OS-ELM and SG-ELM achieve results similar to the offline model, indicating completeness of learning. SG-ELM has a slight advantage in training time because of its simplicity compared to OS-ELM.
In the experiments above, it should be noted that the models' hyper-parameters are not fine-tuned across the different initializations of the 100 trials, so it may be possible to achieve better performance by fine-tuning. A further experiment is therefore conducted in which the hyper-parameters of each algorithm are fine-tuned for one particular initialization, so that the best model is identified for further engine controls development. Ignoring the results of the linear model (which had a large imbalance between TPR and TNR, similar to the average results in Table 3), the results of the fine-tuned nonlinear models are reported in Table 4. It can be seen that the accuracies of all the algorithms improve significantly with fine-tuning; SG-ELM is slightly better and comes with a stability guarantee, indicating the suitability of SGD based online learning for the HCCI problem. A subtle advantage observed for the OS-ELM is that, although its combined accuracy is slightly inferior to that of the SG-ELM, its accuracies on the positive and negative examples are very close to each other, indicating that the model is well balanced to predict both majority class and minority class data well. The SG-ELM, on the other hand, fails to achieve this in spite of fine-tuning its parameters. Further tuning can be done to improve the accuracy of a particular class of data, typically by sacrificing some accuracy in predicting the other. In order to show that the results of SG-ELM are statistically significant with respect to
Fig. 4. (a) OS-ELM (dataset 1) and (b) SG-ELM (dataset 1). Online classification results of the OS-ELM and SG-ELM models showing CA50, NMEP and one input variable (fueling) for two different unseen data sets. The color code indicates the model prediction: green (red) indicates a stable (unstable) prediction by the model. The horizontal dotted line in the NMEP plot indicates the misfire limit, the dotted ellipse in the CA50 plot indicates the high variability instability mode, while the dotted rectangle shows false alarms by the model. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
OS-ELM, a pairwise t-test is performed [42,43] using 100 bootstrapped sub-sample instances.⁵ The p-value of the pairwise t-test is 0.0208, which indicates that the results of the SG-ELM differ from those of the OS-ELM at a significance level of 5%.
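The bootstrapped pairwise t-test described above could be sketched as follows. The procedure mirrors the one used in this study, but the code itself is illustrative: both trained models are assumed to expose a hypothetical predict method, and the metric is assumed to be the total accuracy of Eq. (41).

```python
import numpy as np
from scipy import stats

def paired_bootstrap_ttest(X, y, model_sg, model_os, metric, n_trials=100, seed=0):
    """Resample the test set with replacement, score both fixed models on each
    resample, and run a paired t-test on the resulting per-trial accuracies.
    model_sg/model_os expose a hypothetical predict(X) method; metric could be,
    e.g., the total accuracy TA from Eq. (41)."""
    rng = np.random.default_rng(seed)
    acc_sg, acc_os = [], []
    n = len(y)
    for _ in range(n_trials):
        idx = rng.integers(0, n, size=n)                     # bootstrap sample indices
        acc_sg.append(metric(y[idx], model_sg.predict(X[idx])))
        acc_os.append(metric(y[idx], model_os.predict(X[idx])))
    return stats.ttest_rel(acc_sg, acc_os)                   # (t statistic, p-value)
```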
The models developed using the OS-ELM and SG-ELM algorithms are used to make predictions on unseen engine inputs; the class predictions are summarized in Fig. 4, while quantitative results are included in Table 3. As mentioned earlier, the operating envelope is a decision boundary in the input space within which any input operates the HCCI engine in a stable manner, while any input outside the envelope might operate the engine in an unstable manner. The HCCI state variables such as IMEP and CA50, and engine sensor observations such as Tin, Pin, ṁin, Tex, Pex and Tc at time instant k, along with engine control inputs such as FM, EVC and SOI at time instant k+1, are given as input to the models (see (40)). The model predictions at time k+1 are obtained, and the engine's actual response at time k+1 is also recorded. A data point is marked in red if the model predicts the engine operation to be unstable (labeled as −1), and in green if the model predicts the data point to be stable (labeled as +1). In the figures, a dotted line in the NMEP plot indicates the misfire limit, a dotted ellipse in the CA50 plot indicates the high variability instability mode, while a dotted rectangle indicates misclassified predictions by the model. To understand the variation of NMEP and CA50 with changes in control inputs, the fueling input (abbreviated as FM) is also included in the plots. It should be noted that FM is not the only input for prediction, the signals being defined as in Eq. (40); only the fueling input is shown in the plots owing to space constraints. A minimal sketch of this one-step-ahead prediction setup follows.
It can be seen from Fig. 4 that, as a whole, both OS-ELM and SG-ELM models classify the HCCI engine data fairly well in spite of the high amplitude noise inherent in the HCCI experimental data. The data consists of step changes in FM, EVC and SOI, and whenever a 'bad' combination of inputs is chosen, the engine either misfires completely (NMEP falls below the misfire limit) or exhibits high variability combustion (see the dotted ellipses).
⁵ For the pairwise t-test, the proposed SG-ELM is compared with the competing OS-ELM algorithm, with the null hypothesis being that there is no statistically significant difference between the two algorithms. In each of about 100 trials, a subset of data was sampled with replacement from the full data set for both algorithms and the total accuracy was measured. The pairwise t-test was then carried out using the 100 values of total accuracy for SG-ELM and OS-ELM.
The goal of this work, as stated previously, is to predict whether a future HCCI combustion event will be stable or unstable based on available measurements. The results summarized in Table 3 indicate that the developed models indeed accomplish this goal with reasonable accuracy. From Fig. 4, it is observed that the OS-ELM produces some clear false alarms when predicting stable class data (see the dotted rectangles in the plots), while this is not observed for SG-ELM. This is not surprising, as the false alarm rate of SG-ELM (see Table 3) is very low.⁶ On the other hand, the SG-ELM has an inferior TNR. By adjusting the weighting factor Γ_imb in Eq. (20), one can achieve the tradeoff between TPR and TNR required by the application.
7. Conclusion
A stochastic gradient descent based online learning algorithm for ELM has been developed that guarantees stability in parameter estimation, making it suitable for control purposes. Further, the SG-ELM demands less computation than the OS-ELM algorithm, as the covariance estimation step is eliminated. A stability proof is developed based on the Lyapunov approach. However, the SG-ELM algorithm might involve tedious tuning of the step-size parameter as well as suffer from slow convergence. The tuning of the step-size parameter and the convergence properties of SG-ELM will be considered in future work.
The SG-ELM and OS-ELM algorithms are applied to develop online models for the state variables and the dynamic operating envelope of an HCCI engine to assist in model based control. The results from this paper suggest that good generalization performance can be achieved using both OS-ELM and SG-ELM methods, but the SG-ELM may have an advantage in terms of stability, which is crucial for designing robust control systems.
Although the SG-ELM appears to perform well in the HCCI identification problem, a comprehensive analysis and evaluation on several benchmark data sets is required and will be considered in the future. From an application perspective, interesting areas for exploration include implementing the algorithm in real-time hardware, exploring a wider operating range of HCCI operation, and developing controllers.

⁶ The false alarm rate of SG-ELM is about 1.24%. In this study, the label −1 corresponds to 'bad' data, so the false alarm rate corresponds to the false negative rate, which is 1 − TPR.
Disclaimer
This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.
Acknowledgements
This material is based upon work supported by the Department of Energy [National Energy Technology Laboratory] under Award Number DE-EE0003533. This work is performed as a part of the ACCESS project consortium (Robert Bosch LLC, AVL Inc., Emitec Inc.) under the direction of PI Hakan Yilmaz, Robert Bosch, LLC. Prof. X. Nguyen is supported in part by NSF Grants CCF-1115769 and ACI-1047871.
References
[1] R. Thring, Homogeneous-charge compression-ignition engines, SAE (1989) 892068.
[2] M. Christensen, P. Einewall, B. Johansson, Homogeneous charge compression ignition using iso-octane, ethanol and natural gas – a comparison to spark ignition operation, in: International Fuels & Lubricants Meeting & Exposition, Tulsa, OK, USA, 1997 (SAE paper 972874).
[3] T. Aoyama, Y. Hattori, J. Mizuta, Y. Sato, An experimental study on premixed-charge compression ignition gasoline engine, in: International Congress & Exposition, Detroit, MI, USA, 1996 (SAE paper 960081).
[4] K. Chang, A. Babajimopoulos, G.A. Lavoie, Z.S. Filipi, D.N. Assanis, Analysis of load and speed transitions in an HCCI engine using 1-D cycle simulation and thermal networks, SAE International, 2006.
[5] J. Bengtsson, P. Strandh, R. Johansson, P. Tunestal, B. Johansson, Model predictive control of Homogeneous Charge Compression Ignition (HCCI) engine dynamics, in: 2006 IEEE International Conference on Control Applications, 2006.
[6] Y. Wang, S. Makkapati, M. Jankovic, M. Zubeck, D. Lee, Control oriented model and dynamometer testing for a single-cylinder, heated-air HCCI engine, SAE International, 2009.
[7] Y. Urata, M. Awasaka, J. Takanashi, T. Kakinuma, T. Hakozaki, A. Umemoto, A study of gasoline-fuelled HCCI engine equipped with an electromagnetic valve train, SAE International, 2004.
[8] R. Scaringe, C. Wildman, W.K. Cheng, On the high load limit of boosted gasoline HCCI engine operating in NVO mode, SAE Int. J. Engines 3 (2010) 35–45.
[9] C. Chiang, C. Chen, Constrained control of homogeneous charge compression ignition (HCCI) engines, in: 5th IEEE Conference on Industrial Electronics and Applications (ICIEA), 2010.
[10] N. Ravi, M.J. Roelle, H.H. Liao, A.F. Jungkunz, C.F. Chang, S. Park, J.C. Gerdes, Model-based control of HCCI engines using exhaust recompression, IEEE Transactions on Control Systems Technology, 2009.
[11] V.M. Janakiraman, X. Nguyen, D. Assanis, Nonlinear identification of a gasoline HCCI engine using neural networks coupled with principal component analysis, Appl. Soft Comput. 13 (5) (2013) 2375–2389.
[12] V.M. Janakiraman, X. Nguyen, J. Sterniak, D. Assanis, A system identification framework for modeling complex combustion dynamics using support vector machines, in: Informatics in Control, Automation and Robotics, Vol. 283 of Lecture Notes in Electrical Engineering, Springer International Publishing, 2014, pp. 297–313.
[13] V.M. Janakiraman, X. Nguyen, J. Sterniak, D. Assanis, Identification of the dynamic operating envelope of HCCI engines using class imbalance learning, IEEE Trans. Neural Netw. Learn. Syst. 26 (1) (2015) 98–112.
[14] V.M. Janakiraman, X. Nguyen, D. Assanis, An ELM based predictive control method for HCCI engines, Eng. Appl. Artif. Intell. 48 (2016) 106–118, http://dx.doi.org/10.1016/j.engappai.2015.10.007.
[15] G.T. Kalghatgi, R.A. Head, Combustion limits and efficiency in a homogeneous charge compression ignition engine, Int. J. Engine Res. 7 (2006) 215–236.
[16] M. Shahbakhti, C.R. Koch, Characterizing the cyclic variability of ignition timing in a homogeneous charge compression ignition engine fueled with N-heptane/iso-octane blend fuels, Int. J. Engine Res. 9 (2008) 361–397.
[17] N. Liang, G. Huang, P. Saratchandran, N. Sundararajan, A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Trans. Neural Netw. 17 (6) (2006) 1411–1423.
[18] Y. Jun, M.J. Er, An enhanced online sequential extreme learning machine algorithm, in: Chinese Control and Decision Conference (CCDC 2008), 2008, pp. 2902–2907.
[19] B. Mirza, Z. Lin, K.-A. Toh, Weighted online sequential extreme learning machine for class imbalance learning, Neural Process. Lett. 38 (3) (2013) 465–486.
[20] M.T. Hoang, H. Huynh, N. Vo, Y. Won, A robust online sequential extreme learning machine, in: Advances in Neural Networks, Vol. 4491 of Lecture Notes in Computer Science, Springer Berlin/Heidelberg, 2007, pp. 1077–1086.
[21] V.M. Janakiraman, Machine Learning for Identification and Optimal Control of Advanced Automotive Engines (Dissertation), University of Michigan, Ann Arbor, 〈http://hdl.handle.net/2027.42/102392〉.
[22] J. Kivinen, A. Smola, R. Williamson, Online learning with kernels, IEEE Trans. Signal Process. 52 (8) (2004) 2165–2176.
[23] A. Bordes, L. Bottou, The huller: a simple and efficient online SVM, in: Machine Learning: ECML 2005, Lecture Notes in Artificial Intelligence, LNAI 3720, Springer Verlag, 2005, pp. 505–512.
[24] G. Zhao, Z. Shen, C. Miao, Z. Man, On improving the conditioning of extreme learning machine: a linear case, in: Proceedings of the 7th International Conference on Information, Communications and Signal Processing (ICICS 2009), 2009, pp. 1–5.
[25] F. Han, H.-F. Yao, Q.-H. Ling, An improved extreme learning machine based on particle swarm optimization, in: Bio-Inspired Computing and Applications, Lecture Notes in Computer Science.
[26] H.T. Huynh, Y. Won, Regularized online sequential learning algorithm for single-hidden layer feedforward neural networks, Pattern Recognit. Lett. 32 (14) (2011) 1930–1935.
[27] V.M. Janakiraman, X. Nguyen, D. Assanis, A Lyapunov based stable online learning algorithm for nonlinear dynamical systems using extreme learning machines, in: The 2013 International Joint Conference on Neural Networks (IJCNN), 2013, pp. 1–8, http://dx.doi.org/10.1109/IJCNN.2013.7090813.
[28] V. Akpan, G. Hassapis, Adaptive predictive control using recurrent neural network identification, in: Proceedings of the 17th Mediterranean Conference on Control and Automation (MED '09), 2009, pp. 61–66.
[29] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (2006) 489–501.
[30] G.-B. Huang, H. Zhou, X. Ding, R. Zhang, Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. Part B 42 (2) (2012) 513–529.
[31] L. Bottou, Large-scale machine learning with stochastic gradient descent, in: Y. Lechevallier, G. Saporta (Eds.), Proceedings of COMPSTAT'2010, Physica-Verlag HD, 2010, pp. 177–186.
[32] N. Le Roux, M. Schmidt, F. Bach, A Stochastic Gradient Method with an Exponential Convergence Rate for Strongly-Convex Optimization with Finite Training Sets, Tech. Rep. arXiv:1202.6258v1, INRIA, 2012.
[33] V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[34] S. Shalev-Shwartz, N. Srebro, SVM optimization: inverse dependence on training set size, in: Proceedings of the 25th International Conference on Machine Learning (ICML '08), ACM, New York, NY, USA, 2008, pp. 928–935.
[35] P. Ioannou, J. Sun, Robust Adaptive Control, PTR Prentice-Hall, 1996.
[36] J. Spooner, M. Maggiore, R. Ordóñez, K. Passino, Stable Adaptive Control and Estimation for Nonlinear Systems: Neural and Fuzzy Approximator Techniques, Adaptive and Learning Systems for Signal Processing, Communications and Control Series, Wiley, 2004.
[37] F. Zhao, T.N. Asmus, D.N. Assanis, J.E. Dec, J.A. Eng, P.M. Najt, Homogeneous Charge Compression Ignition (HCCI) Engines, SAE International (March).
[38] G.A. Lavoie, J. Martz, M. Wooldridge, D. Assanis, A multi-mode combustion diagram for spark assisted compression ignition, Combust. Flame 157 (6) (2010) 1106–1110.
[39] O. Nelles, Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models, Springer, 2001.
[40] K.S. Narendra, K. Parthasarathy, Identification and control of dynamical systems using neural networks, IEEE Trans. Neural Netw. 1 (1) (1990) 4–27.
[41] L. Re, F. Allgöwer, L. Glielmo, C. Guardiola, I. Kolmanovsky, Automotive Model Predictive Control: Models, Methods and Applications, Lecture Notes in Control and Information Sciences, Springer, 2010.
[42] J.-B. Yang, C.-J. Ong, Feature selection using probabilistic prediction of support vector regression, IEEE Trans. Neural Netw. 22 (6) (2011) 954–962.
[43] X. Liu, L. Wang, G.-B. Huang, J. Zhang, J. Yin, Multiple kernel extreme learning machine, Neurocomputing 149, Part A (2015) 253–264.
[44] K.-A. Toh, Deterministic neural classification, Neural Comput. 20 (6) (2008) 1565–1595.
Vijay Manikandan Janakiraman received the bachelor's degree in Mechanical Engineering (2007) from Sri Venkateswara College of Engineering, Chennai, India. He received Master's degrees in Mechanical Engineering (2008) and Electrical Engineering – Systems (2013) and the Ph.D. degree in Mechanical Engineering (2013) from the University of Michigan at Ann Arbor, MI, USA. Since August 2013, he has been working as a research scientist with the Data Sciences Group, Intelligent Systems Division at the NASA Ames Research Center, Moffett Field, CA, USA. His current research interests include machine learning, data mining in high dimensional time series, and decision making under uncertainty.
XuanLong Nguyen received the Ph.D. degree in computer science and the M.S. degree in statistics, both from the University of California, Berkeley. He is currently an Assistant Professor of Statistics at the University of Michigan. His research interests lie in distributed and variational inference, nonparametric Bayesian statistics, and applications to detection/estimation problems in distributed and adaptive systems. Dr. Nguyen is a recipient of the CAREER award from the NSF Division of Mathematical Sciences, the Leon O. Chua Award from UC Berkeley, the IEEE Signal Processing Society's Young Author Best Paper award, and an Outstanding Paper award from the International Conference on Machine Learning.
Dennis Assanis received the Ph.D. degree in Power and Propulsion and the M.S. degrees in Naval Architecture and Marine Engineering and Mechanical Engineering from the Massachusetts Institute of Technology. Dr. Assanis is a Professor in the Department of Mechanical Engineering and is also the Provost, Senior Vice President for Academic Affairs, and Vice President for Brookhaven Affairs at Stony Brook University, NY, USA. Assanis served as the Jon R. and Beverly S. Holt Professor of Engineering and Arthur F. Thurnau Professor at the University of Michigan, as well as Director of the Michigan Memorial Phoenix Energy Institute, Founding Director of the US–China Clean Energy Research Center for Clean Vehicles, and Director of the Walter E. Lay Automotive Laboratory. Dr. Assanis' research interests lie in the thermal sciences and their applications to energy conversion, power and propulsion, and automotive systems design. His research focuses on analytical and experimental studies of the thermal, fluid and chemical phenomena that occur in internal combustion engines, aftertreatment systems, and fuel processors. His efforts to gain new understanding of the basic energy conversion processes have made significant impact in the development of energy and power systems with significantly improved fuel economy and dramatically reduced emissions. His group's research accomplishments have been published in over 250 articles in journals and international conference proceedings. Dr. Assanis is a Member of the National Academy of Engineering and is an ASME and SAE Fellow.