On optimal experiment design for identifying premise and conclusion parameters of Takagi-Sugeno models: nonlinear regression case
A. Kroll*, A. Dürrbaum**
* Department of Measurement and Control, University of Kassel, Kassel, Germany
(Tel: +49 561 804-3248; e-mail: [email protected] ).
** Department of Measurement and Control, University of Kassel, Kassel, Germany
(Tel: +49 561 804-3261; e-mail: [email protected] ).
Abstract: Optimal Experiment Design (OED) is a well-developed concept for regression problems that are linear-in-the-parameters. In case of experiment design to identify nonlinear Takagi-Sugeno (TS) models, non-model-based approaches or OED restricted to the local model parameters (assuming the partitioning to be given) have been proposed. In this article, a Fisher Information Matrix (FIM) based OED method is proposed that considers local model and partition parameters. Due to the nonlinear model, the FIM depends on the model parameters that are subject of the subsequent identification. To resolve this paradoxical situation, at first a model-free space filling design (such as Latin Hypercube Sampling) is carried out. The collected data permits making design decisions such as determining the number of local models and identifying the parameters of an initial TS model. This initial TS model permits a FIM-based OED, such that data is collected which is optimal for a TS model. The estimates of this first stage will in general not be ideal. To become robust against parameter mismatch, a sequential optimal design is applied. In this work the focus is on D-optimal designs. The proposed method is demonstrated for three nonlinear regression problems: an industrial axial compressor and two test functions.
Keywords: Takagi-Sugeno fuzzy models; optimal experiment design; design of experiments; nonlinear regression; nonlinear system identification.
1. INTRODUCTION
Takagi-Sugeno (TS) models [56] are composed of a weighted superposition of local models. They are
often applied for non-linear regression, prediction/simulation and model-based control design. Often, lo-
cally affine models are used, which have the advantage that linear systems analysis and design methods
© 2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
can be applied (after enhancement). An example is model-based nonlinear control design as parallel-
distributed-compensators [60]. The identification of TS models includes several tasks:
- Data collection
- Data pre-processing
- Input/regressor selection
- Dynamic order and dead time selection (in case of dynamical models)
- Determination of the number of local models
- Locating and shaping of local model boundaries
- Estimation of local model parameters
- Model validation
In data-driven modelling two major cases can be distinguished: The first option is that informative data
has to be selected from operational data records e.g. stored in historical databases. The second option is
that experiments can be carried out purposefully in order to generate informative data. This article con-
siders the latter. It addresses the problem of Fisher Information Matrix (FIM) based optimal experiment
designs (OED) for identifying both local model and partitioning related parameters of TS models. As the
design problem is nonlinear in the partition parameters, the partial derivatives that form the FIM depend
on the model parameters, which are unknown at the beginning of the experiments. Robust design methods
have been proposed to address this problem, including Bayesian, sequential and minimax designs [26,59].
A Bayesian design requires reliable a-priori information on the statistical description of the uncertainty. A
minimax design adopts the view that sometimes the best design under the worst conditions may be most
useful. It requires defining value ranges for the parameters and is computationally demanding. A sequen-
tial design uses all data ‘presently’ available to estimate the model parameters, which are in turn used for
designing the continuation of the experiment. The latter will be adopted in this paper as it requires little a
priori knowledge on the system and has moderate computational demands.
As the membership functions are strongly nonlinear, already a moderate deviation from the true parameter values may cause even robust design methods to fail. Moreover, the model structure has to be known
for FIM-based experiment design, which will in general not be the case. For this reason, a multi-stage
method is proposed: A (not optimal) space filling design is used in a first stage to obtain data suitable for
structural decision-making and to identify an initial model of sufficient quality. In the second stage, a
FIM-based optimal sequential design jointly for premise and conclusion parameters is carried out. It will
be shown how knowledge about the properties of the sensitivity functions of TS models can be exploited
to initialize the design points in highly informative areas such that the OED optimization problem becomes easier to solve. The contribution focusses on static modelling problems. The method is demonstrated in three case studies: In the first, the true system is a ‘synthetic’ TS system such that the system is in the model class. The second targets approximating an industrial axial compressor map. The third is the
Friedman test function 1, which depends on five variables and therefore represents a higher dimensional
design problem.
Optimal experiment design to determine nonlinear static models is of significant practical interest as the
following recent examples show: Stricter emission legislation and demand for higher fuel efficiency in
tandem with more complex engines result in an increasing number of calibration parameters of the char-
acteristic maps of Diesel and Otto engines. Due to limited capacity and operational cost of test stands, in-
dustrial practitioners develop and apply Design of Experiments (DOE) techniques for experiment plan-
ning, see e.g. [24,37,58] for some recent work or the proceedings of the international conference series
“DOE in powertrain development” [53]. A traditional application area of DOE is bio-pharmacy, see e.g.
[17,48]. In [32] DOE is used to efficiently estimate the strain life and the stress-strain curve for fatigue analysis using nonlinear regression. DOE is used in [61] to analyse operating points of a fuel cell with respect to performance, efficiency and parametric sensitivity using regression models. Similarly, in [19]
DOE is used to fit models to identify optimal designs and operating points of chemical processes. For a
recent cross-domain compilation of the related area of (nonlinear) response surface methods the reader is
referred to [63].
In section 2, the state of the art is reviewed. Section 3 provides the problem statement. Section 4 describes
the used identification method. Section 5 addresses the proposed experiment design method. Section 6
contains the case studies. Conclusions and outlook complete the paper.
2. OVERVIEW OF RELATED WORK
Methods for Design of Experiments (DOE) can be categorized in different ways, e.g. model-free or mod-
el-based, for static or dynamic modelling, for on-line or off-line application, for parameter estimation or
model discrimination, see [54] for a compact overview. In the following the focus is on work addressing
Takagi-Sugeno fuzzy models, while selected works on neuro-fuzzy and neural network systems are also included.
Block and factorial designs belong to the model-free design methods. Buragohain and Mahanta [9] apply
a full factorial design to select the training data for a dynamic ANFIS model from recorded data. Far-
zaneh and Tootoonchi [16] use a three-level factorial design to minimize the required amount of data for
designing TS systems. The method is demonstrated for a static nonlinear regression problem with two
inputs. Zanchettin et al. [65] used a two-level full factorial design followed by variance analysis (ANOVA) to examine the significance of design decisions regarding ANFIS and evolving fuzzy neural networks, such as the number or type of membership functions. The method is tested using Mackey-Glass time
series and Box-Jenkins gas furnace data. Alizadeh et al. [1] extend this approach by using a three-level
factorial design and by considering a larger number of factors. Pontes et al. [51] use evolutionary opera-
tions (EVOP) to determine the topology of a 2-hidden layer MLP network and learning strategy parame-
ters. This includes using Taguchi arrays for identifying factor levels and full factorial designs with two
levels and central point. Another group of model-free approaches are the space-filling designs. Hartmann
et al. [29] target minimizing the global model error (instead of parameter variance) of static local model
networks. A pseudo Monte-Carlo sampling algorithm is used with the goal to homogenously distribute
the samples. The algorithm selects the point from a set of random candidate points that has the maximum
distance from the already chosen points. This selection is repeated until the desired number of points is
obtained. It uses a fixed number of samples per partition. Skrjanc [54] proposes a method that uses a su-
pervised hierarchical clustering algorithm that locally deploys the Gustafson-Kessel algorithm in combi-
nation with a space-filling experiment design for an evolving fuzzy model identification. The space-
filling design is realized by a pseudo Monte-Carlo sampling algorithm. The method is demonstrated in
static test problems. Belz et al. [5] investigate what order of experimentation is most suitable for regres-
sion problems in case the model is already used while the experiments are conducted. Deregnaucourt et
al. [12] propose a maximin distance-based design point selection from a candidate set for on-line design
of experiments and identification. They compare designs in input, output and product space and demon-
strate it for static modelling problems.
While these works address DOE for static modelling, another area is DOE for dynamic modelling. This is
often also referred to as test signal design. This problem is addressed by Heinz and Nelles [30] as well as
Gringard and Kroll [27] who compare the effects of using different general-purpose broad-band test sig-
nals (multisine, APRBS, chirp) on covering the model input space and on resulting model quality when
identifying locally affine dynamic TS models.
Model-free designs do not consider the intended model structure during the experiment design, such that the collected data will in general not be optimal for the intended use. In contrast, Optimal Experi-
ment Design (OED) methods plan experiments such that given a chosen model structure the design points
optimize the specified assessment criterion respecting possible constraints. One possible approach is to
optimize a criterion on the Fisher Information Matrix (FIM), which targets reducing the parametric uncer-
tainty of the resulting models. The frequently used criterion in engineering is the determinant of the FIM,
which yields the D-optimal design. Hirsch and del Re [33] work in this direction: They deal with static
TS models that locally employ second order polynomials using an extended D-optimal experiment de-
sign. In order to prevent several samples in the same location additional costs are assigned to the minimal
observed distance between samples. The partitioning is assumed to be given. Hametner et al. [28] assume
the partitioning to be given and investigate D-optimal on-line designs regarding the local model parame-
ters of dynamic local model networks and the weights of MLP networks with a single hidden layer. ARX
and output error (OE) model configurations as well as constraints on model input and output are consid-
ered. Deflorian and Zaglauer [11] address DOE for training dynamical MLP-type artificial neural net-
works. They utilize ramp-stair-sequence type test signals. Noise is assumed to be normally distributed. For off-line designs, Latin Hypercube Sampling and a third-order-polynomial-based D-optimal design for the amplitudes of an amplitude-modulated pseudo-random signal are used. For on-
line designs query-by-committee and sequential D-optimal designs are used.
Instead of basing the design on the FIM, which means an uncertainty reducing experiment design, other
assessment criteria can be used. One example is to use the approximation error of the model. The work of
Suzuki and Yamakita [55] can be attributed to this area. They address DOE for piecewise affine ARX
(PWARX) models for hybrid systems and push the design points towards the discriminating surfaces be-
tween neighbouring ARX models to optimally estimate the local model partition boundaries. A cost func-
tion that punishes the distance from the boundaries is used and constraints on the input are considered. As
the design requires knowledge of the partition boundaries, alternating DOE and identification is proposed.
On DOE for artificial neural networks further literature is available, e.g. [3,10,13,31,57]. Also, different
terminology is used such as ‘active learning’, ‘lazy learning’ or ‘query construction’, see e.g. [2,8]. The
partitioning problem of TS systems, however, does not occur in ANN, which clearly distinguishes both
fields.
Compared with DOE for parameter identification of nonlinear physical models, DOE for TS models is a
large-scale problem due to the black-box multi-model approach. Kitsos [38] mentions a maximum num-
ber of 5 parameters. In the survey paper [21], the authors refer to a ‘large number of parameters’ in con-
text of estimating kinetic coefficients in process engineering models. The referenced problem [20] has 12
parameters and a joint experiment design for all 12 parameters proved impossible. For a wider coverage
of the field of DOE, the reader is referred to the overview articles [21,52].
Finally, it can be stated that all papers that address optimal experiment design for TS models assume the
partitioning to be given. This significantly simplifies the design, as the design problem regarding the local
model parameters is linear-in-the-parameters (LiP) in case of static or (N)ARX-type dynamic models. As the partitioning is the sole source of nonlinearity in locally affine TS models, it is important to also determine the partition parameters accurately and with little uncertainty. Non-model-based experiment design approaches such as space
filling designs do not provide optimal designs, e.g. in the sense of a maximal reduction of the uncertainty of the estimated parameters. Also, they do not provide an uncertainty estimate.
This contribution builds on previous work [14,41], where first algorithmic ideas of a robust 2-stage de-
sign were presented and demonstrated using the synthetic TS system of case study one as target. For this
work, the distribution of the design points over the different stages was investigated and initialization
methods for the optimal design in the 2nd stage were developed. More complex and higher dimensional
case studies and extensive parameter studies were carried out.
3. PROBLEM STATEMENT
Given is a nonlinear static multi-input single-output system f:
y(x(k)) = f(x(k)) + e(k)    (1)
where x ∈ ℜn are the inputs, y ∈ ℜ is the output, and e(k) is a random variable. f is to be approximated by
a Takagi-Sugeno (TS) model that is estimated from N observations (xk,yk).
3.1 Model
TS models are composed of c local models. Notation can be in form of rules or mathematical expressions.
The latter will be adopted:
ŷ(x, z, Θ) = Σ_{i=1}^{c} [ μ_i(z, Θ_MF,i) / Σ_{j=1}^{c} μ_j(z, Θ_MF,j) ] · ŷ_LM,i(x, Θ_LM,i) = Σ_{i=1}^{c} φ_i(z, Θ_MF) · ŷ_LM,i(x, Θ_LM,i)    (2)
The i-th local model uses the inputs or ‘regressors’ x to predict ŷ_LM,i(x, Θ_LM,i). In the sequel, locally affine models are considered:
ŷ_LM,i(x, Θ_LM,i) = Θ_LM,i^T [1  x^T]^T = a_{i,0} + a_i^T x ,    (3)
i.e. Θ_LM,i ∈ ℜ^(n+1). The membership functions μ_i(z, Θ_MF,i) define the region and degree of validity of the i-th local model. The φ_i(z, Θ_MF) are referred to as fuzzy basis functions. They permit a more compact notation. The scheduling variable z can differ from the regressor x, but is often chosen identical as in this article. In this work, multi-variate membership functions (MF) which result in explicit form from fuzzy clustering [6] are used:
μ_i(x) = [ Σ_{j=1}^{c} ( ‖x − v_i‖²_{A_i} / ‖x − v_j‖²_{A_j} )^{1/(ν−1)} ]⁻¹    (4)
Their parameters are the partition centres vi, vj ∈ ℜn, i, j ∈ {1, …, c} and the fuzziness parameter
ν ∈ ℜ>1. The latter is a global parameter used to adjust the fuzziness of the partitioning, see [40] for a
discussion on its choice. Distance norms can e.g. be inner product norms:
‖x − v_j‖²_{A_j} = (x − v_j)^T A_j (x − v_j) ,    (5)
where A_j is a norm inducing matrix (including Euclidean and Mahalanobis norm as special cases), or Lp
norms. These multi-variate MF permit to achieve parsimonious models with good approximation proper-
ties. They are continuously differentiable, simplifying the computation of the FIM. As µi = ϕi holds, (2)
simplifies to:
ŷ(x, Θ) = Σ_{i=1}^{c} μ_i(x, Θ_MF) · ŷ_LM,i(x, Θ_LM,i)    (6)
with Θ_MF = [v_1^T; …; v_c^T]^T. All local model parameters can be concatenated as Θ_LM = [Θ_LM,1^T; …; Θ_LM,c^T]^T. An alternative are Gaussian MF, which are often used for tree-type partitioning approaches such as LOLIMOT [49]. These can be treated similarly but are omitted due to limited space.
3.2 Optimal experiment design
In this work it will be assumed that {e(k)} in (1) is a sequence of i.i.d. normally distributed random varia-
bles with zero mean and finite variance σ2 > 0. This simplifies computing the FIM for the discrete case
with N observations (xk,yk), where the x(k) = xk are referred to as ‘design points’ to [36]:
I = (1/σ²) Σ_{k=1}^{N} [∂ŷ(x_k, Θ)/∂Θ] · [∂ŷ(x_k, Θ)/∂Θ^T] |_{Θ=Θ0}    (7)

or

I = [I_{i,j}] ;  I_{i,j} = (1/σ²) Σ_{k=1}^{N} [∂ŷ(x_k, Θ)/∂θ_i] · [∂ŷ(x_k, Θ)/∂θ_j] |_{Θ=Θ0}    (8)
The Cramér-Rao inequality states that for any unbiased estimator

cov(Θ̂_N) ≥ I⁻¹    (9)
holds [44], which relates the FIM with the uncertainty of the parameter estimate. The aim of an optimal
experiment design is to plan the experiments such that a scalar metric of the FIM e.g. as the magnitude of
its determinant, its trace or maximum eigenvalue is maximized. This provides for the well-known D-, A-
and E-optimal designs, respectively. In case of D-optimal designs the required computations are compara-
tively simple and yield the uncertainty region with minimum volume. A disadvantage is that the design
overemphasizes parameters with high sensitivity. An A-optimal design neglects all non-diagonal elements
of the FIM. Hence, it does not exploit all available information and it is not reliable in case of strong cor-
relations between parameters. E-optimal designs address the shape but not the size of the uncertainty region, such that large regions may result. In addition, the possible discontinuities (due to changing eigen-
values) complicate the optimization. In engineering, D-optimal designs are popular [64] and will also be
considered in this paper. This means N design points X = [x_1; …; x_N] are to be determined such that

X_opt = arg max_{x_1,…,x_N ∈ G} det(I)    (10)
holds, where G is the permitted domain/design space. As the FIM is evaluated at Θ = Θ0 it is only locally
optimal in case of models that are nonlinear in their parameters. This contribution considers a joint opti-
mal design for partition (ΘMF) and local model parameters (ΘLM) of TS models. A- and E-optimal designs
can be treated similarly.
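The FIM assembly (7)-(8) and the D-criterion of (10) can be illustrated with the following sketch. Here `dy_dtheta` is a placeholder for the model's sensitivity vector evaluated at the nominal parameters Θ0; the names are ours:

```python
import numpy as np

def fisher_information(X, dy_dtheta, sigma2=1.0):
    """FIM per eq. (7): X is the (N, n) design, dy_dtheta(x) returns the
    sensitivity vector d yhat / d Theta at the nominal parameters Theta_0."""
    p = dy_dtheta(X[0]).size
    I = np.zeros((p, p))
    for x in X:
        g = dy_dtheta(x)
        I += np.outer(g, g) / sigma2        # FIM is additive over observations
    return I

def d_criterion(X, dy_dtheta, sigma2=1.0):
    """det(I), the quantity maximized by a D-optimal design, eq. (10)."""
    return np.linalg.det(fisher_information(X, dy_dtheta, sigma2))
```

For the LiP example ŷ = θ1 + θ2·x with sensitivity vector [1, x], placing the design points at the endpoints of the interval yields a larger det(I) than interior points, in line with the classical D-optimal result.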
4. IDENTIFICATION OF TAKAGI-SUGENO MODELS
Given N observations (xk,yk), the identification objective is to determine the number c of partitions (local
models) and their soft boundaries (defined by the prototypes v1,…,vc) and the parameters {(ai,0,ai)} of all
c local models. The identification criterion used is the common mean squared approximation error
J_MSE := (1/N) Σ_{k=1}^{N} ( y(x(k)) − ŷ(x(k)) )² .    (11)
Theoretically, all parameters ΘMF and ΘLM could be determined by solving the mixed-integer optimiza-
tion problem resulting from minimizing (11). However, this problem is very hard to solve. Instead, if little
information about the system is available, computationally efficient fuzzy clustering and ordinary least
squares methods are used to determine a (suboptimal) initial model. Subsequently, the model’s parameters are simultaneously optimized with respect to (11) using Matlab’s function fmincon.
Typical fuzzy clustering algorithms are Fuzzy c-Means (FCM), Gustafson-Kessel and Gath-Geva algo-
rithm [4,6,34]. The FCM uses the same distance norm for all clusters. It is a simple, robust and quick
basic method. The Gustafson-Kessel algorithm adapts the distance norm of each cluster individually to
the data scattering. This adds flexibility at the cost of significantly more parameters, higher computational
demand and worse convergence characteristics. The Gath and Geva algorithm is even more flexible.
Clustering is carried out in the product (X,Y) space to acknowledge the system’s non-linearity. The FCM
minimizes the cost function
J_FCM = Σ_{k=1}^{N} Σ_{i=1}^{c} (μ_{i,k})^ν ‖x_k − v_i‖²₂    (12)
by alternatingly computing new partition centers
};...;1{1
,1
, ciN
kki
N
kkkii ∈∀⋅= ∑∑
==
νν µµ xv (13)
for given values of the membership values and then updating the memberships using (4) with the updated
values of the v_i. It converges to a local minimum or saddle point of J_FCM. The prototypes are initialized using uniformly distributed random numbers rather than via the membership values: the latter results in prototypes that scatter just little around the mean of all data, as v_i → x̄ ∀ i = 1, …, c holds for N → ∞ in case of uniformly distributed memberships. The number of clusters c can e.g. be determined using cluster validation criteria such as the Xie-Beni index [62] or the fuzzy hypervolume [25], or by using cluster-merging methods [4]. In this work, the approximation error is used instead, as it is closer to the utilization of the clustering results. In this article, a modified implementation of the Matlab™ FCM is used such that the prototypes can be initialized.
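A minimal sketch of the alternating FCM iteration, updating memberships per (4) and prototypes per (13) with random prototype initialization (assumed uniform over the data's bounding box), might look as follows. This illustrates the algorithm only and is not the modified Matlab™ implementation used in the paper:

```python
import numpy as np

def fcm(X, c, nu=1.2, n_iter=100, seed=0):
    """Fuzzy c-means: alternate the membership update, eq. (4), and the
    prototype update, eq. (13). X is the (N, n) data matrix."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    V = rng.uniform(lo, hi, size=(c, X.shape[1]))    # random prototype init
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # U[k, i] = mu_i(x_k), eq. (4); the 1e-12 guards zero distances
        U = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** (1.0 / (nu - 1.0))).sum(axis=2)
        Um = U ** nu
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]     # eq. (13)
    return V, U
```

A fixed iteration count stands in for a proper convergence test on the prototype movement.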
Given the membership functions μ_i, the model output (6) is linear in the local model parameters Θ_LM,i, i = 1, …, c. The latter can therefore be estimated using the ordinary least squares (OLS) method. The parameters Θ_LM,i of each local model can be identified independently from each other by minimizing the deviation of the local model’s prediction (3) from the reference value y, which is called ‘local LS’ (‘LLS’), see e.g. [66]:
Θ̂_LM,i := arg min_{Θ_LM,i} Σ_{k=1}^{N} φ_i(k) ( y(x(k)) − ŷ_i(x(k), Θ_LM,i) )²    (14)

ŷ_i(x(k)) := a_{i,0} + a_{i,1} x_1(k) + … + a_{i,n} x_n(k) = φ^T(k) Θ_LM,i    (15)
with φ_i(k) := φ_i(x(k)). This permits interpreting a local model as a linearization of the true system. However, neglecting the interpolation between neighbouring local models reduces the prediction quality, in general. Alternatively, all c parameter vectors Θ_LM,i can be estimated simultaneously by minimizing the approximation error of the TS model, which is referred to as ‘global LS’ (‘GLS’):
Θ̂_LM := arg min_{Θ_LM} Σ_{k=1}^{N} ( y(x(k)) − ŷ(x(k), Θ_LM) )²    (16)

ŷ(k) = Σ_{i=1}^{c} φ_i(k) · ( a_{i,0} + a_{i,1} x_1(k) + … + a_{i,n} x_n(k) )
     = [φ_1(k); …; φ_c(k) | φ_1(k)x_1(k); …; φ_c(k)x_1(k) | … | φ_1(k)x_n(k); …; φ_c(k)x_n(k)]^T · [a_{1,0}; …; a_{c,0} | a_{1,1}; …; a_{c,1} | … | a_{1,n}; …; a_{c,n}]
     =: φ^T(k) Θ_LM    (17)
The GLS provides for better predictions at the cost of worse local interpretability, in general. Differences
between LLS and GLS shrink as ν→1. In both cases the optimal parameters are computed by
Θ̂ = [Φ^T Φ]⁻¹ Φ^T Y    (18)

with the vector of output observations Y = [y(k)] and the regression matrix Φ = [φ^T(k)] composed of the regressors φ(k) as defined in (15) and (17) for LLS and GLS, respectively. (It is remarked that the same value of the fuzziness parameter ν is used for clustering, parameter estimation and model evaluation.)
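The LLS estimator (14) and the GLS estimator (16)-(18) can be sketched as follows. The names are ours; the GLS regression matrix here groups columns per local model rather than as in (17), which only permutes the entries of the solution vector:

```python
import numpy as np

def local_ls(X, y, Phi):
    """'Local LS', eq. (14): one weighted OLS per local model with the
    fuzzy basis function values Phi (shape (N, c)) as weights."""
    N = X.shape[0]
    R = np.hstack([np.ones((N, 1)), X])          # regressor [1 x^T], eq. (3)
    Theta = []
    for i in range(Phi.shape[1]):
        w = np.sqrt(Phi[:, i])                   # sqrt-weighting realizes eq. (14)
        Theta.append(np.linalg.lstsq(w[:, None] * R, w * y, rcond=None)[0])
    return np.array(Theta)                       # (c, n+1): rows [a_i0, ..., a_in]

def global_ls(X, y, Phi):
    """'Global LS', eqs. (16)-(18): one joint OLS over all c local models."""
    N, n = X.shape
    R = np.hstack([np.ones((N, 1)), X])
    Reg = np.hstack([Phi[:, [i]] * R for i in range(Phi.shape[1])])
    theta = np.linalg.lstsq(Reg, y, rcond=None)[0]
    return theta.reshape(Phi.shape[1], n + 1)
```

For c = 1 with unit weights both estimators coincide with ordinary least squares.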
The TS model obtained so far is not optimal regarding its prediction error, as clustering minimizes the grouping-related criterion (12) but not the approximation error (11). Therefore, all real-valued parameters Θ_MF and Θ_LM are subsequently optimized using cost function (11) with Matlab™’s fmincon.
5. EXPERIMENT DESIGN FOR TAKAGI-SUGENO MODELS
At the beginning of the modelling task, often knowledge about an appropriate model structure is not
available, which however is required for a model-based experiment design. Therefore, a 2-stage proce-
dure is proposed: In the first stage, a model-free space filling design is used. The resulting data is used to
identify the structure and the local model parameters of an initial TS model. This model is the basis for an
optimal experiment design with respect to both membership function and local model parameters in the
2nd stage. The proposed procedure is shown in Fig. 1 and will be explained in the following subsections.
5.1 Initial space filling design and initial TS model identification (1st stage)
In the first stage, a model-free space filling design is used to place N1 design points {xk}. Different meth-
ods can be used; for example, random placement, regular grids and Latin Hypercube Sampling (LHS).
Random sampling can result in locally strongly varying density of design points, which may be disadvan-
tageous. Regular grids enforce a uniform distribution, but result in an exponentially growing number of
points with increasing number of design variables. This is avoided by LHS. LHS will not necessarily pro-
vide non-collapsing designs that cover the design space uniformly. Optimal LHS have these properties
(see [15] for a detailed discussion). In this contribution, an approximately optimal LHS that maximizes
the minimum distance between the design points is used, as provided by the function lhsdesign of the
Matlab Statistics Toolbox™ [46]. The latter generates a number of LHS designs and retains the best one [35]. This procedure will in general result in a suboptimal design, but it is fast and considered as sufficient for the purpose of this work. For the resulting design an experiment is carried out that provides for N1 observations M1 = {(x_k, y_k)} (step 1). This data is used to identify an initial TS model using FCM and OLS, providing for initial parameters {(v̆_i^(0), ă_i^(0))} (step 2). Partition and local model parameters are optimized using fmincon to minimize the approximation error (11) on the given data set M1, yielding {(v_i^(0), a_i^(0))} (step 3).
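The candidate-based approximately maximin LHS used in the first stage can be sketched as follows; this is an illustrative stand-in mimicking the described behaviour of lhsdesign, not the Toolbox implementation:

```python
import numpy as np

def lhs_maximin(n_points, n_dims, n_candidates=20, seed=0):
    """Approximately maximin LHS in [0,1]^n: generate several random LHS
    designs and keep the one with the largest minimum pairwise distance."""
    rng = np.random.default_rng(seed)
    best, best_dist = None, -np.inf
    for _ in range(n_candidates):
        # one random LHS: a random stratum permutation per dimension,
        # with the point jittered uniformly inside its stratum
        cells = np.array([rng.permutation(n_points) for _ in range(n_dims)]).T
        D = (cells + rng.uniform(size=(n_points, n_dims))) / n_points
        dists = np.sqrt(((D[:, None, :] - D[None, :, :]) ** 2).sum(-1))
        np.fill_diagonal(dists, np.inf)
        if dists.min() > best_dist:
            best, best_dist = D, dists.min()
    return best
```

By construction each dimension contains exactly one point per stratum of width 1/n_points, i.e. the design is non-collapsing in every single coordinate.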
5.2 Optimal experiment design for TS models (2nd stage): Main algorithm
In the 2nd stage, a FIM-based D-optimal design is applied jointly for premise and conclusion parameters.
Using the initial TS model from the 1st stage, the partial derivatives required for the FIM can be computed
and N2 optimal design points Xopt can be determined. A ‘robust’ sequential design is used as the TS model
parameters from the first stage that are used for the optimal experiment design will possibly deviate from
the true parameters. Initialization strategies for the design points of the OED cycles are discussed in the
next subsection. A corresponding experiment is carried out providing for the new observations M2(1) = {(x_k, y_k)}.

Fig. 1. Proposed 2-stage optimal experiment design and model identification process. [Flow chart: Start → make design decisions (N1, N2, ν, nFCM, δgreedy); choose termination criteria for FCM, fmincon, greedy and sequential OED algorithms; choose initialization strategy for OED. Stage 1: LHS of N1 design points {x_k} (step 1) → experiment → N1 observations M1 = {(x_k, y_k)} → randomly initialized FCM and OLS, repeated nFCM times, select best model → initial model parameters {(v̆_i^(0), ă_i^(0))} (step 2) → nonlinear LSE (fmincon) → optimized model parameters {(v_i^(0), a_i^(0))} (step 3). Stage 2: initialize N2 design points for OED (step 0); l-th run: greedy OED → N2 design points {x_k} (step 4+2(l−1)) → experiment → N2 observations M2(l) = {(x_k, y_k)} → nonlinear LSE with M1 ∪ M2(1) ∪ … ∪ M2(l) and FIM update with the updated parameters {(v_i^(l), a_i^(l))} (step 5+2(l−1)); if not terminated, l ← l+1, else end.]

All available data M1 ∪ M2(1) are used to estimate a new TS model {(v_i^(1), a_{i,0}^(1), a_i^(1))}. The FIM is
recalculated with the updated TS model parameters to update the assessment of the information content of
the collected data. If the TS model parameters have significantly changed, another OED cycle is carried
out, which uses the updated model parameters to compute the FIM and the design points from the previ-
ous cycle for initialization. Again N2 optimal design points are determined, new observations M2(2) = {(x_k, y_k)} are collected in the experiment, and a new TS model is estimated using all available data M1 ∪ M2(1) ∪ M2(2). This procedure is repeated until a given number of design cycles has been carried out
and/or a convergence related criterion is met.
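The sequential loop of this subsection can be outlined as follows; the three callables are placeholders for the experiment, the nonlinear LSE and the greedy OED described in the text, so this is a structural sketch only:

```python
import numpy as np

def sequential_oed(run_experiment, fit_ts_model, optimize_design,
                   M1, X_init, max_cycles=5, tol=1e-3):
    """Outline of the stage-2 loop: design, experiment, re-estimate, repeat
    until the parameter estimate has (approximately) stopped changing."""
    data = list(M1)                              # all data collected so far
    theta = fit_ts_model(data)                   # initial TS model parameters
    X = X_init
    for _ in range(max_cycles):
        X = optimize_design(X, theta)            # D-optimal points for current theta
        data += run_experiment(X)                # collect M2^(l)
        theta_new = fit_ts_model(data)           # nonlinear LSE on all data
        converged = np.linalg.norm(theta_new - theta) < tol * (1 + np.linalg.norm(theta))
        theta = theta_new
        if converged:
            break
    return theta, data
```

The relative-change test stands in for the convergence-related termination criterion; a fixed number of design cycles is realized by `max_cycles`.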
To compute the FIM, partial derivatives (sensitivity functions) of the model output (6) with respect to all
membership function and local model parameters are required. Regarding the local model parameters one
obtains:
∂ŷ(x, Θ)/∂a_{i,j} |_{θ_{i,j}} = μ_i(x, Θ_MF) · x_j   (j ≠ 0) ,
∂ŷ(x, Θ)/∂a_{i,0} |_{θ_{i,0}} = μ_i(x, Θ_MF) · 1    (19)
For the membership function parameters vi,j:
∂ŷ(x, Θ)/∂v_{i,j} |_{θ_{i,j}} = Σ_{f=1}^{c} ŷ_LM,f(x, Θ_LM,f) · ∂μ_f(x, Θ_MF)/∂v_{i,j}    (20)
with

∂μ_i(x, Θ_MF)/∂v_{i,f} = [ 1 + Σ_{j≠i} (d_i²/d_j²)^{1/(ν−1)} ]⁻² · (1/(ν−1)) · Σ_{j≠i} (d_i²/d_j²)^{1/(ν−1)} · 2(x_f − v_{i,f})/d_i²    (21)
and for r ≠ i:

∂μ_i(x, Θ_MF)/∂v_{r,f} = −[ 1 + Σ_{j≠i} (d_i²/d_j²)^{1/(ν−1)} ]⁻² · (1/(ν−1)) · (d_i²/d_r²)^{1/(ν−1)} · 2(x_f − v_{r,f})/d_r²    (22)
can be derived for Euclidean distances

d_r² = ‖x − v_r‖²₂ .    (23)
Generalized expressions for inner product or Lp norms can be found in [39].
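The membership sensitivities (21)-(22) can be checked numerically against central finite differences; the following sketch (our own notation, Euclidean norm) illustrates this:

```python
import numpy as np

def mu(x, V, nu):
    """FCM memberships, eq. (4), Euclidean norm; V holds the centres v_i."""
    d2 = ((V - x) ** 2).sum(axis=1)
    return 1.0 / ((d2[:, None] / d2[None, :]) ** (1.0 / (nu - 1.0))).sum(axis=1)

def dmu_dv(x, V, nu, i, r, f):
    """Analytic sensitivity d mu_i / d v_{r,f} per eqs. (21)-(22)."""
    e = 1.0 / (nu - 1.0)
    d2 = ((V - x) ** 2).sum(axis=1)
    S = ((d2[i] / d2) ** e).sum()                # includes the j = i term (= 1)
    if r == i:                                   # eq. (21)
        return (S - 1.0) * e * 2.0 * (x[f] - V[i, f]) / d2[i] / S**2
    return -((d2[i] / d2[r]) ** e) * e * 2.0 * (x[f] - V[r, f]) / d2[r] / S**2  # eq. (22)
```

Agreement of the analytic and finite-difference values over all index combinations (i, r, f) is a useful sanity check before assembling the FIM.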
Given c local models and n input variables there are in total c · n membership function parameters (vi =
[vi,j], i = 1,…,c; j = 1,…,n) and c · (n + 1) local model parameters (ai,j, i = 1,…,c; j = 0,…,n). For a design
that covers both parameter groups, the FIM has dimension c(2n + 1) × c(2n + 1). It may not be well con-
ditioned because of the different character of the parameter groups. As the FIM has a block structure
I = (1/σ²) Σ_{k=1}^{N}
  [ (∂ŷ_k/∂a_{i,j})·(∂ŷ_k/∂a_{r,s})   (∂ŷ_k/∂a_{i,j})·(∂ŷ_k/∂v_{r,s}) ]
  [ (∂ŷ_k/∂v_{i,j})·(∂ŷ_k/∂a_{r,s})   (∂ŷ_k/∂v_{i,j})·(∂ŷ_k/∂v_{r,s}) ]
  =: [ A_{i,j}   B_{i,j} ]
     [ B_{i,j}^T  V_{i,j} ]    (24)
and is symmetrical to its secondary diagonal, det(I ) can be computed from the determinants of two lower-
dimensional matrices:
det(I) = det(A_{i,j}) · det( V_{i,j} − B_{i,j}^T A_{i,j}⁻¹ B_{i,j} ) .    (25)
As this comes at the cost of an additional matrix inversion, decomposition (25) is not used further in this
paper. As the TS model is linear in the local model parameters a_{i,j}, the derivatives ∂ŷ/∂a_{i,j} given in (19)
do not depend on the local model parameters. However, the TS model is nonlinear in the membership
function parameters v_{i,j}, such that the partial derivatives ∂ŷ/∂v_{i,j} given in (21)-(22) depend on the parameter
values, making the experiment design only locally optimal. The fuzziness parameter ν in (4) is typically
chosen small to obtain extended kernel regions of the membership functions (and narrow transition
zones), see [40], which supports the interpretation of the local models as approximate linearization of the
system. A side effect is, however, that the sensitivity functions significantly differ from zero only in small
regions, making the design/optimization problem hard to solve.
In the case studies very large values of det(I ) will be observed. These can be understood by recalling the
Leibniz formula, which computes the determinant of an (n × n)-square matrix from the sum over all per-
mutations of products of n matrix elements. Hence, as a rule of thumb, the determinant of a FIM can
be expected to increase with the magnitude of the matrix elements (i.e. the sensitivity functions evaluated
in N design points) themselves and with the size of the matrix (resulting from the number of parameters).
As the FIM is additive in all observations, just occasional large sensitivities can result in large values of
det(I ).
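This rule of thumb can be made concrete with a small numerical illustration (a generic linear-algebra fact, not specific to TS models; the sizes and random sensitivities below are assumed for illustration): scaling all sensitivity values by a factor of 10 multiplies the determinant of an n × n FIM by 10^(2n).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 31                              # illustrative number of model parameters
S = rng.normal(size=(100, n))       # 100 stacked sensitivity row vectors
I1 = S.T @ S                        # FIM-like matrix (sigma^2 = 1)
I2 = (10.0 * S).T @ (10.0 * S)      # all sensitivities 10x larger

# Compare log-determinants: det(I2) = 10^(2n) * det(I1)
s1, logdet1 = np.linalg.slogdet(I1)
s2, logdet2 = np.linalg.slogdet(I2)
```

With n = 31 the determinant grows by 62 orders of magnitude, which mirrors the very large det(I) values reported in the case studies.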
At first, Matlab™'s function fmincon was used for design point optimization. However, it worked
properly only in the first of the three case studies. In the other two, it always terminated after the first iteration.
This is attributed to the very small values of the minimized cost function det(I)⁻¹, on the order of e.g.
10⁻⁵⁰. Instead, a simple greedy algorithm was used for all three case studies for comparability of the results.
Assuming similar value ranges for each coordinate, it perturbs each design point coordinate by ±δ and
selects the change providing for maximal improvement. Once the changes for all design points have been
determined, all moves are simultaneously implemented and a new iteration starts. The procedure is re-
peated for all design points until changes are negligible. The step width δ should be chosen sufficiently
small compared with the range of the design space (see section 6.1). This simple approximated gradient
descent algorithm with fixed step is certainly not efficient, but effective enough to demonstrate the capa-
bility of the proposed design method.
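A hedged sketch of this greedy scheme follows; det_fim is a placeholder for the design criterion det(I(X)) assembled from the sensitivities (19)-(22), and the function name and interface are our own choices.

```python
import numpy as np

def greedy_cycle(X, det_fim, delta, bounds):
    # One cycle of the simple greedy optimizer: for every design point, try
    # +/-delta on each coordinate (clipped to the design space bounds),
    # remember the best move per point, then apply all moves simultaneously.
    base = det_fim(X)
    moves = np.zeros_like(X)
    for k in range(X.shape[0]):
        best_gain = 0.0
        for j in range(X.shape[1]):
            for step in (delta, -delta):
                Xc = X.copy()
                Xc[k, j] = np.clip(Xc[k, j] + step, bounds[j][0], bounds[j][1])
                gain = det_fim(Xc) - base
                if gain > best_gain:
                    best_gain = gain
                    moves[k] = Xc[k] - X[k]
    return X + moves
```

For a quick check one can plug in the D-optimality surrogate det(XᵀX) of a linear model; the cycle then pushes the design points outward and increases the criterion.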
5.3 Choice of initial design points for the optimal design
The used method for optimal experiment design converges locally such that the result depends on the ini-
tial values xk,0 of the design points. In case N2 = N1 holds, the design points of the space filling design car-
ried out in stage 1 can be used as initial values for the 2nd stage. If N2 ≠ N1, e.g. grid, random or LHS
placement can be used to obtain the initial values xk,0. Kroll and Dürrbaum [41] showed that design points
that are optimal regarding local model parameters are expected to lie on the inner partition boundaries or
on the range limitations. The latter will (with some abuse of denomination) be referred to as ‘outer parti-
tion boundaries’ in the sequel. Design points that are optimal regarding the prototypes are expected to lie
on the inner partition boundaries approximately where these are crossed by the connecting line between
two neighbouring prototypes [41]. The TS model resulting from the first stage can be used to compute
initial design points in regions where high information gain is expected. One approach is to distribute the
design points along the inner and outer partition boundaries. This can be implemented by applying a Vo-
ronoi decomposition given the set of partition centres. In case of 2 inputs, the Voronoi decomposition
consists of a set of lines and can be computed with the Matlab™ function voronoi. It provides for the
Voronoi cell bounds. The open outer cells are augmented with boundaries resulting from the design space
bounds providing for c bounded convex polytopes. Then, design points are placed one in each vertex of a
polytope and distributed equidistantly on each bounding line. The same number of m points is assigned to
each inner bounding line. On each design space bound, also m points are placed. Moreover, where an in-
ner bounding line is crossed by the connecting line between two neighbouring prototypes, an additional
design point is located. Finally, multiple identical design points (particularly in the vertices) are removed.
This method will be referred to as ‘Voronoi initialization’ in the sequel.
For more than two dimensions the polytope’s boundaries are (hyper-)planes and the Matlab™ function
voronoin has to be used. The latter, however, does not provide the required information to determine
the separating (hyper-)planes for the outer ‘unbound’ Voronoi cells. For that reason an approximation of
the Voronoi initialization is used, referred to as ‘µ0.5 initialization’: Points with membership of µ = 0.5 lie
‘in the middle of the transitions’ between neighbouring partitions. In fact, for ν → 1 the bounds of the
support of an α-cut for µ = 0.5 of a membership function equal the bounds of the corresponding Voronoi
cell (if Euclidean distance norm is used in both cases). For this reason, if N2 initial points are needed,
those N2 points from the LHS design having membership values closest to 0.5 are selected. This strategy
can certainly be refined e.g. by also considering the resulting distribution over the different inner bounda-
ries, but proved to work sufficiently well in the third case study. Fig. 2 illustrates the four initialization strategies.
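A minimal sketch of the µ0.5 selection rule (the function name and the candidate/prototype values are illustrative assumptions; the membership is the FCM-type function of (4), and candidates coinciding exactly with a prototype are assumed absent, since the membership is undefined there):

```python
import numpy as np

def mu05_init(X_lhs, V, nu, N2):
    # Select the N2 candidate points whose membership value lies closest
    # to 0.5, i.e. points 'in the middle of the transitions' between partitions.
    f = 1.0 / (nu - 1.0)
    d2 = ((X_lhs[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)     # (N, c)
    mu = 1.0 / ((d2[:, :, None] / d2[:, None, :]) ** f).sum(axis=2)  # (N, c)
    score = np.abs(mu - 0.5).min(axis=1)   # closest membership to 0.5 per point
    return X_lhs[np.argsort(score)[:N2]]
```

With two prototypes, for instance, this picks candidates near the perpendicular bisector between them, which approximates the Voronoi cell boundary.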
5.4 Selecting the number of design points for space-filling and optimal design
For a successful experiment design the number of design points should be chosen such that the system’s
nonlinearities are captured. This depends on the character of the targeted system: Systems with locally
strongly varying gradients require denser sampling than systems with smooth curvature of their response
surface. A rule of thumb for space filling designs is to choose N₁ = 10·dim(x). This rule
was analysed in detail in [45] up to 20 dimensions for Latin Hypercube Designs in context of Gaussian
process (GP) models that are parametrized using maximum likelihood estimation. The conclusion was
that this rule “will provide reasonable prediction accuracy for ‘tractable’ functions and is sufficient to di-
agnose more difficult problems” and that “it is a reasonable rule of the thumb for an initial experiment.”
For a D-optimal design, another rule of thumb suggests to use N₂ = 1.5·dim(Θ) [50].
The analysis for N₁ in [45] refers to GP models having one parameter per dimension plus mean and variance:
dim(Θ) = dim(x) + 2. In contrast, with dim(Θ) = c·(2·dim(x) + 1), TS models have significantly
more parameters, see section 4.2. Hence it can be expected (and will be evident in the case studies) that at
least for TS models with many local models the proposed value of N₁ will be too small. While the rule
worked for the simplest case study, which came with a small number of c = 3 partitions, in general c to 2c
times the number proposed by the aforementioned rule provided suitable results in the 1st stage, dependent
on the complexity of the system characteristics. The rule for N₂ worked well in all case studies in section 6.
It has to be remarked, however, that in the 2nd stage also the design points from the 1st stage are
used for parameter estimation.
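The two rules of thumb can be sketched in a few lines (the ceiling rounding and the helper names are our own assumptions; the paper does not state how non-integer N₂ values are handled):

```python
import math

def n1_rule(dim_x):
    # Space-filling stage: N1 = 10 * dim(x)
    return 10 * dim_x

def dim_theta_ts(c, dim_x):
    # TS model: c local models, each with dim(x)+1 affine parameters
    # and dim(x) partition (prototype) parameters -> c * (2*dim(x) + 1)
    return c * (2 * dim_x + 1)

def n2_rule(c, dim_x):
    # D-optimal stage: N2 = 1.5 * dim(Theta), rounded up
    return math.ceil(1.5 * dim_theta_ts(c, dim_x))
```

For the Friedman case study (c = 6, dim(x) = 5) this gives dim(Θ) = 66 and N₂ = 99, close to the N₂ = 100 used in section 6.5.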
6. CASE STUDIES
Fig. 2. Illustration of the initialization strategies: LHS (a) or grid (b) placement of N = 25 points, Voronoi with
m = 5 (c) and µ0.5 with N2 = 10 (d).
Three case studies on experiment design and nonlinear regression were selected: In the first study the true
system is a 2-input TS system and therefore within the model class. In the second study the characteristic
map of an industrial axial compressor has to be approximated. The response surface of this challenging
system has the shape of a steep bent cliff with locally varying curvature, embedded between two flat or
nearly flat plateaus. The narrow cliff area must be well covered by design points to obtain a good
TS model. In the third study, the Friedman 1 test function is approximated. Having a 5-dimensional de-
sign space, this benchmark problem is higher dimensional. In order to assess the improvement of the pro-
posed OED method, the results can be compared with the ones from the space filling design carried out in
the first stage.
6.1 Design choices and implementation
For a statistical assessment of the methods, in the first stage 20 different LHS data sets were generated
for each choice of N₁. The FCM was repeated n_FCM = 20 times with different random initializations. The
toolbox default settings on FCM termination were taken over, i.e. a maximum of 100 iterations or a J_FCM
change of less than 10⁻⁵. LLS and GLS have both been used in the case studies, but for the sake of
compactness only the GLS results are reported, which in general provided for smaller prediction errors.
The Matlab function fmincon can draw on different optimization algorithms, of which SQP was chosen.
Some changes in the default settings improved the performance (ConstraintTolerance: 1e-10
(default: 1e-6), MaxFunctionEvaluations: 100000 (default: 100·n_Var), MaxIterations: 1000 (default:
400), OptimalityTolerance: 1e-10 (default: 1e-6), StepTolerance: 1e-10 (default: 1e-6)). Given
design spaces with value ranges of 1 to 2, the step width of the greedy algorithm was chosen as δgreedy =
0.05 as a compromise between resolution and computational burden. The greedy algorithm is terminated if
det(I) has not improved within 15 consecutive iterations or if 500 iterations over all N₂ design points have
been carried out. A fixed number of two OED cycles was used as termination criterion for the 2nd stage, as
the changes after the first OED cycle were in most cases insignificant. The case studies were carried out using
Intel Xeon PCs with 3.4 GHz, 4 core/8 threads, 64 GB RAM, Windows 7 64 bit, Matlab™ R2016a 64bit
using the Parallelized Optimization Toolbox.
6.2 Assessment criteria
As criterion for the model performance, the root mean squared error (RMSE)

$$J_{\mathrm{RMSE}} := \sqrt{\frac{1}{N_l}\sum_{k=1}^{N_l}\big(y(k) - \hat{y}(k)\big)^2} \qquad (26)$$
determined for a uniform grid of test points in the input space will be used and referred to as J_grid. It is
easier to interpret than the MSE, as it is in the physical unit of the dependent variable. The outermost points
are placed at the limits of the chosen design range (x_{j,min}, x_{j,max}), and the remaining points are equally distributed
in between. Noise-free observations are used for the test data set to ease interpretation of the RMSE
values. If the system is in the model class, as in the first case study, the error of the parameter estimates is
also assessed. As parameter confidence/uncertainty related measure, the determinant of the FIM on the
identification data is used. As the partial derivatives related to the local model and partition parameters have
different magnitudes, not just the FIM for all parameters (det(I), referred to as 'det(I_av)') will be studied,
but also the submatrices in (24), i.e. A_{i,j} related to the local model parameters (a_{i,0}, a_i) (det(A_{i,j}) referred
to as 'det(I_a)') and V_{i,j} related to the partition parameters v_i (det(V_{i,j}) referred to as 'det(I_v)').
The values after the FIM update with the updated TS model parameters will be reported. For a statistical
assessment, the criteria’s mean is reported in the following and additional criteria (incl. the computational
effort) are given in the appendix.
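As a minimal sketch of criterion (26) (the function name is our own illustrative choice):

```python
import numpy as np

def j_rmse(y_true, y_pred):
    # Root mean squared error (26) over the N_l test points
    e = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))
```

Evaluated on the uniform test grid, this yields the J_grid values reported in the tables below.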
6.3 TS system
In this case study, data is generated using a TS system such that the true system is known and is within
the model class. It consists of three subsystems of similar size and compact shape, such that it is easy to
sample the system's response surface and an initial partitioning using the FCM will not be difficult. Being
a (2 × 1)-system, results can easily be visualized. The true local models are:
$$y_1 = -4 + 4x_1 - 2x_2 = [x_1;\, x_2;\, 1]^T\!\cdot\Theta_1\,,\quad y_2 = 4 - 2x_1 - 4x_2 = [x_1;\, x_2;\, 1]^T\!\cdot\Theta_2\,,\quad y_3 = 2x_1 + x_2 + 1 = [x_1;\, x_2;\, 1]^T\!\cdot\Theta_3\,, \qquad (27)$$
and the true membership functions (4) have centres in:
$$\mathbf{v}_1 = [0.5;\, 0.5]^T,\quad \mathbf{v}_2 = [0.5;\, 1.5]^T,\quad \mathbf{v}_3 = [1.5;\, 1]^T \qquad (28)$$
and use the Euclidean distance. The admissible range of the inputs is constrained to x_j ∈ [0; 2]. As test data set,
a grid of 50 × 50 points with the corresponding observations is used. The fuzziness parameter is chosen as
ν = 1.3. The true system is shown in Fig. 3. It is assumed that the number of local models is known (c =
3). N₁ = N₂ = 25 is chosen to identify the 15 model parameters, i.e. the OED in the 2nd stage is initialized
with the LHS design points from the first stage. The noise-free case and noisy data with σ² ∈ {0.2, 0.5,
1.0} are considered. Other design choices (e.g. other values of ν) were studied in [14]. The results for 20
different LHS data sets are recorded in Tables 1 to 3.
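For reference, the data-generating TS system can be sketched as follows (the coefficient layout mirrors (27) and the prototypes (28); the helper name is illustrative, and noisy observations would add d(k) ~ N(0, σ²) on top):

```python
import numpy as np

# Local model coefficients [a_i1, a_i2, a_i0] as in (27), prototypes as in (28)
A = np.array([[ 4.0, -2.0, -4.0],
              [-2.0, -4.0,  4.0],
              [ 2.0,  1.0,  1.0]])
V = np.array([[0.5, 0.5],
              [0.5, 1.5],
              [1.5, 1.0]])

def ts_output(x, A, V, nu=1.3):
    # TS model: weighted superposition of affine local models with
    # FCM-type memberships (4) and squared Euclidean distances
    d2 = np.sum((V - x) ** 2, axis=1)
    f = 1.0 / (nu - 1.0)
    mu = 1.0 / np.sum((d2[:, None] / d2[None, :]) ** f, axis=1)
    return float(mu @ (A[:, :2] @ x + A[:, 2]))
```

Close to a prototype, the corresponding membership approaches one, so the TS output approaches the respective local model.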
Fig. 3. True local models (a), true MF (b), and true TS system graph (c).
Tables 1 and 2 show that the approximation error declines successively along the algorithm steps, though
only slightly in the 2nd stage. Table 3 shows that the parametric uncertainty, however, is significantly reduced
by the 2nd stage. Note that the FIM is based on 50 observations after steps 2 and 3, respectively, on 75
after step 5 and on 100 after step 7. The first column (σ² = 0) of Tables 1 and 2 shows that the true system is
ideally recovered in the noise-free case. In case of noisy data the model error increases noticeably. This
comes as no surprise due to the small size of the identification data set compared to the number of model
parameters. The model quality can be improved at the cost of using more design points, see [14].
In Fig. 4 the course of the algorithm is illustrated for an example run (noisy observations with σ² = 0.2). In
this example, the partitioning resulting from the FCM significantly deviates from the true one, which can
be seen in the Voronoi decomposition. This dilutes the local model estimates computed by the GLS. The
Table 1: Median of RMSE on Test Data for Different Noise Levels (TS System)
Median Jgrid
Result after step | σ² = 0 | σ² = 0.2 | σ² = 0.5 | σ² = 1.0
2 | 1.0504·10⁰ | 1.1072·10⁰ | 1.2104·10⁰ | 1.5786
3 | 7.1462·10⁻⁷ | 2.7456·10⁻¹ | 6.6154·10⁻¹ | 1.2693
5 | 2.6699·10⁻⁷ | 2.0532·10⁻¹ | 5.3672·10⁻¹ | 1.0612
7 | 2.1609·10⁻⁷ | 1.9698·10⁻¹ | 5.2050·10⁻¹ | 1.0282
Table 2: Median of Frobenius Norm of Parameter Error for Different Noise Levels (TS System)
Median ‖Θ − Θ₀‖₂
Result after step | σ² = 0 | σ² = 0.2 | σ² = 0.5 | σ² = 1.0
2 | 4.6524·10⁰ | 4.6215·10⁰ | 4.5393·10⁰ | 5.8597
3 | 2.0539·10⁻⁶ | 8.9439·10⁻¹ | 2.2061·10⁰ | 4.9549
5 | 1.1800·10⁻⁶ | 4.0161·10⁻¹ | 1.0606·10⁰ | 2.4260
7 | 1.1528·10⁻⁶ | 2.6854·10⁻¹ | 6.7917·10⁻¹ | 1.5992
Table 3: Median of Determinants of Full Information Matrix for Different Noise Levels (TS System)
Median det(I_av)
Result after step | σ² = 0 | σ² = 0.2 | σ² = 0.5 | σ² = 1.0
2 | 7.7467·10¹⁶ | 9.9911·10¹⁶ | 9.7608·10¹⁶ | 2.0528·10¹⁷
3 | 1.9760·10¹⁷ | 2.7578·10¹⁷ | 6.1872·10¹⁷ | 1.6657·10¹⁸
5 | 6.9534·10²⁶ | 5.8458·10²⁶ | 2.7993·10²⁶ | 3.0188·10²⁶
7 | 3.7158·10³⁵ | 2.6184·10³⁵ | 1.3573·10³⁵ | 4.4041·10³⁴
nonlinear optimization of partition and model parameters in step 3 significantly improves the model: The
optimized prototypes (○) approximately match the true ones and so do the local model parameters. As
expected, the first OED cycle moves the design points towards the partition boundaries. The approxima-
tion error on the test data set reduces insignificantly, but the uncertainty of the estimates is significantly
reduced, as det(I) increases by several orders of magnitude. The second OED cycle further reduces the uncertainty.
However, the design points have changed little, such that from a cost point of view the first OED cycle
would have been sufficient.
1st stage: results of FCM and GLS (top row) and of the optimization (bottom row):
Jgrid|GLS = 1.278, Jgrid|opt = 0.248, det(I)|GLS = 1.87·10¹⁸, det(I)|opt = 6.57·10¹⁷
Panels: LHS data | Voronoi dec. after FCM | MF from FCM | Local models from GLS | TS model
Panels: Voronoi dec. after opt. | MF after opt. | Local models after opt. | TS model after opt.
1st OED cycle: Jgrid = 0.212, det(I)|opt = 6.70·10²⁶
Panels: Optimal design points | Voronoi decomposition | MF | Local models | TS model
2nd OED cycle: Jgrid = 0.208, det(I)|opt = 1.23·10³⁰
Panels: Optimal design points | Voronoi decomposition | MF | Local models | TS model
Fig. 4. Example design process (TS system)
6.4 Axial compressor characteristic map
The second case study addresses estimating the characteristic map of a single-stage axial compressor
(NASA CR-72694) [18]. Fig. 5 shows the reference map. The true system is not within the model class.
This problem is difficult for experiment design, as most of the response surface is flat and does not provide
any information on choosing the membership function parameters. The informative area is a tiny
belt which is rarely sampled by a space filling design, such that a comparatively high number of design
points is required in the 1st stage. Secondly, a mismatch of the parameters used for the OED can yield design
points, which lie outside the informative area. The cliff-type response surface in Fig. 5 is bent horizontally
and vertically and has locally varying curvature, making an approximation with locally-affine TS
models difficult. The data for this case study was generated by a tool for compressor maps [43]. With this
tool (and some post-processing) a database of N = 56,248 observations of the input/output behaviour of
the compressor was generated. An output value of 0 was assigned to inputs in the surge area such that the
“cliff” bends crisply at the bottom but softly at the top. While Fig. 5 shows the transfer characteristics in
physical units, for the case study each signal was scaled to the unit interval to avoid scale-effects. As the
tool [43] is not a simulator that can provide an output for an arbitrarily chosen input vector, whenever
a design point is calculated, the computed point is replaced by its (Euclidean) nearest neighbour
in the database. Different values of c ∈ {4, 5, 6} and ν ∈ {1.05, 1.1, 1.2, 1.5, 1.8} were examined.
The choice of c = 6 and ν = 1.2 was found to be well suited for the given problem (see also [42]) and was
used for the following results. The TS model has 30 model parameters. The test data set was produced
using a grid of 50 × 50 points.
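The nearest-neighbour replacement of computed design points can be sketched in a few lines (the function name and the toy database are illustrative assumptions):

```python
import numpy as np

def nearest_in_database(x_design, X_db):
    # Replace a computed design point by its Euclidean nearest neighbour
    # in the database of pre-computed input/output observations
    d2 = np.sum((X_db - x_design) ** 2, axis=1)
    return X_db[np.argmin(d2)]
```

This keeps the OED loop unchanged while restricting evaluations to inputs for which an output actually exists in the database.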
The design procedure was tested with different numbers of design points N1 ∈ {200, 250, 500, 750} for
the LHS design in the first stage. For the case of noise-free observations, the resulting model performance
Fig. 5. Characteristic map of single-stage axial compressor NASA CR-72694 based on data from [18].
is shown in Fig. 6: For each value of N1 20 different LHS data sets were generated. For each data set the
FCM was applied 20 times each with a different random initialization. The model parameters were com-
puted with the GLS and the best of the 20 models was used for the optimization in step 3. Fig. 6 shows
the RMSE on the test data set. As the slope between the two plateaus of the map is steep and narrow, a
sufficiently large number of design points is required to cover it. In case of N₁ = 200, about 13 to 18 of
the 20 LHS data sets result in below-average model quality on the test data set after step 3; for N₁ = 250
only one does. If N₁ is increased further, all models perform well, but little improvement results. The same is
observed for noisy data. Therefore, N₁ = 250 will be used for the first stage in the sequel.
The LHS design points from the 1st stage are used to initialize the optimal experiment design in case of N2
= N1. For N2 ≠ N1 the design points were initialized as regular grid, randomly with N2 ∈ {49, 81, 121,
169, 225}, or by using the Voronoi initialization with m ∈ {4, 5, 6} points per line. The noise-free case
and noisy data with σ² ∈ {0.2, 0.5, 1.0} are considered. Tables 4 and 5 record results for a Voronoi
initialization with m = 4, which provides approximately N₂ = 40 design points (after identical points have
been removed). The other initialization methods provided very similar results, and a larger value of m
did not result in better models, such that for the sake of compactness those results are not recorded in this
paper. The OED cycles negligibly reduce the prediction error, but significantly reduce the model uncertainty.
Note that I is based on 250 observations after steps 2 and 3, respectively, on 290 after step 5 and 330
after step 7. Tables 4 and 5 also show that noisy observations significantly impair the resulting model quality.
This can be countered by increasing the number of observations used for parameter estimation.
For the same design choices, the initialization dependency of the results is illustrated by Fig. 7: Fig. 7a)
shows how the results vary for 20 different LHS data sets and for 20 random initializations of the FCM
for noisy observations (σ² = 0.2). The best model after 20 applications of FCM and GLS in steps 1 and 2,
Fig. 6. Error on test data set after step 2 (–) and step 3 (––) for N₁ = 200 (×); N₁ = 250 (*); N₁ = 500 (○); N₁ = 750 (◊).
respectively, for each of the 20 LHS data sets is taken over (Fig. 7b) for optimization in step 3 (Fig. 7c)
and then initializes the OED stage. Two OED cycles are carried out (Fig. 7d,e). Therefore, Fig. 7a) shows
results for 400 models and Fig. 7b-e) each for 20. Fig. 7 shows that the error on the test data set varies
little with changing FCM initialization. On the contrary, the error varies significantly with the used LHS
data set. LHS data set 19 results in by far the worst model, which, however, becomes the 6th best model
after step 3 and matches the median after the OED stage. The optimization in step 3 mitigates the
effect of disadvantageous initializations.
Table 4: Median of RMSE on Test Data for Different Noise Levels (Compressor)
Median Jgrid
Result after step | σ² = 0 | σ² = 0.2 | σ² = 0.5 | σ² = 1.0
2 | 1.4327·10⁻¹ | 2.3918·10⁻¹ | 5.2824·10⁻¹ | 1.0380·10⁰
3 | 6.3405·10⁻² | 2.3061·10⁻¹ | 5.5808·10⁻¹ | 1.1390·10⁰
5 | 5.9154·10⁻² | 2.2101·10⁻¹ | 5.3274·10⁻¹ | 1.0596·10⁰
7 | 5.9876·10⁻² | 2.1932·10⁻¹ | 5.2832·10⁻¹ | 1.0393·10⁰
Table 5: Median of Determinants of Full Information Matrix for Different Noise Levels (Compressor)
Median det(I_av)
Result after step | σ² = 0 | σ² = 0.2 | σ² = 0.5 | σ² = 1.0
2 | 1.2957·10¹⁷ | 1.2264·10¹⁴ | 5.9883·10¹⁶ | 1.9512·10²³
3 | 2.8777·10¹² | 3.8980·10¹⁵ | 2.7135·10²⁵ | 5.4594·10³²
5 | 9.2080·10¹⁶ | 5.2694·10¹⁷ | 2.2865·10³⁰ | 3.8234·10³¹
7 | 5.7240·10¹⁹ | 2.9885·10²⁰ | 7.2423·10³³ | 1.3206·10³⁵
Fig. 7. Dependency on used LHS data set and FCM initialization illustrated by RMSE on test data set: a) for
20 random initializations of FCM and GLS (step 1+2) for each of the 20 LHS data sets, b) for best model after
step 1+2 for each LHS data set, c) if best model of steps 1+2 is optimized (step 3), and after the first (d) and
the second OED cycle (e) (···: median) (Compressor)
In Fig. 8 the results of an exemplary design run with noise-free data are shown. The FCM does not provide
a good partitioning, and correspondingly bad are the local models estimated by the GLS. After the
optimization in step 3, the partitioning approximates the shape of the map well, and so do the local models.
For simplicity, the local models are shown within the bounds of the corresponding Voronoi decomposi-
tion. The resulting model already matches the reference map in Fig. 5 well. The OED cycles steer the design
points to the edges of the partitions or to the crossing points of partition boundaries and the connecting
lines between neighbouring prototypes, as expected. As in the first case study, the approximation error on the
1st stage: results of FCM and GLS (top row) and of the optimization in step 3 (bottom row):
Jgrid|GLS = 0.123, Jgrid|opt = 0.055, det(I)|GLS = 6.58·10¹⁶, det(I)|opt = 2.93·10¹⁶
Panels: LHS data | Voronoi dec. after FCM | MF from FCM | Local models from GLS | TS model
Panels: Voronoi dec. after opt. | MF after opt. | Local models after opt. | TS model after opt.
1st OED cycle: Jgrid = 0.047, det(I)|opt = 4.57·10²³
Panels: Optimal design points | Voronoi decomposition | MF | Local models | TS model
2nd OED cycle: Jgrid = 0.046, det(I)|opt = 6.89·10²⁶
Panels: Optimal design points | Voronoi decomposition | MF | Local models | TS model
Fig. 8. Example design process (compressor)
test data set reduces insignificantly, but the parametric uncertainty reduces significantly. Again, from a
cost point of view, a single OED cycle would have been sufficient.
6.5 Friedman test function 1
The Friedman test function 1 [22,23]
$$f(\mathbf{x}) = 10\,\sin(\pi x_1 x_2) + 20\,(x_3 - 0.5)^2 + 10\,x_4 + 5\,x_5 \qquad (29)$$
with x_i ∈ [0; 1] (i.e., the design space is the unit (hyper-)cube) is a benchmark problem for nonlinear regression.
With 5 variables, it is a higher-dimensional problem where grid-type designs are not applicable due
to the exponential increase of the number of design points with the design space dimension. The system
is not within the model class. The cases of noise-free (d(k) = 0) and noisy observations with i.i.d. normally
distributed random variables d(k) with zero mean and variance σ² = 4.87664² as in [7] will be considered.
In the original problem description, N = 200 points were used that are uniformly distributed in [0; 1]⁵. The
sample size, however, varies from work to work (e.g. from 10² to 10⁶). In this case study, N₁ = 250 design
points will be used in the following. Then N₂ ∈ {50, 100, 150, 200, 250} design points are considered. A
test data set is produced using a grid of 21⁵ ≈ 4·10⁶ points as a compromise between grid granularity and
the resulting number of function evaluations. Different values of c ∈ {3, 4, 5, 6, 7, 8, 9, 10} and ν ∈ {1.05,
1.1, 1.2, 1.5, 1.8} were examined. One suitable choice is c = 6 and ν = 1.2. It will be used in the following
for the sake of compactness.
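The Friedman 1 function itself is straightforward to reproduce (a hedged sketch; observe is an illustrative helper for generating noisy observations, not part of the original benchmark definition):

```python
import numpy as np

def friedman1(X):
    # f(x) = 10 sin(pi x1 x2) + 20 (x3 - 0.5)^2 + 10 x4 + 5 x5 on [0, 1]^5
    X = np.atleast_2d(X)
    return (10.0 * np.sin(np.pi * X[:, 0] * X[:, 1])
            + 20.0 * (X[:, 2] - 0.5) ** 2
            + 10.0 * X[:, 3]
            + 5.0 * X[:, 4])

def observe(X, sigma2=0.0, rng=None):
    # Noisy observations y = f(x) + d, d ~ N(0, sigma2) i.i.d.
    if rng is None:
        rng = np.random.default_rng()
    X = np.atleast_2d(X)
    return friedman1(X) + rng.normal(0.0, np.sqrt(sigma2), size=X.shape[0])
```

Sampling X uniformly (e.g. by LHS) in the unit hypercube then reproduces the data-generating setup described above.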
Selected results from 20 repeated experiments regarding the impact of the noise level are recorded in
Tables 6 and 7. N₁ = 250 design points were placed by LHS in step 1. N₂ = 100 design points were used for
the OED cycles and µ0.5-initialization was applied. Tables 6 and 7 show that the prediction error successively
reduces along the algorithm steps. The OED cycles insignificantly reduce the approximation error,
but significantly reduce the uncertainty. Note that I is based on 250 observations after steps 2 and 3, respectively,
on 350 after step 5 and on 450 after step 7. Compared with the compressor case study, an
increasing noise level just moderately increases the prediction error. Another observation is that the 2-input
compressor system with the cliff-type response surface needs the same 250 design points in the 1st stage
as the 5-input Friedman test function to work properly.
Tables 8 and 9 present results for different initialization strategies in step 4 (noise-free observations). For
higher-dimensional problems, a grid-type initialization is hardly suitable: just 3 points per dimension
already result in N₂ = 243 design points (while 2 points per dimension provide only N₂ = 32). In this study,
however, N₂ = 100 design points are sufficient (but N₂ = 50 are too few). Due to the exponential growth
with the number of variables, grid initialization does not scale well. The results for random initialization and for
µ0.5-initialization using the same number of design points N2 are similar regarding the approximation er-
ror. However, regarding uncertainty reduction, the µ0.5-initialization provides for significantly better re-
sults after the first OED cycle. This is advantageous if due to time or cost reasons only a single OED run
is carried out.
Table 6: Median of RMSE on Test Data for Different Noise Levels and µ0.5-Initialization (Friedman)
Median Jgrid
Result after step | σ² = 0 | σ² = 0.2 | σ² = 0.5 | σ² = 1.0
2 | 2.1336·10⁰ | 2.1570·10⁰ | 2.1999·10⁰ | 2.3932·10⁰
3 | 7.3723·10⁻¹ | 7.9404·10⁻¹ | 1.0652·10⁰ | 1.4890·10⁰
5 | 6.8497·10⁻¹ | 6.8290·10⁻¹ | 8.7728·10⁻¹ | 1.3345·10⁰
7 | 6.6146·10⁻¹ | 7.0122·10⁻¹ | 8.5989·10⁻¹ | 1.2849·10⁰
Table 7: Median of Determinants of Full Information Matrix for Different Noise Levels and µ0.5-Initialization (Friedman)
Median det(I_av)
Result after step | σ² = 0 | σ² = 0.2 | σ² = 0.5 | σ² = 1.0
2 | 2.0843·10⁹¹ | 7.9850·10⁹⁰ | 5.3806·10⁸⁹ | 6.4159·10⁸⁹
3 | 7.3674·10⁷⁵ | 3.0443·10⁷³ | 2.6882·10⁷² | 1.0120·10⁸¹
5 | 7.7835·10⁹³ | 3.1339·10⁹² | 2.1024·10⁹³ | 1.3877·10⁹³
7 | 8.0735·10¹⁰⁶ | 8.7316·10¹⁰⁹ | 6.2212·10⁹⁶ | 2.8957·10¹⁰²
Table 8: Median of RMSE on Test Data for Different Sample Sizes N₂ for Random, Grid and µ0.5 Initialization (Friedman)
Median Jgrid
Result after step | Random: N₂ = 100 | N₂ = 150 | N₂ = 200 | N₂ = 250 | Grid: N₂ = 243 | µ0.5: N₂ = 50 | N₂ = 100 | N₂ = 150 | N₂ = 200 | N₂ = 250
2 | 2.1073 | 2.1311 | 2.1311 | 2.1336 | 2.1336 | 2.1217 | 2.1217 | 2.1217 | 2.1217 | 2.1217
3 | 0.7083 | 0.7555 | 0.7274 | 2.1336 | 0.7083 | 1.1379 | 1.1379 | 1.1379 | 1.1379 | 1.1379
5 | 0.7087 | 0.7555 | 0.7232 | 0.7526 | 0.7511 | 1.0602 | 0.78052 | 0.7630 | 0.64826 | 1.1523
7 | 0.6679 | 0.6862 | 0.7233 | 0.7114 | 0.7171 | 1.0226 | 0.6074 | 0.7159 | 0.5975 | 0.6910
The approximation quality is visualized for an example run in Fig. 10. The predictions are plotted over
the true values after steps 3, 5 and 7, for noisy observations with σ² = 0.5. By the OED in the 2nd stage, the
performance on the test data set could be significantly improved. It is remarked that OED design points
are often placed in "difficult" regions, which is good for obtaining a reliable model but dilutes the average
Table 9: Median of Determinants of Full Information Matrix for Different Sample Sizes N₂ for Random, Grid and µ0.5 Initialization (Friedman)
Median det(I_av)
Result after step | Random: N₂ = 100 | N₂ = 150 | N₂ = 200 | N₂ = 250 | Grid: N₂ = 243 | µ0.5: N₂ = 50 | N₂ = 100 | N₂ = 150 | N₂ = 200 | N₂ = 250
4+5 | 4.1387·10⁹¹ | 5.0697·10⁹⁸ | 5.9350·10⁹⁹ | 4.0994·10¹⁰⁹ | 5.7916·10¹⁰⁹ | 1.2·10⁸¹ | 1.5·10¹⁰⁵ | 5.5·10¹¹⁷ | 4.0·10¹²⁶ | 2.3·10¹³¹
6+7 | 1.9232·10¹⁰² | 3.8110·10¹¹⁵ | 4.4677·10¹²² | 1.1151·10¹²⁸ | 4.2933·10¹²⁷ | 6.8·10⁸⁶ | 1.7·10⁹⁴ | 6.0·10¹²⁶ | 5.2·10¹²¹ | 1.1·10¹⁵⁴
Fig. 10. Predictions vs. true values for the model after step 3 (top), step 5 (middle), and step 7 (bottom), for
identification data (left) and test data (right), in case of noisy data (σ² = 0.5) (Friedman).
numerical performance of a model. Note that due to the large number (4·106) of test points the point cloud
appears as a continuum.
The presented results are better than results found in the literature: Meyer et al. [47] generated 100 training data sets with 200 samples each and 100 test data sets with 1000 samples each. They compared the mean of the mean squared test data set errors for 10 different regression methods. Their MSEs range between 3.22 and 11.87, i.e. the RMSEs range between 1.79 and 3.45. Binev et al. [7] compared different approximators. They generated uniformly distributed data sets: a test data set with 10^6 points and training data sets with 10^j, j ∈ {3, 4, 5, 6}, points, respectively. In the case of 10^3 training points, they obtained RMSEs ranging between 1.38 and 6 on the test data set.
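For orientation, the quoted RMSE values follow from the reported MSEs via RMSE = √MSE, and the Friedman #1 test function itself is given in closed form in [22]. The following sketch is illustrative only; the function and variable names are ours, not from the cited works:

```python
import math

def friedman1(x):
    """Friedman #1 test function [22]: five active inputs x[0..4] in [0, 1]."""
    return (10.0 * math.sin(math.pi * x[0] * x[1])
            + 20.0 * (x[2] - 0.5) ** 2
            + 10.0 * x[3]
            + 5.0 * x[4])

# RMSE = sqrt(MSE): converting the MSE range 3.22 .. 11.87 reported in [47]
rmse_lo, rmse_hi = math.sqrt(3.22), math.sqrt(11.87)
print(f"RMSE range: {rmse_lo:.2f} .. {rmse_hi:.2f}")  # → 1.79 .. 3.45
```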
7. CONCLUSIONS AND OUTLOOK
This article addressed optimal experiment design for identifying both the local model and the partition parameters of Takagi-Sugeno models for nonlinear regression problems. Optimal experiment design requires making assumptions on the unknown parameter values in order to calculate the parameter sensitivities, where even small deviations of the partition parameters may cause identification failure. To handle this problem, a model-free design phase (in this work: optimized Latin Hypercube Sampling) is used to make structural decisions and to estimate an initial model of sufficient quality for the subsequent optimal model-based experiment design. To cope with the generally remaining parametric error, a robust sequential optimal design procedure is applied.
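As an illustration of such a model-free first stage, the following is a minimal random-restart maximin Latin Hypercube sketch. It is a simplified stand-in for the extended deterministic local search of [15] actually used in this work; all function names are hypothetical:

```python
import random

def latin_hypercube(n, dim, rng):
    """One LHS: per dimension, each of the n points occupies a distinct interval."""
    cols = []
    for _ in range(dim):
        perm = list(range(n))
        rng.shuffle(perm)
        # place each point at the center of its interval in [0, 1]
        cols.append([(k + 0.5) / n for k in perm])
    return list(zip(*cols))

def min_pairwise_dist(pts):
    """Smallest Euclidean distance between any two design points."""
    return min(sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
               for i, p in enumerate(pts) for q in pts[i + 1:])

def maximin_lhs(n, dim, restarts=50, seed=0):
    """Keep the LHS whose minimum pairwise distance is largest (maximin)."""
    rng = random.Random(seed)
    return max((latin_hypercube(n, dim, rng) for _ in range(restarts)),
               key=min_pairwise_dist)

design = maximin_lhs(n=10, dim=2)
```

Each candidate design is a valid Latin hypercube by construction; the random restarts only improve the space-filling (maximin) property.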
The case studies showed that it is essential to obtain a model of sufficient quality, particularly regarding the partitioning, in the model-free design phase in order to initialize the optimal design stage. For this, the numerical optimization of the model parameters resulting from Fuzzy-c-Means clustering and ordinary least squares was essential. Then, in the case studies, just one or two optimal design cycles were sufficient to significantly reduce the parametric uncertainty. Therefore, for problems with little a-priori knowledge of the nonlinearities, the authors recommend using a larger number of design points in the first design stage. For the optimal design stage, a smaller number of design points was sufficient; for practical purposes with limited resources, a single OED cycle may often suffice. Exploiting the TS model structure, the design points can be initialized at selected locations on the partition/design-space boundaries to simplify the OED problem.
Different research opportunities remain for the future. First, the presented methodology can be improved: criteria other than the widespread standard alphabetic criteria can be investigated to better assess the suitability of a design, as already proposed in [21] in the context of physical models. This includes multi-criteria assessment/optimization approaches. The FIMs were often observed to be approximately ill-conditioned, in part due to the two different groups of parameters (local model vs. partition parameters); the application of scaling and of separate designs for parameter subgroups can be considered here. Certainly, more efficient optimization methods than the approximated gradient descent can be deployed. A different area of future research is OED to identify TS models for nonlinear dynamic systems, which requires substantial changes due to the additional temporal constraints and the issue of noise modelling. Finally, experiments with laboratory test stands would provide more realistic observations regarding disturbances.
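The role of the design points in the D-optimal criterion can be illustrated for the simplest special case: a single affine model y = θ0 + θ1·u with unit noise variance and regressor ψ(u) = [1, u]ᵀ, for which the average information matrix is Iav = (1/N)·Σ ψ(u_k)ψ(u_k)ᵀ. The sketch below (pure Python, 2×2 case; not the TS-model FIM of this paper) shows why a D-optimal design pushes points to the boundaries:

```python
def fim_2x2(inputs):
    """Average Fisher information for y = th0 + th1*u with psi(u) = [1, u]."""
    n = len(inputs)
    s1 = sum(inputs) / n                 # mean of u
    s2 = sum(u * u for u in inputs) / n  # mean of u^2
    return [[1.0, s1], [s1, s2]]

def det_2x2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# On u in [-1, 1], the D-optimal design places half the points at each boundary;
# here det(Iav) equals the empirical variance of u.
clustered = [0.1, 0.2, 0.3, 0.4]
d_optimal = [-1.0, -1.0, 1.0, 1.0]
print(det_2x2(fim_2x2(clustered)), det_2x2(fim_2x2(d_optimal)))
# the boundary design yields a much larger determinant
```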
ACKNOWLEDGEMENTS
The authors thank the reviewers for their constructive comments that helped to significantly improve the
paper.
REFERENCES
[1] M. Alizadeh, M. Gharakhani, E. Fotoohi, R. Rada, Design and analysis of experiments in ANFIS modeling for stock price prediction, Int. Journal of Industrial Engineering Computations 2 (2011) 409-418.
[2] C.G. Atkeson, A.W. Moore, S. Schaal, Locally weighted learning, Artificial Intelligence Review 11 (1997) 11-17.
[3] M. Ayeb, H. Theuerkauf, C. Wilhelm, T. Winsel, Robust identification of nonlinear dynamic systems using design of experiments, Proc. IEEE Conf. on Computer Aided Control Systems Design (2006) 2321-2326.
[4] R. Babuska, Fuzzy modelling for control, Kluwer Academic Publishers, Norwell, MA, USA (1998).
[5] J. Belz, K. Bamberger, O. Nelles, Order of experimentation for metamodeling tasks, Proc. Int. Joint Conference on Neural Networks (IJCNN) (2016) 4843-4849.
[6] J.C. Bezdek, Pattern recognition with fuzzy objective function algorithms, Plenum Press, New York (1981).
[7] P. Binev, W. Dahmen, P. Lamby, Fast high-dimensional approximation with sparse occupancy trees, Journal of Computational and Applied Mathematics 235 (2011) 2063-2076.
[8] G. Bontempi, M. Birattari, H. Bersini, Lazy learning for local modeling and control design, Int. Journal of Control 72 (1999) 643-658.
[9] M. Buragohain, C. Mahanta, A novel approach for ANFIS modelling based on full factorial design, Appl. Soft Computing 8 (2008) 609-625.
[10] D.A. Cohn, Neural network exploration using optimal experiment design, Neural Networks 9 (1996) 1071-1083.
[11] M. Deflorian, S. Zaglauer, Design of experiments for nonlinear dynamic system identification, Proc. 18th IFAC World Congress (2011) 13179-13184.
[12] M. Deregnaucourt, M. Stadlbauer, C. Hametner, S. Jakubek, H.M. Koegler, Evolving model architecture for custom output range exploration, Math. and Comp. Mod. of Dyn. Systems 21 (2015) 1-22.
[13] S.K. Doherty, J.B. Gomm, D. Williams, Experiment design considerations for non-linear system identification using neural networks, Computers & Chemical Engineering 21 (1997) 327-346.
[14] A. Dürrbaum, A. Kroll, On robust experiment design for identifying locally affine Takagi-Sugeno models, Proc. IEEE Int. Conf. on Systems, Man, and Cybernetics (SMC) (2016).
[15] T. Ebert, T. Fischer, J. Belz, T.O. Heinz, G. Kampmann, O. Nelles, Extended deterministic local search algorithm for maximin Latin Hypercube Designs, Proc. IEEE Symposium Series on Computational Intelligence (2015) 375-382.
[16] Y. Farzaneh, A. Tootoonchi, A novel data reduction method for Takagi-Sugeno fuzzy system design based on statistical design of experiment, Appl. Soft Computing 9 (2009) 1367-1376.
[17] V.V. Fedorov, S.L. Leonov, Optimal design for nonlinear response models, CRC Press, Boca Raton (2014).
[18] J.T. Flynn, M.J. Keenan, D.H. Sulam, Single-stage evaluation of highly-loaded high Mach-number compressor stages, NASA Technical Report (1970).
[19] E. Forte, E. von Harbou, J. Burger, N. Asprion, M. Bortz, Optimal design of laboratory and pilot-plant experiments using multiobjective optimization, Chem. Ing. Tech. 89 (2017) 645-654.
[20] G. Franceschini, S. Macchietto, Validation of a model for biodiesel production through model-based experiment design, Ind. and Eng. Chem. Res. 46 (2007) 220-232.
[21] G. Franceschini, S. Macchietto, Model-based design of experiments for parameter precision: state of the art, Chem. Eng. Science 63 (2008) 4846-4872.
[22] J.H. Friedman, Multivariate adaptive regression splines, The Annals of Statistics 19 (1991) 1-67.
[23] J.H. Friedman, E. Grosse, W. Stuetzle, Multidimensional adaptive spline regression, SIAM J. on Scientific and Statistical Comp. 4 (1983) 291-301.
[24] C. Friedrich, M. Auer, G. Stiech, Model based calibration techniques for medium speed engine optimization: investigations on common modeling approaches for modeling of selected steady state outputs, SAE Int. J. Engines 9 (2016).
[25] I. Gath, A.B. Geva, Unsupervised optimal fuzzy clustering, IEEE Trans. on Pattern Analysis and Machine Intelligence 11 (1989) 773-778.
[26] G.C. Goodwin, C.R. Rojas, J.S. Welsh, Good, bad and optimal experiments for identification, in: T. Glad, G. Hendeby (Eds.), Forever Ljung in System Identification - Workshop on the Occasion of Lennart Ljung's 60th Birthday, Studentlitteratur, Lund, Sweden (2006).
[27] M. Gringard, A. Kroll, On the systematic analysis of the impact of the parametrization of standard test signals, Proc. IEEE Symposium Series on Computational Intelligence (2016).
[28] C. Hametner, M. Stadlbauer, M. Deregnaucourt, S. Jakubek, T. Winsel, Optimal experiment design based on local model networks and multilayer perceptron networks, Eng. Appl. Art. Int. 26 (2013) 251-261.
[29] B. Hartmann, T. Ebert, O. Nelles, Model-based design of experiments based on local model networks for nonlinear processes with low noise levels, Proc. ACC (2011) 5306-5311.
[30] T.O. Heinz, O. Nelles, Comparison of excitation signals in nonlinear identification problems (in German), Proc. 26. Workshop Computational Intelligence (2016) 139-158.
[31] P. Hering, M. Simandl, Sequential optimal experiment design for neural networks using multiple linearization, Neurocomputing 73 (2010) 3284-3290.
[32] I. Hertel, M. Kohler, Estimation of the optimal design of a nonlinear parametric regression problem via Monte Carlo experiments, Computational Statistics and Data Analysis 59 (2013) 1-12.
[33] M. Hirsch, L. del Re, Adapted D-optimal experimental design for transient emission models of diesel engines, SAE Technical Paper 2009-01-0621 (2009) 115-122.
[34] F. Höppner, R. Kruse, F. Klawonn, T. Runkler, Fuzzy Cluster Analysis, Wiley, New York (1999).
[35] B.G.M. Husslage, G. Rennen, E.R. van Dam, D. den Hertog, Space-filling Latin hypercube designs for computer experiments, Optimization and Engineering 12 (2011) 611-630.
[36] S.M. Kay, Fundamentals of statistical signal processing, vol. 1, 20th ed., Prentice Hall, Upper Saddle River (2013).
[37] M.R. Kianifar, F. Campean, A. Wood, Application of permutation genetic algorithm for sequential model building - model validation design of experiments, Soft Computing 20 (2016) 3023-3044.
[38] C.P. Kitsos, Optimal experimental design for nonlinear models, Springer, Berlin (2013).
[39] A. Kroll, Fuzzy systems for modeling and control of complex technical systems (in German), PhD dissertation, Fortschrittberichte VDI, Reihe 8, No. 612 (1997).
[40] A. Kroll, On choosing the fuzziness parameter for identifying TS models with multidimensional membership functions, J. of Artificial Intelligence and Soft Comp. Res. 1 (2011) 283-300.
[41] A. Kroll, A. Dürrbaum, On joint experiment design for identifying partition and local model parameters of Takagi-Sugeno models, Proc. 17th IFAC Symposium on System Identification (SysID) (2015) 1427-1432.
[42] A. Kroll, S. Soldan, On data-driven Takagi-Sugeno modeling of heterogeneous systems with multidimensional membership functions, Proc. 18th IFAC World Congress (2011) 14994-14999.
[43] J. Kurzke, Smooth C 8.2: Preparing compressor maps for gas turbine performance modeling, User's Manual (2009).
[44] L. Ljung, System identification: Theory for the user, 2nd ed., Prentice Hall, Upper Saddle River (1999).
[45] J.L. Loeppky, J. Sacks, W.J. Welch, Choosing the sample size of a computer experiment: a practical guide, Technometrics 51 (2009) 366-376.
[46] Matlab, Statistics Toolbox™: User's guide, R2016, The MathWorks (2016).
[47] D. Meyer, F. Leisch, K. Hornik, The support vector machine under test, Neurocomputing 55 (2003) 169-186.
[48] J. Möller, R. Pörtner, Model-based design of process strategies for cell culture bioprocesses: state of the art and new perspectives, in: S.J. Thatha Gowder (Ed.), New Insights into Cell Culture Technology, InTech (2017).
[49] O. Nelles, Nonlinear system identification, Springer, London (2001).
[50] H. Petersen, Selection of statistical experimental designs (in German), vol. 3.2, Plan 30.0, ecomed, Landsberg/Lech (1992).
[51] F.J. Pontes, G.F. Amorim, P.P. Balestrassi, A.P. Paiva, J.R. Ferreira, Design of experiments and focused grid search for neural network parameter optimization, Neurocomputing 186 (2016) 22-34.
[52] L. Pronzato, Optimal experimental design and some related control problems, Automatica 44 (2008) 303-325.
[53] K. Röpke, C. Gühmann (Eds.), Proc. 1st-8th Conf. on Design of Experiments (DoE) in Powertrain Development, Expert Verlag, Renningen (2003-2015).
[54] I. Skrjanc, Evolving fuzzy-model-based design of experiments with supervised hierarchical clustering, IEEE Transactions on Fuzzy Systems 23 (2015) 861-871.
[55] H. Suzuki, M. Yamakita, Input design for hybrid system identification for accurate estimation of submodel regions, Proc. ACC (2011) 1236-1241.
[56] T. Takagi, M. Sugeno, Fuzzy identification of systems and its application to modelling and control, IEEE Trans. Systems, Man, and Cybernetics 15 (1985) 116-132.
[57] D.M. Titterington, Optimal design in flexible models, including feed-forward networks and nonparametric regression, Chap. 23 in: A. Atkinson et al. (Eds.), Optimum Design, Kluwer Academic Publishers, Norwell, MA, USA (2001) 261-273.
[58] A. Varsha, A. Rainer, P. Santiago, R. Umale, Global COR iDOE methodology: an efficient way to calibrate medium and heavy commercial vehicle engine emission and fuel calibration, SAE Technical Paper 2017-26-0032 (2017).
[59] E. Walter, L. Pronzato, Identification of parametric models from experimental data, Springer, London (1997).
[60] H.O. Wang, K. Tanaka, M.F. Griffin, Parallel distributed compensation of nonlinear systems by Takagi-Sugeno fuzzy model, Proc. IEEE Int. Conf. on Fuzzy Systems (1995) 531-538.
[61] C. Werner, G. Preiß, F. Gores, M. Griebenow, S. Heitmann, A comparison of low-pressure and supercharged operation of polymer electrolyte membrane fuel cell systems for aircraft applications, Progress in Aerospace Sciences 85 (2016) 51-64.
[62] X.L. Xie, G. Beni, A validity measure for fuzzy clustering, IEEE Trans. on Pattern Analysis and Machine Intelligence 13 (1991) 841-847.
[63] Ö. Yeniay, Comparative study of algorithms for response surface optimization, Mathematical and Computational Applications 19 (2014) 93-104.
[64] S. Zaglauer, Bayesian design of experiments for nonlinear dynamic system identification, Proc. Simutools '12 (2012) 85-92.
[65] C. Zanchettin, L.L. Minku, T.B. Ludermir, Design of experiments in neuro-fuzzy systems, Int. Journal of Computational Intelligence and Applications 9 (2010) 137-152.
[66] R. Zimmerschied, R. Isermann, Regularization techniques for identification using local-affine models (in German), at-Automatisierungstechnik 56 (2008) 339-349.
8. APPENDIX
8.1 TS system
Table 10: Median of CPU Seconds Used for Each Step for Different Noise Levels

| For step | σ² = 0 | σ² = 0.2 | σ² = 0.5 | σ² = 1.0 |
|---|---|---|---|---|
| 2-FCM | 1.22·10^-1 | 1.27·10^-1 | 1.26·10^-1 | 2.17·10^-1 |
| 2-GLS | 5.98·10^-3 | 5.91·10^-3 | 5.89·10^-3 | 6.51·10^-3 |
| 3 | 2.10·10^0 | 8.08·10^-1 | 8.25·10^-1 | 1.87·10^0 |
| 4 | 1.56·10^1 | 1.63·10^1 | 1.62·10^1 | 2.88·10^1 |
| 5 | 6.05·10^-1 | 9.17·10^-1 | 1.46·10^0 | 2.08·10^0 |
| 6 | 2.25·10^0 | 3.20·10^0 | 4.13·10^0 | 1.19·10^1 |
| 7 | 3.60·10^-1 | 1.08·10^0 | 1.75·10^0 | 2.73·10^0 |
| Total run | 2.23·10^1 | 2.47·10^1 | 2.59·10^1 | 4.79·10^1 |
Table 11: Average Prediction Errors after Different Design and Identification Steps for the TS System

| Result after step | Jgrid 1st / 2nd / 3rd Quartiles | Jgrid Mean | JRMSE 1st / 2nd / 3rd Quartiles | JRMSE Mean |
|---|---|---|---|---|
| 2 | 8.7812·10^-1 / 1.0504·10^0 / 1.1846·10^0 | 1.0352·10^0 | 7.0953·10^-1 / 8.7548·10^-1 / 1.0408·10^0 | 8.6131·10^-1 |
| 3 | 3.6973·10^-7 / 7.1462·10^-7 / 1.6345·10^-6 | 1.8018·10^-6 | 2.0808·10^-7 / 2.7578·10^-7 / 3.5221·10^-7 | 3.2768·10^-7 |
| 5 | 1.4077·10^-7 / 2.6699·10^-7 / 3.8541·10^-7 | 3.4557·10^-7 | 1.8203·10^-7 / 3.5829·10^-7 / 6.4529·10^-7 | 5.6339·10^-7 |
| 7 | 1.4290·10^-7 / 2.1609·10^-7 / 3.1243·10^-7 | 2.9563·10^-7 | 2.0237·10^-7 / 3.2884·10^-7 / 5.7163·10^-7 | 5.4470·10^-7 |
Table 12: Average Uncertainty after Different Design and Identification Steps for the TS System

| Result after step | det I(a) 1st / 2nd / 3rd Quartiles | det I(v) 1st / 2nd / 3rd Quartiles | det I(av) 1st / 2nd / 3rd Quartiles |
|---|---|---|---|
| 2 | 2.4132·10^4 / 2.8664·10^4 / 3.4696·10^4 | 6.9720·10^12 / 3.9634·10^13 / 3.5590·10^15 | 4.2021·10^13 / 6.4552·10^15 / 1.3787·10^18 |
| 3 | 2.5384·10^4 / 3.0746·10^4 / 3.6591·10^4 | 3.0613·10^12 / 1.2738·10^13 / 2.8361·10^14 | 7.2184·10^15 / 2.2462·10^16 / 7.4214·10^17 |
| 5 | 2.0811·10^8 / 2.2192·10^8 / 8.1389·10^26 | 3.9808·10^19 / 4.6136·10^19 / 5.5218·10^19 | 5.5739·10^26 / 6.6454·10^26 / 8.1389·10^26 |
| 7 | 1.8156·10^13 / 1.8671·10^13 / 4.0044·10^35 | 4.7782·10^23 / 5.0811·10^23 / 5.4317·10^23 | 3.1675·10^35 / 3.4773·10^35 / 4.0044·10^35 |

8.2 Axial compressor characteristic map
Table 13: Median of CPU Seconds Used for Each Step for Different Noise Levels

| For step | σ² = 0 | σ² = 0.2 | σ² = 0.5 | σ² = 1.0 |
|---|---|---|---|---|
| 2-FCM | 1 | 2.18 | 1.04 | 2.10 |
| 2-GLS | 0 | 0 | 0 | 0 |
| 3 | 9.45 | 25.20 | 17.72 | 38.28 |
| 4 | 4.69 | 10.72 | 7.16 | 11.34 |
| 5 | 8.56 | 16.83 | 13.01 | 26.67 |
| 6 | 4.14 | 11.78 | 7.60 | 11.97 |
| 7 | 8.93 | 7.27 | 15.94 | 25.62 |
| Total run | 36.77 | 73.99 | 62.47 | 115.97 |
Table 14: Average Prediction Errors after Different Design and Identification Steps for the Axial Compressor

| Result after step | Jgrid 1st / 2nd / 3rd Quartiles | Jgrid Mean | JRMSE 1st / 2nd / 3rd Quartiles | JRMSE Mean |
|---|---|---|---|---|
| 2 | 1.3442·10^-1 / 1.4327·10^-1 / 1.4890·10^-1 | 1.4161·10^-1 | 1.1686·10^-1 / 1.2774·10^-1 / 1.3976·10^-1 | 1.2901·10^-1 |
| 3 | 6.1401·10^-2 / 6.3405·10^-2 / 6.8075·10^-2 | 6.5256·10^-2 | 3.6716·10^-2 / 4.1524·10^-2 / 4.7041·10^-2 | 4.0593·10^-2 |
| 5 | 5.8590·10^-2 / 5.9154·10^-2 / 6.1839·10^-2 | 6.0463·10^-2 | 3.9939·10^-2 / 4.2413·10^-2 / 4.7574·10^-2 | 4.2892·10^-2 |
| 7 | 5.8491·10^-2 / 5.9876·10^-2 / 6.1377·10^-2 | 5.9331·10^-2 | 4.1096·10^-2 / 4.5062·10^-2 / 4.9498·10^-2 | 4.4760·10^-2 |
Table 15: Average Uncertainty after Different Design and Identification Steps for the Axial Compressor

| Result after step | det I(a) 1st / 2nd / 3rd Quartiles | det I(v) 1st / 2nd / 3rd Quartiles | det I(av) 1st / 2nd / 3rd Quartiles |
|---|---|---|---|
| 2 | 1.3002·10^4 / 2.1557·10^4 / 5.8797·10^4 | 2.0971·10^14 / 3.1150·10^15 / 7.9937·10^16 | 1.8187·10^16 / 1.2957·10^17 / 9.5446·10^17 |
| 3 | 3.9711·10^1 / 3.3280·10^2 / 4.5881·10^3 | 1.0416·10^11 / 3.4340·10^12 / 3.2037·10^16 | 1.7999·10^9 / 2.8777·10^12 / 1.5042·10^16 |
| 5 | 7.9640·10^2 / 4.1709·10^4 / 2.0184·10^19 | 1.1300·10^13 / 1.1257·10^15 / 1.2563·10^20 | 1.3052·10^11 / 9.2080·10^16 / 2.0184·10^19 |
| 7 | 1.9630·10^4 / 3.7726·10^5 / 2.5795·10^23 | 1.9118·10^16 / 4.0279·10^17 / 5.2894·10^21 | -1.8831·10^-4 / 1.6445·10^16 / 2.5795·10^23 |
8.3 Friedman test function 1
Table 16: Median of CPU Seconds Used for Each Step for Different Noise Levels

| For step | σ² = 0 | σ² = 0.2 | σ² = 0.5 | σ² = 1.0 |
|---|---|---|---|---|
| 2-FCM | 1.89·10^0 | 1.74·10^0 | 1.21·10^0 | 1.14·10^0 |
| 2-GLS | 4.17·10^-3 | 4.08·10^-3 | 3.19·10^-3 | 4.10·10^-3 |
| 3 | 1.40·10^2 | 1.20·10^2 | 1.21·10^2 | 1.12·10^2 |
| 4 | 3.48·10^2 | 2.38·10^2 | 2.48·10^2 | 2.50·10^2 |
| 5 | 1.26·10^2 | 9.45·10^1 | 8.63·10^1 | 1.22·10^2 |
| 6 | 3.62·10^2 | 2.57·10^2 | 2.03·10^2 | 1.97·10^2 |
| 7 | 9.96·10^1 | 1.03·10^2 | 7.33·10^1 | 1.12·10^2 |
| Total run | 1.10·10^3 | 8.56·10^2 | 6.98·10^2 | 7.54·10^2 |
Table 17: Average Prediction Errors after Different Design and Identification Steps (Friedman)

| Result after step | Jgrid 1st / 2nd / 3rd Quartiles | Jgrid Mean | JRMSE 1st / 2nd / 3rd Quartiles | JRMSE Mean |
|---|---|---|---|---|
| 2 | 2.1073·10^0 / 2.1311·10^0 / 2.1773·10^0 | 2.1455·10^0 | 1.5015·10^0 / 1.6224·10^0 / 1.7425·10^0 | 1.6270·10^0 |
| 3 | 2.1073·10^0 / 7.0834·10^-1 / 8.6846·10^-1 | 7.6682·10^-1 | 3.1722·10^-1 / 3.6678·10^-1 / 3.9201·10^-1 | 3.6281·10^-1 |
| 5 | 6.3567·10^-1 / 7.0872·10^-1 / 7.6114·10^-1 | 7.01016·10^-1 | 4.6259·10^-1 / 5.1444·10^-1 / 5.5471·10^-1 | 5.1644·10^-1 |
| 7 | 6.2547·10^-1 / 6.6787·10^-1 / 7.5225·10^-1 | 6.8146·10^-1 | 5.0196·10^-1 / 5.4895·10^-1 / 5.6643·10^-1 | 5.4558·10^-1 |