
On optimal experiment design for identifying premise and conclusion parameters of Takagi-Sugeno models: nonlinear regression case

A. Kroll*, A. Dürrbaum**

* Department of Measurement and Control, University of Kassel, Kassel, Germany (Tel: +49 561 804-3248; e-mail: [email protected]).

** Department of Measurement and Control, University of Kassel, Kassel, Germany (Tel: +49 561 804-3261; e-mail: [email protected]).

© 2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/

Abstract: Optimal Experiment Design (OED) is a well-developed concept for regression problems that are linear-in-the-parameters. In case of experiment design to identify nonlinear Takagi-Sugeno (TS) models, non-model-based approaches or OED restricted to the local model parameters (assuming the partitioning to be given) have been proposed. In this article, a Fisher Information Matrix (FIM) based OED method is proposed that considers local model and partition parameters. Due to the nonlinear model, the FIM depends on the model parameters that are subject of the subsequent identification. To resolve this paradoxical situation, at first a model-free space filling design (such as Latin Hypercube Sampling) is carried out. The collected data permits making design decisions such as determining the number of local models and identifying the parameters of an initial TS model. This initial TS model permits a FIM-based OED, such that data is collected which is optimal for a TS model. The estimates of this first stage will in general not be ideal. To become robust against parameter mismatch, a sequential optimal design is applied. In this work the focus is on D-optimal designs. The proposed method is demonstrated for three nonlinear regression problems: an industrial axial compressor and two test functions.

Keywords: Takagi-Sugeno fuzzy models; optimal experiment design; design of experiments; nonlinear regression; nonlinear system identification.

1. INTRODUCTION

Takagi-Sugeno (TS) models [56] are composed of a weighted superposition of local models. They are often applied for non-linear regression, prediction/simulation and model-based control design. Often, locally affine models are used, which have the advantage that linear systems analysis and design methods can be applied (after enhancement). An example is model-based nonlinear control design as parallel-distributed compensators [60]. The identification of TS models includes several tasks:

- Data collection

- Data pre-processing

- Input/regressor selection

- Dynamic order and dead time selection (in case of dynamical models)

- Determination of the number of local models

- Locating and shaping of local model boundaries

- Estimation of local model parameters

- Model validation

In data-driven modelling two major cases can be distinguished: The first option is that informative data has to be selected from operational data records, e.g. stored in historical databases. The second option is that experiments can be carried out purposefully in order to generate informative data. This article considers the latter. It addresses the problem of Fisher Information Matrix (FIM) based optimal experiment design (OED) for identifying both local model and partitioning related parameters of TS models. As the design problem is nonlinear in the partition parameters, the partial derivatives that form the FIM depend on the model parameters, which are unknown at the beginning of the experiments. Robust design methods have been proposed to address this problem, including Bayesian, sequential and minimax designs [26,59]. A Bayesian design requires reliable a-priori information on the statistical description of the uncertainty. A minimax design adopts the view that sometimes the best design under the worst conditions may be most useful. It requires defining value ranges for the parameters and is computationally demanding. A sequential design uses all data 'presently' available to estimate the model parameters, which are in turn used for designing the continuation of the experiment. The latter will be adopted in this paper as it requires little a priori knowledge on the system and has moderate computational demands.

As the membership functions are strongly nonlinear, already a moderate deviation from the true parameter values may cause even robust design methods to fail. Moreover, the model structure has to be known for FIM-based experiment design, which will in general not be the case. For this reason, a multi-stage method is proposed: A (not optimal) space filling design is used in a first stage to obtain data suitable for structural decision-making and to identify an initial model of sufficient quality. In the second stage, a FIM-based optimal sequential design jointly for premise and conclusion parameters is carried out. It will be shown how knowledge about the properties of the sensitivity functions of TS models can be exploited to initialize the design points in highly informative areas such that the OED optimization problem becomes easier to solve. The contribution focusses on static modelling problems. The method is demonstrated in three case studies: In the first, the true system is a 'synthetic' TS system such that the system is in the model class. The second targets approximating an industrial axial compressor map. The third is the Friedman test function 1, which depends on five variables and therefore represents a higher dimensional design problem.

Optimal experiment design to determine nonlinear static models is of significant practical interest, as the following recent examples show: Stricter emission legislation and demand for higher fuel efficiency in tandem with more complex engines result in an increasing number of calibration parameters of the characteristic maps of Diesel and Otto engines. Due to limited capacity and operational cost of test stands, industrial practitioners develop and apply Design of Experiments (DOE) techniques for experiment planning, see e.g. [24,37,58] for some recent work or the proceedings of the international conference series "DOE in powertrain development" [53]. A traditional application area of DOE is bio-pharmacy, see e.g. [17,48]. In [32] DOE is used to efficiently estimate the strain life and the stress-strain curve for fatigue analysis using nonlinear regression. DOE is used in [61] to analyse operating points of a fuel cell with respect to performance, efficiency and parametric sensitivity using regression models. Similarly, in [19] DOE is used to fit models to identify optimal designs and operating points of chemical processes. For a recent cross-domain compilation of the related area of (nonlinear) response surface methods the reader is referred to [63].

In section 2, the state of the art is reviewed. Section 3 provides the problem statement. Section 4 describes the used identification method. Section 5 addresses the proposed experiment design method. Section 6 contains the case studies. Conclusions and outlook complete the paper.

2. OVERVIEW OF RELATED WORK

Methods for Design of Experiments (DOE) can be categorized in different ways, e.g. model-free or model-based, for static or dynamic modelling, for on-line or off-line application, for parameter estimation or model discrimination, see [54] for a compact overview. In the following the focus is on work addressing Takagi-Sugeno fuzzy models, while selected works on neuro-fuzzy and neural network systems are also included.

Block and factorial designs belong to the model-free design methods. Buragohain and Mahanta [9] apply a full factorial design to select the training data for a dynamic ANFIS model from recorded data. Farzaneh and Tootoonchi [16] use a three-level factorial design to minimize the required amount of data for designing TS systems. The method is demonstrated for a static nonlinear regression problem with two inputs. Zanchettin et al. [65] used a two-level full factorial design followed by variance analysis (ANOVA) to examine the significance of design decisions regarding ANFIS and evolving fuzzy neural networks, such as the number or type of membership functions. The method is tested using Mackey-Glass time series and Box-Jenkins gas furnace data. Alizadeh et al. [1] extend this approach by using a three-level factorial design and by considering a larger number of factors. Pontes et al. [51] use evolutionary operations (EVOP) to determine the topology of a 2-hidden-layer MLP network and learning strategy parameters. This includes using Taguchi arrays for identifying factor levels and full factorial designs with two levels and central point. Another group of model-free approaches are the space-filling designs. Hartmann et al. [29] target minimizing the global model error (instead of parameter variance) of static local model networks. A pseudo Monte-Carlo sampling algorithm is used with the goal to homogeneously distribute the samples. The algorithm selects the point from a set of random candidate points that has the maximum distance from the already chosen points. This selection is repeated until the desired number of points is obtained. It uses a fixed number of samples per partition. Skrjanc [54] proposes a method that uses a supervised hierarchical clustering algorithm that locally deploys the Gustafson-Kessel algorithm in combination with a space-filling experiment design for an evolving fuzzy model identification. The space-filling design is realized by a pseudo Monte-Carlo sampling algorithm. The method is demonstrated on static test problems. Belz et al. [5] investigate which order of experimentation is most suitable for regression problems in case the model is already used while the experiments are conducted. Deregnaucourt et al. [12] propose a maximin distance-based design point selection from a candidate set for on-line design of experiments and identification. They compare designs in input, output and product space and demonstrate it for static modelling problems.

While these works address DOE for static modelling, another area is DOE for dynamic modelling. This is often also referred to as test signal design. This problem is addressed by Heinz and Nelles [30] as well as Gringard and Kroll [27], who compare the effects of using different general-purpose broad-band test signals (multisine, APRBS, chirp) on covering the model input space and on resulting model quality when identifying locally affine dynamic TS models.

Model-free designs do not consider the intended model structure during the experiment design, such that the collected data will in general not be optimal for the intended use. On the contrary, Optimal Experiment Design (OED) methods plan experiments such that, given a chosen model structure, the design points optimize the specified assessment criterion respecting possible constraints. One possible approach is to optimize a criterion on the Fisher Information Matrix (FIM), which targets reducing the parametric uncertainty of the resulting models. The frequently used criterion in engineering is the determinant of the FIM, which yields a D-optimal design. Hirsch and del Re [33] work in this direction: They deal with static TS models that locally employ second order polynomials using an extended D-optimal experiment design. In order to prevent several samples in the same location, additional costs are assigned to the minimal observed distance between samples. The partitioning is assumed to be given. Hametner et al. [28] assume the partitioning to be given and investigate D-optimal on-line designs regarding the local model parameters of dynamic local model networks and the weights of MLP networks with a single hidden layer. ARX and output error (OE) model configurations as well as constraints on model input and output are considered. Deflorian and Zaglauer [11] address DOE for training dynamical MLP-type artificial neural networks. They utilize ramp-stair-sequence type test signals. Noise is assumed to be normally distributed. In case of off-line designs of the amplitudes, Latin Hypercube Sampling and a third order polynomial based D-optimal design for the amplitudes of an amplitude-modulated pseudo random signal are used. For on-line designs query-by-committee and sequential D-optimal designs are used.

Instead of basing the design on the FIM, which means an uncertainty-reducing experiment design, other assessment criteria can be used. One example is to use the approximation error of the model. The work of Suzuki and Yamakita [55] can be attributed to this area. They address DOE for piecewise affine ARX (PWARX) models for hybrid systems and push the design points towards the discriminating surfaces between neighbouring ARX models to optimally estimate the local model partition boundaries. A cost function that punishes the distance from the boundaries is used and constraints on the input are considered. As the design requires knowledge of the partition boundaries, alternating DOE and identification is proposed. On DOE for artificial neural networks further literature is available, e.g. [3,10,13,31,57]. Also, different terminology is used such as 'active learning', 'lazy learning' or 'query construction', see e.g. [2,8]. The partitioning problem of TS systems, however, does not occur in ANNs, which clearly distinguishes both fields.

Compared with DOE for parameter identification of nonlinear physical models, DOE for TS models is a large-scale problem due to the black-box multi-model approach. Kitsos [38] mentions a maximum number of 5 parameters. In the survey paper [21], the authors refer to a 'large number of parameters' in context of estimating kinetic coefficients in process engineering models. The referenced problem [20] has 12 parameters and a joint experiment design for all 12 parameters proved impossible. For a wider coverage of the field of DOE, the reader is referred to the overview articles [21,52].

Finally, it can be stated that all papers that address optimal experiment design for TS models assume the partitioning to be given. This significantly simplifies the design, as the design problem regarding the local model parameters is LiP in case of static or (N)ARX-type dynamic models. As the partitioning is the sole source of nonlinearity in locally affine TS models, it is important to also determine the partition parameters accurately and with little uncertainty. Non-model-based experiment design approaches such as space filling designs do not provide for optimal designs, e.g. in the sense of a maximal reduction of the uncertainty of the estimated parameters. Moreover, they do not provide an uncertainty estimate.

This contribution builds on previous work [14,41], where first algorithmic ideas of a robust 2-stage design were presented and demonstrated using the synthetic TS system of case study one as target. For this work, the distribution of the design points over the different stages was investigated and initialization methods for the optimal design in the 2nd stage were developed. More complex and higher dimensional case studies and extensive parameter studies were carried out.

3. PROBLEM STATEMENT

Given is a nonlinear static multi-input single-output system f:

y(k) = f(\mathbf{x}(k)) + e(k) \quad (1)

where x ∈ ℝⁿ are the inputs, y ∈ ℝ is the output, and e(k) is a random variable. f is to be approximated by a Takagi-Sugeno (TS) model that is estimated from N observations (x_k, y_k).

3.1 Model

TS models are composed of c local models. Notation can be in form of rules or mathematical expressions. The latter will be adopted:

\hat{y}(\mathbf{x},\mathbf{z},\mathbf{\Theta}) = \sum_{i=1}^{c} \phi_i(\mathbf{z},\mathbf{\Theta}_{\mathrm{MF}})\,\hat{y}_i(\mathbf{x},\mathbf{\Theta}_{\mathrm{LM},i}) = \sum_{i=1}^{c} \frac{\mu_i(\mathbf{z},\mathbf{\Theta}_{\mathrm{MF},i})}{\sum_{j=1}^{c}\mu_j(\mathbf{z},\mathbf{\Theta}_{\mathrm{MF},j})}\,\hat{y}_i(\mathbf{x},\mathbf{\Theta}_{\mathrm{LM},i}) \quad (2)

The i-th local model uses the inputs or 'regressors' x to predict ŷ_i(x, Θ_LM,i). In the sequel, locally affine models are considered:

\hat{y}_i(\mathbf{x},\mathbf{\Theta}_{\mathrm{LM},i}) = \mathbf{\Theta}_{\mathrm{LM},i}^{T}\,[1\ \ \mathbf{x}^{T}]^{T} = a_{i,0} + \mathbf{a}_i^{T}\mathbf{x}, \quad (3)

i.e. Θ_LM,i ∈ ℝ^{n+1}. The membership functions μ_i(z, Θ_MF,i) define the region and degree of validity of the i-th local model. The φ_i(z, Θ_MF) are referred to as fuzzy basis functions. They permit a more compact notation. The scheduling variable z can differ from the regressor x, but is often chosen identical, as in this article. In this work, multi-variate membership functions (MF) which result in explicit form from fuzzy clustering [6] are used:

\mu_i(\mathbf{x}) = \left[\sum_{j=1}^{c}\left(\frac{\|\mathbf{x}-\mathbf{v}_i\|^2_{\mathbf{A}_i}}{\|\mathbf{x}-\mathbf{v}_j\|^2_{\mathbf{A}_j}}\right)^{\frac{1}{\nu-1}}\right]^{-1} \quad (4)

Their parameters are the partition centres v_i, v_j ∈ ℝⁿ, i, j ∈ {1, …, c} and the fuzziness parameter ν ∈ ℝ_{>1}. The latter is a global parameter used to adjust the fuzziness of the partitioning, see [40] for a discussion on its choice. Distance norms can e.g. be inner product norms:

\|\mathbf{x}-\mathbf{v}_j\|^2_{\mathbf{A}_j} = (\mathbf{x}-\mathbf{v}_j)^{T}\mathbf{A}_j(\mathbf{x}-\mathbf{v}_j), \quad (5)

where A_j is a norm-inducing matrix (including the Euclidean and Mahalanobis norm as special cases), or L_p norms. These multi-variate MF permit achieving parsimonious models with good approximation properties. They are continuously differentiable, simplifying the computation of the FIM. As μ_i = φ_i holds, (2) simplifies to:

\hat{y}(\mathbf{x},\mathbf{\Theta}) = \sum_{i=1}^{c} \mu_i(\mathbf{x},\mathbf{\Theta}_{\mathrm{MF}})\,\hat{y}_i(\mathbf{x},\mathbf{\Theta}_{\mathrm{LM},i}) \quad (6)

with Θ_MF = [v_1^T; …; v_c^T]^T. All local model parameters can be concatenated as Θ_LM = [Θ_LM,1^T; …; Θ_LM,c^T]^T. An alternative are Gaussian MF, which are often used for tree-type partitioning approaches such as LOLIMOT [49]. These can be treated similarly but are omitted due to limited space.
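The membership degrees (4) and the TS model output (6) can be evaluated with a few lines of code. The following MATLAB function is only a minimal sketch (not the authors' implementation), assuming locally affine local models (3) and Euclidean distance norms; the function name and argument layout are chosen freely here.

% Minimal sketch (assumption, not the authors' code): TS model output, Eqs. (4) and (6),
% for locally affine local models (3) and Euclidean distance norms.
% x: (1 x n) input, V: (c x n) matrix of prototypes v_i, A: (c x (n+1)) matrix with
% rows [a_i0, a_i1, ..., a_in], nu > 1: fuzziness parameter.
function [yhat, mu] = ts_predict(x, V, A, nu)
    c  = size(V, 1);
    d2 = max(sum((repmat(x, c, 1) - V).^2, 2), eps);   % squared distances ||x - v_i||^2
    mu = zeros(c, 1);
    for i = 1:c
        mu(i) = 1 / sum((d2(i) ./ d2).^(1/(nu - 1)));  % membership degrees, Eq. (4)
    end
    yLM  = A * [1; x'];                                % local model outputs, Eq. (3)
    yhat = mu' * yLM;                                  % weighted superposition, Eq. (6)
end

The guard with eps avoids a division by zero when x coincides with a prototype.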

3.2 Optimal experiment design

In this work it will be assumed that {e(k)} in (1) is a sequence of i.i.d. normally distributed random variables with zero mean and finite variance σ² > 0. This simplifies computing the FIM for the discrete case with N observations (x_k, y_k), where the x(k) = x_k are referred to as 'design points', to [36]:

\mathbf{I} = \frac{1}{\sigma^2}\sum_{k=1}^{N}\left.\frac{\partial \hat{y}(\mathbf{x}(k),\mathbf{\Theta})}{\partial \mathbf{\Theta}}\,\frac{\partial \hat{y}(\mathbf{x}(k),\mathbf{\Theta})}{\partial \mathbf{\Theta}^{T}}\right|_{\mathbf{\Theta}=\mathbf{\Theta}_0} \quad (7)

or, element-wise,

I_{i,j} = [\mathbf{I}]_{i,j} = \frac{1}{\sigma^2}\sum_{k=1}^{N}\left.\frac{\partial \hat{y}(\mathbf{x}(k),\mathbf{\Theta})}{\partial \theta_i}\,\frac{\partial \hat{y}(\mathbf{x}(k),\mathbf{\Theta})}{\partial \theta_j}\right|_{\mathbf{\Theta}=\mathbf{\Theta}_0} \quad (8)

The Cramér-Rao inequality states that for any unbiased estimator

\mathrm{cov}(\hat{\mathbf{\Theta}}_N) \geq \mathbf{I}^{-1} \quad (9)

holds [44], which relates the FIM with the uncertainty of the parameter estimate. The aim of an optimal experiment design is to plan the experiments such that a scalar metric of the FIM, e.g. the magnitude of its determinant, its trace or its maximum eigenvalue, is maximized. This provides for the well-known D-, A- and E-optimal designs, respectively. In case of D-optimal designs the required computations are comparatively simple and yield the uncertainty region with minimum volume. A disadvantage is that the design overemphasizes parameters with high sensitivity. An A-optimal design neglects all non-diagonal elements of the FIM. Hence, it does not exploit all available information and it is not reliable in case of strong correlations between parameters. E-optimal designs address the shape but not the size of the uncertainty region, such that large regions may result. In addition, the possible discontinuities (due to changing eigenvalues) complicate the optimization. In engineering, D-optimal designs are popular [64] and will also be considered in this paper. This means N design points X = [x_1; …; x_N] are to be determined such that

\mathbf{X}_{\mathrm{opt}} = \arg\max_{\mathbf{x}_1,\ldots,\mathbf{x}_N \in G} \det(\mathbf{I}) \quad (10)

holds, where G is the permitted domain/design space. As the FIM is evaluated at Θ = Θ_0, it is only locally optimal in case of models that are nonlinear in their parameters. This contribution considers a joint optimal design for partition (Θ_MF) and local model parameters (Θ_LM) of TS models. A- and E-optimal designs can be treated similarly.
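Given a matrix of output sensitivities, the FIM (7) and the D-optimality criterion (10) reduce to a few matrix operations. The following MATLAB lines are only an illustration under the assumption that the sensitivities have already been evaluated at Θ = Θ_0; the function name is chosen freely here.

% Minimal sketch (assumption): FIM of Eq. (7) and D-criterion det(I) of Eq. (10).
% S: (N x p) sensitivity matrix with S(k,:) = d yhat(x_k, Theta)/d Theta at Theta_0,
% sigma2: noise variance.
function [detI, I] = d_criterion(S, sigma2)
    I    = (S' * S) / sigma2;   % sum over k of outer products of the sensitivity rows
    detI = det(I);              % D-optimal designs maximize this value
end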

4. IDENTIFICATION OF TAKAGI-SUGENO MODELS

Given N observations (x_k, y_k), the identification objective is to determine the number c of partitions (local models) and their soft boundaries (defined by the prototypes v_1, …, v_c) and the parameters {(a_{i,0}, a_i)} of all c local models. The identification criterion used is the common mean squared approximation error

J_{\mathrm{MSE}} := \frac{1}{N}\sum_{k=1}^{N}\big(y(\mathbf{x}(k)) - \hat{y}(\mathbf{x}(k))\big)^2. \quad (11)

Theoretically, all parameters Θ_MF and Θ_LM could be determined by solving the mixed-integer optimization problem resulting from minimizing (11). However, this problem is very hard to solve. Instead, if little information about the system is available, computationally efficient fuzzy clustering and ordinary least squares methods are used to determine a (suboptimal) initial model. Subsequently, the model's parameters are simultaneously optimized subject to (11) using Matlab's function fmincon.

Typical fuzzy clustering algorithms are the Fuzzy c-Means (FCM), Gustafson-Kessel and Gath-Geva algorithms [4,6,34]. The FCM uses the same distance norm for all clusters. It is a simple, robust and quick basic method. The Gustafson-Kessel algorithm adapts the distance norm of each cluster individually to the data scattering. This adds flexibility at the cost of significantly more parameters, higher computational demand and worse convergence characteristics. The Gath-Geva algorithm is even more flexible. Clustering is carried out in the product (X,Y) space to acknowledge the system's non-linearity. The FCM minimizes the cost function

J_{\mathrm{FCM}} = \sum_{k=1}^{N}\sum_{i=1}^{c}\mu_{i,k}^{\nu}\,\|\mathbf{x}_k - \mathbf{v}_i\|_2^2 \quad (12)

by alternatingly computing new partition centres

\mathbf{v}_i = \sum_{k=1}^{N}\mu_{i,k}^{\nu}\,\mathbf{x}_k \Big/ \sum_{k=1}^{N}\mu_{i,k}^{\nu} \quad \forall\, i \in \{1; \ldots; c\} \quad (13)

for given values of the membership degrees and then updating the memberships using (4) with the updated values of the v_i. It converges to a local minimum or saddle point of J_FCM. The prototypes are initialized using uniformly distributed random numbers rather than the membership values: The latter results in prototypes that scatter just little around the mean of all data, since for uniformly distributed random memberships v_i converges towards the mean of the data {x_k} for N → ∞, i = 1, …, c. The number of clusters c can e.g. be determined using cluster validation criteria such as the Xie-Beni index [62] or the fuzzy hypervolume [25], or by using cluster-merging methods [4]. In this work, the approximation error is used instead, as it is closer to the utilization of the clustering results. In this article, a modified implementation of the Matlab™ FCM is used such that the prototypes can be initialized.
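A minimal MATLAB sketch of the alternating FCM optimisation described above is given below (assuming Euclidean distances in product space). It is meant to illustrate the updates (4) and (13), not to replace the modified toolbox implementation mentioned in the text; the function name and interface are assumptions.

% Minimal sketch (assumption): basic FCM in product space, alternating between the
% membership update of Eq. (4) and the prototype update of Eq. (13).
% Z: (N x d) data matrix (here the rows [x_k^T, y_k]), V0: (c x d) initial prototypes.
function [V, U] = fcm_basic(Z, V0, nu, maxIter, tol)
    [N, d] = size(Z); c = size(V0, 1); V = V0;
    for it = 1:maxIter
        D2 = zeros(N, c);                       % squared distances to all prototypes
        for i = 1:c
            D2(:, i) = sum((Z - repmat(V(i, :), N, 1)).^2, 2);
        end
        D2 = max(D2, eps);
        U  = 1 ./ ((D2.^(1/(nu-1))) .* repmat(sum(D2.^(-1/(nu-1)), 2), 1, c)); % Eq. (4)
        W  = U.^nu;                             % weights of the prototype update
        Vold = V;
        V  = (W' * Z) ./ repmat(sum(W, 1)', 1, d);                            % Eq. (13)
        if max(abs(V(:) - Vold(:))) < tol, break; end
    end
end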

Given the membership functions μ_i, the model output (6) is linear in the local model parameters Θ_LM,i, i = 1, …, c. The latter can therefore be estimated using the ordinary least squares (OLS) method. The parameters Θ_LM,i of each local model can be identified independently from each other by minimizing the deviation of the local model's prediction (3) from the reference value y, which is called 'local LS' ('LLS'), see e.g. [66]:

\hat{\mathbf{\Theta}}_{\mathrm{LM},i} := \arg\min_{\mathbf{\Theta}_{\mathrm{LM},i}} \sum_{k=1}^{N}\phi_i(k)\,\big(y(\mathbf{x}(k)) - \hat{y}_i(\mathbf{x}(k),\mathbf{\Theta}_{\mathrm{LM},i})\big)^2 \quad (14)

\hat{y}_i(\mathbf{x}(k)) = a_{i,0} + a_{i,1}x_1(k) + \ldots + a_{i,n}x_n(k) =: \mathbf{\varphi}^{T}(k)\,\mathbf{\Theta}_{\mathrm{LM},i} \quad (15)

with φ_i(k) := φ_i(x(k)). This permits interpreting a local model as a linearization of the true system. However, neglecting the interpolation between neighbouring local models reduces the prediction quality, in general. Alternatively, all c parameter vectors Θ_LM,i can be estimated simultaneously by minimizing the approximation error of the TS model, which is referred to as 'global LS' ('GLS'):

\hat{\mathbf{\Theta}}_{\mathrm{LM}} := \arg\min_{\mathbf{\Theta}_{\mathrm{LM}}} \sum_{k=1}^{N}\big(y(\mathbf{x}(k)) - \hat{y}(\mathbf{x}(k),\mathbf{\Theta}_{\mathrm{LM}})\big)^2 \quad (16)

\hat{y}(k) = \sum_{i=1}^{c}\phi_i(k)\big(a_{i,0} + a_{i,1}x_1(k) + \ldots + a_{i,n}x_n(k)\big) =: \mathbf{\varphi}^{T}(k)\,\mathbf{\Theta}_{\mathrm{LM}}

with

\mathbf{\varphi}^{T}(k) = [\phi_1(k); \ldots; \phi_c(k) \,|\, \phi_1(k)x_1(k); \ldots; \phi_c(k)x_1(k) \,|\, \ldots \,|\, \phi_1(k)x_n(k); \ldots; \phi_c(k)x_n(k)]

\mathbf{\Theta}_{\mathrm{LM}} = [a_{1,0}; \ldots; a_{c,0} \,|\, a_{1,1}; \ldots; a_{c,1} \,|\, \ldots \,|\, a_{1,n}; \ldots; a_{c,n}] \quad (17)

The GLS provides for better predictions at the cost of worse local interpretability, in general. Differences between LLS and GLS shrink as ν → 1. In both cases the optimal parameters are computed by

\hat{\mathbf{\Theta}} = [\mathbf{\Phi}^{T}\mathbf{\Phi}]^{-1}\mathbf{\Phi}^{T}\mathbf{Y} \quad (18)

with the vector of the output observations Y = [y_k] and the regression matrix Φ = [φ^T(k)] composed of the regressors φ(k) as defined in (15) and (17) for LLS and GLS, respectively. (It is remarked that the same value of the fuzziness parameter ν is used for clustering, parameter estimation and model evaluation.)
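For illustration, a minimal MATLAB sketch of the global LS estimate (16)-(18) is given below. It is an assumption-laden example, not the authors' code; the column ordering of the regression matrix groups the regressors per local model, which is equivalent to (17) up to a permutation.

% Minimal sketch (assumption): global least squares (GLS) estimate, Eqs. (16)-(18).
% X: (N x n) inputs, y: (N x 1) outputs, U: (N x c) membership degrees mu_i(x_k).
% Returns Theta as a (c x (n+1)) matrix with rows [a_i0, a_i1, ..., a_in].
function Theta = gls_estimate(X, y, U)
    [N, n] = size(X); c = size(U, 2);
    Xe  = [ones(N, 1), X];                     % extended regressor [1, x^T]
    Phi = zeros(N, c * (n + 1));
    for i = 1:c
        Phi(:, (i-1)*(n+1)+1 : i*(n+1)) = repmat(U(:, i), 1, n + 1) .* Xe;
    end
    theta = Phi \ y;                           % least squares solution, cf. Eq. (18)
    Theta = reshape(theta, n + 1, c)';
end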

The TS model obtained so far is not optimal regarding its prediction error, as clustering minimizes the grouping-related criterion (12) but not the approximation error (11). Therefore, all real-valued parameters Θ_MF and Θ_LM are subsequently optimized using cost function (11) with Matlab™'s fmincon.

5. EXPERIMENT DESIGN FOR TAKAGI-SUGENO MODELS

At the beginning of the modelling task, knowledge about an appropriate model structure is often not available, which however is required for a model-based experiment design. Therefore, a 2-stage procedure is proposed: In the first stage, a model-free space filling design is used. The resulting data is used to identify the structure and the local model parameters of an initial TS model. This model is the basis for an optimal experiment design with respect to both membership function and local model parameters in the 2nd stage. The proposed procedure is shown in Fig. 1 and will be explained in the following subsections.

5.1 Initial space filling design and initial TS model identification (1st stage)

In the first stage, a model-free space filling design is used to place N1 design points {x_k}. Different methods can be used, for example random placement, regular grids and Latin Hypercube Sampling (LHS). Random sampling can result in a locally strongly varying density of design points, which may be disadvantageous. Regular grids enforce a uniform distribution, but result in an exponentially growing number of points with increasing number of design variables. This is avoided by LHS. LHS will not necessarily provide non-collapsing designs that cover the design space uniformly. Optimal LHS designs have these properties (see [15] for a detailed discussion). In this contribution, an approximately optimal LHS that maximizes the minimum distance between the design points is used, as provided by the function lhsdesign of the Matlab Statistics Toolbox™ [46]. The latter generates a number of LHS designs and takes over the best one [35]. This procedure will in general result in a suboptimal design, but it is fast and considered as sufficient for the purpose of this work. For the resulting design an experiment is carried out that provides for N1 observations M1 = {(x_k, y_k)} (step 1). This data is used to identify an initial TS model using FCM and OLS, providing a first set of parameters (step 2). Partition and local model parameters are optimized using fmincon to minimize the approximation error (11) on the given data set M1, yielding {(v_i^(0), a_i^(0))} (step 3).
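As an illustration of step 1, an approximately maximin LHS can be generated with lhsdesign and scaled to the admissible input ranges. The following MATLAB lines are only a sketch; the numerical values are taken as an example (N1 = 25 and the range [0, 2]² correspond to case study 1).

% Minimal sketch (assumption): stage-1 space-filling design with an approximately
% maximin Latin Hypercube, scaled from the unit hypercube to the design space.
N1 = 25; n = 2;                                   % example values (case study 1)
xmin = [0 0]; xmax = [2 2];                       % admissible input ranges
X01 = lhsdesign(N1, n, 'criterion', 'maximin');   % design on [0, 1]^n
X1  = repmat(xmin, N1, 1) + X01 .* repmat(xmax - xmin, N1, 1);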


5.2 Optimal experiment design for TS models (2nd stage): Main algorithm

In the 2nd stage, a FIM-based D-optimal design is applied jointly for premise and conclusion parameters. Using the initial TS model from the 1st stage, the partial derivatives required for the FIM can be computed and N2 optimal design points X_opt can be determined. A 'robust' sequential design is used, as the TS model parameters from the first stage that are used for the optimal experiment design will possibly deviate from the true parameters. Initialization strategies for the design points of the OED cycles are discussed in the next subsection. A corresponding experiment is carried out, providing the new observations M2^(1) = {(x_k, y_k)}. All available data M1 ∪ M2^(1) are used to estimate a new TS model {(v_i^(1), a_{i,0}^(1), a_i^(1))}. The FIM is recalculated with the updated TS model parameters to update the assessment of the information content of the collected data. If the TS model parameters have significantly changed, another OED cycle is carried out, which uses the updated model parameters to compute the FIM and the design points from the previous cycle for initialization. Again N2 optimal design points are determined, new observations M2^(2) = {(x_k, y_k)} are collected in the experiment and a new TS model is estimated using all available data M1 ∪ M2^(1) ∪ M2^(2). This procedure is repeated until a given number of design cycles has been carried out and/or a convergence related criterion is met.

Fig. 1. Proposed 2-stage optimal experiment design and model identification process (step 0: choose N1, N2, ν, nFCM, δgreedy, the termination criteria for FCM, fmincon, greedy and sequential OED algorithms, and the OED initialization strategy; stage 1: LHS design, experiment, repeated randomly initialized FCM with OLS, nonlinear LSE; stage 2: greedy OED, experiment, nonlinear LSE with all collected data and FIM update, repeated until termination).

To compute the FIM, partial derivatives (sensitivity functions) of the model output (6) with respect to all membership function and local model parameters are required. Regarding the local model parameters one obtains:

\left.\frac{\partial \hat{y}(\mathbf{x},\mathbf{\Theta})}{\partial \theta_{i,j}}\right|_{\theta_{i,j}=a_{i,j},\ j\neq 0} = \mu_i(\mathbf{x},\mathbf{\Theta}_{\mathrm{MF}})\cdot x_j\,, \qquad \left.\frac{\partial \hat{y}(\mathbf{x},\mathbf{\Theta})}{\partial \theta_{i,0}}\right|_{\theta_{i,0}=a_{i,0}} = \mu_i(\mathbf{x},\mathbf{\Theta}_{\mathrm{MF}})\cdot 1 \quad (19)

For the membership function parameters v_{i,j}:

\frac{\partial \hat{y}(\mathbf{x},\mathbf{\Theta})}{\partial v_{i,j}} = \sum_{f=1}^{c}\hat{y}_f(\mathbf{x},\mathbf{\Theta}_{\mathrm{LM},f})\cdot\frac{\partial \mu_f(\mathbf{x},\mathbf{\Theta}_{\mathrm{MF}})}{\partial v_{i,j}} \quad (20)

with

\frac{\partial \mu_i(\mathbf{x},\mathbf{\Theta}_{\mathrm{MF}})}{\partial v_{i,f}} = \frac{2\,(x_f - v_{i,f})}{(\nu-1)\,d_i^2}\left(\sum_{j=1,\,j\neq i}^{c}\left(\frac{d_i^2}{d_j^2}\right)^{\frac{1}{\nu-1}}\right)\left(1+\sum_{j=1,\,j\neq i}^{c}\left(\frac{d_i^2}{d_j^2}\right)^{\frac{1}{\nu-1}}\right)^{-2} \quad (21)

and for r ≠ i:

\frac{\partial \mu_i(\mathbf{x},\mathbf{\Theta}_{\mathrm{MF}})}{\partial v_{r,f}} = -\frac{2\,(x_f - v_{r,f})}{(\nu-1)\,d_r^2}\left(\frac{d_i^2}{d_r^2}\right)^{\frac{1}{\nu-1}}\left(1+\sum_{j=1,\,j\neq i}^{c}\left(\frac{d_i^2}{d_j^2}\right)^{\frac{1}{\nu-1}}\right)^{-2} \quad (22)

can be derived for Euclidean distances

d_r^2 = \|\mathbf{x}-\mathbf{v}_r\|_2^2\,. \quad (23)

Generalized expressions for inner product or Lp norms can be found in [39].
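Since analytic expressions like (20)-(22) are easy to get wrong in an implementation, a finite-difference cross-check is useful. The MATLAB sketch below is only an assumption-based illustration; it reuses the hypothetical ts_predict helper sketched after Section 3.1.

% Minimal sketch (assumption): finite-difference check of the sensitivities
% d yhat / d v_{i,j} from Eqs. (20)-(22); ts_predict evaluates Eq. (6).
% x: (1 x n) design point, V: (c x n) prototypes, A: (c x (n+1)) local parameters.
function S = sensitivity_fd(x, V, A, nu, h)
    [c, n] = size(V);
    S  = zeros(c, n);
    y0 = ts_predict(x, V, A, nu);
    for i = 1:c
        for j = 1:n
            Vp = V; Vp(i, j) = Vp(i, j) + h;          % perturb one centre coordinate
            S(i, j) = (ts_predict(x, Vp, A, nu) - y0) / h;
        end
    end
end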

Given c local models and n input variables there are in total c · n membership function parameters (v_i = [v_{i,j}], i = 1, …, c; j = 1, …, n) and c · (n + 1) local model parameters (a_{i,j}, i = 1, …, c; j = 0, …, n). For a design that covers both parameter groups, the FIM has dimension c(2n + 1) × c(2n + 1). It may not be well conditioned because of the different character of the parameter groups. As the FIM has a block structure

\mathbf{I} = \frac{1}{\sigma^2}\sum_{k=1}^{N}\begin{bmatrix} \dfrac{\partial \hat{y}_k}{\partial a_{i,j}}\dfrac{\partial \hat{y}_k}{\partial a_{r,s}} & \dfrac{\partial \hat{y}_k}{\partial a_{i,j}}\dfrac{\partial \hat{y}_k}{\partial v_{r,s}} \\[2mm] \dfrac{\partial \hat{y}_k}{\partial v_{i,j}}\dfrac{\partial \hat{y}_k}{\partial a_{r,s}} & \dfrac{\partial \hat{y}_k}{\partial v_{i,j}}\dfrac{\partial \hat{y}_k}{\partial v_{r,s}} \end{bmatrix} =: \begin{bmatrix} \mathbf{A}_{i,j} & \mathbf{B}_{i,j} \\ \mathbf{B}_{i,j}^{T} & \mathbf{V}_{i,j} \end{bmatrix} \quad (24)

and is symmetric, det(I) can be computed from the determinants of two lower-dimensional matrices:

\det(\mathbf{I}) = \det(\mathbf{A}_{i,j})\cdot\det(\mathbf{V}_{i,j} - \mathbf{B}_{i,j}^{T}\mathbf{A}_{i,j}^{-1}\mathbf{B}_{i,j})\,. \quad (25)

As this comes at the cost of an additional matrix inversion, decomposition (25) is not used further in this paper. As the TS model is linear in the local model parameters a_{i,j}, the derivatives ∂ŷ/∂a_{i,j} given in (19) do not depend on the local model parameters. However, the TS model is nonlinear in the membership function parameters v_{i,j}, such that the partial derivatives ∂ŷ/∂v_{i,j} given in (21)-(22) depend on the parameter values, making the experiment design only locally optimal. The fuzziness parameter ν in (4) is typically chosen small to obtain extended kernel regions of the membership functions (and narrow transition zones), see [40], which supports the interpretation of the local models as approximate linearizations of the system. A side effect is, however, that the sensitivity functions differ significantly from zero only in small regions, making the design/optimization problem hard to solve.

In the case studies very large values of det(I) will be observed. These can be understood by recalling the Leibniz formula, which computes the determinant of an (n × n) square matrix from the sum over all permutations of products of n matrix elements. Hence, as a rule of thumb, the determinant of a FIM can be expected to increase with the magnitude of the matrix elements (i.e. the sensitivity functions evaluated in the N design points) and with the size of the matrix (resulting from the number of parameters). As the FIM is additive in all observations, just occasional large sensitivities can result in large values of det(I).

At first, Matlab™'s function fmincon was used for design point optimization. However, it worked properly only in the first of the three case studies. In the other two, it always terminated after the first iteration. This is attributed to the small values of the minimized cost function (det(I)^{-1}), on the order of e.g. 10^{-50}. Instead, a simple greedy algorithm was used for all 3 case studies for comparability of the results. Assuming similar value ranges for each coordinate, it perturbs each design point coordinate by ±δ and selects the change providing for maximal improvement. Once the changes for all design points have been determined, all moves are simultaneously implemented and a new iteration starts. The procedure is repeated for all design points until changes are negligible. The step width δ should be chosen sufficiently small compared with the range of the design space (see section 6.1). This simple approximated gradient descent algorithm with fixed step is certainly not efficient, but effective enough to demonstrate the capability of the proposed design method.
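One iteration of the described greedy search can be sketched as follows in MATLAB. The criterion handle detI_fun and the bound vectors are assumptions for illustration; the essential point is that all coordinate moves are collected first and then applied simultaneously, as described above.

% Minimal sketch (assumption): one iteration of the coordinate-perturbation greedy
% search. X: (N2 x n) current design, delta: step width, lb/ub: design space bounds,
% detI_fun: handle that returns det(I) for a given design matrix.
function X = greedy_iteration(X, delta, lb, ub, detI_fun)
    [N2, n] = size(X);
    Xnew = X;
    for k = 1:N2
        best = detI_fun(X); bestRow = X(k, :);
        for j = 1:n
            for s = [-delta, delta]
                Xc = X;
                Xc(k, j) = min(max(Xc(k, j) + s, lb(j)), ub(j));  % respect bounds
                v = detI_fun(Xc);
                if v > best, best = v; bestRow = Xc(k, :); end
            end
        end
        Xnew(k, :) = bestRow;      % collect the best move for design point k
    end
    X = Xnew;                      % apply all moves simultaneously
end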

5.3 Choice of initial design points for the optimal design

The used method for optimal experiment design converges locally, such that the result depends on the initial values x_{k,0} of the design points. In case N2 = N1 holds, the design points of the space filling design carried out in stage 1 can be used as initial values for the 2nd stage. If N2 ≠ N1, e.g. grid, random or LHS placement can be used to obtain the initial values x_{k,0}. Kroll and Dürrbaum [41] showed that design points that are optimal regarding the local model parameters are expected to lie on the inner partition boundaries or on the range limitations. The latter will (with some abuse of denomination) be referred to as 'outer partition boundaries' in the sequel. Design points that are optimal regarding the prototypes are expected to lie on the inner partition boundaries, approximately where these are crossed by the connecting line between two neighbouring prototypes [41]. The TS model resulting from the first stage can be used to compute initial design points in regions where high information gain is expected. One approach is to distribute the design points along the inner and outer partition boundaries. This can be implemented by applying a Voronoi decomposition given the set of partition centres. In case of 2 inputs, the Voronoi decomposition consists of a set of lines and can be computed with the Matlab™ function voronoi. It provides for the Voronoi cell bounds. The open outer cells are augmented with boundaries resulting from the design space bounds, providing for c bounded convex polytopes. Then, design points are placed one in each vertex of a polytope and distributed equidistantly on each bounding line. The same number of m points is assigned to each inner bounding line. On each design space bound, also m points are placed. Moreover, where an inner bounding line is crossed by the connecting line between two neighbouring prototypes, an additional design point is located. Finally, multiple identical design points (particularly in the vertices) are removed. This method will be referred to as 'Voronoi initialization' in the sequel.

For more than two dimensions the polytope's boundaries are (hyper-)planes and the Matlab™ function voronoin has to be used. The latter, however, does not provide the required information to determine the separating (hyper-)planes for the outer 'unbound' Voronoi cells. For that reason an approximation of the Voronoi initialization is used, referred to as 'µ0.5 initialization': Points with membership of µ = 0.5 lie 'in the middle of the transitions' between neighbouring partitions. In fact, for ν → 1 the bounds of the support of an α-cut for µ = 0.5 of a membership function equal the bounds of the corresponding Voronoi cell (if the Euclidean distance norm is used in both cases). For this reason, if N2 initial points are needed, those N2 points from the LHS design having membership values closest to 0.5 are selected. This strategy can certainly be refined, e.g. by also considering the resulting distribution over the different inner boundaries, but proved to work sufficiently well in the third case study. Fig. 2 illustrates the four initialization strategies.
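A possible reading of the µ0.5 initialization is sketched below in MATLAB: among the stage-1 candidate points, those whose largest membership degree (4) is closest to 0.5 are selected. The selection rule via the dominant membership and the function name are assumptions made here for illustration.

% Minimal sketch (assumption): 'mu0.5 initialization' - pick the N2 candidate points
% whose largest membership degree, Eq. (4), is closest to 0.5.
% Xcand: (N x n) candidate points (e.g. the stage-1 LHS design), V: (c x n) prototypes.
function X0 = mu05_init(Xcand, V, nu, N2)
    [N, ~] = size(Xcand); c = size(V, 1);
    gap = zeros(N, 1);
    for k = 1:N
        d2 = max(sum((repmat(Xcand(k, :), c, 1) - V).^2, 2), eps);
        mu = 1 ./ sum((repmat(d2, 1, c) ./ repmat(d2', c, 1)).^(1/(nu-1)), 2);
        gap(k) = abs(max(mu) - 0.5);          % distance of dominant membership from 0.5
    end
    [~, idx] = sort(gap);
    X0 = Xcand(idx(1:N2), :);
end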

5.4 Selecting the number of design points for space-filling and optimal design

For a successful experiment design the number of design points should be chosen such that the system's nonlinearities are captured. This depends on the character of the targeted system: Systems with locally strongly varying gradients require denser sampling than systems with a smooth curvature of their response surface. A rule of thumb for the area of space filling designs is to choose N1 = 10 · dim(x). This rule was analysed in detail in [45] up to 20 dimensions for Latin Hypercube Designs in the context of Gaussian process (GP) models that are parametrized using maximum likelihood estimation. The conclusion was that this rule "will provide reasonable prediction accuracy for 'tractable' functions and is sufficient to diagnose more difficult problems" and that "it is a reasonable rule of the thumb for an initial experiment." For a D-optimal design, another rule of thumb suggests to use N2 = 1.5 · dim(Θ) [50].

The analysis for N1 in [45] refers to GP models having one parameter per dimension plus mean and variance: dim(Θ) = dim(x) + 2. On the contrary, with dim(Θ) = c(2 · dim(x) + 1), TS models have significantly more parameters, see section 4.2. Hence it can be expected (and will be evident in the case studies) that at least for TS models with many local models the proposed value of N1 will be too small. While the rule worked for the simplest case study, which went in tandem with a small value of c = 3 partitions, dependent on the complexity of the system characteristics c to 2c times the number proposed by the aforementioned rule provided for suitable results in the 1st stage. The rule for N2 worked well in all case studies in section 6. It has to be remarked, however, that in the 2nd stage also the design points from the 1st stage are used for parameter estimation.
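As a small worked example of the two rules of thumb (the numbers correspond to case study 1 with n = 2 inputs and c = 3 local models; shown here only for illustration):

% Worked example (illustration only) of the rules of thumb quoted above.
n = 2; c = 3;
N1 = 10 * n;              % space-filling rule: N1 = 10 * dim(x) = 20
p  = c * (2*n + 1);       % number of TS model parameters: c(2n+1) = 15
N2 = ceil(1.5 * p);       % D-optimal rule: N2 = 1.5 * dim(Theta), about 23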

Fig. 2. Illustration of the initialization strategies: LHS (a) or grid (b) placement of N = 25 points, Voronoi with m = 5 (c) and µ0.5 with N2 = 10 (d).

6. CASE STUDIES

Three case studies on experiment design and nonlinear regression were selected: In the first study the true system is a 2-input TS system and therefore within the model class. In the second study the characteristic map of an industrial axial compressor has to be approximated. The response surface of this challenging system has the shape of a steep, bent cliff with locally varying curvature, embedded in two flat or nearly flat plateaus, respectively. The narrow cliff area must be well covered by design points to obtain a good TS model. In the third study, the Friedman 1 test function is approximated. Having a 5-dimensional design space, this benchmark problem is higher dimensional. In order to assess the improvement of the proposed OED method, the results can be compared with the ones from the space filling design carried out in the first stage.

6.1 Design choices and implementation

For a statistical assessment of the methods, in the first stage for each choice of N1 20 different LHS data sets were generated. The FCM was repeated nFCM = 20 times with different random initializations. The toolbox default settings on FCM termination were taken over, i.e. a maximum of 100 iterations or a JFCM change of less than 10^{-5}. LLS and GLS have both been used in the case studies, but for the sake of compactness only the GLS results are reported, which have provided for smaller prediction errors, in general. The Matlab function fmincon can draw on different optimization algorithms, of which SQP was chosen. Some changes in the default settings improved the performance (ConstraintTolerance: 1e-10 (default: 1e-6), MaxFunctionEvaluations: 100000 (default: 100 · n_Var), MaxIterations: 1000 (default: 400), OptimalityTolerance: 1e-10 (default: 1e-6), StepTolerance: 1e-10 (default: 1e-6)). Given design spaces with value ranges of 1 to 2, the step width of the greedy algorithm was chosen as δgreedy = 0.05 as a compromise between resolution and computational burden. The greedy algorithm is terminated if det(I) has not improved within 15 consecutive iterations or if 500 iterations on all N2 design data were carried out. A fixed number of two OED cycles was used as termination criterion for the 2nd stage, as the changes after the first OED cycle were in most cases insignificant. The case studies were carried out using Intel Xeon PCs with 3.4 GHz, 4 cores/8 threads, 64 GB RAM, Windows 7 64 bit, Matlab™ R2016a 64 bit using the parallelized Optimization Toolbox.

6.2 Assessment criteria

As criterion for the model performance, the root mean squared error (RMSE)

J_{\mathrm{RMSE}} := \sqrt{\frac{1}{N_l}\sum_{k=1}^{N_l}\big(y(k) - \hat{y}(k)\big)^2} \quad (26)

determined for a uniform grid of test points in the input space will be used and referred to as Jgrid. It is easier to interpret than the MSE, as it is in the physical unit of the dependent variable. The outermost points are placed at the limits of the chosen design range (x_{j,min}, x_{j,max}), and the remaining points are equally distributed in between. Noise-free observations are used for the test data set to ease interpretation of the RMSE values. If the system is in the model class, as in the first case study, the error of the parameter estimates is also assessed. As parameter confidence/uncertainty related measure, the determinant of the FIM on the identification data is used. As the local model and partition parameter related partial derivatives have different magnitudes, not just the FIM for all parameters (det(I), referred to as 'det(I(av))') will be studied, but also the submatrices in (24), i.e. A_{ij} related to the local model parameters (a_{i,0}, a_i) (det(A_{ij}), referred to as 'det(I(a))') and V_{ij} related to the partition parameters v_i (det(V_{ij}), referred to as 'det(I(v))'). The values after the FIM update with the updated TS model parameters will be reported. For a statistical assessment, the criteria's median is reported in the following and additional criteria (incl. the computational effort) are given in the appendix.
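For a 2-input problem, Jgrid can be evaluated as sketched below in MATLAB. The grid size and range follow the first case study; f_true stands for a hypothetical noise-free reference function, and ts_predict is the helper sketched after Section 3.1, with V, A, nu taken from an identified TS model.

% Minimal sketch (assumption): J_grid, i.e. the RMSE of Eq. (26) on a uniform
% 50 x 50 grid of noise-free test points over the design range [0, 2]^2.
[g1, g2] = meshgrid(linspace(0, 2, 50), linspace(0, 2, 50));
Xtest = [g1(:), g2(:)];
err = zeros(size(Xtest, 1), 1);
for k = 1:size(Xtest, 1)
    err(k) = f_true(Xtest(k, :)) - ts_predict(Xtest(k, :), V, A, nu);  % f_true: hypothetical
end
Jgrid = sqrt(mean(err.^2));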

6.3 TS system

In this case study, data is generated using a TS system such that the true system is known and is within the model class. It consists of three subsystems of similar size and compact shape, such that it is easy to sample the system's response surface and an initial partitioning using the FCM will not be difficult. Being a (2 × 1)-system, results can easily be visualized. The true local models are:

y_1 = -4 + 4x_1 - 2x_2 =: [x_1;\ x_2;\ 1]\cdot\mathbf{\Theta}_1;\quad y_2 = 4 - 2x_1 - 4x_2 =: [x_1;\ x_2;\ 1]\cdot\mathbf{\Theta}_2;\quad y_3 = 2x_1 + x_2 + 1 =: [x_1;\ x_2;\ 1]\cdot\mathbf{\Theta}_3, \quad (27)

and the true membership functions (4) have centres in:

\mathbf{v}_1 = [0.5;\ 0.5]^T,\quad \mathbf{v}_2 = [0.5;\ 1.5]^T,\quad \mathbf{v}_3 = [1.5;\ 1]^T \quad (28)

and use the Euclidean distance. The admissible range of the inputs is constrained to x_j ∈ [0; 2]. As test data set a grid of 50 × 50 points with the corresponding observations is used. The fuzziness parameter is chosen as ν = 1.3. The true system is shown in Fig. 3. It is assumed that the number of local models is known (c = 3). N1 = N2 = 25 is chosen to identify the 15 model parameters, i.e. the OED in the 2nd stage is initialized with the LHS design points from the first stage. The noise-free case and noisy data with σ² ∈ {0.2, 0.5, 1.0} are considered. Other design choices (e.g. other values of ν) were studied in [14]. The results for 20 different LHS data sets are recorded in Tables 1 to 3.

Fig. 3. True local models (a), true MF (b), and true TS system graph (c).
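Generating the noisy stage-1 identification data of this case study can be sketched as follows (MATLAB). The coefficient matrix A_true mirrors the local models as given in (27) above and the centres V_true those of (28); ts_predict is the helper sketched after Section 3.1, and the whole snippet is an illustrative assumption rather than the authors' script.

% Minimal sketch (assumption): noisy stage-1 data for case study 1.
V_true = [0.5 0.5; 0.5 1.5; 1.5 1.0];   % prototype centres from Eq. (28)
A_true = [-4 4 -2; 4 -2 -4; 1 2 1];     % rows [a_i0 a_i1 a_i2] from Eq. (27) as given above
nu = 1.3; sigma2 = 0.2; N1 = 25;
X = 2 * lhsdesign(N1, 2);               % LHS design on [0, 2]^2
y = zeros(N1, 1);
for k = 1:N1
    y(k) = ts_predict(X(k, :), V_true, A_true, nu) + sqrt(sigma2) * randn;
end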

Tables 1 and 2 show that the approximation error declines successively along the algorithm steps, though only slightly in the 2nd stage. Table 3 shows that the parametric uncertainty, however, is significantly reduced by the 2nd stage. Note that the FIM is based on 50 observations after steps 2 and 3, respectively, on 75 after step 5 and on 100 after step 7. The first column (σ² = 0) in Tables 1 and 2 shows that the true system is ideally recovered in the noise-free case. In case of noisy data the model error increases noticeably. This comes as no surprise due to the small size of the identification data set compared to the number of model parameters. The model quality can be improved at the cost of using more design points, see [14].

Table 1: Median of RMSE (Jgrid) on Test Data for Different Noise Levels (TS System)

Result after step | σ² = 0 | σ² = 0.2 | σ² = 0.5 | σ² = 1.0
2 | 1.0504·10^0 | 1.1072·10^0 | 1.2104·10^0 | 1.5786
3 | 7.1462·10^-7 | 2.7456·10^-1 | 6.6154·10^-1 | 1.2693
5 | 2.6699·10^-7 | 2.0532·10^-1 | 5.3672·10^-1 | 1.0612
7 | 2.1609·10^-7 | 1.9698·10^-1 | 5.2050·10^-1 | 1.0282

Table 2: Median of Frobenius Norm of Parameter Error ||Θ − Θ0||2 for Different Noise Levels (TS System)

Result after step | σ² = 0 | σ² = 0.2 | σ² = 0.5 | σ² = 1.0
2 | 4.6524·10^0 | 4.6215·10^0 | 4.5393·10^0 | 5.8597
3 | 2.0539·10^-6 | 8.9439·10^-1 | 2.2061·10^0 | 4.9549
5 | 1.1800·10^-6 | 4.0161·10^-1 | 1.0606·10^0 | 2.4260
7 | 1.1528·10^-6 | 2.6854·10^-1 | 6.7917·10^-1 | 1.5992

Table 3: Median of Determinants of the Full Information Matrix det(Iav) for Different Noise Levels (TS System)

Result after step | σ² = 0 | σ² = 0.2 | σ² = 0.5 | σ² = 1.0
2 | 7.7467·10^16 | 9.9911·10^16 | 9.7608·10^16 | 2.0528·10^17
3 | 1.9760·10^17 | 2.7578·10^17 | 6.1872·10^17 | 1.6657·10^18
5 | 6.9534·10^26 | 5.8458·10^26 | 2.7993·10^26 | 3.0188·10^26
7 | 3.7158·10^35 | 2.6184·10^35 | 1.3573·10^35 | 4.4041·10^34

In Fig. 4 the course of the algorithm is illustrated for an example run (noisy observations with σ² = 0.2). In this example, the partitioning resulting from the FCM significantly deviates from the true one, which can be seen in the Voronoi decomposition. This dilutes the local model estimates computed by the GLS. The nonlinear optimization of partition and model parameters in step 3 significantly improves the model: The optimized prototypes (○) approximately match the true ones and so do the local model parameters. As expected, the first OED cycle moves the design points towards the partition boundaries. The approximation error on the test data set reduces insignificantly, but the uncertainty of the estimates is significantly reduced, as det(I) increases by several orders of magnitude. The second OED cycle further reduces the uncertainty. However, the design points have changed little, such that from a cost point of view the first OED cycle would have been sufficient.

Fig. 4. Example design process (TS system). Rows show the design points, Voronoi decomposition, MF, local models and TS model for: the 1st stage after FCM and GLS (Jgrid|GLS = 1.278, det(I)|GLS = 1.87·10^18) and after optimization (Jgrid|opt = 0.248, det(I)|opt = 6.57·10^17), the 1st OED cycle (Jgrid = 0.212, det(I)|opt = 6.70·10^26) and the 2nd OED cycle (Jgrid = 0.208, det(I)|opt = 1.23·10^30).


6.4 Axial compressor characteristic map

The second case study addresses estimating the characteristic map of a single-stage axial compressor

(NASA CR-72694) [18]. Fig. 5 shows the reference map. The true system is not within the model class.

This problem is difficult for experiment design, as most of the response surface is flat and does not provide any information for choosing the membership function parameters. The informative area is a narrow belt which is rarely sampled by space-filling designs, such that a comparatively high number of design points is required in the 1st stage. Secondly, mismatch of the parameters used for the OED can yield design points that lie outside the informative area. The cliff-type response surface in Fig. 5 is bent horizontally and vertically and has locally varying curvature, making an approximation with locally affine TS models difficult. The data for this case study was generated by a tool for compressor maps [43]. With this tool (and some post-processing), a database of N = 56,248 observations of the input/output behaviour of the compressor was generated. An output value of 0 was assigned to inputs in the surge area, such that the “cliff” bends crisply at the bottom but softly at the top. While Fig. 5 shows the transfer characteristics in physical units, for the case study each signal was scaled to the unit interval to avoid scale effects. As the tool [43] is not a simulator that can provide an output for an arbitrarily chosen input vector, whenever a design point is calculated, the computed point is replaced by the (Euclidean) nearest neighbouring point in the database. Different values of c ∈ {4, 5, 6} and ν ∈ {1.05, 1.1, 1.2, 1.5, 1.8} were examined.

The choice of c = 6 and ν = 1.2 was found to be well suited for the given problem (see also [42]) and was

used for the following results. The TS model has 30 model parameters. The test data set was produced

using a grid of 50 × 50 points.
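Because the map tool only provides outputs for the pre-generated set of observations, every computed design point has to be snapped to its closest entry in the database. A minimal sketch of this replacement step is given below; the array names X_db, y_db and the function name are placeholders for illustration and are not part of the original tool chain.

```python
import numpy as np

def snap_to_database(x_design, X_db, y_db):
    """Replace a computed design point by its Euclidean nearest neighbour in the database.

    x_design : (n_inputs,) computed (scaled) design point
    X_db     : (N, n_inputs) database inputs; y_db : (N,) corresponding outputs
    """
    dists = np.linalg.norm(X_db - x_design, axis=1)  # Euclidean distances to all database points
    idx = int(np.argmin(dists))                      # index of the closest observation
    return X_db[idx], y_db[idx]
```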

Fig. 5. Characteristic map of the single-stage axial compressor NASA CR-72694 based on data from [18].

The design procedure was tested with different numbers of design points N1 ∈ {200, 250, 500, 750} for the LHS design in the first stage. For the case of noise-free observations, the resulting model performance



is shown in Fig. 6: For each value of N1, 20 different LHS data sets were generated. For each data set, the FCM was applied 20 times, each time with a different random initialization. The model parameters were computed with the GLS, and the best of the 20 models was used for the optimization in step 3. Fig. 6 shows the RMSE on the test data set. As the slope between the two plateaus of the map is steep and narrow, a sufficiently large number of design points is required to cover it. In case of N1 = 200, about 13 to 18 of the 20 LHS data sets result in below-average model quality on the test data set after step 3; for N1 = 250, only one does. If N1 is increased further, all models perform well, but little improvement results. The same is observed for noisy data. Therefore, N1 = 250 will be used for the first stage in the sequel.
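The selection of the best initial model out of the 20 FCM/GLS runs can be organized as a simple multi-start loop. The sketch below only illustrates this selection logic under the stated assumptions; fit_fcm_gls and rmse are placeholders for the clustering/estimation and evaluation routines of the actual method and are not defined in this paper.

```python
import numpy as np

def best_initial_model(X, y, fit_fcm_gls, rmse, n_starts=20):
    """Run FCM clustering plus local least-squares estimation n_starts times
    with different random seeds and keep the model with the smallest error."""
    best_model, best_err = None, np.inf
    for seed in range(n_starts):
        model = fit_fcm_gls(X, y, seed=seed)  # placeholder: one FCM + GLS identification run
        err = rmse(model, X, y)               # placeholder: RMSE on the identification data
        if err < best_err:
            best_model, best_err = model, err
    return best_model
```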

The LHS design points from the 1st stage are used to initialize the optimal experiment design in case of N2 = N1. For N2 ≠ N1, the design points were initialized as a regular grid, randomly with N2 ∈ {49, 81, 121, 169, 225}, or by using the Voronoi initialization with m ∈ {4, 5, 6} points per line. The noise-free case and noisy data with σ² ∈ {0.2, 0.5, 1.0} are considered. Tables 4 and 5 record results for a Voronoi initialization with m = 4, which provides approximately N2 = 40 design points (after identical points have been removed). The other initialization methods provided very similar results, and a larger value of m did not result in better models, such that for the sake of compactness those results are not recorded in this paper. The OED cycles reduce the prediction error only negligibly, but significantly reduce the model uncertainty. Note that I is based on 250 observations after steps 2 and 3, on 290 after step 5, and on 330 after step 7. Tables 4 and 5 also show that noisy observations significantly impair the resulting model quality. This can be countered by increasing the number of observations used for parameter estimation.

Fig. 6. Error on the test data set after step 2 (–) and step 3 (--) for N1 = 200 (×); N1 = 250 (*); N1 = 500 (○); N1 = 750 (◊).

For the same design choices, the initialization dependency of the results is illustrated by Fig. 7: Fig. 7a) shows how the results vary for 20 different LHS data sets and for 20 random initializations of the FCM for noisy observations (σ² = 0.2).



For each of the 20 LHS data sets, the best model after the 20 runs of FCM and GLS in steps 1 and 2 is carried over (Fig. 7b) to the optimization in step 3 (Fig. 7c), which then initializes the OED stage. Two OED cycles are carried out (Fig. 7d, e). Therefore, Fig. 7a) shows results for 400 models and Fig. 7b-e) each for 20 models. Fig. 7 shows that the error on the test data set varies little with the FCM initialization. In contrast, the error varies significantly with the LHS data set used. LHS data set 19 results in by far the worst model, which, however, becomes the 6th-best model after step 3 and matches the median after the OED stage. The optimization in step 3 thus mitigates the effect of disadvantageous initializations.

Table 4: Median of RMSE (Jgrid) on Test Data for Different Noise Levels (Compressor)

Result after step | σ² = 0       | σ² = 0.2     | σ² = 0.5     | σ² = 1.0
2                 | 1.4327·10^-1 | 2.3918·10^-1 | 5.2824·10^-1 | 1.0380·10^0
3                 | 6.3405·10^-2 | 2.3061·10^-1 | 5.5808·10^-1 | 1.1390·10^0
5                 | 5.9154·10^-2 | 2.2101·10^-1 | 5.3274·10^-1 | 1.0596·10^0
7                 | 5.9876·10^-2 | 2.1932·10^-1 | 5.2832·10^-1 | 1.0393·10^0

Table 5: Median of Determinants of the Full Information Matrix, det(Iav), for Different Noise Levels (Compressor)

Result after step | σ² = 0       | σ² = 0.2     | σ² = 0.5     | σ² = 1.0
2                 | 1.2957·10^17 | 1.2264·10^14 | 5.9883·10^16 | 1.9512·10^23
3                 | 2.8777·10^12 | 3.8980·10^15 | 2.7135·10^25 | 5.4594·10^32
5                 | 9.2080·10^16 | 5.2694·10^17 | 2.2865·10^30 | 3.8234·10^31
7                 | 5.7240·10^19 | 2.9885·10^20 | 7.2423·10^33 | 1.3206·10^35

Fig. 7. Dependency on the used LHS data set and FCM initialization, illustrated by the RMSE on the test data set: a) for 20 random initializations of FCM and GLS (steps 1+2) for each of the 20 LHS data sets, b) for the best model after steps 1+2 for each LHS data set, c) if the best model of steps 1+2 is optimized (step 3), and after the first (d) and the second OED cycle (e) (···: median) (Compressor)



In Fig. 8 the results of an exemplary design run with noise-free data are shown. The FCM does not provide a good partitioning, and correspondingly poor are the local models estimated by the GLS. After the optimization in step 3, the partitioning approximates the shape of the map well, and so do the local models. For simplicity, the local models are shown within the bounds of the corresponding Voronoi decomposition. The resulting model already matches the reference map in Fig. 5 well. The OED cycles steer the design points to the edges of the partitions or to the intersections of the partition boundaries with the connection lines between neighboring prototypes, as expected.

Fig. 8. Example design process (compressor). Panels per row: LHS data / design points, Voronoi decomposition, membership functions (MF), local models, and resulting TS model; the rows show the 1st stage (FCM and GLS results on top, optimization of step 3 below) and the two OED cycles.
1st stage: Jgrid|GLS = 0.123, Jgrid|opt = 0.055, det(I)|GLS = 6.58·10^16, det(I)|opt = 2.93·10^16.
1st OED cycle: Jgrid = 0.047, det(I)|opt = 4.57·10^23.
2nd OED cycle: Jgrid = 0.046, det(I)|opt = 6.89·10^26.



As in the first case study, the approximation error on the test data set decreases only insignificantly, while the parametric uncertainty is reduced significantly. Again, from a cost point of view, a single OED cycle would have been sufficient.

6.5 Friedman test function 1

The Friedman test function 1 [22,23]

    f(x) = 10 sin(π x_1 x_2) + 20 (x_3 − 0.5)^2 + 10 x_4 + 5 x_5                (28)

with x_i ∈ [0; 1] (i.e., the design space is the unit (hyper-)cube) is a benchmark problem for nonlinear regression. With 5 variables, it is a higher-dimensional problem for which grid-type designs are not applicable due to the exponential increase of the number of design points with the design space dimension. The system is not within the model class. The cases of noise-free observations (d(k) = 0) and of noisy observations with i.i.d. normally distributed random variables d(k) with zero mean and variance σ² = 4,876642 as in [7] will be considered. In the original problem description, N = 200 points were used that are uniformly distributed in [0; 1]^5. The sample size, however, varies from work to work (e.g. from 10^2 to 10^6). In this case study, N1 = 250 design points will be used in the following. Then N2 ∈ {50, 100, 150, 200, 250} design points are considered. A test data set is produced using a grid of 21^5 ≈ 4·10^6 points as a compromise between grid granularity and the resulting number of function evaluations. Different values of c ∈ {3, 4, 5, 6, 7, 8, 9, 10} and ν ∈ {1.05, 1.1, 1.2, 1.5, 1.8} were examined. One suitable choice is c = 6 and ν = 1.2; it will be used in the following for the sake of compactness.
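For reference, a small sketch of the test function (28) and of a plain Latin Hypercube sample of the N1 design points is given below; the paper uses an optimized (maximin) LHS, so the jittered stratified sampling shown here is only a simplified stand-in.

```python
import numpy as np

def friedman1(X):
    """Friedman test function 1, Eq. (28); X has shape (N, 5) with entries in [0, 1]."""
    return (10.0 * np.sin(np.pi * X[:, 0] * X[:, 1])
            + 20.0 * (X[:, 2] - 0.5) ** 2
            + 10.0 * X[:, 3]
            + 5.0 * X[:, 4])

# Plain LHS in the unit hypercube: one point per stratum and dimension.
rng = np.random.default_rng(0)
N1, dim = 250, 5
strata = np.argsort(rng.random((N1, dim)), axis=0)  # random permutation of strata per dimension
X = (strata + rng.random((N1, dim))) / N1           # jitter each point within its stratum
y = friedman1(X)                                    # noise-free observations, d(k) = 0
```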

Selected results from 20 repeated experiments regarding the impact of the noise level are recorded in Tables 6 and 7. N1 = 250 design points were placed by LHS in step 1. N2 = 100 design points were used for the OED cycles, and the µ0.5-initialization was applied. Tables 6 and 7 show that the prediction error successively decreases along the algorithm steps. The OED cycles reduce the approximation error only insignificantly, but significantly reduce the uncertainty. Note that I is based on 250 observations after steps 2 and 3, on 350 after step 5, and on 450 after step 7. Compared with the compressor case study, an increasing noise level increases the prediction error only moderately. Another observation is that the 2-input compressor system with its cliff-type response surface needs the same 250 design points in the 1st stage as the 5-input Friedman test function to work properly.

Tables 8 and 9 present results for different initialization strategies in step 4 (noise-free observations). For higher-dimensional problems, a grid-type initialization is of limited use: just 3 points per dimension already result in N2 = 243 design points (while 2 points per dimension yield only N2 = 32). In this study, however, N2 = 100 design points are sufficient (but N2 = 50 are too few). Due to the exponential growth with the number of variables, grid initialization does not scale well.



The results for random initialization and for µ0.5-initialization using the same number of design points N2 are similar regarding the approximation error. Regarding uncertainty reduction, however, the µ0.5-initialization provides significantly better results after the first OED cycle. This is advantageous if, due to time or cost constraints, only a single OED run is carried out.

Table 6: Median of RMSE (Jgrid) on Test Data for Different Noise Levels and µ0.5-Initialization (Friedman)

Result after step | σ² = 0       | σ² = 0.2     | σ² = 0.5     | σ² = 1.0
2                 | 2.1336·10^0  | 2.1570·10^0  | 2.1999·10^0  | 2.3932·10^0
3                 | 7.3723·10^-1 | 7.9404·10^-1 | 1.0652·10^0  | 1.4890·10^0
5                 | 6.8497·10^-1 | 6.8290·10^-1 | 8.7728·10^-1 | 1.3345·10^0
7                 | 6.6146·10^-1 | 7.0122·10^-1 | 8.5989·10^-1 | 1.2849·10^0

Table 7: Median of Determinants of the Full Information Matrix, det(Iav), for Different Noise Levels and µ0.5-Initialization (Friedman)

Result after step | σ² = 0        | σ² = 0.2      | σ² = 0.5     | σ² = 1.0
2                 | 2.0843·10^91  | 7.9850·10^90  | 5.3806·10^89 | 6.4159·10^89
3                 | 7.3674·10^75  | 3.0443·10^73  | 2.6882·10^72 | 1.0120·10^81
5                 | 7.7835·10^93  | 3.1339·10^92  | 2.1024·10^93 | 1.3877·10^93
7                 | 8.0735·10^106 | 8.7316·10^109 | 6.2212·10^96 | 2.8957·10^102

Table 8: Median of RMSE (Jgrid) on Test Data for Different Sample Sizes N2 for Random, Grid and µ0.5-Initialization (Friedman)

Result after step | Random: N2=100 | N2=150 | N2=200 | N2=250 | Grid: N2=243 | µ0.5: N2=50 | N2=100  | N2=150 | N2=200  | N2=250
2                 | 2.1073         | 2.1311 | 2.1311 | 2.1336 | 2.1336       | 2.1217      | 2.1217  | 2.1217 | 2.1217  | 2.1217
3                 | 0.7083         | 0.7555 | 0.7274 | 2.1336 | 0.7083       | 1.1379      | 1.1379  | 1.1379 | 1.1379  | 1.1379
5                 | 0.7087         | 0.7555 | 0.7232 | 0.7526 | 0.7511       | 1.0602      | 0.78052 | 0.7630 | 0.64826 | 1.1523
7                 | 0.6679         | 0.6862 | 0.7233 | 0.7114 | 0.7171       | 1.0226      | 0.6074  | 0.7159 | 0.5975  | 0.6910



The approximation quality is visualized for an example run in Fig. 10. The predictions are plotted over the true values after steps 3, 5 and 7 for noisy observations with σ² = 0.5. The OED in the 2nd stage significantly improved the performance on the test data set. It is remarked that OED design points are often placed in “difficult” regions, which is good for obtaining a reliable model but dilutes the average numerical performance of the model. Note that due to the large number (4·10^6) of test points the point cloud appears as a continuum.

Table 9: Median of Determinants of the Full Information Matrix, det(Iav), for Different Sample Sizes N2 for Random, Grid and µ0.5-Initialization (Friedman)

Result after step | Random: N2=100 | N2=150        | N2=200        | N2=250        | Grid: N2=243  | µ0.5: N2=50 | N2=100     | N2=150     | N2=200     | N2=250
4+5               | 4.1387·10^91   | 5.0697·10^98  | 5.9350·10^99  | 4.0994·10^109 | 5.7916·10^109 | 1.2·10^81   | 1.5·10^105 | 5.5·10^117 | 4.0·10^126 | 2.3·10^131
6+7               | 1.9232·10^102  | 3.8110·10^115 | 4.4677·10^122 | 1.1151·10^128 | 4.2933·10^127 | 6.8·10^86   | 1.7·10^94  | 6.0·10^126 | 5.2·10^121 | 1.1·10^154

Fig. 10. Predictions vs. true values for the initial model after step 3 (top), step 5 (middle), and step 7 (bottom) for identification data (left) and test data (right) in case of noisy data (σ² = 0.5) (Friedman).

The presented results are better than results found in the literature: Meyer et al. [47] generated 100 training data sets with 200 samples each and 100 test data sets with 1000 samples each. They compared the mean of the mean squared test data set errors for 10 different regression methods. Their MSEs range between 3.22 and 11.87, i.e. the RMSEs range between 1.79 and 3.45. Binev et al. [7] compared different approximators. They generated uniformly distributed data sets: a test data set with 10^6 points and different training data sets with 10^j, j ∈ {3, 4, 5, 6}, points, respectively. In case of using 10^3 points, they obtained RMSEs ranging between 1.38 and 6 on the test data set.
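As a quick plausibility check of the conversion from the MSE range reported in [47] to the RMSE values quoted above (a one-line sketch using the cited numbers):

```python
import math

# RMSE is the square root of the MSE; check the range quoted from [47].
print(round(math.sqrt(3.22), 2), round(math.sqrt(11.87), 2))  # -> 1.79 3.45
```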

7. CONCLUSIONS AND OUTLOOK

This article addressed optimal experiment design for identifying both the local model and the partition parameters of Takagi-Sugeno models for nonlinear regression problems. The optimal experiment design requires making assumptions on the unknown parameter values in order to calculate the parameter sensitivities, where already small deviations of the partition parameters may cause identification failure. To handle this problem, a model-free design phase (in this work, optimized Latin Hypercube Sampling) is used to make structural decisions and to estimate an initial model of sufficient quality for the subsequent optimal model-based experiment design. To cope with the generally remaining parametric error, a robust sequential optimal design procedure is applied.

The case studies showed that it is essential to obtain a model of sufficient quality, particularly regarding the partitioning, in the model-free design phase to initialize the optimal design stage. For this, the numerical optimization of the model parameters resulting from Fuzzy-c-Means clustering and ordinary least squares was essential. In the case studies, just one or two optimal design cycles were then sufficient to significantly reduce the parametric uncertainty. Therefore, for problems with little a-priori knowledge of the nonlinearities, the authors recommend using a larger number of design points in the first design stage. For the optimal design stage, a smaller number of design points was sufficient; for practical purposes with limited resources, a single OED cycle may often suffice. Exploiting the TS model structure, the design points can be initialized at selected locations on the partition/design-space boundaries to ease the OED problem.

Different research opportunities remain for the future. First, the presented methodology itself can be improved: criteria other than the widespread standard alphabetical criteria can be investigated to better assess the suitability of a design, as already proposed in [21] in the context of physical models. This includes multi-criteria assessment/optimization approaches. The FIMs were often observed to be approximately ill-conditioned,



in part due to the different groups of local model vs. partition parameters. The application of scaling and of separate designs for these subgroups can be considered here. Certainly, more efficient optimization methods than the approximated gradient descent can be deployed. A different area of future research is OED for identifying TS models of nonlinear dynamic systems, which requires substantial changes due to the additional temporal constraints and the issue of noise modelling. Finally, experiments with laboratory test stands would provide more realistic observations regarding disturbances.

ACKNOWLEDGEMENTS

The authors thank the reviewers for their constructive comments that helped to significantly improve the

paper.

REFERENCES

[1] M. Alizadeh, M. Gharakhani, E. Fotoohi, R. Rada, Design and analysis of experiments in ANFIS modeling for stock price prediction, Int. Journal of Industrial Engineering Computations 2 (2011) 409-418.
[2] C.G. Atkeson, A.W. Moore, S. Schaal, Locally weighted learning, Artificial Intelligence Review 11 (1997) 11-17.
[3] M. Ayeb, H. Theuerkauf, C. Wilhelm, T. Winsel, Robust identification of nonlinear dynamic systems using design of experiments, Proc. IEEE Conf. on Computer Aided Control Systems Design (2006) 2321-2326.
[4] R. Babuska, Fuzzy modelling for control, Kluwer Academic Publishers, Norwell, MA, USA (1998).
[5] J. Belz, K. Bamberger, O. Nelles, Order of experimentation for metamodeling tasks, Proc. Int. Joint Conference on Neural Networks (IJCNN) (2016) 4843-4849.
[6] J.C. Bezdek, Pattern recognition with fuzzy clustering algorithms, Plenum Press, New York (1981).
[7] P. Binev, W. Dahmen, P. Lamby, Fast high-dimensional approximation with sparse occupancy trees, Journal of Computational and Applied Mathematics 235 (2011) 2063-2076.
[8] G. Bontempi, M. Biratti, H. Bersini, Lazy learning for local modeling and control design, Int. Journal of Control 72 (1999) 643-658.
[9] M. Buragohain, C. Mahanta, A novel approach for ANFIS modelling based on full factorial design, Appl. Soft Computing 8 (2008) 609-625.
[10] D.A. Cohn, Neural network exploration using optimal experiment design, Neural Networks 9 (1996) 1071-1083.
[11] M. Deflorian, S. Zaglauer, Design of experiments for nonlinear dynamic system identification, Proc. 18th IFAC World Congress (2011) 13179-13184.
[12] M. Deregnaucourt, M. Stadlbauer, C. Hametner, S. Jakubek, H.M. Koegler, Evolving model architecture for custom output range exploration, Math. and Comp. Mod. of Dyn. Systems 21 (2015) 1-22.
[13] S.K. Doherty, J.B. Gomm, D. Williams, Experiment design considerations for non-linear system identification using neural networks, Computers in Chemical Engineering 21 (1997) 327-346.
[14] A. Dürrbaum, A. Kroll, On robust experiment design for identifying locally affine Takagi-Sugeno models, Proc. IEEE Int. Conf. on Systems, Man, and Cybernetics (SMC) (2016).
[15] T. Ebert, T. Fischer, J. Belz, T.O. Heinz, G. Kampmann, O. Nelles, Extended deterministic local search algorithm for maximin Latin Hypercube Designs, Proc. IEEE Symposium Series on Computational Intelligence (2015) 375-382.
[16] Y. Farzaneh, A. Tootoonchi, A novel data reduction method for Takagi-Sugeno fuzzy system design based on statistical design of experiment, Appl. Soft Computing 9 (2009) 1367-1376.
[17] V.V. Fedorov, S.L. Leonov, Optimal design for nonlinear response models, CRC Press, Boca Raton (2014).
[18] J.T. Flynn, M.J. Keenan, D.H. Sulam, Single-stage evaluation of highly-loaded high Mach-number compressor stages, NASA Technical Report (1970).
[19] E. Forte, E. von Harbou, J. Burger, N. Asprion, M. Bortz, Optimal design of laboratory and pilot-plant experiments using multiobjective optimization, Chem. Ing. Tech. 89 (2017) 645-654.
[20] G. Franceschini, S. Macchietto, Validation of a model for biodiesel production through model-based experiment design, Ind. and Eng. Chem. Res. 46 (2007) 220-232.
[21] G. Franceschini, S. Macchietto, Model-based design of experiments for parameter precision: state of the art, Chem. Eng. Science 63 (2008) 4846-4872.
[22] J.H. Friedman, Multivariate adaptive regression splines, The Annals of Statistics 19 (1991) 1-67.
[23] J.H. Friedman, E. Grosse, W. Stuetzle, Multidimensional adaptive spline regression, SIAM J. on Scientific and Statistical Comp. 4 (1983) 291-301.
[24] C. Friedrich, M. Auer, G. Stiech, Model based calibration techniques for medium speed engine optimization: investigations on common modeling approaches for modeling of selected steady state outputs, SAE Int. J. Engines 9 (2016).
[25] I. Gath, A.B. Geva, Unsupervised optimal clustering, IEEE Trans. on Pattern Analysis and Machine Intelligence 11 (1989) 773-778.
[26] G.C. Goodwin, C.R. Rojas, J.S. Welsh, Good, bad and optimal experiments for identification, in: T. Glad, G. Hendeby (Eds.), Forever Ljung in System Identification - Workshop on the occasion of Lennart Ljung's 60th birthday, Studentlitteratur, Lund, Sweden (2006).
[27] M. Gringard, A. Kroll, On the systematic analysis of the impact of the parametrization of standard test signals, Proc. IEEE Symposium Series on Computational Intelligence (2016).
[28] C. Hametner, M. Stadlbauer, M. Deregnaucourt, S. Jakubek, T. Winsel, Optimal experiment design based on local model networks and multilayer perceptron networks, Eng. Appl. Art. Int. 26 (2013) 251-261.
[29] B. Hartmann, T. Ebert, O. Nelles, Model-based design of experiments based on local model networks for nonlinear processes with low noise levels, Proc. ACC (2011) 5306-5311.
[30] T.O. Heinz, O. Nelles, Comparison of excitation signals in nonlinear identification problems (in German), Proc. 26. Workshop Computational Intelligence (2016) 139-158.
[31] P. Hering, M. Simandl, Sequential optimal experiment design for neural networks using multiple linearization, Neurocomputing 73 (2010) 3284-3290.
[32] I. Hertel, M. Kohler, Estimation of the optimal design of nonlinear parametric regression problem via Monte Carlo experiments, Computational Statistics and Data Analysis 59 (2013) 1-12.
[33] M. Hirsch, L. del Re, Adapted D-optimal experimental design for transient emission models of diesel engines, SAE Technical Paper 2009-01-0621 (2009) 115-122.
[34] F. Höppner, R. Kruse, F. Klawonn, T. Runkler, Fuzzy Cluster Analysis, Wiley, New York (1999).
[35] B.G.M. Husslage, G. Rennen, E.R. van Dam, D. den Hertog, Space-filling Latin hypercube designs for computer experiments, Optimization and Engineering 12 (2011) 611-630.
[36] S.M. Kay, Fundamentals of statistical signal processing, vol. 1, 20th ed., Prentice Hall, Upper Saddle River (2013).
[37] M.R. Kianifar, F. Campean, A. Wood, Application of permutation genetic algorithm for sequential model building - model validation design of experiments, Soft Computing 20 (2016) 3023-3044.
[38] C.P. Kitsos, Optimal experimental design for nonlinear models, Springer, Berlin (2013).
[39] A. Kroll, Fuzzy systems for modeling and control of complex technical systems (in German), PhD dissertation, Fortschrittberichte VDI, Reihe 8, No. 612 (1997).
[40] A. Kroll, On choosing the fuzziness parameter for identifying TS models with multidimensional membership functions, J. of Artificial Intelligence and Soft Comp. Res. 1 (2011) 283-300.
[41] A. Kroll, A. Dürrbaum, On joint experiment design for identifying partition and local model parameters of Takagi-Sugeno models, Proc. 17th SysID (2015) 1427-1432.
[42] A. Kroll, S. Soldan, On data-driven Takagi-Sugeno modeling of heterogeneous systems with multidimensional membership functions, Proc. 18th IFAC World Congress (2011) 14994-14999.
[43] J. Kurzke, Smooth C 8.2: Preparing compressor maps for gas turbine performance modeling, User's Manual (2009).
[44] L. Ljung, System identification: Theory for the user, 2nd ed., Prentice Hall, Upper Saddle River (1999).
[45] J.L. Loepky, J. Sacks, W.J. Welch, Choosing the sample size of a computer experiment: a practical guide, Technometrics 51 (2009) 366-376.
[46] Matlab, Statistics Toolbox™: User's guide, R2016, The MathWorks (2016).
[47] D. Meyer, F. Leisch, K. Hornik, The support vector machine under test, Neurocomputing 55 (2003) 169-186.
[48] J. Möller, R. Pörtner, Model-based design of process strategies for cell culture bioprocesses: state of the art and new perspectives, in: S.J. Thatha Gowder (Ed.), New Insights into Cell Culture Technology, InTech (2017).
[49] O. Nelles, Nonlinear system identification, Springer, London (2001).
[50] H. Petersen, Selection of statistical experimental designs (in German), vol. 3.2, Plan 30.0, ecomed, Landsberg/Lech (1992).
[51] F.J. Pontes, G.F. Amorin, P.P. Balestrassi, A.P. Paiva, J.R. Ferreira, Design of experiments and focused grid search for neural network parameter optimization, Neurocomputing 186 (2016) 22-34.
[52] L. Pronzato, Optimal experiment design and some related control problems, Automatica 44 (2008) 303-325.
[53] K. Röpke, C. Gühmann (Eds.), Proc. 1st-8th Conf. on Design of Experiments (DoE) in powertrain development, Expert Verlag, Renningen (2003-2015).
[54] I. Skrjanc, Evolving fuzzy-model-based design of experiments with supervised hierarchical clustering, IEEE Transactions on Fuzzy Systems 23 (2015) 861-871.
[55] H. Suzuki, M. Yamakita, Input design for hybrid system identification for accurate estimation of submodel regions, Proc. ACC (2011) 1236-1241.
[56] T. Takagi, M. Sugeno, Fuzzy identification of systems and its application to modelling and control, IEEE Trans. Systems, Man, and Cybernetics 15 (1985) 116-132.
[57] D.M. Titterington, Optimal design in flexible models, including feed-forward networks and nonparametric regression, Chap. 23 in: A. Atkinson et al. (Eds.), Optimum Design, Kluwer Academic Publishers, Norwell, MA, USA (2001) 261-273.
[58] A. Varsha, A. Rainer, P. Santiago, R. Umale, Global COR iDOE methodology: an efficient way to calibrate medium and heavy commercial vehicle engine emission and fuel calibration, SAE Technical Paper 2017-26-0032 (2017).
[59] E. Walter, L. Pronzato, Identification of parametric models from experimental data, Springer, London (1997).
[60] H.O. Wang, K. Tanaka, M.F. Griffin, Parallel distributed compensation of nonlinear systems by Takagi-Sugeno fuzzy model, Proc. Fuzzy Systems (1995) 531-538.
[61] C. Werner, G. Preiß, F. Gores, M. Griebenow, S. Heitmann, A comparison of low-pressure and supercharged operation of polymer electrolyte membrane fuel cell systems for aircraft applications, Progress in Aerospace Sciences 85 (2016) 51-64.
[62] X.L. Xie, G.A. Beni, Validity measure for clustering, IEEE Trans. PAMI 3 (1991) 841-846.
[63] Ö. Yeniay, Comparative study of algorithms for response surface optimization, Mathematical and Computational Applications 19 (2014) 93-104.
[64] S. Zaglauer, Bayesian design of experiments for nonlinear dynamic system identification, Proc. Simutools '12 (2012) 85-92.
[65] C. Zanchettin, L.L. Minku, T.B. Ludermir, Design of experiments in neuro-fuzzy systems, Int. Journal of Computational Intelligence and Applications 9 (2010) 137-152.
[66] R. Zimmerschied, R. Isermann, Regularization techniques for identification using local-affine models (in German), at-Automatisierungstechnik 56 (2008) 339-349.



8. APPENDIX

8.1 TS system

Table 10: Median of CPU Seconds used for each Step for Different Noise Levels

For step  | σ² = 0     | σ² = 0.2   | σ² = 0.5   | σ² = 1.0
2-FCM     | 1.22·10^-1 | 1.27·10^-1 | 1.26·10^-1 | 2.17·10^-1
2-GLS     | 5.98·10^-3 | 5.91·10^-3 | 5.89·10^-3 | 6.51·10^-3
3         | 2.10·10^0  | 8.08·10^-1 | 8.25·10^-1 | 1.87·10^0
4         | 1.56·10^1  | 1.63·10^1  | 1.62·10^1  | 2.88·10^1
5         | 6.05·10^-1 | 9.17·10^-1 | 1.46·10^0  | 2.08·10^0
6         | 2.25·10^0  | 3.20·10^0  | 4.13·10^0  | 1.19·10^1
7         | 3.60·10^-1 | 1.08·10^0  | 1.75·10^0  | 2.73·10^0
Total run | 2.23·10^1  | 2.47·10^1  | 2.59·10^1  | 4.79·10^1

Table 11: Average Prediction Errors after Different Design and Identification Steps (TS System)

Result after step | Jgrid quartiles (1st / 2nd / 3rd)           | Jgrid mean   | JRMSE quartiles (1st / 2nd / 3rd)           | JRMSE mean
2                 | 8.7812·10^-1 / 1.0504·10^0 / 1.1846·10^0    | 1.0352·10^0  | 7.0953·10^-1 / 8.7548·10^-1 / 1.0408·10^0   | 8.6131·10^-1
3                 | 3.6973·10^-7 / 7.1462·10^-7 / 1.6345·10^-6  | 1.8018·10^-6 | 2.0808·10^-7 / 2.7578·10^-7 / 3.5221·10^-7  | 3.2768·10^-7
5                 | 1.4077·10^-7 / 2.6699·10^-7 / 3.8541·10^-7  | 3.4557·10^-7 | 1.8203·10^-7 / 3.5829·10^-7 / 6.4529·10^-7  | 5.6339·10^-7
7                 | 1.4290·10^-7 / 2.1609·10^-7 / 3.1243·10^-7  | 2.9563·10^-7 | 2.0237·10^-7 / 3.2884·10^-7 / 5.7163·10^-7  | 5.4470·10^-7




Table 12: Average Uncertainty after Different Design and Identification Steps (TS System)

Result after step | det I(a) quartiles (1st / 2nd / 3rd)       | det I(v) quartiles (1st / 2nd / 3rd)          | det I(av) quartiles (1st / 2nd / 3rd)
2                 | 2.4132·10^4 / 2.8664·10^4 / 3.4696·10^4    | 6.9720·10^12 / 3.9634·10^13 / 3.5590·10^15    | 4.2021·10^13 / 6.4552·10^15 / 1.3787·10^18
3                 | 2.5384·10^4 / 3.0746·10^4 / 3.6591·10^4    | 3.0613·10^12 / 1.2738·10^13 / 2.8361·10^14    | 7.2184·10^15 / 2.2462·10^16 / 7.4214·10^17
5                 | 2.0811·10^8 / 2.2192·10^8 / 8.1389·10^26   | 3.9808·10^19 / 4.6136·10^19 / 5.5218·10^19    | 5.5739·10^26 / 6.6454·10^26 / 8.1389·10^26
7                 | 1.8156·10^13 / 1.8671·10^13 / 4.0044·10^35 | 4.7782·10^23 / 5.0811·10^23 / 5.4317·10^23    | 3.1675·10^35 / 3.4773·10^35 / 4.0044·10^35

8.2 Axial compressor characteristic map

Table 13: Median of CPU Seconds used for each Step for Different Noise Levels

For step  | σ² = 0 | σ² = 0.2 | σ² = 0.5 | σ² = 1.0
2-FCM     | 1      | 2.18     | 1.04     | 2.10
2-GLS     | 0      | 0        | 0        | 0
3         | 9.45   | 25.20    | 17.72    | 38.28
4         | 4.69   | 10.72    | 7.16     | 11.34
5         | 8.56   | 16.83    | 13.01    | 26.67
6         | 4.14   | 11.78    | 7.60     | 11.97
7         | 8.93   | 7.27     | 15.94    | 25.62
Total run | 36.77  | 73.99    | 62.47    | 115.97



Table 14: Average Prediction Errors after Different Design and Identification Steps (Compressor)

Result after step | Jgrid quartiles (1st / 2nd / 3rd)           | Jgrid mean   | JRMSE quartiles (1st / 2nd / 3rd)           | JRMSE mean
2                 | 1.3442·10^-1 / 1.4327·10^-1 / 1.4890·10^-1  | 1.4161·10^-1 | 1.1686·10^-1 / 1.2774·10^-1 / 1.3976·10^-1  | 1.2901·10^-1
3                 | 6.1401·10^-2 / 6.3405·10^-2 / 6.8075·10^-2  | 6.5256·10^-2 | 3.6716·10^-2 / 4.1524·10^-2 / 4.7041·10^-2  | 4.0593·10^-2
5                 | 5.8590·10^-2 / 5.9154·10^-2 / 6.1839·10^-2  | 6.0463·10^-2 | 3.9939·10^-2 / 4.2413·10^-2 / 4.7574·10^-2  | 4.2892·10^-2
7                 | 5.8491·10^-2 / 5.9876·10^-2 / 6.1377·10^-2  | 5.9331·10^-2 | 4.1096·10^-2 / 4.5062·10^-2 / 4.9498·10^-2  | 4.4760·10^-2

Table 15: Average Uncertainty after Different Design and Identification Steps (Compressor)

Result after step | det I(a) quartiles (1st / 2nd / 3rd)       | det I(v) quartiles (1st / 2nd / 3rd)          | det I(av) quartiles (1st / 2nd / 3rd)
2                 | 1.3002·10^4 / 2.1557·10^4 / 5.8797·10^4    | 2.0971·10^14 / 3.1150·10^15 / 7.9937·10^16    | 1.8187·10^16 / 1.2957·10^17 / 9.5446·10^17
3                 | 3.9711·10^1 / 3.3280·10^2 / 4.5881·10^3    | 1.0416·10^11 / 3.4340·10^12 / 3.2037·10^16    | 1.7999·10^9 / 2.8777·10^12 / 1.5042·10^16
5                 | 7.9640·10^2 / 4.1709·10^4 / 2.0184·10^19   | 1.1300·10^13 / 1.1257·10^15 / 1.2563·10^20    | 1.3052·10^11 / 9.2080·10^16 / 2.0184·10^19
7                 | 1.9630·10^4 / 3.7726·10^5 / 2.5795·10^23   | 1.9118·10^16 / 4.0279·10^17 / 5.2894·10^21    | -1.8831·10^-4 / 1.6445·10^16 / 2.5795·10^23



8.3 Friedman test function 1

Table 16: Median of CPU Seconds used for each Step for Different Noise Levels

For step  | σ² = 0     | σ² = 0.2   | σ² = 0.5   | σ² = 1.0
2-FCM     | 1.89·10^0  | 1.74·10^0  | 1.21·10^0  | 1.14·10^0
2-GLS     | 4.17·10^-3 | 4.08·10^-3 | 3.19·10^-3 | 4.10·10^-3
3         | 1.40·10^2  | 1.20·10^2  | 1.21·10^2  | 1.12·10^2
4         | 3.48·10^2  | 2.38·10^2  | 2.48·10^2  | 2.50·10^2
5         | 1.26·10^2  | 9.45·10^1  | 8.63·10^1  | 1.22·10^2
6         | 3.62·10^2  | 2.57·10^2  | 2.03·10^2  | 1.97·10^2
7         | 9.96·10^1  | 1.03·10^2  | 7.33·10^1  | 1.12·10^2
Total run | 1.10·10^3  | 8.56·10^2  | 6.98·10^2  | 7.54·10^2

Table 17: Average Prediction Errors after Different Design and Identification Steps (Friedman)

Result after step | Jgrid quartiles (1st / 2nd / 3rd)            | Jgrid mean    | JRMSE quartiles (1st / 2nd / 3rd)           | JRMSE mean
2                 | 2.1073·10^0 / 2.1311·10^0 / 2.1773·10^0      | 2.1455·10^0   | 1.5015·10^0 / 1.6224·10^0 / 1.7425·10^0     | 1.6270·10^0
3                 | 2.1073·10^0 / 7.0834·10^-1 / 8.6846·10^-1    | 7.6682·10^-1  | 3.1722·10^-1 / 3.6678·10^-1 / 3.9201·10^-1  | 3.6281·10^-1
5                 | 6.3567·10^-1 / 7.0872·10^-1 / 7.6114·10^-1   | 7.01016·10^-1 | 4.6259·10^-1 / 5.1444·10^-1 / 5.5471·10^-1  | 5.1644·10^-1
7                 | 6.2547·10^-1 / 6.6787·10^-1 / 7.5225·10^-1   | 6.8146·10^-1  | 5.0196·10^-1 / 5.4895·10^-1 / 5.6643·10^-1  | 5.4558·10^-1
