Nonlinear Analysis: Hybrid Systems 5 (2011) 242–253

    Contents lists available at ScienceDirect

    Nonlinear Analysis: Hybrid Systems

    journal homepage: www.elsevier.com/locate/nahs

    A recursive identification algorithm for switched linear/affine models

Laurent Bako, Khaled Boukharouba, Eric Duviella, Stéphane Lecoeuche

Univ Lille Nord de France, F-59000 Lille, France

    EMDouai, IA, F-59500 Douai, France

ARTICLE INFO

Article history:
Received 30 November 2009
Accepted 4 May 2010

Keywords:
Switched systems
Piecewise affine systems
System identification
Recursive identification
Open channel systems

ABSTRACT

In this work, a recursive procedure is derived for the identification of switched linear models from input–output data. Starting from some initial values of the parameter vectors that represent the different submodels, the proposed algorithm alternates between data assignment to submodels and parameter update. At each time instant, the discrete state is determined as the index of the submodel that, in terms of the prediction error (or the posterior error), appears to have most likely generated the regressor vector observed at that instant. Given the estimated discrete state, the associated parameter vector is updated based on recursive least squares or any fast adaptive linear identifier. Convergence of the whole procedure, although not theoretically proved, seems to be easily achieved when sufficiently rich data are available. It has also been observed that, by appropriately choosing the data assignment criterion, the proposed on-line method can be extended to the identification of piecewise affine models. Finally, performance is tested through some computer simulations and the modeling of an open channel system.

© 2010 Elsevier Ltd. All rights reserved.

    1. Introduction

Given a mixture of data generated by a set of interacting linear/affine discrete-time submodels, the switched affine regression problem refers to the problem of estimating the parameter vectors (PVs) associated with each of these submodels. This is known to be a challenging identification problem as it involves both data assignment to submodels and parameter estimation. A common weakness of the majority of the existing contributions is their computational complexity. For example, feasibility of the optimization approach reported in [1] is restricted to small size problems due to its considerable cost. The Minimum Partition into Feasible Subsystems (Min-PFS) solution proposed in [2] is a multi-step greedy algorithm that is likely to be computationally demanding. The algebraic–geometric method [3–5] embeds the regression data into a higher-dimensional space whose dimension increases exponentially with respect to the dimensions of the considered switched system. The clustering based identification algorithm [6] constructs a neighboring set for each regressor and so, when the number of training data is large, a computational difficulty may arise. In the Bayesian approach [7], each parameter vector is in principle regarded as a random variable and is therefore replaced in the identification procedure by its probability density function, which in turn is approximated through a particle filtering model. However, this latter approximation comes with a significant increase in computational complexity because fine precision requires the number of particles to be large.

Additionally, most of the published methods for switched systems identification are batch mode algorithms (see also the survey paper [8] and some more recent techniques presented in [9–13]) except the algebraic algorithm derived in [14,15] and further extended in [16]. However, this latter method inherits from its batch version which first appeared in [3],

Corresponding author at: Univ Lille Nord de France, F-59000 Lille, France. Tel.: +33 327 712 127.
E-mail addresses: [email protected] (L. Bako), [email protected] (K. Boukharouba), [email protected] (E. Duviella), [email protected] (S. Lecoeuche).

1751-570X/$ – see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.nahs.2010.05.003


a problem of dimensionality induced by the polynomial embedding. Beyond the elements of complexity that have been observed so far, note that in general, a batch algorithm has a computational cost that depends more than linearly on the number of training data. Moreover, batch methods are not convenient for real-time applications in which the data need to be processed on-line. It is therefore of interest to develop some recursive identification algorithms for hybrid systems.

In this paper, we present a simple recursive approach to the switched affine regression problem. The data are assumed to be sequentially acquired. Then, starting from some initial values, we proceed alternately to data classification and parameter update on-line. At a given time, the discrete state is inferred based on the information available up to that time and the PVs are accordingly refined via, for example, recursive least squares. In comparison with the batch mode methods mentioned above, it is fair to say that our method, though possibly less effective than some of them, makes it easier to handle higher-dimensional data or larger amounts of data. Furthermore, it can be used for on-line identification of hybrid systems which, to the best of our knowledge, has not received much attention yet. In addition to the references [14–16] mentioned above, the only works relevant to the subject of recursive identification for hybrid systems are the ones reported in [17,18]. The former runs a bank of parallel linear identifiers and uses a set of decision rules for updating, creating and removing submodels. The latter reference deals with the problem of recursively separating observations generated by a mixture of Gaussian distributions. In contrast to this work, we consider here a set of observations stemming from a number of interacting linear ARX submodels. Furthermore, we show that with a simple modification of the data assignment criterion, the proposed method is also applicable without much additional cost to the estimation of continuous piecewise affine maps.

The outline of the paper is as follows. The problem statement is given in Section 2. We describe in Section 3 the proposed algorithm for recursive identification of switched systems with arbitrary switches. In Section 4 we study the particular case when the switched model is intended to approximate a continuous nonlinear system. In this scenario the switching mechanism takes a particular form that can also be inferred from data along with the submodels' parameters. Finally, some numerical results are shown and discussed in Section 5.

    2. Problem statement

    We consider a switched linear model defined by

$$y(t) = \theta_{\lambda_t}^\top x(t) + e_{\lambda_t}(t), \tag{1}$$

where $x(t) \in \mathbb{R}^n$ is referred to as the regression vector, $y(t) \in \mathbb{R}$ is designated as the output of the model, $\lambda_t \in S = \{1, \ldots, s\}$ is the discrete state and $\theta_{\lambda_t} \in \mathbb{R}^n$ is the associated parameter vector (PV). The sequences of errors $\{e_j(t)\}$, $j = 1, \ldots, s$, are assumed to be zero-mean Gaussian, i.i.d. random variables. Here, sequences of errors related to two different submodels are not required to have the same variances. When dealing with the input–output behavior of hybrid dynamical systems, the vector $x(t)$ sometimes takes the form

$$x(t) = \begin{bmatrix} y(t-1) & \cdots & y(t-n_a) & u(t-1)^\top & \cdots & u(t-n_b)^\top \end{bmatrix}^\top \tag{2}$$

where $u(t) \in \mathbb{R}^{n_u}$ and $y(t) \in \mathbb{R}$ are respectively the input and output of the considered system, and $n_a$ and $n_b$ are the orders. The model (1) is then designated as a Switched Auto-Regressive eXogenous (SARX) model.
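To make the model (1)–(2) concrete, the following minimal Python sketch builds the regressor and simulates a SARX model. It assumes NumPy, a scalar input ($n_u = 1$), zero initial conditions and zero-based mode indexes; the function names `sarx_regressor` and `simulate_sarx` are hypothetical and do not come from the paper.

```python
import numpy as np

def sarx_regressor(y, u, t, na, nb):
    """Regressor (2) for scalar input/output: [y(t-1..t-na), u(t-1..t-nb)]."""
    past_y = [y[t - k] for k in range(1, na + 1)]
    past_u = [u[t - k] for k in range(1, nb + 1)]
    return np.array(past_y + past_u, dtype=float)

def simulate_sarx(thetas, modes, u, na, nb, noise_std=0.0, seed=0):
    """Simulate model (1): y(t) = theta_{lambda_t}^T x(t) + e(t).

    thetas: list of parameter vectors, modes: zero-based discrete states.
    The first max(na, nb) outputs are left at zero (assumed initial conditions).
    """
    rng = np.random.default_rng(seed)
    N = len(u)
    y = np.zeros(N)
    for t in range(max(na, nb), N):
        x = sarx_regressor(y, u, t, na, nb)
        y[t] = thetas[modes[t]] @ x + noise_std * rng.standard_normal()
    return y

# Two first-order submodels, switching after the second sample.
thetas = [np.array([0.5, 1.0]), np.array([-0.5, 2.0])]
y = simulate_sarx(thetas, [0, 0, 1, 1], np.ones(4), na=1, nb=1)
```

With a vector input, the `past_u` entries would be concatenated rather than collected as scalars.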

Problem. Given observations $\{x(t), y(t)\}_{t=1}^{N}$ generated by a switched linear model of the form (1), with $x(t)$ defined as in (2), we are interested here in estimating the parameter vectors (PVs) $\{\theta_j\}_{j=1}^{s}$ under the assumption that the discrete mode sequence $\{\lambda_t\}_{t=1}^{N}$ is not known.

We start by recalling from Ref. [19] that the problem of inferring a switched model such as (1) from data admits multiple solutions, so that the identification problem is not well posed. If the structural indexes $n_a$ and $n_b$ are not fixed, then we can find, for example, a trivial switched model consisting of one single submodel with sufficiently large orders that fits the whole finite set of measurements. Even if we assign finite and fixed values to $n_a$ and $n_b$, there are still infinitely many switched models that explain the data. For example, it can be verified that there is a switched model with $s = N$ submodels that can reproduce the data. In order to alleviate the identifiability issue, we will assume in this paper that the orders $n_a$ and $n_b$ are finite, equal for all submodels and known a priori. We will also assume (whenever $s$ is unknown) that an upper bound $\bar{s} \ll N$ on the number of submodels is available.

    3. Algorithm description

In this section we present the main contribution of the paper. From a conceptual viewpoint, the approach to be presented below is comparable to the Bayesian learning algorithm of Juloski et al. [7]. The method derived therein was fairly argued by the authors to allow for the incorporation of prior physical knowledge, when available, in the identification procedure. We will see that the same argument also holds, at least to some extent, for the method suggested in the present paper but with much less computational load. In the Bayesian approach, each parameter vector $\theta_j$ is treated as a random variable which is characterized by its probability density function $p_{\theta_j}(\cdot)$. Therefore, instead of estimating directly each $\theta_j$ as a single vector, one tries to estimate the density function $p_{\theta_j}(\cdot)$, which is done by approximating it with a particle filtering model. Based on


the estimated probability functions, the data are sequentially assigned to the competing submodels. From a computational viewpoint, the particle filtering approximation is equivalent to replacing each parameter vector $\theta_j$ with a number $M$ of particles $\{\theta_{j,k}\}_{k=1}^{M}$. This idea seems to be robust to noise and potential model mismatches. However, it comes with an important cost in numerical complexity because, for the particle filtering approximation to be accurate, the number $M$ of particles needs to be large enough.

As an alternative to that method, we derive here a simple recursive algorithm that alternates between discrete state estimation and parameter update via recursive least squares. Unlike the Bayesian procedure, only one parameter vector is updated at each time for each submodel. This results in a much less computationally expensive but efficient identification scheme.

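Given the number $s$ of submodels,¹ we can pose the switched system identification problem as the problem of finding a set $\{\theta_j\}_{j=1}^{s} \subset \mathbb{R}^n$ of parameter vectors and an associated switching sequence $\{\lambda_t\}_{t=1}^{N} \subset S$ such that the cost function

$$J(\theta_1, \ldots, \theta_s; \lambda_1, \ldots, \lambda_N) = \sum_{t=1}^{N} \left( y(t) - \theta_{\lambda_t}^\top x(t) \right)^2 \tag{3}$$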

is minimized. Alternatively we can eliminate $\lambda_t$ in the cost function by considering instead

$$J(\theta_1, \ldots, \theta_s) = \sum_{t=1}^{N} \min_{i=1,\ldots,s} \left( y(t) - \theta_i^\top x(t) \right)^2 \tag{4}$$

which is nonconvex in general. Finding directly an optimal solution to this problem may involve searching exhaustively over the set $S^N = S \times \cdots \times S$ of all possible discrete state paths (there are $s^N$ of them) for an ideal path for which there would be a set of corresponding parameter vectors that fits the data. However, this is a hard combinatorial problem.
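For given candidate parameter vectors, the cost (4) is cheap to evaluate; the difficulty lies in minimizing it. The sketch below, assuming NumPy and a regressor matrix `X` whose rows are the $x(t)^\top$, evaluates (4) and returns the minimizing mode at each time. The function name `switched_cost` is hypothetical.

```python
import numpy as np

def switched_cost(thetas, X, y):
    """Evaluate cost (4) and the per-sample minimizing mode (zero-based)."""
    Theta = np.asarray(thetas, dtype=float)   # shape (s, n)
    residuals = y[None, :] - Theta @ X.T      # shape (s, N)
    sq = residuals ** 2
    modes = np.argmin(sq, axis=0)             # best submodel for each sample
    return float(sq.min(axis=0).sum()), modes

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
J, modes = switched_cost([[1.0, 0.0], [0.0, 2.0]], X, y)
```

In this toy case the first two samples are fit exactly by one submodel each, and the third leaves a residual of 1 under its best submodel.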

To avoid this difficulty, we propose in this paper a suboptimal recursive procedure. To proceed, we start by introducing some notation. Wherever used in the paper, the hat symbol placed above a variable denotes the estimate of that variable. For any $j = 1, \ldots, s$, we denote by $\hat{\theta}_j(t)$ the estimate of $\theta_j$ obtained by recursive least squares and based on the observations $\{x(k), y(k)\}_{k=1}^{t}$ available up to time $t$. Associated with $\hat{\theta}_j(t)$ is the matrix $L_j(t) \in \mathbb{R}^{n \times n}$ whose inverse is the correlation matrix of the regressors generated by mode $j$. It is recursively defined as follows

$$\begin{aligned} L_j^{-1}(t \mid \lambda_t = j) &= \gamma L_j^{-1}(t-1) + x(t)x(t)^\top \\ L_j^{-1}(t \mid \lambda_t \neq j) &= L_j^{-1}(t-1), \end{aligned}$$

where $\gamma \in\, ]0, 1[$ is a forgetting factor. Here, notation such as $L_j^{-1}(t \mid \lambda_t = j)$ is used to mean the conditional value of $L_j^{-1}(t)$ given the knowledge that $\lambda_t = j$. Note that prior knowledge about the values of the parameters, when available, can be incorporated in the choice of the initial values of the vector $\hat{\theta}_j(0)$ and the matrix $L_j^{-1}(0)$ (correlation matrix of the regressors generated by submodel $j$).

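If the pair of data $(x(t), y(t))$ is generated by the submodel indexed by $j$, i.e., if $\lambda_t = j$, then $\hat{\theta}_j$ and $L_j$ are updated by, e.g., recursive least squares as follows

$$\begin{aligned} \hat{\theta}_j(t \mid \lambda_t = j) &= \hat{\theta}_j(t-1) + q_j(t)\,\varepsilon_j(t) \\ L_j(t \mid \lambda_t = j) &= \frac{1}{\gamma} L_j(t-1) - \frac{1}{\gamma}\, q_j(t)\, x(t)^\top L_j(t-1), \end{aligned} \tag{5}$$

where

$$\begin{aligned} \varepsilon_j(t) &= y(t) - \hat{\theta}_j(t-1)^\top x(t) \\ q_j(t) &= \frac{L_j(t-1)\,x(t)}{\gamma + x(t)^\top L_j(t-1)\,x(t)}. \end{aligned} \tag{6}$$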

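But if the discrete state $\lambda_t$ is known to be different from an index $j$, then no updating is done for $\hat{\theta}_j$ and $L_j$,

$$\begin{aligned} \hat{\theta}_j(t \mid \lambda_t \neq j) &= \hat{\theta}_j(t-1) \\ L_j(t \mid \lambda_t \neq j) &= L_j(t-1). \end{aligned} \tag{7}$$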
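As an illustration of the update rules (5)–(6), here is a minimal NumPy sketch of one recursive least squares step with a forgetting factor. The function name `rls_update` and the default `gamma=0.9` are assumptions made for the example; the no-update case (7) simply amounts to leaving `theta` and `L` untouched.

```python
import numpy as np

def rls_update(theta, L, x, y, gamma=0.9):
    """One RLS step with forgetting factor gamma, following (5)-(6)."""
    eps = y - theta @ x                        # prior error, Eq. (6)
    q = L @ x / (gamma + x @ L @ x)            # gain vector, Eq. (6)
    theta_new = theta + q * eps                # parameter update, Eq. (5)
    L_new = (L - np.outer(q, x) @ L) / gamma   # matrix update, Eq. (5)
    return theta_new, L_new

theta0, L0 = np.zeros(2), 2.0 * np.eye(2)
x, gamma = np.array([1.0, 2.0]), 0.9
theta1, L1 = rls_update(theta0, L0, x, 3.0, gamma)
```

By the matrix inversion lemma, the updated `L1` satisfies the inverse recursion $L_j^{-1}(t) = \gamma L_j^{-1}(t-1) + x(t)x(t)^\top$, which gives a convenient numerical check of an implementation.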

Now since $\lambda_t$ is unknown, we need to replace it by its estimate in the previous equations. The discrete state is therefore estimated as the index of the submodel that has most presumably generated the pair of data $(x(t), y(t))$ in the sense of a certain decision criterion $D_j(t)$, that is,

$$\hat{\lambda}_t = \arg\min_{j=1,\ldots,s} D_j(t). \tag{8}$$

¹ We will see later that it is enough to know an upper bound $\bar{s} \geq s$ on the number of submodels.


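It remains to choose the decision criterion $D_j(t)$. This can in fact be done in many different ways. Since no model has been specified for the discrete state dynamics in the SARX model (1), $D_j(t)$ is taken here as a function of the prior fitting error $y(t) - x(t)^\top \hat{\theta}_j(t-1)$ or the posterior error $y(t) - x(t)^\top \hat{\theta}_j(t \mid \lambda_t = j)$. In this way, two decision functions are studied here.

The prior error based function

$$D_j^{\mathrm{prior}}(t) = \frac{\left| \bar{x}(t)^\top \hat{\bar{\theta}}_j(t-1) \right|}{\left\| \hat{\bar{\theta}}_j(t-1) \right\|_2} = \frac{|\varepsilon_j(t)|}{\left\| \hat{\bar{\theta}}_j(t-1) \right\|_2} \tag{9}$$

with $\hat{\bar{\theta}}_j(t) = \begin{bmatrix} 1 & -\hat{\theta}_j(t)^\top \end{bmatrix}^\top$ and $\bar{x}(t) = \begin{bmatrix} y(t) & x(t)^\top \end{bmatrix}^\top$.

The posterior error based function

$$D_j^{\mathrm{post}}(t) = \frac{\left| \bar{x}(t)^\top \hat{\bar{\theta}}_j(t \mid \lambda_t = j) \right|}{\left\| \hat{\bar{\theta}}_j(t \mid \lambda_t = j) \right\|_2} = \frac{\gamma\, |\varepsilon_j(t)| / \eta_j(t)}{\sqrt{1 + \left\| \hat{\theta}_j(t-1) + \varepsilon_j(t)\, q_j(t) \right\|_2^2}} \tag{10}$$

with $\eta_j(t)$ defined by $\eta_j(t) = \gamma + x(t)^\top L_j(t-1)\, x(t)$.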
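The closed form in (10) follows from (5)–(6): the posterior error is $y(t) - x(t)^\top\hat{\theta}_j(t \mid \lambda_t = j) = \varepsilon_j(t)\,(1 - x(t)^\top q_j(t)) = \gamma\,\varepsilon_j(t)/\eta_j(t)$. The two decision functions can be sketched as below, assuming NumPy; the function names `d_prior` and `d_post` are hypothetical.

```python
import numpy as np

def d_prior(theta, x, y):
    """Prior error criterion (9): |eps| / ||[1, -theta]||_2."""
    eps = y - theta @ x
    return abs(eps) / np.sqrt(1.0 + theta @ theta)

def d_post(theta, L, x, y, gamma=0.9):
    """Posterior error criterion (10), using the closed form gamma*|eps|/eta."""
    eps = y - theta @ x
    eta = gamma + x @ L @ x
    q = L @ x / eta
    theta_post = theta + eps * q     # estimate if the sample were assigned to j
    return (gamma * abs(eps) / eta) / np.sqrt(1.0 + theta_post @ theta_post)

theta = np.array([0.3, -0.2])
L = np.array([[2.0, 0.1], [0.1, 1.0]])
x, y, gamma = np.array([1.0, -1.0]), 0.7, 0.9
```

Evaluating `d_post` does not modify the stored estimate, so the criterion can be computed for every submodel before committing to one update.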

The criteria (9) and (10) can be interpreted as the distances from $\bar{x}(t)$ to the linear hyperplanes that are normal to the vectors $\hat{\bar{\theta}}_j(t-1)$ and $\hat{\bar{\theta}}_j(t \mid \lambda_t = j)$ respectively. While the criterion (9) depends only on the prior estimate $\hat{\theta}_j(t-1)$, the decision function (10) also includes the information storage matrix $L_j(t-1)$, which has an influence on the future evolution of $\hat{\theta}_j(t-1)$.

Note that the concept of the method is not restricted to recursive least squares (RLS) as an identifier. In fact, any linear and fast recursive algorithm can be used for the estimation of the parameters. For example, we can use algorithms such as the equation error identifier [20], the Projection algorithm, or the Kalman filter based identifier [21]. It might be advisable to endow those routines with forgetting capabilities so that the effects of wrong prior decisions concerning the discrete mode can be quickly removed.

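Before summarizing the identification algorithm, we first need to make the following two assumptions on the true PVs and the generated data.

Assumption 1. For any time index $t$, and for any couple $(i, j) \in S^2$ of mode indexes,

$$\frac{\left| \bar{\theta}_i^\top \bar{x}(t) \right|}{\| \bar{\theta}_i \|_2} = \frac{\left| \bar{\theta}_j^\top \bar{x}(t) \right|}{\| \bar{\theta}_j \|_2} \;\Rightarrow\; i = j, \tag{11}$$

where $\bar{\theta}_j = \begin{bmatrix} 1 & -\theta_j^\top \end{bmatrix}^\top$.

Assumption 2. For any $i \in S$, the errors $\left\{ \bar{\theta}_i^\top \bar{x}(t) : t \in I_i \right\}$, with

$$I_i = \left\{ t : i = \arg\min_{j=1,\ldots,s} \frac{\left| \bar{\theta}_j^\top \bar{x}(t) \right|}{\| \bar{\theta}_j \|_2} \right\} \tag{12}$$

form a realization of the stochastic process $\{e_i(t) : t \in I_i\}$.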

Assumption 1 states that any data vector $\bar{x}(t)$ can be fit by only one linear submodel. The purpose of this assumption is to remove any ambiguity in the inference of the discrete state if the true parameters were exactly known. Strictly speaking, this assumption does not need to hold for the algorithm to behave correctly: if a vector $\bar{x}(t)$ happens to match more than one submodel, then it can be assigned arbitrarily to one of those submodels or to all of them. The assumption is therefore made here just to set a clear basis for the upcoming discussions. To some extent, Assumption 2 is intended to make sure that if the estimates $\hat{\theta}_j$, $j = 1, \ldots, s$, converge (in some sense) to the true values, then the discrete state will, despite the presence of noise, remain well estimated after the convergence. In effect, this latter assumption implies that wrong data assignments do not disrupt the convergence more than the noise related to each submodel would do if there were no mode assignment task to perform (i.e., if the complete mode sequence were known).


Algorithm 1 Recursive identification of SARX models.

Initialization: Set $L_j(0) = p_0 I$, with $p_0 \gg 1$, and draw different values for $\hat{\theta}_j(0)$, $j = 1, \ldots, s$, at random. Some prior knowledge, when available, can also be used in this initialization step.

For any incoming pair $(x(t), y(t))$, do
  Estimate the discrete state $\hat{\lambda}_t$ as in (8) and update $\hat{\theta}_{\hat{\lambda}_t}(t)$ and $L_{\hat{\lambda}_t}(t)$ using Eq. (5).
  The other parameter vectors are maintained unchanged: $\forall j \neq \hat{\lambda}_t$, $\left( \hat{\theta}_j(t), L_j(t) \right) = \left( \hat{\theta}_j(t-1), L_j(t-1) \right)$.
EndFor
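The loop of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `identify_sarx`, the defaults `gamma=0.9` and `p0=1e3`, the Gaussian random initialization and the zero-based mode indexes are all choices made for the example.

```python
import numpy as np

def identify_sarx(X, y, s, gamma=0.9, p0=1e3, criterion="prior", seed=0):
    """Recursive SARX identification sketch (rows of X are regressors x(t))."""
    rng = np.random.default_rng(seed)
    N, n = X.shape
    thetas = rng.standard_normal((s, n))            # random initial PVs
    Ls = np.stack([p0 * np.eye(n) for _ in range(s)])
    modes = np.zeros(N, dtype=int)
    for t in range(N):
        x = X[t]
        eps = y[t] - thetas @ x                     # prior errors, one per submodel
        eta = gamma + np.einsum("i,jik,k->j", x, Ls, x)
        q = (Ls @ x) / eta[:, None]                 # gain vectors, Eq. (6)
        if criterion == "prior":                    # Eq. (9)
            D = np.abs(eps) / np.sqrt(1.0 + np.sum(thetas ** 2, axis=1))
        else:                                       # Eq. (10)
            post = thetas + eps[:, None] * q
            D = (gamma * np.abs(eps) / eta) / np.sqrt(1.0 + np.sum(post ** 2, axis=1))
        j = int(np.argmin(D))                       # Eq. (8)
        modes[t] = j
        thetas[j] = thetas[j] + q[j] * eps[j]       # Eq. (5), winner only
        Ls[j] = (Ls[j] - np.outer(q[j], x) @ Ls[j]) / gamma
    return thetas, modes

# Sanity check with a single submodel and noise-free data: plain RLS.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
theta_true = np.array([0.5, -1.0])
thetas, modes = identify_sarx(X, X @ theta_true, s=1)
```

With `s=1` the procedure reduces to ordinary RLS, which is a convenient way to test an implementation before trying several competing submodels, where convergence depends on the data and the initialization.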

Remark 1. In Algorithm 1, the true number $s$ of submodels need not be known. Given an upper bound $\bar{s} \ll N$ on the number $s$ of submodels, the algorithm is able to operate with $\bar{s}$ initially presumed submodels and determine $s$ in the end, as the number of modes to which a significant number of data have been assigned. After the algorithm converges, the $\bar{s} - s$ unnecessary submodels should stop being assigned data so that, in the end, only a negligible amount of data has been assigned to those submodels. They can therefore be recognized and eliminated.
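One possible way to carry out the elimination described in Remark 1 is to count the assignments per submodel and keep those above a threshold. The sketch below assumes NumPy; the function name `prune_submodels`, the 5% threshold and the optional `tail` window are hypothetical choices, since the remark does not specify what counts as negligible.

```python
import numpy as np

def prune_submodels(thetas, modes, min_fraction=0.05, tail=None):
    """Keep submodels that received at least min_fraction of the assignments.

    tail: if given, only the last `tail` assignments are counted, so that
    early (pre-convergence) decisions are ignored.
    """
    modes = np.asarray(modes)
    if tail is not None:
        modes = modes[-tail:]
    counts = np.bincount(modes, minlength=len(thetas))
    keep = np.flatnonzero(counts >= min_fraction * len(modes))
    return np.asarray(thetas)[keep], keep

thetas = np.array([[1.0, 0.0], [9.0, 9.0], [0.0, 2.0]])
modes = [0] * 50 + [2] * 49 + [1] * 1
kept, idx = prune_submodels(thetas, modes)
```

Here the second submodel received a single sample out of 100 and is discarded.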

By following the identification procedure described above, it is obvious that the quality of the data assignment depends heavily on the number of competing modes and on how distinguishable they are from each other. Recall that at the beginning, neither the parameters nor the discrete state sequence are known. The tasks of mode estimation and parameter identification are therefore so strongly coupled that initialization may play an important role in the convergence. However, since the estimation of the discrete state results from a comparison between some quantities (involving the estimates of the PVs), for the discrete state to be correctly reconstructed, it is not necessary that the estimates $\hat{\theta}_i(t)$, $i = 1, \ldots, \bar{s}$, exactly lie in the true set $\{\theta_1, \ldots, \theta_s\}$ of PVs. To see this, assume for example that $\bar{s} = s$ (for simplicity) and that there is a constant permutation $\pi$ of $S$ such that for all $i \in S$, and for any $t$ above a certain finite integer $T$, it holds that $D_i(t) < D_j(t)$, $j = 1, \ldots, s$, $j \neq i$, whenever $\lambda_t = \pi(i)$. In more detail, assume that for any $t \geq T$ and for any $i \in S$, it holds that

$$\begin{aligned} \bar{x}(t)^\top K_{\pi(i)k}\, \bar{x}(t) &< 0 \quad \forall k \neq \pi(i) \\ \bar{x}(t)^\top \hat{K}_{ij}(t)\, \bar{x}(t) &< 0 \quad \forall j \neq i \end{aligned} \tag{13}$$

with $K_{ij} = a_i a_i^\top - a_j a_j^\top$, $a_i = \bar{\theta}_i / \| \bar{\theta}_i \|_2 \in \mathbb{R}^{n+1}$; $\hat{K}_{ij}(t)$ is defined in a similar way from the estimates $\hat{\theta}_i(t)$ and $\hat{\theta}_j(t)$. Then, by using the decision function (9) or (10), the discrete state will be correctly recovered for $t > T$. By exact recovery of the discrete state we mean that for any $i \in S$ and any $t > T$, we will get $\hat{\lambda}_t = i$ whenever $\lambda_t = \pi(i)$. Moreover, by the standard analysis of the RLS identifier [21], the estimated PVs $\hat{\theta}_1(t), \ldots, \hat{\theta}_s(t)$ are guaranteed to converge toward the true set of PVs as $t \to \infty$ provided the regressors related to each discrete state obey a certain persistence of excitation condition (see e.g., [21]). Unfortunately, a convergence condition such as (13) has a complicated nature that does not lend itself to an easy analysis. Establishing more clearly how such a condition is related to the data and to the switched linear model that generates them is a matter that calls for further investigation.

    4. Application to nonlinear system modeling

In this section we shall approximate the input–output behavior of a smooth nonlinear dynamical system with the switched linear model described previously (this is in fact the main purpose of PieceWise Affine (PWA) models [22]). In particular, we shall provide a simple way of estimating the switching law along with the parameters. The considered nonlinear system is represented by a NARX model of the form

$$y(t) = f(x(t)) + \xi(t), \tag{14}$$

where $x(t)$ is defined as in (2), and $f$ is a continuous nonlinear function defined on an open set $\mathcal{X} \subset \mathbb{R}^n$, with $n = n_a + n_u n_b$. In (14), the term $\xi(t)$ denotes the modeling error, which is supposed to have a zero-mean noise structure.

It is intuitive that trying to capture the global behavior of a nonlinear system such as (14) in a single mathematical model is likely to result in a highly complicated model structure. As a consequence, such a model may be difficult to exploit in practice. An alternative is to try to model the nonlinear system as an interaction between a certain number of linear/affine submodels, each of which is related to a region of the system operating space. The main advantage in doing so is that linear models are simple and there exists an abundant theory for controlling and analysing them.

Our strategy amounts to making a linearization around some nominal points by taking the first order term in the Taylor series expansion of the nonlinear function $f$. Assuming that $f \in C^1$, the first order Taylor series expansion of $f$ is given as follows

$$f(x) = f(c_j) + (x - c_j)^\top (\nabla f)_{x=c_j} + \vartheta(c_j, x), \tag{15}$$


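where $\vartheta(c_j, x)$ accounts for the higher order terms and satisfies $\lim_{x \to c_j} \vartheta(c_j, x) = 0$. When $x$ lies in a neighboring ball $B(c_j, \delta_j) = \{ x \in \mathcal{X} : \| x - c_j \| \leq \delta_j \}$ of $c_j$ (with $\delta_j > 0$ small), the nonlinear term $\vartheta(c_j, x)$ is considered negligible so that $f$ can be approximated by the affine part of (15). Following this idea, $\mathcal{X}$ can be decomposed into a sufficiently large number $s$ of regions $B(c_j, \delta_j)$, $j = 1, \ldots, s$, on which the affine approximation holds. Then, by denoting

$$\theta_j = \begin{bmatrix} (\nabla f)_{x=c_j} \\ f(c_j) - c_j^\top (\nabla f)_{x=c_j} \end{bmatrix}, \tag{16}$$

the function $\tilde{f}$ defined by

$$\tilde{f}(x) = \theta_j^\top \begin{bmatrix} x \\ 1 \end{bmatrix} \quad \text{if } x \in B(c_j, \delta_j) \tag{17}$$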

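approaches $f$. As such, the function $\tilde{f}$ is not well defined since the set of balls $B(c_j, \delta_j)$, $j = 1, \ldots, s$, is not necessarily a disjoint cover of $\mathcal{X}$. To overcome this problem we redefine $\delta_j$ as a function of $x$ in the form $\delta_j(x) = \min_{i \neq j} \| x - c_i \|$ so that the Euclidean ball $B(c_j, \delta_j)$ becomes

$$\mathcal{X}_j = \left\{ x \in \mathcal{X} : \forall i = 1, \ldots, s, \ \| x - c_j \|_2 \leq \| x - c_i \|_2 \right\}. \tag{18}$$

With this rearrangement, we have $\bigcup_{j=1}^{s} \mathcal{X}_j = \mathcal{X}$ and $\mathcal{X}_i^{o} \cap \mathcal{X}_j^{o} = \emptyset$ for all $i \neq j$, where $\mathcal{X}_i^{o}$ refers to the interior of $\mathcal{X}_i$. One can easily recognize that $\mathcal{X}_j$ corresponds indeed to a polyhedron (which is more common in the definition of PWA models [6]) of the form

$$\mathcal{X}_j = \left\{ x \in \mathcal{X} : H_j x \leq b_j \right\}, \tag{19}$$

where

$$\begin{aligned} H_j &= \begin{bmatrix} c_1 - c_j & \cdots & c_{j-1} - c_j & c_{j+1} - c_j & \cdots & c_s - c_j \end{bmatrix}^\top \\ b_j &= \begin{bmatrix} \beta_{1,j} & \cdots & \beta_{j-1,j} & \beta_{j+1,j} & \cdots & \beta_{s,j} \end{bmatrix}^\top, \end{aligned} \tag{20}$$

with the $\beta_{i,j}$ being defined as $\beta_{i,j} = \frac{1}{2} \left( c_i^\top c_i - c_j^\top c_j \right)$. Then $\tilde{f}$ is modified to be

$$\tilde{f}(x) = \theta_j^\top \begin{bmatrix} x \\ 1 \end{bmatrix} \quad \text{if } x \in \mathcal{X}_j. \tag{21}$$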
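The construction (16)–(21) can be illustrated numerically. The sketch below assumes NumPy and a known gradient of $f$ (in the identification setting the $\theta_j$ and $c_j$ are estimated from data instead); the function names `affine_pv`, `voronoi_halfspaces`, `region_index` and `pwa_eval` are hypothetical. It relies on the fact that $\|x - c_j\|_2 \leq \|x - c_i\|_2$ is equivalent to $(c_i - c_j)^\top x \leq \beta_{i,j}$.

```python
import numpy as np

def affine_pv(f, grad_f, c):
    """Parameter vector (16): [grad f(c); f(c) - c^T grad f(c)]."""
    g = np.asarray(grad_f(c), dtype=float)
    return np.append(g, f(c) - c @ g)

def voronoi_halfspaces(centers, j):
    """H_j and b_j of (20), one row per center i != j."""
    C = np.asarray(centers, dtype=float)
    others = [i for i in range(len(C)) if i != j]
    H = C[others] - C[j]
    b = 0.5 * (np.sum(C[others] ** 2, axis=1) - C[j] @ C[j])
    return H, b

def region_index(centers, x):
    """Index of the region (18) containing x; ties go to the smallest index."""
    C = np.asarray(centers, dtype=float)
    return int(np.argmin(np.sum((C - x) ** 2, axis=1)))

def pwa_eval(thetas, centers, x):
    """PWA approximation (21): theta_j^T [x; 1] with j the region of x."""
    j = region_index(centers, x)
    return thetas[j] @ np.append(x, 1.0)

f = lambda x: x[0] ** 2 + 3.0 * x[1]
grad = lambda x: np.array([2.0 * x[0], 3.0])
centers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
thetas = [affine_pv(f, grad, c) for c in centers]
```

`np.argmin` returns the first minimizer, which matches the convention of assigning boundary points to the region with the smallest index.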

Thus, Eq. (21) represents a PWA approximation of the continuous nonlinear system defined in (14). Note that in the definition (21), ambiguity may still occur on the boundaries of the sets $\mathcal{X}_j$. This issue can be avoided by arbitrarily assigning the points that lie on the common boundaries of any two different sets $\mathcal{X}_i$ and $\mathcal{X}_j$ to the one with the smallest index. For now, continuity of the function $f$ in (14) ensures that no numerical consequence is to be feared provided there is a sufficient number of operating points $c_j$. However, if we want the approximation (21) to also cover the case of a discontinuous input–output map $f$, it will be necessary to further revise the definition of the regions $\mathcal{X}_j$ that encode the switching law. To see why, recall that model (21) roughly encodes the fact that if an $x$ in the regression space is close enough to the point $c_j$, then $f(x)$ should be close to $f(c_j)$, so that the affine tangent to the hypersurface $y = f(x)$ at $c_j$ is considered to be a good approximation of the differentiable map $f$. This does not hold when $f$ is subject to discontinuities. In order to take this kind of situation into account, we may heuristically change the sets $\mathcal{X}_j$ by incorporating in them information about the output $y$. We may hence replace the sets $\mathcal{X}_j$ in (21) with new switching sets defined as

$$\mathcal{X}_j = \left\{ x \in \mathcal{X} : \forall i = 1, \ldots, s, \ \left\| \begin{bmatrix} \theta_j^\top \tilde{x} \\ x \end{bmatrix} - \begin{bmatrix} \theta_j^\top \tilde{c}_j \\ c_j \end{bmatrix} \right\|_2 \leq \left\| \begin{bmatrix} \theta_i^\top \tilde{x} \\ x \end{bmatrix} - \begin{bmatrix} \theta_i^\top \tilde{c}_i \\ c_i \end{bmatrix} \right\|_2 \right\}, \tag{22}$$

with $\tilde{x} = \begin{bmatrix} x^\top & 1 \end{bmatrix}^\top$ and $\tilde{c}_i$ being defined likewise.

Now, we use the identification algorithm presented in Section 3 to identify the PWA model (21). This includes estimating the parameter vectors $\theta_j$ and the points $c_j$, $j = 1, \ldots, s$, and determining afterwards the guardline parameters $H_j$ and $b_j$ through Eq. (20). While the PVs $\theta_j$ are still updated with Eq. (5), we need to estimate $\lambda_t$ slightly differently. This is because, in contrast to the switched model (1), the PWA model imposes the switching mechanism. The following

$$\hat{\lambda}_t = \arg\min_{j=1,\ldots,s} \left[ D_j^{\mathrm{prior}}(t) + \rho\, \| x(t) - \hat{c}_j(t-1) \|_2 \right], \tag{23}$$

with $\rho > 0$ a regularization parameter, enables us to reflect this constraint in the process of deciding the discrete state. For any $j$, the point $c_j$ is recursively estimated as the mean of the regressor vectors $x(t)$ that are, or are presumed (based on the decision function) to be, relevant to mode $j$. Together with the parameters $\theta_j$ in (5), the operating points $c_j$ are updated at each estimation step as

$$\hat{c}_j(t \mid \hat{\lambda}_t = j) = \gamma\, \hat{c}_j(t-1) + (1 - \gamma)\, x(t). \tag{24}$$

Thus, the above formulation of the PWA regression problem allows us to estimate the PVs along with the switching sets in a simple, one-step and recursive manner. This feature of our method is important since the problem of estimating the switching sets is traditionally handled via batch support vector machine classification techniques.
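One recursion of the PWA variant can be sketched as follows, under several labeled assumptions: NumPy is used; each submodel is affine, so its parameter vector acts on the extended regressor $[x^\top\ 1]^\top$; the weight on the previous center is taken to be the same forgetting factor `gamma` as in the parameter update; and the function name `pwa_step` and the defaults `gamma=0.95`, `rho=0.1` are hypothetical.

```python
import numpy as np

def pwa_step(thetas, Ls, cs, x, y, gamma=0.95, rho=0.1):
    """One recursion: mode decision as in (23), then updates as in (5) and (24).

    thetas: (s, n+1) affine PVs, Ls: (s, n+1, n+1), cs: (s, n) centers.
    All three arrays are modified in place; the selected mode is returned.
    """
    xt = np.append(x, 1.0)                               # extended regressor [x; 1]
    eps = y - thetas @ xt                                # prior errors
    d_prior = np.abs(eps) / np.sqrt(1.0 + np.sum(thetas ** 2, axis=1))
    D = d_prior + rho * np.linalg.norm(x - cs, axis=1)   # regularized criterion
    j = int(np.argmin(D))
    q = Ls[j] @ xt / (gamma + xt @ Ls[j] @ xt)
    thetas[j] += q * eps[j]                              # RLS update of the winner
    Ls[j] = (Ls[j] - np.outer(q, xt) @ Ls[j]) / gamma
    cs[j] = gamma * cs[j] + (1.0 - gamma) * x            # center update
    return j

thetas = np.zeros((2, 2))
Ls = np.stack([100.0 * np.eye(2), 100.0 * np.eye(2)])
cs = np.array([[-1.0], [1.0]])
j = pwa_step(thetas, Ls, cs, np.array([0.9]), 2.0, gamma=0.95, rho=1.0)
```

In this toy call both submodels have the same prior error, so the distance term decides: the sample at $x = 0.9$ goes to the submodel centered at $1$, whose center then moves slightly toward the sample.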


    5. Applications

    5.1. Discussion on the implementation

The method described in this paper is relatively easy to implement. We informally discuss here, on the one hand, the role of the user-specified parameters such as $\gamma$ and $\rho$ and, on the other hand, the influence of the initialization. When dealing with SARX models of the form (1), only the forgetting factor $\gamma \in\, ]0, 1[$ needs to be tuned by the user. This parameter plays an obvious role in the convergence speed and in the smoothing of the estimates in highly noisy conditions. Relatively small forgetting factors (e.g., $0.7 < \gamma < 0.9$) tend to accelerate the convergence of the algorithm by quickly removing the effects of the initialization. But setting the forgetting factor close to one has the virtue of limiting the fluctuations of the estimated PVs when the noise level is high.

In the case of piecewise affine models, an additional design parameter $\rho$ (see Eq. (23)) needs to be selected by the user. This regularization parameter is meant to constrain close regressors (in the sense of the Euclidean distance) to belong to the same affine submodel. Using the $\{\hat{c}_j\}_{j=1}^{s}$ in addition to the $\{\hat{\theta}_j\}_{j=1}^{s}$ in deciding the discrete state may however heighten the influence of the initialization.

We now turn to the initialization step, i.e., the way of choosing the initial values of the matrices $L_j(0)$, the PVs $\hat{\theta}_j(0)$, and the operating points $\hat{c}_j(0)$, $j = 1, \ldots, s$. As can be expected, this choice may have an impact on the convergence of the proposed hybrid identifier. When some prior knowledge is available about the modes, it can be used in the initialization. In the absence of such knowledge, the simplest way to choose the initial values $\hat{\theta}_j(0)$ and $\hat{c}_j(0)$ is to draw them at random, which has been done in all the experiments carried out in this paper. The matrices $L_j(0)$ can be taken simply in the form $L_j(0) = p_0 I_n$, with $p_0 \gg 1$. It turns out that when the submodels are sufficiently excited within the identification data, the algorithm seems to converge almost always, at least in the numerical experiments described below.

    5.2. Numerical example

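We first apply the proposed identification algorithm to a SISO SARX model composed of three linear submodels of order two. The SARX model is defined by

$$y(t) = \theta_{\lambda_t}^\top \begin{bmatrix} y(t-1) & y(t-2) & u(t-1) & u(t-2) \end{bmatrix}^\top + e(t) \tag{25}$$

with $\lambda_t \in \{1, 2, 3\}$ and

$$\begin{aligned} \theta_1 &= \begin{bmatrix} 0.0322 & 0.8017 & 1.2878 & 1.1252 \end{bmatrix}^\top, \\ \theta_2 &= \begin{bmatrix} 0.1921 & 0.5917 & 1.1050 & 0.0316 \end{bmatrix}^\top, \\ \theta_3 &= \begin{bmatrix} 1.4746 & 0.5286 & 0.4055 & 0.2547 \end{bmatrix}^\top. \end{aligned} \tag{26}$$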

In order to generate the identification data, the excitation input is chosen to be a zero-mean, normally distributed signal with unit variance. The noise $e(t)$ is a white Gaussian noise chosen so that the Signal to Noise Ratio (SNR) is equal to 25 dB with respect to the output signal. The switching signal is a piecewise constant function of time that switches periodically (every 5 samples) from its current value to a target value picked randomly in $\{1, \ldots, s\}$. With 1000 different and independent realizations of the input and the noise, we generate 1000 data sequences of length 2000 each. Then, starting from a random initialization, identification is performed on each data sequence with a forgetting factor $\gamma = 0.9$. At each run of the algorithm, the first 1000 points are used for identification and the estimated model is validated on the whole sequence of length 2000. We borrow from [23] the criterion

$$\mathrm{FIT} = \left( 1 - \frac{\| \hat{y} - y \|_2}{\| y - \bar{y}\, 1_N \|_2} \right) \times 100\% \tag{27}$$

to measure the fitting error between the true output sequence $y$ and the estimated model output sequence $\hat{y}$. In this formula, $\bar{y}$ is the mean of the true output sequence and $1_N$ is a vector of length $N$ with all entries equal to one. Here, $\hat{y}$ is reconstructed by simulating the estimated switched model. The discrete state is estimated (over the validation data) from the model and the true output as follows

$$\hat{\lambda}_t = \arg\min_{j=1,\ldots,s} \frac{\left| y(t) - \hat{\theta}_j^\top \hat{x}(t) \right|}{\sqrt{1 + \| \hat{\theta}_j \|_2^2}} \tag{28}$$

with $\hat{x}(t) = \begin{bmatrix} \hat{y}(t-1) & \hat{y}(t-2) & u(t-1) & u(t-2) \end{bmatrix}^\top$, the first two values of $\hat{y}$ being drawn at random. We represent in Fig. 1(a) the distribution (in terms of histograms) of the above defined fitting error over the 1000 independent runs of the algorithm. It can be seen that for all these runs, the performance rate FIT lies between 70% and 95% with an average of 88%. This indicates that the algorithm performs very well despite the significant amount of noise. Convergence occurs for


    Fig. 1. Distribution of the criterion FIT over 1000 independent runs of the identification algorithm (horizontal axis: FIT; vertical axis: number of runs). (a) Two passes (off-line). (b) One pass (on-line).

    Fig. 2. Average of the minimum value $\frac{1}{\|x(t)\|_2}\min_j [D_j^{\mathrm{post}}(t)]$ of the decision criterion (10) over 1000 independent runs of the identification algorithm (horizontal axis: samples; vertical axis: decision criterion). Recall that the identification dataset contains only 1000 samples. This plot (over 2000 samples) is a concatenation of the result of the first pass and that of the second pass.

    almost all the 1000 runs. In fact, the results of Fig. 1(a) are obtained by letting the algorithm pass twice over the identification data sequence. With only one pass, the results get slightly worse (see Fig. 1(b)), as a small number of runs do not seem to have converged to the true PVs. Nevertheless, the average FIT is still 87%. This shows the good tendency of the algorithm to find the true PVs, provided it is run over enough data or repeated sufficiently many times when dealing with a small dataset. It is worth recalling that for all the results presented in the paper, the initial estimates of the PVs have been drawn randomly.
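    The validation step described above can be illustrated with a short sketch of the FIT criterion (27) and of the mode-assignment rule (28). The function names `fit_percent` and `assign_mode` are hypothetical, modes are indexed from 0 rather than 1, and the square-root normalization in the denominator of (28) is an assumption (the Euclidean norm of the vector [1, θ_j]); this is an illustration, not the authors' implementation.

```python
import numpy as np

def fit_percent(y_true, y_model):
    """FIT criterion (27): 100 * (1 - ||y_model - y_true|| / ||y_true - mean(y_true)||)."""
    y_true = np.asarray(y_true, dtype=float)
    y_model = np.asarray(y_model, dtype=float)
    residual = np.linalg.norm(y_model - y_true)
    spread = np.linalg.norm(y_true - y_true.mean())
    return 100.0 * (1.0 - residual / spread)

def assign_mode(y_t, x_t, thetas):
    """Rule (28): 0-based index of the submodel with the smallest normalized residual.

    thetas has shape (s, n): one estimated parameter vector per row.
    The sqrt(1 + ||theta_j||^2) normalization is an assumed reading of (28).
    """
    thetas = np.asarray(thetas, dtype=float)
    x_t = np.asarray(x_t, dtype=float)
    residuals = np.abs(y_t - thetas @ x_t)
    norms = np.sqrt(1.0 + np.sum(thetas ** 2, axis=1))
    return int(np.argmin(residuals / norms))
```

    A model that reproduces the output exactly gives FIT = 100%, while a model that only predicts the mean of the output gives FIT = 0%.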

    Fig. 3. Evolution of the entries of the estimates during recursive estimation for two passes over the dataset (horizontal axis: samples). (a) Entries of $\hat{\theta}_1$. (b) Entries of $\hat{\theta}_2$. (c) Entries of $\hat{\theta}_3$. The identification dataset being of size 1000, this plot (on the interval [1, 2000]) is a concatenation of the results of the first pass and those of the second pass.

    Average values and variances of the estimates over the 1000 simulations (after one pass) are given as

    $$\hat{\theta}_1 = \begin{bmatrix} 0.0211 \pm 0.057 \\ 0.7950 \pm 0.051 \\ 1.2813 \pm 0.087 \\ 1.1077 \pm 0.086 \end{bmatrix}, \quad \hat{\theta}_2 = \begin{bmatrix} 0.1822 \pm 0.038 \\ 0.5886 \pm 0.024 \\ 1.1091 \pm 0.045 \\ 0.0253 \pm 0.060 \end{bmatrix}, \quad \hat{\theta}_3 = \begin{bmatrix} 1.4494 \pm 0.060 \\ 0.5031 \pm 0.058 \\ 0.4057 \pm 0.099 \\ 0.2541 \pm 0.078 \end{bmatrix}.$$

    A comparison with the true PVs in (26) again reveals correctness of the estimates for all the three submodels. All these observations lead us to the conclusion that the proposed algorithm is efficient, at least on the considered simulation example.


    Fig. 4. Geometric characteristics of the open channel system.

    Table 1
    Test on 1000 randomly generated systems. N = 2000, forgetting factor = 0.95.

    SNR                    10 dB   20 dB   30 dB   40 dB   100 dB
    FIT (%), one pass      59.3    75.8    84.2    87.9    89.4
    FIT (%), two passes    60.8    79.4    86.8    89.2    92.0

    We also display in Fig. 2 the average minimum trajectory of the decision criterion (10) (in fact, its normalized version) for mode assignment and in Fig. 3 the evolution of the estimated PVs for the three submodels during recursive estimation. It appears from these plots that as convergence starts occurring in the estimates, the decision criterion is driven to zero. One can notice that the decrease of this criterion is not monotonic. The reason for this behavior is simply that the estimates do not converge at the same time for all the submodels.

    Other experiments (not reported here) show that, similarly to [16], good convergence of our recursive algorithm requires that all the submodels be visited frequently enough. Therefore, if there is a large minimum dwell time between consecutive switches, some convergence problems may arise. A solution may then be to start the algorithm with a single submodel and progressively increment the number of submodels whenever a switch is detected (see e.g., [24] for more details about this idea). If the algorithm is to be performed off-line, it is also possible to randomize the order in which the data sequence is read.
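    The alternation between data assignment and parameter update discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `recursive_switched_rls` and its arguments are hypothetical, the mode is assigned with the plain prior prediction error rather than the paper's decision criterion (10), and the update is standard recursive least squares with a forgetting factor.

```python
import numpy as np

def recursive_switched_rls(X, y, s, forgetting=0.9, passes=1, p0=100.0, seed=0):
    """Alternate mode assignment and RLS update over the data (illustrative sketch).

    X: (N, n) array of regressors, y: (N,) array of outputs, s: number of submodels.
    Returns the estimated parameter vectors (s, n) and the 0-based modes (N,).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    thetas = rng.standard_normal((s, n))       # random initial parameter vectors
    P = np.tile(p0 * np.eye(n), (s, 1, 1))     # one RLS matrix per submodel
    modes = np.zeros(len(y), dtype=int)
    for _ in range(passes):                    # passes > 1 re-reads the same data
        for t in range(len(y)):
            x = X[t]
            errors = y[t] - thetas @ x         # prior prediction error of each submodel
            j = int(np.argmin(np.abs(errors))) # assumed assignment rule: smallest error
            Px = P[j] @ x
            gain = Px / (forgetting + x @ Px)  # RLS gain with forgetting factor
            thetas[j] = thetas[j] + gain * errors[j]
            P[j] = (P[j] - np.outer(gain, Px)) / forgetting
            modes[t] = j
    return thetas, modes
```

    Only the submodel selected at time t is updated, which is why every submodel has to be visited often enough for all estimates to converge. For off-line use, the rows of `X` and `y` could be shuffled jointly before each pass.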

    Test on randomly generated systems. To further assess the ability of the algorithm to recover the parameters of a switched system, we now test it from a statistical perspective. We carry out an experiment consisting of 1000 independent runs. Each run is performed over a set of 2000 data points generated by a randomly generated SARX of the form (25). For each pass, half of the generated data sequence is used for identification and the entire dataset is used for validation. The results of this experiment are expressed in terms of the FIT criterion and reported in Table 1. Of course, averaging the values of the FIT criterion obtained with such completely different systems does not fairly express the performance of the method. Indeed, it is likely that a few of those systems will be highly sensitive to noise, thus corrupting the average value of the FIT. Nevertheless, the results of Table 1 reveal that the algorithm is capable of yielding acceptable values for the PVs even when the amount of noise is considerable.
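    The text does not detail how a prescribed SNR is enforced. One common way, sketched below as an assumption, is to scale a white Gaussian sequence against the power of a noise-free output. The helper name `add_noise_at_snr` is hypothetical; for an ARX structure the noise also feeds back through the recursion, so this additive version is only an approximation of that setting.

```python
import numpy as np

def add_noise_at_snr(y_clean, snr_db, rng):
    """Add white Gaussian noise scaled so that 10*log10(P_signal / P_noise) = snr_db.

    Powers are empirical mean squares of the given sequences (an assumed convention).
    """
    y_clean = np.asarray(y_clean, dtype=float)
    noise = rng.standard_normal(y_clean.shape)
    signal_power = np.mean(y_clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return y_clean + scale * noise
```

    Calling the helper with the SNR values of Table 1 (10 dB to 100 dB) would produce the corresponding noise levels for a given noise-free sequence.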

    5.3. Modeling of an open channel system

    In this subsection, we consider the problem of modeling an open channel system with a piecewise affine model. The system of interest [25] consists of a water channel with circular cross-section, radius R = 0.9 m, length X = 946 m and reach slope of 0.26%, as represented in Fig. 4. The discharge at any point of the channel is assumed to be bounded below by $q_{\min} = 0.5$ m³/s and above by $q_{\max} = 5$ m³/s.

    The problem is to model the water flow between both ends of the channel. Therefore, the input and the output of the system are defined to be respectively the discharge u(t) at the upstream end and the discharge y(t) at the downstream end, both expressed in m³/s (see Fig. 4). Let us denote with q(x, t) the water flow inside the channel, where x refers to a one-dimensional spatial coordinate counted along the axis of the channel and t designates time. Then it is known from open channel flow theory that the discharge q(x, t) obeys the following PDE, known as the (simplified) Saint Venant equation [26]

    $$\frac{\partial q(x,t)}{\partial t} + c(q,x,t)\,\frac{\partial q(x,t)}{\partial x} - d(q,x,t)\,\frac{\partial^2 q(x,t)}{\partial x^2} = 0, \qquad (29)$$

    where c(q, x, t) in m/s and d(q, x, t) in m²/s are respectively the celerity and diffusion coefficients. Therefore, the water flow in the channel can be viewed as a distributed parameter system or a spatio-temporal system. However, as we are interested in just looking at the discharges at two fixed points (input and output), we can model it as a lumped system, that is, an ordinary difference equation.

    Because of the large dimensions of open channel systems, real data are sometimes difficult to collect. It is hence common practice to resort to numerical simulation tools in order to reproduce in the laboratory the behavior of those systems. Given the geometric characteristics (profile, dimensions, etc.) of the channel and given an excitation input sequence


    Table 2
    s = 2, regularization parameter = 1, forgetting factor = 0.999, number of passes = 2: FIT = 94.4%.

    $\hat{\theta}_1$ = [0.8235  0.1587  0.0105  0.0077  0.0010]
    $\hat{\theta}_2$ = [0.7900  0.1902  0.0058  0.0111  0.0130]
    $c_1$ = [1.2026  1.2043  1.2060  1.1213  1.1207]
    $c_2$ = [3.9598  3.9598  3.9598  3.8684  3.8709]

    Table 3
    s = 3, regularization parameter = 1, forgetting factor = 0.999, number of passes = 2: FIT = 96%.

    $\hat{\theta}_1$ = [0.5620  0.4203  0.0055  0.0036  0.0065]
    $\hat{\theta}_2$ = [0.5143  0.4544  0.0060  0.0335  0.0163]
    $\hat{\theta}_3$ = [0.6082  0.3681  0.0068  0.0088  0.0349]
    $c_1$ = [1.0720  1.0737  1.0756  1.0174  1.0164]
    $c_2$ = [2.5974  2.5974  2.5975  2.4972  2.4982]
    $c_3$ = [4.0723  4.0717  4.0712  4.0296  4.0317]

    {u(t)}, we use the software SIC (see footnote 2) to simulate the corresponding output sequence {y(t)}. A dataset of 3000 input-output samples is generated. Note that the considered open channel system belongs to the general class of diffusive systems, which are known to be subject to time delays. Although some delays have been introduced during the data generation step, we will ignore the influence of these time delays in the modeling procedure. By applying the ideas of Section 4, we will also treat the continuous nonlinear open channel system successively as an SARX and as a PWARX.

    Modeling with a switched model. In a first experiment, we view the open channel system as a switched ARX model of the form (1) whose submodels are assumed to have equal orders na = nb = 2. The regressor vector is taken as in (2), except that a one is appended at its end so that its dimension is increased by one. To be more specific, we do not force the switching mechanism to follow a particular model and therefore estimate the discrete state as in (8), i.e., based only on a minimization of the fitting error. The outcome of this experiment is displayed in Fig. 5 in terms of the true output (blue curve), the switched model output (red curve) and the discrete state (green curve). The first 1500 points are used for identification of the PVs and the obtained switched model is validated over the whole available data sequence, whose length is 3000. By successively setting the number of submodels to 1, 2 and 3, the above defined FIT criterion (see Eq. (27)) takes the values 92.4%, 98.2% and 99.1% respectively, hence indicating a good approximation capability of the model on the considered data. However, as is apparent from Fig. 5, the discrete state is arbitrary in the sense that it does not reflect any physical intuition. For an overview, we give the numerical values of the estimated PVs as

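    $$\hat{\theta}_1 = [\,0.5544 \;\; 0.4238 \;\; 0.0260 \;\; 0.0043 \;\; 0.0000\,] \qquad (30)$$

    when s = 1,

    $$\hat{\theta}_1 = [\,0.4097 \;\; 0.5579 \;\; 0.0183 \;\; 0.0202 \;\; 0.0045\,], \qquad \hat{\theta}_2 = [\,0.8047 \;\; 0.1573 \;\; 0.0330 \;\; 0.0231 \;\; 0.0763\,] \qquad (31)$$

    for the case s = 2 and

    $$\hat{\theta}_1 = [\,0.5832 \;\; 0.3785 \;\; 0.0386 \;\; 0.0288 \;\; 0.1131\,], \quad \hat{\theta}_2 = [\,0.8668 \;\; 0.1121 \;\; 0.0118 \;\; 0.0102 \;\; 0.0039\,], \quad \hat{\theta}_3 = [\,0.4469 \;\; 0.5192 \;\; 0.0360 \;\; 0.0035 \;\; 0.0015\,] \qquad (32)$$

    for the case s = 3.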

    Modeling with a piecewise affine model. We now turn to a piecewise affine modeling of the open channel system, as described in Section 4. We proceed similarly to the previous experiment and plot the results in Fig. 6. A difference here with respect to the foregoing switched model is that the discrete state is forced to follow a particular form as in Eq. (21). The regularization parameter is set to 1 so as to decide the mode based on both the fitting errors and the proximity of the regressor to the operating points. By then determining the discrete state as in Eq. (23), we get a model with far fewer switches than in the case of the switched model. The switching signal is nicely shaped, so that the estimated submodels may lend themselves to some physical interpretation. The price we pay for this, however, is a slight loss of accuracy, as suggested by the lower values

    2 SIC (Simulation de Canaux d'Irrigation) user's guide and theoretical concepts. CEMAGREF, Montpellier, 1992. http://canari.montpellier.cemagref.fr/.

    Fig. 5. Approximation of the open channel system with switched linear models (output and estimated mode plotted against samples). (a) s = 1, regularization parameter = 0, forgetting factor = 0.999, number of passes = 1: FIT = 92.4%. (b) s = 2, regularization parameter = 0, forgetting factor = 0.999, number of passes = 2: FIT = 98.2%. (c) s = 3, regularization parameter = 0, forgetting factor = 0.999, number of passes = 2: FIT = 99.1%. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

    Fig. 6. Approximation of the open channel system with piecewise affine models (output plotted against samples). (a) s = 2, regularization parameter = 1, forgetting factor = 0.999, number of passes = 2: FIT = 94.4%. (b) s = 3, regularization parameter = 1, forgetting factor = 0.999, number of passes = 2: FIT = 96%.

    of the FIT criterion. For the sake of insight into the constituent submodels, we report in Tables 2 and 3 the numerical values of the PVs $\hat{\theta}_j$ and the operating points $c_j$ when the number of submodels is set equal to 2 and 3. We observe that in both cases the PVs are close and the operating points have almost equal entries. The fact that the entries of the $c_j$ are equal can be understood by looking at Fig. 6, where the partition of the regression space corresponds to different intervals of value for the output (and also the input), which is piecewise constant. For example, mode 2 in Fig. 6(a) corresponds to the periods when the output attains its higher level. Recalling that $c_j$ is the average of the regressors pertaining to submodel j, we see why the entries of $c_j$ are constant, since y and u are almost constant over the validity domain of mode j = 2.
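    The role of the operating points can be illustrated with a small sketch. Eq. (23) is not reproduced in this section, so the additive combination below (squared fitting error plus the regularization parameter times the squared distance to the operating point) is an assumed form, and the names `update_operating_point`, `assign_mode_pwa` and `reg` are hypothetical. Modes are indexed from 0.

```python
import numpy as np

def update_operating_point(c_j, count_j, x):
    """Running mean of the regressors assigned so far to submodel j."""
    count_j += 1
    c_j = c_j + (np.asarray(x, dtype=float) - c_j) / count_j
    return c_j, count_j

def assign_mode_pwa(y_t, x_t, thetas, centers, reg=1.0):
    """Pick the mode from both fitting error and proximity to the operating points.

    thetas, centers: arrays of shape (s, n). The weighted sum is an assumed criterion.
    """
    thetas = np.asarray(thetas, dtype=float)
    centers = np.asarray(centers, dtype=float)
    x_t = np.asarray(x_t, dtype=float)
    fit_err = (y_t - thetas @ x_t) ** 2
    proximity = np.sum((centers - x_t) ** 2, axis=1)
    return int(np.argmin(fit_err + reg * proximity))
```

    With `reg=0` the rule reduces to a pure fitting-error decision, as in the switched model experiment. A positive `reg` favors the submodel whose operating point is closest to the current regressor, which tends to produce fewer switches.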


    6. Conclusion

    In this paper, we have presented a simple recursive algorithm for identifying dynamical switched linear systems from input-output measurements. Starting from some initial values (heuristically estimated or randomly drawn from some probability distribution) of the parameter vectors of the different submodels, the discrete state is sequentially and interactively estimated based on prior or posterior knowledge about the modes. At each time instant, the parameters are then updated accordingly. Note that the considered hybrid models are very general in the sense that no condition is imposed on the switching mechanism. In the particular case when the hybrid model is intended to approximate a nonlinear system, we have shown that the switching law can be inferred from data in a simple and recursive manner. Despite the good results obtained on both synthetic and experimental data, the lack of a convergence analysis may still be criticized. However, one should note that this is in general a hard task in hybrid system identification because of the strong coupling between mode identification and parameter update. Motivated by the good simulation results obtained here, we plan to investigate the mathematical analysis of the algorithm in future work.

    Acknowledgements

    We are very grateful to the anonymous reviewers whose constructive comments on an earlier version of this manuscript have been of great help to us in improving the presentation.

    References

    [1] J. Roll, A. Bemporad, L. Ljung, Identification of piecewise affine systems via mixed-integer programming, Automatica 40 (2004) 37–50.
    [2] A. Bemporad, A. Garulli, S. Paoletti, A. Vicino, A bounded-error approach to piecewise affine system identification, IEEE Trans. Automat. Control 50 (2005) 1567–1580.
    [3] R. Vidal, S. Soatto, Y. Ma, S. Sastry, An algebraic geometric approach to the identification of a class of linear hybrid systems, in: Conference on Decision and Control, Maui, Hawaii, USA, 2003.
    [4] Y. Ma, R. Vidal, Identification of deterministic switched ARX systems via identification of algebraic varieties, in: Hybrid Systems Computation and Control, Zurich, Switzerland, 2005.
    [5] L. Bako, R. Vidal, Algebraic identification of switched MIMO ARX models, in: Hybrid Systems: Control and Computation, St Louis MO, USA, 2008.
    [6] G. Ferrari-Trecate, M. Muselli, D. Liberati, M. Morari, A clustering technique for the identification of piecewise affine systems, Automatica 39 (2003) 205–217.
    [7] A.L. Juloski, S. Weiland, W. Heemels, A Bayesian approach to identification of hybrid systems, IEEE Trans. Automat. Control 50 (2005) 1520–1533.
    [8] S. Paoletti, A. Juloski, G. Ferrari-Trecate, R. Vidal, Identification of hybrid systems: a tutorial, Eur. J. Control 13 (2007) 242–260.
    [9] F. Lauer, G. Bloch, Switched and piecewise nonlinear hybrid system identification, in: International Conference on Hybrid Systems: Computation and Control, St-Louis, USA, 2008.
    [10] N. Ozay, M. Sznaier, C. Lagoa, O. Camps, A sparsification approach to set membership identification of a class of affine hybrid systems, in: Conference on Decision and Control, Cancun, Mexico, 2008.
    [11] M. Fliess, C. Join, W. Perruquetti, Real-time estimation for switched linear systems, in: Conference on Decision and Control, Cancun, Mexico, 2008.
    [12] K.M. Pekpe, S. Lecoeuche, Online classification of switching models based on subspace framework, Nonlinear Anal. Hybrid Syst. 2 (2008) 735–749.
    [13] K. Boukharouba, L. Bako, S. Lecoeuche, Identification of piecewise affine systems based on Dempster-Shafer theory, in: IFAC Symposium on System Identification, Saint Malo, France, 2009.
    [14] R. Vidal, B.D.O. Anderson, Recursive identification of switched ARX hybrid models: exponential convergence and persistence of excitation, in: Conference on Decision and Control, Atlantis, Paradise Island, Bahamas, 2004.
    [15] Y. Hashambhoy, R. Vidal, Recursive identification of switched ARX models with unknown number of models and unknown orders, in: Conference on Decision and Control, Seville, Spain, 2005.
    [16] R. Vidal, Recursive identification of switched ARX systems, Automatica 44 (2008) 2274–2287.
    [17] A. Skeppstedt, L. Ljung, M. Millnert, Construction of composite models from observed data, Internat. J. Control 55 (1992) 141–152.
    [18] T. Gustavi, M. Karasalo, X. Hu, C.F. Martin, Recursive identification of a hybrid system, in: European Control Conference, Budapest, Hungary, 2009.
    [19] R. Vidal, A. Chiuso, S. Soatto, Observability and identifiability of jump linear systems, in: Conference on Decision and Control, Las Vegas, NV, 2002.
    [20] B.D.O. Anderson, C.J. Richard Johnson, Exponential convergence of adaptive identification and control algorithms, Automatica 18 (1982) 1–13.
    [21] G.C. Goodwin, K.S. Sin, Adaptive Filtering, Prediction and Control, Prentice-Hall Inc., Englewood Cliffs, NJ, 1984.
    [22] E.D. Sontag, Nonlinear regulation: The piecewise linear approach, IEEE Trans. Automat. Control 26 (1981) 346–357.
    [23] L. Ljung, System Identification Toolbox User's Guide, 7th ed., The MathWorks Inc., Natick, MA, 2009.
    [24] L. Bako, G. Mercère, S. Lecoeuche, Online structured subspace identification with application to switched linear systems, Internat. J. Control 82 (2009) 1496–1515.
    [25] E. Duviella, L. Bako, P. Charbonnaud, Gaussian and boolean weighted models to represent variable dynamics of open channel systems, in: Conference on Decision and Control, New Orleans, LA, USA, 2007.
    [26] V.T. Chow, D.R. Maidment, L.W. Mays, Applied Hydrology, McGraw-Hill, New York, Paris, 1988.