8/16/2019 [Włodzimierz Greblicki; M Pawlak] Nonparametric System Identification
Nonparametric System Identification
Presenting a thorough overview of the theoretical foundations of nonparametric system identification for nonlinear block-oriented systems, Włodzimierz Greblicki and Mirosław Pawlak show that nonparametric regression can be successfully applied to system identification, and they highlight what you can achieve in doing so. Starting with the basic ideas behind nonparametric methods, various algorithms for nonlinear block-oriented systems of cascade and parallel forms are discussed in detail. Emphasis is placed on the most popular systems, Hammerstein and Wiener, which have applications in engineering, biology, and financial modeling.

Algorithms using trigonometric, Legendre, Laguerre, and Hermite series are investigated, and the kernel algorithm, its semirecursive versions, and fully recursive modifications are covered. The theories of modern nonparametric regression, approximation, and orthogonal expansions are also provided, as are new approaches to system identification. The authors show how to identify nonlinear subsystems so that their characteristics can be obtained even when little information exists, which is of particular significance for practical application. Detailed information about all the tools used is provided in the appendices.

This book is aimed at researchers and practitioners in systems theory, signal processing, and communications. It will also appeal to researchers in fields such as mechanics, economics, and biology, where experimental data are used to obtain models of systems.
Włodzimierz Greblicki is a professor at the Institute of Computer Engineering, Control, and Robotics at the Wrocław University of Technology, Poland.

Mirosław Pawlak is a professor in the Department of Electrical and Computer Engineering at the University of Manitoba, Canada. He was awarded his Ph.D. from the Wrocław University of Technology, Poland.

Both authors have published extensively over the years in the area of nonparametric theory and applications.
Nonparametric System
Identification
WŁODZIMIERZ GREBLICKI
Wrocław University of Technology
MIROSŁAW PAWLAK
University of Manitoba, Canada
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press, The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521868044

© Cambridge University Press 2008

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2008

ISBN-13 978-0-511-40982-0  eBook (NetLibrary)
ISBN-13 978-0-521-86804-4  hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

Preface page ix

1 Introduction 1

2 Discrete-time Hammerstein systems 3
2.1 The system 3
2.2 Nonlinear subsystem 4
2.3 Dynamic subsystem identification 8
2.4 Bibliographic notes 9

3 Kernel algorithms 11
3.1 Motivation 11
3.2 Consistency 13
3.3 Applicable kernels 14
3.4 Convergence rate 16
3.5 The mean-squared error 21
3.6 Simulation example 21
3.7 Lemmas and proofs 24
3.8 Bibliographic notes 29

4 Semirecursive kernel algorithms 30
4.1 Introduction 30
4.2 Consistency and convergence rate 31
4.3 Simulation example 34
4.4 Proofs and lemmas 35
4.5 Bibliographic notes 43

5 Recursive kernel algorithms 44
5.1 Introduction 44
5.2 Relation to stochastic approximation 44
5.3 Consistency and convergence rate 46
5.4 Simulation example 49
5.5 Auxiliary results, lemmas, and proofs 51
5.6 Bibliographic notes 58
6 Orthogonal series algorithms 59
6.1 Introduction 59
6.2 Fourier series estimate 61
6.3 Legendre series estimate 64
6.4 Laguerre series estimate 66
6.5 Hermite series estimate 68
6.6 Wavelet estimate 69
6.7 Local and global errors 70
6.8 Simulation example 71
6.9 Lemmas and proofs 72
6.10 Bibliographic notes 78

7 Algorithms with ordered observations 80
7.1 Introduction 80
7.2 Kernel estimates 81
7.3 Orthogonal series estimates 85
7.4 Lemmas and proofs 89
7.5 Bibliographic notes 99

8 Continuous-time Hammerstein systems 101
8.1 Identification problem 101
8.2 Kernel algorithm 103
8.3 Orthogonal series algorithms 106
8.4 Lemmas and proofs 108
8.5 Bibliographic notes 112

9 Discrete-time Wiener systems 113
9.1 The system 113
9.2 Nonlinear subsystem 114
9.3 Dynamic subsystem identification 119
9.4 Lemmas 121
9.5 Bibliographic notes 122

10 Kernel and orthogonal series algorithms 123
10.1 Kernel algorithms 123
10.2 Orthogonal series algorithms 126
10.3 Simulation example 129
10.4 Lemmas and proofs 130
10.5 Bibliographic notes 142

11 Continuous-time Wiener system 143
11.1 Identification problem 143
11.2 Nonlinear subsystem 144
11.3 Dynamic subsystem 146
11.4 Lemmas 146
11.5 Bibliographic notes 148
12 Other block-oriented nonlinear systems 149
12.1 Series-parallel, block-oriented systems 149
12.2 Block-oriented systems with nonlinear dynamics 173
12.3 Concluding remarks 218
12.4 Bibliographical notes 220

13 Multivariate nonlinear block-oriented systems 222
13.1 Multivariate nonparametric regression 222
13.2 Additive modeling and regression analysis 228
13.3 Multivariate systems 242
13.4 Concluding remarks 248
13.5 Bibliographic notes 248

14 Semiparametric identification 250
14.1 Introduction 250
14.2 Semiparametric models 252
14.3 Statistical inference for semiparametric models 255
14.4 Statistical inference for semiparametric Wiener models 264
14.5 Statistical inference for semiparametric Hammerstein models 286
14.6 Statistical inference for semiparametric parallel models 287
14.7 Direct estimators for semiparametric systems 290
14.8 Concluding remarks 309
14.9 Auxiliary results, lemmas, and proofs 310
14.10 Bibliographical notes 316

A Convolution and kernel functions 319
A.1 Introduction 319
A.2 Convergence 320
A.3 Applications to probability 328
A.4 Lemmas 329

B Orthogonal functions 331
B.1 Introduction 331
B.2 Fourier series 333
B.3 Legendre series 340
B.4 Laguerre series 345
B.5 Hermite series 351
B.6 Wavelets 355

C Probability and statistics 359
C.1 White noise 359
C.2 Convergence of random variables 361
C.3 Stochastic approximation 364
C.4 Order statistics 365

References 371
Index 387
To my wife, Helena, and my children, Jerzy, Maria, and Magdalena – WG
To my parents and family and those whom I love – MP
Preface
The aim of this book is to show that nonparametric regression can be applied successfully to nonlinear system identification. It gathers what has been done in the area so far and presents main ideas, results, and some new recent developments.

The study of nonparametric regression estimation began with works published by Cencov, Watson, and Nadaraya in the 1960s. The history of nonparametric regression in system identification began about ten years later. Such methods have been applied to the identification of composite systems consisting of nonlinear memoryless systems and linear dynamic ones. Therefore, the approach is strictly connected with so-called block-oriented methods developed since Narendra and Gallman's work published in 1966. Hammerstein and Wiener structures are most popular and have received the greatest attention in numerous applications. Fundamental for nonparametric methods is the observation that the unknown characteristic of the nonlinear subsystem, or its inverse, can be represented as a regression function.

In terms of the a priori information, standard identification methods and algorithms work when it is parametric, that is, when our knowledge about the system is rather large; for example, when we know that the nonlinear subsystem has a polynomial characteristic. In this book, the information is much smaller, nonparametric. The mentioned characteristic can be, for example, any integrable or bounded or, even, any Borel function.

It can thus be said that this book associates block-oriented system identification with nonparametric regression estimation and shows how to identify nonlinear subsystems, that is, to recover their characteristics when the a priori information is small. Because of this, the approach should be of interest not only to researchers but also to people interested in applications.
Chapters 2–7 are devoted to discrete-time Hammerstein systems. Chapter 2 presents a basic discussion of the Hammerstein system and its relationship with the concept of nonparametric regression. The nonparametric kernel algorithm is presented in Chapter 3, its semirecursive versions are examined in Chapter 4, and Chapter 5 deals with fully recursive modifications derived from the idea of stochastic approximation. Next, Chapter 6 is concerned with the nonparametric orthogonal series method. Algorithms using trigonometric, Legendre, Laguerre, and Hermite series are investigated. Some space is devoted to estimation methods based on wavelets. Nonparametric algorithms based on ordered observations are presented and examined in Chapter 7. Chapter 8 discusses the nonparametric algorithms when applied to continuous-time Hammerstein systems.
The Wiener system is identified in Chapters 9–11. Chapter 9 presents the motivation for nonparametric algorithms that are studied in the next two chapters, devoted to the discrete- and continuous-time Wiener systems, respectively. Chapter 12 is concerned with the generalization of our theory to other block-oriented nonlinear systems. This includes, among others, parallel models, cascade-parallel models, sandwich models, and generalized Hammerstein systems possessing local memory. In Chapter 13, the multivariate versions of block-oriented systems are examined. The common problem of multivariate systems, that is, the curse of dimensionality, is cured by using low-dimensional approximations. With respect to this issue, models of the additive form are introduced and examined. In Chapter 14, we develop identification algorithms for a semiparametric class of block-oriented systems. Such systems are characterized by a mixture of finite-dimensional parameters and nonparametric functions, being typically a set of univariate functions.

The reader is encouraged to look into the appendices, in which fundamental information about tools used in the book is presented in detail. Appendix A is strictly related to kernel algorithms, and Appendix B is tied with the orthogonal series nonparametric curve estimates. Appendix C recalls some facts from probability theory and presents results from the theory of order statistics used extensively in Chapter 7.
Over the years, our work has benefited greatly from the advice and support of a number of friends and colleagues with interest in ideas of nonparametric estimation, pattern recognition, and nonlinear system modeling. There are too many names to list here, but special mention is due to Adam Krzyżak, as well as Danuta Rutkowska, Leszek Rutkowski, Alexander Georgiev, Simon Liao, Pradeepa Yahampath, and Yongqing Xin, our past Ph.D. students, now professors at universities in Canada, the United States, and Poland. Cooperation with them has been a great pleasure and given us a lot of satisfaction. We are deeply indebted to Zygmunt Hasiewicz, Ewaryst Rafajłowicz, Uli Stadtmüller, Ewa Rafajłowicz, Hajo Holzmann, and Andrzej Kozek, who have contributed greatly to our research in the area of nonlinear system identification, pattern recognition, and nonparametric inference.

Last, but by no means least, we would like to thank Mount-first Ng for helping us with a number of typesetting problems. Ed Shwedyk and January Gnitecki have provided support for correcting English grammar.

We also thank Anna Littlewood, from Cambridge University Press, for being a very supportive and patient editor. Research presented in this monograph was partially supported by research grants from Wrocław University of Technology, Wrocław, Poland, and NSERC of Canada.

Wrocław, Winnipeg
Włodzimierz Greblicki, Mirosław Pawlak
February 2008
1 Introduction
System identification, as a particular process of statistical inference, exploits two types of information. The first is experiment; the other, called a priori, is known before making any measurements. In a wide sense, the a priori information concerns the system itself and signals entering the system. Elements of the information are, for example:

- the nature of the signals, which may be random or nonrandom, white or correlated, stationary or not; their distributions can be known in full or partially (up to some parameters) or completely unknown,
- general information about the system, which can be, for example, continuous or discrete in the time domain, stationary or not,
- the structure of the system, which can be of the Hammerstein or Wiener type, or other,
- the knowledge about subsystems, that is, about nonlinear characteristics and linear dynamics.
In other words, the a priori information is related to the theory of the phenomena taking place in the system (a real physical process), or can be interpreted as a hypothesis (if so, results of the identification should be necessarily validated), or can be abstract in nature.

This book deals with systems consisting of nonlinear memoryless and linear dynamic subsystems, for example, Hammerstein and Wiener systems and other related structures. With respect to them, the a priori information is understood in a narrow sense because it relates to the subsystems only and concerns the a priori knowledge about their descriptions. We refer to such systems as block-oriented.

The characteristic of the nonlinear subsystem is recovered with the help of nonparametric regression estimates. The kernel and orthogonal series methods are used. Ordered statistics are also applied. Both offline and online algorithms are investigated. We examine only these estimation methods and nonlinear models for which we are able to deliver fundamental results in terms of consistency and convergence rates. There are other techniques, for example, neural networks, which may exhibit a promising performance, but their statistical accuracy is mostly unknown.

For the theory of nonparametric regression, see Efromovich [78], Györfi, Kohler, Krzyżak, and Walk [140], Härdle [150], Prakasa Rao [241], Simonoff [278], or Wand and Jones [310]. Nonparametric wavelet estimates are discussed in Antoniadis and Oppenheim [6], Härdle, Kerkyacharian, Picard, and Tsybakov [151], Ogden [223], and Walter and Shen [308].
Parametric methods are beyond the scope of this book; nevertheless, we mention Brockwell and Davies [33], Ljung [198], Norton [221], Zhu [332], and Söderström and Stoica [280]. Nonlinear system identification within the parametric framework is studied by Nells [218], Westwick and Kearney [316], Marmarelis and Marmarelis [207], Bendat [16], and Mathews and Sicuranza [208]. These books present identification algorithms based mostly on the theory of Wiener and Volterra expansions of nonlinear systems. A comprehensive list of references concerning nonlinear system identification and applications has been given by Giannakis and Serpendin [102]; see also the 2005 special issue on system identification of the IEEE Trans. on Automatic Control [199]. A nonparametric statistical inference for time series is presented in Bosq [26], Fan and Yao [89], and Györfi, Härdle, Sarda, and Vieu [139].

It should be stressed that nonparametric and parametric methods are supposed to be applied in different situations. The first are used when the a priori information is nonparametric, that is, when we wish to recover an infinite-dimensional object with underlying assumptions as weak as possible. Clearly, in such a case, parametric methods can only approximate, but not estimate, the unknown characteristics. When the information is parametric, parametric methods are the natural choice. If, however, the unknown characteristic is a complicated function of parameters, convergence analysis becomes difficult. Moreover, serious computational problems can occur. In such circumstances, one can resort to nonparametric algorithms because, from the computational viewpoint, they are not discouraging. On the contrary, they are simple but consume computer memory, because, for example, kernel estimates require all data to be stored. Nevertheless, it can be said that the two approaches do not compete with each other since they are designed to be applied in quite different situations. The situations differ from each other by the amount of the a priori information about the identified system. However, a compromise between these two separate worlds can be made by restricting a class of nonparametric models to those that consist of a finite-dimensional parameter and nonlinear characteristics, which run through a nonparametric class of univariate functions. Such semiparametric models can be efficiently identified, and the theory of semiparametric identification is examined in this book. The methodology of semiparametric statistical inference is examined in Härdle, Müller, Sperlich, and Werwatz [152], Ruppert, Wand, and Carroll [259], and Yatchev [329].
For two number sequences $a_n$ and $b_n$, $a_n = O(b_n)$ means that $a_n/b_n$ is bounded in absolute value as $n \to \infty$. In particular, $a_n = O(1)$ denotes that $a_n$ is bounded, that is, that $\sup_n |a_n| < \infty$. Writing $a_n \sim b_n$, we mean that $a_n/b_n$ has a nonzero limit as $n \to \infty$.

Throughout the book, "almost everywhere" means "almost everywhere with respect to the Lebesgue measure," whereas "almost everywhere $(\mu)$" means "almost everywhere with respect to the measure $\mu$."
2 Discrete-time Hammerstein systems
In this chapter, we discuss some preliminary aspects of the discrete-time Hammerstein system. In Section 2.1 we form the input–output equations of the system. A fundamental relationship between the system nonlinearity and the nonparametric regression is established in Section 2.2. The use of the correlation theory for recovering the linear subsystem is discussed in Section 2.3.
2.1 The system
A Hammerstein system, shown in Figure 2.1, consists of a nonlinear memoryless subsystem with a characteristic $m(\cdot)$ followed by a linear dynamic one with an impulse response $\{\lambda_n\}$. The output signal $W_n$ of the linear part is disturbed by $Z_n$, and $Y_n = W_n + Z_n$ is the output of the whole system. Neither $V_n$ nor $W_n$ is available to measurement. Our goal is to identify the system, that is, to recover both $m(\cdot)$ and $\{\lambda_n\}$, from observations

$$(U_1, Y_1), (U_2, Y_2), \ldots, (U_n, Y_n), \ldots \tag{2.1}$$

taken at the input and output of the whole system.

Signals coming to the system, that is, the input $\{\ldots, U_{-1}, U_0, U_1, \ldots\}$ and the disturbance $\{\ldots, Z_{-1}, Z_0, Z_1, \ldots\}$, are mutually independent stationary white random signals. The disturbance has zero mean and finite variance, that is, $E Z_n = 0$ and $\operatorname{var}[Z_n] = \sigma_Z^2 < \infty$.

Regarding the nonlinear subsystem, we assume that $m(\cdot)$ is a Borel measurable function. Therefore, $V_n$ is a random variable. The dynamic subsystem is described by the state equation

$$X_{n+1} = A X_n + b V_n, \qquad W_n = c^T X_n, \tag{2.2}$$

where $X_n$ is a state vector at time $n$, $A$ is a matrix, and $b$ and $c$ are vectors. Thus,

$$\lambda_n = \begin{cases} 0, & \text{for } n = 0, -1, -2, \ldots, \\ c^T A^{n-1} b, & \text{for } n = 1, 2, 3, \ldots, \end{cases}$$

and

$$W_n = \sum_{i=-\infty}^{n} \lambda_{n-i}\, m(U_i). \tag{2.3}$$
Figure 2.1 The discrete-time Hammerstein system ($U_n \to m(\cdot) \to V_n \to \{\lambda_n\} \to W_n$; $Y_n = W_n + Z_n$).
Neither $b$ nor $c$ is known. The matrix $A$ and its dimension are also unknown. Nevertheless, the matrix $A$ is stable; all its eigenvalues lie in the unit circle. Therefore, assuming that

$$E m^2(U) < \infty \tag{2.4}$$

(the time index at $U$ is dropped), we conclude that both $X_n$ and $W_n$ are random variables. Clearly, the random processes $\{\ldots, X_{-1}, X_0, X_1, \ldots\}$ and $\{\ldots, W_{-1}, W_0, W_1, \ldots\}$ are stationary. Consequently, the output process $\{\ldots, Y_{-1}, Y_0, Y_1, \ldots\}$ is also a stationary stochastic process. Therefore, the problem is well posed in the sense that all signals are random variables. In the light of this, we estimate both $m(\cdot)$ and $\{\lambda_n\}$ from the random observations (2.1).

The restrictions imposed on the signals entering the system and on both subsystems apply whenever the Hammerstein system is concerned. They will not be repeated in further considerations, neither in lemmas nor in theorems.

The input random variables $U_n$ may have a probability density, denoted by $f(\cdot)$, or may be distributed quite arbitrarily. Nevertheless, (2.4) holds. It should be emphasized that, apart from a few cases, (2.4) is the only restriction in which the nonlinearity is involved. Assumption (2.4) is irrelevant to identification algorithms and has been imposed for only one reason: to guarantee that both $W_n$ and $Y_n$ are random variables. Nevertheless, it certainly has an influence on the restrictions imposed on both $m(\cdot)$ and the distribution of $U$ to meet (2.4). If, for example, $U$ is bounded, (2.4) is satisfied for any $m(\cdot)$. The restriction also holds if $E U^2 < \infty$ and $|m(u)| \le \alpha + \beta |u|$ with any $\alpha, \beta$. In yet another example, $E U^4 < \infty$ and $|m(u)| \le \alpha + \beta u^2$. For Gaussian $U$ and $|m(u)| \le W(u)$, where $W$ is an arbitrary polynomial, (2.4) is also met. Anyway, the a priori information about the characteristic is nonparametric, because $m(\cdot)$ cannot be represented in a parametric form. This is because the class of all possible characteristics is very wide.

The family of all stable dynamic subsystems also cannot be parameterized, because its order is unknown. Therefore, the a priori information about the impulse response is nonparametric, too. To form a conclusion, we infer about both subsystems under nonparametric a priori information.

In the following chapters, for simplicity, $U$, $W$, $Y$, and $Z$ stand for $U_n$, $W_n$, $Y_n$, and $Z_n$, respectively.
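To make the setup concrete, here is a minimal simulation sketch of a scalar instance of the state equation (2.2). The specific choices (cubic characteristic $m(u) = u^3$, pole $a = 0.5$, uniform input, noise level) are illustrative assumptions, not taken from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

def m(u):
    """Assumed nonlinear characteristic (illustrative): m(u) = u^3."""
    return u ** 3

def simulate_hammerstein(n, a=0.5, sigma_z=0.1):
    """Scalar instance of (2.2): X_{n+1} = a X_n + m(U_n), W_n = X_n,
    so lambda_k = a**(k-1) for k >= 1, and the observed output is
    Y_n = W_n + Z_n with a zero-mean white disturbance Z_n."""
    U = rng.uniform(-1.0, 1.0, n)           # i.i.d. (white) stationary input
    Z = sigma_z * rng.standard_normal(n)    # zero-mean white disturbance
    W = np.empty(n)
    x = 0.0                                  # state X_0
    for i in range(n):
        W[i] = x                             # W_n = c^T X_n with c = 1
        x = a * x + m(U[i])                  # X_{n+1} = A X_n + b V_n, A = a, b = 1
    return U, W + Z                          # only (U_n, Y_n) are observable

U, Y = simulate_hammerstein(1000)
```

Only the input–output pairs $(U_n, Y_n)$ are returned; $V_n$ and $W_n$ stay internal, matching the assumption that neither is available to measurement.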
2.2 Nonlinear subsystem
2.2.1 The problem and the motivation for algorithms
Fix $p \ge 1$ and observe that, since $Y_p = Z_p + \sum_{i=-\infty}^{p} \lambda_{p-i}\, m(U_i)$ and $\{U_n\}$ is a white process,

$$E\{Y_p \mid U_0 = u\} = \mu(u),$$

where

$$\mu(u) = \lambda_p m(u) + \alpha_p,$$

with $\alpha_p = E m(U) \sum_{i=1,\, i \ne p}^{\infty} \lambda_i$. Estimating the regression $E\{Y_p \mid U_0 = u\}$, we thus recover $m(\cdot)$ up to some unknown constants $\lambda_p$ and $\alpha_p$. If $E m(U) = 0$, which is the case, for example, when the distribution of $U$ is symmetrical with respect to zero and $m(\cdot)$ is an odd function, then $\alpha_p = 0$ and we estimate $m(\cdot)$ only up to the multiplicative constant $\lambda_p$.

Since $Y_{p+n} = \mu(U_n) + \xi_{p+n} + Z_{p+n}$ with $\xi_{p+n} = \sum_{i=-\infty,\, i \ne n}^{p+n} \lambda_{p+n-i}\, m(U_i)$, it can be said that we estimate $\mu(u)$ from the pairs

$$(U_0, Y_p), (U_1, Y_{p+1}), \ldots, (U_n, Y_{p+n}), \ldots,$$

and that the regression $\mu(u)$ is corrupted by the noise $Z_{p+n} + \xi_{p+n}$. The first component of the noise is white with zero mean. Because of the dynamics, the other noise component is correlated. Its mean $E \xi_n = \alpha_p$ is usually nonzero, and its variance is equal to $\operatorname{var}[m(U)] \sum_{i=1,\, i \ne p}^{\infty} \lambda_i^2$. Thus, the main difficulties in the analysis of any estimate of $\mu(\cdot)$ are caused by the correlation of $\{\xi_n\}$, that is, by the system itself, and not by the white disturbance $Z_n$ coming from outside.

Every algorithm estimating the nonlinearity in Hammerstein systems studied in this book (the estimate is denoted here as $\hat\mu(U_0, \ldots, U_n; Y_p, \ldots, Y_{p+n})$) is linear with respect to output observations, which means that

$$\hat\mu(U_0, \ldots, U_n; \theta_p + \eta_p, \ldots, \theta_{p+n} + \eta_{p+n}) = \hat\mu(U_0, \ldots, U_n; \theta_p, \ldots, \theta_{p+n}) + \hat\mu(U_0, \ldots, U_n; \eta_p, \ldots, \eta_{p+n}), \tag{2.5}$$

and has the natural property that, for any number $\theta$,

$$\hat\mu(U_0, \ldots, U_n; \theta, \ldots, \theta) \to \theta \quad \text{as } n \to \infty \tag{2.6}$$

in an appropriate stochastic sense. This property, or rather its consequence, is exploited when proving consistency. To explain this, observe that, with respect to $U_n$ and $Y_n$, the identified system shown in Figure 2.1 is equivalent to that in Figure 2.2, with nonlinearity $\rho(u) = m(u) - E m(U)$ and an additional disturbance $\beta = E m(U) \sum_{i=1}^{\infty} \lambda_i$.

Figure 2.2 The equivalent Hammerstein system.

In the equivalent system, $E \rho(U) = 0$ and $E\{Y_p \mid U_0 = u\} = \mu(u)$. From (2.5) and (2.6), it follows that

$$\hat\mu(U_0, \ldots, U_n; Y_p, \ldots, Y_{p+n}) = \hat\mu(U_0, \ldots, U_n; S_p + \beta, \ldots, S_{p+n} + \beta) = \hat\mu(U_0, \ldots, U_n; S_p, \ldots, S_{p+n}) + \hat\mu(U_0, \ldots, U_n; \beta, \ldots, \beta)$$
with $\hat\mu(U_0, \ldots, U_n; \beta, \ldots, \beta) \to \beta$ as $n \to \infty$. Hence, if

$$\hat\mu(U_0, \ldots, U_n; S_p, \ldots, S_{p+n}) \to E\{S_p \mid U_0 = u\} \quad \text{as } n \to \infty,$$

we have

$$\hat\mu(U_0, \ldots, U_n; Y_p, \ldots, Y_{p+n}) \to E\{Y_p \mid U_0 = u\} \quad \text{as } n \to \infty,$$

where convergence is understood in the same sense as that in (2.6).

Thus, if the estimate recovers the regression $E\{S_p \mid U_0 = u\}$ from observations

$$(U_0, S_p), (U_1, S_{1+p}), (U_2, S_{2+p}), \ldots,$$

it also recovers $E\{Y_p \mid U_0 = u\}$ from

$$(U_0, Y_p), (U_1, Y_{1+p}), (U_2, Y_{2+p}), \ldots.$$

We can say that if the estimate works properly when applied to the system with input $U_n$ and output $S_n$ (in which $E \rho(U) = 0$), it behaves properly also when applied to the system with input $U_n$ and output $Y_n$ (in which $E m(U)$ may be nonzero).

The result of the reasoning is given in the following remark:

REMARK 2.1 Let an estimate have properties (2.5) and (2.6). If the estimate is consistent for $E m(U) = 0$, then it is consistent for $E m(U) \ne 0$, too.

Owing to the remark, with no loss of generality, in all proofs of consistency of algorithms recovering the nonlinearity, we assume that $E m(U) = 0$.
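Kernel estimates of this regression are the subject of Chapter 3; as a preview only, a plain Nadaraya–Watson sketch applied to the delayed pairs $(U_n, Y_{p+n})$ is shown below. The toy finite-impulse-response system, the Gaussian kernel, and the bandwidth are all assumptions made for illustration:

```python
import numpy as np

def nadaraya_watson(u, U, Y, h):
    """Kernel regression estimate of mu(u) = E{Y_p | U_0 = u} at a point u:
    a weighted average of the outputs with Gaussian weights of bandwidth h."""
    w = np.exp(-0.5 * ((u - U) / h) ** 2)
    return float(np.sum(w * Y) / np.sum(w))

rng = np.random.default_rng(1)
n, p = 2000, 1
U = rng.uniform(-1.0, 1.0, n)
mU = U ** 3                                   # assumed nonlinearity m(u) = u^3
# toy Hammerstein output with lambda_1 = 1, lambda_2 = 0.5:
# Y_i = m(U_{i-1}) + 0.5 m(U_{i-2}) + white noise
Y = np.roll(mU, 1) + 0.5 * np.roll(mU, 2) + 0.05 * rng.standard_normal(n)
# delayed pairs (U_n, Y_{p+n}) with p = 1
Upairs, Ypairs = U[:-p], Y[p:]
mu_hat = nadaraya_watson(0.8, Upairs, Ypairs, h=0.1)
# mu(u) = lambda_1 m(u) + alpha_1; the input is symmetric and u^3 is odd,
# so E m(U) = 0, alpha_1 = 0, and mu_hat should be close to 0.8**3 = 0.512
```

Since $E m(U) = 0$ here, the situation of Remark 2.1 applies directly and the regression recovers $\lambda_p m(u)$ itself.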
In parametric problems, the nonlinearity is usually a polynomial $m(u) = \alpha_0 + \alpha_1 u + \cdots + \alpha_q u^q$ of a fixed degree with unknown true values of the parameters $\alpha_0, \ldots, \alpha_q$. Therefore, to apply parametric methods, we must have a great deal more a priori information about the subsystem. It seems that in many applications, it is impossible to represent $m(\cdot)$ in a parametric form.

Since the system with the following ARMA-type difference equation:

$$w_n + a_{k-1} w_{n-1} + \cdots + a_0 w_{n-k} = b_{k-1} m(u_{n-1}) + \cdots + b_0 m(u_{n-k})$$

can be described by (2.2), all presented methods can be used to recover the nonlinearity $m(\cdot)$ in the previous ARMA system.
m (•) in theprevious ARM A system.It will beconvenient todenote
φ(u ) = E
W 2p |U 0 = u
. (2.7)
Since W p =p −1
i =−∞ λi m (U i ), denoting c 0 = E m 2(U )∞
i =1,i =p λ2i + E 2m (U )
(
∞i =1,i =p λi )
2, c 1 = 2λp Em (U )
∞i =1,i =p λi , and c 2 = λ2p , wefind
φ (u ) = c 0 + c 1m (u ) + c 2m 2(u ). To avoid complicated notation, we do not denote explicitly the dependence of the
estimated regressionandother functions on p and simply writeµ(•) and φ(•).Results presented in further chapters can be easily generalized onthe systemshown
inFigure2.3, where{. . . , ξ 0, ξ 1, ξ 2, . . .} isanother zeromeannoise. Moreover,{Z n } can
8/16/2019 [Włodzimierz Greblicki; M Pawlak] Nonparametric System Identification
19/401
2.2 Nonlinear subsystem 7
Z n
Y nU n
{λn}m(•)
ξ n
Figure 2.3 Possiblegeneralizationof thesystemshownin Figure2.1.
be correlated, that is, it can be the output of a stable linear dynamic systemstimulated
by whiterandomnoise. So can {ξ n }.It is worth noting that aclass of stochastic processes generated by theoutput process
{Y n } of the Hammerstein systemis different fromthe classof strong mixing processesconsideredextensivelyinthestatistical literatureconcerningthenonparametricinference
fromdependent data, see, for example, [26] and [89]. Indeed, theARMA process {X n }in which X n +1 = a X n + V n , where0
The cloud of 200 input–output observations we infer from is presented in Figure 2.4. The quality of each estimate, denoted here by $\hat m(u)$, is measured with

$$\mathrm{MISE} = \int_{-3}^{3} (\hat m(u) - m(u))^2\, du.$$
2.3 Dynamic subsystem identification
Passing to the dynamic subsystem, we use (2.3) and recall $E Z_n = 0$ to notice that

$$E\{Y_i U_0\} = \sum_{j=-\infty}^{i} \lambda_{i-j} E\{m(U_j) U_0\} = \lambda_i E\{m(U) U\}.$$

Denoting $\kappa_i = \lambda_i E\{U m(U)\}$, we obtain

$$\kappa_i = E\{Y_i U_0\},$$

which can be estimated in the following way:

$$\hat\kappa_i = \frac{1}{n} \sum_{j=1}^{n-i} Y_{i+j} U_j.$$
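A sketch of this cross-correlation estimate on simulated data follows; the cubic characteristic, the first-order linear block, and all numeric settings are assumptions made for illustration:

```python
import numpy as np

def kappa_hat(U, Y, i):
    """kappa_hat_i = (1/n) * sum_{j=1}^{n-i} Y_{i+j} U_j (0-based arrays)."""
    n = len(U)
    return float(np.dot(Y[i:], U[:n - i]) / n)

rng = np.random.default_rng(2)
n = 100_000
U = rng.uniform(-1.0, 1.0, n)
V = U ** 3                               # assumed m(u) = u^3, so E{U m(U)} = E U^4 = 1/5
W = np.zeros(n)
for j in range(1, n):
    W[j] = 0.5 * W[j - 1] + V[j - 1]     # lambda_i = 0.5**(i-1) for i >= 1
Y = W + 0.1 * rng.standard_normal(n)     # add zero-mean white disturbance

k1 = kappa_hat(U, Y, 1)   # estimates kappa_1 = lambda_1 * E{U m(U)} = 1.0 * 0.2
k2 = kappa_hat(U, Y, 2)   # estimates kappa_2 = 0.5 * 0.2 = 0.1
```

Consistently with Theorem 2.1 below, the squared error of each $\hat\kappa_i$ decays like $O(1/n)$, so enlarging $n$ tightens both estimates.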
THEOREM 2.1 For any $i$,

$$\lim_{n \to \infty} E (\hat\kappa_i - \kappa_i)^2 = 0.$$

Proof. The estimate is unbiased, that is, $E \hat\kappa_i = E\{Y_i U_0\} = \kappa_i$. Moreover, $\operatorname{var}[\hat\kappa_i] = P_n + Q_n + R_n$ with

$$P_n = \frac{1}{n^2} \operatorname{var}\Big[ \sum_{j=1}^{n} Z_{i+j} U_j \Big] = \frac{1}{n^2} \sum_{j=1}^{n} \operatorname{var}[Z_{i+j} U_j] = \frac{1}{n} \sigma_Z^2 E U^2,$$

$$Q_n = \frac{1}{n} \operatorname{var}[W_i U_0],$$

and

$$R_n = \frac{1}{n^2} \sum_{j=1}^{n} \sum_{m=1,\, m \ne j}^{n} \operatorname{cov}[W_{i+j} U_j, W_{i+m} U_m] = \frac{1}{n^2} \sum_{j=1}^{n} (n - j) \operatorname{cov}[W_{i+j} U_j, W_i U_0].$$

Since $W_i = \sum_{j=-\infty}^{i} \lambda_{i-j}\, m(U_j)$, we have $Q_n = n^{-1} \lambda_i^2 \operatorname{var}[m(U) U]$. For the same reason, for $j > 0$,

$$\operatorname{cov}[W_{i+j} U_j, W_i U_0] = \sum_{p=-\infty}^{i+j} \sum_{q=-\infty}^{i} \lambda_{i+j-p} \lambda_{i-q} \operatorname{cov}[m(U_p) U_j, m(U_q) U_0] = E^2\{U m(U)\}\, \lambda_{i+j} \lambda_{i-j}$$

(see Lemma C.3 in Appendix C), which leads to

$$|R_n| \le \frac{1}{n^2} E^2\{U m(U)\} \sum_{j=1}^{n} (n - j) |\lambda_{i+j} \lambda_{i-j}| \le \frac{1}{n} E^2\{U m(U)\} \max_s |\lambda_s| \sum_{j=1}^{\infty} |\lambda_j|.$$

Thus,

$$E (\hat\kappa_i - \kappa_i)^2 = \operatorname{var}[\hat\kappa_i] = O\Big(\frac{1}{n}\Big), \tag{2.8}$$

which completes the proof.

The theorem establishes convergence of the local error $E(\hat\kappa_i - \kappa_i)^2$ to zero as $n \to \infty$. As an estimate of the whole impulse response $\{\kappa_1, \kappa_2, \kappa_3, \ldots\}$, we take the sequence $\{\hat\kappa_1, \hat\kappa_2, \hat\kappa_3, \ldots, \hat\kappa_{N(n)}, 0, 0, \ldots\}$ and find that the mean summed square error (MSSE) equals

$$\mathrm{MSSE}(\hat\kappa) = \sum_{i=1}^{N(n)} E(\hat\kappa_i - \kappa_i)^2 + \sum_{i=N(n)+1}^{\infty} \kappa_i^2.$$

From (2.8), it follows that the error is not greater than

$$O\Big(\frac{N(n)}{n}\Big) + \sum_{i=N(n)+1}^{\infty} \kappa_i^2.$$

Therefore, if $N(n) \to \infty$ as $n \to \infty$ and $N(n)/n \to 0$ as $n \to \infty$,

$$\lim_{n \to \infty} \mathrm{MSSE}(\hat\kappa) = 0.$$
The identity $\lambda_s \tau = E\{Y_s U_0\}$, where $\tau = E\{U m(U)\}$, allows us to form a nonparametric estimate of the linear subsystem in the frequency domain. Indeed, taking the Fourier transform of the identity yields

$$\Lambda(\omega)\, \tau = S_{YU}(\omega), \qquad |\omega| \le \pi, \tag{2.9}$$

where $S_{YU}(\omega) = \sum_{s=-\infty}^{\infty} \kappa_s e^{-i s \omega}$ is the cross-spectral density function of the processes $\{Y_n\}$ and $\{U_n\}$, and

$$\Lambda(\omega) = \sum_{s=0}^{\infty} \lambda_s e^{-i s \omega}$$

is the transfer function of the linear subsystem. Note also that if $\lambda_0 = 1$, then $\tau = \kappa_0$. See Chapter 12 for further discussion on the frequency-domain identification of linear systems.
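A numerical sketch of (2.9): Fourier-transforming a truncated $\kappa$ sequence and dividing by $\tau$ recovers the transfer function. Exact $\kappa_s$ values for a first-order example are plugged in here as an assumption; in practice one would use the estimates $\hat\kappa_s$:

```python
import numpy as np

def transfer_estimate(kappa, omegas, tau):
    """Lambda(omega) ~ (1/tau) * sum_{s=1}^{N} kappa_s e^{-i s omega},
    a truncated version of identity (2.9); kappa[0] holds kappa_1."""
    s = np.arange(1, len(kappa) + 1)
    return np.array([np.sum(kappa * np.exp(-1j * s * w)) for w in omegas]) / tau

# first-order linear block: lambda_s = 0.5**(s-1) for s >= 1, and tau = 0.2,
# so kappa_s = lambda_s * tau; keep N = 30 terms of the truncated sum
N, tau = 30, 0.2
kappa = tau * 0.5 ** np.arange(N)
omegas = np.linspace(0.0, np.pi, 5)
Lam = transfer_estimate(kappa, omegas, tau)

# closed form for comparison: Lambda(omega) = e^{-i omega} / (1 - 0.5 e^{-i omega})
true = np.exp(-1j * omegas) / (1.0 - 0.5 * np.exp(-1j * omegas))
```

With a geometrically decaying impulse response, the truncation error after 30 terms is negligible, so the two curves agree to high accuracy.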
2.4 Bibliographic notes
Various aspects of parametric identification algorithms of discrete-time Hammerstein
systems have been studied by Narendra and Gallman [216]; Haist, Chang, and Luus
8/16/2019 [Włodzimierz Greblicki; M Pawlak] Nonparametric System Identification
22/401
10 Discrete-time Hammerstein systems
[142], Thatchachar and Ramaswamy [289], Kaminskas [175], Gallman [92], Billings [19], Billings and Fakhouri [20,24], Shih and Kung [276], Kung and Shih [190], Liao and Sethares [195], Verhaegen and Westwick [301], Giri, Chaoui, and Rochidi [103], Ninness and Gibson [220], Bai [11,12], and Vörös [305]. The analysis of block-oriented systems and, in particular, Hammerstein ones, useful for various aspects of identification and its applications, can be found in Bendat [16], Chen [45], Marmarelis and Marmarelis [207], Mathews and Sicuranza [208], Nelles [218], and Westwick and Kearney [316].

Results concerning Hammerstein systems are sometimes given, though not explicitly, in works devoted to the more complicated Hammerstein–Wiener or Wiener–Hammerstein structures; see, for example, Gardiner [94], Billings and Fakhouri [22,23], Fakhouri, Billings, and Wormald [86], Hunter and Korenberg [168], Korenberg and Hunter [177], Emara-Shabaik, Moustafa, and Talaq [79], Boutayeb and Darouach [27], Vandersteen, Rolain, and Schoukens [296], Bai [10], Bershad, Celka, and McLaughlin [18], and Zhu [333].
The nonparametric approach offers a number of algorithms to recover the characteristics of the nonlinear subsystem. The most popular, the kernel estimate, can be used in the offline version, see Chapter 3. For semirecursive and fully recursive forms, see Chapters 4 and 5, respectively. Nonparametric orthogonal series identification algorithms, see Chapter 6, utilize trigonometric, Legendre, Laguerre, or Hermite functions, or wavelets. Both classes of estimates can be modified to use ordered input observations (see Chapter 7), which makes them insensitive to the roughness of the input density.

The Hammerstein model has been used in various and diverse areas. Eskinat, Johnson, and Luyben [82] applied it to describe processes in distillation columns and heat exchangers. The hysteresis phenomenon in ferrites was analyzed by Hsu and Ngo [166], pH processes were analyzed by Patwardhan, Lakshminarayanan, and Shah [227], biological systems were studied by Hunter and Korenberg [168], and Emerson, Korenberg, and Citron [80] described some neuronal processes. The use of the Hammerstein model for modeling aspects of financial volatility processes is presented in Capobianco [38]. In Giannakis and Serpedin [102], a comprehensive bibliography on nonlinear system identification is given; see also the 2005 special issue on system identification of the IEEE Transactions on Automatic Control [199].

It is also worth noting that the concept of the Hammerstein model originates from the theory of nonlinear integral equations developed by Hammerstein in 1930 [148]; see also Tricomi [292].
3 Kernel algorithms
The kernel algorithm is just the kernel estimate of a regression function. This is the most popular nonparametric estimation method and is very convenient from the computational viewpoint. In Section 3.1, an intuitive motivation for the algorithm is presented, and in Section 3.2, its pointwise consistency is shown. Some results hold for any input signal density, that is, they are density-free; some are even distribution-free, that is, they hold for any distribution of the input signal. In Section 3.3, attention is focused on a class of applicable kernel functions. The convergence rate is studied in Section 3.4.
3.1 Motivation
It is obvious that
$$
\lim_{h\to0}\frac{1}{2h}\int_{u-h}^{u+h}\mu(v)f(v)\,dv=\mu(u)f(u)
$$
at every continuity point $u\in R$ of both $m(\bullet)$ and $f(\bullet)$, since $\mu(u)=\lambda_p m(u)+\alpha_p$. The formula can be rewritten in the following form:
$$
\lim_{h\to0}\int\mu(v)f(v)\frac{1}{h}K\left(\frac{u-v}{h}\right)dv=\mu(u)f(u), \tag{3.1}
$$
where
$$
K(u)=\begin{cases}\dfrac{1}{2}, & \text{for } |u|\le1\\ 0, & \text{otherwise.}\end{cases} \tag{3.2}
$$
Figure 3.1 Rectangular kernel (3.2).
For a general kernel, $h^{-1}K((u-v)/h)$ behaves like the Dirac impulse $\delta(u-v)$ as $h\to0$, so the integral in (3.1) converges to
$$
\int\mu(v)f(v)\delta(u-v)\,dv=\mu(u)f(u)\ \text{ as } h\to0.
$$
Because $\mu(u)=E\{Y_p\,|\,U_0=u\}$, we get
$$
\int\mu(v)f(v)\frac{1}{h}K\left(\frac{u-v}{h}\right)dv
=\int\frac{1}{h}E\{Y_p\,|\,U_0=v\}K\left(\frac{u-v}{h}\right)f(v)\,dv
=\frac{1}{h}E\left\{Y_pK\left(\frac{u-U_0}{h}\right)\right\},
$$
which suggests the following estimate of $\mu(u)f(u)$:
$$
\frac{1}{nh}\sum_{i=1}^{n}Y_{p+i}K\left(\frac{u-U_i}{h}\right).
$$
For similar reasons,
$$
\frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{u-U_i}{h}\right)
$$
is a good candidate for an estimate of
$$
\int f(v)\frac{1}{h}K\left(\frac{u-v}{h}\right)dv,
$$
which converges to $f(u)$ as $h\to0$. Thus,
$$
\hat\mu(u)=\frac{\sum_{i=1}^{n}Y_{p+i}K\left(\dfrac{u-U_i}{h_n}\right)}{\sum_{i=1}^{n}K\left(\dfrac{u-U_i}{h_n}\right)}, \tag{3.3}
$$
with $h_n$ tending to zero, is a kernel estimate of $\mu(u)$. The parameter $h_n$ is called a bandwidth. Note that the above formula is of ratio form, and we always treat the case $0/0$ as $0$.

In light of this, the crucial problems are the choice of the kernel $K(\bullet)$ and the number sequence $\{h_n\}$. From now on, we denote $g(u)=\mu(u)f(u)$. It is worth mentioning that there is a wide range of kernel estimates [88,140,172] available for finding a curve in data. The most prominent are: the classical Nadaraya–Watson estimator, defined in (3.3), local linear and polynomial kernel estimates, convolution-type kernel estimates, and various recursive kernel methods. Some of these techniques are thoroughly examined in this book.
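For the static case $p=0$ (independent pairs $(U_i,Y_i)$), estimate (3.3) can be sketched in a few lines; the rectangular kernel (3.2) is used as the default, and the $0/0$-as-$0$ convention is implemented explicitly. The function name and vectorized layout are ours, not the book's.

```python
import numpy as np

def nadaraya_watson(u, U, Y, h, kernel=lambda t: 0.5 * (np.abs(t) <= 1)):
    """Kernel regression estimate (3.3) at the query points u, from data
    (U, Y) with bandwidth h; the estimate is 0 wherever the denominator
    vanishes. The default kernel is the rectangular kernel (3.2)."""
    u = np.atleast_1d(u)
    W = kernel((u[:, None] - U[None, :]) / h)   # kernel weight of each U_i per query
    num = W @ Y
    den = W.sum(axis=1)
    out = np.zeros_like(num, dtype=float)
    np.divide(num, den, out=out, where=den != 0)
    return out
```

For a query point far from all observations, every weight is zero and the estimate is returned as 0, exactly as in the ratio convention above.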
3.2 Consistency
On the kernel function, the following restrictions are imposed:
$$
\sup_{-\infty<u<\infty}|K(u)|<\infty, \tag{3.4}
$$
$$
\int|K(u)|\,du<\infty, \tag{3.5}
$$
and
$$
|u|^{1+\varepsilon}|K(u)|\to0\ \text{ as } |u|\to\infty, \tag{3.6}
$$
with some $\varepsilon\ge0$. On the number sequence $\{h_n\}$, we impose
$$
h_n\to0\ \text{ as } n\to\infty, \tag{3.7}
$$
$$
nh_n\to\infty\ \text{ as } n\to\infty. \tag{3.8}
$$

THEOREM 3.1 Let $U$ have a probability density $f(\bullet)$ and let $Em^2(U)<\infty$. Let the Borel measurable kernel $K(\bullet)$ satisfy (3.4), (3.5), and (3.6) with $\varepsilon=0$. Let the sequence $\{h_n\}$ of positive numbers satisfy (3.7) and (3.8). Then,
$$
\hat\mu(u)\to\mu(u)\ \text{ as } n\to\infty \text{ in probability} \tag{3.9}
$$
at every $u\in R$ where both $m(\bullet)$ and $f(\bullet)$ are continuous and $f(u)>0$.

The next theorem is the "almost everywhere" version of Theorem 3.1. The restrictions imposed on the kernel and the number sequence are the same as in Theorem 3.1, with the only exception that (3.6) holds with some $\varepsilon>0$, but not with $\varepsilon=0$.

THEOREM 3.2 Let $U$ have a probability density $f(\bullet)$ and let $Em^2(U)<\infty$. Let the Borel measurable kernel $K(\bullet)$ satisfy (3.4), (3.5), and (3.6) with some $\varepsilon>0$. Let the sequence $\{h_n\}$ of positive numbers satisfy (3.7) and (3.8). Then, convergence (3.9) takes place at every Lebesgue point $u\in R$ of both $m(\bullet)$ and $f(\bullet)$, where $f(u)>0$, and, a fortiori, at almost every $u$ where $f(u)>0$, that is, at almost every $u$ belonging to the support of $f(\bullet)$.

Proof. The proof is very much like that of Theorem 3.1. The difference is that we apply Lemma A.9 rather than Lemma A.8.
The algorithm also converges when the input signal does not have a density, that is, when the distribution of $U$ has an arbitrary shape. The proof of the following theorem is in Section 3.7.1.
THEOREM 3.3 Let $Em^2(U)<\infty$. Let $H(\bullet)$ be a nonnegative nonincreasing Borel function defined on $[0,\infty)$, continuous and positive at $t=0$, and such that
$$
tH(t)\to0\ \text{ as } t\to\infty.
$$
Let, for some $c_1$ and $c_2$,
$$
c_1H(|u|)\le K(u)\le c_2H(|u|).
$$
Let the sequence $\{h_n\}$ of positive numbers satisfy (3.7) and (3.8). Then convergence (3.9) takes place at almost every ($\zeta$) $u\in R$, where $\zeta$ is the probability measure of $U$.
Restrictions (3.7) and (3.8) are satisfied by a wide class of number sequences. If $h_n=cn^{-\delta}$ with $c>0$, they are satisfied for $0<\delta<1$.

The theorems differ in the sets of points at which the estimate converges. One establishes convergence at every continuity point of both $m(\bullet)$ and $f(\bullet)$ where $f(u)>0$. The other does it for every Lebesgue point of both $m(\bullet)$ and $f(\bullet)$, that is, for almost every (with respect to the Lebesgue measure) $u$ where $f(u)>0$, that is, at almost every ($\zeta$) point. In Theorem 3.1 and Theorem 3.3, the kernel satisfies restrictions (3.4), (3.5), and (3.6) with $\varepsilon=0$; in Theorem 3.2, (3.6) holds with some $\varepsilon>0$. If both $m(\bullet)$ and $f(\bullet)$ are bounded and continuous, we can apply kernels satisfying only (3.4) and (3.5), see Remark 3.1. In Theorem 3.3, $U$ has an arbitrary distribution, which means that it may not have a density.

In light of this, to achieve convergence at Lebesgue points and, a fortiori, at continuity points, we can apply the following kernel functions:
the rectangular kernel (3.2), the triangle kernel
$$
K(u)=\begin{cases}1-|u|, & \text{for } |u|\le1\\ 0, & \text{otherwise,}\end{cases}
$$
the Gauss–Weierstrass kernel (see Figure 3.2)
$$
K(u)=\frac{1}{\sqrt{2\pi}}e^{-u^2/2}, \tag{3.10}
$$
Figure 3.2 Gauss–Weierstrass kernel (3.10).
the Poisson kernel
$$
K(u)=\frac{1}{\pi}\,\frac{1}{1+u^2},
$$
the Fejér kernel (see Figure 3.3)
$$
K(u)=\frac{1}{\pi}\,\frac{\sin^2u}{u^2}, \tag{3.11}
$$
and the Lebesgue kernel
$$
K(u)=\frac{1}{2}e^{-|u|}.
$$
All these kernels satisfy (3.4), (3.5), and (3.6) for some $\varepsilon>0$. The kernel
$$
K(u)=\begin{cases}\dfrac{1}{4e}, & \text{for } |u|\le e\\[2pt] \dfrac{1}{4|u|\ln^2|u|}, & \text{otherwise,}\end{cases} \tag{3.12}
$$
satisfies (3.4), (3.5), and (3.6) with $\varepsilon=0$ only. In turn, the kernels
$$
K(u)=\frac{1}{\pi}\,\frac{\sin u}{u} \tag{3.13}
$$
(see Figure 3.4) and
$$
K(u)=\sqrt{\frac{2}{\pi}}\cos u^2 \tag{3.14}
$$
Figure 3.3 Fejér kernel (3.11).
Figure 3.4 Kernel (3.13).
(see Figure 3.5), satisfy (3.4) and (3.5), but not (3.6), even with $\varepsilon=0$. For all the presented kernels, $\int K(u)\,du=1$. Observe that they can be continuous or not, and can have compact or unbounded support.
Notice that Theorem 3.3 admits the following kernel:
$$
K(u)=\begin{cases}\dfrac{1}{e}, & \text{for } |u|\le e\\[2pt] \dfrac{1}{|u|\ln|u|}, & \text{otherwise,}\end{cases}
$$
for which $\int K(u)\,du=\infty$. The restrictions imposed by the theorem are illustrated in Figure 3.6.
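The normalization $\int K(u)\,du=1$ stated above can be checked numerically. A minimal sketch (the grid and its width are our choices; the Poisson and Fejér kernels have $1/u^2$ tails, so a wide grid is needed):

```python
import numpy as np

# Kernels of Section 3.3; each integrates to 1 over the real line.
kernels = {
    "rectangular (3.2)": lambda u: 0.5 * (np.abs(u) <= 1),
    "triangle": lambda u: np.maximum(1 - np.abs(u), 0),
    "Gauss-Weierstrass (3.10)": lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi),
    "Poisson": lambda u: 1.0 / (np.pi * (1 + u**2)),
    "Fejer (3.11)": lambda u: np.sinc(u / np.pi) ** 2 / np.pi,  # sin^2(u)/(pi u^2)
    "Lebesgue": lambda u: 0.5 * np.exp(-np.abs(u)),
}

u = np.linspace(-500, 500, 2_000_001)   # wide symmetric grid
du = u[1] - u[0]
integrals = {name: float(np.sum(K(u)) * du) for name, K in kernels.items()}
for name, val in integrals.items():
    print(f"{name:26s} ~ {val:.3f}")
```

The Fejér kernel is written via `np.sinc` to avoid the removable singularity at $u=0$.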
3.4 Convergence rate
In this section, both the characteristic $m(\bullet)$ and the input density $f(\bullet)$ are smooth functions having $q$ derivatives. Proper selection of the kernel and the number sequence increases the speed at which the estimate converges. We now find the convergence rate.

In our analysis, the kernel satisfies the following additional restrictions:
$$
\int v^iK(v)\,dv=0,\quad\text{for } i=1,2,\ldots,q-1, \tag{3.15}
$$
and
$$
\int|v^{q-1/2}K(v)|\,dv<\infty, \tag{3.16}
$$
Figure 3.5 Kernel (3.14).
Figure 3.6 A kernel satisfying the restrictions of Theorem 3.3.
see the analysis in Section A.2.2. For simplicity of notation, $\int K(v)\,dv=1$. For a fixed $u$, we get
$$
E\hat f(u)=\frac{1}{h_n}\int f(v)K\left(\frac{u-v}{h_n}\right)dv=\int f(u+vh_n)K(-v)\,dv,
$$
which yields
$$
\operatorname{bias}[\hat f(u)]=E\hat f(u)-f(u)=\int(f(u+vh_n)-f(u))K(-v)\,dv.
$$
Assuming that $f^{(q)}(\bullet)$ is square integrable and applying (A.17), we find $\operatorname{bias}[\hat f(u)]=O(h_n^{q-1/2})$. We next recall (3.27) and write $\operatorname{var}[\hat f(u)]=O(1/nh_n)$, which leads to
$$
E(\hat f(u)-f(u))^2=O(h_n^{2q-1})+O\left(\frac{1}{nh_n}\right).
$$
Thus, selecting
$$
h_n\sim n^{-1/2q}, \tag{3.17}
$$
we finally obtain
$$
E(\hat f(u)-f(u))^2=O(n^{-1+1/2q}).
$$
Needless to say, if the $q$th derivative of $g(u)$ is square integrable, then, for the same reasons, $E(\hat g(u)-g(u))^2$ is of the same order. Hence, applying Lemma C.9, we finally obtain the following convergence rate:
$$
P\{|\hat\mu(u)-\mu(u)|>\varepsilon|\mu(u)|\}=O(n^{-1+1/2q})
$$
for any $\varepsilon>0$, and
$$
|\hat\mu(u)-\mu(u)|=O(n^{-1/2+1/4q})\ \text{ as } n\to\infty \text{ in probability.}
$$
If $f^{(q)}(u)$ is bounded, $\operatorname{bias}[\hat f(u)]=O(h_n^q)$, see (A.18); and, for
$$
h_n\sim n^{-1/(2q+1)},
$$
$$
E(\hat f(u)-f(u))^2=O(n^{-1+1/(2q+1)}).
$$
Figure 3.7 Kernel $G_4$.
If, in addition, the $q$th derivative of $g(u)$ is bounded, $E(\hat g(u)-g(u))^2$ is of the same order and, as a consequence,
$$
P\{|\hat\mu(u)-\mu(u)|>\varepsilon|\mu(u)|\}=O(n^{-1+1/(2q+1)})
$$
for any $\varepsilon>0$, and
$$
|\hat\mu(u)-\mu(u)|=O(n^{-1/2+1/(4q+2)})\ \text{ as } n\to\infty \text{ in probability,}
$$
which means that the rate is slightly better.

The rate $O(n^{-q/(2q+1)})$ in probability obtained above is known to be optimal within the class of $q$-times differentiable input densities and nonlinear characteristics, see [285].
It is not difficult to construct kernels satisfying (3.15) such that $\int K(v)\,dv=1$. For example, starting from the Gauss–Weierstrass kernel (3.10), denoted now as $G(\bullet)$, we observe that $\int u^iG(u)\,du=0$ for odd $i$, and $\int u^iG(u)\,du=1\times3\times\cdots\times(i-1)$ for even $i$. Thus, for
$$
G_2(u)=G(u)=\frac{1}{\sqrt{2\pi}}e^{-u^2/2},
$$
(3.15) is satisfied for $q=2$. For the same reasons, for
$$
G_4(u)=\frac{1}{2}(3-u^2)G(u)=\frac{1}{2\sqrt{2\pi}}(3-u^2)e^{-u^2/2} \tag{3.18}
$$
(see Figure 3.7), and
$$
G_6(u)=\frac{1}{8}(15-10u^2+u^4)G(u)=\frac{1}{8\sqrt{2\pi}}(15-10u^2+u^4)e^{-u^2/2},
$$
(3.15) holds for $q=4$ and $q=6$, respectively.

In turn, for the rectangular kernel (3.2), denoted now as $W(\bullet)$, $\int u^iW(u)\,du$ equals zero for odd $i$ and $1/(i+1)$ for even $i$. Thus, for $W_2(u)=W(u)$, (3.15) holds with $q=2$, while for
$$
W_4(u)=\frac{1}{4}(9-15u^2)W(u)=\begin{cases}\dfrac{1}{8}(9-15u^2), & \text{for } |u|\le1\\ 0, & \text{otherwise,}\end{cases} \tag{3.19}
$$
Figure 3.8 Kernel $W_4$.
with $q=4$. For $q=6$, we find
$$
W_6(u)=\frac{5}{64}(45-210u^2+189u^4)W(u) \tag{3.20}
$$
$$
=\begin{cases}\dfrac{5}{128}(45-210u^2+189u^4), & \text{for } |u|\le1\\ 0, & \text{otherwise.}\end{cases}
$$
Kernels $W_4(u)$ and $W_6(u)$ are shown in Figures 3.8 and 3.9, respectively.
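The moment conditions (3.15) for $G_4$ and $W_4$ are easy to confirm numerically; a sketch with a hypothetical helper `moments` (names and integration grids are ours):

```python
import numpy as np

def moments(K, a, b, q, n=2_000_001):
    """Rectangle-rule values of int u^i K(u) du on [a, b] for i = 0..q-1;
    by (3.15) a kernel of order q should give ~[1, 0, ..., 0]."""
    u = np.linspace(a, b, n)
    du = u[1] - u[0]
    return [float(np.sum(u**i * K(u)) * du) for i in range(q)]

G = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)   # Gauss-Weierstrass (3.10)
G4 = lambda u: 0.5 * (3 - u**2) * G(u)                 # kernel (3.18), order 4
W = lambda u: 0.5 * (np.abs(u) <= 1)                   # rectangular kernel (3.2)
W4 = lambda u: 0.25 * (9 - 15 * u**2) * W(u)           # kernel (3.19), order 4

print(moments(G4, -12, 12, 4))
print(moments(W4, -1, 1, 4))
```

Both printed lists are, up to discretization error, $[1,0,0,0]$: the zeroth moment is one and the first three higher moments vanish.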
There is a formal way of generating kernel functions satisfying conditions (3.15) and (3.16) for an arbitrary value of $q$. This technique relies on the theory of orthogonal polynomials, which is examined in Chapter 6. In particular, if one wishes to obtain kernels defined on a compact interval, then we can use the class of Legendre orthogonal polynomials, see Section 6.3 for various properties of this class. Hence, let $\{p_\ell(u);\ 0\le\ell<\infty\}$ be the set of orthonormal Legendre polynomials defined on $[-1,1]$, that is, $\int_{-1}^{1}p_\ell(u)p_j(u)\,du=\delta_{\ell j}$, $\delta_{\ell j}$ being the Kronecker delta, and $p_\ell(u)=\sqrt{\frac{2\ell+1}{2}}P_\ell(u)$, where $P_\ell(u)$ is the $\ell$th-order Legendre polynomial.

The following lemma describes the procedure for generating a kernel function of order $q$ with support on $[-1,1]$.

LEMMA 3.1 The kernel function
$$
K(u)=\sum_{j=0}^{q-1}p_j(0)p_j(u),\quad|u|\le1, \tag{3.21}
$$
satisfies condition (3.15).
Figure 3.9 Kernel $W_6$.
Proof. For $i\le q-1$, consider $\int_{-1}^{1}u^iK(u)\,du$. Since $u^i$ can be expanded into the Legendre series, that is, $u^i=\sum_{\ell=0}^{i}a_\ell p_\ell(u)$, where $a_\ell=\int_{-1}^{1}u^ip_\ell(u)\,du$, then for $K(u)$ defined in (3.21), we have
$$
\int_{-1}^{1}u^iK(u)\,du=\sum_{\ell=0}^{i}\sum_{j=0}^{q-1}a_\ell p_j(0)\int_{-1}^{1}p_\ell(u)p_j(u)\,du
=\sum_{\ell=0}^{i}a_\ell p_\ell(0)=\begin{cases}1 & \text{if } i=0\\ 0 & \text{if } i=1,2,\ldots,q-1.\end{cases}
$$
The proof of Lemma 3.1 has been completed.
It is worth noting that $P_\ell(0)=0$ for $\ell=1,3,5,\ldots$ and $P_\ell(-u)=P_\ell(u)$ for $\ell=0,2,4,\ldots$. Consequently, the kernel in (3.21) is symmetric, and all terms in (3.21) with odd values of $j$ are equal to zero.

Since $p_0(u)=\frac{1}{\sqrt{2}}$ and $p_2(u)=\sqrt{\frac{5}{2}}\left(\frac{3}{2}u^2-\frac{1}{2}\right)$, it is easy to verify that the kernel in (3.21) with $q=4$ is given by
$$
K(u)=\frac{9}{8}-\frac{15}{8}u^2,\quad|u|\le1.
$$
This confirms the form of the kernel $W_4(v)$ given in (3.19).
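Lemma 3.1 can be exercised with NumPy's Legendre utilities; the sketch below (function name ours) builds kernel (3.21) and, for $q=4$, reproduces $K(u)=\frac{9}{8}-\frac{15}{8}u^2$.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_kernel(q):
    """Kernel (3.21): K(u) = sum_{j=0}^{q-1} p_j(0) p_j(u) on [-1, 1],
    with p_j(u) = sqrt((2j+1)/2) P_j(u) the orthonormal Legendre polynomials."""
    def K(u):
        u = np.asarray(u, dtype=float)
        total = np.zeros_like(u)
        for j in range(q):
            c = np.zeros(j + 1)
            c[j] = 1.0                      # coefficient vector selecting P_j
            norm2 = (2 * j + 1) / 2.0       # squared orthonormalization factor
            total += norm2 * legendre.legval(0.0, c) * legendre.legval(u, c)
        return total
    return K

K4 = legendre_kernel(4)
# K4(u) equals 9/8 - (15/8) u^2 on [-1, 1], the kernel W_4 of (3.19)
```

Terms with odd $j$ contribute nothing, since $P_j(0)=0$ for odd $j$, in line with the remark above.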
The result of Lemma 3.1 can be extended to a larger class of orthogonal polynomials defined on a set $S$, that is, when we have a system of functions $\{p_\ell(u);\ 0\le\ell<\infty\}$ defined on $S$ which satisfies
$$
\int_S p_\ell(u)p_j(u)w(u)\,du=\delta_{\ell j},
$$
where $w(u)$ is a weight function, positive on $S$ and such that $w(0)=1$. Then formula (3.21) takes the following modified form:
$$
K(u)=\sum_{j=0}^{q-1}p_j(0)p_j(u)w(u). \tag{3.22}
$$
In particular, this applies to $w(u)=e^{-u^2}$, $-\infty<u<\infty$, with the corresponding Hermite polynomials.
Figure 3.10 Realizations of the estimate for $n=40,80,320,1280$; $a=0.5$, $h_n=n^{-2/5}$ (example in Section 3.6).
Figure 3.11 MISE versus $n$, various $a$; $h_n=n^{-2/5}$ (example in Section 3.6).
Figure 3.12 MISE versus $n$, various $\operatorname{var}(Z)$; $h_n=n^{-2/5}$ (example in Section 3.6).
Figure 3.13 MISE versus $h_n$, various $n$; $a=0.0$ (example in Section 3.6).
Figure 3.14 MISE versus $h_n$, various $n$; $a=0.25$ (example in Section 3.6).
Figure 3.15 MISE versus $h_n$, various $n$; $a=0.5$ (example in Section 3.6).
Figure 3.16 MISE versus $h_n$, various $n$; $a=0.75$ (example in Section 3.6).
3.7 Lemmas and proofs
3.7.1 Lemmas
In Lemma 3.2, $U$ has a density; in Lemma 3.3, the distribution of $U$ is arbitrary.
LEMMA 3.2 Let $U$ have a probability density. Let $Em(U)=0$ and $\operatorname{var}[m(U)]<\infty$. Let the kernel $K(\bullet)$ satisfy (3.4) and (3.5). If (3.6) holds with $\varepsilon=0$, then, for $i\ne0$,
$$
\sup_{h>0}\left|\operatorname{cov}\left[W_{p+i}\frac{1}{h}K\left(\frac{u-U_i}{h}\right),\,W_p\frac{1}{h}K\left(\frac{u-U_0}{h}\right)\right]\right|
\le(|\lambda_p\lambda_{p+i}|+|\lambda_p\lambda_{p-i}|+|\lambda_{p+i}\lambda_{p-i}|)\,\omega(u),
$$
where $\omega(u)$ is finite at every continuity point $u$ of both $m(\bullet)$ and $f(\bullet)$. If $\varepsilon>0$, the property holds at almost every $u\in R$.

Proof. We prove the continuous version of the lemma. The "almost everywhere" version can be verified in a similar way.
Figure 3.17 MISE versus $\delta$, $h_n=n^{-\delta}$, various $n$; $a=0.5$ (example in Section 3.6).
Since $W_{p+i}=\sum_{q=-\infty}^{p+i}\lambda_{p+i-q}m(U_q)$ and $W_p=\sum_{r=-\infty}^{p}\lambda_{p-r}m(U_r)$, the covariance in the assertion equals
$$
\sum_{q=-\infty}^{p+i}\sum_{r=-\infty}^{p}\lambda_{p+i-q}\lambda_{p-r}\operatorname{cov}\left[m(U_q)\frac{1}{h}K\left(\frac{u-U_i}{h}\right),\,m(U_r)\frac{1}{h}K\left(\frac{u-U_0}{h}\right)\right].
$$
Applying Lemma C.2, we find that the above formula is equal to
$$
(\lambda_p\lambda_{p+i}+\lambda_p\lambda_{p-i})\frac{1}{h}E\left\{K\left(\frac{u-U}{h}\right)\right\}\frac{1}{h}E\left\{m^2(U)K\left(\frac{u-U}{h}\right)\right\}
+\lambda_{p+i}\lambda_{p-i}\frac{1}{h^2}E^2\left\{m(U)K\left(\frac{u-U}{h}\right)\right\}.
$$
Let $u$ be a point where both $m(\bullet)$ and $f(\bullet)$ are continuous. It suffices to apply Lemmas A.8 and A.9 to find that the quantities
$$
\sup_{h>0}\frac{1}{h}E\left\{K\left(\frac{u-U}{h}\right)\right\},\quad
\sup_{h>0}E\left\{m(U)\frac{1}{h}K\left(\frac{u-U}{h}\right)\right\},\quad
\sup_{h>0}E\left\{m^2(U)\frac{1}{h}K\left(\frac{u-U}{h}\right)\right\}
$$
are finite.
In the next lemma, $U$ has an arbitrary distribution.

LEMMA 3.3 Let $Em(U)=0$ and $\operatorname{var}[m(U)]<\infty$. If the kernel satisfies the restrictions of Theorem 3.3, then
$$
\limsup_{h\to0}\frac{\left|\operatorname{cov}\left[W_{p+i}K\left(\frac{u-U_i}{h}\right),\,W_pK\left(\frac{u-U_0}{h}\right)\right]\right|}{E^2\left\{K\left(\frac{u-U}{h}\right)\right\}}
\le(|\lambda_p\lambda_{p+i}|+|\lambda_p\lambda_{p-i}|+|\lambda_{p+i}\lambda_{p-i}|)\,\theta(u),
$$
where $\theta(u)$ is finite at almost every ($\zeta$) $u\in R$, $\zeta$ being the distribution of $U$.

Proof. The proof is similar to that of Lemma 3.2. Lemma A.10, rather than Lemmas A.8 and A.9, should be employed.
3.7.2 Proofs
Proof of Theorem 3.1. For the sake of the proof, $Em(U)=0$, see Remark 2.1. Observe that $\hat\mu(u)=\hat g(u)/\hat f(u)$ with
$$
\hat g(u)=\frac{1}{nh_n}\sum_{i=1}^{n}Y_{p+i}K\left(\frac{u-U_i}{h_n}\right) \tag{3.23}
$$
and
$$
\hat f(u)=\frac{1}{nh_n}\sum_{i=1}^{n}K\left(\frac{u-U_i}{h_n}\right). \tag{3.24}
$$
Fix $u\in R$ and suppose that both $m(\bullet)$ and $f(\bullet)$ are continuous at the point.

We will now show that
$$
\hat g(u)\to g(u)\int K(v)\,dv\ \text{ as } n\to\infty \text{ in probability,} \tag{3.25}
$$
where, we recall, $g(u)=\mu(u)f(u)$. Since
$$
E\hat g(u)=\frac{1}{h_n}E\left\{E\{Y_p\,|\,U_0\}K\left(\frac{u-U_0}{h_n}\right)\right\}=\frac{1}{h_n}E\left\{\mu(U)K\left(\frac{u-U}{h_n}\right)\right\},
$$
applying Lemma A.8, we conclude that
$$
E\hat g(u)\to g(u)\int K(v)\,dv\ \text{ as } n\to\infty.
$$
In turn, since $Y_n=W_n+Z_n$,
$$
\operatorname{var}[\hat g(u)]=P_n(u)+Q_n(u)+R_n(u),
$$
where
$$
P_n(u)=\frac{1}{nh_n}\,\sigma_Z^2\,\frac{1}{h_n}E\left\{K^2\left(\frac{u-U}{h_n}\right)\right\},
$$
$$
Q_n(u)=\frac{1}{nh_n}\,\frac{1}{h_n}\operatorname{var}\left[W_pK\left(\frac{u-U_0}{h_n}\right)\right],
$$
and
$$
R_n(u)=\frac{1}{n^2h_n^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}\operatorname{cov}\left[W_{p+i}K\left(\frac{u-U_i}{h_n}\right),\,W_{p+j}K\left(\frac{u-U_j}{h_n}\right)\right]
=\frac{2}{n^2h_n^2}\sum_{i=1}^{n}(n-i)\operatorname{cov}\left[W_{p+i}K\left(\frac{u-U_i}{h_n}\right),\,W_pK\left(\frac{u-U_0}{h_n}\right)\right].
$$
In view of Lemma A.8,
$$
nh_nP_n(u)\to\sigma_Z^2f(u)\int K^2(v)\,dv\ \text{ as } n\to\infty.
$$
Since
$$
\operatorname{var}\left[W_pK\left(\frac{u-U_0}{h_n}\right)\right]
=E\left\{W_p^2K^2\left(\frac{u-U_0}{h_n}\right)\right\}-E^2\left\{W_pK\left(\frac{u-U_0}{h_n}\right)\right\}
=E\left\{\phi(U)K^2\left(\frac{u-U}{h_n}\right)\right\}-E^2\left\{\mu(U)K\left(\frac{u-U}{h_n}\right)\right\}, \tag{3.26}
$$
where $\phi(\bullet)$ is as in (2.7), by Lemma A.8,
$$
nh_nQ_n(u)\to\phi(u)f(u)\int K^2(v)\,dv\ \text{ as } n\to\infty.
$$
Passing to $R_n(u)$, we apply Lemma 3.2 to obtain
$$
|R_n(u)|\le2\omega(u)\frac{1}{n^2}\sum_{i=1}^{n}(n-i)(|\lambda_p\lambda_{p+i}|+|\lambda_p\lambda_{p-i}|+|\lambda_{p+i}\lambda_{p-i}|)
\le6\omega(u)\left(\max_n|\lambda_n|\right)\frac{1}{n}\sum_{i=1}^{\infty}|\lambda_i|=O\left(\frac{1}{n}\right).
$$
Finally,
$$
nh_n\operatorname{var}[\hat g(u)]\to(\sigma_Z^2+\phi(u))f(u)\int K^2(v)\,dv\ \text{ as } n\to\infty.
$$
In this way, we have verified (3.25).

Using similar arguments, we show that $E\hat f(u)\to f(u)\int K(v)\,dv$ as $n\to\infty$ and
$$
nh_n\operatorname{var}[\hat f(u)]\to f(u)\int K^2(v)\,dv\ \text{ as } n\to\infty, \tag{3.27}
$$
and then we conclude that $\hat f(u)\to f(u)\int K(v)\,dv$ as $n\to\infty$ in probability. The proof has been completed.
Proof of Theorem 3.3. In general, the idea of the proof is similar to that of Theorem 3.1. Some modifications, however, are necessary.

Recalling Remark 2.1, with no loss of generality, we assume that $Em(U)=0$ and begin with the observation that $\hat\mu(u)=\hat\xi(u)/\hat\eta(u)$, where
$$
\hat\xi(u)=\frac{1}{nE\left\{K\left(\frac{u-U}{h_n}\right)\right\}}\sum_{i=1}^{n}Y_{p+i}K\left(\frac{u-U_i}{h_n}\right)
$$
and
$$
\hat\eta(u)=\frac{1}{nE\left\{K\left(\frac{u-U}{h_n}\right)\right\}}\sum_{i=1}^{n}K\left(\frac{u-U_i}{h_n}\right).
$$
Obviously,
$$
E\hat\xi(u)=\frac{E\left\{Y_pK\left(\frac{u-U_0}{h_n}\right)\right\}}{E\left\{K\left(\frac{u-U}{h_n}\right)\right\}}
=\frac{E\left\{\mu(U)K\left(\frac{u-U}{h_n}\right)\right\}}{E\left\{K\left(\frac{u-U}{h_n}\right)\right\}},
$$
which, by Lemma A.10, converges to $\mu(u)$ as $n\to\infty$ for almost every ($\zeta$) $u\in R$.
3.8 Bibliographic notes
The kernel regression estimate was proposed independently by Nadaraya [215] and Watson [312] and was the subject of studies performed by Rosenblatt [257], Collomb [55], Greblicki [105], Greblicki and Krzyżak [121], Chu and Marron [51], Fan [87], Müller and Song [212], Jones, Davies, and Park [172], and many others. A comprehensive overview of various kernel methods is presented in Wand and Jones [310]. At first, the density of $U$ was assumed to exist. Since Stone [284], consistency for any distribution has been examined. Later, distribution-free properties were studied by Spiegelman and Sacks [282], Devroye and Wagner [73,74], Devroye [71], Krzyżak and Pawlak [187,188], Greblicki, Krzyżak, and Pawlak [122], and Kozek and Pawlak [179], among others. In particular, the monograph by Györfi, Kohler, Krzyżak, and Walk [140] examines the problem of a distribution-free theory of nonparametric regression.

The kernel regression estimate has been derived in a natural way from the kernel estimate (3.24) of a probability density function introduced by Parzen [226], generalized to multivariate cases by Cacoullos [37], and examined by a number of authors, see, for example, Rosenblatt [256], Van Ryzin [297,298], Deheuvels [65], Wahba [306], Devroye and Wagner [72], Devroye and Györfi [68], and Csörgo and Mielniczuk [58]. See also Härdle [150], Prakasa Rao [241], or Silverman [277] and the papers cited therein. In all the mentioned works, however, the kernel estimate is of form (3.3) with $p=0$, while independent observations $(U_i,Y_i)$ come from a model $Y_n=m(U_n)+Z_n$. In the context of the Hammerstein system, it means that the dynamics is just missing, because the linear subsystem is reduced to a simple delay.

The nonparametric kernel regression estimate was applied to recover the nonlinear characteristic in a Hammerstein system by Greblicki and Pawlak [126]. In Greblicki and Pawlak [129], the input signal has an arbitrary distribution. Not a state equation, but a convolution describing the dynamic subsystem, was applied in Greblicki and Pawlak [127]. The kernel estimate has also been discussed in Krzyżak [182,183], as well as in Krzyżak and Partyka [185]. For very specific distributions of the input signal, the nonparametric kernel regression estimate was studied by Lang [193].
4 Semirecursive kernel algorithms
This chapter is devoted to semirecursive kernel algorithms, modifications of those examined in Chapter 3. Their numerators and denominators can be calculated online. We show consistency and examine the convergence rate. Results are established for all input densities and all input distributions.
4.1 Introduction
We examine the following semirecursive kernel estimates:
$$
\tilde\mu_n(u)=\frac{\sum_{i=1}^{n}\dfrac{1}{h_i}Y_{p+i}K\left(\dfrac{u-U_i}{h_i}\right)}{\sum_{i=1}^{n}\dfrac{1}{h_i}K\left(\dfrac{u-U_i}{h_i}\right)} \tag{4.1}
$$
and
$$
\bar\mu_n(u)=\frac{\sum_{i=1}^{n}Y_{p+i}K\left(\dfrac{u-U_i}{h_i}\right)}{\sum_{i=1}^{n}K\left(\dfrac{u-U_i}{h_i}\right)}, \tag{4.2}
$$
modifications of (3.3). To demonstrate the recursiveness, we observe that $\tilde\mu_n(u)=\tilde g_n(u)/\tilde f_n(u)$, where
$$
\tilde g_n(u)=\frac{1}{n}\sum_{i=1}^{n}Y_{p+i}\frac{1}{h_i}K\left(\frac{u-U_i}{h_i}\right)
$$
and
$$
\tilde f_n(u)=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_i}K\left(\frac{u-U_i}{h_i}\right).
$$
Therefore,
$$
\tilde g_n(u)=\tilde g_{n-1}(u)-\frac{1}{n}\left[\tilde g_{n-1}(u)-Y_{p+n}\frac{1}{h_n}K\left(\frac{u-U_n}{h_n}\right)\right]
$$
and
$$
\tilde f_n(u)=\tilde f_{n-1}(u)-\frac{1}{n}\left[\tilde f_{n-1}(u)-\frac{1}{h_n}K\left(\frac{u-U_n}{h_n}\right)\right].
$$
For the other estimate, $\bar\mu_n(u)=\bar g_n(u)/\bar f_n(u)$ with
$$
\bar g_n(u)=\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}Y_{p+i}K\left(\frac{u-U_i}{h_i}\right)
$$
and
$$
\bar f_n(u)=\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}K\left(\frac{u-U_i}{h_i}\right).
$$
Both $\bar g_n(u)$ and $\bar f_n(u)$ can be calculated with the following recurrence formulas:
$$
\bar g_n(u)=\bar g_{n-1}(u)-\frac{h_n}{\sum_{i=1}^{n}h_i}\left[\bar g_{n-1}(u)-\frac{1}{h_n}Y_{p+n}K\left(\frac{u-U_n}{h_n}\right)\right]
$$
and
$$
\bar f_n(u)=\bar f_{n-1}(u)-\frac{h_n}{\sum_{i=1}^{n}h_i}\left[\bar f_{n-1}(u)-\frac{1}{h_n}K\left(\frac{u-U_n}{h_n}\right)\right].
$$
In both estimates, the starting points
$$
\tilde g_1(u)=\bar g_1(u)=\frac{1}{h_1}Y_{p+1}K\left(\frac{u-U_1}{h_1}\right)
$$
and
$$
\tilde f_1(u)=\bar f_1(u)=\frac{1}{h_1}K\left(\frac{u-U_1}{h_1}\right)
$$
are the same.

Thus, both estimates are semirecursive, because their numerators and denominators can be calculated recursively, but not the estimates themselves.
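The recurrence formulas above translate directly into an online procedure. A minimal sketch for estimate (4.1), evaluated on a fixed grid of points $u$ (the class name, grid-based layout, and rectangular default kernel are our choices):

```python
import numpy as np

class SemirecursiveKernel:
    """Online form of estimate (4.1): the numerator g and denominator f
    are updated recursively via
      g_n = g_{n-1} - (1/n)[g_{n-1} - Y_{p+n} K((u-U_n)/h_n)/h_n],
    and similarly for f; the ratio itself is recomputed on demand."""

    def __init__(self, grid, kernel=lambda t: 0.5 * (np.abs(t) <= 1)):
        self.u = np.asarray(grid, dtype=float)
        self.K = kernel
        self.n = 0
        self.g = np.zeros_like(self.u)
        self.f = np.zeros_like(self.u)

    def update(self, U_n, Y_pn, h_n):
        """Consume one input/output pair (U_n, Y_{p+n}) with bandwidth h_n."""
        self.n += 1
        k = self.K((self.u - U_n) / h_n) / h_n
        self.g += (Y_pn * k - self.g) / self.n   # running mean of Y*K/h
        self.f += (k - self.f) / self.n          # running mean of K/h

    def estimate(self):
        out = np.zeros_like(self.g)
        np.divide(self.g, self.f, out=out, where=self.f != 0)
        return out
```

Each update costs $O(|\text{grid}|)$ operations, and no past observations need to be stored — this is precisely the online property claimed for the numerator and denominator.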
4.2 Consistency and convergence rate
In Theorems 4.1 and 4.2, the input signal has a density; in Theorem 4.3, its distribution is arbitrary.

THEOREM 4.1 Let $U$ have a density $f(\bullet)$ and let $Em^2(U)<\infty$. Let the Borel measurable kernel $K(\bullet)$ satisfy (3.4), (3.5), and (3.6) with $\varepsilon=0$. Let the sequence $\{h_n\}$ satisfy the following restrictions:
$$
h_n\to0\ \text{ as } n\to\infty, \tag{4.3}
$$
$$
\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}\to0\ \text{ as } n\to\infty. \tag{4.4}
$$
Then,
$$
\tilde\mu_n(u)\to\mu(u)\ \text{ as } n\to\infty \text{ in probability} \tag{4.5}
$$
at every $u\in R$ where both $m(\bullet)$ and $f(\bullet)$ are continuous and $f(u)>0$. If (3.6) holds for some $\varepsilon>0$, then the convergence takes place at every Lebesgue point $u\in R$ of both $m(\bullet)$ and $f(\bullet)$ such that $f(u)>0$; a fortiori, at almost every $u$ belonging to the support of $f(\bullet)$.

THEOREM 4.2 Let $U$ have a density $f(\bullet)$ and let $Em^2(U)<\infty$. Let the Borel measurable kernel $K(\bullet)$ satisfy (3.4), (3.5), and (3.6) with $\varepsilon=0$. Let the sequence $\{h_n\}$ satisfy (4.3) and
$$
\sum_{n=1}^{\infty}h_n=\infty. \tag{4.6}
$$
Then,
$$
\bar\mu_n(u)\to\mu(u)\ \text{ as } n\to\infty \text{ in probability} \tag{4.7}
$$
at every $u\in R$ where both $m(\bullet)$ and $f(\bullet)$ are continuous and $f(u)>0$. If (3.6) holds for some $\varepsilon>0$, then the convergence takes place at every Lebesgue point $u\in R$ of both $m(\bullet)$ and $f(\bullet)$ such that $f(u)>0$; a fortiori, at almost every $u$ belonging to the support of $f(\bullet)$.
Estimate (4.2) is consistent not only for $U$ having a density but also for any distribution. In the next theorem, the kernel is the same as in Theorem 3.3.

THEOREM 4.3 Let $Em^2(U)<\infty$. Let the kernel $K(\bullet)$ satisfy the restrictions of Theorem 3.3. Let the sequence $\{h_n\}$ of positive numbers satisfy (4.3) and (4.6). Then, convergence (4.7) takes place at almost every ($\zeta$) point $u\in R$, where $\zeta$ is the probability measure of $U$.

Estimate (4.1) converges if the number sequence satisfies (4.3) and (4.4), while (4.2) converges if (4.3) and (4.6) hold. Thus, for $h_n=cn^{-\delta}$ with $c>0$, both converge if $0<\delta<1$.
Passing to the convergence rate, we proceed as in Section 3.4 and find
$$
\operatorname{bias}[\tilde f_n(u)]=E\tilde f_n(u)-f(u)\int K(v)\,dv=\frac{1}{n}\sum_{i=1}^{n}\int(f(u+vh_i)-f(u))K(-v)\,dv.
$$
Applying (A.17), we obtain
$$
\operatorname{bias}[\tilde f_n(u)]=\frac{1}{n}\sum_{i=1}^{n}O\left(h_i^{q-1/2}\right)=O\left(\frac{1}{n}\sum_{i=1}^{n}h_i^{q-1/2}\right).
$$
Recalling (4.11), we find
$$
E(\tilde f_n(u)-f(u))^2=O\left(\frac{1}{n^2}\left(\sum_{i=1}^{n}h_i^{q-1/2}\right)^2\right)+O\left(\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}\right),
$$
with the first term incurred by the squared bias and the other by the variance. Hence, for
$$
h_n\sim n^{-1/2q}, \tag{4.8}
$$
that is, the same as in (3.17) applied in the offline estimate,
$$
E(\tilde f_n(u)-f(u))^2=O(n^{-1+1/2q}).
$$
Since the same rate holds for $\tilde g_n(u)$, that is, $E(\tilde g_n(u)-g(u))^2=O(n^{-1+1/2q})$, we finally obtain
$$
P\{|\tilde\mu_n(u)-\mu(u)|>\varepsilon|\mu(u)|\}=O(n^{-1+1/2q})
$$
for any $\varepsilon>0$, and
$$
|\tilde\mu_n(u)-\mu(u)|=O(n^{-1/2+1/4q})\ \text{ as } n\to\infty \text{ in probability.}
$$
Considering estimate (4.2) next, for obvious reasons, we write
$$
\operatorname{bias}[\bar f_n(u)]=\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}O\left(h_i^{q+1/2}\right)=O\left(\frac{\sum_{i=1}^{n}h_i^{q+1/2}}{\sum_{i=1}^{n}h_i}\right)
$$
and, due to (4.12),
$$
E(\bar f_n(u)-f(u))^2=O\left(\frac{\left(\sum_{i=1}^{n}h_i^{q+1/2}\right)^2}{\left(\sum_{i=1}^{n}h_i\right)^2}\right)+O\left(\frac{1}{\sum_{i=1}^{n}h_i}\right),
$$
which, for $h_n$ selected as in (4.8), becomes
$$
E(\bar f_n(u)-f(u))^2=O(n^{-1+1/2q}).
$$
Since $E(\bar g_n(u)-g(u))^2=O(n^{-1+1/2q})$, we come to the conclusion that
$$
P\{|\bar\mu_n(u)-\mu(u)|>\varepsilon|\mu(u)|\}=O(n^{-1+1/2q})
$$
for any $\varepsilon>0$, and
$$
|\bar\mu_n(u)-\mu(u)|=O(n^{-1/2+1/4q})\ \text{ as } n\to\infty \text{ in probability.}
$$
If the $q$th derivatives of both $f(u)$ and $g(u)$ are bounded, using (A.18), we obtain
$$
P\{|\tilde\mu_n(u)-\mu(u)|>\varepsilon|\mu(u)|\}=O(n^{-1+1/(2q+1)})
$$
for any $\varepsilon>0$, and
$$
|\tilde\mu_n(u)-\mu(u)|=O(n^{-1/2+1/(4q+2)})\ \text{ as } n\to\infty \text{ in probability,}
$$
that is, somewhat faster convergence. The same rate also holds for $\bar\mu_n(u)$.
4.3 Simulation example
In the system as in Section 2.2.2, $a=0.5$ and $Z_n=0$. Since $\mu(u)=m(u)$, we just estimate $m(u)$ and rewrite the estimates in the following forms:
$$
\tilde m_n(u)=\frac{\sum_{i=1}^{n}\dfrac{1}{h_i}Y_{1+i}K\left(\dfrac{u-U_i}{h_i}\right)}{\sum_{i=1}^{n}\dfrac{1}{h_i}K\left(\dfrac{u-U_i}{h_i}\right)} \tag{4.9}
$$
and
$$
\bar m_n(u)=\frac{\sum_{i=1}^{n}Y_{1+i}K\left(\dfrac{u-U_i}{h_i}\right)}{\sum_{i=1}^{n}K\left(\dfrac{u-U_i}{h_i}\right)}. \tag{4.10}
$$
For the rectangular kernel and $h_n=n^{-1/5}$, the MISE for both estimates is shown in Figure 5.5 in Section 5.4. For $h_n=n^{-\delta}$ with $\delta$ varying in the interval $[-0.25,1.5]$, the error is shown in Figures 4.1 and 4.2.
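An experiment of this kind can be imitated in a few lines. Since the system of Section 2.2.2 is not reproduced in this chunk, the sketch below substitutes a stand-in first-order Hammerstein system $W_t=m(U_t)+aW_{t-1}$ (so $\lambda_0=1$), with $a=0.5$, $Z_n=0$, and a hypothetical odd nonlinearity $m(u)=\tanh u$, for which $\mu(u)=m(u)$; it evaluates the MISE of estimate (4.10) with $p=0$ over a grid. All names and parameter values are our illustration, not the book's.

```python
import numpy as np

def simulate(n, a=0.5, seed=0):
    """Stand-in Hammerstein system: W_t = m(U_t) + a W_{t-1}, Y_t = W_t (Z_t = 0)."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(-3, 3, n)
    m = np.tanh(U)            # hypothetical odd nonlinearity, Em(U) = 0
    Y = np.empty(n)
    w = 0.0
    for t in range(n):
        w = m[t] + a * w
        Y[t] = w
    return U, Y

def mise_of_mbar(U, Y, delta, grid):
    """MISE of estimate (4.10) with the rectangular kernel and h_i = i^{-delta}."""
    h = np.arange(1, len(U) + 1.0) ** -delta
    K = 0.5 * (np.abs((grid[:, None] - U[None, :]) / h[None, :]) <= 1)
    num, den = K @ Y, K.sum(axis=1)
    est = np.zeros_like(num)
    np.divide(num, den, out=est, where=den != 0)
    return float(np.mean((est - np.tanh(grid)) ** 2))

U, Y = simulate(5000)
grid = np.linspace(-2, 2, 41)
print(mise_of_mbar(U, Y, 0.2, grid))
```

Sweeping `delta` over a range of values reproduces the qualitative shape of the MISE-versus-$\delta$ curves in Figures 4.1 and 4.2.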
Figure 4.1 Estimate (4.9); MISE versus $\delta$, various $n$; $h_n=n^{-\delta}$ (Section 4.3).
Figure 4.2 Estimate (4.10); MISE versus $\delta$, various $n$; $h_n=n^{-\delta}$ (Section 4.3).
4.4 Proofs and lemmas
4.4.1 Lemmas
The system

LEMMA 4.1 Let $U$ have a probability density $f(\bullet)$. Let $Em(U)=0$ and $\operatorname{var}[m(U)]<\infty$. Let $i\ne j$. Let the kernel satisfy (3.4) and (3.5). If (3.6) holds with $\varepsilon=0$, then
$$
\sup_{h>0,\,H>0}\left|\operatorname{cov}\left[W_{p+i}\frac{1}{h}K\left(\frac{u-U_i}{h}\right),\,W_{p+j}\frac{1}{H}K\left(\frac{u-U_j}{H}\right)\right]\right|
\le(|\lambda_p\lambda_{p+i-j}|+|\lambda_p\lambda_{p-i+j}|+|\lambda_{p+i-j}\lambda_{p-i+j}|)\,\rho(u),
$$
where $\rho(u)$ is finite at every continuity point $u$ of both $m(\bullet)$ and $f(\bullet)$. If $\varepsilon>0$, the property holds at almost every $u\in R$.
Proof. As $W_{p+i}=\sum_{q=-\infty}^{p+i}\lambda_{p+i-q}m(U_q)$ and $W_{p+j}=\sum_{r=-\infty}^{p+j}\lambda_{p+j-r}m(U_r)$, the covariance in the assertion equals (see Lemma C.2)
$$
\sum_{q=-\infty}^{p+i}\sum_{r=-\infty}^{p+j}\lambda_{p+i-q}\lambda_{p+j-r}\operatorname{cov}\left[m(U_q)\frac{1}{h}K\left(\frac{u-U_i}{h}\right),\,m(U_r)\frac{1}{H}K\left(\frac{u-U_j}{H}\right)\right],
$$
which is equal to
$$
\lambda_p\lambda_{p+i-j}\frac{1}{h}E\left\{K\left(\frac{u-U}{h}\right)\right\}\frac{1}{H}E\left\{m^2(U)K\left(\frac{u-U}{H}\right)\right\}
+\lambda_p\lambda_{p-i+j}\frac{1}{H}E\left\{K\left(\frac{u-U}{H}\right)\right\}\frac{1}{h}E\left\{m^2(U)K\left(\frac{u-U}{h}\right)\right\}
$$
$$
+\,\lambda_{p+i-j}\lambda_{p-i+j}\frac{1}{h}E\left\{m(U)K\left(\frac{u-U}{h}\right)\right\}\frac{1}{H}E\left\{m(U)K\left(\frac{u-U}{H}\right)\right\}.
$$
Let $u$ be a point where both $m(\bullet)$ and $f(\bullet)$ are continuous. It suffices to apply Lemma A.8 to find that the quantities
$$
\sup_{h>0}\frac{1}{h}E\left\{K\left(\frac{u-U}{h}\right)\right\},\quad
\sup_{h>0}E\left\{m(U)\frac{1}{h}K\left(\frac{u-U}{h}\right)\right\},\quad
\sup_{h>0}E\left\{m^2(U)\frac{1}{h}K\left(\frac{u-U}{h}\right)\right\}
$$
are finite at every continuity point of both $m(\bullet)$ and $f(\bullet)$. The "almost everywhere" version of the lemma can be verified in a similar way.
In the next lemma, $U$ has an arbitrary distribution.

LEMMA 4.2 Let $Em(U)=0$ and $\operatorname{var}[m(U)]<\infty$. If the kernel satisfies the restrictions of Theorem 3.3, then, for $i\ne j$,
$$
\sup_{h>0,\,H>0}\frac{\left|\operatorname{cov}\left[W_{p+i}K\left(\frac{u-U_i}{h}\right),\,W_{p+j}K\left(\frac{u-U_j}{H}\right)\right]\right|}
{E\left\{K\left(\frac{u-U}{h}\right)\right\}E\left\{K\left(\frac{u-U}{H}\right)\right\}}
\le(|\lambda_p\lambda_{p+i-j}|+|\lambda_p\lambda_{p-i+j}|+|\lambda_{p+i-j}\lambda_{p-i+j}|)\,\eta(u),
$$
where $\eta(u)$ is finite at almost every ($\zeta$) $u\in R$, $\zeta$ being the distribution of $U$.

Proof. It suffices to apply the arguments used in the proof of Lemma 3.3.
Number sequences

LEMMA 4.3 If (4.3) and (4.4) hold, then
$$
\lim_{n\to\infty}\frac{1}{n}\cdot\frac{1}{\dfrac{1}{n^2}\displaystyle\sum_{i=1}^{n}\frac{1}{h_i}}=0.
$$
Proof. From
$$
n^2=\left(\sum_{i=1}^{n}h_i^{1/2}\frac{1}{h_i^{1/2}}\right)^2\le\left(\sum_{i=1}^{n}h_i\right)\left(\sum_{i=1}^{n}\frac{1}{h_i}\right),
$$
it follows that
$$
\frac{1}{n}\cdot\frac{1}{\dfrac{1}{n^2}\displaystyle\sum_{i=1}^{n}\frac{1}{h_i}}\le\frac{1}{n}\sum_{i=1}^{n}h_i,
$$
which converges to zero as $n\to\infty$.
LEMMA 4.4 (TOEPLITZ) If $a_n\ge0$, $\sum_{i=1}^{n}a_i\to\infty$, and $x_n\to x$ as $n\to\infty$, then
$$
\frac{\sum_{i=1}^{n}a_ix_i}{\sum_{i=1}^{n}a_i}\to x\ \text{ as } n\to\infty.
$$
Proof. The proof is immediate. For any $\varepsilon>0$, there exists $N$ such that $|x_n-x|<\varepsilon$ for $n>N$. Hence,
$$
\frac{\sum_{i=1}^{n}a_ix_i}{\sum_{i=1}^{n}a_i}-x
=\frac{\sum_{i=1}^{N}a_i(x_i-x)}{\sum_{i=1}^{n}a_i}+\frac{\sum_{i=N+1}^{n}a_i(x_i-x)}{\sum_{i=1}^{n}a_i},
$$
where the first term is bounded in absolute value by $c/\sum_{i=1}^{n}a_i$ for some $c$, and the other by $\varepsilon$.
4.4.2 Proofs
Proof of Theorem 4.1. We give the continuous version of the proof. To verify the "almost everywhere" version, it suffices to apply Lemma A.9 rather than Lemma A.8.

Suppose that both $m(\bullet)$ and $f(\bullet)$ are continuous at $u\in R$. We start from the observation that
$$
E\tilde g_n(u)=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_i}E\left\{E\{Y_p\,|\,U_0\}K\left(\frac{u-U_0}{h_i}\right)\right\}
=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_i}E\left\{\mu(U)K\left(\frac{u-U}{h_i}\right)\right\}.
$$
Since
$$
\frac{1}{h_i}E\left\{\mu(U)K\left(\frac{u-U}{h_i}\right)\right\}\to g(u)\int K(v)\,dv\ \text{ as } i\to\infty
$$
(see Lemma A.8), we conclude that $E\tilde g_n(u)\to g(u)\int K(v)\,dv$ as $n\to\infty$, where, according to our notation, $g(u)=\mu(u)f(u)$.

To examine the variance, we write $\operatorname{var}[\tilde g_n(u)]=P_n(u)+Q_n(u)+R_n(u)$ with
$$
P_n(u)=\sigma_Z^2\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i^2}\operatorname{var}\left[K\left(\frac{u-U}{h_i}\right)\right],
$$
$$
Q_n(u)=\frac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}\left[W_p\frac{1}{h_i}K\left(\frac{u-U_0}{h_i}\right)\right],
$$
and

\[
\begin{aligned}
R_n(u)&=\frac{1}{n^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}\operatorname{cov}\left[W_{p+i}\frac{1}{h_i}K\left(\frac{u-U_i}{h_i}\right),\,W_{p+j}\frac{1}{h_j}K\left(\frac{u-U_j}{h_j}\right)\right]\\
&=\frac{1}{n^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}\operatorname{cov}\left[W_{p+i-j}\frac{1}{h_i}K\left(\frac{u-U_{i-j}}{h_i}\right),\,W_p\frac{1}{h_j}K\left(\frac{u-U_0}{h_j}\right)\right].
\end{aligned}
\]
Since

\[
P_n(u)=\sigma_Z^2\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}\left[\frac{1}{h_i}EK^2\left(\frac{u-U}{h_i}\right)-h_i\frac{1}{h_i^2}E^2K\left(\frac{u-U}{h_i}\right)\right],
\]

using Lemma A.8 we find that the quantity in square brackets converges to \(f(u)\int K^2(v)\,dv\) as i → ∞. Noticing that \(\sum_{n=1}^{\infty}h_n^{-1}=\infty\) and applying the Toeplitz Lemma 4.4, we conclude that

\[
\frac{1}{\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}}\,P_n(u)\to\sigma_Z^2 f(u)\int K^2(v)\,dv\quad\text{as } n\to\infty.
\]
For the same reasons, observing that

\[
Q_n(u)=\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}\left[\frac{1}{h_i}E\left\{\phi(U)K^2\left(\frac{u-U}{h_i}\right)\right\}-h_i\frac{1}{h_i^2}E^2\left\{W_pK\left(\frac{u-U_0}{h_i}\right)\right\}\right],
\]

where φ(•) is as in (2.7), we obtain

\[
\frac{1}{\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}}\,Q_n(u)\to\phi(u)f(u)\int K^2(v)\,dv\quad\text{as } n\to\infty.
\]
Moreover, using Lemma 4.1,

\[
|R_n(u)|\le\frac{1}{n^2}\,\rho(u)\sum_{i=1}^{n}\sum_{j=1}^{n}\big(|\lambda_p\lambda_{p+i-j}|+|\lambda_p\lambda_{p-i+j}|+|\lambda_{p+i-j}\lambda_{p-i+j}|\big)\le\frac{3}{n}\,\rho(u)\left(\max_n|\lambda_n|\right)\sum_{n=1}^{\infty}|\lambda_n|=O\left(\frac{1}{n}\right).
\]

Using Lemma 4.3, we conclude that R_n(u) vanishes faster than both P_n(u) and Q_n(u), and then we obtain

\[
\frac{1}{\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}}\operatorname{var}[\tilde g_n(u)]\to\big(\sigma_Z^2+\phi(u)\big)f(u)\int K^2(v)\,dv\quad\text{as } n\to\infty. \tag{4.11}
\]

For similar reasons, \(E\tilde f_n(u)\to f(u)\int K(v)\,dv\) as n → ∞ and

\[
\frac{1}{\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}}\operatorname{var}[\tilde f_n(u)]\to f(u)\int K^2(v)\,dv\quad\text{as } n\to\infty,
\]

which completes the proof.
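As a Monte Carlo sketch of the limits just established (for the density factor only, and with i.i.d. standard normal inputs rather than the dependent system signals covered by the theorem), the semirecursive estimate (1/n)Σ(1/h_i)K((u−U_i)/h_i) with assumed bandwidths h_i = i^{-1/5} settles near f(u)∫K(v)dv = f(u) for a Gaussian kernel:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50_000
u = 0.0
U = rng.standard_normal(n)            # i.i.d. stand-in for the input signal
h = np.arange(1, n + 1) ** (-0.2)     # assumed bandwidths h_i = i**(-1/5)

def K(v):
    # Gaussian kernel, with int K(v) dv = 1
    return np.exp(-0.5 * v**2) / np.sqrt(2 * np.pi)

f_tilde = np.mean(K((u - U) / h) / h)  # semirecursive density estimate at u
print(f_tilde)  # near f(0) = 1/sqrt(2*pi) ≈ 0.399
```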
Proof of Theorem 4.2
Suppose that both m(•) and f(•) are continuous at a point u ∈ R. Evidently,

\[
E\bar g_n(u)=\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\,\frac{1}{h_i}E\left[E\{Y_p\mid U_0\}K\left(\frac{u-U_0}{h_i}\right)\right]=\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\,\frac{1}{h_i}E\left[\mu(U)K\left(\frac{u-U}{h_i}\right)\right].
\]

Since (4.6) holds and

\[
\frac{1}{h_i}E\left[\mu(U)K\left(\frac{u-U}{h_i}\right)\right]\to g(u)\int K(v)\,dv\quad\text{as } i\to\infty
\]

(see Lemma A.8), an application of the Toeplitz Lemma 4.4 gives

\[
E\bar g_n(u)\to g(u)\int K(v)\,dv\quad\text{as } n\to\infty.
\]
To examine the variance, we write var[ḡ_n(u)] = P_n(u) + Q_n(u) + R_n(u), where

\[
P_n(u)=\sigma_Z^2\frac{1}{\left(\sum_{i=1}^{n}h_i\right)^2}\sum_{i=1}^{n}\operatorname{var}\left[K\left(\frac{u-U}{h_i}\right)\right],
\]

\[
Q_n(u)=\frac{1}{\left(\sum_{i=1}^{n}h_i\right)^2}\sum_{i=1}^{n}\operatorname{var}\left[W_pK\left(\frac{u-U_0}{h_i}\right)\right],
\]

and

\[
\begin{aligned}
R_n(u)&=\frac{1}{\left(\sum_{i=1}^{n}h_i\right)^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}\operatorname{cov}\left[W_{p+i}K\left(\frac{u-U_i}{h_i}\right),\,W_{p+j}K\left(\frac{u-U_j}{h_j}\right)\right]\\
&=\frac{1}{\left(\sum_{i=1}^{n}h_i\right)^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}\operatorname{cov}\left[W_{p+i-j}K\left(\frac{u-U_{i-j}}{h_i}\right),\,W_pK\left(\frac{u-U_0}{h_j}\right)\right].
\end{aligned}
\]
Since

\[
P_n(u)=\sigma_Z^2\frac{1}{\sum_{i=1}^{n}h_i}\,P_{1n}(u)
\]

with

\[
P_{1n}(u)=\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\left[\frac{1}{h_i}EK^2\left(\frac{u-U}{h_i}\right)-h_i\frac{1}{h_i^2}E^2K\left(\frac{u-U}{h_i}\right)\right]
\]

converging, due to (4.6) and the Toeplitz Lemma 4.4, to the same limit as

\[
\frac{1}{h_n}EK^2\left(\frac{u-U}{h_n}\right)-h_n\frac{1}{h_n^2}E^2K\left(\frac{u-U}{h_n}\right),
\]

we get

\[
\left(\sum_{i=1}^{n}h_i\right)P_n(u)\to\sigma_Z^2 f(u)\int K^2(v)\,dv\quad\text{as } n\to\infty.
\]
For the same reasons, observing that

\[
Q_n(u)=\frac{1}{\left(\sum_{i=1}^{n}h_i\right)^2}\sum_{i=1}^{n}h_i\left[\frac{1}{h_i}E\left\{\phi(U)K^2\left(\frac{u-U}{h_i}\right)\right\}-h_i\frac{1}{h_i^2}E^2\left\{W_pK\left(\frac{u-U_0}{h_i}\right)\right\}\right],
\]

where φ(•) is as in (2.7), we obtain

\[
\left(\sum_{i=1}^{n}h_i\right)Q_n(u)\to\phi(u)f(u)\int K^2(v)\,dv\quad\text{as } n\to\infty.
\]
Applying Lemma 4.1, we get

\[
\begin{aligned}
\left(\sum_{i=1}^{n}h_i\right)|R_n(u)|&\le\rho(u)\,\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\sum_{j=1}^{n}h_j\big(|\lambda_p\lambda_{p+i-j}|+|\lambda_p\lambda_{p-i+j}|+|\lambda_{p+i-j}\lambda_{p-i+j}|\big)\\
&\le 3\rho(u)\left(\max_n h_n\right)\left(\max_n|\lambda_n|\right)\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\alpha_i,
\end{aligned}
\]

where \(\alpha_i=\sum_{j=i-p}^{\infty}|\lambda_j|\). Since \(\lim_{i\to\infty}\alpha_i=0\), applying the Toeplitz Lemma 4.4 we get \(\lim_{n\to\infty}\left(\sum_{i=1}^{n}h_i\right)R_n(u)=0\), which means that R_n(u) vanishes faster than both P_n(u) and Q_n(u). Finally,

\[
\left(\sum_{i=1}^{n}h_i\right)\operatorname{var}[\bar g_n(u)]\to\big(\sigma_Z^2+\phi(u)\big)f(u)\int K^2(v)\,dv\quad\text{as } n\to\infty. \tag{4.12}
\]

Since, for the same reasons, \(E\bar f_n(u)\to f(u)\int K(v)\,dv\) and \(\left(\sum_{i=1}^{n}h_i\right)\operatorname{var}[\bar f_n(u)]\to f(u)\int K^2(v)\,dv\) as n → ∞, the proof is completed.
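The same kind of Monte Carlo sketch applies to the bandwidth-weighted estimate of this proof (again only for the density factor, with i.i.d. N(0,1) inputs and the assumed h_i = i^{-1/5}): the estimate (Σh_i)^{-1} Σ K((u−U_i)/h_i) also settles near f(u), while its fluctuations are governed by the normalizing sequence Σh_i of (4.12) rather than the sequence n²/Σh_i^{-1} of (4.11).

```python
import numpy as np

rng = np.random.default_rng(1)

n = 50_000
u = 0.0
U = rng.standard_normal(n)            # i.i.d. stand-in for the input signal
h = np.arange(1, n + 1) ** (-0.2)     # assumed bandwidths h_i = i**(-1/5)

# Gaussian kernel evaluated at (u - U_i)/h_i
K = np.exp(-0.5 * ((u - U) / h) ** 2) / np.sqrt(2 * np.pi)
f_bar = np.sum(K) / np.sum(h)         # bandwidth-weighted density estimate at u
print(f_bar)  # near f(0) = 1/sqrt(2*pi) ≈ 0.399
```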
Proof of Theorem 4.3
Each convergence in the proof holds for almost every (ζ) u ∈ R. In a preparatory step, we show that
∞n =