8/16/2019 [Włodzimierz Greblicki; M Pawlak] Nonparametric System Identification
Nonparametric System Identification
Presenting a thorough overview of the theoretical foundations of nonparametric system identification for nonlinear block-oriented systems, Włodzimierz Greblicki and Mirosław Pawlak show that nonparametric regression can be successfully applied to system identification, and they highlight what you can achieve in doing so. Starting with the basic ideas behind nonparametric methods, various algorithms for nonlinear block-oriented systems of cascade and parallel forms are discussed in detail. Emphasis is placed on the most popular systems, Hammerstein and Wiener, which have applications in engineering, biology, and financial modeling.

Algorithms using trigonometric, Legendre, Laguerre, and Hermite series are investigated, and the kernel algorithm, its semirecursive versions, and fully recursive modifications are covered. The theories of modern nonparametric regression, approximation, and orthogonal expansions are also provided, as are new approaches to system identification. The authors show how to identify nonlinear subsystems so that their characteristics can be obtained even when little information exists, which is of particular significance for practical application. Detailed information about all the tools used is provided in the appendices.

This book is aimed at researchers and practitioners in systems theory, signal processing, and communications. It will also appeal to researchers in fields such as mechanics, economics, and biology, where experimental data are used to obtain models of systems.
Włodzimierz Greblicki is a professor at the Institute of Computer Engineering, Control, and Robotics at the Wrocław University of Technology, Poland.

Mirosław Pawlak is a professor in the Department of Electrical and Computer Engineering at the University of Manitoba, Canada. He was awarded his Ph.D. from the Wrocław University of Technology, Poland.

Both authors have published extensively over the years in the area of nonparametric theory and applications.
Nonparametric System
Identification
WŁODZIMIERZ GREBLICKI
Wrocław University of Technology
MIROSŁAW PAWLAK
University of Manitoba, Canada
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press, The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521868044

© Cambridge University Press 2008

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2008

ISBN-13 978-0-511-40982-0  eBook (NetLibrary)
ISBN-13 978-0-521-86804-4  hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

Preface page ix

1 Introduction 1

2 Discrete-time Hammerstein systems 3
2.1 The system 3
2.2 Nonlinear subsystem 4
2.3 Dynamic subsystem identification 8
2.4 Bibliographic notes 9

3 Kernel algorithms 11
3.1 Motivation 11
3.2 Consistency 13
3.3 Applicable kernels 14
3.4 Convergence rate 16
3.5 The mean-squared error 21
3.6 Simulation example 21
3.7 Lemmas and proofs 24
3.8 Bibliographic notes 29

4 Semirecursive kernel algorithms 30
4.1 Introduction 30
4.2 Consistency and convergence rate 31
4.3 Simulation example 34
4.4 Proofs and lemmas 35
4.5 Bibliographic notes 43

5 Recursive kernel algorithms 44
5.1 Introduction 44
5.2 Relation to stochastic approximation 44
5.3 Consistency and convergence rate 46
5.4 Simulation example 49
5.5 Auxiliary results, lemmas, and proofs 51
5.6 Bibliographic notes 58
6 Orthogonal series algorithms 59
6.1 Introduction 59
6.2 Fourier series estimate 61
6.3 Legendre series estimate 64
6.4 Laguerre series estimate 66
6.5 Hermite series estimate 68
6.6 Wavelet estimate 69
6.7 Local and global errors 70
6.8 Simulation example 71
6.9 Lemmas and proofs 72
6.10 Bibliographic notes 78

7 Algorithms with ordered observations 80
7.1 Introduction 80
7.2 Kernel estimates 81
7.3 Orthogonal series estimates 85
7.4 Lemmas and proofs 89
7.5 Bibliographic notes 99

8 Continuous-time Hammerstein systems 101
8.1 Identification problem 101
8.2 Kernel algorithm 103
8.3 Orthogonal series algorithms 106
8.4 Lemmas and proofs 108
8.5 Bibliographic notes 112

9 Discrete-time Wiener systems 113
9.1 The system 113
9.2 Nonlinear subsystem 114
9.3 Dynamic subsystem identification 119
9.4 Lemmas 121
9.5 Bibliographic notes 122

10 Kernel and orthogonal series algorithms 123
10.1 Kernel algorithms 123
10.2 Orthogonal series algorithms 126
10.3 Simulation example 129
10.4 Lemmas and proofs 130
10.5 Bibliographic notes 142

11 Continuous-time Wiener system 143
11.1 Identification problem 143
11.2 Nonlinear subsystem 144
11.3 Dynamic subsystem 146
11.4 Lemmas 146
11.5 Bibliographic notes 148
12 Other block-oriented nonlinear systems 149
12.1 Series-parallel, block-oriented systems 149
12.2 Block-oriented systems with nonlinear dynamics 173
12.3 Concluding remarks 218
12.4 Bibliographical notes 220

13 Multivariate nonlinear block-oriented systems 222
13.1 Multivariate nonparametric regression 222
13.2 Additive modeling and regression analysis 228
13.3 Multivariate systems 242
13.4 Concluding remarks 248
13.5 Bibliographic notes 248

14 Semiparametric identification 250
14.1 Introduction 250
14.2 Semiparametric models 252
14.3 Statistical inference for semiparametric models 255
14.4 Statistical inference for semiparametric Wiener models 264
14.5 Statistical inference for semiparametric Hammerstein models 286
14.6 Statistical inference for semiparametric parallel models 287
14.7 Direct estimators for semiparametric systems 290
14.8 Concluding remarks 309
14.9 Auxiliary results, lemmas, and proofs 310
14.10 Bibliographical notes 316

A Convolution and kernel functions 319
A.1 Introduction 319
A.2 Convergence 320
A.3 Applications to probability 328
A.4 Lemmas 329

B Orthogonal functions 331
B.1 Introduction 331
B.2 Fourier series 333
B.3 Legendre series 340
B.4 Laguerre series 345
B.5 Hermite series 351
B.6 Wavelets 355

C Probability and statistics 359
C.1 White noise 359
C.2 Convergence of random variables 361
C.3 Stochastic approximation 364
C.4 Order statistics 365

References 371
Index 387
To my wife, Helena, and my children, Jerzy, Maria, and Magdalena – WG
To my parents and family and those whom I love – MP
Preface
The aim of this book is to show that nonparametric regression can be applied successfully to nonlinear system identification. It gathers what has been done in the area so far and presents main ideas, results, and some new recent developments.

The study of nonparametric regression estimation began with works published by Cencov, Watson, and Nadaraya in the 1960s. The history of nonparametric regression in system identification began about ten years later. Such methods have been applied to the identification of composite systems consisting of nonlinear memoryless systems and linear dynamic ones. Therefore, the approach is strictly connected with so-called block-oriented methods developed since Narendra and Gallman's work published in 1966. Hammerstein and Wiener structures are most popular and have received the greatest attention in numerous applications. Fundamental for nonparametric methods is the observation that the unknown characteristic of the nonlinear subsystem, or its inverse, can be represented as a regression function.

In terms of the a priori information, standard identification methods and algorithms work when it is parametric, that is, when our knowledge about the system is rather large; for example, when we know that the nonlinear subsystem has a polynomial characteristic. In this book, the information is much smaller, nonparametric. The mentioned characteristic can be, for example, any integrable or bounded or, even, any Borel function.

It can thus be said that this book associates block-oriented system identification with nonparametric regression estimation and shows how to identify nonlinear subsystems, that is, to recover their characteristics when the a priori information is small. Because of this, the approach should be of interest not only to researchers but also to people interested in applications.
Chapters 2–7 are devoted to discrete-time Hammerstein systems. Chapter 2 presents a basic discussion of the Hammerstein system and its relationship with the concept of nonparametric regression. The nonparametric kernel algorithm is presented in Chapter 3, its semirecursive versions are examined in Chapter 4, and Chapter 5 deals with fully recursive modifications derived from the idea of stochastic approximation. Next, Chapter 6 is concerned with the nonparametric orthogonal series method. Algorithms using trigonometric, Legendre, Laguerre, and Hermite series are investigated. Some space is devoted to estimation methods based on wavelets. Nonparametric algorithms based on ordered observations are presented and examined in Chapter 7. Chapter 8 discusses the nonparametric algorithms when applied to continuous-time Hammerstein systems.
The Wiener system is identified in Chapters 9–11. Chapter 9 presents the motivation for nonparametric algorithms that are studied in the next two chapters, devoted to the discrete- and continuous-time Wiener systems, respectively. Chapter 12 is concerned with the generalization of our theory to other block-oriented nonlinear systems. This includes, among others, parallel models, cascade-parallel models, sandwich models, and generalized Hammerstein systems possessing local memory. In Chapter 13, the multivariate versions of block-oriented systems are examined. The common problem of multivariate systems, that is, the curse of dimensionality, is cured by using low-dimensional approximations. With respect to this issue, models of the additive form are introduced and examined. In Chapter 14, we develop identification algorithms for a semiparametric class of block-oriented systems. Such systems are characterized by a mixture of finite-dimensional parameters and nonparametric functions, being typically a set of univariate functions.

The reader is encouraged to look into the appendices, in which fundamental information about tools used in the book is presented in detail. Appendix A is strictly related to kernel algorithms, and Appendix B is tied with the orthogonal series nonparametric curve estimates. Appendix C recalls some facts from probability theory and presents results from the theory of order statistics used extensively in Chapter 7.
Over the years, our work has benefited greatly from the advice and support of a number of friends and colleagues with interest in ideas of nonparametric estimation, pattern recognition, and nonlinear system modeling. There are too many names to list here, but special mention is due to Adam Krzyżak, as well as Danuta Rutkowska, Leszek Rutkowski, Alexander Georgiev, Simon Liao, Pradeepa Yahampath, and Yongqing Xin, our past Ph.D. students, now professors at universities in Canada, the United States, and Poland. Cooperation with them has been a great pleasure and given us a lot of satisfaction. We are deeply indebted to Zygmunt Hasiewicz, Ewaryst Rafajłowicz, Uli Stadtmüller, Ewa Rafajłowicz, Hajo Holzmann, and Andrzej Kozek, who have contributed greatly to our research in the area of nonlinear system identification, pattern recognition, and nonparametric inference.

Last, but by no means least, we would like to thank Mount-first Ng for helping us with a number of typesetting problems. Ed Shwedyk and January Gnitecki have provided support for correcting English grammar.

We also thank Anna Littlewood, from Cambridge University Press, for being a very supportive and patient editor. Research presented in this monograph was partially supported by research grants from Wrocław University of Technology, Wrocław, Poland, and NSERC of Canada.

Wrocław, Winnipeg
Włodzimierz Greblicki, Mirosław Pawlak
February 2008
1 Introduction
System identification, as a particular process of statistical inference, exploits two types of information. The first is experiment; the other, called a priori, is known before making any measurements. In a wide sense, the a priori information concerns the system itself and signals entering the system. Elements of the information are, for example:

- the nature of the signals, which may be random or nonrandom, white or correlated, stationary or not; their distributions can be known in full or partially (up to some parameters) or completely unknown,
- general information about the system, which can be, for example, continuous or discrete in the time domain, stationary or not,
- the structure of the system, which can be of the Hammerstein or Wiener type, or other,
- the knowledge about subsystems, that is, about nonlinear characteristics and linear dynamics.
In other words, the a priori information is related to the theory of the phenomena taking place in the system (a real physical process), or can be interpreted as a hypothesis (if so, results of the identification should be necessarily validated), or can be abstract in nature.

This book deals with systems consisting of nonlinear memoryless and linear dynamic subsystems, for example, Hammerstein and Wiener systems and other related structures. With respect to them, the a priori information is understood in a narrow sense because it relates to the subsystems only and concerns the a priori knowledge about their descriptions. We refer to such systems as block-oriented.

The characteristic of the nonlinear subsystem is recovered with the help of nonparametric regression estimates. The kernel and orthogonal series methods are used. Ordered statistics are also applied. Both offline and online algorithms are investigated. We examine only these estimation methods and nonlinear models for which we are able to deliver fundamental results in terms of consistency and convergence rates. There are other techniques, for example, neural networks, which may exhibit a promising performance, but their statistical accuracy is mostly unknown.

For the theory of nonparametric regression, see Efromovich [78], Györfi, Kohler, Krzyżak, and Walk [140], Härdle [150], Prakasa Rao [241], Simonoff [278], or Wand and Jones [310]. Nonparametric wavelet estimates are discussed in Antoniadis and Oppenheim [6], Härdle, Kerkyacharian, Picard, and Tsybakov [151], Ogden [223], and Walter and Shen [308].
Parametric methods are beyond the scope of this book; nevertheless, we mention Brockwell and Davies [33], Ljung [198], Norton [221], Zhu [332], and Söderström and Stoica [280]. Nonlinear system identification within the parametric framework is studied by Nells [218], Westwick and Kearney [316], Marmarelis and Marmarelis [207], Bendat [16], and Mathews and Sicuranza [208]. These books present identification algorithms based mostly on the theory of Wiener and Volterra expansions of nonlinear systems. A comprehensive list of references concerning nonlinear system identification and applications has been given by Giannakis and Serpendin [102]; see also the 2005 special issue on system identification of the IEEE Trans. on Automatic Control [199]. A nonparametric statistical inference for time series is presented in Bosq [26], Fan and Yao [89], and Györfi, Härdle, Sarda, and Vieu [139].

It should be stressed that nonparametric and parametric methods are supposed to be applied in different situations. The first are used when the a priori information is nonparametric, that is, when we wish to recover an infinite-dimensional object with underlying assumptions as weak as possible. Clearly, in such a case, parametric methods can only approximate, but not estimate, the unknown characteristics. When the information is parametric, parametric methods are the natural choice. If, however, the unknown characteristic is a complicated function of parameters, convergence analysis becomes difficult. Moreover, serious computational problems can occur. In such circumstances, one can resort to nonparametric algorithms because, from the computational viewpoint, they are not discouraging. On the contrary, they are simple but consume computer memory, because, for example, kernel estimates require all data to be stored. Nevertheless, it can be said that the two approaches do not compete with each other since they are designed to be applied in quite different situations. The situations differ from each other by the amount of the a priori information about the identified system. However, a compromise between these two separate worlds can be made by restricting a class of nonparametric models to those that consist of a finite-dimensional parameter and nonlinear characteristics, which run through a nonparametric class of univariate functions. Such semiparametric models can be efficiently identified, and the theory of semiparametric identification is examined in this book. The methodology of semiparametric statistical inference is examined in Härdle, Müller, Sperlich, and Werwatz [152], Ruppert, Wand, and Carroll [259], and Yatchev [329].
For two number sequences $a_n$ and $b_n$, $a_n = O(b_n)$ means that $a_n/b_n$ is bounded in absolute value as $n \to \infty$. In particular, $a_n = O(1)$ denotes that $a_n$ is bounded, that is, that $\sup_n |a_n| < \infty$. Writing $a_n \sim b_n$, we mean that $a_n/b_n$ has a nonzero limit as $n \to \infty$.

Throughout the book, "almost everywhere" means "almost everywhere with respect to the Lebesgue measure," whereas "almost everywhere $(\mu)$" means "almost everywhere with respect to the measure $\mu$."
2 Discrete-time Hammerstein systems
In this chapter, we discuss some preliminary aspects of the discrete-time Hammerstein system. In Section 2.1 we form the input–output equations of the system. A fundamental relationship between the system nonlinearity and the nonparametric regression is established in Section 2.2. The use of the correlation theory for recovering the linear subsystem is discussed in Section 2.3.
2.1 The system
A Hammerstein system, shown in Figure 2.1, consists of a nonlinear memoryless subsystem with a characteristic $m(\cdot)$ followed by a linear dynamic one with an impulse response $\{\lambda_n\}$. The output signal $W_n$ of the linear part is disturbed by $Z_n$, and $Y_n = W_n + Z_n$ is the output of the whole system. Neither $V_n$ nor $W_n$ is available to measurement. Our goal is to identify the system, that is, to recover both $m(\cdot)$ and $\{\lambda_n\}$, from observations

$$(U_1, Y_1), (U_2, Y_2), \ldots, (U_n, Y_n), \ldots \tag{2.1}$$

taken at the input and output of the whole system.

Signals coming to the system, that is, the input $\{\ldots, U_{-1}, U_0, U_1, \ldots\}$ and the disturbance $\{\ldots, Z_{-1}, Z_0, Z_1, \ldots\}$, are mutually independent stationary white random signals. The disturbance has zero mean and finite variance, that is, $E Z_n = 0$ and $\operatorname{var}[Z_n] = \sigma_Z^2 < \infty$.

Regarding the nonlinear subsystem, we assume that $m(\cdot)$ is a Borel measurable function. Therefore, $V_n$ is a random variable. The dynamic subsystem is described by the state equation

$$X_{n+1} = A X_n + b V_n, \qquad W_n = c^T X_n, \tag{2.2}$$

where $X_n$ is a state vector at time $n$, $A$ is a matrix, and $b$ and $c$ are vectors. Thus,

$$\lambda_n = \begin{cases} 0, & \text{for } n = 0, -1, -2, \ldots, \\ c^T A^{n-1} b, & \text{for } n = 1, 2, 3, \ldots, \end{cases}$$

and

$$W_n = \sum_{i=-\infty}^{n} \lambda_{n-i}\, m(U_i). \tag{2.3}$$
Figure 2.1 The discrete-time Hammerstein system ($U_n \to m(\cdot) \to V_n \to \{\lambda_n\} \to W_n$; $Y_n = W_n + Z_n$).
Neither $b$ nor $c$ is known. The matrix $A$ and its dimension are also unknown. Nevertheless, the matrix $A$ is stable; all its eigenvalues lie in the unit circle. Therefore, assuming that

$$E m^2(U) < \infty \tag{2.4}$$

(the time index at $U$ is dropped), we conclude that both $X_n$ and $W_n$ are random variables. Clearly, the random processes $\{\ldots, X_{-1}, X_0, X_1, \ldots\}$ and $\{\ldots, W_{-1}, W_0, W_1, \ldots\}$ are stationary. Consequently, the output process $\{\ldots, Y_{-1}, Y_0, Y_1, \ldots\}$ is also a stationary stochastic process. Therefore, the problem is well posed in the sense that all signals are random variables. In the light of this, we estimate both $m(\cdot)$ and $\{\lambda_n\}$ from the random observations (2.1).

The restrictions imposed on the signals entering the system and on both subsystems apply whenever the Hammerstein system is concerned. They will not be repeated in further considerations, neither in lemmas nor in theorems.

The input random variables $U_n$ may have a probability density, denoted by $f(\cdot)$, or may be distributed quite arbitrarily. Nevertheless, (2.4) holds. It should be emphasized that, apart from a few cases, (2.4) is the only restriction in which the nonlinearity is involved. Assumption (2.4) is irrelevant to identification algorithms and has been imposed for only one reason: to guarantee that both $W_n$ and $Y_n$ are random variables. Nevertheless, it certainly has an influence on the restrictions imposed on both $m(\cdot)$ and the distribution of $U$ to meet (2.4). If, for example, $U$ is bounded, (2.4) is satisfied for any $m(\cdot)$. The restriction also holds if $E U^2 < \infty$ and $|m(u)| \le \alpha + \beta |u|$ with any $\alpha, \beta$. In yet another example, $E U^4 < \infty$ and $|m(u)| \le \alpha + \beta u^2$. For Gaussian $U$ and $|m(u)| \le W(u)$, where $W$ is an arbitrary polynomial, (2.4) is also met. Anyway, the a priori information about the characteristic is nonparametric, because $m(\cdot)$ cannot be represented in a parametric form. This is because the class of all possible characteristics is very wide.

The family of all stable dynamic subsystems also cannot be parameterized, because its order is unknown. Therefore, the a priori information about the impulse response is nonparametric, too. To form a conclusion, we infer about both subsystems under nonparametric a priori information.

In the following chapters, for simplicity, $U$, $W$, $Y$, and $Z$ stand for $U_n$, $W_n$, $Y_n$, and $Z_n$, respectively.
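To make the setup concrete, here is a minimal simulation sketch of a scalar instance of the state equation (2.2). The specific choices (cubic characteristic $m(u) = u^3$, pole $a = 0.5$, uniform input, noise level) are illustrative assumptions, not taken from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

def m(u):
    """Assumed nonlinear characteristic (illustrative): m(u) = u^3."""
    return u ** 3

def simulate_hammerstein(n, a=0.5, sigma_z=0.1):
    """Scalar instance of (2.2): X_{n+1} = a X_n + m(U_n), W_n = X_n,
    so lambda_k = a**(k-1) for k >= 1, and the observed output is
    Y_n = W_n + Z_n with a zero-mean white disturbance Z_n."""
    U = rng.uniform(-1.0, 1.0, n)           # i.i.d. (white) stationary input
    Z = sigma_z * rng.standard_normal(n)    # zero-mean white disturbance
    W = np.empty(n)
    x = 0.0                                  # state X_0
    for i in range(n):
        W[i] = x                             # W_n = c^T X_n with c = 1
        x = a * x + m(U[i])                  # X_{n+1} = A X_n + b V_n, A = a, b = 1
    return U, W + Z                          # only (U_n, Y_n) are observable

U, Y = simulate_hammerstein(1000)
```

Only the input–output pairs $(U_n, Y_n)$ are returned; $V_n$ and $W_n$ stay internal, matching the assumption that neither is available to measurement.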
2.2 Nonlinear subsystem
2.2.1 The problem and the motivation for algorithms
Fix $p \ge 1$ and observe that, since $Y_p = Z_p + \sum_{i=-\infty}^{p} \lambda_{p-i}\, m(U_i)$ and $\{U_n\}$ is a white process,

$$E\{Y_p \mid U_0 = u\} = \mu(u),$$

where

$$\mu(u) = \lambda_p m(u) + \alpha_p,$$

with $\alpha_p = E m(U) \sum_{i=1,\, i \ne p}^{\infty} \lambda_i$. Estimating the regression $E\{Y_p \mid U_0 = u\}$, we thus recover $m(\cdot)$ up to some unknown constants $\lambda_p$ and $\alpha_p$. If $E m(U) = 0$, which is the case, for example, when the distribution of $U$ is symmetrical with respect to zero and $m(\cdot)$ is an odd function, then $\alpha_p = 0$ and we estimate $m(\cdot)$ only up to the multiplicative constant $\lambda_p$.

Since $Y_{p+n} = \mu(U_n) + \xi_{p+n} + Z_{p+n}$ with $\xi_{p+n} = \sum_{i=-\infty,\, i \ne n}^{p+n} \lambda_{p+n-i}\, m(U_i)$, it can be said that we estimate $\mu(u)$ from the pairs

$$(U_0, Y_p), (U_1, Y_{p+1}), \ldots, (U_n, Y_{p+n}), \ldots,$$

and that the regression $\mu(u)$ is corrupted by the noise $Z_{p+n} + \xi_{p+n}$. The first component of the noise is white with zero mean. Because of the dynamics, the other noise component is correlated. Its mean $E \xi_n = \alpha_p$ is usually nonzero, and its variance is equal to $\operatorname{var}[m(U)] \sum_{i=1,\, i \ne p}^{\infty} \lambda_i^2$. Thus, the main difficulties in the analysis of any estimate of $\mu(\cdot)$ are caused by the correlation of $\{\xi_n\}$, that is, by the system itself, and not by the white disturbance $Z_n$ coming from outside.

Every algorithm estimating the nonlinearity in Hammerstein systems studied in this book (the estimate is denoted here as $\hat\mu(U_0, \ldots, U_n; Y_p, \ldots, Y_{p+n})$) is linear with respect to output observations, which means that

$$\hat\mu(U_0, \ldots, U_n; \theta_p + \eta_p, \ldots, \theta_{p+n} + \eta_{p+n}) = \hat\mu(U_0, \ldots, U_n; \theta_p, \ldots, \theta_{p+n}) + \hat\mu(U_0, \ldots, U_n; \eta_p, \ldots, \eta_{p+n}), \tag{2.5}$$

and has the natural property that, for any number $\theta$,

$$\hat\mu(U_0, \ldots, U_n; \theta, \ldots, \theta) \to \theta \quad \text{as } n \to \infty \tag{2.6}$$

in an appropriate stochastic sense. This property, or rather its consequence, is exploited when proving consistency. To explain this, observe that, with respect to $U_n$ and $Y_n$, the identified system shown in Figure 2.1 is equivalent to that in Figure 2.2, with nonlinearity $\rho(u) = m(u) - E m(U)$ and an additional disturbance $\beta = E m(U) \sum_{i=1}^{\infty} \lambda_i$.

Figure 2.2 The equivalent Hammerstein system.

In the equivalent system, $E \rho(U) = 0$ and $E\{Y_p \mid U_0 = u\} = \mu(u)$. From (2.5) and (2.6), it follows that

$$\hat\mu(U_0, \ldots, U_n; Y_p, \ldots, Y_{p+n}) = \hat\mu(U_0, \ldots, U_n; S_p + \beta, \ldots, S_{p+n} + \beta) = \hat\mu(U_0, \ldots, U_n; S_p, \ldots, S_{p+n}) + \hat\mu(U_0, \ldots, U_n; \beta, \ldots, \beta)$$
with $\hat\mu(U_0, \ldots, U_n; \beta, \ldots, \beta) \to \beta$ as $n \to \infty$. Hence, if

$$\hat\mu(U_0, \ldots, U_n; S_p, \ldots, S_{p+n}) \to E\{S_p \mid U_0 = u\} \quad \text{as } n \to \infty,$$

we have

$$\hat\mu(U_0, \ldots, U_n; Y_p, \ldots, Y_{p+n}) \to E\{Y_p \mid U_0 = u\} \quad \text{as } n \to \infty,$$

where convergence is understood in the same sense as that in (2.6).

Thus, if the estimate recovers the regression $E\{S_p \mid U_0 = u\}$ from observations

$$(U_0, S_p), (U_1, S_{1+p}), (U_2, S_{2+p}), \ldots,$$

it also recovers $E\{Y_p \mid U_0 = u\}$ from

$$(U_0, Y_p), (U_1, Y_{1+p}), (U_2, Y_{2+p}), \ldots.$$

We can say that if the estimate works properly when applied to the system with input $U_n$ and output $S_n$ (in which $E \rho(U) = 0$), it behaves properly also when applied to the system with input $U_n$ and output $Y_n$ (in which $E m(U)$ may be nonzero).

The result of the reasoning is given in the following remark:

REMARK 2.1 Let an estimate have properties (2.5) and (2.6). If the estimate is consistent for $E m(U) = 0$, then it is consistent for $E m(U) \ne 0$, too.

Owing to the remark, with no loss of generality, in all proofs of consistency of algorithms recovering the nonlinearity, we assume that $E m(U) = 0$.
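Kernel estimates of this regression are the subject of Chapter 3; as a preview only, a plain Nadaraya–Watson sketch applied to the delayed pairs $(U_n, Y_{p+n})$ is shown below. The toy finite-impulse-response system, the Gaussian kernel, and the bandwidth are all assumptions made for illustration:

```python
import numpy as np

def nadaraya_watson(u, U, Y, h):
    """Kernel regression estimate of mu(u) = E{Y_p | U_0 = u} at a point u:
    a weighted average of the outputs with Gaussian weights of bandwidth h."""
    w = np.exp(-0.5 * ((u - U) / h) ** 2)
    return float(np.sum(w * Y) / np.sum(w))

rng = np.random.default_rng(1)
n, p = 2000, 1
U = rng.uniform(-1.0, 1.0, n)
mU = U ** 3                                   # assumed nonlinearity m(u) = u^3
# toy Hammerstein output with lambda_1 = 1, lambda_2 = 0.5:
# Y_i = m(U_{i-1}) + 0.5 m(U_{i-2}) + white noise
Y = np.roll(mU, 1) + 0.5 * np.roll(mU, 2) + 0.05 * rng.standard_normal(n)
# delayed pairs (U_n, Y_{p+n}) with p = 1
Upairs, Ypairs = U[:-p], Y[p:]
mu_hat = nadaraya_watson(0.8, Upairs, Ypairs, h=0.1)
# mu(u) = lambda_1 m(u) + alpha_1; the input is symmetric and u^3 is odd,
# so E m(U) = 0, alpha_1 = 0, and mu_hat should be close to 0.8**3 = 0.512
```

Since $E m(U) = 0$ here, the situation of Remark 2.1 applies directly and the regression recovers $\lambda_p m(u)$ itself.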
In parametric problems, the nonlinearity is usually a polynomial $m(u) = \alpha_0 + \alpha_1 u + \cdots + \alpha_q u^q$ of a fixed degree with unknown true values of the parameters $\alpha_0, \ldots, \alpha_q$. Therefore, to apply parametric methods, we must have a great deal more a priori information about the subsystem. It seems that in many applications, it is impossible to represent $m(\cdot)$ in a parametric form.

Since the system with the following ARMA-type difference equation:

$$w_n + a_{k-1} w_{n-1} + \cdots + a_0 w_{n-k} = b_{k-1} m(u_{n-1}) + \cdots + b_0 m(u_{n-k})$$

can be described by (2.2), all presented methods can be used to recover the nonlinearity $m(\cdot)$ in the previous ARMA system.
m (•) in theprevious ARM A system.It will beconvenient todenote
φ(u ) = E
W 2p |U 0 = u
. (2.7)
Since W p =p −1
i =−∞ λi m (U i ), denoting c 0 = E m 2(U )∞
i =1,i =p λ2i + E 2m (U )
(
∞i =1,i =p λi )
2, c 1 = 2λp Em (U )
∞i =1,i =p λi , and c 2 = λ2p , wefind
φ (u ) = c 0 + c 1m (u ) + c 2m 2(u ). To avoid complicated notation, we do not denote explicitly the dependence of the
estimated regressionandother functions on p and simply writeµ(•) and φ(•).Results presented in further chapters can be easily generalized onthe systemshown
inFigure2.3, where{. . . , ξ 0, ξ 1, ξ 2, . . .} isanother zeromeannoise. Moreover,{Z n } can
8/16/2019 [Włodzimierz Greblicki; M Pawlak] Nonparametric System Identification
19/401
2.2 Nonlinear subsystem 7
Z n
Y nU n
{λn}m(•)
ξ n
Figure 2.3 Possiblegeneralizationof thesystemshownin Figure2.1.
be correlated, that is, it can be the output of a stable linear dynamic systemstimulated
by whiterandomnoise. So can {ξ n }.It is worth noting that aclass of stochastic processes generated by theoutput process
{Y n } of the Hammerstein systemis different fromthe classof strong mixing processesconsideredextensivelyinthestatistical literatureconcerningthenonparametricinference
fromdependent data, see, for example, [26] and [89]. Indeed, theARMA process {X n }in which X n +1 = a X n + V n , where0
The cloud of 200 input–output observations we infer from is presented in Figure 2.4. The quality of each estimate, denoted here by $\hat m(u)$, is measured with

$$\mathrm{MISE} = \int_{-3}^{3} (\hat m(u) - m(u))^2\, du.$$
2.3 Dynamic subsystem identification
Passing to the dynamic subsystem, we use (2.3) and recall $E Z_n = 0$ to notice that

$$E\{Y_i U_0\} = \sum_{j=-\infty}^{i} \lambda_{i-j} E\{m(U_j) U_0\} = \lambda_i E\{m(U) U\}.$$

Denoting $\kappa_i = \lambda_i E\{U m(U)\}$, we obtain

$$\kappa_i = E\{Y_i U_0\},$$

which can be estimated in the following way:

$$\hat\kappa_i = \frac{1}{n} \sum_{j=1}^{n-i} Y_{i+j} U_j.$$
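A sketch of this cross-correlation estimate on simulated data follows; the cubic characteristic, the first-order linear block, and all numeric settings are assumptions made for illustration:

```python
import numpy as np

def kappa_hat(U, Y, i):
    """kappa_hat_i = (1/n) * sum_{j=1}^{n-i} Y_{i+j} U_j (0-based arrays)."""
    n = len(U)
    return float(np.dot(Y[i:], U[:n - i]) / n)

rng = np.random.default_rng(2)
n = 100_000
U = rng.uniform(-1.0, 1.0, n)
V = U ** 3                               # assumed m(u) = u^3, so E{U m(U)} = E U^4 = 1/5
W = np.zeros(n)
for j in range(1, n):
    W[j] = 0.5 * W[j - 1] + V[j - 1]     # lambda_i = 0.5**(i-1) for i >= 1
Y = W + 0.1 * rng.standard_normal(n)     # add zero-mean white disturbance

k1 = kappa_hat(U, Y, 1)   # estimates kappa_1 = lambda_1 * E{U m(U)} = 1.0 * 0.2
k2 = kappa_hat(U, Y, 2)   # estimates kappa_2 = 0.5 * 0.2 = 0.1
```

Consistently with Theorem 2.1 below, the squared error of each $\hat\kappa_i$ decays like $O(1/n)$, so enlarging $n$ tightens both estimates.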
THEOREM 2.1 For any $i$,

$$\lim_{n \to \infty} E (\hat\kappa_i - \kappa_i)^2 = 0.$$

Proof. The estimate is unbiased, that is, $E \hat\kappa_i = E\{Y_i U_0\} = \kappa_i$. Moreover, $\operatorname{var}[\hat\kappa_i] = P_n + Q_n + R_n$ with

$$P_n = \frac{1}{n^2} \operatorname{var}\Big[ \sum_{j=1}^{n} Z_{i+j} U_j \Big] = \frac{1}{n^2} \sum_{j=1}^{n} \operatorname{var}[Z_{i+j} U_j] = \frac{1}{n} \sigma_Z^2 E U^2,$$

$$Q_n = \frac{1}{n} \operatorname{var}[W_i U_0],$$

and

$$R_n = \frac{1}{n^2} \sum_{j=1}^{n} \sum_{m=1,\, m \ne j}^{n} \operatorname{cov}[W_{i+j} U_j, W_{i+m} U_m] = \frac{1}{n^2} \sum_{j=1}^{n} (n - j) \operatorname{cov}[W_{i+j} U_j, W_i U_0].$$

Since $W_i = \sum_{j=-\infty}^{i} \lambda_{i-j}\, m(U_j)$, we have $Q_n = n^{-1} \lambda_i^2 \operatorname{var}[m(U) U]$. For the same reason, for $j > 0$,

$$\operatorname{cov}[W_{i+j} U_j, W_i U_0] = \sum_{p=-\infty}^{i+j} \sum_{q=-\infty}^{i} \lambda_{i+j-p} \lambda_{i-q} \operatorname{cov}[m(U_p) U_j, m(U_q) U_0] = E^2\{U m(U)\}\, \lambda_{i+j} \lambda_{i-j}$$

(see Lemma C.3 in Appendix C), which leads to

$$|R_n| \le \frac{1}{n^2} E^2\{U m(U)\} \sum_{j=1}^{n} (n - j) |\lambda_{i+j} \lambda_{i-j}| \le \frac{1}{n} E^2\{U m(U)\} \max_s |\lambda_s| \sum_{j=1}^{\infty} |\lambda_j|.$$

Thus,

$$E (\hat\kappa_i - \kappa_i)^2 = \operatorname{var}[\hat\kappa_i] = O\Big(\frac{1}{n}\Big), \tag{2.8}$$

which completes the proof.

The theorem establishes convergence of the local error $E(\hat\kappa_i - \kappa_i)^2$ to zero as $n \to \infty$. As an estimate of the whole impulse response $\{\kappa_1, \kappa_2, \kappa_3, \ldots\}$, we take the sequence $\{\hat\kappa_1, \hat\kappa_2, \hat\kappa_3, \ldots, \hat\kappa_{N(n)}, 0, 0, \ldots\}$ and find that the mean summed square error (MSSE) equals

$$\mathrm{MSSE}(\hat\kappa) = \sum_{i=1}^{N(n)} E(\hat\kappa_i - \kappa_i)^2 + \sum_{i=N(n)+1}^{\infty} \kappa_i^2.$$

From (2.8), it follows that the error is not greater than

$$O\Big(\frac{N(n)}{n}\Big) + \sum_{i=N(n)+1}^{\infty} \kappa_i^2.$$

Therefore, if $N(n) \to \infty$ as $n \to \infty$ and $N(n)/n \to 0$ as $n \to \infty$,

$$\lim_{n \to \infty} \mathrm{MSSE}(\hat\kappa) = 0.$$
The identity $\lambda_s \tau = E\{Y_s U_0\}$, where $\tau = E\{U m(U)\}$, allows us to form a nonparametric estimate of the linear subsystem in the frequency domain. Indeed, taking the Fourier transform of the identity yields

$$\Lambda(\omega)\, \tau = S_{YU}(\omega), \qquad |\omega| \le \pi, \tag{2.9}$$

where $S_{YU}(\omega) = \sum_{s=-\infty}^{\infty} \kappa_s e^{-i s \omega}$ is the cross-spectral density function of the processes $\{Y_n\}$ and $\{U_n\}$, and

$$\Lambda(\omega) = \sum_{s=0}^{\infty} \lambda_s e^{-i s \omega}$$

is the transfer function of the linear subsystem. Note also that if $\lambda_0 = 1$, then $\tau = \kappa_0$. See Chapter 12 for further discussion on the frequency-domain identification of linear systems.
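A numerical sketch of (2.9): Fourier-transforming a truncated $\kappa$ sequence and dividing by $\tau$ recovers the transfer function. Exact $\kappa_s$ values for a first-order example are plugged in here as an assumption; in practice one would use the estimates $\hat\kappa_s$:

```python
import numpy as np

def transfer_estimate(kappa, omegas, tau):
    """Lambda(omega) ~ (1/tau) * sum_{s=1}^{N} kappa_s e^{-i s omega},
    a truncated version of identity (2.9); kappa[0] holds kappa_1."""
    s = np.arange(1, len(kappa) + 1)
    return np.array([np.sum(kappa * np.exp(-1j * s * w)) for w in omegas]) / tau

# first-order linear block: lambda_s = 0.5**(s-1) for s >= 1, and tau = 0.2,
# so kappa_s = lambda_s * tau; keep N = 30 terms of the truncated sum
N, tau = 30, 0.2
kappa = tau * 0.5 ** np.arange(N)
omegas = np.linspace(0.0, np.pi, 5)
Lam = transfer_estimate(kappa, omegas, tau)

# closed form for comparison: Lambda(omega) = e^{-i omega} / (1 - 0.5 e^{-i omega})
true = np.exp(-1j * omegas) / (1.0 - 0.5 * np.exp(-1j * omegas))
```

With a geometrically decaying impulse response, the truncation error after 30 terms is negligible, so the two curves agree to high accuracy.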
2.4 Bibliographic notes
Various aspects of parametric identification algorithms of discrete-time Hammerstein
systems have been studied by Narendra and Gallman [216]; Haist, Chang, and Luus
8/16/2019 [Włodzimierz Greblicki; M Pawlak] Nonparametric System Identification
22/401
10 Discrete-time Hammerstein systems
[142], Thatchachar and Ramaswamy [289], Kaminskas [175], Gallman [92], Billings [19], Billings and Fakhouri [20,24], Shih and Kung [276], Kung and Shih [190], Liao and Sethares [195], Verhaegen and Westwick [301], Giri, Chaoui, and Rochidi [103], Ninness and Gibson [220], Bai [11,12], and Vörös [305]. The analysis of block-oriented systems and, in particular, Hammerstein ones, useful for various aspects of identification and its applications, can be found in Bendat [16], Chen [45], Marmarelis and Marmarelis [207], Mathews and Sicuranza [208], Nelles [218], and Westwick and Kearney [316].

Results concerning Hammerstein systems are sometimes given, though not explicitly, in works devoted to the more complicated Hammerstein–Wiener or Wiener–Hammerstein structures; see, for example, Gardiner [94], Billings and Fakhouri [22,23], Fakhouri, Billings, and Wormald [86], Hunter and Korenberg [168], Korenberg and Hunter [177], Emara-Shabaik, Moustafa, and Talaq [79], Boutayeb and Darouach [27], Vandersteen, Rolain, and Schoukens [296], Bai [10], Bershad, Celka, and McLaughlin [18], and Zhu [333].
The nonparametric approach offers a number of algorithms to recover the characteristics of the nonlinear subsystem. The most popular, the kernel estimate, can be used in the offline version, see Chapter 3. For semirecursive and fully recursive forms, see Chapters 4 and 5, respectively. Nonparametric orthogonal series identification algorithms, see Chapter 6, utilize trigonometric, Legendre, Laguerre, or Hermite functions, or wavelets. Both classes of estimates can be modified to use ordered input observations (see Chapter 7), which makes them insensitive to the roughness of the input density.

The Hammerstein model has been used in various and diverse areas. Eskinat, Johnson, and Luyben [82] applied it to describe processes in distillation columns and heat exchangers. The hysteresis phenomenon in ferrites was analyzed by Hsu and Ngo [166], pH processes were analyzed by Patwardhan, Lakshminarayanan, and Shah [227], biological systems were studied by Hunter and Korenberg [168], and Emerson, Korenberg, and Citron [80] described some neuronal processes. The use of the Hammerstein model for modeling aspects of financial volatility processes is presented in Capobianco [38]. In Giannakis and Serpedin [102], a comprehensive bibliography on nonlinear system identification is given; see also the 2005 special issue on system identification of the IEEE Transactions on Automatic Control [199].

It is also worth noting that the concept of the Hammerstein model originates from the theory of nonlinear integral equations developed by Hammerstein in 1930 [148]; see also Tricomi [292].
3 Kernel algorithms
The kernel algorithm is just the kernel estimate of a regression function. This is the most popular nonparametric estimation method and is very convenient from the computational viewpoint. In Section 3.1, an intuitive motivation for the algorithm is presented, and in Section 3.2, its pointwise consistency is shown. Some results hold for any input signal density, that is, they are density-free; some are even distribution-free, that is, they hold for any distribution of the input signal. In Section 3.3, attention is focused on a class of applicable kernel functions. The convergence rate is studied in Section 3.4.
3.1 Motivation
It is obvious that
$$
\lim_{h\to0}\frac{1}{2h}\int_{u-h}^{u+h}\mu(v)f(v)\,dv=\mu(u)f(u)
$$
at every continuity point $u\in R$ of both $m(\bullet)$ and $f(\bullet)$, since $\mu(u)=\lambda_p m(u)+\alpha_p$. The formula can be rewritten in the following form:
$$
\lim_{h\to0}\int\mu(v)f(v)\frac{1}{h}K\left(\frac{u-v}{h}\right)dv=\mu(u)f(u), \tag{3.1}
$$
where
$$
K(u)=\begin{cases}\dfrac{1}{2}, & \text{for } |u|\le1\\ 0, & \text{otherwise.}\end{cases} \tag{3.2}
$$
Figure 3.1 Rectangular kernel (3.2).
For a general kernel, $h^{-1}K((u-v)/h)$ behaves like the Dirac impulse $\delta(u-v)$ as $h\to0$, so the integral in (3.1) converges to
$$
\int\mu(v)f(v)\delta(u-v)\,dv=\mu(u)f(u)\ \text{ as } h\to0.
$$
Because $\mu(u)=E\{Y_p\,|\,U_0=u\}$, we get
$$
\int\mu(v)f(v)\frac{1}{h}K\left(\frac{u-v}{h}\right)dv
=\int\frac{1}{h}E\{Y_p\,|\,U_0=v\}K\left(\frac{u-v}{h}\right)f(v)\,dv
=\frac{1}{h}E\left\{Y_pK\left(\frac{u-U_0}{h}\right)\right\},
$$
which suggests the following estimate of $\mu(u)f(u)$:
$$
\frac{1}{nh}\sum_{i=1}^{n}Y_{p+i}K\left(\frac{u-U_i}{h}\right).
$$
For similar reasons,
$$
\frac{1}{nh}\sum_{i=1}^{n}K\left(\frac{u-U_i}{h}\right)
$$
is a good candidate for an estimate of
$$
\int f(v)\frac{1}{h}K\left(\frac{u-v}{h}\right)dv,
$$
which converges to $f(u)$ as $h\to0$. Thus,
$$
\hat\mu(u)=\frac{\sum_{i=1}^{n}Y_{p+i}K\left(\dfrac{u-U_i}{h_n}\right)}{\sum_{i=1}^{n}K\left(\dfrac{u-U_i}{h_n}\right)}, \tag{3.3}
$$
with $h_n$ tending to zero, is a kernel estimate of $\mu(u)$. The parameter $h_n$ is called a bandwidth. Note that the above formula is of ratio form, and we always treat the case $0/0$ as $0$.

In light of this, the crucial problems are the choice of the kernel $K(\bullet)$ and the number sequence $\{h_n\}$. From now on, we denote $g(u)=\mu(u)f(u)$. It is worth mentioning that there is a wide range of kernel estimates [88,140,172] available for finding a curve in data. The most prominent are: the classical Nadaraya–Watson estimator, defined in (3.3), local linear and polynomial kernel estimates, convolution-type kernel estimates, and various recursive kernel methods. Some of these techniques are thoroughly examined in this book.
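For the static case $p=0$ (independent pairs $(U_i,Y_i)$), estimate (3.3) can be sketched in a few lines; the rectangular kernel (3.2) is used as the default, and the $0/0$-as-$0$ convention is implemented explicitly. The function name and vectorized layout are ours, not the book's.

```python
import numpy as np

def nadaraya_watson(u, U, Y, h, kernel=lambda t: 0.5 * (np.abs(t) <= 1)):
    """Kernel regression estimate (3.3) at the query points u, from data
    (U, Y) with bandwidth h; the estimate is 0 wherever the denominator
    vanishes. The default kernel is the rectangular kernel (3.2)."""
    u = np.atleast_1d(u)
    W = kernel((u[:, None] - U[None, :]) / h)   # kernel weight of each U_i per query
    num = W @ Y
    den = W.sum(axis=1)
    out = np.zeros_like(num, dtype=float)
    np.divide(num, den, out=out, where=den != 0)
    return out
```

For a query point far from all observations, every weight is zero and the estimate is returned as 0, exactly as in the ratio convention above.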
3.2 Consistency
On the kernel function, the following restrictions are imposed:
$$
\sup_{-\infty<u<\infty}|K(u)|<\infty, \tag{3.4}
$$
$$
\int|K(u)|\,du<\infty, \tag{3.5}
$$
and
$$
|u|^{1+\varepsilon}|K(u)|\to0\ \text{ as } |u|\to\infty, \tag{3.6}
$$
with some $\varepsilon\ge0$. On the number sequence $\{h_n\}$, we impose
$$
h_n\to0\ \text{ as } n\to\infty, \tag{3.7}
$$
$$
nh_n\to\infty\ \text{ as } n\to\infty. \tag{3.8}
$$

THEOREM 3.1 Let $U$ have a probability density $f(\bullet)$ and let $Em^2(U)<\infty$. Let the Borel measurable kernel $K(\bullet)$ satisfy (3.4), (3.5), and (3.6) with $\varepsilon=0$. Let the sequence $\{h_n\}$ of positive numbers satisfy (3.7) and (3.8). Then,
$$
\hat\mu(u)\to\mu(u)\ \text{ as } n\to\infty \text{ in probability} \tag{3.9}
$$
at every $u\in R$ where both $m(\bullet)$ and $f(\bullet)$ are continuous and $f(u)>0$.

The next theorem is the "almost everywhere" version of Theorem 3.1. The restrictions imposed on the kernel and the number sequence are the same as in Theorem 3.1, with the only exception that (3.6) holds with some $\varepsilon>0$, but not with $\varepsilon=0$.

THEOREM 3.2 Let $U$ have a probability density $f(\bullet)$ and let $Em^2(U)<\infty$. Let the Borel measurable kernel $K(\bullet)$ satisfy (3.4), (3.5), and (3.6) with some $\varepsilon>0$. Let the sequence $\{h_n\}$ of positive numbers satisfy (3.7) and (3.8). Then, convergence (3.9) takes place at every Lebesgue point $u\in R$ of both $m(\bullet)$ and $f(\bullet)$, where $f(u)>0$, and, a fortiori, at almost every $u$ where $f(u)>0$, that is, at almost every $u$ belonging to the support of $f(\bullet)$.

Proof. The proof is very much like that of Theorem 3.1. The difference is that we apply Lemma A.9 rather than Lemma A.8.
The algorithm also converges when the input signal does not have a density, that is, when the distribution of $U$ has an arbitrary shape. The proof of the following theorem is in Section 3.7.1.
THEOREM 3.3 Let $Em^2(U)<\infty$. Let $H(\bullet)$ be a nonnegative nonincreasing Borel function defined on $[0,\infty)$, continuous and positive at $t=0$, and such that
$$
tH(t)\to0\ \text{ as } t\to\infty.
$$
Let, for some $c_1$ and $c_2$,
$$
c_1H(|u|)\le K(u)\le c_2H(|u|).
$$
Let the sequence $\{h_n\}$ of positive numbers satisfy (3.7) and (3.8). Then convergence (3.9) takes place at almost every ($\zeta$) $u\in R$, where $\zeta$ is the probability measure of $U$.
Restrictions (3.7) and (3.8) are satisfied by a wide class of number sequences. If $h_n=cn^{-\delta}$ with $c>0$, they are satisfied for $0<\delta<1$.

The theorems differ in the sets of points at which the estimate converges. One establishes convergence at every continuity point of both $m(\bullet)$ and $f(\bullet)$ where $f(u)>0$. The other does it for every Lebesgue point of both $m(\bullet)$ and $f(\bullet)$, that is, for almost every (with respect to the Lebesgue measure) $u$ where $f(u)>0$, that is, at almost every ($\zeta$) point. In Theorem 3.1 and Theorem 3.3, the kernel satisfies restrictions (3.4), (3.5), and (3.6) with $\varepsilon=0$; in Theorem 3.2, (3.6) holds with some $\varepsilon>0$. If both $m(\bullet)$ and $f(\bullet)$ are bounded and continuous, we can apply kernels satisfying only (3.4) and (3.5), see Remark 3.1. In Theorem 3.3, $U$ has an arbitrary distribution, which means that it may not have a density.

In light of this, to achieve convergence at Lebesgue points and, a fortiori, at continuity points, we can apply the following kernel functions:
the rectangular kernel (3.2), the triangle kernel
$$
K(u)=\begin{cases}1-|u|, & \text{for } |u|\le1\\ 0, & \text{otherwise,}\end{cases}
$$
the Gauss–Weierstrass kernel (see Figure 3.2)
$$
K(u)=\frac{1}{\sqrt{2\pi}}e^{-u^2/2}, \tag{3.10}
$$
Figure 3.2 Gauss–Weierstrass kernel (3.10).
the Poisson kernel
$$
K(u)=\frac{1}{\pi}\,\frac{1}{1+u^2},
$$
the Fejér kernel (see Figure 3.3)
$$
K(u)=\frac{1}{\pi}\,\frac{\sin^2u}{u^2}, \tag{3.11}
$$
and the Lebesgue kernel
$$
K(u)=\frac{1}{2}e^{-|u|}.
$$
All these kernels satisfy (3.4), (3.5), and (3.6) for some $\varepsilon>0$. The kernel
$$
K(u)=\begin{cases}\dfrac{1}{4e}, & \text{for } |u|\le e\\[2pt] \dfrac{1}{4|u|\ln^2|u|}, & \text{otherwise,}\end{cases} \tag{3.12}
$$
satisfies (3.4), (3.5), and (3.6) with $\varepsilon=0$ only. In turn, the kernels
$$
K(u)=\frac{1}{\pi}\,\frac{\sin u}{u} \tag{3.13}
$$
(see Figure 3.4) and
$$
K(u)=\sqrt{\frac{2}{\pi}}\cos u^2 \tag{3.14}
$$
Figure 3.3 Fejér kernel (3.11).
Figure 3.4 Kernel (3.13).
(see Figure 3.5), satisfy (3.4) and (3.5), but not (3.6), even with $\varepsilon=0$. For all the presented kernels, $\int K(u)\,du=1$. Observe that they can be continuous or not, and can have compact or unbounded support.
Notice that Theorem 3.3 admits the following kernel:
$$
K(u)=\begin{cases}\dfrac{1}{e}, & \text{for } |u|\le e\\[2pt] \dfrac{1}{|u|\ln|u|}, & \text{otherwise,}\end{cases}
$$
for which $\int K(u)\,du=\infty$. The restrictions imposed by the theorem are illustrated in Figure 3.6.
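The normalization $\int K(u)\,du=1$ stated above can be checked numerically. A minimal sketch (the grid and its width are our choices; the Poisson and Fejér kernels have $1/u^2$ tails, so a wide grid is needed):

```python
import numpy as np

# Kernels of Section 3.3; each integrates to 1 over the real line.
kernels = {
    "rectangular (3.2)": lambda u: 0.5 * (np.abs(u) <= 1),
    "triangle": lambda u: np.maximum(1 - np.abs(u), 0),
    "Gauss-Weierstrass (3.10)": lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi),
    "Poisson": lambda u: 1.0 / (np.pi * (1 + u**2)),
    "Fejer (3.11)": lambda u: np.sinc(u / np.pi) ** 2 / np.pi,  # sin^2(u)/(pi u^2)
    "Lebesgue": lambda u: 0.5 * np.exp(-np.abs(u)),
}

u = np.linspace(-500, 500, 2_000_001)   # wide symmetric grid
du = u[1] - u[0]
integrals = {name: float(np.sum(K(u)) * du) for name, K in kernels.items()}
for name, val in integrals.items():
    print(f"{name:26s} ~ {val:.3f}")
```

The Fejér kernel is written via `np.sinc` to avoid the removable singularity at $u=0$.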
3.4 Convergence rate
In this section, both the characteristic $m(\bullet)$ and the input density $f(\bullet)$ are smooth functions having $q$ derivatives. Proper selection of the kernel and the number sequence increases the speed at which the estimate converges. We now find the convergence rate.

In our analysis, the kernel satisfies the following additional restrictions:
$$
\int v^iK(v)\,dv=0,\quad\text{for } i=1,2,\ldots,q-1, \tag{3.15}
$$
and
$$
\int|v^{q-1/2}K(v)|\,dv<\infty, \tag{3.16}
$$
Figure 3.5 Kernel (3.14).
Figure 3.6 A kernel satisfying the restrictions of Theorem 3.3.
see the analysis in Section A.2.2. For simplicity of notation, $\int K(v)\,dv=1$. For a fixed $u$, we get
$$
E\hat f(u)=\frac{1}{h_n}\int f(v)K\left(\frac{u-v}{h_n}\right)dv=\int f(u+vh_n)K(-v)\,dv,
$$
which yields
$$
\operatorname{bias}[\hat f(u)]=E\hat f(u)-f(u)=\int(f(u+vh_n)-f(u))K(-v)\,dv.
$$
Assuming that $f^{(q)}(\bullet)$ is square integrable and applying (A.17), we find $\operatorname{bias}[\hat f(u)]=O(h_n^{q-1/2})$. We next recall (3.27) and write $\operatorname{var}[\hat f(u)]=O(1/nh_n)$, which leads to
$$
E(\hat f(u)-f(u))^2=O(h_n^{2q-1})+O\left(\frac{1}{nh_n}\right).
$$
Thus, selecting
$$
h_n\sim n^{-1/2q}, \tag{3.17}
$$
we finally obtain
$$
E(\hat f(u)-f(u))^2=O(n^{-1+1/2q}).
$$
Needless to say, if the $q$th derivative of $g(u)$ is square integrable, then, for the same reasons, $E(\hat g(u)-g(u))^2$ is of the same order. Hence, applying Lemma C.9, we finally obtain the following convergence rate:
$$
P\{|\hat\mu(u)-\mu(u)|>\varepsilon|\mu(u)|\}=O(n^{-1+1/2q})
$$
for any $\varepsilon>0$, and
$$
|\hat\mu(u)-\mu(u)|=O(n^{-1/2+1/4q})\ \text{ as } n\to\infty \text{ in probability.}
$$
If $f^{(q)}(u)$ is bounded, $\operatorname{bias}[\hat f(u)]=O(h_n^q)$, see (A.18); and, for
$$
h_n\sim n^{-1/(2q+1)},
$$
$$
E(\hat f(u)-f(u))^2=O(n^{-1+1/(2q+1)}).
$$
Figure 3.7 Kernel $G_4$.
If, in addition, the $q$th derivative of $g(u)$ is bounded, $E(\hat g(u)-g(u))^2$ is of the same order and, as a consequence,
$$
P\{|\hat\mu(u)-\mu(u)|>\varepsilon|\mu(u)|\}=O(n^{-1+1/(2q+1)})
$$
for any $\varepsilon>0$, and
$$
|\hat\mu(u)-\mu(u)|=O(n^{-1/2+1/(4q+2)})\ \text{ as } n\to\infty \text{ in probability,}
$$
which means that the rate is slightly better.

The rate $O(n^{-q/(2q+1)})$ in probability obtained above is known to be optimal within the class of $q$-times differentiable input densities and nonlinear characteristics, see [285].
It is not difficult to construct kernels satisfying (3.15) such that $\int K(v)\,dv=1$. For example, starting from the Gauss–Weierstrass kernel (3.10), denoted now as $G(\bullet)$, we observe that $\int u^iG(u)\,du=0$ for odd $i$, and $\int u^iG(u)\,du=1\times3\times\cdots\times(i-1)$ for even $i$. Thus, for
$$
G_2(u)=G(u)=\frac{1}{\sqrt{2\pi}}e^{-u^2/2},
$$
(3.15) is satisfied for $q=2$. For the same reasons, for
$$
G_4(u)=\frac{1}{2}(3-u^2)G(u)=\frac{1}{2\sqrt{2\pi}}(3-u^2)e^{-u^2/2} \tag{3.18}
$$
(see Figure 3.7), and
$$
G_6(u)=\frac{1}{8}(15-10u^2+u^4)G(u)=\frac{1}{8\sqrt{2\pi}}(15-10u^2+u^4)e^{-u^2/2},
$$
(3.15) holds for $q=4$ and $q=6$, respectively.

In turn, for the rectangular kernel (3.2), denoted now as $W(\bullet)$, $\int u^iW(u)\,du$ equals zero for odd $i$ and $1/(i+1)$ for even $i$. Thus, for $W_2(u)=W(u)$, (3.15) holds with $q=2$, while for
$$
W_4(u)=\frac{1}{4}(9-15u^2)W(u)=\begin{cases}\dfrac{1}{8}(9-15u^2), & \text{for } |u|\le1\\ 0, & \text{otherwise,}\end{cases} \tag{3.19}
$$
Figure 3.8 Kernel $W_4$.
with $q=4$. For $q=6$, we find
$$
W_6(u)=\frac{5}{64}(45-210u^2+189u^4)W(u) \tag{3.20}
$$
$$
=\begin{cases}\dfrac{5}{128}(45-210u^2+189u^4), & \text{for } |u|\le1\\ 0, & \text{otherwise.}\end{cases}
$$
Kernels $W_4(u)$ and $W_6(u)$ are shown in Figures 3.8 and 3.9, respectively.
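The moment conditions (3.15) for $G_4$ and $W_4$ are easy to confirm numerically; a sketch with a hypothetical helper `moments` (names and integration grids are ours):

```python
import numpy as np

def moments(K, a, b, q, n=2_000_001):
    """Rectangle-rule values of int u^i K(u) du on [a, b] for i = 0..q-1;
    by (3.15) a kernel of order q should give ~[1, 0, ..., 0]."""
    u = np.linspace(a, b, n)
    du = u[1] - u[0]
    return [float(np.sum(u**i * K(u)) * du) for i in range(q)]

G = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)   # Gauss-Weierstrass (3.10)
G4 = lambda u: 0.5 * (3 - u**2) * G(u)                 # kernel (3.18), order 4
W = lambda u: 0.5 * (np.abs(u) <= 1)                   # rectangular kernel (3.2)
W4 = lambda u: 0.25 * (9 - 15 * u**2) * W(u)           # kernel (3.19), order 4

print(moments(G4, -12, 12, 4))
print(moments(W4, -1, 1, 4))
```

Both printed lists are, up to discretization error, $[1,0,0,0]$: the zeroth moment is one and the first three higher moments vanish.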
There is a formal way of generating kernel functions satisfying conditions (3.15) and (3.16) for an arbitrary value of $q$. This technique relies on the theory of orthogonal polynomials, which is examined in Chapter 6. In particular, if one wishes to obtain kernels defined on a compact interval, then we can use the class of Legendre orthogonal polynomials, see Section 6.3 for various properties of this class. Hence, let $\{p_\ell(u);\ 0\le\ell<\infty\}$ be the set of orthonormal Legendre polynomials defined on $[-1,1]$, that is, $\int_{-1}^{1}p_\ell(u)p_j(u)\,du=\delta_{\ell j}$, $\delta_{\ell j}$ being the Kronecker delta, and $p_\ell(u)=\sqrt{\frac{2\ell+1}{2}}P_\ell(u)$, where $P_\ell(u)$ is the $\ell$th-order Legendre polynomial.

The following lemma describes the procedure for generating a kernel function of order $q$ with support on $[-1,1]$.

LEMMA 3.1 The kernel function
$$
K(u)=\sum_{j=0}^{q-1}p_j(0)p_j(u),\quad|u|\le1, \tag{3.21}
$$
satisfies condition (3.15).
Figure 3.9 Kernel $W_6$.
Proof. For $i\le q-1$, consider $\int_{-1}^{1}u^iK(u)\,du$. Since $u^i$ can be expanded into the Legendre series, that is, $u^i=\sum_{\ell=0}^{i}a_\ell p_\ell(u)$, where $a_\ell=\int_{-1}^{1}u^ip_\ell(u)\,du$, then for $K(u)$ defined in (3.21), we have
$$
\int_{-1}^{1}u^iK(u)\,du=\sum_{\ell=0}^{i}\sum_{j=0}^{q-1}a_\ell p_j(0)\int_{-1}^{1}p_\ell(u)p_j(u)\,du
=\sum_{\ell=0}^{i}a_\ell p_\ell(0)=\begin{cases}1 & \text{if } i=0\\ 0 & \text{if } i=1,2,\ldots,q-1.\end{cases}
$$
The proof of Lemma 3.1 has been completed.
It is worth noting that $P_\ell(0)=0$ for $\ell=1,3,5,\ldots$ and $P_\ell(-u)=P_\ell(u)$ for $\ell=0,2,4,\ldots$. Consequently, the kernel in (3.21) is symmetric, and all terms in (3.21) with odd values of $j$ are equal to zero.

Since $p_0(u)=\frac{1}{\sqrt{2}}$ and $p_2(u)=\sqrt{\frac{5}{2}}\left(\frac{3}{2}u^2-\frac{1}{2}\right)$, it is easy to verify that the kernel in (3.21) with $q=4$ is given by
$$
K(u)=\frac{9}{8}-\frac{15}{8}u^2,\quad|u|\le1.
$$
This confirms the form of the kernel $W_4(v)$ given in (3.19).
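Lemma 3.1 can be exercised with NumPy's Legendre utilities; the sketch below (function name ours) builds kernel (3.21) and, for $q=4$, reproduces $K(u)=\frac{9}{8}-\frac{15}{8}u^2$.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_kernel(q):
    """Kernel (3.21): K(u) = sum_{j=0}^{q-1} p_j(0) p_j(u) on [-1, 1],
    with p_j(u) = sqrt((2j+1)/2) P_j(u) the orthonormal Legendre polynomials."""
    def K(u):
        u = np.asarray(u, dtype=float)
        total = np.zeros_like(u)
        for j in range(q):
            c = np.zeros(j + 1)
            c[j] = 1.0                      # coefficient vector selecting P_j
            norm2 = (2 * j + 1) / 2.0       # squared orthonormalization factor
            total += norm2 * legendre.legval(0.0, c) * legendre.legval(u, c)
        return total
    return K

K4 = legendre_kernel(4)
# K4(u) equals 9/8 - (15/8) u^2 on [-1, 1], the kernel W_4 of (3.19)
```

Terms with odd $j$ contribute nothing, since $P_j(0)=0$ for odd $j$, in line with the remark above.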
The result of Lemma 3.1 can be extended to a larger class of orthogonal polynomials defined on a set $S$, that is, when we have a system of functions $\{p_\ell(u);\ 0\le\ell<\infty\}$ defined on $S$ which satisfies
$$
\int_S p_\ell(u)p_j(u)w(u)\,du=\delta_{\ell j},
$$
where $w(u)$ is a weight function, positive on $S$ and such that $w(0)=1$. Then formula (3.21) takes the following modified form:
$$
K(u)=\sum_{j=0}^{q-1}p_j(0)p_j(u)w(u). \tag{3.22}
$$
In particular, this applies to $w(u)=e^{-u^2}$, $-\infty<u<\infty$, with the corresponding Hermite polynomials.
Figure 3.10 Realizations of the estimate for $n=40,80,320,1280$; $a=0.5$, $h_n=n^{-2/5}$ (example in Section 3.6).
Figure 3.11 MISE versus $n$, various $a$; $h_n=n^{-2/5}$ (example in Section 3.6).
Figure 3.12 MISE versus $n$, various $\operatorname{var}(Z)$; $h_n=n^{-2/5}$ (example in Section 3.6).
Figure 3.13 MISE versus $h_n$, various $n$; $a=0.0$ (example in Section 3.6).
Figure 3.14 MISE versus $h_n$, various $n$; $a=0.25$ (example in Section 3.6).
Figure 3.15 MISE versus $h_n$, various $n$; $a=0.5$ (example in Section 3.6).
Figure 3.16 MISE versus $h_n$, various $n$; $a=0.75$ (example in Section 3.6).
3.7 Lemmas and proofs
3.7.1 Lemmas
In Lemma 3.2, $U$ has a density; in Lemma 3.3, the distribution of $U$ is arbitrary.
LEMMA 3.2 Let $U$ have a probability density. Let $Em(U)=0$ and $\operatorname{var}[m(U)]<\infty$. Let the kernel $K(\bullet)$ satisfy (3.4) and (3.5). If (3.6) holds with $\varepsilon=0$, then, for $i\ne0$,
$$
\sup_{h>0}\left|\operatorname{cov}\left[W_{p+i}\frac{1}{h}K\left(\frac{u-U_i}{h}\right),\,W_p\frac{1}{h}K\left(\frac{u-U_0}{h}\right)\right]\right|
\le(|\lambda_p\lambda_{p+i}|+|\lambda_p\lambda_{p-i}|+|\lambda_{p+i}\lambda_{p-i}|)\,\omega(u),
$$
where $\omega(u)$ is finite at every continuity point $u$ of both $m(\bullet)$ and $f(\bullet)$. If $\varepsilon>0$, the property holds at almost every $u\in R$.

Proof. We prove the continuous version of the lemma. The "almost everywhere" version can be verified in a similar way.
Figure 3.17 MISE versus $\delta$, $h_n=n^{-\delta}$, various $n$; $a=0.5$ (example in Section 3.6).
Since $W_{p+i}=\sum_{q=-\infty}^{p+i}\lambda_{p+i-q}m(U_q)$ and $W_p=\sum_{r=-\infty}^{p}\lambda_{p-r}m(U_r)$, the covariance in the assertion equals
$$
\sum_{q=-\infty}^{p+i}\sum_{r=-\infty}^{p}\lambda_{p+i-q}\lambda_{p-r}\operatorname{cov}\left[m(U_q)\frac{1}{h}K\left(\frac{u-U_i}{h}\right),\,m(U_r)\frac{1}{h}K\left(\frac{u-U_0}{h}\right)\right].
$$
Applying Lemma C.2, we find that the above formula is equal to
$$
(\lambda_p\lambda_{p+i}+\lambda_p\lambda_{p-i})\frac{1}{h}E\left\{K\left(\frac{u-U}{h}\right)\right\}\frac{1}{h}E\left\{m^2(U)K\left(\frac{u-U}{h}\right)\right\}
+\lambda_{p+i}\lambda_{p-i}\frac{1}{h^2}E^2\left\{m(U)K\left(\frac{u-U}{h}\right)\right\}.
$$
Let $u$ be a point where both $m(\bullet)$ and $f(\bullet)$ are continuous. It suffices to apply Lemmas A.8 and A.9 to find that the quantities
$$
\sup_{h>0}\frac{1}{h}E\left\{K\left(\frac{u-U}{h}\right)\right\},\quad
\sup_{h>0}E\left\{m(U)\frac{1}{h}K\left(\frac{u-U}{h}\right)\right\},\quad
\sup_{h>0}E\left\{m^2(U)\frac{1}{h}K\left(\frac{u-U}{h}\right)\right\}
$$
are finite.
In the next lemma, $U$ has an arbitrary distribution.

LEMMA 3.3 Let $Em(U)=0$ and $\operatorname{var}[m(U)]<\infty$. If the kernel satisfies the restrictions of Theorem 3.3, then
$$
\limsup_{h\to0}\frac{\left|\operatorname{cov}\left[W_{p+i}K\left(\frac{u-U_i}{h}\right),\,W_pK\left(\frac{u-U_0}{h}\right)\right]\right|}{E^2\left\{K\left(\frac{u-U}{h}\right)\right\}}
\le(|\lambda_p\lambda_{p+i}|+|\lambda_p\lambda_{p-i}|+|\lambda_{p+i}\lambda_{p-i}|)\,\theta(u),
$$
where $\theta(u)$ is finite at almost every ($\zeta$) $u\in R$, $\zeta$ being the distribution of $U$.

Proof. The proof is similar to that of Lemma 3.2. Lemma A.10, rather than Lemmas A.8 and A.9, should be employed.
3.7.2 Proofs
Proof of Theorem 3.1. For the sake of the proof, $Em(U)=0$, see Remark 2.1. Observe that $\hat\mu(u)=\hat g(u)/\hat f(u)$ with
$$
\hat g(u)=\frac{1}{nh_n}\sum_{i=1}^{n}Y_{p+i}K\left(\frac{u-U_i}{h_n}\right) \tag{3.23}
$$
and
$$
\hat f(u)=\frac{1}{nh_n}\sum_{i=1}^{n}K\left(\frac{u-U_i}{h_n}\right). \tag{3.24}
$$
Fix $u\in R$ and suppose that both $m(\bullet)$ and $f(\bullet)$ are continuous at the point.

We will now show that
$$
\hat g(u)\to g(u)\int K(v)\,dv\ \text{ as } n\to\infty \text{ in probability,} \tag{3.25}
$$
where, we recall, $g(u)=\mu(u)f(u)$. Since
$$
E\hat g(u)=\frac{1}{h_n}E\left\{E\{Y_p\,|\,U_0\}K\left(\frac{u-U_0}{h_n}\right)\right\}=\frac{1}{h_n}E\left\{\mu(U)K\left(\frac{u-U}{h_n}\right)\right\},
$$
applying Lemma A.8, we conclude that
$$
E\hat g(u)\to g(u)\int K(v)\,dv\ \text{ as } n\to\infty.
$$
In turn, since $Y_n=W_n+Z_n$,
$$
\operatorname{var}[\hat g(u)]=P_n(u)+Q_n(u)+R_n(u),
$$
where
$$
P_n(u)=\frac{1}{nh_n}\,\sigma_Z^2\,\frac{1}{h_n}E\left\{K^2\left(\frac{u-U}{h_n}\right)\right\},
$$
$$
Q_n(u)=\frac{1}{nh_n}\,\frac{1}{h_n}\operatorname{var}\left[W_pK\left(\frac{u-U_0}{h_n}\right)\right],
$$
and
$$
R_n(u)=\frac{1}{n^2h_n^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}\operatorname{cov}\left[W_{p+i}K\left(\frac{u-U_i}{h_n}\right),\,W_{p+j}K\left(\frac{u-U_j}{h_n}\right)\right]
=\frac{2}{n^2h_n^2}\sum_{i=1}^{n}(n-i)\operatorname{cov}\left[W_{p+i}K\left(\frac{u-U_i}{h_n}\right),\,W_pK\left(\frac{u-U_0}{h_n}\right)\right].
$$
In view of Lemma A.8,
$$
nh_nP_n(u)\to\sigma_Z^2f(u)\int K^2(v)\,dv\ \text{ as } n\to\infty.
$$
Since
$$
\operatorname{var}\left[W_pK\left(\frac{u-U_0}{h_n}\right)\right]
=E\left\{W_p^2K^2\left(\frac{u-U_0}{h_n}\right)\right\}-E^2\left\{W_pK\left(\frac{u-U_0}{h_n}\right)\right\}
=E\left\{\phi(U)K^2\left(\frac{u-U}{h_n}\right)\right\}-E^2\left\{\mu(U)K\left(\frac{u-U}{h_n}\right)\right\}, \tag{3.26}
$$
where $\phi(\bullet)$ is as in (2.7), by Lemma A.8,
$$
nh_nQ_n(u)\to\phi(u)f(u)\int K^2(v)\,dv\ \text{ as } n\to\infty.
$$
Passing to $R_n(u)$, we apply Lemma 3.2 to obtain
$$
|R_n(u)|\le2\omega(u)\frac{1}{n^2}\sum_{i=1}^{n}(n-i)(|\lambda_p\lambda_{p+i}|+|\lambda_p\lambda_{p-i}|+|\lambda_{p+i}\lambda_{p-i}|)
\le6\omega(u)\left(\max_n|\lambda_n|\right)\frac{1}{n}\sum_{i=1}^{\infty}|\lambda_i|=O\left(\frac{1}{n}\right).
$$
Finally,
$$
nh_n\operatorname{var}[\hat g(u)]\to(\sigma_Z^2+\phi(u))f(u)\int K^2(v)\,dv\ \text{ as } n\to\infty.
$$
In this way, we have verified (3.25).

Using similar arguments, we show that $E\hat f(u)\to f(u)\int K(v)\,dv$ as $n\to\infty$ and
$$
nh_n\operatorname{var}[\hat f(u)]\to f(u)\int K^2(v)\,dv\ \text{ as } n\to\infty, \tag{3.27}
$$
and then we conclude that $\hat f(u)\to f(u)\int K(v)\,dv$ as $n\to\infty$ in probability. The proof has been completed.
Proof of Theorem 3.3. In general, the idea of the proof is similar to that of Theorem 3.1. Some modifications, however, are necessary.

Recalling Remark 2.1, with no loss of generality, we assume that $Em(U)=0$ and begin with the observation that $\hat\mu(u)=\hat\xi(u)/\hat\eta(u)$, where
$$
\hat\xi(u)=\frac{1}{nE\left\{K\left(\frac{u-U}{h_n}\right)\right\}}\sum_{i=1}^{n}Y_{p+i}K\left(\frac{u-U_i}{h_n}\right)
$$
and
$$
\hat\eta(u)=\frac{1}{nE\left\{K\left(\frac{u-U}{h_n}\right)\right\}}\sum_{i=1}^{n}K\left(\frac{u-U_i}{h_n}\right).
$$
Obviously,
$$
E\hat\xi(u)=\frac{E\left\{Y_pK\left(\frac{u-U_0}{h_n}\right)\right\}}{E\left\{K\left(\frac{u-U}{h_n}\right)\right\}}
=\frac{E\left\{\mu(U)K\left(\frac{u-U}{h_n}\right)\right\}}{E\left\{K\left(\frac{u-U}{h_n}\right)\right\}},
$$
which, by Lemma A.10, converges to $\mu(u)$ as $n\to\infty$ for almost every ($\zeta$) $u\in R$.
3.8 Bibliographic notes
The kernel regression estimate was proposed independently by Nadaraya [215] and Watson [312] and was the subject of studies performed by Rosenblatt [257], Collomb [55], Greblicki [105], Greblicki and Krzyżak [121], Chu and Marron [51], Fan [87], Müller and Song [212], Jones, Davies, and Park [172], and many others. A comprehensive overview of various kernel methods is presented in Wand and Jones [310]. At first, the density of $U$ was assumed to exist. Since Stone [284], consistency for any distribution has been examined. Later, distribution-free properties were studied by Spiegelman and Sacks [282], Devroye and Wagner [73,74], Devroye [71], Krzyżak and Pawlak [187,188], Greblicki, Krzyżak, and Pawlak [122], and Kozek and Pawlak [179], among others. In particular, the monograph by Györfi, Kohler, Krzyżak, and Walk [140] examines the problem of a distribution-free theory of nonparametric regression.

The kernel regression estimate has been derived in a natural way from the kernel estimate (3.24) of a probability density function introduced by Parzen [226], generalized to multivariate cases by Cacoullos [37], and examined by a number of authors, see, for example, Rosenblatt [256], Van Ryzin [297,298], Deheuvels [65], Wahba [306], Devroye and Wagner [72], Devroye and Györfi [68], and Csörgo and Mielniczuk [58]. See also Härdle [150], Prakasa Rao [241], or Silverman [277] and the papers cited therein. In all the mentioned works, however, the kernel estimate is of form (3.3) with $p=0$, while independent observations $(U_i,Y_i)$ come from a model $Y_n=m(U_n)+Z_n$. In the context of the Hammerstein system, it means that the dynamics is just missing, because the linear subsystem is reduced to a simple delay.

The nonparametric kernel regression estimate was applied to recover the nonlinear characteristic in a Hammerstein system by Greblicki and Pawlak [126]. In Greblicki and Pawlak [129], the input signal has an arbitrary distribution. Not a state equation, but a convolution describing the dynamic subsystem, was applied in Greblicki and Pawlak [127]. The kernel estimate has also been discussed in Krzyżak [182,183], as well as in Krzyżak and Partyka [185]. For very specific distributions of the input signal, the nonparametric kernel regression estimate was studied by Lang [193].
4 Semirecursive kernel algorithms
This chapter is devoted to semirecursive kernel algorithms, modifications of those examined in Chapter 3. Their numerators and denominators can be calculated online. We show consistency and examine the convergence rate. Results are established for all input densities and all input distributions.
4.1 Introduction
We examine the following semirecursive kernel estimates:
$$
\tilde\mu_n(u)=\frac{\sum_{i=1}^{n}\dfrac{1}{h_i}Y_{p+i}K\left(\dfrac{u-U_i}{h_i}\right)}{\sum_{i=1}^{n}\dfrac{1}{h_i}K\left(\dfrac{u-U_i}{h_i}\right)} \tag{4.1}
$$
and
$$
\bar\mu_n(u)=\frac{\sum_{i=1}^{n}Y_{p+i}K\left(\dfrac{u-U_i}{h_i}\right)}{\sum_{i=1}^{n}K\left(\dfrac{u-U_i}{h_i}\right)}, \tag{4.2}
$$
modifications of (3.3). To demonstrate the recursiveness, we observe that $\tilde\mu_n(u)=\tilde g_n(u)/\tilde f_n(u)$, where
$$
\tilde g_n(u)=\frac{1}{n}\sum_{i=1}^{n}Y_{p+i}\frac{1}{h_i}K\left(\frac{u-U_i}{h_i}\right)
$$
and
$$
\tilde f_n(u)=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_i}K\left(\frac{u-U_i}{h_i}\right).
$$
Therefore,
$$
\tilde g_n(u)=\tilde g_{n-1}(u)-\frac{1}{n}\left[\tilde g_{n-1}(u)-Y_{p+n}\frac{1}{h_n}K\left(\frac{u-U_n}{h_n}\right)\right]
$$
and
$$
\tilde f_n(u)=\tilde f_{n-1}(u)-\frac{1}{n}\left[\tilde f_{n-1}(u)-\frac{1}{h_n}K\left(\frac{u-U_n}{h_n}\right)\right].
$$
For the other estimate, $\bar\mu_n(u)=\bar g_n(u)/\bar f_n(u)$ with
$$
\bar g_n(u)=\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}Y_{p+i}K\left(\frac{u-U_i}{h_i}\right)
$$
and
$$
\bar f_n(u)=\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}K\left(\frac{u-U_i}{h_i}\right).
$$
Both $\bar g_n(u)$ and $\bar f_n(u)$ can be calculated with the following recurrence formulas:
$$
\bar g_n(u)=\bar g_{n-1}(u)-\frac{h_n}{\sum_{i=1}^{n}h_i}\left[\bar g_{n-1}(u)-\frac{1}{h_n}Y_{p+n}K\left(\frac{u-U_n}{h_n}\right)\right]
$$
and
$$
\bar f_n(u)=\bar f_{n-1}(u)-\frac{h_n}{\sum_{i=1}^{n}h_i}\left[\bar f_{n-1}(u)-\frac{1}{h_n}K\left(\frac{u-U_n}{h_n}\right)\right].
$$
In both estimates, the starting points
$$
\tilde g_1(u)=\bar g_1(u)=\frac{1}{h_1}Y_{p+1}K\left(\frac{u-U_1}{h_1}\right)
$$
and
$$
\tilde f_1(u)=\bar f_1(u)=\frac{1}{h_1}K\left(\frac{u-U_1}{h_1}\right)
$$
are the same.

Thus, both estimates are semirecursive, because their numerators and denominators can be calculated recursively, but not the estimates themselves.
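The recurrence formulas above translate directly into an online procedure. A minimal sketch for estimate (4.1), evaluated on a fixed grid of points $u$ (the class name, grid-based layout, and rectangular default kernel are our choices):

```python
import numpy as np

class SemirecursiveKernel:
    """Online form of estimate (4.1): the numerator g and denominator f
    are updated recursively via
      g_n = g_{n-1} - (1/n)[g_{n-1} - Y_{p+n} K((u-U_n)/h_n)/h_n],
    and similarly for f; the ratio itself is recomputed on demand."""

    def __init__(self, grid, kernel=lambda t: 0.5 * (np.abs(t) <= 1)):
        self.u = np.asarray(grid, dtype=float)
        self.K = kernel
        self.n = 0
        self.g = np.zeros_like(self.u)
        self.f = np.zeros_like(self.u)

    def update(self, U_n, Y_pn, h_n):
        """Consume one input/output pair (U_n, Y_{p+n}) with bandwidth h_n."""
        self.n += 1
        k = self.K((self.u - U_n) / h_n) / h_n
        self.g += (Y_pn * k - self.g) / self.n   # running mean of Y*K/h
        self.f += (k - self.f) / self.n          # running mean of K/h

    def estimate(self):
        out = np.zeros_like(self.g)
        np.divide(self.g, self.f, out=out, where=self.f != 0)
        return out
```

Each update costs $O(|\text{grid}|)$ operations, and no past observations need to be stored — this is precisely the online property claimed for the numerator and denominator.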
4.2 Consistency and convergence rate
In Theorems 4.1 and 4.2, the input signal has a density; in Theorem 4.3, its distribution is arbitrary.

THEOREM 4.1 Let $U$ have a density $f(\bullet)$ and let $Em^2(U)<\infty$. Let the Borel measurable kernel $K(\bullet)$ satisfy (3.4), (3.5), and (3.6) with $\varepsilon=0$. Let the sequence $\{h_n\}$ satisfy the following restrictions:
$$
h_n\to0\ \text{ as } n\to\infty, \tag{4.3}
$$
$$
\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}\to0\ \text{ as } n\to\infty. \tag{4.4}
$$
Then,
$$
\tilde\mu_n(u)\to\mu(u)\ \text{ as } n\to\infty \text{ in probability} \tag{4.5}
$$
at every $u\in R$ where both $m(\bullet)$ and $f(\bullet)$ are continuous and $f(u)>0$. If (3.6) holds for some $\varepsilon>0$, then the convergence takes place at every Lebesgue point $u\in R$ of both $m(\bullet)$ and $f(\bullet)$ such that $f(u)>0$; a fortiori, at almost every $u$ belonging to the support of $f(\bullet)$.

THEOREM 4.2 Let $U$ have a density $f(\bullet)$ and let $Em^2(U)<\infty$. Let the Borel measurable kernel $K(\bullet)$ satisfy (3.4), (3.5), and (3.6) with $\varepsilon=0$. Let the sequence $\{h_n\}$ satisfy (4.3) and
$$
\sum_{n=1}^{\infty}h_n=\infty. \tag{4.6}
$$
Then,
$$
\bar\mu_n(u)\to\mu(u)\ \text{ as } n\to\infty \text{ in probability} \tag{4.7}
$$
at every $u\in R$ where both $m(\bullet)$ and $f(\bullet)$ are continuous and $f(u)>0$. If (3.6) holds for some $\varepsilon>0$, then the convergence takes place at every Lebesgue point $u\in R$ of both $m(\bullet)$ and $f(\bullet)$ such that $f(u)>0$; a fortiori, at almost every $u$ belonging to the support of $f(\bullet)$.
Estimate (4.2) is consistent not only for $U$ having a density but also for any distribution. In the next theorem, the kernel is the same as in Theorem 3.3.

THEOREM 4.3 Let $Em^2(U)<\infty$. Let the kernel $K(\bullet)$ satisfy the restrictions of Theorem 3.3. Let the sequence $\{h_n\}$ of positive numbers satisfy (4.3) and (4.6). Then, convergence (4.7) takes place at almost every ($\zeta$) point $u\in R$, where $\zeta$ is the probability measure of $U$.

Estimate (4.1) converges if the number sequence satisfies (4.3) and (4.4), while (4.2) converges if (4.3) and (4.6) hold. Thus, for $h_n=cn^{-\delta}$ with $c>0$, both converge if $0<\delta<1$.
Passing to the convergence rate, we proceed as in Section 3.4 and find
$$
\operatorname{bias}[\tilde f_n(u)]=E\tilde f_n(u)-f(u)\int K(v)\,dv=\frac{1}{n}\sum_{i=1}^{n}\int(f(u+vh_i)-f(u))K(-v)\,dv.
$$
Applying (A.17), we obtain
$$
\operatorname{bias}[\tilde f_n(u)]=\frac{1}{n}\sum_{i=1}^{n}O\left(h_i^{q-1/2}\right)=O\left(\frac{1}{n}\sum_{i=1}^{n}h_i^{q-1/2}\right).
$$
Recalling (4.11), we find
$$
E(\tilde f_n(u)-f(u))^2=O\left(\frac{1}{n^2}\left(\sum_{i=1}^{n}h_i^{q-1/2}\right)^2\right)+O\left(\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}\right),
$$
with the first term incurred by the squared bias and the other by the variance. Hence, for
$$
h_n\sim n^{-1/2q}, \tag{4.8}
$$
that is, the same as in (3.17) applied in the offline estimate,
$$
E(\tilde f_n(u)-f(u))^2=O(n^{-1+1/2q}).
$$
Since the same rate holds for $\tilde g_n(u)$, that is, $E(\tilde g_n(u)-g(u))^2=O(n^{-1+1/2q})$, we finally obtain
$$
P\{|\tilde\mu_n(u)-\mu(u)|>\varepsilon|\mu(u)|\}=O(n^{-1+1/2q})
$$
for any $\varepsilon>0$, and
$$
|\tilde\mu_n(u)-\mu(u)|=O(n^{-1/2+1/4q})\ \text{ as } n\to\infty \text{ in probability.}
$$
Considering estimate (4.2) next, for obvious reasons, we write
$$
\operatorname{bias}[\bar f_n(u)]=\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}O\left(h_i^{q+1/2}\right)=O\left(\frac{\sum_{i=1}^{n}h_i^{q+1/2}}{\sum_{i=1}^{n}h_i}\right)
$$
and, due to (4.12),
$$
E(\bar f_n(u)-f(u))^2=O\left(\frac{\left(\sum_{i=1}^{n}h_i^{q+1/2}\right)^2}{\left(\sum_{i=1}^{n}h_i\right)^2}\right)+O\left(\frac{1}{\sum_{i=1}^{n}h_i}\right),
$$
which, for $h_n$ selected as in (4.8), becomes
$$
E(\bar f_n(u)-f(u))^2=O(n^{-1+1/2q}).
$$
Since $E(\bar g_n(u)-g(u))^2=O(n^{-1+1/2q})$, we come to the conclusion that
$$
P\{|\bar\mu_n(u)-\mu(u)|>\varepsilon|\mu(u)|\}=O(n^{-1+1/2q})
$$
for any $\varepsilon>0$, and
$$
|\bar\mu_n(u)-\mu(u)|=O(n^{-1/2+1/4q})\ \text{ as } n\to\infty \text{ in probability.}
$$
If the $q$th derivatives of both $f(u)$ and $g(u)$ are bounded, using (A.18), we obtain
$$
P\{|\tilde\mu_n(u)-\mu(u)|>\varepsilon|\mu(u)|\}=O(n^{-1+1/(2q+1)})
$$
for any $\varepsilon>0$, and
$$
|\tilde\mu_n(u)-\mu(u)|=O(n^{-1/2+1/(4q+2)})\ \text{ as } n\to\infty \text{ in probability,}
$$
that is, somewhat faster convergence. The same rate also holds for $\bar\mu_n(u)$.
4.3 Simulation example
In the system as in Section 2.2.2, $a=0.5$ and $Z_n=0$. Since $\mu(u)=m(u)$, we just estimate $m(u)$ and rewrite the estimates in the following forms:
$$
\tilde m_n(u)=\frac{\sum_{i=1}^{n}\dfrac{1}{h_i}Y_{1+i}K\left(\dfrac{u-U_i}{h_i}\right)}{\sum_{i=1}^{n}\dfrac{1}{h_i}K\left(\dfrac{u-U_i}{h_i}\right)} \tag{4.9}
$$
and
$$
\bar m_n(u)=\frac{\sum_{i=1}^{n}Y_{1+i}K\left(\dfrac{u-U_i}{h_i}\right)}{\sum_{i=1}^{n}K\left(\dfrac{u-U_i}{h_i}\right)}. \tag{4.10}
$$
For the rectangular kernel and $h_n=n^{-1/5}$, the MISE for both estimates is shown in Figure 5.5 in Section 5.4. For $h_n=n^{-\delta}$ with $\delta$ varying in the interval $[-0.25,1.5]$, the error is shown in Figures 4.1 and 4.2.
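An experiment of this kind can be imitated in a few lines. Since the system of Section 2.2.2 is not reproduced in this chunk, the sketch below substitutes a stand-in first-order Hammerstein system $W_t=m(U_t)+aW_{t-1}$ (so $\lambda_0=1$), with $a=0.5$, $Z_n=0$, and a hypothetical odd nonlinearity $m(u)=\tanh u$, for which $\mu(u)=m(u)$; it evaluates the MISE of estimate (4.10) with $p=0$ over a grid. All names and parameter values are our illustration, not the book's.

```python
import numpy as np

def simulate(n, a=0.5, seed=0):
    """Stand-in Hammerstein system: W_t = m(U_t) + a W_{t-1}, Y_t = W_t (Z_t = 0)."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(-3, 3, n)
    m = np.tanh(U)            # hypothetical odd nonlinearity, Em(U) = 0
    Y = np.empty(n)
    w = 0.0
    for t in range(n):
        w = m[t] + a * w
        Y[t] = w
    return U, Y

def mise_of_mbar(U, Y, delta, grid):
    """MISE of estimate (4.10) with the rectangular kernel and h_i = i^{-delta}."""
    h = np.arange(1, len(U) + 1.0) ** -delta
    K = 0.5 * (np.abs((grid[:, None] - U[None, :]) / h[None, :]) <= 1)
    num, den = K @ Y, K.sum(axis=1)
    est = np.zeros_like(num)
    np.divide(num, den, out=est, where=den != 0)
    return float(np.mean((est - np.tanh(grid)) ** 2))

U, Y = simulate(5000)
grid = np.linspace(-2, 2, 41)
print(mise_of_mbar(U, Y, 0.2, grid))
```

Sweeping `delta` over a range of values reproduces the qualitative shape of the MISE-versus-$\delta$ curves in Figures 4.1 and 4.2.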
Figure 4.1 Estimate (4.9); MISE versus $\delta$, various $n$; $h_n=n^{-\delta}$ (Section 4.3).
Figure 4.2 Estimate (4.10); MISE versus $\delta$, various $n$; $h_n=n^{-\delta}$ (Section 4.3).
4.4 Proofs and lemmas
4.4.1 Lemmas
The system

LEMMA 4.1 Let $U$ have a probability density $f(\bullet)$. Let $Em(U)=0$ and $\operatorname{var}[m(U)]<\infty$. Let $i\ne j$. Let the kernel satisfy (3.4) and (3.5). If (3.6) holds with $\varepsilon=0$, then
$$
\sup_{h>0,\,H>0}\left|\operatorname{cov}\left[W_{p+i}\frac{1}{h}K\left(\frac{u-U_i}{h}\right),\,W_{p+j}\frac{1}{H}K\left(\frac{u-U_j}{H}\right)\right]\right|
\le(|\lambda_p\lambda_{p+i-j}|+|\lambda_p\lambda_{p-i+j}|+|\lambda_{p+i-j}\lambda_{p-i+j}|)\,\rho(u),
$$
where $\rho(u)$ is finite at every continuity point $u$ of both $m(\bullet)$ and $f(\bullet)$. If $\varepsilon>0$, the property holds at almost every $u\in R$.
Proof. As $W_{p+i}=\sum_{q=-\infty}^{p+i}\lambda_{p+i-q}m(U_q)$ and $W_{p+j}=\sum_{r=-\infty}^{p+j}\lambda_{p+j-r}m(U_r)$, the covariance in the assertion equals (see Lemma C.2)
$$
\sum_{q=-\infty}^{p+i}\sum_{r=-\infty}^{p+j}\lambda_{p+i-q}\lambda_{p+j-r}\operatorname{cov}\left[m(U_q)\frac{1}{h}K\left(\frac{u-U_i}{h}\right),\,m(U_r)\frac{1}{H}K\left(\frac{u-U_j}{H}\right)\right],
$$
which is equal to
$$
\lambda_p\lambda_{p+i-j}\frac{1}{h}E\left\{K\left(\frac{u-U}{h}\right)\right\}\frac{1}{H}E\left\{m^2(U)K\left(\frac{u-U}{H}\right)\right\}
+\lambda_p\lambda_{p-i+j}\frac{1}{H}E\left\{K\left(\frac{u-U}{H}\right)\right\}\frac{1}{h}E\left\{m^2(U)K\left(\frac{u-U}{h}\right)\right\}
$$
$$
+\,\lambda_{p+i-j}\lambda_{p-i+j}\frac{1}{h}E\left\{m(U)K\left(\frac{u-U}{h}\right)\right\}\frac{1}{H}E\left\{m(U)K\left(\frac{u-U}{H}\right)\right\}.
$$
Let $u$ be a point where both $m(\bullet)$ and $f(\bullet)$ are continuous. It suffices to apply Lemma A.8 to find that the quantities
$$
\sup_{h>0}\frac{1}{h}E\left\{K\left(\frac{u-U}{h}\right)\right\},\quad
\sup_{h>0}E\left\{m(U)\frac{1}{h}K\left(\frac{u-U}{h}\right)\right\},\quad
\sup_{h>0}E\left\{m^2(U)\frac{1}{h}K\left(\frac{u-U}{h}\right)\right\}
$$
are finite at every continuity point of both $m(\bullet)$ and $f(\bullet)$. The "almost everywhere" version of the lemma can be verified in a similar way.
In the next lemma, $U$ has an arbitrary distribution.

LEMMA 4.2 Let $Em(U)=0$ and $\operatorname{var}[m(U)]<\infty$. If the kernel satisfies the restrictions of Theorem 3.3, then, for $i\ne j$,
$$
\sup_{h>0,\,H>0}\frac{\left|\operatorname{cov}\left[W_{p+i}K\left(\frac{u-U_i}{h}\right),\,W_{p+j}K\left(\frac{u-U_j}{H}\right)\right]\right|}
{E\left\{K\left(\frac{u-U}{h}\right)\right\}E\left\{K\left(\frac{u-U}{H}\right)\right\}}
\le(|\lambda_p\lambda_{p+i-j}|+|\lambda_p\lambda_{p-i+j}|+|\lambda_{p+i-j}\lambda_{p-i+j}|)\,\eta(u),
$$
where $\eta(u)$ is finite at almost every ($\zeta$) $u\in R$, $\zeta$ being the distribution of $U$.

Proof. It suffices to apply the arguments used in the proof of Lemma 3.3.
Number sequences

LEMMA 4.3 If (4.3) and (4.4) hold, then
$$
\lim_{n\to\infty}\frac{1}{n}\cdot\frac{1}{\dfrac{1}{n^2}\displaystyle\sum_{i=1}^{n}\frac{1}{h_i}}=0.
$$
Proof. From
$$
n^2=\left(\sum_{i=1}^{n}h_i^{1/2}\frac{1}{h_i^{1/2}}\right)^2\le\left(\sum_{i=1}^{n}h_i\right)\left(\sum_{i=1}^{n}\frac{1}{h_i}\right),
$$
it follows that
$$
\frac{1}{n}\cdot\frac{1}{\dfrac{1}{n^2}\displaystyle\sum_{i=1}^{n}\frac{1}{h_i}}\le\frac{1}{n}\sum_{i=1}^{n}h_i,
$$
which converges to zero as $n\to\infty$.
LEMMA 4.4 (TOEPLITZ) If $a_n\ge0$, $\sum_{i=1}^{n}a_i\to\infty$, and $x_n\to x$ as $n\to\infty$, then
$$
\frac{\sum_{i=1}^{n}a_ix_i}{\sum_{i=1}^{n}a_i}\to x\ \text{ as } n\to\infty.
$$
Proof. The proof is immediate. For any $\varepsilon>0$, there exists $N$ such that $|x_n-x|<\varepsilon$ for $n>N$. Hence,
$$
\frac{\sum_{i=1}^{n}a_ix_i}{\sum_{i=1}^{n}a_i}-x
=\frac{\sum_{i=1}^{N}a_i(x_i-x)}{\sum_{i=1}^{n}a_i}+\frac{\sum_{i=N+1}^{n}a_i(x_i-x)}{\sum_{i=1}^{n}a_i},
$$
where the first term is bounded in absolute value by $c/\sum_{i=1}^{n}a_i$ for some $c$, and the other by $\varepsilon$.
4.4.2 Proofs
Proof of Theorem 4.1. We give the continuous version of the proof. To verify the "almost everywhere" version, it suffices to apply Lemma A.9 rather than Lemma A.8.

Suppose that both $m(\bullet)$ and $f(\bullet)$ are continuous at $u\in R$. We start from the observation that
$$
E\tilde g_n(u)=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_i}E\left\{E\{Y_p\,|\,U_0\}K\left(\frac{u-U_0}{h_i}\right)\right\}
=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_i}E\left\{\mu(U)K\left(\frac{u-U}{h_i}\right)\right\}.
$$
Since
$$
\frac{1}{h_i}E\left\{\mu(U)K\left(\frac{u-U}{h_i}\right)\right\}\to g(u)\int K(v)\,dv\ \text{ as } i\to\infty
$$
(see Lemma A.8), we conclude that $E\tilde g_n(u)\to g(u)\int K(v)\,dv$ as $n\to\infty$, where, according to our notation, $g(u)=\mu(u)f(u)$.

To examine the variance, we write $\operatorname{var}[\tilde g_n(u)]=P_n(u)+Q_n(u)+R_n(u)$ with
$$
P_n(u)=\sigma_Z^2\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i^2}\operatorname{var}\left[K\left(\frac{u-U}{h_i}\right)\right],
$$
$$
Q_n(u)=\frac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}\left[W_p\frac{1}{h_i}K\left(\frac{u-U_0}{h_i}\right)\right],
$$
and

\[
\begin{aligned}
R_n(u)&=\frac{1}{n^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}\operatorname{cov}\left[W_{p+i}\frac{1}{h_i}K\left(\frac{u-U_i}{h_i}\right),\,W_{p+j}\frac{1}{h_j}K\left(\frac{u-U_j}{h_j}\right)\right]\\
&=\frac{1}{n^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}\operatorname{cov}\left[W_{p+i-j}\frac{1}{h_i}K\left(\frac{u-U_{i-j}}{h_i}\right),\,W_p\frac{1}{h_j}K\left(\frac{u-U_0}{h_j}\right)\right].
\end{aligned}
\]
Since

\[
P_n(u)=\sigma_Z^2\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}\left[\frac{1}{h_i}EK^2\left(\frac{u-U}{h_i}\right)-h_i\frac{1}{h_i^2}E^2K\left(\frac{u-U}{h_i}\right)\right],
\]

using Lemma A.8 we find that the quantity in square brackets converges to \(f(u)\int K^2(v)\,dv\) as i → ∞. Noticing that \(\sum_{n=1}^{\infty}h_n^{-1}=\infty\) and applying the Toeplitz Lemma 4.4, we conclude that

\[
\frac{1}{\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}}\,P_n(u)\to\sigma_Z^2 f(u)\int K^2(v)\,dv\quad\text{as } n\to\infty.
\]
For the same reasons, observing that

\[
Q_n(u)=\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}\left[\frac{1}{h_i}E\left\{\phi(U)K^2\left(\frac{u-U}{h_i}\right)\right\}-h_i\frac{1}{h_i^2}E^2\left\{W_pK\left(\frac{u-U_0}{h_i}\right)\right\}\right],
\]

where φ(•) is as in (2.7), we obtain

\[
\frac{1}{\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}}\,Q_n(u)\to\phi(u)f(u)\int K^2(v)\,dv\quad\text{as } n\to\infty.
\]
Moreover, using Lemma 4.1,

\[
|R_n(u)|\le\frac{1}{n^2}\,\rho(u)\sum_{i=1}^{n}\sum_{j=1}^{n}\big(|\lambda_p\lambda_{p+i-j}|+|\lambda_p\lambda_{p-i+j}|+|\lambda_{p+i-j}\lambda_{p-i+j}|\big)\le\frac{3}{n}\,\rho(u)\left(\max_n|\lambda_n|\right)\sum_{n=1}^{\infty}|\lambda_n|=O\left(\frac{1}{n}\right).
\]

Using Lemma 4.3, we conclude that R_n(u) vanishes faster than both P_n(u) and Q_n(u), and then we obtain

\[
\frac{1}{\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}}\operatorname{var}[\tilde g_n(u)]\to\big(\sigma_Z^2+\phi(u)\big)f(u)\int K^2(v)\,dv\quad\text{as } n\to\infty. \tag{4.11}
\]

For similar reasons, \(E\tilde f_n(u)\to f(u)\int K(v)\,dv\) as n → ∞ and

\[
\frac{1}{\frac{1}{n^2}\sum_{i=1}^{n}\frac{1}{h_i}}\operatorname{var}[\tilde f_n(u)]\to f(u)\int K^2(v)\,dv\quad\text{as } n\to\infty,
\]

which completes the proof.
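As a Monte Carlo sketch of the limits just established (for the density factor only, and with i.i.d. standard normal inputs rather than the dependent system signals covered by the theorem), the semirecursive estimate (1/n)Σ(1/h_i)K((u−U_i)/h_i) with assumed bandwidths h_i = i^{-1/5} settles near f(u)∫K(v)dv = f(u) for a Gaussian kernel:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50_000
u = 0.0
U = rng.standard_normal(n)            # i.i.d. stand-in for the input signal
h = np.arange(1, n + 1) ** (-0.2)     # assumed bandwidths h_i = i**(-1/5)

def K(v):
    # Gaussian kernel, with int K(v) dv = 1
    return np.exp(-0.5 * v**2) / np.sqrt(2 * np.pi)

f_tilde = np.mean(K((u - U) / h) / h)  # semirecursive density estimate at u
print(f_tilde)  # near f(0) = 1/sqrt(2*pi) ≈ 0.399
```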
Proof of Theorem 4.2
Suppose that both m(•) and f(•) are continuous at a point u ∈ R. Evidently,

\[
E\bar g_n(u)=\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\,\frac{1}{h_i}E\left[E\{Y_p\mid U_0\}K\left(\frac{u-U_0}{h_i}\right)\right]=\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\,\frac{1}{h_i}E\left[\mu(U)K\left(\frac{u-U}{h_i}\right)\right].
\]

Since (4.6) holds and

\[
\frac{1}{h_i}E\left[\mu(U)K\left(\frac{u-U}{h_i}\right)\right]\to g(u)\int K(v)\,dv\quad\text{as } i\to\infty
\]

(see Lemma A.8), an application of the Toeplitz Lemma 4.4 gives

\[
E\bar g_n(u)\to g(u)\int K(v)\,dv\quad\text{as } n\to\infty.
\]
To examine the variance, we write var[ḡ_n(u)] = P_n(u) + Q_n(u) + R_n(u), where

\[
P_n(u)=\sigma_Z^2\frac{1}{\left(\sum_{i=1}^{n}h_i\right)^2}\sum_{i=1}^{n}\operatorname{var}\left[K\left(\frac{u-U}{h_i}\right)\right],
\]

\[
Q_n(u)=\frac{1}{\left(\sum_{i=1}^{n}h_i\right)^2}\sum_{i=1}^{n}\operatorname{var}\left[W_pK\left(\frac{u-U_0}{h_i}\right)\right],
\]

and

\[
\begin{aligned}
R_n(u)&=\frac{1}{\left(\sum_{i=1}^{n}h_i\right)^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}\operatorname{cov}\left[W_{p+i}K\left(\frac{u-U_i}{h_i}\right),\,W_{p+j}K\left(\frac{u-U_j}{h_j}\right)\right]\\
&=\frac{1}{\left(\sum_{i=1}^{n}h_i\right)^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}\operatorname{cov}\left[W_{p+i-j}K\left(\frac{u-U_{i-j}}{h_i}\right),\,W_pK\left(\frac{u-U_0}{h_j}\right)\right].
\end{aligned}
\]
Since

\[
P_n(u)=\sigma_Z^2\frac{1}{\sum_{i=1}^{n}h_i}\,P_{1n}(u)
\]

with

\[
P_{1n}(u)=\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\left[\frac{1}{h_i}EK^2\left(\frac{u-U}{h_i}\right)-h_i\frac{1}{h_i^2}E^2K\left(\frac{u-U}{h_i}\right)\right]
\]

converging, due to (4.6) and the Toeplitz Lemma 4.4, to the same limit as

\[
\frac{1}{h_n}EK^2\left(\frac{u-U}{h_n}\right)-h_n\frac{1}{h_n^2}E^2K\left(\frac{u-U}{h_n}\right),
\]

we get

\[
\left(\sum_{i=1}^{n}h_i\right)P_n(u)\to\sigma_Z^2 f(u)\int K^2(v)\,dv\quad\text{as } n\to\infty.
\]
For the same reasons, observing that

\[
Q_n(u)=\frac{1}{\left(\sum_{i=1}^{n}h_i\right)^2}\sum_{i=1}^{n}h_i\left[\frac{1}{h_i}E\left\{\phi(U)K^2\left(\frac{u-U}{h_i}\right)\right\}-h_i\frac{1}{h_i^2}E^2\left\{W_pK\left(\frac{u-U_0}{h_i}\right)\right\}\right],
\]

where φ(•) is as in (2.7), we obtain

\[
\left(\sum_{i=1}^{n}h_i\right)Q_n(u)\to\phi(u)f(u)\int K^2(v)\,dv\quad\text{as } n\to\infty.
\]
Applying Lemma 4.1, we get

\[
\begin{aligned}
\left(\sum_{i=1}^{n}h_i\right)|R_n(u)|&\le\rho(u)\,\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\sum_{j=1}^{n}h_j\big(|\lambda_p\lambda_{p+i-j}|+|\lambda_p\lambda_{p-i+j}|+|\lambda_{p+i-j}\lambda_{p-i+j}|\big)\\
&\le 3\rho(u)\left(\max_n h_n\right)\left(\max_n|\lambda_n|\right)\frac{1}{\sum_{i=1}^{n}h_i}\sum_{i=1}^{n}h_i\alpha_i,
\end{aligned}
\]

where \(\alpha_i=\sum_{j=i-p}^{\infty}|\lambda_j|\). Since \(\lim_{i\to\infty}\alpha_i=0\), applying the Toeplitz Lemma 4.4 we get \(\lim_{n\to\infty}\left(\sum_{i=1}^{n}h_i\right)R_n(u)=0\), which means that R_n(u) vanishes faster than both P_n(u) and Q_n(u). Finally,

\[
\left(\sum_{i=1}^{n}h_i\right)\operatorname{var}[\bar g_n(u)]\to\big(\sigma_Z^2+\phi(u)\big)f(u)\int K^2(v)\,dv\quad\text{as } n\to\infty. \tag{4.12}
\]

Since, for the same reasons, \(E\bar f_n(u)\to f(u)\int K(v)\,dv\) and \(\left(\sum_{i=1}^{n}h_i\right)\operatorname{var}[\bar f_n(u)]\to f(u)\int K^2(v)\,dv\) as n → ∞, the proof is completed.
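The same kind of Monte Carlo sketch applies to the bandwidth-weighted estimate of this proof (again only for the density factor, with i.i.d. N(0,1) inputs and the assumed h_i = i^{-1/5}): the estimate (Σh_i)^{-1} Σ K((u−U_i)/h_i) also settles near f(u), while its fluctuations are governed by the normalizing sequence Σh_i of (4.12) rather than the sequence n²/Σh_i^{-1} of (4.11).

```python
import numpy as np

rng = np.random.default_rng(1)

n = 50_000
u = 0.0
U = rng.standard_normal(n)            # i.i.d. stand-in for the input signal
h = np.arange(1, n + 1) ** (-0.2)     # assumed bandwidths h_i = i**(-1/5)

# Gaussian kernel evaluated at (u - U_i)/h_i
K = np.exp(-0.5 * ((u - U) / h) ** 2) / np.sqrt(2 * np.pi)
f_bar = np.sum(K) / np.sum(h)         # bandwidth-weighted density estimate at u
print(f_bar)  # near f(0) = 1/sqrt(2*pi) ≈ 0.399
```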
Proof of Theorem 4.3
Each convergence in the proof holds for almost every (ζ) u ∈ R. In a preparatory step, we show that
∞n =