
Hoon Sohn
David W. Allen
Engineering Sciences and Applications Division, Weapons Response Group,
Los Alamos National Laboratory, Los Alamos, NM 87545, USA

Keith Worden
Dynamics Research Group, Department of Mechanical Engineering,
University of Sheffield, Sheffield, UK

Charles R. Farrar
Engineering Sciences and Applications Division, Weapons Response Group,
Los Alamos National Laboratory, Los Alamos, NM 87545, USA

Structural Damage Classification Using Extreme Value Statistics

The first and most important objective of any damage identification algorithm is to ascertain with confidence whether damage is present or not. Many methods have been proposed for damage detection based on ideas of novelty detection founded in pattern recognition and multivariate statistics. The philosophy of novelty detection is simple. Features are first extracted from a baseline system to be monitored, and subsequent data are then compared to see if the new features are outliers that significantly depart from the rest of the population. In damage diagnosis problems, the assumption is that outliers are generated from a damaged condition of the monitored system. This damage classification necessitates the establishment of a decision boundary. The choice of this threshold value is often based on the assumption that the parent distribution of the data is Gaussian in nature. While the problem of novelty detection focuses attention on the outliers or extreme values of the data, i.e., those points in the tails of the distribution, threshold selection under the normality assumption is weighted toward the central population of the data. Therefore, this normality assumption can impose misleading behavior on the damage classification and is likely to lead the damage diagnosis astray. In this paper, extreme value statistics is integrated with novelty detection to specifically model the tails of the distribution of interest. Finally, the proposed technique is demonstrated on simulated numerical data and on time series data measured from an eight degree-of-freedom spring-mass system. [DOI: 10.1115/1.1849240]

Keywords: Extreme Value Statistics, Novelty Detection, Damage Detection, Time Series Analysis, Vibration Test

1 Introduction

This paper is concerned with novelty detection in an unsupervised learning mode, which is the first level of damage identification. When applied to structural health monitoring, unsupervised learning means that data from the damaged condition are not available to aid in the damage detection process. The objective of unsupervised novelty detection is to establish a model of the system or structure's normal condition and thereafter to signal significant departures from this normal condition. In many ways, the technology of novelty detection encompasses traditional condition monitoring. However, the new term is a convenient means of recognizing the significant recent contributions to the field from multivariate statistics and pattern recognition.

The first objective of novelty detection is to establish a model of the normal system condition based on the damage-sensitive features extracted from measured system response data. This objective can be accomplished in several ways. The more direct methods seek to model the probability distribution of the normal condition using a priori training data. One of the simplest, the outlier approach [1], assumes a Gaussian distribution for the damage-sensitive features and parametrizes the model distribution using the estimated mean vector and covariance matrix. More sophisticated approaches use Gaussian mixture models [2,3] or kernel density estimates [4]. The main limitation of all of these methods is that they make unwarranted assumptions about the nature of the feature distribution tails. These assumptions are potentially hazardous, as the extreme events that reside in the tails of the normal condition are likely to be misinterpreted. More specifically, novelty detection constructs a model based entirely on central statistics (the mean vector and covariance matrix), and the analysis is largely insensitive to the structure of the tails. Another way of regarding this problem is as a question of setting an appropriate threshold for novelty. If the true distribution of the structural normal condition is heavy tailed, there are likely to be many false positives, indicating damage when the structure is in reality undamaged.

The major problems with modeling the undamaged condition of a system are that the functional form of the distribution is unknown and there are infinitely many candidate distributions that may be appropriate for the prediction application. Furthermore, in some cases, only extreme values of events may be recorded because of sensor or storage limitations. Therefore, modeling the data with a parent distribution could also produce erroneous results. For example, seismic stations are primarily interested in recording strong ground motion, motion beyond a certain magnitude with sufficient strength to affect people and their environment [5]. In addition, measurements of peak strains or accelerations are enough to monitor the base isolation systems of buildings and bridges [6]. Currently, a choice among the infinitely many candidate distributions is made by a knowledgeable operator, and the parameters are then estimated from training data. This process is largely subjective. Any choice of distribution and parameters will also constrain the behavior of the tails to that of the prescribed distribution.

In fact, there is a large body of statistical theory that is explicitly concerned with modeling the tails of distributions, and these statistical procedures can be applied to the problem of novelty detection. The relevant field is referred to as extreme value statistics (EVS), a branch of order statistics. There are many excellent textbooks and monographs in this field. Some are considered classics [7,8], and others are more recent [9–11]. Castillo [12] is notable for its concern with engineering problems in fields such as meteorology, hydrology, ocean engineering, pollution studies, and strength of materials. Although EVS has been widely applied, there has been little application of these techniques to novelty detection. Roberts [2,3] introduced the ideas of EVS into novelty detection in the biosignal processing context. This paper illustrates the use of EVS in its own right, not as another way of looking at Gaussian distributions, in an effort to avoid such assumptions.

Contributed by the Dynamic Systems, Measurement, and Control Division of THE AMERICAN SOCIETY OF MECHANICAL ENGINEERS for publication in the ASME JOURNAL OF DYNAMIC SYSTEMS, MEASUREMENT, AND CONTROL. Manuscript final revision, December 26, 2003. Review conducted by F. Ghorbel.



The layout of this paper is as follows: Section 2 provides a brief description of the theories used in this paper. Section 2.1 presents a feature extraction procedure based on time series analysis, Sec. 2.2 addresses the data normalization issue, and Sec. 2.3 describes the novelty index. Section 2.4 provides an introduction to EVS, followed by Sec. 2.5, which describes parameter estimation techniques for fitting EVS distributions to extreme value data. Section 3 compares thresholds of the novelty index calculated using the Gaussian assumption with those calculated using EVS for three different distributions. Section 4 explores the integration of EVS into damage detection for an eight degree-of-freedom (DOF) spring-mass system. Section 5 closes the paper with a summary and the conclusions of the work.

2 Theories

2.1 Time Series Analysis. A linear prediction model combining auto-regressive (AR) and auto-regressive with exogenous inputs (ARX) models is employed to compute input parameters for the subsequent analysis of an auto-associative neural network presented in Sec. 2.2. First, all time signals are standardized prior to fitting an AR model such that

\hat{x} = (x - \mu_x) / \sigma_x    (1)

where \hat{x} is the standardized signal, and \mu_x and \sigma_x are the mean and standard deviation of x, respectively. This standardization procedure is applied to all signals employed in this study. (However, for simplicity, x is used to denote \hat{x} hereafter.)

For a given time signal x(t), an AR model with r auto-regressive terms is constructed. An AR(r) model can be written as [13]

x(t) = \sum_{j=1}^{r} \phi_{xj} \, x(t-j) + e_x(t)    (2)

The AR order is set to 30 for the experimental study presented in Sec. 4 based on a partial autocorrelation analysis described by Box et al. [13]. For the construction of the two-stage prediction model proposed in this study, it is assumed that the error between the measurement and the prediction obtained by the AR model [e_x(t) in Eq. (2)] is mainly caused by the unknown external input. Based on this assumption, an ARX model is employed to reconstruct the input/output relationship between e_x(t) and x(t):

x(t) = \sum_{i=1}^{p} \alpha_i \, x(t-i) + \sum_{j=1}^{q} \beta_j \, e_x(t-j) + \varepsilon_x(t)    (3)

where \varepsilon_x(t) is the residual error after fitting the ARX(p,q) model to the e_x(t) and x(t) pair. The feature for damage diagnosis will later be related to this quantity \varepsilon_x(t). Note that this AR-ARX modeling is similar to a linear approximation method of an auto-regressive moving-average (ARMA) model presented in Ref. [14] and references therein. Ljung [14] suggests keeping the sum of p and q smaller than r (p + q < r). Although the p and q values of the ARX model are set rather arbitrarily, similar results are obtained for different combinations of p and q values as long as the sum of p and q is kept smaller than r. The \alpha_i and \beta_j coefficients of the ARX model are used as input parameters for the following analysis of the auto-associative neural network. ARX(5,5) is used for this specific experimental study. A more detailed discussion of AR-ARX modeling can be found in Ref. [15].
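To make the two-stage AR-ARX procedure concrete, the following Python sketch fits both models by ordinary least squares. The paper does not specify its estimator, so the least-squares formulation, the function names, and the synthetic signal below are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def standardize(x):
    """Eq. (1): subtract the mean and divide by the standard deviation."""
    return (x - x.mean()) / x.std()

def fit_ar(x, r):
    """Least-squares fit of an AR(r) model, Eq. (2); returns coefficients and residuals e_x(t)."""
    X = np.column_stack([x[r - j:len(x) - j] for j in range(1, r + 1)])
    y = x[r:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return phi, y - X @ phi

def fit_arx(x, e, p, q):
    """Least-squares fit of the ARX(p,q) model, Eq. (3); x and e must share the same time index."""
    m, n = max(p, q), len(x)
    cols = [x[m - i:n - i] for i in range(1, p + 1)] + \
           [e[m - j:n - j] for j in range(1, q + 1)]
    X, y = np.column_stack(cols), x[m:]
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta, y - X @ theta          # (alpha_i, beta_j) coefficients and residual eps_x(t)

# Illustrative use on a synthetic signal, with r = 30 and ARX(5,5) as in the paper
x = standardize(np.random.default_rng(0).standard_normal(4096))
phi, e = fit_ar(x, r=30)
theta, eps = fit_arx(x[30:], e, p=5, q=5)   # x[30:] aligns x(t) with e_x(t)
```

The vector theta would then serve as the feature set fed to the auto-associative neural network described next.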

2.2 Data Normalization. In the previous section, a time prediction model called an AR-ARX model is developed to extract damage-sensitive features. Then, a nonlinear principal component analysis (NLPCA) is employed here for data normalization, which separates the effect of damage on the extracted features from the effects caused by the environmental and operational variations of the system. In reality, structures are subject to changing environmental and operational conditions that affect measured signals, and environmental and operational variations of the system can often mask subtle changes in the system's vibration signal caused by damage [16].

A conventional principal component analysis (PCA) has been proven to facilitate many types of multivariate data analysis, including data reduction and visualization, data validation, fault detection, and correlation analysis [17]. Similar to PCA, NLPCA is used as an aid to multivariate data analysis. While PCA is restricted to mapping only linear correlations among variables, NLPCA can reveal the nonlinear correlations present in data. If nonlinear correlations exist among variables in the original data, NLPCA can reproduce the original data with greater accuracy and/or with fewer factors than PCA. NLPCA can be realized by training a feedforward neural network to perform the identity mapping, where the network outputs are simply the reproduction of the network inputs. For this reason, this special kind of neural network is called an auto-associative neural network (Fig. 1). The network consists of an internal "bottleneck" layer, two additional hidden layers, and one output layer. The bottleneck layer contains fewer nodes than the input or output layers, forcing the network to develop a compact representation of the input data. More detailed discussions of PCA, NLPCA, and auto-associative networks can be found in Fukunaga [18], Kramer [19], and Rumelhart and McClelland [20], respectively.

Using the previously extracted features, which are the parameters of the AR-ARX model corresponding to the normal conditions, as inputs, the auto-associative neural network is trained to characterize the underlying dependency of the extracted features on the unmeasured environmental and operational variations by treating these environmental and operational conditions as hidden intrinsic variables in the neural network. When a new time signal is recorded from an unknown state of the system, the parameters of the time prediction model are computed for the new data set and are fed to the trained neural network. When the structure undergoes structural degradation, it is expected that the prediction errors of the neural network will increase for the damage case. Based on this premise, a damage classifier is constructed using the novelty detection described in the following section to identify damage.
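The auto-associative network of Fig. 1 (mapping layer, bottleneck, de-mapping layer) can be mimicked, purely for illustration, with a generic multilayer perceptron trained to reproduce its own inputs. The scikit-learn stand-in below, the layer sizes, and the random placeholder features are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Feature matrix: one row of ARX coefficients (alpha_i, beta_j) per measured signal
# from the undamaged system; random placeholders stand in for real features here.
features = np.random.default_rng(0).standard_normal((75, 10))

# Mapping-bottleneck-demapping hidden layers; the small bottleneck plays the role of the
# hidden intrinsic (environmental/operational) variables. Layer sizes are illustrative.
autoassoc = MLPRegressor(hidden_layer_sizes=(10, 2, 10), activation="tanh",
                         max_iter=5000, random_state=0)
autoassoc.fit(features, features)        # identity mapping: the network reproduces its inputs

reconstruction = autoassoc.predict(features)   # y_hat used by the novelty index below
```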

2.3 Novelty Detection. The objective of the present novelty detection is to eschew physics-based model approaches such as finite element analysis, and therefore to pave the way for signal-based techniques applicable to systems of arbitrary complexity. However, the present novelty detection provides an indication only of the presence of damage in a system of interest. This method does not give information about the location and extent of the damage. That is, the novelty detection only identifies whether a new pattern differs from previously obtained patterns in some significant respect. Although the damage assessment problem can be posed with several levels of complexity, the detection of damage presence is arguably the most important step. Once the existence of damage is confirmed, the system can be taken out of service and subjected to detailed inspection to locate and quantify damage. The concept of novelty detection is not entirely new, and applications in other fields can be found in the literature [1,21,22].

Fig. 1 A schematic presentation of an auto-associative neural network



For the specific application of interest here, the auto-associative neural network will be trained using features extracted from the healthy baseline system, and the threshold value for the novelty index will be established accordingly. When damage occurs in the system, it will alter the dynamic characteristics of the system, and consequently the novelty indicator will signal a fault. One of the biggest challenges here is to identify significant system changes, such as structural damage and degradation, that cannot be attributed to natural fluctuations in the system responses caused by changing environmental and operational variations. As described above, the auto-associative neural network is forced to learn the underlying dependency of the extracted features on these natural variations. Therefore, when the auto-associative network is fed with inputs obtained from an unprecedented state of the system, for example, a damaged state, the novelty index (NI), which is defined as the Euclidean distance between the target outputs and the outputs of the neural network, will increase [23]:

NI(y) = \| y - \hat{y} \|    (4)

where y and \hat{y} are the input and output vectors of the auto-associative neural network shown in Fig. 1. If the learning has been successful, \hat{y} \approx y and NI(y) \approx 0 for all data in the training data set. However, if y were acquired after damage is introduced to the system, NI(y) would noticeably depart from zero, providing an indication of an abnormal condition of the system.

The novelty index can also be defined using the Mahalanobis distance measure between the target outputs and the network outputs [24]:

NI(y) = \sqrt{ (y - \hat{y})^T \Sigma^{-1} (y - \hat{y}) }    (5)

where \Sigma is the sample covariance matrix of the training data. This covariance matrix can be calculated with or without the potential outlier in the sample, depending upon whether inclusive or exclusive measures are preferred [25]. In this study, the first definition of the novelty index is employed.
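A direct transcription of Eqs. (4) and (5) into code might look as follows; this is a minimal sketch, and the array handling and batch dimension are assumptions.

```python
import numpy as np

def novelty_index(y, y_hat):
    """Eq. (4): Euclidean distance between network inputs y and outputs y_hat."""
    return np.linalg.norm(np.asarray(y) - np.asarray(y_hat), axis=-1)

def novelty_index_mahalanobis(y, y_hat, cov):
    """Eq. (5): Mahalanobis-distance version; cov is the covariance of the training features."""
    d = np.atleast_2d(np.asarray(y) - np.asarray(y_hat))
    return np.sqrt(np.einsum("ni,ij,nj->n", d, np.linalg.inv(cov), d))
```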

2.4 Extreme Value Statistics. The Gaussian distribution occupies its central place in statistics for a number of reasons, not least the central limit theorem [26]. The central limit theorem states that if {X_1, X_2, ..., X_n} is a set of independent random variables with (essentially) arbitrary distributions, the sum variable X_\Sigma = X_1 + X_2 + ... + X_n will approach a Gaussian distribution as n → ∞. Although this is arguably the most important limiting theorem in statistics, it is not the only one. If the problem at hand is concerned with the tails of distributions, there is another theorem that is more appropriate.

Suppose that one is given a vector of samples {X_1, X_2, ..., X_n} from an arbitrary parent distribution. The most relevant statistic for studying the tails of the parent distribution is the maximum operator, max({X_1, X_2, ..., X_n}), which selects the point of maximum value from the sample vector. Note that this statistic is relevant for the right tail of a univariate distribution only; for the left tail, the minimum should be used. The pivotal theorem of EVS states that, in the limit as the number of vector samples tends to infinity, the induced distribution of the maxima of the samples can take only one of three forms: Gumbel, Weibull, or Frechet [27].

Frechet:  F(x) = \exp[ -( \delta / (x - \lambda) )^{\beta} ]  for x \ge \lambda;  F(x) = 0 otherwise    (6)

Weibull:  F(x) = 1 for x \ge \lambda;  F(x) = \exp[ -( (\lambda - x)/\delta )^{\beta} ] otherwise    (7)

Gumbel:   F(x) = \exp[ -\exp( -(x - \lambda)/\delta ) ],  -\infty < x < \infty,  \delta > 0    (8)

In a similar fashion, there are only three types of distribution for the minima of the samples:

Frechet:  F(x) = 1 - \exp[ -( \delta / (\lambda - x) )^{\beta} ]  for x \le \lambda;  F(x) = 1 otherwise    (9)

Weibull:  F(x) = 0 for x \le \lambda;  F(x) = 1 - \exp[ -( (x - \lambda)/\delta )^{\beta} ]  for x > \lambda    (10)

Gumbel:   F(x) = 1 - \exp[ -\exp( (x - \lambda)/\delta ) ],  -\infty < x < \infty,  \delta > 0    (11)

where \lambda, \delta, and \beta are the model parameters, which must be estimated from the data.

Now, given samples of maximum or minimum data from a number of n-point populations, it is possible to select an appropriate limit distribution and fit a parametric model to the data. It is also possible to fit a model to portions of the parent distribution's tails, as the distribution of the tails is equivalent to the appropriate extreme value distribution. Once the parametric model is obtained, it can be used to compute an effective threshold for novelty based on the true statistics of the data, as opposed to statistics based on a blanket assumption of a Gaussian distribution.
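As an illustration of how block extremes might be extracted and fitted in practice, the sketch below draws samples, takes the maximum and minimum of each non-overlapping window of n points, and fits Gumbel models with SciPy's maximum-likelihood routines. The paper instead uses the least-squares estimators of Sec. 2.5, so the MLE fit, the window length, and the synthetic data are assumed stand-ins.

```python
import numpy as np
from scipy.stats import gumbel_r, gumbel_l

rng = np.random.default_rng(0)
samples = rng.standard_normal(10000)   # stand-in for features from the normal condition
n = 10                                 # window length used for block extremes

blocks = samples.reshape(-1, n)        # 1000 non-overlapping windows of 10 samples each
block_max, block_min = blocks.max(axis=1), blocks.min(axis=1)

# gumbel_r models maxima and gumbel_l models minima; fit() returns (location, scale),
# which correspond to (lambda, delta) in Eqs. (8) and (11).
lam_max, delta_max = gumbel_r.fit(block_max)
lam_min, delta_min = gumbel_l.fit(block_min)
```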

2.5 Parameter Estimation of Extreme Value Distributions. Having established the appropriate limit distribution, the next stage in the analysis is to estimate the parameters of the chosen distribution. The parameter estimation technique employed in this study only fits parameters to one canonical model form: the Gumbel distribution for minima. Therefore, if the data are distributed as maxima, the transformations x → −x and λ → −λ carry each maximum CDF into the corresponding minimum CDF, at least as far as the optimization is concerned.

Suppose the data have the Weibull distribution for minima. Then the transformation Y = \ln(X - \lambda) carries the Weibull random variable X into a Gumbel random variable Y with the following relations between the parameters:

\lambda_G = \ln(\delta_W)  and  \delta_G = 1/\beta_W    (12)

where the subscripts G and W denote the Gumbel and Weibull distributions, respectively. This transformation requires an a priori estimate of \lambda_W, which can be obtained by optimizing the linearity of the empirical CDF plot in Weibull coordinates.

If the data have the Frechet distribution for minima, the transformation Y = -\ln(\lambda - X) carries the Frechet random variable X into a Gumbel random variable Y, with the following relations between the parameters:

\lambda_G = -\ln(\delta_F)  and  \delta_G = 1/\beta_F    (13)


where the subscript F denotes a Frechet distribution. Again, a prior estimate of \lambda_F is required, and this estimate can be obtained by maximizing the linearity of the empirical CDF plot in Frechet coordinates.

After transforming either the Weibull or Frechet distribution to the Gumbel distribution, the parameter estimation problem is reduced to fitting the data to the limit distribution of the form in Eq. (11). The optimization estimates the parameters \lambda and \delta that minimize some error criterion. Note that, because all distribution types are now transformed to a Gumbel distribution, the subscript G for the Gumbel distribution is omitted hereafter. The most straightforward error criterion is the weighted least-squares method, which seeks to minimize the following objective function G:

G = \sum_{i=1}^{q} w_i [ p_i - L_{3,0}(x_i; \lambda, \delta) ]^2    (14)

where L_{3,0}(x; \lambda, \delta) denotes the Gumbel CDF for minima of Eq. (11), the training data are the points on the empirical CDF {(x_i, p_i), i = 1, ..., q}, and the p_i are an appropriate choice of plotting positions. The w_i are a set of weights, and there are various possibilities for choosing the w_i values.
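A sketch of this weighted least-squares fit is given below, assuming unit weights and the plotting positions p_i = i/(q+1); both are choices of the author of this sketch, not prescriptions from the paper, as is the use of a Nelder-Mead optimizer.

```python
import numpy as np
from scipy.optimize import minimize

def gumbel_min_cdf(x, lam, delta):
    """Gumbel CDF for minima, Eq. (11)."""
    return 1.0 - np.exp(-np.exp((x - lam) / delta))

def fit_gumbel_min_wls(extremes, weights=None):
    """Estimate (lambda, delta) by minimizing the objective G of Eq. (14)."""
    x = np.sort(np.asarray(extremes))
    q = len(x)
    p = np.arange(1, q + 1) / (q + 1.0)          # plotting positions (an assumed choice)
    w = np.ones(q) if weights is None else np.asarray(weights)

    def objective(theta):
        lam, delta = theta
        return np.sum(w * (p - gumbel_min_cdf(x, lam, abs(delta))) ** 2)

    x0 = np.array([x.mean(), x.std()])           # crude starting values
    res = minimize(objective, x0, method="Nelder-Mead")
    lam, delta = res.x
    return lam, abs(delta)
```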

3 Numerical Analysis

Simulated random signals from three different distributions are used to demonstrate the usefulness of EVS in accurately modeling the tails without any assumptions about the parent distribution. In each example, a 99% confidence interval for each distribution is computed based on the following three methods:

1. The assumed true parent distribution
2. A best-fit normal distribution, where the sample mean and standard deviation are estimated from the random data generated from the assumed parent distribution
3. An extreme value distribution, the parameters of which are estimated from either the top or bottom fraction of the simulated random data

Hereafter, the confidence interval estimation methods based on the above three distributions are referred to as method 1, method 2, and method 3, respectively.

Setting a confidence interval on the parent distribution using either method 1 or 2 is fairly trivial. The lower and upper limits of the confidence interval are constructed based on the probability of a type I error that one intends to tolerate. When the probability of the type I error is specified to be α (0 ≤ α ≤ 1), 100×(1−α)% of the data from a normal condition should be encompassed within the confidence interval. In other words, 100×α% of the data will be outliers. Accordingly, the lower and upper limits of the confidence interval can be set at F^{-1}(α/2) and F^{-1}(1−α/2), respectively, where F^{-1}(·) is the inverse CDF of the known parent distribution. These threshold limits correspond to a 100×(1−α)% confidence interval. For instance, when the type I error is set to 0.1, this corresponds to a 90% confidence interval; in this case, the lower and upper limits are set so that 5% and 95% of the normal data lie below each of these threshold values (90% are within the two bounds). Because the true CDF of the parent distribution is unknown in method 2, the CDF of the best-fit normal distribution is used instead of the true CDF to compute the lower and upper limits.
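For methods 1 and 2, the limits are simply inverse-CDF evaluations. A brief sketch using SciPy is shown below; the standard-normal parent and the placeholder data set are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

alpha = 0.01                                              # type I error, i.e., a 99% confidence interval
data = np.random.default_rng(1).standard_normal(10000)    # placeholder samples from the normal condition

# Method 1: exact limits from the (assumed known) parent distribution
lower_1, upper_1 = norm.ppf(alpha / 2), norm.ppf(1 - alpha / 2)

# Method 2: limits from the best-fit normal distribution of the observed data
mu_hat, sigma_hat = norm.fit(data)
lower_2 = norm.ppf(alpha / 2, loc=mu_hat, scale=sigma_hat)
upper_2 = norm.ppf(1 - alpha / 2, loc=mu_hat, scale=sigma_hat)
```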

When method 3 is applied to compute the threshold values, caution must be taken in selecting the probability of type I error for the distribution of the extreme values. For instance, let us assume that 10,000 sample points are generated from a parent distribution and the type I error is set to 1% (α = 0.01). Then, by the definition of the type I error, it is expected that there will be about 1%, or 100, outliers out of the 10,000 samples. If either the maximum or the minimum value is extracted from a moving window of 10 samples (n = 10), 1000 extreme values will be obtained from the original 10,000 samples. In other words, 10% of the original data will be used to fit the extreme value distribution. In the next step, the type I error of the extreme value distribution should be set so that it produces the same number of outliers as the type I error of the parent distribution does. To accomplish this, the type I error of the extreme value distribution should be set to 10% (or α×n) in order to produce 100 outliers out of the 1000 extreme values (or out of the 10,000 original samples). That is, the lower and upper limits of the confidence interval can be set at F^{-1}(n×α/2) and F^{-1}(1−n×α/2), respectively.

For the computation of the lower limit at F^{-1}(n×α/2), the Gumbel distribution for minima is used to approximate the CDF. For the given cumulative probability value of n×α/2, Eq. (11) becomes [28]

n\alpha/2 = 1 - \exp[ -\exp( (x - \lambda)/\delta ) ]    (15)

By solving Eq. (15) with respect to x, the lower limit x_m at F^{-1}(n×α/2) is obtained:

Lower limit:  x_m = \lambda + \delta \ln( -\ln( 1 - n\alpha/2 ) )    (16)

The upper limit x_M at F^{-1}(1−n×α/2) is obtained from Eq. (8) in a similar fashion:

Upper limit:  x_M = \lambda - \delta \ln( -\ln( 1 - n\alpha/2 ) )    (17)

Note that the \lambda and \delta values in Eqs. (15) and (16) are obtained by fitting the minima values to the Gumbel distribution for minima, and the \lambda and \delta values in Eq. (17) correspond to the Gumbel distribution fitted to the maxima.
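Given Gumbel parameters fitted separately to the window minima and maxima, Eqs. (16) and (17) translate directly into code. This is a minimal sketch; the function and variable names are assumptions.

```python
import numpy as np

def evs_confidence_limits(lam_min, delta_min, lam_max, delta_max, alpha, n):
    """Eqs. (16)-(17): lower/upper thresholds from Gumbel fits to the window minima
    (lam_min, delta_min) and to the window maxima (lam_max, delta_max)."""
    scaled = n * alpha / 2.0                                     # type I error rescaled for window size n
    x_m = lam_min + delta_min * np.log(-np.log(1.0 - scaled))    # Eq. (16), lower limit
    x_M = lam_max - delta_max * np.log(-np.log(1.0 - scaled))    # Eq. (17), upper limit
    return x_m, x_M
```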

Three distributions are chosen to investigate the number of false positives, or type I errors, produced by each of the three methods discussed previously. The normal, lognormal, and gamma distributions are modeled using the three methods, and the number of outliers is compared for a 99% confidence interval (Table 1). The normal distribution provides a sanity check to make sure that the confidence intervals based on EVS and on the best-fit normal distribution produce similar thresholds. The lognormal and the gamma distributions are both skewed and provide an opportunity to dramatically illustrate the shortcomings of confidence interval estimation based on a normality assumption. The probability density functions (PDFs) of the three distributions are as follows:

Gaussian:   f(x | \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^2 / 2\sigma^2}    (18)

Lognormal:  f(x | \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \, e^{-(\ln x - \mu)^2 / 2\sigma^2}  for x \ge 0    (19)

Gamma:      f(x | a, b) = \frac{1}{b^a \Gamma(a)} \, x^{a-1} e^{-x/b}  for x \ge 0    (20)

where \Gamma(a) is the gamma function.

Table 1  Estimation of 99% confidence intervals for the 10,000 data points generated from a Gaussian parent distribution (α = 0.01)

Estimation method     Upper confidence limit    Lower confidence limit    No. of outliers out of 10,000 samples
Method 1 (Exact)      2.548                     -2.548                    100
Method 2 (Normal)     2.551                     -2.545                     91
Method 3 (Gumbel)     2.549                     -2.482                     99


Castillo [12] shows that both the minima and the maxima of the normal and lognormal distributions can be modeled with a Gumbel distribution, thereby reducing the effort of finding the best-fit distribution in this example. The gamma distribution, on the other hand, has Gumbel distributed maxima and Weibull distributed minima. Data sets of varying sample size from N = 1000 to N = 10^6 were created and analyzed. Only the typical analysis results for the sample size of N = 10,000 are presented in this study; similar results are observed for the other examined sample sizes. Tables 1–3 summarize the results of the parameter estimation and the number of outliers for the 10,000 data points generated from each of the three distributions. Only the first 1000 data points are plotted for illustrative purposes in Figs. 2–4.

Looking at the normally distributed data in Fig. 2, the thresholds obtained from methods 1–3 appear comparable. For method 3, the least-squares return period relative error (LSRPRE) estimation technique [12] is used to compute the parameters of the Gumbel distributions for the maxima and minima of the normally distributed data. Several of the parameter estimation techniques suggested by Castillo [12] were initially investigated, and the LSRPRE turned out to produce the best-fit result for the given data sets. Table 1 shows the upper and lower confidence limits computed from methods 1–3 and the associated numbers of outliers. As can be seen in Fig. 2 and Table 1, even though method 3 returns thresholds that are slightly different from those of the known PDF, its number of outliers is closer to the expected 1% than that of method 2.

In the second numerical example, the parent distribution is lognormal instead of normal. For this simulation, μ = 1.0 and σ = 0.5 are assumed for the parameter values in Eq. (19). The associated lognormal density function is displayed on the left side of Fig. 3. The skewness and kurtosis of this distribution are 1.74 and 8.45, respectively. Note that, for all normal distributions, the skewness and kurtosis values are 0.0 and 3.0, respectively [29]. Therefore, the departure of the skewness and kurtosis values from 0.0 and 3.0 indicates the non-Gaussian nature of the data. Figure 3 and Table 2 display the corresponding analysis results for the lognormal parent distribution. Again, the LSRPRE estimation technique is employed for the maxima of the lognormal data. The minima, however, are fitted using the least-squares probability absolute error method [12]. For the lognormal example, method 3 returns only 3 more false-positive indications than the expected 100 outliers calculated from method 1. Method 2, however, shows over double the number of false-positive indications because its upper threshold is far too low. Meanwhile, the lower limit based on normality completely misses all of the minimum values because the lognormal distribution contains only positive data points.


Fig. 2 The exact 99% confidence interval of a normal parent distribution compared with that from extreme value statistics. This figure shows the first 1000 data points from a 10,000 data point set.

Table 2  Estimation of 99% confidence intervals for the 10,000 data points generated from a lognormal parent distribution (α = 0.01)

Estimation method     Upper confidence limit    Lower confidence limit    No. of outliers out of 10,000 samples
Method 1 (Exact)      9.854                     0.750                     100
Method 2 (Normal)     7.378                     -1.206                    230
Method 3 (Gumbel)     9.827                     0.715                     103

Table 3  Estimation of 99% confidence intervals for the 10,000 data points generated from a gamma parent distribution (α = 0.01)

Estimation method     Upper confidence limit    Lower confidence limit    No. of outliers out of 10,000 samples
Method 1 (Exact)      46.369                    1.689                     100
Method 2 (Normal)     37.016                    -7.142                    191
Method 3 (Gumbel)     45.693                    1.600                      96

Table 4  List of time series employed in this study

Case   Description             Input level        Data no. per input   Total data no.
0      No bumper               3, 4, 5, 6, 7 V    15 sets              75 sets
1      Bumper between m1–m2    3, 4, 5, 6, 7 V    5 sets               25 sets
2      Bumper between m5–m6    3, 4, 5, 6, 7 V    5 sets               25 sets
3      Bumper between m7–m8    4, 5, 6, 7 V       5 sets               20 sets


Finally, the same tests are applied to data sets simulated from a gamma parent distribution. In this example, the sample data are generated from a gamma distribution with a = 3 and b = 5 for the parameter values in Eq. (20). This gamma distribution has a skewness value of 1.15 and a kurtosis of 5.00. The associated density function is plotted on the left side of Fig. 4. The gamma distribution is skewed to the right for small values of a; as the shape parameter a increases, the gamma distribution converges to the normal distribution. The maxima of the gamma parent distribution are fit using the LSRPRE method, while the minimum values are fit using the standard weighted least-squares method with a weighting factor of 1 [12]. In Table 3, the extreme value method again shows a distinct advantage over the normality assumption. For the gamma distribution, method 3 returns four fewer false positives than expected from method 1, while method 2 again returns almost twice as many false positives as method 1. The number of false-positive indications returned by method 2 might lead to an incorrect damage diagnosis of the system.

A drawback of EVS is that different methods of parameter estimation are optimal for fitting different distributions. Once the parameter values of the extreme value distribution are estimated, however, there is a noticeable advantage of EVS over the normality assumption in properly setting the threshold values. The next section applies the EVS technique to a test structure for damage detection.

Fig. 3 The exact 99% confidence interval of a lognormal parent distribution compared with those computed from either extreme value statistics or the normality assumption

Fig. 4 The exact 99% confidence interval of a gamma parent distribution compared with those computed from either extreme value statistics or the normality assumption



4 Experimental Results

The effectiveness of EVS is demonstrated using acceleration time series recorded from the 8 DOF spring-mass system shown in Fig. 5. The system is formed by eight translating masses connected by springs. Each mass is an aluminum disc 25.4 mm thick and 76.2 mm in diameter with a center hole. The hole is lined with a Teflon bushing, and there are small steel collars on each end of the discs (Fig. 6). The masses all slide on a highly polished steel rod that supports the masses and constrains them to translate only along the rod. The masses are fastened together with coil springs epoxied to the collars that are, in turn, bolted to the masses.

The DOFs, springs, and masses are numbered from the right end of the system, where the excitation is applied, to the left end, as shown in Fig. 5. The nominal value of mass 1 (m1) is 559.3 g. This mass is located at the right end where the shaker is attached; m1 is greater than the others because of the hardware needed to attach the shaker. All the other masses (m2–m8) are 419.4 g. The spring constant for all springs is 56.7 kN/m in the initial condition. Damping in the system is caused primarily by Coulomb friction. Every effort is made to minimize the friction through careful alignment of the masses and springs, and a common commercial lubricant is applied between the Teflon bushings and the support rod.

The undamaged configuration of the system is the state in which all springs are identical and have a linear spring constant. Nonlinear damage is defined as an occurrence of impact between two adjacent masses. Damage is simulated by placing a bumper between two adjacent masses so that the movement of one mass is limited relative to the other. Figure 6 shows the hardware used to simulate nonlinear damage. When one end of a bumper, which is placed on one mass, hits the other mass, impact occurs. This impact simulates the damage caused by the closing of a crack during vibration. The degree of damage can be controlled by changing the amount of relative motion permitted before contact and by changing the hardness of the bumpers on the impactors. For all damage cases presented, the initial clearance is set to zero. Table 4 summarizes the undamaged case and the three damage cases. In damage case 3, 5 of the 25 data sets were ignored because the excitation level was low enough that the bumpers did not contact the other mass, resulting in effectively undamaged cases.

In this example, the AR-ARX model is first fit to an acceleration time history measured from the baseline condition of the spring-mass system. If a time prediction model obtained from the baseline system is used to predict a new time signal measured under a damaged condition, the prediction errors will increase. Based on this premise, novelty analysis is performed using the prediction errors as features. However, because the 8 DOF system is also subject to changing excitation levels, the varying input levels might produce unwanted false outliers. To overcome this difficulty, the auto-associative neural network is employed for data normalization. Because there are 4096 points in each case and a 99% confidence interval is being used, one would expect that for an undamaged case there would be about 21 statistically deviant points, or outliers, on each side of the distribution, or 42 outliers in total. Table 5 summarizes the diagnosis results of the 8 DOF experiment.
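The expected outlier count quoted above is simply 4096 × 0.01 ≈ 41 in total, which the paper rounds to 21 per tail (42 in total). Counting outliers against any pair of thresholds reduces to a one-line helper such as the following illustrative snippet, which is not the authors' code.

```python
import numpy as np

def count_outliers(values, lower, upper):
    """Number of feature values falling outside the [lower, upper] confidence limits."""
    values = np.asarray(values)
    return int(np.sum((values < lower) | (values > upper)))

expected_outliers = 4096 * 0.01    # about 41 in total, i.e., roughly 21 in each tail
```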

The numbers of outliers in the undamaged data were slightly higher than expected, but the normality assumption and the extreme value method yielded similar results. Several normality assessment techniques revealed that the prediction errors used as features were fairly close to normal, so it is no surprise that the normality assumption and EVS returned similar results in this case. Looking at m6 in the third damage case, the number of outliers is definitely above that of the undamaged case and would most likely show up as a false-positive indication. m1 consistently has the lowest number of outliers, likely because it is connected to the shaker and has less variability than the other masses in the system. In most of the damage cases, the two masses between which the bumper is placed show a large increase in outliers, as expected.

Fig. 5 An eight degree-of-freedom system attached to a shaker with accelerometers mounted on each mass

Fig. 6 A typical bumper used to simulate nonlinear damage

Table 5  Summary of the 8 DOF system test results showing the predicted number of outliers contrasted with the normality assumption and the extreme value statistics

*Highlighted cells show locations of damage where the bumper is placed between two masses and the number of outliers is expected to increase.
**Entries in the table represent the number of outliers for 90% confidence thresholds. The first number is obtained using the EVS method. The second number in parentheses is obtained using the normality assumption.



5 Summary

Data that lie in the tails of distributions have traditionally been modeled based on a Gaussian distribution. This inherent assumption of many statistical procedures can be dangerous for applications such as novelty detection, which deal mostly with those extreme data points that may not be accurately modeled by the Gaussian assumption. Extreme value statistics (EVS) takes a closer look at modeling those extreme points, independently of any Gaussian assumption. Modeling the tails also simplifies the statistics to some extent: the extreme points conform to one of only three types of distributions, Gumbel, Weibull, or Frechet. In this paper, novelty detection is reworked to take advantage of these extreme value distributions.

The numerical examples demonstrated the ability of EVS when applied to simple novelty analysis. Thresholds obtained from the actual distribution, from the best-fit normal distribution, and from explicitly modeling the extreme values were contrasted. In all of the examined cases, EVS produced results that deviated only slightly from those of the true distributions. The novelty detection extended by incorporating EVS was then applied to accelerometer time signals obtained from an 8 degree-of-freedom (DOF) spring-mass system. The 8 DOF system was designed to demonstrate the robustness of EVS in detecting nonlinear damage introduced into an otherwise linear system; the nonlinear damage took the form of bumpers placed between the translating masses. For the 8 DOF system, the results were much less dramatic than for the numerical examples. Several normality tests revealed that the features were nearly normally distributed, and both the Gaussian and EVS methods yielded comparable results. Despite its limited scope, this paper improves conventional novelty detection by computing the threshold value in a statistically rigorous manner from extreme value statistics instead of Gaussian statistics.

References

[1] Worden, K., Manson, G., and Fieller, N. J., 2000, "Damage Detection Using Outlier Analysis," J. Sound Vib., 229, pp. 647–667.
[2] Roberts, S., 1998, "Novelty Detection Using Extreme Value Statistics," IEE Proc. Vision Image Signal Process., 146, pp. 124–129.
[3] Roberts, S., 2000, "Extreme Value Statistics for Novelty Detection in Biomedical Signal Processing," IEE Proc.: Sci., Meas. Technol., 147, pp. 363–367.
[4] Worden, K., Pierce, S. G., Manson, G., Philip, W. R., Staszewski, W. J., and Culshaw, B., 2000, "Detection of Defects in Composite Plates Using Lamb Waves and Novelty Detection," Int. J. Syst. Sci., 31, pp. 1397–1409.
[5] Kramer, S. L., 1996, Geotechnical Earthquake Engineering, Prentice Hall, Upper Saddle River, NJ.

[6] Takahira, S., and Mita, A., 2002, "Damage Index Sensors for Structural Health Monitoring," The Second International Conference on Advances in Structural Engineering and Mechanics, Busan, South Korea, August 21–23.

[7] Gumbel, E. J., 1958, Statistics of Extremes, Columbia University Press, New York.
[8] Galambos, J., 1978, The Asymptotic Theory of Extreme Order Statistics, John Wiley and Sons, New York.
[9] Embrechts, P., Kluppelberg, C., and Mikosch, T., 1997, Modeling Extremal Events, Springer-Verlag, New York.
[10] Kotz, S., and Nadarajah, S., 2000, Extreme Value Distributions: Theory and Applications, Imperial College Press, UK.
[11] Reiss, R. D., and Thomas, M., 2001, Statistical Analysis of Extreme Values With Applications to Insurance, Finance, Hydrology and Other Fields, Birkhauser Verlag, Boston.
[12] Castillo, E., 1998, Extreme Value Theory in Engineering, Academic Press Series in Statistical Modeling and Decision Science, San Diego, CA.
[13] Box, G. E., Jenkins, G. M., and Reinsel, G. C., 1994, Time Series Analysis: Forecasting and Control, Prentice-Hall, Upper Saddle River, NJ.
[14] Ljung, L., 1999, System Identification: Theory for the User, Prentice Hall, Englewood Cliffs, NJ.
[15] Sohn, H., and Farrar, C. R., 2001, "Damage Diagnosis Using Time Series Analysis of Vibration Signals," Smart Mater. Struct., 10, pp. 446–451.
[16] Sohn, H., Worden, K., and Farrar, C. R., 2003, "Statistical Damage Classification Under Changing Environmental and Operational Conditions," J. Intell. Mater. Syst. Struct., 3(9), pp. 561–574.
[17] Fukunaga, K., and Koontz, W. L. G., 1970, "Application of Karhunen-Loeve Expansion to Feature Selection and Ordering," IEEE Trans. Comput., C-19(4), pp. 311–318.
[18] Fukunaga, K., 1990, Statistical Pattern Recognition, Academic Press, San Diego, CA.
[19] Kramer, M. A., 1991, "Nonlinear Principal Component Analysis Using Autoassociative Neural Networks," AIChE J., 37, pp. 233–243.
[20] Rumelhart, D. E., and McClelland, J. L., 1988, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, Cambridge, MA.
[21] Bishop, C. M., "Novelty Detection and Neural Network Validation," IEE Proc.: Vision, Image, and Signal Processing, 141(4), pp. 217–222. Special issue on applications of neural networks.
[22] Tarassenko, L., Nairac, A., Townsend, N., Buxton, I., and Cowley, Z., 2002, "Novelty Detection for the Identification of Abnormalities," Int. J. Syst. Sci., 31(11), pp. 1427–1439.
[23] Worden, K., 1997, "Structural Fault Detection Using a Novelty Measure," J. Sound Vib., 201(1), pp. 85–101.
[24] Duda, R. O., and Hart, P. E., 1973, Pattern Classification and Scene Analysis, John Wiley and Sons, New York.
[25] Barnett, V., and Lewis, T., 1994, Outliers in Statistical Data, John Wiley and Sons, Chichester, UK.
[26] Benjamin, J. R., and Cornell, C. A., 1970, Probability, Statistics and Decision for Civil Engineers, McGraw-Hill, New York.
[27] Fisher, R. A., and Tippett, L. H. C., 1928, "Limiting Forms of the Frequency Distributions of the Largest or Smallest Members of a Sample," Proc. Cambridge Philos. Soc., 24, pp. 180–190.
[28] Worden, K., Allen, D. W., Sohn, H., and Farrar, C. R., 2002, "Extreme Value Statistics for Damage Detection in Mechanical Structures," Los Alamos National Laboratory Report LA-13903-MS.
[29] Wirsching, H., Paez, T. L., and Ortiz, K., 1995, Random Vibrations: Theory and Practice, John Wiley and Sons, New York.
[30] Cybenko, G., 1989, "Approximation by Superposition of a Sigmoidal Function," Math. Control, Signals, Syst., 2(4), pp. 303–314.
[31] Farrar, C. R., Baker, W. E., Bell, T. M., Cone, K. M., Darling, T. W., Duffey, T. A., Eklund, A., and Migliori, A., 1994, "Dynamic Characterization and Damage Detection in the I-40 Bridge Over the Rio Grande," Los Alamos National Laboratory Report LA-12767-MS.
[32] Sanger, T. D., 1989, "Optimal Unsupervised Learning in a Single-Layer Linear Feedforward Neural Network," Neural Networks, 2(6), pp. 459–473.
