Fourth International Conference on Knowledge-Based Intelligent Engineering Systems & Allied Technologies, 30 Aug - 1 Sept 2000, Brighton, UK

Classification of Surface Defects on Hot Rolled Steel Using Adaptive Learning Methods

P. Caleb, M. Steuer
Intelligent Computer Systems Centre, University of the West of England, Bristol, BS16 1QY
E-mail: Praminda.Caleb-solly@uwe.ac.uk

Abstract
Classification of local area surface defects on hot rolled steel is a problematic task due to the variability in manifestations of the defects grouped under the same defect label. This paper discusses the use of two adaptive computing techniques, based on supervised and unsupervised learning, with a view to establishing a basis for building reliable decision support systems for classification.

Introduction
There are a number of difficulties associated with the classification of defects in hot rolled steel. One of the main problems arises from the fact that each class of defect, as grouped by an expert, encompasses a wide range of manifestations. This can be due to the degree of severity of the defect or the composition of the steel. The result is that several subgroups might fall under the umbrella of one label. The resulting ambiguity of feature values within each labelled class makes it difficult to build reliable classifiers. One might try to separate these subgroups, but as there is often no clear distinction between them, this becomes an extremely labour-intensive and subjective exercise.

The problem is further exacerbated by the surface of hot rolled steel being an extremely noisy environment. From an image processing perspective this means that a number of spurious regions get segmented as regions of interest. It is possible to exclude the spurious regions using simple threshold functions on a few of the feature values, but this approach, while reasonably successful, is not the complete answer.

Using a supervised learning method for distinguishing spurious regions from actual defects requires labelling.
Again, as a spurious region can take on any one of a number of forms, labelling them all as one class results in a deterioration in the performance of the classifier. A limited-size, manually labelled data set was used to evaluate supervised learning methods for classification. Multi-layer perceptrons trained using error backpropagation were used, together with polynomial neural networks implementing the Group Method of Data Handling, a feature-based mapping network technique.

An alternative approach to the problems discussed above is to use an unsupervised learning technique. In unsupervised learning, the system adapts in order to recognise groups of similar cases from within its set of training examples without being told in advance what groups or affinities there might be. In doing so, it develops a set of prototypical vectors, or centres, that correspond to the centres in the data set, forming regions or clusters. These regions can be identified and calibrated by projecting known data examples on to the trained map, so that when previously unseen cases are projected, it is possible to get an indication of which group they belong to by observing which region is activated. The Self Organising Map (SOM), or Kohonen network [Kohonen 1990], was used for unsupervised learning.

Background
The computational processing required for the analysis of hot rolled steel comprises three major components. The first component is the segmentation module. It includes image processing procedures for pre-processing and segmenting the regions of interest. The next component consists of feature extraction routines. These generate a set of descriptors for each of the regions of interest in the image. The third module is the classification module, which indicates the presence of a defect and provides a label to indicate which class it belongs to, or some other indication of its nature.

Defects occurring in hot rolled steel can either be local area defects, covering only small regions of the image, or whole area defects, covering comparatively larger areas of the image. Accordingly, they are described by different sets of descriptors, so are handled by different feature extraction and, in turn, different classification modules. This study deals with the identification of local area defects such as bruises, rolled in bruises, scratches, rolled in scale and skin laminations.

Image Segmentation
Segmentation involves the delineation of the regions of interest from the background.
There are a large number of methods available that perform edge detection of the boundaries of the regions of interest. The most commonly used method is based

0-7803-6400-7/00/$10.00 © 2000 IEEE


on using gradient operators. However, hot rolled steel images generally have an extremely noisy surface, and the gradient operator method was not found to be successful for a large number of images. Too much of the background noise was getting segmented, and the optimum segmentation parameters also varied considerably from one image to another. The method that was found to perform consistent and robust region detection was developed using texture-based segmentation.

Visually, texture can be described as fine, coarse, smooth, speckled, etc. Mathematically, one of the ways in which textural information can be extracted is by computing the statistical relationships of the spatial distribution of the sub-patterns of grey levels. This method, developed by R.M. Haralick [Haralick 1973], involves the calculation of a set of grey level spatial dependence probability distribution matrices, or grey level co-occurrence matrices (GLCMs). GLCMs are based on the second-order joint conditional probability density functions. Each value in the matrix is the probability that a grey level difference x will occur at a certain distance and angle in a given rectangular region of an image. From this GLCM, textural features are calculated: statistical measures that relate to specific textural characteristics such as homogeneity, contrast, correlation, variance and entropy.

In this study the image was simultaneously processed by two texture modules and then recombined. Each texture module used a different sized kernel, with the GLCM calculated at a different orientation angle. These texture modules were thus tuned to make them more sensitive either to larger regions of interest in the vertical direction, such as bruises, or to smaller regions of interest in the vertical direction, such as fine scratches. In order to extract these regions of interest, the texture image was thresholded.
The thresholded texture images from the two branches were then combined.

Feature Extraction
The features that are extracted to describe the segmented regions of interest (blobs) are a combination of geometric and morphological descriptors and a set of texture measures. There are 11 spatial features, which include the area of the blobs; orientation, the angle between the major axis of the blob and the horizontal; eccentricity, the ratio of the length of the longest chord of the blob to the longest chord perpendicular to it; and invariant moments. There are 13 texture measures based on Haralick's methodology [Haralick 1973] for mathematically describing texture. These include measures of homogeneity and local variations,

measures of grey level linear dependencies, measures of grey level variance within the window (second-order moment about the mean), entropy, a measure of the average uncertainty of grey level co-occurrence, and the range of values for each of the texture measures. The texture features are derived from GLCMs calculated in all four directions, 0°, 45°, 90° and 135°, for the rectangular region enclosing each of the blobs.

Figure 1. Original image and corresponding texture image, thresholded and outlined.

Classification Techniques
Artificial neural networks are adaptive learning algorithms inspired by biological neural systems. The biological neuron is represented by the neuron or node, which produces an output that is a function of its weighted inputs. Artificial neural networks are arrangements of nodes that can be trained to "learn" with or without "supervision". In unsupervised learning, the network has to recognise groups or affinities between inputs without being told in advance what groups or affinities there might be. In supervised "learning" the network is presented with a set of known inputs and has to train itself to produce an output as close as possible to a known output.



Multi-Layer Perceptrons and the Error Backpropagation Algorithm
The Multi-layer Perceptron (MLP), trained using the error back-propagation algorithm, is a popular basis for supervised learning. An MLP typically consists of a layer of input nodes, each linked by weighted connections to every one of a layer of hidden nodes; each of these is linked, in turn, to a set of output nodes (Figure 2). The number of input nodes is equal to the number of features that may be related to the output. In our case, the input features are the descriptors of the segmented regions of interest, and the output is the label describing the defect. A single output node is sufficient for a binary outcome ("defect" or "non-defect", etc.); otherwise a one-of-N encoding is used, where N is the number of classes.
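As a hedged sketch of such a network (not the authors' in-house software), the binary defect/non-defect MLP can be approximated with scikit-learn; the synthetic feature matrix below stands in for the 37 blob descriptors, and the layer size and learning parameters are illustrative assumptions.

```python
# Minimal defect / non-defect MLP with sigmoid hidden units, trained by
# gradient descent with the learning-rate and momentum parameters the
# text describes. Data is synthetic, standing in for blob descriptors.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 37))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # 1 = defect, 0 = non-defect

clf = MLPClassifier(hidden_layer_sizes=(10,), activation="logistic",
                    solver="sgd", learning_rate_init=0.05, momentum=0.9,
                    max_iter=2000, random_state=0)
clf.fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```

For the six-class experiments, `y` would instead hold the one-of-N class labels and the single output node would become six.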

Figure 2. A Multi-layer Perceptron

The multilayer perceptron is "trained" using a set of known inputs and known designated outputs. The output of the network for a set of inputs is compared with the designated output, and the errors are "propagated back" into the network so as to modify the weights on the nodal interconnections and thereby reduce the error between the actual output and the designated output. The speed and direction of weight modification are determined by the user-controlled parameters of learning rate and momentum. The trained network is a model of the relationship between the data and the designated outputs in the training data set. Its accuracy in predicting unknown outputs from new sets of inputs is the principal test of its performance. Training an MLP network involves a considerable degree of empiricism, so that success can depend heavily on the expertise of the trainer. It also depends on the nature and quality of the data on which it is trained. Training is relatively straightforward when there is no shortage of well-characterised exemplars of all the classes that are to be distinguished. If not, the numbers may need to be inflated by "bootstrapping": creating additional exemplars with the same underlying group characteristics as the original ones, but with random noise added to provide variety.

Group Method of Data Handling
The Group Method of Data Handling (GMDH) is a feature-based mapping polynomial neural network.
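The core GMDH step can be illustrated as follows: every pair of inputs feeds a candidate unit that fits a quadratic polynomial of the pair to the target by least squares, and only units whose error on held-out data stays below a pre-set threshold survive into the next layer. This is a hedged sketch under assumed data and thresholds, not the paper's combinatorial implementation.

```python
# One GMDH layer: each unit fits y ~ a0 + a1*u + a2*v + a3*u*v + a4*u^2
# + a5*v^2 by least squares; units with held-out MSE above a pre-set
# threshold are eliminated. Data and threshold are illustrative.
import itertools
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=120)   # toy target
Xtr, Xte, ytr, yte = X[:80], X[80:], y[:80], y[80:]

def quad_terms(u, v):
    return np.column_stack([np.ones_like(u), u, v, u * v, u ** 2, v ** 2])

survivors = []
for i, j in itertools.combinations(range(X.shape[1]), 2):
    A = quad_terms(Xtr[:, i], Xtr[:, j])
    coef, *_ = np.linalg.lstsq(A, ytr, rcond=None)       # linear regression
    pred = quad_terms(Xte[:, i], Xte[:, j]) @ coef
    mse = float(np.mean((pred - yte) ** 2))
    if mse < 0.5:                                        # selection threshold
        survivors.append(((i, j), mse))

print(survivors)
```

Surviving outputs would then feed the next layer of pairwise units, the process repeating until the best unit's error stops improving.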


Figure 3. A GMDH network

GMDH enables the simultaneous discovery of a mathematical model of the relationship between the features and the output classes, and of the dependence of the modelled system output on the values of the most significant features. The architecture of an example GMDH network can be seen in Figure 3. The network starts with just the input layer and grows progressively toward the output layer, one layer at a time. In the first stage the hidden layer is configured with one processing element for each different pair of outputs from the previous layer. The output of each hidden layer processing element is a quadratic combination of its two inputs. Each processing element in a layer tries to produce an output equal to y; this is achieved via linear regression. The performance of each processing element is calculated by measuring its mean square error on a set of new testing examples. Processing elements with a mean square error larger than a pre-set threshold are eliminated, leaving only the fittest polynomial elements in that layer. The layer is then frozen. The process of building the network layer by layer continues until a stopping criterion is satisfied. If the mean squared error of the best performing element of each hidden layer is plotted, a global minimum is observed before the error starts to rise again with the addition of further layers. The layer of the minimum is where further construction should cease, and the output becomes the best unit of that layer. Each of the preceding layers is then trimmed to eliminate those units whose outputs do not connect to the final output unit. It should be noted that a network is created for each class.

Self Organising Maps
The Self Organising Map (SOM) was inspired by the cerebral cortex, where sensory input is mapped in an ordered two-dimensional fashion across an array of neurons. The simulated SOM is an array of processing elements or output nodes, each of which has a




weighted connection to every input (or feature), Figure 4. Initially, the connections between inputs and output nodes are randomly weighted. A vector quantisation algorithm then changes the weights so that the nodes form into clusters that depend on both the values of the inputs and their frequency of occurrence. The Euclidean distance between the input vector and the weight vector of a particular output node becomes a measure of the closeness of an input to an output, so that a particular output node will respond to a particular set of input features. The clusters of output nodes thus created on the map can then be labelled a posteriori by examining the characteristics of the cases represented in different regions.

Training a SOM also requires some empiricism, in choosing the format of the array of output nodes and selecting the input features from those available; it also involves some subjectivity in deciding on the best map from the range of those produced. However, it requires no a priori value judgements about the nature of the input. The performance of a SOM can be judged by how well new test data map to the labelled clusters identified with the training data.
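The training loop just described can be sketched minimally in NumPy: find the best matching unit by Euclidean distance, then pull it and its grid neighbours toward the input under a Gaussian neighbourhood. Grid size, learning rates and schedules are illustrative assumptions, not the paper's SOM-PAK settings.

```python
# Minimal SOM sketch: normalised inputs, Euclidean best-matching unit,
# Gaussian neighbourhood with a shrinking radius. All hyperparameters
# here are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)        # zero mean, unit variance

gx, gy, dim = 6, 6, X.shape[1]
W = rng.normal(size=(gx * gy, dim))             # one weight vector per node
coords = np.array([(i, j) for i in range(gx) for j in range(gy)], float)

for epoch in range(20):
    radius = 3.0 * (0.5 ** epoch)               # shrinking neighbourhood
    lr = 0.5 * (0.9 ** epoch)
    for x in X:
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))     # best matching unit
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)  # grid distances
        h = np.exp(-d2 / (2 * radius ** 2))             # Gaussian neighbourhood
        W += lr * h[:, None] * (x - W)

# Average quantisation error: mean distance of each vector to its BMU.
qerr = np.mean([np.sqrt(((W - x) ** 2).sum(axis=1)).min() for x in X])
print(f"quantisation error: {qerr:.3f}")
```

Labelled cases projected onto the trained map (their BMU indices) give the hit histograms and a posteriori calibration discussed in the text.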

This can be measured by creating histogram projections, or by measuring the average quantisation error between data vectors and their best matching unit on the map.

Figure 4. The Self Organising Map.

Methods
The data set
The table below shows the distribution of the different classes.

Class                 No. of cases
1. Rolled in Scale             154
2. Bruise                      138
3. Rolled in Bruise             99
4. Scratch                      94
5. Skin Lamination              19
6. No Defect                   580
Total                         1084

The data was partitioned into 5 pairs of training and test sets, with a 70% and 30% distribution of cases respectively for each partition. Subsets derived from the original data set were also constructed by reducing the number of features. The feature reduction was based on a GA selection method and also on a statistical method that involved analysing the correlation between the features and the discriminant functions. Several of these subsets were experimented with. Additional data sets were also constructed by the method of bootstrapping, to generate training sets with an even distribution of the different classes.

Experiments and results
Supervised neural networks
Both GMDH and MLP networks were implemented using in-house software. Various MLP network configurations were experimented with, varying the number of hidden layer nodes. Sigmoid activation functions were used. GMDH was implemented using the basic combinatorial algorithm. For each class, a separate GMDH network was constructed with a single output node, represented by a 1 in the training data for the class of interest and 0 for all the other categories. To evaluate the networks, each case in the test set was presented to all the networks, and classification was on the basis of the class of the network with the highest activation. The networks were trained to a number of different sensitivities, or mean square error thresholds.

In the first instance, classifiers were constructed to separate the spuriously segmented regions, or non-defects, from the defects. There were roughly equal numbers of both classes in the training and test sets. MLP networks gave average accuracies of 97% and 93%, and the GMDH networks 91% and 88%, on the training and test sets respectively. On building classifiers to separate all six classes, i.e. the non-defect class and the 5 defect classes, it was found that using the data subset comprising a reduced number of 24 features gave marginally better results, as compared to using all 37 features.
In addition, it was found that the MLPs performed better than GMDH in terms of percentage accuracy, with an average accuracy of 80% for the training set and 77% for the test set. The experiments to separate all 6 classes were repeated using the bootstrapped data set comprising all the features. The results showed a noticeable improvement in the classification accuracy: for the MLP the average accuracy on the training set was 86% and on the test set was 80%.
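The 70/30 partitioning and class-balancing bootstrap used above can be sketched as follows; the class labels, counts and noise scale are illustrative assumptions, not the paper's data.

```python
# Sketch of a 70/30 train/test split plus a class-balancing bootstrap:
# minority-class cases are resampled up to the majority count with a
# little Gaussian noise added to each copy to provide variety.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.array([0] * 150 + [1] * 50)              # imbalanced two-class set

# 70% training / 30% test partition.
idx = rng.permutation(len(X))
cut = int(0.7 * len(X))
train, test = idx[:cut], idx[cut:]

tr_y = y[train]
target = np.bincount(tr_y).max()
parts_X, parts_y = [X[train]], [tr_y]
for cls in np.unique(tr_y):
    members = train[tr_y == cls]
    extra = target - len(members)
    if extra > 0:
        picks = rng.choice(members, size=extra, replace=True)
        noise = rng.normal(scale=0.05, size=(extra, X.shape[1]))
        parts_X.append(X[picks] + noise)
        parts_y.append(np.full(extra, cls))
Xb, yb = np.vstack(parts_X), np.concatenate(parts_y)
print(np.bincount(yb))   # classes now evenly represented
```

Repeating the split five times with different permutations yields the five train/test pairs described in the text.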



Lastly, classifiers were constructed using the bootstrapped data subset to distinguish between just the five defect classes. Removing the non-defect class resulted in a considerable improvement in the ability of the classifiers to classify the defect data: a classification accuracy of 89% for the training set data and 78% for the test data. Extracting the results for the defect classes from the confusion matrices generated by the previous experiment using all six classes, the average percentage classification accuracy was 84% for training data and 64% for test data. The results quoted are those obtained using MLP networks, whose overall performance was higher than that of the GMDH networks.

Self-Organising Maps
SOM networks were simulated using the self-organising map package SOM-PAK, adapted for MATLAB, prepared by the SOM Programming Team of the University of Helsinki. In order to ensure that each feature has approximately the same influence during the training of the SOM, it is necessary to normalise the data sets. In this study the variance of each feature was normalised to 1 and the mean to 0. A number of hexagonal lattices made up of grids of x by y nodes in a rectangular map shape were experimented with. The weight vectors of the map were linearly initialised in the subspace covered by the k first principal components of the data, found using singular value decomposition. The initial neighbourhood radius was calculated in relation to the map dimensions.
A Gaussian neighbourhood function was used to calculate the values by which the weights of the winning node and its neighbouring nodes were adapted. The aims of the experiments conducted using the SOM networks were 1) to explore the sub-groups present within each of the classes and 2) to re-analyse the defects misclassified by the supervised learning methods by associating them with the clusters formed on the SOM. With these objectives in mind, it was decided to construct the maps using all the data, rather than splitting it into training and test sets. In order to explore sub-clusters within groups, maps were constructed using each of the separate classes, using the U-matrix method [Ultsch] to visualise the boundaries between the different sub-clusters.

SOM networks form an estimation of the probability density function of the training data. As we were more concerned with evolving prototypes of the centres that existed in the data set than with generating a statistical model of the data, the bootstrapped data set was utilised. Maps were constructed using all the features and using the reduced feature subset, consisting of the 24 features that had

performed well with the MLP networks. These experiments were repeated for a training set containing all six classes and for a training set containing only the defect classes. To visualise the formation of clusters we used hits histograms, which are a representation of the separate classes projected onto the trained map (Figure 5).

Analysis of Results
Using supervised methods of data classification on a limited labelled data set, the best results were obtained using MLPs to separate between regions labelled as defects and non-defects. The defects were split into one of five categories. Using this limited number of categories did lead to some ambiguous labelling, and this is evident from the confusion matrices of the results for the classification on a class basis. This was also apparent on visual re-examination of some of the regions wrongly classified, the degree of severity and the manifestation of the defect being the main causes of the confusion. An improvement in the performance of the classifiers was noted when the bootstrapped data set containing approximately equal numbers of each class was used, and a further improvement in performance when they were trained to classify only the defect classes.

Even though the results of the GMDH classifiers were not as good as those of the MLP, it was found that some of the features were consistently not utilised in the construction of these classifiers. The possibility of using GMDH for the purposes of feature selection is currently under further investigation. Analysing the projections on the Self Organising Maps, there is a clear separation between regions which have been labelled as not being defects and those that have. There is a small degree of overlap in the Rolled in Scale class, but this is possibly because some of the regions that were labelled as not being defects did in fact include a certain amount of debris. There is also separation between the different types of defects.
In addition, sub-clusters within the labelled classes can also be seen. The smaller sized maps result in the formation of tighter clusters, due to the merger of closely correlating cases. The reduced feature subset gave the lowest quantisation error and the best visual separation of classes based on the expert labelling. On examining the actual images of the defects misclassified by the MLP-BP network, projecting these onto the SOM map and observing the regions of the map they activate, it is evident that the groupings formed on the SOM consist of visually similar objects irrespective of the labelling. As a result, some of the apparent overlap of classes



can be attributed to an ambiguous manifestation of the defect.

Conclusions
If supervised learning methods are to be used for the purposes of classifying regions of interest segmented on the surface of hot rolled steel, it is clear that to ensure the construction of reliable classifiers an alternative labelling scheme will have to be adopted. This would be a more comprehensive categorisation of the defects, incorporating information directly relating to the manifestation of the defect and thus more closely related to the mathematical morphological and textural features derived from it. Apart from the increased labour costs this would involve, it would have to be ensured that the data sets used for the training and testing of the classifiers contained an adequate number of representatives of each of the categories, in reasonably balanced numbers. Increased mapping complexity, in terms of the number of classes that have to be separated in a fixed input space, increases the network's complexity, resulting in a scaling problem and affecting the network's ability to generalise. Simple problem decomposition on a class level would involve cascading two networks, the first network separating regions on a defect/non-defect basis and the second dealing only with the cases classified as defects by the first.

Incorporating the Self Organising Map within a real-time inspection system would require calibrating the map into category zones. This would involve re-annotating the data based on the region on the


trained map it activated, together with visual inspection of other blobs that activate the same region. This would be a useful aid in generating new category labels directly correlated to the features. Functionally, the SOM could be used to augment the image processing, in order to remove any spurious delineated regions that were just background noise. At a higher level, the calibrated SOM could be used to colour-code the remaining regions of interest depending on which category zone they fell in. In this way the human inspector could be informed of the type of defect. In addition, they could also be given an indication of its severity and manifestation, depending on the range of data the SOM was trained with, this dictating the granularity of partitioning of the SOM.

References
Haralick, R.M., Shanmugam, K., Dinstein, I. (1973): Textural Features for Image Classification. IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-3, No. 6, November, pp 610-620.
Kohonen, T. (1990): The Self-Organizing Map. Proceedings of the IEEE, 78(9), 1990, 1464-1480.
Kohonen, T. (1995): Self-Organizing Maps (Springer Series in Information Sciences, Vol. 30). Springer-Verlag, Heidelberg, 1995.
Ultsch, A.: Self-Organising Neural Networks for Visualisation and Classification. Department of Computer Science, University of Dortmund, P.O. Box 500, D-4600 Dortmund 50, Germany. Report.

Figure 5. Defects as classified by an expert (Rolled in Scale, Bruises, Rolled in Bruises, Scratches, Skin Laminations, No Defect), projected on to a trained SOM map.
