
SAR imagery segmentation using probabilistic winner-take-all clustering





Hossam Osman and Steven D. Blostein
Department of Electrical and Computer Engineering, Queen's University
Kingston, Ontario, Canada, K7L 3N6

Abstract

This paper applies a recently developed neural clustering scheme, called "probabilistic winner-take-all (PWTA)", to image segmentation. Experimental results are presented which show that the PWTA clustering scheme significantly outperforms the popular k-means algorithm when both are used to segment a synthetic-aperture-radar (SAR) image of ship targets in an open-ocean scene.

Keywords: image segmentation, SAR imagery, clustering, artificial neural networks

1 INTRODUCTION

Image segmentation is the partitioning of an image into different regions, each having certain properties. It is usually the first step in an image-processing system that aims at either a description or a classification of an image, and it is a critical component of such a system, since segmentation errors tend to propagate to the subsequent processing steps. One successful approach to image segmentation is clustering.1,2 Among the clustering techniques that have been used, the k-means3 scheme is the most popular. This paper proposes applying a new neural clustering scheme, called "probabilistic winner-take-all (PWTA)", to image segmentation. The PWTA scheme was recently developed by Osman and Fahmy4 as a neural approximation to optimal Bayesian clustering. It has the advantage of overcoming the unit under-utilization problem,4 typically encountered with other neural clustering schemes, without making any assumptions about the populations of the different clusters.

The paper is organized as follows. Section 2 discusses the clustering problem. Section 3 describes the PWTA technique. Section 4 then investigates its use for image segmentation. Finally, Section 5 concludes the paper.


2 CLUSTERING PROBLEM

Suppose that a collection of $m$ clusters is needed to model the distribution of an $n$-dimensional vector $\mathbf{x}$. Suppose that the $j$th cluster, $H_j$, has a known a priori probability $P(H_j)$, and let $p(\mathbf{x}\mid H_j,\theta_j)$ denote the $j$th cluster-conditional probability density function, where $\theta_j$ is an unknown parameter vector. Finally, suppose that we are given a training set $T_N=\{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_N\}$ of $N$ unlabeled samples drawn independently from the mixture density $p(\mathbf{x}\mid\theta)$, where $\theta^t=(\theta_1^t,\theta_2^t,\ldots,\theta_m^t)$ and

$$p(\mathbf{x}\mid\theta)=\sum_{i=1}^{m}p(\mathbf{x}\mid H_i,\theta_i)\,P(H_i). \qquad (1)$$

Given these assumptions, the problem is to estimate $\theta$ from $T_N$. In principle, the Bayesian scheme3 is one possible approach to this clustering problem. The Bayesian scheme assumes that part of our knowledge about $\theta$ is contained in a known a priori density function $p(\theta)$. By observing $l$ patterns, $p(\theta)$ is converted to the a posteriori density function $p(\theta\mid X_l)$, where $X_l$ is the set of the $l$ observed patterns, ordered according to their observation. Let $\mathbf{x}_i$ denote the pattern observed at the $l$th iteration, and let $S$ denote the space over which $\theta$ is defined. Then the a posteriori density $p(\theta\mid X_l)$ is computed from $p(\theta\mid X_{l-1})$ using Bayes' theorem as follows:

$$p(\theta\mid X_l)=\frac{p(\mathbf{x}_i\mid\theta)}{p(\mathbf{x}_i\mid X_{l-1})}\,p(\theta\mid X_{l-1}), \qquad (2)$$

where

$$p(\mathbf{x}_i\mid X_{l-1})=\int_{S}p(\mathbf{x}_i\mid\theta)\,p(\theta\mid X_{l-1})\,d\theta, \qquad (3)$$

and $p(\theta\mid X_0)=p(\theta)$. Unfortunately, the Bayesian scheme is computationally infeasible for most practical choices of $p(\mathbf{x}\mid H_j,\theta_j)$. For the Bayesian scheme to become feasible, this unsupervised clustering problem has to be converted to a supervised one, since in that case, for most practical choices of $p(\mathbf{x}\mid H_j,\theta_j)$, a sufficient statistic vector exists for $\theta$.5 Based on this argument, Agrawala proposed his "learning with a probabilistic teacher" scheme.6 In his scheme, when a training pattern $\mathbf{x}_i$ is presented, the a posteriori probabilities of all clusters are computed using all available information, and then, using these probabilities, $\mathbf{x}_i$ is probabilistically assigned to a cluster. In other words, using a probabilistic teacher, the unsupervised clustering problem is converted to a supervised one, to which the Bayesian approach is then applied. Mathematically, at the $l$th iteration, when $\mathbf{x}_i$ arrives, the a posteriori probabilities are computed using

$$P(H_j\mid\mathbf{x}_i,X_{l-1},L_{l-1})=\frac{p(\mathbf{x}_i\mid H_j,X_{l-1},L_{l-1})\,P(H_j)}{\sum_{k=1}^{m}p(\mathbf{x}_i\mid H_k,X_{l-1},L_{l-1})\,P(H_k)}, \qquad j=1,\ldots,m, \qquad (4)$$

where

$$p(\mathbf{x}_i\mid H_j,X_{l-1},L_{l-1})=\int_{S}p(\mathbf{x}_i\mid H_j,\theta)\,p(\theta\mid X_{l-1},L_{l-1})\,d\theta, \qquad (5)$$

$L_{l-1}=\{\lambda_1,\lambda_2,\ldots,\lambda_{l-1}\}$ with $\lambda_l$ being the probabilistically chosen label at the $l$th iteration, i.e., $\lambda_l=j$ if the $l$th observed pattern is probabilistically assigned to $H_j$, and the notation $(X_{l-1},L_{l-1})$ is employed to emphasize the role of the probabilistic teacher.
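To make the probabilistic-teacher step concrete, the following minimal Python sketch (ours, not the paper's) normalizes the per-cluster products $p(\mathbf{x}_i\mid H_j,\cdot)\,P(H_j)$ of Eq. (4) into posterior probabilities and then draws the cluster assignment from them. The function name, toy densities, and priors are illustrative placeholders; the predictive densities are assumed to be supplied by the caller.

```python
import numpy as np

def probabilistic_teacher_step(pred_densities, priors, rng):
    """One probabilistic-teacher assignment: normalize the per-cluster
    quantities p(x_i | H_j, .) * P(H_j) of Eq. (4) and draw a label from
    the resulting posterior probabilities."""
    weights = pred_densities * priors                   # numerators of Eq. (4)
    posteriors = weights / weights.sum()                # Eq. (4): normalize over the m clusters
    label = rng.choice(len(posteriors), p=posteriors)   # probabilistic assignment
    return posteriors, label

# Toy usage with m = 3 clusters, equal priors, and made-up predictive densities.
rng = np.random.default_rng(0)
posteriors, label = probabilistic_teacher_step(
    pred_densities=np.array([0.02, 0.30, 0.10]),
    priors=np.array([1 / 3, 1 / 3, 1 / 3]),
    rng=rng)
```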


Once the a posteriori probabilities are computed, a label $\lambda_l$ is probabilistically chosen for $\mathbf{x}_i$ such that

$$\lambda_l=j \quad\text{with probability}\quad P(H_j\mid\mathbf{x}_i,X_{l-1},L_{l-1}), \qquad j=1,\ldots,m. \qquad (6)$$

Suppose that $\lambda_l=\ell$. Then, Bayes' theorem is employed to update $p(\theta\mid X_{l-1},L_{l-1})$ using the pattern $\mathbf{x}_i$ as follows:

$$p(\theta\mid X_l,L_l)=\frac{p(\mathbf{x}_i\mid H_\ell,\theta)}{p(\mathbf{x}_i\mid H_\ell,X_{l-1},L_{l-1})}\,p(\theta\mid X_{l-1},L_{l-1}). \qquad (7)$$

For a proper $p(\mathbf{x}_i\mid H_\ell,\theta)$, such as any member of the exponential family, the density $p(\theta\mid X_l,L_l)$ is a reproducing one, and for each computation of (7) it is only needed to update a sufficient statistic vector $s$.5 Hence, at the $l$th iteration, we have

$$p(\mathbf{x}_i\mid H_j,X_{l-1},L_{l-1})=p(\mathbf{x}_i\mid H_j,s(l-1)), \qquad (8)$$

$$P(H_j\mid\mathbf{x}_i,X_{l-1},L_{l-1})=P(H_j\mid\mathbf{x}_i,s(l-1)), \qquad (9)$$

and

$$p(\theta\mid X_l,L_l)=p(\theta\mid s(l)). \qquad (10)$$

This makes Agrawala's clustering scheme computationally feasible. Agrawala has proved that if the assumptions given while defining the clustering problem at the beginning of this section are valid and the mixture density $p(\mathbf{x}\mid\theta)$ is identifiable, then as $l\to\infty$, $p(\theta\mid X_l,L_l)$ converges in probability to a Dirac delta function positioned at the true value of $\theta$.6 In the next section, the PWTA clustering scheme is described.

3 PROBABILISTIC WINNER-TAKE-ALL CLUSTERING

Upon examination of (2) and (7), it is seen that Agrawala's clustering scheme can be viewed as an approximation to the Bayesian one in the sense that the probabilistically chosen cluster $H_\ell$ has the product $p(\mathbf{x}\mid H_\ell,\theta_\ell)\,P(H_\ell)$ much higher than those of the other clusters and therefore, instead of evaluating $p(\mathbf{x}\mid\theta)$ using the summation involved in (1), Agrawala's scheme uses4

$$p(\mathbf{x}\mid\theta)\;\approx\;\underset{i=1,\ldots,m}{\text{probabilistically chosen}}\;\;p(\mathbf{x}\mid H_i,\theta_i)\,P(H_i). \qquad (11)$$

Now, assume that the $\theta_j$, $j=1,\ldots,m$, are statistically independent, and let $s^t=(s_1^t,s_2^t,\ldots,s_m^t)$, where $s_i$ is the sufficient statistic of the $i$th cluster. Then

$$p(\theta\mid s)=\prod_{i=1}^{m}p(\theta_i\mid s_i). \qquad (12)$$

Equation (12) implies that patterns that belong to cluster $H_j$ give no information about $\theta_i$ if $i\neq j$. Thus, in view of (12), Equation (7) can be rewritten as

$$p(\theta_j\mid s_j(l))=p(\theta_j\mid s_j(l-1)), \qquad j=1,\ldots,m,\; j\neq\ell, \qquad (13)$$

and

$$p(\theta_\ell\mid s_\ell(l))=\frac{p(\mathbf{x}_i\mid H_\ell,\theta_\ell)}{p(\mathbf{x}_i\mid H_\ell,s_\ell(l-1))}\,p(\theta_\ell\mid s_\ell(l-1)), \qquad (14)$$

where

$$p(\mathbf{x}_i\mid H_\ell,s_\ell(l-1))=\int_{S}p(\mathbf{x}_i\mid H_\ell,\theta_\ell)\,p(\theta_\ell\mid s_\ell(l-1))\,d\theta_\ell. \qquad (15)$$


To derive the rule for updating $s_\ell$, our attention has to be restricted to a special form of $p(\mathbf{x}\mid H_j,\theta_j)$. For a random parameter $p$, define the univariate Gaussian density by

$$G_p(a,b)\triangleq(2\pi b)^{-1/2}\exp\{-0.5(p-a)^2/b\}, \qquad (16)$$

with $a$ and $b$ being the mean and variance of the Gaussian density, respectively. Assume that the $j$th cluster has an axis-aligned elliptical Gaussian density with unknown center $c_j=(c_{j1},c_{j2},\ldots,c_{jn})^t$ and unknown diagonal covariance matrix whose diagonal elements constitute the vector $\sigma_j=(\sigma_{j1},\sigma_{j2},\ldots,\sigma_{jn})^t$, i.e.,

$$p(\mathbf{x}\mid H_j,\theta_j)=\prod_{i=1}^{n}G_{x_i}(c_{ji},\sigma_{ji}), \qquad (17)$$

where

$$\theta_j=(c_j^t,\sigma_j^t)^t. \qquad (18)$$

Also, let $S_p$ be the space on which a random parameter $p$ is positive, and define the Wishart density for $p$ by5

$$W_p(e,d)\triangleq\frac{1}{\Gamma\{0.5(d-1)\}}\,(0.5de)^{(d-1)/2}\,p^{(d-3)/2}\exp\{-0.5dep\} \quad\text{on } S_p, \qquad (19)$$

$$W_p(e,d)\triangleq 0 \quad\text{otherwise}, \qquad (20)$$

where $\Gamma\{\cdot\}$ is the well-known gamma function, $e$ is a positive number, and $d>1$. The parameter $d$ measures how concentrated $W_p(e,d)$ is around $e$; in other words, it reflects the confidence in $e$ as an estimate of $p$. In view of (17), when $c_j$ and $\sigma_j$ are estimated by the sample mean and variance, respectively, $p(c_j,\sigma_j\mid s_j)$ is known to be a composite Gaussian-Wishart density,5 i.e.,

$$p(\theta_j\mid s_j)=\prod_{i=1}^{n}G_{c_{ji}}(\hat{c}_{ji},\hat{\sigma}_{ji}/\gamma_j)\,W_{\sigma_{ji}}(\hat{\sigma}_{ji},N_j), \qquad (21)$$

where the parameter $\gamma_j$ reflects the confidence in $\hat{c}_j$ as the estimate of $c_j$, the parameter $N_j$ reflects the confidence in $\hat{\sigma}_j$ as the estimate of $\sigma_j$, and it is implied that

$$s_j=(\hat{\theta}_j^t,\gamma_j,N_j)^t=(\hat{c}_j^t,\hat{\sigma}_j^t,\gamma_j,N_j)^t. \qquad (22)$$

Define

$$\alpha_j\triangleq\frac{\Gamma\{0.5N_j\}}{\Gamma\{0.5(N_j-1)\}}. \qquad (23)$$

Then, using (17) and (21), Equation (15) yields4

$$p(\mathbf{x}\mid H_j,s_j)=\alpha_j\left(\frac{2\gamma_j}{\pi(1+\gamma_j)N_j}\right)^{n/2}\prod_{i=1}^{n}\frac{1}{\sqrt{\hat{\sigma}_{ji}}}\left(1+\frac{\gamma_j}{(1+\gamma_j)N_j}\,\frac{(x_i-\hat{c}_{ji})^2}{\hat{\sigma}_{ji}}\right)^{-N_j/2}. \qquad (24)$$
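The training-time unit activation is exactly this predictive density. The sketch below is a direct, minimal transcription of Eqs. (23)-(24) as reconstructed above; the Python names are ours, not the paper's, and $\alpha_j$ is evaluated through log-gamma for numerical stability. Since the activations are later normalized across units (step 4 of the algorithm below), only their relative values matter.

```python
import numpy as np
from math import lgamma, pi

def unit_activation(x, c_hat, sigma_hat, gamma, N):
    """Training-time activation of one unit: the predictive density
    p(x | H_j, s_j) of Eq. (24), with alpha_j from Eq. (23).

    x, c_hat, sigma_hat : length-n arrays (pattern, mean and variance estimates)
    gamma, N            : confidence parameters of the Gaussian-Wishart posterior
    """
    n = x.size
    alpha = np.exp(lgamma(0.5 * N) - lgamma(0.5 * (N - 1)))   # Eq. (23), via log-gamma
    scale = gamma / ((1.0 + gamma) * N)
    bracket = 1.0 + scale * (x - c_hat) ** 2 / sigma_hat      # per-dimension factor
    return (alpha
            * (2.0 * gamma / (pi * (1.0 + gamma) * N)) ** (0.5 * n)
            * np.prod(sigma_hat ** -0.5 * bracket ** (-0.5 * N)))
```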


Also, in view of (17), (21), and (24), Equation (14) can be rewritten as4

$$\hat{c}_\ell(l)=\frac{\gamma_\ell(l-1)\,\hat{c}_\ell(l-1)+\mathbf{x}_i}{1+\gamma_\ell(l-1)}, \qquad (25)$$

$$\gamma_\ell(l)=\gamma_\ell(l-1)+1, \qquad (26)$$

$$\hat{\sigma}_{\ell k}(l)=\frac{1}{N_\ell(l-1)+1}\left[N_\ell(l-1)\,\hat{\sigma}_{\ell k}(l-1)+\frac{\gamma_\ell(l-1)}{1+\gamma_\ell(l-1)}\,\bigl(x_{ik}-\hat{c}_{\ell k}(l-1)\bigr)^2\right], \qquad k=1,\ldots,n, \qquad (27)$$

$$N_\ell(l)=N_\ell(l-1)+1. \qquad (28)$$

Therefore, at each iteration it is only needed to update the components of the sufficient statistic $s_\ell$ using Equations (25)-(28). From (23), (28), and the fact that $\Gamma\{p+1\}=p\,\Gamma\{p\}$, it is seen that

$$\alpha_\ell(l)=\frac{0.5\,\bigl(N_\ell(l-1)-1\bigr)}{\alpha_\ell(l-1)}, \qquad (29)$$

and thus (23) is used only to compute $\alpha_\ell(0)$. For an asymptotically large number of iterations, it has been proved that4

$$\lim_{l\to\infty}p\bigl(\mathbf{x}\mid H_\ell,\hat{c}_\ell(l),\hat{\sigma}_\ell(l),\gamma_\ell(l),N_\ell(l)\bigr)=\prod_{k=1}^{n}G_{x_k}\bigl(c_{\ell k}(\infty),\sigma_{\ell k}(\infty)\bigr), \qquad (30)$$

where, if the stated assumptions are valid and the input distribution is identifiable, $c_\ell(\infty)$ and $\sigma_\ell(\infty)$ are the true values of $c_\ell$ and $\sigma_\ell$, respectively.

Equations (25)-(29) constitute the learning rule of an iterative clustering scheme that can be directly implemented using a one-layer neural network. The $j$th network unit corresponds to the $j$th cluster and is fully connected to the $n$ input units. The $j$th unit's parameters include $s_j$ and $\alpha_j$. As for the unit activation function $h_j(\mathbf{x})$, it has two different forms: one used during training, given by (24), and one used once training is complete, given by (17). In view of (6), (13), and (14), it is noted that for each input presentation the units, by evaluating their posterior probabilities, compete for the right to learn the input pattern, and only one of them, the winner, is permitted to do so. Hence, this learning scheme is a winner-take-all7 (WTA) competitive one. However, since, in view of (6), the winner is probabilistically chosen, this scheme has been called the "probabilistic winner-take-all (PWTA)" clustering scheme.4 Its various steps can be stated as follows (a code sketch of one complete iteration is given after the list):

1. The means of all units are initialized with $m$ training patterns randomly selected from the training set. The whole training set is assumed to have a covariance matrix given by $\sigma I$, where $I$ is the identity matrix. The variance $\sigma$ is estimated from the data and then divided by $m$ to yield an initial value for the variances of all units. The parameters $\gamma_j(0)$ are set to 1, and the $N_j(0)$ are set to 2, since, from (23), $N_j(0)$ must be greater than 1. Also, using (23), $\alpha_j(0)=\Gamma\{1.0\}/\Gamma\{0.5\}=1/\sqrt{\pi}\approx 0.56$. Finally, the iteration number $l$ is set to zero.

2. $l$ is incremented, and a randomly picked training pattern $\mathbf{x}_i$ is applied to the network.

3. The $j$th unit computes its activation $h_j(\mathbf{x}_i)$.

4. The $m$ computed activation values are normalized. Assuming that the a priori probabilities $P(H_i)$ are equal, $h_j(\mathbf{x}_i)$ now corresponds to the a posteriori probability $P(H_j\mid\mathbf{x}_i,s_j(l-1))$.

5. The label of the winner of the $l$th competition is probabilistically chosen such that

$$\lambda_l=j \quad\text{with probability}\quad h_j(\mathbf{x}_i), \qquad j=1,\ldots,m. \qquad (31)$$

Let $g$ denote a randomly generated number that is uniformly distributed between 0 and 1; then

$$\lambda_l=\ell, \quad\text{if}\quad \sum_{j=1}^{\ell-1}h_j(\mathbf{x}_i)\;\leq\;g\;<\;\sum_{j=1}^{\ell}h_j(\mathbf{x}_i). \qquad (32)$$

6. Suppose that the $\ell$th unit is the winning unit. Then, (25)-(29) are employed to update its parameters.

7. Items 2 through 6 constitute an iteration. The end of an epoch is marked by performing $N$ iterations. Insignificant variation in the parameter estimates $\hat{\theta}_j$, $j=1,\ldots,m$, evaluated over many iterations indicates convergence. If convergence does not occur, more iterations are performed.
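The sketch below strings steps 1-7 together in minimal form, assuming the unit_activation helper sketched after Eq. (24). It is our illustrative reading of the algorithm, not the authors' code: a fixed epoch count stands in for the convergence check of step 7, and the $\alpha_\ell$ recursion of Eq. (29) is carried along for completeness even though the helper recomputes $\alpha$ directly.

```python
import numpy as np
from math import lgamma

def train_pwta(X, m, epochs=20, seed=0):
    """Minimal PWTA training loop (steps 1-7); assumes unit_activation above."""
    rng = np.random.default_rng(seed)
    N_pat, n = X.shape
    # Step 1: means from m random patterns, variances from var(X)/m,
    # gamma_j(0) = 1, N_j(0) = 2, alpha_j(0) = Gamma(1)/Gamma(0.5) = 1/sqrt(pi).
    c_hat = X[rng.choice(N_pat, size=m, replace=False)].astype(float)
    sigma_hat = np.full((m, n), X.var() / m)
    gamma = np.ones(m)
    N_j = np.full(m, 2.0)
    alpha = np.full(m, np.exp(lgamma(1.0) - lgamma(0.5)))

    for _ in range(epochs):                          # step 7: N_pat iterations per epoch
        for i in rng.permutation(N_pat):             # step 2: random pattern
            h = np.array([unit_activation(X[i], c_hat[j], sigma_hat[j],
                                          gamma[j], N_j[j])
                          for j in range(m)])        # step 3: activations, Eq. (24)
            h /= h.sum()                             # step 4: normalize (equal priors)
            w = rng.choice(m, p=h)                   # step 5: probabilistic winner, Eqs. (31)-(32)
            # Step 6: update the winner's sufficient statistic, Eqs. (25)-(29);
            # every right-hand side below uses the old (l-1) values.
            d2 = (X[i] - c_hat[w]) ** 2
            c_hat[w] = (gamma[w] * c_hat[w] + X[i]) / (1.0 + gamma[w])            # Eq. (25)
            sigma_hat[w] = (N_j[w] * sigma_hat[w]
                            + gamma[w] / (1.0 + gamma[w]) * d2) / (N_j[w] + 1.0)  # Eq. (27)
            alpha[w] = 0.5 * (N_j[w] - 1.0) / alpha[w]                            # Eq. (29)
            gamma[w] += 1.0                                                       # Eq. (26)
            N_j[w] += 1.0                                                         # Eq. (28)
    return c_hat, sigma_hat, gamma, N_j, alpha
```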


At the end of training, a pattern $\mathbf{x}$ is assigned to cluster $H_j$ if $h_j(\mathbf{x})\geq h_i(\mathbf{x})$, $i=1,\ldots,m$. Here, it should be mentioned that with typical neural clustering schemes such as the standard WTA,7 which is the neural implementation of the popular k-means technique, some network units may never win and, therefore, never learn. Thus, these units do not contribute to clustering the data. This problem is referred to as the unit under-utilization problem. By probabilistically choosing the winner, the PWTA overcomes this problem without making any assumptions about the populations of the different clusters. In the next section, the performance of the PWTA is compared to that of the k-means in the application of synthetic-aperture-radar (SAR) imagery segmentation.

4 SAR IMAGERY SEGMENTATION

SAR is one of the most important microwave-based sensors. Its main advantages are the production of images of very high resolution and the possibility of all-weather operation. Recently, the problem of automatically interpreting collected SAR imagery has received increasing attention, and interesting results have been presented in the current literature.8-11 In this section, the PWTA clustering scheme is applied to SAR imagery segmentation. The image used in our simulations is shown in Figure 1. It is an 85x512 airborne SAR image containing two ship targets in an open-ocean scene. One ship is moving, whereas the other is stationary. The goal is to separate the image pixels corresponding to ship bodies and wakes from the ocean background. The successful separation of these pixels is a crucial step before the detection of ships can take place; poor segmentation makes the task of the detector very hard, if not impossible. A PWTA network with 6 units was employed, i.e., 6 clusters were initially assumed. The training set for the network was generated by sweeping the SAR image with a 3x3 window shifted on a pixel-by-pixel basis. At each window position, the pixels were row-ordered into a 9-dimensional vector. The PWTA network was iterated through the generated training set until convergence.
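As a rough illustration of how such a training set could be built, the sketch below sweeps a 3x3 window over an image array and row-orders each position into a 9-dimensional vector, keeping the center-pixel coordinates for the later per-pixel cluster assignment. The function name and the choice to skip (rather than pad) the image borders are our assumptions, not details given in the paper.

```python
import numpy as np

def window_vectors(image, win=3):
    """Sweep a win x win window over the image pixel by pixel and row-order
    each window into a win*win-dimensional vector (9-dimensional for win = 3).
    Also returns the center-pixel coordinates, so that cluster labels can
    later be assigned to center pixels only."""
    rows, cols = image.shape
    half = win // 2
    vectors, centers = [], []
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            patch = image[r - half:r + half + 1, c - half:c + half + 1]
            vectors.append(patch.reshape(-1).astype(float))   # row-ordered window
            centers.append((r, c))
    return np.array(vectors), centers

# For the 85 x 512 image described above, this yields (85 - 2) * (512 - 2) vectors.
```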


Figure 1: Original SAR image.

Figure 2: SAR image segmented into 6 clusters using a PWTA network.

After convergence, the center pixel at each window position was assigned to a cluster. Assigning just the center pixel, rather than all nine, was chosen to improve the resolution of the generated clusters. The obtained results are shown in Figure 2. The similarity between the distribution of cluster $H_j$, $j=1,\ldots,m$, and the distributions of all other clusters was then computed using the Bhattacharyya distance12

$$B_{ji}=\frac{1}{8}(c_j-c_i)^t\left(\frac{\Sigma_j+\Sigma_i}{2}\right)^{-1}(c_j-c_i)+\frac{1}{2}\ln\!\left(\frac{\left|\frac{\Sigma_j+\Sigma_i}{2}\right|}{\sqrt{|\Sigma_j|\,|\Sigma_i|}}\right), \qquad i=1,\ldots,m,\; i\neq j. \qquad (33)$$

The reason for using the Bhattacharyya distance as a distribution-similarity measure is that, relative to the other available measures, it is a reasonable compromise between simplicity and accuracy.12 For cluster $H_j$, the normalized average Bhattacharyya distance was computed using

$$\bar{B}_j=\frac{\sum_{k=1,\,k\neq j}^{m}B_{jk}}{\sum_{i=1}^{m}\sum_{k=1,\,k\neq i}^{m}B_{ik}}, \qquad j=1,\ldots,m. \qquad (34)$$

Finally, if cluster $H_j$, $j=1,\ldots,m$, had a normalized average Bhattacharyya distance significantly less than $\frac{1}{m}$, it was merged with the cluster that had the smallest Bhattacharyya distance to it. This resulted in the three clusters shown in Figure 3.

The above processing steps were then repeated using a neural network trained by the standard WTA. As previously mentioned, the WTA is the neural implementation of the popular k-means clustering. The obtained results are given in Figures 4 and 5. As shown in Figures 2-5, the PWTA network separated the ship wakes and bodies from the ocean background. It is true that some ocean pixels were incorrectly segmented, but with a priori information such as the expected ship size, these isolated pixels can easily be rejected in subsequent processing steps.
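The merging step can be sketched as follows, again with illustrative naming of ours: the first function is Eq. (33) for two Gaussian clusters, and the second computes the normalized averages of Eq. (34) and flags clusters whose value falls below a placeholder fraction of $1/m$, since the paper does not quantify "significantly less". For the diagonal-covariance clusters used here, means[j] would be the estimate $\hat{c}_j$ and covs[j] the matrix diag$(\hat{\sigma}_j)$.

```python
import numpy as np

def bhattacharyya(c_j, S_j, c_i, S_i):
    """Bhattacharyya distance of Eq. (33) between two Gaussian clusters with
    means c and covariance matrices S (diagonal matrices in this application)."""
    S = 0.5 * (S_j + S_i)
    d = c_j - c_i
    quad = 0.125 * d @ np.linalg.solve(S, d)
    logdet = 0.5 * np.log(np.linalg.det(S)
                          / np.sqrt(np.linalg.det(S_j) * np.linalg.det(S_i)))
    return quad + logdet

def merge_candidates(means, covs, frac=0.5):
    """Eq. (34): normalized average Bhattacharyya distances, plus the clusters
    flagged for merging. 'Significantly less than 1/m' is approximated by the
    placeholder threshold frac/m."""
    m = len(means)
    B = np.zeros((m, m))
    for j in range(m):
        for i in range(m):
            if i != j:
                B[j, i] = bhattacharyya(means[j], covs[j], means[i], covs[i])
    B_bar = B.sum(axis=1) / B.sum()                      # Eq. (34)
    merges = {}
    for j in np.flatnonzero(B_bar < frac / m):           # merge into the closest cluster
        others = [i for i in range(m) if i != j]
        merges[j] = others[int(np.argmin(B[j, others]))]
    return B_bar, merges
```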


Figure 3: SAR image segmented into 3 clusters by merging the 6 clusters obtained using a PWTA network.

Figure 4: SAR image segmented into 6 clusters using the k-means scheme. Notice that the right-hand ship is incorrectly clustered, whereas the wakes of the other ship are poorly segmented.

Figure 5: SAR image segmented into 3 clusters by merging the 6 clusters obtained using the k-means scheme.


As for the WTA network, it incorrectly segmented the motionless ship as well as some pixels of the body and wakes of the moving ship.

5 CONCLUSION

This paper has proposed using the PWTA clustering scheme for image segmentation. Experimental results have been presented, and they show that the PWTA significantly outperforms the k-means when both are used to segment SAR imagery. Other experimental results, not reported here, have also confirmed this observation. One possible reason is that the k-means adapts only the means of the clusters, whereas the PWTA adapts the means as well as the variances of the clusters and therefore gives a better representation of the input distribution. Another possible reason is that the PWTA overcomes the unit under-utilization problem that may be encountered when the k-means scheme is employed. Since SAR images are characterized by small target signatures immersed in a large amount of image data, the occurrence of the unit under-utilization problem would result in poor segmentation performance.

6 ACKNOWLEDGMENT

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC), under grant no. 177119, and in part by Loral Canada.

7 REFERENCES

1. K. S. Fu and J. K. Mui, "A survey on image segmentation," Pattern Recognition, vol. 13, pp. 3-16, 1981.

2. G. B. Coleman and H. C. Andrews, "Image segmentation by clustering," Proceedings of the IEEE, vol. 67, no. 5, pp. 773-785, 1979.

3. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley and Sons, New York, 1973.

4. H. Osman and M. M. Fahmy, "Probabilistic winner-take-all learning algorithms for radial-basis-function neural classifiers," Neural Computation, vol. 6, no. 5, pp. 925-941, 1994.

5. J. Spragins, "A note on the iterative application of Bayes' rule," IEEE Transactions on Information Theory, vol. 11, no. 4, pp. 544-549, 1965.

6. A. K. Agrawala, "Learning with a probabilistic teacher," IEEE Transactions on Information Theory, vol. 16, pp. 373-379, 1970.

7. D. E. Rumelhart and D. Zipser, "Feature discovery by competitive learning," Cognitive Science, vol. 9, pp. 75-112, 1985.


8. G. Benelli, A. Garzelli, and A. Mecocci, "Complete processing system that uses fuzzy logic for ship detection in SAR images," IEE Proceedings - Radar, Sonar and Navigation, vol. 141, no. 4, pp. 181-186, 1994.

9. M. R. Azimi-Sadjadi, S. Ghaloum, and R. Zoughi, "Terrain classification in SAR images using principal components analysis and neural networks," IEEE Transactions on Geoscience and Remote Sensing, vol. 31, no. 2, pp. 511-515, 1993.

10. J. S. Lee and I. Jurkevich, "Segmentation of SAR images," in Proceedings of the IGARSS '88 Symposium, pp. 1503-1506, Edinburgh, Scotland, 1988.

11. K. Eldhuset, "Automatic ship and ship wake detection in spaceborne SAR images from coastal regions," in Proceedings of the IGARSS '88 Symposium, pp. 1529-1533, Edinburgh, Scotland, 1988.

12. K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York, 1990.