2532 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 53, NO. 5, MAY 2015

Spectral–Spatial Classification for Hyperspectral Data Using Rotation Forests With Local Feature Extraction and Markov Random Fields

Junshi Xia, Student Member, IEEE, Jocelyn Chanussot, Fellow, IEEE, Peijun Du, Senior Member, IEEE, and Xiyan He

Abstract—In this paper, we propose a new spectral–spatial classification strategy to enhance the classification performances obtained on hyperspectral images by integrating rotation forests and Markov random fields (MRFs). First, rotation forests are performed to obtain the class probabilities based on spectral information. Rotation forests create diverse base learners using feature extraction and subsets of features. The feature set is randomly divided into several disjoint subsets; then, feature extraction is performed separately on each subset, and a new set of linearly extracted features is obtained. The base learner is trained with this set. An ensemble of classifiers is constructed by repeating these steps several times. A weak classifier, the classification and regression tree (CART), is selected as the base classifier because it is unstable, fast, and sensitive to rotations of the axes. In this case, small changes in the training data of CART lead to a large change in the results, generating high diversity within the ensemble. Four feature extraction methods, including principal component analysis (PCA), neighborhood preserving embedding (NPE), linear local tangent space alignment (LLTSA), and locality preserving projection (LPP), are used in rotation forests. Second, spatial contextual information, which is modeled by an MRF prior, is used to refine the classification results obtained from the rotation forests by solving a maximum a posteriori problem using the α-expansion graph cuts optimization method. Experimental results, conducted on three hyperspectral data sets with different resolutions and different contexts, reveal that rotation forest ensembles are competitive with other strong supervised classification methods, such as support vector machines. Rotation forests with local feature extraction methods, including NPE, LLTSA, and LPP, can lead to higher classification accuracies than those achieved by PCA. With the help of MRF, the proposed algorithms improve the classification accuracies significantly, confirming the importance of spatial contextual information in hyperspectral spectral–spatial classification.

Index Terms—Feature extraction, hyperspectral image classification, Markov random fields (MRFs), rotation forests.

Manuscript received January 17, 2014; revised June 28, 2014 and September 5, 2014; accepted September 18, 2014. This work was supported by the Natural Science Foundation of China under Grant 41171323, by the Jiangsu Provincial Natural Science Foundation under Grant BK2012018, and by the National Key Scientific Instrument and Equipment Development Program under Grant 012YQ050250. (Corresponding author: Peijun Du.)

J. Xia and X. He are with the GIPSA-Lab, Grenoble Institute of Technology, 38400 Grenoble, France (e-mail: [email protected]; [email protected]).

J. Chanussot is with the GIPSA-Lab, Grenoble Institute of Technology, 38400 Grenoble, France, and also with the Faculty of Electrical and Computer Engineering, University of Iceland, 107 Reykjavik, Iceland (e-mail: [email protected]).

P. Du is with the Key Laboratory for Satellite Mapping Technology and Applications of the National Administration of Surveying, Mapping and Geoinformation of China, Nanjing University, Nanjing 210023, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TGRS.2014.2361618

I. INTRODUCTION

OVER the past 20 years, hyperspectral remote sensing images, which are characterized by a high spectral resolution that results in hundreds or thousands of narrow spectral bands, have been more and more widely used for land cover mapping, urbanization, disaster management, and environmental monitoring [1]–[5]. One of the most important tasks in the analysis of hyperspectral data is the design of competitive supervised classification algorithms, which assign one class label to each pixel after some training procedure. Hyperspectral sensors acquire spectral information in a continuous fashion, providing a high discrimination capacity between different land cover classes. However, the high dimensionality of hyperspectral images introduces challenging methodological problems in the context of supervised classification. This is the well-known curse of dimensionality, also referred to as the Hughes phenomenon [6]. In order to tackle this problem, several significant efforts, such as kernel-based methods and feature extraction/selection algorithms, have been developed in recent years [7]–[15].

Classifier ensembles, or multiple classifier systems (MCSs), are a suitable alternative approach to improve classification performance [16]–[20]. Since the generalization ability of an ensemble can be significantly better than that of an individual classifier, MCSs have been a hot topic during the past years, and many ensemble algorithms have been developed [21]–[24]. In general, an MCS is constructed in two steps, i.e., generating multiple classifiers and then combining their predictions [21]–[24]. In order to construct a strong ensemble, the base learners should have high accuracy as well as high diversity [25]–[27]. Kuncheva [22] summarized four fundamental approaches for building ensembles of diverse classifiers: 1) using different combination schemes; 2) using different base classifiers; 3) using different feature subsets; and 4) using different training subsets. We are more interested in the last two approaches, in which classifier ensembles are constructed by manipulating the data (including the features and the training set). Random subspace [28] belongs to the third category; bagging [29] and boosting [30] belong to the fourth.

0196-2892 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


XIA et al.: SPECTRAL–SPATIAL CLASSIFICATION FOR HYPERSPECTRAL DATA USING ROTATION FORESTS 2533

Many ensemble techniques for classifying remote sensing images are reported in [31] and [32]. Random forest is the most intuitive ensemble learning technique for the classification of high-dimensional data [16], [33]–[36]. In random forest, each tree is trained on a bootstrapped sample of the original data set, and only a randomly chosen subset of the features is considered for splitting a leaf in a decision tree. The main advantages of random forest are that the computational complexity can be reduced and the correlations between the trees are decreased [16], [33]–[36]. Rodriguez et al. proposed a new ensemble classifier, namely, rotation forest [37]. This method uses principal component analysis (PCA) to generate the rotated feature space for the training samples so as to promote diversity. In order to preserve the variability information and to encourage individual accuracy, all of the principal components are kept [37]. In [20], we applied rotation forest to classify hyperspectral images and found that its performance is better than that of bagging, AdaBoost, random subspace, and random forest.

It has been demonstrated that spatial information is a crucial component for the analysis of remote sensing images [38]–[40]. The classification result can be very noisy if spatial information is not taken into account. In order to increase the classification accuracy and regularize the classification maps, it is critical to combine spectral information with spatial information in the process. Tarabalka et al. [38]–[40] proposed a group of spectral–spatial classification methods that combine a pixelwise classification result with a segmentation map. The segmentation maps can be obtained from partitional clustering, minimum spanning forest, and watershed transformation techniques. Recently, Markov random fields (MRFs) have become a popular tool to exploit the spatial information in the classification of hyperspectral data. An MRF is a probabilistic model that is used to integrate spatial information into image classification. Tarabalka et al. [41] applied the MRF model as a postprocessing scheme to a probabilistic support vector machine (SVM) classification map. The classification problem is solved by the Metropolis algorithm based on stochastic relaxation and annealing. Li et al. [11] combined the class posterior probabilities and spatial information into a combinatorial optimization problem and solved it by graph cuts algorithms. The class posterior probabilities are produced by a sparse multinomial logistic regression classifier, and the spatial information is represented by an MRF-based multilevel logistic (MLL) prior.

In this paper, we develop new spectral–spatial classifiers, which contain two essential components, namely, rotation forests for the pixelwise classification and MRF for the spatial regularization. In particular, rotation forests, which create a sparse projection matrix using feature extraction and randomly selected subsets of the original features, are used to estimate the class probabilities. Then, spatial contextual information modeled by MRF is used to refine the classification results obtained from the rotation forest classifiers. Finally, the output is produced by solving a maximum a posteriori (MAP) problem using the α-expansion graph cuts optimization algorithm. The main contribution of the proposed work is to introduce three manifold-learning local feature extraction methods, namely, neighborhood preserving embedding (NPE), linear local tangent space alignment (LLTSA), and locality preserving projection (LPP), into the rotation forests. The results of rotation forests with the three local feature extraction techniques are further refined with the help of spatial contextual information, which will be shown to provide a good characterization of the content of hyperspectral data. Experimental results are presented on three hyperspectral airborne images recorded by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS), the Reflective Optics System Imaging Spectrometer (ROSIS), and the Digital Airborne Imaging Spectrometer (DAIS), respectively. They have different spatial/spectral resolutions and correspond to different contexts, hence demonstrating the robustness of the conclusions.

The remainder of this paper is organized as follows. Section II reviews some related works about MCSs and spectral–spatial classification techniques. Section III presents the proposed models, including rotation forests and MRF. Section IV describes the experimental results with analysis. Section V provides the summary and suggestions for future lines of research.

II. RELATED WORKS

A. MCSs

During the last decade, MCSs have become an attractive field of research, both for methodological development and for practical applications. As previously mentioned, diversity plays an important role in MCS design [21]–[24]. Therefore, a successful MCS should combine individual classifiers with high diversity among the ensemble members [25]–[27].

For remote sensing scenarios, many researchers have investigated the classification performance of different MCS approaches [31], [32]. The pioneering work on MCSs for remote sensing image classification, to the best of our knowledge, can be traced back to the 1980s. Since then, this topic has been extensively explored, and it is still under investigation (Table I).

Table II lists valuable studies of MCSs applied to different types of remote sensing images (since 2000). For instance, Smits [42], Briem et al. [43], and Gislason et al. [34] applied dynamic classifier selection, bagging, boosting, and random forest to classify multisource remote sensing data. Lawrence et al. [44], Kawaguchi and Nishii [17], Chan and Paelinckx [35], and Rodriguez-Galiano et al. [36] used boosting and random forest for the classification of multitemporal and hyperspectral remote sensing images.

From Table III, the most popular MCS for hyperspectral images is the random forest. Waske et al. [45] combined random feature selection and the SVM classifier for the classification of hyperspectral images. Yang et al. [18] proposed the dynamic subspace method to improve the random subspace method by automatically determining the dimensionality and selecting the component dimensions for the various subspaces. Du et al. [46] produced diverse classifiers using different feature extraction methods and then combined their results using evidence theory and linear consensus algorithms. Recently, we have used rotation forest to classify hyperspectral remote sensing images [20]. In comparison to random forest, rotation forest uses feature extraction and subset features to promote both the diversity and the accuracy of the individual classifiers. Therefore, it leads to better classification performance than random forest.

TABLE I
LIST OF ABBREVIATIONS USED IN THIS PAPER

TABLE II
LIST OF NOTATIONS USED IN THIS PAPER

B. Spectral–Spatial Classification

Spatial information is critical for the classification of hyperspectral images, especially in the case of high-spatial-resolution images. If spatial information is not considered in the classification process, the produced thematic map usually looks very noisy. Many studies have been carried out on spectral–spatial classification, which explores spectral and spatial information simultaneously. Spectral–spatial classification algorithms can be divided into several groups, which are detailed in Table IV [52].

III. PROPOSED METHOD

Let x = {x_1, . . . , x_N} ∈ R^{N×D} denote an image of D-dimensional feature vectors, let y = {y_1, . . . , y_N} be the corresponding image of class labels, and let {X, Y} = {(x_1, y_1), . . . , (x_n, y_n)} be a set of n training samples.

The proposed methods based on rotation forests and MRF, which are depicted in Fig. 1, are composed of the following three main steps:

1) supervised pixelwise classification using rotation forests;
2) spatial information extraction using MRF;
3) spectral–spatial classification by solving a MAP problem computed by the α-expansion graph cuts optimization algorithm.

A. Rotation Forest

In this paper, rotation forests are used for the pixelwise classification of the hyperspectral data. They construct different versions of the training set by using the following steps: 1) the feature set is divided into several disjoint subsets on which the original training set is projected; 2) a rotation sparse matrix R_i^a is constructed by performing feature extraction on each subset with bootstrapped samples corresponding to 75% of the initial training samples; 3) a classifier is built on the features projected by R_i^a; and 4) the final result is obtained by combining the outputs of the multiple classifiers, repeating the first three steps several times. The details of rotation forests are shown in Algorithm 1 and Fig. 2.

TABLE III
STUDIES OF REMOTE SENSING IMAGE CLASSIFICATION USING MCS APPROACHES PUBLISHED IN JOURNALS SINCE 2000

TABLE IV
SUMMARY OF SPECTRAL–SPATIAL APPROACHES APPLIED IN HYPERSPECTRAL IMAGE CLASSIFICATION

Fig. 1. Flowchart of the spectral–spatial approaches using rotation forests and MRF.

Algorithm 1 Rotation Forest

Input: {X, Y} = {(x_1, y_1), . . . , (x_n, y_n)}: training samples; T: number of classifiers; K: number of subsets (M: number of features in each subset); Γ: base classifier

1.  For i = 1 : T
2.    randomly split the features F into K subsets
3.    For j = 1 : K
4.      select the corresponding features of F_{i,j} to compose a training set X_{i,j}
5.      select a new training set X'_{i,j} using the bootstrap algorithm, whose size is 75% of the original size
6.      transform X'_{i,j} by a certain feature extraction method to get the coefficients v_{i,1}^(1), . . . , v_{i,j}^(M_j)
7.    End For
8.    construct the sparse matrix R_i with the above coefficients:

        R_i = [ v_{i,1}^(1), . . . , v_{i,1}^(M_1)    0                                 · · ·   0
                0                                 v_{i,2}^(1), . . . , v_{i,2}^(M_2)    · · ·   0
                ...                               ...                                 . . .   ...
                0                                 0                                 · · ·   v_{i,K}^(1), . . . , v_{i,K}^(M_K) ]

9.    rearrange R_i to R_i^a with respect to the original feature set
10.   obtain the new training samples {X R_i^a, Y}
11.   build the ith classifier Γ_i using {X R_i^a, Y}
12. End For

Output: The probability of x_i for each class is calculated by the average combination method:

        p(y_i | x_i) = (1/T) Σ_{j=1}^{T} p(y_i | x_i R_j^a)

The excellent performances of rotation forests can be attributed to simultaneous improvements in two aspects. One is to promote the diversity within the ensemble by the use of feature extraction on the training data and the use of the decision tree, known to be sensitive to variations in the training data. The other is to improve the accuracies of the base classifiers by keeping all extracted features in the training data. It is crucial to notice step 5 in Algorithm 1; the objective of selecting subsamples is, on the one hand, to avoid obtaining the same coefficients of the transformed components if the same features are selected and, on the other hand, to enhance the diversity among the member classifiers.
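Steps 2–9 of Algorithm 1, i.e., the construction of one rotation matrix R_i^a, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: PCA (an eigendecomposition of the covariance of a 75% bootstrap sample) stands in for the generic feature extraction method, all components are kept as the paper prescribes, and the function name is ours.

```python
import numpy as np

def build_rotation_matrix(X, K, rng):
    """Sketch of steps 2-9 of Algorithm 1 for one ensemble member:
    split the D features into K disjoint random subsets, run PCA on a
    75% bootstrap sample of each subset (keeping all components), and
    scatter the loading vectors into a block matrix that is already
    rearranged to the original feature ordering (i.e., R_i^a)."""
    n, D = X.shape
    subsets = np.array_split(rng.permutation(D), K)  # disjoint feature subsets
    R = np.zeros((D, D))
    for feats in subsets:
        boot = rng.choice(n, size=int(0.75 * n), replace=True)  # step 5
        Xs = X[np.ix_(boot, feats)]
        C = np.atleast_2d(np.cov(Xs, rowvar=False))
        _, vecs = np.linalg.eigh(C)          # PCA axes; all components kept
        R[np.ix_(feats, feats)] = vecs       # block for this feature subset
    return R

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 12))                # 60 samples, 12 spectral features
R = build_rotation_matrix(X, K=3, rng=rng)
X_rot = X @ R                                # rotated samples for the ith CART
```

Each base CART Γ_i would then be trained on X @ R_i^a; averaging the T per-class probability outputs gives p(y_i | x_i), as in the Output step of Algorithm 1.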

The two important issues for rotation forests are the base classifier and the feature extraction method. Classification and regression tree (CART) is adopted as the base classifier in this study because it is unstable, sensitive to rotations of the axes, and fast [62]. In this case, small changes in the training data of CART lead to a large change in the results, generating high diversity within the ensemble. The main idea of CART is to choose the best split that makes the data in each child node as pure as possible. The Gini index is used to select the best split in this paper [62]. It should be pointed out that we do not employ feature extraction for dimensionality reduction but for rotation of the axes while keeping all of the dimensions.
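As a small illustration of the splitting criterion only (not of the full CART induction), the Gini impurity and an exhaustive threshold search over a single feature can be written as follows; both function names are ours.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label set: 1 - sum_k p_k^2 (0 for a pure node)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Best threshold on one feature, scored by the weighted Gini
    impurity of the two child nodes (lower means purer children)."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_score, best_t = np.inf, None
    for t in (x[:-1] + x[1:]) / 2.0:          # candidate midpoints
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_score, best_t = score, t
    return best_score, best_t
```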

In linear feature extraction, we often assume that there exists a mapping matrix v that can transform each data point x_i to z_i. NPE [63] aims at preserving the local neighborhood structure on the data manifold. For this purpose, each data point is represented as a linear combination of its τ nearest neighbors.

Moreover, the combination weights can be computed by minimizing the following objective function:

    min Σ_i ‖ x_i − Σ_j W_ij x_j ‖²,   Σ_j W_ij = 1,  j = 1, 2, . . . , τ    (1)

where W is a weight matrix whose element W_ij holds the weight of the edge from node i to node j. The projection matrix v is given by the eigenvector associated with the minimum eigenvalue of the generalized eigenvalue problem

    X(I − W)ᵀ(I − W)Xᵀ v = λ X Xᵀ v.    (2)
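The weights in (1) can be computed point by point in closed form: center the τ neighbors on x_i, solve the local Gram system, and normalize the solution to sum to one. A minimal NumPy sketch follows; the small regularization term and the function name are our additions, not from the paper.

```python
import numpy as np

def npe_weights(X, tau):
    """Reconstruction weights of eq. (1): each x_i is expressed as an
    affine combination of its tau nearest neighbors. For each point,
    solve G w = 1 with G the Gram matrix of the centered neighbors,
    then normalize w so that the weights sum to one."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dist.
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:tau + 1]   # skip the point itself
        Z = X[nbrs] - X[i]                    # neighbors centered on x_i
        G = Z @ Z.T
        G = G + 1e-6 * np.trace(G) * np.eye(tau)  # regularize singular G
        w = np.linalg.solve(G, np.ones(tau))
        W[i, nbrs] = w / w.sum()              # enforce sum_j W_ij = 1
    return W
```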

LLTSA uses the tangent space in the neighborhood of a data point to represent the local geometry and then aligns those local tangent spaces in the low-dimensional space, which is linearly mapped from the high-dimensional space [64]. More precisely, LLTSA defines a neighborhood graph on the data and estimates the local tangent space Θ_i at each data point x_i. Subsequently, it forms the alignment matrix B by performing the summation

    B_{ν_i ν_i} = B_{ν_i ν_i} + J_τ (I − V_i V_iᵀ) J_τ    (3)

where the entries of the alignment matrix B are obtained by iterative summation (over all matrices V_i, starting from B_ij = 0 for all i, j), J_τ is the centering matrix of size τ, and ν_i is a selection matrix that contains the indices of the nearest neighbors of data point x_i. The cost function is minimized in a linear manner by solving the generalized eigenvalue problem

    X B Xᵀ v = λ X Xᵀ v.    (4)

LPP is a technique that aims at combining the benefits of linear techniques and local nonlinear methods for feature extraction by finding a linear mapping that minimizes the cost function [65]

    Σ_ij (z_i − z_j)² W_ij.    (5)

In the aforementioned cost function, large weights W_ij correspond to small distances between the data points x_i and x_j. LPP starts with the construction of a nearest neighbor graph in which each point x_i is connected to its τ nearest neighbors. The weights of the edges in the graph are computed as follows:

    W_ij = e^{ −‖x_i − x_j‖² / (2σ²) }.    (6)
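The graph construction of (6) together with the matrices Q and L of (7) can be sketched as follows. This is a toy dense implementation; the symmetrization step is a common convention that we add, not something stated in the paper, and the function name is ours.

```python
import numpy as np

def lpp_graph(X, tau, sigma):
    """Heat-kernel adjacency W of eq. (6) on a tau-nearest-neighbor
    graph, the degree matrix Q with Q_ii = sum_j W_ji, and the
    Laplacian L = Q - W used in eq. (7)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:tau + 1]           # tau nearest neighbors
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))
    W = np.maximum(W, W.T)                            # symmetrize the graph
    Q = np.diag(W.sum(axis=0))                        # column sums of W
    L = Q - W
    return W, Q, L
```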


Fig. 2. Illustration of the rotation forest.

Subsequently, LPP solves the generalized eigenvalue problem

    X L Xᵀ v = λ X Q Xᵀ v    (7)

in which Q is the diagonal matrix whose entries are the column sums of W, i.e., Q_ii = Σ_j W_ji, and L = Q − W is the Laplacian matrix.
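Equations (2), (4), and (7) are all generalized eigenvalue problems of the form A v = λ B v, solved for the eigenvectors with the smallest eigenvalues (with A and B instantiated per method). A minimal sketch of the standard Cholesky reduction follows, assuming B is symmetric positive definite; the function name is ours.

```python
import numpy as np

def smallest_generalized_eigvecs(A, B, d):
    """Solve A v = lam B v for the d eigenvectors with the smallest
    eigenvalues, assuming A symmetric and B symmetric positive
    definite. With B = L L^T, the problem reduces to the standard
    symmetric eigenproblem (L^-1 A L^-T) u = lam u, and v = L^-T u."""
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    S = Linv @ A @ Linv.T
    S = (S + S.T) / 2.0                       # guard against round-off
    vals, U = np.linalg.eigh(S)               # eigenvalues in ascending order
    V = Linv.T @ U[:, :d]
    return vals[:d], V
```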

B. MRFs

Rotation forests are supervised classifiers that only focus on spectral information, without considering any spatial correlation, which may consequently lead to a low classification accuracy. In this paper, we integrate the contextual information with the spectral information by using an isotropic multilevel logistic (MLL) prior to model the image of class labels. This approach exploits the fact that, in real-world images, spatially neighboring pixels are likely to belong to the same class. This prior, which belongs to the MRF class, encourages piecewise smooth segmentations and promotes solutions in which adjacent pixels are likely to belong to the same class [41], [55], [66].

An MRF assumes that any pixel is independent of the pixels outside its defined neighborhood. The Hammersley–Clifford theorem shows that a random field is an MRF if and only if it follows a Gibbs distribution [67]. Therefore, the prior model has the following structure:

    p(y) = (1/Z) e^{ −Σ_{(i,j)∈ς} V_ς(y) }    (8)

where (i, j) ∈ ς denotes that pixel x_i and pixel x_j are connected.


The MLL has been a popular spatial contextual model, in which the clique energy is defined as [55], [68]

    V_ς(y_i, y_j) = { μ   if y_i ≠ y_j
                      0   otherwise    (9)

where μ is a positive number that controls the degree of smoothness. Thus, up to a constant absorbed into Z, (8) can be rewritten as follows:

    p(y) = (1/Z) e^{ μ Σ_{(i,j)∈ς} δ(y_i − y_j) }    (10)

where δ(·) is the unit impulse function, i.e., δ(y_i − y_j) = 1 if y_i = y_j and 0 otherwise.

C. MAP Labeling

The image classification task can then be formulated as a MAP problem, for which maximizing the posterior p(y|x) gives a solution, which is equivalent to maximizing p(x|y)p(y). It is possible to impose spatial contextual constraints by modeling p(y) with an MRF [55], [68]. Assuming conditional independence of the features given the labels, p(x|y) can be formulated as [55]

    p(x|y) = Π_{i=1}^{N} p(x_i|y_i)    (11)

           ∝ Π_{i=1}^{N} p(y_i|x_i) / p(y_i).    (12)

Then, the posterior distribution can be rewritten as follows:

    p(y|x) ∝ p(x|y) p(y)    (13)

           ∝ Π_{i=1}^{N} p(x_i|y_i) p(y)    (14)

           ∝ Π_{i=1}^{N} [ p(y_i|x_i) / p(y_i) ] p(y).    (15)

In the proposed model, the densities p(y_i) are assumed to be equally distributed. The MAP problem can then be defined as follows:

    arg max_y { Σ_i log p(y_i|x_i) + log p(y) }

      = arg min_y { Σ_i − log p(y_i|x_i) − log p(y) }    (16)

      = arg min_y { Σ_i − log p(y_i|x_i) − μ Σ_{(i,j)∈ς} δ(y_i − y_j) }.    (17)
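The paper minimizes (17) with the α-expansion graph cuts algorithm. As a self-contained illustration of the objective only, the sketch below evaluates the energy of a candidate labeling on a 4-neighborhood grid and runs one sweep of iterated conditional modes (ICM), a far weaker local optimizer used here purely as a stand-in, not the method of the paper; `prob` is assumed to be the H × W × C map of class probabilities output by the rotation forests.

```python
import numpy as np

def mrf_energy(prob, labels, mu):
    """Energy of eq. (17), up to an additive constant: the unary term
    sums -log p(y_i|x_i); the pairwise MLL term charges mu for every
    pair of 4-neighbors with different labels (equivalent, up to mu
    times the number of edges, to the -mu*delta(y_i - y_j) form)."""
    unary = -np.log(np.take_along_axis(prob, labels[..., None], -1)).sum()
    pair = mu * ((labels[1:, :] != labels[:-1, :]).sum()
                 + (labels[:, 1:] != labels[:, :-1]).sum())
    return unary + pair

def icm_sweep(prob, labels, mu):
    """One raster-scan ICM sweep: each pixel takes the label that
    minimizes its local unary-plus-pairwise cost given its neighbors."""
    h, w, C = prob.shape
    out = labels.copy()
    for i in range(h):
        for j in range(w):
            costs = -np.log(prob[i, j])
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    costs = costs + mu * (np.arange(C) != out[ni, nj])
            out[i, j] = int(np.argmin(costs))
    return out
```

With a large μ, an isolated pixel whose spectral evidence weakly favors a different class from all of its neighbors is relabeled, which is exactly the smoothing effect that the MLL prior is designed to produce.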

An important issue of MRF-based approaches is the computation of the global minimum of the objective function. MRF-based objective functions such as those in the aforementioned equations are highly nonconvex. The existence of local minima causes considerable difficulties in finding the global minimum in an intractably vast search space.

Various combinatorial optimization methods have been proposed to solve this problem. In this paper, we resort to the α-expansion graph-cut-based algorithm. This method yields good approximations to the MAP segmentation and is quite efficient from a computational viewpoint, with practical computational complexity [55], [68]. It should be noticed that the α-expansion algorithm solves the binary-class problem exactly. Since more than two classes are present here, the multiclass problem cannot be solved exactly, but an approximate solution within a known factor of the optimal solution is found.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

In this section, three real hyperspectral data sets with different characteristics (sensors, areas, dimensions, and spatial resolutions) were used for the experiments. Detailed descriptions of the three hyperspectral images and the corresponding results and analysis are given in the following sections.

A. Indian Pines AVIRIS Image

The first hyperspectral image was recorded by the AVIRIS sensor over the Indian Pines site in Northwestern Indiana, USA. The image is composed of 145 × 145 pixels with a spatial resolution of 20 m/pixel. This image is a classical benchmark for validating the accuracy of hyperspectral image classification approaches and constitutes a challenging problem due to the significant presence of mixed pixels in all available classes and the unbalanced class distribution. A three-band color composite and the ground truth of the AVIRIS image are shown in Fig. 3.

In the first experiment, we investigated the performances of rotation forests with local feature extraction and MRF using different numbers of training samples. In this experiment, M and T are fixed to 10, μ is fixed to 4, and τ is fixed to 12. Table V shows the average of the overall accuracies obtained by the proposed methods using different numbers of training samples. The standard deviations of the proposed methods are also given in the table. As can be seen in Table V, the overall performances of RoF-NPE, RoF-LLTSA, and RoF-LPP are better than that of RoF-PCA. With the help of spatial contextual information, the combination of rotation forests and MRF significantly outperforms the rotation forests, which use the spectral information only. Moreover, it is clear that rotation forests with local feature extraction methods have more stable performances than RoF-PCA in most cases.

In the second experiment, the dependence of the classification accuracies on different parameters is studied. In the proposed model, there are four parameters: the ensemble size (T), the number of features in a subset (M), the number of neighbors (τ) considered by the local feature extraction methods, and the regularization parameter (μ). Following our previous study [20], the value T = 10 is recommended. A larger ensemble size T in the rotation forests has no significant effect on the overall accuracy while increasing the computation time. Thus, T is fixed to 10 in all experiments. The regularization parameter (μ) is empirically set to 4.
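For reference, the rotation-forest construction used here (random disjoint feature subsets, feature extraction per subset, a CART base learner on the rotated data) can be sketched with scikit-learn as follows. This is a simplified RoF-PCA illustration under our own naming; the published algorithm additionally bootstraps samples and class subsets before each per-subset feature extraction, which is omitted here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def train_rotation_forest(X, y, T=10, M=10, seed=0):
    """Simplified RoF-PCA sketch: for each of the T CART base learners,
    permute the features, cut them into disjoint subsets of size M,
    fit a PCA on each subset, place the loadings on the block diagonal
    of a d x d rotation matrix R, and train the tree on X @ R."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    ensemble = []
    for _ in range(T):
        perm = rng.permutation(d)
        R = np.zeros((d, d))
        for start in range(0, d, M):
            idx = perm[start:start + M]
            pca = PCA().fit(X[:, idx])          # feature extraction on one subset
            R[np.ix_(idx, idx)] = pca.components_.T
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1 << 31)))
        tree.fit(X @ R, y)                      # CART on the rotated features
        ensemble.append((R, tree))
    return ensemble

def predict_proba(ensemble, X):
    """Average the per-tree class posteriors; these averaged posteriors
    serve as the data term p(y_i | x_i) of the MRF stage."""
    return np.mean([tree.predict_proba(X @ R) for R, tree in ensemble], axis=0)
```

Because CART is sensitive to rotations of the axes, each randomly rotated feature space yields a noticeably different tree, which is the source of diversity within the ensemble.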


Fig. 3. (a) Three-band color composite of the AVIRIS image. (b) Ground truth: corn-no till, corn-min till, corn, soybean-no till, soybeans-min till, soybeans-clean till, alfalfa, grass/pasture, grass/trees, grass/pasture-mowed, hay-windrowed, oats, wheat, woods, bldg-grass-tree-drives, and stone-steel towers.

TABLE V
AVERAGE OF OVERALL ACCURACIES OBTAINED FROM THE PROPOSED METHODS USING DIFFERENT NUMBERS OF TRAINING SAMPLES (TEN MONTE CARLO RUNS) FOR THE INDIAN PINES AVIRIS IMAGE. THE NUMBERS OF TRAINING SAMPLES PER CLASS AND THE TOTAL NUMBERS OF TRAINING SAMPLES (IN BRACKETS) ARE ALSO GIVEN

TABLE VI
AVERAGE OF OVERALL ACCURACIES OBTAINED FROM THE PROPOSED ROTATION FORESTS AND MRF USING DIFFERENT VALUES OF M (INDIAN PINES AVIRIS IMAGE)

The impacts of the different values of M on the classification performances are presented in Table VI. The number of training samples per class is 20, and the total number of training samples is 320. τ is fixed to 12. The classification results are significantly improved when larger values of M are used. The main reason is that an insufficient number of features in a subset (low values of M) cannot provide a reliable sparse rotation matrix based on the aforementioned feature extraction methods, resulting in a decrease of the classification performance. Table VII shows the impact of the different values

TABLE VII
AVERAGE OF OVERALL ACCURACIES OBTAINED FROM THE PROPOSED ROTATION FORESTS AND MRF USING DIFFERENT VALUES OF τ (INDIAN PINES AVIRIS IMAGE)

of τ on the classification accuracies. From Table VII, it is found that the classification performances indeed depend on the setting of τ. The optimal values of τ are between 8 and 12.

In order to compare the class-specific accuracies and overall accuracies precisely, we have chosen 30 pixels per class from the available ground truth (a total size of 423 pixels) as the training set. We also used two standard classifiers for comprehensive comparisons: SVM and logistic regression via variable splitting and augmented Lagrangian (LORSAL) [69]. Global and class-specific accuracies achieved by all of the compared algorithms are listed in Table VIII. As shown in Table VIII, RoF-PCA leads to an OA of 72.11%. By introducing local structure in feature extraction, RoF-NPE, RoF-LLTSA, and RoF-LPP can obtain more accurate results


TABLE VIII
OVERALL, AVERAGE, AND CLASS-SPECIFIC ACCURACIES OBTAINED FOR THE INDIAN PINES AVIRIS IMAGE

Fig. 4. Classification results of the Indian Pines AVIRIS image. (a) RoF-PCA. (b) RoF-NPE. (c) RoF-LLTSA. (d) RoF-LPP. (e) RoF-PCA-MRF. (f) RoF-NPE-MRF. (g) RoF-LLTSA-MRF. (h) RoF-LPP-MRF.

than RoF-PCA, with OAs of 72.18%, 72.89%, and 73.01%, respectively. LORSAL leads to the best overall accuracy among the pixelwise classification results. RoF-LLTSA and RoF-LPP are superior to SVM. From Table VIII, it can be seen that, by exploiting the spatial contextual information, the classification results are significantly improved compared to the results obtained with spectral information only, indicating the importance of spatial information. Rotation forests with feature extraction methods did not provide more accurate results than LORSAL, but rotation-forest-based MRF can actually outperform LORSAL-MRF. The main reason is that rotation forests produce very reliable class posterior probabilities with the help of multiple decision trees. Therefore, the final spectral–spatial classification results derived from the obtained class posterior probabilities can greatly enhance the classification performances. The best OA and AA are obtained by RoF-LLTSA-MRF. In this case, the average accuracy is improved by 11.7 percentage points when compared to the pixelwise classification achieved by RoF-LLTSA. However, the use of other feature extraction algorithms also leads to high accuracies.
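The OA, AA, and class-specific accuracies reported in these tables can all be computed from a single confusion matrix; a short sketch (names are ours, not the paper's):

```python
import numpy as np

def accuracies(y_true, y_pred, n_classes):
    """Overall accuracy (OA), average accuracy (AA), and class-specific
    accuracies from a confusion matrix, as in Table VIII."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                 # rows: reference labels, cols: predictions
    class_acc = np.diag(cm) / cm.sum(axis=1)   # per-class recall
    oa = np.diag(cm).sum() / cm.sum()
    return oa, class_acc.mean(), class_acc
```

AA (the unweighted mean of the class accuracies) is the more informative summary under the unbalanced class distribution of Indian Pines, since OA is dominated by the large classes.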

Fig. 4 shows the classification maps produced by the different classification methods when applied to the Indian Pines AVIRIS


Fig. 5. (a) Three-band color composite of the University of Pavia image. (b) Reference map: asphalt, meadows, gravel, trees, metal sheets, bare soil, bitumen, bricks, and shadow.

TABLE IX
AVERAGE OF OVERALL ACCURACIES OBTAINED FROM THE PROPOSED APPROACHES USING DIFFERENT NUMBERS OF TRAINING SAMPLES FOR THE UNIVERSITY OF PAVIA ROSIS IMAGE. THE NUMBER OF TRAINING SAMPLES PER CLASS AND THE TOTAL NUMBER OF TRAINING SAMPLES (IN BRACKETS) ARE ALSO GIVEN

image. Without the MLL prior, the classification maps of the rotation forests look noisy due to the existence of mixed pixels. Spectral–spatial classification maps based on the combination of rotation forests and MRF provide very smooth results with the help of the MLL prior.

B. University of Pavia ROSIS Image

The second experiment was carried out on the University of Pavia image of an urban area, acquired by the ROSIS-03 optical sensor. Nine land cover classes were considered for classification. The original image is composed of 610 × 340 pixels, with a spatial resolution of 1.3 m/pixel and 103 spectral bands. Fig. 5 shows the three-band color composite and the reference map of the University of Pavia image.

The average accuracies over ten independent runs and the corresponding standard deviations of the proposed methods using different numbers of training samples are featured in Table IX. The sensitivity analyses of the parameters M and τ are presented in Tables X and XI, respectively. Experimental results reveal a number of interesting facts: 1) rotation forests with local feature extraction methods (e.g., NPE) provide better and more stable performances than RoF-PCA; 2) the combination of rotation forests and MRF significantly improves the classification accuracy when compared to the rotation forests; 3) the classification accuracies of the rotation forests decrease when the value of M increases, but the classification performances of the combination of rotation forests and MRF are improved; and 4) consistent with the experiments on the AVIRIS image, the optimal values of τ are between 8 and 12.

TABLE X
AVERAGE OF OVERALL ACCURACIES OBTAINED FROM THE PROPOSED APPROACHES USING DIFFERENT VALUES OF M (UNIVERSITY OF PAVIA ROSIS IMAGE). THE NUMBER OF TRAINING SAMPLES PER CLASS IS 20, AND THE TOTAL NUMBER OF TRAINING SAMPLES IS 180

TABLE XI
AVERAGE OF OVERALL ACCURACIES OBTAINED FROM THE PROPOSED METHODS USING DIFFERENT VALUES OF τ (UNIVERSITY OF PAVIA ROSIS IMAGE). THE NUMBER OF TRAINING SAMPLES PER CLASS IS 20, AND THE TOTAL NUMBER OF TRAINING SAMPLES IS 180

Furthermore, we test the classification performances of the proposed spectral–spatial classification algorithms using the whole training set. Table XII summarizes the overall accuracies, average accuracies, and class-specific accuracies. Fig. 6 presents the classification maps. We also list the classification results of the pixelwise classifiers (SVM and LORSAL) and the corresponding spectral–spatial classifier (LORSAL-MRF) in Table XII. The results of SVM and LORSAL (LORSAL-MRF) reported in the table are taken from [70] and [69], respectively. As can be seen in Table XII, the OAs of the four rotation forests are all higher than those of SVM and LORSAL. The global accuracy and most of the class-specific accuracies (except for the class Gravel) increase owing to the proposed methods. The class Gravel is wrongly classified and confused with the similar class Bricks. Therefore, when we apply MRF on a pixelwise classification result, even more pixels of the class Gravel are wrongly assigned to the class Bricks. This leads to a lower accuracy than that of the classifier using spectral information only. The best global accuracies are achieved by RoF-LPP-MRF. The corresponding classification map is significantly


TABLE XII
OVERALL, AVERAGE, AND CLASS-SPECIFIC ACCURACIES OBTAINED FOR THE UNIVERSITY OF PAVIA ROSIS IMAGE

Fig. 6. Classification results of the University of Pavia ROSIS image. (a) RoF-PCA. (b) RoF-NPE. (c) RoF-LLTSA. (d) RoF-LPP. (e) RoF-PCA-MRF. (f) RoF-NPE-MRF. (g) RoF-LLTSA-MRF. (h) RoF-LPP-MRF.


Fig. 7. Impact of different values of M on the OA obtained for the ROSIS image. (a) Rotation forests. (b) Rotation forests and MRF.

TABLE XIII
OVERALL, AVERAGE, AND CLASS-SPECIFIC ACCURACIES OBTAINED FROM THE PROPOSED APPROACHES (PAVIA CENTER DAIS IMAGE)

more accurate than any other classification map, according to the results of McNemar's test. In this case, the overall and average accuracies are improved by 7.39 and 4.42 percentage points, respectively, compared to RoF-LPP. The use of LPP feature extraction also leads to the highest accuracies for most of the classes (five out of nine). The OAs of the proposed four schemes are all higher than those of standard spectral–spatial classifiers, such as SVM-Watershed segmentation [40], SVM-MSF [39], SVM-MRF-NE [41], SVM-MRF-E [41], and LORSAL-MRF [69]. The corresponding results are not listed here but can be found in the original references.
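McNemar's test, used here to compare classification maps, admits a compact form: with f12 the pixels correct under classifier A but wrong under B, and f21 the converse, the statistic is z = (f12 − f21)/√(f12 + f21), and |z| > 1.96 indicates a significant difference at the 5% level. The sketch below uses our own naming and this common formulation; the paper does not spell out its exact variant.

```python
import math

def mcnemar_z(y_true, pred_a, pred_b):
    """McNemar's test statistic for two classification maps:
    z = (f12 - f21) / sqrt(f12 + f21), where f12 counts the pixels
    correct under A but wrong under B, and f21 the converse."""
    f12 = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    f21 = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    if f12 + f21 == 0:
        return 0.0               # the two maps disagree on no decided pixel
    return (f12 - f21) / math.sqrt(f12 + f21)
```

Only the pixels on which exactly one classifier is correct enter the statistic, so two maps with identical OA can still differ significantly if their errors fall on different pixels.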

The sensitivity to the number of features in a subset M is explored using the whole training set. Fig. 7 plots the OA as a function of M for both the proposed pixelwise and spectral–spatial classifiers. Fig. 7 indicates that the rotation-forest-based algorithms achieve the highest performances when M is set to 10. This is because, when a smaller value of M is used, the diversity within the ensemble increases, and high diversity within the ensemble often leads to high accuracies. The spectral–spatial classification results reported in Fig. 7(e)–(h) are more accurate than those of the rotation forests presented in

Fig. 7(a)–(d), further demonstrating the importance of spatial information.

C. Pavia Center DAIS Image

This image was acquired by the DAIS sensor at 1500-m flight altitude over the city of Pavia, Italy. The image has a size of 400 × 400 pixels, with a ground resolution of 5 m. Here, 80 data channels recorded by this spectrometer were used for the experiment. Nine land cover classes of interest are considered, which are detailed in Table XIII, together with the number of labeled samples for each class.

Rotation forest classifiers with the PCA, NPE, LLTSA, and LPP feature extraction techniques are performed on the Pavia Center DAIS image using the whole training set. Table XIII gives the classification accuracies. The OAs and AAs of the four rotation forests are all higher than those of SVM and LORSAL. MRF regularization with μ = 4 was performed on the pixelwise classification results derived from the rotation forest ensembles. In Table XIII, the results for the proposed methods with the different feature extraction algorithms are presented. From Table XIII,


it can be seen that rotation forests with the four feature extraction methods achieve excellent global accuracies. Again, the MRF methods perform better than the spectral-based approaches. Among them, the use of LLTSA achieves the best performances. This is consistent with the characteristic of LLTSA, which can preserve more local information than PCA. In terms of class-specific accuracies, the main improvement is achieved for the class Shadows. The other classes are classified equally accurately.

V. CONCLUSION

The large number of spectral channels in a hyperspectral image increases the potential of discriminating different materials and structures in a scene. However, the huge volume of hyperspectral data often leads to challenges in image analysis. The success of hyperspectral remote sensing image classification not only depends on a high-precision pixelwise classifier but also requires the incorporation of spatial information into the classifier.

In this paper, we have developed new spectral–spatial classification methods suited for hyperspectral remote sensing images. Rotation forest is applied as the spectral classifier for hyperspectral data. Different feature extraction methods have been investigated for the construction of rotation forests. It is shown that, with the help of the local information obtained by NPE, LLTSA, and LPP, classification accuracies can be improved. Furthermore, rotation forests with spatial contextual information using MRF were then proposed. This strategy can further significantly improve the performances. The proposed classification methodology succeeded in taking advantage of the spatial and spectral information simultaneously. The sensitivity of the parameters in the proposed methods was also investigated.

Future studies will focus on the integration of rotation forests with other spatial information regularizations, the use of semisupervised feature extraction algorithms, and the combination of ensemble learning and active learning.

ACKNOWLEDGMENT

The authors would like to thank Prof. D. Landgrebe from Purdue University for providing the free AVIRIS image and Prof. P. Gamba from the University of Pavia for providing the ROSIS and DAIS images.

REFERENCES

[1] S. Shang and L. A. Chisholm, “Classification of Australian native forest species using hyperspectral remote sensing and machine-learning classification algorithms,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2481–2489, Jun. 2014.

[2] H. Onoyama, C. Ryu, M. Suguri, and M. Iida, “Integrate growing temperature to estimate the nitrogen content of rice plants at the heading stage using hyperspectral imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2506–2515, Jun. 2014.

[3] K. Bakos, G. Lisini, G. Trianni, and P. Gamba, “A novel framework for urban mapping from multispectral and hyperspectral data,” Int. J. Remote Sens., vol. 34, no. 3, pp. 759–770, Feb. 2013.

[4] J. Xia, J. Chanussot, P. Du, and X. He, “(Semi-) supervised probabilistic principal component analysis for hyperspectral remote sensing image classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2224–2236, Jun. 2014.

[5] N. Sims, D. Culvenor, G. Newnham, N. C. Coops, and P. Hopmans, “Towards the operational use of satellite hyperspectral image data for mapping nutrient status and fertilizer requirements in Australian plantation forests,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 2, pp. 320–328, Apr. 2013.

[6] G. Hughes, “On the mean accuracy of statistical pattern recognizers,” IEEE Trans. Inf. Theory, vol. IT-14, no. 1, pp. 55–63, Jan. 1968.

[7] V. N. Vapnik, The Nature of Statistical Learning Theory. New York, NY, USA: Springer-Verlag, 1995.

[8] C. Huang, L. S. Davis, and J. Townshend, “An assessment of support vector machines for land cover classification,” Int. J. Remote Sens., vol. 23, no. 4, pp. 725–749, 2002.

[9] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.

[10] B. Krishnapuram, L. Carin, M. A. T. Figueiredo, and A. J. Hartemink, “Sparse multinomial logistic regression: Fast algorithms and generalization bounds,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 957–968, Jun. 2005.

[11] J. Li, J. M. Bioucas-Dias, and A. Plaza, “Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 11, pp. 3947–3960, Nov. 2010.

[12] I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh, Feature Extraction: Foundations and Applications. New York, NY, USA: Springer-Verlag, 2006.

[13] M. Pal and G. M. Foody, “Feature selection for classification of hyperspectral data by SVM,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 5, pp. 2297–2307, May 2010.

[14] B. Mojaradi, H. Abrishami-Moghaddam, M. Zoej, and R. Duin, “Dimensionality reduction of hyperspectral data via spectral feature extraction,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2091–2105, Jul. 2009.

[15] A. Plaza, P. Martinez, J. Plaza, and R. Perez, “Dimensionality reduction and classification of hyperspectral image data using sequences of extended morphological transformations,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 466–479, Mar. 2005.

[16] J. Ham, Y. Chen, M. M. Crawford, and J. Ghosh, “Investigation of the random forest framework for classification of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 492–501, Mar. 2005.

[17] K. Kawaguchi and R. Nishii, “Hyperspectral image classification by bootstrap AdaBoost with random decision stumps,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 11, pp. 3845–3851, Nov. 2007.

[18] J. M. Yang, B. C. Kuo, P. T. Yu, and C. H. Chuang, “A dynamic subspace method for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 7, pp. 2840–2853, Jul. 2010.

[19] K. L. Bakos and P. Gamba, “Hierarchical hybrid decision tree fusion of multiple hyperspectral data processing chains,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 1, pp. 388–394, Jan. 2011.

[20] J. Xia, P. Du, X. He, and J. Chanussot, “Hyperspectral remote sensing image classification based on rotation forest,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 1, pp. 239–243, Jan. 2014.

[21] L. I. Kuncheva, “A theoretical study on six classifier fusion strategies,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 2, pp. 281–286, Feb. 2002.

[22] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms. Hoboken, NJ, USA: Wiley, 2004.

[23] L. Rokach, Pattern Classification Using Ensemble Methods. Singapore: World Scientific, 2010.

[24] G. Brown, “Ensemble learning,” in Encyclopedia of Machine Learning, G. I. Webb and C. Sammut, Eds. New York, NY, USA: Springer-Verlag, 2010.

[25] C. A. Shipp and L. I. Kuncheva, “Relationships between combination methods and measures of diversity in combining classifiers,” Inf. Fusion, vol. 3, no. 2, pp. 135–148, Jun. 2002.

[26] L. I. Kuncheva and C. J. Whitaker, “Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy,” Mach. Learn., vol. 51, no. 2, pp. 181–207, Mar. 2003.

[27] L. I. Kuncheva, “Diversity in multiple classifier systems,” Inf. Fusion, vol. 6, no. 1, pp. 3–4, Mar. 2005.

[28] T. K. Ho, “The random subspace method for constructing decision forests,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 8, pp. 832–844, Aug. 1998.

[29] L. Breiman, “Bagging predictors,” Mach. Learn., vol. 24, no. 2, pp. 123–140, Aug. 1996.

[30] Y. Freund and R. E. Schapire, “Experiments with a new boosting algorithm,” in Proc. Int. Conf. Mach. Learn., Bari, Italy, 1996, pp. 148–156.

[31] J. A. Benediktsson, J. Chanussot, and M. Fauvel, “Multiple classifier systems in remote sensing: From basics to recent developments,” in Proc. 7th Int. Workshop Multiple Classifier Syst., Prague, Czech Republic, May 23–25, 2007, pp. 501–512.

[32] P. Du et al., “Multiple classifier system for remote sensing image classification: A review,” Sensors, vol. 12, no. 4, pp. 4764–4792, 2012.

[33] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.

[34] P. Gislason, J. A. Benediktsson, and J. M. Sveinsson, “Random forests for land cover classification,” Pattern Recognit. Lett., vol. 27, no. 4, pp. 294–300, 2006.

[35] J. C. Chan and D. Paelinckx, “Evaluation of random forest and AdaBoost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery,” Remote Sens. Environ., vol. 112, no. 6, pp. 2999–3011, Jun. 2008.

[36] V. F. Rodriguez-Galiano, B. Ghimire, J. Rogan, M. Chica-Olmo, and J. P. Rigol-Sanchez, “An assessment of the effectiveness of a random forest classifier for land-cover classification,” ISPRS J. Photogramm. Remote Sens., vol. 67, no. 1, pp. 93–104, Jan. 2012.

[37] J. J. Rodriguez, L. I. Kuncheva, and C. J. Alonso, “Rotation forest: A new classifier ensemble method,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 10, pp. 1619–1630, Oct. 2006.

[38] Y. Tarabalka, J. A. Benediktsson, and J. Chanussot, “Spectral–spatial classification of hyperspectral imagery based on partitional clustering techniques,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 8, pp. 2973–2987, Aug. 2009.

[39] Y. Tarabalka, J. Chanussot, and J. A. Benediktsson, “Segmentation and classification of hyperspectral images using minimum spanning forest grown from automatically selected markers,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 40, no. 5, pp. 1267–1279, Oct. 2010.

[40] Y. Tarabalka, J. Chanussot, and J. A. Benediktsson, “Segmentation and classification of hyperspectral images using watershed transformation,” Pattern Recognit., vol. 43, no. 7, pp. 2367–2379, Jul. 2010.

[41] Y. Tarabalka, M. Fauvel, J. Chanussot, and J. A. Benediktsson, “SVM- and MRF-based method for accurate classification of hyperspectral images,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 4, pp. 736–740, Oct. 2010.

[42] P. C. Smits, “Multiple classifier systems for supervised remote sensing image classification based on dynamic classifier selection,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 4, pp. 801–813, Apr. 2002.

[43] G. J. Briem, J. Benediktsson, and J. R. Sveinsson, “Multiple classifiers applied to multisource remote sensing data,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 10, pp. 2291–2299, Oct. 2002.

[44] R. Lawrence, A. Bunn, S. Powell, and M. Zambon, “Classification of remotely sensed imagery using stochastic gradient boosting as a refinement of classification tree analysis,” Remote Sens. Environ., vol. 90, no. 3, pp. 331–336, Jan. 2004.

[45] B. Waske, S. van der Linden, J. A. Benediktsson, A. Rabe, and P. Hostert, “Sensitivity of support vector machines to random feature selection in classification of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 7, pp. 2880–2889, Jul. 2010.

[46] P. Du, W. Zhang, and J. Xia, “Hyperspectral remote sensing image classification based on decision level fusion,” Chin. Opt. Lett., vol. 9, no. 3, pp. 031002–031004, 2011.

[47] J. C.-W. Chan, C. Huang, and R. S. DeFries, “Enhanced algorithm performance for land cover classification from remotely sensed data using bagging and boosting,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 3, pp. 693–695, Mar. 2001.

[48] J. A. Benediktsson, M. Pesaresi, and K. Amason, “Classification and feature extraction for remote sensing images from urban areas based on morphological transformations,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 9, pp. 1940–1949, Sep. 2003.

[49] M. Fauvel, J. Chanussot, and J. A. Benediktsson, “Decision fusion for the classification of urban remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 10, pp. 2828–2838, Oct. 2006.

[50] H. T. X. Doan and G. M. Foody, “Increasing soft classification accuracy through the use of an ensemble of classifiers,” Int. J. Remote Sens., vol. 28, no. 20, pp. 4609–4623, 2007.

[51] B. Waske, J. A. Benediktsson, K. Arnason, and J. R. Sveinsson, “Mapping of hyperspectral AVIRIS data using machine-learning algorithms,” Can. J. Remote Sens., vol. 35, pp. 106–116, 2009.

[52] S. Velasco-Forero and J. Angulo, “Classification of hyperspectral images by tensor modeling and additive morphological decomposition,” Pattern Recognit., vol. 46, no. 2, pp. 566–577, Feb. 2013.

[53] G. Camps-Valls, L. Gomez-Chova, J. Munoz-Mari, and J. Vila-Frances, “Composite kernels for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 3, no. 1, pp. 93–97, Jan. 2006.

[54] G. Camps-Valls, N. Shervashidze, and K. M. Borgwardt, “Spatio-spectral remote sensing image classification with graph kernels,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 4, pp. 741–745, Oct. 2010.

[55] J. Li, J. M. Bioucas-Dias, and A. Plaza, “Hyperspectral image segmentation using a new Bayesian approach with active learning,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 10, pp. 3947–3960, Oct. 2011.

[56] X. Huang and L. Zhang, “An adaptive mean-shift analysis approach for object extraction and classification from urban hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 12, pp. 4173–4185, Dec. 2008.

[57] S. van der Linden, B. Waske, M. Eiden, P. Hostert, and A. Janz, “Classifying segmented hyperspectral data from a heterogeneous urban environment using support vector machines,” J. Appl. Remote Sens., vol. 1, no. 1, p. 013543, Oct. 2007.

[58] J. A. Benediktsson, J. A. Palmason, and J. R. Sveinsson, “Classification of hyperspectral data from urban areas based on extended morphological profiles,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 480–491, Mar. 2005.

[59] M. Fauvel, J. A. Benediktsson, J. Chanussot, and J. R. Sveinsson, “Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 11, pp. 3804–3814, Nov. 2008.

[60] M. Dalla Mura, A. Villa, J. Benediktsson, J. Chanussot, and L. Bruzzone, “Classification of hyperspectral images by using extended morphological attribute profiles and independent component analysis,” IEEE Geosci. Remote Sens. Lett., vol. 8, no. 3, pp. 542–546, May 2011.

[61] T. Lin and S. Bourennane, “Hyperspectral image processing by jointly filtering wavelet component tensor,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 6, pp. 3529–3541, Jun. 2013.

[62] L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees. Boca Raton, FL, USA: CRC Press, 1984.

[63] X. He, D. Cai, S. Yan, and H. Zhang, “Neighborhood preserving embedding,” in Proc. IEEE ICCV, Beijing, China, Oct. 17–20, 2005, pp. 1208–1213.

[64] T. Zhang, J. Yang, D. Zhao, and X. Ge, “Linear local tangent space alignment and application to face recognition,” Neurocomputing, vol. 70, no. 7–9, pp. 1547–1553, Mar. 2007.

[65] X. He and P. Niyogi, “Locality preserving projections,” in Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2003.

[66] S. Z. Li, Markov Random Field Modeling in Image Analysis, 3rd ed. New York, NY, USA: Springer-Verlag, 2009.

[67] J. Besag, “Spatial interaction and the statistical analysis of lattice systems,” J. R. Stat. Soc. Ser. B, vol. 36, no. 2, pp. 192–236, 1974.

[68] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 11, pp. 1222–1239, Nov. 2001.

[69] J. Li, J. Bioucas-Dias, and A. Plaza, “Spectral–spatial classification of hyperspectral data using loopy belief propagation and active learning,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 2, pp. 844–856, Feb. 2013.

[70] Y. Tarabalka, J. A. Benediktsson, J. Chanussot, and J. C. Tilton, “Multiple spectral–spatial classification approach for hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 11, pp. 4122–4132, 2010.

Junshi Xia (S’11) received the B.S. degree in geographic information systems and the Ph.D. degree in photogrammetry and remote sensing from the China University of Mining and Technology, Beijing, China, in 2008 and 2013, respectively. He is currently working toward the Ph.D. degree in image processing with the Grenoble Images Speech Signals and Automatics Laboratory, Grenoble Institute of Technology, Grenoble, France.

His research interests include multiple classifier systems in remote sensing, hyperspectral remote sensing image processing, and urban remote sensing.


Jocelyn Chanussot (M’04–SM’04–F’12) received the M.Sc. degree in electrical engineering from the Grenoble Institute of Technology (Grenoble INP), Grenoble, France, in 1995 and the Ph.D. degree from Savoie University, Annecy, France, in 1998.

In 1999, he was with the Geography Imagery Perception Laboratory, Delegation Generale de l’Armement (DGA—French National Defense Department). Since 1999, he has been with Grenoble INP, where he was an Assistant Professor from 1999 to 2005 and an Associate Professor from 2005 to 2007 and where he is currently a Professor of signal and image processing. He is conducting his research at the Grenoble Images Speech Signals and Automatics Laboratory (GIPSA-Lab). Since 2013, he has been an Adjunct Professor with the University of Iceland, Reykjavik, Iceland. His research interests include image analysis, multicomponent image processing, nonlinear filtering, and data fusion in remote sensing.

Dr. Chanussot is the founding President of the IEEE Geoscience and Remote Sensing Society (GRSS) French chapter (2007–2010), which received the 2010 IEEE GRSS Chapter Excellence Award. He was the corecipient of the NORSIG 2006 Best Student Paper Award, the IEEE GRSS 2011 Symposium Best Paper Award, the IEEE GRSS 2012 Transactions Prize Paper Award, and the IEEE GRSS 2013 Highest Impact Paper Award. He was a member of the IEEE Geoscience and Remote Sensing Society AdCom (2009–2010), in charge of membership development. He was the General Chair of the first IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS). He was the Chair (2009–2011) and Cochair (2005–2008) of the GRSS Data Fusion Technical Committee. He was a member of the Machine Learning for Signal Processing Technical Committee of the IEEE Signal Processing Society (2006–2008) and the Program Chair of the IEEE International Workshop on Machine Learning for Signal Processing (2009). He was an Associate Editor of the IEEE GEOSCIENCE AND REMOTE SENSING LETTERS (2005–2007) and of Pattern Recognition (2006–2008). Since 2007, he has been an Associate Editor of the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING. Since 2011, he has been the Editor-in-Chief of the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING.

Peijun Du (M’07–SM’12) received the Ph.D. degree from the China University of Mining and Technology, Beijing, China, in 2001.

After receiving his Ph.D. degree, he taught at the China University of Mining and Technology until he joined Nanjing University, Nanjing, China, in 2011. He was a Postdoctoral Fellow with Shanghai Jiao Tong University, Shanghai, China, from February 2002 to March 2004 and a Visiting Scholar with the University of Nottingham, Nottingham, U.K., from November 2006 to November 2007. He is a Professor of photogrammetry and remote sensing with the Department of Geographic Information Sciences, Nanjing University, and the Deputy Director of the Key Laboratory for Satellite Surveying Technology and Applications of the National Administration of Surveying and Geoinformation, Nanjing. He has published 9 textbooks in Chinese and more than 120 research articles on remote sensing and geospatial information processing and applications. His research interests include remote sensing image processing and pattern recognition, remote sensing applications, hyperspectral remote sensing information processing, multisource geospatial information fusion and spatial data handling, integration and applications of geospatial information technologies, and environmental information science (environmental informatics).

Dr. Du is a Senior Member of the IEEE GRSS, a council member of the China Society for Image and Graphics (CSIG), and a council member of the China Association for Remote Sensing Applications (CARSA). He was the Cochair of the Technical Committee of URBAN 2009 (the 5th IEEE GRSS/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas) and IAPR PRRS 2012 and the Cochair of the Local Organizing Committee of JURSE 2009, WHISPERS 2012, and EORSA 2012. He is also a member of the scientific or technical committees of other international conferences, e.g., Spatial Accuracy 2008; ACRS 2009; WHISPERS 2010, 2011, 2012, and 2013; URBAN 2011 and 2013; MultiTemp 2011 and 2013; ISDIF 2011; and SPIE Remote Sensing (Conference 07) 2012 and 2013. He is also an Associate Editor of the IEEE GEOSCIENCE AND REMOTE SENSING LETTERS.

Xiyan He received the Generalist Engineer degree from Ecole Centrale Paris, Paris, France, in 2006, the M.E. degree in pattern recognition and intelligent systems from Xi’an Jiaotong University, Xi’an, China, and the Ph.D. degree in computer science from the University of Technology of Troyes, Troyes, France, in 2009.

She was a Teaching Assistant with the University of Technology of Troyes in 2009, a Postdoctoral Research Fellow with the Research Centre for Automatic Control of Nancy in 2010, and a Teaching Assistant with the University of Pierre Mendès-France, Grenoble, France, in 2011. Since 2012, she has been a Postdoctoral Research Fellow with the Grenoble Laboratory of Image, Speech, Signal and Automatics. Her main research interests include machine learning, pattern recognition, and data fusion, with a special focus on applications to remotely sensed images.