EDST - UL: Ecole Doctorale des Sciences et de Technologie, Université Libanaise
Summary of the Article:
“Band Selection for Dimension Reduction in Hyper Spectral Image Using Integrated Information Gain and Principal Components Analysis Technique”
Kitti Koonsanit, Chuleerat Jaruskulchai, Apisit Eiumnoh - Vol. 2
Submitted to Dr. Jihane KHODER, January 30th, 2016
Sarah Hussein, Master TIS | TIS04 Course
OUTLINE
1. INTRODUCTION
2. METHODOLOGIES
3. RESULTS
4. CONCLUSIONS
1. INTRODUCTION
  - Context
  - Objectives
INTRODUCTION: Context
• Some applications, such as satellite applications, require fast data processing
• Such applications need dimension reduction of hyperspectral remote sensing data
• Principal Component Analysis (PCA) is the most popular dimension reduction technique for remotely sensed data
• However, data volumes keep increasing and PCA has high computational demands
• A fast and efficient algorithm for PCA is therefore needed
INTRODUCTION: Objectives
• “An implementation of Information Gain (IG) with PCA for dimension reduction of hyperspectral data”
• “A comparison of the effects of the integrated IG and PCA band selection method on the final clustering results for hyperspectral imaging applications”
2. METHODOLOGIES
  - Materials
  - Dimensionality Reduction
  - Background on PCA
  - Information Gain (IG)
  - PCA-IG
METHODOLOGIES: Materials
• Hyperspectral data were obtained from the Small Multi-Mission Satellite (SMMS)
• The study focuses on data acquired in June 2010 over Amnat Charoen province, Thailand:
  - 200 x 200 pixels
  - 115 bands
  - Total size of 8.86 Mbytes
• Unsupervised classification was performed with the simple k-means clusterer (SimpleKMeans) of the Weka software package
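For readers unfamiliar with k-means, the clustering step can be sketched as plain Lloyd's iterations. This is a minimal illustration, not Weka's SimpleKMeans. It assumes each point is one pixel's band vector (40,000 points for the 200 x 200 SMMS scene) and that centroids start at the first k points. Weka instead picks seeded random initial centres. The function and parameter names are our own.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _nearest(p: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means: returns (cluster label per point, final centroids).

    Assumption: centroids start at the first k points (Weka's SimpleKMeans
    uses seeded random initial centres instead).
    """
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every pixel vector to its nearest centroid.
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:  # no pixel changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its member pixels.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)` separates the two pairs of points. The experiments use k = 4 (experiment 1) and k = 7 (experiment 2).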
METHODOLOGIES: Dimensionality Reduction
• Hyperspectral images provide abundant information across many spectral bands
• Their high dimensionality substantially increases the computational burden
• Reducing the redundancy of the spectral and spatial information without losing valuable details is crucial
• Conventional processing methods therefore require dimension reduction
• Dimension reduction is a transformation from a high-dimensional space to a lower-dimensional one that eliminates data redundancy
METHODOLOGIES: Dimensionality Reduction (Cont.)
• The collected hyperspectral image data take the form of a three-dimensional image cube:
  - Two spatial dimensions (horizontal and vertical)
  - One spectral dimension (SMMS spectral bands 1 to 115)
• The dimensionality must be reduced to make the data convenient for the subsequent processing steps
• The PCA reduction technique is therefore applied
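Before PCA, the cube is usually unfolded so that each pixel becomes one observation, a vector of its band values. This minimal sketch assumes the cube is stored as nested lists indexed [row][col][band]; that layout and the function names are our own assumptions. For the SMMS scene, unfolding gives a 40,000 x 115 matrix (200 x 200 pixels, 115 bands). The second helper folds one per-pixel result, such as a principal-component score, back into an image.

```python
from typing import List, Sequence

Cube = Sequence[Sequence[Sequence[float]]]  # assumed layout: [row][col][band]


def cube_to_pixel_matrix(cube: Cube) -> List[List[float]]:
    """Unfold a rows x cols x bands cube into a (rows*cols) x bands matrix.

    Pixels are read row by row, so output row r*cols + c is pixel (r, c)."""
    return [list(pixel) for row in cube for pixel in row]


def column_to_image(column: Sequence[float], rows: int, cols: int) -> List[List[float]]:
    """Fold one value per pixel (e.g. a component score) back into a rows x cols image."""
    if len(column) != rows * cols:
        raise ValueError("column length must equal rows * cols")
    return [list(column[r * cols:(r + 1) * cols]) for r in range(rows)]
```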
METHODOLOGIES: Background on PCA
• PCA is a widely used dimension reduction technique in data analysis
• It is the optimal linear scheme, in the least-squares sense, for reducing a set of high-dimensional vectors to a set of lower-dimensional vectors
• Two methods are applicable in PCA: the matrix method and the data method
• PCA is computed in four general steps:
  1. Find the mean vector in x-space
  2. Assemble the covariance matrix in x-space
  3. Compute the eigenvalues and corresponding eigenvectors
  4. Form the components in y-space
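The four steps can be sketched in dependency-free Python. This is a minimal illustration, not the article's Java implementation. It assumes the data are already a pixels x bands matrix and uses the sample covariance (dividing by n - 1). For step 3 it uses the classical cyclic Jacobi rotation method, chosen only because it fits in a few lines; in practice a library routine such as `numpy.linalg.eigh` would do this step. All function names are our own.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def mean_vector(x: Matrix) -> List[float]:
    """Step 1: mean of every band over all pixels (x has one row per pixel)."""
    return [sum(col) / len(x) for col in zip(*x)]


def covariance_matrix(x: Matrix, mu: List[float]) -> Matrix:
    """Step 2: sample covariance of the bands (normalised by n - 1)."""
    d = len(mu)
    centred = [[row[j] - mu[j] for j in range(d)] for row in x]
    return [[sum(r[i] * r[j] for r in centred) / (len(x) - 1) for j in range(d)]
            for i in range(d)]


def jacobi_eigh(a: Matrix, rel_tol: float = 1e-22, max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Step 3: eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, v) where column k of v is the unit eigenvector of eigenvalue k."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    scale = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q: A <- A R and V <- V R
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rows p, q: A <- R^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def pca(x: Matrix, k: int) -> Tuple[List[float], Matrix, Matrix]:
    """Steps 1-4. Returns (eigenvalues, descending; top-k eigenvectors; y-space scores)."""
    mu = mean_vector(x)
    vals, vecs = jacobi_eigh(covariance_matrix(x, mu))
    order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)
    components = [[vecs[r][i] for r in range(len(vals))] for i in order[:k]]
    # Step 4: y = W^T (x - mean) for every pixel, one coordinate per kept component.
    scores = [[sum((row[j] - mu[j]) * w[j] for j in range(len(mu))) for w in components]
              for row in x]
    return [vals[i] for i in order], components, scores
```

For the SMMS scene, `x` would have 40,000 rows and 115 columns, so the covariance matrix is 115 x 115.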
METHODOLOGIES: Background on PCA (Cont.)
• Only the first few components contain the needed information
• The intrinsic dimensionality is the number of components that carry most of the information
• Each image may have a different intrinsic dimensionality
• PCA maximizes the variance captured by each component and decorrelates the components, which reduces redundancy and achieves lower dimensionality
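One common way to estimate the intrinsic dimensionality is to keep the smallest number of components whose eigenvalues explain a chosen share of the total variance. The summary does not say which criterion the article used. The cumulative-variance rule, the default 95% threshold, and the function name below are our own illustrative assumptions.

```python
from typing import Sequence


def intrinsic_dimensionality(eigenvalues: Sequence[float], threshold: float = 0.95) -> int:
    """Smallest k such that the k largest eigenvalues explain >= threshold of the total variance.

    Assumption: a cumulative explained-variance rule; the threshold is a free parameter."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    running = 0.0
    for k, lam in enumerate(vals, start=1):
        running += lam
        if running >= threshold * total:
            return k
    return len(vals)
```

With eigenvalues `[50, 30, 15, 4, 1]` and a 90% threshold, three components are kept (50 + 30 + 15 = 95 of 100).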
METHODOLOGIES: Information Gain
• Information Gain (IG) measures the dependence between a feature and the class label
• It is one of the most popular feature selection techniques because it is easy to compute and simple to interpret
• The information gain of a feature (band) X with respect to the class labels Y is calculated as $IG(X, Y) = H(X) - H(X \mid Y)$, where:
  - Entropy (H) is the uncertainty associated with a random variable
  - H(X) is the entropy of band X
  - H(X|Y) is the entropy of band X after observing class Y
METHODOLOGIES: Information Gain (Cont.)
• H(X) and H(X|Y) are calculated through the equations:
  $H(X) = -\sum_{x} P(x) \log_2 P(x)$
  $H(X \mid Y) = -\sum_{y} P(y) \sum_{x} P(x \mid y) \log_2 P(x \mid y)$
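• The maximum value of information gain is 1 when entropies are measured in bits and the class label is binary (in general, $IG(X, Y) \le \min(H(X), H(Y))$)
• A band with high information gain is relevant to the class labels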
q Information gain is evaluated independently for each feature
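The information gain of one band, $IG(X, Y) = H(X) - H(X \mid Y)$ with entropies in bits, can be computed directly from counts. This is a minimal sketch that assumes the band has already been discretised. The article's discretisation is not described in this summary, so the equal-width binning helper is a hypothetical stand-in, and all names are our own.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Hypothetical discretisation: map continuous band values to n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # a constant band falls entirely into bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(values: Sequence[Hashable]) -> float:
    """H = -sum p * log2(p) over the empirical distribution of `values`, in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """H(X|Y) = sum over classes y of P(y) * H(X | Y = y)."""
    n = len(y)
    total = 0.0
    for label, count in Counter(y).items():
        x_given_y = [xi for xi, yi in zip(x, y) if yi == label]
        total += (count / n) * entropy(x_given_y)
    return total


def information_gain(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """IG(X, Y) = H(X) - H(X|Y) for one discretised band x and class labels y."""
    return entropy(x) - conditional_entropy(x, y)
```

A band that perfectly separates two equally frequent classes scores 1 bit, and a band independent of the labels scores 0. Because each band is scored on its own, two identical bands receive identical scores. This is why IG alone does not remove redundant bands.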
METHODOLOGIES: Information Gain (Cont.)
• Information gain does not eliminate redundant features
• Bands selected by IG alone are therefore suspected to be highly redundant: much data but not much information
METHODOLOGIES: PCA-IG
• IG integrated with PCA (PCA-IG) is proposed to transform the data into a reduced set of features → feature extraction
• The optimal bands are those that best preserve the features separating different object classes
• The PCA method alone does not guarantee the preservation of classification information among different classes
• IG values preserve the features that separate different object classes
METHODOLOGIES: PCA-IG (Cont.)
• At the band selection stage, the PCA-IG method is integrated as follows:
• A band X is selected if X is a band selected by PCA AND X is a band selected by IG
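The AND rule amounts to a set intersection of the two selections. This sketch assumes each method produces a ranked list of original band indices and that the top k of each list are intersected. The summary does not say how PCA ranks original bands or what k each method uses. The loading-based ranking below (sum of absolute loadings on the retained components) is therefore a hypothetical choice, and all names are our own.

```python
from typing import List, Sequence


def rank_bands_by_loading(components: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical PCA band ranking: score each original band by the sum of its
    absolute loadings on the retained components, highest score first."""
    n_bands = len(components[0])
    score = [sum(abs(w[j]) for w in components) for j in range(n_bands)]
    return sorted(range(n_bands), key=lambda j: score[j], reverse=True)


def pca_ig_band_selection(pca_ranked: Sequence[int], ig_ranked: Sequence[int], k: int) -> List[int]:
    """Keep band X iff X is in the PCA top-k AND in the IG top-k (PCA order preserved)."""
    ig_top = set(ig_ranked[:k])
    return [band for band in pca_ranked[:k] if band in ig_top]
```

For example, with PCA ranking `[5, 2, 9, 1]`, IG ranking `[9, 5, 7, 3]` and k = 3, only bands 5 and 9 pass both filters. Note that a plain intersection can return fewer than k bands.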
METHODOLOGIES: PCA-IG (Cont.)
[Figure: PCA-IG flow]
Original Satellite Image → Spectral Attributes → PCA Method and IG Method → PCA of Band AND IG of Band → Band Selection
3. RESULTS
  - Experiment 1
  - Experiment 2
RESULTS: Experiment 1
[Figure panels: Original 115 bands | 10 bands, PCA | 10 bands, PCA-IG]
RESULTS: Experiment 1 (Cont.)
The percent clustering for various classes for experiment 1:

              Original image   Reduced by PCA   Reduced by PCA-IG
              115 bands        10 bands         10 bands
              4 clusters       4 clusters       4 clusters
Cluster 1     34%              35%              34%
Cluster 2     3%               2%               3%
Cluster 3     30%              31%              30%
Cluster 4     31%              32%              31%
All clusters  100%             100%             100%
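Each percentage in the table is the share of pixels assigned to one cluster. A minimal sketch, assuming one integer cluster label per pixel (for example from a k-means run). K-means labels are arbitrary, so comparing columns also requires matching clusters across runs. The summary does not describe how the article matched them.

```python
from collections import Counter
from typing import Dict, Sequence


def cluster_percentages(labels: Sequence[int]) -> Dict[int, float]:
    """Share of pixels (in %) assigned to each cluster label, keyed by label."""
    counts = Counter(labels)
    return {label: 100.0 * counts[label] / len(labels) for label in sorted(counts)}
```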
RESULTS: Experiment 2
• Band selection was performed on the Statlog (Landsat Satellite) data set from the UCI databases
• The database consists of:
  - Multi-spectral values of pixels in 3x3 neighbourhoods in a satellite image
  - The classification associated with the central pixel in each neighbourhood
• A frame of Landsat MSS imagery consists of 4 digital images of the same scene:
  - Two in the visible region (corresponding to the green and red regions)
  - Two in the near infrared
• Each pixel is an 8-bit binary word:
  - 0 corresponds to black
  - 255 corresponds to white
RESULTS: Experiment 2 (Cont.)
• The spatial resolution of a pixel is about 80 m x 80 m
• Each image contains 2340 x 3380 such pixels
• The data set contains 6435 instances
• Each instance consists of 36 band attributes (4 spectral bands x 9 pixels of the 3x3 neighbourhood)
• The proposed process was implemented in a Java environment
• Tests were run on an Intel(R) Core 2 Duo CPU at 2.80 GHz with 1 GB of RAM
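The 36 attributes per instance come from 9 neighbourhood pixels times 4 spectral bands. This sketch builds such a vector from a 4-band image. We assume the attributes are ordered pixel by pixel (left-to-right, top-to-bottom, 4 values each). We believe this matches the UCI data set description, but treat the ordering as an assumption. The image layout and function name are our own.

```python
from typing import List, Sequence

Image = Sequence[Sequence[Sequence[int]]]  # assumed layout: [row][col][band], 4 bands per pixel


def neighbourhood_attributes(image: Image, r: int, c: int) -> List[int]:
    """36 attributes for the pixel at (r, c): its 3x3 neighbourhood read
    left-to-right, top-to-bottom, with each pixel's 4 spectral values in turn."""
    if not (1 <= r < len(image) - 1 and 1 <= c < len(image[0]) - 1):
        raise ValueError("(r, c) must have a full 3x3 neighbourhood")
    attrs: List[int] = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            pixel = image[r + dr][c + dc]
            if len(pixel) != 4 or not all(0 <= v <= 255 for v in pixel):
                raise ValueError("expected 4 spectral values, each an 8-bit integer")
            attrs.extend(pixel)
    return attrs  # 9 pixels x 4 bands = 36 values
```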
RESULTS: Experiment 2 (Cont.)
The percent clustering for various classes for experiment 2:

              Original image   Reduced by PCA   Reduced by PCA-IG
              36 bands         7 bands          7 bands
              7 clusters       7 clusters       7 clusters
Cluster 1     8%               10%              9%
Cluster 2     20%              22%              21%
Cluster 3     14%              12%              12%
Cluster 4     9%               9%               9%
Cluster 5     20%              23%              23%
Cluster 6     16%              10%              14%
Cluster 7     12%              14%              12%
All clusters  100%             100%             100%
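A quick way to read the experiment 2 table is to measure how far each reduced distribution lies from the original one, in percentage points. This is a descriptive check on the table's own numbers, not a statistical significance test. It assumes cluster k refers to the same cluster in all three columns, as the table layout implies.

```python
from typing import Sequence, Tuple

# Percentages copied from the experiment 2 table (clusters 1-7).
ORIGINAL = [8, 20, 14, 9, 20, 16, 12]
PCA = [10, 22, 12, 9, 23, 10, 14]
PCA_IG = [9, 21, 12, 9, 23, 14, 12]


def distribution_gap(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Largest and total absolute difference (percentage points) between two
    cluster-size distributions whose clusters are matched index by index."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return max(diffs), sum(diffs)


print("PCA vs original:   ", distribution_gap(ORIGINAL, PCA))     # (6, 17)
print("PCA-IG vs original:", distribution_gap(ORIGINAL, PCA_IG))  # (3, 9)
```

By this crude measure, the PCA-IG distribution stays somewhat closer to the original than the PCA one in experiment 2. All differences are a few percentage points.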
4. CONCLUSIONS
  - Brief Summary
  - Perspectives
CONCLUSIONS: Brief Summary & Perspectives
• A band selection technique combining principal components analysis (PCA) and the information gain (IG) function was proposed for hyperspectral image reduction
• The process was tested on satellite image data for unsupervised classification
• For hyperspectral imaging applications, the clustering results of the PCA-IG method show no significant difference from those of PCA or of the original image
• The outcome of this research will be used in further steps of analysis tools for hyperspectral image processing