Aerial Image Analysis Using Spiking Neural Networks with Application to Power Line Corridor Monitoring
Zhengrong Li
Faculty of Science and Technology
Queensland University of Technology
A thesis submitted for the degree of
Doctor of Philosophy
May 2011
Statement of Originality
The work contained in this thesis has not been previously submitted to
meet requirements for an award at this or any other higher education
institution. To the best of my knowledge and belief, the thesis contains
no material previously published by another person except where due
reference is made.
Signature
Date: 31/05/2011
Dedicated to my wife and parents
Acknowledgements
First, I would like to thank my supervisors Dr. Ross Hayward, Prof. Rodney
Walker and Dr. Jinglan Zhang, for their guidance and support during my candidature.
I really appreciated all the discussions with my supervisors, not only the
scientific ones but also the nonscientific ones. I am grateful that the university and
my supervisors provided me with the opportunity to join a world-class research group, an
excellent working environment and great opportunities for collaboration with external
institutions. Additionally, the Cooperative Research Centre for Spatial Information
(CRCSI) and the Australian Research Centre for Aerospace Automation (ARCAA)
played a large part in providing me with this excellent opportunity.
I was lucky to have enough research funding for data collection, field surveys and
attendance at several international conferences. I really appreciated the chance to
communicate with many world-class researchers at these conferences. I was
also fortunate to visit Prof. Wolfgang Förstner’s lab at the Institute of Geodesy and
Geoinformation, University of Bonn in Germany.
I would like to thank my colleagues in the CRCSI research project: Dr. Jinhai
Cai, Dr. Troy Bruggemann, Dr. Luis Mejias, Dr. Jason Ford, David Zuill, Marcos
Gerardo and Steven Mills. I also acknowledge the assistance of David Wood from
Ergon Energy, Bred Jeffers from Greening Australia and George Curran from CRCSI.
Financially, my doctoral scholarship was supported by QUT and CRCSI. Additionally,
I received a Chinese government award for outstanding PhD students
abroad from the China Scholarship Council (CSC), and a travel grant from the IEEE
Signal Processing Society (SPS) during my study. I am grateful for
the financial support from these institutions.
Last but not least, I would like to thank my wife, my parents and all my friends
for their love, attention, and continuous support during my study.
Abstract
Trees, shrubs and other vegetation are of continued importance to the
environment and our daily life. They provide shade around our roads
and houses, offer a habitat for birds and wildlife, and absorb air pollutants.
However, vegetation touching power lines is a risk to public safety
and the environment, and one of the main causes of power supply problems.
Vegetation management, which includes tree trimming and vegetation
control, is a significant cost component of the maintenance of electrical
infrastructure. For example, Ergon Energy, the Australian energy distributor
with the largest geographic footprint, currently spends over $80 million
a year inspecting and managing vegetation that encroaches on power line
assets. Currently, most vegetation management programs for distribution
systems rely on calendar-based ground patrols. However, calendar-based
inspection by linesmen is labour-intensive, time consuming and expensive.
It also results in some zones being trimmed more frequently than needed
and others not being cut often enough. Moreover, it is seldom practicable to
measure all the plants around power line corridors by field methods. Remote
sensing data captured from airborne sensors has great potential in
assisting vegetation management in power line corridors.
This thesis presents a comprehensive study on using spiking neural networks
in a specific image analysis application: power line corridor monitoring.
Theoretically, the thesis focuses on a biologically inspired spiking
cortical model: the pulse coupled neural network (PCNN). The original
PCNN model was simplified in order to better analyze the pulse dynamics
and control the performance. New and effective algorithms were developed
based on the proposed spiking cortical model for object detection,
image segmentation and invariant feature extraction. The developed algorithms
were evaluated in a number of experiments using real image data
collected from our flight trials. The experimental results demonstrated the
effectiveness and advantages of spiking neural networks in image processing
tasks. Operationally, the knowledge gained from this research project
offers a good reference to our industry partner (i.e. Ergon Energy) and
other energy utilities that want to improve their vegetation management
activities. The novel approaches described in this thesis show the potential
of using cutting-edge sensor technologies and intelligent computing
techniques to improve power line corridor monitoring. The lessons learnt
from this project are also expected to increase the confidence of energy
companies to move from a traditional vegetation management strategy to a
more automated, accurate and cost-effective solution using aerial remote
sensing techniques.
Keywords
• Biologically inspired image processing
• Pulse coupled neural network
• Power line corridor monitoring
• Aerial remote sensing
• Vegetation management
• Geographic object based image analysis
• Image segmentation
• Visual feature extraction
• Machine learning
Contents
1 Introduction
1.1 Research Background
1.1.1 Vegetation Management in Power Line Corridors
1.1.2 Advanced Remote Sensing Techniques
1.1.3 Biologically Inspired Image Processing
1.2 Research Objectives
1.3 Research Outcomes and Contributions
1.4 Publications
1.5 Thesis Structure
2 Remote Sensing Data Collection
2.1 Remote Sensing Platforms
2.2 Remotely Sensed Data
2.2.1 Optical Remote Sensing Imagery
2.2.2 Airborne Laser Scanning data
2.3 Data Collection in Power Line Corridors
2.3.1 Aerial Data
2.3.2 Field Survey
2.4 Summary
3 Object Detection and Segmentation
3.1 Related Work
3.1.1 Geographic Object Based Image Analysis
3.1.2 Pulse Coupled Neural Networks
3.2 Power Line Detection
3.2.1 Characteristics of Power Lines
3.2.2 Design of Pulse Coupled Neural Filter
3.2.3 Knowledge-based Line Clustering in Hough Space
3.3 Individual Tree Crown Detection and Delineation
3.3.1 Spectral Properties of Vegetation
3.3.2 Initial Tree Crown Segmentation
3.3.3 Decomposition of Tree Clusters Using Watershed Algorithm
3.4 Combining LiDAR Data and Multi-spectral Imagery for Improved Tree Crown Segmentation
3.4.1 Ground Filtering Using Statistical Analysis
3.4.2 Region-level Fusion of LiDAR and Georeferenced Multi-spectral Imagery
3.5 Summary
4 Visual Feature Extraction and Data Classification
4.1 Related Work
4.1.1 Major Steps for Thematic Information Extraction From Imagery
4.1.2 Visual Feature Extraction
4.2 Spectral-Texture Feature Extraction Using PCNN
4.2.1 Multi-spectral Unit-linking PCNN
4.2.2 Properties and Behaviors of multi-spectral PCNN
4.2.3 Rotational and Scale Invariant Feature Extraction Using Pulse Spectral Frequency
4.3 Colour and Texture Feature Fusion
4.3.1 Framework
4.3.2 Feature Fusion Based on Kernel PCA
4.3.3 Intrinsic Dimensionality Estimation
4.4 Machine Learning Based Classification
4.4.1 Multilayer Perceptron Neural Networks
4.4.2 Decision Tree Forest
4.4.3 Support Vector Machine
4.5 Summary
5 Experiments and Results
5.1 Object Detection and Segmentation
5.1.1 Power Line Detection
5.1.2 Individual Tree Crown Segmentation
5.1.3 Fusion of LiDAR and Multi-spectral Imagery
5.2 Feature and Classifier Evaluation
5.2.1 Experiment Setup
5.2.2 Performance Measure
5.2.3 Evaluation of PSF feature in Rotation and Scale Invariant Texture Classification
5.2.4 Evaluation of Features and Classifiers for Tree Species Classification
5.2.5 Evaluation of Colour and Texture Feature Fusion
5.3 Summary
6 Conclusion and Future Work
6.1 Summary of Findings and Contributions
6.2 Future Work
List of Figures
2.1 Optical passive remote sensing from satellite platforms
2.2 The basic components and principle of airborne laser scanning
2.3 The UAS platforms and examples of the collected data
2.4 Data collections systems from commercial data providers
2.5 An example of the collected multi-spectral imagery
2.6 An example of the collected LiDAR point cloud data
2.7 Experiment Test Site
2.8 Vegetation Database
3.1 Two segmentation levels overlaid a colour aerial image
3.2 Visual cortex
3.3 The Eckhorn-type neuron
3.4 The structure of a PCNN neuron
3.5 Power lines from different perspectives
3.6 Linking weight matrix
3.7 Original image and 7 pulse outputs of PCNN
3.8 The structure of pulse coupled neural filter
3.9 Comparison of Canny filter, Sobel filter and PCNF
3.10 Voting procedures and the 3D visualization of voting maps
3.11 An example of power line detection results
3.12 Example of tree crown and shadow in a RGB image
3.13 Comparison of two vegetation indexes
3.14 Tree crown segmentation from CIR image
3.15 Comparison of initial tree crown segmentation and the ground truth
3.16 Tree cluster decomposition using watershed algorithm
3.17 Change sequence of skewness and kurtosis
3.18 Framework of LiDAR and georeferenced multi-spectral imagery fusion for individual tree crown segmentation
4.1 Tree crown shapes from triple views
4.2 Electromagnetic spectrum
4.3 Gabor filter response to θ = 0°, 60°, 120°, 180°
4.4 Example of binary code calculation in a neighbourhood
4.5 The structure of the multi-spectral PCNN
4.6 Periodical pulse of the neuron (i, j)
4.7 Geometry for scale invariance
4.8 Framework of object-level colour and texture feature fusion
4.9 A multilayer perceptron neural network
4.10 A linear SVM example
4.11 Tradeoff between underfitting and overfitting
5.1 Comparison of power line detection results
5.2 Power line detection results
5.3 Failure examples for power line detection
5.4 Multi-layer powerlines and crossing powerlines
5.5 Ground truth and segmentation results
5.6 A failure example of individual tree crown delineation
5.7 Comparison of LIDAR intensity and height data by skewness and kurtosis analysis
5.8 A pair of CIR image and LiDAR point cloud data in urban area
5.9 Fusion of LiDAR and multi-spectral imagery for tree crown delineation
5.10 Illustration of ROC space analysis
5.11 Examples of texture images and their PSF features
5.12 Analysis of different feature descriptors in ROC space
5.13 Classification accuracies of the fused features at different dimensions
List of Tables
5.1 Quantitative comparison of three segmentation algorithms
5.2 Quantitative analysis of individual tree crown detection and delineation
5.3 A confusion matrix
5.4 Accuracies of PSF and LBP in texture classification (in percent)
5.5 Averaging computational costs of PSF and LBP (in seconds)
5.6 Overall classification accuracies of PSF and texture features (in percent)
5.7 Overall classification accuracies of colour histogram and PSF features in multiple spectral bands (in percent)
5.8 The classification results of single and fused colour and texture features
5.9 The confusion matrix of SVM classification using the fused feature
5.10 The classification results of single and fused PSF-HSV and LBP features
List of Abbreviations
GEOBIA Geographic Object-based Image Analysis
LiDAR Light Detection and Ranging
UASs Unmanned Aerial Systems
GSD Ground Sample Distance
AGL Above Ground Level
NIR Near Infrared
CIR Colour Infrared
VI Vegetation Index
NDVI Normalized Difference Vegetation Index
SAVI Soil Adjusted Vegetation Index
EVI2 2-band Enhanced Vegetation Index
RVI Ratio Vegetation Index
PCNN Pulse Coupled Neural Network
LBP Local Binary Pattern
GLCM Grey-level Co-occurrence Matrix
PSF Pulse Spectral Frequency
MLP Multilayer Perceptron Neural Network
DTF Decision Tree Forest
SVM Support Vector Machine
PCA Principal Component Analysis
MLE Maximum Likelihood Estimator
ROC Receiver Operating Characteristic
FPR False Positive Rate
TPR True Positive Rate
Chapter 1
Introduction
Vegetation management in power line corridors is essential for the preservation of
public safety, the environment and the reliability of electricity supply. Remote sensing
technologies have great potential to provide a more reliable and cost-effective solution.
This chapter introduces the background, the research objectives and outcomes, and the
outline of this thesis.
1.1 Research Background
1.1.1 Vegetation Management in Power Line Corridors
Surveillance and maintenance of electrical infrastructure is a critical issue for the
reliability of electricity transmission. One of the most important tasks is the vegetation
management of power line corridors. Efficient vegetation management not
only reduces the overall cost but also aids in continuous electricity supply. Ineffective
vegetation management can lead to loss of reliability of electricity transmission and
produce serious hazards. For example, contact of trees with power lines caused
power outages and significant wildland fires in Canada and the USA in 2003 (Appelt &
Goodfellow, 2004; Beck & Mathieu, 2004).
Management of vegetation around power lines is essential for the preservation of public
safety, the environment and the reliability of electricity supply. Vegetation management,
including tree trimming and vegetation control, is a significant cost component of the
maintenance of electrical infrastructure. For example, Ergon Energy, the Australian
energy distributor with the largest geographic footprint, currently spends over $80 million a
year inspecting and managing vegetation that encroaches on power line assets. Ergon
Energy maintains one of the largest electrical distribution systems in the world, covering
over one million square kilometers, including approximately 150,000 kilometers
of poles and wires. Queensland is subject to extreme weather conditions, ranging
from drought to cyclones. Dry conditions increase the risk of fire. Strong winds
and waterlogged ground can result in trees falling across and bringing down power
lines, especially when inappropriate vegetation species are growing too close to power
lines. Correct and efficient vegetation management not only reduces the overall cost
but also aids in continuous electricity supply by preventing damage to power lines
through removal of tall-growing trees. Ineffective procedures can result in the loss
of reliability in electricity transmission, produce serious hazards and expose electricity
companies to significant financial penalties.
The reliability of electricity supply and distribution is of the highest priority in
power line corridor monitoring. A short-term strategy is to identify and remove
nearby objects (i.e. buildings and vegetation) around power lines. In urban areas,
vegetation encroachment is less serious than in rural areas as access is much easier
and prompt maintenance can be achieved. Moreover, local councils and private land
owners regularly maintain their trees, facilitating the overall maintenance process.
Generally, the risk of man-made structures can be controlled through building regulations.
However, in rural areas, inspection and maintenance become difficult due
to limited access and large distances to cover. Vegetation grows naturally and,
particularly in rural areas, the growth is unmanaged. Strong winds and storms can
bring branches or even entire trees into contact with power lines. These problems
motivate the detection of all vegetation that has the potential to pose risks to power
lines, and the use of this information to guide field workers in vegetation clearance in the corridors.
Ergon Energy has a long-term strategy of managing vegetation according to
species. The species of interest can be broadly categorized as desirable
species and undesirable species. Species with fast growth rates that also have the
potential to reach a mature height of more than four meters are defined as undesirable
species (Wood, 2007). These undesirable species often pose high risks to electrical
infrastructure and therefore should be identified and removed. It is also worth mentioning
that in the long-term maintenance strategy, low-growing trees or shrubs are
encouraged because they are expected to compete with tall-growing species and deprive
immature taller trees of light and nutrients. These low-growing species, along
with rare and endangered species, are defined as desirable species that should be
managed differently.
In rural areas, traditional calendar-based tree trimming is performed by contractors.
This process is time consuming, labour-intensive and expensive. It also results in
some zones being trimmed more frequently than needed and others not being cut often
enough. Satellites and aerial vehicles can pass over more regularly and automatically
than ground patrols and therefore, remote sensing approaches have great potential in
assisting vegetation management in power line corridors.
1.1.2 Advanced Remote Sensing Techniques
Remote sensing is the stand-off collection of information about a given object or area
using a variety of devices (http://en.wikipedia.org/wiki/Remote_sensing). Aerial vehicles
have been used intensively in power line inspection for a long time. The present practice
is to fly helicopters or airplanes along the corridor and try to identify dangerous trees
and assess the condition of overhead line assets by visual observation. Such visual
inspection is time consuming and labour intensive. Over the past decades, a number
of ideas have been put forward to reduce this workload. These include improved
data collection using satellite sensors (Kobayashi et al., 2009; Beltrame et al., 2007),
airborne laser scanning systems (Lu & Kieloch, 2008; Clode & Rottensteiner, 2005),
stereo vision systems (Sun et al., 2006), and unmanned aerial systems (UASs) (Jones
et al., 2005).
Satellites and aircraft are the most widely used platforms for remote sensing
in earth observation data collection. Current satellite sensors are not the best choice
for monitoring power line corridors due to two critical limitations: the unfavorable
revisit time and the lack of choice in optimum spatial and spectral resolutions. At the
most practical level, most data gathered from satellites are available
only on predetermined schedules, and even satellites with an “on-demand” capability are
limited by their orbits and the demands of other users. In contrast, airborne data
collection offers a much greater level of flexibility. An airborne system can capture
data at any time of the day, whereas satellites generally pass over one site at the same
time of day. Another advantage of airborne platforms is that different sensor payloads
can be easily fitted, while the sensors launched on a satellite are often not changeable.
As a consequence, airborne systems can be regularly upgraded as sensor technology
advances. Improvements to sensors include systems with higher spectral and spatial
resolution, and advanced microwave or LiDAR sensors. In addition, higher spatial
resolutions are easier to obtain from airborne platforms, due to their low altitude. A
limitation which impedes large-scale airborne remote sensing applications is that
traditional piloted airborne platforms involve high operational costs. Moreover, using
piloted aircraft for power line inspection places the operators at a greater level of
risk.
Remote sensors mounted on unmanned aerial systems (UASs) could fill this gap,
providing a cheap and flexible way to gather spatial data from power line corridors
which can also meet the requirements of spatial, spectral, and temporal resolution.
Recent developments in the aerial vehicles themselves and the associated sensing systems
make UAS platforms increasingly attractive for both research and operational mapping
(Berni et al., 2009; Gurtner et al., 2009). One of the main limitations of
UASs is their limited ability to carry power-demanding and heavy payloads. Airborne laser
scanning systems (LiDAR) are too heavy for small and medium sized UAS platforms.
This limitation may be overcome in the near future as smaller LiDAR systems become
more readily available in the market. However, the quality of the data collected by
these units currently falls well short of that of their full-sized
counterparts. When combined with LiDAR-type systems, UASs would represent the
technology of choice for future uses.
Aerial remote sensing is a fast and cost-effective technique for power line corridor
mapping. Some commercial data providers in the market
can collect high-quality LiDAR data and georeferenced imagery for general mapping
purposes. Compared to corridor mapping, automated and intelligent information
extraction from remotely sensed data is more challenging. One special need for power
line corridor monitoring is to detect the objects of interest for further interpretation
and decision making. The major objects of interest include power line assets and
vegetation. Automated data processing aims to detect these objects automatically
from aerial imagery, and to extract more specific information such as vegetation
species and height.
1.1.3 Biologically Inspired Image Processing
Biology offers a great model for mimicking, copying and learning, and also serves as
inspiration for many new technologies (Bar-Cohen, 2006). Humans have learned much
from biology and the results offer enormous potential for inspiring new capabilities
for exciting technologies. There are numerous examples of the success of biologically
inspired technologies. Examples in engineering include the hulls of boats imitating
the thick skin of dolphins, and sonar, radar, and medical ultrasound imaging imitating
the echolocation of bats. In the field of computer science, great success has also been
achieved by using biologically inspired technologies. As a branch of computer science,
artificial intelligence tries to understand the mechanisms underlying thought
and intelligent behavior, and studies the computational requirements that allow the
development of systems that perform such tasks as perception, reasoning and learning.
For example, artificial immune systems have been applied to protect computers
from malicious viruses; artificial neural networks have been used for weather forecasting
and data classification; evolutionary algorithms have been successfully used for a
number of optimization processes.
Image processing has been a science for decades. Since fast and cheap computers
and signal processors became available, digital image processing has become the
most common form of image processing, generally because it is not only
efficient but also the cheapest. However, even though many scientists are working
in this area and numerous computer-based image processing algorithms have been
developed, progress towards achieving recognition capability similar to that of humans has
been very slow. Computer algorithms can perform specific functions well but it is
difficult for machines to perform robust recognition tasks. In contrast, the human
vision system has an outstanding ability to recognize and classify objects in a variety
of complex environments. For example, humans can recognize different plant species
after seeing only a few examples, even when the species are very similar. Humans can
also recognize cars of different sizes, shapes and colours. This excellent recognition
ability also holds true for many other animals. It is obvious that humans use many
elegantly structured processes to achieve their image processing goals and we are beginning
to understand only a few of them (Lindblad & Kinser, 2005). Current computer
algorithms are incredibly simple compared to what we know of the biological systems
and the algorithms fail in attempting to perform image recognition at the level of
a human. Therefore, emulation of some biological systems is necessary to advance
current computer vision systems.
One important step towards bio-inspired computer vision systems is to emulate the
processes of the visual cortex. In the past decades, visual cortex theory has been intensively
studied and many computational models have been proposed (Hubel, 1998;
Ng et al., 2007). Although there still exists significant debate on the theory, the
known processes of the visual cortex have already led to new tools in image processing
and recognition. This thesis focuses on the theory and application of one spiking
neural network model: the pulse coupled neural network (PCNN). Spiking neural
networks are considered the third generation of neural network models, which
increase the level of realism in a neural simulation (Gerstner, 2001). Biological neurons
use short and sudden increases in voltage to send information. These signals are more
commonly known as action potentials, spikes or pulses. Neurological research has
shown that neurons encode information in the timing of single spikes, and not
just in their average firing frequency (Vreeken, 2003). Spiking neural networks raise
the level of biological realism by using individual spikes, which allows spatial-temporal
information to be incorporated in communication and computation, as real neurons
do. A PCNN is a kind of spiking neural network developed from the Eckhorn
model, which is inspired by the phenomenon of synchronous pulse bursts in the
cat visual cortex (Eckhorn et al., 1989). PCNNs are spatial-temporal coding models
that have attracted much attention from researchers because they mimic real neurons
more closely and, through their use of timing, offer more powerful computation than
traditional neural network models. PCNNs can be applied to a variety of image processing
applications, such as image segmentation, edge detection, feature generation and noise
reduction (Lindblad & Kinser, 2005). More discussions of this biologically
inspired image processing model are given in the following chapters of this thesis.
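To make the pulse behaviour described above more concrete, the following is a minimal sketch of a simplified PCNN iteration, not the model developed in this thesis. It assumes a commonly used simplification in which the feeding input equals the pixel intensity, the linking input is a weighted sum of neighbouring pulses, and the threshold decays exponentially and jumps after a neuron fires. The function names, the 3x3 linking kernel and all parameter values (beta, alpha_theta, v_theta) are illustrative assumptions.

```python
import numpy as np

def neighbour_sum(y, kernel):
    """Weighted sum over each pixel's 3x3 neighbourhood (zero padding)."""
    padded = np.pad(y, 1, mode="constant")
    rows, cols = y.shape
    out = np.zeros((rows, cols), dtype=float)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + rows, dj:dj + cols]
    return out

def simplified_pcnn(stimulus, steps=10, beta=0.2, alpha_theta=0.2, v_theta=20.0):
    """Return a list of binary pulse images, one per iteration (illustrative only)."""
    s = np.asarray(stimulus, dtype=float)
    # Hypothetical linking weights: closer neighbours couple more strongly.
    kernel = np.array([[0.5, 1.0, 0.5],
                       [1.0, 0.0, 1.0],
                       [0.5, 1.0, 0.5]])
    y = np.zeros_like(s)        # pulse output of the previous iteration
    theta = np.ones_like(s)     # dynamic threshold of each neuron
    pulses = []
    for _ in range(steps):
        link = neighbour_sum(y, kernel)      # linking input from neighbouring pulses
        u = s * (1.0 + beta * link)          # internal activity (feeding modulated by linking)
        y = (u > theta).astype(float)        # a neuron pulses when activity exceeds threshold
        theta = np.exp(-alpha_theta) * theta + v_theta * y  # decay, then jump after firing
        pulses.append(y.copy())
    return pulses
```

In this sketch, pixels of similar intensity tend to pulse in the same iteration, which is the property that makes the sequence of binary pulse images usable for segmentation or feature generation. The parameter values would need tuning for real imagery.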
1.2 Research Objectives
The overall objective of this study is to develop novel and effective computer vision
algorithms in the context of an aerial remote sensing application: vegetation man-
agement in power line corridors. Addressing the problem in this specific application
requires solving a number of sub-problems spanning many disciplines including remote
sensing, image processing and machine learning. A wide range of domain knowledge
is also required to better understand the problems. Specifically, the objectives of this
research project are:
• To review the existing technologies for power line corridor monitoring and to
identify appropriate platforms and sensors for operational data collection;
• To develop new and effective algorithms for object detection and segmentation
from the collected remote sensing data;
• To develop new feature extraction methods using biologically inspired spiking
cortical models which can better model the objects of interest in the specific
classification tasks.
1.3 Research Outcomes and Contributions
This research is the first comprehensive study of using remote sensing techniques in
power line corridor vegetation management. Theoretically, the thesis focuses on a
biologically inspired spiking neural network model: the pulse coupled neural network
(PCNN). The original PCNN model was simplified in order to better analyze the
pulse dynamics and control the performance. New and effective algorithms were
developed based on the simplified spiking neural network model for object detection,
image segmentation and invariant feature extraction. Several journal and conference
publications were also generated from this project. Operationally, the knowledge gained
from this research project offers a good reference to our industry partner (i.e. Ergon
Energy) and other energy utilities who want to improve their vegetation management
activities. The novel approaches described in this thesis show the potential of using
the cutting edge technologies to reduce the cost of power line corridor monitoring and
will help the energy companies to improve their traditional vegetation management
strategy.
• Several aerial platforms and sensors were evaluated in order to collect high-
quality data in power line corridors. High spatial resolution natural colour and
multi-spectral imagery as well as laser scanning data were collected using piloted
and unmanned aerial platforms. These data provide a perfect simulation
environment for developing and evaluating the data processing algorithms.
• A novel method is developed specifically for power line detection from aerial
images. A pulse coupled neural filter is developed to remove the background
noise and generate an edge map prior to the Hough transform being employed
to detect straight lines. An improved Hough transform is used by performing
knowledge-based line clustering in Hough space to refine the detection results.
• Individual tree crown detection and delineation from multi-spectral imagery is
achieved by applying a PCNN in spectral feature space and a watershed algo-
rithm in the post-processing stage. Multi-spectral imagery and laser scanning
data are combined through a region-level fusion method to further improve the
segmentation results.
• A biologically inspired spectral-texture feature extraction method is developed
by using pulse spectral frequency (PSF) of a pulse coupled neural network. The
PSF feature is rotation and scale invariant and it has been evaluated against
several classic feature descriptors in texture classification as well as vegetation
species classification.
1.4 Publications
Journal Papers
[1] Z. Li, Y. Liu, R. Walker, R. Hayward, J. Zhang, "Towards automatic power
line detection for a UAV surveillance system using pulse coupled neural filter and an
improved Hough transform." Machine Vision and Applications, vol. 21, pp. 677-686,
2010.
[2] S. Mills, M. Gerardo, Z. Li, J. Cai, R. Hayward, L. Mejias, R. Walker, "Eval-
uation of aerial remote sensing techniques for vegetation management in power line
corridors," IEEE Transactions on Geoscience and Remote Sensing, vol. 48, pp. 3379-
3390, 2010.
[3] Z. Li, R. Hayward, R. Walker, Y. Liu, "A Biologically Inspired Object Spectral-
Texture Descriptor and Its Application to Vegetation Classification in Power Line
Corridors," IEEE Geoscience and Remote Sensing Letters, vol. 8, pp. 631-635, 2011.
[4] Z. Li, R. Hayward, Y. Liu, R. Walker, "Spectral and Texture Feature Extraction
Using Statistical Moments with Application to Object-based Vegetation Species
Classification," International Journal of Image and Data Fusion (Accepted, in press,
2011).
Peer-reviewed Conference Papers
[1] Z. Li, R. Hayward, J. Zhang, Y. Liu, "Individual tree crown delineation tech-
niques for vegetation management in power line corridor," Proceedings of the In-
ternational Conference on Digital Image Computing: Techniques and Applications
(DICTA), Canberra, 2008.
[2] Z. Li, Y. Liu, R. Hayward, J. Zhang, J. Cai, "Knowledge-based power line
detection for UAV surveillance and inspection systems," Proceedings of the 23rd
International Conference on Image and Vision Computing New Zealand (IVCNZ),
Christchurch, 2008.
[3] Z. Li, R. Hayward, J. Zhang, Y. Liu, R. Walker, "Towards automatic tree
crown detection and delineation in spectral feature space using PCNN and morpho-
logical reconstruction," Proceedings of the IEEE International Conference on Image
Processing (ICIP), Cairo, 2009.
[4] Y. Liu, Z. Li, R. Hayward, R. Walker, H. Jin, "Classification of airborne lidar
intensity data using statistical analysis and Hough transform with application to
power line corridors," Proceedings of the International Conference on Digital Image
Computing: Techniques and Applications (DICTA), Melbourne, 2009.
[5] H. Jin, Y. Feng, Z. Li, "Extraction of road lanes from high-resolution stereo
aerial imagery based on maximum likelihood segmentation and texture enhancement,"
Proceedings of the International Conference on Digital Image Computing: Techniques
and Applications (DICTA), Melbourne, 2009.
[6] Z. Li, R. Hayward, J. Zhang, H. Jin, R. Walker, "Evaluation of spectral
and texture features for object-based vegetation species classification using support
vector machines," International Archives of the Photogrammetry, Remote Sensing and
Spatial Information Sciences. vol. XXXVIII, Part 7A (ISPRS TC VII Symposium-
100 years ISPRS), Vienna, 2010.
[7] Z. Li, Y. Liu, R. Hayward, R. Walker, "Empirical comparison of machine
learning algorithms for image texture classification with application to vegetation
management in power line corridors," International Archives of the Photogrammetry,
Remote Sensing and Spatial Information Sciences. vol. XXXVIII, Part 7A (ISPRS
TC VII Symposium-100 years ISPRS), Vienna, 2010.
[8] Z. Li, Y. Liu, R. Hayward, R. Walker, "Color and texture feature fusion
using kernel PCA with application to object-based vegetation species classification,"
Proceedings of the IEEE International Conference on Image Processing (ICIP), Hong
Kong, 2010. (IEEE Signal Processing Society Travel Grant)
[9] Z. Li, R. Walker, R. Hayward, L. Mejias, "Advances in vegetation management
for power line corridor monitoring using aerial remote sensing techniques," Proceedings
of the First International Conference on Applied Robotics for the Power Industry
(CARPI), Montreal, 2010.
1.5 Thesis Structure
This thesis is structured in the following manner:
Chapter 1 provides an overview of the background, objectives and outcomes of
this research.
Chapter 2 briefly reviews the advantages and disadvantages of current sensors and
remote sensing platforms. The aerial and ground survey data collection trials for this
study and the characteristics of the collected data are also described in this chapter.
Chapter 3 explains the concepts of geographic object based image analysis and
pulse coupled neural networks. The technical details of the developed algorithms for
power line detection and individual tree crown segmentation are given in this chapter.
Chapter 4 discusses visual feature extraction and machine learning techniques
in image classification tasks. The idea of using pulse spectral frequency of PCNN as
spectral-texture feature descriptors is presented. The developed colour and texture
feature fusion based on kernel principal component analysis is also presented in this
chapter.
Chapter 5 presents and discusses the results of a series of experiments conducted
to evaluate the effectiveness of the developed algorithms.
Chapter 6 concludes this dissertation with (i) a summary of the lessons learnt
throughout the thesis and the novel contributions that were made in this study and
(ii) an outline of some new research directions which are considered important to
further advance the methods in this study and apply them to real applications.
Chapter 2
Remote Sensing Data Collection
Remote sensing is an effective tool for land cover mapping in large areas. In the
past few years, an increasing number of sensors and platforms have become available for
people seeking to map land cover, forest structure and changes in the Earth’s surface.
The choices involved in the selection of a remote sensing data type are increasingly
complicated.
Associated with every research project is the collection of adequate data for verification
and validation of research outcomes. As a part of the CRCSI project 6.07, a
review of the current remote sensing platforms and sensors was conducted in order to
suggest the best data capture solutions. Commercial data providers were employed
in the data collection and the access to such data is highly beneficial given that
the verification and validation will correspond to actual needs of the industry. The
application specific nature of this research project also requires real world practical
verification.
2.1 Remote Sensing Platforms
Traditional aerial survey in power line corridor monitoring employs helicopter patrols
to fly over the network, trying to identify dangerous trees and assess the condition of
overhead line assets by visual observation. However, it is not possible to consistently
and accurately determine the distance between vegetation and power lines by human
eye (Ituen et al., 2008). The aerial survey often needs to be supplemented with
ground patrols, which makes it even more time consuming and labour intensive. A
better approach to aerial survey is to map the corridor by collecting data using LiDAR
sensors and geo-referenced cameras. Afterwards, computer systems can be used in the
automated data processing and analysis to provide information for decision making.
Over the past decades, a number of ideas have come forth seeking to collect remote
sensing data in power line corridor mapping. These include improved data collection
using satellite sensors (Kobayashi et al., 2009; Beltrame et al., 2007), airborne laser
scanning systems (Lu & Kieloch, 2008; Clode & Rottensteiner, 2005), airborne stereo
vision systems (Sun et al., 2006), and unmanned aerial systems (UASs) (Jones et al.,
2005).
Several satellite platforms such as IKONOS and QuickBird can be used to collect
high spatial resolution earth observation data. The IKONOS satellite is the world’s
first commercial satellite to collect panchromatic images with one meter ground sam-
ple distance (GSD) and multi-spectral imagery with 4 meter GSD. QuickBird is the
most widely used high-resolution commercial earth observation satellite. This satel-
lite collects panchromatic imagery at 60-70 centimeter resolution and multi-spectral
imagery at 2.4- and 2.8-meter resolutions. However, it is noted that these satellite
platforms are not the best choice for monitoring power line corridors due to two crit-
ical limitations: the unfavorable revisit time and lack of choices in optimum spatial
and spectral resolutions. Satellite data are only available on predetermined sched-
ules, which is not flexible for the monitoring of power line corridors. Moreover, it is
not possible to capture very small objects (e.g. power lines) and detailed textures of
ground objects (e.g. individual tree crowns) using current satellite data. Airborne
data collection offers a much greater level of flexibility. Moreover, airborne systems
can be regularly upgraded as sensor technology advances, for example with sensors of higher spectral and
spatial resolution or with advanced microwave or LiDAR sensors. A limitation which impedes
large-scale airborne remote sensing applications is that traditional piloted
airborne platforms involve high operational costs. Moreover, using piloted aircraft
for power line inspection places the operators at a greater level of risk.
Unmanned aerial systems (UASs) could fill this gap, providing a cheap and flexible
way to gather spatial data from power line corridors. Traditionally the use of UASs
has been limited to military applications. As these military systems grow in matu-
rity, a number of UAV systems with various onboard sensors have been developed
for civilian applications such as homeland security, forest fire monitoring, rapid
response measurements in disaster emergencies, Earth science research, volcanic gas
sampling, humanitarian observations, and monitoring of gas pipelines (Zhou et al.,
2009). Recent developments in the aerial vehicles themselves and the associated sensing
systems make UAS platforms increasingly attractive for both research and operational
mapping (Berni et al., 2009; Gurtner et al., 2009). One of the main limitations of
UASs is their restricted ability to carry power-demanding and heavy payloads. Airborne
laser scanning systems (LiDAR) are too heavy for small and medium sized UAS platforms.
This limitation may be overcome in the near future as smaller LiDAR systems
suitable for UASs become more readily available in the market.
2.2 Remotely Sensed Data
There are two kinds of remote sensing: passive remote sensing and active remote
sensing. Passive sensors detect natural radiation that is emitted or reflected by the
object or surrounding area being observed. Reflected sunlight is the most common
source of radiation measured by passive sensors. Optical remote sensing images such
as satellite and airborne multi-spectral imagery are collected from passive sensors.
Active sensors, on the other hand, emit energy in order to scan objects and areas,
whereupon the sensor detects and measures the radiation that is reflected
or backscattered from the target. Light detection and ranging (LiDAR) is an example
of active remote sensing.
2.2.1 Optical Remote Sensing Imagery
The spatial resolution and spectral resolution are the two most important characteristics
of optical remote sensing imagery. Spatial resolution, commonly referred to as
“pixel size” in digital images, has a close relationship with the information content
that can be extracted from the image. Remote sensing images are usually divided into
two categories: high-resolution and low-resolution. Imagery whose spatial resolution
is coarser than the object to be extracted is low-resolution. Low-resolution images are
widely used for land cover classification on a large scale. Detailed information such
as individual trees cannot be extracted from low-resolution images because, when a
forested area is imaged, one pixel may represent multiple trees and
their surroundings. In contrast, high-resolution images contain multiple pixels
for each object, which increases the variance within the image. Higher resolution is not always
necessary. For instance, in an image with trees represented by multiple pixels,
each pixel may be shadowed or sunlit, young foliage or old, or may contain one of a
variety of different understory components. If a forest map is needed, rules should be
developed to incorporate all of these cover types into one. As a consequence, images
with a spatial resolution near the size of the object of interest are usually preferred
(Lefsky & Cohen, 2003).
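As a rough illustration of the relationship between pixel size and object size described above, the following sketch counts how many pixels span a single tree crown at several ground sample distances quoted in this chapter (4 m and 2.4 m for satellite multi-spectral imagery, 0.15 m for the airborne multi-spectral imagery). The crown diameter of 6 m and the function names are assumptions made only for this example; they are not values measured in this study.

```python
def pixels_across(object_size_m: float, gsd_m: float) -> float:
    """Number of pixels spanning an object of the given size at the given GSD."""
    if gsd_m <= 0:
        raise ValueError("GSD must be positive")
    return object_size_m / gsd_m


def resolution_class(object_size_m: float, gsd_m: float) -> str:
    """Label imagery relative to one object: low-resolution if a pixel is
    coarser than the object, high-resolution otherwise."""
    return "low-resolution" if gsd_m > object_size_m else "high-resolution"


if __name__ == "__main__":
    crown_diameter_m = 6.0  # hypothetical crown diameter, for illustration only
    sensors = [
        ("IKONOS multi-spectral", 4.0),
        ("QuickBird multi-spectral", 2.4),
        ("airborne multi-spectral", 0.15),
    ]
    for name, gsd_m in sensors:
        n = pixels_across(crown_diameter_m, gsd_m)
        label = resolution_class(crown_diameter_m, gsd_m)
        print(f"{name}: {n:.1f} pixels across the crown ({label})")
```

Under these assumptions the satellite imagery places only one or two pixels across a crown, whereas the airborne imagery places several tens of pixels across it, which is the situation in which within-object variance becomes significant.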
Spectral resolution is the richness of spectral information in optical remote sens-
ing imagery. Different materials reflect and absorb differently at different wavelengths
(Figure 2.1; source: www.crisp.nus.edu.sg).
Figure 2.1: Optical passive remote sensing from satellite platforms
Spectral features are the specific combination of reflected and absorbed
electromagnetic radiation at varying wavelengths which can uniquely identify an ob-
ject. Increasing the number of spectral bands would seem to be an obvious way to
improve prediction of forest attributes (Lefsky & Cohen, 2003). Multi-spectral sensors
are one strategy for obtaining spectral data. To efficiently record most variation, they
break the spectra into a few bands where there are significant differences between the
spectra of a wide range of materials. As technology improves and more data can be recorded,
hyper-spectral sensors are becoming more commonly used. These sensors record the intensities
of many wavelengths of light being reflected from the same source. Hyper-spectral
sensors split the reflected light energy at the sensor into many separate, narrow chan-
nels on a pixel-by-pixel basis, and are developed with the assumption that improved
identification of particular spectral features will lead to improved discrimination of
cover attributes. Hyper-spectral imagery makes discernment of an area’s composition
through spectral response discrimination more effective (Aardt, 2000).
2.2.2 Airborne Laser Scanning data
Light Detection and Ranging (LiDAR) is an active remote sensing technique
that measures the properties of scattered light to find the range or other information
of a distant target. Airborne laser scanning is an application of LiDAR techniques
which captures and records the geometry, and sometimes the textural information, of the visible
surfaces of ground objects. It is a relatively young 3D measurement technique
offering much potential in topographic and mapping operations to capture precise
and reliable 3D geodata. Traditionally, the primary use of LiDAR data is to obtain
altitude data and generate digital terrain models (DTM). In recent years, however,
the range of applications in which laser scanning can be used has greatly broadened.
With the advancement of sensor technology, the achievable resolution of point clouds
has made it possible to map individual trees and power lines from airborne laser scanning
data.
An airborne laser scanning system has two main components: a laser scanner
which measures the distance to a spot on the ground illuminated by the laser and a
GPS/IMU combination to measure exactly the position and orientation of the system
(Beraldin et al., 2010). Figure 2.2 (Beraldin et al., 2010) illustrates the basic compo-
nents and principle of airborne laser scanning. The laser scanner, mounted over a hole
in the aircraft’s fuselage, continuously sends laser pulses towards the terrain as the
aircraft flies. A GPS antenna and an inertial measurement unit (IMU) are used to record
the position and orientation of the system, and these data also allow the reconstruction
of the flight path. There is also a control and data recording unit inside the system
which is responsible for time synchronization and the control of the whole system.
Modern laser scanners generate up to 300 000 laser pulses per second and produce
about 20 Gbyte of ranging data per hour (Beraldin et al., 2010). Depending on the
aircraft velocity and survey height, the densities of LiDAR point clouds vary between
0.2 and 50 points per square meter.
Figure 2.2: The basic components and principle of airborne laser scanning
Typically, commercial airborne laser scanning
systems for land mapping applications operate at wavelengths between 800 and 1550
nm with a spectral width between 0.1 and 0.5 nm. Since the reflectivity of an object
depends on the wavelength, selection of a suitable wavelength should also be considered
in LiDAR mapping applications. For example, water surfaces will rarely be seen
by laser scanners operating in the visible part of the spectrum because water will absorb
most of the laser energy.
2.3 Data Collection in Power Line Corridors
Data collection for this study included gathering aerial data for sensor evaluation,
as well as conducting a ground survey for verification. Numerous flight tests were
conducted and a large amount of aerial data was collected. A field survey was also
conducted to evaluate the classification of individual tree species.
Figure 2.3: The UAS platforms and examples of the collected data
2.3.1 Aerial Data
The first series of flights were conducted to evaluate the capability of UAS in data
collection. Numerous flight tests were performed using two different UASs carrying
different sensor payloads. Two UAS platforms were used: V-TOL Aerospace BAT-3,
and the ARCAA UAS platform Eleanor. The sensors on board the two UAS platforms
were colour digital cameras (Canon IXUS 960IS and Canon 350D DSLR). The flight
tests resulted in over 3 hours of video footage and over one thousand high resolution
images covering approximately 20 km of typical power line. The spatial resolution
of these images varies with flight height. Figure 2.3 shows the two
UAS platforms and examples of the collected image data.
The second series of flights was conducted to evaluate commercial data collected
from piloted aircraft. As no single provider could capture the full range of sensor
data required for the experiment, two separate providers were contracted to collect
data over the test site.
(1) Provider of multi-spectral imagery
The first series of flights was conducted on the 25th of November 2008 by a
Queensland-based company contracted to collect multi-spectral data. Their system consisted of
a DuncanTech MS-4100 multi-spectral camera with DGPS/INS. This equipment was
mounted in the cargo area of a Piper Cub as pictured in Figure 2.4(a). Multi-spectral
data were captured over 4 spectral bands: NIR (800-966 nm), red (670-840 nm), green
(540-640 nm) and blue (460-545 nm). Traveling at approximately 34 m/s (65 knots) at an
altitude of 350 m AGL, the aircraft captured multi-spectral images at approximately 15 cm
GSD.
(2) Provider of LiDAR data
The second data provider was also a local company supplying aerial mapping,
LiDAR and other GIS services to government and industry. Their system consisted
of an Integrated LiDAR and Digital Photography System mounted in the cargo area
of a modified Cessna U206G as shown in Figure 2.4(b). The flight for data collection
occurred on the 9th of December 2008, during which the aircraft was flown at approximately
55 m/s (106 kts) at an altitude of 500 m AGL. LiDAR data were collected
at 200 kHz with a scan angle of ±30° and an average sampling density of 9 points per
square meter.
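The flight parameters above can be related to point density with a first-order calculation: the number of pulses emitted per second divided by the ground area swept per second. The sketch below is only an estimate under stated assumptions: flat terrain, one return per pulse, a single flight strip with no overlap, and the 200 kHz figure interpreted as the pulse repetition rate. The function names are introduced for this example only.

```python
import math


def swath_width_m(altitude_m: float, half_scan_angle_deg: float) -> float:
    """Ground swath width for a scanner sweeping +/- half_scan_angle over flat terrain."""
    return 2.0 * altitude_m * math.tan(math.radians(half_scan_angle_deg))


def nominal_point_density(pulse_rate_hz: float, speed_m_s: float,
                          altitude_m: float, half_scan_angle_deg: float) -> float:
    """Pulses emitted per second divided by ground area swept per second (points/m^2)."""
    area_per_second_m2 = speed_m_s * swath_width_m(altitude_m, half_scan_angle_deg)
    return pulse_rate_hz / area_per_second_m2


if __name__ == "__main__":
    # Flight parameters quoted in the text; everything else is an assumption.
    width = swath_width_m(500.0, 30.0)
    density = nominal_point_density(200_000, 55.0, 500.0, 30.0)
    print(f"swath width: {width:.0f} m")
    print(f"nominal density: {density:.1f} points per square meter")
```

Under these assumptions the nominal single-pass density is roughly 6 points per square meter, the same order of magnitude as the reported average of 9. The estimate ignores factors such as multiple returns per pulse and overlapping flight strips, so it is not expected to match the reported figure exactly.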
Figure 2.5 and Figure 2.6 show examples of the collected multi-spectral imagery
and LiDAR point cloud data.
2.3.2 Field Survey
A 1.5-km section of line spanning between the rural towns of Murgon and Wondai
in South East Queensland, Australia, was selected as the test site for ground survey.
Figure 2.7 shows a mosaic of the test area generated from the aerial images acquired
Figure 2.4: Data collections systems from commercial data providers
Figure 2.5: An example of the collected multi-spectral imagery
Figure 2.6: An example of the collected LiDAR point cloud data
from the trial, where white lines indicate power lines and dashed lines indicate those
outside of the test area.
The major task of the field survey was to label the species of individual trees in the
collected images. The field survey procedure was as follows:
(1) The test images were geo-located: a corridor of approximately 1.5 kilometers was selected
by considering the quality of the images and the accessibility of the site.
(2) Each tree was assigned a unique ID and was located by recognizing distinct features
(e.g. roads, houses, and power poles) around it.
(3) Tree species were identified by an expert from Greening Australia.
(4) The field survey data were labeled on the multi-spectral images and a vegetation
database was derived.
Figure 2.8 shows an example of the vegetation database. From our field survey,
a total of 353 trees were labeled with 12 species found in the test area. The species
were not evenly distributed. There were three dominant species in our test field: Eucalyptus
tereticornis, Eucalyptus melanophloia, and Corymbia tessellaris. According
to the field survey, these three species account for over 80% of all the trees in the test
area.
Figure 2.7: Experiment Test Site
Figure 2.8: Vegetation Database
2.4 Summary
In this chapter, the selection of sensors and platforms was discussed. The basic
theory of optical remote sensing imagery and airborne laser scanning data was also
introduced. A number of flight trials, the collected aerial data and the field
survey were also described in this chapter.
Chapter 3
Object Detection and Segmentation
In object based image analysis, partitioning an image into meaningful image-objects
is a critical step. Two types of objects are of special interest in this research:
trees and power lines. Therefore, the detection and segmentation of trees and power
lines from remote sensing imagery are two important tasks. A suitable methodology
for these tasks is to draw inspiration from the processes that humans use to solve similar
problems, followed by a rigorous evaluation on real data. The approach to solve the
problem will utilize both a traditional top-down approach (using knowledge and log-
ical inference) and a bottom-up approach drawing on techniques from computational
intelligence. The algorithm design process involves the use of a biologically inspired
spiking neural network model as well as domain knowledge to solve the problem. In
this chapter, the related work on geographic object based image analysis (GEOBIA)
and pulse coupled neural networks is first introduced. After that, the details of the
developed algorithms for power line detection and individual tree crown segmentation
are presented respectively.
3.1 Related Work
3.1.1 Geographic Object Based Image Analysis
A wide range of classification methods have been developed to derive land cover
information from remotely sensed images. Since remote sensing images consist of
rows and columns of pixels, conventional land-cover mapping has been performed on a
per-pixel basis (Yan et al., 2006). Unfortunately, classification algorithms based on
single pixel analysis are often not capable of extracting the information we desire
from high spatial resolution images. For example, the spectral complexity of urban
land-cover materials results in specific limitations using per-pixel analysis for the
separation of human-made materials such as roads and roofs and natural materials
such as vegetation, soil, and water (Jensen, 2005). We need information about the
characteristics of a single pixel and also those of the surrounding pixels so that we
can identify areas (or segments) of pixels that are homogeneous.
Object-based approaches have become popular in high spatial resolution remote sensing
image classification. They have proven to be an alternative to pixel-based image
analysis, and a large number of publications suggest that better results can be expected
(Liu et al., 2006b; Blaschke, 2010). It is noted that in the remote sensing and GIS
community, a new discipline of geographic object-based image analysis (GEOBIA)
has gained widespread interest, although a critical discussion has arisen concerning
whether or not geographic space should be included in the name of this concept
in order to discriminate from other disciplines like computer vision and biomedical
imaging, which also conduct object-based image analysis (OBIA) (Hay & Castilla,
2008). GEOBIA is the process of associating initial image-objects (segments) to
geographic-object classes, based on both internal features of the objects and their
mutual relationships. Ideally, there should be a one-to-one correspondence between
image-segments and meaningful image-objects. However, this is hard to achieve in
the real classification process. The strengths of GEOBIA (Hay & Castilla, 2008)
include: (1) partitioning an image into objects is similar to the way humans
conceptually organize the landscape in order to comprehend it; (2) using image-objects as
basic units reduces the computational cost of the classifier; (3) image-objects exhibit useful
features (e.g. shape, texture, contextual relations with other objects) that single
pixels lack; (4) image-objects can be more readily integrated into a vector
representation for a GIS than pixel-wise classified raster maps.
GEOBIA assumes that we can identify entities in remote sensing images that can
be related to real entities in the landscape. A first step in this kind of analysis is segmentation,
which partitions images into separate image objects. Image-objects have
the potential to correspond to geographic-objects. When an image-object can be
seen as a proper representation of an instance of some type of geographic-object, then
we can say it is a meaningful image-object. However, perfect segmentation is very
difficult to achieve in real situations. Image segmentation is the process of breaking
an image up into regions that have some meaning with respect to image content and
application (Watt & Polocarpo, 1998). During the past 40 years, many
segmentation algorithms have been proposed and applied to various applications
(Zhang, 2006; Brook et al., 2005). These methods can be roughly broken down into
four main categories: 1) thresholding techniques; 2) boundary-based techniques; 3)
region-based techniques; 4) hybrid techniques. However, these segmentation algorithms
have only been proven to work well in specific applications, and it cannot be
conclusively stated that the segmentation problem has been solved. As a result, choosing
a particular image segmentation method for a specific domain is still a big problem.
The goodness of image segmentation algorithms has to be established on the basis
of expert consensus. There is a two-sided problem in image segmentation that depends
on human judgmental difference: over-segmentation and under-segmentation (Hay,
2008). Over-segmentation refers to a situation where, in the opinion of the perceiver,
the contrast between some adjacent segments is insufficient and they should be merged
into a single image object. Under-segmentation refers to the existence of segments
that, in the opinion of the perceiver, lack coherency and should be split into separate
segments.
Figure 3.1: Two segmentation levels overlaid on a colour aerial image
Figure 3.1 illustrates two segmentation levels overlaid on a colour aerial
image (generated by eCognition 4.0 software). The grass field F has been under-
segmented in Figure 3.1(a) and over-segmented in Figure 3.1(b). In general, over-
segmentation is less serious a problem than under-segmentation, since a posteriori
merging of segments is easier than splitting them. Since there is no straightforward
relationship between similarity in the image and semantic similarity, it is preferable to
err on the side of over-segmentation and relax the external contrast requirement. In
short, a good segmentation is one that shows little over-segmentation and no under-
segmentation, and a good segmentation algorithm is one that enables the user to
derive a good segmentation without excessive fine tuning of input parameters.
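The two failure modes defined above can be made concrete when a reference labelling of the objects is available. The sketch below is a crude count, not a method used in this thesis: it assumes integer label maps of equal shape, treats every label as an object (there is no background class), and applies no tolerance for boundary pixels. The function names and the toy arrays are hypothetical. As noted above, the quality of a segmentation ultimately rests on expert judgment, which such a count cannot replace.

```python
import numpy as np


def segments_per_object(reference: np.ndarray, segmentation: np.ndarray) -> dict:
    """Count, for each reference object, the distinct segments covering it."""
    return {int(obj): int(np.unique(segmentation[reference == obj]).size)
            for obj in np.unique(reference)}


def objects_per_segment(reference: np.ndarray, segmentation: np.ndarray) -> dict:
    """Count, for each segment, the distinct reference objects it covers."""
    return {int(seg): int(np.unique(reference[segmentation == seg]).size)
            for seg in np.unique(segmentation)}


if __name__ == "__main__":
    # Hypothetical 2 x 4 label maps: two reference objects, labelled 1 and 2.
    reference = np.array([[1, 1, 2, 2],
                          [1, 1, 2, 2]])
    under = np.ones_like(reference)        # one segment spans both objects
    over = np.array([[1, 2, 3, 3],
                     [1, 2, 3, 3]])        # object 1 is split into two segments

    # A segment covering more than one object indicates under-segmentation.
    print(objects_per_segment(reference, under))
    # An object covered by more than one segment indicates over-segmentation.
    print(segments_per_object(reference, over))
```

In this toy case the over-segmented result could be repaired by merging segments 1 and 2, whereas the under-segmented result would require finding a new boundary inside the single segment, which is consistent with the observation that merging is the easier correction.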
3.1.2 Pulse Coupled Neural Networks
Traditional computer science approaches in machine vision systems have achieved little
success when compared with the visual analysis capabilities of children or animals,
and much less when compared with trained image analysts (Shi et al., 2009). To
improve the performance of machine vision systems, numerous biologically inspired
models have been developed in the past few decades. A pulse coupled neural network
(PCNN) is a biologically inspired spiking cortical model based on the understanding
of visual cortical models of small mammals. Instead of the rate coding used in traditional
neural network models, spiking neural networks use pulse coding. Neurons
receive and send out individual pulses, allowing multiplexing of information such as the
frequency and amplitude of sound (Vreeken, 2003). Spiking neural networks raise
the level of biological realism by using individual spikes, which allows the incorporation of
spatial-temporal information in communication and computation, as real neurons
do. The PCNN is one of the well-formulated biologically inspired spiking cortical models
which can be adopted in computational systems. Before getting down to the details
of PCNN, let us briefly summarize its biological background: visual cortex theory.
The study of the visual cortex has greatly contributed to the current understanding
of mammalian and human visual pathways and their role in the visual perception.
Visual cortex refers to the primary visual cortex (V1) and other extrastriate visual
cortical areas such as V2, V3, V4, and V5. The primary visual cortex (V1) is located
in and around the calcarine fissure in the occipital lobe (Figure 3.2; Polyak, 1957).
The primary visual cortex is the best studied visual area in the brain and also the
simplest, earliest cortical visual area. It is highly specialized for processing informa-
tion about static and moving objects and is excellent in pattern recognition (Ng et al.,
2007). V2 is the second major area in the visual cortex, which receives strong
feedforward connections from V1 and sends strong connections to V3, V4, and V5.
These cortical areas are interconnected with a high degree of regularity and precision
Figure 3.2: Visual cortex
(Zeki, 1993). The visual cortex is of enormous complexity and much is still to be
learned about how it processes information. Consequently, an image
processing engine may never mimic the full functionality of the visual cortex system,
but only use a few of its basic features.
Biological models of the visual cortex depict each neuron as a coupled oscillator
with connections to other neurons (Lindblad & Kinser, 2005). In the past decades,
visual cortex theory has been intensively studied and many computational models have
been proposed. The Eckhorn model (Eckhorn et al., 1989) is one of these computational
models; it was derived from analyzing the cat visual cortex and is also the basis for
most PCNNs. An illustration of an Eckhorn-type neuron is shown in Figure 3.3. The
neuron contains two inputs: the feeding and the linking compartments. The linking
compartment only receives local stimuli, while the feeding compartment receives an
external stimulus as well as internal stimuli. The feeding and the linking inputs are
combined to create the membrane voltage, which is then compared with a local threshold
to generate the output. The Eckhorn model provides a simple and effective way of studying
synchronous pulse dynamics in neuron networks and paved the way for the development of
the pulse coupled neural network. Afterwards, Johnson et al. carried out a number of
modifications and variations to tailor its performance for image processing algorithms
Figure 3.3: The Eckhorn-type neuron
(Johnson, 1994; Johnson & Padgett, 1999).
PCNN has the fundamental characteristic that each neuron has the ability to
capture neighboring neurons in similar states and thus provides a new biologically
inspired parallel algorithm for image processing applications. Over the last decade,
PCNNs have been utilized for a variety of image processing applications, including
image segmentation, edge detection, feature generation, noise reduction, face recogni-
tion, motion detection, and so on (Ranganath & Kuntimad, 1999; Waldemark et al.,
2000; Gu, 2008). Most PCNNs are based on the Eckhorn model sharing a common
mathematical foundation but with variations each having their own unique terms
(Wang et al., 2010). When applied to image processing, PCNN is a single layered,
two-dimensional, laterally connected neural network of pulse coupled neurons. Each
neuron corresponds to one pixel in an input image, receiving its corresponding pixel’s
color information (e.g. intensity) as an external stimulus. The neuron also connects
with its neighboring neurons, receiving local stimuli from them. Figure 3.4 shows
a typical structure of a standard PCNN. The input part delivers external and local
inputs to the neuron through the feeding and linking parts respectively. In the linking
part, external and local stimuli are combined in an internal activation system, which
accumulates the stimuli until they exceed a dynamic threshold, and then the pulse
generator produces a pulse output. Through iterative computation, PCNN neurons
produce temporal series of pulse outputs. Similarities in the input pixels cause the
associated neurons to pulse synchronously, thus indicating similar image structures or
textures. These temporal series of pulse outputs contain information of input images
and can be utilized for various image processing applications.
This standard PCNN model is usually described by the following five coupled equations:

$$F_{ij}(t) = S_{ij}(t) + e^{-\alpha_F} \cdot F_{ij}(t-1) + V_F \cdot (M * Y(t-1))_{ij} \tag{3.1}$$

$$L_{ij}(t) = e^{-\alpha_L} \cdot L_{ij}(t-1) + V_L \cdot (W * Y(t-1))_{ij} \tag{3.2}$$

$$U_{ij}(t) = F_{ij}(t) \cdot (1 + \beta \cdot L_{ij}(t)) \tag{3.3}$$

$$Y_{ij}(t) = \begin{cases} 1 & U_{ij}(t) > \Theta_{ij}(t) \\ 0 & \text{otherwise} \end{cases} \tag{3.4}$$

$$\Theta_{ij}(t) = e^{-\alpha_\Theta} \cdot \Theta_{ij}(t-1) + V_\Theta \cdot Y_{ij}(t-1) \tag{3.5}$$

where $t$ is the iteration step, $F_{ij}$ is the feeding input, $L_{ij}$ is the linking input, $S_{ij}$
is the intensity of pixel $(i, j)$, $W$ and $M$ are the weight matrices, $*$ is the convolution
operator, $Y$ is the output of neurons, $U$ is the internal activity, $\beta$ is the linking
strength, and $\Theta$ is the dynamic threshold; $\alpha_F$, $\alpha_L$ and $\alpha_\Theta$ are the feeding, linking
and threshold decay coefficients respectively; $V_F$, $V_L$ and $V_\Theta$ (also written $V_T$) are the
feeding, linking and threshold magnitude scales respectively. The dynamic thresholds of all
neurons are zero at $t < 1$.
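As an illustration, one iteration of Equations 3.1 to 3.5 can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions rather than the implementation used in this research: the function names `conv3x3` and `pcnn_step`, the 3 × 3 kernel size, the zero padding at the image border and all default parameter values are hypothetical choices made only for the example.

```python
import numpy as np

def conv3x3(img, kernel):
    """Zero-padded 3x3 neighbourhood sum (equals convolution for symmetric kernels)."""
    h, w = img.shape
    padded = np.pad(img, 1)  # zeros outside the image (assumption)
    out = np.zeros((h, w), dtype=float)
    for a in range(3):
        for b in range(3):
            out += kernel[a, b] * padded[a:a + h, b:b + w]
    return out

def pcnn_step(S, state, M, W, alpha_F=0.1, alpha_L=1.0, alpha_T=0.5,
              V_F=0.5, V_L=0.2, V_T=20.0, beta=0.1):
    """One iteration of the standard PCNN; all default values are illustrative only."""
    F, L, Theta, Y = state                                     # values at t - 1
    F_new = S + np.exp(-alpha_F) * F + V_F * conv3x3(Y, M)     # Eq. 3.1
    L_new = np.exp(-alpha_L) * L + V_L * conv3x3(Y, W)         # Eq. 3.2
    U = F_new * (1.0 + beta * L_new)                           # Eq. 3.3
    Theta_new = np.exp(-alpha_T) * Theta + V_T * Y             # Eq. 3.5
    Y_new = (U > Theta_new).astype(float)                      # Eq. 3.4
    return F_new, L_new, Theta_new, Y_new
```

Starting from an all-zero state (thresholds are zero at t < 1), every neuron with a positive stimulus pulses in the first call, after which the raised threshold keeps it silent until the threshold has decayed again.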
In summary, pulse coupled neural networks are neural models proposed by mod-
eling the cat’s visual cortex and developed for high-performance biologically inspired
Figure 3.4: The structure of a PCNN neuron
image processing. PCNNs are spatial-temporal-coding models. PCNN neurons pro-
duce temporal series of pulse outputs through iterative computation. The temporal
series of pulse outputs contain important information of input images and can be
utilized for various image processing applications.
3.2 Power Line Detection
The Hough transform is an effective tool for detecting straight lines in images, thus
it is a natural choice for the task of automatic power line detection. In real appli-
cations of straight line detection, an edge detector is often used to remove irrelevant
information and reduce the computational cost prior to the Hough transform being
employed. However, the application of classic edge detectors to the aerial images has
demonstrated that they are sensitive to image noise, due to complex and irregular
ground coverage. In this research, we take advantage of the characteristics of power
lines in aerial images and propose a filter based on a simplified pulse coupled neural
network (PCNN) model. This filter can simultaneously remove the background noise
of power lines as well as generate edge maps. After that, an improved Hough trans-
form is used by performing knowledge-based line clustering in Hough space to refine
the detection results.
3.2.1 Characteristics of Power Lines
Automatic power line detection from aerial imagery is a rather challenging task,
especially when the background is cluttered. There has been very limited investigation
involved in developing algorithms for automatic extraction of power lines from aerial
images because power lines in traditional aerial images are too small to be detected
due to the flight height and resolution of the camera. Some work on the visual control
of an Unmanned Aerial Vehicle (UAV) for power line inspection has been simulated
using a laboratory test rig (Golightly & Jones, 2005). They proposed an automatic
power line detection method based on the Hough transform, but the approach was
just a simulation of straight line detection and was not evaluated on real image data. More
recently, the Radon transform was used to extract line segments of the power lines,
followed by a grouping method to link each segment, and a Kalman filter was finally
applied to connect the segments into an entire line (Yan et al., 2007). Although some
properties of power lines in the aerial image were discussed, the algorithms developed
by Yan et al. focus only on straight line detection; image edges and other linear
features that could be mistaken for power lines were not considered. Although straight
line detection is a common and well studied research area in machine vision, most
of the existing algorithms take bottom-up approaches which just use the intensity of
single pixels. However, the qualitative performance of these algorithms varies widely
across application domains as our notion of what constitutes a line can vary from one
application area to another. Due to the wide variation of line types encountered in
Figure 3.5: Power lines from different perspectives
the aerial images that are not of interest, we require a more top-down approach that
takes advantage of our understanding of lines in this application area.
Based on our observations, power lines in aerial images have the following characteristics:
(1) A power line has uniform brightness, and its color looks different when viewed from
below and from above. Viewed from the ground, a power line is usually dark, whereas
viewed from the sky, a power line is brighter than the background because it is
made of a specific metal and has greater light reflection.
(2) A power line approximates a straight line, although power line sag often exists.
Due to the limited coverage area of a single image, the widths of power lines in the
image tend to be similar. In addition, the lengths of power lines in one image are
similar, and a power line is usually the longest line as it crosses the entire image.
(3) Power lines are approximately parallel to each other. Due to the forward
angle of the imaging sensor and deviation from the centre, power lines in the image are not
completely parallel. However, the intersection of two power lines usually occurs far
outside the image due to the limited image size, and the intersecting angle
of two lines is usually very small. Figure 3.5 (a) illustrates the scenario from the overhead
view and Figure 3.5 (b) shows the case from the forward view with an offset from the centre.
3.2.2 Design of Pulse Coupled Neural Filter
Given that power lines are made of special metal, they have different solar reflectance
compared to other background materials (e.g. grass, soil, and bitumen). This knowl-
edge can be used for preliminary detection of power lines from aerial images. Using a
filter to remove the irrelevant information helps to reduce the false detection
rate as well as the computational cost of the line detection algorithm. Threshold filtering
may be a practical solution. However, it is not robust because filtering by a threshold
is sensitive to image noise and different thresholds may be required due to the changing
light conditions of the captured images. In this research, a pulse coupled neural filter
(PCNF) is developed for preliminary detection of power lines as well as edge map
generation.
One of the key problems of using PCNN is selecting the network parameters. The
relationship between the network parameters and performance in image analysis is still
not clear (Ma et al., 2005; Wang et al., 2010). There are so many parameters in the
standard PCNN model that it is hard to select appropriate parameters for various
image analysis tasks. In addition, the classic PCNN model involves high computational
cost because temporal dependence between iterations is explicitly used in the feeding,
linking and threshold updating components. In this research, a simplified model is
developed that inherits the characteristics of the classic PCNN model and is described by
the following five equations:
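$$F_{ij}(t) = \mathrm{quantized}(I_{ij}) \tag{3.6}$$

$$L_{ij}(t) = \sum_{k,l \in K} (W_L)_{kl} \times Y_{kl}(t-1) \tag{3.7}$$

$$U_{ij}(t) = F_{ij}(t) \cdot \beta \cdot L_{ij}(t) \tag{3.8}$$

$$Y_{ij}(t) = \begin{cases} 1 & U_{ij}(t) > \Theta_{ij}(t) \\ 0 & \text{otherwise} \end{cases} \tag{3.9}$$

$$\Theta_{ij}(t) = \begin{cases} \Theta_{ij}(t-1) - \mathit{step} & \\ V_T \times \Theta_{ij}(t) & \text{if } Y_{ij}(t-1) \neq 0 \end{cases} \tag{3.10}$$

Figure 3.6: Linking weight matrix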
The symbols in the above equations represent the same meanings as in the stan-
dard PCNN model discussed in the previous section. We simplified the feeding input
of the neuron to be just external stimulus from the corresponding pixels while the
stimuli from neighboring neurons were not considered. This simplified model still
keeps the characteristics of classic PCNN in that temporal dependence is implicitly
included as the neuron outputs in the linking part come from the previous iteration.
In this research, original RGB images are transformed to HIS color space and the in-
tensity component I is used as the feeding input. Moreover, the intensity component
is uniformly quantized to 64 levels in order to reduce the intensity variation in image
regions. This is helpful for filtering regions with similar intensities.
The linking input has also been simplified in that only 8 neighbors (i.e. 3 × 3
window) are adopted in the linking weight matrix WL. Each element in WL is the
reciprocal of Euclidean distance between this element and the centre of the window
(Figure 3.6). In this case, neighboring neurons at a closer distance have a greater
impact on the central neuron. For the calculation of the neuron internal status U, a new
linear modulation of the feeding and linking inputs is used to avoid the influence of
zero-valued pixels on the internal status of their neighboring pixels. The linking strength β in
this research is set to 0.2. The pulsed output of neuron Y is binary: Y = 1 if the
neuron pulsed, otherwise Y = 0. Initially, Y is set to be a zero-valued matrix.
Whether a neuron can pulse or not depends on the comparison of its internal status
U with the dynamic threshold Θ. The threshold Θ is initialized to be larger than the
maximum value of the external stimulus and gradually decays. The dynamic threshold Θ
changes during the iteration to control neuron pulsing. If the neuron has
pulsed, a large threshold is given to this neuron by applying a magnitude scale
VT to make sure it will not pulse again for a while. Otherwise, the threshold of this neuron
is decayed by subtracting a step value step.
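The two fixed ingredients of the simplified model, the 64-level quantization of the intensity component and the 3 × 3 linking weight matrix built from reciprocal Euclidean distances, can be sketched as follows. The function names are hypothetical, the intensity is assumed to be scaled to [0, 1], and the centre weight is assumed to be zero because the reciprocal of a zero distance is undefined; the values actually used are those in Figure 3.6.

```python
import numpy as np

def linking_weights():
    """3x3 matrix whose entries are 1 / Euclidean distance to the window centre."""
    offsets = np.arange(-1, 2)
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    dist = np.hypot(dy, dx)
    W = np.zeros((3, 3))
    W[dist > 0] = 1.0 / dist[dist > 0]   # centre weight left at 0 (assumption)
    return W

def quantize_intensity(I, levels=64):
    """Uniformly quantize intensities in [0, 1] to `levels` values in [0, 1]."""
    I = np.asarray(I, dtype=float)
    idx = np.minimum((I * levels).astype(int), levels - 1)  # bin index 0 .. levels-1
    return idx / (levels - 1)
```

With this construction the four edge neighbours receive weight 1 and the four diagonal neighbours receive weight 1/√2, so closer neighbours have the greater influence described above.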
Given that power lines have higher light reflectance and are usually brighter than
the background, they can be roughly detected from the temporal series of PCNN
pulsed outputs. In the early stage of the iteration, neurons corresponding to power lines
pulse because they have a larger external stimulus than most of the background area.
Figure 3.7 shows an aerial image containing power lines and 7 temporal pulse outputs in
different iterations of PCNN. As shown in Figure 3.7, in the first iteration of PCNN,
no neuron pulses because of the high initial threshold. With the progress of PCNN
iteration, neurons corresponding to power lines pulse earlier than other objects in
the image. From the temporal outputs of PCNN, different objects of interest can be
extracted because PCNN tends to group pixels with similar intensities and structures
and also considers spatial relationships among neurons. The temporal information
generated by PCNN is also useful for image segmentation and image noise location,
which is an advantage over other filters.
In this research, the following rules are used to locate noisy pixels and remove
Figure 3.7: Original image and 7 pulse outputs of PCNN
them (Zhang et al., 2005): if pixel (i, j) has pulsed and most of its neighboring neurons
have not, the intensity of this pixel is too large and it can
be considered a noisy pixel. Usually this type of noise pulses the earliest during
PCNN iteration. For dark noise, the same rule can be applied to the inverse image.
Once noisy pixels are located, a median filter is applied to change the intensities of
these noisy pixels.
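The noise rule above can be sketched in NumPy as follows. This is an illustrative sketch, not the implementation used in this research: the function name is hypothetical, "most of its neighboring neurons" is assumed to mean more than half of the 8 neighbours, pixels outside the image are counted as not pulsed, and the median is taken over a 3 × 3 window.

```python
import numpy as np

def remove_bright_noise(image, Y):
    """Replace pixels that pulsed while most of their 8 neighbours did not."""
    image = np.asarray(image, dtype=float)
    Y = np.asarray(Y).astype(int)
    h, w = Y.shape
    shifts = [(a, b) for a in range(3) for b in range(3)]
    Yp = np.pad(Y, 1)                                   # outside counts as "not pulsed"
    pulsed_neighbours = sum(Yp[a:a + h, b:b + w] for a, b in shifts) - Y
    noisy = (Y == 1) & (pulsed_neighbours < 4)          # assumption: "most" = more than 4 of 8
    Ip = np.pad(image, 1, mode="edge")
    windows = np.stack([Ip[a:a + h, b:b + w] for a, b in shifts])
    cleaned = image.copy()
    cleaned[noisy] = np.median(windows, axis=0)[noisy]  # 3x3 median only at noisy pixels
    return cleaned, noisy
```

For dark noise, the same function could be applied to the inverse image, as described above.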
Moreover, edges of the binary pulse outputs can also be detected by using the same
PCNN model. The width of edge can be determined by controlling the transmitting
distance of neuron pulses. In this research, the following algorithm is used to detect
edges in the binary filtered image:
Algorithm 1: Detect edges in binary image using PCNN
Input: binary image Bin
Output: one-pixel width edge set Edge
Step 1: initialize the pulse output Y to be the binary image and save it to Y0:
Y0 = Y = Bin
Step 2: calculate the linking input L using equation 3.7 with the 3 × 3 linking weight
matrix
Step 3: calculate the neuron internal status U using equation 3.8
Step 4: calculate the output of each neuron Y using equation 3.9, with a
threshold larger than the minimum value of U: Θ = min(U) + 0.01; Y =
step(U − Θ)
Step 5: the edge map can be obtained by the logical operation exclusive disjunction
(XOR) on Y0 and Y: Edge = Y0 ⊕ Y
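The steps of Algorithm 1 can be sketched as follows. The algorithm does not state which feeding input is used in Step 3, so this sketch assumes F = 1 for every neuron, which reduces Equation 3.8 to U = β · L; the centre weight of the linking matrix is also assumed to be zero. Both are assumptions made for illustration, and the function name is hypothetical.

```python
import numpy as np

def pcnn_edges(Bin, beta=0.2):
    """One PCNN pass over a binary image, followed by XOR with the input."""
    Y0 = np.asarray(Bin).astype(bool)                 # Step 1: Y0 = Y = Bin
    Y = Y0.astype(float)
    h, w = Y.shape
    r = 1.0 / np.sqrt(2.0)
    W = np.array([[r, 1.0, r],
                  [1.0, 0.0, 1.0],                    # centre weight 0 (assumption)
                  [r, 1.0, r]])
    Yp = np.pad(Y, 1)
    L = sum(W[a, b] * Yp[a:a + h, b:b + w]            # Step 2: Eq. 3.7
            for a in range(3) for b in range(3))
    F = np.ones_like(Y)                               # assumption: constant feeding input
    U = F * beta * L                                  # Step 3: Eq. 3.8
    Theta = U.min() + 0.01                            # Step 4: threshold just above min(U)
    Y_new = U > Theta                                 #         Y = step(U - Theta)
    return np.logical_xor(Y0, Y_new)                  # Step 5: Edge = Y0 XOR Y
```

Under these assumptions the pulse spreads one pixel outwards from each object, so the XOR marks the one-pixel ring of background pixels that touch an object (plus any isolated object pixel that has no pulsed neighbour).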
In summary, our proposed pulse coupled neural filter (PCNF) can be described
by Figure 3.8. The simplified PCNN is used to generate temporal pulse outputs
which contain important information for discriminating image noise, target object
(power line) and image background. However, there is no automatic method to
determine which output contains power lines and which contains only image noise.
According to our experiments, in most cases the output of the third PCNN iteration
is a safe choice because pixels corresponding to power lines have pulsed and most of the
background pixels have not. After that, a morphological filter is applied to the
binary pulse image as post-processing to make the detected object
more continuous. Finally, the same PCNN model is used to generate the edge image
according to Algorithm 1.
Figure 3.9 compares the results of the Canny filter, the Sobel filter and our proposed
pulse coupled neural filter (PCNF) on synthetic images with and without noise. The
aim of the simulation is to detect the three white lines in the images and generate
Figure 3.8: The structure of pulse coupled neural filter
Figure 3.9: Comparison of Canny filter, Sobel filter and PCNF
the edge map. As shown in the figure, the Sobel filter tries to detect all edges in
the image and is very sensitive to image noise. The Canny filter can be made less sensitive
to image noise by tuning the parameter σ (the standard deviation of the Gaussian
filter). The proposed pulse coupled neural filter (PCNF) is more flexible because it
can be used to detect the edges of interest rather than all edges in the image.
Moreover, PCNF is more robust when the image is contaminated with salt-and-pepper
noise (see the second row of Figure 3.9).
3.2.3 Knowledge-based Line Clustering in Hough Space
The Hough transform is used to detect parameterized shapes (e.g. lines, circles)
through mapping each point to a new parameter space in which the location and ori-
entation of certain shapes could be identified (Aggarwal & Karl, 2006). When applied
to detect straight lines in an image, the Hough transform usually maps a line
in Cartesian coordinates to a point in polar coordinates (Figure 3.10) based on
the point-line duality using the following equation:

$$x \cdot \cos(\theta) + y \cdot \sin(\theta) = \rho \tag{3.11}$$

Equivalently, this parametrization maps collinear points into a set of intersecting
sinusoidal curves in the parameter space. The lines in Cartesian coordinates can
be estimated by detecting the points of intersection of these curves (i.e., peaks) in
polar coordinates (Aggarwal & Karl, 2000). These peaks in the parameter space can
be obtained using a voting mechanism. The Hough transform has proven to be
an effective method for line detection. However, it does have some limitations such as
high computational cost and mistaken detection of spurious lines. In order to solve
these problems, Fernandes and Oliveira proposed an improved Hough transform by
introducing a new voting scheme to avoid the brute-force approach of one pixel voting
for all potential lines (Fernandes & Oliveira, 2008). Instead, the approach operates
on clusters of approximately collinear pixels by using an oriented elliptical-Gaussian
kernel that models the uncertainty associated with the best-fitting line with respect
to the corresponding cluster. Figure 3.10 (a) and (b) show their voting procedures
and the 3D visualization of voting maps respectively. The letters A-H indicate the
clustered segments and the peaks that each segment voted for. In this research, we
extend this improved Hough transform for power line detection.
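To make the voting mechanism concrete, the classic brute-force accumulator for Equation 3.11 can be sketched as follows. Note that this is the standard scheme in which every edge pixel votes for all candidate angles, not the kernel-based voting of Fernandes & Oliveira; the function name, the 1° angular resolution and the rounding of ρ to whole pixels are assumptions made for the example.

```python
import numpy as np

def hough_accumulator(edge, n_theta=180):
    """Vote in (rho, theta) space for every edge pixel using Eq. 3.11."""
    edge = np.asarray(edge).astype(bool)
    ys, xs = np.nonzero(edge)
    thetas = np.arange(n_theta) * np.pi / n_theta          # angles in [0, pi)
    diag = int(np.ceil(np.hypot(*edge.shape)))             # bound on |rho|
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    cols = np.arange(n_theta)
    for x, y in zip(xs, ys):
        rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rho + diag, cols] += 1                         # one vote per angle
    return acc, thetas, diag
```

Collinear pixels vote for the same (ρ, θ) cell, so a line appears as a peak in `acc`; the row index of a peak minus `diag` gives ρ and the column index selects θ from `thetas`.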
The Hough transform is an effective tool to detect straight lines, but does not
Figure 3.10: Voting procedures and the 3D visualization of voting maps
intelligently identify power lines. Any linear objects will be detected, such as the edges
of roads and rivers, fences, etc. Although using PCNF can significantly decrease the
influence of other linear edges, problems still exist, especially when a linear object
has a similar color to power lines. In order to discriminate power lines from other
linear objects, we use a k-means algorithm to cluster all detected lines to identify
the lines of interest. The objective of data acquisition in our project is to achieve a
low flying altitude at which a typical 12 mm transmission line will be represented by at
least two pixels. Therefore, each power line is detected as at least two Hough lines in
the edge image. Power lines are almost parallel with very similar angles, and a power
line is usually the longest line as it crosses the entire image, while other detected
lines do not have this regular property. Based on this idea, a clustering scheme is
employed in the Hough transform voting procedure to group the parallel lines and
output the cluster with the largest summation of votes as candidate power lines (as shown
in Algorithm 2). Figure 3.10 (c) and (d) illustrate this clustering scheme and show the
3D visualization of voting maps. Parallel lines are grouped together, and the cluster
with the largest summation of votes indicates that the dominant lines of the image are in
this cluster. Figure 3.11 shows an example of the power line detection results.
Figure 3.11: An example of power line detection results
Algorithm 2: Knowledge-based line clustering in the Hough space
Input: detected Hough line set $Ls_i = (\rho, \theta, votes)_i$ $(i = 1, 2, \cdots, n)$, where $n$ is
the number of detected lines, $\rho$ and $\theta$ are the coordinates of the corresponding point in
Hough parameter space, and $votes$ is the accumulated number of votes of each detected
Hough line.
Output: candidate power lines $CPLs$
Step 1: calculate the line groups $C_j$ $(j = 1, 2, \cdots, k)$ using k-means on the $\theta$ values
of $Ls_i$ $(i = 1, 2, \cdots, n)$, where $k$ is the number of line clusters ($k = 4$ in this
research).
Step 2: calculate the summation of votes in each cluster: $SumVotes_j = \sum_{Ls_i \in C_j} votes_i$
Step 3: find the cluster $C_m$ with the largest value of $SumVotes$, where
$SumVotes_m = \max(SumVotes_j)$ $(j = 1, 2, \cdots, k)$
Step 4: output the lines in cluster $C_m$ as candidate power lines: $CPLs = C_m$
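Algorithm 2 can be sketched with a small one-dimensional k-means as follows. The function names are hypothetical, and the k-means details (evenly spaced initial centres, a fixed number of iterations, no handling of the wrap-around between θ near 0 and θ near π) are assumptions made to keep the example short and deterministic.

```python
import numpy as np

def kmeans_1d(values, k, n_iter=50):
    """Plain 1-D k-means with evenly spaced initial centres (assumption)."""
    values = np.asarray(values, dtype=float)
    centres = np.linspace(values.min(), values.max(), k)
    labels = np.zeros(len(values), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centres[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):                  # leave empty clusters unchanged
                centres[j] = values[labels == j].mean()
    return labels

def candidate_power_lines(lines, k=4):
    """lines: rows of (rho, theta, votes). Returns the rows of the dominant cluster."""
    lines = np.asarray(lines, dtype=float)
    labels = kmeans_1d(lines[:, 1], k)               # Step 1: cluster on theta
    sum_votes = np.array([lines[labels == j, 2].sum() for j in range(k)])  # Step 2
    m = int(np.argmax(sum_votes))                    # Step 3: cluster with most votes
    return lines[labels == m]                        # Step 4: CPLs = C_m
```

For example, three nearly parallel long lines with many votes form the winning cluster, while isolated lines at other angles (fences, road edges) fall into clusters with smaller vote sums and are discarded.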
3.3 Individual Tree Crown Detection and Delineation
The application of object-based approaches to the problem of extracting vegetation
information from images requires accurate delineation of individual tree crowns. Once
individual trees are delineated, not only spectral but also spatial information within
the tree crowns can be used for classification. In this thesis, an automatic method is
developed to detect and delineate tree crowns from multi-spectral imagery. The devel-
oped method employs spectral features as input to a simplified Pulse Coupled Neural
Network (PCNN), followed by post-processing using morphological reconstruction.
3.3.1 Spectral Properties of Vegetation
In the past decades, a variety of approaches have been proposed for individual tree
crown delineation (Erikson & Olofsson, 2005; Culvenor, 2003). According to our
literature review, these approaches may be broadly categorized as either local max-
ima/minima, template matching, region growing, or edge detection approaches (Li
et al., 2008). Although many classic segmentation algorithms are applied in the RGB
colour space, their effectiveness in visually complex environments is somewhat lim-
ited. A classic example is the segmentation of trees with heavy shadows, as shown in
Figure 3.12. In this case, it is very hard to segment the tree crown accurately since
the tree crown and its shadow are very similar in both colour and texture.
One prospective improvement is through the use of spectral features outside the
visible spectrum. Remote sensing of vegetation has been successful thanks to the
unique spectral characteristics of green vegetation, which have low reflectance in red
and high reflectance in near-infrared (NIR) wavelengths. Spectral vegetation indices
have been used intensively to estimate the vegetation density and plant biophysical
parameters from satellite images for many years. However, little work has been done
on single tree delineation utilizing this important knowledge. With the advent of low-
Figure 3.12: Example of tree crown and shadow in a RGB image
price multi-spectral cameras, more attention can be paid to applying this knowledge
to vision based plant information extraction.
Sensors collect and store data about the spectral reflectance of natural features
and objects. Different types of land covers can be identified by using the spectral fea-
tures of certain surface materials. The dominant method for interpreting vegetation
biophysical properties from optical remote sensing data is through spectral vegeta-
tion indices. Vegetation indices are combinations of reflectance measured in two or
more spectral bands, which aim at estimating canopy biophysical properties through
enhancing the spectral contribution of vegetation, while minimizing the contribution
of underlying soil or understory vegetation (Rautiainen, 2005). Plants have a distinctive
spectral signature, characterized by a low reflectance in the visible part of the
solar spectrum, due to chlorophylls absorbing a large amount of radiation in the red band,
and a high reflectance in the near-infrared region. The lack of absorption in
the adjacent near-infrared region results in a strong absorption contrast between the NIR
and red bands (Myneni et al., 1995). Therefore, the band ratio of NIR and red is
used as a spectral feature to discriminate plants from the background.
3.3.2 Initial Tree Crown Segmentation
The developed tree crown segmentation algorithm employs a simplified PCNN that
uses spectral features as input, post-processed using morphological reconstruction.
PCNN itself has been successfully applied to many image segmentation problems
(Kuntimad & Ranganath, 1999; Stewart et al., 2002). A PCNN is powerful in ex-
tracting the fundamentals of an image (e.g. edges, textures, and segments) and thus
it is used to segment tree crowns from imagery. The basic idea of image segmenta-
tion using PCNN is that groups of neurons (pixels) in a similar state tend to pulse
synchronously. The PCNN model used for this task is expressed as follows.
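$$F_{ij}(t) = \frac{\rho_{NIR}}{\rho_{red}} \tag{3.12}$$

$$L_{ij}(t) = \sum_{k,l \in K} (W_L)_{kl} \times Y_{kl}(t-1) \tag{3.13}$$

$$U_{ij}(t) = F_{ij}(t) \times (1 + \beta \cdot L_{ij}(t)) \tag{3.14}$$

$$Y_{ij}(t) = \begin{cases} 1 & U_{ij}(t) > \Theta_{ij}(t) \\ 0 & \text{otherwise} \end{cases} \tag{3.15}$$

$$\Theta_{ij}(t) = \begin{cases} \Theta_{ij}(t-1) - \mathit{step} & \\ V_T \times \Theta_{ij}(t) & \text{if } Y_{ij}(t-1) \neq 0 \end{cases} \tag{3.16}$$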
The symbols in the above equations represent the same meanings as in the PCNN
model discussed in the previous section. The only difference is the feeding part,
where the external stimulus of the neuron is calculated from the feature space using
spectral band ratio. ρNIR and ρred are the spectral reflectance of NIR and red band
respectively.
Due to the strong absorption contrast between NIR and red band in trees, the
corresponding neurons have greater external stimuli and thus pulse more frequently.
By accumulating pulse outputs, a threshold can be applied to segment tree foliage
from background pixels. It should be noted that this ratio takes the form of the
vegetation index, Ratio Vegetation Index (RVI) (Jordan, 1969). Although other veg-
etation indexes could have been used, including the well known Normalized Difference
Vegetation Index (NDVI) (Rouse et al., 1973), RVI was found to better suit the prob-
lem, maximizing the contrast between vegetation and non-vegetation. An example of
this is shown in Figure 3.13, where NDVI and RVI outputs are shown for the same
multi-spectral image.
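The two vegetation indices compared here follow their standard definitions, RVI = ρNIR / ρred and NDVI = (ρNIR − ρred) / (ρNIR + ρred), and can be computed per pixel as in the sketch below. The function names and the small constant `eps`, added only to avoid division by zero, are assumptions of this example.

```python
import numpy as np

def rvi(nir, red, eps=1e-6):
    """Ratio Vegetation Index: NIR reflectance divided by red reflectance."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return nir / (red + eps)          # eps guards against zero red reflectance

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index, bounded to roughly [-1, 1]."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)
```

Because RVI is an unbounded ratio while NDVI is compressed into a fixed range, a pixel with high NIR and low red reflectance stands out more strongly against a neutral background in RVI, which is consistent with the contrast argument made above.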
Figure 3.14 (b) shows an example of the initial tree crown segmentation results
by using PCNN in the spectral feature space. As we can see from the result, trees are
successfully detected but noise and discontinuities still exist requiring post process-
ing to delineate the whole crown. A basic morphological opening operator is applied
to remove noise in the binary image, and morphological reconstruction is used for
hole-filling (Figure 3.14 (c)). In this research, a ‘hole’ in a binary image is defined
to be an area of dark pixels surrounded by lighter pixels. It resembles a “valley” point
(minimum value), because it is located inside an object and is not connected
to the boundary of the enclosing object, whereas the surrounding points are treated
as “peak” points (maximum value) which indicate the locations of objects. The
reconstruction starts from these peak points and spreads out over the whole image to
remove these regional valley points based on the connectivity of pixels. Morphological
reconstruction is a morphological transformation which involves two images and
a structure element (Soille, 2003). One image, the “marker image”, provides the
starting points of the morphological processing, and the other image, the “mask image”,
constrains the transformation procedure. The structure element defines the direction
Figure 3.13: Comparison of two vegetation indexes
in which the reconstruction progresses and neighborhood directions determine the
number of objects and their boundaries. If the size of the neighborhood is too big,
more unrelated objects would be connected, and if the size of the neighborhood is too
small the related objects will remain separated. In the experiments, four-connected
background neighbours were used, which worked well. The marker image $f_m$ is set
to the maximum pixel value except on the border, where the original image value is
kept, as shown in Equation 3.17, and the original image is used as the mask $f$. In
the equation, $f(p)$ is the value of pixel $p$, and $v_{max}$ indicates the maximum value in the
four-connected neighborhoods of pixel $p$.

$$f_m(p) = \begin{cases} f(p) & p \text{ is on the border} \\ v_{max} & \text{otherwise} \end{cases} \tag{3.17}$$
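Hole filling with the marker of Equation 3.17 can be sketched as a reconstruction by erosion: the marker is repeatedly eroded with a four-connected neighbourhood and clipped from below by the mask until it no longer changes. This is an illustrative sketch; the function name is hypothetical, and the global image maximum is used for $v_{max}$, which is an assumption (for a binary image it coincides with the neighbourhood maximum inside the objects).

```python
import numpy as np

def fill_holes(f):
    """Fill holes in image f by morphological reconstruction (4-connectivity)."""
    f = np.asarray(f, dtype=float)
    marker = np.full_like(f, f.max())          # Eq. 3.17: maximum value inside ...
    marker[0, :], marker[-1, :] = f[0, :], f[-1, :]
    marker[:, 0], marker[:, -1] = f[:, 0], f[:, -1]   # ... original values on the border
    while True:
        p = np.pad(marker, 1, mode="edge")
        eroded = np.minimum.reduce([p[1:-1, 1:-1],            # pixel itself
                                    p[:-2, 1:-1], p[2:, 1:-1],  # up, down
                                    p[1:-1, :-2], p[1:-1, 2:]])  # left, right
        new = np.maximum(eroded, f)            # the mask f limits the erosion
        if np.array_equal(new, marker):
            return new
        marker = new
```

Low values spread inwards from the border through the background but cannot pass through objects, so enclosed dark regions keep the high marker value and are filled.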
Figure 3.14: Tree crown segmentation from CIR image
3.3.3 Decomposition of Tree Clusters Using Watershed Algorithm
The major problem of the initial tree crown segmentation algorithm is that
under-segmentation of tree clusters occasionally happens. It is often unable to segment
individual trees, especially when trees grow in a cluster and are in close contact with each
other. Figure 3.15 compares the initial segmentation result with the ground truth
(obtained by manual segmentation). Although the segmentation seems satisfactory
from visual assessment, two adjacent trees have not been separated and one small tree
with a sparse crown has not been detected in the dashed rectangular area. In order to
improve the decomposition of tree clusters, a watershed-based algorithm is used after
the initial tree crown segmentation algorithm.
The watershed algorithm has proven to be powerful and fast in image segmen-
tation (Bleau & Leon, 2000). It is based on the topographic representation of a
gray-scale image: light and dark spots are seen as hills and hollows in a landscape.
The watershed algorithm developed by Fernand Meyer (Meyer, 1994) is used in this
research. This algorithm is based on the Immersion Approach, which is the most
intuitive way to explain watershed segmentation:
imagine the surface being immersed in water, with holes
pierced in local minima; water fills up hollows starting at these holes, and dams are
built to prevent the merging of different catchment basins due to further immersion; this
immersion process will eventually reach a stage when only the boundaries of the dams
(the watershed lines) are visible. The objective of watershed segmentation is to find
all the watershed lines. In principle, watershed segmentation depends on ridges to
perform a proper segmentation. In order to apply the watershed algorithm to a binary
image, we use a distance transform to convert the binary image to a gray-scale image.
The distance is measured from each object pixel (i.e. tree crowns, represented as white
regions in the binary image) to the nearest background pixel (zero-valued pixel). A City Block
Figure 3.15: Comparison of initial tree crown segmentation and the ground truth
Figure 3.16: Tree cluster decomposition using watershed algorithm
distance function is used in this research, in which the distance between pixels
$(x_i, y_i)$ and $(x_j, y_j)$ is defined as
\[ d = |x_i - x_j| + |y_i - y_j| \tag{3.18} \]
Figure 3.16 shows an example of the watershed-based tree cluster decomposition
using colour infrared (CIR) imagery.
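The distance transform described above can be sketched in a few lines. The following is a minimal illustration, not the implementation used in this research: the function name `city_block_distance_transform` and the two-pass scanning scheme are assumptions introduced here, and the subsequent watershed step is not shown.

```python
def city_block_distance_transform(binary):
    """Two-pass city-block distance transform of a binary image.

    binary: list of rows; non-zero = object pixel, 0 = background pixel.
    Returns a grid of the same shape holding, for every pixel, the
    city-block distance (equation 3.18) to the nearest background pixel.
    """
    rows, cols = len(binary), len(binary[0])
    far = rows + cols  # larger than any distance that can occur in the grid
    dist = [[far if binary[r][c] else 0 for c in range(cols)]
            for r in range(rows)]

    # Forward pass: propagate distances from the top and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)

    # Backward pass: propagate distances from the bottom and right neighbours.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


# A 3 x 3 "crown" surrounded by background.
crown = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
distance_map = city_block_distance_transform(crown)
```

In the resulting gray-scale map, crown centres receive the largest values, so the negated map has one hollow per roughly convex crown, which is what a watershed step would then flood.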
3.4 Combining LiDAR Data and Multi-spectral Imagery
for Improved Tree Crown Segmentation
The developed tree crown segmentation algorithm for multi-spectral imagery successfully
discriminates vegetation from non-vegetation. However, the algorithm fails to
discriminate between grass, shrubs and trees because they are often very similar in
both colour and texture. The discrimination is even more difficult when they grow close
to each other. The 3D nature of LiDAR data makes it especially suited to this situation.
Airborne laser scanning has been successfully applied to single tree detection
and modeling in a number of studies (Brandtberga et al., 2003; Koch et al., 2006). However,
the success and quality of the results depend on the point density of the LiDAR data as
well as the size, shape and distribution of trees. Multi-spectral imagery is complementary
to LiDAR data because it provides both spectral and texture information.
Therefore, combining LiDAR data and multi-spectral imagery has great
potential to improve individual tree crown detection and delineation. In this study,
a ground filtering algorithm is used to separate terrain and object points and then a
region level fusion method is used to improve individual tree crown segmentation.
3.4.1 Ground Filtering Using Statistical Analysis
LIDAR systems provide both elevation and intensity data records for each laser return.
In theory, LIDAR intensity is defined as the ratio of the strength of reflected light to
that of emitted light (Chust et al., 2008). Studies on LIDAR have primarily focused on
geometric information rather than radiometric information while the characteristics
of laser intensity are not well understood (Yoon et al., 2008). However, intensity
information is very useful in the classification of LIDAR point cloud data as different
materials often have quite different reflectance. In this study, we take advantage of
the LIDAR intensity to classify points into ground and non-ground points.
In raw LIDAR point cloud data, both bare-ground and non-ground objects gen-
erate backscatter. Ground points need to be identified and eliminated to accurately
identify non-ground objects, such as power lines and trees. An effective method for
automatic segmentation of ground and object points is therefore critical.
Currently, most ground filtering methods are based on the assumption that natural
terrain variations are gradual, rather than abrupt. Therefore, LIDAR elevation data
is often used to calculate elevation differences and slopes based on pixels within a
roving, two-dimensional window or along a scan line in a specified direction (Meng
et al., 2009). However, intensity data is seldom used for point classification. In this
research, an algorithm based on a statistical analysis of the data is developed to seg-
ment object points from ground points. After that, the Hough transform is employed
to discriminate power lines from other objects (i.e. vegetation).
According to probability theory, a probability distribution can be characterized by
its moments (Stricker & Orengo, 1995). If we interpret ground and
objects as distinctly different probability distributions, moments can be used to
characterize them. However, it is not an easy task to perform a rigorous statistical test to
discriminate between them. Fortunately, we do not need to know the exact probability
distributions that represent ground and objects. A function that yields a relative
difference suffices for separation of ground and object points. Following the central
limit theorem, naturally measured samples are commonly assumed to approximately
follow the normal distribution.
Based on this idea, Bartels and Wei assumed that ground points of LIDAR data
follow the normal distribution while other object points may disturb the distribution
(Bartels et al., 2006). They used moments to describe the distribution of the point
cloud data and to separate ground and object points. In this research, we adopt and
extend this idea for identifying the objects of interest in power line corridors. The
normal distribution is uniquely characterized by its first two moments (mean and
variance) (Najim, 2004). However, according to our assumption that objects will not
follow the normal distribution, mean and variance may not be representative measures.
Therefore, higher order moments are used to characterize the distribution of
object points. Skewness is the third moment about the mean. It characterizes the
degree of asymmetry of the distribution around its mean and
is defined below. A skewness of zero (sk = 0) is indicative of a symmetric distribution
such as the normal distribution. Negative skewness indicates dominance of
valleys in the data, while positive skewness indicates dominance of peaks.
\[ sk = \left( \frac{1}{N \delta^{3}} \sum_{i=1}^{N} (s_i - \mu)^{3} \right)^{\frac{1}{3}} \tag{3.19} \]
where N is the total number of points in the point cloud; $s_i$ is the value (e.g., height,
intensity) of the i-th point; µ and δ are the arithmetic mean and standard deviation of all
points, and they are defined as:
\[ \mu = \frac{1}{N} \sum_{i=1}^{N} s_i \tag{3.20} \]
\[ \delta = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (s_i - \mu)^{2}} \tag{3.21} \]
Kurtosis is the fourth moment about the mean. It measures the relative flatness or
peakedness of the distribution about
its mean. The normal distribution has a kurtosis equal to 3. A kurtosis larger than
three indicates a peaked distribution, while a kurtosis smaller than three indicates
a flat distribution.
\[ ku = \left( \frac{1}{N \delta^{4}} \sum_{i=1}^{N} (s_i - \mu)^{4} \right)^{\frac{1}{4}} \tag{3.22} \]
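As a rough illustration of these measures, the sketch below computes the mean and standard deviation of equations 3.20 and 3.21 and the normalized third and fourth central moments. It is an assumption of this sketch that the outer exponents of equations 3.19 and 3.22 are left out, so that the values can be compared directly with the reference values of 0 and 3 quoted for the normal distribution; the function names are illustrative only.

```python
import math


def mean_and_std(values):
    """Arithmetic mean (3.20) and population standard deviation (3.21)."""
    n = len(values)
    mu = sum(values) / n
    delta = math.sqrt(sum((s - mu) ** 2 for s in values) / n)
    return mu, delta


def standardized_moment(values, order):
    """Central moment of the given order divided by delta ** order."""
    mu, delta = mean_and_std(values)
    if delta == 0.0:
        raise ValueError("all values are identical; the ratio is undefined")
    n = len(values)
    return sum((s - mu) ** order for s in values) / (n * delta ** order)


def skewness(values):
    return standardized_moment(values, 3)


def kurtosis(values):
    return standardized_moment(values, 4)


symmetric_sample = [1.0, 2.0, 3.0, 4.0, 5.0]
peaked_sample = [1.0, 1.0, 1.0, 10.0]   # a single high value dominates

sk_symmetric = skewness(symmetric_sample)   # zero for a symmetric sample
sk_peaked = skewness(peaked_sample)         # positive: dominance of peaks
ku_symmetric = kurtosis(symmetric_sample)   # below 3: flatter than normal
```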
We assume that skewness and kurtosis can describe the characteristics of the
distribution of LIDAR object points and employ these two measures to identify critical
Figure 3.17: Change sequence of skewness and kurtosis
values that separate ground from non-ground points. The change between ground
points and object points is observed through the change sequences of skewness and
kurtosis shown in Figure 3.17, which are generated as follows: first, the skewness and
kurtosis of the point cloud data are calculated, and then the points with the largest
height or intensity values are removed. Next, the skewness and kurtosis of the remaining
points are calculated. This procedure iterates until the data is exhausted. Bartels and Wei
only used skewness measures and assumed that the points are objects if skewness is
larger than 0 (Bartels et al., 2006). However, LIDAR data is not balanced, and
different scenes will have different skewness and kurtosis sequences (Bao et al., 2008).
The biggest inflection of the sequence is taken as the position at which ground and
object points are separated from each other. For example, location A on the skewness
curve and location B on the kurtosis curve indicate the separating points in Figure 3.17.
Locations A and B mark the separating line between ground and object points:
object points are on the right side and ground points are on the left side. A
and B represent the number of points which are removed to calculate skewness and
kurtosis. Thus, we can calculate the number of object points from the total number
of points and the locations A and B using the following equation.
\[ \mathit{nObjectPnts} = \mathit{nTotalPnts} - \mathit{nAB} \tag{3.23} \]
In this research, skewness and kurtosis are combined together as a measure to
decide the threshold for discriminating ground and object points from LIDAR data.
The algorithm is described as follows:
Algorithm 3 Separating ground and object points using statistical analysis
Input: LiDAR raw point cloud data
Output: Object points ObjectPnts
Step 1: Load the raw point cloud data and remove any noisy points having
much larger intensity and height values than the surrounding points
Step 2: Calculate skewness and kurtosis change sequence on intensity values as
described above, let them be skList and kuList
Step 3: Find the location of the last local maximum of skList and of kuList, let them be
A and B respectively
Step 4: Calculate the number of object points based on the skewness and kurtosis
sequences following equation 3.23, let them be nA and nB respectively
Step 5: Sort skList and kuList in ascending order, let the sorting results be skListSort
and kuListSort respectively
Step 6: Select the first nA points in skListSort as object points skObjectPnts
Step 7: Select the first nB points in kuListSort as object points kuObjectPnts
Step 8: Calculate the intersection of skObjectPnts and kuObjectPnts as final
object points, ObjectPnts = skObjectPnts ∩ kuObjectPnts
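Steps 2 and 3 of the algorithm can be sketched as follows. This is only an illustration under stated assumptions: one point is removed per iteration, the iteration stops when fewer than `min_points` values remain, the moment ratios omit the outer exponents of equations 3.19 and 3.22, and all function names are hypothetical. Steps 4 to 8 are not reproduced.

```python
import math


def moment_ratio(values, order):
    """Central moment of the given order divided by delta ** order."""
    n = len(values)
    mu = sum(values) / n
    delta = math.sqrt(sum((s - mu) ** 2 for s in values) / n)
    if delta == 0.0:
        return 0.0  # assumption: a constant sample carries no shape information
    return sum((s - mu) ** order for s in values) / (n * delta ** order)


def change_sequences(values, min_points=3):
    """Skewness and kurtosis change sequences (skList, kuList).

    Entry k holds the measures after the k largest values were removed.
    """
    remaining = sorted(values)       # ascending, so the largest value is last
    sk_list, ku_list = [], []
    while len(remaining) >= min_points:
        sk_list.append(moment_ratio(remaining, 3))
        ku_list.append(moment_ratio(remaining, 4))
        remaining.pop()              # remove the current largest value
    return sk_list, ku_list


def last_local_maximum(sequence):
    """Index of the last interior local maximum, or None if there is none."""
    for k in range(len(sequence) - 2, 0, -1):
        if sequence[k - 1] < sequence[k] > sequence[k + 1]:
            return k
    return None


# Nine ground-like intensities plus three much larger object-like values.
intensities = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20, 21, 22]
sk_list, ku_list = change_sequences(intensities)
```

In this toy sample the skewness is positive while the three large values are present and drops to zero once they have been removed, which is the kind of transition the change sequences are meant to expose.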
3.4.2 Region-level Fusion of LiDAR and Georeferenced Multi-
spectral Imagery
The first step towards fusing LiDAR and multi-spectral imagery is referencing. This
is also known as sensor alignment or registration, which establishes a common reference
frame for data from different sensors. If the two sensors are mounted on the same aerial
platform then the navigation system (GPS/IMU) provides position and attitude data
for the aerial camera and the LiDAR system. Since the GPS/IMU units and the two
sensors are physically separated, the success of direct orientation relies on how well the
relative position and attitude of the various system components can be determined.
The data used here has already been georeferenced by the commercial data provider, which
simplifies this step in this application.
Assuming that the georeference accuracy is good enough, LiDAR data can be
considered as an additional image layer of the multi-spectral imagery. After ground
filtering, object points are obtained to refine the tree crown segmentation. The fusion
process is described in Figure 3.18. First, a pair of LiDAR point cloud data and
georeferenced multi-spectral imagery are processed separately. On one side, an initial
segmentation is conducted in spectral feature space using the algorithm described
in the previous section. After that, regions in the initial vegetation segmentation
map are labeled for the following fusion process. On the other side, a ground filtering
algorithm using statistical analysis is conducted to separate terrain and object points.
Then the object points are gridded to create a 2.5D depth image by putting a uniform
fixed-size grid over the points. The lowest Z coordinate (elevation) is kept if multiple
points fall into one grid cell, and the value of a grid cell (i.e. pixel) is set to zero if no
points fall into that cell. The 2.5D depth image is obtained with a grid spacing of
15 centimeters, so each pixel of the depth image has the same size as a pixel of
the multi-spectral image. The 2.5D depth image is then integrated with the labeled
vegetation segmentation map. A simple thresholding process is used in order to
Figure 3.18: Framework of LiDAR and georeferenced multi-spectral imagery fusion for individual tree crown segmentation
remove grass and low vegetation. The region mean height histogram is calculated to
visualize the height difference among regions. It is observed that the mean height of
a region which contains grass and low vegetation points is much lower than that of a
region which contains only trees. Finally, a watershed-based segmentation is employed to
further decompose the tree clusters into individual trees.
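The gridding and region-level thresholding steps can be illustrated with the short sketch below. It is a simplified reading of the description above rather than the code used in this research: the function names, the label convention (0 = non-vegetation) and the threshold value in the example are assumptions, and the point elevations are assumed to be heights above ground.

```python
def grid_lowest_z(points, x_min, y_min, n_rows, n_cols, cell_size=0.15):
    """Rasterize (x, y, z) object points into a 2.5D depth image.

    The lowest z is kept when several points fall into one cell; cells
    that receive no point keep the value zero.
    """
    depth = [[0.0] * n_cols for _ in range(n_rows)]
    seen = [[False] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            if not seen[row][col] or z < depth[row][col]:
                depth[row][col] = z
                seen[row][col] = True
    return depth


def region_mean_heights(labels, depth):
    """Mean depth-image value of every labeled region (label 0 is skipped)."""
    sums, counts = {}, {}
    for r, label_row in enumerate(labels):
        for c, label in enumerate(label_row):
            if label == 0:
                continue
            sums[label] = sums.get(label, 0.0) + depth[r][c]
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def keep_tall_regions(labels, depth, min_mean_height):
    """Labels of regions whose mean height reaches the threshold."""
    means = region_mean_heights(labels, depth)
    return {label for label, h in means.items() if h >= min_mean_height}


# Toy example with 1 m cells: region 1 is a tree, region 2 is grass.
points = [(0.5, 0.5, 3.0), (0.6, 0.4, 2.0), (1.5, 0.5, 0.2)]
depth_image = grid_lowest_z(points, 0.0, 0.0, 1, 3, cell_size=1.0)
segment_labels = [[1, 2, 0]]
tree_regions = keep_tall_regions(segment_labels, depth_image, 1.0)
```

Note that empty cells contribute zeros to a region mean in this sketch, which lowers the mean height of sparsely sampled regions.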
3.5 Summary
In this chapter, an overview of geographic object based image analysis is first given.
After that, the basic theory of pulse coupled neural networks is presented,
followed by the technical details of the developed algorithms for power line detection
and individual tree crown segmentation from aerial imagery. A method of combining
LiDAR data and georeferenced multi-spectral imagery for improving tree crown
segmentation is also presented.
Chapter 4
Visual Feature Extraction and Data
Classification
Classification based on statistical machine learning techniques is one of the most
often used methods of information extraction from remotely sensed images. In this
chapter, a brief review of techniques in the area of remote sensing image classification
is conducted, with special attention to visual features that may be applied to
object-based tree species classification. Novel methods for spectral-texture feature
extraction and classification are developed and applied to object-based tree species
classification using multi-spectral imagery.
4.1 Related Work
4.1.1 Major Steps for Thematic Information Extraction From
Imagery
The major steps of image classification for thematic information extraction may in-
clude definition of the classification problem, selection of remotely sensed data and
training samples, visual feature extraction and selection of appropriate classification
approaches, and accuracy assessment (Jenson, 2005; Lu & Weng, 2007).
(1) Definition of the classification problem
Stating the nature of the classification problem is a prerequisite for any successful
remote sensing image classification system. The analyst first specifies the geographic
region of interest on which to test hypotheses. The classes of interest to be examined
are then carefully defined in a classification scheme.
(2) Selection of remotely sensed data and training samples
Understanding the strengths and weaknesses of different types of sensor data is
essential for the selection of remotely sensed data. The user's needs, the scale and
characteristics of the study area, and the availability of various image data and their
characteristics (spatial and spectral resolution) should all be considered. A sufficient
number of training samples
and their representativeness are critical for image classification. Training samples
are usually collected from fieldwork, high resolution images or personal experience.
(3) Extraction and selection of visual features
Selecting suitable variables is a critical step for successfully implementing an image
classification. Many potential variables may be used in image classification, such as
spectral features, object shapes or textures, contextual information, multi-temporal
images and multi-sensor images. The use of too many variables in a classification
procedure may decrease classification accuracy. Therefore, it is important to select
only the most useful variables for classification.
(4) Design of classification algorithms
Various classification algorithms may be used to assign an unknown pixel to one
of several possible classes. The choice of a particular classifier or decision rule depends
on the nature of input data and the desired output. Different classification results
may be obtained depending on the classifiers selected.
(5) Evaluation of classification performance
Evaluation of classification results is an important process in the classification
Figure 4.1: Tree crown shapes from triple views
procedure. Different approaches may be employed, ranging from a qualitative eval-
uation based on expert knowledge to a quantitative accuracy assessment based on
sampling strategies. In the process of accuracy assessment, it is commonly assumed
that the difference between an image classification and the reference data is due to
the classification error.
4.1.2 Visual Feature Extraction
The use of appropriate features to characterize an output class or object is fundamental
for any classification problem. Generally, image features can be categorized into
spectral features, shape features and texture features. Shape features are significant
because they are very close to human perception (Loncaric, 1998). However,
due to the limitations of image segmentation and view angle variations, trees present
different shapes (Figure 4.1). Therefore, this section only reviews spectral and texture
features which may be used for tree species classification.
• Spectral Features
Spectral features, known as colour features in the visible spectrum, have been widely
used for many image analysis applications. In the visible spectrum, colour is represented
by a colour space, which is an abstract mathematical model describing colours, typically
as three or four values or colour components (e.g. RGB and CMYK are colour models).
In computer vision, the HSV model is also commonly used, where HSV stands for hue,
saturation, and value (or intensity); it is also often called HSB (B for brightness).
HSV is a cylindrical-coordinate representation of points in an RGB colour model, which
rearranges the geometry of RGB
in an attempt to be more perceptually relevant than the Cartesian representation.
Because the RGB components of an object's colour in a digital image are all correlated
with the amount of light hitting the object, and therefore with each other, image
descriptions in terms of RGB components make object discrimination difficult.
Descriptions in terms of hue, saturation and value are often more relevant. They can be
thought of as similar to the neural processing used by human colour vision, but there is
no particular reason to strictly mimic human colour response (Schwarz et al., 1987).
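The difference between the two descriptions can be seen in a small example using the `colorsys` module of the Python standard library. The two colour triples are hypothetical values chosen for illustration: the second is the first under half the illumination.

```python
import colorsys

# The same surface colour under full and under half illumination
# (RGB components scaled to the range [0, 1]).
sunlit = (0.4, 0.8, 0.2)
shaded = (0.2, 0.4, 0.1)

hue_sunlit, sat_sunlit, val_sunlit = colorsys.rgb_to_hsv(*sunlit)
hue_shaded, sat_shaded, val_shaded = colorsys.rgb_to_hsv(*shaded)

# All three RGB components change with the illumination, whereas in HSV
# only the value component changes; hue and saturation stay the same.
```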
In remote sensing, spectral features usually indicate the spectral reflectance of
different land covers. Sensors collect and store data on the spectral reflectance of
natural features and objects. This radiation can be quantified on the electromagnetic
spectrum. The electromagnetic spectrum is a continuum of electromagnetic
energy arranged according to its frequency and wavelength (Figure 4.2; source:
www.csc.noaa.gov). By using
the spectral features of certain surface materials, many land covers can be identified.
The dominant method for interpreting vegetation biophysical properties from
optical remote sensing data is through spectral vegetation indices. Vegetation indices
are combinations of reflectance measured in two or more spectral bands and are
used to retrieve various biophysical variables. They aim at estimating canopy
biophysical properties by enhancing the spectral contribution of vegetation while
minimizing the contribution of underlying soil or understory vegetation (Rautiainen,
Figure 4.2: Electromagnetic spectrum
2005). Vegetation indices are popular because they require very little expertise of the
physical principles of remote sensing or modeling and are computationally efficient.
The Normalized Difference Vegetation Index (NDVI) (Rouse et al., 1973) was one of the
most successful of many attempts to simply and quickly identify vegetated areas and
their condition, and it remains the most well-known and widely used index to detect live
green tree canopies in multi-spectral remote sensing data. Besides NDVI, the Soil Adjusted
Vegetation Index (SAVI) (Huete, 1988) and the 2-band Enhanced Vegetation Index
(EVI2) (Jiang et al., 2008) are also evaluated in this research. They are defined as:
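\[ NDVI = \frac{\rho_{NIR} - \rho_{red}}{\rho_{NIR} + \rho_{red}} \tag{4.1} \]
\[ SAVI = \frac{\rho_{NIR} - \rho_{red}}{\rho_{NIR} + \rho_{red} + L} (1 + L) \tag{4.2} \]
\[ EVI2 = 2.5 \left( \frac{\rho_{NIR} - \rho_{red}}{\rho_{NIR} + 2.4 \rho_{red} + 1} \right) \tag{4.3} \]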
where $\rho_{NIR}$ and $\rho_{red}$ are the spectral reflectance of the near-infrared and red
bands respectively. L in SAVI is the soil adjustment factor, which is in the range [0, 1] and
typically set to 0.5.
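The three indices translate directly into code. The sketch below is illustrative only; the function names and the example reflectance values are assumptions, and inputs are taken to be surface reflectances in the range [0, 1].

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index, equation 4.1."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index, equation 4.2 (soil_factor is L)."""
    return (nir - red) / (nir + red + soil_factor) * (1.0 + soil_factor)


def evi2(nir, red):
    """2-band Enhanced Vegetation Index, equation 4.3."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


# Hypothetical reflectances of a green canopy pixel and a bare soil pixel.
canopy = {"nir": 0.5, "red": 0.1}
soil = {"nir": 0.3, "red": 0.25}

canopy_indices = (ndvi(**canopy), savi(**canopy), evi2(**canopy))
soil_indices = (ndvi(**soil), savi(**soil), evi2(**soil))
```

With these example values, all three indices are clearly higher for the canopy pixel than for the soil pixel, which is the contrast that vegetation detection relies on.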
• Texture Features
Texture contains important information in image classification, as it represents the
content of many real-world images. Textures are characteristic intensity (or colour)
variations that typically originate from roughness of object surfaces (Davies, 2009).
Texture has always been a primary visual cue for defining area and relates to the
visual perception of coarseness or smoothness of image features. When defined in a
quantitative sense, texture is the property that relates to the nature of the variability
of pixel values (Coburn & Roberts, 2004). Image texture analysis is particularly
recommended for classification of digital images where the objects in the image (e.g.,
trees) are larger than pixel size, which is just the case in high resolution aerial images.
There are many different methods used to model texture features from images.
These approaches can be categorized as statistical, structural, signal processing based
and model-based features (Xie & Mirmehdi, 2009). Statistical texture features mea-
sure the spatial distribution of pixel values. Numerous statistical texture features have
been proposed, including the best known grey-level co-occurrence matrix (GLCM)
(Haralick et al., 1973) and local binary patterns (LBP) (Ojala et al., 2002). Structural
approaches to texture analysis are based on the theory that textures are composed
of repeating elements called primitives. Structural methods are limited because of the
amount of information that is required to adequately characterize texture. Signal
processing based texture features are commonly extracted by applying filter banks
to the image and computing the energy of the filter responses. These features can be
derived from the spatial domain, the frequency domain and the joint
spatial/spatial-frequency domain. Among the well-known signal processing based texture
features are those derived from Gabor filters, which have been successfully used in many
texture analysis applications (Turner, 1986; Manjunath & Ma, 1996). Model based texture feature
extraction approaches attempt to find stochastic processes that are able to model
texture. Model-based methods include, among many others, fractal models (Man-
delbrot, 1983), random field models (Li, 2009) and auto-regressive models (Comer &
Delp, 1999). These techniques have had success in analyzing micro-textures but they
are not useful when little is known about the texture, or more than one texture exists.
Gabor filters are used to model the spatial summation properties of simple cells
in the visual cortex and have been adapted and widely used in texture analysis.
Gabor filters are considered as a set of orientation and scale tunable edge and line
detectors, so the statistics of these features have been successfully applied to characterize
texture information (Manjunath & Ma, 1996; Clausi & Deng, 2005). Gabor filters are
used in this research as they provide both a local and global description of texture
information. A Gabor filter can be decomposed into two components: a real part as the
symmetric component and an imaginary part as the asymmetric component. The 2D
Gabor function can be mathematically formulated as:
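\[ g_m(x, y) = a^{-m} g(x', y'), \quad a > 1 \tag{4.4} \]
\[ g(x, y) = \frac{1}{2\pi\delta_x\delta_y} \exp\left[ -\frac{1}{2}\left( \frac{x^2}{\delta_x^2} + \frac{y^2}{\delta_y^2} \right) + 2\pi j u_0 x \right] \tag{4.5} \]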
where $x' = a^{-m}(x\cos\theta + y\sin\theta)$, $y' = a^{-m}(-x\sin\theta + y\cos\theta)$, $\theta = 2\pi/K$, K is the number
of orientations, $a^{-m}$ is the scale factor, $\delta_x$ and $\delta_y$ define the Gaussian envelope along
the x and y directions respectively, $u_0$ denotes the radial frequency of the Gabor
function, and $j = \sqrt{-1}$. Figure 4.3 shows the filter expressed in intensity levels for
four different orientations.
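Equations 4.4 and 4.5 can be evaluated point by point as in the sketch below. It is a minimal illustration with hypothetical function and parameter names, and it assumes the usual rotation of the coordinate frame, x' = a^-m (x cos θ + y sin θ) and y' = a^-m (-x sin θ + y cos θ).

```python
import cmath
import math


def gabor(x, y, delta_x, delta_y, u0):
    """Complex 2D Gabor function g(x, y) of equation 4.5."""
    envelope = -0.5 * (x ** 2 / delta_x ** 2 + y ** 2 / delta_y ** 2)
    carrier = 2.0 * math.pi * u0 * x
    return cmath.exp(complex(envelope, carrier)) / (2.0 * math.pi * delta_x * delta_y)


def gabor_scaled_rotated(x, y, m, theta, a, delta_x, delta_y, u0):
    """Scaled and rotated Gabor function g_m(x, y) of equation 4.4 (a > 1)."""
    scale = a ** (-m)
    x_rot = scale * (x * math.cos(theta) + y * math.sin(theta))
    y_rot = scale * (-x * math.sin(theta) + y * math.cos(theta))
    return scale * gabor(x_rot, y_rot, delta_x, delta_y, u0)


# The real part is symmetric and the imaginary part is antisymmetric in x.
right = gabor(1.5, 0.0, delta_x=2.0, delta_y=2.0, u0=0.1)
left = gabor(-1.5, 0.0, delta_x=2.0, delta_y=2.0, u0=0.1)
```

Sampling `gabor_scaled_rotated` on a pixel grid for several values of `m` and `theta` gives one filter kernel per scale and orientation; convolving an image with these kernels is the filter-bank step referred to in the text.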
The local binary pattern (LBP) operator was first proposed by Ojala et al. to encode the
pixel-wise information in texture images (Ojala et al., 2002). The LBP method
attempts to decompose the texture into small texture units and the texture features
are defined by the distribution (histogram) of the LBP code calculated for each pixel
in the region under analysis. The LBP value for the centre pixel is calculated using
Figure 4.3: Gabor filter response to θ = 0°, 60°, 120°, 180°
the following equation:
\[ LBP_{P,R} = \sum_{i=0}^{P-1} u(t_i - t_c) \cdot 2^{i} \tag{4.6} \]
where P is the total number of neighbouring pixels, R is the radius used to form the
circularly symmetric set of neighbours, $t_c$ is the value of the centre pixel and $t_i$ is
the value of the i-th neighbour. Figure 4.4 gives an example of the binary code
in a neighbourhood, which generates $2^8$ possible standard texture units. The binary
labels of the neighbouring pixels are obtained by applying a simple threshold operation
with respect to the centre pixel $t_c$. $u(t_i - t_c)$ represents a step function, where u(x) = 1
when x > 0; else, u(x) = 0.
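For P = 8 and R = 1, equation 4.6 can be sketched as follows. The neighbour ordering (clockwise, starting at the top-left pixel) and the function names are assumptions of this illustration; the step function follows the definition given above, with u(x) = 1 only when x > 0.

```python
# (row, column) offsets of the 8 neighbours, clockwise from the top-left.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """LBP code (equation 4.6, P = 8, R = 1) of the pixel at (row, col)."""
    centre = image[row][col]
    code = 0
    for i, (d_row, d_col) in enumerate(NEIGHBOUR_OFFSETS):
        if image[row + d_row][col + d_col] - centre > 0:  # step function u
            code += 2 ** i
    return code


def lbp_histogram(image):
    """Histogram of the LBP codes of all interior pixels of a region."""
    histogram = [0] * 256                 # 2^8 possible texture units
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, row, col)] += 1
    return histogram


patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
# Neighbours 9, 7 and 8 exceed the centre value 6, so bits 1, 3 and 4 are set.
code = lbp_code(patch, 1, 1)
```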
Figure 4.4: Example of binary code calculation in a neighbourhood
Although LBP has proven to be a powerful texture descriptor, a number of extensions
have been proposed to improve or supplement the classic LBP operators. We
also evaluated several extensions to the conventional LBP operator, including uniform
LBP, rotation-invariant LBP, and dominant LBP (DLBP) (Ojala et al., 2002;
Liao et al., 2009). The uniform LBP is used to represent the most important
microstructures, which contain at most two bitwise (0 to 1 or 1 to 0) transitions. The
rotation-invariant LBP is produced by circularly rotating the original LBP code until
its minimum value is attained, making the LBP code invariant with respect to rotation of
the image domain. DLBP only considers the most frequently occurring patterns, and
tries to avoid the information loss caused by considering only the uniform LBP and the
unreliability caused by considering all possible patterns.
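The uniformity test and the rotation-invariant mapping can be written with simple bit operations. The sketch below assumes 8-bit codes and circular (wrap-around) counting of transitions; the function names are illustrative, and DLBP, which depends on pattern frequencies in the training data, is not shown.

```python
def count_transitions(code, bits=8):
    """Number of 0/1 transitions in the circular bit pattern of an LBP code."""
    transitions = 0
    for i in range(bits):
        current_bit = (code >> i) & 1
        next_bit = (code >> ((i + 1) % bits)) & 1
        if current_bit != next_bit:
            transitions += 1
    return transitions


def is_uniform(code, bits=8):
    """A pattern is uniform if it contains at most two bitwise transitions."""
    return count_transitions(code, bits) <= 2


def rotation_invariant_code(code, bits=8):
    """Smallest value reachable by circularly rotating the bit pattern."""
    mask = (1 << bits) - 1
    best = code
    for shift in range(1, bits):
        rotated = ((code >> shift) | (code << (bits - shift))) & mask
        best = min(best, rotated)
    return best


edge_pattern = 0b00001111    # two transitions: uniform
noisy_pattern = 0b00011010   # four transitions: not uniform
```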
4.2 Spectral-Texture Feature Extraction Using PCNN
The use of appropriate features to represent an output class or object is critical for all
classification problems. As discussed in the previous section, due to the perspective
view and the spatial resolution of aerial photographs, the sizes and shapes of tree
crowns can look quite different. This motivates the use of appropriate features to
represent image structures which are invariant to rotation and scale changes. Rota-
tion invariant features have been investigated in image texture classification for a long
period, with many of them generated from filtered images or by converting rotation
variant features to rotation invariant features using a circular neighbor set (Ojala
et al., 2002). The human eye is remarkable in its ability to interpret colour-textured
objects and there are a number of models developed for image feature analysis based
on biological models of the visual cortex (Bhatt et al., 2007; Zhan et al., 2009). To
the best of our knowledge, there has been little research to validate the capabilities of
biologically inspired feature extraction mechanisms in remote sensing image classifica-
tion problems. In this research, a biologically inspired object descriptor is developed
to represent the spectral-texture patterns of image-objects. The feature descriptor is
derived from the pulse spectral frequencies (PSF) of a pulse coupled neural network
(PCNN), which is invariant to rotation, translation and small scale changes.
4.2.1 Multi-spectral Unit-linking PCNN
In this thesis, a simplified spiking cortical model called a unit-linking PCNN (Gu,
2008) is employed to generate image features. We introduce multi-spectral channels
to this PCNN model. Compared with the original unit-linking PCNN, the advantage
of this model is that it has more external inputs, so that both spectral and spatial
information are considered in the derived features. Figure 4.5 illustrates the structure
of this multi-spectral PCNN model. Each neuron corresponds to one pixel in an input
image, receiving its corresponding pixel's information as an external stimulus. Each
neuron is coupled with its 3 × 3 neighboring neurons, receiving local stimuli (i.e.
the outputs) from them. The major reason for using a unit-linking PCNN is that it is
easy to analyse the pulse dynamics and control the performance, due to the simplified
feeding part and unified linking part.
The model can be mathematically represented as:
\[ F^{m}_{ij}(t) = S^{m}_{ij}(t) \tag{4.7} \]
(a) Each neuron is coupled with its 3 × 3 neighboring neurons, receiving local stimuli (i.e. pulse outputs) from its neighboring neurons and also the external stimuli from the corresponding pixel values.
(b) Local stimuli and external stimuli are modulated and input to the pulse generator. The neuron pulses if the modulated input is larger than a dynamic threshold.
Figure 4.5: The structure of the multi-spectral PCNN
\[ L_{ij}(t) = \begin{cases} 1 & \text{if } \sum_{k,l \in N(i,j)} Y_{k,l} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{4.8} \]
\[ U_{ij}(t) = (1 + \beta \cdot L_{ij}(t)) \sum_{m=1}^{M} \left( w_m F^{m}_{ij}(t) \right) \tag{4.9} \]
\[ Y_{ij}(t) = \begin{cases} 1 & \text{if } U_{ij}(t) > \Theta_{ij}(t) \\ 0 & \text{otherwise} \end{cases} \tag{4.10} \]
\[ \frac{d\Theta_{i,j}(t)}{dt} = -\alpha^{T}_{i,j} + V^{T}_{i,j} Y_{i,j}(t-1) \tag{4.11} \]
where t refers to time (the number of iterations); (i, j) indicates the index of the
current neuron (i.e. pixel) and (k, l) ranges over the neighboring field N(i, j) of the
neuron (i.e. a 3 × 3 window); m indicates that the external input comes from the m-th
channel of the image; $Y_{i,j}$ is the pulse output of the neuron (i, j); $U_{i,j}$ is the
internal activity of the neuron; $V^{T}_{i,j}$ is the threshold magnitude scale (greater
than 1); $\Theta_{i,j}$ is the dynamic threshold which controls whether the neuron pulses or
not. In this paper, the firing threshold is a linearly decaying threshold. $\alpha^{T}_{ij}$ is
the maximum value of the input image. The linking
strength coefficient β determines the weight of the linking input to the internal status of
the neuron. The weight factor $w_m$ is the importance of the m-th spectral channel (M is
the total number of channels, $\sum_{m=1}^{M} w_m = 1$).
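One possible discrete-time reading of equations 4.7 to 4.11 is sketched below. It is not the implementation used in this thesis: the update order within an iteration, the exclusion of the centre neuron from its own neighbourhood, the initial threshold `theta0` and all parameter values are assumptions made for illustration.

```python
def run_pcnn(channels, weights, beta=0.2, alpha=0.05, v_theta=2.0,
             theta0=1.0, steps=20):
    """Iterate a multi-spectral unit-linking PCNN on a small image.

    channels: list of M images (lists of rows) with values in [0, 1].
    weights:  list of M channel weights w_m that sum to one.
    Returns the binary pulse image Y(t) of every iteration.
    """
    rows, cols = len(channels[0]), len(channels[0][0])
    # Feeding input: weighted sum of the channel stimuli (4.7 and 4.9).
    feed = [[sum(w * ch[i][j] for w, ch in zip(weights, channels))
             for j in range(cols)] for i in range(rows)]
    theta = [[theta0] * cols for _ in range(rows)]
    pulses = [[0] * cols for _ in range(rows)]   # Y(t - 1)
    history = []
    for _ in range(steps):
        new_pulses = [[0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                # Threshold: linear decay plus a jump after a pulse (4.11).
                theta[i][j] += -alpha + v_theta * pulses[i][j]
                # Unit linking: 1 if any 3 x 3 neighbour pulsed (4.8).
                neighbour_fired = any(
                    pulses[k][l]
                    for k in range(max(0, i - 1), min(rows, i + 2))
                    for l in range(max(0, j - 1), min(cols, j + 2))
                    if (k, l) != (i, j))
                link = 1 if neighbour_fired else 0
                # Internal activity (4.9) and pulse generation (4.10).
                activity = (1.0 + beta * link) * feed[i][j]
                new_pulses[i][j] = 1 if activity > theta[i][j] else 0
        pulses = new_pulses
        history.append(new_pulses)
    return history


# A uniform two-channel patch: all neurons receive the same stimulus.
uniform_channels = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
pulse_history = run_pcnn(uniform_channels, weights=[0.5, 0.5])
```

Because every neuron of the uniform patch receives the same stimulus, all neurons pulse in the same iterations, which illustrates the synchronous firing of similar pixels.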
4.2.2 Properties and Behaviors of multi-spectral PCNN
Unlike most other neural network models, the processing is automatic and there is no
training required in a PCNN. The PCNN algorithm consists of iterating the model
equations until some stopping criterion is reached. Through iterative computation, neurons
produce a temporal series of pulse outputs, which indicates the pulse status of each
Figure 4.6: Periodical pulse of the neuron (i, j)
neuron (pixel). At each iteration, different neurons fire sequentially according to
the internal status of neurons and the firing threshold. Similarities in the input
pixels cause the associated neurons to pulse synchronously, thus indicating similar
structures.
The dynamic properties of the multi-spectral PCNN are very complex. In this
section, the properties and behavior of a single neuron are analyzed under the assumption
that there are no linking connections. Biologically, there is a fatigue period called
the refractory period after a neuron fires. A neuron cannot be captured by other
neurons if it is in the refractory period. Figure 4.6, obtained from equations 4.9 to 4.11,
illustrates the periodical pulse of a neuron (i, j), and its capture and refractory periods.
$U_{i,j\,max}$ is the maximum possible value of the internal activity, $T^{R}_{i,j}$ is the
refractory period, $T^{C}_{i,j}$ is the capture period, and $F_{i,j}$ is the combination of the
external stimuli, $F_{i,j} = \sum_{m=1}^{M} (w_m \times F^{m}_{i,j}(t))$.
When a neuron (i, j) fires, its threshold increases to $V^{T}_{i,j}$, and then linearly
decreases to $U_{i,j\,max}$ to make the neuron fire again. The pulse process continues
periodically with the period $T_{i,j}$. During the refractory period $T^{R}_{ij}$, the neuron
cannot fire no matter whether its neighbors fire or not, because its threshold
$\Theta_{i,j}(t)$ is larger than the maximum internal activity $U_{i,j\,max}$. Only during the
capture time $T^{C}_{ij}$ can the neuron pulse or be excited by other neurons, as the
threshold is lower than the maximum internal activity $U_{i,j\,max}$. There is a period of
transition from the refractory time to the capture time. In this respect, the PCNN mimics
the mechanism of the biological neuron.
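The periodic behaviour of an isolated neuron can be reproduced with a few lines of code. The sketch below assumes a discrete-time version of equation 4.11 without linking input, and uses integer parameter values (chosen arbitrarily for illustration) so that the firing times are exact.

```python
def single_neuron_firing_times(activity, alpha, v_theta, theta0, steps):
    """Firing times of one neuron with constant internal activity.

    In every step the threshold decays by alpha and, if the neuron fired
    in the previous step, jumps up by v_theta. The neuron fires whenever
    its internal activity exceeds the threshold.
    """
    theta = theta0
    fired_previously = 0
    firing_times = []
    for t in range(1, steps + 1):
        theta = theta - alpha + v_theta * fired_previously
        fired_previously = 1 if activity > theta else 0
        if fired_previously:
            firing_times.append(t)
    return firing_times


# Hypothetical values: activity 5, decay 1 per step, jump 10 after a pulse.
times = single_neuron_firing_times(activity=5, alpha=1, v_theta=10,
                                   theta0=10, steps=30)
# The neuron fires at steps 6, 16 and 26, i.e. with a period of
# v_theta / alpha = 10 steps.
```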
4.2.3 Rotational and Scale Invariant Feature Extraction Using
Pulse Spectral Frequency
The spectral time sequence generated from PCNN iterations reflects the image value
distribution pattern and thus it can be used to represent image features. Lindblad
and Kinser found that PCNN and wavelet transforms have many similarities, however
PCNNs are unique in that they generate rotation, translation and scale invariant time
signals (Lindblad & Kinser, 1999). This invariance property is identified by observing
the same period and number of peaks in the output time signals of a PCNN. The
invariance property is not hard to understand if we consider a PCNN as an image
representation that summarizes the firing neurons in the whole image (region). Since
an isotropic neighborhood is adopted, it does not matter which neuron is used to
provide the local stimulus. The output of a PCNN is not inherently invariant to
scale changes because the number of neurons affected by the rescaled patch changes.
However, the output pulse frequency of the rescaled patch is stable and the scale
changes will only be reflected in the outputs of a PCNN by a scale factor. Here
we borrow the analysis method from the literature (Johnson, 1994) to explain the scale
invariance properties of our PCNN model. For simplicity, we consider image patches
rather than single pixels. Assume that an image consists of a certain number of patches
and that the number of patches is independent of scale changes. Each image patch is
considered as a whole with its own intensity. The number of pixels in each patch depends on the
Figure 4.7: Geometry for scale invariance
scale factor. As shown in Figure 4.7, after an image is rescaled, the distances between image
patches change but the intensity per image patch is constant. Suppose a neuron at patch
A receives a linking contribution from a neuron at patch B. After the image is rescaled by a
factor k, the image patch at A moves to kA and the patch at B moves to kB. However, the
feeding input of the neuron does not change (i.e. F(A) = F(kA)). Moreover, in the unit-linking
PCNN model, the linking input is restricted to 0 or 1 and thus does not depend on
the scale factor k. Therefore, the internal activity of the rescaled patch
remains the same as that of the original patch, and the pulse dynamics of each
image patch do not change. The only change is the actual number of pixels in each
image patch. As a result, if we normalize for this difference, the PCNN output is invariant
to scale changes.
In this research, the pulse spectral frequency (PSF) is used for rotation and scale
invariant object feature extraction. PSF is defined as a normalized histogram of
the number of firing pixels at each time step (equation 4.12).
PSF(t) = N(t) / max(N) (4.12)
where N(t) is the number of firing pixels at time t and max(N) is the maximum
number of firing pixels at any time step in the period considered. In order to achieve
scale invariance, we normalize the number of firing pixels to [0, 1] at the discrete time
steps. The dimension of the histogram is equal to the total number of iterations. However,
as stated by Johnson (1994), this scale invariance property may not hold for very small
images and large scale changes, because the local group around a neuron also changes in
scale, which in turn changes the internal activity. The PSF feature is computed
iteratively until the user decides to stop; no automated stopping
mechanism is currently implemented. In theory, the more iterations are run, the
richer the information that can be derived to characterize the image or object. At
the same time, however, more iterations increase the dimensionality of the feature vectors.
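As an illustration of equation 4.12, the following minimal Python sketch computes the PSF from a sequence of binary firing maps, one per PCNN iteration. The function names and the toy firing maps are hypothetical, and the rescaling is idealised as pixel replication, under the assumption (argued above) that the firing pattern of each patch is unchanged by the rescaling.

```python
def pulse_spectral_frequency(firing_maps):
    """Return PSF(t) = N(t) / max(N) for a sequence of binary firing maps."""
    # N(t): number of firing pixels at each iteration t.
    counts = [sum(1 for row in fmap for v in row if v) for fmap in firing_maps]
    peak = max(counts, default=0)
    if peak == 0:
        return [0.0] * len(counts)
    return [n / peak for n in counts]


def upscale(fmap, k):
    """Replicate every pixel into a k x k block (idealised rescaling, an assumption)."""
    return [[v for v in row for _ in range(k)] for row in fmap for _ in range(k)]


# Hypothetical 2 x 2 firing maps for three PCNN iterations.
maps = [
    [[1, 0], [0, 0]],
    [[1, 1], [1, 1]],
    [[0, 1], [1, 0]],
]
psf = pulse_spectral_frequency(maps)
psf_rescaled = pulse_spectral_frequency([upscale(m, 2) for m in maps])
print(psf, psf_rescaled)
```

Under these assumptions the number of firing pixels grows by the same factor at every time step, so the normalized histogram is the same before and after rescaling.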
4.3 Colour and Texture Feature Fusion
4.3.1 Framework
Colour and texture are two fundamental features in describing an image, but prior
research has generally focused on extracting colour and texture features as separate entities
rather than as a unified image descriptor (Whelan & Ghita, 2009). The joint use of colour
and texture information has strong links with human perception, and
this motivates investigating how to fuse colour and texture effectively into a unified
descriptor that discriminates better than colour and texture features treated
independently. Although the motivation for using colour and texture information
jointly in object-based image classification is clear, how best to combine colour and
texture in a unified object descriptor is still an open issue. Huang et al. (2008)
proposed a multiscale spectral and spatial feature fusion method based on
the wavelet transform and evaluated it on very high resolution satellite image classification.
Zhang et al. (2008) extracted texture features using multi-channel Gabor
filters and Markov random fields, and integrated the two features using a
neighbourhood-oscillating tabu search approach for high-resolution image classification. However,
these methods extract features from a fixed window size and do not consider all pixels
Figure 4.8: Framework of object-level colour and texture feature fusion
within an object as a whole. Moreover, combining multiple features imposes a heavy
computational burden and may cause the ‘curse of dimensionality’
problem, which decreases the performance of the classifier.
It is often difficult to classify objects using a single feature descriptor. Therefore,
feature-level fusion plays an important role when multiple features are used in
object classification. Feature fusion has two advantages: 1) the most
discriminatory information from the original feature sets can be derived by the
fusion process; 2) noisy information can be eliminated by exploiting the correlation
between different feature sets. In other words, feature fusion can derive
the most effective and lowest-dimensional feature vectors that benefit the final
classification (Yang et al., 2003). The feature fusion framework is illustrated in Figure
4.8. After object segmentation, colour and texture features are extracted from image-objects.
The different feature vectors are normalized and then serially integrated. After
that, kernel PCA is used to globally extract the nonlinear features from the integrated
feature sets as well as to reduce the dimensionality. The intrinsic dimensionality of
the serially fused features is estimated using a maximum likelihood method and is used as
the target dimensionality of the kernel PCA fused feature. Finally, the features selected
by kernel PCA are used as the input to classifiers for further analysis.
4.3.2 Feature Fusion Based on Kernel PCA
A serial fusion strategy (Yang et al., 2003) is used, in which the different
feature vectors are simply concatenated into one feature union-vector. Since the features
differ in value range, they are normalized into the range [-1,1] by the Gaussian criterion.
Let $F_i = (f_{i1}, \ldots, f_{in})$ be an n-dimensional feature vector, where $f_{ij}$ is the jth
feature component of $F_i$. Assuming that $f_{ij}$ is a Gaussian sequence, we compute its mean
$m_j$ and standard deviation $\sigma_j$. Feature $f_{ij}$ is normalized by
$f'_{ij} = \frac{f_{ij} - m_j}{\sigma_j}$
Suppose $\alpha$ and $\beta$ are two feature vectors extracted from the same image-object.
The integrated feature union-vector is defined by
$\gamma = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$
If feature vector $\alpha$ is m-dimensional and $\beta$ is n-dimensional, then the dimension of
the serially integrated feature vector is (m+n).
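The normalization and serial integration described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this research: the function names and the toy colour and texture values are hypothetical, and the population standard deviation (division by the number of objects) is an assumption.

```python
import math


def gaussian_normalize(vectors):
    """Normalize each feature component j by (f_ij - m_j) / sigma_j."""
    n_objects, n_features = len(vectors), len(vectors[0])
    out = [list(v) for v in vectors]
    for j in range(n_features):
        column = [v[j] for v in vectors]
        mean = sum(column) / n_objects
        # Population standard deviation (an assumption of this sketch).
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / n_objects)
        for i in range(n_objects):
            out[i][j] = (vectors[i][j] - mean) / std if std > 0 else 0.0
    return out


def serial_fuse(alpha, beta):
    """Concatenate an m-dimensional and an n-dimensional vector into (m + n)."""
    return list(alpha) + list(beta)


# Hypothetical features of three image-objects: 2 colour and 1 texture component.
colour = [[0.2, 10.0], [0.4, 30.0], [0.6, 20.0]]
texture = [[5.0], [7.0], [9.0]]
fused = [serial_fuse(a, b)
         for a, b in zip(gaussian_normalize(colour), gaussian_normalize(texture))]
print(len(fused[0]))
```

Note that dividing by one standard deviation centres and rescales each component, but values more than one standard deviation from the mean still fall outside [-1, 1].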
Traditional linear feature selection and extraction methods such as PCA are conducted
in the original input space, and thus cannot handle nonlinear relationships in the
data well (Cao et al., 2007). For example, the principal components of features may
not be linearly related to the input variables, and the features of different categories
may not be separable by a hyperplane. To solve this problem, kernel methods are introduced
to map the original data to a kernel space using a mapping function. Kernel PCA
is one such kernel method; it reformulates traditional linear PCA in a high-dimensional
space using a kernel function. Given M input vectors $x_p$ ($p = 1, \ldots, M$),
kernel PCA first maps the original input vectors $x_p$ into a high-dimensional feature
space $\phi(x_p)$ and then performs linear PCA in $\phi(x_p)$. Performing PCA in the
high-dimensional feature space can capture high-order statistics of the input variables, which
was also the initial motivation for kernel PCA (Xie & Lam, 2006). In PCA, the principal
component of $x_p$ is the product of $x_p$ and the eigenvectors of the covariance matrix of the M
input vectors. However, it is difficult to directly compute both the covariance matrix
of the high-dimensional feature space $\phi(x_p)$ and its corresponding eigenvectors and
eigenvalues in the high-dimensional feature space. Therefore, the kernel trick is employed
to avoid this difficulty, and the principal eigenvectors are computed from the
kernel matrix rather than from the covariance matrix of the high-dimensional features.
Assuming that the kernel matrix is centered, i.e. $\sum_{p=1,q=1}^{M} K(x_p, x_q) = 0$,
introducing the kernel matrix K makes the mapping implicit, without manipulating the
high-dimensional space $\phi(x_p)$ explicitly, provided that Mercer's condition holds. Each
element of K is an inner product in the high-dimensional space (i.e. $K(x_p, x_q) = \phi(x_p) \cdot \phi(x_q)$).
Let C be the covariance matrix of $\phi(x_p)$, let $\mu_i$ and $\beta_i$ be the ith eigenvalue and
eigenvector of C respectively, and let $\lambda_i$ and $\alpha_i$ be the ith eigenvalue and eigenvector
of the kernel matrix K. The relationships between the eigenvectors and eigenvalues of C
and K are
$\mu_i = \frac{\lambda_i}{M}$ (4.13)
$\beta_i = \sum_{j=1}^{M} \alpha_{ji} \times \phi(x_j)$ (4.14)
where $\alpha_{ji}$ is the jth element of $\alpha_i$ ($j = 1, \ldots, M$).
In order to obtain the low-dimensional feature representation, the data are projected
onto the eigenvectors of the covariance matrix C. The low-dimensional data
representation Y is obtained by computing the projections of $\phi(x_p)$ onto the principal
eigenvectors:
$y_{ip} = \sum_{j=1}^{M} \tilde{\alpha}_{ji} \times K(x_j, x_p)$ (4.15)
where $y_{ip}$ is the ith element of $y_p$ and $\tilde{\alpha}_i$ is the normalized $\alpha_i$ ($\tilde{\alpha}_i = \alpha_i / \sqrt{\lambda_i}$).
The mapping performed by kernel PCA relies on the choice of the kernel function.
In this thesis, the Gaussian kernel, which is widely used in many applications, is employed.
The kernel function is defined as:
$K(x_i, x_j) = \exp\left(-\frac{\| x_i - x_j \|^2}{2\delta^2}\right)$ (4.16)
where δ is the shape parameter.
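Equations 4.13 to 4.16 can be illustrated with a short NumPy sketch. It is a minimal sketch under stated assumptions, not the code used in this research: the function names, the random test data and the value of δ are hypothetical, and the kernel matrix is centred explicitly in feature space because the derivation above assumes a centred kernel matrix.

```python
import numpy as np


def gaussian_kernel_matrix(X, delta):
    """K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 delta^2)), as in equation 4.16."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * delta ** 2))


def kernel_pca(X, n_components, delta):
    """Project the M rows of X onto the leading kernel principal components."""
    M = X.shape[0]
    K = gaussian_kernel_matrix(X, delta)
    ones = np.full((M, M), 1.0 / M)
    # Centre the kernel matrix in feature space (assumption made explicit here).
    Kc = K - ones @ K - K @ ones + ones @ K @ ones
    lam, alpha = np.linalg.eigh(Kc)                # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:n_components]   # keep the largest ones
    lam, alpha = lam[order], alpha[:, order]
    alpha_tilde = alpha / np.sqrt(lam)             # normalized eigenvectors
    Y = Kc @ alpha_tilde                           # equation 4.15 for every x_p
    return Y, lam


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))       # 20 hypothetical fused feature vectors
Y, lam = kernel_pca(X, n_components=2, delta=2.0)
print(Y.shape)
```

With this normalization the projected components are uncorrelated and the squared length of the ith component equals the eigenvalue λi, which gives a simple sanity check on the implementation.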
Using kernel PCA to find the underlying structure of, and the correlations between, multiple
feature sets has important benefits. First, the most discriminatory information
from the original feature sets can be derived, and redundant information arising from the
correlation between different feature sets can be eliminated by the fusion process. Second,
the dimension of the feature sets can be reduced, which lowers the computational cost of the
subsequent classification stage. However, finding a criterion
for selecting the optimal features from kernel PCA remains an open problem.
4.3.3 Intrinsic Dimensionality Estimation
In the previous discussion we assumed that the target dimensionality of the low-dimensional
feature representation was known and specified by the user based on
the descending order of the eigenvalues. Ideally, the optimal dimensionality should
be estimated automatically. A possible solution is to estimate the intrinsic dimensionality
of the high-dimensional feature set and use it as the target dimensionality.
Intrinsic dimensionality is the minimum number of variables necessary
to represent all the information in a dataset. In this thesis, a maximum likelihood
estimator (MLE) (Levina and Bickel, 2004) is employed to estimate the intrinsic
dimensionality. MLE is a local intrinsic dimensionality estimator based on the
observation that the intrinsic dimensionality of the data manifold around one data
point can be estimated by measuring the number of data points covered by a hypersphere
with a growing radius. MLE models the data points in the hypersphere as a
Poisson process, in which the estimated intrinsic dimensionality around data point
$x_i$, given its k nearest neighbours, is
$\hat{d}_k(x_i) = \left[ \frac{1}{k-1} \sum_{j=1}^{k-1} \log \frac{T_k(x_i)}{T_j(x_i)} \right]^{-1}$ (4.17)
where $T_k(x_i)$ represents the radius of the smallest hypersphere with centre $x_i$ that
covers k neighbouring data points.
Different numbers of neighbouring data points can be treated as different
scales. It is clear from equation 4.17 that the calculation of the intrinsic dimension $\hat{d}$
depends on the scale parameter k. In this thesis, the intrinsic dimension is obtained by
averaging $\hat{d}_k$ over a scale range $[k_1, k_2]$, using the following equations.
$\hat{d}_k = \frac{1}{M} \sum_{i=1}^{M} \hat{d}_k(x_i)$ (4.18)
$\hat{d} = \frac{\sum_{k=k_1}^{k_2} \hat{d}_k}{k_2 - k_1 + 1}$ (4.19)
where M is the number of input vectors, $\hat{d}_k$ is the estimated intrinsic dimension at
scale k, and $\hat{d}$ is the final estimate.
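A minimal NumPy sketch of equations 4.17 to 4.19 is given below. The function names, the synthetic data and the scale range are hypothetical, the brute-force distance computation is only suitable for small data sets, and the sketch assumes that all data points are distinct (otherwise a zero radius would appear in the logarithm).

```python
import numpy as np


def mle_dimension_at_k(X, k):
    """Average of the local estimates of equation 4.17 over all points (eq. 4.18)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=2))
    dist.sort(axis=1)                     # column 0 is the zero self-distance
    T = dist[:, 1:k + 1]                  # T_1(x_i), ..., T_k(x_i)
    logs = np.log(T[:, -1][:, None] / T[:, :-1])   # log(T_k / T_j), j = 1..k-1
    d_local = 1.0 / logs.mean(axis=1)     # inverse of the mean over k-1 terms
    return d_local.mean()


def mle_intrinsic_dimension(X, k1, k2):
    """Average the per-scale estimates over k = k1, ..., k2 (equation 4.19)."""
    return float(np.mean([mle_dimension_at_k(X, k) for k in range(k1, k2 + 1)]))


# Synthetic check: a 2-D uniform sample embedded in a 5-D feature space.
rng = np.random.default_rng(0)
Z = rng.uniform(size=(400, 2))
X = np.hstack([Z, np.zeros((400, 3))])
d_hat = mle_intrinsic_dimension(X, k1=10, k2=20)
print(round(d_hat, 2))
```

For data of this kind the estimate should lie close to 2 even though the vectors have five components; rounding it to the nearest integer gives a candidate target dimensionality for kernel PCA.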
4.4 Machine Learning Based Classification
Since object-based image classification is adopted, the classification was conducted in
object-feature space. After image segmentation, both spectral and texture features
of each image-object (i.e. tree crown polygon) were extracted and input to a variety
of classification algorithms for analysis. In this research, supervised machine learning
was adopted to classify certain vegetation species. Machine learning techniques are
now widely used in remote sensing data classification. A machine learning algorithm
is one that can learn from experience (observed examples) with respect to some class
of tasks and a performance measure (Mitchell, 1997). In supervised machine learning,
the data are a sample of input-output patterns: for each input, a target output
is given. Given such a sample of input-output pairs, called the training sample, the task
is to find a deterministic function that maps any input to an output and predicts future
input-output observations with as little error as possible (Camastra & Vinciarelli, 2008).
In order to compare and justify the effectiveness of different classification models, three
widely used machine learning techniques were employed as benchmark classifiers in this
research: multilayer perceptron neural networks (MLP), decision tree forests (DTF),
and support vector machines (SVM). The theory behind these machine learning
techniques is outside the scope of this thesis. Only the basic idea of each technique is
presented in this section, with somewhat more discussion of SVM, which is believed to
perform better.
4.4.1 Multilayer Perceptron Neural Networks
Artificial neural networks have been used successfully in pattern recognition for many
years. A multilayer perceptron neural network (MLP) is a feed-forward, fully connected
network. Although an MLP can have an arbitrary number of hidden layers, it has
been proven that one hidden layer is sufficient to guarantee that the MLP has the universal
approximation property (Camastra & Vinciarelli, 2008). Figure 4.9 shows a typical
three-layer MLP with an input layer, a hidden layer and an output layer. In the input
layer, the predictor variable values (x1, · · · , xp) represent the input vector. Besides
the input vector, there is a constant input of 1.0 (called the bias), which is multiplied
by a weight w and added to the sum going to the neuron. The weighted sum (uj)
is fed into an activation function (δ) in the hidden layer, which outputs a value hj.
The outputs from the hidden layer are distributed to the output layer, in which they
are again multiplied by weights to produce a combined value (vk). Afterwards, the
weighted sum (vk) is fed into an activation function (δ) to generate the final output
Figure 4.9: A multilayer perceptron neural network
(yk).
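The forward pass described above can be sketched as follows. This is an illustrative sketch only: the logistic sigmoid as activation function, the bias input on the hidden layer as well as on the input layer, the layer sizes and the random weights are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation function (an assumed choice for this sketch)."""
    return 1.0 / (1.0 + np.exp(-z))


def mlp_forward(x, W_hidden, W_output):
    """Forward pass of a three-layer MLP with one hidden layer."""
    x_b = np.append(x, 1.0)      # input vector plus the constant bias input 1.0
    u = W_hidden @ x_b           # weighted sums u_j of the hidden neurons
    h = sigmoid(u)               # hidden outputs h_j
    h_b = np.append(h, 1.0)      # hidden outputs plus a bias input (assumption)
    v = W_output @ h_b           # weighted sums v_k of the output neurons
    return sigmoid(v)            # final outputs y_k


rng = np.random.default_rng(0)
x = np.array([0.2, 0.5, 0.1, 0.9])       # p = 4 hypothetical predictor values
W_hidden = rng.normal(size=(3, 5))       # 3 hidden neurons, 4 inputs + bias
W_output = rng.normal(size=(2, 4))       # 2 outputs, 3 hidden values + bias
y = mlp_forward(x, W_hidden, W_output)
print(y)
```

Training then amounts to adjusting `W_hidden` and `W_output` so that the outputs match the labeled examples.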
In order to use an MLP to approximate a specific mapping, it is necessary to find
the parameter set (i.e. the weight values) that corresponds to that mapping. This
can be done through a training procedure in which the network adapts its parameters
based on a set of labeled examples. Several issues are involved in designing and
training an MLP, such as deciding the number of neurons in the hidden layer, finding a
globally optimal solution that avoids local minima, converging to an optimal solution
in a reasonable period of time, and validating the neural network to test for overfitting.
4.4.2 Decision Tree Forest
Decision Tree Forests (DTF), also known as Random Forests, are ensembles of tree-type
classifiers. Decision tree forest classifiers have been shown to be highly accurate
and comparable to other ensemble methods (e.g. bagging and boosting) in
terms of accuracy, while being computationally much less intensive (Breiman, 2001). DTF is
a general term for ensemble methods using tree-type classifiers $\{h(x, \Theta_k),\ k = 1, \ldots\}$,
where the $\{\Theta_k\}$ are independent identically distributed random vectors and x is an input
pattern. DTF has two stochastic (randomizing) elements: (1) the selection of data
rows used as input for each tree, and (2) the set of predictor variables considered
as candidates for each node split. A DTF grows a number of independent trees
in parallel, and they do not interact until after all of them have been built. For
classification, each tree in the forest casts a unit vote for the most
popular class at input x, and the output of the classifier is determined by a majority
vote of the trees.
The DTF algorithm used in this research, as implemented
in the DTREG software (Sherrod, 2009), is outlined below:
(1) Take a random sample of N observations from the data set with replacement
(called “bagging”; some rows may not be selected while others may be selected multiple
times). On average, about 2/3 of the rows will be selected. The remaining 1/3 of the
rows are called the “out of bag (OOB)” rows.
(2) Construct a decision tree for each sample selected in step (1). Consider only
a subset of predictor variables as possible splitters for each node and perform a new
random selection for each split. Some predictors will not be considered for each split,
but a predictor excluded from one split may be used for another split in the same
tree.
(3) Repeat steps (1) and (2) a large number of times to construct a forest of trees.
(4) Just like the single-tree model, to “score” a row, run the row through each tree
in the forest and record the predicted value (i.e., terminal node) that the row ends
up in. Use the predicted categories for each tree as “votes” for the best category, and
use the category with the most votes as the predicted category for the row.
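The randomizing elements of steps (1) and (2) and the voting of step (4) can be sketched in Python as follows. This is not the DTREG implementation: the helper names are hypothetical, the tree-growing step itself is omitted, and the class labels are placeholders.

```python
import random
from collections import Counter


def bootstrap_rows(n_rows, rng):
    """Step (1): sample n_rows row indices with replacement; the rest are OOB."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag


def candidate_predictors(n_predictors, n_candidates, rng):
    """Step (2): draw a fresh random subset of predictors for one node split."""
    return rng.sample(range(n_predictors), n_candidates)


def majority_vote(tree_predictions):
    """Step (4): the category with the most votes is the predicted category."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
in_bag, out_of_bag = bootstrap_rows(10000, rng)
oob_fraction = len(out_of_bag) / 10000
split_candidates = candidate_predictors(10, 3, rng)
winner = majority_vote(["class A", "class B", "class A"])
print(oob_fraction, split_candidates, winner)
```

For large N the out-of-bag fraction approaches 1/e (about 0.37), which is the "about 1/3" mentioned in step (1).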
4.4.3 Support Vector Machine
SVM is a machine learning technique that has been used successfully in a variety of
pattern recognition tasks and often outperforms other classification methodologies
(e.g. Artificial Neural Networks) (Mills, 2008). SVM is a supervised non-parametric
statistical learning technique; therefore, no assumption is made about the underlying
Figure 4.10: A linear SVM example
data distribution. The basic idea of the SVM training algorithm is to find a hyperplane
that separates the dataset into a discrete, predefined number of classes. The term
optimal separation hyperplane refers to the decision boundary, obtained in the training
step, that minimizes misclassifications. Learning refers to the iterative
process of finding a classifier with an optimal decision boundary to separate the training
patterns (in a potentially high-dimensional space) and then to separate simulation data
under the same configurations (Zhu & Blumberg, 2002).
Figure 4.10 illustrates the simplest form of SVM: a linear binary classifier.
To separate the training data {x1, x2, · · · , xn} with labels yi ∈ {+1,−1} into the
positive (+1) and negative (-1) classes, SVM tries to find an optimal decision function
(hyperplane) with the maximum margin ε between the closest points of the two
classes. These closest points are called the support vectors (x1 and x2 are examples
of support vectors). The decision function is given in equation 4.20. A sample x
is classified as +1 when f(x) ≥ 0; otherwise, x is
classified as -1.
For data that are not linearly separable in the input space, SVM maps the data
from the initial space to a (usually significantly higher dimensional) Euclidean space
H by computing inner-product kernels K(xi, x). After the mapping, data
that are not linearly separable in the input space can become linearly separable in the H
space. A kernel function typically needs to fulfill Mercer's Theorem in order to be a
valid kernel in SVMs (Scholkopf & Smola, 2001). Various classification methods are
constructed by employing different kernel functions K(xi, x) (e.g., linear, polynomial,
RBF, sigmoid). The radial basis function (RBF) is selected in this research, as it is often
suggested as the first choice and has several advantages over other common kernel
functions (Hsu et al., 2008): 1) unlike the linear kernel, RBF nonlinearly maps samples
into a high-dimensional space, so it can handle the case when the relation between
class labels and attributes is nonlinear; 2) the RBF kernel has fewer hyperparameters than
the polynomial kernel, which makes model selection less complex.
$f(x) = \sum_{i=1}^{N} y_i \alpha_i K(x_i, x) + b$ (4.20)
RBF kernel: $K(x, y) = \exp\left(-\gamma \| x - y \|^2\right)$ (4.21)
where the coefficients $\alpha_i$ ($0 \le \alpha_i \le C$) define the maximal margin hyperplane in the
H space, C is a penalty parameter and γ is the kernel parameter. Once the maximal margin
hyperplane is found, the points that lie closest to it are the support vectors.
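The decision function of equation 4.20 with the RBF kernel of equation 4.21 can be evaluated as in the following sketch. The support vectors, labels, coefficients, bias and γ below are hypothetical values chosen for illustration rather than the result of training, and the sign rule (non-negative f(x) gives class +1) is the usual convention assumed here.

```python
import math


def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2), as in equation 4.21."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def decision_function(x, support_vectors, labels, alphas, b, gamma):
    """f(x) = sum_i y_i * alpha_i * K(x_i, x) + b, as in equation 4.20."""
    return sum(y_i * a_i * rbf_kernel(x_i, x, gamma)
               for x_i, y_i, a_i in zip(support_vectors, labels, alphas)) + b


def classify(x, support_vectors, labels, alphas, b, gamma):
    """Assumed sign rule: +1 when f(x) >= 0, otherwise -1."""
    f = decision_function(x, support_vectors, labels, alphas, b, gamma)
    return 1 if f >= 0 else -1


# Hypothetical model with one support vector per class.
svs = [(0.0, 0.0), (2.0, 2.0)]
ys = [-1, +1]
alphas = [1.0, 1.0]
near_positive = classify((1.9, 2.1), svs, ys, alphas, b=0.0, gamma=0.5)
near_negative = classify((0.1, -0.1), svs, ys, alphas, b=0.0, gamma=0.5)
print(near_positive, near_negative)
```

Only the support vectors enter the sum, so the cost of classifying a new sample grows with the number of support vectors rather than with the size of the training set.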
The control of overfitting is another key attraction of
SVMs. Ideally, an SVM analysis should produce a hyperplane that completely separates
the feature vectors into non-overlapping groups. However, perfect separation
may not be possible, or it may result in a model that does not generalize well to
other data. SVM-based classification is known to strike a good balance between
the accuracy attained on a given finite amount of training patterns and the ability
to generalize to other data (Mountrakis et al., 2011). To allow some flexibility in
separating the categories, SVM models have a penalty parameter, C, that controls the
trade-off between allowing training errors and forcing rigid margins. It creates a soft
margin that permits some misclassifications. A larger value of C increases the cost
of misclassifying points and forces the creation of a model that fits the training
data more closely but may not generalize well. Figure 4.11 illustrates an example of the
trade-off between underfitting and overfitting. The accuracy of an SVM model is largely
dependent on the selection of the model parameters, such as C and γ. The DTREG
software package used in this research provides two methods for finding optimal
parameter values: a grid search and a pattern search (Sherrod, 2009). A grid search tries
parameter values across the specified search range using geometric steps. A pattern
search starts at the center of the search range and makes trial steps in each direction
for each parameter. If the fit of the model improves, the search center moves to the
new point and the process is repeated. If no improvement is found, the step size is
reduced and the search is tried again. The pattern search stops when the search step
size is reduced to a specified tolerance.
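The two search strategies can be sketched generically as follows. This is not the DTREG implementation, whose details are not reproduced here: the function names, the initial step size (a quarter of the range), the halving rule and the toy score function standing in for cross-validated accuracy are all assumptions.

```python
def geometric_grid(low, high, steps):
    """Grid values between low and high separated by a constant ratio."""
    ratio = (high / low) ** (1.0 / (steps - 1))
    return [low * ratio ** i for i in range(steps)]


def pattern_search(score, low, high, tolerance=1e-3):
    """Maximise score(params) inside the box [low, high] by pattern search."""
    centre = [(lo + hi) / 2.0 for lo, hi in zip(low, high)]
    step = [(hi - lo) / 4.0 for lo, hi in zip(low, high)]   # assumed initial step
    best = score(centre)
    while max(step) > tolerance:
        improved = False
        for i in range(len(centre)):
            for direction in (+1, -1):
                trial = list(centre)
                trial[i] = min(max(trial[i] + direction * step[i], low[i]), high[i])
                value = score(trial)
                if value > best:          # move the search centre to the better point
                    centre, best, improved = trial, value, True
        if not improved:                  # no better neighbour: reduce the step size
            step = [s / 2.0 for s in step]
    return centre, best


def toy_score(params):
    """Hypothetical fit measure peaking at log10(C) = 1 and log10(gamma) = -2."""
    log_c, log_gamma = params
    return -((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


grid = geometric_grid(0.1, 1000.0, 5)
best_params, best_value = pattern_search(toy_score, low=[-5.0, -5.0], high=[5.0, 5.0])
print(grid, best_params)
```

Searching in the logarithm of C and γ, as in the toy example, mirrors the geometric steps of the grid search; a pattern search usually needs far fewer model evaluations than a full grid but can stop at a local optimum.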
SVMs were originally designed for binary classification. A multi-class classification
problem must therefore be decomposed into several binary classification problems. There are
currently two popular decomposition strategies: one-against-one and one-against-all. In the
one-against-one method, one SVM is trained for each pair of classes, while in the
one-against-all approach, one SVM is built for each class. Hsu and Lin
showed that the one-against-one strategy is as accurate as the one-against-all strategy,
but the former is more practical as it requires less training time (Hsu & Lin,
2002). Therefore, the one-against-one strategy is adopted in this research for multi-class
tree species classification.
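The one-against-one decomposition can be sketched as follows. The class labels and the nearest-centre pairwise classifiers are hypothetical stand-ins for trained binary SVMs, and combining the pairwise decisions by majority voting is the common convention assumed in this sketch.

```python
from collections import Counter
from itertools import combinations


def one_against_one_pairs(classes):
    """All unordered class pairs: k * (k - 1) / 2 binary problems for k classes."""
    return list(combinations(classes, 2))


def one_against_one_predict(x, classes, binary_classifiers):
    """Each pairwise classifier votes for one class; the most voted class wins."""
    votes = Counter(binary_classifiers[pair](x) for pair in combinations(classes, 2))
    return votes.most_common(1)[0][0]


# Hypothetical 1-D problem: each class is represented by a centre value.
centres = {"A": 0.0, "B": 5.0, "C": 10.0}


def make_pair_classifier(a, b):
    """Stand-in for a trained binary SVM: pick the class with the nearer centre."""
    return lambda x: a if abs(x - centres[a]) <= abs(x - centres[b]) else b


classes = list(centres)
classifiers = {pair: make_pair_classifier(*pair)
               for pair in one_against_one_pairs(classes)}
prediction = one_against_one_predict(4.0, classes, classifiers)
n_pairs_four_classes = len(one_against_one_pairs(range(4)))
print(prediction, n_pairs_four_classes)
```

For k classes, one-against-one trains k(k-1)/2 classifiers compared with k for one-against-all, but each pairwise classifier is trained only on the samples of its two classes.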
Figure 4.11: Tradeoff between underfitting and overfitting
SVMs are particularly appealing in the remote sensing field due to their ability
to handle small training datasets successfully, often producing higher classification
accuracy than traditional methods (Mountrakis et al., 2011). Many statistical
techniques, such as maximum likelihood estimation, assume that the data distribution
is known a priori. In contrast, the major benefit of SVMs is that they are developed
around the principle of Structural Risk Minimization to address issues concerned
with generalization. Under this scheme, SVMs minimize classification error on unseen
data without prior assumptions about the probability distribution of the data.
This is particularly appealing in remote sensing applications, since data acquired from
remotely sensed imagery usually have unknown distributions, and methods that assume
normally distributed data do not necessarily match that reality.
4.5 Summary
This chapter briefly reviews the major steps for thematic information extraction from
remote sensing image classification, with special attention to visual features that may
be applied to object-based tree species classification. Novel methods for spectral-texture
feature extraction, feature fusion and machine learning based classification were
developed and applied to object-based tree species classification using multi-spectral
imagery.
Chapter 5
Experiments and Results
5.1 Object Detection and Segmentation
To evaluate the proposed algorithms for power line detection and individual tree
crown segmentation, experiments were conducted on the collected aerial image data.
Both qualitative and quantitative measures are used in the evaluation.
5.1.1 Power Line Detection
The first experiment was conducted on the collected high-spatial resolution natural
colour UAV images. In the experiment, we compare Hough line detection results on
edge maps generated by the Canny filter and by the proposed pulse coupled neural filter
(PCNF). The results before and after using knowledge-based line clustering in Hough
space were also compared. As shown in Figure 5.1 (a), there are many linear features
in the original image: power lines, road edges, shadows, etc. These linear features
are detected by the Hough transform (see Figure 5.1 (c), shown as red lines). Although
some of these lines can be eliminated by applying knowledge-based post-processing,
lines such as road edges are not removed because they are parallel to the power lines (see
Figure 5.1 (d), shown as green lines). A better choice is to remove the misleading
information before detecting power lines. As shown in Figure 5.1 (e), after using the
proposed PCNF for preliminary detection of power lines and edge map generation,
most irrelevant points are filtered out, though some noise remains. This is because
PCNF groups pixels according to their spatial and spectral
similarity. It reduces the local gray-level differences of images and fills in small local
discontinuities in image regions. Power lines are made of special metal and
have uniform brightness in images, while the background varies in texture and
intensity. Neurons stimulated by power lines receive different spectral stimuli from
those stimulated by the background, and therefore the two groups pulse non-synchronously.
Thus, power lines are discriminated from the background. According to our experiment, the
pulse output of PCNF at the third iteration is a safe choice, because pixels corresponding
to power lines have pulsed and most of the background pixels have not pulsed by that time.
However, automatic selection of the temporal pulse output is required in future work. From
Figure 5.1 (g) and (h), we can see that after using PCNF, power lines are correctly
detected whether or not knowledge-based post-processing is used.
Figure 5.2 shows more results of the experiment. The first row shows the original
images. Rows 2 and 3 are the Hough line detection results on the Canny edge image and
the PCNF edge image without knowledge-based post-processing. Rows 4 and
5 are the results after using knowledge-based line clustering. From the experiment,
it is clear that the proposed pulse coupled neural filter (PCNF) is very useful as a
pre-processing tool. Most noise is filtered out and the power lines are prominent in the
images. After using PCNF, fewer irrelevant lines remain. Applying knowledge-based
post-processing by clustering lines in the Hough space also increases the accuracy of
power line detection. Combining these techniques can significantly increase the
accuracy of power line detection in a complex environment.
Figure 5.1: Comparison of power line detection results
A quantitative evaluation was conducted on a set of 15 images containing 53
power lines located in 9 spans (the ground truth was manually labeled by visual
observation). The algorithm obtained an overall accuracy of 88.68% with a false positive
rate of 7.84% and a false negative rate of 11.76%. Note that a metallic fence line is
also detected (see the left line in Figure 5.1), because it has very similar characteristics
to power lines. In some countries, such as Australia, it is not uncommon for
fence lines to exist near power line corridors, and in many cases they are parallel to the
power lines. Besides the difficulty of discriminating these easily confused linear features,
low spatial resolution and motion blur are the other two major causes of failure
of the power line detection algorithm. In general, the performance of the algorithm
depends on the quality of the image. Figure 5.3 shows some failure cases due to the
low quality of the collected imagery. In Figure 5.3 (a), it is hard even for a human to
discriminate the power lines without noticing the shadow of the power pole. In Figure 5.3
(b), the illumination conditions and the shadows of trees significantly
reduce the visibility of the power lines, thus causing difficulty for computer algorithms.
Compared to image-based approaches, LiDAR is more popular and reliable for
power line survey since it can provide high-density point cloud data and does not rely
Figure 5.2: Power line detection results
Figure 5.3: Failure examples for power line detection
Figure 5.4: Multi-layer powerlines and crossing powerlines
on illumination conditions. The developed knowledge-based line detection algorithm
does not consider the scenario of crossing power lines. Moreover, an image-based
approach cannot detect multi-layer power lines, especially when trees grow
over the power lines (Figure 5.4). LiDAR data provide 3D information and make it
possible to detect multi-layer structured power lines. In addition, LiDAR can more
effectively generate accurate elevation and terrain models, which can also help to
remove terrain points and other similar linear features (e.g. fences). Therefore, to
provide a reliable solution for power line detection, future work will develop effective
algorithms for LiDAR point cloud data processing and 3D modeling of the catenary curve.
5.1.2 Individual Tree Crown Segmentation
The criterion for successful vegetation detection in this study is that an individual
tree is delineated while the contour of its crown is preserved. The
developed tree crown segmentation algorithm was evaluated against two existing image
segmentation algorithms, JSEG (Deng & Manjunath, 2001) and TreeAnalysis
(Erikson, 2003). The JSEG algorithm is a classic region-based image segmentation
method which takes advantage of both colour and texture properties, and it has been
widely used for natural image segmentation in many computer vision applications.
TreeAnalysis is a fuzzy region-growing segmentation algorithm specifically designed
for tree crown delineation. Figure 5.5 shows a comparison of the three segmentation
algorithms together with the ground truth image. The ground truth data were created
by manual segmentation of individual trees. From visual assessment, the developed
algorithm obtained much better results than the other two algorithms. Since spectral,
texture and morphological features were considered in the developed algorithm,
trees were successfully detected and shadows were removed, so the boundaries of
individual tree crowns were delineated more accurately. In contrast, the results of
TreeAnalysis and JSEG showed significant under-segmentation caused by
confusion between tree crowns and their shadows.
In order to produce a quantitative evaluation, an analysis of under-segmentation
and over-segmentation was conducted using a set of four metrics: 1-to-1, 1-to-M,
M-to-1, and missing (Wang et al., 2006). The first criterion 1-to-1 indicates the suc-
cessful mapping of a single tree crown in the real world to a single tree crown by the
segmentation algorithm. The criterion 1-to-M defines a single tree crown that has
been incorrectly segmented into several portions; likewise, M-to-1 describes a cluster
or group of trees that have been segmented as one. Missing indicates that a tree
has been misclassified as ground. The accuracy of any algorithm is then calculated
as the proportion of correct 1-to-1 mappings to the total number of trees present.
In certain instances, it will be necessary to inspect vegetation from the ground to
acquire an accurate truth data set. Of the multi-spectral images captured, a series
of 13 images were selected for processing with a total number of 183 trees. Frames
were removed from the full sequence of images to minimize overlap that would see
some trees processed more than once. Furthermore, the algorithm was only applied
to those areas of the image that contained the power-line corridor. Table 5.1 shows a
quantitative comparison of JSEG, TreeAnalysis and the developed algorithm. Table
5.2 shows the quantitative evaluation results spread to each image. Overall, 178 of
183 trees imaged were detected by the developed algorithm, thus yielding a detection
rate of 97.27%. Although over- and under-segmentation are undesirable, detection is
achieved all the same, whereas trees missed by the algorithm are potential threats
that go unchecked. Sparse foliage and small crowns appeared to be the major
reasons for trees being missed by the algorithm, since the combination of low spatial
resolution and low foliage density produced limited contrast for segmentation. With
regard to correct segmentation, the algorithm was found to achieve an overall accu-
racy of 84.7%, with the contribution of errors stemming from under-segmentation,
over-segmentation and missed trees. Over-segmentation is a less serious problem
than under-segmentation, since a posteriori merging of segments after the final
object-based classification is easier if the segments are correctly classified as the same
species, whereas under-segmentation causes trouble for the classification algorithm if
the trees in one segment are of different species. Of these errors, under-segmentation
is of greatest concern because individual trees are expected to be selected at this stage
for further processing. Even to the naked eye, some instances are hard to detect,
as under-segmentation typically occurs when trees have grown in a tight group and
have overlapping crowns. Clusters that contain grass are further complicated because
the background has similar colours and even texture. Figure 5.6 shows an
example of the failure cases where the algorithm failed to discriminate background
grass from trees. Combining LiDAR elevation data with multi-spectral imagery would
help to solve the under-segmentation problem.
Figure 5.5: Ground truth and segmentation results
Figure 5.6: A failure example of individual tree crown delineation
Table 5.1: Quantitative comparison of three segmentation algorithms
Method         1-to-1  1-to-M  M-to-1  Missing  Accuracy
JSEG           126     11      34      12       68.85%
TreeAnalysis   130     12      32      9        71.04%
Our Algorithm  155     11      12      5        84.7%

Table 5.2: Quantitative analysis of individual tree crown detection and delineation
Image    Truth  1-to-1  1-to-M  M-to-1  Missing  Accuracy
1        21     17      0       2       2        80.95%
2        11     11      0       0       0        100%
3        25     21      2       2       0        84.0%
4        21     20      1       0       0        95.24%
5        11     9       0       2       0        81.82%
6        15     13      1       0       1        86.67%
7        3      3       0       0       0        100%
8        11     10      0       0       1        90.91%
9        11     9       1       0       1        81.82%
10       13     11      2       0       0        84.62%
11       13     9       2       2       0        69.23%
12       22     16      2       4       0        72.73%
13       6      6       0       0       0        100%
Overall  183    155     11      12      5        84.7%
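The accuracy and detection rate quoted in this section follow directly from the four mapping counts. The sketch below is illustrative only: the function name is hypothetical, and it assumes that a tree counts as "detected" whenever it is not in the Missing category, which is how the 178 of 183 figure is read here.

```python
def segmentation_rates(one_to_one, one_to_many, many_to_one, missing):
    """Return (accuracy, detection_rate) in percent from the four mapping counts."""
    total = one_to_one + one_to_many + many_to_one + missing
    # Accuracy: share of trees mapped 1-to-1 by the segmentation.
    accuracy = 100.0 * one_to_one / total
    # Assumption: a tree is detected unless it was missed entirely.
    detection_rate = 100.0 * (total - missing) / total
    return accuracy, detection_rate


# Counts taken from the "Our Algorithm" row of Table 5.1.
accuracy, detection_rate = segmentation_rates(155, 11, 12, 5)
print(f"accuracy = {accuracy:.2f}%, detection rate = {detection_rate:.2f}%")
# prints: accuracy = 84.70%, detection rate = 97.27%
```

The same function reproduces the JSEG and TreeAnalysis accuracies when given their rows of Table 5.1.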
5.1.3 Fusion of LiDAR and Multi-spectral Imagery
Ground filtering is an important step in the LiDAR and multi-spectral imagery fusion
algorithm. Therefore, the first experiment in this section evaluates the ground
filtering algorithm. The experiment is conducted on both LiDAR height data and
intensity data. Figure 5.7 (a) and (b) show 3D views of the height and intensity values of
the LiDAR data. Different colours represent ranges of values: blue indicates the
lowest value range and white indicates the highest value range. From the example data
it is clear that ground points have the lower height, while object points have the lower
intensity. The skewness and kurtosis sequences were calculated from both the intensity
and height values of the LiDAR data, as shown in Figure 5.7 (c) and (d). Object
points are separated from ground points based on the skewness and kurtosis sequences
as described by Algorithm 3 in chapter 3. Figure 5.7 (e) and (f) present the ground
filtering results on height and intensity data, in which object points are shown in blue
while ground points have been removed. From visual assessment, using intensity data
obtains better results than using height data. It is noted in Figure 5.7 (e) that some
small trees have been removed together with ground points and the power lines were
significantly broken. Although more noise exists in the result on intensity data
(Figure 5.7(f)), object points were preserved better. The obtained object points are
then used to further improve individual tree crown detection and delineation together
with multi-spectral imagery.
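To make the idea of statistics-based ground filtering concrete, the following is a minimal sketch of generic skewness balancing on height values. It is not Algorithm 3 from chapter 3: it ignores kurtosis, it handles only the case where objects are the high values, and the function names and sample data are hypothetical.

```python
def skewness(values):
    """Population skewness (third standardized moment); 0.0 for constant data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def skewness_balancing(heights):
    """Split heights into (ground, objects) by peeling off the highest values
    until the remaining distribution is no longer positively skewed."""
    ground = sorted(heights)
    objects = []
    while len(ground) > 2 and skewness(ground) > 0:
        objects.append(ground.pop())  # remove the current maximum
    return ground, objects


# Hypothetical heights: seven ground returns and two elevated object returns.
heights = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0, 2.0, 20.0, 25.0]
ground, objects = skewness_balancing(heights)
```

For intensity data, where the text notes that object points have the lower values, the same loop would have to peel values from the low end instead.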
Ergon’s network is mostly in rural areas, where vegetation detection is relatively
easy because there are fewer land cover types. However, in urban scenes, the
detection and delineation of individual trees become more difficult since more types of
land cover exist (e.g. trees, shrubs, grass, buildings, roads, swimming pools, etc.). In
this case, it is very hard to achieve robust tree crown detection and delineation using
only imagery or only LiDAR data. Combining the imagery with the complementary
height information in LiDAR data is very helpful.
Figure 5.7: Comparison of LiDAR intensity and height data by skewness and kurtosis analysis
Figure 5.8: A CIR image and the corresponding LiDAR point cloud data for an urban area
Figure 5.9: Fusion of LiDAR and multi-spectral imagery for tree crown delineation
Figure 5.8 shows a CIR image and the corresponding LiDAR point cloud data for an urban
area. Figure 5.9 shows the fusion process and the result using the image and LiDAR
data. As shown in Figure 5.9 (a), an initial segmentation was conducted on the
CIR image without any post-processing. The initial segmentation detects trees as
well as other vegetation segments (e.g. grass). Figure 5.9 (b) shows the 2.5D depth
image of LiDAR object points after ground filtering. Each connected region in the
initial segmentation map was labeled, shown in different colours in Figure 5.9 (c). The
2.5D depth image is then integrated with the labeled vegetation segment map. A
simple thresholding process is used to remove grass and low vegetation: the
mean height of each region is calculated and the regions with a mean height of less than
1.5 meters are removed. An overlay of the LiDAR points after the fusion process
on the initial segmentation map is shown in Figure 5.9 (d). From this figure, we
can see that the regions with low mean height, representing grass and other low vegetation,
were separated. Figure 5.9 (e) shows the tree segments after the fusion process.
Afterwards, the watershed algorithm described in chapter 3 was applied to Figure
5.9 (e) to decompose the tree clusters into individual tree crowns. As can be seen from
the final segmentation result, low vegetation regions have been successfully removed.
However, a critical limitation of this fusion process is that it depends on a high
point density in the LiDAR data. For a small tree crown and low point density
LiDAR data, no points or only a few points hit the tree, which may cause the tree to be
removed because of its low region mean height.
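The region-level thresholding step can be sketched as follows. This is an illustrative outline under simplifying assumptions: the labeled segment map and the 2.5D depth image are represented as small nested lists of equal shape, label 0 is taken to mean background, and the function name is hypothetical. Only the 1.5 m threshold comes from the text.

```python
from collections import defaultdict


def filter_low_regions(labels, heights, min_mean_height=1.5):
    """Return the set of region labels whose mean height reaches the threshold.

    labels  -- 2D list of integer region labels (0 = background, ignored)
    heights -- 2D list of heights in metres, same shape as labels
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label_row, height_row in zip(labels, heights):
        for label, height in zip(label_row, height_row):
            if label != 0:
                sums[label] += height
                counts[label] += 1
    # Regions with a mean height below the threshold are dropped.
    return {label for label in sums
            if sums[label] / counts[label] >= min_mean_height}


# Hypothetical data: region 1 is a tree cluster, region 2 is grass.
labels = [[1, 1, 0, 2],
          [1, 1, 0, 2]]
heights = [[6.0, 7.5, 0.0, 0.2],
           [5.5, 8.0, 0.0, 0.4]]
kept = filter_low_regions(labels, heights)
```

The sketch also shows the limitation noted above: a small crown hit by few LiDAR points would have a low mean height over its region and would be removed together with the grass.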
5.2 Feature and Classifier Evaluation
5.2.1 Experiment Setup
In order to compare different feature descriptors and justify the effectiveness of dif-
ferent classification models, a number of experiments were conducted using three
benchmark classifiers: multilayer perceptron neural networks, decision tree forest,
and support vector machines. In this study, the DTREG software implementations
of the three classifiers are used (Sherrod, 2009). The V-fold cross-validation technique was
employed in the experiments, with 10 folds. The
dataset is partitioned into 10 groups using stratification, so
that the distributions of categories of the target variable are approximately the same
in each group. Nine of the 10 partitions are collected into a pseudo-learning
dataset and a classification model is built using this pseudo-learning dataset. The
remaining 10% of the data (1 out of 10 partitions) is held back and used for testing
the built model, and the classification error for that data is computed. After that, a
different set of 9 partitions is collected for training and the remaining 10% is used for testing.
This process is repeated 10 times, so that every row has been used for both training
and testing. The classification accuracies of the 10 testing datasets are averaged to
obtain the overall classification accuracy.
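The partitioning described above can be sketched in a few lines. This is not DTREG's implementation, whose internals are not described here; it is a minimal round-robin stratification with hypothetical function names, using the class sizes reported later for the tree species experiment (64, 30 and 27 trees).

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign each sample index to one of n_folds folds, class by class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin so each fold gets a similar share.
        for position, index in enumerate(indices):
            folds[(offset + position) % n_folds].append(index)
        offset += len(indices)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for held_out in range(len(folds)):
        test = folds[held_out]
        train = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
        yield train, test


labels = ["Euc-Ter"] * 64 + ["Euc-Mel"] * 30 + ["Cor-Tes"] * 27
folds = stratified_folds(labels)
splits = list(train_test_splits(folds))
```

Each sample appears in exactly one test fold and in the training set of the other nine, which is the property the text relies on when averaging the 10 test accuracies.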
The basic design of the benchmark classifiers is described as follows:
1) Multilayer Perceptron Neural Networks (MLP): Three layers are used with one
input layer, one hidden layer and one output layer. The number of neurons in the
hidden layer is automatically optimized in DTREG. A logistic (sigmoid) activation
function is employed in both the hidden layer and output layer. To detect and prevent
overfitting, 20 instances from the training data are removed and used as a
validation set to check for overfitting as model tuning is performed. The error on
this validation set is compared with the error computed using previous parameter values. If
the validation error does not decrease after 10 iterations then the training is
stopped and the parameters which produced the lowest validation error are
selected.
2) Decision Tree Forest (DTF): In the experiments, a maximum number of 200
trees was used when constructing a forest, and each tree can be grown to up to 50
levels (depth). In addition, a node in a tree will not split if it has fewer than 2 rows in
it. When a tree is constructed in a forest, a random subset of the predictor variables
is selected as candidate splitters for each node. In the experiments, the number of
candidates for each node split was set to the square root of the total number of
predictor variables, as suggested by Leo Breiman (Breiman, 2001).
3) Support Vector Machines (SVM): An RBF kernel was used for all SVM models
in the experiments. The one-against-one strategy was used because the classification
task in this study is a multi-class problem. A grid search method was used
to find the optimal SVM model parameters C and γ. The search ranges are
0.1 < C < 50000 and 0.001 < γ < 20.
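As a rough illustration of the SVM setup, the sketch below enumerates the binary problems created by the one-against-one strategy and builds a candidate grid over C and γ. Only the search ranges come from the text; the geometric spacing, the 10 x 10 grid size and the helper name are assumptions, and the range endpoints are included here for simplicity. The class names are the three species labels used in the tree species experiment.

```python
from itertools import combinations, product


def geometric_grid(low, high, points):
    """Return `points` values spaced geometrically from low to high inclusive."""
    ratio = (high / low) ** (1.0 / (points - 1))
    return [low * ratio ** i for i in range(points)]


classes = ["Euc-Ter", "Euc-Mel", "Cor-Tes"]
# One-against-one trains one binary SVM per unordered pair of classes.
pairs = list(combinations(classes, 2))

# Hypothetical 10 x 10 grid; the text gives only the search ranges.
c_values = geometric_grid(0.1, 50000.0, 10)
gamma_values = geometric_grid(0.001, 20.0, 10)
candidates = list(product(c_values, gamma_values))
```

With three classes there are three pairwise classifiers, and each (C, γ) candidate would be scored by cross-validation before the best pair is kept.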
The image objects generated by segmentation are arbitrarily shaped; however, texture
measurements are usually extracted based on the texture property of pixels or
small blocks within a rectangular region. Therefore, the arbitrarily shaped
objects are extended to a rectangular area for texture extraction. This can be achieved
by padding with zeros or the mean value outside the object boundary, or by obtaining the inner
rectangle of the object. Zero padding introduces spurious high frequency components,
which degrade the performance of the texture feature, while the inner
rectangle usually cannot represent the properties of the entire object well. Mean-intensity
padding has shown better performance than the other two approaches (Liu
et al., 2006a) and thus was adopted in this study. First, the minimum bounding
rectangle was obtained from the image segment, and then the area outside the segment
and inside the minimum bounding rectangle was padded using the mean
value of pixels in the region.
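The mean-intensity padding step can be sketched as below for a single-band image. The representation (nested lists plus a boolean mask) and the function name are hypothetical, and "the region" is interpreted here as the pixels inside the segment; the bounding rectangle is assumed to be axis-aligned.

```python
def mean_pad_segment(image, mask):
    """Crop the minimum bounding rectangle of a segment and fill the pixels
    outside the segment with the mean intensity of the pixels inside it.

    image -- 2D list of intensities; mask -- 2D list of booleans (True = segment)
    """
    coords = [(r, c) for r, row in enumerate(mask)
              for c, inside in enumerate(row) if inside]
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    top, bottom, left, right = min(rows), max(rows), min(cols), max(cols)
    mean = sum(image[r][c] for r, c in coords) / len(coords)
    return [[image[r][c] if mask[r][c] else mean
             for c in range(left, right + 1)]
            for r in range(top, bottom + 1)]


# Hypothetical 3 x 4 image with an L-shaped segment of three pixels.
image = [[9, 9, 9, 9],
         [9, 2, 4, 9],
         [9, 9, 6, 9]]
mask = [[False, False, False, False],
        [False, True, True, False],
        [False, False, True, False]]
patch = mean_pad_segment(image, mask)
```

In this example the one rectangle pixel outside the segment is replaced by the segment mean of 4.0 rather than its original value of 9, so no artificial edge is introduced at the object boundary.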
5.2.2 Performance Measure
For a given application, more than one classification method is usually applicable. This motivates
evaluating the performance of these classification methods empirically in the specific
application. That is, given several classification algorithms, how can we say that one has
less error than the others for a given application? Having selected a classification
algorithm to train a classifier, can we state, with enough confidence, the error rate
to expect later when it is used on a new dataset?
In this study, several of the most commonly used metrics are considered for evaluating
different classification algorithms: overall accuracy, precision/recall, F-measure, ROC
analysis, and computational cost. All of these measures except computational cost are
based on the definition of a confusion matrix. An example of a confusion matrix for
binary classification is given in Table 5.3. To support the definitions that follow, we
define the following symbols: TP: True Positive count; FN: False Negative count; FP: False Positive
count; TN: True Negative count.
The overall accuracy is the simplest and most intuitive evaluation measure for
classifiers. It is defined as
\[ \mathrm{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of samples}} = \frac{TP + TN}{P + N} \tag{5.1} \]

Table 5.3: A confusion matrix
Predicted   Actual Category
Category    Positive   Negative
Positive    TP         FP
Negative    FN         TN
            P=TP+FN    N=FP+TN
It is worth noting that the overall accuracy does not distinguish between types
of errors the classifier makes (i.e. False Positive versus False Negative) (Japkowicz,
2006). For example, two classifiers may obtain the same accuracy but they may
behave quite differently on each category. If one classifier obtains 100% accuracy on
one category but only 40% on the other category, while another classifier obtains
70% for each category, it is hard to claim that the first classifier is better. Therefore,
overall accuracy should not be used blindly as the evaluation method for classifiers on
a dataset. Precision and Recall can avoid the problem encountered by Accuracy.
Precision can be seen as a measure of exactness or fidelity, whereas Recall is a measure
of completeness. Their definitions are:
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{5.2} \]
\[ \mathrm{Recall} = \frac{TP}{P} \tag{5.3} \]
The goal in Precision/Recall space is to be in the upper-right-hand corner, which
means that the higher the values of both measures, the better the classifier’s performance. However,
Precision and Recall do not judge how well a classifier decides that a negative example
is, indeed, negative. Receiver Operating Characteristic (ROC) analysis can solve both
the problems of Accuracy and Precision/Recall. ROC analysis plots the False Positive
Rate (FPR) on the x-axis of a graph and the True Positive Rate (TPR) on the y-axis.
TPR is equal to Recall and FPR is defined as FPR = FP/N. A ROC graph depicts
relative trade-offs between true positives (benefits) and false positives (costs), and the
goal in ROC space is to be in the upper-left-hand corner (Davis & Goadrich, 2006).
The (0,1) point of the ROC space is also called a perfect classification. The diagonal
line from the bottom-left to the top-right corner is called the random guess
line, which can be used to judge whether a classification is good or bad. Points
above the random guess line indicate good classification results, while points below
the line are considered bad classification results. The shorter the distance to the
(0,1) point, the better the classification is. Figure 5.10 illustrates the evaluation of
classifiers in ROC space.
Figure 5.10: Illustration of ROC space analysis (source: http://en.wikipedia.org/wiki/Receiver_operating_characteristic)
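The measures defined in this section can be computed directly from the four confusion-matrix counts. The sketch below uses a hypothetical helper name and invented counts for two classifiers that have the same overall accuracy but behave differently per category; the Euclidean distance to the (0,1) point is one straightforward way to express "closeness to perfect classification" in ROC space.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall, FPR and ROC distance from counts."""
    p = tp + fn  # actual positives
    n = fp + tn  # actual negatives
    recall = tp / p  # also the true positive rate
    fpr = fp / n
    return {
        "accuracy": (tp + tn) / (p + n),
        "precision": tp / (tp + fp),
        "recall": recall,
        "fpr": fpr,
        # Euclidean distance to the perfect-classification point (0, 1).
        "roc_distance": math.hypot(fpr, 1.0 - recall),
    }


# Hypothetical counts for two classifiers on 100 positives and 100 negatives.
a = binary_metrics(tp=100, fn=0, fp=60, tn=40)
b = binary_metrics(tp=70, fn=30, fp=30, tn=70)
```

Both classifiers reach 70% overall accuracy, yet classifier `b` lies closer to the (0,1) point in ROC space than classifier `a`, which accuracy alone does not reveal.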
5.2.3 Evaluation of PSF Feature in Rotation and Scale Invariant Texture Classification
The first experiment assessed the developed PSF features in rotation and scale invari-
ant texture classification. A dataset of rotated texture images from the University
of Southern California Signal and Image Processing Institute (USC-SIPI) texture
database 2 is used in the first experiment. The dataset consists of the 13 Brodatz
textures digitized at seven different rotation angles: 0, 30, 60, 90, 120, 150, and 200
degrees (91 images). To test the scale invariance properties, each image was resized
to 4 scales (0.25, 0.5, 0.75, and 1). A total of 364 images were used in the experiment
with 13 textures (i.e. wool, bark, brick, bubbles, grass, leather, pigskin, raffia, sand,
straw, water, weave, and wood).
PSF features were extracted from the gray-scale texture images and then used
as input to build the classifier. Figure 5.11 shows some examples of texture images
and their corresponding PSF features. Figure 5.11 (a), (b) and (c) show the PSF
features of bark texture at different scales and rotation angles. As we can see, the PSF
histograms of the three bark texture images are not exactly the same, but the changing
trends of the PSF histograms over time are approximately the same. However, as
shown in Figure 5.11 (d), (e) and (f), the PSF features of the other 3 textures (i.e.
bubbles, brick and water) are very different from each other. A well-known texture
descriptor, local binary patterns (LBP), is also evaluated for comparison purposes. In
this experiment, the rotation invariant LBP (Ojala et al., 2002) and an SVM classifier
were employed in the classification test. Table 5.4 compares the performance of LBP
and PSF when textures are rotated only and when they have both rotation and scale changes.
From the results we can see that LBP performs slightly better than PSF when the images
are rotated only. However, PSF produced much higher classification accuracy
when the images have both rotation and scale changes. Table 5.5 compares the average
computational costs of PSF and LBP per image. The experiment was conducted on
a desktop PC with a Core Duo 2.66 GHz CPU and 2 GB of memory. Since PSF
involves iterative computation of the neural network, its computational cost is much
higher than that of LBP.
2 USC-SIPI Database: http://sipi.usc.edu/database/database.cgi?volume=textures

Table 5.4: Accuracies of PSF and LBP in texture classification (in percent)
       Rotation Only  Rotation and Scale
PSF    98.9           99.18
LBP    100            94.51

Table 5.5: Average computational costs of PSF and LBP (in seconds)
                    PSF    LBP
Computational cost  0.937  0.136
5.2.4 Evaluation of Features and Classifiers for Tree Species Classification
The second experiment evaluates the PSF feature in individual tree species classifi-
cation using the collected multi-spectral imagery. It should be noted that classifying
all types of species in power line corridors requires significantly more resources than
what is currently available. In this research, we focus on three dominant species in
our test field. We abbreviate the species names to Euc-Ter, Euc-Mel and Cor-Tes.
Through a field survey with a botanist’s participation, 121 trees were selected and
labeled for the experiment with 64 Euc-Ter, 30 Euc-Mel and 27 Cor-Tes trees.
Figure 5.11: Examples of texture images and their PSF features
Since object-based classification is used in this study, individual tree crowns are
first segmented from the image, local features are then extracted from the crown regions,
and finally the classification is conducted in object-feature space. Since the main
aim of this experiment is to evaluate the effectiveness of object feature descriptors,
we assume that the segmentation is perfect, and thus individual tree crowns were manually
segmented from the images during the field survey. For comparison purposes,
some classic colour and texture feature descriptors were also evaluated. These include
GLCM, Gabor filters, LBP and colour histogram features extracted from 4 spectral
bands and also HSV colour space. It is also worth mentioning that plants have
distinctive spectral signatures, which are often modeled by combinations of reflectance
measured in two or more spectral bands. This motivates us to investigate whether
spectral-texture features extracted from spectral vegetation indices could help in vegetation
species classification. In the experiment, three widely used vegetation index
maps are employed: the Normalized Difference Vegetation Index (NDVI), the Soil Adjusted
Vegetation Index (SAVI), and the 2-band Enhanced Vegetation Index (EVI2).
PSF histogram features are generated from both the original spectral bands and the
three vegetation index maps.
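For reference, the three indices can be written as per-pixel functions of the red and near-infrared reflectance. The sketch below uses the commonly published definitions; the soil adjustment factor of 0.5 for SAVI and the EVI2 coefficients are the usual default values and are assumed here, since the exact parameters used in the experiments are not stated. Inputs are assumed to be reflectances in the range 0 to 1.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index; soil_factor is the usual default L = 0.5."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def evi2(nir, red):
    """2-band Enhanced Vegetation Index with its commonly published coefficients."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


# Hypothetical reflectances of a vegetated pixel.
nir, red = 0.5, 0.1
indices = {"NDVI": ndvi(nir, red), "SAVI": savi(nir, red), "EVI2": evi2(nir, red)}
```

Applying these functions to every pixel of the red and NIR bands yields the index maps from which PSF histogram features can then be extracted in the same way as from a spectral band.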
Table 5.6 compares the overall classification accuracies of the PSF feature and
three classic texture descriptors using different classifiers. The results clearly
show that the selection of both feature descriptors and classifiers strongly influences
the classification accuracy. The PSF feature obtains the best overall classification
accuracy with the DTF classifier and ties with LBP for the best accuracy with SVM,
while Gabor filters perform slightly better with MLP. Overall, this supports its use
as an effective feature descriptor for this data. We also evaluated the performance of
PSF features extracted from multiple spectral bands. Table 5.7 summarizes the overall
classification accuracies of these features. Hist-RGBNIR and Hist-HSV refer to
the colour histograms extracted from four spectral bands (R, G, B and NIR) and HSV
colour space; similar names are used for PSF features extracted from four spectral
bands and also HSV colour space; PSF-HSV-VI represents the PSF feature extracted
from both HSV colour space and three vegetation index maps. From the results,
we can see that PSF features show significant improvement over colour histograms.
While the colour histograms characterize the colour distribution of the pattern, they
do not exploit the spatial layout of the colours. It is also noted that PSF-HSV outperforms
the PSF feature calculated from the original spectral bands. Another interesting
result is that when we incorporate the spectral vegetation indices into the PSF-HSV
feature, a significant improvement is achieved with the MLP and SVM classifiers,
although the accuracy drops slightly with DTF. For the machine learning algorithms
tested, SVM is generally found to be robust in obtaining good classification accuracy
for most of the feature vectors.

Table 5.6: Overall classification accuracies of PSF and texture features (in percent)
       GLCM   Gabor  LBP    PSF
MLP    69.42  71.9   66.94  70.25
DTF    56.2   71.07  71.07  77.69
SVM    69.42  69.42  77.69  77.69

Table 5.7: Overall classification accuracies of colour histogram and PSF features in multiple spectral bands (in percent)
       Hist-RGBNIR  Hist-HSV  PSF-RGBNIR  PSF-HSV  PSF-HSV-VI
MLP    71.9         75.21     75.21       80.17    85.12
DTF    71.9         78.51     80.17       80.99    78.51
SVM    76.03        69.42     79.34       81.82    85.95
Figure 5.12 presents the analysis results of different feature descriptors using an
SVM classifier in ROC space. As we can see, most features generally perform
better for class ‘Cor-Tes’ than for the other two classes. PSF-HSV-VI performs
the best for classes ‘Euc-Ter’ and ‘Cor-Tes’ and PSF-HSV performs the best for
‘Euc-Mel’. Overall the PSF features outperform other colour and texture descriptors
for all three classes. We attribute the success of the PSF feature to its capability of
capturing the local structure of the image and its property of rotation and scale
invariance. These properties make PSF especially useful in object classification from
aerial images because the same object type has different shapes when viewed from
different heights and directions. Moreover, PSF can be easily extended to represent
spectral-texture patterns by integrating the PSF histograms extracted from the
pulse images of multiple spectral bands.
Figure 5.12: Analysis of different feature descriptors in ROC space
5.2.5 Evaluation of Colour and Texture Feature Fusion
The third experiment evaluates the performance of the Kernel PCA based colour and
texture feature fusion scheme in object-based tree species classification. Two classic
colour and texture features, colour histogram and LBP, are selected in the experiment
because they are of high dimensionality. The overall classification accuracy of
the fused colour-texture feature was compared with those of the single feature vectors and the serially
integrated feature vector using the same classifier. SVM was used in the classification
test. For comparison purposes, another widely used nonlinear feature selection
technique, Generalized Discriminant Analysis (GDA), is also evaluated in the experiment.
GDA is also known as Kernel Linear Discriminant Analysis (Kernel LDA); it
is the reformulation of LDA in the high dimensional space constructed using a kernel
function (Baudat & Anouar, 2000). A Gaussian kernel function is used to construct
GDA for the fusion of colour-texture features.
Figure 5.13: Classification accuracies of the fused features at different dimensions
From the experiment, the overall classification accuracies of the colour histogram and
LBP texture features are 76.03% and 71.07% respectively. The serial integration of
these two features performs better than either single feature, with an overall accuracy
of 83.47%. To evaluate the performance of the feature fused using kernel PCA,
we use a step-by-step model justification method (Song & Tao, 2010). We vary
the dimensionality from 2 to 8 with step 2, and from 10 to 100 with step 10 for the
fused feature vectors. Figure 5.13 shows the classification accuracy curve at different
dimensions. As we can see from the figure, the kernel PCA fused feature performs
much better than the single features and the serially integrated feature. However, this still relies
on the assumption that the user can specify a good target dimensionality. The estimation
of intrinsic dimensionality using MLE is therefore employed for the automatic selection of
the optimal number of dimensions. From our experiment, the intrinsic dimensionality of
the integrated LBP and colour histogram feature is 39.3016, which agrees with the
result from Figure 5.13, where the best accuracy (95.04%) is obtained in dimension
40.
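The dimensionality sweep and the automatic choice of the target dimension can be sketched as follows. The rounding rule is an assumption: rounding the estimate up (rounding to the nearest integer gives the same result here) reproduces the dimension of 40 quoted above. The function name is hypothetical, and the MLE estimator itself is not reproduced.

```python
import math

# Candidate target dimensions: 2 to 8 in steps of 2, then 10 to 100 in steps of 10.
candidate_dims = list(range(2, 9, 2)) + list(range(10, 101, 10))


def target_dimension(intrinsic_dim):
    """Round an estimated intrinsic dimensionality up to a whole number of components."""
    return math.ceil(intrinsic_dim)


# Intrinsic dimensionality estimate reported for the LBP + colour histogram feature.
dim = target_dimension(39.3016)
```

The selected dimension falls on one of the swept values, so it can be compared directly against the accuracy curve in Figure 5.13.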
In the experiment, the computational costs of the classifiers using different feature
vectors are also compared. The analysis time was recorded on a desktop PC
with a Core Duo 2.66 GHz CPU and 2 GB of memory. Table 5.8 summarizes the
overall accuracies and analysis times using different feature sets. The optimal dimension
of the kernel PCA and GDA fused features is 40, which is derived from the estimation
of the intrinsic dimensionality. From the results, we can see that the analysis time
varies considerably between feature sets. The high dimensionality of the original colour and
texture features and of the serially fused feature causes high computational costs, while
using a nonlinear fusion method such as kernel PCA or GDA not only improves
the classification accuracy but also significantly reduces the dimensionality and
the computational costs. Table 5.9 shows the confusion matrix of classification using
the KPCA-40 feature.

Table 5.8: The classification results of single and fused colour and texture features
                  Hist-RGBNIR  LBP       Serial_Fusion  KPCA-40  GDA-40
Overall Accuracy  76.03%       71.07%    83.47%         95.04%   91.74%
Analysis Time     79.46 s      246.29 s  305.83 s       13.63 s  4.63 s

Table 5.9: The confusion matrix of SVM classification using the fused feature
Predicted   Actual Category
Category    Euc-Ter  Euc-Mel  Cor-Tes
Euc-Ter     60       1        0
Euc-Mel     4        28       0
Cor-Tes     0        0        27
From the experimental results, it is clear that fusion of colour and texture features
provides improved discriminative power over using them independently. Moreover,
the proposed nonlinear feature fusion strategy using kernel PCA has shown great
improvement over the serial fusion strategy, not only in reducing the dimensionality
and computational cost, but also in removing noisy information and improving
the discriminative power. The proposed feature fusion strategy can be extended to
combine any other feature vectors if they are considered to have some complementary
information. As an example, the PSF-HSV and LBP features were also tested
using the same fusion strategy. PSF-HSV and LBP features are serially integrated
as a union vector and then the intrinsic dimensionality of the serially fused feature
was estimated using the MLE method. According to the experiment, the intrinsic
dimensionality of the fused feature is 34.6609. Therefore, the first 35 eigenvectors
with the largest eigenvalues were selected to represent the serially fused feature. Table
5.10 shows the classification results of single and fused PSF-HSV and LBP features.
The fused PSF-HSV and LBP feature reached an overall accuracy of 90.91%, which
showed significant improvement over the single and serially fused features. However, it is
also noted that higher accuracy of single features does not necessarily lead to higher
accuracy for the fused feature. For example, PSF-HSV showed better performance
than Hist-RGBNIR, but its fusion with LBP did not obtain a better result than the
fused Hist-RGBNIR and LBP features.

Table 5.10: The classification results of single and fused PSF-HSV and LBP features
                  PSF-HSV  LBP       Serial_Fusion  KPCA-35
Overall Accuracy  81.82%   71.07%    82.64%         90.91%
Analysis Time     18.80 s  246.29 s  263.37 s       11.56 s
5.3 Summary
This chapter presents the experiments and results for evaluating the developed algo-
rithms in object detection and segmentation, feature extraction and image classifica-
tion. The findings through the experiments include:
(1) The developed pulse coupled neural filter has been successfully applied as a
pre-processing method for initial detection of power lines prior to the Hough transform
being employed. Knowledge-based line clustering in Hough space further improved
the detection accuracy. However, difficulties occur in power line
detection from aerial imagery because of the low quality of the collected imagery (e.g.
illumination changes, motion blur and low spatial resolution).
(2) By using a PCNN in spectral feature space, followed by post-processing using
a watershed algorithm, the developed tree crown segmentation algorithm achieved a
detection rate of 97.27% and a segmentation accuracy of 84.7% from the collected
multi-spectral imagery. The major problems are the under-segmentation of tree clusters
and the difficulty of discriminating trees from grass and other low vegetation.
(3) Fusion of multi-spectral imagery and LiDAR data has great potential to achieve
better object detection results. Object points were successfully detected by using a
statistical ground filtering algorithm on LiDAR intensity data. Region-level
fusion of the initial vegetation segmentation map and LiDAR object points makes it easier
to filter out low vegetation regions. However, a critical limitation of this fusion process
is that it depends on a high point density in the LiDAR data.
(4) The developed PSF feature is invariant to rotation and scale changes. The
experimental results on USC-SIPI texture database demonstrated its pros and cons
compared to LBP features. The experiments in vegetation species classification fur-
ther compared the discriminative power of PSF and some classic colour and texture
descriptors. Overall, PSF-HSV-VI feature obtained the best classification accuracy
and SVM generally performed the best among the three tested classifiers.
(5) Colour and texture features contain complementary information and thus mo-
tivate the fusion of feature descriptors. The experimental results demonstrated the
effectiveness of the kernel PCA based feature fusion method. Intrinsic dimensional-
ity also played an important role to determine the optimal dimension of the fused
features. The fused feature showed significantly higher classification accuracy than
using each feature independently and the serial integrated feature.
Chapter 6
Conclusion and Future Work
This thesis comprehensively investigated the use of aerial remote sensing and computer
vision techniques for power line corridor monitoring applications. Theoretically,
a biologically inspired spiking cortical model, the pulse coupled neural network
(PCNN), was intensively studied and successfully applied to this specific aerial image
analysis application. Some novel algorithms for object detection and feature fusion
were also developed. The concepts proved in this thesis and the knowledge gained
from this research project offer a good reference for our industry partner and other
energy utilities that want to improve their vegetation management activities. This
chapter summarizes the results and discussions by drawing together the findings and
making recommendations for possible further research.
6.1 Summary of Findings and Contributions
The major findings and contributions of this thesis are summarized as follows:
• Vegetation management using aerial remote sensing techniques
This thesis presented a comprehensive study of vegetation management approaches in
power line corridor monitoring based on aerial remote sensing techniques. The results
from a series of experiments demonstrated the potential of moving from the traditional
vegetation management strategy to a more automated, accurate and cost-effective
solution using aerial remote sensing techniques. Regarding aerial platforms, unmanned
aerial systems (UASs) are expected to be a future solution, but their major limitation
is their currently limited ability to carry heavy, power-demanding payloads.
Regarding sensor options, the combination of a high resolution multi-spectral camera
and a LiDAR sensor is highly recommended for data collection because the spectral and
3D geometry information they provide is complementary.
• Simplification of biologically-inspired spiking cortical models and application to
aerial image analysis
As a biologically inspired spiking cortical model, the pulse coupled neural network
(PCNN) emulates the processing of the visual cortex and is recognized as a powerful
tool for many image processing tasks. The major advantages of PCNNs are that they are
self-organized spatial-temporal-coding models which mimic real neurons more closely
and, because they use time, offer more computational power than traditional neural
network models. In order to better analyze the pulse dynamics and control the
performance, the original PCNN model was simplified in this thesis. The simplified
model was applied to tree crown segmentation from multi-spectral imagery and also
used to design the pulse coupled neural filter as a pre-processing tool for power line
detection from aerial imagery. The idea of using the pulse spectral frequency of a
multi-spectral unit-linking PCNN for spectral-texture feature extraction proved
successful in a series of experiments on texture classification and tree species
classification.
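The kind of pulse dynamics that such a simplification aims to expose can be illustrated with a minimal sketch. The Python code below is not the model developed in this thesis. It assumes a generic unit-linking style PCNN in which the feeding input is the pixel intensity, the linking input is 1 when any 8-neighbour pulsed in the previous iteration, and the threshold decays exponentially and jumps after the neuron's own pulse. The function name and the parameters beta, alpha and v_theta are hypothetical choices made for illustration only.

```python
import math


def simplified_pcnn_fire_times(image, beta=0.2, alpha=0.3, v_theta=20.0, steps=10):
    """Illustrative unit-linking style PCNN (an assumption, not the thesis model).

    image: 2D list of intensities in [0, 1].
    Returns a 2D list with the iteration at which each neuron first pulsed
    (0 means the neuron never pulsed within `steps` iterations).
    """
    rows, cols = len(image), len(image[0])
    theta = [[1.0] * cols for _ in range(rows)]     # dynamic thresholds
    fired = [[0] * cols for _ in range(rows)]       # pulses of the previous iteration
    first_fire = [[0] * cols for _ in range(rows)]  # time code of each neuron

    for step in range(1, steps + 1):
        new_fired = [[0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                # Unit linking: 1 if any 8-neighbour pulsed in the previous iteration.
                link = 0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (di or dj) and 0 <= ni < rows and 0 <= nj < cols:
                            if fired[ni][nj]:
                                link = 1
                # Internal activity: feeding input modulated by the linking input.
                u = image[i][j] * (1.0 + beta * link)
                # Threshold decays, and is raised if this neuron pulsed last time.
                theta[i][j] = theta[i][j] * math.exp(-alpha) + v_theta * fired[i][j]
                if u > theta[i][j]:
                    new_fired[i][j] = 1
                    if first_fire[i][j] == 0:
                        first_fire[i][j] = step
        fired = new_fired
    return first_fire


if __name__ == "__main__":
    toy = [[0.9, 0.9, 0.1],
           [0.9, 0.9, 0.1]]
    for row in simplified_pcnn_fire_times(toy):
        print(row)
```

In this sketch brighter pixels pulse in earlier iterations, so the returned map is a simple time code of intensity, and neighbours of pulsing neurons receive a small boost through the linking term, which encourages similar adjacent pixels to pulse together.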
• Multi-source information fusion
Multi-source information fusion plays an important role in increasing the reliability
of information extraction for robust operational performance and decision making in
power line corridor monitoring (e.g. improved classification, increased confidence and
reduced ambiguity). In this thesis, the fusion of multi-sensor data and of multiple
feature vectors was investigated. The integration of LiDAR data and multi-spectral
imagery was suggested for data collection because of the complementary information
contained in the two types of sensor data, and the data fusion showed a clear advantage
in improving object detection results, although it depends on the point density of the
LiDAR data. The collective use of colour (spectral) and texture information has strong
links with human perception. In this thesis, feature fusion was investigated; the
developed method effectively combined colour and texture into a unified descriptor,
and the experimental results demonstrated improved discriminative power over using
colour and texture features independently.
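One simple way to picture the fusion of LiDAR data with an image-based vegetation segmentation is a per-region vote, sketched below in Python. This is an illustrative assumption rather than the algorithm developed in this thesis: it presumes that LiDAR object points (points above the filtered ground) have already been projected to image pixel coordinates, and the function name and the min_density threshold are hypothetical.

```python
from collections import Counter


def keep_tall_vegetation(label_map, object_points, min_density=0.1):
    """Return labels of segmented regions supported by LiDAR object points.

    label_map: 2D list of integer region labels (0 = not vegetation).
    object_points: iterable of (row, col) pixel positions of LiDAR object
        points, assumed to be already registered to the image grid.
    min_density: hypothetical minimum number of object points per region
        pixel; regions below it are treated as low vegetation and dropped.
    """
    rows, cols = len(label_map), len(label_map[0])

    # Area of each candidate vegetation region, in pixels.
    area = Counter(label for row in label_map for label in row if label != 0)

    # Number of LiDAR object points falling inside each region.
    hits = Counter()
    for r, c in object_points:
        if 0 <= r < rows and 0 <= c < cols and label_map[r][c] != 0:
            hits[label_map[r][c]] += 1

    return {label for label, n in area.items() if hits[label] / n >= min_density}


if __name__ == "__main__":
    labels = [[1, 1, 0, 2],
              [1, 1, 0, 2]]
    points = [(0, 0), (1, 1)]  # both points fall inside region 1
    print(keep_tall_vegetation(labels, points, min_density=0.5))
```

The density threshold makes the dependence on LiDAR point density explicit: with sparse point clouds even genuine tree regions may collect too few object points to pass the test.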
6.2 Future Work
The efforts and achievements in this thesis offer important knowledge in using bi-
ologically inspired image processing to assist vegetation management in power line
corridors. Possibilities for further improving the developed methods were also identi-
fied and recommended as future research directions:
• Real-time data processing
Currently, processing for object recognition has been designed for off-line use. Data
is collected, stored in a repository and analyzed at some later time. However, some
real-time processing would be beneficial as there is information available to assist in
the decisions made by the navigation and data storage systems. For example, the real-
time identification of regions outside the immediate vicinity of the power-line where
vegetation is sparse could be used to reduce the resolution of data stored. Questions
concerning the algorithms and computing architectures that are best suited to increase
the autonomy of a UAS capturing significant amounts of data need to be addressed. It
should be noted that a PCNN has only local connections, which makes it well suited
to electronic implementation. Therefore, a hardware implementation (e.g. using
an FPGA) of the algorithms developed in this thesis is a possible direction for
increasing the autonomy of the system.
• 3D feature extraction from LiDAR data
LiDAR is a relatively young 3D measurement technique offering much potential for
the acquisition of precise 3D geodata and object geometries, and it does not rely on
illumination conditions. Therefore, LiDAR is highly recommended for data collection
in power line corridor monitoring. Meanwhile, advanced and intelligent algorithms
need to be developed for accurate 3D feature extraction from LiDAR data. For
example, LiDAR data provides 3D information that makes it possible to detect
multi-layer structured power lines and model the 3D catenary curve. Moreover, 3D tree
structure parameters may also be helpful for identifying the species.
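For readers unfamiliar with the catenary mentioned above, the following Python sketch evaluates the standard catenary curve of a hanging conductor in its vertical plane and the resulting mid-span sag for a level span. It is only an illustration of the formula: the function names and the example values are hypothetical, and estimating the parameter a and the lowest point from real LiDAR points would require a nonlinear least-squares fit that is not shown here.

```python
import math


def catenary_height(x, a, x0=0.0, z0=0.0):
    """Height of a catenary at horizontal position x.

    a: catenary parameter (horizontal tension divided by weight per unit
       length), in the same length unit as x.
    (x0, z0): position of the lowest point of the curve.
    """
    return z0 + a * (math.cosh((x - x0) / a) - 1.0)


def midspan_sag(span, a):
    """Sag at mid-span for supports at equal height, `span` apart."""
    return a * (math.cosh(span / (2.0 * a)) - 1.0)


if __name__ == "__main__":
    # Hypothetical example: 200 m span, catenary parameter of 1000 m.
    print(round(midspan_sag(200.0, 1000.0), 3))
    # The parabolic approximation span**2 / (8 * a) gives a similar value.
    print(200.0 ** 2 / (8 * 1000.0))
```

Comparing measured LiDAR heights along a detected line with catenary_height for candidate parameters is one possible way to score a fit, under the assumption that the points have first been projected into the vertical plane of the span.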
References
Aardt, Jan A.N. van. 2000. Spectral separability among six southern tree species.
Ph.D. thesis.
Aggarwal, Nitin, & Karl, William Clem. 2000. Line detection in images through
regularized Hough transform.
Aggarwal, Nitin, & Karl, William Clem. 2006. Line detection in images through
regularized Hough transform. IEEE Transactions on Image Processing, 15(3), 582–
591.
Appelt, Paul J., & Goodfellow, John W. 2004. Research on how trees cause
interruptions- applications to vegetation management.
Bao, Yunfei, Li, Guoping, Cao, Chunxiang, Li, Xiaowen, Zhang, Hao, He, Qisheng,
Bai, Linyan, & Chang, Chaoyi. 2008. Classification of Lidar Point Cloud and
Generation of DTM from LiDAR Height and Intensity Data in Forested Area.
Bar-Cohen, Yoseph. 2006. Biomimetics: Biologically Inspired Technologies. Taylor &
Francis.
Bartels, Marc, Wei, Hong, & Mason, David C. 2006. DTM generation from LIDAR
data using skewness balancing.
Baudat, G., & Anouar, F. 2000. Generalized discriminant analysis using a kernel
approach. Neural Computation, 12(10), 2385–2404.
Beck, Keith, & Mathieu, Renaud. 2004. Can power companies use space patrols to
monitor transmission corridors?
Beltrame, Alessandra M. Knopik, Jardini, Mauricio G. M., Jacobsen, Rogerio M., &
Quintanilha, Jose Alberto. 2007. Vegetation identification and classification in the
domain limits of powerlines in Brazilian Amazon forest.
Beraldin, A.-Angelo, Blais, Francois, & Lohr, Uwe. 2010. Airborne and Terrestrial
Laser Scanning. Taylor & Francis. Chap. 1, pages 1–39.
Berni, Jose A. J., Zarco-Tejada, Pablo J., Suárez, Lola, & Fereres, Elias. 2009.
Thermal and Narrowband Multispectral Remote Sensing for Vegetation Monitoring
From an Unmanned Aerial Vehicle. IEEE Transactions on Geoscience and Remote
Sensing, 47(3), 722–738.
Bhatt, Rushi, Carpenter, Gail A., & Grossberg, Stephen. 2007. Texture segregation
by visual cortex: Perceptual grouping, attention, and learning. Vision Research,
47(25), 3173–3211.
Blaschke, T. 2010. Object-based image analysis for remote sensing. ISPRS Journal
of Photogrammetry & Remote Sensing, 65(1), 2–16.
Bleau, André, & Leon, L. Joshua. 2000. Watershed-based segmentation and region
merging. Computer Vision and Image Understanding, 77(3), 317–370.
Brandtberg, Tomas, Warner, Timothy A., Landenberger, Rick E., & McGraw,
James B. 2003. Detection and analysis of individual leaf-off tree crowns in small
footprint, high sampling density lidar data from the eastern deciduous forest in
North America. Remote Sensing of Environment, 85, 290–303.
Breiman, Leo. 2001. Random forests. Machine Learning, 45(1), 5–32.
Brook, A., Kimmel, R., & Sochen, N.A. 2005. Variational segmentation for color
images.
Camastra, Francesco, & Vinciarelli, Alessandro. 2008. Machine learning for audio,
image and video analysis: theory and applications. Springer.
Cao, Bin, Shen, Dou, Sun, Jian-Tao, Yang, Qiang, & Chen, Zheng. 2007. Feature
selection in kernel space. In: Proceedings of the 24th International Conference on
Machine Learning.
Chust, Guillem, Galparsoro, Ibon, Borja, Angel, Franco, Javier, & Uriarte, Adolfo.
2008. Coastal and estuarine habitat mapping, using LIDAR height and intensity
and multi-spectral imagery. Estuarine, Coastal and Shelf Science, 78, 633–643.
Clausi, David A., & Deng, Huang. 2005. Design-based texture feature fusion us-
ing Gabor filters and co-occurrence probabilities. IEEE Transactions on Image
Processing, 14(7), 925–936.
Clode, Simon, & Rottensteiner, Franz. 2005. Classification of trees and powerlines
from medium resolution airborne laserscanner data in urban environments.
Coburn, C.A., & Roberts, A.C.B. 2004. A multiscale texture analysis procedure
for improved forest stand classification. International Journal of Remote Sensing,
25(20), 4287–4308.
Comer, M., & Delp, E. 1999. Segmentation of textured images using a multiresolution
Gaussian autoregressive model. IEEE Transactions on Image Processing, 8(3),
408–420.
Culvenor, D.S. 2003. Extracting individual tree information: a survey of techniques
for high spatial resolution imagery. Boston: Kluwer Academic Publishers. Chap. 9,
pages 255–277.
Hubel, David H., & Wiesel, Torsten N. 1998. Early exploration of the visual cortex.
Neuron, 20, 401–412.
Davies, E. R. 2009. Introduction to texture analysis. London, UK: Imperial College
Press. Pages 1–31.
Davis, Jesse, & Goadrich, Mark. 2006. The relationship between Precision-Recall and
ROC curves. Pages 233–240 of: The 23rd International Conference on Machine
Learning.
Deng, Yining, & Manjunath, B.S. 2001. Unsupervised segmentation of color-texture
regions in images and video. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 23(8), 800–810.
Eckhorn, R., Reitboeck, H.J., Arndt, M., & Dicke, P.W. 1989. A neural network for
feature linking via synchronous activity: Results from cat visual cortex and from
simulations. Cambridge: Cambridge University Press. Pages 255–272.
Erikson, M., & Olofsson, K. 2005. Comparison of three individual tree crown detection
methods. Machine Vision and Applications, 16(4), 258–265.
Erikson, Mats. 2003. Segmentation of individual tree crowns in colour aerial pho-
tographs using region growing supported by fuzzy rules. Canadian Journal of
Forest Research, 33(8), 1557–1563.
Fernandes, Leandro A.F., & Oliveira, Manuel M. 2008. Real-time line detection
through an improved Hough transform voting scheme. Pattern Recognition, 41,
299–314.
Gerstner, Wulfram. 2001. Pulsed Neural Networks. MIT Press. Chap. 1 Spiking
Neurons, pages 3–54.
Golightly, I., & Jones, D. 2005. Visual control of an unmanned aerial vehicle for
power line inspection.
Gu, Xiaodong. 2008. Feature Extraction using Unit-linking Pulse Coupled Neural
Network and its Applications. Neural Processing Letters, 27, 25–41.
Gurtner, Alex, Greer, Duncan G., Glassock, Richard R., Mejias, Luis, Walker, Rod-
ney, & Boles, Wageeh W. 2009. Investigation of fish-eye lenses for small-UAV
aerial photography. IEEE Transactions on Geoscience and Remote Sensing, 47(3),
709–721.
Haralick, R.M., Shanmugam, K., & Dinstein, I. 1973. Textural features for image
classification. IEEE Transactions on Systems, Man, and Cybernetics, SMC-3(6),
610–621.
Castilla, G., & Hay, G.J. 2008. Image objects and geographic objects. Springer. Pages
91–110.
Hay, G.J., & Castilla, G. 2008. Geographic Object-Based Image Analysis (GEOBIA):
A new name for a new discipline. Springer. Pages 75–89.
Hsu, Chih-Wei, & Lin, Chih-Jen. 2002. A comparison of methods for multiclass
support vector machines. IEEE Transactions on Neural Networks, 13(2), 415–425.
Hsu, Chih-Wei, Chang, Chih-Chung, & Lin, Chih-Jen. 2008. A practical guide to SVM
classification (Technical Report). Tech. rept. Department of Computer Science,
National Taiwan University.
Huang, Xin, Zhang, Liangpei, & Li, Pingxiang. 2008. A multiscale feature fusion
approach for classification of very high resolution satellite imagery based on wavelet
transform. International Journal of Remote Sensing, 29(20), 5923–5941.
Huete, A. R. 1988. A soil-adjusted vegetation index (SAVI). Remote Sensing of
Environment, 25, 295–309.
Ituen, I., Sohn, G., & Jenkins, A. 2008. A case study: workflow analysis of power
line systems for risk management. In: International Archives of Photogrammetry
and Remote Sensing, vol. 37.
Japkowicz, Nathalie. 2006. Why question machine learning evaluation methods? Pages
6–11 of: AAAI workshop on Evaluation Methods for Machine Learning. AAAI
Press.
Jensen, John R. 2005. Classification based on object-oriented image segmentation.
Pearson Education. Pages 393–398.
Jensen, John R. 2005. Thematic information extraction: pattern recognition. Pearson
Education. Pages 337–406.
Jiang, Zhangyan, Huete, Alfredo R., Didan, Kamel, & Miura, Tomoaki. 2008. De-
velopment of a two-band enhanced vegetation index without a blue band. Remote
Sensing of Environment, 112, 3833–3845.
Johnson, John L. 1994. Pulse-coupled neural nets: translation, rotation, scale, distor-
tion and intensity signal invariance for images. Applied Optics, 33(26), 6239–6253.
Johnson, John L., & Padgett, Mary Lou. 1999. PCNN models and applications. IEEE
Transactions on Neural Networks, 10(3), 480–498.
Jones, D., Golightly, I., Roberts, J., Usher, K., & Earp, G. 2005. Power line inspection
- a UAV concept.
Jordan, C. F. 1969. Derivation of leaf area index from quality of light on the forest
floor. Ecology, 50, 663–666.
Kobayashi, Yoshihiro, Karady, George G., Heydt, Gerald Thomas, & Olsen, Robert G.
2009. The utilization of satellite images to identify trees endangering transmission
lines. IEEE Transactions on Power Delivery, 24(3), 1703–1709.
Koch, B., Heyder, U., & Weinacker, H. 2006. Detection of individual tree crowns
in airborne lidar data. Photogrammetric Engineering & Remote Sensing, 72(4),
357–363.
Kuntimad, G., & Ranganath, H. S. 1999. Perfect image segmentation using pulse
coupled neural networks. IEEE Transactions on Neural Networks, 10(3), 591–598.
Lefsky, Michael A., & Cohen, Warren B. 2003. Selection of remotely sensed data.
Dordrecht: Kluwer Academic Publishers. Pages 13–46.
Li, Stan Z. 2009. Markov random field modeling in image analysis (3rd Edition).
Springer.
Li, Zhengrong, Hayward, Ross, Zhang, Jinglan, & Liu, Yuee. 2008. Individual
tree crown delineation techniques for vegetation management in power line cor-
ridor. Pages 148–154 of: Digital Image Computing: Techniques and Applications
(DICTA).
Liao, S., Law, Max W.K., & Chung, Albert C.S. 2009. Dominant local binary patterns
for texture classification. IEEE Transactions on Image Processing, 18(5), 1107–
1118.
Lindblad, T., & Kinser, J.M. 2005. Image processing using pulse-coupled neural net-
works. Second edn. Springer.
Lindblad, Thomas, & Kinser, Jason M. 1999. Inherent features of wavelets and pulse
coupled networks. IEEE Transactions on Neural Networks, 10(3), 607–614.
Liu, Ying, Zhang, Dengsheng, Lu, Guojun, & Ma, Wei-Ying. 2006a. Study on tex-
ture feature extraction in region-based image retrieval system. Pages 264–271 of:
Proceedings of International Conference on Multimedia Modelling.
Liu, Yongxue, Li, Manchun, Mao, Liang, Xu, Feifei, & Huang, Shuo. 2006b. Review
of remotely sensed imagery classification patterns based on object-oriented image
analysis. Chinese Geographical Science, 16(3), 282–288.
Loncaric, Sven. 1998. A survey of shape analysis techniques. Pattern Recognition,
31(8), 983–1001.
Lu, D., & Weng, Q. 2007. A survey of image classification methods and techniques
for improving classification performance. International Journal of Remote Sensing,
28(5), 823–870.
Lu, M. L., & Kieloch, Z. 2008. Accuracy of transmission line modeling based on aerial
LiDAR survey. IEEE Transactions on Power Delivery, 23(3), 1655–1663.
Ma, Yide, Li, Lian, Wang, Yafu, & Dai, Ruolan. 2005. Principle and applications of
pulse-coupled neural networks. Beijing: Science Press.
Mandelbrot, B. B. 1983. The fractal geometry of nature. New York: W.H. Freeman.
Manjunath, B.S., & Ma, W.Y. 1996. Texture features for browsing and retrieval of
image data. IEEE Transactions on Pattern Analysis and Machine Intelligence,
18(8), 837–842.
Meng, Xuelian, Wang, Le, Silvan-Cardenas, Jose Luis, & Currit, Nate. 2009. A
multi-directional ground filtering algorithm for airborne LIDAR. ISPRS Journal
of Photogrammetry & Remote Sensing, 64, 117–124.
Meyer, Fernand. 1994. Topographic distance and watershed lines. Signal Processing,
38(1), 113–125.
Mills, Henny. 2008. Analysis of the transferability of support vector machines for
vegetation classification. The International Archives of the Photogrammetry, Remote
Sensing and Spatial Information Sciences, XXXVII, 557–563.
Mitchell, Tom. 1997. Machine Learning. McGraw Hill.
Mountrakis, Giorgos, Im, Jungho, & Ogole, Caesar. 2011. Support vector machines in
remote sensing: a review. ISPRS Journal of Photogrammetry & Remote Sensing,
in press.
Myneni, Ranga B., Hall, Forrest G., Sellers, Piers J., & Marshak, Alexander L. 1995.
The interpretation of spectral vegetation indexes. IEEE Transactions on Geoscience
and Remote Sensing, 33(2), 481–486.
Najim, K. 2004. Stochastic processes: estimation, optimization, and analysis. London:
Kogan Page Science.
Ng, Jeffrey, Bharath, Anil A., & Zhaoping, Li. 2007. A survey of architecture and
function of the primary visual cortex (V1). EURASIP Journal on Advances in
Signal Processing, 2007, 124–141.
Ojala, Timo, Pietikainen, Matti, & Maenpaa, Topi. 2002. Multiresolution grey-scale
and rotation invariant texture classification with local binary patterns. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence, 24(7), 971–987.
Polyak, Stephen. 1957. The vertebrate visual system. The University of Chicago Press.
Ranganath, H. S., & Kuntimad, G. 1999. Object detection using pulse coupled neural
networks. IEEE Transactions on Neural Networks, 10(3), 615–620.
Rautiainen, Miina. 2005. The spectral signature of coniferous forests: the role of stand
structure and leaf area index. Ph.D. thesis.
Rouse, J. W., Haas, R. H., Schell, J. A, & Deering, D. W. 1973. Monitoring vegetation
systems in the great plains with ERTS. Pages 309–317 of: The 3rd Earth Resources
Technology Satellite-1 Symposium. Scientific and Technical Information Office.
Scholkopf, Bernhard, & Smola, Alexander J. 2001. Learning with kernels. The MIT
Press.
Schwarz, Michael W., Cowan, William B., & Beatty, John C. 1987. An experimental
comparison of RGB, YIQ, LAB, HSV, and opponent color models. ACM Transac-
tions on Graphics, 6(2), 123–158.
Sherrod, Phillip H. 2009. DTREG predictive modeling software (Users Manual).
Shi, Weiren, Li, Zuojin, & Shi, Xin. 2009. A survey of biologically inspired image
processing for objects recognition. International Journal of Image and Graphics,
9(4), 495–509.
Soille, P. 2003. Morphological image analysis. Springer. Pages 189–209.
Song, Dongjin, & Tao, Dacheng. 2010. Biologically inspired feature manifold for scene
classification. IEEE Transactions on Image Processing, 19(1), 174–184.
Stewart, Robert D., Fermin, Iris, & Opper, Manfred. 2002. Region growing with
pulse-coupled neural networks: an alternative to seeded region growing. IEEE
Transactions on Neural Networks, 13(6), 1557–1562.
Stricker, Markus, & Orengo, Markus. 1995. Similarity of color images. Pages 381–392
of: SPIE Conference on Storage and Retrieval for Image and Video Databases, vol.
2420.
Sun, Changming, Jones, Ronald, Talbot, Hugues, Wu, Xiaoliang, Cheong, Kevin,
Beare, Richard, Buckley, Michael, & Berman, Mark. 2006. Measuring the distance
of vegetation from powerlines using stereo vision. ISPRS Journal of Photogramme-
try & Remote Sensing, 60(4), 269–283.
Turner, M. 1986. Texture discrimination by Gabor functions. Biological Cybernetics,
55, 71–82.
Vreeken, Jilles. 2003. Spiking neural networks, an introduction. Tech. rept. Utrecht
University.
Waldemark, Karina, Lindblad, Thomas, Becanovic, Vlatko, Guillen, Jose L.L., &
Klingner, Phillip L. 2000. Patterns from the sky: Satellite image analysis using
pulse coupled neural networks for pre-processing, segmentation and edge detection.
Pattern Recognition Letters, 21, 227–237.
Wang, Yakun, Soh, Young Sung, & Schultz, Howard. 2006. Individual tree crown
segmentation in aerial forestry images by mean shift clustering and graph-based
cluster merging. International Journal of Computer Science and Network Security,
6(11), 40–45.
Wang, Zhaobin, Ma, Yide, Cheng, Feiyan, & Yang, Lizhen. 2010. Review of pulse-
coupled neural networks. Image and Vision Computing, 28, 5–13.
Watt, A., & Policarpo, F. 1998. Image segmentation. Addison Wesley Longman
Limited. Pages 290–311.
Whelan, Paul F., & Ghita, Ovidiu. 2009. Color texture analysis. Imperial College
Press. Pages 129–164.
Wood, David. 2007. Risk Assessment for Rare & Threatened species near Ergon
powerlines. Tech. rept. Ergon Energy.
Xie, Xianghua, & Mirmehdi, Majid. 2009. A galaxy of texture features. Imperial
College Press. Pages 375–406.
Xie, Xudong, & Lam, Kin-Man. 2006. Gabor-based kernel PCA with doubly nonlinear
mapping for face recognition with a single face image. IEEE Transactions on Image
Processing, 15(9).
Yan, Gao, Mas, J.F., Maathuis, B.H.P., Zhang, Xiangmin, & Dijk, PM. Van. 2006.
Comparison of pixel-based and object-oriented image classification approaches - a
case study in a coal fire area, Wuda, Inner Mongolia, China. International Journal
of Remote Sensing, 27(18), 4039–4055.
Yan, Guangjian, Li, Chaoyang, Zhou, Guoqing, Zhang, Wuming, & Li, Xiaowen.
2007. Automatic extraction of power lines from aerial images. IEEE Geoscience
and Remote Sensing Letters, 4(3), 387–391.
Yang, Jian, Yang, Jing-yu, Zhang, David, & Lu, Jian-feng. 2003. Feature fusion:
parallel strategy vs. serial strategy. Pattern Recognition, 36, 1369–1381.
Yoon, Jong-Suk, Shin, Jung-Il, & Lee, Kyu-Sung. 2008. Land Cover Characteristics
of Airborne LiDAR Intensity Data: A Case Study. IEEE Geoscience and Remote
Sensing Letters, 5(4), 801–805.
Zeki, Semir. 1993. A vision of the brain. London: Blackwell Scientific.
Zhan, Kun, Zhang, Hongjuan, & Ma, Yide. 2009. New Spiking Cortical Model for
Invariant Texture Retrieval and Image Processing. IEEE Transactions on Neural
Networks, 20(12), 1980–1986.
Zhang, Junying, Dong, Jiyang, & Shi, Meihong. 2005. An adaptive method for image
filtering with pulse-coupled neural networks.
Zhang, Liangpei, Zhao, Yindi, Huang, Bo, & Li, Pingxiang. 2008. Texture feature
fusion with neighborhood oscillating tabu search for high resolution image. Pho-
togrammetric Engineering & Remote Sensing, 74(12), 1585–1596.
Zhang, Y. 2006. An overview of image and video segmentation in the last 40 years.
Hershey: IRM Press. Pages 1–15.
Zhou, Guoqing, Ambrosia, Vince, Gasiewski, Albin J., & Bland, Geoff. 2009. Fore-
word to the Special Issue on Unmanned Airborne Vehicle (UAV) Sensing Systems
for Earth Observations. IEEE Transactions on Geoscience and Remote Sensing,
47(3), 687–689.
Zhu, Guobin, & Blumberg, Dan G. 2002. Classification using ASTER data and SVM
algorithms; The case study of Beer Sheva, Israel. Remote Sensing of Environment,
80(2), 233–240.