Aerial Image Analysis Using Spiking Neural Networks with Application to Power Line Corridor Monitoring

Zhengrong Li

Faculty of Science and Technology

Queensland University of Technology

A thesis submitted for the degree of

Doctor of Philosophy

May 2011

Statement of Originality

The work contained in this thesis has not been previously submitted to meet requirements for an award at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published by another person except where due reference is made.

Signature

Date

31/05/2011

Dedicated to my wife and parents

Acknowledgements

First, I would like to thank my supervisors Dr. Ross Hayward, Prof. Rodney Walker and Dr. Jinglan Zhang, for their guidance and support during my candidature. I really appreciated all the discussions with my supervisors, not only the scientific ones but also the nonscientific ones. I am grateful that the university and my supervisors provided me with the opportunity to join a world-class research group, an excellent working environment and great opportunities for collaboration with external institutions. Additionally, the Cooperative Research Centre for Spatial Information (CRCSI) and the Australian Research Centre for Aerospace Automation (ARCAA) played a large part in providing me with this excellent opportunity.

I was lucky to have enough research funding for data collection, field surveys and attendance at several international conferences. I really appreciated the communication with many world-class researchers at these conferences. I was also fortunate to visit Prof. Wolfgang Förstner’s lab at the Institute of Geodesy and Geoinformation, University of Bonn in Germany.

I would like to thank my colleagues in the CRCSI research project: Dr. Jinhai Cai, Dr. Troy Bruggemann, Dr. Luis Mejias, Dr. Jason Ford, David Zuill, Marcos Gerardo and Steven Mills. I also acknowledge the assistance of David Wood from Ergon Energy, Bred Jeffers from Greening Australia and George Curran from CRCSI.

Financially, my doctoral scholarship was supported by QUT and CRCSI. Additionally, I received a Chinese government award for outstanding PhD students abroad from the China Scholarship Council (CSC), and a travel grant from the IEEE Signal Processing Society (SPS) during my study. I thank these institutions for their financial support.

Last but not least, I would like to thank my wife, my parents and all my friends for their love, attention, and continuous support during my study.

Abstract

Trees, shrubs and other vegetation are of continued importance to the environment and our daily life. They provide shade around our roads and houses, offer a habitat for birds and wildlife, and absorb air pollutants. However, vegetation touching power lines is a risk to public safety and the environment, and one of the main causes of power supply problems. Vegetation management, which includes tree trimming and vegetation control, is a significant cost component of the maintenance of electrical infrastructure. For example, Ergon Energy, Australia’s largest energy distributor by geographic footprint, currently spends over $80 million a year inspecting and managing vegetation that encroaches on power line assets. Currently, most vegetation management programs for distribution systems rely on calendar-based ground patrols. However, calendar-based inspection by linesmen is labour-intensive, time-consuming and expensive. It also results in some zones being trimmed more frequently than needed and others not being cut often enough. Moreover, it is seldom practicable to measure all the plants around power line corridors by field methods. Remote sensing data captured from airborne sensors has great potential to assist vegetation management in power line corridors.

This thesis presents a comprehensive study on using spiking neural networks in a specific image analysis application: power line corridor monitoring. Theoretically, the thesis focuses on a biologically inspired spiking cortical model: the pulse coupled neural network (PCNN). The original PCNN model was simplified in order to better analyze the pulse dynamics and control the performance. Several new and effective algorithms were developed based on the proposed spiking cortical model for object detection, image segmentation and invariant feature extraction. The developed algorithms were evaluated in a number of experiments using real image data collected from our flight trials. The experimental results demonstrated the effectiveness and advantages of spiking neural networks in image processing tasks. Operationally, the knowledge gained from this research project offers a good reference for our industry partner (i.e. Ergon Energy) and other energy utilities that want to improve their vegetation management activities. The novel approaches described in this thesis showed the potential of using cutting-edge sensor technologies and intelligent computing techniques to improve power line corridor monitoring. The lessons learnt from this project are also expected to increase the confidence of energy companies to move from traditional vegetation management strategies to a more automated, accurate and cost-effective solution using aerial remote sensing techniques.

Keywords

• Biologically inspired image processing

• Pulse coupled neural network

• Power line corridor monitoring

• Aerial remote sensing

• Vegetation management

• Geographic object based image analysis

• Image segmentation

• Visual feature extraction

• Machine learning

Contents

1 Introduction

1.1 Research Background

1.1.1 Vegetation Management in Power Line Corridors

1.1.2 Advanced Remote Sensing Techniques

1.1.3 Biologically Inspired Image Processing

1.2 Research Objectives

1.3 Research Outcomes and Contributions

1.4 Publications

1.5 Thesis Structure

2 Remote Sensing Data Collection

2.1 Remote Sensing Platforms

2.2 Remotely Sensed Data

2.2.1 Optical Remote Sensing Imagery

2.2.2 Airborne Laser Scanning data

2.3 Data Collection in Power Line Corridors

2.3.1 Aerial Data

2.3.2 Field Survey

2.4 Summary

3 Object Detection and Segmentation

3.1 Related Work

3.1.1 Geographic Object Based Image Analysis

3.1.2 Pulse Coupled Neural Networks

3.2 Power Line Detection

3.2.1 Characteristics of Power Lines

3.2.2 Design of Pulse Coupled Neural Filter

3.2.3 Knowledge-based Line Clustering in Hough Space

3.3 Individual Tree Crown Detection and Delineation

3.3.1 Spectral Properties of Vegetation

3.3.2 Initial Tree Crown Segmentation

3.3.3 Decomposition of Tree Clusters Using Watershed Algorithm

3.4 Combining LiDAR Data and Multi-spectral Imagery for Improved Tree Crown Segmentation

3.4.1 Ground Filtering Using Statistical Analysis

3.4.2 Region-level Fusion of LiDAR and Georeferenced Multi-spectral Imagery

3.5 Summary

4 Visual Feature Extraction and Data Classification

4.1 Related Work

4.1.1 Major Steps for Thematic Information Extraction From Imagery

4.1.2 Visual Feature Extraction

4.2 Spectral-Texture Feature Extraction Using PCNN

4.2.1 Multi-spectral Unit-linking PCNN

4.2.2 Properties and Behaviors of multi-spectral PCNN

4.2.3 Rotational and Scale Invariant Feature Extraction Using Pulse Spectral Frequency

4.3 Colour and Texture Feature Fusion

4.3.1 Framework

4.3.2 Feature Fusion Based on Kernel PCA

4.3.3 Intrinsic Dimensionality Estimation

4.4 Machine Learning Based Classification

4.4.1 Multilayer Perceptron Neural Networks

4.4.2 Decision Tree Forest

4.4.3 Support Vector Machine

4.5 Summary

5 Experiments and Results

5.1 Object Detection and Segmentation

5.1.1 Power Line Detection

5.1.2 Individual Tree Crown Segmentation

5.1.3 Fusion of LiDAR and Multi-spectral Imagery

5.2 Feature and Classifier Evaluation

5.2.1 Experiment Setup

5.2.2 Performance Measure

5.2.3 Evaluation of PSF feature in Rotation and Scale Invariant Texture Classification

5.2.4 Evaluation of Features and Classifiers for Tree Species Classification

5.2.5 Evaluation of Colour and Texture Feature Fusion

5.3 Summary

6 Conclusion and Future Work

6.1 Summary of Findings and Contributions

6.2 Future Work

List of Figures

2.1 Optical passive remote sensing from satellite platforms

2.2 The basic components and principle of airborne laser scanning

2.3 The UAS platforms and examples of the collected data

2.4 Data collections systems from commercial data providers

2.5 An example of the collected multi-spectral imagery

2.6 An example of the collected LiDAR point cloud data

2.7 Experiment Test Site

2.8 Vegetation Database

3.1 Two segmentation levels overlaid a colour aerial image

3.2 Visual cortex

3.3 The Eckhorn-type neuron

3.4 The structure of a PCNN neuron

3.5 Power lines from different perspectives

3.6 Linking weight matrix

3.7 Original image and 7 pulse outputs of PCNN

3.8 The structure of pulse coupled neural filter

3.9 Comparison of Canny filter, Sobel filter and PCNF

3.10 Voting procedures and the 3D visualization of voting maps

3.11 An example of power line detection results

3.12 Example of tree crown and shadow in a RGB image

3.13 Comparison of two vegetation indexes

3.14 Tree crown segmentation from CIR image

3.15 Comparison of initial tree crown segmentation and the ground truth

3.16 Tree cluster decomposition using watershed algorithm

3.17 Change sequence of skewness and kurtosis

3.18 Framework of LiDAR and georeferenced multi-spectral imagery fusion for individual tree crown segmentation

4.1 Tree crown shapes from triple views

4.2 Electromagnetic spectrum

4.3 Gabor filter response to θ = 0°, 60°, 120°, 180°

4.4 Example of binary code calculation in a neighbourhood

4.5 The structure of the multi-spectral PCNN

4.6 Periodical pulse of the neuron (i, j)

4.7 Geometry for scale invariance

4.8 Framework of object-level colour and texture feature fusion

4.9 A multilayer perceptron neural network

4.10 A linear SVM example

4.11 Tradeoff between underfitting and overfitting

5.1 Comparison of power line detection results

5.2 Power line detection results

5.3 Failure examples for power line detection

5.4 Multi-layer powerlines and crossing powerlines

5.5 Ground truth and segmentation results

5.6 A failure example of individual tree crown delineation

5.7 Comparison of LIDAR intensity and height data by skewness and kurtosis analysis

5.8 A pair of CIR image and LiDAR point cloud data in urban area

5.9 Fusion of LiDAR and multi-spectral imagery for tree crown delineation

5.10 Illustration of ROC space analysis

5.11 Examples of texture images and their PSF features

5.12 Analysis of different feature descriptors in ROC space

5.13 Classification accuracies of the fused features at different dimensions

List of Tables

5.1 Quantitative comparison of three segmentation algorithms

5.2 Quantitative analysis of individual tree crown detection and delineation

5.3 A confusion matrix

5.4 Accuracies of PSF and LBP in texture classification (in percent)

5.5 Averaging computational costs of PSF and LBP (in seconds)

5.6 Overall classification accuracies of PSF and texture features (in percent)

5.7 Overall classification accuracies of colour histogram and PSF features in multiple spectral bands (in percent)

5.8 The classification results of single and fused colour and texture features

5.9 The confusion matrix of SVM classification using the fused feature

5.10 The classification results of single and fused PSF-HSV and LBP features

List of Abbreviations

GEOBIA Geographic Object-based Image Analysis

LiDAR Light Detection and Ranging

UASs Unmanned Aerial Systems

GSD Ground Sample Distance

AGL Above Ground Level

NIR Near Infrared

CIR Colour Infrared

VI Vegetation Index

NDVI Normalized Difference Vegetation Index

SAVI Soil Adjusted Vegetation Index

EVI2 2-band Enhanced Vegetation Index

RVI Ratio Vegetation Index

PCNN Pulse Coupled Neural Network

LBP Local Binary Pattern

GLCM Grey-level Co-occurrence Matrix

PSF Pulse Spectral Frequency

MLP Multilayer Perceptron Neural Network

DTF Decision Tree Forest

SVM Support Vector Machine

PCA Principal Component Analysis

MLE Maximum Likelihood Estimator

ROC Receiver Operating Characteristic

FPR False Positive Rate

TPR True Positive Rate

Chapter 1

Introduction

Vegetation management in power line corridors is essential for the preservation of public safety, the environment and the reliability of electricity supply. Remote sensing technologies have great potential to provide a more reliable and cost-effective solution. In this chapter, the background, the research objectives and outcomes as well as the outline of this thesis are introduced.

1.1 Research Background

1.1.1 Vegetation Management in Power Line Corridors

Surveillance and maintenance of electrical infrastructure is a critical issue for the reliability of electricity transmission. One of the most important tasks is the vegetation management of power line corridors. Efficient vegetation management not only reduces the overall cost but also aids in continuous electricity supply. Ineffective vegetation management can lead to a loss of reliability in electricity transmission and produce serious hazards. For example, contact between trees and power lines caused power outages and significant wildland fires in Canada and the USA in 2003 (Appelt & Goodfellow, 2004; Beck & Mathieu, 2004).

Management of vegetation around power lines is essential for preservation of public safety, the environment and reliability of electricity supply. Vegetation management, including tree trimming and vegetation control, is a significant cost component of the maintenance of electrical infrastructure. For example, Ergon Energy, Australia’s largest energy distributor by geographic footprint, currently spends over $80 million a year inspecting and managing vegetation that encroaches on power line assets. Ergon Energy maintains one of the largest electrical distribution systems in the world, covering over one million square kilometers, including approximately 150,000 kilometers of poles and wires. Queensland is subject to extreme weather conditions, ranging from drought to cyclones. Dry conditions increase the risk of fire. Strong winds and waterlogged ground can result in trees falling across, and bringing down, power lines, especially when inappropriate vegetation species are growing too close to power lines. Correct and efficient vegetation management not only reduces the overall cost but also aids in continuous electricity supply by preventing damage to power lines through removal of tall-growing trees. Ineffective procedures can result in the loss of reliability in electricity transmission, produce serious hazards and expose electrical companies to significant financial penalties.

The reliability of electricity supply and distribution is of the highest priority in power line corridor monitoring. A short-term strategy is to identify and remove nearby objects (i.e. buildings and vegetation) around power lines. In urban areas, vegetation encroachment is less serious than in rural areas as access is much easier and prompt maintenance can be achieved. Moreover, local councils and private land owners regularly maintain their trees, facilitating the overall maintenance process. Generally, the risk of man-made structures can be controlled through building regulations. However, in rural areas, inspection and maintenance become difficult due to limited access and the large distances to cover. Vegetation grows naturally, and particularly in rural areas the growth is unmanaged. Strong winds and storms can bring branches or even entire trees into contact with power lines. These problems motivate the detection of all vegetation that has the potential to pose risks to power lines, and the use of this information to guide field workers in vegetation clearance in the corridors.

Ergon Energy has a long-term strategy of managing vegetation according to species. The species of interest can be generally categorized as desirable species and undesirable species. Species with fast growth rates that also have the potential to reach a mature height of more than four meters are defined as undesirable species (Wood, 2007). These undesirable species often pose high risks to electrical infrastructure and therefore should be identified and removed. It is also worth mentioning that in the long-term maintenance strategy, low-growing trees or shrubs are encouraged because they are expected to compete with tall-growing species and deprive immature taller trees of light and nutrients. These low-growing species, along with rare and endangered species, are defined as desirable species that should be managed differently.

In rural areas, traditional calendar-based tree trimming is performed by contractors. This process is time-consuming, labor-intensive and expensive. It also results in some zones being trimmed more frequently than needed and others not being cut often enough. Satellites and aerial vehicles can pass over more regularly and automatically than ground patrols and, therefore, remote sensing approaches have great potential in assisting vegetation management in power line corridors.

1.1.2 Advanced Remote Sensing Techniques

Remote sensing is the stand-off collection of information on a given object or area through the use of a variety of devices [1]. Aerial vehicles have been used intensively in power line inspection for a long time. The present practice is to fly helicopters or airplanes along the corridor and try to identify dangerous trees and assess the condition of overhead line assets by visual observation. Such visual inspection is time-consuming and labour-intensive. Over the past decades, a number of ideas have come forth seeking to reduce this workload. These include improved data collection using satellite sensors (Kobayashi et al., 2009; Beltrame et al., 2007), airborne laser scanning systems (Lu & Kieloch, 2008; Clode & Rottensteiner, 2005), stereo vision systems (Sun et al., 2006), and unmanned aerial systems (UASs) (Jones et al., 2005).

[1] http://en.wikipedia.org/wiki/Remote_sensing

Satellites and aircraft are the most widely used platforms for remote sensing in earth observation data collection. Current satellite sensors are not the best choice for monitoring power line corridors due to two critical limitations: the unfavorable revisit time and the lack of choice in optimum spatial and spectral resolutions. At the most practical level, most data gathered from satellites are available only on predetermined schedules, and even those with an “on-demand” capability are limited by their orbits and the demands of other users. In contrast, airborne data collection offers a much greater level of flexibility. An airborne system can capture data at any time of the day, whereas satellites generally pass over a given site at the same time each day. Another advantage of airborne platforms is that different sensor payloads can be easily fitted, while the sensors launched on a satellite usually cannot be changed. As a consequence, airborne systems can be regularly upgraded as sensor technology advances. Improvements to sensors include systems with higher spectral and spatial resolution, and advanced microwave or LiDAR sensors. In addition, higher spatial resolutions are easier to obtain from airborne platforms due to their low altitude. A limitation which impedes large-scale airborne remote sensing applications is that traditional piloted airborne platforms involve high operational costs. Moreover, using piloted aircraft for power line inspection places the operators at a greater level of risk.

Remote sensors mounted on unmanned aerial systems (UASs) could fill this gap, providing a cheap and flexible way to gather spatial data from power line corridors which can also meet the requirements of spatial, spectral, and temporal resolutions. Recent developments in the aerial vehicles themselves and the associated sensing systems make UAS platforms increasingly attractive for both research and operational mapping (Berni et al., 2009; Gurtner et al., 2009). One of the main limitations of UASs is their limited ability to carry power-demanding and heavy payloads. Airborne laser scanning systems (LiDAR) are too heavy for small and medium-sized UAS platforms. This limitation may be overcome in the near future as smaller LiDAR systems become more readily available in the market. However, the quality of the data collected by these units currently falls well short of that of their full-sized counterparts. When combined with LiDAR-type systems, UASs would represent the technology of choice for future uses.

Aerial remote sensing is a fast and cost-effective technique for power line corridor mapping. There are some commercial data providers available in the market that can collect high quality LiDAR data and referenced imagery for general mapping purposes. Compared to corridor mapping, automated and intelligent information extraction from remotely sensed data is more challenging. One special need for power line corridor monitoring is to detect the objects of interest for further interpretation and decision making. The major objects of interest include power line assets and vegetation. Automated data processing aims to automatically detect these objects from aerial imagery, and to extract more specific information such as vegetation species and height information.

1.1.3 Biologically Inspired Image Processing

Biology offers a great model for mimicking, copying and learning, and also serves as inspiration for many new technologies (Bar-Cohen, 2006). Humans have learned much from biology and the results offer enormous potential for inspiring new capabilities for exciting technologies. There are numerous examples of the success of biologically inspired technologies. Examples in engineering include the hulls of boats imitating the thick skin of dolphins; and sonar, radar, and medical ultrasound imaging imitating the echolocation of bats. In the field of computer science, great success has also been achieved by using biologically inspired technologies. As a branch of computer science, artificial intelligence tries to understand the mechanisms underlying thought and intelligent behavior and studies the computational requirements for developing systems that perform such tasks as perception, reasoning and learning. For example, artificial immune systems have been applied to protect computers from malicious viruses; artificial neural networks have been used for weather forecasting and data classification; and evolutionary algorithms have been successfully used for a number of optimization processes.

Image processing has been a science for decades. Since fast and cheap computers and signal processors became available, digital image processing has become the most common form of image processing, and it is generally used because it is not only efficient but also the cheapest. However, even though many scientists are working in this area and numerous computer-based image processing algorithms have been developed, progress towards achieving recognition capability similar to that of humans has been very slow. Computer algorithms can perform specific functions well but it is difficult for machines to perform robust recognition tasks. In contrast, the human vision system has an outstanding ability to recognize and classify objects in a variety of complex environments. For example, humans can recognize different plant species after seeing only a few examples, even when the species are very similar. Humans can also recognize cars of different sizes, shapes and colours. This excellent recognition ability also holds true for many other animals. It is obvious that humans use many elegantly structured processes to achieve their image processing goals and we are beginning to understand only a few of them (Lindblad & Kinser, 2005). Current computer algorithms are incredibly simple compared to what we know of the biological systems, and the algorithms fail in attempting to perform image recognition at the level of a human. Therefore, emulation of some biological systems is necessary to advance current computer vision systems.

One important step towards bio-inspired computer vision systems is to emulate the processes of the visual cortex. In the past decades, visual cortex theory has been intensively studied and many computational models have been proposed (Hubel, 1998; Ng et al., 2007). Although there still exists significant debate on the theory, the known processes of the visual cortex have already led to new tools in image processing and recognition. This thesis focuses on the theory and application of one spiking neural network model: the pulse coupled neural network (PCNN). Spiking neural networks are considered the third generation of neural network models, which increase the level of realism in a neural simulation (Gerstner, 2001). Biological neurons use short and sudden increases in voltage to send information. These signals are more commonly known as action potentials, spikes or pulses. Neurological research has shown that neurons encode information in the timing of single spikes, and not just in their average firing frequency (Vreeken, 2003). Spiking neural networks raise the level of biological realism by using individual spikes, which allows spatial-temporal information to be incorporated in communication and computation, as real neurons do. A PCNN is a kind of spiking neural network developed based on the Eckhorn model, which is inspired by the phenomenon of synchronous pulse bursts in the cat visual cortex (Eckhorn et al., 1989). PCNNs are spatial-temporal-coding models which attract much attention from researchers because they mimic real neurons better and have more powerful computational performance than traditional neural network models due to the use of time. PCNNs can be applied to a variety of image processing applications, such as image segmentation, edge detection, feature generation and noise reduction (Lindblad & Kinser, 2005). More discussion of this biologically inspired image processing model is given in the following chapters of this thesis.
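To make the pulse-generation mechanism described above more concrete, the following is a minimal sketch of a generic discrete-time PCNN in the form commonly described in the PCNN literature. It is not the simplified model developed in this thesis. The function names (`pcnn`, `neighbour_sum`), the 3x3 linking kernel, the initial threshold and all parameter values are illustrative assumptions.

```python
import numpy as np


def neighbour_sum(Y, kernel):
    """Weighted sum of each pixel's 3x3 neighbourhood, with zero padding."""
    padded = np.pad(Y, 1, mode="constant")
    rows, cols = Y.shape
    out = np.zeros_like(Y, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + rows, dj:dj + cols]
    return out


def pcnn(S, n_iter=10, beta=0.2, a_F=0.1, a_L=1.0, a_T=0.5,
         V_F=0.5, V_L=0.2, V_T=20.0):
    """Run a generic PCNN on a 2D stimulus S (assumed scaled to [0, 1]).

    Returns an array of shape (n_iter, rows, cols) holding the binary
    pulse image produced at each iteration. All defaults are assumed,
    illustrative values rather than tuned settings.
    """
    S = np.asarray(S, dtype=float)
    # Assumed linking weights: closer neighbours couple more strongly.
    kernel = np.array([[0.5, 1.0, 0.5],
                       [1.0, 0.0, 1.0],
                       [0.5, 1.0, 0.5]])
    F = np.zeros_like(S)      # feeding input
    L = np.zeros_like(S)      # linking input
    Y = np.zeros_like(S)      # pulse output (0 or 1)
    theta = np.ones_like(S)   # dynamic threshold (assumed initial value)
    pulses = []
    for _ in range(n_iter):
        K = neighbour_sum(Y, kernel)          # pulses received from neighbours
        F = np.exp(-a_F) * F + S + V_F * K    # leaky integration of stimulus
        L = np.exp(-a_L) * L + V_L * K        # leaky integration of linking
        U = F * (1.0 + beta * L)              # internal activity
        Y = (U > theta).astype(float)         # neuron fires above threshold
        theta = np.exp(-a_T) * theta + V_T * Y  # decay, then jump after firing
        pulses.append(Y.copy())
    return np.stack(pulses)


if __name__ == "__main__":
    # Toy stimulus: a bright 2x2 patch on a dark background.
    S = np.array([[0.9, 0.9, 0.1],
                  [0.9, 0.9, 0.1],
                  [0.1, 0.1, 0.1]])
    for n, Y in enumerate(pcnn(S, n_iter=5)):
        print(f"iteration {n}:\n{Y.astype(int)}")
```

In this sketch, brighter pixels cross the decaying threshold earlier, and a firing pixel raises the internal activity of its neighbours through the linking term, which encourages neighbouring pixels of similar intensity to pulse together.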

1.2 Research Objectives

The overall objective of this study is to develop novel and effective computer vision algorithms in the context of an aerial remote sensing application: vegetation management in power line corridors. Addressing the problem in this specific application requires solving a number of sub-problems spanning many disciplines including remote sensing, image processing and machine learning. A wide range of domain knowledge is also required to better understand the problems. Specifically, the objectives of this research project are:

• To review the existing technologies for power line corridor monitoring and to identify appropriate platforms and sensors for operational data collection;

• To develop new and effective algorithms for object detection and segmentation from the collected remote sensing data;

• To develop new feature extraction methods using biologically inspired spiking cortical models which can better model the objects of interest in the specific classification tasks.

1.3 Research Outcomes and Contributions

This research is the first comprehensive study of using remote sensing techniques in

power line corridor vegetation management. Theoretically, the thesis focuses on a

biologically inspired spiking neural network model: the pulse coupled neural network

(PCNN). The original PCNN model was simplified in order to better analyze the

pulse dynamics and control the performance. New and effective algorithms were

developed from this simplified spiking neural network model for object detection,

image segmentation and invariant feature extraction. Several journal and conference

publications were also generated from this project. Operationally, the knowledge gained

from this research project offers a good reference to our industry partner (i.e. Ergon

Energy) and other energy utilities who want to improve their vegetation management

activities. The novel approaches described in this thesis show the potential of using

cutting-edge technologies to reduce the cost of power line corridor monitoring and

will help the energy companies to improve their traditional vegetation management

strategy.

• Several aerial platforms and sensors were evaluated in order to collect high-

quality data in power line corridors. High spatial resolution natural colour and

multi-spectral imagery as well as laser scanning data were collected using piloted

and unmanned aerial platforms. These data provide a perfect simulation

environment for developing and evaluating the data processing algorithms.

• A novel method is developed specifically for power line detection from aerial

images. A pulse coupled neural filter is developed to remove the background

noise and generate an edge map prior to the Hough transform being employed

to detect straight lines. An improved Hough transform is used by performing

knowledge-based line clustering in Hough space to refine the detection results.

• Individual tree crown detection and delineation from multi-spectral imagery is

achieved by applying a PCNN in spectral feature space and a watershed algo-

rithm in the post-processing stage. Multi-spectral imagery and laser scanning

data are combined through a region-level fusion method to further improve the

segmentation results.

• A biologically inspired spectral-texture feature extraction method is developed

by using pulse spectral frequency (PSF) of a pulse coupled neural network. The

PSF feature is rotation and scale invariant and it has been evaluated against

several classic feature descriptors in texture classification as well as vegetation

species classification.
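
The power line detection contribution above relies on the Hough transform. The following minimal sketch shows only the standard transform, followed by a simple grouping of near-parallel lines. The grouping rule (keep the largest set of lines sharing one orientation) is an assumed stand-in for the knowledge-based clustering and is not the method developed in this thesis. Function names, parameter values and the synthetic edge points are hypothetical.

```python
import math
from collections import Counter


def hough_lines(points, min_votes, n_theta=180, rho_step=1.0):
    """Standard Hough transform: rho = x*cos(theta) + y*sin(theta).

    Returns (theta_deg, rho, votes) for every accumulator cell that
    collects at least min_votes edge points.
    """
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    return [
        (t * 180.0 / n_theta, rho_bin * rho_step, count)
        for (t, rho_bin), count in votes.items()
        if count >= min_votes
    ]


def keep_dominant_direction(lines, tol_deg=2.0):
    """Keep the largest group of lines with similar orientation.

    Assumption for illustration: power lines appear as near-parallel
    lines. Angle wrap-around at 0/180 degrees is ignored here.
    """
    best = []
    for theta, _, _ in lines:
        group = [line for line in lines if abs(line[0] - theta) <= tol_deg]
        if len(group) > len(best):
            best = group
    return best


# Synthetic edge points: two horizontal "power lines" and one diagonal edge.
edge_points = (
    {(x, 5) for x in range(60)}
    | {(x, 15) for x in range(60)}
    | {(i, i) for i in range(60)}
)
detected = hough_lines(edge_points, min_votes=50)
power_lines = keep_dominant_direction(detected)
```

The plain transform reports all three straight edges, while the orientation grouping keeps only the two parallel ones. A practical system would need a richer rule than this, since other linear features such as roads can also be parallel.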

1.4 Publications

Journal Papers

[1] Z. Li, Y. Liu, R. Walker, R. Hayward, J. Zhang, "Towards automatic power

line detection for a UAV surveillance system using pulse coupled neural filter and an

improved Hough transform," Machine Vision and Applications, vol. 21, pp. 677-686,

2010.

[2] S. Mills, M. Gerardo, Z. Li, J. Cai, R. Hayward, L. Mejias, R. Walker, "Eval-

uation of aerial remote sensing techniques for vegetation management in power line

corridors," IEEE Transactions on Geoscience and Remote Sensing, vol. 48, pp. 3379-

3390, 2010.

[3] Z. Li, R. Hayward, R. Walker, Y. Liu, "A Biologically Inspired Object Spectral-

Texture Descriptor and Its Application to Vegetation Classification in Power Line

Corridors," IEEE Geoscience and Remote Sensing Letters, vol. 8, pp. 631-635, 2011.

[4] Z. Li, R. Hayward, Y. Liu, R. Walker, "Spectral and Texture Feature Extraction

Using Statistical Moments with Application to Object-based Vegetation Species

Classification," International Journal of Image and Data Fusion (Accepted, in press,

2011).

Peer-reviewed Conference Papers

[1] Z. Li, R. Hayward, J. Zhang, Y. Liu, "Individual tree crown delineation tech-

niques for vegetation management in power line corridor," Proceedings of the In-

ternational Conference on Digital Image Computing: Techniques and Applications

(DICTA), Canberra, 2008.

[2] Z. Li, Y. Liu, R. Hayward, J. Zhang, J. Cai, "Knowledge-based power line

detection for UAV surveillance and inspection systems," Proceedings of the 23rd

International Conference on Image and Vision Computing New Zealand (IVCNZ),

Christchurch, 2008.

[3] Z. Li, R. Hayward, J. Zhang, Y. Liu, R. Walker, "Towards automatic tree

crown detection and delineation in spectral feature space using PCNN and morpho-

logical reconstruction," Proceedings of the IEEE International Conference on Image

Processing (ICIP), Cairo, 2009.

[4] Y. Liu, Z. Li, R. Hayward, R. Walker, H. Jin, "Classification of airborne lidar

intensity data using statistical analysis and Hough transform with application to

power line corridors," Proceedings of the International Conference on Digital Image

Computing: Techniques and Applications (DICTA), Melbourne, 2009.

[5] H. Jin, Y. Feng, Z. Li, "Extraction of road lanes from high-resolution stereo

aerial imagery based on maximum likelihood segmentation and texture enhancement,"

Proceedings of the International Conference on Digital Image Computing: Techniques

and Applications (DICTA), Melbourne, 2009.

[6] Z. Li, R. Hayward, J. Zhang, H. Jin, R. Walker, "Evaluation of spectral

and texture features for object-based vegetation species classification using support

vector machines," International Archives of the Photogrammetry, Remote Sensing and

Spatial Information Sciences, vol. XXXVIII, Part 7A (ISPRS TC VII Symposium -

100 Years ISPRS), Vienna, 2010.

[7] Z. Li, Y. Liu, R. Hayward, R. Walker, "Empirical comparison of machine

learning algorithms for image texture classification with application to vegetation

management in power line corridors," International Archives of the Photogrammetry,

Remote Sensing and Spatial Information Sciences, vol. XXXVIII, Part 7A (ISPRS

TC VII Symposium - 100 Years ISPRS), Vienna, 2010.

[8] Z. Li, Y. Liu, R. Hayward, R. Walker, "Color and texture feature fusion

using kernel PCA with application to object-based vegetation species classification,"

Proceedings of the IEEE International Conference on Image Processing (ICIP), Hong

Kong, 2010. (IEEE Signal Processing Society Travel Grant)

[9] Z. Li, R. Walker, R. Hayward, L. Mejias, "Advances in vegetation management

for power line corridor monitoring using aerial remote sensing techniques," Proceedings

of the First International Conference on Applied Robotics for the Power Industry

(CARPI), Montreal, 2010.

1.5 Thesis Structure

This thesis is structured in the following manner:

Chapter 1 provides an overview of the background, objectives and outcomes of

this research.

Chapter 2 briefly reviews the advantages and disadvantages of current sensors and

remote sensing platforms. The aerial and ground survey data collection trials for this

study and the characteristics of the collected data are also described in this chapter.

Chapter 3 explains the concepts in geographic object based image analysis and

pulse coupled neural networks. The technical details of the developed algorithms for

power line detection and individual tree crown segmentation are given in this chapter.

Chapter 4 discusses visual feature extraction and machine learning techniques

in image classification tasks. The idea of using pulse spectral frequency of PCNN as

spectral-texture feature descriptors is presented. The developed colour and texture

feature fusion based on kernel principal component analysis is also presented in this

chapter.

Chapter 5 presents and discusses the results of a series of experiments conducted

to evaluate the effectiveness of the developed algorithms.

Chapter 6 concludes this dissertation with (i) a summary of the lessons learnt

throughout the thesis and the novel contributions that were made in this study and

(ii) an outline of some new research directions which are considered important to

further advance the methods in this study and apply them to real applications.

Chapter 2

Remote Sensing Data Collection

Remote sensing is an effective tool for land cover mapping in large areas. In the

past few years, an increasing number of sensors and platforms have become available for

people seeking to map land cover, forest structure and changes in the earth’s surface.

The choices involved in the selection of a remote sensing data type are increasingly

complicated.

Associated with every research project is the collection of adequate data for veri-

fication and validation of research outcomes. As a part of the CRCSI project 6.07, a

review of the current remote sensing platforms and sensors was conducted in order to

suggest the best data capture solutions. Commercial data providers were employed

in the data collection and the access to such data is highly beneficial given that

the verification and validation will correspond to actual needs of the industry. The

application specific nature of this research project also requires real world practical

verification.

2.1 Remote Sensing Platforms

Traditional aerial survey in power line corridor monitoring employs helicopter patrols

to fly over the network, trying to identify dangerous trees and assess the condition of

overhead line assets by visual observation. However, it is not possible to consistently

and accurately determine the distance between vegetation and powerlines by human

eye (Ituen et al., 2008). The aerial survey often needs to be supplemented with

ground patrols, which makes it even more time consuming and labour intensive. A

better approach to aerial survey is to map the corridor by collecting data using LiDAR

sensors and geo-referenced cameras. Afterwards, computer systems can be used in the

automated data processing and analysis to provide information for decision making.

Over the past decades, a number of ideas have come forth seeking to collect remote

sensing data in power line corridor mapping. These include improved data collection

using satellite sensors (Kobayashi et al., 2009; Beltrame et al., 2007), airborne laser

scanning systems (Lu & Kieloch, 2008; Clode & Rottensteiner, 2005), airborne stereo

vision systems (Sun et al., 2006), and unmanned aerial systems (UASs) (Jones et al.,

2005).

Several satellite platforms such as IKONOS and QuickBird can be used to collect

high spatial resolution earth observation data. The IKONOS satellite is the world’s

first commercial satellite to collect panchromatic images with one meter ground sam-

ple distance (GSD) and multi-spectral imagery with 4 meter GSD. QuickBird is the

most widely used high-resolution commercial earth observation satellite. This satel-

lite collects panchromatic imagery at 60-70 centimeter resolution and multi-spectral

imagery at 2.4- and 2.8-meter resolutions. However, it is noted that these satellite

platforms are not the best choice for monitoring power line corridors due to two crit-

ical limitations: the unfavorable revisit time and lack of choices in optimum spatial

and spectral resolutions. Satellite data are only available on predetermined sched-

ules, which is not flexible for the monitoring of power line corridors. Moreover, it is

not possible to capture very small objects (e.g. power lines) and detailed textures of

ground objects (e.g. individual tree crowns) using current satellite data. Airborne

data collection offers a much greater level of flexibility. Moreover, airborne systems

can be regularly upgraded as sensor technology advances, for example with higher spectral and

spatial resolution or with advanced microwave or LiDAR sensors. A limitation which impedes

large-scale airborne remote sensing applications is that the traditional piloted

airborne platforms involve high operational costs. Moreover, using piloted aircraft

for power line inspection will place the operators at a greater level of risk.

Unmanned aerial systems (UASs) could fill this gap, providing a cheap and flexible

way to gather spatial data from power line corridors. Traditionally the use of UASs

has been limited to military applications. As these military systems grow in matu-

rity, a number of UAV systems with various onboard sensors have been developed

for civilian applications such as homeland security, forest fire monitoring, rapid

response measurements in disaster emergencies, Earth science research, volcanic gas

sampling, humanitarian observations, and monitoring of gas pipelines (Zhou et al.,

2009). Recent developments in the aerial vehicles themselves and the associated sensing

systems make UAS platforms increasingly attractive for both research and operational

mapping (Berni et al., 2009; Gurtner et al., 2009). One of the main limitations of

using UASs is their limited ability to carry power-demanding and heavy payloads. Airborne

laser scanning systems (LiDAR) are too heavy for small/medium sized UAS plat-

forms. This limitation may be overcome in the near future as smaller LiDAR systems

suitable for UASs become more readily available in the market.

2.2 Remotely Sensed Data

There are two kinds of remote sensing: passive remote sensing and active remote

sensing. Passive sensors detect natural radiation that is emitted or reflected by the

object or surrounding area being observed. Reflected sunlight is the most common

source of radiation measured by passive sensors. Optical remote sensing images such

as satellite and airborne multi-spectral imagery are collected from passive sensors.

Active remote sensing, on the other hand, emits energy in order to scan objects and areas,

whereupon a sensor detects and measures the radiation that is reflected

or backscattered from the target. Light detection and ranging (LiDAR) is an example

of active remote sensing.

2.2.1 Optical Remote Sensing Imagery

The spatial resolution and spectral resolution are the two most important characteristics

of optical remote sensing imagery. Spatial resolution, commonly referred to as

“pixel size” in digital images, has a close relationship with the information content

that can be extracted from the image. Remote sensing images are usually divided into

two categories: high-resolution and low-resolution. Imagery whose spatial resolution

is coarser than the object to be extracted is low-resolution. Low-resolution images are

widely used for land cover classification at a large scale. Detailed information such

as individual trees cannot be extracted from low-resolution images because when a

forested area becomes the central scene, one pixel may represent multiple trees and

their surroundings. In contrast, high-resolution images will contain multiple pixels

for each object, which increases the variance within the image. Higher resolution is not always

necessary. For instance, in an image with trees represented by multiple pixels,

each pixel may be shadowed or sunlit, young foliage or old, or may contain one of a

variety of different understory components. If a forest map is needed, rules should be

developed to incorporate all of these cover types into one. As a consequence, images

with a spatial resolution near the size of the object of interest are usually preferred

(Lefsky & Cohen, 2003).
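
The relationship between pixel size and object size can be made concrete with a small calculation. The sketch below uses ground sample distances quoted in this chapter (4 m and 2.4 m for satellite multi-spectral imagery, 0.15 m for the airborne multi-spectral imagery). The 5 m crown diameter and the function names are hypothetical values chosen only for illustration.

```python
def pixels_per_object(object_size_m, gsd_m):
    """Approximate pixel count for a square object of the given side length."""
    return (object_size_m / gsd_m) ** 2


def resolution_class(object_size_m, gsd_m):
    """Label imagery relative to an object: 'low' if a pixel is coarser than it."""
    return "low" if gsd_m > object_size_m else "high"


CROWN_DIAMETER_M = 5.0  # hypothetical tree crown size, not a measured value
for gsd in (4.0, 2.4, 0.15):
    count = pixels_per_object(CROWN_DIAMETER_M, gsd)
    print(f"GSD {gsd} m: {resolution_class(CROWN_DIAMETER_M, gsd)}-resolution, "
          f"about {count:.0f} pixels per crown")
```

Under these assumptions all three data sources count as high-resolution for a tree crown, but the number of pixels per crown differs by several orders of magnitude, which is where the within-object variance discussed above comes from.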

Spectral resolution is the richness of spectral information in optical remote sens-

ing imagery. Different materials reflect and absorb differently at different wavelengths

(Figure 2.1; source: www.crisp.nus.edu.sg). Spectral features are the specific combination of reflected and absorbed

Figure 2.1: Optical passive remote sensing from satellite platforms

electromagnetic radiation at varying wavelengths which can uniquely identify an ob-

ject. Increasing the number of spectral bands would seem to be an obvious way to

improve prediction of forest attributes (Lefsky & Cohen, 2003). Multi-spectral sensors

are one strategy for obtaining spectral data. To efficiently record most variation, they

break the spectra into a few bands where there are significant differences between the

spectra of a wide range of materials. As technology improves and more data are recorded,

hyper-spectral sensors are more commonly used. These sensors record the intensities

of many wavelengths of light being reflected from the same source. Hyper-spectral

sensors split the reflected light energy at the sensor into many separate, narrow chan-

nels on a pixel-by-pixel basis, and are developed with the assumption that improved

identification of particular spectral features will lead to improved discrimination of

cover attributes. Hyper-spectral imagery makes discernment of an area’s composition

through spectral response discrimination more effective (Aardt, 2000).
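
How a multi-spectral sensor summarises a spectrum can be sketched by averaging a finely sampled reflectance curve over a few broad bands. The example below is illustrative only: the reflectance curve is synthetic (a crude step near 700 nm standing in for vegetation), the band limits are those quoted for the multi-spectral camera in Section 2.3.1, and the function name is hypothetical.

```python
def band_means(wavelengths_nm, reflectance, bands):
    """Average a sampled spectrum over each (low, high) wavelength band."""
    means = {}
    for name, (low, high) in bands.items():
        values = [r for w, r in zip(wavelengths_nm, reflectance) if low <= w <= high]
        means[name] = sum(values) / len(values)
    return means


# Band limits in nanometres, as listed for the camera in Section 2.3.1.
BANDS_NM = {
    "blue": (460, 545),
    "green": (540, 640),
    "red": (670, 840),
    "nir": (800, 966),
}

# Synthetic spectrum sampled every 10 nm: low reflectance below 700 nm,
# high reflectance above it. These are not measured values.
wavelengths = list(range(400, 1001, 10))
spectrum = [0.05 if w < 700 else 0.5 for w in wavelengths]

multispectral = band_means(wavelengths, spectrum, BANDS_NM)
```

Sixty-one narrow samples are reduced to four numbers here. A hyper-spectral sensor would keep something closer to the full sampled curve, which is the trade-off discussed above.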

2.2.2 Airborne Laser Scanning Data

Light Detection and Ranging (LiDAR) is an active remote sensing technique

that measures the properties of scattered light to find range or other information

about a distant target. Airborne laser scanning is an application of LiDAR techniques

which captures and records the geometry and sometimes textural information of the visible

surface of ground objects. It is a relatively young 3D measurement technique

offering much potential in topographic and mapping operations to capture precise

and reliable 3D geodata. Traditionally, the primary use of LiDAR data is to obtain

altitude data and generate digital terrain models (DTM). In recent years, however,

the range of applications in which laser scanning can be used has greatly broadened.

With the advancement of sensor technology, the achievable resolution of point clouds

has made it possible to map individual trees and power lines from airborne laser scanning

data.

An airborne laser scanning system has two main components: a laser scanner

which measures the distance to a spot on the ground illuminated by the laser and a

GPS/IMU combination to measure exactly the position and orientation of the system

(Beraldin et al., 2010). Figure 2.2 (Beraldin et al., 2010) illustrates the basic compo-

nents and principle of airborne laser scanning. The laser scanner, mounted over a hole

in the aircraft’s fuselage, continuously sends laser pulses towards the terrain as the

aircraft flies. A GPS antenna and an inertial measurement unit (IMU) are used to record

the position and orientation of the system, and these data also allow the reconstruction

of the flight path. There is also a control and data recording unit inside the system

which is responsible for time synchronization and the control of the whole system.
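
How the range and the position measurements combine into a ground point can be sketched as follows. This is a simplified illustration rather than the processing chain of any particular system: it assumes pulsed time-of-flight ranging, a level aircraft flying along the y axis, and a scan angle measured across-track from nadir, so the roll, pitch and heading that the IMU would normally supply are ignored. All names are hypothetical.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def range_from_round_trip(time_s):
    """Range to the target from the round-trip travel time of a laser pulse."""
    return SPEED_OF_LIGHT_M_S * time_s / 2.0


def ground_point(sensor_xyz, range_m, scan_angle_deg):
    """Ground coordinates of a return for a level aircraft (attitude ignored).

    sensor_xyz is the sensor position (x across-track, y along-track, z up),
    scan_angle_deg is measured from nadir in the across-track plane.
    """
    x0, y0, z0 = sensor_xyz
    angle = math.radians(scan_angle_deg)
    return (x0 + range_m * math.sin(angle), y0, z0 - range_m * math.cos(angle))


# A pulse fired straight down from 500 m returns after roughly 3.3 microseconds.
nadir_range = range_from_round_trip(2 * 500.0 / SPEED_OF_LIGHT_M_S)
nadir_point = ground_point((0.0, 0.0, 500.0), 500.0, 0.0)
```

In a real system the scan direction would first be rotated by the attitude recorded by the IMU, which is why the time synchronization between the scanner and the GPS/IMU data matters.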

Modern laser scanners generate up to 300 000 laser pulses per second and produce

about 20 Gbyte of ranging data per hour (Beraldin et al., 2010). Depending on the

aircraft velocity and survey height, the densities of LiDAR point clouds vary between

0.2 and 50 points per square meter. Typically, commercial airborne laser scanning

Figure 2.2: The basic components and principle of airborne laser scanning

systems for land mapping applications operate at wavelengths between 800 and 1550

nm and a spectral width between 0.1 and 0.5 nm. Since the reflectivity of an object

depends on the wavelength, selection of a suitable wavelength should also be consid-

ered in LiDAR mapping applications. For example, water surfaces will rarely be seen

by laser scanners operating in the visible part of the spectrum because water will absorb

most of the laser energy.
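
The dependence of point density on flying speed and height can be approximated with a first-order estimate: the pulses emitted per second are spread over the strip of ground swept per second. The sketch below assumes flat terrain, a single pass, one return per pulse and uniform coverage of the swath. The function name and the parameter values in the example are hypothetical and do not describe the surveys in this thesis.

```python
import math


def point_density(pulse_rate_hz, speed_m_s, height_m, half_scan_angle_deg):
    """Estimate points per square metre for a single flight strip."""
    swath_width_m = 2.0 * height_m * math.tan(math.radians(half_scan_angle_deg))
    area_per_second_m2 = speed_m_s * swath_width_m
    return pulse_rate_hz / area_per_second_m2


# Hypothetical survey: 100 kHz pulse rate, 60 m/s, +/-20 degree scan angle.
density_at_1000m = point_density(100_000, 60.0, 1000.0, 20.0)
density_at_500m = point_density(100_000, 60.0, 500.0, 20.0)
```

Under these assumptions, halving the survey height halves the swath width and therefore doubles the density. Real surveys deviate from this estimate through strip overlap, multiple returns and non-uniform scan patterns.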

2.3 Data Collection in Power Line Corridors

Data collection for this study included gathering aerial data for sensor evaluation,

as well as conducting a ground survey for verification. Numerous flight tests were

conducted and a large amount of aerial data was collected. A field survey was also

conducted to evaluate the classification of individual tree species.

Figure 2.3: The UAS platforms and examples of the collected data

2.3.1 Aerial Data

The first series of flights was conducted to evaluate the capability of UASs in data

collection. Numerous flight tests were performed using two different UASs carrying

different sensor payloads. Two UAS platforms were used: V-TOL Aerospace BAT-3,

and the ARCAA UAS platform Eleanor. The sensors on board the two UAS platforms

were colour digital cameras (Canon IXUS 960IS and Canon 350D DSLR). The flight

tests resulted in over 3 hours of video footage and over one thousand high resolution

images covering approximately 20 km of typical power line. The spatial resolutions

of these images vary depending on different flight heights. Figure 2.3 shows the two

UAS platforms and examples of the collected image data.

The second series of flights was conducted to evaluate commercial data collected

from piloted aircraft. As no single provider could capture the full range of sensor

data required for the experiment, two separate providers were contracted to collect

data over the test site.

(1) Provider of multi-spectral imagery

The multi-spectral flights took place on the 25th of November 2008 and were operated by a Queensland-based

company contracted to collect multi-spectral data. Their system consisted of

a DuncanTech MS-4100 multi-spectral camera with DGPS/INS. This equipment was

mounted in the cargo area of a Piper Cub as pictured in Figure 2.4(a). Multi-spectral

data were captured over 4 spectral bands: NIR (800-966 nm), red (670-840 nm), green

(540-640 nm) and blue (460-545 nm). With the aircraft traveling at approximately 34 m/s (65 knots) at an

altitude of 350 m AGL, multi-spectral images were captured at approximately 15 cm

GSD.

(2) Provider of LiDAR data

The second data provider was also a local company supplying aerial mapping,

LiDAR and other GIS services to government and industry. Their system consisted

of an Integrated LiDAR and Digital Photography System mounted in the cargo area

of a modified Cessna U206G as shown in Figure 2.4(b). The flight for data collection

occurred on the 9th of December 2008, during which the aircraft was flown at approximately

55 m/s (106 kts) at an altitude of 500 m AGL. LiDAR data were collected

at 200 kHz, with a scan angle of ±30° and an average point density of 9 points per

square meter.

Figure 2.5 and Figure 2.6 show examples of the collected multi-spectral imagery

and LiDAR point cloud data.

2.3.2 Field Survey

A 1.5-km section of line spanning between the rural towns of Murgon and Wondai

in South East Queensland, Australia, was selected as the test site for ground survey.

Figure 2.7 shows a mosaic of the test area generated from the aerial images acquired

Figure 2.4: Data collection systems from commercial data providers

Figure 2.5: An example of the collected multi-spectral imagery

Figure 2.6: An example of the collected LiDAR point cloud data

from the trial, where white lines indicate power lines and dashed lines indicate those

outside of the test area.

The major task of the field survey was to label the species of individual trees in the

collected images. The method used for the field survey included the following steps:

(1) Geo-locating the test images: a corridor of approximately 1.5 kilometers was selected

by considering image quality and accessibility.

(2) Each tree was given a unique ID and was located by recognizing distinct features

(e.g. roads, houses, and power poles) around it.

(3) Tree species were identified by an expert from Greening Australia.

(4) The field survey data were labeled on multi-spectral images and a vegetation

database was derived.

Figure 2.8 shows an example of the vegetation database. From our field survey,

a total of 353 trees were labeled with 12 species found in the test area. The species

were not evenly distributed. There were three dominant species in our test field:

Eucalyptus tereticornis, Eucalyptus melanophloia, and Corymbia tessellaris. According

to the field survey, these three species account for over 80% of all the trees in the test

area.

Figure 2.7: Experiment Test Site

Figure 2.8: Vegetation Database

2.4 Summary

In this chapter, the selection of sensors and platforms was discussed. The basic

theory of optical remote sensing imagery and airborne laser scanning data was also

introduced. A number of flight trials, the collected aerial data and the field

survey were also described in this chapter.

Chapter 3

Object Detection and Segmentation

In object based image analysis, partitioning an image into meaningful image-objects

is a critical step. Two types of objects are of special interest in this research:

trees and power lines. Therefore, the detection and segmentation of trees and power

lines from remote sensing imagery are two important tasks. A suitable methodology

for these tasks is to draw inspiration from the processes that humans use to solve similar

problems, followed by a rigorous evaluation on real data. The approach to solve the

problem will utilize both a traditional top-down approach (using knowledge and log-

ical inference) and a bottom-up approach drawing on techniques from computational

intelligence. The algorithm design process involves the use of a biologically inspired

spiking neural network model as well as domain knowledge to solve the problem. In

this chapter, related work on geographic object based image analysis (GEOBIA)

and pulse coupled neural networks is first introduced. After that, the details of the

developed algorithms for power line detection and individual tree crown segmentation

are presented respectively.

3.1 Related Work

3.1.1 Geographic Object Based Image Analysis

A wide range of classification methods have been developed to derive land cover

information from remotely sensed images. Since remote sensing images consist of

rows and columns of pixels, conventional land-cover mapping has been based on a

per-pixel basis (Yan et al., 2006). Unfortunately, classification algorithms based on

single pixel analysis are often not capable of extracting the information we desire

from high spatial resolution images. For example, the spectral complexity of urban

land-cover materials results in specific limitations using per-pixel analysis for the

separation of human-made materials such as roads and roofs and natural materials

such as vegetation, soil, and water (Jensen, 2005). We need information about the

characteristics of a single pixel and also those of the surrounding pixels so that we

can identify areas (or segments) of pixels that are homogeneous.

Object-based approaches have become popular in high spatial resolution remote sensing

image classification. They have proven to be an alternative to pixel-based image

analysis, and a large number of publications suggest that better results can be expected

(Liu et al., 2006b; Blaschke, 2010). It is noted that in the remote sensing and GIS

community, a new discipline of geographic object-based image analysis (GEOBIA)

has gained widespread interest, although a critical discussion has arisen concerning

whether or not geographic space should be included in the name of this concept

in order to discriminate from other disciplines like computer vision and biomedical

imaging, which also conduct object-based image analysis (OBIA) (Hay & Castilla,

2008). GEOBIA is the process of associating initial image-objects (segments) to

geographic-object classes, based on both internal features of the objects and their

mutual relationships. Ideally, there should be a one-to-one correspondence between

image-segments and meaningful image-objects. However, this is hard to achieve in

the real classification process. The strengths of GEOBIA (Hay & Castilla, 2008)

include: (1) Partitioning an image into objects is similar to the way humans

conceptually organize the landscape to comprehend it; (2) Using image-objects as

basic units reduces the computational cost of the classifier; (3) Image-objects exhibit useful

features (e.g. shape, texture, contextual relations with other objects) that single

pixels lack; (4) Image-objects can be more readily integrated into a vector image

representation for a GIS rather than pixel-wise classified raster maps.
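
Strengths (2) and (3) can be illustrated with a short sketch that turns a segmented image into one feature record per image-object. The segmentation is taken as given here. The tiny image, the label map and the chosen features (area, mean value, bounding-box size) are hypothetical examples rather than the feature set used in this thesis.

```python
def object_features(image, labels):
    """Compute simple per-object features from an image and its label map."""
    stats = {}
    for r, (image_row, label_row) in enumerate(zip(image, labels)):
        for c, (value, label) in enumerate(zip(image_row, label_row)):
            s = stats.setdefault(
                label,
                {"area": 0, "total": 0.0, "rmin": r, "rmax": r, "cmin": c, "cmax": c},
            )
            s["area"] += 1
            s["total"] += value
            s["rmin"], s["rmax"] = min(s["rmin"], r), max(s["rmax"], r)
            s["cmin"], s["cmax"] = min(s["cmin"], c), max(s["cmax"], c)
    return {
        label: {
            "area": s["area"],
            "mean": s["total"] / s["area"],
            "height": s["rmax"] - s["rmin"] + 1,
            "width": s["cmax"] - s["cmin"] + 1,
        }
        for label, s in stats.items()
    }


# Toy single-band image and a label map with two image-objects.
toy_image = [
    [10, 10, 50, 50, 50],
    [10, 10, 50, 50, 50],
]
toy_labels = [
    [1, 1, 2, 2, 2],
    [1, 1, 2, 2, 2],
]
features = object_features(toy_image, toy_labels)
```

Ten pixels are reduced to two records for the classifier, and each record carries shape information (height, width) that no single pixel has.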

GEOBIA assumes that we can identify entities in remote sensing images that can

be related to real entities in the landscape. A first step in this kind of analysis is segmentation,

which partitions images into separate image objects. Image-objects have

the potential to correspond to geographic-objects. When an image-object can be

seen as a proper representation of an instance of some type of geographic-object, then

we can say it is a meaningful image-object. However, perfect segmentation is very

difficult to achieve in real situations. Image segmentation is the process of breaking

an image up into regions that have some meaning with respect to image content and

application (Watt & Polocarpo, 1998). During the past 40 years, many

segmentation algorithms have been proposed and applied to various applications

(Zhang, 2006; Brook et al., 2005). These methods can be roughly broken down into

four main categories: 1) thresholding techniques; 2) boundary-based techniques; 3)

region-based techniques; 4) hybrid techniques. However, these segmentation algorithms

have only been proven to work well in specific applications, and it cannot be

conclusively stated that the segmentation problem has been solved. As a result, choosing

a particular image segmentation method for a specific domain is still a big problem.

The goodness of image segmentation algorithms has to be established on the basis

of expert consensus. Image segmentation has a two-sided problem that depends

on differences in human judgment: over-segmentation and under-segmentation (Hay,

2008). Over-segmentation refers to a situation where, in the opinion of the perceiver,

Figure 3.1: Two segmentation levels overlaid on a colour aerial image

the contrast between some adjacent segments is insufficient and they should be merged

into a single image object. Under-segmentation refers to the existence of segments

that, in the opinion of the perceiver, lack coherency and should be split into separate

segments. Figure 3.1 illustrates two segmentation levels overlaid on a colour aerial

image (generated by eCognition 4.0 software). The grass field F has been under-

segmented in Figure 3.1(a) and over-segmented in Figure 3.1(b). In general, over-

segmentation is less serious a problem than under-segmentation, since a posteriori

merging of segments is easier than splitting them. Since there is no straightforward

relationship between similarity in the image and semantic similarity, it is preferable to

err on the side of over-segmentation and relax the external contrast requirement. In

short, a good segmentation is one that shows little over-segmentation and no under-

segmentation, and a good segmentation algorithm is one that enables the user to

derive a good segmentation without excessive fine tuning of input parameters.
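
The preference for over-segmentation rests on merging being simple. The following sketch merges adjacent segments whose mean values differ by no more than a contrast threshold. It is a hypothetical, single-pass illustration (means are not recomputed after a merge, so chains of similar segments can join), not the segmentation procedure used in this thesis or in eCognition.

```python
def merge_similar_segments(image, labels, max_contrast):
    """Merge 4-connected neighbouring segments with similar mean values."""
    # Mean image value of every segment.
    totals, counts = {}, {}
    for image_row, label_row in zip(image, labels):
        for value, label in zip(image_row, label_row):
            totals[label] = totals.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
    means = {label: totals[label] / counts[label] for label in totals}

    # Union-find over segment labels.
    parent = {label: label for label in means}

    def find(label):
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < rows and cc < cols:
                    a, b = labels[r][c], labels[rr][cc]
                    if a != b and abs(means[a] - means[b]) <= max_contrast:
                        root_a, root_b = find(a), find(b)
                        if root_a != root_b:
                            # The smaller label becomes the merged segment's name.
                            parent[max(root_a, root_b)] = min(root_a, root_b)
    return [[find(label) for label in row] for row in labels]


# Toy example: segments 1 and 2 are an over-segmented field, 3 is distinct.
toy_image = [
    [10, 11, 50],
    [10, 12, 50],
]
toy_labels = [
    [1, 2, 3],
    [1, 2, 3],
]
merged = merge_similar_segments(toy_image, toy_labels, max_contrast=5)
```

Splitting an under-segmented region has no comparably simple rule, because the information about where the boundary should lie is not present in the label map.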

3.1.2 Pulse Coupled Neural Networks

Traditional computer science approaches in machine vision systems have achieved little

success when compared with the visual analysis capabilities of children or animals,

and much less when compared with trained image analysts (Shi et al., 2009). To

improve the performance of machine vision systems, numerous biologically inspired

models have been developed in the past few decades. A pulse coupled neural network

(PCNN) is a biologically inspired spiking cortical model based on the understanding

of visual cortical models of small mammals. Instead of using rate coding as in traditional

neural network models, spiking neural networks use pulse coding. Neurons

receive and send out individual pulses, allowing multiplexing of information such as

the frequency and amplitude of sound (Vreeken, 2003). Spiking neural networks raise

the level of biological realism by using individual spikes, which allows incorporating

spatial-temporal information in communication and computation, like real neurons

do. PCNN is one of the well-formulated biological inspired spiking cortical models

which can be adopted in computational systems. Before getting down to the details

of PCNN, let us briefly summarize its biological background: visual cortex theory.

The study of the visual cortex has greatly contributed to the current understanding

of mammalian and human visual pathways and their role in the visual perception.

Visual cortex refers to the primary visual cortex (V1) and other extrastriate visual

cortical areas such as V2, V3, V4, and V5. The primary visual cortex (V1) is located

in and around the calcarine fissure in the occipital lobe (Figure 3.2; Polyak, 1957).

The primary visual cortex is the best studied visual area in the brain and also the

simplest, earliest cortical visual area. It is highly specialized for processing informa-

tion about static and moving objects and is excellent in pattern recognition (Ng et al.,

2007). V2 is the second major area in the visual cortex, which receives strong

feedforward connections from V1 and sends strong connections to V3, V4, and V5.

These cortical areas are interconnected with a high degree of regularity and precision

Figure 3.2: Visual cortex

(Zeki, 1993). The visual cortex is enormously complex, and much remains to be
learned about how it processes information. An image processing engine may
therefore never mimic the full functionality of the visual cortex system,
but only use a few of its basic features.

Biological models of the visual cortex depict each neuron as a coupled oscillator

with connections to other neurons (Lindblad & Kinser, 2005). In the past decades,
visual cortex theory has been intensively studied and many computational models have
been proposed. The Eckhorn model (Eckhorn et al., 1989) is one of these computational
models; it was derived from analyzing the cat visual cortex and is the basis for
most PCNNs. An illustration of an Eckhorn-type neuron is shown in Figure 3.3. The
neuron contains two inputs: the feeding and the linking compartments. The linking
compartment receives only local stimuli, while the feeding compartment receives an
external stimulus as well as internal stimuli. The feeding and the linking inputs are
combined to create the membrane voltage, which is then compared with a local threshold
to generate the output. The Eckhorn model provides a simple and effective way of
studying synchronous pulse dynamics in neural networks and paved the way for the
pulse coupled neural network. Afterwards, Johnson et al. carried out a number of
modifications and variations to tailor its performance as image processing algorithms

Figure 3.3: The Eckhorn-type neuron

(Johnson, 1994; Johnson & Padgett, 1999).

PCNN has the fundamental characteristic that each neuron has the ability to

capture neighboring neurons in similar states and thus provides a new biologically

inspired parallel algorithm for image processing applications. Over the last decade,

PCNNs have been utilized for a variety of image processing applications, including

image segmentation, edge detection, feature generation, noise reduction, face recogni-

tion, motion detection, and so on (Ranganath & Kuntimad, 1999; Waldemark et al.,

2000; Gu, 2008). Most PCNNs are based on the Eckhorn model sharing a common

mathematical foundation but with variations each having their own unique terms

(Wang et al., 2010). When applied to image processing, PCNN is a single layered,

two-dimensional, laterally connected neural network of pulse coupled neurons. Each

neuron corresponds to one pixel in an input image, receiving its corresponding pixel’s

color information (e.g. intensity) as an external stimulus. The neuron also connects

with its neighboring neurons, receiving local stimuli from them. Figure 3.4 shows

a typical structure of a standard PCNN. The input part imports external and local
inputs to the neuron through the feeding and linking parts respectively. In the linking

part, external and local stimuli are combined in an internal activation system, which

accumulates the stimuli until it exceeds a dynamic threshold, and then the pulse

generator produces a pulse output. Through iterative computation, PCNN neurons

produce temporal series of pulse outputs. Similarities in the input pixels cause the

associated neurons to pulse synchronously, thus indicating similar image structures or

textures. These temporal series of pulse outputs contain information about the input
images and can be utilized for various image processing applications.

This standard PCNN model is usually described by the following 5 coupled equa-

tions:

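$$F_{ij}(t) = S_{ij}(t) + e^{-\alpha_F} \cdot F_{ij}(t-1) + V_F \cdot (M * Y(t-1))_{ij} \tag{3.1}$$

$$L_{ij}(t) = e^{-\alpha_L} \cdot L_{ij}(t-1) + V_L \cdot (W * Y(t-1))_{ij} \tag{3.2}$$

$$U_{ij}(t) = F_{ij}(t) \cdot (1 + \beta \cdot L_{ij}(t)) \tag{3.3}$$

$$Y_{ij}(t) = \begin{cases} 1 & U_{ij}(t) > \Theta_{ij}(t) \\ 0 & \text{otherwise} \end{cases} \tag{3.4}$$

$$\Theta_{ij}(t) = e^{-\alpha_\Theta} \cdot \Theta_{ij}(t-1) + V_\Theta Y_{ij}(t-1) \tag{3.5}$$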

where t is the iteration step, Fij is the feeding input, Lij is the linking input, Sij
is the intensity of pixel (i, j), W and M are the weight matrices, ∗ is the convolution
operator, Y is the output of the neurons, U is the internal activity, β is the linking
strength, Θ is the dynamic threshold, αF, αL and αΘ are the feeding, linking
and threshold decay coefficients respectively, and VF, VL and VΘ (the last also written
VT) are the feeding, linking and threshold magnitude scales respectively. The dynamic
thresholds of all neurons are zero at t < 1.
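To make the coupling between Equations 3.1 to 3.5 concrete, the following minimal Python sketch performs one iteration on a small image stored as nested lists. The parameter values, the single symmetric 3×3 kernel used for both M and W, and the helper names (`neighbour_sum`, `pcnn_step`, `zero_state`) are illustrative assumptions, not settings taken from this thesis. Plain lists are used only to keep the sketch free of dependencies.

```python
import math

# Symmetric 3x3 kernel used for both M and W (hypothetical values).
# Because it is symmetric, the weighted sum below equals a convolution.
KERNEL = [[0.5, 1.0, 0.5],
          [1.0, 0.0, 1.0],
          [0.5, 1.0, 0.5]]

def neighbour_sum(Y, kernel, i, j):
    """Weighted sum of the 3x3 neighbourhood of Y at (i, j); zero outside the image."""
    h, w = len(Y), len(Y[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if 0 <= i + di < h and 0 <= j + dj < w:
                total += kernel[di + 1][dj + 1] * Y[i + di][j + dj]
    return total

def zero_state(h, w):
    """All state variables are zero before the first iteration."""
    return {name: [[0.0] * w for _ in range(h)] for name in ("F", "L", "Theta", "Y")}

def pcnn_step(S, prev, a_f=0.1, a_l=1.0, a_t=0.5,
              v_f=0.1, v_l=0.2, v_t=20.0, beta=0.1):
    """One iteration of Equations 3.1-3.5; `prev` holds F, L, Theta, Y from step t-1."""
    h, w = len(S), len(S[0])
    cur = zero_state(h, w)
    for i in range(h):
        for j in range(w):
            coupled = neighbour_sum(prev["Y"], KERNEL, i, j)
            F = S[i][j] + math.exp(-a_f) * prev["F"][i][j] + v_f * coupled   # (3.1)
            L = math.exp(-a_l) * prev["L"][i][j] + v_l * coupled             # (3.2)
            U = F * (1 + beta * L)                                           # (3.3)
            theta = math.exp(-a_t) * prev["Theta"][i][j] + v_t * prev["Y"][i][j]  # (3.5)
            cur["F"][i][j], cur["L"][i][j], cur["Theta"][i][j] = F, L, theta
            cur["Y"][i][j] = 1 if U > theta else 0                           # (3.4)
    return cur

S = [[0.2, 0.8], [0.4, 0.5]]          # toy stimulus image
state1 = pcnn_step(S, zero_state(2, 2))
state2 = pcnn_step(S, state1)
```

With all state variables at zero, every neuron with a positive stimulus pulses in the first iteration. In the second iteration the threshold of each pulsed neuron has jumped by the magnitude scale, so no neuron pulses again until the threshold has decayed.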

In summary, pulse coupled neural networks are neural models proposed by mod-

eling the cat’s visual cortex and developed for high-performance biologically inspired

Figure 3.4: The structure of a PCNN neuron

image processing. PCNNs are spatial-temporal-coding models. PCNN neurons pro-

duce temporal series of pulse outputs through iterative computation. The temporal

series of pulse outputs contain important information about input images and can be

utilized for various image processing applications.

3.2 Power Line Detection

The Hough transform is an effective tool for detecting straight lines in images, thus

it is a natural choice for the task of automatic power line detection. In real appli-

cations of straight line detection, an edge detector is often used to remove irrelevant

information and reduce the computational cost prior to the Hough transform being

employed. However, the application of classic edge detectors to the aerial images has

demonstrated that they are sensitive to image noise, due to complex and irregular

ground coverage. In this research, we take advantage of the characteristics of power

lines in aerial images and propose a filter based on a simplified pulse coupled neural

network (PCNN) model. This filter can simultaneously remove the background noise

of power lines as well as generate edge maps. After that, an improved Hough trans-

form is used by performing knowledge-based line clustering in Hough space to refine

the detection results.

3.2.1 Characteristics of Power Lines

Automatic power line detection from aerial imagery is a rather challenging task,

especially when the background is cluttered. Very few algorithms have been developed
for automatic extraction of power lines from aerial images, because power lines in
traditional aerial images are too small to be detected given the flight height and
the resolution of the camera. Some work on the visual control
of an Unmanned Aerial Vehicle (UAV) for power line inspection has been simulated
using a laboratory test rig (Golightly & Jones, 2005). The authors proposed an automatic
power line detection method based on the Hough transform, but the approach was
only a simulation of straight line detection and was not evaluated on real image data. More

recently, the Radon transform was used to extract line segments of the power lines,

followed by a grouping method to link each segment, and a Kalman filter was finally

applied to connect the segments into an entire line (Yan et al., 2007). Although some
properties of power lines in the aerial image were discussed, the algorithms developed
by Yan et al. focus only on straight line detection; image edges and other linear
features that can be mistaken for power lines were not considered. Although straight

line detection is a common and well studied research area in machine vision, most

of the existing algorithms take bottom-up approaches which just use the intensity of

single pixels. However, the qualitative performance of these algorithms varies widely

across application domains as our notion of what constitutes a line can vary from one

application area to another. Due to the wide variation of line types encountered in

Figure 3.5: Power lines from different perspectives

the aerial images that are not of interest, we require a more top-down approach that
takes advantage of our understanding of lines in this application area.

Based on our observation, power lines in aerial images have the following characteristics:

(1) A power line has uniform brightness, and its color looks different in upward
and downward views. Viewed from the ground, a power line usually appears dark, whereas
viewed from the sky it is brighter than the background, simply because it is
made of a specific metal with higher light reflectance.

(2) A power line approximates a straight line, although power line sag often exists.
Due to the limited coverage area of a single image, the widths of power lines in the
image tend to be similar. In addition, the lengths of power lines in one image are
similar, and a power line is usually the longest line as it crosses the entire image.

(3) Power lines are approximately parallel to each other. Due to the forward

angle of imaging sensor and deviation from centre, power lines in the image are not

completely parallel. However, the intersection of two power lines usually occurs far

out of range of the image due to the limited size of images, and the intersecting angle

of two lines is usually very small. Figure 3.5 (a) illustrates the scenario from an overhead
view and Figure 3.5 (b) shows the case from a forward view with an offset centre.

3.2.2 Design of Pulse Coupled Neural Filter

Given that power lines are made of special metal, they have different solar reflectance

compared to other background materials (e.g. grass, soil, and bitumen). This knowl-

edge can be used for preliminary detection of power lines from aerial images. Using a

filter to remove the irrelevant information helps to reduce the false detection
rate as well as the computational cost of the line detection algorithm. Threshold filtering
may be a practical solution. However, it is not robust, because filtering by a threshold
is sensitive to image noise and different thresholds may be required under changing
light conditions. In this research, a pulse coupled neural filter
(PCNF) is developed for preliminary detection of power lines as well as edge map
generation.

One of the key problems of using PCNN is selecting the network parameters. The
relationship between the network parameters and performance in image analysis is still
not clear (Ma et al., 2005; Wang et al., 2010). There are so many parameters in
the standard PCNN model that it is hard to select appropriate values for various
image analysis tasks. In addition, the classic PCNN model involves a high computational
cost because temporal dependence between iterations is explicitly used in the feeding,
linking and threshold updating components. In this research, a simplified model is
developed that inherits the characteristics of the classic PCNN model and is described by

the following 5 equations:

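$$F_{ij}(t) = I_{ij}^{\mathrm{quantized}} \tag{3.6}$$

$$L_{ij}(t) = \sum_{(k,l) \in K} (W_L)_{kl} \times Y_{kl}(t-1) \tag{3.7}$$

$$U_{ij}(t) = F_{ij}(t) \cdot \beta \cdot L_{ij}(t) \tag{3.8}$$

$$Y_{ij}(t) = \begin{cases} 1 & U_{ij}(t) > \Theta_{ij}(t) \\ 0 & \text{otherwise} \end{cases} \tag{3.9}$$

$$\Theta_{ij}(t) = \begin{cases} V_T \times \Theta_{ij}(t) & \text{if } Y_{ij}(t-1) \neq 0 \\ \Theta_{ij}(t-1) - step & \text{otherwise} \end{cases} \tag{3.10}$$

Figure 3.6: Linking weight matrix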

The symbols in the above equations have the same meanings as in the standard
PCNN model discussed in the previous section. We simplified the feeding input

of the neuron to be just external stimulus from the corresponding pixels while the

stimuli from neighboring neurons were not considered. This simplified model still

keeps the characteristics of classic PCNN in that temporal dependence is implicitly

included as the neuron outputs in the linking part come from the previous iteration.

In this research, original RGB images are transformed to HIS color space and the in-

tensity component I is used as the feeding input. Moreover, the intensity component

is uniformly quantized to 64 levels in order to reduce the intensity variation in image

regions. This is helpful for filtering regions with similar intensities.

The linking input has also been simplified in that only 8 neighbors (i.e. 3 × 3

window) are adopted in the linking weight matrix WL. Each element in WL is the

reciprocal of Euclidean distance between this element and the centre of the window

(Figure 3.6). In this case, neighboring neurons at a closer distance have a greater
impact on the central neuron. For the calculation of the neuron internal status U, a new
linear modulation of the feeding and linking inputs is used to avoid the influence of
zero-valued pixels on the internal status of their neighboring pixels. The linking
strength β in this research is set to 0.2. The pulsed output of a neuron, Y, is binary:
Y = 1 if the neuron pulsed, and Y = 0 otherwise. Initially, Y is set to be a zero-valued matrix.
Whether a neuron can pulse or not depends on the comparison of its internal status
U with the dynamic threshold Θ. The threshold Θ is initialized to be larger than the
maximum value of the external stimulus and gradually decays. The dynamic threshold Θ
changes during the iterations to control the neuron pulses. If the neuron has
pulsed, a large threshold is given to this neuron by applying the magnitude scale
VT, to make sure it will not pulse again for a while. Otherwise, the threshold of this
neuron is decayed by subtracting a step value step.
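The feeding and linking inputs described above can be sketched in a few lines of Python. The sketch assumes 8-bit input intensities (0 to 255) and a centre weight of zero in the linking matrix, since the centre has no defined reciprocal distance; the function names `quantize`, `linking_weights` and `linking_input` are hypothetical and not part of the thesis.

```python
import math

def quantize(intensity, levels=64, max_value=255):
    """Map an intensity in [0, max_value] to one of `levels` uniform bins (0 .. levels-1)."""
    return min(int(intensity * levels / (max_value + 1)), levels - 1)

def linking_weights():
    """3x3 matrix of reciprocal Euclidean distances to the centre (centre weight assumed 0)."""
    return [[0.0 if (di, dj) == (0, 0) else 1.0 / math.hypot(di, dj)
             for dj in (-1, 0, 1)] for di in (-1, 0, 1)]

def linking_input(Y_prev, i, j, W):
    """Equation 3.7 at pixel (i, j): weighted sum of the neighbours' previous pulses."""
    h, w = len(Y_prev), len(Y_prev[0])
    return sum(W[di + 1][dj + 1] * Y_prev[i + di][j + dj]
               for di in (-1, 0, 1) for dj in (-1, 0, 1)
               if 0 <= i + di < h and 0 <= j + dj < w)

W = linking_weights()
Y_prev = [[0, 1, 0],
          [0, 0, 0],
          [0, 0, 1]]
# The centre pixel has one pulsed edge neighbour and one pulsed diagonal neighbour.
L_centre = linking_input(Y_prev, 1, 1, W)
```

An edge neighbour contributes a weight of 1 and a diagonal neighbour a weight of 1/√2, so closer neighbours influence the central neuron more strongly.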

Given that power lines have higher light reflectance and are usually brighter than

the background, they can be roughly detected from the temporal series of PCNN

pulsed outputs. In the early stage of the iteration, neurons corresponding to power lines
pulse because they have a larger external stimulus than most of the background area.
Figure 3.7 shows an aerial image containing power lines and 7 temporal pulse outputs from
different iterations of PCNN. As shown in Figure 3.7, in the first iteration of PCNN,
no neuron pulses because of the high initial threshold. With the progress of PCNN

iteration, neurons corresponding to power lines pulse earlier than other objects in

the image. From the temporal outputs of PCNN, different objects of interest can be

extracted because PCNN tends to group pixels with similar intensities and structures

and also considers spatial relationships among neurons. The temporal information

generated by PCNN is also useful for image segmentation and image noise location,

which is an advantage over other filters.

In this research, the following rules are used to locate noisy pixels and remove

Figure 3.7: Original image and 7 pulse outputs of PCNN

them (Zhang et al., 2005): if pixel (i, j) has pulsed and most of its neighboring neurons
have not, the intensity of this pixel is too large and it can
be considered a noisy pixel. Usually this type of noise pulses the earliest during
PCNN iteration. For dark noise, the same rule can be applied to the inverse image.

Once noisy pixels are located, a median filter is applied to change the intensities of

these noisy pixels.
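The noise rule above can be illustrated with a short Python sketch. It interprets "most of its neighboring neurons" as more than half of the available 8-neighbours and repairs each flagged pixel with the median of its 3×3 window; both choices, and the names `locate_noise` and `median_repair`, are assumptions made for illustration.

```python
from statistics import median

def window(i, j, h, w):
    """Coordinates of the 3x3 window around (i, j), clipped to the image."""
    return [(y, x) for y in range(max(i - 1, 0), min(i + 2, h))
                   for x in range(max(j - 1, 0), min(j + 2, w))]

def locate_noise(Y):
    """Return pulsed pixels whose neighbours have mostly not pulsed."""
    h, w = len(Y), len(Y[0])
    noisy = []
    for i in range(h):
        for j in range(w):
            if not Y[i][j]:
                continue
            nbrs = [Y[y][x] for y, x in window(i, j, h, w) if (y, x) != (i, j)]
            quiet = sum(1 for v in nbrs if not v)
            if quiet > len(nbrs) / 2:      # "most" read as more than half (assumption)
                noisy.append((i, j))
    return noisy

def median_repair(img, noisy):
    """Replace each noisy pixel with the median of its 3x3 window in the input image."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i, j in noisy:
        out[i][j] = median(img[y][x] for y, x in window(i, j, h, w))
    return out

img = [[10, 10, 10],
       [10, 200, 10],
       [10, 10, 10]]
pulses = [[0, 0, 0],
          [0, 1, 0],
          [0, 0, 0]]      # only the bright outlier has pulsed
noisy = locate_noise(pulses)
repaired = median_repair(img, noisy)
```

Under this reading, a structure that is at least two pixels wide is not flagged, because each of its pixels has enough pulsed neighbours.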

Moreover, edges of the binary pulse outputs can also be detected by using the same

PCNN model. The width of the edge can be determined by controlling the transmitting

distance of neuron pulses. In this research, the following algorithm is used to detect

edges in the binary filtered image:

Algorithm 1: Detect edges in binary image using PCNN

Input: binary image Bin

Output: one-pixel width edge set Edge

Step 1: initialize the pulse output Y to be the binary image and save it to Y0: Y0 = Y = Bin

Step 2: calculate the linking input L using Equation 3.7 with the 3×3 linking weight matrix

Step 3: calculate the neuron internal status U using Equation 3.8

Step 4: calculate the output of each neuron Y using Equation 3.9, with a threshold slightly larger than the minimum value of U: Θ = min(U) + 0.01; Y = step(U − Θ)

Step 5: obtain the edge map by applying the logical exclusive disjunction (XOR) to Y0 and Y: Edge = Y0 ⊕ Y
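A rough Python sketch of Algorithm 1 follows. It rests on two assumptions that are not stated in the text: the feeding input is taken to be the binary image itself, and the internal status is computed additively as U = F + β·L, which is one reading of "linear modulation" and differs from the product form printed in Equation 3.8. With the product form, U would be zero on every background pixel, so the additive reading is used here purely for illustration. The function name `pcnn_edges` is hypothetical.

```python
import math

D = 1.0 / math.sqrt(2.0)
# 3x3 linking weights: reciprocal Euclidean distance to the centre (centre assumed 0).
W = [[D, 1.0, D],
     [1.0, 0.0, 1.0],
     [D, 1.0, D]]

def pcnn_edges(bin_img, beta=0.2):
    """One PCNN pass over a 0/1 image; returns a one-pixel-wide outer edge map."""
    h, w = len(bin_img), len(bin_img[0])
    # Step 1: the pulse output starts as the binary image.
    Y0 = [row[:] for row in bin_img]
    # Step 2: linking input from the 8 neighbours (Equation 3.7).
    L = [[sum(W[di + 1][dj + 1] * Y0[i + di][j + dj]
              for di in (-1, 0, 1) for dj in (-1, 0, 1)
              if 0 <= i + di < h and 0 <= j + dj < w)
          for j in range(w)] for i in range(h)]
    # Step 3: internal status, ASSUMED additive (U = F + beta * L) with F = Bin.
    U = [[Y0[i][j] + beta * L[i][j] for j in range(w)] for i in range(h)]
    # Step 4: threshold just above the minimum of U.
    theta = min(min(row) for row in U) + 0.01
    Y = [[1 if U[i][j] > theta else 0 for j in range(w)] for i in range(h)]
    # Step 5: XOR of the original and the new pulse output.
    return [[Y0[i][j] ^ Y[i][j] for j in range(w)] for i in range(h)]

binary = [[0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 1, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0]]
edge = pcnn_edges(binary)
```

Under these assumptions the pulse spreads one pixel outward from every object, so the XOR leaves a ring one pixel wide just outside the object boundary.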

In summary, our proposed pulse coupled neural filter (PCNF) can be described

by Figure 3.8. The simplified PCNN is used to generate temporal pulse outputs

which contain important information for discriminating image noise, target object

(power line) and image background. However, there is no automatic method to
determine which output contains power lines and which contains only image noise.
According to our experiments, in most cases the output of the third PCNN iteration
is a safe choice, because pixels corresponding to power lines have pulsed and most of the
background pixels have not. After that, a morphological filter is applied to the
binary pulse image as post-processing, which makes the detected objects
more continuous. Finally, the same PCNN model is used to generate the edge image
according to Algorithm 1.

Figure 3.9 compares the results of the Canny filter, the Sobel filter and our proposed
pulse coupled neural filter (PCNF) on synthetic images with and without noise. The
aim of the simulation is to detect the three white lines in the images and generate

Figure 3.8: The structure of pulse coupled neural filter

Figure 3.9: Comparison of Canny filter, Sobel filter and PCNF

the edge map. As shown in the figure, the Sobel filter tries to detect all edges in
the image and is very sensitive to image noise. The Canny filter can be made less sensitive
to image noise by tuning the parameter σ (the standard deviation of the Gaussian
filter). The proposed pulse coupled neural filter (PCNF) is more flexible because it
can be used to detect the edges of interest rather than all edges in the image.
Moreover, PCNF is more robust when the image is contaminated with salt-and-pepper
noise (see the second row of Figure 3.9).

3.2.3 Knowledge-based Line Clustering in Hough Space

The Hough transform is used to detect parameterized shapes (e.g. lines, circles)

through mapping each point to a new parameter space in which the location and ori-

entation of certain shapes could be identified (Aggarwal & Karl, 2006). When applied

to detect straight lines in an image, the Hough transform usually parameterizes a line
in the Cartesian coordinate system as a point in the polar coordinate system (Figure 3.10)
based on the point-line duality using the following equation:

x · cos(θ) + y · sin(θ) = ρ (3.11)

Alternatively, this parametrization maps collinear points into a set of intersecting

sinusoidal curves in the parameter space. The lines in the Cartesian coordinate system can
be estimated by detecting the points of intersection of these curves (i.e., peaks) in the
polar coordinate system (Aggarwal & Karl, 2000). These peaks in the parameter space can
be obtained using a voting mechanism. The Hough transform has been proven to be
an effective method for line detection. However, it does have some limitations, such as
high computational cost and mistaken detection of spurious lines. In order to solve

these problems, Fernandes and Oliveira proposed an improved Hough transform by

introducing a new voting scheme to avoid the brute-force approach of one pixel voting

for all potential lines (Fernandes & Oliveira, 2008). Instead, the approach operates

on clusters of approximately collinear pixels by using an oriented elliptical-Gaussian

kernel that models the uncertainty associated with the best-fitting line with respect

to the corresponding cluster. Figure 3.10 (a) and (b) show their voting procedures

and the 3D visualization of voting maps respectively. The letters A-H indicate the

clustered segments and the peaks that each segment voted for. In this research, we

extended this improved Hough transform for power line detection purpose.
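For reference, the classic brute-force voting based on Equation 3.11 can be sketched as below. This is the baseline scheme in which every pixel votes for all candidate orientations, not the kernel-based voting of Fernandes and Oliveira; the one-degree angular resolution, the unit ρ bins and the function name `hough_votes` are illustrative assumptions.

```python
import math
from collections import Counter

def hough_votes(points, n_theta=180):
    """Accumulate votes in (rho, theta) space; theta is sampled in n_theta steps over [0, pi)."""
    acc = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)   # Equation 3.11
            acc[(round(rho), t)] += 1
    return acc

# 100 collinear edge pixels on the horizontal line y = 5.
points = [(x, 5) for x in range(100)]
acc = hough_votes(points)
peak = max(acc, key=acc.get)   # (rho bin, theta index) with the most votes
```

All collinear points vote for the same (ρ, θ) cell, here ρ = 5 and θ = 90°, which is why peaks in the accumulator correspond to lines. The nested loop over every point and every angle also shows where the high computational cost comes from.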

The Hough transform is an effective tool to detect straight lines, but does not

Figure 3.10: Voting procedures and the 3D visualization of voting maps

intelligently identify power lines. Any linear object will be detected, such as the edges
of roads and rivers, fences, etc. Although using PCNF can significantly decrease the
influence of other linear edges, problems still exist, especially when a linear object
has a similar color to the power lines. In order to discriminate power lines from other
linear objects, we use a k-means algorithm to cluster all detected lines and identify
the lines of interest. The objective of data acquisition in our project is to achieve a
flying altitude low enough that a typical 12 mm transmission line is represented by at
least two pixels. Therefore, each power line is detected as at least two Hough lines in
the edge image. Power lines are almost parallel, with very similar angles, and a power
line is usually the longest line as it crosses the entire image, while other detected
lines do not have this regular property. Based on this idea, a clustering scheme is
employed in the Hough transform voting procedure to group the parallel lines and
output the cluster with the largest summation of votes as candidate power lines (as shown
in Algorithm 2). Figure 3.10 (c) and (d) illustrate this clustering scheme and show the
3D visualization of voting maps. Parallel lines are grouped together, and the cluster
with the largest summation of votes indicates that the dominant lines of the image are in
this cluster. Figure 3.11 shows an example of the power line detection results.

Figure 3.11: An example of power line detection results

Algorithm 2: Knowledge-based line clustering in the Hough space

Input: detected Hough line set Ls_i = (ρ, θ, votes)_i (i = 1, 2, ..., n), where n is the number of detected lines, ρ and θ are the coordinates of the corresponding points in the Hough parameter space, and votes is the accumulated number of votes of each detected Hough line.

Output: candidate power lines CPLs

Step 1: calculate the line groups C_j (j = 1, 2, ..., k) using k-means on the θ values of Ls_i (i = 1, 2, ..., n), where k is the number of line clusters (k = 4 in this research).

Step 2: calculate the summation of votes in each cluster: SumVotes_j = Σ_{Ls_i ∈ C_j} votes_i

Step 3: find the cluster C_m with the largest value of SumVotes, where SumVotes_m = max(SumVotes_j) (j = 1, 2, ..., k)

Step 4: output the lines in cluster C_m as candidate power lines: CPLs = C_m
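The four steps of Algorithm 2 can be sketched in Python as follows. The k-means routine is a plain one-dimensional Lloyd iteration with a deterministic farthest-point initialisation, chosen here only so that the example is reproducible; the thesis does not specify the initialisation. The sketch also ignores the wrap-around of θ at 180°, and the sample lines and function names are hypothetical.

```python
def kmeans_1d(values, k, iters=100):
    """Cluster scalars with Lloyd's algorithm; returns one cluster label per value."""
    # Farthest-point initialisation (an assumption, used for determinism).
    centres = [min(values)]
    while len(centres) < k:
        centres.append(max(values, key=lambda v: min(abs(v - c) for c in centres)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centres[c])
        if updated == centres:
            break
        centres = updated
    return labels

def candidate_power_lines(lines, k=4):
    """lines: list of (rho, theta_degrees, votes). Returns the cluster with the most votes."""
    labels = kmeans_1d([theta for _, theta, _ in lines], k)        # Step 1
    sum_votes = [0] * k
    for (_, _, votes), lab in zip(lines, labels):                   # Step 2
        sum_votes[lab] += votes
    best = max(range(k), key=lambda c: sum_votes[c])                # Step 3
    return [ln for ln, lab in zip(lines, labels) if lab == best]    # Step 4

lines = [(12, 89.0, 310), (15, 89.5, 305), (48, 90.0, 320), (51, 90.5, 300),  # near-parallel, long
         (30, 10.0, 150), (70, 45.0, 120), (22, 140.0, 90)]                   # clutter
cpls = candidate_power_lines(lines)
```

In this toy input the four near-parallel lines fall into one cluster, and their combined votes exceed those of any clutter cluster, so they are returned as the candidate power lines.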

3.3 Individual Tree Crown Detection and Delineation

The application of object-based approaches to the problem of extracting vegetation

information from images requires accurate delineation of individual tree crowns. Once

individual trees are delineated, not only spectral but also spatial information within

the tree crowns can be used for classification. In this thesis, an automatic method is

developed to detect and delineate tree crowns from multi-spectral imagery. The devel-

oped method employs spectral features as input to a simplified Pulse Coupled Neural

Network (PCNN), followed by post-processing using morphological reconstruction.

3.3.1 Spectral Properties of Vegetation

In the past decades, a variety of approaches have been proposed for individual tree

crown delineation (Erikson & Olofsson, 2005; Culvenor, 2003). According to our

literature review, these approaches may be broadly categorized as either local max-

ima/minima, template matching, region growing, or edge detection approaches (Li

et al., 2008). Although many classic segmentation algorithms are applied in the RGB

colour space, their effectiveness in visually complex environments is somewhat lim-

ited. A classic example is the segmentation of trees with heavy shadows, as shown in

Figure 3.12. In this case, it is very hard to segment the tree crown accurately since

the tree crown and its shadow are very similar in both colour and texture.

One prospective improvement is through the use of spectral features outside the

visible spectrum. Remote sensing of vegetation has been successful thanks to the

unique spectral characteristics of green vegetation, which has low reflectance in red

and high reflectance in near-infrared (NIR) wavelengths. Spectral vegetation indices

have been used intensively to estimate the vegetation density and plant biophysical

parameters from satellite images for many years. However, little work has been done

on single tree delineation utilizing this important knowledge. With the advent of
low-price multi-spectral cameras, more attention can be paid to applying this knowledge
to vision based plant information extraction.

Figure 3.12: Example of tree crown and shadow in an RGB image

Sensors collect and store data about the spectral reflectance of natural features

and objects. Different types of land covers can be identified by using the spectral fea-

tures of certain surface materials. The dominant method for interpreting vegetation

biophysical properties from optical remote sensing data is through spectral vegeta-

tion indices. Vegetation indices are combinations of reflectance measured in two or

more spectral bands, which aim at estimating canopy biophysical properties through

enhancing the spectral contribution of vegetation, while minimizing the contribution

of underlying soil or understory vegetation (Rautiainen, 2005). Plants have a distinctive
spectral signature, characterized by a low reflectance in the visible part of the
solar spectrum, because chlorophylls absorb a large amount of radiation in the red band,
and a high reflectance in the near-infrared region. The lack of absorption in
the adjacent near-infrared region results in a strong absorption contrast between the NIR
and red bands (Myneni et al., 1995). Therefore, the band ratio of NIR and red is
used as a spectral feature to discriminate plants from the background.

3.3.2 Initial Tree Crown Segmentation

The developed tree crown segmentation algorithm employs a simplified PCNN that

uses spectral features as input, post-processed using morphological reconstruction.

PCNN itself has been successfully applied to many image segmentation problems

(Kuntimad & Ranganath, 1999; Stewart et al., 2002). A PCNN is powerful in ex-

tracting the fundamentals of an image (e.g. edges, textures, and segments) and thus

it is used to segment tree crowns from imagery. The basic idea of image segmenta-

tion using PCNN is that groups of neurons (pixels) in a similar state tend to pulse

synchronously. The PCNN model used for this task is expressed as follows.

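$$F_{ij}(t) = \frac{\rho_{NIR}}{\rho_{red}} \tag{3.12}$$

$$L_{ij}(t) = \sum_{(k,l) \in K} (W_L)_{kl} \times Y_{kl}(t-1) \tag{3.13}$$

$$U_{ij}(t) = F_{ij}(t) \times (1 + \beta \cdot L_{ij}(t)) \tag{3.14}$$

$$Y_{ij}(t) = \begin{cases} 1 & U_{ij}(t) > \Theta_{ij}(t) \\ 0 & \text{otherwise} \end{cases} \tag{3.15}$$

$$\Theta_{ij}(t) = \begin{cases} V_T \times \Theta_{ij}(t) & \text{if } Y_{ij}(t-1) \neq 0 \\ \Theta_{ij}(t-1) - step & \text{otherwise} \end{cases} \tag{3.16}$$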

The symbols in the above equations represent the same meanings as in the PCNN

model discussed in the previous section. The only difference is the feeding part,

where the external stimulus of the neuron is calculated from the feature space using

spectral band ratio. ρNIR and ρred are the spectral reflectance of NIR and red band

respectively.
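The iteration defined by Equations 3.12 to 3.16 can be sketched in Python as below. Several details are assumptions made for illustration: the fired-neuron branch of Equation 3.16 is read as scaling the previous threshold by VT, the initial threshold is set one step above the largest stimulus, and the values of `step` and `v_t` as well as the function names are hypothetical. The function returns, for each pixel, the number of pulses accumulated over the iterations.

```python
import math

def linking_weights():
    """3x3 reciprocal-distance weights with an assumed centre weight of 0."""
    return [[0.0 if (di, dj) == (0, 0) else 1.0 / math.hypot(di, dj)
             for dj in (-1, 0, 1)] for di in (-1, 0, 1)]

def simplified_pcnn(F, n_iter, beta=0.2, step=1.0, v_t=20.0):
    """Run the simplified PCNN on feeding input F and return per-pixel pulse counts."""
    h, w = len(F), len(F[0])
    W = linking_weights()
    theta0 = max(max(row) for row in F) + step     # start above the largest stimulus
    theta = [[theta0] * w for _ in range(h)]
    Y = [[0] * w for _ in range(h)]
    counts = [[0] * w for _ in range(h)]
    for _ in range(n_iter):
        Y_new = [[0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                # Linking input from the previous pulses of the 8 neighbours (3.13).
                L = sum(W[di + 1][dj + 1] * Y[i + di][j + dj]
                        for di in (-1, 0, 1) for dj in (-1, 0, 1)
                        if 0 <= i + di < h and 0 <= j + dj < w)
                # Threshold update (3.16): raise after a pulse, otherwise decay by `step`.
                if Y[i][j]:
                    theta[i][j] = v_t * theta[i][j]   # assumed reading of V_T x Theta
                else:
                    theta[i][j] -= step
                U = F[i][j] * (1 + beta * L)           # internal status (3.14)
                if U > theta[i][j]:                    # pulse generator (3.15)
                    Y_new[i][j] = 1
                    counts[i][j] += 1
        Y = Y_new
    return counts

# Toy band-ratio image: high values for vegetation, low values for background.
ratio = [[8.0, 8.0, 1.0],
         [8.0, 8.0, 1.0]]
pulse_counts = simplified_pcnn(ratio, n_iter=6)
```

In this toy run the pixels with the large band ratio pulse within the first few iterations, while the low-ratio pixels have not yet pulsed, so a threshold on the accumulated counts separates the two groups.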

Due to the strong absorption contrast between NIR and red band in trees, the

corresponding neurons have greater external stimuli and thus pulse more frequently.

By accumulating pulse outputs, a threshold can be applied to segment tree foliage

from background pixels. It should be noted that this ratio takes the form of the

vegetation index, Ratio Vegetation Index (RVI) (Jordan, 1969). Although other veg-

etation indexes could have been used, including the well known Normalized Difference

Vegetation Index (NDVI) (Rouse et al., 1973), RVI was found to better suit the prob-

lem, maximizing the contrast between vegetation and non-vegetation. An example of

this is shown in Figure 3.13, where NDVI and RVI outputs are shown for the same

multi-spectral image.
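The two indices mentioned above are simple functions of the NIR and red reflectance, as the short Python sketch below shows. The reflectance values are made-up illustrative numbers, not measurements from this research.

```python
def rvi(nir, red):
    """Ratio Vegetation Index: NIR reflectance divided by red reflectance."""
    return nir / red

def ndvi(nir, red):
    """Normalized Difference Vegetation Index, bounded between -1 and 1."""
    return (nir - red) / (nir + red)

# Hypothetical reflectance pairs (NIR, red) for two surface types.
samples = {"vegetation": (0.50, 0.05), "bare soil": (0.30, 0.25)}
for name, (nir, red) in samples.items():
    print(f"{name}: RVI = {rvi(nir, red):.2f}, NDVI = {ndvi(nir, red):.2f}")
```

The two indices are related by NDVI = (RVI − 1) / (RVI + 1). NDVI is therefore a compressed, bounded version of RVI, whereas RVI keeps growing as the NIR to red contrast increases.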

Figure 3.14 (b) shows an example of the initial tree crown segmentation results

by using PCNN in the spectral feature space. As we can see from the result, trees are

successfully detected but noise and discontinuities still exist requiring post process-

ing to delineate the whole crown. A basic morphological opening operator is applied

to remove noise in the binary image, and morphological reconstruction is used for

hole-filling (Figure 3.14 (c)). In this research, a ‘hole’ in a binary image is defined
to be an area of dark pixels surrounded by lighter pixels. It looks like a “valley” point
(minimum value), because it is located inside an object and is not connected
to the boundary of the enclosing object, whereas the surrounding points are treated
as “peak” points (maximum value), which indicate the locations of objects. The
reconstruction starts from these peak points and spreads out over the whole image to
remove these regional valley points based on the connectivity of pixels. Morphological
reconstruction is a morphological transformation that involves two images and
a structure element (Soille, 2003). One image, the “marker image”, contains the
starting points of the morphological processing, and the other image, the “mask image”,
constrains the transformation procedure. The structure element defines the direction

Figure 3.13: Comparison of two vegetation indexes

in which the reconstruction progresses, and the neighborhood directions determine the
number of objects and their boundaries. If the neighborhood is too big,
more unrelated objects will be connected, and if the neighborhood is too
small, related objects will remain separated. In the experiments, four-connected
background neighbours are used, which worked well. The marker image fm is set
to the maximum pixel value except on the border, where the original image value is
kept, as shown in Equation 3.17, and the original image is used as the mask f. In
the equation, f(p) is the value of pixel p, and vmax indicates the maximum value in the
four-connected neighborhoods of pixel p.

$$f_m(p) = \begin{cases} f(p) & \text{if } p \text{ is on the border} \\ v_{max} & \text{otherwise} \end{cases} \tag{3.17}$$
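Hole filling with the marker of Equation 3.17 can be sketched as a reconstruction by erosion: the marker is repeatedly eroded over the four-connected neighbourhood and clipped from below by the mask until nothing changes. The Python sketch below uses the global image maximum for vmax and updates the marker in place; both are simplifying assumptions, and the function name `fill_holes` is hypothetical.

```python
def fill_holes(f):
    """Fill holes in image f (nested lists) by morphological reconstruction by erosion."""
    h, w = len(f), len(f[0])
    vmax = max(max(row) for row in f)      # global maximum used as v_max (assumption)
    # Marker (Equation 3.17): original values on the border, v_max elsewhere.
    marker = [[f[i][j] if i in (0, h - 1) or j in (0, w - 1) else vmax
               for j in range(w)] for i in range(h)]
    changed = True
    while changed:
        changed = False
        for i in range(h):
            for j in range(w):
                lowest = marker[i][j]
                # Erosion over the four-connected neighbourhood.
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        lowest = min(lowest, marker[y][x])
                value = max(lowest, f[i][j])   # the mask constrains the erosion
                if value != marker[i][j]:
                    marker[i][j] = value
                    changed = True
    return marker

crown = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 0, 1, 0],     # enclosed hole at the centre
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
filled = fill_holes(crown)
```

Background connected to the image border is eroded back to zero, while a dark region enclosed by object pixels cannot be reached from the border and therefore keeps the maximum value, which fills the hole.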

Figure 3.14: Tree crown segmentation from CIR image

3.3.3 Decomposition of Tree Clusters Using Watershed Algorithm

The major problem for the initial tree crown segmentation algorithm is that
under-segmentation of tree clusters occasionally happens. The algorithm is often unable
to segment individual trees, especially when trees growing in a cluster are in close
contact with each other. Figure 3.15 compares the initial segmentation result with the
ground truth (obtained by manual segmentation). Although the segmentation seems satisfactory
from visual assessment, two adjacent trees have not been separated and one small tree
with a sparse crown has not been detected in the dashed rectangular area. In order to

improve the decomposition of tree clusters, a watershed-based algorithm is used after

the initial tree crown segmentation algorithm.

The watershed algorithm has proven to be powerful and fast in image segmen-

tation (Bleau & Leon, 2000). It is based on the topographic representation of a

gray-scale image: light and dark spots are seen as hills and hollows in a landscape.

The watershed algorithm developed by Fernand Meyer (Meyer, 1994) is used in this

research. This algorithm is based on the Immersion Approach, which is the most
intuitive way to explain watershed segmentation: imagine the surface being immersed
in water, with holes pierced in local minima; water fills up the hollows starting at
these holes, and dams are built to prevent the merging of different catchment basins
due to further immersion; this immersion process eventually reaches a stage when only
the boundaries of the dams (the watershed lines) are visible. The objective of
watershed segmentation is to find all the watershed lines. In principle, watershed
segmentation depends on ridges to perform a proper segmentation. In order to apply the
watershed algorithm to a binary image, we use a distance transform to convert the binary
image to a gray-scale image. The distance transform assigns to each object pixel
(i.e. tree crowns, represented as white regions in the binary image) its distance to
the nearest background pixel (zero-valued pixel). A City Block

56

Figure 3.15: Comparison of initial tree crown segmentation and the ground truth

57

Figure 3.16: Tree cluster decomposition using watershed algorithm

distance function is used in this research, in which the distance of pixel (xi, yi) and

(xj, yj) is defined as

d = |xi − xj|+ |yi − yj| (3.18)

Figure 3.16 shows an example of the watershed-based tree cluster decomposition

using colour infrared (CIR) imagery.
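The City Block distance transform described above can be sketched in Python with the classic two-pass scan. This is an illustrative sketch, not the thesis code: the binary image is assumed to be a list of rows with zero for background, and the function names are hypothetical.

```python
def city_block(p, q):
    """City Block distance of Equation 3.18 between pixels p and q."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def city_block_distance_transform(binary):
    """Distance from every object pixel (non-zero) to the nearest background pixel (zero)."""
    rows, cols = len(binary), len(binary[0])
    big = rows + cols  # exceeds any City Block distance inside the grid
    dist = [[0 if binary[r][c] == 0 else big for c in range(cols)]
            for r in range(rows)]
    # Forward pass: propagate distances from the top and left neighbours.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom and right neighbours.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


crown = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
distance_map = city_block_distance_transform(crown)
```

The distance map peaks at the centre of each crown. Watershed segmentation is commonly run on the negated distance map so that crown centres become catchment basins; whether the map is negated here is not stated in this section.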

3.4 Combining LiDAR Data and Multi-spectral Imagery for Improved Tree Crown Segmentation

The developed tree crown segmentation algorithm for multi-spectral imagery successfully discriminates vegetation from non-vegetation. However, the algorithm fails to discriminate between grass, shrubs and trees because they are often very similar in both colour and texture. The discrimination is even more difficult when they grow close to each other. The 3D nature of LiDAR data makes it especially suited to this situation. Airborne laser scanning has been successfully applied to single tree detection and modeling in a number of studies (Brandtberga et al., 2003; Koch et al., 2006). However, the success and quality of the results depend on the point density of the LiDAR data as well as the size, shape and distribution of the trees. Multi-spectral imagery is complementary to LiDAR data because it provides both spectral and texture information. Therefore, combining LiDAR data and multi-spectral imagery has great potential to improve individual tree crown detection and delineation. In this study, a ground filtering algorithm is used to separate terrain and object points, and then a region-level fusion method is used to improve individual tree crown segmentation.

3.4.1 Ground Filtering Using Statistical Analysis

LIDAR systems provide both elevation and intensity data records for each laser return. In theory, LIDAR intensity is defined as the ratio of the strength of reflected light to that of emitted light (Chust et al., 2008). Studies on LIDAR have primarily focused on geometric information rather than radiometric information, and the characteristics of laser intensity are not well understood (Yoon et al., 2008). However, intensity information is very useful in the classification of LIDAR point cloud data, as different materials often have quite different reflectance. In this study, we take advantage of the LIDAR intensity to classify points as ground or non-ground.

In raw LIDAR point cloud data, both bare ground and non-ground objects generate backscatter. Ground points need to be identified and eliminated to accurately identify non-ground objects, such as power lines and trees. An effective method for automatic separation of ground and object points is therefore critical. Currently, most ground filtering methods are based on the assumption that natural terrain variations are gradual rather than abrupt. Therefore, LIDAR elevation data is often used to calculate elevation differences and slopes based on pixels within a roving, two-dimensional window or along a scan line in a specified direction (Meng et al., 2009). However, intensity data is seldom used for point classification. In this research, an algorithm based on a statistical analysis of the data is developed to segment object points from ground points. After that, the Hough transform is employed to discriminate power lines from other objects (i.e. vegetation).

According to probability theory, a probability distribution can be uniquely characterized by its moments (Stricker & Orengo, 1995). If we interpret ground and objects as distinctly different probability distributions, moments can be used to characterize them. However, it is not easy to perform a rigorous statistical test to discriminate between them. Fortunately, we do not need to know the exact probability distributions that represent ground and objects. A function that yields a relative difference suffices for the separation of ground and object points. By the central limit theorem, naturally measured samples are expected to approximately follow a normal distribution. Based on this idea, Bartels and Wei assumed that the ground points of LIDAR data follow a normal distribution while other object points may disturb the distribution (Bartels et al., 2006). They use moments to describe the distribution of the point cloud data and to separate ground and object points. In this research, we adopt and extend this idea for identifying the objects of interest in power line corridors. The normal distribution is uniquely characterized by its first two moments (mean and variance) (Najim, 2004). However, according to our assumption that objects do not follow the normal distribution, mean and variance may not be representative measures. Therefore, higher-order moments are used to characterize the distribution of object points. Skewness is the third moment about the mean. It characterizes the degree of asymmetry of the distribution around its mean and is defined below. A skewness of zero (sk = 0) is consistent with a normal distribution, since the skewness of any symmetric distribution is zero. Negative skewness indicates dominance of valleys in the data, while positive skewness indicates dominance of peaks.

\[ sk = \left( \frac{1}{N \times \delta^3} \times \sum_{i=1}^{N} (s_i - \mu)^3 \right)^{\frac{1}{3}} \qquad (3.19) \]

where $N$ is the total number of points in the point cloud; $s_i$ is the value (e.g., height or intensity) of point $i$; and $\mu$ and $\delta$ are the arithmetic mean and standard deviation of all points, defined as:

\[ \mu = \frac{1}{N} \sum_{i=1}^{N} s_i \qquad (3.20) \]

\[ \delta = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (s_i - \mu)^2} \qquad (3.21) \]

Kurtosis is the fourth moment about the mean. It measures the relative flatness or peakedness of the distribution about its mean. The normal distribution has a kurtosis equal to 3. A kurtosis larger than three indicates a peaked distribution, while a kurtosis smaller than three indicates a flat distribution.

\[ ku = \left( \frac{1}{N \times \delta^4} \times \sum_{i=1}^{N} (s_i - \mu)^4 \right)^{\frac{1}{4}} \qquad (3.22) \]
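The moments in Equations 3.20 to 3.22 can be sketched in Python as follows. This is a hedged sketch rather than the thesis code: it computes the conventional standardized third and fourth moments (the quantities inside the outer parentheses of Equations 3.19 and 3.22), for which a normal distribution gives 0 and 3; the outer exponents shown in those equations are not applied here. The function names are hypothetical.

```python
import math


def mean_and_std(values):
    """Arithmetic mean (Equation 3.20) and standard deviation (Equation 3.21)."""
    n = len(values)
    mu = sum(values) / n
    delta = math.sqrt(sum((s - mu) ** 2 for s in values) / n)
    return mu, delta


def standardized_moment(values, order):
    """Central moment of the given order divided by delta ** order."""
    mu, delta = mean_and_std(values)
    n = len(values)
    # Note: delta == 0 (all values equal) would raise ZeroDivisionError.
    return sum((s - mu) ** order for s in values) / (n * delta ** order)


def skewness(values):
    return standardized_moment(values, 3)


def kurtosis(values):
    return standardized_moment(values, 4)


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 10]
sk_symmetric = skewness(symmetric)
ku_symmetric = kurtosis(symmetric)
sk_right_tailed = skewness(right_tailed)
```

A symmetric sample yields zero skewness, while a sample with a few large values (such as returns from tall objects above flat ground) yields positive skewness.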

We assume that skewness and kurtosis can describe the characteristics of the distribution of LIDAR object points and employ these two measures to identify critical values that separate ground from non-ground points. The change between ground points and object points is observed through the change sequences of skewness and kurtosis shown in Figure 3.17, which are generated as follows: first, the skewness and kurtosis of the point cloud data are calculated, and then the points with the largest height or intensity value are removed. Next, the skewness and kurtosis of the remaining points are calculated. This procedure iterates until the data is exhausted. Bartels and Wei only used the skewness measure and assumed that points are objects if the skewness is larger than 0 (Bartels et al., 2006). However, LIDAR data is not balanced, and different scenes will have different skewness and kurtosis sequences (Bao et al., 2008). The biggest inflection of the sequence is taken as the position at which ground and object points are separated from each other. For example, location A on the skewness curve and location B on the kurtosis curve indicate the separating points in Figure 3.17.

Figure 3.17: Change sequence of skewness and kurtosis

The locations A and B mark the separating line between ground and object points: the points to the left of A and B are ground points, whereas the points to the right are object points. A and B represent the number of points that are removed to calculate skewness and kurtosis. Thus, we can calculate the number of object points from the total number of points in the point cloud and the locations A and B using the following equation.

\[ nObjectPnts = nTotalPnts - nAB \qquad (3.23) \]

In this research, skewness and kurtosis are combined together as a measure to

decide the threshold for discriminating ground and object points from LIDAR data.

The algorithm is described as follows:

Algorithm 3 Separating ground and object points using statistical analysis

Input: LiDAR raw point cloud data

Output: Object points ObjectPnts

Step 1: Load the raw point cloud data and remove any noisy points that have much larger intensity and height values than the surrounding points

Step 2: Calculate the skewness and kurtosis change sequences on the intensity values as described above; let them be skList and kuList

Step 3: Find the locations of the last local maximum of skList and kuList; let them be A and B respectively

Step 4: Calculate the number of object points from the skewness and kurtosis sequences following Equation 3.23; let them be nA and nB respectively

Step 5: Sort skList and kuList in ascending order; let the sorted results be skListSort and kuListSort respectively

Step 6: Select the first nA points in skListSort as object points skObjectPnts

Step 7: Select the first nB points in kuListSort as object points kuObjectPnts

Step 8: Calculate the intersection of skObjectPnts and kuObjectPnts as the final object points, ObjectPnts = skObjectPnts ∩ kuObjectPnts
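Steps 2 and 3 of Algorithm 3 can be sketched in Python as follows. This is an illustrative sketch under several assumptions that the text does not fix: one point (the current largest value) is removed per iteration, the sequence stops when fewer than `min_points` values remain, and a degenerate sample with zero spread is given a moment of 0. All function names are hypothetical, and the standardized moments omit the outer exponents of Equations 3.19 and 3.22.

```python
import math


def standardized_moment(values, order):
    """Central moment of the given order divided by delta ** order."""
    n = len(values)
    mu = sum(values) / n
    delta = math.sqrt(sum((s - mu) ** 2 for s in values) / n)
    if delta == 0:
        return 0.0  # assumption: a constant sample carries no shape information
    return sum((s - mu) ** order for s in values) / (n * delta ** order)


def change_sequence(values, order, min_points=3):
    """Moment of the remaining points after repeatedly removing the largest value."""
    remaining = sorted(values)  # ascending, so the largest value is last
    sequence = []
    while len(remaining) >= min_points:
        sequence.append(standardized_moment(remaining, order))
        remaining.pop()  # remove the current largest value
    return sequence


def last_local_maximum(sequence):
    """Index of the last interior local maximum, or None if there is none."""
    for k in range(len(sequence) - 2, 0, -1):
        if sequence[k] > sequence[k - 1] and sequence[k] >= sequence[k + 1]:
            return k
    return None


# Toy intensities: seven "ground" values plus two much larger "object" values.
intensities = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 9.0, 10.0]
sk_list = change_sequence(intensities, order=3)
ku_list = change_sequence(intensities, order=4)
```

In this toy data the skewness starts strongly positive and drops to about zero once the two large values have been removed, which is the kind of change the sequence is meant to expose.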

3.4.2 Region-level Fusion of LiDAR and Georeferenced Multi-spectral Imagery

The first step towards fusing LiDAR and multi-spectral imagery is referencing. This is also known as sensor alignment or registration, which establishes a common reference frame for the data from different sensors. If the two sensors are mounted on the same aerial platform, then the navigation system (GPS/IMU) provides position and attitude data for both the aerial camera and the LiDAR system. Since the GPS/IMU units and the two sensors are physically separated, the success of direct orientation relies on how well the relative position and attitude of the various system components can be determined. The data used here had already been georeferenced by the commercial data provider, which simplifies this step in this application.

Assuming that the georeferencing accuracy is good enough, LiDAR data can be considered as an additional image layer of the multi-spectral imagery. After ground filtering, the object points obtained are used to refine the tree crown segmentation. The fusion process is described in Figure 3.18. First, a pair of LiDAR point cloud data and georeferenced multi-spectral imagery is processed separately. On one side, an initial segmentation is conducted in spectral feature space using the algorithm described in the previous section. After that, regions in the initial vegetation segmentation map are labeled for the following fusion process. On the other side, a ground filtering algorithm using statistical analysis is applied to separate terrain and object points. Then the object points are gridded to create a 2.5D depth image by placing a uniform, fixed-size grid over the points. The lowest Z coordinate (elevation) is kept if multiple points fall into one grid cell, and the value of a grid cell (i.e. pixel) is set to zero if no points fall into it. The 2.5D depth image is obtained with a grid spacing of 15 centimeters, so that each pixel of the depth image covers the same area as a pixel of the multi-spectral image. The 2.5D depth image is then integrated with the labeled vegetation segmentation map. A simple thresholding process is used to remove grass and low vegetation. The region mean height histogram is calculated to visualize the height difference among regions. It is observed that the mean height of a region which contains grass and low vegetation points is much lower than that of a region which contains only trees. Finally, a watershed-based segmentation is employed to further decompose the tree clusters into individual trees.

Figure 3.18: Framework of LiDAR and georeferenced multi-spectral imagery fusion for individual tree crown segmentation
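The gridding and region-level thresholding steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: points are `(x, y, z)` tuples in the same georeferenced frame as the image, rows are assumed to increase with `y`, label 0 marks non-vegetation, and the height threshold of 2.0 is an arbitrary illustrative value rather than one taken from the thesis. The function names are hypothetical.

```python
import math


def grid_points(points, cell_size, x_min, y_min, n_cols, n_rows):
    """2.5D depth image: lowest z per cell, zero where no point falls."""
    depth = [[0.0] * n_cols for _ in range(n_rows)]
    filled = [[False] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        col = int(math.floor((x - x_min) / cell_size))
        row = int(math.floor((y - y_min) / cell_size))
        if 0 <= row < n_rows and 0 <= col < n_cols:
            if not filled[row][col] or z < depth[row][col]:
                depth[row][col] = z  # keep the lowest elevation in the cell
                filled[row][col] = True
    return depth


def region_mean_heights(depth, labels):
    """Mean depth-image value for every labeled region (label 0 is skipped)."""
    sums, counts = {}, {}
    for depth_row, label_row in zip(depth, labels):
        for z, label in zip(depth_row, label_row):
            if label != 0:
                sums[label] = sums.get(label, 0.0) + z
                counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


points = [(0.05, 0.05, 5.0), (0.10, 0.12, 3.0), (0.20, 0.05, 8.0)]
depth = grid_points(points, cell_size=0.15, x_min=0.0, y_min=0.0, n_cols=2, n_rows=2)
labels = [[1, 2], [1, 2]]  # labeled vegetation segmentation map (toy example)
means = region_mean_heights(depth, labels)
tree_regions = sorted(label for label, h in means.items() if h > 2.0)
```

Regions whose mean height falls below the threshold would be discarded as grass or low vegetation before the watershed step.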

3.5 Summary

In this chapter, an overview of geographic object-based image analysis is first given. After that, the basic theory of pulse coupled neural networks is presented, followed by the technical details of the developed algorithms for power line detection and individual tree crown segmentation from aerial imagery. A method of combining LiDAR data and georeferenced multi-spectral imagery for improving tree crown segmentation is also presented.

Chapter 4

Visual Feature Extraction and Data Classification

Classification based on statistical machine learning techniques is one of the most often used methods of information extraction from remotely sensed images. In this chapter, a brief review of techniques in the area of remote sensing image classification is conducted, with special attention to visual features that may be applied to object-based tree species classification. Novel methods for spectral-texture feature extraction and classification are developed and applied to object-based tree species classification using multi-spectral imagery.

4.1 Related Work

4.1.1 Major Steps for Thematic Information Extraction From Imagery

The major steps of image classification for thematic information extraction may in-

clude definition of the classification problem, selection of remotely sensed data and

training samples, visual feature extraction and selection of appropriate classification

approaches, and accuracy assessment (Jenson, 2005; Lu & Weng, 2007).

(1) Definition of the classification problem

Stating the nature of the classification problem is a prerequisite for any successful

remote sensing image classification system. The analyst first specifies the geographic

region of interest on which to test hypotheses. The classes of interest to be examined

are then carefully defined in a classification scheme.

(2) Selection of remotely sensed data and training samples

Understanding the strengths and weaknesses of different types of sensor data is essential for the selection of remotely sensed data. The user's needs, the scale and characteristics of the study area, and the availability of various image data and their characteristics (spatial and spectral resolution) should all be considered. A sufficient number of training samples and their representativeness are critical for image classification. Training samples are usually collected from fieldwork, high resolution images or personal experience.

(3) Extraction and selection of visual features

Selecting suitable variables is a critical step for successfully implementing an image classification. Many potential variables may be used in image classification, such as spectral features, object shapes or textures, contextual information, multi-temporal images and multi-sensor images. The use of too many variables in a classification procedure may decrease classification accuracy. Therefore, it is important to select only the most useful variables for classification.

(4) Design of classification algorithms

Various classification algorithms may be used to assign an unknown pixel to one or more possible classes. The choice of a particular classifier or decision rule depends on the nature of the input data and the desired output. Different classification results may be obtained depending on the classifiers selected.

(5) Evaluation of classification performance

Evaluation of classification results is an important process in the classification procedure. Different approaches may be employed, ranging from a qualitative evaluation based on expert knowledge to a quantitative accuracy assessment based on sampling strategies. In the process of accuracy assessment, it is commonly assumed that the difference between an image classification and the reference data is due to the classification error.

Figure 4.1: Tree crown shapes from triple views

4.1.2 Visual Feature Extraction

The use of appropriate features to characterize an output class or object is fundamental for any classification problem. Generally, image features can be categorized into spectral features, shape features and texture features. Shape features are significant because they are very close to human perception (Loncaric, 1998). However, due to the limitations of image segmentation and variations in view angle, trees present different shapes (Figure 4.1). Therefore, this section only reviews spectral and texture features which may be used for tree species classification.

• Spectral Features

Spectral features, known as colour features in the visual spectrum, have been widely used in many image analysis applications. In the visual spectrum, colour is represented by a colour space, which is an abstract mathematical model describing colours, typically as three or four values or colour components (e.g. RGB and CMYK are colour models). In computer vision, the HSV colour space is also widely used, where HSV stands for hue, saturation and value (or intensity); it is also often called HSB (B for brightness). HSV is a cylindrical-coordinate representation of points in an RGB colour model, which rearranges the geometry of RGB in an attempt to be more perceptually relevant than the Cartesian representation. In RGB colour space, the components of an object's colour in a digital image are all correlated with the amount of light hitting the object, and therefore with each other, so image descriptions in terms of RGB components make object discrimination difficult. Descriptions in terms of hue, saturation and value are often more relevant. They can be thought of as similar to the neural processing used by human colour vision, but there is no particular reason to strictly mimic human colour response (Schwarz et al., 1987).

Figure 4.2: Electromagnetic spectrum (source: www.csc.noaa.gov)

In remote sensing, spectral features usually indicate the spectral reflectance of different land covers. Sensors collect and store data on the spectral reflectance of natural features and objects. This radiation can be quantified on the electromagnetic spectrum, which is a continuum of electromagnetic energy arranged according to its frequency and wavelength (Figure 4.2). By using the spectral features of certain surface materials, many land covers can be identified. The dominant method for interpreting vegetation biophysical properties from optical remote sensing data is through spectral vegetation indices. Vegetation indices are combinations of reflectance measured in two or more spectral bands and are used to retrieve various biophysical variables. They aim at estimating canopy biophysical properties by enhancing the spectral contribution of vegetation while minimizing the contribution of the underlying soil or understory vegetation (Rautiainen, 2005). Vegetation indices are popular because they require very little expertise in the physical principles of remote sensing or modeling and are computationally efficient. The Normalized Difference Vegetation Index (NDVI) (Rouse et al., 1973) was one of the most successful of many attempts to simply and quickly identify vegetated areas and their condition, and it remains the best-known and most widely used index for detecting live green tree canopies in multi-spectral remote sensing data. Besides NDVI, the Soil Adjusted Vegetation Index (SAVI) (Huete, 1988) and the 2-band Enhanced Vegetation Index (EVI2) (Jiang et al., 2008) are also evaluated in this research. They are defined as:

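\[ NDVI = \frac{\rho_{NIR} - \rho_{red}}{\rho_{NIR} + \rho_{red}} \qquad (4.1) \]

\[ SAVI = \frac{\rho_{NIR} - \rho_{red}}{\rho_{NIR} + \rho_{red} + L}(1 + L) \qquad (4.2) \]

\[ EVI2 = 2.5 \left( \frac{\rho_{NIR} - \rho_{red}}{\rho_{NIR} + 2.4\rho_{red} + 1} \right) \qquad (4.3) \]

where $\rho_{NIR}$ and $\rho_{red}$ are the spectral reflectance of the near-infrared and red bands respectively. $L$ in SAVI is the soil adjustment factor, which is in the range [0, 1] and is typically set to 0.5.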
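Equations 4.1 to 4.3 translate directly into code. The following Python sketch evaluates the three indices for a single pixel; the function names and the sample reflectance values are illustrative assumptions, and inputs are assumed to be reflectances for which the denominators are non-zero.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index (Equation 4.1)."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index (Equation 4.2); soil_factor is L."""
    return (nir - red) / (nir + red + soil_factor) * (1 + soil_factor)


def evi2(nir, red):
    """2-band Enhanced Vegetation Index (Equation 4.3)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1)


# Hypothetical reflectances of a vegetated pixel: high NIR, low red.
nir, red = 0.5, 0.1
indices = {"NDVI": ndvi(nir, red), "SAVI": savi(nir, red), "EVI2": evi2(nir, red)}
```

All three indices grow as the near-infrared reflectance rises relative to the red reflectance, which is the spectral signature of live green vegetation.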

• Texture Features

Texture contains important information for image classification, as it represents the content of many real-world images. Textures are characteristic intensity (or colour) variations that typically originate from the roughness of object surfaces (Davies, 2009). Texture has always been a primary visual cue for defining areas and relates to the visual perception of coarseness or smoothness of image features. When defined in a quantitative sense, texture is the property that relates to the nature of the variability of pixel values (Coburn & Roberts, 2004). Image texture analysis is particularly recommended for the classification of digital images where the objects in the image (e.g., trees) are larger than the pixel size, which is the case in high resolution aerial images.

There are many different methods for modeling texture features from images. These approaches can be categorized as statistical, structural, signal processing based and model-based (Xie & Mirmehdi, 2009). Statistical texture features measure the spatial distribution of pixel values. Numerous statistical texture features have been proposed, including the well-known grey-level co-occurrence matrix (GLCM) (Haralick et al., 1973) and local binary patterns (LBP) (Ojala et al., 2002). Structural approaches to texture analysis are based on the theory that textures are composed of repeating elements called primitives. Structural methods are limited by the amount of information that is required to adequately characterize texture. Signal processing based texture features are commonly extracted by applying filter banks to the image and computing the energy of the filter responses. These features can be derived from the spatial domain, the frequency domain and the joint spatial/spatial-frequency domain. Among the best-known signal processing based texture features are Gabor filters, which have been successfully used in many texture analysis applications (Turner, 1986; Manjunath & Ma, 1996). Model-based texture feature extraction approaches attempt to find stochastic processes that are able to model texture. Model-based methods include, among many others, fractal models (Mandelbrot, 1983), random field models (Li, 2009) and auto-regressive models (Comer & Delp, 1999). These techniques have had success in analyzing micro-textures, but they are not useful when little is known about the texture or when more than one texture exists.

Gabor filters are used to model the spatial summation properties of simple cells in the visual cortex and have been adapted and widely used in texture analysis. Gabor filters can be considered as a set of orientation- and scale-tunable edge and line detectors, and the statistics of their responses have been successfully applied to characterize texture information (Manjunath & Ma, 1996; Clausi & Deng, 2005). Gabor filters are used in this research as they provide both a local and a global description of texture information. A Gabor filter can be decomposed into two components: a real part as the symmetric component and an imaginary part as the asymmetric component. The 2D Gabor function can be mathematically formulated as:

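\[ g_m(x, y) = a^{-m} g(x', y'), \quad a > 1 \qquad (4.4) \]

\[ g(x, y) = \frac{1}{2\pi\delta_x\delta_y} \exp\left[ -\frac{1}{2}\left( \frac{x^2}{\delta_x^2} + \frac{y^2}{\delta_y^2} \right) + 2\pi j u_0 x \right] \qquad (4.5) \]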

where $x' = a^{-m}(x\cos\theta + y\sin\theta)$, $y' = a^{-m}(-x\sin\theta + y\cos\theta)$, $\theta = 2\pi/K$, $K$ is the number of orientations, $a^{-m}$ is the scale factor, $\delta_x$ and $\delta_y$ define the Gaussian envelope along the $x$ and $y$ directions respectively, $u_0$ denotes the radial frequency of the Gabor function, and $j = \sqrt{-1}$. Figure 4.3 shows the filter expressed in intensity levels for four different orientations.
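Equations 4.4 and 4.5 can be evaluated pointwise with Python's complex arithmetic. The sketch below is illustrative only: the function names and parameter values are assumptions, and the rotated coordinates use the standard rotation $x' = x\cos\theta + y\sin\theta$, $y' = -x\sin\theta + y\cos\theta$ scaled by $a^{-m}$.

```python
import cmath
import math


def gabor(x, y, sigma_x, sigma_y, u0):
    """Mother Gabor function of Equation 4.5 (complex valued)."""
    envelope = -0.5 * (x ** 2 / sigma_x ** 2 + y ** 2 / sigma_y ** 2)
    carrier = 2j * math.pi * u0 * x
    return cmath.exp(envelope + carrier) / (2 * math.pi * sigma_x * sigma_y)


def gabor_scaled(x, y, m, theta, a, sigma_x, sigma_y, u0):
    """Scaled and rotated Gabor function of Equation 4.4 (requires a > 1)."""
    scale = a ** (-m)
    x_rot = scale * (x * math.cos(theta) + y * math.sin(theta))
    y_rot = scale * (-x * math.sin(theta) + y * math.cos(theta))
    return scale * gabor(x_rot, y_rot, sigma_x, sigma_y, u0)


# Hypothetical parameters chosen only for illustration.
params = dict(sigma_x=1.0, sigma_y=1.0, u0=0.1)
centre = gabor(0.0, 0.0, **params)
right = gabor(1.0, 0.5, **params)
left = gabor(-1.0, 0.5, **params)
unrotated = gabor_scaled(1.0, 0.5, m=0, theta=0.0, a=2.0, **params)
```

Comparing `right` and `left` shows the decomposition mentioned in the text: the real part is symmetric in $x$ and the imaginary part is antisymmetric.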

Figure 4.3: Gabor filter response to θ = 0°, 60°, 120°, 180°

Local binary patterns (LBP) were first proposed by Ojala et al. to encode the pixel-wise information in texture images (Ojala et al., 2002). The LBP method attempts to decompose the texture into small texture units, and the texture features are defined by the distribution (histogram) of the LBP codes calculated for each pixel in the region under analysis. The LBP value for the centre pixel is calculated using the following equation:

\[ LBP_{P,R} = \sum_{i=0}^{P-1} u(t_i - t_c) \cdot 2^i \qquad (4.6) \]

where $P$ is the total number of neighbouring pixels, $R$ is the radius used to form a circularly symmetric set of neighbours, $t_c$ is the value of the centre pixel and $t_i$ is the value of the $i$-th neighbouring pixel. Figure 4.4 gives an example of the binary code in a neighbourhood, which generates $2^8 = 256$ possible standard texture units. The binary labels of the neighbouring pixels are obtained by applying a simple threshold operation with respect to the centre pixel $t_c$. $u(t_i - t_c)$ represents a step function, where $u(x) = 1$ when $x > 0$; otherwise, $u(x) = 0$.
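Equation 4.6 can be sketched in Python for the common case $P = 8$, $R = 1$ on a 3 × 3 window. This is an illustrative sketch: the clockwise neighbour ordering starting at the top-left pixel is an assumption (the text does not fix an ordering), the step function follows the strict inequality given above, and the function names are hypothetical.

```python
def lbp_code(neighbours, centre):
    """LBP value of Equation 4.6 for neighbour values t_0 .. t_{P-1} and centre t_c."""
    code = 0
    for i, t_i in enumerate(neighbours):
        if t_i - centre > 0:  # step function u(x) = 1 when x > 0
            code += 2 ** i
    return code


# Assumed ordering: clockwise, starting from the top-left neighbour.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_3x3(image, r, c):
    """LBP code of the pixel at row r, column c (must not lie on the border)."""
    neighbours = [image[r + dr][c + dc] for dr, dc in OFFSETS]
    return lbp_code(neighbours, image[r][c])


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels (the texture feature)."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_3x3(image, r, c)] += 1
    return hist


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_3x3(patch, 1, 1)
hist = lbp_histogram(patch)
```

For this patch the neighbours brighter than the centre value 6 are the last four in the assumed ordering, so the code is 16 + 32 + 64 + 128 = 240.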

Figure 4.4: Example of binary code calculation in a neighbourhood

Although LBP has proven to be a powerful texture descriptor, a number of extensions have been proposed to improve or supplement the classic LBP operator. We also evaluated several extensions to the conventional LBP operator, including uniform LBP, rotation-invariant LBP and dominant LBP (DLBP) (Ojala et al., 2002; Liao et al., 2009). The uniform LBP is used to represent the most important microstructures, which contain at most two bitwise (0 to 1 or 1 to 0) transitions. The rotation-invariant LBP is produced by circularly rotating the original LBP code until its minimum value is attained, making the LBP code invariant with respect to rotation of the image domain. DLBP only considers the most frequently occurring patterns, and tries to avoid both the information loss caused by considering only the uniform LBP and the unreliability caused by considering all possible patterns.
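The uniform and rotation-invariant variants described above can be sketched in Python on 8-bit codes. The function names are hypothetical and the sketch assumes $P = 8$; it is meant only to illustrate the two definitions.

```python
def transitions(code, bits=8):
    """Number of circular 0-to-1 and 1-to-0 transitions in the binary code."""
    count = 0
    for i in range(bits):
        current_bit = (code >> i) & 1
        next_bit = (code >> ((i + 1) % bits)) & 1
        if current_bit != next_bit:
            count += 1
    return count


def is_uniform(code, bits=8):
    """A pattern is uniform if it contains at most two bitwise transitions."""
    return transitions(code, bits) <= 2


def rotation_invariant(code, bits=8):
    """Minimum value over all circular rotations of the code."""
    mask = (1 << bits) - 1
    best = code
    for _ in range(bits - 1):
        code = ((code >> 1) | (code << (bits - 1))) & mask  # rotate right by one bit
        best = min(best, code)
    return best


uniform_count = sum(1 for c in range(256) if is_uniform(c))
invariant_classes = {rotation_invariant(c) for c in range(256)}
```

For eight neighbours this yields 58 uniform patterns and 36 rotation-invariant classes out of the 256 possible codes.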

4.2 Spectral-Texture Feature Extraction Using PCNN

The use of appropriate features to represent an output class or object is critical for all classification problems. As discussed in the previous section, due to the perspective view and the spatial resolution of aerial photographs, the sizes and shapes of tree crowns can look quite different. This motivates the use of features that represent image structures and are invariant to rotation and scale changes. Rotation invariant features have been investigated in image texture classification for a long time, with many of them generated from filtered images or by converting rotation variant features to rotation invariant features using a circular neighbor set (Ojala et al., 2002). The human eye is remarkable in its ability to interpret colour-textured objects, and a number of models for image feature analysis have been developed based on biological models of the visual cortex (Bhatt et al., 2007; Zhan et al., 2009). To the best of our knowledge, there has been little research to validate the capabilities of biologically inspired feature extraction mechanisms in remote sensing image classification problems. In this research, a biologically inspired object descriptor is developed to represent the spectral-texture patterns of image-objects. The feature descriptor is derived from the pulse spectral frequencies (PSF) of a pulse coupled neural network (PCNN) and is invariant to rotation, translation and small scale changes.

4.2.1 Multi-spectral Unit-linking PCNN

In this thesis, a simplified spiking cortical model called a unit-linking PCNN (Gu, 2008) is employed to generate image features. We introduce multi-spectral channels to this PCNN model. Compared with the original unit-linking PCNN, the advantage of this model is that it has more external inputs, so that both spectral and spatial information are considered in the derived features. Figure 4.5 illustrates the structure of this multi-spectral PCNN model. Each neuron corresponds to one pixel in an input image, receiving its corresponding pixel's information as an external stimulus. Each neuron is coupled with its 3 × 3 neighboring neurons, receiving local stimuli (i.e. the outputs) from them. The major reason for using a unit-linking PCNN is that it is easy to analyse the pulse dynamics and control the performance, due to the simplified feeding part and unified linking part.

The model can be mathematically represented as:

\[ F_{ij}^{m}(t) = S_{ij}^{m}(t) \qquad (4.7) \]

\[ L_{ij}(t) = \begin{cases} 1 & \text{if } \sum_{k,l \in N(i,j)} Y_{k,l} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (4.8) \]

\[ U_{ij}(t) = (1 + \beta \cdot L_{ij}(t)) \cdot \sum_{m=1}^{M} \left( w_m F_{ij}^{m}(t) \right) \qquad (4.9) \]

\[ Y_{ij}(t) = \begin{cases} 1 & \text{if } U_{ij}(t) > \Theta_{ij}(t) \\ 0 & \text{otherwise} \end{cases} \qquad (4.10) \]

\[ \Delta\Theta_{i,j}(t) = \frac{d\Theta_{i,j}(t)}{dt} = -\alpha_{i,j}^{T} + V_{i,j}^{T} Y_{i,j}(t-1) \qquad (4.11) \]

(a) Each neuron is coupled with its 3 × 3 neighboring neurons, receiving local stimuli (i.e. pulse outputs) from its neighboring neurons and also the external stimuli from the corresponding pixel values.

(b) Local stimuli and external stimuli are modulated and input to the pulse generator. The neuron pulses if the modulated input is larger than a dynamic threshold.

Figure 4.5: The structure of the multi-spectral PCNN

where $t$ refers to time (the number of iterations); $(i, j)$ indicates the index of the current neuron (i.e. pixel) and $(k, l)$ indicates the neighboring field of the neuron (i.e. a 3 × 3 window); $m$ indicates the external input from the $m$th channel of the image; $Y_{i,j}$ is the pulse output of neuron $(i, j)$; $U_{i,j}$ is the internal activity of the neuron; $V_{i,j}^{T}$ is the threshold magnitude scale (greater than 1); and $\Theta_{i,j}$ is the dynamic threshold which controls whether the neuron pulses or not. In this thesis, the firing threshold is a linearly decaying threshold. $\alpha_{ij}^{T}$ is the maximum value of the input image. The linking strength coefficient $\beta$ determines the weight of the linking input in the internal activity of the neuron. The weight factor $w_m$ is the importance of the $m$th spectral channel ($M$ is the total number of channels, $\sum_{m=1}^{M} w_m = 1$).
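One iteration of the model in Equations 4.7 to 4.11 can be sketched in Python as follows. This is a hedged sketch, not the thesis implementation. It makes several assumptions that the equations leave open: Equation 4.11 is discretized as Θ(t) = Θ(t−1) − α + V·Y(t−1) with scalar α and V, the linking input uses the pulses of the previous iteration, and the 3 × 3 neighbourhood excludes the neuron itself. All names and parameter values are hypothetical.

```python
def combine_channels(channels, weights):
    """Weighted sum of the M spectral channels (the sum in Equation 4.9)."""
    rows, cols = len(channels[0]), len(channels[0][0])
    return [[sum(w * ch[i][j] for w, ch in zip(weights, channels))
             for j in range(cols)] for i in range(rows)]


def pcnn_step(feeding, pulses_prev, theta_prev, beta, alpha, v):
    """One PCNN iteration; returns the new pulses and the new thresholds."""
    rows, cols = len(feeding), len(feeding[0])
    # Assumed discrete form of Equation 4.11: linear decay plus a jump after a pulse.
    theta = [[theta_prev[i][j] - alpha + v * pulses_prev[i][j]
              for j in range(cols)] for i in range(rows)]
    pulses = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Equation 4.8: unit linking, 1 if any neighbour pulsed.
            link = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if (di or dj) and 0 <= ii < rows and 0 <= jj < cols:
                        if pulses_prev[ii][jj]:
                            link = 1
            activity = (1 + beta * link) * feeding[i][j]          # Equation 4.9
            pulses[i][j] = 1 if activity > theta[i][j] else 0     # Equation 4.10
    return pulses, theta


channels = [[[1.0, 0.5], [0.5, 0.5]]]  # a single-channel 2 x 2 toy image
feeding = combine_channels(channels, weights=[1.0])
zeros = [[0, 0], [0, 0]]
theta0 = [[0.8, 0.8], [0.8, 0.8]]
pulses1, theta1 = pcnn_step(feeding, zeros, theta0, beta=0.5, alpha=0.1, v=2.0)
pulses2, theta2 = pcnn_step(feeding, pulses1, theta1, beta=0.5, alpha=0.1, v=2.0)
```

In the toy run the brightest pixel fires first; in the next iteration its neighbours fire because the linking term lifts their internal activity above the threshold, while the first neuron stays silent because its threshold has jumped.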

4.2.2 Properties and Behaviors of Multi-spectral PCNN

Unlike most other neural network models, the processing is automatic and there is no training required in a PCNN. The PCNN algorithm consists of iteratively computing Equations 4.7 to 4.11 until some stopping criterion is reached. Through iterative computation, neurons produce a temporal series of pulse outputs, which indicates the pulse status of each neuron (pixel). Across iterations, different neurons fire in turn according to the internal status of the neurons and the firing threshold. Similarities in the input pixels cause the associated neurons to pulse synchronously, thus indicating similar structures.

Figure 4.6: Periodical pulse of the neuron (i, j)

The dynamic properties of the multi-spectral PCNN are very complex. In this section, the properties and behavior of a single neuron are analyzed under the assumption that there are no linking connections. Biologically, there is a fatigue period called the refractory period after a neuron fires. A neuron cannot be captured by other neurons if it is in the refractory period. Figure 4.6, obtained from Equations 4.9 to 4.11, illustrates the periodic pulsing of a neuron $(i, j)$ and its capture and refractory periods. $U_{i,j\,max}$ is the maximum possible value of the internal activity, $T_{R_{i,j}}$ is the refractory period, $T_{C_{i,j}}$ is the capture period, and $F_{i,j}$ is the combination of the external stimuli, $F_{i,j} = \sum_{m=1}^{M} (w_m \times F_{i,j}^{m}(t))$.

When a neuron $(i, j)$ fires, its threshold increases to $V^{T}_{i,j}$ and then decreases linearly to $U_{i,j}^{max}$, at which point the neuron fires again. The pulse process repeats periodically with period $T_{i,j}$. During the refractory period $T^{R}_{i,j}$, the neuron cannot fire, whether or not its neighbors fire, because its threshold $\theta_{i,j}(t)$ is larger than the maximum internal activity $U_{i,j}^{max}$. Only during the capture period $T^{C}_{i,j}$ can the neuron pulse or be excited by other neurons, as the threshold is then lower than the maximum internal activity $U_{i,j}^{max}$. Each pulse cycle therefore passes from a refractory period to a capture period, and in this respect the PCNN mimics the mechanism of a biological neuron.
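The periodic pulsing of a single neuron without linking connections can be illustrated with a minimal sketch. It assumes a discrete-time update in which the threshold drops by a fixed step `delta` per iteration and is raised by `v_t` whenever the neuron fires. These update rules, the function names and the parameter values are illustrative assumptions and are not a transcription of equations 4.9-4.11.

```python
def combined_stimulus(channels, weights):
    """Weighted sum of the spectral channel values of one pixel."""
    return sum(w * f for w, f in zip(weights, channels))


def fire_times(stimulus, theta0, v_t, delta, steps):
    """Iterations at which an uncoupled neuron fires.

    Assumed update rule (hypothetical, for illustration only): the
    neuron fires when its stimulus exceeds the threshold; the threshold
    then decays by `delta` and is raised by `v_t` on firing.
    """
    theta = theta0
    fired_at = []
    for n in range(steps):
        fired = stimulus > theta
        if fired:
            fired_at.append(n)
        theta = theta - delta + (v_t if fired else 0.0)
    return fired_at


# Three spectral channels whose weights sum to 1.
F = combined_stimulus([0.8, 0.4, 0.0], [0.5, 0.25, 0.25])
times = fire_times(F, theta0=1.125, v_t=2.0, delta=0.25, steps=24)
periods = [b - a for a, b in zip(times, times[1:])]
print(F, times, periods)
```

Under these assumed rules the neuron settles into a constant firing period of `v_t / delta` iterations. Most of each cycle is spent with the threshold above the stimulus, which corresponds to the refractory period described above.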

4.2.3 Rotational and Scale Invariant Feature Extraction Using Pulse Spectral Frequency

The spectral time sequence generated from PCNN iterations reflects the distribution pattern of image values and can thus be used to represent image features. Lindblad and Kinser found that PCNNs and wavelet transforms have many similarities; however, PCNNs are unique in that they generate rotation, translation and scale invariant time signals (Lindblad & Kinser, 1999). This invariance property is identified by observing the same period and number of peaks in the output time signals of a PCNN. The invariance property is not hard to understand if we consider a PCNN as an image representation that summarizes the firing neurons in the whole image (region). Since an isotropic neighborhood is adopted, it does not matter which neuron is used to provide the local stimulus. The output of a PCNN is not inherently invariant to scale changes because the number of neurons affected by the rescaled patch changes. However, the output pulse frequency of the rescaled patch is stable, and scale changes are reflected in the outputs of a PCNN only by a scale factor. Here we borrow the analysis method of Johnson (1994) to explain the scale invariance properties of our PCNN model. For simplicity, we consider image patches

rather than single pixels. Assume that an image consists of a certain number of patches and that the number of patches is independent of scale changes. Each image patch is considered as a whole with its own intensity. The number of pixels in each patch depends on the scale factor. As shown in Figure 4.7, after an image is rescaled, the distances between image patches change but the intensity per image patch is constant. Suppose a neuron at patch A receives a linking contribution from a neuron at patch B; after the image is rescaled by a factor $k$, the patch at A moves to $kA$ and the patch at B moves to $kB$. However, the feeding input of the neuron does not change (i.e. $F(A) = F(kA)$). Moreover, in the unit-linking PCNN model, the linking input is set to either 0 or 1 and thus does not depend on the scale factor $k$. Therefore, the internal activity of the rescaled patch remains the same as that of the original patch, and the pulse dynamics of each image patch do not change. The only change is the actual number of pixels in each image patch. As a result, if this difference is normalized, the PCNN output is invariant to scale changes.

Figure 4.7: Geometry for scale invariance

In this research, the pulse spectral frequency (PSF) is used for rotation and scale invariant object feature extraction. PSF is defined as a normalized histogram that records the number of firing pixels at each time step within a specified time period (equation 4.12).

$PSF(t) = N(t) / \max(N)$ (4.12)

where $N(t)$ is the number of firing pixels at time $t$ and $\max(N)$ is the maximum number of firing pixels over the time period. In order to achieve scale invariance, we normalize the number of firing pixels to [0, 1] at the discrete time steps. The dimension of the histogram is equal to the total number of iterations. However, as stated by Johnson (1994), this scale invariance property may not hold for very small images and large scale changes, because the local group around a neuron also changes in scale, which causes the internal activity to change as well. The PSF feature is computed iteratively until the user decides to stop; there is currently no automated stopping mechanism. In theory, the more iterations are run, the richer the information that can be derived to characterize the image or object. At the same time, however, more iterations increase the dimensionality of the feature vectors.
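Equation 4.12 can be sketched in a few lines. The example assumes that the PCNN output is available as one binary pulse map per iteration; the function names and the toy maps are hypothetical. The rescaled maps are produced by pixel replication, which mirrors the idealized patch argument above (identical pulse timing per patch) rather than the output of a real PCNN run on a resampled image.

```python
def pulse_spectral_frequency(pulse_maps):
    """Normalized firing histogram PSF(t) = N(t) / max(N).

    `pulse_maps` is a list of binary 2D maps, one per PCNN iteration,
    where 1 marks a pixel that fired at that iteration.
    """
    counts = [sum(sum(row) for row in pulse_map) for pulse_map in pulse_maps]
    peak = max(counts)
    if peak == 0:
        return [0.0] * len(counts)  # no pixel fired in the time period
    return [count / peak for count in counts]


def upscale(pulse_map, k):
    """Replicate every pixel k times along both axes."""
    return [[v for v in row for _ in range(k)] for row in pulse_map for _ in range(k)]


maps = [
    [[1, 0], [0, 0]],
    [[1, 1], [0, 1]],
    [[0, 0], [1, 0]],
]
psf = pulse_spectral_frequency(maps)
psf_scaled = pulse_spectral_frequency([upscale(m, 2) for m in maps])
print(psf)
print(psf == psf_scaled)
```

Replicating pixels multiplies every count $N(t)$ by the same factor, so dividing by $\max(N)$ cancels it. The length of the returned list equals the number of iterations, which is the feature dimensionality mentioned above.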

4.3 Colour and Texture Feature Fusion

4.3.1 Framework

Colour and texture are two fundamental features in describing an image, but prior research has generally focused on extracting colour and texture features as separate entities rather than as a unified image descriptor (Whelan & Ghita, 2009). The joint use of colour and texture information has strong links with human perception, and this motivates investigating how to fuse colour and texture effectively into a unified descriptor that discriminates better than colour and texture features viewed independently. Although the motivation for using colour and texture information jointly in object-based image classification is clear, how best to combine colour and texture in a unified object descriptor is still an open issue. Huang et al. (2008) proposed a multiscale spectral and spatial feature fusion method based on the wavelet transform and evaluated it on very high resolution satellite image classification. Zhang et al. (2008) extracted texture features using multi-channel Gabor filters and Markov random fields, and integrated the two features using a neighbourhood-oscillating tabu search approach for high-resolution image classification. However, these methods extract features from a fixed window size and do not consider all pixels within an object as a whole. Moreover, combining multiple features induces a heavy computational burden, which may cause the ‘curse of dimensionality’ problem and decrease the performance of the classifier.

Figure 4.8: Framework of object-level colour and texture feature fusion

It is often difficult to classify objects using a single feature descriptor. Therefore,

feature-level fusion plays an important role when multiple features are used in the

process of object classification. The advantages of feature fusion are: 1) the most

discriminatory information from original multiple feature sets can be derived by the

fusion process; 2) the noisy information can be eliminated from the correlation be-

tween different feature sets. In other words, feature fusion is capable of deriving and

gaining the most effective and least-dimensional feature vectors that benefit the final

classification (Yang et al., 2003). The feature fusion framework is illustrated in Figure

4.8. After object segmentation, colour and texture features are extracted from image-

objects. Different feature vectors are normalized and then serially integrated. After

that, kernel PCA is used to globally extract the nonlinear features from the integrated

feature sets as well as to reduce the dimensionality. The intrinsic dimensionality of

the serial fused features is estimated using a maximum likelihood method to select the

target dimensionality from the kernel PCA fused feature. Finally, features selected

from Kernel PCA are used as the input to classifiers for further analysis.

4.3.2 Feature Fusion Based on Kernel PCA

A serial fusion strategy (Yang et al., 2003) is used, which simply combines different feature vectors into one feature union-vector. Since the features differ in their value ranges, they are normalized into the range [-1, 1] by the Gaussian criterion. Let $F_i$ be an $n$-dimensional feature vector and $f_{ij}$ its $j$th feature component. Assuming that $f_{ij}$ is a Gaussian sequence, we compute the mean $m_j$ and the standard deviation $\sigma_j$. Feature $f_{ij}$ is normalized by $f'_{ij} = (f_{ij} - m_j) / \sigma_j$. Suppose $\alpha$ and $\beta$ are two feature vectors extracted from the same image-object. The integrated feature union-vector is defined by $\gamma = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$. Obviously, if feature vector $\alpha$ is $m$-dimensional and $\beta$ is $n$-dimensional, then the dimension of the serially integrated feature vector is $(m + n)$.
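The normalization and serial fusion steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and feature values are hypothetical, and the population standard deviation is used because the text does not say which estimator was applied.

```python
from statistics import mean, pstdev


def gaussian_normalize(vectors):
    """Normalize each feature component j by (f_ij - m_j) / sigma_j."""
    columns = list(zip(*vectors))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population std: an assumption
    return [
        [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(vec, means, stds)]
        for vec in vectors
    ]


def serial_fuse(alpha, beta):
    """Concatenate two feature vectors of the same image-object."""
    return list(alpha) + list(beta)


# Hypothetical colour (2-D) and texture (1-D) features of two image-objects.
colour = gaussian_normalize([[1.0, 10.0], [3.0, 30.0]])
texture = gaussian_normalize([[0.0], [4.0]])
fused = [serial_fuse(a, b) for a, b in zip(colour, texture)]
print(fused)
```

Each fused vector has $m + n = 3$ components. Normalizing before concatenation keeps a feature with a large numeric range, such as the second colour component here, from dominating the union-vector.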

Traditional linear feature selection and extraction methods such as PCA are conducted in the original input space, and thus cannot handle nonlinear relationships in the data well (Cao et al., 2007). For example, the principal components of features may not be linearly related to the input variables, and the features of different categories may not be separable by a hyperplane. To solve this problem, kernel methods map the original data to a kernel space using a mapping function. Kernel PCA is one such method; it reformulates traditional linear PCA in a high-dimensional space using a kernel function. Given $M$ input vectors $x_p$ $(p = 1, \cdots, M)$, kernel PCA first maps the original input vectors $x_p$ into a high-dimensional feature space $\phi(x_p)$ and then performs linear PCA in $\phi(x_p)$. Performing PCA in the high-dimensional feature space can capture high-order statistics of the input variables, which is also the initial motivation of kernel PCA (Xie & Lam, 2006). In PCA, the principal components of $x_p$ are obtained as the product of $x_p$ and the eigenvectors of the covariance matrix of the $M$ input vectors. However, it is difficult to compute directly both the covariance matrix of the high-dimensional feature space $\phi(x_p)$ and its corresponding eigenvectors and eigenvalues. Therefore, the kernel trick is employed to avoid this difficulty, and the principal eigenvectors are computed from the kernel matrix rather than from the covariance matrix of the high-dimensional features.

Assuming that the kernel matrix is centered, i.e. $\sum_{p=1,q=1}^{M} K(x_p, x_q) = 0$, introducing the kernel matrix $K$ makes the mapping implicit, without manipulating the high-dimensional space $\phi(x_p)$ explicitly, in terms of Mercer’s condition. Each element of $K$ is an inner product in the high-dimensional space (i.e. $K(x_p, x_q) = \phi(x_p) \cdot \phi(x_q)$). Let $C$ be the covariance matrix of $\phi(x_p)$, let $\mu_i$ and $\beta_i$ be the $i$th eigenvalue and eigenvector of $C$ respectively, and let $\lambda_i$ and $\alpha_i$ be the $i$th eigenvalue and eigenvector of the kernel matrix $K$. The relationships between the eigenvectors and eigenvalues of $C$ and $K$ are

$\mu_i = \lambda_i / M$ (4.13)

$\beta_i = \sum_{j=1}^{M} \alpha_{ji} \times \phi(x_j)$ (4.14)

where $\alpha_{ji}$ is the $j$th element of $\alpha_i$ $(j = 1, \cdots, M)$.

In order to obtain the low-dimensional feature representation, the data is projected onto the eigenvectors of the covariance matrix $C$. The low-dimensional data representation $Y$ is obtained by computing the principal components of $x_p$ in the space $\phi(x_p)$:

$y_{ip} = \sum_{j=1}^{M} \tilde{\alpha}_{ji} \times K(x_j, x_p)$ (4.15)

where $y_{ip}$ is the $i$th element of $y_p$ and $\tilde{\alpha}_i$ is the normalized $\alpha_i$ ($\tilde{\alpha}_i = \alpha_i / \sqrt{\lambda_i}$).

The mapping performed by kernel PCA relies on the choice of the kernel function. In this thesis, the Gaussian kernel is employed, as it is widely used in many applications. The kernel function is defined as:

$K(x_i, x_j) = \exp\left(-\frac{\| x_i - x_j \|^2}{2\delta^2}\right)$ (4.16)

where $\delta$ is the shape parameter.
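To make equations 4.13-4.16 concrete, the sketch below computes the first kernel principal component of a toy data set. It is an illustration under stated assumptions rather than the implementation used in this research: the function names and data are hypothetical, the kernel matrix is centered explicitly by removing row, column and grand means, and power iteration is swapped in for a full eigen-decomposition so that only the leading eigenvector is found.

```python
import math


def gaussian_kernel(x, y, delta):
    """Equation 4.16: exp(-||x - y||^2 / (2 * delta^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * delta ** 2))


def centered_kernel_matrix(points, delta):
    """Kernel matrix with row, column and grand means removed."""
    m = len(points)
    k = [[gaussian_kernel(p, q, delta) for q in points] for p in points]
    row_mean = [sum(row) / m for row in k]  # equals the column mean (K is symmetric)
    total_mean = sum(row_mean) / m
    return [
        [k[p][q] - row_mean[p] - row_mean[q] + total_mean for q in range(m)]
        for p in range(m)
    ]


def first_kernel_component(points, delta, iterations=200):
    """Project the training points onto the leading kernel principal axis."""
    m = len(points)
    kc = centered_kernel_matrix(points, delta)
    v = [float(i + 1) for i in range(m)]  # arbitrary start vector
    for _ in range(iterations):  # power iteration towards the top eigenvector
        w = [sum(kc[p][q] * v[q] for q in range(m)) for p in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    kv = [sum(kc[p][q] * v[q] for q in range(m)) for p in range(m)]
    eigenvalue = sum(a * b for a, b in zip(v, kv))
    # y_p = sum_j (alpha_j / sqrt(lambda)) * K(x_j, x_p), as in equation 4.15
    return [x / math.sqrt(eigenvalue) for x in kv], eigenvalue


# Two well-separated 1-D clusters (hypothetical data).
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
y, eigenvalue = first_kernel_component(points, delta=1.0)
print([round(v, 3) for v in y], round(eigenvalue, 3))
```

The two clusters receive projections of opposite sign on the first component, so a single kernel feature already separates them. A practical implementation would use a numerical eigen-solver to obtain all leading components at once.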

Using kernel PCA to find the underlying structure of, and the correlations between, multiple feature sets has two important benefits. First, the most discriminatory information from the original feature sets can be derived, and redundant information arising from the correlation between different feature sets can be eliminated by the fusion process. Second, the dimension of the feature sets can be reduced, and thus the computational cost of the subsequent classification stage is reduced. However, finding a criterion for selecting the optimal features using kernel PCA remains an open problem.

4.3.3 Intrinsic Dimensionality Estimation

In the previous discussion we assumed that the target dimensionality of the low-dimensional feature representation was known and specified by the user based on the descending order of the eigenvalues. Ideally, the optimal dimensionality should be estimated automatically. A possible solution is to estimate the intrinsic dimensionality of the high-dimensional feature set and use it as the target dimensionality. Intrinsic dimensionality is the minimum number of variables necessary to represent all the information in a dataset. In this thesis, a maximum likelihood estimator (MLE) (Levina & Bickel, 2004) is employed to estimate the intrinsic dimensionality. MLE is a local intrinsic dimensionality estimator based on the observation that the intrinsic dimensionality of the data manifold around a data point can be estimated by counting the data points covered by a hypersphere of growing radius. MLE models the data points in the hypersphere as a Poisson process, in which the estimated intrinsic dimensionality $\hat{d}$ around data point $x_i$, given its $k$ nearest neighbours, is

$\hat{d}_k(x_i) = \left[ \frac{1}{k-1} \sum_{j=1}^{k-1} \log \frac{T_k(x_i)}{T_j(x_i)} \right]^{-1}$ (4.17)

where $T_k(x_i)$ represents the radius of the smallest hypersphere with centre $x_i$ that covers $k$ neighbouring data points.

Different numbers of neighbouring data points can be treated as different scales. It is clear from equation 4.17 that the calculation of the intrinsic dimension $\hat{d}$ depends on the scale parameter $k$. In this thesis, the intrinsic dimension is obtained by averaging $\hat{d}$ over a scale range $[k_1, k_2]$, using the following equations.

$\hat{d}_k = \frac{1}{M} \sum_{i=1}^{M} \hat{d}_k(x_i)$ (4.18)

$\hat{d} = \frac{\sum_{k=k_1}^{k_2} \hat{d}_k}{k_2 - k_1 + 1}$ (4.19)

where $M$ is the number of input vectors, $\hat{d}_k$ is the estimated intrinsic dimension at scale $k$, and $\hat{d}$ is the final estimate.
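Equations 4.17-4.19 translate fairly directly into code. The sketch below is a brute-force illustration with hypothetical function names and synthetic data; it assumes that all points are distinct, since coincident points or tied neighbour distances would produce a zero radius or a zero logarithm sum.

```python
import math
import random


def mle_dimension_at_k(points, k):
    """Equations 4.17 and 4.18: mean of the local estimates at scale k."""
    estimates = []
    for i, x in enumerate(points):
        # radii[j - 1] is T_j(x_i): distance to the j-th nearest neighbour.
        radii = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        log_sum = sum(math.log(radii[k - 1] / radii[j]) for j in range(k - 1))
        estimates.append((k - 1) / log_sum)
    return sum(estimates) / len(estimates)


def mle_intrinsic_dimension(points, k1, k2):
    """Equation 4.19: average the scale-wise estimates over [k1, k2]."""
    per_scale = [mle_dimension_at_k(points, k) for k in range(k1, k2 + 1)]
    return sum(per_scale) / len(per_scale)


rng = random.Random(0)
# 200 points on a straight line embedded in 3-D, and 200 points in a unit square.
line = [(t, 2.0 * t, 0.5) for t in (rng.random() for _ in range(200))]
plane = [(rng.random(), rng.random()) for _ in range(200)]
d_line = mle_intrinsic_dimension(line, 5, 10)
d_plane = mle_intrinsic_dimension(plane, 5, 10)
print(round(d_line, 2), round(d_plane, 2))
```

The line should yield an estimate near 1 and the square an estimate near 2, even though the line is stored with three coordinates. In the fusion framework, the rounded estimate would serve as the number of kernel PCA components to keep.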

4.4 Machine Learning Based Classification

Since object-based image classification is adopted, the classification was conducted in object-feature space. After image segmentation, both spectral and texture features of each image-object (i.e. tree crown polygon) were extracted and input to a variety of classification algorithms for analysis. In this research, supervised machine learning was adopted to classify certain vegetation species. Machine learning techniques are now widely used in remote sensing data classification. A machine learning algorithm is one that can learn from experience (observed examples) with respect to some class of tasks and a performance measure (Mitchell, 1997). In supervised machine learning, the data is a sample of input-output patterns: each input is paired with a target output. Given such a sample of input-output pairs, called the training sample, the task is to find a deterministic function that maps any input to an output and predicts future input-output observations with as few errors as possible (Camastra & Vinciarelli, 2008). In order to compare and justify the effectiveness of different classification models, three widely used machine learning techniques were employed as benchmark classifiers in this research: multilayer perceptron neural networks (MLP), decision tree forest (DTF), and support vector machines (SVM). The theory behind these machine learning techniques is outside the scope of this thesis. Only the basic idea of each technique is presented in this section, with somewhat more discussion of SVM, which is believed to perform better.

4.4.1 Multilayer Perceptron Neural Networks

Artificial neural networks have been used successfully in pattern recognition for many years. A multilayer perceptron neural network (MLP) is a feed-forward, fully connected network. Although an MLP can have an arbitrary number of hidden layers, it has been proven that one hidden layer is sufficient to guarantee that the MLP has the universal approximation property (Camastra & Vinciarelli, 2008). Figure 4.9 shows a typical three-layer MLP with an input layer, a hidden layer and an output layer. In the input layer, the predictor variable values $(x_1, \cdots, x_p)$ represent the input vector. Besides the input vector, there is a constant input of 1.0 (called the bias), which is multiplied by a weight $w$ and added to the sum going to the neuron. The weighted sum ($u_j$) is fed into an activation function ($\delta$) in the hidden layer, which outputs a value $h_j$. The outputs from the hidden layer are distributed to the output layer, in which they are again multiplied by weights to produce a combined value ($v_k$). Afterwards, the weighted sum ($v_k$) is fed into an activation function ($\delta$) to generate the final output ($y_k$).

Figure 4.9: A multilayer perceptron neural network
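The forward pass through a three-layer MLP can be sketched as below. The logistic sigmoid is assumed as the activation function, since the text does not name one, and the weights are arbitrary illustrative values rather than trained parameters.

```python
import math


def sigmoid(z):
    """Logistic activation (an assumption; the text does not name one)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(x, hidden_w, hidden_b, output_w, output_b):
    h = layer(x, hidden_w, hidden_b)     # hidden outputs h_j
    return layer(h, output_w, output_b)  # final outputs y_k


x = [1.0, 2.0]
# All-zero hidden weights give h_j = 0.5 for every hidden neuron.
y_zero = mlp_forward(x, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [[1.0, -1.0]], [0.0])
# Arbitrary non-zero weights, for comparison.
y_any = mlp_forward(x, [[0.5, -0.2], [0.1, 0.3]], [0.1, -0.1], [[1.5, -2.0]], [0.2])
print(y_zero, y_any)
```

Each bias enters as a weight on a constant input of 1.0, which is why it is simply added to the weighted sum. Training would adjust `hidden_w`, `hidden_b`, `output_w` and `output_b` from labeled examples.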

In order to use an MLP to approximate a specific mapping, it is necessary to find the parameter set (i.e. the weight values) that corresponds to that mapping. This can be done through a training procedure in which the network adapts the parameters based on a set of labeled examples. There are several issues involved in designing and training an MLP, such as deciding the number of neurons in the hidden layer, finding a globally optimal solution that avoids local minima, converging to an optimal solution in a reasonable period of time, and validating the neural network to test for overfitting.

4.4.2 Decision Tree Forest

Decision Tree Forests (DTF), also known as Random Forests, are ensembles of tree-type classifiers. Decision tree forest classifiers have been shown to be highly accurate, comparable to other ensemble methods (such as bagging and boosting) in terms of accuracy, but computationally much less intensive (Breiman, 2001). DTF is a general term for ensemble methods using tree-type classifiers $\{h(x, \Theta_k), k = 1, \cdots\}$, where the $\{\Theta_k\}$ are independent identically distributed random vectors and $x$ is an input pattern. DTF has two stochastic (randomizing) elements: (1) the selection of data rows used as input for each tree, and (2) the set of predictor variables considered as candidates for each node split. A DTF grows a number of independent trees in parallel, and they do not interact until after all of them have been built. For classification, each tree in the Decision Tree Forest casts a unit vote for the most popular class at input $x$, and the output of the classifier is determined by a majority vote of the trees.

Here is an outline of the DTF algorithm used in this research, which was implemented by the DTREG software (Sherrod, 2009):

(1) Take a random sample of N observations from the data set with replacement (called “bagging”; some rows may not be selected while others may be selected multiple times). On average, about 2/3 of the rows will be selected. The remaining 1/3 of the rows are called the “out of bag (OOB)” rows.

(2) Construct a decision tree for each sample selected in step (1). Consider only

a subset of predictor variables as possible splitters for each node and perform a new

random selection for each split. Some predictors will not be considered for each split,

but a predictor excluded from one split may be used for another split in the same

tree.

(3) Repeat steps (1) and (2) a large number of times to construct a forest of trees.

(4) Just like the single-tree model, to “score” a row, run the row through each tree

in the forest and record the predicted value (i.e., terminal node) that the row ends

up in. Use the predicted categories for each tree as “votes” for the best category, and

use the category with the most votes as the predicted category for the row.
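The randomizing elements and the voting step of the outline above can be sketched as follows. Tree construction itself is omitted; the function names, the species labels and the numbers of rows and predictors are hypothetical, and the code is not based on the DTREG implementation.

```python
import random
from collections import Counter


def bootstrap_rows(n_rows, rng):
    """Step (1): draw n_rows row indices with replacement."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, out_of_bag


def candidate_splitters(n_predictors, n_candidates, rng):
    """Step (2): random subset of predictors considered at one node split."""
    return rng.sample(range(n_predictors), n_candidates)


def majority_vote(tree_predictions):
    """Step (4): the category predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)
in_bag, oob = bootstrap_rows(1000, rng)
oob_fraction = len(oob) / 1000
splitters = candidate_splitters(10, 3, rng)
winner = majority_vote(["eucalypt", "pine", "eucalypt"])
print(oob_fraction, splitters, winner)
```

For large data sets the expected out-of-bag fraction approaches $1/e \approx 0.368$, which is the "about 1/3" quoted in step (1). A new call to `candidate_splitters` at every node gives the per-split random selection described in step (2).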

4.4.3 Support Vector Machine

SVM is a machine learning technique which has been used successfully in a variety of pattern recognition tasks, often outperforming other classification methodologies (e.g. artificial neural networks) (Mills, 2008). SVM is a supervised non-parametric statistical learning technique; therefore no assumption is made about the underlying data distribution. The basic idea of the SVM training algorithm is to find a hyperplane that separates the dataset into a discrete predefined number of classes. The term optimal separation hyperplane refers to the decision boundary that minimizes misclassifications, obtained in the training step. Learning refers to the iterative process of finding a classifier with an optimal decision boundary to separate the training patterns (in a potentially high-dimensional space) and then to separate simulation data under the same configurations (Zhu & Blumberg, 2002).

Figure 4.10: A linear SVM example

Figure 4.10 illustrates the simplest form of SVM: a linear binary classifier. To separate the training data $\{x_1, x_2, \cdots, x_n\}$ with labels $y_i \in \{+1, -1\}$ into the positive (+1) and negative (-1) classes, SVM tries to find an optimal decision function (hyperplane) with the maximum margin $\varepsilon$ between the closest points of the two classes. These closest points are called the support vectors ($x_1$ and $x_2$ are examples of support vectors). The decision function is given by equation 4.20. The decision rule is that $x$ is classified as +1 when $f(x) \geq 0$, and as -1 otherwise.

For data that is not linearly separable in the input space, SVM maps the data from the initial space to a (usually significantly higher dimensional) Euclidean space $H$ by computing inner-product kernels $K(x_i, x)$. After the mapping, the data, which is not linearly separable in the input space, becomes linearly separable in the $H$ space. A kernel function typically needs to fulfill Mercer’s Theorem in order to be a valid kernel in SVMs (Scholkopf & Smola, 2001). Various classification methods are constructed by employing different kernel functions $K(x_i, x)$ (e.g., linear, polynomial, RBF, sigmoid). The radial basis function (RBF) kernel is selected in this research; it is often suggested as the first choice since it has several advantages over other common kernel functions (Hsu et al., 2008): 1) unlike the linear kernel, RBF nonlinearly maps samples into a high-dimensional space, so it can handle the case where the relation between class labels and attributes is nonlinear; 2) the RBF kernel has fewer hyperparameters than the polynomial kernel, which makes model selection less complex.

$f(x) = \sum_{i=1}^{N} y_i \alpha_i K(x_i, x) + b$ (4.20)

RBF kernel: $K(x, y) = \exp(-\gamma \| x - y \|^2)$ (4.21)

where the coefficients $\alpha_i$ ($0 \leq \alpha_i \leq C$) define the maximal margin hyperplane in the $H$ space, $C$ is a penalty parameter and $\gamma$ is the kernel parameter. When the maximal margin hyperplane is found, only the points that lie closest to it have non-zero $\alpha_i$; these points are the support vectors.
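Once training has produced the support vectors, the coefficients $\alpha_i$ and the offset $b$, equations 4.20 and 4.21 are all that is needed to classify a new sample. The sketch below evaluates them for a hypothetical, hand-written model (it does not train an SVM) and assumes that $f(x) \geq 0$ maps to the class +1.

```python
import math


def rbf_kernel(x, y, gamma):
    """Equation 4.21: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def decision_function(x, support_vectors, labels, alphas, b, gamma):
    """Equation 4.20: f(x) = sum_i y_i * alpha_i * K(x_i, x) + b."""
    return b + sum(
        y_i * a_i * rbf_kernel(x_i, x, gamma)
        for x_i, y_i, a_i in zip(support_vectors, labels, alphas)
    )


def classify(x, *model):
    """Assumed decision rule: +1 when f(x) >= 0, otherwise -1."""
    return 1 if decision_function(x, *model) >= 0 else -1


# Hypothetical model: two support vectors, one per class, alpha = 1, b = 0.
model = ([(-1.0,), (1.0,)], [-1, 1], [1.0, 1.0], 0.0, 0.5)
near_positive = classify((0.8,), *model)
near_negative = classify((-0.3,), *model)
midpoint_value = decision_function((0.0,), *model)
print(near_positive, near_negative, midpoint_value)
```

Only the support vectors enter the sum, so prediction cost grows with their number rather than with the size of the training set. The point midway between the two support vectors lies exactly on the decision boundary, where $f(x) = 0$.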

Overfitting is another important concept, and resistance to it is a key attraction of SVMs. Ideally, an SVM analysis should produce a hyperplane that completely separates the feature vectors into non-overlapping groups. However, perfect separation may not be possible, or it may result in a model which does not generalize well to other data. SVM-based classification is known to strike a good balance between the accuracy attained on a given finite set of training patterns and the ability to generalize to other data (Mountrakis et al., 2011). To allow some flexibility in separating the categories, SVM models have a penalty parameter, $C$, that controls the trade-off between allowing training errors and forcing rigid margins. It creates a soft margin that permits some misclassifications. A larger value of $C$ increases the cost of misclassifying points and forces the creation of a model that fits the training data more closely but may not generalize well. Figure 4.11 illustrates an example of the tradeoff between underfitting and overfitting. The accuracy of an SVM model is largely dependent on the selection of the model parameters such as $C$ and $\gamma$. The DTREG software package used in this research provides two methods for finding optimal parameter values: a grid search and a pattern search (Sherrod, 2009). A grid search tries parameter values across the specified search range using geometric steps. A pattern search starts at the center of the search range and makes trial steps in each direction for each parameter. If the fit of the model improves, the search center moves to the new point and the process is repeated. If no improvement is found, the step size is reduced and the search is tried again. The pattern search stops when the search step size is reduced to a specified tolerance.
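The two search strategies can be sketched generically. The code follows the verbal description above and is not DTREG's implementation: the function names are hypothetical, the step size is halved when no trial improves the fit, and a simple quadratic stands in for the model error that would normally come from cross-validating an SVM at given values of $C$ and $\gamma$.

```python
def geometric_grid(low, high, n):
    """n values from low to high with a constant ratio between neighbours."""
    ratio = (high / low) ** (1.0 / (n - 1))
    return [low * ratio ** i for i in range(n)]


def pattern_search(error, center, step, tolerance):
    """Move to any trial point that lowers the error; halve the step otherwise."""
    best = list(center)
    best_error = error(best)
    while step > tolerance:
        improved = False
        for i in range(len(best)):
            for direction in (1.0, -1.0):
                trial = list(best)
                trial[i] += direction * step
                trial_error = error(trial)
                if trial_error < best_error:
                    best, best_error, improved = trial, trial_error, True
        if not improved:
            step /= 2.0
    return best, best_error


def toy_error(p):
    """Stand-in for a cross-validation error surface (hypothetical)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


grid = geometric_grid(0.01, 100.0, 5)
best, best_error = pattern_search(toy_error, [0.0, 0.0], step=1.0, tolerance=1e-3)
print([round(g, 4) for g in grid], best, best_error)
```

A grid search evaluates every combination of grid values, so its cost grows with the product of the grid sizes. A pattern search evaluates only a few points per step but can stop in a local optimum of the error surface.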

SVMs were originally designed for binary classification. A multi-class classification problem must therefore be decomposed into several binary classification problems. Two decomposition strategies are currently popular: one-against-one and one-against-all. In the one-against-one method, one SVM is trained for each pair of classes, while in the one-against-all approach, one SVM is built for each class. Hsu and Lin showed that the one-against-one strategy is as accurate as the one-against-all strategy, but that the former is more practical as it requires less training time (Hsu & Lin, 2002). Therefore, the one-against-one strategy is adopted in this research for multi-class tree species classification.
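The one-against-one decomposition can be sketched as follows. The class labels are hypothetical, and the pairwise "classifiers" are simple stand-ins that pick the nearer of two class centres on a toy one-dimensional axis; in a real system each of them would be a trained binary SVM.

```python
from collections import Counter
from itertools import combinations


def one_against_one_pairs(classes):
    """One binary classifier per unordered pair: K * (K - 1) / 2 in total."""
    return list(combinations(classes, 2))


def predict_by_voting(x, pairwise_classifiers):
    """Each pairwise classifier votes for one of its two classes."""
    votes = Counter(clf(x) for clf in pairwise_classifiers.values())
    return votes.most_common(1)[0][0]


species = ["A", "B", "C", "D"]  # hypothetical class labels
pairs = one_against_one_pairs(species)

# Stand-in classifiers: each pair prefers the class whose centre is nearer to x.
centres = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}
classifiers = {
    (a, b): (lambda x, a=a, b=b: a if abs(x - centres[a]) <= abs(x - centres[b]) else b)
    for a, b in pairs
}
predicted = predict_by_voting(1.9, classifiers)
print(len(pairs), predicted)
```

With $K$ classes the one-against-one scheme trains $K(K-1)/2$ classifiers, compared with $K$ for one-against-all, but each is trained on the samples of only two classes.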

SVMs are particularly appealing in the remote sensing field due to their ability to handle small training datasets successfully, often producing higher classification accuracy than traditional methods (Mountrakis et al., 2011). Many statistical techniques, such as maximum likelihood estimation, usually assume that the data distribution is known a priori. The major benefit of SVMs, by contrast, is that they are developed around the principle of Structural Risk Minimization to address issues concerned with generalization. Under this scheme, SVMs minimize classification error on unseen data without prior assumptions about the probability distribution of the data. This is particularly appealing in remote sensing applications, since data acquired from remotely sensed imagery usually have unknown distributions, and methods which assume that the data is normally distributed do not necessarily match that reality.

Figure 4.11: Tradeoff between underfitting and overfitting

4.5 Summary

This chapter briefly reviewed the major steps for thematic information extraction in remote sensing image classification, with special attention to visual features that may be applied to object-based tree species classification. Novel methods for spectral-texture feature extraction, feature fusion and machine learning based classification were developed and applied to object-based tree species classification using multi-spectral imagery.

Chapter 5

Experiments and Results

5.1 Object Detection and Segmentation

To evaluate the proposed algorithms for power line detection and individual tree

crown segmentation, experiments were conducted on the collected aerial image data.

Both qualitative and quantitative measures are used in the evaluation.

5.1.1 Power Line Detection

The first experiment was conducted on the collected high-spatial resolution natural colour UAV images. In the experiment, we compared Hough line detection results on edge maps generated by the Canny filter and by the proposed pulse coupled neural filter (PCNF). The results before and after using knowledge-based line clustering in Hough space were also compared. As shown in Figure 5.1 (a), there are many linear features in the original image: power lines, road edges, shadows, etc. These linear features are detected by the Hough transform (see Figure 5.1 (c), shown in red lines). Although some of these lines can be eliminated by applying knowledge-based post-processing, lines such as road edges are not removed because they are parallel to the power lines (see Figure 5.1 (d), shown in green lines). A better choice is to suppress the misleading information before detecting power lines. As shown in Figure 5.1 (e), after using the proposed PCNF for preliminary detection of power lines and edge map generation, most irrelevant points are filtered out, though some noise remains. This is because PCNF groups pixels according to their spatial and spectral similarity. It reduces the local gray-level differences of images and fills in tiny local discontinuities in image regions. Power lines are made of special metal and have uniform brightness in images, while the background varies in texture and intensity. Neurons stimulated by power lines receive different spectral stimuli from those stimulated by the background, and so the two groups pulse non-synchronously. Thus, power lines are discriminated from the background. According to our experiment, the pulse output of PCNF at the third iteration is a safe choice because the pixels corresponding to power lines have pulsed and most of the background pixels have not pulsed at that time. However, automatic selection of temporal pulse outputs is required in future work. From Figure 5.1 (g) and (h), we can see that after using PCNF, power lines are correctly detected whether or not knowledge-based post-processing is used.

Figure 5.2 shows more results of the experiment. The first row shows the original images. Rows 2 and 3 are Hough line detection results on the Canny edge image and the PCNF edge image without knowledge-based post-processing. Rows 4 and 5 are the results after using knowledge-based line clustering. From the experiment, it is clear that the proposed pulse coupled neural filter (PCNF) is very useful as a pre-processing tool. Most noise is filtered out and power lines are prominent in the images. After using PCNF, fewer irrelevant lines remain. Applying knowledge-based post-processing by clustering lines in the Hough space also increases the accuracy of power line detection. Combining these techniques can significantly increase the accuracy of power line detection in a complex environment.

A quantitative evaluation was conducted on a set of 15 images containing 53 power lines located in 9 spans (the ground truth was manually labeled by visual observation). The algorithm obtained an overall accuracy of 88.68% with a false positive rate of 7.84% and a false negative rate of 11.76%. It is noted that a metallic fence line is also detected (see the left line in Figure 5.1), because it has very similar characteristics to power lines. In some countries, such as Australia, it is not uncommon for fence lines to exist near power line corridors, and in many cases they are parallel to the power lines. Besides the difficulty of discriminating these easily confused linear features, low spatial resolution and motion blur are the other two major causes of failure of the power line detection algorithm. In general, the performance of the algorithm depends on the quality of the image. Figure 5.3 shows some failure cases due to the low quality of the collected imagery. In Figure 5.3 (a), it is hard even for a human to discriminate power lines without noticing the shadow of the power pole. In Figure 5.3 (b), the illumination conditions and the shadows of trees significantly reduce the visibility of power lines, thus causing difficulty for computer algorithms.

Figure 5.1: Comparison of power line detection results

Compared to image-based approaches, LiDAR is more popular and reliable for power line survey since it can provide high-density point cloud data and does not rely on illumination conditions. The developed knowledge-based line detection algorithm does not consider the scenario of crossing power lines. Moreover, with an image-based approach it is not possible to detect multi-layer power lines, especially when trees grow over power lines (Figure 5.4). LiDAR data provides 3D information and makes it possible to detect multi-layer structured power lines. In addition, LiDAR can more effectively generate accurate elevation and terrain models, which can also help to remove terrain points and other similar linear features (e.g. fences). Therefore, to provide a reliable solution for power line detection, future work is to develop effective algorithms for LiDAR point cloud data processing and 3D modeling of the catenary curve.

Figure 5.2: Power line detection results

Figure 5.3: Failure examples for power line detection

Figure 5.4: Multi-layer powerlines and crossing powerlines

5.1.2 Individual Tree Crown Segmentation

The criterion for successful vegetation detection in this study is defined as an individual tree having been delineated while preserving the contour of the crown. The developed tree crown segmentation algorithm was evaluated against two existing image segmentation algorithms, JSEG (Deng & Manjunath, 2001) and TreeAnalysis (Erikson, 2003). The JSEG algorithm is a classic region-based image segmentation method which takes advantage of both colour and texture properties, and it has been widely used for natural image segmentation in many computer vision applications.

TreeAnalysis is a fuzzy region-growing segmentation algorithm specifically designed

for tree crown delineation. Figure 5.5 shows a comparison of the three segmentation

algorithms and also the ground truth image. The ground truth data was created

by manual segmentation of individual trees. From visual assessment, the developed algorithm obtained much better results than the other two algorithms. Since spectral, texture and morphological features were considered in the developed algorithm, trees were successfully detected and shadows were removed, so the boundaries of individual tree crowns were delineated more accurately. In contrast, the results of TreeAnalysis and JSEG showed significant under-segmentation caused by confusion between tree crowns and their shadows.

In order to produce a quantitative evaluation, an analysis of under-segmentation

and over-segmentation was conducted using a set of four metrics: 1-to-1, 1-to-M,

M-to-1, and missing (Wang et al., 2006). The first criterion 1-to-1 indicates the suc-

cessful mapping of a single tree crown in the real world to a single tree crown by the

segmentation algorithm. The criterion 1-to-M defines a single tree crown that has

been incorrectly segmented into several portions; likewise, M-to-1 describes a cluster

or group of trees that have been segmented as one. Missing indicates that a tree

has been misclassified as ground. The accuracy of any algorithm is then calculated

as the proportion of correct 1-to-1 mappings to the total number of trees present.

In certain instances, it will be necessary to inspect vegetation from the ground to

acquire an accurate truth data set. Of the multi-spectral images captured, a series

of 13 images were selected for processing with a total number of 183 trees. Frames

were removed from the full sequence of images to minimize overlap that would see

some trees processed more than once. Furthermore, the algorithm was only applied

to those areas of the image that contained the power-line corridor. Table 5.1 shows a

quantitative comparison of JSEG, TreeAnalysis and the developed algorithm. Table

5.2 shows the quantitative evaluation results for each image. Overall, 178 of the 183 trees imaged were detected by the developed algorithm, yielding a detection rate of 97.27%. Although over- and under-segmentation are undesirable, detection is achieved all the same, whereas trees missed by the algorithm are potential threats that go unchecked. Sparse foliage and small crowns appeared to be the major reasons for trees being missed, because the combination of low spatial resolution and low foliage density produced limited contrast for segmentation. With regard to correct segmentation, the algorithm achieved an overall accuracy of 84.7%, with errors stemming from under-segmentation, over-segmentation and missed trees. Over-segmentation is a less serious problem than under-segmentation, since a posteriori merging of segments after the final object-based classification is easier if the segments are correctly classified as the same species. Under-segmentation, in contrast, causes trouble for the classification algorithm if the trees in one segment are of different species. Of these errors, under-segmentation is therefore of greatest concern, as individual trees are expected to be selected at this stage for further processing. Under-segmentation typically occurs when trees have grown in a tight group and have overlapping crowns, and some instances are hard to detect even with the naked eye. Clusters with grass in the background are further complicated by similar colours and even similar texture. Figure 5.6 shows a failure case in which the algorithm failed to discriminate between background grass and trees. Combining LiDAR elevation data with multi-spectral imagery will help to solve the under-segmentation problem.

Figure 5.5: Ground truth and segmentation results


Figure 5.6: A failure example of individual tree crown delineation

Table 5.1: Quantitative comparison of three segmentation algorithms

Method           1-to-1   1-to-M   M-to-1   Missing   Accuracy
JSEG             126      11       34       12        68.85%
TreeAnalysis     130      12       32       9         71.04%
Our Algorithm    155      11       12       5         84.7%

Table 5.2: Quantitative analysis of individual tree crown detection and delineation

Image     Truth   1-to-1   1-to-M   M-to-1   Missing   Accuracy
1         21      17       0        2        2         81.95%
2         11      11       0        0        0         100%
3         25      21       2        2        0         84.0%
4         21      20       1        0        0         95.24%
5         11      9        0        2        0         81.82%
6         15      13       1        0        1         86.67%
7         3       3        0        0        0         100%
8         11      10       0        0        1         90.91%
9         11      9        1        0        1         81.82%
10        13      11       2        0        0         84.62%
11        13      9        2        2        0         69.23%
12        22      16       2        4        0         72.73%
13        6       6        0        0        0         100%
Overall   183     155      11       12       5         84.7%
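The accuracy and detection rate quoted in this section follow directly from the counts in Table 5.1. The short Python sketch below recomputes them. The function name `segmentation_scores` is illustrative only, and the sketch assumes that every column counts ground-truth trees, so that the four columns sum to the 183 trees in the test set.

```python
# Counts copied from Table 5.1: (1-to-1, 1-to-M, M-to-1, missing).
TABLE_5_1 = {
    "JSEG": (126, 11, 34, 12),
    "TreeAnalysis": (130, 12, 32, 9),
    "Our Algorithm": (155, 11, 12, 5),
}


def segmentation_scores(one_to_one, one_to_many, many_to_one, missing):
    """Return (accuracy, detection rate) in percent for one algorithm."""
    total = one_to_one + one_to_many + many_to_one + missing
    accuracy = 100.0 * one_to_one / total          # correct 1-to-1 mappings only
    detection = 100.0 * (total - missing) / total  # every tree that was not missed
    return accuracy, detection


for name, counts in TABLE_5_1.items():
    acc, det = segmentation_scores(*counts)
    print(f"{name:14s} accuracy={acc:6.2f}%  detection={det:6.2f}%")
```

Detection counts over- and under-segmented trees as detected, which is why it is higher than the 1-to-1 accuracy for every method.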


5.1.3 Fusion of LiDAR and Multi-spectral Imagery

Ground filtering is an important step in the LiDAR and multi-spectral imagery fusion

algorithm. Therefore, the first experiment in this section is to evaluate the ground

filtering algorithm. The experiment is conducted on both LiDAR height data and intensity data. Figure 5.7 (a) and (b) show 3D views of the height and intensity values of the LiDAR data. Different colours represent ranges of values: blue indicates the lowest value range and white the highest. From the example data it is clear that ground points have the lower height, while object points have the lower intensity. The skewness and kurtosis sequences were calculated from both the intensity and height values of the LiDAR data, as shown in Figure 5.7 (c) and (d). Object points are separated from ground points based on the skewness and kurtosis sequences, as described by Algorithm 3 in Chapter 3. Figure 5.7 (e) and (f) present the ground filtering results on height and intensity data, in which object points are shown in blue and ground points have been removed. From visual assessment, using intensity data gives better results than using height data. Figure 5.7 (e) shows that some small trees have been removed together with the ground points and that the power lines are significantly broken. Although the result on intensity data (Figure 5.7 (f)) contains more noise, object points are preserved better. The obtained object points are then used, together with multi-spectral imagery, to further improve individual tree crown detection and delineation.
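To give a feel for how a skewness-driven ground filter can work, the following Python sketch applies a simple skewness-balancing rule: the largest values are peeled off one at a time until the skewness of the remaining points is no longer positive, and the peeled points are labelled as objects. This is an assumption made for illustration, not a reproduction of Algorithm 3, which is defined in Chapter 3 and also uses the kurtosis sequence. The function names are hypothetical.

```python
import numpy as np


def skewness(values):
    """Sample skewness (third standardised moment) of a 1-D array."""
    centred = values - values.mean()
    std = centred.std()
    if std == 0:
        return 0.0
    return float(np.mean(centred ** 3) / std ** 3)


def skewness_balancing(values):
    """Flag object points by removing the largest values until skewness <= 0.

    Returns a boolean mask: True for object points, False for ground points.
    Illustrative rule only; it recomputes the skewness at every step, so it is
    O(n^2) in the worst case.
    """
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)            # ascending: candidates sit at the end
    n_ground = len(values)
    while n_ground > 2 and skewness(values[order[:n_ground]]) > 0:
        n_ground -= 1                     # drop the current maximum
    is_object = np.zeros(len(values), dtype=bool)
    is_object[order[n_ground:]] = True
    return is_object
```

For height data the rule can be applied as written. For intensity data, where the text notes that object points have the lower values, the same function could be applied to the negated intensities.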

Ergon’s network is mostly in rural areas, where vegetation detection is relatively easy because there are fewer land cover types. In urban scenes, however, the detection and delineation of individual trees becomes more difficult since more types of land cover exist (e.g. trees, shrubs, grass, buildings, roads and swimming pools). In this case, it is very hard to achieve robust tree crown detection and delineation using only imagery or only LiDAR data. Combining the imagery with the complementary height information in LiDAR data is very helpful.

Figure 5.7: Comparison of LiDAR intensity and height data by skewness and kurtosis analysis

Figure 5.8: A pair of CIR image and LiDAR point cloud data in urban area

Figure 5.9: Fusion of LiDAR and multi-spectral imagery for tree crown delineation

Figure 5.8 shows a CIR image paired with LiDAR point cloud data for an urban area. Figure 5.9 shows the fusion process and the result obtained using the image and the LiDAR data. As shown in Figure 5.9 (a), an initial segmentation was conducted on the CIR image without any post-processing. The initial segmentation detects trees as well as other vegetation segments (e.g. grass). Figure 5.9 (b) shows the 2.5D depth image of the LiDAR object points after ground filtering. Each connected region in the initial segmentation map was labeled, and the regions are shown in different colours in Figure 5.9 (c). The 2.5D depth image is then integrated with the labeled vegetation segment map. A simple thresholding process is used to remove grass and low vegetation: the mean height of each region is calculated, and regions with a mean height of less than 1.5 meters are removed. An overlay of the LiDAR points after the fusion process on the initial segmentation map is shown in Figure 5.9 (d). From this figure, we can see that the regions with low mean height, representing grass and other low vegetation, were separated. Figure 5.9 (e) shows the tree segments after the fusion process. Afterwards, the watershed algorithm described in Chapter 3 was applied to Figure 5.9 (e) to decompose the tree clusters into individual tree crowns. As can be seen from the final segmentation result, low vegetation regions have been successfully removed. However, a critical limitation of this fusion process is that it depends on a high LiDAR point density. For a small tree crown and low point density LiDAR data, no points or only a few points hit the tree, which may cause the tree to be removed because of its low region mean height.
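The region-level thresholding step can be sketched in a few lines of Python with NumPy. The sketch assumes that the labeled segment map and the 2.5D depth image are co-registered arrays of the same shape, that label 0 marks background, and that pixels without a LiDAR return hold a height of 0. The function name `remove_low_regions` is hypothetical, and only the 1.5 m threshold comes from the text.

```python
import numpy as np


def remove_low_regions(labels, height, min_mean_height=1.5):
    """Zero out labelled regions whose mean height is below the threshold.

    labels: 2-D int array, 0 = background, 1..K = vegetation segments.
    height: 2-D float array of the same shape (2.5D depth image, in metres).
    """
    labels = np.asarray(labels)
    height = np.asarray(height, dtype=float)
    n_labels = int(labels.max()) + 1
    pixel_count = np.bincount(labels.ravel(), minlength=n_labels)
    height_sum = np.bincount(labels.ravel(), weights=height.ravel(),
                             minlength=n_labels)
    mean_height = height_sum / np.maximum(pixel_count, 1)
    keep = mean_height >= min_mean_height
    keep[0] = False                       # background is never a tree segment
    return np.where(keep[labels], labels, 0)
```

Because empty pixels pull the region mean towards zero under these assumptions, a small crown hit by only a few LiDAR points can fall below the threshold, which is the limitation described above.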

5.2 Feature and Classifier Evaluation

5.2.1 Experiment Setup

In order to compare different feature descriptors and assess the effectiveness of different classification models, a number of experiments were conducted using three benchmark classifiers: multilayer perceptron neural networks, decision tree forests and support vector machines. In this study, the DTREG software implementation is used for the three classifiers (Sherrod, 2009). V-fold cross validation was employed in the experiments, with 10 folds. The dataset is partitioned into 10 groups using stratification, so that the distribution of categories of the target variable is approximately the same in each group. Nine of the 10 partitions are collected into a pseudo-learning dataset, and a classification model is built using this pseudo-learning dataset. The remaining 10% of the data (1 of the 10 partitions) is held back and used to test the built model, and the classification error for that data is computed. After that, a different set of 9 partitions is collected for training and the remaining 10% is used for testing. This process is repeated 10 times, so that every row has been used for both training and testing. The classification accuracies of the 10 testing datasets are averaged to obtain the overall classification accuracy.
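The stratified V-fold procedure can be sketched in plain Python as follows. This is not DTREG's implementation, whose internals are not described here. The names `stratified_folds`, `cross_validate` and the `evaluate` callback are hypothetical, and `evaluate` stands in for training a classifier on the training indices and returning its accuracy on the test indices.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign each sample a fold index so class proportions stay similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    fold_of = [None] * len(labels)
    offset = 0                            # rotates the start fold to balance sizes
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            fold_of[index] = (offset + position) % n_folds
        offset += len(indices)
    return fold_of


def cross_validate(labels, evaluate, n_folds=10, seed=0):
    """Average evaluate(train_idx, test_idx) over the folds."""
    fold_of = stratified_folds(labels, n_folds, seed)
    scores = []
    for fold in range(n_folds):
        test_idx = [i for i, f in enumerate(fold_of) if f == fold]
        train_idx = [i for i, f in enumerate(fold_of) if f != fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / n_folds
```

Dealing the shuffled samples of each class round-robin across the folds keeps the per-class counts in any two folds within one sample of each other.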

The basic design of the benchmark classifiers is described as follows:


1) Multilayer Perceptron Neural Networks (MLP): Three layers are used with one

input layer, one hidden layer and one output layer. The number of neurons in the

hidden layer is automatically optimized in DTREG. A logistic (sigmoid) activation

function is employed in both the hidden layer and output layer. Overfitting detection

and prevention: 20 instances from the training data are removed and used as a

validation set to check for overfitting as model tuning is performed. The error from

that test is compared with the error computed using previous parameter values. If

the error on the test rows does not decrease after 10 iterations then the training is

stopped and the parameters which produced the lowest error on the test data are

selected.

2) Decision Tree Forest (DTF): In the experiments, a maximum number of 200

trees was used when constructing a forest, and each tree can be grown to up to 50

levels (depth). In addition, a node in a tree will not split if it has fewer than 2 rows in

it. When a tree is constructed in a forest, a random subset of the predictor variables is selected as candidate splitters for each node. In the experiments, the square root of the total number of predictor variables was used as the number of candidates for each node split, as suggested by Breiman (2001).

3) Support Vector Machines (SVM): An RBF kernel was used for all SVM models in the experiments. The one-against-one strategy was used because the classification task in this study is a multi-class problem. A grid search was used to find the optimal SVM model parameters C and γ. The search ranges are 0.1 < C < 50000 and 0.001 < γ < 20.
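As an illustration of the grid search over C and γ, the sketch below builds logarithmically spaced candidate values over the stated ranges and keeps the pair with the highest score. The spacing, the number of grid points and the `score` callback are assumptions for illustration; DTREG's actual search schedule is not described in the text. In practice `score(C, gamma)` would return the cross-validated accuracy of an RBF SVM trained with those parameters.

```python
import math


def log_grid(low, high, n_points):
    """n_points values spaced evenly in log10 between low and high (inclusive)."""
    step = (math.log10(high) - math.log10(low)) / (n_points - 1)
    return [10 ** (math.log10(low) + i * step) for i in range(n_points)]


def grid_search(score, c_values, gamma_values):
    """Return (best_score, C, gamma) maximising score(C, gamma)."""
    best = None
    for c in c_values:
        for gamma in gamma_values:
            value = score(c, gamma)
            if best is None or value > best[0]:
                best = (value, c, gamma)
    return best


# Candidate grids spanning the ranges quoted in the text (10 points each is an assumption).
c_grid = log_grid(0.1, 50000, 10)
gamma_grid = log_grid(0.001, 20, 10)
```

A logarithmic grid is the usual choice here because both parameters span several orders of magnitude.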

The image objects generated by segmentation are arbitrarily shaped; however, texture measurements are usually extracted from the texture properties of pixels or small blocks within a rectangular region. Therefore, the arbitrarily shaped objects are extended to a rectangular area for texture extraction. This can be achieved by padding with zeros or with the mean value outside the object boundary, or by taking the inner rectangle of the object. Zero padding introduces spurious high-frequency components that degrade the performance of the texture feature, while the inner rectangle usually cannot represent the properties of the entire object well. Mean-intensity padding has shown better performance than the other two approaches (Liu et al., 2006a) and was therefore adopted in this study. First, the minimum bounding rectangle of the image segment was obtained; then the area outside the segment but inside the minimum bounding rectangle was padded with the mean value of the pixels in the region.
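A minimal NumPy sketch of mean-intensity padding is given below. It assumes a single-band image, a boolean object mask of the same shape, and an axis-aligned minimum bounding rectangle. The function name `mean_padded_patch` is hypothetical.

```python
import numpy as np


def mean_padded_patch(image, mask):
    """Crop the bounding rectangle of mask and pad non-object pixels with the mean.

    image: 2-D float array (one band); mask: 2-D bool array, True inside the object.
    """
    image = np.asarray(image, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    r0, r1 = rows[0], rows[-1] + 1
    c0, c1 = cols[0], cols[-1] + 1
    patch = image[r0:r1, c0:c1].copy()
    inside = mask[r0:r1, c0:c1]
    patch[~inside] = image[mask].mean()   # mean of the object's own pixels
    return patch
```

For a multi-spectral image the same function could be applied to each band in turn, so that every band is padded with its own object mean.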

5.2.2 Performance Measure

For a given application, more than one classification method is usually applicable. This motivates evaluating the performance of these classification methods empirically for the specific application. That is, given several classification algorithms, how can we say that one has less error than the others for a given application? Having selected a classification algorithm to train a classifier, can we state, with enough confidence, the error rate to expect when the classifier is later applied to a new dataset?

In this study, several of the most commonly used metrics are considered for evaluating different classification algorithms: overall accuracy, precision/recall, F-measure, ROC analysis and computational cost. All of these measures except computational cost are based on the definition of a confusion matrix. An example of a confusion matrix for binary classification is given in Table 5.3. To support the definitions that follow, we define the following symbols: TP: True Positive count; FN: False Negative count; FP: False Positive count; TN: True Negative count.

The overall accuracy is the simplest and most intuitive evaluation measure for

classifiers. It is defined as

\[
\mathrm{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of samples}} = \frac{TP + TN}{P + N} \tag{5.1}
\]


Table 5.3: A confusion matrix

Predicted    Actual Category
Category     Positive         Negative
Positive     TP               FP
Negative     FN               TN
             P = TP + FN      N = FP + TN

It is worth noting that the overall accuracy does not distinguish between the types of error the classifier makes (i.e. False Positive versus False Negative) (Japkowicz, 2006). For example, two classifiers may obtain the same accuracy but behave quite differently on each category. If one classifier obtains 100% accuracy on one category but only 41% on the other, while another classifier obtains 70% on each category, it is hard to claim that the first classifier is better. Therefore, overall accuracy should not be used blindly as the evaluation measure for classifiers on a dataset. Precision and Recall avoid this problem. Precision can be seen as a measure of exactness or fidelity, whereas Recall is a measure of completeness. Their definitions are:

\[
\mathrm{Precision} = \frac{TP}{TP + FP} \tag{5.2}
\]

\[
\mathrm{Recall} = \frac{TP}{P} \tag{5.3}
\]

The goal in Precision/Recall space is to be in the upper-right-hand corner: the higher both measures are, the better the classifier performs. However, Precision and Recall do not judge how well a classifier decides that a negative example is, indeed, negative. Receiver Operating Characteristic (ROC) analysis addresses the problems of both Accuracy and Precision/Recall. ROC analysis plots the False Positive Rate (FPR) on the x-axis of a graph and the True Positive Rate (TPR) on the y-axis. TPR is equal to Recall, and FPR is defined as $FPR = FP / N$. A ROC graph depicts


Figure 5.10: Illustration of ROC space analysis

relative trade-offs between true positive (benefits) and false positive (costs), and the

goal in ROC space is to be in the upper-left-hand corner (Davis & Goadrich, 2006).

The (0,1) point of ROC space is also called perfect classification. The diagonal line from the bottom left to the top right corner is called the random guess line, and it can be used to judge whether a classification is good or bad. Points above the random guess line indicate good classification results, while points below the line are considered bad classification results. The shorter the distance to the (0,1) point, the better the classification is. Figure 5.10 illustrates the evaluation of classifiers in ROC space (figure source: http://en.wikipedia.org/wiki/Receiver_operating_characteristic).
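The measures of this section can be collected into one small Python function that works from the four counts of Table 5.3. The function name and dictionary keys are illustrative. The F-measure is not written out in the text, so the standard F1 form (the harmonic mean of Precision and Recall) is assumed.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Measures of Section 5.2.2 from the four confusion-matrix counts."""
    p = tp + fn                       # actual positives
    n = fp + tn                       # actual negatives
    precision = tp / (tp + fp)
    recall = tp / p                   # also the true positive rate (TPR)
    fpr = fp / n                      # false positive rate
    return {
        "accuracy": (tp + tn) / (p + n),
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        # Standard F1 form of the F-measure (assumed, not defined in the text).
        "f_measure": 2 * precision * recall / (precision + recall),
        # Euclidean distance from (FPR, TPR) to the perfect point (0, 1).
        "roc_distance": math.hypot(fpr, 1.0 - recall),
        # True when the point lies above the random guess line TPR = FPR.
        "above_random": recall > fpr,
    }
```

For a three-class problem such as the species classification below, the same function could be applied per class in a one-versus-rest fashion, which would give one point per class in ROC space.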


5.2.3 Evaluation of PSF Feature in Rotation and Scale Invariant Texture Classification

The first experiment assessed the developed PSF features in rotation and scale invariant texture classification. A dataset of rotated texture images from the University of Southern California Signal and Image Processing Institute (USC-SIPI) texture database² is used in this experiment. The dataset consists of 13 Brodatz textures digitized at seven different rotation angles: 0, 30, 60, 90, 120, 150 and 200 degrees (91 images). To test the scale invariance property, each image was resized to 4 scales (0.25, 0.5, 0.75 and 1), giving a total of 364 images of 13 textures (i.e. wool, bark, brick, bubbles, grass, leather, pigskin, raffia, sand, straw, water, weave and wood).

PSF features were extracted from the gray-scale texture images and then used as input to build the classifier. Figure 5.11 shows some examples of texture images and their corresponding PSF features. Figure 5.11 (a), (b) and (c) show the PSF features of the bark texture at different scales and rotation angles. As we can see, the PSF histograms of the three bark texture images are not exactly the same, but their trends over the time period are approximately the same. In contrast, as shown in Figure 5.11 (d), (e) and (f), the PSF features of the other 3 textures (i.e. bubbles, brick and water) are very different from each other. A well-known texture descriptor, local binary patterns (LBP), is also evaluated for comparison purposes. In this experiment, the rotation invariant LBP (Ojala et al., 2002) and an SVM classifier were employed in the classification test. Table 5.4 compares the performance of LBP and PSF when textures are rotated only and when they have both rotation and scale changes.

From the results we can see that LBP performs slightly better than PSF when the images are rotated only. However, PSF gives much higher classification accuracy when the images have both rotation and scale changes. Table 5.5 compares the average computational costs of PSF and LBP per image. The experiment was conducted on a desktop PC with a Core Duo 2.66 GHz CPU and 2 GB of memory. Since PSF involves iterative computation of the neural network, its computational cost is much higher than that of LBP.

² USC-SIPI Database: http://sipi.usc.edu/database/database.cgi?volume=textures

Table 5.4: Accuracies of PSF and LBP in texture classification (in percent)

        Rotation Only    Rotation and Scale
PSF     98.9             99.18
LBP     100              94.51

Table 5.5: Average computational costs of PSF and LBP (in seconds)

                      PSF      LBP
Computational cost    0.937    0.136

5.2.4 Evaluation of Features and Classifiers for Tree Species Classification

The second experiment evaluates the PSF feature in individual tree species classification using the collected multi-spectral imagery. It should be noted that classifying all species in power line corridors requires significantly more resources than are currently available. In this research, we focus on three dominant species in our test field. We abbreviate the species names to Euc-Ter, Euc-Mel and Cor-Tes. Through a field survey with a botanist’s participation, 121 trees were selected and labeled for the experiment: 64 Euc-Ter, 30 Euc-Mel and 27 Cor-Tes trees.

Since object-based classification is used in this study, individual tree crowns are first segmented from the image, local features are then extracted from the crown regions, and finally the classification is conducted in object-feature space. Since the main aim of this experiment is to evaluate the effectiveness of object feature descriptors, we assume that the segmentation is perfect, and individual tree crowns are therefore manually segmented from the images during the field survey. For comparison purposes,


Figure 5.11: Examples of texture images and their PSF features


some classic colour and texture feature descriptors were also evaluated. These include GLCM, Gabor filters, LBP, and colour histogram features extracted from the 4 spectral bands and from HSV colour space. It is also worth mentioning that plants have a distinctive spectral signature, which is often modeled by combinations of reflectance measured in two or more spectral bands. This motivates us to investigate whether spectral-texture features extracted from spectral vegetation indices could help in vegetation species classification. In the experiment, three widely used vegetation index maps are employed: the Normalized Difference Vegetation Index (NDVI), the Soil Adjusted Vegetation Index (SAVI), and the 2-band Enhanced Vegetation Index (EVI2). PSF histogram features are generated from both the original spectral bands and the three vegetation index maps.
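For reference, the three vegetation index maps can be computed from the red and near-infrared bands as in the NumPy sketch below. The formulas are the commonly published definitions of NDVI, SAVI (with soil adjustment factor L = 0.5) and EVI2. The thesis does not state which constants were used, so these are assumptions, as is the requirement that the inputs are reflectances with NIR + red greater than zero.

```python
import numpy as np


def vegetation_indices(nir, red, soil_factor=0.5):
    """Return (NDVI, SAVI, EVI2) maps from NIR and red reflectance arrays."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    diff = nir - red
    ndvi = diff / (nir + red)
    savi = (1.0 + soil_factor) * diff / (nir + red + soil_factor)
    evi2 = 2.5 * diff / (nir + 2.4 * red + 1.0)
    return ndvi, savi, evi2
```

Each returned map has the same shape as the input bands, so it can be treated as an additional image band when extracting PSF histogram features.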

Table 5.6 compares the overall classification accuracies of the PSF feature and three classic texture descriptors using different classifiers. The results clearly show that the choice of both feature descriptor and classifier strongly influences the classification accuracy. The PSF feature obtains the best overall classification accuracy with DTF and ties with LBP for the best with SVM, while with MLP it is within two percentage points of the best descriptor (Gabor). This supports its use as an effective feature descriptor for this data. We also evaluated the performance of

PSF features extracted from multiple spectral bands. Table 5.7 summarizes the overall classification accuracies of these features. Hist-RGBNIR and Hist-HSV refer to the colour histograms extracted from the four spectral bands (R, G, B and NIR) and from HSV colour space; similar names are used for the PSF features extracted from the four spectral bands and from HSV colour space; PSF-HSV-VI represents the PSF feature extracted from both HSV colour space and the three vegetation index maps. From the results, we can see that PSF features show significant improvement over colour histograms. While colour histograms characterize the colour distribution of the pattern, they do not exploit the spatial layout of the colours. It is also noted that PSF-HSV outperforms the PSF feature calculated from the original spectral bands. Another interesting


Table 5.6: Overall classification accuracies of PSF and texture features (in percent)

       GLCM     Gabor    LBP      PSF
MLP    69.42    71.9     66.94    70.25
DTF    56.2     71.07    71.07    77.69
SVM    69.42    69.42    77.69    77.69

Table 5.7: Overall classification accuracies of colour histogram and PSF features in multiple spectral bands (in percent)

       Hist-RGBNIR   Hist-HSV   PSF-RGBNIR   PSF-HSV   PSF-HSV-VI
MLP    71.9          75.21      75.21        80.17     85.12
DTF    71.9          78.51      80.17        80.99     78.51
SVM    76.03         69.42      79.34        81.82     85.95

result is that when we incorporate the spectral vegetation indices into the PSF-HSV feature, a clear improvement is achieved with the MLP and SVM classifiers, although not with DTF. Among the machine learning algorithms tested, SVM is generally found to be robust, obtaining good classification accuracy for most of the feature vectors.

Figure 5.12 presents the analysis of different feature descriptors in ROC space using an SVM classifier. As we can see, most features perform better for class ‘Cor-Tes’ than for the other two classes. PSF-HSV-VI performs the best for classes ‘Euc-Ter’ and ‘Cor-Tes’, and PSF-HSV performs the best for ‘Euc-Mel’. Overall, the PSF features outperform the other colour and texture descriptors for all three classes. We attribute the success of the PSF feature to its capability of capturing the local structure of the image and its property of rotation and scale invariance. These properties make PSF especially useful in object classification from aerial images, because the same object type has different shapes when viewed from different heights and directions. Moreover, PSF can easily be extended to represent spectral-texture patterns by integrating the PSF histograms extracted from the pulse images of multiple spectral bands.


Figure 5.12: Analysis of different feature descriptors in ROC space

5.2.5 Evaluation of Colour and Texture Feature Fusion

The third experiment evaluates the performance of the kernel PCA based colour and texture feature fusion scheme in object-based tree species classification. Two classic colour and texture features, colour histogram and LBP, are selected for the experiment because they are of high dimensionality. The overall classification accuracy of the fused colour-texture feature was compared with those of the single feature vectors and the serially integrated feature vector using the same classifier; SVM was used in the classification test. For comparison purposes, another widely used nonlinear feature selection technique, Generalized Discriminant Analysis (GDA), is also evaluated in the experiment. GDA, also known as Kernel Linear Discriminant Analysis (Kernel LDA), is the reformulation of LDA in the high dimensional space constructed using a kernel function (Baudat & Anouar, 2000). A Gaussian kernel function is used to construct GDA for the fusion of colour-texture features.
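The idea of kernel PCA based fusion can be sketched with NumPy as follows: the two feature sets are first concatenated (serial integration) and the result is projected onto the leading kernel principal components. This is a generic kernel PCA under stated assumptions, not the thesis implementation. The Gaussian kernel, the `gamma` parameter, the absence of feature normalisation and the function name `kernel_pca_fuse` are all choices made for illustration.

```python
import numpy as np


def kernel_pca_fuse(colour, texture, n_components, gamma):
    """Serially concatenate two feature sets, then reduce them with kernel PCA.

    colour, texture: arrays of shape (n_samples, d1) and (n_samples, d2).
    Returns the (n_samples, n_components) projection of the training samples.
    """
    x = np.hstack([colour, texture])                  # serial integration
    sq = np.sum(x ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    k = np.exp(-gamma * dist2)                        # Gaussian kernel matrix

    # Centre the kernel matrix in feature space.
    n = k.shape[0]
    ones = np.full((n, n), 1.0 / n)
    k_centred = k - ones @ k - k @ ones + ones @ k @ ones

    eigvals, eigvecs = np.linalg.eigh(k_centred)      # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]
    eigvals = np.maximum(eigvals[top], 0.0)
    # Projection of sample i on component j is sqrt(lambda_j) * v_j[i].
    return eigvecs[:, top] * np.sqrt(eigvals)
```

The fused vectors returned by this function would then be passed to the classifier in place of the much longer concatenated vector.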


Figure 5.13: Classification accuracies of the fused features at different dimensions

From the experiment, the overall classification accuracies of the colour histogram and LBP texture features are 76.03% and 71.07% respectively. The serial integration of these two features performs better than either single feature, with an overall accuracy of 83.47%. To evaluate the performance of the feature fused using kernel PCA, we use a step-by-step model justification method (Song & Tao, 2010). We vary the dimensionality of the fused feature vectors from 2 to 8 in steps of 2, and from 10 to 100 in steps of 10. Figure 5.13 shows the classification accuracy curve at different dimensions. As we can see from the figure, the kernel PCA fused feature performs much better than the single features and the serially integrated feature. However, this still relies on the assumption that the user can specify a good target dimensionality. Maximum likelihood estimation (MLE) of the intrinsic dimensionality is therefore employed to select the number of dimensions automatically. In our experiment, the intrinsic dimensionality of the integrated LBP and colour histogram feature is 39.3016, which agrees with the result in Figure 5.13, where the best accuracy (95.04%) is obtained at dimension 40.
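The dimensions tried in the sweep, and the step from an intrinsic dimensionality estimate to a usable target dimension, can be written down in a few lines of Python. Rounding the estimate up is an inference from the reported values (39.3016 leading to 40 here, and 34.6609 leading to 35 for the PSF-HSV and LBP fusion) rather than a rule stated in the text, and the helper name `target_dimension` is hypothetical.

```python
import math

# Target dimensions tried for the fused feature:
# 2 to 8 in steps of 2, then 10 to 100 in steps of 10.
candidate_dims = list(range(2, 9, 2)) + list(range(10, 101, 10))


def target_dimension(intrinsic_dim):
    """Round an intrinsic-dimensionality estimate up to a whole dimension."""
    return math.ceil(intrinsic_dim)


print(candidate_dims)
print(target_dimension(39.3016))
```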

In the experiment, the computational costs of the classifiers using different feature vectors are also compared. The analysis time was recorded on a desktop PC with a Core Duo 2.66 GHz CPU and 2 GB of memory. Table 5.8 summarizes the overall accuracies and analysis times for the different feature sets. The dimension of the kernel PCA and GDA fused features is 40, which is derived from the estimate of the intrinsic dimensionality. From the results, we can see that the analysis time varies considerably between feature sets. The high dimensionality of the original colour and texture features and of the serially fused feature causes high computational costs, while a nonlinear fusion method such as kernel PCA or GDA not only improves the classification accuracy but also significantly reduces the dimensionality and the computational cost. Table 5.9 shows the confusion matrix of the classification using the KPCA-40 feature.

Table 5.8: The classification results of single and fused colour and texture features

                    Hist-RGBNIR   LBP        Serial_Fusion   KPCA-40   GDA-40
Overall Accuracy    76.03%        71.07%     83.47%          95.04%    91.74%
Analysis Time       79.46 s       246.29 s   305.83 s        13.63 s   4.63 s

Table 5.9: The confusion matrix of SVM classification using the fused feature

Predicted    Actual Category
Category     Euc-Ter   Euc-Mel   Cor-Tes
Euc-Ter      60        1         0
Euc-Mel      4         28        0
Cor-Tes      0         0         27

From the experimental results, it is clear that the fusion of colour and texture features provides improved discriminative power over using them independently. Moreover, the proposed nonlinear feature fusion strategy using kernel PCA shows great improvement over the serial fusion strategy, not only in reducing the dimensionality and computational cost, but also in removing noisy information and improving the discriminative power. The proposed feature fusion strategy can be extended to combine any other feature vectors that are considered to contain complementary information. As an example, the PSF-HSV and LBP features were also tested


Table 5.10: The classification results of single and fused PSF-HSV and LBP features

                    PSF-HSV   LBP        Serial_Fusion   KPCA-35
Overall Accuracy    81.82%    71.07%     82.64%          90.91%
Analysis Time       18.80 s   246.29 s   263.37 s        11.56 s

using the same fusion strategy. The PSF-HSV and LBP features are serially integrated as a union vector, and the intrinsic dimensionality of the serially fused feature is then estimated using the MLE method. According to the experiment, the intrinsic dimensionality of the fused feature is 34.6609. Therefore, the first 35 eigenvectors with the largest eigenvalues were selected to represent the serially fused feature. Table 5.10 shows the classification results of single and fused PSF-HSV and LBP features.

The fused PSF-HSV and LBP feature reached an overall accuracy of 90.91% (Table 5.10), which

showed significant improvement over the single and serially fused features. However, it is also noted that higher accuracy of the single features does not necessarily lead to higher accuracy for the fused feature. For example, PSF-HSV showed better performance than Hist-RGBNIR, but its fusion with LBP did not obtain a better result than the fused Hist-RGBNIR and LBP features.

5.3 Summary

This chapter presents the experiments and results for evaluating the developed algo-

rithms in object detection and segmentation, feature extraction and image classifica-

tion. The findings through the experiments include:

(1) The developed pulse coupled neural filter has been successfully applied as a

pre-processing method for initial detection of power lines prior to the Hough transform

being employed. Knowledge-based line clustering in Hough space further improved

the detection accuracy. However, it is observed that difficulties occur in power line detection from aerial imagery because of the low quality of the collected imagery (e.g. illumination changes, motion blur and low spatial resolution).


(2) By using a PCNN in spectral feature space, followed by post-processing using

a watershed algorithm, the developed tree crown segmentation algorithm achieved a

detection rate of 97.27% and a segmentation accuracy of 84.7% from the collected

multi-spectral imagery. The major problems are the under-segmentation of tree clusters and the difficulty of discriminating trees from grass and other low vegetation.

(3) Fusion of multi-spectral imagery and LiDAR data has great potential to achieve

better object detection results. Object points were successfully detected by using a

statistical-based ground filtering algorithm on LiDAR intensity data. Region-level

fusion of the initial vegetation segmentation map and LiDAR object points makes it easier

to filter low vegetation regions. However, a critical limitation of this fusion process

is that it depends on a high point density in the LiDAR data.

(4) The developed PSF feature is invariant to rotation and scale changes. The

experimental results on the USC-SIPI texture database demonstrated its strengths and weaknesses

compared to LBP features. The experiments in vegetation species classification fur-

ther compared the discriminative power of PSF and some classic colour and texture

descriptors. Overall, the PSF-HSV-VI feature obtained the best classification accuracy

and SVM generally performed the best among the three tested classifiers.

(5) Colour and texture features contain complementary information and thus mo-

tivate the fusion of feature descriptors. The experimental results demonstrated the

effectiveness of the kernel PCA based feature fusion method. Intrinsic dimensionality

also played an important role in determining the optimal dimension of the fused

features. The fused feature showed significantly higher classification accuracy than

either feature used independently or the serially fused feature.
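
Finding (5) can be made concrete with a small sketch of feature fusion by kernel PCA. It is an illustration under stated assumptions rather than the method evaluated above: the two descriptors are standardised and concatenated, a Gaussian (RBF) kernel is used, and `n_components` is left as a free parameter where the thesis relies on an intrinsic dimensionality estimate. The function names and `gamma` are hypothetical.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kernel_pca_fuse(colour, texture, n_components, gamma=1.0):
    """Fuse two feature sets (one row per sample) with kernel PCA.

    Assumption: descriptors are standardised and concatenated before
    the kernel is applied. Returns an (n_samples, n_components) array.
    """
    def standardise(F):
        return (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-12)

    X = np.hstack([standardise(colour), standardise(texture)])
    n = X.shape[0]
    K = rbf_kernel(X, gamma)

    # Centre the kernel matrix in the implicit feature space.
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]

    # Projection of each training sample onto the leading components.
    return vecs * np.sqrt(np.maximum(vals, 0.0))
```

The fused vectors returned by `kernel_pca_fuse` could then be passed to any classifier, such as an SVM. Choosing `n_components` and `gamma` would normally be done by cross-validation or by a dimensionality estimate.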

Chapter 6

Conclusion and Future Work

This thesis comprehensively investigated the use of aerial remote sensing and computer

vision techniques for power line corridor monitoring applications. On the theoretical side,

a biologically inspired spiking cortical model, the pulse coupled neural network

(PCNN), was studied in depth and successfully applied to this aerial image

analysis application. Some novel algorithms for object detection and feature fusion

were also developed. The concepts demonstrated in this thesis and the knowledge gained

from this research project offer a useful reference for our industry partner and other

energy utilities that want to improve their vegetation management activities. This

chapter summarizes the findings and makes recommendations for possible further

research.

6.1 Summary of Findings and Contributions

The major findings and contributions of this thesis are summarized as follows:

• Vegetation management using aerial remote sensing techniques

This thesis presented a comprehensive study of vegetation management approaches in

power line corridor monitoring based on aerial remote sensing techniques. The results

from a series of experiments demonstrated the potential of moving from a traditional

vegetation management strategy to a more automated, accurate and cost-effective solution

using aerial remote sensing techniques. Regarding aerial platforms, unmanned

aerial systems (UASs) are expected to be a future solution, but their major limitation

is their currently limited ability to carry power-demanding and heavy payloads.

Regarding sensor options, the combination of a high resolution multi-spectral camera

and a LiDAR sensor is recommended for data collection because the spectral and

3D geometry information they provide is complementary.

• Simplification of biologically-inspired spiking cortical models and application to

aerial image analysis

As a biologically inspired spiking cortical model, the pulse coupled neural network

(PCNN) emulates processing in the visual cortex and is recognized as a powerful tool

for many image processing tasks. The major advantages of PCNNs are that they are

self-organized spatial-temporal-coding models which mimic real neurons more closely and,

because they use time, offer greater computational power than traditional neural network

models. In order to better analyze the pulse dynamics and control the

performance, the original PCNN model was simplified in this thesis. The simplified

model was applied to tree crown segmentation from multi-spectral imagery and also

used to design the pulse coupled neural filter as a pre-processing tool for power line

detection from aerial imagery. The idea of using the pulse spectral frequency of a multi-spectral

unit-linking PCNN for spectral-texture feature extraction proved successful in

a series of experiments on texture classification and tree species classification.

• Multi-source information fusion

Multi-source information fusion plays an important role in increasing the reliability of

information extraction for robust operational performance and decision making in

power line corridor monitoring (e.g. improved classification, increased confidence and

reduced ambiguity). In this thesis, the fusion of multi-sensor data and of multiple feature

vectors was investigated. The integration of LiDAR data and multi-spectral

imagery was suggested for data collection due to the complementary information

contained in the two types of sensor data. This data fusion showed a clear advantage

in improving object detection results, although it depends on the point density of the LiDAR

data. The collective use of colour (spectral) and texture information has strong

links with human perception. The feature fusion method developed in this thesis

effectively combined colour and texture into a unified descriptor, and

the experimental results demonstrated improved discriminative

power over using colour and texture features independently.
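
The simplified PCNN mentioned in the second contribution above can be illustrated with a minimal sketch. It assumes a commonly used simplified form (feeding input equal to the pixel intensity, 8-neighbour linking with unit weights, and an exponentially decaying dynamic threshold), which may differ from the simplification developed in this thesis. The function names and the parameters `beta`, `alpha` and `V` are chosen here for illustration only.

```python
import numpy as np


def neighbour_sum(Y):
    """Sum of the 8 neighbours of each pixel (zero padding at the border)."""
    P = np.pad(Y, 1)
    h, w = Y.shape
    total = np.zeros((h, w), dtype=float)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            if dy == 1 and dx == 1:
                continue  # skip the centre pixel itself
            total += P[dy:dy + h, dx:dx + w]
    return total


def simplified_pcnn(S, n_iter=10, beta=0.2, alpha=0.3, V=20.0):
    """Run a simplified PCNN on image S (intensities in [0, 1]).

    Returns an array of length n_iter holding the number of neurons
    that fired at each iteration (a pulse time signature).
    """
    S = np.asarray(S, dtype=float)
    Y = np.zeros_like(S)       # pulse output of the previous iteration
    theta = np.ones_like(S)    # dynamic threshold
    signature = np.zeros(n_iter)
    for n in range(n_iter):
        L = neighbour_sum(Y)                    # linking input
        U = S * (1.0 + beta * L)                # internal activity
        Y = (U > theta).astype(float)           # fire where U exceeds threshold
        theta = np.exp(-alpha) * theta + V * Y  # decay, then boost fired neurons
        signature[n] = Y.sum()
    return signature
```

Because the linking neighbourhood is symmetric, the signature is unchanged when the input image is rotated by 90 degrees, which hints at why pulse-based signatures are attractive as rotation-tolerant texture features.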

6.2 Future Work

The efforts and achievements in this thesis offer important knowledge in using bi-

ologically inspired image processing to assist vegetation management in power line

corridors. Possibilities for further improving the developed methods were also identi-

fied and recommended as future research directions:

• Real-time data processing

Currently, processing for object recognition has been designed for off-line use. Data

is collected, stored in a repository and analyzed at some later time. However, some

real-time processing would be beneficial as there is information available to assist in

the decisions made by the navigation and data storage systems. For example, the real-

time identification of regions outside the immediate vicinity of the power-line where

vegetation is sparse could be used to reduce the resolution of data stored. Questions

concerning the algorithms and computing architectures that are best suited to increase

the autonomy of a UAS capturing significant amounts of data need to be addressed. It

should be noted that the PCNN has only local connections, which makes it well suited

to electronic implementation. Therefore, a hardware implementation (e.g. on an

FPGA) of the algorithms developed in this thesis is a possible direction for

increasing the autonomy of the system.

• 3D feature extraction from LiDAR data

LiDAR is a relatively young 3D measurement technique offering much potential in

the acquisition of precise 3D geodata and object geometries, and it does not rely on

illumination conditions. Therefore, LiDAR is highly recommended for data collection

in power line corridor monitoring. Meanwhile, advanced and intelligent algorithms

need to be developed for accurate 3D feature extraction from LiDAR data. For

example, LiDAR data provides 3D information and makes it possible to detect multi-layer

structured power lines and model the 3D catenary curve. Moreover, the 3D tree

structure parameters may also be helpful for identifying the species.
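
For reference, the catenary mentioned above is commonly written in the following form for a conductor hanging between two supports. The symbols are introduced here for illustration and are not taken from this thesis: $x$ is the horizontal distance along the span, $(x_0, z_0)$ is the lowest point of the conductor, $H$ is the horizontal tension and $w$ is the conductor weight per unit length.

```latex
z(x) = z_0 + a\left[\cosh\!\left(\frac{x - x_0}{a}\right) - 1\right],
\qquad a = \frac{H}{w}
```

Fitting the parameters $a$, $x_0$ and $z_0$ to LiDAR points classified as conductor returns would be one possible way to model each span.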
