Decomposing Waveform Lidar for Individual Tree Species
Recognition
Nicholas Vaughn
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
University of Washington
2011
Program Authorized to Offer Degree: School of Forest Resources
University of Washington
Graduate School
This is to certify that I have examined this copy of a doctoral dissertation by
Nicholas Vaughn
and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final
examining committee have been made.
Chair of the Supervisory Committee:
Eric C. Turnblom
Reading Committee:
L. Monika Moskal
David G. Briggs
Eric C. Turnblom
Date:
In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with “fair use” as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of this dissertation may be referred to Proquest Information and Learning, 300 North Zeeb Road, Ann Arbor, MI 48106-1346, 1-800-521-0600, to whom the author has granted “the right to reproduce and sell (a) copies of the manuscript in microform and/or (b) printed copies of the manuscript made from microform.”
Signature
Date
University of Washington
Abstract
Decomposing Waveform Lidar for Individual Tree Species Recognition
Nicholas Vaughn
Chair of the Supervisory Committee:
Associate Professor Eric C. Turnblom
School of Forest Resources
The potential of waveform Lidar is investigated in a series of three articles. In the first,
a new approach is found to capture patterns within waveforms using an old technique:
the Fourier transform. The mean spectral pattern between waveforms hitting an
individual tree is found to aid in discriminating species. Using the full dataset, an
overall accuracy of 75 percent is achieved using a classification tree approach for 44
sample trees of 3 hardwood species native to the Pacific Northwest of the United
States. Important wavelengths within the waveforms include 1.5, 0.75, and 0.35
meters.
In a second article, the ability of the above technique to classify species using
datasets of lower densities is analyzed. From the original dataset with approximately
10 waveforms/pulses crossing a square meter at ground level (equivalent to a first
and last return discrete point dataset of about 20 points per square meter), reduced
datasets were created at 80, 60, 40 and 20 percent of the original density. The
classification was then performed at each density level. Reducing the density to
80 percent actually increased the overall accuracy to 82 percent, while subsequent
reductions reduced the accuracy to 61, 54, and 66 percent respectively for the 60, 40
and 20 percent reduced datasets.
A third article compares a combination of several variables obtained from a discrete
point Lidar dataset before and after the addition of variables obtained from waveform
Lidar data. The addition of waveform information aided in the classification of five
species, as well as in the classification of several two-species subsets. Performance
of small groups of similar discrete point Lidar-derived variables varied much more
between different species combinations, but when grouped they performed very well
in all combinations. These results provide some suggestive evidence that fine-scale
waveform Lidar information is important to classification of at least some tree species.
6.2 Default parameter values of the segmentation algorithm . . . . . . . . 93
6.3 Summary statistics of the predictor variables in group a . . . . . . . . 101
6.4 Summary statistics of the predictor variables in group b . . . . . . . . 102
6.5 Summary statistics of the predictor variables in group c . . . . . . . . 103
6.6 Summary statistics of the predictor variables in group d . . . . . . . . 103
6.7 Summary statistics of the predictor variables in group e . . . . . . . . 104
6.8 Summary statistics of the predictor variables in group f . . . . . . . . 104
6.9 Summary statistics of the predictor variables in group h . . . . . . . . 105
6.10 Summary statistics of the predictor variables in group i . . . . . . . . 106
6.11 Summary statistics of the predictor variables in group j . . . . . . . . 107
GLOSSARY
ATM: Airborne Thematic Mapper, an aerial multispectral sensor
CASI(-2): Compact Airborne Spectrographic Imager, an aerial multispectral sensor
CIR: Color-InfraRed, photographic film with layers which reproduce infrared as red, red as green, and green as blue
CHM/CSM: Canopy Height (or Surface) Model, which is a DSM of canopy surface height above ground
DSM/DEM: Digital Surface (or Elevation) Model, a two-dimensional raster model of surface elevation
DTM: Digital Terrain Model, which describes a DSM of the bare-earth elevation above a specified geoid
GPS: Global Positioning System, a system of satellites with known position used to triangulate a two- or three-dimensional position of an object in near real-time
HSI/HSV/HSL: color models, often used in computer graphics, which provide alternatives to the red, green, blue (RGB) color model that attempt to more accurately match how the human eye perceives color
HYPERSPATIAL: geospatial data with a high spatial resolution, i.e. minimal pixel width and height in the case of raster data
HYPERSPECTRAL: geospatial data with a large number of bands covering a large range of the electromagnetic spectrum
HYPERTEMPORAL: geospatial data with two to several repeated measurements across a range of time
IKONOS: A commercial satellite system offering multispectral imagery
IMU: Inertial Measurement Unit, an instrument using accelerometers and gyroscopes to measure angular rotation and speed of change in angular rotation
LDA: Linear Discriminant Analysis, a parametric classification rule assuming a multivariate normal distribution for the predictor variables
LIDAR/LiDAR/Lidar: Light Detection And Ranging
MEIS: Multispectral Electro-optical Imaging Sensor, an aerial multispectral sensor
ML: Maximum Likelihood, a parametric classification rule
MULTISPECTRAL: geospatial data with a limited number of bands covering a range of the electromagnetic spectrum, typically referring to data with more than just red, green and blue components
PCA: Principal Component Analysis, a data dimension reduction technique
RASTER: a data set consisting of an indexed array (usually two-dimensional) of cells, each cell containing a single value representing the variable of interest for the indexed area in space
SAM: Spectral Angle Mapper, a non-parametric classification rule
SVM: Support Vector Machine, a non-parametric classification rule
ACKNOWLEDGMENTS
I would like to thank those without whom completion of this dissertation would
not have been possible. Financial support was provided for several quarters by both
the Precision Forestry Cooperative and the Corkery family. The waveform Lidar dataset
was provided by Terrapoint, USA.
Dave Ward and Craig Glennie at Terrapoint provided much technical assistance
when transforming the raw waveform and position data into map coordinates. Bob
McGaughey and Steve Reutebuch at the Pacific Northwest Research Station also
provided their expertise in this area.
My committee has provided endless feedback and motivational support along the
way to completion of this work and I have learned much from working with and asking
questions of each member. Without this support, I would not have even known where
to begin this endeavor.
Sooyoung Kim collected the original data used in part of this work and was gra-
cious enough to provide it for my work. Andrew Hill was very helpful in the remea-
surement and validation of this data. He continued to help even after I dragged a
sample of canine digestive output into his vehicle on my boot.
Other students in the RSGAL at University of Washington have helped in different
ways. Jeff Richardson answered my many questions about specific tools. I gained from
the experience of my comrades in many aspects of remote sensing work.
Friends and family provided moral support along with occasional babysitting,
which was very important to both my sanity and the completion of my work. Maria
Petrova provided much appreciated, last-minute babysitting so that I could
give an important presentation. Both grandma Judys were very instrumental in
getting me time to work during their visits.
My wife, Jilleen, made sure to work hard watching the boys after long, hard days
at work in order to get me essential work time. Her constant support was the biggest
reason I could even make it to the finish line (physically and mentally). My boys, Reid
and Zack, reminded me to enjoy my time with them while I have it. They provided
much entertainment and a great way to get my mind off of work for a while.
DEDICATION
To my amazing wife, Jilleen, and our two boys for
providing me with unending support and inspiration
Chapter 1
INTRODUCTION AND LAYOUT
1.1 Introduction
1.1.1 Species Recognition
As in many applications of modern technology to long-standing occupations, there is
great potential to increase the speed and quality of forest inventory, as well as to
reduce its costs. The entire field of precision forestry developed very rapidly around
this idea. Such outcomes are much needed in modern forestry, where providing a
steady flow of merchantable timber is no longer the only objective of a manager.
steady flow of merchantable timber is no longer the only objective of a manager. The
number of competing objectives is continually increasing, especially on public forests,
and modern tools can provide the greater detail necessary to make optimal decisions
that balance these objectives.
There are several situations where remote sensing of forest lands can be of great
use. In forest inventory, for instance, the greater detail and coverage available from
remote sensing products can improve efficiency of estimates of almost any quantity of
interest. This can be as simple as stratifying an area of interest into closely related
compartments based on aerial photography of the area. Such stratification has been
used for many years to reduce overall variance in the estimates.
Modern remote sensing products are now of sufficient resolution and accuracy to
push such stratification to a new extreme. From modern products, highly accurate
positional information is known for any given subset of the data. Additionally, the
spatial resolution available is incredibly high compared to just a decade ago. Enabled
by these two important properties, an approach wherein statistics are compiled for
individual tree crowns should offer even greater efficiency. Most stand-level statistics
estimated from field data have always been derived from at least some individual-tree
measurements because this detail was available. Until recently, the same has not been
possible for estimates based on remote sensing data.
While not yet readily obtained, an accurate estimate of the species of individual
trees represented in remotely sensed data alone would provide even further benefits in
many situations. Timber value is determined by species, and the models predicting
tree size from remotely sensed data differ greatly by species. Knowledge of species
could improve accuracy and precision by incorporating these differences. Models not
directly measuring tree species from the remote sensing data are augmented by models
based on the relative abundance of the individual species within the field data. With
great enough accuracy, direct prediction of species for each crown should allow for
better estimates of the other variables of interest in an inventory.
If timber values are not of interest, several additional applications can easily be
found where species information would be tremendously useful. One could more easily
predict habitat quality for plants and animals associated with particular stand types.
Risk assessment for fire or beetle outbreaks would benefit from species detection as
mapping high-risk areas could quickly be done. Additionally, changes in site condi-
tions often result in changes in species composition, and discovery of such changes
could be a rather straightforward process.
Researchers have been working on automatic individual tree species detection for
more than twenty years. In publications from this time period, improvements can be
observed as data sources and algorithms have improved. Several types of data have
been incorporated to achieve accurate species detection. These types include high
spatial resolution imagery, as well as still-improving hyperspectral data. Alongside
imagery, however, a newer form of three-dimensional remote sensing data has changed
the game significantly.
1.1.2 Lidar
Lidar comes from the phrase “Light Detection and Ranging”, which is the formal name
of a modern remote sensing system. This system takes advantage of the constant
speed of light emitted from a highly directional beam to record the distance to
targets within the beam’s path. A Lidar system typically emits a very large number
of pulses while rapidly changing the direction of the light beam. When the location
of the mobile instrument, as well as the orientation of the light beam are accurately
known, the system can create a large array of points representing surfaces of high
specular reflectivity along the known paths from all pulses of the directional beam.
This is called discrete point Lidar data, though it is often just referred to as Lidar.
Discrete point Lidar is relatively common today in many fields; however, it has
only recently become popular in some areas for forest inventory. Airborne Lidar
data is gathered in the following manner:
• An aircraft carrying the sensor is flown in overlapping strips over the area of
interest. The widths of these strips are determined by the range of angles taken
by the light beam and the height of the craft.
• The sensor is paired with a differential Global Positioning System (GPS) unit
and an Inertial Measurement Unit (IMU), the combination of which enables
precise measurement of aircraft position and orientation at a given time.
• The sensor contains a directional light source with very low divergence, and the
angle of this beam can be precisely measured at a given time.
• The light pulses at a very high rate, currently up to several hundred thousand
times a second.
• For each pulse the sensor measures the intensity of light reflected back from the
target (ground or vegetation) and marks the time at which peaks are found in
this return signal.
• The time of the pulse and the exact travel time of the light to the given peak
can be used to precisely locate the point in space that deflected the light pulse
back to the sensor (called a return).
• The collection of these returns, each containing an X, Y, and Z component and
commonly a peak returned intensity component, is cleaned and subsequently
provided to the client as a final dataset.
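The ranging step above can be sketched in a few lines of Python. This is a hypothetical illustration, not code from any particular sensor: the function name, coordinates, and timing are invented, and the sketch ignores atmospheric refraction and real-world calibration.

```python
# Hypothetical sketch of locating a Lidar return from pulse geometry.
C = 299_792_458.0  # speed of light (m/s)

def locate_return(sensor_xyz, beam_dir, travel_time_s):
    """Place a return in space given the sensor position, the unit
    beam direction vector, and the round-trip travel time of light."""
    one_way_range = C * travel_time_s / 2.0  # light travels out and back
    return tuple(p + one_way_range * d for p, d in zip(sensor_xyz, beam_dir))

# A pulse fired straight down from 1000 m altitude with a round-trip
# travel time of roughly 6.67 microseconds places the return at ground level.
point = locate_return((0.0, 0.0, 1000.0), (0.0, 0.0, -1.0), 2 * 1000.0 / C)
```

The division by two reflects that the measured travel time covers the path to the target and back.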
The three-dimensional point cloud received by the client is typically used to create
a digital surface/elevation model (DSM/DEM) of ground elevation above a specified
geoid. Those in the forestry sector then will usually create a digital surface model of
the canopy surface (CSM) above the ground elevation, often called a canopy height
model (CHM). The properties of this CHM allow tree heights and crown volumes to
be estimated, both of which can be linked to tree stem diameter and volume.
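The CHM construction described here amounts to a cell-by-cell difference between the canopy surface and the bare-earth surface. A minimal sketch follows, with invented elevations and plain Python lists standing in for a real raster format:

```python
# Illustrative only: CHM = canopy surface elevation minus bare-earth elevation.
def canopy_height_model(dsm, dtm):
    """Difference two aligned elevation grids, cell by cell."""
    return [[s - t for s, t in zip(srow, trow)]
            for srow, trow in zip(dsm, dtm)]

dsm = [[110.0, 132.5], [108.0, 121.0]]  # canopy surface elevations (m)
dtm = [[100.0, 101.5], [100.5, 101.0]]  # bare-earth elevations (m)
chm = canopy_height_model(dsm, dtm)     # tree heights above ground (m)
```

In practice the two grids must share the same extent and cell size before differencing.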
Some authors have used these data to identify species of individual trees. The
arrangement of the points from each tree can give many descriptive statistics that
indicate the shape of the crown. Additionally, the intensity information of the points
can be summarized in a form that is affected by tree species. There has been some
great success for certain species using this method, but there will always be room for
improvement, especially if the number of species increases.
The use of Lidar information alone for species detection, without incorporating
multispectral imagery for instance, would be ideal. If this can be realized, then one
dataset could be used for all aspects of a forest inventory with outstanding precision.
To aid in this process, a newer format of Lidar now exists, called waveform (or
fullwave) Lidar, that contains additional information which may be able to improve
the classification accuracy of individual trees.
1.1.3 Waveform Lidar
Waveform Lidar is recorded using most of the same steps and mostly the same instru-
mentation as discrete point Lidar. The difference is in how the return signal of light
reflecting off the ground and vegetation surface is recorded. Rather than detecting
peaks in this signal, the sensor rapidly samples the signal at a set rate. In this way,
the return signal is mapped in the same way that sound is digitally recorded.
This difference is an important one as the shape of the entire return signal is
known rather than just the points where peaks occur. Some work has shown that
the shape of a peak is related to surface properties of the target. A few researchers
have broken down the waveforms into series of peaks, for which individual peak shape
information is retained. Smaller peaks that would be skipped by the on-board peak
detector in discrete point systems can now be discovered. At minimum, this results
in a discrete point dataset of very high density. Theoretically, the extra information
contained in the peak shape information is useful for target discrimination.
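The sampling and peak-recovery idea can be illustrated with a toy digitized waveform. The two-echo signal below is invented (a strong canopy echo plus a weaker understory echo), and the simple maximum scan stands in for the Gaussian decomposition that published methods actually fit:

```python
import math

# Toy return waveform: two Gaussian echoes sampled at a fixed rate,
# much as sound is digitally recorded. Amplitudes and timings invented.
def gaussian(t, amp, center, width):
    return amp * math.exp(-((t - center) ** 2) / (2.0 * width ** 2))

samples = [gaussian(t, 100.0, 20.0, 3.0) + gaussian(t, 15.0, 45.0, 3.0)
           for t in range(64)]

def local_maxima(signal, floor):
    """Sample indices where the signal peaks above a noise floor."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > floor
            and signal[i - 1] < signal[i] >= signal[i + 1]]

peaks = local_maxima(samples, floor=5.0)  # recovers both echoes
```

A weak second echo like this one might fall below the trigger threshold of an on-board discrete peak detector, but survives in the sampled waveform.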
1.1.4 Objectives
The objective of this dissertation work was to investigate whether or not waveform
Lidar contains additional information over that contained in discrete point Lidar that
can be used to improve individual tree species recognition.
1.2 Layout of Thesis
This dissertation is a bound collection of four individual articles, appearing as chapters
2, 3, 4, and 5. Chapter 2 contains the first article, which is an extensive literature
review. This review was intended to be a stand-alone work with the potential to be
published as a separate work. Chapter 6 contains a more detailed description of many
of the methods and materials used in the three technical articles that follow. The
intent of this overview is to provide the missing detail that often must unfortunately
be trimmed from articles to meet publication constraints.
The next three chapters contain the technical articles that are the main focus
of this thesis. Chapter 3 is an article published in the December 2010 issue of Remote
Sensing Letters. It is a brief article describing a preliminary test of a new
technique for analyzing waveform Lidar information. Chapter 4 contains an article
that extends the results of this preliminary test. It was presented at (and appears in
the proceedings of) the 2010 Silvilaser conference in Freiburg, Baden-Württemberg,
Germany. Chapter 5 is an article describing a more thorough examination of the
potential of waveform Lidar over discrete data alone for species classification.
Finally, chapter 7 is a general discussion of the entire dissertation work presented
in previous chapters. The discussion sections of the other chapters are much more
specific. The purpose of the general discussion is to emphasize important points
and provide a broader overview of the relevance of this dissertation when presented
alongside the contributions of other authors in the field. This layout was designed
to provide a sufficient amount of continuity among the individual articles presented
to aid the reader with comprehension. At the same time, the purpose was to not
excessively repeat information, which may in fact distract the reader.
Chapter 2
PROGRESS TOWARDS THE ACCURATE
IDENTIFICATION OF INDIVIDUAL TREE SPECIES
WITH REMOTE SENSING DATA
2.1 Background
The field work required to achieve precise estimates of forest stand characteristics can
become very expensive and time-consuming. Typically, the largest share of the cost of
a field survey is associated with placing a measurement crew at the site for the number
of hours needed for the desired level of precision. Improvements in sampling designs
have reduced these costs significantly, but we constantly seek out tools that can offer
even greater reduction in field time requirements or improvements in precision.
Modern remote sensing tools offer the additional information needed to greatly
reduce survey costs by reducing the number of plots necessary (Aldred and Hall 1975).
While field costs never decrease, costs of remote sensing products have generally
decreased for a given level of resolution over the last few decades. This results in
ever-increasing motivation to incorporate remotely sensed information into sampling
designs. For a given level of precision, it is becoming more and more likely that
significant cost savings can be realized.
The improvement in efficiency provided by remote sensing tools will be most no-
ticeable for larger areas of interest, and the cost savings increase as coverage area
increases. For example, Mumby et al. (1999) found that satellite sensors outperform
the traditional approach to coastal habitat mapping. Kleinn (2002) and Tomppo and
Czaplewski (2002) show that for national and global-scale surveys, respectively, de-
signs incorporating remote sensing technology could reduce total inventory cost. The
number of field plots required to use a model-based sampling scheme increases more
directly with the number of strata and heterogeneity within strata than with the total
area of interest. Additionally, in some cases, it is very affordable to obtain satellite
data for a very large coverage area.
Several forest inventory variables are now readily obtained from remote sensing
products. At the stand level, low-cost satellite imagery is often used for stand de-
lineation and classification, as well as for measuring size and productivity of stands
(Magnusson 2006, Boyd and Danson 2005). For example, stand volume or biomass has
been linked with several variables easily obtained from multispectral satellite imagery
(Foody et al. 2003, Hall et al. 2002, Roy et al. 1991, Sader et al. 1989). Additionally,
species composition for individual stands has often been estimated from multispectral
data (Heikkinen et al. 2010, Kamagata et al. 2005, Franklin et al. 2000, Treitz and
Howarth 2000, Martin et al. 1998, Gong et al. 1997, Walsh 1980, Beaubien 1979).
However, multi-temporal data (Xiao et al. 2002, Mickelson et al. 1998, Wolter et al.
1995, Everitt and Judd 1989) and hyperspatial data (Ke et al. 2010, Wulder et al.
2004, Treitz and Howarth 2000) have been used for this task as well. Some authors
have even classified stands with Lidar data alone (Donoghue et al. 2007, Schreier et al.
1985), or in combination with imagery (Ke et al. 2010).
With the improvement in data resolution and the increased availability of new
tools, some have begun to focus on individual trees rather than the whole stand as
the classification unit (Lim et al. 2003, Gougeon and Leckie 2003). The immediate
advantage of such an approach is more detail and a potential for higher precision
(Culvenor 2003). Both of these are due to a smaller size and drastically increased
number of the sampling unit. Many advances have been made in this area of remote
sensing, both in data availability and analysis techniques. Spatial resolution has
increased substantially, as well as spectral resolution. Additionally, the computer
speed and storage available in a modern desktop computer allow for execution of very
complex algorithms in little time.
Despite the progress in individual tree analysis, one variable remains very diffi-
cult to obtain: species. Species information about individual trees within a forest
(or knowledge of species mixture of a stand) can be of great use when decisions re-
garding forest health or estimations of resource values must be made. Whether forest
resource value is defined monetarily or otherwise, it usually depends highly on the
species present. Many variables one can measure from a forest are even more de-
scriptive when species information is available. As a result there is much potential
to increase precision in the desired estimates. Without direct species detection,
species information must either be extrapolated from the field data component of a
survey or disregarded altogether.
In the recent past many strides have been made in the ability to remotely detect
stand species composition and individual tree species. Slowly, as instruments improve
and algorithms evolve, the potential for accurate prediction of individual tree species
grows. The objective of this paper is to review and discuss the published results
obtained so far in the extraction of individual tree species information from remote
sensing products.
2.2 Individual Tree Segmentation
The first step in identifying the species of individual trees is to find which subsets of
the data represent each tree. Completing this step more accurately leads to an increase
in potential for accurate species identification. With aerial photos, a trained human
interpreter could easily differentiate one tree from another. However, with larger
coverage data sets and lower resolution data, the job of individual tree segmentation
is given to a computer algorithm. The clusters resulting from this segmentation are
usually considered as individual trees, but may in fact represent several trees or only
one part of a tree.
Due to the ability of dominant crowns to obscure sub-canopy information and the
two-dimensional nature of aerial and satellite raster data, individual tree segmenta-
tion is inherently difficult using such data. Despite the challenges involved, several
methods have been introduced in the literature. Culvenor (2003) provides a brief
overview of these techniques. Most algorithms, starting with an algorithm proposed
in Gougeon (1995a), rely on the pattern of bright sunlit sides of crowns adjacent to
shadowed areas. The brightness of a pixel in the original image is considered repre-
sentative of the proximity of a pixel to the top of a tree. For most coniferous and
broadleaf species with strong apical dominance, the brightest pixel within the area
of a tree’s crown is considered the center of the tree. Areas of shadow around the
brighter pixels help define the crown perimeter. Ke and Quackenbush (2008) test sev-
eral popular methods on common datasets. Such methods are of great use on images
of dense stands, or those with ample amount of shadow (Leckie et al. 2003).
Other methods that do not rely as much on shadow are better for more isolated
trees (Leckie et al. 2003). Some examples using templates or pattern signatures are
Pollock (1996) and Brandtberg and Walter (1998). Pattern recognition is a rapidly
growing field, fueled by interest in both remote sensing and artificial intelligence, and
new advancements will likely come along soon. However, many are using a new data
source to geographically locate tree crowns for further analysis.
The three-dimensional nature of Lidar data makes individual tree segmentation
much more direct. To confirm this, one need only look at the great number of
recent publications reporting positive results using Lidar for this procedure. Hyyppä
and Inkinen (1999) took the idea of extracting tree outlines from raster imagery
and extended it to a Canopy Height Model (CHM) dataset created from discrete
point Lidar data. The CHM is smoothed using a simple convolution filter prior to
segmentation to reduce the number of local maxima. Stand-level attributes are then
available by summing the attributes obtained for the individual crown segments. In
several publications that follow, it is this basic process that is extended or refined.
One opportunity for refinement comes with the choice of the single filter to apply
to the entire CHM. This step is important because individual tree sizes can vary
greatly, and adapting the filter to the local tree height distribution can help account
for this variation. Persson and Holmgren (2002) fit parabolic forms to local maxima in
the CHM image to determine appropriate parameters for a Gaussian smoothing of the
CHM. Popescu et al. (2002) adjusted filter window size based on a model of tree crown
width predicted from canopy height. Brandtberg et al. (2003) created a state-space
representation of the CHM. Pitkänen et al. (2004) tested three adaptations of Popescu
et al. (2002) and Brandtberg et al. (2003) for optimal performance. Reitberger et al.
(2007) used an iterative smoothing algorithm on the CHM rather than a direct filter
approach.
Besides optimizing the classification of local maxima as tree tops, some modify the
procedure used to group individual pixels of the CHM to a given maximum. Brandtberg
et al. (2003) used a state-space blob signature to detect trees. Solberg et al. (2006)
introduced a star-shaped criterion to restrict which pixels could be amalgamated into
a tree cluster. If a vector from the tree center to a given pixel center exits the current
crown outline before entering the pixel, the pixel is not included. Koch et al. (2006)
used a similar idea to finalize crown outlines at the end of segmentation. In this
algorithm, large changes in elevation between two pixels of the CHM along a vector
signify the boundary of the crown. Chen et al. (2006) and Kwak et al. (2007) use a
distance to the nearest center transformation to outline tree crowns.
In all of the publications above, the CHM is used for both local maxima detection
and for crown region-growing. Some authors have suggested methods that deviate
from this process. Morsdorf et al. (2003) used a clustering approach on the Lidar
point data, informed by the location of the local maxima on the CHM. Reitberger
et al. (2009) started with tree positions estimated from the CHM, but then changed
to a voxel representation of the area. They used a raster normalized cut algorithm,
which had been modified to work in three dimensions, to decide if an individual cluster
represented multiple trees.
2.3 Individual Tree Species Detection
With a description of the physical space occupied by an individual tree, the next
step is to link this position with geographically positioned data. In the case of raster
data, this would be the pixels occupying the two-dimensional outline of the extent
the crown of a single tree. However, in the case of Lidar and other three-dimensional
data, all data contained within the volume of space occupied by the tree is considered
relevant. This section is a review of works published in the last two decades for the
identification of individual tree species using data relevant to the tree position. It is
divided into two parts: one for data from passive sensors provided in raster form, and
another for data from Lidar, which currently is the only active sensor used for such work.
2.3.1 Passive sensors alone
The increase in spatial resolution available from commercial or public imagery sources
over the past few decades did not directly improve stand type classification as expected
(Dikshit 1996). The decrease in pixel size unfortunately results in greater variation
within the pixels representing a given class (Marceau et al. 1994). However, the
ability of researchers to identify individual tree crowns as well as analyze patterns
within these crowns has increased. There is an endless number of variables one can
compute from just the pixels identified within the crown polygons in an image, and
some have worked quite well. Three main types of patterns have been used to identify
tree species: spectral patterns in multi- and hyperspectral data, textural patterns
within hyperspatial data, and, quite rarely, patterns across multiple images taken at
different times.
Many authors have noted that tree species can have signature spectral reflection
and absorption patterns across the electromagnetic spectrum (Cochrane 2000, Roller
2000, Van Aardt 2000, Asner 1998, Datt 1998, Fung et al. 1998, Gong et al. 1997).
Accordingly, it is usually possible to use the values of several spectral bands for
the pixels falling within an individual tree crown. This method tends to be fairly
successful at the individual tree level, provided the spatial resolution of the data
is fine enough that the mixed spectral signal comes from the tree crown alone.
Fine resolution multispectral data can still be rather expensive, and several studies
opportunistically work with data that has been acquired for use in multiple projects.
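The per-crown spectral signature idea underlying most of the studies below can be sketched as follows; the multispectral image array and the crown-label raster here are hypothetical toy inputs, not data from any of the cited works.

```python
import numpy as np

def crown_mean_spectra(image, crown_labels):
    """Mean intensity per spectral band for each crown segment.

    image        : (rows, cols, bands) multispectral array
    crown_labels : (rows, cols) int array, 0 = background,
                   1..k = individual tree crowns
    Returns a dict mapping crown id -> length-`bands` mean vector.
    """
    spectra = {}
    for crown_id in np.unique(crown_labels):
        if crown_id == 0:
            continue
        pixels = image[crown_labels == crown_id]  # (n_pixels, bands)
        spectra[crown_id] = pixels.mean(axis=0)
    return spectra

# Toy example: a 4x4 two-band image containing two crowns
img = np.zeros((4, 4, 2))
img[:2, :2] = [10.0, 50.0]   # crown 1: bright in the second band
img[2:, 2:] = [30.0, 20.0]   # crown 2
labels = np.zeros((4, 4), dtype=int)
labels[:2, :2] = 1
labels[2:, 2:] = 2
sig = crown_mean_spectra(img, labels)
```

The resulting mean vectors (one per crown) would then feed a classifier such as maximum likelihood or LDA, as in the studies reviewed below.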
• Pinz et al. (1993) gave an exposition of several then-modern neural-network
training techniques. They used multiple segmentation approaches and separated
four conifers from beech in a Canadian forest using four-band aerial imagery
acquired with the Multispectral Electro-optical Imaging Scanner (MEIS) instru-
ment. Additionally, aerial color-infrared (CIR) photography taken over Austria
was used to separate a pine from two spruce species. Overall classification accu-
racy was very high given the early date of the paper.
• Thomasson et al. (1994) used four-band videography (red, green, blue, near-
infrared) to distinguish six broadleaf species in Louisiana. Given the nature
of broadleaf crowns, they used manual digitization of tree crowns rather than
an automatic method. Using two different acquisitions, in summer and fall,
added about 10 percent to the overall classification accuracy using maximum
likelihood (ML) and minimum distance classifiers.
• Under the hypothesis that tree-level analysis is better for analysis of high spatial
resolution images than stand or pixel-level approaches, Gougeon (1995b) tested
seven multispectral attributes of segmented tree crowns for use in tree species
classification of three pine and two spruce species. The best performance came
from average intensity over all pixels representing the tree crown. This is in
contrast to earlier work (Gougeon and Moore 1989), which showed that for lower
spatial resolution images, the highest single-pixel intensity value performed best.
• Meyer et al. (1996) used manual digitization of crowns and CIR to distinguish
spruce, beech, fir and two damage classes of pine in Switzerland. A first attempt
on the original unfiltered imagery was not very successful due to image noise.
Their second method, incorporating a high-pass filter on the CIR band values,
new mixture values from principal component analysis (PCA), and textural
features, was more successful.
• Key (1998) and Key et al. (2001) compared the relative benefits of having multi-
spectral versus multi-temporal data. They used CIR imagery acquired on many
dates to distinguish four broadleaf species, with manual digitization of crowns.
ML classification was used, with prior probabilities determined by the proportion
of each species in the training data. As is typical in work with broadleaves,
results were poor overall. Multispectral data appeared to be of higher importance
than multi-temporal data.
• Brandtberg (2002) gave a unique approach using aerial CIR to separate two
conifers and two broadleaf species in Sweden. Segmentation was done with
a multi-resolution state-space approach (Brandtberg and Walter 1998). Linear
discriminant analysis (LDA) was used with several input variables incorporating
crown shape, size, and texture. A fuzzy approach was also used, in which a
crown object does not necessarily belong to one class, but rather has grades of
membership to each of the classes.
• Haara and Haarala (2002) used digitized CIR imagery to classify individual
image segments into two conifer and two beech species, or ground/understory.
A generic region growing algorithm was used to segment crowns from local
intensity maxima. Mean intensities across three window sizes as well as channel
intensity ratios were used in an LDA. The beech species were highly separable,
but more confusion occurred between the conifers.
• Koch et al. (2002) classified three conifers and two broadleaf species in Southern
Germany using manual crown digitization on aerial CIR imagery. An ML
approach was used to classify at the pixel level with mean intensity and several
textural measures of pixels as input variables. Crown segments were classified
to match the majority of the pixels.
• Leckie et al. (2003) tested several variables measured from automatically out-
lined tree crowns to distinguish five conifer species and one broadleaf species in
Canada using data from the Compact Airborne Spectrographic Imager (CASI).
Several variables describing the intensity and texture of the trees in each band
were tested, but the mean intensity of the pixels with above-average brightness
performed the best. While classification was done at the tree level in order to
classify stand composition, testing was only done at the stand level. Similar
results were later found for old-growth trees (Leckie et al. 2005).
• Katoh (2004) tested the ability of the pixel-level mean and standard deviation
of IKONOS imagery bands and four standard vegetation indices to distinguish
two conifer species and nineteen broadleaf species. A multiple comparison
procedure was used to determine the best variables for an ML classification.
Overall classification accuracy was not high, though reasonable given the 21
species involved. Coniferous and broadleaf species could be distinguished fairly
accurately.
• Clark et al. (2005) trained a classification model with field spectrometer data
and applied it to seven tropical species (of several hundred) that manage to
protrude above the canopy in Costa Rica. Manual digitization on 1.6-meter res-
olution hyperspectral data was used for pixel-level classification. The spectral
angle mapper was compared to LDA and ML classification, with LDA perform-
ing best.
• Goodwin et al. (2005) used manual digitization of crowns and ML classification
to distinguish four eucalyptus species in Australia. While only one species
could be accurately mapped, all eucalyptus could be distinguished from sur-
rounding vegetation quite well with only four of the 10 original CASI imagery
bands.
• Olofsson et al. (2006) took a slightly different approach and used model tem-
plates to detect trees in the high resolution CIR imagery. The modeled crowns
were used to compute crown average intensity values by band. Results were
very good for distinguishing Swedish broadleaf species from two common conifer-
ous species. The separation of the trees into all three groups was fairly accurate.
• To separate two species of pine in Korea, Kim and Hong (2008) used texture in-
formation and crown outline shape parameters including ratios of area, perime-
ter, and diameter on high-resolution 5-band satellite imagery. The crown shape
parameters worked well to distinguish pines from broadleaf species, while tex-
ture features from the near-infrared band were used to separate the two pine
species.
More recent works take advantage of more advanced, and computationally heavier,
classification algorithms that make no assumptions about distributions or parame-
ters. Heikkinen et al. (2010) used a support vector machine (SVM) with a
Mahalanobis kernel. Both Krahwinkler and Rossmann (2010) and Mora et al. (2010)
used a decision tree approach. Both classification algorithms tend to work well with
high-dimensional data, but may require extensive tuning to achieve optimal results.
Some authors have examined pixel-based or object-based classifications produced
with image analysis software, where species segmentation was not necessarily the
primary goal of the publication. Koukoulas and Blackburn (2001) introduced a new
classification accuracy index and tested it with a pixel-based species map created
from Advanced Thematic Mapper (ATM) imagery using a built-in supervised ML
classification algorithm. In his dissertation, Wang (2003) used eight-band CASI
imagery to find an optimal scale parameter for an object-recognition algorithm to
distinguish three species of mangrove from lagoon, road, and surrounding forest in
Panama. Kanda et al. (2004) used four-band videography to distinguish three
common Japanese species. Ke and Quackenbush (2007) used four-band Quickbird
satellite imagery to distinguish four coniferous species from broadleaf species in
New York. Pu (2010) used IKONOS satellite imagery to distinguish many species
in an urban area of Florida.
2.3.2 Sensor Mixtures
Given the benefits of using Lidar data to discover tree locations, several authors
have investigated the combined use of Lidar and other imagery data. Typically,
tree position is determined with Lidar, while classification is done with variables
created from the raster data. The results are usually at least acceptable; however,
this is not always the case (Korpela et al. 2007). One hurdle that must be overcome
is the lens and topographic distortion introduced when imagery is recorded. With
Lidar data of sufficient overlap, it is always possible to observe trees from directly
overhead, whereas in imagery vertical lines become stretched and rotated across
the image. Some authors try to resolve this with mathematical models; others work
around it.
• Koukoulas and Blackburn (2005) linked the pixel-based species map previously
created (Koukoulas and Blackburn 2001) from ATM data to tree position de-
rived from Lidar. Segmentation was done using a contour line algorithm on
the raster CHM. Contours representing a height greater than a threshold were
considered as part of a tree top. The same algorithm was applied to the green
band of the imagery and results were compared. Species classification was not
evaluated at the tree level in this paper.
• Heinzel et al. (2008) segmented the trees with the algorithm outlined by Koch
et al. (2006). Lidar-produced segments were corrected using the color informa-
tion from the raster data. They were able to distinguish two broadleaf species
from coniferous species by transforming the CIR data into the hue, saturation,
and intensity (HSI) color model and setting threshold values in the hue dimen-
sion.
• Holmgren et al. (2008) produced one of the highest published classification accu-
racies. Several variables computed from the Lidar data for each crown, including
parameters of a parabolic surface fit to the crown top, were combined with mean
intensity values for each of the imagery bands in an ML classification. They
achieved 96 percent accuracy distinguishing two individual conifer species from
broadleaf species as a group.
• Saynajoki et al. (2008) attempted to separate aspen trees from other species in
Finland by first distinguishing coniferous and broadleaf species manually using
aerial imagery, and only then distinguishing the aspens from other broadleaf
species with automatic methods. These methods included several variables cal-
culated from the Lidar data. Again broadleaf species proved difficult to classify,
with only 57 percent of the aspens correctly classified.
• As part of a species-specific forest inventory, Packalen (2009) incorporated Lidar
and aerial photography to estimate individual tree species as one of two conif-
erous species and a broadleaf species in two areas of Finland. He used textural
features from the imagery and Lidar point height percentiles as well as propor-
tion of ground and vegetation hits. LDA classification provided individual tree
species, which was then used to model stand level diameter distribution and
volume by species.
• Puttonen et al. (2009) used a CHM created with Lidar data to model which
pixels on aerial CIR imagery belonged to the sunlit side and shaded side of each
tree. The average intensity value for each band on each side of the tree, as well
as the ratio of the two sides for each band were used as input variables in a
LDA to distinguish two coniferous species from a broadleaf species in Southern
Finland.
• Hedberg (2010) attempted to separate four broadleaf and two coniferous species
in Sweden. Lidar was used for crown segmentation, and a 13-band CASI imagery
set was used for classification. The original image intensities as well as textural
features were put through two data reduction schemes, a t-test based method and
a PCA, before the reduced variables were used in an SVM with a sigmoid kernel.
• Li et al. (2010) separated five species in Ontario, Canada with high accuracy.
Lidar-derived features included texture parameters from a three-dimensional
gray level co-occurrence matrix of a voxel-set built from the Lidar data. Ad-
ditionally, texture from hyperspectral imagery was included as input into a
decision tree analysis. Best results were achieved, as expected, when both types
of data were combined.
As with pure raster imagery, some authors have incorporated the existing supervised
object-based algorithms built into raster analysis software packages to classify in-
dividual tree species. Voss and Sugumaran (2008) used object-based algorithms to
segment and classify seven dominant tree species in Iowa. Classification results using
either fall or summer hyperspectral data alone were rather poor. Adding elevation
data from a Lidar-produced CHM added about 20 percent to the accuracy. Zhang
et al. (2009) compared segmentation algorithms to aid an object-based classification
of unspecified species in Canada. Little information is available in this brief
proceedings submission.
2.3.3 Active sensors alone
Lidar has become a ubiquitous data source in the field of forestry. Precision and
data density are increasing rapidly, along with the number of providers. Newer
instruments enable providers to spend less time in the air, resulting in lower
prices for consumers. Today, it can be relatively quick and easy to obtain a quality
dataset for individual tree crown identification. Several advantages of Lidar data
have helped give rise to this popularity.
Attributes computed from discrete point Lidar for tree species differentiation can
be placed into two categories: spatial distribution and intensity. Originally,
intensity data were either not recorded or discarded before a dataset was provided
to the user, so only structural information was available for use. Only one paper,
Liang et al. (2007), concentrated on just these structural patterns when
distinguishing coniferous and broadleaf species. They used a raster image created
from the difference between a surface model built from first returns and a model
built from last returns, which was essentially a CHM.
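The first-minus-last difference image of Liang et al. (2007) can be roughly sketched as below; the surface-model arrays and the 2 m threshold are invented for illustration and are not taken from that paper.

```python
import numpy as np

def first_last_difference(first_dsm, last_dsm, threshold=2.0):
    """Difference between first-return and last-return surface models.

    Where the laser penetrates deep into the crown (common for
    leaf-off broadleaf trees), first and last returns diverge; dense
    coniferous crowns tend to show smaller differences. The 2 m
    threshold here is purely illustrative.
    """
    diff = first_dsm - last_dsm
    broadleaf_like = diff > threshold  # boolean mask of candidate pixels
    return diff, broadleaf_like

# Toy 2x2 surface models (heights in meters)
first = np.array([[20.0, 18.0], [15.0, 22.0]])
last = np.array([[19.5, 10.0], [14.5, 12.0]])
diff, mask = first_last_difference(first, last)
```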
The spatial distribution of Lidar points provides a representation of both the
shape and density of a tree’s crown. There is little reason to doubt that species have
characteristic crown shapes, given that this is one feature a human observer uses to
make a distinction. Many species, however, have considerable internal variation in
shape. Also, tree size can have a large effect on crown shape, and some authors have
accounted for this with relative height measures (Ørka et al. 2009, Holmgren and
Persson 2004).
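One simple form of such a relative-height normalization (a sketch, not the specific formulation of the cited authors) rescales within-crown point heights to the unit interval:

```python
import numpy as np

def relative_heights(point_heights):
    """Scale within-crown point heights to the [0, 1] range.

    Dividing out tree height removes much of the size effect, so
    crown-shape metrics (e.g. height percentiles) become comparable
    between small and large trees of the same species.
    """
    z = np.asarray(point_heights, dtype=float)
    return (z - z.min()) / (z.max() - z.min())

# Two trees with the same crown shape at different sizes yield the
# same relative-height distribution (toy values).
small = relative_heights([2.0, 4.0, 6.0])
large = relative_heights([10.0, 20.0, 30.0])
```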
As interest grew, providers started recording intensity information for individual
Lidar points. The intensity values of these points reflect the combined effects of
the size and reflectivity of vegetative tissue, the incidence angle relative to the
light beam, and atmospheric effects. Lidar equipment usually incorporates a
near-infrared light source, and this part of the spectrum is sometimes useful in
separating coniferous and broadleaf species (Holmgren and Persson 2004).
However, for larger projects, changes in range and in the gain sensitivity of the
sensor can result in significant variation in intensity values over time. A clear
example of this is shown in Moffiet et al. (2005). These variations can have a
large effect if intensity is used for classification purposes. Some authors are
working on corrections for such problems (Gatziolis 2011, Korpela et al. 2010a).
For work with smaller areas of interest and low scan angles, one would expect
small range differences and fewer problems from changes in mean intensity.
Several authors have relied at least in part on reflectance attributes for
classification purposes. The two presented below, however, concentrated only on
such attributes.
• To distinguish between a native spruce and two native broadleaf species in
Norway, Ørka et al. (2007) tested the mean and standard deviation of intensity
for three categories of returns: first of many, only, and last of many. Overall
accuracy was most affected by the metrics derived from the points in the first
of many category.
• Kim et al. (2009) investigated differences between eight broadleaf and seven
coniferous species using intensity-derived metrics from two datasets, representing
leaf-on and leaf-off conditions. The metrics used were the mean and coefficient
of variation of all points, upper canopy points, and surface points. Large
differences were found between species, largely due to the seasonal differences
between the two datasets.
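As a sketch of the kind of intensity correction involved (not the specific methods of Gatziolis 2011 or Korpela et al. 2010a), a basic range normalization assumes received power falls off with the square of range and rescales each return to a reference range, I_norm = I_raw * (R / R_ref)^2; the reference range and values below are invented:

```python
import numpy as np

def normalize_intensity(intensity, sensor_range, reference_range=1000.0):
    """Range-normalize Lidar return intensities.

    Assumes received power falls off with the square of range, so a
    return recorded at range R is scaled by (R / R_ref)^2. Real
    corrections (e.g. for sensor gain changes) are considerably more
    involved than this.
    """
    intensity = np.asarray(intensity, dtype=float)
    sensor_range = np.asarray(sensor_range, dtype=float)
    return intensity * (sensor_range / reference_range) ** 2

# A return at 2000 m is scaled up by a factor of four relative to
# the 1000 m reference range.
corrected = normalize_intensity([100.0, 100.0], [1000.0, 2000.0])
```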
Quite a few authors have looked at the combined values of structural and intensity
information for tree species retrieval. There will be some correlation between the
two attributes, as reflected intensity will be affected by structural attributes such as
branch density and leaf size. However, the combination of the two attributes does
seem to work quite well when compared to either attribute alone. Important variables
from the papers below often fall into both categories of attributes.
• After examining a very large number of variables, Holmgren and Persson (2004)
found that using six weakly correlated variables produced the highest classification
accuracy for two coniferous species and a combined broadleaf category. These
variables comprised five structural metrics, among them surface shape and relative
height statistics, and a single intensity metric, the standard deviation of inten-
sity.
• To distinguish seven species of woody vegetation in Australia, Moffiet et al.
(2005) compared height and intensity distribution statistics. The proportion of
single returns, crown permeability, and cover proved to be the better predictors.
They found that intensity information varied too greatly to be useful as a predictor
of species.
• Brandtberg (2007), after a preliminary analysis in Brandtberg et al. (2003),
outlined a series of four events that could occur as a beam of light hits a tree
target. Statistics computed on height and intensity values, for points following
a given sequence of these events, were used to classify tree species. However,
only moderate classification accuracy was achieved, which the author attributed
to the similarity of the broadleaf species.
• Ørka et al. (2009) expanded upon the work in Ørka et al. (2007) by adding
several structural attributes to the available input data. Additionally, the clas-
sification problem was reduced to a single coniferous and a single broadleaf
species. Field-derived tree position was used rather than an automatic segmen-
tation approach. This resulted in a larger number of small trees that were more
difficult to classify.
• Suratno et al. (2009) used return-type percentages, as well as the mean and
standard deviation of both height and intensity by return type, to distinguish four
coniferous species in a Montana mixed-conifer forest. Individual tree classifica-
tion accuracy was not high, but the dominant species was detected more accurately
at the stand level, with proportion of returns by return type as the leading
predictor.
• To distinguish two Norwegian coniferous from a broadleaf species, Vauhkonen
et al. (2009) incorporated crown shape information along with height and inten-
sity distribution metrics. Additionally, textural information derived from the
CHM was included. Overall accuracy was very high and comparable to that of
Holmgren and Persson (2004).
• In order to investigate the effects of intensity corrections on classification results,
Korpela et al. (2010b) tested a large number of variables with raw and corrected
data from two Lidar sensors, one of which records the gain control setting. In
all cases, such correction positively affected the classification results. LDA, k-
Nearest Neighbor, and Random Forests classification were tested, with Random
Forests slightly outperforming the others.
A more recent format of Lidar, usually referred to as waveform or fullwave Lidar,
is considerably more complex than discrete point Lidar. Rather than recording
individual peaks in the return signal, a digitizer samples the return signal at a
regular rate (Mallet and Bretar 2009). Several authors have explored techniques
for processing and analyzing these data (Chauve et al. 2007, Nordin 2006, Wagner
et al. 2006, Persson et al. 2005). There is ample enthusiasm about the potential
uses for this information and its applicability to target discrimination (Parrish
and Nowak 2009, Reitberger et al. 2009, Wagner et al. 2008; 2004). Because of this
enthusiasm, many authors have begun investigating these data for species
recognition.
• After modeling the waveform data as a series of Gaussian-shaped peaks repre-
senting points, Reitberger et al. (2008; 2006) used metrics of tree crown shape,
the distribution of points within the canopy, intensity and width information of
the peaks, and the number of peaks per waveform to distinguish between
coniferous and broadleaf trees in Germany.
• Litkey et al. (2007) distinguished two coniferous species from two broadleaf
species in Finland using an inflated discrete point dataset created by extracting
points from the waveform data. Interestingly, the extra density reduced overall
accuracy when a Nearest Neighbor approach was used. They also considered
identification of distinctive waveforms for a given species, and compared waveform
data against georeferenced handheld digital camera images.
• Hollaus et al. (2009) distinguished two coniferous species from a beech species in
Austria using means and standard deviations of both peak widths and modeled
cross sections (as described by Wagner et al. 2006). Distinctions between
beech and the coniferous species were quite good, though heavy crown overlap
of species on the ground hampered the results noticeably. With similar methods,
Hofle et al. (2008) explored these features for segmentation and classification of
larch, oak and beech species in a natural forest outside Vienna.
• Vaughn et al. (2010) looked at Fourier transformations of the individual wave-
forms, averaging the influence of each frequency across individual trees. While
the one-dimensional Fourier transform includes no crown surface shape informa-
tion, three broadleaf species were distinguished with fair accuracy. Subsequent
reductions in the number of waveforms (Vaughn and Moskal 2010) showed that
this approach is promising for more sparse data as well.
• Heinzel and Koch (2011) tested a large number of combinations of peak-point
attributes to distinguish four to six species occurring in a forest in Southwestern
Germany. A stepwise approach was used to determine the best combination
of variables. Intensity, peak width, and the number of peaks per waveform
performed well as predictors.
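The Gaussian peak fitting that several of the studies above begin with can be sketched on a synthetic waveform; the peak positions, amplitudes, and noise level below are invented for illustration, and production decompositions add peak detection and model selection steps omitted here.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gaussians(t, a1, mu1, s1, a2, mu2, s2):
    """Sum of two Gaussian peaks, one per reflecting surface."""
    return (a1 * np.exp(-((t - mu1) ** 2) / (2 * s1 ** 2)) +
            a2 * np.exp(-((t - mu2) ** 2) / (2 * s2 ** 2)))

# Synthetic waveform: a crown-top peak near sample 20 and a ground
# peak near sample 50 (all values are invented)
t = np.arange(80, dtype=float)
wave = two_gaussians(t, 120.0, 20.0, 3.0, 80.0, 50.0, 2.5)
wave += np.random.default_rng(0).normal(0.0, 1.0, t.size)  # sensor noise

# Initial guesses would normally come from detected local maxima
p0 = [100.0, 18.0, 2.0, 60.0, 52.0, 2.0]
params, _ = curve_fit(two_gaussians, t, wave, p0=p0)
a1, mu1, s1, a2, mu2, s2 = params
```

The fitted amplitudes, positions, and widths are exactly the per-peak attributes (intensity, range, peak width) that the studies above feed into their classifiers.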
2.4 Discussion
The problem of individual tree species recognition is indeed a difficult one. In
general, the main problem is that many species that can be distinguished on the
ground by differences in bark and leaf morphology must now be distinguished from
a thousand meters above. Larger-scale differences between species, such as crown
shape, branch patterns and color, are often blurred by the variation within a
species. Species is only one factor that affects the final shape and color presented
by a tree. Additionally, there are many locations where two similar species of the
same genus coexist, yet a desire remains to distinguish them (Suratno et al. 2009,
Goodwin et al. 2005, Van Aardt 2000, Gougeon 1995b).
In order to overcome this inherent problem, a large number of approaches have
been introduced over the last few decades. Differences from site to site are large, and
there really is no universal solution. A technique that works well in a location with
heavy conifer cover may break down when several broadleaf species are introduced.
The number of species of interest, the typical differences between these species, as
well as budget constraints, will determine the best data and variables to use for each
location.
The problem of distinguishing coniferous from hardwood or broadleaf (typically
deciduous) species appears much more tractable. Collapsing a confusion matrix
into one containing only two classes, coniferous and broadleaf, often improves the
overall accuracy observed (Kim et al. 2009, Van Aardt 2000). This is usually due
to the strong apical dominance typically exhibited by coniferous species or the
potentially very large color difference between the two groups (Vauhkonen et al.
2009, Kim and Hong 2008, Ørka et al. 2007, Olofsson et al. 2006, Katoh 2004).
Because the two groups differ in growth behavior, being able to accurately
distinguish between them can be very beneficial to inventory tasks.
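The collapse of a multi-species confusion matrix into coniferous and broadleaf groups can be sketched as follows; the matrix values and the species-to-group assignment are hypothetical, not drawn from any cited study.

```python
import numpy as np

def collapse_confusion(cm, groups):
    """Collapse a confusion matrix to coarser class groups.

    cm     : (k, k) confusion matrix, rows = reference, cols = predicted
    groups : length-k sequence of group indices (e.g. 0 = coniferous,
             1 = broadleaf) for each of the k original classes
    """
    groups = np.asarray(groups)
    n = groups.max() + 1
    out = np.zeros((n, n), dtype=cm.dtype)
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            out[groups[i], groups[j]] += cm[i, j]
    return out

# Toy example: pine and spruce (coniferous) plus birch (broadleaf).
cm = np.array([[40, 8, 2],
               [10, 35, 5],
               [3, 2, 45]])
two_class = collapse_confusion(cm, [0, 0, 1])
acc3 = np.trace(cm) / cm.sum()                  # 3-class overall accuracy
acc2 = np.trace(two_class) / two_class.sum()    # 2-class overall accuracy
```

Within-group confusion (here, pine mistaken for spruce) moves onto the diagonal of the collapsed matrix, which is why the two-class accuracy rises.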
At latitudes well north or south of the equator, most forests are dominated by
relatively few species. This means that being able to accurately distinguish four or
five species would be beneficial across a large share of the world's forests. In most
applications, there will likely be a few coniferous and a few broadleaf species to
distinguish. A large share of the work in such conditions has occurred in the Nordic
countries, where one often needs only to distinguish one or two coniferous species,
usually Scots pine (Pinus sylvestris L.) and Norway spruce (Picea abies (L.)
Karst.), from one or two broadleaf species (Betula spp. or Populus tremula L.).
Results from this region have been very impressive (Korpela et al. 2010b, Ørka
et al. 2009, Holmgren et al. 2008, Holmgren and Persson 2004). In North America,
authors have commonly worked with more species of coniferous trees (Leckie et al.
2003, Gougeon 1995b, Pinz et al. 1993).
Species separation in tropical regions is far more difficult due to the large
number of species (Clark et al. 2005, Roy et al. 1991); in any classification, the
error rate should rise as the number of classes increases. Additionally, many
species are strictly understory species, and identifying them with existing remote
sensing tools is not very reliable. In such cases, identifying important emergent
species will likely be the best that can be done.
Currently, there are two basic data sources one can use for crown segmentation
and classification: two-dimensional raster data or three-dimensional Lidar data.
Both have advantages and disadvantages that can help decide which to use for a
specific application. However, the best results typically come from the use of both
types, when color and structural information are combined (Hedberg 2010, Li et al.
2010, Heinzel et al. 2008, Holmgren et al. 2008). Unfortunately, in many practical
situations the acquisition of multiple data sets is not an option.
Of the two data sources alone, Lidar has provided the best results on an individual
tree basis. This is likely due to several important advantages of Lidar data. Not
only can a more exact position of a tree be obtained, but many useful variables can
be computed from the three-dimensional information of the points at this position.
For example, height distribution information has proven very useful in tree species
separation (Vauhkonen et al. 2009, Holmgren and Persson 2004), and point intensity
data have been considered important in species discrimination (Kim et al. 2009,
Ørka et al. 2007). A disadvantage of Lidar data is that only one wavelength of
light is used for these intensity values.
There is at least one other disadvantage to working with Lidar data. There are
very few software applications and standard algorithms for working with Lidar data,
especially for doing object-based analysis such as species identification. This is even
more true for waveform Lidar data. While early results suggest that this is a good data
source for species classification, all analysis must typically be done with custom-built
tools or specialized packages not readily available to analysts who might incorporate
some of the methods presented. This situation is changing, as raster analysis software
packages are starting to incorporate new geometry algorithms that can make use
of three-dimensional point Lidar data. Additionally, the latest version of the LAS
standard format for storing Lidar data now includes an option to store waveform data
(ASPRS 2010). It will not be long before the object-oriented classification algorithms
readily available for raster data become able to incorporate Lidar information.
Individual tree species identification capabilities have improved over time along
with the increase in data resolution and computing power. While we will likely
never be able to identify every tree on the ground, models exist to estimate
without bias the plot-level statistics of interest based on the crown segments we
are able to identify. If a species can be associated with each of these crown
segments, our ability to estimate inventory information stratified by species will
improve greatly.
Chapter 3
FOURIER TRANSFORM OF WAVEFORM LIDAR FOR
SPECIES RECOGNITION
3.1 Abstract
In precision forestry, tree species identification is one of the critical variables of
forest inventory. Lidar, specifically full waveform Lidar, holds great promise for
identifying dominant hardwood tree species in forests. Raw waveform Lidar data
contain more information than can be represented by a limited series of fitted
peaks. Here we attempt to use this information with a simple transformation of
the raw waveform data into the frequency domain using a Fast Fourier Transform.
Relationships are found in the influence of specific component frequencies within
a given species. These relationships are exploited using a classification tree
approach in order to separate three hardwood tree species native to the Pacific
Northwest of the USA.
We are able to correctly classify 75% of the trees (KHAT of 0.615) in our training
dataset. Each tree's species was predicted using a classification tree built from all
the other training trees. Two of the species grow in close proximity and have a
similar form, making differentiation difficult. Across all the classification trees
built during the analysis, a small group of frequencies is predominantly used as
predictors to separate the species.
3.2 Introduction
Species identification is an important component of many forest surveys. Environ-
mental quantities of interest, such as timber value and habitat quality, depend
heavily on the species distribution within the stand. Because of this importance,
techniques to quickly and accurately determine individual tree species, or simply
the proportion of a given species at a larger scale, are intensively sought after. As
Lidar is increasingly used for operational forestry, techniques to classify species
from Lidar data are of great interest.
Because the spatial structure of a tree is modeled quite well, it is tempting to
believe that a Lidar dataset of sufficient density might contain the information
needed to correctly distinguish several tree species from one another. Under this
hypothesis, several authors have proposed methods using discrete point Lidar
information (Kim et al. 2009, Liang et al. 2007, Ørka et al. 2007, Brandtberg 2007,
Moffiet et al. 2005, Holmgren and Persson 2004). In order to improve upon these
results, some have combined Lidar data with raster datasets in one form or another
(Saynajoki et al. 2008, Holmgren et al. 2008, Korpela et al. 2007, Koukoulas and
Blackburn 2005). However, in such cases, the Lidar data is incorporated more to
aid tree crown segmentation than for classification purposes.
Recently, Lidar vendors have begun to make available waveform Lidar data sets.
Waveform data sets contain a complete digitization of the returned intensity over a
brief time window for each light pulse. Mallet and Bretar (2009) provide a detailed
introduction to such data and the instruments that collect them. The dense sampling
of recorded waveform data likely increases the potential to hold additional information
about the target. Wagner et al. (2004) have argued that waveform data already contain
sufficient information for target classification.
Each waveform contains information about the reflectivity, density, and spatial
arrangement of the leaves and branches of the target tree. Given the amount of
information contained in a waveform, techniques that smooth this information, such
as Gaussian decomposition, may risk losing important information. In its raw form,
a waveform is a simple time series. Therefore, tools that have been used in the
past to analyze time series may again prove their usefulness in this case. These
tools allow for the transformation of the original data into forms that emphasize the
temporal relationships among all the sample points. Such a representation of the
data facilitates the search for patterns within waveforms which may help distinguish
a given tree species. In this paper, we employ a technique commonly used in the
analysis of time series, the Fourier transform, to distinguish three deciduous
species native to large areas of the western United States using waveform Lidar data.
3.3 Methods
3.3.1 The study site
The Washington Park Arboretum in Seattle, Washington is operated by the Univer-
sity of Washington Center for Urban Horticulture. The Arboretum, which is approx-
imately 230 acres (93 hectares) in size, is planted with more than 10,000 cataloged
woody plant specimens representing several genera. In addition, much of the Arbore-
tum contains natural stands of species native to Western Washington State, such as
Douglas-fir (Pseudotsuga menziesii (Mirbel) Franco), western redcedar (Thuja plicata
Donn ex D. Don), bigleaf maple (Acer macrophyllum Pursh), black cottonwood (Pop-
ulus balsamifera L. ssp. trichocarpa (Torr. & A. Gray ex Hook.) Brayshaw), and
red alder (Alnus rubra Bong.). The non-native trees in the Arboretum are planted in
groups by genus. Native species are also dispersed throughout the Arboretum and can
be found clustered into their own groups or sparsely mixed among the non-native trees.
3.3.2 Data processing
We used waveform data provided by Terrapoint USA, who flew a RIEGL LMS-
Q560 laser scanner, with waveform signal digitization, over the Washington Park
Arboretum on August 8th, 2007. This instrument was set to digitize waveforms at a
sample interval of about 1 ns, or 15 cm in linear distance. Scan angle ranged from
-30 to 30 degrees, and the pulse frequency was set at 133,000 Hz, resulting in a pulse
density of about 10 pulses per square metre (ppm) near nadir at ground level. For
comparison, this would yield about 20 points per square metre in a comparable first
and last return discrete point dataset. A single 4.5 km looped pass in the North-
South direction for the length of the Arboretum, lasting about 6 min, provided nearly
49 million waveforms.
Within the Arboretum, Kim et al. (2009) geolocated and measured characteristics
of the trees in 18 field plots. The locations of these
plots are shown in figure 3.1. The field plots were installed systematically so that at
least one plot was measured in each genus group of interest. Within each plot, about
ten to twenty example trees were identified and measured during the summer of
2005. Typically, the measured trees were somewhat isolated, simplifying the process
of crown delineation from the Lidar data. However, several of the groups of native
species are arranged with densities similar to those of natural stands. Each of
the trees measured in the field plots was mapped into UTM coordinates using
an angle and distance from known points within the plots. These points were located
with survey-grade GPS units, and the data were later differentially corrected for
optimal accuracy.
In order to associate waveform data with individual trees on the ground, the tree
crowns had to be delineated in mapping coordinates. While it is possible to do this
directly from a waveform Lidar dataset, many tools already exist to perform such analysis
on a discrete point dataset. We therefore used a discrete point dataset, built from
the waveform dataset, to create a raster image containing a digital canopy height
model. In this model, the highest return elevation above the DEM within
each grid cell was stored. We used the method described by Hyyppa et al. (2001a) to
obtain an initial set of polygons representing the crown outlines of individual trees in
the Arboretum. This method works iteratively: at each step, neighboring
pixels are added to clusters surrounding local maxima of a filtered canopy height
model. Under such an algorithm, many groups of trees are mistaken for single trees
Figure 3.1: Map of the Washington Park Arboretum with flight path and ground plot locations.
(Hyyppa et al. 2001b). Though this should have little effect in the Arboretum, the
polygon for each tree in the training data set was visually inspected and, if necessary,
improved upon by hand. All waveforms with data inside the outline of each tree (at
any height) were identified. Due to the large size of the full dataset, this procedure
was performed using code written in C.
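The canopy height model step can be sketched as follows. This Python fragment is illustrative only: the actual processing used C code, and the point list, cell size, and function name here are hypothetical. It keeps the highest return height above the DEM within each grid cell:

```python
import math

def canopy_height_model(points, cell_size=0.5):
    """Simple canopy height model: for each grid cell, keep the highest
    return height above the DEM. `points` is an iterable of
    (x, y, height_above_dem) tuples; returns {(col, row): max_height}."""
    chm = {}
    for x, y, h in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        if h > chm.get(key, float("-inf")):
            chm[key] = h
    return chm

# Hypothetical returns: the first two fall in the same 0.5 m cell.
points = [(0.1, 0.1, 12.0), (0.2, 0.3, 15.5), (1.4, 0.2, 3.0)]
chm = canopy_height_model(points)
```

Crown segmentation would then cluster cells around local maxima of a filtered version of this raster.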
3.3.3 Fourier transform
In the analysis of time series, several tools are available to look for non-random pat-
terns within the data. One technique is to look at the data in the “frequency domain”
to discover frequencies of strong influence. This is usually done with a discrete Fourier
transform. This transform converts the original data into a set of coefficients
representing the influence of sine and cosine waves at a known set of frequencies. Large
coefficients are associated with heavy influence and imply that a higher degree of
periodicity at the corresponding interval is present in the data. The transform loses no
information, as the number of frequencies is equal to the number of samples in the
original signal. Fast versions of the transformation exist under the name Fast Fourier
Transform (FFT), and have a relatively low upper limit on computing time (Singleton
1969). The fft function in the R programming language (R Development Core Team
2009) was used to compute the FFT on each waveform.
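Purely as an illustration of this step (the analysis itself used R's fft), the transform of a single 60-sample waveform can be sketched as follows; the example waveform is fabricated:

```python
import cmath
import math

def dft_amplitudes(samples):
    """Amplitude (absolute value) of each DFT coefficient of a
    real-valued waveform, using the same unnormalized convention as
    R's fft function."""
    n = len(samples)
    return [abs(sum(s * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, s in enumerate(samples)))
            for k in range(n)]

# Made-up 60-sample waveform: one reflection "bump" plus a small ripple.
waveform = [math.exp(-((t - 20) / 4.0) ** 2)
            + 0.1 * math.sin(2 * math.pi * 6 * t / 60)
            for t in range(60)]

amps = dft_amplitudes(waveform)
useful = amps[:30]  # for real input, only the first half is informative
```

The direct sum shown here is O(n²); the FFT computes the same coefficients in O(n log n).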
In this study, waveforms had 60 samples each, representing about 9 m of linear
distance. Because the input waveforms are real-valued, only the coefficients of half of
the frequencies are meaningful. With 60 samples, we can consider the amplitudes of
the first 30 frequencies to be useful. For each tree, the averages (across all waveforms
hitting the tree) of each of these 30 useful amplitudes were stored as variables named
with a leading “M” followed by the frequency identification number (M1 through M30).
The standard deviations of these amplitudes were recorded as variables V1
through V30.
The average intensity value was also kept for each waveform. This easily
computed value represents the total amount of light reflected from each pulse, and it is
easy to see how it might vary by species. The average and standard deviation
of these values for each tree were recorded as MI and VMI. In total, for each tree in the
training data set, there are 62 variables that will be considered for use in classification,
as described in the next section.
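A minimal sketch of assembling these 62 variables, with fabricated inputs (the actual computation was done in R; the function name and data layout here are illustrative):

```python
from statistics import mean, stdev

def tree_variables(per_waveform_amps, per_waveform_intensity):
    """Assemble the 62 per-tree variables: M1..M30 and V1..V30 (mean and
    standard deviation of each of the 30 useful amplitudes across all
    waveforms hitting the tree), plus MI and VMI for mean intensity.
    `per_waveform_amps` is a list of 30-value amplitude lists, one per
    waveform hitting the tree."""
    variables = {}
    for i in range(30):
        column = [amps[i] for amps in per_waveform_amps]
        variables["M%d" % (i + 1)] = mean(column)
        variables["V%d" % (i + 1)] = stdev(column)
    variables["MI"] = mean(per_waveform_intensity)
    variables["VMI"] = stdev(per_waveform_intensity)
    return variables

# Tiny made-up example: three waveforms hitting one tree.
amps = [[1.0] * 30, [2.0] * 30, [3.0] * 30]
intensity = [10.0, 12.0, 14.0]
v = tree_variables(amps, intensity)
```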
The FFT algorithm assumes an equal time period between samples; however, in
some cases the range values within a pulse are not equally spaced. To greatly
simplify analysis, this fact was ignored, as a violation of the assumption is not
too concerning in this case: the displacement of an occasional sample point should
have little impact on the results, and in most cases the difference in intensity between
two neighboring samples is very small. We also ignore the fact that our series, unlike
our sinusoidal basis functions, is not of a periodic origin. It does little harm to pretend
that our series repeats itself in both directions ad infinitum.
3.3.4 Classification
We attempted to correctly classify all trees of three hardwood species: red alder,
black cottonwood, and bigleaf maple. These are common hardwood
species that grow naturally in the Arboretum and are therefore well represented in
the field data. To partition the data we used a classification tree approach (Breiman
et al. 1984; page 18). The R library tree contains a function of the same name for
modeling with these classification trees (Venables and Ripley 2002; page 266). Figure
3.2 shows the classification tree obtained by fitting the entire training dataset. The
variable and split value used are shown atop each fork. Each end node is labeled with
the species most represented among the group of trees that have not been eliminated
when traversing the tree from the root. Below each leaf, the deviance within that leaf
and the actual class memberships are presented. The total tree deviance is the sum of
the individual leaf deviances, and the reductions in tree deviance as each split is added
are shown in a table in the bottom left of the figure.
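For reference, the deviance values reported here can be computed with the usual multinomial deviance; this sketch assumes that standard definition (which, to our understanding, is what R's tree function reports):

```python
import math

def leaf_deviance(counts):
    """Multinomial deviance of one leaf: -2 * sum(n_k * log(p_k)),
    where p_k = n_k / n is the proportion of class k within the leaf."""
    n = sum(counts)
    return -2.0 * sum(c * math.log(c / n) for c in counts if c > 0)

def tree_deviance(leaf_counts):
    """Total tree deviance: the sum of the individual leaf deviances."""
    return sum(leaf_deviance(c) for c in leaf_counts)

pure = leaf_deviance([10, 0, 0])   # a pure leaf contributes zero
mixed = leaf_deviance([5, 3, 2])   # a mixed leaf does not
```

A pure leaf contributes zero deviance, so each split that sharpens class separation reduces the total.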
Figure 3.2: Example result of a classification tree fit to the training data. This is the tree returned when fitting to the entire training dataset and pruning off one branch to simplify the structure. Above each split is the variable and level used to split the remaining trees into two subgroups. The reduction in tree deviance resulting from each split is shown in the bottom left. Leaf deviances are shown in parentheses, actual class memberships in square brackets.
Table 3.1: Results of the classification when each tree species is predicted from a classification model incorporating all other trees.
                                    Predicted
Species            bigleaf maple  cottonwood  red alder  Producer accuracy (%)
bigleaf maple            7             2          1               70
cottonwood               2            14          1               82
red alder                1             4         12               71
User accuracy (%)       70            70         86               75
With limited training data available, a leave-one-out cross-validation technique
was used to estimate the actual predictive power of this technique when the trained
model cannot be validated against a separate data set. The species of each tree
was predicted by a classification tree trained on all other trees in the data.
The number of models involved in such a process makes refinement of each tree impractical,
and thus each tree was built using the defaults of the tree function. In a
non-academic application, the tree building could be better optimized, and this may
result in slight improvements in the classification accuracy.
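The cross-validation loop can be sketched as follows. Because R's tree function is not reproduced here, a simple nearest-centroid classifier stands in for the classification tree; only the leave-one-out structure mirrors the procedure described above, and the data are fabricated:

```python
from statistics import mean

def nearest_centroid_predict(train, test_features):
    """Stand-in classifier: predict the class whose feature centroid is
    closest to the test tree. Illustrative substitute for R's tree()."""
    centroids = {}
    for label in set(lab for lab, _ in train):
        rows = [f for lab, f in train if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lab: dist2(centroids[lab], test_features))

def leave_one_out_accuracy(data):
    """Each tree's species is predicted from a model trained on all the
    other trees; `data` is a list of (species, feature_list) pairs."""
    correct = 0
    for i, (label, features) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if nearest_centroid_predict(train, features) == label:
            correct += 1
    return correct / len(data)

# Fabricated, well-separated two-species data.
data = [("alder", [0.0, 0.1]), ("alder", [0.2, 0.0]), ("alder", [0.1, 0.2]),
        ("maple", [5.0, 5.1]), ("maple", [5.2, 5.0]), ("maple", [5.1, 5.2])]
acc = leave_one_out_accuracy(data)
```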
3.4 Results and Discussion
Table 3.1 shows the classification results from the cross-validation. The overall classi-
fication accuracy, or the portion of correctly classified trees, was 75 %. The associated
κ value was 0.615. For individual species, 70 % of maples, 82 % of cottonwoods and
71 % of alders were correctly classified. Previous studies report accuracies ranging
from about 64 % (Brandtberg 2007) to around 95 % (Holmgren and Persson 2004).
However, the classification approaches and model applications vary quite drastically
among these works. An indirect comparison of methods applied in different situations
provides little information about the qualities of each. While these results are not
generally better than previous results, they were obtained from a simple analysis with
much room for improvement.
In the selection of species to classify, we left out all conifers. One reason for
this omission is that it seems methods using discrete point data are already capable
of discriminating hardwoods from conifers. For instance, Reitberger et al. (2006)
achieved 88 % accuracy distinguishing conifers from hardwoods in a German mixed
forest. Another reason for leaving out conifers is that the dominant shape of most
conifers means that light pulses crossing a tree at steep angles would likely
be drastically different from those passing at shallow angles. The more dome-like
shape of most hardwoods might nullify this effect. It is possible that limiting analysis
to more vertical scan angles could provide information to differentiate conifers from
hardwoods or multiple species of conifers from one another. For this study we paid
no attention to scan angle, and this is an area for improvement.
Table 3.2 lists all variables that were used in more than two of the 44 classification
trees built. The second column shows the frequency, in cycles per meter, associated
with each listed variable; the variables MI and VMI are not associated with any
frequency. The fourth column lists how many classification trees used the variable as a
predictor.
The classification trees built during the cross-validation procedure consistently relied
upon very few of the available variables. This is largely because in each
cross-validation run only one tree was omitted from the training data. However, it is
still surprising that so few variables were so consistently included. The variables M6,
M12, M26 and MI were included in a strong majority of the trees. Any biological
meaning of these particular frequencies is not obvious, but there are some possible
explanations for their importance. The frequency associated with M12 divided
by that of M6 is 2.20, while that of M26 divided by that of M12 is 2.27. The fact that
these two quotients are nearly the same may not be a coincidence. A sine wave of a
given frequency is orthogonal
to a sine wave of twice the frequency, assuming no phase shifts. Such a selection of
variables may tend to be optimal due to this phenomenon. Of the three commonly
used frequencies, M6 represents a lower frequency, M12 a medium frequency, and M26
a higher frequency.

Table 3.2: The most commonly used variables from the cross-validation procedure. Count is the number of cross-validation trees incorporating the variable. Those beginning with “M” are means of the coefficients of a given frequency across all waveforms hitting a tree. Variables beginning with “V” are variances. “MI” and “VMI” are the mean and variance of all intensity values for all waveforms hitting a tree.

When compared to the dimensions of a tree, the wavelength of
the lower frequency is on a scale that could represent between-branch variation while
that of the higher frequency may represent within-branch variation.
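The quoted quotients can be recovered under the assumption that M1 corresponds to the zero (intercept) frequency, so that variable Mk carries k − 1 cycles over the roughly 9 m waveform window; this small check makes that interpretation explicit:

```python
def frequency_cycles_per_m(k, window_m=9.0):
    """Frequency (cycles per metre) associated with variable Mk, under
    the assumption that M1 is the zero (intercept) frequency, so Mk
    carries k - 1 cycles over the ~9 m waveform window."""
    return (k - 1) / window_m

r1 = frequency_cycles_per_m(12) / frequency_cycles_per_m(6)   # 11/5  = 2.20
r2 = frequency_cycles_per_m(26) / frequency_cycles_per_m(12)  # 25/11 ~ 2.27
wavelength_m26 = 1.0 / frequency_cycles_per_m(26)             # ~0.36 m
```

Under the same assumption, the wavelength of M26 works out to about 0.36 m, a sub-branch scale, while M6 corresponds to a wavelength of 1.8 m.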
The boxplots shown in figure 3.3 represent the range of each of the variables M6,
M12, M26 and MI (panels (a), (b), (c) and (d), respectively) over all the waveforms
hitting each tree. The species of each tree is represented by a shade of gray. There is
a clear, observable difference between bigleaf maple and the other two species. Less
clear is that the variables M12 and MI are more responsible for the differentiation of
red alder and black cottonwood. Figure 3.2 shows that lower values of MI, or high
values of both MI and M12, lead to a decision of black cottonwood. This can be
observed in the boxplots after prolonged examination.
This technique does show promise as an additional tool for the classification of tree
species. There is sufficient information available in the raw waveform data to aid in
the distinction of tree species. As in the decomposition of waveforms into peaks, we have
still managed to reduce the data: instead of reducing waveforms to peaks, we have
reduced a large amount of data into simple averages for each tree. One important
note is that no spatial information, beyond that used to assign a waveform to a
given tree, was used in this analysis. Related techniques incorporating the additional
spatial information to look for patterns between waveforms might provide a large
performance boost.
As waveform data are dense and expensive compared to discrete point
data, a next step is to test whether similar results can be produced from discrete point
data. Per-tree histograms of return abundance by height, such as those in Falkowski
et al. (2009), appear similar to a single waveform. There may be some spatial patterns
detectable in such “waves” using the same Fourier transform.
It is important to note one potential drawback to using Lidar for individual tree
results. As in discrete point systems, the intensity values from waveform Lidar sys-
tems are dependent on time. This is due to an adaptive gain setting on the instrument
that is changed dynamically during flight to adapt to large-scale changes in surface
reflectivity.

Figure 3.3: Boxplots of the values of four classification variables for all trees in the training dataset. Each box represents one tree of the indicated species and represents all waveforms intercepting the tree’s crown outline polygon (40 to 16000 waveforms). The values of these variables have no units.

The intensity values over the length of the flight over the Arboretum seem
stable in this case. Effects of intensity scale changes on the results of this technique
would likely depend on the degree of such change.
3.5 Conclusion
The technique described in this paper provides an elegant method for the classifica-
tion of tree species from waveform Lidar data. Further refinement, such as accounting
for scan angle and more precise crown delineation techniques, could bring substantial
increases in accuracy. This way of looking at the data in the frequency domain
provides much information about branch and leaf arrangement patterns observed between
waveforms. However, this view provides little or no information about general
tree shape and large-scale spatial arrangement, which previously published methods
using point data do capture (e.g. Kim et al. 2009, Holmgren et al. 2008). Therefore, these
two ways of looking at the data may complement one another. This hypothesis needs
to be tested through future research.
Though only three species were tested here, two are very similar, suggesting that
the technique may perform well in regions with higher complexity. The use of Lidar
would also eliminate many complications imposed on optical imagery analysis by cloud
cover in many regions. Because a waveform dataset contains species information as
well as the information contained in a discrete point Lidar dataset, it may soon
be unnecessary to acquire an additional optically-based raster dataset for the sole
purpose of species classification.
Chapter 4
FOURIER TRANSFORM OF WAVEFORM LIDAR FOR
SPECIES RECOGNITION - DATA REQUIREMENTS
4.1 Abstract
Waveform Lidar information is typically analyzed only after decomposing waveforms
into a sum of Gaussian peaks. Under the assumption that some important information
may be lost in the decomposition, an attempt was made to transform the waveform
into the spectral domain using a fast Fourier transform. Using classification trees, this
approach successfully distinguished three deciduous species with 75 % accuracy
(kappa = 0.62).
The data set density used in this work was about 10 light pulses per square metre
(lppm) near nadir at ground level. This allowed for an analysis of the effect of data
density on the ability of the classification method to correctly identify a given species.
The data were reduced, by removing waveforms at uniform intervals, into subsets
containing approximately 80, 60, 40, and 20 % of the original density. This resulted in
densities of approximately 8, 6, 4 and 2 lppm.
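The uniform-interval reduction can be sketched as follows (illustrative Python; the real reduction operated on the full waveform files, and the function name is hypothetical):

```python
def thin_waveforms(waveforms, keep_fraction):
    """Reduce a list of waveforms to roughly `keep_fraction` of its
    original density by dropping pulses at uniform intervals, a simple
    stand-in for the reduction described above."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    step = 1.0 / keep_fraction
    kept, next_index = [], 0.0
    for i, w in enumerate(waveforms):
        if i >= next_index:
            kept.append(w)
            next_index += step
    return kept

full = list(range(1000))            # stand-in for the full pulse stream
subset80 = thin_waveforms(full, 0.8)
subset20 = thin_waveforms(full, 0.2)
```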
Surprisingly, not all reductions of the data were found to decrease the ability of this
method to correctly identify tree species. In fact, the 80 % density showed marginal
improvement over the full density. The 60, 40 and 20 % densities decreased classifica-
tion accuracy by 10 to 20 %. The results indicate that pulse density has only a slight,
yet sometimes unpredictable, effect on classification accuracy.
4.2 Introduction
Airborne Lidar data have been shown to provide estimates of stand characteristics,
such as height and canopy cover, with very high precision (Andersen et al. 2006,
Næsset et al. 2004). Because Lidar is an active sensor, not dependent upon light
conditions, its data are often preferred over data from various aerial and space-borne
sensors. However, species recognition is one area where Lidar has not yet excelled. For
this reason it is still common for species information to be obtained from two-dimensional
hyperspectral imagery. However, as Lidar instruments improve, we should see corre-
sponding improvements in the ability to classify species using Lidar data alone.
The most obvious differences between many tree species are those involving color
and physical structure. Though Lidar typically works with only one “color”, a fre-
quency in the near-infrared range of the spectrum, a few examples exist of some
success using Lidar intensity data alone to discriminate tree species (Kim et al. 2009,
Ørka et al. 2007). Others have worked with variables quantifying tree shape and
structure in various ways (Liang et al. 2007, Brandtberg 2007, Brandtberg et al.
2003). Encouraging results have also been achieved with the combination of intensity
and structure variables (Ørka et al. 2009, Vauhkonen et al. 2009, Kim 2007, Holmgren
and Persson 2004).
More recently, a slightly different form of Lidar, called waveform or fullwave Li-
dar, has become more readily available (Mallet and Bretar 2009). This instrument
digitises segments of the return signal at a very high sample rate, resulting in data
that resemble a wave with peaks and troughs. Several authors have investigated the
abilities of waveform Lidar to distinguish species characteristics. At first, these data
were used to detect peaks that may be missed by an on-board peak detector system,
which resulted in denser discrete point datasets. Reitberger et al. (2006) showed that
such data were useful for species detection. To obtain these peak locations, wave-
forms are typically decomposed into a series of Gaussian or similar forms (Chauve
et al. 2007). Some works suggest that storing shape parameters from individual peaks
may lead to further improvements in classification (Hollaus et al. 2009, Wagner et al.
2008, Litkey et al. 2007).
As useful as individual peak modeling has been, information is always lost when
simplifying data using any model with fewer parameters than data points. The pulse
width of the output signal on most Lidar systems is on the order of 0.5 to 2 m, and this
signal has a strong smoothing effect on the shape of the returned signal. While decon-
volving the data may reduce much of this smoothing, fitting smooth peak forms to the
data will filter out higher-frequency patterns. Some of these patterns may help distinguish
one species from another. We put this theory to test by transforming individual wave-
forms using a discrete Fourier transform (Vaughn et al. 2010). This linear transform
rebuilds the signal as a composition of sine waves, allowing one to analyze the impor-
tance of differing frequencies in the observed data. Results showed that wavelengths
as low as 0.36 m were important for classification.
The spatial scale of raster data often has a large influence on classification re-
sults, and optimal scales differ by data type and approach (Treitz and Howarth 2000,
Marceau et al. 1994, Woodcock and Strahler 1987). Depending on the applica-
tion, discrete Lidar data density may also strongly influence results. Liu et al. (2007)
found that point density affects DEM accuracy. One advantage of this particular
approach is that it should be less dependent upon data density, because the simple
means used as discriminating variables for each tree should be stable even with
relatively few waveforms hitting the tree. In this paper we apply the same trans-
formation to datasets with reduced numbers of waveforms compared to the original
dataset to test if this is indeed the case.
Figure 4.1: Difference between waveform and discrete Lidar data. The 60 waveform samples are shown as circles and a spline fit to these data appears as a solid gray line. A peak detector might detect two peaks at about 345 and 348 m and return the intensity value when the peak is detected, as shown with exes. Due to inherent limitations, real-time peak detection algorithms usually produce a slight lag in peak location.
4.3 Methods
4.3.1 The study site
The Washington Park Arboretum in Seattle, Washington is operated by the Univer-
sity of Washington Center for Urban Horticulture. The Arboretum, which is approx-
imately 93 ha in size, is planted with more than 10,000 cataloged woody plant spec-
imens representing numerous genera. In addition, much of the Arboretum contains
several species native to Western Washington State, such as Douglas-fir (Pseudot-
suga menziesii (Mirbel) Franco), western redcedar (Thuja plicata Donn ex D. Don),
bigleaf maple (Acer macrophyllum Pursh), black cottonwood (Populus balsamifera
L. ssp. trichocarpa (Torr. & A. Gray ex Hook.) Brayshaw), and red alder (Alnus
rubra Bong.). The non-native trees in the Arboretum are planted in groups by genus.
Native species are also dispersed throughout the Arboretum and can be found clustered
into their own groups or sparsely mixed within the non-native trees. While many of
the planted trees are open-grown, overlapping crowns are typical within the native
tree groups. However, tree densities are rarely as high as one would expect in natural
stands in the vicinity.
4.3.2 Data processing
We used waveform data provided by Terrapoint USA, who flew a RIEGL LMS-
Q560 airborne laser scanner with waveform signal digitization, over the Washington
Park Arboretum on August 8th, 2007. This instrument was set to digitise waveforms
at a sample interval of about 1 ns, or 15 cm in linear distance. Scan angle ranged from
-30 to 30 degrees, and the pulse frequency was set at 133,000 Hz, resulting in a pulse
density of about 10 pulses per square metre (lppm) near nadir at ground level. For
comparison, this would yield about 20 points per square metre in a comparable first
and last return discrete point dataset. One example waveform is displayed in figure
4.1. The range from the instrument is shown on the horizontal axis, while the unitless
intensity value is shown on the vertical axis. Two points marked with exes indicate
data points that might be returned by a traditional on-board peak detector. A single
4.5 km looped pass in the North-South direction for the length of the Arboretum,
lasting about 6 min, provided nearly 49 million waveforms. Each waveform contained
a minimum of 60 consecutive samples, covering a linear distance of about 9 m.
In many cases, depending on target height, the number of samples was 120 or even
180 for a given waveform, but these were not necessarily consecutive samples. Because
of this discrepancy in the number of samples, only the first 60 samples were kept from
each waveform. These cover about 9 m of the path of the waveform starting from the
surface of the target. As a result, in trees taller than about 9 m, ground strikes will
not be recorded within the retained waveform data.
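The truncation rule can be sketched directly (illustrative Python; names are hypothetical, and the 0.15 m spacing follows from the ~1 ns sampling noted above):

```python
SAMPLE_SPACING_M = 0.15  # ~1 ns sampling => ~15 cm per sample

def first_60_samples(waveform):
    """Keep only the first 60 samples of a digitized waveform so that
    every record covers the same ~9 m window starting at the first
    recorded return; 120- or 180-sample records are simply truncated."""
    if len(waveform) < 60:
        raise ValueError("waveform shorter than 60 samples")
    return waveform[:60]

long_record = [0.0] * 120           # stand-in for a 120-sample waveform
trimmed = first_60_samples(long_record)
window_m = len(trimmed) * SAMPLE_SPACING_M  # ~9 m retained
```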
Within the Arboretum, Kim et al. (2009) geolocated and measured characteristics
of the trees in 18 field plots. The locations of these
plots are shown in figure 4.2. The field plots were installed systematically so that at
least one plot was measured in each genus group of interest. Within each plot, about
ten to twenty example trees were identified and measured during the summer of 2005.
Each of the trees measured in the field plots was mapped into UTM coordinates
using an angle and distance from one of three known points within the plots.
These points were located with survey-grade GPS units, and the data were later
differentially corrected for optimal accuracy.
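The stem mapping can be sketched as a simple traverse calculation. This fragment is illustrative only: the reference coordinates are fabricated, and the field protocol is paraphrased as an azimuth and horizontal distance measured from a known point:

```python
import math

def offset_to_utm(known_easting, known_northing, azimuth_deg, distance_m):
    """Map a tree to UTM coordinates from a horizontal distance and an
    azimuth (degrees clockwise from grid north) measured at a surveyed
    reference point. Function name and conventions are illustrative."""
    azimuth = math.radians(azimuth_deg)
    easting = known_easting + distance_m * math.sin(azimuth)
    northing = known_northing + distance_m * math.cos(azimuth)
    return easting, northing

# A tree 10 m due east (azimuth 90 degrees) of a fabricated reference point.
e, n = offset_to_utm(550000.0, 5272000.0, 90.0, 10.0)
```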
In order to associate waveform data to individual trees on the ground, the tree
crowns had to be delineated in mapping coordinates. We therefore used a discrete
point dataset, built from the waveform dataset, to create a raster image containing
a digital canopy height model. In this elevation model the highest return elevation,
above the DEM, within each grid cell was stored. We used the method described
by Hyyppa et al. (2001a) to obtain an initial set of polygons representing the crown
outlines of individual trees in the Arboretum. This method works in an iterative
manner, such that at each step neighboring pixels are added to clusters surrounding
Figure 4.2: Map of the Washington Park Arboretum with flight path and ground plot locations.
local maxima of a low-pass filtered canopy height model. Under such an algorithm,
groups of trees are often mistaken for single trees (Hyyppa et al. 2001b), though in
the Arboretum this should be less of a problem. The resulting polygon for each tree
in the training data set was visually inspected and, if necessary, improved upon by
hand. All waveforms that contained data at any height above ground within the
outline of each tree were identified.
4.3.3 Fourier Transform
In the analysis of time series, several tools are available to look for non-random
patterns within the data. One such tool is the discrete Fourier transform, which
allows one to look at the data in the “frequency domain”. In doing so, we may
discover frequencies of strong influence within the time series. This transform converts
the original data into a set of coefficients representing the individual influence of
sine waves from a known set of frequencies. Large coefficients are associated with
heavy influence and imply that a higher amount of periodicity at the given interval is
detected in the data. Fast versions of the transformation exist under the name Fast
Fourier Transform (FFT), and have a relatively low upper limit on computing time
(Singleton 1969).
Figure 4.3 shows the transform of an example waveform taken from the training
data. Typically, the mean is subtracted from each sample value. In figure 4.3(a),
the mean centered waveform appears along with the complex waveform fit by the
FFT algorithm. A result of modeling with the same number of variables as data is
that all of the sample points all fall exactly on the composite wave. Two example
component waves are depicted in figure 4.3(b). The amplitude of each wave represents
the contribution of that particular frequency to the composite wave shown in figure
4.3(a). How the influences of these two example frequencies compare to the rest is
depicted in figure 4.3(c). The first frequency is 0, and represents an intercept term.
The rest of the amplitudes are symmetrical as per a restriction of the FFT algorithm.
Figure 4.3: Example of spectral decomposition of one waveform. In panel a the composite wave returned by the transformation is plotted along with the original mean-centered intensity values. As the transformation disregards range information, the wave composition has been re-translated back into the range scale. In panel b, two examples of the 60 component waves are drawn. In panel c, the amplitudes of all 60 component waves are plotted against their frequency.
The fft function in the R programming language (R Development Core Team 2009)
was used to compute the FFT. In figure 4.3(c), the amplitudes of the component
waves from figure 4.3(b) are shown with solid dots.
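The losslessness noted above (the composite wave passes through every sample point) can be demonstrated on a toy example; this sketch uses a direct, unnormalized DFT matching R's fft convention, with a fabricated eight-sample series:

```python
import cmath

def dft(samples):
    """Unnormalized forward DFT (the convention used by R's fft)."""
    n = len(samples)
    return [sum(s * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, s in enumerate(samples)) for k in range(n)]

def inverse_dft(coefs):
    """Inverse DFT; dividing by n recovers the original samples."""
    n = len(coefs)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n)
                for k, c in enumerate(coefs)).real / n for t in range(n)]

# Fabricated eight-sample series, mean-centered as in the text.
raw = [3.0, 5.0, 9.0, 5.0, 3.0, 2.0, 1.0, 4.0]
m = sum(raw) / len(raw)
centered = [s - m for s in raw]
rebuilt = inverse_dft(dft(centered))  # reproduces `centered` to float precision
```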
The frequencies of the component waves used by the FFT algorithm are deter-
mined entirely by the number of sample points. In order to ensure that the same
frequencies are used by the transformations of two time series, one series must
contain the same number of samples as the other, or a power-of-2 multiple of that
number. As mentioned above, 60 samples were kept from each waveform so that this
condition could be met. However, a 60-sample waveform stretches about 9 m and will
cross the boundary of some tree crown outlines, meaning that some waveforms will
contain data from parts of neighboring trees. We decided to take no corrective action
for this because of the added difficulty it would create.
As waveforms in this study were restricted to 60 samples each, the FFT algorithm
likewise returns 60 amplitude values. The influence of frequencies above a given
level, known as the Nyquist frequency, cannot be measured accurately. As a result,
with 60 samples per waveform, only the amplitudes of the first 30 frequencies can be
considered useful. Not coincidentally, this is the point at which the amplitudes begin
to mirror those of lower frequencies in panel c of figure 4.3. This mirroring is what
keeps the number of variables from exceeding the number of sample points.
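The mirroring can be checked directly: for any real-valued 60-sample series, the amplitude at frequency k equals the amplitude at frequency 60 - k (an illustrative sketch, with a random series standing in for a waveform):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=60)                  # any real-valued 60-sample series
amp = np.abs(np.fft.fft(x - x.mean()))

# Amplitude k mirrors amplitude 60 - k, so only frequencies 0..30 carry
# independent information: the Nyquist limit with 60 samples.
for k in range(1, 30):
    assert np.isclose(amp[k], amp[60 - k])
print("amplitudes above frequency 30 mirror those below")
```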
For each tree, the average (across all waveforms hitting the tree) of each of these
30 useful amplitudes was stored as a variable named with a leading "M" followed by
the frequency identification number (M1 through M30). The standard deviation of
each amplitude was likewise recorded as a variable V1 through V30. Finally, the
average intensity value was kept for each waveform; this easily computed value
represents the total amount of light reflected from each pulse. The average and
standard deviation of these values for each tree were recorded as MI and VMI. In total,
for each tree in the training data set, there are 62 variables that will be considered
for use in classification as described in the next section.
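The construction of these 62 variables can be sketched as follows. This is an illustration, not the original processing code: the function name tree_features and the synthetic input are hypothetical, and mapping Mj to the amplitude at frequency index j - 1 follows the text's numbering, in which the first frequency is zero.

```python
import numpy as np

def tree_features(waveforms):
    """waveforms: array of shape (n_waveforms, 60), one row per pulse."""
    waveforms = np.asarray(waveforms, dtype=float)
    centered = waveforms - waveforms.mean(axis=1, keepdims=True)
    amps = np.abs(np.fft.fft(centered, axis=1))[:, :30]   # first 30 frequencies
    feats = {}
    for k in range(30):
        feats["M%d" % (k + 1)] = amps[:, k].mean()        # mean across waveforms
        feats["V%d" % (k + 1)] = amps[:, k].std(ddof=1)   # standard deviation
    mean_intensity = waveforms.mean(axis=1)               # light returned per pulse
    feats["MI"] = mean_intensity.mean()
    feats["VMI"] = mean_intensity.std(ddof=1)
    return feats

rng = np.random.default_rng(0)
feats = tree_features(rng.normal(50.0, 5.0, size=(200, 60)))
print(len(feats))  # 62 candidate predictor variables per tree
```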
The FFT algorithm assumes an equal time period between samples; however, in
some cases the range values within a pulse are not equally spaced. To simplify the
analysis, this violation was ignored, as it gives little cause for concern here: the
displacement of an occasional sample point should have little impact on the results,
and in most cases the difference in intensity between two neighboring samples is very
small. We also ignore that our series, unlike the sinusoidal basis functions, is not of
periodic origin; it does little harm to pretend that the series repeats itself in both
directions ad infinitum.
4.3.4 Classification
To apply the FFT information to classify tree species, we used a classification tree
approach (Breiman et al. 1984; page 18). Classification tree algorithms recursively
split the data into two parts based on a value of the most locally powerful predictor
variable. The R library tree contains a function of the same name for modeling
with these regression and classification trees (Venables and Ripley 2002; page 266).
Given a class variable as a response and a list of potential predictor variables, the
function will compute a tree of “appropriate” size. Here appropriate is determined
by an internal algorithm. Splits are added to the tree’s branches sequentially, until
a very large tree is produced. At each split, the variable that most reduces the tree
deviance under a multinomial model is chosen. Cross-validation is used to determine
the optimal tree size, as too large a tree will overfit the data and too small a tree
will perform poorly.
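The split criterion can be illustrated with a bare-bones search for the single best split by multinomial deviance, D = -2 Σ n_k ln(n_k/n). This is a sketch only; the tree library's actual implementation differs in detail, and the toy data below are hypothetical.

```python
import numpy as np
from collections import Counter

def deviance(labels):
    # Multinomial deviance of one node: -2 * sum over classes of n_k * ln(n_k / n)
    n = len(labels)
    return -2.0 * sum(c * np.log(c / n) for c in Counter(labels).values())

def best_split(X, y):
    # Return (variable index, threshold, summed child deviance) for the split
    # that most reduces deviance, as the tree-growing step does at each node
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            d = deviance(left) + deviance(right)
            if d < best[2]:
                best = (j, t, d)
    return best

X = np.array([[1.0], [2.0], [10.0], [11.0]])
y = np.array(["alder", "alder", "maple", "maple"])
j, t, d = best_split(X, y)
print(j, t)  # 0 2.0 (both children pure, so child deviance falls to zero)
```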
We attempted to correctly classify all trees belonging to three native hardwood
species: red alder, black cottonwood, and bigleaf maple. These species represent
common hardwood species that grow naturally in the arboretum, and therefore are
represented well in the field data. Figure 4.4 shows the classification tree obtained by
fitting the entire training dataset. Each fork represents the optimal separation of the
Figure 4.4: The result of a classification tree fit to the full-density training data. Listed above each split is the variable and level used to split the remaining trees into two subgroups. The reduction in tree deviance resulting from each split is shown in the bottom left. Leaf deviances are shown in parentheses, actual class memberships in square brackets. The variables M6, M12, and M26 correspond to the mean influence of the wavelengths 1.80, 0.82 and 0.36 m, respectively.
remaining trees into two sub-groups. The variable and split value used is shown atop
the fork. Each end node is labeled with the species most represented in the group of
trees that have not been eliminated when traversing the tree from the root. Below
each leaf the deviance within that leaf and actual class membership are presented.
The total tree deviance is the sum of the individual leaf deviances, and the reductions
in tree deviance as each split is added are shown in a table in the bottom left of the
figure. One branch of this tree was “pruned” because both leaves predicted the same
species.
With limited training data available, the trained model could not be applied to a
separate dataset for validation. Therefore, a cross-validation technique was used to
estimate the actual predictive power of this technique on new data: the species of
each tree was predicted from a classification tree trained on all the other trees in
the training data. The number of trees involved in such a process makes refinement
of each fitted tree impractical, so each was built with the built-in defaults of the
tree function. In a non-academic application, the tree building could be better
optimized, which may yield slight improvements in classification accuracy.
4.3.5 Data reductions
To test the technique at different levels of data density, the original dataset was
systematically reduced. Within the waveform data for each tree, we divided the
waveforms into five interleaved groups, each containing every fifth waveform and
each starting from one of the first five waveforms. For example, the first group
comprised the first, sixth, eleventh waveforms, and so on. Reduced datasets were then
created by sequentially removing these groups, from the fifth group down to the
second, so that the resulting datasets contained 80, 60, 40, and 20 % of the original
waveforms. The data densities represented by these datasets were approximately 8, 6,
4, and 2 lppm, respectively.
For each of these reduced datasets and the original dataset the classification process
described above was performed, and the results were recorded for tabulation and
subsequent comparison.
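The grouping scheme above can be sketched with simple list slicing (an illustration; reduced_datasets is a hypothetical helper name):

```python
def reduced_datasets(waveforms):
    """Split into five interleaved groups (every fifth waveform), then
    drop groups from the last down to the second to get 80/60/40/20 % sets."""
    groups = [waveforms[g::5] for g in range(5)]   # group g: g-th, (g+5)-th, ...
    out = {}
    for keep in (4, 3, 2, 1):                      # keep the first 4, 3, 2, 1 groups
        kept = [w for g in groups[:keep] for w in g]
        out[20 * keep] = kept                      # key: percent of original density
    return out

sets = reduced_datasets(list(range(100)))          # 100 dummy waveforms
print({pct: len(wfs) for pct, wfs in sorted(sets.items())})
# {20: 20, 40: 40, 60: 60, 80: 80}
```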
4.4 Results and Discussion
Table 4.1 shows the classification results for the full-density dataset. The overall
classification accuracy, or the portion of correctly classified trees, was 75 %. The
associated κ value was 0.615. For individual species, 70 % of maples, 82 % of cotton-
woods and 71 % of alders were correctly classified. While differences in leaf reflectance
likely play a part in the results, we believe that differences in tree structure lead to
stronger classification.
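Both figures follow directly from the confusion matrix in table 4.1 and can be recomputed:

```python
import numpy as np

# Confusion matrix from table 4.1: rows = actual species, columns = predicted
# (order: bigleaf maple, cottonwood, red alder)
cm = np.array([[7, 2, 1],
               [2, 14, 1],
               [1, 4, 12]], dtype=float)

n = cm.sum()
p_o = np.trace(cm) / n                                 # observed agreement
p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2   # chance agreement
kappa = (p_o - p_e) / (1 - p_e)
print(round(p_o, 2), round(kappa, 3))  # 0.75 0.615
```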
As presented in a previous paper (Vaughn et al. 2010), the wavelengths most often
chosen by the classification tree algorithm as partitioning variables are 1.80, 0.82
and 0.36 m. These wavelengths likely correspond to differences between species across
different components of a tree. For example, the leaves of bigleaf maple are typically
about 15-30 cm in width, while the leaves of red alder and black cottonwood are
much smaller. This difference might be expected to show up in shorter wavelengths,
and this is the case in the example tree shown in figure 4.4. The variable M26 helps
distinguish red alder from bigleaf maple. The two longer wavelengths may represent
differences in branch to branch distance and leaf retention rates between the species.
Red alder has fairly thin leaves allowing more visible light through, and may be able
to retain more leaves further into the canopy than the other species.
The full-density results were surprising as we had expected the red alders and
cottonwood trees to be more easily separated from the bigleaf maples. This is because
the maples represented in the dataset are nearly all open-grown and the leaves of
bigleaf maple are a much larger target than those of other trees. The larger leaves
should provide at least a more consistent first peak height. The cottonwoods and red
alders are growing in closer proximity and appear to have similar growth forms to the
naked eye under this condition. However, judging from a near-infrared raster image
Table 4.1: Results of the classification when each tree species is predicted from a classification model incorporating all other trees using the full-density dataset.

                                     Predicted
Species             bigleaf maple  cottonwood  red alder  Producer accuracy (%)
bigleaf maple             7             2          1              70.0
cottonwood                2            14          1              82.4
red alder                 1             4         12              70.6
User accuracy (%)        70.0          70.0       85.7            75.0†
† This number represents overall accuracy.
created from the discrete point data, red alder and black cottonwood also have very
similar reflectance of the near-infrared wavelength used by the RIEGL LMS-Q560.
This implies that structural differences between the two species contributed
substantially to the classification. As branch-to-branch distances may play a role, we
wonder whether tree growth rate would affect the results. More work is needed to
determine which features of the tree contribute most to the classification ability of
this technique.
Table 4.2 shows how the technique responded to different densities of waveforms.
Surprisingly, the 80 % density resulted in improved performance. Additionally, the
20 % density provided greater classification ability than both the 60 % and the 40 %
densities. The kappa values behaved in a similar manner, as expected when the same
trees are used at all densities. These results suggest that the height, and perhaps
the speed, at which a waveform Lidar mission is flown should not affect the results
of this classification technique as much as they would affect techniques that rely on
high point cloud density.
We believe that the demonstrated robustness of the technique to data density is
due mainly to the fact that the stronger classification variables are actually sample
means. The sample mean is a very efficient estimator of a population mean, and large
numbers of samples are not needed to get a fairly good estimate. Therefore, whether
the mean is calculated from thousands of waveforms or simply hundreds, the sample
Table 4.2: The classification results under systematically reduced datasets representing 100, 80, 60, 40 and 20 % of the original density of waveforms.

Density (%)  Species        Producer accuracy† (%)  User accuracy (%)  Kappa
100          all                    75.0                               0.62
             bigleaf maple          70.0                  70.0
             cottonwood             82.4                  70.0
             red alder              70.6                  85.7
80           all                    81.8                               0.72
             bigleaf maple          90.0                  75.0
             cottonwood             82.4                  73.7
             red alder              76.5                 100.0
60           all                    63.6                               0.43
             bigleaf maple          50.0                  62.5
             cottonwood             64.7                  52.4
             red alder              70.6                  80.0
40           all                    54.6                               0.29
             bigleaf maple          30.0                  37.5
             cottonwood             76.5                  56.5
             red alder              47.0                  61.5
20           all                    65.9                               0.48
             bigleaf maple          60.0                  54.6
             cottonwood             70.6                  66.7
             red alder              64.7                  73.3
† For all species this number represents overall accuracy.
mean should be very close. However, such a reduction of data is not entirely without
consequence, as the sample mean can be influenced by extreme outliers. If, when
collecting a waveform Lidar dataset, one ends up with a higher proportion of unusual
waveforms, the results would suffer accordingly.
There are several technical difficulties that were overlooked in order to more di-
rectly test the efficacy of the FFT for species detection. Finding methods to address
any of these difficulties will likely improve upon the results presented here. First, we
did nothing to account for the differences in scan angle between trees and species.
Because tree structure differs horizontally from vertically, scan angle likely plays a
part in the dominant wavelengths that are seen in the FFT of the waveform data.
The dataset does not provide enough coverage to test this technique for a standard
range of scan angles for all trees. Figure 4.5 shows the full range of scan angle for
each tree in the training data. In this figure correctly classified trees are indicated by
filled-in symbols. There is a clear discrepancy in the number of correct classifications
in the trees with generally higher scan angles on the right side of the figure.
A second simplification was the reduction of all waveforms to exactly 60 samples
due to requirements of the FFT. There were two cases where this might severely affect
results. The first case is the loss of data because more than 60 samples were available
in a given waveform. About 62, 32, and 5 % of waveforms contained 60, 120, and
180 samples, respectively. Another case is when waveforms cross crown outlines, such
that only a portion of the samples contained in the waveform pertains to the given
tree. In this case some of the waveform data for a tree actually describes other
trees of unknown species. These simplifications may have a drastic effect. However,
a standardization of the number of samples in a waveform is by far the easiest way
to ensure that the component frequencies modeled by the FFT are the same across
all waveforms.
Figure 4.5: The range of scan angles found for each tree in the dataset. Dotted lines represent the full range of scan angle, while solid lines represent the 25th to the 75th percentiles. Trees correctly classified in the full dataset are represented with filled-in symbols at the tree's median scan angle.
4.5 Conclusion
The technique described above, despite some simplifications, shows much promise.
One important feature is that, despite significant reduction in data density, the tech-
nique did not respond with large decreases in effectiveness. This is due largely to
the sample mean being a highly efficient estimator of population mean. As such, any
modification of this technique that incorporates other statistics may not scale as well
as we have seen here. However, the number of samples in a waveform should be the
same regardless of the details of the Lidar acquisition, and methods that rely only on
the sample density within a waveform should see similar results. Standardization of
scan angles, along with future increases in crown segmentation accuracy, should not
affect this feature of the method and will likely lead only to improvements in
accuracy.
Chapter 5
INCREASING INDIVIDUAL TREE SPECIES
CLASSIFICATION ACCURACY BY INCORPORATING
RAW WAVEFORM LIDAR DATA
5.1 Introduction
Information about individual tree species can be extremely beneficial when estimating
many forest values from remote sensing data. Unfortunately, detection of individual
tree species using remote sensing data has proven to be a difficult task to accomplish.
Species is only one factor that affects the realized shape and color of a tree crown,
while other factors such as location, competition, and simple genetic variation have
large influence as well. As a result, there is significant overlap between species for
most variables that one can measure from remote sensing data. Due to the difficulty
of obtaining sufficient classification accuracy, species information is commonly disre-
garded or alternative methods are found (Korpela et al. 2007). One such alternative
is to impute species information from the ground data observations matched to each
crown segment (Breidenbach and Næsset 2010). This technique is a step forward, but
further improvement should be possible.
Knowledge of the probable species of individual crown regions identified in the data
would enable us to stratify model estimates by species. This will most likely increase
precision in any stand-level estimates of interest. With this goal in mind, researchers
have continuously tested new forms of remote sensing data seeking improvements
in detecting stand- and individual tree-level species information. As remote sensing
technology and computer algorithms have improved, so have the classification results
achieved.
With the advent of Lidar, many aspects of a forest inventory can now be accomplished
using these data (Hyyppa et al. 2004, Næsset et al. 2004). While multi-spectral
(Leckie et al. 2003) and hyperspatial data (Brandtberg 2002) have worked well in the
past for identifying species information, the purchase of additional datasets is likely
to be outside budget constraints. Ideally, the use of a Lidar dataset alone would be
sufficient to achieve the necessary species identification accuracy.
Many authors have investigated the potential of Lidar data, sometimes mixed with
additional data sources, for species classification. Because each individual situation
presents a unique arrangement of challenges, the overall results have been mixed.
While the number of variables one can compute from discrete point Lidar data is
infinite, there are only a few concepts that can be represented by these variables.
These concepts are: crown density, crown shape, crown surface texture, and received
energy from individual peaks. Most authors have incorporated variables from more
than one of these concepts.
Crown density describes the leaf and branch size and arrangement and is typically
measured using proportions of the returns hitting different classes of objects (Brandt-
berg 2007, Moffiet et al. 2005). Crown shape information is often compared using
parameters of surface models fit over the top of the Lidar point cloud (Vauhkonen
et al. 2009, Reitberger et al. 2008, Holmgren and Persson 2004). The distribution of
return heights, often described using select percentiles of the return heights (Korpela
et al. 2010b, Ørka et al. 2009), includes information about both crown density and
crown shape. Crown surface texture refers to the roughness of a tree crown, and has
been measured using a canopy height model (Vauhkonen et al. 2009). The instan-
taneous light energy received by the sensor when each peak is detected is typically
referred to as intensity. The measured intensity is affected by several physical traits
such as leaf size, chemistry, and incident angle, which are all affected by species.
While most authors incorporate this intensity information, both Ørka et al. (2007)
and Kim et al. (2009) found that intensity alone could be a reasonable predictor of
species.
In the last half-decade, a newer format of Lidar information, commonly referred
to as “waveform” or “fullwave” Lidar, has slowly increased in availability. In contrast
to the more common discrete point Lidar systems, this newer Lidar system takes ad-
vantage of increased processor speeds and data storage capacity by digitally sampling
at a high rate the return signal received at the sensor. The result mimics the appear-
ance of a wave, and an example of such a waveform can be seen in figure 5.1. If the
waveform shown in figure 5.1 were to be passed through an onboard peak detector,
the result might resemble the two exes immediately following the peak crests.
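A peak detector of the kind described can be sketched as follows. This is a simplified illustration, not the sensor's actual onboard algorithm: it fires on the first decreasing sample after a rise, which produces the slight lag noted in the figure caption.

```python
def detect_peaks(samples, threshold=10):
    """Report (index, intensity) one sample after each crest: a real-time
    detector cannot see future values, so it fires on the first decrease."""
    peaks = []
    rising = False
    for i in range(1, len(samples)):
        if samples[i] > samples[i - 1]:
            rising = True
        elif rising and samples[i] < samples[i - 1]:
            if samples[i - 1] >= threshold:
                peaks.append((i, samples[i]))   # lagged report, as in figure 5.1
            rising = False
    return peaks

wf = [1, 2, 8, 30, 22, 9, 15, 40, 28, 5, 3]     # two-peaked toy waveform
print(detect_peaks(wf))  # [(4, 22), (8, 28)]
```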
While a few authors have looked to waveform data for improving classification
accuracy, the first step has always been to decompose the waveforms into discrete
peaks, nearly matching the information one can get from discrete point data. One
advantage of this technique is that information about peak shape can be preserved.
The shapes of these peaks have successfully been used in distinguishing vegetation
from other surfaces (Wagner et al. 2008). Additionally, pulse width or cross section
information has been helpful in classifying deciduous from coniferous species (Hollaus
et al. 2009, Reitberger et al. 2008). Little work has been done to see if patterns
within the original waveform data, prior to peak decomposition, provide information
for species classification.
In a previous paper (Vaughn et al. 2010), we showed that Fourier transformation of
the waveforms crossing each crown led to moderate accuracy while classifying three
hardwood species. In this work, no two- or three-dimensional information about
crown structure computed from a discrete point array was included. However, the
amplitude of a rather high frequency component of the Fourier transforms played an
important role in distinguishing two of the species. This frequency was high enough
that even high-density discrete point Lidar data could not contain such information.
The purpose of this paper is to validate the results of the original paper as well as
further test the importance of waveform Lidar in determining tree species. The latter
Figure 5.1: An example waveform and associated discrete return points. The 60 waveform samples are shown as circles and a spline fit to these data appears as a solid gray line. A peak detector might detect two peaks at about 345 and 348 meters and return the intensity value when the peak is detected as shown with exes. Without knowledge of future sample values, real-time peak detection algorithms usually produce a slight lag in peak location.
is done by testing classification performance both before and after the addition of
raw waveform information to a full suite of crown density, crown shape, crown surface
texture and intensity metrics.
5.2 Methods
5.2.1 Waveform Acquisition
Waveform data were obtained by Terrapoint, USA during the evening of August 7th,
2008 over the University of Washington Arboretum in the city of Seattle, Washing-
ton. Sensor altitude above canopy surface ranged from 145 to 412 meters with mean
distance of 310 meters. Scan angle varied from -30 to positive 30 degrees from zenith.
Pulse frequency was 100 thousand pulses per second. The majority of the arboretum
was covered in one loop in the North-South direction. As this was a sample flight
with little planning, significant gaps exist between segments of the flight line. As a
result many of the trees with field data are in the margins of the swath area. Overall
data density averaged about 10 pulses per square meter near nadir at ground level.
5.2.2 Field Data
Our field data were collected in a slightly different manner than one would use if an
inventory were required. We segmented tree crowns from the Lidar data prior to
visiting the field so that we could verify that each tree matched its associated data
segment. Several trees of five species were sampled in this manner to ensure that our
data were as clean as possible. This segmentation and field data collection proceeded
in three steps:
1. All waveforms were decomposed into individual peaks using a simple peak-
detection algorithm. This point information was indexed into a voxel array
structure.
2. A segmentation algorithm was used on the voxel array data to map the volume
of space occupied by clusters of voxels representing individual tree crowns.
3. Outlines of these clusters were used to locate the trees on the ground and identify
the species.
Creation of the Voxel Array
A simple peak finding algorithm was performed on the waveform data after deconvo-
lution with the Richardson-Lucy algorithm (Lucy 1974). The range at maximum for
each peak found within the waveforms was used to compute an x, y, z position. Two
additional pieces of information were kept for each peak. First was the total energy
of the peak, or the sum of the intensity values for all waveform samples occurring
during the defined peak. Second, we recorded the total range duration of the peak.
This resulted in a fairly dense discrete point dataset. We used a three-dimensional
(voxel) grid overlaid on the volume of interest with horizontal dimensions set at one
meter and vertical dimension set at 0.75 meters. A single file was created to contain
the grid location of all peaks occurring within this grid. The file header also stored
information about which voxels, referenced by layer, row and column, contain points.
The voxels with one or more points were used to segment out the individual tree
crowns as well as to compute statistics for species classification.
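The Richardson-Lucy deconvolution applied above iteratively updates an estimate x by x <- x * (psf_mirror * (y / (psf * x))), where * denotes convolution and psf is the system pulse shape. A one-dimensional sketch follows; the Gaussian pulse shape and all parameter values are assumptions for illustration, not the actual system response.

```python
import numpy as np

def richardson_lucy(observed, psf, n_iter=100):
    """1-D Richardson-Lucy: x <- x * (psf_mirror * (observed / (psf * x)))."""
    est = np.full(observed.shape, observed.mean() + 1e-6)
    psf_mirror = psf[::-1]
    for _ in range(n_iter):
        blurred = np.convolve(est, psf, mode="same")
        ratio = observed / np.maximum(blurred, 1e-12)
        est = est * np.convolve(ratio, psf_mirror, mode="same")
    return est

# Two "returns" blurred by an assumed Gaussian pulse shape
true = np.zeros(60)
true[20] = 5.0
true[40] = 3.0
x = np.arange(-7, 8)
psf = np.exp(-x**2 / 4.0)
psf /= psf.sum()
observed = np.convolve(true, psf, mode="same")

est = richardson_lucy(observed, psf)
print(int(np.argmax(est)))  # the estimate concentrates near sample 20, the true peak
```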
Crown Segmentation
In order to obtain three-dimensional crown information about each tree crown, we
created a voxel-based segmentation algorithm. Under this three-dimensional region
growing algorithm, individual layers of the voxel array are read one at a time starting
with the topmost layer. Individual voxels from each layer are added to new or existing
voxel-clusters depending on their distance from these existing clusters. The ability of
a cluster to incorporate a new voxel depends on current number of member voxels,
vertical center of mass of these member voxels, as well as distance from the new
voxel. Several parameters allow for control over how large clusters can become. This
algorithm is outlined in greater detail in chapter 6.
Collection of Tree Species
Using the voxel clusters produced in the last step, we created a GIS layer containing
the two-dimensional outlines of each crown. The crown outline data were placed on
a field computer with a built-in GPS receiver. Current position in the field was used
to match voxel cluster outlines to the specimens of individual trees of five native
species: Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco) (DF), western redcedar
(Thuja plicata Donn ex D. Don) (RC), black cottonwood (Populus balsamifera L.
ssp. trichocarpa (Torr. & A. Gray ex Hook.) Brayshaw) (BC), bigleaf maple (Acer
macrophyllum Pursh) (BM), and red alder (Alnus rubra Bong.) (RA). The first two
of these are coniferous (CO) species, and the last three are deciduous hardwood (HW)
species.
Clusters which contained parts of multiple trees, as well as those that contained
only part of a single tree, could be identified in the field. These clusters were split
along vertical planes or combined as necessary back at the office. This work was done
in March of 2011, during which hardwoods were still in a leaf-off condition. To avoid
scan angles too far from nadir, we stayed within 60 meters of the flight line. In doing
so, we were able to identify 22 to 29 individuals of each species, totaling 130 trees.
Most conifers were large enough that crown segmentation was clear. However, we
had some difficulty separating crowns of hardwood species growing in close proximity.
Because we desired certainty that only a single tree crown was represented by each
cluster, we skipped a small number of trees whose crowns could not be accurately
delineated. Table
5.1 lists statistics describing the height distribution by species of the trees in the final
training data set.
Table 5.1: Tree height statistics by species of the trees contained in the training data.
Species  Count  Min. (m)  25th Pct. (m)  Median (m)  75th Pct. (m)  Max. (m)

Figure 5.3: Loadings of the first three principal components of the Fourier interquartile range variables (q1 to q30).

Table 5.2: Coefficients from the canonical correlation procedure for the first two canonical variates of both datasets. Mean and standard deviation of all variables are included for reference.

Var.  Mean  S.D.  Coeff. U1  Coeff. U2  |  Var.  Mean  S.D.  Coeff. V1  Coeff. V2
The first two pairs of canonical variates were fairly highly correlated, with correlations
of 0.98 and 0.90, respectively. The following six pairs had correlations of 0.50 or lower.
The high correlation of the first two pairs demonstrates that there is some overlap of
information between the two datasets. In other words, a portion of the information in
the Fourier transforms of the waveforms can also be obtained from patterns in the
discrete points extracted from the waveforms.
Coefficients from the first two rotations are shown in table 5.2. These first rotations
result in the variates U1, U2, V1, and V2. Given the coefficients and means given in
the table, values of U1 are most influenced by the variables cm1, m0 and q0. These
variables are all related to the amount of energy received by the sensor in the Lidar
instrument. Similarly, the intensity means i1 to i3 show strong influence in both
V1 and V2. This result is nearly as visible by simply examining the correlations
between some of these variables directly. In fact, i1 shares a correlation of 0.85 with cm1 when
the two variables are compared directly. Surface point density, ptop, played little part
in either V1 or V2, suggesting that this information may not be obtainable from the
waveform Fourier transformations directly. On the other hand, d12, d13, and d23 do
play a significant part in V1 and V2, suggesting that some of this information overlaps
between the two variable sets.
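Canonical correlations can be computed with a short QR-plus-SVD routine (a generic numerical sketch on synthetic data, not the procedure actually used to produce table 5.2):

```python
import numpy as np

def canonical_correlations(X, Y):
    """The singular values of Qx' Qy, where Qx and Qy are orthonormal bases of
    the column-centered data matrices, are the canonical correlations."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

rng = np.random.default_rng(0)
X = rng.normal(size=(130, 4))                        # e.g. 130 trees, 4 variables
Y = X @ rng.normal(size=(4, 3)) + 0.05 * rng.normal(size=(130, 3))
corrs = canonical_correlations(X, Y)
print(np.round(corrs, 2))  # all three pairs are nearly perfectly correlated here
```

Because Y is constructed as an almost exact linear function of X, every canonical correlation is close to 1; with real variable sets, as in the text, only the leading pairs are strongly correlated.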
5.3.2 Classification Results
The combined variables from the discrete point and waveform datasets worked fairly
well for the classification of the five species. An overall accuracy of over 85 percent
was achieved, which compares favorably with related projects. Table 5.3 is the
confusion matrix for the classification of all species using all variables. Cottonwood
(BC), maple (BM) and Douglas-fir (DF) seem to be most easily separated from the
rest of the species as well as each other. The largest pairwise confusion occurred
Table 5.3: Confusion matrix for the classification of all five species using all available predictor variables from both the discrete point and waveform data.
between alder (RA) and cedar (RC), a hardwood and a conifer. Similarly, confusion
between RA and both conifers was greater than that between the three hardwood
species. While there was some confusion between the conifers, it was all in one
direction; no DF were predicted to be RC.
Table 5.4 gives the results, as overall percent accuracy, for all classifications using
each predictor group. In all but two cases, the addition of the twelve Fourier trans-
formation variables improved the accuracy over the eighteen point variables. In the
five species classification, the addition increased the overall accuracy achieved by over
six percent (8 of the 130 trees in the dataset). The Liddell test procedure returned a
test statistic value of 2.40 and a one-sided p-value of 0.0384. This indicates a low
probability that including the Fourier transformation variables did not actually
improve classification accuracy.
Of the point-derived variables, no single group seemed to perform best in all
situations. In fact, each individual group seemed to have species for which it was
highly important. The relative height percentiles in group a were best for separating
BC and RA. The point intensities in group b did well in BC versus BM, but performed
the best on the conifers. Inter-peak distance measures in group c worked the best in
differentiating BM from BC and RA. The crown roughness and permeability variables
Table 5.4: Overall percent classification accuracy results of the support vector machine applied with a five-fold cross-validation to different predictor variable groups and species groups.

                          Species classification group
Pred. group    All   CO HW    HW   BC BM   BC RA   BM RA   DF RC
               (%)    (%)    (%)    (%)     (%)     (%)     (%)
Miscellaneous constants used in the clustering algorithm:
#Maximum number of rows/columns and layers
MAXRC = 4096
MAXLAY = 256
NA_VAL = -999
#Smallest mass that a neighbor must have to incorporate new voxel
MIN_MASS = 1.0
#Radius of neighbor search window
NEIGHBORHOOD = 8
#Radius of area around current voxel for mass calculation
MASS_WINDOW = 3
#Minimum number of cells a cluster's direction must cross to combine
CROSS_CELLS = 4
#Minimum percent of r,c that a cluster cube must overlap to combine
OVERLAP = 60
#Minimum height above ground for ipointat member function
CROSS_HEIGHT = 2.0
#Minimum radius of mass search window around each cluster centroid
MIN_RADIUS = 2.0
#Minimum number of cluster member cells to avoid deletion
MIN_CLUST_SIZE = 50
#Minimum height in map units of cluster to avoid deletion
MIN_HEIGHT = 5
#Minimum ratio of cluster height to width to avoid deletion
MIN_RATIO = 0.8  # At least as tall as 80% of width
#A (width) and B0 (x0) for logistic modifier function FH
MASS_WT_WIDTH = 4.6  #Makes logistic slope ~ 2.0
MASS_WT_X0 = 7.0
#A and BL, FLmax and FLmin for mass radius modifier FL
RAD_MULT_WIDTH = MASS_WT_WIDTH
RAD_MULT_X0 = 3.0
RAD_MULT_MAX = 1.5
RAD_MULT_MIN = 1.0
#Constants
SQRT_PI = 1.7724538509055159
AW = 9.1902397002691796  # ln(99) - ln(1/99) = 2ln(99)
A.2 File cellfunctions.py
Miscellaneous functions:
import numpy as P
from constants import *
#Functions to convert L, R, C values to a single index, and back again
def cellindex(L,R,C):
    #return (L * MAXRC + R)*MAXRC + C
    return (R * MAXRC + C) * MAXLAY + L
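The comment above promises conversion "back again," but the inverse function (lrcfromindex, called later in cellcluster.py) did not survive in this listing. A sketch of the packing and an inferred inverse (the inverse shown here is my reconstruction, not the original):

```python
MAXRC = 4096   # from constants.py
MAXLAY = 256

def cellindex(L, R, C):
    # Same packing as the listing: layer varies fastest
    return (R * MAXRC + C) * MAXLAY + L

def lrcfromindex(ind):
    """Inferred inverse of cellindex(): unpack (layer, row, column)."""
    L = ind % MAXLAY
    rc = ind // MAXLAY
    return (L, rc // MAXRC, rc % MAXRC)

print(lrcfromindex(cellindex(12, 345, 678)))  # -> (12, 345, 678)
```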
def get_neighbors(r,c,radius,grid):
    '''Find clusters that have cells within radius of r, c'''
    (rows, cols) = grid.shape
    rrange = [max(r-radius,0),min(r+radius+1,rows-1)]
    crange = [max(c-radius,0),min(c+radius+1,cols-1)]
    nc = list(P.unique(grid[rrange[0]:rrange[1],crange[0]:crange[1]]))
    try:
        nc.remove(NA_VAL)
    except:
        pass
    return nc
def ptdist(pt1, pt2, res):
    '''distance between two numpy arrays'''
    return P.linalg.norm((pt1-pt2)*res)
def cubeoverlap(cube1,cube2,inclayer=True):
    '''counts the number of cells in common with two cubes'''
    #rows
    rows = set(range(cube1[0][1],cube1[1][1]+1)).intersection(
        range(cube2[0][1],cube2[1][1]+1))
    #columns (middle of this function was lost in extraction; the
    #cols and lays sets are inferred from the rows pattern)
    cols = set(range(cube1[0][2],cube1[1][2]+1)).intersection(
        range(cube2[0][2],cube2[1][2]+1))
    #layers
    lays = set(range(cube1[0][0],cube1[1][0]+1)).intersection(
        range(cube2[0][0],cube2[1][0]+1))
    return len(rows)*len(cols)*(len(lays) if inclayer else 1)
def Area(rad,dis):
    '''Area of one side of lens shape of overlap (both sides not
    equal unless radii are equal)'''
    return rad**2 * P.arccos(dis/rad) - dis*P.sqrt(rad**2-dis**2)
def circleoverlap(r1,r2,dist):
    '''Gives the area of overlap between two circles. Processes each
    argument as an array'''
    R = float(max(r1,r2))
    r = float(min(r1,r2))
    dist = abs(float(dist))
    if dist >= R+r:
        #(remaining branches were lost in extraction; the disjoint,
        #contained and lens cases below follow the standard geometry
        #that Area() above implements)
        return 0.0
    if dist <= R-r:
        #small circle entirely inside the large one
        return P.pi*r**2
    #distance from each center to the chord through the intersections
    d1 = (dist**2 - r**2 + R**2)/(2.0*dist)
    d2 = dist - d1
    return Area(R,d1) + Area(r,d2)
def logistic(x,width=1.0,x0=0.0):
    '''A logistic attenuation function used in several places in the
    clustering algorithm'''
    return 1.0/(1.0 + P.exp(AW/width*(x-x0)))
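The Area() and circleoverlap() functions above implement the standard two-circle lens geometry. Since parts of that listing were damaged in extraction, here is a self-contained re-derivation useful for sanity checking (all names in this sketch are my own):

```python
import math

def lens_side_area(rad, dis):
    # Circular-segment area: r^2 * acos(d/r) - d * sqrt(r^2 - d^2)
    return rad**2 * math.acos(dis / rad) - dis * math.sqrt(rad**2 - dis**2)

def circle_overlap(r1, r2, dist):
    """Overlap area of two circles with radii r1, r2 at center distance dist."""
    R, r = max(r1, r2), min(r1, r2)
    dist = abs(dist)
    if dist >= R + r:
        return 0.0                    # disjoint
    if dist <= R - r:
        return math.pi * r**2         # small circle fully inside
    # Distances from each center to the chord through the intersections
    d1 = (dist**2 - r**2 + R**2) / (2.0 * dist)
    d2 = dist - d1
    return lens_side_area(R, d1) + lens_side_area(r, d2)

print(round(circle_overlap(1.0, 1.0, 0.0), 4))  # -> 3.1416 (full circle)
```

Note that d2 goes negative when the chord lies beyond the small circle's center; the segment formula still gives the correct (major) segment area there.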
A.3 File cellcluster.py
A Python class to hold voxel members and compute various attributes of those voxel members. This class inherits from the built-in set class:
import numpy as P
from constants import *
from cellfunctions import *
class cluster(set):
    '''Contains the voxels associated with the cluster ID in a binary
    tree structure, and contains methods to work with these voxels'''
    def __init__(self, L, R, C, id):
        ind = cellindex(L, R, C)
        if ind >= 0 and ind <= MAXLAY*MAXRC*MAXRC:
            set.__init__(self)
            #self.cells = B.btree(ind)
            self.top = P.array([L, R, C])
            #self.count = 1
            self.id = id
            self.big = P.array([L,R,C])
            self.small = P.array([L,R,C])
            self.mycentroid = None
            self.addcell(L, R, C)
            self.neighbors = set()
            self.vertborder = 0  #1 if west border, 2 if east border
            self.horizborder = 0  #1 if south border, 2 if north border
            self.deleted = 0
            self.combined = False
            self.combinedintome = 0
            self.imergedinto = -1
            return
        else:
            return None
    def _setbig(self, L, R, C):
        '''Set the big corner'''
        if L > self.big[0]: self.big[0] = L
        if R > self.big[1]: self.big[1] = R
        if C > self.big[2]: self.big[2] = C
        return
    def _setsmall(self, L, R, C):
        '''Set the small corner'''
        if L < self.small[0]: self.small[0] = L
        if R < self.small[1]: self.small[1] = R
        if C < self.small[2]: self.small[2] = C
        return
    def _updatecent(self, L, R, C):
        tmparray = P.array([float(L),float(R),float(C)])
        if self.mycentroid is None:
            self.mycentroid = tmparray
        else:
            self.mycentroid = \
                (self.mycentroid*(len(self)-1) + tmparray) / len(self)
    def setvborder(self, west = True):
        self.vertborder = 1 if west else 2
    def sethborder(self, south = True):
        self.horizborder = 1 if south else 2
    def addcell(self, L, R, C):
        '''Add a new cell to the cluster'''
        ind = cellindex(L, R, C)
        self.add(ind)
        self._setbig(L, R, C)
        self._setsmall(L, R, C)
        self._updatecent(L, R, C)
        return
    def contains(self, L, R, C):
        '''True/False: does cluster contain cell'''
        return cellindex(L, R, C) in self
    def cube(self):
        '''Return two (L,R,C) points defining the cube enclosing the
        cluster'''
        return (self.small, self.big)
    def flatten(self):
        '''Return an array of 0s and 1s indicating if row and column
        are occupied in any layer by a cell in cluster'''
        dims = self.dims()
        rows = dims[1]
        cols = dims[2]
        tmparray = P.zeros((rows,cols),'int')
        for item in self:
            (l, r, c) = lrcfromindex(item)
            tmparray[self.big[1]-r,c-self.small[2]] = 1
        return tmparray
    def ratio(self,res):
        '''Computes the ratio of height to average of radius'''
        dims = self.dims() * res
        return dims[0] / P.dot(dims,P.array((0,0.5,0.5)))
    def combine(self, othercluster):
        '''Will take all cells from another cluster and place it into
        this one'''
        oldcount = len(self)
        for item in othercluster:
            #(remainder of combine() was lost in extraction; adding
            #each cell and returning the count added is inferred from
            #how it is called in crownseg.py)
            (l, r, c) = lrcfromindex(item)
            self.addcell(l, r, c)
        self.neighbors.update(othercluster.neighbors)
        return len(self) - oldcount
    def myradius(self, res, minmult, maxmult, slope, x0):
        #(this def line was lost in extraction; the signature is
        #inferred from the call self.myradius(res, rmmin, rmmax,
        #rmwidth, rmx0) in crownseg.py)
        '''Return the radius at which cells can be added as the radius
        of a circle with the same area as my average number of cells
        per layer, times a multiplier. The multiplier decreases from
        maxmult depending on the number of layers {x} in me so far
        with form:
            mult = 1/(1+exp(slope*(heightrange-x0)))
        The minimum multiplier is minmult'''
        dims = self.dims()
        area = float(res[1]*res[2]*len(self))/float(dims[0])
        mult = (maxmult-minmult)*logistic(dims[0]*res[0],slope,x0) + \
            minmult
        return max(P.sqrt(area)/SQRT_PI,MIN_RADIUS) * mult
    def masswt(self, L, R, C, res, masswindow, width, x0,
               rmmin, rmmax, rmwidth, rmx0):
        #(the def line was lost in extraction; name and argument
        #order are inferred from the body below)
        '''Computes the overlap area between my radius and a mass
        window around cell L, R, C. A weight is created that goes
        from 1 to 0 as distance from my centroid increases, with form:
            wt = 1/(1+exp(AW/width*(x-x0)))
        These weights times the overlap are summed for each layer in
        my cube'''
        #Get my representative radius
        rad = self.myradius(res, rmmin, rmmax, rmwidth, rmx0)
        #Get cell horizontal distance from my centroid
        distxy = ptdist(P.array((R, C)), self.mycentroid[1:], res[1:])
        #How much do the two circles overlap at this distance
        lap = circleoverlap(rad, masswindow, distxy)
        #Get an array of layer distances from my centroid
        lrange = range(self.small[0],self.big[0]+1)
        vecdl = P.abs(P.array(lrange) - L)
        #Array of weights
        vechwt = logistic(vecdl,width,x0)
        dwt = logistic(distxy,width,x0)
        return P.sum(lap*vechwt)*dwt
    def centroid(self):
        return self.mycentroid
    def direction(self):
        '''Computes the vector of top - centroid'''
        return P.array(self.mycentroid - self.top)
    def dims(self):
        '''Gets an array of cube dimensions, (L, R, C)'''
        return self.big - self.small + 1
    def pointatcells(self,buff = 0):
        '''Count the number of grid cells that are intersected by my
        direction vector (+/- optional buffer)'''
        dirvec = self.direction()
        #Check for values that would mess us up
        if dirvec[0] == 0:
            return set()
        indexset = set()
        #Get multiplier for direction vector to move one layer
        ratio = abs(1/dirvec[0])
        r = int(self.mycentroid[1])
        c = int(self.mycentroid[2])
        l = int(self.mycentroid[0])-1
        checklayers = range(1,int(self.mycentroid[0])+1)
        for lay in checklayers:
            pt = self.mycentroid + dirvec*lay*ratio
            if pt[1] < 0 or pt[1] > MAXRC or pt[2] < 0 or \
                    pt[2] > MAXRC:
                break
            #(rest of the loop body was lost in extraction; collecting
            #the crossed cell indices is inferred from the docstring)
            indexset.add(cellindex(int(pt[0]), int(pt[1]), int(pt[2])))
        return indexset
    def countin(self, indices):
        #(the def line was lost in extraction; the name is inferred)
        '''Count how many records in indices match records in
        myself'''
        #count = 0
        #for index in indices:
        #    if index in self: count += 1
        #return count
        return len(self.intersection(indices))
A.4 File crownseg.py
Python script to do 3-D voxel-based clustering using the cluster class described above. Functions in uppercase are pseudofunctions that need to be written for the voxel data structure in use:
import os
import sys
import argparse as A
import numpy as P
from constants import *
from cellfunctions import *
from cellcluster import cluster
parser = A.ArgumentParser(prog='crownseg2',
    description=('Reads a voxel file computing'
                 ' voxel clusters'))  #(description tail lost in extraction)
#(positional argument inferred from the use of args.voxelfilename below)
parser.add_argument('voxelfilename', help='Input voxel file')
#(the -D/--nodelete option line was lost in extraction and is inferred
#from the dest and help text that survive)
parser.add_argument('-D', '--nodelete',
    action='store_false',dest='dodelete',
    help='Delete invalid clusters before writing')
parser.add_argument('-C', '--nocombine',
    action='store_false',dest='docombine',
    help='Combine clusters before writing')
parser.add_argument('-b', '--bottom',
    action='store',dest='bottomlayer',type=int,
    help='Layer at which to stop algorithm')
parser.add_argument('-t', '--top',
    action='store',dest='toplayer',type=int,default=None,
    help='Layer to start algorithm')
parser.add_argument('--minrow',
    action='store',dest='minrow',type=int,default=None,
    help='Row to begin area of interest')
parser.add_argument('--maxrow',
    action='store',dest='maxrow',type=int,default=None,
    help='Row to end area of interest')
parser.add_argument('--mincol',
    action='store',dest='mincol',type=int,default=None,
    help='Col to begin area of interest')
parser.add_argument('--maxcol',
    action='store',dest='maxcol',type=int,default=None,
    help='Col to end area of interest')
args = parser.parse_args()
#Read some information from the voxel file
voxelfile = OPENVOXELFILE(args.voxelfilename)
#Header of voxel file should contain:
#MinX - lower left easting
#MinY - lower left northing
#MinZ - elevation of bottom of voxel structure
#CellWidthX - width of columns
#CellWidthY - height of rows
#LayerHeight - height of layers
#NumRows - number of rows
#NumCols - number of columns
#NumLayers - number of layers
voxelheader = GETVOXELHEADER()
#Ensure that rows/columns/layers are valid
if voxelheader.NumRows > MAXRC:
    print 'Too many rows:',voxelheader.NumRows
    sys.exit(3)
if voxelheader.NumCols > MAXRC:
    print 'Too many columns:',voxelheader.NumCols
    sys.exit(3)
if voxelheader.NumLayers > MAXLAY:
    print 'Too many layers:',voxelheader.NumLayers
    sys.exit(3)
#If we only want to work with a region of voxel structure (specified#in arguments mincol, minrow, etc.), calculate these boundaries
except Exception as exc:
    print 'Could not create abovegrid. Exception:\n',exc
    sys.exit(4)
#Number of voxel cells per layer
CELLS_PER_LAYER = voxelheader.NumRows*voxelheader.NumCols
######################################################################
#BUILDING
######################################################################
#Create the list of clusters
the_clusters = []
num_cluster_ids = 0
num_clusters = 0
#OKAY, now we go through layer by layer (from top) reading
#the loc section
for l in range(startlayer,stoplayer-1,-1):
    #Each layer holds CELLS_PER_LAYER cells with a list of points per
    #row/column combination (cell)
    curlayer = READVOXELLAYER(voxelfile,l)
    ##Go through rows and cols updating clusters
    for r in range(startrow,stoprow+1):
        for c in range(startcol,stopcol+1):
            gridr = r - startrow
            gridc = c - startcol
            #Check if cell has points
            if curlayer[r*voxelheader.NumCols + c].NumPoints > 0:
                #If a cluster has already claimed (R,C),
                #add this cell to that cluster
                if clustergrid[gridr,gridc] != NA_VAL:
if args.docombine or args.dodelete:
    print 'There are',num_clusters,'clusters, \
doing some combining/trimming'
else:
    print 'There are no clusters, stopping'
    sys.exit(0)
######################################################################
#TRIMMING
######################################################################
#Go through clusters to see if they can be combined/removed
#combine if it 'points' to (direction vector aims at) a neighbor,
#and neighbor has mass below
old_num_combined = -1
num_combined = 0  #(initialization inferred; lost in extraction)
if args.docombine:
    while num_combined != old_num_combined:
        print 'Starting combination pass, combined so far:', \
            num_combined
        old_num_combined = num_combined
        for cluster in the_clusters:
            if cluster.deleted > 0 or cluster.combined:
                continue
            #Look at direction vector
            cube = cluster.cube()
            cent = cluster.centroid()
            direc = cluster.direction()  #Dlayers, Drows, Dcolumns
            #If my radius is greater than 7m, don't bother
            if cluster.myradius(resarray, \
            #if cluster direction is near vertical (<~20deg), and
            #top-cent is > than 5m, and bottom is below height
            #cutoff, then leave it alone
            if cluster.small[0] < CROSS_HEIGHT and \
                continue
            #Okay, we can join with neighbor
            num_added = n.combine(cluster)
            cluster.imergedinto = n.id
            cluster.combined = True
            cluster.deleted = 100
            num_combined += 1
            num_clusters -= 1
            #Update clusters that had me as a neighbor to have
            #new neighbor n. My own neighbors are added to n with
            #call to n.combine()
            for neighbor in cluster.neighbors:
                nn = the_clusters[neighbor]
                if nn.deleted > 0 or nn.combined:
                    continue
                #Take my name off list, if it is there
                nn.removeneighbors([cluster.id])
                nn.addneighbors([n.id])
            #Stop looking for more neighbors to combine with
            break
#if it does not combine, remove clusters that are too small or too
#flat
if args.dodelete:
    for cluster in the_clusters:
        if not cluster.combined:
            #See if we meet min cell count
            if len(cluster) < MIN_CLUST_SIZE:
/********************************************************************/
/*function to compute the mean squared error between two vectors    */
/*data and reference are vectors of length n                        */
/********************************************************************/
double meansqdist(double data[], double reference[], int n) {
  int i = 0;
  double sumdist = 0.0;
  for (i=0; i<n; i++) {
/********************************************************************/
/*Performs Richardson-Lucy deconvolution until a specified maximum  */
/*change is found                                                   */
/*d is original data, nd length of d                                */
/*psf is point spread function, np length of psf                    */
/*maxiter is maximum number of iterations                           */
/*tol is maximum mean distance between two iterations               */
/*out is final deconvolved data, iter is number of iterations used  */
/********************************************************************/
void richlucy(const double d[], const int *nd, const double psf[],
              const int *np, const int *maxiter, const double *tol,
              double out[], int *iter) {
for (index=2; index<count; index++) {
  /*change in intensity*/
  diff = intens[index] - intens[index-1];
  /*Keep old derivative as prevderiv*/
  prevderiv = deriv;
  /*New deriv is diff unless it is close to 0, then set it to 0.0*/
  deriv = diff*diff > 0.01 ? diff : 0.0;
  /*If intensity is != 0 and this is not a trough (neg. deriv then
    pos. deriv)*/
  if (intens[index] > 0.1 && !(deriv > 0.1 && prevderiv < 0.1)) {
    /*If we found a "peak", note its height and location.
      Peak is pos. deriv then neg. deriv*/
    if (deriv < 0.1 && prevderiv > 0.1) {
      found_peak = 1;
      big_index = index;
    }
    /*Add intensity to peak mass*/
    cur_mass += intens[index];
  } else {
    /*If a peak just ended, record its dimensions and location*/
    if( found_peak && cur_mass >= MIN_MASS ) {
R language implementation of the procedure used to select only waveforms that hit
the tree of interest first.
##Function to determine if a waveform starts near and goes through a
##cell in the given cluster. Assumes that scan angle is steep enough
##that either top of cell or bottom of cell will be hit
pointsintree <- function(clayers,crows,ccolumns,x,y,z,vx,vy,vz) {
  if(length(x) != length(y) ||
     length(y) != length(z)) {
    stop("x,y,z lengths do not match")
  }
  if(length(vx) != length(vy) ||
     length(vy) != length(vz)) {
    stop("vx,vy,vz lengths do not match")
  }
  if(length(clayers) != length(crows) ||
     length(crows) != length(ccolumns)) {
    stop("clayers,crows,ccolumns lengths do not match")
  }
  ##(this check's condition was lost in extraction; inferred)
  if(length(x) != length(vx)) {
    stop("Direction vectors do not match starting points")
  }
  ##Recompute Z as a height above DEM
  ##getz pulls the DEM height of location x, y
  newz <- z - getz(x,y)
  newz[which(!is.finite(newz))] <- -10000
  ##Get the uppermost layer of cells (set T)
  tops <- aggregate(clayers,list(Row=crows,Col=ccolumns),max)
  lt <- length(tops$x)
  ##distance of each start to center of each cluster cell
  ##in top layer dim: lp X lt
  ##getColX, getRowY and getLayZ return the map coordinates
  ##of the voxel center
  dist <- sqrt(
    newz[rows2do] - getLayZ(tops$x[cols2do])+0.375)
  ##X, Y, Z of each close ray when it reaches top of each close
  ##cell (dim: length(close) X 3)
  crosshigh <- cbind(x[rows2do],y[rows2do],newz[rows2do]) +
  ##X, Y, Z of each close ray when it reaches bottom of each close
  ##cell (dim: length(close) X 3)
  crosslow <- cbind(x[rows2do],y[rows2do],newz[rows2do]) +
  out <- as.data.frame(matrix(0,length(indices),9))
  names(out) <- c("Layer","Row","Column","bottom","east",
                  "north","south","top","west")
  out$Layer <- Layers
  out$Row <- Rows
  out$Column <- Columns
  ##For top and bottom, we add/subtract one layer worth (Nrows*Ncols)
  ##of cell indices
  out$bottom[which((indices - Nrows*Ncols) %in% indices)] <- 1
  out$top[which((indices + Nrows*Ncols) %in% indices)] <- 1
  ##Neighbors to the north or south add or subtract one row
  out$south[which((indices - Ncols) %in% indices)] <- 1
  out$north[which((indices + Ncols) %in% indices)] <- 1
  ##Indices of neighbors to the east/west differ by just one
  out$west[which((indices - 1) %in% indices)] <- 1
  out$east[which((indices + 1) %in% indices)] <- 1
  out
}
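The offset arithmetic above (plus or minus Nrows*Ncols for vertical neighbors, Ncols for north/south, 1 for east/west, on a linear index of the form layer*Nrows*Ncols + row*Ncols + col) can be sketched compactly. An illustrative Python version, not the original R (the function name is mine):

```python
def neighbor_flags(indices, nrows, ncols):
    """For each linear voxel index, flag which of its six face
    neighbors are also present in the index set."""
    s = set(indices)
    lay = nrows * ncols
    offsets = {'bottom': -lay, 'top': lay,
               'south': -ncols, 'north': ncols,
               'west': -1, 'east': 1}
    return {i: {name: (i + off) in s for name, off in offsets.items()}
            for i in s}

# Two vertically stacked voxels in a grid with 4 rows and 4 columns
flags = neighbor_flags([0, 16], nrows=4, ncols=4)
print(flags[0]['top'], flags[0]['north'])  # -> True False
```

Like the R version, this does not guard against wrap-around at grid edges (the last column's "east" offset lands on the next row), so callers must handle border cells separately.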
B.5 File predvars.r
This is the code used to compute the predictor variables in chapter 5. Three datasets are used: comb is a data frame containing the Layer, Row and Column address of each voxel, listed by CLUSTERID, the integer name given to each cluster of voxels resulting from the segmentation algorithm. The data frame allpoints contains all information about each peak-point derived from the waveform data, indexed by the Layer, Row, and Column of the voxel that contains it. Due to their size, the waveform data are saved in individual files, one per cluster. A simple function to compute the area of convex hulls, used for the rarea variable, is included.
##Function to compute area of a convex polygon
##Splits polygon into triangles using polygon vertices and centroid
##Sum area of these triangles
polygonarea <- function(x,y) {
  ##Make sure equal length
  if (length(x) != length(y)) stop("Unequal vector lengths")
  lp <- length(x)
  ##Test if first point is same as last point, if so discard
  if (x[1] == x[lp] && y[1] == y[lp]) {
    x <- x[-1]
    y <- y[-1]
    lp <- lp-1
  }
  centroid <- c(mean(x),mean(y))
  x.c <- x - centroid[1]
  y.c <- y - centroid[2]
  ##Cross product for each pair of consecutive vertices, wrapping
  ##back to the first vertex so the closing triangle is included
  xn <- c(x.c[-1],x.c[1])
  yn <- c(y.c[-1],y.c[1])
  area <- 1/2*abs(x.c*yn - xn*y.c)
  sum(area)
}
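The triangle-fan approach above is equivalent, for a simple polygon, to the standard shoelace formula. A minimal Python cross-check (names in this sketch are my own):

```python
def polygon_area(x, y):
    """Shoelace formula: area of a simple polygon from vertex coordinates."""
    if len(x) != len(y):
        raise ValueError("Unequal vector lengths")
    n = len(x)
    # Sum cross products of consecutive vertices, wrapping at the end
    twice = sum(x[i] * y[(i + 1) % n] - x[(i + 1) % n] * y[i]
                for i in range(n))
    return abs(twice) / 2.0

print(polygon_area([0, 1, 1, 0], [0, 0, 1, 1]))  # unit square -> 1.0
```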
##fitdistr (used below) is in the MASS package
library(MASS)
##Create a new data frame to hold all variables
newdata <- fielddata[,1:3]
##Height percentiles (group a)
newdata$h90 <- newdata$h75 <- newdata$h50 <- newdata$h25 <- 0
##Peak attributes (group b)
newdata$m3 <- newdata$m2 <- newdata$m1 <- 0
##Inter-peak distances (group c)
newdata$d23 <- newdata$d13 <- newdata$d12 <- 0
newdata$lambdahat <- 0
##Portion of near crown hits (group d)
newdata$ptop <- 0
##Hull area to flattened numcells (group d)
newdata$rarea <- 0
##neighbor stats with 0.5m voxels (group e)
newdata$pnt05 <- 0
newdata$pnb05 <- 0
##Only one side neighbor (group e)
newdata$pn105 <- 0
##surface modeling variables (group f)
newdata$sa <- 0; newdata$sb <- 0
names(fmedians) <- paste(sep="","D",2:31)
names(fiqrs) <- paste(sep="","Q",2:31)
##Median and IQR of original
mivi <- matrix(0,length(fielddata$CLUSTERID),2)
colnames(mivi) <- c("MI","QI")
##For each cluster ID in the field data
for (cnum in seq(fielddata$CLUSTERID)) {
  ##Save the cluster ID
  id <- fielddata$CLUSTERID[cnum]
  ##comb contains all voxels for each cluster (tree)
  WC <- which(comb$CLUSTERID == id)
  ##allpoints contains the peak-points for each cluster, indexed by
  ##layer, row, and column
  WP <- which(allpoints$Cluster==id)
  ##0) Compute fft median and iqr
  ##Load the waveform data file for this cluster
  ##These are too big to have all loaded at once
  fname <- paste(sep="","/drive2/FULLWAVE/wavedumps/waves_",
                 fielddata$CLUSTERID[cnum],".dumped")
  load(file=fname)
  ##Remove samples that occur earlier than 50 meters
  ##I think these represent measurement of the outgoing pulse
  waves <- waves[which(waves$Range > 50),]
  ##Remove overflows of intensity (recorded as 15434), set to 0
  waves$Intensity[which(waves$Intensity > 15000)] <- 0
  ##Perform fft on first 60 samples (I think fft function pads
  ##with 0's to 64 to meet 2^n size requirement).
  ffts <- as.matrix(aggregate(Intensity~Wave,waves,
  ##Group a##############################
  ##Percentiles of point heights
  maxht <- max(allpoints$Z[WP])
  sz <- allpoints$Z[WP]/maxht
  newdata[cnum,c('h25','h50','h75','h90')] <-
    quantile(sz,c(0.25,0.5,0.75,0.9))
  ##Group b############################
  ##intensity of ordered peaks
  newdata$m1[cnum] <-
  ##Group c###########################################
  ##Order the points by pulse and peak numbers
  sortorder <- order(allpoints$GPSTime[WP],allpoints$Peaknum[WP])
  sortpoints <- allpoints[WP[sortorder],c(2:5,8)]
  ##pulses that have a first peak (occurring within tree)
  hasfirst <- sortpoints$GPSTime[which(sortpoints$Peaknum == 1)]
  ##have second peak
  hassecond <- sortpoints$GPSTime[which(sortpoints$Peaknum == 2)]
  ##have third peak
  hasthird <- sortpoints$GPSTime[which(sortpoints$Peaknum == 3)]
  ##Have both necessary for computation
  has12 <- intersect(hasfirst,hassecond)
  has13 <- intersect(hasfirst,hasthird)
  has23 <- intersect(hassecond,hasthird)
  ##mean 1-2 distance
  ##which peaks are 1st in pulse with 1st and 2nd peaks in tree
  firstgood <-
    which(sortpoints$GPSTime %in% has12 & sortpoints$Peaknum == 1)
  ##which peaks are 2nd in pulse with 1st and 2nd peaks in tree
  secondgood <-
    which(sortpoints$GPSTime %in% has12 & sortpoints$Peaknum == 2)
  ##difference between ranges of these peaks
  dists <-
  ##Exponential model of waiting time between peaks:
  ##Start with no consecutive peaks (all false)
  alldists <- logical(length(sortpoints$GPSTime))
  ##Add those which are ...
  add <-
    which(
      ##consecutive peaks ...
      diff(sortpoints$Peaknum)==1 &
      ##and in same pulse.
      sortpoints$GPSTime[-1] == sortpoints$GPSTime[-length(WP)]
    )
  alldists[add] <- TRUE
  ##Compute distance between these consecutive peaks
  cons <- waits <- which(alldists)
  ##Fit an exponential model to these distances (waiting times)
  newdata$lambdahat[cnum] <- fitdistr(waits,"exponential")$estimate
  ##Group d########################################
  ##Set T of all top voxels by row, column
  W6 <- which(comb$Layer[WC] >= 6)
  top6 <- aggregate(comb$Layer[WC][W6],
  ##Get outline of first 8 layers, calculate p/a ratio
  toplayer <- max(comb$Layer[WC])
  top8 <- aggregate(comb$Layer[WC][which(comb$Layer[WC] > toplayer-8)],
                    list(row=comb$Row[WC],col=comb$Col[WC]),length)
  hull <- chull(id,numlayers=8)
  ##Get ratio of number of occupied cells to polygon area
  newdata$rarea[cnum] <-
    length(top8$row) / polygonarea(hull[,1],hull[,2])
  ##Group e####################################################
  ##Re-index points to a 0.5x0.5x0.5 meter voxel array
  ##Z, Y, and X range of points (in map coordinates)
  lrange <- range(allpoints$Z[WP])
  rrange <- range(allpoints$Y[WP])
  crange <- range(allpoints$X[WP])
  ##Break these ranges into 0.5m groups by ...
  ##layer ...
  lseq05 <- seq(0,ceiling(lrange[2]),by=0.5)
  lcut05 <- as.numeric(cut(allpoints$Z[WP],
  ##Group f################################
  ##Crown surface model parameters
  ##Get centroid of the highest eight layers
  center8 <- c(L=mean(comb$Layer[WC[top8]]),
  ##Compute difference in height for each voxel
  dz <- toplayer - top6$x
  ##Compute distance from center (xydist) and angle (rad)
  ydist <- top6$Row-center8[2]
  xdist <- top6$Col-center8[3]
  rad <- atan2(ydist,xdist)
  xydist <- sqrt(xdist^2 + ydist^2)
  ##Fit our model to this surface
  crownmod <- nls(dz~exp((base + slope*sin(rad-shift))*xydist)-1,
                  start=c(slope=1,base=0,shift=0))
  ##Keep the parameters
  newdata$sa[cnum] <- coef(crownmod)['base']
  newdata$sb[cnum] <- coef(crownmod)['slope']
}
B.6 File liddelltest.r
R language code for Liddell's exact one-sided probability value for a test of change in a two-level factor when the individuals are measured twice (paired). This is similar to McNemar's test; however, McNemar's test is approximate. This test was used in the Methods section of chapter 5. The two arguments, before and after, must be factors (with the exact same two levels) of the same length. Note that there is no error-checking code to verify correct arguments.
liddell.pval <- function(before,after) {
  ##Create 2x2 contingency table:
  ##                after
  ##  before   FALSE    TRUE
  ##  ~~~~~~~~~~~~~~~~~~~~~~
  ##  FALSE      a        b
  ##  TRUE       c        d
  ##  ~~~~~~~~~~~~~~~~~~~~~~
  cont <- table(before,after)
  ##r is max of c or b, s is min
  r <- max(cont[1,2],cont[2,1])
  s <- min(cont[1,2],cont[2,1])
  ##Compute F-statistic
  F <- r/(s+1)
  ##Right-tailed probability under F dist
  ##with 2*(s+1) and 2*r degrees of freedom
  pf(F,2*(s+1),2*r,lower.tail=FALSE)
}
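The F-tail used above has a convenient closed form: with df 2(s+1) and 2r and F = r/(s+1), the incomplete-beta argument reduces to 1/2, so the p-value equals the binomial tail P(X >= r) for X ~ Bin(r+s, 1/2), the exact conditional test on the discordant cells. A stdlib Python cross-check of the same quantity (the function name is mine):

```python
from math import comb

def liddell_pval(b, c):
    """One-sided exact p-value from the two discordant counts of a
    paired 2x2 table (b = FALSE->TRUE, c = TRUE->FALSE). Equals
    P(F_{2(s+1),2r} >= r/(s+1)) from the R function above."""
    r, s = max(b, c), min(b, c)
    n = r + s
    # Binomial tail P(X >= r) with X ~ Bin(n, 1/2)
    return sum(comb(n, k) for k in range(r, n + 1)) / 2.0**n

print(liddell_pval(5, 0))  # -> 0.03125
```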
VITA
Nicholas Vaughn was born in San Diego, California. An interest in forestry led to a move north, where he received his Bachelor of Science in Forest Management from Humboldt State University in Arcata, California. After working for a few years in the research branch of the United States Forest Service in Redding, California, he moved north yet again to obtain a Master of Science and, eventually, a Doctor of Philosophy in Forest Resources from the University of Washington.