Information Extraction from Remotely Sensed Images
From Data to Information
Data refers to numerical results of any set of measurements
regardless of whether or not the measurements are acquired with a
certain purpose in mind. Information is an aggregate of facts so organized, or data so utilized, as to constitute knowledge or intelligence. Data have to be transformed to derive information; the process of transforming data into information is known as Information Extraction. Information extraction can be of three types:
1. Manual
2. Semi-automatic
3. Automatic
Information is the Key
Exploring the image has almost superseded the image itself. The information derived from imagery is what provides earth resource managers with what they need to make decisions: the best place for a new dam, the height of flood defenses, and so on. Remote sensing is all about information: the tangible result of all the effort that goes into building and operating an earth observation platform is a set of measurements of the Earth system from space, from which we can derive information of economic, social, strategic, political or environmental value. As sensors are continually developed and refined, image-processing tools have to change to ensure that new data can be fully exploited. The last 26 years have been characterized by steadily increasing spatial resolution and the growth of microwave SAR, but what effect will this have on how we process the data? Many more
sensors are planned for the next few years, all of which will
inevitably require new processing tools. By far the most consistent
trend has been the improvement in spatial resolution of optical
images since the 80-meter Multispectral Scanner on board Landsat 1; images will be as sharp as 50 centimeters in the case of QuickBird. Whilst higher resolution enables greater identification
of small objects, it causes traditional land classification
techniques to become unreliable because the contributions of
different material types within the pixel distort the pixel
spectrum from that of the material of interest, often resulting in
a loss of discrimination and potential misclassification. Other techniques for addressing these problems have been tried in the past, such as neural networks, but with limited success and reliability; they are hence used very little commercially.
Where are we heading?
-
The era of 1-meter satellite imagery presents new and exciting
opportunities for users of spatial data. With Space Imaging's IKONOS
satellite already in orbit and satellites from EarthWatch Inc.,
Orbital Imaging Corp. and, of course, ISRO scheduled for launch in
the near future, high resolution imagery will add an entirely new
level of geographic knowledge and detail to the intelligent maps
that we create from imagery. Geographic imagery is now widely used
in GIS applications worldwide. Decisions made using these GIS
systems by national, regional and local governments, as well as
commercial companies, affect millions of people, so it is critical
that the information in the GIS is up to date. In most instances, aerial or satellite imagery provides the most up-to-date source of data available, helping to ensure accurate and reliable
decisions. However, with technological advancements come new
opportunities and challenges. The challenge now facing the
geotechnology industry is twofold: how best to fully exploit
high-resolution imagery and how to get access to it in a timely
manner. Is high-resolution imagery making a difference? There is no
doubt that the GIS press has been deluged with high-resolution
imagery for the last few years. Showing an application with an
imagery backdrop provides an immediate visual cue for readers.
Without the imagery backdrop, the context is lost and the basic
map, comprising polygons, lines and points becomes more difficult
for the layman to interpret. It is the context or visual clues that
provide the useful information and it is this information that is
the inherent value of the imagery. The higher the resolution of the
imagery, the more man-made objects can be identified. The human eye, the best image processor of all, can quickly detect and identify these objects. If the application is therefore one that
just requires an operator to identify objects and manually add them
into the GIS database, then the imagery is making a positive
difference. It is adding a new data source for the GIS Manager to
use. However, if the imagery requires information to be extracted
from it in an automated and semi-automated fashion (for example, a
land cover classification), it is a different matter. If the same
techniques that were developed for earlier lower resolution
satellite imagery are used on the high-resolution imagery, (such as
maximum likelihood classification), the results can actually create
a negative impact. Whilst lower resolution imagery isn't affected greatly by artifacts such as shadows, high-resolution data can be. Lower resolution data also smooths out variations across ranges of individual pixels, allowing statistical processing to create effective land cover maps. Higher resolution data doesn't do this: individual pixels can represent individual objects like manhole covers, puddles and bushes, and contiguous pixels in an image can vary dramatically, creating very mixed or confused classification results. There is also the issue of linear feature extraction.
Lines of communication on a lower resolution image (such as roads)
can be identified and extracted as a single line. However, on a
high-resolution image, a road comprises the road markings, the road
itself, the kerb (and its shadow) and the pavement (or sidewalk). A
very different method of feature extraction is therefore
needed.
-
It's not just the spatial resolution that can affect the usage of the imagery. With 11-bit imagery becoming available, the ability of the GIS to work with high spectral content imagery becomes key. 11-bit data means that up to 2048 levels of grey can be stored and
viewed. If the software being used to view the imagery assumes it
is 8 bit (256 levels), then it will either a) display only the
information below the 255 level (creating either a black or very
poor image) or b) try to compress the 2048 levels into 256, also
reducing the quality of the displayed image considerably. Having
2048 levels allows more information in shadowy areas to be
extracted as well as enabling more precise spectral signatures to
be defined to aid in feature identification. However, without the
correct software, this added bonus can easily turn into a problem.
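The 8-bit clipping problem described above can be illustrated with a minimal NumPy sketch (function names are my own, not from any GIS package): a naive viewer that assumes 8-bit data saturates everything above level 255, while a correct linear compression maps the full 0-2047 range into 0-255.

```python
import numpy as np

def rescale_11bit_to_8bit(img11):
    """Linearly compress 11-bit pixel values (0-2047) to 8-bit (0-255)."""
    img11 = np.asarray(img11, dtype=np.uint16)
    return (img11.astype(np.float32) * 255.0 / 2047.0).round().astype(np.uint8)

def clip_to_8bit(img11):
    """What naive 8-bit software effectively does: values above 255 saturate,
    losing all detail in the upper part of the 11-bit range."""
    return np.clip(np.asarray(img11, dtype=np.uint16), 0, 255).astype(np.uint8)
```

In practice a contrast stretch over the actual data range (rather than the full 0-2047) would give a better-looking display, but the comparison above is enough to show why 11-bit-unaware software produces a black or washed-out image.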
Information Extraction from Remotely Sensed Images:
Geoinformation extraction using image data involves the construction of explicit, meaningful descriptions of physical objects (Ballard & Brown, 1982). When performing analysis of complex data, one of the major problems stems from the number of variables involved. Analysis with a large number of variables generally requires a large amount of memory and computation power, or a classification algorithm which overfits the training sample and generalizes poorly to new samples. Feature extraction is a general term for methods of constructing combinations of the variables to simplify these problems while still describing the data with sufficient accuracy. Best results are achieved when an expert constructs a set of application-dependent features. All approaches usually include object recognition, i.e. interpretation using an eye-brain/computer system, and object reconstruction, i.e. coding, digitizing, structuring.
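As a concrete illustration of "constructing combinations of the variables", here is a minimal principal-component sketch that projects multispectral pixel vectors onto a few uncorrelated features. This is one generic feature-extraction method, offered as an assumption of mine, not the specific approach of the text.

```python
import numpy as np

def pca_features(pixels, n_components):
    """Project band vectors onto the top principal components.
    pixels: (n_samples, n_bands) array; returns (n_samples, n_components)."""
    X = np.asarray(pixels, dtype=np.float64)
    Xc = X - X.mean(axis=0)                 # center each band
    cov = np.cov(Xc, rowvar=False)          # band-to-band covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # strongest components first
    W = eigvecs[:, order[:n_components]]
    return Xc @ W
```

A few decorrelated components often carry most of the spectral variance, which is exactly the memory/computation simplification the paragraph describes.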
Feature extraction is also used in image processing, where algorithms detect and isolate various desired portions or shapes (features) of a digitized image or video stream. Generally, approaches for information extraction using image processing techniques may be grouped as follows:
Low-level: Edge detection, corner detection, blob detection, ridge detection, scale-invariant feature transform
Curvature: Edge direction, changing intensity, autocorrelation
Image motion: Motion detection
Shape based: Thresholding, blob extraction, template matching, Hough transform (lines, circles/ellipses, arbitrary shapes via the Generalized Hough Transform)
Flexible methods: Deformable, parameterized shapes; active contours (snakes)
Given below in compiled form is various terminology used in the context of geoinformation extraction, particularly from image data:
Scene: Part of the visible world that one would like to describe
-
INFORMATION EXTRACTION FROM REMOTELY SENSED IMAGES
Data acquisition and data updating are important aspects in
developing and maintaining Geographical Information Systems (GISs).
The spatial data in most existing GISs are derived from existing
maps through digitization. This method is prone to errors, and the
accuracy of the data derived from the existing maps is relatively
low, especially temporally. Photogrammetric measurement is another
important method for data acquisition. The data produced using this
method can have good spatial accuracy. However, the method is
relatively expensive, as it needs precision photogrammetric
instruments and well-trained professionals. Therefore, methods of
obtaining spatial data for GISs efficiently and precisely have
become a focus of photogrammetric research.
Feature Extraction
The automatic extraction of information from aerial photographs and satellite images is a major requirement of the new digital-based technology in photogrammetry. While a number of tasks such as DEM
and orthophoto determination can be achieved with a large degree of
automation, the extraction of linear and other features must still
be undertaken manually. Research described below aims to develop
methods of incorporating a greater level of automation in these
tasks.
Semi-automatic Feature Extraction
The semi-automatic method
for the extraction of linear features on remotely sensed images in
2D and 3D, is based on active contour models or 'snakes'. Snakes
are a method of interpolation by regular curves to represent linear
features on images. The initial feature extraction is achieved by
image processing operators, such as the Canny operator for single
edges, and morphological tools for narrow linear features of 1 or 2
pixels in width. The approach developed is semi-automatic, and
hence is assisted by an operator to locate a selection of points
along and near, but not necessarily exactly on the feature. The
iterative computation then locates the feature as closely as the
details in the image will allow by an optimisation process, based
on the definition of the snakes by cubic B-splines. The features
are extracted on single images by 2D snakes in terms of the local
image coordinates, or in 3 dimensions using overlapping images in
terms of 3D object coordinates. Tests of the method applied to
aerial photography and SPOT satellite images have been carried out
in terms of the accuracy of the extracted features and pull-in
range, for a range of features in 2 dimensions, and in 3 dimensions
in terms of their object coordinates derived from photogrammetric
measurements and from maps.
Automatic Feature Extraction
In contrast to semi-automatic methods, automatic road extraction aims at locating a road in images without input from an operator on its initial position. Locating a road in images automatically involves two tasks, i.e. recognition of a road and determination of its position. Recognizing a road in an image is much more difficult than determining its position, as it requires not only the information which can be
derived from the image, but also a priori knowledge about the
properties of a road and its relationships with other features in
the image and other related knowledge such as knowledge on the
imaging system. Due to the complexity of aerial images and
existence of image noise and disturbances, the information derived
from the image is always incomplete and ambiguous. This makes the
recognition process more complex. A knowledge-based method for
automatic road extraction from aerial images has been developed in
this laboratory. The method includes bottom-up hypothesis
generation of road segments and top-down verification of
hypothesized road segments. The generation of hypotheses starts
with low-level processing in which linear features are detected,
tracked and linked. The results of this step are numerous edge
segments. They are then grouped to form the structure of road
segments based on the general knowledge of a road, and the
generated structures of road segments are represented symbolically
in terms of geometric and radiometric attributes. Finally,
applying the knowledge stored in the knowledge base to the
generated road structures hypothesizes road segments. As hypotheses
of road segments are generated in a local context, ambiguity is
unavoidable. To remove spurious hypothesized road segments, all
hypotheses are checked in a global context using the topological
information of road networks, which is derived from low-resolution
images. The missing road segments are predicted using topological
information of road networks. This method has been applied to a
number of aerial images with encouraging results.
-
EXTRACTION OF POINTS
General Principles for Point Extraction
DEFINITION: Points are image objects whose geometric properties can be represented by only two coordinates (x, y). One can distinguish between several types of points. A circular symmetric point (CSP) is a local heterogeneity in the interior of a homogeneous image region. CSPs are too small to be extracted as regions (depending on the image scale) and are characterised by properties of circular symmetry (e.g., peaks, geodetic control point signals, manholes). CSPs can be interpreted as region attributes; they do not affect the image structure. Endpoints (start point or end point of a line), corners (intersection of two lines) and junctions (intersections of more than two lines) are used for the geometrical description of edges and region boundaries. Missing these points can have fatal consequences for the symbolic image description.
REPRESENTATION: The symbolic description of points can be given as a list containing geometric attributes (the coordinates), radiometric attributes (e.g., strength) and relational attributes (e.g., the edges intersecting at this point).
APPLICATIONS: Major applications for extracted image points are image-matching operations. Assuming that extracted points refer to significant points in the real world, we can look for the same real-world point in two images taken from different views. This technique is used for image orientation (PADERES et al. 1984) or DTM generation (e.g., KRZYSTEK 1991).
-
BASIC APPROACHES
Here we only review approaches that solely use the image data (one could also think of point extraction methods which determine junctions or intersections from already extracted contours). Three prominent methods are:
1. Point template matching
2. Corner detection based on properties of differential geometry
3. Point detection by local optimization
Deriving the point coordinates normally follows a three-step procedure: in the first step, point regions are selected by applying a threshold procedure; these are image regions inside which points are supposed to lie. In a subsequent step the best point pixels within these regions are selected; this operation could be referred to as thinning. An even more accurate determination of the point position can be derived by a least squares estimation (LSE), so in this step we look for the real-valued coordinates of the points.
Point Templates
One possibility to detect point regions is to define a point pattern (template),
which represents the point structure we are looking for. The main
idea of template matching is to find the places in the image where
the template fits best in the image. The similarity between the
template and the image can be evaluated by multiplication of the
template values with the underlying image intensities or by the
estimation of the correlation coefficients. Disadvantages of template matching in general are the limitation imposed by the number and types of templates, and the sensitivity to changes in scale and to image rotation (unless the templates are rotationally invariant).
Corner Detection by Curvature
Let us assume that the image data is stored in an image function g(r,c), where r refers to the row of the image and c to the column. Several approaches are based on
derivatives in the coordinates axes r and c. The sign of the
curvature can be used for the classification of the pixels and for
the detection of corners. An overview and evaluation of these
approaches can be found in (DERICHE AND GIRAUDON 1990).
Point Detection by Optimization
MORAVEC (1977) was the first to propose an approach aimed at detecting points which can be easily identified and matched in stereo pairs. He suggested measuring the suitability or interest of an image point by the estimation of the variances in a small window (4x4 or 8x8 pixels). This method is
used in many stereo matching algorithms and initiated further
investigations leading to the interest operators proposed by
PADERES et al. (1984) and FÖRSTNER AND GÜLCH (1987). Similar to the Moravec operator, the objective of these operators is the detection of adequate points (but with higher accuracy). Adequate points are those which meet the two criteria of (1) local distinctness (to increase geometric precision) and (2) global uniqueness (to decrease search complexity). As shown in the figure, the Förstner operator is able to detect different point types with the same algorithm and can be used either for image matching or image analysis approaches.
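Moravec's idea of measuring interest by directional variances can be sketched as follows. This is a simplified version under my own assumptions (four shift directions, sum of squared differences as the variance measure, the minimum over directions taken as the interest value), not a faithful reimplementation of the 1977 operator.

```python
import numpy as np

def moravec_interest(img, window=4):
    """Moravec-style interest measure: for each window position, the minimum
    over four shift directions of the sum of squared intensity differences.
    Homogeneous areas and edges score low in at least one direction; points
    that are locally distinct in every direction score high."""
    img = np.asarray(img, dtype=np.float64)
    rows, cols = img.shape
    out = np.zeros((rows, cols))
    shifts = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horizontal, vertical, diagonals
    w = window
    for r in range(1, rows - w - 1):
        for c in range(1, cols - w - 1):
            patch = img[r:r + w, c:c + w]
            ssds = []
            for dr, dc in shifts:
                shifted = img[r + dr:r + dr + w, c + dc:c + dc + w]
                ssds.append(((patch - shifted) ** 2).sum())
            out[r, c] = min(ssds)  # weakest direction decides the interest
    return out
```

Taking the minimum over directions is what distinguishes a point from an edge: an edge has low variance along its own direction, so it is rejected.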
-
Interest-operator in a 1-D case: Image matching can be reduced
to a one-dimensional problem, using the epipolar geometry of two
images. In this case the aim is to match two intensity profiles.
The effect of the interest operator in 1-D is identical to finding
the zero crossings of the Laplacian, neglecting saddle points of
the intensity function.
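The 1-D equivalence described above can be demonstrated on an intensity profile: find zero crossings of the second derivative while neglecting saddle-like points where the gradient is small. The threshold grad_min is an illustrative parameter of mine, not from the text.

```python
import numpy as np

def profile_interest_points(profile, grad_min=0.5):
    """Indices where the second difference changes sign (1-D Laplacian zero
    crossings) and the first difference is significant, so that saddle
    points (flat spots) of the intensity profile are neglected."""
    g = np.asarray(profile, dtype=np.float64)
    d1 = np.gradient(g)   # first derivative (central differences)
    d2 = np.gradient(d1)  # second derivative, the 1-D Laplacian
    points = []
    for i in range(1, len(g) - 1):
        if d2[i - 1] * d2[i + 1] < 0 and abs(d1[i]) >= grad_min:
            points.append(i)
    return points
```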
-
EXTRACTION OF EDGES
General Principles for Edge Extraction
DEFINITION: Referring to BALLARD AND BROWN (1982), ROSENFELD AND KAK (1982) and NALWA (1993), an edge is an image contour where a certain property like brightness, depth, color or texture (see Fig. 11a) changes abruptly perpendicular to the edge. Moreover, we assume that on each side of the edge the adjacent regions are homogeneous in this property. According to these characteristics, edges can be classified into two general types: step edges (edges) and bar edges (lines).
Edges represent boundaries between two regions. The regions have
two distinct (and approximately constant) pixel values; e.g., in an
aerial image two adjacent agricultural fields with different land
use. Lines either occur at a discontinuity in the orientation of
surfaces, or they are thin, elongated objects like streets in a
small-scale image. The latter may appear dark on bright background
or vice versa. When the scale is large the street appears as an
elongated 2-D region with edges on both sides. To avoid conflicts
in the symbolic image description it might be necessary to make an
explicit distinction between edges and lines.
REPRESENTATION: Edge extraction usually leads to an incomplete description of the image, i.e. edges do not build closed boundaries of homogeneous image regions. The types of representation of single edges are manifold, depending on the intended use. The symbolic description of edges
can be given, e.g. as a list, containing geometric, radiometric
(e.g. strength, contrast) and relational attributes (e.g. adjacent
regions, junctions, etc.). The geometric attributes depend on the
choice
of the approximation function (see step 5 below). For linear
edges it is sufficient to specify the start and endpoint.
APPLICATIONS: Contrary to points as image features, one can argue that a list of all edges in an image contains all the desired image information, but its representation is much more compact and easier for a computer to interpret. To support this statement, consider again the image in Figure 3a. Just by looking at the edges it is possible to recognize the object. If in addition each edge had stored the brightness of its left and right adjacent regions, the information would be even more complete. Another justification
could be based on information theory, COVER AND THOMAS (1991)
wrote: the less a certain structure can be found in an image, the
more unexpected it is. This means that an unexpected structure
contains much more information than a frequent one (like
homogeneous regions). Because of their importance, edges can be used to solve a broad range of problems, some of which are:
Relative orientation: Edge-based matching in stereo pairs is applied for relative orientation, e.g. LI AND SCHENK (1991) use curved edges.
Absolute orientation: Matching edges with wire-frame models of buildings can be used for absolute orientation.
Object recognition
and reconstruction: In many cases object models consist of
structural descriptions of object parts. Straight lines often bound
parts of man-made objects. The structural description based on edge extraction provides, besides its completeness, the highest geometrical accuracy. Models of the expected shape of object boundaries can
be involved easily in the process, e.g. searching for straight
lines. Therefore, extracting edges is widely used for object
recognition.
BASIC APPROACHES
Both edge types can be detected by the discontinuity in the image domain, and in the following we will make no distinction between these types as long as it makes no difference for the algorithm. Since the beginning of digital image
processing, edge detection has been an important and very active
research area. As a result, a lot of edge detectors have been
developed, which differ in the image or edge model they are based
on, the complexity, the flexibility and the performance. In
particular, the performance depends on 1) the quality of detection,
i.e. the probability of missing edges and yielding spurious edges
and 2) the accuracy of the edge location. Unfortunately both
criteria are conflicting.
-
Even a short description of all approaches is beyond the scope
here, so we only outline the principles by looking at the main
processing steps most edge detector algorithms have in common. A typical approach consists of five steps:
1. Extraction of edge regions: extraction of all pixels which probably belong to an edge. The result is elongated edge regions.
2. Extraction of edge pixels: extraction of the most probable edge pixels within the edge regions, reducing the regions to one-pixel-wide edge pixel chains.
3. Extraction of edge elements (edgels): estimating edge pixel attributes, e.g. real-valued position of the edge pixels, accuracy, strength, orientation, etc.
4. Extraction of streaks: aggregation or grouping of the edgels that belong to the same edge.
5. Extraction of edges: approximation of the streaks by a set of analytic functions, for example polygons.
In the following section the main objectives and the most common techniques of each step will be mentioned.
Edge Regions
The aim of this step is to extract all pixels from an input image which are likely to be edge pixels. The extraction could be
done by template matching, by parametric edge models or by
gradients. Starting from an image with the intensity function g,
the result is a binary image where all edge pixels are labeled. In
addition, iconic features, e.g. the edge magnitude and the edge
direction, of each edge pixel are extracted and stored as they are
required in subsequent steps. Template Matching: Edge templates are
patterns, which represent certain edge shapes. For each edge type
(different edge models, different edge directions, edge widths and
strengths) a special pattern is required. Operators can be found
e.g. in ROSENFELD AND KAK (1982). Gradient Operators (Difference
Operators): The main idea of these approaches is that in terms of
differential geometry the derivatives of an image intensity
function g can be used to detect edges, which is more general than
template matching procedures. The first step is to apply linear
filters (convolution) to obtain difference (slope) images. The
slope images represent the components of the gradient of g; from
these the edge direction and edge strength (magnitude) can be
calculated for each pixel.
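As an illustration of the gradient-operator step, the following sketch convolves an image with the Sobel difference operators (my choice of operator, the text does not prescribe one), derives magnitude and direction for each interior pixel, and thresholds the magnitude to label edge-region pixels.

```python
import numpy as np

def sobel_gradients(img):
    """Convolve with the Sobel difference operators and return the gradient
    magnitude and direction for each interior pixel (output is 2 smaller in
    each dimension because border pixels have no full 3x3 neighbourhood)."""
    g = np.asarray(img, dtype=np.float64)
    # Sobel responses via shifted slices: row below minus row above,
    # column right minus column left, with weights 1-2-1.
    gr = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]
          - g[:-2, :-2] - 2 * g[:-2, 1:-1] - g[:-2, 2:])   # derivative along rows
    gc = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]
          - g[:-2, :-2] - 2 * g[1:-1, :-2] - g[2:, :-2])   # derivative along columns
    magnitude = np.hypot(gr, gc)
    direction = np.arctan2(gr, gc)
    return magnitude, direction

def edge_region_pixels(img, threshold):
    """Binary mask of edge-region pixels: gradient magnitude above threshold."""
    magnitude, _ = sobel_gradients(img)
    return magnitude > threshold
```

The thresholding on the magnitude image is exactly the "heterogeneous vs homogeneous" split described in the next paragraph.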
-
The convolution of the image with one of the many known
difference operators is followed by a threshold procedure for
distinguishing between the heterogeneous image areas, i.e. pixels
with high gradients and the homogeneous area, i.e. pixels with low
gradients (see Sec. 2.3). All pixels above a certain threshold are
edge region pixels.
Parametric Edge Models: An example of a parametric solution for edge detection is Haralick's Facet Model (HARALICK AND WATSON 1981), which can be used either for edge detection or for extracting regions and points. The idea is to fit local parts of the image surface g by a first-order polynomial f (sloped planes or facets). Three parameters α, β and γ represent the facet f, which can be evaluated by least squares estimation. The model is given by g(r,c) = αr + βc + γ + n(r,c), where α and β are the slopes in the two coordinate axes r and c, γ is the altitude of the facet and n(r,c) the image noise. HARALICK AND SHAPIRO (1992) showed that the result of this approach is identical to the convolution with a difference operator. The classification of edge pixels is a function of the estimated slopes (α, β): if the slopes are greater than a given threshold and, in addition, the variances are small enough (to avoid noisy image areas, which are assumed to be horizontal), the pixel belongs to an edge region.
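The facet fit itself is a small least squares problem. A sketch of estimating (α, β, γ) for one window and classifying its central pixel by the slopes might look like this (the slope threshold is illustrative, not a value from the text):

```python
import numpy as np

def fit_facet(window):
    """Least squares fit of a sloped plane g(r,c) = alpha*r + beta*c + gamma
    to a square image window; returns the (alpha, beta, gamma) estimates."""
    w = np.asarray(window, dtype=np.float64)
    n = w.shape[0]
    rr, cc = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    # design matrix: one row [r, c, 1] per pixel
    A = np.column_stack([rr.ravel(), cc.ravel(), np.ones(n * n)])
    params, *_ = np.linalg.lstsq(A, w.ravel(), rcond=None)
    alpha, beta, gamma = params
    return alpha, beta, gamma

def is_edge_facet(window, slope_threshold):
    """Classify the window's pixel as edge-region if either slope of the
    fitted facet exceeds the threshold."""
    alpha, beta, _ = fit_facet(window)
    return max(abs(alpha), abs(beta)) > slope_threshold
```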
-
Edge Pixels
Due to low contrast, image noise, image smoothing, etc., the first step leads to edge regions which are possibly more than one pixel wide. The aim of this step is to thin the edge regions to one-pixel-wide edge chains. These pixels should
represent the real edges with highest probability. Assuming the
real edge is located in the mid-line (skeleton) of the edge
regions, thinning or skeleton algorithms can be applied. Obviously
these midlines of edge areas are not necessarily identical to the
real edges. To improve the accuracy of edge location, the
properties of the pixel like the gradient or the Laplacian may be
used for extracting the most probable location of the edges. This
can be done by the analysis of the local neighbourhood of each
pixel (non-maxima-suppression) or by global techniques (relaxation,
Hough transformation). The non-maxima-suppression is the most
widely used method.
Non-Maxima-Suppression: The process consists of two steps: 1) selection of the neighbour pixels in the gradient's direction, which have to be used for the comparison; 2) suppression of pixels which are found to have lower gradient magnitudes than their neighbours. An example for (1) is given by CANNY (1983): his algorithm is defined in an N8 neighbourhood (see Fig. 7 and Fig. 7a). Given an edge pixel (r,c) and its gradient direction g perpendicular to the edge e, the first step is the estimation of
the two points P1 and P2. The gradient magnitudes for P1 and P2 can
be approximated by a simple linear interpolation of the gradient
magnitude of the two adjacent pixels. The location of the edges
also can be determined analyzing the zero-crossings of the
Laplacian. One problem is, however, that zero crossings occur both
at the extreme of the gradient function and at saddle points of g.
The saddle points should be neglected.
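A minimal non-maxima-suppression sketch follows. Unlike Canny's interpolation scheme described above, it quantizes the gradient direction to the nearest N8 neighbour pair, which is a common shortcut of my own choosing rather than the method of the text.

```python
import numpy as np

def non_maxima_suppression(magnitude, direction):
    """Keep a pixel only if its gradient magnitude is not exceeded by its two
    neighbours along the (quantized) gradient direction. Simplified variant:
    the nearest N8 neighbours are used instead of interpolating between them."""
    mag = np.asarray(magnitude, dtype=np.float64)
    rows, cols = mag.shape
    out = np.zeros_like(mag)
    # neighbour offsets for directions quantized to 0, 45, 90, 135 degrees
    offsets = {0: (0, 1), 45: (1, 1), 90: (1, 0), 135: (1, -1)}
    ang = np.rad2deg(direction) % 180.0
    q = (np.round(ang / 45.0).astype(int) % 4) * 45
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dr, dc = offsets[int(q[r, c])]
            if mag[r, c] >= mag[r + dr, c + dc] and mag[r, c] >= mag[r - dr, c - dc]:
                out[r, c] = mag[r, c]  # local maximum along the gradient
    return out
```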
-
After the selection of the edge pixels by
non-maxima-suppression, the edge areas are in most cases reduced to
thin lines. Due to the discrete image raster and image noise, edge
regions might occur which are still more than one pixel wide. In
this case subsequent thinning is required.
Edge Elements The extraction of edgels is the first transition
stage from the edge pixels in the discrete image domain to the
symbolic description of the edge. This step contains the estimation
of properties of the edge pixels required for subsequent
interpretation processes (e.g. real-valued coordinates, contrast,
sharpness, strength, type) and which are stored as attributes of
the symbolic edge elements.
-
Edge Streaks
The next step is to group all edgels which belong to the same edge. One can say that now the real detection of the image feature edge happens, but the real edge is represented as a list of edgels. The aggregation of the edge elements can be done using local (edge tracking) or global techniques (Hough transformation, dynamic programming, heuristic search algorithms). The grouping process should ensure that each streak 1) consists of connected edgels, where each pixel pair is connected by a non-ambiguous pixel path, and 2) delineates at most two regions (usually edges delineate two regions, except dead-end lines or open edges, which are surrounded by the same region). To satisfy the second criterion we define a streak as an edge pixel chain between two edge pixels which are either end pixels and/or node pixels. According to the number of neighbours in an N8 neighbourhood we classify the pixels as node, line or end pixels, as shown in Fig. 9. Given the classification, the easiest aggregation method is an edge-following or edge-tracking algorithm: first one has to look for an unlabeled edge pixel, which means that this edge pixel does not yet belong to an edge. If one is found, all direct and indirect neighbours are tracked until an end or node pixel appears. All these collected edge pixels belong to one edge and are labeled with a unique edge number.
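The neighbour-count classification and the tracking loop can be sketched as follows. This is a simplified version of my own: the tracker labels whole connected chains rather than stopping at node pixels, so it only demonstrates the labeling idea.

```python
import numpy as np
from collections import deque

N8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def classify_pixel(mask, r, c):
    """Classify an edge pixel by its number of N8 neighbours:
    1 neighbour -> 'end', 2 -> 'line', otherwise -> 'node'."""
    n = sum(mask[r + dr, c + dc] for dr, dc in N8
            if 0 <= r + dr < mask.shape[0] and 0 <= c + dc < mask.shape[1])
    return "end" if n == 1 else ("line" if n == 2 else "node")

def track_edges(mask):
    """Simplified edge tracking: give every connected chain of edge pixels a
    unique label by following all direct and indirect N8 neighbours."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    next_label = 0
    for r in range(mask.shape[0]):
        for c in range(mask.shape[1]):
            if mask[r, c] and labels[r, c] == 0:
                next_label += 1            # start a new edge
                queue = deque([(r, c)])
                labels[r, c] = next_label
                while queue:
                    pr, pc = queue.popleft()
                    for dr, dc in N8:
                        nr, nc = pr + dr, pc + dc
                        if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                                and mask[nr, nc] and labels[nr, nc] == 0):
                            labels[nr, nc] = next_label
                            queue.append((nr, nc))
    return labels, next_label
```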
-
Edge Approximation Up to now, the extracted streaks are still
defined in the discrete image model as they are represented by a
set of connected edge elements. Thus, for deriving a symbolic
description of the edges a last processing step is required. This
step is very important since the representation domain changes from
the discrete image raster to a continuous image model, the
plane.
It is not obvious how to approximate a list of edgels by an analytic representation. For example, one could apply curve-fitting techniques like splines, Bézier curves, Fourier series, etc. This may give smooth curves and probably better visual results, but it would be unnecessary effort if one only looks for straight lines. Furthermore, a polygon as a set of straight lines can also approximate a curved edge. As usual, the choice of the approximation depends on what the application requires. Here we look at straight-line fitting. Approximation by
Straight Lines: For the approximation of the edges by straight
lines many different approaches are possible like merging,
splitting or split and merge algorithms. The critical point is to
find the breakpoints or corners, which lead to the best
approximation. The merging algorithm sequentially follows an edge
and considers each pixel to belong to a straight line as long as it
fits the line. If the current pixel does not fit anymore, the line
ends and a new breakpoint is established. A disadvantage of this
approach is its dependency on the merging order: starting from the
other end of the edge would probably lead to different breakpoints.
Splitting algorithms recursively divide the edges into (usually) two parts, until the parts fulfill some fitting condition. Consider an edge consisting of a sequence of edge pixels P1, P2, ..., Pn; the end points P1 and Pn are joined by an arc (chord). For each pixel of the edge, the distance to that arc is calculated. If the maximum distance is larger than a given threshold, the edge segment is
divided into two new segments at the position where the maximum
distance was found. It is possible to combine the advantages of the merging and splitting methods by developing a split-and-merge algorithm: first we split, and then we do a merging step by
grouping lines if the new line fits the streak well enough, see
Fig. 10. The accuracy of the symbolic description, i.e. of the
edge parameters, can be improved by applying a least-squares
estimation that takes all edgels belonging to one edge into
account. The observation values are given by the real-valued
coordinates (x_i, y_i) of each edgel, and the weights are defined
by e.g. the squared gradient magnitude. The covariance matrix of
the estimated edge parameters contains the accuracy of the edge.
Thus, the uncertainty of the discrete image information is
preserved in the accuracy of the edges, which can be important for
the image interpretation processes.
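The recursive splitting step described above is essentially the classical Ramer-Douglas-Peucker scheme. A minimal sketch in Python (the function names, the sample edgel chain and the threshold value are illustrative assumptions):

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    # Cross product gives twice the triangle area; divide by base length.
    return abs(dx * (ay - py) - dy * (ax - px)) / length

def split_edge(edgels, threshold):
    """Recursively split a chain of edgels into straight segments.
    Returns the breakpoints (including both end points)."""
    a, b = edgels[0], edgels[-1]
    # Find the edgel with the maximum distance to the chord a-b.
    dists = [point_line_distance(p, a, b) for p in edgels]
    k = max(range(len(dists)), key=dists.__getitem__)
    if dists[k] <= threshold:
        return [a, b]                       # chord fits: one straight line
    left = split_edge(edgels[:k + 1], threshold)
    right = split_edge(edgels[k:], threshold)
    return left[:-1] + right                # avoid duplicating the breakpoint

chain = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
print(split_edge(chain, threshold=1.0))
```

A subsequent merging step would then join consecutive segments whose combined fit still stays below the threshold, as in the split-and-merge strategy above.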
-
Extraction of Regions
General Principles for Region Extraction
DEFINITION Regions are image areas which fulfill a certain
similarity criterion; we call such regions blobs. A similarity or
homogeneity criterion could be the intensity value of the image
pixels or some texture property of the surrounding area of a
pixel. The result of such a region extraction should divide, or
segment, the image into a number of blobs. Ideally the union of
these blobs gives the image again. The regions themselves should
be connected and bounded by simple lines.
REPRESENTATION Depending on the strategy of the region extraction,
we distinguish between different segmentation results. Incomplete
segmentation: The image is first divided into homogeneous and
heterogeneous areas. The latter (we call those areas background)
do not fulfill the homogeneity criterion and therefore do not
fulfill the above definition exactly. Complete segmentation: The
image is completely divided into regions fulfilling the definition
as given above for the discrete image, too. That might lead to
conflicting topologies of the image regions, depending on the
definition of the neighborhood (N8 or N4) (see PAVLIDIS 1977), but
also to inaccurate region boundaries, depending on the cost of the
approach. The final symbolic representation of blobs consists
of geometric, radiometric and relational attributes. A blob itself
can be represented by its boundaries (if the blob contains holes,
the blob has more than one boundary) or by a list of pixels inside
the blob. Blob boundaries define the location of the blob.
Representing blob boundaries is equivalent to representing image
edges. Geometric attributes of blobs are size, shape, center of
gravity, mean direction, etc. Algorithms for extracting these
attributes can be found in the literature, particularly in the
field of binary image analysis. Radiometric attributes are e.g.
the mean intensity within the blob, the variance of the
intensities, and texture parameters. Lists of adjacent blobs,
mutual boundaries, junctions and corners are examples of
relational attributes.
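As an illustration, simple geometric attributes such as size and center of gravity can be computed directly from a blob's pixel list; a small sketch (representing the blob as a list of (row, col) pixels is an assumption):

```python
def blob_attributes(pixels):
    """Compute simple geometric attributes of a blob given as a
    list of (row, col) pixel coordinates."""
    n = len(pixels)                      # size = number of pixels
    cr = sum(r for r, _ in pixels) / n   # center of gravity, row
    cc = sum(c for _, c in pixels) / n   # center of gravity, column
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    bbox = (min(rows), min(cols), max(rows), max(cols))
    return {"size": n, "centroid": (cr, cc), "bounding_box": bbox}

blob = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(blob_attributes(blob))
```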
-
APPLICATIONS Region information has the advantage that it covers
geometrically large parts of the image. Therefore it can be used
for several applications like compression or interpretation tasks.
Data compression: Grouping all pixels which are connected in image
space and have similar properties into one object (i.e. the blob),
and representing the object by characteristic attributes, reduces
the amount of data and the redundancy of information. Analysing
range images: Region-based segmentation algorithms were found to
be more robust when analysing range images. Binary image analysis:
In many cases region extraction is a prerequisite for binary image
analysis, widely used in industrial applications. High-level image
interpretation: In many cases object models consist of the
structural description of object parts, where the interior of each
part is assumed to have similar surface and reflectance
properties. Therefore, extracting blobs and their attributes is
quite useful
for object recognition. BASIC APPROACHES Given a digital image with
a discrete image function, region extraction is the process of
grouping pixels to regions according to connectivity and similarity
(homogeneity). The large number of region extraction methods can
be classified in several ways. One possibility is to separate the
methods by the number of pixels used for the grouping decision;
the methods are accordingly called local or global techniques.
Furthermore, we distinguish the methods depending on where the
grouping is done. In the first approach the grouping process is
defined in the image domain: the decision whether connected pixels
can be merged or should be split is made directly by analysing the
properties of adjacent pixels. Thus, both the similarity and the
connectivity are considered in one processing step. Examples of
this type are region growing or region merging, region splitting,
and split-and-merge algorithms.
The second approach applies the similarity and connectivity
evaluation in two separate steps. First, the discriminating
properties of the pixels of the entire image are analysed and the
result is used to define several classes of objects; this is done
outside the image raster by storing all pixel properties in a
so-called measurement space (e.g. a histogram). Examples are
thresholding and cluster techniques. Then the definition of the
classes can be used to classify the pixels: going back to the
image domain, each pixel is labeled with the identity number of
its class. In the second step, pixels of the same class which are
also connected in image space are grouped into homogeneous
regions. Connected-components algorithms can easily do this. In
the
following a short overview is given on thresholding techniques,
region growing/merging and split and merge approaches. An overview
on further region-based segmentation techniques can be found in
(HARALICK AND SHAPIRO 1985) or (ZUCKER 1976).
Thresholding Techniques
Thresholding techniques consist of four steps (steps 1 and 2 are
not necessary when the thresholds are known in advance):
1. Determination of the histogram.
2. Choosing the thresholds: The choice of the thresholds is the
most sensitive and the most difficult step. Unfortunately, it is
not always the case that the peaks of the histogram (there may be
more than two) are clearly separated by valleys. Also, the
histogram often contains many local valleys which are probably not
interesting. A survey of several techniques for estimating
thresholds automatically can be found in (HARALICK AND SHAPIRO
1992).
3. Labeling or classification of the pixels: Once the thresholds
are determined, the pixels can be classified easily. The result of
the labeling process can be called a segmented image, because the
labels are associated with object classes.
4. Extraction of blobs by connected components: This processing
step performs the change from single pixels to blobs. Pixels that
are labeled with the same number must be connected by at least one
pixel path whose pixels all carry the same label. The connectivity
can be defined in an N8 or N4 neighborhood. Connected-components
algorithms are usually defined on binary images. A description and
comparison can be found e.g. in (HARALICK AND SHAPIRO 1992) and
(ROSENFELD AND KAK 1982). After this step, every pixel is labeled
with a value, where the value is associated with the number of the
blob the pixel belongs to.
Thresholding techniques work well and fast if the objects that
have to be recognised or analysed are not too complex, which is
the case for many industrial applications. The main problem is the
automatic estimation of the thresholds. Even when the peaks are
well separated, thresholding may not lead to accurate regions.
Moreover, it may produce holes and ragged boundaries, because the
similarity grouping is performed in the measurement space and not
in the image domain. In this sense, thresholding techniques may
not fulfill the criteria of a good region extraction method.
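For a threshold known in advance, the labeling and connected-components steps can be sketched in pure Python; the tiny test image, the single fixed threshold, and the N4 flood fill are illustrative assumptions:

```python
from collections import deque

def threshold_and_label(image, threshold):
    """Label pixels by class (step 3), then group same-class,
    N4-connected pixels into blobs via connected components (step 4)."""
    rows, cols = len(image), len(image[0])
    cls = [[1 if image[r][c] >= threshold else 0 for c in range(cols)]
           for r in range(rows)]
    blob = [[0] * cols for _ in range(rows)]   # 0 = not yet assigned
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if blob[r][c]:
                continue
            next_id += 1
            queue = deque([(r, c)])
            blob[r][c] = next_id
            while queue:                        # flood fill one blob
                y, x = queue.popleft()
                for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not blob[ny][nx]
                            and cls[ny][nx] == cls[y][x]):
                        blob[ny][nx] = next_id
                        queue.append((ny, nx))
    return blob

image = [[10, 12, 90, 91],
         [11, 13, 92, 90],
         [95, 11, 10, 12]]
print(threshold_and_label(image, threshold=50))
```

After this step every pixel carries the number of the blob it belongs to, matching the description above.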
-
Region Growing / Region Merging
As the name indicates, region growing and region merging methods
follow a bottom-up approach: starting from a single pixel or a
small region (the seed or seed region), the region extraction is
done by appending to the expanding region all adjacent pixels
which fulfill a certain similarity criterion. If the image
consists of more than one region (as is normally the case), a
separate region-growing process is required for each of them,
which can be done sequentially or in parallel. The process
consists of the following steps:
Determination of the seeds: The determination of the seeds must
ensure that every region which has to be extracted contains at
least one starting point. In case the number and positions of the
seeds are known in advance, region growing can be applied. If the
seeds are not given, they may be defined by each pixel of the
image raster; in this case a region merging procedure is required.
However, the subsequent region-growing step will probably produce
many small adjacent regions which are not significantly different
from each other, so further processing steps are required to merge
as many regions as possible, if they are considered similar
enough. Region growing starts at the seeds and stops when all
pixels are labeled. Following HARALICK AND SHAPIRO 1985, region
growing techniques can be distinguished by the number of pixels
involved in the grouping decision, i.e. in the evaluation of the
homogeneity. In the easiest case the growing algorithm consists of
just the comparison of two adjacent pixels. It is obvious that the
result is very sensitive to noisy data. Less sensitivity to noise
can be obtained by investigating not only the pixel properties
themselves, but a mean property of the local neighbourhood or the
properties of already extracted regions. Local neighbourhood
properties are e.g. mean values and variances, but also gradients
or Laplacians; the latter are also used for many edge detectors.
Using gradients or Laplacians, edges and regions can be extracted
by the same operator, which directly takes the duality of regions
and edges into account. Combinations of different techniques
provide further improvements by consistently using their positive
properties. Criteria are the accuracy of the region boundaries,
the ability to decide whether regions are significantly different,
the ability to place boundaries in areas of weak contrast, and the
robustness to noisy data. Region Merging: Assuming the image area
is completely partitioned into regions, the aim is to merge
adjacent regions which are not significantly different. The main
problem of region extraction by region growing algorithms is the
question of the merging order. Except for methods working in a
highly parallel manner (e.g. relaxation techniques), the result
depends on which region was extracted first and which of the
adjacent pixels or regions are considered first (usually more than
one neighbour fulfils the homogeneity criterion). The
determination of the best merging candidate is a time-consuming
search problem and is difficult to solve. Less complex approaches
consist of well (and locally) defined merging rules.
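A minimal seeded region-growing sketch, comparing each candidate pixel against the mean of the already extracted region; the test image, seed position and tolerance are illustrative assumptions:

```python
from collections import deque

def grow_region(image, seed, tolerance):
    """Grow a region from a seed pixel, accepting N4-adjacent pixels
    whose value is within `tolerance` of the current region mean."""
    rows, cols = len(image), len(image[0])
    region = {seed}
    total = float(image[seed[0]][seed[1]])
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        mean = total / len(region)          # property of the grown region
        for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
            if (0 <= ny < rows and 0 <= nx < cols
                    and (ny, nx) not in region
                    and abs(image[ny][nx] - mean) <= tolerance):
                region.add((ny, nx))
                total += image[ny][nx]
                queue.append((ny, nx))
    return region

image = [[10, 11, 50],
         [12, 13, 52],
         [55, 51, 53]]
print(sorted(grow_region(image, seed=(0, 0), tolerance=5)))
```

Comparing against the region mean rather than a single adjacent pixel corresponds to the less noise-sensitive variants discussed above.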
Split and Merge
The splitting algorithm is a process of dividing the image area
successively into sub-areas until the sub-areas satisfy a certain
homogeneity criterion. To improve efficiency, the partitioning
into sub-areas can be done regularly, i.e. by partitioning the
still inhomogeneous areas into quarters. This regularity causes
square, artificial and also inaccurate boundaries. To cope with
this problem, combinations of split and merge were developed: the
strategy starts from any given partition. Adjacent regions are
merged if the result
is homogeneous; single regions are split if they do not meet the
homogeneity criterion. The process continues until no more merging
or splitting can be done. A further advantage of this method is
that it is faster than a single splitting or merging process.
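The regular quartering step can be sketched as a recursive quadtree split; the homogeneity criterion (maximum grey value range), the square test image, and the block representation are illustrative assumptions:

```python
def split(image, top, left, size, max_range):
    """Recursively split a square image block into quarters until each
    block is homogeneous (value range <= max_range). Returns the
    homogeneous leaf blocks as (top, left, size) tuples."""
    values = [image[r][c]
              for r in range(top, top + size)
              for c in range(left, left + size)]
    if size == 1 or max(values) - min(values) <= max_range:
        return [(top, left, size)]          # homogeneous: stop splitting
    half = size // 2
    blocks = []
    for dr in (0, half):
        for dc in (0, half):
            blocks += split(image, top + dr, left + dc, half, max_range)
    return blocks

image = [[10, 10, 80, 82],
         [10, 10, 81, 80],
         [10, 11, 10, 10],
         [11, 10, 10, 11]]
print(split(image, 0, 0, 4, max_range=5))
```

A subsequent merge step would join adjacent leaf blocks whose combined value range still satisfies the criterion, removing the square artificial boundaries; it is omitted here for brevity.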
Drawbacks The independent application of the techniques presented
here reveals a number of drawbacks:
Techniques aiming at complete partitioning of the image area
like region-based approaches lead to uncertain or even artificial
boundaries.
Region-based techniques are conceptually unable to incorporate
mid-level knowledge such as the straightness of the boundaries.
Edge-based techniques normally cannot guarantee closed
boundaries and thus do not lead to a complete partitioning. Edges
are likely to be broken or may not represent the boundaries of the
regions (spurious edges) because of image noise.
Corner detectors usually don't work at junctions. All point
detectors have difficulties at smooth corners.
The models used are either wrong or at least not adaptive to the
local image content (e.g. edge detection at junctions).
To avoid inconsistencies, all three feature types could be
extracted simultaneously and thereby embedded in the same model. A
complete feature extraction using points, lines, regions, and
their relations leads to a richer and also topologically
consistent description of the image. Such an integrated approach
(polymorphic feature extraction) is addressed in (LANG and
FORSTNER 1996).
-
Expert System for Information Extraction
As mentioned above, high-resolution imagery from both aerial and
spaceborne sensors
provides a challenge to the user community in terms of information
extraction. The human eye and brain can identify objects in the
image but the computer finds it difficult. If we cannot automate
this process, then we will most certainly lose out on some of the
major economic benefits of the imagery. If the human brain can do
it, why can't the computer? Well, it actually can, if it uses
rule- or knowledge-based processing, just as the human brain does.
The brain can make a decision on an image very quickly by
understanding and using context. If we see grassland in the center
of an urban
development, we can easily decide that it is a park, as opposed to
agricultural land. To make this decision we are using knowledge and
experience to create expertise and computer based expert systems
are beginning to emerge that mimic this process. For many years,
expert systems have been used successfully for medical diagnoses
and various information technology (IT) applications but only
recently have they been applied successfully to GIS applications.
Statistical image processing routines, such as maximum likelihood
and ISODATA classifiers, work extremely well at performing
pixel-by-pixel analyses of images to identify land-cover types by
common spectral signature. Expert-system technology takes the
classification concept a giant step further by analyzing and
identifying features based on spatial relationships with other
features and their context within an image. Expert systems contain
sets of decision rules that examine spatial relationships and image
context. These rules are structured like tree branches with
questions, conditions and hypotheses that must be answered or
satisfied. Each answer directs the analysis down a different branch
to another set of questions.
The beauty of an expert system is that because true experts,
such as foresters or geologists, create the rules, also called a
knowledge base, non-experts can use the system successfully. In
terms of satellite images, the knowledge base identifies features
by applying questions and hypotheses that examine pixel values,
relationships with other features and spatial conditions, such as
altitude, slope, aspect and shape. Most importantly, the knowledge
base can accept inputs of multiple data types, such as digital
elevation models, digital maps, GIS layers and other pre-processed
thematic satellite images, to make the necessary assessments.
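The branching decision rules described above can be sketched as a tiny rule base; the classes, attribute names and thresholds below are purely illustrative assumptions, not part of any real expert system:

```python
def classify(region):
    """Toy knowledge base: classify a land-cover region using its
    spectral label plus spatial context, mimicking an expert's rules."""
    if region["cover"] == "grass":
        # Context rule: grass surrounded by urban development is a park.
        if region["surrounded_by"] == "urban":
            return "park"
        return "agricultural land"
    if region["cover"] == "water":
        # Shape rule: long, thin water bodies are likely rivers.
        if region["elongation"] > 5.0:
            return "river"
        return "lake"
    return "unclassified"

print(classify({"cover": "grass", "surrounded_by": "urban"}))
print(classify({"cover": "water", "elongation": 8.2}))
```

Each answered question directs the analysis down a different branch, exactly as in the tree-structured rules described in the text.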
-
Automatic Information Extraction
In recent years it has become clear that most of the value of
Geographic Information Systems lies in their data, rather than in
their hard- or software. For data to be valuable they need to be
up-to-date in terms of completeness, consistency, and accuracy.
Mapping is often posed as an end-to-end process where new source
imagery is collected to meet certain project specifications and
the entire compilation process is performed using a homogeneous
set of spatially and temporally consistent data sources. In
contrast, other mapping applications require the ability to
perform incremental updates of existing spatial databases from a
variety of disparate sources. Thus, a timely revision of GIS
databases plays a major role in the overall process of acquiring,
manipulating, analyzing, and presenting topographic data.
Besides techniques like the digitization of large maps and
terrestrial surveys, photogrammetry seems to be especially well
suited for generating or updating GIS databases, since it has
already had a major impact in traditional map updating. Digital
photogrammetry based on digital images has the potential to further
increase this impact, mainly due to the possibility to at least
partly automate and thus speed up the generation and/or the
revision process.
Except for the manual or semi-automatic measurement of ground
control points, almost all steps are automated, but frequently
some manual post-editing is required. Image matching, for example,
still has problems in built-up areas and limitations in forest
areas. No robust solutions for break-line detection or object
extraction in these data exist so far. Degree of automation: The
automated extraction of 3D objects like buildings, roads, bridges,
street furniture or vegetation is not yet widely used in practice,
which is mainly due to considerable technical problems. In order
to solve the
object extraction task, methods of image understanding and image
interpretation are applied. Keywords are image segmentation, object
modeling and information fusion in order to detect and reconstruct
3-D objects from 2-D images. Major research efforts are currently
put on the extraction of man-made structures like roads and
buildings from digital aerial imagery and from space imagery. The
approaches range from manual methods, to semi-automated and
automated feature extraction methods from single and multiple image
frames. New developments on high-resolution space sensors might
allow medium and large scale mapping from space. Linear objects
like roads, railroads or river networks have long attracted
researchers, but due to the limited resolution of space imagery
they could not successfully be extracted for mapping at medium or
large scales. With new high-resolution sensors providing ground
sampling distances of 1 m and better, compared to the earlier
1 m-5 m, the possibilities to extract linear objects have
increased dramatically.
-
Schenk (Schenk, 1999) proposes the expression autonomous for a
system that can perform without human interaction. Even those
systems which are called automatic (like automatic DEM generation)
are not purely automatic, as they solve the task only up to a
certain percentage of errors. Extending this, Heuel (Heuel, 2000)
proposes to classify the degree of automation of systems using the
terms quantitative and qualitative interaction: methods are
defined as automatic if only simple yes/no decisions or a
selection of alternatives, i.e. qualitative interaction, are
needed; they are regarded as semi-automatic if both qualitative
decisions and quantitative input parameters are needed. We need to
initialize the
extraction process, we might need to interact during run-time and
we certainly need to validate or correct the results. The less
interaction we need, the higher is the degree of automation. We
expect from the integration of automatic processes, that the
overall efficiency of the system is increased, but we know, that
those processes can give erroneous results, which are costly for
the user and thus may decrease the efficiency of the system. We may
want to reduce the level of training by avoiding complexity and
skill requirements in decision-making, but we also want to reduce
the number of manual actions in the collection phase. Here we
should not only refer to the amount of human interaction, measured
in time and number of mouse operations, but also to the type of
interaction needed. We certainly have to select parameters
according to the task we want to solve and the data which are
available; this is valid for all systems. We need to give the
image numbers of overlapping photographs, we need to define the
units (m or feet), or we need to give the type of features
searched for, such as buildings and/or roads. We have to provide
instructions on how to collect buildings in an interactive system,
or we need to give a set of building models and some min-max
values if we want to extract them automatically. If we need to get
more deeply involved in the algorithms, we might need to give
thresholds and steering parameters (window sizes, minimal angle
difference, minimal line length in the image, etc.), which are not
always interpretable; sometimes it is difficult to connect them to
the task and image material. This also holds for some stopping
criteria of the algorithms, like the maximal number of iterations.
Also the type of post-editing can vary.
We might need to correct single vertex or corner points, or the
topology of whole structures, or we need to check manually for
completeness. Summarizing the above statements, we propose the
following scheme, ranging from an interactive system, where we can
solve all required tasks, over a semi-automatic system, where we
interact during the measurement phase, and an automated system,
where the interaction is focused at the beginning and the end of
the automatic process, to an autonomous system, which is beyond
the horizon right now.
1. Interactive system (purely manual measurement, no automation
for any measurement task).
-
2. Semiautomatic system (interactive environment and integration
of automatic modules in the workflow).
3. Automated system (interactive environment with interaction
before and after the automatic phase).
4. Autonomous system.
-
Cartographic Feature Extraction
Of all tasks in photogrammetry the extraction of cartographic
features is the most time consuming. Since the introduction of
digital photogrammetry much attention therefore has been paid to
the development of tools for a more efficient acquisition of
cartographic features. Fully automatic acquisition of features like
roads and buildings, however, appears to be very difficult and may
even be impossible. The extraction of cartographic features from
digital aerial imagery requires interpretation of this imagery. The
knowledge one needs about the topographic objects and their
appearance in aerial images in order to recognize these objects and
extract the relevant object outlines is difficult to model and to
implement in computer algorithms. Therefore, only limited success
has been obtained in developing automatic cartographic feature
extraction procedures. Human operators appear to be indispensable
for a reliable interpretation of aerial images. Still, computer
algorithms can contribute significantly to the improvement of the
efficiency of feature extraction from aerial imagery. Whereas human
operators are better in interpretation, computer algorithms often
outperform operators in case of specific measurement tasks.
So-called semi-automatic procedures therefore combine the
interpretation skills of the operator with the measurement speed of
a computer. This paper reviews the most common strategies for
semi-automatic cartographic feature extraction from aerial imagery.
In several strategies, knowledge about the features to be
extracted can easily be incorporated into the measurement part
performed by a computer algorithm. Some examples of the usage of
such knowledge will be described in the discussion at the end of
this paper.
Semi-automatic feature extraction Semi-automatic feature extraction
is an interactive process between an operator and one or more
computer algorithms. To initiate the process, the operator
interprets the image and decides which features are to be measured
and which algorithms are to be used for this task. By positioning
the mouse cursor the approximate location of a feature is pointed
out to the algorithm. If required, the operator may also tune some
of the algorithm's parameters and select an object model for the
current feature. Semi-automatic feature extraction algorithms have
been developed for measuring primitive features such as points,
lines and regions, but also for more complex, often parameterized,
objects.
Extraction of points: Semi-automatic measurement of points is
used for measuring height points as well as for measuring specific
object corners. The first case is usually known as a cursor on the
ground utility, which is available in several commercial digital
photogrammetric workstations. Here, the operator positions the
cursor at some XY-position in a stereoscopic model, whereas the
terrain height at this position is determined automatically by
matching patches of the stereo image pair. After this determination
the 3D
cursor snaps to the local terrain surface. Thus, the operator is
relieved from a precise stereoscopic measurement and can therefore
increase the speed of data acquisition. The second type of point
measurement algorithms is used to make the cursor snap to a
specific object corner. These algorithms can be used for
monoplotting as well as for stereoplotting. For monoplotting, the
operator approximately indicates the location of an object corner
to be measured. The image patch around this approximate point will
usually contain grey value gradients caused by the edges of the
object. By applying an interest operator (see e.g. [Förstner and
Gülch, 1987]) to this patch, the location of the object corner can be
determined. Thus, such utilities can make the cursor snap to the
nearest point of interest. When using the same principle for
stereoplotting, the operator has to supply an approximate 3D
position of the object corner. The interest operator can then be
applied to both stereo images, whereas the estimated 3D corner
position will be constrained by the known epipolar geometry. For
the measurement of house roof corners, this procedure was reported
to double the speed of data acquisition and reduce the operator
fatigue [Firestone et al., 1996].
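The snapping step can be illustrated with a simple gradient-based interest measure: within a window around the operator's click, a Harris-style cornerness is computed from the structure tensor and the cursor moves to its maximum. The synthetic image, window size, and constant k are illustrative assumptions; the Förstner operator itself uses a related but different measure:

```python
def snap_to_corner(image, click, half=2, k=0.04):
    """Move an approximate click position to the strongest corner in a
    (2*half+1)^2 search window, using a Harris-style cornerness
    computed from central-difference gradients summed over 3x3."""
    rows, cols = len(image), len(image[0])

    def grad(r, c):
        gx = (image[r][min(c+1, cols-1)] - image[r][max(c-1, 0)]) / 2.0
        gy = (image[min(r+1, rows-1)][c] - image[max(r-1, 0)][c]) / 2.0
        return gx, gy

    def cornerness(r, c):
        # Structure tensor accumulated over a 3x3 neighbourhood.
        sxx = sxy = syy = 0.0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                gx, gy = grad(min(max(r + dr, 0), rows - 1),
                              min(max(c + dc, 0), cols - 1))
                sxx += gx * gx; sxy += gx * gy; syy += gy * gy
        det = sxx * syy - sxy * sxy
        trace = sxx + syy
        return det - k * trace * trace      # Harris response

    r0, c0 = click
    candidates = [(r, c)
                  for r in range(max(r0 - half, 0), min(r0 + half + 1, rows))
                  for c in range(max(c0 - half, 0), min(c0 + half + 1, cols))]
    return max(candidates, key=lambda p: cornerness(*p))

# Synthetic image with a bright square whose corner lies at (4, 4).
img = [[100 if (r >= 4 and c >= 4) else 0 for c in range(9)] for r in range(9)]
print(snap_to_corner(img, click=(3, 3)))
```

The approximate click at (3, 3) is pulled to the true corner of the bright square, which is the behaviour the snapping utilities described above provide.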
Extraction of lines: The extraction of lines from digital
images has been a topic of research for many years in the area of
computer vision [Rosenfeld, 1969, Hueckel, 1971, Davis, 1975,
Canny, 1986]. First attempts to extract linear features from
digital aerial and space imagery were reported in [Bajcsy and
Tavakoli, 1976, Nagao and Matsuyama, 1980]. Semi-automatic
algorithms have been developed for the extraction of roads. These
algorithms can be classified into two categories: algorithms using
deformable templates and road trackers.
-
Deformable templates: Before starting an algorithm using
deformable templates, the operator needs to provide the
approximate outline of the road. This initial template of the road
is usually represented by a polygon with a few nodes near to the
road to be measured. The task of the algorithm is to refine the
initial template to a new polygon with many more nodes that
accurately outline the road edges or the road centre (depending on
the road model used). This is achieved by deforming the template
such that a combination of two criteria is optimised: the template
should coincide with image pixels with high grey value gradients,
and the shape of the template should be relatively smooth. The
latter criterion is often accomplished by constraining the (first
and) second derivatives of the template. This constraint is needed
for regularisation but also leads to more likely outline results,
since road shapes generally are quite smooth. Most algorithms of
this kind are based on so-called snakes [Kass et al., 1988]. The
snakes approach uses an energy function in which the two
optimisation objectives are combined. After computing the energy
gradients due to changes in the positions of the polygon nodes the
optimal direction for the template deformation can be found by
solving a set of differential equations. In an iterative process
the polygon nodes are shifted in this optimal direction. The
resulting behaviour of the template looks like that of a moving
snake, hence the name. Whereas snakes were initially formulated for
optimally outlining linear features in a single image, they can
also be used to outline a feature in 3D object space by combining
grey value gradients from multiple images together with the
exterior orientation of these images [Trinder and Li, 1995,
Neuenschwander et al., 1995]. This snakes approach has also been
extended to outline both sides of a road simultaneously. More
research is conducted to further improve the efficiency of mapping
with snakes by reducing the requirements on the precision of the
initial template provided by the operator and by incorporating
scene knowledge into the template deformation process
[Neuenschwander et al., 1995, Fua, 1996].
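A minimal greedy variant of the snake idea can illustrate the two competing criteria: each interior node moves to the neighbouring pixel that best trades off image gradient against smoothness. The weights, the synthetic gradient image, and the greedy update (instead of solving differential equations, as real snakes do) are illustrative assumptions:

```python
def greedy_snake(grad_mag, nodes, alpha=1.0, steps=20):
    """Iteratively move polygon nodes to maximise image gradient while
    keeping the polyline smooth (penalising distance from the midpoint
    of the two neighbouring nodes). Interior nodes only; ends fixed."""
    nodes = list(nodes)
    rows, cols = len(grad_mag), len(grad_mag[0])
    for _ in range(steps):
        for i in range(1, len(nodes) - 1):
            (pr, pc), (nr_, nc_) = nodes[i - 1], nodes[i + 1]
            mid = ((pr + nr_) / 2.0, (pc + nc_) / 2.0)
            best, best_e = nodes[i], None
            r0, c0 = nodes[i]
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    r, c = r0 + dr, c0 + dc
                    if not (0 <= r < rows and 0 <= c < cols):
                        continue
                    smooth = (r - mid[0]) ** 2 + (c - mid[1]) ** 2
                    e = -grad_mag[r][c] + alpha * smooth   # energy to minimise
                    if best_e is None or e < best_e:
                        best, best_e = (r, c), e
            nodes[i] = best
    return nodes

# Synthetic gradient image: a strong edge along row 5.
grad = [[10.0 if r == 5 else 0.0 for c in range(10)] for r in range(10)]
nodes = [(5, 0), (3, 3), (4, 6), (5, 9)]   # rough initial template
print(greedy_snake(grad, nodes))
```

The rough initial polygon is pulled onto the high-gradient edge while staying smooth, mimicking the crawling behaviour that gave snakes their name.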
-
Road trackers In the case of snakes, the operator needs to
provide a rough outline of the complete road to be measured. In
contrast, the input for road trackers only consists of a small road
segment outlined by the operator. The purpose of the road tracker
is then to find the adjacent parts of the road. Most road trackers
are based on matching grey value profiles [McKeown and Denlinger,
1988, Quam and Strat, 1991, Vosselman and Knecht, 1995]. Based on
the initial road segment outlined by the operator, a characteristic
grey value profile of the road is derived. Furthermore, the local
direction and curvature of the road is estimated. This estimation
is used to predict the position of the road at some step size after
the initial road segment. At this position and perpendicular to the
predicted road direction at this position a grey value profile is
extracted from the image. By matching this profile with the
characteristic road profile a shift between the two profiles can be
determined. Based on this shift, an estimate for the road position
along the extracted profile is obtained. By incorporating
previously estimated positions, other road parameters like the road
direction and the road curvature can also be updated. The updated
road parameters can then be used to make a next prediction of the
road position at
some step size further along the road. This recursive process of
prediction, measurement by profile matching and updating the road
parameters can be implemented elegantly in a Kalman filter
[Vosselman and Knecht, 1995]. The road tracking continues until
the profile matching fails at several consecutive predicted
positions, i.e. it stops when several extracted profiles in a row
show little correspondence with the characteristic grey value
profile. Some characteristic results are shown in figure 3. Trees
along the road, road crossings, and junctions can often explain
matching failures: due to these objects, the grey value profiles
extracted at those positions deviate substantially from the
characteristic profile. By making predictions with increasing step
sizes, the road tracker is often able to jump over these kinds of
obstacles and continue the outlining of the road.
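The core of such a tracker, matching an extracted grey value profile against the characteristic one to estimate the lateral shift of the road, can be sketched as follows; the sample profiles and search range are illustrative assumptions, and a full tracker would embed this measurement in a Kalman filter:

```python
def match_profile(reference, extracted, max_shift=3):
    """Find the integer shift of `extracted` relative to `reference`
    that minimises the mean squared grey value difference."""
    best_shift, best_ssd = 0, None
    n = len(reference)
    for s in range(-max_shift, max_shift + 1):
        # Compare only the overlapping part of the two profiles.
        pairs = [(reference[i], extracted[i + s])
                 for i in range(n) if 0 <= i + s < len(extracted)]
        ssd = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if best_ssd is None or ssd < best_ssd:
            best_shift, best_ssd = s, ssd
    return best_shift, best_ssd

# Characteristic road profile (bright road on dark surroundings) and a
# profile extracted at the predicted position, shifted by two pixels.
reference = [20, 20, 90, 95, 90, 20, 20]
extracted = [90, 95, 90, 20, 20, 20, 20]
shift, ssd = match_profile(reference, extracted)
print(shift)
```

The estimated shift corrects the predicted road position; a large residual after matching would signal the failure case (trees, crossings) described above.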
Extraction of areas: Due to the lack of modeled knowledge about
objects, the computer-supported extraction of area features is
more or less limited to areas that are homogeneous with respect to
some attribute. Of course, in images the most common attributes to
look at are the pixels' grey values, colour and texture.
Algorithms that extract homogeneous grey value areas can
facilitate the extraction of objects like water areas and house
roofs. The most common approach is to let the operator indicate a
point on the homogeneous object surface and let an algorithm find
the outlines of that surface.
of that surface. An example can be seen in figure 4. It is clear
that the results of such an algorithm still require some editing by
an operator. Overhanging trees at the left side of the river and
trees that cast dark shadows at the right side of the river cause
differences between the bounds of the homogeneous area and the
river borders, as they should be mapped. Similar differences will
also arise when using these techniques to extract building roofs.
Most objects are not
-
homogeneous enough to allow a perfect delineation. Still, the
majority of the lines to be mapped may be at the correct place.
Thus, editing the results of such an area feature extraction will
often be faster than a complete manual mapping process. Firestone
et al. [1996] report the use of this technique for mapping lakeshores.
Especially for small scale mapping this can be very efficient since
the water surface generally appears homogeneous and the disturbing
effects of trees along the shoreline, as in the example, may be
negligible at small scale. The algorithms used to find the
boundaries of a homogeneous area are usually based on the
region-growing algorithm [Haralick and Shapiro, 1992]. Starting at
the pixel indicated by the operator, this algorithm checks whether
an adjacent pixel has similar attributes (e.g. grey value). If the
difference is below some threshold, the two pixels are merged to
one area. Next, the attributes of another pixel adjacent to this
area are examined and this pixel is also merged with the area if
the attribute differences are small. In this way a homogeneous area
is grown pixel by pixel. This process is repeated until all pixels
that are adjacent to the grown area have significantly different
attributes.
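The region-growing procedure just described can be sketched directly. In this minimal sketch the homogeneity test compares each candidate pixel to the seed's grey value (comparing against the running mean of the grown area is another common choice); 4-adjacency and the threshold value are illustrative assumptions.

```python
from collections import deque

def region_grow(image, seed, threshold):
    """Grow a homogeneous region from an operator-indicated seed pixel.

    Pixels 4-adjacent to the region are merged as long as their grey
    value differs from the seed's by less than `threshold`.  `image`
    is a list of rows of grey values; returns the set of (row, col)
    pixels in the grown area.
    """
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        # examine the four neighbours of the current boundary pixel
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) < threshold):
                region.add((nr, nc))       # merge the similar pixel
                frontier.append((nr, nc))  # and grow from it later
    # stops when all adjacent pixels differ significantly
    return region
```

Tracing the boundary of the returned pixel set then yields the outline that the operator subsequently edits.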
Extraction of complex objects As requirements for geographical
data shift from 2D to 3D and from vector data to object-oriented
data, the acquisition of these data with digital photogrammetry is
also becoming increasingly three-dimensional and object based. In
particular, for the acquisition of 3D objects like buildings and
other highly structured objects, the use of object models can be
beneficial. These models contain the topology and the internal
geometrical constraints of the object. Using these models relieves
the operator from specifying these data within the measurement
process and improves the robustness and precision of the data
acquisition. A common interactive approach is
illustrated in figure 5. After the selection of an appropriate
object model by an operator, the operator approximately aligns the
object model with the image (left image). In a second step a
fitting algorithm is
used to find the best correspondence between the edges of the
object model and the location of high gradients in the image
(middle image). Especially in the presence of neighbouring edges
with high contrast (like the windows on the house front in the
example), the resulting fit often does not correspond to the
desired result and therefore requires one or more additional
corrective measurements by the operator (right image). Different
approaches are being used to find the optimal alignment of the
object model to the image. Fua [1996] extended the snake algorithm
described above to fit object models. The energy function is
defined as a
function of the sum of the grey value gradients along the model
edges. Derivatives of this energy function with respect to changes
in the co-ordinates of the object corners determine the optimal
direction for changes in these co-ordinates, whereas constraints on
the co-ordinates ensure that a valid building model with parallel
and rectangular edges is maintained. Lowe [1991] and Lang and
Schickler [1993] use parametric object descriptions and determine
the optimal parameter values by fitting the object edges to edge
pixels (pixels with high grey value gradients) and extracted linear
edges respectively. Veldhuis [1998] analysed the approaches of Fua
[1996] and Lowe [1991] with respect to suitability for mapping.
Semi-automatic measurement techniques as reviewed in this paper
clearly improve the efficiency of cartographic feature extraction.
In most cases there is a clear interaction between the human
operator and one or more measurement algorithms. Prior to the
measurement the task of the operator is to identify the object to
be measured, to select the correct object model and algorithm and
to provide approximate values. After the measurement by the
computer the operator needs to correct part of the measurements,
since the delineation resulting from the objective of the
measurement algorithm often does not correspond with the desired
object boundaries. Robustness as well as precision
of the semi-automatic measurements can be improved by
incorporating knowledge about the topographic features into the
measurement process. A clear example of this was already shown for
the case of complex object measurement. Further knowledge can be
added in the form of constraints between neighbouring houses and
roads. Hwang et al. [1986], for example, use the
fact that most houses are parallel to a road and that houses are
often connected to a road by a driveway. In the case of linear
features many more heuristics can be used to guide the feature
extraction. Cleynenbreugel et al. [1990] notice that roads usually
have no steep slopes and that, therefore, digital elevation models
can be useful for road extraction in mountainous areas. Furthermore
they notice that the road patterns are often typical for the type
of landscape (mountainous, flat rural, urban). Soft bounds on the
usually low curvature of principal roads are used in the road
tracker described in [Vosselman and Knecht, 1995]. Useful
properties of water surfaces are related to height. Fua [1996]
extracts rivers as 3D linear features and imposes the constraint
that the height of the river decreases monotonically. Furthermore,
when lakes are extracted as 3D surfaces they can often be assumed
to be horizontal. The latter constraint can be used to
automatically detect delineation errors caused by occluding trees
along the lakeshore. To obtain a higher degree of automation in
the interpretation of aerial images by computer algorithms, much
more knowledge needs to be modelled. Knowledge-based interpretation of
aerial images and the usage of existing GIS databases within this
process is a topic of current research efforts [Kraus and Waldhusl,
1996, Gruen et al., 1997].
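The lake-horizontality constraint mentioned above lends itself to a simple automated check. A minimal sketch, assuming the shoreline is available as 3D points and taking the median height as the water level (the tolerance value is an illustrative assumption):

```python
def flag_shore_errors(shore_points, tolerance=0.5):
    """Flag lakeshore points whose height deviates from the lake level.

    `shore_points` is a list of (x, y, z) tuples.  Since a lake
    surface can be assumed horizontal, the median z serves as the
    water level; points deviating by more than `tolerance` metres
    (e.g. where occluding trees were delineated instead of the
    shore) are returned as probable delineation errors.
    """
    zs = sorted(z for _, _, z in shore_points)
    n = len(zs)
    # median height as a robust estimate of the water level
    level = zs[n // 2] if n % 2 else 0.5 * (zs[n // 2 - 1] + zs[n // 2])
    return [p for p in shore_points if abs(p[2] - level) > tolerance]
```

The flagged points can then be presented to the operator for correction, rather than requiring a full manual inspection of the extracted shoreline.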