Science

www.elsevier.com/locate/isprsjprs

ISPRS Journal of Photogrammetry & Remote Sensing 58 (2004) 225–238

Object-based classification of remote sensing data

for change detection

Volker Walter*

Institute for Photogrammetry, University of Stuttgart, Geschwister-Scholl-Str. 24 D, Stuttgart D-70174, Germany

Received 31 January 2003; accepted 26 September 2003

Abstract

In this paper, a change detection approach based on an object-based classification of remote sensing data is introduced. The

approach classifies not single pixels but groups of pixels that represent already existing objects in a GIS database. The approach

is based on a supervised maximum likelihood classification. The multispectral bands grouped by objects and very different

measures that can be derived from multispectral bands represent the n-dimensional feature space for the classification. The

training areas are derived automatically from the geographical information system (GIS) database.

After an introduction into the general approach, different input channels for the classification are defined and discussed. The

results of a test on two test areas are presented. Afterwards, further measures, which can improve the result of the classification

and enable the distinction between more land-use classes than with the introduced approach, are presented.

D 2003 Elsevier B.V. All rights reserved.

Keywords: change detection; classification; object-oriented image analysis; data fusion

1. Introduction the real world is very small compared with the number

In Walter and Fritsch (2000), a concept for the

automatic revision of geographical information sys-

tem (GIS) databases using multispectral remote sens-

ing data was introduced. This approach can be

subdivided into two steps (see Fig. 1). In a first step,

remote sensing data are classified with a supervised

maximum likelihood classification into different land-

use classes. The training areas are derived from an

already existing GIS database in order to avoid the

time-consuming task of manual acquisition. This can

be done if it is assumed that the number of changes in

0924-2716/$ - see front matter D 2003 Elsevier B.V. All rights reserved.

doi:10.1016/j.isprsjprs.2003.09.007

* Tel.: +49-711-121-4091; fax: +49-711-121-3297.

E-mail address: [email protected] (V. Walter).

of all GIS objects in the database. This assumption is

justified because we want to realise update cycles in

the range of several months.

In a second step, the classified remote sensing data

have to be matched with the existing GIS objects in

order to find those objects where a change occurred, or

which were collected wrongly. We solved this task by

measuring per object the percentage, homogeneity, and

form of the pixels, which are classified to the same

object class as the respective object stored in the

database (Walter, 2000). All objects are classified into

the classes fully verified, partly verified, and not found

by using thresholds that can be defined interactively by

the user.

The problem of using thresholds is that they are

data-dependent. For example, the percentage of veg-

Fig. 1. Pixel-based classification approach.

V. Walter / ISPRS Journal of Photogrammetry & Remote Sensing 58 (2004) 225–238226

etation pixels varies significantly between data that

are captured in summer or in winter. Other influencing

factors are light and weather conditions, soil type, or

daytime. Therefore, we cannot use the same thresh-

olds for different datasets. In order to avoid the

problem of defining data-dependent thresholds, we

introduce an object-based supervised classification

approach. The object-based classification works in

the same way as a pixel-based classification (see

Fig. 2), with the difference that we do not classify

each pixel but combine all pixels of each object and

classify them together. Again, the training areas for

the classification of the objects are derived from the

existing database in order to avoid a time-consuming

manual acquisition.

In a ‘‘normal’’ classification, the greyscale values

of each pixel in different multispectral channels and

possibly some other preprocessed texture channels are

used as input. For the classification of groups of

pixels, we have to define new measures that can be

very simple (e.g., the mean grey value of all pixels of

an object in a specific channel) but also very complex,

like measures that describe the form of an object. This

approach is very flexible because it can combine very

different measures for describing an object. We can

even use the result of a pixel-based classification and

count for each object the percentage of pixels that are

classified to a specific land-use class.

Because the result of the approach is a classifica-

tion into the most likely class, the problematic part of

matching is now replaced by a single comparison of

the classification result with the GIS database without

using any thresholds.

1.1. Related work

This kind of approach is an object-oriented image

analysis that is also successfully applied to other

Fig. 2. Differences between object-based and pixel-based classification.

V. Walter / ISPRS Journal of Photogrammetry & Remote Sensing 58 (2004) 225–238 227

problems. A good overview of different approaches

can be found in Blaschke et al. (2000). These

approaches can be subdivided into approaches that

use existing GIS data to superimpose it on an image

(per-field or per-parcel classification), and approaches

that use object-oriented classification rules without

any GIS input. Approaches that use existing GIS data

are not very widely used today. In Aplin et al. (1999),

an example for a per-field classification approach is

introduced, which first classifies the image into

different land-use classes. Afterwards, the fields

(which represent forest parcels from a GIS database)

are subdivided into different classes, depending on

the classification result, by using thresholds. The

main difference of existing approaches compared

with our approach is that no thresholds are used in

our approach.

2. Object-based classification

2.1. Input data

The following tests were carried out with ATKIS

datasets. ATKIS is the German national topographic

and cartographic database, and captures the landscape

in the scale of 1:25,000 (AdV, 1988). In Walter

(1999), it was shown that a spatial resolution of at

least 2 m is needed to update data in the scale of

1:25,000. The remote sensing data were captured with


the DPA system, which is an optical airborne digital

camera (Hahn et al., 1996). The original resolution of

0.5 m was resampled to a resolution of 2 m. The DPA

system has four multispectral channels [blue 440–525

nm, green 520–600 nm, red 610–685 nm, near-

infrared (NIR) 770–890 nm].

2.2. Classification classes

Currently, 63 different object classes are collected

in ATKIS. There are a lot of object classes that can

have very similar appearances in an image of 2 m

pixel size (e.g., industrial areas, residential areas, or

areas of mixed use). Therefore, we do not use 63 land-

use classes for the classification but subdivide all

object classes into the five land-use classes: water,

forest, settlement, greenland, and roads. The land-use

Fig. 3. Input data for (a) object-based a

class roads is only used in the first step in the process

for the pixel-based classification. Because of the

linear shape, roads consist of many mixed pixels in

a resolution of 2 m and have to be checked with other

techniques (see Walter, 1998).

2.3. Input channels

Like in a pixel-based classification, we can use all

spectral bands as input channels. The difference is that

in the pixel-based classification, each pixel is classi-

fied separately, whereas in the object-based classifi-

cation, all pixels that belong to one GIS object are

grouped together. In order to analyse the spectral

behaviour of objects, we calculate the mean grey

value of each channel for all GIS objects. Fig. 3

shows as an example the original input data (b) and

nd (b) pixel-based classification.


the mean RGB (red green blue) value (a) of each GIS

object. The result of the pixel grouping is like a

smoothing of the data. The spectral behaviour of the

objects is similar to the typical spectral behaviour of

the pixels. For example, forest areas are represented in

the green channel by dark pixel/objects, whereas

settlements are represented by bright pixel/objects.

This behaviour can be also seen in Fig. 4. The

scatterplots show the distribution of (a) the grey values

of settlement and forest pixels compared with the

distribution of (b) the mean grey value of settlement

and forest objects in the channels red and NIR. It can

be seen that the behaviour is similar but the separation

of the two classes becomes blurred because of the

smoothing effect. In the object-based classification, all

multispectral bands of the DPA camera system (blue,

green, red, and NIR) are used as input channels.

Fig. 4. Scatterplot of (a) p

Different land-use classes cannot be distinguished

only by their spectral behaviour but also by their

different textures. Texture operators transform input

images in such a way that the texture is coded in grey

values. In our approach, we use a texture operator

based on a co-occurrence matrix that measures the

contrast in a 5� 5 pixel window. Fig. 5 shows the

used texture operator in an example. The input image

is shown in Fig. 5a, the texture (calculated from the

blue band) in Fig. 5b, and the average object textures

in Fig. 5c. Settlements are represented with dark

pixels, greenlands with bright pixels, and forests with

middle grey pixels.

The variance of the grey values of the pixels of an

object is also a good indicator of the roughness of a

texture. Fig. 6 shows the calculated mean variance in

the blue band for all objects. Settlement objects have

ixels vs. (b) objects.

Fig. 5. (a) Input image, (b) texture blue band, and (c) average object texture.


high variance, greenland objects have middle variance,

and forest objects have low variance. Fig. 7 shows the

behaviour of the variance in the different bands: blue,

green, red, and NIR. The best discrimination between

land-use classes using the variance can be seen in the

blue band. In the NIR band, all land-use classes have a

similar distribution, which makes discrimination in this

band impossible.

Vegetation indices are very often used in pixel-

based classification as an input channel to improve

the classification result. They are based on the spectral

behaviour of chlorophyll, which absorbs red light and

reflects NIR light. In our approach, we employ the most

widely used normalised difference (Campbell, 1987):

VI ¼ IR� R

IRþ Rð1Þ

Fig. 8a shows the calculated vegetation index for

pixels and Fig. 8b for objects. It can be seen that

Fig. 6. Mean variance of GIS objects in blue band.


settlements are represented typically by dark areas,

whereas forests are represented mostly by bright

areas. The classification of greenlands is difficult

because they can be represented by very bright areas

(e.g., fields with a high amount of vegetation) as well

as by very dark areas (e.g., fields shortly after the

harvest).

All so far defined input channels are also used in

‘‘normal’’ pixel-based classification. In object-based

classification, it is possible to add further input

channels, which do not describe directly spectral or

textural characteristics. For example, we can use the

result of a pixel-based classification and count the

percentage of pixels that are classified to a specific

land-use class. This evaluation is shown in Fig. 9. The

input image is shown in Fig. 9a and the pixel-based

classification result in Fig. 9b. Fig. 9c shows for each

object the percentage of pixels that are classified to

the land-use class forest. White colour represents

100% and black colour represents 0%. In Fig. 9b

and c, it can be seen that forest is a land-use class that

can be classified with high accuracy in pixel-based as

well as object-based classifications. Fig. 9d shows the

percentage of settlement pixels. Because of the high

resolution (2 m) of the data, settlements cannot be

detected as homogenous areas but they are split into

different land-use classes depending on what the

pixels are actually representing. Therefore, settlement

objects contain typically only 50–70% settlement

pixels in 2-m resolution images. This can be also seen

in Fig. 9e, which shows the percentage of greenland

pixels. Whereas greenlands contain up to 100% green-

land pixels, it can be seen that, in settlement areas,

pixels are also classified as greenlands.

An interesting visualisation of the feature space of

the object-based classification can be made with the

combination of three object-based evaluations of the

pixel-based classification. In Fig. 10, the percentage

of settlement pixels is assigned to the red band, the

percentage of forest pixels to the green band, and the

percentage of greenland pixels to the blue band of an

RGB image. The combination of these three bands

shows that the pixel-based classification of forests and

greenlands is very reliable, which can be seen on the

bright green and blue colour of the corresponding

objects. Settlement areas in contrast cannot be classi-

fied as homogenous areas. Therefore, settlement

objects are represented in a reddish colour that can

be brownish or purple.

3. Classification results

The approach was tested on two test areas (16 and

9.1 km2), which were acquired at different dates with

a total of 951 objects (194 forests, 252 greenlands,

497 settlements, and 8 water objects). The input

channels were:

� mean grey value blue band� mean grey value green band� mean grey value red band� mean grey value NIR band� mean grey value vegetation index� mean grey value texture from blue band� variance blue band� variance green band� variance red band

Fig. 7. Object variance in different bands (x-axis, variance; y-axis,

number of objects).


� variance NIR band� variance vegetation index� variance texture� percentage forest pixel� percentage greenland pixel� percentage settlement pixel� percentage water pixel.

The input channels span a 16-dimensional feature

space. All objects of the test areas are used as training

objects for the classification. That means that those

objects are also training objects that are wrong in the

database. In a manual revision, we compared the GIS

data with the images. The number of objects that were

not collected correctly, or where it was not possible to

decide if they are collected correctly without further

information sources is 63, which is more than 6% of

all objects. The average percentage of changes in

topographic maps in western Europe per year are

6.4% in scale 1:50,000, 7.4% in scale 1:25,000 and

8% in scale 1:1,000,000 (Konecny, 1996). Therefore,

the approach is robust enough if we want to update the

GIS database in 1-year cycles.

Fig. 11a shows the GIS data and Fig. 11b shows

the result of the object-based classification on a part of

one test area. Altogether, 82 objects (which are 8.6%

of all objects) were classified into a different land-use

class than the one assigned to them in the GIS

database.

These objects were subdivided manually into three

classes. The first class contains all objects where a

change in the landscape has happened and an update

in the GIS database has to be done. In this class, there

are 37 objects (45%). The second class contains all

objects where it is not clear if the GIS objects were

collected correctly. Higher-resolution data or some-

times even field inspections are needed to decide if the

GIS database has to be updated or not. In this class,

there are 26 objects (31%). The third class contains all

objects where the result of the classification is incor-

rect. In this class, there are 19 objects (23%).

4. Further work

The approach subdivides all objects into the classes

water, forest, settlement, and greenland. This can be

refined if more object characteristics are evaluated. In

Fig. 8. Vegetation index for (a) single pixels and (b) objects.


the following, we suggest three possible extensions of

the approach.

4.1. Additional use of laser data

In Haala and Walter (1999), it was shown that the

result of a pixel-based classification can be improved

significantly by the combined use of multispectral and

laser data. Fig. 12 shows a pixel-based classification

result of a CIR (colored infrared) image with (b) and

without (c) the use of laser data as an additional

channel. The laser data improve the classification

result because they have a complementary ‘‘behav-

iour’’ to the multispectral data. With laser data, the

classes greenland and road can be separated very well

from the classes forest and settlement because of the

different heights of the pixels above the ground,

whereas in multispectral data, the classes greenland

and forest can be separated very well from the classes

roads and settlement because of the strongly different

percentages of chlorophyll. The four input channels,

which were calculated from the result of the pixel-

based classification (percentage forest pixels, percent-

age greenland pixel, percentage settlement pixels, and

percentage water pixels), are the channels with the

highest amount of influence for the object-based

classification. Therefore, the object-based classifica-

tion should also be improved by the combined use of

multispectral and laser data.

With laser data, further input channels can be

calculated like slope, average object height, average

object slope, etc. With high-density laser data, it could

be possible to distinguish, for example, between

residential areas and industrial areas. Fig. 13 shows

a laser profile (1 m raster width) of a residential area

(a) and an industrial area (b). In residential areas, there

are typically houses with sloped roofs and a lot of

vegetation between the houses, whereas in industrial

areas, there are buildings with flat roofs and less

vegetation. This characteristic can be described by a


two-dimensional evaluation of the slope directions of

each object and could be also useful to distinguish

between different types of vegetation.

The fusion of data from different sensors for

image segmentation is a relatively new field (Pohl

and van Genderen, 1998). The general aim is to

increase the information content in order to make the

segmentation easier. Instead of laser data, it could be

also possible to make a fusion with SAR data (e.g.,

see Dupas, 2000).

4.2. More texture measures

At the moment, we use a co-occurrence matrix,

mean variance, and mean contrast to describe the

texture of objects. These texture measures can be also

used in pixel-based classification by measuring the

variance and contrast of each pixel in an n� n

window. The problem of a window with a fixed size

is that mixed pixels at the object borders are classified

very often to a wrong land-use class. The larger is the

window, the more pixels will be classified wrongly.

This problem does not appear in object-based classi-

fication because we do not evaluate a window with a

fixed size but use the existing object geometry (in

order not to use mixed pixels at the object boarder, a

buffer is used and border pixels are removed). There-

fore, we suggest using more texture measures. Fig. 14

shows an example of a possible evaluation of the

texture. The images are processed with a Sobel

operator. Typically, farmland objects contain many

edges with one main edge direction (a), whereas in

forest objects, the direction of the edges is equally

distributed (b) and in settlement objects, several main

directions can be found (c). Other texture measures

could be, for example, the average length or contrast

of the edges. However, several tests have to be

performed in order to prove these ideas.

4.3. Use of multitemporal data

The main reason that the approach classifies

objects into a wrong class is that in practice, the

Fig. 9. Percentage right classified pixel. (a) Input image, (b) pixel-

based classification result, (c) percentage right classified forest pixels,

(d) percentage right classified settlement pixels, (e) percentage right

classified greenland pixels.

Fig. 10. Visualisation of the feature space of the object-based classification.


appearance of objects can be very inhomogeneous. If,

for example, a settlement object contains large areas

of greenland but only few pixels that represent a

house or a road, it will be classified as greenland

and not as settlement. The object will be marked as an

updated object and an operator has to check the object

each time the data are revised because the approach

will classify the object every time as greenland.

A solution for this problem is to store all param-

eters of the n-dimensional feature space (mean grey

values, mean variance, etc.) of an object when it is

checked for the first time. If, then, later the object is

marked again as an update, the program can measure

the distance of the object in the current and the earlier

stored feature space. If the distance is under a specific

threshold, it can be assumed that the object is still the

same and therefore does not have to be updated.

5. Conclusion

The basic idea of the approach is that image

interpretation is not based only on the interpretation

of single pixels but on whole object structures. There-

fore, we do not classify only single pixels but groups

of pixels that represent already existing objects in a

GIS database. Each object is described by an n-

dimensional feature vector and classified to the most

likely class based on a supervised maximum likeli-

hood classification. The object-based classification

needs no tuning parameters like user-defined thresh-

olds. It works fully automatically because all infor-

mation for the classification is derived from

automatically generated training areas. The result is

not only a change detection but also a classification

into the most likely land-use class.

The results show that approximately 8.6% of all

objects (82 objects from 951) are marked as changes.

From these 82 objects, 45% are real changes, 31% are

potential changes, and 23% are wrongly classified.

That means that the amount of interactive checking of

the data can be decreased significantly. On the other

hand, we have to ask if the object-based classification

finds all changes. A change in the landscape can only

be detected if it affects a large part of an object

because the object-based classification uses the exist-

Fig. 12. (a) Input image, (b) classification with multispectral data, and (c) classification with multispectral and laser data.

Fig. 11. (a) GIS data and (b) result of the classification.


Fig. 13. Laser profiles of (a) a residential and (b) an industrial area.


ing object geometry. If, for example, a forest object

has a size of 5000 m2 and in that forest object a small

settlement area with 200 m2 is built up, then this

approach will fail.

Further techniques have to be developed in order to

cover this problem. Because forest areas can be

classified very accurately in pixel-based classification,

it could be additionally tested whether there are large

areas in a forest object that are classified to another

Fig. 14. Different gradient directions for (a)

land-use class. The same approach could be used for

water areas because water is also a land-use class that

can be classified very accurately in pixel-based clas-

sification. More difficult is the situation for the land-

use classes greenland and settlement, which have

typically an inhomogeneous appearance in a pixel-

based classification. Here, we suggest using a multi-

scale approach to make additional verification of the

objects (e.g., see Heipke and Straub, 1999).

greenland, (b) forest, (c) settlement.


Up to now, we can only distinguish between the

land-use classes forest, settlement, greenland, and

water. This can be refined if more object character-

istics are evaluated. Some possible object character-

istics are defined in this paper and have to be tested in

future work.

References

Aplin, P., Atkinson, P., Curran, P., 1999. Per-field classification of

landuse using the forthcoming very fine resolution satellite sen-

sors: problems and potential solutions. In: Atkinson, P., Tate, N.

(Eds.), Advances in Remote Sensing and GIS Analysis. Wiley,

Chichester, pp. 219–239.

Arbeitsgemeinschaft der Vermessungsverwaltungen der Lander der

Bundesrepublik Deutschland (AdV), 1988. Amtlich Topogra-

phisches-Kartographisches Informationssystem (ATKIS). Land-

esvermessungsamt Nordrhein-Westfalen, Bonn.

Blaschke, T., Lang, S., Lorup, E., Strobl, J., Zeil, P., 2000. Object-

oriented image processing in an integrated GIS/remote sensing

environment and perspectives for environmental applications.

In: Cremers, A., Greve, K. (Eds.), Environmental Information

for Planning, Politics and the Public, vol. II. Metropolis-Verlag,

Marburg, pp. 555–570.

Campbell, J.B., 1987. Introduction into Remote Sensing. The

Guildford Press, New York.

Dupas, C.A., 2000. SAR and LANDSAT TM image fusion for land

cover classification in the Brazilian Atlantic Forest Domain.

International Archives for Photogrammetry and Remote Sensing

XXXIII (Part B1), 96–103.

Haala, N., Walter, V., 1999. Classification of urban environ-

ments using LIDAR and color aerial imagery. International

Archives for Photogrammetry and Remote Sensing XXXII

(Part 7-4-3W6), 76–82.

Hahn, M., Stallmann, D., Staetter, C., 1996. The DPA-sensor

system for topographic and thematic mapping. International

Archives of Photogrammetry and Remote Sensing XXXI

(Part B2), 141–146.

Heipke, C., Straub, B.-M., 1999. Relations between multi scale

imagery and GIS aggregation levels for the automatic extrac-

tion of vegetation areas. Proceedings of the ISPRS Joint Work-

shop on ‘‘Sensors and Mapping from Space’’, Hannover. On

CD-ROM.

Konecny, G., 1996. Hochauflosende Fernerkundungssensoren fur

kartographische Anwendungen in Entwicklungslander. ZPF 64

(2), 39–51.

Pohl, C., van Genderen, J., 1998. Multisensor image fusion in

remote sensing: concepts, methods and applications. Interna-

tional Journal on Remote Sensing 19 (5), 823–864.

Walter, V., 1998. Automatic classification of remote sensing data

for GIS database revision. International Archives for Photo-

grammetry and Remote Sensing XXXII (Part 4), 641–648.

Walter, V., 1999. Comparison of the potential of different sensors

for an automatic approach for change detection in GIS data-

bases. Lecture Notes in Computer Science, Integrated Spatial

Databases: Digital Images and GIS, International Workshop

ISD ’99. Springer, Heidelberg, pp. 47–63.

Walter, V., 2000. Automatic change detection in GIS databases

based on classification of multispectral data. International

Archives of Photogrammetry and Remote Sensing XXXIII

(Part B4), 1138–1145.

Walter, V., Fritsch, D., 2000. Automatic verification of GIS data

using high resolution multispectral data. International Archives

of Photogrammetry and Remote Sensing XXXII (Part 3/1),

485–489.

Science

Education

supervised classification

normal classification

thethe classification

respective object

object andtion

existing gis objects

multispectral remote

percentage of pixels