International Journal of Geographical Information Science
Vol. 20, No. 6, July 2006, 611–631
ISSN 1365-8816 print/ISSN 1362-3087 online © 2006 Taylor & Francis
http://www.tandf.co.uk/journals
DOI: 10.1080/13658810600607402
†Corresponding author. Email: [email protected]
*Present address: Research and Innovation, Ordnance Survey, Romsey Road, Southampton SO16 4GU. Email: [email protected]
Research Article
Creating a hydrographic network from its cartographic representation: a case study using Ordnance Survey MasterMap data
NICOLAS REGNAULD* and WILLIAM A. MACKANESS†
Institute of Geography, School of GeoSciences, The University of Edinburgh,
Drummond St, Edinburgh EH8 9XP, UK
(Received 28 August 2004; in final form 11 January 2006 )
A meaningful hydrological network is critical to spatial analysis and modelling:
‘meaningful’ in that it is topologically correct, provides a basis for modelling
flow and differentiates between different types of water features. In Great
Britain, large-scale digital mapping of hydrological features was captured from
paper maps with a cartographic emphasis: it had poor attribution and no
underlying model to support geographical modelling. This emphasis gave
rise to rivers and lakes that were variously ‘broken’ into sections by features such
as dams, bridges, and culverts. This paper reports on research to create
automatically a topologically connected hydrological network that underpins the
detailed cartographic representation of such features. The network was created
by joining these hydrographic features together according to rules of both
continuity and proximity between river sections, and their flow direction (using
an underlying digital elevation model). Confidence values were associated with
each section link reflecting the certainty of that connection. The confidence
values provided a basis for directing human intervention to uncertain
connections as part of the final editing process. The project took as its input
OS MasterMap ‘water feature’ data. A skeletonisation process was used to create
the medial axis of the network. The paper reports the methodology, implementation
and evaluation in detail. The algorithm worked well in rural areas, where
interruptions are short and there is greater variation in height. The challenges
were greater in urban areas, where relatively long sections of river may be
re-engineered and culverted, and where the digital elevation model was not precise
enough to discern subtle changes in elevation.
Keywords: Data modelling; Hydrographic network topological modelling
1. Introduction
OS MasterMap® is one of a series of digital products engineered by the Ordnance
Survey, the National Mapping Agency of Great Britain. OS MasterMap is a digital
map designed by Ordnance Survey for use with geographical information systems
(GIS) and databases. This data set has a high level of detail, being captured at
1:1 250 scale in urban areas, 1:2 500 scale in rural areas and 1:10 000 scale in
mountain and moorland areas (www.ordnancesurvey.co.uk/oswebsite/products/osmastermap/).
OS MasterMap represents an engineering improvement to Land-Line®, Ordnance
Survey’s original large-scale digital product. Land-Line originates from a time
when the anticipated output was a cartographic map.
Although OS MasterMap represents a very significant improvement in terms of its
structure, which made it suitable for use with GIS, some aspects still reflect its
cartographic heritage. Contemporary uses of OS MasterMap extend beyond visual
display to analysis and integration with other data sets in order to derive new
types of information. This has required a shift away from a cartographic focus
towards a geographical one. For example, in the context of water features, we
might want to model various hydrological processes and physical interdependencies
(such as flood profiling, the economic impact of flooding and its mitigation). In
this context a geographical model must reflect the surface connectedness and flow
among various water features over the entire network. This is quite different from a
cartographic viewpoint, where we might view a river, say, passing through a city, in
which we see sections of water, ‘broken’ by bridges, or ‘disappearing’ when flowing
into tunnels or culverts. In these instances we rely on the viewer to infer connection
at the point of re-emergence. OS MasterMap has been formed from this
cartographic perspective, yet the requirements for cartographic portrayal are quite
different from the requirements for modelling flow. The aim of this paper is to
report on a project that took as its starting point the cartographic representation of
a river and, using a range of techniques, converted it into a fully connected network,
in anticipation of its use in hydrological modelling, navigation and environmental
applications (such as water management, flood prevention and pollution control).
The paper begins with a description of the format and structure of OS
MasterMap, identifying the weaknesses in the current model. Section 3 describes
our strategy for enriching the data in the hydrographic layer of OS MasterMap in
order to support the process of creating a connected river network. Sections 4, 5 and
6 detail the three main parts of our solution, respectively the creation of
geographical, topological and statistical models used to enrich the current data.
The last section presents some results and a discussion that illustrates the degree of
automation achieved and how human intervention can be directed to solutions with
a low confidence.
2. OS MasterMap
OS MasterMap is large-scale digital map data designed by the Ordnance Survey,
covering Great Britain and supplied in Geography Markup Language (GML). GML is
a spatially enabled XML grammar (Lake et al. 2004) intended to support a wide
range of customer applications. OS MasterMap includes topographic information
for many different features (both natural and anthropogenic). Real-world objects
are represented as polygon, point, line and text features, each with its own
unique topographic identifier. OS MasterMap currently contains four separate
layers: topographic, address, imagery and integrated transportation network
(ITN). The ITN layer is topologically connected, but it covers only the road
network. The focus of this research is on the topographic layer, which contains
detailed surface features of the landscape under nine themes, one of which is
water. The data model supports storage of various physical water features such as
canals, lakes, reservoirs and rivers. Rivers and streams are shown at their true
scale width (in polygonal form), or by a single line where the width is less than
1 metre in urban areas and 2 metres in rural areas (OS 2004). There is no
representation of water features where they are obscured by other
objects (such as bridges). Figure 1 shows a sample of OS MasterMap data, showing
only hydrographic features. The breaks in the river segments are indicative of
bridges, dam walls or other features that intersect the river.
As figure 1 shows, the network has no inherent structure (it is disconnected), the
features consist of polygons of varying shape, and the attribution does not make it
possible to differentiate between rivers, water pounds and lakes. The data are
therefore of limited use in hydrological modelling: it is not possible to model flow
through the network, or to characterise it in any way, for instance through
Hortonian modelling (Horton 1945, Strahler 1964).
The aim of this research was to link together broken sections of river in order to
create an integrated river network able to support various types of hydrological
modelling.
3. Enhancement of MasterMap to support the creation of a river network
A range of factors governs the likelihood that two disjoint river sections belong to
the same river. These include the local morphology of the land, the distance between
the two sections, and the angle at which they meet. Where these factors corroborate
one another, a high degree of certainty can be attached to the link that joins the
two river sections. The converse is also true: one may be less certain of how rivers
are connected in cities on relatively flat ground, with long stretches of culverting
and where permutations exist among a number of joining rivers. Given this variation
in certainty, a likelihood value was attached to each link that reflected the degree
of certainty with which a join had been made. This value was stored as metadata in
anticipation of future spatial analysis, but could also be used as a basis for
directing human intervention in interactive editing environments, or for identifying
areas in need of more detailed ground-truthing.
Figure 1. Hydrographic features in MasterMap (Survey mapping © Crown copyright. All rights reserved. Media licence W01).
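As a minimal sketch of how stored confidence values could direct an editor's attention, the Python below ranks added links and flags those below a review threshold, assuming each link carries a numeric confidence in which higher means more certain. The `Link` record, the `review_queue` function and the threshold of 50 are hypothetical illustrations, not part of the original system.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical record for a link added to reconnect two river sections."""
    link_id: str
    confidence: float  # higher values mean the join is more certain


def review_queue(links, threshold=50.0):
    """Return the links whose confidence falls below an assumed threshold,
    least certain first, so an editor inspects the most doubtful joins first."""
    doubtful = [link for link in links if link.confidence < threshold]
    return sorted(doubtful, key=lambda link: link.confidence)


links = [Link("a", 92.0), Link("b", 31.5), Link("c", 48.0), Link("d", 75.0)]
for link in review_queue(links):
    print(link.link_id, link.confidence)
```

Sorting ascending puts the least certain joins at the head of the queue; any real threshold would have to be calibrated against ground-truthed data.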
Three models were created to support the conversion of the water theme of OS
MasterMap from a cartographic to a geographic model. The first, the geographic
model, entailed making meaningful geographic entities (such as lakes and connected
rivers) from a collection of general ‘water body’ polygons. This was required
because of the weak entity and attribute definitions used to classify water features
in OS MasterMap. The second, the topological model, stored information on the
connectivity of the river network. Initially this took the form of a set of
subgraphs, because of the disconnected nature of the data (figure 1). The third, the
statistical model, contained information about the degree of certainty associated
with the connection of the subgraphs. Several analytical steps were used within this
model to derive this statistic, taking into account slope, the physical length of the
gap between rivers, and the angle at which they joined. This model explicitly
recorded which links had been added in order to make the topologically connected
river network.
3.1 Building the geographic model
The geographic model recorded representations of the various types of hydrographic
features, as well as the links between them. A number of cartometric techniques were
used to distinguish automatically between rivers and lakes. Two main stages were
needed to compute the geographic model from OS MasterMap:
• Identify rivers and lakes: the initial MasterMap polygons needed to be reclassified
into rivers or lakes so that they could subsequently be associated with a node or a
link in the network. By examining the width of the polygons it was possible to
identify the split points between lakes and rivers.
• Form the skeleton of the river: the results from the first stage were then
skeletonised to provide the linear medial axis of the river polygons. This provided a
node and edge model that could be used as a basis for recording connections and
confluences between sections of river and lakes.
3.2 Adding a topological model
The topological model stored the adjacency relationships between the objects of the
geographic model, allowing fast traversal of the network. The topological model
comprised nodes and links. Every river was associated with a link, and every lake,
every river intersection and each free end of a river section was associated with a
node. We kept dual references between objects of the geographic model and their
corresponding objects in the topological model. Such a referencing system would
enable MasterMap hydrology to be used for both analysis and display.
3.3 Building a statistical model
The topological model allows us to identify all the disconnected hydrographic
subgraphs; disconnected by virtue of the existence of bridges and culverts. The
statistical model records the links (formed using a range of criteria to determine the
most likely connections) and the degree of certainty associated with that link.
The region selected as test data was in Lancashire, in the West Pennine Moors,
midway between Manchester and Blackburn in the UK (defined by a box with latitude
and longitude values of 53°35′6″N, 2°36′20″W and 53°40′32″N, 2°27′20″W). The
area was selected because of its mix of rural and small urban areas, its range of
elevations (from 150 to 320 m above sea level), and its variety of hydrological
features (lakes, rivers), both natural and anthropogenic.
4. Building a geographic model of the hydrology
A hydrological model was built in which hydrographic objects were stored in two
classes: rivers and lakes. Lakes were represented as polygons with a single node,
while rivers were recorded in linear form. This added value to the data model for
several reasons:
• the rivers can be modelled in graph theoretic form (Hartsfield and Ringel 1990,
Wolf 1992), irrespective of whether they are made up of polygons or lines;
• the model could support generalisation operations (Mackaness and Beard 1993). The
way a lake is generalised is quite different from the way we generalise a river, so
we need to be able (1) to differentiate between the two, and (2) to maintain the
connection between the river and the lake during and after the generalisation
process;
• the process of symbolisation is made simpler. Representing rivers at small scale
only requires symbology to be applied to the river centre line, which is much simpler
than having to generalise river polygons and join them to other linear
representations of the river.
4.1 Splitting the features into rivers and lakes
In the absence of adequate attribution, it was necessary to devise a cartometric
method of differentiating between rivers and lakes based on the variation in width.
The first step consisted of amalgamating all of the adjacent hydrographic polygons.
Figure 2 is a sample of MasterMap data revealing the rather arbitrary form of the
component polygons (an artefact of how the data were originally digitised). Figure 3
shows the effect of dissolving those boundaries. This result was used as input to the
next stage, namely identifying the point at which a river becomes a lake.
Figure 2. Initial hydrographic polygons in MasterMap (Survey mapping © Crown copyright. All rights reserved. Media licence W01).
Splitting the amalgamated polygons into rivers and lakes followed a five-step
process.
• Step 1. The polygon vertices were indexed according to the width of the water
feature. At every vertex of a polygon, the perpendicular distance to the other side
of the polygon was calculated and classified into three categories according to that
length (the idea being to classify vertices according to whether they were likely to
be part of a river section, part of a lake section, or something in between).
Category 1 (shown black in figure 4) corresponds to distances of less than 15 m;
category 2 (shown white in figure 4) to distances between 15 and 50 m; and
category 3 (shown grey in figure 4) to vertices with perpendiculars greater than
50 m.
• Step 2. ‘Smooth’ the riverside classification. The polygon boundaries were
searched for continuous chains of vertices that were not category 1, that formed a
section of polygon edge shorter than 10 m, and that were bounded by vertices of
category 1. These vertices were reclassified to category 1. This reclassified
anomalous localised sections of river wider than 15 m (bulges in a section of river),
and also ensured that the ‘ends’ of rivers (typically where they abut bridges) were
classified as river.
• Step 3. ‘Smooth’ the lakeside classification. Where a lake boundary is convoluted,
some vertices in small recesses may initially be classified in category 1. To avoid
this local effect, we reclassified all chains of category 1 vertices that were shorter
than 10 m and bounded by category 2 into category 2.
• Step 4. Distinguish between rivers and lakes by eliminating category 2. All
category 2 vertices that were bounded on either side by category 1 vertices were
reclassified into category 1, and the remainder into category 3. At this stage, all
the vertices were classified as either category 1 or 3, i.e. either river or lake.
• Step 5. Find the boundary between a river and a lake. For each end vertex of a
chain of category 1 (river) vertices, find the closest vertex on the opposite bank of
the river. Keep the shortest pair (end vertex to opposite vertex) as the limit
between the river and the lake (figure 5 illustrates the end result of this process).
Figure 3. Result after amalgamation of polygons (Survey mapping © Crown copyright. All rights reserved. Media licence W01).
Figure 4. Initial classification of the vertices following step 1.
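Steps 1 to 4 can be sketched in Python as label passes over a closed ring of vertices. This is a minimal sketch under stated assumptions: the width at each vertex (the perpendicular distance to the opposite bank) is taken as already computed, a chain's length is measured along the boundary from its first to its last vertex, and ‘bounded on either side’ is read as ‘bounded on both sides’. The 15 m, 50 m and 10 m thresholds come from the text; the function names are hypothetical.

```python
def categorise(width_m):
    """Step 1: bin the bank-to-bank width measured at one vertex."""
    if width_m < 15.0:
        return 1  # probably river
    if width_m <= 50.0:
        return 2  # undecided
    return 3      # probably lake


def runs(labels, pred):
    """Maximal runs of consecutive vertex indices (the ring is cyclic) whose label satisfies pred."""
    n = len(labels)
    breaks = [i for i in range(n) if not pred(labels[i])]
    if not breaks:
        return [list(range(n))] if n else []
    result, current = [], []
    start = breaks[0]
    for step in range(1, n + 1):  # walk once round the ring, starting just after a break
        i = (start + step) % n
        if pred(labels[i]):
            current.append(i)
        elif current:
            result.append(current)
            current = []
    return result


def classify_vertices(widths, seg_lengths, max_chain_m=10.0):
    """widths[i]: width at vertex i; seg_lengths[i]: boundary length from vertex i to i+1.
    Returns a label per vertex: 1 (river) or 3 (lake)."""
    n = len(widths)
    labels = [categorise(w) for w in widths]

    def chain_length(run):
        return sum(seg_lengths[i] for i in run[:-1])

    def neighbours(lab, run):
        return lab[(run[0] - 1) % n], lab[(run[-1] + 1) % n]

    # Step 2: short non-river chains lying between river vertices become river.
    new = labels[:]
    for run in runs(labels, lambda c: c != 1):
        if len(run) < n and chain_length(run) < max_chain_m:
            for i in run:
                new[i] = 1
    labels = new

    # Step 3: short river chains lying between undecided vertices become undecided.
    new = labels[:]
    for run in runs(labels, lambda c: c == 1):
        if len(run) < n and chain_length(run) < max_chain_m and neighbours(labels, run) == (2, 2):
            for i in run:
                new[i] = 2
    labels = new

    # Step 4: undecided chains with river on both sides become river, the rest lake.
    new = labels[:]
    for run in runs(labels, lambda c: c == 2):
        river_both_sides = len(run) < n and neighbours(labels, run) == (1, 1)
        for i in run:
            new[i] = 1 if river_both_sides else 3
    return new
```

For example, `classify_vertices([8, 20, 8, 8, 60, 30, 12, 30, 40, 8], [2, 2, 2, 2, 20, 20, 2, 20, 20, 2])` returns `[1, 1, 1, 1, 3, 3, 3, 3, 3, 1]`. Each pass reads the labels produced by the previous pass and writes to a copy, so the order in which chains are visited within a step does not matter.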
Differentiation by width appealed as an intuitive solution to the problem and was
straightforward to implement. The width-measuring parameter was adjusted by
empirical examination of results, and for the given region was found to produce
consistent results in about 90% of cases (but see section 7.1). Some problems did
arise, however. For example, the algorithm classified small ponds as rivers (because
of their small width). A solution might be to develop a compactness measure
(Unwin 1981) in order to identify ponds. At this early stage of processing, though,
such a measure would be likely to classify isolated sections of river as ponds (for
example, short sections of river between two bridges close to each other). It would
therefore be unwise to apply this rule at this stage; it is better deferred until the
complete network has been created.
Figure 5. Results of automatic identification of the limit between rivers and lakes.
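The text does not say which compactness measure would be used. As an assumption, the sketch below uses one common choice, the isoperimetric quotient 4πA/P², which is 1 for a circle and approaches 0 for a long thin strip such as a river section. The function names are hypothetical.

```python
import math


def polygon_area(ring):
    """Absolute area of a simple polygon from its unclosed vertex ring (shoelace formula)."""
    n = len(ring)
    twice = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
                for i in range(n))
    return abs(twice) / 2.0


def polygon_perimeter(ring):
    """Total boundary length of the closed ring."""
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))


def compactness(ring):
    """Isoperimetric quotient 4*pi*A / P**2: 1.0 for a circle, near 0 for a thin strip."""
    p = polygon_perimeter(ring)
    return 4.0 * math.pi * polygon_area(ring) / (p * p)


square_pond = [(0, 0), (10, 0), (10, 10), (0, 10)]
river_strip = [(0, 0), (100, 0), (100, 5), (0, 5)]
print(round(compactness(square_pond), 3), round(compactness(river_strip), 3))
```

As the text warns, a short river section between two close bridges can also score as compact, which is why such a rule is better applied once the network is connected. Any pond threshold would need empirical calibration.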
4.2 The centre line model within the geographical model
The next phase focused on creating a centreline model as part of the geographical
representation of the rivers. This was done by applying a skeletonisation algorithm
to the river polygons. A range of techniques and software already exist to undertake
this task (Ballard and Brown 1982, Davies 1990, Costa 2000, McAllister and
Snoeyink 2000). The algorithm used in this work was based on Delaunay
Triangulation, using a technique proposed by Tsai (1993). Before computing the
triangulation, it was first necessary to resample the river outlines at a finer level,
namely 1 m intervals. This was done in order to produce a smooth medial axis. The
triangulation produces three types of triangles inside the polygon, each one having a
different role in the construction of the skeleton.
• Triangles that share no edge with the polygon produce a fork in the skeleton. The
centre of gravity of the triangle is the fork point, and three branches connect it to
the midpoints of the three edges of the triangle. Examples are annotated ‘fork point
triangle’ in figure 6.
• Triangles that share two edges with the polygon lie at the end of a branch of the
skeleton. The branch passes through the midpoint of the third edge and is extended
until it meets the polygon outline. These are annotated ‘end triangles’ in figure 6.
• Triangles that share one edge with the polygon are crossed by a branch of the
skeleton passing through the midpoints of their two other edges. They are annotated
‘branch triangle’ in figure 6.
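The three triangle types can be illustrated with a short Python sketch that classifies each triangle by how many of its edges lie on the polygon outline and emits the corresponding skeleton segments. It assumes the interior triangulation (for example a constrained Delaunay triangulation from a geometry library, not shown) is already available as index triples. Extending an end branch to the triangle's apex vertex is one simple way of meeting the outline and is an assumption here. All names are hypothetical.

```python
from itertools import combinations


def midpoint(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def skeleton_segments(points, triangles, outline_edges):
    """points: list of (x, y) outline vertices (already resampled, e.g. every 1 m).
    triangles: (i, j, k) index triples triangulating the polygon interior.
    outline_edges: set of frozenset({i, j}) for consecutive outline vertices.
    Returns one (kind, segments) pair per triangle."""
    result = []
    for tri in triangles:
        edges = [frozenset(e) for e in combinations(tri, 2)]
        inner = [e for e in edges if e not in outline_edges]
        mids = [midpoint(*(points[v] for v in e)) for e in inner]
        shared = 3 - len(inner)  # number of edges shared with the polygon outline
        if shared == 0:
            # Fork: centroid joined to the midpoints of all three edges.
            centroid = (sum(points[v][0] for v in tri) / 3.0,
                        sum(points[v][1] for v in tri) / 3.0)
            result.append(("fork", [(centroid, m) for m in mids]))
        elif shared == 1:
            # Branch: straight through the midpoints of the two inner edges.
            result.append(("branch", [(mids[0], mids[1])]))
        elif shared == 2:
            # End: from the inner-edge midpoint out to the apex on the outline.
            (apex,) = set(tri) - inner[0]
            result.append(("end", [(mids[0], points[apex])]))
        else:
            result.append(("isolated", []))  # the whole polygon is a single triangle
    return result
```

On a 4 m by 1 m strip triangulated into four triangles, the two outer triangles come out as ‘end’ and the two inner ones as ‘branch’, and the branch segments all run along the strip's centre line.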
Figure 6 shows the triangulation of a polygon and the resulting skeleton using this
method. A final step in this process was to reduce the number of bifurcating dead
ends of the skeleton, and to improve the quality of termination at the end of the
river polygon. Dead ends were filtered by comparing their length against either the
width of the river at the fork or a threshold: any dead end shorter than the river
width or 5 m was discarded. If a fork occurred at the end of the skeleton, the two
branches were discarded and the previous branch was extended until it intersected
the end edge of the river polygon. A fork is here defined as the junction of two dead
ends, both shorter than 5 m. In the special case where the river joins a lake, the end
of the skeleton was forced to the middle of the separation line between the river and
the lake. A comparison between figures 6 and 7 shows the effect of this process.
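A minimal Python sketch of the dead-end filter, assuming the skeleton is held as an undirected adjacency map of coordinate nodes and reading ‘shorter than the river width or 5 m’ as ‘shorter than whichever is larger’. The `width_at` callable is a hypothetical stand-in for a width lookup at the fork. The end-of-skeleton extension and the snapping to a lake's separation line are not shown.

```python
import math


def walk_to_junction(adj, leaf):
    """Follow a dead end from a leaf node until reaching a node whose degree is not 2."""
    path = [leaf]
    prev, cur = leaf, next(iter(adj[leaf]))
    path.append(cur)
    while len(adj[cur]) == 2:
        prev, cur = cur, next(n for n in adj[cur] if n != prev)
        path.append(cur)
    return path


def prune_dead_ends(adj, width_at, min_len_m=5.0):
    """adj: dict mapping (x, y) nodes to sets of neighbouring nodes (an undirected skeleton).
    width_at: hypothetical callable giving the river width at a fork node.
    Removes, in one pass, every leaf-to-fork branch shorter than max(width, min_len_m)."""
    adj = {node: set(nbrs) for node, nbrs in adj.items()}  # do not mutate the caller's graph
    doomed = []
    for leaf in [node for node, nbrs in adj.items() if len(nbrs) == 1]:
        path = walk_to_junction(adj, leaf)
        fork = path[-1]
        if len(adj[fork]) < 3:
            continue  # the branch ends at another leaf, not at a fork
        length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
        if length < max(width_at(fork), min_len_m):
            doomed.append(path)
    for path in doomed:
        for a, b in zip(path, path[1:]):
            adj[a].discard(b)
            adj[b].discard(a)
        for node in path[:-1]:  # keep the fork node itself
            adj.pop(node, None)
    return adj
```

All pruning decisions are made on the original graph before anything is removed, so a single pass cannot erode the main river axis one branch at a time.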
5. Adding a topological model to the hydrographic data
The next phase was to build the topological model that would enable rivers and
lakes to be linked. Here, lakes were treated as single nodes. For each river object
from the geographic model, we created a link (an edge) in the topological model; an
edge is a logical link between two nodes. Three types of nodes were distinguished:
• lake nodes: where a river section is adjacent to a lake, the node associated with
the lake is used as the end node of the topological edge;
• confluence nodes: where two or more river sections join together, a node is
created at their junction; and
• end nodes: where a river section has a free end (not connected to another river or
lake), a node is created at its end vertex.
Each node consists of a pair of coordinates, a reference to a lake object in the
geographic model where the node corresponds to a lake, and a list of references to
the edges connected to it. Each edge is made of two references to its end nodes, and
a reference to its associated river object. An example of a topological graph created
from MasterMap data is given in figure 8.
This topological layer provided a framework within which river segments could be
reconnected. Once connected into a topological network, the river segments would
support queries such as how many rivers flow into a specific lake, or which river
connects two given lakes; such questions are fundamental to hydrological modelling.
Figure 6. Skeletonisation process using Delaunay triangulation.
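The node and edge records described above, and the two example queries, can be sketched in Python. The class and method names are hypothetical, and river and lake references are plain identifier strings standing in for objects of the geographic model. Because no flow direction is known at this stage, `rivers_at` counts every river meeting a node rather than only those flowing into it.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    xy: tuple
    lake: Optional[str] = None                  # reference to a lake in the geographic model
    edges: list = field(default_factory=list)  # indices into Network.edges


@dataclass
class Edge:
    start: int   # index of one end node
    end: int     # index of the other end node
    river: str   # reference to a river object in the geographic model


@dataclass
class Network:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_node(self, xy, lake=None):
        self.nodes.append(Node(xy, lake))
        return len(self.nodes) - 1

    def add_edge(self, start, end, river):
        self.edges.append(Edge(start, end, river))
        idx = len(self.edges) - 1
        self.nodes[start].edges.append(idx)  # keep references in both directions
        self.nodes[end].edges.append(idx)
        return idx

    def rivers_at(self, node):
        """Rivers meeting a node (undirected: flow direction is not yet known)."""
        return [self.edges[e].river for e in self.nodes[node].edges]

    def rivers_between(self, a, b):
        """Breadth-first search for the chain of rivers linking node a to node b, or None."""
        came_from = {a: None}
        queue = deque([a])
        while queue:
            cur = queue.popleft()
            if cur == b:
                break
            for e in self.nodes[cur].edges:
                edge = self.edges[e]
                nxt = edge.end if edge.start == cur else edge.start
                if nxt not in came_from:
                    came_from[nxt] = e
                    queue.append(nxt)
        if b not in came_from:
            return None
        path, cur = [], b
        while came_from[cur] is not None:  # walk back from b to a along recorded edges
            edge = self.edges[came_from[cur]]
            path.append(edge.river)
            cur = edge.end if edge.start == cur else edge.start
        return path[::-1]


net = Network()
lake_a = net.add_node((0.0, 0.0), lake="Lake A")
junction = net.add_node((10.0, 0.0))
lake_b = net.add_node((20.0, 0.0), lake="Lake B")
spring = net.add_node((10.0, 10.0))
net.add_edge(lake_a, junction, "r1")
net.add_edge(junction, lake_b, "r2")
net.add_edge(junction, spring, "r3")
```

In this small example, `net.rivers_between(lake_a, lake_b)` returns `["r1", "r2"]`; on the original disconnected data the same query would usually return `None`.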
6. Automatic reconnection of the network
At this stage of the process, the MasterMap data set consisted of a large number of
small independent subgraphs. This lack of connectivity was due to the cartographic
nature of the database. In this section we present the methodology used to create
linkages automatically between these apparently independent graphs. The first stage
was to identify the many disconnected parts of the graph, termed the ‘initial
subgraphs’. The system then created a set of candidate links that might potentially
reconnect these initial subgraphs. The candidates were ranked according to
proximity, flow direction and continuity, and the highest ranked was chosen as the
link between the subgraphs.
Figure 7. Example output resulting from the automatic derivation of the centreline.
6.1 Creating the initial subgraphs
As an initial step, we identified all the distinct subgraphs and stored them as
independent objects which, for the purposes of reconnection, could themselves be
treated as nodes. In essence, the algorithm that created the candidate links sought
to connect all these nodes. The subgraphs were built iteratively: a link that was not
yet part of a subgraph was picked, all the links connected to it were found, and the
corresponding subgraph was created. This process was repeated until all the links
had been processed.
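The iterative construction described above is a connected-components search seeded from links. A minimal Python sketch, assuming links are given as pairs of node identifiers (the function name is hypothetical):

```python
from collections import defaultdict, deque


def initial_subgraphs(edges):
    """edges: list of (node_a, node_b) pairs. Returns a list of sets of edge indices,
    one per connected subgraph, each grown from a link not yet assigned to a subgraph."""
    incident = defaultdict(list)  # node -> indices of the links touching it
    for i, (a, b) in enumerate(edges):
        incident[a].append(i)
        incident[b].append(i)
    assigned = set()
    subgraphs = []
    for seed in range(len(edges)):
        if seed in assigned:
            continue
        component, queue = set(), deque([seed])
        assigned.add(seed)
        while queue:  # breadth-first growth through shared nodes
            i = queue.popleft()
            component.add(i)
            for node in edges[i]:
                for j in incident[node]:
                    if j not in assigned:
                        assigned.add(j)
                        queue.append(j)
        subgraphs.append(component)
    return subgraphs
```

Each link is enqueued exactly once, so the whole pass is linear in the number of links and nodes.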
6.2 Creating the candidate links to connect the subgraphs
When connecting the initial subgraphs, it was assumed that the most likely
candidate was the closest one. Therefore, for each subgraph, we searched among the
remaining subgraphs (lakes or rivers) for those falling within a given radius of it,
and a connection was made between the subgraph and its nearest neighbour. Normally
subgraphs were connected via nodes already existing in the other subgraph. Where the
subgraphs were rivers and the minimum distance between the two fell between nodes, a
new node was added. Figure 9 illustrates why this is necessary to avoid creating
unlikely links between two candidate subgraphs.
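The case where the shortest distance falls between nodes can be sketched by projecting a free end node onto each segment of another subgraph's river centreline. This minimal Python sketch handles one end node against one centreline. It assumes topological nodes exist only at the centreline's two ends, so a target anywhere else signals that a new node must be inserted. The looping over all subgraphs within the radius is omitted, and the names are hypothetical.

```python
import math


def closest_point_on_segment(p, a, b):
    """Closest point to p on segment ab, plus its parameter t in [0, 1]."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return a, 0.0
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    if t <= 0.0:
        return a, 0.0  # return the exact endpoint so equality tests work later
    if t >= 1.0:
        return b, 1.0
    return (a[0] + t * dx, a[1] + t * dy), t


def best_connection(end_node, centreline, radius_m=100.0):
    """Shortest candidate link from a free end node to another subgraph's river centreline.
    Returns (distance, target_point, needs_new_node), or None if beyond the radius.
    needs_new_node is True when the target lies between the centreline's end nodes."""
    best = None
    for a, b in zip(centreline, centreline[1:]):
        q, _ = closest_point_on_segment(end_node, a, b)
        d = math.dist(end_node, q)
        if best is None or d < best[0]:
            best = (d, q)
    if best is None or best[0] > radius_m:
        return None
    distance, target = best
    return distance, target, target not in (centreline[0], centreline[-1])
```

Connecting only node to node would, in the situation of figure 9, force the link to one of the centreline's ends even when the river passes much closer elsewhere.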
Each new link was labelled with a confidence value, inversely proportional to the
linking distance: the shorter the distance, the greater the likelihood of it being a
correctly proposed link. The confidence value of a link was computed as follows:
$$\text{Confidence} = 100 \times \frac{\text{Search radius (m)}}{\text{Length of link (m)}}$$
where ‘length of link’ is the length of the proposed link between the two subgraphs,
and the ‘search radius’ is the maximum length of a potential link. By empirical
observation, an effective value for the search radius was found to be 100 m,
reflecting the fact that structures crossing rivers are rarely wider than 100 m. In
practice, it is only in urban areas that streams are channelled underground for
distances greater than 100 m. As discussed later, a number of challenges present
themselves when interpreting connections in urban environments. Figure 10 shows an
example of the candidate links emanating from a sample node; the higher the value,
the greater the confidence.
Figure 8. The graph (comprising dark nodes and straight edges) overlaid on the polygonal lakes and river centre line.
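A small Python sketch of distance-based scoring and ranking, assuming the confidence takes the form 100 × search radius / link length. That form is inversely proportional to the link length, as the text states, so the values are only meaningful relative to one another. The 100 m radius comes from the text; the function names are hypothetical.

```python
SEARCH_RADIUS_M = 100.0  # search radius reported in the text


def distance_confidence(link_length_m, search_radius_m=SEARCH_RADIUS_M):
    """Assumed form: 100 * search radius / link length (inversely proportional to length).
    Returns None for links longer than the search radius, which are not candidates."""
    if link_length_m <= 0.0:
        raise ValueError("link length must be positive")
    if link_length_m > search_radius_m:
        return None
    return 100.0 * search_radius_m / link_length_m


def rank_candidates(candidates, search_radius_m=SEARCH_RADIUS_M):
    """candidates: iterable of (link_id, length_m). Returns (link_id, confidence)
    pairs within the radius, most confident (shortest) first."""
    scored = []
    for link_id, length in candidates:
        confidence = distance_confidence(length, search_radius_m)
        if confidence is not None:
            scored.append((link_id, confidence))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Ranking on this score alone is equivalent to ranking by shortest link, which is why the following sections bring in elevation and continuity as further evidence.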
The complexity of river networks meant that distance alone was not a sufficient
basis for guaranteeing a correct or realistic link between two subgraphs. Since rivers
flow downhill, local elevation data can be used to refine this process. Rivers also
tend to curve gently, so the angle of curvature can be used as well. These two ideas
are now described in more detail.
Figure 9. Illustration of the need to connect rivers using the shortest-distance intersection rather than node-to-node connections.
Figure 10. Example of candidate links. The likelihood of correctness value (as a percentage) has been placed alongside each potential link.
6.3 Utilising the talweg to select likely candidate links
A candidate link is likely to follow a descending line of slope in the valley containing
the two relevant river subgraphs. Such information, derived from a digital terrain
model (DTM) can thus be used to inform on the likelihood of two subgraphs being
connected. Various authors have highlighted the potential for using DTMs in
studies of terrain characterisation, and delineation of drainage networks and
watershed boundaries (Haggett and Chorley 1968, Jenson 1991, Lee and Snyder
1991, Hogg et al. 1993, McCormack et al. 1993, Martinez-Casasnovas and Stuvier
1998, Wood 1998). DTMs have been used as a basis for modelling surface water
hydrology, deriving flood characteristics, channel gradients, and predicting stream
discharge. The quality of the solution is dependent upon the precision of the DTM
and morphological complexity of the terrain. We use the term ‘talweg’ to describe
the set of points defining the lowest points along the valley floor. In essence, if,
among a set of candidate links, a link was found to follow the line of the talweg, it
was more likely to be the correct one. The talweg was calculated using a flow
accumulation model derived from OS’s Profile DTM data set, which provides detailed
height data defining the physical shape of the landscape of Great Britain. It is
derived from contours surveyed at 1:10 000 scale, and the height accuracy of the DTM
is ±5.0 m in mountain and moorland areas and ±2.5 m in other areas. The DTM was
first used to calculate a flow direction for each cell. Each cell then had a flow
accumulation value: an integer representing the number of upstream DTM cells whose
flow paths ‘passed through’ that cell (Mark 1984, McCormack et al. 1993).
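The text does not name the flow-direction scheme. The Python sketch below assumes the common single-direction D8 method, in which each cell drains to the neighbour with the steepest downhill slope. Flow accumulation is then counted by visiting cells from highest to lowest. Pits and flats are left without a receiver (real DTMs are normally depression-filled first), and selecting talweg cells by an accumulation threshold is not shown. The names are hypothetical.

```python
import math

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_receivers(dem, cell_size=1.0):
    """For each (row, col), the neighbour with the steepest downhill slope, or None."""
    rows, cols = len(dem), len(dem[0])
    receivers = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dist = cell_size * math.hypot(dr, dc)  # diagonal steps are longer
                    slope = (dem[r][c] - dem[rr][cc]) / dist
                    if slope > best_slope:
                        best, best_slope = (rr, cc), slope
            receivers[(r, c)] = best
    return receivers


def flow_accumulation(dem, cell_size=1.0):
    """Number of upstream cells whose flow paths pass through each cell."""
    receivers = d8_receivers(dem, cell_size)
    acc = {cell: 0 for cell in receivers}
    # Highest cells first: every donor is processed before the (strictly lower) cell it drains to.
    for cell in sorted(receivers, key=lambda rc: dem[rc[0]][rc[1]], reverse=True):
        target = receivers[cell]
        if target is not None:
            acc[target] += acc[cell] + 1
    return acc


dem = [[5, 4, 5],
       [4, 3, 4],
       [3, 2, 3]]
acc = flow_accumulation(dem)
```

In this small valley every other cell drains through the outlet at row 2, column 1, so its accumulation is 8. Cells with high accumulation trace the valley floor that the text calls the talweg.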
Once the talweg network had been created, it was split into branches (where a
branch was defined as a polyline with no junction). For each branch, the algorithm
identified all sections of the river network that were in close proximity to that talweg
branch. This was done by buffering the talweg, and finding all river sections that fell
within the buffer. From these sections we identified the subgraphs that needed to be
connected in order to preserve the match between the river network and the talweg
network.
Using the test data set, it was determined from empirical observations that a
buffer of radius 100 m was sufficient to ‘capture’ the river subgraphs. Figure 11
shows a sample of the hydrographic network from the trial data set, in which the
talweg network is superimposed (the angular thicker line) together with the
associated buffer. The centreline of the underlying rivers is also presented,
illustrating how the talweg was used to ‘capture’ those graphs that crossed the
buffer zone.
For each talweg branch in turn, we examined the subgraphs falling within the
buffer, identified all the potential links that could connect two of these subgraphs, and
assigned each a ‘talweg confidence value’. The talweg confidence value expressed
the degree of fit between the connection provided by the link and that of the talweg
network. It was derived from the average distance between the proposed link and the
talweg line (as illustrated in the two separate examples shown in figure 12): the
lower the average, the better the fit. Thus, for the two examples in figure 12,
a higher confidence would be attached to (a) than to (b).
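One way to obtain the average distance is to sample evenly spaced points along the straight candidate link and measure each point's shortest distance to the talweg polyline. The paper does not specify its sampling scheme, so the `samples` parameter and the helper names below are assumptions made for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def distance_to_polyline(p, line):
    """Shortest distance from point p to any segment of the polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def average_link_distance(link, talweg, samples=20):
    """Mean distance to the talweg over samples + 1 evenly spaced points on a straight link."""
    (x0, y0), (x1, y1) = link
    total = 0.0
    for i in range(samples + 1):
        t = i / samples
        total += distance_to_polyline((x0 + t * (x1 - x0), y0 + t * (y1 - y0)), talweg)
    return total / (samples + 1)
```

A link running parallel to the talweg at a constant 10 m offset has an average distance of 10 m. A link that starts on the talweg and diverges to 20 m also averages 10 m, which shows that the measure rewards overall fit rather than fit at the ends.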
The confidence value associated with each candidate link that lay within the
talweg buffer was related to the ratio between the average distance from the link to
the talweg and the buffer radius of 100 m. Thus the confidence value varied from 0 for a
link outside the influence of the talweg network, to 100 for a link that was
precisely coincident with a section of talweg. The following formula was used to
compute the talweg confidence value:

$$\text{Talweg confidence} = 100 \times \left(1 - \frac{\text{average distance}}{\text{radius}}\right)$$
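The talweg confidence formula scales the average link-to-talweg distance by the buffer radius. Below is a minimal sketch of it. Clamping negative values to 0 is an assumption that matches the stated value of 0 for links outside the talweg's influence.

```python
def talweg_confidence(average_distance, radius=100.0):
    """100 * (1 - average distance / radius), clamped to the range [0, 100]."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    value = 100.0 * (1.0 - average_distance / radius)
    # Assumed clamp: links whose average distance exceeds the radius score 0.
    return max(0.0, min(100.0, value))
```

For example, a link whose average distance from the talweg is 20 m scores 100 × (1 − 20/100) = 80. A link lying exactly on the talweg scores 100.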
6.4 The degree of continuity
Continuity relates to one of the principal criteria (often referred to as ‘good
continuation’) that influence the human perception of groups of objects
(Thorisson 1994). It is especially well suited to the analysis of linear features, and
is something that cartographers take advantage of, for example in allowing text
placements to ‘break’ across linear features. It has proved very valuable in
street prioritisation as part of street network generalisation
(Thomson and Richardson 1999). For this research we computed a continuity value
for each candidate link according to the degree to which the link ran parallel to the
sections of graph on either side of it. In effect, this value was the combination of
two continuity components, one at each end of the link. Figure 13 illustrates four
different cases, where the continuity value associated with the candidate link (thin
line) decreases from left to right.
Figure 11. The talweg is overlaid and buffered as a basis for ‘capturing’ the underlying subgraphs.
Figure 12. Examples illustrating the calculation of the goodness of fit between the talweg and river centreline in order to calculate the confidence of them being one and the same.
The need for this additional criterion arose from observations that in several places
the system proposed candidate links where, visually, we would have
chosen a different one. Figure 14 shows one such example, where, in the absence
of any continuity measure, the system created a less likely link; the correct
answer is shown dashed.
As this example shows, the more likely solution is driven by the continuity of the
lines at either end of the proposed link: we seek to minimise the angular deviation
between the link and the river subgraph it joins. The continuity value was taken as the
average of the continuity values associated with each end of the link. At each end of
the link, the value related to the difference in orientation between the link and the
local orientation of the river at the node where the link ends. When the end of a link
does not meet a river ‘end on’ (for example as in figure 9), the weight on that
side of the link is set to 0. This gives preference to links that connect to
rivers end on. The issue then becomes how to calculate the angle between the link and the end
section of the river subgraph. A simple solution would be to measure
the angle between the link and the last edge of the polyline defining the centreline of
the river. This was not always reliable, since the skeletonisation process
sometimes created an edge that did not reflect the perceived centreline of the river.
For this reason a second angle was recorded, between the link and the edge joining
the end of the centreline to the point 5 m back along it. By comparing the two
angles we can determine whether the angle reflects the approximate centreline. We
compute the angular deviation between the link orientation and each of these two local
orientations of the river, and keep only the smaller of the two deviations.
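The two-orientation test can be sketched as follows. The river centreline is passed as a list of (x, y) points starting at the end node I and running back into the river, and the link is a straight segment from I to its far end. The function names and this point ordering are assumptions for illustration. The 5 m look-back distance comes from the text.

```python
import math


def _bearing(frm, to):
    """Direction of the vector frm -> to, in radians."""
    return math.atan2(to[1] - frm[1], to[0] - frm[0])


def angular_deviation(theta1, theta2):
    """Smallest absolute difference between two directions, in [0, pi]."""
    d = abs(theta1 - theta2) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def point_along(line, distance):
    """Point at curvilinear `distance` from line[0] along the polyline."""
    remaining = distance
    for a, b in zip(line, line[1:]):
        seg = math.dist(a, b)
        if seg > 0 and remaining <= seg:
            t = remaining / seg
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        remaining -= seg
    return line[-1]  # river shorter than `distance`: use its far end


def end_deviation(river_from_end, link_far_end, lookback=5.0):
    """Smaller of the two deviations between the link and the river's local orientation at I."""
    i = river_from_end[0]
    link_dir = _bearing(i, link_far_end)
    # Orientation 1: last edge of the centreline, pointing towards the end node I.
    alpha = angular_deviation(link_dir, _bearing(river_from_end[1], i))
    # Orientation 2: edge from A (lookback metres back along the river) to I.
    a = point_along(river_from_end, lookback)
    beta = angular_deviation(link_dir, _bearing(a, i))
    return min(alpha, beta)
```

If the skeleton leaves a short kinked end edge, for instance `[(0, 0), (-1, 1), (-10, 0)]` with the link heading due east, the last edge deviates by π/4. The 5 m chord deviates by only about 0.13 rad, so the chord's value is kept.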
Figure 13. Qualitative views on the likelihood of a connection based on shared orientation between the subgraphs and the link.
Figure 14. An incorrect solution (dark line) illustrating the need for some sort of continuity measure (correct answer shown dashed).
Figure 15 illustrates how the difference in orientation was computed for one end
of the link, I (the link is shown as the thin straight edge). The first angle, α, is the angle between
the link and the last edge of the centreline. The second angle was the
angle between the link and the edge IA, where A is situated at a curvilinear
distance of 5 m from I along the river line; this gives the angle difference β.
Here the chosen value would be α, since it is less than β.
From the difference in angle between the link and the river polyline, we
compute a continuity value that lies between 0 and 100, using the formula:

$$\text{continuity}_I = 100 \times \frac{\pi/2 - \alpha}{\pi/2}$$

where α is the retained angular deviation at end I. Where α > π/2 the continuity value would be negative, so for the purposes of this task
such values were set to 0. The final continuity value for the link is the
average of the continuity values at its two ends.
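The continuity formula and the averaging over both ends can be written in a few lines. Deviations are in radians, as produced by an angle test like the one described for figure 15. The `end_on` flags encode the rule that an end not meeting a river ‘end on’ contributes 0. The function names are hypothetical.

```python
import math


def end_continuity(deviation, end_on=True):
    """Continuity (0-100) at one end of a link from its angular deviation in radians."""
    if not end_on:
        return 0.0  # the link does not meet the river 'end on' at this end
    half_pi = math.pi / 2
    # 100 * (pi/2 - alpha) / (pi/2), with negative values (alpha > pi/2) set to 0.
    return max(0.0, 100.0 * (half_pi - deviation) / half_pi)


def link_continuity(dev_a, dev_b, a_end_on=True, b_end_on=True):
    """Average of the continuity values at the two ends of a link."""
    return (end_continuity(dev_a, a_end_on) + end_continuity(dev_b, b_end_on)) / 2
```

A link aligned perfectly at one end and deviating by π/4 at the other scores (100 + 50) / 2 = 75. A perfectly aligned link that meets a river side-on at one end scores only 50.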
6.5 Selecting the candidates to reconnect the subgraphs
The selection of links among the candidates was done in two stages. The first
stage was to consider the talweg, and to connect the subgraphs using only those
links that lay within the talweg buffer (i.e. those having a talweg confidence value
greater than 0). If this did not create a link, candidate links outside the talweg buffer
were considered. During both stages, links were built up using a minimum spanning
tree construction method (Kruskal 1956, Prim 1957, Dijkstra 1960), applied so as to
favour the highest-confidence links. The principle
was to start from a node (representing an entire subgraph), and select the edge
having the highest confidence connected to that node (the edge being a candidate
link). The two nodes connected by the chosen edge formed a cluster, which was
further expanded by selecting the edge with the next highest confidence value that
connected the cluster to a node outside it. This process was repeated until all the nodes were
part of the same cluster. The value used to select the candidate links was computed
using the following formula:
If the talweg_confidence is greater than 0 (i.e. the link lies within the talweg buffer),
The weightings (multipliers of 2 and 3) give greater importance to
short links (the shorter the link, the greater the likelihood that the candidate link
was correct) than to the link and the river sharing a similar angle of
continuity. The continuity value was used to differentiate between two links having
similar distance_confidence (we would not want its influence to be greater than the
talweg confidence).
Figure 15. Calculations of relative orientation and their tolerances.
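The two-stage selection can be sketched as a maximum spanning forest over the subgraphs. The text describes Prim-style cluster growth. This sketch swaps in Kruskal's algorithm with union-find, because that handles the case where the in-buffer links leave several separate clusters for the second stage to join. Each candidate carries a precomputed `score` standing in for the paper's combined selection value, whose exact weighting is not reproduced here. `Candidate` and `select_links` are hypothetical names.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    u: str                    # id of one subgraph
    v: str                    # id of the other subgraph
    score: float              # combined selection value (weighting not reproduced here)
    talweg_confidence: float  # > 0 means the link lies within the talweg buffer


def select_links(subgraphs, candidates):
    """Two-stage maximum spanning forest over subgraphs (Kruskal with union-find)."""
    parent = {s: s for s in subgraphs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    chosen = []
    stage1 = [c for c in candidates if c.talweg_confidence > 0]
    stage2 = [c for c in candidates if c.talweg_confidence <= 0]
    for stage in (stage1, stage2):
        # Highest-scoring links first: maximising, rather than minimising, the total.
        for cand in sorted(stage, key=lambda c: c.score, reverse=True):
            ru, rv = find(cand.u), find(cand.v)
            if ru != rv:  # only accept links that join two separate clusters
                parent[ru] = rv
                chosen.append(cand)
    return chosen
```

Links inside the talweg buffer always take priority. An outside link is accepted only if it joins clusters that stage one left unconnected, even when its score is higher. Because no link is ever accepted between nodes already in the same cluster, the result is always acyclic.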
7. Results and analysis
Figure 16 shows a sample of the results obtained using the test data set (around the
small village of Belmont, midway between Manchester and Blackburn, England).
The links generated by this system (together with their confidence values) can be
seen as thin straight lines. Some links seem visually correct, others seem questionable,
while some appear to be missing (perhaps because of the size of the gap or the relative
continuity). A low confidence value reflects high uncertainty, and vice versa.
This output was assessed in a number of ways: by viewing MasterMap data, by
inspection of aerial photography and by ground survey. It was possible to infer the
correctness of results in rural areas using the aerial photography (although, of
course, the photography itself captures features overlying the network, not the
network alone). To that extent, aerial photography was only marginally better than
making inference from inspection of the MasterMap data. In urban areas, neither
source was a sufficient yardstick, and ground-truthing was undertaken via field-
based inspections. Even field-based inspection could not verify every instance. In the
case of long-distance urban conduits, it was determined that specialist skills and
equipment would be needed to determine the exact hydrological connectivity and
flow. Each link was marked as correct, as Type I (the link should have
been made but was not), or as Type II (a link was made where no real-world
connection exists). Overall, 66% of the inferred links were found to be correct.
Figure 17 summarises the distribution of correct and incorrect links according to the
assigned confidence value. The y axis is the percentage of links, and the x axis is the
range of confidence values. The graph shows that all links with a
calculated confidence value of 80% or greater were found to be correct, and all
those with a confidence value of 50% or less were found to be incorrect. Links with
confidence values between 50% and 80% were mixed in their correctness. Therefore,
from a human–computer interaction point of view, it would be appropriate to
highlight and field-check all links with confidence values between 50% and 80%.
The confidence value thus provides a simple mechanism for directing users to those
links whose correctness is uncertain, rather than leaving the user to guess which links should be
inspected.
Figure 16. Sample data showing straight-line connections (labelled with their associated likelihood value) between the river subgraphs. Latitude and longitude of the bounding box, bottom left and top right: N 53°37′54″, W 2°30′7″ and N 53°38′26″, W 2°28′57″.
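The review rule suggested by figure 17 reduces to a three-way split on the confidence value. In the sketch below, the 50 and 80 thresholds are taken from the results for this single test area and are not a general calibration. The group names and the `triage_links` function are illustrative.

```python
def triage_links(confidences, low=50.0, high=80.0):
    """Split links by confidence (0-100) into three review groups."""
    groups = {"likely_correct": [], "field_check": [], "likely_incorrect": []}
    for link_id, conf in confidences.items():
        if conf >= high:
            groups["likely_correct"].append(link_id)
        elif conf <= low:
            groups["likely_incorrect"].append(link_id)
        else:
            groups["field_check"].append(link_id)
    return groups
```

Only the `field_check` group would be highlighted for inspection. Both boundary values are treated as in the text: 80 counts as likely correct, and 50 as likely incorrect.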
Although the test has been conducted on a single data set covering approximately
65 km², it is envisaged that similar results would be obtained on other
data sets. The main restriction concerns urban data sets, where large portions of the
hydrographic network run underground. Experimenting on different data sets (such
as the Quantock Hills or the Somerset Levels) would allow us to tune the model
to regional morphologies, and to urban and rural regions. The model is only ever
intended to connect surface networks and would not be effective in limestone
regions, for example. As others have noted (McCormack et al. 1993), urban
areas remain problematic. The algorithm provides a good solution in rural areas,
allowing human resources to be focused on urban areas.
7.1 Further work
The process of triangulation (as part of the skeletonisation process) was found to be
extremely computationally intensive. Switching to a proper constrained triangulation
(Chew 1989), instead of simulating it by resampling the initial geometries, would
dramatically improve the performance and robustness of this component of the
solution. Additionally, the confidence value could be further refined by examining
the geographic features associated with the link. For example, the presence of a
bridge or road close to the link could be used to reinforce the likelihood of
the link being correct. Conversely, if the link crossed open land, then the link
probably did not correspond to a ‘hidden’ water feature, because there was nothing
to hide it. This approach might help in karst scenery, where rivers that ‘disappear’
and re-emerge reflect a reality rather than indicate an obscuring built structure. We
have not carried out research using this refinement because, in the version of
MasterMap used in this project, the classification of such objects was not sufficiently
explicit to allow this line of enquiry.
Figure 17. Histogram showing the percentage frequency of correct and incorrect links and their associated final probability intervals.
It was intended that a generic set of parameters would be devised, such that the
algorithm could be applied to any landscape. But there is enormous variability in
geology (limestone, granite, etc.), in landscape (complex mountain chains, karst
scenery, flat deltaic structures, etc.), and in modification of river networks from
human activities (canalisation, river straightening, creation of drainage networks in
low-lying areas). Furthermore, the size of hydrological features varies enormously
from narrow headstreams to large meandering estuarine structures. Consequently, it
was necessary to parameterise the algorithm by region type. Further work is
therefore required to define a minimum set of landscape types (and associated
parameter settings) that would secure the greatest percentage of correct solutions.
The algorithm assumed that the river network was acyclic. However, anthro-
pogenic drainage structures (such as are found in the Fens of Norfolk) may indeed
be cyclic. In these cases the algorithm would stop connecting rivers once an acyclic
graph had been formed, and so would sometimes fall short of a complete solution. Any
solution is further complicated by the flatness of such regions, and such cases
would require much more ground-truthing.
8. Conclusion
This research sought to develop a geographic model of the river network and
associated water features within an OS MasterMap data set. The input data set
reflected a cartographic perspective—whereby water features were a mix of polygon
and linear geometries, ‘broken’ by other features such as roads that crossed them.
Although OS MasterMap supports attribution of different water features, in reality
the input data set was attributed in a very simple manner (labelled simply ‘inland
water’), and was made up of unordered, miscellaneous, disjoint collections of
polygons and linear features. This required substantial data cleaning, and the development
of automated techniques both to detect and to form meaningful geographical entities
(rivers and lakes). The methodology relied on aggregation of adjacent area features
and their subsequent reclassification into two classes, rivers and lakes, based upon
shape analysis. A centreline representation of rivers was an essential pre-processing
stage prior to connecting the river sections. The process of reconnection was based
on three criteria: the degree of continuity, proximity, and the fit with the line of
slope (based on a DTM). The region to which the algorithm was applied was
chosen deliberately because of its variability in topography and link lengths, and its mix of
rural and urban structures. The Research and Innovation group within Ordnance
Survey is exploring the idea of including a topologically connected hydrology
network similar in detail and resolution to the ITN Transportation theme. The
results presented here indicate that it is feasible to partially automate the process by
which a large-scale, detailed, integrated river network could be included to give
added value to OS MasterMap. This would be of specific interest to environmental
engineers, and those interested in the analysis of various aspects of river networks,
including their visualisation and generalisation.
Acknowledgements
The authors are very grateful for funding of this research by the Ordnance Survey,
and the constructive and helpful comments of the reviewers.
References
BALLARD, D. and BROWN, C., 1982, Computer Vision, Chapter 8 (Englewood Cliffs, NJ:
Prentice-Hall).
CHEW, L.P., 1989, Constrained Delaunay triangulation. Algorithmica, 4, pp. 97–108.
COSTA, L. DA F., 2000, Robust skeletonization through exact Euclidean distance transform
and its application to neuromorphometry. Journal of Real-Time Imaging, 6, pp.
415–431.
DAVIES, E., 1990, Machine Vision: Theory, Algorithms and Practicalities, pp. 149–161
(London: Academic Press).
DIJKSTRA, E.W., 1960, Some theorems on spanning subtrees of a graph. Indag. Math, 22, pp.
196–199.
HAGGETT, P. and CHORLEY, R.J., 1968, Network Analysis in Geography (New York: St
Martins Press).
HARTSFIELD, N. and RINGEL, G., 1990, Pearls in Graph Theory—A Comprehensive