
Research Article

Creating a hydrographic network from its cartographic representation: a case study using Ordnance Survey MasterMap data

NICOLAS REGNAULD* and WILLIAM A. MACKANESS†

Institute of Geography, School of GeoSciences, The University of Edinburgh,

Drummond St, Edinburgh EH8 9XP, UK

(Received 28 August 2004; in final form 11 January 2006 )

A meaningful hydrological network is critical to spatial analysis and modelling.

‘Meaningful’ in that it is topologically correct, provides a basis for modelling

flow and differentiates between different types of water features. In Great

Britain, large-scale digital mapping of hydrological features was captured from

paper maps and had a cartographic emphasis that had poor attribution, and no

underlying model that supported geographical modelling. This emphasis gave

rise to rivers and lakes that were variously ‘broken’ into sections by features such

as dams, bridges, and culverts. This paper reports on research to create

automatically a topologically connected hydrological network that underpins the

detailed cartographic representation of such features. The network was created

by joining these hydrographic features together according to rules of both

continuity and proximity between river sections, and their flow direction (using

an underlying digital elevation model). Confidence values were associated with

each section link reflecting the certainty of that connection. The confidence

values provided a basis for directing human intervention to uncertain

connections as part of the final editing process. The project took as its input

OS MasterMap ‘water feature’ data. A skeletonisation process was used to create

the medial axis of the network. The paper reports in detail the methodology, the

implementation and evaluation. The algorithm worked well in rural areas where

interruptions are small and there is greater variation in height. In urban areas the

challenges were greater where typically relatively long sections of river may be re-

engineered and culverted, and where the fidelity of the digital elevation model

was insufficient to discern the subtle changes in elevation.

Keywords: Data modelling; Hydrographic network topological modelling

1. Introduction

OS MasterMap® is one of a series of digital products engineered by the Ordnance

Survey, the National Mapping Agency of Great Britain. OS MasterMap is a digital map designed by Ordnance Survey for use with geographical information systems

(GIS) and databases. This data set has a high level of detail, being captured at

1:1 250 scale in urban areas, 1:2 500 scale in rural areas and 1:10 000 scale in

mountain and moor land areas (www.ordnancesurvey.co.uk/oswebsite/products/

osmastermap/). OS MasterMap represents an engineering improvement to

†Corresponding author. Email: [email protected]

*Present address: Research and Innovation, Ordnance Survey, Romsey Road, Southampton SO16 4GU. Email: [email protected]

International Journal of Geographical Information Science

Vol. 20, No. 6, July 2006, 611–631

ISSN 1365-8816 print/ISSN 1362-3087 online © 2006 Taylor & Francis

http://www.tandf.co.uk/journals

DOI: 10.1080/13658810600607402

Land-Line®, Ordnance Survey’s original large-scale digital product. Land-Line

originates from a time when the anticipated output was a cartographic map.

Although OS MasterMap represents a very significant improvement in terms of its

structure, which enabled it to become suitable for use with GIS, it still has areas that

reflect its cartographic heritage. Contemporary uses of OS MasterMap extend

beyond the visual, to ideas of analysis and integration with other data sets in order

to derive new types of information. This has required a shift away from a

cartographic focus, to more of a geographical one. For example, in the context of

water features, we might want to model various hydrological processes and physical

interdependencies (such as flood profiling, its economic impact and mitigation). In

this context a geographical model must reflect the surface connectedness and flow

among various water features over the entire network. This is quite different from a

cartographic viewpoint, where we might view a river, say, passing through a city, in

which we see sections of water, ‘broken’ by bridges, or ‘disappearing’ when flowing

into tunnels or culverts. In these instances we rely on the viewer to infer connection

at the point of re-emergence. OS MasterMap has been formed from this

cartographic perspective, yet the requirements for cartographic portrayal are quite

different from the requirements for modelling flow. The aim of this paper is to

report on a project that took as its starting point the cartographic representation of

a river and, using a range of techniques, converted it into a fully connected network,

in anticipation of its use in hydrological modelling, navigation and environmental

applications (such as water management, flood prevention and pollution control).

The paper begins with a description of the format and structure of OS

MasterMap, identifying the weaknesses in the current model. Section 3 describes

our strategy for enriching the data in the hydrographic layer of OS MasterMap in

order to support the process of creating a connected river network. Sections 4, 5 and

6 detail the three main parts of our solution, respectively the creation of

geographical, topological and statistical models used to enrich the current data.

The last section presents some results and a discussion that illustrates the degree of

automation achieved and how human intervention can be directed to solutions with

a low confidence.

2. OS MasterMap

OS MasterMap is large-scale digital map data designed by the Ordnance Survey,

covering Great Britain, supplied in geographical markup language (GML). GML is

a spatially enabled dialect of the XML schema (Lake et al. 2004) intended to

support a wide range of customer applications. It includes topographic information

for many different features (both natural and anthropogenic). Real-world objects

are represented in the form of polygon, point, line and text features—each feature

with its own unique topographic identifier. OS MasterMap currently contains four

separate layers—topographic, address, imagery and integrated transportation

network (ITN) layers. The ITN layer does contain a topologically connected layer,

but this relates solely to the road network. The focus of this research is on the

topographic layer, which contains detailed surface features of the landscape, under

nine themes, one of which is water. The data model supports storage of various

physical water features such as canals, lakes, reservoirs and rivers. Rivers and

streams are shown at their true scale width (in a polygonal form) or by a single line

where the width is less than 1 m in urban areas, and 2 m in rural areas (OS

2004). There is no representation of water features where they are obscured by other


objects (such as bridges). Figure 1 shows a sample of OS MasterMap data, showing

only hydrographic features. The breaks in the river segments are indicative of

bridges, dam walls or other features that intersect the river.

As can be seen from figure 1, there is no inherent structure in the network (it is a

disconnected network), the features are composed of polygons of varying shape,

and the attribution is such that it is not possible to differentiate between rivers,

ponds, and lakes. It is therefore of limited use in hydrological modelling. For

example, it is not possible to model flow through the network or to characterise it,

say through Hortonian modelling (Horton 1945, Strahler 1964).

The aim of this research was to link together broken sections of river in order to

create an integrated river network that would be able to support various types of

hydrological modelling.

3. Enhancement of MasterMap to support the creation of a river network

A range of factors governs the likelihood of two disjoint rivers being one and the

same. For example, factors include the local morphology of the land, the shortness

in distance between two river sections, and the angle at which the two rivers meet.

Where these factors act to corroborate one another, then a high degree of certainty

can be attached to the link that joins those two river sections. The converse is also

true; one may be less certain of how rivers are connected in cities covering relatively

flat ground, with long stretches of culverting and where permutations exist among a

number of joining rivers. Given this variation in certainty, a likelihood value was

Figure 1. Hydrographic features in MasterMap (Survey mapping © Crown copyright. All rights reserved. Media licence W01).


attached to each link that reflected the degree of certainty with which a join had

been made. This value was stored as metadata in anticipation of future spatial

analysis but could also be used as a basis for directing human intervention in

interactive editing environments, or as a basis in identifying areas in need of more detailed ground-truthing.

Three models were created to support the process of converting the water theme

of OS MasterMap from a cartographic to a geographic model. The first, the

geographic model, entailed making meaningful geographic entities (such as lakes,

connected rivers) from a collection of general ‘water body’ polygons. This was required because of the weak entity and attribute definitions that existed in the

classification of water features in OS MasterMap. The second, the topological

model, stored information on the connectivity of the river network. Initially this

took the form of a set of subgraphs, because of the disconnected nature of the data

(figure 1). The third, the statistical model, contained information about the degree of

certainty associated with connection of the subgraphs. A number of analytical steps

took place within this model in deriving this statistic—taking into account slope, the

physical length of the gap between rivers, and the angle at which they joined. This model explicitly stored information on what links had been added in order to make

the topologically connected river network.

3.1 Building the geographic model

The geographic model recorded representations of the various types of hydrographic

features, as well as the links between them. A number of cartometric

techniques were used to distinguish automatically between rivers and lakes. Two main stages were needed to compute the geographic model from OS MasterMap:

• Identify rivers and lakes: the initial MasterMap polygons needed to be

reclassified into rivers or lakes in order that they could be subsequently

associated with a node or a link in the network. By examining the width of

polygons it was possible to identify the split points between lakes and rivers.

• Form the skeleton of the river: the results from the first stage were then skeletonised to provide the linear, medial axis of the river polygons. This

provided a node and edge model that could be used as a basis for recording

connections and confluences between sections of river and lakes.

3.2 Adding a topological model

The topological model stored the adjacency relationships between the objects of the

geographic model. This allowed fast traversal of the network. The topological model comprised nodes and links. Every river was associated with a link; every lake,

river intersection and ‘ends’ of each river section had a node associated with them.

We kept dual references between objects of the geographic model and their

corresponding objects in the topological model. Such a referencing system would

enable MasterMap hydrology to be used for both analysis and display.

3.3 Building a statistical model

The topological model allows us to identify all the disconnected hydrographic

subgraphs; disconnected by virtue of the existence of bridges and culverts. The


statistical model records the links (formed using a range of criteria to determine the

most likely connections) and the degree of certainty associated with that link.

The region selected as test data was in Lancashire, a region of the West Pennine

Moors, midway between Manchester and Blackburn in the UK (defined by a box

with latitude and longitude values of: 53°35′6″N, 2°36′20″W, and 53°40′32″N,

2°27′20″W). The area was selected because of its mix of rural and small urban areas,

the range of elevations (varying from 150 to 320 m above sea level), and variety of hydrological features (lakes, rivers) both natural and anthropogenic.

4. Building a geographic model of the hydrology

A hydrological model was built in which hydrographic objects were stored in two

classes: rivers and lakes. Lakes were represented as polygons with a single node,

while rivers were recorded in linear form. This added value to the data model for

several reasons:

• the rivers can be modelled in graph theoretic form (Hartsfield and Ringel 1990,

Wolf 1992), irrespective of whether or not they are made up of polygons or

lines;

• the model could support generalisation operations (Mackaness and Beard

1993). The way a lake is generalised is quite different from the way we

generalise a river, so we need to be able (1) to differentiate between the two,

and (2) to maintain connection between the river and the lake during and after

the generalisation process;

• the process of symbolisation is made simpler. Representing rivers at small scale

will only require us to apply symbology to the river centre line, which is much simpler than having to generalise river polygons and join them to other linear

representations of the river.

4.1 Splitting the features into rivers and lakes

In the absence of adequate attribution, it was necessary to devise a cartometric

method of differentiating between rivers and lakes based on the variation in width.

The first step consisted of amalgamating all of the adjacent hydrographic polygons.

Figure 2 is a sample of MasterMap data revealing the rather arbitrary form of the

Figure 2. Initial hydrographic polygons in MasterMap (Survey mapping © Crown copyright. All rights reserved. Media licence W01).


composing polygons (this being an artefact of how the data was originally digitised).

Figure 3 shows the effect of dissolving those boundaries. This result was used as

input to the next stage: namely identifying the point at which a river becomes a lake.

Splitting the amalgamated polygons between rivers and lakes followed a five-step

process.

• Step 1. The polygon vertices were indexed according to the width of the water

feature. At every vertex of a polygon, the perpendicular distance to the other

side of the polygon was calculated and classified into three categories according

to that length (the idea being to classify them according to whether they were

likely to be part of a river section, part of a lake section, or something in

between). Category 1 (shown black in figure 4) corresponds to distances of

less than 15 m; category 2 (shown white in figure 4), are distances between 15

Figure 3. Result after amalgamation of polygons (Survey mapping © Crown copyright. All rights reserved. Media licence W01).

Figure 4. Initial classification of the vertices following step 1.


and 50 m, and category 3 (shown grey in figure 4), are vertices with

perpendiculars greater than 50 m.

• Step 2. ‘Smooth’ the riverside classification. The polygon boundaries were

searched for continuous chains of vertices that were not category 1, that

formed a section of a polygon edge that was shorter than 10 metres in length,

and bounded by vertices of category 1. These vertices were reclassified to

category 1. This reclassified anomalous localised sections of river that were

wider than 15 m (bulges in a section of river) as well as ensuring that the ‘ends’

of rivers (typically where they abut bridges) were also classified as such.

• Step 3. ‘Smooth’ the lakeside classifications: Where a lake boundary is

convoluted, some vertices in small recesses may initially be classified in

category 1. To avoid this local effect, we reclassified all the chains of category 1

vertices that were shorter than 10 m and bounded by category 2, into category 2 vertices.

• Step 4. Distinguish between rivers and lakes by eliminating category 2. All

category 2 vertices that were bounded on either side by category 1 vertices were reclassified into category 1, and the remainder into category 3 vertices. At

this stage, all the vertices were classified either as category 1 or 3, i.e. either a

river or a lake.

• Step 5. Find the boundary between a river and a lake: For each end vertex of a river of category 1, find the closest vertex on the opposite bank of the river.

Keep the shortest couple (end vertex – opposite vertex) as the limit between the

river and the lake (figure 5 illustrates the end result of this process).
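Steps 1 to 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the polygon boundary is treated as an open chain of vertices, the width at each vertex is assumed to have been measured already, the length of a chain is approximated by the distance along the boundary between its first and last vertex, and all function names (`classify_width`, `smooth`, `eliminate_mixed`, `classify_boundary`) are hypothetical.

```python
RIVER, MIXED, LAKE = 1, 2, 3  # categories 1, 2 and 3 in the text


def classify_width(width_m):
    """Step 1: category from the perpendicular width measured at a vertex."""
    if width_m < 15.0:
        return RIVER
    return MIXED if width_m <= 50.0 else LAKE


def runs(cats, keep):
    """Yield (first, last) indices of maximal runs of vertices where keep(cat) holds."""
    start = None
    for i, cat in enumerate(cats):
        if keep(cat):
            if start is None:
                start = i
        elif start is not None:
            yield start, i - 1
            start = None
    if start is not None:
        yield start, len(cats) - 1


def bounded_by(cats, first, last, cat):
    """True when the run first..last has a vertex of category `cat` on both sides."""
    inside = first > 0 and last < len(cats) - 1
    return inside and cats[first - 1] == cat and cats[last + 1] == cat


def smooth(cats, chainage_m, keep, bound, new_cat, max_len_m=10.0):
    """Steps 2 and 3: relabel short runs enclosed by vertices of category `bound`."""
    out = list(cats)
    for first, last in runs(cats, keep):
        # Assumption: run length = boundary distance between its end vertices.
        short = chainage_m[last] - chainage_m[first] < max_len_m
        if short and bounded_by(cats, first, last, bound):
            out[first:last + 1] = [new_cat] * (last - first + 1)
    return out


def eliminate_mixed(cats):
    """Step 4: category 2 runs enclosed by rivers become river, the rest lake."""
    out = list(cats)
    for first, last in runs(cats, lambda c: c == MIXED):
        new_cat = RIVER if bounded_by(cats, first, last, RIVER) else LAKE
        out[first:last + 1] = [new_cat] * (last - first + 1)
    return out


def classify_boundary(widths_m, chainage_m):
    """Run steps 1 to 4 on one boundary; chainage_m is distance along the boundary."""
    cats = [classify_width(w) for w in widths_m]
    cats = smooth(cats, chainage_m, lambda c: c != RIVER, RIVER, RIVER)  # step 2
    cats = smooth(cats, chainage_m, lambda c: c == RIVER, MIXED, MIXED)  # step 3
    return eliminate_mixed(cats)                                         # step 4
```

Step 5 would then look for the shortest crossing between opposite banks at each transition from category 1 to category 3 in the returned list.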

Differentiation by width appealed as an intuitive solution to the problem and was

straightforward to implement. The width-measuring parameter was adjusted by

empirical examination of results, and for the given region was found to produce

consistent results in about 90% of cases (but see section 7.1). Some problems did

Figure 5. Results of automatic identification of the limit between rivers and lakes.


arise, however. For example, the algorithm classified small ponds as rivers (because

of their small width). A solution might be to develop a compactness measure

(Unwin 1981) in order to identify ponds. At this early stage of processing, such a

measure would likely classify isolated sections of rivers as ponds (for example, short

sections of river between two bridges that were close). It would therefore be unwise

to use this rule at this stage of processing but instead to wait until after the complete

network had been created.

4.2 The centre line model within the geographical model

The next phase focused on creating a centreline model as part of the geographical

representation of the rivers. This was done by applying a skeletonisation algorithm

to the river polygons. A range of techniques and software already exist to undertake

this task (Ballard and Brown 1982, Davies 1990, Costa 2000, McAllister and

Snoeyink 2000). The algorithm used in this work was based on Delaunay

Triangulation, using a technique proposed by Tsai (1993). Before computing the

triangulation, it was first necessary to resample the river outlines at a finer level,

namely 1 m intervals. This was done in order to produce a smooth medial axis. The

triangulation produces three types of triangles inside the polygon, each one having a

different role in the construction of the skeleton.

• Triangles that share no edge with the polygon: they will produce a fork in the

skeleton. The centre of gravity of the triangle will be the fork point, and three

branches will connect it to the middle of the three edges of the triangle.

Examples of these are annotated ‘fork point triangle’ in figure 6.

• Triangles that share two edges with the polygon: they will be at the end of a

branch of the skeleton. The branch will come through the middle of the third

edge and will be prolonged till joining the polygon outline. These are annotated

‘end triangles’ in figure 6.

• Triangles that share one edge with the polygon. They will be crossed by a

branch of the skeleton, going through the middle of the two other edges of the

triangle. They are annotated ‘branch triangle’ in figure 6.
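The three triangle roles can be illustrated with a short Python sketch. It assumes the triangulation has already been computed and that each triangle is given as three indices into the ring of resampled polygon vertices, so that a triangle edge lies on the polygon outline exactly when its two indices are neighbours on the ring. The function names are hypothetical, and the end triangle is simply closed at its apex vertex, which is a simplification of the prolongation described in the text.

```python
def midpoint(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def split_edges(tri, n):
    """Separate a triangle's edges into those on the outline and interior ones.

    tri holds three vertex indices into a polygon ring of n vertices.
    """
    a, b, c = tri
    edges = [(a, b), (b, c), (c, a)]
    on_outline = [e for e in edges if (e[0] - e[1]) % n in (1, n - 1)]
    interior = [e for e in edges if e not in on_outline]
    return on_outline, interior


def classify(tri, n):
    """Name the triangle's role from the number of edges shared with the polygon."""
    on_outline, _ = split_edges(tri, n)
    return {0: "fork", 1: "branch", 2: "end"}.get(len(on_outline), "degenerate")


def skeleton_segments(tri, pts):
    """Return the skeleton segments contributed by one triangle."""
    kind = classify(tri, len(pts))
    on_outline, interior = split_edges(tri, len(pts))
    mids = [midpoint(pts[i], pts[j]) for i, j in interior]
    if kind == "fork":
        # Centre of gravity joined to the middle of each of the three edges.
        cx = sum(pts[i][0] for i in tri) / 3.0
        cy = sum(pts[i][1] for i in tri) / 3.0
        return [((cx, cy), m) for m in mids]
    if kind == "branch":
        # One segment through the middles of the two interior edges.
        return [(mids[0], mids[1])]
    if kind == "end":
        # Simplification: prolong the branch to the vertex shared by the
        # two outline edges.
        apex = (set(on_outline[0]) & set(on_outline[1])).pop()
        return [(mids[0], pts[apex])]
    return []
```

For a rectangle described by the six ring vertices `[(0, 0), (2, 0), (4, 0), (4, 2), (2, 2), (0, 2)]`, the triangle `(1, 4, 5)` is a branch triangle and `(0, 1, 5)` is an end triangle.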

Figure 6 shows the triangulation of a polygon and the resulting skeleton using this

method. A final step in this process was to reduce the number of bifurcating dead

ends of the skeleton, and to improve the quality of termination at the end of the

river polygon. The filtering of the dead-ends was done by comparing the length of

the dead-ends against either the width of the river at the fork or a threshold. Any

dead-ends shorter than the river width or 5 m were discarded. If a fork occurs at the

end of the skeleton, the two branches are discarded and the previous branch

extended until it intersects with the end edge of the river polygon. A fork is defined

as joining dead-ends, both shorter than 5 m. In the special case where the river

joins a lake, the end of the skeleton is forced to the middle of the separation line

between the river and the lake. A comparison between figures 6 and 7 shows the effect

of this process.

5. Adding a topological model to the hydrographic data

The next phase was to build the topological model that would enable rivers and

lakes to be linked. Here, lakes were treated as single nodes. For each river object


from the geographic model, we created a link in the topological model. An edge is a

logical link between two nodes. Three types of nodes were distinguished:

• lake nodes: where a river section is adjacent to a lake. The node associated with

the lake is used as the end node of the topological edge;

• confluence nodes: where two or more river sections join together. A node is

created at their junction; and

• end nodes: where a river section has a free end (not connected to another river

or lake). A node is created at its end vertex.

Each node consists of a pair of coordinates, a reference to a lake object in the

geographic model in instances where the node corresponds to a lake, and a list of

references to the edges connected to it. Each edge is made of two references to its end

nodes, and a reference to its associated river object. An example of a topological

graph created from MasterMap data is given in figure 8.
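The node and edge records described above map naturally onto two small classes. The sketch below is one possible layout, not the data model used in the project; the class, field and function names (`Node`, `Edge`, `connect`, `rivers_touching`) and the identifiers in the usage note are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    xy: Tuple[float, float]        # pair of coordinates
    lake_id: Optional[str] = None  # reference to a lake object, if any
    # References to the edges connected to this node (left out of repr
    # because nodes and edges refer to each other).
    edges: List["Edge"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Edge:
    start: Node    # references to the two end nodes
    end: Node
    river_id: str  # reference to the associated river object


def connect(start, end, river_id):
    """Create an edge and register it with both of its end nodes."""
    edge = Edge(start, end, river_id)
    start.edges.append(edge)
    end.edges.append(edge)
    return edge


def rivers_touching(lake_node):
    """Answer a query such as 'which rivers are adjacent to this lake?'."""
    return sorted({edge.river_id for edge in lake_node.edges})
```

With `lake = Node((0.0, 0.0), lake_id="lake-1")` and `free_end = Node((10.0, 0.0))`, calling `connect(free_end, lake, "river-7")` makes `rivers_touching(lake)` return the single river identifier.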

This topological layer provided a framework by which river segments could be

reconnected. The connection of the river segments into a topological network would

Figure 6. Skeletonisation process using Delaunay triangulation.


then be able to support queries such as: how many rivers flow into a specific lake, or

which river connects these two lakes—questions that are fundamental to

hydrological modelling.

6. Automatic reconnection of the network

At this stage of the process, the MasterMap data set comprised a large

number of small independent subgraphs. This lack of connectivity was due to the

cartographic nature of the database. In this section we present the methodology

used to automatically create linkages between these apparently independent graphs.

The first stage was to identify the many disconnected parts of the graph, termed the

‘initial subgraphs’. The system then created a set of candidate links that might

Figure 7. Example output resulting from the automatic derivation of the centreline.


potentially reconnect these initial subgraphs. The candidates were ranked according

to proximity, flow direction and continuity, and the highest ranked was chosen as

the link between the subgraphs.

6.1 Creating the initial subgraphs

As an initial step, we identified all the distinct subgraphs and stored them as

independent objects, which for the purposes of reconnecting, could themselves be

considered as nodes. In essence, the algorithm that created the candidate links

sought to connect all these nodes. An iterative process was used to build the

subgraphs, by first picking a link which was not part of an already built subgraph,

finding all the connected links, and creating the corresponding subgraph. This

process was repeated until all the links had been processed.
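The iterative construction of the initial subgraphs amounts to finding connected components. A minimal sketch, assuming each link is stored as a pair of node identifiers (the function name and the input layout are illustrative only):

```python
from collections import defaultdict


def initial_subgraphs(links):
    """Group links into connected subgraphs.

    links is a list of (node_a, node_b) pairs; the result is a list of
    subgraphs, each given as a sorted list of link indices.
    """
    links_at_node = defaultdict(list)
    for idx, (a, b) in enumerate(links):
        links_at_node[a].append(idx)
        links_at_node[b].append(idx)

    assigned = set()
    subgraphs = []
    for seed in range(len(links)):
        if seed in assigned:
            continue  # already part of a built subgraph
        assigned.add(seed)
        component, stack = [], [seed]
        while stack:
            idx = stack.pop()
            component.append(idx)
            # Follow both end nodes to every link that shares them.
            for node in links[idx]:
                for other in links_at_node[node]:
                    if other not in assigned:
                        assigned.add(other)
                        stack.append(other)
        subgraphs.append(sorted(component))
    return subgraphs
```

Each returned subgraph can then be treated as a single node when candidate links are generated.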

6.2 Creating the candidate links to connect the subgraphs

When connecting the initial subgraphs, it was assumed that the most likely

candidate was the one that was closest. Therefore, for each subgraph, we searched

among the remainder, for subgraphs (lakes or rivers) falling within a given radius of

that subgraph. A connection was made between the subgraph and its nearest

subgraph. Normally subgraphs are connected via nodes already existing in the other

subgraph. Where the subgraphs are rivers, and the minimum distance between the

two lies between nodes, a new node was added. Figure 9 illustrates why it is

necessary to do this in order to avoid creating unlikely links between two candidate

subgraphs.
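Finding where a link should meet a river, and whether a new node is needed there, can be sketched as a projection of a point onto a polyline. This is an illustrative example in planar coordinates (metres), not the project's code; the function names and the returned tuple layout are assumptions.

```python
import math


def closest_point_on_segment(p, a, b):
    """Return the point of segment a-b closest to p and its parameter t in [0, 1]."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return a, 0.0  # degenerate segment
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # clamp so the point stays on the segment
    return (ax + t * dx, ay + t * dy), t


def closest_point_on_polyline(p, line):
    """Return (point, distance, needs_new_node) for the nearest point of a centreline.

    needs_new_node is True when the nearest point falls strictly between two
    existing vertices, the case in which the text adds a new node.
    """
    best = None
    for a, b in zip(line, line[1:]):
        q, t = closest_point_on_segment(p, a, b)
        d = math.dist(p, q)
        if best is None or d < best[1]:
            best = (q, d, 0.0 < t < 1.0)
    return best
```

For the centreline `[(0, 0), (10, 0), (20, 0)]`, a free end at `(4, 3)` is closest to the interior of the first segment, so a new node would be created there, whereas a free end at `(10, 5)` connects to the existing vertex `(10, 0)`.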

Each new link was labelled with a confidence value, inversely proportional to the

linking distance. In essence, the shorter the distance, the greater the likelihood of it

being a correctly proposed link. The confidence value of a link was computed as

follows:

Figure 8. The graph (comprising dark nodes and straight edges) overlain on the polygonal lakes and river centre line.


\[ \text{Confidence} = 100 \times \frac{\text{Search radius (m)}}{\text{Length of link (m)}} \]

where ‘length of link’ is the length of the proposed link between the two subgraphs,

and the ‘search radius’ is the maximum length of a potential link. By empirical

observation, an effective value for the search radius was found to be 100 m;

reflecting the idea that structures crossing rivers are rarely wider than 100 m. It is

invariably the case that it is only in urban areas that we might find instances where

streams are channelled underground for distances greater than 100 m. As discussed

later, a number of challenges present themselves when interpreting connections in

urban environments. Figure 10 shows an example of the candidate links emanating

from a sample node. The higher the value, the greater the confidence.

The complexity of river networks meant that distance alone was not a sufficient

basis for guaranteeing a correct or realistic link between two subgraphs. Since rivers

flow downhill, local elevation data can be used to refine this process; it is also the

case that rivers tend to curve gently, therefore the angle of curvature can also be

used. These two ideas are now described in more detail.

Figure 9. Illustration of the need to connect rivers using shortest distance intersection rather than node to node connections.

Figure 10. Example of candidate links. The likelihood of correctness value (as a percentage) has been placed alongside each potential link.


6.3 Utilising the talweg to select likely candidate links

A candidate link is likely to follow a descending line of slope in the valley containing

the two relevant river subgraphs. Such information, derived from a digital terrain

model (DTM) can thus be used to inform on the likelihood of two subgraphs being

connected. Various authors have highlighted the potential for using DTMs in

studies of terrain characterisation, and delineation of drainage networks and

watershed boundaries (Haggett and Chorley 1968, Jenson 1991, Lee and Snyder

1991, Hogg et al. 1993, McCormack et al. 1993, Martinez-Casasnovas and Stuvier

1998, Wood 1998). DTMs have been used as a basis for modelling surface water

hydrology, deriving flood characteristics, channel gradients, and predicting stream

discharge. The quality of the solution is dependent upon the precision of the DTM

and morphological complexity of the terrain. We used the term ‘talweg’ to describe

the set of points defining the lowest points along the valley floor. In essence, if a link

was found to follow the line of the talweg, among a set of candidate links, it is more

likely to be the correct one. The talweg was calculated using a flow accumulation

model derived using OS’s Profile DTM data set—detailed height data defining the

physical shape of the landscape of Great Britain. The source comes from contours

surveyed at 1:10 000 scale and the height accuracy of the DTM is ±5.0 m in

mountain and moorland areas, and ±2.5 m in other areas. The DTM was first used

to calculate a flow direction for each cell. Each cell had a flow accumulation value—

an integer number that represented the number of upstream DTM cells whose flow

paths ‘passed through’ that given DTM cell (Mark 1984, McCormack et al. 1993).
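The flow direction and flow accumulation steps can be illustrated on a small grid. The sketch below assumes a steepest-descent rule towards one of the eight neighbouring cells (often called D8); the text does not state which flow-routing rule was used, so this choice, the function names and the accumulation threshold used to pick talweg cells are all assumptions.

```python
def downslope_neighbour(dem, r, c):
    """Return the neighbour with the steepest descent from (r, c), or None."""
    rows, cols = len(dem), len(dem[0])
    best_drop, target = 0.0, None
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr, dc) == (0, 0) or not (0 <= rr < rows and 0 <= cc < cols):
                continue
            distance = (dr * dr + dc * dc) ** 0.5  # diagonal steps are longer
            drop = (dem[r][c] - dem[rr][cc]) / distance
            if drop > best_drop:
                best_drop, target = drop, (rr, cc)
    return target


def flow_accumulation(dem):
    """Count, for every cell, the upstream cells whose flow path passes through it."""
    rows, cols = len(dem), len(dem[0])
    acc = [[0] * cols for _ in range(rows)]
    # Visit cells from highest to lowest: flow only goes strictly downhill,
    # so a cell's count is complete before it is passed downstream.
    order = sorted(((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
                   reverse=True)
    for _, r, c in order:
        target = downslope_neighbour(dem, r, c)
        if target is not None:
            acc[target[0]][target[1]] += acc[r][c] + 1
    return acc


def talweg_cells(acc, threshold):
    """Cells with at least `threshold` upstream cells (hypothetical cut-off)."""
    return [(r, c) for r, row in enumerate(acc)
            for c, value in enumerate(row) if value >= threshold]
```

On the 3 by 3 grid `[[9, 8, 9], [8, 5, 8], [7, 3, 7]]` all flow converges on the bottom-centre cell, which therefore accumulates the eight other cells.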

Once the talweg network had been created, it was split into branches (where a

branch was defined as a polyline with no junction). For each branch, the algorithm

identified all sections of the river network that were in close proximity to that talweg

branch. This was done by buffering the talweg, and finding all river sections that fell

within the buffer. From these sections we identified the subgraphs that needed to be

connected in order to preserve the match between the river network and the talweg

network.

Using the test data set, it was determined from empirical observations that a

buffer of radius 100 m was sufficient to ‘capture’ the river subgraphs. Figure 11

shows a sample of the hydrographic network from the trial data set, in which the

talweg network is superimposed (the angular thicker line) together with the

associated buffer. The centreline of the underlying rivers is also presented,

illustrating how the talweg was used to ‘capture’ those graphs that crossed the

buffer zone.

For each talweg branch in turn, we examined the subgraphs falling within the

buffer, identifying all the potential links that connect two of these subgraphs, and

assigned them a ‘talweg confidence value’. The talweg confidence value expressed

the degree of fit between the connection provided by the link and that of the talweg

network. It was calculated as the average distance between the proposed link and the

talweg line (as illustrated in the two separate examples shown in figure 12). The

lower the average, the better the fit. Thus for the two different examples in figure 12,

a higher confidence would be attached to (a) than (b).

The confidence value associated with each candidate link which lay within the

talweg buffer was related to the ratio between the average distance from the link to

the talweg and the radius of 100 m. Thus the confidence value varied from 0 for a

link which was outside the talweg network influence, to 100 for a link which was

precisely coincident with a section of talweg. The following formula was used to


compute the talweg confidence value.

\[ \text{Talweg Confidence} = 100 \times \left(1 - \frac{\text{Average distance}}{\text{radius}}\right) \]
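A minimal Python sketch of the talweg confidence follows. It assumes that the average distance is estimated by sampling points at regular intervals along the straight candidate link and measuring each one's distance to the talweg polyline; the sampling scheme, the clamping to zero beyond the buffer radius, and the function names are assumptions rather than details given in the text.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def average_distance(link, talweg, samples=11):
    """Mean distance from evenly spaced points on the link to the talweg polyline."""
    (x0, y0), (x1, y1) = link
    total = 0.0
    for i in range(samples):
        f = i / (samples - 1)
        p = (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
        total += min(point_segment_distance(p, a, b)
                     for a, b in zip(talweg, talweg[1:]))
    return total / samples


def talweg_confidence(link, talweg, radius=100.0):
    """100 * (1 - average distance / radius), clamped at 0 outside the buffer."""
    ratio = average_distance(link, talweg) / radius
    return max(0.0, 100.0 * (1.0 - ratio))
```

With a talweg running along `[(0, 0), (100, 0)]`, a link lying on the talweg scores 100, a parallel link 25 m away scores 75, and a link 100 m or more away scores 0.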

6.4 The degree of continuity

Continuity relates to one of the principal criteria (often referred to as ‘good

continuation’) that influences the human perception of groups of objects

(Thorisson 1994). It is especially well suited to the analysis of linear features, and

is something that cartographers take advantage of, for example in allowing text

placements to ‘break’ across linear features. It has proved very valuable to the

process of street prioritisation as part of the process of street network generalisation

(Thomson and Richardson 1999). For this research we computed a continuity value

for each candidate link depending on the degree to which a link ran parallel to the

sections of graph either side of the link. In effect, this value was the combination of

two continuity components, one at each end of the link. Figure 13 illustrates four

different cases, where the continuity value associated with the candidate link (thin

line) decreases from left to right.

Figure 11. The talweg is overlaid and buffered as a basis for ‘capturing’ the underlying subgraphs.

Figure 12. Examples illustrating the calculation of the goodness of fit between the talweg and river centreline in order to calculate the confidence of them being one and the same.


The need for this additional criterion arose from observations that in several places

candidate links were proposed by the system, whereas visually we would have

chosen a different one. In figure 14 we show one such example, where in the absence

of taking into account the continuity, the system created a less likely link; the correct

answer is shown dashed.

As this example shows, the more likely solution is driven by the continuity of the

lines either end of the proposed link—in which we minimise the deviation of the

angle between the link and the subgraph river. The continuity value was taken as the

average of the continuity values associated with each end of the link. At each end of

the link, the value related to the difference of orientation between the link and the

local orientation of the river at the node where the link ends. When the end of a link

does not meet a river ‘end on’ (for example as in figure 9), then the weight on this

side of the link is set to 0. This is done to give preference to links that connect to

rivers end on. The issue then becomes how the angle between the link and the end

section of the river subgraph is calculated. A simple solution would be to measure

the angle between the link and the last edge of the polyline defining the centreline of

the river. This was found to be not always reliable since the skeletonisation process

sometimes created an edge that did not reflect the perceived centreline of the river.

For this reason a second angle was recorded, between the link and the edge

associated with a distance 5 m from the end of the centreline. By comparing the two

angles we can determine if the angle reflects the approximate centreline or not. We

compute the angular deviation between the link orientation and the two local

orientations of the river, and keep only the smaller of the two deviations.
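The two-angle test described above can be sketched in Python as follows. The sketch assumes that the river centreline is supplied as a list of (x, y) vertices in metres, ordered from the node where the link ends outwards along the river, and that the link is described by the position of its far end. All function names and the `lookback` parameter are hypothetical.

```python
import math

Point = tuple[float, float]

def point_along(polyline: list[Point], distance: float) -> Point:
    """Point at a curvilinear distance from polyline[0], measured along the line."""
    remaining = distance
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg > 0 and seg >= remaining:
            t = remaining / seg
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        remaining -= seg
    return polyline[-1]  # line shorter than `distance`: fall back to its far end

def angle_between(u: Point, v: Point) -> float:
    """Unsigned angle in [0, pi] between two direction vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return abs(math.atan2(cross, dot))

def end_deviation(river_from_end: list[Point], link_far_end: Point,
                  lookback: float = 5.0) -> float:
    """Smaller of the two angular deviations at one end of a candidate link.

    The first reference point is the next vertex (last edge of the centreline);
    the second lies `lookback` metres along the centreline from the end node.
    """
    end = river_from_end[0]
    link_dir = (link_far_end[0] - end[0], link_far_end[1] - end[1])
    deviations = []
    for ref in (river_from_end[1], point_along(river_from_end, lookback)):
        # Direction in which the river arrives at the end node.
        river_dir = (end[0] - ref[0], end[1] - ref[1])
        deviations.append(angle_between(river_dir, link_dir))
    return min(deviations)
```

With a centreline whose last edge has been skewed by skeletonisation, the point taken 5 m back along the line usually gives the smaller deviation, which is the value retained.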

Figure 13. Qualitative views on the likelihood of a connection based on shared orientation between the subgraphs and the link.

Figure 14. An incorrect solution (dark line) illustrating the need for some sort of continuity measure (correct answer shown dashed).


Figure 15 illustrates how the difference of orientation was computed for one end

of the link, I (shown as the thin straight edge). The first angle, α, is the angle between

the link and the last edge of the centreline. The second angle to be computed was the

angle between the link and the edge IA, where A is situated at the curvilinear

distance of 5 m from I along the river line. This resulted in the angle difference β.

The chosen value would be α since it is less than β.

From the difference of the angle between the link and the river polyline, we

compute a continuity value that lies between 0 and 100, using the formula:

\[ \text{continuity}_I = 100 \times \frac{(\pi/2) - \alpha}{\pi/2} \]

Where α > π/2, the continuity value will be negative, so for the purposes of this task,

in such cases the value was set to 0. The final continuity value for the link is the

average of the continuity values at its two ends.
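As a worked illustration, the continuity calculation might be written as below. The names are hypothetical, angles are assumed to be in radians, and passing `None` for an end that does not meet a river 'end on' (so that this end contributes 0) is one reading of the rule given earlier rather than a confirmed detail of the implementation.

```python
import math
from typing import Optional

def end_continuity(alpha: float) -> float:
    """100 * ((pi/2) - alpha) / (pi/2), with negative results set to 0."""
    half_pi = math.pi / 2
    return max(0.0, 100.0 * (half_pi - alpha) / half_pi)

def link_continuity(alpha_start: Optional[float], alpha_end: Optional[float]) -> float:
    """Average of the continuity values at the two ends of a link.

    None marks an end that does not meet a river 'end on'; in this sketch
    such an end is assumed to contribute a value of 0.
    """
    values = [0.0 if a is None else end_continuity(a) for a in (alpha_start, alpha_end)]
    return sum(values) / 2.0
```

Under these assumptions a deviation of π/4 at one end gives a value of about 50 for that end, and a link that is perfectly aligned at one end but meets the side of a river at the other scores 50 overall.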

6.5 Selecting the candidates to reconnect the subgraphs

The selection of the link among candidate links was done in two stages. The first

stage was to consider the talweg, and to connect the subgraphs using only those

links that lay within the talweg buffer (i.e. those having a talweg confidence value

greater than 0). If this did not create a link, candidate links outside the talweg buffer

were considered. During both stages, links were built up using a minimum spanning

tree construction method (Kruskal 1956, Prim 1957, Dijkstra 1960). The principle

was to start from a node (representing an entire subgraph), and select the edge

having the highest confidence connected to that node (the edge being a candidate

link). The two nodes connected by the chosen edge formed a cluster, which was

further expanded by selecting an edge not currently part of that cluster and with the

next highest confidence value. This process was repeated until all the nodes were

part of the same cluster. The value used to select the candidate links was computed

using the following formula:

If the talweg_confidence is greater than 0 (i.e. the link lies within the talweg buffer),

then

Link_confidence = (talweg_confidence + continuity_confidence + 2*distance_confidence)/4

Else (i.e. the talweg is not able to help in the calculation of the confidence value)

Link_confidence = (continuity_confidence + 3*distance_confidence)/4

The weightings (multipliers of 2 and 3) are there to give greater importance to

short links (the shorter the link, the greater the likelihood that the candidate link

was correct) than to the fact that the link and the river share a similar angle of

continuity. The continuity value was used to differentiate between two links having

similar distance_confidence (we would not want its influence to be greater than the

talweg confidence).

Figure 15. Calculations of relative orientation and their tolerances.
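The selection step can be sketched in Python as follows. This is a simplified illustration, not the authors' code: it assumes each subgraph is represented by a hashable label, that candidate links arrive as (subgraph_a, subgraph_b, confidence) tuples, and that distance_confidence has already been computed as described earlier in the paper. It grows a single cluster in the manner of Prim's method, always taking the highest-confidence link that leaves the cluster, and covers one stage only; the two-stage use of the talweg buffer would call it once per stage.

```python
import heapq

def link_confidence(continuity: float, distance: float, talweg: float = 0.0) -> float:
    """Weighted combination from the text; all inputs are on a 0 to 100 scale."""
    if talweg > 0:  # link lies within the talweg buffer
        return (talweg + continuity + 2 * distance) / 4
    return (continuity + 3 * distance) / 4

def grow_tree(nodes, links, start):
    """Prim-style growth of a maximum-confidence spanning tree.

    nodes: iterable of subgraph labels; links: (a, b, confidence) tuples.
    Returns the chosen links in the order they were added.
    """
    adjacency = {n: [] for n in nodes}
    for a, b, conf in links:
        adjacency[a].append((conf, a, b))
        adjacency[b].append((conf, b, a))
    in_cluster = {start}
    # heapq is a min-heap, so confidences are negated to pop the highest first.
    heap = [(-conf, a, b) for conf, a, b in adjacency[start]]
    heapq.heapify(heap)
    chosen = []
    while heap and len(in_cluster) < len(adjacency):
        neg_conf, a, b = heapq.heappop(heap)
        if b in in_cluster:
            continue  # both ends already connected: skip
        in_cluster.add(b)
        chosen.append((a, b, -neg_conf))
        for conf, u, v in adjacency[b]:
            if v not in in_cluster:
                heapq.heappush(heap, (-conf, u, v))
    return chosen
```

Because distance_confidence carries a weight of 2 or 3 out of 4, a short link with modest continuity will generally outrank a long, well-aligned one in this scheme.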

7. Results and analysis

Figure 16 shows a sample of the results obtained using the test data set (around the

small village of Belmont, midway between Manchester and Blackburn, England).

The links generated by this system (together with their confidence values) can be

seen as thin straight lines. Some links seem visually correct, others seem suspect,

while some appear to be missing (perhaps because of the size of the gap or relative

continuity). A low confidence value reflects high uncertainty, and vice versa.

Figure 16. Sample data showing straight-line connections (labelled with their associated likelihood value) between the river subgraphs. Latitude and longitude for a bounding box, bottom left and top right: N: 53°37′54″, W: 2°30′7″ and N: 53°38′26″, W: 2°28′57″.

This output was assessed in a number of ways: by viewing MasterMap data, by

inspection of aerial photography and by ground survey. It was possible to infer the

correctness of results in rural areas using the aerial photography (although, of

course, the photography itself captures features overlying the network, not the

network alone). To that extent, aerial photography was only marginally better than

making inference from inspection of the MasterMap data. In urban areas, neither

source was a sufficient yardstick, and ground-truthing was undertaken via field-based

inspections. Even field-based inspection could not verify every instance. In the

case of long-distance urban conduits, it was determined that specialist skills and

equipment would be needed to determine the exact hydrological connectivity and

flow. Each link was marked as either being correct, or of Type I (the link should have

been made but was not), or Type II (a link was made where no real world

connection exists). Overall 66% of the inferred links were found to be correct.

Figure 17 summarises the distribution of correct and incorrect links according to the

assigned confidence value. The y axis is the percentage of links, and the x axis is the

range of confidence values. From the graph it can be seen that all those links with a

calculated confidence value of 80% or greater were indeed found to be correct. All

those with a confidence value of 50% or less were found to be incorrect. Links with

confidence values between 50% and 80% were mixed in their correctness. Therefore,

from a human–computer interaction point of view, it would be appropriate to

highlight and field check all links with confidence values of between 50% and 80%.

Thus the confidence value is a simple mechanism for guiding users only to those

links likely to be wrong, rather than leaving the user to guess which links should be

inspected.
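The triage rule implied by these results could be expressed as a small function. The thresholds of 50 and 80 come from the single test data set reported here; the function name and the three labels are hypothetical, and the thresholds would need re-checking on other data sets.

```python
def triage(confidence: float, low: float = 50.0, high: float = 80.0) -> str:
    """Classify a link by its confidence value (0 to 100).

    In the test data set, links at or above `high` were all correct and
    links at or below `low` were all incorrect; those in between were mixed.
    """
    if confidence >= high:
        return "accept"
    if confidence <= low:
        return "reject"
    return "field-check"
```

Only the links labelled "field-check" would then be presented to an operator for inspection.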

Although the test has been conducted on a single data set covering approximately

65 km², it is envisaged that the same kind of results would be obtained on different

data sets. The only restriction is on urban data sets, where large portions of the

hydrographic network run underground. Experimenting on different data sets (such

as the Quantock Hills or the Somerset levels) would allow us to tune the model

according to regional morphologies, and urban/rural regions. The model is only ever

intended to connect surface networks and would not be effective in limestone

regions, for example. Just as others have noted (McCormack et al. 1993), urban

areas remain problematic. The algorithm provides a good solution in rural areas,

allowing human resources to be focused on the urban.

Figure 17. Histogram showing the percentage frequency of correct and incorrect links and their associated final probability intervals.

7.1 Further work

The process of triangulation (as part of the skeletonisation process) was found to be

extremely computationally intensive. Switching to a proper constrained triangulation

(Chew 1989) instead of simulating it by resampling the initial geometries would

dramatically improve the performance and robustness of this component of the

solution. Additionally, the confidence value could be further refined by examining

those geographic features associated with the link. For example, information of a

bridge or road located close to the link could be used to reinforce the likelihood of

the link being correct. Conversely if the link crossed open land, then the link

probably did not correspond to a ‘hidden’ water feature, because there was nothing

to hide it. This approach might help in karst scenery, where rivers that ‘disappear’

and re-emerge reflect a reality rather than indicate an obscuring built structure. We

have not carried out research using this refinement, because in the version of

MasterMap used in this project, the classification of such objects was not sufficiently

explicit to allow this line of enquiry.

It was intended that a generic set of parameters would be devised, such that the

algorithm could be applied to any landscape. But there is enormous variability in

geology (limestone, granite, etc.), in landscape (complex mountain chains, karst

scenery, flat deltaic structures, etc.), and in modification of river networks from

human activities (canalisation, river straightening, creation of drainage networks in

low-lying areas). Furthermore, the size of hydrological features varies enormously

from narrow headstreams to large meandering estuarine structures. Consequently, it

was necessary to parameterise the algorithm by region type. Further work is

therefore required to define a minimum set of landscape types (and associated

parameter settings) that would secure the greatest percentage of correct solutions.

The algorithm assumed that the river network was acyclic. However, anthropogenic

drainage structures (such as are found in the Fens of Norfolk) may indeed

be cyclic. In these cases the algorithm would stop connecting rivers once an acyclic

graph was formed—thus sometimes falling short of a complete solution. Any

solution is further complicated by the flatness of such regions. These would be cases

that would require much greater ground-truthing.
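The effect of the acyclic assumption can be shown with a small Python sketch. It uses a Kruskal-style selection with a union-find structure, which is a stand-in chosen here for brevity rather than the construction used in the project; the names and the (subgraph_a, subgraph_b, confidence) tuple layout are hypothetical. Any link that would close a cycle is rejected, even if it corresponds to a real channel.

```python
def select_acyclic(links):
    """Accept links by descending confidence unless they would close a cycle.

    links: iterable of (a, b, confidence) tuples.
    Returns (kept, dropped); `dropped` holds the cycle-closing links.
    """
    parent = {}

    def find(x):
        # Union-find root lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept, dropped = [], []
    for a, b, conf in sorted(links, key=lambda link: link[2], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            dropped.append((a, b, conf))  # both ends already connected
        else:
            parent[root_a] = root_b
            kept.append((a, b, conf))
    return kept, dropped
```

For three drainage sections joined in a ring, only two of the three links are kept, however high the confidence of the third; in a genuinely cyclic network such a link would have to be restored by hand.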

8. Conclusion

This research sought to develop a geographic model of the river network and

associated water features within an OS MasterMap data set. The input data set

reflected a cartographic perspective—whereby water features were a mix of polygon

and linear geometries, ‘broken’ by other features such as roads that crossed them.

Although OS MasterMap supports attribution of different water features, in reality

the input data set was attributed in a very simple manner (labelled simply ‘inland

water’), and was made up of unordered, miscellaneous, disjoint collections of

polygons and linear features. This required a lot of data cleaning, and development

of automated techniques to both detect and form meaningful geographical entities

(rivers and lakes). The methodology relied on aggregation of adjacent area features

and their subsequent reclassification into two classes, rivers and lakes, based upon

shape analysis. A centreline representation of rivers was an essential pre-process

stage prior to connecting the river sections. The process of reconnection was based

on three criteria: degree of continuity, the proximity, and the fit with the line of

slope (based on a DTM). The region to which the algorithm was applied was

chosen deliberately because of the variability in topography, link lengths, and mix of

rural and urban structures. The Research and Innovation group within Ordnance

Survey is exploring the idea of including a topologically connected hydrology

network similar in detail and resolution to the ITN Transportation theme. The

results presented here indicate that it is feasible to partially automate the process by

which a large-scale, detailed, integrated river network could be included to give

added value to OS MasterMap. This would be of specific interest to environmental


engineers, and those interested in the analysis of various aspects of river networks,

including their visualisation and generalisation.

Acknowledgements

The authors are very grateful for funding of this research by the Ordnance Survey,

and the constructive and helpful comments of the reviewers.

References

BALLARD, D. and BROWN, C., 1982, Computer Vision, Chapter 8 (Englewood Cliffs, NJ:

Prentice-Hall).

CHEW, L.P., 1989, Constrained Delaunay triangulation. Algorithmica, 4, pp. 97–108.

COSTA, L. DA F., 2000, Robust skeletonization through exact Euclidean distance transform

and its application to neuromorphometry. Journal of Real-Time Imaging, 6, pp.

415–431.

DAVIES, E., 1990, Machine Vision: Theory, Algorithms and Practicalities, pp. 149–161

(London: Academic Press).

DIJKSTRA, E.W., 1960, Some theorems on spanning subtrees of a graph. Indag. Math, 22, pp.

196–199.

HAGGETT, P. and CHORLEY, R.J., 1968, Network Analysis in Geography (New York: St

Martins Press).

HARTSFIELD, N. and RINGEL, G., 1990, Pearls in Graph Theory—A Comprehensive

Introduction (Boston: Academic Press Inc.).

HOGG, J., MCCORMACK, J.E., ROBERTS, S.A., GAHEGAN, M.N. and HOYLE, B.S., 1993,

Automated derivation of stream channel networks and related catchment characteristics

from digital elevation models. In Geographical Information Handling—Research

and Applications, edited by P. Mather, pp. 207–235 (London: Wiley).

HORTON, R.E., 1945, Erosional development of streams and their drainage basins;

hydrophysical approach to quantitative morphology. Geological Society of America

Bulletin, 56, pp. 275–370.

JENSON, S.K., 1991, Applications of hydrologic information automatically extracted from

digital elevation models. Hydrological Processes, 5, pp. 31–41.

KRUSKAL, J.B., 1956, On the shortest spanning subtree of a graph and the traveling salesman

problem. Proceedings of the American Mathematical Society, 7, pp. 48–50.

LAKE, R., BURGGRAF, D.S., TRININIC, M. and RAE, L., 2004, GML—Geography Mark-Up

Language (Chichester: Wiley).

LEE, J. and SNYDER, P.K., 1991, Modelling spatial patterns of digital elevation errors for

drainage network analysis. In Proceedings of GIS/LIS ’91, 28 October–1 November

1991, 1, pp. 71–79 (Maryland, USA: American Society for Photogrammetry and

Remote Sensing).

MACKANESS, W.A. and BEARD, M.K., 1993, Use of graph theory to support map

generalization. Cartography and Geographic Information Systems, 20, pp. 210–221.

MARK, D., 1984, Automated detection of drainage networks from digital elevation models.

Cartographica, 21, pp. 168–178.

MARTINEZ-CASASNOVAS, J.A. and STUVIER, H.J., 1998, Automated delineation of drainage

networks and elementary catchments from digital elevation models. ITC Journal, 3,

pp. 198–208.

MCALLISTER, M. and SNOEYINK, J., 2000, Medial axis generalisation of river networks.

Cartography and Geographic Information Science, 27, pp. 129–138.

MCCORMACK, J.E., GAHEGAN, M.N., ROBERTS, S.A., HOGG, J. and HOYLE, B.S., 1993,

Feature-based derivation of drainage networks. International Journal of Geographical

Information Systems, 7, pp. 263–280.

OS 2004 OS MasterMap Userguide Product Specification v5.2. Available at: www.ordnance-

survey.co.uk/oswebsite/products/osmastermap/pdf/userguidepart1.pdf


PRIM, R., 1957, Shortest connection networks and some generalizations. Bell System

Technical Journal, 36, pp. 1389–1401.

STRAHLER, A.N., 1964, Quantitative geomorphology of drainage basins and channel

networks. In Handbook of Applied Hydrology, edited by V.T. Chow, pp. 439–476

(New York: McGraw-Hill).

THOMSON, R.C. and RICHARDSON, D.E., 1999, The ‘good continuation’ principle of

perceptual organisation applied to the generalisation of road networks. In

Proceedings of the 19th International Cartographic Conference, 14–21 August 1999,

Ottawa, pp. 1215–1223 (British Columbia: University of Victoria).

THORISSON, K.R., 1994, Simulated Perceptual Grouping: An application to the Human

Computer Interaction. Proceedings of the 16th Annual Conference of the Cognitive

Science Society. 13–16 August 1994, Atlanta, Georgia, edited by A. Ram and K.

Eiselt, pp. 876–81 (Mahwah, NJ: Lawrence Erlbaum Associates).

TSAI, V., 1993, Fast topological construction of Delaunay triangulation and Voronoi

diagrams. Computers and Geosciences, 19, pp. 1463–1474.

UNWIN, D., 1981, Introductory Spatial Analysis (London: Methuen).

WOLF, G.W., 1992, Hydrologic applications of weighted surface networks. In Proceedings of

the 5th International Symposium on Spatial Data Handling, edited by E. Corwin, D.

Cowen and P. Bresnahan, 23–25 August 1992, pp. 567–579 (Charleston: IGU

Commission of GIS).

WOOD, J., 1998, Modelling the continuity of surface form using digital elevation models. In

Proceedings of the 8th International Symposium on Spatial Data Handling, edited by T.

Poiker and N. Chrisman, 11–15 July 1998, pp. 725–736 (New York: IGU Commission

of GIS).

