5. Data Model and Data Structures

8/19/2019

Introduction

Since DBMSs have long been used to store one-dimensional data such as integers, real numbers and strings, considerable interest has developed in using DBMSs to store spatial data as well. It has been observed, however, that ordinary DBMSs do not handle spatial data such as boxes, polygons, or even points in a multidimensional space efficiently.

Spatial and non-spatial data

Spatial data refers to data or information that describes the absolute or relative location of geographic features on the earth. Non-spatial data, or attribute data, on the other hand, describes the characteristics of the spatial features. These characteristics can be quantitative or qualitative.

Representation of Space

Burrough & McDonnell (1998) described two ways to represent space (an area, a landscape or some larger unit):

a. Discrete Entities: Space can be seen as occupied by entities that are described by their properties and can be located on the earth using coordinate systems. The entities have clear boundaries. Buildings, roads and land parcels are examples of discrete entities.

b. Continuous fields: The variation of an attribute over space is viewed as a continuous field. No physical boundary can be observed in this case. Temperature, pressure and elevation across an area are examples of continuous fields.

GIS Data Models

Data models are conceptual models of the real world. They describe how geographic data are represented and stored. The data models used in GIS are described below:



a. Vector Data Model

The vector data model is closely linked with the discrete object view. In the vector data model, geographical phenomena are represented in three different forms: point, line and polygon. The shape of a spatial entity is stored using a two-dimensional (x, y) coordinate system.

Point : A location depicted by a single set of (x, y) coordinates at the scale of abstraction.

Wells in a village, electricity poles in a town and cities on a world map are examples of spatial features represented by points.

Note: A city can be marked as a single point on a world map but would be marked as a polygon on a state map. The scale plays an important role in deciding the geometry of a geographical feature.

Line/Arc : An ordered set of (x, y) coordinate pairs arranged to form a linear feature. Curves in a linear feature are generated by increasing the density of points/vertices.

Roads, railways and telephone cables are examples of spatial features represented by lines.

Polygon : The set of (x, y) coordinate pairs enclosing a homogeneous area.

Land parcels, agricultural farms and water bodies are examples of spatial features represented by polygons.
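To make the three geometries concrete, here is a minimal Python sketch, assuming plain tuples for (x, y) coordinates. The class names and the `length`/`area` helpers are hypothetical illustrations, not the API of any particular GIS; the polygon area uses the standard shoelace formula.

```python
from dataclasses import dataclass
from math import hypot

Coord = tuple[float, float]  # one (x, y) coordinate pair


@dataclass
class Point:
    xy: Coord  # a single coordinate pair


@dataclass
class Line:
    vertices: list[Coord]  # ordered (x, y) pairs; denser vertices give smoother curves

    def length(self) -> float:
        # Sum of the straight segments between consecutive vertices
        return sum(hypot(x2 - x1, y2 - y1)
                   for (x1, y1), (x2, y2) in zip(self.vertices, self.vertices[1:]))


@dataclass
class Polygon:
    ring: list[Coord]  # boundary vertices; the last vertex implicitly joins the first

    def area(self) -> float:
        # Shoelace formula over the closed ring
        n = len(self.ring)
        total = 0.0
        for i in range(n):
            x1, y1 = self.ring[i]
            x2, y2 = self.ring[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0


well = Point((3.0, 4.0))
road = Line([(0, 0), (3, 4), (6, 4)])
parcel = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])
print(road.length(), parcel.area())  # 8.0 12.0
```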



b. Raster Data Model

The raster data model is commonly associated with the field conceptual model. Here, geographic space is represented by an array of cells or pixels (picture elements) arranged in rows and columns. Each pixel has a value that represents information. The value can be an integer, a floating-point number or an alphanumeric code.

A point can be represented by a single pixel in the raster model. A line is a chain of spatially connected cells with the same value. Similarly, a water body in raster data is represented as a set of contiguous pixels having the same value, representing a homogeneous area.
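A raster can be sketched just as simply as a grid of cell values. In this hypothetical 6×6 example, 0 means "no feature" and the codes 1, 2 and 3 are invented for a well, a road and a lake: the well occupies one cell, the road a chain of connected cells and the lake a contiguous block of cells sharing one value.

```python
ROWS, COLS = 6, 6
EMPTY, WELL, ROAD, LAKE = 0, 1, 2, 3  # hypothetical class codes

grid = [[EMPTY] * COLS for _ in range(ROWS)]

grid[0][5] = WELL          # point: a single cell
for c in range(COLS):      # line: a chain of connected cells with one value
    grid[2][c] = ROAD
for r in range(3, 6):      # polygon: contiguous cells with one value
    for c in range(0, 3):
        grid[r][c] = LAKE


def cells_with(value):
    """Return the (row, col) position of every cell holding `value`."""
    return [(r, c) for r in range(ROWS) for c in range(COLS) if grid[r][c] == value]


for row in grid:
    print(" ".join(str(v) for v in row))
print(len(cells_with(LAKE)))  # 9
```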



Vector data structure

Geographic entities encoded using the vector data model are often called features. Features can be divided into two classes:

a. Simple features: These are easy to create and store, and are rendered on screen very quickly. They lack connectivity relationships and so are inefficient for modeling phenomena conceptualized as fields.

b. Topological features: A topology is a mathematical procedure that describes how features are spatially related and ensures the data quality of the spatial relationships. Topological relationships include the following three basic elements:

I. Connectivity: Information about linkages among spatial objects

II. Contiguity: Information about neighboring spatial objects

III. Containment: Information about inclusion of one spatial object within another spatial object

Connectivity

Arc-node topology defines connectivity: arcs are connected to each other if they share a common node. This is the basis for many network tracing and path finding operations.

Arcs represent linear features and the borders of area features. Every arc has a from-node, which is the first vertex in the arc, and a to-node, which is the last vertex. These two nodes define the direction of the arc. Nodes indicate the endpoints and intersections of arcs. They do not exist independently and therefore cannot be added or deleted except by adding and deleting arcs.



Figure 3: Arc-node Topology

Nodes can, however, be used to represent point features that connect segments of a linear feature (e.g., intersections connecting street segments, valves connecting pipe segments).

Figure 4: Node showing intersection

Arc-node topology is supported through an arc-node list. For each arc in the list there is a from-node and a to-node. Connected arcs are determined by common node numbers.



Figure 5: Arc-Node Topology with list
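An arc-node list can be sketched as a mapping from arc number to its from-node and to-node. The arc and node numbers below are hypothetical, not those of Figure 5; connected arcs are found as the text describes, by looking for shared node numbers.

```python
from collections import defaultdict

# Hypothetical arc-node list: arc id -> (from-node, to-node)
arcs = {
    1: (10, 11),
    2: (11, 12),
    3: (11, 13),
    4: (13, 14),
}

# Index every arc under both of its end nodes
arcs_at_node = defaultdict(set)
for arc_id, (from_node, to_node) in arcs.items():
    arcs_at_node[from_node].add(arc_id)
    arcs_at_node[to_node].add(arc_id)


def connected_arcs(arc_id):
    """Arcs that share the from-node or the to-node of `arc_id`."""
    from_node, to_node = arcs[arc_id]
    return (arcs_at_node[from_node] | arcs_at_node[to_node]) - {arc_id}


print(sorted(connected_arcs(3)))  # [1, 2, 4]
```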

Contiguity

Polygon topology defines contiguity. Polygons are said to be contiguous if they share a common arc. Contiguity allows the vector data model to determine adjacency.



Figure 6: Polygon Topology

The from-node and to-node of an arc indicate its direction, which helps determine the polygons on its left and right sides.

Left-right topology refers to the polygons on the left and right sides of an arc. In the illustration above, polygon B is on the left and polygon C is on the right of arc 4.

Polygon A is outside the boundary of the area covered by polygons B, C and D. It is called the external or universe polygon, and represents the world outside the study area. The universe polygon ensures that each arc always has a left and a right side defined.
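Left-right topology can be stored as a table giving, for each arc, the polygon on its left and the polygon on its right. The arc numbers below are hypothetical and only loosely follow Figure 6 (arc 4 has B on its left and C on its right, and A is the universe polygon). Two polygons are contiguous when some arc has one of them on each side.

```python
# Hypothetical left-right table: arc id -> (left polygon, right polygon)
# "A" is the external (universe) polygon.
left_right = {
    1: ("A", "B"),
    2: ("A", "C"),
    3: ("A", "D"),
    4: ("B", "C"),
    5: ("C", "D"),
}


def neighbors(polygon, include_universe=False):
    """Polygons that share at least one arc with `polygon`."""
    result = set()
    for left, right in left_right.values():
        if left == polygon:
            result.add(right)
        elif right == polygon:
            result.add(left)
    if not include_universe:
        result.discard("A")  # usually the outside world is not a real neighbor
    return result


print(sorted(neighbors("C")))  # ['B', 'D']
```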

Containment

Geographic features cover distinguishable areas on the surface of the earth. An area is represented by one or more boundaries defining a polygon. Polygons can be simple, or they can be complex with a hole or island in the middle. In the illustration given below, assume a lake with an island in the middle. The lake actually has two boundaries: one defines its outer edge and the other (the island) defines its inner edge. An island defines the inner boundary of a polygon. Polygon D is made up of arcs 5, 6 and 7. The 0 before the 7 indicates that arc 7 creates an island in the polygon.

Figure 7: Polygon-arc topology

Polygons are represented as an ordered list of arcs rather than in terms of X, Y coordinates. This is called polygon-arc topology. Since arcs define the boundaries of polygons, arc coordinates are stored only once, thereby reducing the amount of data and ensuring that the boundaries of adjacent polygons do not overlap.
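A polygon-arc list, including the 0 that marks an island, can be turned back into boundary rings as sketched below. The coordinates are hypothetical, and the list for polygon D follows the text's description (arcs 5 and 6 form the outer boundary, arc 7 the island), although the exact ordering in Figure 7 may differ. Because each arc's coordinates are stored once and looked up by number, the island's area can simply be subtracted from the outer ring's area.

```python
# Each arc's coordinates are stored exactly once (hypothetical values).
# Assumes each arc starts where the previous one in the list ends;
# real data may require some arcs to be reversed first.
arc_coords = {
    5: [(0, 0), (10, 0), (10, 10)],
    6: [(10, 10), (0, 10), (0, 0)],
    7: [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)],  # closed loop: the island
}

# Polygon-arc list: a 0 separates the outer boundary from an island boundary.
polygon_arcs = {"D": [5, 6, 0, 7]}


def rings(arc_list):
    """Split an arc list at 0 separators and join each group of arcs into one ring."""
    groups, current = [], []
    for arc_id in arc_list + [0]:  # trailing 0 closes the last group
        if arc_id == 0:
            if current:
                groups.append(current)
            current = []
        else:
            current.append(arc_id)
    result = []
    for group in groups:
        ring = []
        for arc_id in group:
            pts = arc_coords[arc_id]
            ring.extend(pts if not ring else pts[1:])  # skip the node shared with the previous arc
        result.append(ring)
    return result


def shoelace(ring):
    """Area of a closed ring whose first and last vertices coincide."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(ring, ring[1:]))) / 2


outer, *islands = rings(polygon_arcs["D"])
area = shoelace(outer) - sum(shoelace(r) for r in islands)
print(area)  # 96.0
```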

Simple Features

Point entities : These represent all geographical entities that are positioned by a single XY coordinate pair. Along with the XY coordinates, the point must store other information, such as what the point represents.

Line entities : Linear features made by tracing two or more XY coordinate pairs.

Simple line: It requires a start point and an end point.

Arc: A set of XY coordinate pairs describing a continuous complex line. The shorter the line segments and the higher the number of coordinate pairs, the more closely the chain approximates a complex curve.
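The effect of vertex density can be checked numerically. In this sketch the "complex curve" is assumed to be a quarter of a unit circle, whose true length is π/2; the shortfall of the chain's length shrinks as the number of segments grows.

```python
from math import cos, sin, pi, hypot


def quarter_circle(n_segments):
    """Vertices of a chain approximating a quarter of the unit circle."""
    step = (pi / 2) / n_segments
    return [(cos(i * step), sin(i * step)) for i in range(n_segments + 1)]


def chain_length(vertices):
    """Total length of the straight segments joining consecutive vertices."""
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(vertices, vertices[1:]))


for n in (1, 2, 8, 64):
    shortfall = pi / 2 - chain_length(quarter_circle(n))
    print(f"{n:3d} segments: shortfall = {shortfall:.5f}")
```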

Simple Polygons : Enclosed structures formed by joining a set of XY coordinate pairs. The structure is simple, but it carries a few disadvantages, which are listed below:



Lines between adjacent polygons must be digitized and stored twice; improper digitization gives rise to slivers and gaps

They convey no information about neighbors

Creating islands is not possible

Topologic Features

Networks : A network is a topologic feature model defined as a line graph composed of links, representing linear channels of flow, and nodes, representing their connections. The topologic relationships between the features are maintained in a connectivity table. By consulting the connectivity table, it is possible to trace the information flowing through the network.
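Tracing flow through a connectivity table is a graph search. A minimal sketch, assuming a hypothetical pipe network whose connectivity table lists, for each node, the nodes directly downstream of it; a breadth-first search then returns every node that can receive flow from a starting node.

```python
from collections import deque

# Hypothetical connectivity table: node -> nodes directly downstream of it
connectivity = {
    "source": ["J1"],
    "J1": ["J2", "J3"],
    "J2": ["tap_a"],
    "J3": ["J4"],
    "J4": ["tap_b", "tap_c"],
}


def trace_downstream(start):
    """Breadth-first trace of every node reachable from `start`."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in connectivity.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


print(sorted(trace_downstream("J3")))  # ['J4', 'tap_b', 'tap_c']
```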

Polygons with explicit topological structures : Introducing explicit topological relationships takes care of islands as well as neighbors. The topological structures are built either by creating topological links during data input or by using software. The Dual Independent Map Encoding (DIME) system of the US Bureau of the Census was one of the first attempts to create topology in geographic data.

Figure 8: Polygon as a topological feature

Polygons are formed using the lines and their nodes.



Once formed, polygons are individually identified by a unique identification number. The topological information among the polygons is computed and stored using the adjacency information (the nodes of a line, and the identifiers of the polygons to the left and right of the line) stored with the lines.

Fully topological polygon network structure

A fully topological polygon network structure is built using boundary chains that are digitized in any direction. It takes care of islands and lakes and allows automatic checks for improper polygons. Neighborhood searches are fully supported. These structures are edited by moving the coordinates of individual points and nodes, by changing polygon attributes, and by cutting out or adding sections of lines or whole polygons. Changing coordinates requires no modification to the topology, but cutting out or adding lines and polygons requires recalculating the topology and rebuilding the database.

Triangular Irregular Network (TIN)

A TIN represents a surface as contiguous, non-overlapping triangles created by performing Delaunay triangulation. These triangles have the property that the circumcircle passing through the vertices of any triangle contains no other point inside it.

A TIN is created from a set of mass points with x, y and z coordinate values. This topological data structure manages information about the nodes that form each triangle and the neighbors of each triangle.
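The empty-circumcircle property can be checked with the standard in-circle determinant test. In this sketch (plain Python numbers, with no protection against floating-point rounding), the determinant is positive exactly when a point lies strictly inside the circumcircle of a counter-clockwise triangle. Points lying exactly on a circle are accepted, which is the case where a Delaunay triangulation is not unique.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc; positive when a, b, c run counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc."""
    if orient(a, b, c) < 0:  # the determinant test below needs counter-clockwise order
        b, c = c, b
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - d[0], p[1] - d[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0


def is_delaunay(points, triangles):
    """Empty-circumcircle check; each triangle is a triple of indices into `points`."""
    for i, j, k in triangles:
        for m, p in enumerate(points):
            if m not in (i, j, k) and in_circumcircle(points[i], points[j], points[k], p):
                return False
    return True


# Four points forming a rhombus: the short diagonal (0-2) gives the Delaunay triangulation
points = [(0, 0), (2, 1), (0, 2), (-2, 1)]
print(is_delaunay(points, [(0, 1, 2), (0, 2, 3)]))  # True
print(is_delaunay(points, [(0, 1, 3), (1, 2, 3)]))  # False
```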



Figure 9: Delaunay Triangulation

Advantages of Delaunay triangulation

The triangles are as equiangular as possible, thus reducing potential numerical precision problems created by long, skinny triangles

The triangulation is independent of the order in which the points are processed

Ensures that any point on the surface is as close as possible to a node



Because points can be placed irregularly over a surface, a TIN can have higher resolution in areas where the surface is highly variable. The model incorporates the original sample points, providing a check on the accuracy of the model. The information related to a TIN is stored in a file or a database table. Calculating elevation, slope and aspect is easy with a TIN, but TINs are less widely available than raster surface models and are more time-consuming to construct and process.
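Each TIN triangle is a plane, which is why elevation, slope and aspect are easy to derive from it. The sketch assumes x grows to the east, y to the north and all three coordinates share one unit; the plane through the three vertices is found with a cross product, slope is the steepness of that plane in degrees, and aspect is the compass bearing (0 = north, 90 = east) of the steepest downhill direction. The function names are illustrative only.

```python
from math import atan, atan2, degrees, hypot


def plane_coefficients(p0, p1, p2):
    """Return (a, b) such that z = z0 + a*(x - x0) + b*(y - y0) passes through p0, p1, p2."""
    ux, uy, uz = (p1[i] - p0[i] for i in range(3))
    vx, vy, vz = (p2[i] - p0[i] for i in range(3))
    # Cross product u x v gives the plane's normal vector (nx, ny, nz)
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz == 0:
        raise ValueError("triangle is degenerate in plan view")
    return -nx / nz, -ny / nz


def elevation(tri, x, y):
    """Interpolate z at (x, y), assumed to lie inside triangle `tri`."""
    a, b = plane_coefficients(*tri)
    x0, y0, z0 = tri[0]
    return z0 + a * (x - x0) + b * (y - y0)


def slope_aspect(tri):
    """Slope in degrees and aspect as a compass bearing; aspect is meaningless for a flat triangle."""
    a, b = plane_coefficients(*tri)
    slope = degrees(atan(hypot(a, b)))
    aspect = degrees(atan2(-a, -b)) % 360  # bearing of the downhill direction (-a, -b)
    return slope, aspect


triangle = [(0, 0, 10), (10, 0, 10), (0, 10, 0)]  # hypothetical vertices, surface falls to the north
print(elevation(triangle, 2, 2))  # 8.0
print(slope_aspect(triangle))
```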



The TIN model is a vector data model that is stored using relational attribute tables. A TIN dataset contains three basic attribute tables:

Arc attribute table: contains the length, from-node and to-node of all the edges of all the triangles.

Node attribute table: contains the x, y coordinates and z (elevation) of the vertices.

Polygon attribute table: contains the areas of the triangles, the identification numbers of their edges and the identifiers of the adjacent polygons.

Storing data in this manner eliminates redundancy, as all the vertices and edges are stored only once even if they are used by more than one triangle. As a TIN stores topological relationships, the datasets can be applied to vector-based geoprocessing such as automatic contouring, 3D landscape visualization, volumetric design and surface characterization.



Did You Know?

MasterMap

The UK Ordnance Survey MasterMap is a framework for the referencing of geographic information in Great Britain. It comprises four layers that provide detailed topographic, address, aerial imagery and road network features positioned on the National Grid.

The MasterMap has the following main features:

Data layers provide a seamless topographic database for the UK at the scales of 1:1250 and 1:2500

Real-world features are represented by points, lines and polygons, each with its own unique reference called a TOID

The data can be supplied in a topologically structured format

There are over 430 million features in the MasterMap database and around 5000 updates are made every day. The data have been used successfully in a range of projects.



Raster Data Structure

In a simple raster data structure, geographical entities are stored in a matrix of rectangular cells. Each cell is given a code that tells users which entity is present in that cell. The simplest way of encoding raster data in a computer can be understood as follows:

(a) Entity model: It represents the whole raster data. Let us assume that the raster data belongs to an area where land is surrounded by water. Here a particular entity (land) is shown in green and the area where land is not present is shown in white.

(b) Pixel values: The pixel values for the full image are shown. Cells containing part of the land are encoded as 1 and others, where land is not present, are encoded as 0.

(c) File structure: It demonstrates the method of coding raster data. The first row of the file structure data tells that there are 5 rows and 5 columns in the image, and that 1 is the maximum pixel value. The subsequent rows contain cells with values of either 0 or 1 (matching the pixel values).
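The file structure just described can be reproduced in a few lines. The 5×5 grid here is hypothetical (the land shape in the original figure is not reproduced), and the header line follows the text: rows, columns, maximum pixel value. The function names are illustrative.

```python
# Hypothetical 5x5 binary raster: 1 = land, 0 = water
grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]


def to_file_structure(grid):
    """Header line 'rows cols max_value', then one line of cell values per row."""
    rows, cols = len(grid), len(grid[0])
    max_value = max(max(row) for row in grid)
    lines = [f"{rows} {cols} {max_value}"]
    lines += [" ".join(str(v) for v in row) for row in grid]
    return "\n".join(lines)


def from_file_structure(text):
    """Parse the same format back into a list of rows, checking it against the header."""
    header, *body = text.splitlines()
    rows, cols, _max_value = map(int, header.split())
    cells = [[int(v) for v in line.split()] for line in body]
    if len(cells) != rows or any(len(row) != cols for row in cells):
        raise ValueError("cell data do not match the header")
    return cells


text = to_file_structure(grid)
print(text.splitlines()[0])  # 5 5 1
```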

The huge size of the data is a major problem with raster data. An image consisting of twenty different land-use classes takes the same storage space as a similar raster map showing the location of a single forest. To address this problem, many data compaction methods have been developed, which are discussed below:



Run length encoding

Reduction of data on a row by row basis

Stores a single value for a group of cells rather than storing values for individual cells

The first line represents the dimensions of the matrix (5×5) and the number of entities (1) present. In the second and subsequent lines, the first number in each pair represents the absence (0) or presence (1) of the entity and the second number indicates the number of cells referenced.
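Run length encoding of a binary grid can be sketched with `itertools.groupby`, which groups consecutive equal values. The grid is hypothetical; each row becomes (value, count) pairs, and the header follows the layout described above (rows, columns, number of entities).

```python
from itertools import groupby

# Hypothetical 5x5 binary raster: 1 = entity present, 0 = absent
grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]


def rle_encode_row(row):
    """Collapse each run of identical cells into a (value, run length) pair."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def rle_decode_row(pairs):
    """Expand (value, run length) pairs back into individual cells."""
    return [value for value, count in pairs for _ in range(count)]


encoded = [rle_encode_row(row) for row in grid]
entities = len({v for row in grid for v in row} - {0})  # distinct non-zero values

print(len(grid), len(grid[0]), entities)  # header: 5 5 1
for pairs in encoded:
    print(" ".join(f"{value} {count}" for value, count in pairs))
```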

Block encoding

Data is stored in blocks in the raster matrix.

The entity is subdivided into hierarchical blocks and the blocks are located using coordinates.

The top-left cell is used as the origin for locating the blocks
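Block encoding can be implemented in several ways. The sketch below is one simple greedy variant, not necessarily the exact scheme in the original figure: starting from the top-left origin, it grows the largest square of not-yet-covered entity cells at each position and records each block as (row, column, size). The result is valid but not guaranteed to use the fewest possible blocks.

```python
def block_encode(grid, value=1):
    """Greedy square blocks (row, col, size) covering every cell equal to `value`.

    (0, 0) is the top-left cell.
    """
    rows, cols = len(grid), len(grid[0])
    covered = [[False] * cols for _ in range(rows)]
    blocks = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or covered[r][c]:
                continue
            size = 1
            # Grow the square while the enlarged square stays inside the grid
            # and contains only uncovered cells of the entity
            while (r + size < rows and c + size < cols and
                   all(grid[rr][cc] == value and not covered[rr][cc]
                       for rr in range(r, r + size + 1)
                       for cc in range(c, c + size + 1))):
                size += 1
            for rr in range(r, r + size):
                for cc in range(c, c + size):
                    covered[rr][cc] = True
            blocks.append((r, c, size))
    return blocks


def block_decode(blocks, rows, cols, value=1):
    """Rebuild the grid from its block list (background cells are 0)."""
    grid = [[0] * cols for _ in range(rows)]
    for r, c, size in blocks:
        for rr in range(r, r + size):
            for cc in range(c, c + size):
                grid[rr][cc] = value
    return grid


grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
blocks = block_encode(grid)
print(blocks)  # [(1, 1, 2), (2, 3, 1), (3, 2, 1), (3, 3, 1)]
```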



Chain encoding

Works by defining the boundary of the entity, i.e. a sequence of cells starting from and returning to a given origin

Direction of travel is specified using numbers (0 = North, 1 = East, 2 = South, 3 = West)

The first line tells that the coding started at cell (4, 2) and that there is only one chain. In the second line, the first number in each pair gives the direction and the second number represents the number of cells lying in this direction.
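Chain encoding and decoding can be sketched as follows, using the direction codes from the text. The boundary is hypothetical, and the sketch assumes cells are addressed as (row, column) with rows increasing southward; the original figure's axis convention may differ.

```python
# Direction codes from the text; steps are (row, col) offsets with rows growing southward.
STEPS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}  # North, East, South, West
CODE_OF_STEP = {step: code for code, step in STEPS.items()}


def chain_decode(start, pairs):
    """Expand (direction, count) pairs into the cells visited from `start`."""
    r, c = start
    path = [start]
    for direction, count in pairs:
        dr, dc = STEPS[direction]
        for _ in range(count):
            r, c = r + dr, c + dc
            path.append((r, c))
    return path


def chain_encode(path):
    """Collapse a path of 4-connected cells into (direction, count) pairs."""
    pairs = []
    for (r1, c1), (r2, c2) in zip(path, path[1:]):
        code = CODE_OF_STEP[(r2 - r1, c2 - c1)]  # KeyError if the cells are not 4-neighbors
        if pairs and pairs[-1][0] == code:
            pairs[-1] = (code, pairs[-1][1] + 1)
        else:
            pairs.append((code, 1))
    return pairs


# Hypothetical boundary: the outline of a 3x3 block of cells, traced clockwise from (4, 2)
pairs = [(0, 2), (1, 2), (2, 2), (3, 2)]
path = chain_decode((4, 2), pairs)
print(path[-1] == path[0])  # True: the chain returns to its origin
```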



Quadtree

A raster is divided into a hierarchy of quadrants that are subdivided based on pixels with similar values. The division of the raster stops when a quadrant is made entirely of cells with the same value. A quadrant that cannot be subdivided further is called a leaf node.
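A quadtree can be built recursively: a block is kept as a single leaf if all its cells share one value, otherwise it is split into four quadrants. This sketch assumes a square raster whose side is a power of two (real implementations pad other sizes) and represents a split node as a list of its four children in NW, NE, SW, SE order; the grid is hypothetical.

```python
def build_quadtree(grid, r=0, c=0, size=None):
    """Quadtree of a square raster whose side is a power of two.

    A uniform block becomes a leaf (its cell value); any other block becomes
    a list of four children in NW, NE, SW, SE order.
    """
    if size is None:
        size = len(grid)
    first = grid[r][c]
    if all(grid[rr][cc] == first
           for rr in range(r, r + size)
           for cc in range(c, c + size)):
        return first  # leaf node: every cell in the block has the same value
    half = size // 2
    return [
        build_quadtree(grid, r, c, half),                # NW
        build_quadtree(grid, r, c + half, half),         # NE
        build_quadtree(grid, r + half, c, half),         # SW
        build_quadtree(grid, r + half, c + half, half),  # SE
    ]


def count_leaves(node):
    """Number of leaf nodes, i.e. the values the quadtree actually stores."""
    if not isinstance(node, list):
        return 1
    return sum(count_leaves(child) for child in node)


grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(grid)
print(tree)                # [1, 0, 0, [0, 1, 1, 1]]
print(count_leaves(tree))  # 7 leaves instead of 16 cells
```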

A satellite or remote sensing image is raster data in which each cell has a value, and together these values create a layer. A raster may have a single layer or multiple layers. In a multi-layer/multi-band raster, each layer is congruent with all other layers: the layers have identical numbers of rows and columns and the same locations in the plane. A digital elevation model (DEM) is an example of a single-band raster dataset, each cell of which contains only one value representing surface elevation.

A single-layer raster can be represented using:

a. Two colors (binary): The raster is represented as a binary image with cell values of either 0 or 1, appearing black and white respectively

b. Grayscale: Typical remote sensing images are recorded in an 8-bit digital system. A grayscale image is thus represented in 256 shades of gray, ranging from 0 (black) to 255 (white). However, the human eye cannot distinguish between all 256 shades; it can only interpret 8 to 16 shades of gray.

A satellite image can have multiple bands, i.e. the scene is captured at different wavelengths (the ultraviolet, visible and infrared portions) of the electromagnetic spectrum. While creating a map, we can choose to display a single band of data or form a color composite using multiple bands. A combination of any three of the available bands can be used to create RGB composites. These composites present a greater amount of information than a single-band raster.
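An RGB composite assigns three chosen bands to the red, green and blue channels. The sketch assumes each band is a list of rows of numeric values and applies a linear min-max stretch to 0-255 before combining them; the stretch is one common display choice, not the only one, and the band values are hypothetical.

```python
def stretch(band):
    """Linearly rescale a band to the full 0-255 display range (min-max stretch)."""
    values = [v for row in band for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [[0 for _ in row] for row in band]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in band]


def rgb_composite(red_band, green_band, blue_band):
    """Combine three single-band rasters of equal shape into rows of (R, G, B) tuples."""
    r, g, b = stretch(red_band), stretch(green_band), stretch(blue_band)
    return [list(zip(rr, gr, br)) for rr, gr, br in zip(r, g, b)]


# Hypothetical 2x2 bands; near-infrared is sent to red (a false-color composite)
nir = [[200, 180], [40, 30]]
red = [[30, 40], [90, 80]]
green = [[50, 60], [70, 65]]
composite = rgb_composite(nir, red, green)
print(composite[0][0])  # (255, 0, 0)
```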



Comparison between Vector and Raster Data Models

Raster

Advantages:
Simple data structure
Compatible with remote sensing or scanned data
Spatial analysis is easier
Simulation is easy because each unit has the same size and shape

Disadvantages:
Cell size determines the resolution at which the data is represented
Requires a lot of storage space
Projection transformations are time consuming
Network linkages are difficult to establish

Vector

Advantages:
Data is represented at its original resolution and form without generalization
Requires less storage space
Editing is faster and more convenient
Network analysis is fast
Projection transformations are easier

Disadvantages:
The location of each vertex must be stored explicitly
Overlay based on criteria is difficult
Spatial analysis is cumbersome
Simulation is difficult because each unit has a different topological form



Geodatabase

The term ‘Geodatabase’ was introduced by Environmental Systems Research Institute, Inc. (ESRI) and is defined as a collection of geographic datasets of various types held in a common file system folder, a Microsoft Access database, or a multiuser relational database such as Oracle, SQL Server or DB2. The geodatabase is built on an extended relational database model. In this model, entities are represented as objects with properties, behavior and relationships.

The geodatabase supports various elements of GIS, such as attribute data, CAD data, geographic features, satellite and aerial images, GPS data and survey measurements. These types of data can be represented as data objects, viz. annotation, dimension, feature class, geometric network, raster dataset, table, topology, relationship class etc. Geodatabase design is based on a fundamental step of GIS design, which involves organizing geographic information into a series of data themes and then specifying the content and representation of the thematic layers. Advanced capabilities (network, topology, subtypes etc.) are added later to the geodatabase to model GIS behavior and maintain data integrity. Other key properties of geodatabase design include the definition of coordinate and spatial properties, tolerances, coordinate resolution and metadata documentation for each dataset.

Metadata

Metadata is structured information that describes, and makes it easier to retrieve, use or manage, an information resource. It is also known as data about data.

Need for metadata

To enable searching over distributed archives: Similar to a library catalog, it sorts data and makes it easy for a user to find it.



Helps assess the fitness of a dataset for a given use: Metadata is needed to determine whether a dataset will satisfy a user’s requirements. Does the data have acceptable quality? It may also include comments from previous users.

Provides information about data content: In the case of remotely sensed images, it may include the percentage of cloud obscuring the scene and other information.

Provides information about handling the dataset: It includes the technical specification of the data format, the software compatible with the data, the data volume etc.

Geospatial metadata commonly documents Geographic Information System (GIS) files, geospatial databases and earth imagery. It can also be used to document data catalogs, mapping applications, data models and related websites. Metadata holds information on library catalog elements such as title, abstract and publication; geographic elements such as geographic extent and projection information; and database elements such as attributes and their values.

The most widely used standard for metadata is the US Federal Geographic Data Committee’s Content Standard for Digital Geospatial Metadata (CSDGM). CSDGM describes the items that should be present in a metadata record but does not prescribe the format in which to present them. Developers implement the standard in ways that suit their own needs but make sure that the implementations are interoperable, i.e. can be understood by others.



Figure 10: Screenshot of metadata of a shapefile displayed in ArcCatalog



Temporal Dimension in GIS

Spatial features may change over time in terms of both space and content. The changes could be geometrical (a change in the geometry of features), positional (a change in the position of features) or a change in the attributes of the features. When changes in the locations of a group of objects are observed together, changes in the spatial distribution pattern of the objects can be deciphered.

One may analyze temporal datasets to monitor the changes that happen over time. Although many things change with time, monitoring the changes must be done prudently, as it involves a huge investment of resources. The monitoring intervals must be fixed in a manner that captures the change in the spatial phenomena while remaining efficient and viable.

The effect of urbanization on the land use of an area can be monitored by a change detection analysis that makes use of temporal satellite images and GIS to determine the nature, extent and rate of land cover change and fragmentation over time and space. Temporal GIS studies are quite popular in the field of forest conservation and management. One study described the monitoring of deforestation in a land resource inventory project in Nepal, where, within an interval of 30 years (1950-1980), 50% of the forest land was lost to shrub and agriculture. Similar temporal studies are carried out in various sectors of natural resources management, such as biodiversity, water and land/soil, where, considering future needs, striking a balance between the consumption and availability of natural resources is of utmost importance.
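The core of a post-classification change detection can be sketched as a cell-by-cell comparison of two classified rasters of the same area. The class codes and grids below are hypothetical (they are not the Nepal data), and a real study would first co-register and classify the images; the sketch only builds the "from-to" change counts and the percentage of a class that changed.

```python
from collections import Counter

FOREST, SHRUB, AGRICULTURE = 1, 2, 3  # hypothetical class codes

date1 = [[1, 1, 1, 1],
         [1, 1, 1, 1],
         [2, 2, 3, 3]]
date2 = [[1, 1, 1, 3],
         [1, 1, 2, 3],
         [2, 2, 3, 3]]


def change_matrix(before, after):
    """Count cells for every (class at date 1, class at date 2) pair."""
    return Counter((b, a)
                   for row_b, row_a in zip(before, after)
                   for b, a in zip(row_b, row_a))


def percent_changed(before, after, cls):
    """Percentage of the cells of class `cls` at date 1 that hold another class at date 2."""
    changes = change_matrix(before, after)
    total = sum(n for (b, _), n in changes.items() if b == cls)
    if total == 0:
        return 0.0
    return 100.0 * (total - changes[(cls, cls)]) / total


print(percent_changed(date1, date2, FOREST))               # 37.5
print(change_matrix(date1, date2)[(FOREST, AGRICULTURE)])  # 2
```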

References

http://www.ian-ko.com/resources/triangulated_irregular_network.htm, viewed 19 November 2011.

Burrough, P.A. & McDonnell, R.A. 1998, Principles of geographical information systems, Oxford University Press, UK.

Goodchild, M.F., Longley, P.A., Maguire, D.J. & Rhind, D.W. 2001, Geographic information systems and science, John Wiley & Sons Ltd., England.

Lo, C.P. & Yeung, A. 2009, Concepts and techniques of geographic information systems, PHI Learning Private Limited, New Delhi.
