
5. Data Model and Data Structures

Jul 08, 2018


Sujan Singh
  • 8/19/2019 5. Data Model and Data Structures

    5. Data Model And Data Structures

    Introduction

    Database management systems (DBMS) have traditionally been used to store one-dimensional data such as integers, real numbers and strings, and considerable interest has developed in using them to store spatial data as well. It has been observed, however, that ordinary DBMS do not efficiently handle spatial data such as boxes, polygons, or even points in a multidimensional space.

    Spatial and non-spatial data 

    Spatial data refers to the data or information that describes the absolute or relative location of geographic features on the earth. Non-spatial data, or attribute data, on the other hand describes the characteristics of the spatial features. These characteristics can be quantitative or qualitative.

    Representation of Space 

    Burrough & McDonnell (1998) described two ways to represent space (an area, a landscape or some bigger unit), which are as follows:

    a. Discrete entities: The space is seen as occupied by entities that are described by their properties and can be located on the earth using coordinate systems. The entities have a clear boundary. Buildings, roads and land parcels are examples of discrete entities.

    b. Continuous fields: The variation of an attribute over the space is treated as a continuous field. No physical boundary can be observed in such a case. Temperature, pressure and elevation across an area are examples of continuous fields.

    GIS Data Models

    Data models are conceptual models of the real world. They describe how geographic data are represented and stored. The data models used in GIS are described below:


    a. Vector Data Model  

    The vector data model is closely linked with the discrete object view. In the vector data model, geographical phenomena are represented in three different forms: point, line and polygon. The shape of a spatial entity is stored using a two-dimensional (x, y) coordinate system.

    Point : A location depicted by a single set of (x, y) coordinates at the scale of abstraction.

    The wells in a village, electricity poles in a town and cities on a world map are examples of spatial features described by points.

    Note:  A city can be marked as a single point on a world map but would be marked as a polygon on a state map. The scale plays an important role in deciding the geometry of a geographical feature. 

    Line/Arc : Ordered sets of (x, y) coordinate pairs arranged to form a linear feature. The curves in a linear feature are generated by increasing the density of points/vertices.

    Roads, railways and telephone cables are examples of spatial features described by lines.

    Polygon : The set of (x, y) coordinate pairs enclosing a homogeneous area.

    Land parcels, agricultural farms and water bodies are examples of spatial features described by polygons.
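
    The three vector geometries can be sketched in a few lines of Python. This is only an illustration: the coordinate values and the helper names line_length and polygon_area are assumptions made for the example, not part of any particular GIS package, and the area helper uses the standard shoelace formula.

```python
from math import hypot

# A point is a single (x, y) pair.
well = (2.0, 3.0)

# A line is an ordered list of (x, y) vertices.
road = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)]

# A polygon is a list of (x, y) vertices enclosing an area;
# the ring is closed implicitly (last vertex joins the first).
parcel = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]


def line_length(vertices):
    """Sum of the straight segment lengths between consecutive vertices."""
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(vertices, vertices[1:]))


def polygon_area(ring):
    """Area of a simple polygon using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(line_length(road))     # length of the road
print(polygon_area(parcel))  # area of the land parcel
```

    Adding more vertices to road would let it follow a curve more closely, which is the point made above about vertex density.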


    b. Raster Data Model  

    The raster data model is commonly associated with the field conceptual model. Here, geographic space is represented by an array of cells or pixels (picture elements) arranged in rows and columns. Each pixel has a value that represents information. The value can be an integer, a floating point number or an alphanumeric code.

    A point can be represented by a single pixel in the raster model. A line is a chain of spatially connected cells with the same value. Similarly, a water body in raster data is represented as a set of contiguous pixels having the same value, which represents a homogeneous area.
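
    As a rough sketch of this idea, a raster can be held as a list of rows. The cell codes used here (0 for background, 1 for a well, 2 for a road, 3 for a water body) and the helper cells_with_value are assumptions made only for this example.

```python
# A raster is a grid of cells arranged in rows and columns.
# Hypothetical cell codes: 0 = background, 1 = well (point),
# 2 = road (line), 3 = water body (area).
grid = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 3, 3],
    [0, 0, 0, 0, 3, 3],
    [2, 2, 2, 2, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def cells_with_value(raster, value):
    """Return the (row, column) index of every cell holding `value`."""
    return [(r, c)
            for r, row in enumerate(raster)
            for c, cell in enumerate(row)
            if cell == value]


print(cells_with_value(grid, 1))  # the single pixel of the point
print(cells_with_value(grid, 2))  # the chain of cells forming the line
print(cells_with_value(grid, 3))  # the contiguous cells of the water body
```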


    Vector data structure

    Geographic entities encoded using the vector data model are often called features. Features can be divided into two classes:

    a. Simple features: These are easy to create and store, and are rendered on screen very quickly. They lack connectivity relationships and so are inefficient for modeling phenomena conceptualized as fields.

    b. Topological features: Topology is a mathematical procedure that describes how features are spatially related and ensures the data quality of the spatial relationships. Topological relationships include the following three basic elements:

    I. Connectivity: Information about linkages among spatial objects

    II. Contiguity: Information about neighboring spatial objects

    III. Containment: Information about inclusion of one spatial object within another spatial object

    Connectivity

    Arc-node topology defines connectivity: arcs are connected to each other if they share a common node. This is the basis for many network tracing and path finding operations.

    Arcs represent linear features and the borders of area features. Every arc has a from-node, which is the first vertex in the arc, and a to-node, which is the last vertex. These two nodes define the direction of the arc. Nodes indicate the endpoints and intersections of arcs. They do not exist independently and therefore cannot be added or deleted except by adding and deleting arcs.


    Figure 3: Arc-node Topology  

    Nodes can, however, be used to represent point features which connect segments of a linear feature (e.g., intersections connecting street segments, valves connecting pipe segments).

    Figure 4: Node showing intersection 

    Arc-node topology is supported through an arc-node list. For each arc in the list there is a from-node and a to-node. Connected arcs are determined by common node numbers.
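
    A minimal sketch of an arc-node list and the connectivity test it supports is given here. The arc and node numbers are made up for illustration, and connected_arcs is a hypothetical helper name.

```python
# Hypothetical arc-node list: arc id -> (from_node, to_node).
arc_node_list = {
    1: (10, 11),
    2: (11, 12),
    3: (11, 13),
    4: (14, 15),
}


def connected_arcs(arc_id, arcs):
    """Return the arcs that share at least one node with `arc_id`."""
    nodes = set(arcs[arc_id])
    return sorted(other for other, ends in arcs.items()
                  if other != arc_id and nodes & set(ends))


print(connected_arcs(1, arc_node_list))  # arcs meeting arc 1 at node 11
print(connected_arcs(4, arc_node_list))  # arc 4 shares no node with others
```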


    Figure 5: Arc-Node Topology with list  

    Contiguity

    Polygon topology defines contiguity. Polygons are said to be contiguous if they share a common arc. Contiguity allows the vector data model to determine adjacency.


    Figure 6: Polygon Topology

    The from-node and to-node of an arc indicate its direction, and this helps determine the polygons on its left and right sides. Left-right topology refers to the polygons on the left and right sides of an arc. In the illustration above, polygon B is on the left and polygon C is on the right of arc 4.

    Polygon A is outside the boundary of the area covered by polygons B, C and D. It is called the external or universe polygon, and represents the world outside the study area. The universe polygon ensures that each arc always has a left and a right side defined.
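
    Left-right topology can be sketched as a lookup table from each arc to the polygons on its two sides. Only the row for arc 4 (B on the left, C on the right) and the role of A as the universe polygon come from the text; the remaining rows and the helper name neighbours are assumptions for illustration.

```python
# Hypothetical left-right list: arc id -> (left polygon, right polygon).
# Only arc 4 follows the text; the other rows are made up.
# "A" is the universe polygon outside the study area.
left_right = {
    1: ("A", "B"),
    2: ("A", "C"),
    3: ("A", "D"),
    4: ("B", "C"),
    5: ("C", "D"),
}


def neighbours(polygon, table, universe="A"):
    """Polygons sharing an arc with `polygon`, ignoring the universe."""
    found = set()
    for left, right in table.values():
        if polygon == left:
            found.add(right)
        elif polygon == right:
            found.add(left)
    found.discard(universe)
    return sorted(found)


print(neighbours("C", left_right))  # polygons contiguous with C
```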

    Containment

    Geographic features cover distinguishable areas on the surface of the earth. An area is represented by one or more boundaries defining a polygon. Polygons can be simple, or they can be complex with a hole or island in the middle. In the illustration given below, assume a lake with an island in the middle. The lake actually has two boundaries: one that defines its outer edge and another (the island) that defines its inner edge. An island defines the inner boundary of a polygon. Polygon D is made up of arcs 5, 6 and 7. The 0 before the 7 indicates that arc 7 creates an island in the polygon.

    Figure 7: Polygon-arc topology

    Polygons are represented as an ordered list of arcs and not in terms of X, Y coordinates. This is called polygon-arc topology. Since arcs define the boundary of a polygon, arc coordinates are stored only once, thereby reducing the amount of data and ensuring no overlap of the boundaries of adjacent polygons.
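
    A polygon-arc list can be sketched as follows. The entry for polygon D (arcs 5 and 6, then a 0, then arc 7) follows the description above; the entries for B and C and the helper names split_rings and shared_arcs are assumptions for illustration.

```python
# Polygon-arc list: polygon id -> ordered arc ids.
# A 0 separates the outer boundary from the arcs of an island.
# Polygon D follows the text; the entries for B and C are made up.
polygon_arcs = {
    "B": [1, 4, 3],
    "C": [2, 6, 4],
    "D": [5, 6, 0, 7],
}


def split_rings(arc_list):
    """Split an arc list at each 0 into [outer ring, island, ...]."""
    rings = [[]]
    for arc in arc_list:
        if arc == 0:
            rings.append([])
        else:
            rings[-1].append(arc)
    return rings


def shared_arcs(first, second, table):
    """Arcs referenced by both polygons (each arc is stored only once)."""
    return sorted((set(table[first]) & set(table[second])) - {0})


print(split_rings(polygon_arcs["D"]))       # outer boundary and island
print(shared_arcs("B", "C", polygon_arcs))  # common boundary of B and C
```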

    Simple Features

    Point entities : These represent all geographical entities that are positioned by a single XY coordinate pair. Along with the XY coordinates, the point must store other information, such as what the point represents.

    Line entities : Linear features made by tracing two or more XY coordinate pairs.

      Simple line: It requires a start and an end point.

      Arc: A set of XY coordinate pairs describing a continuous complex line. The shorter the line segments and the higher the number of coordinate pairs, the closer the chain approximates a complex curve.

    Simple Polygons : Enclosed structures formed by joining a set of XY coordinate pairs. The structure is simple but it carries a few disadvantages, which are listed below:

      Lines between adjacent polygons must be digitized and stored twice; improper digitization gives rise to slivers and gaps

      They convey no information about neighbors

      Creating islands is not possible

    Topologic Features

    Networks : A network is a topologic feature model which is defined as a line graph composed of links representing linear channels of flow and nodes representing their connections. The topologic relationship between the features is maintained in a connectivity table. By consulting the connectivity table, it is possible to trace the information flowing in the network.
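
    Tracing through a connectivity table can be sketched with a breadth-first search. The node names, the table contents and the function trace are assumptions for illustration; a real network model would also carry link attributes such as direction or cost.

```python
from collections import deque

# Hypothetical connectivity table: node -> nodes reachable over one link.
connectivity = {
    "N1": ["N2"],
    "N2": ["N1", "N3", "N4"],
    "N3": ["N2"],
    "N4": ["N2", "N5"],
    "N5": ["N4"],
}


def trace(start, goal, table):
    """Breadth-first trace: node sequence with the fewest links, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in table.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


print(trace("N1", "N5", connectivity))
```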

    Polygons with explicit topological structures : Introducing explicit topological relationships takes care of islands as well as neighbors. The topological structures are built either by creating topological links during data input or by using software. The Dual Independent Map Encoding (DIME) system of the US Bureau of the Census is one of the first attempts to create topology in geographic data.

    Figure 8: Polygon as a topological feature 

      Polygons are formed using the lines and their nodes.

      Once formed, polygons are individually identified by a unique identification number.

      The topological information among the polygons is computed and stored using the adjacency information (the nodes of a line, and the identifiers of the polygons to the left and right of the line) stored with the lines.

    Fully topological polygon network structure 

    A fully topological polygon network structure is built using boundary chains that are digitized in any direction. It takes care of islands and lakes and allows automatic checks for improper polygons. Neighborhood searches are fully supported. These structures are edited by moving the coordinates of individual points and nodes, by changing polygon attributes and by cutting out or adding sections of lines or whole polygons. Changing coordinates requires no modification to the topology, but cutting out or adding lines and polygons requires recalculation of the topology and rebuilding of the database.

    Triangular Irregular Network (TIN) 

    A TIN represents a surface as contiguous, non-overlapping triangles created by performing Delaunay triangulation. These triangles have the property that the circumcircle passing through the vertices of a triangle contains no other point inside it.

    A TIN is created from a set of mass points with x, y and z coordinate values. This topologic data structure manages information about the nodes that form each triangle and the neighbors of each triangle.
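
    The empty-circumcircle property can be checked with the standard in-circle determinant. The sketch below only tests the property for one candidate triangle; it does not build a triangulation. The function names and sample coordinates are assumptions for illustration.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circle through a, b and c."""
    rows = []
    for q in (a, b, c):
        dx, dy = q[0] - p[0], q[1] - p[1]
        rows.append((dx, dy, dx * dx + dy * dy))
    (ax, ay, aw), (bx, by, bw), (cx, cy, cw) = rows
    det = (ax * (by * cw - bw * cy)
           - ay * (bx * cw - bw * cx)
           + aw * (bx * cy - by * cx))
    # The sign of det depends on the winding order of a, b, c.
    return det * orientation(a, b, c) > 0


def satisfies_delaunay(triangle, points):
    """True if no other point falls inside the triangle's circumcircle."""
    a, b, c = triangle
    return not any(in_circumcircle(a, b, c, p)
                   for p in points if p not in triangle)


tri = ((0, 0), (1, 0), (0, 1))
print(satisfies_delaunay(tri, [(0, 0), (1, 0), (0, 1), (2, 2)]))
print(satisfies_delaunay(tri, [(0, 0), (1, 0), (0, 1), (0.5, 0.5)]))
```

    With plain floating point arithmetic this test can misjudge points that lie almost exactly on the circle, so production code usually relies on robust predicates.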


    Figure 9 : Delaunay Triangulation 

    Advantages of Delaunay triangulation

      The triangles are as equiangular as possible, thus reducing potential numerical precision problems created by long skinny triangles

      The triangulation is independent of the order in which the points are processed

      Ensures that any point on the surface is as close as possible to a node


    Because points can be placed irregularly over a surface, a TIN can have higher resolution in areas where the surface is highly variable. The model incorporates the original sample points, providing a check on the accuracy of the model. The information related to a TIN is stored in a file or a database table. Calculation of elevation, slope, and aspect is easy with a TIN, but TINs are less widely available than raster surface models and more time consuming in terms of construction and processing.


    The TIN model is a vector data model which is stored using relational attribute tables. A TIN dataset contains three basic attribute tables:

      Arc attribute table that contains the length, from-node and to-node of all the edges of all the triangles

      Node attribute table that contains the x, y coordinates and z (elevation) of the vertices

      Polygon attribute table that contains the areas of the triangles, the identification numbers of the edges and the identifiers of the adjacent polygons

    Storing data in this manner eliminates redundancy, as all the vertices and edges are stored only once even if they are used for more than one triangle. As a TIN stores topological relationships, the datasets can be applied to vector-based geoprocessing such as automatic contouring, 3D landscape visualization, volumetric design and surface characterization.
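
    The three tables can be sketched with plain dictionaries for two triangles that share one edge. All identifiers and coordinate values are made up, and for brevity the arc length is derived from the node table rather than stored, and triangle areas are omitted.

```python
from math import hypot

# Node attribute table: node id -> (x, y, z). Values are made up.
nodes = {1: (0.0, 0.0, 10.0), 2: (4.0, 0.0, 12.0),
         3: (0.0, 3.0, 15.0), 4: (4.0, 3.0, 11.0)}

# Arc attribute table: arc id -> (from node, to node).
arcs = {1: (1, 2), 2: (2, 3), 3: (3, 1), 4: (2, 4), 5: (4, 3)}

# Polygon attribute table: triangle id -> (edge ids, adjacent triangles).
# Edge 2 is shared by T1 and T2 but stored only once in `arcs`.
triangles = {"T1": ((1, 2, 3), ("T2",)),
             "T2": ((2, 4, 5), ("T1",))}


def arc_length(arc_id):
    """Planimetric (x, y) length of an edge, looked up via the node table."""
    (x1, y1, _), (x2, y2, _) = (nodes[n] for n in arcs[arc_id])
    return hypot(x2 - x1, y2 - y1)


def triangle_nodes(tri_id):
    """Unique node ids of a triangle, collected from its edges."""
    edge_ids, _ = triangles[tri_id]
    return sorted({n for e in edge_ids for n in arcs[e]})


print(arc_length(2))         # length of the shared edge
print(triangle_nodes("T1"))  # nodes forming T1
print(triangle_nodes("T2"))  # nodes forming T2
```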


    Did You Know?

    MasterMap

    The UK Ordnance Survey MasterMap is a framework for the referencing of geographic information in Great Britain. It comprises four layers that provide detailed topographic, address, aerial imagery and road network features positioned on the National Grid.

    MasterMap has the following main features:

      Data layers provide a seamless topographic database for the UK at the scales of 1:1250 and 1:2500

      Real world features are represented by points, lines and polygons, each with its own unique reference called a TOID

      The data can be supplied in a topologically structured format

    There are over 430 million features in the MasterMap database and around 5000 updates are made every day. The data have been used successfully in a range of projects.


    Raster Data Structure

    In a simple raster data structure the geographical entities are stored in a matrix of rectangular cells. Each cell is given a code that tells users which entity is present in that cell. The simplest way of encoding raster data can be understood as follows:

    (a) Entity model: It represents the whole raster data. Let us assume that the raster data belongs to an area where land is surrounded by water. Here a particular entity (land) is shown in green and the area where land is not present is shown in white.

    (b) Pixel values: The pixel values for the full image are shown. Cells containing a part of the land are encoded as 1 and cells where land is not present are encoded as 0.

    (c) File structure: It demonstrates the method of coding raster data. The first row of the file tells that there are 5 rows and 5 columns in the image, and that 1 is the maximum pixel value. The subsequent rows have cells with a value of either 0 or 1 (matching the pixel values).
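
    The file structure in (c) can be sketched as a small text format: a header line with the number of rows, the number of columns and the maximum pixel value, followed by one line per raster row. The space delimiter, the sample pixel values and the function names are assumptions, since the original figure is not reproduced here.

```python
# Hypothetical 5 x 5 pixel values: 1 = land present, 0 = no land.
pixels = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]


def write_raster(grid):
    """Serialise a grid: header 'rows cols max', then one row per line."""
    rows, cols = len(grid), len(grid[0])
    header = f"{rows} {cols} {max(max(row) for row in grid)}"
    body = [" ".join(str(value) for value in row) for row in grid]
    return "\n".join([header] + body)


def read_raster(text):
    """Parse the text form back into a grid, checking it against the header."""
    lines = text.strip().splitlines()
    rows, cols, max_value = (int(token) for token in lines[0].split())
    grid = [[int(token) for token in line.split()] for line in lines[1:]]
    if len(grid) != rows or any(len(row) != cols for row in grid):
        raise ValueError("grid does not match the header dimensions")
    if any(value > max_value for row in grid for value in row):
        raise ValueError("pixel value exceeds the header maximum")
    return grid


text = write_raster(pixels)
print(text)
```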

    The huge size of the data is a major problem with raster data. An image consisting of twenty different land-use classes takes the same storage space as a similar raster map showing the location of a single forest. To address this problem, many data compaction methods have been developed, which are discussed below:


    Run length encoding 

      Reduction of data on a row by row basis

      Stores a single value for a group of cells rather than storing values for individual cells

      The first line represents the dimensions of the matrix (5×5) and the number of entities (1) present. In the second and subsequent lines, the first number in each pair represents the absence (0) or presence (1) of the entity and the second number indicates the number of cells referenced.
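
    Row-by-row run-length encoding can be sketched as follows. The function names and the sample rows are assumptions for illustration; each pair is (value, number of cells), matching the description above.

```python
from itertools import groupby


def rle_encode_row(row):
    """Collapse a row into (value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def rle_decode_row(pairs):
    """Expand (value, run length) pairs back into the original row."""
    return [value for value, count in pairs for _ in range(count)]


row = [0, 1, 1, 1, 0]
encoded = rle_encode_row(row)
print(encoded)                  # runs of absence and presence
print(rle_decode_row(encoded))  # the original row again
```

    The saving grows with the length of the runs: a row made entirely of one value collapses to a single pair.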

    Block encoding 

      Data is stored in blocks in the raster matrix.

      The entity is subdivided into hierarchical blocks and the blocks are located using coordinates.

      The cell at the top left is used as the origin for locating the blocks.


    Chain encoding

      Works by defining the boundary of the entity, i.e. the sequence of cells starting from and returning to a given origin

      Direction of travel is specified using numbers (0 = North, 1 = East, 2 = South, 3 = West)

      The first line tells that the coding started at cell (4, 2) and that there is only one chain. In the second line, the first number in each pair gives the direction and the second number gives the number of cells lying in this direction.
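
    Decoding a chain can be sketched as a walk over the grid. The direction codes and the start cell (4, 2) come from the text; the chain itself, the reading of a cell as (row, column) with rows increasing southwards, and the function name decode_chain are assumptions.

```python
# Direction codes from the text: 0 = North, 1 = East, 2 = South, 3 = West.
# Assumption: cells are (row, column) and rows increase southwards.
STEPS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}


def decode_chain(origin, chain):
    """Walk (direction, count) pairs from `origin`; return the visited cells."""
    row, col = origin
    cells = [origin]
    for direction, count in chain:
        d_row, d_col = STEPS[direction]
        for _ in range(count):
            row, col = row + d_row, col + d_col
            cells.append((row, col))
    return cells


# Hypothetical chain: 2 cells north, 2 east, 2 south, 2 west.
boundary = decode_chain((4, 2), [(0, 2), (1, 2), (2, 2), (3, 2)])
print(boundary)
```

    A valid boundary chain ends on the cell where it started, which gives a simple consistency check.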


    Quadtree 

      A raster is divided into a hierarchy of quadrants that are subdivided based on similar value pixels.

      The division of the raster stops when a quadrant is made entirely from cells of the same value.

      A quadrant that cannot be subdivided is called a leaf node.
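
    The subdivision rule can be sketched recursively. This sketch assumes a square raster whose side is a power of two; the sample grid, the quadrant order (NW, NE, SW, SE) and the function names are illustrative choices.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a leaf value for a uniform block, else four subtrees
    in the order NW, NE, SW, SE."""
    if size is None:
        size = len(grid)
    values = {grid[r][c]
              for r in range(row, row + size)
              for c in range(col, col + size)}
    if len(values) == 1:
        return values.pop()  # leaf node: all cells share one value
    half = size // 2
    return [build_quadtree(grid, row, col, half),
            build_quadtree(grid, row, col + half, half),
            build_quadtree(grid, row + half, col, half),
            build_quadtree(grid, row + half, col + half, half)]


def count_leaves(node):
    """Number of leaf nodes in the tree."""
    if not isinstance(node, list):
        return 1
    return sum(count_leaves(child) for child in node)


raster = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(raster)
print(tree)
print(count_leaves(tree), "leaves for", len(raster) ** 2, "cells")
```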

    A satellite or remote sensing image is raster data where each cell has some value and together these values create a layer. A raster may have a single layer or multiple layers. In a multi-layer (multi-band) raster, each layer is congruent with all other layers, has an identical number of rows and columns, and has the same location in the plane. A digital elevation model (DEM) is an example of a single-band raster dataset, each cell of which contains only one value representing surface elevation.

    A single-layer raster can be represented using:

    a. Two colors (binary): The raster is represented as a binary image with cell values of either 0 or 1, appearing black and white respectively.

    b. Grayscale: Typical remote sensing images are recorded in an 8-bit digital system. A grayscale image is thus represented in 256 shades of gray, which range from 0 (black) to 255 (white). However, the human eye cannot distinguish all 256 shades; it can interpret only about 8 to 16 shades of gray.

    A satellite image can have multiple bands, i.e. the scene is captured at different wavelengths (ultraviolet, visible and infrared portions) of the electromagnetic spectrum. While creating a map we can choose to display a single band of data or form a color composite using multiple bands. A combination of any three of the available bands can be used to create RGB composites. These composites present a greater amount of information than a single-band raster.
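
    Forming an RGB composite amounts to assigning three congruent bands to the red, green and blue display channels. In the sketch below the band names, the pixel values and the function rgb_composite are assumptions made for illustration.

```python
# Hypothetical 2 x 2 bands of one scene (8-bit values, all congruent).
bands = {
    "green": [[10, 20], [30, 40]],
    "red":   [[50, 60], [70, 80]],
    "nir":   [[90, 100], [110, 120]],
}


def rgb_composite(layers, red, green, blue):
    """Combine three equally sized bands into a grid of (R, G, B) tuples."""
    r, g, b = layers[red], layers[green], layers[blue]
    return [list(zip(r_row, g_row, b_row))
            for r_row, g_row, b_row in zip(r, g, b)]


# One possible assignment: near-infrared shown as red,
# red shown as green, green shown as blue.
composite = rgb_composite(bands, red="nir", green="red", blue="green")
print(composite)
```

    Choosing a different trio of bands only changes the three names passed in; the pixel grid stays aligned because every band has the same rows and columns.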


    Comparison between Vector and Raster Data Models

    Data Model | Advantages | Disadvantages
    Raster | Simple data structure | Cell size determines the resolution at which the data is represented
    Raster | Compatible with remote sensing or scanned data | Requires a lot of storage space
    Raster | Spatial analysis is easier | Projection transformations are time consuming
    Raster | Simulation is easy because each unit has the same size and shape | Network linkages are difficult to establish
    Vector | Data is represented at its original resolution and form without generalization | The location of each vertex is to be stored explicitly
    Vector | Requires less storage space | Overlay based on criteria is difficult
    Vector | Editing is faster and convenient | Spatial analysis is cumbersome
    Vector | Network analysis is fast | Simulation is difficult because each unit has a different topological form
    Vector | Projection transformations are easier |


    Geodatabase

    The term ‘Geodatabase’ was introduced by Environmental Systems Research Institute, Inc. (ESRI) and is defined as a collection of geographic datasets of various types that are held in a common file system folder, an MS Access database, or a multiuser relational database such as Oracle, SQL Server or DB2. The geodatabase is built on an extended relational database. In this model, entities are represented as objects with properties, behavior, and relationships.

    The geodatabase supports various elements of GIS such as attribute data, CAD data, geographic features, satellite and aerial images, GPS data and survey measurements. These types of data can be represented as data objects, viz. annotation, dimension, feature class, geometric network, raster dataset, tables, topology and relationship class. Geodatabase design is based on a fundamental step of GIS design, which involves organizing geographic information into a series of data themes and then specifying the content and representation of the thematic layers. Advanced capabilities (network, topology, subtypes etc.) are added later to the geodatabase to model GIS behavior and maintain data integrity. Other key properties of geodatabase design include the definition of coordinate properties and spatial properties, tolerances, coordinate resolution and metadata documentation for each dataset.

    Metadata

    Metadata is structured information that describes an information resource and makes it easier to retrieve, use, or manage. It is also known as data about data.

    Need of metadata 

      Enables search over distributed archives: Similar to a library catalog, it sorts data and makes it easy for a user to find it.

      Helps assess the fitness of a dataset for a given use: Metadata is needed to determine whether a dataset will satisfy a user’s requirement, for example whether the data have acceptable quality. It may also contain comments from previous users.

      Provides information about data content: In the case of remotely sensed images, it may include the percentage of cloud obscuring the scene and some other information.

      Provides information about handling the dataset: It includes the technical specification of the data format, software compatible with the data, data volume etc.

    Geospatial metadata commonly keeps records of Geographic Information System (GIS) files, geospatial databases, and earth imagery. It can also be used to document data catalogs, mapping applications, data models and related websites. Metadata holds information on library catalog elements such as title, abstract, and publication; geographic elements such as geographic extent and projection information; and database elements such as attributes and their values.

    The most widely used standard for metadata is the US Federal Geographic Data Committee’s Content Standard for Digital Geospatial Metadata (CSDGM). CSDGM describes the items that should be present in a metadata archive but does not prescribe the format in which to present them. Developers implement the standard in ways that suit them but make sure that the implementations are interoperable, i.e. can be understood by others.


    Figure 10: Screenshot of metadata of a shapefile displayed in ArcCatalog 


    Temporal Dimension in GIS 

    Spatial features may change over time in terms of space and content. The changes could be geometrical (change in the geometry of features), positional (change in the position of features), or a change in the attributes of the features. When changes in the locations of a group of objects are observed together, the changes in the spatial distribution pattern of the objects can be deciphered.

    One may analyze temporal data sets to monitor the changes that happen over time. Many things change with time, but monitoring must be planned prudently because it involves a large investment of resources. The monitoring intervals must be fixed in a manner that captures the change in the spatial phenomena while remaining efficient and viable.

    The effect of urbanization on the land use of an area can be monitored by a change detection analysis that makes use of temporal satellite images and GIS to determine the nature, extent and rate of land cover change and fragmentation over time and space. Temporal GIS studies are quite popular in the field of forest conservation and management. One study described the monitoring of deforestation in a land resource inventory project in Nepal, where within an interval of 30 years (1950-1980) 50% of the forest land was lost to shrub and agriculture. Similar temporal studies are carried out for various sectors of natural resources management, such as biodiversity, water and land/soil, where, considering future needs, balancing the consumption and availability of natural resources is of utmost importance.

    References

    http://www.ian-ko.com/resources/triangulated_irregular_network.htm viewed on 19 November 2011

    Burrough, P. A. & McDonnell, R. A. 1998, Principles of geographical information systems, Oxford University Press, UK.

    Goodchild, M. F., Longley, P. A., Maguire, D. J. & Rhind, D. W. 2001, Geographic information systems and science, John Wiley & Sons Ltd., England.

    Lo, C. P. & Yeung, A. 2009, Concepts and techniques of Geographic Information Systems, PHI Learning Private Limited, New Delhi.