Geographic Information Systems
PhD Teaching Assistant Chiriac Cosmin
Table of Contents
• Geographical Information Systems (GIS)
▫ History of GIS
▫ Definition
▫ Where are they used (application vs. research)
• Data in GIS
▫ Types of data
▫ Data Accuracy and Quality
• Organizing data in GIS
▫ Layering the data
▫ Topologies and relations in GIS
▫ Spatial database alternatives
• History of GIS
▫ Pre-computer era GIS
▫ Pioneers and initial developments
▫ Early GIS example (CGIS)
• Definition of GIS
▫ Science or tool?
▫ Definitions
▫ The common ground
Geographical Information Systems (GIS)
History of GIS
• Computer-based GIS: in use since the late 1960s [1]
• Manual predecessors: perhaps 100 years earlier [1]
History of GIS
• Important initial developments [1, 3]:
▫ William Garrison and his students, with Edgar Horwood – University of Washington, Department of Geography
▫ Howard Fisher – Harvard Laboratory for Computer Graphics (and Spatial Analysis) – SYMAP
▫ Roger Tomlinson – Canada Geographic Information System (CGIS)
▫ Jack Dangermond – Environmental Systems Research Institute (ESRI)
▫ David P. Bickmore – Experimental Cartography Unit (ECU)
History of GIS
• 1966 – one of the first GIS, and probably the first:
▫ Canada Geographic Information System (CGIS)
• Its purpose:
▫ Canada Land Inventory
• How it was done (roughly):
▫ aerial photography arranged in a mosaic
▫ scanning of the aerial photographs
▫ digitization (vectorization) of the aerial photographs
▫ classification of the digitized data
Source: http://www.elsevier.com/connect/how-aerial-photography-altered-the-way-we-perceive-environmental-change
History of GIS
• Developments in the field of GIS [1, 3]:
▫ topological coding of boundaries (link/node)
▫ internal structuring of data (indexing scheme)
▫ GIS operations of overlay and area measurement
▫ an experimental scanner had to be built for map input (drum scanner)
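The link/node coding of boundaries listed above can be sketched as follows. This is a minimal illustration and not the CGIS implementation: the `Link` record, the node and polygon identifiers, and the helpers `neighbours` and `boundary` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One boundary segment in a link/node structure (hypothetical layout)."""
    link_id: str
    from_node: str
    to_node: str
    left_poly: str   # polygon on the left when travelling from_node -> to_node
    right_poly: str  # polygon on the right


# Two adjacent areas A (west) and B (east) sharing one boundary between
# nodes n1 (south) and n2 (north). "OUT" stands for the outside area.
LINKS = [
    Link("a_outer", "n2", "n1", "A", "OUT"),
    Link("shared", "n1", "n2", "A", "B"),
    Link("b_outer", "n1", "n2", "B", "OUT"),
]


def neighbours(poly, links):
    """Polygons that share at least one link with `poly`."""
    found = set()
    for link in links:
        if link.left_poly == poly:
            found.add(link.right_poly)
        elif link.right_poly == poly:
            found.add(link.left_poly)
    return found


def boundary(poly, links):
    """Identifiers of the links that bound `poly`, in storage order."""
    return [l.link_id for l in links if poly in (l.left_poly, l.right_poly)]


print(neighbours("A", LINKS))
print(boundary("B", LINKS))
```

Because the shared boundary is stored once and labelled with both polygons, adjacency can be read from the table without comparing coordinates.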
Definition of GIS
• GIS: acronym for Geographical Information System
• Debate over whether it is a science or a tool
▫ GIScience, Geomatics, Geoinformatics
[Figure: GIS viewed as tool, toolmaking, and science]
Definition of GIS
There are many definitions provided [6]:
1. a system for capturing, storing, checking, manipulating, analyzing, and displaying data which are spatially referenced to the Earth
2. any manual or computer-based set of procedures used to store and manipulate geographically referenced data
3. an institutional entity, reflecting an organizational structure that integrates technology with a database, expertise and continuing financial support over time
4. an information technology which stores, analyses and displays both spatial and non-spatial data
5. a special case of information systems where the database consists of observations on spatially distributed features, activities, or events, which are definable in space as points, lines, or areas. A GIS manipulates data about these points, lines, and areas to retrieve data for ad hoc queries and analyses
6. a database system in which most of the data are spatially indexed, and upon which a set of procedures operated in order to answer queries about spatial entities in the database
7. an automated set of functions that provides professionals with advanced capabilities for the storage, retrieval, manipulation, and display of geographically located data
8. a powerful set of tools for collecting, storing, retrieving at will, transforming and displaying spatial data from the real world
9. a decision support system involving the integration of spatially referenced data in a problem-solving environment
10. a system with advanced geo-modelling capabilities
11. a form of MIS [Management Information System] that allows map display of the general information
Definition of GIS
• What is the common ground:
▫ it processes geographical information
▫ the focus is on analysis capability
• Types of data
▫ Spatial (Raster/Vector)
▫ Attribute
• Data Accuracy and Quality
▫ Accuracy
▫ Quality
▫ Error (inherent error/operational error)
Data in GIS
Types of data
• Images can be used as:
▫ basemaps: aerial photographs, satellite images, topographic maps
▫ attribute data (non-spatial), linked to a certain location
Types of data
• Raster [7, 8]
▫ uses a grid-cell data structure
▫ the size of the cells defines the accuracy and resolution of a raster image
▫ cells usually contain a single discrete value
▫ a raster is effectively a matrix, so sophisticated mathematical modelling processes are possible
▫ rasters are often obtained from vector-to-raster conversions
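The grid-cell structure described above can be sketched as a small class. This is an illustrative sketch only: the `Raster` class, its field names, the bottom-left origin convention, and the sample values are assumptions for the example, not a format used by any particular GIS.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Raster:
    origin_x: float          # x of the bottom-left corner of the grid
    origin_y: float          # y of the bottom-left corner of the grid
    cell_size: float         # edge length of one square cell, in map units
    cells: List[List[int]]   # matrix of values; cells[0] is the top row

    def cell_at(self, x: float, y: float) -> Tuple[int, int]:
        """Return (row, col) of the cell containing map coordinate (x, y)."""
        n_rows = len(self.cells)
        col = int((x - self.origin_x) // self.cell_size)
        row_from_bottom = int((y - self.origin_y) // self.cell_size)
        row = n_rows - 1 - row_from_bottom
        if not (0 <= row < n_rows and 0 <= col < len(self.cells[0])):
            raise ValueError("coordinate outside raster extent")
        return row, col

    def value_at(self, x: float, y: float) -> int:
        row, col = self.cell_at(x, y)
        return self.cells[row][col]

    def cell_center(self, row: int, col: int) -> Tuple[float, float]:
        """Map coordinate of a cell centre, derived from origin and cell size."""
        n_rows = len(self.cells)
        x = self.origin_x + (col + 0.5) * self.cell_size
        y = self.origin_y + (n_rows - 1 - row + 0.5) * self.cell_size
        return x, y


# Hypothetical 3 x 3 land-cover grid with 10 m cells.
grid = Raster(1000.0, 2000.0, 10.0, [[1, 1, 2],
                                     [1, 3, 2],
                                     [3, 3, 2]])
print(grid.value_at(1005.0, 2005.0))  # bottom-left cell
print(grid.cell_center(0, 2))         # top-right cell
```

Only the origin and the cell size are stored; every other coordinate is computed from the row and column index.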
Types of data
• Advantages:
▫ Except for the origin point (e.g. the bottom-left corner), no geographic coordinates are stored.
▫ Due to the nature of the data storage technique, data analysis is usually easy to program and quick to perform.
▫ The inherent nature of raster maps, e.g. one-attribute maps, is ideally suited for mathematical modeling and quantitative analysis.
▫ Appropriate for both discrete and continuous data.
▫ Grid-cell systems are very compatible with raster-based output devices, e.g. electrostatic plotters, graphic terminals.
• Disadvantages:
▫ The cell size determines the resolution at which the data is represented.
▫ Linear features are hard to represent at certain cell resolutions.
▫ Raster maps inherently reflect only one attribute or characteristic for an area.
▫ Since most input data is in vector form, data must undergo vector-to-raster conversion. Besides increased processing requirements, this may introduce data integrity concerns due to generalization.
▫ Most output maps from grid-cell systems do not conform to high-quality graphic needs.
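The point that one-attribute raster maps lend themselves to mathematical modelling can be illustrated with a cell-by-cell overlay of two layers. This is a sketch under stated assumptions: the function `local_overlay`, the two sample layers, the land-use codes, and the suitability rule are all invented for the example.

```python
def local_overlay(a, b, func):
    """Combine two equally sized rasters cell by cell with `func`."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("rasters must have identical dimensions")
    return [[func(va, vb) for va, vb in zip(ra, rb)] for ra, rb in zip(a, b)]


# Each raster carries exactly one attribute (hypothetical values).
slope_deg = [[2, 5, 12],
             [3, 8, 20],
             [1, 4, 15]]
land_use = [[1, 1, 2],    # 1 = farmland, 2 = forest, 3 = built-up
            [1, 2, 2],
            [3, 3, 2]]

# Hypothetical rule: a cell is suitable if it is farmland with slope < 10 degrees.
suitable = local_overlay(slope_deg, land_use,
                         lambda s, u: int(s < 10 and u == 1))
for row in suitable:
    print(row)
```

Since both layers share the same grid, the overlay needs no geometric computation at all, which is why such analysis is easy to program and quick to perform.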
Types of data
• Vector [10]:
▫ models: spaghetti (series of points, no connection); primitive instancing (uses symbols located at x,y coordinates); entity-by-entity (geometrical objects: points, lines, polygons)
Types of data
• Vector [10]:
▫ point or vertex: a pair of x,y coordinates
▫ line: a string, an ordered sequence of connected, non-branching line segments
▫ polygon (geometric ring): a sequence of non-intersecting strings that close
▫ one polygon/area: represented by one outer ring and zero or more inner rings
[Figure: a line string defined by the vertices (x1, y1), (x2, y2), (x3, y3), …]
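The point, line, and polygon definitions above can be sketched with plain coordinate lists. This is a minimal illustration: the type alias `Point`, the `Polygon` class, and the helper names are assumptions for the example, and the area is computed with the standard shoelace formula.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # a vertex: one pair of x,y coordinates


def line_length(line: List[Point]) -> float:
    """Length of a line string: the sum of its consecutive segments."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(line, line[1:]))


def ring_area(ring: List[Point]) -> float:
    """Unsigned area enclosed by a ring (shoelace formula)."""
    total = 0.0
    # Pair every vertex with the next one, wrapping around to close the ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


@dataclass
class Polygon:
    outer: List[Point]                                      # one outer ring
    inners: List[List[Point]] = field(default_factory=list)  # 0 or more holes

    def area(self) -> float:
        return ring_area(self.outer) - sum(ring_area(r) for r in self.inners)


road = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
parcel = Polygon(
    outer=[(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)],
    inners=[[(4.0, 4.0), (6.0, 4.0), (6.0, 6.0), (4.0, 6.0)]],
)
print(line_length(road))  # 5 + 6 = 11.0
print(parcel.area())      # 100 - 4 = 96.0
```

Every vertex is stored explicitly, so the geometry keeps its original resolution, at the cost of more storage than a grid.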
Types of data – Vector Data Model
• Advantages:
▫ Data can be represented at its original resolution and form without generalization.
▫ Graphic output is usually more aesthetically pleasing (traditional cartographic representation).
▫ Since most data, e.g. hard copy maps, is in vector form, no data conversion is required.
▫ Accurate geographic location of data is maintained.
▫ Allows for efficient encoding of topology, and as a result more efficient operations that require topological information, e.g. proximity, network analysis.
• Disadvantages:
▫ The location of each vertex needs to be stored explicitly.
▫ For effective analysis, vector data must be converted into a topological structure.
▫ Algorithms for manipulative and analysis functions are complex and may be processing intensive.
▫ Data structures are complex.
▫ Continuous data, such as elevation data, is not effectively represented in vector form.
▫ Spatial analysis and filtering within polygons is impossible.
Attribute data
• Internal and external attribute tables
• External attribute data models:
▫ Tabular (outdated in GIS, limited indexing possibilities)
▫ Hierarchical (tree structure; many children, one parent)
▫ Network (tree structure; a child can have more than one parent)
▫ Relational (RDBMS, indexing, most used in GIS)
▫ Object-Relational Databases (Geodatabase from ESRI)
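The relational model for external attribute tables can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions: the table names, column names, and sample records are invented, and the geometry is reduced to a single x,y pair to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Spatial side: one record per feature, identified by a key.
    CREATE TABLE parcels (
        parcel_id INTEGER PRIMARY KEY,
        x REAL,
        y REAL
    );
    -- External attribute table, linked to the features through the key.
    CREATE TABLE parcel_attributes (
        parcel_id INTEGER REFERENCES parcels(parcel_id),
        owner TEXT,
        land_use TEXT
    );
    -- Indexing the key is what makes the join cheap on large tables.
    CREATE INDEX idx_attr_parcel ON parcel_attributes(parcel_id);

    INSERT INTO parcels VALUES (1, 10.0, 20.0), (2, 15.0, 25.0);
    INSERT INTO parcel_attributes VALUES
        (1, 'Owner A', 'farmland'),
        (2, 'Owner B', 'forest');
""")

# Attribute query that returns the location of the matching features.
rows = conn.execute("""
    SELECT p.parcel_id, p.x, p.y, a.land_use
    FROM parcels AS p
    JOIN parcel_attributes AS a ON a.parcel_id = p.parcel_id
    WHERE a.land_use = 'forest'
""").fetchall()
print(rows)
```

The shared key keeps attributes outside the geometry table, so either side can be updated or indexed independently.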
Data Accuracy and Quality
• Accuracy is the closeness of results of observations to the true values or values accepted as being true [11].
▫ positional: absolute, relative
▫ attribute: identifying geographic features correctly is extremely important; in reality there is less homogeneity than the areas on maps suggest
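The difference between absolute and relative positional accuracy can be sketched numerically. This is an illustrative assumption about how the two might be measured (root-mean-square error of coordinates versus root-mean-square error of inter-point distances); the function names and sample coordinates are invented for the example.

```python
import math
from itertools import combinations


def absolute_rmse(measured, truth):
    """RMSE of the offsets between measured and true positions."""
    squared = [(mx - tx) ** 2 + (my - ty) ** 2
               for (mx, my), (tx, ty) in zip(measured, truth)]
    return math.sqrt(sum(squared) / len(squared))


def relative_rmse(measured, truth):
    """RMSE of the differences in distance between every pair of points."""
    squared = []
    for i, j in combinations(range(len(truth)), 2):
        d_measured = math.dist(measured[i], measured[j])
        d_truth = math.dist(truth[i], truth[j])
        squared.append((d_measured - d_truth) ** 2)
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical control points, and a data set shifted uniformly by (3, 4) m.
truth = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
shifted = [(x + 3.0, y + 4.0) for x, y in truth]

print(absolute_rmse(shifted, truth))  # every point is 5 m off
print(relative_rmse(shifted, truth))  # distances between points are unchanged
```

A uniformly shifted data set is a simple case where relative accuracy is perfect although absolute accuracy is poor.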
Data Accuracy and Quality
• Quality can simply be defined as the fitness for use of a specific data set [11].
• According to the US Spatial Data Transfer Standard, quality is defined by:
▫ Lineage
▫ Positional Accuracy
▫ Attribute Accuracy
▫ Logical Consistency
▫ Completeness
Data Accuracy and Quality
• Error [11]
▫ inherent: the error present in source documents and data (this includes scale and the original data source)
▫ operational: the amount of error produced through the data capture and manipulation functions
Data Accuracy and Quality
• Inherent errors
▫ due to map scale: depending on the scale of a map, a feature could lie anywhere within a certain buffer zone (1:20,000 – +/-20 m)
▫ due to errors in the original data
▫ eliminating error is costly and time-consuming
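The scale example above can be checked with simple arithmetic: +/-20 m on the ground at 1:20,000 corresponds to 20 m / 20,000 = 1 mm on the map sheet. The sketch below assumes that the positional uncertainty on paper stays constant across scales, which is an assumption of the example and not a statement from the slides; the function name is hypothetical.

```python
def ground_error_m(map_error_mm: float, scale_denominator: int) -> float:
    """Ground distance (in metres) covered by a given distance on the map."""
    return map_error_mm * scale_denominator / 1000.0


# A 1 mm uncertainty on paper at several common map scales.
for denominator in (5_000, 20_000, 50_000):
    print(f"1:{denominator}: +/-{ground_error_m(1.0, denominator)} m")
```

The same line width or digitizing slip therefore costs more ground accuracy the smaller the map scale is.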
Data Accuracy and Quality
• Operational errors
▫ mislabelling of areas on thematic maps;
▫ misplacement of horizontal (positional) boundaries;
▫ human error in digitizing;
▫ classification error;
▫ GIS algorithm inaccuracies; and
▫ human bias.
• Layering the data
• Topologies and relations in GIS
• Spatial database alternatives
Organizing data in GIS
Layering the data
Source: http://www.elsevier.com/connect/how-aerial-photography-altered-the-way-we-perceive-environmental-change
• Layers of GIS data can be organized in different ways
• The organization depends on:
▫ the purpose of the system,
▫ geographical characteristics, and
▫ the scale of the original and final representation
Layering the data
• Thematic layers
▫ Vector
▫ Raster
▫ Image
▫ Attribute
• All the types of layers contained in a thematic layer can hold data for the same feature, represented at different scales
Layering the data
• Scale
▫ a thematic layer can contain vector layers that are visible only at certain scales, due to physical constraints or due to relevancy constraints
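Scale-dependent visibility can be sketched as a scale range attached to each vector layer. This is a hypothetical illustration: the `ScaledLayer` class, its field names, the layer names, and the scale thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ScaledLayer:
    name: str
    largest_scale: int   # scale denominator when zoomed in furthest (1 = 1:1)
    smallest_scale: int  # scale denominator when zoomed out furthest

    def visible_at(self, scale_denominator: int) -> bool:
        """True if the layer should be drawn at 1:scale_denominator."""
        return self.largest_scale <= scale_denominator <= self.smallest_scale


# One thematic layer ("transport and built-up area") holding several
# vector layers, each relevant over a different range of scales.
LAYERS = [
    ScaledLayer("motorways", 1, 5_000_000),
    ScaledLayer("local roads", 1, 100_000),
    ScaledLayer("buildings", 1, 10_000),
]


def visible_layers(layers, scale_denominator):
    return [layer.name for layer in layers
            if layer.visible_at(scale_denominator)]


print(visible_layers(LAYERS, 5_000))      # detailed view: everything
print(visible_layers(LAYERS, 50_000))     # buildings are dropped
print(visible_layers(LAYERS, 1_000_000))  # overview: motorways only
```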
Topologies and relations in GIS
• Topology
▫ the mathematical definition of feature storage utilized in vector GIS
▫ it has specific connectivity requirements for point, linear, and polygonal features
Topologies and relations in GIS
• Polygons:
▫ must be covered by …
▫ must not have gaps
▫ must not overlap…
• Lines:
▫ must not intersect
▫ must not self-intersect
▫ must not self-overlap
▫ must not have pseudonodes
• Points:
▫ must be covered by boundary
▫ must be covered by line
▫ must be properly inside polygons
▫ must be coincident with …
▫ must be disjoint…
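One of the line rules above, "must not have pseudonodes", can be checked by counting how many line ends meet at each node. The sketch below takes a pseudonode to be an endpoint shared by exactly two line ends, which is a simplifying assumption of the example; the function names and sample lines are hypothetical, and endpoints are compared by exact coordinate equality.

```python
from collections import Counter


def node_degrees(lines):
    """Count how many line ends touch each endpoint coordinate."""
    degrees = Counter()
    for line in lines:
        degrees[line[0]] += 1   # start node
        degrees[line[-1]] += 1  # end node
    return degrees


def find_pseudonodes(lines):
    """Endpoints where exactly two line ends meet (candidate rule violations)."""
    return sorted(node for node, degree in node_degrees(lines).items()
                  if degree == 2)


# Hypothetical road segments: the first two could be merged into one line,
# so their shared endpoint (5, 0) is a pseudonode. (10, 0) is a true junction.
roads = [
    [(0, 0), (5, 0)],
    [(5, 0), (10, 0)],
    [(10, 0), (10, 5)],
    [(10, 0), (15, 0)],
]
print(find_pseudonodes(roads))
```

A production check would also need a snapping tolerance, since digitized endpoints rarely match to the last decimal.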
Topologies and relations in GIS
• Subtypes
▫ Localities: urban/rural
▫ Forests: coniferous, deciduous, mixed
▫ Roads: European, national, regional, county, local/communal
Topologies and relations in GIS
• Networks
▫ they define the connectivity and flows of graphical entities
▫ they usually contain points and lines
▫ they are specific to: transportation networks, energy distribution networks, communication networks, hydrographic networks
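A network of points (nodes) and lines (edges) can be sketched as a weighted adjacency table, with connectivity used to find the cheapest route. This is an illustration only: the node names and edge lengths are invented, and Dijkstra's algorithm is used here as one common choice for routing, not as the method of any specific GIS.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; returns (total_cost, [nodes]) or None."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue,
                               (cost + weight, neighbour, path + [neighbour]))
    return None  # goal is not connected to start


# Hypothetical road network: junctions A-D, edge weights in kilometres.
ROADS = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1, "D": 5},
    "C": {"A": 2, "B": 1, "D": 8},
    "D": {"B": 5, "C": 8},
}
print(shortest_path(ROADS, "A", "D"))
```

The same structure fits the other network types listed above if the weights are read as line losses, latency, or flow length.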
Topologies and relations in GIS
• topological rules are helpful in the editing process because they highlight operational errors
• after errors are eliminated, internal topology (mathematical relations and indexing) is built
• any further editing implies the rebuilding of the internal topology
• data processing is faster once internal topology is built
Conclusions
• How is GIS used
▫ transaction processing systems (TPS)
▫ decision-support systems (DSS)
▫ research: finding representations and processing algorithms useful in DSS and TPS; implementing enhancements in data input, processing, and algorithms; opening the way for new implementations of GIS; building industry-specific data models
Spatial database alternatives
• Esri has a number of both single-user and multiuser geodatabases.
• Boeing's Spatial Query Server (Sybase ASE)
• Smallworld VMDS, the native GE Smallworld GIS database
• SpatiaLite extends SQLite with spatial datatypes, functions, and utilities.
• IBM DB2 Spatial Extender can be used to enable any edition of DB2, including the free DB2 Express-C, with support for spatial types
• Oracle Spatial
• Microsoft SQL Server has supported spatial types since version 2008.
• PostgreSQL DBMS (database management system) uses the spatial extension PostGIS to implement the standardized datatype geometry and corresponding functions.
• Teradata Geospatial includes 2D spatial functionality, OGC compliant, in its data warehouse system.
• MonetDB/GIS extension for MonetDB adds OGC Simple Features to the relational column-store database.
• Linter SQL Server supports spatial types and spatial functions according to the OpenGIS specifications.
• MySQL DBMS implements the datatype geometry plus some spatial functions that have been implemented according to the OpenGIS specifications. However, in MySQL version 5.5 and earlier, functions that test spatial relationships are limited to working with minimum bounding rectangles rather than the actual geometries. MySQL versions earlier than 5.0.16 only supported spatial data in MyISAM tables. As of MySQL 5.0.16, InnoDB, NDB, BDB, and ARCHIVE also support spatial features.
• Neo4j - Graph database that can build 1D and 2D indexes as Btree, Quadtree and Hilbert curve directly in the graph
• AllegroGraph: a graph database that provides a mechanism for efficient storage and retrieval of two-dimensional geospatial coordinates for Resource Description Framework data. It includes an extension syntax for SPARQL queries.
• MongoDB, RavenDB, and RethinkDB support geospatial indexes in 2D
• SpaceBase is a real-time spatial database.
• CouchDB is a document-based database system that can be spatially enabled by a plugin called GeoCouch.
• CartoDB is a cloud based geospatial database on top of PostgreSQL with PostGIS.
• StormDB is an upcoming cloud based database on top of PostgreSQL with geospatial capabilities.
• AsterixDB is an open source Big Data Management System with native geospatial capabilities.
• SpatialDB by MineRP is the world's first open standards (OGC) spatial database with spatial type extensions for the Mining Industry.
• H2 supports geometry types and spatial indices as of version 1.3.173 (2013-07-28). An extension called H2GIS, available on Maven Central, gives full OGC Simple Features support.
• GeoMesa is a cloud-based spatio-temporal database built on top of Apache Accumulo and Apache Hadoop. GeoMesa offers full OGC Simple Features support and a GeoServer plugin.
• Ingres 10S and 10.2 include native comprehensive spatial support. Ingres includes the Geospatial Data Abstraction Library cross-platform spatial data translator.
• Tarantool supports geospatial queries with RTREE index.
• SAP HANA supports geospatial processing as of SPS08.
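The MySQL note in the list above distinguishes tests on minimum bounding rectangles (MBR) from tests on the actual geometries. The difference can be sketched in plain Python, independent of any database: the function names and the sample triangle are assumptions for the example, and the exact test uses the standard ray-casting method.

```python
def mbr(ring):
    """Minimum bounding rectangle of a ring: (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_mbr(point, ring):
    """Coarse test: is the point inside the ring's bounding rectangle?"""
    x, y = point
    min_x, min_y, max_x, max_y = mbr(ring)
    return min_x <= x <= max_x and min_y <= y <= max_y


def point_in_polygon(point, ring):
    """Exact test by ray casting: count edge crossings to the right of the point."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
probe = (8.0, 8.0)
print(point_in_mbr(probe, triangle))      # True: inside the rectangle
print(point_in_polygon(probe, triangle))  # False: outside the triangle itself
```

An MBR-only test can therefore report false positives, which is why it is usually applied as a fast first filter before an exact geometry test.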
Bibliography
1. Coppock, J. Terry, and David W. Rhind. "The history of GIS." Geographical information systems: Principles and applications 1.1 (1991): 21-43.
2. Michael F. Goodchild. (1997) What is Geographic Information Science?, NCGIA Core Curriculum in GIScience, http://www.ncgia.ucsb.edu/giscc/units/u002/u002.html, posted October 7, 1997
3. http://ibis.geog.ubc.ca/courses/klink/gis.notes/ncgia/u23.html#SEC23.3
4. http://www.ncgia.ucsb.edu/giscc/units/u002/u002.html
5. http://geospatialworld.net/magazine/MArticleView.aspx?aid=19111&Itemid=262
6. Maguire, David J. "An overview and definition of GIS." Geographical Information Systems: principles and applications 1 (1991): 9-20.
7. http://www.gsd.harvard.edu/gis/manual/dem/
8. http://webhelp.esri.com/arcgisserver/9.3.1/java/index.htm#geodatabases/raster_basics.htm
9. http://blog.safe.com/2007/12/fme-evangelism-weekly-issue-4/
10. Theobald, David M. "Topology revisited: representing spatial relations." International Journal of Geographical Information Science 15.8 (2001): 689-705.