GIS and Databases Basics Dr. Tarendra Lakhankar
• Basic geographic concepts
  • Introduction to GIS, coordinate system, projection, datum
• Data: Acquisition, Input and Management
  • Data model: vector vs. raster
  • Data source: map, attribute data (geocoding), GPS, remote sensing
  • Data input: digitizing
  • Data quality and metadata
  • Data management: database
• Analysis
Where are we now?
2
Context of what we are learning
Satellite Images
Aerial Photographs
Maps
Digitizing
GPS (later)
Non‐spatial data (Attribute Data)
Spatial Database
Geo‐Reference
Join, Relate, Geocoding
3
• a System ‐ a group of connected entities and activities
• an Information System ‐ a set of procedures, executed on raw data, to produce information for decision making
• a Geographic Information System ‐ an information system that uses geographically referenced data
GIS Definition 2: Break Down Words
4
• Introduction to Database Management System (DBMS)
  • Recognize four database types
• Relational Database Basics
  • Basic understanding of database theory
• GeoDatabase Overview
• Demo:
  • Common GIS database operation
  • Join/Relate
Today’s Outline
5
Evolution of GIS Environments
6
Application Programming Interface (API)
Evolution of GIS Environments
7
Building Information Modeling (BIM)
• Flat Files
  • Flat files are easier to understand
  • Difficult to manage and manipulate
  • Large file size
• Database
  • Data is organized or structured using a database model
  • Reduces data redundancy
  • Data integrity is improved
  • Can be “queried”: many databases use the same language, SQL (Structured Query Language), for formulating queries.
Flat File vs. Database
DATABASE = Data file(s) + data organization + processing ready
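A minimal sketch of the flat file vs. database contrast, assuming Python's built-in csv and sqlite3 modules. The sample rows are modeled on the trail table used in these slides; the table and column names are illustrative assumptions.

```python
import csv
import io
import sqlite3

# Sample trail records, stored first as a flat file (CSV text).
FLAT_FILE = """forest,trail,feature
Nantahala,Bryson's Knob,Vista
Cherokee,Slickrock Falls,Ogrth
Pisgah,Chimney Rock,Wlife
"""

# Flat file: every question means reading and filtering all rows by hand.
rows = list(csv.DictReader(io.StringIO(FLAT_FILE)))
flat_result = [r["trail"] for r in rows if r["forest"] == "Cherokee"]

# Database: load the same rows into a table, then ask the question in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trails (forest TEXT, trail TEXT, feature TEXT)")
conn.executemany(
    "INSERT INTO trails VALUES (:forest, :trail, :feature)", rows
)
db_result = [
    trail for (trail,) in conn.execute(
        "SELECT trail FROM trails WHERE forest = ?", ("Cherokee",)
    )
]

print(flat_result, db_result)
```

Both approaches return the same answer here; the difference shows up as data grows, because the database keeps the structure and the query language in one place.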
8
• Goal for any DBMS: efficient searching and linking of tabular data.
• GIS DBMS Goal: efficient manipulation (including search and linking) of spatial objects (points, lines, polygons, polylines), relationships between objects and tabular data (i.e., topology, attributes).
Database Management Systems
Data Software Hardware
Database Management System (DBMS)
9
• Field: One item of information per object (column)
• Record: Information items about one object (row)
Forest      Trail            Feature
Nantahala   Bryson’s Knob    Vista
Cherokee    Slickrock Falls  Ogrth
Pisgah      Chimney Rock     Wlife
Field vs. Column; Record vs. Row
Typically you operate on a field (column) and select records (rows). When a row is selected, the corresponding map object is highlighted.
10
• There are four basic database structures:
  • Traditional
  • Hierarchical
  • Network
  • Relational
• Recent development
  • Object Oriented (O‐O)
• The relational database is the most widely used.
Ways of Organizing Information
11
Evolution of DBMS Technology
File System
Hierarchical DBMS
Network DBMS
Relational DBMS
Object‐Oriented System (OODBMS)
Object‐Relational DBMS (ORDBMS)
12
Example of Data Organization
13
Example of Data Organization
14
The Basics of Relational Databases
15
• Database includes multiple tables
• Tables are joined by relationships
• Relational model is grounded in mathematics: relational algebra defines the mathematical rules by which tables are manipulated.
• Any kind of attribute search (lateral, vertical) is possible.
• Examples of relational database programs
  • Microsoft Access, Microsoft® SQL Server™, Oracle, DB2, FoxPro, MySQL, PostgreSQL
Relational Database
16
• Eliminate duplicate information
• Assist in querying data
• Simpler to manipulate data
• Reduce disk space
• Relational model has been most successful within GIS (and within the database world in general)
Why Use a Relational Database?
17
• Key Fields (2 keys)
• Relationships (3 relationships)
• Referential Integrity
• Database Normalization
Relational Databases: Terminology
18
• Keys – used to create uniqueness and link tables together
  • Primary Key: ensures uniqueness, eliminates redundancy
  • Foreign Key: links tables, establishes relationships between tables
Key fields
19
Primary Key
• Primary keys uniquely identify each record in a table.
• The primary key of one table becomes the foreign key in another table.
20
Foreign Key
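The two key types can be sketched with Python's built-in sqlite3 module. This is an illustrative example only: the forest and trail tables and their column names are assumptions, not a schema from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forest (
    forest_id   INTEGER PRIMARY KEY,   -- primary key: unique per forest
    forest_name TEXT NOT NULL
);
CREATE TABLE trail (
    trail_id   INTEGER PRIMARY KEY,
    trail_name TEXT NOT NULL,
    forest_id  INTEGER REFERENCES forest(forest_id)  -- foreign key
);
INSERT INTO forest VALUES (1, 'Nantahala'), (2, 'Cherokee');
INSERT INTO trail VALUES (10, 'Bryson''s Knob', 1), (11, 'Slickrock Falls', 2);
""")

# The primary key rejects a second forest with an id that is already used.
try:
    conn.execute("INSERT INTO forest VALUES (1, 'Pisgah')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The foreign key lets us look up the forest each trail belongs to.
pairs = conn.execute(
    "SELECT t.trail_name, f.forest_name "
    "FROM trail AS t JOIN forest AS f ON t.forest_id = f.forest_id "
    "ORDER BY t.trail_id"
).fetchall()

print(duplicate_rejected, pairs)
```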
21
Data relationships: Cardinalities
One‐to‐Many (1:M): ENG Courses (ENGRG59910) and Students
Many students attend this class
One‐to‐One (1:1): Students and CCNY ID
Each CCNY student has a unique ID number (i.e., functional redundancy)
Many‐to‐Many (M:M): Students and Classes
Many students are enrolled in many classes
Each student is taking many classes
Each class has many students
22
One‐to‐one Relationships
• Only one matching record
• Uses primary key for both tables
• Use to limit access or isolate information
23
One‐to‐Many Relationships
• Most common type of relationship
• Tables are related through primary and foreign keys
24
Many‐to‐Many Relationships
• Not directly supported between tables
• Use a junction table to relate
• One order, many products
• One product, many orders
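The junction table idea can be sketched as follows, assuming Python's built-in sqlite3 module. The orders, products, and order_items tables and the helper functions are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders   (order_id   INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
-- Junction table: one row per (order, product) pairing.
CREATE TABLE order_items (
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO orders   VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO products VALUES (100, 'Map'), (200, 'Compass');
INSERT INTO order_items VALUES (1, 100), (1, 200), (2, 100);
""")

def products_in_order(order_id):
    """One order, many products."""
    sql = ("SELECT p.name FROM order_items AS oi "
           "JOIN products AS p ON p.product_id = oi.product_id "
           "WHERE oi.order_id = ? ORDER BY p.name")
    return [name for (name,) in conn.execute(sql, (order_id,))]

def orders_with_product(product_id):
    """One product, many orders."""
    sql = ("SELECT order_id FROM order_items "
           "WHERE product_id = ? ORDER BY order_id")
    return [oid for (oid,) in conn.execute(sql, (product_id,))]

print(products_in_order(1), orders_with_product(100))
```

The many‐to‐many relationship is thus split into two one‐to‐many relationships, each ending at the junction table.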
25
Referential Integrity
PK
FK
• Maintains data accuracy
• Prevents orphan records
• Keeps relationships intact
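A small sketch of referential integrity in practice, assuming Python's built-in sqlite3 module. The parcel and building tables and the try_sql helper are hypothetical. Note that SQLite enforces foreign keys only after the foreign_keys pragma is switched on; other DBMSs may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE parcel (parcel_id INTEGER PRIMARY KEY);            -- PK side
CREATE TABLE building (
    building_id INTEGER PRIMARY KEY,
    parcel_id   INTEGER NOT NULL REFERENCES parcel(parcel_id)   -- FK side
);
INSERT INTO parcel VALUES (157);
INSERT INTO building VALUES (1, 157);
""")

def try_sql(sql):
    """Run one statement; report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# A building pointing at a parcel that does not exist would be an orphan.
orphan_insert_ok = try_sql("INSERT INTO building VALUES (2, 999)")
# Deleting a parcel that still has buildings would orphan them as well.
parent_delete_ok = try_sql("DELETE FROM parcel WHERE parcel_id = 157")
# A building on an existing parcel is accepted.
valid_insert_ok = try_sql("INSERT INTO building VALUES (3, 157)")

print(orphan_insert_ok, parent_delete_ok, valid_insert_ok)
```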
26
Referential Integrity
27
• Most GIS packages still use a hybrid solution: spatial data + attribute data (Arc+Info)
• The emergence of spatial databases is changing this. Many DBMSs now support spatial data: Oracle, DB2, MS SQL Server (commercial), and MySQL, PostgreSQL (open‐source)
Summary of Databases and GIS
Spatial Data
DBMS: Geometric
DBMS: Attribute
Geometric data
• Usually hierarchical
• Invisible to the user
Attribute data
• Almost entirely relational
• Manipulated by the user
28
• Arc/Info: the hybrid name (Arc for spatial data, Info for attribute data) reflects the history of ArcGIS.
• Without attribute data, spatial data will be of limited use.
Attribute Data and Spatial Data
29
• What is a geodatabase?
• Types of geodatabases
• Geodatabase objects
Geodatabase
30
What is a geodatabase?
A geodatabase (short for geographic database) is a physical store of geographic information (spatial, attribute, metadata, and relationships) inside a relational database management system (RDBMS).
31
• Personal Geodatabase for Microsoft Access
• File Geodatabase (new since V9.2)
• Workgroup Geodatabase (new since V9.2)
  • SQL Server Express
• Enterprise Geodatabase:
  • 5 supported DBMSs: DB2, Informix, Oracle, MS SQL Server, PostgreSQL
Geodatabase Types Since Ver9.2
Increasing size and functionality
http://resources.arcgis.com/en/help/main/10.1/index.html#//003n00000007000000
32
What does a Geodatabase look like?
33
What does a Geodatabase look like?
34
Personal GeoDatabase (Access)
35
Geodatabase (file‐based)
36
Geodatabase objects
• Basic objects:
  ‐ feature classes
  ‐ feature datasets
  ‐ nonspatial tables
• Complex objects building on the basic objects:
  ‐ topology
  ‐ relationship classes
  ‐ geometric networks
37
• A feature class is a collection of geographic features of the same type. Types include point, line, polygon, and annotation feature classes.
• Feature classes may exist independently in a geodatabase as stand‐alone feature classes, or you can group them into feature datasets.
Feature classes
The SouthAmerica geodatabase contains four stand‐alone feature classes: a point feature class of cities, a dimension feature class of distances between cities, a polygon feature class of countries, and an annotation feature class of country names.
Source: www.esri.com
38
Feature datasets
• A feature dataset is composed of feature classes that have been grouped together so they can participate in topological relationships with each other. All the feature classes in a feature dataset must share the same spatial reference (or coordinate system)
• Edits you make to one feature class may result in edits being made automatically to some or all of the other feature classes in the feature dataset
In the CityWater geodatabase, three point feature classes and one line feature class were grouped into the PublicWater feature dataset to create a geometric network called WaterNet.
Source: www.esri.com
39
• Two types of tables: feature class tables and nonspatial attribute tables.
• Both types of tables are created and managed in ArcCatalog and edited in ArcMap. Both display in the traditional row‐and‐column format. The difference is that feature class tables have one or more columns that store feature geometry.
• Nonspatial tables contain only attribute data (no feature geometry) and display in ArcCatalog with the table icon. They can exist in a geodatabase as stand‐alone tables, or they can be related to other tables or feature classes.
Tables
The cfcc_desc table in the SantaBarbara geodatabase contains attribute data for the Roads feature class (stored inside the Roads feature dataset).
Source: www.esri.com
40
• Feature: A geographic representation of a spatial object
• Features: One row in a table represents one feature
• Feature Classes: one table or more than one table
• Feature Dataset: a set of feature classes
Organizing Geographic Features
41
GeoDatabase Elements
Feature class
Geometric network
Annotation class
Geodatabase
Relationship class
Table
Feature data set
42
• In a GIS, spatial relationships among feature classes in a feature dataset are defined by topology. You can choose whether to create topology for features.
• The primary spatial relationships that you can model using topology are adjacency, coincidence, and connectivity
• There are three types of topology available in the geodatabase: geodatabase topology (over 20 topology rules), map topology, and geometric network topology. Each type of topology is created from feature classes that are stored within a feature dataset. A feature class can participate in only one topology at a time
Topology
43
Example of Topology in a Geodatabase
44
Geometric Networks
• In the real world, examples of networks abound: streams joining together to form larger streams, pipes carrying water to homes and businesses throughout a city, and power lines carrying electricity.
• In a geodatabase, you can model each of these real‐world networks with a geometric network. Starting with simple point and line feature classes, you use ArcCatalog to create a geometric network that will enable you to answer questions such as: Which streams will be affected by a proposed dam? Which areas will be affected by a water main repair? What is the quickest route between two points in the network?
45
Geometric Network example
Lateral
Service
Main
Feed
Valve
Feature Classes
Source: ESRI European User Conference
Geometric Network
46
Relationship Classes
• In a geodatabase, relationship classes provide a way to model real‐world relationships that exist between objects such as parcels and buildings or streams and water sample data. By using relationship classes, you can make your GIS database more accurately reflect the real world and facilitate data maintenance.
The relationships stored in a relationship class can be between two feature classes (such as buildings and parcels, top) or between a feature class and a nonspatial attribute table (such as streams and water quality sampling data, bottom).
47
• Provided by ESRI http://support.esri.com/index.cfm?fa=downloads.dataModels.gateway
• Goal: provide a practical template for implementing GIS projects
• Start to think about your final project now
• Great starting point for your GIS project
ESRI data models
48
• Address • Agriculture • Archiving • Atmospheric • Basemap • Biodiversity • Census‐Administrative Boundaries • Defense‐Intel • Energy Utilities • Energy Utilities ‐ MultiSpeak TM • Environmental Regulated Facilities • Forestry • Geology • GIS for the nation • Groundwater
• Health • Historic Preservation and Archaeology • Homeland Security • Hydro • International Hydrographic Organization (IHO) S‐57 for ENC • Land Parcels • Local Government • Marine • National Cadastre • Petroleum • Pipeline • Raster • Telecommunications • Transportation • Water Utilities
Industry‐specific Data models
From: http://support.esri.com/index.cfm?fa=downloads.dataModels.gateway
49
Data model: national GIS
You can download it directly from the ESRI website
50
• Definition
  • A query is the action or result of selecting a subset of records based on specific attribute values
• General Categories:
  • Attribute (Tabular)
  • Spatial
• Two main methods:
  • Boolean operators (AND, OR, NOT)
  • SQL operators (< > + = …)
• Structured Query Language (SQL)
  • The mathematical basis of relational databases led to a standard language for querying data (SQL) that uses simple mathematical operators
  • Relational databases allow the user to “nest” operations for complex queries
Queries
51
• Comparison Operators
  • = Equal
  • <> Not Equal
  • < Less Than
  • > Greater Than
  • <= Less Than or Equal
  • >= Greater Than or Equal
• Relational Operators
  • Union
  • Intersection
  • Difference
  • Product
Database Use: Structured Query Language (SQL)
Aggregate Functions: “Summarize”
• Sum of values for all rows for a given column
• Average of a given column
• Column Maximum
• Column Minimum
• Number of Rows (Count) that Satisfy a Condition
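The operators and aggregate functions listed on this slide can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the county table and its population figures are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE county (name TEXT, pop1990 INTEGER)")
conn.executemany(
    "INSERT INTO county VALUES (?, ?)",
    [("A", 50000), ("B", 120000), ("C", 180000), ("D", 250000)],
)

# Aggregate functions summarize a whole column in one row of output.
total, mean, largest, smallest = conn.execute(
    "SELECT SUM(pop1990), AVG(pop1990), MAX(pop1990), MIN(pop1990) FROM county"
).fetchone()

# COUNT with a WHERE clause: number of rows that satisfy a condition.
(n_large,) = conn.execute(
    "SELECT COUNT(*) FROM county WHERE pop1990 >= 120000"
).fetchone()

# UNION combines the results of two queries into one set of rows.
union = conn.execute(
    "SELECT name FROM county WHERE pop1990 < 100000 "
    "UNION SELECT name FROM county WHERE pop1990 > 200000 ORDER BY name"
).fetchall()

print(total, mean, largest, smallest, n_large, union)
```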
52
• Two Statements:
  • Females / Pop1990 < 0.55, then
  • Pop1990 > 100000 and Pop1990 < 200000
• Or, one statement:
  (Females/Pop1990 < 0.55) and ((Pop1990 > 100000) and (Pop1990 < 200000))
A Nested SQL Statement
Select all instances where females make up less than 55% of the 1990 population. Then, from those, identify all population centers whose 1990 census population is larger than 100,000 and less than 200,000.
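The nested statement can be sketched as a runnable query, assuming Python's built-in sqlite3 module. The places table and its rows are invented for illustration; only the Females and Pop1990 field names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, females INTEGER, pop1990 INTEGER)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?)",
    [
        ("Alpha", 60000, 120000),    # 50% female, 120,000 people: selected
        ("Beta", 90000, 150000),     # 60% female: fails the first condition
        ("Gamma", 20000, 50000),     # 40% female but too small
        ("Delta", 100000, 250000),   # 40% female but too large
    ],
)

# Multiplying by 1.0 forces real division; integer / integer truncates in SQLite.
one_statement = """
SELECT name FROM places
WHERE (females * 1.0 / pop1990 < 0.55)
  AND ((pop1990 > 100000) AND (pop1990 < 200000))
"""
selected = [name for (name,) in conn.execute(one_statement)]

# The same result in two steps: the inner query runs first, the outer refines it.
two_steps = """
SELECT name FROM (
    SELECT * FROM places WHERE females * 1.0 / pop1990 < 0.55
) WHERE pop1990 > 100000 AND pop1990 < 200000
"""
selected_two_steps = [name for (name,) in conn.execute(two_steps)]

print(selected, selected_two_steps)
```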
53
• Point Queries
  • What is at a particular location?
• Range Queries
  • What is in a particular area?
• Nearest Neighbor Queries
  • Where is the nearest object to a particular location?
• Spatial Join Queries
  • Where are the areas that have water supply and power supply?
• Spatial Aggregate Queries
  • Where is the most populated region?
Spatial Queries
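Three of these query types can be sketched in plain Python on point data. This is a simplified illustration, not how a GIS implements them: real systems use spatial indexes and handle lines and polygons too. The wells dictionary, the function names, and the tolerance parameter are all assumptions.

```python
import math

# Hypothetical point features: name -> (x, y) in arbitrary planar units.
wells = {"W1": (1.0, 1.0), "W2": (4.0, 5.0), "W3": (9.0, 2.0)}

def distance(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def point_query(points, x, y, tolerance=0.5):
    """Point query: what lies at (x, y), within a small tolerance?"""
    return sorted(n for n, p in points.items() if distance(p, (x, y)) <= tolerance)

def range_query(points, xmin, ymin, xmax, ymax):
    """Range query: which points fall inside the rectangle?"""
    return sorted(
        name for name, (x, y) in points.items()
        if xmin <= x <= xmax and ymin <= y <= ymax
    )

def nearest_neighbor(points, x, y):
    """Nearest neighbor query: which point is closest to (x, y)?"""
    return min(points, key=lambda n: distance(points[n], (x, y)))

print(point_query(wells, 1.2, 1.0))
print(range_query(wells, 0, 0, 5, 5))
print(nearest_neighbor(wells, 8, 3))
```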
54
• Queries – selection operations that produce data subsets
• Join and Relate – bringing data together (one table with non‐spatial attributes, one table with features)
Common Attribute Operations
55
• General Categories:
  • Tabular – based on some information within the attribute tables, e.g., a common field
  • Spatial – based on location: nearest, within or aggregate
• Geodatabase Relationship class
• Strategies
  • Join
  • Relate
Relations
County
Person
Age
Polygon_id = 157
Gpsid = 29
LC = Agriculture
56
Forest-ID  ForestName
1          Nantahala
2          Cherokee
Join vs. Relate
• Join
  • Appends fields from second table, with data for each record where a key field match is found (empty, otherwise)
  • For 1:1 or M:1 only
  • In 1:M or M:M, it stops with first hit (can’t add rows/records for additional relationships)
• Relate
  • Allows automatic access to a related table’s records; keeps tables physically separate
  • For 1:M or M:M
  • Doesn’t add records to layer’s table, so not limited by initial table’s size
Forest-ID  Trail_Name       Features  Trailhead
1          Bryson's Knob    Vista     X1, Y1
2          Slickrock Falls  Ogrth     X2, Y2
1          North Fork       Wfall     X3, Y3
2          Cade's Cave      Wlife     X4, Y4
1          Appalachian      Cmp       X5, Y5
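The difference between the two strategies can be sketched in plain Python using the forest and trail tables from this slide. The join and relate functions below are simplified stand‐ins written for illustration; they are not the ArcGIS tools, and the Features and Trailhead columns are omitted for brevity.

```python
# Tables from the slide, as Python data.
forests = [
    {"forest_id": 1, "forest_name": "Nantahala"},
    {"forest_id": 2, "forest_name": "Cherokee"},
]
trails = [
    {"forest_id": 1, "trail_name": "Bryson's Knob"},
    {"forest_id": 2, "trail_name": "Slickrock Falls"},
    {"forest_id": 1, "trail_name": "North Fork"},
    {"forest_id": 2, "trail_name": "Cade's Cave"},
    {"forest_id": 1, "trail_name": "Appalachian"},
]

def join(target, source, key):
    """Append fields from the first matching source row (None if no match)."""
    first_match = {}
    for row in source:
        first_match.setdefault(row[key], row)   # keep the first hit only
    extra_fields = [f for f in source[0] if f != key]
    joined = []
    for row in target:
        match = first_match.get(row[key], {})
        joined.append({**row, **{f: match.get(f) for f in extra_fields}})
    return joined

def relate(target, source, key):
    """Leave both tables untouched; map each target key to all related rows."""
    related = {row[key]: [] for row in target}
    for row in source:
        if row[key] in related:
            related[row[key]].append(row)
    return related

# M:1 direction (trails to forests): a join works, every trail gets its forest.
trails_joined = join(trails, forests, "forest_id")
# 1:M direction (forests to trails): a join keeps only the first trail per forest,
forests_joined = join(forests, trails, "forest_id")
# while a relate exposes all of them.
forest_trails = relate(forests, trails, "forest_id")

print(forests_joined[0]["trail_name"])
print([t["trail_name"] for t in forest_trails[1]])
```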
57
• Puts two tables together, on the fly, to make one table
  • One‐to‐one join (e.g., join state attribute data to state shapefile by StateName)
  • One‐to‐many join (e.g., join code table to feature attribute table to add code description. Many records can use the same code value.)
• Each table in a join must have a key attribute for matching
  • The key must have the same values and data types in both tables
Review: Table joins
58
Example join
+ =
59
• Field types are different (e.g., one is numeric and one is text)
Problems with joins
Text values left align while numeric values right align
60
• Create a new field of the same type and use Field Calculator
Solution
61
• Both tables now have the same field type
Solution
62
• Data format varies
Problems with joins
Must remove dashes
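Both join problems (mismatched field types and inconsistent formats) come down to making the key look the same in both tables before matching. A minimal sketch in plain Python, with hypothetical tables and a hypothetical normalize_key helper; in ArcGIS the equivalent fix is done with a new field and the Field Calculator.

```python
def normalize_key(value):
    """Turn a key into dash-free text so both tables use one format and type."""
    return str(value).replace("-", "").strip()

# Hypothetical tables: one stores the key as text with dashes, one as a number.
parcels = [{"parcel_id": "12-345", "owner": "Lee"},
           {"parcel_id": "67-890", "owner": "Cruz"}]
values = [{"parcel_id": 12345, "assessed": 210000},
          {"parcel_id": 67890, "assessed": 155000}]

# A naive match finds nothing, because "12-345" != 12345.
naive_matches = [p for p in parcels for v in values
                 if p["parcel_id"] == v["parcel_id"]]

# After normalizing both sides, every parcel finds its value record.
lookup = {normalize_key(v["parcel_id"]): v for v in values}
joined = [
    {**p, "assessed": lookup[normalize_key(p["parcel_id"])]["assessed"]}
    for p in parcels
    if normalize_key(p["parcel_id"]) in lookup
]

print(naive_matches, joined)
```

Converting to text rather than to numbers is the safer direction, since numeric fields drop leading zeros that identifier codes often rely on.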
63
• Introduction to Database Management System (DBMS)
  • 4 types of DBMS
• Relational Database Basics
  • 2 keys
  • 3 relations
  • Data Normalization Form analysis
• Database Overview and Operation
  • Join vs. relate
What did we learn today?
64
Your Project
• Do you have a topic?
• Have you found a problem that you can solve using GIS?
• What is your study area?
• What data layers do you need?
• Where can you find the data?
• Do you need to collect your own data?
• Do you need to convert your data (Projection, GeoReference)?
Ideas for your final project
66