Introduction to Geographic Information Science
Database Management
Geography 4103 / 5103
Updates
Last Lecture
• We tried to explore the term “spatial model” by looking at definitions, taxonomies and examples
• An understanding of the methods we use (analysis tools), appropriate data models and of the problem we face (modeling) are central
• Deriving meaningful representations of events, occurrences or processes by making use of the power of spatial analysis
• Modelbuilder: How do you like it?
Today’s Outline
• We will look into Database Management Systems (DBMS)
• Exploring what databases and their elements are and what DBMS means
• Types of DBMS
• How attribute data & feature info is managed and stored
• Operations on relational DBMS (Relational Operators) to manipulate and query/select data
• Spatial data
Learning Objectives
• Database Management Systems (DBMS)
• Databases and their elements
• Types of DBMS
• Relational Operators to manipulate and query/select data
• Spatial data
Databases and DBMS
It’s all about our data (attribute information & feature information…)
A database is a collection of data files that is structured (organized).
A database management system (DBMS) is a specialized computer program used to organize & manipulate (manage) the database (data storage, editing, and retrieval).
Examples: Oracle, Access, Postgres
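The three DBMS tasks named above (data storage, editing, retrieval) can be sketched with SQLite, which ships with Python. This is only an illustration: the table name, column names, and the edited lake name are hypothetical and not part of the lecture.

```python
import sqlite3

# In-memory database; the "lakes" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lakes (id INTEGER PRIMARY KEY, name TEXT, area REAL)")

# Storage: add records.
conn.executemany(
    "INSERT INTO lakes (id, name, area) VALUES (?, ?, ?)",
    [(1, "Long Lake", 88259.5), (2, "Duck Lake", 1126331.9)],
)

# Editing: change one record.
conn.execute("UPDATE lakes SET name = ? WHERE id = ?", ("Long Lake (north)", 1))

# Retrieval: query the records that satisfy a condition.
large = conn.execute("SELECT name FROM lakes WHERE area > 1000000").fetchall()
names = [row[0] for row in conn.execute("SELECT name FROM lakes ORDER BY id")]
conn.close()
```

The DBMS, not the user, decides how the records are laid out on disk; the user only states what to store, change, or retrieve.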
DBMS & GIS
- Often huge tables
- Require maintenance (change, add, delete) to store data properties and their relationships
- Must serve different people/applications for queries
- Protection from corrupting/deleting & access restrictions
- Geodatabases as a more complex type
Logical vs. Physical Structures
• Logical structure = database design (schema)
– “Logical specification of attributes and relationships”
– Conceptual model of items, mappings, cardinality
– Entity-relationship diagram / notation (UML)
• Physical structure = database implementation
– Many possible implementations of any schema
– Depends on intended db use requirements
• Speed access, frequent update
• Flexible relationships
• Protect data security
Logical Structure (Schema)
• Bolstad’s (2005) forest trails database
– Entity sets hold attributes*
– Relationships hold mappings
• Use these for joining tables
– Cardinality (1-N, M-N) defines nature and direction of the relationship
• Review joins-and-relates
• Entity sets hold features, too…
[Figure: entity-relationship diagram; labels include Recreation, Activity, Features, and cardinality markers (M)]
Attention: 1:M and M:N
• Use relate instead of join
• If join: only the first matching element is kept in shp files; gdb: relationships for all mappings
• Spatial joins… M:1, e.g. land use features (M) & descriptions (1)
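A minimal sketch of why a relate is preferred over a join for 1:M mappings. The trail and feature records and both helper functions are hypothetical; they imitate the behavior described above and are not how any particular GIS package implements joins or relates.

```python
# Hypothetical 1:M data: one trail, many features along it.
trails = [{"trail_id": 1, "name": "Ridge"}, {"trail_id": 2, "name": "Creek"}]
features = [
    {"trail_id": 1, "feature": "overlook"},
    {"trail_id": 1, "feature": "campsite"},
    {"trail_id": 2, "feature": "bridge"},
]

def join_first_match(left, right, key):
    """Attach only the first matching right-hand record to each left record."""
    joined = []
    for row in left:
        match = next((r for r in right if r[key] == row[key]), None)
        joined.append({**row, **(match or {})})
    return joined

def relate(left, right, key):
    """Keep the tables separate; map each left key to all matching right records."""
    return {row[key]: [r for r in right if r[key] == row[key]] for row in left}

joined = join_first_match(trails, features, "trail_id")   # "campsite" is lost
related = relate(trails, features, "trail_id")            # both features kept
```

With the join, trail 1 ends up with a single feature; with the relate, all mappings remain reachable.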
Physical Structures/ Database Models
• … particular way of conceptually organizing multiple data files in a database (implementation)
• Flat File: text files
• Hierarchical: parent-child
• Network: nodes & links
• Relational: tables related via keys
• … Hybrid/ Object-oriented
Hierarchical and network database models have generally been replaced by the relational data model.
Flat File
Data in a “text” formatted file (row/column format). “Initial stage format”
Advantages: transparent, easily transportable
Disadvantages: little structure, few error safeguards, no ability to cross-reference or link among entries
Hierarchical DBMS
• Root entity & tree with parent-children relationships (e.g. ArcCatalog, Windows Explorer)
• Simple, but hard to capture complex relationships; slow searches
• Redundancies exist (duplicates), which complicates updates
[Figure: hierarchy of forests, trails and features, with redundant features]
Network DBMS
[Figure: network linking forests, trails, features and activity]
Eliminates redundancy: permits multiple parents for each child
Advantages: fast search, flexible relationships, no duplicates
Disadvantages: difficult to implement, difficult to update, difficult to validate
Hierarchic and Network DBMS In Practice
• Redundant items: not an error (no way to avoid)
• No redundant nodes, but errors in relations
– point 4 not part of edge f
– point 5 should be part
Relational DBMS
• Introduced by E.F. Codd (1970)
– Mathematician at IBM, same time as Mandelbrot
• Most frequently encountered DBMS in GIS
– Flexible
– Wide range of data types
• Simple to implement, modify and understand
– Bernhardsen: simple table structure permitted development of SQL
• Sometimes retrieval is slow (so optimize tables)
– Use fewer columns, fewer joins
– Use relationship classes instead of joins
Terminology
• Table: data organized in rows and columns
• Record (row/tuple): each record represents one logical entity (e.g. road, lake, land use polygon)
• Field (column/item): an attribute (property) of the logical entity
• Index/key: attribute(s) used to identify, organize, or order records in a database (needed for relational algebra or joins; see below)

ID  AREA   Perim  Class  Code
27  39.2   55.4   a      11z
14  192.4  77.3   a      119f
Simple sort: ascending AREA

Name                     AREA  class  Type
Long Lake            88,259.5  1      Limnetic zone
(unnamed)           143,285.3  2      Littoral zone
Sleepy Eye Lake     170,797.1  2      Littoral zone
Mud Lake            193,318.5  2      Shallow lakes
Goldsmith Lake      201,127.1  2      Littoral zone
Emily, Lake         336,343.2  2      Littoral zone
(unnamed)           349,528.7  1      Limnetic zone
(unnamed)           384,160.1  2      Littoral zone
Emily, Lake         420,798.4  1      Limnetic zone
Savidge Lake        479,709.7  2      Littoral zone
Emily, Lake         545,381.8  1      Limnetic zone
Dog Lake            635,537.0  2      Littoral zone
Duck Lake         1,126,331.9  1      Limnetic zone
Wita Lake         1,354,583.2  2      Littoral zone
(unnamed)         1,418,133.3  1      Limnetic zone
Ballantyne Lake   1,428,331.5  1      Limnetic zone
Washington, Lake  1,914,835.3  1      Limnetic zone
(unnamed)         1,937,698.6  1      Limnetic zone

Compound sort: ascending Type, then descending AREA within Type

Name                     AREA  class  Type
(unnamed)         4,040,675.7  1      Limnetic zone
(unnamed)         1,937,698.6  1      Limnetic zone
Washington, Lake  1,914,835.3  1      Limnetic zone
Ballantyne Lake   1,428,331.5  1      Limnetic zone
(unnamed)         1,418,133.3  1      Limnetic zone
Duck Lake         1,126,331.9  1      Limnetic zone
Emily, Lake         545,381.8  1      Limnetic zone
Emily, Lake         420,798.4  1      Limnetic zone
(unnamed)           349,528.7  1      Limnetic zone
Long Lake            88,259.5  1      Limnetic zone
Emily, Lake          58,662.2  1      Limnetic zone
Emily, Lake          52,222.6  1      Limnetic zone
Wita Lake         1,354,583.2  2      Littoral zone
Dog Lake            635,537.0  2      Littoral zone
Savidge Lake        479,709.7  2      Littoral zone
(unnamed)           384,160.1  2      Littoral zone
Emily, Lake         336,343.2  2      Littoral zone
Goldsmith Lake      201,127.1  2      Littoral zone
Sleepy Eye Lake     170,797.1  2      Littoral zone
(unnamed)           143,285.3  2      Littoral zone
Mud Lake            193,318.5  2      Shallow lakes
(unnamed)            70,590.3  2      Shallow lakes
(unnamed)            64,588.5  2      Shallow lakes
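The two sort orders named above can be sketched in a few lines. The five records are a subset of the lake table; the tuple layout (name, area, class, type) is an assumption made for this example.

```python
# (name, area, class, type): a small subset of the lake records.
lakes = [
    ("Duck Lake", 1126331.9, 1, "Limnetic zone"),
    ("Mud Lake", 193318.5, 2, "Shallow lakes"),
    ("Long Lake", 88259.5, 1, "Limnetic zone"),
    ("Dog Lake", 635537.0, 2, "Littoral zone"),
    ("Wita Lake", 1354583.2, 2, "Littoral zone"),
]

# Simple sort: one key, ascending AREA.
simple = sorted(lakes, key=lambda rec: rec[1])

# Compound sort: ascending Type, then descending AREA within each Type.
# Negating the area reverses the order of the second key only.
compound = sorted(lakes, key=lambda rec: (rec[3], -rec[1]))

simple_names = [rec[0] for rec in simple]
compound_names = [rec[0] for rec in compound]
```

The compound key is a tuple, so records are compared on Type first and on AREA only when the Types are equal.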
Constraints on relational implementation
• Rules for implementing tables appear to be fast and loose.
• In fact, two kinds of constraints allow flexibility yet preserve logical consistency.
– Constraint 1 – limit the number of legal operations on relational tables (Relational Algebra)
– Constraint 2 – balance the amount of redundancy (Normal Forms)
Pitfalls of Relational Tables
Pitfalls occur when the concepts of Normal Forms in relational tables are violated.
Constraint 2 – limit redundancy (do this with indexing keys & dependencies)
• Dependencies are needed to make a relational DBMS work. Dependency means that one column pre-determines another.
• Dependency vs. redundancy (they complement and balance each other)
– Too much: bulky database, slower performance
– Not enough: can’t find all the info in the table easily, and difficult to join when added information is needed
Simple (Functional) Dependency
• Definition: knowing the value of one field in a row determines what the value in another field would be.
Example: Student Database
– Knowing a Buff One number determines student name
– Knowing name determines major (even Undeclared)
– Knowing major determines College (A&S, ENG)
• Functional dependencies are good (they’re simple)
• Transitive dependencies are bad
• Transitive: sequence of simple dependencies in one table.
• Bad because too much redundancy creates complex primary and foreign indexing keys (again, bulky, slow, and possibly contradictory)
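A functional dependency can be checked mechanically: column A determines column B if no value of A ever appears with two different values of B. The sketch below assumes a small, made-up student table; the IDs, names, and the helper `determines` are hypothetical.

```python
def determines(rows, lhs, rhs):
    """Return True if column `lhs` functionally determines column `rhs`."""
    seen = {}
    for row in rows:
        # Remember the first rhs value seen for each lhs value; any later
        # disagreement breaks the dependency.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

# Hypothetical student records (not from the lecture).
students = [
    {"id": "S1", "name": "Ana", "major": "GIS", "college": "A&S"},
    {"id": "S2", "name": "Ben", "major": "GIS", "college": "A&S"},
    {"id": "S3", "name": "Cy", "major": "Interface Dsn", "college": "ENG"},
]

id_to_name = determines(students, "id", "name")              # simple dependency
major_to_college = determines(students, "major", "college")  # link in a transitive chain
major_to_name = determines(students, "major", "name")        # two GIS students: no dependency
```

Here id determines major and major determines college, all in one table, which is the transitive chain the bullets above warn about.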
How to resolve Constraint #2?
• Normalization ensures indexing keys provide just the right amount of dependency in a single table.
– “Just the right amount” means that edits can be made in just one table and propagated through the rest of the DBMS using table relationships (and joins).
– And database edits cannot easily corrupt the data (goal is to free the database of modification anomalies).
How to resolve Constraint #2?
• Normalize in stages, called “Normal Forms”
– Each form inserts or eliminates dependencies
– Codd proposed the first three; further forms were added later
– First three needed for GIS
– When all three are in place, relational database contains only simple dependencies
– A normalized database is suitable for general purpose queries, meaning special cases in the database should not require different query formulation than general cases.
Normal Forms
• 1st normal form: Atomic columns and cell values
– every cell contains only one attribute value, and
– no repeat columns appear in any single table
• 2nd normal form: establish simple dependencies
– attributes that do not make up the primary key are functionally dependent only on the primary key
– Split tables to remove duplicate rows
• 3rd normal form: eliminate transitive dependencies
– Split tables to remove dependent rows and columns
Six additional normal forms can be established, but GIS uses only these three…
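A minimal sketch of the third step, using the student example from the dependency discussion (ID determines major, major determines college). The records and table names are hypothetical. Splitting off a major table removes the transitive dependency, so a college is stored once per major instead of once per student.

```python
# Hypothetical single table with a transitive dependency:
# id -> major -> college.
students = [
    {"id": "S1", "name": "Ana", "major": "GIS", "college": "A&S"},
    {"id": "S2", "name": "Ben", "major": "GIS", "college": "A&S"},
    {"id": "S3", "name": "Cy", "major": "Interface Dsn", "college": "ENG"},
]

# Split: every non-key attribute now depends only on its own table's key.
student_table = [
    {"id": r["id"], "name": r["name"], "major": r["major"]} for r in students
]
major_table = {r["major"]: r["college"] for r in students}  # key: major

def rejoin(student_rows, majors):
    """Rebuild the wide table by joining on the major key."""
    return [{**r, "college": majors[r["major"]]} for r in student_rows]

lossless = rejoin(student_table, major_table) == students

# An edit is made in one place and propagates through the join.
major_table["GIS"] = "ENG"
after_edit = rejoin(student_table, major_table)
```

In the unsplit table the same edit would have to touch every GIS student row, and missing one would leave contradictory colleges for the same major.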
Establish 1st Normal Form
BuffOne    Student Name  Major               Dept        College
1234..789  Sally Jones   Interface Dsn       CSI         ENG
1357..246  Bob Willis    Policy, Human Geog  ENVS, GEOG  A&S
9876..321  Kathy Dunn    Phys Geog, Policy   GEOG, ENVS  A&S
5432..567  Hal Smith     GIS                 GEOG        A&S
5798..123  Carl Tomlin   Analysis, GIS       ENVS, GEOG  A&S
Problem? Each cell should hold only one value. Here, queries need to recognize and isolate one major from a list of possibly multiple majors, so they become more difficult than they need to be.
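A sketch of the problem and of one possible fix: splitting each multi-valued cell so that every row carries a single major. The one-row-per-major layout is an assumption for illustration and may differ from the table layout the slides develop next; the code also assumes majors and departments are listed in the same order within a cell.

```python
# Three rows from the table above; Major and Dept cells may hold lists.
rows = [
    {"student": "Sally Jones", "major": "Interface Dsn", "dept": "CSI", "college": "ENG"},
    {"student": "Bob Willis", "major": "Policy, Human Geog", "dept": "ENVS, GEOG", "college": "A&S"},
    {"student": "Hal Smith", "major": "GIS", "dept": "GEOG", "college": "A&S"},
]

# Without atomic cells, an exact-match query misses Bob Willis.
naive_geog = [r["student"] for r in rows if r["dept"] == "GEOG"]

def to_atomic(table):
    """Emit one row per (student, major) so every cell holds a single value."""
    atomic = []
    for row in table:
        majors = [m.strip() for m in row["major"].split(",")]
        depts = [d.strip() for d in row["dept"].split(",")]
        for major, dept in zip(majors, depts):
            atomic.append({"student": row["student"], "major": major,
                           "dept": dept, "college": row["college"]})
    return atomic

atomic_rows = to_atomic(rows)
geog_students = [r["student"] for r in atomic_rows if r["dept"] == "GEOG"]
```

Once the cells are atomic, the same simple equality test finds every GEOG student, including those with a second major.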
Establish 1st Normal Form
BuffOne    Student Name  Major          Dept  College  Major       Dept  College
1234..789  Sally Jones   Interface Dsn  CSI   ENG      --          --    --
1357..246  Bob Willis    Policy         ENVS  A&S      Human Geog  GEOG  A&S