Attribute data handling in a GIS environment.
Outline
Linking data to place
Definitions
Characteristics of DBMS
Types of database
Relational model
SQL
Database design
Linking data to place
Having previously defined how geography can be modeled within a GIS, we also need to consider how the characteristics (or attributes) of the geographic features are associated with that geography.
GIS Data
Attribute linkages
Spatial data: points, lines, areas, pixels (P, L, A, Px), located by x and y coordinates
Attribute data: nominal, ordinal, interval, ratio (NOIR)
Storing attribute data
Attribute data are traditionally stored separately from the coordinate data
(Can you suggest why?)
Feature identifiers enable a link to be made to an attribute table:
point attribute table
line or arc attribute table
polygon attribute table
Polygon Attribute Table
The Polygon ID would be assigned by the GIS software (when importing / creating the file), while the Feature ID would be assigned to each polygon by the user and used to link to existing databases.
Similarly we could create point or line linkages.
Polygon ID  Feature ID
1           A2343
2           A2348
3           A4523
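A minimal sketch of the two-step linkage described above: the GIS-assigned Polygon ID leads to the user-assigned Feature ID, which in turn keys into an existing database. The `external_records` table and its `land_use` values are hypothetical, invented only to illustrate the lookup.

```python
# Polygon attribute table from the slide:
# GIS-assigned Polygon ID -> user-assigned Feature ID.
polygon_attribute_table = {1: "A2343", 2: "A2348", 3: "A4523"}

# Hypothetical existing database keyed by Feature ID
# (values invented for illustration only).
external_records = {
    "A2343": {"land_use": "residential"},
    "A2348": {"land_use": "commercial"},
    "A4523": {"land_use": "park"},
}


def attributes_for_polygon(polygon_id):
    """Follow Polygon ID -> Feature ID -> external record."""
    feature_id = polygon_attribute_table[polygon_id]
    return feature_id, external_records.get(feature_id)


feature_id, record = attributes_for_polygon(2)
print(feature_id, record)
```

The same pattern would apply to point or line attribute tables; only the first lookup table changes.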
Good organization of the attribute data is very important
In socioeconomic GIS applications, the attribute data component is often much larger than the spatial component (i.e., relatively few census tracts, but hundreds of variables)
ID   District  Province
101  Palma     Merida
102  S. Maria  Merida
103  Veralo    Merida
104  Bolo      La Paz
105  Jose      La Paz
106  Malabo    La Paz
107  Chilabo   La Paz
…    …         …

[Map: districts labeled 101 to 107]

P_Pop   P_TFR  Province
397881  3.7    La Paz
214084  3.2    Merida
…       …      …

Keys: Primary Key, Foreign Key (the shared Province column links the two tables)
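A short sketch of how the key linkage works in practice, using Python's built-in `sqlite3` module and the district and province values shown on the slide. The table and column names (`district`, `province`, `p_pop`, `p_tfr`) are assumptions chosen to mirror the slide headings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE province (
    province TEXT PRIMARY KEY,   -- primary key of the province table
    p_pop    INTEGER,
    p_tfr    REAL
);
CREATE TABLE district (
    id       INTEGER PRIMARY KEY,                  -- primary key
    district TEXT,
    province TEXT REFERENCES province(province)    -- foreign key
);
""")
conn.executemany(
    "INSERT INTO province VALUES (?, ?, ?)",
    [("La Paz", 397881, 3.7), ("Merida", 214084, 3.2)],
)
conn.executemany(
    "INSERT INTO district VALUES (?, ?, ?)",
    [
        (101, "Palma", "Merida"), (102, "S. Maria", "Merida"),
        (103, "Veralo", "Merida"), (104, "Bolo", "La Paz"),
        (105, "Jose", "La Paz"), (106, "Malabo", "La Paz"),
        (107, "Chilabo", "La Paz"),
    ],
)

# Each district row picks up its province's attributes via the shared key.
rows = conn.execute("""
    SELECT d.id, d.district, p.province, p.p_pop, p.p_tfr
    FROM district AS d
    JOIN province AS p ON d.province = p.province
    ORDER BY d.id
""").fetchall()

for row in rows:
    print(row)
```

Province-level values are stored once and attached to every district at query time, rather than being repeated in each district row.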
Definitions
Database – an integrated set of data on a particular subject
Geographic (=spatial) database - database containing geographic data of a particular subject for a particular area
Database Management System (DBMS) – software to create, maintain and access databases
Storing data
There are two fundamental ways to store data:
As a simple file (e.g., a txt file) (example?)
In a ‘database’
Simple file structures
Moving from files to databases
http://cs1.mcm.edu/~rob/class/dbms/notes/Chapt01/index.html
Advantages of DBs over files
Avoids redundancy and duplication
Reduces data maintenance costs
Applications are separated from the data
Applications persist over time
Support multiple concurrent applications
Better data sharing
Security and standards can be defined and enforced
ArcGIS topology rules
ArcGIS attribute rules
http://desktop.arcgis.com/en/arcmap/latest/manage-data/editing-topology/geodatabase-topology-rules-and-topology-error-fixes.htm
http://desktop.arcgis.com/en/arcmap/latest/manage-data/editing-attributes/about-maintaining-attribute-integrity-while-editing.htm
Historical disadvantages of DBs compared with files
Expense (e.g., an Oracle license would cost over $50k for a large commercial enterprise), although there are free / open-source DBMSs available such as MySQL, PostgreSQL and MongoDB.
Complexity – not an issue with current DBMSs, although with modern data sets you still must consider how best to organize your data.
Performance – generally not an issue with the current generation of DBMSs, although with the terabyte-sized data sets available today it is becoming an issue once again.
Integration with other systems – not an issue with the current generation of DBMSs
Data volume – with massive data volumes processing slows down (this is where NoSQL systems such as MongoDB, and Hadoop, shine)
https://www.mysql.com/
https://www.postgresql.org/
https://www.mongodb.com/
Characteristics of DBMS (1)
Data model support for multiple data types
MS Access: Text, Memo, Number, Date/Time, Currency, AutoNumber, Yes/No (Logical), BLOBs, OLE Object, Hyperlink, Lookup Wizard (other DBMS’s are similar)
Load data from files, databases and other applications
Indexes for rapid retrieval
http://www.webopedia.com/TERM/B/BLOB.html
http://www.webopedia.com/TERM/O/OLE_DB.html
http://en.wikipedia.org/wiki/Database_index
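To illustrate "indexes for rapid retrieval", the sketch below asks SQLite how it would run the same query before and after an index is created. The `parcel` table, its columns and the index name `idx_parcel_owner` are hypothetical; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcel (id INTEGER PRIMARY KEY, owner TEXT, area_ha REAL)"
)
conn.executemany(
    "INSERT INTO parcel (owner, area_ha) VALUES (?, ?)",
    [(f"owner_{i % 100}", float(i)) for i in range(1000)],
)


def query_plan(sql, params=()):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


sql = "SELECT * FROM parcel WHERE owner = ?"
plan_before = query_plan(sql, ("owner_7",))   # no index yet: table scan
conn.execute("CREATE INDEX idx_parcel_owner ON parcel (owner)")
plan_after = query_plan(sql, ("owner_7",))    # lookup through the index

print(plan_before)
print(plan_after)
```

Without the index every row must be examined; with it, the DBMS can jump directly to the matching rows.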
Characteristics of DBMS (2)
Query language – SQL (also QBE, …) (also Spatial SQL)
Security – controlled access to data
Multi-level groups
Controlled update using a transaction manager (think ATM)
Ensure attribute consistency (domain rules)
Backup and recovery
DBA tools
Configuration, tuning
http://www.w3schools.com/sql/
http://desktop.arcgis.com/en/arcmap/latest/manage-data/geodatabases/an-overview-of-attribute-domains.htm
http://www.databaseanswers.org/what_is_a_dba.htm
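Two of the characteristics above, controlled update through transactions (the ATM analogy) and attribute consistency through domain rules, can be sketched together. This is an illustrative example using `sqlite3`; the `account` table, the `transfer` function and the balances are assumptions. The domain rule is a `CHECK` constraint that forbids negative balances.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- domain rule
    )
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back.
        return False


ok = transfer(conn, 1, 2, 30)       # allowed: account 1 has enough funds
failed = transfer(conn, 1, 2, 500)  # rejected: would make the balance negative
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

The second transfer credits account 2 before the debit fails, yet the final balances show no trace of it: the transaction manager undid the partial update.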
Characteristics of DBMS (3)
Applications
CASE tools (Computer-aided software engineering)
Forms builder
Report writer
Internet Application Server
Programmable API (Application Programming Interface)
(Data abstraction layer)
http://en.wikipedia.org/wiki/Database_abstraction_layer
Role of DBMS
[Figure: division of roles between systems]
GIS: data entry, editing, visualization, mapping, analysis (spatial data manipulation)
DBMS: storage, indexing, security, query
Data: spatial and attribute data
Types of DBMS Models
Hierarchical
Network
Relational - RDBMS
Object-oriented - OODBMS
Object-relational – ORDBMS
Newer forms: NoSQL, NewSQL
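Hierarchical
[Figure: hierarchical model, with Country at the top, Provinces below it, and Counties below each Province]
Hierarchical and Network
[Figure: hierarchical model compared with network model]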
Relational Tables: Topological data model
[Figure: topology diagram with nodes I to IV, lines 1 to 6 and polygons A to C]

Node  X  Y
I     1  4
II    4  4
III   6  4
IV    4  1

Line  From  To   Left  Right
1     I     III  O     A
2     I     IV   B     O
3     III   IV   O     C
4     I     II   A     B
5     II    III  A     C
6     II    IV   C     B

Poly  Lines
A     1,4,5
B     2,4,6
C     3,5,6

O = “outside” polygon
Keys
Foreign keys
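The topological tables above can be queried directly. As a sketch, the line table is loaded into SQLite and used to recover each polygon's boundary lines and its neighbours from the Left and Right columns. The names `line_table`, `boundary_lines` and `neighbours` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE line_table (
        line_id    INTEGER PRIMARY KEY,
        from_node  TEXT,
        to_node    TEXT,
        left_poly  TEXT,
        right_poly TEXT
    )
""")
conn.executemany(
    "INSERT INTO line_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, "I", "III", "O", "A"),
        (2, "I", "IV", "B", "O"),
        (3, "III", "IV", "O", "C"),
        (4, "I", "II", "A", "B"),
        (5, "II", "III", "A", "C"),
        (6, "II", "IV", "C", "B"),
    ],
)


def boundary_lines(poly):
    """Lines that have the polygon on their left or right side."""
    rows = conn.execute(
        "SELECT line_id FROM line_table "
        "WHERE left_poly = ? OR right_poly = ? ORDER BY line_id",
        (poly, poly),
    )
    return [row[0] for row in rows]


def neighbours(poly):
    """Polygons on the other side of the polygon's lines ('O' excluded)."""
    rows = conn.execute(
        "SELECT left_poly, right_poly FROM line_table "
        "WHERE left_poly = ? OR right_poly = ?",
        (poly, poly),
    )
    others = {left if right == poly else right for left, right in rows}
    return sorted(others - {"O"})


poly_lines = {poly: boundary_lines(poly) for poly in "ABC"}
print(poly_lines)
print(neighbours("A"))
```

The derived boundary lists match the Poly table on the slide, which shows that the Poly table is redundant with the Left and Right columns of the Line table.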
Object-oriented DBMS
Inheritance, encapsulation
GE Smallworld
BC Hydro
http://en.wikipedia.org/wiki/Smallworld
http://www.thefreelibrary.com/Smallworld,+BC+Hydro+and+Westech+Information+Collaborate+to+Develop...-a054570625
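Inheritance and encapsulation can be illustrated with a few lines of plain Python. This is a generic sketch of the two ideas, not the Smallworld data model or API; the class names `Feature` and `PowerLine`, the attribute `voltage_kv` and the 100 kV threshold are all hypothetical.

```python
class Feature:
    """Base class: every feature has an ID and a private attribute store."""

    def __init__(self, feature_id):
        self.feature_id = feature_id
        self._attributes = {}  # encapsulated: reached only through methods

    def set_attribute(self, name, value):
        self._attributes[name] = value

    def get_attribute(self, name):
        return self._attributes[name]

    def describe(self):
        return f"{type(self).__name__} {self.feature_id}"


class PowerLine(Feature):
    """Subclass: inherits ID and attribute handling, adds its own behaviour."""

    def __init__(self, feature_id, voltage_kv):
        super().__init__(feature_id)
        self.set_attribute("voltage_kv", voltage_kv)

    def is_high_voltage(self, threshold_kv=100):
        return self.get_attribute("voltage_kv") >= threshold_kv


line = PowerLine("PL-1", 230)
print(line.describe(), line.is_high_voltage())
```

`PowerLine` reuses `describe` and the attribute methods without redefining them (inheritance), and callers never touch `_attributes` directly (encapsulation).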
Overview
Network
essentially a programmer's database model
efficient but inflexible and hard to understand
Relational (ESRI Geodatabase)
its only complex data type is the relation
it is the only complete data model aimed at users instead of programmers
relational query languages are easier to use than full-blown programming languages
rich underlying theory
separation of implementation and design
Object-Oriented (GE Smallworld)
an extension of object-oriented programming
no generally agreed upon formal data model
great freedom regarding complex data structures
inheritance
user-defined types
encapsulation
Relational vs non-relational DBMS
SQL Databases
Also known as relational databases, these define and manipulate data using Structured Query Language (SQL). They are the most widely used and are well suited to structured data, where the elements of the data and the way they relate to one another are standardized.
NoSQL Databases
Also known as non-relational databases, these allow you to store and retrieve unstructured data using a dynamic schema. NoSQL is popular for its flexibility: the data can be organized as documents, graphs, columns, or key-value pairs.
SQL has had a large lead over the non-relational alternatives for decades, but NoSQL is quickly closing the gap with popular databases such as MongoDB, Redis, and Cassandra.
Source
https://scalegrid.io/blog/2019-database-trends-sql-vs-nosql-top-databases-single-vs-multiple-database-use/
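The difference between a fixed and a dynamic schema can be sketched as follows. A relational table rejects a field it was not designed for, while a document collection accepts records of differing shape. The `station` table and its values are hypothetical, and a plain Python list of JSON documents stands in for a document store; no MongoDB API is used here.

```python
import json
import sqlite3

# Relational: the schema is fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE station (id INTEGER PRIMARY KEY, name TEXT, elevation_m REAL)"
)
conn.execute("INSERT INTO station VALUES (1, 'Alpha', 120.0)")
try:
    # 'sensor' is not a column of the table, so this statement fails.
    conn.execute("INSERT INTO station (id, name, sensor) VALUES (2, 'Beta', 'rain')")
    relational_accepts_new_field = True
except sqlite3.OperationalError:
    relational_accepts_new_field = False

# Document style: each record carries its own structure.
documents = [
    {"id": 1, "name": "Alpha", "elevation_m": 120.0},
    {"id": 2, "name": "Beta", "sensors": ["rain", "wind"]},
]
serialized = [json.dumps(doc) for doc in documents]

# Queries must tolerate fields that are missing from some documents.
with_sensors = [
    doc["id"] for doc in map(json.loads, serialized) if "sensors" in doc
]
print(relational_accepts_new_field, with_sensors)
```

The flexibility comes at a cost: the consistency checks a relational schema enforces automatically become the responsibility of the application.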
Current DBMS Market Shares
https://scalegrid.io/blog/2019-database-trends-sql-vs-nosql-top-databases-single-vs-multiple-database-use/
Relational DBMS (2)
Most popular type of DBMS
∼60% of data in a DBMS is in an RDBMS (increasingly, data is stored in MongoDB, a NoSQL platform based on JSON-like documents)
https://www.mongodb.com/
https://en.wikipedia.org/wiki/JSON
Relational DBMS (1)
Data stored as tuples (tup-el), conceptualized as tables
Table – data about a class of objects
Two-dimensional list (array)
Rows = objects
Columns = object states (properties, attributes)
Row = record = tuple [# of rows = cardinality]
Column = field = attribute = property [# of columns = degree]
Table = file = relation
Table
FID = Primary Key = Index (FeatureID)
Row = object
Table = Object Class
Foreign key
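The terminology above maps directly onto a small data structure. In this sketch a relation is a list of named tuples built from the district table shown earlier; the names `District` and `relation` are assumptions for illustration.

```python
from collections import namedtuple

# One tuple type per relation: the field names are the columns (attributes).
District = namedtuple("District", ["id", "district", "province"])

# Each tuple is one row (record) describing one object.
relation = [
    District(101, "Palma", "Merida"),
    District(102, "S. Maria", "Merida"),
    District(103, "Veralo", "Merida"),
    District(104, "Bolo", "La Paz"),
    District(105, "Jose", "La Paz"),
    District(106, "Malabo", "La Paz"),
    District(107, "Chilabo", "La Paz"),
]

degree = len(District._fields)  # number of columns
cardinality = len(relation)     # number of rows

# A primary key must identify every row uniquely.
primary_key_is_unique = len({row.id for row in relation}) == cardinality

print(degree, cardinality, primary_key_is_unique)
```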
Relation Rules (Codd, 1970)
Only one value in each cell (intersection of row and column)
All values in a column are about the same subject
Each row is unique
No significance in column sequence
No significance in row sequence
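As an example of the first rule, the Poly table from the topological data model stores several line numbers in a single cell ("1,4,5"). A minimal sketch of restructuring it so that each cell holds exactly one value follows; the variable names are assumptions.

```python
# Poly table as shown earlier: the Lines cell holds a list of values,
# which breaks the "only one value in each cell" rule.
poly_table = {"A": "1,4,5", "B": "2,4,6", "C": "3,5,6"}

# One row per (polygon, line) pair: every cell now holds a single value.
poly_line = [
    (poly, int(line))
    for poly, lines in poly_table.items()
    for line in lines.split(",")
]

# Each row is unique, and neither row nor column order carries meaning.
rows_are_unique = len(set(poly_line)) == len(poly_line)

print(poly_line)
print(rows_are_unique)
```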
Normalization
Process of converting tables to conform to Codd’s relational rules
Split tables into new tables that can be joined at query time
The relational join
Several levels of normalization (reduce redundancy)
Forms: 1NF, 2NF, 3NF, etc.
Normalization creates many expensive joins
De-normalization is OK for performance optimization
Tax assessment database
Joined table
Data partially normalized into three subtables
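The tax assessment tables are not reproduced in these notes, so the following sketch illustrates the same idea with the district and province values used earlier. A single flat table, assembled here for illustration, repeats the province population and TFR on every district row. Splitting it into two tables removes the repetition, and a join at query time recovers the original.

```python
import sqlite3

# Flat (un-normalized) table: province values repeat on every district row.
flat = [
    (101, "Palma", "Merida", 214084, 3.2),
    (102, "S. Maria", "Merida", 214084, 3.2),
    (103, "Veralo", "Merida", 214084, 3.2),
    (104, "Bolo", "La Paz", 397881, 3.7),
    (105, "Jose", "La Paz", 397881, 3.7),
    (106, "Malabo", "La Paz", 397881, 3.7),
    (107, "Chilabo", "La Paz", 397881, 3.7),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flat (id INTEGER, district TEXT, province TEXT, "
    "p_pop INTEGER, p_tfr REAL)"
)
conn.executemany("INSERT INTO flat VALUES (?, ?, ?, ?, ?)", flat)

# Normalize: split into a district table and a province table.
conn.executescript("""
CREATE TABLE district AS SELECT id, district, province FROM flat;
CREATE TABLE province AS SELECT DISTINCT province, p_pop, p_tfr FROM flat;
""")
province_rows = conn.execute("SELECT COUNT(*) FROM province").fetchone()[0]

# The relational join reassembles the flat table at query time.
rejoined = conn.execute("""
    SELECT d.id, d.district, d.province, p.p_pop, p.p_tfr
    FROM district AS d
    JOIN province AS p ON d.province = p.province
    ORDER BY d.id
""").fetchall()

print(province_rows)
print(rejoined == flat)
```

Each province's values are now stored once, so an update touches one row instead of several; the price is the join needed to reassemble the full record.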
Relational Join
Fundamental query operation (ArcGIS)
Occurs because
Normalization
Data created/maintained by different users, but integration needed for queries
Table joins use common keys (column values -- foreign keys)
Table (attribute) join concept has been extended to geographic case
http://desktop.arcgis.com/en/arcmap/latest/manage-data/tables/about-joining-and-relating-tables.htm
SQL
Structured (or Standard) Query Language (pronounced SEQUEL)
Developed by IBM in the 1970s
Now de facto and de jure standard for accessing relational databases
Three types of usage
Stand-alone queries
High-level programming
Embedded in other applications
Types of SQL Statements
Data Definition Language (DDL)
Create, alter and delete data structures (tables, indexes)
CREATE TABLE, CREATE INDEX
Data Manipulation Language (DML)
Retrieve and manipulate data
SELECT, UPDATE, DELETE, INSERT
Data Control Language (DCL)
Control security of data
GRANT, CREATE USER, DROP USER
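The statement types can be tried out with Python's built-in `sqlite3` module, which is also an example of SQL embedded in another application. The `parcel` table and its values are hypothetical. SQLite has no user accounts, so the DCL statement is shown only as text and is not executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structures that will hold the data.
conn.execute(
    "CREATE TABLE parcel (id INTEGER PRIMARY KEY, owner TEXT, area_ha REAL)"
)
conn.execute("CREATE INDEX idx_parcel_owner ON parcel (owner)")

# DML: insert, change, remove and retrieve rows.
conn.execute("INSERT INTO parcel VALUES (1, 'Doe', 2.5)")
conn.execute("INSERT INTO parcel VALUES (2, 'Smith', 4.0)")
conn.execute("UPDATE parcel SET area_ha = 3.0 WHERE id = 1")
conn.execute("DELETE FROM parcel WHERE id = 2")
remaining = conn.execute("SELECT id, owner, area_ha FROM parcel").fetchall()

# DCL: for reference only; a server DBMS would accept a statement like this.
dcl_example = "GRANT SELECT ON parcel TO analyst"

print(remaining)
```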
Steps involved in database creation
Data investigation: consider the type, quantity and qualities of data to be included in the database; the nature of the entities and attributes is decided (inventory of data, needs analysis).
Data modeling: form a conceptual model of data by examining the relations between entities and the characteristics of entities and attributes (logical design--infological model).
Database design: creation of a practical design for the database. This step depends upon and is constrained by the software being used. Field names, specific attribute types and structures (e.g., tables) are decided (physical design--datalogical model).
Database implementation: populating the database with attribute data. This is followed by monitoring and upkeep, fine tuning, modification and updating.
Database design perspectives
Infological problems deal with how to define the information to be provided by the system to satisfy the needs of its users.
Datalogical problems are about how to design the structure and operation of the system and to take full advantage of current information technology available.
Essentially, infological work refers to system analysis and conceptual modeling, and datalogical work to technical design and physical implementation of the system.
Data Model Levels
[Figure: levels of data model. Reality, Conceptual Model, Logical Model, Physical Model, in order of increasing abstraction; the levels range from human-oriented (infological) to computer-oriented (datalogical)]
Relate back to Objects/Fields and Vector/Raster
ERM
Entity Relationship Modelling (ERM) process
The identification of entities
The identification of relations between entities
The identification of attributes of entities (infological steps)
The derivation of tables from this (datalogical steps)
UML
http://edndoc.esri.com/arcobjects/9.2/NET_Server_Doc/manager/geodatabase/designing_a_geodatabase/a_note_a584253231.htm
ERM
[Figure: ER diagram notation. Symbols: Entity (Entity Name), Relationship (Relationship Name), Attribute (Attribute Name), Mandatory Existence, Optional Existence, One, Many. Example labels: Oversight, Regional Authority, Water Dept, Municipality, Engineering]
ERM Relations
Mapping an ER Model into a table.
Example of a 1:M relation (ArcGIS example)
Example of a M:N relation (ArcGIS relations)
[Figure: Student entity with attributes Address, DoB, StudentID]
Student Name  Address  Student ID  Date of Birth
Jane Doe      Dunbar   456123      12/03/90
Jim Smith     Kits     876562      01/10/89
Example of 1:1 relations
Students, Courses
Course, Instructor
http://support.esri.com/cn/knowledgebase/techarticles/detail/37544
http://desktop.arcgis.com/en/arcmap/latest/manage-data/relationships/benefits-of-relationship-classes.htm
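A sketch of the datalogical step, deriving tables from an ER model, using the Student table above. It assumes that Students and Courses form an M:N relation, which is mapped to a third (junction) table. The `course` and `enrolment` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT,
    address    TEXT,
    dob        TEXT
);
CREATE TABLE course (
    course_id TEXT PRIMARY KEY,
    title     TEXT
);
-- An M:N relation becomes its own table holding one foreign key per entity.
CREATE TABLE enrolment (
    student_id INTEGER REFERENCES student(student_id),
    course_id  TEXT REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (456123, "Jane Doe", "Dunbar", "12/03/90"),
        (876562, "Jim Smith", "Kits", "01/10/89"),
    ],
)
# Hypothetical courses and enrolments, for illustration only.
conn.executemany(
    "INSERT INTO course VALUES (?, ?)",
    [("GIS101", "Intro GIS"), ("DB201", "Databases")],
)
conn.executemany(
    "INSERT INTO enrolment VALUES (?, ?)",
    [(456123, "GIS101"), (456123, "DB201"), (876562, "GIS101")],
)

# One student, many courses ...
courses_of_jane = [row[0] for row in conn.execute("""
    SELECT c.title FROM enrolment AS e
    JOIN course AS c ON e.course_id = c.course_id
    WHERE e.student_id = ? ORDER BY c.title
""", (456123,))]

# ... and one course, many students.
students_in_gis = [row[0] for row in conn.execute("""
    SELECT s.name FROM enrolment AS e
    JOIN student AS s ON e.student_id = s.student_id
    WHERE e.course_id = ? ORDER BY s.name
""", ("GIS101",))]

print(courses_of_jane, students_in_gis)
```

A 1:M relation needs no junction table: a foreign key column on the "many" side is enough, as with Province in the district table.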
Database design perspectives
Prof. Börje Langefors recognized the importance of three contexts in the infological approach. They are the “organizational context, wherein organized collections of people/individuals are perceived; the language context, wherein organized collections of symbols and linguistic behaviors are perceived; and technical context, wherein organized collections of technical artifacts (computers, telecommunication technologies, software) are perceived” (Iivari & Lyytinen, 1998; p. 170).
http://isworld.student.cwru.edu/tiki/tiki-index.php?page=Langefors_Review
This quote also describes the situation with respect to a GIS within an organization.
Summary
Database – an integrated set of data on a particular subject
Databases offer many advantages over files
Relational databases still dominate, but are losing ground
Database design issues require careful consideration
Newer models, such as NoSQL and “NewSQL”, are rising as cloud computing and unlimited storage capacity become commonplace
http://en.wikipedia.org/wiki/NoSQL
http://newsql.sourceforge.net/